
Posts made by Travis_Bailey
-
RE: Blog subdomain not redirecting
Looks like that, or some approximation thereof, has you sorted. I would just add that you should keep an eye on Webmaster Tools.
-
RE: Blog subdomain not redirecting
I'm hesitant to say, "Do X," because I'm not really sure what will happen with the redirect plugin in the mix. I imagine a lot, if not all, of the subdomain's folders and pages have already been redirected via the plugin. So I imagine the path of least disaster at the moment is just redirecting the subdomain root (sub.domain.com) to the main domain (www.domain.com) alone.
I could be totally wrong, but this one is weird.
Test out the rule and then push live. Here is the code to redirect just the subdomain root to the www domain:
# Match the blog subdomain (dots escaped so they match literally)
RewriteCond %{HTTP_HOST} ^blog\.domain\.com$ [NC]
# Only match the subdomain root, so pages the plugin already redirects are left alone
RewriteCond %{REQUEST_URI} ^/?$
RewriteRule .* http://www.domain.com/ [R=301,L]
Double check it, triple check it, and then push live. Keep a very close eye on it. I really hope we don't end up with a loop.
-
RE: Blog subdomain not redirecting
This particular situation won't sort itself out. There's a subdomain involved, and I suspect there's a rewrite rule that shouldn't be there. The developer appears to be somewhat sophisticated; they're using X-Frame-Options in a way that doesn't allow iframes to work outside of the domain.
So who knows what goodies await in .htaccess.
-
RE: Blog subdomain not redirecting
Okay, here's what I got:
The plugin supposedly operates independently of .htaccess. So taking that at face value, I don't think you're going to get what you need out of the plugin.
I would imagine the .htaccess file is much the same as it was when the site launched, or when it was last modified by the developer. So that file is likely going to need editing to achieve what you need. However, that file isn't something you just want to play with in a live environment.
And it's not the kind of thing anyone in their right mind would blindly say, "Yeah, just copy and paste this rule!" about.
I would talk to Dale and see if he has a block of free time coming up.
-
RE: Blog subdomain not redirecting
You mentioned in the above thread that you're using a redirection plugin. What is its name? Beyond that, Yoast and All in One both allow you to edit .htaccess entries. (I despise that feature, btw.)
-
RE: Blog subdomain not redirecting
I'm going to guess that you have something that looks like this in your .htaccess file:
RewriteRule ^blog/$ http://blog.website.com [L,NC,R=301]
WARNING: You can knock your site down with the slightest syntax error when you mess with the .htaccess file. Proceed with caution.
Let us know what you find.
-
RE: Strange keyword showing in GA
It was kind of humorous, at first. It's now showing as returning organic traffic. Direct link to screencap from the original post:
-
RE: How valuable is a link with a DA 82 but a PA of 1?
I would generally dispense with the concern over metrics, considering the source. It sounds like a great citation source, regardless. Plus it may do what links were intended to do in the first place: drive traffic.
OSE, Ahrefs, Majestic and the like are just keyhole views into what's really going on. Important keyhole views, to be sure, but still limited insights into the big picture.
I would argue that if one focuses less on granular metrics and puts more attention into traffic and general relevancy, one will be happier with the results and have more time for generating similar results.
-
RE: Pages are Indexed but not Cached by Google. Why?
Good to hear you may be getting closer to the root of the problem. Apologies that it took so long to get back to you here. I had 'things'.
I followed the steps and you should be able to determine the outcome. Spoiler Alert: No block, this time.
It's a whole other can of worms, but should you need more human testing on the cheap, you may find Mechanical Turk attractive. One could probably get a couple hundred participants for under a couple hundred dollars, with a task comparable to the one above.
Just a thought...
-
RE: Does Disavowing Links Negate Anchor Text, or Just Negates Link Juice
In your position, I would want to know more about what I'm getting into as well. Before I sign a contract, I would want to know what they've been doing over the last three years. There's a lot of time in there where potential previous actions could help or hinder your efforts.
- Did they disavow?
- What did they (or a contractor) disavow, if anything?
- If they 'performed a disavow', where is the file? (There's a possibility it wasn't properly formatted - see the sketch after this list - or it may not have been submitted.)
- Have they sent out link removal requests?
- If so, what were the results?
- Did they continue building low quality links after the fact? (History is a factor.)
- If so, for how long?
- Have they tried a reconsideration request after what you would deem a sufficient disavow/removal effort? (Though it may walk and quack like an algo/filter penalty, it could be manual.)
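On the formatting point: Google's disavow file is just a plain text file, one domain or URL per line, with optional comment lines. A quick sketch of a properly formatted file - the domains here are made up:
# Hypothetical example of a properly formatted disavow file
# Lines starting with # are comments and are ignored
domain:spammy-directory-example.com
domain:paid-links-example.net
http://random-blog-example.com/low-quality-post.html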
The above would be a few of my primary concerns before I started looking at anchor text ratios. If you've already covered those bases, good on you. Just let it be known, to everyone's general disinterest, that I said as much.
You may find that a lot of the heavy lifting is already done, but the execution was flawed at some critical point. Which may free resources toward building a better internet and generally making your client giddy. Easy peasy, right?
I agree with Ryan's second paragraph. Definitely under-promise and attempt to over-deliver. I haven't seen many sites that didn't have at least a chance at recovery, if money were no object. However, there are sites where it would be wise to start over from an economic perspective. (Time/Opportunity Cost+Actual Money)
It's that nearly three-year-long penalty that would give me pause, prior to jumping in. Again with the ratios: if there's been a disavow and you don't have the file, you're not looking at anything remotely accurate - until you go through the same process. Still, no one ever has the entire picture. It's various shades of confidence in what you can gather about the situation.
There. I made it two paragraphs without emoting. I can go play video games now.
-
RE: Pages are Indexed but not Cached by Google. Why?
I can't really argue with log files, in most instances. Unfortunately, I didn't export crawl data. I used to irrationally hoard that stuff, until I woke up one day and realized one of my drives was crammed full of spreadsheets I will never use again.
There may be some 'crawlability' issues beyond the aggressive blocking practices. Though I managed to crawl 400+ URIs before timeouts, after I throttled the crawl rate back the next day. Screaming Frog is very impressive, but Googlebot it ain't, even though it performs roughly the same function. Given enough RAM, it won't balk at magnitudes greater than the 400 or so URIs. (I've seen... things...) And with default settings, Screaming Frog can easily handle tens of thousands of URIs before it hits its default RAM allocation limit.
It's more than likely worth your while to purchase an annual license at ~$150. That way, you get all the bells and whistles - though there is a stripped-down free version. There are other crawlers out there, but this one is the bee's knees. Plus you can run all kinds of theoretical crawl scenarios.
But moving along to the actual blocking, barring the crawler, I could foresee a number of legit use scenarios that would be comparable to my previous sessions. Planning night out > Pal sends link to site via whatever > Distracted by IM > Lose session in a sea of tabs > Search Google > Find Site > Phone call > Not Again... > Remember domain name > Blocked
Anyway, I just wanted to be sure that my IP isn't whitelisted, just unblocked. I could mess around all night trying to replicate it, without the crawling, just to find I 'could do no wrong'. XD
Otherwise it looks like this thread has become a contention of heuristics. I'm not trying to gang up on you here, but I would err on the side of plenty. Apt competition is difficult to overcome in obscurity. : )
-
RE: Pages are Indexed but not Cached by Google. Why?
I'll PM my public IP through Moz. I don't really have any issue with that. Oddly enough, I'm still blocked though.
I thought an okay, though slightly annoying, middle ground would be to give me a chance to prove that I'm not a bot. It seems cases like mine may be few and far between, but it happened.
It turns out that our lovely friends at The Googles just released a new version of reCAPTCHA. It's a one-click-prove-you're-not-a-bot-buddy-okay-i-will-friend-who-you-calling-friend-buddy bot check. (One click - and a user can prove they aren't a bot - without super annoying squiggle interpretation and entry.)
I don't speak fluent developer, but there are PHP code snippets hosted on this GitHub repo. From the documentation, it looks like you can fire the widget when you need to. So if it works like I think it could work, you can have a little breathing room to figure out the possible session problem.
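For what it's worth, the basic drop-in looks roughly like this - a minimal sketch assuming the standard v2 widget, where the data-sitekey value and the form handler path are placeholders you'd swap for your own:
<!-- Minimal sketch of the new reCAPTCHA widget; sitekey and form action are placeholders -->
<script src="https://www.google.com/recaptcha/api.js" async defer></script>
<form method="post" action="/your-form-handler">
  <div class="g-recaptcha" data-sitekey="YOUR_SITE_KEY_HERE"></div>
  <input type="submit" value="Submit">
</form>
The server side still has to verify the response token against Google's siteverify endpoint, which is where those PHP snippets come in. The documentation also allows rendering the widget explicitly via JavaScript, which is what I mean by firing it only when you need to.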
I've also rethought the whole carpenter/mason career path. After much searches on the Yahoos, I think they may require me to go outside. That just isn't going to work.
-
RE: Pages are Indexed but not Cached by Google. Why?
Rest assured that I don't scrape/hammer so hard that it would knock your site down for a period. I often throttle it back to one thread and two URIs per second. If I forget to configure it, the default is five threads at two URIs per second. So yeah, maybe a bit of the Moz effect.
Chrome Incognito Settings:
Just the typical/vanilla/default incognito settings. It should accept cookies, but they generally wouldn't persist after the session ends.
I didn't receive a message regarding cookies prior to the block notification.
On a side note, I don't allow plugins/extensions while using incognito.
Fun w/ Screaming Frog:
It's hard to say if the instance 8.5 hours later was my instance of Screaming Frog. The IP address would probably tell you the traffic came out of San Antonio, if it was mine. I didn't record the IP at the time, but I remember that much about it. Otherwise it's back in the pool.
Normally Screaming Frog would display notifications, but in this instance the connection just timed out for requested URLs. It didn't appear to be a connectivity issue on my end, so... yeah...
Fun w/ Scraping and/or Spoofing:
Screaming Frog will normally crawl CSS and JS links in source code, so I found it a little odd that it didn't here.
I also ran the domain through the Google Page Speed tool for giggles, since it would be traffic from Googlebot. It failed to fetch the resources necessary to run the test. Cached versions of pages seemed to render fine, with the exception of broken images in some cases. I think that may have something to do with the lazy load script in indexinit.js, but I didn't do much more than read the code comments there.
In regard to the settings for the crawler, I had it set to allow cookies. The user agent was Googlebot, but it wouldn't have come from the typical IPs. Basically I was just trying to get around the user agent and cookie problem with an IP that hadn't been blocked. You know, quick - dirty - and likely stupid.
Fun w/ Meta Robots Directives:
A few of the pages that had noindex directives appeared to lack genuine content, in line with the purpose of the site. So I left that avenue alone and figured it was intentional. The noarchive directive should prevent a cache link. I was just wondering if one or more somehow made it into the mix, for added zest. Apparently not.
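For anyone following along, the tag I'm talking about is a one-liner in the page head; something like this keeps the Cached link out of the results while still allowing the page to be indexed:
<!-- Prevents the cached-copy link, but the page can still be indexed and ranked -->
<meta name="robots" content="noarchive">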
While I'm running off in an almost totally unrelated direction, I thought this was interesting. Apparently Bingbot can be cheeky at times.
Fun w/ The OP:
It looks like Ryan had your answer, and now you have an entirely new potential problem which is interesting. I think I'm just going to take up masonry and carpentry. Feel free to come along if you're interested.
-
RE: Pages are Indexed but not Cached by Google. Why?
No worries, I'm not frustrated at all.
I usually take my first couple passes at a site in Chrome Incognito. I had sent a request via Screaming Frog. I didn't spoof the user agent, or set it to allow cookies. So that may have been 'suspicious' enough from one IP in a short amount of time. You can easily find the Screaming Frog user agent in your logs.
Every once in a while I'll manage to be incorrect about something I should have known. The robots.txt file isn't necessarily improperly configured. It's just not how I would have handled it. Googlebot, at least, would ignore the blanket directive, since its own user-agent group doesn't have any path specified. A bad bot doesn't necessarily obey robots.txt directives, so I would only disallow all user agents from the few files and directories I don't want crawled by legit bots. I would then block any bad bots at the server level.
But for some reason I had it in my head that robots.txt worked something like a filter, where the scary wildcard and slash trump previous instructions. So, I was wrong about that - and now I finally deserve my ice cream. How I went this long without knowing otherwise is beyond me. At least a couple productive things came out of it... which is why I'm here.
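To spell out what I got wrong, a stripped-down version of the situation looks something like this. Googlebot matches its own group and stops there - an empty Disallow means "crawl everything" - while bots with no group of their own fall back to the catch-all and get the blanket block:
# Googlebot obeys only its own group; an empty Disallow allows everything
User-agent: googlebot
Disallow:

# Bots without a more specific group above fall back to this one and are blocked entirely
User-agent: *
Disallow: /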
So while I'm totally screwing up, I figured I would ask when the page was first published/submitted to search engines. So, when did that happen?
Since I'm a glutton for punishment, I also grabbed another IP and proceeded to spoof Googlebot. Even though my crawler managed to scrape meta data from 60+ pages before the IP was blocked, it never managed to crawl the CSS or JavaScript. That's a little odd to me.
I also noticed some noindex meta tags, which isn't terrible, but could a noarchive directive have made it into the head of one or more pages? Just thought about that after the fact. Anyway, I think it's time to go back to sleep.
-
RE: Pages are Indexed but not Cached by Google. Why?
For starters, the robots.txt file is blocking all search engine bots. Secondly, I was just taking a look at the live site and I received a message that stated something like; "This IP has been blocked for today due to activity similar to bots." I had only visited two or three pages and the cached home page.
Suffice it to say, you need to remove the User-agent: * Disallow: / directive from robots.txt and find a better way to handle potentially malicious bots. Otherwise, you're going to have a bad time.
My guess is the robots.txt file was pushed from dev to production and no one edited it. As for the IP blocking script, I'm Paul and that's between y'all. But either fix it or remove it. You don't necessarily want blank/useless robots.txt directives either. Only block those files and directories you need to block.
Best of luck.
Here's your current robots.txt entries:
User-agent: googlebot
Disallow:
User-agent: bingbot
Disallow:
User-agent: rogerbot
Disallow:
User-agent: sitelock
Disallow:
User-agent: Yahoo!
Disallow:
User-agent: msnbot
Disallow:
User-agent: Facebook
Disallow:
User-agent: hubspot
Disallow:
User-agent: metatagrobot
Disallow:
User-agent: *
Disallow: /
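If it helps, a leaner robots.txt along the lines I described would look something like this - the /example-private-dir/ path is purely a placeholder for whatever actually needs to stay out of the crawl, and if nothing does, a single group with an empty Disallow is all you need:
# Hypothetical trimmed-down robots.txt: one catch-all group, blocking only what needs blocking
User-agent: *
Disallow: /example-private-dir/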
-
RE: Multiple websites for different service areas/business functions?
Are we talking ACME Haberdashery and ACME Cobblers, or is this more of an ACME Plumbing and ACME Drain Cleaning situation? I gather that we're talking about local businesses, which comes with its own bit of fun. I'm just trying to gauge whether the difference in service offerings merits the effort.
-
RE: W3C Validation: How Important is This to Ranking
Seconding EGOL's statement, for the most part.
Years ago, The Matt Cutts stated W3C valid code wasn't a ranking factor. There's been a bit of debate over the years, but there still isn't much evidence to support W3C validation itself as a ranking factor. So it's something you probably can put on the back burner for more pressing concerns.
Honestly, sometimes errors are flagged simply because a comment or two are a little wonky. But that won't really inhibit how competitive a site is. If the site has 'quite a few' errors and warnings, that could potentially decrease site speed. Site speed is a ranking factor.
I suppose my best answer is; "No, it's not a ranking factor itself. Though there's some potential for poor coding to harm something that is a ranking factor."
-
RE: Why I'm I ranking so low on Google Maps
You're welcome.
In regard to Schema, you'll probably be ahead of most contractors in the Montgomery area in adoption. It's been around for a few years, and all major search engines endorse its usage. It makes their job easier, so there are some perks.
You can go nuts with Schema markup. Fax, hours of business, logo, reviews and your second cousin's brother... well almost.
You will need to edit source code to implement the markup, though. You can get away with copying and pasting my first example (though I think this editor trimmed off the word 'Map'), once you get there with the Weebly WYSIWYG.
This is more of a 'nice to have' in regard to the site's blog; maybe add a little bit of text describing what's happening in the images. Sites get found in ways we never targeted. Mixing up the media a bit helps a lot.
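Since the editor keeps eating the markup in my examples, here's a rough sketch of the sort of thing I mean. I'm assuming schema.org LocalBusiness microdata (the surviving itemprop="maps" attribute in the post below suggests that's what was there), filled out with the NAP details from the filled-out example; paste your actual map link into the empty href:
<div itemscope itemtype="http://schema.org/LocalBusiness">
  <span itemprop="name">Guyette Roofing and Construction</span>
  <div itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
    <span itemprop="streetAddress">1849 Upper Wetumpka Rd</span>
    <span itemprop="addressLocality">Montgomery</span>,
    <span itemprop="addressRegion">AL</span>
    <span itemprop="postalCode">36107</span>
  </div>
  Phone: <span itemprop="telephone">334-279-8326</span>
  <a href="" itemprop="maps">URL of Map</a>
</div>
Blank out the values between the tags and you have the blank-ish version, should you go with a different name.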
-
RE: Why I'm I ranking so low on Google Maps
Dang it, the WYSIWYG stripped out the code. That feature is wonky... so... here goes....
Example: Filled Out
Guyette Roofing and Construction
1849 Upper Wetumpka Rd
Montgomery,
AL
36107
Phone: 334-279-8326
URL of Map
Example: Blankish
,
Phone:
<a href="" itemprop="maps">URL of Map</a>
-
RE: Why I'm I ranking so low on Google Maps
First, in regard to 'After' on http://www.guyetteroofing.com/blog/montgomery-roof-115: that weird little split looks a lot better than the crazy cobbled pseudo-valley they had going on. I've done some roofing in the past, as a home owner and a starving student (local job boards - between 15 credit hours - it helps if you can do construction). That job was a big improvement. I would imagine the ridge vent will add a bit of life to the job and make summers a little more bearable.
I've worked with quite a few commercial and residential contractors in the DFW area. There was a common theme that I noticed that I like to call 'Contractor's Syndrome'. Usually I would run into 'Name Roofing and Construction', 'Name Construction', 'Name Contracting', 'Name Contractors' and a few other variants. If the business had been around for more than a few years, the NAP cleanup was usually pretty involved.
I think this is the case here. There are a lot of citations for Guyettes Contracting LLC, including the BBB listing. All in all, I picked up Guyette's Roofing and Construction, Guyette Roofing and Guyette's Contracting - the last being the most prominent. So it's safe to say there are actually a lot of NAP inconsistencies happening.
There are a lot of great local citations for Guyettes Contracting, so if I had to do it myself and run a business - I would probably err towards using that. The site seems to be doing okayish in organic for three months old. So just make sure that you're properly categorized in your local listings.
I noticed that you have another domain, which is owned by Hibu. If it's not doing anything for you, shut it down and ask them to transfer the domain to you. I've seen domain transfer requests go both ways with Hibu, but I wasn't handling the admin stuff at those times.
As an on-site consideration, I would recommend using Schema markup on at least your contact page. I noticed you're using Weebly, so I'm uncertain of your level of skill with site editing. I'll post a couple of snippets after this, one filled out with Guyette's Roofing - and one that's blank-ish. That way you'll have an example, should you go with a different name.
First Example: Filled Out
Just note that Schema markup isn't cruise control for local/organic rankings. It's just a nifty way to spoon feed search engines and possibly get some nice snippets. Hopefully that will help some.