Soft 404s from pages blocked by robots.txt -- cause for concern?
-
We're seeing soft 404 errors appear in Google Webmaster Tools for pages that are blocked by robots.txt (our search result pages).
Should we be concerned? Is there anything we can do about this?
-
Me too. It was that video that helped to clear things up for me. Then I could see when to use robots.txt vs the noindex meta tag. It has made a big difference in how I manage sites that have large amounts of content that can be sorted in a huge number of ways.
-
Good stuff. I was always under the impression they still crawled them (otherwise, how would they know if the block was removed?).
-
Take a look at
http://www.youtube.com/watch?v=KBdEwpRQRD0
to see what I am talking about.
Robots.txt does prevent crawling according to Matt Cutts.
-
Robots.txt prevents indexation, not crawling. The good news is that Googlebot stops crawling 404s.
-
Just a couple of under the hood things to check.
-
Are you sure your robots.txt is set up correctly? Check in GWT to see that Google is reading it.
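For a quick local sanity check before looking in GWT, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the example.com URLs, and the `is_blocked` helper are all made up for illustration; paste in the file your site actually serves. Note that Python's parser does simple prefix matching and is not guaranteed to match Googlebot's behavior in every case (wildcards, for instance), so treat it as a rough check only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; substitute the file your site actually serves.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""


def is_blocked(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if these robots.txt rules disallow user_agent from fetching url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical URLs: one search-results page, one ordinary page.
    for url in (
        "http://www.example.com/search?q=widgets",
        "http://www.example.com/products/widget",
    ):
        print(url, "->", "blocked" if is_blocked(ROBOTS_TXT, url) else "allowed")
```

If a URL you expect to be blocked comes back as allowed, check the path prefix and letter case in the Disallow line first, since the matching is case-sensitive.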
-
This may be a timing issue. Errors take 30-60 days to drop out (from what I have seen), so did the pages show up as soft 404s before you added them to robots.txt?
If that was the case, this may be a sequence issue. If Google finds a soft 404 (or some other error) and then comes back to spider the page but cannot crawl it because of robots.txt, it does not know the current status of the page, so it may just keep the last status it found.
-
I tend to see soft 404s for pages that have a 301 redirect with a many-to-one association. In other words, you have a bunch of pages that are 301ing to a single page. You may want to consider changing where some of the 301s redirect so that they go to a specific page rather than an index page.
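If you keep your redirects in a list or spreadsheet, a rough way to spot the many-to-one pattern is to group them by target. This is only a sketch: the redirect map, the `many_to_one_targets` helper, and the threshold of 3 are all invented for illustration, and nothing here reflects a documented Google cutoff.

```python
from collections import defaultdict

# Hypothetical redirect map: old URL -> 301 target.
REDIRECTS = {
    "/old/red-widget": "/products",
    "/old/blue-widget": "/products",
    "/old/green-widget": "/products",
    "/old/about-us": "/about",
}


def many_to_one_targets(redirects, threshold=3):
    """Return targets that receive at least `threshold` redirects, with their sources."""
    sources_by_target = defaultdict(list)
    for source, target in redirects.items():
        sources_by_target[target].append(source)
    # Keep only the targets that many old URLs point at.
    return {
        target: sources
        for target, sources in sources_by_target.items()
        if len(sources) >= threshold
    }


if __name__ == "__main__":
    for target, sources in many_to_one_targets(REDIRECTS).items():
        print(f"{len(sources)} URLs redirect to {target}: {sources}")
```

Targets that turn up here are the ones worth reviewing to see whether each old URL has a more specific page it could point to instead.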
-
If you have pages in robots.txt because you do not want them in Google, here is what I would do: show a 200 on the page, then put noindex, nofollow in the meta tags.
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=93710
"When we see the noindex meta tag on a page, Google will completely drop the page from our search results, even if other pages link to it"
Let Google spider the page so that it can see the 200 code, and you get rid of the soft 404 errors. Then toss in the noindex, nofollow meta tags to have the page removed from the Google index. It sounds backwards that you have to let Google spider a page to get it removed, but it works if you walk through the logic.
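To confirm the tag is actually in the HTML your server sends, something like the following sketch can help. It uses only Python's standard `html.parser`; the sample page, the `RobotsMetaParser` class, and the `robots_directives` helper are made up for illustration, and it only looks at the meta tag, not at an X-Robots-Tag HTTP header.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for token in attr.get("content", "").split(","):
                token = token.strip().lower()
                if token:
                    self.directives.add(token)


def robots_directives(html: str) -> set:
    """Return the set of robots meta directives found in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical search-results page: served with a 200 and not blocked in robots.txt.
SAMPLE_PAGE = """<html><head>
<title>Search results</title>
<meta name="robots" content="noindex, nofollow">
</head><body>...</body></html>"""

if __name__ == "__main__":
    found = robots_directives(SAMPLE_PAGE)
    print("noindex present:", "noindex" in found)
    print("nofollow present:", "nofollow" in found)
```

Remember that the page must stay crawlable for this to matter: if robots.txt still blocks it, Googlebot never fetches the HTML and never sees the tag.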
Good luck!