Soft 404s from pages blocked by robots.txt -- cause for concern?
-
We're seeing soft 404 errors in Google Webmaster Tools for pages that are blocked by robots.txt (our search result pages).
Should we be concerned? Is there anything we can do about this?
-
Me too. It was that video that helped to clear things up for me. Then I could see when to use robots.txt vs the noindex meta tag. It has made a big difference in how I manage sites that have large amounts of content that can be sorted in a huge number of ways.
-
Good stuff. I was always under the impression they still crawled them (otherwise, how would you know if the block was removed?).
-
Take a look at
http://www.youtube.com/watch?v=KBdEwpRQRD0
to see what I am talking about.
Robots.txt does prevent crawling according to Matt Cutts.
-
Robots.txt prevents indexation, not crawling. The good news is that Googlebot stops crawling 404s.
-
Just a couple of under the hood things to check.
-
Are you sure your robots.txt is set up correctly? Check in GWT to see that Google is reading it.
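If you want to double-check the rules outside of GWT, Python's standard urllib.robotparser can tell you whether a given URL is disallowed for a given user agent. This is only a rough sketch: the rules and the example.com URLs below are made up for illustration, and this parser may not match Googlebot's behavior in every edge case (wildcard patterns, for instance).

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; substitute your own file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""


def is_blocked(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if the robots.txt rules disallow `agent` from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.example.com/search?q=shoes",
        "http://www.example.com/products/shoes",
    ):
        print(url, "blocked" if is_blocked(ROBOTS_TXT, url) else "allowed")
```

If the search result URLs come back as allowed here, the block is probably not doing what you think it is.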
-
This may be a timing issue. Errors take 30-60 days to drop out (from what I have seen), so did the pages show a soft 404 first, and then you added them to robots.txt?
If that was the case, this may be a sequence issue. If Google finds a soft 404 (or some other error) and then comes back to spider the page but cannot crawl it because of robots.txt, it does not know the current status of the page, so it may just keep the last status it found.
-
I tend to see soft 404s for pages that have a 301 redirect with a many-to-one association. In other words, you have a bunch of pages that all 301 to a single page. You may want to consider changing where some of the 301s point so that they go to a specific page rather than an index page.
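To find those many-to-one redirects, you could count how many old URLs point at each target. A minimal sketch, assuming you can export your 301s as a simple old-path to new-path mapping; the paths and the threshold of 3 are hypothetical, not anything Google documents.

```python
from collections import Counter

# Hypothetical redirect map: old path -> 301 target.
REDIRECTS = {
    "/old/red-shoes": "/",
    "/old/blue-shoes": "/",
    "/old/green-shoes": "/",
    "/old/boots": "/boots",
}


def many_to_one_targets(redirects: dict, threshold: int = 3) -> dict:
    """Return targets that receive at least `threshold` redirects, with counts."""
    counts = Counter(redirects.values())
    return {target: n for target, n in counts.items() if n >= threshold}


if __name__ == "__main__":
    for target, n in many_to_one_targets(REDIRECTS).items():
        print(f"{n} pages redirect to {target}")
```

Any target that shows up here is a candidate for splitting: send each old page to its closest specific equivalent instead.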
-
If you have pages in robots.txt that you do not want in Google, here is what I would do: serve a 200 on those pages, but put noindex, nofollow in the meta tags.
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=93710
"When we see the noindex meta tag on a page, Google will completely drop the page from our search results, even if other pages link to it"
Let Google spider the page so that it can see the 200 code, which gets rid of the soft 404 errors. Then add the noindex, nofollow meta tags to have the page removed from the Google index. It sounds backwards that you have to let Google spider a page to get it removed, but it works if you walk through the logic.
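As a quick sanity check on that setup, here is a small sketch that confirms a page returns a 200 and carries a robots meta tag with noindex. The sample HTML and the function name are my own for illustration; in practice you would feed in the status code and body you actually get back from your server.

```python
from html.parser import HTMLParser

# Hypothetical search results page with the robots meta tag in place.
SEARCH_PAGE = """<html><head>
<meta name="robots" content="noindex, nofollow">
<title>Search results</title>
</head><body>...</body></html>"""


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def serves_noindex(status_code: int, html: str) -> bool:
    """True if the response is a 200 and its robots meta tag includes noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return status_code == 200 and "noindex" in parser.directives


if __name__ == "__main__":
    print(serves_noindex(200, SEARCH_PAGE))
```

Remember that the tag only helps if the page is not blocked in robots.txt, since Google has to fetch the page to see it.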
Good luck!
-