800,000 pages blocked by robots...
-
We made some mods to our robots.txt file and added many PHP and HTML pages that should not be indexed.
Well, I'm not sure what happened, or whether there was some type of dynamic conflict between our CMS and one of these pages, but a few weeks later we checked Webmaster Tools and, to our great surprise and dismay, the number of pages blocked by robots.txt was up to about 800,000 out of the 900,000 or so we have indexed.
1. So, first question is, has anyone experienced this before? I removed the files from robots.txt and the number of blocked files has still been climbing. Changed the robots.txt file on the 27th. It is the 29th and the new robots.txt file has been downloaded, but the blocked pages count has been rising in spite of it.
2. I understand that even if a page is blocked by robots.txt, it still shows up in the index, but does anyone know how being blocked affects the ranking? I.e., while it might still show up even though it has been blocked, will Google show it at a lower rank because it was blocked by robots.txt?
Our current robots.txt just says:
User-agent: *
Disallow:
Sitemap: oursitemap
Any thoughts?
Thanks!
Craig
-
Hey Matt,
Thanks for taking the time to answer!
Well, the good news is, this caused us to find some issues with our sitemaps that we have now fixed and we might not have found them if this hadn't happened.
According to wmtools, however, we are still at 937,000 pages blocked... I don't know if they are actually blocked or not... Now that we have re-submitted our sitemap, hopefully we will start to see this change soon.
Thanks for the load time info. Yeah, we are aware that we could speed things up. We are always trying to do that more and more.
Hopefully we will start seeing that number go down very soon...
Thanks!
Craig
-
Hi Craig
Sorry for taking a while to come back to you, but I have been very busy. However, I have a couple of questions:
When you first noticed the blocked pages had you made any changes to the site at that time and if so what were they?
Have you done anything that could have slowed your site down? Running your homepage through a page load speed test, I noticed that it took over 4 seconds, which isn't very quick.
I once had an issue where lots of pages were de-indexed after we updated our site's template and the load time increased drastically. Even if this isn't the cause of your issue, optimizing your load time will help increase the speed at which your site is re-crawled. Google will be able to get through more pages in the time it allocates to crawl your site on each visit if the pages are smaller and load quicker, hence speeding up your recovery.
There are lots of tools and information on optimizing load times - here is just one - http://www.webpagetest.org
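If you want a rough command-line check as well, here is a minimal sketch - assuming Python with the requests library installed, and a placeholder URL - that times just the server response rather than the full in-browser load that webpagetest measures:

import requests

# Placeholder URL - swap in your own homepage.
URL = "https://www.example.com/"

# response.elapsed is the time between sending the request and the
# arrival of the response, so this approximates server response time,
# not full page render time in a browser.
response = requests.get(URL, timeout=30)
print(f"Status: {response.status_code}")
print(f"Server response time: {response.elapsed.total_seconds():.2f} s")
print(f"HTML size: {len(response.content) / 1024:.1f} KB")

If that number alone is already a large chunk of your 4 seconds, it points at the server side rather than front-end assets.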
-
Hey Matt,
Just sent you a PM with our site details.
Yeah, we are on SEOMoz, but nothing standing out there.
We are up to 941,364 pages blocked today. I thought I saw that it had gone down a tiny bit yesterday, but was mistaken.
Thanks for taking a look!
Craig
-
Hi Craig - you are right that the directive without a slash means allow everything. I was just trying to figure out how you could have caused this issue, because Google doesn't appear to be following your directive to crawl everything, hence my question about the layout. Have you tested your robots.txt in Google Webmaster Tools?
What is the location of your robots.txt?
What does your index status say in Google Webmaster Tools?
You can also just create an empty robots.txt file, which will allow all as well, or use:
User-agent: *
Allow: /
I take it that you have this website set up as a campaign in SEOMoz - has this identified any relevant issues in the latest crawl?
Would you share your web address with us or even private message me with it so I can have a look for clues as this is very interesting!
-
Hi Matt,
Sorry for the confusion, I should have pasted that text using plain text so it wouldn't be on the same line.
I edited it as seen above. The user agent and disallow are on separate lines.
Today we are up to 940,000 blocked URLs in webmaster tools.
The reason I didn't delete robots.txt is that I read a suggestion that if you delete it, Google will think there was a problem accessing it and continue relying on the former version for a period of time. Not sure how much truth there is in that, but it seemed to make enough sense not to delete the file and just modify it correctly instead.
Are you saying that my command above is disallowing all? From the research I have done, you have to have a slash at the end to disallow all, i.e. Disallow: /. Having Disallow: with nothing after it is supposed to allow all, which is the goal.
From robots.txt.org:
To allow all robots complete access
User-agent: *
Disallow:
Strangely, we haven't noticed an enormous traffic drop. However, this happened right at the time that we fixed some other issues that should have caused a significant improvement, so it could just be that no positive impact has been felt and things just remained the same.
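In case it helps, here's a quick local check of that Disallow: behaviour - a minimal sketch using Python's built-in urllib.robotparser, with a made-up example URL, confirming that an empty Disallow allows everything while Disallow: / blocks everything:

from urllib import robotparser

def can_fetch(rules, url, agent="Googlebot"):
    # Parse a robots.txt supplied as a list of lines and test one URL.
    parser = robotparser.RobotFileParser()
    parser.parse(rules)
    return parser.can_fetch(agent, url)

test_url = "http://www.example.com/some-page.html"  # made-up URL

# Empty Disallow value: everything is allowed.
print(can_fetch(["User-agent: *", "Disallow:"], test_url))    # True

# Disallow with a slash: everything is blocked.
print(can_fetch(["User-agent: *", "Disallow: /"], test_url))  # False

That's only the standard parser, of course, not Google's crawler, so it doesn't rule out something odd on their end.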
Ultimately, the fact that the blocked-page count keeps rising is worrisome, or suggests that there is a bug in Google's system.
Thanks!
Craig
-
I have experienced something similar: after a site redesign, the test version was put live with robots.txt disallowing all, and my site was de-indexed quickly. When you block pages with robots.txt their page content won't be indexed, so they won't appear in the search results. Google may still index URLs that are disallowed if they are linked from another page online, but they will rank lower because the page content is being ignored. Remove your robots.txt above; it appears to be disallowing all, even though that command should allow all, and there is no point in having a robots file that allows everything anyway, since that is the default without one. (To block everything you would usually have Disallow: /.) Then I would resubmit an updated sitemap in Google Webmaster Tools and you should see your pages start to be indexed again. If you don't have a sitemap, you can just wait for Google to re-crawl your site. I would also check your homepage source code to make sure there isn't a robots meta tag accidentally set to noindex, nofollow, as I have seen this happen by accident with a CMS before.
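For that last check, here is a rough sketch - assuming Python with the requests library and a placeholder URL, not tied to any particular CMS - that fetches the homepage and looks for a robots meta tag:

import re
import requests

URL = "https://www.example.com/"  # placeholder - use your own homepage

html = requests.get(URL, timeout=30).text

# Look for <meta name="robots" content="..."> tags. Attribute order can
# vary, so this simple regex is only a rough check, not a full HTML parse.
matches = re.findall(
    r'<meta[^>]+name=["\']robots["\'][^>]*content=["\']([^"\']*)["\']',
    html,
    flags=re.IGNORECASE,
)

if any("noindex" in m.lower() for m in matches):
    print("Warning: robots meta tag with noindex found:", matches)
else:
    print("No noindex robots meta tag found. Tags present:", matches)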
Have a look here for exactly how Google handles robots.txt - http://support.google.com/webmasters/bin/answer.py?hl=en&answer=156449
One quick question - have you laid out your file exactly as above, with the user-agent and disallow directives on the same line? That might be what is causing the issue; I haven't tested it, but the standard is to have them on separate lines.