Accidentally blocked Googlebot for 14 days
-
Today, after noticing a huge drop in organic traffic to the inner pages of my sites, I looked into the code and realized that a bug in the last commit caused the server to show captcha pages to all Googlebot requests since Apr 24.
My site has more than 4,000,000 pages in the index. Before the last code change, Googlebot was exempt from the captcha, so every inner page was crawled and indexed with no problem.
The bug broke the whitelisting mechanism and treated requests from Google's IP addresses the same as those from regular users. As a result, Googlebot crawled the captcha page instead of the real content on thousands of my site's inner pages, which made Google think all my inner pages are identical to each other. Google removed all the inner pages from the SERPs starting on May 5th; before that, many of those inner pages had good rankings.
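For context, the intended behaviour is roughly like the sketch below (simplified Python, not my actual production code; the function names are made up, and it uses Google's recommended reverse/forward DNS verification rather than a raw IP list): verified Googlebot requests skip the captcha, everyone else goes through the normal bot protection.

```python
import socket

def is_verified_googlebot(client_ip):
    """Check that client_ip really belongs to Googlebot: reverse-DNS the IP,
    require a *.googlebot.com / *.google.com hostname, then forward-resolve
    that hostname and confirm it maps back to the same IP."""
    try:
        hostname, _, _ = socket.gethostbyaddr(client_ip)
    except socket.herror:
        return False
    if not hostname.endswith((".googlebot.com", ".google.com")):
        return False
    try:
        forward_ips = socket.gethostbyname_ex(hostname)[2]
    except socket.gaierror:
        return False
    return client_ip in forward_ips

def should_show_captcha(client_ip, user_agent):
    """Verified Googlebot requests are exempt from the captcha; everyone else
    falls through to the regular captcha decision."""
    if "Googlebot" in user_agent and is_verified_googlebot(client_ip):
        return False
    return True  # placeholder for the normal per-visitor captcha/rate-limit logic
```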
At first I thought this was a manual or algorithmic penalty, but:
1. I did not receive a warning message in GWT.
2. The ranking for the main URL is still good. I tried "Fetch as Google" in GWT and realized that all Googlebot saw for my inner pages over the past 14 days was the same captcha page.
Now, I have fixed the bug and updated the production site. I just wanted to ask:
1. How long will it take for Google to drop the "duplicate content" flag on my inner pages and show them in the SERPs again? In my experience Googlebot revisits URLs quite often, but once a URL is flagged as containing similar content, it can be difficult to recover. Is that correct?
2. Besides waiting for Google to update its index, what else can I do right now?
Thanks in advance for your answers.
-
Thanks for the info. My site's current crawl rate is about 350,000 pages per day, so it will take 10-20 days to crawl the entire site.
Most of the organic traffic goes to about 10,000 URLs; the others are pagination URLs, etc. Right now the first inner page of each term, which gets all that traffic, has disappeared from the results of the inurl: command.
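That recrawl estimate is just indexed pages divided by the daily crawl rate; a quick sanity check with the numbers quoted above:

```python
indexed_pages = 4_000_000   # pages reported in Google's index
crawl_rate = 350_000        # pages Googlebot fetches per day (from GWT crawl stats)

days_to_recrawl = indexed_pages / crawl_rate
print(f"~{days_to_recrawl:.1f} days for a full recrawl")  # ~11.4 days
```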
-
One of my competitors made this type of error, and we figured it out right away when their site dropped from the SERPs. It took them a couple of weeks to figure it out and make the change. We were hoping they'd never figure it out so we could rake in lots of dough. When they fixed it, they were back in the SERPs at full strength within a couple of days... but they had 40 indexed pages instead of 4,000,000.
I think you will recover well, but it might take a while if you don't have a lot of deep links.
Good luck.
-
Pretty much all you can do is wait for Google to recrawl your entire site. You can try re-submitting your site in Webmaster Tools (Health -> Fetch as Google). Getting links from other sites will help speed up the crawling, and links from social sites like Twitter and Google+ can help with crawling as well.
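If you maintain an XML sitemap, re-submitting it can also nudge recrawling along. A minimal sketch of pinging Google with a sitemap (the sitemap URL here is hypothetical; swap in your real one, and this assumes Google's sitemap ping endpoint is still available to you):

```python
import urllib.parse
import urllib.request

# Hypothetical sitemap URL -- replace with the real location of your sitemap.
SITEMAP_URL = "https://www.example.com/sitemap.xml"

# Google's sitemap ping endpoint; an HTTP 200 only confirms the ping was
# received, it does not guarantee an immediate recrawl of every URL.
ping_url = "https://www.google.com/ping?sitemap=" + urllib.parse.quote(SITEMAP_URL, safe="")

with urllib.request.urlopen(ping_url) as response:
    print("Ping status:", response.status)
```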