Accidentally blocked Googlebot for 14 days
-
Today, after noticing a huge drop in organic traffic to the inner pages of my site, I looked into the code and realized that a bug in the last commit had caused the server to show captcha pages to all Googlebot requests since Apr 24.
My site has more than 4,000,000 pages in the index. Before the last code change, Googlebot was exempt from the captcha, so every inner page was crawled and indexed without any problem.
The bug broke the whitelisting mechanism, so requests from Google's IP addresses were treated the same as those from regular users. As a result, Googlebot crawled the captcha page whenever it visited any of my site's thousands of inner pages, which made Google think all my inner pages were identical to each other. Starting May 5th, Google removed all the inner pages from the SERPs; before that, many of them had good rankings.
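For illustration, here is a minimal sketch of one way such a whitelist check could work. It is not the code from my site. It follows the reverse-then-forward DNS check that Google documents for verifying Googlebot, rather than a hard-coded IP list. The function names, the `suspicious` flag, and the injectable resolver parameters are hypothetical, and the forward lookup shown here only handles IPv4.

```python
import socket
from typing import Callable, List

# Hostname suffixes that Googlebot reverse DNS records are expected to end with.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def reverse_lookup(ip: str) -> str:
    """Return the PTR hostname for an IP address (network call)."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(hostname: str) -> List[str]:
    """Return the IPv4 addresses a hostname resolves to (network call)."""
    return socket.gethostbyname_ex(hostname)[2]


def is_verified_googlebot(
    ip: str,
    reverse: Callable[[str], str] = reverse_lookup,
    forward: Callable[[str], List[str]] = forward_lookup,
) -> bool:
    """Reverse-resolve the IP, check the domain, then forward-confirm it."""
    try:
        hostname = reverse(ip).rstrip(".").lower()
        if not hostname.endswith(GOOGLE_SUFFIXES):
            return False
        # The hostname must resolve back to the same IP, otherwise the
        # PTR record could have been spoofed.
        return ip in forward(hostname)
    except OSError:
        # DNS failure: treat the request as a regular visitor.
        return False


def should_show_captcha(ip: str, suspicious: bool, **resolvers) -> bool:
    """Hypothetical gate: never challenge verified Googlebot traffic."""
    return suspicious and not is_verified_googlebot(ip, **resolvers)
```

In production the DNS results would need caching, since doing two lookups on every request is slow. The resolvers are passed in as parameters so the check can be exercised in a test without touching the network, which is the kind of test that would have caught a regression like mine before it shipped.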
I initially thought the drop was a manual or algorithmic penalty, but:
1. I did not receive a warning message in GWT.
2. The ranking for the main URL is still good.
I then tried "Fetch as Google" in GWT and realized that all Googlebot had seen over the past 14 days was the same captcha page for every inner page.
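The same symptom can be checked in bulk rather than one URL at a time. The sketch below assumes the response bodies have already been collected as a crawler would see them (how they are fetched is left out), and it groups URLs whose bodies are exactly identical. The function name and the sample paths are hypothetical. A captcha page that embeds a per-request token would not hash identically, so this only catches exact duplicates.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def group_identical_bodies(pages: Dict[str, str]) -> List[List[str]]:
    """Group URLs whose response bodies are byte-for-byte identical."""
    by_digest = defaultdict(list)
    for url, body in pages.items():
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        by_digest[digest].append(url)
    # Only groups with more than one URL indicate duplicated responses.
    return [sorted(urls) for urls in by_digest.values() if len(urls) > 1]


# Hypothetical sample: two inner pages return the same captcha HTML.
sample = {
    "/term/a": "<html>captcha</html>",
    "/term/b": "<html>captcha</html>",
    "/term/c": "<html>real content</html>",
}
print(group_identical_bodies(sample))  # [['/term/a', '/term/b']]
```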
Now, I have fixed the bug and updated the production site. I just wanted to ask:
1. How long will it take for Google to remove the "duplicate content" flag from my inner pages and show them in the SERPs again? In my experience, Googlebot revisits URLs quite often, but once a URL is flagged as "contains similar content", it can be difficult to recover. Is that correct?
2. Besides waiting for Google to update its index, what else can I do right now?
Thanks in advance for your answers.
-
Thanks for the info. My site's current crawl rate is 350,000 pages per day, so it will take 10-20 days to crawl the entire site.
Most of the organic traffic goes to about 10,000 URLs; the rest are pagination URLs and the like. Now the first inner page of each term, which is where all that traffic went, has disappeared from the results of the inurl: command.
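As a rough check on that estimate, the arithmetic can be sketched as follows. It assumes the crawl rate is 350,000 pages per day (my reading of the figure above) and that Googlebot crawls each URL exactly once at a steady rate. Real crawling revisits some URLs and skips others, so this is a lower bound at best. The function name is hypothetical.

```python
import math


def days_to_recrawl(total_pages: int, pages_per_day: int) -> int:
    """Whole days needed to crawl every page once at a steady rate."""
    if pages_per_day <= 0:
        raise ValueError("pages_per_day must be positive")
    return math.ceil(total_pages / pages_per_day)


print(days_to_recrawl(4_000_000, 350_000))  # full index: 12
print(days_to_recrawl(10_000, 350_000))     # high-traffic URLs only: 1
```

The second figure is why the 10,000 high-traffic URLs could come back much sooner than the whole index, if Googlebot happens to reach them early.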
-
One of my competitors made this type of error, and we figured it out right away when their site dropped from the SERPs. It took them a couple of weeks to figure it out and make the change. We were hoping they would never figure it out so we could rake in lots of dough. When they fixed it, they were back in the SERPs at full strength within a couple of days... but they had 40 indexed pages instead of 4,000,000.
I think you will recover well, but it might take a while if you don't have a lot of deep links.
Good luck.
-
Pretty much all you can do is wait for Google to recrawl your entire site. You can try re-submitting your site in Webmaster Tools (Health -> Fetch As Google). Getting links from other sites will help speed up the crawling as well. Links from social sites like Twitter/Google+ can help with crawling also.