Client accidentally blocked entire site with robots.txt for a week
-
Our client was having a design firm do some website development work for them. The work was done on a staging server that was blocked with a robots.txt to prevent duplicate content issues.
Unfortunately, when the design firm made the changes live, they also moved over the robots.txt file, which blocked the good, live site from search for a full week. We saw the error (!) as soon as the latest crawl report came in.
The error has been corrected, but...
Does anyone have any experience with a snafu like this? Any idea how long it will take for the damage to be reversed and the site to get back in the good graces of the search engines? Are there any steps we should take in the meantime that would help to rectify the situation more quickly?
Thanks for all of your help.
-
Here's a YouMoz post, later promoted to the main blog, about what someone else did in this situation. It may help.
http://www.seomoz.org/blog/accidental-noindexation-recovery-strategy-amp-results
A couple of preventative steps would have been to make the robots.txt file on the live site read-only so it couldn't have been as easily overwritten, and to use a free service like Pole Position's Code Monitor (https://polepositionweb.com/roi/codemonitor/index.php) to monitor the contents of your robots.txt file once a day and email you if there are changes. I'd also monitor your dev robots.txt, just to make sure the live site robots.txt doesn't get copied over to dev one day and your dev site gets indexed (I've had that happen!).
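For anyone who would rather script the daily check than rely on a third-party service, here is a minimal sketch in Python. It is not how Code Monitor works; it simply fingerprints the robots.txt contents and reports when they differ from a stored baseline. The function names, the sample file contents, and the idea of running it from a daily scheduled job are all assumptions for illustration, and the email step is left out.

```python
import hashlib
import urllib.request


def fingerprint(text: str) -> str:
    """Return a SHA-256 hex digest of the robots.txt contents."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def has_changed(current_text: str, last_known_hash: str) -> bool:
    """True when the file no longer matches the stored fingerprint."""
    return fingerprint(current_text) != last_known_hash


def fetch_robots(url: str) -> str:
    """Download a robots.txt file; the caller supplies the URL."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Hypothetical contents standing in for the live and staging files.
    live = "User-agent: *\nDisallow:\n"
    staging = "User-agent: *\nDisallow: /\n"

    baseline = fingerprint(live)           # stored after a known-good deploy
    print(has_changed(live, baseline))     # file untouched
    print(has_changed(staging, baseline))  # staging file copied over live
```

In practice you would call fetch_robots() on your own live and dev URLs once a day, compare against the saved baseline, and send yourself a notification whenever has_changed() returns True.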
-
I can't say anything about robots.txt
.... but one of my competitors tossed up a new design with nofollow, noindex tags on every page and their site immediately tanked out of Google.
... it took them a couple of weeks to figure it out, but once they yanked that line of code they were back at the top of the SERPs within 48 hours.
... this was a relatively strong site and I would expect that type of site to recover faster than a PR2 site with little connectivity.
-
Hi, have you tried logging in to Google Webmaster Tools and fetching the URL as Googlebot? This helped me recently with a couple of sites that I had blocked with robots.txt. They were up to date in the SERPs within 2 days.
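Before asking for a recrawl, it may be worth confirming locally that the corrected robots.txt really does let Googlebot in. A minimal sketch using Python's standard urllib.robotparser module; the two sample rule sets and the example.com URL are assumptions for illustration, and this parser handles simple prefix rules only, so treat it as a sanity check rather than a statement of how Google interprets the file.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Parse robots.txt text and report whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical file contents: the staging rules that blocked the site,
# and a corrected file that allows everything.
BLOCKED = "User-agent: *\nDisallow: /\n"
FIXED = "User-agent: *\nDisallow:\n"

print(is_allowed(BLOCKED, "http://www.example.com/"))  # False
print(is_allowed(FIXED, "http://www.example.com/"))    # True
```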
Related Questions
-
Website URL, Robots.txt and Google Search Console (www. vs non www.)
Hi MOZ Community,
I would like to request your kind assistance on domain URLs - www. vs non www. Recently, my team moved to a new website where a 301 redirect has been set up.
Original URL : https://www.example.com.my/ (with www.)
New URL : https://example.com.my/ (without www.)
Our current robots.txt sitemap : https://www.example.com.my/sitemap.xml (with www.)
Our Google Search Console property : https://www.example.com.my/ (with www.)
Questions:
1. How should I standardize these so that Google's crawler can effectively crawl my website?
2. Do I have to change my website URLs back (to the www. version), or do I just need to update my robots.txt?
3. How can I update my Google Search Console property to reflect the change (without www.)? I cannot see the option in the dashboard.
4. Are there any to-dos such as canonicalization needed, or should I wait for Google to automatically detect and change it, especially in the GSC property?
Really appreciate your kind assistance. Thank you,
Badiuzz
Technical SEO | | Badiuzz0 -
Robots.txt - "File does not appear to be valid"
Good afternoon Mozzers! I've got a weird problem with one of the sites I'm dealing with. For some reason, one of the developers changed the robots.txt file to disallow every page on the site - not a wise move! To rectify this, we uploaded the new robots.txt file to the domain's root as per Webmaster Tools' instructions. The live file is: User-agent: * (http://www.savistobathrooms.co.uk/robots.txt) I've submitted the new file in Webmaster Tools and it's pulling it through correctly in the editor. However, Webmaster Tools is not happy with it, for some reason. I've attached an image of the error. Does anyone have any ideas? I'm managing another site with the exact same robots.txt file and there are no issues. Cheers, Lewis
Technical SEO | | PeaSoupDigital0 -
Google is Still Blocking Pages Unblocked 1 Month ago in Robots
I manage a large site with over 200K indexed pages. We recently added a new vertical to the site that was 20K pages. We initially blocked the pages using robots.txt while we were developing and testing. We unblocked the pages 1 month ago, but they are still not indexed. One page will show up in the index with an omitted results link. Upon clicking the link you can see the remaining un-indexed pages. Looking for some suggestions. Thanks.
Technical SEO | | Tyler1230 -
What do you think about my new site?
Hi everyone, I'm looking for a review of my new site www.interlive.it. Could you please let me know what you think about the work that I did on it? I'll be very happy to receive your suggestions. Regards, Mike
Technical SEO | | salvyy0 -
Linking shallow sites to flagship sites
We have hundreds of domains that we are either doing nothing with, or they are very shallow. We do not have the time to build enough quality content on them since they are ancillary to our flagship sites that are already in need of attention and good content. My question is...should we redirect them to the flagship site? If yes, is it ok to do this from root domain to root domain or should we link the root domain to a matching/similar page (gymfranchises.com to http://www.franchisesolutions.com/health_services_franchise_opportunities.cfm)? Or should we do something different altogether? Since we have many to redirect (if this is the route we go), should we redirect gradually?
Technical SEO | | franchisesolutions0 -
Do i have my robots.txt file set up properly
Hi, just doing some SEO on my site and I am not sure if I have my robots file set up correctly. I use Joomla and my website is www.in2town.co.uk. Here is my robots file; does this look correct to you?
User-agent: *
Disallow: /administrator/
Disallow: /cache/
Disallow: /components/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /libraries/
Disallow: /media/
Disallow: /modules/
Disallow: /plugins/
Disallow: /templates/
Disallow: /tmp/
Disallow: /xmlrpc/
Many thanks
Technical SEO | | ClaireH-1848861 -
Too many links on my site
Hi there everybody, I am a total SEO newbie and I am burning with questions. I had my site crawled and found out that it contains too many links. The reason is that it is a site where I constantly write news and articles, and each one of them is a new Joomla item, thus a new link. I actually thought lots of content was good for SEO. How am I supposed to reduce the number of links?
Technical SEO | | polyniki0 -
Site not being Indexed that fast anymore, Is something wrong with this Robots.txt
My WordPress site's robots.txt used to be this:
User-agent: *
Disallow:
Sitemap: http://www.domainame.com/sitemap.xml.gz
I also have All in One SEO installed and, other than posts, tags are also index,follow on my site. My new posts used to appear on Google within seconds of publishing. I changed the robots.txt to the following and now post indexing takes hours. Is there something wrong with this robots.txt?
User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
Disallow: /wp-login.php
Disallow: /wp-login.php
Disallow: /trackback
Disallow: /feed
Disallow: /comments
Disallow: /author
Disallow: /category
Disallow: */trackback
Disallow: */feed
Disallow: */comments
Disallow: /login/
Disallow: /wget/
Disallow: /httpd/
Disallow: /*.php$
Disallow: /?
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
Disallow: /*.gz$
Disallow: /*.wmv$
Disallow: /*.cgi$
Disallow: /*.xhtml$
Disallow: /?
Disallow: /*?
Allow: /wp-content/uploads
User-agent: TechnoratiBot/8.1
Disallow:
# ia_archiver
User-agent: ia_archiver
Disallow: /
# disable duggmirror
User-agent: duggmirror
Disallow: /
# allow google image bot to search all images
User-agent: Googlebot-Image
Disallow: /wp-includes/
Allow: /*
# allow adsense bot on entire site
User-agent: Mediapartners-Google*
Disallow:
Allow: /*
Sitemap: http://www.domainname.com/sitemap.xml.gz
Technical SEO | | ideas1230