Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Temporarily suspend Googlebot without blocking users
-
We'll soon be launching a redesign, on a new platform, migrating millions of pages to new URLs.
How can I tell Google (and other crawlers) to temporarily (a day or two) ignore my site? We're hoping to buy ourselves a small bit of time to verify redirects and live functionality before allowing Google to crawl and index the new architecture.
GWT's recommendation is to 503 all pages - including robots.txt, but that also makes the site invisible to real site visitors, resulting in significant business loss. Bad answer.
I've heard some recommendations to disallow all user agents in robots.txt. Any answer that puts the millions of pages we already have indexed at risk is also a bad answer.
Thanks
-
So it seems like we've gone full circle.
The initial question was, "How can I tell Google (and other crawlers) to temporarily (a day or two) ignore my site? We're hoping to buy ourselves a small bit of time to verify redirects and live functionality before allowing Google to crawl and index the new architecture."
Sounds like the answer is, 'that's not possible'.
-
Putting a noindex/nofollow on an index url will remove it from SERPs, although some ulrs will still show for direct search (using the url itself as a KW) but even then they will appear as clear links without any TItle/Description details.
Using a 301 redirect will remove the old page from index, regardless of noindex/nofollow.
If you are using a noindex/nofollow for the new url - both will not show.
-
Thank you, Ruth!
Can I ask a clarifying question?
If I put a noindex/nofollow on the new urls, wouldn't the result be the same as if I put noindex/nofollow on the indexed urls? There is only one instance of each page - and all of the millions of indexed URLs will be redirecting to new urls.
Here is my assumption: if I put noindex/nofollow on the new urls - a search bot will crawl the old url, follow the redirect to the new url, detect the noindex/nofollow, and then drop the old, indexed url from their index. Is that the wrong assumption?
-
I would use robots.txt to noindex the whole website as well - but just the new pages, not the old ones. Then when you're ready to be crawled, remove the robots.txt entry and Fetch as Googlebot to get re-crawled. You may fall out of the index for a day or two but should quickly be re-indexed.
Another solution would be to use the meta robots tag to individually noindex each page (if there's a way to do that in your CMS, obviously adding them by hand wouldn't be scalable), and then remove. That may increase your chances of getting re-crawled and re-indexed sooner.
-
Thanks for the response, Mark.
It sounds as if you tried this on a few new pages.
I'm talking about millions of existing pages.
Would you robots.txt noindex your entire website? Seems like you'd run a huge risk of being dumped from the index entirely.
-
I recommend robots text noindex, nofollow.
That way people can still see the pages they just aren't indexed in Google yet.
As we developed some new pages on one of our sites we did this and we could still view pages and send folks there that we wanted to see the content for feedback - but no one else knew they were there.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Fetch as Google temporarily lifting a penalty?
Hi, I was wondering if anyone has seen this behaviour before? I haven't! We have around 20 sites and each one has lost all of its rankings (not in index at all) since the medic update apart from specifying a location on the end of a keyword. I set to work trying to identify a common issue on each site, and began by improving speed issues in insights. On one site I realised that after I had improved the speed score and then clicked "fetch as google" the rankings for that site all returned within seconds. I did the same for a different site and exactly the same result. Cue me jumping around the office in delight! The pressure is off, people's jobs are safe, have a cup of tea and relax. Unfortunately this relief only lasted between 6-12 hours and then the rankings go again. To me it seems like what is happening is that the sites are all suffering from some kind of on page penalty which is lifted until the page can be assessed again and when it is the penalty is reapplied. Not one to give up I set about methodically making changes until I found the issue. So far I have completely rewritten a site, reduced over use of keywords, added over 2000 words to homepage. Clicked fetch as google and the site came back - for 6 hours..... So then I gave the site a completely fresh redesign and again clicked fetch as google, and same result. Since doing all that, I have swapped over to https, 301 redirected etc and now the site is completely gone and won't come back after fetching as google. Uh! So before I dig myself even deeper, has anyone any ideas? Thanks.
Technical SEO | | semcheck11 -
Google Search console says 'sitemap is blocked by robots?
Google Search console is telling me "Sitemap contains URLs which are blocked by robots.txt." I don't understand why my sitemap is being blocked? My robots.txt look like this: User-Agent: *
Technical SEO | | Extima-Christian
Disallow: Sitemap: http://www.website.com/sitemap_index.xml It's a WordPress site, with Yoast SEO installed. Is anyone else having this issue with Google Search console? Does anyone know how I can fix this issue?1 -
Robot.txt : How to block a specific file type in several subdirectories ?
Hello everyone ! I need help setting up a robot.txt. I'm trying to block all pdf files in particular directories so I'm using this command. In the example below the line is blocking all .gif in the entire site. Block files of a specific file type (for example, .gif) | Disallow: /*.gif$ 2 questions : Can I use this command to specify one particular directory in which I want to block pdf files ? Will this line be recognized by googlebots ? Disallow: /fileadmin/xxxxxxx/xxx/xxxxxxx/*.pdf$ Then I realized that I would have to write as many lines as many directories there are in which I want to block pdf files. Let's say I want to block pdf files in all these 3 directories /fileadmin/directory1 /fileadmin/directory1/sub1 /fileadmin/directory1/sub1/pdf Is there a pattern-matching rule I could use to blocks access to pdf files in all subdirectories instead of writing 3x the above line for each subdirectory ? For exemple : Disallow: /fileadmin/directory1*/ Many thanks in advance for any insight you may have.
Technical SEO | | LabeliumUSA0 -
Why Can't Googlebot Fetch Its Own Map on Our Site?
I created a custom map using google maps creator and I embedded it on our site. However, when I ran the fetch and render through Search Console, it said it was blocked by our robots.txt file. I read in the Search Console Help section that: 'For resources blocked by robots.txt files that you don't own, reach out to the resource site owners and ask them to unblock those resources to Googlebot." I did not setup our robtos.txt file. However, I can't imagine it would be setup to block google from crawling a map. i will look into that, but before I go messing with it (since I'm not familiar with it) does google automatically block their maps from their own googlebot? Has anyone encountered this before? Here is what the robot.txt file says in Search Console: User-agent: * Allow: /maps/api/js? Allow: /maps/api/js/DirectionsService.Route Allow: /maps/api/js/DistanceMatrixService.GetDistanceMatrix Allow: /maps/api/js/ElevationService.GetElevationForLine Allow: /maps/api/js/GeocodeService.Search Allow: /maps/api/js/KmlOverlayService.GetFeature Allow: /maps/api/js/KmlOverlayService.GetOverlays Allow: /maps/api/js/LayersService.GetFeature Disallow: / Any assistance would be greatly appreciated. Thanks, Ruben
Technical SEO | | KempRugeLawGroup1 -
Google indexing despite robots.txt block
Hi This subdomain has about 4'000 URLs indexed in Google, although it's blocked via robots.txt: https://www.google.com/search?safe=off&q=site%3Awww1.swisscom.ch&oq=site%3Awww1.swisscom.ch This has been the case for almost a year now, and it does not look like Google tends to respect the blocking in http://www1.swisscom.ch/robots.txt Any clues why this is or what I could do to resolve it? Thanks!
Technical SEO | | zeepartner0 -
Can you do a 301 redirect without a hosting account?
Trying to retire domain1 and 301 it to domain2 - just don't want to get stuck having to pay the old hosting provider simply to serve a .htaccess file with the redirect rule.
Technical SEO | | TitanDigital0 -
Urls with or without .html ending
Hello, Can anyone show me some authority info on wheher links are better with or without a .html ending? Thanks is advance
Technical SEO | | sesertin0 -
What is the best method to block a sub-domain, e.g. staging.domain.com/ from getting indexed?
Now that Google considers subdomains as part of the TLD I'm a little leery of testing robots.txt with something like: staging.domain.com
Technical SEO | | fthead9
User-agent: *
Disallow: / in fear it might get the www.domain.com blocked as well. Has anyone had any success using robots.txt to block sub-domains? I know I could add a meta robots tag to the staging.domain.com pages but that would require a lot more work.0