Temporarily suspend Googlebot without blocking users
-
We'll soon be launching a redesign on a new platform, migrating millions of pages to new URLs.
How can I tell Google (and other crawlers) to temporarily (a day or two) ignore my site? We're hoping to buy ourselves a small bit of time to verify redirects and live functionality before allowing Google to crawl and index the new architecture.
GWT's recommendation is to 503 all pages, including robots.txt, but that also makes the site invisible to real visitors, resulting in significant business loss. Bad answer.
I've heard some recommendations to disallow all user agents in robots.txt. Any answer that puts the millions of pages we already have indexed at risk is also a bad answer.
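(For context, what we'd ideally want is something like the sketch below: answer known crawlers with a 503 plus a Retry-After header, which is what Google documents for planned downtime, while humans get normal pages. The Flask app and the user-agent list are illustrative assumptions, not a vetted approach.)

```python
# Rough sketch: serve 503 + Retry-After to known crawlers, normal pages to humans.
# The Flask setup and the user-agent substrings are illustrative assumptions.
from flask import Flask, request

app = Flask(__name__)

CRAWLER_UA_SUBSTRINGS = ("googlebot", "bingbot", "slurp", "baiduspider")

@app.before_request
def pause_crawlers():
    ua = (request.headers.get("User-Agent") or "").lower()
    if any(bot in ua for bot in CRAWLER_UA_SUBSTRINGS):
        # 503 means "temporarily unavailable"; Retry-After hints when to come back
        # (172800 seconds = the "day or two" window described above).
        return ("Temporarily down for maintenance.", 503, {"Retry-After": "172800"})
    # Returning None lets real visitors through to the normal views.

@app.route("/")
def home():
    return "Normal page, unaffected for human visitors."
```

Note that Google treats a long-running 503 as a signal to start dropping pages, so this only fits a short window like the one described.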
Thanks
-
So it seems like we've come full circle.
The initial question was, "How can I tell Google (and other crawlers) to temporarily (a day or two) ignore my site? We're hoping to buy ourselves a small bit of time to verify redirects and live functionality before allowing Google to crawl and index the new architecture."
Sounds like the answer is, 'that's not possible'.
-
Putting a noindex/nofollow on an indexed URL will remove it from the SERPs, although some URLs will still show for a direct search (using the URL itself as the keyword); even then they will appear as bare links without any Title/Description details.
Using a 301 redirect will remove the old page from the index, regardless of noindex/nofollow.
If you also use a noindex/nofollow on the new URL, neither will show.
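To make that combination concrete, here's a rough sketch: the old URL answers with a 301, and the new URL carries the meta robots tag. The /old-page and /new-page routes are hypothetical placeholders, not anything from the thread.

```python
# Rough sketch of the 301 + noindex/nofollow combination described above.
# The /old-page and /new-page routes are hypothetical placeholders.
from flask import Flask, redirect

app = Flask(__name__)

@app.route("/old-page")
def old_page():
    # The 301 marks the move as permanent, so the old URL drops from the index.
    return redirect("/new-page", code=301)

@app.route("/new-page")
def new_page():
    # While this meta robots tag is present, the new URL stays out of the index
    # too, so neither the old nor the new URL will show.
    return ('<html><head><meta name="robots" content="noindex, nofollow"></head>'
            '<body>New page content</body></html>')
```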
-
Thank you, Ruth!
Can I ask a clarifying question?
If I put a noindex/nofollow on the new URLs, wouldn't the result be the same as if I put noindex/nofollow on the indexed URLs? There is only one instance of each page, and all of the millions of indexed URLs will be redirecting to new URLs.
Here is my assumption: if I put noindex/nofollow on the new URLs, a search bot will crawl the old URL, follow the redirect to the new URL, detect the noindex/nofollow, and then drop the old, indexed URL from its index. Is that the wrong assumption?
-
I would use robots.txt to block crawling of the whole website as well (see the snippet below) - but just the new pages, not the old ones. Then when you're ready to be crawled, remove the robots.txt entry and use Fetch as Googlebot to get re-crawled. You may fall out of the index for a day or two but should quickly be re-indexed.
Another solution would be to use the meta robots tag to individually noindex each page (if there's a way to do that in your CMS; obviously adding them by hand wouldn't be scalable), and then remove it when you're ready. That may increase your chances of getting re-crawled and re-indexed sooner.
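For reference, the catch-all robots.txt block mentioned above is just two lines:

```
User-agent: *
Disallow: /
```

Strictly speaking, Disallow stops crawling rather than de-indexing; pages already indexed can linger as URL-only entries until they're re-crawled. A narrower Disallow path could target just the new sections instead of the whole site.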
-
Thanks for the response, Mark.
It sounds as if you tried this on a few new pages.
I'm talking about millions of existing pages.
Would you robots.txt-block your entire website? Seems like you'd run a huge risk of being dumped from the index entirely.
-
I recommend a robots noindex, nofollow tag.
That way people can still see the pages; they just aren't indexed in Google yet.
As we developed some new pages on one of our sites we did this, and we could still view the pages and send the folks we wanted feedback from to see the content - but no one else knew they were there.
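If adding meta tags by hand isn't feasible at millions of pages, the same noindex, nofollow can be sent site-wide as an X-Robots-Tag HTTP header, which crawlers honor like the meta tag. A rough sketch under that assumption (the Flask app is purely illustrative):

```python
# Rough sketch: flag every response noindex/nofollow via the X-Robots-Tag header,
# which search engines honor the same way as the meta robots tag.
from flask import Flask

app = Flask(__name__)

@app.after_request
def add_noindex_header(response):
    # Drop this once redirects are verified and you want the site indexed again.
    response.headers["X-Robots-Tag"] = "noindex, nofollow"
    return response

@app.route("/")
def home():
    return "Visible to visitors, flagged noindex to crawlers."
```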