Temporarily suspend Googlebot without blocking users
-
We'll soon be launching a redesign on a new platform, migrating millions of pages to new URLs.
How can I tell Google (and other crawlers) to temporarily (a day or two) ignore my site? We're hoping to buy ourselves a small bit of time to verify redirects and live functionality before allowing Google to crawl and index the new architecture.
GWT's recommendation is to serve a 503 for all pages - including robots.txt - but that also makes the site invisible to real site visitors, resulting in significant business loss. Bad answer.
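For reference, the site-wide 503 that GWT describes could be served with something like the following Apache sketch - a hypothetical .htaccess assuming mod_rewrite and mod_headers are available, shown only to illustrate the approach being rejected here:

    # Answer every request with a 503 during the maintenance window
    RewriteEngine On
    RewriteRule .* - [R=503,L]
    # Hint to crawlers when to retry (in seconds); 172800 = 48 hours
    Header always set Retry-After "172800"
    ErrorDocument 503 "Temporarily down for maintenance"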
I've heard some recommendations to disallow all user agents in robots.txt. Any answer that puts the millions of pages we already have indexed at risk is also a bad answer.
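For clarity, the robots.txt approach being warned against is the blanket disallow, which looks like this:

    User-agent: *
    Disallow: /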
Thanks
-
So it seems like we've come full circle.
The initial question was, "How can I tell Google (and other crawlers) to temporarily (a day or two) ignore my site? We're hoping to buy ourselves a small bit of time to verify redirects and live functionality before allowing Google to crawl and index the new architecture."
Sounds like the answer is, 'that's not possible'.
-
Putting a noindex/nofollow on an indexed URL will remove it from the SERPs, although some URLs will still show for a direct search (using the URL itself as the keyword) - but even then they will appear as bare links without any title/description details.
Using a 301 redirect will remove the old page from the index, regardless of noindex/nofollow.
If you use noindex/nofollow on the new URL as well, neither will show.
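For reference, the noindex/nofollow being discussed is the robots meta tag, placed in the <head> of each page:

    <meta name="robots" content="noindex, nofollow">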
-
Thank you, Ruth!
Can I ask a clarifying question?
If I put a noindex/nofollow on the new URLs, wouldn't the result be the same as if I put noindex/nofollow on the indexed URLs? There is only one instance of each page - and all of the millions of indexed URLs will be redirecting to new URLs.
Here is my assumption: if I put noindex/nofollow on the new URLs, a search bot will crawl the old URL, follow the redirect to the new URL, detect the noindex/nofollow, and then drop the old, indexed URL from its index. Is that the wrong assumption?
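For illustration, a minimal sketch of the redirect half of that assumption, in Apache .htaccess form - the URL paths here are hypothetical placeholders, not the actual site's patterns:

    # The old URL 301s to its new home; the new page carries the
    # noindex/nofollow meta tag until the migration is verified
    Redirect 301 /old-path/page-1.html /new-path/page-1/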
-
I would use robots.txt here as well - but block crawling of just the new pages, not the old ones. Then, when you're ready to be crawled, remove the robots.txt entry and use Fetch as Googlebot to get re-crawled. You may fall out of the index for a day or two but should quickly be re-indexed.
Another solution would be to use the meta robots tag to individually noindex each page (if there's a way to do that in your CMS - obviously adding them by hand wouldn't be scalable), and then remove it. That may increase your chances of getting re-crawled and re-indexed sooner.
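As a sketch of the first option, assuming the new architecture can be matched by a URL prefix (the /new/ path here is hypothetical):

    # Temporary: block crawling of the new section only; delete this entry when ready
    User-agent: *
    Disallow: /new/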
-
Thanks for the response, Mark.
It sounds as if you tried this on a few new pages.
I'm talking about millions of existing pages.
Would you really block your entire website in robots.txt? It seems like you'd run a huge risk of being dumped from the index entirely.
-
I recommend a noindex, nofollow robots meta tag.
That way people can still see the pages; they just aren't indexed in Google yet.
As we developed some new pages on one of our sites we did this, and we could still view the pages and send them to the folks whose feedback we wanted - but no one else knew they were there.
-
Related Questions
-
Page Indexing without content
Hello. I have a problem of pages indexing without content. I have a website in 3 different languages; 2 of the language pages are indexing just fine, but one (the most important one) is indexing without content. When searching using site: the page comes up, but when searching for unique keywords for which I should rank 100%, nothing comes up. This page was indexing just fine, and the problem arose a couple of days ago after a Google update finished. Looking further, the problem is language-related: every page in the given language that is newly indexed has this problem, while pages that were last crawled around one week ago are just fine. Has anyone run into this type of problem?
-
3,511 Pages Indexed and 3,331 Pages Blocked by Robots
Morning! So I checked our site's index status in WMT, and I'm being told that Google is indexing 3,511 pages and the robots are blocking 3,331. This seems slightly odd, as we're only disallowing 24 pages in the robots.txt file. In light of this, I have the following queries:
Do these figures mean that Google is indexing 3,511 pages and blocking 3,331 other pages, or that it's blocking 3,331 of the 3,511 indexed?
As there are only 24 URLs being disallowed in robots.txt, why are 3,331 pages being blocked? Will these be variations of the URLs we've submitted?
Currently, we don't have a sitemap. I know, I know, it's pretty unforgivable, but the old one didn't really work and the developers are working on the new one. Once it's submitted, will this help?
I think I know the answer to this, but is there any way to ascertain which pages are being blocked?
Thanks in advance! Lewis
-
How to use robots.txt to block areas on a page?
Hi, Across the category/product pages on our site there is an archives/shipping info section, and the text is always the same. Would this be treated as duplicate content and be harmful for SEO? How can I alter robots.txt to tell Google not to crawl that particular text? Thanks for any advice!
-
Redirect of https:// to http:// without SSL. Possible or not?!
Good afternoon, smart dudes : ) I am here to ask for your help. I posted this question on the Google help forum and Stack Overflow, but it looks like people do not know the correct answer... QUESTION: We used to have a secured site, but we recently purchased separate reservation software that provides SSL (it takes clients to a separate secured website where they can fill out the reservation form). We cancelled our SSL (we just think it's a waste to pay $100 for securing plain text). Now I have so many links pointing to our secured site and I have no idea how to fix it! How do I redirect https://www.mysite.com to http://www.mysite.com? I would also like to mention that I already have a redirect from the non-www to the www domain (not sure if that matters):
    RewriteEngine on
    RewriteCond %{HTTP_HOST} ^mysite.com$ [NC]
    RewriteRule ^(.*)$ http://www.mysite.com/$1 [R=301,L]
As I already mentioned... we do not have SSL!!!! None of the 301 redirect codes I found online work (you have to have SSL for the site to be redirected from https to http; currently I get an error - can't establish a secured connection to the server). Is there anything I can do???? Or do I have to purchase SSL again?
-
Google insists robots.txt is blocking... but it isn't.
I recently launched a new website. During development, I'd enabled the option in WordPress to prevent search engines from indexing the site. When the site went public (over 24 hours ago), I cleared that option. At that point, I added a specific robots.txt file that only disallows a couple of directories of files. You can view the robots.txt at http://photogeardeals.com/robots.txt Google (via Webmaster Tools) is insisting that my robots.txt file contains a "Disallow: /" on line 2 and that it's preventing Google from indexing the site and preventing me from submitting a sitemap. These errors are showing both in the sitemap section of Webmaster Tools and in the Blocked URLs section. Bing's webmaster tools are able to read the site and sitemap just fine. Any idea why Google insists I'm disallowing everything even after telling it to re-fetch?
-
User Created Subdomain Help
Have I searched the FAQ: Yes. My issue is unique because of the way our website works, and I hope that someone can provide some guidance. Our website, http://breezi.com, is a website builder where users can build their own website. When users build their site, it creates a sub-domain route to their created site, for example: http://mike.breezi.com. Now that I have explained how our site works, here is the problem: Google Webmaster Tools and Bing Webmaster Tools are indexing ALL the user-created websites under our TLD, and thus it is our impression that any content created in those sub-domains could confuse the search engines into thinking that the user-created websites and content are relevant to OUR main site, http://breezi.com. So, what we would like to know is whether there is a way to let search engines know that the user-created sites and content are not related to our TLD site. Thanks for any help and advice.
-
301'ing Googlebot
I have a client that has been 301'ing Googlebot to the canonical page. This is because they have cart_id and session parameters in URLs - mainly from when Googlebot comes in on a link that has these parameters in the URL, as they don't serve these parameters to Googlebot at all once it starts to crawl the site.
I am worried about cloaking; I wanted to know if anyone has any info on this.
I know that Google has said that anything where you detect Googlebot's user agent and treat it differently is a problem.
Has anybody had any experience with this? I would be glad to hear.
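For concreteness, the setup described might look something like this mod_rewrite sketch - hypothetical, with the parameter names taken from the post above:

    RewriteEngine On
    # When Googlebot requests a URL carrying cart_id or session parameters...
    RewriteCond %{HTTP_USER_AGENT} Googlebot [NC]
    RewriteCond %{QUERY_STRING} (^|&)(cart_id|session)= [NC]
    # ...301 it to the same path with the query string stripped
    RewriteRule ^(.*)$ /$1? [R=301,L]
-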
Why is Googlebot indexing one page, not the other?
Why does Googlebot index one page but not another under the same conditions - in an HTML sitemap, for example? We have 6 new pages with unique content. Googlebot immediately indexed only 2 of the pages, and then after some time the remaining 4. On what parameters does the crawler decide whether or not to scan a given page?