Temporarily suspend Googlebot without blocking users
-
We'll soon be launching a redesign on a new platform, migrating millions of pages to new URLs.
How can I tell Google (and other crawlers) to temporarily (a day or two) ignore my site? We're hoping to buy ourselves a small bit of time to verify redirects and live functionality before allowing Google to crawl and index the new architecture.
GWT's recommendation is to serve a 503 for all pages - including robots.txt - but that also makes the site invisible to real site visitors, resulting in significant business loss. Bad answer.
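For illustration, that 503 recommendation amounts to something like the following. This is a minimal sketch assuming a Flask app; the MAINTENANCE flag and catch-all route are hypothetical stand-ins for however a given platform handles maintenance mode:

```python
from flask import Flask, Response

app = Flask(__name__)

MAINTENANCE = True  # hypothetical flag; flip to False once redirects are verified

@app.route("/", defaults={"path": ""})
@app.route("/<path:path>")
def catch_all(path):
    if MAINTENANCE:
        # A 503 plus Retry-After tells crawlers "temporary, come back later",
        # so nothing gets dropped from the index -- but human visitors get the
        # same 503, which is exactly the business problem described above.
        return Response("Temporarily down for maintenance", status=503,
                        headers={"Retry-After": "86400"})  # retry in ~24 hours
    return "Normal page content"
```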
I've heard some recommendations to disallow all user agents in robots.txt. Any answer that puts the millions of pages we already have indexed at risk is also a bad answer.
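For reference, that disallow-all robots.txt looks like this; it blocks crawling of every URL, which is why it puts already-indexed pages at risk:

```
# Blanket crawl block -- illustration only; risky for already-indexed pages
User-agent: *
Disallow: /
```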
Thanks
-
So it seems like we've come full circle.
The initial question was, "How can I tell Google (and other crawlers) to temporarily (a day or two) ignore my site? We're hoping to buy ourselves a small bit of time to verify redirects and live functionality before allowing Google to crawl and index the new architecture."
Sounds like the answer is, 'that's not possible'.
-
Putting a noindex/nofollow on an indexed URL will remove it from the SERPs, although some URLs will still show for direct searches (using the URL itself as the keyword); even then they will appear as bare links without any title/description details.
Using a 301 redirect will remove the old page from the index, regardless of noindex/nofollow.
If you use noindex/nofollow on the new URL as well, neither will show.
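For concreteness, the page-level tag being described is the standard meta robots tag (for non-HTML resources, the equivalent X-Robots-Tag HTTP header can be used):

```html
<!-- Removes the page from the index and tells crawlers not to follow its links -->
<meta name="robots" content="noindex, nofollow">
```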
-
Thank you, Ruth!
Can I ask a clarifying question?
If I put a noindex/nofollow on the new URLs, wouldn't the result be the same as if I put noindex/nofollow on the indexed URLs? There is only one instance of each page, and all of the millions of indexed URLs will be redirecting to new URLs.
Here is my assumption: if I put noindex/nofollow on the new URLs, a search bot will crawl the old URL, follow the redirect to the new URL, detect the noindex/nofollow, and then drop the old, indexed URL from its index. Is that the wrong assumption?
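Whichever way Google handles that, the redirects themselves can be spot-checked before lifting any crawl restrictions. Below is a minimal sketch using Python's requests library; the old-to-new URL mapping is a hypothetical placeholder that would be sampled from the real redirect map:

```python
import requests

# Hypothetical old-to-new URL mapping; placeholders only.
LEGACY_TO_NEW = {
    "https://example.com/old-page": "https://example.com/new-page",
}

for old_url, expected in LEGACY_TO_NEW.items():
    # Fetch without following redirects so the raw status code is visible.
    resp = requests.get(old_url, allow_redirects=False, timeout=10)
    ok_status = resp.status_code == 301
    ok_target = resp.headers.get("Location") == expected

    # Then fetch the destination and look for a noindex signal on it.
    final = requests.get(expected, timeout=10)
    has_noindex = ("noindex" in final.headers.get("X-Robots-Tag", "").lower()
                   or "noindex" in final.text.lower())

    print(f"{old_url}: 301={ok_status}, correct target={ok_target}, "
          f"noindex on new page={has_noindex}")
```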
-
I would use robots.txt to block crawling as well - but just for the new pages, not the old ones. Then, when you're ready to be crawled, remove the robots.txt entry and use Fetch as Googlebot to get re-crawled. You may fall out of the index for a day or two but should quickly be re-indexed.
Another solution would be to use the meta robots tag to individually noindex each page (if there's a way to do that in your CMS - obviously adding the tags by hand wouldn't be scalable), and then remove them when you're ready. That may increase your chances of getting re-crawled and re-indexed sooner.
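If the CMS supports template-level logic, one site-wide flag is a scalable way to do that. A hypothetical sketch with Flask/Jinja; BLOCK_INDEXING and the template are illustrative, not a specific CMS feature:

```python
from flask import Flask, render_template_string

app = Flask(__name__)
app.config["BLOCK_INDEXING"] = True  # flip to False once the migration is verified

# One conditional in the shared page template gates noindex for every page.
PAGE_TEMPLATE = """<html><head>
{% if config.BLOCK_INDEXING %}<meta name="robots" content="noindex, nofollow">{% endif %}
<title>{{ title }}</title>
</head><body>Page content</body></html>"""

@app.route("/page")
def page():
    return render_template_string(PAGE_TEMPLATE, title="Example page")
```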
-
Thanks for the response, Mark.
It sounds as if you tried this on a few new pages.
I'm talking about millions of existing pages.
Would you block your entire website via robots.txt? Seems like you'd run a huge risk of being dumped from the index entirely.
-
I recommend a meta robots noindex, nofollow tag.
That way people can still see the pages; they just aren't indexed in Google yet.
As we developed some new pages on one of our sites, we did this: we could still view the pages and send them to the folks we wanted feedback from, but no one else knew they were there.