Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Best practice for deindexing large quantities of pages
-
We are trying to deindex a large quantity of pages on our site and want to know what the best practice for doing that is. For reference, the reason we are looking for methods that could help us speed it up is we have about 500,000 URLs that we want deindexed because of mis-formatted HTML code and google indexed them much faster than it is taking to unindex them unfortunately.
We don't want to risk clogging up our limited crawl log/budget by submitting a sitemap of URLs that have "noindex" on them as a hack for deindexing. Although theoretically that should work, we are looking for white hat methods that are faster than "being patient and waiting it out", since that would likely take months if not years with Google's current crawl rate of our site.
-
Unfortunately, I don't think there's any easy/fast way to do this. I just ran a test to see how long it take Google to actually obey a noindex tag, and it's taken a little over 2 months for them all to be removed. I had 2 WP blogs that I added the noindex tag to all category, tag, and author pages and monitored the index count 4 or 5 times per week by running site:example.com inurl:/category/ queries. There was a lot of fluctuation at the beginnning, but eventually took hold after about 2 months. On one of the sites, I did add an XML sitemap with only the noindexed URLs on it, submitted it via Search Console, but that didn't seem to have an impact on how quickly they were dropped out.
See the screenshot below of my plotting of indexed pages per subfolder:
-
Hey,
you might be interested in this thread for getting your question answered.
https://moz.com/community/q/quickest-way-to-deindex-a-large-number-of-pages
Hope it helps. Cheers, Martin
-
Hi,
I have never tested the method that I'm sharing here. Please check once it might be helpful in your case.
https://www.searchcommander.com/how-to-bulk-remove-urls-google/
Thanks
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Multiple Markups on The Same Page - Best Solution?
Hi there! I have a website that is build in react javascript, and I'm trying to use markup on my pages. They are mostly articles about general topics with common questions (about the topic), and for most articles I would like to use two markups: article markup + FAQ Markup ( for the questions in the article) article markup + how-to markup Can I do this or will Google get confused? Since I have two @type at the same time, for example @type": "FAQPage" and "@type": "Article". How should I think? I'm using https://schema.dev/ right now. Thanks!
Intermediate & Advanced SEO | | Leowa0 -
Is single H1 tag still best practice?
Hi Guys, Is having a single h1 tag still best practice for SEO? Guessing multiple h1 tags dilute the value of the tag and keywords within the tag. Thoughts? Cheers.
Intermediate & Advanced SEO | | kayl870 -
What is best practice for "Sorting" URLs to prevent indexing and for best link juice ?
We are now introducing 5 links in all our category pages for different sorting options of category listings.
Intermediate & Advanced SEO | | lcourse
The site has about 100.000 pages and with this change the number of URLs may go up to over 350.000 pages.
Until now google is indexing well our site but I would like to prevent the "sorting URLS" leading to less complete crawling of our core pages, especially since we are planning further huge expansion of pages soon. Apart from blocking the paramter in the search console (which did not really work well for me in the past to prevent indexing) what do you suggest to minimize indexing of these URLs also taking into consideration link juice optimization? On a technical level the sorting is implemented in a way that the whole page is reloaded, for which may be better options as well.0 -
Membership/subscriber (/customer) only content and SEO best practice
Hello Mozzers, I was wondering whether there's any best practice guidance out there re: how to deal with membership/subscriber (existing customer) only content on a website, from an SEO perspective - what is best practice? A few SEOs have told me to make some of the content visible to Google, for SEO purposes, yet I'm really not sure whether this is acceptable / manipulative, and I don't want to upset Google (or users for that matter!) Thanks in advance, Luke
Intermediate & Advanced SEO | | McTaggart0 -
Putting "noindex" on a page that's in an iframe... what will that mean for the parent page?
If I've got a page that is being called in an iframe, on my homepage, and I don't want that called page to be indexed.... so I put a noindex tag on the called page (but not on the homepage) what might that mean for the homepage? Nothing? Will Google, Bing, Yahoo, or anyone else, potentially see that as a noindex tag on my homepage?
Intermediate & Advanced SEO | | Philip-DiPatrizio0 -
How long takes to a page show up in Google results after removing noindex from a page?
Hi folks, A client of mine created a new page and used meta robots noindex to not show the page while they are not ready to launch it. The problem is that somehow Google "crawled" the page and now, after removing the meta robots noindex, the page does not show up in the results. We've tried to crawl it using Fetch as Googlebot, and then submit it using the button that appears. We've included the page in sitemap.xml and also used the old Google submit new page URL https://www.google.com/webmasters/tools/submit-url Does anyone know how long will it take for Google to show the page AFTER removing meta robots noindex from the page? Any reliable references of the statement? I did not find any Google video/post about this. I know that in some days it will appear but I'd like to have a good reference for the future. Thanks.
Intermediate & Advanced SEO | | fabioricotta-840380 -
Best practice for removing indexed internal search pages from Google?
Hi Mozzers I know that it’s best practice to block Google from indexing internal search pages, but what’s best practice when “the damage is done”? I have a project where a substantial part of our visitors and income lands on an internal search page, because Google has indexed them (about 3 %). I would like to block Google from indexing the search pages via the meta noindex,follow tag because: Google Guidelines: “Use robots.txt to prevent crawling of search results pages or other auto-generated pages that don't add much value for users coming from search engines.” http://support.google.com/webmasters/bin/answer.py?hl=en&answer=35769 Bad user experience The search pages are (probably) stealing rankings from our real landing pages Webmaster Notification: “Googlebot found an extremely high number of URLs on your site” with links to our internal search results I want to use the meta tag to keep the link juice flowing. Do you recommend using the robots.txt instead? If yes, why? Should we just go dark on the internal search pages, or how shall we proceed with blocking them? I’m looking forward to your answer! Edit: Google have currently indexed several million of our internal search pages.
Intermediate & Advanced SEO | | HrThomsen0 -
There's a website I'm working with that has a .php extension. All the pages do. What's the best practice to remove the .php extension across all pages?
Client wishes to drop the .php extension on all their pages (they've got around 2k pages). I assured them that wasn't necessary. However, in the event that I do end up doing this what's the best practices way (and easiest way) to do this? This is also a WordPress site. Thanks.
Intermediate & Advanced SEO | | digisavvy0