Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Removing Dynamic "noindex" URL's from Index
-
6 months ago my clients site was overhauled and the user generated searches had an index tag on them. I switched that to noindex but didn't get it fast enough to avoid being 100's of pages indexed in Google.
It's been months since switching to the noindex tag and the pages are still indexed. What would you recommend? Google crawls my site daily - but never the pages that I want removed from the index.
I am trying to avoid submitting hundreds of these dynamic URL's to the removal tool in webmaster tools. Suggestions?
-
Hooray! Usually, I just give my advice and then run away, so it's always nice to hear I was actually right about something
Seriously, glad you got it sorted out. -
Just a follow up to your suggestion.
I created sitemaps for the pages I want removed using the google spreadsheet importXML functions, which saved a lot of time.
It took a couple weeks but all of the pages, and similar pages, have successfully been removed from the index. Even the similar pages I didn't get a chance to put in the sitemap yet (importXML limits the results to 100).
Your suggestion worked!
-
I can't 404 dynamic search pages.
-
There are a mix of search pages and old mobile pages.
The search pages I've been testing out having the canonical point to the default search page. I've seen a slight drop in these pages - but I guess I just have to be more patient.
For the other pages the path is no longer there like you were mentioning. I like the idea of setting up the XML sitemap, I never even thought of making a bad/indexed page sitemap. I will give that a shot! Thankfully this will be a quick job with the importXml function in google spreadsheets! Great tip, hopefully it'll work.
-
Is there a crawl path to them currently? One issue I see a lot is that a bunch of pages get indexed, the path is found and cut off, NOINDEX (canonical, 301, etc.) is added, but then the pages never get re-crawled. Since they don't get recrawled, the page-level directive never gets honored.
If there's a URL parameter involved, you could use parameter-handling in GWT - it's not a perfect solution, but it sometimes seems to work without a re-crawl.
The other option would be to create a new XML sitemap with all of the bad/indexed URLs. This may push Google to re-crawl them and then see the tags to deindex. It's a bit safer than re-opening the crawl paths.
If they are being crawled and Google is just ignoring the NOINDEX for some reason, I'd try to 301 or canonical those pages to a primary search page, if that's feasible (probably canonical, since you don't want the users to 301). Sometimes, if a signal isn't working for that long, you just have to shake Google and try a different signal. Even following their exact recommendations, it rarely works as planned at large scale.
-
Don't use GWMT's removal tool to remove URLs which should not be in the index (unless those expose sensitive information). Best practise is to exclude them in robots.txt and to also ensure that the pages either 404 or have a noindex,noarchive tag.
-
Change the site structure and let the pages 404, Google will deindex them if they are not being linked to.
-
You could try adding the pages you want to remove to your robots.txt file. Since you're not linking to them, and it's very unlikely that Googlebot will index those pages naturally now, this might be a better way of telling it which pages to explicitly not index.
I'm not really sure how quickly this will trigger Google to remove those pages from the index - but they do reference robots.txt on the actual "Remove URLs" page of WMT ---> "Use **robots.txt **to specify how search engines should crawl your site, or request **removal **of URLs from Google's search results ..."
For that technique, you'd want to add something like this for all of the pages you want to remove:
Disallow: /oldpage1toremove.phpThat should work. If it doesn't, then I would probably just submit the requests through the "Remove URLs" tool.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Page with metatag noindex is STILL being indexed?!
Hi Mozers, There are over 200 pages from our site that have a meta tag "noindex" but are STILL being indexed. What else can I do to remove them from the Index?
Intermediate & Advanced SEO | | yaelslater0 -
My url disappeared from Google but Search Console shows indexed. This url has been indexed for more than a year. Please help!
Super weird problem that I can't solve for last 5 hours. One of my urls: https://www.dcacar.com/lax-car-service.html Has been indexed for more than a year and also has an AMP version, few hours ago I realized that it had disappeared from serps. We were ranking on page 1 for several key terms. When I perform a search "site:dcacar.com " the url is no where to be found on all 5 pages. But when I check my Google Console it shows as indexed I requested to index again but nothing changed. All other 50 or so urls are not effected at all, this is the only url that has gone missing can someone solve this mystery for me please. Thanks a lot in advance.
Intermediate & Advanced SEO | | Davit19850 -
Sanity Check: NoIndexing a Boatload of URLs
Hi, I'm working with a Shopify site that has about 10x more URLs in Google's index than it really ought to. This equals thousands of urls bloating the index. Shopify makes it super easy to make endless new collections of products, where none of the new collections has any new content... just a new mix of products. Over time, this makes for a ton of duplicate content. My response, aside from making other new/unique content, is to select some choice collections with KW/topic opportunities in organic and add unique content to those pages. At the same time, noindexing the other 90% of excess collections pages. The thing is there's evidently no method that I could find of just uploading a list of urls to Shopify to tag noindex. And, it's too time consuming to do this one url at a time, so I wrote a little script to add a noindex tag (not nofollow) to pages that share various identical title tags, since many of them do. This saves some time, but I have to be careful to not inadvertently noindex a page I want to keep. Here are my questions: Is this what you would do? To me it seems a little crazy that I have to do this by title tag, although faster than one at a time. Would you follow it up with a deindex request (one url at a time) with Google or just let Google figure it out over time? Are there any potential negative side effects from noindexing 90% of what Google is already aware of? Any additional ideas? Thanks! Best... Mike
Intermediate & Advanced SEO | | 945010 -
Will disallowing URL's in the robots.txt file stop those URL's being indexed by Google
I found a lot of duplicate title tags showing in Google Webmaster Tools. When I visited the URL's that these duplicates belonged to, I found that they were just images from a gallery that we didn't particularly want Google to index. There is no benefit to the end user in these image pages being indexed in Google. Our developer has told us that these urls are created by a module and are not "real" pages in the CMS. They would like to add the following to our robots.txt file Disallow: /catalog/product/gallery/ QUESTION: If the these pages are already indexed by Google, will this adjustment to the robots.txt file help to remove the pages from the index? We don't want these pages to be found.
Intermediate & Advanced SEO | | andyheath0 -
Why is /home used in this company's home URL?
Just working with a company that has chosen a home URL with /home latched on - very strange indeed - has anybody else comes across this kind of homepage URL "decision" in the past? I can't see why on earth anybody would do this! Perhaps simply a logic-defying decision?
Intermediate & Advanced SEO | | McTaggart0 -
Dev Subdomain Pages Indexed - How to Remove
I own a website (domain.com) and used the subdomain "dev.domain.com" while adding a new section to the site (as a development link). I forgot to block the dev.domain.com in my robots file, and google indexed all of the dev pages (around 100 of them). I blocked the site (dev.domain.com) in robots, and then proceeded to just delete the entire subdomain altogether. It's been about a week now and I still see the subdomain pages indexed on Google. How do I get these pages removed from Google? Are they causing duplicate content/title issues, or does Google know that it's a development subdomain and it's just taking time for them to recognize that I deleted it already?
Intermediate & Advanced SEO | | WebServiceConsulting.com0 -
Using the same content on different TLD's
HI Everyone, We have clients for whom we are going to work with in different countries but sometimes with the same language. For example we might have a client in a competitive niche working in Germany, Austria and Switzerland (Swiss German) ie we're going to potentially rewrite our website three times in German, We're thinking of using Google's href lang tags and use pretty much the same content - is this a safe option, has anyone actually tries this successfully or otherwise? All answers appreciated. Cheers, Mel.
Intermediate & Advanced SEO | | dancape1 -
Blocking Dynamic URLs with Robots.txt
Background: My e-commerce site uses a lot of layered navigation and sorting links. While this is great for users, it ends up in a lot of URL variations of the same page being crawled by Google. For example, a standard category page: www.mysite.com/widgets.html ...which uses a "Price" layered navigation sidebar to filter products based on price also produces the following URLs which link to the same page: http://www.mysite.com/widgets.html?price=1%2C250 http://www.mysite.com/widgets.html?price=2%2C250 http://www.mysite.com/widgets.html?price=3%2C250 As there are literally thousands of these URL variations being indexed, so I'd like to use Robots.txt to disallow these variations. Question: Is this a wise thing to do? Or does Google take into account layered navigation links by default, and I don't need to worry. To implement, I was going to do the following in Robots.txt: User-agent: * Disallow: /*? Disallow: /*= ....which would prevent any dynamic URL with a '?" or '=' from being indexed. Is there a better way to do this, or is this a good solution? Thank you!
Intermediate & Advanced SEO | | AndrewY1