Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Removing Dynamic "noindex" URL's from Index
-
6 months ago my clients site was overhauled and the user generated searches had an index tag on them. I switched that to noindex but didn't get it fast enough to avoid being 100's of pages indexed in Google.
It's been months since switching to the noindex tag and the pages are still indexed. What would you recommend? Google crawls my site daily - but never the pages that I want removed from the index.
I am trying to avoid submitting hundreds of these dynamic URL's to the removal tool in webmaster tools. Suggestions?
-
Hooray! Usually, I just give my advice and then run away, so it's always nice to hear I was actually right about something
Seriously, glad you got it sorted out.
-
Just a follow up to your suggestion.
I created sitemaps for the pages I want removed using the google spreadsheet importXML functions, which saved a lot of time.
It took a couple weeks but all of the pages, and similar pages, have successfully been removed from the index. Even the similar pages I didn't get a chance to put in the sitemap yet (importXML limits the results to 100).
Your suggestion worked!
-
I can't 404 dynamic search pages.
-
There are a mix of search pages and old mobile pages.
The search pages I've been testing out having the canonical point to the default search page. I've seen a slight drop in these pages - but I guess I just have to be more patient.
For the other pages the path is no longer there like you were mentioning. I like the idea of setting up the XML sitemap, I never even thought of making a bad/indexed page sitemap. I will give that a shot! Thankfully this will be a quick job with the importXml function in google spreadsheets! Great tip, hopefully it'll work.
-
Is there a crawl path to them currently? One issue I see a lot is that a bunch of pages get indexed, the path is found and cut off, NOINDEX (canonical, 301, etc.) is added, but then the pages never get re-crawled. Since they don't get recrawled, the page-level directive never gets honored.
If there's a URL parameter involved, you could use parameter-handling in GWT - it's not a perfect solution, but it sometimes seems to work without a re-crawl.
The other option would be to create a new XML sitemap with all of the bad/indexed URLs. This may push Google to re-crawl them and then see the tags to deindex. It's a bit safer than re-opening the crawl paths.
If they are being crawled and Google is just ignoring the NOINDEX for some reason, I'd try to 301 or canonical those pages to a primary search page, if that's feasible (probably canonical, since you don't want the users to 301). Sometimes, if a signal isn't working for that long, you just have to shake Google and try a different signal. Even following their exact recommendations, it rarely works as planned at large scale.
-
Don't use GWMT's removal tool to remove URLs which should not be in the index (unless those expose sensitive information). Best practise is to exclude them in robots.txt and to also ensure that the pages either 404 or have a noindex,noarchive tag.
-
Change the site structure and let the pages 404, Google will deindex them if they are not being linked to.
-
You could try adding the pages you want to remove to your robots.txt file. Since you're not linking to them, and it's very unlikely that Googlebot will index those pages naturally now, this might be a better way of telling it which pages to explicitly not index.
I'm not really sure how quickly this will trigger Google to remove those pages from the index - but they do reference robots.txt on the actual "Remove URLs" page of WMT ---> "Use **robots.txt **to specify how search engines should crawl your site, or request **removal **of URLs from Google's search results ..."
For that technique, you'd want to add something like this for all of the pages you want to remove:
Disallow: /oldpage1toremove.php
That should work. If it doesn't, then I would probably just submit the requests through the "Remove URLs" tool.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Are HTML Sitemaps Still Effective With "Noindex, Follow"?
A site we're working on has hundreds of thousands of inventory pages that are generally "orphaned" pages. To reach them, you need to do a lot of faceting on the search results page. They appear in our XML sitemaps as well, but I'd still consider these orphan pages. To assist with crawling and indexation, we'd like to create HTML sitemaps to link to these pages. Due to the nature (and categorization) of these products, this would mean we'll be creating thousands of individual HTML sitemap pages, which we're hesitant to put into the index. Would the sitemaps still be effective if we add a noindex, follow meta tag? Does this indicate lower quality content in some way, or will it make no difference in how search engines will handle the links therein?
Intermediate & Advanced SEO | | mothner0 -
Google doesn't index image slideshow
Hi, My articles are indexed and images (full size) via a meta in the body also. But, the images in the slideshow are not indexed, have you any idea? A problem with the JS Example : http://www.parismatch.com/People/Television/Sport-a-la-tele-les-femmes-a-l-abordage-962989 Thank you in advance Julien
Intermediate & Advanced SEO | | Julien.Ferras0 -
Will disallowing URL's in the robots.txt file stop those URL's being indexed by Google
I found a lot of duplicate title tags showing in Google Webmaster Tools. When I visited the URL's that these duplicates belonged to, I found that they were just images from a gallery that we didn't particularly want Google to index. There is no benefit to the end user in these image pages being indexed in Google. Our developer has told us that these urls are created by a module and are not "real" pages in the CMS. They would like to add the following to our robots.txt file Disallow: /catalog/product/gallery/ QUESTION: If the these pages are already indexed by Google, will this adjustment to the robots.txt file help to remove the pages from the index? We don't want these pages to be found.
Intermediate & Advanced SEO | | andyheath0 -
How to de-index old URLs after redesigning the website?
Thank you for reading. After redesigning my website (5 months ago) in my crawl reports (Moz, Search Console) I still get tons of 404 pages which all seems to be the URLs from my previous website (same root domain). It would be nonsense to 301 redirect them as there are to many URLs. (or would it be nonsense?) What is the best way to deal with this issue?
Intermediate & Advanced SEO | | Chemometec0 -
Why Google isn't indexing my images?
Hello, on my fairly new website Worthminer.com I am noticing that Google is not indexing images from my sitemap. Already 560 images submitted and Google indexed only 3 of them. Altough there is more images indexed they are not indexing any new images, and I have no idea why. Posts, categories and other urls are indexing just fine, but images not. I am using Wordpress and for sitemaps Wordpress SEO by yoast. Am I missing something here? Why Google won't index my images? Thanks, I appreciate any help, David xv1GtwK.jpg
Intermediate & Advanced SEO | | Worthminer1 -
Do I need to re-index the page after editing URL?
Hi, I had to edit some of the URLs. But, google is still showing my old URL in search results for certain keywords, which ofc get 404. By crawling with ScremingFrog it gets me 301 'page not found' and still giving old URLs. Why is that? And do I need to re-index pages with new URLs? Is 'fetch as Google' enough to do that or any other advice? Thanks a lot, hope the topic will help to someone else too. Dusan
Intermediate & Advanced SEO | | Chemometec0 -
Dev Subdomain Pages Indexed - How to Remove
I own a website (domain.com) and used the subdomain "dev.domain.com" while adding a new section to the site (as a development link). I forgot to block the dev.domain.com in my robots file, and google indexed all of the dev pages (around 100 of them). I blocked the site (dev.domain.com) in robots, and then proceeded to just delete the entire subdomain altogether. It's been about a week now and I still see the subdomain pages indexed on Google. How do I get these pages removed from Google? Are they causing duplicate content/title issues, or does Google know that it's a development subdomain and it's just taking time for them to recognize that I deleted it already?
Intermediate & Advanced SEO | | WebServiceConsulting.com0 -
Rel="canonical" and rel="alternate" both necessary?
We are fighting some duplicate content issues across multiple domains. We have a few magento stores that have different country codes. For example: domain.com and domain.ca, domain.com is the "main" domain. We have set up different rel="alternative codes like: The question is, do we need to add custom rel="canonical" tags to domain.ca that points to domain.com? For example for domain.ca/product.html to point to: Also how far does rel="canonical" follow? For example if we have:
Intermediate & Advanced SEO | | AlliedComputer
domain.ca/sub/product.html canonical to domain.com/sub/product.html
then,
domain.com/sub/product.html canonical to domain.com/product.html0