Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Best way to permanently remove URLs from the Google index?
-
We have several subdomains we use for testing applications. Even if we block with robots.txt, these subdomains still appear to get indexed (though they show as blocked by robots.txt.
I've claimed these subdomains and requested permanent removal, but it appears that after a certain time period (6 months)? Google will re-index (and mark them as blocked by robots.txt).
What is the best way to permanently remove these from the index? We can't use login to block because our clients want to be able to view these applications without needing to login.
What is the next best solution?
-
I agree with Paul, The Google is re indexing the pages because you have few linking pointing back to these sub domains. The best idea us to restrict Google crawler by using no-index , no-follow tag and remove the instruction available in the robots.txt...
This way Google will neither crawl nor follow the activity on the page and it will get permanently remove from Google Index.
-
Yup - Chris has the solution. The robots.txt disallow directive simply instructs the crawler not to crawl, it doesn't have any instructions regarding removing URLs from the index. I'm betting there are other pages linking in to the subdomains that the bots are following to find and index as the URL Removal requests are expiring.
Do note though that when you add the no-index meta-robots tag, you're going to need to remove the robots.txt disallow directive. Otherwise the crawlers won't make any attempt to crawl all the pages and so won't even discover most of the no-index requests.
Paul
[Edited to add - there's no reason you can't implement the no-index meta-tags and then also again request removal via the Webmaster Tools removal tool. Kind of a "belt & suspenders approach. The removal request will get it out quicker, and the meta-no-index will do the job of keeping it out. Remember to do this in Bing Webmaster Tools as well.]
-
Wouldn't a noindex meta tag on each page take care of it?
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Remove Product & Category from URLS in Wordpress
Does anyone have experience removing /product/ and /product-category/, etc. from URLs in wordpress? I found this link from Wordpress which explains that this shouldn't be done, but I would like some opinions of those who have tried it please. https://docs.woocommerce.com/document/removing-product-product-category-or-shop-from-the-urls/
Intermediate & Advanced SEO | | moon-boots0 -
Will disallowing URL's in the robots.txt file stop those URL's being indexed by Google
I found a lot of duplicate title tags showing in Google Webmaster Tools. When I visited the URL's that these duplicates belonged to, I found that they were just images from a gallery that we didn't particularly want Google to index. There is no benefit to the end user in these image pages being indexed in Google. Our developer has told us that these urls are created by a module and are not "real" pages in the CMS. They would like to add the following to our robots.txt file Disallow: /catalog/product/gallery/ QUESTION: If the these pages are already indexed by Google, will this adjustment to the robots.txt file help to remove the pages from the index? We don't want these pages to be found.
Intermediate & Advanced SEO | | andyheath0 -
URL Injection Hack - What to do with spammy URLs that keep appearing in Google's index?
A website was hacked (URL injection) but the malicious code has been cleaned up and removed from all pages. However, whenever we run a site:domain.com in Google, we keep finding more spammy URLs from the hack. They all lead to a 404 error page since the hack was cleaned up in the code. We have been using the Google WMT Remove URLs tool to have these spammy URLs removed from Google's index but new URLs keep appearing every day. We looked at the cache dates on these URLs and they are vary in dates but none are recent and most are from a month ago when the initial hack occurred. My question is...should we continue to check the index every day and keep submitting these URLs to be removed manually? Or since they all lead to a 404 page will Google eventually remove these spammy URLs from the index automatically? Thanks in advance Moz community for your feedback.
Intermediate & Advanced SEO | | peteboyd0 -
Dev Subdomain Pages Indexed - How to Remove
I own a website (domain.com) and used the subdomain "dev.domain.com" while adding a new section to the site (as a development link). I forgot to block the dev.domain.com in my robots file, and google indexed all of the dev pages (around 100 of them). I blocked the site (dev.domain.com) in robots, and then proceeded to just delete the entire subdomain altogether. It's been about a week now and I still see the subdomain pages indexed on Google. How do I get these pages removed from Google? Are they causing duplicate content/title issues, or does Google know that it's a development subdomain and it's just taking time for them to recognize that I deleted it already?
Intermediate & Advanced SEO | | WebServiceConsulting.com0 -
How is Google crawling and indexing this directory listing?
We have three Directory Listing pages that are being indexed by Google: http://www.ccisolutions.com/StoreFront/jsp/ http://www.ccisolutions.com/StoreFront/jsp/html/ http://www.ccisolutions.com/StoreFront/jsp/pdf/ How and why is Googlebot crawling and indexing these pages? Nothing else links to them (although the /jsp.html/ and /jsp/pdf/ both link back to /jsp/). They aren't disallowed in our robots.txt file and I understand that this could be why. If we add them to our robots.txt file and disallow, will this prevent Googlebot from crawling and indexing those Directory Listing pages without prohibiting them from crawling and indexing the content that resides there which is used to populate pages on our site? Having these pages indexed in Google is causing a myriad of issues, not the least of which is duplicate content. For example, this file <tt>CCI-SALES-STAFF.HTML</tt> (which appears on this Directory Listing referenced above - http://www.ccisolutions.com/StoreFront/jsp/html/) clicks through to this Web page: http://www.ccisolutions.com/StoreFront/jsp/html/CCI-SALES-STAFF.HTML This page is indexed in Google and we don't want it to be. But so is the actual page where we intended the content contained in that file to display: http://www.ccisolutions.com/StoreFront/category/meet-our-sales-staff As you can see, this results in duplicate content problems. Is there a way to disallow Googlebot from crawling that Directory Listing page, and, provided that we have this URL in our sitemap: http://www.ccisolutions.com/StoreFront/category/meet-our-sales-staff, solve the duplicate content issue as a result? For example: Disallow: /StoreFront/jsp/ Disallow: /StoreFront/jsp/html/ Disallow: /StoreFront/jsp/pdf/ Can we do this without risking blocking Googlebot from content we do want crawled and indexed? Many thanks in advance for any and all help on this one!
Intermediate & Advanced SEO | | danatanseo0 -
What is the best URL structure for categories?
A client's site currently uses the URL structure: www.website.com/�tegory%/%postname% Which I think is optimised fairly well, as the categories are keywords being targeted. However, as they are using a category hierarchy, often times the URL looks like this: www.website.com/parent-category/child-category/some-post-titles-are-quite-long-as-they-are-long-tail-terms Best practise often dictates (such as point 3 in this Moz article) that shorter URLs are better for several reasons. So I'm left with a few options: Remove the category from the URL Flatten the category hierarchy Shorten post titles two a word or two - which would hurt my long tail search term traffic. Leave it as it is What do we think is the best route to take? Thanks in advance!
Intermediate & Advanced SEO | | underscorelive0 -
Removing Dynamic "noindex" URL's from Index
6 months ago my clients site was overhauled and the user generated searches had an index tag on them. I switched that to noindex but didn't get it fast enough to avoid being 100's of pages indexed in Google. It's been months since switching to the noindex tag and the pages are still indexed. What would you recommend? Google crawls my site daily - but never the pages that I want removed from the index. I am trying to avoid submitting hundreds of these dynamic URL's to the removal tool in webmaster tools. Suggestions?
Intermediate & Advanced SEO | | BeTheBoss0 -
What is the best way to embed PDF documents for SEO?
I have been using SCRIBD to embed PDF documents on my site but until recently I did not include the link back to SCRIBD. Will my site get credit for this content or will it go to SCRIBD? Is there a better way to embed PDF documents for SEO?
Intermediate & Advanced SEO | | casper4340