Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Robots.txt & url removal vs. noindex, follow?
-
When de-indexing pages from google, what are the pros & cons of each of the below two options:
-
robots.txt & requesting url removal from google webmasters
- Use the noindex, follow meta tag on all doctor profile pages
- Keep the URLs in the Sitemap file so that Google will recrawl them and find the noindex meta tag
- make sure that they're not disallowed by the robots.txt file
-
-
Great, comprehensive answer from Ryan as ever.
Nothing more to see here folks.
Move along now.
Move along.
-
The preferred option would be the noindex, follow tag.
The robots.txt file is a choice of last resort. The best robots.txt file for a site is an empty file (i.e. no disallows). The robots.txt file is a tool that can be used when other options are either not available, or the effort is deemed as too great.
If you use robots.txt and the url removal from google, that will work, the page will get de-indexed, but then Google will never crawl that page again and therefore not follow any of the links on that page. You are blocking their crawler so your site will not be crawled as thoroughly which means pages can be missed, a lower pecentage of your pages will be indexed (mainly applies to larger sites), and the link juice which flows to any of the blocked pages will lose their value. Any anchor text or other link value on those pages will be lost as well.
If you use the "noindex, follow" tag then those pages will still be crawled, those pages will continue to contribute value to your site and the page's links will continue to offer value to their target URLs, many of which will be your site's internal pages.
A final point is the URL removal tool in Google WMT will remove the page from Google, but it wont affect Yahoo, Bing and other directories.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
What happens to crawled URLs subsequently blocked by robots.txt?
We have a very large store with 278,146 individual product pages. Since these are all various sizes and packaging quantities of less than 200 product categories my feeling is that Google would be better off making sure our category pages are indexed. I would like to block all product pages via robots.txt until we are sure all category pages are indexed, then unblock them. Our product pages rarely change, no ratings or product reviews so there is little reason for a search engine to revisit a product page. The sales team is afraid blocking a previously indexed product page will result in in it being removed from the Google index and would prefer to submit the categories by hand, 10 per day via requested crawling. Which is the better practice?
Intermediate & Advanced SEO | | AspenFasteners1 -
Footer no follow links
Just interested to know when putting links at the foot of the site some people use no-follow tags. I'm thinking about internal pages and social networks. Is this still necessary or is it an old-fashioned idea?
Intermediate & Advanced SEO | | seoman100 -
SEO Best Practices regarding Robots.txt disallow
I cannot find hard and fast direction about the following issue: It looks like the Robots.txt file on my server has been set up to disallow "account" and "search" pages within my site, so I am receiving warnings from the Google Search console that URLs are being blocked by Robots.txt. (Disallow: /Account/ and Disallow: /?search=). Do you recommend unblocking these URLs? I'm getting a warning that over 18,000 Urls are blocked by robots.txt. ("Sitemap contains urls which are blocked by robots.txt"). Seems that I wouldn't want that many urls blocked. ? Thank you!!
Intermediate & Advanced SEO | | jamiegriz0 -
If I block a URL via the robots.txt - how long will it take for Google to stop indexing that URL?
If I block a URL via the robots.txt - how long will it take for Google to stop indexing that URL?
Intermediate & Advanced SEO | | Gabriele_Layoutweb0 -
6 .htaccess Rewrites: Remove index.html, Remove .html, Force non-www, Force Trailing Slash
i've to give some information about my website Environment 1. i have static webpage in the root. 2. Wordpress installed in sub-dictionary www.domain.com/blog/ 3. I have two .htaccess , one in the root and one in the wordpress
Intermediate & Advanced SEO | | NeatIT
folder. i want to www to non on all URLs Remove index.html from url Remove all .html extension / Re-direct 301 to url
without .html extension Add trailing slash to the static webpages / Re-direct 301 from non-trailing slash Force trailing slash to the Wordpress Webpages / Re-direct 301 from non-trailing slash Some examples domain.tld/index.html >> domain.tld/ domain.tld/file.html >> domain.tld/file/ domain.tld/file.html/ >> domain.tld/file/ domain.tld/wordpress/post-name >> domain.tld/wordpress/post-name/ My code in ROOT htaccess is <ifmodule mod_rewrite.c="">Options +FollowSymLinks -MultiViews RewriteEngine On
RewriteBase / #removing trailing slash
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)/$ $1 [R=301,L] #www to non
RewriteCond %{HTTP_HOST} ^www.(([a-z0-9_]+.)?domain.com)$ [NC]
RewriteRule .? http://%1%{REQUEST_URI} [R=301,L] #html
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^([^.]+)$ $1.html [NC,L] #index redirect
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index.html\ HTTP/
RewriteRule ^index.html$ http://domain.com/ [R=301,L]
RewriteCond %{THE_REQUEST} .html
RewriteRule ^(.*).html$ /$1 [R=301,L]</ifmodule> The above code do 1. redirect www to non-www
2. Remove trailing slash at the end (if exists)
3. Remove index.html
4. Remove all .html
5. Redirect 301 to filename but doesn't add trailing slash at the end0 -
Wildcarding Robots.txt for Particular Word in URL
Hey All, So I know that this isn't a standard robots.txt, I'm aware of how to block or wildcard certain folders but I'm wondering whether it's possible to block all URL's with a certain word in it? We have a client that was hacked a year ago and now they want us to help remove some of the pages that were being autogenerated with the word "viagra" in it. I saw this article and tried implementing it https://builtvisible.com/wildcards-in-robots-txt/ and it seems that I've been able to remove some of the URL's (although I can't confirm yet until I do a full pull of the SERPs on the domain). However, when I test certain URL's inside of WMT it still says that they are allowed which makes me think that it's not working fully or working at all. In this case these are the lines I've added to the robots.txt Disallow: /*&viagra Disallow: /*&Viagra I know I have the solution of individually requesting URL's to be removed from the index but I want to see if anybody has every had success with wildcarding URL's with a certain word in their robots.txt? The individual URL route could be very tedious. Thanks! Jon
Intermediate & Advanced SEO | | EvansHunt0 -
Removing index.php
I have question for the community and whether or not this is a good or bad idea. I currently have a Joomla site that displays www.domain.com/index.php in all the URLs with the exception of the home page. I have read that it's better to not have index.php showing in the URL at all. Does it really matter if I have index.php in my URL? I've read that it is a bad practice. I am thinking about installing the sh404SEF component on my site and removing the index.php. However, I rank pretty high for the keywords I want in Google, Bing and Yahoo. All of the URLs that show up in the searches have index.php as part of the URL. Has anyone ever used sh404SEF to remove the index.php and how did you overcome not loosing your search engine links? I don't want an existing search showing www.domain.com/index.php/sales and it not linking to the correct page which would now be www.domain.com/sales. I guess I could insert the proper redirects in the htaccess file. But I was hoping to avoid having every page of my site in the htaccess file for redirecting. Any help or advice appreciated.
Intermediate & Advanced SEO | | MedGroupMedia0 -
Outranking a crappy outdated site with domain age & keywords in URL.
I'm trying to outrank a website with the following: Website with #1 ranking for a search query with "City & Brand" Domain Authority - 2 Domain Age - 11 years & 9 months old Has both the City & brand in the URL name. The site is crap, outdated.. probably last designed in the 90's, old layouts, not a lot of content & NO keywords in the titles & descriptions on all pages. My site ranks 5th for the same keyword.. BEHIND 4 pages from the site described above. Domain Authority - 2 Domain Age - 4 years & 2 months old Has only the CITY in the URL. Brand new site design this past year, new content & individual keywords in the titles, descriptions on each page. My main question is.... do you think it would be be beneficial to buy a new domain name with the BRAND in the URL & CITY & 301 redirect my 4 year old domain to the new domain to pass along the authority it has gained. Will having the brand in the URL make much of a difference? Do you think that small step would even help to beat the crappy but old site out? Thanks for any help & suggestions on how to beat this old site or at least show up second.
Intermediate & Advanced SEO | | DCochrane0