No index tag robots.txt
-
Hi Mozzers,
A client's website has a lot of internal directories defined as /node/*.
I already added the rule 'Disallow: /node/*' to the robots.txt file to prevents bots from crawling these pages.
However, the pages are already indexed and appear in the search results.
In an article of Deepcrawl, they say you can simply add the rule 'Noindex: /node/*' to the robots.txt file, but other sources claim the only way is to add a noindex directive in the meta robots tag of every page.
Can someone tell me which is the best way to prevent these pages from getting indexed? Small note: there are more than 100 pages.
Thanks!
Jens -
Hi Jens
I don't know Drupal but if it's like Wordpress it will add a noindex tag to the page.
Do it for one page then take a look at the code.
Go to the page: right click > View Source
Then go to the three dots top right in chrome and search noindex. It will look like this attached. (ignore the red line crossed out piece)
Best Regards Nigel
-
Hi Guys,
In Drupal between the advanced tags (meta tags), there is an option:
' Prevents search engines from indexing this page 'Do you happen to know whether these tags are seen as valid by Searchbots?
Thanks again guys!
-
For the sake of balance, probably worth mentioning that I'm with David in that I've seen a robots.txt noindex work. It has been relatively recently used by a large publisher when they had an article they had to take down but which Google was holding on to. That's irrelevant nuance in this situation but I think David deserves more credit than he got here.
In terms of this specific fix I agree with Nigel - remove the Disallow and add a noindex (prompt Google to crawl the pages, with a sitemap if they don't seem to be shifting). You can re-add the Disallow if you think it's necessary but once all of the appropriate pages have a noindex tag they should stay out of the index and if they are heavily linked to on the site disallowing them could result in a loss of link equity (it'll stop with the link to the disallowed pages). So if you think you can achieve this with just a noindex you might want to leave it at that.
-
Hi David
I'd rather listen to John Mueller - he has specifically said to not use it:
https://www.seroundtable.com/google-do-not-use-noindex-in-robots-txt-20873.html
I wouldn't be advising people to use it on that basis whether it has worked for you this time or not. It's not best practice.
That's all. (Sorry Jens!)
Regards
Nigel
-
Thanks a lot for your answers guys!
-
Hi Nigel,
I agreed that what you said is the best solution in this case but noindex can definitely be done in robots.txt.
I'm not sure of the questionable sites you've seen it mentioned on, but I'd consider Stone Temple and Deep Crawl to be reputable sources.
That said, I always like to test things for myself!
I tried robots.txt noindex on one of my own big sports news websites a little while ago because I didn't want to manually set thousands of old posts to noindex. The robots.txt noindex worked fine.
Cheers,
David
-
Hi Jens/David
You should not use a noindex in Robots.txt. You can put it on the page as a robots tag, but not in Robots.txt
I have never ever seen it used in the Robots.txt - I have seen it mentioned a few times on some questionable sites and the odd mention many years ago but it's bad practice in my opinion.
Read more about Robots.txt here: https://moz.com/learn/seo/robotstxt
If you follow what I have said, that is the correct solution.
Regards Nigel
-
Hi Nigel and Jens,
Just to clarify - noindex is valid in robots.txt for Google but it's not recognized by Bing.
Here's a case study by Stone Temple on using noindex in robots.txt: https://www.stonetemple.com/does-google-respect-robots-txt-noindex-and-should-you-use-it/
From their case study, it was found to be pretty effective, but not 100%. It would be a good solution for large websites, but if you're only looking at 100+ pages I would do as Nigel said above and implement the meta robots noindex tags.
Cheers,
David
-
Hi Jens
You can't add a noindex in the Robots.txt file.
Firstly you need to add a noindex tag to all of the pages in the /node/ directory.
Then remove the nofollow directive in the Robots.txtYou need to do this for Google to see the noindex tags!
If you have a noindex tag and a nofollow then the directory is blocked so Google can't see the tags!
Once all the pages have gone from search then add the nofollow back to the Robots.txt file so that Google doesn't waste crawl budget trying to index them.
This will solve your problem.
Regards
Nigel
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Robots.txt error
Moz Crawler is not able to access the robots.txt due to server error. Please advice on how to tackle the server error.
Technical SEO | | Shanidel0 -
Why is Google not indexing my site?
I'm a bit confused as to why my site just isn't indexing on Google. Even if I type in my brand name, my social channels rank and there's no evidence of my website. I've followed all of the advice I've read and gone into webmaster tools and got the Wordpress yoast plug-in but nothing seems to be making a difference!One thing I've noticed, in Google Webmaster Tools it says "Couldn’t communicate with the DNS server." in site errors. I've called GoDaddy and they said that everything is fine. A bit frustrating. Trying to work out what my next steps should be but feeling a bit lost to be honest! Any help GREATLY appreciated!
Technical SEO | | j1066s0 -
Question about construction of our sitemap URL in robots.txt file
Hi all, This is a Webmaster/SEO question. This is the sitemap URL currently in our robots.txt file: http://www.ccisolutions.com/sitemap.xml As you can see it leads to a page with two URLs on it. Is this a problem? Wouldn't it be better to list both of those XML files as separate line items in the robots.txt file? Thanks! Dana
Technical SEO | | danatanseo0 -
How do I eliminate indexed products?
Please help! We got clobbered by Penguin and are at risk of having to close down after 10 years. We have been trying to figure out why and believe now it might be because of duplicate content. We added 2" inserts in March (over 500): http://www.trophycentral.com/inserts1.html Even though each is a different products, SEOMOZ is saying they are considered duplicate content. Given the timing, we think this might be the cause, even though it is totally legitimate. Question - since these are now indexed and since we can't easily add content quickly, what is the best way to handle this situation? A no-index tag? Is there a way to let Google know that their algorithm is detroying legitimate businesses??
Technical SEO | | trophycentraltrophiesandawards0 -
I use All in one SEO pack for wordpress and i i have 2 meta tags i need to delete them is this the meta description tag ?
add_option("aiosp_post_meta_tags", '', 'All in One SEO Plugin Additional Post Meta Tags', 'yes'); add_option("aiosp_page_meta_tags", '', 'All in One SEO Plugin Additional Post Meta Tags', 'yes'); add_option("aiosp_home_meta_tags", '', 'All in One SEO Plugin Additional Home Meta Tags', 'yes'); add_option("aiosp_do_log", null, 'All in One SEO Plugin write log file', 'yes'); */
Technical SEO | | fhnhockey0880 -
Index Issues with Iframes
I have pages that are being scrapped and displayed in iframes and I wanted to see if anyone could tell me how I could get theses pages to be indexed here is a URL of one of the pages http://coggno.com/onlinetraining/safety-/other/lab-safety-1INde
Technical SEO | | PageOnePowerGang0 -
Robots.txt and robots meta
I have an odd situation. I have a CMS that has a global robots.txt which has the generic User-Agent: *
Technical SEO | | Highland
Allow: / I also have one CMS site that needs to not be indexed ever. I've read in various pages (like http://www.jesterwebster.com/robots-txt-vs-meta-tag-which-has-precedence/22 ) that robots.txt always wins over meta, but I have also read that robots.txt indicates spiderability whereas meta can control indexation. I just want the site to not be indexed. Can I leave the robots.txt as is and still put NOINDEX in the robots meta?0 -
HTML and no index, follow
I’m just learning about HTML and I was wondering can a tag be put into a dynamic HTML page?
Technical SEO | | EricVallee340