Noindex tag in robots.txt

WeAreDigital_BE

Hi Mozzers,

A client's website has a lot of internal directories defined as /node/*.

I already added the rule 'Disallow: /node/*' to the robots.txt file to prevent bots from crawling these pages.

However, the pages are already indexed and appear in the search results.

A Deepcrawl article says you can simply add the rule 'Noindex: /node/*' to the robots.txt file, but other sources claim the only way is to add a noindex directive in the meta robots tag of every page.

Can someone tell me the best way to prevent these pages from being indexed? A small note: there are more than 100 pages.
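A note on how the `Disallow: /node/*` rule matches: under Google's documented robots.txt matching, `*` stands for any sequence of characters and a trailing `$` anchors the end of the path, so `/node/*` and `/node/` block exactly the same URLs. The sketch below is a minimal, hand-rolled matcher written only to illustrate that; the function names are hypothetical, it ignores `Allow` rules and longest-match precedence, and it is not Google's own parser.

```python
import re


def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters and a trailing '$' anchors the end
    of the path; everything else is a literal prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape regex metacharacters, then turn the escaped '*' back into '.*'
    body = re.escape(pattern).replace(r"\*", ".*")
    return re.compile("^" + body + ("$" if anchored else ""))


def is_disallowed(path: str, disallow_patterns: list[str]) -> bool:
    """True if any Disallow pattern matches the path (Allow rules are ignored here)."""
    # An empty 'Disallow:' value blocks nothing, so skip it
    return any(robots_pattern_to_regex(p).match(path) for p in disallow_patterns if p)


if __name__ == "__main__":
    for path in ["/node/123", "/node/", "/nodes", "/about"]:
        # '/node/*' and '/node/' give the same answer for every path
        print(path, is_disallowed(path, ["/node/*"]), is_disallowed(path, ["/node/"]))
```

Note that `/nodes` is not blocked, because the pattern requires the slash after `node`.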

Thanks!
Jens

Nigel_Carr

Hi Jens

I don't know Drupal, but if it's like WordPress, it will let you add a noindex tag to the page.

Do it for one page then take a look at the code.

Go to the page, then right-click > View page source.

Then open the three-dot menu at the top right in Chrome, choose Find, and search for noindex. It will look like the attached screenshot (ignore the piece crossed out in red).
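With 100+ pages, the same check can be scripted instead of searching View Source by hand. A minimal sketch using Python's standard `html.parser`, assuming you already have each page's HTML as a string (fetching is left out); the class and function names are illustrative. It looks at `<meta name="robots">` and `<meta name="googlebot">` tags and reports whether any of them carries `noindex`.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: list[str] = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names; values keep their case
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").strip().lower() in ("robots", "googlebot"):
            # content is a comma-separated list such as "noindex, follow"
            for directive in attr.get("content", "").split(","):
                self.directives.append(directive.strip().lower())


def has_noindex(html: str) -> bool:
    """True if the page's meta robots tags include a noindex directive."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    finder.close()
    return "noindex" in finder.directives


if __name__ == "__main__":
    page = '<head><meta name="robots" content="noindex, follow" /></head>'
    print(has_noindex(page))
```

Self-closing `<meta ... />` tags are handled too, since `HTMLParser` routes them through `handle_starttag` by default.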

Best Regards Nigel


WeAreDigital_BE

Hi Guys,

In Drupal, among the advanced tags (meta tags), there is an option:
'Prevents search engines from indexing this page'

Do you happen to know whether these tags are recognized as valid by search bots?

Thanks again guys!

R0bin_L0rd

For the sake of balance, it's probably worth mentioning that I'm with David: I've seen a robots.txt noindex work. It was used relatively recently by a large publisher that had to take down an article Google was holding on to. That's an irrelevant nuance in this situation, but I think David deserves more credit than he got here.

In terms of this specific fix, I agree with Nigel: remove the Disallow and add a noindex (then prompt Google to crawl the pages, with a sitemap if they don't seem to be shifting). You can re-add the Disallow if you think it's necessary, but once all of the appropriate pages have a noindex tag they should stay out of the index. Also, if they are heavily linked to on the site, disallowing them could result in a loss of link equity, because the equity stops at the links pointing to the disallowed pages. So if you think you can achieve this with just a noindex, you might want to leave it at that.
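The "prompt Google with a sitemap" step can be scripted. A minimal sketch, assuming you have a list of the affected /node/ URLs (the example.com URLs below are placeholders) and that setting `<lastmod>` to the date the noindex tags went live is a sensible hint; Google treats `<lastmod>` as a hint, not a command. It writes a standard sitemaps.org `urlset` that you would then submit in Search Console.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls: list[str], lastmod: date) -> str:
    """Return a sitemaps.org <urlset> document listing each URL with a <lastmod> date."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc in urls:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc  # ElementTree escapes &, < and >
        ET.SubElement(url, "lastmod").text = lastmod.isoformat()  # W3C date, e.g. 2024-01-15
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical node URLs; in practice, list the real /node/ pages that now carry noindex
    nodes = [f"https://www.example.com/node/{n}" for n in (1, 2, 3)]
    print(build_sitemap(nodes, date.today()))
```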

Nigel_Carr

Hi David

I'd rather listen to John Mueller - he has specifically said to not use it:

https://www.seroundtable.com/google-do-not-use-noindex-in-robots-txt-20873.html

I wouldn't be advising people to use it on that basis whether it has worked for you this time or not. It's not best practice.

That's all. (Sorry Jens!)

Regards

Nigel

WeAreDigital_BE

Thanks a lot for your answers guys!

davebuts

Hi Nigel,

I agree that what you said is the best solution in this case, but noindex can definitely be done in robots.txt.

I'm not sure which questionable sites you've seen it mentioned on, but I'd consider Stone Temple and Deep Crawl to be reputable sources.

That said, I always like to test things for myself!

I tried robots.txt noindex on one of my own big sports news websites a little while ago because I didn't want to manually set thousands of old posts to noindex. The robots.txt noindex worked fine.

Cheers,

David

Nigel_Carr

Hi Jens/David

You should not use a noindex in robots.txt. You can put it on the page as a meta robots tag, but not in robots.txt.

I have never ever seen it used in robots.txt. I have seen it mentioned a few times on some questionable sites, plus the odd mention many years ago, but it's bad practice in my opinion.

Read more about Robots.txt here: https://moz.com/learn/seo/robotstxt

If you follow what I have said above, you will have the correct solution.

Regards Nigel

davebuts

Hi Nigel and Jens,

Just to clarify - noindex is valid in robots.txt for Google but it's not recognized by Bing.

Here's a case study by Stone Temple on using noindex in robots.txt: https://www.stonetemple.com/does-google-respect-robots-txt-noindex-and-should-you-use-it/

From their case study, it was found to be pretty effective, but not 100%. It would be a good solution for large websites, but if you're only looking at 100+ pages I would do as Nigel said above and implement the meta robots noindex tags.
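Worth adding for anyone reading this later: in July 2019 Google announced that, from September 1, 2019, it no longer honours unsupported robots.txt rules such as `noindex`, and its robots.txt documentation lists `user-agent`, `allow`, `disallow` and `sitemap` as the supported fields. A minimal lint sketch that flags anything outside that set, such as the `Noindex: /node/*` line from the question; the function name is hypothetical and this is not an official validator.

```python
# Fields Google documents as supported in robots.txt; anything else is ignored by Googlebot
GOOGLE_SUPPORTED_FIELDS = {"user-agent", "allow", "disallow", "sitemap"}


def unsupported_rules(robots_txt: str) -> list[tuple[int, str, str]]:
    """Return (line number, field, value) for each rule Google does not support."""
    findings = []
    for lineno, raw in enumerate(robots_txt.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if ":" not in line:
            continue  # blank or malformed line
        field, value = (part.strip() for part in line.split(":", 1))
        if field.lower() not in GOOGLE_SUPPORTED_FIELDS:
            findings.append((lineno, field, value))
    return findings


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow: /node/*\nNoindex: /node/*\n"
    for lineno, field, value in unsupported_rules(robots):
        print(f"line {lineno}: '{field}: {value}' is not a supported robots.txt rule for Google")
```

Other crawlers may still read fields Google ignores (Bing, for example, honours `crawl-delay`), so a finding here means "Google will ignore this", not necessarily "delete this".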

Cheers,

David

Nigel_Carr

Hi Jens

You can't add a noindex in the Robots.txt file.

Firstly, you need to add a noindex tag to all of the pages in the /node/ directory.
Then remove the Disallow directive from robots.txt.

You need to do this for Google to see the noindex tags!

If you have a noindex tag and a Disallow, then the directory is blocked, so Google can't see the tags!

Once all the pages have gone from search, add the Disallow back to robots.txt so that Google doesn't waste crawl budget crawling them.

This will solve your problem.
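The sequence above hinges on one dependency: a noindex tag only takes effect once Google can crawl the page, so the Disallow has to stay off until the pages have dropped out. A minimal sketch of that decision logic, with hypothetical names; how you fill in `still_indexed` (for example from Search Console) is up to you, and the final step is marked optional to reflect R0bin's point about link equity.

```python
from dataclasses import dataclass


@dataclass
class PageStatus:
    path: str
    has_noindex_tag: bool  # meta robots noindex present in the served HTML
    still_indexed: bool    # still appearing in Google's results (you check this yourself)


def next_step(pages: list[PageStatus]) -> str:
    """Suggest the next step of the noindex-then-Disallow sequence."""
    missing = [p.path for p in pages if not p.has_noindex_tag]
    if missing:
        # Google has to be able to crawl a page to see its noindex tag
        return f"Add noindex to {len(missing)} page(s) and keep /node/ crawlable"
    if any(p.still_indexed for p in pages):
        return "Keep /node/ crawlable until Google recrawls and drops the remaining pages"
    return "All pages deindexed: Disallow: /node/ can be re-added (optional)"


if __name__ == "__main__":
    pages = [PageStatus("/node/1", True, True), PageStatus("/node/2", False, True)]
    print(next_step(pages))
```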

Regards

Nigel
