No index tag robots.txt
-
Hi Mozzers,
A client's website has a lot of internal directories defined as /node/*.
I already added the rule 'Disallow: /node/*' to the robots.txt file to prevents bots from crawling these pages.
However, the pages are already indexed and appear in the search results.
In an article of Deepcrawl, they say you can simply add the rule 'Noindex: /node/*' to the robots.txt file, but other sources claim the only way is to add a noindex directive in the meta robots tag of every page.
Can someone tell me which is the best way to prevent these pages from getting indexed? Small note: there are more than 100 pages.
Thanks!
Jens -
Hi Jens
I don't know Drupal but if it's like Wordpress it will add a noindex tag to the page.
Do it for one page then take a look at the code.
Go to the page: right click > View Source
Then go to the three dots top right in chrome and search noindex. It will look like this attached. (ignore the red line crossed out piece)
Best Regards Nigel
-
Hi Guys,
In Drupal between the advanced tags (meta tags), there is an option:
' Prevents search engines from indexing this page 'Do you happen to know whether these tags are seen as valid by Searchbots?
Thanks again guys!
-
For the sake of balance, probably worth mentioning that I'm with David in that I've seen a robots.txt noindex work. It has been relatively recently used by a large publisher when they had an article they had to take down but which Google was holding on to. That's irrelevant nuance in this situation but I think David deserves more credit than he got here.
In terms of this specific fix I agree with Nigel - remove the Disallow and add a noindex (prompt Google to crawl the pages, with a sitemap if they don't seem to be shifting). You can re-add the Disallow if you think it's necessary but once all of the appropriate pages have a noindex tag they should stay out of the index and if they are heavily linked to on the site disallowing them could result in a loss of link equity (it'll stop with the link to the disallowed pages). So if you think you can achieve this with just a noindex you might want to leave it at that.
-
Hi David
I'd rather listen to John Mueller - he has specifically said to not use it:
https://www.seroundtable.com/google-do-not-use-noindex-in-robots-txt-20873.html
I wouldn't be advising people to use it on that basis whether it has worked for you this time or not. It's not best practice.
That's all. (Sorry Jens!)
Regards
Nigel
-
Thanks a lot for your answers guys!
-
Hi Nigel,
I agreed that what you said is the best solution in this case but noindex can definitely be done in robots.txt.
I'm not sure of the questionable sites you've seen it mentioned on, but I'd consider Stone Temple and Deep Crawl to be reputable sources.
That said, I always like to test things for myself!
I tried robots.txt noindex on one of my own big sports news websites a little while ago because I didn't want to manually set thousands of old posts to noindex. The robots.txt noindex worked fine.
Cheers,
David
-
Hi Jens/David
You should not use a noindex in Robots.txt. You can put it on the page as a robots tag, but not in Robots.txt
I have never ever seen it used in the Robots.txt - I have seen it mentioned a few times on some questionable sites and the odd mention many years ago but it's bad practice in my opinion.
Read more about Robots.txt here: https://moz.com/learn/seo/robotstxt
If you follow what I have said, that is the correct solution.
Regards Nigel
-
Hi Nigel and Jens,
Just to clarify - noindex is valid in robots.txt for Google but it's not recognized by Bing.
Here's a case study by Stone Temple on using noindex in robots.txt: https://www.stonetemple.com/does-google-respect-robots-txt-noindex-and-should-you-use-it/
From their case study, it was found to be pretty effective, but not 100%. It would be a good solution for large websites, but if you're only looking at 100+ pages I would do as Nigel said above and implement the meta robots noindex tags.
Cheers,
David
-
Hi Jens
You can't add a noindex in the Robots.txt file.
Firstly you need to add a noindex tag to all of the pages in the /node/ directory.
Then remove the nofollow directive in the Robots.txtYou need to do this for Google to see the noindex tags!
If you have a noindex tag and a nofollow then the directory is blocked so Google can't see the tags!
Once all the pages have gone from search then add the nofollow back to the Robots.txt file so that Google doesn't waste crawl budget trying to index them.
This will solve your problem.
Regards
Nigel
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Robots.txt error
Moz Crawler is not able to access the robots.txt due to server error. Please advice on how to tackle the server error.
Technical SEO | | Shanidel0 -
Accidental No Index
Hi everyone, We control several client sites at my company. The developers accidentally had a no index robot implemented in the site code when we did the HTTPS upgrade without knowing it (yes it's true). Ten days later we noticed traffic was falling. After a couple days we found the no index tags and removed them and resumbitted the sitemaps. The sites started ranking for their own keywords again within a day or two. The organic traffic is still down considerably and other keywords they are not ranking for in the same spot as they were before or at all. If I look in Google Search console, it says we submitted for example 4,000 URLs and only 160 have been indexed. I feel like maybe Google is taking a long time to re-index to remainder of the sites?? Has anyone has this issue?? We're starting to get very concerned so any input would be appreciate. I read an article on here from 2011 about a company that did the same and they were ranking for their keywords within a week. It's been 8 days since our fix.
Technical SEO | | AliMac260 -
Meta tag Syntax
Hi, This might seem silly. What is the correct syntax for the meta tag used when noindexing webpages? I have "". I have seen it both with and without the forward slash before the greater than sign. Does it make any difference if the forward slash is present or not? Cheers
Technical SEO | | McCaldin0 -
Mobile website indexing
Hi we have a mobile version of our website at mobile.gardening-services-edinburgh.com its been live for 5, maybe 6 months, it has its own mobile-sitemap.xml have tried submitting this sitemap to google and for some reason it does not index these pages any ideas, most welcome
Technical SEO | | McSEO0 -
App Indexing
Can anyone please check if our app is indexed or not? Also check if deep linking done is correct or not rel="alternate" href="android-app://in.instafresh.app/http/www.instafrsh.com/" /> Website - http://instafrsh.com/ App - https://play.google.com/store/apps/details?id=in.instafresh.app
Technical SEO | | Obbserv0 -
Exclude root url in robots.txt ?
Hi, I have the following setup: www.example.com/nl
Technical SEO | | mikehenze
www.example.com/de
www.example.com/uk
etc
www.example.com is 301'ed to www.example.com/nl But now www.example.com is ranking instead of www.example.com/nl
Should is block www.example.com in robots.txt so only the subfolders are being ranked?
Or will i lose my ranking by doing this.0 -
Robots.txt
Hi there, My question relates to the robots.txt file. This statement: /*/trackback Would this block domain.com/trackback and domain.com/fred/trackback ? Peter
Technical SEO | | PeterM220 -
Why Google did not index our domain?
Hi, We launched tmart 60 days ago and submitted to google, bing, yahoo 20 days later. But google had never indexed our website still when yahoo indexed it in one week. What we have checked or tried: 1. We got 20~50 inlinks in one month and now 81 inlinks via yahoo site explorer. 2. This domain has registered for 13 years and we purchased it from sedo last year. We
Technical SEO | | zt673
did not find any problems from domain archive pages. 3. Page similar: the homepage is 50% similar to one of our competitors when we just launched.
So we adjusted the page structure and modified the content one month later and decreased the similarity to 30% (by tools from webconfs.com) 4. Google Robots: googlebot crawled our website every day after we submitted for indexing.
We opened GWT account for it and added the xml sitemap last week. GWT said nothing
was wrong except the time of page loading. Our questions: Why google did not indexed our website? What should we do? Thanks, wu0