No index tag robots.txt
-
Hi Mozzers,
A client's website has a lot of internal directories defined as /node/*.
I already added the rule 'Disallow: /node/*' to the robots.txt file to prevent bots from crawling these pages.
However, the pages are already indexed and appear in the search results.
In an article by DeepCrawl, they say you can simply add the rule 'Noindex: /node/*' to the robots.txt file, but other sources claim the only way is to add a noindex directive in the meta robots tag of every page.
Can someone tell me which is the best way to prevent these pages from getting indexed? Small note: there are more than 100 pages.
Thanks!
Jens
-
Hi Jens
I don't know Drupal, but if it's like WordPress it will add a noindex tag to the page.
Do it for one page, then take a look at the code.
Go to the page: right click > View Source
Then go to the three dots at the top right in Chrome and search for noindex. It will look like the attached screenshot (ignore the piece crossed out in red).
Best Regards Nigel
-
Hi Guys,
In Drupal, among the advanced tags (meta tags), there is an option:
'Prevents search engines from indexing this page'
Do you happen to know whether these tags are seen as valid by search bots?
Thanks again guys!
-
For the sake of balance, probably worth mentioning that I'm with David in that I've seen a robots.txt noindex work. It has been relatively recently used by a large publisher when they had an article they had to take down but which Google was holding on to. That's irrelevant nuance in this situation but I think David deserves more credit than he got here.
In terms of this specific fix, I agree with Nigel: remove the Disallow and add a noindex (and prompt Google to recrawl the pages, with a sitemap, if they don't seem to be shifting). You can re-add the Disallow if you think it's necessary, but once all of the appropriate pages have a noindex tag they should stay out of the index. If they are heavily linked to on the site, disallowing them could also result in a loss of link equity (it will stop at the link to the disallowed pages). So if you think you can achieve this with just a noindex, you might want to leave it at that.
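For the "prompt Google with a sitemap" step, here is a minimal Python sketch of a sitemap that lists only the /node/ pages, so they get recrawled and the noindex tag is seen. The function name `build_sitemap` and the example.com URLs are made up for illustration; on a real Drupal site you would probably generate this from the site itself rather than by hand.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML string with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical URLs; replace with the real /node/ pages.
node_pages = [
    "https://example.com/node/1",
    "https://example.com/node/2",
]
sitemap_xml = build_sitemap(node_pages)
```

The resulting string can be saved as an XML file and submitted in Search Console. Once the pages have dropped out of the index, the sitemap can be removed again.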
-
Hi David
I'd rather listen to John Mueller - he has specifically said to not use it:
https://www.seroundtable.com/google-do-not-use-noindex-in-robots-txt-20873.html
I wouldn't be advising people to use it on that basis whether it has worked for you this time or not. It's not best practice.
That's all. (Sorry Jens!)
Regards
Nigel
-
Thanks a lot for your answers guys!
-
Hi Nigel,
I agree that what you said is the best solution in this case, but noindex can definitely be done in robots.txt.
I'm not sure of the questionable sites you've seen it mentioned on, but I'd consider Stone Temple and Deep Crawl to be reputable sources.
That said, I always like to test things for myself!
I tried robots.txt noindex on one of my own big sports news websites a little while ago because I didn't want to manually set thousands of old posts to noindex. The robots.txt noindex worked fine.
Cheers,
David
-
Hi Jens/David
You should not use a noindex in robots.txt. You can put it on the page as a meta robots tag, but not in robots.txt.
I have never seen it used in robots.txt. I have seen it mentioned a few times on some questionable sites, and there was the odd mention many years ago, but it's bad practice in my opinion.
Read more about Robots.txt here: https://moz.com/learn/seo/robotstxt
If you follow what I have said, that is the correct solution.
Regards Nigel
-
Hi Nigel and Jens,
Just to clarify - noindex is valid in robots.txt for Google but it's not recognized by Bing.
Here's a case study by Stone Temple on using noindex in robots.txt: https://www.stonetemple.com/does-google-respect-robots-txt-noindex-and-should-you-use-it/
From their case study, it was found to be pretty effective, but not 100%. It would be a good solution for large websites, but if you're only looking at 100+ pages I would do as Nigel said above and implement the meta robots noindex tags.
Cheers,
David
-
Hi Jens
You can't add a noindex in the Robots.txt file.
Firstly, you need to add a noindex tag to all of the pages in the /node/ directory.
Then remove the Disallow rule from robots.txt. You need to do this for Google to see the noindex tags!
If you have a noindex tag and a Disallow, the directory is blocked, so Google can't crawl the pages and see the tags!
Once all the pages have gone from search, add the Disallow back to robots.txt so that Google doesn't waste crawl budget trying to crawl them.
This will solve your problem.
Regards
Nigel
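The steps above hinge on one condition: a noindex tag only helps if robots.txt does not block the page that carries it. Here is a rough Python sketch of that check, using only the standard library. The helper names (`has_noindex`, `is_blocked`, `noindex_is_visible`) and the example.com URL are made up for illustration. The rule is written as `/node/` rather than `/node/*` because Python's `urllib.robotparser` matches paths by prefix and does not expand wildcards.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class _RobotsMetaParser(HTMLParser):
    """Records whether a robots/googlebot meta tag contains 'noindex'."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            directives = [d.strip().lower() for d in attr.get("content", "").split(",")]
            if "noindex" in directives:
                self.noindex = True


def has_noindex(html):
    """True if the page HTML carries a meta robots noindex directive."""
    parser = _RobotsMetaParser()
    parser.feed(html)
    return parser.noindex


def is_blocked(robots_txt, url, agent="Googlebot"):
    """True if the given robots.txt text disallows crawling this URL."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return not rules.can_fetch(agent, url)


def noindex_is_visible(robots_txt, url, html):
    """A crawler can only act on noindex if it is allowed to fetch the page."""
    return has_noindex(html) and not is_blocked(robots_txt, url)


# Hypothetical inputs: the page has the tag, but robots.txt still blocks /node/.
page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
still_blocked = "User-agent: *\nDisallow: /node/\n"
print(noindex_is_visible(still_blocked, "https://example.com/node/123", page))
```

With the Disallow still in place this reports that the tag is not visible to the crawler, which is why the rule has to come out first. In practice you would fetch the live robots.txt and page HTML instead of using inline strings.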