No index tag robots.txt
-
Hi Mozzers,
A client's website has a lot of internal directories defined as /node/*.
I already added the rule 'Disallow: /node/*' to the robots.txt file to prevents bots from crawling these pages.
However, the pages are already indexed and appear in the search results.
In an article of Deepcrawl, they say you can simply add the rule 'Noindex: /node/*' to the robots.txt file, but other sources claim the only way is to add a noindex directive in the meta robots tag of every page.
Can someone tell me which is the best way to prevent these pages from getting indexed? Small note: there are more than 100 pages.
Thanks!
Jens -
Hi Jens
I don't know Drupal but if it's like Wordpress it will add a noindex tag to the page.
Do it for one page then take a look at the code.
Go to the page: right click > View Source
Then go to the three dots top right in chrome and search noindex. It will look like this attached. (ignore the red line crossed out piece)
Best Regards Nigel
-
Hi Guys,
In Drupal between the advanced tags (meta tags), there is an option:
' Prevents search engines from indexing this page 'Do you happen to know whether these tags are seen as valid by Searchbots?
Thanks again guys!
-
For the sake of balance, probably worth mentioning that I'm with David in that I've seen a robots.txt noindex work. It has been relatively recently used by a large publisher when they had an article they had to take down but which Google was holding on to. That's irrelevant nuance in this situation but I think David deserves more credit than he got here.
In terms of this specific fix I agree with Nigel - remove the Disallow and add a noindex (prompt Google to crawl the pages, with a sitemap if they don't seem to be shifting). You can re-add the Disallow if you think it's necessary but once all of the appropriate pages have a noindex tag they should stay out of the index and if they are heavily linked to on the site disallowing them could result in a loss of link equity (it'll stop with the link to the disallowed pages). So if you think you can achieve this with just a noindex you might want to leave it at that.
-
Hi David
I'd rather listen to John Mueller - he has specifically said to not use it:
https://www.seroundtable.com/google-do-not-use-noindex-in-robots-txt-20873.html
I wouldn't be advising people to use it on that basis whether it has worked for you this time or not. It's not best practice.
That's all. (Sorry Jens!)
Regards
Nigel
-
Thanks a lot for your answers guys!
-
Hi Nigel,
I agreed that what you said is the best solution in this case but noindex can definitely be done in robots.txt.
I'm not sure of the questionable sites you've seen it mentioned on, but I'd consider Stone Temple and Deep Crawl to be reputable sources.
That said, I always like to test things for myself!
I tried robots.txt noindex on one of my own big sports news websites a little while ago because I didn't want to manually set thousands of old posts to noindex. The robots.txt noindex worked fine.
Cheers,
David
-
Hi Jens/David
You should not use a noindex in Robots.txt. You can put it on the page as a robots tag, but not in Robots.txt
I have never ever seen it used in the Robots.txt - I have seen it mentioned a few times on some questionable sites and the odd mention many years ago but it's bad practice in my opinion.
Read more about Robots.txt here: https://moz.com/learn/seo/robotstxt
If you follow what I have said, that is the correct solution.
Regards Nigel
-
Hi Nigel and Jens,
Just to clarify - noindex is valid in robots.txt for Google but it's not recognized by Bing.
Here's a case study by Stone Temple on using noindex in robots.txt: https://www.stonetemple.com/does-google-respect-robots-txt-noindex-and-should-you-use-it/
From their case study, it was found to be pretty effective, but not 100%. It would be a good solution for large websites, but if you're only looking at 100+ pages I would do as Nigel said above and implement the meta robots noindex tags.
Cheers,
David
-
Hi Jens
You can't add a noindex in the Robots.txt file.
Firstly you need to add a noindex tag to all of the pages in the /node/ directory.
Then remove the nofollow directive in the Robots.txtYou need to do this for Google to see the noindex tags!
If you have a noindex tag and a nofollow then the directory is blocked so Google can't see the tags!
Once all the pages have gone from search then add the nofollow back to the Robots.txt file so that Google doesn't waste crawl budget trying to index them.
This will solve your problem.
Regards
Nigel
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Not all images indexed in Google
Hi all, Recently, got an unusual issue with images in Google index. We have more than 1,500 images in our sitemap, but according to Search Console only 273 of those are indexed. If I check Google image search directly, I find more images in index, but still not all of them. For example this post has 28 images and only 17 are indexed in Google image. This is happening to other posts as well. Checked all possible reasons (missing alt, image as background, file size, fetch and render in Search Console), but none of these are relevant in our case. So, everything looks fine, but not all images are in index. Any ideas on this issue? Your feedback is much appreciated, thanks
Technical SEO | | flo_seo1 -
Wrong title tag
Hi, need help. I notice Google always , on all my pages (about 30), index wrong title tag. If I try to use "my-keywords-here | my_company_name" Google always index "my_company_name: my-keywords-here" and can't figure why is that :(
Technical SEO | | MirkoL
The problem is always only with "my-keywords-here | my_company_name".
If I use "my-keywords-here - my_company_name" or "my-keywords-here my_company_name" (without sign | ) everything is fine
Is anybody having any reasonable explanation?
Is anybody having Joomla page with "my-keywords-here | my_company_name" in the title and have indexed by Google like that? one example is www.ferometal-prerada.hr Thank you1 -
Why are only a few of our pages being indexed
Recently rebuilt a site for an auctioneers, however it has a problem in that none of the lots and auctions are being indexed by Google on the new site, only the pages like About, FAQ, home, contact. Checking WMT shows that Google has crawled all the pages, and I've done a "Fetch as Google" on them and it loads up fine, so there's no crawling issues that is standing out. I've set the "URL Parameters" to no effect too. Also built a sitemap with all the lots in, pushed to Google which then crawled them all (massive spike in Crawl rate for a couple days), and still just indexing a handful of pages. Any clues to look into would be greatly appreciated. https://www.wilkinsons-auctioneers.co.uk/auctions/
Technical SEO | | Blue-shark0 -
Should We Index These Category Pages?
Currently we have marked category pages like http://www.yournextshoes.com/celebrities/kim-kardashian/ as follow/noindex as they essentially do not include any original content. On the other hand, for someone searching for Kim Kardashian shoes, it's a highly relevant page as we provide links to all the Kim Kardashian shoe sightings that we have covered. Should we index the category pages or leave them unindexed?
Technical SEO | | Jantaro0 -
Robots.txt - What is the correct syntax?
Hello everyone I have the following link: http://mywebshop.dk/index.php?option=com_redshop&view=send_friend&pid=39&tmpl=component&Itemid=167 I want to prevent google from indiexing everything that is related to "view=send_friend" The problem is that its giving me dublicate content, and the content of the links has no SEO value of any sort. My problem is how i disallow it correctly via robots.txt I tried this syntax: Disallow: /view=send_friend/ However after doing a crawl on request the 200+ dublicate links that contains view=send_friend is still present in the CSV crawl report. What is the correct syntax if i want to prevent google from indexing everything that is related to this kind of link?
Technical SEO | | teleman0 -
Does Bing ignore robots txt files?
Bonjour from "Its a miracle is not raining" Wetherby Uk 🙂 Ok here goes... Why despite a robots text file excluding indexing to site http://lewispr.netconstruct-preview.co.uk/ is the site url being indexed in Bing bit not Google? Does bing ignore robots text files or is there something missing from http://lewispr.netconstruct-preview.co.uk/robots.txt I need to add to stop bing indexing a preview site as illustrated below. http://i216.photobucket.com/albums/cc53/zymurgy_bucket/preview-bing-indexed.jpg Any insights welcome 🙂
Technical SEO | | Nightwing0 -
Robot.txt pattern matching
Hola fellow SEO peoples! Site: http://www.sierratradingpost.com robot: http://www.sierratradingpost.com/robots.txt Please see the following line: Disallow: /keycodebypid~* We are trying to block URLs like this: http://www.sierratradingpost.com/keycodebypid~8855/for-the-home~d~3/kitchen~d~24/ but we still find them in the Google index. 1. we are not sure if we need to specify the robot to use pattern matching. 2. we are not sure if the format is correct. Should we use Disallow: /keycodebypid*/ or /*keycodebypid/ or even /*keycodebypid~/? What is even more confusing is that the meta robot command line says "noindex" - yet they still show up. <meta name="robots" content="noindex, follow, noarchive" /> Thank you!
Technical SEO | | STPseo0 -
Is this 404 page indexed?
I have a URL that when searched for shows up in the Google index as the first result but does not have any title or description attached to it. When you click on the link it goes to a 404 page. Is it simply that Google is removing it from the index and is in some sort of transitional phase or could there be another reason.
Technical SEO | | bfinternet0