Robots.txt advice
-
Hey Guys,
Have you ever seen code like this in a robots.txt file? I have never seen a noindex rule in a robots.txt file before. Have you?
user-agent: AhrefsBot
User-agent: trovitBot
User-agent: Nutch
User-agent: Baiduspider
Disallow: /
User-agent: *
Disallow: /WebServices/
Disallow: /*?notfound=
Disallow: /?list=
Noindex: /?*list=
Noindex: /local/
Disallow: /local/
Noindex: /handle/
Disallow: /handle/
Noindex: /Handle/
Disallow: /Handle/
Noindex: /localsites/
Disallow: /localsites/
Noindex: /search/
Disallow: /search/
Noindex: /Search/
Disallow: /Search/
Disallow: ?
Any pointers? -
I've never seen this, and I doubt it's of any use, as it isn't one of the directives any search engine recommends. I don't think it would have any impact on what search engine robots look at, since it's not a directive in the robots.txt documentation.
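To see how a crawler that implements only the standard rules treats this file, here is a minimal sketch using Python's standard-library `urllib.robotparser` as a stand-in (it is not Google's parser, and it does not interpret `*` wildcards). RFC 9309, which standardises robots.txt, defines only `user-agent`, `allow` and `disallow` as rules, and Google announced that from 1 September 2019 it stopped honouring `noindex` in robots.txt. The excerpt below splits the fused `Disallow: /User-agent: *` line in two and leaves out `Disallow: ?`; `www.example.com` is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# Excerpt of the file from the question: "Disallow: /User-agent: *" is split
# into two lines and "Disallow: ?" is left out. The domain is a placeholder.
ROBOTS_TXT = """\
User-agent: AhrefsBot
User-agent: Baiduspider
Disallow: /
User-agent: *
Disallow: /WebServices/
Disallow: /?list=
Noindex: /?*list=
Noindex: /local/
Disallow: /local/
Noindex: /search/
Disallow: /search/
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def strip_noindex(text: str) -> str:
    """Remove every Noindex line (field names are case-insensitive)."""
    return "\n".join(
        line for line in text.splitlines()
        if not line.strip().lower().startswith("noindex:")
    )


with_noindex = build_parser(ROBOTS_TXT)
without_noindex = build_parser(strip_noindex(ROBOTS_TXT))

urls = [
    "https://www.example.com/",
    "https://www.example.com/local/shops",
    "https://www.example.com/search/widgets",
    "https://www.example.com/?list=1",
]
for url in urls:
    # The parser silently skips Noindex lines, so both columns match.
    print(url,
          with_noindex.can_fetch("Googlebot", url),
          without_noindex.can_fetch("Googlebot", url))

# A group containing nothing but Noindex blocks nothing at all.
only_noindex = build_parser("User-agent: *\nNoindex: /private/\n")
print(only_noindex.can_fetch("Googlebot", "https://www.example.com/private/page"))
```

Every URL gets the same answer with and without the `Noindex` lines; only the `Disallow` lines change what may be crawled.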
-
The best I could find was this:
Unlike disallowed pages, noindexed pages don’t end up in the index and therefore won’t show in search results. Combine both in robots.txt to optimise your crawl efficiency: the noindex will stop the page showing in search results, and the disallow will stop it being crawled
From: https://www.deepcrawl.com/blog/best-practice/robots-txt-noindex-the-best-kept-secret-in-seo/
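The catch with combining the two is ordering. A crawler that obeys `Disallow` never requests the URL, so it never sees a noindex set on the page itself, whether as a `<meta name="robots">` tag or an `X-Robots-Tag` response header, and that page-level noindex is what remains now that robots.txt `Noindex` is unsupported. A minimal sketch of that decision, where `indexing_outcome` is a hypothetical helper and the response headers are passed in as a plain dict rather than fetched:

```python
from urllib.robotparser import RobotFileParser


def indexing_outcome(robots_txt: str, url: str, headers: dict[str, str],
                     agent: str = "Googlebot") -> str:
    """Report what a robots.txt-compliant crawler can learn about `url`.

    `headers` stands in for the page's HTTP response headers, which a real
    crawler only receives if robots.txt lets it fetch the page.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    if not parser.can_fetch(agent, url):
        # Never fetched, so a noindex on the page is never seen.
        return "blocked: page-level noindex cannot be seen"
    # Header names are case-insensitive and directives are comma-separated.
    # Simplification: agent-prefixed values like "googlebot: noindex" are
    # not handled.
    directives = {
        part.strip().lower()
        for name, value in headers.items()
        if name.lower() == "x-robots-tag"
        for part in value.split(",")
    }
    # "none" is shorthand for "noindex, nofollow".
    if "noindex" in directives or "none" in directives:
        return "crawled: noindex seen"
    return "crawled: indexable"


robots = "User-agent: *\nDisallow: /search/\n"
noindex_header = {"X-Robots-Tag": "noindex"}
print(indexing_outcome(robots, "https://www.example.com/search/widgets", noindex_header))
print(indexing_outcome(robots, "https://www.example.com/help", noindex_header))
```

The robots.txt check comes first on purpose: a crawler has to be allowed to fetch a page before it can read anything on it.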
Related Questions
-
Advice needed on canonical paginated pages
Hi there. I use Genesis and StudioPress themes. I recently noticed that the canonical link for blog pages points to the first page on all paginated pages, which I understand is an SEO no-no. I found some code here that adds a unique canonical link to each paginated page, but only for categories. It works fine. I only have one category for my site. My question is: is there a downside (or even upside) to not having a blog page and instead placing a link to my category page in the navigation bar, using the category page as the blog page? It looks good and works. What do you think? I find it odd that this seems to be an issue across the Internet and that the only solution that comes up relies on the Yoast plugin, which I don't want to use (I don't want to use a plugin for SEO). Thanks in advance.
Intermediate & Advanced SEO | | Nobody16165422281340 -
If my website does not have a robots.txt file, does it hurt my website's ranking?
After a site audit, I found out that my website doesn't have a robots.txt file. Does that hurt my website's rankings? One more thing: when I type mywebsite.com/robot.txt, it automatically redirects to the homepage. Please help!
Intermediate & Advanced SEO | | binhlai0 -
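On the missing file: crawlers request `/robots.txt` (plural) at the root of each host, so a redirect on `/robot.txt` is not what they see. RFC 9309 spells out how to treat the response: a 4xx such as 404 means the crawler may access anything, a 5xx means it must assume complete disallow, and redirects should be followed for at least five hops. The hypothetical `robots_policy` helper below encodes those rules on top of Python's `urllib.robotparser` (it sets the same `allow_all`/`disallow_all` flags that the parser's own `read()` uses). If `/robots.txt` itself redirected to an HTML homepage, a crawler following RFC 9309 would parse that page as robots.txt, find no valid rules and therefore allow everything.

```python
from urllib.robotparser import RobotFileParser


def robots_policy(status: int, body: str = "") -> RobotFileParser:
    """Build a parser from the final robots.txt response (RFC 9309, 2.3.1).

    `status` and `body` describe the response at the end of any redirect
    chain. Hypothetical helper; it does not fetch anything itself.
    """
    parser = RobotFileParser()
    if 200 <= status < 300:
        # Parse whatever came back; lines that are not rules are ignored.
        parser.parse(body.splitlines())
    elif 400 <= status < 500:
        # "Unavailable" (e.g. 404): the crawler MAY access everything.
        parser.allow_all = True
    elif 500 <= status < 600:
        # "Unreachable": the crawler MUST assume complete disallow.
        parser.disallow_all = True
    else:
        raise ValueError(f"unexpected final status: {status}")
    return parser


homepage_html = "<!doctype html>\n<html><body><h1>Welcome</h1></body></html>"
page = "https://www.example.com/some-page"
for status, body in [(404, ""), (503, ""), (200, homepage_html)]:
    print(status, robots_policy(status, body).can_fetch("Googlebot", page))
```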
Syndicated content with meta robots 'noindex, nofollow': safe?
Hello, I manage, with a dedicated team, the development of a big news portal with thousands of unique articles. To expand our audience, we syndicate content to a number of partner websites. They can publish some of our articles, as long as (1) they put a rel=canonical in their duplicated article, pointing to our original article, OR (2) they put a meta robots 'noindex, follow' in their duplicated article + a dofollow link to our original article. A new prospective partner wants to follow a different path: republish the articles with a meta robots 'noindex, nofollow' in each duplicated article + a dofollow link to our original article. This is because he doesn't want to pass PageRank/link authority to our website (as it is not explicitly included in the contract). In terms of visibility we'd have some advantages with this partnership (even without link authority to our site), so I would accept. My question is: considering that the partner website is much more authoritative than ours, could this approach damage the ranking of our articles in some way? I know that the duplicated articles published on the partner website wouldn't be indexed (because of the meta robots noindex, nofollow), but Google's crawler could still reach them. And since they have no rel=canonical and the link to our original article wouldn't be followed, I don't know if this may cause confusion about the original source of the articles. In your opinion, is this approach safe from an SEO point of view? Do we have to take some measures to protect our content? I hope I explained myself well; any help would be much appreciated. Thank you,
Intermediate & Advanced SEO | | Fabio80
Fab0 -
Robots.txt and redirected backlinks
Hey there, since a client's global website has a very complex structure, which led to big duplicate content problems, we decided to disallow crawler access and instead allow access to only a few relevant subdirectories. While indexing has improved since then, I was wondering if we might have cut off link juice. Several backlinks point to the disallowed root directory and are redirected (301) from there to the allowed directory, so could this cause any problems? Example: a backlink points to example.com (disallowed in robots.txt) and is redirected from there to example.com/uk/en (allowed in robots.txt). Would this cut off the link juice? Thanks a lot for your thoughts on this. Regards, Jochen
Intermediate & Advanced SEO | | Online-Marketing-Guy0 -
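Whether the redirect is even seen depends on whether a crawler may request the disallowed URL. Under RFC 9309 the most specific (longest) matching rule wins, and `allow` wins a tie. The sketch below implements that precedence for plain path prefixes only (no `*` or `$` wildcards); the two rules are a hypothetical reconstruction of the setup described, since the actual file isn't shown. Because `/` is matched only by the `Disallow`, a compliant crawler never requests `example.com/` and so never receives the 301 from it.

```python
from urllib.parse import urlsplit

# Hypothetical rules modelled on the question: block everything, then
# re-allow one subdirectory. Each entry is (is_allow, path_prefix).
RULES = [
    (False, "/"),
    (True, "/uk/en/"),
]


def is_allowed(url: str, rules: list[tuple[bool, str]]) -> bool:
    """Longest-match precedence from RFC 9309; allow wins ties.

    Plain prefixes only: '*' and '$' are not interpreted.
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    best_len, allowed = -1, True  # no matching rule means allowed
    for allow, prefix in rules:
        if path.startswith(prefix):
            if len(prefix) > best_len or (len(prefix) == best_len and allow):
                best_len, allowed = len(prefix), allow
    return allowed


for url in ["https://example.com", "https://example.com/uk/en/", "https://example.com/uk/en"]:
    print(url, is_allowed(url, RULES))
```

Note that `/uk/en` without the trailing slash does not match an `/uk/en/` prefix, so the exact redirect target matters.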
Robots.txt help
Hi Moz Community, Google is indexing some developer pages from a previous website of the company where I currently work: ddcblog.dev.examplewebsite.com/categories/sub-categories I was wondering how to include these in a robots.txt file so they no longer appear on Google. Can I do it under our homepage's GWT account, or do I have to set up a separate account for these URL types? As always, your expertise is greatly appreciated, -Reed
Intermediate & Advanced SEO | | IceIcebaby0 -
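robots.txt rules apply only to the scheme, host and port the file is served from, so the file on the main site does not cover `ddcblog.dev.examplewebsite.com`; that host would need its own `/robots.txt`. Bear in mind too that a robots.txt block stops crawling but does not by itself remove URLs that are already indexed. A minimal sketch that derives the robots.txt URL a crawler would request for a given page; the `https` scheme and the `www.examplewebsite.com` host are assumptions, since the question gives neither.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL that governs `page_url`.

    The file always lives at /robots.txt on the same scheme, host and port.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


for page in [
    "https://ddcblog.dev.examplewebsite.com/categories/sub-categories",
    "https://www.examplewebsite.com/categories/sub-categories",
]:
    print(robots_txt_url(page))
```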
Should comments and feeds be disallowed in robots.txt?
Hi, my robots file is currently set up as listed below. From an SEO point of view, is it good to disallow feeds, RSS and comments? I feel allowing comments would be a good thing, because it's new content that may rank in the search engines: the comments left on my blog often refer to questions or companies folks are searching for more information on, and they are added regularly. What's your take? I'm also concerned about /page/ being blocked; I'm not sure how that benefits my blog from an SEO point of view either. I look forward to your feedback. Thanks. Eddy
User-agent: Googlebot
Crawl-delay: 10
Allow: /*

User-agent: *
Crawl-delay: 10
Disallow: /wp-
Disallow: /feed/
Disallow: /trackback/
Disallow: /rss/
Disallow: /comments/feed/
Disallow: /page/
Disallow: /date/
Disallow: /comments/
# Allow Everything
Allow: /*
Intermediate & Advanced SEO | | workathomecareers0 -
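One thing worth checking in this file: a crawler obeys only the group that matches its user agent most specifically, so Googlebot follows the `User-agent: Googlebot` group (`Allow: /*`) and ignores every `Disallow` under `User-agent: *`; only other bots see those rules. Google also does not support `Crawl-delay`. Python's `urllib.robotparser` chooses groups by simpler rules (first matching group, `*` last) but reaches the same result for this file; `Bingbot` is just an example of another agent, and `www.example.com` is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# The file from the question, one directive per line.
ROBOTS_TXT = """\
User-agent: Googlebot
Crawl-delay: 10
Allow: /*

User-agent: *
Crawl-delay: 10
Disallow: /wp-
Disallow: /feed/
Disallow: /trackback/
Disallow: /rss/
Disallow: /comments/feed/
Disallow: /page/
Disallow: /date/
Disallow: /comments/
# Allow Everything
Allow: /*
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for path in ["/feed/", "/comments/", "/page/2/"]:
    url = "https://www.example.com" + path
    # Googlebot matches its own group; Bingbot falls through to "*".
    print(path, parser.can_fetch("Googlebot", url), parser.can_fetch("Bingbot", url))
```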
Robots
I have just noticed this in my code: <meta name="robots" content="noindex"> I have also noticed that some of my keywords have dropped. Could this be the reason?
Intermediate & Advanced SEO | | Paul780 -
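`noindex` asks search engines to keep the page out of their index, so it is worth confirming exactly which pages carry the tag. A minimal sketch with Python's standard-library `html.parser` that reports the robots meta directives found in a page's HTML; the sample markup is hypothetical, fetching pages is left out, and crawler-specific tags such as `name="googlebot"` are deliberately ignored.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # html.parser lowercases tag and attribute names; values may be None.
        attr = {name: (value or "") for name, value in attrs}
        # Crawler-specific names such as "googlebot" are ignored here.
        if attr.get("name", "").lower() == "robots":
            self.directives.update(
                d.strip().lower()
                for d in attr.get("content", "").split(",")
                if d.strip()
            )


def robots_directives(html: str) -> set[str]:
    finder = RobotsMetaFinder()
    finder.feed(html)
    finder.close()
    return finder.directives


page = '<html><head><meta name="robots" content="noindex"></head><body></body></html>'
print(robots_directives(page))
```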
Block all search results (dynamic) in robots.txt?
I know that Google does not want to index "search result" pages for a lot of reasons (dup content, dynamic URLs, blah blah). I recently optimized the entire IA of my sites to have search-friendly URLs, which includes search result pages. So, my search result pages changed from: /search?12345&productblue=true&id789 to /product/search/blue_widgets/womens/large. As a result, Google started indexing these pages thinking they were static (no opposition from me :)), but I started getting WMT messages saying they are finding a "high number of urls being indexed" on these sites. Should I just block them altogether, or let it work itself out?
Intermediate & Advanced SEO | | rhutchings0
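If the decision is to block them, the rewritten search URLs now share the `/product/search/` prefix, so a single rule can cover all of them. A minimal sketch that checks such a rule with Python's `urllib.robotparser` before deploying it; the rule is a hypothetical example, `www.example.com` is a placeholder, and `/product/blue-widget` stands for an ordinary product page that must stay crawlable.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule: keep crawlers out of every rewritten search-result URL.
ROBOTS_TXT = """\
User-agent: *
Disallow: /product/search/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# URL -> whether it should remain crawlable.
checks = {
    "https://www.example.com/product/search/blue_widgets/womens/large": False,
    "https://www.example.com/product/blue-widget": True,
}
for url, expected in checks.items():
    allowed = parser.can_fetch("Googlebot", url)
    print(url, "allowed" if allowed else "blocked",
          "OK" if allowed == expected else "UNEXPECTED")
```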