Robots.txt advice
-
Hey Guys,
Have you ever seen coding like this in a robots.txt, I have never seen a noindex rule in a robots.txt file before - have you?
user-agent: AhrefsBot
User-agent: trovitBot
User-agent: Nutch
User-agent: Baiduspider
Disallow: /User-agent: *
Disallow: /WebServices/
Disallow: /*?notfound=
Disallow: /?list=
Noindex: /?*list=
Noindex: /local/
Disallow: /local/
Noindex: /handle/
Disallow: /handle/
Noindex: /Handle/
Disallow: /Handle/
Noindex: /localsites/
Disallow: /localsites/
Noindex: /search/
Disallow: /search/
Noindex: /Search/
Disallow: /Search/
Disallow: ?I have never seen a noindex rule in a robots.txt file before - have you?
Any pointers? -
Never seen this, doubt it's any useful as this isn't part of any search engines recommended statements to use. I don't think this would have any impact on what search engine robots would look at as it's not a statement in the robots.txt documentation.
-
Best I could find was-
Unlike disallowed pages, noindexed pages don’t end up in the index and therefore won’t show in search results. Combine both in robots.txt to optimise your crawl efficiency: the noindex will stop the page showing in search results, and the disallow will stop it being crawled
From-https://www.deepcrawl.com/blog/best-practice/robots-txt-noindex-the-best-kept-secret-in-seo/
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Probably basic, but how to use image Title and Alt Text - and confusing advice from Moz!
I've been doing SEO on my business's site for years and have got good results. I've always used image Titles and Alt Text text. Our blog posts are image-intensive, often with 100-200 pictures (not surprising since we're photographers). For any given blog post, I've tended to have a uniform image Title for each image and then a more specialised Alt Text tag giving a description. A typical image on one of our blog posts would be like this: Image filename: wedding-photography-at-so-and-so-venue-001.jpg .... 002, 003 etc Image Title Attribute: Wedding Photography at So-And-So-Venue by Our-Company-Name - this would be the same for every image in the blog post. Alternative Text: Bride and groom exchanging vows during wedding ceremony at so-and-so-venue - this would be tailed for each image. So my question is - is this right? The Moz help page for image SEO is actually incorrect in one aspect: https://moz.com/ugc/10-tips-for-optimizing-your-images-for-search "Alt text (short for “alternative text”) is used to highlight the identity of an image when you hover over it with your mouse cursor. It also shows as text to all users when there are problems rendering the image." This is not the case. Hovering over the image in Firefox, Chrome, Edge and Opera ALL display the Image Title, NOT Alt Text. Thoughts?
Intermediate & Advanced SEO | | robandsarahgillespie0 -
Robots blocked by pages webmasters tools
a mistake made in software. How can I solve the problem quickly? help me. XTRjH
Intermediate & Advanced SEO | | mihoreis0 -
Should I disallow all URL query strings/parameters in Robots.txt?
Webmaster Tools correctly identifies the query strings/parameters used in my URLs, but still reports duplicate title tags and meta descriptions for the original URL and the versions with parameters. For example, Webmaster Tools would report duplicates for the following URLs, despite it correctly identifying the "cat_id" and "kw" parameters: /Mulligan-Practitioner-CD-ROM
Intermediate & Advanced SEO | | jmorehouse
/Mulligan-Practitioner-CD-ROM?cat_id=87
/Mulligan-Practitioner-CD-ROM?kw=CROM Additionally, theses pages have self-referential canonical tags, so I would think I'd be covered, but I recently read that another Mozzer saw a great improvement after disallowing all query/parameter URLs, despite Webmaster Tools not reporting any errors. As I see it, I have two options: Manually tell Google that these parameters have no effect on page content via the URL Parameters section in Webmaster Tools (in case Google is unable to automatically detect this, and I am being penalized as a result). Add "Disallow: *?" to hide all query/parameter URLs from Google. My concern here is that most backlinks include the parameters, and in some cases these parameter URLs outrank the original. Any thoughts?0 -
Robots.txt Question
For our company website faithology.com we are attempting to block out any urls that contain a ? mark to keep google from seeing some pages as duplicates. Our robots.txt is as follows: User-Agent: * Disallow: /*? User-agent: rogerbot Disallow: /community/ Is the above correct? We are wanting them to not crawl any url with a "?" inside, however we don't want to harm ourselves in seo. Thanks for your help!
Intermediate & Advanced SEO | | BMPIRE0 -
About robots.txt for resolve Duplicate content
I have a trouble with Duplicate content and title, i try to many way to resolve them but because of the web code so i am still in problem. I decide to use robots.txt to block contents that are duplicate. The first Question: How do i use command in robots.txt to block all of URL like this: http://vietnamfoodtour.com/foodcourses/Cooking-School/
Intermediate & Advanced SEO | | magician
http://vietnamfoodtour.com/foodcourses/Cooking-Class/ ....... User-agent: * Disallow: /foodcourses ( Is that right? ) And the parameter URL: h
ttp://vietnamfoodtour.com/?mod=vietnamfood&page=2
http://vietnamfoodtour.com/?mod=vietnamfood&page=3
http://vietnamfoodtour.com/?mod=vietnamfood&page=4 User-agent: * Disallow: /?mod=vietnamfood ( Is that right? i have folder contain module, could i use: disallow:/module/*) The 2nd question is: Which is the priority " robots.txt" or " meta robot"? If i use robots.txt to block URL, but in that URL my meta robot is "index, follow"0 -
301 redirect or Robots.txt on an interstatial page
Hey guys, I have an affiliate tracking system that works like this : an affiliate puts up a certain code on his site, for example : www.domain.com/track/aff_id This url leads to a page where the hit is counted, analysed and then 302 redirects to my sales page with the affiliates ID in the url : www.mysalespage.com/?=aff_id. However, we've noticed recently that one affiliate seems to be ranking for our own name and the url google indexed was his tracking url (domain.com/track/aff_id). Which is strange because there is absolutely nothing on that page, its just an interstatial page so that our stats tracking software can properly filter hits. To remove the affiliate's url from showing up in the serps, I've come up with 2 solutions : 1 - Change the redirect to a 301 redirect on his track page. 2 - Change our robots.txt page to block all domain.com/track/ pages from being indexed. My question is : if I 301 redirect instead of 302, will I keep the affiliates from outranking me for my own name AND pass on link juice or should I simply block google from crawling the interstatial tracking pages?
Intermediate & Advanced SEO | | CrakJason0 -
Can I use a "no index, follow" command in a robot.txt file for a certain parameter on a domain?
I have a site that produces thousands of pages via file uploads. These pages are then linked to by users for others to download what they have uploaded. Naturally, the client has blocked the parameter which precedes these pages in an attempt to keep them from being indexed. What they did not consider, was they these pages are attracting hundreds of thousands of links that are not passing any authority to the main domain because they're being blocked in robots.txt Can I allow google to follow, but NOT index these pages via a robots.txt file --- or would this have to be done on a page by page basis?
Intermediate & Advanced SEO | | PapaRelevance0 -
Not using a robot command meta tag
Hi SEOmoz peeps. Was doing some research on robot commands and found a couple major sites that are not using them. If you check out the code for these: http://www.amazon.com http://www.zappos.com http://www.zappos.com/product/7787787/color/92100 http://www.altrec.com/ You fill not find a meta robot command line. Of course you need the line for any noindex, nofollow, noarchive pages. However for pages you want crawled and indexed, is there any benefit for not having the line at all? Thanks!
Intermediate & Advanced SEO | | STPseo0