Does Google respect User-agent rules in robots.txt?
-
We want to use an inline linking tool (LinkSmart) to cross-link between a few key content types on our online news site.
LinkSmart uses a bot to establish the linking.
The issue: There are millions of pages on our site that we don't want LinkSmart to spider and process for cross linking.
LinkSmart suggested setting a noindex tag on the pages we don't want them to process, and that we target the rule to their specific user agent.
I have concerns. We don't want to inadvertently block search engine access to those millions of pages. I've seen Googlebot ignore nofollow rules set at the page level. Does it ever arbitrarily obey rules that are addressed to a different user agent and that it has been directed to ignore?
Can you quantify the level of risk in setting user-agent-specific nofollow tags on pages we want search engines to crawl, but that we want LinkSmart to ignore?
-
Does Google respect User-agent rules in robots.txt?
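Yes. Googlebot follows the robots.txt group addressed to its own user agent and ignores groups addressed to other crawlers.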
I've seen Googlebot ignore nofollow rules set at the page level.
Google honors nofollow rules set at the page level. The issue is that other links, on your site or elsewhere on the web, may point to the same pages, and Google will find and follow those links.
Robots.txt should be the last resort for blocking pages. You should not block a page with robots.txt unless you have exhausted all other options. A more appropriate method of keeping a page out of the index is the noindex tag. If you use the tag correctly, Google will honor it.
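To make the user-agent point concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The user-agent token `LinkSmartBot`, the `/archive/` path, and the example.com URLs are assumptions for illustration only; check with LinkSmart for the token their bot actually sends. Python's parser is not Google's, so treat this as a sanity check of the rule layout rather than a guarantee of how any crawler behaves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: "LinkSmartBot" and "/archive/" are assumed names,
# not values confirmed by LinkSmart or by the original question.
ROBOTS_TXT = """\
User-agent: LinkSmartBot
Disallow: /archive/

User-agent: *
Disallow:
"""


def can_fetch(agent: str, url: str) -> bool:
    """Return True if the rules in ROBOTS_TXT allow `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    url = "https://www.example.com/archive/story-1.html"
    for agent in ("LinkSmartBot", "Googlebot"):
        print(agent, "allowed" if can_fetch(agent, url) else "blocked")
```

With this layout the group addressed to the assumed LinkSmart token blocks only that bot, while every other crawler falls through to the `*` group, whose empty `Disallow:` allows everything.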
-
Hi,
I would advise blocking the directories that contain the files in robots.txt rather than adding noindex tags to specific pages.
However, this would also keep those pages from being indexed by Google and other search engines, as well as from being processed by the LinkSmart software you are referring to.
The thing is, a noindex tag or a robots.txt block that is not limited to a specific user agent will block all search engines too.
So yes, there is some risk involved; you have to be careful in this area.
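For the page-level option, a rough way to check which crawlers a robots meta tag addresses is sketched below with Python's standard `html.parser`. The meta name `linksmartbot` is a hypothetical token, and whether LinkSmart's bot honors a meta tag named after its user agent is an assumption based on their suggestion in the question. Google documents `robots` (all crawlers) and `googlebot` (Google only) as meta names.

```python
from html.parser import HTMLParser


class NamedMetaParser(HTMLParser):
    """Collects the comma-separated content tokens of every named <meta> tag."""

    def __init__(self):
        super().__init__()
        self.directives = {}  # lowercased meta name -> set of lowercased tokens

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        content = attr.get("content") or ""
        if name:
            tokens = {t.strip().lower() for t in content.split(",") if t.strip()}
            self.directives.setdefault(name, set()).update(tokens)


def directives_for(html: str, agent: str) -> set:
    """Directives addressed to `agent`: the generic "robots" tag plus its own tag."""
    parser = NamedMetaParser()
    parser.feed(html)
    generic = parser.directives.get("robots", set())
    specific = parser.directives.get(agent.lower(), set())
    return generic | specific


# Hypothetical page: the tag is addressed only to the assumed LinkSmart token.
PAGE = '<html><head><meta name="linksmartbot" content="noindex"></head><body></body></html>'

if __name__ == "__main__":
    for agent in ("linksmartbot", "googlebot"):
        print(agent, sorted(directives_for(PAGE, agent)))
```

The risk described above shows up if the tag is written as `name="robots"` instead: the directive then lands in the generic set and applies to every crawler that reads it.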
Kind Regards,
James.