Does Google respect User-agent rules in robots.txt?
-
We want to use an inline linking tool (LinkSmart) to cross link between a few key content types on our online news site.
LinkSmart uses a bot to establish the linking.
The issue: There are millions of pages on our site that we don't want LinkSmart to spider and process for cross linking.
LinkSmart suggested setting a noindex tag on the pages we don't want them to process, and that we target the rule to their specific user agent.
I have concerns. We don't want to inadvertently block search engine access to those millions of pages. I've seen Googlebot ignore nofollow rules set at the page level. Does it ever arbitrarily obey rules that it's been directed to ignore?
Can you quantify the level of risk in setting user-agent-specific nofollow tags on pages we want search engines to crawl, but that we want LinkSmart to ignore?
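For reference, here is a rough sketch of what the suggested setup amounts to: a robots-style meta tag whose name is a crawler token rather than the generic "robots". The token "linksmartbot" is hypothetical (LinkSmart's real token would need to come from their documentation), and the helper names are made up for illustration. The sketch only models which directives a crawler would see if it honors name-scoped meta tags; it is not how any particular crawler is implemented.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the content of <meta name=... content=...> tags, keyed by name."""

    def __init__(self):
        super().__init__()
        self.directives = {}  # lowercased meta name -> set of lowercased tokens

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        content = attr.get("content") or ""
        if name:
            tokens = {t.strip().lower() for t in content.split(",") if t.strip()}
            self.directives.setdefault(name, set()).update(tokens)


def directives_for(html: str, crawler: str) -> set:
    """Directives seen by a crawler that honors name-scoped meta tags:
    the generic "robots" tag plus any tag named after the crawler itself."""
    parser = RobotsMetaParser()
    parser.feed(html)
    found = parser.directives
    return found.get("robots", set()) | found.get(crawler.lower(), set())


# Hypothetical page: the noindex rule is addressed to "linksmartbot" only.
PAGE = """<html><head>
<meta name="linksmartbot" content="noindex">
<meta name="description" content="Example story">
</head><body>Story text</body></html>"""

if __name__ == "__main__":
    for crawler in ("linksmartbot", "googlebot"):
        print(crawler, sorted(directives_for(PAGE, crawler)))
```

Under these assumptions, the hypothetical LinkSmart token sees noindex while googlebot sees no directives at all, since nothing on the page is addressed to "robots" or "googlebot".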
-
Does Google respect User-agent rules in robots.txt?
Yes
I've seen googlebot ignore nofollow rules set at the page level.
Google honors nofollow rules set at the page level. The issue is that there may be other links, on your site or elsewhere on the web, that point to the same pages, and Google will find and follow those.
Robots.txt should be the last resort for blocking pages. You should not block a page with robots.txt unless you have exhausted all other options. A more appropriate method of keeping a page out of the index is the noindex tag. If you use the tag appropriately, Google will honor it.
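To make the "Yes" above concrete, here is a minimal sketch of a user-agent-specific robots.txt group, checked with Python's standard-library robots.txt parser. The token "LinkSmartBot", the /archive/ path and the example.com URLs are assumptions for illustration only. Python's parser is not Google's, so treat this as a sanity check of the rule structure rather than a guarantee of how Googlebot behaves.

```python
from urllib.robotparser import RobotFileParser

# "LinkSmartBot" is a hypothetical token; use the one LinkSmart documents.
# The first group applies only to that bot; the second leaves everyone else unrestricted.
ROBOTS_TXT = """\
User-agent: LinkSmartBot
Disallow: /archive/

User-agent: *
Disallow:
"""


def can_fetch(agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    url = "https://www.example.com/archive/2012/story.html"
    for agent in ("LinkSmartBot", "Googlebot"):
        print(agent, can_fetch(agent, url))
```

With these rules, the archive URL is reported as disallowed for the hypothetical LinkSmartBot and allowed for Googlebot, because the Disallow line sits in a group addressed only to the first agent.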
-
Hi,
I would advise blocking the directories these files sit in via robots.txt, rather than adding noindex tags to specific pages.
However, unless the rule is scoped to a particular user agent, that would also keep these pages away from Google and other search engines, not just the LinkSmart software you are referring to.
The thing is, a noindex tag or robots.txt block that applies to all user agents will block all search engines too.
So yes, there is some risk involved; you have to handle this area carefully.
Kind Regards,
James.