Does Google respect User-agent rules in robots.txt?
-
We want to use an inline linking tool (LinkSmart) to cross-link between a few key content types on our online news site.
LinkSmart uses a bot to establish the linking.
The issue: There are millions of pages on our site that we don't want LinkSmart to spider and process for cross linking.
LinkSmart suggested setting a noindex tag on the pages we don't want them to process, and that we target the rule to their specific user agent.
I have concerns. We don't want to inadvertently block search engine access to those millions of pages. I've seen Googlebot ignore nofollow rules set at the page level. Does it ever arbitrarily obey rules that it's been directed to ignore?
Can you quantify the level of risk in setting user-agent-specific nofollow tags on pages we want search engines to crawl, but that we want LinkSmart to ignore?
-
Does Google respect User-agent rules in robots.txt?
Yes
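As a minimal sketch of how User-agent groups scope robots.txt rules, the example below uses Python's standard urllib.robotparser. The token LinkSmartBot and the /archive/ path are hypothetical placeholders; check LinkSmart's documentation for the user-agent token its crawler actually uses. urllib.robotparser is a simple parser and does not reproduce every detail of Google's own matching (for example, wildcard patterns), so treat this as an illustration of the group logic only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: only the group for "LinkSmartBot" (a placeholder
# token, not confirmed by LinkSmart) disallows /archive/. The "*" group,
# which Googlebot falls back to here, allows everything.
ROBOTS_TXT = """\
User-agent: LinkSmartBot
Disallow: /archive/

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://www.example.com/archive/2015/story.html"
for agent in ("LinkSmartBot", "Googlebot"):
    # can_fetch() applies the first group whose User-agent line matches the
    # crawler's token, falling back to the "*" group if none does.
    print(agent, parser.can_fetch(agent, url))
# Expected output: "LinkSmartBot False", then "Googlebot True"
```

The rule aimed at the named bot leaves Googlebot's access to the same URL unchanged, because Googlebot never matches that group.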
I've seen googlebot ignore nofollow rules set at the page level.
Google honors nofollow rules set at the page level. The issue is that there may be other links, on your site or elsewhere on the web, that Google will find and follow.
Robots.txt should be your absolute last resort for blocking pages. Don't block a page with robots.txt unless you have exhausted all other options. A more appropriate way to keep a page out of the index is the noindex tag; if you use the tag correctly, Google will honor it.
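Google's documentation describes reading the generic robots meta tag plus tags addressed to its own crawlers (for example, name="googlebot"); a meta tag named for a different crawler is not meant for Googlebot. The sketch below mimics that scoping convention with Python's standard html.parser. It is an illustration, not Google's actual parser, and the name linksmartbot is a hypothetical placeholder: whether LinkSmart's crawler honors a meta tag carrying its own name is an assumption to confirm with the vendor.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from meta tags that apply to one crawler."""

    def __init__(self, crawler_name):
        super().__init__()
        # Names that apply to this crawler: the generic "robots" name plus
        # the crawler's own name (e.g. "googlebot").
        self.names = {"robots", crawler_name.lower()}
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        name = (attr_map.get("name") or "").lower()
        if name in self.names:
            content = attr_map.get("content") or ""
            for directive in content.split(","):
                directive = directive.strip().lower()
                if directive:
                    self.directives.add(directive)


def directives_for(html, crawler_name):
    """Return the set of robots directives a given crawler would read."""
    parser = RobotsMetaParser(crawler_name)
    parser.feed(html)
    parser.close()
    return parser.directives


# "linksmartbot" is a hypothetical crawler name used for illustration.
PAGE = """<html><head>
<meta name="linksmartbot" content="noindex, nofollow">
<title>Example story</title>
</head><body>...</body></html>"""

print(sorted(directives_for(PAGE, "linksmartbot")))  # ['nofollow', 'noindex']
print(sorted(directives_for(PAGE, "googlebot")))     # []
```

Under this convention, the risk to search visibility comes from a tag named "robots" (which every compliant crawler reads), not from a tag addressed to one specific bot.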
-
Hi,
I would advise blocking the directories the files sit in with robots.txt, rather than adding noindex tags to specific pages.
However, this would also keep these pages from being crawled by Google, other search engines, and the LinkSmart software you mention.
The thing is, if you add a noindex tag or a robots.txt block to pages, it will block all search engines too.
So yes, there is some risk involved; you have to be careful in this area.
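To illustrate the directory-blocking approach and its reach, the sketch below again uses Python's urllib.robotparser. The directory names /archive/ and /tags/ are hypothetical. It shows that a block placed in the catch-all User-agent: * group applies to every crawler that falls back to that group, Googlebot included.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: directory blocks placed in the catch-all "*" group.
ROBOTS_TXT = """\
User-agent: *
Disallow: /archive/
Disallow: /tags/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

blocked_url = "https://www.example.com/tags/politics/"
for agent in ("Googlebot", "Bingbot", "LinkSmartBot"):
    # No agent-specific group exists, so every crawler uses the "*" rules.
    blocked = not parser.can_fetch(agent, blocked_url)
    print(agent, "blocked" if blocked else "allowed")
```

All three agents print "blocked", while URLs outside the listed directories stay crawlable for everyone.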
Kind Regards,
James.
Related Questions
-
Strange google indexing behaviour
Hi all, I'm looking for a second opinion on a strange issue that has occurred on my site. The site is a Magento store, and because I am using all the default merchant descriptions at the moment, I have noindexed the product pages (there are 300k products; the plan is to rewrite the content as we go, starting with the most popular sellers). Googlebot is blocked from the pages, and all the products have the header tag. We forgot to noindex the popular search terms page on the site, and as a result Google has indexed some search result pages (we may keep this open; not sure yet).
We are seeing a very strange thing in the SERPs. Google has indexed the search result pages, as mentioned above; however, the title and description being used do not belong to that page, they belong to the product page the search result links to. If I search Google for the indexed pages, I get the categories and lots of what appear to be product pages: https://www.google.co.uk/search?q=site:arropa.co.uk/store&espv=2&biw=1536&bih=772&ei=LE5xVd3qA4HlUNnggKgH&start=250&sa=N
One would assume that a page listed with the title "Ladies 1 Pair Young Trasparenze Mumbai Animal Print." and the description "Come on, program a little of your crazy side! Part of the edgy, sassy Young Trasparenze Medley, these soft touch, nontransparent stockings function a crazy," (along with the price) would be an entry for that individual product. However, clicking on that result opens a search results page (very slowly, as the site is still processing an update; it is not yet open to the public), which can be seen here: http://arropa.co.uk/store/catalogsearch/result/?q=+ladies+1+pair+young+trasparenze+mumbai+animal+print+tights+75+off+military+l+
Yes, the search result page is for that particular item, but nowhere on the page are the title, description, and price, nor have they ever been. I'm a little puzzled about this and what it would mean for duplicate content, as I'm using the manufacturer data at present.
Ideally I would like to keep the search results pages open. Any thoughts would be most welcome. A couple of things to note: I'm aware the site is too slow for general public use. It will be fully cached once running; as I say, it has 300k+ products, so it isn't small. I'm also aware that there are no images. They exist, but we are moving the images around, hence their being down. Always a fun task when there are 25 GB of the things! Many thanks, Carl
On-Page Optimization | WonkyDog
-
Meta Title in Google does not match the HTML meta title I have coded in a site
I have a client site that is pulling a meta title that is not in his code. I am using Yoast for the titles and descriptions on this site. Not 100% sure why Google is not listing the title we have in place. Could the code be pulling from somewhere else? Is there a fix for this?
On-Page Optimization | Bryan_Loconto
-
Will Google put logos in as author snippets?
Are they smart enough to tell it is not a mug shot and then not show it? Has anyone ever seen a logo as a snippet? What are some of the factors that determine whether they show them or not?
On-Page Optimization | Adsau
-
How do I block an entire category/directory with robots.txt?
Does anyone have any idea how to block an entire product category, including all the products in that category, using the robots.txt file? I'm using WooCommerce in WordPress and I'd like to prevent bots from crawling every single one of my product URLs for now. The confusing part right now is that I have several different URL structures linking to every single one of my products, for example www.mystore.com/all-products, www.mystore.com/product-category, etc. I'm not really sure how I'd type it into the robots.txt file, or where to place the file. Any help would be appreciated, thanks.
On-Page Optimization | bricerhodes
-
Does Google give weight or importance to scholarly articles such as those found in PubMed?
Does Google give weight or importance to scholarly articles such as those found in PubMed (www.ncbi.nlm.nih.gov/pubmed)? Do you think it matters to Google if you format and word your content so that it looks like research articles?
On-Page Optimization | monchconch
-
Do images on a CDN affect my Google Ranking?
I have recently switched my images to a CDN (MaxCDN), and all of the images within my posts are now loaded directly from the CDN. Will this affect my Google ranking? Does Google care whether the images are physically hosted on the domain?
On-Page Optimization | Amosnet
-
Photogallery and Robots.txt
Hey everyone, SEOmoz is telling us that there are too many on-page links on the following page: http://www.surfcampinportugal.com/photos-of-the-camp/ Should we stop it from being indexed via robots.txt? Best regards and thanks in advance, Simon
On-Page Optimization | Rapturecamps