Robots.txt and canonical tag
-
In the SEOmoz post http://www.seomoz.org/blog/robot-access-indexation-restriction-techniques-avoiding-conflicts, it says:
If you have a robots.txt disallow in place for a page, the canonical tag will never be seen.
Does this mean that if a page is disallowed by robots.txt, spiders do NOT read its HTML code?
-
Thanks, Ryan, for explaining things so clearly.
-
What we know is that there have been many cases where a page blocked in robots.txt has appeared in search results. The explanation provided is that robots.txt blocks crawlers during normal site visits, but not necessarily on visits where they are following links from other sites.
-
If spiders follow links to an article on my site, will they read its contents then? If the canonical tag is on the article page itself, will the canonical tag be seen?
-
Daylan offered a great answer, but I would like to add one exception. When crawlers from the major search engines visit your site, they will honor your robots.txt file. Sometimes, however, they follow links from other sites to an article on your site, and during that particular visit they will not see the robots.txt file and will index your page.
This is one of the reasons your robots.txt file should be used as sparingly as possible, and when it is used, you should have a backup in place, such as a canonical or noindex tag on the page.
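Both backup signals mentioned here live inside the page's HTML, so a crawler can only act on them after it has actually downloaded the page. A minimal Python sketch of that extraction step, using only the standard library's `html.parser`; the class and function names are illustrative and this is not any search engine's real code:

```python
from html.parser import HTMLParser


class IndexingSignalParser(HTMLParser):
    """Collect the canonical URL and robots meta directives from a page (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.robots_directives = set()

    def handle_starttag(self, tag, attrs):
        # Attribute names arrive lowercased; valueless attributes arrive as None.
        attrs = {name: (value or "") for name, value in attrs}
        if tag == "link" and "canonical" in attrs.get("rel", "").lower().split():
            if self.canonical is None:  # keep the first canonical link only
                self.canonical = attrs.get("href")
        elif tag == "meta" and attrs.get("name", "").lower() == "robots":
            for part in attrs.get("content", "").split(","):
                if part.strip():
                    self.robots_directives.add(part.strip().lower())


def indexing_signals(html):
    """Return (canonical_href_or_None, set_of_robots_directives) for an HTML string."""
    parser = IndexingSignalParser()
    parser.feed(html)
    parser.close()
    return parser.canonical, parser.robots_directives


page = ('<head><link rel="canonical" href="https://www.example.com/article/" />'
        '<meta name="robots" content="noindex, follow"></head>')
print(indexing_signals(page))
```

The point of the sketch is the dependency: none of this runs unless the page has been fetched, which is exactly what a robots.txt disallow prevents.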
-
Thanks, Daylan, for your quick response. I just wanted a second opinion that the canonical tag will never be seen if a page is disallowed.
-
That's correct in most cases.
It works like this: a robot wants to visit a website URL, say http://www.example.com/welcome.html. Before it does so, it first checks for http://www.example.com/robots.txt and finds:
User-agent: *
Disallow: /
The "User-agent: *" means this section applies to all robots. The "Disallow: /" tells the robot that it should not visit any pages on the site.
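The check described above can be reproduced with Python's standard `urllib.robotparser`. This is a minimal sketch of a well-behaved crawler, not how any particular search engine is implemented; `polite_fetch` and its `fetch` callable are hypothetical stand-ins for the real download step. With `Disallow: /`, the download never happens, so nothing on the page (a canonical tag included) is ever read:

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the example above: every robot is blocked from every page.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(url, robots_txt=ROBOTS_TXT, user_agent="*"):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def polite_fetch(url, fetch, robots_txt=ROBOTS_TXT):
    """Call the (hypothetical) fetch function only if robots.txt allows the URL.

    If the URL is disallowed the page is never downloaded, so its HTML,
    including any rel="canonical" link, is never seen.
    """
    if not is_allowed(url, robots_txt):
        return None
    return fetch(url)


print(is_allowed("http://www.example.com/welcome.html"))  # False
```

A narrower rule such as `Disallow: /private/` would leave `/welcome.html` crawlable, and only then would its canonical tag be visible.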
Robots can ignore your /robots.txt. In particular, malware robots that scan the web for security vulnerabilities and email address harvesters used by spammers will pay no attention to it.
More information is available here:
Related Questions
-
Canonical Page Question
Hi, I have a question about canonical pages that I need cleared up. I am not sure that my BigCommerce website is correctly configured and just wanted clarification from someone in the know.
Take this page, for example: https://www.fishingtackleshop.com.au/barra-lures/
Its canonical link is https://www.fishingtackleshop.com.au/barra-lures/
The rel="next" link is https://www.fishingtackleshop.com.au/barra-lures/?sort=bestselling&page=2, and that page has the canonical tag <link rel='canonical' href='https://www.fishingtackleshop.com.au/barra-lures/?page=2' />
Is this correct and working as it should, or should the canonical tag for the second (pagination) page, https://www.fishingtackleshop.com.au/barra-lures/?page=2, say <link rel='canonical' href='https://www.fishingtackleshop.com.au/barra-lures/' /> in our source code?
Technical SEO | oceanstorm
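The question comes down to which query parameters belong in a paginated page's canonical URL. Here is a minimal Python sketch that derives one, assuming (hypothetically) that `sort` only reorders the listing and that `?page=1` shows the same products as the bare category URL. For the page-2 URL in the question it produces the same `?page=2` canonical the site already emits. For what it's worth, Google's pagination guidance recommends giving each page in a sequence its own canonical URL rather than pointing every page at page 1, and this sketch reproduces that behaviour:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical: parameters that only change ordering, not which products are shown.
SORT_ONLY_PARAMS = {"sort"}


def paginated_canonical(url):
    """Drop sort-only parameters but keep the page number in the canonical URL."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in SORT_ONLY_PARAMS
        # Assumption: page=1 is the same content as the bare category URL.
        and not (key == "page" and value == "1")
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


print(paginated_canonical(
    "https://www.fishingtackleshop.com.au/barra-lures/?sort=bestselling&page=2"))
# https://www.fishingtackleshop.com.au/barra-lures/?page=2
```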
Is there a percentage of duplicate content required before you should use a canonical tag?
Is there a percentage (approximate or exact) of duplicate content you should have before you use a canonical tag? Similarly, how does Google handle canonical tags if the pages aren't 100% duplicates? I've added some background and an example below.
Nike Trainer model 1: has an overview page that links to sub-pages about cushioning, Gore-Tex and breathability.
Nike Trainer models 2, 3, 4 and 5: each has an overview page that links to sub-pages about cushioning, Gore-Tex and breathability.
Each sub-page URL is a child of its parent, so every sub-page is a distinct page, e.g. /nike-trainer/model-1/gore-tex and /nike-trainer/model-2/gore-tex. There are some differences in material composition, some different images, and of course the product name is referred to multiple times. This makes each page roughly 80% unique.
Technical SEO | punchseo
Best way to create robots.txt for my website
How can I create a robots.txt file for my website, guitarcontrol.com? It has a login area and guitar lessons.
Technical SEO | zoe.wilson17
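As a starting point, here is a minimal Python sketch that writes a robots.txt asking all robots to skip a login area, then checks the result with the standard `urllib.robotparser`. The paths `/login/`, `/members/` and `/lessons/` are hypothetical placeholders, not the site's real URLs:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical private paths: substitute the site's real login/account URLs.
PRIVATE_PATHS = ["/login/", "/members/"]


def build_robots_txt(private_paths):
    """Return robots.txt text that asks every robot to skip private_paths."""
    lines = ["User-agent: *"]
    lines += [f"Disallow: {path}" for path in private_paths]
    return "\n".join(lines) + "\n"


robots_txt = build_robots_txt(PRIVATE_PATHS)
print(robots_txt)

# Read the rules back the way a well-behaved crawler would.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
print(parser.can_fetch("*", "https://www.guitarcontrol.com/login/"))    # False
print(parser.can_fetch("*", "https://www.guitarcontrol.com/lessons/"))  # True
```

Keep in mind that robots.txt is publicly readable and is only a request. It lists your private paths for anyone to see, and pages behind a login still need real authentication.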
Have I constructed my robots.txt file correctly for sitemap autodiscovery?
Hi, here is my robots.txt file:
User-agent: *
Sitemap: http://www.bedsite.co.uk/sitemaps/sitemap.xml
# Directories
Disallow: /sendfriend/
Disallow: /catalog/product_compare/
Disallow: /media/catalog/product/cache/
Disallow: /checkout/
Disallow: /categories/
Disallow: /blog/index.php/
Disallow: /catalogsearch/result/index/
Disallow: /links.html
I'm using Magento and want to make sure I have constructed my robots.txt file correctly for sitemap autodiscovery. Thanks,
Technical SEO | Bedsite
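One way to sanity-check the file is to parse it with Python's standard `urllib.robotparser`, whose `site_maps()` method (Python 3.8+) returns every `Sitemap:` line it finds. The `Sitemap` directive is not tied to a user-agent group, so placing it right after `User-agent: *` should be fine. This minimal sketch reads the file from the question; the `/beds/` URL is a hypothetical example of a page that should stay crawlable:

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the question.
ROBOTS_TXT = """\
User-agent: *
Sitemap: http://www.bedsite.co.uk/sitemaps/sitemap.xml
# Directories
Disallow: /sendfriend/
Disallow: /catalog/product_compare/
Disallow: /media/catalog/product/cache/
Disallow: /checkout/
Disallow: /categories/
Disallow: /blog/index.php/
Disallow: /catalogsearch/result/index/
Disallow: /links.html
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Sitemap autodiscovery: the URL a crawler would pick up from this file.
print(parser.site_maps())  # ['http://www.bedsite.co.uk/sitemaps/sitemap.xml']

# Spot-check a blocked path and a (hypothetical) normal product page.
print(parser.can_fetch("*", "http://www.bedsite.co.uk/checkout/cart/"))  # False
print(parser.can_fetch("*", "http://www.bedsite.co.uk/beds/"))           # True
```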
Canonical Tag - Magento - Help
Hello, I was hoping to get some help or tips on how best to control the canonical tag on a Magento-based website. When you go into the Magento admin and enable the option to use the canonical tag on pages, all it does is add a canonical tag pointing to the exact same page, just with http:// in the URL. My goal is to use the canonical tag on specific pages and point it to other pages, not just to the same page with http://.
For example, right now the canonical tag for the page example.com/question/baseball points to http://example.com/question/baseball. What I want to be able to do is take example.com/question/baseball and have its canonical tag point to example.com/question/baseballbats.
Is this possible? Does what I'm saying make sense? Please let me know what you all think. Thanks!
Technical SEO | Prime85
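The logic being asked for is a per-page override table with a self-referencing default. Here is a minimal Python sketch of that idea. It is not Magento's API, and the `CANONICAL_OVERRIDES` table and `canonical_link` helper are hypothetical. Wiring the same logic into a Magento theme or module is not covered here:

```python
from html import escape
from urllib.parse import urljoin

# Hypothetical override table: page path -> path to declare as canonical.
CANONICAL_OVERRIDES = {
    "/question/baseball": "/question/baseballbats",
}


def canonical_link(base_url, path):
    """Build a rel="canonical" link for path, using an override when one exists."""
    target = CANONICAL_OVERRIDES.get(path, path)  # default: self-referencing
    return f'<link rel="canonical" href="{escape(urljoin(base_url, target))}" />'


print(canonical_link("http://example.com/", "/question/baseball"))
# <link rel="canonical" href="http://example.com/question/baseballbats" />
```

Search engines treat the canonical tag as a hint, so an override like this is only likely to be honoured when the two pages really are duplicates or near-duplicates.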
Robots.txt file
How do I get Google to stop indexing my old pages and start indexing my new pages, even months down the line? Do I need to install a robots.txt file on each page?
Technical SEO | gimes
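On the second question: robots.txt is a single file per host, served from the site root, not something installed on each page. Crawlers work out its location from any page URL roughly as in this minimal Python sketch (the `robots_txt_url` helper is illustrative):

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the one robots.txt URL that governs page_url (same scheme and host)."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("https://www.example.com/blog/old-page.html?x=1"))
# https://www.example.com/robots.txt
```

Also note that disallowing old pages in robots.txt only stops them from being crawled. It does not by itself remove them from the index. Redirects or a noindex tag, which crawlers can only see if the page is not blocked, are the usual tools for that.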
What if the meta description tag comes before the title tag? Do the search engines disregard or penalize a page if the order in the HTML is not title, then description?
Do the search engines disregard or penalize a page if the order in the HTML is not title, then description? A client's webmaster is new to SEO and did just this. Suggestions?
Technical SEO | alankoen123
Canonical Tag Pointing To The Same URL
Does it matter if a canonical tag points to the URL of the page it is on?
Example page: http://www.domain.com
Canonical tag: <link rel="canonical" href="http://www.domain.com" />
I only ask because a client of mine has a CMS that automatically does this on every page of the site, and there's no way to remove it. Will this have a negative impact, or does it not matter at all? Any insights would be great, because I can't find a clear answer anywhere online. Thanks!
Technical SEO | MichaelWeisbaum
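To confirm what the CMS actually emits, you can test whether a page's canonical resolves back to the page itself. Here is a minimal Python sketch using the standard `html.parser` and `urllib.parse`. The light normalization (lowercased scheme and host, an empty path treated as "/") is a simplifying assumption, not a description of how any search engine compares URLs:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class CanonicalFinder(HTMLParser):
    """Record the href of the first <link rel="canonical"> in a document."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rel = (attrs.get("rel") or "").lower().split()
        if tag == "link" and self.href is None and "canonical" in rel:
            self.href = attrs.get("href")


def _normalize(url):
    # Assumption: case-insensitive scheme/host, and "" and "/" are the same path.
    parts = urlsplit(url)
    return (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", parts.query)


def is_self_canonical(page_url, html):
    """True if the page's canonical link (resolved against page_url) is the page itself."""
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    if finder.href is None:
        return False
    return _normalize(urljoin(page_url, finder.href)) == _normalize(page_url)


print(is_self_canonical("http://www.domain.com",
                        '<link rel="canonical" href="http://www.domain.com" />'))  # True
```

A self-referencing canonical like this is generally regarded as harmless. The check is mainly useful for spotting pages where the CMS emits something else.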