Robots.txt and canonical tag
-
In the SEOmoz post http://www.seomoz.org/blog/robot-access-indexation-restriction-techniques-avoiding-conflicts, it says:
If you have a robots.txt disallow in place for a page, the canonical tag will never be seen.
Does this mean that if a page is disallowed by robots.txt, spiders DO NOT read its HTML code?
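To make the quoted claim concrete, here is a minimal Python sketch of a robots.txt-compliant crawler, assuming the crawler checks robots.txt before every fetch. The robots.txt content, the /private/ path, the user-agent name "ExampleBot", and the helpers `canonical_seen` and `fake_fetch` are all hypothetical; `fake_fetch` stands in for a real HTTP request so the sketch runs offline. Only `urllib.robotparser.RobotFileParser` is a real standard-library API.

```python
from typing import Callable, List
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks every robot from the /private/ directory only.
ROBOTS_TXT = "User-agent: *\nDisallow: /private/\n"

# Hypothetical article HTML carrying a canonical tag.
ARTICLE_HTML = (
    '<html><head><link rel="canonical" '
    'href="http://www.example.com/article.html"></head><body>...</body></html>'
)


def canonical_seen(url: str, fetch_html: Callable[[str], str],
                   user_agent: str = "ExampleBot") -> bool:
    """Simulate a robots.txt-compliant crawler visiting url.

    Returns True only if the page was fetched and its HTML contains a
    canonical link; a disallowed URL is never fetched at all.
    """
    robots = RobotFileParser()
    robots.parse(ROBOTS_TXT.splitlines())
    if not robots.can_fetch(user_agent, url):
        return False  # the HTML, and any canonical tag in it, is never downloaded
    html = fetch_html(url)
    return 'rel="canonical"' in html  # crude string check, enough for this sketch


fetched: List[str] = []


def fake_fetch(url: str) -> str:
    """Stand-in for an HTTP GET; records which URLs were actually requested."""
    fetched.append(url)
    return ARTICLE_HTML


if __name__ == "__main__":
    print(canonical_seen("http://www.example.com/private/article.html", fake_fetch))  # False
    print(canonical_seen("http://www.example.com/blog/article.html", fake_fetch))     # True
    print(fetched)  # only the /blog/ URL was requested
```

Because the robots.txt check happens before the download, the disallowed page's HTML is never requested, so nothing inside it (including a canonical tag) can be read. The sketch only models crawlers that obey robots.txt on every visit.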
-
Thanks Ryan for explaining things very clearly.
-
What we do know is that there have been many cases where a page blocked in robots.txt has appeared in search results. The explanation given is that robots.txt blocks crawlers during normal site visits, but not necessarily on visits where they are following links from other sites.
-
If spiders follow a link to an article on my site, will they read its contents then? If the canonical tag is on the article page itself, will the canonical tag be seen?
-
Daylan offered a great answer, but I would like to add one exception. When crawlers from the major SEs visit your site, they will honor your robots.txt file. Sometimes, however, they will follow links from other sites to an article on your site, and during that particular visit they will not see the robots.txt file and will index your page.
This is one of the reasons why your robots.txt file should be used as minimally as possible, and when it is used you should have a backup process in place such as the canonical or noindex tag on a page.
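The canonical and noindex tags mentioned here live in the page's own HTML, so a crawler can only act on them during a visit where it actually downloads the page. The following is a rough Python sketch of how those two signals can be pulled out of fetched HTML with the standard library's `html.parser`; the class `IndexingSignals`, the function `indexing_signals`, and the sample page are hypothetical illustrations, not a description of how any search engine parses pages.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class IndexingSignals(HTMLParser):
    """Collects the canonical URL and robots meta directives from a page."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None
        self.robots: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names;
        # attributes without a value arrive as None, so map those to "".
        attributes = {name: (value or "") for name, value in attrs}
        if tag == "link" and "canonical" in attributes.get("rel", "").lower().split():
            self.canonical = attributes.get("href")
        elif tag == "meta" and attributes.get("name", "").lower() == "robots":
            content = attributes.get("content", "")
            # Directives are comma-separated, e.g. "noindex, follow".
            self.robots.extend(d.strip().lower() for d in content.split(",") if d.strip())


def indexing_signals(html: str) -> Tuple[Optional[str], List[str]]:
    """Return (canonical URL or None, list of robots meta directives)."""
    parser = IndexingSignals()
    parser.feed(html)
    parser.close()
    return parser.canonical, parser.robots


# Hypothetical page using both "backup" tags.
PAGE = """<html><head>
<link rel="canonical" href="http://www.example.com/article.html" />
<meta name="robots" content="noindex, follow">
</head><body>...</body></html>"""

if __name__ == "__main__":
    print(indexing_signals(PAGE))
```

Self-closing tags such as `<link ... />` are handled too, because `HTMLParser.handle_startendtag` falls back to `handle_starttag` by default.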
-
Thanks, Daylan, for your quick response. I just wanted a second opinion confirming that the canonical tag will never be seen if a page is disallowed.
-
That's correct in most cases:
It works like this: a robot wants to visit a Web site URL, say http://www.example.com/welcome.html. Before it does so, it first checks for http://www.example.com/robots.txt, and finds:
User-agent: *
Disallow: /
The "User-agent: *" means this section applies to all robots. The "Disallow: /" tells the robot that it should not visit any pages on the site.
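Python's standard library includes a robots.txt parser, `urllib.robotparser.RobotFileParser`, which can be used to check the example above offline. This minimal sketch feeds it the two-line file and asks whether a few user agents may fetch welcome.html; the helper `allowed` and the agent names are illustrative assumptions, and real crawlers may interpret edge cases of robots.txt differently from this parser.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The robots.txt from the example above: block every robot from every page.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
# For contrast: an empty Disallow value blocks nothing.
ALLOW_ALL = "User-agent: *\nDisallow:\n"

URL = "http://www.example.com/welcome.html"

if __name__ == "__main__":
    for agent in ("Googlebot", "Bingbot", "SomeOtherBot"):
        print(agent, allowed(BLOCK_ALL, agent, URL), allowed(ALLOW_ALL, agent, URL))
```

With `Disallow: /`, every path starts with `/`, so every URL is blocked for every agent that falls under `User-agent: *`; an empty `Disallow:` value, by contrast, blocks nothing.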
Robots can ignore your /robots.txt. In particular, malware robots that scan the web for security vulnerabilities and email address harvesters used by spammers will pay no attention to it.
More information is available here about: