Robots.txt and canonical tag
-
The SEOmoz post at http://www.seomoz.org/blog/robot-access-indexation-restriction-techniques-avoiding-conflicts says:
If you have a robots.txt disallow in place for a page, the canonical tag will never be seen.
Does this mean that if a page is disallowed by robots.txt, spiders do not read its HTML code?
-
Thanks Ryan for explaining things very clearly.
-
What we know is that there have been many cases where a page blocked in robots.txt has appeared in search results. The explanation provided is that robots.txt blocks crawlers during normal site visits, but not necessarily on visits where they are following links from other sites.
-
If spiders follow links to an article on my site, will they read its contents then? If the canonical tag is on the article page itself, will the canonical tag be seen?
-
Daylan offered a great answer, but I would like to add one exception. When crawlers from the major search engines visit your site they will honor your robots.txt file, but sometimes they follow links from other sites to an article on your site, and during that particular visit they will not see the robots.txt file and will index your page.
This is one of the reasons why your robots.txt file should be used as sparingly as possible, and when it is used you should have a backup in place, such as a canonical or noindex tag on the page.
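To make the interaction between the two mechanisms concrete, here is a minimal Python sketch of a robots.txt-compliant fetch. It is a simplified model, not how any particular search engine works: the names `CanonicalFinder` and `canonical_seen_by_crawler`, the `ExampleBot` user agent, and the sample URLs are all made up for illustration. It only shows that a tag inside the page can be read on visits where the page is actually fetched.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.robotparser import RobotFileParser


class CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> tag (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rel_values = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_values and self.canonical is None:
            self.canonical = attr.get("href")


def canonical_seen_by_crawler(
    robots_txt: str, user_agent: str, url: str, html: str
) -> Optional[str]:
    """Return the canonical URL a compliant crawler would read, or None.

    Assumption: the crawler checks robots.txt first and skips the page
    entirely when the URL is disallowed, so the HTML is never parsed.
    """
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    if not rules.can_fetch(user_agent, url):
        return None  # page is never fetched, so its tags are never read
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


ROBOTS_TXT = "User-agent: *\nDisallow: /private/\n"
PAGE_HTML = (
    '<html><head><link rel="canonical" '
    'href="http://www.example.com/article"></head><body></body></html>'
)

blocked = canonical_seen_by_crawler(
    ROBOTS_TXT, "ExampleBot", "http://www.example.com/private/article.html", PAGE_HTML
)
allowed = canonical_seen_by_crawler(
    ROBOTS_TXT, "ExampleBot", "http://www.example.com/public/article.html", PAGE_HTML
)
print(blocked)  # None: the disallowed page was never read
print(allowed)  # http://www.example.com/article
```

In this model the canonical tag only counts on visits where the page is fetched, which is why the on-page tag cannot help on crawls where the disallow rule is honored.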
-
Thanks Daylan for your quick response. I just wanted a second opinion confirming that the canonical tag will never be seen if a page is disallowed.
-
That's correct in most cases.
It works like this: a robot wants to visit a website URL, say http://www.example.com/welcome.html. Before it does so, it first checks for http://www.example.com/robots.txt, and finds:
User-agent: *
Disallow: /
The "User-agent: *" means this section applies to all robots. The "Disallow: /" tells the robot that it should not visit any pages on the site.
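The check a well-behaved robot performs can be sketched with Python's standard `urllib.robotparser` module. This is only an illustration: the helper name `is_allowed` and the `ExampleBot` user agent are assumptions, and the rules are parsed from a string instead of being downloaded from the site.

```python
from urllib.robotparser import RobotFileParser

# The two rules quoted above, as they would appear in /robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a compliant robot may fetch `url` under these rules."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules.can_fetch(user_agent, url)


# "Disallow: /" under "User-agent: *" blocks every path for every robot.
print(is_allowed(ROBOTS_TXT, "ExampleBot", "http://www.example.com/welcome.html"))  # False
```

Note that this check happens on the robot's side; nothing on the server enforces it.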
Robots can ignore your /robots.txt. In particular, malware robots that scan the web for security vulnerabilities and email address harvesters used by spammers will pay no attention to it.
More information available here about: