Why doesn't Moz crawler follow robots.txt?
-
It is crawling the entire site, and there is stuff we do not want it to. Please advise.
-
Which I am ok with, but why am I getting duplicate content?
-
Yes, it doesn't tell them which pages not to crawl - just not to index them
-
It has been used correctly. The site is a Magento site and they have it built in. There are a lot of filters for products so it uses rel=canonical to tell Google which to index.
-
rel=canonical is not really a robots instruction - it is there to help with duplicate content: where you have the same or similar pages, you're telling search engines which page is the preferred one.
If you don't want pages crawled, you have to tell search engines in the robots.txt file.
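To illustrate the point above: a crawler can only see a rel=canonical tag after it has fetched the page, so the tag cannot stop the crawl itself. Here is a minimal Python sketch of how a crawler might read the tag. The `CanonicalFinder` class, the `find_canonical` helper and the example.com URLs are made up for illustration; this is not how Rogerbot or any search engine is actually implemented.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> tag it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Only <link> tags matter, and only the first canonical one is kept.
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        # rel can hold several space-separated tokens; compare case-insensitively.
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_tokens:
            self.canonical = attr_map.get("href")


def find_canonical(html: str) -> Optional[str]:
    """Return the canonical URL declared in the HTML, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Hypothetical filtered product listing, e.g. https://example.com/shoes?color=red
PAGE = """
<html>
  <head>
    <title>Shoes - red</title>
    <link rel="canonical" href="https://example.com/shoes" />
  </head>
  <body>Filtered product list</body>
</html>
"""

print(find_canonical(PAGE))
```

The filtered URL still has to be downloaded before the tag can be read, which is why the crawl report can list pages that carry a canonical tag.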
-
Hi There,
Rel=canonical tags tell robots which page, out of many similar ones, should actually be indexed.
For SEOs, canonicalization refers to individual web pages that can be loaded from multiple URLs. This is a problem because when multiple pages have the same content but different URLs, links that are intended to go to the same page get split up among multiple URLs. This means that the popularity of the pages gets split up. Unfortunately for web developers, this happens far too often because the default settings for web servers create this problem.
https://moz.com/learn/seo/canonicalization
I feel you have not used it correctly. Check the above article and see if it helps.
Thanks,
Vijay
-
So I made a mistake: it isn't the robots.txt that is the issue. I am getting hit with a ton of duplicate content penalties, so I figured that was it. The problem is that I have pages with rel=canonical tags that it is ignoring. Does Roger not read those?
-
Hi
Have to agree with the above: Rogerbot does listen to the robots.txt file, unlike Bing - while they are getting better, Bing ignores the robots.txt file frequently.
I've analysed quite a few server logs over the years and Roger has always listened to the file - it's usually a mistake in the robots file.
There is an option to test your robots.txt file in Google Search Console (GSC) - while this tests whether Google will crawl the page, Roger is usually given the same instructions as Google.
However, if you are still pretty certain that Roger is ignoring robots.txt, please DM your server logs and your website and I will take a look and analyse it for you (free of course).
Thanks
Andy
-
All major search engines, including Moz's crawler Rogerbot and the Internet Archive, respect robots.txt, the standard “robots exclusion protocol” for communicating with web crawlers and web robots.
In case you wish to exclude some specific information from all search engines, you can use the following sample code as a reference to block specific directories:
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/
However, if you want to block Moz's Rogerbot specifically from crawling certain sections of your website, you can use the following reference code to block specific areas / directories of your website from Rogerbot:
User-agent: Rogerbot
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/
I hope this helps. If you have specific questions, please feel free to respond; I will be happy to answer them.
Regards,
Vijay
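For anyone who wants to check rules like the ones above before blaming the crawler, here is a minimal sketch using Python's standard `urllib.robotparser`. The `is_allowed` helper and the example.com URLs are made up for illustration, and the standard-library parser may not match Rogerbot's own matching in every edge case, so treat it as a quick sanity check only.

```python
from urllib.robotparser import RobotFileParser

# Rules copied from the post above; only Rogerbot is restricted here.
ROBOTS_TXT = """\
User-agent: Rogerbot
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url (hypothetical helper)."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical URLs on a placeholder domain.
print(is_allowed(ROBOTS_TXT, "rogerbot", "https://example.com/junk/page.html"))
print(is_allowed(ROBOTS_TXT, "rogerbot", "https://example.com/products/"))
```

If a URL you expected to be blocked comes back as allowed, the Disallow path or the User-agent line in the file is the first thing to check.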
-
Hi there! Moz's crawler, rogerbot, does follow robots.txt. When he's not following robots.txt, it's usually because the robots.txt file is formatted improperly. Learn more about formatting your robots.txt file here: https://moz.com/learn/seo/robotstxt
For more information on Roger, including how to block him, head here: https://moz.com/help/guides/moz-procedures/what-is-rogerbot
And if you want to test your formatting, try the Robots Checker here: https://support.google.com/webmasters/answer/6062598
If you're still unable to determine why rogerbot is crawling your site, feel free to write in to help@moz.com!