Why doesn't Moz crawler follow robots.txt?
-
It is crawling the entire site, and there is stuff we do not want it to. Please advise.
-
Which I am ok with, but why am I getting duplicate content?
-
Yes, it doesn't tell them which pages not to crawl - just not to index them
-
It has been used correctly. The site is a Magento site and they have it built in. There are a lot of filters for products so it uses rel=canonical to tell Google which to index.
-
rel=canonical is not really a robots instruction. It is there to help with duplicate content: where you have the same or similar pages, you are telling search engines which page is the preferred one.
If you don't want pages crawled, you have to tell search engines in the robots.txt file.
-
Hi There,
Rel=canonical tags tell robots which page, out of many, is the one to index.
For SEOs, canonicalization refers to individual web pages that can be loaded from multiple URLs. This is a problem because when multiple pages have the same content but different URLs, links that are intended to go to the same page get split up among multiple URLs. This means that the popularity of the pages gets split up. Unfortunately for web developers, this happens far too often because the default settings for web servers create this problem.
https://moz.com/learn/seo/canonicalization
I feel you have not used it correctly. Check the above article and see if it helps.
Thanks,
Vijay
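
To check which canonical URL a page actually declares, here is a minimal Python sketch using only the standard library. The sample HTML and the example.com URLs are made up for illustration, and this only reads the tag from the page source; it says nothing about how rogerbot or any search engine treats it.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag seen."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Only <link> tags matter, and only the first canonical one is kept.
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        # rel may hold several space-separated values, in any letter case.
        rel_values = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attr.get("href")


def find_canonical(html: str) -> Optional[str]:
    """Return the canonical URL declared in the HTML, or None if absent."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Hypothetical filtered product page, e.g. served at /shoes?color=red
SAMPLE_PAGE = """
<html><head>
<title>Red shoes</title>
<link rel="canonical" href="https://example.com/shoes" />
</head><body>...</body></html>
"""

print(find_canonical(SAMPLE_PAGE))
```

If every filtered variant of a page reports the same canonical URL, the tag is at least present and consistent in the markup.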
-
So I made a mistake: it isn't the robots.txt that is the issue. I am getting hit with a ton of duplicate content penalties, so I figured that was it. The problem is that I have pages with rel=canonical tags that it is ignoring. Does Roger not read those?
-
Hi
Have to agree with the above: Rogerbot does listen to the robots.txt file, unlike Bing. While they are getting better, Bing ignores the robots.txt file frequently.
I've analysed quite a few server logs over the years and Roger has always listened to the file. It's usually a mistake in the robots file.
There is an option to test your robots.txt file in Google Search Console (GSC). While this tests whether Google will crawl the page, Roger usually follows the same instructions as Google.
However if you are still pretty certain that Roger is ignoring robots.txt please DM your Server Logs and your website and I will take a look and analyse it for you (free of course).
Thanks
Andy
-
All major search engines, including Moz's crawler rogerbot and the Internet Archive, respect robots.txt, the standard “robots exclusion protocol” for communicating with web crawlers and web robots.
If you wish to exclude specific content from all search engines, you can use the following sample code as a reference for blocking specific directories.
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/
However, if you want to block only Moz's rogerbot from crawling specific sections of your website, you can use the following reference code to block specific areas or directories:
User-agent: Rogerbot
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/
I hope this helps. If you have specific questions, please feel free to respond; I will be happy to answer them.
Regards,
Vijay
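
As a rough way to sanity-check rules like the ones above before deploying them, here is a small Python sketch using the standard library's `urllib.robotparser`. The example.com URLs are made up, and this parser is only an approximation: rogerbot's own matching may differ in details, so treat the result as a hint rather than proof.

```python
from urllib.robotparser import RobotFileParser

# Rules copied from the rogerbot-specific sample above.
ROBOTS_TXT = """\
User-agent: Rogerbot
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules let user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Blocked: /tmp/ is disallowed for rogerbot.
print(is_allowed(ROBOTS_TXT, "rogerbot", "https://example.com/tmp/file.html"))
# Allowed: no rule matches /products/.
print(is_allowed(ROBOTS_TXT, "rogerbot", "https://example.com/products/"))
# Allowed: these rules name only rogerbot, so other crawlers are unaffected.
print(is_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/tmp/file.html"))
```

Each directive needs its own line. If two lines run together, the path no longer matches what was intended, which is one way a crawler can appear to ignore the file.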
-
Hi there! Moz's crawler, rogerbot, does follow robots.txt. When he's not following robots.txt, it's usually because the robots.txt file is formatted improperly. Learn more about formatting your file here: https://moz.com/learn/seo/robotstxt
For more information on Roger, including how to block him, head here: https://moz.com/help/guides/moz-procedures/what-is-rogerbot
And if you want to test your formatting, try the Robots Checker here: https://support.google.com/webmasters/answer/6062598
If you're still unable to determine why rogerbot is crawling your site, feel free to write in to help@moz.com!