Why doesn't the Moz crawler follow robots.txt?
-
It is crawling the entire site, including sections we do not want it to crawl. Please advise.
-
I am OK with that, but why am I getting duplicate content?
-
Yes, it doesn't tell them which pages not to crawl, only which ones not to index.
-
It has been used correctly. The site runs on Magento, which has canonical tags built in. There are a lot of product filters, so it uses rel=canonical to tell Google which URL to index.
-
rel=canonical is not really a robots instruction. It helps with duplicate copy: where you have the same or similar pages, it tells search engines which page is the preferred one.
If you don't want pages crawled, you have to tell search engines in the robots.txt file.
-
Hi there,
Rel=canonical tags tell robots which page, out of many, is the one to index.
For SEOs, canonicalization refers to individual web pages that can be loaded from multiple URLs. This is a problem because when multiple pages have the same content but different URLs, links that are intended to go to the same page get split up among multiple URLs. This means that the popularity of the pages gets split up. Unfortunately for web developers, this happens far too often because the default settings for web servers create this problem.
https://moz.com/learn/seo/canonicalization
I suspect it has not been implemented correctly; check the article above and see if it helps.
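To confirm what a filtered URL actually declares, you can extract its canonical tag from the page source. A minimal sketch using Python's standard-library html.parser; find_canonicals and the example.com URLs are hypothetical, and it only inspects HTML you pass in (it does not fetch pages or read canonical URLs sent in HTTP Link headers):

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> tag in a document."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names; values keep their case.
        if tag != "link":
            return
        attr_map = dict(attrs)
        # rel can hold several space-separated tokens, so split before checking.
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        href = attr_map.get("href")
        if "canonical" in rel_tokens and href:
            self.canonicals.append(href)


def find_canonicals(html):
    """Hypothetical helper: return every canonical URL declared in an HTML string."""
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.canonicals


if __name__ == "__main__":
    # Hypothetical page served at a filtered URL such as /shoes.html?color=red
    page = """<html><head>
    <link rel="canonical" href="https://www.example.com/shoes.html" />
    </head><body>Red shoes</body></html>"""
    print(find_canonicals(page))  # ['https://www.example.com/shoes.html']
```

An empty list means the page declares no canonical; more than one entry suggests a template problem, since conflicting canonicals give search engines no clear preferred URL.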
Thanks,
Vijay
-
So I made a mistake: robots.txt isn't the issue. I am getting hit with a ton of duplicate content penalties, so I figured that was it. The problem is that I have pages with rel=canonical tags that Roger seems to be ignoring. Does Roger not read those?
-
Hi
I have to agree with the above: Rogerbot does obey the robots.txt file, unlike Bing, which frequently ignores robots.txt, although it is getting better.
I've analysed quite a few server logs over the years and Roger has always obeyed the file; the cause is usually a mistake in the robots file.
There is an option to test your robots.txt file in Google Search Console (GSC). It tests whether Google will crawl a page, but Roger usually gets the same instructions as Google, since both fall under the same rules unless your file has a separate group for rogerbot.
However, if you are still fairly certain that Roger is ignoring robots.txt, please DM me your server logs and your website and I will take a look and analyse them for you (free, of course).
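For a rough offline version of the same test, Python's standard-library urllib.robotparser can parse the text of a robots.txt file and report whether a given user agent may fetch a URL. This sketch assumes placeholder rules and example.com URLs, and Python's parser is simpler than Google's or Moz's (it applies the first matching rule rather than the longest one and does not support * wildcards inside paths), so treat its answers as a sanity check rather than a verdict:

```python
from urllib.robotparser import RobotFileParser

# Placeholder robots.txt: a single group that applies to every crawler.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
"""


def check_agents(robots_txt, agents, url):
    """Return a dict mapping each user agent to whether it may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    for url in ("https://www.example.com/tmp/cache.html",
                "https://www.example.com/shoes.html"):
        # Both agents fall back to the "*" group, so they get the same answer.
        print(url, check_agents(ROBOTS_TXT, ["rogerbot", "Googlebot"], url))
```

With a single User-agent: * group, rogerbot and Googlebot get identical answers, which is why a GSC test is usually a fair proxy for Roger.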
Thanks
Andy
-
All major search engines, as well as Moz's crawler Rogerbot and the Internet Archive, respect robots.txt, the standard “robots exclusion protocol” for communicating with web crawlers and web robots.
If you wish to exclude specific content from all search engines, you can use the following sample code as a reference for blocking specific directories:
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/
However, if you want to block Moz's Rogerbot specifically from crawling certain sections of your website, you can use the following reference code to block specific areas or directories from rogerbot:
User-agent: Rogerbot
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/
I hope this helps. If you have specific questions, please feel free to respond; I will be happy to answer them.
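Under the robots exclusion standard, a crawler that finds a group naming it follows that group instead of the User-agent: * group, so any general rules rogerbot should keep obeying must be repeated in its own group. A minimal sketch of that behaviour with Python's urllib.robotparser (a simplified parser, not Moz's own), using the directories from the sample above and placeholder example.com URLs:

```python
from urllib.robotparser import RobotFileParser

# Placeholder robots.txt: a rogerbot-only group plus a general group.
# /cgi-bin/ is repeated in the Rogerbot group because a crawler that has
# its own group does not also read the "*" group.
ROBOTS_TXT = """\
User-agent: Rogerbot
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/

User-agent: *
Disallow: /cgi-bin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

JUNK_URL = "https://www.example.com/junk/old-page.html"
CGI_URL = "https://www.example.com/cgi-bin/form.cgi"

if __name__ == "__main__":
    # Agent names are matched case-insensitively, so "rogerbot" hits "Rogerbot".
    print(parser.can_fetch("rogerbot", JUNK_URL))   # False: blocked by its own group
    print(parser.can_fetch("Googlebot", JUNK_URL))  # True: "*" group has no /junk/ rule
    print(parser.can_fetch("Googlebot", CGI_URL))   # False: blocked by the "*" group
```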
Regards,
Vijay
-
Hi there! Moz's crawler, rogerbot, does follow robots.txt. When he appears not to be following it, it's usually because the robots.txt file is formatted improperly. Learn more about formatting your file here: https://moz.com/learn/seo/robotstxt
For more information on Roger, including how to block him, head here: https://moz.com/help/guides/moz-procedures/what-is-rogerbot
And if you want to test your formatting, try the Robots Checker here: https://support.google.com/webmasters/answer/6062598
If you're still unable to determine why rogerbot is crawling your site, feel free to write in to help@moz.com!