Why doesn't the Moz crawler follow robots.txt?
-
It is crawling the entire site, including content we do not want it to crawl. Please advise.
-
I am OK with that, but why am I getting duplicate content?
-
Yes, it doesn't tell them which pages not to crawl, just not to index them.
-
It has been used correctly. The site runs on Magento, which has it built in. There are many product filters, so it uses rel=canonical to tell Google which URL to index.
-
rel=canonical is not really a robots instruction. It is there to help with duplicate content: when you have the same or similar pages, it tells search engines which page is the preferred one.
If you don't want pages crawled, you have to tell search engines in the robots.txt file.
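To illustrate the distinction: a crawler only sees a rel=canonical tag after it has already fetched the page, whereas robots.txt is consulted before fetching. The minimal sketch below, using only Python's standard html.parser, shows how a crawler might read the canonical URL from a filtered product page. The page markup and the example.com URLs are hypothetical, and real crawlers such as Rogerbot or Googlebot use their own parsers.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag and attribute names arrive lowercased
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attr_map.get("href")


def find_canonical(html):
    """Return the canonical URL declared in the HTML, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.canonical


# Hypothetical filtered product page: the page had to be fetched to learn this
FILTERED_PAGE = """
<html><head>
  <title>Shirts - Color: Blue</title>
  <link rel="canonical" href="https://www.example.com/shirts">
</head><body>...</body></html>
"""

print(find_canonical(FILTERED_PAGE))  # https://www.example.com/shirts
```

Because the tag lives inside the fetched HTML, it can consolidate duplicates for indexing but cannot stop the crawl itself.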
-
Hi There,
Rel=canonical tags tell robots which page, out of many, is actually the one to index.
For SEOs, canonicalization refers to individual web pages that can be loaded from multiple URLs. This is a problem because when multiple pages have the same content but different URLs, links that are intended to go to the same page get split up among multiple URLs. This means that the popularity of the pages gets split up. Unfortunately for web developers, this happens far too often because the default settings for web servers create this problem.
https://moz.com/learn/seo/canonicalization
I suspect you have not used it correctly; check the article above and see if it helps.
Thanks,
Vijay
-
So I made a mistake: it isn't the robots.txt that is the issue. I am getting hit with a ton of duplicate content penalties, so I figured that was it. The problem is that I have pages with rel=canonical tags that it is ignoring. Does Roger not read those?
-
Hi
I have to agree with the above: Rogerbot does obey the robots.txt file, unlike Bing, which, while it is getting better, frequently ignores robots.txt.
I've analysed quite a few server logs over the years, and Roger has always obeyed the file; it's usually a mistake in the robots file.
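A log check like this can be approximated offline. The sketch below assumes access logs in the Apache/Nginx "combined" format and a local copy of robots.txt; it uses Python's urllib.robotparser to list any paths a rogerbot user agent requested even though the file disallows them. The function name disallowed_hits, the sample log lines, and the sample robots.txt are hypothetical. Note that urllib.robotparser does simple prefix matching and does not understand * or $ path wildcards, so treat its output as a first pass.

```python
import re
import urllib.robotparser

# Assumed log format: "combined", where the user agent is the last quoted field.
LOG_PATTERN = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]*\] "(?:GET|HEAD) (?P<path>\S+) [^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def disallowed_hits(log_lines, robots_txt, bot_token="rogerbot"):
    """Return paths a bot requested even though robots.txt disallows them for it.

    bot_token must be lowercase; it is matched against the lowercased user agent.
    """
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    hits = []
    for line in log_lines:
        match = LOG_PATTERN.match(line)
        # Skip unparseable lines and requests from other user agents
        if not match or bot_token not in match.group("agent").lower():
            continue
        path = match.group("path")
        if not parser.can_fetch(bot_token, path):
            hits.append(path)
    return hits


# Hypothetical sample data
SAMPLE_ROBOTS = "User-agent: *\nDisallow: /tmp/\n"
ROGER_UA = "Mozilla/5.0 (compatible; rogerBot/1.0; http://www.seomoz.org/dp/rogerbot)"
SAMPLE_LOG = [
    f'203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET /products/shirt HTTP/1.1" 200 5120 "-" "{ROGER_UA}"',
    f'203.0.113.5 - - [10/Oct/2023:13:55:40 +0000] "GET /tmp/cache.html HTTP/1.1" 200 812 "-" "{ROGER_UA}"',
    '198.51.100.7 - - [10/Oct/2023:13:56:01 +0000] "GET /tmp/cache.html HTTP/1.1" 200 812 "-" "Googlebot/2.1"',
]

print(disallowed_hits(SAMPLE_LOG, SAMPLE_ROBOTS))  # ['/tmp/cache.html']
```

An empty result over real logs supports the view that Roger is obeying the file; a non-empty one points at the exact requests worth investigating.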
There is an option to test your robots.txt file in GSC (Google Search Console). While this tests whether Google will crawl the page, Roger usually has the same instructions as Google.
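The "same instructions as Google" point holds when both crawlers fall back to the User-agent: * group. A short sketch with Python's urllib.robotparser shows where it breaks down: if robots.txt also has a group just for Googlebot, a Google-focused test reports on that group, while rogerbot still follows the * group. The robots.txt content, paths, and helper names below are hypothetical.

```python
import urllib.robotparser

# Hypothetical robots.txt: one group for all bots, one only for Googlebot
ROBOTS_TXT = """\
User-agent: *
Disallow: /checkout/

User-agent: Googlebot
Disallow: /search/
"""


def build_parser(robots_txt):
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def compare_bots(robots_txt, paths, bots=("Googlebot", "rogerbot")):
    """Map each path to {bot: allowed?} so differences between bots stand out."""
    parser = build_parser(robots_txt)
    return {path: {bot: parser.can_fetch(bot, path) for bot in bots} for path in paths}


for path, verdicts in compare_bots(ROBOTS_TXT, ["/checkout/cart", "/search/shirts"]).items():
    print(path, verdicts)
# /checkout/cart {'Googlebot': True, 'rogerbot': False}
# /search/shirts {'Googlebot': False, 'rogerbot': True}
```

Googlebot obeys only its own group here, so a clean result from a Google tester does not guarantee the same result for Roger when bot-specific groups exist.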
However, if you are still fairly certain that Roger is ignoring robots.txt, please DM me your server logs and your website, and I will take a look and analyse them for you (free, of course).
Thanks
Andy
-
All major search engines, as well as crawlers such as Moz's Rogerbot and the Internet Archive's, respect robots.txt, the standard “robots exclusion protocol” for communicating with web crawlers and web robots.
If you want to exclude specific content from all search engines, you can use the following sample as a reference for blocking specific directories:
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/
However, if you want to block only Moz's Rogerbot from crawling specific sections of your website, you can use the following reference code to block specific areas or directories from rogerbot:
User-agent: Rogerbot
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/
I hope this helps. If you have specific questions, please feel free to respond; I will be happy to answer them.
Regards,
Vijay
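The two sample files in the answer above can be sanity-checked with Python's standard urllib.robotparser before uploading. This sketch parses each sample and asks whether rogerbot and Googlebot may fetch a hypothetical /junk/old.html; the first file should block both bots, the second only rogerbot. The helper name allowed is an illustration, not part of any tool mentioned in this thread.

```python
import urllib.robotparser

# The two sample files from the answer above
BLOCK_ALL_BOTS = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/
"""

BLOCK_ROGERBOT_ONLY = """\
User-agent: Rogerbot
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/
"""


def allowed(robots_txt, agent, path):
    """Return True if the given robots.txt lets this user agent fetch the path."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


for name, robots_txt in [("all bots", BLOCK_ALL_BOTS), ("Rogerbot only", BLOCK_ROGERBOT_ONLY)]:
    for agent in ("rogerbot", "Googlebot"):
        print(f"{name}: {agent} may fetch /junk/old.html -> {allowed(robots_txt, agent, '/junk/old.html')}")
```

User-agent matching in urllib.robotparser is case-insensitive, so "Rogerbot" in the file matches a crawler identifying itself as "rogerbot".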
-
Hi there! Moz's crawler, rogerbot, does follow robots.txt. When he appears not to, it's usually because the robots.txt file is formatted improperly. Learn more about formatting the file here: https://moz.com/learn/seo/robotstxt
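Since improper formatting is the usual culprit, a rough pre-check can catch common mistakes before you reach for a tester. The sketch below is a hypothetical lint, not Moz's or Google's parser, and the set of accepted fields is an assumption rather than an exhaustive list. It flags lines without a "field: value" shape, misspelled fields, Allow/Disallow rules that appear before any User-agent line, and paths that don't start with /.

```python
import re

# Assumed set of fields this sketch accepts; real crawlers may support others
KNOWN_FIELDS = {"user-agent", "disallow", "allow", "sitemap", "crawl-delay"}
FIELD_LINE = re.compile(r"^\s*([A-Za-z-]+)\s*:\s*(.*)$")


def lint_robots(robots_txt):
    """Return (line_number, message) pairs for lines a crawler may misread."""
    problems = []
    seen_user_agent = False
    for number, raw in enumerate(robots_txt.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding space
        if not line:
            continue
        match = FIELD_LINE.match(line)
        if not match:
            problems.append((number, "not a 'field: value' line"))
            continue
        field, value = match.group(1).lower(), match.group(2).strip()
        if field not in KNOWN_FIELDS:
            problems.append((number, f"unknown field '{field}'"))
        elif field in ("allow", "disallow"):
            if not seen_user_agent:
                problems.append((number, f"{field} before any User-agent line"))
            # Paths normally start with '/'; a leading '*' wildcard is also accepted here
            if value and not value.startswith(("/", "*")):
                problems.append((number, "path should start with '/'"))
        elif field == "user-agent":
            seen_user_agent = True
    return problems


# Hypothetical file with several typical mistakes
BROKEN = """\
Disallow: /tmp/
User-agent: Rogerbot
Disalow: /junk/
Disallow: junk/
User-agent Googlebot
"""

for number, message in lint_robots(BROKEN):
    print(f"line {number}: {message}")
```

A file that passes this check can still be wrong for your intent, so the Robots Checker and Moz's own guides remain the authority.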
For more information on Roger, including how to block him, head here: https://moz.com/help/guides/moz-procedures/what-is-rogerbot
And if you want to test your formatting, try the Robots Checker here: https://support.google.com/webmasters/answer/6062598
If you're still unable to determine why rogerbot is crawling your site, feel free to write in to help@moz.com!
-
Related Questions
-
How do I tell Moz about domain redirects?
If I have redirected my previous domain to my new domain, how can I tell Moz that the previous domain is redirected?
-
Strange error in Moz report
I get the following warning about our domain name in the Moz Link Explorer tool: "You entered the URL debtacademy.com which redirects to www.hugedomains.com/domain_profile.cfm?d=debtacademy&e=com. Click here to analyze www.hugedomains.com/domain_profile.cfm?d=debtacademy&e=com instead." Please advise. How can I fix it?
-
Moz is not indexing all my backlinks
Why do my website's backlinks differ when checked in moz.com and SEMrush.com?
-
No backlinks reported in Moz OSE?
We have a bunch of links, and none of them are reported in Moz OSE. Does the Moz OSE tool have a cache that takes a while to refresh? We recently moved over to CloudFlare, and since the move the numbers do not add up. Here is a link to an Ahrefs screenshot. Any insights would be greatly appreciated.
-
Why aren't my "page social metrics" increasing?
I post a lot to Facebook and Twitter, and my "page social metrics" haven't budged in 4 weeks. I even stopped using bit.ly and used the full URL as a test. Still no change. The Facebook account is /getgoodgifts and the Twitter account is /giftsing. Any thoughts on why the social metrics aren't increasing?
-
Moz showing +3M new inbound links, but nowhere to find them?
Hi there, Moz is showing a +3M backlink rise in the dashboard for one of our domains. The site has always had around 350K backlinks, and Ahrefs and Majestic still show this number. But Moz shows growth of +3M in the last two weeks. Is there a way to see where these backlinks come from in OSE? I can't figure out how to see this. Could it be a mistake on Moz's side?
-
Why isn't OSE showing any of my links?
My domain redirects all traffic to HTTPS. The site is https://www.tallslimtees.com. I've been working on it this year and know there are several good, topical links coming in, but OSE shows nothing. Any idea why this would be the case? How can I see all of my links and the data on them?
-
Moz crawling bot
Hi guys, in OpenSiteExplorer -> Top Pages, no page titles are displayed in the report for a certain domain, and the "HTTP Status" column shows: "Blocked by robots.txt". I tried to find out the ID of the Moz crawling bot, and on this page: http://moz.com/community/q/seomoz-spider-bot-details someone says it's: Mozilla/5.0 (compatible; rogerBot/1.0; http://www.seomoz.org/dp/rogerbot). However, my robots.txt doesn't have such an entry. Take a look:
# Automatically banned scanners and crawlers section
User-agent: 008
Disallow: /
user-agent: AhrefsBot
Disallow: /
User-agent: MJ12bot
Disallow: /
User-agent: metajobbot
Disallow: /
User-agent: Exabot
Disallow: /
User-agent: Ezooms
Disallow: /
User-agent: fyberspider
Disallow: /
User-agent: dotbot
Disallow: /
User-agent: MojeekBot
Disallow: /
# Section end
What could be the problem here, then? Why does the Moz bot think I'm blocking it?