Why doesn't the Moz crawler follow robots.txt?
-
It is crawling the entire site, including sections we do not want it to crawl. Please advise.
-
That I am OK with, but why am I getting duplicate content issues?
-
Yes. It doesn't tell them which pages not to crawl, only which pages not to index.
-
It has been used correctly. The site is a Magento site, and Magento has it built in. There are a lot of product filters, so it uses rel=canonical to tell Google which page to index.
-
rel=canonical is not really a robots instruction. It exists to help with duplicate copy: where you have the same or similar pages, it tells search engines which page is the preferred one.
If you don't want pages crawled, you have to tell search engines in the robots.txt file.
-
Hi there,
Rel=canonical tags tell robots which page, out of many, is the one to index.
For SEOs, canonicalization refers to individual web pages that can be loaded from multiple URLs. This is a problem because when multiple pages have the same content but different URLs, links that are intended to go to the same page get split up among multiple URLs. This means that the popularity of the pages gets split up. Unfortunately for web developers, this happens far too often because the default settings for web servers create this problem.
https://moz.com/learn/seo/canonicalization
I feel you have not used it correctly; check the article above and see if it helps.
Thanks,
Vijay
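Since the disagreement here is whether the canonical tags are actually in place, one way to check is to pull the `<link rel="canonical">` tags out of each filtered page and confirm that every variant declares exactly one preferred URL. This is a minimal sketch using Python's standard `html.parser`; the example.com URLs, the page snippets, and the names `CanonicalFinder`, `canonicals_in`, and `audit` are hypothetical stand-ins for real Magento filter pages, and fetching the live HTML is left out. It checks only what the pages declare, not how Rogerbot or Google treats the tags.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values keep their case.
        if tag != "link":
            return
        attributes = dict(attrs)
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_tokens and attributes.get("href"):
            self.canonicals.append(attributes["href"])


def canonicals_in(html):
    """Return every canonical href declared in one HTML document."""
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.canonicals


def audit(pages):
    """Map each URL to its single canonical target, or to a problem label."""
    report = {}
    for url, html in pages.items():
        found = canonicals_in(html)
        if not found:
            report[url] = "MISSING"
        elif len(set(found)) > 1:
            report[url] = "CONFLICTING"
        else:
            report[url] = found[0]
    return report


# Hypothetical filtered product URLs and the <head> markup each one returns.
PAGES = {
    "https://www.example.com/shoes?color=red":
        '<head><link rel="canonical" href="https://www.example.com/shoes"></head>',
    "https://www.example.com/shoes?size=9":
        '<head><link rel="canonical" href="https://www.example.com/shoes" /></head>',
    "https://www.example.com/shoes?sort=price":
        "<head><title>Shoes</title></head>",
    "https://www.example.com/shoes?page=2":
        '<head><link rel="canonical" href="https://www.example.com/shoes">'
        '<link rel="canonical" href="https://www.example.com/shoes?page=2"></head>',
}

if __name__ == "__main__":
    for url, target in audit(PAGES).items():
        print(f"{url} -> {target}")
```

With these inputs, the color and size variants resolve to https://www.example.com/shoes, the sort variant is reported as MISSING, and the page=2 variant as CONFLICTING. Pages in the last two groups are the kind that could still surface as duplicates.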
-
So I made a mistake: it isn't robots.txt that is the issue. I am getting hit with a ton of duplicate content penalties, so I figured that was it. The problem is that I have pages with rel=canonical tags that it is ignoring. Does Roger not read those?
-
Hi
I have to agree with the above: Rogerbot does listen to the robots.txt file, unlike Bing. While they are getting better, Bing frequently ignores the robots.txt file.
I've analysed quite a few server logs over the years, and Roger has always listened to the file; it's usually a mistake in the robots file.
There is an option to test your robots.txt file in GSC (Google Search Console). While this tests whether Google will crawl the page, Roger usually has the same instructions as Google.
However, if you are still fairly certain that Roger is ignoring robots.txt, please DM me your server logs and your website, and I will take a look and analyse them for you (free, of course).
Thanks
Andy
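The server-log check Andy describes can be approximated locally. This sketch assumes your server writes the common Apache/Nginx "combined" log format and that requests from Moz's crawler carry "rogerbot" in the user-agent; it flags every such request for a path your robots.txt disallows. The robots.txt, the log lines, the user-agent strings, and the `disallowed_hits` helper are all hypothetical, and Python's `urllib.robotparser` stands in for Roger's own parser, which may differ in edge cases.

```python
import re
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for the site being audited.
ROBOTS_TXT = """\
User-agent: *
Disallow: /tmp/
Disallow: /junk/
"""

# Assumes the Apache/Nginx "combined" log format:
# host ident user [time] "METHOD path PROTOCOL" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def disallowed_hits(log_lines, robots_txt, bot="rogerbot"):
    """Return (time, path) for each request by `bot` to a path robots.txt disallows."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    hits = []
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # not a combined-format line; skip it
        if bot not in match["agent"].lower():
            continue  # only audit requests that identify as this crawler
        if not parser.can_fetch(bot, match["path"]):
            hits.append((match["time"], match["path"]))
    return hits


# Hypothetical log lines (documentation IP ranges, made-up user-agent strings).
SAMPLE_LOG = [
    '203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET /products/shoes HTTP/1.1" 200 512 "-" "rogerbot/1.0 (example)"',
    '203.0.113.5 - - [10/Oct/2023:13:55:40 +0000] "GET /tmp/cache.html HTTP/1.1" 200 128 "-" "rogerbot/1.0 (example)"',
    '198.51.100.7 - - [10/Oct/2023:13:56:01 +0000] "GET /junk/old.html HTTP/1.1" 404 0 "-" "OtherBot/2.0"',
]

if __name__ == "__main__":
    for time, path in disallowed_hits(SAMPLE_LOG, ROBOTS_TXT):
        print(f"{time}  rogerbot fetched disallowed path {path}")
```

Here only the /tmp/cache.html request is flagged. Because user-agent strings can be spoofed, a flagged line is a lead to verify rather than proof; an empty result is consistent with Andy's experience that Roger obeys the file.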
-
All major search engines, as well as Moz's crawler Rogerbot and the Internet Archive, respect robots.txt, the standard “robots exclusion protocol” for communicating with web crawlers and web robots.
If you wish to exclude specific content from all search engines, you can use the following sample as a reference for blocking specific directories:
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/
However, if you want to block Moz's Rogerbot specifically from crawling certain sections of your website, you can use the following reference code to block specific areas or directories from rogerbot:
User-agent: Rogerbot
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/
I hope this helps. If you have specific questions, please feel free to respond; I will be happy to answer them.
Regards,
Vijay
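The two samples above interact in a way that often trips people up. Under the standard group-matching rule (RFC 9309, which Python's `urllib.robotparser` also follows), a crawler obeys only the most specific `User-agent` group that matches it, so a `rogerbot` group replaces the `*` rules for Roger rather than adding to them. This is a local sketch that assumes Rogerbot applies that standard rule; it parses a hypothetical robots.txt and reports what each agent may fetch. `ROBOTS_TXT`, `check`, and the paths are illustrative names, not Moz tooling.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: a catch-all group plus a rogerbot-specific group.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/

User-agent: rogerbot
Disallow: /junk/
"""


def check(robots_txt, agents, paths):
    """Return {(agent, path): allowed?} for every agent/path combination."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {
        (agent, path): parser.can_fetch(agent, path)
        for agent in agents
        for path in paths
    }


if __name__ == "__main__":
    results = check(ROBOTS_TXT, ["rogerbot", "googlebot"],
                    ["/tmp/cache.html", "/junk/old.html"])
    for (agent, path), allowed in sorted(results.items()):
        print(f"{agent:10} {path:18} {'allowed' if allowed else 'blocked'}")
```

In this sketch rogerbot may fetch /tmp/cache.html, because its own group only disallows /junk/, while googlebot falls back to the `*` group and is blocked from both paths. If you add a rogerbot group, repeat in it every rule you still want Roger to obey. Python's parser uses plain prefix matching and ignores `*` and `$` wildcards inside paths, so it is no substitute for the search engines' own testers when your file uses them.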
-
Hi there! Moz's crawler, rogerbot, does follow robots.txt. When he appears not to be following robots.txt, it's usually because the robots.txt file is formatted improperly. Learn more about formatting your robots.txt file here: https://moz.com/learn/seo/robotstxt
For more information on Roger, including how to block him, head here: https://moz.com/help/guides/moz-procedures/what-is-rogerbot
And if you want to test your formatting, try the Robots Checker here: https://support.google.com/webmasters/answer/6062598
If you're still unable to determine why rogerbot is crawling your site, feel free to write in to help@moz.com!
Related Questions
-
Why is Moz so bad at finding lost backlinks? (Unsolved)
-
Why is Moz not crawling my backlinks and keywords?
Hello all, I want to know why the Moz crawler has not detected my backlinks and keywords for the last 3-4 months, not even a single one. Other SEO crawlers that are less popular than Moz are detecting my keywords and backlinks. Is there any option to force the Moz bot to crawl my whole site?
Link Explorer | | loknath.tcms1 -
What steps should I follow after submitting a disavow link list?
Hey Moz team, this is Devesh Srivastava; you can call me Dev.
Recently there was an update from your side, and it was pretty good. I have analyzed the model factors of Moz 2.0. Though my DA score has dropped from 52 to 36, I would still say the update was really necessary. I have some questions that need clarification, or rather a proper solution.
As of now, I have removed all the spam links that were pointing to my website (https://edugorilla.com) and submitted the disavow list to GSC, but I would like to know how I should begin my link building strategy, keeping everything in mind. Now that I have removed those spam links, how would Moz know (or how should I tell Moz or Google) that they have been removed, and how soon can I get back to my previous DA score? I am very confused about link building strategy now. What steps should I follow after submitting a disavow list, and will Moz stats change after Google disavows those links? Can anyone answer my question?
Link Explorer | | nikh26
-
Is there any study or detailed report on how Moz and Google calculate Domain Authority?
I want to know whether there is any detailed report or study on how Google and Moz calculate Domain Authority. Mainly, I would like to know: 1. How does Google calculate Domain Authority, and how does Moz? 2. Is there a correlation in the majority of cases, or do they differ vastly?
Link Explorer | | sssnee20 -
How long does it take Moz to show the metrics of a new domain?
I want to know the metrics of this site, "Clash of lights". It's a new domain; how can I find its metrics, like DA and PA, in Moz? Please guide me through it.
Link Explorer | | Muhammadahamd0 -
Internal Equity-Passing Links not getting crawled in Moz Open Site Explorer?
Internal Equity-Passing Links not getting crawled in Moz Open Site Explorer. What is the cause of this? We've checked the robots.txt and htaccess file, but so far we can't find anything that would be blocking Moz from crawling the internal links. We manage loads of other clients on this platform and this is the first time we've run into this issue. What else can I check?
Link Explorer | | OozleMedia0 -
Why does Moz Pro detect nonexistent links?
I have a campaign in Moz Pro for my personal webpage, for testing purposes and also a bit of learning. But I have a question: under Link -> Link analysis I can see this:
http://maqui.darkbolt.net/project/chat/index.php 404
http://maqui.darkbolt.net/project/docs/index.php 404
http://maqui.darkbolt.net/project/down/index.php 404
http://maqui.darkbolt.net/project/foto/index.php 404
http://maqui.darkbolt.net/project/news/index.php?news=1 404
http://maqui.darkbolt.net/project/project/index.php 404
http://maqui.darkbolt.net/project/ro/index.php 404
http://maqui.darkbolt.net/project/who/index.php 404
Obviously none of these addresses exist. There are links on the page project/index.php linking to, for example, /chat/index.php. How can I resolve this problem in the stats? Is there really something wrong with the page? As far as I can see, all links on the page are working properly.
Link Explorer | | Er_Maqui0 -
Do 'Just Discovered' Links get added to the main link index?
Hi, I was wondering whether the 'Just Discovered' links get added to the main link crawl index. It would seem to make sense for them to do so, as this would enable the link index to be more up to date than it would otherwise be. Observing the link index, it seems that at the moment this does not happen and they are totally separate indexes (based on personal observation). Thanks
Link Explorer | | James770