Why doesn't the Moz crawler follow robots.txt?
-
It is crawling the entire site, and there is stuff we do not want it to. Please advise.
-
Which I am ok with, but why am I getting duplicate content?
-
Yes, it doesn't tell them which pages not to crawl - just not to index them
-
It has been used correctly. The site is a Magento site and they have it built in. There are a lot of filters for products so it uses rel=canonical to tell Google which to index.
-
rel=canonical is not really a robots instruction. It is there to help with duplicate content: where you have the same or similar pages, you're telling search engines which page is the preferred one.
If you don't want pages crawled, you have to tell search engines in the robots.txt file.
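To make the difference concrete: a crawler only sees a canonical tag after it has already fetched the page, so the tag cannot stop the crawl. Below is a minimal Python sketch of reading that tag from a page's HTML, using only the standard library. The `CanonicalFinder` class, the `find_canonical` helper, and the example.com URLs are made up for illustration; this is not how rogerbot or any search engine actually does it.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> tag it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Only <link> tags matter, and only the first canonical is kept.
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        # rel can hold several space-separated values; compare case-insensitively.
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attr_map.get("href")


def find_canonical(html: str) -> Optional[str]:
    """Return the canonical URL declared in the HTML, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Hypothetical filtered product page that points back to the unfiltered URL.
FILTERED_PAGE = """<html><head>
<link rel="canonical" href="https://example.com/shoes">
</head><body>Shoes, filtered by colour</body></html>"""

if __name__ == "__main__":
    print(find_canonical(FILTERED_PAGE))
```

Running this against the HTML of a filtered URL is a quick way to confirm the tag is present and points where you expect.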
-
Hi There,
Rel=canonical tags tell robots which page, out of many, is the one to index.
For SEOs, canonicalization refers to individual web pages that can be loaded from multiple URLs. This is a problem because when multiple pages have the same content but different URLs, links that are intended to go to the same page get split up among multiple URLs. This means that the popularity of the pages gets split up. Unfortunately for web developers, this happens far too often because the default settings for web servers create this problem.
https://moz.com/learn/seo/canonicalization
I feel you have not used it correctly. Check the above article and see if it helps.
Thanks,
Vijay
-
So I made a mistake: it isn't robots.txt that is the issue. I am getting hit with a ton of duplicate content penalties, so I figured that was it. The problem is that I have pages with rel=canonical tags that it is ignoring. Does Roger not read those?
-
Hi
Have to agree with the above. Rogerbot does listen to the robots.txt file, unlike Bing; while they are getting better, Bing ignores the robots.txt file frequently.
I've analysed quite a few server logs over the years and Roger has always listened to the file. It's usually a mistake in the robots.txt file.
There is an option to test your robots.txt file in GSC (Google Search Console). While this tests whether Google will crawl the page, Roger usually follows the same instructions as Google.
However, if you are still pretty certain that Roger is ignoring robots.txt, please DM me your server logs and your website and I will take a look and analyse it for you (free, of course).
Thanks
Andy
-
All major search engines, including Moz's crawler Rogerbot and the Internet Archive, respect robots.txt, the standard “robots exclusion protocol” for communicating with web crawlers and web robots.
If you wish to exclude specific content from all search engines, you can use the following sample code as a reference to block specific directories.
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/
However, if you want to block Moz's Rogerbot specifically from crawling certain sections of your website, you can use the following reference code to block specific areas / directories from rogerbot:
User-agent: Rogerbot
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/
I hope this helps. If you have specific questions, please feel free to respond and I will be happy to answer them.
Regards,
Vijay
-
Hi there! Moz's crawler, rogerbot, does follow robots.txt. When he's not following robots.txt, it's usually because the robots.txt file is formatted improperly. Learn more about formatting your file here: https://moz.com/learn/seo/robotstxt
For more information on Roger, including how to block him, head here: https://moz.com/help/guides/moz-procedures/what-is-rogerbot
And if you want to test your formatting, try the Robots Checker here: https://support.google.com/webmasters/answer/6062598
If you're still unable to determine why rogerbot is crawling your site, feel free to write in to help@moz.com!
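For a quick local sanity check before writing in, here is a minimal Python sketch that feeds robots.txt rules to the standard library's `urllib.robotparser` and asks whether a given user agent may fetch a URL. Keep in mind this is Python's own parser, not rogerbot's, so treat the result as an approximation. The sample rules, the `is_allowed` helper, and the example.com URLs are placeholders for illustration.

```python
from urllib.robotparser import RobotFileParser

# Sample rules (assumed for illustration): every crawler is kept out of
# /cgi-bin/, and rogerbot is additionally kept out of /tmp/ and /junk/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/

User-agent: rogerbot
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the rules in robots_txt let user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/tmp/page.html", "/products/"):
        url = "https://example.com" + path
        print(path, is_allowed(ROBOTS_TXT, "rogerbot", url))
```

Paste your real robots.txt into `ROBOTS_TXT` and try the URLs that are being crawled unexpectedly. If the check reports them as allowed, the problem is most likely in the file rather than in the crawler.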