Why doesn't Moz crawler follow robots.txt?
-
It is crawling the entire site, and there is stuff we do not want it to crawl. Please advise.
-
Which I am ok with, but why am I getting duplicate content?
-
Yes, it doesn't tell them which pages not to crawl, just which ones not to index.
-
It has been used correctly. The site runs on Magento, which has it built in. There are a lot of product filters, so it uses rel=canonical to tell Google which page to index.
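To illustrate what "uses rel=canonical to tell Google which to index" means for filtered product URLs: every filtered or sorted variant of a category page should point its canonical tag at one clean URL. The sketch below assumes the filters are passed as query-string parameters; the parameter names in `FILTER_PARAMS` and the `shop.example.com` URLs are hypothetical, so substitute the ones your own store generates.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical filter/sort parameter names; replace with the ones your store uses.
FILTER_PARAMS = {"color", "size", "price", "dir", "order", "limit", "mode"}


def canonical_for(url: str) -> str:
    """Drop filter/sort parameters so every filtered variant maps to one URL."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in FILTER_PARAMS
    ]
    # Rebuild the URL without the filter parameters and without any fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


variants = [
    "https://shop.example.com/shoes.html",
    "https://shop.example.com/shoes.html?color=red",
    "https://shop.example.com/shoes.html?color=red&size=9&dir=asc",
]
# All three variants collapse to a single canonical URL.
print({canonical_for(u) for u in variants})  # {'https://shop.example.com/shoes.html'}
```

Parameters not in the list (a page number, for example) are kept; whether paginated pages should instead canonicalize to page one is a separate decision.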
-
rel=canonical is not really a robots instruction. It is meant to help with duplicate copy: where you have the same or similar pages, you're telling search engines which page is the preferred one.
If you don't want pages crawled, you have to tell search engines in the robots.txt file.
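The difference shows up in where each instruction lives. robots.txt is consulted before a URL is fetched, while a rel=canonical tag sits inside the page's HTML, so a crawler has to download the page before it can see the tag at all. The sketch below, using only Python's standard `html.parser`, shows how a crawler might read that tag from an already-fetched page; `find_canonical` and the sample page are illustrative and are not Rogerbot's actual code.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        # rel may hold several space-separated tokens, so split before checking.
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_tokens and attributes.get("href"):
            self.canonical = attributes["href"]


def find_canonical(html: str) -> Optional[str]:
    """Return the canonical URL declared in an already-fetched page, if any."""
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.canonical


page = (
    "<html><head>"
    '<link rel="canonical" href="https://shop.example.com/shoes.html">'
    "</head><body>Red shoes</body></html>"
)
print(find_canonical(page))  # https://shop.example.com/shoes.html
```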
-
Hi There,
Rel=canonical tags tell robots which page, out of many, should actually be indexed.
For SEOs, canonicalization refers to individual web pages that can be loaded from multiple URLs. This is a problem because when multiple pages have the same content but different URLs, links that are intended to go to the same page get split up among multiple URLs. This means that the popularity of the pages gets split up. Unfortunately for web developers, this happens far too often because the default settings for web servers create this problem.
https://moz.com/learn/seo/canonicalization
I feel you have not used it correctly; check the above article and see if it helps.
Thanks,
Vijay
-
So I made a mistake: it isn't the robots.txt that is the issue. I am getting hit with a ton of duplicate content penalties, so I figured that was it. The problem is that I have pages with rel=canonical tags that it is ignoring. Does Roger not read those?
-
Hi
I have to agree with the above: Rogerbot does obey the robots.txt file. Bing is different; while it is getting better, Bing frequently ignores the robots.txt file.
I've analysed quite a few server logs over the years and Roger has always obeyed the file; when he seems not to, it's usually a mistake in the robots file.
There is an option to test your robots.txt file in GSC (Google Search Console). Although this tests whether Google will crawl a page, Roger is usually given the same instructions as Google.
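The kind of log check described above can be sketched in a few lines: pick out rogerbot's requests from an access log and test each requested path against the site's robots.txt. This assumes the Apache/Nginx "combined" log format and uses Python's standard `urllib.robotparser`; the log lines and the `rogerbot/1.0 (hypothetical)` user-agent string are made up for the example, so adjust the regular expression to your own log format.

```python
import re
from urllib.robotparser import RobotFileParser

# Captures the request path and the final quoted user-agent field of a
# "combined" format access-log line (an assumption about your log format).
LOG_LINE = re.compile(
    r'"[A-Z]+ (?P<path>\S+) HTTP/[\d.]+" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def disallowed_hits(log_lines, robots_txt, bot="rogerbot"):
    """Return paths that `bot` requested even though robots.txt disallows them."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    needle = bot.lower()
    hits = []
    for line in log_lines:
        match = LOG_LINE.search(line)
        if match is None or needle not in match.group("agent").lower():
            continue  # not a request line, or not from the bot being audited
        path = match.group("path")
        if not parser.can_fetch(bot, path):
            hits.append(path)
    return hits


robots_txt = "User-agent: *\nDisallow: /tmp/\n"
log_lines = [
    '1.2.3.4 - - [01/Jan/2024:00:00:00 +0000] "GET /tmp/a.html HTTP/1.1" 200 512 "-" "rogerbot/1.0 (hypothetical)"',
    '1.2.3.4 - - [01/Jan/2024:00:00:01 +0000] "GET /shoes.html HTTP/1.1" 200 512 "-" "rogerbot/1.0 (hypothetical)"',
    '5.6.7.8 - - [01/Jan/2024:00:00:02 +0000] "GET /tmp/b.html HTTP/1.1" 200 512 "-" "OtherBot/2.0"',
]
print(disallowed_hits(log_lines, robots_txt))  # ['/tmp/a.html']
```

An empty list means every rogerbot request in the log was allowed by the file as Python's parser reads it. Make sure you test against the robots.txt that was live when the requests were made.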
However, if you are still fairly certain that Roger is ignoring robots.txt, please DM me your server logs and your website, and I will take a look and analyse them for you (free of course).
Thanks
Andy
-
All major search engines, as well as crawlers such as Moz's Rogerbot and the Internet Archive's, respect robots.txt, the standard “robots exclusion protocol” used to communicate with web crawlers and web robots.
If you wish to exclude specific directories from all search engines, you can use the following sample code as a reference:
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/
However, if you want to block Moz's Rogerbot specifically from crawling certain sections of your website, you can use the following reference code to block specific areas or directories from rogerbot:
User-agent: Rogerbot
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/
I hope this helps. If you have specific questions, please feel free to respond; I will be happy to answer them.
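One thing to watch when combining the two samples: under the robots exclusion standard (RFC 9309), a crawler follows only the group whose User-agent line matches it and falls back to the `*` group only when no group names it. Adding a `User-agent: Rogerbot` group therefore replaces the `*` rules for rogerbot rather than adding to them, so any path you still want blocked must be repeated in that group. Python's standard `urllib.robotparser` selects groups the same way, which makes it a handy sanity check. The combined file and the paths below are hypothetical, and whether rogerbot resolves groups exactly like this parser is an assumption. Python also applies rules within a group in file order rather than by longest match, so results can differ from Google's for files that mix Allow and Disallow lines.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file: the generic group from the first sample plus a rogerbot group.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/

User-agent: rogerbot
Disallow: /junk/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Compare how a named bot and an unnamed bot are treated on the same paths.
for agent in ("rogerbot", "OtherBot"):
    for path in ("/tmp/report.html", "/junk/old.html", "/products/shoes.html"):
        verdict = "allowed" if parser.can_fetch(agent, path) else "blocked"
        print(f"{agent:<9} {path:<21} {verdict}")
```

Run as written, it reports rogerbot as allowed on /tmp/report.html (its own group does not mention /tmp/), while OtherBot is blocked there by the `*` group.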
Regards,
Vijay
-
Hi there! Moz's crawler, rogerbot, does follow robots.txt. When he appears not to, it's usually because the robots.txt file is formatted improperly. Learn more about formatting the file here: https://moz.com/learn/seo/robotstxt
For more information on Roger, including how to block him, head here: https://moz.com/help/guides/moz-procedures/what-is-rogerbot
And if you want to test your formatting, try the Robots Checker here: https://support.google.com/webmasters/answer/6062598
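Alongside an online tester, a rough local check can catch the formatting mistakes that usually explain an apparently disobedient crawler: misspelled field names, rules placed before any `User-agent` line, or text accidentally run onto the end of a `Disallow` line. The function below is a hypothetical, heuristic sketch, not a full parser and not a Moz or Google tool; the set of known fields is an assumption covering the common directives.

```python
# Assumed set of common directives; extend it if your file uses others.
KNOWN_FIELDS = {"user-agent", "allow", "disallow", "sitemap", "crawl-delay"}


def lint_robots(text: str) -> list[str]:
    """Return warnings for common robots.txt formatting mistakes (heuristic)."""
    warnings = []
    seen_user_agent = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and outer whitespace
        if not line:
            continue
        if ":" not in line:
            warnings.append(f"line {number}: missing ':' separator")
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field not in KNOWN_FIELDS:
            warnings.append(f"line {number}: unknown field '{field}'")
        elif field == "user-agent":
            seen_user_agent = True
        elif field in ("allow", "disallow"):
            if not seen_user_agent:
                warnings.append(f"line {number}: rule appears before any User-agent line")
            if value and not value.startswith(("/", "*")):
                warnings.append(f"line {number}: path should start with '/'")
            if " " in value:
                # Real paths are percent-encoded, so a space usually means fused text.
                warnings.append(f"line {number}: path contains a space; text may be fused onto this line")
    return warnings


sample = "User-agent: *\nDisallow: /junk/However, if you want\nDisalow: /tmp/\n"
for warning in lint_robots(sample):
    print(warning)
```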
If you're still unable to determine why rogerbot is crawling your site, feel free to write in to help@moz.com!