Moz & Xenu Link Sleuth unable to crawl a website (403 error)
-
It could be that I am missing something really obvious; however, we are getting the following error when we try to use the Moz tool on a client website. (I have read through a few posts on 403 errors, but none appear to describe the same problem as this.)
Moz Result
Title: 403 : Error
Meta Description: 403 Forbidden
Meta Robots: Not present/empty
Meta Refresh: Not present/empty
Xenu Link Sleuth Result
Broken links, ordered by link:
error code: 403 (forbidden request), linked from page(s):
Thanks in advance!
-
Hey Liam,
Thanks for following up. Unfortunately, we use thousands of dynamic IPs through Amazon Web Services to run our crawler and the IP would change from crawl to crawl. We don't even have a set range for the IPs we use through AWS.
As for throttling, we don't have a set throttle. We try to space out the server hits enough to not bring down the server, but then hit the server as often as necessary in order to crawl the full site or crawl limit in a reasonable amount of time. We try to find a balance between hitting the site too hard and having extremely long crawl times. If the devs are worried about how often we hit the server, they can add a crawl delay of 10 to the robots.txt to throttle the crawler. We will respect that delay.
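As a rough illustration of the crawl delay mentioned above, the sketch below parses a sample robots.txt with Python's standard `urllib.robotparser` and reads back the delay a crawler would see. The sample file, the `User-agent: rogerbot` group, and the helper name `crawl_delay_for` are assumptions made for this example; they are not the client's actual configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: a 10-second crawl delay for rogerbot only.
ROBOTS_TXT = """\
User-agent: rogerbot
Crawl-delay: 10

User-agent: *
Disallow: /cgi-bin
"""


def crawl_delay_for(robots_txt, user_agent):
    """Return the Crawl-delay that applies to user_agent, or None if unset."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


if __name__ == "__main__":
    print("rogerbot delay:", crawl_delay_for(ROBOTS_TXT, "rogerbot"))
    print("other bot delay:", crawl_delay_for(ROBOTS_TXT, "otherbot"))
```

Putting the directive in a crawler-specific group, as assumed here, would slow that one crawler without changing the rules for anyone else.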
If the devs use Moz, as well, they would also be getting a 403 on their crawl because the server is blocking our user agent specifically. The server would give the same status code regardless of who has set up the campaign.
I'm sorry this information isn't more specific. Please let me know if you need any other assistance.
Chiaryn
-
Hi Chiaryn
The saga continues... this is the response my client got back from the developers - please could you let me have the answers to the two questions?
Apparently, as part of their ‘SAF’ (?) protocols, if the IT director sees a big spike in 3rd-party products trawling the site he will block them! They did say that they use Moz too. What they’ve asked me to get from Moz is:
- Moz IP address/range
- Level of throttling they will use
I would question why they need these answers if THEY USE MOZ themselves, but if I go back with that I will be going around in circles - any chance of letting me know the answer(s)?
Thanks in advance.
Liam
-
Awesome - thank you.
Kind Regards
Liam
-
Hey There,
The robots.txt shouldn't really affect 403s; you would actually get a "blocked by robots.txt" error if that was the cause. Your server is basically telling us that we are not authorized to access your site. I agree with Mat that we are most likely being blocked in the .htaccess file. It may be that your server is flagging our crawler and Xenu's crawler as troll crawlers or something along those lines. I ran a test on your URL using a non-existent crawler, Rogerbot with a capital R, and got a 200 status code back, but when I ran the test with our real crawler, rogerbot with a lowercase r, I got the 403 error (http://screencast.com/t/Sv9cozvY2f01). This tells me that the server is specifically blocking our crawler, but not all crawlers in general.
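The user-agent test described above can be sketched in Python roughly as follows. `status_for` sends a GET request with a chosen User-Agent header and returns the status code. Because the client's server and its blocking rule are not known, `BlockingHandler` is a local stand-in server (an assumption for this example) that refuses any agent string containing lowercase "rogerbot"; `demo` is likewise a hypothetical helper.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit


def status_for(url, user_agent):
    """Return the HTTP status code the server sends for this User-Agent."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        conn.request("GET", parts.path or "/", headers={"User-Agent": user_agent})
        return conn.getresponse().status
    finally:
        conn.close()


class BlockingHandler(BaseHTTPRequestHandler):
    """Stand-in server (assumption): 403 for agents containing 'rogerbot'."""

    def do_GET(self):
        agent = self.headers.get("User-Agent", "")
        # Case-sensitive match, so 'Rogerbot' slips through.
        self.send_response(403 if "rogerbot" in agent else 200)
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo output quiet


def demo():
    """Probe the stand-in server with both spellings of the agent name."""
    server = HTTPServer(("127.0.0.1", 0), BlockingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = f"http://127.0.0.1:{server.server_port}/"
    try:
        return {ua: status_for(url, ua) for ua in ("rogerbot", "Rogerbot")}
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    for agent, status in demo().items():
        print(agent, status)
```

Pointing `status_for` at a real URL with a few different agent strings should show whether a block is tied to the user agent; if every agent gets a 403, the cause is more likely an IP rule or something else.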
I hope this helps. Let me know if you have any other questions.
Chiaryn
Help Team Ninja
-
Hi Mat
Thanks for the reply - robots.txt file is as follows:
## The following are infinitely deep trees
User-agent: *
Disallow: /cgi-bin
Disallow: /cms/events
Disallow: /cms/latest
Disallow: /cms/cookieprivacy
Disallow: /cms/help
Disallow: /site/services/megamenu/
Disallow: /site/mobile/
I can't get access to the .htaccess file at present (we're not the developers)
Anyone else any thoughts? Weirdly I can get Screaming Frog info back on the site :-/
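One way to sanity-check the rules quoted above is to feed them to Python's standard `urllib.robotparser` and ask which paths a crawler may fetch. This is only a sketch: the helper name `is_allowed` and the test paths are made up for the example, and Python's parser may not match every crawler's interpretation of the file.

```python
from urllib.robotparser import RobotFileParser

# The rules as quoted in the post above.
ROBOTS_TXT = """\
## The following are infinitely deep trees
User-agent: *
Disallow: /cgi-bin
Disallow: /cms/events
Disallow: /cms/latest
Disallow: /cms/cookieprivacy
Disallow: /cms/help
Disallow: /site/services/megamenu/
Disallow: /site/mobile/
"""


def is_allowed(user_agent, path):
    """Return True if the quoted rules let user_agent fetch path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for path in ("/", "/cms/events/2014", "/site/mobile/"):
        print(path, is_allowed("rogerbot", path))
```

Under these rules the homepage is not disallowed for any agent, so the robots.txt on its own would not account for a crawler being refused at the front door.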
-
403s are tricky to diagnose because they, by their very nature, don't tell you much. They're sort of the server equivalent of just shouting "NO!".
You say Moz & Xenu are receiving the 403. I assume that it loads properly from a browser.
I'd start by looking at the .htaccess. Any odd deny statements in there? It could be that an IP range or user agent is blocked. Some people like to block common crawlers (not calling Roger names there). Check the robots.txt whilst you are there, although that shouldn't really return a 403.