Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Moz & Xenu Link Sleuth unable to crawl a website (403 error)
-
It could be that I am missing something really obvious however we are getting the following error when we try to use the Moz tool on a client website. (I have read through a few posts on 403 errors but none that appear to be the same problem as this)
Moz Result
Title 403 : Error
Meta Description 403 Forbidden
Meta Robots_Not present/empty_
Meta Refresh_Not present/empty_
Xenu Link Sleuth Result
Broken links, ordered by link:
error code: 403 (forbidden request), linked from page(s): Thanks in advance!
-
Hey Liam,
Thanks for following up. Unfortunately, we use thousands of dynamic IPs through Amazon Web Services to run our crawler and the IP would change from crawl to crawl. We don't even have a set range for the IPs we use through AWS.
As for throttling, we don't have a set throttle. We try to space out the server hits enough to not bring down the server, but then hit the server as often as necessary in order to crawl the full site or crawl limit in a reasonable amount of time. We try to find a balance between hitting the site too hard and having extremely long crawl times. If the devs are worried about how often we hit the server, they can add a crawl delay of 10 to the robots.txt to throttle the crawler. We will respect that delay.
If the devs use Moz, as well, they would also be getting a 403 on their crawl because the server is blocking our user agent specifically. The server would give the same status code regardless of who has set up the campaign.
I'm sorry this information isn't more specific. Please let me know if you need any other assistance.
Chiaryn
-
Hi Chiaryn
The sage continues....this is the response my client got back from the developers - please could you let me have the answers to the two questions?
Apparently as part of their ‘SAF’ (?) protocols, if the IT director sees a big spike in 3<sup>rd</sup> party products trawling the site he will block them! They did say that they use moz too. What they’ve asked me to get from moz is:
- Moz IP address/range
- Level of throttling they will use
I would question that if THEY USE MOZ themselves why would they need these answers but if I go back with that I will be going around in circles - any chance of letting me know the answer(s)?
Thanks in advance.
Liam
-
Awesome - thank you.
Kind Regards
Liam
-
Hey There,
The robots.txt shouldn't really affect 403s; you would actually get a "blocked by robots.txt" error if that was the cause. Your server is basically telling us that we are not authorized to access your site. I agree with Mat that we are most likely being blocked in the htaccess file. It may be that your server is flagging our crawler and Xenu's crawler as troll crawlers or something along those lines. I ran a test on your URL using a non-existent crawler, Rogerbot with a capital R, and got a 200 status code back but when I run the test with our real crawler, rogerbot with a lowercase r, I get the 403 error (http://screencast.com/t/Sv9cozvY2f01). This tells me that the server is specifically blocking our crawler, but not all crawlers in general.
I hope this helps. Let me know if you have any other questions.
Chiaryn
Help Team Ninja -
Hi Mat
Thanks for the reply - robots.txt file is as follows:
## The following are infinitely deep trees User-agent: * Disallow: /cgi-bin Disallow: /cms/events Disallow: /cms/latest Disallow: /cms/cookieprivacy Disallow: /cms/help Disallow: /site/services/megamenu/ Disallow: /site/mobile/ I can't get access to the .htaccess file at present (we're not the developers) Anyone else any thoughts? Weirdly I can get Screaming Frog info back on the site :-/
-
403s are tricky to diagnose because they, by their very nature, don't tell you much. They're sort of the server equivalent of just shouting "NO!".
You say Moz & Xenu are receiving the 403. I assume that it loads properly from a browser.
I'd start looking at the .htaccess . Any odd deny statements in there? It could be that an IP range or user agent is blocked. Some people like to block common crawlers (Not calling Roger names there). Check the robots.txt whilst you are there, although that shouldn't return a 403 really.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
WEbsite cannot be crawled
I have received the following message from MOZ on a few of our websites now Our crawler was not able to access the robots.txt file on your site. This often occurs because of a server error from the robots.txt. Although this may have been caused by a temporary outage, we recommend making sure your robots.txt file is accessible and that your network and server are working correctly. Typically errors like this should be investigated and fixed by the site webmaster. I have spoken with our webmaster and they have advised the below: The Robots.txt file is definitely there on all pages and Google is able to crawl for these files. Moz however is having some difficulty with finding the files when there is a particular redirect in place. For example, the page currently redirects from threecounties.co.uk/ to https://www.threecounties.co.uk/ and when this happens, the Moz crawler cannot find the robots.txt on the first URL and this generates the reports you have been receiving. From what I understand, this is a flaw with the Moz software and not something that we could fix form our end. _Going forward, something we could do is remove these rewrite rules to www., but these are useful redirects and removing them would likely have SEO implications. _ Has anyone else had this issue and is there anything we can do to rectify, or should we leave as is?
Moz Pro | | threecounties0 -
What is Linking C-Blocks
Currently i am using MOZ pro tool under moz analyticls >> Moz Competitive Link Metrics >> history having a graph "Linking C-Blocks" Please help me understanding Linking C-Blocks, what is, How to build, how to define ...
Moz Pro | | shankar3334 -
Whether or not to remove a link from a website with high spam score on Open Site Explorer
Hello Moz! I just subscribed for your Moz Pro program. Amazing stuff! On open site explorer, I found a number of links to my site from a page called with a very high page authority and high domain authority, but also a high spam score (8 or 9, one with a 10). I say multiple spam scores, because it's strange, there are what appears variations of the same url, and each one is considered a link. For instance, there's an abc.linkstomysite.com and xyz.linktomysite.com, and 123.linktomysite.com... there are about 15 of these (all with the spam scores mentioned above)! This must have been some old SEO work done I payed for back in the prehistoric SEO days. However, my fear is the following: Removing these links, and then losing some potentially strong link juice. I don't have many high DA or PA links to my site, and these are some major ones. The domain in question "linktomysite.com", when entered into OSE, only has a spam score of 4, and it has a domain authority of 45 and page authority of 37. My site has a spam score of 2 and no messages from google regarding a penalty, but an overall reduction in google traffic over the years (just keeps slowly dropping... as if a weight is pulling me down?) What do you think, should I leave, or remove? The linkstomysite page is just a LONG page full of links, with short descriptions, nothing of value, but with a an old domain age (relatively). Most important for me is keeping at least some ranking/visibility, while I personally work on building quality links and helpful content. thanks!
Moz Pro | | DavidC.0 -
Why did my DA and Links go down?
How could I begin to research this to find out why we have less total links and a lower domain authority. Our DA has gone down two points and our total links dropped by a few thousand. I'd like to find out why that happened but I'm just not sure where to start. Thanks!
Moz Pro | | Sika220 -
Moz WordPress Plugin?
WordPress is currently 18% of the Internet. Given its huge footprint, wouldn't it make sense for Moz to develop a WP plugin that can not only report site metrics, but help fix and optimize site structure directly from within the site? Just curious - I can't be the only one who wonders if I'm implementing Moz findings/recommendations correctly given the myriad of WP SEO plugins, authors, implementations.
Moz Pro | | twelvetwo.net5 -
Why are inbound links not showing?
I run the site http://www.eurocheapo.com and am finding that many inbound links are not showing up in OSE and on the toolbar. For example, check out this hotel review: http://www.eurocheapo.com/paris/hotel/hotel-esmeralda.html In OSE it shows only 2 links (from 1 domain), which is crazy. It has dozens of inbound links from many different domains (links:http://www.eurocheapo.com/paris/hotel/hotel-esmeralda.html). I notice this all over my site. Pages that we link between are also showing no internal links -- which is easy to disprove. Was there a problem with this crawl? Or is the problem in our code? Many thanks for your help, Tom
Moz Pro | | TomNYC0 -
How long does a crawl take?
A crawl of my site started on the 8th July & is still going on - is there something wrong???
Moz Pro | | Brian_Worger1 -
Error 403
I'm getting this message "We were unable to grade that page. We received a response code of 403. URL content not parseable" when using the On-Page Report Card. Does anyone know how to go about fixing this? I feel like I've tried everything.
Moz Pro | | Sean_McDonnell0