Moz & Xenu Link Sleuth unable to crawl a website (403 error)
-
It could be that I am missing something really obvious however we are getting the following error when we try to use the Moz tool on a client website. (I have read through a few posts on 403 errors but none that appear to be the same problem as this)
Moz Result
Title 403 : Error
Meta Description 403 Forbidden
Meta Robots_Not present/empty_
Meta Refresh_Not present/empty_
Xenu Link Sleuth Result
Broken links, ordered by link:
error code: 403 (forbidden request), linked from page(s): Thanks in advance!
-
Hey Liam,
Thanks for following up. Unfortunately, we use thousands of dynamic IPs through Amazon Web Services to run our crawler and the IP would change from crawl to crawl. We don't even have a set range for the IPs we use through AWS.
As for throttling, we don't have a set throttle. We try to space out the server hits enough to not bring down the server, but then hit the server as often as necessary in order to crawl the full site or crawl limit in a reasonable amount of time. We try to find a balance between hitting the site too hard and having extremely long crawl times. If the devs are worried about how often we hit the server, they can add a crawl delay of 10 to the robots.txt to throttle the crawler. We will respect that delay.
If the devs use Moz, as well, they would also be getting a 403 on their crawl because the server is blocking our user agent specifically. The server would give the same status code regardless of who has set up the campaign.
I'm sorry this information isn't more specific. Please let me know if you need any other assistance.
Chiaryn
-
Hi Chiaryn
The sage continues....this is the response my client got back from the developers - please could you let me have the answers to the two questions?
Apparently as part of their ‘SAF’ (?) protocols, if the IT director sees a big spike in 3<sup>rd</sup> party products trawling the site he will block them! They did say that they use moz too. What they’ve asked me to get from moz is:
- Moz IP address/range
- Level of throttling they will use
I would question that if THEY USE MOZ themselves why would they need these answers but if I go back with that I will be going around in circles - any chance of letting me know the answer(s)?
Thanks in advance.
Liam
-
Awesome - thank you.
Kind Regards
Liam
-
Hey There,
The robots.txt shouldn't really affect 403s; you would actually get a "blocked by robots.txt" error if that was the cause. Your server is basically telling us that we are not authorized to access your site. I agree with Mat that we are most likely being blocked in the htaccess file. It may be that your server is flagging our crawler and Xenu's crawler as troll crawlers or something along those lines. I ran a test on your URL using a non-existent crawler, Rogerbot with a capital R, and got a 200 status code back but when I run the test with our real crawler, rogerbot with a lowercase r, I get the 403 error (http://screencast.com/t/Sv9cozvY2f01). This tells me that the server is specifically blocking our crawler, but not all crawlers in general.
I hope this helps. Let me know if you have any other questions.
Chiaryn
Help Team Ninja -
Hi Mat
Thanks for the reply - robots.txt file is as follows:
## The following are infinitely deep trees User-agent: * Disallow: /cgi-bin Disallow: /cms/events Disallow: /cms/latest Disallow: /cms/cookieprivacy Disallow: /cms/help Disallow: /site/services/megamenu/ Disallow: /site/mobile/ I can't get access to the .htaccess file at present (we're not the developers) Anyone else any thoughts? Weirdly I can get Screaming Frog info back on the site :-/
-
403s are tricky to diagnose because they, by their very nature, don't tell you much. They're sort of the server equivalent of just shouting "NO!".
You say Moz & Xenu are receiving the 403. I assume that it loads properly from a browser.
I'd start looking at the .htaccess . Any odd deny statements in there? It could be that an IP range or user agent is blocked. Some people like to block common crawlers (Not calling Roger names there). Check the robots.txt whilst you are there, although that shouldn't return a 403 really.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Looking for an SEO & MOZ Pro to add to my team!
Hi community, I'm a long-time MOZer and SEO director for a San Francisco-based digital marketing company providing complete end-to-end online growth solutions...and I'M HIRING! If you've got at least 3 years SEO experience and know MOZ inside and out, please reach out to me.
Moz Pro | | upgrowsf0 -
New Website & DA Score
We are preparing to build a new website for a branch of our company. According to Neil Patel (whom I highly respect) our DA should come in around 30 (categorizing us as a startup, see chart below). So I ask: Why do our current sites DA Score below 30? (20 & 29 respectively) Our sites were redesigned in 2013 and are responsive, pass the Google Mobile Friendly test and I've improved load speed by over 25% by compressing images. I have used Moz as guidance for 10 months on both 7 + yo sites, HP.com is 14 yo, our pages score in the high 90's and I have disavowed bad backlinks for 6 months (now only every other month). I thought we would be in or close to the 50 range by 2017 after a solid year of improvements, but I don't see that happening. Our scores go up a point or two, then back down, we have floundered 2 points either way for 10 months. Most important question: What do I do now to raise our DA? Lastly, my plan is to take all the lessons learned from the past years and apply it to the new site, measure the success and then redesign the other two sites in accordance to the new model. But that may be a year away. KJr www.heritageprinting.com www.heritageprintingcharlotte.com | DA | Rating |
Moz Pro | | KevnJr
| 1-10 | Poor – Your site is young and weak. You have a lot of growing to do. |
| 11-20 | Decent. Your site isn’t stellar, but you’re doing better. It would be good to grow. |
| 21-30 | Fair. Your site shows signs of SEO, but there are many things you can and should do to improve. |
| 31-40 | Competitive. A lot of startups find themselves in this DA range. It’s not bad, and you’re beginning to get close to the sweet spot. |
| 41-50 | Good. Now, you’re getting somewhere. This is a nice place to be, and many good e-commerce sites find themselves squarely in this category. |
| 51-60 | Strong. As you swing out of the lower half of the scale, you’re beginning to get much healthier. This is a good place to be. |
| 61-70 | Excellent. A DA at this level represents a great site with a lot of recognition, a lot of link backs, and a considerable authority in its niche. Many .edus are in this space. |
| 71-80 | Outstanding. You’re dominating in the SERPs and owning your niche. Quick Sprout is a 73. |
| 81-90 | Very outstanding. You’re in the upper echelons of authority. You can consider yourself to have arrived. |
| 91-100 | Rare. These sites are household names — Wikipedia, Facebook, New York Times, etc. Your site will probably never attain this level. Only a miniscule fraction of a percentage of sites on the Internet ever get this high. | https://www.quicksprout.com/2014/04/09/how-to-score-your-websites-seo-in-10-minutes-or-less/0 -
Crawl diagnostics up to date after Magento ecommerce site crawl?
Howdy Mozzers, I have a Magento ecommerce website and I was wondering if the data (errors/warnings) from the Crawl diagnostics are up to date. My Magento website has 2.439 errors, mainly 1.325 duplicate page content and 1.111 duplicate page title issues. I already implemented the Yoast meta data plugin that should fix these issues, however I still see there errors appearing in the crawl diagnostics, but when going to the mentioned URL in the crawl diagnostics for e.g.: http://domain.com/babyroom/productname.html?dir=desc&targetaudience=64&order=name and checking the source code and searching for 'canonical' I do see: http://domain.com/babyroom/productname.html" />. Even I checked the google serp for url: http://domain.com/babyroom/productname.html?dir=desc&targetaudience=64&order=name and I couldn't find the url indexed in Google. So it basically means the Yoast meta plugin actually worked. So what I was wondering is why I still see the error counted in the crawl diagnostics? My goal is to remove all the errors and bring it all to zero in the crawl diagnostics. And now I am still struggling with the "overly-dynamic URL" (1.025) and "too many on-page links" (9.000+) I want to measure whether I can bring the warnings down after implementing an AJAX-based layered navigation. But if it's not updating it here crawl diagnostics I have no idea how to measure the success of eliminating the warnings. Thanks for reading and hopefully you all can give me some feedback.
Moz Pro | | videomarketingboys0 -
When I did my first crawl, I was given some errors.
Do I then need to re-crawl to make sure the errors were fixed accordingly?
Moz Pro | | immortalgamer0 -
Seo moz has only crawled 2 pages of my site. Ive been notified of a 403 error and need an answer as to why my pages are not being crawled?
SEO Moz has only crawled 2 pages of my clients site. I have noticed the following. A 403 error message screaming frog also cannot crawl the site but IIS can. Due to the lack of crawling ability, im getting no feed back on my on page optimization rankings or crawl diagnostics summary, so my competitive analysis and optimization is suffering Anybody have any idea as to what needs to be done to rectify this issue as access to the coding or cms platform is out of my hands. Thank you
Moz Pro | | nitro-digital0 -
Error on SEOMoz When Trying to Track Website. Please Advise
Hi, I'm trying to start a new campaign for a root domain, but I'm getting the "Roger found an error" and am not sure what to make of it. Error #1: "You've decided to set up a root domain campaign, but entered the subdomain path: www.siteurl.com. Don't worry, we'll switch that for you and crawl everything on the subdomain: www.siteurl.com. If you meant to set this up to only crawl pages in the root domain, click 'Go back and Change" and enter a root domain URL in step 1." Error #2: "Oops! The root domain siteurl.com redirects to a domain that is not within the specified root domain (www.siteurl.com). This will cause us to stop crawling as the first discovered page falls outside of the root domain you've defined. Please make sure you enter a root domain that resolves to a page that is under the root domain." What does this mean? Is there something I am doing wrong? The first error is what returned when I input www.siteurl.com. The second was returned when I put just siteurl.com. I didn't put up the exact URL for privacy reasons, but if you really do want to help me out, PM me and I can give you the real URL. Thanks in advance!
Moz Pro | | locallyrank0 -
How do you get Mozbot to crawl your website
I trying to get the mozbot to crawl my site so I can get new crawl diagnostics info. Anyone know how this can be done?
Moz Pro | | Romancing0