Odd crawl test issues

Arropa

Hi all, first post, be gentle...

Just signed up for Moz in the hope that it, and what I learn from it, will help me improve my web traffic. I have already run into a bit of woe with one of the sites we added to the tool: I cannot get the crawl test to do any actual crawling. I've tried adding the domain three times now, but the initial crawl of a few pages (the automatic one that runs when you add a domain to Pro) will not work for me.

Instead of getting a list of problems with the site, I get a list of 18 pages, each saying 'Error Code 902: Network Errors Prevented Crawler from Contacting Server'. Being a little puzzled by this, I checked the site myself: no problems. I asked several people in different locations (and countries) to try it, and they had no problems either. I ran the same site through the Raven Tools site auditor and got results; it crawled a few thousand pages. I ran it through Screaming Frog with the Googlebot user agent, and again there were no issues. I also just tried Fetch as Googlebot in Webmaster Tools, and all was fine there too.

So I'm very puzzled as to why Moz is having issues with the site when everything else is happy with it. I know the homepage takes 7 seconds to load (caching is off at the moment while we tweak the design), but according to Screaming Frog all the other pages take an average of 0.72 seconds to load.
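The checks described above (fetching pages with different user agents and timing the responses) can be scripted so the results sit side by side. This is a minimal sketch using only Python's standard library; the User-Agent strings, including the `rogerbot` placeholder for Moz's crawler, are assumptions for illustration, and a clean result from your own machine does not rule out a problem on the network path the crawler itself uses.

```python
import time
import urllib.error
import urllib.request

# Placeholder User-Agent strings for illustration. The Googlebot string is
# the commonly documented desktop one; the "rogerbot" entry is a hypothetical
# stand-in, not Moz's exact header.
USER_AGENTS = {
    "browser": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "rogerbot": "rogerbot (placeholder)",
}


def probe(url: str, user_agent: str, timeout: float = 10.0) -> tuple[str, float]:
    """Fetch url once with the given User-Agent; return (outcome, seconds)."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    start = time.monotonic()
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            response.read()  # count the body download, as a crawler would
            outcome = f"HTTP {response.status}"
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses: the server was reached and answered.
        outcome = f"HTTP {err.code}"
    except OSError as err:
        # URLError, refused connections, DNS failures and timeouts are all
        # OSError subclasses: the server could not be contacted properly.
        outcome = f"network error: {err}"
    return outcome, time.monotonic() - start


def compare(url: str, timeout: float = 10.0) -> dict[str, tuple[str, float]]:
    """Probe url once per entry in USER_AGENTS so the results can be compared."""
    return {name: probe(url, ua, timeout) for name, ua in USER_AGENTS.items()}
```

For example, `compare("http://www.example.com/")` returns one `(outcome, seconds)` pair per user agent. A network error or a much slower response for only one agent would suggest the server treats that user agent differently.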

The site runs on Magento, so we have a lengthy robots.txt, but it is not causing problems for any of the other services. The robots.txt is below.

# Google Image Crawler Setup

User-agent: Googlebot-Image
Disallow:

# Crawlers Setup

User-agent: *

# Directories

Disallow: /ajax/
Disallow: /404/
Disallow: /app/
Disallow: /cgi-bin/
Disallow: /downloader/
Disallow: /errors/
Disallow: /includes/
#Disallow: /js/
#Disallow: /lib/
Disallow: /magento/
#Disallow: /media/
Disallow: /pkginfo/
Disallow: /report/
Disallow: /scripts/
Disallow: /shell/
Disallow: /skin/
Disallow: /stats/
Disallow: /var/
Disallow: /catalog/product
Disallow: /index.php/
Disallow: /catalog/product_compare/
Disallow: /catalog/category/view/
Disallow: /catalog/product/view/
Disallow: /catalogsearch/
#Disallow: /checkout/
Disallow: /control/
Disallow: /contacts/
Disallow: /customer/
Disallow: /customize/
Disallow: /newsletter/
Disallow: /poll/
Disallow: /review/
Disallow: /sendfriend/
Disallow: /tag/
Disallow: /wishlist/
Disallow: /catalog/product/gallery/

# Files

Disallow: /cron.php
Disallow: /cron.sh
Disallow: /error_log
Disallow: /install.php
Disallow: /LICENSE.html
Disallow: /LICENSE.txt
Disallow: /LICENSE_AFL.txt
Disallow: /STATUS.txt

# Paths (no clean URLs)

#Disallow: /*.js$
#Disallow: /*.css$
Disallow: /*.php$
Disallow: /*?SID=

# Pagination

Disallow: /*?dir=
Disallow: /*&dir=
Disallow: /*?mode=
Disallow: /*&mode=
Disallow: /*?order=
Disallow: /*&order=
Disallow: /*?p=
Disallow: /*&p=

If anyone has any suggestions, I would welcome them, whether about the tool or about my robots.txt. As a side note, I'm aware that we are blocking the individual product pages. There are too many products on the site at the moment (250k plus), all with the manufacturers' default descriptions, so we have blocked them and are working on getting the category pages and guides indexed. In time we will rewrite the most popular products and unblock them as we go.
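To check which URLs rules like these actually catch (for example, that product pages under /catalog/product are blocked while category pages are not), a small matcher helps. Several of the rules rely on Google-style wildcards, `*` for any run of characters and a trailing `$` for end of URL; in the widely circulated Magento template the pagination rules are written with a leading wildcard, such as `Disallow: /*?p=`, which is assumed here. Python's `urllib.robotparser` does not implement these wildcards, so this is a hypothetical, simplified sketch: it ignores Allow rules and longest-match precedence, which real crawlers apply.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern that uses Google-style wildcards.

    '*' matches any sequence of characters; a trailing '$' anchors the
    pattern to the end of the path (including any query string).
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape everything literally, then turn each escaped '*' back into '.*'.
    body = re.escape(pattern).replace(r"\*", ".*")
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(path: str, disallow_patterns: list[str]) -> bool:
    """Return True if any Disallow pattern matches from the start of path."""
    return any(pattern_to_regex(p).match(path) for p in disallow_patterns)


# A subset of the rules from the robots.txt above, with the template's
# leading '*' wildcards assumed.
RULES = ["/catalog/product", "/*.php$", "/*?p=", "/*&p=", "/*?SID="]
```

With these rules, `is_disallowed("/catalog/product/view/id/5", RULES)` and `is_disallowed("/shoes.html?p=2", RULES)` return `True`, while a category-style path such as `/shoes.html` returns `False`.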

Many thanks

Carl

Arropa

Thanks for the hints regarding the robots.txt; I will tidy that up.

DavidLee

Network errors can occur anywhere between us and your site, not necessarily on your server itself. Your best bet would be to check with your ISP for any connectivity issues to your server. Since this is the first time these errors have been reported, the next crawl may be more successful.

One thing, though: you will want to keep each User-agent line and the directives that belong to it together in a single block, without blank lines between them.

So this:

# Crawlers Setup

User-agent: *

# Directories

Disallow: /ajax/
Disallow: /404/
Disallow: /app/

would need to look like:

# Crawlers Setup
User-agent: *
# Directories
Disallow: /ajax/
Disallow: /404/
Disallow: /app/
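The effect of a blank line inside a group depends on the parser. Python's standard-library `urllib.robotparser` follows the older record-based reading, in which a completely empty line ends the current record, so a `User-agent: *` line followed by a blank line is discarded and the `Disallow` lines after it are ignored. The sketch below feeds both variants from this reply to that parser. Other parsers (Google's, for example) may be more lenient, and whether Moz's crawler, rogerbot, behaves like Python's parser is an assumption this example does not settle.

```python
from urllib.robotparser import RobotFileParser

# The two variants from this reply: a blank line after "User-agent: *",
# and the same rules with no blank line inside the group.
SPACED = """# Crawlers Setup
User-agent: *

# Directories
Disallow: /ajax/
Disallow: /404/
Disallow: /app/
"""

COMPACT = """# Crawlers Setup
User-agent: *
# Directories
Disallow: /ajax/
Disallow: /404/
Disallow: /app/
"""


def blocks(robots_txt: str, path: str, agent: str = "rogerbot") -> bool:
    """Return True if the parsed robots.txt disallows path for agent."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines; in this implementation an empty
    # line ends the current record (comment-only lines do not).
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, "http://www.example.com" + path)


for name, text in (("spaced", SPACED), ("compact", COMPACT)):
    print(name, blocks(text, "/ajax/cart"))
# spaced False   (the Disallow lines were dropped along with the empty record)
# compact True
```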

Arropa

Many thanks for the reply. We use a dedicated server, which we set up ourselves, including the OS and control panel. It just seems very odd that every other tool works fine but Moz won't. I cannot see why it would need anything different from, say, Raven's site crawler.

I will check out those other threads, though, to see if I missed anything. Thanks for the links.

I just checked port 80 using http://www.yougetsignal.com/tools/open-ports/ (not sure if links are allowed) and there were no problems there.
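A port 80 check like this can also be run from a script. This sketch simply attempts a TCP connection with Python's `socket.create_connection`; `port_open` is a hypothetical helper name, and a successful connection only shows the port is reachable from wherever the script runs, not from the crawler's own network.

```python
import socket


def port_open(host: str, port: int = 80, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, DNS failure or timeout all land here.
        return False
```

For example, `port_open("www.example.com")` tries port 80 on that host and returns `True` or `False`.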

garfield_disliker

This might not be the most helpful response, but this particular question has popped up in the forums a few times now. Here, here, here, and so on. It seems like it might be something your hosting provider or your server is blocking, not your robots.txt file.
