Odd crawl test issues
-
Hi all, first post, be gentle...
Just signed up for Moz in the hope that it, and the learning, will help me improve my web traffic. I've already run into a bit of woe with one of the sites we have added to the tool: I cannot get the crawl test to do any actual crawling. I've tried adding the domain three times now, but the initial crawl of a few pages (the automatic one that runs when you add a domain to Pro) will not work for me.
Instead of getting a list of problems with the site, I have a list of 18 pages, each showing 'Error Code 902: Network Errors Prevented Crawler from Contacting Server'. Being a little puzzled by this, I checked the site myself... no problems. I asked several people in different locations (and countries) to try it, and there were no problems for them either. I ran the same site through the Raven Tools site auditor and got results; it crawled a few thousand pages. I ran the site through Screaming Frog with the Googlebot user agent, and again there were no issues. I just tried Fetch as Googlebot in Webmaster Tools (WMT), and all was fine there too.
So I'm very puzzled as to why Moz is having issues with the site when everything else is happy with it. I know the homepage takes 7 seconds to load (caching is off at the moment while we tweak the design), but all the other pages take an average of 0.72 seconds to load, according to Screaming Frog.
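One way to repeat what Screaming Frog did, from any machine and with any user agent, is a short Python sketch that requests a page with a chosen `User-Agent` header and a timeout, then reports the status code and elapsed time. The URL is a placeholder, and using `rogerbot` (the crawler name mentioned elsewhere on this page) as the user-agent string is an assumption; Moz's exact header may differ.

```python
import time
import urllib.error
import urllib.request


def timed_fetch(url, user_agent, timeout=10.0):
    """Request url with the given User-Agent; return (status, elapsed_seconds).

    status is the HTTP status code, or a "network error: ..." string when no
    HTTP response arrived at all (refused connection, DNS failure, timeout).
    """
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    start = time.monotonic()
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            response.read()  # count the body download too, as a crawler would
            status = response.status
    except urllib.error.HTTPError as err:
        status = err.code  # the server answered, but with a 4xx/5xx status
    except OSError as err:  # URLError, socket timeouts, refused connections
        status = f"network error: {err}"
    return status, time.monotonic() - start


# Hypothetical usage (placeholder URL and user agent):
# print(timed_fetch("https://www.example.com/", "rogerbot"))
```

Running it against the homepage and an inner page from a couple of locations gives a rough comparison of the 7-second and 0.72-second figures above.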
The site runs on Magento, so we have a lengthy robots.txt, but it is not causing problems for any of the other services. The robots.txt is below.
# Google Image Crawler Setup
User-agent: Googlebot-Image
Disallow:

# Crawlers Setup
User-agent: *

# Directories
Disallow: /ajax/
Disallow: /404/
Disallow: /app/
Disallow: /cgi-bin/
Disallow: /downloader/
Disallow: /errors/
Disallow: /includes/
#Disallow: /js/
#Disallow: /lib/
Disallow: /magento/
#Disallow: /media/
Disallow: /pkginfo/
Disallow: /report/
Disallow: /scripts/
Disallow: /shell/
Disallow: /skin/
Disallow: /stats/
Disallow: /var/
Disallow: /catalog/product
Disallow: /index.php/
Disallow: /catalog/product_compare/
Disallow: /catalog/category/view/
Disallow: /catalog/product/view/
Disallow: /catalogsearch/
#Disallow: /checkout/
Disallow: /control/
Disallow: /contacts/
Disallow: /customer/
Disallow: /customize/
Disallow: /newsletter/
Disallow: /poll/
Disallow: /review/
Disallow: /sendfriend/
Disallow: /tag/
Disallow: /wishlist/
Disallow: /catalog/product/gallery/
# Files
Disallow: /cron.php
Disallow: /cron.sh
Disallow: /error_log
Disallow: /install.php
Disallow: /LICENSE.html
Disallow: /LICENSE.txt
Disallow: /LICENSE_AFL.txt
Disallow: /STATUS.txt
# Paths (no clean URLs)
#Disallow: /.js$
#Disallow: /.css$
Disallow: /.php$
Disallow: /?SID=
# Pagination
Disallow: /?dir=
Disallow: /&dir=
Disallow: /?mode=
Disallow: /&mode=
Disallow: /?order=
Disallow: /&order=
Disallow: /?p=
Disallow: /&p=

If anyone has any suggestions, I would welcome them, whether about the tool or my robots.txt. As a side note, I'm aware that we are blocking the individual product pages. There are too many products on the site at the moment (250k plus) with default manufacturer descriptions, so we have blocked them and are working on getting the category pages and guides listed. In time we will rewrite the descriptions of the most popular products and unblock them as we go.
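Since suggestions on the robots.txt itself are welcome, it can help to test what a rule actually matches before changing it. Below is a minimal sketch using Python's standard-library `urllib.robotparser` on a short excerpt of the rules, with hypothetical example.com URLs. One caveat: that parser does plain prefix matching and does not implement the `*` and `$` wildcards some crawlers support, so treat its answers as an approximation of how any particular crawler reads the file.

```python
from urllib.robotparser import RobotFileParser

# An excerpt of the rules above, written as one contiguous group (no blank lines).
ROBOTS_EXCERPT = """\
User-agent: *
Disallow: /catalog/product
Disallow: /index.php/
Disallow: /?dir=
"""


def blocked_paths(robots_text, user_agent, urls):
    """Return the subset of urls that the parsed rules disallow for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical URLs chosen to show how prefix matching behaves.
urls = [
    "https://www.example.com/catalog/product/view/id/42",
    "https://www.example.com/catalog/product_compare/index/",
    "https://www.example.com/mens-shoes.html",
    "https://www.example.com/?dir=asc",
    "https://www.example.com/mens-shoes.html?dir=asc",
]
for url in blocked_paths(ROBOTS_EXCERPT, "rogerbot", urls):
    print("blocked:", url)
```

Under plain prefix matching, `Disallow: /catalog/product` also catches `/catalog/product_compare/`, and `Disallow: /?dir=` only catches the parameter on the root URL, not on a category page such as `/mens-shoes.html?dir=asc`.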
Many thanks
Carl
-
Thanks for the hints regarding the robots.txt; I will tidy that up.
-
Network errors can occur anywhere between us and your site, not necessarily on your server itself. Your best bet would be to check with your ISP for any connectivity issues to your server. Since this is the first time these errors have been reported, the next crawl may be more successful.
One thing, though: you will want to keep each user-agent line and its directives together in a single block, without blank lines between them.
So this:

# Crawlers Setup
User-agent: *

# Directories
Disallow: /ajax/
Disallow: /404/
Disallow: /app/

would need to look like this:

# Crawlers Setup
User-agent: *
# Directories
Disallow: /ajax/
Disallow: /404/
Disallow: /app/
-
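The single-block advice in the reply above matters because some parsers end a group at a blank line. Python's standard-library `urllib.robotparser` is one of them: in this minimal sketch (an excerpt of the rules and a hypothetical URL), a blank line straight after `User-agent: *` makes it discard that user-agent line, so the `Disallow` rules that follow are ignored. Other crawlers may be more forgiving; the sketch only shows why the single-block layout is the safer choice.

```python
from urllib.robotparser import RobotFileParser

# Layout with a blank line between "User-agent: *" and its rules.
SPLIT_GROUP = """\
# Crawlers Setup
User-agent: *

# Directories
Disallow: /ajax/
Disallow: /404/
Disallow: /app/
"""

# Suggested layout: the user-agent line and its rules form one block.
SINGLE_GROUP = """\
# Crawlers Setup
User-agent: *
# Directories
Disallow: /ajax/
Disallow: /404/
Disallow: /app/
"""


def allowed(robots_text, user_agent, url):
    """Parse robots_text with the stdlib parser and ask whether url may be fetched."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(user_agent, url)


url = "https://www.example.com/ajax/cart"  # hypothetical URL under a blocked directory
print("split group: ", allowed(SPLIT_GROUP, "rogerbot", url))   # True: rules were dropped
print("single group:", allowed(SINGLE_GROUP, "rogerbot", url))  # False: /ajax/ is blocked
```
-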
Many thanks for the reply. The server we use is a dedicated server that we set up ourselves, including the OS and control panel. It just seems very odd that every other tool works fine but Moz won't. I cannot see why it would need anything different from, say, Raven's site crawler.
I will check out those other threads, though, to see if I missed anything. Thanks for the links.
I just checked port 80 using http://www.yougetsignal.com/tools/open-ports/ (not sure if links are allowed) and there were no problems there.
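A port checker like that only shows that port 80 answers from wherever the checker runs, while the reply above notes that a network error can sit anywhere between the crawler and the server. A minimal Python sketch for repeating the same check from several machines or networks follows; the hostname is a placeholder, and a successful TCP connection says nothing about how the web server then responds.

```python
import socket


def port_open(host, port=80, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures alike.
        return False


# Hypothetical usage; replace with your own hostname.
# print(port_open("www.example.com", 80))
```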
-
Related Questions
-
Moz Pro OnDemand Crawl fails on WordPress site
Hello, I just can't understand why OnDemand Crawl fails: on further attempts it crawls only 4 pages out of 68. I am using WordPress with the Divi theme on a LiteSpeed server. Robots.txt allows rogerbot; I just can't seem to find the issue.
Moz Bar | ChrisSanClaire
-
Site Crawl shows 1 page with a 301 status, but httpstatus.io says it's 403
I am trying to run a site crawl for my website, and Moz crawls only 1 page, reporting the home page URL with a 301 status code. However, when I run it through httpstatus.io, it gives me a 403 status error. I'm curious why Moz says it's a 301 and httpstatus.io says 403. Is there anything I can do in Moz first to get the site crawled before asking my developers to look into the 403 error?
Moz Bar | JohnConover
-
"New" issues not previously found being shown?
I'm not sure what logic Moz is using for its reporting of Site Crawl issues, but it appears to be pretty flawed (unless I'm missing something, which is possible). I've got a client site that has been in Moz for about 6 months now. Every time the crawler runs, the same number of pages are reported as crawled. However, I'm consistently getting "New Issues" reported that should have been reported during previous crawls. Example: a redirect chain was reported several months ago. The referring URL was the homepage of the website, and we tracked it down to an old link in the header. This was fixed and marked as resolved, and the issue was not shown on the next crawl. Several weeks later, the same issue was reported for a different page on the website, a page which has existed since 2014 and had already been crawled many times. Again, we fixed it. Fast-forward to the report that just ran on 12/1, and we have the same issue reported for a different page, which has also existed for years and has been crawled before. It's very hard to explain to a client that "this item you are seeing has been resolved", only to have it keep cropping back up in future reports. Note this is not limited to redirect chains; that's just an example. I'm seeing this for other items such as missing canonicals, duplicate titles, etc.
Moz Bar | RucksackDigital
-
Why do I get multiple variations of my URL with ?order=asc and ?view=list at the end in my crawl report?
I just did a crawl for one of my clients to check for errors in the site structure. Next thing I know, the website has multiple variations of the same URL with queries like ?order=asc and ?view=list at the end. I am wondering why these URL variations appear in the crawl, since bots aren't supposed to go further than the ? normally. Here are a few of the URLs from my crawl test:
https://test.com/exemple/?per_page=9
https://test.com/exemple/?per_page=15
https://test.com/exemple/?per_page=30
https://test.com/exemple/?orderby=popularity
https://test.com/exemple/?orderby=date
https://test.com/exemple/?orderby=price
https://test.com/exemple/?orderby=price-desc
https://test.com/exemple/?order=asc
https://test.com/exemple/?order=desc
https://test.com/exemple/?view=list
Thank you, guys
Moz Bar | alexrbrg
-
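Crawlers generally do follow links that carry a query string, so every sort, page-size and view link in a theme can surface as its own URL. The minimal Python sketch below assumes the parameters in the URLs above (`per_page`, `orderby`, `order`, `view`) only change presentation, and shows the variants collapsing to one page once those are dropped. It is a way to audit a crawl export, not a description of how Moz groups URLs.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Parameters that appear to change only sorting, page size or layout in the URLs above.
# This set is an assumption; adjust it to the parameters your own site uses.
PRESENTATION_PARAMS = {"per_page", "orderby", "order", "view"}


def canonical_form(url):
    """Drop presentation-only query parameters (and any fragment) so variants collapse."""
    parts = urlsplit(url)
    kept = [(key, value) for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key not in PRESENTATION_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


crawled = [
    "https://test.com/exemple/?per_page=9",
    "https://test.com/exemple/?orderby=price-desc",
    "https://test.com/exemple/?order=asc",
    "https://test.com/exemple/?view=list",
]
print({canonical_form(url) for url in crawled})
```
-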
Moz crawler only crawls one page?!
Hello there, I've been using Moz for a while and I'm very pleased with the tool and community, but for the first time I have encountered a problem. We are trying to run a crawl of a client's website, but only one page (the homepage) was crawled. We tried a test at a more detailed level, in case something was wrong with the homepage: my campaign test's crawl came back for the Producten folder (one level deeper than the homepage), and it was also only a 1-page crawl with a 200 status. I have looked at the robots.txt file now, and it is very restrictive, but there is nothing I can clearly see that would explain why the crawl isn't working. Hopefully someone can point us in the right direction. Thanks in advance, Jeremy
Moz Bar | mediaxplain.nl
-
Crawl Test Takes Long Time
Hi Moz, I have submitted our website for a crawl test. Usually the crawl only takes a few hours, but this time it is taking quite a long time and the result still shows as in progress 😞 This is a small website with fewer than 10 pages. Just wondering whether this is an issue with our website settings or a technical issue at your end? Many thanks in advance.
Moz Bar | russellbrown
-
Does anyone else have issues with Moz's keyword search volume tool for Google's search engine?
It shows the search volume for Bing even when Google is selected. Then, if you select Bing, you get the same data as when you selected Google. So basically, this tool does not work for Google's search engine, or at least it is not a reliable way to perform keyword research. Has anyone else noticed this? Does Moz even offer a way to submit a support ticket to get this fixed?
Moz Bar | ShokIdeaGroup
-
Rel Can notice issue on my SEOMoz reporting
Need some help understanding this report... I have 17 notices for Rel Can on my campaign, and it lists all the links. But what is this report actually telling me? Is it telling me that Rel Cans are present on these pages? They are all blog posts; our blog was redirected when the site was recently rebuilt. I just need to understand what the report is really telling me to do or not do. Or is it OK to ignore this "notice"?
Moz Bar | cschwartzel