Odd crawl test issues
-
Hi all, first post, be gentle...
I just signed up for Moz in the hope that it, and the learning, will help me improve my web traffic. I've already hit a snag with one of the sites we have added to the tool: I cannot get the crawl test to do any actual crawling. I've tried to add the domain three times now, but the initial crawl of a few pages (the automatic one when you add a domain to Pro) will not work for me.
Instead of getting a list of problems with the site, I have a list of 18 pages where it says 'Error Code 902: Network Errors Prevented Crawler from Contacting Server'. Being a little puzzled by this, I checked the site myself... no problems. I asked several people in different locations (and countries) to have a go, and no problems for them either. I ran the same site through the Raven Tools site auditor and got some results; it crawled a few thousand pages. I ran the site through Screaming Frog with the Googlebot user agent, and again no issues. I just tried Fetch as Googlebot in WMT and all was fine there.
I'm very puzzled, then, as to why Moz is having issues with the site when everything else is happy with it. I know the homepage takes 7 seconds to load (caching is off at the moment while we tweak the design), but all the other pages, according to Screaming Frog, take an average of 0.72 seconds to load.
The site is a Magento one, so we have a lengthy robots.txt, but that is not causing problems for any of the other services. The robots.txt is below.
# Google Image Crawler Setup
User-agent: Googlebot-Image
Disallow:
# Crawlers Setup
User-agent: *
# Directories
Disallow: /ajax/
Disallow: /404/
Disallow: /app/
Disallow: /cgi-bin/
Disallow: /downloader/
Disallow: /errors/
Disallow: /includes/
#Disallow: /js/
#Disallow: /lib/
Disallow: /magento/
#Disallow: /media/
Disallow: /pkginfo/
Disallow: /report/
Disallow: /scripts/
Disallow: /shell/
Disallow: /skin/
Disallow: /stats/
Disallow: /var/
Disallow: /catalog/product
Disallow: /index.php/
Disallow: /catalog/product_compare/
Disallow: /catalog/category/view/
Disallow: /catalog/product/view/
Disallow: /catalogsearch/
#Disallow: /checkout/
Disallow: /control/
Disallow: /contacts/
Disallow: /customer/
Disallow: /customize/
Disallow: /newsletter/
Disallow: /poll/
Disallow: /review/
Disallow: /sendfriend/
Disallow: /tag/
Disallow: /wishlist/
Disallow: /catalog/product/gallery/
# Files
Disallow: /cron.php
Disallow: /cron.sh
Disallow: /error_log
Disallow: /install.php
Disallow: /LICENSE.html
Disallow: /LICENSE.txt
Disallow: /LICENSE_AFL.txt
Disallow: /STATUS.txt
# Paths (no clean URLs)
#Disallow: /.js$
#Disallow: /.css$
Disallow: /.php$
Disallow: /?SID=
# Pagnation
Disallow: /?dir=
Disallow: /&dir=
Disallow: /?mode=
Disallow: /&mode=
Disallow: /?order=
Disallow: /&order=
Disallow: /?p=
Disallow: /&p=
If anyone has any suggestions, I would welcome them, be it with the tool or my robots.txt. As a side note, I'm aware that we are blocking the individual product pages. There are too many products on the site at the moment (250k plus) with manufacturer default descriptions, so we have blocked them and are working on getting the category pages and guides listed. In time we will rewrite the most popular products and unblock them as we go.
Many thanks
Carl
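One way to rule the robots.txt in or out is to run a few of the rules through a parser locally. The following is a minimal sketch using Python's standard-library urllib.robotparser, not a reproduction of how Moz's crawler reads the file. The example.com URLs are placeholders, and "rogerbot" is assumed here as the crawler's user agent. This parser only does prefix matching (it does not understand `$` or wildcards), and it treats a blank line as the end of a group, so other crawlers may behave differently.

```python
from urllib.robotparser import RobotFileParser

# A few rules copied from the robots.txt above, kept as contiguous groups.
RULES = """\
User-agent: Googlebot-Image
Disallow:

User-agent: *
# Directories
Disallow: /ajax/
Disallow: /catalog/product
Disallow: /catalogsearch/
"""

# Same catch-all group, but with a blank line after the User-agent line.
SPLIT_RULES = "User-agent: *\n\nDisallow: /ajax/\n"


def is_allowed(robots_text: str, agent: str, url: str) -> bool:
    """Parse robots_text and report whether agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(agent, url)


# Placeholder URLs; "rogerbot" is an assumed user agent for illustration.
for url in (
    "http://example.com/",
    "http://example.com/some-category.html",
    "http://example.com/catalog/product/view/id/1",
):
    print(url, is_allowed(RULES, "rogerbot", url))

# With this parser, the blank line detaches the Disallow rule from its group.
print(is_allowed(SPLIT_RULES, "rogerbot", "http://example.com/ajax/cart"))
```

If the homepage and category pages come back as allowed, the rules themselves are unlikely to explain a network-level error such as 902.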
-
Thanks for the hints re the robots, will tidy that up.
-
Network errors can occur anywhere between us and your site, not necessarily at your server itself. The best bet would be to check with your ISP for any connectivity issues to your server. Since this is the first time these issues have been reported, the next crawl may be more successful.
One thing, though: you will want to keep your user-agent directives in a single block with no blank lines.
So
# Crawlers Setup
User-agent: *

# Directories
Disallow: /ajax/
Disallow: /404/
Disallow: /app/
would need to look like:
# Crawlers Setup
User-agent: *
# Directories
Disallow: /ajax/
Disallow: /404/
Disallow: /app/
-
Many thanks for the reply. The server we use is a dedicated server which we set up ourselves, including the OS and control panel. It just seems very odd that every other tool is working fine but Moz won't. I cannot see why it would need anything different from, say, Raven's site crawler.
I will check out those other threads, though, to see if I missed anything. Thanks for the links.
Just checked port 80 using http:// www.yougetsignal. com/tools/open-ports/ (not sure if links allowed) and no problems there.
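The same kind of open-port check can be scripted. This is a minimal sketch using Python's standard socket module; the function name can_connect and the host in the usage comment are placeholders for illustration. Like the web tool, it only tests the route from the machine it runs on, so a pass here does not rule out a network problem somewhere between the crawler and the server.

```python
import socket


def can_connect(host: str, port: int = 80, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS lookup failures.
        return False


# Example usage (placeholder host):
# print(can_connect("www.example.com", 80))
```

Running it from a few different networks gives a slightly better picture than a single check, since each run only covers one path to the server.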