Crawl Diagnostics - unexpected results
-
I received my first Crawl Diagnostics report last night on my dynamic ecommerce site.
It showed errors on generated URLs that are simply not produced anywhere on my live site, only on my local development server.
It appears that the Crawler doesn't think that it's running on the live site.
For example
http://www.nordichouse.co.uk/candlestick-centrepiece-p-1140.html
will go to a Product Not Found page, and therefore Duplicate Content errors are produced.
Running
http://www.nhlocal.co.uk/candlestick-centrepiece-p-1140.html
produces the correct product page and not a Product Not Found page
Any thoughts?
-
Hi Nordichouse,
Sorry it took a while for me to get back to you on this.
I agree with the SEOmoz techs: it doesn't matter whether it is a crawler or an actual person; if you go to an invalid URL you should be 301-redirected to the actual page. If the product doesn't exist, the site should not allow superfluous URLs.
So basically, if the product exists, the site should redirect to the correct URL. If it doesn't exist, send any query for that product to the same page and display the osCommerce product-not-found message. By doing this you prevent the system from creating umpteen thousand URLs for each product.
If you want to test what I mean, you can visit our store at www.rubberstore.com/catalog and try a few URLs like:
catalog/nipple-clips-p-1000.html
We don't have a product with the id of 1000, so you'll get redirected to the not-found message and the root page -p-1000.html.
However, if you try:
catalog/a-fake-url-p-29.html
you'll get redirected to our actual product page matching this product id. Hope that makes sense. All this is done with the .htaccess URL rewriter I posted in this thread.
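The redirect behaviour described in this reply can be sketched in a few lines of Python. This is only an illustration of the decision logic, not the osCommerce implementation: the `PRODUCTS` table, the `example-product` slug, the `resolve` function and the action labels are all assumptions made up for the sketch.

```python
import re

# Hypothetical catalogue: product id -> canonical URL slug.
# Only id 29 exists here; the slug is invented for this sketch.
PRODUCTS = {29: "example-product"}

# Product URLs look like <slug>-p-<id>.html
PRODUCT_URL = re.compile(r"^(.*)-p-([0-9]+)\.html$")


def resolve(path):
    """Return (action, target) for a requested path."""
    match = PRODUCT_URL.match(path)
    if match is None:
        return ("not_a_product_url", path)
    product_id = int(match.group(2))
    canonical_slug = PRODUCTS.get(product_id)
    # Unknown ids collapse to a single slug-less URL, so every
    # made-up variant ends up on the same not-found page.
    target_slug = canonical_slug if canonical_slug is not None else ""
    target = f"{target_slug}-p-{product_id}.html"
    if path != target:
        return ("redirect_301", target)
    if canonical_slug is None:
        return ("not_found_page", target)
    return ("product_page", target)


for requested in ("a-fake-url-p-29.html", "nipple-clips-p-1000.html", "-p-1000.html"):
    print(requested, "->", resolve(requested))
```

The point of the design is that each product id has exactly one URL that is served directly; every other spelling is redirected to it, so a crawler never sees the same content under many addresses.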
-
Don
Yes, that is how it is done and there is no problem with that. The above is just how inbound URLs get processed.
The issue here is how the crawler works. The only possible way for this particular URL to be generated is for a certain parameter to be appended to the URL, and that would be unusual (unless the SEOmoz techies tell me differently).
Alan
-
Did you ever have a product with the id of 1140? If you look at your products table, just check the auto number in the product_id column.
If you did, and it was live at some point, the crawler could be finding the old product based on the URL it used to have.
If you never made that product live, then I don't know how a crawler could have found a product that doesn't exist, unless they started using some technology that I'm unaware of.
Since you said you use OSC, this is what we use to deal with the problem I outlined above:
# Begin Ultimate SEO V2.2d
Options +FollowSymLinks
RewriteEngine On

# RewriteBase instructions
# Change RewriteBase dependent on how your shop is accessed as below.
# http://www.mysite.com = RewriteBase /
# http://www.mysite.com/catalog/ = RewriteBase /catalog/
# http://www.mysite.com/catalog/shop/ = RewriteBase /catalog/shop/

# Change the following line using the instructions above
RewriteBase /catalog/

RewriteRule ^(.*)-p-(.*).html$ product_info.php?products_id=$2&%{QUERY_STRING}
RewriteRule ^(.*)-c-(.*).html$ index.php?cPath=$2&%{QUERY_STRING}
RewriteRule ^(.*)-m-(.*).html$ index.php?manufacturers_id=$2&%{QUERY_STRING}
RewriteRule ^(.*)-pi-(.*).html$ popup_image.php?pID=$2&%{QUERY_STRING}
RewriteRule ^(.*)-by-(.*).html$ all-products.php?fl=$2&%{QUERY_STRING}
RewriteRule ^(.*)-t-(.*).html$ articles.php?tPath=$2&%{QUERY_STRING}
RewriteRule ^(.*)-a-(.*).html$ article_info.php?articles_id=$2&%{QUERY_STRING}
RewriteRule ^(.*)-au-(.*).html$ articles.php?authors_id=$2&%{QUERY_STRING}
#RewriteRule ^(.*)-pr-(.*).html$ product_reviews.php?products_id=$2&%{QUERY_STRING}
RewriteRule ^(.*)-pri-(.*).html$ product_reviews_info.php?products_id=$2&%{QUERY_STRING}
RewriteRule ^(.*)-f-(.*).html$ faqdesk_info.php?faqdesk_id=$2&%{QUERY_STRING}
RewriteRule ^(.*)-fc-(.*).html$ faqdesk_index.php?faqPath=$2&%{QUERY_STRING}
RewriteRule ^(.*)-fri-(.*).html$ faqdesk_reviews_info.php?faqdesk_id=$2&%{QUERY_STRING}
RewriteRule ^(.*)-fra-(.*).html$ faqdesk_reviews_article.php?faqdesk_id=$2&%{QUERY_STRING}
RewriteRule ^(.*)-i-(.*).html$ information.php?info_id=$2&%{QUERY_STRING}
RewriteRule ^(.*)-links-(.*).html$ links.php?lPath=$2&%{QUERY_STRING}
RewriteRule ^(.*)-pm-([0-9]+).html$ info_pages.php?pages_id=$2&%{QUERY_STRING}
RewriteRule ^(.*)-n-(.*).html$ newsdesk_info.php?newsdesk_id=$2&%{QUERY_STRING}
RewriteRule ^(.*)-nc-(.*).html$ newsdesk_index.php?newsPath=$2&%{QUERY_STRING}
RewriteRule ^(.*)-nri-(.*).html$ newsdesk_reviews_info.php?newsdesk_id=$2&%{QUERY_STRING}
RewriteRule ^(.*)-nra-(.*).html$ newsdesk_reviews_article.php?newsdesk_id=$2&%{QUERY_STRING}
RewriteRule ^(.*)-po-([0-9]+).html$ pollbooth.php?pollid=$2&%{QUERY_STRING}
# End Ultimate SEO V2.2d
You may try it to see if it helps fix your issue.
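To see why the text before -p- makes no difference to which product is loaded, here is a rough Python sketch of the product rule. It assumes the pattern is `^(.*)-p-(.*).html$` and that the path has already had the RewriteBase prefix removed; the function name `rewrite_product_url` is made up, and the appended query string is left out for brevity.

```python
import re

# Same pattern as the product rule: ^(.*)-p-(.*).html$
# The "." before "html" is left unescaped, as in the rule itself,
# so it matches any single character.
PRODUCT_RULE = re.compile(r"^(.*)-p-(.*).html$")


def rewrite_product_url(path):
    """Mimic the product RewriteRule for a path relative to RewriteBase."""
    match = PRODUCT_RULE.match(path)
    if match is None:
        return None  # the rule does not apply to this path
    # $2 in the RewriteRule is the second capture group; $1 (the slug) is unused.
    return f"product_info.php?products_id={match.group(2)}"


for path in ("candlestick-centrepiece-p-1140.html",
             "anything-at-all-p-1140.html",
             "-p-1140.html"):
    print(path, "->", rewrite_product_url(path))
```

All three paths map to products_id=1140, because only the second capture group is passed on. The rewrite rules alone therefore accept any slug; rejecting or redirecting the wrong ones has to happen elsewhere.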
-
Thanks, Don
You are right in your analysis - it is osC, but highly modified by myself. Yes, it does redirect.
That, however, is not the point. On the live site, the URL containing 1140 (for example) is never generated.
The mystery is how the Crawler can find something that isn't there! Magic.
Alan
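One way to check the claim that the live site never emits such a URL is to scan the HTML of the live pages for links ending in the product suffix. The following is a minimal sketch, assuming the page HTML has already been fetched by some other means; `LinkCollector`, `links_to_product` and the sample markup are hypothetical and not part of any Moz or osCommerce tool.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def links_to_product(html, product_id):
    """Return the hrefs in html that point at the given product id."""
    collector = LinkCollector()
    collector.feed(html)
    suffix = f"-p-{product_id}.html"
    # Ignore any query string or fragment when comparing the suffix,
    # since an appended parameter is one suspected cause.
    return [h for h in collector.hrefs
            if h.split("?")[0].split("#")[0].endswith(suffix)]


# Made-up sample markup; the ?x=1 parameter is purely illustrative.
sample = (
    '<a href="http://www.nordichouse.co.uk/candlestick-centrepiece-p-1140.html?x=1">a</a>'
    '<a href="/other-p-11.html">b</a>'
)
print(links_to_product(sample, 1140))
```

Running something like this over every page the crawler reported as a referrer would show whether the link really is absent from the live markup or is being generated with an extra parameter.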
-
Hi nordichouse,
You may want to check with your CMS provider. The URLs are similar to osCommerce, which I'm experienced with, but I can see that isn't an osCommerce setup. The system should have some sort of URL rewriter to deal with this problem.
The issue that I see is that the system doesn't actually care what you type in between .co.uk/ and -p-1140.html
For example, try this URL to get a valid product:
http://www.nordichouse.co.uk/nipple-clips-p-1000.html
which is the same as
http://www.nordichouse.co.uk/-p-1000.html
But it should 301 redirect to: http://www.nordichouse.co.uk/linen-style-collection-p-1000.html
osCommerce has a URL 301 rewriter that prevents the system from using incorrect URLs; I would hope your system does as well.
I'm not trying to avoid helping you, but without exact knowledge of how the system handles the URLs it generates, it is hard to troubleshoot. However, since it is a CMS, somebody who works on it should already have this knowledge.
My best,
Don