Crawl diagnostic errors due to query string
-
I'm seeing a large number of duplicate page titles, duplicate content, missing meta descriptions, etc. in my Crawl Diagnostics report, caused by URLs' query strings. These pages already have canonical tags, but I know canonical tags aren't considered in Moz's crawl diagnostics and therefore won't reduce the number of reported errors. Is there any way to configure Moz not to treat query-string variants as unique URLs? It's difficult to find a legitimate error among hundreds of these non-errors.
-
I'm glad to hear you got this figured out - thank you Patrick for your help!
Kevin
Help Team -
Hi Kevin,
I understand how Moz's duplicate content system works. It would just be nice if it could take canonical URLs into consideration for Crawl Diagnostics reports, or give you the option of not counting URLs appended with parameters as unique pages.
Patrick was able to help me figure out that I can do the latter via the robots.txt feature by using a wildcard rule: Disallow: /*?.
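For anyone landing on this thread later, that wildcard rule, scoped to Moz's crawler (whose user-agent is rogerbot) so that search engines are unaffected, would look something like this in robots.txt:

```
# Only applies to Moz's crawler; Googlebot etc. still crawl these URLs.
User-agent: rogerbot
# Block any URL containing a "?" (i.e. any query string).
Disallow: /*?
```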
-
Hi there!
Our tool has a 90% tolerance for duplicate content, which means it will flag any pair of pages whose source code is at least 90% identical. This includes all of the source code on the page, not just the visible text. You can run your own tests using this tool: http://www.webconfs.com/similar-page-checker.php. In the case of http://www.optp.com/SMARTROLLER?cat_id=205#.VZreQhNVhBc and http://www.optp.com/SMARTROLLER?cat_id=54#.VZrdJhNVhBc, these pages are 100% similar, which is why they're being flagged.
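As a rough illustration of that kind of check, here's a short Python sketch using the standard library's difflib. The helper names, the sample markup, and the use of a character-level diff are all made up for the example; Moz's actual similarity algorithm isn't published.

```python
from difflib import SequenceMatcher

def similarity_ratio(html_a: str, html_b: str) -> float:
    """Score 0.0-1.0 similarity over the full page source, markup included."""
    return SequenceMatcher(None, html_a, html_b).ratio()

def is_flagged_duplicate(html_a: str, html_b: str, threshold: float = 0.90) -> bool:
    """Flag two pages whose source code is at least `threshold` similar."""
    return similarity_ratio(html_a, html_b) >= threshold

# Two query-string variants that differ only in a tiny fragment of markup:
page_a = "<html><body><h1>SMARTROLLER</h1><p>cat 205</p></body></html>"
page_b = "<html><body><h1>SMARTROLLER</h1><p>cat 54</p></body></html>"

print(is_flagged_duplicate(page_a, page_b))  # → True
```

Because the comparison runs over the entire source, two pages with heavy shared templates (headers, navigation, scripts) can cross the 90% line even when their visible content differs.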
I hope this helps! If you need any more help with your crawl, feel free to contact our Help Team at help@moz.com.
Thanks!
Kevin
Help Team -
This is very interesting! Strange that Webmaster Tools wouldn't display duplicate content but Google would still penalize you. I'd like to try this on my site, but I'm a little wary because I think some pages rank with the query-string version of the URL, despite a canonical being specified.
-
Is your traffic lower than expected?
I was having a similar issue where Moz was showing a lot more duplicate content than Webmaster Tools (Webmaster Tools actually showed none), yet I was still being penalized. I realized this when I added an exclusion to robots.txt for any query strings on my site; after I did, my rankings shot through the roof.
I'm not saying this is happening to you, but I like to err on the side of caution.
-
Hi there
My bad! Yeah - you could just do this:
User-agent: Rogerbot
Disallow: (check out this resource on how to block specific query strings)
Hope this helps! Good luck!
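In case that linked resource disappears, a Rogerbot-only rule set along those lines, using the item_id= and cat_id= parameters from the question (adjust the patterns to your own URLs), might look like:

```
User-agent: rogerbot
# Block only URLs whose query string contains these parameters.
Disallow: /*item_id=
Disallow: /*cat_id=
```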
-
Hi Patrick,
Thanks for the quick reply as always. As far as Google is concerned, these pages are set up correctly with canonical tags and URL parameters; Moz actually reports far more duplicate content than Webmaster Tools does.
My issue is just with the number of errors reported in Moz. You mentioned that I can handle this via the robots.txt file. Is there a way to disallow only Rogerbot from crawling URLs with query strings, or URLs that contain a certain phrase such as "item_id=" or "cat_id="?
-
Hi there
Check out Google's duplicate content resources - they provide help on how to categorize your parameters and URL strings.
You can also handle this via your robots.txt. Make sure that you have a canonical tag on that page as well.
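For reference, a canonical tag for the query-string variants discussed above would point each variant at the clean URL; the exact target below is an assumption based on the product page mentioned earlier in the thread:

```
<!-- In the <head> of /SMARTROLLER?cat_id=205 and /SMARTROLLER?cat_id=54 -->
<link rel="canonical" href="http://www.optp.com/SMARTROLLER" />
```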
Hope this helps! Good luck!