Why does Crawl Diagnostics report this as duplicate content?

yacpro13

Hi guys,

We've been addressing a duplicate content problem on our site over the past few weeks. We've gradually implemented rel canonical tags in various parts of our ecommerce store, and we've been observing the effects by tracking changes in both SEOmoz and Webmaster Tools.

Although our duplicate content errors are definitely decreasing, I can't help but wonder why some URLs are still being flagged with duplicate content by our SEOmoz crawler.

Here's an example, taken directly from our Crawl Diagnostics Report:

URL with 4 Duplicate Content errors:
/safety-lights.html

Duplicate content URLs:
/safety-lights.html?cat=78&price=-100
/safety-lights.html?cat=78&dir=desc&order=position
/safety-lights.html?cat=78
/safety-lights.html?manufacturer=514

What I don't understand is that all of the URLs with URL parameters have a rel canonical tag pointing to the 'real' URL:
/safety-lights.html

So why is the SEOmoz crawler still flagging this as duplicate content?

ChiarynMiranda

So glad I could help get this figured out! Sometimes it just takes another set of eyes.

-Chiaryn

yacpro13

Good catch Chiaryn! Totally didn't see this.

Essentially, two URLs end up displaying the same content: one is the URL that's picked up by Google from our XML sitemap, and the other is a dynamic URL with filtering parameters based on a category URL one level higher.

The canonical tags were set up so that they point to the base category, and in this case the two base categories are different even though the content is the same.

We will address this.

Thanks!

ChiarynMiranda

Hi there,

I looked into your campaign and it seems that this is happening because of where your canonical tags are pointing. These pages are considered duplicates because their canonical tags point to different URLs. For example, accessories/lights.html?cat=78&price=-100 is considered a duplicate of accessories/lights/safety-lights.html?manufacturer=514 because the canonical tag for the first page is accessories/lights.html while the canonical for the second URL is accessories/lights/safety-lights.html.

Since the canonical tags point to different pages, it is assumed that accessories/lights.html and accessories/lights/safety-lights.html are likely to be duplicates themselves.
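If you want to check this yourself, here is a rough Python sketch that pulls the rel canonical target out of each page and tells you whether two pages point to different URLs. The `CanonicalParser` class, the `canonical_of` helper, and the stripped-down HTML strings are hypothetical and only for illustration; this is not how our crawler is implemented.

```python
from html.parser import HTMLParser


class CanonicalParser(HTMLParser):
    """Collects the href of the first <link rel="canonical"> tag it sees."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        # rel may hold several space-separated tokens; compare case-insensitively.
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_tokens:
            self.canonical = attr_map.get("href")


def canonical_of(html):
    """Return the canonical href found in the HTML, or None if there is none."""
    parser = CanonicalParser()
    parser.feed(html)
    return parser.canonical


# Hypothetical page sources, reduced to the one tag that matters here.
pages = {
    "accessories/lights.html?cat=78&price=-100":
        '<head><link rel="canonical" href="accessories/lights.html"></head>',
    "accessories/lights/safety-lights.html?manufacturer=514":
        '<head><link rel="canonical" href="accessories/lights/safety-lights.html"></head>',
}

targets = {url: canonical_of(html) for url, html in pages.items()}
# True when the pages do not all agree on a single canonical URL.
mismatch = len(set(targets.values())) > 1
```

With the two example URLs, `mismatch` comes out true, which is the situation that gets flagged.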

Here is how our system interprets duplicate content vs. rel canonical:

Assuming A, B, C, and D are all duplicates,

If A references B as the canonical, then they are not considered duplicates
If A and B both reference C as canonical, A and B are not considered duplicates of each other
If A references C as canonical, A and B are considered duplicates
If A references C as canonical, B references D, then A and B are considered duplicates

The examples you've provided actually fall into the fourth example I've listed above.
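To make the four cases easier to follow, here is a small Python sketch of the rules as I've described them. The function name `flagged_as_duplicates` and the dictionary layout are made up for this illustration, and treating the rules symmetrically (B referencing A counts the same as A referencing B) is my assumption; it is not our actual crawler code.

```python
def flagged_as_duplicates(a, b, canonical):
    """Decide whether two pages with the same content are still flagged.

    `canonical` maps a URL to the URL its rel canonical tag points to.
    A URL missing from the mapping is treated as having no canonical tag.
    """
    target_a = canonical.get(a)
    target_b = canonical.get(b)

    # Case 1: one page names the other as its canonical.
    if target_a == b or target_b == a:
        return False

    # Case 2: both pages name the same canonical URL.
    if target_a is not None and target_a == target_b:
        return False

    # Cases 3 and 4: only one page has a canonical tag, or the two
    # tags point to different URLs, so the pages stay flagged.
    return True


# The example from this thread: two filtered URLs, two different canonicals.
canonical = {
    "accessories/lights.html?cat=78&price=-100": "accessories/lights.html",
    "accessories/lights/safety-lights.html?manufacturer=514":
        "accessories/lights/safety-lights.html",
}
still_flagged = flagged_as_duplicates(
    "accessories/lights.html?cat=78&price=-100",
    "accessories/lights/safety-lights.html?manufacturer=514",
    canonical,
)
```

Here `still_flagged` is true because the two canonical targets differ. Pointing both tags at the same URL would move the pair into the second case.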

I hope this clears things up. Please let me know if you have any other questions.

-Chiaryn

IainReloadMedia

Does seem a little odd. Could you post the domain so we can have a more detailed look?

Thanks

Iain - Reload Media
