Finding the source of duplicate content URLs

DocdataCommerce

We have a website that displays a number of products. Each product has variations (sizes), and unfortunately every size has its own URL (for now, anyway). Needless to say, this causes duplicate content issues, and we are looking to change the URLs for our site as soon as possible.

However, even though these duplicate URLs exist, you should not be able to land on them by navigating through the site. In theory, the site should always link to the smallest size. There seems to be a flaw somewhere in our system, as these URLs are now being found in our campaign here on SEOmoz.

My question: is there any way to find the crawl path that led to the URLs that shouldn't have been found, so we can locate the problem?

Mark_Ginsberg

With the Screaming Frog SEO Spider you can see all of the inlinks to a particular page (the free version will crawl 500 URLs; the paid version, 99 GBP for a yearly license, will crawl as many as you want). Run a crawl of the site; it should find those pages, and you can then view the inlinks to each of them. Visit the inlinking pages and check their code for links to the page you're looking for. This will quickly show you where the links to the pages you're trying to hide are coming from.
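If you'd rather script this than use a crawler tool, the same idea can be sketched in a few lines of Python: crawl breadth-first from the homepage, remember which page each URL was first discovered on, and walk that chain back once the unwanted URL turns up. This is only a rough sketch and not how Screaming Frog works internally; the names `find_crawl_path`, `extract_links` and `fetch_links` are made up for the illustration, and a real crawl would also need to stay on your own host and handle fetch errors.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag
from urllib.request import urlopen


class _LinkParser(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(base_url, html):
    """Return absolute, fragment-free URLs linked from the given HTML."""
    parser = _LinkParser()
    parser.feed(html)
    return [urldefrag(urljoin(base_url, h)).url for h in parser.hrefs]


def fetch_links(url):
    """Download a page and return its outgoing links (needs network access)."""
    with urlopen(url, timeout=10) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    return extract_links(url, html)


def find_crawl_path(start, target, get_links, max_pages=500):
    """Breadth-first search from start; return the chain of URLs that
    leads to target, or None if it was not reached."""
    parent = {start: None}  # URL -> page it was first discovered on
    queue = deque([start])
    fetched = 0
    while queue:
        page = queue.popleft()
        if page == target:
            path = []
            while page is not None:
                path.append(page)
                page = parent[page]
            return path[::-1]
        if fetched >= max_pages:
            continue  # stop fetching, but still check what is already queued
        fetched += 1
        for link in get_links(page):
            if link not in parent:
                parent[link] = page
                queue.append(link)
    return None
```

Calling `find_crawl_path(homepage_url, unwanted_url, fetch_links)` with your own two URLs would return the list of pages from the homepage down to the unwanted URL; the second-to-last entry is the page that carries the stray link.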

Also, have you checked the sitemap? The CMS might be creating links to these pages there.

Good luck, and let me know if you need any more help with this.
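To check the sitemap quickly, something along these lines could work. It is a minimal sketch that assumes a standard sitemaps.org XML sitemap (not a sitemap index) whose contents you have already downloaded; `sitemap_urls` and `unwanted_in_sitemap` are hypothetical helper names, and the list of unwanted URLs is whatever your campaign reported.

```python
import xml.etree.ElementTree as ET

# Tag name of <loc> in the standard sitemap namespace.
SITEMAP_LOC = "{http://www.sitemaps.org/schemas/sitemap/0.9}loc"


def sitemap_urls(xml_text):
    """Return the set of page URLs listed in a sitemap document."""
    root = ET.fromstring(xml_text)
    return {el.text.strip() for el in root.iter(SITEMAP_LOC) if el.text}


def unwanted_in_sitemap(xml_text, unwanted):
    """Return the unwanted URLs that the sitemap lists, sorted."""
    return sorted(set(unwanted) & sitemap_urls(xml_text))
```

If this returns anything, the sitemap itself is one of the places the crawler is picking those size URLs up from.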

Mark
