Finding the source of duplicate content URL's

DocdataCommerce

We have a website that displays a number of products. The product has variations (sizes) and unfortunately every size has its own URL (for now anyway). Needless to say, this causes duplicate content issues. (And of course, we are looking to change the URL's for our site as soon as possible)

However, even though these duplicate URL's exist, you should not be able to land on them by navigating through the site. In theory, the site should always display the link to the smallest size. It seems that there is a flaw in our system somewhere, as these links are now found in our campaign here on SEOmoz.

My question: is there any way to find the crawl path that lead to the URL's that shouldn't have been found, so we can locate the problem?

Mark_Ginsberg

Using the Screaming Frog SEO Spider (free version to download will crawl 500 URLs, paid version [99 GBP for a yearly license] will crawl as much as you want), you can see all of the inlinks to a particular page. So run a crawl of the site, you should find those pages with Screaming Frog, and then you can view the inlinks to those pages. Visit the inlinks, and check the code for the links to the page you're looking for - this will quickly show you where the links are to the pages you're trying to hide.

Also, have you checked the sitemap - the CMS might create links to these pages in the sitemap.

good luck and let me know if you need any more help with this.

Mark

Welcome to the Q&A Forum

Browse the forum for helpful insights and fresh discussions about all things SEO.

Finding the source of duplicate content URL's

Got a burning SEO question?

Browse Questions

Explore more categories

Related Questions

My "tag" pages are showing up as duplicate content. Is this harmful?

Duplicate Page Content, Indexing and Rel Canonical Just DOUBLED! Need Advice to Fix

How Moz takes a page title is duplicate?

Find Historical SERP Ranking for a Term?

Why does SEOMoz think I have duplicate content?

How to delete/redirect duplicate content

In my crawl diagnostics, there are links to duplicate content. How can I track down where these links originated in?

About Duplicate Content found by SEOMOZ... that is not duplicate