Finding the source of duplicate content URLs
-
We have a website that displays a number of products. Each product has variations (sizes), and unfortunately every size has its own URL (for now, anyway). Needless to say, this causes duplicate content issues. (And of course, we are looking to change the URLs on our site as soon as possible.)
However, even though these duplicate URLs exist, you should not be able to reach them by navigating through the site. In theory, the site should always link to the smallest size. There seems to be a flaw in our system somewhere, because these URLs are now showing up in our campaign here on SEOmoz.
My question: is there any way to find the crawl path that led to the URLs that shouldn't have been found, so we can locate the problem?
-
Using the Screaming Frog SEO Spider (the free version crawls up to 500 URLs; the paid version, 99 GBP for a yearly license, crawls as many as you want), you can see all of the inlinks to a particular page. Run a crawl of the site; Screaming Frog should find those pages, and then you can view the inlinks to each of them. Visit the inlinks and check their source code for the links to the page you're looking for. This will quickly show you where the links to the pages you're trying to hide are coming from.
Also, have you checked the sitemap? The CMS might be listing these pages there.
Good luck, and let me know if you need any more help with this.
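If you'd rather script the inlink check yourself, here is a minimal Python sketch using only the standard library. It is a rough stand-in for what a crawler like Screaming Frog reports, not its actual implementation. The crawl is breadth-first, so the page first recorded as linking to a URL lies on a shortest click path from the start page, and `crawl_path` walks that chain back to the homepage. All names here (`crawl_with_referrers`, `crawl_path`, the `fetch` callback, the 500-page cap) are hypothetical. It only follows plain `<a href>` links and ignores robots.txt, JavaScript-generated links, and crawl delays.

```python
from collections import defaultdict, deque
from html.parser import HTMLParser
from typing import Callable, Dict, List, Optional, Set, Tuple
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag on a single page."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def fetch_live(url: str) -> Optional[str]:
    """Fetches a page over HTTP; returns None on errors and for non-HTML responses."""
    try:
        with urlopen(url, timeout=10) as resp:
            if "text/html" not in resp.headers.get("Content-Type", ""):
                return None
            return resp.read().decode("utf-8", errors="replace")
    except OSError:  # URLError and HTTPError are subclasses of OSError
        return None


def crawl_with_referrers(
    start_url: str,
    fetch: Callable[[str], Optional[str]] = fetch_live,
    max_pages: int = 500,
) -> Tuple[Dict[str, Optional[str]], Dict[str, Set[str]]]:
    """Breadth-first crawl limited to the start URL's host.

    Returns (first_referrer, inlinks): the page on which each URL was first
    discovered, and the set of crawled pages that link to each URL.
    """
    host = urlparse(start_url).netloc
    first_referrer: Dict[str, Optional[str]] = {start_url: None}
    inlinks: Dict[str, Set[str]] = defaultdict(set)
    queue = deque([start_url])
    fetched = 0
    while queue and fetched < max_pages:
        page = queue.popleft()
        fetched += 1
        html = fetch(page)
        if html is None:
            continue
        extractor = LinkExtractor()
        extractor.feed(html)
        for href in extractor.hrefs:
            # Resolve relative links and drop #fragments so equivalent URLs match.
            url, _ = urldefrag(urljoin(page, href))
            if urlparse(url).netloc != host:
                continue  # ignore off-site, mailto: and javascript: links
            inlinks[url].add(page)
            if url not in first_referrer:
                first_referrer[url] = page
                queue.append(url)
    return first_referrer, dict(inlinks)


def crawl_path(target: str, first_referrer: Dict[str, Optional[str]]) -> List[str]:
    """Follows first referrers back from target; returns [] if it was never found."""
    if target not in first_referrer:
        return []
    path = [target]
    while first_referrer[path[-1]] is not None:
        path.append(first_referrer[path[-1]])
    return path[::-1]
```

To try it offline, pass a dictionary's `get` method as `fetch`, mapping hypothetical URLs such as `https://example.com/products/shirt?size=xl` to HTML strings. Against the live site, the default `fetch_live` is used, and `inlinks[url]` plays the role of Screaming Frog's inlinks list for a stray size URL.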
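For the sitemap check, a short Python sketch like the one below pulls every `<loc>` entry out of a sitemap and flags the URLs that match a pattern. The `size=` query parameter used as the pattern is purely a hypothetical stand-in for however the size-variant URLs are actually built, so the regular expression would need adjusting to the real URL scheme. If the file is a sitemap index, the `<loc>` values are child sitemaps, and each would need to be fetched and checked the same way.

```python
import re
import xml.etree.ElementTree as ET
from typing import Iterable, List

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_urls(xml_text: str) -> List[str]:
    """Returns the text of every <loc> element, with surrounding whitespace stripped."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc") if loc.text]


def suspicious_urls(urls: Iterable[str], pattern: str) -> List[str]:
    """Keeps the URLs matching a regex, e.g. one that identifies non-smallest sizes."""
    regex = re.compile(pattern)
    return [u for u in urls if regex.search(u)]
```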
Mark
Related Questions
-
My "tag" pages are showing up as duplicate content. Is this harmful?
Hi. I ran a Moz site crawl. I see "Yes" under "Duplicate Page Content" for each of my tag pages. Is this harmful? If so, how do I fix it? This is a WordPress site. Tags are used in both the blog and e-commerce sections of the site; e-commerce is a very small portion. Thank you.
Moz Pro | | dlmilli1 -
Duplicate Page Content, Indexing and Rel Canonical Just DOUBLED! Need Advice to Fix
Last Friday (Penguin 5/2.1) my website shot way off the grid, and I noticed in my Moz Pro campaign dashboard that all of the following just doubled on my website: duplicate page content, Google indexing, and rel canonicals. I also noticed that some of my pages, images, tags, and categories now have a /page/2/ or a -2 added. I just set tags to noindex but kept media, pages, posts, and categories indexed. I'm currently using the All in One SEO plugin. Any advice would be much appreciated, as I'm stuck on this issue.
Moz Pro | | CelebrityPersonalTrainer0 -
How does Moz decide that a page title is a duplicate?
Suppose I have added a prefix and a suffix to each of my product titles (e.g., I have two titles like "buy online t-shirt at abc.com" and "buy online poster at abc.com", where "buy online" is the prefix and "at abc.com" is the suffix). Will Moz treat these two page titles as duplicates?
Moz Pro | | vayush0 -
Find Historical SERP Ranking for a Term?
Is there any way to find out what pages ranked for a given term historically? I.e. what were the top 10 search results for "Widgets" 6 months ago, 1 year ago, 2 years ago? If I had a campaign tracking that term, I'd be able to look back, but I do not. Does this data exist anywhere in a format that could be queried?
Moz Pro | | kpclaypool0 -
Why does SEOMoz think I have duplicate content?
The SEOmoz crawl report shows a large amount of duplicate content on our site. Our site is built on a CMS that creates the link we want but also automatically creates its own longer version of it (e.g. http://www.federalnational.com/About/tabid/82/Default.aspx and http://www.federalnational.com/about.aspx). We have set up automatic redirects for our site, and Google Webmaster Tools does not see these pages as duplicates. Why does SEOmoz consider them duplicate content? Is there a way to weed this out so that the crawl report becomes more meaningful? Thanks!
Moz Pro | | jsillay0 -
How to delete/redirect duplicate content
Hello, our site thewealthymind(dot)com has a lot of duplicate content. How do you clean up duplicate content when there's a lot of it? The owners redid the site several times and didn't update the URLs. Thank you.
Moz Pro | | BobGW0 -
In my crawl diagnostics, there are links to duplicate content. How can I track down where these links originated?
How can I find out how SEOmoz found these links to begin with? That would help fix the issue. Where is the source page, on which the link was first encountered, listed?
Moz Pro | | kirklandsl0 -
About Duplicate Content found by SEOMOZ... that is not duplicate
Hi folks, I am hunting for duplicate content with SEOmoz's great tool for that 🙂 Some pages are flagged as duplicates, but I can't tell why. They are video pages. The content is minimal, so I guess it might be because all the navigation is the same, but, for instance, http://www.nuxeo.com/en/resource-center/Videos/Nuxeo-World-2010/Nuxeo-World-2010-Presentation-Thierry-Delprat-CTO and http://www.nuxeo.com/en/resource-center/Videos/Nuxeo-World-2010/Nuxeo-World-2010-Presentation-Cheryl-McKinnon-CMO are flagged as duplicates. Any idea? Is it hurting us? Cheers,
Moz Pro | | nuxeo0