Finding the source of duplicate content URL's
-
We have a website that displays a number of products. The product has variations (sizes) and unfortunately every size has its own URL (for now anyway). Needless to say, this causes duplicate content issues. (And of course, we are looking to change the URL's for our site as soon as possible)
However, even though these duplicate URL's exist, you should not be able to land on them by navigating through the site. In theory, the site should always display the link to the smallest size. It seems that there is a flaw in our system somewhere, as these links are now found in our campaign here on SEOmoz.
My question: is there any way to find the crawl path that lead to the URL's that shouldn't have been found, so we can locate the problem?
-
Using the Screaming Frog SEO Spider (free version to download will crawl 500 URLs, paid version [99 GBP for a yearly license] will crawl as much as you want), you can see all of the inlinks to a particular page. So run a crawl of the site, you should find those pages with Screaming Frog, and then you can view the inlinks to those pages. Visit the inlinks, and check the code for the links to the page you're looking for - this will quickly show you where the links are to the pages you're trying to hide.
Also, have you checked the sitemap - the CMS might create links to these pages in the sitemap.
good luck and let me know if you need any more help with this.
Mark
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Large site with content silo's - best practice for deep indexing silo content
Thanks in advance for any advice/links/discussion. This honestly might be a scenario where we need to do some A/B testing. We have a massive (5 Million) content silo that is the basis for our long tail search strategy. Organic search traffic hits our individual "product" pages and we've divided our silo with a parent category & then secondarily with a field (so we can cross link to other content silo's using the same parent/field categorizations). We don't anticipate, nor expect to have top level category pages receive organic traffic - most people are searching for the individual/specific product (long tail). We're not trying to rank or get traffic for searches of all products in "category X" and others are competing and spending a lot in that area (head). The intent/purpose of the site structure/taxonomy is to more easily enable bots/crawlers to get deeper into our content silos. We've built the page for humans, but included link structure/taxonomy to assist crawlers. So here's my question on best practices. How to handle categories with 1,000+ pages/pagination. With our most popular product categories, there might be 100,000's products in one category. My top level hub page for a category looks like www.mysite/categoryA and the page build is showing 50 products and then pagination from 1-1000+. Currently we're using rel=next for pagination and for pages like www.mysite/categoryA?page=6 we make it reference itself as canonical (not the first/top page www.mysite/categoryA). Our goal is deep crawl/indexation of our silo. I use ScreamingFrog and SEOMoz campaign crawl to sample (site takes a week+ to fully crawl) and with each of these tools it "looks" like crawlers have gotten a bit "bogged down" with large categories with tons of pagination. For example rather than crawl multiple categories or fields to get to multiple product pages, some bots will hit all 1,000 (rel=next) pages of a single category. I don't want to waste crawl budget going through 1,000 pages of a single category, versus discovering/crawling more categories. I can't seem to find a consensus as to how to approach the issue. I can't have a page that lists "all" - there's just too much, so we're going to need pagination. I'm not worried about category pagination pages cannibalizing traffic as I don't expect any (should I make pages 2-1,000) noindex and canonically reference the main/first page in the category?). Should I worry about crawlers going deep in pagination among 1 category versus getting to more top level categories? Thanks!
Moz Pro | | DrewProZ1 -
What's the best way to keep track of keyword rankings
Here's the deal. I keep track of my keyword rankings with the help of Rank Tracker from seopowersuite.com for the most part. I ran it on a daily basis and my keyword was not in top 100 for a few months. Moz.com panel shows pretty much the same (not in top 50) for the same months. That said, if I check that keyword ranking in my Google Webmaster Tools (avg. position) it says that its position (ranking?) was on average: 49, 7, 7, 8 (for the last four months). So, I'm not sure how it's even possible? How come Rank Tracker and Moz don't see any rankings and Google gives me sorta decent avg. positions at the same time. I assume that avg. postion means that same as avg. ranking, right? I'm not sure what I'm missing here.
Moz Pro | | VinceWicks0 -
Can't figure out why some of my pages are duplicate content
Within the crawl diagnostics area I'm getting duplicate page content issues on several pages. I don't know why, would anyone be able to tell me how these links are duplicate so I can fix them? http://www.sagenews.ca/Column.asp?id=3010 http://www.sagenews.ca/Column.asp?id=2808 http://www.sagenews.ca/Column.asp?id=2998 http://www.sagenews.ca/Column.asp?id=2837 http://www.sagenews.ca/Column.asp?id=2981
Moz Pro | | INMCA0 -
Your site's pages may be using techniques that are outside Google's Webmaster Guidelines
Hi All The message below I received from google webmaster, please tell me how I solve this problem Dear site owner or webmaster of http://testedfatburners.com/, We've detected that some of your site's pages may be using techniques that are outside Google's Webmaster Guidelines. If you have any questions about how to resolve this issue, please see ourWebmaster Help Forum for support. Sincerely, Google Search Quality Team
Moz Pro | | mkm1040 -
Results still being seen for old deleted content
Hi, Our site has a blog. We had "Categories" and "Tags" in our blog. This caused SEO chaos with Duplicate content and 404 Errors. We removed all the Categories and Tags about a month ago. Why would these issues still be showing up on Moz as errors on our Crawl Diagnostics page? Do we need to do some thing else other than just remove them from our blog. Thanks
Moz Pro | | Studio330 -
Massive Rank Changes in Bing/Yahoo Only - What's up w/ Rank Tracker?
I'll be honest, it's pretty shameful how little attention I give Bing and Yahoo. But something recently caught my eye - I have a client who according to rank tracker is all over the place in yahoo/bing...up 20 positions for some kw's, down 12 for others...meanwhile Google's holding steady, and Google data seems mostly accurate - with one exception of showing 46th position for an exact match branded search that they're #1 for. I know Rank Tracker has been going through some fine tuning, right? I don't manually check Yahoo/Bing - I can't imagine those #'s are accurate though. Am I right to assume it's incorrect data? Just curious, any insight would be much appreciated. Thanks
Moz Pro | | SVmedia0 -
Wrong duplicated page content
I found out that some errors on my website are considered as "duplicated page content" while they are not, the content is different on each page. I wonder why ? Is it an issue from Seomoz ?
Moz Pro | | Amadeus_eBC0 -
My crawl diagnostic is showing 2 duplicate content and titles.
First of all Hi - My name is Jason and I've just joined - How you all doing? My 1st question then: When I view where these errors are occurring it says www mydomain co uk and www mydomain co uk/index.html Isn't this the same page? I have looked into my root folder and only index.html exists.
Moz Pro | | JasonHegarty0