Update in Moz spider/tools?? Flagging duplicate content / ignoring canonical
-
Hi all,
Has there been an update in the SEOmoz crawling software?
We now have thousands of dupe content/page title warnings for paginated product page URLs that have correctly formatted canonicals.
e.g.
http://www.woolovers.com/british-wool/mens/tweed-green/wool-countryman-suede-patch-sweater.aspx
... has following pages with identical content that have been flagged:
http://www.woolovers.com/british-wool/mens/olive-green/wool-countryman-suede-patch-sweater.aspx?p=true&rspage=4
...plus 4 more URLs.
But they all have a canonical set. There's even a notice at the bottom of the report that tells us there's a canonical set to http://www.woolovers.com/british-wool/mens/tweed-green/wool-countryman-suede-patch-sweater.aspx
What gives, SEOmoz ??
Thanks
Michael
-
Hey Lawrence,
Campaigns have a 95% tolerance for duplicate content. This includes all the source code on the page and not just the viewable text. So if a URL is at least 95% similar in code and content to another URL, this warning will appear.
You can run your own tests using this tool: http://www.webconfs.com/similar-page-checker.php
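If you would rather test locally, here is a rough sketch of that kind of check. It is not our crawler's actual algorithm: it uses Python's standard `difflib` as a stand-in similarity measure, and the names `DUPLICATE_THRESHOLD`, `similarity` and `looks_duplicate` are made up for illustration. It compares the full page source, not just the visible text.

```python
from difflib import SequenceMatcher

# Assumption: mirrors the 95% figure quoted above; the real measure may differ.
DUPLICATE_THRESHOLD = 0.95


def similarity(source_a: str, source_b: str) -> float:
    """Return a 0..1 similarity ratio between two full page sources."""
    # autojunk=False stops difflib from ignoring very common characters
    # in long strings, which would skew the ratio for HTML.
    return SequenceMatcher(None, source_a, source_b, autojunk=False).ratio()


def looks_duplicate(source_a: str, source_b: str) -> bool:
    """True when the two sources are at least 95% similar."""
    return similarity(source_a, source_b) >= DUPLICATE_THRESHOLD


if __name__ == "__main__":
    # Two hypothetical pages that differ only in the title but share
    # a long block of identical review markup.
    reviews = "<p>review text</p>" * 50
    page_a = "<html><title>Green</title>" + reviews + "</html>"
    page_b = "<html><title>Olive</title>" + reviews + "</html>"
    print(looks_duplicate(page_a, page_b))
```

A long shared block such as the reviews dominates the ratio, so small differences in the title or H1 barely move it.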
We don't know what standard Google uses, but it's safe to say they are a bit more sophisticated than us - so you might be okay in this regard as long as you have a couple hundred words of unique text and some unique coding per page. Google won't say how much duplicate content is too much, so we like to be better safe than sorry.
I hope this helps. Let me know if you need further assistance.
-Chiaryn
-
Hi Chiaryn,
Thanks for the reply and explanation. The different colour-specific pages, e.g. Tweed Green and Olive Green, have some different content, but it's nowhere near enough in cases of two greens, two blues etc. We simplify colour names for search, so when there is an Olive and a Tweed Green they both end up with 'Green' as the variable in the page title, H1 etc. Will fix this.
Do you think the reviews at the bottom of the pages will also trigger the dupe content warning, i.e. even if we make all other on-page elements unique for each colour URL (page title, H1, H2, product description etc.)? The reviews are quite extensive and are the same on all the separate colour-specific product page versions of each style. I was thinking today about whether we should remove them from these colour product pages (OR perhaps let the colour product pages have their OWN reviews).
http://www.woolovers.com/british-wool/mens/tweed-green/wool-countryman-suede-patch-sweater.aspx
Thanks again
-
Oh, brilliant (re: "See more" aspect). Thanks for the info. Will let you know how we tackle this and the repercussions (!) and look forward to hearing how you get on also!
-
Hi Michael,
Thanks for writing in. I already emailed you in response to the ticket you sent in to the Help Desk, but I will copy my answer here for your review.
--
I looked into your campaign and it seems that this is happening because of where your canonical tags are pointing. These pages are considered duplicates because their canonical tags point to different URLs. For example, http://www.woolovers.com/british-wool/mens/tweed-green/wool-countryman-suede-patch-sweater.aspx is considered a duplicate of http://www.woolovers.com/british-wool/mens/olive-green/wool-countryman-suede-patch-sweater.aspx?p=true&rspage=4 because the canonical tag for the first page is http://www.woolovers.com/british-wool/mens/tweed-green/wool-countryman-suede-patch-sweater.aspx while the canonical for the second URL is http://www.woolovers.com/british-wool/mens/olive-green/wool-countryman-suede-patch-sweater.aspx, with one URL showing tweed-green and the other showing olive-green.
Since the canonical tags point to different URLs it is assumed that http://www.woolovers.com/british-wool/mens/tweed-green/wool-countryman-suede-patch-sweater.aspx and http://www.woolovers.com/british-wool/mens/olive-green/wool-countryman-suede-patch-sweater.aspx are likely to be duplicates themselves.
Here is how our system interprets duplicate content vs. rel canonical:
Assuming A, B, C, and D are all duplicates,
If A references B as the canonical, then they are not considered duplicates
If A and B both reference C as canonical, A and B are not considered duplicates of each other
If A references C as canonical and B has no canonical, A and B are considered duplicates
If A references C as canonical, B references D, then A and B are considered duplicates
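The four cases can be condensed into one rule: two duplicate pages are only excused when they resolve to the same canonical target. The sketch below is a simplified illustration of that reading, not our production code. The function names and the dictionary of canonicals are hypothetical, and it assumes a page with no canonical tag is treated as pointing at itself.

```python
from typing import Dict


def canonical_target(url: str, canonicals: Dict[str, str]) -> str:
    """Return the URL a page declares as canonical.

    Assumption: a page without a canonical tag counts as its own target.
    """
    return canonicals.get(url, url)


def flagged_as_duplicates(a: str, b: str, canonicals: Dict[str, str]) -> bool:
    """Given two pages already known to have duplicate content,
    report whether they would still be flagged after canonicals."""
    return canonical_target(a, canonicals) != canonical_target(b, canonicals)


if __name__ == "__main__":
    print(flagged_as_duplicates("A", "B", {"A": "B"}))            # case 1
    print(flagged_as_duplicates("A", "B", {"A": "C", "B": "C"}))  # case 2
    print(flagged_as_duplicates("A", "B", {"A": "C"}))            # case 3
    print(flagged_as_duplicates("A", "B", {"A": "C", "B": "D"}))  # case 4
```

Under this reading, a tweed-green page that points at itself and an olive-green paginated page that points at the olive-green URL resolve to different targets, so they are flagged.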
The examples you've provided actually fall into the fourth example I've listed above. I hope this clears things up. Please let me know if you have any other questions.
--
-Chiaryn
-
We use the "See more" script on our sites, and from what I understand, at least from other Mozzers, this is an okay practice. http://www.seomoz.org/q/using-more-info-javascript-toggledisplay-tag-for-more-info-text
We also use the rel="prev" and rel="next" to some success, but I can't comment on how that's functioning canonical-wise, because IT WAS DROPPED from our latest redesign and is going to be added to our client's website in the latest release. Oye.
I'd love to hear how this works out for you. There are some really great Mozzers on here with loads of experience about canonical tags and duplicate page issues. Can't wait to see what they have to contribute.
-
Hi there,
Thanks for your response.
It's not product page A being seen as a duplicate of product page B etc, but several versions of product A being seen as duplicates due to pagination. This stems from product reviews that span several pages, so making the rest of the content, titles etc different (other than the crawlable reviews) isn't really an option.
Will look more into "noindex, follow" tags in pagination.
We could have a View All page for indexing showing all reviews (with lots of scrolling!), with the paginated versions canonicalized to that version (could still serve the paginated version of the product page from site navigation, perhaps with a "noindex, follow" meta tag). Text doesn't take long to load and this approach would consolidate the review content.
http://googlewebmastercentral.blogspot.co.uk/2011/09/view-all-in-search-results.html
The other option is a rel="prev" and rel="next" implementation, which shows Google the relationship between the pages (not sure if it will still be flagged as dupe content in SEOmoz though! Depends if they follow the tag). This way individual pages might get indexed (not sure if that's a good thing?!), perhaps if there's something in a review from (say) page 5 of the product reviews.
http://googlewebmastercentral.blogspot.co.uk/2011/09/pagination-with-relnext-and-relprev.html
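For reference, a minimal sketch of how the prev/next tags for a paginated review page could be generated. It assumes page 1 lives at the plain product URL and later pages add an `rspage` parameter (borrowed from the URLs above); the helper names `page_url` and `pagination_links` are made up, and the real URL scheme on the site may well differ.

```python
from typing import List
from urllib.parse import urlencode


def page_url(base_url: str, page: int) -> str:
    """Build the URL for a review page.

    Assumption: page 1 is the bare product URL, later pages add ?rspage=N.
    """
    if page == 1:
        return base_url
    return f"{base_url}?{urlencode({'rspage': page})}"


def pagination_links(base_url: str, page: int, last_page: int) -> List[str]:
    """Return the <link> tags to place in the <head> of the given page."""
    if not 1 <= page <= last_page:
        raise ValueError("page must be between 1 and last_page")
    tags = []
    if page > 1:  # the first page has no predecessor
        tags.append(f'<link rel="prev" href="{page_url(base_url, page - 1)}">')
    if page < last_page:  # the last page has no successor
        tags.append(f'<link rel="next" href="{page_url(base_url, page + 1)}">')
    return tags


if __name__ == "__main__":
    for tag in pagination_links("http://www.example.com/product.aspx", 2, 5):
        print(tag)
```

The first page only gets a "next" tag and the last page only a "prev" tag; every page in between gets both.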
Ideally I'd like to put all reviews on one page and hide them behind a Facebook-style 'See more' function. Not sure if that counts as hiding content? Will look into this.
-
Hi Michael,
Not sure if this helps you out at all, but I found this about the canonicals and SEOMoz crawl report in a previous Q http://mz.cm/11erRj6:
As far as the SEOmoz crawl reports go, note that setting a canonical won't stop these pages being reported as duplicate content.
From the help:
"Keep in mind that that canonicals will stop the pages from ranking against each other, but they will still show up as duplicate content from a UI perspective, so we will still count them as duplicate."
I have the same issues on my accounts. I'm focusing on making the pages' content as unique as possible, or using the "noindex, follow" meta tags to see if that makes a difference.
I know you may have a lot of pages on your website, but perhaps writing short descriptions for your products would help. It might be worthwhile, though I completely understand that it may be a huge undertaking if you have hundreds or thousands of pages.