Update in Moz spider/tools?? Flagging duplicate content / ignoring canonical
-
Hi all,
Has there been an update in the SEOmoz crawling software?
We now have thousands of dupe content/page title warnings for paginated product page URLs that have correctly formatted canonicals.
e.g.
http://www.woolovers.com/british-wool/mens/tweed-green/wool-countryman-suede-patch-sweater.aspx
... has following pages with identical content that have been flagged:
http://www.woolovers.com/british-wool/mens/olive-green/wool-countryman-suede-patch-sweater.aspx?p=true&rspage=4
...plus 4 more URLs.
But they all have a canonical set. There's even a notice at the bottom of the report telling us there's a canonical set to http://www.woolovers.com/british-wool/mens/tweed-green/wool-countryman-suede-patch-sweater.aspx
What gives, SEOmoz ??
Thanks
Michael
-
Hey Lawrence,
Campaigns have a 95% tolerance for duplicate content. This includes all the source code on the page and not just the viewable text. So if a URL is at least 95% similar in code and content to another URL, this warning will appear.
You can run your own tests using this tool: http://www.webconfs.com/similar-page-checker.php
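For a quick local check along the same lines, here is a minimal sketch in Python. It is not the Moz crawler's algorithm (which isn't documented in this thread); it uses the standard library's `difflib.SequenceMatcher` as a stand-in similarity measure, and the function names and the 0.95 default are assumptions taken from the 95% figure above.

```python
from difflib import SequenceMatcher


def similarity(source_a: str, source_b: str) -> float:
    """Return a 0.0-1.0 similarity ratio between two page sources.

    The full source is compared (markup included), mirroring the point
    that code counts as well as visible text. This is a stand-in
    measure, not the crawler's real one.
    """
    return SequenceMatcher(None, source_a, source_b, autojunk=False).ratio()


def looks_duplicate(source_a: str, source_b: str, threshold: float = 0.95) -> bool:
    """True when the two sources are at least `threshold` similar."""
    return similarity(source_a, source_b) >= threshold


if __name__ == "__main__":
    # Hypothetical pages: identical apart from one word in the title.
    shared = "<body>" + "great jumper, very warm. " * 40 + "</body>"
    page_a = "<html><title>Green</title>" + shared + "</html>"
    page_b = "<html><title>Olive</title>" + shared + "</html>"
    print(looks_duplicate(page_a, page_b))
```

A long shared block (such as identical reviews) dominates the ratio, so changing only the title or H1 barely moves the score.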
We don't know what standard Google uses, but it's safe to say they are a bit more sophisticated than us - so you might be okay in this regard as long as you have a couple hundred words of unique text and some unique coding per page. Google won't say how much duplicate content is too much, so we like to be better safe than sorry.
I hope this helps. Let me know if you need further assistance.
-Chiaryn
-
Hi Chiaryn,
Thanks for the reply and explanation. The colour-specific pages (e.g. Tweed Green and Olive Green) have some different content, but nowhere near enough where there are two greens, two blues, etc. We simplify colour names for search, so when there is both an Olive and a Tweed Green, both end up with 'Green' as the variable in the page title, H1, etc. Will fix this.
Do you think the reviews at the bottom of the pages will also trigger the dupe content warning, i.e. even if we make all other on-page elements unique for each colour URL (page title, H1, H2, product description, etc.)? The reviews are quite extensive and are the same on all the colour-specific product page versions of each style. I was wondering today whether we should remove them from these colour product pages (OR perhaps let the colour product pages have their OWN reviews).
http://www.woolovers.com/british-wool/mens/tweed-green/wool-countryman-suede-patch-sweater.aspx
Thanks again
-
Oh, brilliant (re: "See more" aspect). Thanks for the info. Will let you know how we tackle this and the repercussions (!) and look forward to hearing how you get on also!
-
Hi Michael,
Thanks for writing in. I already emailed you in response to the ticket you sent in to the Help Desk, but I will copy my answer here for your review.
--
I looked into your campaign and it seems that this is happening because of where your canonical tags are pointing. These pages are considered duplicates because their canonical tags point to different URLs. For example, http://www.woolovers.com/british-wool/mens/tweed-green/wool-countryman-suede-patch-sweater.aspx is considered a duplicate of http://www.woolovers.com/british-wool/mens/olive-green/wool-countryman-suede-patch-sweater.aspx?p=true&rspage=4 because the canonical tag for the first page is http://www.woolovers.com/british-wool/mens/tweed-green/wool-countryman-suede-patch-sweater.aspx while the canonical for the second URL is http://www.woolovers.com/british-wool/mens/olive-green/wool-countryman-suede-patch-sweater.aspx, with one URL showing tweed-green and the other showing olive-green.
Since the canonical tags point to different URLs it is assumed that http://www.woolovers.com/british-wool/mens/tweed-green/wool-countryman-suede-patch-sweater.aspx and http://www.woolovers.com/british-wool/mens/olive-green/wool-countryman-suede-patch-sweater.aspx are likely to be duplicates themselves.
Here is how our system interprets duplicate content vs. rel canonical:
Assuming A, B, C, and D are all duplicates,
If A references B as the canonical, then they are not considered duplicates
If A and B both reference C as canonical, A and B are not considered duplicates of each other
If A references C as canonical, A and B are considered duplicates
If A references C as canonical, B references D, then A and B are considered duplicates
The examples you've provided actually fall into the fourth case I've listed above. I hope this clears things up. Please let me know if you have any other questions.
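The four cases above can be sketched in Python as follows. This is one reading of the rules, not the crawler's actual code: it assumes a page with no canonical tag is treated as its own canonical (so in the third case B has no tag), and that both pages have already been judged near-identical in content. The function and variable names are hypothetical.

```python
from typing import Dict, Optional


def flagged_as_duplicates(
    url_a: str, url_b: str, canonicals: Dict[str, Optional[str]]
) -> bool:
    """Decide whether two content-identical pages would still be flagged.

    `canonicals` maps a page URL to its rel=canonical target, or None
    when the page has no tag. Assumption: a page without a tag is
    treated as pointing at itself.
    """
    target_a = canonicals.get(url_a) or url_a
    target_b = canonicals.get(url_b) or url_b
    # Under this reading, all four cases collapse to one question:
    # do both pages resolve to the same canonical URL?
    return target_a != target_b


if __name__ == "__main__":
    cases = {
        "1: A -> B": {"A": "B", "B": None},
        "2: A -> C, B -> C": {"A": "C", "B": "C"},
        "3: A -> C, B untagged": {"A": "C", "B": None},
        "4: A -> C, B -> D": {"A": "C", "B": "D"},
    }
    for label, canonicals in cases.items():
        print(label, flagged_as_duplicates("A", "B", canonicals))
```

For the URLs in this thread, the tweed-green page canonicalizes to itself and the paginated olive-green page canonicalizes to the olive-green page, so the two targets differ and the pair is flagged.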
--
-Chiaryn
-
We use the "See more" script on our sites, and from what I understand, at least from other Mozzers, this is an okay practice. http://www.seomoz.org/q/using-more-info-javascript-toggledisplay-tag-for-more-info-text
We also use the rel="prev" and rel="next" to some success, but I can't comment on how that's functioning canonical-wise, because IT WAS DROPPED from our latest redesign and is going to be added to our client's website in the latest release. Oye.
I'd love to hear how this works out for you. There are some really great Mozzers on here with loads of experience about canonical tags and duplicate page issues. Can't wait to see what they have to contribute.
-
Hi there,
Thanks for your response.
It's not product page A being seen as a duplicate of product page B, etc., but several versions of product A being seen as duplicates due to pagination. This stems from product reviews that span several pages, so making the rest of the content, titles, etc. different (other than the crawlable reviews) isn't really an option.
Will look more into "noindex, follow" tags in pagination.
We could have a View All page for indexing that shows all reviews (with lots of scrolling!), with the paginated versions canonicalized to it. We could still serve the paginated version of the product page from site navigation, perhaps with a "noindex, follow" meta tag. Text doesn't take long to load, and this approach would consolidate the review content.
http://googlewebmastercentral.blogspot.co.uk/2011/09/view-all-in-search-results.html
The other option is the rel="prev" and rel="next" implementation, which shows Google the relationship between the pages (not sure if it will still be flagged as dupe content in SEOmoz though! Depends if they follow the tag). This way individual pages might get indexed (not sure if that's a good thing?!), perhaps if there's something in a review from (say) page 5 of the product reviews.
http://googlewebmastercentral.blogspot.co.uk/2011/09/pagination-with-relnext-and-relprev.html
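As a rough illustration of the rel="prev"/rel="next" option, the sketch below builds the link tags for one page of a paginated review series. It is only a sketch under assumptions: the `rspage` parameter name is borrowed from the URLs earlier in this thread, page 1 is assumed to be the bare product URL, and the helper names and `example.com` address are hypothetical.

```python
from html import escape
from typing import List
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def page_url(base_url: str, page: int, param: str = "rspage") -> str:
    """Return the URL for a given page of the series.

    Assumption: page 1 is the base URL with no paging parameter.
    """
    parts = urlsplit(base_url)
    # Drop any existing paging parameter, keep everything else.
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != param]
    if page > 1:
        query.append((param, str(page)))
    return urlunsplit(parts._replace(query=urlencode(query)))


def pagination_links(
    base_url: str, page: int, total_pages: int, param: str = "rspage"
) -> List[str]:
    """Build the <link rel="prev"/"next"> tags for the page's <head>."""
    tags = []
    if page > 1:
        href = escape(page_url(base_url, page - 1, param))
        tags.append(f'<link rel="prev" href="{href}">')
    if page < total_pages:
        href = escape(page_url(base_url, page + 1, param))
        tags.append(f'<link rel="next" href="{href}">')
    return tags


if __name__ == "__main__":
    for tag in pagination_links("http://www.example.com/product.aspx", 2, 5):
        print(tag)
```

The first page gets only a "next" tag and the last page only a "prev" tag, which is how the linked Google post describes the sequence.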
Ideally I'd like to put all reviews on one page and hide them with a Facebook-style 'See more' function. Not sure if that counts as hiding content? Will look into this.
-
Hi Michael,
Not sure if this helps you out at all, but I found this about canonicals and the SEOmoz crawl report in a previous Q: http://mz.cm/11erRj6
As far as the SEOmoz crawl reports go, note that setting a canonical won't stop these pages being reported as duplicate content.
From the help:
"Keep in mind that that canonicals will stop the pages from ranking against each other, but they will still show up as duplicate content from a UI perspective, so we will still count them as duplicate."
I have the same issues on my accounts. I'm focusing on making the page content as unique as possible, or using the "noindex, follow" meta tags to see if that makes a difference.
I know you may have a lot of pages on your website, but perhaps writing short descriptions for your products would help. It might be worthwhile, though it's completely understandable that it may be a huge undertaking if you have hundreds or thousands of pages.