Why does SEOmoz bot see duplicate pages despite I am using the canonical tag?
-
Hello here,
today SEOmoz bot found and marked as "duplicate content" the following pages on my website:
http://www.virtualsheetmusic.com/score/PatrickCollectionFlPf.html?tab=mp3
http://www.virtualsheetmusic.com/score/PatrickCollectionFlPf.html?tab=pdf
And I am wondering why considering the fact I am using on both those pages a canonical tag pointing to the main product page below:
http://www.virtualsheetmusic.com/score/PatrickCollectionFlPf.html
Shouldn't SEOmoz bot follow the canonical directive and not report those two pages as duplicate?
Thank you for any insights I am probably missing here!
-
Thank you Peter, I got your ticket reply.
That makes perfect sense, and as Dr. Peter pointed out on a different thread:
http://www.seomoz.org/q/why-seomoz-bot-consider-these-as-duplicate-pages
I was discussing this issue further, I was confused by your report.
Thank you again for your help and I hope you will improve your report interface to avoid such confusion related issues in the future.
Best,
Fabrizio
-
Hi there,
Thanks for reaching out to us, I replied to you in a support ticket, but I just wanted to share it everyone since I think it might be relevant to this discussion.
I looked into your campaign and it seems that this is happening because of where your canonical tags are pointing, you can see the duplicate pages by clicking on the number to the right side of the link. These pages are considered duplicates because their canonical tags point to different URLs. For example:
http://www.virtualsheetmusic.com/score/PatrickCollectionFlPf.html?tab=mp3(Duplicate 1) is considered a duplicate of
http://www.virtualsheetmusic.com/score/PatrickCollectionVcPf.html?tab=mp3 (Duplicate 2)because the canonical tag for the first page is CANON1(http://screencast.com/t/tqvDZrLsyz8D) while the canonical for the second URL is CANON2 (http://screencast.com/t/FOguPJmK0).
Since the canonical tags point to different pages it is assumed that CANON1 and CANON2 are likely to be duplicates themselves.
Here is how our system interprets duplicate content vs. rel canonical:
Assuming A, B, C, and D are all duplicates,
If A references B as the canonical, then they are not considered duplicates
If A and B both reference C as canonical, A and B are not considered duplicates of each other
If A references C as a canonical, A and B are considered duplicated
If A references C as canonical, B references D, then A and B are considered duplicates
The examples you've provided actually fall into the fourth example I've listed above.Hope that helps,
Best,
Peter
SEOmoz Help Team. -
Thinking furthermore, I don't see how these pages can be considered nearly duplicate since their content is quite different:
http://www.virtualsheetmusic.com/score/PatrickCollectionFlPf.html?tab=mp3
http://www.virtualsheetmusic.com/score/PatrickCollectionFlPf.html?tab=pdf
Thoughts??!!
-
Nobody can tell me why SEOmoz ignore my canonical tag definitions? According to some comments on the following thread:
http://www.seomoz.org/blog/visualizing-duplicate-web-pages
It should actually ignore pages with a canonical tag and NOT mark them as duplicate, but in my experience (as explained above), that's not been the case.
-
Ok, thank you, now I get the point... then here is my next question: is there a way to tell SEOmoz bot to ignore duplicate page with a defined canonical tag? If not, the SEOmoz duplicate page report is useless for me. I am not interested to know about duplicate page for which I have already defined a canonical tag for.
Thanks!
-
Canonical lets you pick which of the duplicates will be indexed. But Google still has to crawl the other pages when they could be crawling other parts of your site. It's an opportunity cost. If you can accept slower crawls, you can ignore the issue.
-
I am sorry, but I don't understand your point. If two pages are similar, we can use the canonical tag to "consolidate" them and avoid duplicate issues. Am I right? Or what are canonical tags for?
-
While I agree that SEOMOZ should better categorize duplicates that are canonical, the reason they still tell you it's duplicate is crawl budget. Remember, Google still has to crawl these duplicate pages and they could be crawling something else instead. Canonical only helps by letting you pick which duplicate content gets indexed. It's better to not have duplicate content than to have canonical duplicates.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Should I put rel next and rel prev and canonical on tags pages
Hi I have a tag pages on a news website each tag page is divided to several pages, but Google does't crawled those pages because the links are in javaScript, I want to do the following things: Change the links to html href Add rel=pref rel=next Add a canonical in each page with the url of the main tag page Do you agree with my solution? Thanks Roy
Intermediate & Advanced SEO | | kadut1 -
Duplicate content but different pages?
Hi there! Im getting LOTS of "duplicate content" pages but the thing is they are different pages. My website essentially is a niche video hosting site with embedded videos from Youtube. Im working on adding personal descriptions to each video but keeping the same video title (should I re-word it from the original also? Any help?
Intermediate & Advanced SEO | | sarevme0 -
How would you handle this duplicate content - noindex or canonical?
Hello Just trying look at how best to deal with this duplicated content. On our Canada holidays page we have a number of holidays listed (PAGE A)
Intermediate & Advanced SEO | | KateWaite
http://www.naturalworldsafaris.com/destinations/north-america/canada/suggested-holidays.aspx We also have a more specific Arctic Canada holidays page with different listings (PAGE B)
http://www.naturalworldsafaris.com/destinations/arctic-and-antarctica/arctic-canada/suggested-holidays.aspx Of the two, the Arctic Canada page (PAGE B) receives a far higher number of visitors from organic search. From a user perspective, people expect to see all holidays in Canada (PAGE A), including the Arctic based ones. We can tag these to appear on both, however it will mean that the PAGE B content will be duplicated on PAGE A. Would it be the best idea to set up a canonical link tag to stop this duplicate content causing an issue. Alternatively would it be best to no index PAGE A? Interested to see others thoughts. I've used this (Jan 2011 so quite old) article for reference in case anyone else enters this topic in search of information on a similar thing: Duplicate Content: Block, Redirect or Canonical - SEO Tips0 -
Does Google View "SRC", "HREF", TITLE and Alt tags as Duplicate Content on Home Page Slider?
Greetings MOZ Community. A keyword matrix was developed by my SEO firm. I am in the process of integrating primary, secondary and terciary phrases into the text and am also sprinkling three or four other terms. Using a keyword density tool (http://www.webconfs.com/keyword-density-checker.php) the results were somewhat unexpected after I optimized. So I then looked at the source code and noticed text from HREF, ALT and SRC tags that may be effecting how Google would interpret text on the page. Our home page (www.nyc-officespace-leader.com) contains a slider with commercial real estate listings. Would Google index the SRC, HREF, TITLE and ALT tags in these slider items? Would this be detrimental to SEO? The code for one listing (and there are 7-8 in the slider) looks like this: | href="http://www.nyc-officespace-leader.com/listings/305-fifth-avenue-office-suite-1340sf" title="Lease a Prestigious Fifth Avenue Office - Manhattan, New York">Class A Fifth Avenue Offices class="blockLeft"><a< p=""></a<> href="http://www.nyc-officespace-leader.com/listings/305-fifth-avenue-office-suite-1340sf" title="Lease a Prestigious Fifth Avenue Office - Manhattan, New York"> src="http://dr0nu3l9a17ym.cloudfront.net/wp-content/uploads/fsrep/houses/125x100/305.jpg" alt="Lease a Prestigious Fifth Avenue Office - Manhattan, New York" width="125" height="94" /> 1,340 Sq. Ft. $5,918 / month Fifth Avenue Midtown / Grand Central <a< p=""></a<> | Could the repetition of the title text ("lease a Prestigious Fifth...") trigger a duplicate content penalty? Should the slider content be blocked or set to no-index by some kind of a Java script? We have worked very hard to optimize the home page so it would be a real shame if through some technical oversight we got hit by a Google Panda penalty. Thanks, Alan Thanks
Intermediate & Advanced SEO | | Kingalan10 -
Best Way to Incorporate FAQs into Every Page - Duplicate Content?
Hi Mozzers, We want to incorporate a 'Dictionary' of terms onto quite a few pages on our site, similar to an FAQ system. The 'Dictionary' has 285 terms in it, with about 1 sentence of content for each one (approximately 5,000 words total). The content is unique to our site and not keyword stuffed, but I am unsure what Google will think about us having all this shared content on these pages. I have a few ideas about how we can build this, but my higher-ups really want the entire dictionary on every page. Thoughts? Image of what we're thinking here - http://screencast.com/t/GkhOktwC4I Thanks!
Intermediate & Advanced SEO | | Travis-W0 -
Artist Bios on Multiple Pages: Duplicate Content or not?
I am currently working on an eComm site for a company that sells art prints. On each print's page, there is a bio about the artist followed by a couple of paragraphs about the print. My concern is that some artists have hundreds of prints on this site, and the bio is reprinted on every page,which makes sense from a usability standpoint, but I am concerned that it will trigger a duplicate content penalty from Google. Some people are trying to convince me that Google won't penalize for this content, since the intent is not to game the SERPs. However, I'm not confident that this isn't being penalized already, or that it won't be in the near future. Because it is just a section of text that is duplicated, but the rest of the text on each page is original, I can't use the rel=canonical tag. I've thought about putting each artist bio into a graphic, but that is a huge undertaking, and not the most elegant solution. Could I put the bio on a separate page with only the artist's info and then place that data on each print page using an <iframe>and then put a noindex,nofollow in the robots.txt file?</p> <p>Is there a better solution? Is this effort even necessary?</p> <p>Thoughts?</p></iframe>
Intermediate & Advanced SEO | | sbaylor0 -
Is this structure valid for a canonical tag?
Working on a site, and noticed their canonical tags follow the structure: //www.domain.com/article They cited their reason for this as http://www.ietf.org/rfc/rfc3986.txt. Does anyone know if Google will recognize this as a valid canonical? Are there any issues with using this as a the canonical?
Intermediate & Advanced SEO | | nicole.healthline0 -
Proper use and coding of rel = "canonical" tag
I'm working on a site that has pages for many wedding vendors. There are essentially 3 variations of the page for each vendor with only slightly different content, so they're showing up as "duplicate content" in my SEOmoz Campaign. Here's an example of the 3 variations: http://www.weddingreportsma.com/MA-wedding.cfm/vendorID/4161 http://www.weddingreportsma.com/MA-wedding.cfm?vendorID=4161&action=messageWrite http://www.weddingreportsma.com/MA-wedding.cfm?vendorID=4161&action=writeReview Because of this, we placed a rel="canoncial" tag in the second 2 pages to try to fix the problem. However, the coding does not seem to validate in the w3 html validator. I can't say I understand html well enough to understand the error the validator is pointing out. We also added a the following to the second 2 types of pages <meta name="robots" content="noindex"> Am I employing this tag correctly in this case? Here is a snippet of the code below. <html> <head> <title>Reviews on Astonishing Event, Inc from Somerset MAtitle> <link rel="stylesheet" type="text/css" href="[/includes/style.css](view-source:http://www.weddingreportsma.com/includes/style.css)"> <link href="[http://www.weddingreportsma.com/MA-wedding.cfm/vendorID/4161](view-source:http://www.weddingreportsma.com/MA-wedding.cfm/vendorID/4161)" rel="canonical" /> <meta name="robots" content="noindex">
Intermediate & Advanced SEO | | jeffreytrull1
<meta name="keywords" content="Astonishing Event, Inc, Somerset Massachusetts, Massachusetts Wedding Wedding Planners Directory, Massachusetts weddings, wedding Massachusetts ">
<meta name="description" content="Get information and read reviews on Astonishing Event, Inc from Somerset MA. Astonishing Event, Inc appears in the directory of Somerset MA wedding Wedding Planners on WeddingReportsMA.com."> <script src="[http://www.google-analytics.com/urchin.js](view-source:http://www.google-analytics.com/urchin.js)" type="text/javascript">script> <script type="text/javascript"> _uacct = "UA-173959-2"; urchinTracker(); script> head>0