Why does the SEOmoz bot consider these duplicate pages?
-
Hello here,
SEOmoz bot has recently marked the following two pages as duplicate:
http://www.virtualsheetmusic.com/score/PatrickCollectionFlPf.html?tab=mp3
http://www.virtualsheetmusic.com/score/PatrickCollectionFlPf.html?tab=pdf
I don't personally see how these pages can be considered duplicate since their content is quite different.
Thoughts??!!
-
Thank you so much! I really appreciate your reply, which clarified everything for me.
I will follow your advice!
All the best,
-
We get this confusion often enough that we'll be changing it up a bit in the near future.
If this was a common problem, I'd probably recommend a different structure, with a parent page that splits into arrangements (cello/piano, flute/piano, etc.) and then rel=canonical to the parent product. Practically, though, this looks like a very isolated case on your site affecting maybe a dozen pages out of thousands. I probably wouldn't lose sleep over it, as I doubt it's having much impact either way. I think it's just something to be aware of for down the road, as the site grows.
-
Yes! Got it! You are absolutely right, I read the report in the wrong order! Here is how the reports listed the duplicate pages:
http://www.virtualsheetmusic.com/score/PatrickCollectionFlPf.html?tab=mp3
http://www.virtualsheetmusic.com/score/PatrickCollectionFlPf.html?tab=pdf
http://www.virtualsheetmusic.com/score/PatrickCollectionVcPf.html?tab=mp3
http://www.virtualsheetmusic.com/score/PatrickCollectionVcPf.html?tab=pdf
So, I thought the first pair above was one set of duplicates and the second pair was another. Instead, here are the correct pairs of duplicate pages:
http://www.virtualsheetmusic.com/score/PatrickCollectionFlPf.html?tab=mp3
http://www.virtualsheetmusic.com/score/PatrickCollectionVcPf.html?tab=mp3
and the second pair:
http://www.virtualsheetmusic.com/score/PatrickCollectionFlPf.html?tab=pdf
http://www.virtualsheetmusic.com/score/PatrickCollectionVcPf.html?tab=pdf
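The two readings of the report described above can be sketched in a few lines of Python. This is only an illustration of the two ways to pair the four URLs (by product path, or by `tab` value); it is not how the SEOmoz crawler decides what counts as a duplicate, and the helper names are made up for the example.

```python
from collections import defaultdict
from urllib.parse import parse_qs, urlsplit

# The four URLs in the order the report listed them.
REPORTED_URLS = [
    "http://www.virtualsheetmusic.com/score/PatrickCollectionFlPf.html?tab=mp3",
    "http://www.virtualsheetmusic.com/score/PatrickCollectionFlPf.html?tab=pdf",
    "http://www.virtualsheetmusic.com/score/PatrickCollectionVcPf.html?tab=mp3",
    "http://www.virtualsheetmusic.com/score/PatrickCollectionVcPf.html?tab=pdf",
]


def group_by(urls, key):
    """Group URLs by the value returned from key(url), keeping input order."""
    groups = defaultdict(list)
    for url in urls:
        groups[key(url)].append(url)
    return dict(groups)


def path_of(url):
    """Hypothetical helper: the product page path, ignoring parameters."""
    return urlsplit(url).path


def tab_of(url):
    """Hypothetical helper: the value of the tab parameter ('' if absent)."""
    return parse_qs(urlsplit(url).query).get("tab", [""])[0]


# First reading: same product, different tab.
by_path = group_by(REPORTED_URLS, path_of)
# Second reading: same tab, different product.
by_tab = group_by(REPORTED_URLS, tab_of)

for tab, urls in by_tab.items():
    print(tab, urls)
```

Grouping by `tab` gives the flute/cello pairs that the report actually meant; grouping by path gives the pairs I first assumed.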
So, I agree that the SEOmoz duplicate report should be improved graphically to avoid this kind of confusion.
And that kind of duplicate issue is actually something that I might need to fix on my part... but the fact that the two duplicate pages belong to two different items and have two different canonical definitions might solve the problem by itself... or not? I guess this is one of those rare cases where SEs can actually get confused!
What would you suggest doing with this kind of cross-similar product page? These are legitimate pages belonging to two different items that have the same kind of content (i.e. the same included music pieces) but written for different instruments! And here is, in fact, another thread where I am discussing how to handle these kinds of similar products, found often in the music industry, where the same piece of music can be written for several different instruments, causing nearly duplicate pages:
http://www.seomoz.org/q/canonical-tag-how-to-deal-with-product-variations-in-the-music-industry
Any further thoughts are very welcome.
Thank you again Dr. Meyers!
-
The duplicate content interface can occasionally be confusing in our campaign manager. I think you're reading this wrong, as I look at your account (to be fair to the other people trying to help, they don't have the ability to do that and are doing their best to assist). You have some duplicates due to a navigational issue, I think. For example:
http://www.virtualsheetmusic.com/score/PatrickCollectionFlPf.html?tab=mp3
http://www.virtualsheetmusic.com/score/PatrickCollectionVcPf.html?tab=mp3
These appear to be nearly identical, except for breadcrumb links. I think that's what we're picking up on. They each canonical to their core HTML page (without parameters), but those two pages are different, so the duplicates appear to be true duplicates.
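As a rough sketch of that last point, assuming (as described above) that each tab URL's canonical target is simply the same URL with its parameters removed, the two pages end up pointing at different canonical targets. The function name is made up for this illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def without_query(url: str) -> str:
    """Return the URL with its query string and fragment removed."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


flute = "http://www.virtualsheetmusic.com/score/PatrickCollectionFlPf.html?tab=mp3"
cello = "http://www.virtualsheetmusic.com/score/PatrickCollectionVcPf.html?tab=mp3"

# Assumption: the canonical target is the parameter-free product page.
flute_canonical = without_query(flute)
cello_canonical = without_query(cello)

# The targets differ, so the canonical tags do not consolidate these two pages.
same_target = flute_canonical == cello_canonical
print(flute_canonical)
print(cello_canonical)
print(same_target)
```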
I think your tabs are generally ok, and Google doesn't seem to be indexing the "tab=mp3" vs. "tab=pdf", etc. versions. I'm not sure canonical is completely consistent with Google's intentions (they aren't true duplicates), but it's probably a safe bet.
I won't give any numbers, but your duplicate content error count relative to your total indexed page count is incredibly low, so I think this may just be a fluke of a product that got double-listed or a category that has two paths.
-
I see your point and I agree that maybe a JavaScript solution could help more, but the use of rel=prev/next, in my opinion, wouldn't be appropriate. That's more pertinent for multi-page lists/indexes.
-
I see your point, but you are still looking at my posted issue the other way around. My point, again, is this: the fact that the SEOmoz bot tells me those two pages are "identical" can't be because of my canonical definition. Therefore it must be due to one of the following:
1. The SEOmoz bot sees those pages as identical from an SE standpoint (and then I shouldn't worry about my canonical definition, because the canonical tag should "fix" that problem). But in this case the SEOmoz bot should not mark those pages as duplicates, because of my canonical tag definition.
2. The SEOmoz bot sees those pages as identical from a UI standpoint, which I don't agree with (as a human I see those pages as NOT identical). If canonical tags were made for humans, I wouldn't use them if this were the problem (a UI duplicate issue). But since canonical tags are made for robots, I shouldn't worry about my canonical definitions if this is the case, specifically if the SEOmoz bot marked those pages as duplicates from a UI standpoint.
Does this make sense?
-
This is circular.
"If SEOmoz bot tells me that those two pages are "duplicate" pages, and with the fact both pages belong to the same item, I don't see what's wrong using a canonical tag pointing to the "main" page of the same item."
Your original question was "I don't personally see how these pages can be considered duplicate since their content is quite different."
You need to make a choice. Either you think they ARE duplicate and you want to use canonicals the way you have, or you do NOT think they are duplicate and your canonicals are wrong. You can't have it both ways.
The **rel="canonical"** attribute should be used only to specify the preferred version of many pages with identical content (although minor differences, such as sort order, are okay). http://support.google.com/webmasters/bin/answer.py?hl=en&answer=139394
"Should only be used on pages with identical content."
You don't believe this content is identical (thus your original question) so clearly you should not have the canonicals pointing the way they are.
-
You are correct that a canonical will take care of it, and using a canonical does not tell the search engine they are identical. It works just like a 301, except that it does not physically move users to the canonical page.
But does the search engine take the content from all URLs and credit the canonical with all of that content? I am not sure it does; I have never tested it. So I would rather do something with JavaScript, or maybe use previous and next tags.
-
I am sorry, I have now realized that you are suggesting that the SEOmoz bot has marked those two pages as duplicates "because of my canonical definition". Is that what you meant? If so, that puzzles me even more, because I don't think a canonical definition shared by two or more pages can "create" two or more duplicate pages by itself! That doesn't make sense; to my knowledge a canonical tag helps avoid duplicate issues, not the other way around.
-
Thank you for your advice, but I am not really an SEO newbie. I began working on SEO back in 1996 and I have been mentored a great deal by Bruce Clay. I am aware of my website's situation, and I recently joined these forums to further improve my SEO knowledge and to stay up to date.
Thank you again.
-
I don't think that with a canonical tag I tell search engines that those pages are "identical"; I just tell them that those pages can be "consolidated" as belonging to the same item. Or, as Google stated:
"A canonical page is the preferred version of a set of pages with highly similar content"
What's wrong with my canonical definition then??!!
-
I am sorry Matt, but your statement puzzles me. I have "confused search engines"? Google states:
"A canonical page is the preferred version of a set of pages with highly similar content:"
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=139394
If SEOmoz bot tells me that those two pages are "duplicate" pages, and with the fact both pages belong to the same item, I don't see what's wrong using a canonical tag pointing to the "main" page of the same item.
-
From a human point of view they are different. But humans don't manage bots, just bot rules. Bot rules will follow logic and thus the answer I wrote out below is accurate.
IMHO your canonical tags are wrong. That's the problem. You have told bots that both pages are "the same" (canonical) to /score/PatrickCollectionFlPf.html. They aren't - they have separate content. By putting in the wrong canonical tags, you've confused search engines. Bots follow the rules as stated. Your rule says they are the same, so search bots treat them the same.
-
**Canonical for the first link:**
<link rel="canonical" href="http://www.virtualsheetmusic.com/score/PatrickCollectionFlPf.html" />
Canonical for the second link:
<link rel="canonical" href="http://www.virtualsheetmusic.com/score/PatrickCollectionFlPf.html" />
You're telling search engines, including the Moz Bot, that the two pages have the exact same content as /score/PatrickCollectionFlPf.html
Now I'll break this down simply. First link is A, second link is B, canonical link is C.
A=C
B=C
Therefore A=B.
You've told bots that the mp3 tab is the same content (canonical) as the .html page. You have told bots that the pdf tab is the same content (canonical) as the .html page. Therefore if they are both duplicates of /score/PatrickCollectionFlPf.html, they are duplicates of each other.
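The A=C, B=C reasoning can be sketched in Python with the standard-library HTML parser. This is only a sketch of how a crawler might group URLs by their declared canonical target; it is not Moz's actual logic, and the page markup below is a stripped-down stand-in for the real pages, not a copy of them.

```python
from collections import defaultdict
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of the last <link rel="canonical"> tag seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "link" and (attr.get("rel") or "").lower() == "canonical":
            self.canonical = attr.get("href")


def canonical_of(html):
    """Return the canonical URL declared in the HTML, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


TARGET = "http://www.virtualsheetmusic.com/score/PatrickCollectionFlPf.html"
# Assumption: minimal placeholder markup carrying only the canonical tag.
HEAD = f'<html><head><link rel="canonical" href="{TARGET}" /></head><body></body></html>'

pages = {
    TARGET + "?tab=mp3": HEAD,  # A
    TARGET + "?tab=pdf": HEAD,  # B
}

# Group every crawled URL under the canonical target (C) it declares.
groups = defaultdict(list)
for url, html in pages.items():
    groups[canonical_of(html)].append(url)

for target, urls in groups.items():
    print(target, urls)
```

Both tab URLs land in the same group because they declare the same canonical target.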
-
Fabrizo,
I am saying what I would do if this were my site.
You have posted many questions on this forum about this site and have gotten advice from many different people.
Forums are great places to learn and lots of people spend lots of time here and give very generous answers.
In my opinion this site has technical problems that you are only going to get solved when a really competent person has the time to study it thoroughly.
I am not trying to drum up work for myself by suggesting a pro. I don't do SEO for hire.
I am just giving you my opinion on what is needed for this site.
Good luck. I've given you my best and final thoughts.
-
I am sorry, but I don't see the pertinence of this answer. Are these forums for learning and discussing SEO, or just for finding potential SEO experts to hire?!
I hope someone else can help me to understand what I am trying to figure out on this thread.
Thanks!
-
That's a good point I didn't think about... But the canonical tag should take care of that anyway, shouldn't it?
UPDATE: I have looked at the meta tags (title and description), and they are not really identical...
-
I don't know how the mozbot analyzes that aspect of pages, so this may or may not be a factor in it declaring the two pages as duplicate. But the fact that all your metadata is nearly identical for the two pages can't be helping.
-
I would hire an expert who knows how these things are handled by search engines.
-
Ha I guess so
I'm new to SEO so my tech side comes out... Why do it simply when you can over complicate it!
-
Why not simply use the canonical tag? Aren't canonical tags also made for that?
-
I think there is some confusion here. I think we must approach this issue from two perspectives only: the SE standpoint and the user (UI) standpoint.
From the SE standpoint, I have set up a canonical tag definition which should take care of the duplicate issue (if I am not correct here, what are canonical tags for?).
From the user standpoint, I repeat what I stated above: I don't see those two pages as being as similar as the bot has reported, since the main content is completely different (different textual content, different media, different purpose). Therefore the duplicate issue from a UI perspective is, in my opinion, irrelevant.
To reinforce my point above, the fact that you are suggesting I approach such a "possible" duplicate issue via AJAX tells me that my biggest concern should be from an SE standpoint (which, I repeat, should have been tackled with the canonical tag) and not from a UI standpoint (otherwise, why use AJAX instead of URL parameters if the UI end result is the same??!).
I will wait for your further thoughts. I am sorry, but I am not convinced by what you are telling me, and I still don't understand what value I should give to the duplicate report from the SEOmoz bot, considering that: 1. the SEOmoz bot ignores the canonical tag, and then... 2. the SEOmoz bot is concerned simply with the UI standpoint, which puts me back to my first question: do you, as humans, consider those two pages duplicates? Do you really see the same content there? Please be careful: I am asking this from a "human" standpoint (hence from a UI standpoint), not from an SE standpoint. I am sure that if I asked granny to tell me whether those two pages look the same, she would think I was making fun of her. Thoughts?
-
I'm not sure about these things, but if it's a parameter issue, i.e. the URL only being different after the ?, could a quick solution be to use .htaccess to take the tab parameter and insert it into the URL? Not sure how scalable that would be though...
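To make the idea concrete, here is a small Python sketch of the URL mapping such a rewrite rule would have to implement. The target format (`/score/Name/mp3.html`) is purely an assumption for illustration, the function name is made up, and this does nothing about the page content itself.

```python
from urllib.parse import parse_qs, urlsplit, urlunsplit


def tab_to_path(url: str) -> str:
    """Move the tab parameter into the path (hypothetical URL scheme).

    Any other query parameters are dropped in this sketch.
    """
    parts = urlsplit(url)
    tab = parse_qs(parts.query).get("tab", [None])[0]
    if tab is None:
        return url  # nothing to rewrite
    base, dot, ext = parts.path.rpartition(".")
    if dot:
        new_path = f"{base}/{tab}.{ext}"
    else:
        new_path = f"{parts.path}/{tab}"
    return urlunsplit((parts.scheme, parts.netloc, new_path, "", parts.fragment))


rewritten = tab_to_path(
    "http://www.virtualsheetmusic.com/score/PatrickCollectionFlPf.html?tab=mp3"
)
print(rewritten)
```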
-
I am not going to look at this site any further because it is at the limits of my ability to diagnose.
However, I think that parameters are causing a huge problem, I think that there is a lot of linking into search results, and I think that there is a big problem with thin and duplicate content.
If this was my site I would hire a pro who knows about this stuff, be willing to undertake a major restructuring, and be willing to write an awful lot of content.
===================
that's the last I can offer.... good luck
-
Good point, I would be looking at an AJAX solution.
-
In my opinion, these are not two different pages. They are the same page with a different parameter.
I am not an expert on how search engines handle these types of URLs but if this was my site I would be using a technology that allows different tabs to display without adding a parameter to the URL.....
-
I am sorry, but I don't agree with that: one page includes a long list of media files that the other one doesn't include. I see these two pages as quite different in their main content. Of course the top navigation, side and bottom are identical (typical of ecommerce sites), but the main content is quite different, in my opinion.
Look at the problem this way: what do you think I should do to differentiate those pages further? Add more and different text? The first page, listing the media files, already includes a good amount of text that is completely different from the second page. If the SEOmoz duplicate page algorithm is giving feedback from a UI standpoint (given that it completely ignores my canonical tag definition on those two pages), as a "human" myself I see those pages as having completely different content and purpose. Therefore, I assume the algorithm is faulty in some way. Do you, as a human yourself, really see those pages as having nearly identical content??!
-
I would say it's because the main difference is an image vs. some Flash. The text content is very much the same.