The Bible and Duplicate Content
-
We have our complete set of scriptures online, including the Bible, at http://lds.org/scriptures. Users can browse to any of the volumes of scripture. We've improved the user experience by letting users link to specific verses in context: the page scrolls to the linked verse and highlights it. However, this creates a significant amount of duplicate content. For example, these links:
http://lds.org/scriptures/nt/james/1.5
http://lds.org/scriptures/nt/james/1.5-10
http://lds.org/scriptures/nt/james/1
All of those link to the same chapter in the book of James, yet the first two highlight verse 5 and verses 5-10 respectively. This is a good user experience because, in other sections of our site and on blogs throughout the world, webmasters link to specific verses so the reader can see the verse in the context of the rest of the chapter.
Another Bible site has a separate HTML page for each individual verse and tends to outrank us because of this (and possibly some other reasons) for long-tail chapter/verse queries. However, our tests indicated that users prefer our current version.
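To make the URL scheme concrete, here is a minimal Python sketch of how a path like the ones above might be split into a chapter and an optional verse range before the page decides what to highlight. The `PassageRef` type, the `parse_passage` function and the regular expression are illustrative assumptions, not the site's actual code.

```python
import re
from typing import NamedTuple, Optional


class PassageRef(NamedTuple):
    volume: str                 # e.g. "nt"
    book: str                   # e.g. "james"
    chapter: int
    first_verse: Optional[int]  # None when the URL names a whole chapter
    last_verse: Optional[int]   # equals first_verse for a single verse


# Assumed path shape: /scriptures/<volume>/<book>/<chapter>[.<verse>[-<verse>]]
_PATH_RE = re.compile(
    r"^/scriptures/(?P<volume>[a-z-]+)/(?P<book>[a-z0-9-]+)/"
    r"(?P<chapter>\d+)(?:\.(?P<first>\d+)(?:-(?P<last>\d+))?)?$"
)


def parse_passage(path: str) -> PassageRef:
    """Split a scripture path into chapter and (optional) verse range."""
    match = _PATH_RE.match(path)
    if match is None:
        raise ValueError(f"not a scripture path: {path!r}")
    first = int(match["first"]) if match["first"] else None
    last = int(match["last"]) if match["last"] else first
    return PassageRef(
        match["volume"], match["book"], int(match["chapter"]), first, last
    )
```

All three example URLs parse to the same chapter; only the verse range differs, which is why a crawler sees the same page body three times.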
We have a sitemap ready to publish which includes a URL for every chapter/verse. We hope this will improve indexing of some of the more popular verses. However, Googlebot is going to see some duplicate content as it crawls that sitemap!
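For readers unfamiliar with what such a sitemap contains, the following is a rough Python sketch of generating one entry per verse. The helper names `verse_urls` and `build_sitemap` are hypothetical, and the verse count passed in is a placeholder; only the sitemap namespace and the `urlset`/`url`/`loc` elements come from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def verse_urls(chapter_url, verse_count):
    """Yield one URL per verse: <chapter_url>.1, <chapter_url>.2, ..."""
    for verse in range(1, verse_count + 1):
        yield f"{chapter_url}.{verse}"


def build_sitemap(urls):
    """Return a sitemap XML string with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for url in urls:
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")
```

Calling `build_sitemap(verse_urls("http://lds.org/scriptures/nt/james/1", 3))` would list three verse URLs that all serve the same chapter text, which is the duplication concern raised above.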
So the question is: is the sitemap a good idea, given that we can't go back to putting each chapter/verse on its own unique page? We are also going to recommend that we create unique titles for each of the verses and pass a portion of the verse text into the meta description. Will this perhaps be enough to satisfy Googlebot that the pages are in fact unique? They certainly are from a user perspective.
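As a sketch of the title and meta description idea, assuming the verse text is available as a plain string, something like the following could build the two head tags. The function name `verse_head_tags` and the 155-character cutoff are assumptions made for illustration, not a documented limit.

```python
import html


def verse_head_tags(book, chapter, verse, verse_text, max_len=155):
    """Build a per-verse <title> and meta description from the verse text."""
    title = f"{book} {chapter}:{verse}"
    description = verse_text.strip()
    if len(description) > max_len:
        # Truncate long verses and mark the cut with an ellipsis.
        description = description[: max_len - 3].rstrip() + "..."
    return (
        f"<title>{html.escape(title)}</title>\n"
        f'<meta name="description" content="{html.escape(description, quote=True)}">'
    )
```

Note that this only varies the head of the document; the body of every verse URL within a chapter would still be identical.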
Thanks all for taking the time!
-
Dave,
Thanks for the clarification. You're definitely in a rare circumstance as compared to most web sites.
In reality, since it's the Bible, there is going to be a duplicate content issue regardless, given how many sites publish the same content now and how many more most likely will in the future. Eternalministries.org, KingJamesBibleOnline.org, concordance.biblebrowser.com and many other sites all offer this content.
If you can find a way to offer your content in a unique way, and within your own site, offer different versions of it (individual verses compared to entire chapters), then ideally yes, you'd want it all indexed.
How you do that without adding your own unique text above or below each page's direct biblical content is the issue though.
This challenge is why I offered the concept of not indexing variations. Even if you weren't hit by the Panda update, any time Google has to evaluate multiple pages across sites where the content is either identical or "mostly" identical, someone's content is going to suffer to one degree or another. Any time it's a conflict within a single site, some versions are going to be given less ranking value than others.
So unfortunately it's not a simple, straightforward situation where avoiding duplication is guaranteed to provide the maximum reach, nor is there a simple way to boost multiple versions so that they're all guaranteed to be found, let alone show up above "competitor" sites.
This is why I initially offered what are essentially SEO best practices for addressing duplicate content.
If you don't want to lose the traffic that currently comes in by multiple means, the only other way to bolster what you've already got is to focus on high quality, long term link building and social media.
The link building would need to focus on obtaining high quality links pointing to deep content (specific chapter pages and specific verse pages), where the anchor text used in those links varies between chapter- or verse-specific words, broader Bible-related phrases, and the LDS brand.
On the other hand, by implementing canonical tags, you will definitely lose at least some of the visits that currently come in through variation URLs. Will that be compensated for by an equal or greater number of visits to the new "preferred" URL? In this rather unique situation there's no way to truly know. It is a risk.
Which brings me back to the concept that you'd potentially be better off finding ways to add truly unique content around the biblical entries. It's the only on-site method I can think of that would allow you to continue to have multiple paths indexed. Combined with unique page Titles, chapter/verse targeted links and social media, it could very well make the difference.
With what, over 1100 chapters, and 31,000 verses, that's a lot of footwork. Then again, it's a labor of love, and every journey is made up of thousands of steps.
-
So you're saying it would not be a good idea to try to get every verse URL listed in Google? Perhaps we could try adding a canonical tag that points to the chapter only? For example, when browsing the site you can't actually navigate to http://lds.org/scriptures/nt/james/1.5?lang=eng. You can only navigate to /james/1?lang=eng. However, the other URLs exist for when someone links externally to a specific chapter and verse. The code on the page will highlight the desired verse. In our example the entire chapter exists on its own URL and the content is unique.
Your suggestion may work if we just canonicalize all those "verse" URLs like /james/1.5?lang=eng and /james/1.5-10?lang=eng to /james/1?lang=eng. Some of the more popular verses with great page authority could actually help prop up the rest of the content on the page.
My concern, though, is that MUCH of the scripture-related traffic comes through queries for the exact chapter/verse reference. So I can see where having individual pages for each passage could be valuable for rankings. But that user experience is poor when someone wants to see a range of passages, like chapter 5, verses 1-4. So we are looking for the best way to get our URLs indexed and ranked as individual passages, or ranges of passages, that are popular on search engines.
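The canonicalization described here could be sketched roughly as follows in Python. The function names `canonical_chapter_url` and `canonical_link_tag` are hypothetical, and the sketch assumes verse URLs always end in `<chapter>.<verse>` or `<chapter>.<verse>-<verse>`, as in the examples in this thread.

```python
import html
import re
from urllib.parse import urlsplit, urlunsplit

# Matches a path ending in <chapter>.<verse> or <chapter>.<verse>-<verse>
_VERSE_SUFFIX = re.compile(r"^(?P<chapter>.*/\d+)\.\d+(?:-\d+)?$")


def canonical_chapter_url(url):
    """Map a verse or verse-range URL to its chapter URL, keeping the query."""
    parts = urlsplit(url)
    match = _VERSE_SUFFIX.match(parts.path)
    path = match["chapter"] if match else parts.path
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, ""))


def canonical_link_tag(url):
    """Render the <link rel="canonical"> tag for the page served at `url`."""
    href = html.escape(canonical_chapter_url(url), quote=True)
    return f'<link rel="canonical" href="{href}">'
```

With this mapping, /james/1.5?lang=eng and /james/1.5-10?lang=eng would both declare /james/1?lang=eng as canonical, while the chapter URL would point to itself.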
I can tell you that this section was not hit by the Panda update. The content is not "thin" as could be the case if we put each verse on a single page.
The ?lang=eng parameter is how we handle language versions. We have the scriptures online in several languages. I'm sure there are better ways to handle that as well. Due to the size of the organization, we're certainly trying to get the low-hanging fruit out of the way first.
-
Dave,
You're facing a difficult challenge: satisfying the needs of SEO or those of user experience. In light of all that Google has done, going back to their May Day update last year and right through the Panda/Farmer update, duplicate content, as well as "thin" content, is more of a concern than ever.
Just having unique titles on each page is not enough. It's the entire weight of uniqueness.
Since you're not intending to go to individual pages for each verse, as long as the same content can be reached by multiple methods, only one method should be designated as the primary, search-engine-preferred method. All others should be blocked from being indexed.
From there, users can choose to explore other methods of finding content as they bookmark your site if they find it of help to their goals.
Unfortunately, this does of course mean that you're going to end up with far fewer pages indexed. However, every page that is indexed will become stronger in its individual rankings, and that in turn will boost all of the pages above it, and the entire site over time.
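One way to block the non-preferred variations is a robots meta tag. The Python sketch below assumes the chapter page is chosen as the preferred version and that verse variations can be recognised from the path alone (without the query string); `is_verse_variant` and `robots_meta` are hypothetical helper names.

```python
import re

# A verse variation ends in /<chapter>.<verse> or /<chapter>.<verse>-<verse>
_VERSE_VARIANT = re.compile(r"/\d+\.\d+(?:-\d+)?$")


def is_verse_variant(path):
    """True when the path names a verse or verse range rather than a chapter."""
    return bool(_VERSE_VARIANT.search(path))


def robots_meta(path):
    """Keep chapter pages indexable; mark verse variations as noindex."""
    content = "noindex, follow" if is_verse_variant(path) else "index, follow"
    return f'<meta name="robots" content="{content}">'
```

Using `follow` on the blocked variations still lets crawlers follow the links on those pages even though the pages themselves stay out of the index.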
And here's another issue - when I go to any of the URLs you posted above, your site automatically tacks on "?lang=eng" using 301 redirects. This means any inbound links you have pointing to the non-appended URLs are not providing maximum value to your site, since they point to pages designated as permanently moved.
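For links you control (internal links, sitemap entries), one option is to write the final URL up front so no redirect is involved. A minimal Python sketch, assuming "eng" is the default language and using a hypothetical helper name `with_default_lang`:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_default_lang(url, lang="eng"):
    """Return the URL with a lang parameter, adding the default if missing."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    if not any(key == "lang" for key, _ in query):
        query.append(("lang", lang))
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(query), parts.fragment)
    )
```

A URL that already carries a `lang` parameter is returned with that parameter unchanged, so other language versions are not rewritten to English.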