The Bible and Duplicate Content
-
We have our complete set of scriptures online, including the Bible, at http://lds.org/scriptures. Users can browse to any of the volumes of scripture. We've improved the user experience by allowing users to link to specific verses in context; the page scrolls to and highlights the linked verse. However, this creates a significant amount of duplicate content. For example, these links:
http://lds.org/scriptures/nt/james/1.5
http://lds.org/scriptures/nt/james/1.5-10
http://lds.org/scriptures/nt/james/1
All of those link to the same chapter in the book of James, yet the first two highlight verse 5 and verses 5-10, respectively. This is a good user experience because, in other sections of our site and on blogs throughout the world, webmasters link to specific verses so the reader can see the verse in the context of the rest of the chapter.
Another Bible site has a separate HTML page for each individual verse and tends to outrank us for long-tail chapter/verse queries because of this (and possibly some other reasons). However, our tests indicated that users prefer the current version.
We have a sitemap ready to publish that includes a URL for every chapter/verse. We hope this will improve indexing of some of the more popular verses. However, Googlebot is going to see some duplicate content as it crawls that sitemap!
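A minimal sketch of what the entries would look like, using the James URLs above:

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <!-- One entry per chapter, plus one per linkable chapter/verse reference -->
      <url>
        <loc>http://lds.org/scriptures/nt/james/1?lang=eng</loc>
      </url>
      <url>
        <loc>http://lds.org/scriptures/nt/james/1.5?lang=eng</loc>
      </url>
      <url>
        <loc>http://lds.org/scriptures/nt/james/1.5-10?lang=eng</loc>
      </url>
    </urlset>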
So the question is: is the sitemap a good idea, given that we can't go back to putting each chapter/verse on its own unique page? We are also going to recommend creating unique titles for each verse and passing a portion of the verse text into the meta description. Will that be enough to satisfy Googlebot that the pages are in fact unique? They certainly are from a user perspective.
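For example, for James 1:5 the head of the page might end up looking something like this (the exact title wording here is just a placeholder):

    <head>
      <!-- Placeholder title keyed to the exact chapter/verse reference -->
      <title>James 1:5 - If any of you lack wisdom</title>
      <!-- Meta description seeded with the verse text itself -->
      <meta name="description"
            content="If any of you lack wisdom, let him ask of God, that giveth to all men liberally, and upbraideth not; and it shall be given him.">
    </head>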
Thanks all for taking the time!
-
Dave,
Thanks for the clarification. You're definitely in a rare circumstance compared to most websites.
In reality, since it's the Bible, there is going to be a duplicate content issue regardless, given how many sites already publish the same content and how many more will in the future. Sites from Eternalministries.org to KingJamesBibleOnline.org to concordance.biblebrowser.com, and so many others, are all offering this content.
If you can find a way to offer your content in a unique way and, within your own site, offer different versions of it (individual verses alongside entire chapters), then ideally, yes, you'd want it all indexed.
The issue, though, is how you do that without adding your own unique text above or below each page's biblical content.
That challenge is why I offered the concept of not indexing variations. Even if you weren't hit by the Panda update, any time Google has to evaluate multiple pages across sites where the content is identical or "mostly" identical, someone's content is going to suffer to one degree or another. And when the conflict is within a single site, some versions are going to be given less ranking value than others.
So, unfortunately, it's not a simple, straightforward situation where avoiding duplication can be guaranteed to provide maximum reach, nor is there a simple way to boost multiple versions so that they're all guaranteed to be found, let alone to show up above "competitor" sites.
This is why I initially offered what are essentially SEO best practices for addressing duplicate content.
If you don't want to lose the traffic that currently comes in through multiple paths, the only other way to bolster what you've already got is to focus on high-quality, long-term link building and social media.
The link building would need to focus on obtaining high-quality links pointing to deep content (specific chapter pages and specific verse pages), where the anchor text in those links varies among chapter- or verse-specific words, broader Bible-related phrases, and the LDS brand.
On the other hand, by implementing canonical tags, you will definitely lose at least some of the visits that currently come in through variation URLs. Will that be compensated for by an equal or greater number of visits to the new "preferred" URL? In this rather unique situation, there's no way to truly know. It is a risk.
Which brings me back to the concept that you'd potentially be better off finding ways to add truly unique content around the biblical entries. It's the only on-site method I can think of that would allow you to continue to have multiple paths indexed. Combined with unique page titles, chapter/verse-targeted links, and social media, it could very well make the difference.
With what, over 1,100 chapters and 31,000 verses? That's a lot of footwork. Then again, it's a labor of love, and every journey is made up of thousands of steps.
-
So you're saying it would not be a good idea to try to get every verse URL listed in Google? Perhaps we could try adding a canonical tag pointing to the chapter only? For example, when browsing the site you can't actually navigate to http://lds.org/scriptures/nt/james/1.5?lang=eng; you can only navigate to /james/1?lang=eng. However, the other URLs exist when someone links externally to a specific chapter and verse, and the code on the page highlights the desired verse. In our example, the entire chapter exists on its own URL and the content is unique.
Your suggestion may work if we just canonicalize all those "verse" URLs, like /james/1.5?lang=eng and /james/1.5-10?lang=eng, to /james/1?lang=eng. Some of the more popular verses with great page authority could actually help prop up the rest of the content on the page.
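If I understand correctly, that would mean each verse-variation URL carries something like this in its head:

    <!-- On /james/1.5?lang=eng, /james/1.5-10?lang=eng, and so on -->
    <link rel="canonical" href="http://lds.org/scriptures/nt/james/1?lang=eng">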
My concern, though, is that MUCH of the scripture-related traffic comes through queries for exact chapter/verse references. So I can see where having individual pages for each passage could be valuable for rankings. But the user experience is poor when someone wants to see a range of passages, like chapter 5, verses 1-4, or similar. So we are looking for the best way to get our URLs indexed and ranked for the individual passages, or ranges of passages, that are popular on search engines.
I can tell you that this section was not hit by the Panda update. The content is not "thin," as it could be if we put each verse on its own page.
The ?lang=eng parameter is how we handle language versions. We have the scriptures online in several languages. I'm sure there are better ways to handle that as well. Due to the size of the organization, we're certainly trying to get the low-hanging fruit out of the way first.
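For example (this is only a sketch, and I'm assuming a hypothetical Spanish version at ?lang=spa), hreflang annotations on each chapter page might be one of those better ways:

    <!-- In the head of the English chapter page -->
    <link rel="alternate" hreflang="en" href="http://lds.org/scriptures/nt/james/1?lang=eng">
    <!-- Hypothetical Spanish equivalent; the actual parameter value may differ -->
    <link rel="alternate" hreflang="es" href="http://lds.org/scriptures/nt/james/1?lang=spa">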
-
Dave,
You're facing a difficult challenge: satisfy the needs of SEO, or the needs of user experience. In light of all that Google has done, going back to its May Day update last year and right through the Panda/Farmer update, duplicate content, as well as "thin" content, is more of a concern than ever.
Just having unique titles on each page is not enough; what matters is the entire weight of uniqueness across each page.
Since you're not intending to go to individual pages for each verse, and as long as you've got multiple URLs leading to the same content, only one should be designated as the primary, search-engine-preferred version. All the others should be blocked from being indexed.
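A minimal sketch of blocking a non-preferred variation, using a robots meta tag in its head:

    <!-- On each variation URL that should stay out of the index -->
    <meta name="robots" content="noindex, follow">

Using "noindex, follow" rather than "noindex, nofollow" keeps the page out of the index while still letting link value flow through it.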
From there, users can still explore the other methods of finding content, and bookmark your site if they find it helpful to their goals.
Unfortunately, this does, of course, mean that you're going to end up with far fewer pages indexed. However, every page that is indexed will become stronger in its individual rankings, and that in turn will boost all of the pages above it, and the entire site, over time.
And here's another issue: when I go to any of the URLs you posted above, your site automatically tacks on "?lang=eng" using a 301 redirect. This means any inbound links pointing to the non-appended URLs are not providing maximum value to your site, since they point to pages designated as permanently moved.
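If the intent is just to serve English by default, one hypothetical alternative is an internal rewrite instead of a visible 301. A rough Apache sketch (assuming mod_rewrite, and that the lang parameter is the only thing being appended):

    # Hypothetical .htaccess sketch: rewrite internally (no R flag),
    # so no 301 is issued and the clean URL serves the English content
    RewriteEngine On
    RewriteCond %{QUERY_STRING} !(^|&)lang=
    RewriteRule ^scriptures/(.*)$ /scriptures/$1?lang=eng [QSA,L]

That said, serving the same content at both the bare and ?lang=eng URLs would itself create a duplicate pair, so it would need to be combined with a canonical tag declaring whichever form is preferred.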