The Bible and Duplicate Content
-
We have our complete set of scriptures online, including the Bible, at http://lds.org/scriptures. Users can browse to any of the volumes of scriptures. We've improved the user experience by allowing users to link to specific verses in context; the page scrolls to and highlights the linked verse. However, this creates a significant amount of duplicate content. For example, take these links:
http://lds.org/scriptures/nt/james/1.5
http://lds.org/scriptures/nt/james/1.5-10
http://lds.org/scriptures/nt/james/1
All of those link to the same chapter in the book of James, yet the first two highlight verse 5 and verses 5-10, respectively. This is a good user experience because, in other sections of our site and on blogs throughout the world, webmasters link to specific verses so the reader can see the verse in the context of the rest of the chapter.
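To be concrete, the verse portion of these URLs follows a simple pattern: chapter, then an optional verse or verse range after a dot. A simplified sketch of how such a reference can be parsed (illustrative only, not our production code):

```python
import re

def parse_reference(segment):
    """Parse '1', '1.5', or '1.5-10' into (chapter, first_verse, last_verse).

    The verse fields are None when the URL points at a whole chapter.
    Sketch of the URL pattern described above, not the actual site code.
    """
    match = re.fullmatch(r"(\d+)(?:\.(\d+)(?:-(\d+))?)?", segment)
    if not match:
        raise ValueError(f"Unrecognized reference: {segment!r}")
    chapter = int(match.group(1))
    first = int(match.group(2)) if match.group(2) else None
    last = int(match.group(3)) if match.group(3) else first
    return chapter, first, last

print(parse_reference("1"))       # (1, None, None) -> whole chapter
print(parse_reference("1.5"))     # (1, 5, 5)       -> highlight verse 5
print(parse_reference("1.5-10"))  # (1, 5, 10)      -> highlight verses 5-10
```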
Another Bible site has a separate HTML page for each individual verse and tends to outrank us for long-tail chapter/verse queries because of this (and possibly some other reasons). However, our tests indicated that users prefer our current version.
We have a sitemap ready to publish which includes a URL for every chapter/verse. We hope this will improve indexing of some of the more popular verses. However, Googlebot is going to see some duplicate content as it crawls that sitemap!
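Conceptually, the sitemap just enumerates one URL per chapter plus one per verse. A sketch (the book/chapter data here is a stub for illustration; the real sitemap would be generated from our scripture database):

```python
BASE = "http://lds.org/scriptures/nt"

# Stub data for illustration: book -> {chapter: verse count}.
BOOKS = {"james": {1: 27}}

def sitemap_locs():
    for book, chapters in BOOKS.items():
        for chapter, verse_count in chapters.items():
            yield f"{BASE}/{book}/{chapter}"  # whole-chapter URL
            for verse in range(1, verse_count + 1):
                yield f"{BASE}/{book}/{chapter}.{verse}"  # single-verse URL

print('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">')
for loc in sitemap_locs():
    print(f"  <url><loc>{loc}</loc></url>")
print("</urlset>")
```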
So the question is: is the sitemap a good idea, given that we can't go back to putting each chapter/verse on its own unique page? We are also going to recommend creating a unique title for each verse and passing a portion of the verse text into the meta description, as sketched below. Will that perhaps be enough to satisfy Googlebot that the pages are in fact unique? They certainly are from a user's perspective.
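Roughly like this (the title format and the 155-character truncation are illustrative choices, not final):

```python
import html

def head_tags(book, chapter, verse, verse_text, max_desc=155):
    # Unique title per verse, e.g. "James 1:5 - The Holy Bible".
    title = f"{book} {chapter}:{verse} - The Holy Bible"
    # Meta description drawn from the verse text itself, truncated at a
    # word boundary to a typical snippet length (155 chars is an assumption).
    desc = verse_text
    if len(desc) > max_desc:
        desc = desc[:max_desc].rsplit(" ", 1)[0] + "..."
    return (f"<title>{html.escape(title)}</title>\n"
            f'<meta name="description" content="{html.escape(desc)}">')

print(head_tags("James", 1, 5,
                "If any of you lack wisdom, let him ask of God, that giveth "
                "to all men liberally, and upbraideth not; and it shall be "
                "given him."))
```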
Thanks all for taking the time!
-
Dave,
Thanks for the clarification. You're definitely in a rare circumstance compared to most websites.
In reality, since it's the Bible, there is going to be a duplicate content issue regardless, given how many sites publish the same content now and how many more will most likely publish it in the future. Eternalministries.org, KingJamesBibleOnline.org, concordance.biblebrowser.com, and so many other sites are all offering this content.
If you can find a way to offer your content in a unique way and, within your own site, offer different versions of it (individual verses as well as entire chapters), then ideally, yes, you'd want it all indexed.
How you do that without adding your own unique text above or below each page's biblical content is the issue, though.
Given this challenge, I offered the concept of not indexing variations. Even if you weren't hit by the Panda update, any time Google has to evaluate multiple pages across sites where the content is identical or "mostly" identical, someone's content is going to suffer to one degree or another. And any time that conflict exists within a single site, some versions are going to be given less ranking value than others.
So unfortunately it's not a simple, straightforward situation where avoiding duplication can be guaranteed to provide maximum reach, nor is there a simple way to boost multiple versions so that they're all guaranteed to be found, let alone to show up above "competitor" sites.
This is why I initially offered what are essentially SEO best practices for addressing duplicate content.
If you don't want to lose the traffic that currently comes in by multiple means, the only other way to bolster what you've already got is to focus on high-quality, long-term link building and social media.
The link building would need to focus on obtaining high-quality links pointing to deep content (specific chapter pages and specific verse pages), where the anchor text used in those links varies between chapter- or verse-specific words, broader Bible-related phrases, and the LDS brand.
On the other hand, by implementing canonical tags, you will definitely lose at least some of the visits that currently come in via variation URLs. Will that be compensated for by an equal or greater number of visits to the new "preferred" URL? In this rather unique situation there's no way to truly know. It is a risk.
Which brings me back to the concept that you'd potentially be better off finding ways to add truly unique content around the biblical entries. It's the only on-site method I can think of that would allow you to continue to have multiple paths indexed. Combined with unique page titles, chapter/verse-targeted links, and social media, it could very well make the difference.
With, what, over 1,100 chapters and 31,000 verses, that's a lot of footwork. Then again, it's a labor of love, and every journey is made up of thousands of steps.
-
So you're saying it would not be a good idea to try and get every verse URL listed in Google? Perhaps we could try adding a canonical tag pointing to the chapter only? For example, browsing the site, you can't actually navigate to http://lds.org/scriptures/nt/james/1.5?lang=eng; you can only navigate to /james/1?lang=eng. However, the other URLs exist for when someone links externally to a specific chapter and verse, and the code on the page highlights the desired verse. In our example, the entire chapter exists on its own URL and the content is unique.
Your suggestion may work if we just canonicalize all those "verse" URLs like /james/1.5?lang=eng and /james/1.5-10?lang=eng to /james/1?lang=eng. Some of the more popular verses with great page authority could actually help prop up the rest of the content on the page.
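Concretely, every verse-variant URL would emit a canonical link pointing at its chapter URL; something along these lines (the helper name and server-side placement are illustrative):

```python
from urllib.parse import urlsplit

def canonical_tag(request_url):
    """Map a verse-variant path like /scriptures/nt/james/1.5-10 to its
    chapter URL and return the <link> tag to emit in <head>. Sketch only."""
    path = urlsplit(request_url).path
    head, _, last = path.rpartition("/")
    chapter = last.split(".")[0]  # drop any ".verse" or ".start-end" suffix
    return f'<link rel="canonical" href="http://lds.org{head}/{chapter}?lang=eng">'

print(canonical_tag("/scriptures/nt/james/1.5-10?lang=eng"))
# <link rel="canonical" href="http://lds.org/scriptures/nt/james/1?lang=eng">
```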
My concern, though, is that much of the scripture-related traffic comes through queries for an exact chapter/verse reference. So I can see where having individual pages for each passage could be valuable for rankings. But that user experience is poor when someone wants to see a range of passages, like chapter 5, verses 1-4. So we are looking for the best way to get our URLs indexed and ranked for individual passages, or for ranges of passages that are popular on search engines.
I can tell you that this section was not hit by the Panda update. The content is not "thin" as could be the case if we put each verse on a single page.
The ?lang=eng parameter is how we handle language versions; we have the scriptures online in several languages. I'm sure there are better ways to handle that as well. Due to the size of the organization, we're certainly trying to get the low-hanging fruit out of the way first.
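For what it's worth, one generic approach for the language versions would be hreflang alternates tying the translations of each chapter together; sketched below with a hypothetical subset of languages (the language-code mapping is an assumption for illustration):

```python
# Hypothetical subset of the site's language codes mapped to ISO 639-1
# values for hreflang; illustration only.
ISO = {"eng": "en", "spa": "es", "por": "pt"}

def hreflang_tags(base_path):
    return "\n".join(
        f'<link rel="alternate" hreflang="{iso}" '
        f'href="http://lds.org{base_path}?lang={lang}">'
        for lang, iso in ISO.items()
    )

print(hreflang_tags("/scriptures/nt/james/1"))
```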
-
Dave,
You're facing a difficult challenge: satisfying the needs of SEO versus user experience. In light of all that Google has done, going back to their May Day update last year and right through the Panda/Farmer update, duplicate content, as well as "thin" content, is more of a concern than ever.
Just having unique titles on each page is not enough; it's the entire weight of uniqueness that counts.
Since you're not intending to go to individual pages for each verse: wherever multiple methods lead to the same content, only one method should be designated as the primary, search-engine-preferred method. All others should be blocked from being indexed.
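In practice, "blocked from being indexed" would typically mean a robots meta tag on the non-preferred variants; a minimal sketch (how you decide which method is preferred is up to you):

```python
def robots_tag(is_preferred):
    """Robots meta tag for a page: non-preferred variants stay crawlable
    (follow) so link value still flows, but are kept out of the index
    (noindex). Sketch only."""
    content = "index, follow" if is_preferred else "noindex, follow"
    return f'<meta name="robots" content="{content}">'

print(robots_tag(True))   # e.g. on the preferred chapter URL
print(robots_tag(False))  # on every non-preferred variation
```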
From there, users can still explore the other methods of finding content, and bookmark your site if they find it helpful to their goals.
Unfortunately, this does, of course, mean that you're going to end up with far fewer pages indexed. However, every page that is indexed will become stronger in its individual rankings, and that in turn will boost all of the pages above it, and the entire site, over time.
And here's another issue: when I go to any of the URLs you posted above, your site automatically tacks on "?lang=eng" via a 301 redirect. This means any inbound links pointing to the non-appended URLs are not providing maximum value to your site, since they point to pages designated as permanently moved.
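You can verify that behavior with a quick check; a sketch using Python's requests library (the expected values reflect what I'm describing above):

```python
import requests

resp = requests.get("http://lds.org/scriptures/nt/james/1",
                    allow_redirects=False, timeout=10)
print(resp.status_code)              # expected: 301
print(resp.headers.get("Location"))  # expected: .../james/1?lang=eng
```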