WordPress Duplicate Content Issues
-
Everyone knows that WordPress has some duplicate content issues with tags, archive pages, category pages etc...
My question is, how do you handle these issues?
Is the smart strategy to use robots meta and add no follow/ no index category pages, archive pages tag pages etc?
By doing this are you missing out on the additional internal links to your important pages from you category pages and tag pages?
I hope this makes sense.
Regards,
Bill
-
Hey Bill
I like to start with this standard setup (image/chart from my wordpress post on moz);
Pages, Posts, Categories - Index
Tags, Dated Archives, Subpages, Author Archives - noindex
You can check out the full post - I will be updating the Yoast Screenshots very soon!
-Dan
-
Thanks for article,
Now 2 years ahead, are there any important updates for preventing duplicate content/titles?
-
Most of the Plugins for wordpress use canonical urls.
-
Unless I'm missing something here, wouldn't it be easier to set the canonical tag for the main post? There are also plugins like SEO Ultimate that handle this automatically.
-
I posted this article I wrote the other day for someone asking a similar question.
With the Yoast SEO Plugin I no-index everything except Categories. You can see how I set mine up under section 3. Indexation.
Here is the original question that Sha submitted:
http://www.seomoz.org/q/what-is-with-wordpress-dupe-issues -
Bill-
There are several SEO plugs available for WP that will handle these issues. Yes, you are right that adding "noindex" will be beneficial on tag, category, and archive pages. The idea here is avoiding duplicate content issues. BTW, check out: Yoast SEO for Wordpress.
Here is how the values for the robots meta tag work:
- noindex will keep a page from being crawled
- nofollow will prevent a page's links from being followed
I agree with noindex'ing these pages; though I would argue that a nofollow is still worth leaving out. If these pages have any juice you want to allow this to flow to the other links on the page.
-
The WP on my blog is set up as follows (this is a blog that gets between four and ten short posts per day - about two to four sentences, each post linking to an article or other content on a topic-related website)
Homepage: Full text of the most recent 25 posts are displayed. Pagination pages are not indexed (blocked by robots.txt).
Post Pages: Full text is displayed and the title plus a few words of 20 related posts are displayed.
Category Pages: I have over 100 categories and each post is placed into at least two categories (one by location and one by topic). Some posts go into three or four categoreis - sometimes more. Each category page displays the full text of the most recent 25 posts. Categories do not have pagination pages (blocked by robots.txt).
All of the above pages are fully indexed and a long list of category pages appears in the left-side navigation. I don't use tag pages or archive pages. There is a lot of dupe content in this system but so far I am lucky that it does not cause a problem. The category pages pull a lot of organic search traffic.
In January of each year I delete all of the posts that are over a year old. Before doing that I identify those that are pulling reasonable traffic and either redirect them to a permanent page about same topic, write an article about that topic and redirect, or recycle that post. All the rest are redirected to the homepage of the blog.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Duplicate Content
I am trying to get a handle on how to fix and control a large amount of duplicate content I keep getting on my Moz Reports. The main area where this comes up is for duplicate page content and duplicate title tags ... thousands of them. I partially understand the source of the problem. My site mixes free content with content that requires a login. I think if I were to change my crawl settings to eliminate the login and index the paid content it would lower the quantity of duplicate pages and help me identify the true duplicate pages because a large number of duplicates occur at the site login. Unfortunately, it's not simple in my case because last year I encountered a problem when migrating my archives into a new CMS. The app in the CMS that migrated the data caused a large amount of data truncation Which means that I am piecing together my archives of approximately 5,000 articles. It also means that much of the piecing together process requires me to keep the former app that manages the articles to find where certain articles were truncated and to copy the text that followed the truncation and complete the articles. So far, I have restored about half of the archives which is time-consuming tedious work. My question is if anyone knows a more efficient way of identifying and editing duplicate pages and title tags?
Technical SEO | | Prop650 -
Getting high priority issue for our xxx.com and xxx.com/home as duplicate pages and duplicate page titles can't seem to find anything that needs to be corrected, what might I be missing?
I am getting high priority issue for our xxx.com and xxx.com/home as reporting both duplicate pages and duplicate page titles on crawl results, I can't seem to find anything that needs to be corrected, what am I be missing? Has anyone else had a similar issue, how was it corrected?
Technical SEO | | tgwebmaster0 -
Duplicate content warning for a hierarchy structure?
I have a series of pages on my website organized in a hierarchy, let's simplify it to say parent pages and child pages. Each of the child pages has product listings, and an introduction at the top (along with an image) explaining their importance, why they're grouped together, providing related information, etc.
Technical SEO | | westsaddle
The parent page has a list of all of its child pages and a copy of their introductions next to the child page's title and image thumbnail. Moz is throwing up duplicate content warnings for all of these pages. Is this an actual SEO issue, or is the warning being overzealous?
Each child page has tons of its own content, and each parent page has the introductions from a bunch of child pages, so any single introduction is never the only content on the page. Thanks in advance!0 -
Image centric site and duplicate content issues
We have a site that has very little text, the main purpose of the site is to allow users to find inspiration through images. 1000s of images come to us each week to be processed by our editorial team, so as part of our process we select a subset of the best images and process those with titles, alt text, tags, etc. We still host the other images and users can find them through galleries that link to the process and unprocessed image pages. Due to the lack of information on the unprocessed images, we are having lots of duplicate content issues (The layout of all the image pages are the same, and there isn't any unique text to differentiate the pages. The only changing factor is the image itself in each page) Any suggestions on how to resolve this issue, will be greatly appreciated.
Technical SEO | | wedlinkmedia0 -
Https Duplicate Content
My previous host was using shared SSL, and my site was also working with https which I didn’t notice previously. Now I am moved to a new server, where I don’t have any SSL and my websites are not working with https version. Problem is that I have found Google have indexed one of my blog http://www.codefear.com with https version too. My blog traffic is continuously dropping I think due to these duplicate content. Now there are two results one with http version and another with https version. I searched over the internet and found 3 possible solutions. 1 No-Index https version
Technical SEO | | RaviAhuja
2 Use rel=canonical
3 Redirect https versions with 301 redirection Now I don’t know which solution is best for me as now https version is not working. One more thing I don’t know how to implement any of the solution. My blog is running on WordPress. Please help me to overcome from this problem, and after solving this duplicate issue, do I need Reconsideration request to Google. Thank you0 -
Fix duplicate content caused by tags
Hi everyone, TGIF. We are getting hundreds of duplicate content errors on our WP site by what appears to be our tags. For each tag and each post we are seeing a duplicate content error. I thought I had this fixed but apparently I do not. We are using the Genesis theme with Yoast's SEO plugin. Does anyone have the solution to what I imagine is this easy fix? Thanks in advance.
Technical SEO | | okuma0 -
Duplicate content issue index.html vs non index.html
Hi I have an issue. In my client's profile, I found that the "index.html" are mostly authoritative than non "index.html", and I found that www. version is more authoritative than non www. The problem is that I find the opposite situation where non "index.html" are more authoritative than "index.html" or non www more authoritative than www. My logic would tell me to still redirect the non"index.html" to "index.html". Am I right? and in the case I find the opposite happening, does it matter if I still redirect the non"index.html" to "index.html"? The same question for www vs non www versions? Thank you
Technical SEO | | Ideas-Money-Art0 -
Why are my pages getting duplicate content errors?
Studying the Duplicate Page Content report reveals that all (or many) of my pages are getting flagged as having duplicate content because the crawler thinks there are two versions of the same page: http://www.mapsalive.com/Features/audio.aspx http://www.mapsalive.com/Features/Audio.aspx The only difference is the capitalization. We don't have two versions of the page so I don't understand what I'm missing or how to correct this. Anyone have any thoughts for what to look for?
Technical SEO | | jkenyon0