How to fix duplicate content for homepage and index.html
-
Hello,
I know this probably gets asked quite a lot but I haven't found a recent post about this in 2018 on Moz Q&A, so I thought I would check in and see what the best route/solution for this issue might be. I'm always really worried about making any (potentially bad/wrong) changes to the site, as it's my livelihood, so I'm hoping someone can point me in the right direction.
Moz, SEMRush and several other SEO tools are all reporting that I have duplicate content for my homepage and index.html (same identical page).
According to Moz, my homepage (without index.html) has PA 29 and index.html has PA 15. They are both showing Status 200. I read that you can either do a 301 redirect or add rel=canonical.
I currently have a 301 set up for my http to https page and don't have any rel=canonical added to the site/page. What is the best and safest way to get rid of duplicate content and merge my non-index and index.html homepages together these days? I read that both 301 and canonical pass on link juice but I don't know what the best route for me is given what I said above.
Thank you for reading, any input is greatly appreciated!
-
OK, Paul, I hear what you are saying. It's a very open and obvious diss.
I'm not sure what you are saying makes any difference to the argument that the canonical way here is not the way to go. I was explaining it in the simplest way. I would not want, and I'm sure you would not want either, a page like this (the home page) left live and merely canonicalised.
(It's a given that the canonical works like a 301, passing link juice to the preferred version.)
So thanks but it makes no difference - delete & 301 every time.
Google is heightening its distrust of canonicals - the new Search Console tool reveals which pages are the preferred canonical and it's something of a surprise to SEOs!
If you feel like playing top trumps again then why not PM me? - it's so much better and the uninitiated do not need to see it!
Cheers Nigel
-
A proper canonical tag does a lot more than "just be telling Google not to rank it." When used properly (i.e. on pages that truly do contain the same content), the canonicalised page passes its ranking signals back to the canonical source.
I agree with Kristina - while a 301 would be preferable (it's a directive, while canonical tags are taken as suggestions), a canonical tag would be vastly better than not doing anything about the issue. At least until the dev can get the problem with the 301-redirect properly resolved.
Paul
-
It's best practice to redirect, but if that's not an option, the canonical route should help the problem a lot! You'll probably lose some link equity with this route, but it should clear up duplicate content issues from Google's side.
-
Hi Dre
If you just do a canonical then the page will still be live; you will just be telling Google not to rank it. Best practice is to remove it altogether and 301. It is bad practice to have more than one version of your home page (or any page) live!
Regards Nigel
-
Thank you so much for all the responses. So it sounds like 301 redirect through htaccess is the way to go. What is the difference between using the 301 through htaccess vs using rel=canonical in my case? Does the 301 provide better link juice vs rel=canonical or is canonical just not applicable in this case? Thanks for all the replies and helpful suggestions again!
EDIT: I spoke to my developer (who is hosting and maintaining my site now). He said he tried to do a 301 through htaccess but it seems to be crashing the site (and trust me, he is very good at what he does). Part of the problem is that my site is VERY old (originally built about 10 years ago and NOT updated once since). He has been slowly updating and cleaning up the site, and he will try to figure out why the 301 is crashing the site and not working, but in the meantime how safe is it to use rel=canonical instead of a 301?
Thanks again!
-
Hi dre
Your site really shouldn't be generating an index.html in the first place, but if it is, you must make sure that there is a 301 in the htaccess file sending all traffic to the single homepage URL. As Lynn correctly points out, this will be a permanent redirect.
It is very simple to do. Both versions are treated as separate pages (just as http and https are), so you are essentially showing a duplicate site to Google, and your rankings will be terrible until you change it.
Regards Nigel
-
Hello there,
You can use .htaccess URL rewriting to redirect index.html to the URL without it. Here are the rewrite rules:
RewriteEngine On
RewriteRule ^index\.html$ / [R=301,L]
RewriteRule ^(.*)/index\.html$ /$1/ [R=301,L]
Once you have added these rules, you should also fix all your internal links so that they link to the URL without index.html.
Hope this helps,
Joseph Yap
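A note on the crash mentioned in the EDIT above: one possible cause (an assumption, not something confirmed in this thread) is a redirect loop. If the server's DirectoryIndex is index.html, a request for / is internally mapped to index.html, a plain index.html rule matches that internal request, and the visitor is bounced back to / over and over. A minimal sketch of a loop-safe variant, assuming Apache with mod_rewrite enabled and the .htaccess file sitting in the document root, only redirects when the browser literally asked for index.html:

```apache
RewriteEngine On

# THE_REQUEST holds the original request line sent by the client,
# e.g. "GET /index.html HTTP/1.1". Internal DirectoryIndex lookups
# for "/" do not change it, so they cannot trigger this rule.
RewriteCond %{THE_REQUEST} ^[A-Z]+\s/([^/]+/)*index\.html[?\s]

# Redirect /index.html to / and /folder/index.html to /folder/
RewriteRule ^(([^/]+/)*)index\.html$ /$1 [R=301,L]
```

It may be worth trying this with R=302 first and switching to R=301 once it behaves as expected, since browsers cache permanent redirects aggressively.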
-
"I currently have a 301 setup for my http to https page" - great! Also, you should check whether your inner pages are redirecting from their HTTP versions to HTTPS too.
index.html should redirect to the main version of the homepage with a 301 Permanent Redirect.
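If the inner pages turn out not to redirect, a site-wide rule is one common way to cover them. This is only a sketch, assuming Apache with mod_rewrite and a server that terminates HTTPS itself. Behind a proxy or load balancer the HTTPS variable may always read "off", so the condition would need adapting there.

```apache
RewriteEngine On

# Only act on requests that arrived over plain HTTP
RewriteCond %{HTTPS} off

# Send every path to the same host and path on HTTPS
RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
```

Placing this above any index.html rule should mean an old http://.../index.html link reaches the final URL in at most two hops.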
-
Google treats HTTP and HTTPS as two separate protocols. Since the content is the same on both versions, Googlebot considers it duplicate content. Adding a canonical URL will solve this problem. If you have any doubts, feel free to ask.