Duplicate Content for index.html
-
In the Crawl Diagnostics Summary, it says that I have two pages with duplicate content which are:
I read in a Dream Weaver tutorial that you should name your home page "index.html" and then you can let www.mywebsite.com automatically direct the user to index.html. Is this a bug in SEOMoz's crawler or is it a real problem with my site?
Thank you,
Dan
-
The code should definitely go into the websites root directory's .htaccess, however .htaccess can be weird, a few days ago I ran into a similar issue with a client's website, and I was able to remedy the issue with a variation of the code.
index Redirect RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)index.(php|html|htm|asp)\ HTTP/ RewriteRule ^(([^/]+/))index.(php|html|htm|asp)$ http://yoursite.com/$1 [R=301,L]
If you give me the URL for the site I will take a look at it and let you know what would be feasible.
-
Hi Daniel, can you share with us the URL of your site? We can take a look at it and give you a more precise answer that way. Thanks!
-
I eventually figured out that your method was a 301 redirect and I definitely broke my site trying to use the code you posted. .. haha. Its ok though. I just removed the code and it went back to normal. At first, I was editing the .htaccess file in the public_html folder which wasnt working. Then I tried the root folder for the site (I created the .htaccess file since it did not exist.) Neither of those worked. (I am using Bluehost so I do not think that I have root access and I am not sure if it is a Linux server or not.)
If there is an easy way to explain what I am doing wrong, please do so. Otherwise, I will use canonical.
Thanks for everything!
-
@Dan
Thanks for your reply. It seems like there are lots of different ways to solve this problem. I just watched this video on Matt Cutt's blog where he discusses his preference for 301 redirects over rel canonical tag.
Where would you say your solution fits in?
sorry about the delay of this response, i didn't realize the that you were asking me a question right away. When placing the code I provided in my previous answer this will cause a 301 perminant redirect to the original URL. That's actually what the
[R=301,L]
portion of the code is stating (R) redirect (301) status is referring to. After reviewing the Matt Cutts video, I realize that I should have asked you if you were operating on a Linux server that you had root access to. We actually utilize both redirects and canonical tags since it was recommended by the on-page optimization reports. Heck Google uses them, I would assume because it's easier for the user to be referred to a single page URL. Obviously though if you don't have server header access, and are not familiar with .htaccess (you can accidentally break your site) then the canonical solution is appropriate
-
Josh,
Thanks for your reply. It seems like there are lots of different ways to solve this problem. I just watched this video on Matt Cutt's blog where he discusses his preference for 301 redirects over rel canonical tag.
Where would you say your solution fits in?
Thanks,
Dan -
use the link rel tag for all my homepages for the http://www.yoursite.com
-
Odd enough I just recently answered this question. The SEOmoz crawler is correct, because without a redirect you will be able to access both versions of the page in your browser.
To resolve this issue simply rewrite the index.html to the root url by placing the following code into your .htaccess file into your root directory.
Options +FollowSymlinks RewriteEngine on
Index Rewrite RewriteRule ^index.(htm|html|php) http://www.yoursite.com/ [R=301,L] RewriteRule ^(.*)/index.(htm|html|php) http://www.yoursite.com/$1/ [R=301,L]
You can also do the same with the index file in any subdirectories that you might create, by simply placing a .htaccess into those sub directories and using variations of the above code. This is how you create nice tight URLs without the duplicate content issue that look like - http://www.semclix.com/design/business/
-
It is a problem which you need to fix. You need to canonicalize your pages.
Those are all various URLs which most likely lead to the same web page. I say "most likely" because these URLs can actually lead to different pages.
You need to tell crawlers and search engines how you organize your site. There are several ways to achieve canonicalization. The method I prefer is to add the following line of code to each page:
The URL provided should be the preferred URL for your page.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Multiple sites using same text - how to avoid Google duplicate content penalty?
Hi Mozers, my client located in Colorado is opening a similar (but not identical) clinic in California. Will Google penalize the new California site if we use text from our website that features his Colorado office? He runs the clinic in CO and will be a partner of the clinic in CA, so the CA clinic has his "permission" to use his original text. Eventually he hopes to go national, with multiple sites utilizing essentially the same text. Will Google penalize the new CA site for plagiarism and/or duplicate content? Or is there a way to tell Google, "hey Google, this new clinic is not ripping off my text"?
Web Design | | CalamityJane770 -
How to prevent development website subdomain from being indexed?
Hello awesome MOZ Community! Our development team uses a sub-domain "dev.example.com" for our SEO clients' websites. This allows changes to be made to the dev site (U/X changes, forms testing, etc.) for client approval and testing. An embarrassing discovery was made. Naturally, when you run a "site:example.com" the "dev.example.com" is being indexed. We don't want our clients websites to get penalized or lose killer SERPs because of duplicate content. The solution that is being implemented is to edit the robots.txt file and block the dev site from being indexed by search engines. My questions is, does anyone in the MOZ Community disagree with this solution? Can you recommend another solution? Would you advise against using the sub-domain "dev." for live and ongoing development websites? Thanks!
Web Design | | SproutDigital0 -
Should Blog Category Archive URLs be Set to "No-Index" in Wordpress?
It appears that Google Webmaster Tools is listing about 120 blog archives URLs in Google Index>Index Status that should not be listed. Our site map contains 650 pages, but Google shows 860. Pages like: <colgroup><col width="464"></colgroup>
Web Design | | Kingalan1
| http://www.nyc-officespace-leader.com/blog/category/manhattan-office-space | With Titles Like: <colgroup><col width="454"></colgroup>
| Manhattan Office Space Archives - Metro Manhattan Office Space | Are listed when in the Rogerbot crawl report for the site. How can we remove such pages from Google Webmaster Tools, Index Status? Our site map shows about 650 pages, yet Google show these extra pages. We would prefer that they not be indexed. Note that these pages do not appear when we run a site:www.nyc-officespace-leader.com search. The site has suffered a drop in ranking since May and we feel it prudent to keep Google from indexing useless URLs. Before May 650 pages showed on the Webmaster Tools Index status, and suddenly in early June when we upgraded the site the index grew by about 175 pages. I suspect the 120 blog archives URLs may have something to do with it. How can we get them removed? Can we set them to "No-Index", or should the robot text be used to remove them? Or can some type of removal request be made to Google? My developers have been struggling with this issue since early June. The bloat on the site is about 175 URLs not on the site map. Is there any go to authority on this issue (it is apparently rather complicated) that can provide a definitive answer? Thanks!!
Alan0 -
/index.php/ What is its purpose and does it hurt SEO?
Hello Moz Forum, I am still in the process of cleaning up the lack of attention to detail and betrayal set by our soon to be ex-SEO company. You can see a previous question I ask regarding betrayal SEO. I am analyzing every page on our website and i am noticing this /index.php/ in most of our URLs. We want to leave our expression engine cms and convert to wordpress. I have been reading about index.php but most of it is over my head for now. What does concern me is the "layman's" findings i am seeing through analytics. Our main domain has two URLs. one that ends in .com and the other ends in .com/index.php/ The one that ends in .com has a higher page rank than the ladder. And there are other internal pages with the same two variations. Can someone please explain to me what is /index.php/ ? what are the benefits of it? what are the cons? What will happen to my site once we move to wordpress? As always, your comments and suggestions are greatly appreciated.
Web Design | | CamiloSC0 -
Joomla! Site Returning 12000+ Duplicate Content Errors! W Image
(I do award "Good Answer" and "thumbs up" to responses as earned) I have tried to ask this question previously (maybe not correctly). I have a client that I am doing the on and offsite optimization and the MOZ report is kicking back major errors. I have examples below. They all seem to relate directly to rokecwid and ECWID. Is there ANY solution to fix this? Is this hurting the rankings Since I didn't build the site, I am having to tell the website company what to do when I need changes made to code, etc... I am also not very proficient with Joomla! and my web engineer is one of those closet coders (the best kind to have) and doesn't communicate in a way that a "layman" could understand. He pointed out several issues with the HTML but I don't think that is related to this below. Can anyone tell me what to tell the web company that built this site to get rid of these errors? A very small sample of the urls w errors:
Web Design | | Atlanta-SMO
http://www.metroboltmi.com/shop-spareparts?
Itemid=218&option=com_rokecwid&view=ecwid&ecwid_category_id=3560097
1 14 1 http://www.metroboltmi.com/shop-spareparts?
Itemid=218&option=com_rokecwid&view=ecwid&ecwid_category_id=3560098
1 1 0 http://www.metroboltmi.com/shop-spareparts?
Itemid=218&option=com_rokecwid&view=ecwid&ecwid_category_id=3560099
1 14 1 http://www.metroboltmi.com/shop-spareparts?
Itemid=218&option=com_rokecwid&view=ecwid&ecwid_category_id=3560100
1 14 1 SEOMOZErrors_zps3a1ce2a2.png0 -
Converting to WP - Should I add .html or 301?
Moving my site to WP and the old url structure pages end in ".html". I have seen there are plugins that allow you to add .html to the WP pages to preserve links. I am hosting on Synthesis and they do not support htaccess, although you can submit 301 re-directs through the help ticket system. My question is what is the best way to proceed? I have read that 301s "leak" some link juice, but I sure do like those pretty urls. Advice appreciated!
Web Design | | Chris6611 -
How to provide product information without duplicate content?
Hi all, I have an ecommerce website with approx 400 products, these are quite technical products and i use to have helpful information about the products on the product pages... My SEO company told me to remove all this, as i had lots of duplicate content issues... I have since had content writers re-write all product descriptions (about 250 words per product)... and now i am trying to figure out a way of getting the "helpful" information back on but in some kind of dynamic way... There is basically about 5 or 6 blocks of information, that can be added to each product page, these overlap hundreds of products. i was thinking of perhaps creating a separate static page for each block of useful information, and putting links on the product pages to this... however, ideally i would prefer to not keep sending customers to other pages... so wanted to see if others had come across similar issues themselves and how they went about having this "content" available to the user but in such a way it was not duplicate content... Please note using images would not be any good here, as the content varies in size but most of it is text based... regards James
Web Design | | isntworkdull1 -
Duplicate content on mobile sites
Hi Guys We are launching a mobile webshop later this year and have decided to use a subdomain for this. (m.domainname.xx). The content will be more or less identical with the one on the standard desktop site (domainname.xx), but im struggeling to find out if this will create dipplicate content between the mobile and desktop site. Does anyone have a solid answer for this one?
Web Design | | AndersDK0