Duplicate Content for index.html
-
In the Crawl Diagnostics Summary, it says that I have two pages with duplicate content which are:
I read in a Dream Weaver tutorial that you should name your home page "index.html" and then you can let www.mywebsite.com automatically direct the user to index.html. Is this a bug in SEOMoz's crawler or is it a real problem with my site?
Thank you,
Dan
-
The code should definitely go into the websites root directory's .htaccess, however .htaccess can be weird, a few days ago I ran into a similar issue with a client's website, and I was able to remedy the issue with a variation of the code.
index Redirect RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)index.(php|html|htm|asp)\ HTTP/ RewriteRule ^(([^/]+/))index.(php|html|htm|asp)$ http://yoursite.com/$1 [R=301,L]
If you give me the URL for the site I will take a look at it and let you know what would be feasible.
-
Hi Daniel, can you share with us the URL of your site? We can take a look at it and give you a more precise answer that way. Thanks!
-
I eventually figured out that your method was a 301 redirect and I definitely broke my site trying to use the code you posted. .. haha. Its ok though. I just removed the code and it went back to normal. At first, I was editing the .htaccess file in the public_html folder which wasnt working. Then I tried the root folder for the site (I created the .htaccess file since it did not exist.) Neither of those worked. (I am using Bluehost so I do not think that I have root access and I am not sure if it is a Linux server or not.)
If there is an easy way to explain what I am doing wrong, please do so. Otherwise, I will use canonical.
Thanks for everything!
-
@Dan
Thanks for your reply. It seems like there are lots of different ways to solve this problem. I just watched this video on Matt Cutt's blog where he discusses his preference for 301 redirects over rel canonical tag.
Where would you say your solution fits in?
sorry about the delay of this response, i didn't realize the that you were asking me a question right away. When placing the code I provided in my previous answer this will cause a 301 perminant redirect to the original URL. That's actually what the
[R=301,L]
portion of the code is stating (R) redirect (301) status is referring to. After reviewing the Matt Cutts video, I realize that I should have asked you if you were operating on a Linux server that you had root access to. We actually utilize both redirects and canonical tags since it was recommended by the on-page optimization reports. Heck Google uses them, I would assume because it's easier for the user to be referred to a single page URL. Obviously though if you don't have server header access, and are not familiar with .htaccess (you can accidentally break your site) then the canonical solution is appropriate
-
Josh,
Thanks for your reply. It seems like there are lots of different ways to solve this problem. I just watched this video on Matt Cutt's blog where he discusses his preference for 301 redirects over rel canonical tag.
Where would you say your solution fits in?
Thanks,
Dan -
use the link rel tag for all my homepages for the http://www.yoursite.com
-
Odd enough I just recently answered this question. The SEOmoz crawler is correct, because without a redirect you will be able to access both versions of the page in your browser.
To resolve this issue simply rewrite the index.html to the root url by placing the following code into your .htaccess file into your root directory.
Options +FollowSymlinks RewriteEngine on
Index Rewrite RewriteRule ^index.(htm|html|php) http://www.yoursite.com/ [R=301,L] RewriteRule ^(.*)/index.(htm|html|php) http://www.yoursite.com/$1/ [R=301,L]
You can also do the same with the index file in any subdirectories that you might create, by simply placing a .htaccess into those sub directories and using variations of the above code. This is how you create nice tight URLs without the duplicate content issue that look like - http://www.semclix.com/design/business/
-
It is a problem which you need to fix. You need to canonicalize your pages.
Those are all various URLs which most likely lead to the same web page. I say "most likely" because these URLs can actually lead to different pages.
You need to tell crawlers and search engines how you organize your site. There are several ways to achieve canonicalization. The method I prefer is to add the following line of code to each page:
The URL provided should be the preferred URL for your page.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
How would a redesign, content update and URL change affect ranking?
Hi guys, I have a question that I suspect there is no simple true or false answer to, but perhaps someone has done the same thing as we're pondering wether or not to do? We're taking over an existing site that ranks very well on all the important keywords and is obviously very well liked by Google. The site is today hosted on a sub-domain (xxx.domain.com). When taking over, we'll have to redesign the site and recreate most of the content on the site (unique). The site structure, URLs, incoming links etc. will remain exactly the same. Since we are recreating the site, we also have the opportunity to move the site off the sub-domain and on to the main domain (domain.com/xxx - 85/100 Moz rank) and do a 301 Permanent Redirect on all old URLs. Our long-time experience is that content on the main domain, ranks way better than the sub-domain. The big question is wether or not Google will punish us for both changing the content and the location of the site at the same time? Cheers!
Web Design | | mattbs
Matt0 -
Best Practices for Leveraging Long Tail Content & Gated Content
Our B2B site has a lot of of long form content (e.g., transcriptions from presentations and webinars). We'd like to leverage the long tail SEO traffic driven to these pages and convert those visitors to leads. Essentially, we'd like Google to index all this lengthy, keyword-rich content AND we'd like to put up a read gate that requires users to register before viewing the full article. This is a B2B site, and the goal is to generate leads. Some considerations and questions: How much of the content to share before requiring registration? Ask too soon and it's a terrible user experience, give too much away and our business objectives are not met. Design-wise, what are good ways to do this? I notice Moz uses a "teaser" to block Mozinar content, and I've seen modals and blur bars on other sites. Any gotchas that Google doesn't like that we should be aware of? Trying to avoid anything that might seem like cloaking. Is it better to split the content across several pages (split a 10K word doc across 10 URLs and include a read gate on each) or keep to one page? Thank you!
Web Design | | Allie_Williams0 -
How to deal with 100s of Wordpress media link pages, containing images, but zero content
I have a Wordpress website with well over 1000 posts. I had a SEO audit done and it was highlighted that every post had clickable images. If you click the image a new webpage opens containing nothing but the image. I was told these image pages with zero content are very bad for SEO and that I should get them removed. I have contacted several Wordpress specialists on People Per Hour. I have basically been offered two solutions. 1 - redirect all these image pages to a 404, so they are not found by Google 2 - redirect each image page to the main post page the image is from. What's my best option here? Is there a better option? I don't care if these pages remain, providing they are not crawled by Google and classified as spam etc. All suggestions greatly received!
Web Design | | xpers0 -
Google tag manager on blocked beta site - will it phone home to Google and cause site to get indexed?
We want to develop a beta site, in a directory with the robots.txt blocking bots. We want to include the Google Tag Manager tags and event layer tracking code on this beta site. My question is that by including the Google Tag Manager code, that phones home to Google, will it cause Google to index this beta site when we don't want it indexed?
Web Design | | CFSSEO0 -
Google text-only vs rendered (index and ranking)
Hello, can someone please help answer a question about missing elements from Google's text-only cached version.
Web Design | | cpawsgo
When using JavaScript to display an element which is initially styled with display:none, does Google index (and most importantly properly rank) the elements contents? Using Google's "cache:" prefix followed by our pages url we can see the rendered cached page. The contents of the element in question are viewable and you can read the information inside. However, if you click the "Text-only version" link on the top-right of Google’s cached page, the element is missing and cannot be seen. The reason for this is because the element is initially styled with display:none and then JavaScript is used to display the text once some logic is applied. Doing a long-tail Google search for a few sentences from inside the element does find the page in the results, but I am not certain that is it being cached and ranked optimally... would updating the logic so that all the contents are not made visible by JavaScript improve our ranking or can we assume that since Google does return the page in its results that everything is proper? Thank you!0 -
Why is Google Webmaster suddenly started showing hundreds of HTML Improvements
Why is Google Webmaster suddenly started showing hundreds of HTML Improvements I mean to ask, my hundreds pages are been shown as duplicate - despite canonical marked correctly Below are sample url - which are been crawled in own way. I have rechecked canonical tag - which is correct as URL - 1, in all 3 url Do i need to worry about anything or shall i presume its a flaw from search engine to report this as an issue (This only pertain to Forum section) http://www.mycarhelpline.com/index.php?option=com_easydiscuss&view=post&id=1683&Itemid=78 http://www.mycarhelpline.com/?id=1683&Itemid=78&option=com_easydiscuss&view=post http://www.mycarhelpline.com/index.php?option=com_easydiscuss&view=post&id=1683 ps - i know these are dynamic url and not sef friendly url, but its been 3 yrs and , due to our ignorance and site builder took advantage of this. now - nothing can be done much to make them sef friendly as site has several thousand pages and touchwood - these dynamic url are not impacting much
Web Design | | Modi0 -
How to handle International Duplicated Content?
Hi, We have multiple international E-Commerce websites. Usually our content is translated and doesn't interfere with each other, but how do search engines react to duplicate content on different TLDs? We have copied our Dutch (NL) store for Belgium (BE) and i'm wondering if we could be inflicting damage onto ourselves... Should I use: for every page? are there other options so we can be sure that our websites aren't conflicting? Are they conflicting at all? Alex
Web Design | | WebmasterAlex0 -
Penalized by duplicate content?
Hello, I am in a very weird position. I am managing a website(EMD) which a part of it dynamically creates pages. The former webmaster who create this system though that this would help with SEO but I dought! The thing is that now the site has about 1500 pages which must look duplicate but are they really duplicate? Each page has a unique URL but the content is pretty much the same: one image and a different title with 5-8 words. There is more: All these pages are not accessible by the users but only for the crawlers!!! This URL machine is a part of a php - made photo gallery which i never understood the sense of it! The site overall is not performing very well in SERP, especially after Penguin, but judging by the link profile, the Domain authority, construction (ok besides that crazy photo gallery) and content, it never reached the position it should have in the past. The majority of these mysterious pages - and mostly their images - are cached by Google and some of them are in top places to some SERP - the ones that match the small title on page - but the numbers are poor, 10 - 15 clicks per month. Are these pages considered as duplicated, although they are cached, and is it safe for the site just to remove 1500 at once? The seomoz tools have pointed some of them as dups but the majority not! Can these pages impact the image of the whole site in search engines?( drop in Google and has disappeared from Yahoo and Bing!) Do I also have to tell Google about the removal? I have not seen anything like it before so any comment would be helpful! Thank you!
Web Design | | Tz_Seo0