Duplicate Content for index.html
-
In the Crawl Diagnostics Summary, it says that I have two pages with duplicate content which are:
I read in a Dreamweaver tutorial that you should name your home page "index.html" so that www.mywebsite.com automatically directs visitors to index.html. Is this a bug in SEOmoz's crawler, or is it a real problem with my site?
Thank you,
Dan
-
The code should definitely go into the .htaccess file in your website's root directory. However, .htaccess can be finicky: a few days ago I ran into a similar issue on a client's website, and I was able to fix it with a variation of the code.
RewriteEngine On
# index Redirect
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.(php|html|htm|asp)\ HTTP/
RewriteRule ^(([^/]+/)*)index\.(php|html|htm|asp)$ http://yoursite.com/$1 [R=301,L]
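To see which requests a rule pair like this actually redirects, here is a rough Python sketch that mirrors its two regular expressions, assuming the directory group is written `([^/]+/)*` so it matches index files at any depth. It only imitates the pattern matching, not how Apache evaluates rules internally; `redirect_target` is a hypothetical helper name and yoursite.com is the placeholder domain from the snippet. The condition checks THE_REQUEST, the raw request line the browser sent, which is the usual way to avoid a redirect loop when Apache's DirectoryIndex internally serves index.html for a request to /.

```python
import re
from typing import Optional

# Mirrors the RewriteCond: match only when the *client* asked for index.*,
# as seen in the raw request line, e.g. "GET /blog/index.html HTTP/1.1".
REQUEST_LINE = re.compile(r"^[A-Z]{3,9} /([^/]+/)*index\.(php|html|htm|asp) HTTP/")

# Mirrors the RewriteRule pattern. In a .htaccess file Apache matches it
# against the path with the leading slash removed, e.g. "blog/index.html".
RULE = re.compile(r"^(([^/]+/)*)index\.(php|html|htm|asp)$")

SITE = "http://yoursite.com/"  # placeholder domain from the snippet


def redirect_target(request_line: str, path: str) -> Optional[str]:
    """Return the 301 Location the rule would send, or None if no redirect."""
    if not REQUEST_LINE.match(request_line):
        # e.g. "GET / HTTP/1.1" that DirectoryIndex maps to index.html internally
        return None
    m = RULE.match(path)
    if m is None:
        return None
    return SITE + m.group(1)  # $1 = directory part, including its trailing slash
```

For example, `redirect_target("GET /blog/index.php HTTP/1.1", "blog/index.php")` gives `http://yoursite.com/blog/`, while a plain request for `/` is left alone even though Apache serves index.html for it.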
If you give me the URL for the site, I will take a look at it and let you know what would be feasible.
-
Hi Daniel, can you share with us the URL of your site? We can take a look at it and give you a more precise answer that way. Thanks!
-
I eventually figured out that your method was a 301 redirect, and I definitely broke my site trying to use the code you posted... haha. It's OK, though: I just removed the code and the site went back to normal. At first, I was editing the .htaccess file in the public_html folder, which wasn't working. Then I tried the root folder for the site (I created the .htaccess file, since it did not exist). Neither of those worked. (I am using Bluehost, so I do not think that I have root access, and I am not sure whether it is a Linux server.)
If there is an easy way to explain what I am doing wrong, please do so. Otherwise, I will use the canonical tag.
Thanks for everything!
-
@Dan
Sorry about the delay in responding; I didn't realize right away that you were asking me a question. The code I provided in my previous answer causes a 301 (permanent) redirect to the original URL. That's what the
[R=301,L]
portion of the code states: R means redirect, and 301 is the status code (L tells Apache to stop processing further rules). After reviewing the Matt Cutts video, I realize that I should have asked whether you were on a Linux server that you had root access to. We actually use both redirects and canonical tags, since both were recommended by the on-page optimization reports. Heck, Google uses them, I would assume because it's easier for users to be referred to a single URL per page. Obviously, though, if you don't have access to server headers and are not familiar with .htaccess (you can accidentally break your site), then the canonical solution is appropriate.
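For a concrete picture of what a 301 redirect sends back, here is a small Python sketch using only the standard library. It is a toy server, not Apache and not tied to .htaccess: it answers /index.html with status 301 and a Location header, which is the signal browsers and crawlers follow to the single preferred URL. `IndexRedirectHandler` and `fetch` are hypothetical names for illustration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional, Tuple


class IndexRedirectHandler(BaseHTTPRequestHandler):
    """Toy handler: any path ending in /index.html gets a permanent redirect."""

    def do_GET(self) -> None:
        if self.path.endswith("/index.html"):
            # 301 Moved Permanently; Location names the URL to keep.
            self.send_response(301)
            self.send_header("Location", self.path[: -len("index.html")])
            self.end_headers()
            return
        body = b"home page"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # silence per-request logging for the demo


def fetch(path: str) -> Tuple[int, Optional[str]]:
    """Start the toy server and GET one path without following redirects."""
    server = HTTPServer(("127.0.0.1", 0), IndexRedirectHandler)  # port 0 = any free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        conn.request("GET", path)
        resp = conn.getresponse()
        resp.read()
        conn.close()
        return resp.status, resp.getheader("Location")
    finally:
        server.shutdown()
        server.server_close()
```

Because `http.client` does not follow redirects, `fetch("/index.html")` shows the raw 301 and its Location header, while `fetch("/")` returns the page itself with status 200 and no Location.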
-
Josh,
Thanks for your reply. It seems like there are lots of different ways to solve this problem. I just watched a video on Matt Cutts's blog in which he discusses his preference for 301 redirects over the rel canonical tag.
Where would you say your solution fits in?
Thanks,
Dan
-
I use the link rel="canonical" tag on all my home pages, pointing to http://www.yoursite.com.
-
Oddly enough, I just answered this question recently. The SEOmoz crawler is correct: without a redirect, both versions of the page can be accessed in your browser.
To resolve this issue, redirect index.html to the root URL by placing the following code in the .htaccess file in your root directory:
Options +FollowSymlinks
RewriteEngine on
# Index Rewrite
RewriteRule ^index\.(htm|html|php) http://www.yoursite.com/ [R=301,L]
RewriteRule ^(.*)/index\.(htm|html|php) http://www.yoursite.com/$1/ [R=301,L]
You can also do the same with the index file in any subdirectories you create, by placing a .htaccess file in each of those subdirectories and using variations of the above code. This is how you get clean, tight URLs without the duplicate content issue, such as http://www.semclix.com/design/business/
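The two RewriteRules in this answer can be checked the same way. This Python sketch only mirrors their pattern matching, assuming (as in a .htaccess file) that the path arrives without its leading slash; `rewrite` is a hypothetical helper name and www.yoursite.com is the placeholder domain. Unlike the THE_REQUEST version earlier in the thread, these rules look only at the path, so on some setups Apache's internal DirectoryIndex lookup for / can also match index.html and cause a redirect loop; if that happens, adding a THE_REQUEST condition is the usual guard.

```python
import re
from typing import Optional

SITE = "http://www.yoursite.com"  # placeholder domain from the snippet


def rewrite(path: str) -> Optional[str]:
    """Return the 301 target for a .htaccess-relative path, or None.

    Rules are tried in order and the first match wins, like [R=301,L].
    """
    # RewriteRule ^index\.(htm|html|php) http://www.yoursite.com/
    if re.match(r"^index\.(htm|html|php)", path):
        return SITE + "/"
    # RewriteRule ^(.*)/index\.(htm|html|php) http://www.yoursite.com/$1/
    m = re.match(r"^(.*)/index\.(htm|html|php)", path)
    if m:
        return f"{SITE}/{m.group(1)}/"
    return None
```

So a request for /design/business/index.php is sent to http://www.yoursite.com/design/business/, which matches the clean URL style described above, while paths without an index file pass through untouched.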
-
It is a problem, and you need to fix it by canonicalizing your pages.
Those are all different URLs that most likely lead to the same web page. I say "most likely" because these URLs can actually lead to different pages.
You need to tell crawlers and search engines how your site is organized. There are several ways to achieve canonicalization; the method I prefer is to add the following line of code to each page:
The URL provided should be the preferred URL for your page.
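Assuming the line meant here is the standard `<link rel="canonical" href="...">` element in each page's `<head>`, the Python sketch below shows the idea. The helpers `canonical_url` and `canonical_link` are hypothetical, and the preferred URL form (http://www.yoursite.com/ with the index file name dropped) is an assumption; use whichever version of your URLs you want indexed.

```python
import re
from html import escape
from urllib.parse import urlsplit, urlunsplit

# Hypothetical preferred form of the domain; use the one you want indexed.
PREFERRED_HOST = "www.yoursite.com"
# A trailing index file name, e.g. "/index.html" or "/blog/index.php".
INDEX_FILE = re.compile(r"(^|/)index\.(html?|php|asp)$")


def canonical_url(url: str) -> str:
    """Collapse common duplicate variants of a URL onto one preferred form."""
    parts = urlsplit(url)
    path = INDEX_FILE.sub(r"\1", parts.path) or "/"  # /blog/index.html -> /blog/
    # Force the preferred host, keep the query string, drop any #fragment.
    return urlunsplit((parts.scheme or "http", PREFERRED_HOST, path, parts.query, ""))


def canonical_link(url: str) -> str:
    """Build the <link rel="canonical"> element to place in the page's <head>."""
    return f'<link rel="canonical" href="{escape(canonical_url(url), quote=True)}" />'
```

With this, http://yoursite.com/index.html, http://www.yoursite.com and http://www.yoursite.com/index.html all produce the same tag pointing at http://www.yoursite.com/, which is the "one preferred URL per page" the answer describes.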
Related Questions
-
Why would a developer build all page content in php?
Picked up a new client. The site is built on WordPress. The previous developer built nearly all page content in their custom theme's PHP files. In other words, the theme's "page.php" file contains virtually all the HTML for each of the site's pages. Each individual page's back-end page editor appears blank, except for some of the page text: no markup, no widgets, no custom fields. There are no dedicated, page-specific PHP files either. Pages are differentiated within page.php using elseif (is_page("27")). Has anyone ever come across this approach before? Why might someone do this?
Web Design | | mphdavidson0 -
Website removed from Bing and Yahoo index
Hi, our website servicemanualrepairs.com was removed from the Bing and Yahoo indexes. I emailed Bing via Webmaster Tools, and they first said it was because of backlinks. I used the inbound links tool to analyze the site's backlinks, found 20 links, and used the Disavow tool. They then said: "I'm afraid but after careful and thorough investigation, your site still did not meet the Bing and Microsoft guidelines You may also refer to the things to avoid section of the Webmaster Guidelines for additional information. As an effect, the site is still blocked and it cannot be lifted." The website was in the Bing and Yahoo index for 3 years, and it was only de-indexed after the 20 backlinks were added to the site. Any help would be greatly appreciated. Thanks
Web Design | | vista5211 -
Duplicate Titles for Large Lists
Our blog (www.cowleyweb.com/blog) was recently given topic categories so we can make use of our old blog posts. Otherwise, users would only see what's new and never look back (our posts are organized by the month they were published), and all that hard work would be kind of a waste after a while. So we came up with a few topics (e.g., social media, internet marketing) and added those as tags to the posts. Now users can click a topic and get a results page on our blog listing all the previously published posts related to that topic. Sounds great. BUT it's hurting our SEO crawl report. If the list goes beyond one page of results, the 2nd and subsequent pages get dinged as "duplicate title" because they share the same title (e.g., "Social Media"). How can I fix this? I'm not the web designer, but something tells me some sort of tag that says "Page 2" or similar would do the trick. We use Drupal, which is good for customization. I assume tons of bloggers and websites have dealt with this problem. Please help; I want to give the web guy some solutions. Thank you.
Web Design | | JCunningham0 -
Google also indexed trailing slash version - PLEASE HELP
Hi guys, we redesigned the website, and somehow our canonical extension decided to add a trailing slash to all URLs. Previously our canonical URLs didn't have a trailing slash. We didn't change the URLs during the redesign; they remained the same, but we now have two versions indexed: one with a trailing slash and one without. I've now fixed the issue and removed the trailing slash from the canonical URLs. Is this the correct way of fixing it? Will our rankings be affected in a negative way? Is there anything else I need to do? The website went live last Tuesday. Thanks
Web Design | | Jvalops0 -
Why would scrapers and syndication sites outrank all of our content?
When we type in the title of any of our content, scrapers and content syndication sites all outrank us by quite a bit. What is usually the main reason for this? I started noticing it happening quite a bit this year and think it may have to do with Panda. Has anyone figured out the reason?
Web Design | | upbuiltgames0 -
Is a website with only one "HTML file" and href="#" page links good for SEO?
I bought a website from TemplateMonster that contains only one HTML file, and the pages are generated by links (PROGRAMACAO). My website: www.nextformaturas.com.br. Is this good in terms of SEO, or is it better to have a website with several pages with different content? What are the pros and cons? I'm really lost on this.
Web Design | | Naghirniac0 -
Best way to handle related content links in a sidebar?
My site contains tens of thousands of articles, studies, multimedia files, biographies, etc. To assist users with finding content that might be related to the page they're on, I use a side bar with 'also of interest' links to other, similar content on my site. This is, of course, pretty standard practice. Search engines -- Google in particular -- index these pages and then include the text in the sidebar links in search results. So, for example, on a given page I may have 20 links to related content, and the text in those links might be, 'A story about subject ABC.' When I search for 'A story about subject ABC,' Google returns not only the page titled (and containing the content) 'A story about subject ABC.' but also every page that links to it and happens to have that link text in the sidebar. What is the proper way to handle this kind of thing?
Web Design | | smorrison0 -
Dynamic pages and code within content
Hi all, I'm considering creating a dynamic table on my site that highlights rows, columns and cells depending on buttons that users can click. Each cell in the table links to a separate page that is created dynamically by pulling information from a database. Now, I'm aware of the Google guidelines: "If you decide to use dynamic pages (i.e., the URL contains a "?" character), be aware that not every search engine spider crawls dynamic pages as well as static pages. It helps to keep the parameters short and the number of them few." So we wondered whether we could put the dynamic pages in our sitemap so that Google could index them; the pages can be seen with JavaScript off, which is how the pages are manipulated to make them dynamic. Could anyone give us an overview of the dangers here? I also wondered whether you still need to separate content from code on a page. My developer still seems very keen to use inline CSS and JavaScript! Thanks a bundle.
Web Design | | tgraham0