Mod Rewrite / .htaccess avoid duplicate content
-
I have been searching and testing for hours but cannot find a solution. I am able to get a URL to display without the file extension,
i.e. domain.com/file instead of domain.com/file.php.
The problem is that both versions of the URL above work, which creates a duplicate content issue. How can I force the URL with the file extension not to resolve and return a 404 error? Or just redirect it to the extensionless URL?
If it helps, here is my code.
Options +FollowSymLinks
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteRule ^(.+)$ $1.php [L,QSA]
-
Hi Erik,
No problem, glad I could help.
To answer your question: no, it doesn't matter which you use, because the end result will be rewritten to remove the file extension and add a forward slash at the end.
For consistency I would suggest having it without the .php inside your content though. If nothing else it would save you the pain of having to remove .php from your content if you moved to a content management system in the future.
If you've got any other questions let me know, and I'll be happy to help.
Ben
-
I didn't say thanks before, so thank you. One question I did not think of: should the internal linking of the site point to the file name with the extension or without it?
I think it should be without extension but just want to double check.
-
Hi Ben. I tried this code on another hosting account and it worked. The first account was a VPS account from GoDaddy. The second was a shared account from the same hosting company. I'm not sure why it works on one and not on the other. I did see the mod_rewrite option enabled.
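One possible explanation, offered as an assumption rather than a diagnosis: shared hosting usually allows .htaccess overrides by default, while a VPS is often configured with them switched off, in which case the rewrite rules are never read even though mod_rewrite is loaded. A minimal sketch of the server-side setting, where the directory path is a placeholder and not taken from this thread:

```apache
# Goes in the main server or virtual host configuration, not in .htaccess.
# "/var/www/example" is a hypothetical document root; use the site's real one.
<Directory "/var/www/example">
    # FileInfo permits the mod_rewrite directives in .htaccess;
    # Options permits the "Options +FollowSymLinks" line.
    AllowOverride FileInfo Options
</Directory>
```

Apache needs a reload after this change. If overrides are already allowed, the cause lies elsewhere.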
-
Just tried this on my development server and it worked fine:
RewriteBase /
RewriteEngine on
RewriteCond %{HTTP_HOST} ^test.local
RewriteCond %{THE_REQUEST} ^GET\ (.*)\.php\ HTTP
RewriteRule (.*)\.php$ $1 [R=301]
# remove index
RewriteRule (.*)index$ $1 [R=301]
# remove slash if not directory
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} /$
RewriteRule (.*)/ $1 [R=301]
# add .php to access file, but don't redirect
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteCond %{REQUEST_URI} !/$
RewriteRule (.*) $1.php [L]
The dev URL is test.local, so you would want to change this to www.yourdomain.co.uk. I had a page called about.php; if I entered http://test.local/about.php or http://test.local/about, it would show http://test.local/about in the address bar.
-
Hi Ben. Thanks for your help, but this does not work for some reason. I'm testing it on an old site I have that is HTML, and I just replaced php with html, but both URLs still resolve.
-
Good answer Ben.
My main site runs on my own CMS, which I built 10 years ago. After I added a lot of rules to the .htaccess file and it became too large, I moved the handling into the control program, which only looks up filed URLs when a request is broken. This processing is fast, and any degradation would only affect the broken URLs.
Speaking of broken URLs, I was getting a few 400 return codes, and it seems the web server handles those itself, so you have no chance to handle them in .htaccess. The way to handle that is with a 400 handler, which on cPanel sites just needs a 400.shtml file that you can customize.
You get a 400 response if you request a URL with a % symbol on the end. Some other site linked to mine that way, thanks very much, and then Google decided it would be a great thing to index.
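On servers without the cPanel convention, the 400 handler described above corresponds to Apache's ErrorDocument directive. A minimal sketch, assuming access to the virtual host configuration; the server name and document root are placeholders, and Apache may still fall back to its built-in message for some malformed requests:

```apache
<VirtualHost *:80>
    # Placeholder values, not taken from this thread.
    ServerName www.example.com
    DocumentRoot "/var/www/example"

    # Serve a custom page for 400 Bad Request responses.
    # The URL path is resolved against DocumentRoot.
    ErrorDocument 400 /400.shtml
</VirtualHost>
```

Setting it at the virtual host level matters here because, as noted above, a request rejected this early may never reach the .htaccess file.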
-
Try using this instead:
<code>RewriteBase /</code>
<code># remove .php; use THE_REQUEST to prevent infinite loops
RewriteCond %{HTTP_HOST} ^www.domain.com
RewriteCond %{THE_REQUEST} ^GET\ (.*)\.php\ HTTP
RewriteRule (.*)\.php$ $1 [R=301]
# remove index
RewriteRule (.*)index$ $1 [R=301]
# remove slash if not directory
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} /$
RewriteRule (.*)/ $1 [R=301]
# add .php to access file, but don't redirect
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteCond %{REQUEST_URI} !/$
RewriteRule (.*) $1.php [L]</code>
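The rules above redirect the .php URL to the extensionless one. The original question also asks about returning a 404 instead; a minimal sketch of that variant, assuming the same flat layout of .php files and not tested against the asker's site:

```apache
RewriteEngine On

# Direct requests for /something.php get a 404 instead of a redirect.
# THE_REQUEST holds the request line exactly as the browser sent it, so the
# internal rewrite to .php further down does not trigger this rule.
RewriteCond %{THE_REQUEST} "^[A-Z]+ [^? ]*\.php[? ]"
RewriteRule ^ - [R=404,L]

# Serve /something from something.php without changing the address bar.
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteRule ^(.+)$ $1.php [L]
```

The 301 redirect is usually the gentler choice, since existing links to the .php URLs keep working and pass visitors on to the new address; the 404 simply makes the old URLs disappear.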