Mod Rewrite / .htaccess avoid duplicate content

MiamiWebCompany

I have been searching and testing for hours but cannot find a solution. I am able to get a URL to display with out the file exntension.

i.e domain.com/file instead of domain.com/file.php

The problem is both versions of the URL above work, therefore a duplicate content issue. How can I force the URL with the file extension not to resolve and give a 404 error? Or just redirect to the non extension URL?

IF it helps here is my code.

Options +FollowSymLinks
RewriteEngine On

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteRule ^(.+)$ $1.php [L,QSA]

blacey

Hi Erik,

No problem, glad I could help

To answer your question, No it doesn't matter which you use because the end result will be re-written to remove the file extension and add a forward slash at the end.

For consistency I would suggest having it without the .php inside your content though. If nothing else it would save you the pain of having to remove .php from your content if you moved to a content management system in the future.

If you've got any other questions let me know, and I'll be happy to help.

Ben

MiamiWebCompany

Didnt say thanks before, so thank you. One question I did not think of. Should the internal linking of the site be to the file name with extension or no extension?

I think it should be without extension but just want to double check.

MiamiWebCompany

Hi Ben. I tried this code on another hosting account and it did worked. The first account was a VPS account from Godaddy. The second was a shared account from the same hosting company. Im not sure why it works on one and not on the other. I did see the mod_rewrite option enabled.

blacey

Just tried this on my development server and it worked fine:

RewriteBase / RewriteEngine on RewriteCond %{HTTP_HOST} ^test.local RewriteCond %{THE_REQUEST} ^GET\ (.).php\ HTTP RewriteRule (.).php$ $1 [R=301]

remove index RewriteRule (.*)index$ $1 [R=301]

remove slash if not directory RewriteCond %{REQUEST_FILENAME} !-d RewriteCond %{REQUEST_URI} /$ RewriteRule (.)/ $1 [R=301] # add .php to access file, but don't redirect RewriteCond %{REQUEST_FILENAME}.php -f RewriteCond %{REQUEST_URI} !/$RewriteRule (.) $1.php [L]

The dev URL is test.local so you would want to change this to www.yourdomain.co.ukI had a page called about.php if I entered http://test.local/about.php or http://test.local/about it would show http://test.local/about in the address bar

MiamiWebCompany

Hi Ben. Thanks for your help but this does not work for some reason. Im testing it on an old site I have that is html and I just replaced php for html but both URL's still resolves.

loopyal

Good answer Ben.

My main site is my own CMS, that I built 10 years ago, so after I added a lot of things to the .htaccess file and it became too large, I just moved the handling inside the control program, that only looks up filed URLs when they are broken. This processing is fast, but if there was any degradation, it only affects the broken URLs.

Speaking of broken URLs, I was getting a few 400 return codes and it seems the webserver handles those, so you have no chance to handle it in .htaccess. So the wat to handle that is with a 400 handler - that on cpanel sites just needs a 400.shtml file, that you can customize.

you get a 400 response if you request a URL with a % symbol on the end, and some other site did that, thanks very much, and then google decided it would be a great thing to index.

blacey

Try using this instead:

<code>RewriteBase /</code>

<code># remove .php; use THE_REQUEST to prevent infinite loops
RewriteCond %{HTTP_HOST} ^www.domain.com
RewriteCond %{THE_REQUEST} ^GET\ (.).php\ HTTP
RewriteRule (.).php$ $1 [R=301]

remove index

RewriteRule (.*)index$ $1 [R=301]

remove slash if not directory

RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} /$
RewriteRule (.*)/ $1 [R=301]

add .php to access file, but don't redirect

RewriteCond %{REQUEST_FILENAME}.php -f
RewriteCond %{REQUEST_URI} !/$
RewriteRule (.*) $1.php [L]</code>

Welcome to the Q&A Forum

Browse the forum for helpful insights and fresh discussions about all things SEO.

Moz Q&A is closed.