Mod Rewrite / .htaccess avoid duplicate content
-
I have been searching and testing for hours but cannot find a solution. I am able to get a URL to display with out the file exntension.
i.e domain.com/file instead of domain.com/file.php
The problem is both versions of the URL above work, therefore a duplicate content issue. How can I force the URL with the file extension not to resolve and give a 404 error? Or just redirect to the non extension URL?
IF it helps here is my code.
Options +FollowSymLinks
RewriteEngine OnRewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteRule ^(.+)$ $1.php [L,QSA] -
Hi Erik,
No problem, glad I could help
To answer your question, No it doesn't matter which you use because the end result will be re-written to remove the file extension and add a forward slash at the end.
For consistency I would suggest having it without the .php inside your content though. If nothing else it would save you the pain of having to remove .php from your content if you moved to a content management system in the future.
If you've got any other questions let me know, and I'll be happy to help.
Ben
-
Didnt say thanks before, so thank you. One question I did not think of. Should the internal linking of the site be to the file name with extension or no extension?
I think it should be without extension but just want to double check.
-
Hi Ben. I tried this code on another hosting account and it did worked. The first account was a VPS account from Godaddy. The second was a shared account from the same hosting company. Im not sure why it works on one and not on the other. I did see the mod_rewrite option enabled.
-
Just tried this on my development server and it worked fine:
RewriteBase / RewriteEngine on RewriteCond %{HTTP_HOST} ^test.local RewriteCond %{THE_REQUEST} ^GET\ (.).php\ HTTP RewriteRule (.).php$ $1 [R=301]
remove index RewriteRule (.*)index$ $1 [R=301]
remove slash if not directory RewriteCond %{REQUEST_FILENAME} !-d RewriteCond %{REQUEST_URI} /$ RewriteRule (.)/ $1 [R=301] # add .php to access file, but don't redirect RewriteCond %{REQUEST_FILENAME}.php -f RewriteCond %{REQUEST_URI} !/$RewriteRule (.) $1.php [L]
The dev URL is test.local so you would want to change this to www.yourdomain.co.ukI had a page called about.php if I entered http://test.local/about.php or http://test.local/about it would show http://test.local/about in the address bar
-
Hi Ben. Thanks for your help but this does not work for some reason. Im testing it on an old site I have that is html and I just replaced php for html but both URL's still resolves.
-
Good answer Ben.
My main site is my own CMS, that I built 10 years ago, so after I added a lot of things to the .htaccess file and it became too large, I just moved the handling inside the control program, that only looks up filed URLs when they are broken. This processing is fast, but if there was any degradation, it only affects the broken URLs.
Speaking of broken URLs, I was getting a few 400 return codes and it seems the webserver handles those, so you have no chance to handle it in .htaccess. So the wat to handle that is with a 400 handler - that on cpanel sites just needs a 400.shtml file, that you can customize.
- you get a 400 response if you request a URL with a % symbol on the end, and some other site did that, thanks very much, and then google decided it would be a great thing to index.
-
Try using this instead:
<code>RewriteBase /</code>
<code># remove .php; use THE_REQUEST to prevent infinite loops
RewriteCond %{HTTP_HOST} ^www.domain.com
RewriteCond %{THE_REQUEST} ^GET\ (.).php\ HTTP
RewriteRule (.).php$ $1 [R=301]remove index
RewriteRule (.*)index$ $1 [R=301]
remove slash if not directory
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} /$
RewriteRule (.*)/ $1 [R=301]add .php to access file, but don't redirect
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteCond %{REQUEST_URI} !/$
RewriteRule (.*) $1.php [L]</code>
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Another Duplicate Content - eCommerce Question!
We are manufacturers of about 15 products and our website provides information about the products. We also offer them for sale on the site. Recently we partnered with a large eCommerce site that sells many of these types of products. They lifted descriptions from our site for theirs and are now selling our products. They have higher DA than us. Will this cause a ranking problem for us? Should we write unique descriptions for them? Thanks!
Technical SEO | | Chris6610 -
Despite canonical duplicate content in WMT
Hi, 2 weeks ago we've made big changes in title and meta descriptions. To solve the missing title and descriptions. Also set the right canonical. Now i see that in WMT despite the canonical it shows duplicates in meta descriptions and titles. i've setup the canonical like this:
Technical SEO | | Leonie-Kramer
1. url: www.domainname.com/category/listing-family/productname
2. url: www.domainname.com/category/listing-family/productname-more-info The canonical on both pages is like this: I'm aware of creating duplicate titles and descriptions, caused by the cms we use and also caused by wrong structure of category/products (we'll solve that nest year) that's why i wanted the canonical, but now it's not going any better, did i do something wrong with the canonical?0 -
Avoiding duplication in TLDs
I have started a ecom site with following config global version geekwik.com priced in usd india version geekwik.in priced in inr mostly the content in both sites is same (90% same), major difference is currency (and payment gateway) and helpline numbers etc How do I setup robots.txt and google webmaster so that indian users get results from India TLD and global users get results from global TLD and there is no duplication of content. .
Technical SEO | | geekwik0 -
Htaccess rewrites
We’re using wordpress, and we have the following in the .htaccess: BEGIN WordPress RewriteEngine On RewriteBase / RewriteRule ^index.php$ - [L] RewriteCond %{REQUEST_FILENAME} !-f RewriteCond %{REQUEST_FILENAME} !-d RewriteRule . /index.php [L] END WordPress We have a URL like http://subdomain.example.com/select-a-card/?q=fuel, and I want to rewrite it so that it becomes http://subdomain.example.com/fuel-card/ What do I need to add to the htaccess to do this?? Thanks,
Technical SEO | | AndrewAkesson0 -
404's and duplicate content.
I have real estate based websites that add new pages when new listings are added to the market and then deletes pages when the property is sold. My concern is that there are a significant amount of 404's created and the listing pages that are added are going to be the same as others in my market who use the same IDX provider. I can go with a different IDX provider that uses IFrame which doesn't create new pages but I used a IFrame before and my time on site was 3min w/ 2.5 pgs per visit and now it's 7.5 pg/visit with 6+min on the site. The new pages create new content daily so is fresh content and better on site metrics (with the 404's) better or less 404's, no dup content and shorter onsite metrics better? Any thoughts on this issue? Any advice would be appreciated
Technical SEO | | AnthonyLasVegas0 -
Forget Duplicate Content, What to do With Very Similar Content?
All, I operate a Wordpress blog site that focuses on one specific area of the law. Our contributors are attorneys from across the country who write about our niche topic. I've done away with syndicated posts, but we still have numerous articles addressing many of the same issues/topics. In some cases 15 posts might address the same issue. The content isn't duplicate but it is very similar, outlining the same rules of law etc. I've had an SEO I trust tell me I should 301 some of the similar posts to one authoritative post on the subject. Is this a good idea? Would I be better served implementing canonical tags pointing to the "best of breed" on each subject? Or would I be better off being grateful that I receive original content on my niche topic and not doing anything? Would really appreciate some feedback. John
Technical SEO | | JSOC0 -
Duplicate Page Content
Hi within my campaigns i get an error "crawl errors found" that says duplicate page content found, it finds the same content on the home pages below. Are these seen as two different pages? And how can i correct these errors as they are just one page? http://poolstar.net/ http://poolstar.net/Home_Page.php
Technical SEO | | RouteAccounts0 -
Duplicate content
I am getting flagged for duplicate content, SEOmoz is flagging the following as duplicate: www.adgenerator.co.uk/ www.adgenerator.co.uk/index.asp These are obviously meant to be the same path so what measures do I take to let the SE's know that these are to be considered the same page. I have used the canonical meta tag on the Index.asp page.
Technical SEO | | IPIM0