Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Mod Rewrite / .htaccess avoid duplicate content
-
I have been searching and testing for hours but cannot find a solution. I am able to get a URL to display with out the file exntension.
i.e domain.com/file instead of domain.com/file.php
The problem is both versions of the URL above work, therefore a duplicate content issue. How can I force the URL with the file extension not to resolve and give a 404 error? Or just redirect to the non extension URL?
IF it helps here is my code.
Options +FollowSymLinks
RewriteEngine OnRewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteRule ^(.+)$ $1.php [L,QSA] -
Hi Erik,
No problem, glad I could help
To answer your question, No it doesn't matter which you use because the end result will be re-written to remove the file extension and add a forward slash at the end.
For consistency I would suggest having it without the .php inside your content though. If nothing else it would save you the pain of having to remove .php from your content if you moved to a content management system in the future.
If you've got any other questions let me know, and I'll be happy to help.
Ben
-
Didnt say thanks before, so thank you. One question I did not think of. Should the internal linking of the site be to the file name with extension or no extension?
I think it should be without extension but just want to double check.
-
Hi Ben. I tried this code on another hosting account and it did worked. The first account was a VPS account from Godaddy. The second was a shared account from the same hosting company. Im not sure why it works on one and not on the other. I did see the mod_rewrite option enabled.
-
Just tried this on my development server and it worked fine:
RewriteBase / RewriteEngine on RewriteCond %{HTTP_HOST} ^test.local RewriteCond %{THE_REQUEST} ^GET\ (.).php\ HTTP RewriteRule (.).php$ $1 [R=301]
remove index RewriteRule (.*)index$ $1 [R=301]
remove slash if not directory RewriteCond %{REQUEST_FILENAME} !-d RewriteCond %{REQUEST_URI} /$ RewriteRule (.)/ $1 [R=301] # add .php to access file, but don't redirect RewriteCond %{REQUEST_FILENAME}.php -f RewriteCond %{REQUEST_URI} !/$RewriteRule (.) $1.php [L]
The dev URL is test.local so you would want to change this to www.yourdomain.co.ukI had a page called about.php if I entered http://test.local/about.php or http://test.local/about it would show http://test.local/about in the address bar
-
Hi Ben. Thanks for your help but this does not work for some reason. Im testing it on an old site I have that is html and I just replaced php for html but both URL's still resolves.
-
Good answer Ben.
My main site is my own CMS, that I built 10 years ago, so after I added a lot of things to the .htaccess file and it became too large, I just moved the handling inside the control program, that only looks up filed URLs when they are broken. This processing is fast, but if there was any degradation, it only affects the broken URLs.
Speaking of broken URLs, I was getting a few 400 return codes and it seems the webserver handles those, so you have no chance to handle it in .htaccess. So the wat to handle that is with a 400 handler - that on cpanel sites just needs a 400.shtml file, that you can customize.
- you get a 400 response if you request a URL with a % symbol on the end, and some other site did that, thanks very much, and then google decided it would be a great thing to index.
-
Try using this instead:
<code>RewriteBase /</code>
<code># remove .php; use THE_REQUEST to prevent infinite loops
RewriteCond %{HTTP_HOST} ^www.domain.com
RewriteCond %{THE_REQUEST} ^GET\ (.).php\ HTTP
RewriteRule (.).php$ $1 [R=301]remove index
RewriteRule (.*)index$ $1 [R=301]
remove slash if not directory
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} /$
RewriteRule (.*)/ $1 [R=301]add .php to access file, but don't redirect
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteCond %{REQUEST_URI} !/$
RewriteRule (.*) $1.php [L]</code>
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Duplicate Content Issues with Pagination
Hi Moz Community, We're an eCommerce site so we have a lot of pagination issues but we were able to fix them using the rel=next and rel=prev tags. However, our pages have an option to view 60 items or 180 items at a time. This is now causing duplicate content problems when for example page 2 of the 180 item view is the same as page 4 of the 60 item view. (URL examples below) Wondering if we should just add a canonical tag going to the the main view all page to every page in the paginated series to get ride of this issue. https://www.example.com/gifts/for-the-couple?view=all&n=180&p=2 https://www.example.com/gifts/for-the-couple?view=all&n=60&p=4 Thoughts, ideas or suggestions are welcome. Thanks
Technical SEO | | znotes0 -
.com and .co.uk duplicate content
hi mozzers I have a client that has just released a .com version of their .co.uk website. They have basically re-skinned the .co.uk version with some US amends so all the content and title tags are the same. What you do recommend? Canonical tag to the .co.uk version? rewrite titles?
Technical SEO | | KarlBantleman0 -
Do I use /es/, /mx/ or /es-mx/ for my Spanish site for Mexico only
I currently have the Spanish version of my site under myurl.com/es/ When I was at Pubcon in Vegas last year a panel reviewed my site and said the Spanish version should be in /mx/ rather than /es/ since es is for Spain only and my site is for Mexico only. Today while trying to find information on the web I found /es-mx/ as a possibility. I am changing my site and was planning to change to /mx/ but want confirmation on the correct way to do this. Does anyone have a link to Google documentation that will tell me for sure what to use here? The documentation I read led me to the /es/ but I cannot find that now.
Technical SEO | | RoxBrock0 -
Duplicate Content Issues on Product Pages
Hi guys Just keen to gauge your opinion on a quandary that has been bugging me for a while now. I work on an ecommerce website that sells around 20,000 products. A lot of the product SKUs are exactly the same in terms of how they work and what they offer the customer. Often it is 1 variable that changes. For example, the product may be available in 200 different sizes and 2 colours (therefore 400 SKUs available to purchase). Theese SKUs have been uploaded to the website as individual entires so that the customer can purchase them, with the only difference between the listings likely to be key signifiers such as colour, size, price, part number etc. Moz has flagged these pages up as duplicate content. Now I have worked on websites long enough now to know that duplicate content is never good from an SEO perspective, but I am struggling to work out an effective way in which I can display such a large number of almost identical products without falling foul of the duplicate content issue. If you wouldnt mind sharing any ideas or approaches that have been taken by you guys that would be great!
Technical SEO | | DHS_SH0 -
Localized domains and duplicate content
Hey guys, In my company we are launching a new website and there's an issue it's been bothering me for a while. I'm sure you guys can help me out. I already have a website, let's say ABC.com I'm preparing a localized version of that website for the uk so we'll launch ABC.co.uk Basically the websites are going to be exactly the same with the difference of the homepage. They have a slightly different proposition. Using GeoIP I will redirect the UK traffic to ABC.co.uk and the rest of the traffic will still visit .com website. May google penalize this? The site itself it will be almost the same but the homepage. This may count as duplicate content even if I'm geo-targeting different regions so they will never overlap. Thanks in advance for you advice
Technical SEO | | fabrizzio0 -
Duplicate Content and URL Capitalization
I have multiple URLs that SEOMoz is reporting as duplicate content. The reason is that there are characters in the URL that may, or may not, be capitalized depending on user input. A couple examples are: www.househitz.com/Pennsylvania/Houses-for-sale www.househitz.com/Pennsylvania/houses-for-sale www.househitz.com/Pennsylvania/Houses-for-rent www.househitz.com/Pennsylvania/houses-for-rent There are currently thousands of instances of this on the site. Is this something I should spend effort to try and resolve (may not be minor effort), or should I just ignore it and move on?
Technical SEO | | Jom0 -
Duplicate Content issue
I have been asked to review an old website to an identify opportunities for increasing search engine traffic. Whilst reviewing the site I came across a strange loop. On each page there is a link to printer friendly version: http://www.websitename.co.uk/index.php?pageid=7&printfriendly=yes That page also has a link to a printer friendly version http://www.websitename.co.uk/index.php?pageid=7&printfriendly=yes&printfriendly=yes and so on and so on....... Some of these pages are being included in Google's index. I appreciate that this can't be a good thing, however, I am not 100% sure as to the extent to which it is a bad thing and the priority that should be given to getting it sorted. Just wandering what views people have on the issues this may cause?
Technical SEO | | CPLDistribution0 -
CGI Parameters: should we worry about duplicate content?
Hi, My question is directed to CGI Parameters. I was able to dig up a bit of content on this but I want to make sure I understand the concept of CGI parameters and how they can affect indexing pages. Here are two pages: No CGI parameter appended to end of the URL: http://www.nytimes.com/2011/04/13/world/asia/13japan.html CGI parameter appended to the end of the URL: http://www.nytimes.com/2011/04/13/world/asia/13japan.html?pagewanted=2&ref=homepage&src=mv Questions: Can we safely say that CGI parameters = URL parameters that append to the end of a URL? Or are they different? And given that you have rel canonical implemented correctly on your pages, search engines will move ahead and index only the URL that is specified in that tag? Thanks in advance for giving your insights. Look forward to your response. Best regards, Jackson
Technical SEO | | jackson_lo0