Duplicate content /index.php/ issues
-
I'm having some duplicate content issues with Google. My .htaccess file is working fine as far as I can tell: rewriting works great, and by using the site you'd never end up on a page with /index.php. However, I notice that on ANY page of the site you can add /index.php and get the same page, i.e.:
www.mysite.com/category/article
and
www.mysite.com/index.php/category/article
would both return the same page. How can I 301 (or something similar) all /index.php pages to the non-index.php version? I have no desire for any page on my site to have index.php in it; there is no use for it. I'm having quite a hard time figuring this out.
Again, this is basically just for the robots. The URLs the users see are perfect, and I've never had an issue with that. It's just SEOmoz reporting duplicate content, and I've verified that to be true.
-
Hey Emory - if that's the default .htaccess file your software created (assume this is a Joomla-based site?), it looks like the redirect code you need is already there, but it is disabled by default.
The following code
# Remove index.php or index.htm/html from URL requests
#RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(([^/]+/)*)index\.(php|html?)\ HTTP/
#RewriteCond %{REQUEST_URI} !^/administrator
#RewriteRule ^([^/]+/)*index\.(html?|php)$ http://www.mysite.com/$1 [R=301,L]
should do what you want. The reason it's not currently doing anything is that it has been commented out. The "#" symbol at the beginning of each line tells the server NOT to run the code in that line.
Try removing the "#" symbol in front of the last three lines of that code, save the file, and then thoroughly test your site. (It's not the way I would write it, but there may be specific requirements for your site/system.) The first line is just a descriptive header, so the "#" symbol needs to be left on it.
If for any reason it causes problems, you can simply re-add the "#" symbols and re-save to return the site to its original state.
Give that a shot and let us know if it accomplishes what you want to do.
Paul
P.S. When testing, make sure in particular that client logins work correctly, and that the search function and all plugins still work.
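For reference, here is a sketch of how that block would look once enabled. It assumes www.mysite.com is the canonical host and that the block sits above any catch-all rule that hands requests to index.php, since mod_rewrite works through the rules from top to bottom:

```apache
# Remove index.php or index.htm/html from URL requests
# (this header stays commented; the three directives below are now active)
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(([^/]+/)*)index\.(php|html?)\ HTTP/
RewriteCond %{REQUEST_URI} !^/administrator
RewriteRule ^([^/]+/)*index\.(html?|php)$ http://www.mysite.com/$1 [R=301,L]
```

The second condition leaves anything under /administrator alone, so the back end keeps its own index.php URLs.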
-
Any ideas/input?
-
Tried that in many ways, but can't get it working. Here is a copy of the .htaccess file. What changes would need to be made (clearly input that code)?
Options +FollowSymLinks
RewriteEngine On
# prevents people from accessing anything with phpMyAdmin
RewriteRule phpMyAdmin - [F]
# Remove index.php or index.htm/html from URL requests
#RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(([^/]+/)*)index\.(php|html?)\ HTTP/
#RewriteCond %{REQUEST_URI} !^/administrator
#RewriteRule ^([^/]+/)*index\.(html?|php)$ http://www.mysite.com/$1 [R=301,L]
# force canonical www if request is for non-www or has port number etc
RewriteCond %{HTTP_HOST} !^(www\.mysite\.com)?$
RewriteRule (.*) http://www.mysite.com/$1 [R=301,L]
redirect 301 /home.html http://www.mysite.com/
RewriteCond %{QUERY_STRING} mosConfig_[a-zA-Z_]{1,21}(=|%3D) [OR]
RewriteCond %{QUERY_STRING} base64_encode[^(]*\([^)]*\) [OR]
RewriteCond %{QUERY_STRING} (<|%3C)([^s]*s)+cript.*(>|%3E) [NC,OR]
RewriteCond %{QUERY_STRING} GLOBALS(=|\[|%[0-9A-Z]{0,2}) [OR]
RewriteCond %{QUERY_STRING} _REQUEST(=|\[|%[0-9A-Z]{0,2})
RewriteRule .* index.php [F]
#RewriteBase /
RewriteRule .* - [E=HTTP_AUTHORIZATION:%{HTTP:Authorization}]
RewriteCond %{REQUEST_URI} !^/index\.php
RewriteCond %{REQUEST_URI} (/[^.]*|\.(php|html?|feed|pdf|raw))$ [NC]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . index.php [L]
-
Hi Emory,
A simple solution would be to redirect from index.php to the root in .htaccess, using the rule below. Let us know how this works for you.
RewriteRule ^(.*)index.(html|php)$ http://%{HTTP_HOST}/$1 [R=301,L]
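One thing to check with this rule: the pattern ends in `index.(html|php)$`, so it matches URLs that end in index.php, while the duplicates in question have the form /index.php/category/article. In a per-directory .htaccess it can also match the internally rewritten index.php request unless a condition on the original request is added. Here is a hedged sketch of a variant that covers the path form. It assumes the .htaccess file sits at the web root, that www.mysite.com is the canonical host, and that redirecting only GET/HEAD requests is acceptable; those are assumptions for illustration, not something confirmed for this site:

```apache
# Sketch only, untested against this site. Place it above the final
# "RewriteRule . index.php [L]" block.
# THE_REQUEST holds the request line as the client sent it, so the
# internal rewrite to index.php does not trigger the redirect.
# Limiting it to GET/HEAD leaves form POSTs to index.php (logins, search) alone.
RewriteCond %{THE_REQUEST} ^(GET|HEAD)\s/index\.php(/[^\s?]*)?[\s?]
# $2 is whatever follows /index.php/ (empty for a bare /index.php)
RewriteRule ^index\.php(/(.*))?$ http://www.mysite.com/$2 [R=301,L]
```

With something like this in place, a request for /index.php/category/article should answer with a 301 to http://www.mysite.com/category/article. Any query string is carried over by default. As with any rewrite change, test logins, search and plugins afterwards.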