Duplicate content /index.php/ issues
-
I'm having some duplicate content issues with Google. I've already got my .htaccess file working just fine as far as I can tell. Rewriting works great, and by using the site you'd never end up on a page with /index.php. However I do notice that on ANY page of the site you could add /index.php and get the same page i.e.:
www.mysite.com/category/article
and
www.mysite.com/index.php/category/article
Would both return the same page. How can I 301 or something similar all /index.php pages to the non index.php version? I have no desire for any page on my site to have index.php in it, there is no use to it. Having quite the hard time figuring this out.
Again this is basically just for the robots, the URL's the users see are perfect, never had an issue with that. Just SEOMOZ reporting duplicate content and I've verified that to be true.
-
Hey Emory - if that's the default .htaccess file your software created (assume this is a Joomla-based site?), it looks like the redirect code you need is already there, but it is disabled by default.
The following code
Remove index.php or index.htm/html from URL requests
#RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(([^/]+/)*)index.(php|html?)\ HTTP/
#RewriteCond %{REQUEST_URI} !^/administrator
#RewriteRule ^([^/]+/)*index.(html?|php)$ http://www.mysite.com/$1 [R=301,L]should do what you want, The reason its not currently doing anything is because it has been commented out. The "#" symbol at the beginning of each line tells the server NOT to run the code in that line.
Try removing the "#" symbol in front of the last three lines of that code, save the file & then thoroughly test your site. (It's not the way I would write it, but there may be specific requirements for your site/system) The first line is just a descriptive header, so the "#" symbol needs to be left on it.
If for any reason it causes problems, you can simply re-add the "#" symbols and re-save to return the site to its original state.
Give that a shot and let us know if it accomplishes what you want to do.
Paul
P.S. In particular when testing - ensure that client logins work correctly, and that the search function and all plugins also still work.
-
Any ideas/input?
-
Tried that in many ways, but can't get it working. Here is a copy of the .htaccess file, what changes would need to be made (clearly input that code):
Options +FollowSymLinks
RewriteEngine On
prevents people from accessing anything with phpMyAdmin
RewriteRule phpMyAdmin - [F]
Remove index.php or index.htm/html from URL requests
#RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(([^/]+/)*)index.(php|html?)\ HTTP/
#RewriteCond %{REQUEST_URI} !^/administrator
#RewriteRule ^([^/]+/)*index.(html?|php)$ http://www.mysite.com/$1 [R=301,L]force canonical www if request is for non-www or has port number etc
RewriteCond %{HTTP_HOST} !^(www.mysite.com)?$
RewriteRule (.*) http://www.mysite.com/$1 [R=301,L]redirect 301 /home.html http://www.mysite.com/
RewriteCond %{QUERY_STRING} mosConfig_[a-zA-Z_]{1,21}(=|%3D) [OR]
RewriteCond %{QUERY_STRING} base64_encode[^(]([^)]) [OR]
RewriteCond %{QUERY_STRING} (<|%3C)([^s]s)+cript.(>|%3E) [NC,OR]
RewriteCond %{QUERY_STRING} GLOBALS(=|[|%[0-9A-Z]{0,2}) [OR]
RewriteCond %{QUERY_STRING} _REQUEST(=|[|%[0-9A-Z]{0,2})
RewriteRule .* index.php [F]#RewriteBase /
RewriteRule .* - [E=HTTP_AUTHORIZATION:%{HTTP:Authorization}]
RewriteCond %{REQUEST_URI} !^/index.php
RewriteCond %{REQUEST_URI} (/[^.]|.(php|html?|feed|pdf|raw))$ [NC]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . index.php [L] -
Hi Emory,
Simple solution would be to redirect to root from the index.php using htaccess using the rule below. Lets us know how this works for you
RewriteRule ^(.*)index.(html|php)$ http://%{HTTP_HOST}/$1 [R=301,L]
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Duplicate content w/ same URLs
I am getting high priority issues for our privacy & terms pages that have the same URL. Why would this show up as duplicate content? Thanks!
Technical SEO | | RanvirGujral0 -
Duplicate Content Brainstorming
Hi, New here in the SEO world. Excellent resources here. We have an ecommerce website that sells presentation templates. Today our templates come in 3 flavours - for PowerPoint, for Keynote and both - called Presentation Templates. So we've ended up with 3 URLS with similar content. Same screenshots, similar description.. Example: https://www.improvepresentation.com/keynote-templates/social-media-keynote-template https://www.improvepresentation.com/powerpoint-templates/social-media-powerpoint-template https://www.improvepresentation.com/presentation-templates/social-media-presentation-template I know what you're thinking. Why not make a website with a template and give 3 download options right? But what about https://www.improvepresentation.com/powerpoint-templates/ https://www.improvepresentation.com/keynote-templates/ These are powerfull URL's in my opinion taking into account that the strongest keyword in our field is "powerpoint templates" How would you solve this "problem" or maybe there is no problem at all.
Technical SEO | | slidescamp0 -
Do mobile and desktop sites that pull content from the same source count as duplicate content?
We are about to launch a mobile site that pulls content from the same CMS, including metadata. They both have different top-level domains, however (www.abcd.com and www.m.abcd.com). How will this affect us in terms of search engine ranking?
Technical SEO | | ovenbird0 -
Does this content get indexed?
A lot of content on this site is displayed in pop up pages. Eg. Visit the Title page http://www.landgate.wa.gov.au/corporate.nsf/web/Certificate+of+Title To access the sample report or fee details, the info is shown in a pop up page with a strange url. Example: http://www.landgate.wa.gov.au/corporate.nsf/web/Certificate+of+Title+-+Fee+Details I can't see any of these pages being indexed in Google or other search engines when I do a site search: http://www.landgate.wa.gov.au/corporate.nsf/web/Certificate+of+Title+-+Fee+Details Is there a way to get this content indexed besides telling the client to restructure this content?
Technical SEO | | Bigheadigital0 -
Duplicate content issue. Delete index.html and replace with www.?
I have a duplicate content issue. On my site the home button goes to the index.html and not the www. If I change it to the www will it impact my SERPS? I don't think anyone links to the index.html.
Technical SEO | | bronxpad1 -
How to use internal tracking without causing duplicate content issues
Hi, We've been testing internal tracking for 4 weeks on a couple of pages using the basic string ?internalcampaign=X, but hese pages have started appearing in the search results. We don't currently have the facility to add canonical tags to correct this. Does anyone have any other solutions to this problem other than deleting the internal tracking or adding filters on the server? Thanks!
Technical SEO | | NSJ780 -
SEO MOZ report showing duplicate content pages with without ending /
Hello the SEOMOZ report is showing me I have a lot of duplicate content and then proceeds listing almost every page on my site as showing with a URL with an ending "/" and without. I checked my sitemap and only one version is there, the one with "/". I have a Wordpress site. Any recommendations ? Thanks.
Technical SEO | | dpaq20110 -
Are recipes excluded from duplicate content?
Does anyone know how recipes are treated by search engines? For example, I know press releases are expected to have lots of duplicates out there so they aren't penalized. Does anyone know if recipes are treated the same way. For example, if you Google "three cheese beef pasta shells" you get the first two results with identical content.
Technical SEO | | RiseSEO0