Duplicate content /index.php/ issues
-
I'm having some duplicate content issues with Google. I've already got my .htaccess file working just fine as far as I can tell. Rewriting works great, and by using the site you'd never end up on a page with /index.php. However I do notice that on ANY page of the site you could add /index.php and get the same page i.e.:
www.mysite.com/category/article
and
www.mysite.com/index.php/category/article
Would both return the same page. How can I 301 or something similar all /index.php pages to the non index.php version? I have no desire for any page on my site to have index.php in it, there is no use to it. Having quite the hard time figuring this out.
Again this is basically just for the robots, the URL's the users see are perfect, never had an issue with that. Just SEOMOZ reporting duplicate content and I've verified that to be true.
-
Hey Emory - if that's the default .htaccess file your software created (assume this is a Joomla-based site?), it looks like the redirect code you need is already there, but it is disabled by default.
The following code
Remove index.php or index.htm/html from URL requests
#RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(([^/]+/)*)index.(php|html?)\ HTTP/
#RewriteCond %{REQUEST_URI} !^/administrator
#RewriteRule ^([^/]+/)*index.(html?|php)$ http://www.mysite.com/$1 [R=301,L]should do what you want, The reason its not currently doing anything is because it has been commented out. The "#" symbol at the beginning of each line tells the server NOT to run the code in that line.
Try removing the "#" symbol in front of the last three lines of that code, save the file & then thoroughly test your site. (It's not the way I would write it, but there may be specific requirements for your site/system) The first line is just a descriptive header, so the "#" symbol needs to be left on it.
If for any reason it causes problems, you can simply re-add the "#" symbols and re-save to return the site to its original state.
Give that a shot and let us know if it accomplishes what you want to do.
Paul
P.S. In particular when testing - ensure that client logins work correctly, and that the search function and all plugins also still work.
-
Any ideas/input?
-
Tried that in many ways, but can't get it working. Here is a copy of the .htaccess file, what changes would need to be made (clearly input that code):
Options +FollowSymLinks
RewriteEngine On
prevents people from accessing anything with phpMyAdmin
RewriteRule phpMyAdmin - [F]
Remove index.php or index.htm/html from URL requests
#RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(([^/]+/)*)index.(php|html?)\ HTTP/
#RewriteCond %{REQUEST_URI} !^/administrator
#RewriteRule ^([^/]+/)*index.(html?|php)$ http://www.mysite.com/$1 [R=301,L]force canonical www if request is for non-www or has port number etc
RewriteCond %{HTTP_HOST} !^(www.mysite.com)?$
RewriteRule (.*) http://www.mysite.com/$1 [R=301,L]redirect 301 /home.html http://www.mysite.com/
RewriteCond %{QUERY_STRING} mosConfig_[a-zA-Z_]{1,21}(=|%3D) [OR]
RewriteCond %{QUERY_STRING} base64_encode[^(]([^)]) [OR]
RewriteCond %{QUERY_STRING} (<|%3C)([^s]s)+cript.(>|%3E) [NC,OR]
RewriteCond %{QUERY_STRING} GLOBALS(=|[|%[0-9A-Z]{0,2}) [OR]
RewriteCond %{QUERY_STRING} _REQUEST(=|[|%[0-9A-Z]{0,2})
RewriteRule .* index.php [F]#RewriteBase /
RewriteRule .* - [E=HTTP_AUTHORIZATION:%{HTTP:Authorization}]
RewriteCond %{REQUEST_URI} !^/index.php
RewriteCond %{REQUEST_URI} (/[^.]|.(php|html?|feed|pdf|raw))$ [NC]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . index.php [L] -
Hi Emory,
Simple solution would be to redirect to root from the index.php using htaccess using the rule below. Lets us know how this works for you
RewriteRule ^(.*)index.(html|php)$ http://%{HTTP_HOST}/$1 [R=301,L]
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Historic issue with incomplete indexing
Hi there We run quite a big site in the UK in the commercial real-estate space. Historically we have always had a challenge getting our "primary" landing pages indexed, which are location based property result pages. e.g. https://realla.co/to-rent/commercial-property/oxford For example, for the "towns" category we have 8,549 submitted in our xml sitemap, with only 3,171 indexed. This is a general issue across all our sitemaps. 120k submitted, 80k indexed. Our pages are linked through breadcrumbs, and nearby links. In the new search console these pages are reported as "crawled - currently not indexed" These all sit under the folder: site:https://realla.co/to-rent/commercial-property/* site:https://realla.co/to-rent/office/* We have done extensive work to optimise performance, including AMP pages. Each location page has many details pages for individual properties e.g. https://realla.co/to-rent/details/0ffbbd0a1a1147edb8847c5ce6179509 One action we have remaining is to nest the details under the locations pages, which may help. These details pages are indexed fully. Any feedback much appreciated
Technical SEO | | ianparryuk0 -
Home page duplicate content...
Hello all! I've just downloaded my first Moz crawl CSV and I noticed that the home page appears twice - one with an appending forward slash at the end: http://www.example.com
Technical SEO | | LiamMcArthur
http://www.example.com/ For any of my product and category pages that encounter this problem - it's automatically resolved with a canonical tag. Should I create the same canonical tag for my home page? rel="canonical" href="http://www.example.com" />0 -
Hreflang and possible duplicate content SEO issue
| 0 <a class="vote-down-off" title="This question does not show any research effort; it is unclear or not useful">down vote</a> favorite | Hey community, my first question here 🙂 Imagine there is a page with video, it has hreflang tags setup, to lead let's say German visitors to /de/ folder... So, on that German version of page, everything like menus, navigation and such are in German, but the video is the same, the title of the video (H1 tag) is the same, <title></code></strong> and <strong><code>meta description</code></strong> is the same as on the original English page. It means that general (English) page and German version of it has the same key content in English.</p> <p>To me it seems to be a SEO duplicate content issue. As I know, Google doesn't think that content is duplicate, if it is properly translated to other language.</p> <p>Does my explained case mean that the content will be detected by Google as duplicate?</p> </div> </div> </td> </tr> </tbody> </table></title> |
Technical SEO | | poiseo0 -
Content Duplication - Zencart
Hi Guys !!! Based on crawler results, it shows that I have 188 duplicate content pages, out of which some are those in which I am not able to understand where the duplication is ??? The page created is unique. All the URL's are static, all titles, metat tags are unique. How do I remove this duplication !!! I am using Zencart as a platform. Thanks in advance for the help !!! 🙂
Technical SEO | | sidjain4you0 -
Duplicate Content - Products
When running a report it says we have lots of duplicate content. We are a e-commerce site that has about 45,000 sku's on the site. Products can be in multiple departments on the site. So the same products can show up on different pages of the site. Because of this the reports show multiple products with duplicate content. Is this an issue with google and site ranking? Is there a way to get around this issue?
Technical SEO | | shoedog1 -
Duplicate content
I have two page, where the second makes a duplicate content from the first Example:www.mysite.com/mypagewww.mysite.com/mysecondpageIf i insert still making duplicate content?Best regards,Wendel
Technical SEO | | peopleinteractive0 -
Duplicate Content - Just how killer is it?
Yesterday I received my ranking report and was extremely disappointed that my high-priority pages dropped in rank for a second week in a row for my targeted keywords. This is after running them through the gradecard and getting As for each of them on the keywords I wanted. I looked at my google webmaster tools and saw new duplicate content pages listed, which were the ones I had just modified to get my keyword targeting better. In my hastiness to work on getting the keyword usage up, I neglected to prevent these descriptions from coming up when viewing the page with filter parameters, sort parameters and page parameters... so google saw these descriptions as duplicate content (since myurl.html and myurl.html?filter=blah are seen as different). So my question: is this the likely culprit for some pretty drastic hits to ranking? I've fixed this now, but are there any ways to prevent this in the future? (I know _of _canonical tags, but have never used them, and am not sure if this applies in this situation) Thanks! EDIT: One thing I forgot to ask as well: has anyone inflicted this upon themselves? And how long did it take you to recover?
Technical SEO | | Ask_MMM0 -
Need help with Joomla duplicate content issues
One of my campaigns is for a Joomla site (http://genesisstudios.com) and when my full crawl was done and I review the report, I have significant duplicate content issues. They seem to come from the automatic creation of /rss pages. For example: http://www.genesisstudios.com/loose is the page but the duplicate content shows up as http://www.genesisstudios.com/loose/rss It appears that Joomla creates feeds for every page automatically and I'm not sure how to address the problem they create. I have been chasing down duplicate content issues for some time and thought they were gone, but now I have about 40 more instances of this type. It also appears that even though there is a canonicalization plugin present and enabled, the crawl report shows 'false' for and rel= canonicalization tags Anyone got any ideas? Thanks so much... Scott | |
Technical SEO | | sdennison0