Duplicate content /index.php/ issues
-
I'm having some duplicate content issues with Google. I've already got my .htaccess file working just fine as far as I can tell. Rewriting works great, and by using the site you'd never end up on a page with /index.php. However, I do notice that on ANY page of the site you could add /index.php and get the same page, i.e.:
www.mysite.com/category/article
and
www.mysite.com/index.php/category/article
Would both return the same page. How can I 301 (or something similar) all /index.php pages to the non-index.php version? I have no desire for any page on my site to have index.php in it; there is no use for it. I'm having quite a hard time figuring this out.
Again, this is basically just for the robots; the URLs the users see are perfect, and I've never had an issue with that. It's just SEOMOZ reporting duplicate content, and I've verified that to be true.
-
Hey Emory - if that's the default .htaccess file your software created (I assume this is a Joomla-based site?), it looks like the redirect code you need is already there, but it is disabled by default.
The following code
# Remove index.php or index.htm/html from URL requests
#RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(([^/]+/)*)index\.(php|html?)\ HTTP/
#RewriteCond %{REQUEST_URI} !^/administrator
#RewriteRule ^([^/]+/)*index\.(html?|php)$ http://www.mysite.com/$1 [R=301,L]
should do what you want. The reason it's not currently doing anything is that it has been commented out. The "#" symbol at the beginning of each line tells the server NOT to run the code in that line.
Try removing the "#" symbol in front of the last three lines of that code, save the file, and then thoroughly test your site. (It's not the way I would write it, but there may be specific requirements for your site/system.) The first line is just a descriptive header, so the "#" symbol needs to be left on it.
If for any reason it causes problems, you can simply re-add the "#" symbols and re-save to return the site to its original state.
Give that a shot and let us know if it accomplishes what you want to do.
Paul
P.S. In particular when testing - ensure that client logins work correctly, and that the search function and all plugins also still work.
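For reference, here is a minimal sketch of how that block would read once the three "#" symbols are removed. It assumes the stock rules quoted above, with www.mysite.com standing in for the real host name; nothing here has been tested against the actual site.

```apache
# Remove index.php or index.htm/html from URL requests
# (this header line stays commented; the three directives below are now active)
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(([^/]+/)*)index\.(php|html?)\ HTTP/
RewriteCond %{REQUEST_URI} !^/administrator
RewriteRule ^([^/]+/)*index\.(html?|php)$ http://www.mysite.com/$1 [R=301,L]
```

Note that both patterns stop right after the file extension, so as written they match URLs that end in index.php (for example /index.php or /category/index.php). A URL with a path after index.php, such as /index.php/category/article, would not be matched by these lines and would likely need its own rule.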
-
Any ideas/input?
-
Tried that in many ways, but can't get it working. Here is a copy of the .htaccess file. What changes would need to be made (clearly, inputting that code)?
Options +FollowSymLinks
RewriteEngine On
# prevents people from accessing anything with phpMyAdmin
RewriteRule phpMyAdmin - [F]
# Remove index.php or index.htm/html from URL requests
#RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(([^/]+/)*)index\.(php|html?)\ HTTP/
#RewriteCond %{REQUEST_URI} !^/administrator
#RewriteRule ^([^/]+/)*index\.(html?|php)$ http://www.mysite.com/$1 [R=301,L]
# force canonical www if request is for non-www or has port number etc
RewriteCond %{HTTP_HOST} !^(www\.mysite\.com)?$
RewriteRule (.*) http://www.mysite.com/$1 [R=301,L]
redirect 301 /home.html http://www.mysite.com/
RewriteCond %{QUERY_STRING} mosConfig_[a-zA-Z_]{1,21}(=|%3D) [OR]
RewriteCond %{QUERY_STRING} base64_encode[^(]*\([^)]*\) [OR]
RewriteCond %{QUERY_STRING} (<|%3C)([^s]*s)+cript.*(>|%3E) [NC,OR]
RewriteCond %{QUERY_STRING} GLOBALS(=|\[|%[0-9A-Z]{0,2}) [OR]
RewriteCond %{QUERY_STRING} _REQUEST(=|\[|%[0-9A-Z]{0,2})
RewriteRule .* index.php [F]
#RewriteBase /
RewriteRule .* - [E=HTTP_AUTHORIZATION:%{HTTP:Authorization}]
RewriteCond %{REQUEST_URI} !^/index\.php
RewriteCond %{REQUEST_URI} (/[^.]*|\.(php|html?|feed|pdf|raw))$ [NC]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule .* index.php [L]
-
Hi Emory,
A simple solution would be to redirect index.php to the root using the .htaccess rule below. Let us know how this works for you.
RewriteRule ^(.*)index\.(html|php)$ http://%{HTTP_HOST}/$1 [R=301,L]
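A rule whose pattern stops at the file extension only matches URLs that end in index.php, so the /index.php/category/article form from the original question may need something along these lines. This is a sketch under assumptions, not a tested fix: it assumes Apache mod_rewrite in the site's root .htaccess, that www.mysite.com is the canonical host, and that the block is placed above the rules that internally rewrite requests to index.php.

```apache
# Sketch only: 301 an externally requested /index.php or /index.php/...
# to the same URL without index.php.
# THE_REQUEST holds the original request line, so the internal rewrite to
# index.php further down the file does not re-trigger this rule and loop.
# Only GET and HEAD are matched, so forms that POST to index.php are left alone.
RewriteCond %{THE_REQUEST} ^(GET|HEAD)\ /index\.php[/?\ ]
# $1 is whatever followed /index.php/ (empty for a bare /index.php).
RewriteRule ^index\.php(?:/(.*))?$ http://www.mysite.com/$1 [R=301,L]
```

With this in place, a request for http://www.mysite.com/index.php/category/article should answer with a 301 to http://www.mysite.com/category/article, which can be confirmed with any HTTP header checker. The GET/HEAD restriction is a deliberate choice: crawlers use those methods, while a redirect on POST would drop submitted form data, which is one reason to re-test logins and search after any change like this.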