Duplicate Content From Indexing of non- File Extension Page
-
Google somehow has indexed a page of mine without the .html extension. so they indexed www.samplepage.com/page, so I am showing duplicate content because Google also see's www.samplepage.com/page.html How can I force google or bing or whoever to only index and see the page including the .html extension? I know people are saying not to use the file extension on pages, but I want to, so please anybody...HELP!!!
-
Yeah I looked further into the URL removal, but I guess technically I did not meet the criteria....and honestly I am fearful other potential implications of removal....I guess I will just have to wait for the 301 to ick in. I just cant believe there is not a simple .htaccess code to cause all URL's to show the .html extension. I mean it is a simple thing to implement the reverse and have the extension dropped...I mean....good lord...
Thanks for all your help though Mike, I truly appreciate the efforts!
-
LAME! You may just want to let the 301 redirect you have in place take its course or remove the URL from Google's index since it was added by mistake anyway.
Mike
-
Nope. .....good lord....
-
Nope.
-
If that does not work, give this a whirl:
RewriteCond %{REQUEST_URI} !\.[a-zA-Z0-9]{3,4}
RewriteCond %{REQUEST_URI} !/$
RewriteRule ^(.*)$ $1.html
-
Try:
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^([^.]*[^./])$ /$1.html [R=301,L] -
That caused the same "500 Internal Server Error" .......
-
Try my code without all the other redirects, and see if it works. If it does, then add back the other redirects one by one until everything works.
-
Oh, and my site auditor is seeing it as a directory with a file in it??? Ugghhh....
-
Nope. Didn't work. I am seriously about to lose my mind with this....
-
Maybe give this a whirl:
If URL does not contain a period or end with a slash
RewriteCond %{REQUEST_URI} !(.|/$)
append .html to requested URL
RewriteRule (.*) /$1.html [L]
-
I get a server error when I do this? Sooo confused... Here is the htaccess changes I made. FYI...I have removed the code you told me to put in there temporarily so the site's not down. I attached the server error screenshot too...
Options +FollowSymlinks
RewriteEngine OnRewriteCond %{REQUEST_URI} ! .html$
RewriteCond %{REQUEST_URI} ! /$
RewriteRule ^(.*)$ $1.htmlRewriteCond %{HTTP_HOST} ^hanneganconstructionllc.com [NC]
RewriteRule ^(.*)$ http://hanneganremodeling.com/$1 [L,R=301]RewriteCond %{HTTP_HOST} ^www.hanneganconstructionllc.com [NC]
RewriteRule ^(.*)$ http://hanneganremodeling.com/$1 [L,R=301]RewriteCond %{HTTP_HOST} ^hremodeling.com [NC]
RewriteRule ^(.*)$ http://hanneganremodeling.com/$1 [L,R=301]RewriteCond %{HTTP_HOST} ^www.hremodeling.com [NC]
RewriteRule ^(.*)$ http://hanneganremodeling.com/$1 [L,R=301]RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index.html\ HTTP/
RewriteRule ^index.html$ http://www.hanneganremodeling.com/ [R=301,L]RewriteBase /
RewriteCond %{HTTP_HOST} ^hanneganremodeling.com$ [NC]
RewriteRule ^(.*)$ http://www.hanneganremodeling.com/$1 [R=301,L] -
You repeat this code a few times, maybe that's the problem? Pretty sure you only need it once:
RewriteEngine On
Options +FollowSymlinks
RewriteBase /The line:
RewriteEngine On
Also only needs to be included once in an htaccess file. You may want to remove all the other instances.
Try adding this code at the very top, after the first "RewriteEngine On":
RewriteCond %{REQUEST_URI} ! .html$
RewriteCond %{REQUEST_URI} ! /$
RewriteRule ^(.*)$ $1.html -
Thanks Mike, you are awesome! I actually was thinking to do that, but I was concerned that it might have some larger implications?
I also just resubmitted a sitemap so hopefully that "might" speed up the crawl process...
Thanks again!
-
"I accidentally manually submitted the url to google and manually in submitted it to index and that when this issue began...."
It sounds like you accidently added this URL to the index. You can follow the procedure outlined below to request Google remove the specific URL from the index:
https://support.google.com/webmasters/bin/answer.py?hl=en&answer=59819
I checked your site's structure using Screaming Frog and it does not appear that you are linking to any non-.html versions. If I perform a scan using one of your non-.html pages, it appears that it only links to itself.
Since you have the 301 redirect in place, you can choose to wait it out and Google should correct things eventually; otherwise, requesting Google remove the URL is a faster... PERMANENT process.
Good luck.
Mike
-
No it's not a wordpress, it was created with Dreamweaver. I didn't make sample and sample.html same page, but google is treating it that way.... I have implemented the 301, so I guess I just have to wait for a crawl
-
Thank you very for your input! When I implement into my .htacces what you suggested I get a "Internet 500 Server Error" ? Maybe it would help if I list what I currently have in my .htaccess I had to redirect some old domains and did canonical redirects and default non .index....I hope this help, I am at my wit's end... I also attached a screenshot of the webmaster warning... THANKS!!!
Options +FollowSymlinks
RewriteEngine OnRewriteCond %{HTTP_HOST} ^hanneganconstructionllc.com [NC]
RewriteRule ^(.*)$ http://hanneganremodeling.com/$1 [L,R=301]RewriteCond %{HTTP_HOST} ^www.hanneganconstructionllc.com [NC]
RewriteRule ^(.*)$ http://hanneganremodeling.com/$1 [L,R=301]RewriteCond %{HTTP_HOST} ^hremodeling.com [NC]
RewriteRule ^(.*)$ http://hanneganremodeling.com/$1 [L,R=301]RewriteCond %{HTTP_HOST} ^www.hremodeling.com [NC]
RewriteRule ^(.*)$ http://hanneganremodeling.com/$1 [L,R=301]RewriteEngine on
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index.html\ HTTP/
RewriteRule ^index.html$ http://www.hanneganremodeling.com/ [R=301,L]RewriteEngine On
Options +FollowSymlinks
RewriteBase /
RewriteCond %{HTTP_HOST} ^hanneganremodeling.com$ [NC]
RewriteRule ^(.*)$ http://www.hanneganremodeling.com/$1 [R=301,L]Options +FollowSymLinks
RewriteEngine On
RewriteBase / -
Is this a wordpress based site ? What CMS are you using ? How were you able to get domain.com/sample and domain.com/sample.html be the same page ? Either way, canonical tag is the correct solution in this case. There's no need for a 301 and if you do 301 redirects, you are not really fixing the issue caused by your CMS System.
I would therefore strongly advise to use the canonical tag. That's the intended use of that tag.
-
A canonical tag won't physically redirect you when you visit the page, it just lets the search engines know which is the right page to index.
If you want to actually redirect using .htaccess, try using this code
RewriteEngine On
RewriteCond %{REQUEST_URI} ! .html$
RewriteCond %{REQUEST_URI} ! /$
RewriteRule ^(.*)$ $1.html
-
I tried the canonical and when I enter the url without the .html, it doesn't resolve to the url with the .html extension. I tried an .htaccess reirect...I am stumped, I can't get it to redirect automatically the the .html I accidentally manually submitted the url to google and manually in submitted it to index and that when this issue began....
-
Add a canonical tag to your header so that Google/Bing knows which version of your page they should be indexing.
You can also try looking into where the link to the non-html page is coming from. If it's an internal link, just change it so that Google doesn't continue to crawl it.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Duplicate Content with URL Parameters
Moz is picking up a large quantity of duplicate content, consists mainly of URL parameters like ,pricehigh & ,pricelow etc (for page sorting). Google has indexed a large number of the pages (not sure how many), not sure how many of them are ranking for search terms we need. I have added the parameters into Google Webmaster tools And set to 'let google decide', However Google still sees it as duplicate content. Is it a problem that we need to address? Or could it do more harm than good in trying to fix it? Has anyone had any experience? Thanks
Intermediate & Advanced SEO | | seoman100 -
Duplicate Content Issues :(
I am wondering how we can solve our duplicate content issues. Here is the thing: There are so many ways you can write a description about a used watch. http://beckertime.com/product/mens-rolex-air-king-no-date-stainless-steel-watch-wsilver-dial-5500/ http://beckertime.com/product/mens-rolex-air-king-stainless-steel-date-watch-wblue-dial-5500/ Whats different between these two? The dial color. We have a lot of the same model numbers but with different conditions, dial colors, and bands.. What ideas do you have?
Intermediate & Advanced SEO | | KingRosales0 -
More Indexed Pages than URLs on site.
According to webmaster tools, the number of pages indexed by Google on my site doubled yesterday (gone from 150K to 450K). Usually I would be jumping for joy but now I have more indexed pages than actual pages on my site. I have checked for duplicate URLs pointing to the same product page but can't see any, pagination in category pages doesn't seem to be indexed nor does parameterisation in URLs from advanced filtration. Using the site: operator we get a different result on google.com (450K) to google.co.uk (150K). Anyone got any ideas?
Intermediate & Advanced SEO | | DavidLenehan0 -
How best to handle (legitimate) duplicate content?
Hi everyone, appreciate any thoughts on this. (bit long, sorry) Am working on 3 sites selling the same thing...main difference between each site is physical location/target market area (think North, South, West as an example) Now, say these 3 sites all sell Blue Widgets, and thus all on-page optimisation has been done for this keyword. These 3 sites are now effectively duplicates of each other - well the Blue Widgets page is at least, and whist there are no 'errors' in Webmaster Tools am pretty sure they ought to be ranking better than they are (good PA, DA, mR etc) Sites share the same template/look and feel too AND are accessed via same IP - just for good measure 🙂 So - to questions/thoughts. 1 - Is it enough to try and get creative with on-page changes to try and 'de-dupe' them? Kinda tricky with Blue Widgets example - how many ways can you say that? I could focus on geographical element a bit more, but would like to rank well for Blue Widgets generally. 2 - I could, i guess, no-index, no-follow, blue widgets page on 2 of the sites, seems a bit drastic though. (or robots.txt them) 3 - I could even link (via internal navigation) sites 2 and 3 to site 1 Blue Widgets page and thus make 2 blue widget pages redundant? 4 - Is there anything HTML coding wise i could do to pull in Site 1 content to sites 2 and 3, without cloaking or anything nasty like that? I think 1- is first thing to do. Anything else? Many thanks.
Intermediate & Advanced SEO | | Capote0 -
How much (%) of the content of a page is considered too much duplication?
Google is not fond of duplication, I have been very kindly told. So how much would you suggest is too much?
Intermediate & Advanced SEO | | simonberenyi0 -
Is there a way to stop my product pages with the "show all" catagory/attribute from duplicating content?
If there were less pages with the "show all" attribute it would be a simple fix by adding the canonical URL tag. But seeing that there are about 1,000 of them I was wondering if their was a broader fix that I could apply.
Intermediate & Advanced SEO | | cscoville0 -
Pages with Little Content
I have a website that lists events in Dublin, Ireland. I want to provide a comprehensive number of listings but there are not enough hours in the day to provide a detailed (or even short) unique description for every event. At the moment I have some pages with little detail other than the event title and venue. Should I try and prevent Google from crawling/indexing these pages for fear of reducing the overall ranking of the site? At the moment I only link to these pages via the RSS feed. I could remove the pages entirely from my feed, but then that mean I remove information that might be useful to people following the events feed. Here is an example page with very little content
Intermediate & Advanced SEO | | andywozhere0 -
Mobile Site - Same Content, Same subdomain, Different URL - Duplicate Content?
I'm trying to determine the best way to handle my mobile commerce site. I have a desktop version and a mobile version using a 3rd party product called CS-Cart. Let's say I have a product page. The URLs are... mobile:
Intermediate & Advanced SEO | | grayloon
store.domain.com/index.php?dispatch=categories.catalog#products.view&product_id=857 desktop:
store.domain.com/two-toned-tee.html I've been trying to get information regarding how to handle mobile sites with different URLs in regards to duplicate content. However, most of these results have the assumption that the different URL means m.domain.com rather than the same subdomain with a different address. I am leaning towards using a canonical URL, if possible, on the mobile store pages. I see quite a few suggesting to not do this, but again, I believe it's because they assume we are just talking about m.domain.com vs www.domain.com. Any additional thoughts on this would be great!0