Duplicate Content Issue
-
Why do URL with .html or index.php at the end are annoying to the search engine? I heard it can create some duplicate content but I have no idea why? Could someone explain me why is that so?
Thank you
-
Using this as an example: If google showed duplicate content for http://www.domain.com/seo.html and http://www.domain.com/seo then you would want a 301 .htaccess redirect so anyone accessing the http://www.domain.com/seo.htm version of the page is automatically sent to http://www.domain.com/seo.
You can use the following in your .htaccess file.
RewriteRule ^/seo.html$ http://www.domain.com/seo/? [R=301,NC,L]
The benefit to having no file extension (.html) on the end of your URLs is it allows you to change the underlying framework or your website or even the programming language without the need for adding redirects each time you change (provided the website structure remains the same)
-
If there is the same page with the urls domain.com/seo & domain.com/seo.htm for this is what can be considered as duplicate content. With or without (better without) there should only be one version of the URL, to not split the link juice passing through. Hope this helps, Vahe
-
Agreed
-
There is also the problem with http://www.example.com and http:/example.com - they also generate duplicate content.
What you should do?
First - solve the http://www.example vs http://example issue
Edit your .htaccess file (if you don't have it, create one). Inside this file you need to input:RewriteEngine On RewriteCond %{HTTP_HOST} ^example.com RewriteRule (.*) http://www.example.com/$1 [R=301,L]
(where example.com is your websiteSecond - solve the index.html duplicate content
You need to include in the index.html file the follow metag inside the section:
This tag will tell google to forget the index.html and focus at www.example.com. So you will avoid the duplicate content without any problem
I hope I could help you.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Content Strategy/Duplicate Content Issue, rel=canonical question
Hi Mozzers: We have a client who regularly pays to have high-quality content produced for their company blog. When I say 'high quality' I mean 1000 - 2000 word posts written to a technical audience by a lawyer. We recently found out that, prior to the content going on their blog, they're shipping it off to two syndication sites, both of which slap rel=canonical on them. By the time the content makes it to the blog, it has probably appeared in two other places. What are some thoughts about how 'awful' a practice this is? Of course, I'm arguing to them that the ranking of the content on their blog is bound to be suffering and that, at least, they should post to their own site first and, if at all, only post to other sites several weeks out. Does anyone have deeper thinking about this?
Intermediate & Advanced SEO | | Daaveey0 -
Duplicate content. Competing for rank.
Scenario: An automotive dealer lists cars for sale on their website. The descriptions are very good and in depth at 1,200 words per car. However chunks of the copy are copied from car review websites and weaved into their original copy. Q1: This is flagged in copyscape - how much of an issue is this for Google? Q2: The same stock with the same copy is fed into a popular car listing website - the dealer's website and the classifieds website often rank in the top two positions (sometimes the dealer on top other times the classifieds site). Is this a good or a bad thing? Are you risking being seen as duplicating/scraping content? Thank you.
Intermediate & Advanced SEO | | Bee1590 -
Case Sensitive URLs, Duplicate Content & Link Rel Canonical
I have a site where URLs are case sensitive. In some cases the lowercase URL is being indexed and in others the mixed case URL is being indexed. This is leading to duplicate content issues on the site. The site is using link rel canonical to specify a preferred URL in some cases however there is no consistency whether the URLs are lowercase or mixed case. On some pages the link rel canonical tag points to the lowercase URL, on others it points to the mixed case URL. Ideally I'd like to update all link rel canonical tags and internal links throughout the site to use the lowercase URL however I'm apprehensive! My question is as follows: If I where to specify the lowercase URL across the site in addition to updating internal links to use lowercase URLs, could this have a negative impact where the mixed case URL is the one currently indexed? Hope this makes sense! Dave
Intermediate & Advanced SEO | | allianzireland0 -
How would you handle this duplicate content - noindex or canonical?
Hello Just trying look at how best to deal with this duplicated content. On our Canada holidays page we have a number of holidays listed (PAGE A)
Intermediate & Advanced SEO | | KateWaite
http://www.naturalworldsafaris.com/destinations/north-america/canada/suggested-holidays.aspx We also have a more specific Arctic Canada holidays page with different listings (PAGE B)
http://www.naturalworldsafaris.com/destinations/arctic-and-antarctica/arctic-canada/suggested-holidays.aspx Of the two, the Arctic Canada page (PAGE B) receives a far higher number of visitors from organic search. From a user perspective, people expect to see all holidays in Canada (PAGE A), including the Arctic based ones. We can tag these to appear on both, however it will mean that the PAGE B content will be duplicated on PAGE A. Would it be the best idea to set up a canonical link tag to stop this duplicate content causing an issue. Alternatively would it be best to no index PAGE A? Interested to see others thoughts. I've used this (Jan 2011 so quite old) article for reference in case anyone else enters this topic in search of information on a similar thing: Duplicate Content: Block, Redirect or Canonical - SEO Tips0 -
Why is Google Reporting big increase in duplicate content after Canonicalization update?
Our web hosting company recently applied a update to our site that should have rectified Canonicalized URLs. Webmaster tools had been reporting duplicate content on pages that had a query string on the end. After the update there has been a massive jump in Webmaster tools reporting now over 800 pages of duplicate content, Up from about 100 prior to the update plus it reporting some very odd pages (see attached image) They claim they have implement Canonicalization in line with Google Panda & Penguin, but surely something is not right here and it's going to cause us a big problem with traffic. Can anyone shed any light on the situation??? Duplicate%20Content.jpg
Intermediate & Advanced SEO | | Towelsrus0 -
Fixing Duplicate Content Errors
SEOMOZ Pro is showing some duplicate content errors and wondered the best way to fix them other than re-writing the content. Should I just remove the pages found or should I set up permanent re-directs through to the home page in case there is any link value or visitors on these duplicate pages? Thanks.
Intermediate & Advanced SEO | | benners0 -
Subdomains - duplicate content - robots.txt
Our corporate site provides MLS data to users, with the end goal of generating leads. Each registered lead is assigned to an agent, essentially in a round robin fashion. However we also give each agent a domain of their choosing that points to our corporate website. The domain can be whatever they want, but upon loading it is immediately directed to a subdomain. For example, www.agentsmith.com would be redirected to agentsmith.corporatedomain.com. Finally, any leads generated from agentsmith.easystreetrealty-indy.com are always assigned to Agent Smith instead of the agent pool (by parsing the current host name). In order to avoid being penalized for duplicate content, any page that is viewed on one of the agent subdomains always has a canonical link pointing to the corporate host name (www.corporatedomain.com). The only content difference between our corporate site and an agent subdomain is the phone number and contact email address where applicable. Two questions: Can/should we use robots.txt or robot meta tags to tell crawlers to ignore these subdomains, but obviously not the corporate domain? If question 1 is yes, would it be better for SEO to do that, or leave it how it is?
Intermediate & Advanced SEO | | EasyStreet0 -
Should I robots block site directories with primarily duplicate content?
Our site, CareerBliss.com, primarily offers unique content in the form of company reviews and exclusive salary information. As a means of driving revenue, we also have a lot of job listings in ouir /jobs/ directory, as well as educational resources (/career-tools/education/) in our. The bulk of this information are feeds, which exist on other websites (duplicate). Does it make sense to go ahead and robots block these portions of our site? My thinking is in doing so, it will help reallocate our site authority helping the /salary/ and /company-reviews/ pages rank higher, and this is where most of the people are finding our site via search anyways. ie. http://www.careerbliss.com/jobs/cisco-systems-jobs-812156/ http://www.careerbliss.com/jobs/jobs-near-you/?l=irvine%2c+ca&landing=true http://www.careerbliss.com/career-tools/education/education-teaching-category-5/
Intermediate & Advanced SEO | | CareerBliss0