Duplicate Content and URL Capitalization
-
I have multiple URLs that SEOMoz is reporting as duplicate content. The reason is that there are characters in the URL that may, or may not, be capitalized depending on user input.
A couple examples are:
www.househitz.com/Pennsylvania/Houses-for-sale
www.househitz.com/Pennsylvania/houses-for-sale
www.househitz.com/Pennsylvania/Houses-for-rent
www.househitz.com/Pennsylvania/houses-for-rent
There are currently thousands of instances of this on the site.
Is this something I should spend effort to try and resolve (may not be minor effort), or should I just ignore it and move on?
-
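Hey Jom, you only rewrite the URL if it is not already all lowercase. Your rewrite conditions can distinguish between lower and upper case, so URLs that are already lowercase never match the rule and are served normally rather than returning a 301.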
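To make the "only rewrite when it is not all lowercase" point concrete, here is a minimal Ruby sketch of that decision. The helper name `lowercase_redirect_target` is hypothetical (it is not part of Rails or Apache). It returns nil for paths that are already lowercase, so only mixed-case URLs would ever get a 301.

```ruby
# Hypothetical helper: decide whether a request path needs a 301.
# Returns the lowercase path to redirect to, or nil when the path is
# already lowercase (the page is then served normally, no redirect).
def lowercase_redirect_target(path)
  # Any ASCII capital letter in the path means it is not canonical.
  return nil unless path.match?(/[A-Z]/)

  path.downcase
end

p lowercase_redirect_target("/Pennsylvania/Houses-for-sale") # => "/pennsylvania/houses-for-sale"
p lowercase_redirect_target("/pennsylvania/houses-for-sale") # => nil
```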
-
Mark,
In the canonicalization guide link you sent me, there is a link to Matt Cutts' blog www.mattcutts.com/blog/seo-advice-url-canonicalization/ where he talks about it. In that blog he posts:
Q: So when you say www vs. non-www, you’re talking about a type of canonicalization. Are there other ways that urls get canonicalized?
A: Yes, there can be a lot, but most people never notice (or need to notice) them. Search engines can do things like keeping or removing trailing slashes, trying to convert urls with upper case to lower case, or removing session IDs from bulletin board or other software (many bulletin board software packages will work fine if you omit the session ID).
This makes me think that doing a 301 redirect and a rel="canonical" for lower case is not needed.
I'm conflicted again.
-
When you rewrite a URL that is already lower case to lower case with a 301 response code, does it now return a 301? Does that mean all pages on the site now return 301? Wouldn't that be bad?
Sorry if I'm being dense. I understand enough about rewrite rules to be dangerous (sometimes, very dangerous).
Jom
-
Yeah, it is absolutely the right thing to do. You can force the URLs to be lower case in RoR as well if you don't want to do it in htaccess (I would do both).
You are simply saying:
- there are multiple versions of this page on different urls
- this is the main version of the page
301 them to lower case and canonicalise them and you are good to go!
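For the Rails side, one way this could look is a small Rack middleware. This is only a sketch under assumptions: the class name `LowercasePathRedirect` is made up for illustration, and it lowercases the path only, leaving the query string alone.

```ruby
# Hypothetical Rack middleware (the class name is made up for this sketch).
# It 301-redirects any request whose path contains a capital letter to the
# lowercase path and passes every other request through untouched.
class LowercasePathRedirect
  def initialize(app)
    @app = app
  end

  def call(env)
    path = env["PATH_INFO"].to_s
    # Already lowercase: hand the request to the application as usual.
    return @app.call(env) unless path.match?(/[A-Z]/)

    location = path.downcase
    query = env["QUERY_STRING"].to_s
    # Keep the query string as-is; only the path is lowercased.
    location += "?#{query}" unless query.empty?
    [301, { "location" => location, "content-type" => "text/plain" }, []]
  end
end

# Stand-in for the real application, used only to exercise the middleware.
app = ->(_env) { [200, { "content-type" => "text/plain" }, ["ok"]] }
stack = LowercasePathRedirect.new(app)

status, headers, _body = stack.call("PATH_INFO" => "/Pennsylvania/Houses-for-rent", "QUERY_STRING" => "")
puts status
puts headers["location"]
```

In a Rails app, a middleware like this would typically be registered with `config.middleware.use LowercasePathRedirect` in `config/application.rb`.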
Marcus
-
Thanks, much! I will read through these.
-
Hi Marcus and Mark,
Thanks for the response. A question on creating the rel="canonical" statements:
That means that I will have thousands, perhaps hundreds of thousands (there are a lot of cities and zips in the US) of rel="canonical" statements on my site.
I thought I read on one of the blogs that too many canonical statements are bad practice. The site is dynamic (Ruby on Rails), I can certainly make the change. I would just like to be sure it's the wise thing to do.
-
Hey Jom,
I must admit I am not sure on the level of urgency to sort this problem out but personally I like to keep the duplication of content to a minimum.
There are multiple ways to sort this out but the most straightforward would probably be to add a rel canonical tag to your web pages.
Here is a good post discussing the faceted issues you can get from e-commerce sites, here is SEOMoz's canonicalization guide, and here is another SEOMoz blog post about e-commerce sites and the use of the rel canonical tag.
Hope this helps
-
Hey Jom
The problem is that, from a search engine perspective, those are four duplicate pages, and from a linking perspective, they are four different pages that your link popularity could end up split between. Neither of which is ideal.
I would certainly deal with this but it needn't be an arduous task.
1. Set up a rewrite rule to change all URLs to lowercase and 301 any non-lowercase ones. Something like this in your Apache config should do the job, assuming you are using a LAMP environment. Note that RewriteMap is only allowed in the server or virtual host config, not in .htaccess, so the map has to be declared there.
RewriteEngine On
RewriteMap lc int:tolower
RewriteCond %{REQUEST_URI} [A-Z]
RewriteRule (.*) ${lc:$1} [R=301,L]
2. Add an automated lowercase canonical to all of these pages so they canonicalise to the lowercase version.
3. Try to replace the links so they all use lowercase. If this is a dynamic site it should be easy, but if not, you could still do a string replacement across multiple files. If it is a huge job, you could write a little script to automate this from the sitemap (of lowercase URLs, of course).
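Steps 2 and 3 above could be sketched in Ruby roughly as follows. Both helper names are hypothetical, the `http://` scheme in the example is an assumption (the URLs in the question are given without one), and the link helper is a simple regex pass meant for static HTML, not a full HTML parser.

```ruby
require "erb"

# Hypothetical helper for step 2: build a rel="canonical" tag that points at
# the lowercase version of the requested path.
def lowercase_canonical_tag(base_url, path)
  href = "#{base_url}#{path.downcase}"
  %(<link rel="canonical" href="#{ERB::Util.html_escape(href)}">)
end

# Hypothetical helper for step 3: lowercase the path of root-relative links
# (href="/...") in static HTML. Query strings, fragments, and links to other
# hosts are left untouched.
def lowercase_internal_hrefs(html)
  html.gsub(%r{href="(/(?!/)[^"?#]*)}) do
    %(href="#{Regexp.last_match(1).downcase})
  end
end

# The scheme below is assumed for the example only.
puts lowercase_canonical_tag("http://www.househitz.com", "/Pennsylvania/Houses-for-sale")
puts lowercase_internal_hrefs('<a href="/Pennsylvania/Houses-for-rent?Page=2">Rentals</a>')
```

Leaving the query string untouched is a deliberate choice in this sketch, since parameter values may be case-sensitive even when the path is not.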
Certainly worth doing and should not be too difficult with a bit of smarts applied.
Hope this helps!
Marcus