Duplicate Content and URL Capitalization
-
I have multiple URLs that SEOMoz is reporting as duplicate content. The reason is that there are characters in the URL that may, or may not, be capitalized depending on user input.
A couple examples are:
www.househitz.com/Pennsylvania/Houses-for-sale
www.househitz.com/Pennsylvania/houses-for-sale
www.househitz.com/Pennsylvania/Houses-for-rent
www.househitz.com/Pennsylvania/houses-for-rent
There are currently thousands of instances of this on the site.
Is this something I should spend effort to try and resolve (may not be minor effort), or should I just ignore it and move on?
-
Hey Jom, you only rewrite the URL if it is not already all lowercase. Rewrite conditions can distinguish between lower and upper case, so URLs that are already lowercase are served normally and never return a 301.
-
Mark,
In the canonicalization guide link you sent me, there is a link to Matt Cutts' blog www.mattcutts.com/blog/seo-advice-url-canonicalization/ where he talks about it. In that blog he posts:
Q: So when you say www vs. non-www, you’re talking about a type of canonicalization. Are there other ways that urls get canonicalized?
A: Yes, there can be a lot, but most people never notice (or need to notice) them. Search engines can do things like keeping or removing trailing slashes, trying to convert urls with upper case to lower case, or removing session IDs from bulletin board or other software (many bulletin board software packages will work fine if you omit the session ID).
This makes me think that doing a 301 redirect and a rel="canonical" for lower case is not needed.
I'm conflicted again.
-
When you rewrite a URL that is already lower case to lower case with a 301 response code, does it now return a 301? Does that mean all pages on the site now return 301? Wouldn't that be bad?
Sorry if I'm being dense. I understand enough about rewrite rules to be dangerous (sometimes, very dangerous).
Jom
-
Yeah, it is absolutely the right thing to do. You can force the URLs to be lower case in RoR as well if you don't want to do it in htaccess (I would do both).
You are simply saying:
- there are multiple versions of this page on different URLs
- this is the main version of the page
301 them to lower case and canonicalise them and you are good to go!
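For the Rails side, here is a minimal sketch of what forcing lowercase could look like as a small Rack middleware. The class name LowercasePathRedirect and the stand-in app are made up for illustration; the only thing it relies on is the standard Rack contract (an object whose call(env) returns status, headers, and body). Treat it as a starting point rather than a drop-in.

```ruby
# Minimal Rack-style middleware (hypothetical name): 301 any request whose
# path contains an uppercase letter to the all-lowercase path.
# Requests that are already lowercase pass straight through to the app.
class LowercasePathRedirect
  def initialize(app)
    @app = app
  end

  def call(env)
    path = env["PATH_INFO"].to_s
    return @app.call(env) unless path =~ /[A-Z]/

    location = path.downcase
    query = env["QUERY_STRING"].to_s
    location += "?#{query}" unless query.empty? # keep the query string as-is

    [301, { "location" => location, "content-type" => "text/plain" }, ["Moved Permanently"]]
  end
end

# Stand-in for the real application, for illustration only.
inner_app = ->(_env) { [200, { "content-type" => "text/plain" }, ["OK"]] }
stack = LowercasePathRedirect.new(inner_app)

mixed = stack.call({ "PATH_INFO" => "/Pennsylvania/Houses-for-sale", "QUERY_STRING" => "" })
lower = stack.call({ "PATH_INFO" => "/pennsylvania/houses-for-sale", "QUERY_STRING" => "" })

puts mixed[0]             # status for the mixed-case request
puts mixed[1]["location"] # where it is redirected to
puts lower[0]             # status for the already-lowercase request
```

In a Rails app this would probably be registered with something like config.middleware.use LowercasePathRedirect, but check that against your Rails version. Note that only mixed-case requests get the 301; lowercase URLs are untouched.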
Marcus
-
-
Thanks, much! I will read through these.
-
Hi Marcus and Mark,
Thanks for the response. On creating the rel="canonical" statements: that means I will have thousands, perhaps hundreds of thousands (there are a lot of cities and zips in the US), of rel="canonical" statements on my site.
I thought I read on one of the blogs that too many canonical statements are bad practice. The site is dynamic (Ruby on Rails), I can certainly make the change. I would just like to be sure it's the wise thing to do.
-
Hey Jom,
I must admit I am not sure how urgent it is to sort this problem out, but personally I like to keep the duplication of content to a minimum.
There are multiple ways to sort this out, but the most straightforward would probably be to add a rel="canonical" tag to your web pages.
Here is a good post discussing the faceted issues you can get from e-commerce sites, here is SEOMoz's canonicalization guide, and here is another SEOMoz blog post about e-commerce sites and the use of the rel="canonical" tag.
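As a rough illustration of the canonical tag idea on a dynamic site, the tag can be generated once in the page layout instead of being written by hand for every city and zip. The helper name lowercase_canonical_tag below is hypothetical, and the sketch assumes the canonical version is simply the same URL with a lowercase path.

```ruby
require "uri"

# Characters that need escaping inside an HTML attribute value.
HTML_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;" }.freeze

# Hypothetical helper: build a rel="canonical" tag that points at the
# lowercase-path version of the requested URL.
def lowercase_canonical_tag(request_url)
  uri = URI.parse(request_url)
  uri.path = uri.path.downcase # lowercase the path only; leave the query string alone
  href = uri.to_s.gsub(/[&<>"]/, HTML_ESCAPES)
  %(<link rel="canonical" href="#{href}">)
end

puts lowercase_canonical_tag("http://www.househitz.com/Pennsylvania/Houses-for-sale")
```

Both the Houses-for-sale and houses-for-sale variants would then emit the same tag, which is the point: every case variant names one main version.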
Hope this helps
-
Hey Jom
Problem is, from a search engine perspective, those are four duplicate pages, and from a linking perspective, they are four different pages that your link popularity could end up split between. Neither of which is ideal.
I would certainly deal with this but it needn't be an arduous task.
1. Set up a rewrite rule to change all URLs to lowercase and 301 any non-lowercase ones. Something like this should do the job, assuming you are using a LAMP environment. Note that Apache only allows the RewriteMap line in the server or virtual host config, not in htaccess; the other lines can go in either place.
RewriteEngine On
RewriteMap lc int:tolower
RewriteCond %{REQUEST_URI} [A-Z]
RewriteRule ^/?(.*)$ /${lc:$1} [R=301,L]
2. Add an automated lowercase canonical to all of these pages so they canonicalise to the lowercase version.
3. Try to replace the links so they all use lowercase. If this is a dynamic site it should be easy, but if not, you could still do a string replacement across multiple files. If it is a huge job, you could write a little script to automate it, working from the sitemap (of lowercase URLs, of course).
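For step 3, a string replacement over link markup might look something like the sketch below. It is only an illustration: SITE_HOST, the function name, and the href pattern are assumptions, and it handles plain double-quoted href attributes rather than every way a link can be written. External links and query strings are deliberately left alone.

```ruby
# Assumed host for recognising absolute internal links (illustration only).
SITE_HOST = "www.househitz.com"

# Lowercase the path part of every internal link in a chunk of HTML.
# Matches href="/path" and href="http(s)://SITE_HOST/path"; the match stops
# at ?, # or the closing quote, so query strings and fragments are untouched.
def lowercase_internal_links(html)
  pattern = %r{href="((?:https?://#{Regexp.escape(SITE_HOST)})?/(?!/)[^"?#]*)}
  html.gsub(pattern) { %(href="#{Regexp.last_match(1).downcase}) }
end

sample = '<a href="/Pennsylvania/Houses-for-sale?Sort=Price">For sale</a> ' \
         '<a href="http://example.org/About">Elsewhere</a>'
puts lowercase_internal_links(sample)
```

To run it across a set of files you would read each file, pass the contents through the function, and write the result back; try it on a copy first.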
Certainly worth doing and should not be too difficult with a bit of smarts applied.
Hope this helps!
Marcus