Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies.
Duplicate Content and URL Capitalization
-
I have multiple URLs that SEOMoz is reporting as duplicate content. The reason is that there are characters in the URL that may, or may not, be capitalized depending on user input.
A couple examples are:
www.househitz.com/Pennsylvania/Houses-for-sale
www.househitz.com/Pennsylvania/houses-for-sale
www.househitz.com/Pennsylvania/Houses-for-rent
www.househitz.com/Pennsylvania/houses-for-rent
There are currently thousands of instances of this on the site.
Is this something I should spend effort to try and resolve (may not be minor effort), or should I just ignore it and move on?
-
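Hey Jom, you only rewrite the URL if it is not all lowercase; you can distinguish between lowercase and uppercase in your rewrites. A URL that is already all lowercase never matches the condition, so it is served normally and does not return a 301.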
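A minimal Python sketch of that decision, for illustration only. The function name `lowercase_redirect_target` is hypothetical, and on the live site the same check would sit in the Apache rule or the Rails app. It shows that an already-lowercase path yields no redirect, so only mixed-case requests ever get a 301:

```python
from typing import Optional


def lowercase_redirect_target(path: str) -> Optional[str]:
    """Return the lowercase path to 301 to, or None if no redirect is needed.

    Only the path is handled here; query strings are assumed to be left alone.
    """
    lowered = path.lower()
    # An already-lowercase path is served as-is, so there is no redirect loop.
    if lowered == path:
        return None
    return lowered


print(lowercase_redirect_target("/Pennsylvania/Houses-for-sale"))  # /pennsylvania/houses-for-sale
print(lowercase_redirect_target("/pennsylvania/houses-for-sale"))  # None
```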
-
Mark,
In the canonicalization guide link you sent me, there is a link to Matt Cutts' blog www.mattcutts.com/blog/seo-advice-url-canonicalization/ where he talks about it. In that blog he posts:
Q: So when you say www vs. non-www, you’re talking about a type of canonicalization. Are there other ways that urls get canonicalized?
A: Yes, there can be a lot, but most people never notice (or need to notice) them. Search engines can do things like keeping or removing trailing slashes, trying to convert urls with upper case to lower case, or removing session IDs from bulletin board or other software (many bulletin board software packages will work fine if you omit the session ID).
This makes me think that doing a 301 redirect and a rel="canonical" for lower case is not needed.
I'm conflicted again.
-
When you rewrite a URL that is already lower case to lower case with a 301 response code, does it now return a 301? Does that mean all pages on the site now return 301? Wouldn't that be bad?
Sorry if I'm being dense. I understand enough about rewrite rules to be dangerous (sometimes, very dangerous).
Jom
-
Yeah, it is absolutely the right thing to do. You can force the URLs to be lower case in RoR as well if you don't want to do it in htaccess (I would do both).
You are simply saying:
- there are multiple versions of this page on different URLs
- this is the main version of the page
301 them to lower case and canonicalise them and you are good to go!
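Because the site is dynamic, the canonical tag can come from a single helper in the page layout, however many pages there are. Here is a rough Python sketch of the idea (the Rails equivalent would be a small view helper). The name `canonical_link_tag`, the `http` scheme in the example, and the choice to drop the query string and fragment are all assumptions for illustration:

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_link_tag(url: str) -> str:
    """Build a rel="canonical" tag pointing at the lowercase version of url."""
    parts = urlsplit(url)
    # Lowercase host and path; query string and fragment are dropped
    # in this sketch (an assumption, adjust if they carry meaning).
    canonical = urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path.lower(), "", "")
    )
    return f'<link rel="canonical" href="{escape(canonical, quote=True)}" />'


print(canonical_link_tag("http://www.househitz.com/Pennsylvania/Houses-for-sale"))
```

Both the capitalized and the lowercase request would then emit the same tag, which is the "this is the main version" signal described above.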
Marcus
-
-
Thanks, much! I will read through these.
-
Hi Marcus and Mark,
Thanks for the response. On creating the rel="canonical" statements: that means I will have thousands, perhaps hundreds of thousands (there are a lot of cities and zips in the US), of rel="canonical" statements on my site.
I thought I read on one of the blogs that too many canonical statements are bad practice. The site is dynamic (Ruby on Rails), I can certainly make the change. I would just like to be sure it's the wise thing to do.
-
Hey Jom,
I must admit I am not sure on the level of urgency to sort this problem out but personally I like to keep the duplication of content to a minimum.
There are multiple ways to sort this out, but the most straightforward would probably be to add a rel canonical tag to your web pages.
Here is a good post discussing the faceted issues you can get from e-commerce sites, here is SEOMoz's canonicalization guide and here is another SEOMoz blog post about e-commerce sites and the use of the rel canonical tag.
Hope this helps
-
Hey Jom
The problem is that, from a search engine's perspective, those four URLs are two pairs of duplicate pages, and from a linking perspective they are four different pages that your link popularity could end up split between. Neither is ideal.
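To get a feel for how widespread the case-variant duplication is before fixing it, you could group a crawl export by lowercase URL. This is only a sketch: `group_case_variants` is a hypothetical helper, and the input list is assumed to come from whatever crawl or log data you have to hand:

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_case_variants(urls: Iterable[str]) -> Dict[str, List[str]]:
    """Map each lowercase URL to its distinct case variants, duplicates only."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for url in urls:
        variants = groups[url.lower()]
        if url not in variants:
            variants.append(url)
    # Keep only URLs that were seen with more than one capitalization.
    return {key: variants for key, variants in groups.items() if len(variants) > 1}


crawled = [
    "www.househitz.com/Pennsylvania/Houses-for-sale",
    "www.househitz.com/Pennsylvania/houses-for-sale",
    "www.househitz.com/Pennsylvania/Houses-for-rent",
    "www.househitz.com/Pennsylvania/houses-for-rent",
]
for lowercase_url, variants in group_case_variants(crawled).items():
    print(lowercase_url, len(variants))
```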
I would certainly deal with this but it needn't be an arduous task.
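1. Set up a rewrite rule that 301-redirects any URL containing uppercase characters to its lowercase form. Something like this should do the job, assuming you are using a LAMP environment. Note that Apache only accepts RewriteMap in the server or virtual host configuration, not in .htaccess, so at least that line has to live there:
RewriteEngine On
# Declare Apache's built-in lowercase map (server or virtual host config only)
RewriteMap lc int:tolower
# Only act on request paths that contain an uppercase letter
RewriteCond %{REQUEST_URI} [A-Z]
# Permanently redirect to the lowercased path
RewriteRule ^/?(.*)$ /${lc:$1} [R=301,L]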
2. Add an automated lowercase canonical to all of these pages so they canonicalise to the lowercase version.
3. Try to replace the links so they all use lowercase. If this is a dynamic site it should be easy, but if not, you could still do a string replacement across multiple files. If it is a huge job, you could write a little script to automate it from the sitemap (of lowercase URLs, of course).
Certainly worth doing and should not be too difficult with a bit of smarts applied.
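For the string replacement in step 3, a rough Python sketch might look like this. It assumes internal links are root-relative and double-quoted (`href="/..."`). The helper name `lowercase_internal_hrefs` is made up for illustration, and real templates may need a proper HTML parser instead of a regex:

```python
import re

# Root-relative, double-quoted hrefs only; stops before any query string or
# fragment and skips protocol-relative links that start with "//".
HREF_PATTERN = re.compile(r'href="(/(?!/)[^"?#]*)')


def lowercase_internal_hrefs(html: str) -> str:
    """Lowercase the path part of root-relative href attributes."""
    return HREF_PATTERN.sub(lambda m: 'href="' + m.group(1).lower(), html)


snippet = '<a href="/Pennsylvania/Houses-for-sale?Sort=Price">Houses</a>'
print(lowercase_internal_hrefs(snippet))
```

Query strings and external links are left untouched here, since their case may be significant.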
Hope this helps!
Marcus