Duplicate Content and URL Capitalization
-
I have multiple URLs that SEOMoz is reporting as duplicate content. The reason is that there are characters in the URL that may, or may not, be capitalized depending on user input.
A couple examples are:
www.househitz.com/Pennsylvania/Houses-for-sale
www.househitz.com/Pennsylvania/houses-for-sale
www.househitz.com/Pennsylvania/Houses-for-rent
www.househitz.com/Pennsylvania/houses-for-rent
There are currently thousands of instances of this on the site.
Is this something I should spend effort to try and resolve (may not be minor effort), or should I just ignore it and move on?
-
Hey Jom, you only rewrite the URL if it is not all lowercase, you can distinguish between lower and upper-case in your rewrites.
-
Mark,
In the canonicalization guide link you sent me, there is a link to Matt Cutts' blog www.mattcutts.com/blog/seo-advice-url-canonicalization/ where he talks about it. In that blog he posts:
Q: So when you say www vs. non-www, you’re talking about a type of canonicalization. Are there other ways that urls get canonicalized?
A: Yes, there can be a lot, but most people never notice (or need to notice) them. Search engines can do things like keeping or removing trailing slashes, trying to convert urls with upper case to lower case, or removing session IDs from bulletin board or other software (many bulletin board software packages will work fine if you omit the session ID).This makes me think that doing a 301 redirect and a rel="canonical" for lower case is not needed.
I'm conflicted again.
-
When you rewrite a URL that is already lower case to lower case with a 301 response code, does it now return a 301? Does that mean all pages on the site now return 301? Wouldn't that be bad?
Sorry if I'm being dense. I understand enough about rewrite rules to be dangerous (sometimes, very dangerous).
Jom
-
Yeah, it is absolutely the right thing to do. You can force the URLs t be lower case in RoR as well if you don't want to do it in htaccess (i would do both).
You are simply saying:
-
there are multiple versions of this page on different urls
-
this is the main version of the page
301 them to lower case and canonicalise them and you are good to go!
Marcus
-
-
Thanks, much! I will read through these.
-
Hi Marcus and Mark,
Thanks for the response. On creating the rel="canonical" statements.
That means that I will have thousands, perhaps hundreds of thousands (there are a lot of cities and zips in the US) of rel="canonical" statements on my site.
I thought I read on one of the blogs that too many canonical statements are bad practice. The site is dynamic (Ruby on Rails), I can certainly make the change. I would just like to be sure it's the wise thing to do.
-
Hey Jom,
I must admit I am not sure on the level of urgency to sort this problem out but personally I like to keep the duplication of content to a minimum.
There are multiple ways to sort this out but the most straight forward would probably be to add a rel canonical tag to your web pages.
Here is a good post discussing the faceted issues you can get from e-commerce site, here is SEOMoz's canonicalization guide and here is another seomoz blog post about e-commerce sites and the use of the rel canonical tag.
Hope this helps
-
Hey Jom
Problem is, from a search engine perspective, those are four duplicate pages & from a linking perspective, they are four different pages that you could see your link popularity shared between. Neither of which is ideal.
I would certainly deal with this but it needn't be an arduous task.
1. Set up a rewrite rule to change all URLs to lowercase and 301 any non lowercase ones, something like this in your htaccess should do the job assuming you are using a LAMP environment.
RewriteEngine On RewriteMap lc int:tolower RewriteCond %{REQUEST_URI} [A-Z] RewriteRule (.*) ${lc:$1} [R=301,L]
2. Add an automated lowercase canonical to all of these pages so they canonicalise to the lowercase version.
3. Try to replace the links so they all use lowercase. If this is a dynamic site it should be easy but if not, you could still do a string replacement across multiple files. You could write a little script to automate this if it is a huge job from the sitemap (of lowercase URLs of course.
Certainly worth doing and should not be too difficult with a bit of smarts applied.
Hope this helps!
Marcus
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Duplicate Content from long Site Title
Hello! I have a number of "Duplicate Title Errors" as my website has a long Site Title: Planit NZ: New Zealand Tours, Bus Passes & Travel Planning. Am I better off with a short title that is simply my website/business name: Planit NZ My thought was adding some keywords might help with my rankings. Thanks Matt
Technical SEO | | mkyhnn0 -
Purchasing duplicate content
Morning all, I have a client who is planning to expand their product range (online dictionary sites) to new markets and are considering the acquisition of data sets from low ranked competitors to supplement their own original data. They are quite large content sets and would mean a very high percentage of the site (hosted on a new sub domain) would be made up of duplicate content. Just to clarify, the competitor's content would stay online as well. I need to lay out the pros and cons of taking this approach so that they can move forward knowing the full facts. As I see it, this approach would mean forgoing ranking for most of the site and would need a heavy dose of original content as well as supplementing the data on page to build around the data. My main concern would be that launching with this level of duplicate data would end up damaging the authority of the site and subsequently the overall domain. I'd love to hear your thoughts!
Technical SEO | | BackPack851 -
Is this duplicate content that I should be worried about?
Our product descriptions appear in two places and on one page they appear twice. The best way to illustrate that would be to link you to a search results page that features one product. My duplicate content concern refers to the following, When the customer clicks the product a pop-up is displayed that features the product description (first showing of content) When the customer clicks the 'VIEW PRODUCT' button the product description is shown below the buy buytton (second showing of content), this is to do with the template of the page and is why it is also shown in the pop-up. This product description is then also repeated further down in the tabs (third showing of content). My thoughts are that point 1 doesn't matter as the content isn't being shown from a dedicated URL and it relies on javascript. With regards to point 2, is the fact the same paragraph appears on the page twice a massive issue and a duplicate content problem? Thanks
Technical SEO | | joe-ainswoth0 -
Duplicate content and 404 errors
I apologize in advance, but I am an SEO novice and my understanding of code is very limited. Moz has issued a lot (several hundred) of duplicate content and 404 error flags on the ecommerce site my company takes care of. For the duplicate content, some of the pages it says are duplicates don't even seem similar to me. additionally, a lot of them are static pages we embed images of size charts that we use as popups on item pages. it says these issues are high priority but how bad is this? Is this just an issue because if a page has similar content the engine spider won't know which one to index? also, what is the best way to handle these urls bringing back 404 errors? I should probably have a developer look at these issues but I wanted to ask the extremely knowledgeable Moz community before I do 🙂
Technical SEO | | AliMac260 -
Duplicate Content Issues on Product Pages
Hi guys Just keen to gauge your opinion on a quandary that has been bugging me for a while now. I work on an ecommerce website that sells around 20,000 products. A lot of the product SKUs are exactly the same in terms of how they work and what they offer the customer. Often it is 1 variable that changes. For example, the product may be available in 200 different sizes and 2 colours (therefore 400 SKUs available to purchase). Theese SKUs have been uploaded to the website as individual entires so that the customer can purchase them, with the only difference between the listings likely to be key signifiers such as colour, size, price, part number etc. Moz has flagged these pages up as duplicate content. Now I have worked on websites long enough now to know that duplicate content is never good from an SEO perspective, but I am struggling to work out an effective way in which I can display such a large number of almost identical products without falling foul of the duplicate content issue. If you wouldnt mind sharing any ideas or approaches that have been taken by you guys that would be great!
Technical SEO | | DHS_SH0 -
Duplicate Page Content
I've got several pages of similar products that google has listed as duplicate content. I have them all set up with rel="prev" and rel="next tags telling google that they are part of a group but they've still got them listed as duplicates. Is there something else I should do for these pages or is that just a short falling of googles webmaster tools? One of the pages: http://www.jaaronwoodcountertops.com/wood-countertop-gallery/walnut-countertop-9.html
Technical SEO | | JAARON0 -
Duplicate content, Original source?
Hi there, say i have two websites with identicle content. website a had content on before website b - so will be seen as the original source? If the content was intended for website b, would taking it off a then make the orinal source to google then go to website b? I want website b to get the value of the content but it was put on website a first - would taking it off website a then give website b the full power of the content? Any help of advice much appreciated. Kind Regards,
Technical SEO | | pauledwards0 -
Duplicate content issues caused by our CMS
Hello fellow mozzers, Our in-house CMS - which is usually good for SEO purposes as it allows all the control over directories, filenames, browser titles etc that prevent unwieldy / meaningless URLs and generic title tags - seems to have got itself into a bit of a tiz when it comes to one of our clients. We have tried solving the problem to no avail, so I thought I'd throw it open and see if anyone has a soultion, or whether it's just a fault in our CMS. Basically, the SEs are indexing two identical pages, one ending with a / and the other ending /index.php, for one of our sites (www.signature-care-homes.co.uk). We have gone through the site and made sure the links all point to just one of these, and have done the same for off-site links, but there is still the duplicate content issue of both versions getting indexed. We also set up an htaccess file to redirect to the chosen version, but to no avail, and we're not sure canonical will work for this issue as / pages should redirect to /index.php anyway - and that's we can't work out. We have set the access file to point to index.php, and that should be what should be happening anyway, but it isn't. Is there an alternative way of telling the SE's to only look at one of these two versions? Also, we are currently rewriting the content and changing the structure - will this change the situation we find ourselves in?
Technical SEO | | themegroup0