Why do I get duplicate pages, website referencing the capital version of the url vs the lowercase www.agi-automation.com/Pneumatic-grippers.htm
-
Can I the rel=canonical tag this?
-
I'm not a pro when it comes to technical server set ups, so maybe Keri can jump in with some better knowledge.
It seems to me like you have everything set up on your server correctly. And it looks like Google currently has only one version indexed of the original page in question.
You site navigation menu points to the capitalized version of the URL, but somewhere on your site there must be a link that points to the lowercase version which would explain how SEOmoz found the duplication when crawling your site, and if SEOmoz can find, so can Google.
I still think you should use the rel=canonical attribute just to be safe. Again, I'm not that great at technical stuff. Sorry I couldn't be of more help here.
Tim
-
Hi Tim,
Thanks for your responses. This is what the IT team has found. Let me know your thoughts:
On the physical computer that hosts the website the page exists as one file. The casing of the file is irrelevant to the host machine, it wouldn't allow 2 files of the same name in the same directory.
To reenforce this point, you can access said file by camel-casing the URI in any fashion (eg; http://www.agi-automation.com/Lin...). This does not bring up a different file each time, the server merely processes the URI as case-less and pulls the file by it's name.
What is happening in the example given is that some sort of indexer is being used to create a "dummy" reference of all the site files. Since the indexer doesn't have file access to the server, it does this by link crawling instead of reading files. It is the crawler that is making an assumption that the different casings of the pages are in fact different files. Perhaps there is a setting in the indexer to ignore casing.
So the indexer is thinking that these are 2 different pages when they really aren't. This makes all of the other points moot, though they would certainly be relevant in the case of an actual duplicated page."
-
Hi Keri and Tim,
Thanks for your responses. This is what the IT team has found. Let me know your thoughts:
On the physical computer that hosts the website the page exists as one file. The casing of the file is irrelevant to the host machine, it wouldn't allow 2 files of the same name in the same directory.
To reenforce this point, you can access said file by camel-casing the URI in any fashion (eg; http://www.agi-automation.com/Linear-EscapeMents.htm). This does not bring up a different file each time, the server merely processes the URI as case-less and pulls the file by it's name.
What is happening in the example given is that some sort of indexer is being used to create a "dummy" reference of all the site files. Since the indexer doesn't have file access to the server, it does this by link crawling instead of reading files. It is the crawler that is making an assumption that the different casings of the pages are in fact different files. Perhaps there is a setting in the indexer to ignore casing.
So the indexer is thinking that these are 2 different pages when they really aren't. This makes all of the other points moot, though they would certainly be relevant in the case of an actual duplicated page."
-
Excellent points, Keri. I hadn't thought about either of those issues. Using a redirect is definitely the best way to go.
-
I'd vote for doing the rewrite to the lowercase version. This gives you a couple of added benefits:
-
If people copy and paste the URL from their browser then link to it, you're getting all the links going to the same place.
-
Your analytics based on your URLs will be more accurate. Instead of seeing:
urla.htm 70 visits
urlb.htm 60 visits
urlB.htm 30 visitsYou'll see
urlb.htm 90 visits
urla.htm 70 visits -
-
The problem is that search engines view these URLs as two separate pages, so both pages get indexed and you run into duplication issues.
Yes, using rel=canonical is a good way to handle this. I would suggest using the lowercase version as your canonical page, so you would place this bit of HTML on both pages:
The other option is to create a 301 redirect from the caps version to the lowercase version. This would ensure that anyone arriving at the page (including search engine bots) would end up being directed to the lowercase version.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Site Crawl -> Duplicate Page Content -> Same pages showing up with duplicates that are not
These, for example: | https://im.tapclicks.com/signup.php/?utm_campaign=july15&utm_medium=organic&utm_source=blog | 1 | 2 | 29 | 2 | 200 |
Technical SEO | | writezach
| https://im.tapclicks.com/signup.php?_ga=1.145821812.1573134750.1440742418 | 1 | 1 | 25 | 2 | 200 |
| https://im.tapclicks.com/signup.php?utm_source=tapclicks&utm_medium=blog&utm_campaign=brightpod-article | 1 | 119 | 40 | 4 | 200 |
| https://im.tapclicks.com/signup.php?utm_source=tapclicks&utm_medium=marketplace&utm_campaign=homepage | 1 | 119 | 40 | 4 | 200 |
| https://im.tapclicks.com/signup.php?utm_source=blog&utm_campaign=first-3-must-watch-videos | 1 | 119 | 40 | 4 | 200 |
| https://im.tapclicks.com/signup.php?_ga=1.159789566.2132270851.1418408142 | 1 | 5 | 31 | 2 | 200 |
| https://im.tapclicks.com/signup.php/?utm_source=vocus&utm_medium=PR&utm_campaign=52release | Any suggestions/directions for fixing or should I just disregard this "High Priority" moz issue? Thank you!0 -
I really need some help with Magento and Duplicate Page Content results I;m getting
Hi, We use Magento for our eCommerce platform and I'm getting a number of duplicate page content results. It mainly concerns the duplicate page content errors for our category pages. Firstly It seems like the product type and filter options highlighted in the picture are causing duplicate page content Also one particularity category is getting a lot from duplicate page content errors , http://www.tidy-books.co.uk/shop-all-products I understand that this category page is using duplicate pages of other category pages so I set this to exclude them from the site map but it looks likes its till being picked up? I've attached the csv file showing these errors as well. - > Any help would be massively appreciated Thanks filter.png moz-tidy-books-uk-crawl_issues-01-OCT-2014.csv
Technical SEO | | tidybooks0 -
Duplicate Titles on Wordpress blog pages
Hi, I have an issue where I am getting for duplicate page titles for pages that shouldn't exist. The issue is on the blog index page's (from 0 - 16) and involves the same set of attachment_id for each page, i.e. /blog/page/10/?attachment_id=minack /blog/page/10/?attachment_id=ponyrides /blog/page/11/?attachment_id=minack /blog/page/11/?attachment_id=ponyrides There are 6 attachment_id values (and they are not ID values either) which repeat for every page on the index now what I can't work out is where those 6 links are coming from as on the actual blog index page http://www.bosinver.co.uk/blog/page/10/ there are no links to it and the links just go to blog index page and it ignores the attachment_id value. There is no sitemap.xml file either which I thought might have contained the links. Thanks
Technical SEO | | leapSEO0 -
How to redirect my .com/blog to my server folder /blog ?
Hello SEO Moz ! Always hard to post something serious for the 04.01 but anyway let's try ! I'm releasing Joomla websites website.com, website.com/fr, website.com/es and so on. Usually i have the following folders on my server [ROOT]/com [ROOT]/com/fr [ROOT]/com/es However I would like to get the following now (for back up and security purpose). [ROOT]/com [ROOT]/es [ROOT]/fr So now what can I do (I gues .htaccess) to open the folder [ROOT]/es when people clic on website.com/es ? It sounds stupid but I really don't know. I found this on internet but did not answer my needs. .htaccess RewriteEngine On
Technical SEO | | AymanH
RewriteCond %{REQUEST_URI} !(^/fr/.) [NC]
RewriteRule ^(.)$ /sites/fr/$1 [L,R=301] Tks a lot ! Florian0 -
Picking a URL co.uk vs .net vs .biz which is best for SEO?
Hi I was always told that you only want a .co.uk or a .com is this still true? I am setting up an ecommerce site and the ideal .co.uk is gone! Am I better to have a longer ulr with maybe a hifen or a uk in it or is it ok to have a .biz or a .net these days? You help would be greatly appreciated! Thanks James
Technical SEO | | JamesBryant0 -
How does Google find /feed/ at the end of all pages on my site?
Hi! In Google Webmaster Tools I find *.../feed/ as a 404 page in crawl errors. The problem is that none of these pages exist and they have no inbound links (except the start page). FYI, it´s a wordpress site. Example: www.mysite.com/subpage1/feed/ www.mysite.com/subpage2/feed/ www.mysite.com/subpage3/feed/ etc Does Google search for /feed/ by default or why do I keep getting these 404´s every day?
Technical SEO | | Vivamedia0 -
Duplicate pages, overly dynamic URL’s and long URL’s in Magento
Hi there, I’ve just completed the first crawl of my Magento site and SEOMOZ has picked up 1,000’s of duplicate pages, overly dynamic URL’s and long URL’s due to the sort function which appends URL’s with variables when sorting products (e.g. www.example.com?dir=asc&order=duration). I’m not particularly concerned that this will affect our rankings as Google has stated that they are familiar with the structure of popular CMS’s and Magento is pretty popular. However it completely dominates my crawl diagnostics so I can’t see if there are any real underlying issues. Does anyone know a way of preventing this? Cheers,
Technical SEO | | WendyWuTours
Al.1 -
Strange duplicate url
From your csv report I have this strange issue. This url: elettrodomestici.yeppon.it/climatizzatori/condizionatori-fissi/prodotti/condizionatori-fissi-comfee/ it's a duplicate of this elettrodomestici.yeppon.it/climatizzatori/condizionatori-fissi/prodotti/condizionatori-fissi-comfee/ but the only url that I can see in the website is this one. Why the "-" is transalted some times in "%2D" referrer obviously is elettrodomestici.yeppon.it/climatizzatori/condizionatori-fissi/prodotti/condizionatori-fissi-comfee/solo-disponibili/ I have many duplicate url...Can you help me? Thanks
Technical SEO | | yeppon0