Why do I get duplicate pages, website referencing the capital version of the url vs the lowercase www.agi-automation.com/Pneumatic-grippers.htm
-
Can I the rel=canonical tag this?
-
I'm not a pro when it comes to technical server set ups, so maybe Keri can jump in with some better knowledge.
It seems to me like you have everything set up on your server correctly. And it looks like Google currently has only one version indexed of the original page in question.
You site navigation menu points to the capitalized version of the URL, but somewhere on your site there must be a link that points to the lowercase version which would explain how SEOmoz found the duplication when crawling your site, and if SEOmoz can find, so can Google.
I still think you should use the rel=canonical attribute just to be safe. Again, I'm not that great at technical stuff. Sorry I couldn't be of more help here.
Tim
-
Hi Tim,
Thanks for your responses. This is what the IT team has found. Let me know your thoughts:
On the physical computer that hosts the website the page exists as one file. The casing of the file is irrelevant to the host machine, it wouldn't allow 2 files of the same name in the same directory.
To reenforce this point, you can access said file by camel-casing the URI in any fashion (eg; http://www.agi-automation.com/Lin...). This does not bring up a different file each time, the server merely processes the URI as case-less and pulls the file by it's name.
What is happening in the example given is that some sort of indexer is being used to create a "dummy" reference of all the site files. Since the indexer doesn't have file access to the server, it does this by link crawling instead of reading files. It is the crawler that is making an assumption that the different casings of the pages are in fact different files. Perhaps there is a setting in the indexer to ignore casing.
So the indexer is thinking that these are 2 different pages when they really aren't. This makes all of the other points moot, though they would certainly be relevant in the case of an actual duplicated page."
-
Hi Keri and Tim,
Thanks for your responses. This is what the IT team has found. Let me know your thoughts:
On the physical computer that hosts the website the page exists as one file. The casing of the file is irrelevant to the host machine, it wouldn't allow 2 files of the same name in the same directory.
To reenforce this point, you can access said file by camel-casing the URI in any fashion (eg; http://www.agi-automation.com/Linear-EscapeMents.htm). This does not bring up a different file each time, the server merely processes the URI as case-less and pulls the file by it's name.
What is happening in the example given is that some sort of indexer is being used to create a "dummy" reference of all the site files. Since the indexer doesn't have file access to the server, it does this by link crawling instead of reading files. It is the crawler that is making an assumption that the different casings of the pages are in fact different files. Perhaps there is a setting in the indexer to ignore casing.
So the indexer is thinking that these are 2 different pages when they really aren't. This makes all of the other points moot, though they would certainly be relevant in the case of an actual duplicated page."
-
Excellent points, Keri. I hadn't thought about either of those issues. Using a redirect is definitely the best way to go.
-
I'd vote for doing the rewrite to the lowercase version. This gives you a couple of added benefits:
-
If people copy and paste the URL from their browser then link to it, you're getting all the links going to the same place.
-
Your analytics based on your URLs will be more accurate. Instead of seeing:
urla.htm 70 visits
urlb.htm 60 visits
urlB.htm 30 visitsYou'll see
urlb.htm 90 visits
urla.htm 70 visits -
-
The problem is that search engines view these URLs as two separate pages, so both pages get indexed and you run into duplication issues.
Yes, using rel=canonical is a good way to handle this. I would suggest using the lowercase version as your canonical page, so you would place this bit of HTML on both pages:
The other option is to create a 301 redirect from the caps version to the lowercase version. This would ensure that anyone arriving at the page (including search engine bots) would end up being directed to the lowercase version.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
New Pages in my Shopify website is not indexing
Hi The Service area pages created on my Shopify website is not indexing on google for a long time, Tried indexing the pages manually and also submitted the sitemap but still the pages doesn't seem to get indexed.
Technical SEO | | Bhisshaun
Thanks in Advance.0 -
Duplicate page titles for blog snippets pages
I can't figure the answer to this issue, on my blog I have a number of pages which each show snippets and an image for each blog entry, these are called /recent-weddings/page/1 /2 /3 and so on. I'm getting duplicate page titles for these but can't find anywhere on Wordpress to set a unique title for them. So http://www.weddingphotojournalist.co.uk/recent-weddings/…/2/ has the same title as http://www.weddingphotojournalist.co.uk/recent-weddings/…/3/
Technical SEO | | simonatkinsphoto0 -
Www vs non www - Crawl Error 902
I have just taken over admin of my company website and I have been confronted with crawl error 902 on the existing campaign that has been running for years in Moz. This seems like an intermittent problem. I have searched and tried to go over many of the other solutions and non of them seem to help. The campaign is currently set-up with the url http://companywebsite.co.uk when I tried to do a Moz manual crawl using this URL I got an error message. I changed the link to crawl to http://www.companywebsite.co.uk and the crawl went off without a hitch and im currently waiting on the results. From testing I now know that if i go to the non-www version of my companies website then nothing happens it never loads. But if I go to the www version then it loads right away. I know for SEO you only want 1 of these URLS so you dont have duplicate content. But i thought the non-www should redirect to the www version. Not just be completely missing. I tried to set-up a new campaign with the defaults URL being the www version but Moz automatically changed it to the non-www version. It seems a cannot set up a new campaign with it automatically crawling the www version. Does it sound like im out the right path to finding this cause? Or can somebody else offer up a solution? Many thanks,
Technical SEO | | ATP
Ben .0 -
How to redirect my .com/blog to my server folder /blog ?
Hello SEO Moz ! Always hard to post something serious for the 04.01 but anyway let's try ! I'm releasing Joomla websites website.com, website.com/fr, website.com/es and so on. Usually i have the following folders on my server [ROOT]/com [ROOT]/com/fr [ROOT]/com/es However I would like to get the following now (for back up and security purpose). [ROOT]/com [ROOT]/es [ROOT]/fr So now what can I do (I gues .htaccess) to open the folder [ROOT]/es when people clic on website.com/es ? It sounds stupid but I really don't know. I found this on internet but did not answer my needs. .htaccess RewriteEngine On
Technical SEO | | AymanH
RewriteCond %{REQUEST_URI} !(^/fr/.) [NC]
RewriteRule ^(.)$ /sites/fr/$1 [L,R=301] Tks a lot ! Florian0 -
WordPress - How to stop both http:// and https:// pages being indexed?
Just published a static page 2 days ago on WordPress site but noticed that Google has indexed both http:// and https:// url's. Usually I only get http:// indexed though. Could anyone please explain why this may have happened and how I can fix? Thanks!
Technical SEO | | Clicksjim1 -
Duplicate Content on Product Pages
Hello I'm currently working on two sites and I had some general question's about duplicate content. For the first one each page is a different location, but the wording is identical on each; ie it says Instant Remote Support for Critical Issues, Same Day Onsite Support with a 3-4 hour response time, etc. Would I get penalized for this? Another question i have is, we offer Antivirus support for providers ie Norton, AVG,Bit Defender etc. I was wondering if we will get penalized for having the same first paragraph with only changing the name of the virus provider on each page? My last question is we provide services for multiple city's and towns in various states. Will I get penalized for having the same content on each page, such as towns and producuts and services we provide? Thanks.
Technical SEO | | ilyaelbert0 -
Crawl reveals hundreds of urls with multiple urls in the url string
The latest crawl of my site revealed hundreds of duplicate page content and duplicate page title errors. When I looked it was from a large number of urls with urls appended to them at the end. For example: http://www.test-site.com/page1.html/page14.html or http://www.test-site.com/page4.html/page12.html/page16.html some of them go on for a hundred characters. I am totally stymied, as are the people at my ISP and the person who talked to me on the phone from SEOMoz. Does anyone know what's going on? Thanks So much for any help you can offer! Jean
Technical SEO | | JeanYates0 -
Main vs Pointing URL Confusion
Hi, I have a new website that deals with 3 very similar businesses The main purpose of the website is to assist the consumer to search, compare and book chaffered transportation online. The 3 businesses are taxi cab hire, limo hire, party bus hire. My main domain is www.mytaxicab.co.za. I also own the domains mylimo.co.za and mypartybus.co.za. My question is, from a best practice SEO perspective, how should I set this up. Should I have one main site www.mytaxicab.co.za and have the 2 other niche domain names point in eg user types www.mypartybus.co.za and lands on www.mytaxicab.co.za/party-bus-hire or is there a better way? Any help or insite would be appreciated
Technical SEO | | chirsirwin0