Roger keeps telling me my canonical pages are duplicates
-
I've got a brand spanking new site that I'm trying to get down to zero errors, and I'm basically there except for this odd problem. Roger (Moz's crawler) got into the site like a naughty puppy a bit too early, before I'd put the canonical tags in, so there were a couple thousand 'duplicate content' errors. I added canonical tags (programmatically, so they appear on every page), waited a week, and sure enough 99% of them went away.
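For illustration, here is a minimal Python sketch of emitting the same canonical tag for every case variant of a URL. It assumes the preferred URL is the all-lowercase path with the query string dropped; `canonical_link` is a hypothetical helper, not Valery's actual (ASP.NET) code.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_link(request_url: str) -> str:
    """Build a rel=canonical tag for the requested page.

    Assumption: the preferred URL is the lowercase scheme, host and path,
    with the query string and fragment removed.
    """
    parts = urlsplit(request_url)
    canonical = urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                            parts.path.lower(), "", ""))
    return f'<link rel="canonical" href="{escape(canonical, quote=True)}" />'


print(canonical_link("http://www.site.com/Product-1.aspx?ref=home"))
# prints: <link rel="canonical" href="http://www.site.com/product-1.aspx" />
```

Because every spelling of the path collapses to the same `href`, `Product-1.aspx` and `product-1.aspx` both declare the same preferred page. That tells search engines which URL to prefer, but it doesn't stop a crawler from fetching and reporting both variants.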
However, about 50 are still lingering, and I'm not sure why they're being detected as duplicates. It's an ecommerce site, and the duplicates are being detected on product pages, but why these 50? (There are hundreds of other products that aren't being flagged.) According to the crawl report, the 'duplicate' URLs look like this:
http://www.site.com/Product-1.aspx
http://www.site.com/product-1.aspx
And so on. The canonicals are in place and have been for weeks, and as I said, there are hundreds of other pages just like these without this problem, so I find it odd that these ones won't go away.
All I can think of is that Roger is somehow caching data from previous crawls. According to the crawl report, these duplicates were discovered '1 day ago', but that simply doesn't make sense. It's not a matter of me messing up one or two pages, either: we built this site to be dynamically generated, and all of the SEO elements (canonical, etc.) are applied to every single page regardless of its content.
If anyone can give some insight I'd appreciate it!
-
ThompsonPaul -
Thanks for that info; it pretty much nails exactly what I had discovered independently. This is an IIS7/Win2k8R2 install, so luckily the rewriting is a bit easier than in previous versions. The whole platform is hand-coded by us (after the 10th ecommerce site or so you can generally do them in your sleep), so I don't have to worry about CMS implementation and the like, and luckily we already knew about the spaces, so they simply aren't allowed in filenames. I'm in the middle of writing a regex that will down-case anything in an href="" or src="" attribute, which will hopefully handle every link on the site, user-created or not. I'll consider what to do about external links a bit further down the road.
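A rough Python sketch of that kind of link down-casing follows. It assumes double-quoted `href`/`src` attributes, treats `SITE_HOST` (a hypothetical placeholder) as the only internal host, and leaves external links, non-HTTP schemes, query strings, and fragments untouched, since parameter values can legitimately be case-sensitive. A regex over raw HTML is brittle, so an HTML parser would be the safer choice for user-created content.

```python
import re
from urllib.parse import urlsplit, urlunsplit

SITE_HOST = "www.site.com"  # hypothetical placeholder for your own host name

# href="..." or src="..." (double-quoted only), not preceded by a word
# character or dash, so attributes like data-src are skipped.
ATTR_RE = re.compile(r'(?<![\w-])(href|src)="([^"]*)"', re.IGNORECASE)


def lower_internal_path(url: str) -> str:
    """Lowercase the host and path of an internal URL; return others unchanged."""
    parts = urlsplit(url)
    if parts.scheme and parts.scheme.lower() not in ("http", "https"):
        return url  # mailto:, javascript:, tel:, ...
    if parts.netloc and parts.netloc.lower() != SITE_HOST:
        return url  # external link: left for later
    # Query string and fragment are kept exactly as written.
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path.lower(),
                       parts.query, parts.fragment))


def lowercase_links(html: str) -> str:
    """Rewrite every matching href/src value in an HTML string."""
    return ATTR_RE.sub(
        lambda m: f'{m.group(1)}="{lower_internal_path(m.group(2))}"', html)


page = ('<a href="/Product-1.aspx?Color=Red">x</a>'
        '<img src="/Img/Logo.PNG">'
        '<a href="https://Example.com/Page">y</a>')
print(lowercase_links(page))
```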
-
Valery, you're definitely going to want to normalize your URLs to lowercase. This is a quirk of IIS: it ignores case in URLs, so it serves the same page at every case variant (Product-1.aspx, product-1.aspx, and so on), while search engines and crawlers treat each variant as a separate page.
In addition to the search engine problems it creates, it's also a major usability problem, for you and for your users. For example, every time someone types or links to a URL in a different case, yet another version of the same page ends up in circulation.
More importantly, your Google Analytics will report each of those versions as a separate page unless you write a normalizing filter into your GA profiles. Better to do that normalization on the actual site, not just in your analytics.
While rel=canonical can resolve a number of issues, I've always found it vastly better to correct the actual problem at its root rather than rely on canonicalization as a catch-all. Anecdotally, I've found that pages corrected with rewrites seem to rank better than pages corrected with canonicalization alone. Wish I could find time to do an actual case study on that.
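Fixing this at the root usually means a permanent (301) redirect from any mixed-case URL to its lowercase form. The Python sketch below shows only the decision logic, assuming lowercase is the preferred form; on an IIS site this would normally live in a rewrite rule rather than application code, and `lowercase_redirect` is a hypothetical name.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit, urlunsplit


def lowercase_redirect(url: str) -> Optional[Tuple[int, str]]:
    """Return (301, target) when the URL's host or path has uppercase letters.

    Returns None when the URL is already lowercase, meaning the page
    should be served normally. The query string is preserved as-is.
    """
    parts = urlsplit(url)
    host, path = parts.netloc.lower(), parts.path.lower()
    if host == parts.netloc and path == parts.path:
        return None
    # The fragment is never sent to the server, so it is dropped here.
    target = urlunsplit((parts.scheme, host, path, parts.query, ""))
    return 301, target


for u in ("http://www.site.com/Product-1.aspx",
          "http://www.site.com/product-1.aspx?Size=XL"):
    print(u, "->", lowercase_redirect(u))
```

A 301 rather than a 302 is the usual choice because it signals that the move is permanent, so search engines can consolidate the variants onto one URL.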
Managing rewrites on IIS servers will require a plugin like ISAPI_Rewrite, as IIS doesn't handle it natively.
P.S. IIS will also allow and serve URLs containing spaces. Users in Internet Explorer will see them as normal spaces, but browsers like Firefox will show the percent-encoded form of a space (%20) in each of those spots in the URL. This is again a mess for usability, so it's much better to force all URLs to replace spaces with dashes when new pages are created. Many CMSs have plugins for this, or you can use sitewide rewrites to do it after the fact.
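For new pages, the space-to-dash rule can be applied when the URL is first generated. A minimal Python sketch, assuming URLs should contain only lowercase ASCII letters, digits, and dashes (so accented characters are simply dropped, which may not suit every site); `slugify` is a hypothetical helper, not a specific CMS plugin.

```python
import re


def slugify(name: str) -> str:
    """Turn a page name into a lowercase, dash-separated URL segment."""
    slug = name.strip().lower()
    slug = re.sub(r"\s+", "-", slug)        # spaces (and runs of them) -> one dash
    slug = re.sub(r"[^a-z0-9-]", "", slug)  # drop anything not a-z, 0-9 or dash
    slug = re.sub(r"-{2,}", "-", slug)      # collapse repeated dashes
    return slug.strip("-")


print(slugify("Blue Widget  Large"))
print(slugify("  Tea Cups & Saucers "))
```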
-
I think I get your point: the canonical points to where the juice should go, but the URLs are still functionally different things. I'm guessing some sort of URL rewrite is in order, along with standardizing how I do in-text links on the site (with user-editable content, this part could be a pain).
-
Hey Valery,
I see those on closer inspection. I know it looks weird, but that's accurate. URL paths are case-sensitive, so a crawler treats URLs that differ only in case as different pages, even when your server returns the same content for both.
For example: banana.com/pancakes.html would be treated differently than banana.com/PanCakes.html.
So if you have any pages, generated dynamically or otherwise, that differ only in case, they will be tagged as duplicates.
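To confirm which flagged URLs differ only in case, the URLs from the crawl report can be grouped by their lowercased form. A minimal Python sketch; extracting the URL column from the CSV export is left out, and `case_only_duplicates` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def case_only_duplicates(urls: Iterable[str]) -> Dict[str, List[str]]:
    """Group URLs that are identical except for letter case."""
    groups = defaultdict(set)
    for url in urls:
        groups[url.lower()].add(url)
    # Keep only lowercase keys reached by two or more distinct spellings.
    return {key: sorted(variants)
            for key, variants in groups.items() if len(variants) > 1}


crawled = [
    "http://www.site.com/Product-1.aspx",
    "http://www.site.com/product-1.aspx",
    "http://www.site.com/Product-2.aspx",
]
print(case_only_duplicates(crawled))
```

Any flagged pair that does not show up in this grouping is a genuine content-similarity issue rather than a case issue.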
In your CSV export you can see the duplicates caused by case. I'd also be happy to provide a few specific examples, but I'd want to open a ticket for you so we don't divulge any private information.
Cheers,
Joel.
-
Joel -
Thanks a lot for looking into that. The pages are very similar, so I'm not surprised they're triggering duplicate warnings, but what does surprise me is that they are apparently being considered duplicates of a canonical version of themselves. When I click on the duplicate list, I expect to see:
Product1.aspx
Product1-Blue.aspx
Product1-Red.aspx
But instead I'm seeing:
Product1.aspx
product1.aspx
product1.ASPX
And so on. To me, the first scenario implies that the 3 pages are duplicates of each other, whereas the second says that there's either a canonical problem or I literally have different-case versions of those files.
-
Hi Valery,
I took a peek at your campaign, and it looks like those few remaining duplicate pages are in fact different, but only in very minor ways. Basically, there are pages for different sizes of things.
Even though they're different, they vary in such minute ways that Roger sees them as duplicates.
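The thread doesn't say how Roger measures similarity, so the following is only a stand-in technique: a Python sketch that uses the standard library's `difflib.SequenceMatcher` similarity ratio to flag pages whose text is nearly identical. The 0.9 threshold and the `is_near_duplicate` helper are assumptions for illustration, not Roger's actual algorithm or cutoff.

```python
from difflib import SequenceMatcher


def is_near_duplicate(text_a: str, text_b: str, threshold: float = 0.9) -> bool:
    """Return True when the two texts' similarity ratio is at least threshold.

    SequenceMatcher.ratio() returns a value between 0.0 (nothing in common)
    and 1.0 (identical). The 0.9 default is an arbitrary assumption.
    """
    return SequenceMatcher(None, text_a, text_b).ratio() >= threshold


small = "Acme widget. Durable steel body, ships in 2 days. Size: Small"
large = "Acme widget. Durable steel body, ships in 2 days. Size: Large"
print(is_near_duplicate(small, large))
print(is_near_duplicate(small, "Contact us by phone or email."))
```

Two size variants whose only difference is a single word score well above the threshold, which matches the kind of result described above.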
I hope that answers the question.
Thanks,
Joel.