Canonical URLs and Sitemaps
-
We are using canonical link tags on product pages whose on-site URLs contain category names, while the canonical URL does not. So, the same product page appears on the site as www.example.com/clothes/skirts/skater-skirt-12345 and, in another category, as www.example.com/sale/clearance/skater-skirt-12345. On both of these pages, the canonical link tag references a third URL, www.example.com/skater-skirt-12345. This third URL is a valid page and displays the same content as the other two versions, but there are no actual links to this generic version anywhere on the site (or externally).
Questions:
1. Does the generic URL referenced in the canonical link also need to be included as on-page links somewhere in the crawled navigation of the site, or is it okay to be just a valid URL not linked anywhere except for the canonical tags?
2. In our sitemap, is it okay to reference the non-canonical URLs, or does the sitemap have to reference only the canonical URL? In our case, the sitemap points to yet another variation of the URL, like www.example.com/product.jsp?productID=12345. This page retrieves the same content as the others, and includes a canonical link tag back to www.example.com/skater-skirt-12345. Is this a valid approach, or should we revise the sitemap to point to either the category-specific links or the canonical links?
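To make the setup concrete, here is a minimal Python sketch of the mapping described above: every category-specific URL collapses to the same generic URL, which is then emitted in the canonical link tag. The function names are hypothetical, and it assumes the product slug is always the last path segment, as in the example URLs. The product.jsp?productID=12345 variant could not be mapped this way; it would need a lookup from product ID to slug.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_url(page_url: str) -> str:
    """Drop the category path segments and keep only the final product slug.

    Assumption: the slug is the last path segment of the page URL.
    """
    parts = urlsplit(page_url)
    slug = parts.path.rstrip("/").rsplit("/", 1)[-1]
    # Query string and fragment are discarded on purpose.
    return urlunsplit((parts.scheme, parts.netloc, "/" + slug, "", ""))


def canonical_link_tag(page_url: str) -> str:
    """Render the <link rel="canonical"> tag for a category-specific page."""
    href = escape(canonical_url(page_url), quote=True)
    return f'<link rel="canonical" href="{href}">'


if __name__ == "__main__":
    for url in (
        "http://www.example.com/clothes/skirts/skater-skirt-12345",
        "http://www.example.com/sale/clearance/skater-skirt-12345",
    ):
        print(canonical_link_tag(url))
```

Both category URLs produce the same tag, which is the behavior the question describes.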
-
Thanks. And since we've now implemented the aforementioned changes, I can give some findings back.
What we did: We changed our sitemap to point to the same canonical URLs as are referenced in the tags on our product pages (only one entry in sitemap per product).
What we didn't do: We didn't change the product pages themselves. They still have a canonical URL link reference, pointing to a URL with no category paths, which does not naturally occur in the navigation of the site (on the site, product pages all have category paths in the URL).
Findings: After submitting the new sitemap, the stats in Google Webmaster Tools indicate that almost all (> 96%) of our product pages are indexed. We believe that the pages were already indexed (for the most part) and that the sitemap is now useful for metrics. From the timing, it's unlikely that the sitemap itself caused our index stats to get significantly better in just 1 day. Possible, but unlikely. In either case, since our product pages still reference canonical URLs which don't exist in the site's navigation, the evidence suggests that the canonical link itself is enough, and an actual navigation path to the canonical version of the page is not needed. That's just empirical evidence; we have no inside info on Google's methods, but this is what we believe now after monitoring.
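For anyone wanting to reproduce the "one entry in sitemap per product" change, a rough Python sketch follows. It is not the code we actually run; build_sitemap is a hypothetical helper, and it assumes you already have the list of canonical URLs. Only the required loc element is written.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(canonical_urls):
    """Return sitemap XML with exactly one <url> entry per canonical URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    # dict.fromkeys removes duplicates while keeping the original order,
    # so a product reachable through several categories is listed once.
    for url in dict.fromkeys(canonical_urls):
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    # encoding="unicode" returns a str without the XML declaration;
    # prepend one when writing the result to a file.
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_sitemap([
        "http://www.example.com/skater-skirt-12345",
        "http://www.example.com/skater-skirt-12345",  # duplicate, dropped
    ]))
```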
-
With the canonical tag in place, I'm guessing that extra link would basically be ignored. It's probably harmless, but I'm not sure it will do anything. You could create an HTML "sitemap" (or even an XML sitemap) with the canonical URLs. It's not my first choice, but it at least would give Google an extra push.
-
We're in the process of updating our canonical tagging and our sitemap, based on the feedback here. I have a question for the group though. Unfortunately we can't follow Andy Smith's suggestion of creating a "By Brand" navigation section on the site, since this web site is all private label (they sell all products under their own brand name).
One possible solution is to create a user-accessible site map page, with an "all products" paginated section, where all these product page URLs would be the canonical version.
But another possible solution, easier to implement, would be to have a user accessible link on each product page to the canonical version of itself. That is, when the user is on www.example.com/clothes/skirts/skater-skirt-12345, there would be a link to www.example.com/skater-skirt-12345, which would also be the URL specified in the canonical tag.
This seems redundant, but our results so far suggest that a canonical tag pointing to a URL which doesn't exist anywhere in the navigation isn't having the desired effect. So, the thought is that a combination of the canonical tag, plus a "real" link to that same URL referenced in the canonical tag, would better inform the search engine robots. Our hesitation is whether it makes sense for this link to be on the product page itself (i.e. the non-canonical version).
Any thoughts or feedback on approach?
-
Thanks for the responses. I've been monitoring for the past couple of weeks with the current sitemap and canonical structure, and so far the data seems consistent with the replies to this thread. In GWT, the sitemap stats show less than 1% of the URLs submitted are indexed so far. We have an action plan now to update the canonical structure and the sitemap to point to URLs which will be naturally crawled on the site as well.
-
There's no "have to" in most of these situations, but it boils down to this - the more canonical your canonical URL actually is, the better chance you have of Google honoring it. In other words, if you set a canonical tag but then never use that in internal links or your XML sitemap, odds are pretty good that Google may ignore the tag in some cases. You're basically saying "Hey, this URL is canonical! No, this one is! No, this one!" - it's a mixed message, and they're going to try to interpret it algorithmically.
I definitely think pointing to yet another version in the XML sitemap is a problem. Ideally, it would be great to unify your URLs, but if that's not possible, getting the canonical version in the sitemap would be a big help (and introducing yet another variant isn't good, so you'd kill two birds with one stone). As Andy said, if you could create some kind of internal link to the canonical version, even if it's not the main link, that could also help. I only hesitate on that one, because you don't want to end up with a weird, artificial linking structure (just creating links to have links).
Please note, this isn't necessarily a disaster the way you have it. Google could honor the tags properly and generally rank your site correctly. In my experience, though, it's a recipe for long-term problems, and it's worth fixing.
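If you want to check for that mixed message yourself, something like the Python sketch below would do it: it flags any sitemap URL whose page declares a different canonical URL. Treat it as an illustration under assumptions, not a finished tool. The names are made up, and pages is assumed to be a dict of URL to HTML that you have already fetched by whatever means you prefer.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Record the href of the first <link rel="canonical"> tag seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rels and self.canonical is None:
            self.canonical = attr.get("href")


def find_canonical(html_text):
    """Return the canonical href declared in html_text, or None."""
    finder = CanonicalFinder()
    finder.feed(html_text)
    return finder.canonical


def mixed_signals(sitemap_urls, pages):
    """Return the sitemap URLs that are not their own canonical URL.

    pages maps each URL to its HTML (hypothetical, pre-fetched input).
    A URL with no canonical tag, or no HTML at all, is also reported.
    """
    return [u for u in sitemap_urls if find_canonical(pages.get(u, "")) != u]
```

In the setup described in the question, www.example.com/product.jsp?productID=12345 would be reported, because its canonical tag points at www.example.com/skater-skirt-12345 instead of at itself.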
-
The purpose of the canonical tag is to tell Google which of several duplicate URLs is the preferred version to index. So, on that note, I usually point the canonical tag at the strongest page in terms of PageRank, as this shows which page is linked to the best.
I'm also guessing you're using a framework/platform like Magento, which can make linking quite difficult. I often suggest creating Brand pages and linking to the product page, the "3rd URL", from there. Brand pages are also great for SEO, as most people search for brands first. They're a great place to get some fat head keywords in.
Also, make sure you put in the http:// as well; I think it is good practice to put the full URL in the canonical tag.
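A quick Python illustration of why the full URL matters, using the standard urljoin to approximate how a crawler resolves an href against the page it was found on. The resolve_canonical name is just for this example; how any particular search engine handles such values is not something this sketch can confirm.

```python
from urllib.parse import urljoin


def resolve_canonical(page_url: str, href: str) -> str:
    """Resolve a canonical href relative to the page that declares it."""
    return urljoin(page_url, href)


page = "http://www.example.com/clothes/skirts/skater-skirt-12345"

# Full URL with scheme: resolves to itself, no ambiguity.
print(resolve_canonical(page, "http://www.example.com/skater-skirt-12345"))

# Scheme omitted: the value is treated as a relative path, producing
# http://www.example.com/clothes/skirts/www.example.com/skater-skirt-12345
print(resolve_canonical(page, "www.example.com/skater-skirt-12345"))
```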