Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Canonical URLs and Sitemaps
-
We are using canonical link tags for product pages in a scenario where the URLs on the site contain category names, and the canonical URL points to a URL which does not contain the category names. So, the product page on the site is like www.example.com/clothes/skirts/skater-skirt-12345, and also like www.example.com/sale/clearance/skater-skirt-12345 in another category. And on both of these pages, the canonical link tag references a 3rd URL like www.example.com/skater-skirt-12345. This 3rd URL, used in the canonical link tag is a valid page, and displays the same content as the other two versions, but there are no actual links to this generic version anywhere on the site (nor external).
Questions:
1. Does the generic URL referenced in the canonical link also need to be included as on-page links somewhere in the crawled navigation of the site, or is it okay to be just a valid URL not linked anywhere except for the canonical tags?
2. In our sitemap, is it okay to reference the non-canonical URLs, or does the sitemap have to reference only the canonical URL? In our case, the sitemap points to yet a 3rd variation of the URL, like www.example.com/product.jsp?productID=12345. This page retrieves the same content as the others, and includes a canonical link tag back to www.example.com/skater-skirt-12345. Is this a valid approach, or should we revise the sitemap to point to either the category-specific links or the canonical links?
-
Thanks. And since we've now implemented the aforementioned changes, I can give some findings back.
What we did: We changed our sitemap to point to the same canonical URLs as are referenced in the tags on our product pages (only one entry in sitemap per product).
What we didn't do: We didn't change the product pages themselves. They still have a canonical URL link reference, pointing to a URL with no category paths, which does not naturally occur in the navigation of the site (on the site, product pages all have category paths in the URL).
Findings: After submitting the new sitemap, the stats in Google Webmasters Tools indicate that almost all (> 96%) of our product pages are indexed. We believe that the pages were already indexed (for the most part) and now the sitemap is useful for metrics. From the timing, it's unlikely that the sitemap itself caused our index stats to get significantly better in just 1 day. Possible, but unlikely. In either case, since our product page URLs still reference canonical links which don't exist in the site's navigation, the evidence suggests that the canonical link itself is enough, and an actual navigation path to the canonical version of the page is not needed. That's just empirical evidence, we have no inside info on Google's methods, but this is what we believe now after monitoring.
-
With the canonical tag in place, I'm guessing that extra link would basically be ignored. It's probably harmless, but I'm not sure it will do anything. You could create an HTML "sitemap" (or even an XML sitemap) with the canonical URLs. It's not my first choice, but it at least would give Google an extra push.
-
We're in process of updating our canonical tagging and our sitemap, based on the feedback here. I have a question for the group though. Unfortunately we can't follow Andy Smith's suggestion of creating a "By Brand" navigation section on the site, since this web site is all private label (they sell all products under their own brand name).
One possible solution is to create a user-accessible site map page, with an "all products" paginated section, where all these product page URLs would be the canonical version.
But another possible solution, easier to implement, would be to have a user accessible link on each product page to the canonical version of itself. That is, when the user is on www.example.com/clothes/skirts/skater-skirt-12345, there would be a link to www.example.com/skater-skirt-12345, which would also be the URL specified in the canonical tag.
This seems redundant, but our results so far have borne out that the canonical tag pointing to a URL which doesn't really exist anywhere in the navigation doesn't seem to be having the desired effect. So, the thought is that a combination of the canonical tag, plus a "real" link to that same URL referenced in the canonical tag would better inform the search engine robots. But our hesitation is whether it should work for this link to be on the product page itself (e.g. the non-canonical version).
Any thoughts or feedback on approach?
-
Thanks for the responses. I've been monitoring for the past couple of weeks with the current sitemap and canonical structure, and so far the data seems consistent with the replies to this thread. In GWT, the sitemap stats show less than 1% of the URLs submitted are indexed so far. We have an action plan now to update the canonical structure and the sitemap to point to URLs which will be naturally crawled on the site as well.
-
There's no "have to" in most of these situations, but it boils down to this - the more canonical your canonical URL actually is, the better chance you have of Google honoring it. In other words, if you set a canonical tag but then never use that in internal links or your XML sitemap, odds are pretty good that Google may ignore the tag in some cases. You're basically saying "Hey, this URL is canonical! No, this one is! No, this one!" - it's a mixed message, and they're going to try to interpret it algorithmically.
I definitely think pointing to yet another version in the XML sitemap is a problem. Ideally, it would be great to unify your URLs, but if that's not possible, getting the canonical version in the sitemap would be a big help (and introducing yet another variant isn't good, so you'd kill two birds with one stone). As Andy said, if you could create some kind of internal link to the canonical version, even if it's not the main link, that could also help. I only hesitate on that one, because you don't want to end up with a weird, artificial linking structure (just creating links to have links).
Please note, this isn't necessarily a disaster the way you have it. Google could honor the tags properly and generally rank your site correctly. In my experience, though, it's a recipe for long-term problems, and it's worth fixing.
-
The purpose of the canonical tag is to tell Google which page to index first. So, on that note, I usually use the canonical tag on the strongest page in terms of pagerank, as this shows which page is linked to the best.
I'm also guessing you're using a framwork/platform like Magento, this can make linking quite difficult. I often suggest creating Brand pages, and link to the product page, the "3rd URL", from there. Brand pages also great for SEO, as most people search for brands first. Great place to get some fat head keywords in.
Also, make sure you put in the http:// as well, I think it is good practice to put in the full URL.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Pending Sitemaps
Hi, all Wondering if someone could give me a pointer or two, please. I cannot seem to get Google or Bing to crawl my sitemap. If I submit the sitemap in WMT and test it I get a report saying 44,322urls found. However, if I then submit that same sitemap it either says Pending (in old WMT) or Couldn't fetch in the new version. This couldn't fetch is very puzzling as it had no issue fetching the map to test it. My other domains on the same server are fine, the problem is limited to this one site. I have tried several pages on the site using the Fetch as Google tool and they load without issue, however, try as I may, it will not fetch my sitemap. The sitemapindex.xml file won't even submit. I can confirm my sitemaps, although large, work fine, please see the following as an example (minus the spaces, of course, didn't want to submit and make it look like I was just trying to get a link) https:// digitalcatwalk .co.uk/sitemap.xml https:// digitalcatwalk .co.uk/sitemapindex.xml I would welcome any feedback anyone could offer on this, please. It's driving me mad trying to work out what is up. Many thanks, Jeff
Intermediate & Advanced SEO | | wonkydogadmin0 -
Canonical and Alternate Advice
At the moment for most of our sites, we have both a desktop and mobile version of our sites. They both show the same content and use the same URL structure as each other. The server determines whether if you're visiting from either device and displays the relevant version of the site. We are in a predicament of how to properly use the canonical and alternate rel tags. Currently we have a canonical on mobile and alternate on desktop, both of which have the same URL because both mobile and desktop use the same as explained in the first paragraph. Would the way of us doing it at the moment be correct?
Intermediate & Advanced SEO | | JH_OffLimits3 -
Will disallowing URL's in the robots.txt file stop those URL's being indexed by Google
I found a lot of duplicate title tags showing in Google Webmaster Tools. When I visited the URL's that these duplicates belonged to, I found that they were just images from a gallery that we didn't particularly want Google to index. There is no benefit to the end user in these image pages being indexed in Google. Our developer has told us that these urls are created by a module and are not "real" pages in the CMS. They would like to add the following to our robots.txt file Disallow: /catalog/product/gallery/ QUESTION: If the these pages are already indexed by Google, will this adjustment to the robots.txt file help to remove the pages from the index? We don't want these pages to be found.
Intermediate & Advanced SEO | | andyheath0 -
Image URLs - best practice
Hi - I'm assuming image URL best practice follows same principles as non image URLs (not too many files and so on) - I notice alot of web devs putting photos in subdomains, so wonder if I'm missing something (I usually avoid subdomains like the plague)!
Intermediate & Advanced SEO | | McTaggart1 -
Hreflang in vs. sitemap?
Hi all, I decided to identify alternate language pages of my site via sitemap to save our development team some time. I also like the idea of having leaner markup. However, my site has many alternate language and country page variations, so after creating a sitemap that includes mostly tier 1 and tier 2 level URLs, i now have a sitemap file that's 17mb. I did a couple google searches to see is sitemap file size can ever be an issue and found a discussion or two that suggested keeping the size small and a really old article that recommended keeping it < 10mb. Does the sitemap file size matter? GWT has verified the sitemap and appears to be indexing the URLs fine. Are there any particular benefits to specifying alternate versions of a URL in vs. sitemap? Thanks, -Eugene
Intermediate & Advanced SEO | | eugene_bgb0 -
Sitemap on a Subdomain
Hi, For various reasons I placed my sitemaps on a subdomain where I keep images and other large files (static.example.com). I then submitted this to Google as a separate site in Webmaster tools. Is this a problem? All of the URLs are for the actual site (www.example.com), the only issue on my end is not being able to look at it all at the same time. But I'm wondering if this would cause any problems on Google's end.
Intermediate & Advanced SEO | | enotes0 -
Set up a rel canonical
I have a question. I was wondering, if it was possible to set up a rel canonical. When I can't access the non canonical pages? For example, my site as at www.site.com , but the non cannocail is at site.com is their any way to set thet up without actually edting it at site.com ? Thanks for your help
Intermediate & Advanced SEO | | PeterRota0 -
301 redirect with /? in URL
For a Wordpress site that has the ending / in the URL with a ? after it... how can you do a 301 redirect to strip off anything after the / For example how to take this URL domain.com/article-name/?utm_source=feedburner and 301 to this URL domain.com/article-name/ Thank you for the help
Intermediate & Advanced SEO | | COEDMediaGroup0