Include Cross Domain Canonical URL's in Sitemap - Yes or No?
-
I have several sites that have cross domain canonical tags setup on similar pages. I am unsure if these pages that are canonicalized to a different domain should be included in the sitemap. My first thought is no, because I should only include pages in the sitemap that I want indexed.
On the other hand, if I include ALL pages on my site in the sitemap, once Google gets to a page that has a cross domain canonical tag, I'm assuming it will just note that and determine if the canonicalized page is the better version. I have yet to see any errors in GWT about this. I have seen errors where I included a 301 redirect in my sitemap file. I suspect its ok, but to me, it seems that Google would rather not find these URL's in a sitemap, have to crawl them time and time again to determine if they are the best page, even though I'm indicating that this page has a similar page that I'd rather have indexed.
-
I looked at the sitemap, and they are including the http://www.seomoz.org/blog/the-story-of-seomoz but not the canonical page - http://www.masternewmedia.org/entrepreneurship-the-full-story-of-seomoz-told-by-rand-fishkin/
So based on this example, the page on SEOMoz is still included in the sitemap, regardless if it has a canonical or not.
This seems to make sense, since canonical links are used only as a hint and not an absolute directive.
I also noticed that Google is choosing to index and rank both pages, on Page 1.
SEOMoz is ranking higher on my browser for "the full story of seomoz". A few things going on here.
-
Why is google choosing to rank SEOMoz higher than Mastermedia.org for this page? There's a canonical setup, but google is choosing not to follow it. (again its a hint not an absolute) this doesn't always work.
-
I would think Google would be able to filter out the duplicate content easy. In this example, they are clearly not. SEOMoz is ranking #4 and Masternewmedia.org is ranking #5 for query "the full story of seomoz"
-
-
Right - as far as I know, you're supposed to put end URLs into a sitemap, not urls which 301 redirect. Cross domain canonical is still kind of new, but I would treat them as a 301 redirect and not include them in a sitemap.
Now, if you're curious, SEO Moz did a whiteboard Friday where they talked about this same exact issue (cross domain canonical), and as an experiment, re-posted a blog article from another blogger on SEO Moz.
http://www.seomoz.org/blog/cross-domain-canonical-the-new-301-whiteboard-friday
http://www.seomoz.org/blog-sitemap.xml
http://www.seomoz.org/blog/the-story-of-seomoz
The blog is still included in the blog sitemap. I think it probably won't 'hurt' to keep those pages in the sitemap, since a lot of sitemaps automatically generated CMS tools won't have been updated to deal with this yet.
-
There is no BIG problem if you add the pages that contain cross domain canonical tag on them. Why?
The reason why I can say this is because Google is not only indexing the pages from sitemap.xml file, Google have their own crawler and they have the ability to crawl and index the website no matter if you do not have an xml sitemap.
Google is very good at (in my opinion) picking the instructions that are available on the page so if you add the page in the xml sitemap, the crawler will read the instructions on the page and will only index the page that contain original content.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
I need help on how best to do a complicated site migration. Replacing certain pages with all new content and tools, and keeping the same URL's. The rest just need to disappear safely. Somehow.
I'm completely rebranding a website but keeping the same domain. All content will be replaced and it will use a different theme and mostly new plugins. I've been building the new site as a different site in Dev mode on WPEngine. This means it currently has a made-up domain that needs to replace the current site. I know I need to somehow redirect the content from the old version of the site. But I'm never going to use that content again. (I could transfer it to be a Dev site for the current domain and automatically replace it with the click of a button - just as another option.) What's the best way to replace blahblah.com with a completely new blahblah.com if I'm not using any of the old content? There are only about 4 URL'st, such as blahblah.com/contact hat will remain the same - with all content replaced. There are about 100 URL's that will no longer be in use or have any part of them ever used again. Can this be done safely?
Intermediate & Advanced SEO | | brickbatmove1 -
301ing one site's links to another
Hi, I have one site with a well-established link profile, but no actual reason to exist (site A). I have another site that could use a better link profile (site B). In your experience, would 301 forwarding all of site A's pages to site B do anything positive for the link profile/organic search of the site B? Site A is about boating at a specific lake. Site B is about travel destinations across the U.S. Thanks! Best... Michael
Intermediate & Advanced SEO | | 945010 -
All urls seem to exist (no 404 errors) but they don't.
Hello I am doing a SEO auditing for a website which only has a few pages. I have no cPanel credentials, no FTP no Wordpress admin account, just watching it from the outside. The site works, the Moz crawler didn't report any problem, I can reach every page from the menu. The problem is that - except for the few actual pages - no matter what you type after the domain name, you always reach the home page and don't get any 404 error. I.E. Http://domain.com/oiuxyxyzbpoyob/ (there is no such a page, but i don't get 404 error, the home is displayed and the url in the browser remains Http://domain.com/oiubpoyob/, so it's not a 301 redirect). Http://domain.com/WhatEverYouType/ (same) Could this be an important SEO issue (i.e. resulting in infinite amount of duplicate content pages )? Do you think I should require the owner to prevent this from happening? Should I look into the .htaccess file to fix it ? Thank you Mozers!
Intermediate & Advanced SEO | | DoMiSoL0 -
Why do people put xml sitemaps in subfolders? Why not just the root? What's the best solution?
Just read this: "The location of a Sitemap file determines the set of URLs that can be included in that Sitemap. A Sitemap file located at http://example.com/catalog/sitemap.xml can include any URLs starting with http://example.com/catalog/ but can not include URLs starting with http://example.com/images/." here: http://www.sitemaps.org/protocol.html#location Yet surely it's better to put the sitemaps at the root so you have:
Intermediate & Advanced SEO | | McTaggart
(a) http://example.com/sitemap.xml
http://example.com/sitemap-chocolatecakes.xml
http://example.com/sitemap-spongecakes.xml
and so on... OR this kind of approach -
(b) http://example/com/sitemap.xml
http://example.com/sitemap/chocolatecakes.xml and
http://example.com/sitemap/spongecakes.xml I would tend towards (a) rather than (b) - which is the best option? Also, can I keep the structure the same for sitemaps that are subcategories of other sitemaps - for example - for a subcategory of http://example.com/sitemap-chocolatecakes.xml I might create http://example.com/sitemap-chocolatecakes-cherryicing.xml - or should I add a sub folder to turn it into http://example.com/sitemap-chocolatecakes/cherryicing.xml Look forward to reading your comments - Luke0 -
Interesting Cross Domain Canonical Quirk...
We recently ran cross domain canonicals for 2 of our websites. What's interesting is that when I do a search for ""site:domain1.com "product name"" the Title in the SERPs uses the Domain Name from the site the page has been canonicaled to. So the title for Domain1 (for the search term above) looks like this: Product Name | Keywords | Domain 2 Interesting quirk. Ha anyone else seen this?
Intermediate & Advanced SEO | | AMHC0 -
Canonical Issue with urls
I saw some urls of my site showing duplicate page content, duplicate page title issues on crawl reports. So I have set canonical url for every urls , that has dupicate content / page title. But still SeoMoz crawl test is showing issue. I am giving here one url with issue. The below given urls shown duplicate content and duplicate page title with some other urls all are given below. Checked URL http://www.cyrusrugs.com/bridge-traditional-area-rug-item-7635 dup page content http://www.cyrusrugs.com/bridge-traditional-area-rug-item-7622&category_id=270&colors=Black_Tones&click=colors&ci=1
Intermediate & Advanced SEO | | trixmediainc
http://www.cyrusrugs.com/bridge-traditional-area-rug-item-7622 dup page Title http://www.cyrusrugs.com/bridge-traditional-area-rug-item-7636&category_id=270&sizes=12x15,12x18&click=sizes
http://www.cyrusrugs.com/bridge-traditional-area-rug-item-7636
http://www.cyrusrugs.com/bridge-traditional-area-rug-item-7622&category_id=270&colors=Black_Tones&click=colors&ci=1
http://www.cyrusrugs.com/bridge-traditional-area-rug-item-7622 But I have set canonical url for all these urls already , that is :- http://www.cyrusrugs.com/bridge-traditional-area-rug-item-7622 This should actually solve the problem right ? Search engine should identify the canonical url as original url and only should consider that. Thanks0 -
301 canonical'd pages?
I have an ecommerce site with many different URLs with the same product. Let's say the product is a hat. It's in: a a) mysite.com/products/hat b) mysite.com/collections/head-ware/hat c) mysite.com/collections/stuff-to-wear-on-your-head/hat Right now, A is the canonical page for B and C. I want to clean up my site, so that every product only has ONE unique URL, which is linked to from all the collections. So B and C URL will be broken. Is it necessary that I 301 them if they were already canonical'd? Based on the number of products I have, I would have to 301 1000+ URLs. I'm just trying to figure out what I need to do to avoid getting penalized. thanks
Intermediate & Advanced SEO | | birchlore0 -
Need to duplicate the index for Google in a way that's correct
Usually duplicated content is a brief to fix. I find myself in a little predicament: I have a network of career oriented websites in several countries. the problem is that for each country we use a "master" site that aggregates all ads working as a portal. The smaller nisched sites have some of the same info as the "master" sites since it is relevant for that site. The "master" sites have naturally gained the index for the majority of these ads. So the main issue is how to maintain the ads on the master sites and still make the nische sites content become indexed in a way that doesn't break Google guide lines. I can of course fix this in various ways ranging from iframes(no index though) and bullet listing and small adjustments to the headers and titles on the content on the nisched sites, but it feels like I'm cheating if I'm going down that path. So the question is: Have someone else stumbled upon a similar problem? If so...? How did you fix it.
Intermediate & Advanced SEO | | Gustav-Northclick0