Moz Q&A is closed.
After more than 13 years and tens of thousands of questions, Moz Q&A closed on 12th December 2024. We are not completely removing the content, and many posts will still be viewable, but we have locked both new posts and new replies.
Include Cross-Domain Canonical URLs in Sitemap - Yes or No?
-
I have several sites that have cross-domain canonical tags set up on similar pages. I am unsure whether pages that are canonicalized to a different domain should be included in the sitemap. My first thought is no, because I should only include pages in the sitemap that I want indexed.
On the other hand, if I include ALL pages on my site in the sitemap, I assume that when Google reaches a page with a cross-domain canonical tag, it will note the tag and decide whether the canonical target is the better version. I have yet to see any errors in GWT (Google Webmaster Tools) about this, although I have seen errors when I included a 301-redirected URL in my sitemap file. I suspect it's OK, but it seems to me that Google would rather not find these URLs in a sitemap and have to crawl them again and again to decide whether they are the best page, when I'm already indicating that there is a similar page I'd rather have indexed.
-
I looked at the sitemap, and it includes http://www.seomoz.org/blog/the-story-of-seomoz but not the canonical page, http://www.masternewmedia.org/entrepreneurship-the-full-story-of-seomoz-told-by-rand-fishkin/
So based on this example, the page on SEOmoz is still included in the sitemap, regardless of whether it has a canonical tag.
This seems to make sense, since canonical links are treated only as a hint and not as an absolute directive.
I also noticed that Google is choosing to index and rank both pages on page 1.
SEOmoz ranks higher in my browser for "the full story of seomoz". A few things are going on here.
-
Why is Google choosing to rank SEOmoz higher than Masternewmedia.org for this page? There's a canonical set up, but Google is choosing not to follow it. Again, it's a hint and not an absolute directive, so it doesn't always work.
-
I would think Google would be able to filter out the duplicate content easily. In this example, it clearly is not: SEOmoz ranks #4 and Masternewmedia.org ranks #5 for the query "the full story of seomoz".
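For anyone who wants to repeat the sitemap check from the start of this reply, here is a minimal sketch in Python. It assumes you have already downloaded the sitemap file; the `SAMPLE` string and the `sitemap_urls` helper are made up for illustration and are not anything SEOmoz or Google provides.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml: str) -> set:
    """Return the set of <loc> URLs listed in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return {
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    }


# Hypothetical stand-in for the contents of a downloaded sitemap file.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.seomoz.org/blog/the-story-of-seomoz</loc></url>
</urlset>"""

listed = sitemap_urls(SAMPLE)
page = "http://www.seomoz.org/blog/the-story-of-seomoz"
print(page in listed)
```

This only tells you whether a URL is listed in the file. It says nothing about which version Google decides to index or rank.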
-
-
Right - as far as I know, you're supposed to put end URLs into a sitemap, not URLs which 301 redirect. Cross-domain canonical is still kind of new, but I would treat those pages like 301 redirects and not include them in a sitemap.
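If you build your own sitemap, one way to apply that rule is to look at each page's canonical tag before listing it. The following is only a rough sketch using the Python standard library: `CanonicalFinder`, `canonical_of` and `belongs_in_sitemap` are hypothetical names, and it assumes you already have each page's HTML in hand. It only skips pages whose canonical points at a different host; same-site canonicals are left alone.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> in a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if "canonical" in rels and attr.get("href"):
            self.canonical = attr["href"]


def canonical_of(page_url, html):
    """Return the absolute canonical URL declared in html, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    if finder.canonical is None:
        return None
    # Resolve relative hrefs against the page's own URL.
    return urljoin(page_url, finder.canonical)


def belongs_in_sitemap(page_url, html):
    """False when the page's canonical points at a different host."""
    canonical = canonical_of(page_url, html)
    if canonical is None:
        return True
    return urlsplit(canonical).hostname == urlsplit(page_url).hostname
```

For example, calling `belongs_in_sitemap` with the SEOmoz blog URL and HTML whose canonical link points at masternewmedia.org returns `False`, so that page would be left out of the generated sitemap.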
Now, if you're curious, SEOmoz did a Whiteboard Friday where they talked about this exact issue (cross-domain canonical) and, as an experiment, re-posted a blog article from another blogger on SEOmoz.
http://www.seomoz.org/blog/cross-domain-canonical-the-new-301-whiteboard-friday
http://www.seomoz.org/blog-sitemap.xml
http://www.seomoz.org/blog/the-story-of-seomoz
The blog post is still included in the blog sitemap. I think it probably won't 'hurt' to keep those pages in the sitemap, since a lot of sitemaps are generated automatically by CMS tools that won't have been updated to deal with this yet.
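If your CMS generates the sitemap and you would still rather drop those pages, you could post-process the file instead of changing the CMS. This is a minimal sketch, assuming a standard single-file sitemap (not a sitemap index) and a list of URLs you have already decided to exclude; `prune_sitemap` is a made-up helper name and the example.com URLs are placeholders.

```python
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Keep the default namespace on output instead of an "ns0:" prefix.
ET.register_namespace("", NS)


def prune_sitemap(sitemap_xml, excluded):
    """Return the sitemap XML with every <url> whose <loc> is in excluded removed."""
    root = ET.fromstring(sitemap_xml)
    # findall returns a list, so removing entries while looping is safe.
    for entry in root.findall(f"{{{NS}}}url"):
        loc = entry.find(f"{{{NS}}}loc")
        if loc is not None and (loc.text or "").strip() in excluded:
            root.remove(entry)
    return ET.tostring(root, encoding="unicode")


# Placeholder input standing in for a CMS-generated sitemap.
generated = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>http://example.com/a</loc></url>"
    "<url><loc>http://example.com/b</loc></url>"
    "</urlset>"
)
print(prune_sitemap(generated, {"http://example.com/b"}))
```

The excluded set could come from whatever check you use to spot cross-domain canonicals or redirects.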
-
There is no BIG problem if you include pages that carry a cross-domain canonical tag. Why?
Because Google does not index pages only from the sitemap.xml file. Google has its own crawler, which can crawl and index a website even if it has no XML sitemap.
In my opinion, Google is very good at picking up the instructions available on a page, so if you add the page to the XML sitemap, the crawler will read those instructions and index only the page that contains the original content.