Canonical and Sitemap issue
-
Hi all,
I was told that I could change my homepage Canonical tag to match that of my XML sitemap, this sitemap is being generated for me automatically and shows the homepage as e.g. https://www.mysite.com/index.html, yet my Canonical tag has been set to https://www.mysite.com.
Google currently shows as https://www.mysite.com/ being indexed, but https://www.mysite.com/index.html is not currently displayed in search results.
Can someone please tell me if I should change the Canonical to the index.html version, or if I should do nothing, or remove the Canonical tag altogether?
Thank you for looking.
-
I agree with the others. Given "https://www.mysite.com/index.html is not currently displayed in search results", in all likelihood it is being redirected to https://www.mysite.com (and should be). So you don't want to change the canonical to the index.html version of the page only to have it redirected back to https://www.mysite.com. It'll unnecessarily slow the site and might even create a loop.
-
Thank you both, I'll leave it as it is, I'm not able to edit the XML my side sadly.
-
Yes, that's a good point. Canonicals are suggestions for Google, not commands.
-
I see your point, and don't worry about it. Sitemaps help Google find all of your pages and can provide certain other information, but they are not required so no need to overthink them. In general Google is pretty good at finding what it needs to find. And it will certainly find your homepage.
-
I agree with Linda here, I would leave the canonical tag as is. It is a cleaner, better looking URL for the SERPs. If anything, manually update the XML file to reflect the canonical version of the homepage. The main purpose of the XML sitemap is to help search engines crawl and index a website. The homepage is going to be the most frequently crawled page so Google will not have a problem finding it.
Also, do not worry about Google disliking the canonical pointing to .com instead of /index.html. If Google determines that is not the ideal URL for it's index it will ignore the canonical tag.
-
Hi,
Thanks, basically I was concerned that Google may not like that https://www.mysite.com/ was not in the sitemap, yet index.html was and the canonical was pointing to https://www.mysite.com.
If that makes any sense....
-
What are you trying to achieve? Do you particularly want the index.html version to be the canonical? The https://www.mysite.com/ version is more straightforward and what most people would expect your homepage URL to be.
Unless there is some pressing reason to do otherwise, I'd leave it the way it is.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
How important is anchor text in your sitemap?
I've read in a few recent articles that using keyword anchor text in your HTML sitemap is a good idea i.e. important. How important do you think it is? I'd love to hear your thoughts. Example 1: Widgets: View All Colors: Red | Blue | Green | Yellow | Orange | Purple Types: Oversized | Large | Small | Miniature Example 2: Widgets: View All Widgets Colors: Red Widgets | Blue Widgets | Green Widgets | Yellow Widgets | Orange Widgets | Purple Widgets Types: Oversized Widgets | Large Widgets | Small Widgets | Miniature Widgets
Web Design | | Choice0 -
Website Server Issue?
I'm getting error messages that a website cannot be crawled and it might be due to the following issues: Couldn't access the webpage because the server either timed out or refused/closed the connection before our crawler could receive a response. How to fix: Please contact your web hosting technical support team and ask them to fix the issue Could Possibly Be:
Web Design | | PrimeMediaConsulting
1. DDoS protection system.
OR
2. Overloaded or misconfigured server They asked me to talk to my hosting company about this issue and he's at a loss (I don't think he knows everything he needs to know potentially). Have you seen these issues before? Where is the best spot to start troubleshooting this issue?0 -
Site structure and Visual Sitemaps
Aside from mind mapping software are there any tools ( recommended) to build a visual sitemap of the internal linking structure of a URL? I've been trying to 'show' clients the structure of a website as it pertains to internal and external links. Here is one I've tried it's "Close" - http://site-visualizer.com/ . I've been using the excel export function, import into mind meister and building it. It's a teeny bit time consuming for large websites. Site structure I feel is a valuable portion of SEO and a down and dirty visual explanation would be great. Don't get me wrong, it offers other benefits as well- it's just I'd like to free up the time it takes. Thank you in advance. Screen shots are available on the website of the organization.
Web Design | | TammyWood0 -
Unable to submit sitemap in GWM.. Error
I recently published new EMD and installed WP on their. Now i installed the Plugin called Yoast <acronym title="Search Engine Optimization">SEO</acronym> and XML Sitemap. Now whenever i am trying to submit sitemap it shows "URL restricted by robots.txt" but you can see my robots file line written below: User-agent: *
Web Design | | xplodeguru
Disallow: /cgi-bin/
Disallow: /wp-admin/ But still it is showing same error.. i deactivate plugin and resubmit the sitemap but still no luck.. please help0 -
Why is there no sitemap.xml for SEOmoz?
I noticed that SEOmoz does not have a "root" sitemap called sitemap.xml. On the other hand, there do appear to be sitemaps for various sections of the such as http://www.seomoz.org/blog-sitemap.xml I was planning on having a root level sitemap that referenced difference sections of my site (blog, support, etc.) but I'm a little concerned that this site itself doesn't seem to be following that practice. Presumably this website is submitting the individual section maps to Google directly since they aren't linkable through sitemap.xml?
Web Design | | schof0 -
Google search issue with exact domain
We had a site from Feb-2011 to Nov-2011 at the domain amcoexterminating.com. The site was pure HTML/CSS and the daily unique visitors steadily increased over that time. So all was fine. We then moved the site to a CMS (Joomla) on Dec. 6th. From that day forward, the daily visitors went into the tank. Before the move, if you typed "amcoexterminating.com" or "amco exterminating" into Google search, the site would be the first result (as you'd expect since those are the words that make up the actua domain). But we tried this yesterday and the site did not come up at all. NOT GOOD. It would work in Yahoo or Bing, but not in Google. So obviously, the problem with Google search directly affected the daily visitors. We just checked Webmaster tools yesterday (yes, this should have been done sooner, lesson learned) and it said "Site has severe health issues - Important page blocked by robots.txt". It listed the "important" page URL and it was just a link to an image. Regardless, I wiped out the Joomla created robots.txt file and added a new one and made it just say... User-agent: *Allow: / About 14 hours later, after the new robots.txt file was recognized by Google, the "severe health" message went away. However if I search in Google for "amcoexterminating.com", it still doesn't show up and the client is concerned (as they should be). Do you think the search engines just need more time to refresh? If so, once it refreshes, should the site show up first again right away? Or is it possible the robots.txt file had nothing to do with the issue? If so, what other things could I check into that might cause Google search to not find a site even if you search for exact domain name? Please share any and all things I should look into as I need to get this site showing in Google search again (as it was before moving to the CMS). Thanks!
Web Design | | MarathonMS0 -
Canonical Tag
I've been helping someone out with their website, and I noticed the person who built the site made the canonical tags like this:
Web Design | | StandUpCubicles
href="http://www.example.com/" rel="canonical" /> I'm use to seeing it how seomoz does it: Does this matter? Is it ok to have it inverted? They also have another canonical tag in there like this:
var hs_canonical_url = "http\x3A\x2F\x2Fwww.example.com\x2Fhome" Any idea what that is? Could it be hurting the site?0 -
Crawl Budget vs Canonical
Got a debate raging here and I figured I'd ask for opinions. We have our websites structured as site/category/product This is fine for URL keywords, etc. We also use this for breadcrumbs. The problem is that we have multiple categories into which a category fits. So "product" could also be at site/cat1/product
Web Design | | Highland
site/cat2/product
site/cat3/product Obviously this produces duplicate content. There's no reason why it couldn't live under 1 URL but it would take some time and effort to do so (time we don't necessarily have). As such, we're applying the canonical band-aid and calling it good. My problem is that I think this will still kill our crawl budget (this is not an insignificant number of pages we're talking about). In some cases the duplicate pages are bloating a site by 500%. So what say you all? Do we just simply do canonical and call it good or do we need to take into account the crawl budget and actually remove the duplicate pages. Or am I totally off base and canonical solves the crawl budget issue as well?0