Is it OK for a sitemap to appear as a "Top URL" in Google Webmaster Tools?
-
I'm using Google Webmaster Tools (alongside other services) to understand how Google is indexing my site.
One of its features is "Content Keywords", which lists the keywords Google sees as significant for your site. The keywords shown are generally fine, but when I click on an individual keyword, I often see our sitemap (at system/sitemap1.xml.gz) among the "Top URLs" on which that keyword is found. Is this OK?
Obviously I don't want to block the sitemap URL in robots.txt, but I do want 'real', user-focused pages (e.g. our homepage) to appear higher in the "Top URLs" list for these keywords, as I'm assuming the list is an indicator of how the site is performing in search.
Any help appreciated!
-
Thanks for the answer. However, I'm still unclear on a few things, so here is some further info:
- We actually have two XML sitemaps: one for our main site, including our forums (generated and submitted by a Ruby on Rails plugin), and one for blog posts and static pages (generated by a WordPress plugin). The sitemap appearing as a "Top URL" is the first one.
- There are no links to our sitemap anywhere on our site. The only way Google knows about it is that we automatically generate it and submit it through Webmaster Tools.
I think it appears as a Top URL because the titles of all our forum posts are listed in the sitemap, and it is the only page where they all appear together. So I think you are right about the 'simple algorithm' point, but I think the cause is how often the keyword occurs in the sitemap, rather than the sitemap being linked to from elsewhere on the site (it isn't).
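One way to test this theory is to look at what the sitemap actually contains. Below is a minimal Python sketch, assuming the forum post titles show up as slugs inside the <loc> URLs (the sitemaps.org format carries URLs plus optional metadata such as <lastmod>, not page titles). It decompresses a .xml.gz sitemap and counts how many of the listed URLs contain a given keyword. The function names sitemap_locs and keyword_hits and the sample URLs are hypothetical.

```python
import gzip
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org 0.9 protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def sitemap_locs(gz_bytes: bytes) -> list[str]:
    """Decompress a .xml.gz sitemap and return every <loc> URL it lists."""
    root = ET.fromstring(gzip.decompress(gz_bytes))
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]

def keyword_hits(locs: list[str], keyword: str) -> int:
    """Count how many listed URLs contain the keyword (case-insensitive)."""
    kw = keyword.lower()
    return sum(kw in loc.lower() for loc in locs)

if __name__ == "__main__":
    # Hypothetical sample standing in for system/sitemap1.xml.gz.
    sample = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc></url>
  <url><loc>https://www.example.com/forums/12-best-running-shoes</loc></url>
  <url><loc>https://www.example.com/forums/57-running-shoes-for-flat-feet</loc></url>
</urlset>"""
    locs = sitemap_locs(gzip.compress(sample))
    print(len(locs), keyword_hits(locs, "running-shoes"))  # prints: 3 2
```

To check the real file, read system/sitemap1.xml.gz in binary mode (open(path, "rb").read()) and pass the bytes to sitemap_locs.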
This brings me to a related question: is it bad to have two separate XML sitemaps, and should I be linking to them somehow from the site?
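On the two-sitemaps question: the sitemaps.org protocol allows a site to have more than one sitemap, and two common ways to tell crawlers about all of them are a sitemap index file that lists each sitemap, and one Sitemap: line per sitemap in robots.txt. Neither needs a link from a visible page. The Python sketch below builds both; the host www.example.com and the WordPress sitemap path blog/sitemap.xml are hypothetical placeholders, and only system/sitemap1.xml.gz comes from this thread.

```python
import xml.etree.ElementTree as ET

# Namespace required by the sitemaps.org 0.9 protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap_index(sitemap_urls: list[str]) -> str:
    """Return a <sitemapindex> document with one <sitemap><loc> entry per URL."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        entry = ET.SubElement(index, "sitemap")
        ET.SubElement(entry, "loc").text = url  # special chars such as & are escaped
    ET.indent(index)  # pretty-print (Python 3.9+)
    # Write the XML declaration by hand so it always states UTF-8.
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(index, encoding="unicode")

def robots_sitemap_lines(sitemap_urls: list[str]) -> str:
    """Return one robots.txt 'Sitemap:' line per sitemap (absolute URLs)."""
    return "\n".join(f"Sitemap: {url}" for url in sitemap_urls)

if __name__ == "__main__":
    # Hypothetical absolute URLs; only the system/sitemap1.xml.gz path is from the thread.
    sitemaps = [
        "https://www.example.com/system/sitemap1.xml.gz",
        "https://www.example.com/blog/sitemap.xml",
    ]
    print(build_sitemap_index(sitemaps))
    print(robots_sitemap_lines(sitemaps))
```

The two plugins would keep generating their own files; the index only points at them, and it can be submitted in Webmaster Tools like an ordinary sitemap.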
-
I wouldn't be overly concerned.
For some terms, especially product codes and the terms on your site's detail pages, there are probably only going to be three pages where the term appears: the product page itself, the page in the navigation that links to it (normally a list), and the sitemap.
Your sitemap is probably heavily linked to across the site so it does kind of make sense that it would appear as one of the top URLs for a term.
The reason I wouldn't be overly concerned is that I would imagine (and I could be totally wrong) that the Top URLs list is generated by a very simple algorithm that doesn't reflect how the organic search algorithm sees your site.
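To make the "simple algorithm" guess concrete, here is a toy Python model that ranks URLs purely by how often a single-word keyword appears on each page. It is an assumption for illustration only, not how Google builds the Top URLs list; it just shows why a page that lists every forum title can outscore the pages those titles belong to. The URLs, page text, and the top_urls function are made up.

```python
import re
from collections import Counter

def top_urls(pages: dict[str, str], keyword: str, limit: int = 3) -> list[tuple[str, int]]:
    """Toy ranking: order URLs by raw occurrence count of a single-word keyword."""
    kw = keyword.lower()
    counts = Counter()
    for url, text in pages.items():
        # Split on anything that is not a letter or digit, then count exact matches.
        words = re.findall(r"[a-z0-9]+", text.lower())
        counts[url] = words.count(kw)
    # Drop pages where the keyword never appears.
    return [(url, n) for url, n in counts.most_common(limit) if n > 0]

if __name__ == "__main__":
    # Made-up pages: a homepage, one forum post, and a sitemap-like page listing every title.
    pages = {
        "/": "Welcome to our running community",
        "/forums/12-best-shoes": "Best shoes for trail running",
        "/system/sitemap1.xml.gz": "best shoes trail running shoes flat feet shoes road shoes",
    }
    print(top_urls(pages, "shoes"))
    # [('/system/sitemap1.xml.gz', 4), ('/forums/12-best-shoes', 1)]
```

Under a count like this, the sitemap wins simply because it repeats every title, which says nothing about how the individual pages rank in organic search.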