Google indexing less url's then containded in my sitemap.xml
-
My sitemap.xml contains 3821 urls but Google (webmaster tools) indexes only 1544 urls. What may be the cause? There is no technical problem. Why does Google index less URLs then contained in my sitemap.xml?
-
Thank you for helping
-
Unless you have a SEO actively reviewing your site, it is quite normal for Google to index less pages then are offered in your sitemap.
How exactly was your sitemap created? Did you go by hand through your site's 3281 pages and add them to a sitemap? Or more likely, did you use a tool to create the sitemap? If you used a tool, how much knowledge do you have regarding how this tool works or its settings?
Just a few examples of URLs which may be included in your sitemap that Google would likely not index:
-
Your home page and other pages may have multiple URLs which lead to the same page. For example: www.mysite.com and www.mysite.com/index.html may be two URLs for the same page. Google will likely only index one of them.
-
You may have links to various URLs which contain parameters which Google will reduce to a single URL. For example: www.mysite.com/product_id=308&sort=asc&color=black, and another URL www.mysite.com/product_id=308&sort=desc&color=black. Both URLs lead to the same content sorted differently.
-
You may have duplicate content on your site. For example, you can sell chairs and list the same chair under multiple paths such as /furniture/wood/chair123 and /furniture/dining-room/chair123. Google will recognize these two pages are the same content presented under multiple URLs.
-
You may have submitted pages to your sitemap which are blocked via robots.txt or the "noindex" tag or are canonicalized to another page.
In order to better understand the root issue you need to examine a list of all URLs in your sitemap and compare that to a list of all indexed URLs. Determine which URLs Google has not indexed and research the reason for each one independently.
-
-
Are they index worthy?
Having them on your sitemap does not mean google wants them in its index
-
He just said it. Is this a new domain? Im in the same boat as you for some of my domains.
-
Yes, I understand this. But In this situation Google first indexes all the URL's within my sitemap.xml uploaded in Google Webmaster tools. Now Google indexes less URL's, only 50%. What can be the cause if there are no technical problems?
-
Hi!
Google will only spend 'so much time' on any new domain. The more traffic and links and page authority you get, the more time Google will dedicate to crawling your website. You should also make sure that the site is not slow, as this will reduce the crawling speed even more! See Google page speed for tips on speeding up the load time of your site
Good Luck,
Sven Witteveen
Expand Online
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Do URLs with canonical tags get indexed by Google?
Hi, we re-branded and launched a new website in February 2016. In June we saw a steep drop in the number of URLs indexed, and there have continued to be smaller dips since. We started an account with Moz and found several thousand high priority crawl errors for duplicate pages and have since fixed those with canonical tags. However, we are still seeing the number of URLs indexed drop. Do URLs with canonical tags get indexed by Google? I can't seem to find a definitive answer on this. A good portion of our URLs have canonical tags because they are just events with different dates, but otherwise the content of the page is the same.
Technical SEO | | zasite0 -
Should I remove these pages from the Google index?
Hi there, Please have a look at the following URL http://www.elefant-tours.com/index.php?callback=imagerotator&gid=65&483. It's a "sitemap" generated by a Wordpress plug-in called NextGen gallery and it maps all the images that have been added to the site through this plugin, which is quite a lot in this case. I can see that these "sitemap" pages have been indexed by Google and I'm wondering whether I should remove these or not? In my opinion these are pages that a search engine would never would want to serve as a search result and pages that a visitor never would want to see. Attracting any traffic through Google images is irrelevant in this case. What is your advice? Block it or leave it indexed or something else?
Technical SEO | | Robbern0 -
Parked former company's url on top of my existing url and that URL is showing in SERPs for my top keywords
I have the URL from my former company parked on top of my existing URL. My top keywords are showing up with the old URL attached to the metadsecription of my existing URL. It was supposed to be 301 redirected instead of parked but my web developer insists this was the right way to do it and it will work itself out after google indexes the old URL out of existence. Are there any other options?
Technical SEO | | Joelabarre0 -
OSE says URL redirects to URL with trailing slash but it doesn't.
Site is www.example.com/folder/us and OSE says this URL redirects to www.example.com/folder/us/, but it does not. When I look at the OSE report for the latter version with the "/" it says "No Data Available For This URL". Why would that be? The original URL is www.example.com and it redirects to www.example.com/folder/us. Is this anything I need to worry about? I thought that the trailing / doesn't really mean much anymore but nonetheless, why does it think it redirects there?
Technical SEO | | rock220 -
Google PR Rank Question (s)
Hi
Technical SEO | | damientown
My Google PR rank is still 1/10 (www.abouttownmarketing.com) after 12 weeks of daily SEO work building what I thought were quality back links. Does anyone know how often Google updates its PR rank? Also is it a linear measure or link adwords quality score is it exponential?
Many thanks
Damien0 -
How can I best find out which URLs from large sitemaps aren't indexed?
I have about a dozen sitemaps with a total of just over 300,000 urls in them. These have been carefully created to only select the content that I feel is above a certain threshold. However, Google says they have only indexed 230,000 of these urls. Now I'm wondering, how can I best go about working out which URLs they haven't indexed? No errors are showing in WMT related to these pages. I can obviously manually start hitting it, but surely there's a better way?
Technical SEO | | rango0 -
URL restructure and phasing out HTML sitemap
Hi SEOMozzies, Love the Q&A resource and already found lots of useful stuff too! I just started as an in-house SEO at a retailer and my first main challenge is to tidy up the complex URL structures and remove the ugly sub sitemap approach currently used. I already found a number of suggestions but it looks like I am dealing with a number of challenges that I need to resolve in a single release. So here is the current setup: The website is an ecommerce site (department store) with around 30k products. We are using multi select navigation (non Ajax). The main website uses a third party search engine to power the multi select navigation, that search engine has a very ugly URL structure. For example www.domain.tld/browse?location=1001/brand=100/color=575&size=1&various other params, or for multi select URL’s www.domain.tld/browse?location=1001/brand=100,104,506/color=575&size=1 &various other non used URL params. URL’s are easily up to 200 characters long and non-descriptive at all to our users. Many of these type of URL’s are indexed by search engines (we currently have 1.2 million of those URL’s indexed including session id’s and all other nasty URL params) Next to this the site is using a “sub site” that is sort of optimized for SEO, not 100% sure this is cloaking but it smells like it. It has a simplified navigation structure and better URL structure for products. Layout is similair to our main site but all complex HTMLelements like multi select, large top navigations menu's etc are all removed. Many of these links are indexed by search engines and rank higher than links from our main website. The URL structure is www.domain.tld/1/optimized-url .Currently 64.000 of these URL’s are indexed. We have links to this sub site in the footer of every page but a normal customer would never reach this site unless they come from organic search. Once a user lands on one of these pages we try to push him back to the main site as quickly as possible. My planned approach to improve this: 1.) Tidy up the URL structure in the main website (e.g. www.domain.tld/women/dresses and www.domain.tld/diesel-red-skirt-4563749. I plan to use Solution 2 as described in http://www.seomoz.org/blog/building-faceted-navigation-that-doesnt-suck to block multi select URL’s from being indexed and would like to use the URL param “location” as an indicator for search engines to ignore the link. A risk here is that all my currently indexed URL (1.2 million URL’s) will be blocked immediately after I put this live. I cannot redirect those URL’s to the optimized URL’s as the old URL’s should still be accessible. 2.) Remove the links to the sub site (www.domain.tld/1/optimized-url) from the footer and redirect (301) all those URL’s to the newly created SEO friendly product URL’s. URL’s that cannot be matched since there is no similar catalog location in the main website will be redirected (301) to our homepage. I wonder if this is a correct approach and if it would be better to do this in a phased way rather than the currently planned big bang? Any feedback would be highly appreciated, also let me know if things are not clear. Thanks! Chris
Technical SEO | | eCommerceSEO0