XML Sitemap Index Percentage (Large Sites)
-
Hi all,
I'm wanting to find out from those who have experience dealing with large sites (10s/100s of millions of pages).
What's a typical (or the highest) percentage of indexed pages vs. submitted pages you've seen? This information can be found in Webmaster Tools, where Google shows you the pages submitted and indexed for each of your sitemaps.
I'm trying to figure out:
- What the average index % is out there
- Whether there is a ceiling (i.e. it will never reach 100%)
- Whether it's possible to improve the indexing percentage further
Just to give you some background, sitemap index files (following the sitemaps.org protocol) have already been implemented to improve crawl efficiency, and I want to find other ways to improve this further.
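For readers unfamiliar with the setup, here is a minimal sketch of how a sitemap index can be generated. It assumes the sitemaps.org limit of 50,000 URLs per sitemap file; the function names and any example.com URLs are hypothetical and not taken from the site discussed here.

```python
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # per-file URL limit in the sitemaps.org protocol


def chunk(urls, size=MAX_URLS_PER_SITEMAP):
    """Yield successive lists of at most `size` URLs."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def build_urlset(urls):
    """Return the XML for one child sitemap as a string."""
    root = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        ET.SubElement(ET.SubElement(root, "url"), "loc").text = url
    return ET.tostring(root, encoding="unicode")


def build_index(sitemap_urls):
    """Return the XML for the sitemap index that lists the child sitemaps."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc in sitemap_urls:
        ET.SubElement(ET.SubElement(root, "sitemap"), "loc").text = loc
    return ET.tostring(root, encoding="unicode")
```

Splitting the child sitemaps by page type (products, categories and so on) rather than arbitrarily makes the submitted/indexed figures reported for each sitemap easier to interpret.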
I've been thinking about reviewing the URL parameters and excluding the ones that don't matter, as there are hundreds of them (it's an e-commerce site). The aim is to help Google crawl more efficiently and use the daily crawl quota more effectively to discover pages that have not been discovered yet.
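As a rough illustration of that parameter review, the sketch below counts how often each query parameter appears in a list of crawled URLs and collapses URLs that differ only by parameters assumed to be irrelevant. The `IGNORED_PARAMS` set is purely hypothetical; which parameters are safe to ignore depends on the site.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list: parameters assumed not to change the page content.
IGNORED_PARAMS = {"sort", "sessionid", "utm_source", "utm_medium", "ref"}


def canonicalise(url, ignored=IGNORED_PARAMS):
    """Drop ignored query parameters and sort the rest into a stable order."""
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in ignored
    )
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def parameter_counts(urls):
    """Count how many URLs carry each query parameter at least once."""
    counts = Counter()
    for url in urls:
        query = urlsplit(url).query
        counts.update({key for key, _ in parse_qsl(query, keep_blank_values=True)})
    return counts
```

Comparing the number of raw URLs with the number of distinct canonicalised URLs gives a rough idea of how much crawl activity is going to parameter variations of the same page.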
However, I'm not sure yet whether this is the best path to take, or whether I'm just flogging a dead horse, either because there is such a ceiling or because I'm already in the average ballpark for large sites.
Any suggestions/insights would be appreciated. Thanks.
-
I've worked on a site that was ~100 million pages, and I've seen indexation percentages ranging from 8% to 95%. When dealing with sites this size, there are so, so many issues at play, and there are so few sites of this size that finding an average probably won't do you much good.
Rather than focusing on whether you have enough pages indexed based on averages, you should focus on two key questions: "Do my sitemaps only include pages that would make great search engine entry pages?" and "Have I done everything possible to eliminate junk pages that are wasting crawl bandwidth?"
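One possible way to approach those two questions is to look at the submitted and indexed counts per sitemap instead of site-wide. The sketch below computes the percentage for each sitemap and sorts the weakest first; the file names and figures are made up for illustration, and a low rate is only a hint that a sitemap may contain pages that are not good entry pages.

```python
def index_rate(submitted, indexed):
    """Return indexed pages as a percentage of submitted pages."""
    if submitted <= 0:
        raise ValueError("submitted must be positive")
    return 100.0 * indexed / submitted


# Hypothetical figures: sitemap name -> (submitted, indexed)
report = {
    "products.xml": (50_000, 47_500),
    "search-filters.xml": (50_000, 4_000),
}

rates = {name: index_rate(sub, idx) for name, (sub, idx) in report.items()}
worst_first = sorted(rates, key=rates.get)  # lowest indexation rate first
```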
Of course, making sure you don't have any duplicate content, thin content, or poor on-site optimization issues should also be a focus.
I guess what I'm trying to say is that I believe any site can have 100% of its search-entry-worthy pages indexed, but sites of that size rarely have ALL of their pages indexed, since sites that large often have a ton of pages that don't make great search results.