XML Sitemap Index Percentage (Large Sites)
-
Hi all
I'm wanting to find out from those who have experience dealing with large sites (10s/100s of millions of pages).
What's a typical (or highest) percentage of indexed pages vs. submitted pages you've seen? This information can be found in webmaster tools where Google shows you the pages submitted & indexed for each of your sitemap.
I'm trying to figure out whether,
- The average index % out there
- There is a ceiling (i.e. will never reach 100%)
- It's possible to improve the indexing percentage further
Just to give you some background, sitemap index files (according to schema.org) have been implemented to improve crawl efficiency and I'm wanting to find out other ways to improve this further.
I've been thinking about looking at the URL parameters to exclude as there are hundreds (e-commerce site) to help Google improve crawl efficiency and utilise the daily crawl quote more effectively to discover pages that have not been discovered yet.
However, I'm not sure yet whether this is the best path to take or I'm just flogging a dead horse if there is such a ceiling or if I'm already at the average ballpark for large sites.
Any suggestions/insights would be appreciated. Thanks.
-
I've worked on a site that was ~100 million pages, and I've seen indexation percentages ranging from 8% to 95%. When dealing with sites this size, there are so, so many issues at play, and there are so few sites of this size that finding an average probably won't do you much good.
Rather than focusing on whether or not you have enough pages indexed based on averages, you should focus on two key questions: "do my sitemaps only include pages that would make great search engine entry pages" and "have I done everything possible to eliminate junk pages that are wasting crawl bandwidth."
Of course, making sure you don't have any duplicate content, thin content, or poor on-site optimization issues should also be a focus.
I guess what I'm trying to say is, I believe any site can have 100% of it's search entry worthy pages indexed, but sites of that size rarely have ALL of their pages indexed since sites that large often have a ton of pages that don't make great search results.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Sitemap Indexed vs. Submitted
My sitemap has been submitted to Google for well over 6 months and is updated frequently, a total of 979 URLs have been submitted by only 145 indexed. What can I do to get Google to index them all?
Intermediate & Advanced SEO | | moon-boots0 -
Best way to do site seals for clients to have on their sites
I am about to help release a product which also gives people a site seal for them to place on their website. Just like the geotrust, comodo, symantec, rapidssl and other web security providers do.
Intermediate & Advanced SEO | | ssltrustpaul
I have notices all these siteseals by these companies never have nofollow on their seals that link back to their websites. So i am wondering what is the best way to do this. Should i have a nofollow on the site seal that links back to domain or is it safe to not have the nofollow.
It wont be doing any keyword stuffing or anything, it will probly just have our domain in the link and that is all. The problem is too, we wont have any control of where customers place these site seals. From experience i would say they will mostly likely always be placed in the footer on every page of the clients website. I would like to hear any and all thoughts on this. As i can't get a proper answer anywhere i have asked.0 -
How to make sure dev site is not index in wordpress and how would it be affected?
hi guys! I'm currently having a dev version of my site (dev.website.com) that once everything is done i would move the dev to the public domain (website.com). But since is a total duplicate content of my real site would it affect the seo? if so, i tried setting the reading privacy in wordpress so google would not index it but im afraid when i live it in the future and revert the setting back to normal it would affect the site seo. any opinion and suggestion on this?
Intermediate & Advanced SEO | | andrewwatson920 -
XML Sitemaps - Multi-lingual website
Hi Mozzers, I am working with a large website that has some of its content translated across multiple languages. I am planning on using The Media Flow to create an HREFLANG Sitemap for content on various languages. Please see the attached image for the questions below. Thanks! Section Highlighted Yellow: When there is a URL that does not have a translated version, should it not be included on the same HREFLANG sitemap? Alternately, could I just remove the languages that are not being targeted, so this would just reflect English language targeting? fqO9Dvk
Intermediate & Advanced SEO | | J-Banz0 -
Mobile Site Outranking Main Site
Hi, We have recently been hit with a problem regarding our mobile site, where it is outranking our main site. This is causing a drop in orders and ranknings for our main site. It would appear that google has indexed our mobile site and so the two are now competing against each other. Our main site is on a .co.uk and our mobile site on a .mobi, but we have now taken down the mobile site until we get this sorted. Does anyone else have any experience of this happening and how to stop it happening again? Thanks Steve
Intermediate & Advanced SEO | | Steve251 -
Better SEO Option, 1 Site 3 Subdomains or 4 Separate Sites?
Hey Mozzers, I'm working with a client who wants to redo their web presence. They have a a main website for the umbrella and then 3 divisions which have their own website as well. My question is: Is it better to have the main site on the main domain and then have the 3 separate sites be subdomains? Or 4 different domains with a linking structure to tie them all together? To my understanding option 1 would include high traffic for 1 domain and option 2 would be building Page Authority by having 4 different sites linking to each other? My guess would be option 2, only if all 4 sites start getting relevant authority to make the links of value. But right out of the gates option 1 might be more beneficial. A little advice/clarification would be great!
Intermediate & Advanced SEO | | MonsterWeb280 -
Sitemap not indexing pages
My website has about 5000 pages submitted in the sitemap but only 900 being indexed. When I checked Google Webmaster Tools about a week ago 4500 pages were being indexed. Any suggestions about what happened or how to fix it? Thanks!
Intermediate & Advanced SEO | | theLotter0 -
Can a XML sitemap index point to other sitemaps indexes?
We have a massive site that is having some issue being fully crawled due to some of our site architecture and linking. Is it possible to have a XML sitemap index point to other sitemap indexes rather than standalone XML sitemaps? Has anyone done this successfully? Based upon the description here: http://sitemaps.org/protocol.php#index it seems like it should be possible. Thanks in advance for your help!
Intermediate & Advanced SEO | | CareerBliss0