XML Sitemap Index Percentage (Large Sites)
-
Hi all
I'd like to hear from those who have experience dealing with large sites (tens or hundreds of millions of pages).
What's a typical (or the highest) percentage of indexed pages vs. submitted pages you've seen? This information can be found in Webmaster Tools, where Google shows you the pages submitted and indexed for each of your sitemaps.
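To make the metric concrete, here is a minimal Python sketch of the calculation. The sitemap file names and the counts are made-up placeholders, assumed to be copied by hand from the submitted/indexed figures; nothing here calls a real Google API.

```python
# Hypothetical per-sitemap counts, copied by hand from the
# submitted/indexed figures shown for each sitemap.
sitemap_counts = {
    "sitemap-products-1.xml": {"submitted": 50000, "indexed": 41000},
    "sitemap-products-2.xml": {"submitted": 50000, "indexed": 9500},
}


def index_percentage(submitted: int, indexed: int) -> float:
    """Return indexed pages as a percentage of submitted pages."""
    if submitted == 0:
        return 0.0  # avoid dividing by zero for an empty sitemap
    return 100.0 * indexed / submitted


def summarize(counts: dict) -> tuple:
    """Return (percentage per sitemap, overall percentage)."""
    per_sitemap = {
        name: index_percentage(c["submitted"], c["indexed"])
        for name, c in counts.items()
    }
    total_submitted = sum(c["submitted"] for c in counts.values())
    total_indexed = sum(c["indexed"] for c in counts.values())
    return per_sitemap, index_percentage(total_submitted, total_indexed)


per_sitemap, overall = summarize(sitemap_counts)
for name, pct in sorted(per_sitemap.items(), key=lambda item: item[1]):
    print(f"{name}: {pct:.1f}% indexed")
print(f"overall: {overall:.1f}% indexed")
```

Sorting by percentage puts the worst-performing sitemaps first, which is usually more useful than the overall figure when sitemaps are split by page type.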
I'm trying to figure out:
- What the average index % is out there
- Whether there is a ceiling (i.e. it will never reach 100%)
- Whether it's possible to improve the indexing percentage further
Just to give you some background, sitemap index files (per the sitemaps.org protocol) have been implemented to improve crawl efficiency, and I'd like to find other ways to improve this further.
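For readers unfamiliar with the format, a sitemap index is a small XML file that lists the child sitemap files. Below is a rough Python sketch of generating one with the standard library. The `example.com` URLs and the `build_sitemap_index` helper are illustrative assumptions, not this site's actual setup.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemap_urls, lastmod=None):
    """Return a sitemap index document as a string.

    sitemap_urls: absolute URLs of the child sitemap files.
    lastmod: optional date string (e.g. "2024-01-31") applied to every entry.
    """
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        entry = ET.SubElement(root, "sitemap")
        ET.SubElement(entry, "loc").text = url
        if lastmod:
            ET.SubElement(entry, "lastmod").text = lastmod
    return ET.tostring(root, encoding="unicode")


# Hypothetical child sitemaps, split by page type.
children = [
    "https://www.example.com/sitemaps/products-1.xml",
    "https://www.example.com/sitemaps/categories-1.xml",
]
print(build_sitemap_index(children, lastmod="2024-01-31"))
```

The protocol caps each sitemap at 50,000 URLs and each index at 50,000 sitemaps, so a site with hundreds of millions of pages needs several index files. Splitting the child sitemaps by page type also makes the submitted vs. indexed numbers easier to interpret.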
I've been thinking about reviewing which URL parameters to exclude, as there are hundreds of them (it's an e-commerce site). The aim is to help Google crawl more efficiently and use the daily crawl quota more effectively to discover pages that have not been discovered yet.
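One way to size the parameter problem before changing anything is to strip the parameters that don't change page content and see how many crawled URLs collapse onto the same page. The sketch below assumes a hand-picked set of ignorable parameters (`sort`, `sessionid`, `utm_source` are placeholders) and a list of URLs taken from something like server logs.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters assumed not to change page content.
IGNORED_PARAMS = {"sort", "sessionid", "utm_source"}


def strip_ignored_params(url, ignored=IGNORED_PARAMS):
    """Return the URL with the ignored query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in ignored
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


def group_by_canonical(urls):
    """Map each stripped URL to the list of crawled variants behind it."""
    groups = defaultdict(list)
    for url in urls:
        groups[strip_ignored_params(url)].append(url)
    return dict(groups)


# Made-up sample of crawled URLs.
crawled = [
    "https://www.example.com/shoes?color=red&sort=price",
    "https://www.example.com/shoes?color=red&sessionid=abc",
    "https://www.example.com/shoes?color=red",
]
for canonical, variants in group_by_canonical(crawled).items():
    print(f"{canonical}: {len(variants)} crawled variants")
```

A high ratio of variants to stripped URLs suggests the crawler is spending much of its quota on duplicates, which is a stronger argument for parameter handling than an industry average.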
However, I'm not sure yet whether this is the best path to take, or whether I'm just flogging a dead horse because such a ceiling exists or because I'm already in the average ballpark for large sites.
Any suggestions/insights would be appreciated. Thanks.
-
I've worked on a site that was ~100 million pages, and I've seen indexation percentages ranging from 8% to 95%. When dealing with sites this size, there are so, so many issues at play, and there are so few sites of this size that finding an average probably won't do you much good.
Rather than focusing on whether or not you have enough pages indexed based on averages, you should focus on two key questions: "Do my sitemaps only include pages that would make great search engine entry pages?" and "Have I done everything possible to eliminate junk pages that are wasting crawl bandwidth?"
Of course, making sure you don't have any duplicate content, thin content, or poor on-site optimization issues should also be a focus.
I guess what I'm trying to say is, I believe any site can have 100% of its search-entry-worthy pages indexed, but sites of that size rarely have ALL of their pages indexed, since sites that large often have a ton of pages that don't make great search results.