Sitemap - % of URLs in Google's Index?
-
What is the average percentage of URLs listed in a sitemap that end up in Google's index? Obviously I want to aim for 100% of sitemap URLs being indexed, but is that realistic?
-
If all the pages in your sitemap are worthy of Google's index, you should expect close to a 100% indexation rate. On the flip side, if your sitemap references low-quality pages, those pages will not get indexed, and you may even be hurting the trust search engines place in your sitemap file. As a case in point, Bing recently announced that if it sees an error rate greater than 1% in a sitemap, it will simply ignore that sitemap file.
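To make the 1% figure concrete, here is a minimal Python sketch that pulls the <loc> entries out of a sitemap and measures what share of them returned something other than a 200 on your last check. It assumes you already have a status code per URL (the status_by_url dictionary is hypothetical input, e.g. from your own crawler); URLs you haven't checked are counted as errors, a deliberately conservative choice.

```python
import xml.etree.ElementTree as ET

# Namespace used by sitemaps.org <urlset> documents.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
# The 1% error-rate threshold reported for Bing.
ERROR_THRESHOLD = 0.01


def sitemap_urls(xml_text: str) -> list:
    """Return every <loc> value listed in a sitemap <urlset>."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]


def error_rate(urls: list, status_by_url: dict) -> float:
    """Share of URLs whose recorded status is not 200.

    status_by_url is hypothetical input (e.g. from your own crawler);
    URLs missing from it are counted as errors.
    """
    if not urls:
        return 0.0
    bad = sum(1 for url in urls if status_by_url.get(url) != 200)
    return bad / len(urls)


def sitemap_at_risk(rate: float, threshold: float = ERROR_THRESHOLD) -> bool:
    """True when the error rate exceeds the threshold."""
    return rate > threshold


if __name__ == "__main__":
    doc = (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>https://www.example.com/</loc></url>"
        "<url><loc>https://www.example.com/old-page</loc></url>"
        "</urlset>"
    )
    urls = sitemap_urls(doc)
    rate = error_rate(urls, {"https://www.example.com/": 200,
                             "https://www.example.com/old-page": 404})
    print(f"{rate:.1%}", sitemap_at_risk(rate))  # 50.0% True
```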
-
Ryan's point is important to note. 100% is achievable under the right circumstances. I've got a client with 34 million pages on their main site (contained within a combined 909 sitemap XML files), and they have 34 million pages indexed.
-
Wow. Do you have a third-party program to build your sitemap files, or are you using something built in-house?
-
It's a client's site, so I have no idea how they do it. It's a complex automated process for sure.
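For scale: the sitemaps.org protocol caps each sitemap file at 50,000 URLs (and 50 MB uncompressed), so 34 million URLs need at least 680 files; spread across 909 files, that averages roughly 37,400 URLs per file. The files are then tied together with a sitemap index file. Below is a minimal Python sketch of that splitting step, not the client's actual process; the base URL and the sitemap-N.xml file naming are hypothetical, and the 50 MB size limit is not checked.

```python
from xml.sax.saxutils import escape

# sitemaps.org limit: at most 50,000 URLs per sitemap file.
MAX_URLS_PER_SITEMAP = 50_000
NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# escape() already handles &, < and >; the protocol also wants these escaped.
EXTRA_ENTITIES = {"'": "&apos;", '"': "&quot;"}


def chunk(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into consecutive pieces of at most `size`."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def urlset_xml(urls):
    """Render one sitemap file."""
    body = "".join(f"<url><loc>{escape(u, EXTRA_ENTITIES)}</loc></url>" for u in urls)
    return f'<?xml version="1.0" encoding="UTF-8"?><urlset xmlns="{NS}">{body}</urlset>'


def index_xml(locations):
    """Render a sitemap index that points at each child sitemap."""
    body = "".join(f"<sitemap><loc>{escape(loc, EXTRA_ENTITIES)}</loc></sitemap>"
                   for loc in locations)
    return f'<?xml version="1.0" encoding="UTF-8"?><sitemapindex xmlns="{NS}">{body}</sitemapindex>'


def build_sitemaps(urls, base="https://www.example.com/sitemaps/"):
    """Return ({filename: xml}, index_xml). `base` and file names are hypothetical."""
    files = {f"sitemap-{n}.xml": urlset_xml(part)
             for n, part in enumerate(chunk(urls), start=1)}
    return files, index_xml(base + name for name in files)


if __name__ == "__main__":
    pages = [f"https://www.example.com/p/{i}" for i in range(120_001)]
    files, index = build_sitemaps(pages)
    print(len(files), index.count("<sitemap>"))  # 3 3
```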
-
The percentage of pages indexed varies greatly from site to site. If you want 100% of your site indexed, then 100% of your site's pages should be reviewed to ensure their content is worthy of being indexed: unique, well written and properly presented. Your sitemap process also needs careful review. Many site owners simply set up an automated process without taking the time to make sure it is properly configured. Pages blocked by robots.txt are often included in the sitemap, and those pages will not be indexed.
Many people say "I want 100% of my site indexed" just as many people say "I want to rank #1 in Google". Both results are achievable, but both require time, effort and perhaps money.
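One way to catch that misconfiguration before submitting is to run every sitemap URL through the site's robots.txt rules. This sketch uses Python's standard urllib.robotparser; note that it implements the original robots.txt conventions (prefix matching, first matching rule wins) and does not understand the * and $ wildcards Google supports, so treat its answers as an approximation of Googlebot's behaviour. The robots.txt content and URLs in the example are made up.

```python
from urllib.robotparser import RobotFileParser


def blocked_sitemap_urls(robots_txt: str, urls: list, agent: str = "Googlebot") -> list:
    """Return the sitemap URLs that robots.txt disallows for `agent`.

    Python's robotparser uses prefix matching, first matching rule wins,
    and no * or $ wildcards, so this only approximates Googlebot.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow: /checkout/\n"  # hypothetical rules
    sitemap = ["https://www.example.com/",
               "https://www.example.com/checkout/cart"]
    print(blocked_sitemap_urls(robots, sitemap))
    # ['https://www.example.com/checkout/cart']
```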
-
Hi. We have a sitemap with over 250,000 URLs and we are at 87% indexed. That is a high for us; we have never been able to reach 100%. We have been trying to clean up the sitemap, but with so many URLs it is hard to go through it line by line. We are making more of an effort to fix the errors Google reports in Webmaster Tools, but these account for only a fraction of the URLs that apparently aren't indexed.
We also run site: searches on Google to see how many of our URLs Google has in total, since our sitemap includes only "the most important" pages. Searching for "site:www.sierratradingpost.com" returns over 400,000 URLs.
For us, I don't think 100% is realistic. We have never been able to achieve it. It will be interesting to see what other SEOmozers have to report!
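At 87% of roughly 250,000 sitemap URLs, about 32,500 URLs are unaccounted for. Comparing the sitemap against a list of URLs known to be indexed narrows down which ones they are, and the reverse comparison shows indexed URLs that are not in the sitemap (relevant to the 400,000 site: results). A minimal Python sketch, assuming you have already assembled such a list of indexed URLs from some source (the input here is hypothetical) and normalized both lists the same way (http vs https, trailing slashes and so on).

```python
def indexation_report(sitemap_urls, indexed_urls):
    """Compare sitemap URLs with a list of URLs known to be indexed.

    Both inputs are assumed to be normalized the same way already.
    Returns (rate, missing, extra):
      rate    - share of sitemap URLs that appear in the indexed list
      missing - sitemap URLs not found in the indexed list
      extra   - indexed URLs that are not in the sitemap
    """
    sitemap = set(sitemap_urls)
    indexed = set(indexed_urls)
    rate = len(sitemap & indexed) / len(sitemap) if sitemap else 0.0
    return rate, sorted(sitemap - indexed), sorted(indexed - sitemap)


if __name__ == "__main__":
    in_sitemap = ["/a", "/b", "/c", "/d"]          # hypothetical data
    known_indexed = ["/a", "/b", "/c", "/old-x"]
    rate, missing, extra = indexation_report(in_sitemap, known_indexed)
    print(f"{rate:.0%} indexed; missing={missing}; not in sitemap={extra}")
    # 75% indexed; missing=['/d']; not in sitemap=['/old-x']
```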
Related Questions
-
Why is my website not ranking for its brand name in SERPs even though Google has indexed it?
The website https://christchurch.crowneplaza.com has been live for a couple of months but is not being found in Google search results, even when searching for its own brand name 'crowne plaza christchurch'. Google has indexed the site, but we are still not showing: https://www.google.co.nz/search?q=site%3Ahttp%3A%2F%2Fchristchurch.crowneplaza.com&rlz=1C1NHXL_enNZ735NZ735&oq=site%3A&aqs=chrome.0.69i59j69i57j69i58j69i59l2j69i65.896j0j7&sourceid=chrome&ie=UTF-8 Any ideas as to why? I think it may be because there are two versions of the site, http and https, each with its own rel=canonical tag. Could this be the cause? Any help much appreciated.
Intermediate & Advanced SEO | | Timmy30 -
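On the http/https question: if each version's rel=canonical points at itself, the two versions can compete as separate pages; a commonly recommended setup is for both to declare the same preferred URL, often alongside a 301 redirect from http to https. This minimal Python sketch, using only the standard html.parser module, extracts the canonical from each version's HTML and checks whether they agree. The sample markup is hypothetical; in practice you would feed it the fetched HTML of both versions.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect href values of <link rel="canonical"> tags."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if "canonical" in rels and attr.get("href"):
            self.canonicals.append(attr["href"])


def canonical_of(html: str):
    """Return the first canonical URL declared in `html`, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals[0] if finder.canonicals else None


def canonicals_agree(http_html: str, https_html: str) -> bool:
    """True when both protocol versions declare the same canonical URL."""
    a, b = canonical_of(http_html), canonical_of(https_html)
    return a is not None and a == b


if __name__ == "__main__":
    # Hypothetical markup: each version points at itself.
    http_page = '<head><link rel="canonical" href="http://www.example.com/"></head>'
    https_page = '<head><link rel="canonical" href="https://www.example.com/" /></head>'
    print(canonicals_agree(http_page, https_page))  # False
```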
Does Google ignore the ? in a URL?
Hi guys, I have a site where every URL ends in ?v=6cc98ba2045f. Example: https://domain.com/products/cashmere/robes/?v=6cc98ba2045f I'm wondering whether Google ignores what comes after the ?. Also, any ideas what that parameter is? Cheers.
Intermediate & Advanced SEO | | CarolynSC0 -
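Search engines generally treat the query string as part of the URL, so the ?v= version and the bare URL can be seen as two different addresses. What the v parameter does on this site isn't stated; it looks like it could be a version or cache-busting token, but that is a guess. If it does not change the content, one option is to point rel=canonical at the parameter-free URL. This Python sketch, using urllib.parse, shows how such a canonical URL could be derived by dropping named parameters; which parameters are safe to drop is an assumption you would need to verify.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_params(url: str, drop=("v",)) -> str:
    """Return `url` without the named query parameters.

    Assumption: the dropped parameters do not change the page content.
    """
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in drop]
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(kept), parts.fragment))


if __name__ == "__main__":
    print(strip_params("https://domain.com/products/cashmere/robes/?v=6cc98ba2045f"))
    # https://domain.com/products/cashmere/robes/
```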
"This url is not allowed for a Sitemap at this location" error using pro-sitemaps.com
Hey guys, we are using the pro-sitemaps.com tool to automate the sitemaps on our properties, but some of them show the error "This url is not allowed for a Sitemap at this location" for every URL. The strange thing is that not all of them have the error, and most already have all their URLs indexed. Do you have any experience with the tool, and what is your opinion? Thanks
Intermediate & Advanced SEO | | lgrozeva0 -
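That message matches a rule in the sitemaps.org protocol: a sitemap may only list URLs on the same protocol and host as the sitemap itself, at or below the directory the sitemap file lives in. So one plausible cause (not confirmed here) is a mismatch such as http vs https or www vs non-www between the sitemap's location and the URLs inside it. The minimal Python check below applies that rule; it ignores the protocol's cross-submission mechanism via robots.txt, and the example URLs are hypothetical.

```python
from urllib.parse import urlsplit


def allowed_in_sitemap(sitemap_url: str, page_url: str) -> bool:
    """Apply the sitemaps.org location rule: same scheme and host as the
    sitemap, and a path at or below the sitemap's directory.
    Cross-submission via robots.txt is not considered."""
    s, p = urlsplit(sitemap_url), urlsplit(page_url)
    if (s.scheme, s.netloc.lower()) != (p.scheme, p.netloc.lower()):
        return False
    directory = s.path.rsplit("/", 1)[0] + "/"
    return (p.path or "/").startswith(directory)


if __name__ == "__main__":
    sitemap = "https://www.example.com/sitemap.xml"
    for page in ("https://www.example.com/a", "http://www.example.com/a",
                 "https://example.com/a"):
        print(page, allowed_in_sitemap(sitemap, page))
```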
My site returns a 503 error to Googlebot, but I can see the site fine. Not indexed in Google. Help
Hi, this site is not indexed on Google at all: http://www.thethreehorseshoespub.co.uk Looking into it, it seems to be returning a 503 error to Googlebot. I can see the site, I have checked the source code and robots.txt, and although the site did have a sitemap parameter, I removed it for testing. GWMT shows 'unreachable' if I submit a sitemap or fetch. Any ideas on how to remove this error? Many thanks in advance
Intermediate & Advanced SEO | | SolveWebMedia0 -
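A 503 that only Googlebot sees may point at something on the server deciding per request, such as a security plugin, firewall or host-level rule. A quick first test is to request the same URL with a browser User-Agent and with Googlebot's, and compare status codes. This Python sketch does that with the standard library; it includes a small local demo server (hypothetical) that mimics the misconfiguration so the check can be tried offline. Spoofing the User-Agent will not reproduce blocks based on Google's IP addresses, so the fetch in Webmaster Tools remains the more reliable test.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Googlebot's published User-Agent string; the browser string is an example.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"

# Connect directly, ignoring any proxy settings in the environment.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def status_for(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code the server sends for this User-Agent."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with _opener.open(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # 4xx/5xx arrive as exceptions
        return err.code


def compare_agents(url: str) -> dict:
    """Fetch the same URL as a browser and as Googlebot."""
    return {"browser": status_for(url, BROWSER_UA),
            "googlebot": status_for(url, GOOGLEBOT_UA)}


class _MisconfiguredHandler(BaseHTTPRequestHandler):
    """Hypothetical server that answers 503 only when it sees 'Googlebot'."""

    def do_GET(self):
        ua = self.headers.get("User-Agent", "")
        self.send_response(503 if "Googlebot" in ua else 200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):  # keep the demo quiet
        pass


def start_demo_server() -> HTTPServer:
    """Start the hypothetical misconfigured server on a free local port."""
    server = HTTPServer(("127.0.0.1", 0), _MisconfiguredHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = start_demo_server()
    url = f"http://127.0.0.1:{server.server_address[1]}/"
    print(compare_agents(url))  # {'browser': 200, 'googlebot': 503}
    server.shutdown()
    server.server_close()
```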
Google indexing of app content?
I read some months back that Google was indexing app content in order to display it in its SERPs. Has anyone got any recent updates on this? I'd be very interested to know more 🙂
Intermediate & Advanced SEO | | JoomGeek0 -
Site's disappearance from web rankings
I'm currently doing some work on a website: http://www.abetterdriveway.com.au. When I started, I found a lot of spammy links pointing to this website and tried to remove them before submitting a disavow report. A few months later, the site disappeared from the rankings completely, with all keywords suddenly unranked. I realised that the test website (which had been put up for preview before the new site went live) was still up on another URL, and Google was now ranking that site instead, so I made sure the test site was completely removed. Three weeks later, however, www.abetterdriveway.com.au still isn't ranking for its keywords. I can't see anything that stands out in Webmaster Tools: no manual action and no crawling issues that I can detect. Would anyone know the reason for this persistent disappearance? Is it something I just have to wait out until the rankings come back, or is there something I am missing? Any help would be much appreciated.
Intermediate & Advanced SEO | | Gavo0 -
How can I get a list of every URL of a site in Google's index?
I work on a site that has almost 20,000 URLs in its sitemap. Google WMT claims 28,000 are indexed, and a search on Google shows 33,000. I'd like to find out what the difference is. Is there a way to get an Excel sheet with every URL Google has indexed for a site? Thanks... Mike
Intermediate & Advanced SEO | | 945010 -
How is Google crawling and indexing this directory listing?
We have three Directory Listing pages that are being indexed by Google:
http://www.ccisolutions.com/StoreFront/jsp/
http://www.ccisolutions.com/StoreFront/jsp/html/
http://www.ccisolutions.com/StoreFront/jsp/pdf/
How and why is Googlebot crawling and indexing these pages? Nothing else links to them (although /jsp/html/ and /jsp/pdf/ both link back to /jsp/). They aren't disallowed in our robots.txt file, and I understand that this could be why. If we disallow them in robots.txt, will that prevent Googlebot from crawling and indexing those Directory Listing pages without stopping it from crawling and indexing the content stored there, which is used to populate pages on our site?
Having these pages indexed in Google is causing a myriad of issues, not least duplicate content. For example, the file CCI-SALES-STAFF.HTML (which appears on the Directory Listing referenced above, http://www.ccisolutions.com/StoreFront/jsp/html/) clicks through to this page: http://www.ccisolutions.com/StoreFront/jsp/html/CCI-SALES-STAFF.HTML This page is indexed in Google and we don't want it to be. But so is the page where we actually intended that content to display: http://www.ccisolutions.com/StoreFront/category/meet-our-sales-staff As you can see, this results in duplicate content. Is there a way to disallow Googlebot from crawling that Directory Listing page and, provided that http://www.ccisolutions.com/StoreFront/category/meet-our-sales-staff is in our sitemap, solve the duplicate content issue as a result? For example:
Disallow: /StoreFront/jsp/
Disallow: /StoreFront/jsp/html/
Disallow: /StoreFront/jsp/pdf/
Can we do this without risking blocking Googlebot from content we do want crawled and indexed? Many thanks in advance for any and all help on this one!
Intermediate & Advanced SEO | | danatanseo0
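Python's standard urllib.robotparser can be used to sanity-check the proposed rules against the URLs in the question. Because robots.txt rules match by path prefix, Disallow: /StoreFront/jsp/ on its own already covers /jsp/html/ and /jsp/pdf/, and it leaves /StoreFront/category/meet-our-sales-staff crawlable. Two caveats: robotparser only approximates Googlebot's matching (no wildcard support), and blocking crawling does not by itself remove URLs that are already indexed, since Googlebot can't see a noindex on a page it is not allowed to fetch. Whether blocking the fragments affects the real page depends on how they are pulled in, which the question doesn't say: a server-side include is never requested by Googlebot, whereas content fetched by the browser could be.

```python
from urllib.robotparser import RobotFileParser

# The three rules proposed in the question. robots.txt rules match by path
# prefix, so the first line alone already covers /jsp/html/ and /jsp/pdf/.
PROPOSED = """\
User-agent: *
Disallow: /StoreFront/jsp/
Disallow: /StoreFront/jsp/html/
Disallow: /StoreFront/jsp/pdf/
"""

parser = RobotFileParser()
parser.parse(PROPOSED.splitlines())

if __name__ == "__main__":
    for url in (
        "http://www.ccisolutions.com/StoreFront/jsp/html/CCI-SALES-STAFF.HTML",
        "http://www.ccisolutions.com/StoreFront/category/meet-our-sales-staff",
    ):
        print(url, "allowed" if parser.can_fetch("Googlebot", url) else "blocked")
```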