Some URLs in the sitemap not indexed
-
Our company site has hundreds of thousands of pages. Yet no matter how big or small the total page count, I have found that the "URLs Indexed" in GWMT has never matched "URLS in Sitemap". When we were small and now that we have a LOT more pages, there is always a discrepancy of ~10% or so missing from the index.
It's difficult to know which pages are not indexed, but I have found some that I can verify are in the Sitemap.xml file but not at all in the index. When I go to GWMT I can "Fetch and Render" missing pages fine - it's not as though it's blocked or inaccessible.
Any ideas on why this is? Is this type of discrepancy typical?
-
Thanks. Very helpful!
-
It's great to know that a 10% discrepancy is considered good. Hard to know otherwise.
That article about Screaming Frog is super helpful, thanks!
-
I have never had a site with 100% of its pages crawled. Sometimes Google will drop a page for being too similar to another or not informative enough, or because of canonical links or redirects.
As Ryan says, don't just rely on Moz; use Screaming Frog as well to get a good view of your site and see if there are any errors. You can also run the Frog whenever you like; it's just a little more technical to understand.
Xenu, oooh, never heard of that one. Thanks, Ryan!
Just looked into Xenu; Screaming Frog does it all and then some.
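For anyone who wants to spot-check a handful of missing pages by hand, here is a rough sketch of the checks mentioned above (redirects and canonical links), plus a robots "noindex" check that I've added on my own assumption. The function name `indexing_flags` and the flag labels are made up for illustration; it works on a status code and HTML you have already fetched, and it is no substitute for a real crawler.

```python
from html.parser import HTMLParser


class _HeadSignals(HTMLParser):
    """Collects the canonical link and the robots meta tag from a page's HTML."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.robots = ""

    def handle_starttag(self, tag, attrs):
        a = {k: (v or "") for k, v in attrs}
        if tag == "link" and "canonical" in a.get("rel", "").lower().split():
            self.canonical = a.get("href")
        elif tag == "meta" and a.get("name", "").lower() == "robots":
            self.robots = a.get("content", "").lower()


def indexing_flags(url, status, html):
    """Return possible reasons a sitemap URL is left out of the index.

    Hypothetical helper: `status` is the HTTP status you got for `url`
    (without following redirects) and `html` is the response body.
    """
    flags = []
    if 300 <= status < 400:
        flags.append("redirect")
    elif status >= 400:
        flags.append("error")

    signals = _HeadSignals()
    signals.feed(html)
    # Ignore a trailing-slash difference when comparing the canonical URL.
    if signals.canonical and signals.canonical.rstrip("/") != url.rstrip("/"):
        flags.append("canonical points elsewhere")
    if "noindex" in signals.robots:
        flags.append("noindex")
    return flags
```

An empty list only means none of these particular signals were found; Google can still skip the page for similarity or thin content, which this sketch does not try to measure.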
-
Hi Mase,
I've managed sites with hundreds of thousands of pages too, and in my experience a discrepancy between what's offered up via the sitemaps and what gets indexed is typical (dare I say it, a 10% discrepancy seems pretty good!). Pages deeper in the site seem to suffer this fate more frequently than those with fewer subfolders, as do those with thin content.
I agree completely with Ryan's comment about Screaming Frog: it is an invaluable tool for site audits and offers lots of other useful site insights. You might find this article interesting to get a sense of the many ways you can use SF: http://www.seerinteractive.com/blog/screaming-frog-guide/
-
You're welcome. Definitely take a look at a crawler that gives you more insight, especially with a site as large as yours. Just note that, no matter what, you might never achieve an exact match between the pages you've submitted and the number indexed, as Google can decide not to index a page for reasons unrelated to the page's presence in a sitemap. It would also be useful to look at how many of your pages receive visits in analytics. That will give you an idea of the percentages of pages in the sitemap vs. the index vs. active.
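To put numbers on that comparison, something like the sketch below could work. It assumes you already have three inputs: the list of URLs from your sitemap, the "URLs Indexed" count reported in GWMT, and a list of landing-page URLs exported from analytics. The function name `coverage_summary` and the shape of the result are my own invention, not anything those tools provide.

```python
def coverage_summary(sitemap_urls, indexed_count, visited_urls):
    """Compare sitemap size against the indexed count and pages with visits.

    sitemap_urls  -- URLs listed in the sitemap
    indexed_count -- the "URLs Indexed" number reported by GWMT
    visited_urls  -- URLs that received at least one visit in analytics
    """
    sitemap = set(sitemap_urls)
    # Only count visited pages that are actually in the sitemap.
    active = sitemap & set(visited_urls)
    total = len(sitemap)
    return {
        "submitted": total,
        "indexed_pct": round(100 * indexed_count / total, 1) if total else 0.0,
        "active_pct": round(100 * len(active) / total, 1) if total else 0.0,
        "no_visits": sorted(sitemap - active),
    }
```

The "no_visits" list is only a starting point: a page with no visits may still be indexed, so treat it as a shortlist of pages worth checking rather than a list of unindexed pages.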
-
I have not run the site through the tools you mentioned; I'm unfamiliar with them.
I am not, however, receiving any errors on those pages. And when I "Fetch and Render" in GWMT, they look and render fine without errors. I'm able to submit them to the index one-by-one.
Thanks for your response, Ryan.
-
Hi Mase. Are you getting errors on the URLs you've submitted? Or have you run other crawlers on your site, like Xenu or Screaming Frog, to surface any possible errors? It's also good to know which pages might not have enough content to be indexed: filters, sorting views, etc.
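As a quick first pass at finding filter and sorting views, you could pull the URLs out of the sitemap and flag any that carry query parameters. This is only a sketch: the parameter names in `suspect_params` are guesses for illustration, and your site may express filters in the path instead of the query string.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs, urlsplit

# Namespace used by standard sitemap.xml files.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_urls(xml_text):
    """Return every <loc> value found in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]


def parameterised_urls(urls, suspect_params=("sort", "filter", "order", "page")):
    """Return (url, matching_params) pairs for URLs that look like filter/sort views.

    The default parameter names are assumptions; replace them with your own.
    """
    flagged = []
    for url in urls:
        params = parse_qs(urlsplit(url).query, keep_blank_values=True)
        hits = sorted(set(params) & set(suspect_params))
        if hits:
            flagged.append((url, hits))
    return flagged
```

If a large share of the sitemap turns up here, that alone could account for part of the gap, since such views often duplicate the content of their parent page.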
Related Questions
-
Which product URL to include in Sitemaps?
Hi, do the product URLs in sitemaps affect the subcategories' authority too? For example, say I have a product with two URLs that share a canonical tag: /brands/michael-kors/bags/jet-set-double-zip-wallet/ and /women/accessories/wallets/jet-set-double-zip-wallet/. If I make "/women/accessories/wallets/jet-set-double-zip-wallet/" the main URL, set it as the canonical URL, and list that URL in the XML sitemap, will the "/women/accessories/wallets/" category also get more authority and increase its power to rank? Thanks, Frankie
Technical SEO | | Frankie-BTDublin0 -
Which URL is better?
Hi everyone, could you please help me pick the right URL for my company's website? We are MoonCreate and we make beautiful clothes. Unfortunately, the domain mooncreate.com is not available and I have to choose between mooncreatebrand.com and mooncreatewear.com. Which one is better, in your opinion? Looking forward to receiving your suggestions! Thank you! 🙂
Technical SEO | | kirupa0 -
Vanity URLs are being indexed in Google
We are currently using vanity URLs to track offline marketing. The vanity URL is structured as www.clientdomain.com/publication, and this URL is then 302 redirected to the actual URL on the website, not a custom landing page. The resulting redirected URL looks like: www.clientdomain.com/xyzpage?utm_source=print&utm_medium=print&utm_campaign=printcampaign. We have started to notice that some of the vanity URLs are being indexed in Google search. To prevent this from happening, should we be using a 301 redirect instead of a 302, and will the Google index ignore the utm parameters in the URL that is being 301 redirected to? If not, any suggestions on how to handle this? Thanks,
Technical SEO | | seogirl221 -
Google News Sitemap
Currently, for our website Thinkdigit, we are using an RSS sitemap (http://www.thinkdigit.com/google_sitemap/news_rss.php) for news. Please let me know whether this is the right format or whether we should use the XML format only. We have also lost a huge chunk of traffic from news search: previously it was around 10,000 visits per day from Google News; now it is just 300 visits per day.
Technical SEO | | 9dot90 -
Site Indexed but not Cached?
I launched a new website ~2 weeks ago that seems to be indexed but not cached. According to Google Webmaster most of the pages are indexed and I see them appear when I search site:www.xxx.com. However, when I type into the URL - cache:www.xxx.com I get a 404 error page from Google. I've checked more established websites and they are cached so I know I am checking correctly here... Why would my site be indexed but not in the cache?
Technical SEO | | theLotter0 -
Correct linking to the /index of a site and subfolders: what's the best practice? link to: domain.com/ or domain.com/index.html ?
Dear all, starting with my .htaccess file:
RewriteEngine On
RewriteCond %{HTTP_HOST} ^www.inlinear.com$ [NC]
RewriteRule ^(.*)$ http://inlinear.com/$1 [R=301,L]
RewriteCond %{THE_REQUEST} ^./index.html
RewriteRule ^(.)index.html$ http://inlinear.com/ [R=301,L]
1. I redirect all URL requests with www. to the non-www version...
2. All requests with "index.html" will be redirected to "domain.com/"
My questions are:
A) When linking from a page to my front page (home), is the best practice to link to "http://domain.com/" and NOT "http://domain.com/index.php"?
B) When linking to the index of a subfolder, "http://domain.com/products/index.php", should I also link to "http://domain.com/products/" and leave out the index.php?
C) When I define the canonical URL, should I also define it as just "http://domain.com/products/", or in this case should I link to the definite file, "http://domain.com/products/index.php"?
Is A) B) the best practice? And C)? Thanks for all replies! 🙂 Holger
Technical SEO | | inlinear0 -
Strange URL's indexed
Hi, I got the message "Increase in not found errors" (404 errors) in GWT for one of my websites. I did not change anything, but I now see a lot of "strange" URLs indexed (~50): &ui=2&tf=1&shva=1 &cat_id=6&tag_id=31&Remark=In %22%3E Any suggestion on how to fix it? Erwan
Technical SEO | | johnny1220 -
Sitemap coming up in Google's index?
I apologize if this question's answer is glaringly obvious, but I was using Google to view all the pages it has indexed of our site, by searching for our company and then clicking the link that says to display more results for the site. On page three, it has the sitemap indexed as if it were just another page of our site: www.stadriemblems.com/sitemap.xml. Is this supposed to happen?
Technical SEO | | UnderRugSwept0