Substantial difference between Number of Indexed Pages and Sitemap Pages
-
Hey there,
I am doing a website audit at the moment.
I've noticed substantial differences between the number of pages indexed (Search Console), the number of pages in the sitemap, and the number I get when I crawl the site with Screaming Frog (see below). Would those discrepancies concern you? The website and its rankings seem fine otherwise.
Total indexed: 2,360 (Search Console)
About 2,920 results (Google search "site:example.com")
Sitemap: 1,229 URLs
Screaming Frog Spider: 1,352 URLs
Cheers,
Jochen
-
Those discrepancies would not concern me, but each of the numbers you list measures something different:
Total indexed: 2,360 Search Console - this is likely a reasonably accurate count of the pages you have indexed in Google. You could use a tool like URL Profiler to check the index status of specific URLs.
About 2,920 results Google search "site:example.com" - site: search is less accurate and will likely return a different number each time you do it, even if it's just moments apart.
Sitemap: 1,229 URLs: these are URLs you added to a sitemap because they are priority pages you want to make sure Google has indexed and hopefully ranked. You control this number.
Screaming Frog Spider: 1,352 URLs - Screaming Frog starts on your homepage and crawls the site, attempting to discover as many URLs as possible. If you are not linking to a page, SF won't be able to crawl it. Google, on the other hand, may have old pages, old URL structures, or pages that were linked from an external website in its index, and it won't forget them.
A really important question is: how many pages do you have that you want to be indexed? Is Google's index bloated with pages that you want to keep out? Figure these things out, and then adjust your sitemaps, noindex tags, and robots.txt as needed.
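One way to start answering that question is to diff the sitemap against a crawl export. The following is only a rough sketch, not a definitive workflow: the helper names `sitemap_urls` and `compare`, the sample sitemap, and the crawled URL set are all made up for illustration. In practice you would load your real sitemap file and the URL list exported from your crawler.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard XML sitemaps (sitemaps.org protocol).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return the set of <loc> URLs found in a sitemap XML string."""
    root = ET.fromstring(xml_text)
    return {
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text
    }


def compare(sitemap, crawled):
    """Hypothetical helper: list URLs that appear in only one of the two sets."""
    return {
        "in_sitemap_not_crawled": sorted(sitemap - crawled),
        "crawled_not_in_sitemap": sorted(crawled - sitemap),
    }


# Illustrative data only; replace with your own sitemap and crawl export.
sample_sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
  <url><loc>https://example.com/orphan</loc></url>
</urlset>"""

crawled = {
    "https://example.com/",
    "https://example.com/about",
    "https://example.com/blog",
}

result = compare(sitemap_urls(sample_sitemap), crawled)
for label, urls in result.items():
    print(label, urls)
```

URLs that are in the sitemap but were not crawled are probably not linked internally; URLs that were crawled but are missing from the sitemap are candidates to either add to the sitemap or keep out of the index.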
-
Thanks for your reply, Dmitrii.
We have excluded all query parameters in Search Console, so this shouldn't be an issue. What is also strange is that when I try to scrape the SERPs via a site:example.com search, Google shows only a fraction (about 700) of the 2,920 results.
Cheers,
Jochen
-
Hi there.
I think that as long as rankings are good (especially historically), there is no reason to worry, because Google's index includes pages that wouldn't be in the sitemap, for example pages generated with query parameters (domain.com?x=value). Sometimes these pages do not really exist by themselves (like filters in online stores); they only exist "on the fly".
Hope this makes sense and helps.
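To get a feel for how much of an index count could come from parameterized URLs like these, you could strip the query strings from a list of indexed URLs and see how many collapse onto the same base page. This is only a sketch under assumptions: the function names `strip_query` and `group_by_base` and the sample URLs are invented for illustration, and the list of indexed URLs would have to come from your own export.

```python
from collections import Counter
from urllib.parse import urlsplit, urlunsplit


def strip_query(url):
    """Drop the query string and fragment, keeping scheme, host, and path."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def group_by_base(urls):
    """Count how many URLs share the same base page once parameters are removed."""
    return Counter(strip_query(u) for u in urls)


# Illustrative data only; replace with your own list of indexed URLs.
indexed = [
    "https://example.com/shop",
    "https://example.com/shop?color=blue",
    "https://example.com/shop?color=red&size=m",
    "https://example.com/about",
]

counts = group_by_base(indexed)
for base, n in counts.most_common():
    print(base, n)
```

A base page with a high count is one where parameter variants may be inflating the indexed total.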