Moz Q&A is closed.
After more than 13 years and tens of thousands of questions, Moz Q&A closed on 12th December 2024. While we're not completely removing the content (many posts will still be viewable), we have locked both new posts and new replies.
Substantial difference between Number of Indexed Pages and Sitemap Pages
-
Hey there,
I am doing a website audit at the moment.
I've noticed substantial differences between the number of pages indexed (Search Console), the number of pages in the sitemap, and the number I get when I crawl the site with Screaming Frog (see below). Would those discrepancies concern you? The website and its rankings seem fine otherwise.
Total indexed: 2,360 (Search Console)
About 2,920 results (Google search "site:example.com")
Sitemap: 1,229 URLs
Screaming Frog Spider: 1,352 URLs
Cheers,
Jochen -
Those discrepancies would not concern me, but there are some differences between all the things you list:
Total indexed: 2,360 (Search Console) - this is likely a reasonably accurate count of the pages you have indexed in Google. You could use a tool like URL Profiler to check the index status of specific URLs.
About 2,920 results (Google search "site:example.com") - a site: search is less accurate and will likely return a different number each time you run it, even just moments apart.
Sitemap: 1,229 URLs - these are URLs you added to the sitemap because they are priority pages you want to make sure Google has indexed and, hopefully, ranked. You control this number.
Screaming Frog Spider: 1,352 URLs - Screaming Frog starts on your homepage and crawls the site, attempting to discover as many URLs as possible. If you are not linking to a page, Screaming Frog won't be able to find it. Google, on the other hand, may have old pages, old URL structures, or pages that were linked from an external website in its index, and it won't forget them.
A really important question is: how many pages do you have that you want to be indexed? Is Google's index bloated with pages that you want to keep out? Figure these things out, and then try to adjust your sitemaps, noindex, robots.txt as needed.
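One practical way to answer that question is to diff the sitemap against the crawl. The sketch below is a minimal illustration, assuming you have the sitemap XML and a Screaming Frog export to hand; the inline strings and URLs here are hypothetical stand-ins for those files, not real data.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for sitemap.xml and a Screaming Frog
# "Internal" CSV export (Address column).
SITEMAP_XML = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
  <url><loc>https://example.com/pricing</loc></url>
</urlset>"""

CRAWL_CSV = """Address,Status Code
https://example.com/,200
https://example.com/about,200
https://example.com/blog/old-post,200
"""

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def sitemap_urls(xml_text):
    """Extract the <loc> values from a sitemap."""
    root = ET.fromstring(xml_text)
    return {loc.text.strip() for loc in root.findall(".//sm:loc", NS)}

def crawled_urls(csv_text):
    """Extract the Address column from a crawl export."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["Address"] for row in reader}

in_sitemap = sitemap_urls(SITEMAP_XML)
in_crawl = crawled_urls(CRAWL_CSV)

# Crawled but not listed: candidates for the sitemap, or for
# noindex if you don't actually want them ranked.
print("crawled but not in sitemap:", sorted(in_crawl - in_sitemap))
# Listed but never reached by the crawler: possible orphan pages.
print("in sitemap but not crawled:", sorted(in_sitemap - in_crawl))
```

The two set differences map directly onto the advice above: the first tells you what to add (or keep out), the second flags sitemap entries no internal link actually points at.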
-
Thanks for your reply, Dmitrii.
We have excluded all query parameters in Search Console, so this shouldn't be an issue. What is also strange is that when I try to scrape the SERPs via a site:example.com search, Google only shows a fraction (about 700) of the 2,920 results.
Cheers,
Jochen
-
Hi there.
I think that as long as rankings are good (especially historically), there is no reason to worry, because Google includes pages in its index that wouldn't be in the sitemap - for example, pages generated with query parameters (domain.com?x=value). Sometimes these pages do not really exist by themselves (like filters in online stores); they are only generated "on the fly".
Hope this makes sense and helps.
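To see how much of an indexed-URL list is made up of such "on the fly" parameter pages, you can normalise each URL by stripping its query string and count the distinct pages left over. A minimal sketch; the URLs below are invented examples:

```python
from urllib.parse import urlsplit, urlunsplit

def strip_query(url):
    # Drop the query string and fragment so faceted/filter URLs
    # collapse onto the underlying page they decorate.
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

# Hypothetical sample of indexed URLs, e.g. exported from a site: scrape.
indexed = [
    "https://example.com/shoes?color=red",
    "https://example.com/shoes?color=blue&size=9",
    "https://example.com/shoes",
    "https://example.com/hats",
]

canonical = {strip_query(u) for u in indexed}
print(len(indexed), "indexed URLs ->", len(canonical), "distinct pages")
```

If the gap between the two numbers is large, parameter pages are likely the source of the index bloat, and parameter handling, canonical tags, or noindex are the levers to reach for.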