Is site: a reliable method for getting full list of indexed pages?
-
The site:domain.com search seems to show less pages than it used to (Google and Bing).
It doesn't relate to a specific site but all sites. For example, I will get "page 1 of about 3,000 results" but by the time I've paged through the results it will end and change to "page 24 of 201 results". In that example If I look in GSC it shows 1,932 indexed.
Should I now accept the "pages" listed in site: is an unreliable metric?
-
Keep in mind that for a site:domain.com search, Google now includes pages from OTHER SITES that are using the canonical tag to point to your site. So, even though it says there are 300 pages indexed, 30 of those pages might be on other sites that use the canonical tag pointing to your site. The numbers of pages indexed that you're looking at may not be entirely accurate because of this.
-
I just haven't seen where the pages reduced, but I only use that operator for a general search. I have never gone through all the pages, etc. For that I would use any of the crawler tools. It would be interesting to see a download of search, GSC, and then something like Screaming Frog to see what we see.
As soon as I wrote that I checked our site and realized what you are saying. For Google we get "About 281 results," as I go to last page of results it changes to "page 13 of 126 results."
Then out of curiosity I tried Bing and now I am scratching my head: "763 results." When I go to last possible page I get, "247-256 of 256 results." I think that means my 281 results from Google are mostly on Bing!!!! (in case someone does not realize my humor, that last statement can be defined as either jest or sarcasm.)
So, when doing the site: I get 126 with Google but search console has 428...
Certainly interesting. I will keep playing with it.
Best
-
Hi Robert,
Thanks for your input.
The reason for doing it is part of an SEO site review process to examine pages indexed in Google compared to a site crawl in a tool like screaming frog and the indexed pages defined in GSC.
In terms of the "page 24 of 201 results" example, I mean that when you first use the site:domain.com Google will give you an estimated number of results, e.g. 3000 but actually as you click through the pages you find that the number of results is reduced - sometimes significantly.
-
I am not sure I understand where you say, " ...it will end and change to "page 24 of 201 results." I have used the site: operator a long time and I think it is reasonably accurate. One thing I notice is the occasional "some pages have been ... duplicate" and do you want to see those? So, if you include all of those what's the magic number?
Is there a reason you want the data that demands an exact result? I am not sure of anything that would give you that. The question is "indexed" within the given search engine. If you crawl with screaming frog, etc. you may see pages that are not indexed, so the comparison is not apples to apples. Just curious as to what you are wanting to know exact indexed pages for?
Interesting question.
-
Typically, the site: command in Google is unreliable. There are lots of reasons why, one being that there may be pages indexed that aren't "good enough", for whatever reason, to show up in the search results. When we look at the site pages indexed, we typically will use the site: command, then click a few pages deep and look at the number it shows (not the first number of pages it shows).
For SEO auditing purposes, we're looking to see if there is a significant difference between the number of pages indexed and the number of pages that we find when we we crawl the website ourselves.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
How do I redirect old html pages to new site?
Im seeing in my Google search console some of my old html pages. I never redirecting them and now they get 404 errors. Below is my current htaccess file, how would I changed it so that any html page i.e. intercallsystems.com/index.html forwards to my new site intercallsystems.com ? I have about 5 html pages that I want to redirect. Thank you for the help! Rena Currently my htaccess says: BEGIN WordPress <ifmodule mod_rewrite.c="">RewriteEngine On
Technical SEO | | palila
RewriteBase /
RewriteRule ^index.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]</ifmodule> END WordPress0 -
Is there a way to index important pages manually or to make sure a certain page will get indexed in a short period of time??
Hi There! The problem I'm having is that certain pages are waiting already three months to be indexed. They even have several backlinks. Is it normal to have to wait more than three months before these pages get an indexation? Is there anything i can do to make sure these page will get an indexation soon? Greetings Bob
Technical SEO | | rijwielcashencarry0400 -
How to stop crawls for product review pages? Volusion site
Hi guys, I have a new Volusion website. the template we are using has its own product review page for EVERY product i sell (1500+) When a customer purchases a product a week later they receive a link back to review the product. This link sends them to my site, but its own individual page strictly for reviewing the product. (As oppose to a page like amazon, where you review the product on the same page as the actual listing.) **This is creating countless "duplicate content" and missing "title" errors. What is the most effective way to block a bot from crawling all these pages? Via robots txt.? a meta tag? ** Here's the catch, i do not have access to every individual review page, so i think it will need to be blocked by a robot txt file? What code will i need to implement? i need to do this on my admin side for the site? Do i also have to do something on the Google analytics side to tell google about the crawl block? Note: the individual URLs for these pages end with: *****.com/ReviewNew.asp?ProductCode=458VB Can i create a block for all url's that end with /ReviewNew.asp etc. etc.? Thanks! Pardon my ignorance. Learning slowly, loving MOZ community 😃 1354bdae458d2cfe44e0a705c4ec38dd
Technical SEO | | Jerrion0 -
How to block text on a page to be indexed?
I would like to block the spider indexing a block of text inside a page , however I do not want to block the whole page with, for example , a noindex tag. I have tried already with a tag like this : chocolate pudding chocolate pudding However this is not working for my case, a travel related website. thanks in advance for your support. Best regards Gianluca
Technical SEO | | CharmingGuy0 -
My beta site (beta.website.com) has been inadvertently indexed. Its cached pages are taking traffic away from our real website (website.com). Should I just "NO INDEX" the entire beta site and if so, what's the best way to do this? Please advise.
My beta site (beta.website.com) has been inadvertently indexed. Its cached pages are taking traffic away from our real website (website.com). Should I just "NO INDEX" the entire beta site and if so, what's the best way to do this? Are there any other precautions I should be taking? Please advise.
Technical SEO | | BVREID0 -
De-indexing millions of pages - would this work?
Hi all, We run an e-commerce site with a catalogue of around 5 million products. Unfortunately, we have let Googlebot crawl and index tens of millions of search URLs, the majority of which are very thin of content or duplicates of other URLs. In short: we are in deep. Our bloated Google-index is hampering our real content to rank; Googlebot does not bother crawling our real content (product pages specifically) and hammers the life out of our servers. Since having Googlebot crawl and de-index tens of millions of old URLs would probably take years (?), my plan is this: 301 redirect all old SERP URLs to a new SERP URL. If new URL should not be indexed, add meta robots noindex tag on new URL. When it is evident that Google has indexed most "high quality" new URLs, robots.txt disallow crawling of old SERP URLs. Then directory style remove all old SERP URLs in GWT URL Removal Tool This would be an example of an old URL:
Technical SEO | | TalkInThePark
www.site.com/cgi-bin/weirdapplicationname.cgi?word=bmw&what=1.2&how=2 This would be an example of a new URL:
www.site.com/search?q=bmw&category=cars&color=blue I have to specific questions: Would Google both de-index the old URL and not index the new URL after 301 redirecting the old URL to the new URL (which is noindexed) as described in point 2 above? What risks are associated with removing tens of millions of URLs directory style in GWT URL Removal Tool? I have done this before but then I removed "only" some useless 50 000 "add to cart"-URLs.Google says themselves that you should not remove duplicate/thin content this way and that using this tool tools this way "may cause problems for your site". And yes, these tens of millions of SERP URLs is a result of a faceted navigation/search function let loose all to long.
And no, we cannot wait for Googlebot to crawl all these millions of URLs in order to discover the 301. By then we would be out of business. Best regards,
TalkInThePark0 -
Do index.php extensions count as duplicate content on Joomla sites?
When i run my error report, i see 2 duplicate pages, but both are the main domain and then the /index.php extension. how do i fix this? does it really count as duplicate content?
Technical SEO | | valetseo0 -
Paginated Home Page Duplicates on Wordpress Sites
A number of my websites created on WP are displaying duplicate home pages with these types of urls. http://www.example.com/page/10/ http://www.example.com/page/11/ http://www.example.com/page/12/ I found these duplicates using the site:search command. Basically, put in any number and the Home Page opens. With the above mentioned url structure. Any idea on why they are created, how they can be stopped and what kind of an impact they would have in terms of SEO and the penalty that comes with duplicate content.
Technical SEO | | AsadMemon1