Site: Query Question
-
Hi All,
Question around the site: query you can execute on Google. I know it has lots of inaccuracies, but I like to keep a high-level view of it over time.
I was also using it to try to get a rough sense of how many product pages were indexed vs. the total number of pages.
What is interesting is that when I do a site: query for, say, www.newark.com, I get ~748,000 results returned.
When I do the same query with "/dp/" appended (site:www.newark.com "/dp/"), I get ~845,000 results returned.
Either I am doing something stupid or these numbers are completely backwards?
Any thoughts?
Thanks,
Ben
-
Barry Schwartz posted some great information about this in November of 2010, quoting a couple of different Google sources. In short, more specific queries can cause Google to dig deeper and give more accurate estimates.
-
Yup. Get rid of parameter-laden URLs and it's easy enough. If they hang around the index for a few months before disappearing, that's no big deal; as long as you've done the right thing it will work out fine.
Also, you're not interested in the chaff, just the bits you want to make sure are indexed. So make sure those are in sensibly titled sitemaps and it's fine (I've used this on sites with 50 million and 100 million product pages. It gets a bit more complex at that scale, but the underlying principle is the same).
-
But then on a big site (talking 4m+ products) it's usually the case that you have URLs indexed that wouldn't be generated in a sitemap, because they include additional parameters.
Ideally, of course, you rid the index of parameter-filled URLs, but it's pretty tough to do that.
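One common way to point parameter-filled variants back at a clean URL is a rel=canonical tag that strips the query string. A minimal sketch (the example.com URLs and the decision to drop every parameter are assumptions; a real site may need to keep some parameters that genuinely change the content):

```python
from urllib.parse import urlsplit, urlunsplit

def canonical_url(url):
    """Strip the query string and fragment so parameter-laden
    variants of a page all resolve to one clean canonical URL."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

def canonical_tag(url):
    """Render the <link rel="canonical"> tag for the page's <head>."""
    return '<link rel="canonical" href="%s" />' % canonical_url(url)

# e.g. a hypothetical product URL with tracking parameters:
print(canonical_tag("https://www.example.com/dp/12345?ref=homepage&sort=price"))
```

The canonical tag won't drop the parameterized URLs from the index overnight, but as the poster above notes, they tend to fall out over the following months once the signal is in place.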
-
Best bet is to make sure all your URLs are in your sitemap; then you get an exact count.
I've found it handy to use multiple sitemaps, one for each subfolder (e.g. /news/ or /profiles/), so I can quickly see exactly what % of URLs are indexed from each section of my site. This is super helpful for finding errors in a specific section or when you are working on the indexing of a certain type of page.
S
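The per-subfolder sitemap idea above can be sketched mechanically: group your URLs by their first path segment and render one sitemap file per section. This is an illustrative sketch, not anyone's production setup; the URLs are hypothetical:

```python
from collections import defaultdict
from urllib.parse import urlsplit

def group_by_section(urls):
    """Group URLs by their first path segment (e.g. /news/, /profiles/)
    so each site section can get its own sitemap file."""
    sections = defaultdict(list)
    for url in urls:
        segments = [s for s in urlsplit(url).path.split("/") if s]
        section = segments[0] if segments else "root"
        sections[section].append(url)
    return dict(sections)

def sitemap_xml(urls):
    """Render one <urlset> sitemap for a section's URLs."""
    entries = "\n".join("  <url><loc>%s</loc></url>" % u for u in urls)
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
            '%s\n</urlset>' % entries)

urls = [
    "https://www.example.com/news/article-1",
    "https://www.example.com/news/article-2",
    "https://www.example.com/profiles/jane",
]
for section, section_urls in group_by_section(urls).items():
    print("sitemap_%s.xml: %d URLs" % (section, len(section_urls)))
```

With one file per section submitted in Search Console, the per-sitemap indexed counts give you the per-section indexation percentages directly.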
-
What I've found is that the reason for this comes down to how the Google system works. Case in point: a client site of mine with 25,000 actual pages and mass duplicate content issues. When I do a generic site: query with the domain, Google shows 50-60,000 pages. If I do an inurl: query with a specific URL parameter, I get anywhere from 500,000 to over a million.
Though that's not your exact situation, it can help explain what's happening.
Essentially, on a normal site: query, Google tries its best to show the content from the site that it considers "most relevant." When you do a refined check, it naturally looks for the content that is the closest match to that specific parameter.
So if you're seeing more results with the refined query, it means that when someone does a general search, Google filters out a lot of content it doesn't consider highly valuable for that particular search. Many of the extra pages surfacing in your refined check have most likely been evaluated as lower quality, or less relevant to most searches.
Even if many are great pages, their system has multiple algorithms that have to be run to assign value. What you are seeing is those processes struggling to sort it all out.
-
About 839,000 results.
-
Different data center, perhaps. What do you get if you add the "/dp/" part to the query?
-
I actually see 'about 897,000 results' for the search 'site:www.newark.com'.
-
Thanks Adrian,
I understand those areas of inaccuracy, but I didn't expect to see a refined search produce more results than the original search. That just seems a little bizarre to me, which is why I was wondering if there was a clear explanation or if I was executing my query incorrectly.
Ben
-
This is an expected 'oddity' of the site: operator. Here is a video of Matt Cutts explaining its imprecise nature.