Site: Query Question
-
Hi All,
Question around the site: query you can execute on Google. I know it has lots of inaccuracies, but I like to keep a high-level view of it over time.
I was also using it to try to get a rough sense of how many product pages were indexed vs. the total number of pages.
What is interesting is that when I do a site: query for, say, www.newark.com, I get ~748,000 results returned.
When I do the same query with "/dp/" appended (site:www.newark.com "/dp/"), I get ~845,000 results returned.
Either I am doing something stupid or these numbers are completely backwards?
Any thoughts?
Thanks,
Ben
-
Barry Schwartz posted some great information about this in November of 2010, quoting a couple of different Google sources. In short, more specific queries can cause Google to dig deeper and give more accurate estimates.
-
Yup. Get rid of parameter-laden URLs and it's easy enough. If they hang around the index for a few months before disappearing, that's no big deal; as long as you've done the right thing it will work out fine.
Also, you're not interested in the chaff, just the bits you want to make sure are indexed. So make sure those are in sensibly titled sitemaps and it's fine (I've used this on sites with 50 million and 100 million product pages. It gets a bit more complex at that scale, but the underlying principle is the same).
-
But then on a big site (talking 4m+ products) it's usually the case that you have URLs indexed that wouldn't be generated in a sitemap, because they include additional parameters.
Ideally, of course, you rid the index of parameter-filled URLs, but it's pretty tough to do that.
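One common way to point parameter-filled variants back at a clean URL is a rel=canonical tag that strips the query string. A minimal sketch (the example.com URLs and the decision to drop every parameter are assumptions; a real site may need to keep some parameters that genuinely change the content):

```python
from urllib.parse import urlsplit, urlunsplit

def canonical_url(url):
    """Strip the query string and fragment so parameter-laden
    variants of a page all resolve to one clean canonical URL."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

def canonical_tag(url):
    """Render the <link rel="canonical"> tag for the page's <head>."""
    return '<link rel="canonical" href="%s" />' % canonical_url(url)

# e.g. a hypothetical product URL with tracking parameters:
print(canonical_tag("https://www.example.com/dp/12345?ref=homepage&sort=price"))
```

The canonical tag won't drop the parameterized URLs from the index overnight, but as the poster above notes, they tend to fall out over the following months once the signal is in place.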
-
Best bet is to make sure all your URLs are in your sitemap; then you get an exact count.
I've found it handy to use multiple sitemaps, one for each subfolder (e.g. /news/ or /profiles/), so I can quickly see exactly what % of URLs are indexed from each section of my site. This is super helpful for finding errors in a specific section or when you are working on the indexing of a certain type of page.
S
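The per-subfolder sitemap idea above can be sketched mechanically: group your URLs by their first path segment and render one sitemap file per section. This is an illustrative sketch, not anyone's production setup; the URLs are hypothetical:

```python
from collections import defaultdict
from urllib.parse import urlsplit

def group_by_section(urls):
    """Group URLs by their first path segment (e.g. /news/, /profiles/)
    so each site section can get its own sitemap file."""
    sections = defaultdict(list)
    for url in urls:
        segments = [s for s in urlsplit(url).path.split("/") if s]
        section = segments[0] if segments else "root"
        sections[section].append(url)
    return dict(sections)

def sitemap_xml(urls):
    """Render one <urlset> sitemap for a section's URLs."""
    entries = "\n".join("  <url><loc>%s</loc></url>" % u for u in urls)
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
            '%s\n</urlset>' % entries)

urls = [
    "https://www.example.com/news/article-1",
    "https://www.example.com/news/article-2",
    "https://www.example.com/profiles/jane",
]
for section, section_urls in group_by_section(urls).items():
    print("sitemap_%s.xml: %d URLs" % (section, len(section_urls)))
```

With one file per section submitted in Search Console, the per-sitemap indexed counts give you the per-section indexation percentages directly.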
-
What I've found is that the reason for this comes down to how the Google system works. Case in point: a client site of mine with 25,000 actual pages and mass duplicate content issues. When I do a generic site: query with the domain, Google shows 50-60,000 pages. If I do an inurl: query with a specific URL parameter, I get anywhere from 500,000 to over a million.
Though that's not your exact situation, it can help explain what's happening.
Essentially, on a normal site: query, Google tries its best to show the content from the site that it considers "most relevant." When you do a refined check, it naturally looks for the content that is the closest match to that specific parameter.
So if you're seeing more results with the refined query, it means that when someone does a general search, Google filters out a lot of content it doesn't consider highly valuable for that particular search. Many of the extra pages surfacing in your refined check have most likely been evaluated as lower quality, or less relevant to most searches.
Even if many are great pages, their system has multiple algorithms that have to be run to assign value. What you are seeing is those processes struggling to sort it all out.
-
About 839,000 results.
-
Different data center, perhaps. What do you get if you add the "/dp/" part to the query?
-
I actually see 'about 897,000 results' for the search 'site:www.newark.com'.
-
Thanks Adrian,
I understand those areas of inaccuracy, but I didn't expect to see a refined search produce more results than the original search. That just seems a little bizarre to me, which is why I was wondering if there was a clear explanation or if I was executing my query incorrectly.
Ben
-
This is an expected 'oddity' of the site: operator. Here is a video of Matt Cutts explaining its imprecise nature.