Can Anybody Understand This?
-
Hey guys,
These days I'm reading the first Google paper, by Sergey Brin and Larry Page.
I don't get the ranking part, which is: "Google maintains much more information about web documents than typical search engines. Every hitlist includes position, font, and capitalization information. Additionally, we factor in hits from anchor text and the PageRank of the document. Combining all of this information into a rank is difficult. We designed our ranking function so that no particular factor can have too much influence. First, consider the simplest case -- a single word query. In order to rank a document with a single word query, Google looks at that document's hit list for that word. Google considers each hit to be one of several different types (title, anchor, URL, plain text large font, plain text small font, ...), each of which has its own type-weight. The type-weights make up a vector indexed by type. Google counts the number of hits of each type in the hit list. Then every count is converted into a count-weight. Count-weights increase linearly with counts at first but quickly taper off so that more than a certain count will not help. We take the dot product of the vector of count-weights with the vector of type-weights to compute an IR score for the document. Finally, the IR score is combined with PageRank to give a final rank to the document.
For a multi-word search, the situation is more complicated. Now multiple hit lists must be scanned through at once so that hits occurring close together in a document are weighted higher than hits occurring far apart. The hits from the multiple hit lists are matched up so that nearby hits are matched together. For every matched set of hits, a proximity is computed. The proximity is based on how far apart the hits are in the document (or anchor) but is classified into 10 different value "bins" ranging from a phrase match to "not even close". Counts are computed not only for every type of hit but for every type and proximity. Every type and proximity pair has a type-prox-weight. The counts are converted into count-weights and we take the dot product of the count-weights and the type-prox-weights to compute an IR score. All of these numbers and matrices can all be displayed with the search results using a special debug mode. These displays have been very helpful in developing the ranking system.
"
-
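Here's my rough reading of the single-word case from the quoted passage, as a minimal Python sketch. The paper names the hit types but gives no numbers, so every weight, the cap of 8, the hard clip used for the "taper off", and the `final_rank` formula are my own assumptions for illustration, not Google's actual values.

```python
from collections import Counter

# Hypothetical type-weights: the paper lists the hit types but not the values.
TYPE_WEIGHTS = {
    "title": 10.0,
    "anchor": 8.0,
    "url": 6.0,
    "plain_large": 3.0,
    "plain_small": 1.0,
}

COUNT_CAP = 8  # assumed: hits beyond this count add nothing


def count_weight(count, cap=COUNT_CAP):
    """Grow linearly with the count, then stay flat once the cap is reached."""
    return float(min(count, cap))


def ir_score(hit_types):
    """Dot product of the count-weight vector with the type-weight vector."""
    counts = Counter(hit_types)  # number of hits of each type
    return sum(count_weight(counts[t]) * w for t, w in TYPE_WEIGHTS.items())


def final_rank(ir, pagerank, alpha=0.5):
    """Assumed linear mix; the paper does not say how the two are combined."""
    return alpha * ir + (1 - alpha) * pagerank


# One title hit and two small-font hits for the query word.
score = ir_score(["title", "plain_small", "plain_small"])
print(score, final_rank(score, pagerank=4.0))
```

The cap is what keeps one factor from dominating: a page that repeats the word 100 times in small text scores the same as one that uses it 8 times.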
I can't say I have a complete understanding of what this is explaining, but here's a link to the original paper on Stanford's website if anyone else is interested. http://infolab.stanford.edu/~backrub/google.html
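For the multi-word part, this is how I picture it for a two-word query, again only as a hedged Python sketch. The bin edges, the weights, the nearest-hit matching rule, and the choice to use the first word's hit type for the pair are all my assumptions; the paper only says there are 10 proximity bins, from a phrase match to "not even close", and that each type and proximity pair has a type-prox-weight.

```python
from collections import Counter

# Assumed bin edges: word distance -> one of 10 bins
# (bin 0 = phrase match, bin 9 = "not even close").
BIN_EDGES = [1, 2, 3, 5, 8, 13, 21, 34, 55]


def proximity_bin(distance):
    """Map a word distance to a bin index from 0 to 9."""
    for i, edge in enumerate(BIN_EDGES):
        if distance <= edge:
            return i
    return len(BIN_EDGES)


def type_prox_weight(hit_type, prox_bin):
    """Made-up weight: better hit types and tighter bins score higher."""
    base = {"title": 10.0, "anchor": 8.0, "plain": 1.0}[hit_type]
    return base * (10 - prox_bin) / 10


def match_hits(hits_a, hits_b):
    """Pair each hit for word A with the nearest hit for word B (a simplification)."""
    pairs = []
    for pos_a, type_a in hits_a:
        pos_b, _ = min(hits_b, key=lambda h: abs(h[0] - pos_a))
        pairs.append((type_a, abs(pos_b - pos_a)))
    return pairs


def ir_score(hits_a, hits_b, cap=8):
    """Count matches per (type, bin), cap the counts, dot with type-prox-weights."""
    counts = Counter((t, proximity_bin(d)) for t, d in match_hits(hits_a, hits_b))
    return sum(min(n, cap) * type_prox_weight(t, b) for (t, b), n in counts.items())


# Each hit is (word position, hit type).
hits_a = [(5, "title"), (40, "plain")]
hits_b = [(6, "title"), (100, "plain")]
print(ir_score(hits_a, hits_b))
```

The adjacent title hits land in bin 0 and carry almost all of the score, while the pair 34 words apart lands in a far bin and barely counts, which is the "close together weighted higher" idea from the passage.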
Related Questions
-
How can I use multiple domains for one website?
Dear Experts, I want to build an online store at www.abc.com, and I plan to buy three more domains (www.abc.co.uk, www.abc.com.au, and www.abc.ae) and redirect them all to the main domain, www.abc.com. However, I want someone searching from the UK to see www.abc.co.uk in the search results, someone searching from the UAE to see www.abc.ae, and likewise for the other extensions. How can I avoid duplicate-content problems when using multiple domains for one website? What SEO strategy should I follow? I am hoping for a positive reply. Thanks
Technical SEO | | jfdagborrbg0 -
Why can't the Google mobile-friendly test access my website?
I'm getting the following error when trying to use Google's mobile-friendly test tool: "Page cannot be reached. This could be because the page is unavailable or blocked by robots.txt." I don't have anything blocked by robots.txt or a robots tag. I can also render my pages with Google Search Console's Fetch and Render, so what could be the reason the tool can't access my website? Also, the mobile usability report in Search Console works but reports very little, and the Google speed test doesn't work either. Any ideas what the cause is and how to fix it? The error details list the user agent as Googlebot smartphone.
Technical SEO | | Nadav_W0 -
How can I make it so that robots.txt is not ignored due to a URL redirect?
Recently a site moved from blog.site.com to site.com/blog with instructions like these:
/etc/httpd/conf.d/site_com.conf:94: ProxyPass /blog http://blog.site.com
/etc/httpd/conf.d/site_com.conf:95: ProxyPassReverse /blog http://blog.site.com
It's a WordPress.org blog that was set up as a subdomain and is now being redirected to look like a directory. That said, the robots.txt file seems to be ignored by Googlebot. There is a Disallow: /tag/ in that file to avoid "duplicate content" on the site. I have done this before with other WordPress subdomains and it worked like a charm, except this time, when the blog is rendered as a subdirectory. Any ideas why? Thanks!
Technical SEO | | rodelmo40 -
Error in how URLs were set up, how can it be fixed?
Hi, I managed a website port to a responsive WP design for a client; see http://chicagotelephony.com. Unfortunately, he wanted me to work with a graphic designer rather than a web geek, so the resulting website has messed-up URLs, i.e. index.php is smack in the middle of almost all of them. I know that is all wrong, but I also realized that she was not fluent in how the Genesis framework is set up or how the particular template I selected operates. So I just wanted to get it out there, and now it is live but has all these errors. Do I have to do 301 redirects? Is there a setting or a button inside the WP template that would keep the correct slugs but get rid of the index.php within the URL? For example, http://chicagotelephony.com/index.php/cloud-based-solutions/ and http://chicagotelephony.com/index.php/var-network-value-added-reseller/ should be chicagotelephony.com/cloud-based-solutions/ and chicagotelephony.com/var-network-value-added-reseller/, and so forth.
Technical SEO | | DianeDP0 -
Can I disallow my subdomain for Penguin recovery?
Hi, I have a site, BannerBuzz.com. Before the last Penguin update, all of my site's keywords ranked well in Google, but since Penguin hit my website, all my keywords have been dropping day by day. I have made some changes to my website to improve things, but I am unsure about one of them. I have a subdomain (http://reviews.bannerbuzz.com/) that displays user reviews for all of my website's keywords, and 15 reviews from each category are also displayed on my main website, http://www.bannerbuzz.com. Are those user reviews considered duplicate content between the subdomain and the main website? Can I disallow the subdomain for all search engines? Currently the subdomain is open to all search engines; would it help to block it? Thanks
Technical SEO | | CommercePundit0 -
Can Google Analytics Segment by Time of Day?
Greetings from Latitude 53.92705600 Longitude -1.38481600... Can Google Analytics answer this question: "Tell me how many visitors landed on my site on 1st Sept between 1200HRS & 1300HRS"? Grazie tanto, David
Technical SEO | | Nightwing0 -
Google is keeping very old title tags in the SERPs for my site. How can I fix this?
Hi, around 6 months ago a site I work with changed its brand. One company became two. Despite the title changing when the new site went live around 6 months ago, Google still shows the old title for certain search results relevant to the old title. When a search result is relevant to the new title, it shows that. It's very frustrating, as we are trying to rebrand and do not want the old brand name showing for some very important search results. Thanks in advance for your help, Paul
Technical SEO | | pauldoffman0 -
WWW vs non-WWW and understanding Open Site Explorer
Hi guys, new guy here with some questions regarding the difference between www and non-www. I am helping with a site at the moment, gradually working my way through bits and learning all the time. I was watching one of the SEOmoz videos and it brought my attention back to www vs non-www. I understand that Google will treat these as two separate sites, but I wanted to check what the stats are telling me. I was under the impression that www.mydummysite.com was getting most of the links, as this is what I have always used. However, when I used Open Site Explorer it told me something different, as follows:
www.mydummysite.com 32/100 29/100 5 16
mydummysite.com 32/100 29/100 2 1,500
Am I correct in saying that I should be adding a redirect from www.mydummysite.com to mydummysite.com? I am thinking this is telling me that I am potentially missing out on 1,500 links to my site, but it could mean I am missing out on just 16. Either way, I guess it's something I should fix, right? Do I just redirect that page, or would all pages beneath it, such as mydummysite.com/news, also need a redirect? Can I use rel canonical links for this now? Thanks for taking the time to read and reply! 🙂
Technical SEO | | wedmonds0