Can Anybody Understand This?
-
Hey guys,
These days I'm reading the paper by Sergey Brin and Larry Page, which was Google's first paper.
And I don't get the ranking part, which is:
"Google maintains much more information about web documents than typical search engines. Every hitlist includes position, font, and capitalization information. Additionally, we factor in hits from anchor text and the PageRank of the document. Combining all of this information into a rank is difficult. We designed our ranking function so that no particular factor can have too much influence. First, consider the simplest case -- a single word query. In order to rank a document with a single word query, Google looks at that document's hit list for that word. Google considers each hit to be one of several different types (title, anchor, URL, plain text large font, plain text small font, ...), each of which has its own type-weight. The type-weights make up a vector indexed by type. Google counts the number of hits of each type in the hit list. Then every count is converted into a count-weight. Count-weights increase linearly with counts at first but quickly taper off so that more than a certain count will not help. We take the dot product of the vector of count-weights with the vector of type-weights to compute an IR score for the document. Finally, the IR score is combined with PageRank to give a final rank to the document.
For a multi-word search, the situation is more complicated. Now multiple hit lists must be scanned through at once so that hits occurring close together in a document are weighted higher than hits occurring far apart. The hits from the multiple hit lists are matched up so that nearby hits are matched together. For every matched set of hits, a proximity is computed. The proximity is based on how far apart the hits are in the document (or anchor) but is classified into 10 different value "bins" ranging from a phrase match to "not even close". Counts are computed not only for every type of hit but for every type and proximity. Every type and proximity pair has a type-prox-weight. The counts are converted into count-weights and we take the dot product of the count-weights and the type-prox-weights to compute an IR score. All of these numbers and matrices can all be displayed with the search results using a special debug mode. These displays have been very helpful in developing the ranking system.
"
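To make the quoted passage a little more concrete, here is a minimal Python sketch of the two scoring steps it describes. Everything numeric in it is an assumption: the hit types, the type-weights, the cap in `count_weight`, the proximity bin boundaries, the type-prox-weights and the weighted-sum combination with PageRank are all hypothetical stand-ins, because the paper does not publish its real values or its combination formula. The multi-word part is also simplified to a two-word query where each hit of the first word is matched with the nearest hit of the same type for the second word.

```python
from collections import Counter

# Hypothetical hit types. The paper names title, anchor, URL, plain text
# large font and plain text small font, followed by "..." (there are more).
HIT_TYPES = ["title", "anchor", "url", "plain_large", "plain_small"]

# Hypothetical type-weights; the paper does not publish the real values.
TYPE_WEIGHTS = {"title": 10.0, "anchor": 8.0, "url": 6.0,
                "plain_large": 3.0, "plain_small": 1.0}

# 10 proximity bins: bin 0 = phrase match ... bin 9 = "not even close".
NUM_PROX_BINS = 10

# Hypothetical type-prox-weights: each type-weight scaled down as hits
# fall into farther proximity bins.
TYPE_PROX_WEIGHTS = {
    (t, b): TYPE_WEIGHTS[t] * (NUM_PROX_BINS - b) / NUM_PROX_BINS
    for t in HIT_TYPES
    for b in range(NUM_PROX_BINS)
}


def count_weight(count, cap=8):
    """Linear at first, then flat: hits beyond `cap` add nothing.
    The cap value is an assumption; the paper only says it tapers off."""
    return float(min(count, cap))


def ir_score_single(hits):
    """IR score for a one-word query.
    hits: list of (position, hit_type) for that word in one document."""
    counts = Counter(hit_type for _, hit_type in hits)
    # Dot product of the count-weight vector with the type-weight vector.
    return sum(count_weight(counts[t]) * TYPE_WEIGHTS[t] for t in HIT_TYPES)


def final_rank(ir_score, pagerank, alpha=0.5):
    """The paper does not say how the IR score and PageRank are combined;
    this weighted sum is only one hypothetical choice."""
    return alpha * ir_score + (1 - alpha) * pagerank


def proximity_bin(distance):
    """Map a word distance to a bin: adjacent words (distance 1) -> bin 0,
    distance 10 or more -> bin 9. Bin boundaries are hypothetical."""
    return min(max(distance - 1, 0), NUM_PROX_BINS - 1)


def ir_score_two_words(hits_a, hits_b):
    """IR score for a two-word query (simplified multi-word case).
    Each hit of word A is matched with the nearest hit of word B of the
    same type, and counts are kept per (type, proximity bin) pair."""
    counts = Counter()
    for pos_a, type_a in hits_a:
        candidates = [pos_b for pos_b, type_b in hits_b if type_b == type_a]
        if not candidates:
            continue  # no hit of the same type to pair with
        distance = min(abs(pos_a - pos_b) for pos_b in candidates)
        counts[(type_a, proximity_bin(distance))] += 1
    # Dot product of count-weights with type-prox-weights; pairs with a
    # zero count contribute nothing, so only observed pairs are summed.
    return sum(count_weight(c) * TYPE_PROX_WEIGHTS[key]
               for key, c in counts.items())


if __name__ == "__main__":
    doc = [(0, "title"), (15, "plain_small"), (40, "plain_small")]
    ir = ir_score_single(doc)
    print(ir, final_rank(ir, pagerank=4.0))  # prints: 12.0 8.0
    near = ir_score_two_words([(3, "plain_small")], [(4, "plain_small")])
    far = ir_score_two_words([(3, "plain_small")], [(50, "plain_small")])
    print(near > far)  # prints: True
```

Capping `count_weight` is just one simple way to get the "increase linearly at first, then taper off" behaviour the paper describes; a smoother curve such as a logarithm would fit the description equally well. Either way, repeating a word many times stops helping after a point.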
-
I can't say I have a complete understanding of what this is explaining, but here's a link to the original paper on Stanford's website if anyone else is interested. http://infolab.stanford.edu/~backrub/google.html
Related Questions
-
Can an external firewall affect rankings?
For security reasons, we are now routing traffic through an external firewall-cum-CDN. Our server and domain IPs remain the same, but every request is routed through an external IP, which then forwards the traffic. Would our rankings be affected by the IP change? Thanks, Sam
Technical SEO | samgold
Can Google index the text content in a PDF?
I really, really thought the answer was always no. There are plenty of other things you can do to improve search visibility for a PDF, but I thought the nature of the file type made the content itself not parsable by search engine crawlers... But now, my client's competitor is ranking for my client's brand name with a PDF that contains comparison content. Thing is, my client's brand isn't in the title, the alt text, or the URL... it's only in the actual text of the PDF. Did I miss a major update? Did I always have this wrong?
Technical SEO | LindsayDayton
How can my homepage have 2 meta descriptions?
Hi all, when googling our company, I see our main page pop up with 2 different meta descriptions, depending on the search query. The situation:
The search query 'nha' (on google.nl) returns the main page with a meta description that looks like Google grabbed it at random from the page's code, starting with 'Ik volg een cursus bij de NHA...' ("I'm taking a course with NHA..."). The search query 'nha.nl' (on google.nl) returns the main page with the proper meta description, starting with 'Aanbieder van thuisstudies met onder meer MBO-opleidingen...' ("Provider of home-study courses, including MBO programmes..."). So I'd like the main page to appear only with the proper meta description, the latter one. We did have a redirect issue (duplicate homepages) a few weeks ago, which our programmers fixed. Could this have something to do with a redirect? I'd love to hear your thoughts. Thanks!
Technical SEO | NHA_DistanceLearning
Error in how URLs were set up: how can it be fixed?
Hi, I managed a website port to a responsive WP design for a client, see http://chicagotelephony.com. Unfortunately, he wanted me to work with a graphic designer rather than a web geek, so the resulting website has messed-up URLs, i.e. index.php is smack in the middle of almost all the page URLs. I know that is all wrong, but I also realized that she was not fluent in how the Genesis framework was set up or how the particular template I selected operates. So I just wanted to get it out there... and now it is live, but has all these errors. Do I have to do 301 redirects? Is there a setting or a button inside the WP template that would keep the correct slugs but get rid of the index.php within the URL? For example, http://chicagotelephony.com/index.php/cloud-based-solutions/ and http://chicagotelephony.com/index.php/var-network-value-added-reseller/ should be chicagotelephony.com/cloud-based-solutions/ and chicagotelephony.com/var-network-value-added-reseller/, and so forth.
Technical SEO | DianeDP
Hi, can anyone let me know which is the better server?
Hi, I am trying to find out which is the better dedicated server and would like your opinion. The first one is:
Dell PowerEdge: Intel Xeon E3-1220L, 2.2GHz Dual-Core
4GB DDR3 RAM
2 x 500GB SATA HDD
Linux/Windows
10000GB Monthly Transfer
Up to 2 IP Addresses
LSI RAID Card
The second one is:
Intel Atom 330, 1MB L2 Cache, 1.6GHz
500GB Storage
4GB RAM
10TB Bandwidth
If you can, please let me know the difference and which one is better for speed and memory for a large site. Many thanks!
Technical SEO | ClaireH-184886
How can I prevent duplicate content between www.page.com/ and www.page.com?
SEOmoz's recent crawl reported duplicate content and duplicate page title errors on my site. The problem is that it found the same page twice because of a '/' at the end of one URL, e.g. www.page.com/ vs. www.page.com. My question is: do I need to be concerned about this? And is there anything I should put in my .htaccess file to prevent this from happening? Thanks!
Karl
Technical SEO | onlineexpression
www vs. non-www and understanding Open Site Explorer
Hi guys, new guy here with some questions about the difference between www and non-www. I am helping with a site at the moment, gradually working my way through it and learning all the time. I was watching one of the SEOmoz videos and it brought my attention back to www vs. non-www. I understand that Google will treat these as two separate sites, but I wanted to check what the stats are telling me. I was under the impression that www.mydummysite.com was getting most of the links, as this is what I have always used. However, Open Site Explorer told me something different:
www.mydummysite.com   32/100   29/100   5   16
mydummysite.com       32/100   29/100   2   1,500
Am I correct in saying that I should be adding a redirect from www.mydummysite.com to mydummysite.com? I think this is telling me that I am potentially missing out on 1,500 links to my site, but it could mean I am missing out on just 16. Either way, I guess it's something I should fix, right? Do I just redirect that page, or would all the pages beneath it, such as mydummysite.com/news, also need a redirect? Can I use rel=canonical links for this now? Thanks for taking the time to read and reply! 🙂
Technical SEO | wedmonds
Can I noindex most of my site?
A large number of the pages on my site contain things like photos and maps that are useful to my visitors, but would make poor landing pages and have very little written content. My site is huge. Would it be beneficial to noindex all of these?
Technical SEO | mascotmike