Does Google Parse The Anchor Text while Indexing
-
Hey moz fanz,
I'm here to ask a bit technical and open-minding question.
In the Google's paper http://infolab.stanford.edu/~backrub/google.html
They say they parse the page into hits which is basically word occurences.
But I want to know that they also do the same thing while keeping the anchor text database.
I mean do they parse the anchor text or keep it as it is .
For example, let's say my anchor text is "real car games".
When they indexing my link with anchor text, do they parse my anchor text as hits like
"real" distinct hits
"car" distinct hits
"games" distinct hits.
OR do they just use it as it is. As "real car games" -
I would say it depends on whether an entity is detected.
Imagine there is a company named "Real SEO." Google crawls a website that mentions them. Google sees the word "real" and then the word "seo." Normally, Google would see that "real" is an adjective that is modifying the noun "seo." So normally, this would be viewed as two separate, distinct words.
However, in this example, "real seo" is a brand and an "entity." So, even though the two words are first viewed separately, Google has become smart enough to figure out that when those two separate words are found in that order, then they are together referring to a single "thing."
For more on entities in search, I'd read the Moz posts here, here, here, and here.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Site Indexed but not Cached?
I launched a new website ~2 weeks ago that seems to be indexed but not cached. According to Google Webmaster most of the pages are indexed and I see them appear when I search site:www.xxx.com. However, when I type into the URL - cache:www.xxx.com I get a 404 error page from Google.
Technical SEO | | theLotter
I've checked more established websites and they are cached so I know I am checking correctly here... Why would my site be indexed but not in the cache?0 -
Google Sitelinks
Hello, Good afternoon. I am having a site issue with Sitelinks. For some reason when I search Google for the brand I represent "California Olive Ranch" Sitelinks are not being generated. When I search for "Cal Olive Ranch" our site links are being generated. Our domain is Californiaoliveranch.com. Is there a way to tell Google to to change the site links to match our domain and brand name? Is this something that can be done in Google Webmasters? Thank you very much for your help. Adam P
Technical SEO | | apost40 -
Google Indexed Only 1 Page
Hi, I'm new and hope this forum can help me. I have recently resubmit my sitemap and Google only Indexed 1 Page. I can still see many of my old indexed pages in the SERP's? I have upgraded my template and graded all my pages to A's on SEOmoz, I have solid backlinks and have been building them over time. I have redirected all my 404 errors in .htaccess and removed /index.php from my url's. I have never done this before but my website runs perfect and all my pages redirect as I hoped. My site: www.FunerallCoverFinder.co.za How do I figure out what the problem is? Thanks in Advance!
Technical SEO | | Klement690 -
Index page
To the SEO experts, this may well seem a silly question, so I apologies in advance as I try not to ask questions that I probably know the answer for already, but clarity is my goal I have numerous sites ,as standard practice, through the .htaccess I will always set up non www to www, and redirect the index page to www.mysite.com. All straight forward, have never questioned this practice, always been advised its the ebst practice to avoid duplicate content. Now, today, I was looking at a CMS service for a customer for their website, the website is already built and its a static website, so the CMS integration was going to mean a full rewrite of the website. Speaking to a friend on another forum, he told me about a service called simple CMS, had a look, looks perfect for the customer ... Went to set it up on the clients site and here is the problem. For the CMS software to work, it MUST access the index page, because my index page is redirected to www.mysite.com , it wont work as it cant find the index page (obviously) I questioned this with the software company, they inform me that it must access the index page, I have explained that it wont be able to and why (cause I have my index page redirected to avoid duplicate content) To my astonishment, the person there told me that duplicate content is a huge no no with Google (that's not the astonishing part) but its not relevant to the index and non index page of a website. This goes against everything I thought I knew ... The person also reassured me that they have worked within the SEO area for 10 years. As I am a subscriber to SEO MOZ and no one here has anything to gain but offering advice, is this true ? Will it not be an issue for duplicate content to show both a index page and non index page ?, will search engines not view this as duplicate content ? Or is this SEO expert talking bull, which I suspect, but cannot be sure. Any advice would be greatly appreciated, it would make my life a lot easier for the customer to use this CMS software, but I would do it at the risk of tarnishing the work they and I have done on their ranking status Many thanks in advance John
Technical SEO | | Johnny4B0 -
Inbound links with anchor text about pills pointing to our URLs
Hi, I have just noticed one of our websites has lots of inbound links with the words: "order cialis", "**cialis professional online", "**order viagra soft" and so on. I have checked the target URLs source for any kind of suspicious code and found nothing. **What should I look for and what should I do in this case? ** Thanks
Technical SEO | | ceci27100 -
Duplicate Homepage In Google
Hi Just found through my SEO dashboard, Google has two versions of the same homepage, the root page, plus the index.html page, causing duplicate content from both the pages. what is the best option to ensure google only have 1 version of the homepage listed?
Technical SEO | | rfksolutionsltd0 -
Getting Google to index new pages
I have a site, called SiteB that has 200 pages of new, unique content. I made a table of contents (TOC) page on SiteB that points to about 50 pages of SiteB content. I would like to get SiteB's TOC page crawled and indexed by Google, as well as all the pages it points to. I submitted the TOC to Pingler 24 hours ago and from the logs I see the Googlebot visited the TOC page but it did not crawl any of the 50 pages that are linked to from the TOC. I do not have a robots.txt file on SiteB. There are no robot meta tags (nofollow, noindex). There are no 'rel=nofollow' attributes on the links. Why would Google crawl the TOC (when I Pinglered it) but not crawl any of the links on that page? One other fact, and I don't know if this matters, but SiteB lives on a subdomain and the URLs contain numbers, like this: http://subdomain.domain.com/category/34404 Yes, I know that the number part is suboptimal from an SEO point of view. I'm working on that, too. But first wanted to figure out why Google isn't crawling the TOC. The site is new and so hasn't been penalized by Google. Thanks for any ideas...
Technical SEO | | scanlin0