How can Google index a page that it can't crawl completely?
-
I recently posted a question regarding a product page that appeared to have no content. [http://www.seomoz.org/q/why-is-ose-showing-now-data-for-this-url]
What puzzles me is that this page got indexed anyway. Was it indexed based on Google knowing that there was once content on the page? Was it indexed based on the trust level of our root domain?
What are your thoughts? I'm asking not only because I don't know the answer, but because I know the argument is going to be made that if Google indexed the page, then it must have been crawlable, and therefore we didn't really have a crawlability problem.
Why would Google index a page it can't crawl?
-
Yep. If you had links to that page from other authoritative pages, the PageRank/authority would transfer over, even with the indexing issue.
-
Awesome explanation, Oleg. We had some other product pages (128, to be exact) that fell victim to the same coding error. I found it interesting that not only were most of them indexed, but some of them actually had Page Authority and/or PageRank.
I am thinking Google may have allocated authority to some of these product pages because they had decent link profiles, despite Googlebot not being able to access the whole page. Is that possible?
-
It has crawled and indexed the page - check out the cached copy.
If you view the source, you can see that there is some HTML code, but it seems to get cut off prematurely (perhaps due to a coding error). That HTML was enough to get the page indexed, but I would be surprised if it ranks for any terms. For example, a search for the page's title, "Shure SLX24/SM58 | Wireless Microphone System - CCI Solutions", does not return the correct URL.
So G recognizes that a page is there but thinks it is blank, which is why it is indexed but won't rank for anything.
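If you want to check other pages for the same kind of cut-off markup, here is a rough sketch in Python. It is only a heuristic and an assumption on my part, not how Googlebot evaluates a page: it flags HTML that never closes its `body` and `html` tags. The names `TruncationChecker` and `looks_truncated` are made up for this example, and the sample strings stand in for a page source you would fetch yourself.

```python
from html.parser import HTMLParser


class TruncationChecker(HTMLParser):
    """Collects the names of every closing tag seen in the document."""

    def __init__(self):
        super().__init__()
        self.closed = set()

    def handle_endtag(self, tag):
        # The parser passes tag names in lowercase.
        self.closed.add(tag)


def looks_truncated(html: str) -> bool:
    """Heuristic: treat the page as cut off if </body> or </html> never appears."""
    checker = TruncationChecker()
    checker.feed(html)
    checker.close()
    return not {"body", "html"} <= checker.closed


# Hypothetical sample markup, not the real product page source.
complete = "<html><head><title>T</title></head><body><p>Hi</p></body></html>"
cut_off = "<html><head><title>T</title></head><body><p>Hi"

print(looks_truncated(complete))
print(looks_truncated(cut_off))
```

Running something like this over the source of each affected product page would give a quick list of the ones whose HTML stops early. Keep in mind that a page can close its tags properly and still be missing content, so treat it as a first pass only.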