Discrepency between # of pages and # of pages indexed
-
Here is some background:
- The site in question has approximately 10,000 pages and Google Webmaster shows that 10,000 urls(pages were submitted)
2) Only 5,500 pages appear in the Google index
3) Webmaster shows that approximately 200 pages could not be crawled for various reasons
4) SEOMOZ shows about 1,000 pages that have long URL's or Page Titles (which we are correcting)
5) No other errors are being reported in either Webmaster or SEO MOZ
6) This is a new site launched six weeks ago. Within two weeks of launching, Google had indexed all 10,000 pages and showed 9,800 in the index but over the last few weeks, the number of pages in the index kept dropping until it reached 5,500 where it has been stable for two weeks.
Any ideas of what the issue might be? Also, is there a way to download all of the pages that are being included in that index as this might help troubleshoot?
-
It's not exactly 3 clicks... if you're a PR 10 website it will take you quite a few clicks in before it gets "tired". Deep links are always a great idea.
-
I have also heard 3 clicks from a page with link juice. So if you have deep links to a page it can help carry pages deeper in. Do you agree?
-
Thank you to all for your advice. Good suggestions.
-
We do have different types of pages but Google is indexing all category pages but not all individual content pages. Based on the replies I have received, I suspect the issue can be helped by flattening the site architecture and links.
As an FYI, the site is a health care content site so no products are sold on the site. Revenue is from ads.
-
Great tip. I have seen this happen too (e.g. forum, blog, archive and content part of the website not indexed equally).
-
Do you have areas of your site that are distinctively different in type, such as category pages and individual item pages, or individual item pages and user submitted content?
What I'm getting at is trying to find if there's a certain type of page that Google isn't indexing. If you have distinct types of pages, you can create separate site maps (one for each type of content) and see if one type of content is being indexed better than another. It's more of a diagnostics tool that a solution, but I've found it helpful for sites of that size and larger in the past.
As other people have said, it's also a new site, so the lack of links could be hindering things as well.
-
Agreed!
-
Oh yes, Google is very big on balancing and allocation of resources. I don't think 10,000 will present a problem though as this number may be too common on ecommerce and content websites.
-
Very good advice in the replies. Everyone seems to have forgotten PageRank though. In Google's random surfer model it is assumed user will at some point abandon the website (after PageRank has been exhausted). This means if your site lacks raw link juice it may not have enough to go around through the whole site structure and it leaves some pages dry and unindexed. What can help is: Already mentioned flatter site architecture and unique content, but also direct links to pages not in index (including via social media) and more and stronger links towards home page which should ideally cascade down to the rest.
-
If you don't have many links to your site yet, I think that could reduce the number of pages that Google keeps in its main index. Google may allocate less resources to crawling your site if you have very little link juice, especially if deep pages on your site have no link juice coming in to them.
Another possibility is if some of the 10,000 pages are not unique content or duplicate content. Google could send a lot of your pages to its supplemental index if this is the case.
-
If you flatten out your site architecture a bit to where all pages are no more then 3 clicks deep, and provide a better HTML sitemap you will definitely see more pages indexed. It wont be all 10k, but it will be an improvement.
-
I appreciate the reply. The HTML site map does not show all 10,000 pages and some pages are likely more than 3 deep. I will try this and see what happens.
-
Google will not index your entire 10k page site just because you submitted the links in a site map. They will crawl your site and index many pages, but most likely you will never have your entire site indexed.
Cleaning up your crawl errors will help in getting your content indexed. A few other things you can do are:
-
provide a HTML sitemap on your website
-
ensure your site navigation is solid ( i.e. all pages are reachable, no island pages, the navigation can be seen in HMTL, etc)
-
ensure you do not have deep content. Google will often only go about 3 clicks deep. If you have buried content, it won't be indexed unless it is well linked.
-
if there are any particular pages you want to get indexed, you can link to them from your home page, or ask others to link to those pages from external sites.
-
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Index problems
“The website http://www.vaneyckshutters.com/nl/ does not show in the index of Google (site:vaneyckshutters.com/nl/). This must be the homepage in the Netherlands. Previously, the page www.vaneyckshutters.com was redirected to /nl/. This page is accessible now with a canonical tag to http://www.vaneyckshutters.com/nl/ in the hope to let /nl/ be indexed. When we look at the SERPS for keyword ‘shutters’, the page http://www.vaneyckshutters.com/ is shown in Google.nl on #32 and in Belgium #3. Problem & question: Why is it that /nl/ has not been indexed properly and why is it that we rank with http://www.vaneyckshutters.com on ‘shutters’ instead the/nl/ page?”
Technical SEO | | Happy-SEO1 -
Contact Page
I'm currently designing a new website for my wife, who just started her own wedding/engagement photography business. I'm trying to build it as SEO friendly as possible, but she brought up an idea that she likes that I've never tried before. Typically on all the websites I've ever built, I've had a dedicated contact page that has the typical contact form. Because that contact form on a wedding photographers website is almost as important as selling a product on an e-commerce site, she brought up the possibility of putting the contact form in the footer site-wide (minus maybe the homepage) rather than having a dedicated contact page. And in the navigation, where you have links such as "Home", "Portfolio", "About", "Prices", "Contact", etc. the "Contact" navigation item would transfer the user to the bottom of the page they are on rather than a new page. Any thoughts on which way would be better for a case like this, and any positives/negatives for doing it each way? One thought I had is that if it's in the footer rather than it's own page, it would lose it's search-ability as it's technically duplicate content on each page. But then again, that's what a footer is. Thanks, Mickey
Technical SEO | | shannmg10 -
New Page Showing Up On My Reports w/o Page Title, Words, etc - However, I didn't create it
I have a WordPress site and I was doing a crawl for errors and it is now showing up as of today that this page : https://thinkbiglearnsmart.com/event-registration/?event_id=551&name_of_event=HTML5 CSS3 is new and has no page title, words, etc. I am not even sure where this page or URL came from. I was messing with the robots.txt file to allow some /category/ posts that were being hidden, but I didn't re-allow anything with the above appendages. I just want to make sure that I didn't screw something up that is now going to impact my rankings - this was just a really odd message to come up as I didn't create this page recently - and that shouldnt even be a page accessible to the public. When I edit the page - it is using an Event Espresso (WordPress plugin) shortcode - and I don't want to noindex this page as it is all of my events. Sorry this post is confusing, any help or insight would be appreciated! I am also interested in hiring someone for some hourly consulting work on SEO type issues if anyone has any references. Thank you!
Technical SEO | | webbmason0 -
Staging & Development areas should be not indexable (i.e. no followed/no index in meta robots etc)
Hi I take it if theres a staging or development area on a subdomain for a site, who's content is hence usually duplicate then this should not be indexable i.e. (no-indexed & nofollowed in metarobots) ? In order to prevent dupe content probs as well as non project related people seeing work in progress or finding accidentally in search engine listings ? Also if theres no such info in meta robots is there any other way it may have been made non-indexable, or at least dupe content prob removed by canonicalising the page to the equivalent page on the live site ? In the case in question i am finding it listed in serps when i search for the staging/dev area url, so i presume this needs urgent attention ? Cheers Dan
Technical SEO | | Dan-Lawrence0 -
Page Indexing increase when I request Google Site Link demote
Hi there, Has anyone seen a page crawling increase in Google Web Master Tools when they have requested a site link demotion? I did this around the 23rd of March, the next day I started to see page crawling rise and rise and report a very visible spike in activity and to this day is still relatively high. From memory I have asked about this in SEOMOZ Q&A a couple of years ago in and was told that page crawl activity is a good thing - ok fine, no argument. However at the nearly in the same period I have noticed that my primary keyword rank for my home page has dropped away to something in the region of 4th page on Google US and since March has stayed there. However the exact same query in Google UK (Using SEOMOZ Rank Checker for this) has remained the same position (around 11th) - it has barely moved. I decided to request an undemote on GWT for this page link and the page crawl started to drop but not to the level before March 23rd. However the rank situation for this keyword term has not changed, the content on our website has not changed but something has come adrift with our US ranks. Using Open Site Explorer not one competitor listed has a higher domain authority than our site, page authority, domain links you name it but they sit there in first page. Sorry the above is a little bit of frustration, this question is not impulsive I have sat for weeks analyzing causes and effects but cannot see why this disparity is happening between the 2 country ranks when it has never happened for this length of time before. Ironically we are still number one in the United States for a keyword phrase which I moved away from over a month ago and do not refer to this phrase at all on our index page!! Bizarre. Granted, site link demotion may have no correlation to the KW ranking impact but looking at activities carried out on the site and timing of the page crawling. This is the only sizable factor I can identify that could be the cause. Oh! and the SEOMOZ 'On-Page Optimization Tool' reports that the home page gets an 'A' for this KW term. I have however this week commented out the canonical tag for the moment in the index page header to see if this has any effect. Why? Because as this was another (if not minor) change I employed to get the site to an 'A' credit with the tool. Any ideas, help appreciated as to what could be causing the rank differences. One final note the North American ranks initially were high, circa 11-12th but then consequently dropped away to 4th page but not the UK rankings, they witnessed no impact. Sorry one final thing, the rank in the US is my statistical outlier, using Google Analytics I have an average rank position of about 3 across all countries where our company appears for this term. Include the US and it pushes the average to 8/9th. Thanks David
Technical SEO | | David-E-Carey0 -
Noindex Pages indexed
I'm having problem that gogole is index my search results pages even though i have added the "noindex" metatag. Is the best thing to block the robot from crawling that file using robots.txt?
Technical SEO | | Tedred0 -
Two different page authority ranks for the same page
I happened to notice that trophycentral.com and www.trophycentral.com have two different page ranks even though there is a 301 redirect. Should I be concerned? http://trophycentral.com Page Authority: 47 Domain Authority: 42 http://www.trophycentral.com Page Authority: 51 Domain Authority: 42 Thanks!
Technical SEO | | trophycentraltrophiesandawards0 -
Rel canonical or 301 the Index Page?
Still a bit confused on best practice for /index.php showing up as duplicate for www.mysite.com. What do I need to do and How?
Technical SEO | | bozzie3110