Discrepency between # of pages and # of pages indexed
-
Here is some background:
- The site in question has approximately 10,000 pages and Google Webmaster shows that 10,000 urls(pages were submitted)
2) Only 5,500 pages appear in the Google index
3) Webmaster shows that approximately 200 pages could not be crawled for various reasons
4) SEOMOZ shows about 1,000 pages that have long URL's or Page Titles (which we are correcting)
5) No other errors are being reported in either Webmaster or SEO MOZ
6) This is a new site launched six weeks ago. Within two weeks of launching, Google had indexed all 10,000 pages and showed 9,800 in the index but over the last few weeks, the number of pages in the index kept dropping until it reached 5,500 where it has been stable for two weeks.
Any ideas of what the issue might be? Also, is there a way to download all of the pages that are being included in that index as this might help troubleshoot?
-
It's not exactly 3 clicks... if you're a PR 10 website it will take you quite a few clicks in before it gets "tired". Deep links are always a great idea.
-
I have also heard 3 clicks from a page with link juice. So if you have deep links to a page it can help carry pages deeper in. Do you agree?
-
Thank you to all for your advice. Good suggestions.
-
We do have different types of pages but Google is indexing all category pages but not all individual content pages. Based on the replies I have received, I suspect the issue can be helped by flattening the site architecture and links.
As an FYI, the site is a health care content site so no products are sold on the site. Revenue is from ads.
-
Great tip. I have seen this happen too (e.g. forum, blog, archive and content part of the website not indexed equally).
-
Do you have areas of your site that are distinctively different in type, such as category pages and individual item pages, or individual item pages and user submitted content?
What I'm getting at is trying to find if there's a certain type of page that Google isn't indexing. If you have distinct types of pages, you can create separate site maps (one for each type of content) and see if one type of content is being indexed better than another. It's more of a diagnostics tool that a solution, but I've found it helpful for sites of that size and larger in the past.
As other people have said, it's also a new site, so the lack of links could be hindering things as well.
-
Agreed!
-
Oh yes, Google is very big on balancing and allocation of resources. I don't think 10,000 will present a problem though as this number may be too common on ecommerce and content websites.
-
Very good advice in the replies. Everyone seems to have forgotten PageRank though. In Google's random surfer model it is assumed user will at some point abandon the website (after PageRank has been exhausted). This means if your site lacks raw link juice it may not have enough to go around through the whole site structure and it leaves some pages dry and unindexed. What can help is: Already mentioned flatter site architecture and unique content, but also direct links to pages not in index (including via social media) and more and stronger links towards home page which should ideally cascade down to the rest.
-
If you don't have many links to your site yet, I think that could reduce the number of pages that Google keeps in its main index. Google may allocate less resources to crawling your site if you have very little link juice, especially if deep pages on your site have no link juice coming in to them.
Another possibility is if some of the 10,000 pages are not unique content or duplicate content. Google could send a lot of your pages to its supplemental index if this is the case.
-
If you flatten out your site architecture a bit to where all pages are no more then 3 clicks deep, and provide a better HTML sitemap you will definitely see more pages indexed. It wont be all 10k, but it will be an improvement.
-
I appreciate the reply. The HTML site map does not show all 10,000 pages and some pages are likely more than 3 deep. I will try this and see what happens.
-
Google will not index your entire 10k page site just because you submitted the links in a site map. They will crawl your site and index many pages, but most likely you will never have your entire site indexed.
Cleaning up your crawl errors will help in getting your content indexed. A few other things you can do are:
-
provide a HTML sitemap on your website
-
ensure your site navigation is solid ( i.e. all pages are reachable, no island pages, the navigation can be seen in HMTL, etc)
-
ensure you do not have deep content. Google will often only go about 3 clicks deep. If you have buried content, it won't be indexed unless it is well linked.
-
if there are any particular pages you want to get indexed, you can link to them from your home page, or ask others to link to those pages from external sites.
-
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Will redirecting a logged in user from a public page to an equivalent private page (not visible to google) impact SEO?
Hi, We have public pages that can obviously be visited by our registered members. When they visit these public pages + they are logged in to our site, we want to redirect them to the equivalent (richer) page on the private site e.g. a logged in user visiting /public/contentA will be redirected to /private/contentA Note: Our /public pages are indexed by Google whereas /private pages are excluded. a) will this affect our SEO? b) if not, is 302 the best http status code to use? Cheers
Technical SEO | | bernienabo0 -
My New Pages Are Really Slow to Index Lately - Are Yours Slow Too ?
New pages on my site usually shoot right into the index - often in under 24 hours. Lately they are taking weeks to get into the index. Are your new pages slow to index lately? Thanks for anything that you can report.
Technical SEO | | EGOL2 -
How to Stop Google from Indexing Old Pages
We moved from a .php site to a java site on April 10th. It's almost 2 months later and Google continues to crawl old pages that no longer exist (225,430 Not Found Errors to be exact). These pages no longer exist on the site and there are no internal or external links pointing to these pages. Google has crawled the site since the go live, but continues to try and crawl these pages. What are my next steps?
Technical SEO | | rhoadesjohn0 -
My blog homepage deindexed, other pages indexing, still traffic not changed.
Hello! Today when I check my blog site search on Google, I can't see my blog home page. Though all my posts and pages are still on the Google results. Today I published a test post, then it also indexed by the Google less than 3 minutes. Still I can't see any traffic changes. 10th of April (yesterday) when I perform a site search (site:mydomain.com), I saw my site on the Google search result. Today I installed the Ulitmate SEO plug-in and deactivated WordPress SEO plug-in. After a few hours I saw this issue. (I'm not saying this is the issue, I just mentioned it). In addition to that I never used any black hat SEO methods to improve my ranking. my site:- http://goo.gl/6mvQT Any help really appreciate!
Technical SEO | | Godad0 -
Off-page SEO and on-page SEO improvements
I would like to know what off-page SEO and on-page SEO improvements can be made to one of our client websites http://www.nd-center.com Best regards,
Technical SEO | | fkdpl2420 -
Duplicate index.php/webpage pages on website. Help needed!
Hi Guys, Having a really frustrating problem with our website. It is a Joomla 1.7 site and we have some duplicate page issues. What is happening is that we have a webpage, lets say domain.com/webpage1 and then we also have domain.com/index.php/webpage1. Google is seeing these as duplicate pages and is causing me some real SEO problems. I have tried setting up a 301 redirect but it wn't let me redirect /index.php/webpage1 to /webpage1. Anyone have any ideas or plugins that can be used to sort this out? Any help will be really appreciated! Matt.
Technical SEO | | MatthewBarby0 -
Removing Duplicate Pages
Hi everyone. I'm sure this falls under novice seo question. But how do i remove duplicate pages from my site. I have not created the pages per say. Their may be a an internal link on a page that links to the page causing the duplication. Do i remove the internal link here is a sample of a duplicate page http://www.ticketplatform.com/about/ticket-industry-news-details/11-03-07/Ticket_Platform_to_help_LilysProject_com_to_raise_money_for_ALYN_Hospital_in_Israel.aspx?ReturnURL=%2fabout%2fticket-industry-news.aspx http://www.ticketplatform.com/about/ticket-industry-news-details/11-03-07/Ticket_Platform_to_help_LilysProject_com_to_raise_money_for_ALYN_Hospital_in_Israel.aspx?ReturnURL=%2fhome.aspx&CntPageID=1 I know the url is way too long. working on it Thanks for your feedbacks.
Technical SEO | | ticketplatform0 -
On Page 301 redirect for html pages
For php pages youve got Header( "HTTP/1.1 301 Moved Permanently" );
Technical SEO | | shupester
Header( "Location: http://www.example.com" );
?> Is there anything for html pages? Other then Or is placing this code redirect 301 /old/old.htm http://www.you.com/new.php in the .htaccess the only way to properly 301 redirect html pages? Thanks!0