Crawl efficiency - Page indexed after one minute!
-
Hey guys,

A site has 5+ million pages indexed and adds about 300 new pages a day. I hear a lot that for sites at this level it's all about efficient crawlability. New pages on this site get indexed one minute after they go online.

1) Does this mean that the site is already being crawled efficiently and there is not much else to do about it?

2) By increasing crawl efficiency, should I expect Google to crawl my site less (less bandwidth Google takes from my site for the same amount of crawling) or to crawl my site more often?

Thanks
-
This is a complicated question that I can't give a simple answer to, as every site is set up differently and has its own challenges. You will likely use a variety of the techniques mentioned in my last paragraph above. Good luck.
-
Thanks Anthony,
Your explanation was very helpful.
Assuming that 3 million pages out of my 5 million are not so important for Google to be crawling or indexing, what would be the best way to optimize my crawl efficiency in relation to the number of pages?

Just noindexing 3 million pages on the site seems like a risky move.

Perhaps robots.txt, but that would not de-index the existing pages.
-
Crawl efficiency isn't exactly the same as indexation speed. It is normal for a new page to be indexed quickly; often it is linked to from the blog home page, shared on social networks, etc.
Crawl efficiency has a lot to do with making sure your most important pages are crawled as frequently as possible. Let's use the example of your site with 5,000,000 pages indexed. Perhaps there are 100,000 of those pages that are extremely important for your website. Your top categories, all of your products, your content, etc.
Then you are left with 4,900,000 pages that are not that important, but are needed for the functionality of your website (pagination, filtering, sorting, etc.). You have to determine: is it a good thing that Google has 5 million pages of your site indexed? Do you want Google regularly crawling those 4,900,000 pages, potentially at the expense of your more important pages?
Next, you check your Google Webmaster Tools and see that Google is crawling about 130,000 pages/day on your site. At that rate, it would take Google about 38 days (over an entire month) to crawl your entire site. Of course, it doesn't actually work that way - Google will crawl your site in a logical manner, crawling the pages with high authority (well linked to internally/externally) much more often. The point is, you can see that not all of your pages are being crawled every day. You want your best content crawled as frequently as possible.
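For anyone who wants to sanity-check that figure, here's the back-of-the-envelope arithmetic using the example numbers above (both figures are illustrative, not real data from any site):

```python
# Rough crawl-budget arithmetic with the example figures from above.
total_pages = 5_000_000          # pages indexed
pages_crawled_per_day = 130_000  # daily crawl rate seen in Webmaster Tools

days_for_full_crawl = total_pages / pages_crawled_per_day
print(round(days_for_full_crawl, 1))  # about 38.5 days for one full pass
```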
"To be more blunt, if a page hasn't been crawled recently, it won't rank well." That quote is taken from one of my favorite resources on this topic, this post by AJ Kohn: http://www.blindfiveyearold.com/crawl-optimization
Crawl efficiency is about guiding the search spiders to your best content and helping them learn which types of pages they can ignore. You do this primarily through: Site Structure, Internal Linking, robots.txt, the NoFollow attribute, and Parameter Handling in Google Webmaster Tools.
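To illustrate the robots.txt piece of that list, here is a minimal sketch of how disallow rules shield low-value URL patterns (filter/sort pages) while leaving important pages crawlable. The paths and rules are hypothetical; Python's standard urllib.robotparser is used only to show which URLs the rules would block:

```python
from urllib import robotparser

# Hypothetical robots.txt rules: block faceted filter/sort URLs,
# leave the important product/category pages open to crawlers.
rules = [
    "User-agent: *",
    "Disallow: /filter/",
    "Disallow: /sort/",
]

rp = robotparser.RobotFileParser()
rp.parse(rules)

# Filter pages are blocked; the important page is still crawlable.
print(rp.can_fetch("Googlebot", "https://example.com/filter/color-red"))      # False
print(rp.can_fetch("Googlebot", "https://example.com/products/top-category")) # True
```

Keep in mind, as noted earlier in the thread, that blocking a URL in robots.txt stops future crawling but does not de-index pages that are already in the index.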
-
You can actually let Google know about a new mass of pages through the sitemap. The sitemap is a single file that can be parsed to produce a large list of links.
Google can discover new pages by comparing that list of links with what it already knows about.
Here's an intro link that covers the sitemap: http://blog.kissmetrics.com/get-google-to-index/
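As a sketch of that idea: a sitemap is just an XML file listing URLs, so you can generate one covering only the pages you want crawled. The URLs below are made up for illustration, and Python's standard xml.etree module is used to build the file:

```python
import xml.etree.ElementTree as ET

# Hypothetical list of the important pages you want Google to discover.
important_urls = [
    "https://example.com/category/black-beach-pebbles",
    "https://example.com/products/white-gravel-20kg",
]

# Build the <urlset> root with the standard sitemap namespace.
urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
for page_url in important_urls:
    url_el = ET.SubElement(urlset, "url")
    ET.SubElement(url_el, "loc").text = page_url

sitemap_xml = ET.tostring(urlset, encoding="unicode")
print(sitemap_xml)
```

In practice you would write this out as sitemap.xml and submit it in Google Webmaster Tools; limiting it to important pages is one more way to steer crawl attention.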