Crawl efficiency - Page indexed after one minute!
-
Hey Guys,A site that has 5+ million pages indexed and 300 new pages a day.I hear a lot that sites at this level its all about efficient crawlabitliy.The pages of this site gets indexed one minute after the page is online.1) Does this mean that the site is already crawling efficient and there is not much else to do about it?2) By increasing crawlability efficiency, should I expect gogole to crawl my site less (less bandwith google takes from my site for the same amount of crawl)or to crawl my site more often?Thanks
-
This is a complicated question that I can't give a simple answer for, as every site is set-up differently and has it's own challenges. You will likely use a variety of the techniques mentioned in my last paragraph above. Good luck.
-
Thanks Anthony,
Your explanation was very helpful.
Assuming that 3 millions pages out of my 5 are not so important for google to be crawling or indexing.
What would be the best way to optimize my crawl efficiency in relation to the amount of pages?
Just <noindex>3 million pages on the site, I believe this can be a risk move.</noindex>
Perhaps robots.txt but that would not de-index the existing pages.
-
Crawl efficiency isn't exactly the same as indexation speed. It is normal for a new page to be indexed quickly, often times it is linked to from the blog home page, shared on social networks, etc.
Crawl efficiency has a lot to do with making sure your most important pages are crawled as frequently as possible. Let's use the example of your site with 5,000,000 pages indexed. Perhaps there are 100,000 of those pages that are extremely important for your website. Your top categories, all of your products, your content, etc.
Then you are left with 4,900,000 pages that are not that important, but needed for the functionality of your website (pagination, filtering, sorting, etc). You have to determine, is it a good thing that Google has 5 million pages of your site indexed? Do you want Google regularly crawling those 4,900,000 pages, potentially at the expense of your more important pages?
Next, you check your Google Webmaster Tools and see that Google is crawling about 130,000 pages/day on your site. At that rate, it would take Google 38 days (over an entire month) to crawl your entire site. Of course, it doesn't actually work that way - Google will crawl your site in a logical manor, crawling the pages with high authority (well linked to internally/externally) much more often. The point is, you can see that not all of your pages are being crawled every day. You want your best content crawled as frequently as possible.
"To be more blunt, if a page hasn't been crawled recently, it won't rank well." This quote is taken from one of my favorite resources on this topic, is this post by AJ Kohn. http://www.blindfiveyearold.com/crawl-optimization
Crawl efficiency is guiding the search spiders to your best content and helping them learn what types of pages you can ignore. You do this primarily through: Site Structure, Internal Linking, robots.txt, NoFollow attribute and Parameter Handling in Google Webmaster Tools.
-
You can actually let Google know about a new mass of pages through the sitemap. The sitemap is a single file what can be parsed to produce a large list of links.
Google can discover new pages by comparing the list of links with what they know about.
Here's an intro link that covers the sitemap: http://blog.kissmetrics.com/get-google-to-index/
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Two high ranking pages instantly dropped from index - no manual penalty notification
We are facing an issue where two of our major rankings pages have just completely disappeared from search results. This has happened in the last 24 - 48 hours and there has been no changes made to the site. From what we can tell, it's only impacted two pages (but two very important category pages). I have double and triple checked all standard indexing protocols - Search Console URL inspection says the pages are fine to crawl and index. URLs have been requested to re-index but nothing has worked. This would have me to believe it could be a manual action yet there are no notifications in Search Console and we are listed as 'No issues detected' in all versions of our web property. Can anyone else think what could be the reason?
Intermediate & Advanced SEO | | Vuly0 -
Different content on pages with the same URL--except one is at www and the other at www2
Hi! I have two pages with unique content on each. However, they have virtually the same URL--except one is a www and the other is a www2. As far as I know, both pages were meant to gain organic traction. How should this situation be handled for SEO purposes? Thanks for any help! ---Ivey
Intermediate & Advanced SEO | | Nichiha0 -
Duplicate Page getting indexed and not the main page!
Main Page: www.domain.com/service
Intermediate & Advanced SEO | | Ishrat-Khan
Duplicate Page: www.domain.com/products-handler.php/?cat=service 1. My page was getting indexed properly in 2015 as: www.domain.com/service
2. Redesigning done in Aug 2016, a new URL pattern surfaced for my pages with parameter "products-handler"
3. One of my product landing pages had got 301-permanent redirected on the "products-handler" page
MAIN PAGE: www.domain.com/service GETTING REDIRECTED TO: www.domain.com/products-handler.php/?cat=service
4. This redirection was appearing until Nov 2016.
5. I took over the website in 2017, the main page was getting indexed and deindexed on and off.
6. This June it suddenly started showing an index of this page "domain.com/products-handler.php/?cat=service"
7. These "products-handler.php" pages were creating sitewide internal duplicacy, hence I blocked them in robots.
8. Then my page (Main Page: www.domain.com/service) got totally off the Google index Q1) What could be the possible reasons for the creation of these pages?
Q2) How can 301 get placed from main to duplicate URL?
Q3) When I have submitted my main URL multiple times in Search Console, why it doesn't get indexed?
Q4) How can I make Google understand that these URLs are not my preferred URLs?
Q5) How can I permanently remove these (products-handler.php) URLs? All the suggestions and discussions are welcome! Thanks in advance! 🙂0 -
How I can improve my website On page and Off page
My Website is guitarcontrol.com, I have very strong competition in market. Please advice me the list of improvements on my websites. In regarding ON page, Linkbuiding and Social media. What I can do to improve my website ranking?
Intermediate & Advanced SEO | | zoe.wilson170 -
How does the crawl find duplicate pages that don't exist on the site?
It looks like I have a lot of duplicate pages which are essentially the same url with some extra ? parameters added eg: http://www.merlin.org.uk/10-facts-about-malnutrition http://www.merlin.org.uk/10-facts-about-malnutrition?page=1 http://www.merlin.org.uk/10-facts-about-malnutrition?page=2 These extra 2 pages (and there's loads of pages this happens to) are a mystery to me. Not sure why they exist as there's only 1 page. Is this a massive issue? It's built on Drupal so I wonder if it auto generates these pages for some reason? Any help MUCH appreciated. Thanks
Intermediate & Advanced SEO | | Deniz0 -
How long does google take to show the results in SERP once the pages are indexed ?
Hi...I am a newbie & trying to optimize the website www.peprismine.com. I have 3 questions - A little background about this : Initially, close to 150 pages were indexed by google. However, we decided to remove close to 100 URLs (as they were quite similar). After the changes, we submitted the NEW sitemap (with close to 50 pages) & google has indexed those URLs in sitemap. 1. My pages were indexed by google few days back. How long does google take to display the URL in SERP once the pages get indexed ? 2. Does google give more preference to websites with more number of pages than those with lesser number of pages to display results in SERP (I have just 50 pages). Does the NUMBER of pages really matter ? 3. Does removal / change of URLs have any negative effect on ranking ? (Many of these URLs were not shown on the 1st page) An answer from SEO experts will be highly appreciated. Thnx !
Intermediate & Advanced SEO | | PepMozBot0 -
On Page vs Off Page - Which Has a Greater Effect on Rankings?
Hi Mozzers, My site will be migrating to a new domain soon, and I am not sure how to spend my time. Should I be optimizing our content for keywords, improving internal linking, and writing new content - or should I be doing link building for our current domain (or the new one)? Is there a certain ratio that determines rankings which can help me prioritize these to-dos?, such as 70:30 in favor of link-building? Thanks for any help you can offer!
Intermediate & Advanced SEO | | Travis-W0 -
Do in page links pointing to the parent page make the page more relevant for that term?
Here's a technical question. Suppose I have a page relevant to the term "Mobile Phones". I have a piece of text, on that page talking about "mobile phones", and within that text is the term "cell phones". Now if I link the text "cell phones", to the page it is already placed on (ie the parent page) - will the page gain more relevancy for the term "cell phones"?? Thanks
Intermediate & Advanced SEO | | James770