Improving Crawl Efficiency
-
Hi
I'm reading about crawl efficiency & have looked in WMT at the current crawl rate - letting Google optimise this as recommended.
What it's set to is 0.5 requests every 2 seconds, which is 15 URLs every minute.
To me this doesn't sound very good, especially for a site with over 20,000 pages?
I'm reading about improving this, but if anyone has advice that would be great.
-
Great, thank you for this! I'll take them on board.
Becky
-
You may be overthinking this, Becky. Once the bot has crawled a page, there's no reason (or benefit to you) for it to crawl the page again unless its content has changed. The usual way for it to detect this is through your XML sitemap. If it's properly coded, it will have a <lastmod> date for each URL for Googlebot to reference.
Googlebot does continue to recrawl pages it already knows about "just in case", but your biggest focus should be on ensuring that your most recently added content is crawled quickly upon publishing. This is where making sure your sitemap is updating quickly and accurately, making sure it is pinging search engines on update, and making sure you have links from solid existing pages to the new content will help. If you have blog content, you can also submit the blog's RSS feed as an additional sitemap, which many folks don't know. That's one of the quickest ways to get it noticed.
The other thing you can do to assist crawling effectiveness is to make certain you're not forcing the crawler to waste its time crawling superfluous, duplicate, thin, or otherwise useless URLs.
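To illustrate the sitemap point above, here is a minimal sketch of generating an XML sitemap that carries a lastmod date for each URL. The example.com URLs, the dates, and the build_sitemap helper are made up for illustration; a real site would normally pull these from its CMS or database.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML for an iterable of (url, last_modified_date) pairs."""
    # Serialise the sitemap namespace as the default (unprefixed) namespace.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url, modified in pages:
        entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
        # lastmod uses the W3C date format, e.g. 2024-01-15.
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}lastmod").text = modified.isoformat()
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages; only the second one changed recently.
example_pages = [
    ("https://www.example.com/", date(2024, 1, 15)),
    ("https://www.example.com/blog/new-post", date(2024, 2, 3)),
]

sitemap_xml = build_sitemap(example_pages)
print(sitemap_xml)
```

The important part is that lastmod only changes when the page content really changes; a date that updates on every build gives the crawler nothing useful to go on.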
Hope that helps?
Paul
-
There are actually several aspects to your question.
1. Google will make its own decision as to how important each page is and therefore how often it should be crawled.
2. Site speed is a ranking factor.
3. Most SEOs believe that Google has a maximum timeframe in which to crawl each page/site. However, I have seen some chronically slow sites which have still been crawled and indexed.
I forgot to mention that using an XML sitemap can help search engines find pages.
Again, be very careful not to confuse crawling and indexing. Crawling only updates the index. Once a page is indexed, if it doesn't rank you have another SEO problem, not a technical crawling problem.
Anything a user can access, a crawler should be able to find with no problem. However, if you have hidden pages the crawler may not find them.
-
Hi
Yes working on that
I just read something which said - A “scheduler” directs Googlebot to crawl the URLs in the priority order, under the constraints of the crawl budget. URLs are being added to the list and prioritized.
So, if you have pages which haven't been crawled/indexed because they're seen as a low priority for crawling, how can I improve or change this if need be?
Can I even impact it at all? Can I help crawlers be more efficient at finding/crawling pages I want to rank or not?
Does any of this even help SEO?
-
As a general rule pages will be indexed unless there is a technical issue or a penalty involved.
What you need to be more concerned with is the position of those pages within the index. That obviously comes back to the whole SEO game.
You can use the site: operator followed by a search term that is present on the page you want to check to make sure the page is indexed, like: site:domain.com "page name"
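If you want to check a batch of pages this way, a small helper can build the search URL for each one. This is only a sketch: the site_query_url function name is made up, and it simply produces a link to open in a browser, it does not query Google for you.

```python
from urllib.parse import urlencode


def site_query_url(domain, phrase):
    """Build a Google search URL for: site:<domain> "<phrase>"."""
    query = f'site:{domain} "{phrase}"'
    # urlencode takes care of escaping the colon, quotes and spaces.
    return "https://www.google.com/search?" + urlencode({"q": query})


# Same example as in the post above.
check_url = site_query_url("domain.com", "page name")
print(check_url)
```

Open the printed link in a browser; if the page shows up in the results, it is in the index.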
-
Ok thank you, so there must be ways to improve on the number of pages Google indexes?
-
You can obviously do a fetch and submit through Search Console, but that is designed for one-off changes. Even if you submit pages and send all sorts of signals, Google will still make up its own mind about what it's going to do and when.
If your content isn't changing much, it is probably a disadvantage to have the Google crawler coming back too often, as it will slow the site down. If a page is changing regularly, Googlebot will normally gobble it up pretty quickly.
If it were me, I would let it make its own decision, unless it is causing you a problem.
Also keep in mind that crawling and indexing are two separate kettles of fish. Google's crawler will crawl every site and every page that it can find, but it doesn't necessarily index them.
-
Hi - yes it's the default.
I know we can't figure out exactly what Google is doing, but we can improve crawl efficiency.
If those pages aren't being crawled for weeks, isn't there a way to improve this? How have you found out they haven't been crawled for weeks?
-
P.S. I think the crawl rate setting you are referring to is the Google default that is shown if you move the radio button to manual.
-
Google is very clever at working out how often it needs to crawl your site; pages that get updated more often will get crawled more often. There is no way of influencing exactly what Googlebot does; mostly it will make its own decisions.
If you are talking about other web crawlers, you may need to put guidelines in place in terms of robots.txt or settings on the specific control panel.
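As a rough sketch of what such robots.txt guidelines could look like, the snippet below defines a made-up robots.txt and checks it with Python's standard urllib.robotparser. "ExampleBot", the paths and the delay value are all assumptions for illustration. Crawl-delay is a non-standard directive that only some crawlers honour, and Googlebot is not one of them.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: slow down one specific crawler and keep it
# out of internal search results; keep every crawler out of /tmp/.
ROBOTS_TXT = """\
User-agent: ExampleBot
Crawl-delay: 10
Disallow: /search/

User-agent: *
Disallow: /tmp/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# What may the (made-up) ExampleBot crawler fetch?
blocked = parser.can_fetch("ExampleBot", "https://www.example.com/search/results")
allowed = parser.can_fetch("ExampleBot", "https://www.example.com/products/")
delay = parser.crawl_delay("ExampleBot")

print("search page allowed:", blocked)
print("product page allowed:", allowed)
print("crawl delay (seconds):", delay)
```

Checking the rules like this before uploading the file is a cheap way to make sure you aren't blocking pages you do want crawled.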
20,000 pages isn't a problem for Google! Yes, it may take time. You say it is crawling at '0.5 requests every 2 seconds'. If I've got my calculation right, in theory Google will have crawled 20,000 URLs in less than a day!
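For anyone who wants to check the sums, here is the calculation written out. It assumes the crawler runs continuously at the quoted rate and that every request is for a different URL, which real crawling won't match exactly.

```python
# Figures quoted in the thread.
requests_per_interval = 0.5
interval_seconds = 2
total_pages = 20_000

# 0.5 requests every 2 seconds is 0.25 requests per second.
urls_per_minute = requests_per_interval / interval_seconds * 60
urls_per_day = urls_per_minute * 60 * 24
hours_for_site = total_pages / urls_per_minute / 60

print(f"{urls_per_minute:.0f} URLs per minute")
print(f"{urls_per_day:.0f} URLs per day")
print(f"{hours_for_site:.1f} hours to request {total_pages} URLs once each")
```

So at 15 URLs a minute the crawler gets through about 21,600 URLs a day, which covers a 20,000 page site in a little over 22 hours.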
On my site I have a page which I updated about 2 hours ago, and the change has already replicated to Google, and yet other pages I know for a fact haven't been crawled for weeks.