Improving Crawl Efficiency
-
Hi
I'm reading about crawl efficiency and have looked in WMT at the current crawl rate. I'm letting Google optimise this, as recommended.
It's set to 0.5 requests every 2 seconds, which is 15 URLs every minute.
To me this doesn't sound very good, especially for a site with over 20,000 pages?
I'm reading about improving this, but if anyone has advice that would be great.
-
Great, thank you for this! I'll take it on board.
Becky
-
You may be overthinking this, Becky. Once the bot has crawled a page, there's no reason (or benefit to you) for it to crawl the page again unless its content has changed. The usual way for it to detect this is through your XML sitemap. If it's properly coded, it will have a <lastmod> date for each URL for Googlebot to reference.
Googlebot does continue to recrawl pages it already knows about "just in case", but your biggest focus should be on ensuring that your most recently added content is crawled quickly upon publishing. This is where making sure your sitemap is updating quickly and accurately, making sure it is pinging search engines on update, and making sure you have links from solid existing pages to the new content will help. If you have blog content, many folks don't know that you can submit the blog's RSS feed as an additional sitemap! That's one of the quickest ways to get it noticed.
The other thing you can do to assist the crawling effectiveness is to make certain you're not forcing the crawler to waste its time crawling superfluous, duplicate, thin, or otherwise useless URLs.
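To make the <lastmod> point concrete, here is a minimal sketch of generating a sitemap where every URL carries its last-modified date. The URLs and dates are made up for illustration, and `build_sitemap` is a hypothetical helper, not part of any SEO tool; most CMS platforms or sitemap plugins will produce this for you.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML from (url, last_modified_date) tuples.

    Hypothetical helper for illustration only.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, modified in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        # <lastmod> takes a W3C date; YYYY-MM-DD is the simplest form.
        ET.SubElement(url, "lastmod").text = modified.isoformat()
    return ET.tostring(urlset, encoding="unicode")


# Example pages (assumed, not from a real site).
sitemap_xml = build_sitemap([
    ("https://www.example.com/", date(2024, 1, 15)),
    ("https://www.example.com/new-post", date(2024, 2, 1)),
])
print(sitemap_xml)
```

The important part is that the date only changes when the page content really changes; a <lastmod> that updates on every build gives the crawler nothing useful to go on.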
Hope that helps?
Paul
-
There are actually several aspects to your question.
1. Google will make its own decision as to how important each page is and therefore how often it should be crawled.
2. Site speed is a ranking factor.
3. Most SEOs believe that Google has a maximum timeframe in which to crawl each page/site. However, I have seen some chronically slow sites which have still been crawled and indexed.
I forgot to mention that using an XML sitemap can help search engines find pages.
Again, be very careful not to confuse crawling and indexing. Crawling only updates the index; once a page is indexed, if it doesn't rank you have another SEO problem, not a technical crawling problem.
Anything a user can access, a crawler should be able to find with no problem; however, if you have hidden pages the crawler may not find them.
-
Hi
Yes working on that
I just read something which said - A “scheduler” directs Googlebot to crawl the URLs in the priority order, under the constraints of the crawl budget. URLs are being added to the list and prioritized.
So, if you have pages which haven't been crawled/indexed as they're seen as a low priority for crawling - how can I improve or change this if need be?
Can I even impact it at all? Can I help crawlers be more efficient at finding/crawling pages I want to rank or not?
Does any of this even help SEO?
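For anyone trying to picture the "scheduler" quote above, here is a toy priority-queue sketch. It is only an illustration of "priority order under a budget", not how Googlebot actually works; the priority scores, URLs, and the `schedule_crawl` function are all invented for the example.

```python
import heapq


def schedule_crawl(urls_with_priority, crawl_budget):
    """Toy model: crawl the highest-priority URLs until the budget runs out.

    Not Google's algorithm; priorities here are arbitrary numbers.
    """
    # heapq is a min-heap, so negate the priority to pop the highest first.
    heap = [(-priority, url) for url, priority in urls_with_priority]
    heapq.heapify(heap)

    crawled = []
    while heap and len(crawled) < crawl_budget:
        _, url = heapq.heappop(heap)
        crawled.append(url)

    # Whatever is left waits for a later crawl.
    deferred = sorted(url for _, url in heap)
    return crawled, deferred


crawled, deferred = schedule_crawl(
    [("/", 1.0), ("/new-post", 0.9), ("/old-archive", 0.2), ("/tag/misc", 0.1)],
    crawl_budget=2,
)
print("crawled:", crawled)
print("deferred:", deferred)
```

In this picture, low-priority pages are not blocked, they just keep getting pushed back, which is why internal links and fresh sitemap dates are the levers people usually reach for.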
-
As a general rule pages will be indexed unless there is a technical issue or a penalty involved.
What you need to be more concerned with is the position of those pages within the index. That obviously comes back to the whole SEO game.
You can use the site: operator followed by a search term that is present on the page you want to check to make sure the page is indexed, like: site:domain.com "page name"
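If you have a lot of pages to spot-check, a small sketch like this builds the search URL for each one so you can open them in a browser. `site_query_url` is a hypothetical helper, and domain.com and "page name" are the same placeholders as in the example above.

```python
from urllib.parse import urlencode


def site_query_url(domain, phrase):
    """Build a Google search URL for: site:<domain> "<phrase>"."""
    query = f'site:{domain} "{phrase}"'
    # urlencode escapes the colon, quotes, and spaces for use in a URL.
    return "https://www.google.com/search?" + urlencode({"q": query})


print(site_query_url("domain.com", "page name"))
```

This is only meant for manual checks; it does not fetch or parse results.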
-
Ok thank you, so there must be ways to improve on the number of pages Google indexes?
-
You can obviously do a fetch and submit through search console, but that is designed for one-off changes. Even if you submit pages and make all sorts of signals Google will still make up its own mind what it's going to do and when.
If your content isn't changing much it is probably a disadvantage to have the Google crawler coming back too often as it will slow the site down. If a page is changing regularly the Google bot will normally gobble it pretty quick.
If it were me, I would let it make its own decision, unless it is causing you a problem.
Also keep in mind that crawl and index are two separate kettles of fish. The Google crawler will crawl every site and every page that it can find, but it doesn't necessarily index them.
-
Hi - yes it's the default.
I know we can't figure out exactly what Google is doing, but we can improve crawl efficiency.
If those pages aren't being crawled for weeks, isn't there a way to improve this? How have you found out they haven't been crawled for weeks?
-
P.S. I think the crawl rate setting you are referring to is the Google default if you move the radio button to manual
-
Google is very clever at working out how often it needs to crawl your site; pages that get updated more often will get crawled more often. There is no way of influencing exactly what the Google bot does; mostly it will make its own decisions.
If you are talking about other web crawlers, you may need to put guidelines in place in terms of robots.txt or settings on the specific control panel.
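As a rough sketch of what robots.txt guidelines for other crawlers can look like, the snippet below checks a sample file with Python's standard `urllib.robotparser`. The rules, the `ExampleBot` name, and the example.com URLs are assumptions for illustration. As far as I know, Googlebot does not honour `Crawl-delay`, so that line only matters for crawlers that choose to respect it.

```python
from urllib.robotparser import RobotFileParser

# Sample rules (assumed): block one hypothetical bot entirely, keep
# everyone else out of internal search pages, and ask for a 5 s delay.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /search/
Crawl-delay: 5
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

blocked_bot = parser.can_fetch("ExampleBot", "https://www.example.com/products/")
other_bot = parser.can_fetch("SomeOtherBot", "https://www.example.com/products/")
search_page = parser.can_fetch("SomeOtherBot", "https://www.example.com/search/?q=x")
delay = parser.crawl_delay("SomeOtherBot")

print(blocked_bot, other_bot, search_page, delay)
```

Checking rules this way before uploading them is a cheap guard against accidentally blocking pages you want crawled.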
20,000 pages to Google isn't a problem! Yes, it may take time. You say it is crawling at '0.5 requests every 2 seconds'. If I've got my calculation right, in theory Google will have crawled 20,000 URLs in less than a day!
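The calculation can be checked in a few lines. This takes the figures quoted in the thread at face value and assumes every request fetches a different page with no recrawls, which won't be true in practice.

```python
# Figures quoted in the thread.
requests_per_interval = 0.5
interval_seconds = 2
total_pages = 20_000

# Assumption: each request is a distinct page and nothing is recrawled.
urls_per_minute = requests_per_interval / interval_seconds * 60
urls_per_day = urls_per_minute * 60 * 24
hours_for_full_crawl = total_pages / urls_per_minute / 60

print(f"{urls_per_minute:.0f} URLs per minute")
print(f"{urls_per_day:.0f} URLs per day")
print(f"{hours_for_full_crawl:.1f} hours to crawl {total_pages} pages")
```

So the quoted rate works out to roughly 21,600 URLs a day, which is just enough to cover 20,000 pages in a little over 22 hours.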
On my site I have a page which I updated about 2 hours ago, and the change has already replicated to Google, and yet other pages I know for a fact haven't been crawled for weeks.