Improving Crawl Efficiency
-
Hi
I'm reading about crawl efficiency and have looked in WMT at the current crawl rate - I'm letting Google optimise this, as recommended.
It's set to 0.5 requests every 2 seconds, which works out to 15 URLs per minute.
To me that doesn't sound like much, especially for a site with over 20,000 pages.
I'm reading up on how to improve this, but if anyone has advice that would be great.
-
Great, thank you for this! I'll take them on board.
Becky
-
You may be overthinking this, Becky. Once the bot has crawled a page, there's no reason (or benefit to you) for it to crawl that page again unless its content has changed. The usual way for it to detect a change is through your XML sitemap. If the sitemap is properly coded, it will have a <lastmod> date for Googlebot to reference.
Googlebot does continue to recrawl pages it already knows about "just in case", but your biggest focus should be on ensuring that your most recently added content is crawled quickly upon publishing. This is where making sure your sitemap updates quickly and accurately, making sure it pings search engines on update, and making sure you have links from solid existing pages to the new content will help. If you have blog content, many folks don't know that you can submit the blog's RSS feed as an additional sitemap - that's one of the quickest ways to get new posts noticed.
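If your CMS doesn't handle this for you, here's a rough Python sketch of the idea - building a sitemap entry with a <lastmod> date and notifying Google when the sitemap changes. The domain, paths, and function names are just placeholders I've made up, and the ping URL is the classic endpoint, so check it's still supported before relying on it:

```python
# Rough sketch only: build one sitemap <url> entry with <lastmod> and notify
# Google that the sitemap has been updated. Domain and paths are placeholders.
from datetime import date
from urllib.parse import quote
from urllib.request import urlopen

SITEMAP_URL = "https://www.example.com/sitemap.xml"  # hypothetical location

def sitemap_entry(page_url: str, last_modified: date) -> str:
    """One <url> block with the <lastmod> date Googlebot can reference."""
    return (
        "  <url>\n"
        f"    <loc>{page_url}</loc>\n"
        f"    <lastmod>{last_modified.isoformat()}</lastmod>\n"
        "  </url>"
    )

def ping_google(sitemap_url: str) -> int:
    """Tell Google the sitemap changed, via the classic ping endpoint."""
    with urlopen("https://www.google.com/ping?sitemap=" + quote(sitemap_url, safe="")) as resp:
        return resp.status

print(sitemap_entry("https://www.example.com/blog/new-post/", date.today()))
# ping_google(SITEMAP_URL)  # uncomment to actually send the ping
```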
The other thing you can do to assist crawl effectiveness is to make certain you're not forcing the crawler to waste its time on superfluous, duplicate, thin, or otherwise useless URLs.
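One quick way to sanity-check that - again just a sketch, with a made-up domain and paths - is to run a few URLs you intend to block (and a few you don't) against your robots.txt, for example with Python's built-in parser:

```python
# Sketch: confirm robots.txt actually blocks the "waste" URLs (filters, session
# IDs, etc.) while leaving real content crawlable. Domain and paths are made up.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser("https://www.example.com/robots.txt")
rp.read()  # fetches and parses the live robots.txt

test_urls = [
    "https://www.example.com/shop?colour=red&size=10",  # filter/faceted URL
    "https://www.example.com/blog/new-post/",           # genuine content page
]
for url in test_urls:
    verdict = "allowed" if rp.can_fetch("Googlebot", url) else "blocked"
    print(f"{verdict:8} {url}")
```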
Hope that helps?
Paul
-
There are actually several aspects to your question.
1. Google will make its own decision as to how important each page is, and therefore how often it should be crawled
2. Site speed is a ranking factor
3. Most SEOs believe that Google has a maximum timeframe in which to crawl each page/site. However, I have seen some chronically slow sites which have still been crawled and indexed.
I forgot to mention that using an XML sitemap can help search engines find pages.
Again, be very careful not to confuse crawling and indexing. Crawling only updates the index; once a page is indexed, if it doesn't rank you have another SEO problem, not a technical crawling problem.
Anything a user can access, a crawler should be able to find without a problem; however, if you have hidden pages the crawler may not find them.
-
Hi
Yes working on that
I just read something which said that a "scheduler" directs Googlebot to crawl URLs in priority order, under the constraints of the crawl budget, with URLs continually being added to the list and prioritised.
So, if you have pages which haven't been crawled/indexed because they're seen as a low priority for crawling, how can I improve or change this if need be?
Can I even impact it at all? Can I help crawlers be more efficient at finding/crawling pages I want to rank or not?
Does any of this even help SEO?
-
As a general rule pages will be indexed unless there is a technical issue or a penalty involved.
What you need to be more concerned with is the position of those pages within the index. That obviously comes back to the whole SEO game.
You can use the site: operator followed by a search term that is present on the page you want to check, to make sure the page is indexed, like: site:domain.com "page name"
-
Ok, thank you - so there must be ways to improve the number of pages Google indexes?
-
You can obviously do a fetch and submit through Search Console, but that is designed for one-off changes. Even if you submit pages and send all sorts of signals, Google will still make up its own mind about what it's going to do and when.
If your content isn't changing much, it is probably a disadvantage to have the Google crawler coming back too often, as it will slow the site down. If a page is changing regularly, Googlebot will normally gobble it up pretty quickly.
If it was me, I would let it make its own decisions, unless it is causing you a problem.
Also keep in mind that crawling and indexing are two separate kettles of fish: the Google crawler will crawl every site and every page it can find, but it doesn't necessarily index everything.
-
Hi - yes it's the default.
I know we can't figure out exactly what Google is doing, but we can improve crawl efficiency.
If those pages aren't being crawled for weeks, isn't there a way to improve this? How have you found out they haven't been crawled for weeks?
-
P.S. I think the crawl rate setting you are referring to is the Google default shown if you move the radio button to manual.
-
Google is very clever at working out how often it needs to crawl your site; pages that get updated more often will get crawled more often. There is no way of influencing exactly what Googlebot does - mostly it will make its own decisions.
If you are talking about other web crawlers, you may need to put guidelines in place in terms of robots.txt or settings on the specific control panel.
20,000 pages to Google isn't a problem! Yes, it may take time. You say it is crawling at '0.5 requests every 2 seconds' - if I've got my calculation right in theory Google will have crawled 20,000 URLs in less than a day!
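Just to show my working (rough numbers only, based on the figure you quoted):

```python
# Back-of-the-envelope check of the crawl rate Becky quoted.
requests_per_second = 0.5 / 2               # "0.5 requests every 2 seconds"
urls_per_minute = requests_per_second * 60  # 15
urls_per_day = urls_per_minute * 60 * 24    # 21,600
print(urls_per_minute, urls_per_day)        # 15.0 21600.0 -> 20,000 URLs in under a day
```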
On my site I have a page which I updated about 2 hours ago, and the change has already replicated to Google, and yet other pages I know for a fact haven't been crawled for weeks.