Crawl budget
-
I am a believer in this concept, showing google less pages will increase their importance.
here is my question: I manage a website with millions of pages, high organic traffic (lower than before).
I do believe that too many pages are crawled. there are pages that I do not need google to crawl and followed. noindex follow does not save on the mentioned crawl budget. deleting those pages is not possible. any advice will be appreciated. If I disallow those pages I am missing on pages that help my important pages.
-
I just wrote a better reply so forgive me the file was deleted by hit post. But I do think this information will help you get acquainted much better with a crawl budget
One thing that may not incorporate your domain however is that is not centered a external link is a content delivery or CDN URL that is not rewritten to the traditional CDN.example.com Content delivery network URLs can look like
- https://example.cachefly.net/images/i-love-cats.jpg
- https://example.global.prod.fastly.net/images/i-love-cats-more.png
So you may see URLs like one shown here should be considered part of your domain and treated like a domain even if they look like this but these are just two examples of literally thousands CDN variants.
Examples for this would be the cost of incorporating your hostname to a HTTPS encrypted content delivery network can be very expensive.
Crawl budget information
- http://www.stateofdigital.com/guide-crawling-indexing-ranking/
- https://www.deepcrawl.com/case-studies/elephate-fixing-deep-indexation-issues/
- https://builtvisible.com/log-file-analysis/
- https://www.deepcrawl.com/knowledge/best-practice/the-seo-ruler/ ( image below)
Tools for Crawl budget
I hope is more helpful,
Thomas
-
Nope, having more or less external links will have no effect on the crawl budget spent on your site. Your crawl budget is only spent on yourdomain.com. I just meant that, Google's crawler will follow external links, but it won't until it's spent its crawl budget on your site.
Google isn't going to give you a metric showing your crawl budget, but you can assume how much it is by going to Google Search Console > Crawl > Crawl Stats. That'll show you how many pages Googlebot has crawled per day.
-
I see, thank you.
I'm guessing if we have these external links across the site, it'll cause more harm than good for us if they use up some crawl budget on every page?
Is there anyway to find out what the crawl budget is?
-
To be clear, Googlebot will find those external links and leave your site, but not until they've used their crawl budget up on your site.
-
Nope.
-
I'd like to find out if this crawl budget applies to external sites - we link to sister companies in the footer.
Will Google bot find these external links and leave our site to go and crawl these external sites?
Thanks!
-
Using Google's parameter tools you can also reduce crawl budget issues.
https://www.google.com/webmasters/tools/crawl-url-parameters
http://www.blindfiveyearold.com/crawl-optimization
http://searchenginewatch.com/sew/news/2064349/crawl-index-rank-repeat-a-tactical-seo-framework
http://searchengineland.com/how-i-think-crawl-budget-works-sort-of-59768
-
What's the background of your conclusion? What's the logic?
I am asking because my understanding about crawl budget is that if you waste it you risk to 1) slow down recrawl frequency on a per page basis and 2) risk google crawler gives up crawling the website before to have crawled all pages.
Are you splitting your sitemap into sub sitemaps? That's a good way to spot groups/categories of pages being ignored by google crawler.
-
If you want to test this, identify a section of your site that you can disallow via robots.txt and then measure the corresponding changes. Proceed section by section based on results. There are so many variables at play that I don't think you'll get an answer that's anywhere as precise as testing your specific situation.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
My last site crawl shows over 700 404 errors all with void(0 added to the ends of my posts/pages.
Hello, My last site crawl shows over 700 404 errors all with void(0 added to the ends of my posts/pages. I have contacted my theme company but not sure what could have done this. Any ideas? The original posts/pages are still correct and working it just looks like it did duplicates and added void(0 to the end of each post/page. Questions: There is no way to undo this correct? Do I have to do a redirect on each of these? Will this hurt my rankings and domain authority? Any suggestions would be appreciated. Thanks, Wade
Intermediate & Advanced SEO | | neverenoughmusic.com0 -
Lazy Loading of Blog Posts and Crawl Depths
Hi Moz Fans, We are looking at our blog and improving the content as much as we can for SEO purposes, but we have hit a bit of a blank in terms of lazy loading implications and issues with crawl depths. We introduced lazy loading onto the blog home page to increase site speed initially and it works well with infinite scroll, but we were wondering whether this would cause any issues regarding SEO. A lot of the resources online seem to be conflicting and some are very outdated, so some clarification on what is best in terms of lazy loading and crawl depths for blogs, would be fantastic! I hope someone can help and give us some up to date insights - If you need anymore information, I'll reply ASAP
Intermediate & Advanced SEO | | Victoria_0 -
Why Did My Google Crawls Hit A Wall?
Hello, One my the sites I work with, http://www.oransi.com, has seen a significant decrease in crawl Googlebot activity in the last 90 days. See screenshot. This decrease in crawl stats runs in conjunction with less Kb downloaded per day & an increase in how much time it took Google to download a page. The client did just go through a redesign, however that happened on 4/16/15, which was after the decrease in Googlebot activity, so that should not be the issue. Same could be said for the mobilegeddan algorithm change. Any help would be greatly appreciated. 5u1lM6B
Intermediate & Advanced SEO | | BrandLabs0 -
Difference in Number of URLS in "Crawl, Sitemaps" & "Index Status" in Webmaster Tools, NORMAL?
Greetings MOZ Community: Webmaster Tools under "Index Status" shows 850 URLs indexed for our website (www.nyc-officespace-leader.com). The number of URLs indexed jumped by around 175 around June 10th, shortly after we launched a new version of our website. No new URLs were added to the site upgrade. Under Webmaster Tools under "Crawl, Site maps", it shows 637 pages submitted and 599 indexed. Prior to June 6th there was not a significant difference in the number of pages shown between the "Index Status" and "Crawl. Site Maps". Now there is a differential of 175. The 850 URLs in "Index Status" is equal to the number of URLs in the MOZ domain crawl report I ran yesterday. Since this differential developed, ranking has declined sharply. Perhaps I am hit by the new version of Panda, but Google indexing junk pages (if that is in fact happening) could have something to do with it. Is this differential between the number of URLs shown in "Index Status" and "Crawl, Sitemaps" normal? I am attaching Images of the two screens from Webmaster Tools as well as the MOZ crawl to illustrate what has occurred. My developer seems stumped by this. He has submitted a removal request for the 175 URLs to Google, but they remain in the index. Any suggestions? Thanks,
Intermediate & Advanced SEO | | Kingalan1
Alan0 -
Meta No INDEX and Robots - Optimizing Crawl Budget
Hi, Sometime ago, a few thousand pages got into Google's index - they were "product pop up" pages, exact duplicates of the actual product page but a "quick view". So I deleted them via GWT and also put in a Meta No Index on these pop up overlays to stop them being indexed and causing dupe content issues. They are no longer within the index as far as I can see, i do a site:www.mydomain.com/ajax and nothing appears - So can I block these off now with robots.txt to optimize my crawl budget? Thanks
Intermediate & Advanced SEO | | bjs20100 -
Website Crawl problems
I have a feeling that Google doesn't crawl my website. E.g. this blogpost - I copy a sentence from it and paste it to Google. The page that shows up in search results is www.silvamethodlife.com/page/9/ - which is just a blog page with all the articles listed, not the link to the article itself! Did anyone ever have this problem? It's definitely some technical issue. Any advice will be deeply appreciated Thanks
Intermediate & Advanced SEO | | Alexey_mindvalley0 -
Crawl Budget on Noindex Follow
We have a list of crawled product search pages where pagination on Page 1 is indexed and crawled and page 2 and onward is noindex, noarchive follow as we want the links followed to the Product Pages themselves. (All product Pages have canonicals and unique URLs) Orr search results will be increasing the sets, and thus Google will have more links to follow on our wesbite although they all will be noindex pages. will this impact our carwl budget and additionally have impact to our rankings? Page 1 - Crawled Indexed and Followed Page 2 onward - Crawled No-index No-Archive Followed Thoughts? Thanks, Phil G
Intermediate & Advanced SEO | | AU-SEO0 -
Best way to block a search engine from crawling a link?
If we have one page on our site that is is only linked to by one other page, what is the best way to block crawler access to that page? I know we could set the link to "nofollow" and that would prevent the crawler from passing any authority, and we can set the page to "noindex" to prevent it from appearing in search results, but what is the best way to prevent the crawler from accessing that one link?
Intermediate & Advanced SEO | | nicole.healthline0