Crawl budget
-
I am a believer in this concept, showing google less pages will increase their importance.
here is my question: I manage a website with millions of pages, high organic traffic (lower than before).
I do believe that too many pages are crawled. there are pages that I do not need google to crawl and followed. noindex follow does not save on the mentioned crawl budget. deleting those pages is not possible. any advice will be appreciated. If I disallow those pages I am missing on pages that help my important pages.
-
I just wrote a better reply so forgive me the file was deleted by hit post. But I do think this information will help you get acquainted much better with a crawl budget
One thing that may not incorporate your domain however is that is not centered a external link is a content delivery or CDN URL that is not rewritten to the traditional CDN.example.com Content delivery network URLs can look like
- https://example.cachefly.net/images/i-love-cats.jpg
- https://example.global.prod.fastly.net/images/i-love-cats-more.png
So you may see URLs like one shown here should be considered part of your domain and treated like a domain even if they look like this but these are just two examples of literally thousands CDN variants.
Examples for this would be the cost of incorporating your hostname to a HTTPS encrypted content delivery network can be very expensive.
Crawl budget information
- http://www.stateofdigital.com/guide-crawling-indexing-ranking/
- https://www.deepcrawl.com/case-studies/elephate-fixing-deep-indexation-issues/
- https://builtvisible.com/log-file-analysis/
- https://www.deepcrawl.com/knowledge/best-practice/the-seo-ruler/ ( image below)
Tools for Crawl budget
I hope is more helpful,
Thomas
-
Nope, having more or less external links will have no effect on the crawl budget spent on your site. Your crawl budget is only spent on yourdomain.com. I just meant that, Google's crawler will follow external links, but it won't until it's spent its crawl budget on your site.
Google isn't going to give you a metric showing your crawl budget, but you can assume how much it is by going to Google Search Console > Crawl > Crawl Stats. That'll show you how many pages Googlebot has crawled per day.
-
I see, thank you.
I'm guessing if we have these external links across the site, it'll cause more harm than good for us if they use up some crawl budget on every page?
Is there anyway to find out what the crawl budget is?
-
To be clear, Googlebot will find those external links and leave your site, but not until they've used their crawl budget up on your site.
-
Nope.
-
I'd like to find out if this crawl budget applies to external sites - we link to sister companies in the footer.
Will Google bot find these external links and leave our site to go and crawl these external sites?
Thanks!
-
Using Google's parameter tools you can also reduce crawl budget issues.
https://www.google.com/webmasters/tools/crawl-url-parameters
http://www.blindfiveyearold.com/crawl-optimization
http://searchenginewatch.com/sew/news/2064349/crawl-index-rank-repeat-a-tactical-seo-framework
http://searchengineland.com/how-i-think-crawl-budget-works-sort-of-59768
-
What's the background of your conclusion? What's the logic?
I am asking because my understanding about crawl budget is that if you waste it you risk to 1) slow down recrawl frequency on a per page basis and 2) risk google crawler gives up crawling the website before to have crawled all pages.
Are you splitting your sitemap into sub sitemaps? That's a good way to spot groups/categories of pages being ignored by google crawler.
-
If you want to test this, identify a section of your site that you can disallow via robots.txt and then measure the corresponding changes. Proceed section by section based on results. There are so many variables at play that I don't think you'll get an answer that's anywhere as precise as testing your specific situation.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Crawl diagnostic how important is these 2 types of errors and what to do?
Hi,
Intermediate & Advanced SEO | | nicolaj1977
I am trying to SEO optimized my webpage dreamesatehuahin.com When I saw SEO Moz webpage crawl diagnostic I kind of got a big surprise due to the high no. of errors. I don’t know if this is the kind of errors that need to be taken very serious i my paticular case, When I am looking at the details I can see the errors are cause by the way my wordpress theme is put together. I don’t know how to resolve this. But If important I might hire a programmer. DUPLICATE ERRORS (40 ISSUES HIGH PRIORITY ACCORDING TO MOZ)
They are all the same as this one.
http://www.dreamestatehuahin.com/property-feature/restaurent/page/2/
is eaqual to this one
http://www.dreamestatehuahin.com/property-feature/restaurent/page/2/?view=list This one exsist
http://www.dreamestatehuahin.com/property-feature/car-park/
while a level down don’t exsit
http://www.dreamestatehuahin.com/property-feature/ DUPLICATE PAGE TITLE (806 ISSUES MEDIUM PRIORITY ACCORDING TO MOZ)
This is related to search results and pagination.
Etc. Title for each of these pages is the same
http://www.dreamestatehuahin.com/property-search/page/1 http://www.dreamestatehuahin.com/property-search/page/2 http://www.dreamestatehuahin.com/property-search/page/3 http://www.dreamestatehuahin.com/property-search/page/4 Title element is to long (405)
http://www.dreamestatehuahin.com/property-feature/fitness/?view=list
this is not what I consider real pages but maybe its actually is a page for google. The title from souce code is auto generated and in this case it not makes sense
<title>Fitness Archives - Dream Estate Hua Hin | Property For Sale And RentDream Estate Hua Hin | Property For Sale And Rent</title> I know at the moment there are properly more important things for our website like content, title, meta descriptions, intern and extern links and are looking into this and taking the whole optimization seriously. Have for instance just hired a content writer rewrite and create new content based on keywords research. I WOULD REALLY APPRICIATE SOME EXPERIENCE PEOPLE FEEDBACK ON HOW IMPORTANT IS IT THAT I FIX THIS ISSUES IF AT ALL POSSIBLE? best regards, Nicolaj1 -
How can a Page indexed without crawled?
Hey moz fans,
Intermediate & Advanced SEO | | atakala
In the google getting started guide it says **"
Note: **Pages may be indexed despite never having been crawled: the two processes are independent of each other. If enough information is available about a page, and the page is deemed relevant to users, search engine algorithms may decide to include it in the search results despite never having had access to the content directly. That said, there are simple mechanisms such as robots meta tags to make sure that pages are not indexed.
" How can it happen, I dont really get the point.
Thank you0 -
Should you give all the posts in a Forum an unique description? Or let it empty so Google can make one with the crawled keywords .... ...
To make all descriptions for all forum posts unique is a hell of a job.... One option is to crawl the first 165 characters and turn these automaticly into the meta description of the page.
Intermediate & Advanced SEO | | Zanox
If Google thinks the meta description is not suitable for the search query, Google will make a own description. In this case all te meta descriptions are unique, like the Google Guidlines want you to do. How will Google think off the fact when we delete the meta description tag so Google will make all the descriptions by herself?0 -
What is the value of Google Crawling Dynamic URLS with NO SEO
Hi All I am Working on travel site for client where there are 1000's of product listing pages that are dynamically created. These pages are not SEO optimised and are just lists of products with no content other than the product details. There are no meta tags for title and description on the listings pages. You then click Find Out more to go to the full product details. There is no way to SEO these Dynamic pages This main product details has no content other than details and now meta tags. To help increase my google rankings for the rest of the site which is search optimised would it be better to block google from indexing these pages. Are these pages hurting my ability to improve rankings if my SEO of the content pages has been done to a good level with good unique Titles, descriptions and useful content thanks In advance John
Intermediate & Advanced SEO | | ingageseo0 -
Issues with Google-Bot crawl vs. Roger-Bot
Greetings from a first time poster and SEO noob... I hope that this question makes sense... I have a small e-commerce site, I have had Roger-bot crawl the site and I have fixed all errors and warnings that Volusion will allow me to fix. Then I checked Webmaster Tools, HTML improvements section and the Google-bot sees different dupe. title tag issues that Roger-bot did not. so A few weeks back I changed the title tag for a product, and GWT says that I have duplicate title tags but there is only one live page for the product. GWT lists the dupe. title tags, but when I click on each they all lead to the same live page. I'm confused, what pages are these other title tags referring to? Does Google have more than one page for that product indexed due to me changing the title tag when the page had a different URL? Does this question make sense? 2) Is this issue a problem? 3) What can I do to fix it? Any help would be greatly appreciated Jeff
Intermediate & Advanced SEO | | IOSC0 -
If I had an issue with a friendly URL module and I lost all my rankings. Will they return now that issue is resolved next time I'm crawled by google?
I have 'magic seo urls' installed on my zencart site. Except for some reason no one can explain why or how the files were disabled. So my static links went back to dynamic (index.php?**********) etc. The issue was resolved with the module except in that time google must have crawled my site and I lost all my rankings. I'm nowher to be found in the top 50. Did this really cause such an extravagant SEO issue as my web developers told me? Can I expect my rankings to return next time my site is crawled by google?
Intermediate & Advanced SEO | | Pete790 -
SEOMOZ crawl all my pages
SEOMOZ crawl all my pages including ".do" (all web pages after sign up ) . Coz of this it finishes all my 10.000 crawl page quota and be exposed to dublicate pages. Google is not crawling pages that user reach after sign up. Because these are private pages for customers I guess The main question is how we can limit SEOMOZ crawl bot. If the bot can stay out of ".do" java extensions it'll perfect to starting SEO analysis. Do you know think about it? Cheers Example; .do java extension (after sign up page) (Google can't crawl) http://magaza.turkcell.com.tr/showProductDetail.do?psi=1001694&shopCategoryId=1000021&model=Apple-iPhone-3GS-8GB Normal Page (Google can crawl) http://magaza.turkcell.com.tr/telefon/Apple-iPhone-3GS-8GB/1001694/.html
Intermediate & Advanced SEO | | hcetinsoy0 -
Why isnt my crawl results showing a 301 redirect even though I have a 301 rewrite in my .htaccess file?
Ive searched the previous Q&A's & cant find an answer so I;ll ask it here 🙂 crawling my site shows isnt the 301 redirect that i have from my non www to my www domainIts only showing all the results for my www subdomain.As i'm new to SEO & SeoMoz I dont fully understand. Any help would be greatly appreciated because my site is like 2 & a half years old & i'm trying to learn seo so I can rank higher in the serp's. Thanks
Intermediate & Advanced SEO | | PCTechGuy20120