Suggested Screaming Frog configuration to mirror default Googlebot crawl?
-
Hi All,
Does anyone have a suggested Screaming Frog (SF) configuration to mirror default Googlebot crawl? I want to test my site and see if it will return 429 "Too Many Requests" to Google.
I have set the User Agent as Googlebot (Smartphone). Is the default SF Menu > Configuration > Speed > Max Threads 5 and Max URLs 2.0 comparable to Googlebot?
Context:
I had tried NetPeak SEO Spider which did a nice job and had a cool feature that would pause a crawl if it got to many 429. Long Story short, B2B site threw 429 Errors when there should have been no load on a holiday weekend at 1:00 AM.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Crawling/indexing of near duplicate product pages
Hi, Hope someone can help me out here. This is the current situation: We sell stones/gravel/sand/pebbles etc. for gardens. I will take a type of pebbles and the corresponding pages/URL's to illustrate my question --> black beach pebbles. We have a 'top' product page for black beach pebbles on which you can find different types of quantities (differing from 20kg untill 1600 kg). There is not any search volume related to the different quantities The 'top' page does not link to the pages for the different quantities The content on the pages for the different quantities is not exactly the same (different price + slightly different content). But a lot of the content is the same. Current situation:
Intermediate & Advanced SEO | | AMAGARD
- Most pages for the different quantities do not have internal links (about 95%) But the sitemap does contain all of these pages. Because the sitemap contains all these URL's, google frequently crawls them (I checked the logfiles) and has indexed them. Problems: Google spends its time crawling irrelevant pages --> our entire website is not that big, so these quantity URL's kind of double the total number of URL's. Having url's in the sitemap that do not have an internal link is a problem on its own All these pages are indexed so all sorts of gravel/pebbles have near duplicates. My solution: remove these URL's from the sitemap --> that will probably stop Google from regularly crawling these pages Putting a canonical on the quantity pages pointing to the top-product page. --> that will hopefully remove the irrelevant (no search volume) near duplicates from the index My questions: To be able to see the canonical, google will need to crawl these pages. Will google still do that after removing them from the sitemap? Do you agree that these pages are near duplicates and that it is best to remove them from the index? A few of these quantity pages do have intenral links (a few procent of them) because of a sale campaign. So there will be some (not much) internal links pointing to non-canonical pages. Would that be a problem? Thanks a lot in advance for your help! Best!1 -
Client has an inexplicable jump in crawled pages being reported in Google Search Console
Recently a client of mine noticed an inexplicable jump in crawled pages being reported in Google Search Console. We researched the following culprits and found nothing: Rel=canonicals are put in place No SSL/non SSL duplication We used a tool to extrapolate search query page data from Google Search Insights; nothing unusual No dynamic pages being made on the website All necessary landing pages are in the XML sitemap Could this be a glitch in GSC? We are wondering what the heck is going on. 7eaeS
Intermediate & Advanced SEO | | BigChad20 -
Hreflang="x-default"
Hello all This is my first question in the Moz Forum, hope I will get some concrete answers 🙂 I am looking for some suggestions on implementing the hreflang="x-default" properly in our site. Any previous experience or a link to a specific resource/ example will be very helpful. I have found many examples on implementing the homepage hreflang, however nothing on non-homepage urls within your site. The below will be the code for the "Homepage" for /uk/. Here /en-INT/ is a Global English site not targeted for any country unlike en-MY, en-SG, en-AU etc. Is this the correct approach? Now, in case of non homepage urls, should the respective en-INT url be "x-default" or the "x-default" shouldn't exist altogether? For example, will the below be the correct coding? Many thanks Avi
Intermediate & Advanced SEO | | Delonghi_Group0 -
Do search engines crawl links on 404 pages?
I'm currently in the process of redesigning my site's 404 page. I know there's all sorts of best practices from UX standpoint but what about search engines? Since these pages are roadblocks in the crawl process, I was wondering if there's a way to help the search engine continue its crawl. Does putting links to "recent posts" or something along those lines allow the bot to continue on its way or does the crawl stop at that point because the 404 HTTP status code is thrown in the header response?
Intermediate & Advanced SEO | | brad-causes0 -
GoogleBot Mobile & Depagination
I am building a new site for a client and we're discussing their inventory section. What I would like to accomplish is have all their products load on scroll (or swipe on mobile). I have seen suggestions to load all content in the background at once, and show it as they swipe, lazy loading the product images. This will work fine for the user, but what about how GoogleBot mobile crawls the page? Will it simulate swiping? Will it load every product at once, killing page load times b/c of all of the images it must load at once? What are considered SEO best practices when loading inventory using this technique. I worry about this b/c it's possible for 2,000+ results to be returned, and I don't want GoogleBot to try and load all those results at once (with their product thumbnail images). And I know you will say to break those products up into categories, etc. But I want the "swipe for more" experience. 99.9% of our users will click a category or filter the results, but if someone wants to swipe through all 2,000 items on the main inventory landing page, they can. I would rather have this option than "Page 1 of 350". I like option #4 in this question, but not sure how Google will handle it. http://ux.stackexchange.com/questions/7268/iphone-mobile-web-pagination-vs-load-more-vs-scrolling?rq=1 I asked Matt Cutts to answer this, if you want to upvote this question. 🙂
Intermediate & Advanced SEO | | nbyloff
https://www.google.com/moderator/#11/e=adbf4&u=CAIQwYCMnI6opfkj0 -
Large number of pages crawled.
My campaign for printlabelandmail.com says that seomoz has crawled 619 pages. My site, however, only has a little over 250 pages. Where are these extra pages? I did recently relaunched my website with wordpress. I was using Dreamweaver before. I thought I deleted all the old pages. Could these extra pages be old pages from the site prior to my relaunch? I hope my question makes sense. Any insights would be helpful. Thanks! Andrea
Intermediate & Advanced SEO | | JimDirectMailCoach0 -
How does the crawl find duplicate pages that don't exist on the site?
It looks like I have a lot of duplicate pages which are essentially the same url with some extra ? parameters added eg: http://www.merlin.org.uk/10-facts-about-malnutrition http://www.merlin.org.uk/10-facts-about-malnutrition?page=1 http://www.merlin.org.uk/10-facts-about-malnutrition?page=2 These extra 2 pages (and there's loads of pages this happens to) are a mystery to me. Not sure why they exist as there's only 1 page. Is this a massive issue? It's built on Drupal so I wonder if it auto generates these pages for some reason? Any help MUCH appreciated. Thanks
Intermediate & Advanced SEO | | Deniz0 -
Home page url 301 redirect suggestion
Hello, In our site we have already done 301 redirect from http:// to http://www. However, the home page links are still coming in 2 ways http://www.mycarhelpline.com/ http://www.mycarhelpline.com/index.php?option=com_newcar&view=search&Itemid=2 Need suggestion We have already use rel canonical is another 301 redirect to be used for maintaining the home page pr from seo point of view. Does google still takes both urls as separate url and finds duplicate content
Intermediate & Advanced SEO | | Modi0