Screaming frog Advice
-
Hi
I am trying to crawl my site and it keeps crashing.
My sys admins keeps upgrading the virtual box it sits on and it now currently has 8GB of memory, but still crashes.
It gets to around 200k pages crawl and dies.
Any tips on how I can crawl my whole site, can u use screaming frog to crawl part of a site.
Thanks in advance for any tips.
Andy
-
Thanks, I tried all the tips on the screaming frog site, but I have just tried to 2 pages a second and lets hope that work.
-
Hi Andy. There are quite a few settings you can adjust to make the server load less while the crawl is running. These can be found with descriptions here: http://www.screamingfrog.co.uk/seo-spider/user-guide/configuration/
For example, by not checking Images, CSS, SWF, and Javascript you'll be able to lessen load substantially, or if you'd like to crawl just a portion of the site you can set it to not check links outside of the start folder.
To have even more control over the crawl, you can use regular expressions to exclude certain pages, or sections that match a given pattern. The page above is fairly robust, so it should help you dial back the crawler to be friendlier to your server. Cheers!
-
Hey there mate,
Sorry to hear that you are having issues. You can actually ask Screaming Frog to use more RAM. If you haven't done that yet please give it a go.
You can find more here http://www.screamingfrog.co.uk/seo-spider/user-guide/general/
If you want to crawl part of your site it can surely do that. You can exclude pages or whole sections.
Find more here http://www.screamingfrog.co.uk/seo-spider/user-guide/configuration/
Hope this helps!
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Very Old Pages Creeping Up - Advice
We are currently having very old pages dating back 5+ years ago appearing on moz all of a sudden, we don't necessarily get traffic from these links anymore and i doubt they still hold any weight. Currently they take you to a 404 page, would there be any worth in redirecting these links?
Intermediate & Advanced SEO | | JH_OffLimits0 -
Looking for SEO advice on Negative SEO attack. Technical SEO
please see this link https://www.dropbox.com/s/thgy57zmmwzodcp/Screenshot 2016-05-31 13.25.23.png?dl=0 you can see my domain is getting tons of chinese spam. I have 410'd the page but it still keeps coming.. 7tnawRV
Intermediate & Advanced SEO | | mattguitar990 -
Panda penalty removal advice
Hi everyone! I'm after a second (or third, or fourth!) opinion here! I'm working on the website www.workingvoices.com that has a Panda penalty dating from the late March 2012 update. I have made a number of changes to remove potential Panda issues but haven't seen any rankings movement in the last 7 weeks and was wondering if I've missed something... The main issues I identified and fixed were: Keyword stuffed near duplicate title tags - fixed with relevant unique title tags Copies of the website on other domains creating duplicate content issues - fixed by taking these offline Thin content - fixed by adding content to some pages, and noindexing other thin/tag/category pages. Any thoughts on other areas of the site that might still be setting off the mighty Panda are appreciated! Cheers Damon.
Intermediate & Advanced SEO | | Digitator0 -
Strange 404s in Screaming Frog
I just ran a website (Drupal) through screaming frog and the only 404s I found related to web pages which were the same as URLs already used on the website plus the company phone number so... www.company.com/[their phone number] - www.company.com/services[their phone number] - any ideas what might be causing this problem?
Intermediate & Advanced SEO | | McTaggart0 -
Can't crawl website with Screaming frog... what is wrong?
Hello all - I've just been trying to crawl a site with Screaming Frog and can't get beyond the homepage - have done the usual stuff (turn off JS and so on) and no problems there with nav and so on- the site's other pages have indexed in Google btw. Now I'm wondering whether there's a problem with this robots.txt file, which I think may be auto-generated by Joomla (I'm not familiar with Joomla...) - are there any issues here? [just checked... and there isn't!] If the Joomla site is installed within a folder such as at e.g. www.example.com/joomla/ the robots.txt file MUST be moved to the site root at e.g. www.example.com/robots.txt AND the joomla folder name MUST be prefixed to the disallowed path, e.g. the Disallow rule for the /administrator/ folder MUST be changed to read Disallow: /joomla/administrator/ For more information about the robots.txt standard, see: http://www.robotstxt.org/orig.html For syntax checking, see: http://tool.motoricerca.info/robots-checker.phtml User-agent: *
Intermediate & Advanced SEO | | McTaggart
Disallow: /administrator/
Disallow: /bin/
Disallow: /cache/
Disallow: /cli/
Disallow: /components/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /layouts/
Disallow: /libraries/
Disallow: /logs/
Disallow: /modules/
Disallow: /plugins/
Disallow: /tmp/0 -
Site Wide Footer Links Exception, Any Advice ?
I was reading the following Q&A on site wide footer links, http://moz.com/community/q/site-wide-links-from-another-domain-could-these-cause-a-problem I feel my situation is slightly different however,we have lots of international sites linking to each other through these links like our sites for different counties and languages so our German, French and Spanish sites, http://www.cirrusresearch.co.uk/ Our main UK site has always ranked very well and has never really had a problem despite always having had these followed sitewide footer links, Because of this we regularly get high amount of visitors performing English language searches from different counties and i don't think it is a bad thing having more country/language specific sites of ours available in the footer for visitors that may prefer a more localized site, Our main website has to be at least 10+ years old at least, has a lot of strong links compared to our competitors, but the smaller German and Spanish sites are relatively smaller in size and most only 1-2 years old, my big fear is that these smaller sites would not be able to stand on there own without these footer links from our main site, After reading the community question caused me to question this ?, should i take a leap of faith and no-follow all of these site wide footer links connecting all of our sites ? we never really had a problem ranking so i don't really see the need but would this be the best thing to do ? Thank you, James
Intermediate & Advanced SEO | | Antony_Towle0 -
Advice on Content Marketing in a Tough Niche
Hello, In our niche, nobody links to the content/information with rare exceptions. Do you guys have any good articles/ideas for cases like this? The content that is linked to is once removed in subject matter from the content of our site, like if we sold shoes and had to write on different types of clothing stores. Looking for advice on what to do and how to figure out what to write about. We've probably got a descent budget this time but we're not sure how to go about this. Any advice is appreciated.
Intermediate & Advanced SEO | | BobGW0 -
Need some urgent Panda advice. Open discussion about recovering from the Panda algorithm.
I have a site that has been affected by Panda, and I think I have finally found the problem. When I created this site in the year 2006, I bought content without checking it. Recently, when I went through the site I found out that this content had many duplicates around the web. Not 100% exact, but close to. The first thing I did is ask my best writer to rewrite these topics, as they are a must on my site. This is a very experienced writer, and she will make the categories and subpages outstanding. Second thing I did was putting a NOINDEX, FOLLOW robots meta in place for the pages I determined being bad. They haven't been de-indexed yet. Another thing I recently did is separate other languages and move these over to other domains (with 301's redirecting the old locations to the new.) This means that the site now has a /en/ directory in the URL which is no longer used. With this in mind I was thinking to relocate the NEW content, and 301 the old (to preserve the juice for a while.) For example: http://www.mysite.com/en/this-is-a-pandalized-page/ 301 to http://www.mysite.com/this-is-the-rewritten-page/ The benefits of doing this are: decreasing the amounts of directories in the URL getting rid of pages that are possibly causing trouble getting fresh pages added to the site Now, the advice I am looking for is basically this: Do you agree with the above? Or don't you agree? If you don't, please be so kind to include a reason with your answer. If you do, and have any additional information, or would like to discuss, please go ahead 🙂 Thanks, Giorgio PS: Is it proven that Panda is now a running update? Or is it still periodically executed?
Intermediate & Advanced SEO | | VisualSense1