Exclude status codes in Screaming Frog
-
I have a very large ecommerce site I'm trying to crawl with Screaming Frog. The problem is that the crawl keeps hanging, even though I have turned off the high memory safeguard under Configuration.
The site has approximately 190,000 pages according to the results of a Google site: command.
- The site architecture is almost completely flat. Limiting the crawl by depth is a possibility, but it would take quite a bit of manual labor, as there are literally hundreds of directories one level below the root.
- There are many, many duplicate pages. I've been able to exclude some of them from being crawled using the exclude configuration parameters.
- There are thousands of redirects. I haven't been able to exclude those from the crawl because they don't have a distinguishing character string in their URLs.
Does anyone know how to exclude files using status codes? I know that would help.
If it helps, the site is kodylighting.com.
Thanks in advance for any guidance you can provide.
-
Thanks for your help. The whole problem was that the exclude had to be set up before the crawl began and could not be changed during the crawl. Hopefully this gets changed, because sometimes during a crawl you find things you want to exclude that you didn't know existed beforehand.
-
Are you sure it's just on Mac? Have you tried on PC? Do you have any other rules in include, or perhaps a conflicting rule in exclude? Try running a single exclude rule, and also try it on another small site to test.
Also from support if failing on all fronts:
- Mac version, please make sure you have the most up to date version of the OS which will update Java.
- Please uninstall, then reinstall the spider ensuring you are using the latest version and try again.
To be sure - http://www.youtube.com/watch?v=eOQ1DC0CBNs
-
Does the exclude function work on Mac? I have tried every possible way to exclude folders and have not been successful while running an analysis.
-
That's exactly the problem: the redirects are dispersed randomly throughout the site. Although the job's still running, it now appears as though there's almost a one-to-one correlation between pages and redirects on the site.
I also heard from Dan Sharp via Twitter. He said "You can't, as we'd have to crawl a URL to see the status code You can right click and remove after though!"
Thanks again, Michael. Your thoroughness and follow-through are appreciated.
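Since status codes are only known after a URL has been crawled, one way to act on the "remove after" suggestion is to filter the exported crawl data instead. Below is a minimal sketch, assuming the crawl has been exported to CSV with `Address` and `Status Code` columns; those column names, the `drop_redirects` helper, and the sample rows are assumptions for illustration, not confirmed details of the export format.

```python
import csv
import io

def drop_redirects(csv_text, status_column="Status Code"):
    """Return the rows whose status code is not in the 3xx (redirect) range."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader
            if not row[status_column].strip().startswith("3")]

# Made-up sample data standing in for an exported crawl (assumed columns).
sample = """Address,Status Code
http://www.example.com/,200
http://www.example.com/old-page,301
http://www.example.com/moved,302
http://www.example.com/lamps/,200
"""

rows = drop_redirects(sample)
print([r["Address"] for r in rows])
```

For a real export you would read the file contents into `csv_text` first. This does not make the crawl itself any lighter; it only keeps the redirects out of the inventory afterwards.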
-
Took another look, and also checked the documentation and online, and I don't see any way to exclude URLs from a crawl based on response codes. As I see it, you would only want to exclude by name or directory, since response codes are likely to be scattered randomly throughout a site and excluding on them would impede a thorough crawl.
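To illustrate excluding by name or directory, here is a rough sketch of how regex exclude rules can be tried out against a handful of URLs before a crawl starts. The patterns, the example.com URLs, and the `is_excluded` helper are all hypothetical, and matching the whole URL with `re.fullmatch` is an assumption; the spider's own matching behaviour may differ, so check the user guide.

```python
import re

# Hypothetical exclude patterns; the directory and parameter names are
# placeholders, not real paths on the site discussed in this thread.
EXCLUDE_PATTERNS = [
    r"http://www\.example\.com/duplicates/.*",  # everything under one directory
    r".*\?sort=.*",                             # any URL carrying a sort parameter
]

def is_excluded(url, patterns=EXCLUDE_PATTERNS):
    """Return True if the whole URL matches any of the exclude patterns."""
    return any(re.fullmatch(p, url) for p in patterns)

urls = [
    "http://www.example.com/duplicates/lamp-1.html",
    "http://www.example.com/lamps/lamp-1.html",
    "http://www.example.com/lamps/?sort=price",
]

kept = [u for u in urls if not is_excluded(u)]
print(kept)
```

Testing one rule at a time on a short URL list like this makes it easier to spot a pattern that matches too much or nothing at all.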
-
Thank you Michael.
You're right. I was on a 64-bit machine running a 32-bit version of Java. I updated it and the scan has been running for more than 24 hours now without hanging. So thank you.
If anyone else knows of a way to exclude files using status codes I'd still like to learn about it. So far the scan is showing me 20,000 redirected files which I'd just as soon not inventory.
-
I don't think you can filter out on response codes.
However, first I would make sure you are running the right version of Java if you are on a 64-bit machine. The 32-bit version works, but you cannot increase the memory allocation with it, which is why you could be running into problems. Take a look at http://www.screamingfrog.co.uk/seo-spider/user-guide/general/ under Memory.
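A quick way to check which Java you have is to run `java -version` and look at the VM line: 64-bit HotSpot builds include "64-Bit" in it. The sketch below only parses that text; the `is_64bit_jvm` helper is hypothetical and the two sample outputs are illustrative, so the exact build strings on your machine will differ.

```python
def is_64bit_jvm(version_output):
    """Return True if the `java -version` text reports a 64-bit VM."""
    return "64-bit" in version_output.lower()

# Illustrative `java -version` outputs (build numbers are examples only).
sample_64 = (
    'java version "1.7.0_45"\n'
    "Java(TM) SE Runtime Environment (build 1.7.0_45-b18)\n"
    "Java HotSpot(TM) 64-Bit Server VM (build 24.45-b08, mixed mode)\n"
)
sample_32 = (
    'java version "1.7.0_45"\n'
    "Java(TM) SE Runtime Environment (build 1.7.0_45-b18)\n"
    "Java HotSpot(TM) Client VM (build 24.45-b08, mixed mode, sharing)\n"
)

print(is_64bit_jvm(sample_64), is_64bit_jvm(sample_32))
```

If the check comes back false on a 64-bit machine, installing the 64-bit Java is the step to take before raising the memory allocation.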