Exclude status codes in Screaming Frog
-
I have a very large ecommerce site I'm trying to spider using Screaming Frog. The problem is that the crawl keeps hanging, even though I have turned off the high memory safeguard under Configuration.
The site has approximately 190,000 pages according to the results of a Google site: command.
- The site architecture is almost completely flat. Limiting the search by depth is a possibility, but it will take quite a bit of manual labor as there are literally hundreds of directories one level below the root.
- There are many, many duplicate pages. I've been able to exclude some of them from being crawled using the exclude configuration parameters.
- There are thousands of redirects. I haven't been able to exclude those from the spider b/c they don't have a distinguishing character string in their URLs.
Does anyone know how to exclude files using status codes? I know that would help.
If it helps, the site is kodylighting.com.
Thanks in advance for any guidance you can provide.
-
Thanks for your help. It was simply that the exclude had to be set up before the crawl began and could not be changed during the crawl. Hopefully this gets changed, because sometimes during a crawl you find things you want to exclude that you didn't know existed beforehand.
-
Are you sure it's just on Mac? Have you tried on PC? Do you have any other rules in include, or perhaps a conflicting rule in exclude? Try running a single exclude rule, and also try it on another small site to test.
Also from support if failing on all fronts:
- Mac version, please make sure you have the most up to date version of the OS which will update Java.
- Please uninstall, then reinstall the spider ensuring you are using the latest version and try again.
To be sure - http://www.youtube.com/watch?v=eOQ1DC0CBNs
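One way to check a single exclude rule before starting a crawl is to test the pattern against a few sample URLs outside the spider. The sketch below is only an illustration: it assumes the exclude rule is a regular expression that has to match the whole URL (check the user guide for how your version matches), the example.com URLs and the `is_excluded` helper are made up, and Python's `re` module is not identical to the Java regex engine the spider runs on.

```python
import re

def is_excluded(url, patterns):
    """Return True if any exclude pattern matches the whole URL.

    Assumption: the rule must match the full URL, hence re.fullmatch.
    """
    return any(re.fullmatch(pattern, url) is not None for pattern in patterns)

# Hypothetical exclude rule: everything under /private/ on example.com.
patterns = [r"https?://www\.example\.com/private/.*"]

# Hypothetical sample URLs to try the rule against.
urls = [
    "http://www.example.com/private/page.html",
    "http://www.example.com/public/page.html",
]

for url in urls:
    label = "excluded" if is_excluded(url, patterns) else "crawled"
    print(label, url)
```

If a URL you expected to be excluded shows up as crawled here, the pattern itself is the likely culprit rather than the Mac or PC build.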
-
Does the exclude function work on Mac? I have tried every possible way to exclude folders and have not been successful while running an analysis.
-
That's exactly the problem: the redirects are dispersed randomly throughout the site. Although the job's still running, it now appears as though there's almost a one-to-one correlation between pages and redirects on the site.
I also heard from Dan Sharp via Twitter. He said "You can't, as we'd have to crawl a URL to see the status code You can right click and remove after though!"
Thanks again Michael. Your thoroughness and follow through is appreciated.
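Since the status code is only known after a URL has been crawled, the redirects can also be dropped from the results afterwards, outside the spider. Here is a minimal sketch of that idea in Python, assuming the crawl has been exported to CSV with "Address" and "Status Code" columns (those column names, the sample rows, and the `drop_redirects` helper are assumptions for illustration; adjust them to match your actual export).

```python
import csv
import io

def drop_redirects(rows, status_field="Status Code"):
    """Return only the rows whose status code is not a 3xx redirect."""
    kept = []
    for row in rows:
        code = (row.get(status_field) or "").strip()
        # Skip any three-digit code starting with 3 (301, 302, 307, ...).
        if len(code) == 3 and code.startswith("3"):
            continue
        kept.append(row)
    return kept

# Hypothetical export data; a real export would be opened with open(path, newline="").
sample = io.StringIO(
    "Address,Status Code\n"
    "http://www.example.com/,200\n"
    "http://www.example.com/old,301\n"
    "http://www.example.com/missing,404\n"
)

rows = list(csv.DictReader(sample))
kept = drop_redirects(rows)

for row in kept:
    print(row["Address"], row["Status Code"])
```

This doesn't save any crawl time or memory, since every URL still has to be fetched, but it keeps the redirected URLs out of the final inventory.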
-
Took another look, also looked at documentation/online and don't see any way to exclude URLs from crawl based on response codes. As I see it you would only want to exclude on name or directory as response code is likely to be random throughout a site and impede a thorough crawl.
-
Thank you Michael.
You're right. I was on a 64-bit machine running a 32-bit version of Java. I updated it and the scan has been running for more than 24 hours now without hanging. So thank you.
If anyone else knows of a way to exclude files using status codes I'd still like to learn about it. So far the scan is showing me 20,000 redirected files which I'd just as soon not inventory.
-
I don't think you can filter out on response codes.
However, first I would make sure you are running the right version of Java if you are on a 64-bit machine. The 32-bit version functions, but you cannot increase the memory allocation, which is why you could be running into problems. Take a look at http://www.screamingfrog.co.uk/seo-spider/user-guide/general/ under Memory.