Exclude status codes in Screaming Frog
-
I have a very large ecommerce site I'm trying to spider using Screaming Frog. The problem is that the crawl keeps hanging, even though I have turned off the high memory safeguard under Configuration.
The site has approximately 190,000 pages according to the results of a Google site: command.
- The site architecture is almost completely flat. Limiting the search by depth is a possibility, but it will take quite a bit of manual labor as there are literally hundreds of directories one level below the root.
- There are many, many duplicate pages. I've been able to exclude some of them from being crawled using the exclude configuration parameters.
- There are thousands of redirects. I haven't been able to exclude those from the spider because they don't have a distinguishing character string in their URLs.
Does anyone know how to exclude files using status codes? I know that would help.
If it helps, the site is kodylighting.com.
Thanks in advance for any guidance you can provide.
-
Thanks for your help. It was simply that the exclude had to be set before the crawl began and could not be changed during the crawl. Hopefully this changes, because during a crawl you sometimes find things you want to exclude that you did not know existed beforehand.
-
Are you sure it's just on Mac? Have you tried on PC? Do you have any other rules in include, or perhaps a conflicting rule in exclude? Try running a single exclude rule, and also try it on another small site to test.
Also, from support, if it is failing on all fronts:
- Mac version, please make sure you have the most up to date version of the OS which will update Java.
- Please uninstall, then reinstall the spider ensuring you are using the latest version and try again.
To be sure - http://www.youtube.com/watch?v=eOQ1DC0CBNs
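One way to test a single exclude rule before starting a crawl is to run it against a few sample URLs. The sketch below is only an approximation: it assumes the exclude pattern has to match the whole URL (check the user guide for the version you run), it uses Python's regex engine rather than the one inside the spider, and the patterns and example.com URLs are made up for illustration.

```python
import re


def excluded(url, patterns):
    """Return True if any exclude pattern matches the entire URL."""
    return any(re.fullmatch(pattern, url) for pattern in patterns)


# Hypothetical exclude rules, one regex per line as you would enter them.
patterns = [
    r"http://www\.example\.com/archive/.*",  # everything under /archive/
    r".*\?sort=.*",                          # any URL with a sort parameter
]

# Hypothetical sample URLs to check the rules against.
urls = [
    "http://www.example.com/archive/2012/page.html",
    "http://www.example.com/products/lamp.html",
    "http://www.example.com/products?sort=price",
]

for url in urls:
    label = "EXCLUDE" if excluded(url, patterns) else "crawl"
    print(label, url)
```

If a URL you expected to be excluded shows up as "crawl", the rule most likely needs a leading or trailing `.*`, or an unescaped character such as `?` or `.` is being read as a regex operator.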
-
Does the exclude function work on Mac? I have tried every possible way to exclude folders and have not been successful while running an analysis.
-
That's exactly the problem: the redirects are dispersed randomly throughout the site. Although the job is still running, it now appears as though there's almost a 1-to-1 correlation between pages and redirects on the site.
I also heard from Dan Sharp via Twitter. He said "You can't, as we'd have to crawl a URL to see the status code You can right click and remove after though!"
Thanks again, Michael. Your thoroughness and follow-through are appreciated.
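For anyone who would rather drop the redirects from an exported report than remove them by hand, here is a minimal sketch of filtering after the crawl. It assumes the crawl was exported to CSV with "Address" and "Status Code" columns; those column names and the sample rows are assumptions, so adjust them to match your export.

```python
import csv
import io


def drop_redirects(rows, status_column="Status Code"):
    """Keep only the rows whose status code is not in the 3xx range."""
    kept = []
    for row in rows:
        code = (row.get(status_column) or "").strip()
        if not code.startswith("3"):
            kept.append(row)
    return kept


# Made-up sample standing in for an exported crawl file.
sample = io.StringIO(
    "Address,Status Code\n"
    "http://www.example.com/,200\n"
    "http://www.example.com/old-lamp,301\n"
    "http://www.example.com/sale,302\n"
    "http://www.example.com/missing,404\n"
)

rows = list(csv.DictReader(sample))
kept = drop_redirects(rows)

for row in kept:
    print(row["Status Code"], row["Address"])
```

To run it on a real export, replace the `io.StringIO` sample with `open("your_export.csv", newline="")`. This does not reduce crawl time or memory use, since every URL still has to be fetched to learn its status code; it only trims the inventory afterwards.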
-
I took another look, and also checked the documentation and online, and I don't see any way to exclude URLs from a crawl based on response codes. As I see it, you would only want to exclude by name or directory, since response codes are likely to be scattered randomly throughout a site and excluding on them would impede a thorough crawl.
-
Thank you, Michael.
You're right. I was on a 64-bit machine running a 32-bit version of Java. I updated it and the scan has been running for more than 24 hours now without hanging. So thank you.
If anyone else knows of a way to exclude files using status codes I'd still like to learn about it. So far the scan is showing me 20,000 redirected files which I'd just as soon not inventory.
-
I don't think you can filter out on response codes.
However, first I would ensure you are running the right version of Java if you are on a 64-bit machine. The 32-bit version functions, but you cannot increase the memory allocation, which is why you could be running into problems. Take a look at http://www.screamingfrog.co.uk/seo-spider/user-guide/general/ under Memory.
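A quick way to check which Java you have is to look at the output of `java -version`. The sketch below assumes a HotSpot-style JVM, which normally includes "64-Bit" in its version banner when it is a 64-bit build; other JVMs may word it differently, and the two sample banners are shortened illustrations rather than real output.

```python
import subprocess


def is_64bit_jvm(version_output):
    """Return True if the version banner mentions a 64-bit VM."""
    return "64-bit" in version_output.lower()


def java_version_output():
    """Run `java -version` and return its text (it is printed to stderr)."""
    result = subprocess.run(
        ["java", "-version"], capture_output=True, text=True
    )
    return result.stderr + result.stdout


# Shortened, illustrative banners (not copied from a real machine).
sample_64 = "Java HotSpot(TM) 64-Bit Server VM"
sample_32 = "Java HotSpot(TM) Client VM"

print("64-bit sample:", is_64bit_jvm(sample_64))
print("32-bit sample:", is_64bit_jvm(sample_32))
```

On your own machine, call `is_64bit_jvm(java_version_output())`. If it returns False on a 64-bit operating system, installing the 64-bit Java is the first thing to try before raising the memory allocation.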