ScreamingFrog won't crawl my site.
-
Hey guys,
My site is Netspiren.dk and when I use a tool like Screaming Frog or Integrity, it only crawls my homepage and menu's - not product-pages.
Examples
A menu: http://www.netspiren.dk/pl/Helse-Kosttilskud-Blandingsolie_57699.aspx
A product: http://www.netspiren.dk/pi/All-Omega-3-6-9-180-kapsler_1412956_57699.aspxIs it because the products are being loaded in Javascript?
What's your recommendation?All best,
Fred. -
Hi,
Thank you for this question and the responses because we encountered the same issue; Screaming Frog was only crawling a handful of products out of hundreds, because of JS. We made significant changes to the redirect rules on our dev site, and we want to make sure that the changes will not cause any crawling errors before we deploy to the live site. Is there any way to disable JS just for the purpose of a Screaming Frog crawl?
Our dev site is: https://msc-nop.com
Our regular site is: https://medicalscrubscollection.com
Thanks in advance!
-
I'm not sure if this has been fixed already, and thank you for Dan for chiming in, but I was able to crawl around 700 URLs.
-
Cheers @Andy & @Patrick
Hi Fred,
I haven't performed an extensive check, but the SEO Spider crawls around 35 URLs with /pi/ in the string, which is presumably not all the products on the site
Patrick actually mentions the issue in one of his points above. Essentially it looks like the site uses JavaScript on category pages for products, example - http://www.netspiren.dk/pl/Helse-Homøopati-Allergica-Ron-serien_58721.aspx
If you disable JS in your browser, you'll see a blank page where the products were. Our tool doesn't execute JS, although Google is much smarter and often can.
However, I'll leave you to verify that -
Hope that helps!
Cheers
Dan
-
I have sent Dan from Screaming Frog a tweet for you Fred. I'm sure he will be along presently
-Andy
-
Hi there
It's crawling for me. Here are a list of reasons why ScreamingFrog won't crawl your site:
- The site is blocked by robots.txt. A count of pages blocked by robots.txt is shown in the crawl overview pane on top right hand site of the user interface. You can configure the SEO Spider to ignore robots.txt by going to the “Basic” tab under Configuration->Spider.
- The site behaves differently depending on User Agent. Try changing the User Agent under Configuration->User Agent.
- The site requires JavaScript. Try looking at the site in your browser with JavaScript disabled.
- The site requires Cookies. Can you view the site with cookies disabled in your browser? Licenced users can enable cookies by going to Configuration->Spider and ticking “Allow Cookies” in the “Advanced” tab.
- The ‘nofollow’ attribute is present on links not being crawled. There is an option in Configuration->Spider under the “Basic” tab to follow ‘nofollow’ links.
- The page has a page level ‘nofollow’ attribute. The could be set by either a meta robots tag or an X-Robots-Tag in the HTTP header. These can be seen in the “Directives” tab in the “Nofollow” filter.
- The website is using framesets. The SEO Spider does not crawl the frame src attribute.
- The Content-Type header did not indicate the page is html. This is shown in the Content column and should be either text/html or application/xhtml+xml.
Run through your settings and check and see if you may have turned something on inadvertently that you didn't mean to. One thing you can try, is goto Configuration > Spider and then goto the last option Ignore robots.txt. Click the checkbox and try running it again.
It could just be a slow connection on your end. Give it a few minutes and see if any of the above suggestions work.
Hope this helps! Good luck!
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
How can a recruitment company get 'credit' from Google when syndicating job posts?
I'm working on an SEO strategy for a recruitment agency. Like many recruitment agencies, they write tons of great unique content each month and as agencies do, they post the job descriptions to job websites as well as their own. These job websites won't generally allow any linking back to the agency website from the post. What can we do to make Google realise that the originator of the post is the recruitment agency and they deserve the 'credit' for the content? The recruitment agency has a low domain authority and so we've very much at the start of the process. It would be a damn shamn if they produced so much great unique content but couldn't get Google to recognise it. Google's advice says: "Syndicate carefully: If you syndicate your content on other sites, Google will always show the version we think is most appropriate for users in each given search, which may or may not be the version you'd prefer. However, it is helpful to ensure that each site on which your content is syndicated includes a link back to your original article. You can also ask those who use your syndicated material to use the noindex meta tag to prevent search engines from indexing their version of the content." - But none of that can happen. Those big job websites just won't do it. A previous post here didn't get a sufficient answer. I'm starting to think there isn't an answer, other than having more authority than the websites we're syndicating to. Which isn't going to happen any time soon! Any thoughts?
Intermediate & Advanced SEO | | Mark_Reynolds0 -
Sites still rank who don't seem like they should. Why?
So you've been MOZing and SEOing for years and we're all convinced of the 10x factor when it comes to content and ranking for certain search terms... right? So what do you do when some older sites that don't even produce content dominate the first page of a very important search term? They're home pages with very little content and have clearly all dabbled in pre Panda SEO. Surely people are still seeing this and wondering why?
Intermediate & Advanced SEO | | wearehappymedia0 -
What was your experience with changing site url's?
I work with a company that is about to move to a new platform. Because the category and page structure is different every almost every url but the home page will need to be 301 redirected. I know how to do this and am pretty sure I will find and fix 99% ahead of time and not have too many 404's showing up in webmaster tools to clean up. My question is has anyone who is reading this post had to do this before and what was your experience with organic traffic after you made the switch. I am predicting that even if I successfully redirected 100% of the url's there would be some loss for a couple of months just due to the fact that we are making a major change. My bosses are asking if there will be any loss and I need to tell them what to expect.
Intermediate & Advanced SEO | | KentH0 -
Site's disappearnce in web rankings
I'm currently doing some work on a website: http://www.abetterdriveway.com.au. Upon starting, I detected a lot of spammy links going to this website and sort to remove them before submitting a disavow report. A few months later, this site completely disappeared in the rankings, with all keywords suddenly not ranked. I realised that the test website (which was put up to view before the new site went live) was still up on another URL and Google was suddenly ranking that site instead. Hence, I ensured that test site was completely removed. 3 weeks later however, the site (www.abetterdriveway.com.au) still remains unranked for its keywords. Upon checking Web Master Tools, I cannot see anything that stands out. There is no manual action or crawling issues that I can detect. Would anyone know the reason for this persistent disappearance? Is it something I will just have to wait out until ranking results come back, or is there something I am missing? Help here would be much appreciated.
Intermediate & Advanced SEO | | Gavo0 -
Is my text readable? I don't see it in the page source
Text on my site seems to be readable in a text only version (the page is not cached so I viewed it by disabling JAVA and then copy and pasted the page into Word) However, when I look in the page source I don't see the text there. The text was created using Open X html boxes to help us with formatting, but is this causing an SEO problem?
Intermediate & Advanced SEO | | theLotter0 -
Are This Site's Backlinks Hurting Us?
Google WMT reports more than 198,000 backlinks to our site (www.audiobooksonline.com) from http://dilandau.eu/? We have never been notified by Google of any penalty, malware notification... but continue to struggle to get our page 1 Google ranking back since Panda. Could these backlinks be hurting our Google ranking? Should I implement a disavow rule for http://dilandau.eu/?
Intermediate & Advanced SEO | | lbohen0 -
Refocusing a site's conent
Here's a question I was asked recently, and I can really see going either way, but want to double check my preference. The site has been around for years and over that time expanded it's content to a variety of areas that are not really core to it's mission, income or themed content. These jettisonable other areas have a fair amount of built up authority but don't really contribute anything to the site's bottom line. The site is considering what to do with these off-theme pages and the two options seem to be: Leave them in place, but make them hard to find for users, thus preserving their authority as an inlink to other core pages. or... Just move on and 301 the pages to whatever is half-way relevant. The 301 the pages camp seems to believe that making the site's existing/remaining content focused on three or four narrower areas will have benefits for what Google sees the site as being about. So, instead of being about 12 different things that aren't too related to each other, the site will be about 3 or 4 things that are kinda related to eachother. Personally, I'm not eager to let go of old pages because they do produce some traffic and have some authority value to help the core pages via in-context and navigation links. On the other hand, maybe focusing more would have benefits search benefits. What do think? Best... Darcy
Intermediate & Advanced SEO | | 945010 -
Why are our sites top landing pages URL's that no longer exist and retrun 404 errors?
Digging through analytics today an noticed that our sites top landing pages are for pages that were part of the old www.towelsrus.co.uk website taken down almost 12 months ago. All these pages had the 301 re-directs which were removed a few months back but still have not dropped out of Googles crawl error logs. I can't understand why this is happening but almost certainly the bounce rate on these pages (100%) mean we are loosing potential conversions. How can I identify what keywords and links people are using to land on these pages?
Intermediate & Advanced SEO | | Towelsrus0