ScreamingFrog won't crawl my site.
-
Hey guys,
My site is Netspiren.dk and when I use a tool like Screaming Frog or Integrity, it only crawls my homepage and menu's - not product-pages.
Examples
A menu: http://www.netspiren.dk/pl/Helse-Kosttilskud-Blandingsolie_57699.aspx
A product: http://www.netspiren.dk/pi/All-Omega-3-6-9-180-kapsler_1412956_57699.aspxIs it because the products are being loaded in Javascript?
What's your recommendation?All best,
Fred. -
Hi,
Thank you for this question and the responses because we encountered the same issue; Screaming Frog was only crawling a handful of products out of hundreds, because of JS. We made significant changes to the redirect rules on our dev site, and we want to make sure that the changes will not cause any crawling errors before we deploy to the live site. Is there any way to disable JS just for the purpose of a Screaming Frog crawl?
Our dev site is: https://msc-nop.com
Our regular site is: https://medicalscrubscollection.com
Thanks in advance!
-
I'm not sure if this has been fixed already, and thank you for Dan for chiming in, but I was able to crawl around 700 URLs.
-
Cheers @Andy & @Patrick
Hi Fred,
I haven't performed an extensive check, but the SEO Spider crawls around 35 URLs with /pi/ in the string, which is presumably not all the products on the site
Patrick actually mentions the issue in one of his points above. Essentially it looks like the site uses JavaScript on category pages for products, example - http://www.netspiren.dk/pl/Helse-Homøopati-Allergica-Ron-serien_58721.aspx
If you disable JS in your browser, you'll see a blank page where the products were. Our tool doesn't execute JS, although Google is much smarter and often can.
However, I'll leave you to verify that -
Hope that helps!
Cheers
Dan
-
I have sent Dan from Screaming Frog a tweet for you Fred. I'm sure he will be along presently
-Andy
-
Hi there
It's crawling for me. Here are a list of reasons why ScreamingFrog won't crawl your site:
- The site is blocked by robots.txt. A count of pages blocked by robots.txt is shown in the crawl overview pane on top right hand site of the user interface. You can configure the SEO Spider to ignore robots.txt by going to the “Basic” tab under Configuration->Spider.
- The site behaves differently depending on User Agent. Try changing the User Agent under Configuration->User Agent.
- The site requires JavaScript. Try looking at the site in your browser with JavaScript disabled.
- The site requires Cookies. Can you view the site with cookies disabled in your browser? Licenced users can enable cookies by going to Configuration->Spider and ticking “Allow Cookies” in the “Advanced” tab.
- The ‘nofollow’ attribute is present on links not being crawled. There is an option in Configuration->Spider under the “Basic” tab to follow ‘nofollow’ links.
- The page has a page level ‘nofollow’ attribute. The could be set by either a meta robots tag or an X-Robots-Tag in the HTTP header. These can be seen in the “Directives” tab in the “Nofollow” filter.
- The website is using framesets. The SEO Spider does not crawl the frame src attribute.
- The Content-Type header did not indicate the page is html. This is shown in the Content column and should be either text/html or application/xhtml+xml.
Run through your settings and check and see if you may have turned something on inadvertently that you didn't mean to. One thing you can try, is goto Configuration > Spider and then goto the last option Ignore robots.txt. Click the checkbox and try running it again.
It could just be a slow connection on your end. Give it a few minutes and see if any of the above suggestions work.
Hope this helps! Good luck!
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Syntax: 'canonical' vs "canonical" (Apostrophes or Quotes) does it matter?
I have been working on a site and through all the tools (Screaming Frog & Moz Bar) I've used it recognizes the canonical, but does Google? This is the only site I've worked on that has apostrophes. rel='canonical' href='https://www.example.com'/> It's apostrophes vs quotes. Could this error in syntax be causing the canonical not to be recognized? rel="canonical"href="https://www.example.com"/>
Intermediate & Advanced SEO | | ccox10 -
Help me to understand why this page doesn't rank
Hello everyone. I am trying to understand why most of my website category pages don't show up in the in the first 50 organic results on Google, despite my high website DA and high PA of those pages. We used to rank high a few years ago, not clear why most of those pages have almost completely disappeared. So, just to take one as an example, please, help me to understand why this page doesn't shows up in the first 50 organic search results for the keyword "cello sheet music": http://www.virtualsheetmusic.com/downloads/Indici/Cello.html I really can't explain why, unless we are under some sort of "penalization" or similar (a curse?!)... I have analyzed any possible metric, and can't find a logical explanation. Looking forward for your thoughts guys! All the best, Fab.
Intermediate & Advanced SEO | | fablau0 -
Silly Question still - Because I am paying high to google adwords is it possible google can't rank me high in organic?
Hello All, My ecommerce site gone in penalty more than 3 years before and within 3 months I got message from google penalty removed. Since then till date my organic ranking is very worst. In this 3 years I improved my site onpage very great. If I compare my site with all other competitors who are ranking in top 10 then my onpage that includes all schema, reviews, sitemap, header tags, meta's etc, social media, site structure, most imp speed, google page speed insight score, pingdom, w3c errors, alexa rank, global rank, UI, offers, design, content, code to text raito, engagement rate, page views, time on site etc all my sites always good compare to competitors. They also have few backlinks I do have few backlinks only. I am doing very high google adwords and my conversion rate is very very good. But do you think because I am paying since last 3 year high to google because of that google have some setting or strategy that those who perform well in adwords so not to bring up in organic? Is it possible I can talk with google on this? If yes then what will be the medium of conversation? Pls give some valuable inputs I am performing very much in paid so user end site is very very well. Thanks!
Intermediate & Advanced SEO | | pragnesh96390 -
Wondering why PR hasn't increased?
Hi there, I’ve been working on a website for about 6 months now and the page rank still remains at 0 - Google Page Rank. Fresh content has been created across the majority of the site, blog implemented, titles and meta’s, schema.org, we've built some good links etc. There are a lot of 404’errors but a lot of this is to do with stocking issues, products being sold/taken down and new products being put up. Do you think this is the major reason the page rank is not moving – but 404’s are a regular occurrence on a lot of E-Commerce sites. Also, the server went off line on two occasions(obviously Google frowns upon this) but in general server is grand. Also when we started working on the website it wasn't in the best of shape DA: 11, now it's DA:17. I know still not great but moving in the right direction. Just wondering yer thoughts on the PR?
Intermediate & Advanced SEO | | niamhomahony0 -
What can you do when Google can't decide which of two pages is the better search result
On one of our primary keywords Google is swapping out (about every other week) returning our home page, which is more transactional, with a deeper more information based page. So if you look at the Analysis in Moz you get an almost double helix like graph of those pages repeatedly swapping places. So there seems to be a bit of cannibalizing happening that I don't know how to correct. I think part of the problem is the deeper page would ideally be "longer" tail searches that contain the one word keyword that is having this bouncing problem as a part of the longer phrase. What can be done to try prevent this from happening? Can internal links help? I tried adding a link on that term to the deeper page to our homepage, and in a knee jerk reaction was asked to pull that link before I think there was really any evidence to suggest that that one new link made a positive or negative effect. There are some crazy theories floating around at the moment, but I am curious what others think both about if adding a link from a informational to a transactional page could in fact have a negative effect, and what else could be done/tried to help clarify the difference between the two pages for the search engines.
Intermediate & Advanced SEO | | plumvoice0 -
New website won't rank for branded keywords in Google, but does in Bing
We launched a website in October www.butterfly.com. The branded product name "Butterfly Body Liners" will not rank until page 2 of Google, but it ranks #1 in Bing. Organic traffic never really picked up so it's not easy to tell if it's been "hit" by any penalty. The strange thing is, this website: http://archive.is/PQZdO is ranking #1. This is an archived version of the site. Does anyone have any insight as to why this is happening?
Intermediate & Advanced SEO | | LaughlinConstable0 -
Want to merge high ranking niche websites into a new mega site, but don't want to lose authority from old top level pages
I have a few older websites that SERP well, and I am considering merging some or all of them into a new related website that I will be launching regardless. My old websites display real estate listings and not much else. Each website is devoted to showing homes for sale in a specific neighborhood. The domains are all in the form of Neighborhood1CityHomes.com, Neighborhood2CityHomes.com, etc. These sites SERP well for searches like "Neighborhood1 City homes for sale" and also "Neighborhood1 City real estate" where some or all of the query is in the domain name. Google simply points to the top of the domain although each site has a few interior pages that are rarely used. There is next to zero backlinking to the old domains, but each links to the other with anchor text like "Neighborhood1 Cityname real estate". That's pretty much the extent of the link profile. The new website will be a more comprehensive search portal where many neighborhoods and cities can be searched. The domain name is a nonsense word .com not related to actual key words. The structure will be like newdomain.com/cityname/neighborhood-name/ where the neighborhood real estate listings are that would replace the old websites, and I'd 301 the old sites to the appropriate internal directories of the new site. The content on the old websites is all on the home page of each, at least the content for searches that matter to me and rank well, and I read an article suggesting that Google assigns additional authority for top level pages (can I link to that here?). I'd be 301-ing each old domain from a top level to a 3rd level interior page like www. newdomain/cityname/neighborhood1/. The new site is better than the old sites by a wide margin, especially on mobile, but I don't want to lose all my top positions for some tough phrases. I'm not running analytics on the old sites in question, but each of the old sites has extensive past history with AdWords (which I don't run any more). So in theory Google knows these old sites are good quality.
Intermediate & Advanced SEO | | Gogogomez0 -
Does Google crawl the pages which are generated via the site's search box queries?
For example, if I search for an 'x' item in a site's search box and if the site displays a list of results based on the query, would that page be crawled? I am asking this question because this would be a URL that is non existent on the site and hence am confused as to whether Google bots would be able to find it.
Intermediate & Advanced SEO | | pulseseo0