ScreamingFrog won't crawl my site.
-
Hey guys,
My site is Netspiren.dk and when I use a tool like Screaming Frog or Integrity, it only crawls my homepage and menu's - not product-pages.
Examples
A menu: http://www.netspiren.dk/pl/Helse-Kosttilskud-Blandingsolie_57699.aspx
A product: http://www.netspiren.dk/pi/All-Omega-3-6-9-180-kapsler_1412956_57699.aspxIs it because the products are being loaded in Javascript?
What's your recommendation?All best,
Fred. -
Hi,
Thank you for this question and the responses because we encountered the same issue; Screaming Frog was only crawling a handful of products out of hundreds, because of JS. We made significant changes to the redirect rules on our dev site, and we want to make sure that the changes will not cause any crawling errors before we deploy to the live site. Is there any way to disable JS just for the purpose of a Screaming Frog crawl?
Our dev site is: https://msc-nop.com
Our regular site is: https://medicalscrubscollection.com
Thanks in advance!
-
I'm not sure if this has been fixed already, and thank you for Dan for chiming in, but I was able to crawl around 700 URLs.
-
Cheers @Andy & @Patrick
Hi Fred,
I haven't performed an extensive check, but the SEO Spider crawls around 35 URLs with /pi/ in the string, which is presumably not all the products on the site
Patrick actually mentions the issue in one of his points above. Essentially it looks like the site uses JavaScript on category pages for products, example - http://www.netspiren.dk/pl/Helse-Homøopati-Allergica-Ron-serien_58721.aspx
If you disable JS in your browser, you'll see a blank page where the products were. Our tool doesn't execute JS, although Google is much smarter and often can.
However, I'll leave you to verify that -
Hope that helps!
Cheers
Dan
-
I have sent Dan from Screaming Frog a tweet for you Fred. I'm sure he will be along presently
-Andy
-
Hi there
It's crawling for me. Here are a list of reasons why ScreamingFrog won't crawl your site:
- The site is blocked by robots.txt. A count of pages blocked by robots.txt is shown in the crawl overview pane on top right hand site of the user interface. You can configure the SEO Spider to ignore robots.txt by going to the “Basic” tab under Configuration->Spider.
- The site behaves differently depending on User Agent. Try changing the User Agent under Configuration->User Agent.
- The site requires JavaScript. Try looking at the site in your browser with JavaScript disabled.
- The site requires Cookies. Can you view the site with cookies disabled in your browser? Licenced users can enable cookies by going to Configuration->Spider and ticking “Allow Cookies” in the “Advanced” tab.
- The ‘nofollow’ attribute is present on links not being crawled. There is an option in Configuration->Spider under the “Basic” tab to follow ‘nofollow’ links.
- The page has a page level ‘nofollow’ attribute. The could be set by either a meta robots tag or an X-Robots-Tag in the HTTP header. These can be seen in the “Directives” tab in the “Nofollow” filter.
- The website is using framesets. The SEO Spider does not crawl the frame src attribute.
- The Content-Type header did not indicate the page is html. This is shown in the Content column and should be either text/html or application/xhtml+xml.
Run through your settings and check and see if you may have turned something on inadvertently that you didn't mean to. One thing you can try, is goto Configuration > Spider and then goto the last option Ignore robots.txt. Click the checkbox and try running it again.
It could just be a slow connection on your end. Give it a few minutes and see if any of the above suggestions work.
Hope this helps! Good luck!
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
I have a metadata issue. My site crawl is coming back with missing descriptions, but all of the pages look like site tags (i.e. /blog/?_sft_tag=call-routing)
I have a metadata issue. My site crawl is coming back with missing descriptions, but all of the pages look like site tags (i.e. /blog/?_sft_tag=call-routing)
Intermediate & Advanced SEO | | amarieyoussef0 -
Why isn't Google indexing this site?
Hello, Moz Community My client's site hasn't been indexed by Google, although it was launched a couple of months ago. I've ran down the check points in this article https://moz.com/ugc/8-reasons-why-your-site-might-not-get-indexed without finding a reason why. Any sharp SEO-eyes out there who can spot this quickly? The url is: http://www.oldermann.no/ Thank you
Intermediate & Advanced SEO | | Inevo
INEVO, digital agency0 -
Ranking for keyword I don't optimize for & Other oddities
Hi Moz Community! I've been working with a clients website for about a year now. They were hit with the original Panda update because of some spammy links from a shady SEO firm. We've made a decent climb back but not a full recovery. There are some weird things happening that I would love some insight into. 1. Ranking for keywords we don't optimize for: I noticed some low keyword volume for a keyword term that is close to our main term, but is slightly different. We don't optimize for this term at all on our website. We rank third for this term, and actually show site links in the result, which doesn't happen for any of our other pages. 2. Index not found when doing site: search: Other oddity is that when you search site:www.mywebsite.com, I see all the pages within the site except the homepage. Not sure whats going on here, but when I fetch the homepage in GWMT, it returns the homepage. When you query the homepage by itself, it also ranks. Any help would be appreciated! Regards, J
Intermediate & Advanced SEO | | artscienceweb0 -
What was your experience with changing site url's?
I work with a company that is about to move to a new platform. Because the category and page structure is different every almost every url but the home page will need to be 301 redirected. I know how to do this and am pretty sure I will find and fix 99% ahead of time and not have too many 404's showing up in webmaster tools to clean up. My question is has anyone who is reading this post had to do this before and what was your experience with organic traffic after you made the switch. I am predicting that even if I successfully redirected 100% of the url's there would be some loss for a couple of months just due to the fact that we are making a major change. My bosses are asking if there will be any loss and I need to tell them what to expect.
Intermediate & Advanced SEO | | KentH0 -
Any reasons why social media properties are ranking higher than the site's own name?
The site below has social media properties and other sites coming up before it's own listing even for the exact search of the site name. Any ideas why this is happening? Link Any input is appreciated.
Intermediate & Advanced SEO | | SEO5Team0 -
Google isn't seeing the content but it is still indexing the webpage
When I fetch my website page using GWT this is what I receive. HTTP/1.1 301 Moved Permanently
Intermediate & Advanced SEO | | jacobfy
X-Pantheon-Styx-Hostname: styx1560bba9.chios.panth.io
server: nginx
content-type: text/html
location: https://www.inscopix.com/
x-pantheon-endpoint: 4ac0249e-9a7a-4fd6-81fc-a7170812c4d6
Cache-Control: public, max-age=86400
Content-Length: 0
Accept-Ranges: bytes
Date: Fri, 14 Mar 2014 16:29:38 GMT
X-Varnish: 2640682369 2640432361
Age: 326
Via: 1.1 varnish
Connection: keep-alive What I used to get is this: HTTP/1.1 200 OK
Date: Thu, 11 Apr 2013 16:00:24 GMT
Server: Apache/2.2.23 (Amazon)
X-Powered-By: PHP/5.3.18
Expires: Sun, 19 Nov 1978 05:00:00 GMT
Last-Modified: Thu, 11 Apr 2013 16:00:24 +0000
Cache-Control: no-cache, must-revalidate, post-check=0, pre-check=0
ETag: "1365696024"
Content-Language: en
Link: ; rel="canonical",; rel="shortlink"
X-Generator: Drupal 7 (http://drupal.org)
Connection: close
Transfer-Encoding: chunked
Content-Type: text/html; charset=utf-8 xmlns:content="http://purl.org/rss/1.0/modules/content/"
xmlns:dc="http://purl.org/dc/terms/"
xmlns:foaf="http://xmlns.com/foaf/0.1/"
xmlns:og="http://ogp.me/ns#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns:sioc="http://rdfs.org/sioc/ns#"
xmlns:sioct="http://rdfs.org/sioc/types#"
xmlns:skos="http://www.w3.org/2004/02/skos/core#"
xmlns:xsd="http://www.w3.org/2001/XMLSchema#"> <title>Inscopix | In vivo rodent brain imaging</title>0 -
Troubled QA Platform - Site Map vs Site Structure
I'm running a Q&A forum that was built prioritizing UX over SEO. This decision has cause a bit of a headache as we're 6 months into the project with 2278 Q&A pages with extremely minimal traffic coming from search engines. The structure has the following hiccups: A. The category navigation from the main Q&A page is entirely javascript and only navigable by users. B. We identify Google bots and send them to another version of the Q&A platform w/o javascript. Category links don't exist in this google bot version of the main Q&A page. On this Google version of the main Q&A page, the Pinterest-like tiles displaying individual Q&As are capped at 10. This means that the only way google bot can identify link juice being passed down to individual QAs (after we've directed them to this page) is through 10 random Q&As. C. All 2278 of the QAs are currently indexed in search. They are just indexed very very poorly in SERPs. My personal assumption, is that Google can't pass link juice to any of the Q&As (poor SERP) but registers them from the site map so it gets included in Google's index. My dilemma has me struggling between two different decisions: 1. Update the navigation in the header to remove the javascript and fundamentally change the look and feel of the Q&A platform. This will allow Google bot to navigate through Expert category links to pass link juice to all Q&As. or 2. Update the redirected main Q&A page to include hard coded category links with 100s of hard coded Q&As under each category page. Make it similar, ugly, flat and efficient for the crawling bots. Any suggestions would be greatly appreciated. I need to find a solution as soon as possible.
Intermediate & Advanced SEO | | TQContent0 -
If I had an issue with a friendly URL module and I lost all my rankings. Will they return now that issue is resolved next time I'm crawled by google?
I have 'magic seo urls' installed on my zencart site. Except for some reason no one can explain why or how the files were disabled. So my static links went back to dynamic (index.php?**********) etc. The issue was resolved with the module except in that time google must have crawled my site and I lost all my rankings. I'm nowher to be found in the top 50. Did this really cause such an extravagant SEO issue as my web developers told me? Can I expect my rankings to return next time my site is crawled by google?
Intermediate & Advanced SEO | | Pete790