ScreamingFrog won't crawl my site.
-
Hey guys,
My site is Netspiren.dk and when I use a tool like Screaming Frog or Integrity, it only crawls my homepage and menu's - not product-pages.
Examples
A menu: http://www.netspiren.dk/pl/Helse-Kosttilskud-Blandingsolie_57699.aspx
A product: http://www.netspiren.dk/pi/All-Omega-3-6-9-180-kapsler_1412956_57699.aspxIs it because the products are being loaded in Javascript?
What's your recommendation?All best,
Fred. -
Hi,
Thank you for this question and the responses because we encountered the same issue; Screaming Frog was only crawling a handful of products out of hundreds, because of JS. We made significant changes to the redirect rules on our dev site, and we want to make sure that the changes will not cause any crawling errors before we deploy to the live site. Is there any way to disable JS just for the purpose of a Screaming Frog crawl?
Our dev site is: https://msc-nop.com
Our regular site is: https://medicalscrubscollection.com
Thanks in advance!
-
I'm not sure if this has been fixed already, and thank you for Dan for chiming in, but I was able to crawl around 700 URLs.
-
Cheers @Andy & @Patrick
Hi Fred,
I haven't performed an extensive check, but the SEO Spider crawls around 35 URLs with /pi/ in the string, which is presumably not all the products on the site
Patrick actually mentions the issue in one of his points above. Essentially it looks like the site uses JavaScript on category pages for products, example - http://www.netspiren.dk/pl/Helse-Homøopati-Allergica-Ron-serien_58721.aspx
If you disable JS in your browser, you'll see a blank page where the products were. Our tool doesn't execute JS, although Google is much smarter and often can.
However, I'll leave you to verify that -
Hope that helps!
Cheers
Dan
-
I have sent Dan from Screaming Frog a tweet for you Fred. I'm sure he will be along presently
-Andy
-
Hi there
It's crawling for me. Here are a list of reasons why ScreamingFrog won't crawl your site:
- The site is blocked by robots.txt. A count of pages blocked by robots.txt is shown in the crawl overview pane on top right hand site of the user interface. You can configure the SEO Spider to ignore robots.txt by going to the “Basic” tab under Configuration->Spider.
- The site behaves differently depending on User Agent. Try changing the User Agent under Configuration->User Agent.
- The site requires JavaScript. Try looking at the site in your browser with JavaScript disabled.
- The site requires Cookies. Can you view the site with cookies disabled in your browser? Licenced users can enable cookies by going to Configuration->Spider and ticking “Allow Cookies” in the “Advanced” tab.
- The ‘nofollow’ attribute is present on links not being crawled. There is an option in Configuration->Spider under the “Basic” tab to follow ‘nofollow’ links.
- The page has a page level ‘nofollow’ attribute. The could be set by either a meta robots tag or an X-Robots-Tag in the HTTP header. These can be seen in the “Directives” tab in the “Nofollow” filter.
- The website is using framesets. The SEO Spider does not crawl the frame src attribute.
- The Content-Type header did not indicate the page is html. This is shown in the Content column and should be either text/html or application/xhtml+xml.
Run through your settings and check and see if you may have turned something on inadvertently that you didn't mean to. One thing you can try, is goto Configuration > Spider and then goto the last option Ignore robots.txt. Click the checkbox and try running it again.
It could just be a slow connection on your end. Give it a few minutes and see if any of the above suggestions work.
Hope this helps! Good luck!
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Breaking up a site into multiple sites
Hi, I am working on plan to divide up mid-number DA website into multiple sites. So the current site's content will be divided up among these new sites. We can't share anything going forward because each site will be independent. The current homepage will change to just link out to the new sites and have minimal content. I am thinking the websites will take a hit in rankings but I don't know how much and how long the drop will last. I know if you redirect an entire domain to a new domain the impact is negligible but in this case I'm only redirecting parts of a site to a new domain. Say we rank #1 for "blue widget" on the current site. That page is going to be redirected to new site and new domain. How much of a drop can we expect? How hard will it be to rank for other new keywords say "purple widget" that we don't have now? How much link juice can i expect to pass from current website to new websites? Thank you in advance.
Intermediate & Advanced SEO | | timdavis0 -
Do I need to remove pages that don't get any traffic from the index?
Hi, Do I need to remove pages that don't get any traffic from the index? Thanks Roy
Intermediate & Advanced SEO | | kadut1 -
The images on site are not found/indexed, it's been recommended we change their presentation to Google Bot - could this create a cloaking issue?
Hi We have an issue with images on our site not being found or indexed by Google. We have an image sitemap but the images are served on the Sitecore powered site within <divs>which Google can't read. The developers have suggested the below solution:</divs> Googlebot class="header-banner__image" _src="/~/media/images/accommodation/arctic-canada/arctic-safari-camp/arctic-cafari-camp-david-briggs.ashx"/>_Non Googlebot <noscript class="noscript-image"><br /></span></em><em><span><div role="img"<br /></span></em><em><span>aria-label="Arctic Safari Camp, Arctic Canada"<br /></span></em><em><span>title="Arctic Safari Camp, Arctic Canada"<br /></span></em><em><span>class="header-banner__image"<br /></span></em><em><span>style="background-image: url('/~/media/images/accommodation/arctic-canada/arctic-safari-camp/arctic-cafari-camp-david-briggs.ashx?mw=1024&hash=D65B0DE9B311166B0FB767201DAADA9A4ADA4AC4');"></div><br /></span></em><em><span></noscript> aria-label="Arctic Safari Camp, Arctic Canada" title="Arctic Safari Camp, Arctic Canada" class="header-banner__image image" data-src="/~/media/images/accommodation/arctic-canada/arctic-safari-camp/arctic-cafari-camp-david-briggs.ashx" data-max-width="1919" data-viewport="0.80" data-aspect="1.78" data-aspect-target="1.00" > Is this something that could be flagged as potential cloaking though, as we are effectively then showing code looking just for the user agent Googlebot?The devs have said that via their contacts Google has advised them that the original way we set up the site is the most efficient and considered way for the end user. However they have acknowledged the Googlebot software is not sophisticated enough to recognise this. Is the above solution the most suitable?Many thanksKate
Intermediate & Advanced SEO | | KateWaite0 -
Something happened within the last 2 weeks on our WordPress-hosted site that created "duplicates" by counting www.company.com/example and company.com/example (without the 'www.') as separate pages. Any idea what could have happened, and how to fix it?
Our website is running through WordPress. We've been running Moz for over a month now. Only recently, within the past 2 weeks, have we been alerted to over 100 duplicate pages. It appears something happened that created a duplicate of every single page on our site; "www.company.com/example" and "company.com/example." Again, according to our MOZ, this is a recent issue. I'm almost certain that prior to a couple of weeks ago, there existed both forms of the URL that directed to the same page without be counting as a duplicate. Thanks for you help!
Intermediate & Advanced SEO | | wzimmer0 -
Help! The website ranks fine but one of my web pages simply won't rank on Google!!!
One of our web pages will not rank on Google. The website as a whole ranks fine except just one section...We have tested and it looks fine...Google can crawl the page no problem. There are no spurious redirects in place. The content is fine. There is no duplicate page content issue. The page has a dozen product images (photos) but the load time of the page is absolutely fine. We have the submitted the page via webmaster and its fine. It gets listed but then a few hours later disappears!!! The site has not been penalised as we get good rankings with other pages. Can anyone help? Know about this problem?
Intermediate & Advanced SEO | | CayenneRed890 -
Alt tag for src='blank.gif' on lazy load images
I didn't find an answer on a search on this, so maybe someone here has faced this before. I am loading 20 images that are in the viewport and a bit below. The next 80 images I want to 'lazy-load'. They therefore are seen by the bot as a blank.gif file. However, I would like to get some credit for them by giving a description in the alt tag. Is that a no-no? If not, do they all have to be the same alt description since the src name is the same? I don't want to mess things up with Google by being too aggressive, but at the same time those are valid images once they are lazy loaded, so would like to get some credit for them. Thanks! Ted
Intermediate & Advanced SEO | | friendoffood0 -
Google don't index .ee version of a website
Hello, We have a problem with our clients website .ee. This website was developed by another company and now we don't know what is wrong with it. If i do a Google search "site:.ee" it only finds konelux.ee homepage and nothing else. Also homepage title tag and meta dec is in Finnish language not in Estonian language. If i look at .ee/robots.txt it looks like robots.txt don't block Google access. Any ideas what can be wrong here? BR, T
Intermediate & Advanced SEO | | sfinance0 -
Competitior 'scraped' entire site - pretty much - what to do?
I just discovered a competitor in the insurance lead generation space has completely copied my client's site's architecture, page names, titles, even the form, tweaking a word or two here or there to prevent 100% 'scraping'. We put a lot of time into the site, only to have everything 'stolen'. What can we do about this? My client is very upset. I looked into filing a 'scraper' report through Google but the slight modifications to content technically don't make it a 'scraped' site. Please advise to what course of action we can take, if any. Thanks,
Intermediate & Advanced SEO | | seagreen
Greg0