ScreamingFrog won't crawl my site.
-
Hey guys,
My site is Netspiren.dk and when I use a tool like Screaming Frog or Integrity, it only crawls my homepage and menu's - not product-pages.
Examples
A menu: http://www.netspiren.dk/pl/Helse-Kosttilskud-Blandingsolie_57699.aspx
A product: http://www.netspiren.dk/pi/All-Omega-3-6-9-180-kapsler_1412956_57699.aspxIs it because the products are being loaded in Javascript?
What's your recommendation?All best,
Fred. -
Hi,
Thank you for this question and the responses because we encountered the same issue; Screaming Frog was only crawling a handful of products out of hundreds, because of JS. We made significant changes to the redirect rules on our dev site, and we want to make sure that the changes will not cause any crawling errors before we deploy to the live site. Is there any way to disable JS just for the purpose of a Screaming Frog crawl?
Our dev site is: https://msc-nop.com
Our regular site is: https://medicalscrubscollection.com
Thanks in advance!
-
I'm not sure if this has been fixed already, and thank you for Dan for chiming in, but I was able to crawl around 700 URLs.
-
Cheers @Andy & @Patrick
Hi Fred,
I haven't performed an extensive check, but the SEO Spider crawls around 35 URLs with /pi/ in the string, which is presumably not all the products on the site
Patrick actually mentions the issue in one of his points above. Essentially it looks like the site uses JavaScript on category pages for products, example - http://www.netspiren.dk/pl/Helse-Homøopati-Allergica-Ron-serien_58721.aspx
If you disable JS in your browser, you'll see a blank page where the products were. Our tool doesn't execute JS, although Google is much smarter and often can.
However, I'll leave you to verify that -
Hope that helps!
Cheers
Dan
-
I have sent Dan from Screaming Frog a tweet for you Fred. I'm sure he will be along presently
-Andy
-
Hi there
It's crawling for me. Here are a list of reasons why ScreamingFrog won't crawl your site:
- The site is blocked by robots.txt. A count of pages blocked by robots.txt is shown in the crawl overview pane on top right hand site of the user interface. You can configure the SEO Spider to ignore robots.txt by going to the “Basic” tab under Configuration->Spider.
- The site behaves differently depending on User Agent. Try changing the User Agent under Configuration->User Agent.
- The site requires JavaScript. Try looking at the site in your browser with JavaScript disabled.
- The site requires Cookies. Can you view the site with cookies disabled in your browser? Licenced users can enable cookies by going to Configuration->Spider and ticking “Allow Cookies” in the “Advanced” tab.
- The ‘nofollow’ attribute is present on links not being crawled. There is an option in Configuration->Spider under the “Basic” tab to follow ‘nofollow’ links.
- The page has a page level ‘nofollow’ attribute. The could be set by either a meta robots tag or an X-Robots-Tag in the HTTP header. These can be seen in the “Directives” tab in the “Nofollow” filter.
- The website is using framesets. The SEO Spider does not crawl the frame src attribute.
- The Content-Type header did not indicate the page is html. This is shown in the Content column and should be either text/html or application/xhtml+xml.
Run through your settings and check and see if you may have turned something on inadvertently that you didn't mean to. One thing you can try, is goto Configuration > Spider and then goto the last option Ignore robots.txt. Click the checkbox and try running it again.
It could just be a slow connection on your end. Give it a few minutes and see if any of the above suggestions work.
Hope this helps! Good luck!
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Website Isn't Ranking & I'm Not Sure Why Based On The Data
Hi Moz Community,
Intermediate & Advanced SEO | | ErrickG
I am having an issue that has been killing me for some time and I could really use another opinion. One of my client’s websites hasn't been ranking for some time and I can't put my finger on it. There are no issues showing up in the webmaster tools. If you compare the site with the tops ranking sites for the websites number one keyword, the website is just as good as everyone else. My clients website is the first one on the left in the attachment. We have better quality content but instead of showing up on page 1,2,3 the site is on page 21. I am just at a lost. Anyone have any thoughts outside looking in. Thanks,
Errick rrLJZ2G0 -
Google's form for "Small sites that should rank better" | Any experiences or results?
Back in August of 2013 Google created a form that allowed people to submit small websites that "should be ranking better in Google". There is more info about it in this article http://www.seroundtable.com/google-small-site-survey-17295.html Has anybody used it? Any experiences or results you can share? *private message if you do not want to share publicly...
Intermediate & Advanced SEO | | GregB1230 -
When I type link:mydomainname.com in Google I don't see any result, why?
My other website is 4 years old and Page Rank 3. We are into business of design and development for 5 years and still we don't have any result from Google Searches. When I type link:mydomainname.com I don't get any result. What's the reason?
Intermediate & Advanced SEO | | vikaspooja1 -
'Nofollow' footer links from another site, are they 'bad' links?
Hi everyone,
Intermediate & Advanced SEO | | romanbond
one of my sites has about 1000 'nofollow' links from the footer of another of my sites. Are these in any way hurtful? Any help appreciated..0 -
Merging Sites: Will redirecting the old homepage to an internal page on the new site cause issues?
I've ended up with two sites which have similar content (but not duplicate) and target similar keywords, rather than trying to maintain two sites I would like to merge the sites together. The old site is more of a traditional niche site and targets a particular set of keywords on its homepage, the new site is more of an authority site with a magazine type homepage and targets the same set of keywords from an internal page. My question is: Should I redirect the old site's homepage to the relevant internal page on the new website...
Intermediate & Advanced SEO | | lara_dar
...or should I redirect the old site's homepage to the new site's homepage? (the old site's homepage backlinks are a mixture of partial match keyword anchor text, naked URLs and branded anchor text) I am in two minds (a & b!) (a) Redirecting to the internal page would be great for ranking as there are some decent backlinks and the content is similar (b) But usually when you do a 301 redirect the homepage usually directs to the new homepage and some of the old site's links are related to the domain rather than the keyword (e.g. http://www.site.com) and some people will be looking for the site's homepage. What do you think? Your help is much appreciated (and hope this makes sense...!)0 -
Duplicate site (disaster recovery) being crawled and creating two indexed search results
I have a primary domain, toptable.co.uk, and a disaster recovery site for this primary domain named uk-www.gtm.opentable.com. In the event of a disaster, toptable.co.uk would get CNAMEd (DNS alias) to the .gtm site. Naturally the .gtm disaster recover domian is an exact match to the toptable.co.uk domain. Unfortunately, Google has crawled the uk-www.gtm.opentable site, and it's showing up in search results. In most cases the gtm urls don't get redirected to toptable they actually appear as an entirely separate domain to the user. The strong feeling is that this duplicate content is hurting toptable.co.uk, especially as .gtm.ot is part of the .opentable.com domain which has significant authority. So we need a way of stopping Google from crawling gtm. There seem to be two potential fixes. Which is best for this case? use the robots.txt to block Google from crawling the .gtm site 2) canonicalize the the gtm urls to toptable.co.uk In general Google seems to recommend a canonical change but in this special case it seems robot.txt change could be best. Thanks in advance to the SEOmoz community!
Intermediate & Advanced SEO | | OpenTable0 -
Long term strategy to retain link 'goodness', I need some help!
Hi, I have a few questions around the best approach to retain as much link juice / authority from transitioning multiple domains into 1 single domain over the next year or so. I have 2 similar websites (www.brandA.co.uk and www.brandB.co.uk) which I need to transition to a new website (www.brandC.co.uk) over the next 2 years. Both A&B are established and have there own brand value, brand C will be a new website. I need to start introducing the brand from website C onto A&B straight away and then eventually drop the brands from A&B and just be left with C. One idea I am considering is: www.brandA.co.uk becomes brandA.brandC.co.uk (brandA sits as a subdomain on brandC website) Ultimately over time I would drop the subdomain (brandA) and just be left with www.brandC.co.uk The other option is: www.brandA.co.uk becomes brandC.co.uk/brandA...with the same ultimate aim as above. In both above case the same would be done for brandB, either becoming a subdomain of a folder on brandC website What I need to know is what is the best way to first pass any SEO goodness from the websites for brandA and brandB to the intermediate solution of either brandA.brandC.co.uk or brandC.co.uk/brandA (I see this intermediate solution being in place for approx 2 years). And then how to transition the intermediate solution into just having brandC.co.uk Which solution will aid growing the SEO goodness on the final brandC.co.uk website? Does google see subdomains as part of the main domain and thus the main domain will benefit from any links going to the subdomain or is it better to always use /folders as google sees these as more part of one website? ...or is there another option that I haven't considered? I know it's rater confusing so please give me a shout if you want anymore info. Thanks James
Intermediate & Advanced SEO | | cewe0 -
Lots of city pages - How do I ensure we don't get penalized
We are planning on having a job posting page for each city that we are looking to hire new CFO partners in. But, the problem is, we have LOTS of locations. I was wondering what would be the best way to have similar content on each page (since the job description and requirements are the same for each job posting) without being hit by Google for having duplicate content? One of the main reasons we have decided to have location based pages is that we have noticed visitors to our site are searching for "cfo job in [location] but we notice that most of these visitors then leave. We believe it to be because the pages they land on make no mention of the location that they were looking for and is a little incongruent with what they were expecting. We are looking to use the following URLs and TItle/Description as an example: | http://careers.b2bcfo.com/cfo-jobs/Alabama/Birmingham | CFO Careers in Birmingham, AL | | Are you looking for a CFO Career in Birmingham, Alabama ? We're looking for partners there. Apply today! | | Any advice you have for this would be greatly appreciated. Thank you.
Intermediate & Advanced SEO | | B2B.CFO0