Can't crawl website with Screaming frog... what is wrong?
-
Hello all - I've just been trying to crawl a site with Screaming Frog and can't get beyond the homepage - have done the usual stuff (turn off JS and so on) and no problems there with nav and so on- the site's other pages have indexed in Google btw.
Now I'm wondering whether there's a problem with this robots.txt file, which I think may be auto-generated by Joomla (I'm not familiar with Joomla...) - are there any issues here? [just checked... and there isn't!]
If the Joomla site is installed within a folder such as at
e.g. www.example.com/joomla/ the robots.txt file MUST be
moved to the site root at e.g. www.example.com/robots.txt
AND the joomla folder name MUST be prefixed to the disallowed
path, e.g. the Disallow rule for the /administrator/ folder
MUST be changed to read Disallow: /joomla/administrator/
For more information about the robots.txt standard, see:
http://www.robotstxt.org/orig.html
For syntax checking, see:
http://tool.motoricerca.info/robots-checker.phtml
User-agent: *
Disallow: /administrator/
Disallow: /bin/
Disallow: /cache/
Disallow: /cli/
Disallow: /components/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /layouts/
Disallow: /libraries/
Disallow: /logs/
Disallow: /modules/
Disallow: /plugins/
Disallow: /tmp/ -
For anyone wondering; The answer above by Ecommerce Site (odd name btw) works - 21-Nov-2016.
-
This is the best I could find to so someone who had a similar problem with Joomla-
"In the premium version you can slow down the crawl rate under 'speed' in the configuration. In the free lite version, you can crawl the site and then right click on any URLs with a 403 response and press 're-spider'. The server will generally then allow you to crawl these pages (and return a 200 ok response) as you're not requesting too many at once, so you might have to re-spider them individually."
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Suggested Screaming Frog configuration to mirror default Googlebot crawl?
Hi All, Does anyone have a suggested Screaming Frog (SF) configuration to mirror default Googlebot crawl? I want to test my site and see if it will return 429 "Too Many Requests" to Google. I have set the User Agent as Googlebot (Smartphone). Is the default SF Menu > Configuration > Speed > Max Threads 5 and Max URLs 2.0 comparable to Googlebot? Context:
Intermediate & Advanced SEO | | gravymatt-se
I had tried NetPeak SEO Spider which did a nice job and had a cool feature that would pause a crawl if it got to many 429. Long Story short, B2B site threw 429 Errors when there should have been no load on a holiday weekend at 1:00 AM.0 -
Do you know if there is a tool that check all the scripts that are running on the page, and can diagonse scripts that can harm our seo?
Hi, Do you know if there is a tool that check all the scripts that are running on the page, and can diagnose scripts that can harm our seo? Thanks Roy
Intermediate & Advanced SEO | | kadut0 -
Training Website Improvements...
Hi Folks, I'm in the process of going over our corporate website with a view to improving on-page optimisation, layout, design and user experience and I would like your feedback on what you think I should improve or change with respect to SEO. Some of my ideas include: Restructure Home Page to Better Show Our Services Possibly Add a Slider to the Home Page (I know engagement rates with these are generally low) Restructure the Course Pages Completely (https://purplegriffon.com/courses/itil-training/itil-foundation-training/itil-foundation) Restructure the Events Pages Completely (https://purplegriffon.com/event/2028/itil-foundation) Improve & Streamline the Booking Process AJAXIFY the Booking Process Improve Responsive Elements I'm also interested in conducting user testing before I go ahead and make any changes. What are your thoughts? What would you change? Thanks. Gaz
Intermediate & Advanced SEO | | PurpleGriffon0 -
Redesigned Website
Hi, I have redesigned my website in html whereas it was in .asp earlier. I have resubmitted my google sitemap but it still showing me old site pages in search except home page. My question is how i can immediately change my web presence. How i can get the benefit of my .asp page ranks? In addition my website still alive with .asp. What should be the strategy, should i remove this websites or have to redirect all pages to new. If i make 301 redirection then will it cause any issue in SEO, ranking etc ? Thx
Intermediate & Advanced SEO | | 1akal0 -
How can I get a list of every url of a site in Google's index?
I work on a site that has almost 20,000 urls in its site map. Google WMT claims 28,000 indexed and a search on Google shows 33,000. I'd like to find what the difference is. Is there a way to get an excel sheet with every url Google has indexed for a site? Thanks... Mike
Intermediate & Advanced SEO | | 945010 -
Don't affiliate programs have an unfair impact on a company's ability to compete with bigger businesses?
So many coupon sites and other websites these days will only link to your website if you have a relationship with Commission Junction or one of the other large affiliate networks. It seems to me that links on these sites are really unfair as they allow businesses with deep pockets to acquire links unequitably. To me it seems like these are "paid links", as the average website cannot afford the cost of running an affiliate program. Even worse, the only reason why these businesses are earning a link is because they have an affiliate program; that to me should violate some sort of Google rule about types and values of links. The existence of an affiliate program as the only reason for earning a link is preposterous. It's just as bad as paid link directories that have no editorial standards. I realize the affiliate links are wrapped in CJ's code, so that mush diminish the value of the link, but there is still tons of good value in having the brand linked to from these high authority sites.
Intermediate & Advanced SEO | | williamelward0 -
Is it possible for a multi doctor practice to have the practice's picture displayed in Google's SERP?
Google now includes pictures of authors in the results of the pages. Therefore, a single practice doctor can include her picture into Google's SERP (http://markup.io/v/dqpyajgz7jkd). How can a multi doctor practice display the practice's picture as opposed to a single doctor? A search for Plastic Surgery Chicago displayed this (query: plastic surgery Chicago) http://markup.io/v/bx3f28ynh4w5. I found one example of a search result showing a picture of both doctors for a multi doctor practice (query: houston texas plastic surgeon). http://markup.io/v/t20gfazxfa6h
Intermediate & Advanced SEO | | CakeWebsites0 -
A Noob's SEO Plan of attack... can you critique it for me?
I've been digging my teeth into SEO for a solid 1.5 weeks or so now and I've learned a tremendous amount. However, I realize I have only scratched the surface still. One of the hardest things I've struggled with is the sheer amount of information and feeling overwhelmed. I finally think I've found a decent path. Please critique and offer input, it would be much appreciated. Step One: Site Architecture I run an online proofreading & editing service. That being said, there are lots of different segment we would eventually like to rank for other than the catch-all phrases like 'proofreading service'. For example, 'essay editing', 'resume editing', 'book editing', or even 'law school personal statement editing'. I feel that my first step is to understand how my site is built to handle this plan now, and into the future. Right now we simply have the homepage and one segment: kibin.com/essay-editing. Eventually, we will have a services page that serves almost like a site-map, showing all of our different services and linking to them. Step Two: Page Anatomy I know it is important to have a well defined anatomy to these services pages. For example, we've done a decent job with 'above the fold' content, but now understand the importance of putting the same type of care in below the fold. The plan here is to have a section for recent blog posts that pertain to that subject in a section titled "Essay Editing and Essay Writing Tips & Advice", or something to that effect. Also including some social sharing options, other resources, and an 'about us' section to assist with keyword optimization is in the plan. Step Three: Page Optimization Once we're done with Step Two, I feel that we'll finally be ready to truly optimize each of our pages. We've down some of this already, but probably less than 50%. You can see evidence of this on our essay editing page and proofreading rates page. So, the goal here is to find the most relevant keywords for each page and optimize for those to the point we have A grades on our on-page optimization reports. Step Four: Content/Passive Link Building The bones for our content strategy is in place. We have sharing links on blog posts already in place and a slight social media presence already. I admit, the blog needs some tightening up, and we can do a lot more on our social channels. However, I feel we need to start by creating content that our audience is interested in and interacting with them on a consistent basis. I do not feel like I should be chasing link building strategies or guest blog posts at this time. PLEASE correct me if I'm off base here, but only after reading step five: Step Five: Active Link Building My bias is to get some solid months of creating content and building a good social media presence where people are obviously interacting with our posts and sharing our content. My reasoning is that it will make it much easier for me to reach out to bloggers for guest posts as we'll be much more reputable after spending time doing step 4. Is this poor thinking? Should I try to get some guest blog posts in during step 4 instead? Step Six: Test, Measure, Refine I'll admit, I have yet to really dive into learning about the different ways to measure our SEO efforts. Besides being set up with our first campaign as an SEOPro Member and having 100 or so keywords and phrases we're tracking... I'm really not sure what else to do at this point. However, I feel we'll be able to measure the popularity of each blog post by number of comments, shares, new links, etc. once I reach step 6. Is there something vital I'm missing or have forgotten here? I'm sorry for the long winded post, but I'm trying to get my thoughts straight before we start cranking on this plan. Thank you so much!
Intermediate & Advanced SEO | | TBiz2