Can't crawl website with Screaming Frog... what is wrong?
-
Hello all - I've just been trying to crawl a site with Screaming Frog and can't get beyond the homepage. I've done the usual checks (turning off JS and so on) and found no problems with the nav. The site's other pages are indexed in Google, btw.
Now I'm wondering whether there's a problem with this robots.txt file, which I think may be auto-generated by Joomla (I'm not familiar with Joomla...). Are there any issues here? [Just checked... and there aren't!]
# If the Joomla site is installed within a folder such as at
# e.g. www.example.com/joomla/ the robots.txt file MUST be
# moved to the site root at e.g. www.example.com/robots.txt
# AND the joomla folder name MUST be prefixed to the disallowed
# path, e.g. the Disallow rule for the /administrator/ folder
# MUST be changed to read Disallow: /joomla/administrator/
# For more information about the robots.txt standard, see:
# http://www.robotstxt.org/orig.html
# For syntax checking, see:
# http://tool.motoricerca.info/robots-checker.phtml
User-agent: *
Disallow: /administrator/
Disallow: /bin/
Disallow: /cache/
Disallow: /cli/
Disallow: /components/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /layouts/
Disallow: /libraries/
Disallow: /logs/
Disallow: /modules/
Disallow: /plugins/
Disallow: /tmp/
-
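For anyone who wants to double-check a robots.txt like the one quoted in the question, here is a minimal Python sketch using the standard library's urllib.robotparser. It only covers the Disallow rules shown in the question. The sample paths (/about-us and so on) are made up for illustration, and the user-agent string is an assumption: the file only has a `User-agent: *` group, so any crawler name gets the same result.

```python
from urllib.robotparser import RobotFileParser

# The rules quoted in the question (comment lines left out).
ROBOTS_TXT = """\
User-agent: *
Disallow: /administrator/
Disallow: /bin/
Disallow: /cache/
Disallow: /cli/
Disallow: /components/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /layouts/
Disallow: /libraries/
Disallow: /logs/
Disallow: /modules/
Disallow: /plugins/
Disallow: /tmp/
"""

# Hypothetical paths, chosen only to illustrate the check.
SAMPLE_PATHS = ["/", "/about-us", "/administrator/index.php", "/tmp/cache.html"]


def blocked_paths(robots_txt, paths, user_agent="Screaming Frog SEO Spider"):
    """Return the paths that the given robots.txt text disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [path for path in paths if not parser.can_fetch(user_agent, path)]


print(blocked_paths(ROBOTS_TXT, SAMPLE_PATHS))
```

Only the paths under the disallowed folders come back, so the homepage and ordinary content pages are not blocked by this file. That is consistent with the conclusion in the question that robots.txt is not what stops the crawl.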
For anyone wondering: the answer above by Ecommerce Site (odd name btw) works - 21-Nov-2016.
-
This is the best I could find; it was advice given to someone who had a similar problem with Joomla:
"In the premium version you can slow down the crawl rate under 'speed' in the configuration. In the free lite version, you can crawl the site and then right click on any URLs with a 403 response and press 're-spider'. The server will generally then allow you to crawl these pages (and return a 200 ok response) as you're not requesting too many at once, so you might have to re-spider them individually."