Can't crawl website with Screaming frog... what is wrong?
-
Hello all - I've just been trying to crawl a site with Screaming Frog and can't get beyond the homepage - have done the usual stuff (turn off JS and so on) and no problems there with nav and so on- the site's other pages have indexed in Google btw.
Now I'm wondering whether there's a problem with this robots.txt file, which I think may be auto-generated by Joomla (I'm not familiar with Joomla...) - are there any issues here? [just checked... and there isn't!]
If the Joomla site is installed within a folder such as at
e.g. www.example.com/joomla/ the robots.txt file MUST be
moved to the site root at e.g. www.example.com/robots.txt
AND the joomla folder name MUST be prefixed to the disallowed
path, e.g. the Disallow rule for the /administrator/ folder
MUST be changed to read Disallow: /joomla/administrator/
For more information about the robots.txt standard, see:
http://www.robotstxt.org/orig.html
For syntax checking, see:
http://tool.motoricerca.info/robots-checker.phtml
User-agent: *
Disallow: /administrator/
Disallow: /bin/
Disallow: /cache/
Disallow: /cli/
Disallow: /components/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /layouts/
Disallow: /libraries/
Disallow: /logs/
Disallow: /modules/
Disallow: /plugins/
Disallow: /tmp/ -
For anyone wondering; The answer above by Ecommerce Site (odd name btw) works - 21-Nov-2016.
-
This is the best I could find to so someone who had a similar problem with Joomla-
"In the premium version you can slow down the crawl rate under 'speed' in the configuration. In the free lite version, you can crawl the site and then right click on any URLs with a 403 response and press 're-spider'. The server will generally then allow you to crawl these pages (and return a 200 ok response) as you're not requesting too many at once, so you might have to re-spider them individually."
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
I want to use a photo from an official website for my own website.IF YES HOW?
Lets suppose i downloaded a photo from a XYZ website and want to use it on my own website, and also i want to rank for same keyword, and would like to rank just below XYZ site, i know there could be copyright issue. what can be done to avoid this issue. Can i manipulate the picture in a such way that it is usable. if yes how? How can i use that official websites picture for my website, i mean, can i cite that website as a source? what is the best practice in this case? i dont want to use stock photo,i really like xyz sites pics.
Intermediate & Advanced SEO | | Sam09schulz0 -
Suggested Screaming Frog configuration to mirror default Googlebot crawl?
Hi All, Does anyone have a suggested Screaming Frog (SF) configuration to mirror default Googlebot crawl? I want to test my site and see if it will return 429 "Too Many Requests" to Google. I have set the User Agent as Googlebot (Smartphone). Is the default SF Menu > Configuration > Speed > Max Threads 5 and Max URLs 2.0 comparable to Googlebot? Context:
Intermediate & Advanced SEO | | gravymatt-se
I had tried NetPeak SEO Spider which did a nice job and had a cool feature that would pause a crawl if it got to many 429. Long Story short, B2B site threw 429 Errors when there should have been no load on a holiday weekend at 1:00 AM.0 -
What's wrong with the algorithm?
Is it possible that Google is penalising a specific page and in the same time it shows unrelated page in the search results? "rent luxury car florence" shows https://lurento.com/city/munich/on the 2nd page (that's Munich, Germany) and in the same time completely ignores the related page https://lurento.com/city/florence/ How I can figure out if the specific page has been trashed and why? Thanks,
Intermediate & Advanced SEO | | lurento.com
Mike0 -
International website. Di I need a new website
i am looking to expand from the UK and open a location in the US. i curretly have a .co.uk domain. what would you recommend I do with th website, create a new one wth a .com domain?
Intermediate & Advanced SEO | | Caffeine_Marketing0 -
Screaming Frog returning both HTTP and HTTPS results...
Hi, About 10 months I switched from HTTP to HTTPS. I then switched back (long story). I noticed that Screaming Frog is picking up the HTTP and HTTPS version of the site. Maybe this doesn't matter, but I'd like to know why SF is doing that. The URL is: www.aerlawgroup.com Any feedback, including how to remove the HTTPS version, is greatly appreciated. Thanks.
Intermediate & Advanced SEO | | mrodriguez14400 -
Is it a good or bad idea (in Google's eyes) to add a forum to my website?
I have an active website with many users adding dozens of comments on the many pages of the site daily. I'm am wondering if it would be good for the overall ranking strength of the site if I were to add a forum to it (in a subdirectory, like forum.mysite.com). On one hand, I can see the forum posts as thin content, which Google wouldn't care for. On the other hand, I see the additional user engagement on the site, which I think Google would like. I know the benefits it can have to the users, but for this question, all I want to know is if this would be seen by Google as a plus or a minus for my site, assuming the forum succeeded in becoming popular. I don't want to do anything that will diminish the value of my site in Google's eyes. Thank you.
Intermediate & Advanced SEO | | bizzer0 -
Interlinking vs. 'orphaning' mobile page versions in a dynamic serving scenario
Hi there, I'd love to get the Moz community's take on this. We are working on setting up dynamic serving for mobile versions of our pages. During the process of planning the mobile version of a page, we identified a type of navigational links that, while useful enough for desktop visitors, we feel would not be as useful to mobile visitors. We would like to remove these from our mobile version of the page as part of offering a more streamlined mobile page. So we feel that we're making a fine decision with user experience in mind. On any single page, the number of links removed in the mobile version would be relatively few. The question is: is there any danger in “orphaning” the mobile versions of certain pages because links don’t exist pointing to those pages on our mobile pages? Is this a legitimate concern, or is it enough that none of the desktop versions of pages are orphaned? We were not sure whether it’s even possible, in Googlebot’s eyes, to orphan a mobile version of a page if we use dynamic serving and if there are no orphaned desktop versions of our pages. (We also plan to link to "full site" in the footer.) Thank you in advance for your help,
Intermediate & Advanced SEO | | Eric_R
Eric0 -
Anyone managed to change 'At a glance:' in local search results
On Google's local search results, i.e when the 'Google places' data is displayed along with the map on the right hand side of the search results, there is also an element 'At a glance:'
Intermediate & Advanced SEO | | DeanAndrews
The data that if being displayed is from some years ago and the client would if possible like it to reflect there current services, which they have been providing for some five years. According to Google support here - http://support.google.com/maps/bin/answer.py?hl=en&answer=1344353 this cannot be changed, they say 'Can I edit a listing’s descriptive terms or suggest a new one?
No; the terms are not reviewed, curated, or edited. They come from an algorithm, and we do not help that algorithm figure it out. ' My question is has anyone successfully influenced this data and if so how.0