Can't crawl website with Screaming frog... what is wrong?
-
Hello all - I've just been trying to crawl a site with Screaming Frog and can't get beyond the homepage - have done the usual stuff (turn off JS and so on) and no problems there with nav and so on- the site's other pages have indexed in Google btw.
Now I'm wondering whether there's a problem with this robots.txt file, which I think may be auto-generated by Joomla (I'm not familiar with Joomla...) - are there any issues here? [just checked... and there isn't!]
If the Joomla site is installed within a folder such as at
e.g. www.example.com/joomla/ the robots.txt file MUST be
moved to the site root at e.g. www.example.com/robots.txt
AND the joomla folder name MUST be prefixed to the disallowed
path, e.g. the Disallow rule for the /administrator/ folder
MUST be changed to read Disallow: /joomla/administrator/
For more information about the robots.txt standard, see:
http://www.robotstxt.org/orig.html
For syntax checking, see:
http://tool.motoricerca.info/robots-checker.phtml
User-agent: *
Disallow: /administrator/
Disallow: /bin/
Disallow: /cache/
Disallow: /cli/
Disallow: /components/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /layouts/
Disallow: /libraries/
Disallow: /logs/
Disallow: /modules/
Disallow: /plugins/
Disallow: /tmp/ -
For anyone wondering; The answer above by Ecommerce Site (odd name btw) works - 21-Nov-2016.
-
This is the best I could find to so someone who had a similar problem with Joomla-
"In the premium version you can slow down the crawl rate under 'speed' in the configuration. In the free lite version, you can crawl the site and then right click on any URLs with a 403 response and press 're-spider'. The server will generally then allow you to crawl these pages (and return a 200 ok response) as you're not requesting too many at once, so you might have to re-spider them individually."
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Crawl Stats Decline After Site Launch (Pages Crawled Per Day, KB Downloaded Per Day)
Hi all, I have been looking into this for about a month and haven't been able to figure out what is going on with this situation. We recently did a website re-design and moved from a separate mobile site to responsive. After the launch, I immediately noticed a decline in pages crawled per day and KB downloaded per day in the crawl stats. I expected the opposite to happen as I figured Google would be crawling more pages for a while to figure out the new site. There was also an increase in time spent downloading a page. This has went back down but the pages crawled has never went back up. Some notes about the re-design: URLs did not change Mobile URLs were redirected Images were moved from a subdomain (images.sitename.com) to Amazon S3 Had an immediate decline in both organic and paid traffic (roughly 20-30% for each channel) I have not been able to find any glaring issues in search console as indexation looks good, no spike in 404s, or mobile usability issues. Just wondering if anyone has an idea or insight into what caused the drop in pages crawled? Here is the robots.txt and attaching a photo of the crawl stats. User-agent: ShopWiki Disallow: / User-agent: deepcrawl Disallow: / User-agent: Speedy Disallow: / User-agent: SLI_Systems_Indexer Disallow: / User-agent: Yandex Disallow: / User-agent: MJ12bot Disallow: / User-agent: BrightEdge Crawler/1.0 (crawler@brightedge.com) Disallow: / User-agent: * Crawl-delay: 5 Disallow: /cart/ Disallow: /compare/ ```[fSAOL0](https://ibb.co/fSAOL0)
Intermediate & Advanced SEO | | BandG0 -
Do you know if there is a tool that check all the scripts that are running on the page, and can diagonse scripts that can harm our seo?
Hi, Do you know if there is a tool that check all the scripts that are running on the page, and can diagnose scripts that can harm our seo? Thanks Roy
Intermediate & Advanced SEO | | kadut0 -
Can't diagnose this 404 error
Hi Moz community I have started receiving a load of 404 errors that look like this: This page: http://paulminors.com/blog/page/5/ is linking to: http://paulminors.com/category/podcast/paulminors.com which is a broken link. This is happening with a load of other pages as well. It seems that "paulminors.com" is being added to the end of the linking pages URL.I'm using Wordpress and the SEO by Yoast plugin. I have searched for this link in the source of the linking page but can't find it, so I'm struggling to diagnose the problem. Does anyone have any ideas on what could be causing this? Thanks in advance Paul
Intermediate & Advanced SEO | | kevinliao0 -
Should I delete 'data hightlighter' mark-up in webmaster tools after added schema.org mark-up?
LEDSupply.com is my site, and before becoming familiar with schema mark-up I used the 'data-highlighter' in webmaster tools to mark-up as much of the site as I could. Now that Schema is set-up I'm wondering if having both active is bad and am thinking I should delete the previous work with the 'data highlighter' tool. To delete or not to delete? Thank you!
Intermediate & Advanced SEO | | saultienut0 -
Google don't index .ee version of a website
Hello, We have a problem with our clients website .ee. This website was developed by another company and now we don't know what is wrong with it. If i do a Google search "site:.ee" it only finds konelux.ee homepage and nothing else. Also homepage title tag and meta dec is in Finnish language not in Estonian language. If i look at .ee/robots.txt it looks like robots.txt don't block Google access. Any ideas what can be wrong here? BR, T
Intermediate & Advanced SEO | | sfinance0 -
Is my text readable? I don't see it in the page source
Text on my site seems to be readable in a text only version (the page is not cached so I viewed it by disabling JAVA and then copy and pasted the page into Word) However, when I look in the page source I don't see the text there. The text was created using Open X html boxes to help us with formatting, but is this causing an SEO problem?
Intermediate & Advanced SEO | | theLotter0 -
Privacy Policy & T&C's SEO related question
With Adwords they request a Privacy Policy and T&C's sometimes for an Ad to be approved. Silly question I know but do you think Google looks out for pages like this to identity websites which are more genuine for organic? Thanks
Intermediate & Advanced SEO | | activitysuper0 -
Yoast meta description in ' ' instead of " " problem
Hi Guys this is really strange, i am using yoast seo for wordpress on two sites. On both sites i am seeing meta name='description' instead of meta name="description" And this is why google is probably not reading it correctly, on many other link submission sites which read your meta data automatically say site blocked. How to i fix this? Thanks
Intermediate & Advanced SEO | | SamBuck0