Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Can't crawl website with Screaming frog... what is wrong?
-
Hello all - I've just been trying to crawl a site with Screaming Frog and can't get beyond the homepage - have done the usual stuff (turn off JS and so on) and no problems there with nav and so on- the site's other pages have indexed in Google btw.
Now I'm wondering whether there's a problem with this robots.txt file, which I think may be auto-generated by Joomla (I'm not familiar with Joomla...) - are there any issues here? [just checked... and there isn't!]
If the Joomla site is installed within a folder such as at
e.g. www.example.com/joomla/ the robots.txt file MUST be
moved to the site root at e.g. www.example.com/robots.txt
AND the joomla folder name MUST be prefixed to the disallowed
path, e.g. the Disallow rule for the /administrator/ folder
MUST be changed to read Disallow: /joomla/administrator/
For more information about the robots.txt standard, see:
http://www.robotstxt.org/orig.html
For syntax checking, see:
http://tool.motoricerca.info/robots-checker.phtml
User-agent: *
Disallow: /administrator/
Disallow: /bin/
Disallow: /cache/
Disallow: /cli/
Disallow: /components/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /layouts/
Disallow: /libraries/
Disallow: /logs/
Disallow: /modules/
Disallow: /plugins/
Disallow: /tmp/ -
For anyone wondering; The answer above by Ecommerce Site (odd name btw) works - 21-Nov-2016.
-
This is the best I could find to so someone who had a similar problem with Joomla-
"In the premium version you can slow down the crawl rate under 'speed' in the configuration. In the free lite version, you can crawl the site and then right click on any URLs with a 403 response and press 're-spider'. The server will generally then allow you to crawl these pages (and return a 200 ok response) as you're not requesting too many at once, so you might have to re-spider them individually."
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Does Google ignore content styled with 'display:none'?
Do you know if an H1 within a div that has a 'display: none' style applied will still be crawled and evaluated by Google? We have that situation on this page on line 136: view-source:https://www.junk-king.com/services/items-we-take/foreclosure-cleanouts Of course we also have an H1 up at the top of the page and are concerned that the second one will cause interference with our SEO efforts. I've seen conflicting and inconclusive information on line - not sure. Thanks for any help.
Intermediate & Advanced SEO | | rastellop0 -
How will changing my website's page content affect SEO?
Our company is looking to update the content on our existing web pages and I am curious what the best way to roll out these changes are in order to maintain good SEO rankings for certain pages. The infrastructure of the site will not be modified except for maybe adding a couple new pages, but existing domains will stay the same. If the domains are staying the same does it really matter if I just updated 1 page every week or so, versus updating them all at once? Just looking for some insight into how freshening up the content on the back end pages could potentially hurt SEO rankings initially. Thanks!
Intermediate & Advanced SEO | | Bankable1 -
Changed all external links to 'NoFollow' to fix manual action penalty. How do we get back?
I have a blog that received a Webmaster Tools message about a guidelines violation because of "unnatural outbound links" back in August. We added a plugin to make all external links 'NoFollow' links and Google removed the penalty fairly quickly. My question, how do we start changing links to 'follow' again? Or at least being able to add 'follow' links in posts going forward? I'm confused by the penalty because the blog has literally never done anything SEO-related, they have done everything via social and email. I only started working with them recently to help with their organic presence. We don't want them to hurt themselves at all, but 'follow' links are more NATURAL than having everything as 'NoFollow' links, and it helps with their own SEO by having clean external 'follow' links. Not sure if there is a perfect answer to this question because it is Google we're dealing with here, but I'm hoping someone else has some tips that I may not have thought about. Thanks!
Intermediate & Advanced SEO | | HashtagJeff0 -
One of my Friend's website Domain Authority is Reducing? What could be the reason?
Hello Guys, One of my friend's website domain authority is decreasing since they have moved their domain from HTTP to https.
Intermediate & Advanced SEO | | Max_
There is another problem that his blog is on subfolder with HTTP.
So, can you guys please tell me how to fix this issue and also it's losing some of the rankings like 2-5 positions down. Here is website URL: myfitfuel.in/
here is the blog URL: myfitfuel.in/mffblog/0 -
Screaming frog Advice
Hi I am trying to crawl my site and it keeps crashing. My sys admins keeps upgrading the virtual box it sits on and it now currently has 8GB of memory, but still crashes. It gets to around 200k pages crawl and dies. Any tips on how I can crawl my whole site, can u use screaming frog to crawl part of a site. Thanks in advance for any tips. Andy
Intermediate & Advanced SEO | | Andy-Halliday0 -
Alt tag for src='blank.gif' on lazy load images
I didn't find an answer on a search on this, so maybe someone here has faced this before. I am loading 20 images that are in the viewport and a bit below. The next 80 images I want to 'lazy-load'. They therefore are seen by the bot as a blank.gif file. However, I would like to get some credit for them by giving a description in the alt tag. Is that a no-no? If not, do they all have to be the same alt description since the src name is the same? I don't want to mess things up with Google by being too aggressive, but at the same time those are valid images once they are lazy loaded, so would like to get some credit for them. Thanks! Ted
Intermediate & Advanced SEO | | friendoffood0 -
How do I get rel='canonical' to eliminate the trailing slash on my home page??
I have been searching high and low. Please help if you can, and thank you if you spend the time reading this. I think this issue may be affecting most pages. SUMMARY: I want to eliminate the trailing slash that is appended to my website. SPECIFIC ISSUE: I want www.threewaystoharems.com to showing up to users and search engines without the trailing slash but try as I might it shows up like www.threewaystoharems.com/ which is the canonical link. WHY? and I'm concerned my back-links to the link without the trailing slash will not be recognized but most people are going to backlink me without a trailing slash. I don't want to loose linkjuice from the people and the search engines not being in consensus about what my page address is. THINGS I"VE TRIED: (1) I've gone in my wordpress settings under permalinks and tried to specify no trailing slash. I can do this here but not for the home page. (2) I've tried using the SEO by yoast to set the canonical page. This would work if I had a static front page, but my front page is of blog posts and so there is no advanced page settings to set the canonical tag. (3) I'd like to just find the source code of the home page, but because it is CSS, I don't know where to find the reference. I have gone into the css files of my wordpress theme looking in header and index and everywhere else looking for a specification of what the canonical page is. I am not able to find it. I'm thinking it is actually specified in the .htaccess file. (4) Went into cpanel file manager looking for files that contain Canonical. I only found a file called canonical.php . the only thing that seemed like it was worth changing was changing line 139 from $redirect_url = home_url('/'); to $redirect_url = home_url(''); nothing happened. I'm thinking it is actually specified in the .htaccess file. (5) I have gone through the .htaccess file and put thes 4 lines at the top (didn't redirect or create the proper canonical link) and then at the bottom of the file (also didn't redirect or create the proper canonical link) : RewriteEngine on
Intermediate & Advanced SEO | | Dillman
RewriteCond %{HTTP_HOST} ^([a-z.]+)?threewaystoharems.com$ [NC]
RewriteCond %{HTTP_HOST} !^www. [NC]
RewriteRule .? http://www.%1threewaystoharems.com%{REQUEST_URI} [R=301,L] Please help friends.0 -
301 doesn't redirect a page that ends in %20, and others being appended with ?q=
I have a product page that ends /product-name**%20** that I'm trying to redirect in this way: Redirect 301 /products/product-name%20 http://www.site.com/products/product-name And it doesn't redirect at all. The others, those with %20, are being redirected to a url hybrid of old and new: http://www.site.com/products/product-name**?q=old-url** I'm using Drupal CMS, and it may be creating rules that counter my entries.
Intermediate & Advanced SEO | | Brocberry0