May know what's the meaning of these parameters in .htaccess?
-
Begin HackRepair.com Blacklist
RewriteEngine on
Abuse Agent Blocking
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Bolt\ 0 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craftbot@yahoo.com [NC,OR]
RewriteCond %{HTTP_USER_AGENT} CazoodleBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Custo [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Default\ Browser\ 0 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^DIIbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo [NC,OR]
RewriteCond %{HTTP_USER_AGENT} discobot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^eCatch [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ecxi [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^FlashGet [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^GetRight [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^GetWeb! [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^GrabNet [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Grafula [NC,OR]
RewriteCond %{HTTP_USER_AGENT} GT::WWW [NC,OR]
RewriteCond %{HTTP_USER_AGENT} heritrix [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^HMView [NC,OR]
RewriteCond %{HTTP_USER_AGENT} HTTP::Lite [NC,OR]
RewriteCond %{HTTP_USER_AGENT} HTTrack [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ia_archiver [NC,OR]
RewriteCond %{HTTP_USER_AGENT} IDBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} id-search [NC,OR]
RewriteCond %{HTTP_USER_AGENT} id-search.org [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Indy\ Library [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^InterGET [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^InternetSeer.com [NC,OR]
RewriteCond %{HTTP_USER_AGENT} IRLbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ISC\ Systems\ iRc\ Search\ 2.1 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Java [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^JetCar [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^larbin [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [NC,OR]
RewriteCond %{HTTP_USER_AGENT} libwww [NC,OR]
RewriteCond %{HTTP_USER_AGENT} libwww-perl [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Link [NC,OR]
RewriteCond %{HTTP_USER_AGENT} LinksManager.com_bot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} linkwalker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} lwp-trivial [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Maxthon$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} MFC_Tear_Sample [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^microsoft.url [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Microsoft\ URL\ Control [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Missigua\ Locator [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*Indy [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.NEWT [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^MSFrontPage [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Navroad [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^NearSite [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^NetAnts [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^NetSpider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Nutch [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Octopus [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [NC,OR]
RewriteCond %{HTTP_USER_AGENT} panscient.com [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^pavuk [NC,OR]
RewriteCond %{HTTP_USER_AGENT} PECL::HTTP [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^PeoplePal [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [NC,OR]
RewriteCond %{HTTP_USER_AGENT} PHPCrawl [NC,OR]
RewriteCond %{HTTP_USER_AGENT} PleaseCrawl [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^psbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^RealDownload [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^ReGet [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Rippers\ 0 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} SBIder [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^SeaMonkey$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^sitecheck.internetseer.com [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Snoopy [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Steeler [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Surfbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Toata\ dragostea\ mea\ pentru\ diavola [NC,OR]
RewriteCond %{HTTP_USER_AGENT} URI::Fetch [NC,OR]
RewriteCond %{HTTP_USER_AGENT} urllib [NC,OR]
RewriteCond %{HTTP_USER_AGENT} User-Agent [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Web\ Sucker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} webalta [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebAuto [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Bb]andit [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebCollage [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebFetch [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebGo\ IS [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebLeacher [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebSauger [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ Quester [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Wells\ Search\ II [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WEP\ Search [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Widow [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WWW-Mechanize [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WWWOFFLE [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} zermelo [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.)Zeus.Webster [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ZyBorg [NC]
RewriteRule ^. - [F,L]Abuse bot blocking rule end
End HackRepair.com Blacklist
-
Now it's clear. Thanks a lot ThompsonPaul!
-
Thanks!
Typically these blacklists are created and maintained by security specialists who have done testing on the different bots to determine which are legit/beneficial and which are crapbots. They then provide these lists for others to use. Often the lists are amalgamations of bots detected and analysed on a number of different sites and by a number of different specialists to act as a double-check for each other.
You do need to be careful that you are using a well-curated list, as carelessly blocking bots can cause problems for legitimate bots. You would check out the creator of such a list the same way you'd check out the creator of a plugin you're considering using - check reviews, look at comments and responses on the post that provides the blacklist etc.
That answer your question?
Paul
-
Hi ThompsonPaul,
Wow! Superb explanation. One thing I just want to clarify, how would I know if these bots are "bad bots".
Thanks a lot!
-
As Lynn mentions, these entries form a blacklist for "bad bots". These are bots that are identified as being harmful (or at least non-helpful) to the real use of a website. Bots are essentially spiders that crawl and record the pages of your site the same way the GoogleBot does.There are 2 main reasons for blocking them
-
Too many unnecessary bots can put a real strain on server resources, causing the site to slow down for real users. This can be especially problematic with bad bots as they do not respect the entries in your robots.txt file and so will crawl even blocked pages. This can mean huge numbers of extra pages get crawled, leading to even more load.
-
Many (most?) of these bots are collecting data for nefarious purposes. Some are scrapers to collect your site content in order to re-use it illegally on another site, some are scanning for certain files/plugins on your site known to be insecure so they can target them for attack, etc.
Best case scenario, these bots waste your bandwidth and can cause site slowdowns on low-powered (e.g. shared) servers. Worst case, they can actually cause harm to your site.
There are literally many thousands of these types of bots out there, and their creators often change their identifying user agents just to get around these types of blacklists. But many have been around for some time and still use the same identifier. So having a blacklist to block the most common of them is actually very good security practice. To be totally proactive however, you'd need to update the list every couple of months.
Bottom line - those entries are providing some security and overload protection for your site, and there's essentially no downside to having them in place even if they're not catching everything.
Hope that helps - if any of my explanation isn't clear, just holler
Paul
-
-
Thanks Lynn! I'll just remove these parameters and leave this one:
BEGIN WordPress
<ifmodule mod_rewrite.c="">RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
Rewritecond %{http_host} ^domain.com [NC]
Rewriterule ^(.*)$ http://www.domain.com/$1 [R=301,NC]</ifmodule>END WordPress
-
I dont use something like this myself. I suppose if you are having some problem with bots it might be useful, maybe someone else can chime in if they have some experience with this kind of blocking.
-
Thanks Lynn! Is this really necessary?
-
HI,
It is checking to see if the visiting user agent contains any of these strings (NC is telling it non case sensitive) and if it does to return a 403 forbidden message.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Removing indexed internal search pages from Google when it's driving lots of traffic?
Hi I'm working on an E-Commerce site and the internal Search results page is our 3rd most popular landing page. I've also seen Google has often used this page as a "Google-selected canonical" on Search Console on a few pages, and it has thousands of these Search pages indexed. Hoping you can help with the below: To remove these results, is it as simple as adding "noindex/follow" to Search pages? Should I do it incrementally? There are parameters (brand, colour, size, etc.) in the indexed results and maybe I should block each one of them over time. Will there be an initial negative impact on results I should warn others about? Thanks!
Intermediate & Advanced SEO | | Frankie-BTDublin0 -
Can subdomains hurt your primary domain's SEO?
Our primary website https://domain.com has a subdomain https://subDomain.domain.com and on that subdomain we have a jive-hosted community, with a few links to and fro. In GA they are set up as different properties but there are many SEO issues in the jive-hosted site, in which many different people can create content, delete content, comment, etc. There are issues related to how jive structures content, broken links, etc. My question is this: Aside from the SEO issues with the subdomain, can the performance of that subdomain negatively impact the SEO performance and rank of the primary domain? I've heard and read conflicting reports about this and it would be nice to hear from the MOZ community about options to resolve such issues if they exist. Thanks.
Intermediate & Advanced SEO | | BHeffernan1 -
Do I need 301's if I use HSTS in HTTP to HTTPS migration?
Just wondering if this was strong enough signal to search engines that we don't need to write a 301 rule in .htaccess.
Intermediate & Advanced SEO | | KevinBudzynski0 -
Community Discussion - What's the ROI of "pruning" content from your ecommerce site?
Happy Friday, everyone! 🙂 This week's Community Discussion comes from Monday's blog post by Everett Sizemore. Everett suggests that pruning underperforming product pages and other content from your ecommerce site can provide the greatest ROI a larger site can get in 2016. Do you agree or disagree? While the "pruning" tactic here is suggested for ecommerce and for larger sites, do you think you could implement a similar protocol on your own site with positive results? What would you change? What would you test?
Intermediate & Advanced SEO | | MattRoney2 -
Are This Site's Backlinks Hurting Us?
Google WMT reports more than 198,000 backlinks to our site (www.audiobooksonline.com) from http://dilandau.eu/? We have never been notified by Google of any penalty, malware notification... but continue to struggle to get our page 1 Google ranking back since Panda. Could these backlinks be hurting our Google ranking? Should I implement a disavow rule for http://dilandau.eu/?
Intermediate & Advanced SEO | | lbohen0 -
Killing 404 errors on our site in Google's index
Having moved a site across to Magento, obviously re-directs were a large part of that, ensuring all the old products and categories linked up correctly with the new site structure. However, we came up against an issue where we needed to add, delete, then re-add products. This, coupled with a misunderstanding of the csv upload processing, meant that although the old urls redirected, some of the new Magento urls changed and then didn't redirect: For Example: mysite/product would get deleted re-added and become: mysite/product-1324 We now know what we did wrong to ensure it doesn't continue to happen if we weret o delete and re-add a product, but Google contains all these old URLs in its index which has caused people to search for products on Google, click through, then land on the 404 page - far from ideal. We kind of assumed, with continual updating of sitemaps and time, that Google would realise and update the URL accordingly. But this hasn't happened - we are still getting plenty of 404 errors on certain product searches (These aren't appearing in SEOmoz, there are no links to the old URL on the site, only Google, as the index contains the old URL). Aside from going through and finding the products affected (no easy task), and setting up redirects for each one, is there any way we can tell Google 'These URLs are no longer a thing, forget them and move on, let's make a fresh start and Happy New Year'?
Intermediate & Advanced SEO | | seanmccauley0 -
How to do a 301 redirect for url's with this structure?
In an effort to clean up my url's I'm trying to shorten them by using a 301 redirect in my .htaccess file. How would I set up a rule to grab all urls with a specific structure to a new shorter url examples: http://www.yakangler.com/articles/reviews/other-reviews/item/article-title http://www.yakangler.com/reviews/article-title So in the example above dynamically redirect all url's with /articles/reviews/other-reviews/item/ in it to /reviews/ so http://www.yakangler.com/articles/reviews/boat-reviews/item/1550-review-nucanoe-frontier http://www.yakangler.com/articles/reviews/other-reviews/item/1551-review-spyderco-salt http://www.yakangler.com/articles/reviews/fishing-gear-reviews/item/1524-slayer-inc-sinister-swim-tail would be... http://www.yakangler.com/reviews/1550-review-nucanoe-frontier http://www.yakangler.com/reviews/1551-review-spyderco-salt http://www.yakangler.com/reviews/1524-slayer-inc-sinister-swim-tail with one 301 redirect rule in my .htaccess file.
Intermediate & Advanced SEO | | mr_w0 -
How 'Off Topic' can I go - site wide?
Hello, I am currently number 1 for a competitive keyword - so don't want to push the wrong button and self destruct! My site is highly focused on one relatively narrow niche with about 50-60 pages of content bang on topic. I was wondering if Google will discredit my site in any way if I start adding pages that are** 'loosely related' **to the overall theme of my niche. Some of them are what you might call sister concepts with maybe one mention of my target keyword in the body..... Does the algo value what percentage of the whole site's content is on/ off topic? If so how important is this as a factor? Thanks a lot
Intermediate & Advanced SEO | | philipjterry0