Block bad crawlers
-
Hi! How are you?
I've been working on some of my sites and noticed that I'm getting lots of crawls from search engines that I'm not interested in ranking well on.
My question is the following: do you have a list of badly behaved search engines that take lots of bandwidth and don't send much (or good) traffic?
If so, do you know how to block them using robots.txt?
Thanks for the help!
Best wishes,
Ariel
-
Hey Ariel,
Here are a couple of lists of bots that some people are blocking. You should probably review your server data first to see which of the bots visiting you are ones you want to block:
In addition to the Moz resource Chris referenced, here are a couple more pages that might be useful for you:
- http://stackoverflow.com/questions/10793906/how-to-allow-known-web-crawlers-and-block-spammers-and-harmful-robots-from-scann
- http://www.distilled.net/u/robots-txt/
Good luck!
-
Chris gives a good answer, but is it really a problem? Bandwidth is very cheap these days; in fact, here in Australia most accounts are unlimited.
I host with Microsoft Azure, and bandwidth there is inexpensive.
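Whether crawler bandwidth is a real cost is easy to check against your own access logs before blocking anything. Below is a minimal Python sketch, assuming your server writes the common Apache/Nginx "combined" log format; the function name `bandwidth_by_agent` and the sample lines are hypothetical. It tallies requests and bytes sent per user-agent string, which shows which crawlers are actually heavy.

```python
import re
from collections import Counter

# Combined Log Format, the usual Apache/Nginx "combined" layout:
#   host ident user [time] "request" status bytes "referer" "user-agent"
# Assumption: your server logs in this format; adjust the pattern otherwise.
LINE_RE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} (?P<bytes>\d+|-) '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def bandwidth_by_agent(lines):
    """Return (requests, bytes_sent) Counters keyed by user-agent string."""
    requests = Counter()
    bytes_sent = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # skip lines that are not in combined format
        agent = match.group("agent")
        requests[agent] += 1
        if match.group("bytes") != "-":  # "-" means no body was sent
            bytes_sent[agent] += int(match.group("bytes"))
    return requests, bytes_sent


if __name__ == "__main__":
    # Hypothetical sample lines; pass an open log file in real use.
    sample = [
        '1.2.3.4 - - [10/Oct/2013:13:55:36 +0000] "GET / HTTP/1.1" 200 5120 "-" "ExampleBot/1.0"',
        '1.2.3.4 - - [10/Oct/2013:13:55:37 +0000] "GET /a HTTP/1.1" 200 2048 "-" "ExampleBot/1.0"',
        '5.6.7.8 - - [10/Oct/2013:13:56:00 +0000] "GET / HTTP/1.1" 304 - "-" "Mozilla/5.0"',
    ]
    requests, bytes_sent = bandwidth_by_agent(sample)
    # List the heaviest user agents (by bytes sent) first.
    for agent in sorted(requests, key=lambda a: bytes_sent[a], reverse=True):
        print(f"{agent}: {requests[agent]} requests, {bytes_sent[agent]} bytes")
```

To run it on a real log, pass an open file, e.g. `with open("access.log") as f: requests, bytes_sent = bandwidth_by_agent(f)`; the path is a placeholder. Lines in any other format are silently skipped, so adjust `LINE_RE` if your server logs differently.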
-
Ariel, you could start with the list shown here and tailor it to fit your needs if you're having problems with other bots: http://www.webmasterworld.com/search_engine_spiders/4579553.htm. There's information there on using robots.txt to block them. You should also read this for more on using the robots.txt file: Robots.txt and Meta Robots - SEO Best Practices - Moz
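Blocking a crawler in robots.txt means giving it its own `User-agent` group with `Disallow: /`. Before uploading the file, you can check it with Python's standard-library `urllib.robotparser`. This is only a sketch: `ExampleBot` and `AnotherBot` are placeholder names, not a list of known bad bots, and the URL is hypothetical, so substitute the user-agent tokens you actually find in your logs. Keep in mind that robots.txt is a request, not an enforcement mechanism: well-behaved crawlers honor it, but badly behaved ones can simply ignore it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the bot names are placeholders; replace them
# with the user-agent tokens you actually see in your own logs.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: AnotherBot
Disallow: /

User-agent: *
Disallow:
"""


def check_agents(robots_txt, agents, url="https://www.example.com/some-page"):
    """Return {agent: allowed?} for one URL under the given robots.txt rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    results = check_agents(ROBOTS_TXT, ["ExampleBot/2.1", "AnotherBot", "Googlebot"])
    for agent, allowed in results.items():
        print(agent, "allowed" if allowed else "blocked")
```

`RobotFileParser` applies a group when its `User-agent` token appears (case-insensitively) in the crawler's name, so `ExampleBot/2.1` falls under the `ExampleBot` group, while `Googlebot` falls through to the `*` group, whose empty `Disallow:` allows everything.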
Related Questions
-
Can I Block HTTPS URLs Using the Host Directive in robots.txt?
Hello Moz Community, Recently, I have found that Googlebot has started crawling the HTTPS URLs of my website, which is increasing the number of duplicate pages on our website. Instead of creating a separate robots.txt file for the HTTPS version of my website, can I use the Host directive in robots.txt to tell Googlebot which is the original version of the website? Host: http://www.example.com I was wondering if this method will work and tell Googlebot that the HTTPS URLs are mirrors of this website. Thanks for all of the great responses! Regards,
Technical SEO | | TJC.co.uk
Ramendra0 -
Webmaster console: increase in the number of URLs we were blocked from crawling due to authorization permission errors.
Hi guys, I received this warning in my Webmaster console: "Google detected a significant increase in the number of URLs we were blocked from crawling due to authorization permission errors." So I went to the "Crawl Errors" section and found errors like these under the "Access denied" status: ?page_name=Cheap+Viagra+Gold+Online&id=471 ?page_name=Cheapest+Viagra+Us+Licensed+Pharmacies&id=1603 and many more URLs like these. Does anybody know what this is and where it comes from? Thanks in advance!
Technical SEO | | odmsoft0 -
Should I Remove Thousands of Bad Links over a Short Time or Long Time?
Hey Moz Community! I've got a website that has hundreds of thousands of old links that don't really offer any great content. They need to be removed. Would it be better to remove them in batches of 5,000, 10,000, or more over a long time, or to remove them all at once because it doesn't matter? Cheers, Alex
Technical SEO | | Anti-Alex0 -
Why is the crawler saying I have 9 Duplicate Page Titles?
Hi, I received my weekly web crawl report and it says this:
| 4 | Duplicate Page Content |
| 22 | Missing Meta Description Tag |
| 9 | Duplicate Page Title |
| 1 | Title Element Too Long (> 70 Characters) |
| 1 | Title Element Too Short |
| 1 | 301 (Permanent Redirect) |
I'm new to SEO and don't know how to fix this. I don't really see how I have Duplicate Page Content or Duplicate Page Titles. This is my website: afrohairsolutions.co.uk Thank you in advance.
Technical SEO | | afrohairsolutions0 -
Are bad links the reason for not ranking?
Hello Moz community. I'm looking for some input from the experts on what could be wrong with a site I'm working on. The site is in Spanish, but I'm sure you'll get the idea. We want to rank the site on the first page of Google Mexico (www.google.com.mx) for the keyword "refacciones Audi" and some other brands (refacciones = replacement parts would probably be a good translation, just FYI). Now, our page hasn't been completely optimized, so in my mind it's OK not to be on the first page yet. However, our main competitor is ranking on the first page for all the keywords we want to rank for, but when you check their site, you'll find there is hardly any content, no keywords are being used in their content, all pages have the exact same title and meta description, and their catalog is on a completely different domain. In short, no SEO whatsoever. Looking at Moz data, our site has a DA of 26, while our competitor's has a DA of 10. They have no external backlinks at all, while we have a few hundred. This leaves me scratching my head: how can a completely non-optimized site outrank us? I decided to check our backlink profile, and a previous SEO agency seems to have built MANY fake blogs with lots of backlinks with rich anchor text. Quite a big percentage of our backlinks are of this kind, so this is the only thing I can think of that could be affecting our ranking. Will disavowing be our solution? If you'd like to check, our site is: www.refaccionariaalemana.com.mx Our competitor's is: www.saferefacciones.com ANY help will be extremely appreciated as I feel a bit lost. Thanks!
Technical SEO | | EduardoRuiz1 -
Pointing Other URLs to My Site? Good or bad for ranking?
A few years ago I purchased a few keyword-rich domain names and set up some satellite sites. Spammy, I now know. What should I do now? I own the domain names for at least another 3 years. Should I point them to my main site, or would that hurt my main site's ranking?
Technical SEO | | caisson0 -
Deleting 30,000 pages all at once - good idea or bad idea?
We have 30,000 pages that we want to get rid of. Each product within our database has its own page, and these particular 30,000 products are not relevant anymore. They have very little content on them and are basically the exact same page with a few title changes. We no longer want them weighing down our database, so we are going to delete them. My question is: should we get rid of them in smaller batches, like 2,000 pages at a time, or is it better to get rid of all of them in one fell swoop? Which is least likely to raise a flag with Google? Does anyone have any experience with this?
Technical SEO | | Viewpoints0 -
How is my competition causing bad crawl errors and links on my site?
We have a competitor with whom we are in a legal dispute at the moment, and they are using underhand tactics to give us bad links and crawl errors. I do not know how they are doing it or how to stop it. The crawl errors we are getting show the site with two URLs joined together, for example www.testsite.com/www.testsite.com; other errors are for pages that we do not even have, pages that are spelt wrong, or pages with a dot after the page name. We have been told by a number of people in our field that this has also happened to them, and I would like to know how it is being done so we can have it stopped. Since they started doing this, our traffic has gone down by half.
Technical SEO | | ClaireH-1848860