Block bad crawlers
-
Hi! How are you?
I've been working on some of my sites and noticed that I'm getting a lot of crawls from search engines that I'm not interested in ranking well in.
My question is this: do you have a list of badly behaved search engines that use a lot of bandwidth and don't send much (or good) traffic?
If so, do you know how to block them using robots.txt?
Thanks for the help!
Best wishes,
Ariel
-
Hey Ariel,
Some people publish lists of the bots they block, but you should review your own server data first to see which bots are actually visiting you and which of those you want to block.
In addition to the Moz resource Chris referenced, here are a couple more pages that might be useful for you:
- http://stackoverflow.com/questions/10793906/how-to-allow-known-web-crawlers-and-block-spammers-and-harmful-robots-from-scann
- http://www.distilled.net/u/robots-txt/
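As a rough illustration of reviewing your server data, here is a minimal Python sketch that totals the bytes served per user agent. It assumes your access log uses the common "combined" format (status, size, referer, user agent at the end of each line); the function name `bandwidth_by_agent`, the bot name `ExampleBot`, and the sample log lines are all made up for the example, so adapt the pattern to whatever your server actually writes.

```python
import re
from collections import Counter

# Assumption: "combined" log format, where each line ends with
#   <status> <bytes> "<referer>" "<user-agent>"
LINE_TAIL = re.compile(r'" (\d{3}) (\d+|-) "[^"]*" "([^"]*)"\s*$')


def bandwidth_by_agent(lines):
    """Return a Counter mapping user-agent string -> total bytes served."""
    totals = Counter()
    for line in lines:
        match = LINE_TAIL.search(line)
        if not match:
            continue  # skip lines that do not fit the assumed format
        _status, size, agent = match.groups()
        # A size of "-" means no body was sent, so count it as 0 bytes.
        totals[agent] += 0 if size == "-" else int(size)
    return totals


# Hypothetical sample lines; in practice, pass an open log file instead.
sample_log = [
    '203.0.113.5 - - [10/Oct/2013:13:55:36 +0000] "GET / HTTP/1.1" 200 2326 "-" "ExampleBot/1.0"',
    '203.0.113.5 - - [10/Oct/2013:13:55:40 +0000] "GET /a HTTP/1.1" 200 1000 "-" "ExampleBot/1.0"',
    '198.51.100.7 - - [10/Oct/2013:13:56:01 +0000] "GET / HTTP/1.1" 304 - "-" "Mozilla/5.0"',
]

totals = bandwidth_by_agent(sample_log)
for agent, total_bytes in totals.most_common():
    print(f"{total_bytes:>10}  {agent}")
```

Sorting by bytes puts the heaviest crawlers at the top, which gives you a short list to compare against the traffic each one actually sends before you decide what to block.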
Good luck!
-
Chris gives a good answer, but is it really a problem? Bandwidth is very cheap these days; in fact, here in Australia most accounts are unlimited.
I host with Microsoft Azure, and bandwidth there is very cheap.
-
Ariel, you could start with the list shown here and tailor it to fit your needs if you're having problems with other bots: http://www.webmasterworld.com/search_engine_spiders/4579553.htm. There's info there on using robots.txt to block them, and you should also read this for more on how a robots.txt file works: Robots.txt and Meta Robots - SEO Best Practices - Moz
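To make the robots.txt part concrete, here is a small Python sketch that builds a robots.txt from a list of user-agent names and then checks the result with the standard library's `urllib.robotparser`. The bot names `ExampleBot` and `AnotherBot`, the helper `build_robots_txt`, and the example.com URL are placeholders, not recommendations; substitute the names you find in your own logs. Keep in mind that robots.txt is only a request, so a crawler that ignores the rules has to be blocked at the server level instead.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(blocked_agents):
    """Return robots.txt text that disallows everything for the given agents."""
    blocks = [f"User-agent: {agent}\nDisallow: /" for agent in blocked_agents]
    # An empty Disallow leaves every other crawler unrestricted.
    blocks.append("User-agent: *\nDisallow:")
    return "\n\n".join(blocks) + "\n"


# Hypothetical bot names used purely for illustration.
robots_txt = build_robots_txt(["ExampleBot", "AnotherBot"])
print(robots_txt)

# Sanity check: parse the generated rules the way a compliant crawler would.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

blocked_ok = not parser.can_fetch("ExampleBot", "http://www.example.com/page")
others_ok = parser.can_fetch("Googlebot", "http://www.example.com/page")
print("ExampleBot blocked:", blocked_ok)
print("Other crawlers allowed:", others_ok)
```

Save the generated text as `robots.txt` in the root of the site. Checking it with the parser first is a cheap way to confirm that you have not accidentally shut out the search engines you do want.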