Baidu Spider Spam
-
Baidu Spider has been hitting my UK site every 5 minutes, every day, for the past 2 years.
It doesn't even check whether a domain still exists.
I know this because, looking at /etc/httpd/logs/error_log, I am getting hits every 5 minutes from Baidu spider trying to access a domain which points to my server but no longer exists.
Given that I have absolutely no trade with China, and given that the only spam comments I get on my WordPress blog originate from China, do you think it's a good idea to either do a China country block in my .htaccess or block out Baidu spider?
Baidu is consuming bandwidth and clogging my error logs!
Why is it that Google, Bing, Yahoo, etc. can all crawl my site nicely, but Baidu just abuses it?
-
Hi, I've tried Cloudflare before.
The problem is that I am using SSL for some of my pages, so Cloudflare doesn't play nice unless I pay them.
Also, I am using Amazon's CDN; does that work with Cloudflare, or is it a bit OTT?
I will take a look at your links, and thanks!
-
I just remembered another tool that you can easily add to your site to block the bots, simply by telling it not to trust that hostname or IP.
In fact, with Cloudflare you can block anything looking for that old domain.
It's a free service with very good DNS; I would use it if you must.
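For example, a firewall rule along these lines would catch both the bot and the dead domain (a sketch in Cloudflare's rule-expression syntax; "old-domain.example" is a placeholder for your expired domain):

(http.user_agent contains "Baiduspider") or (http.host eq "old-domain.example")

Set the rule's action to Block and those requests never reach your server at all.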
Sincerely,
Thomas
-
The complete block is here.
Baidu (CN)
Info: http://www.baidu.com/search/spider.htm
Required robots.txt code:
User-agent: Baiduspider
User-agent: Baiduspider-video
User-agent: Baiduspider-image
Disallow: /
http://searchenginewatch.com/article/2067357/Bye-bye-Crawler-Blocking-the-Parasites
http://forums.oscommerce.com/topic/382923-baiduspider-using-multiple-user-agents-how-to-stop-them/
-
It should respect robots.txt, so it may be someone pretending to be Baidu. I would try .htaccess if you're not looking to go near China, etc.
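If you do go the .htaccess route, here is a minimal sketch for Apache 2.4 (it assumes mod_setenvif is enabled and that your host allows these overrides; on Apache 2.2 you would use Order/Deny with env=bad_bot instead):

# Tag any request whose User-Agent mentions Baiduspider...
SetEnvIfNoCase User-Agent "Baiduspider" bad_bot
# ...and refuse it with a 403 before it reaches the site.
<RequireAll>
    Require all granted
    Require not env bad_bot
</RequireAll>

Unlike robots.txt, this is enforced by the server, so it works even if the crawler ignores your rules.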
-
Make sure you're not running an odd plugin that may be causing a caching issue. I know it sounds strange, but I've heard of this before, and it was caused by an All-in-One Event Calendar plugin.
If it's not something like that, I definitely agree with what Chris said. Good call on that, Chris.
However, if there is no domain, you will have to implement the robots.txt on whatever site your server is currently serving.
If you want a free tool that will let you create a solid block, here's one below; however, Chris has done a great job of creating one already.
http://www.internetmarketingninjas.com/seo-tools/robots-txt-generator/
Sincerely,
Thomas
-
User-agent: Baiduspider
User-agent: baiduspider
User-agent: Baiduspider+
Disallow: /
Baidu spider is blocked, but it doesn't seem to care!
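That's expected: robots.txt is only a request, and a misbehaving crawler (or anything spoofing Baidu's user agent) is free to ignore it, so the block has to happen at the server. A minimal mod_rewrite sketch for .htaccess ("old-domain.example" is a placeholder for the expired domain that still points at your server):

RewriteEngine On
# Refuse any request addressed to the dead domain...
RewriteCond %{HTTP_HOST} ^(www\.)?old-domain\.example$ [NC,OR]
# ...or any request identifying itself as Baiduspider.
RewriteCond %{HTTP_USER_AGENT} Baiduspider [NC]
# Send back 403 Forbidden and stop processing further rules.
RewriteRule ^ - [F,L]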
-
Have you tried blocking it in robots.txt?
#Baiduspider
User-agent: Baiduspider
Disallow: /