Block Baidu crawler?

AJPro

Hello!

One of our websites receives a large amount of traffic from the Baidu crawler. We do not have any Chinese content or do any business with China, as our market is the UK.

Is it a good idea to block the Baidu crawler in robots.txt, or could it have adverse effects on our site's SEO?

What do you suggest?

William.Lau

I'm also trying to do this, though I'm not sure whether it's doable on Volusion (I don't use them).

Yandex actually crawls my site more than Baidu does, and neither benefits me at all (which hurts when you pay for the bandwidth).
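To see how much bandwidth each crawler is actually using before deciding what to block, you could tally response bytes per crawler from your access log. The sketch below assumes the Apache/Nginx "combined" log format; the crawler tokens, the function name `bytes_per_crawler` and the sample lines are hypothetical and only for illustration.

```python
import re
from collections import Counter

# Assumed "combined" log format:
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} (?P<bytes>\d+|-) "[^"]*" "(?P<agent>[^"]*)"'
)

# Hypothetical list of User-Agent tokens to tally; extend as needed.
CRAWLERS = ("Baiduspider", "YandexBot", "Googlebot", "bingbot")


def bytes_per_crawler(lines):
    """Sum response bytes per crawler token found in the User-Agent field."""
    totals = Counter()
    for line in lines:
        match = COMBINED.match(line)
        if not match:
            continue  # skip lines that are not in combined format
        size = match.group("bytes")
        agent = match.group("agent").lower()
        for token in CRAWLERS:
            if token.lower() in agent:
                # "-" means no body was sent (e.g. a 304 response)
                totals[token] += 0 if size == "-" else int(size)
                break
    return totals


# Made-up sample lines standing in for a real access log.
SAMPLE_LOG = [
    '1.2.3.4 - - [10/Oct/2013:13:55:36 +0100] "GET / HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"',
    '5.6.7.8 - - [10/Oct/2013:13:56:01 +0100] "GET /shop HTTP/1.1" 200 2048 "-" '
    '"Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"',
    '5.6.7.8 - - [10/Oct/2013:13:56:09 +0100] "GET /about HTTP/1.1" 304 - "-" '
    '"Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"',
]

print(bytes_per_crawler(SAMPLE_LOG))
```

On a real server you would pass an open log file instead of the sample list, since iterating a file object yields lines in the same way.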

LoveFitness

Thanks for that. I have just looked it up; I didn't realise this was such a common problem.

Metropolis

Hi

Further to Ally's answer: in my experience Baidu tends to ignore robots.txt, so just block it on the server side.
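Server-side blocking by User-Agent can be sketched as WSGI middleware, assuming the site is served by a Python WSGI application (an assumption; the thread does not say what stack anyone is using). The names `BlockCrawlers`, `BLOCKED_AGENTS` and `demo_site` are hypothetical. The httpd.conf method mentioned in Ally's answer applies the same idea at the web-server level.

```python
# Hypothetical User-Agent substrings to refuse (matched case-insensitively).
BLOCKED_AGENTS = ("baiduspider",)


class BlockCrawlers:
    """WSGI middleware that answers 403 Forbidden to blocked user agents."""

    def __init__(self, app, blocked=BLOCKED_AGENTS):
        self.app = app
        self.blocked = tuple(token.lower() for token in blocked)

    def __call__(self, environ, start_response):
        # WSGI exposes the User-Agent request header as HTTP_USER_AGENT.
        agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in agent for token in self.blocked):
            body = b"Forbidden"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Everyone else reaches the wrapped application unchanged.
        return self.app(environ, start_response)


def demo_site(environ, start_response):
    """Stand-in for the real site."""
    body = b"Hello"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


app = BlockCrawlers(demo_site)
```

For a quick local test, `app` could be served with `wsgiref.simple_server.make_server`. Note that this matches only the User-Agent header, which any client can set to whatever it likes.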

S

AJPro

Thanks for your answer, Ally. We will now block Baidu.

LoveFitness

Hi Stefan,

You can block the Baidu crawler in robots.txt.

There should be no adverse effect on your site's SEO: China is not a market you are targeting, so Baidu traffic offers no long-term benefit to your business. Blocking the crawler will also reduce the load on your server from the unnecessary traffic you have been receiving.

You can block the spiders in the following ways:

Robots.txt (the rules below block Baidu's crawlers):

User-agent: Baiduspider
User-agent: Baiduspider-video
User-agent: Baiduspider-image
Disallow: /
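Before deploying, the rules above can be sanity-checked with Python's standard-library `urllib.robotparser`. This is only a sketch: `www.example.co.uk` is a placeholder domain, and the parser is given the bare product tokens ("Baiduspider", "Googlebot") rather than full User-Agent strings. It should report both Baidu tokens as blocked and Googlebot as allowed. It cannot, of course, make a crawler that ignores robots.txt obey it.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules from the answer above: several User-agent lines
# sharing one Disallow rule form a single group.
ROBOTS_TXT = """\
User-agent: Baiduspider
User-agent: Baiduspider-video
User-agent: Baiduspider-image
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# www.example.co.uk is a placeholder; the parser only looks at the path.
URL = "https://www.example.co.uk/products/"
for agent in ("Baiduspider", "Baiduspider-image", "Googlebot"):
    allowed = parser.can_fetch(agent, URL)
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```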

Blocking Spiders via the Apache Configuration File httpd.conf

See the article below for more details on this method:

http://searchenginewatch.com/article/2067357/Bye-bye-Crawler-Blocking-the-Parasites

You may also want to check out:

http://www.robotstxt.org/

I hope this helps,

Ally


Moz Q&A is closed.


Related Questions

Moz crawler is not able to crawl my website

Robot.txt : How to block a specific file type in several subdirectories ?

Google insists robots.txt is blocking... but it isn't.

Should I block robots from URLs containing query strings?

OK to block /js/ folder using robots.txt?

What is the best method to block a sub-domain, e.g. staging.domain.com/ from getting indexed?

Does using parentheses affect the crawlers?

Is blocking RSS Feeds with robots.txt necessary?
