Block Baidu crawler?
-
Hello!
One of our websites receives a large amount of traffic from the Baidu crawler. We do not have any Chinese content or do any business with China since our market is Uk.
Is it a good idea to block the Baidu crawler in the robots.txt or could it have any adverse effects on SEO of our site?
What do you suggest?
-
I'm also trying to get this done as well, not sure if its doable on Volusion(don't use them).
Yandex actually crawls more than Baidu for me, and both don't benefit me at all(sucks when you pay for the bandwidth)
-
Thanks for that I have just looked that up-I didn't realise that this was such a common problem.
-
Hi
Further to Ally's answer, in my experiance Baidu tends to ignor the robot.txt, so just do it on the server side.
S
-
Thanks Ally for your answer, will now block Baidu
-
Hi Stefan,
You can block the Baidu crawler in in the robots.txt.
There should be no adverse affect to your site. As this is not an area you are targeting and has no future long term benerfit to your business. Blocking the crawler will mean that your server has less load to deal with from the unnecessary traffic you have been receiving.
You can block the spiders in the following ways:
- Robots.txt (below is code for Baidu)
User-agent: Baiduspider
User-agent: Baiduspider-video
User-agent: Baiduspider-image
Disallow: /- Blocking Spiders via the Apache Configuration File httpd.conf
See the below article for more details on this method
http://searchenginewatch.com/article/2067357/Bye-bye-Crawler-Blocking-the-Parasites
You may also want to check out:
I hope this helps,
Ally
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Adding directories to robots nofollow cause pages to have Blocked Resources
In order to eliminate duplicate/missing title tag errors for a directory (and sub-directories) under www that contain our third-party chat scripts, I added the parent directory to the robots disallow list. We are now receiving a blocked resource error (in Webmaster Tools) on all of the pages that have a link to a javascript (for live chat) in the parent directory. My host is suggesting that the warning is only a notice and we can leave things as is without worrying about the page being de-ranked/penalized. I am wondering if this is true or if we should remove the one directory that contains the js from the robots file and find another way to resolve the duplicate title tags?
Technical SEO | | miamiman1000 -
Blocked jquery in Robots.txt, Any SEO impact?
I've heard that Google is now indexing links and stuff available in javascript and jquery. My webmastertools is showing that some links are blocked in robots.txt of jquery. Sorry I'm not a developer or designer. I want to know is there any impact of this on my SEO? and also how can I unblock it for the robots? Check this screenshot: http://i.imgur.com/3VDWikC.png
Technical SEO | | hammadrafique0 -
Block or remove pages using a robots.txt
I want to use robots.txt to prevent googlebot access the specific folder on the server, Please tell me if the syntax below is correct User-Agent: Googlebot Disallow: /folder/ I want to use robots.txt to prevent google image index the images of my website , Please tell me if the syntax below is correct User-agent: Googlebot-Image Disallow: /
Technical SEO | | semer0 -
Should I block robots from URLs containing query strings?
I'm about to block off all URLs that have a query string using robots.txt. They're mostly URLs with coremetrics tags and other referrer info. I figured that search engines don't need to see these as they're always better off with the original URL. Might there be any downside to this that I need to consider? Appreciate your help / experiences on this one. Thanks Jenni
Technical SEO | | ShearingsGroup0 -
What does it mean by 'blocked by Meta Robot'? How do I fix this?
When i get my crawl diagnostics, I am getting a blocked by Meta Robot, which means that my page is not being indexed in the search engines... obviously this is a major issue for organic traffic!!! What does it actually mean, and how can i fix it?
Technical SEO | | rolls1230 -
How to block google robots from a subdomain
I have a subdomain that lets me preview the changes I put on my site. The live site URL is www.site.com, working preview version is www.site.edit.com The contents on both are almost identical I want to block the preview version (www.site.edit.com) from Google Robots, so that they don't penalize me for duplicated content. Is it the right way to do it: User-Agent: * Disallow: .edit.com/*
Technical SEO | | Alexey_mindvalley0