Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Block Baidu crawler?
-
Hello!
One of our websites receives a large amount of traffic from the Baidu crawler. We do not have any Chinese content or do any business with China, since our market is the UK.
Is it a good idea to block the Baidu crawler in robots.txt, or could it have any adverse effects on our site's SEO?
What do you suggest?
-
I'm trying to get this done as well; not sure if it's doable on Volusion (don't use them).
Yandex actually crawls more than Baidu for me, and neither benefits me at all (sucks when you pay for the bandwidth).
-
Thanks for that. I have just looked that up; I didn't realise that this was such a common problem.
-
Hi
Further to Ally's answer, in my experience Baidu tends to ignore robots.txt, so just do it on the server side.
S
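For anyone wondering what "on the server side" could look like, here is a minimal sketch, assuming the site runs as a Python WSGI application (the thread does not say what stack is in use, so treat that as an assumption). The names `BLOCKED_AGENT_TOKENS`, `is_blocked`, `block_crawlers` and `hello_app` are made up for illustration. It simply returns a 403 when the User-Agent header contains a blocked token; a crawler that spoofs its User-Agent would not be caught by this.

```python
# Hypothetical sketch: refuse requests whose User-Agent contains a blocked token.
# Assumption: the site is a WSGI app; the token list is illustrative only.
BLOCKED_AGENT_TOKENS = ("baiduspider",)


def is_blocked(user_agent):
    """Return True if the User-Agent header contains any blocked token."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in BLOCKED_AGENT_TOKENS)


def block_crawlers(app):
    """Wrap a WSGI app so blocked crawlers get a 403 before the app runs."""
    def middleware(environ, start_response):
        if is_blocked(environ.get("HTTP_USER_AGENT")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)
    return middleware


def hello_app(environ, start_response):
    """Stand-in for the real site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]


# The wrapped app is what the WSGI server would be pointed at.
application = block_crawlers(hello_app)
```

Matching on a lowercase substring keeps the check tolerant of version suffixes in the header, at the cost of also matching any other agent that happens to contain the same token.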
-
Thanks for your answer, Ally. I will now block Baidu.
-
Hi Stefan,
You can block the Baidu crawler in robots.txt.
There should be no adverse effect on your site, as this is not an area you are targeting and it has no long-term benefit to your business. Blocking the crawler will also mean that your server has less load to deal with from the unnecessary traffic you have been receiving.
You can block the spiders in the following ways:
- Robots.txt (below is code for Baidu)
User-agent: Baiduspider
User-agent: Baiduspider-video
User-agent: Baiduspider-image
Disallow: /
- Blocking spiders via the Apache configuration file httpd.conf
See the below article for more details on this method
http://searchenginewatch.com/article/2067357/Bye-bye-Crawler-Blocking-the-Parasites
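If you want to sanity-check the robots.txt rules listed above before uploading them, one option is Python's standard `urllib.robotparser` module. The sketch below is only an illustration: `ROBOTS_TXT`, `build_parser` and the test path `/any-page.html` are names chosen for the example, and it checks how the rules parse, not whether Baidu actually honours them.

```python
from urllib.robotparser import RobotFileParser

# The rules from the answer above, kept as an inline string for the check.
ROBOTS_TXT = """\
User-agent: Baiduspider
User-agent: Baiduspider-video
User-agent: Baiduspider-image
Disallow: /
"""


def build_parser(robots_txt):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Print whether each crawler may fetch a sample path under these rules.
for agent in ("Baiduspider", "Baiduspider-image", "Googlebot"):
    print(agent, parser.can_fetch(agent, "/any-page.html"))
```

With these rules the Baidu agents should be reported as not allowed, while crawlers that are not listed, such as Googlebot, stay allowed because no group applies to them.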
I hope this helps,
Ally