Baidu Spider Spam
-
Baidu Spider has hit my UK site every 5 minutes, every day, for the past 2 years.
It doesn't care whether a domain exists or not.
I know this because, looking at etc/httpd/logs/error_log, I see hits every 5 minutes from Baidu spider trying to access a domain that points to my server but no longer exists.
Given that I have absolutely no trade with China, and given that the only spam comments I get on my WordPress blog originate from China, do you think it's a good idea to either do a China country block in my .htaccess or block out Baidu spider?
Baidu is consuming bandwidth and clogging my error logs!
Why is it that Google, Bing, Yahoo, etc. can all crawl my site nicely, but Baidu just abuses it?
-
Hi, I've tried Cloudflare before.
The problem is that I am using SSL for some of my pages, so Cloudflare doesn't play nice unless I pay them.
Also, I am using Amazon's CDN. Does that work with Cloudflare, or is it a bit OTT?
I will take a look at your links, and thanks!
-
I just remembered another tool that you can easily add to your site to block the bots, simply by telling it not to trust this hostname or IP.
In fact, with Cloudflare you can block anything looking for that old domain.
It is a free service and a very good DNS; I would use it if you must.
Sincerely,
Thomas
-
The complete block is here:
Baidu (CN)
Info: http://www.baidu.com/search/spider.htm
Required robots.txt code:
User-agent: Baiduspider
User-agent: Baiduspider-video
User-agent: Baiduspider-image
Disallow: /
http://searchenginewatch.com/article/2067357/Bye-bye-Crawler-Blocking-the-Parasites
http://forums.oscommerce.com/topic/382923-baiduspider-using-multiple-user-agents-how-to-stop-them/
-
It should respect robots.txt, so it may be someone pretending to be Baidu. I would try .htaccess if you're not looking to go near China, etc.
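If you want to check whether those hits really come from Baidu, the usual test is a reverse DNS lookup on the client IP followed by a forward lookup to confirm it. Here is a minimal Python sketch of that idea, not a definitive implementation. It assumes genuine Baiduspider hosts resolve to names ending in .baidu.com or .baidu.jp (please verify that against Baidu's own spider help page), and the function names are made up for illustration.

```python
import socket
from typing import Callable, Sequence

# Assumption: real Baiduspider IPs reverse-resolve to hostnames under these domains.
BAIDU_SUFFIXES = (".baidu.com", ".baidu.jp")


def reverse_lookup(ip: str) -> str:
    """Return the primary hostname registered for an IP (PTR record)."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(host: str) -> Sequence[str]:
    """Return the IPv4 addresses a hostname resolves to."""
    return socket.gethostbyname_ex(host)[2]


def is_real_baiduspider(
    ip: str,
    reverse: Callable[[str], str] = reverse_lookup,
    forward: Callable[[str], Sequence[str]] = forward_lookup,
) -> bool:
    """Hypothetical helper: True only if the IP passes both DNS checks."""
    try:
        host = reverse(ip).rstrip(".").lower()
        # Step 1: the hostname must belong to one of the assumed Baidu domains.
        if not host.endswith(BAIDU_SUFFIXES):
            return False
        # Step 2: the hostname must resolve back to the same IP,
        # otherwise the PTR record could simply be forged.
        return ip in forward(host)
    except OSError:
        # Lookup failures (no PTR record, unknown host) count as "not verified".
        return False
```

Feed it the client IPs from your log. Anything that claims to be Baiduspider in the user agent but returns False here is probably an impostor, and a candidate for an .htaccess block.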
-
Make sure you're not running an odd plug-in that may be causing a caching issue. I know it sounds strange, but I've heard of this before, and it was because of an all-in-one event calendar plug-in.
If it's not something like that, I definitely agree with what Chris said. Good call on that, Chris.
However, if there is no domain, you will have to implement the robots.txt on whatever your server is currently running.
If you want a free tool that will allow you to create a solid block, here's one below; however, Chris has done a great job of creating one.
http://www.internetmarketingninjas.com/seo-tools/robots-txt-generator/
Sincerely,
Thomas
-
User-agent: Baiduspider
User-agent: baiduspider
User-agent: Baiduspider+
Disallow: /
Baidu spider is blocked, but it doesn't seem to care!
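To rule out a problem with the file itself, you could check how a standard parser reads those rules. Below is a rough Python sketch using the standard library's urllib.robotparser. The is_blocked helper is a hypothetical name, and the rules are the ones quoted above. It only shows what a compliant crawler should do with the file; it can't tell you whether Baidu actually obeys it.

```python
from urllib.robotparser import RobotFileParser

# The rules from the post above, one directive per line.
ROBOTS_TXT = """\
User-agent: Baiduspider
User-agent: baiduspider
User-agent: Baiduspider+
Disallow: /
"""


def is_blocked(robots_txt: str, agent: str, url: str = "/") -> bool:
    """Hypothetical helper: True if the rules forbid `agent` from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Pass the crawler's product token, not the full User-Agent header:
    # robotparser only compares the part of the string before the first "/".
    for agent in ("Baiduspider", "Baiduspider-image", "Googlebot"):
        status = "blocked" if is_blocked(ROBOTS_TXT, agent) else "allowed"
        print(f"{agent}: {status}")
```

If the Baidu tokens come back as blocked and the hits continue, the file is fine and the crawler is either ignoring it or is not really Baidu.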
-
Have you tried blocking it in robots.txt?
# Baiduspider
User-agent: Baiduspider
Disallow: /