Baidu Spider Spam
-
Baidu Spider has hit my UK site every 5 minutes, every day, for the past 2 years.
It doesn't seem to care whether a domain exists or not.
I know this because, looking at etc/httpd/logs/error_log, I see hits every 5 minutes from Baidu Spider trying to access a domain that still points to my server but no longer exists.
Given that I have absolutely no trade with China, and given that the only spam comments I get on my WordPress blog originate from China, do you think it's a good idea to either add a China country block to my .htaccess or block Baidu Spider?
Baidu is consuming bandwidth and clogging my error logs!
Why is it that Google, Bing, Yahoo, etc. can all crawl my site nicely, but Baidu just abuses it?
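A quick way to see which clients are filling the error log is to count the client addresses in it. This is a minimal Python sketch. It assumes the Apache error log format, where each request error carries a "[client ADDRESS]" field (with an optional ":port" on Apache 2.4). The names client_address and top_clients are hypothetical helpers, not part of any tool mentioned in this thread.

```python
import re
from collections import Counter
from typing import Iterable, List, Optional, Tuple

# Assumption: Apache-style error_log lines carry a "[client ADDRESS]" field,
# e.g. "[client 203.0.113.7]" (Apache 2.2) or "[client 203.0.113.7:51234]" (2.4).
CLIENT_RE = re.compile(r"\[client ([^\]\s]+)\]")


def client_address(line: str) -> Optional[str]:
    """Return the client address from one error_log line, or None if absent."""
    match = CLIENT_RE.search(line)
    if match is None:
        return None
    addr = match.group(1)
    # An IPv4 address with a port has exactly one colon: drop the ":port" part.
    if addr.count(":") == 1:
        addr = addr.split(":", 1)[0]
    return addr


def top_clients(lines: Iterable[str], n: int = 5) -> List[Tuple[str, int]]:
    """Count error_log lines per client address and return the n busiest."""
    counts = Counter(addr for addr in map(client_address, lines) if addr)
    return counts.most_common(n)
```

Usage: open the log (for example, the etc/httpd/logs/error_log file from the question) and pass the file object to top_clients. You can then check the addresses at the top of the list to see whether they really belong to Baidu.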
-
Hi, I've tried Cloudflare before.
The problem is that I'm using SSL on some of my pages, so Cloudflare doesn't play nicely unless I pay them.
Also, I'm using the Amazon CDN. Does that work with Cloudflare, or is it a bit over the top?
I'll take a look at your links. Thanks!
-
I just remembered another tool that you can easily add to your site: it lets you block bots simply by telling it not to trust a given hostname or IP.
In fact, with Cloudflare you can block anything looking for that old domain.
It's a free service with very good DNS, so I would use it if you need to.
Sincerely,
Thomas
-
The complete block is here:
Baidu (CN)
Info: http://www.baidu.com/search/spider.htm
Required robots.txt code:
User-agent: Baiduspider
User-agent: Baiduspider-video
User-agent: Baiduspider-image
Disallow: /
http://searchenginewatch.com/article/2067357/Bye-bye-Crawler-Blocking-the-Parasites
http://forums.oscommerce.com/topic/382923-baiduspider-using-multiple-user-agents-how-to-stop-them/
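To check what those rules actually do, Python's standard urllib.robotparser module can parse the robots.txt text offline and report which user agents may fetch a URL. This is a minimal sketch. The host example.com is a placeholder, and build_parser and check_agents are hypothetical helper names.

```python
from __future__ import annotations

from urllib.robotparser import RobotFileParser

# The robots.txt rules quoted above, exactly as they would appear in the file.
ROBOTS_TXT = """\
User-agent: Baiduspider
User-agent: Baiduspider-video
User-agent: Baiduspider-image
Disallow: /
"""


def build_parser(robots_text: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching it over the network."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def check_agents(parser: RobotFileParser, agents: list[str], url: str) -> dict[str, bool]:
    """Return {user_agent: allowed?} for each agent against the given URL."""
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    # example.com is a placeholder host; only the path is matched against the rules.
    results = check_agents(rp, ["Baiduspider", "Baiduspider-image", "Googlebot"],
                           "http://example.com/some-page")
    for agent, allowed in results.items():
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Pass the crawler's product token (Baiduspider), not the full User-Agent header. The robotparser module only compares the part before the first "/", and for a header such as "Mozilla/5.0 (compatible; Baiduspider/2.0; ...)" that part is "Mozilla". Keep in mind that robots.txt is only a request: a bot that ignores it, or one faking the Baidu name, is not stopped by it.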
-
It should respect robots.txt, so maybe someone is pretending to be Baidu. I would try .htaccess if you're not looking to go near China, etc.
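One way to tell a real Baiduspider from an impostor is a reverse-then-forward DNS check on the client IP. The PTR name should be a Baidu crawler host, and that name should resolve back to the same IP. This is a minimal Python sketch. It assumes genuine crawler hostnames end in baidu.com or baidu.jp; check Baidu's spider help page before relying on that list. It handles IPv4 only, because socket.gethostbyname_ex returns IPv4 addresses.

```python
import socket

# Assumption: genuine Baiduspider hosts reverse-resolve to names under these
# domains. Confirm against Baidu's spider help page before relying on it.
BAIDU_SUFFIXES = (".baidu.com", ".baidu.jp")


def is_baidu_hostname(hostname: str) -> bool:
    """True if the hostname (trailing dot ignored) is under a Baidu crawler domain."""
    return hostname.rstrip(".").lower().endswith(BAIDU_SUFFIXES)


def is_genuine_baiduspider(ip: str) -> bool:
    """Reverse-then-forward DNS check for an IPv4 address claiming to be Baiduspider."""
    try:
        hostname, _aliases, _addrs = socket.gethostbyaddr(ip)  # reverse (PTR) lookup
    except OSError:
        return False  # no PTR record or lookup failure: cannot be verified
    if not is_baidu_hostname(hostname):
        return False
    try:
        # Forward lookup: the PTR name must resolve back to the same address,
        # otherwise anyone controlling their own PTR record could fake it.
        _name, _aliases, forward_ips = socket.gethostbyname_ex(hostname)
    except OSError:
        return False
    return ip in forward_ips
```

Calling is_genuine_baiduspider on an address from your logs performs live DNS lookups. Addresses that fail the check can be blocked in .htaccess or at the firewall without shutting out the real crawler.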
-
Make sure you're not running an odd plug-in that may be causing a caching issue. I know it sounds strange, but I've seen this before, and it was caused by an all-in-one event calendar plug-in.
If it's not something like that, I definitely agree with what Chris said. Good call on that, Chris.
However, if the domain no longer has a site, you will have to put the robots.txt on whatever site your server is currently serving for it.
If you want a free tool that will let you create a solid block, here's one below, although Chris has already done a great job of creating one.
http://www.internetmarketingninjas.com/seo-tools/robots-txt-generator/
Sincerely,
Thomas
-
User-agent: Baiduspider
User-agent: baiduspider
User-agent: Baiduspider+
Disallow: /
Baidu Spider is blocked, but it doesn't seem to care!
-
Have you tried blocking it in robots.txt?
#Baiduspider
User-agent: Baiduspider
Disallow: /