What To Do About Yahoo Slurp Bot Bogging My Site Down?
-
Hello,
Our IT department has informed me that they have seen extremely heavy traffic from the Yahoo Slurp bot in recent days. They are claiming this bot has single-handedly caused one of our servers to crash.
I am a bit skeptical of this, as I have not found these particular legitimate search engine bots to be aggressive resource hogs, especially for an enterprise-level web server.
I have requested to examine the server logs myself, but have not had success with this. IT is requesting to block this particular bot, but I am apprehensive about doing this, as I don't want this to have any negative implications on our site showing in Yahoo News or other Yahoo properties.
Does anyone else have experience with this bot being an overly-zealous resource drag, and if so, what is the best course of action to satisfy all parties?
-
Examining the server logs yourself probably won't help your understanding of the issue unless you know specifically what you're looking for. On the Yahoo note, I have found Slurp to be really bad in the past, but no legitimate bot should be able to bring down a properly configured web server, especially an 'enterprise-level' one.
I would check your .htaccess and Apache settings (or web.config if on Windows) for bad redirects before considering banning the bot. Other things to check would be the website code, or whether the bot is hitting a massive, badly optimised database query, for example. That could bring the server down.
Ask IT exactly what the bot did that caused the server to go down; they should at least be able to tell you that. If they can't, they need to run load tests against the website itself to try to reproduce the scenario and debug the issue, if indeed there is one.
TL;DR: Normally bad config or bad code/queries are to blame for this kind of thing. I'd review that before blocking a bot that crawls hundreds of thousands of other sites without issue.
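If IT can hand over even a slice of the access log, a rough way to see how hard Slurp was actually hitting the server is to count its requests per minute. Below is a minimal Python sketch of that idea. It assumes the log is in the common Apache/Nginx "combined" format; the function name `requests_per_minute` and the sample lines are made up for illustration, so adjust the pattern to whatever your server really writes.

```python
import re
from collections import Counter

# Assumption: Apache/Nginx "combined" log format. Adjust if your format differs.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def requests_per_minute(lines, agent_substring="Slurp"):
    """Count requests per minute whose User-Agent contains agent_substring."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the assumed format
        if agent_substring.lower() not in match.group("agent").lower():
            continue
        # Timestamp looks like 10/Oct/2014:13:55:36 -0700; keep up to the minute.
        minute = match.group("ts")[:17]
        counts[minute] += 1
    return counts


# Hypothetical sample lines, only to show the expected input shape.
SAMPLE_LINES = [
    '203.0.113.5 - - [10/Oct/2014:13:55:36 -0700] "GET /a HTTP/1.1" 200 2326 '
    '"-" "Mozilla/5.0 (compatible; Yahoo! Slurp)"',
    '203.0.113.5 - - [10/Oct/2014:13:55:41 -0700] "GET /b HTTP/1.1" 200 1200 '
    '"-" "Mozilla/5.0 (compatible; Yahoo! Slurp)"',
    '198.51.100.7 - - [10/Oct/2014:13:55:50 -0700] "GET /c HTTP/1.1" 200 900 '
    '"-" "Mozilla/5.0 (Windows NT 6.1)"',
]

for minute, hits in requests_per_minute(SAMPLE_LINES).most_common(10):
    print(minute, hits)
```

The busiest minutes come out first. If the peak is only a handful of requests per minute, the bot is unlikely to be the real cause, and the config or query issues above become the more probable suspects.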
-
You should be able to control the rate at which the bot accesses your pages by adding a crawl delay to your robots.txt file. Robots.txt and crawl delay are discussed here: http://en.wikipedia.org/wiki/Robots_exclusion_standard, and the Slurp bot here: https://help.yahoo.com/kb/SLN22600.html.
It should look like this in your robots.txt file:
User-agent: Slurp
Crawl-delay: 30
The crawl delay is the number of seconds the bot should wait between page requests (ask your IT guys what's appropriate for you). I stuck 30 in there, meaning the Slurp bot would only be able to access up to 2 pages a minute.
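Before publishing the change, you can sanity-check that the rule parses the way you expect. Here is a small sketch using Python's standard `urllib.robotparser` module; the helper name `crawl_delay_for` is just for illustration, and this only shows how Python's parser reads the file, not how Yahoo's crawler behaves.

```python
from urllib.robotparser import RobotFileParser

# The same two lines as above, as they would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: Slurp
Crawl-delay: 30
"""


def crawl_delay_for(robots_txt, user_agent):
    """Return the Crawl-delay (in seconds) that applies to user_agent, or None."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # Pass the short bot token ("Slurp"), not the full User-Agent header.
    return parser.crawl_delay(user_agent)


delay = crawl_delay_for(ROBOTS_TXT, "Slurp")
if delay:
    print("Slurp delay:", delay, "seconds, at most", 60 // delay, "pages a minute")

# No rule matches other bots here, so they get no delay from this file.
print("Googlebot delay:", crawl_delay_for(ROBOTS_TXT, "Googlebot"))
```

Keep in mind that Crawl-delay is a request rather than an enforcement mechanism, so it is still worth watching the logs after the change to confirm the request rate actually drops.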