What To Do About Yahoo Slurp Bot Bogging My Site Down?
-
Hello,
Our IT department has informed me that they have seen extremely heavy traffic from the Yahoo Slurp bot in recent days. They are claiming this bot has single-handedly caused one of our servers to crash.
I am a bit skeptical of this, as I have not found these particular legitimate search engine bots to be aggressive resource hogs, especially for an enterprise-level web server.
I have requested to examine the server logs myself, but have not had success with this. IT is requesting to block this particular bot, but I am apprehensive about doing this, as I don't want this to have any negative implications on our site showing in Yahoo News or other Yahoo properties.
Does anyone else have experience with this bot being an overly-zealous resource drag, and if so, what is the best course of action to satisfy all parties?
-
Examining the server logs yourself probably won't help your understanding of the issue unless you know exactly what you're looking at. On the Yahoo note, I have found Slurp to be fairly aggressive in the past, but no legitimate bot should be able to bring down a properly configured web server, especially an 'enterprise-level' one.
I would check your .htaccess and Apache settings for bad redirects (or web.config if on Windows/IIS) before considering banning the bot. Other things to check would be the website code itself; if the bot hits a massive, horribly optimised database query on every page, for example, that could bring the server down.
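One cheap way to check the redirect theory is to walk the redirect chains directly rather than eyeballing the config. A minimal sketch (the URLs and the `redirects` mapping here are hypothetical, standing in for whatever your rewrite rules or server responses produce):

```python
def find_redirect_loop(redirects, start, max_hops=20):
    """Follow url -> target mappings and return the looping cycle, if any.

    `redirects` maps a URL to the URL it 301s/302s to. A crawler trapped
    in such a cycle can hammer a server with a stream of pointless requests.
    Returns the list of URLs forming the cycle, or None if the chain
    terminates normally (or exceeds max_hops without looping).
    """
    seen = []
    url = start
    while url in redirects and len(seen) <= max_hops:
        if url in seen:
            return seen[seen.index(url):]  # the cycle the bot would spin in
        seen.append(url)
        url = redirects[url]
    return None
```

Feeding it a broken rule set like `{"/a": "/b", "/b": "/a"}` would surface the `["/a", "/b"]` cycle immediately, which is the kind of thing that makes a crawler look like an attack.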
Ask IT exactly what the bot did that caused the server to go down; they should at least be able to tell you that. If not, then they need to run load tests against the website itself to try to reproduce the scenario and debug the issue, if indeed there is one.
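If you do eventually get log access, a rough first pass is just counting requests per user agent to see whether Slurp's share of traffic is actually abnormal. A sketch assuming Apache's combined log format, where the user agent is the last quoted field on each line:

```python
from collections import Counter

def top_user_agents(log_lines, n=5):
    """Count requests per user agent in Apache combined-format log lines.

    In the combined format the request, referer, and user agent are the
    three double-quoted fields, so after splitting on '"' the user agent
    is the second-to-last element.
    """
    counts = Counter()
    for line in log_lines:
        parts = line.split('"')
        if len(parts) >= 7:  # prefix + 3 quoted fields -> 7 pieces
            counts[parts[-2]] += 1
    return counts.most_common(n)
```

Run over a day's access log, this makes it obvious whether Slurp really dominates the request volume or whether IT is blaming the wrong thing.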
TL;DR: bad config, code, or queries are normally to blame for this kind of thing. I'd review those before blocking a bot that crawls hundreds of thousands of other sites without issue.
-
You should be able to control the rate at which the bot accesses your pages by adding a crawl delay to your robots.txt file. Robots.txt and crawl delay are discussed here: http://en.wikipedia.org/wiki/Robots_exclusion_standard, and the Slurp bot here: https://help.yahoo.com/kb/SLN22600.html.
It should look like this in your robots.txt file:
User-agent: Slurp
Crawl-delay: 30
The crawl delay is the number of seconds the bot should wait between pageviews (ask your IT guys what's appropriate for you). I put 30 in there, meaning the Slurp bot would only be able to access up to 2 pages a minute.
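Not every crawler honours Crawl-delay, but Slurp is one that does. You can sanity-check that your robots.txt rules parse the way you expect with Python's standard-library parser (Python 3.6+ for `crawl_delay`); a quick sketch using the directives above:

```python
from urllib.robotparser import RobotFileParser

# Parse the same rules shown above and confirm the delay Slurp should observe.
rules = [
    "User-agent: Slurp",
    "Crawl-delay: 30",
]
rp = RobotFileParser()
rp.parse(rules)
print(rp.crawl_delay("Slurp"))  # 30
```

In practice you'd point `RobotFileParser` at your live file with `set_url(...)` and `read()` instead of parsing inline lines, which also catches typos that would make the whole group silently ignored.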