The "webmaster" disallowed all ROBOTS to fight spam! Help!!
-
One of the companies I do work for has a Magento site. I am simply the SEO guy, and they manage the website through some developers who hold access to their systems VERY tightly. Using Google Webmaster Tools I saw that the robots.txt file was blocking ALL robots.
I immediately e-mailed them and received a long reply about foreign robots and scrapers slowing down the website. They told me I would have to provide a list of only the good robots to allow in robots.txt.
Please correct me if I'm wrong, but isn't robots.txt optional? Won't a bad scraper or bot still bog down the site? Shouldn't that be handled in .htaccess or something else?
I'm not new to SEO but I'm sure some of you who have been around longer have run into something like this and could provide some suggestions or resources I could use to plead my case!
If I'm wrong, please help me understand how we can meet both needs: allowing bots to visit the site while keeping out the 'bad' ones. Their claim is that the site is bombarded by tons and tons of bots that have slowed down performance.
Thanks in advance for your help!
-
Thanks for the suggestions!! I'll keep you updated.
-
You can get the list of good robots from the list at Robotstxt.org: http://www.robotstxt.org/db.html.
I'd recommend creating an edited version of the robots.txt file yourself that explicitly allows Googlebot and the other bots you want. Then send that along with a link to the robotstxt.org site.
You may need to get the business owners involved. IT exists to enable the business, not strap it down so it can't move.
-
What you could do is just add Allow statements for the different Googlebots and the bots of other search engines. That will probably keep the developers happy, since they can still shut the door on other bots (although I doubt it would actually work, and I definitely don't think robots.txt should be the way to keep spammers away, but that says more about the quality of the development ;-)).
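For illustration, here's a rough sketch of what such an allow list could look like, checked with Python's standard `urllib.robotparser`. The bot names in the file, the example.com URL and the `is_allowed` helper are my own assumptions for the example, not the site's real setup. Keep in mind this only models how a well-behaved bot reads the file; nothing here stops a bot that ignores robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: let two search engine bots in, turn everyone else away.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

User-agent: *
Disallow: /
"""


def is_allowed(user_agent: str,
               url: str = "https://www.example.com/some-product.html") -> bool:
    """Return True if a compliant bot with this user agent may fetch the URL."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is needed here.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for bot in ("Googlebot", "Bingbot", "SomeScraper"):
        print(bot, is_allowed(bot))
```

With this file, `is_allowed("Googlebot")` and `is_allowed("Bingbot")` come back `True`, while any other name, such as the made-up `"SomeScraper"`, falls through to the `User-agent: *` group and comes back `False`.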
-
Yes, there are a ton of bad bots one may want to block. Can you show us the robots.txt file? If they aren't blocking legit search engine bots, you're probably okayish. If they are actually blocking all bots, you have cause for concern.
Can you give us a screenshot from GWT?
I use a program called Screaming Frog daily. It's off-the-shelf software and not malicious; I just want to crawl and gather metadata. I can tell it to disregard robots.txt, and it will crawl a site until it hits something password protected. There's not much any robots.txt can do about it, as it can also spoof user agents.
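Since user agents can be spoofed, any real blocking has to happen on the server, and it can't rely on the user agent string alone. As far as I know, the check Google documents for confirming a genuine Googlebot is a reverse DNS lookup on the visiting IP followed by a forward lookup on the returned hostname. Below is a minimal Python sketch of that idea. The function names and the suffix list are my assumptions for the example, and the lookups are passed in as parameters so the logic can be tried without touching the network.

```python
import socket
from typing import Callable, List

# Assumption: hostname suffixes treated as belonging to Google's crawlers.
TRUSTED_SUFFIXES = (".googlebot.com", ".google.com")


def reverse_lookup(ip: str) -> str:
    """Resolve an IP address to its primary hostname (PTR record)."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(hostname: str) -> List[str]:
    """Resolve a hostname back to its list of IPv4 addresses."""
    return socket.gethostbyname_ex(hostname)[2]


def is_verified_googlebot(
    ip: str,
    reverse: Callable[[str], str] = reverse_lookup,
    forward: Callable[[str], List[str]] = forward_lookup,
) -> bool:
    """Forward-confirmed reverse DNS check for a visitor claiming to be Googlebot."""
    try:
        hostname = reverse(ip)
        # Step 1: the reverse name must sit under a trusted domain.
        if not hostname.endswith(TRUSTED_SUFFIXES):
            return False
        # Step 2: that name must resolve back to the same IP address.
        return ip in forward(hostname)
    except OSError:
        # Lookup failures (no PTR record, DNS errors) count as unverified.
        return False
```

A visitor whose user agent says "Googlebot" but whose IP fails this check is the kind of traffic the developers could reasonably block or rate-limit at the server, without shutting out the real search engines.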