The "webmaster" disallowed all ROBOTS to fight spam! Help!!
-
One of the companies I do work for has a Magento site. I am simply the SEO guy, and they manage the website through developers who hold access to their systems VERY tightly. Using Google Webmaster Tools, I saw that the robots.txt file was blocking ALL robots.
I immediately e-mailed them and received a long reply about foreign robots and scrapers slowing down the website. They told me I would have to provide a list of only the good robots to allow in robots.txt.
Please correct me if I'm wrong, but isn't robots.txt optional? Won't a bad scraper or bot still bog down the site? Shouldn't that be handled in .htaccess or something different?
I'm not new to SEO but I'm sure some of you who have been around longer have run into something like this and could provide some suggestions or resources I could use to plead my case!
If I'm wrong, please help me understand how we can meet both needs: allowing bots to visit the site while keeping the 'bad' ones out. Their claim is that the site is bombarded by tons of bots that have slowed down performance.
Thanks in advance for your help!
-
Thanks for the suggestions!! I'll keep you updated.
-
You can get a list of good robots from the database at robotstxt.org: http://www.robotstxt.org/db.html.
I'd recommend creating an edited version of the robots.txt file yourself, specifically allowing Googlebot and the other search engine crawlers. Then send that with a link to the robotstxt.org site.
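For reference, here is a minimal sketch of what such an edited file could look like, checked with Python's standard urllib.robotparser. The file contents, the choice of Googlebot and Bingbot, the helper name `allowed`, and the example.com URL are illustrative assumptions, not the site's actual configuration. An empty Disallow line means nothing is disallowed for that user agent.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical allowlist robots.txt: the named crawlers may fetch everything,
# every other user agent is asked to stay out.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: Bingbot
Disallow:

User-agent: *
Disallow: /
"""


def allowed(user_agent: str, url: str = "https://www.example.com/some-page.html") -> bool:
    """Return True if the rules above permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "SomeScraper" is a made-up name standing in for any unlisted bot.
    for agent in ("Googlebot", "Bingbot", "SomeScraper"):
        print(agent, allowed(agent))
```

Keep in mind this only describes what well-behaved crawlers will do. Anything that ignores robots.txt is unaffected by it.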
You may need to get the business owners involved. IT exists to enable the business, not strap it down so it can't move.
-
What you could do is add Allow statements for the different Googlebots and the bots of other search engines. This will probably keep the developers happy, since on paper they still keep other bots out. (I doubt it would actually work, though, and I definitely don't think robots.txt should be the way to keep spammers away, but that says more about the quality of development ;-).)
-
Yes, there are a ton of bad bots one may want to block. Can you show us the robots.txt file? If they aren't blocking legit search engine bots, you're probably okayish. If they are actually blocking all bots, you have cause for concern.
Can you give us a screenshot from GWT?
I use a program called Screaming Frog daily. It's off-the-shelf software, not malicious; I just want to crawl and gather metadata. I can tell it to disregard robots.txt, and it will crawl a site until it hits something password protected. There's not much any robots.txt can do about it, as it can also spoof user agents.
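Because the User-Agent header can be spoofed, a server that wants to treat Googlebot differently from other visitors would need to confirm the visitor some other way. One approach Google documents is a reverse DNS lookup on the requesting IP, followed by a forward lookup of the returned hostname. A rough sketch of that check is below. The function names are hypothetical, and the hostname suffixes are an assumption you should check against Google's current documentation.

```python
import socket
from typing import Callable, List

# Assumed hostname suffixes for genuine Googlebot hosts; verify before relying on them.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def _reverse(ip: str) -> str:
    """Reverse DNS: IP address -> primary hostname."""
    return socket.gethostbyaddr(ip)[0]


def _forward(host: str) -> List[str]:
    """Forward DNS: hostname -> list of IPv4 addresses."""
    return socket.gethostbyname_ex(host)[2]


def is_verified_googlebot(
    ip: str,
    reverse: Callable[[str], str] = _reverse,
    forward: Callable[[str], List[str]] = _forward,
) -> bool:
    """Return True only if reverse and forward DNS agree that ip is a Google host."""
    try:
        host = reverse(ip)
        if not host.endswith(GOOGLE_SUFFIXES):
            return False
        # The hostname must resolve back to the same IP, otherwise the
        # reverse record could simply be forged by whoever controls the IP.
        return ip in forward(host)
    except OSError:
        # Covers socket.herror and socket.gaierror (lookup failures).
        return False
```

The resolvers are passed in as parameters so the logic can be exercised without network access. In production you would also want to cache results, since two DNS lookups per request would add to the very load the developers are worried about.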