The "webmaster" disallowed all ROBOTS to fight spam! Help!!
-
One of the companies I do work for has a Magento site. I'm just the SEO guy; they run the website through some developers who guard access to their systems VERY tightly. Using Google Webmaster Tools, I saw that the robots.txt file was blocking ALL robots.
I immediately emailed them and received a long reply about foreign robots and scrapers slowing down the website. They told me I would have to provide a list of only the good robots to allow in robots.txt.
Please correct me if I'm wrong, but isn't robots.txt optional? Won't a bad scraper or bot still bog down the site, since it can simply ignore the file? Shouldn't that be handled in .htaccess or something similar?
I'm not new to SEO but I'm sure some of you who have been around longer have run into something like this and could provide some suggestions or resources I could use to plead my case!
If I'm wrong, please help me understand how we can meet both needs: allowing bots to visit the site while keeping the 'bad' ones out. Their claim is that the site is bombarded by so many bots that performance has suffered.
Thanks in advance for your help!
-
Thanks for the suggestions!! I'll keep you updated.
-
You can get a list of known good robots from the database at Robotstxt.org: http://www.robotstxt.org/db.html.
I'd recommend drafting an edited version of the robots.txt file yourself, specifically allowing Googlebot and the other major crawlers, and sending it to the developers along with a link to the robotstxt.org site.
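As a rough sketch (the exact set of crawlers to allow is something you'd tailor to the site), an edited robots.txt could look like this:

```text
# Allow the major search engine crawlers (an empty Disallow means full access)
User-agent: Googlebot
Disallow:

User-agent: Bingbot
Disallow:

User-agent: Slurp
Disallow:

# Block everything else
User-agent: *
Disallow: /
```

Crawlers that honor robots.txt follow the most specific User-agent group that matches them, so Googlebot would use its own group here rather than the catch-all block.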
You may need to get the business owners involved. IT exists to enable the business, not strap it down so it can't move.
-
What you could do is add Allow rules (or empty Disallow groups) for the different Googlebots and the bots of the other search engines. That will probably keep the developers happy, since they can still keep other bots out of the door (although I doubt this actually works, and I definitely don't think robots.txt should be the tool for keeping spammers away, but that says more about the quality of the development ;-)).
-
Yes, there are a ton of bad bots one may want to block. Can you show us the robots.txt file? If they aren't blocking legit search engine bots, you're probably okayish. If they are actually blocking all bots, you have cause for concern.
Can you give us a screenshot from GWT?
I use a program called Screaming Frog daily. It's not malicious; it's an off-the-shelf crawler I use to gather metadata. I can tell it to disregard robots.txt, and it will crawl a site until it hits something password-protected. There's not much robots.txt can do about a crawler like that, as it can also spoof user agents.
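That's why, if the real goal is keeping abusive crawlers from dragging down performance, server-level rules are the right layer: robots.txt is purely advisory. A minimal Apache .htaccess sketch (the user-agent patterns below are illustrative examples, not a vetted blocklist):

```apache
# Return 403 Forbidden to requests whose User-Agent matches
# common scraper tools. Note: determined bots spoof their
# User-Agent, so pair this with rate limiting or IP-based rules.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (HTTrack|WebCopier|libwww-perl) [NC]
RewriteRule .* - [F,L]
```

This blocks by user agent at the server before any page is rendered, which is where the performance argument actually gets addressed.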