Wordpress error
-
On our Google Webmaster Tools I'm getting a Severe Health Warning regarding our robots.txt file, which reads:
User-agent: *
Crawl-delay: 20
User-agent: 008
Disallow: /
I'm wondering how I can fix this and stop it happening again.
The site was hacked about 4 months ago but I thought we'd managed to clear things up.
Colin
-
This will be my first post on SEOmoz, so bear with me.
The way I understand it is that robots read the robots.txt file from top to bottom, and once they find a rule that applies to them they stop reading and begin crawling. So basically the robots.txt written as:
User-agent: *
Disallow:
Crawl-delay: 20
User-agent: 008
Disallow: /
would not have the desired result as user-agent 008 would first read the top guideline:
User-agent: *
Disallow:
Crawl-delay: 20
and then begin crawling your site, because the first thing it is told is that no pages or directories are disallowed for any user-agent.
The corrected way to write this would be:
User-agent: 008
Disallow: /
User-agent: *
Disallow:
Crawl-delay: 20
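For what it's worth, not every parser reads top to bottom like that. As a rough check, the sketch below feeds both orderings to Python's standard-library urllib.robotparser (the can_fetch helper and the variable names are mine, just for illustration). It only shows how that one parser behaves, not how Googlebot or 80legs actually process the file.

```python
from urllib.robotparser import RobotFileParser


def can_fetch(robots_txt, agent, url="/"):
    """Parse robots.txt text with Python's stdlib parser and test one agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# The ordering from the first version above (wildcard group first).
star_first = """\
User-agent: *
Disallow:
Crawl-delay: 20
User-agent: 008
Disallow: /
"""

# The "corrected" ordering (specific group first).
specific_first = """\
User-agent: 008
Disallow: /
User-agent: *
Disallow:
Crawl-delay: 20
"""

for name, text in (("star first", star_first), ("008 first", specific_first)):
    print(name, "| 008:", can_fetch(text, "008"),
          "| Googlebot:", can_fetch(text, "Googlebot"))
```

With that parser both orderings give the same answer: 008 is blocked and Googlebot is allowed, so the order of the groups doesn't seem to matter to it.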
-
Hi Peter,
I've tested the robots.txt file in Webmaster Tools and it now seems to be working as it should, and it seems Google is seeing the same file as I have on the server.
I'm afraid this side of things isn't my area of expertise so it's been a bit of a minefield.
I've taken a subscription with sucuri.net and taken various other steps that hopefully will help with security. But who knows?
Thanks,
Colin
-
Google is seeing the same Robots.txt content (in GWT) that you show in the physical file, right? I just want to make sure that, when the site was hacked, no changes were made that are showing different versions of files to Google. It sounds like that's not the case here, but it definitely can happen.
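One rough way to check from the outside is to request robots.txt with a couple of different User-Agent headers and compare what comes back. The Python sketch below is only that, a sketch: fetch_as, all_same and the browser user-agent string are made up for the example, and a hack that cloaks by IP address rather than by user-agent would not show up this way, so the view inside GWT is still the better test.

```python
import hashlib
import urllib.request

# Googlebot's published user-agent string; the browser one is an arbitrary example.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"


def fetch_as(url, user_agent, timeout=10):
    """Download a URL while sending the given User-Agent header."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


def all_same(bodies):
    """True if every response body in the dict is byte-for-byte identical."""
    digests = {hashlib.sha256(body).hexdigest() for body in bodies.values()}
    return len(digests) <= 1


# Example use (needs network access, so it is left commented out):
# url = "http://nile-cruises-4u.co.uk/robots.txt"
# bodies = {
#     "googlebot": fetch_as(url, GOOGLEBOT_UA),
#     "browser": fetch_as(url, BROWSER_UA),
# }
# print("same file for both:", all_same(bodies))
```

If the two bodies differ, that would be a hint that something on the server is still serving different content depending on who asks.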
-
Blog isn't showing now, and my hosts say that the index.php file is missing from the directory, but I can see it.
Strange.
Have contacted them again to see what the problem can be.
Bit of a wasted Saturday!
-
Thanks Keith. Just contacting our hosts.
Nightmare!
-
Looks like a 403 permissions problem, which is a server-side error... Make sure you have the correct permissions set on the blog folder in IIS. Personally I always host on Linux...
-
Mind you, the whole blog is now showing an error message and can't be viewed, so it looks like an afternoon of trial and error!
-
Thanks very much Keith. I've just edited the file as suggested.
I see the error but, as I am the web guy, I can't figure out how to get rid of it.
I think it might be a plugin that's causing it, so I'm going to disable them and re-enable them one at a time.
I've just PM'd you by the way.
Thanks for your help Keith.
Colin
-
Use this:
User-agent: *
Disallow: /blog/wp-admin/
Disallow: /blog/wp-includes/
Sitemap: http://nile-cruises-4u.co.uk/sitemap.xml
And FYI, you have the following error on your blog:
Warning: is_readable() [function.is-readable]: open_basedir restriction in effect. File(D:\home\nile-cruises-4u.co.uk\wwwroot\blog/wp-content/plugins/D:\home\nile-cruises-4u.co.uk\wwwroot\blog\wp-content\plugins\websitedefender-wordpress-security/languages/WSDWP_SECURITY-en_US.mo) is not within the allowed path(s): (D:\home\nile-cruises-4u.co.uk\wwwroot) in D:\home\nile-cruises-4u.co.uk\wwwroot\blog\wp-includes\l10n.php on line 339
Get your web guy to look at that, it appears at the top of every blog page for me...
Hope that helps,
Keith
-
Thanks Keith.
Only part of our site is WP based. Would that be a problem using the example you kindly suggested?
-
I gave you an example above of a basic robots.txt file that I use on one of my Wordpress sites. I would suggest using that for now.
I would not bother messing around with crawl delay in robots.txt. As Peter said above, there are better ways to achieve this... Plus I doubt you need it anyway.
In my experience Google normally caches the robots.txt info for about 24 hours... So it's possible the old cached version is still being used by Google.
-
Hi Guys,
Thanks so much for your help. As you say Troy, that's definitely not what I want.
I assumed when we were hacked (twice in 8 months) that it might have been a competitor, as we are in a very competitive niche. Might be very wrong there, but we have certainly lost our top ranking on Google.co.uk for our main key phrases and are now at about position 7 for the same key phrases after about 3 years at number 1.
So when I saw on Google Webmaster Tools yesterday that we had a severe health warning and that the Googlebot was being prevented from crawling our site, I thought it might be the aftereffects of the hack.
Today, even though I changed the robots.txt file yesterday, GWT is showing 1000 pages with errors, 285 Access Denied and 719 Not Found, and this message: Googlebot is blocked from http://nile-cruises-4u.co.uk/
I've just tested the robots.txt via GWT and now get this message:
Allowed
Detected as a directory; specific files may have different restrictions
So maybe the pages will be accessible to Googlebot shortly and the Access Denied message will disappear.
I've changed the robots.txt file to
User-agent: *
Crawl-delay: 20
But should I change it to a better version? Sorry guys, I'm an online travel agent and not great on coding and really techie stuff. Although I'm learning pretty quickly about the bad stuff!
I seem to have a few problems getting this sorted and wonder if this is a part of why our page position is dropping?
-
I would simplify your robots.txt to read something like:
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Sitemap: http://www.your-domain.com/sitemap.xml
-
That's odd: "008" appears to be the user agent for "80legs", a custom crawler platform. I'm seeing it in other Robots.txt files.
-
I'm not 100% sure what he's seeing, but when I plug his robots.txt into the robots analysis tool, I get this back:
Googlebot blocked by line 5: Disallow: /
Detected as a directory; specific files may have different restrictions
However, when I gave the top "User-agent: *" the "Disallow: " it seemed to fix the problem. Like, it didn't understand that the "Disallow: /" was meant only for the 008 user-agent?
-
Not honestly sure what User-agent "008" is, but that seems harmless. Why the crawl delay? There are better ways to handle that than Robots.txt, if a crawler is giving you trouble.
Was there a specific message/error in GWT?
-
I think, if you have a robots.txt reading what you show above:
User-agent: *
Crawl-delay: 20
User-agent: 008
Disallow: /
That just basically says, "Don't crawl my site at all" (The "Disallow: /" means, I'm not allowing anything to be crawled by any search engine that pays attention to robots.txt at all)
So...I'm guessing that's not what you want?
(Bah..ignore. "User-agent". I'm a fool)
Actually, this seems to have solved your issue...make sure you explicitly tell all other User-agents that they are allowed:
User-agent: *
Disallow:
Crawl-delay: 20
User-agent: 008
Disallow: /
The extra "Disallow:" under User-agent: * says "I'm not going to disallow anything to most user-agents." Then the Disallow under user-agent 008 seems to only apply to them.
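One possible explanation, and it is only a guess that happens to match what the analysis tool reported: if a parser ignores Crawl-delay completely, the two User-agent lines end up back to back and get treated as one group, so the "Disallow: /" applies to both. The Python sketch below is a made-up, heavily simplified parser (parse_groups and is_blocked are hypothetical names, not any real crawler's code) that behaves this way.

```python
# Hypothetical, simplified robots.txt grouping; not any real crawler's parser.
KNOWN_RULES = {"allow", "disallow"}  # assumption: Crawl-delay is ignored entirely


def parse_groups(text):
    """Return a list of (user_agents, rules) groups."""
    groups = []
    agents, rules = [], []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if rules:  # a rule was already seen, so a new group starts here
                groups.append((agents, rules))
                agents, rules = [], []
            agents.append(value.lower())
        elif field in KNOWN_RULES and agents:
            rules.append((field, value))
        # Anything else (e.g. crawl-delay) is skipped and does not end the group.
    if agents:
        groups.append((agents, rules))
    return groups


def is_blocked(text, agent):
    """True if the group chosen for `agent` contains 'Disallow: /'."""
    groups = parse_groups(text)
    agent = agent.lower()
    # Prefer a group naming the agent; otherwise fall back to the '*' group.
    chosen = ([g for g in groups if agent in g[0]]
              or [g for g in groups if "*" in g[0]])
    return any(rule == ("disallow", "/") for _, rules in chosen for rule in rules)


original = "User-agent: *\nCrawl-delay: 20\nUser-agent: 008\nDisallow: /\n"
fixed = "User-agent: *\nDisallow:\nCrawl-delay: 20\nUser-agent: 008\nDisallow: /\n"

print("original blocks Googlebot:", is_blocked(original, "Googlebot"))
print("fixed blocks Googlebot:", is_blocked(fixed, "Googlebot"))
print("fixed blocks 008:", is_blocked(fixed, "008"))
```

Under that assumption the original file blocks everyone, while the version with the extra "Disallow:" closes the first group before the 008 line and only 008 is blocked.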