WordPress error
-
On our Google Webmaster Tools I'm getting a Severe Health Warning regarding our robots.txt file, which reads:
User-agent: *
Crawl-delay: 20
User-agent: 008
Disallow: /
I'm wondering how I can fix this and stop it from happening again.
The site was hacked about 4 months ago but I thought we'd managed to clear things up.
Colin
-
This is my first post on SEOmoz, so bear with me.
The way I understand it, robots read the robots.txt file from top to bottom, and once they find a rule that applies to them, they stop reading and begin crawling. So a robots.txt written as:
User-agent:*
Disallow:
Crawl-delay: 20
User-agent: 008
Disallow: /
would not have the desired result, because user-agent 008 would first read the top group:
User-agent: *
Disallow:
Crawl-delay: 20
and then begin crawling your site, because that first group tells all user-agents that no pages or directories are disallowed.
The corrected way to write this would be:
User-agent: 008
Disallow: /
User-agent: *
Disallow:
Crawl-delay: 20
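If you want to sanity-check the corrected file before uploading it, Python's standard urllib.robotparser module can parse it offline. This is a minimal sketch: the home-page URL is just a sample, and Python's parser is not Googlebot, so treat the result as a rough check rather than a prediction of what Google will do. Note also that this parser always consults named groups before the "*" group, so it gives the same answer whichever order the two groups are written in.

```python
from urllib.robotparser import RobotFileParser

# The corrected robots.txt from the reply above, one directive per line.
CORRECTED = """\
User-agent: 008
Disallow: /
User-agent: *
Disallow:
Crawl-delay: 20
"""

# parse() takes a list of lines, so nothing is fetched over the network.
rp = RobotFileParser()
rp.parse(CORRECTED.splitlines())

home = "http://nile-cruises-4u.co.uk/"

print(rp.can_fetch("008", home))        # False: the 008 group blocks everything
print(rp.can_fetch("Googlebot", home))  # True: falls back to the "*" group
print(rp.crawl_delay("Googlebot"))      # 20, taken from the "*" group
```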
-
Hi Peter,
I've tested the robots.txt file in Webmaster Tools, and it now seems to be working as it should; Google seems to be seeing the same file I have on the server.
I'm afraid this side of things isn't my area of expertise, so it's been a bit of a minefield.
I've taken out a subscription with sucuri.net and taken various other steps that will hopefully help with security. But who knows?
Thanks,
Colin
-
Google is seeing the same Robots.txt content (in GWT) that you show in the physical file, right? I just want to make sure that, when the site was hacked, no changes were made that are showing different versions of files to Google. It sounds like that's not the case here, but it definitely can happen.
-
The blog isn't showing now, and my hosts say that the index.php file is missing from the directory, but I can see it.
Strange.
Have contacted them again to see what the problem can be.
Bit of a wasted Saturday!
-
Thanks Keith. Just contacting our hosts.
Nightmare!
-
Looks like a 403 permissions problem; that's a server-side error... Make sure you have the correct permissions set on the blog folder in IIS. Personally, I always host on Linux...
-
Mind you, the whole blog is now showing an error message and can't be viewed, so it looks like an afternoon of trial and error!
-
Thanks very much Keith. I've just edited the file as suggested.
I see the error, but as I am the web guy, I can't figure out how to get rid of it.
I think a plugin might be causing it, so I'm going to disable them and re-enable them one at a time.
I've just PM'd you, by the way.
Thanks for your help Keith.
Colin
-
Use this:
User-agent: *
Disallow: /blog/wp-admin/
Disallow: /blog/wp-includes/
Sitemap: http://nile-cruises-4u.co.uk/sitemap.xml
And FYI, you have the following error on your blog:
Warning: is_readable() [function.is-readable]: open_basedir restriction in effect. File(D:\home\nile-cruises-4u.co.uk\wwwroot\blog/wp-content/plugins/D:\home\nile-cruises-4u.co.uk\wwwroot\blog\wp-content\plugins\websitedefender-wordpress-security/languages/WSDWP_SECURITY-en_US.mo) is not within the allowed path(s): (D:\home\nile-cruises-4u.co.uk\wwwroot) in D:\home\nile-cruises-4u.co.uk\wwwroot\blog\wp-includes\l10n.php on line 339
Get your web guy to look at that; it appears at the top of every blog page for me...
Hope that helps,
Keith
-
Thanks Keith.
Only part of our site is WordPress-based. Would that be a problem with the example you kindly suggested?
-
Above, I gave you an example of a basic robots.txt file that I use on one of my WordPress sites; I would suggest using that for now.
I would not bother messing around with crawl delay in robots.txt; as Peter said above, there are better ways to achieve this... Plus, I doubt you need it anyway.
In my experience, Google normally caches robots.txt for about 24 hours... So it's possible Google is still using the old cached version.
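Since only part of the site runs WordPress, a quick offline check with Python's standard urllib.robotparser can confirm that the file suggested earlier only hides the two WordPress folders under /blog/ and leaves the rest of the site crawlable. This is a minimal sketch: the sample paths are hypothetical, and Python's parser only approximates how Googlebot reads the file.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt suggested earlier in the thread, one directive per line.
SUGGESTED = """\
User-agent: *
Disallow: /blog/wp-admin/
Disallow: /blog/wp-includes/
Sitemap: http://nile-cruises-4u.co.uk/sitemap.xml
"""

rp = RobotFileParser()
rp.parse(SUGGESTED.splitlines())

SITE = "http://nile-cruises-4u.co.uk"
# Hypothetical sample paths: a static page, a blog post, and the two
# WordPress folders the file is meant to hide.
for path in ["/", "/index.html", "/blog/some-post/",
             "/blog/wp-admin/", "/blog/wp-includes/js/"]:
    verdict = "allowed" if rp.can_fetch("Googlebot", SITE + path) else "blocked"
    print(f"{path:25} {verdict}")

# Sitemap lines are collected regardless of which user-agent group they sit in.
print(rp.site_maps())
```

Only paths starting with /blog/wp-admin/ or /blog/wp-includes/ come back as blocked; the non-WordPress pages and ordinary blog posts stay allowed.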
-
Hi Guys,
Thanks so much for your help. As you say, Troy, that's definitely not what I want.
I assumed when we were hacked (twice in 8 months) that it might have been a competitor, as we are in a very competitive niche. I might be very wrong there, but we have certainly lost our top ranking on Google.co.uk for our main key phrases and are now at about position 7 for the same key phrases, after about 3 years at number 1.
So when I saw on Google Webmaster Tools yesterday that we had a severe health warning and that Googlebot was being prevented from crawling our site, I thought it might be the aftereffects of the hack.
Today, even though I changed the robots.txt file yesterday, GWT is showing 1000 pages with errors (285 Access Denied and 719 Not Found) and this message: Googlebot is blocked from http://nile-cruises-4u.co.uk/
I've just tested the robots.txt via GWT and now get this message:
Allowed
Detected as a directory; specific files may have different restrictions
So maybe the pages will be accessible to Googlebot shortly and the Access Denied message will disappear. I've changed the robots.txt file to:
User-agent: *
Crawl-delay: 20
But should I change it to a better version? Sorry guys, I'm an online travel agent and not great on coding and really techie stuff, although I'm learning pretty quickly about the bad stuff! I seem to have a few problems getting this sorted and wonder if this is part of why our page position is dropping?
-
I would simplify your robots.txt to read something like:
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Sitemap: http://www.your-domain.com/sitemap.xml
-
That's odd: "008" appears to be the user agent for "80legs", a custom crawler platform. I'm seeing it in other Robots.txt files.
-
I'm not 100% sure what he's seeing, but when I plug his robots.txt into the robots analysis tool, I get this back:
Googlebot blocked by line 5: Disallow: /
Detected as a directory; specific files may have different restrictions
However, when I gave the top "User-agent: *" group its own "Disallow:" line, it seemed to fix the problem. It's as if the tool didn't understand that the "Disallow: /" was meant only for the 008 user-agent.
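One plausible explanation for the tester's result is that Crawl-delay isn't a directive Google supports, so it doesn't close the "User-agent: *" group; the "User-agent: 008" line then joins that same group, and "Disallow: /" applies to everyone. The Python sketch that follows is a deliberately simplified, hypothetical model of that grouping behaviour (it is not Google's actual parser, and parse_groups and can_crawl are made-up names). It reproduces what the tool reported for the original file and shows why the extra "Disallow:" fixes it.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Group:
    agents: list[str] = field(default_factory=list)
    # Each rule is (is_allow, path_prefix).
    rules: list[tuple[bool, str]] = field(default_factory=list)


def parse_groups(text: str) -> list[Group]:
    """Hypothetical, simplified grouping model (not Google's parser):
    consecutive User-agent lines share one group, and only an Allow or
    Disallow line closes that group's list of agents. Other directives,
    such as Crawl-delay, are skipped without ending the group."""
    groups: list[Group] = []
    current: Group | None = None
    agents_closed = True  # becomes True once the current group has a rule
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            if agents_closed:
                current = Group()
                groups.append(current)
                agents_closed = False
            current.agents.append(value.lower())
        elif key in ("allow", "disallow") and current is not None:
            current.rules.append((key == "allow", value))
            agents_closed = True
        # Crawl-delay, Sitemap and unknown keys fall through untouched.
    return groups


def can_crawl(text: str, agent: str, path: str) -> bool:
    groups = parse_groups(text)
    agent = agent.lower()
    # Prefer a group that names this crawler, else fall back to "*".
    chosen = next((g for g in groups if agent in g.agents), None)
    if chosen is None:
        chosen = next((g for g in groups if "*" in g.agents), None)
    if chosen is None:
        return True  # no applicable group: everything is allowed
    # Longest matching prefix wins; an empty Disallow matches nothing.
    best_len, allowed = -1, True
    for is_allow, prefix in chosen.rules:
        if prefix and path.startswith(prefix) and len(prefix) > best_len:
            best_len, allowed = len(prefix), is_allow
    return allowed


ORIGINAL = """\
User-agent: *
Crawl-delay: 20
User-agent: 008
Disallow: /
"""

FIXED = """\
User-agent: *
Disallow:
Crawl-delay: 20
User-agent: 008
Disallow: /
"""

print(can_crawl(ORIGINAL, "Googlebot", "/"))  # False: shares the 008 group's rules
print(can_crawl(FIXED, "Googlebot", "/"))     # True
print(can_crawl(FIXED, "008", "/"))           # False
```

For comparison, Python's own urllib.robotparser does treat a Crawl-delay line as closing the group, so it reports Googlebot as allowed for the original file. Parsers evidently disagree on this point, which is a good reason to give every group an explicit Allow or Disallow line.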
-
Honestly, I'm not sure what User-agent "008" is, but it seems harmless. Why the crawl delay? There are better ways to handle that than robots.txt if a crawler is giving you trouble.
Was there a specific message/error in GWT?
-
I think, if you have a robots.txt reading what you show above:
User-agent: *
Crawl-delay: 20
User-agent: 008
Disallow: /
That basically just says, "Don't crawl my site at all." (The "Disallow: /" means I'm not allowing anything to be crawled by any search engine that pays attention to robots.txt.)
So...I'm guessing that's not what you want?
(Bah... ignore that. "User-agent". I'm a fool.)
Actually, this seems to have solved your issue...make sure you explicitly tell all other User-agents that they are allowed:
User-agent: *
Disallow:
Crawl-delay: 20
User-agent: 008
Disallow: /
The extra "Disallow:" under "User-agent: *" says, "I'm not disallowing anything for most user-agents." The "Disallow: /" under "User-agent: 008" then seems to apply only to that crawler.