Wordpress error
-
In our Google Webmaster Tools I'm getting a Severe Health Warning regarding our robots.txt file, which reads:
User-agent: *
Crawl-delay: 20
User-agent: 008
Disallow: /
I'm wondering how I can fix this and stop it happening again.
The site was hacked about 4 months ago but I thought we'd managed to clear things up.
Colin
-
This will be my first post on SEOmoz, so bear with me.
The way I understand it, robots read the robots.txt file from top to bottom, and once they find a rule that applies to them they stop reading and begin crawling. So basically the robots.txt written as:
User-agent: *
Disallow:
Crawl-delay: 20
User-agent: 008
Disallow: /
would not have the desired result as user-agent 008 would first read the top guideline:
User-agent: *
Disallow:
Crawl-delay: 20
and then begin crawling your site, because it is first told that all user-agents are disallowed from nothing, i.e. they may crawl every page and directory.
The corrected way to write this would be:
User-agent: 008
Disallow: /
User-agent: *
Disallow:
Crawl-delay: 20
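Whether the order of the groups matters depends on the crawler's parser. As a rough local check, here is a minimal sketch that feeds both orderings to Python's standard-library `urllib.robotparser`. That module is only a stand-in (an assumption on my part): it is not Googlebot's parser and may disagree with it. The helper name `verdicts` is made up for this example.

```python
from urllib.robotparser import RobotFileParser


def verdicts(robots_txt: str) -> dict:
    """Return {user_agent: may it fetch "/"?} under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {ua: parser.can_fetch(ua, "/") for ua in ("008", "Googlebot")}


# The "*" group first, as in the first version above.
star_first = """\
User-agent: *
Disallow:
Crawl-delay: 20
User-agent: 008
Disallow: /
"""

# The 008 group first, as in the reordered version above.
specific_first = """\
User-agent: 008
Disallow: /
User-agent: *
Disallow:
Crawl-delay: 20
"""

print(verdicts(star_first))
print(verdicts(specific_first))
```

With this particular parser both orderings give the same answer: 008 is blocked and Googlebot is allowed. That suggests it matches a crawler to its own group first and only falls back to the `*` group afterwards; other crawlers may behave differently, so testing in Webmaster Tools is still the safer check.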
-
Hi Peter,
I've tested the robots.txt file in Webmaster Tools and it now seems to be working as it should, and it seems Google is seeing the same file as I have on the server.
I'm afraid this side of things isn't my area of expertise, so it's been a bit of a minefield.
I've taken a subscription with sucuri.net and taken various other steps that hopefully will help with security. But who knows?
Thanks,
Colin
-
Google is seeing the same robots.txt content (in GWT) that you show in the physical file, right? I just want to make sure that, when the site was hacked, no changes were made that are showing different versions of files to Google. It sounds like that's not the case here, but it definitely can happen.
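One partial way to look for this kind of cloaking from your own machine is to request the file with different User-Agent headers and compare the responses. The sketch below is only that, a sketch: the function names `fetch_as` and `differing_agents` are made up for illustration, and a hack that keys on Google's IP addresses rather than the User-Agent header would not show up, so the view inside GWT remains the authoritative one.

```python
import hashlib
import urllib.request


def fetch_as(url: str, user_agent: str) -> bytes:
    """Download url while sending the given User-Agent header."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read()


def differing_agents(url, user_agents, fetch=fetch_as):
    """Return the user agents whose response differs from the first agent's.

    fetch is injectable so the comparison can be tried without network access.
    """
    digests = {ua: hashlib.sha256(fetch(url, ua)).hexdigest() for ua in user_agents}
    baseline = digests[user_agents[0]]
    return [ua for ua in user_agents if digests[ua] != baseline]
```

Calling `differing_agents("http://nile-cruises-4u.co.uk/robots.txt", ["Mozilla/5.0", "Googlebot/2.1"])` would return an empty list if both requests get the same bytes back. An empty list is reassuring but not proof, for the IP reason above.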
-
Blog isn't showing now, and my hosts say that the index.php file is missing from the directory, but I can see it.
Strange.
Have contacted them again to see what the problem can be.
Bit of a wasted Saturday!
-
Thanks Keith. Just contacting our hosts.
Nightmare!
-
Looks like a 403 permissions problem; that's a server-side error... Make sure you have the correct permissions set on the blog folder in IIS. Personally I always host on Linux...
-
Mind you, the whole blog is now showing an error message and can't be viewed, so it looks like an afternoon of trial and error!
-
Thanks very much Keith. I've just edited the file as suggested.
I see the error, but as I am the web guy I can't figure out how to get rid of it.
I think it might be a plugin that's causing it, so I'm going to disable them and re-enable them one at a time.
I've just PM'd you by the way.
Thanks for your help Keith.
Colin
-
Use this:
User-agent: *
Disallow: /blog/wp-admin/
Disallow: /blog/wp-includes/
Sitemap: http://nile-cruises-4u.co.uk/sitemap.xml
And FYI, you have the following error on your blog:
Warning: is_readable() [function.is-readable]: open_basedir restriction in effect. File(D:\home\nile-cruises-4u.co.uk\wwwroot\blog/wp-content/plugins/D:\home\nile-cruises-4u.co.uk\wwwroot\blog\wp-content\plugins\websitedefender-wordpress-security/languages/WSDWP_SECURITY-en_US.mo) is not within the allowed path(s): (D:\home\nile-cruises-4u.co.uk\wwwroot) in D:\home\nile-cruises-4u.co.uk\wwwroot\blog\wp-includes\l10n.php on line 339
Get your web guy to look at that, it appears at the top of every blog page for me...
Hope that helps,
Keith
-
Thanks Keith.
Only part of our site is WP based. Would that be a problem using the example you kindly suggested?
-
Above, I gave you an example of a basic robots.txt file that I use on one of my WordPress sites; I would suggest using that for now.
I would not bother messing around with crawl delay in robots.txt. As Peter said above, there are better ways to achieve this... plus I doubt you need it anyway.
In my experience Google normally caches the robots.txt info for about 24 hours, so it's possible Google is still using the old cached version.
-
Hi Guys,
Thanks so much for your help. As you say Troy, that's definitely not what I want.
I assumed when we were hacked (twice in 8 months) that it might have been a competitor, as we are in a very competitive niche. Might be very wrong there, but we have certainly lost our top ranking on Google.co.uk for our main key phrases and are now at about position 7 for the same key phrases after about 3 years at number 1.
So when I saw on Google Webmaster Tools yesterday that we had a severe health warning and that the Googlebot was being prevented from crawling our site, I thought it might be the aftereffects of the hack.
Today, even though I changed the robots.txt file yesterday, GWT is showing 1000 pages with errors, 285 Access Denied and 719 Not Found, and this message: Googlebot is blocked from http://nile-cruises-4u.co.uk/
I've just tested the robots.txt via GWT and now get this message:
Allowed
Detected as a directory; specific files may have different restrictions
So maybe Googlebot will be able to access the pages shortly and the Access Denied message will disappear.
I've changed the robots.txt file to
User-agent: *
Crawl-delay: 20
But should I change it to a better version? Sorry guys, I'm an online travel agent and not great on coding and really techie stuff. Although I'm learning pretty quickly about the bad stuff!
I seem to have a few problems getting this sorted and wonder if this is a part of why our page position is dropping?
-
I would simplify your robots.txt to read something like:
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Sitemap: http://www.your-domain.com/sitemap.xml
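A file like this can be sanity-checked locally before uploading it. The sketch below uses Python's standard-library `urllib.robotparser` as an approximation (an assumption: it is not Google's parser), and the paths being tested are made-up examples. It also shows that the Disallow lines only affect URLs starting with those prefixes, so the non-WordPress parts of a site stay crawlable.

```python
from urllib.robotparser import RobotFileParser

# The simplified file suggested above; the domain is the placeholder from the post.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Sitemap: http://www.your-domain.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path: str, agent: str = "Googlebot") -> bool:
    """True if this parser would let the agent fetch the path."""
    return parser.can_fetch(agent, path)


# Hypothetical paths, chosen only to illustrate prefix matching.
for path in ("/", "/some-page.html", "/wp-admin/options.php", "/wp-includes/js/x.js"):
    print(path, "allowed" if allowed(path) else "blocked")

# site_maps() needs Python 3.8 or newer.
print(parser.site_maps())
```

Only the two WordPress directories come back as blocked; the root and ordinary pages are allowed, and the Sitemap line is picked up independently of the user-agent group.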
-
That's odd: "008" appears to be the user agent for "80legs", a custom crawler platform. I'm seeing it in other Robots.txt files.
-
I'm not 100% sure what he's seeing, but when I plug his robots.txt into the robots analysis tool, I get this back:
Googlebot blocked by line 5: Disallow: /
Detected as a directory; specific files may have different restrictions
However, when I gave the top "User-agent: *" the "Disallow: " it seemed to fix the problem. Like, it didn't understand that the "Disallow: /" was meant only for the 008 user-agent?
-
Not honestly sure what User-agent "008" is, but that seems harmless. Why the crawl delay? There are better ways to handle that than Robots.txt, if a crawler is giving you trouble.
Was there a specific message/error in GWT?
-
I think, if you have a robots.txt reading what you show above:
User-agent: *
Crawl-delay: 20
User-agent: 008
Disallow: /
That just basically says, "Don't crawl my site at all" (The "Disallow: /" means, I'm not allowing anything to be crawled by any search engine that pays attention to robots.txt at all)
So...I'm guessing that's not what you want?
(Bah..ignore. "User-agent". I'm a fool)
Actually, this seems to have solved your issue...make sure you explicitly tell all other User-agents that they are allowed:
User-agent: *
Disallow:
Crawl-delay: 20
User-agent: 008
Disallow: /
The extra "Disallow:" under User-agent: * says "I'm not going to disallow anything to most user-agents." Then the Disallow under user-agent 008 seems to only apply to them.
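A possible explanation for why the extra "Disallow:" helps: Google's robots.txt documentation says that lines other than user-agent, allow and disallow are ignored when grouping, so two user-agent lines separated only by a Crawl-delay line end up sharing one group. The sketch below is a deliberately simplified model of that grouping, not Google's actual parser; the function names are made up, and it only answers whether the site root is blocked.

```python
def parse_groups(robots_txt: str):
    """Split a robots.txt into (agents, rules) groups.

    Simplified model: only user-agent, allow and disallow lines count;
    everything else (crawl-delay, sitemap, ...) is skipped, so it cannot
    separate two user-agent lines.
    """
    groups = []
    agents, rules = [], []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            if rules:  # a rule line closed the previous group
                groups.append((agents, rules))
                agents, rules = [], []
            agents.append(value.lower())
        elif key in ("allow", "disallow") and agents:
            rules.append((key, value))
    if agents:
        groups.append((agents, rules))
    return groups


def root_blocked(robots_txt: str, agent: str) -> bool:
    """True if the agent's group (or the * group as fallback) has 'Disallow: /'."""
    groups = parse_groups(robots_txt)
    specific = [rules for agents, rules in groups if agent.lower() in agents]
    fallback = [rules for agents, rules in groups if "*" in agents]
    chosen = specific or fallback
    return any(k == "disallow" and v == "/" for rules in chosen for k, v in rules)


original = "User-agent: *\nCrawl-delay: 20\nUser-agent: 008\nDisallow: /\n"
fixed = "User-agent: *\nDisallow:\nCrawl-delay: 20\nUser-agent: 008\nDisallow: /\n"

print(parse_groups(original))
print(root_blocked(original, "Googlebot"), root_blocked(fixed, "Googlebot"))
```

Under this model the original file collapses into a single group for both `*` and `008`, so "Disallow: /" hits Googlebot too. With the empty "Disallow:" in place, the `*` group is closed before the 008 line and Googlebot is no longer blocked, which matches what the analysis tool reported.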