WordPress error
-
In Google Webmaster Tools I'm getting a Severe Health Warning regarding our robots.txt file, which reads:
User-agent: *
Crawl-delay: 20
User-agent: 008
Disallow: /
I'm wondering how I can fix this and stop it happening again.
The site was hacked about 4 months ago but I thought we'd managed to clear things up.
Colin
-
This will be my first post on SEOmoz, so bear with me.
The way I understand it is that robots read the robots.txt file from top to bottom, and once they find a rule that applies to them they stop reading and begin crawling. So basically, a robots.txt written as:
User-agent: *
Disallow:
Crawl-delay: 20
User-agent: 008
Disallow: /
would not have the desired result, as user-agent 008 would first read the top rule:
User-agent: *
Disallow:
Crawl-delay: 20
and then begin crawling your site, as it is first told that all user-agents are allowed to crawl every page and directory.
The corrected way to write this would be:
User-agent: 008
Disallow: /
User-agent: *
Disallow:
Crawl-delay: 20
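If you want to sanity-check the ordering before you upload the file, you can run it through a parser locally. Here's a minimal sketch using Python's standard-library urllib.robotparser (the example.com URLs are just placeholders); real crawlers won't all match groups exactly the same way, so treat it as a rough check and confirm in GWT's robots.txt tester:

```python
from urllib.robotparser import RobotFileParser

# The corrected file from above: the specific 008 group comes first.
robots_txt = """\
User-agent: 008
Disallow: /

User-agent: *
Disallow:
Crawl-delay: 20
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# 008 matches its own group and is blocked from the whole site.
print(rp.can_fetch("008", "http://example.com/"))        # False
# Any other agent falls through to the * group and is allowed.
print(rp.can_fetch("Googlebot", "http://example.com/"))  # True
```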
-
Hi Peter,
I've tested the robots.txt file in Webmaster Tools and it now seems to be working as it should, and it seems Google is seeing the same file as I have on the server.
I'm afraid this side of things isn't my area of expertise, so it's been a bit of a minefield.
I've taken out a subscription with sucuri.net and taken various other steps that hopefully will help with security. But who knows?
Thanks,
Colin
-
Google is seeing the same Robots.txt content (in GWT) that you show in the physical file, right? I just want to make sure that, when the site was hacked, no changes were made that are showing different versions of files to Google. It sounds like that's not the case here, but it definitely can happen.
-
Blog isn't showing now, and my hosts say that the index.php file is missing from the directory, but I can see it.
Strange.
Have contacted them again to see what the problem can be.
Bit of a wasted Saturday!
-
Thanks Keith. Just contacting our hosts.
Nightmare!
-
Looks like a 403 permissions problem; that's a server-side error... Make sure you have the correct permissions set on the blog folder in IIS. Personally, I always host on Linux...
-
Mind you, the whole blog is now showing an error message and can't be viewed, so it looks like an afternoon of trial and error!
-
Thanks very much Keith. I've just edited the file as suggested.
I see the error, but as I am the web guy, I can't figure out how to get rid of it.
I think it might be a plugin that's causing it, so I'm going to disable them all and re-enable them one at a time.
I've just PM'd you by the way.
Thanks for your help Keith.
Colin
-
Use this:
User-agent: *
Disallow: /blog/wp-admin/
Disallow: /blog/wp-includes/
Sitemap: http://nile-cruises-4u.co.uk/sitemap.xml
And FYI, you have the following error on your blog:
Warning: is_readable() [function.is-readable]: open_basedir restriction in effect. File(D:\home\nile-cruises-4u.co.uk\wwwroot\blog/wp-content/plugins/D:\home\nile-cruises-4u.co.uk\wwwroot\blog\wp-content\plugins\websitedefender-wordpress-security/languages/WSDWP_SECURITY-en_US.mo) is not within the allowed path(s): (D:\home\nile-cruises-4u.co.uk\wwwroot) in D:\home\nile-cruises-4u.co.uk\wwwroot\blog\wp-includes\l10n.php on line 339
Get your web guy to look at that, it appears at the top of every blog page for me...
Hope that helps,
Keith
-
Thanks Keith.
Only part of our site is WordPress-based. Would that be a problem for the example you kindly suggested?
-
I gave you an example of a basic robots.txt file that I use on one of my WordPress sites above; I would suggest using that for now.
I would not bother messing around with crawl-delay in robots.txt; as Peter said above, there are better ways to achieve this... Plus, I doubt you need it anyway.
In my experience Google normally caches the robots.txt info for about 24 hours... So it's possible the old cached version is still being used by Google.
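If you want to see exactly what's being served right now (rather than whatever Google has cached), you can fetch the live file yourself. A quick sketch with Python's standard library, using Colin's domain from above:

```python
from urllib.request import urlopen

# Pull the robots.txt the server is currently returning; compare it
# with the copy shown in Webmaster Tools to spot a stale cache.
with urlopen("http://nile-cruises-4u.co.uk/robots.txt") as resp:
    print(resp.read().decode("utf-8", errors="replace"))
```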
-
Hi Guys,
Thanks so much for your help. As you say Troy, that's definitely not what I want.
I assumed when we were hacked (twice in 8 months) that it might have been a competitor, as we are in a very competitive niche. I might be very wrong there, but we have certainly lost our top ranking on Google.co.uk for our main key phrases and are now at about position 7 after about 3 years at number 1.
So when I saw on Google Webmaster Tools yesterday that we had a severe health warning and that Googlebot was being prevented from crawling our site, I thought it might be the after-effects of the hack.
Today, even though I changed the robots.txt file yesterday, GWT is showing 1,000 pages with errors (285 Access Denied and 719 Not Found) and this message: Googlebot is blocked from http://nile-cruises-4u.co.uk/
I've just tested the robots.txt via GWT and now get this message:
Allowed
Detected as a directory; specific files may have different restrictions
So maybe the pages will become accessible to Googlebot shortly and the Access Denied message will disappear. I've changed the robots.txt file to:
User-agent: *
Crawl-delay: 20
But should I change it to a better version? Sorry guys, I'm an online travel agent and not great on coding and really techie stuff, although I'm learning pretty quickly about the bad stuff! I seem to have a few problems getting this sorted and wonder if this is part of why our page position is dropping?
-
I would simplify your robots.txt to read something like:
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Sitemap: http://www.your-domain.com/sitemap.xml
-
That's odd: "008" appears to be the user agent for "80legs", a custom crawler platform. I'm seeing it in other Robots.txt files.
-
I'm not 100% sure what he's seeing, but when I plug his robots.txt into the robots analysis tool, I get this back:
Googlebot blocked by line 5: Disallow: /
Detected as a directory; specific files may have different restrictions
However, when I gave the top "User-agent: *" group its own "Disallow:" line, it seemed to fix the problem. It's as if the tool didn't understand that the "Disallow: /" was meant only for the 008 user-agent?
-
Not honestly sure what User-agent "008" is, but that seems harmless. Why the crawl delay? There are better ways to handle that than Robots.txt, if a crawler is giving you trouble.
Was there a specific message/error in GWT?
-
I think, if you have a robots.txt reading what you show above:
User-agent: *
Crawl-delay: 20
User-agent: 008
Disallow: /
That just basically says, "Don't crawl my site at all." (The "Disallow: /" means: I'm not allowing anything to be crawled by any search engine that pays attention to robots.txt.)
So...I'm guessing that's not what you want?
(Bah..ignore. "User-agent". I'm a fool)
Actually, this seems to have solved your issue...make sure you explicitly tell all other User-agents that they are allowed:
User-agent: *
Disallow:
Crawl-delay: 20
User-agent: 008
Disallow: /
The extra "Disallow:" under User-agent: * says "I'm not going to disallow anything to most user-agents." Then the Disallow under user-agent 008 seems to only apply to them.