First Crawl Report
-
Just joined SEOMoz today and am slightly overwhelmed, but excited about learning loads from it.
I've just received my Crawl Report and there is a
404 : UserPreemptionError:
http://www.iainmoran.com/comments/feed/This is a WordPress site and I've no idea what the best course of action to take. I've done some searching on Google and a couple of sites suggest removing that url from within the robots.txt file. I'm using the Yoast Plugin which apparently creates a robots.txt file, but I can't see any way to edit it. Is there another solution for resolving the 404 error?
Many thanks,
Iain.
-
John, thank you so much for your help with this. It's very much appreciated.
That does explain why I couldn't find the robots.txt file.
Is there another way to resolve the 404 error warning that SEOMoz is telling me about?
404 : UserPreemptionError:
http://www.iainmoran.com/comments/feed/I understand that if I disable comments on my WP site, that would fix the issue, but don't really want to do that if it can be avoided.
Thanks again,
Iain.
-
Actually, on further investigation, blocking the includes folder does not effect ranking, as the content is loaded server side (PHP)
I don't imagine that the robot.txt file is your issue, as its just stopping people accessing your include and admin folder, which you want
-
Iain
I dont use WP, but just Googled this
"I believe WordPress itself creates the robots.txt file and it is a virtual file, meaning there won't be a hard copy on your server for you to edit. You could create one yourself and upload it to your server and I think search engines will use that one instead, or you can use a plugin like this one, that will let you edit your robots.txt file from the WordPress admin."
Explains why you cant find it !
Basically that suggest creating one yourself and uploading it to the server.
Create a note pad file, add your content, name file robot.txt (must change extension to .txt), and that should work
-
Iain
Just to be sure, its actually just called a robot.txt (not .text)
I just checked, you do have one in your root file
http://www.iainmoran.com/robots.txt
Bizarrely, its blocking Google indexing your includes, which is a problem cause most of your links will be included in some sort of nav include, not to mention most of your website content.
Its defiantly there, maybe check your public folder and www folder
Let me know how you get on
John
-
Thanks for your quick response, Johnny,
I've just looked at my root directory via both my CPanel and FTP, but there is no robots.text file there.
I don't understand why this is, as in the Yoast section within each of my pages, there are some which I have set to "nofollow"
Also, within the Yoast CP there is a section titled: Robots.txt and says the following under it:
"If you had a robots.txt file and it was editable, you could edit it from here."Iain.
-
Iain
You robots.xt file can be found within your root directory
Where ever your website is hosted, log in there (if you have cpanel which is more than likely) and navigate to your public folder. the robots.text file should reside there, download it and edit in any software really, wordpad etc.
If you don't have cpanel, you can add your FTP credential using an FTP client and navigate the same way
When you edit it, upload it back to your server.
Regards
J
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
What is a good crawl budget?
Hi Community! I am in the process of updating sitemaps and am trying to obtain a standard for what is considered "strong" crawl budget? Every documentation I've found includes how to make it better or what to watch out for. However, I'm looking for an amount to obtain for (ex: 60% of the sitemap has been crawled, 100%, etc.)
Technical SEO | | yaelslater1 -
Why does Bing bot crawl so aggressively?
We observer that the Bing bot is crawling our site very aggressively. We set Bing's crawl control so that it should not crawl us during heavy traffic hours, but that did not change a thing. Does anyone have the problem and even better a solution?
Technical SEO | | Roverandom1 -
Will Google crawl and rank our ReactJS website content?
We have 250+ products dynamically inserted and sorted on our site daily (more specifically our homepage... yes, it's a long page). Our dev team would like to explore rendering the page server-side using ReactJS. We currently use a CDN to cache all the content, which of course we would like to continue using. SO... will Google be able to crawl that content? We've read some articles with different ideas (including prerendering): http://andrewhfarmer.com/react-seo/
Technical SEO | | Jane.com
http://www.seoskeptic.com/json-ld-big-day-at-google/ If we were to only load the schema important to the page (like product title, image, price, description, etc.) from the server and then let the client render the remaining content (comments, suggested products, etc.), would that go against best practices? It seems like that might be seen as showing the googlebot 1 version and showing the site visitor a different (more complete) version.0 -
How long after disallowing Googlebot from crawling a domain until those pages drop out of their index?
We recently had Google crawl a version of the site we that we had thought we had disallowed already. We have corrected the issue of them crawling the site, but pages from that version are still appearing in the search results (the version we want them to not index and serve up is our .us domain which should have been blocked to them). My question is this: How long should I expect that domain (the .us we don't want to appear) to stay in their index after disallowing their bot? Is this a matter of days, weeks, or months?
Technical SEO | | TLM0 -
Webmaster tools reporting spurious errors?
For the past 3 or so months Webmaster tools has been reporting 404 errors on my pages... The odd thing is that I can't figure out what they are seeing. Here is an example of a link they claim is a 404 antiquebanknotes/nationalcurrency/rare/1895-Ten-Dollar-Bill.aspx This is strange because it's a malformed URL. It says it's linked from this page: http://www.antiquebanknotes.com/antiquebanknotes/rare/1882-twenty-dollar-bill.aspx Which is a URL that doesn't exist. The bolded portion of this URRL shouldn't be there. Can anyone give me an idea what is happening here? Kind regards, Greg
Technical SEO | | Banknotes1 -
Odd URL errors upon crawl
Hi, I see this in Google Webmasters, and am now also seeing it here...when a crawl is performed on my site, I get many 500 server error codes for URLs that I don't believe exist. It's as if it sees a normal URL but adds this to it: %3Cdiv%20id= It's like this for hundreds of URLs. Good URL that actually exists http://www.ffr-dsi.com/food-retailing/supplies/ URL that causes error and I have no idea why http://www.ffr-dsi.com/food-retailing/supplies/%3Cdiv%20id= Thanks!
Technical SEO | | Matt10 -
What to do about Google Crawl Error due to Faceted Navigation?
We are getting many crawl errors listed in Google webmaster tools. We use some faceted navigation with several variables. Google sees these as "500" response code. It looks like Google is truncating the url. Can we tell Google not to crawl these search results using part of the url ("sort=" for example)? Is there a better way to solve this?
Technical SEO | | EugeneF0 -
Internal Links not Crawled by Open Site Explorer
Can someone plz tell me why www.hotelelgreco.gr has only 2 internal links in OSE despite the fact that the text content has a plethora of them. Thanks in advance.
Technical SEO | | socrateskirtsios0