What's wrong with this robots.txt?
-
Hi, I'm really struggling with my robots.txt file.
This is it:
User-agent: *
Disallow: /product/
#old sitemap
Disallow: /media/name.xml
When I test it on w3c.org everything looks good, but once it is uploaded to the server, Google Webmaster Tools reports 3 errors. I checked it with my colleague, and neither of us knows what's wrong.
Can someone take a look at this and suggest a solution?
Thanks in advance!
Leonie
-
I think that's a great idea. .NET is not my thing.
All the best!
Tom
-
Ah, thanks. It's an Azure platform, so no SFTP, SSH or .htaccess, but I'll give the Stack link to the technical guys; they will then have to translate it to our environment (.NET).
-
Believe me, it took me plenty of time to work out how to do this, but if you're handy with SFTP or SSH you can change the
And for the ultimate in ease, if you're using WordPress there is actually a plug-in for 410s, which shows it wasn't something anyone found easy to do.
https://wordpress.org/plugins/wp-410/
Sincerely,
Thomas
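For readers wondering what "making a 410" means in practice, here is a minimal sketch in Python. It is only an illustration of the idea, not the Azure/.NET setup discussed in this thread: the `GONE_PATHS` set, the `status_for` helper, the `serve` function and port 8000 are all assumptions made up for the example. The point is simply that a deliberately removed URL answers with status 410 (Gone) instead of 404 or 200.

```python
from http import HTTPStatus
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical list of URLs that were removed on purpose.
GONE_PATHS = {"/media/name.xml"}


def status_for(path: str) -> HTTPStatus:
    """Return 410 for deliberately removed URLs, 200 for everything else."""
    clean = path.split("?", 1)[0]  # ignore any query string when matching
    return HTTPStatus.GONE if clean in GONE_PATHS else HTTPStatus.OK


class GoneHandler(BaseHTTPRequestHandler):
    """Toy handler: replies with the status chosen by status_for()."""

    def do_GET(self):
        status = status_for(self.path)
        body = status.phrase.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Run the toy server locally until interrupted (port is arbitrary)."""
    HTTPServer(("127.0.0.1", port), GoneHandler).serve_forever()
```

Calling `serve()` and requesting `/media/name.xml` should then return a 410 response. On a real site the same rule would normally live in the web server or application configuration rather than in a separate script.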
-
Hi Leonie,
That's very kind of you. I am very happy that you got it working correctly.
All the best,
Thomas
-
Hi,
I got it working with a proper sitemap. Special thanks to Thomas for the great effort in his answers!
-
Hi, thanks for your reply. I'm not sure I understand what you mean by "please note you are disallowing more than just media".
The thing is that the XML file is an old file that is still somewhere in Google's archive. I tried to remove it with WMT, but it keeps coming back. It's not on the server anymore. The directory "media" doesn't exist anymore either; it is also from an old website.
Because the file keeps coming back in WMT, I thought I'd try it with robots.txt.
The new robots.txt is not tested yet; I'm waiting for deployment.
Oh, call me stupid, but how do I make a 410?
Grtz, Leonie
-
By the way, here is the validator report for an outdated robots.txt of mine. What looks like errors is really the validator telling me that putting a sitemap inside a robots.txt file is not part of the official standard, even though Google and Bing support it. I truly feel it is helpful, so I do it. I've also added extra video sitemaps from an external host, which is where most of the warnings come from. The red color of the disallows is not an error; it is just letting you know those paths are being blocked. Hopefully this will be of help.
A bigger screenshot is here as well; please take a look at which warnings I am getting:
http://i.imgur.com/Xg7EXwO.png
http status: 200
Syntax check robots.txt on http://www.blueprintmarketing.com/robots.txt (359 bytes)
| Line | Severity | Code |
| 6 | Warning | The official standard does not include Sitemap support even though major crawlers (Google and Bing) support it. It is still nonstandard. |
| 7 | Warning | The official standard does not include Sitemap support even though major crawlers (Google and Bing) support it. It is still nonstandard. |
| 8 | Warning | The official standard does not include Sitemap support even though major crawlers (Google and Bing) support it. It is still nonstandard. |
| 9 | Warning | The official standard does not include Sitemap support even though major crawlers (Google and Bing) support it. It is still nonstandard. |
| 10 | Warning | The official standard does not include Sitemap support even though major crawlers (Google and Bing) support it. It is still nonstandard. |
Warnings Detected: 5
Errors Detected: 0
robots.txt source code for http://
| Line | Code |
| 1 | User-agent: * |
| 2 | Disallow: /wp-content/plugins/ |
| 3 | Disallow: /wp-admin/ |
| 4 | Disallow: /wp-includes/ |
| 5 | |
| 6 | Sitemap: http://www.blueprintmarketing.com/sitemap_index.xml |
| 7 | Sitemap: http://app.wistia.com/sitemaps/11323.xml |
| 8 | Sitemap: http://app.wistia.com/sitemaps/4339.xml |
| 9 | Sitemap: http://app.wistia.com/sitemaps/14213.xml |
| 10 | Sitemap: http://app.wistia.com/sitemaps/23283.xml |
-
Hi Leonie,
I believe that you should create a robots.txt file that, for all user agents, disallows the /media/ folder and the .xml file. If you make the unwanted XML file return a 410, it will be dead to Google. I think I have come up with a solution below; please try pasting it in and tell me if it does not work.
Another tool for building robots.txt files and comparing them to the existing file, from the same company believe it or not, is right here:
http://www.internetmarketingninjas.com/seo-tools/robots-txt-generator/
Please note you are disallowing more than just media. As for the old XML sitemap, why not just set it to a 410, killing the link in Google's eyes? Then you will not have to disallow it.
User-agent: *
Disallow: /product/
Disallow: /media/
Disallow: /bcc.xml
Sitemap: http://example.com/sitemap_index.xml
Put your new sitemap URL where I have placed the Sitemap line in the rules above. That will tell Google where your new sitemap resides, along with, of course, submitting it in Google Webmaster Tools and fetching it as Googlebot.
I would like to look at the architecture of your site. If you're getting errors with what you showed me and you are not comfortable showing the URL on Q&A, you can send me a private message and I promise I will respond.
I hope this is of help,
Thomas
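A quick way to sanity-check rules like these before deployment is Python's standard `urllib.robotparser`. The sketch below is hedged: it assumes Python 3.8 or later (for `site_maps()`), it uses the example.com placeholder URLs from this post, and the test paths are made up. It also only shows how Python's parser reads the file, which is not guaranteed to match Google's own parser in every detail.

```python
from urllib.robotparser import RobotFileParser

# The suggested rules, one directive per line.
ROBOTS_TXT = """\
User-agent: *
Disallow: /product/
Disallow: /media/
Disallow: /bcc.xml
Sitemap: http://example.com/sitemap_index.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical paths chosen for illustration only.
paths = ["/product/shoes", "/media/name.xml", "/media/photo.jpg", "/bcc.xml", "/index.html"]
blocked = {
    path: not parser.can_fetch("Googlebot", "http://example.com" + path)
    for path in paths
}
sitemaps = parser.site_maps()

for path, is_blocked in blocked.items():
    print(f"{path}: {'blocked' if is_blocked else 'allowed'}")
print("Sitemaps:", sitemaps)
```

Note that `Disallow: /media/` blocks everything under that folder (here both `/media/name.xml` and `/media/photo.jpg`), not just the one old XML file.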
-
Hi Dean, happy to be of help!
-
Thanks for the URL. It gives a warning on
Disallow: /product/
and
Disallow: /media/bcc.xml
I wonder why?
-
Thomas,
That's an awesome tool, thank you for sharing.
-
If you want to find out anything that could possibly be wrong with it, this tool is, in my opinion, the holy grail for diagnosing robots.txt issues. Just expect a lot more info than a simple response from it.
http://tools.seochat.com/tools/robots-txt-validator/
Sincerely,
Thomas
-
If I test the blocked URLs, they are blocked, so it looks like the file is doing what it's supposed to do. Still, it is strange that I got these errors.
@Dean Andrews, thanks, I will test it without empty lines, though I have to wait for another deployment.
-
Okay, I got these errors in Webmaster Tools. Very strange.
-
Sounds more like a bug in the tool that you're using, as I tested the syntax just now in Google Webmaster Tools and it's not causing any issues there.
-
Hi, lines containing only a comment are discarded completely, and therefore do not indicate a record boundary. However, you may need to remove the line break (not 100% sure, but worth testing):
User-agent: *
Disallow: /product/
Disallow: /media/bcc.xml
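The difference between a comment-only line and an empty line can be tried out with Python's standard `urllib.robotparser`. This is a sketch under assumptions: the `is_blocked` helper and the test paths are made up for the example, and the result reflects how Python's parser treats record boundaries, which may differ from what Google's parser or a given validator does.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt: str, path: str, agent: str = "Googlebot") -> bool:
    """Parse the given robots.txt text and report whether path is disallowed."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, path)


# A comment-only line between two rules: skipped, the record continues.
WITH_COMMENT = """\
User-agent: *
Disallow: /product/
#old sitemap
Disallow: /media/name.xml
"""

# An empty line between two rules: treated as the end of the record,
# so the rule after it no longer belongs to any User-agent.
WITH_BLANK_LINE = """\
User-agent: *
Disallow: /product/

Disallow: /media/name.xml
"""

for label, text in (("comment", WITH_COMMENT), ("blank line", WITH_BLANK_LINE)):
    print(
        label,
        "| /product/x blocked:", is_blocked(text, "/product/x"),
        "| /media/name.xml blocked:", is_blocked(text, "/media/name.xml"),
    )
```

With this parser, the rule that follows the empty line is dropped because it has no user agent attached, which is one plausible reading of a "no user agent" message and a reason to test the file without empty lines.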
-
Hi, sorry, I forgot to mention the errors:
syntax error @ User-agent: *
no user agent @ Disallow: /product/
no user agent @ Disallow: /media/name.xml
Thanx, Leonie
-
Hi Leonie, what are the 3 errors? It seems that the robots.txt file syntax is correct.