What's wrong with this robots.txt
-
Hi. really struggling with the robots.txt file
this is it:User-agent: *
Disallow: /product/#old sitemap
Disallow: /media/name.xmlWhen testing in w3c.org everything looks good, testing is okay, but when uploading it to the server, Google webmaster tools gives 3 errors. Checked it with my collegue we both don't know what's wrong.
Can someone take a look at this and give me the solution.
Thanx in advance!Leonie
-
I think thats a great Idea .net is not my thing.
All the best!
Tom
-
Ah thanks, it's an Azure platform, so no SFTP, SSH or .htaccess. but i'll give the stack link to the technical guys then they have to translate it to our environment ( .net)
-
Believe me it took me plenty of time to realize how to do this but if you're handy with SFTP or SSH you can change the
And for the ultimate in ease if you're using WordPress there is actually a plug-in for 410s so it wasn't something anyone found easy to do.
https://wordpress.org/plugins/wp-410/
Sincerely,
Thomas
-
Hi Leonie,
That's very kind of you I am very happy that you got it working correctly.
All the best,
Thomas
-
Hi ,
i got it working with a proper sitemap. Special thanks to Thomas for the great effort in his answers!
-
Hi, Thanx for your reply, i'm not sure i understand you by "please note you are disallowing more than just media"
the thing with this is the xml file is an old file but somewhere in the google archive. i tried do remove it with the wmt, but returns. It's not on the server anymore. the directory "media" doesn't exist anymore, also from an old website.
Because the file still returns in wmt i thought let's try it with the robots.txt
new robots.txt not tested waiting for deployment
Oh call me stupid, but how do i make a 410?
Grtz, Leonie
-
By the way here is an outdated site map that has when it looks like errors that really is telling me the protocol for putting a site map inside a robots.txt file is not endorsed by Google or Bing however I truly feel it is helpful so I do it. I've also added extra video site maps from an external host which is what's throwing out the errors the red color of the disallows is not a error it is just letting you know they are being blocked. Hopefully this will be of help
bigger photo is right here as well please give me a look at what errors are getting
http://i.imgur.com/Xg7EXwO.png
http status: 200
Syntax check robots.txt on http://www.blueprintmarketing.com/robots.txt (359 bytes)
| Line | Severity | Code |
| 6 | Warning | The official standard does not include Sitemap support even though major crawlers (Google and Bing) support it. It is still nonstandard. |
| 7 | Warning | The official standard does not include Sitemap support even though major crawlers (Google and Bing) support it. It is still nonstandard. |
| 8 | Warning | The official standard does not include Sitemap support even though major crawlers (Google and Bing) support it. It is still nonstandard. |
| 9 | Warning | The official standard does not include Sitemap support even though major crawlers (Google and Bing) support it. It is still nonstandard. |
| 10 | Warning | The official standard does not include Sitemap support even though major crawlers (Google and Bing) support it. It is still nonstandard. |Warnings Detected: 5
Errors Detected: 0
robots.txt source code for http://
| Line | Code |
| <a name="line-1"></a>1 | User-agent: * |
| <a name="line-2"></a>2 | Disallow: /wp-content/plugins/ |
| <a name="line-3"></a>3 | Disallow: /wp-admin/ |
| <a name="line-4"></a>4 | Disallow: /wp-includes/ |
| <a name="line-5"></a>5 | |
| <a name="line-6"></a>6 | Sitemap: http://www.blueprintmarketing.com/sitemap_index.xml |
| <a name="line-7"></a>7 | Sitemap: http://app.wistia.com/sitemaps/11323.xml |
| <a name="line-8"></a>8 | Sitemap: http://app.wistia.com/sitemaps/4339.xml |
| <a name="line-9"></a>9 | Sitemap: http://app.wistia.com/sitemaps/14213.xml |
| <a name="line-10"></a>10 | Sitemap: http://app.wistia.com/sitemaps/23283.xml | -
Hi Leonie,
I believe that you should create a robots.txt file that allows for a user agent disallow a folder /media/ and /.xml file. make the Unwanted xml file a 410 it will be dead to Google. however I think I have come up with a solution below please try pasting that in if it does not work.
A another tool for building robots.txt files and comparing them to the existing file from the same company believe it or not is right here.
http://www.internetmarketingninjas.com/seo-tools/robots-txt-generator/
please note you are disallowing more than just media you are disallowing something that should be more like this is for the xml sitemap why not just set it to a 410 killing the link in Google's eyes then you will not have to Disallow.
User-agent: *
Disallow: /product/
Disallow: /media/
Disallow: /bcc.xmlSitemap: http://example.com/sitemap_index.xml
putting your new site map in where I have placed a site map or where the rule above will give you the spot to put it will help you tell Google where your new site map resides along with of course submitting it to Google Webmaster tools and fetching it as a Google bot.
I would like to look at the architecture of your site if you're getting errors with what you showed me you can send me a private message and I promise I will respond if you are not comfortable showing the URL on Q&A.
I hope this is of help,
Thomas
-
Hi Dean happy to be of help!
-
Thanx for the url: it gives a warning on
Disallow: /product/
and
Disallow: /media/bcc.xmli wonder why?
-
Thomas,
That's an awesome tool, thank you for sharing.
-
if you want to find out anything that could possibly be wrong with that this tool is the holy grail of finding out what's wrong with robots.txt issues in my opinion just expect a lot more info than a simple response from it.
http://tools.seochat.com/tools/robots-txt-validator/
Sincerely,
Thomas
-
if i test the blocked url's they are blocked so it looks like the file is doing what's supposed to do. but still is strange i got these errors.
@Dean Andrews, thanx i will test it without empty lines, though have to wait for another deployment
-
Okay i got these errors in webmaster tools, very strange it is
-
Sounds more like a bug in the tool that you're as I tested the syntax just now in Google Webmaster Tools and it's not causing any issues there.
-
Hi, Lines containing only a comment are discarded completely, and therefore do not indicate a record boundary however you may need to remove the line break (not 100% sure but worth testing): User-agent: * Disallow: /product/ Disallow: /media/bcc.xml
-
Hi, sorry forgot to mention that
syntax error @ User-agent: *
no user agent @ Disallow: /product/
no user agent @ Disallow: /media/name.xml
Thanx, Leonie
-
Hi Leonie, what are the 3 errors as it seems that the robots.txt file syntax is correct.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Moving wordpress to it's own server
Our company wants to remove wordpress from our current windows OS server at provider 1 and move it to a new server at provider 2. Godaddy handles our DNS. I would like to have it on the same domain without masking. I would like to make a DNS entry on godaddy so that our current server and our new server can use the same URL (ie sellstuff.com). But I only want the DNS to direct traffic to our current server. The goal here is to have the new server using the same URL as the old server so nothing needs to be masked once traffic is redirected with a 301 rule in the htaccess file. But no traffic outside of the 301 rule will end up going to the new server. I would then like to edit the htaccess file on our current server to redirect to the new servers IP address when someone goes to sellstuff.com/blog. Does this make since and is it possible?
Technical SEO | | larsonElectronics0 -
Robots.txt blocking Addon Domains
I have this site as my primary domain: http://www.libertyresourcedirectory.com/ I don't want to give spiders access to the site at all so I tried to do a simple Disallow: / in the robots.txt. As a test I tried to crawl it with Screaming Frog afterwards and it didn't do anything. (Excellent.) However, there's a problem. In GWT, I got an alert that Google couldn't crawl ANY of my sites because of robots.txt issues. Changing the robots.txt on my primary domain, changed it for ALL my addon domains. (Ex. http://ethanglover.biz/ ) From a directory point of view, this makes sense, from a spider point of view, it doesn't. As a solution, I changed the robots.txt file back and added a robots meta tag to the primary domain. (noindex, nofollow). But this doesn't seem to be having any effect. As I understand it, the robots.txt takes priority. How can I separate all this out to allow domains to have different rules? I've tried uploading a separate robots.txt to the addon domain folders, but it's completely ignored. Even going to ethanglover.biz/robots.txt gave me the primary domain version of the file. (SERIOUSLY! I've tested this 100 times in many ways.) Has anyone experienced this? Am I in the twilight zone? Any known fixes? Thanks. Proof I'm not crazy in attached video. robotstxt_addon_domain.mp4
Technical SEO | | eglove0 -
Pro's & contra's: http vs https
Hi there, We are planning to take the step and go from http to https. The main reason to do this, is to mean trustfull to our clients. And of course the rumours that it would be better for ranking (in the future). We have a large e-commerce site. A part of this site ia already HTTPS. I've read a lot of info about pro's and contra's, also this MOZ article: http://moz.com/blog/seo-tips-https-ssl
Technical SEO | | Leonie-Kramer
But i want to know some experience from others who already done this. What did you encountered when changing to HTTPS, did you had ranking drops, or loss of links etc? I want to make a list form pro's and contra's and things we have to do in advance. Thanx, Leonie0 -
Google's Omitted Results - Attempt to De-Index
We're trying to get webpages from our QA site out of Google's index. We've inserted the NOINDEX tags. Google now shows only 3 results (down from 196,000), however, they offer a link to "show omitted results" at the bottom of the page. (A) Did we do something wrong? or (B) were we successful with our NOINDEX but Google will offer to show omitted results anyway? Please advise! Thanks!
Technical SEO | | BVREID0 -
301ing 404's
Hey guys, I am currently in the process of redirecting some of my 404 pages to pages like my home page. Before I do that, I am assessing the link value of the 404 pages. My question is what do you do with the 404 pages which appear to have low quality links, do you really want to redirect them to an important page on your site? What should I do with these 404 pages? CheersAdam
Technical SEO | | Adamshowbiz0 -
Do i have my robots.txt file set up properly
Hi, just doing some seo on my site and i am not sure if i have my robots file set correctly. i use joomla and my website is www.in2town.co.uk. here is my robots file, does this look correct to you User-agent: *
Technical SEO | | ClaireH-184886
Disallow: /administrator/
Disallow: /cache/
Disallow: /components/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /libraries/
Disallow: /media/
Disallow: /modules/
Disallow: /plugins/
Disallow: /templates/
Disallow: /tmp/
Disallow: /xmlrpc/ many thanks1 -
Restricted by robots.txt does this cause problems?
I have restricted around 1,500 links which are links to retailers website and links that affiliate links accorsing to webmaster tools Is this the right approach as I thought it would affect the link juice? or should I take the no follow out of the restricted by robots.txt file
Technical SEO | | ocelot0 -
How do I use the Robots.txt "disallow" command properly for folders I don't want indexed?
Today's sitemap webinar made me think about the disallow feature, seems opposite of sitemaps, but it also seems both are kind of ignored in varying ways by the engines. I don't need help semantically, I got that part. I just can't seem to find a contemporary answer about what should be blocked using the robots.txt file. For example, I have folders containing site comps for clients that I really don't want showing up in the SERPS. Is it better to not have these folders on the domain at all? There are also security issues I've heard of that make sense, simply look at a site's robots file to see what they are hiding. It makes it easier to hunt for files when they know the directory the files are contained in. Do I concern myself with this? Another example is a folder I have for my xml sitemap generator. I imagine google isn't going to try to index this or count it as content, so do I need to add folders like this to the disallow list?
Technical SEO | | SpringMountain0