What's wrong with this robots.txt
-
Hi. really struggling with the robots.txt file
this is it:User-agent: *
Disallow: /product/#old sitemap
Disallow: /media/name.xmlWhen testing in w3c.org everything looks good, testing is okay, but when uploading it to the server, Google webmaster tools gives 3 errors. Checked it with my collegue we both don't know what's wrong.
Can someone take a look at this and give me the solution.
Thanx in advance!Leonie
-
I think thats a great Idea .net is not my thing.
All the best!
Tom
-
Ah thanks, it's an Azure platform, so no SFTP, SSH or .htaccess. but i'll give the stack link to the technical guys then they have to translate it to our environment ( .net)
-
Believe me it took me plenty of time to realize how to do this but if you're handy with SFTP or SSH you can change the
And for the ultimate in ease if you're using WordPress there is actually a plug-in for 410s so it wasn't something anyone found easy to do.
https://wordpress.org/plugins/wp-410/
Sincerely,
Thomas
-
Hi Leonie,
That's very kind of you I am very happy that you got it working correctly.
All the best,
Thomas
-
Hi ,
i got it working with a proper sitemap. Special thanks to Thomas for the great effort in his answers!
-
Hi, Thanx for your reply, i'm not sure i understand you by "please note you are disallowing more than just media"
the thing with this is the xml file is an old file but somewhere in the google archive. i tried do remove it with the wmt, but returns. It's not on the server anymore. the directory "media" doesn't exist anymore, also from an old website.
Because the file still returns in wmt i thought let's try it with the robots.txt
new robots.txt not tested waiting for deployment
Oh call me stupid, but how do i make a 410?
Grtz, Leonie
-
By the way here is an outdated site map that has when it looks like errors that really is telling me the protocol for putting a site map inside a robots.txt file is not endorsed by Google or Bing however I truly feel it is helpful so I do it. I've also added extra video site maps from an external host which is what's throwing out the errors the red color of the disallows is not a error it is just letting you know they are being blocked. Hopefully this will be of help
bigger photo is right here as well please give me a look at what errors are getting
http://i.imgur.com/Xg7EXwO.png
http status: 200
Syntax check robots.txt on http://www.blueprintmarketing.com/robots.txt (359 bytes)
| Line | Severity | Code |
| 6 | Warning | The official standard does not include Sitemap support even though major crawlers (Google and Bing) support it. It is still nonstandard. |
| 7 | Warning | The official standard does not include Sitemap support even though major crawlers (Google and Bing) support it. It is still nonstandard. |
| 8 | Warning | The official standard does not include Sitemap support even though major crawlers (Google and Bing) support it. It is still nonstandard. |
| 9 | Warning | The official standard does not include Sitemap support even though major crawlers (Google and Bing) support it. It is still nonstandard. |
| 10 | Warning | The official standard does not include Sitemap support even though major crawlers (Google and Bing) support it. It is still nonstandard. |Warnings Detected: 5
Errors Detected: 0
robots.txt source code for http://
| Line | Code |
| <a name="line-1"></a>1 | User-agent: * |
| <a name="line-2"></a>2 | Disallow: /wp-content/plugins/ |
| <a name="line-3"></a>3 | Disallow: /wp-admin/ |
| <a name="line-4"></a>4 | Disallow: /wp-includes/ |
| <a name="line-5"></a>5 | |
| <a name="line-6"></a>6 | Sitemap: http://www.blueprintmarketing.com/sitemap_index.xml |
| <a name="line-7"></a>7 | Sitemap: http://app.wistia.com/sitemaps/11323.xml |
| <a name="line-8"></a>8 | Sitemap: http://app.wistia.com/sitemaps/4339.xml |
| <a name="line-9"></a>9 | Sitemap: http://app.wistia.com/sitemaps/14213.xml |
| <a name="line-10"></a>10 | Sitemap: http://app.wistia.com/sitemaps/23283.xml | -
Hi Leonie,
I believe that you should create a robots.txt file that allows for a user agent disallow a folder /media/ and /.xml file. make the Unwanted xml file a 410 it will be dead to Google. however I think I have come up with a solution below please try pasting that in if it does not work.
A another tool for building robots.txt files and comparing them to the existing file from the same company believe it or not is right here.
http://www.internetmarketingninjas.com/seo-tools/robots-txt-generator/
please note you are disallowing more than just media you are disallowing something that should be more like this is for the xml sitemap why not just set it to a 410 killing the link in Google's eyes then you will not have to Disallow.
User-agent: *
Disallow: /product/
Disallow: /media/
Disallow: /bcc.xmlSitemap: http://example.com/sitemap_index.xml
putting your new site map in where I have placed a site map or where the rule above will give you the spot to put it will help you tell Google where your new site map resides along with of course submitting it to Google Webmaster tools and fetching it as a Google bot.
I would like to look at the architecture of your site if you're getting errors with what you showed me you can send me a private message and I promise I will respond if you are not comfortable showing the URL on Q&A.
I hope this is of help,
Thomas
-
Hi Dean happy to be of help!
-
Thanx for the url: it gives a warning on
Disallow: /product/
and
Disallow: /media/bcc.xmli wonder why?
-
Thomas,
That's an awesome tool, thank you for sharing.
-
if you want to find out anything that could possibly be wrong with that this tool is the holy grail of finding out what's wrong with robots.txt issues in my opinion just expect a lot more info than a simple response from it.
http://tools.seochat.com/tools/robots-txt-validator/
Sincerely,
Thomas
-
if i test the blocked url's they are blocked so it looks like the file is doing what's supposed to do. but still is strange i got these errors.
@Dean Andrews, thanx i will test it without empty lines, though have to wait for another deployment
-
Okay i got these errors in webmaster tools, very strange it is
-
Sounds more like a bug in the tool that you're as I tested the syntax just now in Google Webmaster Tools and it's not causing any issues there.
-
Hi, Lines containing only a comment are discarded completely, and therefore do not indicate a record boundary however you may need to remove the line break (not 100% sure but worth testing): User-agent: * Disallow: /product/ Disallow: /media/bcc.xml
-
Hi, sorry forgot to mention that
syntax error @ User-agent: *
no user agent @ Disallow: /product/
no user agent @ Disallow: /media/name.xml
Thanx, Leonie
-
Hi Leonie, what are the 3 errors as it seems that the robots.txt file syntax is correct.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
What are the negative implications of listing URLs in a sitemap that are then blocked in the robots.txt?
In running a crawl of a client's site I can see several URLs listed in the sitemap that are then blocked in the robots.txt file. Other than perhaps using up crawl budget, are there any other negative implications?
Technical SEO | | richdan0 -
To integrate a blog tool onto site - or build a blog solution - what's better for SEO?
Currently looking at adding a blog to our company site subdirectory and wanted to know if there was a SEO distinction between the following methods: Integrating a bolt-on blog tool with the site to create the blog VS. just using the current site infrastructure to build blog functionality. What's better for SEO? (and if tool integration is the overwhelming response - which tool?). Cheers.
Technical SEO | | Oxfordcomma0 -
'No Follow' and 'Do Follow' links when using WordPress plugins
Hi all I hope someone can help me out with the following question in regards to 'no follow' and 'do follow' links in combination with WordPress plugins. Some plugins that deal with links i.e. link masking or SEO plugins do give you the option to 'not follow' links. Can someone speak from experience that this does actually work?? It's really quite stupid, but only occurred to me that when using the FireFox add on 'NoDoFollow' as well as looking at the SEOmoz link profile of course, 95% of my links are actually marked as FOLLOW, while the opposite should be the case. For example I mark about 90% of outgoing links as no follow within a link masking plugin. Well, why would WordPress plugins give you the option to mark links as no follow in the first place when they do in fact appear as follow for search engines and SEOmoz? Is this a WordPress thing or whatnot? Maybe they are in fact no follow, and the information supplied by SEO tools comes from the basic HTML structure analysis. I don't know... This really got me worried. Hope someone can shed a light. All the best and many thanks for your answers!
Technical SEO | | Hermski0 -
When rankings dip what's the best diagnostic procedure?
Bonjourno from 10 degrees C lighly raining Wetherby UK 🙂 Every so often SEO feels like a game of snakes & ladders. One minute your rankings go up and then then within the click of a mouse they drop back down. Like a Greek play you begin to feel our mortal lives as SEO pundits is controlled by the Google Gods. A case in point is illustrated here in this graph:
Technical SEO | | Nightwing
http://i216.photobucket.com/albums/cc53/zymurgy_bucket/lincoln-drop_zpseeb04690.jpg Now if i want to explain why the rapid dip has occured for target term "Lincoln Solicitors" here's is what i'd do: 1. Go to webmaster tools and check for crawl errors 2. See if a Google algo change has changed the rules of engagment 3. Check another site administrator hasnt tinkered with the original layout But i wonder what process do other SEO practitioners follow to explain to a disgruntled client - "Why have my rankings that i pay you to look after nose dived?" Any insights welcome:-)0 -
Alternatives to SEOmoz's Crawl Diagnistics
I really like SEOmoz's Crawl diagnostics reports, it goes through the pages and finds all sorts of valuable information, I wanted to know if there are any other services that compete against this specific service, to test the accuracy of their crawl diagnistics. Thanks
Technical SEO | | BestOdds0 -
Google indexing less url's then containded in my sitemap.xml
My sitemap.xml contains 3821 urls but Google (webmaster tools) indexes only 1544 urls. What may be the cause? There is no technical problem. Why does Google index less URLs then contained in my sitemap.xml?
Technical SEO | | Juist0 -
Robots.txt usage
Hey Guys, I am about make an important improvement to our site's robots.txt we have large number of properties on our site and we have different views for them. List, gallery and map view. By default list view shows up and user can navigate through gallery view. We donot want gallery pages to get indexed and want to save our crawl budget for more important pages. this is one example of our site: http://www.holiday-rentals.co.uk/France/r31.htm When you click on "gallery view" URL of this site will remain same in your address bar: but when you mouse over the "gallery view" tab it will show you URL with parameter "view=g". there are number of parameters: "view=g, view=l and view=m". http://www.holiday-rentals.co.uk/France/r31.htm?view=l http://www.holiday-rentals.co.uk/France/r31.htm?view=g http://www.holiday-rentals.co.uk/France/r31.htm?view=m Now my question is: I If restrict bots by adding "Disallow: ?view=" in our robots.txt will it effect the list view too? Will be very thankful if yo look into this for us. Many thanks Hassan I will test this on some other site within our network too before putting it to important one's. to measure the impact but will be waiting for your recommendations. Thanks
Technical SEO | | holidayseo0 -
Robots.txt file getting a 500 error - is this a problem?
Hello all! While doing some routine health checks on a few of our client sites, I spotted that a new client of ours - who's website was not designed built by us - is returning a 500 internal server error when I try to look at the robots.txt file. As we don't host / maintain their site, I would have to go through their head office to get this changed, which isn't a problem but I just wanted to check whether this error will actually be having a negative effect on their site / whether there's a benefit to getting this changed? Thanks in advance!
Technical SEO | | themegroup0