What's wrong with this robots.txt
-
Hi, I'm really struggling with the robots.txt file.
This is it:
User-agent: *
Disallow: /product/
#old sitemap
Disallow: /media/name.xml
When testing it at w3c.org everything looks good and the test passes, but after uploading it to the server, Google Webmaster Tools gives 3 errors. I checked it with my colleague and neither of us knows what's wrong.
Can someone take a look at this and give me the solution?
Thanx in advance!
Leonie
-
I think that's a great idea. .NET is not my thing.
All the best!
Tom
-
Ah thanks, it's an Azure platform, so no SFTP, SSH or .htaccess. But I'll give the Stack link to the technical guys; they will have to translate it to our environment (.NET).
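For the technical guys, a rough sketch of what that translation could look like in web.config. This is an assumption about the setup, not the actual site: it relies on the IIS URL Rewrite module (Azure App Service includes it), reuses the old /media/name.xml path from the question, and the rule name is invented.
<configuration>
  <system.webServer>
    <rewrite>
      <rules>
        <!-- Hypothetical rule: answer 410 Gone for the removed sitemap file -->
        <rule name="OldSitemapGone" stopProcessing="true">
          <match url="^media/name\.xml$" />
          <action type="CustomResponse" statusCode="410"
                  statusReason="Gone"
                  statusDescription="This file has been removed permanently" />
        </rule>
      </rules>
    </rewrite>
  </system.webServer>
</configuration>
With a rule like that in place, requests for the old file answer 410 Gone instead of 404, which is what tells Google the URL is intentionally dead.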
-
Believe me, it took me plenty of time to figure out how to do this, but if you're handy with SFTP or SSH you can change the .htaccess file to return a 410.
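For example, a minimal sketch of that approach (assuming an Apache host with mod_alias, and reusing the old /media/name.xml path from your question):
# Return 410 Gone for the file that no longer exists
Redirect gone /media/name.xml
One line per dead URL is enough; Apache then answers 410 instead of 404 for that path.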
And for the ultimate in ease, if you're using WordPress there is actually a plug-in for 410s, since it was never something anyone found easy to do by hand.
https://wordpress.org/plugins/wp-410/
Sincerely,
Thomas
-
Hi Leonie,
That's very kind of you. I am very happy that you got it working correctly.
All the best,
Thomas
-
Hi,
I got it working with a proper sitemap. Special thanks to Thomas for the great effort in his answers!
-
Hi, thanx for your reply. I'm not sure I understand what you mean by "please note you are disallowing more than just media".
The thing is, the xml file is an old file that is still somewhere in the Google archive. I tried to remove it with WMT, but it keeps coming back. It's not on the server anymore, and the directory "media" doesn't exist anymore either; both are from an old website.
Because the file still shows up in WMT, I thought I'd try it with the robots.txt.
The new robots.txt is not tested yet; I'm waiting for deployment.
Oh, call me stupid, but how do I make a 410?
Grtz, Leonie
-
By the way, here is my own robots.txt with its validator output. What look like errors are really just warnings telling me that putting a sitemap inside a robots.txt file is not part of the official standard, even though Google and Bing support it; I truly feel it is helpful, so I do it anyway. I've also added extra video sitemaps from an external host, which is what's producing the warnings. The red color on the disallows is not an error; it is just letting you know those paths are being blocked. Hopefully this will be of help.
A bigger photo is right here as well; please take a look at what the errors actually are:
http://i.imgur.com/Xg7EXwO.png
http status: 200
Syntax check robots.txt on http://www.blueprintmarketing.com/robots.txt (359 bytes)
| Line | Severity | Code |
| 6 | Warning | The official standard does not include Sitemap support even though major crawlers (Google and Bing) support it. It is still nonstandard. |
| 7 | Warning | The official standard does not include Sitemap support even though major crawlers (Google and Bing) support it. It is still nonstandard. |
| 8 | Warning | The official standard does not include Sitemap support even though major crawlers (Google and Bing) support it. It is still nonstandard. |
| 9 | Warning | The official standard does not include Sitemap support even though major crawlers (Google and Bing) support it. It is still nonstandard. |
| 10 | Warning | The official standard does not include Sitemap support even though major crawlers (Google and Bing) support it. It is still nonstandard. |
Warnings Detected: 5
Errors Detected: 0
robots.txt source code for http://www.blueprintmarketing.com/robots.txt
| Line | Code |
| 1 | User-agent: * |
| 2 | Disallow: /wp-content/plugins/ |
| 3 | Disallow: /wp-admin/ |
| 4 | Disallow: /wp-includes/ |
| 5 | |
| 6 | Sitemap: http://www.blueprintmarketing.com/sitemap_index.xml |
| 7 | Sitemap: http://app.wistia.com/sitemaps/11323.xml |
| 8 | Sitemap: http://app.wistia.com/sitemaps/4339.xml |
| 9 | Sitemap: http://app.wistia.com/sitemaps/14213.xml |
| 10 | Sitemap: http://app.wistia.com/sitemaps/23283.xml |
-
Hi Leonie,
I believe you should create a robots.txt file where the user-agent rules disallow the /media/ folder and the .xml file. Better still, make the unwanted xml file a 410 and it will be dead to Google. I think I have come up with a solution below; please try pasting that in.
If it does not work, another tool for building robots.txt files and comparing them to your existing file (from the same company, believe it or not) is right here:
http://www.internetmarketingninjas.com/seo-tools/robots-txt-generator/
Please note you are disallowing more than just /media/; what you are disallowing should look more like the block below. And as for the old xml sitemap, why not just set it to a 410? That kills the link in Google's eyes, and then you will not have to disallow it at all.
User-agent: *
Disallow: /product/
Disallow: /media/
Disallow: /bcc.xml
Sitemap: http://example.com/sitemap_index.xml
Putting your new sitemap in where I have placed the Sitemap line above will tell Google where your new sitemap resides, along with, of course, submitting it in Google Webmaster Tools and fetching the site as Googlebot.
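Once the old xml file actually returns a 410, you can sanity-check it from the command line; a quick sketch with a placeholder domain (assuming curl is available):
curl -I http://www.example.com/media/name.xml
The first line of the response should then read something like "HTTP/1.1 410 Gone" rather than a 200 or 404.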
I would like to look at the architecture of your site if you're getting errors with what you showed me. If you are not comfortable showing the URL on Q&A, you can send me a private message and I promise I will respond.
I hope this is of help,
Thomas
-
Hi Dean, happy to be of help!
-
Thanx for the url. It gives a warning on
Disallow: /product/
and
Disallow: /media/bcc.xml
I wonder why?
-
Thomas,
That's an awesome tool, thank you for sharing.
-
If you want to find out anything that could possibly be wrong with it, this tool is, in my opinion, the holy grail for diagnosing robots.txt issues; just expect a lot more info than a simple response from it.
http://tools.seochat.com/tools/robots-txt-validator/
Sincerely,
Thomas
-
If I test the blocked URLs they are blocked, so it looks like the file is doing what it's supposed to do. But it is still strange that I got these errors.
@Dean Andrews, thanx, I will test it without empty lines, though I have to wait for another deployment.
-
Okay, I got these errors in Webmaster Tools; it is very strange.
-
Sounds more like a bug in the tool that you're using, as I tested the syntax just now in Google Webmaster Tools and it's not causing any issues there.
-
Hi, lines containing only a comment are discarded completely and therefore do not indicate a record boundary. However, you may need to remove the line break (not 100% sure, but worth testing):
User-agent: *
Disallow: /product/
Disallow: /media/bcc.xml
-
Hi, sorry, I forgot to mention them. The errors are:
syntax error @ User-agent: *
no user agent @ Disallow: /product/
no user agent @ Disallow: /media/name.xml
Thanx, Leonie
-
Hi Leonie, what are the 3 errors? It seems that the robots.txt file syntax is correct.