Robots txt
-
We have a development site that we want google and other bots to stay out of but we want roger to have access.
Currently our robots.txt looks like this:
User-agent: *
Disallow: /cgi-bin/
Disallow: /development/What would i need to addd or change to let him through?
Thank you.
-
Thank you very much.
Its looks so simple but when you've not done this before it can be scary.
I also didn't want to get it wrong and miss roger on his crawl tomorrow!
-
<code>This should work fine:</code>
<code>User-agent: *
Disallow: /cgi-bin/
Disallow: /development/</code><code>User-agent: rogerbot Allow: /development/</code> ```
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Robots.txt allows wp-admin/admin-ajax.php
Hello, Mozzers!
Technical SEO | | AndyKubrin
I noticed something peculiar in the robots.txt used by one of my clients: Allow: /wp-admin/admin-ajax.php What would be the purpose of allowing a search engine to crawl this file?
Is it OK? Should I do something about it?
Everything else on /wp-admin/ is disallowed.
Thanks in advance for your help.
-AK:2 -
One server, two domains - robots.txt allow for one domain but not other?
Hello, I would like to create a single server with two domains pointing to it. Ex: domain1.com -> myserver.com/ domain2.com -> myserver.com/subfolder. The goal is to create two separate sites on one server. I would like the second domain ( /subfolder) to be fully indexed / SEO friendly and have the robots txt file allow search bots to crawl. However, the first domain (server root) I would like to keep non-indexed, and the robots.txt file disallowing any bots / indexing. Does anyone have any suggestions for the best way to tackle this one? Thanks!
Technical SEO | | Dave1000 -
Easy Question: regarding no index meta tag vs robot.txt
This seems like a dumb question, but I'm not sure what the answer is. I have an ecommerce client who has a couple of subdirectories "gallery" and "blog". Neither directory gets a lot of traffic or really turns into much conversions, so I want to remove the pages so they don't drain my page rank from more important pages. Does this sound like a good idea? I was thinking of either disallowing the folders via robot.txt file or add a "no index" tag or 301redirect or delete them. Can you help me determine which is best. **DEINDEX: **As I understand it, the no index meta tag is going to allow the robots to still crawl the pages, but they won't be indexed. The supposed good news is that it still allows link juice to be passed through. This seems like a bad thing to me because I don't want to waste my link juice passing to these pages. The idea is to keep my page rank from being dilluted on these pages. Kind of similar question, if page rank is finite, does google still treat these pages as part of the site even if it's not indexing them? If I do deindex these pages, I think there are quite a few internal links to these pages. Even those these pages are deindexed, they still exist, so it's not as if the site would return a 404 right? ROBOTS.TXT As I understand it, this will keep the robots from crawling the page, so it won't be indexed and the link juice won't pass. I don't want to waste page rank which links to these pages, so is this a bad option? **301 redirect: **What if I just 301 redirect all these pages back to the homepage? Is this an easy answer? Part of the problem with this solution is that I'm not sure if it's permanent, but even more importantly is that currently 80% of the site is made up of blog and gallery pages and I think it would be strange to have the vast majority of the site 301 redirecting to the home page. What do you think? DELETE PAGES: Maybe I could just delete all the pages. This will keep the pages from taking link juice and will deindex, but I think there's quite a few internal links to these pages. How would you find all the internal links that point to these pages. There's hundreds of them.
Technical SEO | | Santaur0 -
Can't find mistake in robots.txt
Hi all, we recently filled our robots.txt file to prevent some directories from crawling. Looks like: User-agent: * Disallow: /Views/ Disallow: /login/ Disallow: /routing/ Disallow: /Profiler/ Disallow: /LILLYPROFILER/ Disallow: /EventRweKompaktProfiler/ Disallow: /AccessIntProfiler/ Disallow: /KellyIntProfiler/ Disallow: /lilly/ now, as Google Webmaster Tools hasn't updated our robots.txt yet, I checked our robots.txt in some ckeckers. They tell me that the User agent: * contains an error. **Example:** **Line 1: Syntax error! Expected <field>:</field> <value></value> 1: User-agent: *** **`I checked other robots.txt written the same way --> they work,`** accordign to the checkers... **`Where the .... is the mistake???`** ```
Technical SEO | | accessKellyOCG0 -
Robots.txt and joomla
Hello, I use joomla for my website and automatically all those files are blocked is that good or bad, so I remove anything and if so why ? User-agent: *
Technical SEO | | seoanalytics
Disallow: /administrator/
Disallow: /cache/
Disallow: /components/
Disallow: /images/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /libraries/
Disallow: /media/
Disallow: /modules/
Disallow: /plugins/
Disallow: /templates/
Disallow: /tmp/
Disallow: /xmlrpc/ I also added to my robots.txt files my email address ( is that useful, I am afraid google passes PR to the email address )
and a javascript: void (0) because I have tabs on my webpage ( is that useful )
as well as a .pdf ( is it also useful ) any comments ? does anything need to be changed or is it ok ? Thank you,0 -
Confirming Robots.txt code deep Directories
Just want to make sure I understand exactly what I am doing If I place this in my Robots.txt Disallow: /root/this/that By doing this I want to make sure that I am ONLY blocking the directory /that/ and anything in front of that. I want to make sure that /root/this/ still stays in the index, its just the that directory I want gone. Am I correct in understanding this?
Technical SEO | | cbielich0 -
Should I add my blog posts to my sitemap.txt file?
This seems like it should be an obvious no, just because of the amount of work that would entail, and then remembering to do it every time I make a post, but since I couldn't find anything on Google about it and have never heard anyone mention it, I figured I'd ask.
Technical SEO | | UnderRugSwept0 -
Impact of "restricted by robots" crawler error in WT
I have been wondering about this for a while now with regards to several of my sites. I am getting a list of pages that I have blocked in the robots.txt file. If I restrict Google from crawling them, then how can they consider their existence an error? In one case, I have even removed the urls from the index. And do you have any idea of the negative impact associated with these errors. And how do you suggest I remedy the situation. Thanks for the help
Technical SEO | | phogan0