Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies.
Multiple robots.txt files on server
-
Hi!
I previously hired a developer to build my site and noticed afterwards that he did not know much about SEO. This led me to start learning it myself and applying changes step by step.
One of the things I am currently doing is adding a sitemap reference to the robots.txt file (there was none before). But just now, when I wanted to upload the file via FTP to my server, I found multiple robots.txt files of different sizes, and I don't know what to do with them. Can I remove them? I have downloaded and opened them, and they seem to be 2 text files and 2 duplicates. Names:
robots.txt (duplicate of the original)
robots.txt-Original (original)
robots.txt-NEW (other content)
robots.txt-Working (duplicate of the other content)
I would really appreciate help and expert suggestions. Thanks!
-
So what's the best policy if a site uses an e-commerce platform like Magento, which has a robots.txt file, but also has a WordPress blog installed in another folder (e.g. /blog) with a plugin like Yoast that generated a robots.txt file for the WordPress installation?
Then you have 2 robots.txt files. Is this detrimental or no big deal?
-
Thanks very much for the help!
-
Keep a backup and remove them.
Search engines only look at the file named exactly robots.txt; variations of the file name will be ignored.
Do make sure the entries in the main one are correct, though: you don't want Google crawling admin pages or other confidential areas of the site.
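If it helps, here is a minimal sketch of how you could check the entries in the main file before uploading it. It uses Python's standard urllib.robotparser module; the file content, the /admin/ path, and the example.com sitemap URL are all made-up placeholders, not taken from the asker's site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the path and sitemap URL are placeholders.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Sitemap: https://www.example.com/sitemap.xml
"""


def load_rules(text: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# Ask whether a generic crawler ("*") may fetch each URL under these rules.
admin_allowed = rules.can_fetch("*", "https://www.example.com/admin/login")
home_allowed = rules.can_fetch("*", "https://www.example.com/")

# site_maps() (Python 3.8+) lists the Sitemap entries found in the file.
sitemaps = rules.site_maps()

print("admin page crawlable:", admin_allowed)
print("home page crawlable:", home_allowed)
print("sitemaps:", sitemaps)
```

This only shows how the standard-library parser reads the rules; individual search engines may interpret some directives differently, so treat it as a sanity check rather than a guarantee.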
-
Hi, thanks for the answer and help!
Well, I only have one domain with a website and no active subdomains (no blog subdomain or similar), so how does that apply to my situation? Can I just remove them all and upload the one I want?
-
That's a good question, EMS. The robots.txt protocol can get kind of confusing when you think about it too long, and it sounds like you've thought about this a bit. However, in this case, it might help to look at robots.txt from the perspective of the spider.
When a spider finds a URL, it takes the whole domain name (everything between 'http://' and the next '/'), then sticks a '/robots.txt' on the end of it and looks for that file. If that file exists, then the spider should read it to see where it is allowed to crawl.
In your case, Googlebot, or any other spider, should try to access three URLs: domainA.com/robots.txt, domainB.domainA.com/robots.txt, and domainB.com/robots.txt. The rules in each are treated as separate, so disallowing robots from domainA.com/ should result in domainA.com/ being removed from search results while domainB.domainA.com/ remains unaffected, which does not sound like something you want.
The problem you might have with the setup you have described is this: in order to keep domainB.domainA.com out of the results, you would need to have domainB.domainA.com/robots.txt exclude robots, while domainB.com/robots.txt welcomes them. This means that you would need a way to make domainB.domainA.com/ and domainB.com/ serve different information, and judging from what you've described, you have not set up your server to do so yet.
Of course, it is always possible that I have assumed too much about your situation, so it is a good idea to use Google's robots.txt analysis tool (see http://www.google.com/support/webmasters/bin/topic.py?topic=8475) to see if your robots.txt files already produce the results you want.
If using robots.txt files doesn't solve the problem, and assuming that you want to continue hosting all of your content on domainA.com, one strategy you really should look into would be setting up a 301 redirect from the pages on domainB.domainA.com/ to domainB.com/. If you need more advice on how to do this with your server software, your hosting company's tech support would definitely be the best place to start, but this group is here to help if more issues arise.
Hope that helps!
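To make the spider's point of view concrete, here is a rough sketch in Python of the lookup described above: take the scheme and host from a page URL and put '/robots.txt' on the end. The function name robots_url and the page paths are made up for illustration; only the domainA/domainB host names come from the answer.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL a crawler would check for page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host (including any subdomain or port);
    # drop the path, query string, and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical page URLs on the three hosts from the answer above.
pages = [
    "http://domainA.com/some/page.html",
    "http://domainB.domainA.com/index.html",
    "http://domainB.com/blog/post?id=1",
]

for page in pages:
    print(page, "->", robots_url(page))
```

Each host maps to its own robots.txt at the root, which is why the three files are treated separately. Under this scheme a file sitting in a subfolder, such as /blog/robots.txt, or one with a different name, such as robots.txt-NEW, is never requested.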