Robots.txt and robots meta
-
I have an odd situation. I have a CMS that has a global robots.txt which has the generic
User-Agent: *
Allow: /
I also have one CMS site that must never be indexed. I've read on various pages (like http://www.jesterwebster.com/robots-txt-vs-meta-tag-which-has-precedence/22 ) that robots.txt always wins over meta, but I have also read that robots.txt controls spiderability whereas the robots meta tag controls indexation. I just want the site not to be indexed. Can I leave the robots.txt as is and still put NOINDEX in the robots meta tag?
-
I see. Have you considered putting it behind an htpasswd?
-
I can control it (it's a custom piece of software) but it's not as easy a fix as adding a meta to the template.
The main problem is that we have a junk TLD we use to test new ideas off the live server (it lets clients give us feedback), but it gets spidered and indexed and starts ranking for client sites before they're ready to live on their own TLD. This means we have to compete against ourselves (even with a 301). There's nothing sensitive on it, or it would live behind a password.
-
Do you need to control access to the site beyond the SERPs? I would not rely on robots.txt to shield any sensitive data.
For a breakdown of robots.txt and robots meta tags, check out http://www.robotstxt.org/robotstxt.html and http://www.searchtools.com/robots/robots-meta.html/, and for a great post on using these standards in SEO, check out http://www.seomoz.org/blog/serious-robotstxt-misuse-high-impact-solutions
I am also concerned that you are unable to control your robots.txt! If your CMS doesn't let you do that and overwrites it when you change it manually, you have some major control problems on your hands that you should remedy.
-
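Blocking the site in robots.txt will not guarantee that it stays out of Google's index: a blocked URL can still be listed if other pages link to it. I think you can use the meta robots NOINDEX tag to guarantee that Google will not show your pages when someone searches for them.
It is important to note that Googlebot and other spiders will continue to visit your pages. They have to, because a spider can only obey NOINDEX if it is allowed to crawl the page and read the tag, so leave the robots.txt as it is and do not disallow the site there.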
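To make the interaction concrete, here is a minimal sketch in Python, using only the standard library, that checks both conditions for one page: that robots.txt lets a spider fetch it, and that the page carries a robots meta tag with NOINDEX. The page HTML, the URL http://test.example/page and the helper names are made up for illustration, and real search engines may apply more rules than this simple check does.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# The global robots.txt from the question: everything may be crawled.
ROBOTS_TXT = """\
User-Agent: *
Allow: /
"""

# Hypothetical page template carrying the robots meta tag (an assumption).
PAGE_HTML = """\
<html><head>
<meta name="robots" content="noindex">
<title>Test site</title>
</head><body>Client preview</body></html>
"""


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key.lower(): (value or "") for key, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for part in attr.get("content", "").split(","):
                self.directives.add(part.strip().lower())


def can_crawl(robots_txt, url, agent="*"):
    """True if the robots.txt text allows this agent to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def has_noindex(html):
    """True if the page has a robots meta tag containing noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


def noindex_is_effective(robots_txt, url, html):
    """The tag only helps if the spider may fetch the page and read it."""
    return can_crawl(robots_txt, url) and has_noindex(html)


if __name__ == "__main__":
    url = "http://test.example/page"  # hypothetical test-domain URL
    print(noindex_is_effective(ROBOTS_TXT, url, PAGE_HTML))
```

If the robots.txt were changed to "Disallow: /", can_crawl would return False and the check would fail, which mirrors the point above: a spider that is blocked from the page never sees the NOINDEX tag.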