Robots.txt and robots meta
-
I have an odd situation. I have a CMS with a global robots.txt that contains the generic
User-Agent: *
Allow: /
I also have one CMS site that must never be indexed. I've read on various pages (like http://www.jesterwebster.com/robots-txt-vs-meta-tag-which-has-precedence/22 ) that robots.txt always wins over meta, but I have also read that robots.txt indicates spiderability whereas meta can control indexation. I just want the site to not be indexed. Can I leave the robots.txt as is and still put NOINDEX in the robots meta?
-
I see. Have you considered putting it behind an htpasswd?
-
I can control it (it's a custom piece of software) but it's not as easy a fix as adding a meta to the template.
The main problem is we have a junk TLD we use to test some new ideas off the live server (lets clients give us feedback) but it gets spidered and indexed and starts ranking for client sites before they're ready to live in their own TLD. This means we have to compete against ourselves (even with a 301). There's nothing sensitive or it would live behind a password.
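Since the fix being weighed here is "adding a meta to the template", a minimal sketch of that idea might look like the following. It assumes the template can call a helper with the request's host name; `TEST_HOSTS`, the `test.example` host names, and `robots_meta_for_host` are hypothetical and not part of any particular CMS.

```python
# Hypothetical helper a template could call; the host names are placeholders.
TEST_HOSTS = {"test.example", "www.test.example"}


def robots_meta_for_host(host: str) -> str:
    """Return a robots meta tag for the test domain, or "" for live sites."""
    # Drop any port and normalise case before comparing.
    hostname = host.split(":", 1)[0].strip().lower()
    if hostname in TEST_HOSTS:
        return '<meta name="robots" content="noindex">'
    return ""


if __name__ == "__main__":
    print(robots_meta_for_host("test.example:8080"))
    print(robots_meta_for_host("client-site.example"))
```

Keying the tag off the host name means the same template can serve the test domain and the live client domains without a separate code path per site.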
-
Do you need to control access to the site beyond the SERPs? I would not rely on robots.txt to shield any sensitive data.
For a breakdown of robots.txt and robots meta tags, check out http://www.robotstxt.org/robotstxt.html and http://www.searchtools.com/robots/robots-meta.html/ , and for a great post on using these standards in SEO, check out http://www.seomoz.org/blog/serious-robotstxt-misuse-high-impact-solutions
I am also concerned that you are unable to control your robots.txt! If your CMS doesn't let you do that and overwrites it when you change it manually, you have some major control problems on your hands that you should remedy.
-
Blocking the site in robots.txt does not guarantee that it will stay out of Google's index. I think you can use the meta robots NOINDEX to keep Google from showing your pages when someone searches for them.
It is important to note that Googlebot and other spiders will continue to visit your pages. That is expected: a spider has to fetch a page to see its NOINDEX, so leave robots.txt allowing the crawl.
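The distinction drawn in this thread (robots.txt governs crawling, the robots meta tag governs indexing) can be checked offline with Python's standard library. This is only a rough sketch, not a model of how any search engine actually behaves: the `RobotsMetaParser` class, the `is_crawlable` and `has_noindex` helpers, and the sample URL and HTML are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# The generic robots.txt from the question.
ROBOTS_TXT = """\
User-agent: *
Allow: /
"""

# A sample page carrying the robots meta tag (illustrative only).
PAGE_HTML = """\
<html><head>
<meta name="robots" content="noindex">
<title>Test site</title>
</head><body>Hello</body></html>
"""


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def is_crawlable(robots_txt: str, url: str, agent: str = "*") -> bool:
    """True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def has_noindex(html: str) -> bool:
    """True if the page's robots meta tag includes noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


if __name__ == "__main__":
    url = "http://test.example/page"  # placeholder URL
    print("crawlable:", is_crawlable(ROBOTS_TXT, url))
    print("noindex:", has_noindex(PAGE_HTML))
```

A page that is crawlable and carries noindex is the combination suggested above. If robots.txt disallowed the URL instead, a well-behaved spider would not fetch the page and so would never see the meta tag.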