How do you disallow HTTPS?
-
I currently have a site (startuploans.org) that runs everything over HTTP. Recently we decided to start an online application to process loan apps, so for one section we configured SSL (https://www.startuploans.org/secure/).
If I go to the HTTPS URL for any of my other pages, they show up. I was going to just 301 everything from HTTPS, but because the secure section is in a subdirectory I can't.
Also, canonical URLs won't work either, because it's a totally different system and the pages are generated in an odd manner.
It's really just 1 page that needs to be disallowed..
Is there any way to disallow all HTTPS requests from robots.txt while keeping all the HTTP requests working as normal?
-
Hi Rick,
Your first thought was correct. If you apply the noindex meta tag to every page in the secure part of the site, then all of those pages will be de-indexed and you will have no duplicate content problem.
For WordPress, you just need to install a plugin that lets you edit page elements and meta tags. My preference is Yoast SEO. If you do a plugin search from your dashboard you will find it.
Hope that helps,
Sha
-
Perfect. This is the answer I was looking for...I will just use the meta tag globally in HTTPS....BUT...what about the fact that my entire site is duplicated in HTTPS?
It's all good for the /secure/ part, but what about my WordPress install...how do I handle that? Maybe my best option is to just load 2 different robots.txt files...
-
Hi Rick,
If you wish to use the robots.txt method to disallow all or part of your site's https protocol, you simply need to load two separate robots.txt files.
The http and https protocols are basically viewed by bots as if they were two completely separate root domains (which I guess you already know as you have mentioned the fact that port 443 is used for the secure protocol).
Google's advice is that to use this method, you should have a separate robots.txt file for each protocol with code as follows:
For your http protocol (http://www.startuploans.org/robots.txt):
User-agent: *
Allow: /
For the https protocol (https://www.startuploans.org/robots.txt):
User-agent: *
Disallow: /
However, blocking crawlers with robots.txt is not the most reliable method for excluding pages from search engines. The reason for this is that the page will continue to be indexed if it happens to be found via a link from another page. Basically, the robots.txt is the sign on the front door that says "Please stay out of our house", but it is never seen by the people who enter via the rear door or climb in a window!
The most reliable method of excluding pages is to add the noindex meta tag as suggested by MagentoWebDeveloper and Alan. When a bot encounters the noindex meta tag it will send a signal to the search engine to de-index the page and there is no further problem.
I would generally use noindex, follow rather than noindex, nofollow as the nofollow tag will stop the flow of link value through your site. In most cases, as long as the noindex is in place, there is no reason to be worried about the links on the pages being followed.
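Where editing the page templates is awkward, the same noindex, follow signal can also be sent as an HTTP response header instead of a meta tag. This is only a rough sketch, assuming Apache 2.4 with mod_headers enabled and .htaccess overrides allowed; none of that is confirmed in this thread:

```apache
# Assumption: Apache 2.4+ (needed for <If>) with mod_headers enabled.
# Send "noindex, follow" on every response served over HTTPS,
# leaving plain HTTP responses untouched.
<IfModule mod_headers.c>
    <If "%{HTTPS} == 'on'">
        Header set X-Robots-Tag "noindex, follow"
    </If>
</IfModule>
```

Because the header is only added to HTTPS responses, the HTTP pages stay indexable. The HTTPS robots.txt still has to allow crawling, otherwise bots never fetch the pages and never see the header.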
You should NEVER use both methods at the same time. If robots.txt blocks a page, the bot cannot crawl it and so never sees the noindex tag.
Hope that helps,
Sha
-
I agree. Best practices dictate that the proper answer is to block the entire folder from indexing.
-
Why not just NO INDEX / NO FOLLOW the page? What is the reason behind this? Do you want Google not to index your https page? Duplicate content? All checkouts have https.
-
I should have added that the code I posted goes in the .htaccess. It delivers two different robots.txt files: robots_ssl.txt if the request comes in on port 443 (secure), or the normal robots.txt file on any other port (normal).
Is there any easier way? I feel like one misstep on this and I could block bots from my site.
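One possibly simpler option, if the goal is only to get rid of the duplicate HTTPS copies, is to 301 every HTTPS request outside /secure/ back to HTTP. A minimal sketch, assuming Apache with mod_rewrite, that the rules sit in the root .htaccess above any WordPress rewrite rules, and that www.startuploans.org is the preferred host:

```apache
RewriteEngine On
# Only act on requests that arrived over HTTPS
RewriteCond %{HTTPS} on
# Leave the loan application under /secure/ on HTTPS
RewriteCond %{REQUEST_URI} !^/secure/
# Permanently redirect everything else to the HTTP version of the same path
RewriteRule ^(.*)$ http://www.startuploans.org/$1 [R=301,L]
```

One caveat: if the pages under /secure/ load stylesheets, scripts, or images from outside /secure/, those requests would be redirected to HTTP as well, so they would need their own exception.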
-
Nope...thanks though. Code is no problem for us...it's just a technical question. Here is what I want:
I want to restrict robots from the HTTPS version (secure) of my site while leaving the HTTP version (unsecure) perfectly normal and accessible by bots.
Basically what I am asking is: is this the best way (below)? Is there a simpler way? To my knowledge robots.txt doesn't support protocols, so doing something like Disallow: https://......yada yada won't work.
RewriteEngine On
# Serve robots_ssl.txt in place of robots.txt for requests on port 443 (HTTPS)
RewriteCond %{SERVER_PORT} ^443$
RewriteRule ^robots\.txt$ robots_ssl.txt [L]
-
Hello Rick,
First caveat is I am not sure what you want to accomplish: You want it so that once the app is done, the person is no longer in https:// ?? If that is it, then while I am not sure I will be able to help, I want to clarify the issue.
Currently, you have one page that is https: and that is your loan app page with url of https://startuploans.org/secure/site/step1 (I did not get a step two on my test, but the next page was https://startuploans.org/secure/step3.) You want a person to finish the app, and then not be in https when they return to the site?
I am not a coder per se, but I am wondering: if you change the target on the menu link to the secure pages so that it opens in a new window, there would be no option to go back. Once finished, page 3 could have an option to close the window to secure their information. Then they are left at the page they were on before going to the application.
Now, if none of this was what you wanted, I owe you a beer.