How do you disallow HTTPS?
-
I currently have a site (startuploans.org) that runs everything over HTTP. Recently we decided to start an online application to process loan apps, so for one section we configured SSL (https://www.startuploans.org/secure/).
If I go to the HTTPS URL for any of my other pages, they show up too. I was going to just 301 everything from HTTPS, but because the secure section is in a subdirectory, I can't.
Also, canonical URLs won't work either, because it's a totally different system and the pages are generated in an odd manner.
It's really just 1 page that needs to be disallowed.
Is there any way to disallow all HTTPS requests from robots.txt while keeping all the HTTP requests working as normal?
-
Hi Rick,
Your first thought was correct. If you apply the noindex meta tag to every page in the secure part of the site, then all of those pages will be de-indexed and you will have no duplicate content problem.
For WordPress, you just need to install a plugin that lets you edit and apply page elements and meta tags. My preference is Yoast SEO. If you do a plugin search from your dashboard, you will find it.
Hope that helps,
Sha
-
Perfect. This is the answer I was looking for. I will just use the meta tag globally in HTTPS. BUT what about the fact that my entire site is duplicated in HTTPS?
It's all good for the /secure/ part, but what about my WordPress install? How do I handle that? Maybe my best option is to just load 2 different robots.txt files.
-
Hi Rick,
If you wish to use the robots.txt method to disallow all or part of your site's https protocol, you simply need to load two separate robots.txt files.
The http and https protocols are basically viewed by bots as if they were two completely separate root domains (which I guess you already know as you have mentioned the fact that port 443 is used for the secure protocol).
Google's advice is that to use this method, you should have a separate robots.txt file for each protocol with code as follows:
For your http protocol (http://www.startuploans.org/robots.txt):
User-agent: *
Allow: /
For the https protocol (https://www.startuploans.org/robots.txt):
User-agent: *
Disallow: /
However, blocking crawlers with robots.txt is not the most reliable method for excluding pages from search engines. The reason for this is that the page will continue to be indexed if it happens to be found via a link from another page. Basically, the robots.txt is the sign on the front door that says "Please stay out of our house", but it is never seen by the people who enter via the rear exit or climb in a window!
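Before relying on the two files, it can help to confirm that each one says what you think it says. The following is a minimal sketch using Python's standard `urllib.robotparser`; the file contents are the ones listed above, and the helper name `can_crawl` is hypothetical. Note that the parser does not look at the protocol: which rules apply depends only on which file the server hands out for that protocol.

```python
from urllib.robotparser import RobotFileParser

# Contents of the two files, as listed in this post.
HTTP_ROBOTS = """User-agent: *
Allow: /
"""

HTTPS_ROBOTS = """User-agent: *
Disallow: /
"""


def can_crawl(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # The HTTP file should allow everything, the HTTPS file nothing.
    print(can_crawl(HTTP_ROBOTS, "http://www.startuploans.org/"))
    print(can_crawl(HTTPS_ROBOTS, "https://www.startuploans.org/secure/"))
```

This only checks the rules themselves. It does not fetch anything from the live site, so it cannot tell you whether the server is really delivering the right file on each protocol.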
The most reliable method of excluding pages is to add the noindex meta tag as suggested by MagentoWebDeveloper and Alan. When a bot encounters the noindex meta tag it will send a signal to the search engine to de-index the page and there is no further problem.
I would generally use noindex, follow rather than noindex, nofollow as the nofollow tag will stop the flow of link value through your site. In most cases, as long as the noindex is in place, there is no reason to be worried about the links on the pages being followed.
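You should NEVER use both methods at the same time. If robots.txt blocks a page, the bot never crawls it, so it never sees the noindex tag on that page.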
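To check whether a given page actually carries the tag, something like the following can be run against the page source. It is a rough sketch using Python's standard `html.parser`; the class name, function name, and sample HTML are assumptions for illustration, not anything taken from the site.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        # Directives are comma separated, e.g. "noindex, follow".
        for part in content.split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def robots_directives(html: str) -> set:
    """Return the set of robots meta directives present in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


# Hypothetical page source carrying the recommended tag.
SAMPLE_PAGE = (
    '<html><head><meta name="robots" content="noindex, follow"></head>'
    "<body>Loan application</body></html>"
)

if __name__ == "__main__":
    print(sorted(robots_directives(SAMPLE_PAGE)))
```

A page that returns `noindex` here is only de-indexed if the bot is allowed to crawl it, which is why the HTTPS robots.txt must not block the same pages.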
Hope that helps,
Sha
-
I agree. Best practices dictate that the proper answer is to block the entire folder from indexing.
-
Why not just NO INDEX / NO FOLLOW the page? What is the reason behind this? Do you want Google not to index your https page? Duplicate content? All checkouts have https.
-
I should have added that the rewrite rules I posted go in the .htaccess file. They deliver a different robots.txt depending on the port: robots_ssl.txt if the request comes in on port 443 (secure), or the normal robots.txt on any other port.
Is there any easier way? I feel like one misstep on this and I could block bots from my site.
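For anyone not running Apache, the same idea can be sketched in a few lines of Python as a minimal WSGI app. This is a swapped-in illustration of the port-based selection described in this post, not the .htaccess approach itself; the app name and the two file bodies are assumptions.

```python
# Bodies of the two files; assumed contents matching the allow-all / block-all split.
ROBOTS_HTTP = b"User-agent: *\nAllow: /\n"
ROBOTS_HTTPS = b"User-agent: *\nDisallow: /\n"


def robots_app(environ, start_response):
    """Serve a different robots.txt depending on whether the request is secure."""
    if environ.get("PATH_INFO") != "/robots.txt":
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not found\n"]

    # Treat the request as secure if the scheme is https or it arrived on port 443.
    secure = (
        environ.get("wsgi.url_scheme") == "https"
        or environ.get("SERVER_PORT") == "443"
    )
    body = ROBOTS_HTTPS if secure else ROBOTS_HTTP
    start_response(
        "200 OK",
        [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))],
    )
    return [body]
```

Only requests for /robots.txt are touched and everything else falls through to a 404 in this sketch, so a mistake here cannot accidentally block bots from the HTTP site unless the secure check itself is wrong.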
-
Nope, thanks though. Code is no problem for us; it's just a technical question. Here is what I want:
I want to restrict robots from the HTTPS version (secure) of my site while leaving the HTTP version (unsecure) perfectly normal and accessible by bots.
Basically what I am asking is: is this the best way (below)? Is there a simpler way? To my knowledge robots.txt doesn't support protocols, so doing something like Disallow: https://... won't work.
RewriteEngine on
RewriteCond %{SERVER_PORT} ^443$
RewriteRule ^robots\.txt$ robots_ssl.txt [L]
-
Hello Rick,
First caveat is I am not sure what you want to accomplish: You want it so that once the app is done, the person is no longer in https:// ?? If that is it, then while I am not sure I will be able to help, I want to clarify the issue.
Currently, you have one page that is https: and that is your loan app page with url of https://startuploans.org/secure/site/step1 (I did not get a step two on my test, but the next page was https://startuploans.org/secure/step3.) You want a person to finish the app, and then not be in https when they return to the site?
I am not a coder per se, but I am wondering: if you change the target on the menu link to the secure pages so that it opens in a new window, there would be no option to go back. Once finished, page 3 could offer an option to close the window to secure their information. Then they are left at the page they were on before going to the application.
Now, if none of this was what you wanted, I owe you a beer.