How do you disallow HTTPS?
-
I currently have a site (startuploans.org) that runs everything as http. Recently we decided to launch an online application to process loan apps, and for one particular section we configured SSL (https://www.startuploans.org/secure/).
If I go to the HTTPS url for any of my other pages they show up...I was going to just 301 everything from https, but because the SSL is set up in a subdirectory I can't...
Also, canonical URLs won't work either, because it's a totally different system and the pages are generated in an odd manner.
It's really just one page that needs to be disallowed.
Is there any way to disallow all HTTPS requests from robots.txt while keeping all the HTTP requests working as normal?
-
Hi Rick,
Your first thought was correct. If you apply the noindex meta tag to every page in the secure part of the site, then all of those pages will be de-indexed and you will have no duplicate content problem.
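For illustration, the tag itself is a single line in the <head> of each secure page; a minimal sketch:
<meta name="robots" content="noindex">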
For WordPress, you just need to install a plugin that lets you edit and apply page elements and meta tags. My preference is Yoast SEO; if you do a plugin search from your dashboard you will find it.
Hope that helps,
Sha
-
Perfect. This is the answer I was looking for...I will just use the meta tag globally in HTTPS....BUT...what about the fact that my entire site is duplicated in HTTPS?
It's all good for the /secure/ part, but what about my WordPress install...how do I handle that? Maybe my best option is to just load two different robots.txt files...
-
Hi Rick,
If you wish to use the robots.txt method to disallow all or part of your site's https protocol, you simply need to load two separate robots.txt files.
The http and https protocols are basically viewed by bots as if they were two completely separate root domains (which I guess you already know as you have mentioned the fact that port 443 is used for the secure protocol).
Google's advice is that to use this method, you should have a separate robots.txt file for each protocol with code as follows:
For your http protocol (http://www.startuploans.org/robots.txt):

User-agent: *
Allow: /

For the https protocol (https://www.startuploans.org/robots.txt):

User-agent: *
Disallow: /

However, blocking crawlers with robots.txt is not the most reliable method for excluding pages from search engines, because a page can still end up indexed if it is found via a link from another page. Basically, robots.txt is the sign on the front door that says "Please stay out of our house", but it is never seen by the people who enter via the rear exit or climb in a window!
The most reliable method of excluding pages is to add the noindex meta tag, as suggested by MagentoWebDeveloper and Alan. When a bot encounters the noindex meta tag, it signals the search engine to de-index the page, and there is no further problem.
I would generally use noindex, follow rather than noindex, nofollow as the nofollow tag will stop the flow of link value through your site. In most cases, as long as the noindex is in place, there is no reason to be worried about the links on the pages being followed.
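To make the difference concrete, here is a sketch of the two variants side by side:

<meta name="robots" content="noindex, follow">   <!-- de-index the page, but still pass link value through its links -->
<meta name="robots" content="noindex, nofollow"> <!-- de-index the page AND stop link value flowing -->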
You should NEVER use both methods at the same time: if robots.txt blocks a page from being crawled, bots can never fetch it, so they will never see the noindex tag you placed on it.
Hope that helps,
Sha
-
I agree. Best practices dictate that the proper answer is to block the entire folder from indexing.
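For example (assuming the loan app lives under /secure/ as described above), a single robots.txt rule would cover the whole folder:

User-agent: *
Disallow: /secure/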
-
Why not just NOINDEX / NOFOLLOW the page? What is the reason behind this? Do you want Google not to index your https pages? Duplicate content? All checkouts use https anyway.
-
I should have added that the rewrite code goes in the .htaccess file: it serves robots_ssl.txt when the request comes in on port 443 (secure), and the normal robots.txt file on any other port.
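For reference, robots_ssl.txt itself would just hold the blanket block shown earlier:

User-agent: *
Disallow: /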
Is there any easier way? I feel like one misstep on this and I could block bots from my site.
-
Nope...thanks though. Code is no problem for us...it's just a technical question. Here is what I want:
I want to restrict robots from the HTTPS version (secure) of my site while leaving the HTTP version (non-secure) perfectly normal and accessible to bots.
Basically what I am asking is...is this the best way (below)? Is there a simpler way? To my knowledge robots.txt doesn't support protocols, so doing something like disallow:https://......yada yada won't work.
RewriteEngine on
# If the request comes in on the SSL port, serve robots_ssl.txt in place of robots.txt
RewriteCond %{SERVER_PORT} ^443$
RewriteRule ^robots\.txt$ robots_ssl.txt [L]
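An alternative I have seen, for what it's worth: keying the condition on Apache's HTTPS flag instead of the port number. A sketch, assuming mod_rewrite is enabled and the server sets the standard HTTPS variable (worth testing before relying on it):

RewriteEngine on
# Match any request served over SSL/TLS, regardless of which port it uses
RewriteCond %{HTTPS} on
RewriteRule ^robots\.txt$ robots_ssl.txt [L]
-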
Hello Rick,
First caveat: I am not sure what you want to accomplish. Do you want it so that once the app is done, the person is no longer on https://? If that is it, then while I am not sure I will be able to help, I want to clarify the issue.
Currently, you have one page that is https: and that is your loan app page, with the URL https://startuploans.org/secure/site/step1 (I did not get a step two on my test, but the next page was https://startuploans.org/secure/step3). You want a person to finish the app, and then not be in https when they return to the site?
I am not a coder per se, but I am wondering: if you change the target on the menu link to the secure pages so they open in a new window, there would be no option to go back. Once finished, page 3 could have an option to "close to secure my information". Then they are left at the page they were on before going to the application.
Now, if none of this was what you wanted, I owe you a beer.