Protect/Encrypt R Package Code for Distribution

Protect/encrypt R package code for distribution

Did you try following that thread?

https://stat.ethz.ch/pipermail/r-help/2011-July/282717.html

At some point the R code has to be processed by the R interpreter. If you give someone encrypted code, you have to give them the decryption key so R can run it. Maybe you can hide the key somewhere and hope they don't find it. But they must have access to it in order to generate the plain text R code somehow.

This is true for all programs or files that you run or view on your computer. Encrypted PDF files? No, they are just obfuscated, and once you find the decryption keys you can decrypt them. Even code written in C or C++ distributed as a binary can be reverse engineered with enough time, tools, and clever enough hackers.

You want it protected, you keep it on your servers, and only allow access via a network API.

Is it possible to compile R scripts into a binary?

Found this: I'm not sure about the security, but this is definitely a deterrent and would take (I think) some fairly concentrated effort to crack. There is a byte code compiler for R based on the paper linked below. There is a method in library(compiler), which comes standard with R, that allows you to compile an R script to byte code. In the same library, you can load in the source files and use them as you'd like.

A Byte Code Compiler for R

Is SSL appropriate for sending secure contents?

In theory, yes (for some definition of "transit"), but in practice for "Does this message get encrypted in transit?" the answer is maybe. In short, just ssl = True or equivalent put somewhere does almost not guarantee anything really, for all the reasons explained below.
Hence you are probably not going to like the following detailed response, as it shows basically that nothing is simple and that you have no 100% guarantee even if you do everything right and you have A LOT of things to do right.

Also TLS is the real true name of the feature you are using, SSL is dead since 20 days now, yes everyone use the old name, but that does not make this usage right nevertheless.

First, and very important, TLS provides various guarantees, among which confidentiality (the content is encrypted while in transit), but also authentication which is in your case far more important, and for the following reasons.

You need to make sure that smtp.gmail.com is resolved correctly, otherwise if your server uses lying resolvers, and is inside an hostile network that rewrites the DNS queries or responses, then you can send an encrypted content... to another party than the real "smtp.gmail.com" which makes the content not confidential anymore because you are sending it to a stranger or an active attacker.

To solve that, you need basically DNSSEC, if you are serious.
No, and contrary to what a lot of people seem to believe and convey, TLS alone or even DOH - DNS over HTTPS - do not solve that point.
Why? Because of the following that is not purely theoretical since it
happened recently (https://www.bleepingcomputer.com/news/security/hacker-hijacks-dns-server-of-myetherwallet-to-steal-160-000/), even if it was in the WWW world and not the email, the scenario can be the same:

you manage to grab the IP addresses tied to the name contacted (this can be done by a BGP hijack and it happens, for misconfigurations, "policy" reasons, or active attacks, all the time)
now that you control all communications, you put whatever server you need at the end of it
you contact any CA delivering DV certificates, including those purely automated
since the name now basically resolve to an IP you control, the web (or even DNS) validation that a CA can do will succeed and the CA will give you a certificate for this name (which may continue to work even after the end of the BGP hijack because CAs may not be quick to revoke certificates, and clients may not properly check for that).
hence any TLS stack accepting this CA will happily accept this certificate and your client will send securely content with TLS... to another target than the intended one, hence 0 real security.

In fact, as the link above shows, attackers do not even need to be so smart: even a self signed certificate or an hostname mismatch may go through because users will not care and/or library will have improper default behavior and/or programmer using the library will not use it properly (see this fascinating, albeit a tad old now, paper showing the very sad state of many "SSL" toolkits with incorrect default behavior, confusing APIs and various errors making invalid use of it far more probably than proper sane TLS operations: https://www.cs.utexas.edu/~shmat/shmat_ccs12.pdf)

Proper TLS use does not make DNSSEC irrelevant. Both targets and protects against different attacks. You need both to be more secure than just with one, and any of the two (properly used) does not replace the other. Never has and never will.

Now even if the resolution is correct, someone may have hijacked (thanks to BGP) the IP address. Then, again, you are sending to some host some encrypted content except that you do not really authenticate who is this host, so it can be anyone if an attacker managed to hijack IP addresses of smtp.gmail.com (it does not need to do it globally, just locally, "around" where your code execute).

This is where the very important TLS property of authentication kicks in.
This is typically done through X.509 certificates (which will be called - incorrectly - SSL certificates everywhere). Each end of the communication authenticate the other one by looking at the certificate it presented: either it recognizes this certificate as special, or it recognizes the issuing authority of this certificate as trusted.

So you do not just need to connect with TLS on smtp.gmail.com you also need to double check that the certificate then presented:

is for smtp.gmail.com (and not any other name), taking into account wildcards
is issued by a certificate authority you trust

All of this is normally handled by the TLS library you use except that in many cases you need at least to explicitly enable this behaviour (verification) and you need, if you want to be extra sure, to decide clearly with CAs you trust. Otherwise, too many attacks happen as can be seen in the past by rogue, incompetent or other adjectives CAs that issued certificates where they should not (and yes noone is safe against that, even Google and Microsoft got in the past mis-issued certificates with potential devastating consequences).

Now you have another problem more specific to SMTP and SMTP over TLS: the server typically advertises it does TLS and the client seeing this then can start the TLS exchange. Then all is fine (baring all the above).
But in the path between the SMTP server and you someone can rewrite the first part (which is in clear) in order to remove the information that this SMTP server speaks TLS. Then the client will not see TLS and will continue (depending on how it is developed, of course to be secure in such cases the client should abort the communication), then speaking in clear. This is called a downgrade attack. See this detailed explanation for example: https://elie.net/blog/understanding-how-tls-downgrade-attacks-prevent-email-encryption/

As Steffen points out, based on the port you are using this above issue of SMTP STARTTLS and hence the possible downgrade does not exist, because this is for port 25 which you are not using. However I prefer to still warn users about this case because it may not be well known and downgrade attacks are often both hard to detect and hard to defend against (all of this because protocols used nowadays were designed at a time where there was no need to even think about defending one against a malicious actor on the path)

Then of course you have the problem of the TLS version you use, and its parameters. The standard is now TLS version 1.3 but this is still slowly being deployed everywhere. You will find many TLS servers only knowing about 1.2
This can be good enough, if some precautions are taken. But you will also find old stuff speaking TLS 1.1, 1.0 or even worse (that is SSL 3). A secure client code should refuse to continue exchanging packets if it was not able to secure at least a TLS 1.2 connection.
Again this is normally all handled by your "SSL" library, but again you have to check for that, enable the proper settings, etc.
You have also a similar downgrade attack problem: without care, a server first advertise what it offers, in clear, and hence an attacker could modify this to remove the "highest" secure versions to force the client to use a lower versions that has more attacks (there are various attacks against TLS 1.0 and 1.1).
There are solutions, specially in TLS 1.3 and 1.2 (https://www.rfc-editor.org/rfc/rfc7633 : "The purpose of the TLS feature extension is to prevent downgrade
attacks that are not otherwise prevented by the TLS protocol.")

Aside and contrary to Steffen's opinion I do no think that TLS downgrade attacks are purely theoretical. Some examples:

(from 2014): https://p16.praetorian.com/blog/man-in-the-middle-tls-ssl-protocol-downgrade-attack (mostly because web browsers are eager to connect no matter what so typically if an attempt with highest settings fail they will fallback to lower versions until finding a case where the connection happens)
https://www.rfc-editor.org/rfc/rfc7507 specifically offers a protection, stating that: "All unnecessary protocol downgrades are undesirable (e.g., from TLS
1.2 to TLS 1.1, if both the client and the server actually do support
TLS 1.2); they can be particularly harmful when the result is loss of
the TLS extension feature by downgrading to SSL 3.0. This document
defines an SCSV that can be employed to prevent unintended protocol
downgrades between clients and servers that comply with this document
by having the client indicate that the current connection attempt is
merely a fallback and by having the server return a fatal alert if it
detects an inappropriate fallback."
https://www.nccgroup.trust/us/about-us/newsroom-and-events/blog/2019/february/downgrade-attack-on-tls-1.3-and-vulnerabilities-in-major-tls-libraries/ discusses not less than 5 CVEs in 2018 that allows TLS attacks: " Two ways exist to attack TLS 1.3. In each attack, the server needs to support an older version of the protocol as well. [..] The second one relies on the fact that both peers support an older version of TLS with a cipher suite supporting an RSA key exchange." and "This prowess is achieved because of the only known downgrade attack on TLS 1.3." and "Besides protocol downgrades, other techniques exist to force browser clients to fallback onto older TLS versions: network glitches, a spoofed TCP RST packet, a lack of response, etc. (see POODLE)".

Even if you are using a correct version, you need to make sure to use correct algorithms, key sizes, etc. Sometimes some server/library enable a "NULL" encryption algorithm, which means in fact no encryption. Silly of course, but that exists, and this is a simple case, there are far more complicated ones.

This other post from Steffen: https://serverfault.com/a/696502/396475 summarizes and touches the various above points, and gives another views on what is most important (we disagree on this, but he answered here as well so anyone is free to take both views into account and make their own opinion).

Hence MTA-STS instead of SMTP STARTTLS, https://www.rfc-editor.org/rfc/rfc8461 with this clear abstract:

SMTP MTA Strict Transport Security (MTA-STS) is a mechanism
enabling mail service providers (SPs) to declare their ability to
receive Transport Layer Security (TLS) secure SMTP connections and
to specify whether sending SMTP servers should refuse to deliver to
MX hosts that do not offer TLS with a trusted server certificate.

Hence you will need to make sure that the host you send your email too does use that feature, and that your client is correctly programmed to handle it.
Again, probably done inside your "SSL Library" but this clearly show you need specific bit in it for SMTP, and you need to contact a webserver to retrieve the remote end SMTP policies, and you need also to do DNS requests, which gets back to you on one of the earlier point about if you trust your resolver or not and if records are protected with DNSSEC.

And with all the above, which already covers many areas and is really hard to do correctly, there are still many other points to cover...
The transit is safe, let us assume. But then how does the content gets retrieved? You may say it is not your problem anymore. Maybe. Maybe not. Do you want to be liable for that? Which means that you should maybe also encrypt the attachment itself, this is in addition (not in replacement) of the transport being secured.
The default mechanisms to secure email contents either use OpenPGP (has a more geek touch to it), or S/MIME (has a more corporate touch to it). This works for everything. Then you have specific solutions depending on the document (but this does not solve the problem of securing the body of the email), like PDF documents can be protected by a password (warning: this has been cracked in the past).

I am sending sensitive info

This is then probably covered by some contract or some norms, depending on your area of business. You may want to dig deeper into those to see exactly what are the requirements forced upon you so that you are not liable for some problems, if you secured everything else correctly.

Protect/Encrypt R Package Code for Distribution