Interpreting Openssl Speed Output for Rsa with Multi Option

The code for the speed test is in <openssl>/apps/speed.c.

-multi is a switch for running multiple benchmarks in parallel; it has nothing to do with multiplication. See the usage text around line 1145:

#ifndef NO_FORK
    BIO_printf(bio_err,"-multi n        run n benchmarks in parallel.\n");
#endif

What do the sign and verify columns mean?

Sign and verify do just what they say. They report the time taken by one signing operation and one verify operation for each RSA modulus size.

Sign/s and Verify/s are the reciprocals of Sign and Verify. I.e., 1/0.000008s => 125,000 signs per second.

Here's the code to print the report you are seeing. It starts around line 2450:

#ifndef OPENSSL_NO_RSA
    j=1;
    for (k=0; k<RSA_NUM; k++)
        {
        if (!rsa_doit[k]) continue;
        if (j && !mr)
            {
            printf("%18ssign    verify    sign/s verify/s\n"," ");
            j=0;
            }
        if(mr)
            fprintf(stdout,"+F2:%u:%u:%f:%f\n",
                k,rsa_bits[k],rsa_results[k][0],
                rsa_results[k][1]);
        else
            fprintf(stdout,"rsa %4u bits %8.6fs %8.6fs %8.1f %8.1f\n",
                rsa_bits[k],rsa_results[k][0],rsa_results[k][1],
                1.0/rsa_results[k][0],1.0/rsa_results[k][1]);
        }
#endif

Finding the code to perform the sign and verify is left as an exercise to the reader ;)

Regarding "have an Intel(R) Xeon(R) CPU E5-1650 v2 @ 3.50GHz":

Just bike shedding, but be sure to configure with enable-ec_nistp_64_gcc_128 if you are using a modern GCC. Using ec_nistp_64_gcc_128 will speed up some operations, such as elliptic-curve DH operations, by 2x or 4x.

You need a modern GCC for the __uint128_t. Configure cannot determine if the compiler supports __uint128_t on its own, so it leaves ec_nistp_64_gcc_128 disabled.

openssl speed rsa less performant on (normally) better cpu

Edit:

As stated by @Alexei Khlebnikov, the openssl speed rsa command only measures the speed of the rsa sign/verify functions, and these don't use random numbers. Because of that, my original answer doesn't answer the question.

After a quick search, I found that the 1st server has the bmi2 and adx instructions, while the 2nd server doesn't. These instructions are used to speed up Montgomery integer multiplication and squaring, which are used in RSA signing operations. It's hard to confirm that this is the reason for the performance difference, but it can be one of the reasons.

Original answer:

To generate RSA keys you need large random prime numbers. The process of finding a large random prime number consists of:

Generate a random number;
Check if it's prime;
If it's not, repeat.

As you can see, this involves a lot of random number generation, and generating good random numbers is really slow. So, having a faster RNG means faster RSA key generation.

How can I interpret openssl speed output?

While it could probably be worded better, it pretty much means what it says: run the md4 hash routine in a loop for 3 seconds with a 16-byte input. After 3 seconds, observe that we ran just a bit over 9 million iterations. That's about 144 million bytes processed, or 48 million bytes per second (where "million" means 10^6).

Fully utilizing HW accelerator

According to "Interpreting openssl speed output for rsa with multi option", -multi doesn't parallelize a single piece of work; it just runs multiple benchmarks in parallel.

So, your HW card's load will be essentially limited by how much work is available at the moment (note that in industry in general, 80% planned capacity load is traditionally considered optimal in case of load spikes). Of course, running multiple server threads/processes will give you the same effect as multiple benchmarks.

OpenSSL supports multiple threads provided that you give it callbacks to lock shared data. For multiple processes, it warns about reusing data state inherited from parent.

That's it for scaling vertically. For scaling horizontally:

- OpenSSL supports asynchronous I/O through asynchronous BIOs.
- However, its elemental crypto operations and internal ENGINE calls are synchronous, and changing this would require a logic overhaul.
- Private efforts to make them provide asynchronous operation have met severe criticism due to major design flaws.

Intel announced an "Asynchronous OpenSSL" project (08.2014) to use with its hardware, but the linked white paper gives few details about its implementation and development state. One developer published some related code (10.2015), noting that it's "stable enough to get an overview".

Signing 20-byte message with 256-bit RSA key working with openssl.exe but not in code

I think I found the solution. openssl rsautl -sign uses RSA_private_encrypt instead of RSA_sign (which is what I would have expected). RSA_sign wraps the digest in a structure that is longer than the 20-byte message I provided, so with a key this small it fails with the given error.

openssl smime: aes vs rsa - which one encrypts?

Using the openssl smime command to encrypt data encrypts it in such a way that it can be received and decrypted by a recipient using email. This means that it uses certificates representing various entities (sender, recipient) and containing keys uniquely associated with those entities to sign, encrypt, decrypt, and verify signatures on the contained data.

As Artjom pointed out in their comment, a hybrid cryptosystem is used. This means a combination of symmetric and asymmetric encryption algorithms is used, as each has benefits and drawbacks. Symmetric encryption (such as AES) is extremely fast in comparison to asymmetric encryption (such as RSA). However, AES/CBC provides only confidentiality, not integrity, authentication, or non-repudiation. Asymmetric encryption can provide integrity, authentication, and non-repudiation. Asymmetric encryption also has fairly low limits for the amount of data it can encrypt (for example, a 4096-bit RSA key can encrypt at most 446 bytes at once when OAEP padding with SHA-256 is used). By combining both methods, we can exercise each for its strengths and mitigate the weaknesses of the other.

In this case, let our message be M. An AES/CBC key Kaes of length 256 bits is generated and used to encrypt the raw data such that the cipher text C is C = E(Kaes, M). The recipient's public key Krpub is then used to encrypt Kaes as C' = E(Krpub, Kaes). The cost of encrypting this small amount of data is relatively low, even though we are using an expensive operation.

Note that prior to encrypting the ephemeral session key, it is likely signed using the sender's private key Kspriv unless digital signing is disabled. I am not 100% certain with S/MIME, but this is how PGP works: when sending to multiple recipients, it would be much more expensive to encrypt the session key with n recipients' public keys and then sign each encrypted key with the sender's private key, O(2n) vs. O(n+1) (not really Big-O notation, but effective for communicating the point). The signature of the session key is the same regardless of recipient because it depends only on the sender's private key.

Now all that is left is to concatenate the encrypted session key C' and the cipher text C and transmit them to the recipient. In a real S/MIME message, the identifying information about the sender's public key is transmitted as well in order to facilitate signature verification. The recipient parses the complete encrypted message, decrypts the session key using the recipient's private key Krpriv and then uses the session key Kaes to decrypt the cipher text C.

However, it seems like all of this may be overkill for your situation. If you do not need to integrate with an email communication system and are simply trying to encrypt large files for storage, openssl also offers a simple enc command. You can use it as follows:

Password-based encryption:
$ openssl enc -aes-256-cbc -in large_plain_file.dmg -out large_encrypted_file.enc -k thisIsABadPassword

-k allows you to enter a password on the command line from which the key will be derived. OpenSSL uses a deterministic key derivation function (KDF) (effectively key, iv = MD5(password||salt)). You can run this command with -p on the end to get the key, salt, and IV printed to the console. The IV is deterministically derived from the password and salt. A KDF like PBKDF2 is recommended. Older versions of the command-line tool do not expose it, but OpenSSL 1.1.1 and later accept the -pbkdf2 and -iter options for enc.

Warning: you can run this and specify -nosalt to skip the salt generation, but this is not recommended as it makes an already extremely weak KDF even weaker.

Keyed encryption:
$ openssl enc -aes-256-cbc -in large_plain_file.dmg -out large_encrypted_file.enc -K 0123456789ABCDEFFEDCBA98765432100123456789ABCDEFFEDCBA9876543210 -iv 0123456789ABCDEFFEDCBA9876543210

This runs the cipher with the actual key and IV provided in hex. The key is 32 bytes (256 bits) and the IV is 16 bytes (the size of one AES block).

To decrypt the data, run the command with the -d flag:

$ openssl enc -aes-256-cbc -d -in large_encrypted_file.enc -out large_decrypted_file.dmg

This is going to be simpler for you (no certificates/key pairs needed) and faster (no RSA encryption).

Something to keep in mind for all of this is that the password/key will be kept in terminal history if you provide it in the command invocation. Running without -k or -K makes openssl prompt for the password interactively instead. Alternatively, use -pass to read it from an environment variable, file, or file descriptor.

Update to address original questions explicitly:

The AES key is responsible for encrypting the data. The RSA key is used to encrypt the AES key.
Both impact the performance. The AES key is encrypting much more data, but is much faster than RSA encryption.
- Yes, changing the size of the RSA key impacts performance (larger keys are slower), but for a large file this is unlikely to be the limiting factor on performance.
- Yes, changing the AES mode of operation may have a substantial impact on performance. AES/CBC, AES/OFB, and AES/CFB are similar in that they are serial when encrypting (each block operation depends on the result of the previous one), but AES/CTR operates as a stream cipher and can be parallelized for both encryption and decryption.
See answer above.
