Linux of Equivalent Cryptprotectmemory

Linux of equivalent CryptProtectMemory

The closest equivalent in Linux to CryptProtectMemory() is gcry_malloc_secure() in libgcrypt. The secure memory allocated will be locked in memory; gcry_free() will zeroize and deallocate it. Other crypto libraries have similar calls, for instance the template secure_vector in Botan.

Another approach is indeed to use the lower-level POSIX call mlock() on the whole buffer. The burden of zeroizing the buffer is with you though. You must manually call memset()) when the buffer is not used anymore or when your program terminates.

CryptProtectMemory() seems to do something slightly different than any of the two approaches above: it creates a small, random session key and uses it to encrypt the buffer. The benefit is that you only need to lock and finally zeroize only the very small page where the key resides, and not the whole buffer. That may make a difference if the buffer is very big. However, we will not be able to operate or process data in the buffer. There is also a small time window when secret data is swappable.

Overwriting data in memory

TL;DR;

You shall not store any secret information in OCaml heap. Thus you must never copy your secret into any OCaml heap-allocated value, consequently, neither Bytes, nor Strings or Arrays could be used, even temporary.

Introduction to the OCaml Memory Model

OCaml values are uniformly represented as tagged machine words. The least significant bit of a word is used as a tag, that distinguishes between pointers (tag=0) and immediate values (tag=1). Thus a value has always a fixed size, and is a pointer or an immediate.

Immediate values store their data in the most significant part of the word, that is 31-bits in 32 bit systems, and 63 bits in 64-bit systems. Pointers store their data in blocks, that are located in a so-called OCaml Heap. The OCaml Heap is a set of blocks managed by the Garbage Collector (GC). A block is a chunk of data prefixed with a header. The header specifies the size of data, and some other meta information, used by the GC. Block can contain OCaml values (pointers or immediate values) or opaque data.

To summarize. All OCaml values are represented as machine words, that either store data directly in the word or are pointers to heap-allocated blocks. Each pointer points to one and only one block. Several pointers may point to the same block. Such values are considered physically equal. Some blocks are not pointed by any pointers. Such blocks are called dead and are reclaimed by the GC.

Introduction to the OCaml Garbage Collector

The GC manages blocks, by allocating, moving, and deallocating them. The GC itself uses an arena, that is either obtained from the C memory allocator (malloc) or directly from a kernel via the memmap syscall (depends on a particular system and runtime).

The GC is generational, that means that values are first allocated in a special region of a heap called minor heap. The minor heap is a contiguous region of memory of fixed size, represented in the runtime with three pointers: the pointer beg to the beginning of the minor heap, a pointer end to the end of the minor heap, and the pointer cur to the beginning of the free area of the minor heap. When a block is allocated, cur is increased by the size of the block. Then the block is initialized with data. When there is no more free space in the minor heap (i.e., then end - cur is less than the required block size), then a minor GC cycle is triggered. The GC analyzes all blocks stored in the Minor Heap and copies all blocks that are referenced by at least one pointer to the Major Heap. After that, the cur pointer is set to beg.

In the Major Heap, a block may also be copied several times during a process called compaction. The compactor may try to rearrange blocks in its arena in order to achieve more compact representation of the heap.

Security Consequences

As the OCaml GC is a moving GC, it may copy the heap-allocated data arbitrarily. Although it is called moving, it is still in fact just copying. I.e., when a block is moved from the minor heap to the major heap, it is in fact just bit-copied, and thus is duplicated. The block phantom in the minor heap may live for an arbitrary amount of time until it is overwritten by some newly allocated value. When an object is moved during the compaction, it is also copied, and may or may not be overwritten during the process. And, of course, it goes without saying, that once a block becomes dead, it still may survive in a heap for an arbitrary amount of time until reused by the GC.

That all means, that if a secret ends up in the OCaml heap, it will go wild, as the GC can replicate it multiple times in an arbitrary and rather unpredictable ways. Thus, we can only store secrets either in immediate values or in regions that are not controlled by the GC. As it was said before, all OCaml values that are pointers, always point to a block in the OCaml heap. A block may contain data directly, or it could contain a pointer itself, that will point outside the memory heap. The so-called custom blocks, may or may not store their information in the OCaml heap, it depeds on a particular representation of each custom block. For example, the Bigarray library provides custom blocks that store their payload outside of the OCaml heap. Thus a Bigarray is a custom block, that has two fields: a pointer and size. It is an opaque block, i.e., the GC will never treat these two values as OCaml values, and will never follow neither the size nor the pointer. The data pointed by a pointer is located outside of the OCaml heap, and is either allocated by malloc or by memmap (in fact, it could be arbitrary integer, and even point to stack, or static data, it doesn't really matter, as long as we treat bigarrays just as a ptr,len pair).

This all makes Bigarrays ideal for storing secrets. We can be sure, that they are not moved by the GC, we can overwrite them to prevent the information leakage once they are freed.

Further considerations

We should be careful, and never allow a secret to be copied into the OCaml heap from our safe place. That means, that even if our main storage is a safe bigarray the information will still leak if we will copy its contents to an OCaml string. Consequently, if we first read the information into OCaml string, and then copy it into bigarray, the information will still leak. Thus, any interface that uses OCaml heap-allocated values is unsafe and shall not be used. For example, we can't use OCaml channels to read or write secrets (we should rely on memory mapping or unbuffered IO provided by the Unix module). And again, whenever you get a string data type from a Bigarray, you get your data copied, with all the ramifications.

Does VirtualLock really do what it say?

If you need security you realistically have two options:

Move to kernel mode
Encrypt the data in memory using CryptProtectMemory

The problem is that even if you lock the ram in place other processes can read it if they have the right access rights. There are ways using process isolation to potentially prevent this but they require access to an SDK that MS limits to DRM developers.

Remember when you're done with the memory you'll need to clear it with SecureZeroMemory which the compiler will not optimize away.

Linux of Equivalent Cryptprotectmemory