Deterministic Builds Under Windows

I solved this to an extent.

Currently we have a build system that makes sure all new builds are placed on paths of constant length (builds/001, builds/002, etc.), so the PDB path embedded in each binary always has the same length and the PE layout does not shift. After each build, a tool compares the old and new binaries, ignoring the relevant PE fields and other locations with known superficial changes. It also runs some simple heuristics to detect dynamic, ignorable changes. Here is the full list of things to ignore:

PE timestamp and checksum
Digital signature directory entry
Export table timestamp
Debug directory timestamp
PDB signature, age and file path
Resource directory timestamp
All file/product versions in the VS_VERSION_INFO resource
Digital signature data (the Authenticode certificate table)
MIDL vanity stub for embedded type libraries (contains a timestamp string)
Expansions of the __FILE__, __DATE__ and __TIME__ macros when they are used as string literals (narrow or wide)
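The first three items in that list sit at fixed offsets in the PE headers, so they can be masked before a byte-for-byte comparison. Below is a minimal C++ sketch of that masking step under stated assumptions; it is not the tool's actual implementation. It only zeroes the COFF TimeDateStamp, the optional-header CheckSum and the certificate-table (digital signature) data-directory entry. The debug directory, PDB info, resources, version info and the signature blob itself would require walking the data directories and are left out. The names normalize_pe_headers and same_after_normalization are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// PE headers are little-endian regardless of the target architecture.
static uint32_t rd32(const std::vector<uint8_t>& b, std::size_t off) {
    return uint32_t(b[off]) | uint32_t(b[off + 1]) << 8 |
           uint32_t(b[off + 2]) << 16 | uint32_t(b[off + 3]) << 24;
}

static uint16_t rd16(const std::vector<uint8_t>& b, std::size_t off) {
    return uint16_t(b[off] | b[off + 1] << 8);
}

static void zero_bytes(std::vector<uint8_t>& b, std::size_t off, std::size_t len) {
    std::fill(b.begin() + off, b.begin() + off + len, uint8_t(0));
}

// Zero the COFF TimeDateStamp, the optional-header CheckSum and the certificate
// table data-directory entry. Returns false if the buffer does not look like a
// PE image with at least five data directories.
bool normalize_pe_headers(std::vector<uint8_t>& img) {
    if (img.size() < 0x40 || img[0] != 'M' || img[1] != 'Z') return false;
    const std::size_t pe = rd32(img, 0x3C);                                 // e_lfanew
    if (pe + 24 > img.size() || rd32(img, pe) != 0x00004550) return false;  // "PE\0\0"
    const std::size_t coff = pe + 4;                    // 20-byte COFF file header
    const std::size_t opt = coff + 20;                  // optional header follows it
    const std::size_t opt_size = rd16(img, coff + 16);  // SizeOfOptionalHeader
    if (opt_size < 2 || opt + opt_size > img.size()) return false;
    const uint16_t magic = rd16(img, opt);
    // Data directories start at offset 96 (PE32) or 112 (PE32+).
    const std::size_t dirs = magic == 0x10b ? 96 : magic == 0x20b ? 112 : 0;
    if (dirs == 0 || opt_size < dirs + 5 * 8) return false;
    if (rd32(img, opt + dirs - 4) < 5) return false;    // NumberOfRvaAndSizes

    zero_bytes(img, coff + 4, 4);            // COFF TimeDateStamp
    zero_bytes(img, opt + 64, 4);            // CheckSum (same offset in PE32 and PE32+)
    zero_bytes(img, opt + dirs + 4 * 8, 8);  // data directory 4: certificate table
    return true;
}

// Compare two images after normalizing copies of both; fall back to a raw
// comparison if either one cannot be parsed as PE.
bool same_after_normalization(const std::vector<uint8_t>& a, const std::vector<uint8_t>& b) {
    std::vector<uint8_t> na = a, nb = b;
    if (normalize_pe_headers(na) && normalize_pe_headers(nb)) return na == nb;
    return a == b;
}
```

Masking is done on copies, so the original bytes stay available for reporting where any remaining difference lies.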

Once in a while, the linker makes some PE sections bigger without throwing anything else out of alignment. It looks like it moves the section boundary inside the padding; the bytes on both sides are zeros anyway, but because of it I get binaries that differ by 1 byte.

UPDATE: we recently open-sourced the tool on GitHub. See the Compare section in its documentation.

How to program in Windows 7.0 to make it more deterministic?

Real-time solutions for Windows, such as LabVIEW Real-Time or RTX, are expensive; a stand-alone RTOS would often be less expensive (or even free), but if you need Windows functionality as well, you are perhaps no further forward.

If cost is critical, you might run a free or low-cost RTOS in a virtual machine. This can work, though there is no cooperation over hardware access between the RTOS and Windows, and no direct communication mechanism between them (you could use TCP/IP over a virtual or real network, I suppose).

Another alternative is to perform the real-time data acquisition on stand-alone hardware (a microcontroller development board or an SBC, for example) and communicate with Windows via USB or TCP/IP. That way it is possible to get timing jitter down to the microsecond level or better.
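Before choosing between these options, it can help to measure how far the Windows host itself is from your timing requirement. The C++ sketch below uses only the standard library; measure_wakeup_lateness and JitterStats are hypothetical names. It schedules wake-ups on an absolute timeline with std::this_thread::sleep_until and records how late each one is. It measures the OS scheduler and timer, not any acquisition hardware, and on Windows the numbers depend heavily on the system timer resolution.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <thread>
#include <vector>

struct JitterStats {
    double min_us;  // smallest observed wake-up lateness, in microseconds
    double max_us;  // largest observed wake-up lateness, in microseconds
};

// Wake up every `period` on an absolute schedule and record how late each
// wake-up is relative to its deadline. Returns {0, 0} for zero iterations.
JitterStats measure_wakeup_lateness(std::chrono::microseconds period, std::size_t iterations) {
    using Clock = std::chrono::steady_clock;
    if (iterations == 0) return {0.0, 0.0};
    std::vector<double> late_us;
    late_us.reserve(iterations);
    auto deadline = Clock::now() + period;
    for (std::size_t i = 0; i < iterations; ++i) {
        std::this_thread::sleep_until(deadline);
        const auto woke = Clock::now();
        late_us.push_back(std::chrono::duration<double, std::micro>(woke - deadline).count());
        deadline += period;  // absolute deadlines: one late wake-up does not shift the rest
    }
    const auto mm = std::minmax_element(late_us.begin(), late_us.end());
    return {*mm.first, *mm.second};
}
```

The spread between min_us and max_us is a rough jitter figure for a user-mode thread on that machine; comparing it with what a dedicated board achieves can make the trade-off above concrete.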

Deterministic Library Build Using CMake

CMAKE_CXX_ARCHIVE_FINISH worked for me.

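CMakeLists.txt:

cmake_minimum_required(VERSION 3.10)
project(Test)
# The D modifier puts GNU ar in deterministic mode: member timestamps,
# UIDs and GIDs are written as zero and file modes are made consistent.
set(CMAKE_CXX_ARCHIVE_CREATE "<CMAKE_AR> -crD <TARGET> <LINK_FLAGS> <OBJECTS>")
set(CMAKE_CXX_ARCHIVE_APPEND "<CMAKE_AR> -rD <TARGET> <LINK_FLAGS> <OBJECTS>")
# ranlib -D likewise keeps the header of the symbol index deterministic.
set(CMAKE_CXX_ARCHIVE_FINISH "<CMAKE_RANLIB> -D <TARGET>")
add_library(Test Main.cpp)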
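To confirm that the archive really came out deterministic, you can inspect its member headers: with the D modifier, GNU ar writes zero for each member's timestamp, UID and GID. The C++ sketch below walks the common ar layout (the 8-byte "!<arch>\n" magic, then 60-byte member headers) and reports the first member that still carries a non-zero value. ar_is_deterministic is a hypothetical helper, not part of CMake or binutils; it does not interpret GNU long-name tables and does not check the mode field.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Parse a left-justified, space-padded decimal field from an ar member header.
// Returns -1 if the field contains a non-digit before the padding starts.
static long long parse_decimal_field(const unsigned char* p, std::size_t len) {
    long long value = 0;
    for (std::size_t i = 0; i < len && p[i] != ' '; ++i) {
        if (p[i] < '0' || p[i] > '9') return -1;
        value = value * 10 + (p[i] - '0');
    }
    return value;
}

// Returns true if every member header has mtime, uid and gid equal to zero.
// On failure, `problem` describes the first offending member or format error.
bool ar_is_deterministic(const std::vector<unsigned char>& buf, std::string& problem) {
    const std::size_t kMagicLen = 8, kHeaderLen = 60;
    if (buf.size() < kMagicLen || std::memcmp(buf.data(), "!<arch>\n", kMagicLen) != 0) {
        problem = "not an ar archive";
        return false;
    }
    std::size_t off = kMagicLen;
    while (off < buf.size()) {
        if (buf.size() - off < kHeaderLen) { problem = "truncated member header"; return false; }
        const unsigned char* h = buf.data() + off;
        if (h[58] != '`' || h[59] != '\n') { problem = "bad header terminator"; return false; }
        const std::string name(reinterpret_cast<const char*>(h), 16);
        const long long mtime = parse_decimal_field(h + 16, 12);
        const long long uid = parse_decimal_field(h + 28, 6);
        const long long gid = parse_decimal_field(h + 34, 6);
        const long long size = parse_decimal_field(h + 48, 10);
        if (size < 0 || static_cast<unsigned long long>(size) > buf.size() - off - kHeaderLen) {
            problem = "bad size field in member " + name;
            return false;
        }
        if (mtime != 0 || uid != 0 || gid != 0) {
            problem = "non-zero mtime/uid/gid in member " + name;
            return false;
        }
        off += kHeaderLen + static_cast<std::size_t>(size);
        if (off % 2 != 0) ++off;  // member data is padded to an even offset
    }
    return true;
}
```

A byte comparison of the outputs of two clean builds remains the end-to-end test; a check like this only helps point at the cause when that comparison fails.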

Is BCryptGetProperty call deterministic?

Yes, the size of a SHA-256 hash is always the same (32 bytes). Querying the size from the crypto provider is useful when you are working at a higher level of abstraction.

Imagine you have a generic hash class:

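// needs <windows.h>, <bcrypt.h>, <stdlib.h>; link with bcrypt.lib. Error checks and cleanup omitted.
class Hash {
  BCRYPT_ALG_HANDLE m_AlgoProvider = NULL; BCRYPT_HASH_HANDLE m_HashObj = NULL; PUCHAR m_data = NULL;
public:
  bool Init(LPCWSTR pszAlgId) { DWORD cbObject = 0; ULONG cbResult = 0;
    BCryptOpenAlgorithmProvider(&m_AlgoProvider, pszAlgId, NULL, 0);
    BCryptGetProperty(m_AlgoProvider, BCRYPT_OBJECT_LENGTH, (PUCHAR)&cbObject, sizeof(cbObject), &cbResult, 0);  // size of the internal hash state
    m_data = (PUCHAR)malloc(cbObject);
    return BCRYPT_SUCCESS(BCryptCreateHash(m_AlgoProvider, &m_HashObj, m_data, cbObject, NULL, 0, 0)); }
  void AddData(LPCVOID p, ULONG cb) { BCryptHashData(m_HashObj, (PUCHAR)p, cb, 0); }
  DWORD GetHashSize() { DWORD cbHash = 0; ULONG cbResult = 0; BCryptGetProperty(m_HashObj, BCRYPT_HASH_LENGTH, (PUCHAR)&cbHash, sizeof(cbHash), &cbResult, 0); return cbHash; }
  bool Finalize(LPVOID pHash) { return BCRYPT_SUCCESS(BCryptFinishHash(m_HashObj, (PUCHAR)pHash, GetHashSize(), 0)); }
};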

The class knows neither the hash algorithm nor the hash size at compile time.

BCRYPT_OBJECT_LENGTH is the size of the internal data used by the hashing function. It is the same for all hashes of a specific type implemented by a specific crypto provider. If you only support Windows 7 and later, you can ask Windows to allocate this memory for you by passing NULL and 0 as the hash object buffer and its size to BCryptCreateHash; then you don't have to query the object size.

I believe all BCRYPT properties are deterministic once the crypto object has been properly created and initialized, so you can cache obviously constant fields like sizes and modes. Properties such as BCRYPT_INITIALIZATION_VECTOR, however, are per-object and should only be cached for that specific object.
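Caching the constant fields can be as simple as a map keyed by algorithm and property name. The following C++ sketch is platform-neutral: PropertyCache is a hypothetical class, and its query callback stands in for a real BCryptGetProperty call on an algorithm provider handle. It assumes one provider per algorithm id; if you open the same algorithm from several providers, add the provider name to the key. Per-object properties such as the initialization vector do not belong in a cache like this.

```cpp
#include <functional>
#include <map>
#include <mutex>
#include <string>
#include <utility>

// Caches provider-level constants (object length, hash length, ...) keyed by
// algorithm id and property name. The callback is invoked only on a cache miss.
class PropertyCache {
public:
    using Query = std::function<unsigned long()>;

    unsigned long get(const std::wstring& alg_id, const std::wstring& property, const Query& query) {
        std::lock_guard<std::mutex> lock(mutex_);  // query runs under the lock: at most once per key
        const auto key = std::make_pair(alg_id, property);
        const auto it = cache_.find(key);
        if (it != cache_.end()) return it->second;  // constant per algorithm: reuse it
        const unsigned long value = query();        // first request: ask the provider
        cache_.emplace(key, value);
        return value;
    }

private:
    std::mutex mutex_;
    std::map<std::pair<std::wstring, std::wstring>, unsigned long> cache_;
};
```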

Deterministic python script behaves in non-deterministic way

In general, linear algebra libraries on Windows can give different answers on different runs at the machine-precision level. I have never heard an explanation of why this happens only, or mainly, on Windows.

If your matrix is ill-conditioned, then the result of inv will be largely numerical noise. On Windows the noise is not always the same in consecutive runs; on other operating systems the noise might always be the same, but it can differ depending on the details of the linear algebra library, threading options, cache usage and so on.
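One general mechanism that produces this kind of noise is that floating-point addition is not associative. This is not a claim about what ATLAS specifically does on Windows. If a library splits a reduction differently, for example across a different number of threads, the partial sums round differently. In the small C++ sketch below, chunked_sum is a hypothetical stand-in for a threaded reduction, with each chunk playing the role of one thread's partial sum.

```cpp
#include <cstddef>
#include <vector>

// Sum v as if it were split across `chunks` threads (chunks >= 1): each
// contiguous chunk is summed on its own, then the partial sums are added in order.
double chunked_sum(const std::vector<double>& v, std::size_t chunks) {
    double total = 0.0;
    const std::size_t n = v.size();
    for (std::size_t c = 0; c < chunks; ++c) {
        double partial = 0.0;
        for (std::size_t i = c * n / chunks; i < (c + 1) * n / chunks; ++i) partial += v[i];
        total += partial;
    }
    return total;
}
```

With {0.1, 0.2, 0.3}, one chunk gives 0.6000000000000001 and two chunks give 0.6 (shortest round-trip representations). The inputs are the same, only the split changes, and the result differs in the last bit. An ill-conditioned inverse amplifies exactly this kind of last-bit difference.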

I have seen, and posted to the SciPy mailing list, several examples of this on Windows; I was mostly using the official 32-bit binaries with ATLAS BLAS/LAPACK.

The only solution is to make the outcome of your calculation depend less on floating-point precision and numerical noise: for example, regularize the matrix inverse, use a generalized inverse (pinv), reparameterize, or similar.
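As an illustration of the first option, Tikhonov (ridge) regularization replaces inverting A with solving (A^T A + lambda*I) x = A^T b. This is one possible regularizer, not the only one. The C++ sketch below uses a hypothetical nearly singular 2x2 matrix and an arbitrary lambda = 1e-3, both chosen purely for demonstration. It compares how far each solution moves when one entry of b is perturbed by 1e-8, which stands in for numerical noise.

```cpp
#include <array>
#include <cmath>

using Vec2 = std::array<double, 2>;
using Mat2 = std::array<std::array<double, 2>, 2>;

// Solve m x = c for a 2x2 matrix via the explicit inverse (fine for a demo only).
Vec2 solve2(const Mat2& m, const Vec2& c) {
    const double det = m[0][0] * m[1][1] - m[0][1] * m[1][0];
    return {( m[1][1] * c[0] - m[0][1] * c[1]) / det,
            (-m[1][0] * c[0] + m[0][0] * c[1]) / det};
}

// Ridge-regularized least squares: solve (A^T A + lambda I) x = A^T b.
Vec2 ridge_solve2(const Mat2& a, const Vec2& b, double lambda) {
    Mat2 m{};
    Vec2 c{};
    for (int i = 0; i < 2; ++i) {
        for (int j = 0; j < 2; ++j)
            for (int k = 0; k < 2; ++k) m[i][j] += a[k][i] * a[k][j];  // (A^T A)_ij
        m[i][i] += lambda;
        for (int k = 0; k < 2; ++k) c[i] += a[k][i] * b[k];            // (A^T b)_i
    }
    return solve2(m, c);
}

// Largest componentwise difference between two solutions.
double max_change(const Vec2& x, const Vec2& y) {
    return std::fmax(std::fabs(x[0] - y[0]), std::fabs(x[1] - y[1]));
}
```

With A = [[1, 1], [1, 1 + 1e-10]], the plain solution moves by about 100 when b moves by 1e-8, while the ridge solution moves by a few parts in 1e9. The price is a small bias: the regularized answer sits slightly below the exact [1, 1].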
