Why am I able to perform floating point operations inside a Linux kernel module?
I thought you couldn't perform floating point operations in the Linux kernel
You can't safely: failure to use kernel_fpu_begin() / kernel_fpu_end() doesn't mean FPU instructions will fault (not on x86 at least). Instead it will silently corrupt user-space's FPU state. This is bad; don't do that.
The compiler doesn't know what kernel_fpu_begin() means, so it can't check / warn about code that compiles to FPU instructions outside of FPU-begin regions.
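To make the bracketing pattern concrete, here is a minimal sketch, not taken from any real driver. In an x86 kernel build the two calls come from <asm/fpu/api.h>; the #else branch supplies hypothetical user-space stand-ins (fpu_depth and stub versions of the two functions) purely so the sketch can be compiled and tried outside the kernel. The helper name scale_sample() is also made up for illustration.

```c
#ifdef __KERNEL__
#include <asm/fpu/api.h>   /* real kernel_fpu_begin() / kernel_fpu_end() on x86 */
#else
/* Hypothetical user-space stand-ins: they only track nesting depth. */
static int fpu_depth;
static void kernel_fpu_begin(void) { fpu_depth++; }
static void kernel_fpu_end(void)   { fpu_depth--; }
#endif

/* Hypothetical helper: scales an integer sample by 1.5 using FP math. */
int scale_sample(int sample)
{
    int result;

    kernel_fpu_begin();            /* user-space FPU state is saved first */
    result = (int)(sample * 1.5);  /* FP instructions belong only in here */
    kernel_fpu_end();              /* user-space state may be restored again */

    return result;                 /* only an integer leaves the region */
}
```

Nothing here is checked by the compiler, which is exactly the point made above: delete the two calls and the code still builds. Note also that x86 kernel builds normally disable FP code generation, so a real module would need extra per-file compiler flags; that part is not shown.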
There may be a debug mode where the kernel does disable SSE, x87, and MMX instructions outside of kernel_fpu_begin / end regions, but that would be slower and isn't done by default.
It is possible, though: setting CR0::TS = 1 makes x87 instructions fault, so lazy FPU context switching is possible, and there are other bits for SSE and AVX.
There are many ways for buggy kernel code to cause serious problems. This is just one of many. In C, you pretty much always know when you're using floating point (unless a typo results in a 1. constant or something in a context that actually compiles).
Why is the FP architectural state different from integer?
Linux has to save/restore the integer state any time it enters/exits the kernel. All code needs to use integer registers (except for a giant straight-line block of FPU computation that ends with a jmp instead of a ret, since ret modifies rsp).
But kernel code avoids FPU generally, so Linux leaves the FPU state unsaved on entry from a system call, only saving before an actual context switch to a different user-space process or on kernel_fpu_begin. Otherwise, it's common to return to the same user-space process on the same core, so FPU state doesn't need to be restored because the kernel didn't touch it. (And this is where corruption would happen if a kernel task actually did modify the FPU state. I think this goes both ways: user-space could also corrupt your FPU state.)
The integer state is fairly small, only 16x 64-bit registers + RFLAGS and segment regs. FPU state is more than twice as large even without AVX: 8x 80-bit x87 registers, and 16x XMM or YMM, or 32x ZMM registers (+ MXCSR, and x87 status + control words). Also the MPX bnd0-3 registers are lumped in with "FPU". At this point "FPU state" just means all non-integer registers. On my Skylake, dmesg says x86/fpu: Enabled xstate features 0x1f, context size is 960 bytes, using 'compacted' format.
See "Understanding FPU usage in linux kernel"; modern Linux doesn't do lazy FPU save/restore by default on context switches (only for kernel/user transitions). (But that article explains what lazy switching is.)
Most processes use SSE for copying/zeroing small blocks of memory in compiler-generated code, and most library string/memcpy/memset implementations use SSE/SSE2. Also, hardware-supported optimized save/restore is a thing now (xsaveopt / xrstor), so "eager" FPU save/restore may actually do less work if some/all FP registers haven't actually been used. e.g. save just the low 128b of YMM registers if they were zeroed with vzeroupper so the CPU knows they're clean. (And mark that fact with just one bit in the save format.)
With "eager" context switching, FPU instructions stay enabled all the time, so bad kernel code can corrupt them at any time.
GCC [for ARM] force no floating point
It is not just a library issue. Your target will use soft-fp, and the compiler will supply floating point code to implement arithmetic operators regardless of the library.
The solution I generally apply is to scan the map file for instances of the compiler-supplied floating-point routines. If your code is "fp clean" there will be no such references. The math library and any other code that performs floating-point arithmetic operations will use these operator implementations, so you only need to look for these operator calls and can ignore the Newlib math library functions.
The internal soft-fp routines are listed at https://gcc.gnu.org/onlinedocs/gccint/Soft-float-library-routines.html. It is probably feasible to manually check the map file for fp symbols, but you might write yourself a script or tool to scan the map file for these names to check your code. The cross-reference section of the map file will list all modules these symbols are used in, so you can use that to identify where the floating point code is used.
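The scanning tool mentioned above could start from something like the following. It is only a sketch: the function name find_softfp_symbol() is made up, the name table is a small sample rather than the full list from the GCC page, and the __aeabi_* entries are the ARM EABI spellings of the same helpers, which I have added on the assumption that an ARM EABI toolchain is in use. Matching is by plain substring, so it can over-report.

```c
#include <stddef.h>
#include <string.h>

/* A small sample of soft-float helper names; extend it from the GCC list.
 * The __aeabi_* names are the ARM EABI equivalents (an assumption here). */
static const char *const softfp_names[] = {
    "__addsf3", "__adddf3", "__subsf3", "__subdf3",
    "__mulsf3", "__muldf3", "__divsf3", "__divdf3",
    "__floatsisf", "__floatsidf", "__fixsfsi", "__fixdfsi",
    "__extendsfdf2", "__truncdfsf2",
    "__aeabi_fadd", "__aeabi_dadd", "__aeabi_fmul", "__aeabi_dmul",
    "__aeabi_fdiv", "__aeabi_ddiv",
};

/* Returns the first listed name that occurs in one line of map-file text,
 * or NULL if the line mentions none of them. */
const char *find_softfp_symbol(const char *line)
{
    size_t i;

    for (i = 0; i < sizeof softfp_names / sizeof softfp_names[0]; i++) {
        if (strstr(line, softfp_names[i]) != NULL)
            return softfp_names[i];
    }
    return NULL;
}
```

Feed it each line of the map file (read with fgets(), say) and report the lines where it returns non-NULL; an "fp clean" build should produce no hits.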
The Newlib stdio functions support floating-point by default. If your formatted I/O is limited to printf() you can use iprintf() instead, or you can rebuild Newlib with FLOATING_POINT undefined to remove floating point support from all but scanf() (no idea why). You can then use the map file technique again to find "banned" formatted I/O functions (although these are likely to also use the floating point operator functions in any case, so you will already have spotted them indirectly).
Another option is to use an alternative stdio library to override the Newlib versions. There are any number of "tiny printf" implementations available you could use. If you link such a library as object code or list its library ahead of Newlib in the link command, it will override the Newlib versions.
Enabling strict floating point mode in GCC
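Compiling with -msse2 on an Intel/AMD processor that supports it will get you almost there (on 32-bit x86, GCC also needs -mfpmath=sse to actually use SSE2 for scalar arithmetic; x86-64 does so by default). Do not let any library put the FPU in FTZ/DAZ mode, and you will be mostly set (processor bugs notwithstanding).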
For other architectures, the answer would be different. Those architectures that do not offer any convenient way to get exact IEEE 754 semantics (for instance, pre-SSE2 IA32 CPUs) would require the use of a floating-point emulation library to get the result you want, at a very high performance penalty.
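One quick way to sanity-check a particular build is the C99 FLT_EVAL_METHOD macro from <float.h>, plus a tiny runtime probe. This is a sketch with made-up function names, and it only tells you about intermediate precision, not about FTZ/DAZ or anything else; it also assumes the default round-to-nearest mode.

```c
#include <float.h>

/* Reports the C99 FLT_EVAL_METHOD value for this build:
 *   0  each operation is evaluated in the precision of its own type,
 *   2  operations are evaluated in long double (typical of x87 code),
 *  -1  indeterminable. */
int fp_eval_method(void)
{
    return FLT_EVAL_METHOD;
}

/* Runtime probe. 1.0 + DBL_EPSILON/2 is an exact tie that rounds to 1.0
 * when the sum is rounded to double, so the difference below is 0.0.
 * If the intermediate is kept in extended precision, the extra bit
 * survives and the difference is nonzero. Returns 1 in the first case. */
int double_ops_round_to_double(void)
{
    volatile double one = 1.0;                   /* volatile blocks constant folding */
    volatile double half_ulp = DBL_EPSILON / 2;

    return (one + half_ulp) - one == 0.0;
}
```

On an SSE2 build I would expect fp_eval_method() to return 0 and the probe to return 1; an x87 build with default settings would typically report 2 and fail the probe.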
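If your target architecture supports the fmadd (multiplication and addition without intermediate rounding) instruction, make sure your compiler does not use it when you have explicit multiplications and additions in the source code. In GCC this is governed by the -ffp-contract option, not only by -ffast-math: in its default GNU dialect mode GCC may contract such expressions on targets that have a fused instruction, so pass -ffp-contract=off (or compile in a strict ISO mode such as -std=c11) to rule it out.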
Third party code is modifying the FPU control word
I think your diagnosis that the component is written in an Embarcadero product is very likely to be true. Delphi's runtime library does indeed enable floating point exceptions, same for C++ Builder.
One of the nice things about Embarcadero's tools is that floating point errors get converted into language exceptions, which makes numerical coding a lot easier. That is going to be of little consolation to you!
This entire area is a colossal PITA. There are no rules whatsoever regarding the FP control word. It's a total free-for-all.
I don't believe that catching unhandled exceptions is going to get the job done, because the MS C++ runtime will presumably already be catching these exceptions, but I'm no expert in that area and I may be wrong.
I believe that your only realistic solution is to set the FPU to what you want it to be whenever execution arrives in your code, and restore it when execution leaves your code. I don't know enough about COM event sinks to understand why they present an obstacle to doing this.
My product includes a DLL implemented in Delphi and I suffer from the reverse problem. Mostly the clients that call in have an FPU control word that disables exceptions. The strategy we adopt is to remember the 8087CW on entry, set it to the standard Delphi CW before executing code, and then restore it at the exit point. We take care to deal with callbacks too by restoring the caller's 8087CW before making the callback. This is a plain DLL rather than a COM object so it's probably a bit simpler.
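The same remember / set / restore strategy can be sketched in C for anyone sitting on the other side of the boundary. This is an illustration, not our actual code: it assumes x86 and a GCC or Clang style of inline assembly, it touches only the x87 control word (SSE's MXCSR is separate), and the names lib_entry(), call_client(), record_cw() and the LIB_CW value are all made up.

```c
#include <stdint.h>

/* x86 only, GCC/Clang inline asm: read and write the x87 control word. */
static uint16_t get_x87_cw(void)
{
    uint16_t cw;
    __asm__ volatile ("fnstcw %0" : "=m" (cw));
    return cw;
}

static void set_x87_cw(uint16_t cw)
{
    __asm__ volatile ("fldcw %0" : : "m" (cw));
}

/* Illustrative value only: all x87 exceptions masked, 53-bit precision. */
#define LIB_CW ((uint16_t)0x027F)

typedef void (*callback_fn)(void *ctx);

/* Call back into the client with the client's own control word in effect. */
static void call_client(callback_fn cb, void *ctx, uint16_t caller_cw)
{
    set_x87_cw(caller_cw);   /* the client expects its own settings */
    cb(ctx);
    set_x87_cw(LIB_CW);      /* back to the library's settings */
}

/* Hypothetical exported entry point of the library. */
void lib_entry(callback_fn cb, void *ctx)
{
    uint16_t caller_cw = get_x87_cw();  /* remember on entry */

    set_x87_cw(LIB_CW);                 /* library's own settings */
    /* ... library work that relies on LIB_CW would go here ... */
    if (cb)
        call_client(cb, ctx, caller_cw);

    set_x87_cw(caller_cw);              /* restore at the exit point */
}

/* Example callback: records the control word it observes into *ctx. */
void record_cw(void *ctx)
{
    *(uint16_t *)ctx = get_x87_cw();
}
```

Calling lib_entry(record_cw, &seen) should leave seen equal to the caller's control word, and the caller's control word unchanged afterwards. Every exported function and every callback needs the same treatment, which is where most of the tedium comes from.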
If you decide to attempt to get the COM supplier to modify their code then they need to call the Set8087CW() function.
However, since there are no rules to the game, I believe that the COM object vendor would be justified in refusing to change their code and to put the onus back on you.
Sorry if this is not a 100% conclusive answer, but I couldn't get all these thoughts into a comment!