Use of Floating Point in the Linux Kernel

Because...

  • many programs don't use floating point, or don't use it in any given time slice; and
  • saving the FPU registers and other FPU state takes time; therefore

...an OS kernel may simply turn the FPU off. Presto: no state to save and restore, and therefore faster context switching. (This is what "mode" meant: it just meant that the FPU was enabled.)

If a program then attempts an FPU operation, it traps into the kernel; the kernel turns the FPU on, restores any saved state that may already exist, and returns to re-execute the FPU instruction.

At context-switch time, the kernel then knows that it actually has to go through the state-save logic. (And then it may turn the FPU off again.)
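The trap-and-enable scheme above can be sketched as a toy user-space model. Everything here is hypothetical (struct lazy_cpu, struct ltask, lazy_fp_add() and lazy_context_switch() are illustrative names, not kernel APIs), a single double stands in for the whole FPU register file, and the fpu_enabled flag plays the role of the hardware "FPU off" bit:

```c
#include <stdbool.h>

/* Toy model only: names are hypothetical, not Linux kernel APIs. */
struct ltask {
	double saved_fpu;       /* this task's FPU state while it is not live */
};

struct lazy_cpu {
	bool fpu_enabled;       /* false models "FPU turned off" */
	double fpregs;          /* stand-in for the live FPU registers */
	int saves, traps;       /* counters showing how much work was done */
};

/* Context switch away from prev: save FPU state only if prev actually
   used the FPU during its time slice, then turn the FPU off again. */
void lazy_context_switch(struct lazy_cpu *c, struct ltask *prev)
{
	if (c->fpu_enabled) {
		prev->saved_fpu = c->fpregs;
		c->saves++;
	}
	c->fpu_enabled = false;
}

/* "Trap handler": the first FPU op after a switch lands here.
   Turn the FPU on and restore the current task's saved state. */
static void fpu_unavailable_trap(struct lazy_cpu *c, struct ltask *cur)
{
	c->traps++;
	c->fpu_enabled = true;
	c->fpregs = cur->saved_fpu;
}

/* An FPU instruction executed by cur: trap first if the FPU is off,
   then "re-execute" the operation. */
double lazy_fp_add(struct lazy_cpu *c, struct ltask *cur, double x)
{
	if (!c->fpu_enabled)
		fpu_unavailable_trap(c, cur);
	c->fpregs += x;
	return c->fpregs;
}
```

If task a does some FP work, is switched out, task b runs integer-only code, and a is switched back in, only one save happens (when leaving a) and switching away from b costs nothing; a's second FP operation traps once and finds its old state restored.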

By the way, I believe the book's explanation of why kernels (and not just Linux) avoid FPU operations is ... not perfectly accurate.[1]

The kernel can trap into itself and does so for many things. (Timers, page faults, device interrupts, others.) The real reason is that the kernel doesn't particularly need FPU ops and also needs to run on architectures without an FPU at all. Therefore, it simply avoids the complexity and runtime required to manage its own FPU context by not doing ops for which there are always other software solutions.

It's interesting to note how often the FPU state would have to be saved if the kernel wanted to use FP: every system call, every interrupt, every switch between kernel threads. Even if there were a need for occasional kernel FP,[2] it would probably be faster to do it in software.



1. That is, dead wrong.

2. There are a few cases I know about where kernel software contains a floating-point arithmetic implementation. Some architectures implement traditional FPU ops in hardware but leave some complex IEEE FP operations to software. (Think: denormal arithmetic.) When some odd IEEE corner case happens, they trap to software that contains a pedantically correct emulation of the operations that can trap.

Floating point operations in Linux kernel module (again)

You can just use integer arithmetic, e.g.

int perc = 100 * v1 / v2;

This will give an integer percentage. If you need higher resolution than 1% then use a scale factor larger than 100 and insert a decimal point for display purposes as required.
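Taking the scale-factor suggestion one step further, here is a minimal sketch using a scale factor of 10000 (two decimal places). perc_x100() and format_perc() are hypothetical helpers; the code is plain user-space C so it can be tested, on the assumption that the same integer arithmetic carries over to a module unchanged:

```c
#include <stdint.h>
#include <stdio.h>

/* v1 as a percentage of v2, in hundredths of a percent, using only
   integer arithmetic. The 64-bit intermediate keeps 10000 * v1 from
   overflowing; the caller must ensure v2 != 0 and that the result fits
   in 32 bits (true whenever v1 <= v2). Truncates rather than rounds. */
uint32_t perc_x100(uint32_t v1, uint32_t v2)
{
	return (uint32_t)((uint64_t)v1 * 10000u / v2);
}

/* Insert the decimal point for display, e.g. 3333 -> "33.33". */
void format_perc(char *buf, size_t len, uint32_t p)
{
	snprintf(buf, len, "%u.%02u", (unsigned)(p / 100), (unsigned)(p % 100));
}
```

On 32-bit kernels, dividing a 64-bit value with the plain / operator pulls in a libgcc helper that the kernel does not provide, so in-kernel code would use div_u64() from <linux/math64.h> for that division instead.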

Why am I able to perform floating point operations inside a Linux kernel module?

I thought you couldn't perform floating point operations in the Linux kernel

You can't safely: failure to use kernel_fpu_begin() / kernel_fpu_end() doesn't mean FPU instructions will fault (not on x86 at least).

Instead it will silently corrupt user-space's FPU state. This is bad; don't do that.

The compiler doesn't know what kernel_fpu_begin() means, so it can't check / warn about code that compiles to FPU instructions outside of FPU-begin regions.

There may be a debug mode where the kernel does disable SSE, x87, and MMX instructions outside of kernel_fpu_begin / end regions, but that would be slower and isn't done by default.

It is possible, though: setting CR0.TS = 1 makes x87 instructions fault, which is what makes lazy FPU context switching possible, and other control bits (in CR4 and XCR0) govern SSE and AVX.
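To make the CR0 bits concrete, here is a small decoder. The bit positions (MP = bit 1, EM = bit 2, TS = bit 3) are architectural; the macro names mirror the kernel's X86_CR0_* names but are defined locally, and x87_op_faults() is a hypothetical helper. It takes a CR0 value as a plain number, since CR0 itself can only be read at ring 0:

```c
#include <stdbool.h>
#include <stdint.h>

#define X86_CR0_MP (UINT64_C(1) << 1)  /* monitor coprocessor */
#define X86_CR0_EM (UINT64_C(1) << 2)  /* emulation: no FPU present */
#define X86_CR0_TS (UINT64_C(1) << 3)  /* task switched: FPU state is stale */

/* Would an x87 arithmetic instruction (e.g. FADD) raise #NM
   ("device not available") with this CR0 value? Either EM or TS
   being set is enough. (FWAIT has its own MP-based rule, ignored here.) */
bool x87_op_faults(uint64_t cr0)
{
	return (cr0 & (X86_CR0_EM | X86_CR0_TS)) != 0;
}
```

A lazy-switching kernel sets TS on every context switch, so the first x87 instruction of the next time slice faults and the handler can restore state and clear TS.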


There are many ways for buggy kernel code to cause serious problems. This is just one of many. In C, you pretty much always know when you're using floating point (unless a typo results in a 1. constant or something in a context that actually compiles).


Why is the FP architectural state different from integer?

Linux has to save/restore the integer state any time it enters/exits the kernel, because all code needs to use integer registers. (The only exception would be a giant straight-line block of FPU computation that ends with a jmp instead of a ret, since ret modifies rsp.)

But kernel code generally avoids the FPU, so Linux leaves the FPU state unsaved on entry from a system call. It only saves it before an actual context switch to a different user-space process, or on kernel_fpu_begin(). Otherwise, since it's common to return to the same user-space process on the same core, the FPU state doesn't need to be restored, because the kernel didn't touch it. (And this is where corruption would happen if a kernel task actually did modify the FPU state. I think this goes both ways: user space could also corrupt FPU state that your kernel code was using.)
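The skipped-restore logic, and why it turns stray kernel FP use into silent corruption, can be shown with another toy model. The names (struct task, struct cpu, sim_kernel_fpu_begin(), sim_return_to_user()) are hypothetical and this is not how the kernel's code is actually structured; it only tracks which task's state is live in the registers:

```c
#include <stddef.h>

/* Toy model only: names are hypothetical, not Linux kernel APIs. */
struct task {
	double saved_fpu;           /* in-memory copy of the task's FPU state */
};

struct cpu {
	double fpregs;              /* stand-in for the live FPU registers */
	struct task *fpregs_owner;  /* task whose state is live, or NULL */
	int saves, restores;
};

/* Like kernel_fpu_begin(): save the live user state, then mark the
   registers as belonging to nobody, since the kernel will clobber them. */
void sim_kernel_fpu_begin(struct cpu *c)
{
	if (c->fpregs_owner) {
		c->fpregs_owner->saved_fpu = c->fpregs;
		c->saves++;
	}
	c->fpregs_owner = NULL;
}

/* Return to user space: reload only if the live registers are not
   already this task's. A plain system call round trip skips this. */
void sim_return_to_user(struct cpu *c, struct task *t)
{
	if (c->fpregs_owner != t) {
		c->fpregs = t->saved_fpu;
		c->restores++;
		c->fpregs_owner = t;
	}
}
```

If kernel code writes c->fpregs without calling sim_kernel_fpu_begin() first, the owner still points at the user task, the restore is skipped, and the task silently sees the kernel's value. With the begin call, the user state is saved and restored on the way out.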

The integer state is fairly small: only sixteen 64-bit registers plus RFLAGS and the segment registers. The FPU state is more than twice as large even without AVX: eight 80-bit x87 registers, plus 16 XMM or YMM registers, or 32 ZMM registers (plus MXCSR and the x87 status and control words). The MPX bnd0-bnd3 registers are also lumped in with "FPU"; at this point "FPU state" just means all non-integer registers. On my Skylake, dmesg says "x86/fpu: Enabled xstate features 0x1f, context size is 960 bytes, using 'compacted' format."
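The 960-byte figure can be reproduced by hand. In the compacted XSAVE format, components 0 and 1 (x87 and SSE) live in a 512-byte legacy area, a 64-byte header follows, and each further enabled component is appended in bit order. The sketch below hardcodes component sizes of 256 (AVX), 64 (MPX bounds registers) and 64 (MPX CSR) bytes; that is an assumption based on what CPUID leaf 0xD typically reports on Skylake-era CPUs, and real code must query CPUID instead. It also ignores the per-component 64-byte alignment flag, which is harmless here because every size is already a multiple of 64. xsave_compacted_size() and xfeature_name() are hypothetical helpers:

```c
#include <stddef.h>
#include <stdint.h>

#define XSAVE_LEGACY_SIZE 512  /* FXSAVE-layout area: x87 + SSE (bits 0, 1) */
#define XSAVE_HEADER_SIZE 64   /* XSTATE_BV, XCOMP_BV, reserved */
#define KNOWN_FEATURES    5    /* this sketch only knows mask bits 0-4 */

/* Feature names modeled on the strings Linux prints in dmesg. */
static const char *const xfeature_names[KNOWN_FEATURES] = {
	"x87 floating point registers",
	"SSE registers",
	"AVX registers",
	"MPX bounds registers",
	"MPX CSR",
};

/* ASSUMPTION: sizes hardcoded from typical CPUID leaf 0xD output.
   Bits 0 and 1 are covered by the legacy area, hence 0 here. */
static const size_t xfeature_sizes[KNOWN_FEATURES] = { 0, 0, 256, 64, 64 };

const char *xfeature_name(unsigned bit)
{
	return bit < KNOWN_FEATURES ? xfeature_names[bit] : "unknown";
}

/* Size of a compacted-format XSAVE area for mask, or 0 if the mask
   has bits this sketch does not know about. */
size_t xsave_compacted_size(uint64_t mask)
{
	size_t size = XSAVE_LEGACY_SIZE + XSAVE_HEADER_SIZE;

	if (mask >> KNOWN_FEATURES)
		return 0;
	for (unsigned bit = 2; bit < KNOWN_FEATURES; bit++)
		if (mask & (UINT64_C(1) << bit))
			size += xfeature_sizes[bit];
	return size;
}
```

For mask 0x1f this gives 512 + 64 + 256 + 64 + 64 = 960 bytes, matching the dmesg line; with only x87 and SSE (0x3) it would be 576.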

See "Understanding FPU usage in linux kernel" for an explanation of what lazy FPU switching is. Note that modern Linux no longer does lazy FPU switching by default on context switches; it only stays lazy for kernel/user transitions.

Most processes use SSE in compiler-generated code for copying or zeroing small blocks of memory, and most library string/memcpy/memset implementations use SSE/SSE2. Also, hardware-supported optimized save/restore is a thing now (xsaveopt / xrstor), so "eager" FPU save/restore may actually do less work if some or all FP registers haven't been used: e.g. saving just the low 128 bits of the YMM registers if their upper halves were zeroed with vzeroupper, so the CPU knows they're clean. (That fact is recorded with just one bit in the save format.)

With "eager" context switching, FPU instructions stay enabled all the time, so buggy kernel code can corrupt user-space FPU state at any time.

What are coding conventions for using floating-point in Linux device drivers?

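Short answer: Kernel code can use floating point if this use is surrounded by kernel_fpu_begin()/kernel_fpu_end(). These functions handle saving and restoring the FPU context. They also disable and re-enable preemption (preempt_disable()/preempt_enable()), which means no sleeping, page faults, etc. in the code between them. Note also that the kernel is normally built with compiler flags that stop the compiler from generating FPU/SSE instructions at all (on x86, -mno-sse and friends), so the floating-point code itself typically has to live in a separate source file compiled with different flags. Google the function names for more information.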

"If I understand correctly, whenever a KM is running, it is using a hardware context (or hardware thread or register set, whatever you want to call it) that has been preempted from some application thread."

No, a kernel module can run in user context as well (e.g. when user space makes system calls on a device provided by the KM). That, however, has nothing to do with the float issue.

"If you write your KM in C, the compiler will correctly ensure that the general-purpose registers are properly saved and restored (much as in an application), but that doesn't automatically happen with floating-point registers."

That is not because of the compiler, but because of the kernel context-switching code.

Why does it throw a Floating Point Exception if I divide a floating-point number by zero?

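Dividing by 0 does not necessarily result in infinity. There's a good Numberphile video that goes into this.

More importantly here, the IEEE 754 floating-point standard (which is what most languages and CPUs use) says that dividing a nonzero number by 0 gives a signed infinity, while 0/0 gives NaN. Both cases also raise a status flag (divide-by-zero or invalid operation) that does not trap by default, and many programming languages just turn this into an error.

This is not Linux-specific. "Floating point exception" is the message the shell prints when a process is killed by the SIGFPE signal, which Linux delivers when the CPU faults on an integer division by zero or on a floating-point exception that has been explicitly unmasked. With the default IEEE 754 settings, floating-point division by zero does not trap, so seeing this message usually means the division was really an integer one.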
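A quick way to see the default IEEE 754 behavior is a minimal sketch like the following; ieee_divide() is a hypothetical helper, and the volatile copies are there only to make sure the division happens at run time rather than being folded by the compiler:

```c
#include <math.h>

/* Divide at run time. The volatile copies stop the compiler from folding
   the division into a constant, so the FPU (SSE on x86-64) really executes
   it under the default floating-point environment, in which all IEEE 754
   exceptions are masked and nothing traps. */
double ieee_divide(double num, double den)
{
	volatile double n = num, d = den;
	return n / d;
}
```

ieee_divide(1.0, 0.0) returns +infinity, ieee_divide(-1.0, 0.0) returns -infinity and ieee_divide(0.0, 0.0) returns NaN, with no signal. By contrast, an integer division such as 1 / zero (with int zero = 0) is undefined behavior in C and on x86 Linux typically kills the process with SIGFPE, which is why it is not run here.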

Floating-point constant without using floating-point registers (Linux kernel module)

How about using something like this:

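union val {
	float fval;   /* written only by the static initializer below */
	int ival;     /* read back as the raw IEEE 754 bit pattern */
};

/* A static initializer must be a constant expression, so the compiler
   folds 3.8 * 0.98 / 1000.0 at build time. */
static const union val my_val1 = { .fval = 3.8 * 0.98 / 1000.0 };

void store_val(int *vp)   /* vp: wherever the value has to be written */
{
	*vp = my_val1.ival;
}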

The use of static const ought to be enough to prevent floating-point calculations at run-time.
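A related approach, sketched here as an assumption rather than as the answer's method, is to compute the bit pattern once on the build host and hardcode it as an integer constant, so the module source contains no floating-point types at all. float_bits() is a hypothetical user-space helper, not something the kernel provides:

```c
#include <stdint.h>
#include <string.h>

/* Host-side helper (runs in user space, not in the kernel): returns the
   IEEE 754 single-precision bit pattern of f. memcpy is the well-defined
   way to reinterpret the bytes of a float as an integer. */
uint32_t float_bits(float f)
{
	uint32_t bits;

	memcpy(&bits, &f, sizeof bits);
	return bits;
}
```

For example, printing float_bits((float)(3.8 * 0.98 / 1000.0)) with printf("0x%08X\n", ...) on the host gives a hex constant that can be pasted into the module source in place of the union.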

Overhead of supporting Floating Point Arithmetic inside the Linux Kernel

The usual answer is that if the kernel does not use floating point, it does not have to save the floating-point registers on entry to the kernel or restore them on exit. This shaves several hundred cycles off the cost of all system calls.

I do not know if anyone has tried to compare this savings against the performance improvements that might be available if the kernel could make indiscriminate use of those registers. Note that you can use them in the kernel if you take proper care, and this is done in contexts where tremendous speed benefits are available, e.g. using SSE instructions to accelerate memcpy and the like. (Look for calls to kernel_fpu_begin in the Linux sources.)


