Intrinsics for CPUID-like Information

Intrinsics for CPUID-like information?

After some digging I found some useful built-in functions that are GCC-specific.

The only problem is that these functions are quite limited: basically you have only two query functions, __builtin_cpu_is for the CPU "name" and __builtin_cpu_supports for individual features.

An example:

#include <stdio.h>

int main(void)
{
    /* __builtin_cpu_supports takes a string literal naming the feature. */
    if (__builtin_cpu_supports("mmx")) {
        printf("\nI got MMX !\n");
    } else {
        printf("\nWhat ? MMX ? What is that ?\n");
    }
    return 0;
}

Apparently these built-in functions work under mingw-w64 too.
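As a minimal sketch of the two query built-ins used together (assuming GCC or a compatible compiler; the helper names cpu_vendor and feature_mask are made up for illustration, and the bit assignments in the mask are arbitrary):

```c
/* Hypothetical helper: classify the CPU vendor with __builtin_cpu_is. */
static const char *cpu_vendor(void)
{
#if defined(__x86_64__) || defined(__i386__)
    /* Only strictly required when called before constructors have run. */
    __builtin_cpu_init();
    if (__builtin_cpu_is("intel"))
        return "intel";
    if (__builtin_cpu_is("amd"))
        return "amd";
#endif
    return "other";
}

/* Hypothetical helper: pack a few feature checks into a bit mask. */
static unsigned feature_mask(void)
{
    unsigned mask = 0;
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("sse2"))
        mask |= 1u << 0;
    if (__builtin_cpu_supports("sse4.2"))
        mask |= 1u << 1;
    if (__builtin_cpu_supports("avx2"))
        mask |= 1u << 2;
#endif
    return mask;
}
```

Calling cpu_vendor() and feature_mask() from main and printing the results is enough to see what the compiler's run-time checks report on a given machine. On non-x86 targets the sketch simply returns "other" and 0.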

Determine CPUID as listed in the Intel Intrinsics Guide

You can get that information using the CPUID instruction, where:

The extended family, bit positions 20 through 27 are used in conjunction with the family code, specified in bit positions 8 through 11, to indicate whether the processor belongs to the Intel386, Intel486, Pentium, Pentium Pro or Pentium 4 family of processors. P6 family processors include all processors based on the Pentium Pro processor architecture and have an extended family equal to 00h and a family code equal to 06h. Pentium 4 family processors include all processors based on the Intel NetBurst® microarchitecture and have an extended family equal to 00h and a family code equal to 0Fh.

The extended model specified in bit positions 16 through 19, in conjunction with the model number specified in bits 4 through 7 are used to identify the model of the processor within the processor’s family.

See page 22 of Intel Processor Identification and the CPUID Instruction for further details.
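The bit layout quoted above can be checked without running CPUID at all. The following is a sketch that decodes a raw EAX value from CPUID leaf 1; the struct and function names are hypothetical, and the rule for when the extended fields apply (extended family only when the family code is 0Fh, extended model only when it is 06h or 0Fh) is taken from Intel's documentation rather than from the quote:

```c
#include <stdint.h>

/* Hypothetical container for the decoded CPUID leaf 1 EAX signature. */
struct cpu_signature {
    unsigned family;
    unsigned model;
    unsigned stepping;
};

static struct cpu_signature decode_signature(uint32_t eax)
{
    struct cpu_signature sig;
    unsigned base_model = (eax >> 4) & 0xF;    /* bits 4-7   */
    unsigned base_family = (eax >> 8) & 0xF;   /* bits 8-11  */
    unsigned ext_model = (eax >> 16) & 0xF;    /* bits 16-19 */
    unsigned ext_family = (eax >> 20) & 0xFF;  /* bits 20-27 */

    sig.stepping = eax & 0xF;                  /* bits 0-3   */
    sig.family = base_family;
    sig.model = base_model;

    /* The extended family is added only for family code 0Fh. */
    if (base_family == 0xF)
        sig.family += ext_family;
    /* The extended model forms the high nibble for family 06h and 0Fh. */
    if (base_family == 0x6 || base_family == 0xF)
        sig.model |= ext_model << 4;
    return sig;
}
```

For example, an EAX value of 0x00020652 decodes to family 06h and model 25h, and 0x00800F11 decodes to family 17h and model 01h.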

The actual CPUID is then written as "family_model".
The following code should do the job:

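#include <stdio.h>

int main(void)
{
    unsigned int eax = 1, ebx = 0, ecx = 0, edx = 0;

    /* CPUID leaf 1: EAX returns the family/model/stepping signature. */
    __asm__ ("cpuid" : "=a" (eax), "=b" (ebx), "=c" (ecx), "=d" (edx) : "a" (eax));

    unsigned int model = (eax >> 4) & 0xF;                  /* bits 4-7   */
    unsigned int family_code = (eax >> 8) & 0xF;            /* bits 8-11  */
    unsigned int extended_model = (eax >> 16) & 0xF;        /* bits 16-19 */
    unsigned int extended_family_code = (eax >> 20) & 0xFF; /* bits 20-27 */

    /* The extended fields apply only to family codes 06h/0Fh (model) and 0Fh (family). */
    if (family_code == 0x6 || family_code == 0xF)
        model |= extended_model << 4;
    if (family_code == 0xF)
        family_code += extended_family_code;

    printf("%x %x %x %x\n", eax, ebx, ecx, edx);
    printf("CPUID: %02x_%02x\n", family_code, model);
    return 0;
}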

For my computer I get:

CPUID: 06_25

Hope it helps.

How do applications determine if instruction set is available and use it in case it is?

For the detection part

See Are the xgetbv and CPUID checks sufficient to guarantee AVX2 support? which shows how to detect CPU and OS support for new extensions: cpuid and xgetbv, respectively.

ISA extensions that add new/wider registers that need to be saved/restored on context switch also need to be supported and enabled by the OS, not just the CPU. New instructions like AVX-512 will still fault on a CPU that supports them if the OS hasn't set a control-register bit. (Effectively promising that it knows about them and will save/restore them.) Intel designed things so the failure mode is faulting, not silent corruption of registers on CPU migration, or context switch between two programs using the extension.

Extensions that added new or wider registers are AVX, AVX-512F, and AMX. OSes need to know about them. (AMX is very new, and adds a large amount of state: 8 tile registers T0-T7 of 1KiB each. Apparently OSes need to know about AMX for power-management to work properly.)

OSes don't need to know about AVX2/FMA3 (still YMM0-15), or any of the various AVX-512 extensions which still use k0-k7 and ZMM0-31.

There's no OS-independent way to detect OS support of SSE, but fortunately it's old enough that these days you don't have to. It and SSE2 are baseline for x86-64. Everything up to SSE4.2 uses the same register state (XMM0-15) so OS support for SSE1 is sufficient for user-space to use SSE4.2. SSE1 was new in 1999, with Pentium 3.

Different compilers have different ways of doing CPUID and xgetbv detection. See does gcc's __builtin_cpu_supports check for OS support? - unfortunately no, only CPUID, at least when that was asked. I'd consider that a GCC bug, but IDK if it ever got reported or fixed.
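As a rough sketch of the combined check for plain AVX with GCC or clang: CPUID leaf 1 reports both the AVX feature bit and whether the OS has enabled XSAVE (OSXSAVE), and xgetbv then reads XCR0 to see which register state the OS actually saves. The function names are made up for illustration, the xgetbv instruction is issued through inline asm, and non-x86 builds simply report 0:

```c
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Read XCR0, or return 0 when the OS has not enabled XSAVE. */
static uint64_t read_xcr0(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    uint32_t lo, hi;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    if (!(ecx & (1u << 27)))            /* CPUID.1:ECX.OSXSAVE */
        return 0;
    /* xgetbv with ECX = 0 returns XCR0 in EDX:EAX. */
    __asm__ volatile ("xgetbv" : "=a" (lo), "=d" (hi) : "c" (0));
    return ((uint64_t)hi << 32) | lo;
#else
    return 0;
#endif
}

/* AVX is usable when the CPU reports it and the OS saves XMM and YMM state. */
static int avx_usable(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    if (!(ecx & (1u << 28)))            /* CPUID.1:ECX.AVX */
        return 0;
    return (read_xcr0() & 0x6) == 0x6;  /* XCR0 bit 1 (SSE) and bit 2 (AVX) */
#else
    return 0;
#endif
}
```

AVX-512 follows the same pattern with a wider XCR0 mask (bits 5 to 7 for the opmask and ZMM state, in addition to bits 1 and 2) and the AVX-512F feature bit from CPUID leaf 7.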



For the optional-use part

This is typically done by setting function pointers to selected versions of some important functions. Inlining through function pointers isn't generally possible, so make sure you choose the boundaries appropriately, like an AVX-512 version of a function that includes a loop, not just a single vector.
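A minimal sketch of that function-pointer approach with GCC or clang follows. All names are hypothetical, the "AVX2 version" is just the same loop compiled with the target attribute so the compiler is allowed to use AVX2 in it, and the selection relies on __builtin_cpu_supports alone (so it inherits whatever OS-support checking that built-in does or does not do):

```c
#include <stddef.h>

/* Baseline version: the whole loop lives inside the dispatched function. */
static long sum_baseline(const int *v, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

#if defined(__x86_64__) || defined(__i386__)
/* Same loop, but the compiler may emit AVX2 instructions for it. */
static __attribute__((target("avx2"))) long sum_avx2(const int *v, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}
#endif

typedef long (*sum_fn)(const int *, size_t);

/* Pick an implementation based on what the CPU reports. */
static sum_fn select_sum(void)
{
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("avx2"))
        return sum_avx2;
#endif
    return sum_baseline;
}

/* Entry point: resolves the pointer on first use, then calls through it. */
static long sum_array(const int *v, size_t n)
{
    static sum_fn impl;
    if (!impl)
        impl = select_sum();
    return impl(v, n);
}
```

The lazy initialization here is not synchronized; a real program would probably resolve the pointer once at startup. The point of the sketch is the boundary: the indirect call happens once per array, not once per element.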

GCC's function multi-versioning can automate that for you, transparently compiling multiple versions and hooking some function-pointer setup.
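A small sketch of what that can look like in C, using the target_clones attribute. This assumes a GCC (or recent clang) toolchain on x86 Linux with ifunc support in the C library, so the attribute is enabled only there; the MULTIVERSIONED macro and the dot_product function are made up for illustration:

```c
#include <stddef.h>

/* Assumption: target_clones needs compiler and C library (ifunc) support,
   so it is switched on only for x86 Linux builds in this sketch. */
#if defined(__GNUC__) && defined(__linux__) && \
    (defined(__x86_64__) || defined(__i386__))
#define MULTIVERSIONED __attribute__((target_clones("avx2", "default")))
#else
#define MULTIVERSIONED
#endif

/* The compiler emits one clone per listed target plus a resolver that
   picks between them when the program is loaded. */
MULTIVERSIONED
long dot_product(const int *a, const int *b, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += (long)a[i] * b[i];
    return total;
}
```

Callers just call dot_product as usual; there is no visible function pointer in the source.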

There have been some previous Q&As about this with different compilers, search for "CPU dispatch avx" or something like that, along with other search terms.

See The Effect of Architecture When Using SSE / AVX Intrinisics to understand the difference between the two models for intrinsics. With GCC/clang you have to enable -march=skylake or whatever, or manually -mavx2, before you can use an intrinsic. With MSVC and classic ICC you could use any intrinsic anywhere, even to emit instructions the compiler wouldn't be able to auto-vectorize with. (Those compilers can't or don't optimize intrinsics much at all, perhaps because that could lead to them getting hoisted out of if(cpu) statements.)

Is there something like x86 cpuid() available for PowerPC?

Note that PowerPC does not have dozens of extensions / features like x86 does. Finding out what is available requires reading specific privileged registers, which may depend on the core.

I checked on Linux and you can access the PVR (Processor Version Register): there is a trap in the kernel to handle that.

Reading /proc/cpuinfo can tell you whether Altivec is supported, the memory and L2 cache size, and so on, but that is not really convenient.

A better solution is described here:
http://www.freehackers.org/thomas/2011/05/13/how-to-detect-altivec-availability-on-linuxppc-at-runtime/

That uses the content of /proc/self/auxv, which provides "the ELF interpreter information passed to the process at exec time".

The example is about Altivec but you can get other features (listed in the "asm/cputable.h" include file): 32 or 64 bit CPU, Altivec, SPE, FPU, MMU, 4xx MAC, ...
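A sketch of the /proc/self/auxv approach, assuming a Linux system where each entry is a pair of native-word-sized values (type, value). The function and macro names are hypothetical; the entry type numbers match AT_NULL and AT_HWCAP from <elf.h>, and the Altivec bit value is assumed to mirror PPC_FEATURE_HAS_ALTIVEC from asm/cputable.h:

```c
#include <stdio.h>

/* auxv entry types, matching AT_NULL and AT_HWCAP in <elf.h>. */
#define AUXV_AT_NULL  0UL
#define AUXV_AT_HWCAP 16UL

/* Assumption: mirrors PPC_FEATURE_HAS_ALTIVEC; only meaningful on PowerPC. */
#define HWCAP_PPC_ALTIVEC 0x10000000UL

/* Look up one entry in /proc/self/auxv. Returns 0 on success, -1 otherwise. */
static int read_auxv_entry(unsigned long type, unsigned long *value)
{
    FILE *f = fopen("/proc/self/auxv", "rb");
    unsigned long pair[2];   /* { type, value } in the native word size */
    int rc = -1;

    if (!f)
        return -1;
    while (fread(pair, sizeof pair, 1, f) == 1) {
        if (pair[0] == type) {
            *value = pair[1];
            rc = 0;
            break;
        }
        if (pair[0] == AUXV_AT_NULL)   /* end-of-vector marker */
            break;
    }
    fclose(f);
    return rc;
}

/* Returns 1 when running on PowerPC with Altivec reported, else 0. */
static int has_altivec(void)
{
#if defined(__powerpc__) || defined(__powerpc64__)
    unsigned long hwcap = 0;
    if (read_auxv_entry(AUXV_AT_HWCAP, &hwcap) == 0)
        return (hwcap & HWCAP_PPC_ALTIVEC) != 0;
#endif
    return 0;
}
```

On systems with a reasonably recent glibc, getauxval(AT_HWCAP) from <sys/auxv.h> returns the same value without parsing the file by hand.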

Lastly, for information on caches (size, line size, associativity, ...), look at the files in:
/sys/devices/system/cpu/cpu0/cache

What's the proper way to use different versions of SSE intrinsics in GCC?

I think that Mystical's tip is fine, but if you really want to do it in one file, you can use the proper pragmas, for instance:

#pragma GCC target("sse4.1")

GCC 4.4 is needed, AFAIR.
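A slightly fuller sketch of the single-file approach, wrapping the pragma in push_options/pop_options so that only one function is compiled with SSE4.1 enabled. This assumes a GCC new enough that <smmintrin.h> can be included without -msse4.1 on the command line (4.9 or later, as far as I know); the function names and the USE_GCC_TARGET_PRAGMA macro are made up, and other compilers fall back to the scalar path:

```c
#if defined(__GNUC__) && !defined(__clang__) && \
    (defined(__x86_64__) || defined(__i386__))
#define USE_GCC_TARGET_PRAGMA 1
#include <smmintrin.h>   /* SSE4.1 intrinsics */

#pragma GCC push_options
#pragma GCC target("sse4.1")
/* Compiled with SSE4.1 enabled regardless of the command-line flags. */
static void max4_sse41(const int *a, const int *b, int *out)
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_max_epi32(va, vb));
}
#pragma GCC pop_options
#endif

/* Fallback built with the default target options. */
static void max4_scalar(const int *a, const int *b, int *out)
{
    for (int i = 0; i < 4; i++)
        out[i] = a[i] > b[i] ? a[i] : b[i];
}

/* Element-wise maximum of two 4-int arrays, picking a version at run time. */
static void max4(const int *a, const int *b, int *out)
{
#ifdef USE_GCC_TARGET_PRAGMA
    if (__builtin_cpu_supports("sse4.1")) {
        max4_sse41(a, b, out);
        return;
    }
#endif
    max4_scalar(a, b, out);
}
```

The run-time check matters: the pragma only tells the compiler it may emit SSE4.1 instructions in that function, it does not guarantee the CPU can execute them.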


