How to Check If a CPU Supports the SSE3 Instruction Set

How to check if a CPU supports the SSE3 instruction set?

I've created a GitHub repo that will detect CPU and OS support for all the major x86 ISA extensions: https://github.com/Mysticial/FeatureDetector

Here's a shorter version:


First you need to access the CPUID instruction:

#ifdef _WIN32

// Windows (MSVC): __cpuidex is declared in <intrin.h>
#include <intrin.h>
#define cpuid(info, x) __cpuidex(info, x, 0)

#else

// GCC Intrinsics
#include <cpuid.h>
void cpuid(int info[4], int InfoType){
    __cpuid_count(InfoType, 0, info[0], info[1], info[2], info[3]);
}

#endif

Then you can run the following code:

//  Misc.
bool HW_MMX;
bool HW_x64;
bool HW_ABM; // Advanced Bit Manipulation
bool HW_RDRAND;
bool HW_BMI1;
bool HW_BMI2;
bool HW_ADX;
bool HW_PREFETCHWT1;

// SIMD: 128-bit
bool HW_SSE;
bool HW_SSE2;
bool HW_SSE3;
bool HW_SSSE3;
bool HW_SSE41;
bool HW_SSE42;
bool HW_SSE4a;
bool HW_AES;
bool HW_SHA;

// SIMD: 256-bit
bool HW_AVX;
bool HW_XOP;
bool HW_FMA3;
bool HW_FMA4;
bool HW_AVX2;

// SIMD: 512-bit
bool HW_AVX512F; // AVX512 Foundation
bool HW_AVX512CD; // AVX512 Conflict Detection
bool HW_AVX512PF; // AVX512 Prefetch
bool HW_AVX512ER; // AVX512 Exponential + Reciprocal
bool HW_AVX512VL; // AVX512 Vector Length Extensions
bool HW_AVX512BW; // AVX512 Byte + Word
bool HW_AVX512DQ; // AVX512 Doubleword + Quadword
bool HW_AVX512IFMA; // AVX512 Integer 52-bit Fused Multiply-Add
bool HW_AVX512VBMI; // AVX512 Vector Byte Manipulation Instructions

int info[4];
cpuid(info, 0);
int nIds = info[0];

cpuid(info, 0x80000000);
unsigned nExIds = info[0];

// Detect Features
if (nIds >= 0x00000001){
cpuid(info,0x00000001);
HW_MMX = (info[3] & ((int)1 << 23)) != 0;
HW_SSE = (info[3] & ((int)1 << 25)) != 0;
HW_SSE2 = (info[3] & ((int)1 << 26)) != 0;
HW_SSE3 = (info[2] & ((int)1 << 0)) != 0;

HW_SSSE3 = (info[2] & ((int)1 << 9)) != 0;
HW_SSE41 = (info[2] & ((int)1 << 19)) != 0;
HW_SSE42 = (info[2] & ((int)1 << 20)) != 0;
HW_AES = (info[2] & ((int)1 << 25)) != 0;

HW_AVX = (info[2] & ((int)1 << 28)) != 0;
HW_FMA3 = (info[2] & ((int)1 << 12)) != 0;

HW_RDRAND = (info[2] & ((int)1 << 30)) != 0;
}
if (nIds >= 0x00000007){
cpuid(info,0x00000007);
HW_AVX2 = (info[1] & ((int)1 << 5)) != 0;

HW_BMI1 = (info[1] & ((int)1 << 3)) != 0;
HW_BMI2 = (info[1] & ((int)1 << 8)) != 0;
HW_ADX = (info[1] & ((int)1 << 19)) != 0;
HW_SHA = (info[1] & ((int)1 << 29)) != 0;
HW_PREFETCHWT1 = (info[2] & ((int)1 << 0)) != 0;

HW_AVX512F = (info[1] & ((int)1 << 16)) != 0;
HW_AVX512CD = (info[1] & ((int)1 << 28)) != 0;
HW_AVX512PF = (info[1] & ((int)1 << 26)) != 0;
HW_AVX512ER = (info[1] & ((int)1 << 27)) != 0;
HW_AVX512VL = (info[1] & (1u << 31)) != 0; // unsigned: shifting a signed 1 into bit 31 overflows int
HW_AVX512BW = (info[1] & ((int)1 << 30)) != 0;
HW_AVX512DQ = (info[1] & ((int)1 << 17)) != 0;
HW_AVX512IFMA = (info[1] & ((int)1 << 21)) != 0;
HW_AVX512VBMI = (info[2] & ((int)1 << 1)) != 0;
}
if (nExIds >= 0x80000001){
cpuid(info,0x80000001);
HW_x64 = (info[3] & ((int)1 << 29)) != 0;
HW_ABM = (info[2] & ((int)1 << 5)) != 0;
HW_SSE4a = (info[2] & ((int)1 << 6)) != 0;
HW_FMA4 = (info[2] & ((int)1 << 16)) != 0;
HW_XOP = (info[2] & ((int)1 << 11)) != 0;
}
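For the original question only one of these bits matters. The following is a minimal sketch, not the author's code: it keeps the bit test (CPUID leaf 1, ECX bit 0, as in the listing above) in a pure helper and wraps the actual CPUID query for GCC/Clang only. The names ecx_has_sse3 and cpu_has_sse3 are made up for illustration, and the use of __get_cpuid from <cpuid.h> is an assumption about the toolchain.

```cpp
#include <cstdint>

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define SKETCH_HAVE_CPUID 1
#endif

// CPUID leaf 1 reports SSE3 in ECX bit 0.
constexpr std::uint32_t kSse3Bit = 1u << 0;

// Pure helper: decode the SSE3 flag from a raw leaf-1 ECX value.
constexpr bool ecx_has_sse3(std::uint32_t ecx) {
    return (ecx & kSse3Bit) != 0;
}

#ifdef SKETCH_HAVE_CPUID
// Query the running CPU (GCC/Clang on x86 only).
// __get_cpuid returns 0 when the requested leaf is not available.
inline bool cpu_has_sse3() {
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        return false;
    }
    return ecx_has_sse3(ecx);
}
#endif
```

Keeping the decoding separate from the query makes the bit logic testable on any machine, including ones that are not x86.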

Note that this only detects whether the CPU supports the instructions. To actually run them, you also need to have operating system support.

Specifically, operating system support is required for:

  • x64 instructions. (You need a 64-bit OS.)
  • Instructions that use the (AVX) 256-bit ymm registers. See Andy Lutomirski's answer for how to detect this.
  • Instructions that use the (AVX512) 512-bit zmm and mask registers. Detecting OS support for AVX512 is the same as with AVX, but using the flag 0xe6 instead of 0x6.
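As a rough sketch of the OS check described above (under stated assumptions, not a definitive implementation): the OS signals that it saves the wider registers through the OSXSAVE flag (CPUID leaf 1, ECX bit 27) and through the XCR0 register, which is read with the XGETBV instruction. The helper below only does the mask comparison with the 0x6 and 0xe6 values mentioned in the list; reading XCR0 itself is left to the caller (for example through an intrinsic such as _xgetbv(0), where the compiler provides it). All names here are hypothetical.

```cpp
#include <cstdint>

constexpr std::uint32_t kOsxsaveBit = 1u << 27;  // CPUID.1:ECX bit 27
constexpr std::uint64_t kXcr0Avx    = 0x6;       // XMM + YMM state
constexpr std::uint64_t kXcr0Avx512 = 0xe6;      // XMM + YMM + opmask + ZMM state

// True if the OS has enabled every state component in `mask`.
// leaf1_ecx: ECX from CPUID leaf 1; xcr0: value returned by XGETBV with ECX=0.
constexpr bool os_enables(std::uint32_t leaf1_ecx, std::uint64_t xcr0,
                          std::uint64_t mask) {
    return (leaf1_ecx & kOsxsaveBit) != 0 && (xcr0 & mask) == mask;
}
```

A feature should only be treated as usable when both the CPU flag from the listing above and the matching os_enables check are true.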

Checking if SSE is supported at runtime

GCC has a way of doing this that starts by calling __builtin_cpu_init then calling __builtin_cpu_is and __builtin_cpu_supports to check features. https://gcc.gnu.org/onlinedocs/gcc-4.8.2/gcc/X86-Built-in-Functions.html
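A minimal sketch of that approach, assuming GCC or Clang on an x86 target (the SseSupport struct and query_sse_support function are illustrative names, not part of GCC). On other compilers or targets the sketch simply reports nothing as supported.

```cpp
// Results of the runtime queries; all false when the builtins are unavailable.
struct SseSupport {
    bool sse2;
    bool sse3;
    bool ssse3;
};

inline SseSupport query_sse_support() {
    SseSupport s = {false, false, false};
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();  // must run before the first __builtin_cpu_supports call
    // The argument has to be a string literal; a nonzero result means "supported".
    s.sse2  = __builtin_cpu_supports("sse2") != 0;
    s.sse3  = __builtin_cpu_supports("sse3") != 0;
    s.ssse3 = __builtin_cpu_supports("ssse3") != 0;
#endif
    return s;
}
```

Because __builtin_cpu_supports returns a nonzero bitmask rather than 1, the sketch compares against zero instead of converting the value directly.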

On x86, when using the C++ frontend, GCC supports "function multiversioning", which allows you to write multiple versions of the function, specify the target it should be used on, and let GCC take care of making sure it is called. https://gcc.gnu.org/onlinedocs/gcc-4.9.0/gcc/Function-Multiversioning.html

Does a processor that supports SSE4 support SSSE3 instructions?

This answer is for Intel processors only.

First, all Intel Atom processors from the earliest ones to the most recent ones support SSSE3. Section 1.2.14 of the Intel manual states:

The initial Intel Atom Processor family and subsequent generations including Intel Atom
processor D2000, N2000, E2000, Z2000, C1000 series provide the
following features:

  • ...
  • Support for instruction set extensions up to and including Supplemental Streaming SIMD Extensions 3 (SSSE3).
  • ...

And also Table 5-1 of the manual states:

SSSE3 Extensions: Intel Xeon processor 3xxx, 5100, 5200, 5300, 5400,
5500, 5600, 7300, 7400, 7500 series, Intel Core 2 Extreme processors
QX6000 series, Intel Core 2 Duo, Intel Core 2 Quad processors, Intel
Pentium Dual-Core processors, Intel Atom processors.

This is also consistent with Wikipedia.

I'm not sure whether the manual states explicitly that if SSE4 is supported then SSSE3 is supported, but we can derive that.

Section 12.7.3 discusses how to check for SSSE3 support:

Before an application attempts to use the SSSE3 extensions, the
application should follow the steps illustrated in Section 11.6.2,
“Checking for SSE/SSE2 Support.” Next, use the additional step
provided below:

  • Check that the processor supports SSSE3 (if CPUID.01H:ECX.SSSE3[bit 9] = 1).

also Section 12.12.2 discusses how to check for SSE4.1 support:

Check that the processor supports SSE4.1 (if CPUID.01H:ECX.SSE4_1[bit
19] = 1), SSE3 (if CPUID.01H:ECX.SSE3[bit 0] = 1), and SSSE3 (if
CPUID.01H:ECX.SSSE3[bit 9] = 1).

and finally Section 12.12.3 discusses how to check for SSE4.2 support:

Check that the processor supports SSE4.2 (if CPUID.01H:ECX.SSE4_2[bit
20] = 1), SSE4.1 (if CPUID.01H:ECX.SSE4_1[bit 19] = 1), and SSSE3 (if
CPUID.01H:ECX.SSSE3[bit 9] = 1).

As you can see, both SSE4.1 and SSE4.2 require support for SSSE3. We can also conclude that SSSE3 requires support for SSE2.
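The checks quoted from the manual can be written as mask tests on the ECX value of CPUID leaf 1. This is only a sketch of the quoted steps, using the bit positions given there; the function names are invented for illustration, and the SSE/SSE2 steps from Section 11.6.2 are not repeated.

```cpp
#include <cstdint>

// Bit positions in CPUID.01H:ECX, as quoted from the manual.
constexpr std::uint32_t kEcxSse3  = 1u << 0;
constexpr std::uint32_t kEcxSsse3 = 1u << 9;
constexpr std::uint32_t kEcxSse41 = 1u << 19;
constexpr std::uint32_t kEcxSse42 = 1u << 20;

// True only if every bit in `mask` is set in `ecx`.
constexpr bool has_all(std::uint32_t ecx, std::uint32_t mask) {
    return (ecx & mask) == mask;
}

// Section 12.12.2: SSE4.1 plus SSE3 and SSSE3.
constexpr bool may_use_sse41(std::uint32_t ecx) {
    return has_all(ecx, kEcxSse41 | kEcxSse3 | kEcxSsse3);
}

// Section 12.12.3: SSE4.2 plus SSE4.1 and SSSE3.
constexpr bool may_use_sse42(std::uint32_t ecx) {
    return has_all(ecx, kEcxSse42 | kEcxSse41 | kEcxSsse3);
}
```

Testing the whole mask at once mirrors the manual's wording: the newer extension is only considered usable when its prerequisites are reported as well.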

CAVEAT: Most likely this will continue to hold in the future, although it's hard to be 100% sure.

One interesting observation though is when comparing the list of processors that support SSSE3 against the lists of those that support SSE4.1 and SSE4.2 (Table 5-1 and Table 5-2), there is only one processor, Intel Core i7 965 processor, that is in the list of SSE4.2 but not SSSE3, yet the processor actually supports SSSE3. Not sure whether this is an error in the manual, or more horrifyingly, it's incomplete.

Another interesting observation is that, for processors other than Atom (see the quote from Section 1.2.14 above), it seems that support for SSSE3, SSE4.1, or SSE4.2 does not necessarily mean that SSE3 is supported. I didn't find anything in the manual that enables me to make that conclusion. At the same time, I don't know of any processor that supports SSSE3, SSE4.1, or SSE4.2, but not SSE3.

Section 12.1.1 specifies which registers are available in SSSE3:

In compatibility mode, SSE3, SSSE3, and SSE4 function like they do in
protected mode. In 64-bit mode, eight additional XMM registers are
accessible. Registers XMM8-XMM15 are accessed by using REX prefixes.

Section 12.7.1 discusses OS support for SSSE3:

Ensure that your operating system supports SSE/SSE2/SSE3/SSSE3
extensions. (Operating system support for the SSE extensions implies
sufficient support for SSE2, SSE3, and SSSE3.)

So any OS that supports SSE automatically supports SSSE3.

Most recent processor without support of SSSE3 instructions?

The most recent CPUs without SSSE3 are based on the AMD K10 microarchitecture:

  • AMD Phenom II, the last-generation K10 socketed desktop CPUs before Bulldozer-family. They were produced from 2008 to 2012.
  • AMD Llano APUs, introduced June 2011. (Bulldozer-based APUs were introduced Oct 2012, IDK when the last Llano APUs were made / sold). Also based on K10 cores, but reporting CPUID "family" = 12h.

K10 CPUs support SSE3 (FP instructions like movddup and haddps), and AMD-only SSE4a. Some early K8 cores only have SSE2, but later K8 also had SSE3.

Notice that AMD CPUs listed in https://en.wikipedia.org/wiki/SSSE3#CPUs_with_SSSE3 only start at Bulldozer, but do include AMD's low-power Bobcat / Jaguar CPUs.

If you google AMD Phenom II ssse3, you'll find some pages about some games removing an SSSE3 requirement so they can work on Phenom II.


On Intel you have to go back as far as Pentium M / Core, because SSSE3 was introduced with Core 2. (First-gen core2 (Conroe/Merom) only has 64-bit wide shuffle execution units, so pshufb is relatively slow. But so is SSE2 pshufd. See Fastest way to do horizontal float vector sum on x86.)

I think even first-gen Atom has SSSE3. https://en.wikipedia.org/wiki/Intel_Atom.

There are CPUs like AMD Geode that don't have SSE at all, but I think the point of the question is CPUs that do have SSE2/3 but not SSSE3.


There are no new mainstream CPUs being made that don't have SSE4.2, but some Phenom II CPUs are probably still in use even in 2018. The older they are, the more it's expected that new software might not work on them.

There are unfortunately still brand-new mainstream CPUs being made without AVX and BMI: Intel's Pentium and Celeron models, even for Skylake / Kaby Lake. Presumably when a die has defects in the upper 128 bits of its vector ALUs, e.g. the large FMA units, they fuse it off and disable decoding of VEX prefixes, and label it as a Pentium or Celeron (see footnote 1). (This is presumably why Pentium/Celeron models don't support BMI1/BMI2 either; other than pext/pdep those take trivial die area.)

So we're not getting any closer to BMI1/BMI2 being baseline at some point in the future, which is really unfortunate because it's required for single-uop variable-count shifts on Intel CPUs. (shl cl,reg is 3 uops because of the cl=0 no-flag-update case being possible; SHLX / SHRX are 1 uop). BMI1/2 is most useful when used throughout your whole code, not just in a couple functions.


Footnote 1: Certainly some fully-working chips get this treatment, too, especially once yields improve for a new process, but for consistency / market-segmentation they're still crippled.

But I think rep movs/rep stos ERMSB still work with 256-bit loads/stores, so the FP register file, load/store units, and bypass forwarding network would all still need to support full width. (And ERMSB becomes much more attractive vs. vector loops because it can use twice the width.)

I wonder if there's a way for the CPU to be rewired with fuses so it can use any 2 of the 4 128-bit lanes of FMA units that are working. We know Skylake-AVX512 can mix and match FMA units with ports 0, 1, and 5, only powering up the p5 FMA (if available) for 512-bit vectors, and combining the 256-bit FMA units on p0 and p1 as one 512-bit FMA unit. Statically doing something like that with fuses could let Intel use chips that had a defect affecting both lanes of what would have been one FMA unit.

Anyway, this is pure guesswork. It's likely, but I don't know if we have any reliable source that Intel actually ever did this as a way to sell chips with FMA defects. We do know that chips with defects in a whole physical core get sold as lower core-count SKUs, like a dual-core chip from a quad-core die. And quad-core i5 CPUs with only 6MB of L3 cache instead of 8MB have one of their 4 slices of L3 cache disabled, again probably for salvaging defects.

How to check if compiled code uses SSE and AVX instructions?

Under Linux, you could also disassemble your binary:

objdump -d YOURFILE > YOURFILE.asm

Then find all SSE instructions:

awk '/[ \t](addps|addss|andnps|andps|cmpps|cmpss|comiss|cvtpi2ps|cvtps2pi|cvtsi2ss|cvtss2si|cvttps2pi|cvttss2si|divps|divss|ldmxcsr|maxps|maxss|minps|minss|movaps|movhlps|movhps|movlhps|movlps|movmskps|movntps|movss|movups|mulps|mulss|orps|rcpps|rcpss|rsqrtps|rsqrtss|shufps|sqrtps|sqrtss|stmxcsr|subps|subss|ucomiss|unpckhps|unpcklps|xorps|pavgb|pavgw|pextrw|pinsrw|pmaxsw|pmaxub|pminsw|pminub|pmovmskb|psadbw|pshufw)[ \t]/' YOURFILE.asm

Find only packed SSE instructions (suggested by @Peter Cordes in comments):

awk '/[ \t](addps|andnps|andps|cmpps|cvtpi2ps|cvtps2pi|cvttps2pi|divps|maxps|minps|movaps|movhlps|movhps|movlhps|movlps|movmskps|movntps|movntq|movups|mulps|orps|pavgb|pavgw|pextrw|pinsrw|pmaxsw|pmaxub|pminsw|pminub|pmovmskb|pmulhuw|psadbw|pshufw|rcpps|rsqrtps|shufps|sqrtps|subps|unpckhps|unpcklps|xorps)[ \t]/' YOURFILE.asm

Find all SSE2 instructions (except MOVSD and CMPSD, whose mnemonics have also been used by string instructions since the 80386 and would therefore produce false positives):

awk '/[ \t](addpd|addsd|andnpd|andpd|cmppd|comisd|cvtdq2pd|cvtdq2ps|cvtpd2dq|cvtpd2pi|cvtpd2ps|cvtpi2pd|cvtps2dq|cvtps2pd|cvtsd2si|cvtsd2ss|cvtsi2sd|cvtss2sd|cvttpd2dq|cvttpd2pi|cvttps2dq|cvttsd2si|divpd|divsd|maxpd|maxsd|minpd|minsd|movapd|movhpd|movlpd|movmskpd|movupd|mulpd|mulsd|orpd|shufpd|sqrtpd|sqrtsd|subpd|subsd|ucomisd|unpckhpd|unpcklpd|xorpd|movdq2q|movdqa|movdqu|movq2dq|paddq|pmuludq|pshufhw|pshuflw|pshufd|pslldq|psrldq|punpckhqdq|punpcklqdq)[ \t]/' YOURFILE.asm

Find only packed SSE2 instructions:

awk '/[ \t](addpd|andnpd|andpd|cmppd|cvtdq2pd|cvtdq2ps|cvtpd2dq|cvtpd2pi|cvtpd2ps|cvtpi2pd|cvtps2dq|cvtps2pd|cvttpd2dq|cvttpd2pi|cvttps2dq|divpd|maxpd|minpd|movapd|movapd|movhpd|movhpd|movlpd|movlpd|movmskpd|movntdq|movntpd|movupd|movupd|mulpd|orpd|pshufd|pshufhw|pshuflw|pslldq|psrldq|punpckhqdq|shufpd|sqrtpd|subpd|unpckhpd|unpcklpd|xorpd)[ \t]/' YOURFILE.asm

Find all SSE3 instructions:

awk '/[ \t](addsubpd|addsubps|haddpd|haddps|hsubpd|hsubps|movddup|movshdup|movsldup|lddqu|fisttp)[ \t]/' YOURFILE.asm
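If awk is not available, the same idea can be sketched in a few lines of C++: split each disassembly line on whitespace and look the tokens up in a set. The mnemonic list is copied from the SSE3 pattern above; the function names are made up for illustration. Unlike the awk pattern, this sketch also matches a mnemonic that sits at the very end of a line.

```cpp
#include <set>
#include <sstream>
#include <string>

// SSE3 mnemonics, copied from the awk pattern above.
inline const std::set<std::string>& sse3_mnemonics() {
    static const std::set<std::string> names = {
        "addsubpd", "addsubps", "haddpd", "haddps", "hsubpd", "hsubps",
        "movddup", "movshdup", "movsldup", "lddqu", "fisttp"};
    return names;
}

// True if any whitespace-delimited token on an objdump line is an SSE3 mnemonic.
inline bool line_uses_sse3(const std::string& line) {
    std::istringstream tokens(line);
    std::string tok;
    while (tokens >> tok) {
        if (sse3_mnemonics().count(tok) != 0) {
            return true;
        }
    }
    return false;
}
```

To scan a whole file, read YOURFILE.asm line by line with std::getline and count the lines for which line_uses_sse3 returns true.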

Find all SSSE3 instructions:

awk '/[ \t](psignw|psignd|psignb|pshufb|pmulhrsw|pmaddubsw|phsubw|phsubsw|phsubd|phaddw|phaddsw|phaddd|palignr|pabsw|pabsd|pabsb)[ \t]/' YOURFILE.asm

Find all SSE4 instructions:

awk '/[ \t](mpsadbw|phminposuw|pmulld|pmuldq|dpps|dppd|blendps|blendpd|blendvps|blendvpd|pblendvb|pblendw|pminsb|pmaxsb|pminuw|pmaxuw|pminud|pmaxud|pminsd|pmaxsd|roundps|roundss|roundpd|roundsd|insertps|pinsrb|pinsrd|pinsrq|extractps|pextrb|pextrd|pextrw|pextrq|pmovsxbw|pmovzxbw|pmovsxbd|pmovzxbd|pmovsxbq|pmovzxbq|pmovsxwd|pmovzxwd|pmovsxwq|pmovzxwq|pmovsxdq|pmovzxdq|ptest|pcmpeqq|pcmpgtq|packusdw|pcmpestri|pcmpestrm|pcmpistri|pcmpistrm|crc32|popcnt|movntdqa|extrq|insertq|movntsd|movntss|lzcnt)[ \t]/' YOURFILE.asm

Find the most common AVX instructions (including scalar, AVX2, the AVX-512 family, and some FMA instructions like vfmadd132pd):

awk '/[ \t](vmovapd|vmulpd|vaddpd|vsubpd|vfmadd213pd|vfmadd231pd|vfmadd132pd|vmulsd|vaddsd|vmovsd|vsubsd|vbroadcastss|vbroadcastsd|vblendpd|vshufpd|vroundpd|vroundsd|vxorpd|vfnmadd231pd|vfnmadd213pd|vfnmadd132pd|vandpd|vmaxpd|vmovmskpd|vcmppd|vpaddd|vbroadcastf128|vinsertf128|vextractf128|vfmsub231pd|vfmsub132pd|vfmsub213pd|vmaskmovps|vmaskmovpd|vpermilps|vpermilpd|vperm2f128|vzeroall|vzeroupper|vpbroadcastb|vpbroadcastw|vpbroadcastd|vpbroadcastq|vbroadcasti128|vinserti128|vextracti128|vpminud|vpmuludq|vgatherdpd|vgatherqpd|vgatherdps|vgatherqps|vpgatherdd|vpgatherdq|vpgatherqd|vpgatherqq|vpmaskmovd|vpmaskmovq|vpermps|vpermd|vpermpd|vpermq|vperm2i128|vpblendd|vpsllvd|vpsllvq|vpsrlvd|vpsrlvq|vpsravd|vblendmpd|vblendmps|vpblendmd|vpblendmq|vpblendmb|vpblendmw|vpcmpd|vpcmpud|vpcmpq|vpcmpuq|vpcmpb|vpcmpub|vpcmpw|vpcmpuw|vptestmd|vptestmq|vptestnmd|vptestnmq|vptestmb|vptestmw|vptestnmb|vptestnmw|vcompresspd|vcompressps|vpcompressd|vpcompressq|vexpandpd|vexpandps|vpexpandd|vpexpandq|vpermb|vpermw|vpermt2b|vpermt2w|vpermi2pd|vpermi2ps|vpermi2d|vpermi2q|vpermi2b|vpermi2w|vpermt2ps|vpermt2pd|vpermt2d|vpermt2q|vshuff32x4|vshuff64x2|vshuffi32x4|vshuffi64x2|vpmultishiftqb|vpternlogd|vpternlogq|vpmovqd|vpmovsqd|vpmovusqd|vpmovqw|vpmovsqw|vpmovusqw|vpmovqb|vpmovsqb|vpmovusqb|vpmovdw|vpmovsdw|vpmovusdw|vpmovdb|vpmovsdb|vpmovusdb|vpmovwb|vpmovswb|vpmovuswb|vcvtps2udq|vcvtpd2udq|vcvttps2udq|vcvttpd2udq|vcvtss2usi|vcvtsd2usi|vcvttss2usi|vcvttsd2usi|vcvtps2qq|vcvtpd2qq|vcvtps2uqq|vcvtpd2uqq|vcvttps2qq|vcvttpd2qq|vcvttps2uqq|vcvttpd2uqq|vcvtudq2ps|vcvtudq2pd|vcvtusi2ps|vcvtusi2pd|vcvtusi2sd|vcvtusi2ss|vcvtuqq2ps|vcvtuqq2pd|vcvtqq2pd|vcvtqq2ps|vgetexppd|vgetexpps|vgetexpsd|vgetexpss|vgetmantpd|vgetmantps|vgetmantsd|vgetmantss|vfixupimmpd|vfixupimmps|vfixupimmsd|vfixupimmss|vrcp14pd|vrcp14ps|vrcp14sd|vrcp14ss|vrndscaleps|vrndscalepd|vrndscaless|vrndscalesd|vrsqrt14pd|vrsqrt14ps|vrsqrt14sd|vrsqrt14ss|vscalefps|vscalefpd|vscalefss|vscalefsd|valignd|valignq|vdbpsadbw|vpabsq|vpmaxsq|vpmaxuq|vpminsq|vpminuq|vprold|vprolvd|vprolq|vprolvq|vprord|vprorvd|vprorq|vprorvq|vpscatterdd|vpscatterdq|vpscatterqd|vpscatterqq|vscatterdps|vscatterdpd|vscatterqps|vscatterqpd|vpconflictd|vpconflictq|vplzcntd|vplzcntq|vpbroadcastmb2q|vpbroadcastmw2d|vexp2pd|vexp2ps|vrcp28pd|vrcp28ps|vrcp28sd|vrcp28ss|vrsqrt28pd|vrsqrt28ps|vrsqrt28sd|vrsqrt28ss|vgatherpf0dps|vgatherpf0qps|vgatherpf0dpd|vgatherpf0qpd|vgatherpf1dps|vgatherpf1qps|vgatherpf1dpd|vgatherpf1qpd|vscatterpf0dps|vscatterpf0qps|vscatterpf0dpd|vscatterpf0qpd|vscatterpf1dps|vscatterpf1qps|vscatterpf1dpd|vscatterpf1qpd|vfpclassps|vfpclasspd|vfpclassss|vfpclasssd|vrangeps|vrangepd|vrangess|vrangesd|vreduceps|vreducepd|vreducess|vreducesd|vpmovm2d|vpmovm2q|vpmovm2b|vpmovm2w|vpmovd2m|vpmovq2m|vpmovb2m|vpmovw2m|vpmullq|vpmadd52luq|vpmadd52huq|v4fmaddps|v4fmaddss|v4fnmaddps|v4fnmaddss|vp4dpwssd|vp4dpwssds|vpdpbusd|vpdpbusds|vpdpwssd|vpdpwssds|vpcompressb|vpcompressw|vpexpandb|vpexpandw|vpshld|vpshldv|vpshrd|vpshrdv|vpopcntd|vpopcntq|vpopcntb|vpopcntw|vpshufbitqmb|gf2p8affineinvqb|gf2p8affineqb|gf2p8mulb|vpclmulqdq|vaesdec|vaesdeclast|vaesenc|vaesenclast)[ \t]/' YOURFILE.asm

NOTE: tested with gawk and nawk.

Clarifications about SIMD in C

Of course we need to take care which ISA is supported, because if the program executes an instruction the CPU doesn't support, it will be killed with an illegal-instruction signal (SIGILL on Linux). Checking also lets us optimize for each architecture: on CPUs with AVX-512 we can use AVX-512 for better performance, and on an older CPU we can fall back to the appropriate version for that architecture.

What are the best practices when developing such programs?

There are no general best practices. It depends on each situation, because each compiler has different tools for this:

  • If your compiler doesn't support dynamic dispatching, then you need to write separate code for each ISA and call the corresponding version for the current platform.
  • Some compilers automatically dispatch to the version optimized for the running platform. For example, ICC can compile a hot loop to separate SSE/AVX/AVX-512 versions and jump to the correct version for maximum performance.
  • Some other compilers support compiling separate versions of a single function and dispatching automatically, but you need to specify which function you want to optimize. For example, in GCC, Clang and ICC you can use the attributes target and target_clones. See Building backward compatible binaries with newer CPU instructions support.
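The first option, manual dispatch, could look roughly like the sketch below. Everything in it is hypothetical: CpuFeatures stands for whatever your CPUID checks produce, and sum_wide is only a stand-in for a hand-written SSE/AVX version (here it is plain C++ with four accumulators so the example runs anywhere).

```cpp
#include <cstddef>

using SumFn = float (*)(const float*, std::size_t);

// Baseline version that runs on any CPU.
inline float sum_scalar(const float* v, std::size_t n) {
    float s = 0.0f;
    for (std::size_t i = 0; i < n; ++i) s += v[i];
    return s;
}

// Stand-in for a version written with SIMD intrinsics.
inline float sum_wide(const float* v, std::size_t n) {
    float acc[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        for (std::size_t k = 0; k < 4; ++k) acc[k] += v[i + k];
    }
    float s = acc[0] + acc[1] + acc[2] + acc[3];
    for (; i < n; ++i) s += v[i];  // leftover elements
    return s;
}

// Hypothetical feature record, filled in once at startup from CPUID checks.
struct CpuFeatures {
    bool sse3;
    bool avx2;
};

// Pick the implementation once; callers then go through the returned pointer.
inline SumFn select_sum(const CpuFeatures& f) {
    return (f.avx2 || f.sse3) ? sum_wide : sum_scalar;
}
```

Selecting the function pointer once at startup keeps the feature test out of the hot path, which is the main reason to dispatch this way.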

How can I determine which instructions are supported on which Intel processor families?

Actually, you have the answer there already: The best resource to find such information is in the CPU manufacturers' documentation.

If you look carefully, the Intel manual you (almost) link to has all the information you need on your example FISTTP: it is explicitly listed as an SSE3 instruction (see here: Vol 1, Section 5.7.1 of Intel 64 and IA-32 Architectures Software Developer’s Manual, June 2013). This implies that any CPU which supports the SSE3 instruction set should support FISTTP.

As far as modern instruction sets go (SSE, AVX, BMI, ..., you name it), the Intel manuals really do a good job of detailing which instruction sets (and associated CPUID feature flags) any instruction belongs to, pretty much back to the instructions that were around when CPUID was introduced (late 80486 CPUs). With this information, it becomes easy to figure out which CPU model supports a given instruction.

I am not sure how well the Intel manuals would work for figuring out when really ancient things were introduced (for CPUs up to the 486 I have an old hard-copy Microsoft MASM Reference manual from 1992 that details these things). But I would be surprised if this info wasn't google-able. In any case, these really old changes (like the introduction of the BT instructions on the 386) are nowadays probably only interesting from an academic standpoint.

Determine which instructions are supported by the processor

In kernel32.dll you have the function IsProcessorFeaturePresent, which you can call via P/Invoke.

Edit

Regarding the EM64T extended instruction set, this is only available on x64 platforms, so you can check which type of CPU is present through WMI:

// Requires a reference to System.Management (using System.Management;)
public static bool IsEM64TSupported()
{
    ManagementObject mo;
    mo = new ManagementObject("Win32_Processor.DeviceID='CPU0'");
    ushort i = (ushort) mo["Architecture"];

    // Win32_Processor.Architecture: 9 means x64
    return i == 9;
}

But since EM64T instructions are not available in 32-bit operating systems you'll need to check that too.

How to programmatically check if fused multiply-add (FMA) instructions are enabled on the CPU?

I used __cpuid to write my function by modifying the Microsoft code. Thank you very much to all for your help.

#include <intrin.h>
#include <array>
#include <bitset>

bool CheckFMA()
{
    std::array<int, 4> cpui;

    // Leaf 0: EAX holds the highest supported standard leaf
    __cpuid(cpui.data(), 0);
    int nIds = cpui[0];

    if (nIds < 1)
    {
        return false;
    }

    // Leaf 1: ECX bit 12 is the FMA3 flag (CPU support only, not OS support)
    __cpuidex(cpui.data(), 1, 0);
    std::bitset<32> ECX(static_cast<unsigned>(cpui[2]));

    return ECX[12];
}

Does GCC's __builtin_cpu_supports check for OS support?

No.

I disabled AVX on my Skylake system by adding noxsave to the Linux kernel boot options. When I do cat /proc/cpuinfo, AVX (and AVX2) no longer appear, and when I run code with AVX instructions it crashes. This tells me that AVX has been disabled by the OS.

However, when I compile and run the following code

#include <stdio.h>

int main(void) {
__builtin_cpu_init();
printf("%d\n", __builtin_cpu_supports ("sse"));
printf("%d\n", __builtin_cpu_supports ("avx"));
}

it returns 8 and 512. This means that __builtin_cpu_supports does not check to see if AVX was disabled by the OS.


