Why Isn't There int128_t?

Why isn't there int128_t?

I'll refer to the C standard; I think the C++ standard inherits the rules for <stdint.h> / <cstdint> from C.

I know that gcc implements 128-bit signed and unsigned integer types on some platforms, with the names __int128 and unsigned __int128 (__int128 is an implementation-defined keyword).
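
For example (on a 64-bit gcc or clang target):

__int128 a;            // signed 128-bit integer
unsigned __int128 b;   // unsigned 128-bit integer
// __int128_t and __uint128_t are equivalent built-in typedef spellings that
// gcc also accepts.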

Even for an implementation that provides a standard 128-bit type, the standard does not require int128_t or uint128_t to be defined. Quoting section 7.20.1.1 of the N1570 draft of the C standard:

These types are optional. However, if an implementation provides
integer types with widths of 8, 16, 32, or 64 bits, no padding bits,
and (for the signed types) that have a two’s complement
representation, it shall define the corresponding typedef names.

C permits implementations to define extended integer types whose names are implementation-defined keywords. gcc's __int128 and unsigned __int128 are very similar to extended integer types as defined by the standard, but gcc doesn't treat them that way; it treats them as a language extension instead.

In particular, if __int128 and unsigned __int128 were extended integer types, then gcc would be required to define intmax_t and uintmax_t as those types (or as some types at least 128 bits wide). It does not do so; instead, intmax_t and uintmax_t are only 64 bits.
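
A quick way to see this in practice (an illustrative sketch; the values in the comments are what one would expect on x86-64 gcc, not something the standard mandates):

#include <cstdint>
#include <cstdio>

int main()
{
    std::printf("sizeof(__int128) = %zu\n", sizeof(__int128));       // 16
    std::printf("sizeof(intmax_t) = %zu\n", sizeof(std::intmax_t));  // 8 on x86-64 gcc
}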

This is, in my opinion, unfortunate, but I don't believe it makes gcc non-conforming. No portable program can depend on the existence of __int128, or on any integer type wider than 64 bits. And changing intmax_t and uintmax_t would cause serious ABI compatibility problems.

Assigning 128 bit integer in C

Am I doing something wrong or is this a bug in gcc?

The problem is in the 47942806932686753431 part, not in __uint128_t p. According to the gcc docs there's no way to declare a 128-bit constant:

There is no support in GCC for expressing an integer constant of type __int128 for targets with long long integer less than 128 bits wide.

So, it seems that while you can have 128-bit variables, you cannot have 128-bit constants, unless your long long is 128 bits wide.

A workaround is to construct the 128-bit value from "narrower" integer constants using basic arithmetic operations, and hope for the compiler to perform constant folding.
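
For example, here is a sketch of that workaround for the constant above, splitting it into a high and a low 64-bit half (the split values are my own arithmetic, so verify them before relying on them):

// 47942806932686753431 == 2 * 2^64 + 11049318785267650199
__uint128_t p = ((__uint128_t)2 << 64) | 11049318785267650199ULL;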

atoi() for int128_t type

Here is a C++ implementation:

#include <cctype>
#include <stdexcept>
#include <string>

__int128_t atoint128_t(std::string const & in)
{
    __int128_t res = 0;
    size_t i = 0;
    bool sign = false;

    if (in[i] == '-')
    {
        ++i;
        sign = true;
    }

    if (in[i] == '+')
    {
        ++i;
    }

    for (; i < in.size(); ++i)
    {
        const char c = in[i];
        if (not std::isdigit(static_cast<unsigned char>(c)))
            throw std::runtime_error(std::string("Non-numeric character: ") + c);
        res *= 10;
        res += c - '0';
    }

    if (sign)
    {
        res *= -1;
    }

    return res;
}

int main()
{
    __int128_t a = atoint128_t("170141183460469231731687303715884105727");
}

If you want to test it, the original answer links to a stream operator for __int128_t.
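
That operator isn't reproduced here; as a rough substitute, a minimal inserter might look like the following (my own sketch, converting by repeated division by 10):

#include <algorithm>
#include <ostream>
#include <string>

std::ostream & operator<<(std::ostream & os, __int128_t x)
{
    if (x == 0)
        return os << '0';

    const bool neg = x < 0;
    // Work in unsigned so negating the most negative value doesn't overflow.
    __uint128_t u = neg ? -static_cast<__uint128_t>(x) : static_cast<__uint128_t>(x);

    std::string s;
    while (u != 0)
    {
        s += static_cast<char>('0' + static_cast<int>(u % 10));
        u /= 10;
    }
    if (neg)
        s += '-';
    std::reverse(s.begin(), s.end());
    return os << s;
}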

Performance

I ran a few performance tests. I generated 100,000 random numbers spread across the entire range of __int128_t, then converted each of them 2000 times. All of these (200,000,000) conversions were completed within ~12 seconds.
Using this code:

#include <chrono>
#include <iostream>
#include <random>
#include <string>
#include <vector>

// atoint128_t() as defined above

int main()
{
    std::mt19937 gen(0);
    std::uniform_int_distribution<> num(0, 9);
    std::uniform_int_distribution<> len(1, 38);
    std::uniform_int_distribution<> sign(0, 1);

    // Build 100,000 random decimal strings of 1 to 38 digits, roughly half negative.
    std::vector<std::string> str;

    for (int i = 0; i < 100000; ++i)
    {
        std::string s;
        int l = len(gen);
        if (sign(gen))
            s += '-';
        for (int u = 0; u < l; ++u)
            s += std::to_string(num(gen));
        str.emplace_back(s);
    }

    namespace sc = std::chrono;
    auto start = sc::duration_cast<sc::microseconds>(sc::high_resolution_clock::now().time_since_epoch()).count();
    __int128_t b = 0;
    for (int u = 0; u < 200; ++u)
    {
        for (std::size_t i = 0; i < str.size(); ++i)
        {
            __int128_t a = atoint128_t(str[i]);
            b += a;  // accumulate so the conversions aren't optimized away
        }
    }
    auto time = sc::duration_cast<sc::microseconds>(sc::high_resolution_clock::now().time_since_epoch()).count() - start;
    std::cout << time / 1000000. << 's' << std::endl;
}

Is a 128 bit int written or loaded in two instructions in C/C++?

The support is discussed in other answers. I'll discuss implementation issues.

Usually when reading from memory, the compiler will emit processor instructions to fetch the data from memory into a register. This may be atomic, depending on how the data bus is set up between the processor and the memory.

If your processor supports 128-bit transfers and the memory supports a 128-bit data bus, this could be a single fetch (or write).

If your processor supports 128-bit register transfers, but the data bus is smaller, the processor will perform enough fetches to transfer the data from memory. This may or may not be atomic, depending on your definition of atomic (it's one processor instruction, but it may require more than one fetch).

For processors that don't support 128-bit register transfers, the compiler will emit enough instructions to read the memory into registers. This applies to register-to-memory and memory-to-register transfers alike.

For memory-to-memory transfers (e.g. variable assignments), the compiler may choose block reading and writing, if your processor supports it. Some processors support SIMD; others have block transfer instructions. For example, ARM has the LDM (load multiple) and STM (store multiple) instructions for loading many registers from memory and storing many registers to memory. Another method of block reading and writing is to use a DMA device (if present). The DMA can transfer data while the processor executes other instructions, but the overhead of setting up the DMA may cost more instructions than simply performing 16 8-bit (byte) transfers.

In summary, compilers are not required to support int128_t. If they do support it, there are various methods to transfer the data, depending on the processor and platform hardware support. View the assembly language to see the instructions emitted by the compiler to support int128_t.
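
As a concrete way to check, compile a trivial copy and inspect the output (a sketch; the instruction choices mentioned in the comments are what gcc typically emits for x86-64, not a guarantee):

__int128 src, dst;

// Build with e.g. "g++ -O2 -S" and look at the assembly: on x86-64 this copy
// is typically either a single 16-byte SSE load/store pair or two 64-bit mov
// instructions, depending on the target flags.
void copy128()
{
    dst = src;
}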

Is __int128_t arithmetic emulated by GCC, even with SSE?

I was confusing two different things in my question.

Firstly, as PaulR explained in the comments: "There are no 128 bit arithmetic operations in SSE or AVX (apart from bitwise operations)". Considering this, 128-bit arithmetic has to be emulated on modern x86-64 based processors (e.g. AMD Family 10 or Intel Core architecture). This has nothing to do with GCC.

The second part of the question is whether or not 128-bit arithmetic emulation in GCC benefits from SSE/AVX instructions or registers. As implied in PaulR's comments, there isn't much in SSE/AVX that's going to allow you to do 128-bit arithmetic more easily; most likely x86-64 instructions will be used for this. The code I'm interested in can't compile with -mno-sse, but it compiles fine with -mno-sse2 -mno-sse3 -mno-ssse3 -mno-sse4 -mno-sse4.1 -mno-sse4.2 -mno-avx -mno-avx2 and performance isn't affected. So my code doesn't benefit from modern SSE instructions.
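
A small illustration of that lowering (a sketch; the add/adc pairing noted in the comment is what gcc typically produces for x86-64):

// Build with "g++ -O2 -S": gcc lowers this to plain 64-bit integer
// instructions, typically an add followed by adc (add-with-carry), with no
// SSE/AVX involvement.
__int128 add128(__int128 a, __int128 b)
{
    return a + b;
}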

How to print a __uint128_t number using gcc?

No, there isn't support in the library for printing these types. They aren't even extended integer types in the sense of the C standard.

Your idea for starting the printing from the back is a good one, but you could use much larger chunks. In some tests for P99 I have such a function that uses

uint64_t const d19 = UINT64_C(10000000000000000000);

as the largest power of 10 that fits into a uint64_t.
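
A rough sketch of that approach (my own, not the P99 code): split the value into base-10^19 chunks and print the lower chunks zero-padded to 19 digits.

#include <cinttypes>
#include <cstdio>

void print_u128_dec(__uint128_t x)
{
    uint64_t const d19 = UINT64_C(10000000000000000000);

    uint64_t    lo   = static_cast<uint64_t>(x % d19);
    __uint128_t rest = x / d19;

    if (rest == 0)
    {
        std::printf("%" PRIu64, lo);
    }
    else
    {
        uint64_t mid = static_cast<uint64_t>(rest % d19);
        uint64_t hi  = static_cast<uint64_t>(rest / d19);  // at most one digit for 128 bits

        if (hi)
            std::printf("%" PRIu64 "%019" PRIu64 "%019" PRIu64, hi, mid, lo);
        else
            std::printf("%" PRIu64 "%019" PRIu64, mid, lo);
    }
}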

As decimal, these big numbers get unreadable very soon, so another, easier option is to print them in hex. Then you can do something like

  uint64_t low = (uint64_t)x;
  // "18446744073709551615" is UINT64_MAX, the largest 64-bit number, so the
  // buffer is as long as the longest string the lower half can occupy.
  char buf[] = { "18446744073709551615" };
  sprintf(buf, "%" PRIX64, low);

to get the lower half and then basically the same with

  uint64_t high = (uint64_t)(x >> 64);

for the upper half.
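
When combining the two halves into one string, the low half needs zero-padding to 16 hex digits; a sketch (the buffer name is my own):

  char out[36];
  if (high)
      sprintf(out, "0x%" PRIX64 "%016" PRIX64, high, low);  // pad the low half
  else
      sprintf(out, "0x%" PRIX64, low);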

Is there a 128 bit integer in C++?

GCC and Clang support __int128.

How to enable __int128 on Visual Studio?

MSDN doesn't list it as being available, and this recent response agrees, so officially, no, there is no type called __int128 and it can't be enabled.

Additionally, never trust the syntax highlighter; it is user-editable, and thus likely to contain bogus or 'future' types. (__int128 is probably a reserved word, however, given the error, so you should avoid naming any of your own types __int128; this follows the convention that anything prefixed with a double underscore is reserved for compiler use.)

One would think __int128 might be available on x64/IPF machines via register spanning, like __int64 is on 32-bit targets, but right now the only 128-bit types stem from SIMD types (__m128 and its various typed forms).

__int128 alignment segment fault with gcc -O SSE optimize

For x86-64 System V, alignof(max_align_t) == 16, so malloc always returns 16-byte-aligned pointers. It sounds like your allocator is broken, and would violate the ABI if used for long double as well. (Reposting this as an answer because it turns out it was the answer.)

Memory returned by malloc is guaranteed to be usable for any standard type, which means it is sufficiently aligned whenever the requested size is large enough.

This can't be 32-bit code, because gcc doesn't support __int128 on 32-bit targets. (32-bit glibc malloc only guarantees 8-byte alignment, I think. Actually, on current systems alignof(max_align_t) == 16 in 32-bit mode as well.)


In general, the compiler is allowed to make code that faults if you violate the alignment requirements of types. On x86 things typically just work with misaligned memory until the compiler uses alignment-required SIMD instructions. Even auto-vectorization with a mis-aligned uint16_t* can fault (Why does unaligned access to mmap'ed memory sometimes segfault on AMD64?), so don't assume that narrow types are always safe. Use memcpy if you need to express an unaligned load in C.
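
A minimal sketch of expressing such an unaligned load (the function name is my own):

#include <cstring>

__int128 load_unaligned_i128(const void * p)
{
    __int128 v;
    std::memcpy(&v, p, sizeof v);  // the compiler can still emit a single unaligned load
    return v;
}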


Apparently alignof(__int128) is 16. So they aren't repeating the weirdness from i386 System V where even 8-byte objects are only guaranteed 4-byte alignment, and struct-packing rules mean that compilers can't always give them natural alignment.

This is a Good Thing, because it makes it efficient to copy with SSE, and means _Atomic __int128 doesn't need any extra special handling to avoid cache-line splits that would make lock cmpxchg16b very slow.
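
A quick sketch to check those alignment figures on a given target (the values in the comments are what one would expect under x86-64 System V, not universal guarantees):

#include <cstddef>
#include <cstdio>

int main()
{
    std::printf("alignof(__int128)    = %zu\n", alignof(__int128));          // 16
    std::printf("alignof(max_align_t) = %zu\n", alignof(std::max_align_t));  // 16 on x86-64
    std::printf("alignof(long double) = %zu\n", alignof(long double));       // 16 on x86-64
}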


