What Is Uint_Fast32_T and Why Should It Be Used Instead of the Regular Int and Uint32_T

What is uint_fast32_t and why should it be used instead of the regular int and uint32_t?

int may be as small as 16 bits on some platforms. It may not be sufficient for your application.
uint32_t is not guaranteed to exist. It's an optional typedef that the implementation must provide if and only if it has an unsigned integer type of exactly 32 bits. Some platforms have 9-bit bytes, for example, so they don't have a uint32_t.
uint_fast32_t states your intent clearly: it's a type of at least 32 bits which is the best from a performance point-of-view. uint_fast32_t may be in fact 64 bits long. It's up to the implementation.
There's also uint_least32_t in the mix. It designates the smallest type that's at least 32 bits long, thus it can be smaller than uint_fast32_t. It's an alternative to uint32_t if the latter isn't supported by the platform.
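To see how a particular implementation sizes these types, you can simply print them. The following is a minimal sketch, assuming a hosted C99 or later compiler; the helper name print_widths is made up for this example, and uint32_t is reported only if the implementation defines UINT32_MAX.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdio.h>

/* Storage size of a type in bits (this counts padding bits, if any). */
#define BITS(type) (sizeof(type) * CHAR_BIT)

/* Hypothetical helper: report how this implementation sizes each type. */
static void print_widths(FILE *out)
{
    /* These two hold everywhere; the rest is up to the implementation. */
    assert(BITS(uint_least32_t) >= 32);
    assert(BITS(uint_fast32_t) >= 32);

    fprintf(out, "int:            %zu bits\n", BITS(int));
#ifdef UINT32_MAX   /* defined only when uint32_t exists */
    fprintf(out, "uint32_t:       %zu bits\n", BITS(uint32_t));
#else
    fprintf(out, "uint32_t:       not provided\n");
#endif
    fprintf(out, "uint_least32_t: %zu bits\n", BITS(uint_least32_t));
    fprintf(out, "uint_fast32_t:  %zu bits\n", BITS(uint_fast32_t));
}
```

Calling print_widths(stdout) from main prints one line per type. The numbers differ between implementations, which is exactly why the three typedefs exist.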

... there is uint_fast32_t which has the same typedef as uint32_t ...

What you are looking at is not the standard. It's a particular implementation (BlackBerry). So you can't deduce from there that uint_fast32_t is always the same as uint32_t.

Why would uint32_t be preferred rather than uint_fast32_t?

uint32_t is guaranteed to have nearly the same properties on any platform that supports it.¹

In comparison, uint_fast32_t has very few guarantees about how it behaves on different systems.

If you switch to a platform where uint_fast32_t has a different size, all code that uses uint_fast32_t has to be retested and validated. All stability assumptions are going to be out the window. The entire system is going to work differently.

When writing your code, you may not even have access to a uint_fast32_t system that isn't 32 bits in size.

uint32_t won't work differently (see footnote).

Correctness is more important than speed. Premature correctness is thus a better plan than premature optimization.

In the event I was writing code for systems where uint_fast32_t was 64 or more bits, I might test my code for both cases and use it. Barring both need and opportunity, doing so is a bad plan.

Finally, when you store uint_fast32_t values for any length of time or in large numbers, they can be slower than uint32_t simply because of cache size and memory bandwidth. Today's computers are far more often memory-bound than CPU-bound, so uint_fast32_t could be faster in isolation but not once you account for memory overhead.
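The memory argument is easy to put into numbers. This is a small sketch, assuming the platform provides uint32_t; the function names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes needed for `count` values stored as the exact-width type. */
static size_t bytes_as_uint32(size_t count)
{
    return count * sizeof(uint32_t);
}

/* Bytes needed for the same values stored as the fast type. */
static size_t bytes_as_fast32(size_t count)
{
    return count * sizeof(uint_fast32_t);
}

/* The fast type can never be smaller than the exact 32-bit type. */
static void check_footprint(void)
{
    assert(bytes_as_fast32(1000) >= bytes_as_uint32(1000));
}
```

Where uint_fast32_t is 64 bits wide, the second function returns twice the first, so the same cache holds half as many elements.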

¹ As @chux has noted in a comment, if unsigned is larger than uint32_t, arithmetic on uint32_t goes through the usual integer promotions, and if not, it stays as uint32_t. This can cause bugs. Nothing is ever perfect.
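On common platforms unsigned int is itself 32 bits, so the effect in the footnote cannot be shown with uint32_t directly, but uint16_t goes through the same mechanism. The sketch below assumes int is exactly 32 bits wide and that uint16_t and uint32_t both exist; dec16, dec32 and check_promotions are made-up names.

```c
#include <assert.h>
#include <stdint.h>

/* x is promoted to int when int is wider than 16 bits,
   so the subtraction is signed and 0 - 1 gives -1. */
static long long dec16(uint16_t x) { return x - 1; }

/* With a 32-bit int, x keeps its unsigned 32-bit type,
   so 0 - 1 wraps around to 4294967295. */
static long long dec32(uint32_t x) { return x - 1; }

/* Holds where int is 32 bits wide; with a wider int,
   dec32(0) would be -1 as well. */
static void check_promotions(void)
{
    assert(dec16(0) == -1);
    assert(dec32(0) == 4294967295LL);
}
```

The same expression, x - 1, therefore behaves differently depending on how the operand type compares to int, which is the kind of bug the footnote warns about.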

uint32_t vs uint_fast32_t vs uint_least32_t

what's the difference with uint32_t

uint_fast32_t is an unsigned type of at least 32 bits that is (in some general way) the fastest such type. "fast" means that given a choice, the implementer will probably pick the size for which the architecture has arithmetic, load and store instructions. It's not the winner of any particular benchmark.

uint_least32_t is the smallest unsigned type of at least 32 bits.

uint32_t is a type of exactly 32 bits with no padding, if any such type exists.

Am I right?

No. If uint24_t exists at all then it is an integer type, not a struct. If there is no unsigned integer type of 24 bits in this implementation then it does not exist.

Since unsigned long is required to be at least 32 bits, the only standard types that uint24_t could possibly ever be an alias for are char, unsigned char, unsigned short and unsigned int. Alternatively it could be an extended type (that is, an integer type provided by the implementation that is not any of the defined integer types in the standard).

Will you suggest me to use uint48_t for my 48-bit unsigned integers?

If it exists and is the size you want then you might as well use it. However, it will not exist on very many implementations, so it's only suitable for use in non-portable code. That's OK provided the reason you have to deal with exact 48-bit integers is platform-specific.

The exact 16, 32 and 64 bit types are also technically optional, but they are required to exist if the implementation has suitable integer types. "Suitable" means not only that there is an exact N bit unsigned type with no padding bits, but also that the corresponding signed type has no padding bits and uses 2's complement representation. In practice this is so close to everywhere that you limit portability very little by using any of them. For maximum portability though you should use uint_least32_t or uint_fast32_t in preference to uint32_t. Which one depends on whether you care more about speed or size. In my experience very few people bother, since platforms that don't have a 32 bit integer type are already so weird that most people don't care whether their code runs on it or not.
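One practical consequence of choosing uint_least32_t or uint_fast32_t is that code relying on wraparound modulo 2^32 has to mask explicitly, because the type may be wider than 32 bits. As an illustration (the choice of algorithm is mine, not part of the original answer), here is a sketch of the 32-bit FNV-1a hash computed in uint_fast32_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a hash computed in uint_fast32_t.
   The mask keeps the result identical whether the type has 32 or 64 bits. */
static uint_fast32_t fnv1a_32(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint_fast32_t hash = UINT32_C(2166136261);          /* FNV offset basis */

    for (size_t i = 0; i < len; i++) {
        hash ^= p[i];
        hash *= UINT32_C(16777619);                     /* FNV prime */
        hash &= UINT32_C(0xFFFFFFFF);                   /* reduce modulo 2^32 */
    }
    return hash;
}

/* Known FNV-1a values; they do not depend on the width of uint_fast32_t. */
static void fnv1a_32_self_test(void)
{
    assert(fnv1a_32("", 0) == UINT32_C(0x811C9DC5));
    assert(fnv1a_32("a", 1) == UINT32_C(0xE40C292C));
}
```

With uint32_t the mask would be redundant; with the least and fast types it is what makes the code portable.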

What is the portable and correct way to print int_fast32_t and similar data types?

Include <inttypes.h> and use the format macros it defines:

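#include <inttypes.h> // <inttypes.h> also includes <stdint.h> on its own
#include <stdio.h>

int main(void)
{
    uint_fast8_t a = 0;
    uint_fast16_t b = 0;
    uint_fast32_t c = 0;
    uint_fast64_t d = 0;

    // Each PRIuFASTn macro expands to the conversion specifier for uint_fastn_t
    printf("%" PRIuFAST8 " %" PRIuFAST16 " %" PRIuFAST32 " %" PRIuFAST64 "\n", a, b, c, d);
    return 0;
}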
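The same header also defines matching SCN macros for the scanf family. A minimal sketch of a parse and format round trip follows; parse_fast32, format_fast32 and check_round_trip are hypothetical helper names.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Parse a decimal string into a uint_fast32_t; returns 1 on success, 0 on failure. */
static int parse_fast32(const char *text, uint_fast32_t *out)
{
    return sscanf(text, "%" SCNuFAST32, out) == 1;
}

/* Format a uint_fast32_t as decimal into buf; returns what snprintf returns. */
static int format_fast32(char *buf, size_t size, uint_fast32_t value)
{
    return snprintf(buf, size, "%" PRIuFAST32, value);
}

/* 4000000000 fits in any uint_fast32_t, since the type has at least 32 bits. */
static void check_round_trip(void)
{
    uint_fast32_t value = 0;
    char buf[32];

    assert(parse_fast32("4000000000", &value));
    assert(format_fast32(buf, sizeof buf, value) == 10);
    assert(strcmp(buf, "4000000000") == 0);
}
```

Note that the PRI and SCN macros are separate because printf and scanf may need different length modifiers for the same type.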

A practical example using uint32_t instead of unsigned int

unsigned int may not be 32 bits; it depends on the computer architecture your program runs on.

uint32_t is defined in <stdint.h> as a typedef for an unsigned integer type of exactly 32 bits.

The practice is more for portability. Even if the target platform doesn't define uint32_t, you can easily typedef it to a 32-bit integer type.
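One way such a fallback typedef might look is sketched below. It assumes that <stdint.h> itself is available; the alias u32 and the helper u32_add are invented names. When uint32_t is missing, the fallback type is at least 32 bits rather than exactly 32 bits, so arithmetic that should wrap needs a mask.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical project-local alias: the exact 32-bit type when available,
   otherwise the smallest type with at least 32 bits. */
#ifdef UINT32_MAX
typedef uint32_t       u32;
#else
typedef uint_least32_t u32;
#endif

/* Add two values and reduce modulo 2^32, so the result is the same
   even when the fallback type is wider than 32 bits. */
static u32 u32_add(u32 a, u32 b)
{
    return (u32)((a + b) & UINT32_C(0xFFFFFFFF));
}

static void check_u32(void)
{
    assert(sizeof(u32) * CHAR_BIT >= 32);
    assert(u32_add(UINT32_C(0xFFFFFFFF), 1) == 0);
}
```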

How can you define the uint_fast32_t for MPI data types?

Let's assume we work with an MPI implementation conforming to MPI standard version 2.2 or later.

MPI 2.2 and later define signed integer datatypes MPI_INT8_T, MPI_INT16_T, MPI_INT32_T, MPI_INT64_T (corresponding to C int8_t, int16_t, int32_t, and int64_t), and unsigned integer datatypes MPI_UINT8_T, MPI_UINT16_T, MPI_UINT32_T, and MPI_UINT64_T (corresponding to C uint8_t, uint16_t, uint32_t, and uint64_t).

This means that you can use these specific-size integer types directly in MPI in C.

The situation with minimum-width (int_leastN_t, uint_leastN_t) and fastest minimum-width (int_fastN_t, uint_fastN_t) integer types is different. A language-lawyer will tell you that you cannot really use these types with MPI, because the C or MPI standards do not provide a clean way to use them.

In practice, the situation is much simpler. All existing C implementations supporting <stdint.h> types typedef the minimum-width and fastest minimum-width integer types to types compatible with the exact-width types.

Personally, I would create a header file, say extra_mpi_types.h, that includes the appropriate header file, say

/* extra_mpi_types.h */
#ifndef   EXTRA_MPI_TYPES_H
#define   EXTRA_MPI_TYPES_H

/* Use build-time generated file */
#include <extra_mpi_types_internal.h>

#endif /* EXTRA_MPI_TYPES_H */

where extra_mpi_types_internal.h is generated at build time by compiling and running something like

/* type-generator.c */
#include <stdlib.h>
#include <stdint.h>
#include <stdio.h>

static inline const char *mpi_type_name(const char *const name, const size_t size, const int is_signed)
{
    if (is_signed) {
        if (size == sizeof (int8_t))  return "MPI_INT8_T";
        if (size == sizeof (int16_t)) return "MPI_INT16_T";
        if (size == sizeof (int32_t)) return "MPI_INT32_T";
        if (size == sizeof (int64_t)) return "MPI_INT64_T";
    } else {
        if (size == sizeof (uint8_t))  return "MPI_UINT8_T";
        if (size == sizeof (uint16_t)) return "MPI_UINT16_T";
        if (size == sizeof (uint32_t)) return "MPI_UINT32_T";
        if (size == sizeof (uint64_t)) return "MPI_UINT64_T";
    }
    fprintf(stderr, "%s: Unsupported %s integer type.\n", name, (is_signed) ? "signed" : "unsigned");
    exit(EXIT_FAILURE);
}

static void define(const char *const mpiname, const char *const typename, const size_t typesize, const int is_signed)
{
    printf("#ifndef  %s\n", mpiname);
    printf("# define %s  %s\n", mpiname, mpi_type_name(typename, typesize, is_signed));
    printf("#endif\n");
}

#define  DEFINE_SIGNED(mpitype, type)    define(#mpitype, #type, sizeof (type), 1)
#define  DEFINE_UNSIGNED(mpitype, type)  define(#mpitype, #type, sizeof (type), 0)

int main(void)
{
    printf("/* This is an autogenerated header file: do not modify. */\n\n");

    DEFINE_SIGNED(MPI_INT_LEAST8_T,  int_least8_t);
    DEFINE_SIGNED(MPI_INT_LEAST16_T, int_least16_t);
    DEFINE_SIGNED(MPI_INT_LEAST32_T, int_least32_t);
    DEFINE_SIGNED(MPI_INT_LEAST64_T, int_least64_t);

    DEFINE_UNSIGNED(MPI_UINT_LEAST8_T,  uint_least8_t);
    DEFINE_UNSIGNED(MPI_UINT_LEAST16_T, uint_least16_t);
    DEFINE_UNSIGNED(MPI_UINT_LEAST32_T, uint_least32_t);
    DEFINE_UNSIGNED(MPI_UINT_LEAST64_T, uint_least64_t);

    DEFINE_SIGNED(MPI_INT_FAST8_T,  int_fast8_t);
    DEFINE_SIGNED(MPI_INT_FAST16_T, int_fast16_t);
    DEFINE_SIGNED(MPI_INT_FAST32_T, int_fast32_t);
    DEFINE_SIGNED(MPI_INT_FAST64_T, int_fast64_t);

    DEFINE_UNSIGNED(MPI_UINT_FAST8_T,  uint_fast8_t);
    DEFINE_UNSIGNED(MPI_UINT_FAST16_T, uint_fast16_t);
    DEFINE_UNSIGNED(MPI_UINT_FAST32_T, uint_fast32_t);
    DEFINE_UNSIGNED(MPI_UINT_FAST64_T, uint_fast64_t);

    return EXIT_SUCCESS;
}

redirecting its output to extra_mpi_types_internal.h. Note that this depends only on the C implementation, and not on the MPI implementation at all. This only finds out which fixed-width integer types match the minimum-width or minimum-width fast integer types.

On x86-64 Linux, this will generate

/* This is an autogenerated header file: do not modify. */

#ifndef  MPI_INT_LEAST8_T
# define MPI_INT_LEAST8_T  MPI_INT8_T
#endif
#ifndef  MPI_INT_LEAST16_T
# define MPI_INT_LEAST16_T  MPI_INT16_T
#endif
#ifndef  MPI_INT_LEAST32_T
# define MPI_INT_LEAST32_T  MPI_INT32_T
#endif
#ifndef  MPI_INT_LEAST64_T
# define MPI_INT_LEAST64_T  MPI_INT64_T
#endif
#ifndef  MPI_UINT_LEAST8_T
# define MPI_UINT_LEAST8_T  MPI_UINT8_T
#endif
#ifndef  MPI_UINT_LEAST16_T
# define MPI_UINT_LEAST16_T  MPI_UINT16_T
#endif
#ifndef  MPI_UINT_LEAST32_T
# define MPI_UINT_LEAST32_T  MPI_UINT32_T
#endif
#ifndef  MPI_UINT_LEAST64_T
# define MPI_UINT_LEAST64_T  MPI_UINT64_T
#endif
#ifndef  MPI_INT_FAST8_T
# define MPI_INT_FAST8_T  MPI_INT8_T
#endif
#ifndef  MPI_INT_FAST16_T
# define MPI_INT_FAST16_T  MPI_INT64_T
#endif
#ifndef  MPI_INT_FAST32_T
# define MPI_INT_FAST32_T  MPI_INT64_T
#endif
#ifndef  MPI_INT_FAST64_T
# define MPI_INT_FAST64_T  MPI_INT64_T
#endif
#ifndef  MPI_UINT_FAST8_T
# define MPI_UINT_FAST8_T  MPI_UINT8_T
#endif
#ifndef  MPI_UINT_FAST16_T
# define MPI_UINT_FAST16_T  MPI_UINT64_T
#endif
#ifndef  MPI_UINT_FAST32_T
# define MPI_UINT_FAST32_T  MPI_UINT64_T
#endif
#ifndef  MPI_UINT_FAST64_T
# define MPI_UINT_FAST64_T  MPI_UINT64_T
#endif

If you use a Makefile to organize your project, I would use something like

CC      := mpicc
CFLAGS  := -Wall -O2
LDFLAGS := -lmpi

all: your-main-program

clean:
    @rm -f *.o extra_mpi_types_internal.h

type-generator: type-generator.c
    $(CC) $(CFLAGS) $^ -o $@

extra_mpi_types_internal.h: type-generator
    ./type-generator > $@

%.o: %.c extra_mpi_types_internal.h
    $(CC) $(CFLAGS) $< -c -o $@

your-main-program: all.o needed.o object.o files.o
    $(CC) $(CFLAGS) $^ $(LDFLAGS) -o $@

although this approach does mean that you cannot cross-compile MPI programs for a different architecture.

Alternatively, you can use pre-defined compiler macros to determine the OS, hardware architecture, and C library used, to include a pre-prepared header file with the correct macro definitions:

/* extra_mpi_types.h */
#ifndef   EXTRA_MPI_TYPES_H
#define   EXTRA_MPI_TYPES_H

#if defined(__linux__)
#if   defined(__amd64__)
#include <extra-linux-amd64.h>
#elif defined(__i386__)
#include <extra-linux-x86.h>
#elif defined(__aarch64__)
#include <extra-linux-arm64.h>
#elif defined(__ARM_ARCH_4T__)
#include <extra-linux-arm-4t.h>
#else
#error "Unsupported Linux hardware architecture"
#endif

#elif defined(_WIN64)
#include <extra-win64.h>

#elif defined(_WIN32)
#include <extra-win32.h>

#else
#error  Unsupported operating system.
#endif

#endif /* EXTRA_MPI_TYPES_H */

The contents for each of the above files (or rather, the architectures and operating systems as needed), can be either discovered using a C program like above, or by examining the C compiler and library header files.
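A further possibility, sketched here as an assumption rather than taken from the original answer, is to let the preprocessor compare the limit macros from <stdint.h>. Nothing has to be run at build time, so it should also work when cross-compiling. It assumes the fast type has no padding bits, so that its maximum value identifies its width, and only uint_fast32_t is shown. The STRINGIFY macros and fast32_mpi_name exist only to inspect the choice without including <mpi.h>.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Pick the fixed-width MPI datatype whose range matches uint_fast32_t. */
#ifndef MPI_UINT_FAST32_T
# if defined(UINT32_MAX) && UINT_FAST32_MAX == UINT32_MAX
#  define MPI_UINT_FAST32_T  MPI_UINT32_T
# elif defined(UINT64_MAX) && UINT_FAST32_MAX == UINT64_MAX
#  define MPI_UINT_FAST32_T  MPI_UINT64_T
# else
#  error "No fixed-width MPI datatype matches uint_fast32_t"
# endif
#endif

/* Two-level stringification, so the macro is expanded before quoting. */
#define STRINGIFY_(x) #x
#define STRINGIFY(x)  STRINGIFY_(x)

/* Name of the selected datatype; meaningful only when <mpi.h> is not included,
   because an MPI header may define MPI_UINT32_T as a macro of its own. */
static const char *fast32_mpi_name(void)
{
    return STRINGIFY(MPI_UINT_FAST32_T);
}

static void check_fast32_mpi_name(void)
{
    const char *expected =
        (sizeof(uint_fast32_t) == sizeof(uint32_t)) ? "MPI_UINT32_T" : "MPI_UINT64_T";
    assert(strcmp(fast32_mpi_name(), expected) == 0);
}
```

In a real program you would include <mpi.h> and pass MPI_UINT_FAST32_T wherever an MPI_Datatype is expected.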

Why are the fast integer types faster than the other integer types?

Imagine a CPU that performs only 64-bit arithmetic operations. Now imagine how you would implement an unsigned 8-bit addition on such a CPU. It would necessarily involve more than one operation to get the right result. On such a CPU, 64-bit operations are faster than operations on other integer widths. In this situation, every Xint_fastY_t might presumably be an alias of the 64-bit type.
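Written out in C, the emulated 8-bit addition on such a machine might look like the following sketch. The function name is made up, and uint_least64_t stands in for the machine's only native width.

```c
#include <assert.h>
#include <stdint.h>

/* 8-bit unsigned addition carried out at 64-bit (or wider) width:
   the add itself is one operation, and the mask that discards the
   carry out of bit 7 is the extra one. */
static uint_least64_t add_u8_in_wide_register(uint_least64_t a, uint_least64_t b)
{
    return (a + b) & 0xFFu;
}

static void check_add_u8(void)
{
    assert(add_u8_in_wide_register(200, 100) == 44);  /* 300 mod 256 */
    assert(add_u8_in_wide_register(255, 1) == 0);     /* wraps to zero */
}
```

A type that is allowed to be 64 bits wide needs no such mask, which is why it can be the faster choice there.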

If a CPU supports fast operations on narrow integer types, so that a wider type is no faster than a narrower one, then Xint_fastY_t will not (or at least should not) be an alias of a type wider than is necessary to represent all Y bits.

Out of curiosity, I checked the sizes on one particular implementation (GNU, Linux) on several architectures. They are not the same across all implementations on the same architecture:

┌────╥───────────────────────────────────────────────────────────┐
│ Y  ║   sizeof(Xint_fastY_t) * CHAR_BIT                         │
│    ╟────────┬─────┬───────┬─────┬────────┬──────┬────────┬─────┤
│    ║ x86-64 │ x86 │ ARM64 │ ARM │ MIPS64 │ MIPS │ MSP430 │ AVR │
╞════╬════════╪═════╪═══════╪═════╪════════╪══════╪════════╪═════╡
│ 8  ║ 8      │ 8   │ 8     │ 32  │ 8      │ 8    │ 16     │ 8   │
│ 16 ║ 64     │ 32  │ 64    │ 32  │ 64     │ 32   │ 16     │ 16  │
│ 32 ║ 64     │ 32  │ 64    │ 32  │ 64     │ 32   │ 32     │ 32  │
│ 64 ║ 64     │ 64  │ 64    │ 64  │ 64     │ 64   │ 64     │ 64  │
└────╨────────┴─────┴───────┴─────┴────────┴──────┴────────┴─────┘

Note that although operations on the larger types may be faster, such types also take more space in cache, and thus using them doesn't necessarily yield better performance. Furthermore, one cannot always trust that the implementation has made the right choice in the first place. As always, measuring is required for optimal results.
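To produce the column of the table for your own implementation, something along these lines should do. It is a sketch that covers only the unsigned variants, and print_fast_widths is a hypothetical name.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdio.h>

/* Print sizeof(uint_fastY_t) * CHAR_BIT for Y = 8, 16, 32 and 64. */
static void print_fast_widths(FILE *out)
{
    fprintf(out, " 8 -> %zu\n", sizeof(uint_fast8_t)  * CHAR_BIT);
    fprintf(out, "16 -> %zu\n", sizeof(uint_fast16_t) * CHAR_BIT);
    fprintf(out, "32 -> %zu\n", sizeof(uint_fast32_t) * CHAR_BIT);
    fprintf(out, "64 -> %zu\n", sizeof(uint_fast64_t) * CHAR_BIT);
}

/* The only portable expectation: each type holds at least Y bits. */
static void check_fast_widths(void)
{
    assert(sizeof(uint_fast8_t)  * CHAR_BIT >= 8);
    assert(sizeof(uint_fast16_t) * CHAR_BIT >= 16);
    assert(sizeof(uint_fast32_t) * CHAR_BIT >= 32);
    assert(sizeof(uint_fast64_t) * CHAR_BIT >= 64);
}
```

The output only tells you what the implementer chose, not whether that choice is actually the fastest for your workload.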
