Problem understanding clock_gettime
Your system's clock source is probably set to TSC instead of HPET.
Broadly speaking, HPET is the newer timer hardware and gives more accurate, consistent readings, while the TSC is older and much cheaper to read; on some multi-core systems the TSC is not synchronized between cores, which can produce inconsistent timings.
On openSUSE, you can find out what your current clock source is with:
cat /sys/devices/system/clocksource/clocksource0/current_clocksource
To set your clock source to HPET on openSUSE, run (as root):
echo 'hpet' > /sys/devices/system/clocksource/clocksource0/current_clocksource
Further reading:
http://en.wikipedia.org/wiki/HPET
http://en.wikipedia.org/wiki/Time_Stamp_Counter
compilation error on clock_gettime and CLOCK_MONOTONIC
Before including the header <time.h>, do:
#define _POSIX_C_SOURCE 199309L
http://man7.org/linux/man-pages/man2/clock_gettime.2.html
Feature Test Macro Requirements for glibc (see feature_test_macros(7)):
clock_getres(), clock_gettime(), clock_settime():
_POSIX_C_SOURCE >= 199309L
http://man7.org/linux/man-pages/man7/feature_test_macros.7.html
_POSIX_C_SOURCE
Defining this macro causes header files to expose definitions as follows:
· The value 1 exposes definitions conforming to POSIX.1-1990 and ISO C (1990).
· The value 2 or greater additionally exposes definitions for POSIX.2-1992.
· The value 199309L or greater additionally exposes definitions for POSIX.1b (real-time extensions).
· The value 199506L or greater additionally exposes definitions for POSIX.1c (threads).
· (Since glibc 2.3.3) The value 200112L or greater additionally exposes definitions corresponding to the POSIX.1-2001 base specification (excluding the XSI extension). This value also causes C95 (since glibc 2.12) and C99 (since glibc 2.10) features to be exposed (in other words, the equivalent of defining _ISOC99_SOURCE).
· (Since glibc 2.10) The value 200809L or greater additionally exposes definitions corresponding to the POSIX.1-2008 base specification (excluding the XSI extension).
Linux clock_gettime(CLOCK_MONOTONIC) strange non-monotonic behavior
man clock_gettime says:
CLOCK_MONOTONIC_RAW (since Linux 2.6.28; Linux-specific)
Similar to CLOCK_MONOTONIC, but provides access to a raw hardware-based time that is not subject to NTP adjustments.
Since CLOCK_MONOTONIC_RAW is not subject to NTP adjustments, I guess CLOCK_MONOTONIC could be.
We had similar problems with Redhat Enterprise 5.0 with 2.6.18 kernel and some specific Itanium processor. We couldn't reproduce it with other processor on the same OS. It was fixed in RHEL 5.3 with slightly newer kernel and some Redhat patches.
clock_gettime suddenly not working anymore today, what could be the issue?
tv_sec is a signed integer holding seconds, so 1000 * tv_sec will overflow an int32_t. Use int64_t.
Overflow will occur after (1 << 31) milliseconds, which is about 24.8 days.
Why does clock_gettime not compile when using C99?
It's because using the -std=c99 option defines a macro that causes the declaration of clock_gettime to be hidden.
The GNU C library can expose different versions of its functions, or hide them altogether, depending on what is expected by user code. User code declares those expectations by defining certain specially-named macros (called feature test macros, though this is something of a misnomer) with pre-determined values recognised by the GNU C library headers. One such macro is __STRICT_ANSI__, which is implicitly defined if any of the -std=cXX options are used.
Quoting feature_test_macros(7):
__STRICT_ANSI__
ISO Standard C. This macro is implicitly defined by gcc(1) when invoked with, for example, the -std=c99 or -ansi flag. […]
If any of __STRICT_ANSI__, _ISOC99_SOURCE, _ISOC11_SOURCE (since glibc 2.18), _POSIX_SOURCE, _POSIX_C_SOURCE, _XOPEN_SOURCE, _XOPEN_SOURCE_EXTENDED (in glibc 2.11 and earlier), _BSD_SOURCE (in glibc 2.19 and earlier), or _SVID_SOURCE (in glibc 2.19 and earlier) is explicitly defined, then _BSD_SOURCE, _SVID_SOURCE, and _DEFAULT_SOURCE are not defined by default.
Using a -std=gnuXX option instead of -std=cXX does not cause __STRICT_ANSI__ to be defined, but it also enables GNU extensions to the C language. If you want to keep those language extensions disabled, you can instead put #define _DEFAULT_SOURCE 1 at the very top of the file to expose all the symbols the C library otherwise would expose (or perhaps another feature test macro to adjust the library interface to your liking). An explicit feature test macro will take precedence over __STRICT_ANSI__.
clock_gettime returning a nonsense value
tv_nsec is nanoseconds, i.e. one billionth (1/1,000,000,000) of a second. Your calculation, however, treats it as if it were microseconds.
Here's the fix:
return ((ts.tv_sec * 1000) + (ts.tv_nsec / 1000000)) + 0.5; /* divide by 1,000,000, not 1,000 */
`clock_gettime()` yields incorrect results on Apple M1 (Monterey 12.0.1)
There is nothing wrong with the timing. Your multiplication function (omitted from the question but included in the gist) has a bug and does not actually do matrix multiplication, so it is much faster than you expect.
void matmul(size_t N, double C[static N][N], double A[static N][N],
            double B[static N][N]) {
    for (size_t i = 0; i < N; i++) {
        for (size_t j = 0; j < N; j++) {
            for (size_t k = 0; k < N; k++) {
                double acc = A[i][k] * B[k][j]; // oops
                C[i][j] = acc;
            }
        }
    }
}
You forgot to actually add up the terms in the innermost loop. When optimizing, the compiler then notices that all iterations of the loop on k simply overwrite the same value, so only the last one needs to be done. It can effectively replace the loop on k with the single statement C[i][j] = A[i][N-1] * B[N-1][j], and your supposedly O(N^3) algorithm is actually O(N^2). So no wonder it runs 1024 times faster than you expected.
If you're running the same code in your x86 tests, it may be that your x86 compiler doesn't do this optimization. It requires some good inlining as this optimization can only be done if the compiler can prove that C doesn't alias A or B.
When I corrected it to
void matmul(size_t N, double C[static N][N], double A[static N][N],
            double B[static N][N]) {
    for (size_t i = 0; i < N; i++) {
        for (size_t j = 0; j < N; j++) {
            double acc = 0;
            for (size_t k = 0; k < N; k++) {
                acc += A[i][k] * B[k][j];
            }
            C[i][j] = acc;
        }
    }
}
it correctly reports taking about 1.2 seconds on my MacBook Pro M1, which seems quite reasonable.
Morals:
Minimal reproducible examples are important. The code snippet you included in the question had nothing to do with the actual bug, and so people who just tried to stare at the question itself were wasting their time. The full code in the gist helped, but should really be in the question itself, as not everyone will take the time to chase links.
Correctness testing should come before benchmarking, or at least together with it. Your program outputs the result of the matrix "multiplication" but I suppose you never actually checked that it was mathematically correct.
select isn't broken.