Problem understanding clock_gettime
Your system's clock source is probably set to TSC instead of HPET.
Broadly speaking, HPET is the newer timer hardware and gives more accurate, consistent readings, while the TSC is older and much cheaper to read; on some multi-core systems the TSC is not synchronized between cores, which can produce inconsistent timings.
On openSUSE, you can find out what your current clock source is with:
cat /sys/devices/system/clocksource/clocksource0/current_clocksource
To set your clock source to HPET on openSUSE, run (as root):
echo 'hpet' > /sys/devices/system/clocksource/clocksource0/current_clocksource
Further reading:
http://en.wikipedia.org/wiki/HPET
http://en.wikipedia.org/wiki/Time_Stamp_Counter
compilation error on clock_gettime and CLOCK_MONOTONIC
Before including the header <time.h>, do:
#define _POSIX_C_SOURCE 199309L
http://man7.org/linux/man-pages/man2/clock_gettime.2.html
Feature Test Macro Requirements for glibc (see feature_test_macros(7)):
clock_getres(), clock_gettime(), clock_settime():
_POSIX_C_SOURCE >= 199309L
http://man7.org/linux/man-pages/man7/feature_test_macros.7.html
_POSIX_C_SOURCE
Defining this macro causes header files to expose definitions as follows:
· The value 1 exposes definitions conforming to POSIX.1-1990 and ISO C (1990).
· The value 2 or greater additionally exposes definitions for POSIX.2-1992.
· The value 199309L or greater additionally exposes definitions for POSIX.1b (real-time extensions).
· The value 199506L or greater additionally exposes definitions for POSIX.1c (threads).
· (Since glibc 2.3.3) The value 200112L or greater additionally exposes definitions corresponding to the POSIX.1-2001 base specification (excluding the XSI extension). This value also causes C95 (since glibc 2.12) and C99 (since glibc 2.10) features to be exposed (in other words, the equivalent of defining _ISOC99_SOURCE).
· (Since glibc 2.10) The value 200809L or greater additionally exposes definitions corresponding to the POSIX.1-2008 base specification (excluding the XSI extension).
Linux clock_gettime(CLOCK_MONOTONIC) strange non-monotonic behavior
man clock_gettime says:
CLOCK_MONOTONIC_RAW (since Linux 2.6.28; Linux-specific)
Similar to CLOCK_MONOTONIC, but provides access to a raw hardware-based time that is not subject to NTP adjustments.
Since CLOCK_MONOTONIC_RAW is not subject to NTP adjustments, I guess CLOCK_MONOTONIC could be.
We had similar problems with Redhat Enterprise 5.0 with 2.6.18 kernel and some specific Itanium processor. We couldn't reproduce it with other processor on the same OS. It was fixed in RHEL 5.3 with slightly newer kernel and some Redhat patches.
clock_gettime suddenly not working anymore today, what could be the issue?
tv_sec is a signed integer holding seconds, so 1000 * tv_sec will overflow an int32_t. Use int64_t.
Overflow will occur after (1 << 31) milliseconds, which is about 24.8 days.
Why does clock_gettime not compile when using C99?
It's because using the -std=c99 option defines a macro that causes the declaration of clock_gettime to be hidden.
The GNU C library can expose different versions of its functions, or hide them altogether, depending on what is expected by user code. User code declares those expectations by defining certain specially-named macros (called feature test macros, though this is something of a misnomer) with pre-determined values recognised by the GNU C library headers. One such macro is __STRICT_ANSI__, which is implicitly defined if any of the -std=cXX options are used.
Quoting feature_test_macros(7):
__STRICT_ANSI__
ISO Standard C. This macro is implicitly defined by gcc(1) when invoked with, for example, the -std=c99 or -ansi flag. […]
If any of __STRICT_ANSI__, _ISOC99_SOURCE, _ISOC11_SOURCE (since glibc 2.18), _POSIX_SOURCE, _POSIX_C_SOURCE, _XOPEN_SOURCE, _XOPEN_SOURCE_EXTENDED (in glibc 2.11 and earlier), _BSD_SOURCE (in glibc 2.19 and earlier), or _SVID_SOURCE (in glibc 2.19 and earlier) is explicitly defined, then _BSD_SOURCE, _SVID_SOURCE, and _DEFAULT_SOURCE are not defined by default.
Using a -std=gnuXX option instead of -std=cXX does not cause __STRICT_ANSI__ to be defined, but it also enables GNU extensions to the C language. If you want to keep those language extensions disabled, you can instead put #define _DEFAULT_SOURCE 1 at the very top of the file to expose all the symbols the C library otherwise would expose (or perhaps another feature test macro to adjust the library interface to your liking). An explicit feature test macro will take precedence over __STRICT_ANSI__.
clock_gettime returning a nonsense value
tv_nsec is nanoseconds, i.e. one billionth (1/1,000,000,000) of a second. Your calculation, however, treats it as if it were microseconds.
Here's the fix:
return ((ts.tv_sec * 1000) + (ts.tv_nsec / 1000000)) + 0.5; /* divide by 1,000,000, not 1,000 */
`clock_gettime()` yields incorrect results on Apple M1 (Monterey 12.0.1)
There is nothing wrong with the timing. Your multiplication function (omitted from the question but included in the gist) has a bug and does not actually do matrix multiplication, so it is much faster than you expect.
void matmul(size_t N, double C[static N][N], double A[static N][N],
            double B[static N][N]) {
    for (size_t i = 0; i < N; i++) {
        for (size_t j = 0; j < N; j++) {
            for (size_t k = 0; k < N; k++) {
                double acc = A[i][k] * B[k][j]; // oops
                C[i][j] = acc;
            }
        }
    }
}
You forgot to actually add up the terms in the innermost loop. When optimizing, the compiler then notices that all iterations of the loop on k simply overwrite the same value, so only the last one needs to be done. It can effectively replace the loop on k with the single statement C[i][j] = A[i][N-1] * B[N-1][j], and your supposedly O(N^3) algorithm is actually O(N^2). So no wonder it runs 1024 times faster than you expected.
If you're running the same code in your x86 tests, it may be that your x86 compiler doesn't do this optimization. It requires some good inlining as this optimization can only be done if the compiler can prove that C doesn't alias A or B.
When I corrected it to
void matmul(size_t N, double C[static N][N], double A[static N][N],
            double B[static N][N]) {
    for (size_t i = 0; i < N; i++) {
        for (size_t j = 0; j < N; j++) {
            double acc = 0;
            for (size_t k = 0; k < N; k++) {
                acc += A[i][k] * B[k][j];
            }
            C[i][j] = acc;
        }
    }
}
it correctly reports taking about 1.2 seconds on my MacBook Pro M1, which seems quite reasonable.
Morals:
Minimal reproducible examples are important. The code snippet you included in the question had nothing to do with the actual bug, and so people who just tried to stare at the question itself were wasting their time. The full code in the gist helped, but should really be in the question itself, as not everyone will take the time to chase links.
Correctness testing should come before benchmarking, or at least together with it. Your program outputs the result of the matrix "multiplication" but I suppose you never actually checked that it was mathematically correct.
select isn't broken.