Portable Compare and Swap (Atomic Operations) C/C++ Library

Portable Compare And Swap (atomic operations) C/C++ library?

Intel Threading Building Blocks has a nice portable atomic<T> template which does what you want. But whether it is a small library or not can of course be debated..

How can I implement a portable pointer compare and swap?

I would use a Hardware Abstraction Layer, (HAL) that allows generic code to be common - and any portable source can be included and build for each platform.

In my opinion, this allows for better structured and more readable source.

To allow you to better understand this process I would suggest Google for finding examples and explanations.

Hopefully this brief answer helps.

[EDIT] I will attempt a simple example for Bionix, to show how to implement a HAL system...

  • Mr A wants his application to run on his 'Tianhe-2' and also his 'Amiga 500'. He has the cross compilers etc and will build both binaries on his PC. He want to read keys and print to the screen.

mrAMainApplication.c contains the following...

#include "hal.h"

// This gets called every time around the main loop ...
void mainProcessLoop( void )
{
unsigned char key = 0;

// scan key ...
key = hal_ReadKey();

if ( key != 0 )
{
hal_PrintChar( key );
}
}

He then creates a header file (Remember - this is an example, not working code! )...
He creates hal.h ...

#ifndef _HAL_H_
#define _HAL_H_

unsigned char hal_ReadKey( void );
unsigned char hal_PrintChar( unsigned char pKey );

#endif // _HAL_H_

Now Mr A needs two separate source files, one for his 'Tianhe-2' system and another for his Amiga 500...

hal_A500.c

void hal_ReadKey( void )
{
// Amiga related code for reading KEYBOARD
}

void hal_PrintChar( unsigned char pKey )
{
// Amiga related code for printing to a shell...
}

hal_Tianhe2_VERYFAST.c

void hal_ReadKey( void )
{
// Tianhe-2 related code for reading KEYBOARD
}

void hal_PrintChar( unsigned char pKey )
{
// Tianhe-2 related code for printing to a shell...
}

Mr A then - when building for the Amiga - builds mrAmainApplication.c and hal_A500.c
When building for the Tianhe-2 - he uses hal_Tianhe2_VERYFAST.c instead of hal_A500.c

Right - I've written this example with some humour, this is not ear-marked at anyone, just I feel it makes the example more interesting and hopefully aids in understanding.

Neil

How to handle integer overflows with atomic increment and compare-and-swap operations?

You can build everything from compare-and-swap. The general algorithm is

int DoSomeAtomicCalculation()
{
static int value;
int capture, newValue;
do {
capture = value;
newValue = some_function_based_on(capture);
} while (!compare_and_swap(value, capture, newValue));
return newValue;
}

(I'm assuming that compare_and_swap takes a variable, a comparison value, and a swap value, and it returns true if the comparison succeeded (and the swap occurred).

UNIX Portable Atomic Operations

As of C11 there is an optional Atomic library which provides atomic operations. This is portable to whatever platform that has a C11 compiler (like gcc-4.9) with this optional feature.

The presence of the atomic can be checked with __STDC_NO_ATOMICS__and the presence of <stdatomic.h>

atomic.c

#include <stdio.h>
#include <stdlib.h>
#ifndef __STDC_NO_ATOMICS__
#include <stdatomic.h>
#endif

int main(int argc, char**argv) {
_Atomic int a;
atomic_init(&a, 42);
atomic_store(&a, 5);
int b = atomic_load(&a);
printf("b = %i\n", b);

return EXIT_SUCCESS;
}

Compiler invocations

clang -std=c11 atomic.c
gcc -std=c11 atomic.c

Atomic Operations in C on Linux

Well, nothing is there to stop you from using OSAtomic operations on other platforms. The sources for OSAtomic operations for ARM, x86 and PPC are a part of Apple's libc which is opensource. Just make sure you are not using OSSpinLock as that is specific to Mac OS X, but this can be easily replaced by Linux futexes.

See these:

http://opensource.apple.com/source/Libc/Libc-594.1.4/i386/sys/OSAtomic.s
http://opensource.apple.com/source/Libc/Libc-594.1.4/ppc/sys/OSAtomic.s
http://opensource.apple.com/source/Libc/Libc-594.1.4/arm/sys/OSAtomic.s

Alternatively, you can use the sync_* family, which I believe should work on most platforms, which I believe are described here: http://gcc.gnu.org/wiki/Atomic

Atomic operations - C

Atomic operations are very special and provide only limited support. Applying them to two variables sounds impossible for me.

Note, that it is even not garanteed that an atomic operation is really done with a resp. actomic operation (i.e. machine code command).

Out of the gcc doc. 5.47 Built-in functions for atomic memory access:

Not all operations are supported by all target processors. If a particular operation cannot be implemented on the target processor, a warning will be generated and a call an external function will be generated. The external function will carry the same name as the builtin, with an additional suffix '_n' where n is the size of the data type.

The external function probably emulates the atomic operation using a mutex.

But I guess, it would be possible with a "dirty hack" and only with certain limitations:

If 16 bit unsigned counters are sufficient, you could put two of them in one 32 bit variable where c1c2 += 0x00000001 increments one, c1c2 += 0x00010000 increments the other, and c1c2 += 0x00010001 increments both
or using the atomic operations:

/* combined counters c1 and c2 */
static uint32_t c1c2 = 0;

/* count c1 atomically */
__sync_fetch_and_add(&c1c2, 0x00000001);
/* count c2 atomically */
__sync_fetch_and_add(&c1c2, 0x00010000);
/* count c1 AND c2 atomically */
__sync_fetch_and_add(&c1c2, 0x00010001);

This has to be combined with the appropriate bit shifting and masking to access the invidiual counter values.

Of course, counter overflow could be an issue. The same might work for two 32 bit counters on a 64 bit platform (considering that atomic operations are usually only available for "machine word" width).

Btw. while googling for background info, I stumbled over this: Why does __sync_add_and_fetch work for a 64 bit variable on a 32 bit system?. I found the hint that atomic operations may require sufficient alignment of variables to work properly (which I found worth to mention).

This might be the reason why the C11 Atomic Library provides dedicated types for atomic variables (e.g. atomic_uint_least32_t).

C99 atomic load in baremetal portable library

Are you running on any systems that have uint32_t larger than a single assembly instruction word read/write size? If not, the IO to memory should be a single instructions and therefore atomic (assuming the bus is also word sized...) You get in trouble when the compiler breaks it up into multiple smaller read/writes. Otherwise, I've always had to resort to DI/EI. You could have the user configure your library such that it has information if atomic instructions or minimum 32-bit word size are available to prevent interrupt twiddling. If you have these guarantees, you don't need to verification code.

To answer the question though, on a system that must split the read/writes, your code is not safe. Imagine a case where you read your value in correctly in the "do" part, but the value gets split during the "while" part check. Further, in an extreme case, this is an infinite loop. For complete safety, you'd need a retry count and error condition to prevent that. The loop case is extreme for sure, but I'd want it just in case. That of course makes the run time longer.

Let's show a failure case for examples - will use 16-bit numbers on a machine that reads 8-bit values at a time to make it easier to follow:

  1. Value to read from memory *var is 0x1234
  2. Read 8-bit 0x12
  3. *var becomes 0x5678
  4. Read 8-bit 0x78 - value is now 0x1278 (invalid)
  5. *var becomes 0x1234
  6. Verification step reads 8-bit 0x12
  7. *var becomes 0x5678
  8. Verification reads 8-bit 0x78

Value confirmed correct 0x1278, but this is an error as *var was only 0x1234 and 0x5678.

Another failure case would be when *var just happens to change at the same frequency as your code is running, which could lead to an infinite loop as each verification fails. Or even if it did break out eventually, this would be a very hard to track performance bug.



Related Topics



Leave a reply



Submit