How to Share a Register Between Threads

Is it possible to share a register between threads?

Your question seems reasonable at first glance, and other people have tried to answer it directly. But first, we have two fairly nebulous concepts:

Threads
Registers

If you talk to Ada folks, they will freak out at how loosely Linux or POSIX threads are defined. They prefer something more like Java's green threads, with very deterministic scheduling. I think you mean threads that are cheap for the processor to run, like POSIX threads.

The second issue is: what is a register? To most people, registers are the 8, 16 or 32 registers that are hard-coded in the CPU's instruction set. There are often second-class registers that can be accessed by other means. Mainly, they are amazingly fast.

The inverse

The inverse of your question is quite common: how do you set a register to a different value for each thread? The general purpose registers are used by the compiler, and the OS context switch code is intimately familiar with the compiler's ABI. What may not be clear is that things like the upper bits of a stack register may be constant every time a thread runs, but are different for each thread. That is to say, each thread has its own stack.

With ARM Linux, a special co-processor register is used to implement thread local storage. The co-processor register is slower to access than a general purpose register, but it is still quite fast. That takes us to the difference between a process and a thread.
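
To make the per-thread idea concrete, here is a minimal sketch using C11 `_Thread_local` with POSIX threads. The names `per_thread_value`, `tls_worker` and `tls_demo` are made up for illustration. How the compiler and OS locate each thread's copy (a co-processor register on ARM Linux, something else on other platforms) is hidden behind the keyword.

```c
#include <pthread.h>
#include <stddef.h>

/* One copy of this variable exists per thread (C11 thread-local storage). */
static _Thread_local int per_thread_value = 0;

/* Each worker stores its id in its own copy and reports a derived value. */
static void *tls_worker(void *arg)
{
    int *slot = arg;
    per_thread_value = *slot;        /* touches only this thread's copy */
    *slot = per_thread_value * 10;   /* report back through the caller's slot */
    return NULL;
}

/* Hypothetical helper: returns 1 if every thread kept its own value. */
static int tls_demo(void)
{
    pthread_t t[2];
    int slots[2] = {1, 2};

    per_thread_value = 99;           /* the calling thread's copy */
    for (int i = 0; i < 2; i++)
        if (pthread_create(&t[i], NULL, tls_worker, &slots[i]) != 0)
            return 0;
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);

    /* The workers never saw or changed the caller's copy. */
    return slots[0] == 10 && slots[1] == 20 && per_thread_value == 99;
}
```

Calling `tls_demo()` from any thread should return 1, since the two workers and the caller each write to a separate instance of the variable.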

Endemic to threads

A process has a completely different memory layout. That is, the MMU page tables switch for different processes. For a thread, the register set may be different, but all of regular memory is shared between threads. For this reason, there are lots of mutexes when you do thread programming.

Now, consider a CPU cache. It is ultra-fast memory, much like a general purpose register. From the program's point of view, the main difference is the number of instructions it takes to address it.

Answer

All OSes and CPUs already have this! Each thread shares memory, and that memory is cached. Loading a global variable in two threads from cache is nearly as fast as register access. As the thread register you propose can only hold a pointer, you would need to dereference it to access some larger entity. Loading a global variable will be nearly as fast, and the compiler is free to put it in any register it likes. The compiler is also free to use those registers in routines that don't need this access. So even if there were an OS that reserved a general purpose register to be the same between threads, it would only be faster for a very small set of applications.
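
As a rough sketch of that answer, the following uses an ordinary global as the "shared register", guarded by a mutex. The names `shared_counter`, `counter_worker` and `shared_counter_demo` are hypothetical, and the limit of 8 threads is an arbitrary choice for the example.

```c
#include <pthread.h>

static long shared_counter = 0;   /* ordinary global: visible to all threads */
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Each worker adds `*arg` increments to the shared global. */
static void *counter_worker(void *arg)
{
    long n = *(long *)arg;
    for (long i = 0; i < n; i++) {
        pthread_mutex_lock(&counter_lock);
        shared_counter++;         /* the compiler picks a register for this */
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Hypothetical helper: runs up to 8 threads and returns the final count,
   or -1 if the arguments are out of range or a thread could not start. */
static long shared_counter_demo(int nthreads, long increments)
{
    pthread_t t[8];
    if (nthreads < 1 || nthreads > 8)
        return -1;

    shared_counter = 0;
    for (int i = 0; i < nthreads; i++)
        if (pthread_create(&t[i], NULL, counter_worker, &increments) != 0)
            return -1;
    for (int i = 0; i < nthreads; i++)
        pthread_join(t[i], NULL);
    return shared_counter;
}
```

With the mutex in place, `shared_counter_demo(4, 1000)` should come back as 4000. The value lives in memory (and cache), and each thread only borrows a register for it while it is doing the increment.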

How are registers shared among threads?

Registers are used by the CPU while it is running a particular thread. When the OS decides to switch from one thread to another, it saves the current values of all the registers into a private memory area specific to the first thread. Before the second thread starts running, the OS loads the registers with the values from the second thread's saved area. This is called a context switch.

Do threads of a process share the same register set, or does only the currently running thread have the whole register set dedicated to it?

Threads are oblivious to those things.

During a context switch, the state of the CPU registers is saved and restored the same way whether the threads of execution are within the same process or in different processes.

A process always has the register set dedicated to it

False. It is the same set of registers for all processes. Each process just thinks the registers are dedicated to it.

Does the thread have exclusive access to the entire register set

Yes

What resources are shared between threads?

You're pretty much correct, but threads share all segments except the stack. Threads have independent call stacks, however the memory in other thread stacks is still accessible and in theory you could hold a pointer to memory in some other thread's local stack frame (though you probably should find a better place to put that memory!).
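
A small sketch of that last point, with made-up names (`stack_writer`, `stack_share_demo`): one thread hands out a pointer to a local variable, and another thread writes through it. It only works safely here because the owner joins the writer before its stack frame goes away.

```c
#include <pthread.h>
#include <stddef.h>

/* Runs on the second thread and writes into the first thread's stack. */
static void *stack_writer(void *arg)
{
    int *remote = arg;   /* points into the creating thread's stack frame */
    *remote = 42;
    return NULL;
}

/* Hypothetical helper: returns the value the other thread wrote,
   or -1 if the thread could not be created. */
static int stack_share_demo(void)
{
    int local = 0;       /* lives on the calling thread's stack */
    pthread_t t;

    if (pthread_create(&t, NULL, stack_writer, &local) != 0)
        return -1;
    pthread_join(t, NULL);   /* join before `local` goes out of scope */
    return local;
}
```

Here `stack_share_demo()` should return 42. If the function returned before the join, the writer would be scribbling on a dead stack frame, which is why a better home for such memory is usually a good idea.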

Is there a way to view the register contents of one thread from another thread within the same process?

You might be able to work around this using a signal. Pick an otherwise unused signal, e.g. SIGUSR1, and install a signal handler for it using the sa_sigaction member of struct sigaction, specifying the SA_SIGINFO flag. Block the signal in every thread except the thread of interest (thread B).

When you want to examine thread B, send a thread-directed signal to it using pthread_kill(). The signal handler will then fire, and its third argument will be a pointer to a ucontext_t structure. The uc_mcontext member of this structure is a machine-dependent mcontext_t structure, which will contain the register values at the point that the thread was interrupted.

You then just need to devise a safe way to pass these values back to thread A.

CUDA Block-level Shared Registers

Generally, no.

You have no mechanism to force or direct the compiler to place items in registers. For example the C++ register keyword does not do this. Register usage is under control of the compiler.
Even if you could, registers are not "shared" the way shared memory is or can be, i.e. visible to multiple threads in the warp/threadblock. Registers, once allocated to the thread that owns them, are visible to that thread only. They cannot become visible to any other thread in the warp or threadblock.

If you simply had a need for per-thread local storage, that is not shared among threads, and your usage met a number of other requirements, you might be able to "coax" the compiler into placing those items in registers. This forum thread offers an example on how it may be done, and what it looks like. Again, once the compiler sees its way clear to do so, and decides that there is good reason to do so (i.e. a "belief" of possible performance improvement) the compiler will automatically place items in registers. You don't have direct control over it.