How to Implement GetThreadContext in Linux/Unix

How to retrieve the register information of the specified thread in Linux?

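The ptrace system call is the standard debugging facility on Linux, and it is what GDB uses to control and inspect the processes it debugs. GDB probably gets this information with the PTRACE_GETREGS or PTRACE_GETREGSET requests. ptrace addresses individual threads, so passing the thread ID of a stopped, traced thread retrieves that thread's registers.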

See the ptrace(2) man page.

How do you stop a thread and flush its registers into the stack?

I thought this was a very interesting question, so I dug into it a bit. It turns out that the HotSpot JVM uses a mechanism called "safepoints", which causes all the threads of the JVM to cooperatively stop themselves so that the GC can begin. In other words, the thread initiating GC doesn't forcibly stop the other threads; the other threads voluntarily suspend themselves by various clever mechanisms.

I don't believe the JVM scans registers, because a safepoint is defined such that all roots are known (I presume this means in memory).
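To make the cooperative idea concrete, here is a minimal sketch in C with pthreads. It is not how HotSpot implements safepoints; it only shows the general pattern of workers polling a flag and parking themselves. All names (safepoint_poll, run_one_safepoint, NUM_WORKERS) are made up for illustration.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

enum { NUM_WORKERS = 4 };                 /* hypothetical worker count */

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static atomic_bool safepoint_requested;   /* polled by every worker */
static atomic_bool shutting_down;
static int parked;                        /* parked workers; guarded by lock */

/* Workers call this at points where their state is consistent. */
static void safepoint_poll(void)
{
    if (!atomic_load(&safepoint_requested))
        return;                           /* fast path: a single load */
    pthread_mutex_lock(&lock);
    parked++;
    pthread_cond_broadcast(&cond);        /* tell the coordinator we parked */
    while (atomic_load(&safepoint_requested))
        pthread_cond_wait(&cond, &lock);  /* stay stopped until released */
    parked--;
    pthread_mutex_unlock(&lock);
}

static void *worker(void *arg)
{
    (void)arg;
    while (!atomic_load(&shutting_down))
        safepoint_poll();                 /* real work would go around this */
    return NULL;
}

/* Runs one stop-the-world pause and returns how many workers were parked. */
int run_one_safepoint(void)
{
    pthread_t threads[NUM_WORKERS];
    atomic_store(&shutting_down, false);
    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_create(&threads[i], NULL, worker, NULL);

    pthread_mutex_lock(&lock);
    atomic_store(&safepoint_requested, true);
    while (parked < NUM_WORKERS)          /* wait for everyone to stop */
        pthread_cond_wait(&cond, &lock);
    int stopped = parked;                 /* a collector could run here */
    atomic_store(&safepoint_requested, false);
    pthread_cond_broadcast(&cond);        /* release the workers */
    pthread_mutex_unlock(&lock);

    atomic_store(&shutting_down, true);
    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_join(threads[i], NULL);
    return stopped;
}
```

Calling run_one_safepoint() should return NUM_WORKERS once every worker has parked itself. Build with -pthread. Note that nothing is forcibly suspended: a worker that never reaches safepoint_poll() would stall the pause indefinitely, which is the price of the cooperative scheme.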

For more information see:

HotSpot Glossary -- which defines safepoints
safepoint.cpp -- the source in HotSpot that implements safepoints
A slide deck that describes safepoints in some detail (look 10 slides or so in)

Regarding your desire to "interrupt" all threads: according to the slide deck I referenced above, thread suspension is "unreliable on Solaris and Linux, e.g., spurious signals." I'm not sure which thread-suspension mechanism the slides are referring to.

Anti-debugging: gdb does not write 0xcc byte for breakpoints. Any idea why?

The second part is easily explained (as Flortify correctly stated):
GDB shows the original memory contents, not the breakpoint "bytes". In its default mode it even removes the breakpoints whenever the program stops and re-inserts them before resuming. Users typically want to see their code, not the modified instructions used for breakpoints.

In your C code you missed the breakpoint by a few bytes. GDB places the breakpoint after the function prologue, because the prologue is rarely what GDB users want to see. So if you set a breakpoint on foo, the actual breakpoint typically lands a few bytes past the function's entry point. How many bytes depends on the prologue, which varies by function: it may or may not have to save the stack pointer, frame pointer, and so on. This is easy to check. I used this code:

#include <stdio.h>
int main()
{
    int i,j;
    unsigned char *p = (unsigned char*)main;

    for (j=0; j<4; j++) {
        printf("%p: ",p);
        for (i=0; i<16; i++)
            printf("%.2x ", *p++);
        printf("\n");
    }
    return 0;
}

If we run this program by itself it prints:

0x40057d: 55 48 89 e5 48 83 ec 10 48 c7 45 f8 7d 05 40 00
0x40058d: c7 45 f4 00 00 00 00 eb 5a 48 8b 45 f8 48 89 c6
0x40059d: bf 84 06 40 00 b8 00 00 00 00 e8 b4 fe ff ff c7
0x4005ad: 45 f0 00 00 00 00 eb 27 48 8b 45 f8 48 8d 50 01

Now we run it in gdb (output re-formatted for SO).

(gdb) break main
Breakpoint 1 at 0x400585: file ../bp.c, line 6.
(gdb) info break
Num     Type           Disp Enb Address            What
1       breakpoint     keep y   0x0000000000400585 in main at ../bp.c:6
(gdb) disas/r main,+32
Dump of assembler code from 0x40057d to 0x40059d:
  0x000000000040057d (main+0):  55                        push %rbp
  0x000000000040057e (main+1):  48 89 e5                  mov %rsp,%rbp
  0x0000000000400581 (main+4):  48 83 ec 10               sub $0x10,%rsp
  0x0000000000400585 (main+8):  48 c7 45 f8 7d 05 40 00   movq $0x40057d,-0x8(%rbp)
  0x000000000040058d (main+16): c7 45 f4 00 00 00 00      movl $0x0,-0xc(%rbp)
  0x0000000000400594 (main+23): eb 5a                     jmp 0x4005f0 
  0x0000000000400596 (main+25): 48 8b 45 f8               mov -0x8(%rbp),%rax
  0x000000000040059a (main+29): 48 89 c6                  mov %rax,%rsi
End of assembler dump.

This verifies that the program prints the correct bytes. It also shows that the breakpoint was inserted at 0x400585 (that is, after the function prologue), not at the first instruction of the function.
If we now run the program under gdb (with run) and then "continue" after the breakpoint is hit, we get this output:

(gdb) cont
Continuing.
0x40057d: 55 48 89 e5 48 83 ec 10 cc c7 45 f8 7d 05 40 00 
0x40058d: c7 45 f4 00 00 00 00 eb 5a 48 8b 45 f8 48 89 c6 
0x40059d: bf 84 06 40 00 b8 00 00 00 00 e8 b4 fe ff ff c7 
0x4005ad: 45 f0 00 00 00 00 eb 27 48 8b 45 f8 48 8d 50 01

The output now shows 0xcc at 0x400585, which is 8 bytes into main (main+8) and exactly where GDB reported the breakpoint.
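For completeness, here is a small sketch of how a program could look for the breakpoint byte in its own code instead of eyeballing a hex dump. The function name find_int3 and the two sample arrays are mine; the arrays simply copy the first row of the two dumps above. Treat it as a heuristic only: 0xcc can also occur legitimately as an operand or padding byte, so a hit does not prove that a debugger is present.

```c
#include <stddef.h>

/* First 16 bytes of main as printed without a breakpoint ... */
static const unsigned char clean_row[16] = {
    0x55, 0x48, 0x89, 0xe5, 0x48, 0x83, 0xec, 0x10,
    0x48, 0xc7, 0x45, 0xf8, 0x7d, 0x05, 0x40, 0x00 };

/* ... and as printed after GDB inserted its breakpoint. */
static const unsigned char patched_row[16] = {
    0x55, 0x48, 0x89, 0xe5, 0x48, 0x83, 0xec, 0x10,
    0xcc, 0xc7, 0x45, 0xf8, 0x7d, 0x05, 0x40, 0x00 };

/* Return the zero-based offset of the first 0xCC (x86 int3) byte in
   buf[0..len), or -1 if there is none. */
long find_int3(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (buf[i] == 0xCC)
            return (long)i;
    return -1;
}
```

On the sample data, find_int3 returns -1 for clean_row and 8 for patched_row, that is, the ninth byte of main. To scan live code you would pass a function pointer cast to const unsigned char *, as the test program above does with main.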

How would a debugger running in Linux/Windows read the PC register on ARM32 & Aarch64?

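ptrace(2) has a PTRACE_GETREGS request that reads all the general-purpose registers of the tracee into a struct user_regs_struct, as defined in <sys/user.h>. For AArch64, this struct has an array of size 31 for registers x0 through x30, as well as separate fields for sp, pc and pstate. One caveat: PTRACE_GETREGS is architecture-specific, and as far as I know native AArch64 kernels only provide PTRACE_GETREGSET with NT_PRSTATUS, which fills the same struct through a struct iovec. With that caveat, you could do (untested):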

#include <sys/ptrace.h>
#include <sys/user.h>

struct user_regs_struct regs;
if (ptrace(PTRACE_GETREGS, pid, NULL, &regs) < 0)
    die();
printf("pc is %#llx\n", regs.pc);

For ARM32 it looks like the struct is called struct user_regs instead, which is just an array of 18 words named uregs. Entries 0 through 15 are r0 through r15 (r15 being pc), entry 16 is cpsr, and entry 17 is orig_r0 (see the ARM_cpsr and ARM_ORIG_r0 macros in the kernel's asm/ptrace.h). So you can probably do (very untested):

#include <sys/ptrace.h>
#include <sys/user.h>

struct user_regs regs;
if (ptrace(PTRACE_GETREGS, pid, NULL, &regs) < 0)
    die();
printf("pc is %#lx\n", regs.uregs[15]);
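A more portable variant is PTRACE_GETREGSET with NT_PRSTATUS, which takes a struct iovec and, to my knowledge, is the request that native AArch64 kernels actually implement. The sketch below is only lightly tested and makes a few assumptions: the helper names read_pc and demo_read_child_pc are mine, the register field names are the ones glibc's <sys/user.h> uses, and the demo traces a forked child rather than attaching to an existing thread.

```c
#include <elf.h>          /* NT_PRSTATUS */
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/uio.h>      /* struct iovec */
#include <sys/user.h>     /* struct user_regs_struct / struct user_regs */
#include <sys/wait.h>
#include <unistd.h>

/* Read the program counter of a ptrace-stopped thread.
   Returns 0 on success, -1 on failure. */
static int read_pc(pid_t tid, unsigned long long *pc)
{
#if defined(__arm__)
    struct user_regs regs;
#else
    struct user_regs_struct regs;
#endif
    struct iovec iov = { .iov_base = &regs, .iov_len = sizeof regs };

    if (ptrace(PTRACE_GETREGSET, tid, (void *)(unsigned long)NT_PRSTATUS, &iov) < 0)
        return -1;

#if defined(__x86_64__)
    *pc = regs.rip;
#elif defined(__i386__)
    *pc = (unsigned long)regs.eip;
#elif defined(__aarch64__)
    *pc = regs.pc;
#elif defined(__arm__)
    *pc = regs.uregs[15];                 /* r15 is the pc */
#else
#error "add the program-counter field for this architecture"
#endif
    return 0;
}

/* Demo: fork a child that stops itself, then read its pc.
   Returns 0 on success, -1 if tracing is not possible. */
int demo_read_child_pc(unsigned long long *pc)
{
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        /* Child: ask to be traced, then stop so the parent can inspect us. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) < 0)
            _exit(1);
        raise(SIGSTOP);
        _exit(0);
    }

    int status = 0;
    if (waitpid(child, &status, 0) != child)
        return -1;
    if (!WIFSTOPPED(status))
        return -1;                        /* child exited: tracing was refused */

    int rc = read_pc(child, pc);
    kill(child, SIGKILL);                 /* tear the tracee down */
    waitpid(child, NULL, 0);
    return rc;
}
```

A real debugger would use PTRACE_ATTACH or PTRACE_SEIZE on the target thread ID and wait for it to stop before calling something like read_pc. Tracing may also be refused outright in restricted environments such as some containers or hardened Yama settings, in which case the demo returns -1.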

Querying a Thread's Parameter using Context

There is no reason at all to expect that the EBX register of a thread will contain the parameter passed to the thread procedure. The EBX register is a general purpose register and will contain whatever the thread happens to have last put in that register to do whatever it happens to be doing.

How do I get the parameter back using the Thread's context?

You don't. The CONTEXT struct does not contain that information.

It is possible that you could find the stack frame associated with the call to TestThread. So long as the thread has not overwritten that part of the stack that was used to pass the parameter, it will contain the value you are interested in.

As it happens, in your code it seems to me that a thread that is created suspended will not actually have made it as far as the thread procedure. So even the method described above is not likely to work.

Getting a handle to the process's main thread

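#include <windows.h>
#include <tlhelp32.h>
#include <memory>
#include <stdexcept>

// Returns the ID of the first thread in the snapshot that belongs to the
// current process, or 0 if none is found. Treating that thread as the main
// thread is a heuristic, not something the snapshot order guarantees.
DWORD GetMainThreadId () {
    // shared_ptr with CloseHandle as deleter releases the snapshot on every path.
    const std::shared_ptr<void> hThreadSnapshot(
        CreateToolhelp32Snapshot(TH32CS_SNAPTHREAD, 0), CloseHandle);
    if (hThreadSnapshot.get() == INVALID_HANDLE_VALUE) {
        throw std::runtime_error("GetMainThreadId failed");
    }
    THREADENTRY32 tEntry;
    tEntry.dwSize = sizeof(THREADENTRY32);  // must be set before Thread32First
    DWORD result = 0;
    DWORD currentPID = GetCurrentProcessId();
    // The snapshot lists threads of every process; stop at our first one.
    for (BOOL success = Thread32First(hThreadSnapshot.get(), &tEntry);
        !result && success && GetLastError() != ERROR_NO_MORE_FILES;
        success = Thread32Next(hThreadSnapshot.get(), &tEntry))
    {
        if (tEntry.th32OwnerProcessID == currentPID) {
            result = tEntry.th32ThreadID;
        }
    }
    return result;
}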