How Is Pthread_Join Implemented

How is pthread_join implemented?

Yes, that's the general idea. For the gory details of a particular implementation, take a look at glibc.

Why does pthread_join not take a thread pointer?

It takes a copy of the pthread_t.

Yes.

I just don't get why it doesn't take
a pointer to that already existing thread,

pthread_t is a thread identifier. The specs consistently refer to it that way. Copying it does not duplicate the thread itself, nor consume more memory than one pthread_t occupies.

rather than copying it and
passing it over; it seems like if anything, doing so would cause more
problems and more memory use.

It does not necessarily cause more memory use, as a pthread_t is not necessarily larger than a pointer. It might be a pointer, or an integer. Even if it is a structure, however, there is no reason to think that it is so large that passing it by value presents a significant problem, because the specifics are under the control of the pthreads implementation. Why would implementers shoot themselves in the foot that way? Note well that passing a structure by value is not inherently less efficient than passing a pointer.

As for problems other than excessive memory use, you would have to be more specific, but I don't see any issues inherent in accessing a copy of a thread identifier directly vs. accessing a common identifier object indirectly, for the purposes of those functions that accept a pthread_t by value.

Am I missing something?

I suspect that your concerns are tied up in a misunderstanding of type pthread_t as somehow carrying data supporting thread operation as opposed to simply identifying a thread.

You may also be supposing that pthreads is a library, with a particular implementation, whereas in fact it is first and foremost a specification, designed to afford multiple implementations. This is part of the reason for defining the abstract data type pthread_t instead of specifying int or struct something * -- implementations can choose what actual type to use.

Perhaps you are also focusing too closely on the API functions. Even if in some particular implementation, passing a pthread_t by value to, say, pthread_join() were less efficient than passing a pointer to one, how much of an impact do you suppose that would actually have? pthread_join() is called infrequently, and only in cases where the caller is prepared to block. What does it matter if argument passing consumes a few more nanoseconds than it might otherwise do?

I read the
manpage, it doesn't seem to offer a reason to this.

Few manual pages provide rationale for function design, but I think the most likely explanation is essentially that form follows function. Those functions that receive a pthread_t by value do so because they do not need or want to modify the caller's value. The functions' designs reflect that.
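To make the by-value point concrete, here is a minimal sketch (not taken from any particular implementation). The names echo_arg, join_by_value and copy_identifies_same_thread are hypothetical, invented for this illustration. It contrasts pthread_create(), which takes a pthread_t * because it has to store the new identifier somewhere, with pthread_join() and pthread_equal(), which only read the identifier and therefore take it by value.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical worker: hands its argument straight back as the exit value. */
static void *echo_arg(void *arg) {
    return arg;
}

/* Receives the identifier by value, just as pthread_join() itself does. */
static void *join_by_value(pthread_t id) {
    void *result = NULL;
    if (pthread_join(id, &result) != 0)
        return NULL;
    return result;
}

/* Returns 1 if a copy of the identifier names the same thread, else 0. */
static int copy_identifies_same_thread(void) {
    static int token = 7;
    pthread_t original;

    /* pthread_create() needs a pointer: it writes the new ID through it. */
    if (pthread_create(&original, NULL, echo_arg, &token) != 0)
        return 0;

    /* Plain assignment copies the identifier, not the thread. */
    pthread_t copy = original;

    /* pthread_equal() is the portable comparison; pthread_t may be a struct,
       in which case == would not even compile. */
    int same = pthread_equal(original, copy) != 0;

    /* Join exactly once, through the copy. */
    void *result = join_by_value(copy);
    return same && result == &token;
}
```

Calling copy_identifies_same_thread() from main() should return 1 on a working pthreads implementation: there is still only one thread, no matter how many copies of its identifier exist.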

Code of the function pthread_join - Pthread Library

There is no single "the code"; there are many different implementations, some of which build on one another. For example:

The GNU C Library (glibc), the usual C library on Linux regardless of which compiler you use
Google's implementation for Android
Apple's implementation

Why is retval a void** in pthread_join?

It's because you are supposed to supply the address of a void* to pthread_join.
pthread_join will then write the pointer that the thread passed to pthread_exit(void*) (or returned from its start routine) into the variable whose address you supplied.

Example scenario:

typedef struct {
    // members
} input_data;

typedef struct {
    // members
} output_data;

Starting thread side:

input_data id;
pthread_create(..., start_routine, &id);

void* start_routine(void *ptr) {
    input_data *id = ptr;
    output_data *od = malloc(sizeof *od);
    // use the input data `id`, populate the output data `od`.
    pthread_exit(od);
}

Joining side:

void *ret;
pthread_join(thread, &ret);  // `thread` is the pthread_t that pthread_create filled in
output_data *od = ret;
// use `od`
free(od);
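For reference, the fragments above can be assembled into one compilable sketch. The struct members (a, b, sum) and the helper add_in_thread are hypothetical placeholders chosen for this illustration; the original leaves the members unspecified.

```c
#include <pthread.h>
#include <stdlib.h>

typedef struct { int a; int b; } input_data;   /* hypothetical members */
typedef struct { int sum; } output_data;       /* hypothetical member */

static void *start_routine(void *ptr) {
    input_data *id = ptr;
    output_data *od = malloc(sizeof *od);
    if (od != NULL)
        od->sum = id->a + id->b;
    /* Returning od has the same effect as calling pthread_exit(od). */
    return od;
}

/* Computes a + b in another thread; returns -1 if anything fails. */
static int add_in_thread(int a, int b) {
    input_data id = { a, b };
    pthread_t tid;

    if (pthread_create(&tid, NULL, start_routine, &id) != 0)
        return -1;

    /* pthread_join() stores the thread's exit pointer in ret,
       which is why it needs the address of a void *. */
    void *ret = NULL;
    if (pthread_join(tid, &ret) != 0 || ret == NULL)
        return -1;

    output_data *od = ret;
    int sum = od->sum;
    free(od);   /* the joining side owns the result and releases it */
    return sum;
}
```

Note that id lives on the stack of add_in_thread(), which is safe here only because that function does not return until the join has completed.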

When to use pthread_exit() and when to use pthread_join() in Linux?

As explained in the openpub documentation,

pthread_exit() will exit the thread that calls it.

In your case, since main calls it, the main thread will terminate while your spawned threads continue to execute. This is mostly used in cases where the
main thread is only required to spawn threads and leave the threads to do their job.

pthread_join()
will suspend execution of the thread that has called it until the target thread terminates.

This is useful in cases when you want to wait for one or more threads to terminate before further
processing in the main thread.
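As a rough illustration of the "wait before further processing" case, the sketch below starts a few workers and joins each one before using its result. The names square, sum_of_squares and WORKER_COUNT are hypothetical, and passing small integers through the void * argument via intptr_t is a common convenience rather than something the answer above prescribes.

```c
#include <pthread.h>
#include <stdint.h>

enum { WORKER_COUNT = 4 };   /* hypothetical number of workers */

/* Hypothetical worker: squares the small integer smuggled in through arg. */
static void *square(void *arg) {
    intptr_t n = (intptr_t)arg;
    return (void *)(n * n);
}

/* Starts the workers, waits for every one of them, then adds up the results. */
static long sum_of_squares(void) {
    pthread_t workers[WORKER_COUNT];
    int started = 0;
    long total = 0;

    for (int i = 0; i < WORKER_COUNT; i++) {
        if (pthread_create(&workers[i], NULL, square,
                           (void *)(intptr_t)(i + 1)) != 0)
            break;
        started++;
    }

    /* pthread_join() blocks until workers[i] has terminated, so total is
       only computed from results that are actually complete. */
    for (int i = 0; i < started; i++) {
        void *ret = NULL;
        if (pthread_join(workers[i], &ret) == 0)
            total += (long)(intptr_t)ret;
    }
    return total;
}
```

If main() instead ended with pthread_exit(NULL) right after the creation loop, the workers would still run to completion, but nothing would be left to collect their return values.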

Pthread_join of one of a number of threads

The most obvious approach, without restructuring your code as aix suggests, is to have each thread set something to indicate that it has finished (probably a value in an array shared between all threads, one slot per worker thread), and then signal a condition variable. The main thread waits on the condition variable and, each time it wakes up, handles all threads that have indicated themselves finished: there may be more than one.

Of course, that means that if a thread is cancelled, the main thread never gets signalled for it, so either install a cancellation cleanup handler (pthread_cleanup_push) that does the signalling, or don't cancel the threads.
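The scheme described above might look roughly like the following sketch. All names (finished, finished_cv, slot_ids, worker, run_and_reap, NUM_WORKERS) are hypothetical, cancellation is deliberately not handled, and a creation failure is reported by returning -1 without cleaning up the threads already started.

```c
#include <pthread.h>

enum { NUM_WORKERS = 3 };   /* hypothetical worker count */

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t finished_cv = PTHREAD_COND_INITIALIZER;
static int finished[NUM_WORKERS];   /* one slot per worker, guarded by lock */
static int slot_ids[NUM_WORKERS];   /* stable storage for each worker's index */

static void *worker(void *arg) {
    int slot = *(int *)arg;
    /* ... the worker's real job would run here ... */
    pthread_mutex_lock(&lock);
    finished[slot] = 1;                  /* mark this slot as done */
    pthread_cond_signal(&finished_cv);   /* wake the waiting main thread */
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Starts the workers and joins each one as soon as it reports completion.
   Returns the number of threads joined, or -1 if a thread could not start. */
static int run_and_reap(void) {
    pthread_t tids[NUM_WORKERS];
    int reaped[NUM_WORKERS] = {0};
    int joined = 0;

    for (int i = 0; i < NUM_WORKERS; i++) {
        finished[i] = 0;
        slot_ids[i] = i;
    }
    for (int i = 0; i < NUM_WORKERS; i++) {
        if (pthread_create(&tids[i], NULL, worker, &slot_ids[i]) != 0)
            return -1;
    }

    pthread_mutex_lock(&lock);
    while (joined < NUM_WORKERS) {
        int reaped_now = 0;
        /* One wakeup may cover several finished threads, so scan all slots. */
        for (int i = 0; i < NUM_WORKERS; i++) {
            if (finished[i] && !reaped[i]) {
                /* Worker i released the lock after setting its flag,
                   so joining it while holding the lock cannot deadlock. */
                pthread_join(tids[i], NULL);
                reaped[i] = 1;
                joined++;
                reaped_now++;
            }
        }
        /* Sleep only if the scan found nothing new; the flags are rechecked
           after every wakeup, which also covers spurious wakeups. */
        if (joined < NUM_WORKERS && reaped_now == 0)
            pthread_cond_wait(&finished_cv, &lock);
    }
    pthread_mutex_unlock(&lock);
    return joined;
}
```

Because the flags are checked under the mutex before waiting, a worker that finishes before the main thread reaches pthread_cond_wait() is still picked up on the first scan.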
