Why Can't I Directly Compare Two Thread IDs Instead of Using pthread_equal?

Is there an invalid pthread_t id?

Your assumption is incorrect to start with. pthread_t is an opaque type: it may be an integer, a pointer, or a structure, so you cannot portably compare pthread_t values with == in C. Use pthread_equal instead.

Another consideration is that if pthread_create fails, the contents of your pthread_t will be undefined. It may not be set to your invalid value any more.

My preference is to keep the return values of the pthread_create calls (along with the thread IDs) and use that to determine whether each thread was started correctly.
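As a minimal sketch of that approach, each pthread_create result can be stored next to the pthread_t it was meant to fill in, and only the result is consulted to decide whether the thread exists. The struct worker type, worker_main, NUM_WORKERS, and the two helper functions are hypothetical names chosen for this illustration.

```c
#include <pthread.h>
#include <stddef.h>

#define NUM_WORKERS 4   /* hypothetical worker count for this sketch */

/* Pairs each thread ID with the result of the pthread_create call that
 * was meant to fill it in. The tid field is meaningful only when
 * create_rc is 0. */
struct worker {
    pthread_t tid;
    int create_rc;
};

/* Hypothetical thread body: does nothing and exits. */
static void *worker_main(void *arg)
{
    (void)arg;
    return NULL;
}

/* Tries to start n workers and returns how many actually started. */
static size_t start_workers(struct worker *w, size_t n)
{
    size_t started = 0;
    for (size_t i = 0; i < n; i++) {
        w[i].create_rc = pthread_create(&w[i].tid, NULL, worker_main, NULL);
        if (w[i].create_rc == 0)
            started++;
    }
    return started;
}

/* Joins only the workers whose creation succeeded; returns the count. */
static size_t join_workers(struct worker *w, size_t n)
{
    size_t joined = 0;
    for (size_t i = 0; i < n; i++) {
        if (w[i].create_rc == 0 && pthread_join(w[i].tid, NULL) == 0)
            joined++;
    }
    return joined;
}
```

A caller would declare struct worker w[NUM_WORKERS], call start_workers(w, NUM_WORKERS), and later call join_workers(w, NUM_WORKERS). No sentinel "invalid" pthread_t value is needed anywhere.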

Why does pthread_join not take a thread pointer?

It takes a copy of the pthread_t.

Yes.

I just don't get why it doesn't take
a pointer to that already existing thread,

pthread_t is a thread identifier. The specs consistently refer to it that way. Copying it does not duplicate the thread itself, nor consume more memory than one pthread_t occupies.
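The point can be illustrated with a small sketch: a copied pthread_t names the same thread as the original, so joining through the copy works. The names echo_arg and join_via_copy are hypothetical and exist only for this example.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical thread body: returns its argument unchanged. */
static void *echo_arg(void *arg)
{
    return arg;
}

/* Creates a thread, copies its identifier, and joins through the copy.
 * Returns 0 on success, otherwise the error number from pthreads. */
static int join_via_copy(intptr_t payload, intptr_t *result)
{
    pthread_t original;
    int rc = pthread_create(&original, NULL, echo_arg, (void *)payload);
    if (rc != 0)
        return rc;

    pthread_t copy = original;      /* copies the identifier, not the thread */

    void *ret = NULL;
    rc = pthread_join(copy, &ret);  /* joins the one thread both values name */
    if (rc == 0)
        *result = (intptr_t)ret;
    return rc;
}
```

Calling join_via_copy(42, &out) should leave 42 in out: only one thread ever existed, even though two pthread_t objects referred to it.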

rather than copying it and
passing it over; it seems like if anything, doing so would cause more
problems and more memory use.

It does not necessarily cause more memory use, as a pthread_t is not necessarily larger than a pointer. It might be a pointer, or an integer. Even if it is a structure, however, there is no reason to think that it is so large that passing it by value presents a significant problem, because the specifics are under the control of the pthreads implementation. Why would implementers shoot themselves in the foot that way? Note well that passing a structure by value is not inherently less efficient than passing a pointer.

As for problems other than excessive memory use, you would have to be more specific, but I don't see any issues inherent in accessing a copy of a thread identifier directly vs. accessing a common identifier object indirectly, for the purposes of those functions that accept a pthread_t by value.

Am I missing something?

I suspect that your concerns are tied up in a misunderstanding of type pthread_t as somehow carrying data supporting thread operation as opposed to simply identifying a thread.

You may also be supposing that pthreads is a library, with a particular implementation, whereas in fact, it is first and foremost a specification, designed to afford multiple implementations. This is part of the reason for defining abstract data type pthread_t instead of specifying int or struct something * -- implementations can choose what actual type to use.

Perhaps you are also focusing too closely on the API functions. Even if in some particular implementation, passing a pthread_t by value to, say, pthread_join() were less efficient than passing a pointer to one, how much of an impact do you suppose that would actually have? pthread_join() is called infrequently, and only in cases where the caller is prepared to block. What does it matter if argument passing consumes a few more nanoseconds than it might otherwise do?

I read the
manpage, it doesn't seem to offer a reason to this.

Few manual pages provide rationale for function design, but I think the most likely explanation is essentially that form follows function. Those functions that receive a pthread_t by value do so because they do not need or want to modify the caller's value. The functions' designs reflect that.

Linux thread id comparison

If you add, remove, or check elements in my_threads from different threads, everything could get wild.
From your fragment of code I suspect you don't have mutex protection for this structure.

If you have not implemented locking and you read the list more often than you write to it, consider the pthread_rwlock_* interface.

UPDATE: Also, could you please check sizeof(pthread_t) on your platform? If it is 8 (for example, an unsigned long), you should at least use the %lu format in printf.

The thread ID returned by pthread_self() is not the same thing as the kernel thread ID returned by a call to gettid(2)

So, on what basis should I decide whether I should use pthread_self or
gettid to determine which thread is running the function?

You should always use pthread_self() whenever you want to identify a thread within your application. gettid() can be used for certain purposes, provided you know you are on Linux. For example, gettid() can be used to derive a thread-specific seed (to pass to srand()).

Both are non portable.

This is not entirely true. gettid() is not portable, as it's a Linux-specific function. But pthread_self() is portable as long as you don't make any assumptions about its representation.

For example, the following is not portable.

printf("Thread ID is: %ld", (long) pthread_self());

as there's no guarantee that whatever pthread_self() returns is going to be an integer of some sort. But

pthread_t my_tid; //filled elsewhere

pthread_t tid = pthread_self();

if( pthread_equal(my_tid, tid) ) {
   /* do stuff */
}

is fully portable.

The former is not portable because it assumes that thread id is an integer whereas the latter is not.
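If a thread ID has to appear in a log line, one commonly suggested workaround is to print the bytes of the pthread_t object instead of casting it to an integer. The sketch below assumes that approach; format_thread_id is a hypothetical helper, and its output is meant for display only, because a pthread_t that is a structure with padding could format differently for equal IDs.

```c
#include <pthread.h>
#include <stdio.h>

/* Writes the object representation of *tid into buf as hex digits and
 * returns the number of characters written (excluding the terminator),
 * or 0 if buf is too small. For logging only: use pthread_equal, not
 * string comparison, to decide whether two IDs name the same thread. */
static size_t format_thread_id(const pthread_t *tid, char *buf, size_t buflen)
{
    const unsigned char *bytes = (const unsigned char *)tid;
    size_t needed = 2 * sizeof *tid + 1;   /* two hex digits per byte + NUL */

    if (buflen < needed)
        return 0;
    for (size_t i = 0; i < sizeof *tid; i++)
        snprintf(buf + 2 * i, 3, "%02x", (unsigned)bytes[i]);
    return needed - 1;
}
```

A caller would pass the address of a pthread_t and a buffer of at least 2 * sizeof(pthread_t) + 1 bytes, then print the buffer with %s.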

Why are there two different functions to get the thread ID?

They are not two different ways to get the same value. One (pthread_self()) is provided by the thread library (pthreads), while the other (gettid()) is an OS-specific function. A different OS may provide a different interface or syscall, similar to gettid(), to get the thread ID. So you can't rely on gettid() in a portable application.
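The difference can be seen in a small Linux-only sketch. It reads the kernel thread ID through syscall(SYS_gettid), since older glibc versions provide no gettid() wrapper; kernel_tid, record_kernel_tid, and helper_kernel_tid are hypothetical names used only here.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Linux only: asks the kernel for the calling thread's ID. */
static pid_t kernel_tid(void)
{
    return (pid_t)syscall(SYS_gettid);
}

/* Thread body: stores the kernel thread ID it observes into *arg. */
static void *record_kernel_tid(void *arg)
{
    *(pid_t *)arg = kernel_tid();
    return NULL;
}

/* Runs one helper thread and reports its kernel thread ID.
 * Returns 0 on success, otherwise a pthreads error number. */
static int helper_kernel_tid(pid_t *out)
{
    pthread_t helper;
    int rc = pthread_create(&helper, NULL, record_kernel_tid, out);
    if (rc != 0)
        return rc;
    return pthread_join(helper, NULL);
}
```

On Linux, kernel_tid() called from the main thread should equal getpid(), while a helper thread reports a different number. The pthread_t values for those same threads are a separate set of identifiers and are only meaningful to the pthreads functions.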

A Unique and Constant Identifier for a pthreads thread?

You cannot rely on a pthread_t being unique, but you can use pthread_equal() to determine whether two thread ids refer to the same thread.

NAME
     pthread_equal -- compare thread IDs

SYNOPSIS
     #include <pthread.h>

     int
     pthread_equal(pthread_t t1, pthread_t t2);

DESCRIPTION
     The pthread_equal() function compares the thread IDs t1 and t2.

RETURN VALUES
     The pthread_equal() function will return non-zero if the thread IDs t1 and t2
     correspond to the same thread. Otherwise, it will return zero.

Is there a safe method to check if a pthread exists?

... can I disable the ability for thread ids to be reused at runtime?

No, you can't.
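Since reuse cannot be switched off, one possible workaround (an assumption of this sketch, not something pthreads provides) is for the application to hand out its own numbers. The example uses a C11 atomic counter plus thread-local storage; logical_thread_id, report_id, and spawn_and_get_id are hypothetical names.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

/* Process-wide counter; 0 is reserved to mean "not assigned yet". */
static atomic_uint_fast64_t next_logical_id = 1;

/* Each thread caches its own number here (C11 thread-local storage). */
static _Thread_local uint64_t my_logical_id = 0;

/* Returns a number that no other thread in this process has received,
 * even if the pthread_t value of a finished thread is later reused. */
static uint64_t logical_thread_id(void)
{
    if (my_logical_id == 0)
        my_logical_id = atomic_fetch_add(&next_logical_id, 1);
    return my_logical_id;
}

/* Thread body: stores the calling thread's logical ID into *arg. */
static void *report_id(void *arg)
{
    *(uint64_t *)arg = logical_thread_id();
    return NULL;
}

/* Runs one short-lived thread and reports the logical ID it obtained.
 * Returns 0 on success, otherwise a pthreads error number. */
static int spawn_and_get_id(uint64_t *out)
{
    pthread_t t;
    int rc = pthread_create(&t, NULL, report_id, out);
    if (rc != 0)
        return rc;
    return pthread_join(t, NULL);
}
```

Two threads started one after the other may well receive the same pthread_t value, but they would get different logical IDs from this counter.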

Should not std::thread::id default constructor create a NULL id?

UPDATE: Jonathan Wakely kindly looked at the issue and he says (below in comments) that -pthread has to be passed to both the compiler and the linker. If I do that, the code does not fail with gcc 4.7.2 either. So the answer apparently has nothing to do with the quoted e-mail. Thanks Jonathan!

~~Here are some quotes straight from the gcc developer Jonathan Wakely's mail, written in 2011:~~

All the comparison operators on our std::thread::id rely on undefined
behaviour because our thread::id is just a pthread_t.

[...]

2) operator== uses pthread_equal, which is undefined for invalid
thread IDs, POSIX says:
   If either t1 or t2 are not valid thread IDs, the behavior is undefined.

Although it was written two years ago, it probably still applies.

At the moment I cannot check the gcc codebase to say more.

Weird. The following code:

#include <iostream>
#include <thread>

int main() {

    std::cout << "Started" << std::endl;

    std::thread::id nobody;

    if ( nobody != std::this_thread::get_id() )  {

      std::cout << "OK" << std::endl;
    }

    std::cout << "Finished" << std::endl;
}

produces:

Started 
OK 
Finished

However, your code does fail with gcc 4.7.2.