Core Dump of Multithreaded Application Shows Only One Thread

It turned out to be a kernel bug in the default Red Hat Enterprise Linux 5.3 kernel, fixed in the later 5.4 release (kernel-2.6.18-164.el5).

http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5/html-single/5.4_Technical_Notes/index.html

1.110.1. RHSA-2009:1193: Important security and bug fix update

On 32-bit systems, core dumps for some multithreaded applications did not include all thread information. (BZ#505322)

https://bugzilla.redhat.com/show_bug.cgi?id=505322

core dump in a multithread program

This line is wrong:

printf("Thread %d starting...\n",*tid);

You can achieve what you want with:

printf("Thread %d starting...\n",(int) t);

or
printf("Thread %d starting...\n", tid);
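The question's code is not reproduced here, so the following is only a sketch, assuming the common pattern where the loop index t is passed by value inside the void * argument. The names worker, run_workers, seen and NUM_THREADS are made up for illustration. It shows why the argument has to be cast back rather than dereferenced:

```c
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

#define NUM_THREADS 4           /* assumed thread count for this sketch */

int seen[NUM_THREADS];          /* records which ids the workers received */

/* The id travels inside the pointer value itself, so it is cast back,
 * never dereferenced: dereferencing would read memory at address 0, 1, ... */
void *worker(void *arg)
{
        int tid = (int)(intptr_t)arg;

        printf("Thread %d starting...\n", tid);
        seen[tid] = 1;          /* each worker writes only its own slot */
        return NULL;
}

/* Starts NUM_THREADS workers and waits for them; returns 0 on success. */
int run_workers(void)
{
        pthread_t threads[NUM_THREADS];
        int started = 0;
        int rc = 0;
        int t;

        for (t = 0; t < NUM_THREADS; t++) {
                if (pthread_create(&threads[t], NULL, worker,
                                   (void *)(intptr_t)t) != 0) {
                        rc = -1;
                        break;
                }
                started++;
        }
        for (t = 0; t < started; t++)
                pthread_join(threads[t], NULL);
        return rc;
}
```

Calling run_workers() from main prints one "Thread N starting..." line per worker, in whatever order the scheduler picks. Going through intptr_t avoids the size-mismatch warning that a direct cast between int and a pointer gives on 64-bit systems.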

When using a coredump in gdb how do I know exactly which thread caused SIGSEGV?

When you open the core dump file in gdb, gdb stops at the function that caused the crash, and the current thread is the culprit. Take the following program as an example:

#include <stdio.h>
#include <pthread.h>
#include <unistd.h>     /* sleep() */
void *thread_func(void *p_arg)
{
        while (1) {
                printf("%s\n", (char*)p_arg);   /* a.c:7: p_arg is NULL in t2 */
                sleep(10);
        }
}
int main(void)
{
        pthread_t t1, t2;

        pthread_create(&t1, NULL, thread_func, "Thread 1");
        pthread_create(&t2, NULL, thread_func, NULL);   /* NULL argument triggers the crash */

        sleep(1000);
        return 0;
}

Thread t2 crashes the program because it dereferences a NULL pointer. After the crash, use gdb to analyze the core dump file:

[root@localhost nan]# gdb -q a core.32794
Reading symbols from a...done.
[New LWP 32796]
[New LWP 32795]
[New LWP 32794]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `./a'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00000034e4281451 in __strlen_sse2 () from /lib64/libc.so.6
(gdb)

gdb stops in the __strlen_sse2 function, which means the crash happened there. Then use the bt command to see which thread called it:

(gdb) bt
#0  0x00000034e4281451 in __strlen_sse2 () from /lib64/libc.so.6
#1  0x00000034e4268cdb in puts () from /lib64/libc.so.6
#2  0x00000000004005cc in thread_func (p_arg=0x0) at a.c:7
#3  0x00000034e4a079d1 in start_thread () from /lib64/libpthread.so.0
#4  0x00000034e42e8b6d in clone () from /lib64/libc.so.6
(gdb) i threads
  Id   Target Id         Frame
  3    Thread 0x7ff6104c1700 (LWP 32794) 0x00000034e42accdd in nanosleep () from /lib64/libc.so.6
  2    Thread 0x7ff6104bf700 (LWP 32795) 0x00000034e42accdd in nanosleep () from /lib64/libc.so.6
* 1    Thread 0x7ff60fabe700 (LWP 32796) 0x00000034e4281451 in __strlen_sse2 () from /lib64/libc.so.6

The bt command shows the stack frames of the current thread (the culprit). The "i threads" command lists all the threads; the entry marked with * is the current thread.

As for "How are the threads numbered?": it depends on the OS. Refer to the gdb manual for more information.

Segmentation fault (core dumped) with multiple threads

An immediate problem leading to segfault is in (irrelevant details omitted):

    if(seats+how_many_seats>N_seat) {
        ....
    } else {       
        int c=0,i=0;
        pthread_mutex_lock(&lock_plan);
        while(c<how_many_seats) {
            if(!plan[i]){
                plan[i]=tid;
                c++;
            }
            i++;
        }
        seats+=how_many_seats;
        pthread_mutex_unlock(&lock_plan);

The code checks whether it can satisfy the request and then happily proceeds to reserve seats. Only then does it lock the plan. Meanwhile, between the seats+how_many_seats>N_seat test and the lock, another thread can do the same and modify the plan. After that, fewer seats are available than the first thread expects, and the while(c<how_many_seats) loop accesses plan out of bounds.
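One way to close that window is to take the lock before the capacity test, so the test and the update form a single critical section. This is only a sketch: the reserve_seats wrapper, its return codes and the N_seat value of 100 are assumptions, while plan, seats and lock_plan follow the names in the snippet above.

```c
#include <pthread.h>

#define N_seat 100              /* assumed capacity; the real value is not shown */

int plan[N_seat];               /* plan[i] == 0 means seat i is free */
int seats;                      /* number of seats already reserved  */
pthread_mutex_t lock_plan = PTHREAD_MUTEX_INITIALIZER;

/* Reserves how_many_seats seats for thread tid. tid must be nonzero,
 * because 0 marks a free seat. Returns 0 on success, -1 otherwise. */
int reserve_seats(int tid, int how_many_seats)
{
        int c = 0, i = 0;
        int result = -1;

        if (tid == 0 || how_many_seats <= 0)
                return -1;

        pthread_mutex_lock(&lock_plan);
        /* The test and the update run under the same lock, so no other
         * thread can change seats or plan in between. */
        if (how_many_seats <= N_seat - seats) {
                /* The i < N_seat bound keeps the loop inside plan even if
                 * the bookkeeping is ever inconsistent. */
                while (c < how_many_seats && i < N_seat) {
                        if (!plan[i]) {
                                plan[i] = tid;
                                c++;
                        }
                        i++;
                }
                seats += c;
                result = (c == how_many_seats) ? 0 : -1;
        }
        pthread_mutex_unlock(&lock_plan);
        return result;
}
```

With a capacity of 100, a request for 60 seats succeeds, a following request for 50 is refused, and a request for 40 then fills the plan. Writing the test as how_many_seats <= N_seat - seats also avoids integer overflow for very large requests.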

I didn't check the rest; I expect other similar problems. The non-volatile globals are very suspicious. In any case, do yourself a favor and use more functions.

Segmentation fault(core dumped) in multi threading using boost threads

You can use Valgrind; it's very easy. Build your app in a debug configuration and pass the program executable to valgrind. It can report a wide spectrum of programming errors that occur in your app at runtime. The price is that the program runs considerably slower (sometimes tens of times slower) than without Valgrind. For example, when your program tries to free memory a second time, Valgrind will tell you where that memory was freed the first time.
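To make the double-free case concrete, here is a small sketch in plain C rather than C++ with boost. The struct buffer type and both helpers are hypothetical and not taken from the question. It shows the usual shape of the bug and one common guard against it:

```c
#include <stdlib.h>

struct buffer {
        char *data;
        size_t len;
};

/* Allocates len zeroed bytes; returns 0 on success, -1 on failure. */
int buffer_init(struct buffer *b, size_t len)
{
        b->data = calloc(len, 1);
        b->len = b->data ? len : 0;
        return b->data ? 0 : -1;
}

/* Frees the storage. Resetting data to NULL turns a repeated call into a
 * harmless free(NULL). Without the reset, a second call would free the
 * same block again, which is the kind of error Valgrind reports as an
 * invalid free, along with the stack of the earlier free. */
void buffer_release(struct buffer *b)
{
        free(b->data);
        b->data = NULL;
        b->len = 0;
}
```

In a multithreaded program the same bug usually appears when two threads both believe they own the buffer. The NULL reset does not make that safe on its own; the release still needs to be serialized, for example with a mutex.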
