Interpreting Segfault Messages

This is a segfault due to following a null pointer trying to find code to run (that is, during an instruction fetch).

If this were a program, not a shared library

Run addr2line -e yourSegfaultingProgram 00007f9bebcca90d (and repeat for the other instruction pointer values given) to see where the error is happening. Better, get a debug-instrumented build, and reproduce the problem under a debugger such as gdb.

Since it's a shared library

You're hosed, unfortunately; the instruction pointer alone doesn't tell you where the dynamic linker placed the library in memory, so it can't be passed to addr2line as-is. Reproduce the problem under gdb.

What the error means

Here's the breakdown of the fields:

address (after the at) - the location in memory the code is trying to access (it's likely that 10 and 11 are offsets from a pointer we expect to be set to a valid value but which is instead pointing to 0)
ip - instruction pointer, i.e. where the code which is trying to do this lives
sp - stack pointer
error - the page-fault error code, printed in hexadecimal; the x86 bit meanings are listed below.

/*
 * Page fault error code bits:
 *
 *   bit 0 ==    0: no page found       1: protection fault
 *   bit 1 ==    0: read access         1: write access
 *   bit 2 ==    0: kernel-mode access  1: user-mode access
 *   bit 3 ==                           1: use of reserved bit detected
 *   bit 4 ==                           1: fault was an instruction fetch
 *   bit 5 ==                           1: protection keys block access
 *   bit 15 ==                          1: SGX MMU page-fault
 */
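The bit table above can be turned into a small decoder. This is a minimal user-space sketch, not kernel code: the struct and function names (pf_error, decode_pf_error, print_pf_error) are made up for illustration, and only bits 0 to 4 of the x86 layout are handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* One flag per bit of the x86 page-fault error code (bits 0..4 only). */
struct pf_error {
    bool protection;   /* bit 0: 1 = protection fault, 0 = no page found */
    bool write;        /* bit 1: 1 = write access, 0 = read access */
    bool user;         /* bit 2: 1 = user-mode access, 0 = kernel-mode */
    bool reserved;     /* bit 3: use of reserved bit detected */
    bool instr_fetch;  /* bit 4: fault was an instruction fetch */
};

/* Hypothetical helper: split the numeric error code into its flags. */
struct pf_error decode_pf_error(unsigned long code)
{
    struct pf_error e = {
        .protection  = (code >> 0) & 1,
        .write       = (code >> 1) & 1,
        .user        = (code >> 2) & 1,
        .reserved    = (code >> 3) & 1,
        .instr_fetch = (code >> 4) & 1,
    };
    return e;
}

/* Hypothetical helper: print a one-line description of the error code. */
void print_pf_error(FILE *out, unsigned long code)
{
    assert(out != NULL);
    struct pf_error e = decode_pf_error(code);
    const char *access = e.instr_fetch ? "instruction fetch"
                       : e.write       ? "write" : "read";
    fprintf(out, "error %lx: %s-mode %s, %s%s\n", code,
            e.user ? "user" : "kernel", access,
            e.protection ? "protection fault" : "no page found",
            e.reserved ? ", reserved bit set" : "");
}
```

Called as print_pf_error(stdout, 0x6), this prints "error 6: user-mode write, no page found". The value in the log is hexadecimal, so parse it with base 16 (for example strtoul(text, NULL, 16)) before passing it in.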

Ubuntu: segfault at 125 ip 00cd6df4 sp bfeef720 error 6 in libQtCore.so.4.7.4[b51000+2ca000]?

I found a good explanation of this error on Stack Overflow itself.

I am also pasting it below (slightly modified to match the error I am facing):

Error 6 is not an errno value such as ENXIO (No such device or address); it is the x86 page-fault error code. With bits 1 and 2 set, it means a user-mode write to an address that has no page mapped. The faulting address, 0x125, is small, so the code in libQtCore is probably writing through a null pointer plus an offset.

If this were a program, not a shared library

Run

addr2line -e yourSegfaultingProgram 00cd6df4

(and repeat for the other instruction pointer values given) to see where the error is happening. Better, get a debug-instrumented build, and reproduce the problem under a debugger such as gdb.

Since it's a shared library

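You're hosed as far as passing ip straight to addr2line goes; the dynamic linker chooses where libraries are placed in memory, so ip is not an address within the library file. The bracketed part of the message, [b51000+2ca000], does give the start address and size of the mapping that contains ip, so ip minus that start is an offset into the mapping. The dependable route is still to reproduce the problem under gdb.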

What the error means

Here's the breakdown of the fields:

address - the location in memory the code is trying to access (here 0x125; a value this small is most likely an offset from a pointer we expect to be set to a valid value but which is instead pointing to 0)

ip - instruction pointer, i.e. where the code which is trying to do this lives

sp - stack pointer

error - the page-fault error code, a bit mask printed in hexadecimal. It is not errno, the last error code reported by a syscall.

Errno codes are what system calls return when they fail, and they are unrelated to this field. They are recorded in:

  /usr/include/asm/errno.h

How do I map a segfault instruction pointer address from /var/log/messages to an address/function in my .map file?

According to the message it looks like it crashed inside memcpy(), from libc-2.9.so, which is mapped into your process starting at 0x7f7e79185000. This is expected since memcpy is the function that is attempting to dereference the pointer. The instruction pointer looks valid since it is within the range of libc. If you were intending to override memcpy and call your own version, you may need to compile with -fno-builtin-memcpy.

Edit: You may be linking libc statically but according to the message you also have the libc shared library mapped into your process memory. You should see it listed in /proc/pid/maps while your program is running. It may be that you are linking with another shared library, such as libstdc++, and it depends on the libc shared library. As a result you have two versions of memcpy, and in this case it is calling the libc shared library version which is mapped at the high address. If you don't want the libc shared library then make sure you are linking all libraries statically; use the -static option at the beginning of your link line.
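To check which mapping an address belongs to while the program is still running, the maps file can also be read from inside the process. The following is a rough Linux-only sketch; find_mapping is a hypothetical helper, it reads /proc/self/maps (the current process) rather than another pid, and it assumes each line fits in the fixed-size buffer.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: scan /proc/self/maps for the mapping containing addr.
 * Returns 1 and fills *start, *end and path (empty for anonymous mappings)
 * when found, 0 when no mapping contains addr, -1 if the file can't be opened.
 */
int find_mapping(unsigned long addr, unsigned long *start, unsigned long *end,
                 char *path, size_t pathlen)
{
    assert(start != NULL && end != NULL && path != NULL && pathlen > 0);

    FILE *f = fopen("/proc/self/maps", "r");
    if (!f)
        return -1;

    char line[1024];
    int found = 0;
    while (!found && fgets(line, sizeof line, f)) {
        unsigned long s, e;
        char name[512] = "";
        /* Line layout: start-end perms offset dev inode [pathname] */
        if (sscanf(line, "%lx-%lx %*s %*s %*s %*s %511[^\n]", &s, &e, name) < 2)
            continue;
        if (addr >= s && addr < e) {
            *start = s;
            *end = e;
            snprintf(path, pathlen, "%s", name);
            found = 1;
        }
    }
    fclose(f);
    return found;
}
```

Passing the ip value from a segfault message only makes sense for the same run of the same process; in a later run the libraries may be mapped at different addresses.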

Configure kern.log to give more info about a segfault

The short answer is: No, it is not possible without making code changes and recompiling the kernel. The normal solution to this problem is to instruct your students to name their executable <student user name>_ex3.x so that you can easily have this information.

However, it is possible to get the information you desire from other methods. Appleman1234 has provided some alternatives in his answer to this question.

How do we know the answer is "Not possible to get the full path in the kern.log segfault messages without recompiling the kernel":

We look in the kernel source code to find out how the message is produced and if there are any configuration options.

The files in question are part of the kernel source. You can download the entire kernel source as an rpm package (or other type of package) for whatever version of linux/debian you are running from a variety of places.

Specifically, the output that you are seeing is produced from whichever of the following files is for your architecture:

linux/arch/sparc/mm/fault_32.c
linux/arch/sparc/mm/fault_64.c
linux/arch/um/kernel/trap.c
linux/arch/x86/mm/fault.c

An example of the relevant function from one of the files (linux/arch/x86/mm/fault.c):

/*
 * Print out info about fatal segfaults, if the show_unhandled_signals
 * sysctl is set:
 */
static inline void
show_signal_msg(struct pt_regs *regs, unsigned long error_code,
        unsigned long address, struct task_struct *tsk)
{
    if (!unhandled_signal(tsk, SIGSEGV))
        return;

    if (!printk_ratelimit())
        return;

    printk("%s%s[%d]: segfault at %lx ip %p sp %p error %lx",
        task_pid_nr(tsk) > 1 ? KERN_INFO : KERN_EMERG,
        tsk->comm, task_pid_nr(tsk), address,
        (void *)regs->ip, (void *)regs->sp, error_code);

    print_vma_addr(KERN_CONT " in ", regs->ip);

    printk(KERN_CONT "\n");
}

From that we see that the process name in the message comes from tsk->comm (where tsk is a struct task_struct *), and that the instruction pointer comes from regs->ip (where regs is a struct pt_regs *).
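Because the message is produced by a fixed format string, the fields can be pulled back out of a log line with a matching sscanf pattern. This is a sketch under assumptions: struct segv_msg and parse_segv_msg are made-up names, the pattern follows the x86 format shown above (other kernel versions or architectures may print something different), and all numeric fields are read as hexadecimal.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Fields of one "segfault at ..." line; all numbers are printed in hex. */
struct segv_msg {
    unsigned long long addr, ip, sp, error;
    char object[256];               /* file name from print_vma_addr(), may be empty */
    unsigned long long start, size; /* the [start+size] pair, 0 if absent */
};

/*
 * Hypothetical helper: parse one log line into *m.
 * Returns the number of fields converted (7 for a full line, 4 when the
 * trailing " in name[start+size]" part is missing), or 0 if the four
 * mandatory fields were not found.
 */
int parse_segv_msg(const char *line, struct segv_msg *m)
{
    assert(line != NULL && m != NULL);
    memset(m, 0, sizeof *m);

    /* Skip any "comm[pid]: " prefix or syslog timestamp. */
    const char *p = strstr(line, "segfault at ");
    if (!p)
        return 0;

    int n = sscanf(p,
                   "segfault at %llx ip %llx sp %llx error %llx"
                   " in %255[^[][%llx+%llx]",
                   &m->addr, &m->ip, &m->sp, &m->error,
                   m->object, &m->start, &m->size);
    return n < 4 ? 0 : n;
}
```

Fed the line "segfault at 125 ip 00cd6df4 sp bfeef720 error 6 in libQtCore.so.4.7.4[b51000+2ca000]", it fills addr = 0x125, ip = 0xcd6df4, sp = 0xbfeef720, error = 0x6, object = "libQtCore.so.4.7.4", start = 0xb51000 and size = 0x2ca000.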

Then from linux/include/linux/sched.h

struct task_struct {
    ...
    char comm[TASK_COMM_LEN]; /* executable name excluding path
                                 - access with [gs]et_task_comm (which lock
                                   it with task_lock())
                                 - initialized normally by setup_new_exec */

The comment makes it clear that the path for the executable is not stored in the structure.

For regs->ip where struct pt_regs *regs, it is defined in whichever of the following are appropriate for your architecture:

arch/arc/include/asm/ptrace.h
arch/arm/include/asm/ptrace.h
arch/arm64/include/asm/ptrace.h
arch/cris/include/arch-v10/arch/ptrace.h
arch/cris/include/arch-v32/arch/ptrace.h
arch/metag/include/asm/ptrace.h
arch/mips/include/asm/ptrace.h
arch/openrisc/include/asm/ptrace.h
arch/um/include/asm/ptrace-generic.h
arch/x86/include/asm/ptrace.h
arch/xtensa/include/asm/ptrace.h

From there we see that struct pt_regs is defining registers for the architecture. ip is just: unsigned long ip;

Thus, we have to look at what print_vma_addr() does. It is defined in mm/memory.c

/*
 * Print the name of a VMA.
 */
void print_vma_addr(char *prefix, unsigned long ip)
{
    struct mm_struct *mm = current->mm;
    struct vm_area_struct *vma;

    /*
     * Do not print if we are in atomic
     * contexts (in exception stacks, etc.):
     */
    if (preempt_count())
        return;

    down_read(&mm->mmap_sem);
    vma = find_vma(mm, ip);
    if (vma && vma->vm_file) {
        struct file *f = vma->vm_file;
        char *buf = (char *)__get_free_page(GFP_KERNEL);
        if (buf) {
            char *p;

            p = d_path(&f->f_path, buf, PAGE_SIZE);
            if (IS_ERR(p))
                p = "?";
            printk("%s%s[%lx+%lx]", prefix, kbasename(p),
                    vma->vm_start,
                    vma->vm_end - vma->vm_start);
            free_page((unsigned long)buf);
        }
    }
    up_read(&mm->mmap_sem);
}

This shows us that a path was available at that point. We would need to check that it was the full path, but looking a bit further in the code hints that it might not matter: we need to see what kbasename() does with the path that is passed to it. kbasename() is defined in include/linux/string.h as:

/**
 * kbasename - return the last part of a pathname.
 *
 * @path: path to extract the filename from.
 */
static inline const char *kbasename(const char *path)
{
    const char *tail = strrchr(path, '/');
    return tail ? tail + 1 : path;
}

So even if the full path is available beforehand, kbasename() chops off everything except the last component of the pathname, leaving only the filename.
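The effect is easy to reproduce outside the kernel. The sketch below copies the same logic into a user-space function; path_basename is a made-up name, chosen so it is not mistaken for the kernel symbol, and the paths in the usage note are invented examples.

```c
#include <assert.h>
#include <string.h>

/*
 * User-space copy of the kbasename() logic: return a pointer to whatever
 * follows the last '/' in path, or path itself when it contains no '/'.
 */
const char *path_basename(const char *path)
{
    assert(path != NULL);
    const char *tail = strrchr(path, '/');
    return tail ? tail + 1 : path;
}
```

path_basename("/usr/lib/libQtCore.so.4.7.4") returns "libQtCore.so.4.7.4", so the directory part is already gone by the time the string reaches printk.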

Thus, no runtime configuration option will permit printing out the full pathname of the file in the segmentation fault messages you are seeing.

NOTE: I've changed all of the links to kernel source to be to archives, rather than the original locations. Those links will get close to the code as it was at the time I wrote this, 2014-09. As should be no surprise, the code does evolve over time, so the code which is current when you're reading this may or may not be similar or perform in the way which is described here.

kernel - postgres segfault error 15 in libc-2.19.so

error code 15 means: NX_EDEADLK 15

No, it doesn't mean that. This answer explains how to interpret 15 here.

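The kernel prints the error code with %lx, so 15 is hexadecimal 0x15. That is bits 0, 2 and 4 set => protection fault, user mode, instruction fetch. Most likely your postgres process tried to execute code from a page that is not executable, for example by jumping through a wild or corrupted function pointer.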

if we can do something to avoid this problem in the future?

The only thing you can do is find the bug and fix it, or upgrade to a release of postgres where that bug is already fixed (and hope that no new ones were introduced).

To understand where the bug might be, you should check whether a core dump was produced (if not, enable core dumps). If you have the core, run gdb /path/to/postgres /path/to/core, and then use GDB's where command. That will give you the stack trace of the crash, which may help you find similar bug reports.
