Linux: handling a segmentation fault and getting a core dump
The answer: set the sigaction with the SA_RESETHAND flag and just return from the handler. The faulting instruction is then executed again, causing a second segmentation fault that invokes the default handler, which writes the core dump.
Analyzing segmentation fault without core file
In the past, I had to deal with this kind of restriction on several occasions. A segmentation fault or, more generally, abnormal process termination had to be investigated with the caveat that a core dump was not available.
For Linux, our platform of choice for this walkthrough, a few reasons come to mind:
- Core dump generation is disabled altogether (using limits.conf or ulimit)
- The target directory (current working directory or a directory in /proc/sys/kernel/core_pattern) does not exist or is inaccessible due to filesystem permissions or SELinux
- The target filesystem has insufficient disk space, resulting in a partial dump
For all of those, the net result is the same: there's no (valid) core dump to use for analysis. Fortunately, a workaround exists for post-mortem debugging that has the potential to save the day, but given its inherent limitations, your mileage may vary from case to case.
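The first reason in the list can also be checked from inside a process. The following is a minimal sketch, assuming the limit in question is the RLIMIT_CORE resource limit that ulimit -c manipulates; the function names are illustrative and the other two reasons are not covered.

```c
#include <sys/resource.h>

/* Returns 1 if the soft core-size limit allows a dump, 0 if core dumps are
 * disabled (limit of zero), and -1 if the limit cannot be queried. */
int core_dumps_enabled(void)
{
    struct rlimit limit;

    if (getrlimit(RLIMIT_CORE, &limit) != 0)
        return -1;
    return limit.rlim_cur != 0;
}

/* Raises the soft limit to the hard limit, which an unprivileged process
 * is allowed to do. Returns 0 on success, -1 on error. */
int raise_core_limit(void)
{
    struct rlimit limit;

    if (getrlimit(RLIMIT_CORE, &limit) != 0)
        return -1;
    limit.rlim_cur = limit.rlim_max;
    return setrlimit(RLIMIT_CORE, &limit);
}
```

If the hard limit is zero as well, raise_core_limit() succeeds but changes nothing, and core_dumps_enabled() keeps returning 0.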
Identifying the Faulting Instruction
The following sample contains a classic use-after-free memory error:
#include <iostream>

struct Test
{
    const std::string &m_value;

    Test(const std::string &value):
        m_value(value)
    {
    }

    void print()
    {
        std::cout << m_value << std::endl;
    }
};

int main()
{
    std::string *value = new std::string("this is a test");
    Test test(*value);
    delete value;
    test.print();
    return 0;
}
After delete value, the std::string reference Test::m_value points to inaccessible memory. Therefore, running it results in a segmentation fault:
$ ./a.out
Segmentation fault
When a process terminates due to an access violation, the Linux kernel creates a log entry accessible via dmesg and, depending on the system's configuration, the syslog (usually /var/log/messages). The example (compiled with -O0) creates the following entry:
$ dmesg | grep segfault
[80440.957955] a.out[7098]: segfault at ffffffffffffffe8 ip 00007f9f2c2b56a3 sp 00007ffc3e75bc48 error 5 in libstdc++.so.6.0.19[7f9f2c220000+e9000]
The corresponding Linux kernel source from arch/x86/mm/fault.c:
printk("%s%s[%d]: segfault at %lx ip %px sp %px error %lx",
loglvl, tsk->comm, task_pid_nr(tsk), address,
(void *)regs->ip, (void *)regs->sp, error_code);
The error (error_code) reveals what the trigger was. It's a CPU-specific bit set (x86). In our case, the value 5 (101 in binary) indicates that the page represented by the faulting address 0xffffffffffffffe8 was mapped but inaccessible due to page protection (bit 0 set), that a read was attempted (bit 1 clear), and that the access came from user mode (bit 2 set).
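The bit tests can be written down as a small decoder. This is a sketch under the assumption that only the protection, write, user, and instruction-fetch bits are of interest; the FAULT_* names and the fault_info structure are local to this example and are not kernel identifiers.

```c
#include <stdio.h>

/* Bits of the x86 page-fault error code (names chosen for this sketch). */
enum {
    FAULT_PROTECTION = 1 << 0, /* 0: page not present, 1: protection violation */
    FAULT_WRITE      = 1 << 1, /* 0: read access,      1: write access */
    FAULT_USER       = 1 << 2, /* 0: kernel mode,      1: user mode */
    FAULT_INSTR      = 1 << 4  /* 1: fault happened during an instruction fetch */
};

struct fault_info {
    int protection;
    int write;
    int user;
    int instruction_fetch;
};

/* Splits the error code into one 0/1 flag per bit of interest. */
struct fault_info decode_fault(unsigned long error_code)
{
    struct fault_info info;

    info.protection = (error_code & FAULT_PROTECTION) != 0;
    info.write = (error_code & FAULT_WRITE) != 0;
    info.user = (error_code & FAULT_USER) != 0;
    info.instruction_fetch = (error_code & FAULT_INSTR) != 0;
    return info;
}

/* Prints a one-line description. The kernel logs the code in hexadecimal,
 * so the value is printed the same way. */
void print_fault(unsigned long error_code)
{
    struct fault_info info = decode_fault(error_code);

    printf("error %lx: %s, %s access, %s mode%s\n", error_code,
           info.protection ? "protection violation" : "page not present",
           info.write ? "write" : "read",
           info.user ? "user" : "kernel",
           info.instruction_fetch ? ", instruction fetch" : "");
}
```

For the log entry above, decode_fault(5) reports a protection violation on a read from user mode.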
The log message identifies the module that executed the faulting instruction: libstdc++.so.6.0.19. The sample was compiled without optimization, so the call to std::basic_ostream<char, std::char_traits<char> >& std::operator<< <char, std::char_traits<char>, std::allocator<char> >(std::basic_ostream<char, std::char_traits<char> >&, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) was not inlined:
400bef: e8 4c fd ff ff callq 400940 <_ZStlsIcSt11char_traitsIcESaIcEERSt13basic_ostreamIT_T0_ES7_RKSbIS4_S5_T1_E@plt>
The STL performs the read access. Knowing those basics, how can we identify where the segmentation fault occurred exactly? The log entry features two essential addresses we need for doing so:
ip 00007f9f2c2b56a3 [...] error 5 in
   ^^^^^^^^^^^^^^^^
libstdc++.so.6.0.19[7f9f2c220000+e9000]
                    ^^^^^^^^^^^^
The first is the instruction pointer (rip) at the time of the access violation, the second is the address the .text section of the library is mapped to. By subtracting the .text base address from rip, we get the relative address of the instruction in the library and can disassemble the implementation using objdump (you can simply search for the offset):
0x7f9f2c2b56a3-0x7f9f2c220000=0x956a3
$ objdump --demangle -d /usr/lib64/libstdc++.so.6
[...]
00000000000956a0 <std::basic_ostream<char, std::char_traits<char> >& std::operator<< <char, std::char_traits<char>, std::allocator<char> >(std::basic_ostream<char, std::char_traits<char> >&, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)@@GLIBCXX_3.4>:
956a0: 48 8b 36 mov (%rsi),%rsi
956a3: 48 8b 56 e8 mov -0x18(%rsi),%rdx
^^^^^
956a7: e9 24 4e fc ff jmpq 5a4d0 <std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long)@plt>
956ac: 0f 1f 40 00 nopl 0x0(%rax)
[...]
Is that the correct instruction? We can consult GDB to confirm our analysis:
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7b686a3 in std::basic_ostream<char, std::char_traits<char> >& std::operator<< <char, std::char_traits<char>, std::allocator<char> >(std::basic_ostream<char, std::char_traits<char> >&, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /lib64/libstdc++.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.17-323.el7_9.x86_64 libgcc-4.8.5-44.el7.x86_64 libstdc++-4.8.5-44.el7.x86_64
(gdb) disass
Dump of assembler code for function _ZStlsIcSt11char_traitsIcESaIcEERSt13basic_ostreamIT_T0_ES7_RKSbIS4_S5_T1_E:
0x00007ffff7b686a0 <+0>: mov (%rsi),%rsi
=> 0x00007ffff7b686a3 <+3>: mov -0x18(%rsi),%rdx
0x00007ffff7b686a7 <+7>: jmpq 0x7ffff7b2d4d0 <_ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_PKS3_l@plt>
End of assembler dump.
GDB shows the very same instruction. We can also use a debugging session to verify the read address:
(gdb) print /x $rsi-0x18
$2 = 0xffffffffffffffe8
This value matches the read address in the log entry.
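The address arithmetic from this section can be automated. The sketch below parses a kernel log line of the shape shown earlier and computes the module-relative offset. The structure and function names are illustrative, and the format string assumes the x86 message layout quoted above; other architectures or kernel versions may print the line differently.

```c
#include <stdio.h>
#include <string.h>

/* Fields of one kernel "segfault at ..." log line (illustrative names). */
struct segfault_record {
    unsigned long fault_addr; /* address whose access faulted */
    unsigned long ip;         /* instruction pointer at the time of the fault */
    unsigned long sp;         /* stack pointer */
    unsigned long error_code; /* page-fault error code */
    unsigned long base;       /* start address of the mapping */
    unsigned long size;       /* size of the mapping */
    char module[256];         /* name of the mapped file */
};

/* Parses a log line. Returns 0 on success and -1 if the line does not
 * have the expected shape. */
int parse_segfault_line(const char *line, struct segfault_record *rec)
{
    const char *start = strstr(line, "segfault at ");
    int fields;

    if (start == NULL)
        return -1;
    /* %255[^[] reads the module name up to the opening bracket. */
    fields = sscanf(start,
                    "segfault at %lx ip %lx sp %lx error %lx in %255[^[][%lx+%lx]",
                    &rec->fault_addr, &rec->ip, &rec->sp, &rec->error_code,
                    rec->module, &rec->base, &rec->size);
    return fields == 7 ? 0 : -1;
}

/* Offset of the faulting instruction relative to the mapping base,
 * which is the value to search for in the objdump output. */
unsigned long module_offset(const struct segfault_record *rec)
{
    return rec->ip - rec->base;
}
```

Fed with the dmesg entry from this section, module_offset() yields 0x956a3, the same offset that was computed by hand.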
Identifying the Callers
So, despite the absence of a core dump, the kernel output enables us to identify the exact location of the segmentation fault. In many scenarios, though, that is far from being enough. For one thing, we're missing the list of calls that got us to that point - the call stack or stack trace.
Without a dump in the backpack, you have two options to get hold of the callers: you can start your process using catchsegv (a glibc utility; it was removed upstream in glibc 2.35, so it may be missing on recent distributions) or you can implement your own signal handler. catchsegv serves as a wrapper, generates the stack trace, and also dumps register values and the memory map:
$ catchsegv ./a.out
*** Segmentation fault
Register dump:
RAX: 0000000002158040 RBX: 0000000002158040 RCX: 0000000002158000
[...]
Backtrace:
/lib64/libstdc++.so.6(_ZStlsIcSt11char_traitsIcESaIcEERSt13basic_ostreamIT_T0_ES7_RKSbIS4_S5_T1_E+0x3)[0x7f1794fd36a3]
??:?(_ZN4Test5printEv)[0x400bf4]
??:?(main)[0x400b2d]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f179467a555]
??:?(_start)[0x4009e9]
Memory map:
00400000-00401000 r-xp 00000000 08:02 50331747 /home/user/a.out
[...]
7f1794f3e000-7f1795027000 r-xp 00000000 08:02 33600977 /usr/lib64/libstdc++.so.6.0.19
7f1795027000-7f1795227000 ---p 000e9000 08:02 33600977 /usr/lib64/libstdc++.so.6.0.19
7f1795227000-7f179522f000 r--p 000e9000 08:02 33600977 /usr/lib64/libstdc++.so.6.0.19
7f179522f000-7f1795231000 rw-p 000f1000 08:02 33600977 /usr/lib64/libstdc++.so.6.0.19
[...]
How does catchsegv work? It essentially injects a signal handler using LD_PRELOAD and the library libSegFault.so. If your application already happens to install a signal handler for SIGSEGV and you intend to take advantage of libSegFault.so, your signal handler needs to forward the signal to the original handler (the previous action, which sigaction() stores in its third argument, e.g. sigaction(SIGSEGV, NULL, &old)).
The second option is to implement the stack trace functionality yourself using a custom signal handler and backtrace(). This allows you to customize the output location and the output itself.
Based on the backtrace and memory map printed by catchsegv, we can essentially do the same we did before (0x7f1794fd36a3-0x7f1794f3e000=0x956a3). This time around, we can go back to the callers to dig deeper. The second frame is represented by the following line:
??:?(_ZN4Test5printEv)[0x400bf4]
0x400bf4 is the return address inside Test::print(), that is, the instruction executed after the call to operator<< returns; it's located in the executable. We can visualize the call site as follows:
$ objdump --demangle -d ./a.out
[...]
400bea: bf a0 20 60 00 mov $0x6020a0,%edi
400bef: e8 4c fd ff ff callq 400940 <std::basic_ostream<char, std::char_traits<char> >& std::operator<< <char, std::char_traits<char>, std::allocator<char> >(std::basic_ostream<char, std::char_traits<char> >&, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)@plt>
400bf4: be 70 09 40 00 mov $0x400970,%esi
^^^^^^
400bf9: 48 89 c7 mov %rax,%rdi
400bfc: e8 5f fd ff ff callq 400960 <std::ostream::operator<<(std::ostream& (*)(std::ostream&))@plt>
[...]
Note that the output of objdump matches the address in this instance because we run it against the executable, which has a default base address of 0x400000 on x86_64; objdump takes that into account. For a position-independent executable (compiled with -fpie, linked with -pie), which address space layout randomization (ASLR) loads at a varying base address, the base address has to be taken into account as outlined before.
Going back further involves the same steps:
??:?(main)[0x400b2d]
$ objdump --demangle -d ./a.out
[...]
400b1c: e8 af fd ff ff callq 4008d0 <operator delete(void*)@plt>
400b21: 48 8d 45 d0 lea -0x30(%rbp),%rax
400b25: 48 89 c7 mov %rax,%rdi
400b28: e8 a7 00 00 00 callq 400bd4 <Test::print()>
400b2d: b8 00 00 00 00 mov $0x0,%eax
^^^^^^
400b32: eb 2a jmp 400b5e <main+0xb1>
[...]
Until now, we've been manually translating the absolute address to a relative address. Instead, the base address of the module can be passed to objdump via --adjust-vma=<base-address>. That way, the value of rip or a caller's address can be used directly.
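While the process is still alive, the base address needed for this translation can be read from /proc/self/maps, the same information catchsegv prints as the memory map. The sketch below looks up the mapping that contains an address. The names are illustrative, and the translation assumes a position-independent module whose virtual addresses in the file equal its file offsets, which is common but not guaranteed.

```c
#include <stdio.h>

/* Describes the mapping that contains an address (illustrative names). */
struct mapping {
    unsigned long start;       /* first address of the mapping */
    unsigned long end;         /* one past the last address */
    unsigned long file_offset; /* offset of the mapping within the file */
    char path[256];            /* backing file, empty for anonymous memory */
};

/* Searches /proc/self/maps for the mapping containing addr.
 * Returns 0 and fills *out on success, -1 otherwise.
 * Uses stdio, so it must not be called from a signal handler. */
int find_mapping(unsigned long addr, struct mapping *out)
{
    FILE *maps = fopen("/proc/self/maps", "r");
    char line[512];
    int result = -1;

    if (maps == NULL)
        return -1;
    while (fgets(line, sizeof line, maps) != NULL) {
        struct mapping m;

        m.path[0] = '\0';
        /* Line format: start-end perms offset dev inode [path] */
        if (sscanf(line, "%lx-%lx %*4s %lx %*s %*s %255[^\n]",
                   &m.start, &m.end, &m.file_offset, m.path) < 3)
            continue;
        if (addr >= m.start && addr < m.end) {
            *out = m;
            result = 0;
            break;
        }
    }
    fclose(maps);
    return result;
}

/* Address relative to the beginning of the file, under the assumption
 * stated above. */
unsigned long file_relative(unsigned long addr, const struct mapping *m)
{
    return addr - m->start + m->file_offset;
}
```

For a non-PIE executable such as the sample, no translation is needed because objdump already shows absolute addresses.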
Adding Debug Symbols
We've come a long way without a dump. For debugging to be effective, however, another critical puzzle piece is absent: debug symbols. Without them, it can be difficult to map the assembly to the corresponding source code. Compiling the sample with -O3 and without debug information illustrates the problem:
[98161.650474] a.out[13185]: segfault at ffffffffffffffe8 ip 0000000000400a4b sp 00007ffc9e738270 error 5 in a.out[400000+1000]
As a consequence of inlining, the log entry now points to our executable as the trigger. Using objdump gets us to the following:
400a3e: e8 dd fe ff ff callq 400920 <operator delete(void*)@plt>
400a43: 48 8b 33 mov (%rbx),%rsi
400a46: bf a0 20 60 00 mov $0x6020a0,%edi
400a4b: 48 8b 56 e8 mov -0x18(%rsi),%rdx
^^^^^^
400a4f: e8 4c ff ff ff callq 4009a0 <std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long)@plt>
400a54: 48 89 c5 mov %rax,%rbp
400a57: 48 8b 00 mov (%rax),%rax
Part of the stream implementation was inlined, making it harder to identify the associated source code. Without symbols, you have to use exported symbols, calls (like operator delete(void*)) and the surrounding instructions (mov $0x6020a0 loads the address of std::cout: 00000000006020a0 <std::cout@@GLIBCXX_3.4>) for the purpose of orientation.
With debug symbols (-g), more context is available by calling objdump with --source:
400a43: 48 8b 33 mov (%rbx),%rsi
operator<<(basic_ostream<_CharT, _Traits>& __os,
const basic_string<_CharT, _Traits, _Alloc>& __str)
{
// _GLIBCXX_RESOLVE_LIB_DEFECTS
// 586. string inserter not a formatted function
return __ostream_insert(__os, __str.data(), __str.size());
400a46: bf a0 20 60 00 mov $0x6020a0,%edi
400a4b: 48 8b 56 e8 mov -0x18(%rsi),%rdx
^^^^^^
400a4f: e8 4c ff ff ff callq 4009a0 <std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long)@plt>
400a54: 48 89 c5 mov %rax,%rbp
That worked as expected. In the real world, debug symbols are usually not embedded in the binaries; they are managed in separate debuginfo packages. In those circumstances, objdump ignores debug symbols even if they are installed. To address this limitation, symbols have to be re-added to the affected binary. The following procedure creates detached symbols and re-adds them using eu-unstrip from elfutils to the benefit of objdump:
# compile with debug info
g++ segv.cxx -O3 -g
# create detached debug info
objcopy --only-keep-debug a.out a.out.debug
# remove debug info from executable
strip -g a.out
# re-add debug info to executable
eu-unstrip ./a.out ./a.out.debug -o ./a.out-debuginfo
# objdump with executable containing debug info
objdump --demangle -d ./a.out-debuginfo --source
Using GDB instead of objdump
Thus far, we've been using objdump because it's usually available, even on production systems. Can we just use GDB instead? Yes, by executing gdb with the module of interest. I use 0x400a4b as in the previous objdump invocation:
$ gdb ./a.out
[...]
(gdb) disass 0x400a4b
Dump of assembler code for function main():
[...]
0x0000000000400a43 <+67>: mov (%rbx),%rsi
0x0000000000400a46 <+70>: mov $0x6020a0,%edi
0x0000000000400a4b <+75>: mov -0x18(%rsi),%rdx
0x0000000000400a4f <+79>: callq 0x4009a0 <_ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_PKS3_l@plt>
0x0000000000400a54 <+84>: mov %rax,%rbp
In contrast to objdump, GDB can deal with external symbol information without a hitch. disass /m corresponds to objdump --source:
(gdb) disass /m 0x400a4b
Dump of assembler code for function main():
[...]
21 Test test(*value);
22 delete value;
0x0000000000400a25 <+37>: test %rbx,%rbx
0x0000000000400a28 <+40>: je 0x400a43 <main()+67>
0x0000000000400a3b <+59>: mov %rbx,%rdi
0x0000000000400a3e <+62>: callq 0x400920 <_ZdlPv@plt>
23 test.print();
24 return 0;
25 }
0x0000000000400a88 <+136>: add $0x18,%rsp
[...]
End of assembler dump.
In case of an optimized binary, GDB might skip instructions in this mode if the source code cannot be mapped unambiguously. Our instruction at 0x400a4b is not listed. objdump never skips instructions and might skip the source context instead, an approach that I prefer for debugging at this level. This does not mean that GDB is not useful for this task; it's just something to be aware of.
Final Thoughts
Termination reason, registers, memory map, and stack trace. It's all there without even a trace of a core dump. While definitely useful (I fixed quite a few crashes that way), you have to keep in mind that you're still missing valuable information by going that route, most notably the stack and heap as well as per-thread data (thread metadata, registers, stack).
So, whatever the scenario may be, you should seriously consider enabling core dump generation and ensure that dumps can be generated successfully if push comes to shove. Debugging in itself is complex enough; debugging without information you could technically have needlessly increases complexity and turnaround time and, more importantly, significantly lowers the probability that the root cause can be found and addressed in a timely manner.
Segmentation fault (core dumped) runtime error
The problem is a missing input check. If you don't pass an argument to the executable, it crashes with the error you describe.
I suggest adding a check on the provided arguments before accessing them.
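Such a check might look like the following sketch. The original program is not shown here, so the assumption is that it expects exactly one argument; the function name require_single_argument is made up for illustration.

```c
#include <stdio.h>

/* Returns argv[1] if exactly one argument was supplied. Otherwise prints a
 * usage message to stderr and returns NULL, so the caller can exit instead
 * of dereferencing a missing argument. */
const char *require_single_argument(int argc, char **argv)
{
    if (argc != 2 || argv[1] == NULL) {
        const char *name = (argc > 0 && argv[0] != NULL) ? argv[0] : "program";

        fprintf(stderr, "usage: %s <argument>\n", name);
        return NULL;
    }
    return argv[1];
}
```

In main(), the call would typically be followed by an early return with a non-zero exit status when NULL comes back.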
C Programming: Segmentation Fault (Core Dumped)
You have several problems in your code:

- Create the pipe before the fork. You create the pipe twice, once for the parent process and once for the child process. That makes no sense: the pipe that the child created cannot be used by the parent. The pipe must already exist so that the child inherits the file descriptors when the child is created.

- Usually the parent creates the shared memory and the child gets the shmid from the parent when it does the fork. Otherwise you will have to synchronize the child and parent. So I would put the creation of the shared memory before the fork, so that the child inherits the shmid from the parent.

- In the line char *n = (char *) shm; the cast is not needed, shm is already a char*.

- In the parent block after the fork, you do wait(NULL); and then proceed to write into the pipe. That makes no sense and you block both parent and child. The child blocks on read because the parent hasn't sent anything through the pipe yet. And the parent blocks on wait, because the child never exits, so the parent never gets to send anything through the pipe. The parent must first send data through the pipe, then wait for the child to exit.

- In the child block you do scanf("%s", n); you are not protecting yourself against buffer overflows. scanf("%14s", n) would be better. Also you are not checking if scanf read anything at all. If the user presses Ctrl+D then stdin is closed and scanf fails. In that case n might not be '\0'-terminated and this would lead to undefined behaviour when the parent tries to print it. So it would be better:

    if(scanf("%14s", n) != 1) // avoid buffer overflow
    {
        fprintf(stderr, "Child: cannot read from stdin\n");
        n[0] = 0; // 0-terminating
    }

- In the parent block after the fork, you do wait twice, why?

- Your main is wrong, it should be int main(int argc, char **argv);

- The parent sends the contents of argv[1] to the child through the pipe, but you fail to check if argv[1] is not NULL. Use this at the start of the program:

    if(argc != 2)
    {
        fprintf(stderr, "usage: %s string\n", argv[0]);
        return 1;
    }
So the correct version would be:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>
#include <err.h>
#include <sysexits.h>
#include <sys/shm.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <string.h>

#define SHM_SIZE 15

int main (int argc, char **argv) {
    pid_t pid;  // process ID returned by fork()
    int shmid;  // shared memory id
    char *shm;  // address of the attached shared memory

    if(argc != 2)
    {
        fprintf(stderr, "usage: %s string\n", argv[0]);
        return 1;
    }

    int pipefd[2];
    char buff;

    // create the pipe before the fork, so that the child
    // inherits both file descriptors
    if (pipe(pipefd) == -1) {
        perror("pipe");
        return 1;
    }

    // parent creates shared memory before the fork, child inherits
    // shmid; otherwise you would need to synchronize parent and child
    shmid = shmget(IPC_PRIVATE, SHM_SIZE, IPC_CREAT | 0666);
    if (shmid == -1) {
        perror("shmget");
        return 1;
    }

    pid = fork(); // creating child process

    if (pid < 0) {
        fprintf(stderr, "Fork Failed");
        return 1; // return -1 would be the same as return 255
    } else if (pid == 0) {
        shm = shmat(shmid, 0, 0);
        char *n = shm; // shm is already a char*
        printf("hello i am the child process. my pid is %d. what is your name?: ", getpid());
        if(scanf("%14s", n) != 1) // avoid buffer overflow
        {
            fprintf(stderr, "Child: cannot read from stdin\n");
            n[0] = 0; // 0-terminating
        }
        printf("\n");

        close(pipefd[1]); // child only reads from the pipe
        printf("pipe opened on child end");
        printf("\n");
        printf("Parent sends: ");
        fflush(stdout);
        while(read(pipefd[0], &buff, 1) > 0) {
            write(1, &buff, 1);
        }
        write(1, "\n", 1);
        close(pipefd[0]);
        printf("pipe successfully closed");
        printf("\n");
        exit(EXIT_SUCCESS);
    } else {
        shm = shmat(shmid, 0, 0);
        close(pipefd[0]); // parent only writes to the pipe
        printf("pipe open on parent end");
        printf("\n");
        write(pipefd[1], argv[1], strlen(argv[1]));
        close(pipefd[1]);
        printf("pipe successfully closed");
        // now we wait for the child to exit
        wait(NULL);
        printf("\nThis is Child's Parent. My pid is %d. Nice to me you %s.\n", getpid(), shm);
        printf("\n");
        // mark the segment for removal so it does not outlive the program
        shmctl(shmid, IPC_RMID, NULL);
        exit(EXIT_SUCCESS);
    }
    return 0;
}
And the output is:
$ ./b "message to child: stop playing video games!"
pipe open on parent end
hello i am the child process. my pid is 10969. what is your name?: Pablo
pipe opened on child end
Parent sends: message to child: stop playing video games!
pipe successfully closed
pipe successfully closed
This is Child's Parent. My pid is 10968. Nice to me you Pablo.
What causes a segmentation fault (core dump) to occur in C?
Two things I see. First, you're mixing chars with ints in the matrix array. Second, you're printing the elements of matrix to a file using the "%s" format. "%s" expects a pointer to a null-terminated string, whereas you are passing chars and ints. This causes printf to access memory that is out of bounds, hence the fault.
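The question's code is not reproduced here, so the following is only a sketch of the matching-specifier idea, assuming the matrix holds ints. The function name format_row is illustrative. Each int is printed with "%d", and "%s" is used only for an actual string (the separator).

```c
#include <stdio.h>

/* Formats one matrix row as space-separated integers into buf.
 * Returns the number of characters written (excluding the terminator),
 * or -1 if buf is too small. */
int format_row(char *buf, size_t len, const int *row, size_t cols)
{
    size_t used = 0;

    if (len == 0)
        return -1;
    buf[0] = '\0';
    for (size_t i = 0; i < cols; i++) {
        /* "%s" receives a real string, "%d" receives the int value. */
        int n = snprintf(buf + used, len - used, "%s%d",
                         i > 0 ? " " : "", row[i]);

        if (n < 0 || (size_t)n >= len - used)
            return -1;
        used += (size_t)n;
    }
    return (int)used;
}
```

A single char element would be printed with "%c" in the same way; passing either kind of value to "%s" makes printf treat it as an address.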