Core dump file name truncated
The code for this can be found in exec.c here.
The kernel copies the pattern into the core name up to the first percent sign (giving /cores/core.). At the percent sign it advances one character and processes the 'e'. The branch that handles 'e' appends the name with snprintf, taking it from the current->comm field.
This is the executable name (excluding the path) truncated to TASK_COMM_LEN. Since this is defined as 16 (at least in the kernel I looked at), SampleCrashApplication is cut down to 15 characters plus 1 null byte at the end, i.e. SampleCrashAppl, which explains the truncated core dump name.
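The truncation can be reproduced outside the kernel. The following is only a sketch in Python, assuming the pattern is /cores/core.%e and TASK_COMM_LEN is 16 as stated above; the install path /usr/local/bin and the helper names are made up for illustration.

```python
import os

# Value of TASK_COMM_LEN cited in the answer; other kernels could differ.
TASK_COMM_LEN = 16


def comm_name(executable_path: str) -> str:
    """Approximate current->comm: the basename cut to TASK_COMM_LEN - 1 bytes."""
    name = os.path.basename(executable_path).encode()
    # One byte of the 16-byte buffer is reserved for the terminating null.
    return name[:TASK_COMM_LEN - 1].decode(errors="ignore")


def core_name(executable_path: str) -> str:
    """Expand the assumed pattern /cores/core.%e for the given executable."""
    return "/cores/core." + comm_name(executable_path)


# Hypothetical install path, used only for the demonstration.
print(core_name("/usr/local/bin/SampleCrashApplication"))
# prints /cores/core.SampleCrashAppl
```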
As to why this field is limited to TASK_COMM_LEN, that's a deeper question, but it's something internal to the kernel and there's some discussion here.
Missing core files when SEGV occurs in a thread different from the main thread
For what it's worth, it had something to do with the core_pattern, which I found out with some trial and error:
core_pattern core -> corefile
core_pattern /opt/tmp/core -> corefile
core_pattern /opt/tmp/core_%e.%p -> no corefile
core_pattern /opt/tmp/core_%e -> no corefile
core_pattern /opt/tmp/core_%h -> corefile
core_pattern /opt/tmp/core_%h_%p -> corefile
core_pattern /opt/tmp/core_%h_%p_%e -> no corefile
So the %e seems to be the reason why sometimes no core is written.
Then the question "Core dump filename gets thread name instead of executable name with core_pattern %e.%p.core" explains what is going on: %e is not the executable name but the name of the crashing thread, which in my case contains a "/".
This also explains why a SEGV in different threads behaves differently, and why my simplest programs did not show the problem: they had no code that gives names to the threads.
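To see why a "/" in the thread name matters, it helps to expand the pattern by hand. This is a rough sketch in Python, assuming the kernel inserts the name unescaped (as the observations above suggest); the thread name worker/0, the PID and the hostname are invented values.

```python
import os


def expand_core_pattern(pattern: str, values: dict) -> str:
    """Substitute %-specifiers from `values`, without any escaping."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "%" and i + 1 < len(pattern):
            spec = pattern[i + 1]
            # "%%" yields a literal percent sign; unknown specifiers expand to "".
            out.append("%" if spec == "%" else str(values.get(spec, "")))
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


# Hypothetical values for a crash in a thread named "worker/0".
values = {"e": "worker/0", "p": 1234, "h": "myhost"}
for pattern in ("/opt/tmp/core_%h_%p", "/opt/tmp/core_%e.%p"):
    path = expand_core_pattern(pattern, values)
    print(path, "-> directory:", os.path.dirname(path))
# prints /opt/tmp/core_myhost_1234 -> directory: /opt/tmp
# prints /opt/tmp/core_worker/0.1234 -> directory: /opt/tmp/core_worker
```

With %e in the pattern the target directory becomes /opt/tmp/core_worker, which does not exist, so there is nowhere to write the core file.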
Parse command line with uncertain number of arguments
Option 1: Change your core pattern to %p %s %e. Since %e is the only specifier whose expansion can contain whitespace, you can simply consider all the trailing arguments (i.e. argv[i] for i > 2) to make up the thread name.
Option 2: If you have multiple specifiers whose expansion may contain whitespace (e.g. repeated instances of %e, or %h), you can add magic separators to your arguments which you hope will never appear as part of a thread name, and then look for those as you iterate over the arguments:
|store_dump MAGIC1 %p MAGIC2 %e MAGIC3
Neither option is perfect in the sense that any whitespace in the thread name is normalized, so you cannot reconstruct the actual name accurately. For example, you cannot distinguish threads that only differ in the length of their embedded whitespace runs.
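Both options could be parsed along the following lines. This is a sketch in Python rather than a finished handler: the function names are hypothetical, the argument layout is taken from the patterns above, and the whitespace normalization just described still applies.

```python
def parse_trailing_name(argv):
    """Option 1: handler invoked with the arguments `%p %s %e`."""
    if len(argv) < 3:
        raise ValueError("expected at least a PID and a signal number")
    pid, signal_number = int(argv[1]), int(argv[2])
    # Everything after the signal number belongs to the thread name.
    thread_name = " ".join(argv[3:])
    return pid, signal_number, thread_name


def parse_with_separators(argv, separators=("MAGIC1", "MAGIC2", "MAGIC3")):
    """Option 2: handler invoked as `store_dump MAGIC1 %p MAGIC2 %e MAGIC3`."""
    fields = []
    current = None              # words collected since the last separator
    expected = list(separators)
    for arg in argv[1:]:
        if expected and arg == expected[0]:
            expected.pop(0)
            if current is not None:
                fields.append(" ".join(current))
            current = []
        elif current is not None:
            current.append(arg)
    if expected:
        raise ValueError("missing separator: " + expected[0])
    return fields


print(parse_trailing_name(["handler", "1234", "11", "my", "thread"]))
# prints (1234, 11, 'my thread')
print(parse_with_separators(
    ["store_dump", "MAGIC1", "1234", "MAGIC2", "my", "thread", "MAGIC3"]))
# prints ['1234', 'my thread']
```

The separators are matched in order, so a thread name that happens to contain the next expected separator would still confuse the parser, which is the weakness the answer already points out.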
Per-process configurable core dump directory
No, you cannot set it per process. The core file gets dumped either to the current working directory of the process, or the directory set in /proc/sys/kernel/core_pattern if the pattern includes a directory.
CoreDumpDirectory in Apache is a hack: Apache registers signal handlers for all signals that cause a core dump, and changes the current directory in its signal handler.
/* handle all varieties of core dumping signals */
static void sig_coredump(int sig)
{
    apr_filepath_set(ap_coredump_dir, pconf);
    apr_signal(sig, SIG_DFL);
#if AP_ENABLE_EXCEPTION_HOOK
    run_fatal_exception_hook(sig);
#endif
    /* linuxthreads issue calling getpid() here:
     *   This comparison won't match if the crashing thread is
     *   some module's thread that runs in the parent process.
     *   The fallout, which is limited to linuxthreads:
     *   The special log message won't be written when such a
     *   thread in the parent causes the parent to crash.
     */
    if (getpid() == parent_pid) {
        ap_log_error(APLOG_MARK, APLOG_NOTICE,
                     0, ap_server_conf,
                     "seg fault or similar nasty error detected "
                     "in the parent process");
        /* XXX we can probably add some rudimentary cleanup code here,
         * like getting rid of the pid file.  If any additional bad stuff
         * happens, we are protected from recursive errors taking down the
         * system since this function is no longer the signal handler   GLA
         */
    }
    kill(getpid(), sig);
    /* At this point we've got sig blocked, because we're still inside
     * the signal handler.  When we leave the signal handler it will
     * be unblocked, and we'll take the signal... and coredump or whatever
     * is appropriate for this particular Unix.  In addition the parent
     * will see the real signal we received -- whereas if we called
     * abort() here, the parent would only see SIGABRT.
     */
}
Core dump file is not generated
Make sure your current directory (at the time of the crash; the server may change directories) is writable. If the server calls setuid, the directory has to be writable by that user.
Also check /proc/sys/kernel/core_pattern. That may redirect core dumps to another directory, and that directory must be writable. More info here.
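The two checks can be combined in a small script. This is only a sketch in Python with hypothetical function names. It does not expand % specifiers inside the directory part of the pattern, and it tests writability for the user running the script, which may differ from the user the server switches to.

```python
import os


def core_dump_directory(core_pattern: str, cwd: str):
    """Return the directory a core file would land in, or None if piped."""
    pattern = core_pattern.strip()
    if pattern.startswith("|"):
        return None  # the dump goes to a program, not to a file
    directory = os.path.dirname(pattern)
    # A pattern without an absolute directory is resolved against the
    # working directory of the crashing process.
    return os.path.normpath(os.path.join(cwd, directory))


def is_writable_directory(path: str) -> bool:
    """True if `path` is a directory the current user can create files in."""
    return os.path.isdir(path) and os.access(path, os.W_OK | os.X_OK)


def check_core_target(pattern_file="/proc/sys/kernel/core_pattern"):
    """Describe where core files would go for the current working directory."""
    with open(pattern_file) as handle:
        pattern = handle.read()
    target = core_dump_directory(pattern, os.getcwd())
    if target is None:
        return "core dumps are piped to a program"
    state = "writable" if is_writable_directory(target) else "NOT writable"
    return f"{target} is {state}"
```

On a Linux machine, print(check_core_target()) run from the server's working directory reports which of the two cases applies.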
How to change core pattern only for a particular application?
man core tells us:
Piping core dumps to a program
Since kernel 2.6.19, Linux supports an alternate syntax for the /proc/sys/kernel/core_pattern file. If the first character of this file is a pipe symbol (|), then the remainder of the line is interpreted as a program to be executed. Instead of being written to a disk file, the core dump is given as standard input to the program.
Note the following points:
* The program must be specified using an absolute pathname (or a pathname relative to the root directory, /), and must immediately follow the '|' character.
* The process created to run the program runs as user and group root.
* Command-line arguments can be supplied to the program (since Linux 2.6.24), delimited by white space (up to a total line length of 128 bytes).
* The command-line arguments can include any of the % specifiers listed above. For example, to pass the PID of the process that is being dumped, specify %p in an argument.
You can put a script there, e.g.:
|/path/to/myscript %p %s %c
You can detect which process is triggering the core dump with these specifiers (from man core):
%%  a single % character
%p  PID of dumped process
%u  (numeric) real UID of dumped process
%g  (numeric) real GID of dumped process
%s  number of signal causing dump
%t  time of dump, expressed as seconds since the Epoch, 1970-01-01 00:00:00 +0000 (UTC)
%h  hostname (same as nodename returned by uname(2))
%e  executable filename (without path prefix)
%E  pathname of executable, with slashes ('/') replaced by exclamation marks ('!')
%c  core file size soft resource limit of crashing process (since Linux 2.6.24)
Now all you have to do is "do the default thing" for processes other than your own.
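Such a handler might look like the sketch below. It is written in Python under several assumptions that are not in the original answer: the pattern is |/path/to/myscript %p %e, the application's comm name is myapp, the two target directories are placeholders, and "the default thing" is approximated by writing into one fixed directory.

```python
import os
import shutil

MY_APP = "myapp"                  # hypothetical comm name of your own program
MY_APP_DIR = "/var/crash/myapp"   # hypothetical directory for its core files
DEFAULT_DIR = "/var/crash"        # hypothetical directory for everything else


def choose_destination(exe_name, pid, my_app=MY_APP,
                       my_dir=MY_APP_DIR, default_dir=DEFAULT_DIR):
    """Pick the core file path depending on which program crashed."""
    directory = my_dir if exe_name == my_app else default_dir
    # A name containing "/" must not introduce extra path components.
    safe_name = exe_name.replace("/", "!")
    return os.path.join(directory, f"core.{safe_name}.{pid}")


def store_core(stream, destination):
    """Copy the core dump from the binary stream to the destination file."""
    os.makedirs(os.path.dirname(destination), exist_ok=True)
    with open(destination, "wb") as out:
        shutil.copyfileobj(stream, out)


def main(argv, stream):
    """argv follows the assumed pattern: script path, %p, then the words of %e."""
    pid, exe_name = argv[1], " ".join(argv[2:])
    store_core(stream, choose_destination(exe_name, pid))

# When installed as the pipe handler, the script would end with:
#     main(sys.argv, sys.stdin.buffer)
```

Keeping %e as the last argument means a name containing whitespace simply arrives as extra trailing arguments. Since %e is limited to 15 characters, MY_APP has to be the truncated name if the real one is longer.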