Core Dump Filename Gets Thread Name Instead of Executable Name with core_pattern %e.%p.core

Core dump file name truncated

The code for this can be found in the kernel source file exec.c.

The code copies the corename from the pattern up to the first percent sign (giving /cores/core.). At the percent sign it advances one character and processes the 'e'. The code that handles the 'e' writes the name with snprintf, taking it from the current->comm field.

This is the executable name (excluding path), truncated to TASK_COMM_LEN bytes. Since this is defined as 16 (at least in the kernel I looked at), SampleCrashApplication is truncated to 15 characters plus 1 for the null byte at the end, which explains why you get your truncated core dump name.
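To illustrate, here is a minimal user-space sketch of the expansion described above. It is not the kernel's code: expand_corename is a made-up name, only %e, %p and %% are handled, and TASK_COMM_LEN is hard-coded to the 16 mentioned above.

```c
#include <stdio.h>
#include <string.h>

#define TASK_COMM_LEN 16  /* value quoted above; assumed, not read from a kernel header */

/* Expand a core_pattern-like string into out (hypothetical helper).
 * exe_name is first cut down to TASK_COMM_LEN - 1 characters, imitating
 * what the kernel keeps in current->comm. Unknown specifiers are dropped. */
static void expand_corename(const char *pattern, const char *exe_name,
                            int pid, char *out, size_t outsz)
{
    char comm[TASK_COMM_LEN];
    size_t used = 0;

    if (outsz == 0)
        return;
    out[0] = '\0';
    snprintf(comm, sizeof comm, "%s", exe_name);  /* truncates to 15 chars + NUL */

    for (const char *p = pattern; *p != '\0' && used + 1 < outsz; p++) {
        int n = 0;

        if (*p != '%') {            /* literal character: copy it through */
            out[used++] = *p;
            out[used] = '\0';
            continue;
        }
        p++;                        /* step past '%' to the specifier */
        if (*p == '\0')
            break;
        switch (*p) {
        case 'e': n = snprintf(out + used, outsz - used, "%s", comm); break;
        case 'p': n = snprintf(out + used, outsz - used, "%d", pid);  break;
        case '%': n = snprintf(out + used, outsz - used, "%%");       break;
        default:  n = 0; break;     /* not handled in this sketch */
        }
        if (n < 0)
            break;
        if ((size_t)n >= outsz - used) {  /* output buffer is full */
            used = outsz - 1;
            break;
        }
        used += (size_t)n;
    }
}
```

With the pattern /cores/core.%e.%p, the name SampleCrashApplication and PID 1234, this yields /cores/core.SampleCrashAppl.1234.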

As to why this field truncates the name to TASK_COMM_LEN, that's a deeper question, but it's something internal to the kernel and it has been discussed elsewhere.

missing corefiles when SEGV occurs in thread different from main thread

For what it's worth: it had something to do with the core_pattern, which I found out with some trial and error.

core_pattern core                     -> corefile
core_pattern /opt/tmp/core            -> corefile
core_pattern /opt/tmp/core_%e.%p      -> no corefile
core_pattern /opt/tmp/core_%e         -> no corefile
core_pattern /opt/tmp/core_%h         -> corefile
core_pattern /opt/tmp/core_%h_%p      -> corefile
core_pattern /opt/tmp/core_%h_%p_%e   -> no corefile

So the %e seems to be the reason why sometimes no core is written.
The question "Core dump filename gets thread name instead of executable name with core_pattern %e.%p.core"
explains what is going on: %e is not the executable name but carries the name of the crashing thread, which in my case contains "/".

This also explains why a SEGV in different threads behaves differently, and why my simplest programs did not show the problem: they had no code that gives names to the threads.
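If you control the code that names the threads, one way around this is to keep "/" out of the names in the first place. A rough sketch, assuming Linux and prctl(PR_SET_NAME); the helper names are made up for illustration and the 16-byte limit is the TASK_COMM_LEN discussed earlier:

```c
#include <stdio.h>
#include <string.h>
#include <sys/prctl.h>

#define TASK_COMM_LEN 16  /* assumed thread-name buffer size, including the NUL */

/* Copy name into out, truncating to 15 characters and replacing every '/'
 * with '_' so the name is harmless when it ends up in a core file path. */
static void sanitize_thread_name(const char *name, char out[TASK_COMM_LEN])
{
    snprintf(out, TASK_COMM_LEN, "%s", name);
    for (char *c = out; *c != '\0'; c++)
        if (*c == '/')
            *c = '_';
}

/* Name the calling thread with the sanitized name.
 * Returns what prctl() returns: 0 on success, -1 on error. */
static int set_safe_thread_name(const char *name)
{
    char safe[TASK_COMM_LEN];

    sanitize_thread_name(name, safe);
    return prctl(PR_SET_NAME, safe, 0, 0, 0);
}
```

A thread that would have been called net/worker ends up as net_worker, so a pattern such as /opt/tmp/core_%e.%p no longer points into a directory that does not exist.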

Parse command line with uncertain number of arguments

Option 1: Change your core pattern to %p %s %e. Since %e is the only specifier whose expansion can contain whitespace, you can simply consider all the trailing arguments (i.e. argv[i] for i > 2) to make up the thread name.
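Option 1 could look roughly like this in C. It assumes the handler receives %p %s %e in that order, so argv[1] is the PID, argv[2] the signal and everything from argv[3] on belongs to the name; join_args is a hypothetical helper, not part of any library.

```c
#include <stdio.h>
#include <string.h>

/* Join argv[first] .. argv[argc - 1] with single spaces into out.
 * Returns 0 on success, -1 if the result does not fit into outsz bytes. */
static int join_args(int argc, char *const argv[], int first,
                     char *out, size_t outsz)
{
    size_t used = 0;

    if (outsz == 0)
        return -1;
    out[0] = '\0';

    for (int i = first; i < argc; i++) {
        int n = snprintf(out + used, outsz - used, "%s%s",
                         i > first ? " " : "", argv[i]);
        if (n < 0 || (size_t)n >= outsz - used)
            return -1;
        used += (size_t)n;
    }
    return 0;
}
```

In the handler's main() you would call join_args(argc, argv, 3, name, sizeof name) after reading the PID and signal from argv[1] and argv[2].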

Option 2: If you have multiple specifiers that may be replaced with whitespace (e.g. repeated instances of %e, or %h), you can add magic separators to your arguments which you hope will never appear as part of a thread name, and then look for those as you iterate over the arguments:

|store_dump MAGIC1 %p MAGIC2 %e MAGIC3
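A sketch of the lookup for Option 2, using the MAGIC1/MAGIC2/MAGIC3 markers from the pattern above. args_between is a hypothetical helper that collects whatever sits between two markers:

```c
#include <stdio.h>
#include <string.h>

/* Join the arguments found strictly between the markers begin and end
 * with single spaces. Returns 0 on success, -1 if either marker is
 * missing or the result does not fit into outsz bytes. */
static int args_between(int argc, char *const argv[],
                        const char *begin, const char *end,
                        char *out, size_t outsz)
{
    int i = 1;          /* argv[0] is the program name */
    size_t used = 0;

    if (outsz == 0)
        return -1;
    out[0] = '\0';

    while (i < argc && strcmp(argv[i], begin) != 0)
        i++;
    if (i == argc)
        return -1;      /* begin marker not found */

    for (i++; i < argc; i++) {
        int n;

        if (strcmp(argv[i], end) == 0)
            return 0;   /* reached the closing marker */
        n = snprintf(out + used, outsz - used, "%s%s",
                     used > 0 ? " " : "", argv[i]);
        if (n < 0 || (size_t)n >= outsz - used)
            return -1;
        used += (size_t)n;
    }
    return -1;          /* end marker not found */
}
```

For the arguments MAGIC1 4242 MAGIC2 my thread MAGIC3, asking for the span between MAGIC1 and MAGIC2 gives 4242, and the span between MAGIC2 and MAGIC3 gives "my thread".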

Neither option is perfect in the sense that any whitespace in the thread name is normalized, so you cannot reconstruct the actual name accurately. For example, you cannot distinguish threads that only differ in the length of their embedded whitespace runs.

per process configurable core dump directory

No, you cannot set it per process. The core file gets dumped either to the current working directory of the process, or the directory set in /proc/sys/kernel/core_pattern if the pattern includes a directory.

CoreDumpDirectory in Apache is a hack: Apache registers signal handlers for all signals that cause a core dump, and changes the current directory in its signal handler.

/* handle all varieties of core dumping signals */
static void sig_coredump(int sig)
{
    apr_filepath_set(ap_coredump_dir, pconf);
    apr_signal(sig, SIG_DFL);
#if AP_ENABLE_EXCEPTION_HOOK
    run_fatal_exception_hook(sig);
#endif
    /* linuxthreads issue calling getpid() here:
     *   This comparison won't match if the crashing thread is
     *   some module's thread that runs in the parent process.
     *   The fallout, which is limited to linuxthreads:
     *   The special log message won't be written when such a
     *   thread in the parent causes the parent to crash.
     */
    if (getpid() == parent_pid) {
        ap_log_error(APLOG_MARK, APLOG_NOTICE,
                     0, ap_server_conf,
                     "seg fault or similar nasty error detected "
                     "in the parent process");
        /* XXX we can probably add some rudimentary cleanup code here,
         * like getting rid of the pid file.  If any additional bad stuff
         * happens, we are protected from recursive errors taking down the
         * system since this function is no longer the signal handler   GLA
         */
    }
    kill(getpid(), sig);
    /* At this point we've got sig blocked, because we're still inside
     * the signal handler.  When we leave the signal handler it will
     * be unblocked, and we'll take the signal... and coredump or whatever
     * is appropriate for this particular Unix.  In addition the parent
     * will see the real signal we received -- whereas if we called
     * abort() here, the parent would only see SIGABRT.
     */
}
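The same trick can be reproduced without APR. The following is only a sketch of the idea, not what Apache ships: set_coredump_dir and on_fatal_signal are invented names, the list of signals is an assumption, and SA_RESETHAND plus raise() stand in for Apache's apr_signal(sig, SIG_DFL) and kill(getpid(), sig).

```c
#define _POSIX_C_SOURCE 200809L

#include <signal.h>
#include <string.h>
#include <unistd.h>

static char coredump_dir[4096];  /* directory to switch to before dumping */

/* Runs on a fatal signal: change directory, then let the default action
 * (terminate and dump core) happen from the new working directory. */
static void on_fatal_signal(int sig)
{
    /* chdir() and raise() are on the POSIX async-signal-safe list. */
    if (chdir(coredump_dir) != 0) {
        /* Nothing safe to report here; the core goes to the old cwd. */
    }
    /* SA_RESETHAND already restored the default disposition. The signal is
     * blocked while we are in the handler, so it fires once we return. */
    raise(sig);
}

/* Install the handler for the usual core-dumping signals.
 * Returns 0 on success, -1 on error. */
static int set_coredump_dir(const char *dir)
{
    static const int fatal[] = { SIGSEGV, SIGBUS, SIGABRT, SIGILL, SIGFPE };
    struct sigaction sa;

    if (strlen(dir) >= sizeof coredump_dir)
        return -1;
    strcpy(coredump_dir, dir);

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_fatal_signal;
    sa.sa_flags = SA_RESETHAND;
    sigemptyset(&sa.sa_mask);

    for (size_t i = 0; i < sizeof fatal / sizeof fatal[0]; i++)
        if (sigaction(fatal[i], &sa, NULL) != 0)
            return -1;
    return 0;
}
```

Call set_coredump_dir("/some/writable/dir") early in main(). This only has an effect when core_pattern is a relative name such as core; an absolute directory in core_pattern still wins.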

Core dump file is not generated

Make sure your current directory (at the time of the crash -- the server may change directories) is writable. If the server calls setuid, the directory has to be writable by that user.

Also check /proc/sys/kernel/core_pattern. That may redirect core dumps to another directory, and that directory must be writable. See man core for more.
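Both checks can be scripted. The sketch below is an illustration under a few assumptions: the helper names are invented, % specifiers inside the directory part are not expanded, and access() tests permissions for the user running the check, which is not necessarily the user the server runs as.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Read the current core pattern into buf. Returns 0 on success, -1 on error. */
int read_core_pattern(char *buf, size_t bufsz)
{
    FILE *f = fopen("/proc/sys/kernel/core_pattern", "r");

    if (f == NULL)
        return -1;
    if (fgets(buf, (int)bufsz, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';  /* strip the trailing newline */
    return 0;
}

/* Work out which directory a pattern writes to.
 * Returns 0 and fills out on success, 1 if the pattern pipes to a program,
 * -1 if the directory does not fit into outsz bytes. */
int core_target_dir(const char *pattern, char *out, size_t outsz)
{
    const char *slash;
    size_t len;

    if (pattern[0] == '|')
        return 1;
    slash = strrchr(pattern, '/');
    if (slash == NULL) {         /* no directory: current working directory */
        pattern = ".";
        len = 1;
    } else if (slash == pattern) {
        len = 1;                 /* file directly under "/" */
    } else {
        len = (size_t)(slash - pattern);
    }
    if (len >= outsz)
        return -1;
    memcpy(out, pattern, len);
    out[len] = '\0';
    return 0;
}

/* Returns 1 if the target directory is writable, 0 if it is not,
 * -1 if that cannot be decided (piped pattern or path too long). */
int core_dir_writable(const char *pattern)
{
    char dir[4096];

    if (core_target_dir(pattern, dir, sizeof dir) != 0)
        return -1;
    return access(dir, W_OK) == 0 ? 1 : 0;
}
```

For a pattern without a directory the check is made against ".", so run it from the directory the server is in when it crashes.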

How to change core pattern only for a particular application?

man core tells us:

Piping core dumps to a program

Since kernel 2.6.19, Linux supports an alternate syntax for the
/proc/sys/kernel/core_pattern file. If the first character of this
file is a pipe symbol (|), then the remainder of the line is
interpreted as a program to be executed. Instead of being written to
a disk file, the core dump is given as standard input to the program.

Note the following points:

  • The program must be specified using an absolute pathname (or a
    pathname relative to the root directory, /), and must immediately
    follow the '|' character.

  • The process created to run the program runs as user and group
    root.

  • Command-line arguments can be supplied to the program (since Linux
    2.6.24), delimited by white space (up to a total line length of
    128 bytes).

  • The command-line arguments can include any of the % specifiers
    listed above. For example, to pass the PID of the process that is
    being dumped, specify %p in an argument.

You can put a script there, e.g.

|/path/to/myscript %p %s %c

You can detect which process is triggering the core dump (from man core):

%%  a single % character
%p  PID of dumped process
%u  (numeric) real UID of dumped process
%g  (numeric) real GID of dumped process
%s  number of signal causing dump
%t  time of dump, expressed as seconds since the Epoch, 1970-01-01
    00:00:00 +0000 (UTC)
%h  hostname (same as nodename returned by uname(2))
%e  executable filename (without path prefix)
%E  pathname of executable, with slashes ('/') replaced by exclamation
    marks ('!').
%c  core file size soft resource limit of crashing process (since
    Linux 2.6.24)

Now all you have to do is "do the default thing" for processes other than your own.
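The handler does not have to be a script. Here is a rough C sketch of the routing part, with several assumptions: the pattern passes %p and %e as the first two arguments, "myapp" and /var/crash/myapp are placeholders for your own application and directory, and the "default thing" is approximated by writing a file named core into the crashed process's working directory via /proc/PID/cwd. It ignores the size limit that %c would supply.

```c
#include <stdio.h>
#include <string.h>

#define SPECIAL_EXE "myapp"             /* hypothetical executable name */
#define SPECIAL_DIR "/var/crash/myapp"  /* hypothetical target directory */

/* Decide where the core for (pid, exe) should go.
 * Returns 0 on success, -1 if the path does not fit into outsz bytes. */
static int choose_core_path(const char *pid, const char *exe,
                            char *out, size_t outsz)
{
    int n;

    if (strcmp(exe, SPECIAL_EXE) == 0)
        n = snprintf(out, outsz, "%s/core.%s", SPECIAL_DIR, pid);
    else  /* approximate the default: "core" in the process's cwd */
        n = snprintf(out, outsz, "/proc/%s/cwd/core", pid);
    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}

/* Copy everything from in to outf. Returns 0 on success, -1 on error. */
static int copy_stream(FILE *in, FILE *outf)
{
    char buf[4096];
    size_t n;

    while ((n = fread(buf, 1, sizeof buf, in)) > 0)
        if (fwrite(buf, 1, n, outf) != n)
            return -1;
    return ferror(in) ? -1 : 0;
}

/* Write the core arriving on in to the chosen path. */
int handle_core(const char *pid, const char *exe, FILE *in)
{
    char path[4096];
    FILE *outf;
    int rc;

    if (choose_core_path(pid, exe, path, sizeof path) != 0)
        return -1;
    outf = fopen(path, "wb");
    if (outf == NULL)
        return -1;
    rc = copy_stream(in, outf);
    if (fclose(outf) != 0)
        rc = -1;
    return rc;
}
```

A main() for the handler would check argc and then call handle_core(argv[1], argv[2], stdin). Keep in mind that %e is subject to the truncation and thread-name behaviour discussed earlier, so match on a name of at most 15 characters.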


