"Unexplainable" Core Dump

Unexplainable core dump

So, unlikely as it may seem, we appear to have hit an actual bona fide CPU bug.

http://support.amd.com/us/Processor_TechDocs/41322_10h_Rev_Gd.pdf has erratum #721:

721 Processor May Incorrectly Update Stack Pointer

Description

Under a highly specific and detailed set of internal timing conditions,
the processor may incorrectly update the stack pointer after a long series
of push and/or near-call instructions, or a long series of pop
and/or near-return instructions. The processor must be in 64-bit mode for
this erratum to occur.

Potential Effect on System

The stack pointer value jumps by a value of approximately 1024, either in
the positive or negative direction.
This incorrect stack pointer causes unpredictable program or system behavior,
usually observed as a program exception or crash (for example, a #GP or #UD).

Analyzing core dump generated by multiple applications with gdb

I think you cannot achieve what you want with a single invocation of gdb. But you could run gdb twice, in different terminal windows. I did that more than once, and it works quite well (except of course that your own brain could be slightly overloaded).

A gdb process can debug only one program at a time, with a single debugged process or (for post-mortem debugging) a single core file.

And a given core file is produced by the abnormal termination of one single process (not several), so I don't understand your question.

Apparently, you have a crash in some execution of Python, probably caused by your faulty C code. I suggest getting a debuggable variant of Python, perhaps by installing the python3-all-dbg package or something similar, then using gdb on it. Of course, compile the C code you plug into Python with debugging enabled (-g). Perhaps you violated some invariant of the Python garbage collector.

How to generate a core dump in Linux on a segmentation fault?

This depends on what shell you are using. If you are using bash, then the ulimit builtin controls several settings relating to program execution, such as whether your programs may dump core. If you type

ulimit -c unlimited

then that will tell bash that its programs can dump cores of any size. You can specify a numeric limit instead of unlimited if you want (bash interprets it as a number of 1024-byte blocks), but in practice this shouldn't be necessary, since the size of core files will probably never be an issue for you.

In tcsh, you'd type

limit coredumpsize unlimited
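A quick sanity check that the limit actually took effect in the current session (bash shown; where the resulting core file lands is a separate, system-wide setting):

```shell
# Raise the core-file size limit for this bash session and its children.
ulimit -c unlimited

# Verify the soft limit now in effect.
ulimit -c                              # prints "unlimited"

# Where the kernel writes cores is controlled elsewhere; on systemd
# distributions this often pipes cores to systemd-coredump instead
# of writing a file in the working directory.
cat /proc/sys/kernel/core_pattern
```

Note that the limit only applies to processes started from this shell; it is not a system-wide switch.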

Dumping only stack trace in linux core dumps

You can set /proc/$PID/coredump_filter to 0x10. The value is a bit mask: each bit selects a class of memory mapping to include in the dump, and 0x10 sets only bit 4 ("dump ELF headers").

See the core(5) man page for the full bit list: http://man7.org/linux/man-pages/man5/core.5.html
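As a sketch: the mask is inherited by child processes, so setting it on your shell covers programs you launch from it ($$ expands to the shell's own PID):

```shell
# Restrict which mappings get dumped for this shell and its children.
# 0x10 keeps only bit 4 ("dump ELF headers"); see core(5) for all bits.
echo 0x10 > /proc/$$/coredump_filter

# The kernel reports the mask back as zero-padded hex.
cat /proc/$$/coredump_filter           # prints "00000010"
```

The setting is per-process (and inherited), so a long-running daemon needs it written into its own /proc/$PID/coredump_filter, not your shell's.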

Large unexplained memory in the memory dump of a .NET process

After investigation, the problem turned out to be heap fragmentation caused by pinned buffers. I'll explain how to investigate it and what pinned buffers are.

All the profilers I used agreed that most of the heap is free. Next I needed to look at fragmentation, which can be done with WinDbg, for example:

!dumpheap -stat

Then I looked at the "Fragmented blocks larger than..." section. WinDbg showed objects lying between the free blocks, making compaction impossible. Then I looked at what was holding these objects and whether they were pinned; here, for example, is the object at address 0000000bfaf93b80:

!gcroot 0000000bfaf93b80

It displays the reference graph:

00000004082945e0 (async pinned handle)
-> 0000000535b3a3e0 System.Threading.OverlappedData
-> 00000006f5266d38 System.Threading.IOCompletionCallback
-> 0000000b35402220 System.Net.Sockets.SocketAsyncEventArgs
-> 0000000bf578c850 System.Net.Sockets.Socket
-> 0000000bf578c900 System.Net.SocketAddress
-> 0000000bfaf93b80 System.Byte[]

00000004082e2148 (pinned handle)
-> 0000000bfaf93b80 System.Byte[]

The last two lines tell you the object is pinned.

Pinned objects are buffers that can't be moved because their address is shared with unmanaged code; here you can guess it is the system TCP layer. When managed code needs to pass the address of a buffer to external code, it must "pin" the buffer so that the address remains valid: the GC cannot move it.

These buffers, while being a very small part of the memory, make compaction impossible and thus cause a large memory "leak", even if it is not exactly a leak but rather a fragmentation problem. This can happen on the LOH or on the generational heaps just the same. Now the question is what causes these pinned objects to live forever: finding that means finding the root cause of the fragmentation.

You can read similar questions here:

  • https://ayende.com/blog/181761-C/the-curse-of-memory-fragmentation

  • .NET deletes pinned allocated buffer (good explanation of pinned objects in the answer)

Note: the root cause was a third-party library, AerospikeClient, using the .NET async Socket API, which is known for pinning the buffers handed to it. While AerospikeClient properly used a buffer pool, that pool was re-created whenever their client was re-created. Since we re-created their client every hour instead of creating it once, the pool was rebuilt each time, causing a growing number of pinned buffers and, in turn, unbounded fragmentation. What remains unclear is why the old buffers are never unpinned once transmission is over, or at least when the client is disposed.

Identify concrete type of object behind auto_ptr from core dump

What I'd really like to know is whether the pointer points to an IBar or an IBaz

GDB should be able to tell you that. Use (gdb) set print object on. From the documentation:

When displaying a pointer to an object, identify the actual (derived)
type of the object rather than the declared type, using the virtual
function table. Note that the virtual function table is required—this
feature can only work for objects that have run-time type
identification; a single virtual method in the object's declared type
is sufficient.

Update:

it only outputs the IFoo* interface

That likely means that the pointer really is pointing to an IFoo (e.g. the object that was of type IBar or IBaz has already been destroyed).

Would working with dynamic_cast imply

Yes, dynamic_cast can't work without RTTI; if you are using dynamic_cast, print object on should just work.


