Difference Between Core and Core-File

Difference between core and core-file

Is there anyway I can do this in command line.

Yes: gdb /path/to/exe /path/to/core

My main interest is why does gdb behaves differently.

It doesn't.

Most UNIX systems, in order to conserve disk space, do not dump file-backed read-only pages (such as program code) into the core file -- that code is already on disk, so why write it again? (This is actually configurable: see man core and coredump_filter).

But these read-only pages contain symbols (what you see in nm and in backtrace output), so if you don't tell GDB where the executable file is, then it can't produce a meaningful backtrace.

After all, gdb does know the path to executable

No, it does not.

The kernel records incomplete info about the executable which produced the core. That info is unreliable:

It may record relative path, e.g. ./a.out and there is absolutely no guarantee that your current directory is the same at GDB analysis time as it was when the executable was invoked, and
There is only space for 16 characters in elf_prpsinfo.pr_fname[], and anything longer than that will be truncated.

What are core files by node.js

core.<number> files are typically memory dumps created on Linux systems. The <number> is the Process ID of the process that crashed.

I guess your Node.JS application has crashed a number of times and these are the memory dumps left there for you to debug.

Compare 2 GDB-Core Dumps

Independent from your underlying motivation, I'd like to get into your question. You ask how the difference between two core dumps can be identified. This is going to be lengthy, but will hopefully give you your answer.

A core dump is represented by an ELF file that contains metadata and a specific set of memory regions (on Linux, this can be controlled via /proc/[pid]/coredump_filter) that were mapped into the given process at the time of dump creation.

The obvious way to compare the dumps would be to compare a hex-representation:

$ diff -u <(hexdump -C dump1) <(hexdump -C dump2)
--- /dev/fd/63  2020-05-17 10:01:40.370524170 +0000
+++ /dev/fd/62  2020-05-17 10:01:40.370524170 +0000
@@ -90,8 +90,9 @@
 000005e0  00 00 00 00 00 00 00 00  00 00 00 00 80 1f 00 00  |................|
 000005f0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

The result is rarely useful because you're missing the context. More specifically, there's no straightforward way to get from the offset of a value change in the file to the offset corresponding to the process virtual memory address space.

So, more context if needed. The optimal output would be a list of VM addresses including before and after values.

Before we can get on that, we need a test scenario that loosely resembles yours. The following application includes a use-after-free memory issue that does not lead to a segmentation fault at first (a new allocation with the same size hides the issue). The idea here is to create a core dump using gdb (generate) during each phase based on break points triggered by the code:

dump1: Correct state
dump2: Incorrect state, no segmentation fault
dump3: Segmentation fault

The code:

#include <stdlib.h>
#include <unistd.h>
#include <signal.h>
#include <stdio.h>

int **g_state;

int main()
{
  int value = 1;
  g_state = malloc(sizeof(int*));
  *g_state = &value;
  if (g_state && *g_state) {
    printf("state: %d\n", **g_state);
  }
  printf("no corruption\n");
  raise(SIGTRAP);
  free(g_state);
  char **unrelated = malloc(sizeof(int*));
  *unrelated = "val";
  if (g_state && *g_state) {
    printf("state: %d\n", **g_state);
  }
  printf("use-after-free hidden by new allocation (invalid value)\n");
  raise(SIGTRAP);
  printf("use-after-free (segfault)\n");
  free(unrelated);
  int *unrelated2 = malloc(sizeof(intptr_t));
  *unrelated2 = 1;
  if (g_state && *g_state) {
    printf("state: %d\n", **g_state);
  }
  return 0;
}

Now, the dumps can be generated:

Starting program: test 
state: 1
no corruption

Program received signal SIGTRAP, Trace/breakpoint trap.
0x00007ffff7a488df in raise () from /lib64/libc.so.6
(gdb) generate dump1
Saved corefile dump1
(gdb) cont
Continuing.
state: 7102838
use-after-free hidden by new allocation (invalid value)

Program received signal SIGTRAP, Trace/breakpoint trap.
0x00007ffff7a488df in raise () from /lib64/libc.so.6
(gdb) generate dump2
Saved corefile dump2
(gdb) cont
Continuing.
use-after-free (segfault)

Program received signal SIGSEGV, Segmentation fault.
main () at test.c:31
31          printf("state: %d\n", **g_state);
(gdb) generate dump3
Saved corefile dump3

A quick manual inspection shows the relevant differences:

# dump1
(gdb) print g_state
$1 = (int **) 0x602260
(gdb) print *g_state
$2 = (int *) 0x7fffffffe2bc
# dump2
(gdb) print g_state
$1 = (int **) 0x602260
(gdb) print *g_state
$2 = (int *) 0x4008c1
# dump3
$2 = (int **) 0x602260
(gdb) print *g_state
$3 = (int *) 0x1

Based on that output, we can clearly see that *g_state changed but is still a valid pointer in dump2. In dump3, the pointer becomes invalid. Of course, we'd like to automate this comparison.

Knowing that a core dump is an ELF file, we can simply parse it and generate a diff ourselves. What we'll do:

Open a dump
Identify PROGBITS sections of the dump
Remember the data and address information
Repeat the process with the second dump
Compare the two data sets and print the diff

Based on elf.h, it's relatively easy to parse ELF files. I created a sample implementation that compares two dumps and prints a diff that is similar to comparing two hexdump outputs using diff. The sample makes some assumptions (x86_64, mappings either match in terms of address and size or they only exist in dump1 or dump2), omits most error handling and always chooses a simple implementation approach for the sake of brevity.

#include <elf.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>

#define MAX_MAPPINGS 1024

struct dump
{
  char *base;
  Elf64_Shdr *mappings[MAX_MAPPINGS];
};

unsigned readdump(const char *path, struct dump *dump)
{
  unsigned count = 0;
  int fd = open(path, O_RDONLY);
  if (fd != -1) {
    struct stat stat;
    fstat(fd, &stat);
    dump->base = mmap(NULL, stat.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    Elf64_Ehdr *header = (Elf64_Ehdr *)dump->base;
    Elf64_Shdr *secs = (Elf64_Shdr*)(dump->base+header->e_shoff);
    for (unsigned secinx = 0; secinx < header->e_shnum; secinx++) {
      if (secs[secinx].sh_type == SHT_PROGBITS) {
        if (count == MAX_MAPPINGS) {
          count = 0;
          break;
        }
        dump->mappings[count] = &secs[secinx];
        count++;
      }
    }
    dump->mappings[count] = NULL;
  }
  return count;
}

#define DIFFWINDOW 16

void printsection(struct dump *dump, Elf64_Shdr *sec, const char mode,
  unsigned offset, unsigned sizelimit)
{
  unsigned char *data = (unsigned char *)(dump->base+sec->sh_offset);
  uintptr_t addr = sec->sh_addr+offset;
  unsigned size = sec->sh_size;
  data += offset;
  if (sizelimit) {
    size = sizelimit;
  }
  unsigned start = 0;
  for (unsigned i = 0; i < size; i++) {
    if (i%DIFFWINDOW == 0) {
      printf("%c%016x ", mode, addr+i);
      start = i;
    }
    printf(" %02x", data[i]);
    if ((i+1)%DIFFWINDOW == 0 || i + 1 == size) {
      printf(" [");
      for (unsigned j = start; j <= i; j++) {
        putchar((data[j] >= 32 && data[j] < 127)?data[j]:'.');
      }
      printf("]\n");
    }
    addr++;
  }
}

void printdiff(struct dump *dump1, Elf64_Shdr *sec1,
  struct dump *dump2, Elf64_Shdr *sec2)
{
  unsigned char *data1 = (unsigned char *)(dump1->base+sec1->sh_offset);
  unsigned char *data2 = (unsigned char *)(dump2->base+sec2->sh_offset);
  unsigned difffound = 0;
  unsigned start = 0;
  for (unsigned i = 0; i < sec1->sh_size; i++) {
    if (i%DIFFWINDOW == 0) {
      start = i;
      difffound = 0;
    }
    if (!difffound && data1[i] != data2[i]) {
      difffound = 1;
    }
    if ((i+1)%DIFFWINDOW == 0 || i + 1 == sec1->sh_size) {
      if (difffound) {
        printsection(dump1, sec1, '-', start, DIFFWINDOW);
        printsection(dump2, sec2, '+', start, DIFFWINDOW);
      }
    }
  }
}

int main(int argc, char **argv)
{
  if (argc != 3) {
    fprintf(stderr, "Usage: compare DUMP1 DUMP2\n");
    return 1;
  }
  struct dump dump1;
  struct dump dump2;
  if (readdump(argv[1], &dump1) == 0 ||
      readdump(argv[2], &dump2) == 0) {
    fprintf(stderr, "Failed to read dumps\n");
    return 1;
  }
  unsigned sinx1 = 0;
  unsigned sinx2 = 0;
  while (dump1.mappings[sinx1] || dump2.mappings[sinx2]) {
    Elf64_Shdr *sec1 = dump1.mappings[sinx1];
    Elf64_Shdr *sec2 = dump2.mappings[sinx2];
    if (sec1 && sec2) {
      if (sec1->sh_addr == sec2->sh_addr) {
        // in both
        printdiff(&dump1, sec1, &dump2, sec2);
        sinx1++;
        sinx2++;
      }
      else if (sec1->sh_addr < sec2->sh_addr) {
        // in 1, not 2
        printsection(&dump1, sec1, '-', 0, 0);
        sinx1++;
      }
      else {
        // in 2, not 1
        printsection(&dump2, sec2, '+', 0, 0);
        sinx2++;
      }
    }
    else if (sec1) {
      // in 1, not 2
      printsection(&dump1, sec1, '-', 0, 0);
      sinx1++;
    }
    else {
      // in 2, not 1
      printsection(&dump2, sec2, '+', 0, 0);
      sinx2++;
    }
  } 
  return 0;
}

With the sample implementation, we can re-evaluate our scenario above. A except from the first diff:

$ ./compare dump1 dump2
-0000000000601020  86 05 40 00 00 00 00 00 50 3e a8 f7 ff 7f 00 00 [..@.....P>......]
+0000000000601020  00 6f a9 f7 ff 7f 00 00 50 3e a8 f7 ff 7f 00 00 [.o......P>......]
-0000000000602260  bc e2 ff ff ff 7f 00 00 00 00 00 00 00 00 00 00 [................]
+0000000000602260  c1 08 40 00 00 00 00 00 00 00 00 00 00 00 00 00 [..@.............]
-0000000000602280  6e 6f 20 63 6f 72 72 75 70 74 69 6f 6e 0a 00 00 [no corruption...]
+0000000000602280  75 73 65 2d 61 66 74 65 72 2d 66 72 65 65 20 68 [use-after-free h]
-0000000000602290  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [................]
+0000000000602290  69 64 64 65 6e 20 62 79 20 6e 65 77 20 61 6c 6c [idden by new all]

The diff shows that *gstate (address 0x602260) was changed from 0x7fffffffe2bc to 0x4008c1:

-0000000000602260  bc e2 ff ff ff 7f 00 00 00 00 00 00 00 00 00 00 [................]
+0000000000602260  c1 08 40 00 00 00 00 00 00 00 00 00 00 00 00 00 [..@.............]

The second diff with only the relevant offset:

$ ./compare dump1 dump2
-0000000000602260  c1 08 40 00 00 00 00 00 00 00 00 00 00 00 00 00 [..@.............]
+0000000000602260  01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [................]

The diff shows that *gstate (address 0x602260) was changed from 0x4008c1 to 0x1.

There you have it, a core dump diff. Now, whether or not that can prove to be useful in your scenario depends on various factors, one being the timeframe between the two dumps and the activity that takes place within that window. A large diff will possibly be difficult to analyze, so the aim must be to minimize its size by choosing the diff window carefully.

The more context you have, the easier the analysis will turn out to be. For example, the relevant scope of the diff could be reduced by limiting the diff to addresses of the .data and .bss sections of the library in question if changes in there are relevant to your situation.

Another approach to reduce the scope: excluding changes to memory that is not referenced by the library. The relationship between arbitrary heap allocations and specific libraries is not immediately apparent. Based on the the addresses of changes in your initial diff, you could search for pointers in the .data and .bss sections of the library right in the diff implementation. This does not take every possible reference into account (most notably indirect references from other allocations, register and stack references of library-owned threads), but it's a start.

What does a core dump of a C code mean ?

The extension is most often the process ID that crashed. You need to examine the file with a debug tool.

Difference between Jstack and gcore when generating Core Dumps?

GCore act at OS level and you got a dump of native code that is currently running. From a java point of view, it is not really understandable.

JStack get you the stack trace at VM level (the java stack) of all thread your application have. You can find from that what is the real java code executed at a point.

Clearly, GCore is almost never used (too low level, native code...). Only really strange issue with native library or stuff like that will perhaps need this kind of tool.

There is also jmap that can generate a hprof file which is the heap data from you VM. A tool like 'Memory Analyser Tool' can open the hprof, and you can drill down on what was going on (on memory side).
If your VM crash because of a OutOfMemory, you can also set parameter to get the hprof when the event occurs. It helps to understand why (too many users, A DB query that fetch too much data...)

Last thing is the fact that you can add a debug option when your start your VM, so that you can connect to it, and put debug on running process. It can help if you have some strange issue that you are not able to reproduce in your local environment.

Why are core dump files generated?

A process dumps core when it is terminated by the operating system due to a fault in the program. The most typical reason this occurs is because the program accessed an invalid pointer value. Given that you have a sporadic dump, it's likely that you are using an uninitialized pointer.

Can you post the code that is causing the fault? Other than vague generalizations it's hard to guess what's wrong without actually seeing code.

As for what a core dump actually is, check out this Wikipedia article:

http://en.wikipedia.org/wiki/Core_dump

What is a core dump file in Linux? What information does it provide?

It's basically the process address space in use (from the mm_struct structure which contains all the virtual memory areas), and any other supporting information^*a, at the time it crashed.

For example, let's say you try to dereference a NULL pointer and receive a SEGV signal, causing you to exit. As part of that process, the operating system tries to write your information to a file for later post-mortem analysis.

You can load the core file into a debugger along with the executable file (for symbols and other debugging information, for example) and poke around to try and discover what caused the problem.

^*a: in kernel version 2.6.38, fs/exec.c/do_coredump() is the one responsible for core dumps and you can see that it's passed the signal number, exit code and registers. It in turn passes the signal number and registers to a binary-format-specific (ELF, a.out, etc) dumper.

The ELF dumper is fs/binfmt_elf.c/elf_core_dump() and you can see that it outputs non-memory-based information, like thread details, in fs/binfmt_elf.c/fill_note_info(), then returns to output the process space.

Difference Between Core and Core-File