Examining C/C++ Heap Memory Statistics in GDB

@fd - the RedHat bug had your answer.

The mallinfo function has been deprecated and won't be updated. A true query-stats API is TBD. Today, you have malloc_stats and malloc_info. I can't find any documentation on either one, but here's what they give you.

Is this close enough to what you need?

(gdb) call malloc_stats()
Arena 0:
system bytes = 135168
in use bytes = 96
Total (incl. mmap):
system bytes = 135168
in use bytes = 96
max mmap regions = 0
max mmap bytes = 0

(gdb) call malloc_info(0, stdout)
<malloc version="1">
<heap nr="0">
<sizes>
<unsorted from="1228788" to="1229476" total="3917678" count="3221220448"/>
</sizes>
<total type="fast" count="0" size="0"/>
<total type="rest" count="3221220448" size="3917678"/>
<system type="current" size="135168"/>
<system type="max" size="135168"/>
<aspace type="total" size="135168"/>
<aspace type="mprotect" size="135168"/>
</heap>
<total type="fast" count="0" size="0"/>
<total type="rest" count="3221220448" size="3917678"/>
<system type="current" size="135168
/>
<system type="max" size="135168
/>
<aspace type="total" size="135168"/>
<aspace type="mprotect" size="135168"/>
</malloc>

Report how much memory on stack/heap is being used by an object? (GDB)

If you're prepared to use glibc-specific functions, you can call mallinfo() directly within your program to answer the question:

#include <cstdlib>
#include <vector>
#include <string>
#include <iostream>
#include <malloc.h>

int main(){
    std::cout << "Using: " << mallinfo().uordblks << "\n";

    std::vector<std::string> vec{"11","22","33"};

    std::cout << "Using: " << mallinfo().uordblks << "\n";

    return EXIT_SUCCESS;
}

Examine how the variables are allocated in the heap memory (for debugging runtime errors)

Is there way to show how variables are allocated in the heap?

Yes: you can examine the locations that the vector will use in a debugger. For example, using your program and GDB:

(gdb) start
Temporary breakpoint 1 at 0x1185: file t.cc, line 3.
Starting program: /tmp/a.out

Temporary breakpoint 1, main (argc=1, argv=0x7fffffffdbb8) at t.cc:3
3 std::vector<int> foo(10, 0);
(gdb) n
4 std::vector<int> bar(10, 1);
(gdb) p/r foo
$1 = {<std::_Vector_base<int, std::allocator<int> >> = {_M_impl = {<std::allocator<int>> = {<__gnu_cxx::new_allocator<int>> = {<No data fields>}, <No data fields>}, <std::_Vector_base<int, std::allocator<int> >::_Vector_impl_data> = {_M_start = 0x55555556aeb0, _M_finish = 0x55555556aed8, _M_end_of_storage = 0x55555556aed8}, <No data fields>}}, <No data fields>}
(gdb) n
5 for(int i = 0; i < 20; i++) {
(gdb) p/r bar
$2 = {<std::_Vector_base<int, std::allocator<int> >> = {_M_impl = {<std::allocator<int>> = {<__gnu_cxx::new_allocator<int>> = {<No data fields>}, <No data fields>}, <std::_Vector_base<int, std::allocator<int> >::_Vector_impl_data> = {_M_start = 0x55555556aee0, _M_finish = 0x55555556af08, _M_end_of_storage = 0x55555556af08}, <No data fields>}}, <No data fields>}

Here you can see that foo uses locations 0x55555556aeb0 up to (but not including) 0x55555556aed8, i.e. 40 bytes for 10 ints, and bar uses 0x55555556aee0 up to (but not including) 0x55555556af08.

Note however that these locations may change from run to run, especially in multi-threaded programs, making this technique very difficult to use.

If your problem can be found by the Address Sanitizer, that would be a significantly faster and more reliable approach.

Is it possible to monitor which function break certain memory accidentally?

Yes: that's what watchpoints are for. For example, we don't expect the location pointed to by foo._M_impl._M_end_of_storage to be changed, so we can set a watchpoint on it:

(gdb) start
Temporary breakpoint 1 at 0x1185: file t.cc, line 3.
Starting program: /tmp/a.out

Temporary breakpoint 1, main (argc=1, argv=0x7fffffffdbb8) at t.cc:3
3 std::vector<int> foo(10, 0);
(gdb) n
4 std::vector<int> bar(10, 1);
(gdb) n
5 for(int i = 0; i < 20; i++) {

(gdb) watch *(int*)0x55555556aed8
Hardware watchpoint 2: *(int*)0x55555556aed8

(gdb) c
Continuing.

Hardware watchpoint 2: *(int*)0x55555556aed8

Old value = 49
New value = 42
main (argc=1, argv=0x7fffffffdbb8) at t.cc:5
5 for(int i = 0; i < 20; i++) {

(gdb) p i
$1 = 10 <-- voila, found the place where overflow happened.

How to examine the heap and stack of an RTEMS application using gdb?

For the heap you can investigate the Heap_Control structure (defined at cpukit/score/include/rtems/score/heap.h) and the two variables, RTEMS_Malloc_Heap and _Workspace_Area. In particular you seem interested in the Heap_Control.area_begin and Heap_Control.area_end fields. The _Workspace_Area can be part of the heap or a separate memory region, and it holds the kernel data structures. The RTEMS_Malloc_Heap points to the Heap_Control describing the traditional C program heap.

For the stack, you can look at the Thread_Start_information structure (defined at cpukit/score/include/rtems/score/thread.h) contained in the Thread_Control associated with the thread whose stack you want to examine. You can get a pointer to the executing thread with the _Thread_Executing macro.

how to use gdb to explore the stack/heap?

My first approach to using GDB for debugging is to set up breakpoints. This is done like so:

prompt> gdb ./x_bstree #the compiled executable (built with -g), not the .c source
(gdb) #prompt
(gdb) b 123 #break at line 123
(gdb) r #start program

Now your program halts at line 123. You can then examine variables on the stack or heap using print. For stack variables just use print <varname>. For heap variables (reached through pointers) use print *<varname>. I'm not sure there is anything else special to do for examining stack/heap variables.

Of course, to debug multi-threaded applications you would need to make them run in single-threaded mode and then debug. Otherwise it becomes difficult to predict what's happening.

For anything else there is extensive documentation of gdb & many sites also provide gdb cheat sheets.

GDB and Assembly: how to examine consts variables defined in heap?

Your understanding that str resides on the heap is not correct. It's a global variable, which is stored in the data segment. As for printing the global variable, here is how it looks on my GNU/Linux terminal.

$ gcc -g -Wall hello.c
$ gdb -q ./a.out
Reading symbols from /home/mantosh/practice/a.out...done.
(gdb) break main
Breakpoint 1 at 0x400524: file hello.c, line 6.
(gdb) run
Starting program: /home/mantosh/practice/a.out

Breakpoint 1, main () at bakwas.c:6
6 printf("%s",str);
(gdb) disassemble main
Dump of assembler code for function main:
0x0000000000400520 <+0>: push %rbp
0x0000000000400521 <+1>: mov %rsp,%rbp
=> 0x0000000000400524 <+4>: mov $0x601020,%esi
0x0000000000400529 <+9>: mov $0x4005e4,%edi
0x000000000040052e <+14>: mov $0x0,%eax
0x0000000000400533 <+19>: callq 0x4003f0 <printf@plt>
0x0000000000400538 <+24>: mov $0x0,%eax
0x000000000040053d <+29>: pop %rbp
0x000000000040053e <+30>: retq
End of assembler dump.

(gdb) p str
$1 = "justatest\000\000\000\000\000"
(gdb) p &str
$2 = (char (*)[15]) 0x601020

// These are the addresses of the two arguments passed to printf.
// The assembly above confirms that they are loaded into registers
// (%esi and %edi) before printf is called.
(gdb) x/s 0x4005e4
0x4005e4: "%s"
(gdb) x/s 0x601020
0x601020 <str>: "justatest"

examining virtual memory block reported by pmap

putting that addr on gdb dint help.

I don't know what you mean by "putting that addr on gdb", but doing that correctly will help.

I heard some address randomization is used by gdb.

You heard wrong: GDB doesn't do any randomization by itself, and it (by default) disables randomization that OS performs, so as to make debugging easier and more reproducible.

Can some one help me how to get the symbol that corresponds to the memory location reported by pmap output.

You are confused: heap allocated memory doesn't have any symbols by definition.

OK, so let's work through an example of examining memory that is visible in pmap with GDB. Let's start by compiling this program, which builds a linked list of 1 million nodes with some strings in it:

#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>

typedef struct Node { struct Node *next; char payload[64]; } Node;

int main()
{
    int j;
    Node *head = NULL;

    for (j = 0; j < 1000000; j++) {
        Node *n = malloc(sizeof(*n));
        n->next = head;
        sprintf(n->payload, "string %d", j);
        head = n;
    }
    return 0;
}

gcc -Wall -g -std=c99 t.c && gdb -q ./a.out

(gdb) b 17
Breakpoint 1 at 0x4005e3: file t.c, line 17.
(gdb) r
Starting program: /tmp/a.out

Breakpoint 1, main () at t.c:17
17 return 0;

Now we can examine the program with pmap:

(gdb) info prog
Using the running image of child process 23785.
Program stopped at 0x4005e3.
It stopped at breakpoint 1.
Type "info stack" or "info registers" for more information.
(gdb) shell pmap 23785
23785: /tmp/a.out
0000000000400000 4K r-x-- a.out
0000000000600000 4K r---- a.out
0000000000601000 4K rw--- a.out
0000000000602000 78144K rw--- [ anon ]
00007ffff7a11000 1784K r-x-- libc-2.19.so
00007ffff7bcf000 2048K ----- libc-2.19.so
00007ffff7dcf000 16K r---- libc-2.19.so
00007ffff7dd3000 8K rw--- libc-2.19.so
00007ffff7dd5000 20K rw--- [ anon ]
00007ffff7dda000 140K r-x-- ld-2.19.so
00007ffff7fd1000 12K rw--- [ anon ]
00007ffff7ff6000 8K rw--- [ anon ]
00007ffff7ff8000 8K r---- [ anon ]
00007ffff7ffa000 8K r-x-- [ anon ]
00007ffff7ffc000 4K r---- ld-2.19.so
00007ffff7ffd000 4K rw--- ld-2.19.so
00007ffff7ffe000 4K rw--- [ anon ]
00007ffffffde000 132K rw--- [ stack ]
ffffffffff600000 4K r-x-- [ anon ]
total 82356K

It seems pretty clear that the anon region of 78144K (about 76 MiB) starting at 0x602000 must be where most of our data is. (You can also verify this by stepping through the loop a few times.)

How can we look at this data? Like so:

(gdb) x/30gx 0x602000
0x602000: 0x0000000000000000 0x0000000000000051
0x602010: 0x0000000000000000 0x3020676e69727473
0x602020: 0x0000000000000000 0x0000000000000000
0x602030: 0x0000000000000000 0x0000000000000000
0x602040: 0x0000000000000000 0x0000000000000000
0x602050: 0x0000000000000000 0x0000000000000051
0x602060: 0x0000000000602010 0x3120676e69727473
0x602070: 0x0000000000000000 0x0000000000000000
0x602080: 0x0000000000000000 0x0000000000000000
0x602090: 0x0000000000000000 0x0000000000000000
0x6020a0: 0x0000000000000000 0x0000000000000051
0x6020b0: 0x0000000000602060 0x3220676e69727473
0x6020c0: 0x0000000000000000 0x0000000000000000
0x6020d0: 0x0000000000000000 0x0000000000000000
0x6020e0: 0x0000000000000000 0x0000000000000000

Immediately you can notice that at 0x602018, at 0x602068 and at 0x6020b8 there are ASCII strings.

You can examine these strings like so:

(gdb) x/s 0x602018
0x602018: "string 0"
(gdb) x/s 0x602068
0x602068: "string 1"
(gdb) x/s 0x6020b8
0x6020b8: "string 2"

You can also notice that at 0x602060 there is a pointer to 0x602010, and at 0x6020b0 there is a pointer to 0x602060.

That gives you a guess that there is a Node at 0x602060, and another at 0x6020b0. You can confirm this guess:

(gdb) p *(Node*)0x602060
$1 = {next = 0x602010, payload = "string 1", '\000' <repeats 55 times>}
(gdb) p *(Node*)0x6020b0
$2 = {next = 0x602060, payload = "string 2", '\000' <repeats 55 times>}

And that's all there is to it.

How is memory arranged in stack, heap?

GCC uses a default stack alignment of 16 bytes (see the -mpreferred-stack-boundary option). This is why your pointer variables (which are on the stack) are aligned to 16 bytes.

The variable st is a structure and will be packed according to what the compiler thinks is most efficient but generally bytes will pack without padding. You have put 4 arrays of 4 bytes in your structure, so no padding is needed. Therefore each entry is 4-byte 'aligned'. Note that st itself still starts on a 16-byte boundary, even though its elements do not.

(If you have a mix of types in your structure, the compiler will pad them to ensure word alignment, although you can use an attribute to turn off padding if you have a good reason - e.g. defining some kind of comms stack)

The way memory is allocated on the heap (i.e. by malloc) depends on the heap allocation strategy of the system (the C library and the OS). Different allocation strategies are possible, but this is a big topic (see: https://en.wikipedia.org/wiki/C_dynamic_memory_allocation).

how to interpret stack memory from gdb

(gdb) x/10x $sp
0xffeac63c: 0xf7d39cba 0xf7d3c0d8 0xf7d3c21b 0x00000001
0xffeac64c: 0xf78d133f 0xffeac6f4 0xf7a14450 0xffeac678
0xffeac65c: 0x00000000 0xf7d3790e

The first column is the memory address of the first byte shown on that line.

The remaining columns (up to four per line) are the 32-bit values stored in memory starting at that address.

I.e. the first line means that at address 0xffeac63c the memory contains byte value 0xba, at address 0xffeac63d the value 0x9c, and so on, up to address 0xffeac64b, where the value 0x00 is stored (Intel is little-endian, so the dword 0xf7d39cba is stored in memory as bytes ba 9c d3 f7).

What do those values mean? Well, 0xf7d39cba is 0xf7d39cba, a 32-bit value. The content of memory doesn't mean anything until the code using those values gives them some meaning.

I.e. if the next instruction to execute is ret and esp points at the stack slot holding 0xf7d39cba (as it does here), then that value is used as the return address.

If next instruction is pop eax, then that value will be fetched into register eax, and used for whatever the code does further with value in eax...
