Gprof Reports No Time Accumulated

gprof reports no time accumulated

I thought I might share this Apple mailing list discussion which I recently ran across.

The behaviour described here is exactly what I am experiencing.
It looks like gprof has been broken on OSX for quite some time.

I've resorted to Shark which has been helpfully suggested by Dave Rigby.

Thanks!

Report shows no time accumulated for gprof using Eclipse CDT

Sorry for the question, seems that the code is faster than gprof can measure.

As my application involves a neural network train with several iterations and further testing of kernels, I didn't suspected that a fast code could be causing the problem. I inserted a long loop in the main body and the gprof time was printed.

Discrepancy between gprof and (unix) time; gprof reports lower runtimes

gprof doesn't know about functions it doesn't have debug info access to, i.e the standard library. If you want to get accurate elapsed time and still get a callgraph you can use perf.

As an example I wrote a program that loops 10000 times. In this loop, I fill a vector with random values and then sort it. For gprof, I do the following steps:

g++ -std=c++11 -O2 -pg -g
./a.out
gprof -b ./a.out

gmon.out is created if it doesn't exist, and overwritten if it does, and is automatically used by gprof if you don't specify a file to use. -b suppresses the text blurbs.

Here's example output:

Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls Ts/call Ts/call name
100.52 4.94 4.94 frame_dummy
0.00 4.94 0.00 26 0.00 0.00 void std::__adjust_heap<__gnu_cxx::__normal_iterator<double*, std::vector<double, std::allocator<double> > >, long, double, __gnu_cxx::__ops::_Iter_less_iter>(__gnu_cxx::__normal_iterator<double*, std::vector<double, std::allocator<double> > >, long, long, double, __gnu_cxx::__ops::_Iter_less_iter)
0.00 4.94 0.00 1 0.00 0.00 _GLOBAL__sub_I_main

As you can see, it only records the time for the vector heap implementation, and doesn't know about sort (or anything else) at all. Now let's try perf:

perf record -g ./a.out
perf report --call-graph --stdio

# Total Lost Samples: 0
#
# Samples: 32K of event 'cycles'
# Event count (approx.): 31899806183
#
# Children Self Command Shared Object Symbol
# ........ ........ ....... ................... ..................................................................................
#
99.98% 34.46% a.out a.out [.] main
|
|--65.52%-- main
| |
| |--65.29%-- std::__introsort_loop<__gnu_cxx::__normal_iterator<double*, std::vector<double

[Rest of text omitted]

As you can see, perf catches the sort function. If I ran perf stat, I would also get the accurate runtime.

If you're using GCC, you can pass -D_GLIBCXX_DEBUG to have it use the debug library implementation. This will make your code run a lot slower, but necessary in order for gprof to see those functions. An example:

g++ -std=c++11 -O2 test.cpp -D_GLIBCXX_DEBUG -pg -g
./a.out
gprof -b ./a.out

Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls us/call us/call name
88.26 0.15 0.15 102875 1.46 1.46 __gnu_debug::_Safe_iterator<__gnu_cxx::__normal_iterator<double*, std::__cxx1998::vector<double, std::allocator<double> > >, std::__debug::vector<double, std::allocator<double> > > std::__unguarded_partition<__gnu_debug::_Safe_iterator<__gnu_cxx::__normal_iterator<double*, std::__cxx1998::vector<double, std::allocator<double> > >, std::__debug::vector<double, std::allocator<double> > >, __gnu_cxx::__ops::_Iter_less_iter>(__gnu_debug::_Safe_iterator<__gnu_cxx::__normal_iterator<double*, std::__cxx1998::vector<double, std::allocator<double> > >, std::__debug::vector<double, std::allocator<double> > >, __gnu_debug::_Safe_iterator<__gnu_cxx::__normal_iterator<double*, std::__cxx1998::vector<double, std::allocator<double> > >, std::__debug::vector<double, std::allocator<double> > >, __gnu_debug::_Safe_iterator<__gnu_cxx::__normal_iterator<double*, std::__cxx1998::vector<double, std::allocator<double> > >, std::__debug::vector<double, std::allocator<double> > >, __gnu_cxx::__ops::_Iter_less_iter)
11.77 0.17 0.02 996280 0.02 0.02 void std::__unguarded_linear_insert<__gnu_debug::_Safe_iterator<__gnu_cxx::__normal_iterator<double*, std::__cxx1998::vector<double, std::allocator<double> > >, std::__debug::vector<double, std::allocator<double> > >, __gnu_cxx::__ops::_Val_less_iter>(__gnu_debug::_Safe_iterator<__gnu_cxx::__normal_iterator<double*, std::__cxx1998::vector<double, std::allocator<double> > >, std::__debug::vector<double, std::allocator<double> > >, __gnu_cxx::__ops::_Val_less_iter)
0.00 0.17 0.00 417220 0.00 0.00 frame_dummy
0.00 0.17 0.00 102875 0.00 0.00 void std::__move_median_to_first<__gnu_debug::_Safe_iterator<__gnu_cxx::__normal_iterator<double*, std::__cxx1998::vector<double, std::allocator<double> > >, std::__debug::vector<double, std::allocator<double> > >, __gnu_cxx::__ops::_Iter_less_iter>(__gnu_debug::_Safe_iterator<__gnu_cxx::__normal_iterator<double*, std::__cxx1998::vector<double, std::allocator<double> > >, std::__debug::vector<double, std::allocator<double> > >, __gnu_debug::_Safe_iterator<__gnu_cxx::__normal_iterator<double*, std::__cxx1998::vector<double, std::allocator<double> > >, std::__debug::vector<double, std::allocator<double> > >, __gnu_debug::_Safe_iterator<__gnu_cxx::__normal_iterator<double*, std::__cxx1998::vector<double, std::allocator<double> > >, std::__debug::vector<double, std::allocator<double> > >, __gnu_debug::_Safe_iterator<__gnu_cxx::__normal_iterator<double*, std::__cxx1998::vector<double, std::allocator<double> > >, std::__debug::vector<double, std::allocator<double> > >, __gnu_cxx::__ops::_Iter_less_iter)
0.00 0.17 0.00 1000 0.00 0.25 void std::__insertion_sort<__gnu_debug::_Safe_iterator<__gnu_cxx::__normal_iterator<double*, std::__cxx1998::vector<double, std::allocator<double> > >, std::__debug::vector<double, std::allocator<double> > >, __gnu_cxx::__ops::_Iter_less_iter>(__gnu_debug::_Safe_iterator<__gnu_cxx::__normal_iterator<double*, std::__cxx1998::vector<double, std::allocator<double> > >, std::__debug::vector<double, std::allocator<double> > >, __gnu_debug::_Safe_iterator<__gnu_cxx::__normal_iterator<double*, std::__cxx1998::vector<double, std::allocator<double> > >, std::__debug::vector<double, std::allocator<double> > >, __gnu_cxx::__ops::_Iter_less_iter)
0.00 0.17 0.00 1 0.00 0.00 _GLOBAL__sub_I_main

I deliberately reduced the amount of iterations to make the execution complete in a reasonable amount of time, but you see gprof now shows the functions it wasn't counting before.

Time Sampling Problems with gprof

gprof doesn't count any blocked time, like I/O or other stuff. Also "self time" typically is extremely small in any routine that does all its work in subfunctions, like if you're mostly using a library in a DLL where gprof can't see it. Check this answer.



Related Topics



Leave a reply



Submit