Discrepancy between gprof and (unix) time; gprof reports lower runtimes
This question already has an answer here:
gprof
doesn't know about functions it doesn't have debug info access to, ie the standard library. If you want to get accurate elapsed time and still get a callgraph you can use perf
.
As an example I wrote a program that loops 10000 times. In this loop, I fill a vector with random values and then sort it. For gprof
, I do the following steps:
g++ -std=c++11 -O2 -pg -g
./a.out
gprof -b ./a.out
gmon.out
is created if it doesn't exist, and overwritten if it does, and is automatically used by gprof
if you don't specify a file to use. -b
suppresses the text blurbs.
Here's example output:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls Ts/call Ts/call name
100.52 4.94 4.94 frame_dummy
0.00 4.94 0.00 26 0.00 0.00 void std::__adjust_heap<__gnu_cxx::__normal_iterator<double*, std::vector<double, std::allocator<double> > >, long, double, __gnu_cxx::__ops::_Iter_less_iter>(__gnu_cxx::__normal_iterator<double*, std::vector<double, std::allocator<double> > >, long, long, double, __gnu_cxx::__ops::_Iter_less_iter)
0.00 4.94 0.00 1 0.00 0.00 _GLOBAL__sub_I_main
As you can see, it only records the time for the vector heap implementation, and doesn't know about sort (or anything else) at all. Now let's try perf
:
perf record -g ./a.out
perf report --call-graph --stdio
# Total Lost Samples: 0
#
# Samples: 32K of event 'cycles'
# Event count (approx.): 31899806183
#
# Children Self Command Shared Object Symbol
# ........ ........ ....... ................... ..................................................................................
#
99.98% 34.46% a.out a.out [.] main
|
|--65.52%-- main
| |
| |--65.29%-- std::__introsort_loop<__gnu_cxx::__normal_iterator<double*, std::vector<double
[Rest of text omitted]
As you can see, perf
catches the sort function. If I ran perf stat
, I would also get the accurate runtime.
If you're using GCC, you can pass -D_GLIBCXX_DEBUG
to have it use the debug library implementation. This will make your code run a lot slower, but necessary in order for gprof
to see those functions. An example:
g++ -std=c++11 -O2 test.cpp -D_GLIBCXX_DEBUG -pg -g
./a.out
gprof -b ./a.out
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls us/call us/call name
88.26 0.15 0.15 102875 1.46 1.46 __gnu_debug::_Safe_iterator<__gnu_cxx::__normal_iterator<double*, std::__cxx1998::vector<double, std::allocator<double> > >, std::__debug::vector<double, std::allocator<double> > > std::__unguarded_partition<__gnu_debug::_Safe_iterator<__gnu_cxx::__normal_iterator<double*, std::__cxx1998::vector<double, std::allocator<double> > >, std::__debug::vector<double, std::allocator<double> > >, __gnu_cxx::__ops::_Iter_less_iter>(__gnu_debug::_Safe_iterator<__gnu_cxx::__normal_iterator<double*, std::__cxx1998::vector<double, std::allocator<double> > >, std::__debug::vector<double, std::allocator<double> > >, __gnu_debug::_Safe_iterator<__gnu_cxx::__normal_iterator<double*, std::__cxx1998::vector<double, std::allocator<double> > >, std::__debug::vector<double, std::allocator<double> > >, __gnu_debug::_Safe_iterator<__gnu_cxx::__normal_iterator<double*, std::__cxx1998::vector<double, std::allocator<double> > >, std::__debug::vector<double, std::allocator<double> > >, __gnu_cxx::__ops::_Iter_less_iter)
11.77 0.17 0.02 996280 0.02 0.02 void std::__unguarded_linear_insert<__gnu_debug::_Safe_iterator<__gnu_cxx::__normal_iterator<double*, std::__cxx1998::vector<double, std::allocator<double> > >, std::__debug::vector<double, std::allocator<double> > >, __gnu_cxx::__ops::_Val_less_iter>(__gnu_debug::_Safe_iterator<__gnu_cxx::__normal_iterator<double*, std::__cxx1998::vector<double, std::allocator<double> > >, std::__debug::vector<double, std::allocator<double> > >, __gnu_cxx::__ops::_Val_less_iter)
0.00 0.17 0.00 417220 0.00 0.00 frame_dummy
0.00 0.17 0.00 102875 0.00 0.00 void std::__move_median_to_first<__gnu_debug::_Safe_iterator<__gnu_cxx::__normal_iterator<double*, std::__cxx1998::vector<double, std::allocator<double> > >, std::__debug::vector<double, std::allocator<double> > >, __gnu_cxx::__ops::_Iter_less_iter>(__gnu_debug::_Safe_iterator<__gnu_cxx::__normal_iterator<double*, std::__cxx1998::vector<double, std::allocator<double> > >, std::__debug::vector<double, std::allocator<double> > >, __gnu_debug::_Safe_iterator<__gnu_cxx::__normal_iterator<double*, std::__cxx1998::vector<double, std::allocator<double> > >, std::__debug::vector<double, std::allocator<double> > >, __gnu_debug::_Safe_iterator<__gnu_cxx::__normal_iterator<double*, std::__cxx1998::vector<double, std::allocator<double> > >, std::__debug::vector<double, std::allocator<double> > >, __gnu_debug::_Safe_iterator<__gnu_cxx::__normal_iterator<double*, std::__cxx1998::vector<double, std::allocator<double> > >, std::__debug::vector<double, std::allocator<double> > >, __gnu_cxx::__ops::_Iter_less_iter)
0.00 0.17 0.00 1000 0.00 0.25 void std::__insertion_sort<__gnu_debug::_Safe_iterator<__gnu_cxx::__normal_iterator<double*, std::__cxx1998::vector<double, std::allocator<double> > >, std::__debug::vector<double, std::allocator<double> > >, __gnu_cxx::__ops::_Iter_less_iter>(__gnu_debug::_Safe_iterator<__gnu_cxx::__normal_iterator<double*, std::__cxx1998::vector<double, std::allocator<double> > >, std::__debug::vector<double, std::allocator<double> > >, __gnu_debug::_Safe_iterator<__gnu_cxx::__normal_iterator<double*, std::__cxx1998::vector<double, std::allocator<double> > >, std::__debug::vector<double, std::allocator<double> > >, __gnu_cxx::__ops::_Iter_less_iter)
0.00 0.17 0.00 1 0.00 0.00 _GLOBAL__sub_I_main
I deliberately reduced the amount of iterations to make the execution complete in a reasonable amount of time, but you see gprof
now shows the functions it wasn't counting before.
上一篇: gprof产生空输出