Discrepancy between gprof and (unix) time; gprof reports lower runtimes

This question already has an answer here:

  • Time Sampling Problems with gprof (1 answer)

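  • gprof cannot see functions for which it has no debug information, such as those in the standard library. If you want an accurate elapsed time and still want a call graph, you can use perf.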

    As an example, I wrote a program that loops 10000 times. In each iteration it fills a vector with random values and then sorts it. For gprof, I ran the following steps:

    g++ -std=c++11 -O2 -pg -g test.cpp
    ./a.out
    gprof -b ./a.out
    

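    Running the program creates gmon.out if it does not exist and overwrites it if it does. gprof reads gmon.out automatically when you do not name a profile file. The -b flag suppresses the explanatory blurbs.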
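
    The answer does not include the test program itself. A minimal sketch of what the description suggests (fill a vector with random values, then sort it, repeated in a loop) might look like the following. The function names, the vector size, the Mersenne Twister generator, and the checksum are all assumptions made for illustration, not the original code.

    ```cpp
    #include <algorithm>
    #include <cstddef>
    #include <random>
    #include <vector>

    // Hypothetical helper: overwrite every element with a uniform double in [0, 1).
    void fill_random(std::vector<double>& values, std::mt19937& rng) {
        std::uniform_real_distribution<double> dist(0.0, 1.0);
        for (double& v : values) {
            v = dist(rng);
        }
    }

    // Hypothetical workload: `iterations` rounds of fill-then-sort on a vector
    // of `size` elements. The smallest element of each round is summed into a
    // checksum so the optimizer cannot discard the sorting work at -O2.
    double fill_and_sort(std::size_t iterations, std::size_t size, unsigned seed) {
        std::mt19937 rng(seed);           // assumed generator, fixed seed for repeatability
        std::vector<double> values(size);
        double checksum = 0.0;
        for (std::size_t i = 0; i < iterations; ++i) {
            fill_random(values, rng);
            std::sort(values.begin(), values.end());
            if (!values.empty()) {
                checksum += values.front();
            }
        }
        return checksum;
    }
    ```

    Calling fill_and_sort(10000, 1000, 42) from main and printing the result would give a test.cpp in the spirit of the one profiled here. The vector size of 1000 and the seed are guesses; only the iteration count of 10000 comes from the description.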

    Here is example output from gprof:

    Each sample counts as 0.01 seconds.
      %   cumulative   self              self     total           
     time   seconds   seconds    calls  Ts/call  Ts/call  name    
    100.52      4.94     4.94                             frame_dummy
      0.00      4.94     0.00       26     0.00     0.00  void std::__adjust_heap<__gnu_cxx::__normal_iterator<double*, std::vector<double, std::allocator<double> > >, long, double, __gnu_cxx::__ops::_Iter_less_iter>(__gnu_cxx::__normal_iterator<double*, std::vector<double, std::allocator<double> > >, long, long, double, __gnu_cxx::__ops::_Iter_less_iter)
      0.00      4.94     0.00        1     0.00     0.00  _GLOBAL__sub_I_main
    

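    As you can see, sort does not appear at all: all sampled time is attributed to frame_dummy, and the only standard library function listed is the heap helper std::__adjust_heap. Now let's try perf: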

    perf record -g ./a.out
    perf report --call-graph --stdio
    
    # Total Lost Samples: 0
    #
    # Samples: 32K of event 'cycles'
    # Event count (approx.): 31899806183
    #
    # Children      Self  Command  Shared Object        Symbol                                                                            
    # ........  ........  .......  ...................  ..................................................................................
    #
        99.98%    34.46%  a.out    a.out                [.] main                                                                          
                  |          
                  |--65.52%-- main
                  |          |          
                  |          |--65.29%-- std::__introsort_loop<__gnu_cxx::__normal_iterator<double*, std::vector<double
    

    [remainder omitted]

    As you can see, perf captured the sort function. If I run perf stat, I also get an accurate run time.

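    If you are using GCC, you can pass -D_GLIBCXX_DEBUG to make it use the debug implementation of the standard library. This makes your code run much slower, but it is necessary for gprof to see these functions. An example: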

    g++ -std=c++11 -O2 test.cpp -D_GLIBCXX_DEBUG -pg -g
    ./a.out
    gprof -b ./a.out
    
    Each sample counts as 0.01 seconds.
      %   cumulative   self              self     total           
     time   seconds   seconds    calls  us/call  us/call  name    
     88.26      0.15     0.15   102875     1.46     1.46  __gnu_debug::_Safe_iterator<__gnu_cxx::__normal_iterator<double*, std::__cxx1998::vector<double, std::allocator<double> > >, std::__debug::vector<double, std::allocator<double> > > std::__unguarded_partition<__gnu_debug::_Safe_iterator<__gnu_cxx::__normal_iterator<double*, std::__cxx1998::vector<double, std::allocator<double> > >, std::__debug::vector<double, std::allocator<double> > >, __gnu_cxx::__ops::_Iter_less_iter>(__gnu_debug::_Safe_iterator<__gnu_cxx::__normal_iterator<double*, std::__cxx1998::vector<double, std::allocator<double> > >, std::__debug::vector<double, std::allocator<double> > >, __gnu_debug::_Safe_iterator<__gnu_cxx::__normal_iterator<double*, std::__cxx1998::vector<double, std::allocator<double> > >, std::__debug::vector<double, std::allocator<double> > >, __gnu_debug::_Safe_iterator<__gnu_cxx::__normal_iterator<double*, std::__cxx1998::vector<double, std::allocator<double> > >, std::__debug::vector<double, std::allocator<double> > >, __gnu_cxx::__ops::_Iter_less_iter)
     11.77      0.17     0.02   996280     0.02     0.02  void std::__unguarded_linear_insert<__gnu_debug::_Safe_iterator<__gnu_cxx::__normal_iterator<double*, std::__cxx1998::vector<double, std::allocator<double> > >, std::__debug::vector<double, std::allocator<double> > >, __gnu_cxx::__ops::_Val_less_iter>(__gnu_debug::_Safe_iterator<__gnu_cxx::__normal_iterator<double*, std::__cxx1998::vector<double, std::allocator<double> > >, std::__debug::vector<double, std::allocator<double> > >, __gnu_cxx::__ops::_Val_less_iter)
      0.00      0.17     0.00   417220     0.00     0.00  frame_dummy
      0.00      0.17     0.00   102875     0.00     0.00  void std::__move_median_to_first<__gnu_debug::_Safe_iterator<__gnu_cxx::__normal_iterator<double*, std::__cxx1998::vector<double, std::allocator<double> > >, std::__debug::vector<double, std::allocator<double> > >, __gnu_cxx::__ops::_Iter_less_iter>(__gnu_debug::_Safe_iterator<__gnu_cxx::__normal_iterator<double*, std::__cxx1998::vector<double, std::allocator<double> > >, std::__debug::vector<double, std::allocator<double> > >, __gnu_debug::_Safe_iterator<__gnu_cxx::__normal_iterator<double*, std::__cxx1998::vector<double, std::allocator<double> > >, std::__debug::vector<double, std::allocator<double> > >, __gnu_debug::_Safe_iterator<__gnu_cxx::__normal_iterator<double*, std::__cxx1998::vector<double, std::allocator<double> > >, std::__debug::vector<double, std::allocator<double> > >, __gnu_debug::_Safe_iterator<__gnu_cxx::__normal_iterator<double*, std::__cxx1998::vector<double, std::allocator<double> > >, std::__debug::vector<double, std::allocator<double> > >, __gnu_cxx::__ops::_Iter_less_iter)
      0.00      0.17     0.00     1000     0.00     0.25  void std::__insertion_sort<__gnu_debug::_Safe_iterator<__gnu_cxx::__normal_iterator<double*, std::__cxx1998::vector<double, std::allocator<double> > >, std::__debug::vector<double, std::allocator<double> > >, __gnu_cxx::__ops::_Iter_less_iter>(__gnu_debug::_Safe_iterator<__gnu_cxx::__normal_iterator<double*, std::__cxx1998::vector<double, std::allocator<double> > >, std::__debug::vector<double, std::allocator<double> > >, __gnu_debug::_Safe_iterator<__gnu_cxx::__normal_iterator<double*, std::__cxx1998::vector<double, std::allocator<double> > >, std::__debug::vector<double, std::allocator<double> > >, __gnu_cxx::__ops::_Iter_less_iter)
      0.00      0.17     0.00        1     0.00     0.00  _GLOBAL__sub_I_main
    

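    I deliberately reduced the number of iterations so that the run finishes in a reasonable time, but you can see that gprof now shows functions it did not account for before.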
