Profiling specific functions in C++

I have looked into gprof, but I don't quite understand how to achieve the following:

I have written a clustering procedure. In each iteration, 4 functions are called repeatedly, and there are about 100,000 iterations to run. I want to find out how much time was spent in each function.
These functions may call other sub-functions and use data structures like hash maps, maps, etc., but I don't care about those sub-functions. I just want to know the total time spent in each of those parent functions over all the iterations. This will help me optimize my program better.

The problem with gprof is that it analyzes every function, so even the functions of the STL data structures are taken into account.

Currently I am using clock_gettime. For each function, I output the time taken in each iteration, and then I post-process this output file. This means typing a lot of profiling code, which makes my code look very complex, and I want to avoid that. How is this done in industry?

Is there an easier way to do this?

If you know of any other, cleaner ways, please let me know.


If I understand correctly, you want to know how much time was spent in the four target functions themselves, excluding any of the child functions they call.

This information is provided in gprof's "flat" profile under "self seconds". Alternatively, if you're looking at the call graph, this timing is in the "self" column.


I'd take a look at Telemetry. It's mainly targeted at game developers who want to compare per-frame data, but it seems to fit your requirements very well.


You want the self-time of those 4 functions, so you can optimize them specifically.

gprof will show you that as a percentage of total time. Suppose it is 10%. Then even if you were able to optimize it down to 0%, you would get a speedup factor of only 100/90 = 1.11, i.e. about 11%. If the program took 100 seconds and that was too slow, chances are 90 seconds is also too slow.

However, the inclusive (self plus callees) time of those functions is likely to be a much larger percentage, say 80%. If so, you could gain much more by having them make fewer calls to those callees. Alternatively, you might find that the callees spend a large share of their time doing things you don't strictly need, such as checking their arguments for the sake of generality, in which case you could replace them with ad hoc routines.

In fact, strictly speaking, there is no such thing as self time. Even the simplest instruction, the one the program counter is pointing at, is really a call to a microcode subroutine.

Here is some discussion of the issues and a constructive recommendation.
