Profiling C++ multi

Have you used a profiling tool such as Intel VTune Analyzer?

What are your recommendations for a multithreaded C++ application on Linux and Windows? I am primarily interested in cache misses, memory usage, memory leaks, and CPU usage.

I use valgrind (only on UNIX), but mainly for finding memory errors and leaks.


The following are good tools for multithreaded applications. Evaluation copies are available to try.

  • Runtime sanity checking (thread checker): Intel Thread Checker / VTune
  • Memory consistency checking (memory usage, memory leaks): Memory Validator
  • Performance analysis (CPU usage): AQtime

    EDIT: Intel Thread Checker can diagnose data races, deadlocks, stalled threads, abandoned locks, and similar problems. Be patient when analyzing the results, as it is easy to get confused.

    A few tips:

  • Disable the features you do not need. (When looking for deadlocks, data race detection can be disabled, and vice versa.)
  • Choose the instrumentation level based on your need. Levels such as "All Function" and "Full Image" are used for data races, whereas "API Imports" can be used for deadlock detection.
  • Use the context-sensitive "Diagnostic Help" menu often.
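To make "data race" concrete, here is a minimal sketch of the kind of defect a thread checker is meant to flag, next to one possible fix. The function names `count_racy` and `count_atomic` are hypothetical and chosen only for illustration; they are not part of any of the tools mentioned above.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical example: every thread increments a plain long without any
// synchronisation. This is a data race (undefined behaviour), so the result
// can come out lower than threads * iterations.
long count_racy(int threads, int iterations) {
    assert(threads >= 0 && iterations >= 0);
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&counter, iterations] {
            for (int i = 0; i < iterations; ++i) {
                ++counter;  // unsynchronised read-modify-write
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}

// Same work, but the counter is a std::atomic, so each increment is
// indivisible and the race is gone.
long count_atomic(int threads, int iterations) {
    assert(threads >= 0 && iterations >= 0);
    std::atomic<long> counter{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&counter, iterations] {
            for (int i = 0; i < iterations; ++i) {
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter.load();
}
```

Calling `count_racy(4, 100000)` from `main` and running the binary under a thread checker should produce a report pointing at the `++counter` line, while `count_atomic` should come back clean. With GCC on Linux you would typically build with `-pthread`.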

  • On Linux, try OProfile. It supports various hardware performance counters.

    On Windows, AMD's CodeAnalyst (free, unlike VTune) is worth a look. It only supports event profiling on AMD hardware though (on Intel CPUs it's just a handy timer-based profiler).

    A colleague recently tried Intel Parallel Studio (beta) and rated it favourably (it found some interesting parallelism-related issues in some code).


    VTune gives you a lot of detail on what the processor is doing, and sometimes I find it hard to see the wood for the trees. VTune will not report memory leaks. You'll need PurifyPlus for that, or, if you can run on a Linux box, valgrind is good for memory leaks at a great price.
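As a rough illustration of what a leak checker looks for, the sketch below puts a function that forgets to free its buffer next to a version that lets a container own the memory. Both function names are hypothetical. Assuming the program is built as `./app`, running `valgrind --leak-check=full ./app` should list the buffer from `sum_leaky` as "definitely lost".

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical example: allocates with new[] and never calls delete[].
long sum_leaky(std::size_t n) {
    long* data = new long[n];
    for (std::size_t i = 0; i < n; ++i) {
        data[i] = static_cast<long>(i);
    }
    long total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += data[i];
    }
    return total;  // missing delete[] data: n * sizeof(long) bytes leak per call
}

// Same computation, but std::vector owns the buffer and releases it
// automatically when the function returns.
long sum_owned(std::size_t n) {
    std::vector<long> data(n);
    std::iota(data.begin(), data.end(), 0L);  // fills 0, 1, ..., n - 1
    return std::accumulate(data.begin(), data.end(), 0L);
}
```

Building with debug information (`-g`) lets valgrind show the file and line of the leaking allocation.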

    VTune shows two views: the tabular one is useful; the other, I think, is just for salesmen to impress people with and is not that useful.

    For a quick and cheap option I'd go with valgrind. Valgrind also has a Cachegrind tool; I've not used it, but I suspect it's very good too.
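For the cache-miss part of the question, a small test case can help you learn to read the output. The sketch below sums the same row-major matrix in two traversal orders; the function names are hypothetical. Assuming the binary is `./app`, `valgrind --tool=cachegrind ./app` followed by `cg_annotate` on the output file should show noticeably more data-cache misses for the column-order loop once the matrix no longer fits in cache. On newer Valgrind releases you may need to add `--cache-sim=yes`, since I believe the cache simulation is no longer on by default there.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// The matrix is stored row-major in a flat vector: element (r, c) lives at
// index r * cols + c.

// Walks memory sequentially, so consecutive accesses share cache lines.
long sum_row_order(const std::vector<int>& m, std::size_t rows, std::size_t cols) {
    assert(m.size() == rows * cols);
    long total = 0;
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            total += m[r * cols + c];
        }
    }
    return total;
}

// Jumps `cols` elements between consecutive accesses, so for large matrices
// most accesses touch a different cache line.
long sum_column_order(const std::vector<int>& m, std::size_t rows, std::size_t cols) {
    assert(m.size() == rows * cols);
    long total = 0;
    for (std::size_t c = 0; c < cols; ++c) {
        for (std::size_t r = 0; r < rows; ++r) {
            total += m[r * cols + c];
        }
    }
    return total;
}
```

Both functions return the same sum, so any difference in the profile comes from the access pattern alone.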

    cheers, Martin.
