Debugging the CPU Caches

I'm currently trying to optimize my software for better CPU cache usage. There are some posts on SO which suggest that it's sometimes hard to guess what the CPU cache is doing and why there are some performance drops in certain cases. For example:

  • Why does the speed of memcpy() drop dramatically every 4KB?
  • Why is my program slow when looping over exactly 8192 elements?
  • Why is transposing a matrix of 512x512 much slower than transposing a matrix of 513x513?
So in order to get a clue where the cache misses happen, I can run perf to get a count of cache misses and where they occur, as well as valgrind --tool=cachegrind to simulate the caches (at least an L1 and a last-level cache).

It's really nice to know where cache misses happen, but I'd like to know why they happen (for example, cache thrashing). Is there a way to explicitly pause the program and see what's inside the caches (maybe with the program running in valgrind and vgdb attached)?


In my experience, you'll need to disassemble your binary and look at where the program is using the cache, in particular where prefetch or other cache-control instructions are issued. That will give you the where and why of it. Unfortunately, it's a painful process.
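If you mostly want to understand why a particular access pattern thrashes, it can also help to model the cache yourself instead of trying to inspect the real one. The following is only a minimal sketch of a set-associative LRU cache, not a reproduction of what your CPU or cachegrind does. The geometry (32 KiB, 8-way, 64-byte lines, hence 64 sets) is an assumption, and the helper names `count_misses` and `column_walk` are made up for illustration. It reads one column of a row-major matrix of 8-byte elements twice and counts the misses:

```python
from collections import OrderedDict

# Assumed cache geometry (not measured): 32 KiB, 8-way, 64-byte lines.
LINE_BYTES = 64
WAYS = 8
NUM_SETS = 32 * 1024 // (LINE_BYTES * WAYS)  # 64 sets


def count_misses(addresses):
    """Count misses of a set-associative LRU cache for a list of byte addresses."""
    sets = [OrderedDict() for _ in range(NUM_SETS)]
    misses = 0
    for addr in addresses:
        line = addr // LINE_BYTES          # cache-line number
        ways = sets[line % NUM_SETS]       # the set this line maps to
        if line in ways:
            ways.move_to_end(line)         # hit: mark as most recently used
        else:
            misses += 1
            if len(ways) == WAYS:
                ways.popitem(last=False)   # set is full: evict the LRU line
            ways[line] = True
    return misses


def column_walk(row_len, rows=512, elem_bytes=8, passes=2):
    """Byte offsets touched when reading column 0 of a row-major matrix."""
    return [i * row_len * elem_bytes for _ in range(passes) for i in range(rows)]


for n in (512, 513):
    print(n, count_misses(column_walk(n)))
# prints:
# 512 1024
# 513 512
```

With a row length of 512 the stride is 4096 bytes, so every access lands in the same set and the 8 ways are overwritten long before the second pass: all 1024 accesses miss. With 513 the lines spread evenly over all 64 sets and the second pass hits every time. A real CPU adds prefetchers, further cache levels and other effects this model ignores, so treat the numbers as an explanation of the mechanism rather than a prediction.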
