Measure memory usage of code unit

I have a function memory that takes a block of code and measures its memory usage:

import java.lang.management.ManagementFactory

def memory[T](
    f: ⇒ T
)(
    mu: Long ⇒ Unit
): T = {
    val memoryMXBean = ManagementFactory.getMemoryMXBean
    memoryMXBean.gc()
    val usedBefore = memoryMXBean.getHeapMemoryUsage.getUsed
    println(s"${memoryMXBean.getObjectPendingFinalizationCount()} pending, used $usedBefore")
    val r = f
    memoryMXBean.gc()
    val usedAfter = memoryMXBean.getHeapMemoryUsage.getUsed
    println(s"${memoryMXBean.getObjectPendingFinalizationCount()} pending, used $usedAfter")
    mu(usedAfter - usedBefore)
    r
}

Getting the amount of memory used by new Array[Byte](1024*1024) should return 1MB.

memory{new Array[Byte](1024*1024)}{r=>println(s"$r byte")}

But the very first call of memory returns a wrong result; subsequent calls (even with different bodies) measure the memory usage just fine:

scala> memory{new Array[Byte](1024*1024)}{r=>println(s"$r byte")}
0 pending, used 45145040
0 pending, used 45210384
65344 byte                <- 65kb != 1MB

scala> memory{new Array[Byte](1024*1024)}{r=>println(s"$r byte")}
0 pending, used 45304512
0 pending, used 46353104
1048592 byte              <- Correct

Somewhere between the two calls to memoryMXBean.getHeapMemoryUsage something gets freed, even though there were no objects pending finalization. The same behaviour shows up with an empty body (remember to restart the Scala console to reproduce this result):

scala> memory{}{r=>println(s"$r byte")}
0 pending, used 44917584
0 pending, used 44025552
-892032 byte              <- 800kb less memory?

scala> memory{}{r=>println(s"$r byte")}
0 pending, used 44070440
0 pending, used 44069960
-480 byte                 <- This is ok

Also executing the gc() and getHeapMemoryUsage on the console produces this result:

scala> import java.lang.management.ManagementFactory; val memoryMXBean = ManagementFactory.getMemoryMXBean; memoryMXBean.setVerbose(true)
import java.lang.management.ManagementFactory
memoryMXBean: java.lang.management.MemoryMXBean = sun.management.MemoryImpl@2f98635e

scala> memoryMXBean.gc(); memoryMXBean.getHeapMemoryUsage
[GC (System.gc())  57400K->44462K(109056K), 0,0148555 secs]
[Full GC (System.gc())  44462K->39602K(109056K), 0,2641397 secs]
res1: java.lang.management.MemoryUsage = init = 33554432(32768K) used = 41358440(40389K) committed = 111673344(109056K) max = 239075328(233472K)

scala> memoryMXBean.gc(); memoryMXBean.getHeapMemoryUsage
[GC (System.gc())  46702K->40258K(111104K), 0,0025801 secs]
[Full GC (System.gc())  40258K->39631K(111104K), 0,1988796 secs]
res2: java.lang.management.MemoryUsage = init = 33554432(32768K) used = 40583120(39631K) committed = 113770496(111104K) max = 239075328(233472K)

41358440 - 40583120 = 775320, so almost 800 kB less memory is in use (see used).

Why does the very first measurement return a wrong result? Is there a way to fix this other than running the method twice?

Using Scala 2.12.1-20161205-201300-2787b47 (OpenJDK 64-Bit Server VM, Java 1.8.0_112) on Arch Linux.

Thanks!


Using JAMM

If you want to check how much memory a data structure on the JVM consumes, you should look into instrumentation libraries such as JAMM. It works by traversing the object graph of the object you want to measure, and exploiting knowledge about the memory layout on the JVM you are running on.

Note that the data you will get back is specific to the JVM version and architecture you are using. On different architectures, the memory consumption might be different because of different pointer size and encoding. And on different JVMs, even the memory layout might be different.

Nevertheless, this is a powerful tool to implement highly efficient data structures on the JVM.

Here is how you would use JAMM from Scala:

import org.github.jamm.MemoryMeter

val o = new Array[Byte](1024*1024)
val mm = new MemoryMeter()
println("Size of new Array[Byte](1024*1024): " + mm.measureDeep(o))

And here is the result:

Size of new Array[Byte](1024*1024): 1048592
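
The 16 bytes beyond the 1 MiB payload are consistent with an array header on a 64-bit HotSpot JVM with compressed class pointers (mark word, class pointer and length field). That layout is an assumption about the JVM used here, not something JAMM reports. The arithmetic, as a small sketch:

```scala
object HeaderOverhead extends App {
  val payloadBytes  = 1024L * 1024L // the byte array's elements
  val measuredBytes = 1048592L      // value reported by measureDeep above
  // Difference between measured size and raw payload: the per-object overhead.
  val overheadBytes = measuredBytes - payloadBytes
  println(s"overhead: $overheadBytes bytes")
}
```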

The JAMM library is a Java agent that hooks into the JVM. Using JAMM therefore requires downloading the jamm jar and adding a parameter (e.g. -javaagent:jamm-0.3.0.jar) to the Java options, preferably using the javaOptions sbt key.
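
A minimal build.sbt sketch of that setup might look as follows. The jar location is an assumption (here the jar is taken to sit in the project root); adjust the path to wherever you downloaded it.

```scala
// build.sbt (sketch)
// javaOptions are only passed to forked JVMs, so forking must be enabled.
fork := true

// Assumed location: jamm-0.3.0.jar in the project root directory.
javaOptions += "-javaagent:jamm-0.3.0.jar"
```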

Automated memory tests

Note that if you rely on compact in-memory representation for some data structures you write, you should have automated tests that ensure that the in-memory representation is as you expect. For inspiration on how to set this up, here is a minimal project that imports and configures the JAMM java agent for the tests.

To play around, you can just add your test code to JammTest and run it with sbt test:run.


The problem is that, to improve performance, the JVM does not account for memory usage precisely. This shows in two areas:

  • The memory reported as used covers both live objects and objects that have not yet been collected. When you create a large object you can trigger a collection and end up with less memory in use than before.
  • Smaller objects are allocated from a Thread Local Allocation Buffer, or TLAB. A TLAB is a per-thread buffer that minimises contention on the Eden space and thus allows threads to allocate concurrently. The downside is that you cannot see how much of each TLAB is in use; you only see occasional big jumps in usage. A simple way around this is to turn TLABs off with -XX:-UseTLAB, after which you get accurate accounting even for new Object() (assuming a GC doesn't occur).
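
Building on the first point, one heuristic is to keep requesting collections until the reported usage stops shrinking before taking each reading, so that leftover garbage from earlier work is less likely to distort the first measurement. The sketch below is an illustration under that assumption, not a guaranteed fix: MemoryProbe, settledUsage and maxRounds are hypothetical names, and the JVM is free to ignore GC requests.

```scala
import java.lang.management.ManagementFactory

object MemoryProbe {
  private val bean = ManagementFactory.getMemoryMXBean

  // Request collections until heap usage no longer decreases
  // (or maxRounds is reached), and return the lowest reading seen.
  private def settledUsage(maxRounds: Int = 10): Long = {
    var used = Long.MaxValue
    var rounds = 0
    var stable = false
    while (!stable && rounds < maxRounds) {
      bean.gc()
      val now = bean.getHeapMemoryUsage.getUsed
      stable = now >= used
      used = math.min(used, now)
      rounds += 1
    }
    used
  }

  // Same shape as the memory function in the question:
  // evaluates f once and reports the change in settled heap usage.
  def memory[T](f: => T)(report: Long => Unit): T = {
    val before = settledUsage()
    val r = f // r stays reachable, so its memory counts in the second reading
    val after = settledUsage()
    report(after - before)
    r
  }
}
```

Usage mirrors the original call, e.g. MemoryProbe.memory{new Array[Byte](1024*1024)}{r=>println(s"$r byte")}. To also take TLABs out of the picture, the flag can be passed to the JVM when starting the console, for example scala -J-XX:-UseTLAB.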