Processes exceeding thread stack size limit on Red Hat Enterprise Linux 6?
I have a couple of processes running on RHEL 6.3 that appear to be exceeding their thread stack sizes.
For example, the Java process is started with a stack size of -Xss256k, and the C++ process sets a thread stack size of 1 MB using pthread_attr_setstacksize() in the code.
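For reference, the C++ side requests the stack roughly like this (a simplified sketch in C, not the real worker code):

#include <pthread.h>
#include <stdio.h>

static void *worker(void *arg)   /* stand-in for the real worker function */
{
    (void)arg;
    return NULL;
}

int main(void)
{
    pthread_attr_t attr;
    pthread_t tid;

    pthread_attr_init(&attr);
    /* request a 1 MB stack for the thread (must be >= PTHREAD_STACK_MIN) */
    if (pthread_attr_setstacksize(&attr, 1024 * 1024) != 0) {
        fprintf(stderr, "pthread_attr_setstacksize failed\n");
        return 1;
    }
    pthread_create(&tid, &attr, worker, NULL);
    pthread_join(tid, NULL);
    pthread_attr_destroy(&attr);
    return 0;
}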
However, these processes are not sticking to those limits, and I'm not sure why.
For example, when I run
pmap -x <pid>
for the C++ and Java processes, I can see hundreds of anon regions for each (which I have confirmed correspond to the internal worker threads created by each of these processes), but each of these is 64 MB in size, not the limits set above:
00007fa4fc000000 168 40 40 rw--- [ anon ]
00007fa4fc02a000 65368 0 0 ----- [ anon ]
00007fa500000000 168 40 40 rw--- [ anon ]
00007fa50002a000 65368 0 0 ----- [ anon ]
00007fa504000000 168 40 40 rw--- [ anon ]
00007fa50402a000 65368 0 0 ----- [ anon ]
00007fa508000000 168 40 40 rw--- [ anon ]
00007fa50802a000 65368 0 0 ----- [ anon ]
00007fa50c000000 168 40 40 rw--- [ anon ]
00007fa50c02a000 65368 0 0 ----- [ anon ]
00007fa510000000 168 40 40 rw--- [ anon ]
00007fa51002a000 65368 0 0 ----- [ anon ]
00007fa514000000 168 40 40 rw--- [ anon ]
00007fa51402a000 65368 0 0 ----- [ anon ]
00007fa518000000 168 40 40 rw--- [ anon ]
...
But when I run the following on the same process that has all the 64 MB anon regions
cat /proc/<pid>/limits | grep stack
Max stack size 1048576 1048576 bytes
it shows a maximum stack size of 1 MB, so I am a bit confused as to what is going on here. The script that calls these programs also sets ulimit -s 1024.
It should be noted that this only seems to occur on very high-end machines (e.g. 48 GB RAM, 24 CPU cores). The issue does not appear on less powerful machines (e.g. 4 GB RAM, 2 CPU cores).
Any help understanding what is happening here would be much appreciated.
It turns out that glibc >= 2.10 (the version shipped with RHEL 6) changed its malloc so that, where possible, each thread gets its own memory arena, and on 64-bit each arena reserves up to 64 MB of virtual address space. The maximum number of arenas scales with the number of cores and is larger on 64-bit, so on a big machine you end up with many of these 64 MB regions: they are malloc arenas, not thread stacks, which is why the stack limits appear to be ignored.
The fix for this was to add
export LD_PRELOAD=/path/to/libtcmalloc.so
in the script that starts the processes, so that they use tcmalloc instead of glibc's malloc.
Some more information on this is available from:
Linux glibc >= 2.10 (RHEL 6) malloc may show excessive virtual memory usage https://www.ibm.com/developerworks/mydeveloperworks/blogs/kevgrig/entry/linux_glibc_2_10_rhel_6_malloc_may_show_excessive_virtual_memory_usage?lang=en
glibc bug malloc uses excessive memory for multi-threaded applications http://sourceware.org/bugzilla/show_bug.cgi?id=11261
Apache Hadoop fixed the problem by setting MALLOC_ARENA_MAX: https://issues.apache.org/jira/browse/HADOOP-7154
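If you want to see the arena behaviour directly, here is a toy reproducer (my own sketch, not taken from the links above; the thread count is arbitrary) that makes the 64 MB regions show up in pmap on a multi-core 64-bit box:

/* Toy reproducer (illustrative sketch, not production code): each thread's
 * first malloc() may attach it to a new glibc arena, and on 64-bit each
 * arena reserves up to 64 MB of address space, so pmap shows the same
 * pattern as above even though the thread stacks stay small.
 * Build:   gcc -pthread arenas.c -o arenas
 * Inspect: pmap -x $(pidof arenas)
 */
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

#define NTHREADS 32   /* arbitrary; more threads (and cores) -> more arenas */

static void *worker(void *arg)
{
    void *p;
    (void)arg;
    p = malloc(4096);   /* triggers arena assignment for this thread */
    free(p);
    pause();            /* keep the thread alive so its arena stays mapped */
    return NULL;
}

int main(void)
{
    pthread_t tid[NTHREADS];
    int i;

    for (i = 0; i < NTHREADS; i++)
        pthread_create(&tid[i], NULL, worker, NULL);
    pause();            /* leave the process running for inspection */
    return 0;
}

Setting MALLOC_ARENA_MAX in the same startup script (as in the Hadoop fix above) caps the number of these arenas without swapping the allocator.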
The stack size reported in /proc/1234/limits is set with setrlimit(2) (perhaps by the PAM subsystem at login time).
I have no real idea why the actual stack segments seem to be 64 MB each. Perhaps your big server uses huge pages (but your desktop doesn't).
You might call setrlimit (perhaps with the ulimit bash builtin, or the limit zsh builtin) in e.g. the script calling your program.
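For completeness, a minimal sketch (my own illustration, not your actual launcher) of doing the equivalent of ulimit -s 1024 with setrlimit from a small C wrapper before starting the real program:

#include <sys/resource.h>
#include <stdio.h>

int main(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) != 0) {
        perror("getrlimit");
        return 1;
    }
    rl.rlim_cur = 1024 * 1024;   /* 1 MB soft limit, like `ulimit -s 1024` */
    if (setrlimit(RLIMIT_STACK, &rl) != 0) {
        perror("setrlimit");
        return 1;
    }
    printf("stack soft limit: %lu bytes\n", (unsigned long)rl.rlim_cur);
    /* exec*() the real program here so the limit is inherited */
    return 0;
}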
You can use ulimit -s <size_in_KB> to set the maximum stack size for processes started from that shell. You can also see the current limit by running ulimit -s with no argument.