How to know which malloc is used?

The way I understand it, there exist many different malloc implementations:

  • dlmalloc – General purpose allocator
  • ptmalloc2 – glibc
  • jemalloc – FreeBSD and Firefox
  • tcmalloc – Google
  • libumem – Solaris
    Is there any way to determine which malloc is actually used on my (Linux) system?

    I read that "due to ptmalloc2's threading support, it became the default memory allocator for linux." Is there any way for me to check this myself?

    I am asking because I do not seem to get any speedup by parallelizing my malloc loop in the code below:

    // Driver loop (called from main): try 1 to 16 threads.
    for (int i = 1; i <= 16; i += 1 ) {
        parallelMalloc(i);
    }
    
    void parallelMalloc(int parallelism, int mallocCnt = 10000000) {
    
        omp_set_num_threads(parallelism);
    
        std::vector<char*> ptrStore(mallocCnt);
    
        boost::posix_time::ptime t1 = boost::posix_time::microsec_clock::local_time();
    
        #pragma omp parallel for
        for (int i = 0; i < mallocCnt; i++) {
            ptrStore[i] = ((char*)malloc(100 * sizeof(char)));
        }
    
        boost::posix_time::ptime t2 = boost::posix_time::microsec_clock::local_time();
    
        #pragma omp parallel for
        for (int i = 0; i < mallocCnt; i++) {
            free(ptrStore[i]);
        }
    
        boost::posix_time::ptime t3 = boost::posix_time::microsec_clock::local_time();
    
    
        boost::posix_time::time_duration malloc_time = t2 - t1;
        boost::posix_time::time_duration free_time   = t3 - t2;
    
        std::cout << " parallelism = "  << parallelism << "\t itr = " << mallocCnt <<  "\t malloc_time = " <<
                malloc_time.total_milliseconds() << "\t free_time = " << free_time.total_milliseconds() << std::endl;
    }
    

    which gives me an output of

     parallelism = 1         itr = 10000000  malloc_time = 1225      free_time = 1517
     parallelism = 2         itr = 10000000  malloc_time = 1614      free_time = 1112
     parallelism = 3         itr = 10000000  malloc_time = 1619      free_time = 687
     parallelism = 4         itr = 10000000  malloc_time = 2325      free_time = 620
     parallelism = 5         itr = 10000000  malloc_time = 2233      free_time = 550
     parallelism = 6         itr = 10000000  malloc_time = 2207      free_time = 489
     parallelism = 7         itr = 10000000  malloc_time = 2778      free_time = 398
     parallelism = 8         itr = 10000000  malloc_time = 1813      free_time = 389
     parallelism = 9         itr = 10000000  malloc_time = 1997      free_time = 350
     parallelism = 10        itr = 10000000  malloc_time = 1922      free_time = 291
     parallelism = 11        itr = 10000000  malloc_time = 2480      free_time = 257
     parallelism = 12        itr = 10000000  malloc_time = 1614      free_time = 256
     parallelism = 13        itr = 10000000  malloc_time = 1387      free_time = 289
     parallelism = 14        itr = 10000000  malloc_time = 1481      free_time = 248
     parallelism = 15        itr = 10000000  malloc_time = 1252      free_time = 297
     parallelism = 16        itr = 10000000  malloc_time = 1063      free_time = 281
    

    I read that "due to ptmalloc2's threading support, it became the default memory allocator for linux." Is there any way for me to check this myself?

    glibc uses ptmalloc2 internally, and it has done so for a long time. Either way, it's not terribly difficult to run getconf GNU_LIBC_VERSION and then cross-check whether that glibc version uses ptmalloc2, but I'm willing to bet you'd be wasting your time.
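
    If you would rather check from inside the program, here is a minimal sketch, assuming glibc on Linux. It asks the dynamic linker which loaded object supplies the malloc symbol (via dlsym and dladdr) and reads the glibc version with gnu_get_libc_version. The helper names mallocProvider and glibcVersion are made up for illustration, and on glibc older than 2.34 you would likely need to link with -ldl.

```cpp
#include <dlfcn.h>             // dlsym, dladdr, Dl_info (GNU extensions)
#include <gnu/libc-version.h>  // gnu_get_libc_version (glibc only)
#include <string>

// Hypothetical helper: path of the loaded object that defines "malloc".
std::string mallocProvider() {
    // Look the symbol up by name in the default search order, which is the
    // order the dynamic linker uses to resolve symbols for the main program.
    void* addr = dlsym(RTLD_DEFAULT, "malloc");
    Dl_info info{};
    if (addr != nullptr && dladdr(addr, &info) != 0 &&
        info.dli_fname != nullptr && info.dli_fname[0] != '\0') {
        return info.dli_fname;
    }
    return "unknown";  // e.g. a fully static binary, where dlsym finds nothing
}

// Hypothetical helper: glibc release string as reported by the library itself.
std::string glibcVersion() {
    return gnu_get_libc_version();
}
```

    Printing mallocProvider() should typically show a libc.so path for the stock allocator, the path of a preloaded library such as jemalloc or tcmalloc if one is interposed, or the executable itself when an allocator is linked in statically. Treat the result as a hint rather than proof.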

    I am asking because I do not seem to get any speedup by parallelizing my malloc loop in the code below

    I turned your example into an MCVE (code omitted here for brevity) and compiled it with g++ -Wall -pedantic -O3 -pthread -fopenmp using g++ 5.3.1. Here are my results.

    With OpenMP:

     parallelism = 1     itr = 10000000  malloc_time = 746   free_time = 263
     parallelism = 2     itr = 10000000  malloc_time = 541   free_time = 267
     parallelism = 3     itr = 10000000  malloc_time = 405   free_time = 259
     parallelism = 4     itr = 10000000  malloc_time = 324   free_time = 221
     parallelism = 5     itr = 10000000  malloc_time = 330   free_time = 242
     parallelism = 6     itr = 10000000  malloc_time = 287   free_time = 244
     parallelism = 7     itr = 10000000  malloc_time = 257   free_time = 226
     parallelism = 8     itr = 10000000  malloc_time = 270   free_time = 225
     parallelism = 9     itr = 10000000  malloc_time = 253   free_time = 225
     parallelism = 10    itr = 10000000  malloc_time = 236   free_time = 226
     parallelism = 11    itr = 10000000  malloc_time = 225   free_time = 239
     parallelism = 12    itr = 10000000  malloc_time = 276   free_time = 258
     parallelism = 13    itr = 10000000  malloc_time = 241   free_time = 228
     parallelism = 14    itr = 10000000  malloc_time = 254   free_time = 225
     parallelism = 15    itr = 10000000  malloc_time = 278   free_time = 272
     parallelism = 16    itr = 10000000  malloc_time = 235   free_time = 220
    
    23.87 user 
    2.11 system 
    0:10.41 elapsed 
    249% CPU
    

    Without OpenMP:

     parallelism = 1     itr = 10000000  malloc_time = 748   free_time = 263
     parallelism = 2     itr = 10000000  malloc_time = 344   free_time = 256
     parallelism = 3     itr = 10000000  malloc_time = 751   free_time = 254
     parallelism = 4     itr = 10000000  malloc_time = 339   free_time = 262
     parallelism = 5     itr = 10000000  malloc_time = 748   free_time = 253
     parallelism = 6     itr = 10000000  malloc_time = 330   free_time = 256
     parallelism = 7     itr = 10000000  malloc_time = 734   free_time = 260
     parallelism = 8     itr = 10000000  malloc_time = 334   free_time = 259
     parallelism = 9     itr = 10000000  malloc_time = 750   free_time = 256
     parallelism = 10    itr = 10000000  malloc_time = 339   free_time = 255
     parallelism = 11    itr = 10000000  malloc_time = 743   free_time = 267
     parallelism = 12    itr = 10000000  malloc_time = 342   free_time = 261
     parallelism = 13    itr = 10000000  malloc_time = 739   free_time = 252
     parallelism = 14    itr = 10000000  malloc_time = 333   free_time = 252
     parallelism = 15    itr = 10000000  malloc_time = 740   free_time = 252
     parallelism = 16    itr = 10000000  malloc_time = 330   free_time = 252
    
    13.38 user 
    4.66 system 
    0:18.08 elapsed 
    99% CPU 
    

    The parallel run finishes about 8 seconds sooner (0:10.41 versus 0:18.08 elapsed). Still not convinced? OK. I went ahead and grabbed dlmalloc and ran make to produce libmalloc.a. My new command line is g++ -Wall -pedantic -O3 -pthread -fopenmp -L$HOME/Development/test/dlmalloc/lib test.cpp -lmalloc

    With OpenMP:

    parallelism = 1  itr = 10000000  malloc_time = 814   free_time = 277
    

    I CTRL-C'd after 37 seconds.

    Without OpenMP:

     parallelism = 1     itr = 10000000  malloc_time = 772   free_time = 271
     parallelism = 2     itr = 10000000  malloc_time = 780   free_time = 272
     parallelism = 3     itr = 10000000  malloc_time = 783   free_time = 272
     parallelism = 4     itr = 10000000  malloc_time = 792   free_time = 277
     parallelism = 5     itr = 10000000  malloc_time = 813   free_time = 281
     parallelism = 6     itr = 10000000  malloc_time = 800   free_time = 275
     parallelism = 7     itr = 10000000  malloc_time = 795   free_time = 277
     parallelism = 8     itr = 10000000  malloc_time = 790   free_time = 273
     parallelism = 9     itr = 10000000  malloc_time = 788   free_time = 277
     parallelism = 10    itr = 10000000  malloc_time = 784   free_time = 276
     parallelism = 11    itr = 10000000  malloc_time = 786   free_time = 284
     parallelism = 12    itr = 10000000  malloc_time = 807   free_time = 279
     parallelism = 13    itr = 10000000  malloc_time = 791   free_time = 277
     parallelism = 14    itr = 10000000  malloc_time = 790   free_time = 273
     parallelism = 15    itr = 10000000  malloc_time = 785   free_time = 276
     parallelism = 16    itr = 10000000  malloc_time = 787   free_time = 275
    
    6.48 user 
    11.27 system 
    0:17.81 elapsed 
    99% CPU
    

    Pretty significant difference. I suspect that the issue lies within your more complicated code, or something's wrong with your benchmark.
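
    To rule out the benchmark itself, here is a minimal sketch of a self-contained version. It is not the exact code I ran: it assumes std::chrono::steady_clock in place of the Boost timers, and the names Timing and timeMallocFree are made up for illustration. Built with -fopenmp the loops run on the requested number of threads; built without it the pragmas are ignored and the loops run serially.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical result type: elapsed milliseconds for each phase.
struct Timing {
    long long malloc_ms;
    long long free_ms;
};

// Hypothetical helper: allocate `count` blocks of `size` bytes, then free them,
// timing the two phases separately with a monotonic clock.
Timing timeMallocFree(int threads, int count, std::size_t size = 100) {
    using Clock = std::chrono::steady_clock;
    using std::chrono::duration_cast;
    using std::chrono::milliseconds;

    std::vector<void*> ptrs(static_cast<std::size_t>(count), nullptr);

    const auto t1 = Clock::now();
    #pragma omp parallel for num_threads(threads)
    for (int i = 0; i < count; ++i) {
        ptrs[i] = std::malloc(size);
    }
    const auto t2 = Clock::now();
    #pragma omp parallel for num_threads(threads)
    for (int i = 0; i < count; ++i) {
        std::free(ptrs[i]);
    }
    const auto t3 = Clock::now();

    return {static_cast<long long>(duration_cast<milliseconds>(t2 - t1).count()),
            static_cast<long long>(duration_cast<milliseconds>(t3 - t2).count())};
}
```

    Calling timeMallocFree(n, 10000000) for n from 1 to 16 and printing both fields gives the same kind of table as above, without depending on Boost.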
