How to know which malloc is used?

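As I understand it, there are many different malloc implementations:

  • dlmalloc - general-purpose allocator
  • ptmalloc2 - glibc
  • jemalloc - FreeBSD and Firefox
  • tcmalloc - Google
  • libumem - Solaris

Is there any way to determine which malloc is actually used on my (Linux) system?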

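I read that "due to its threading support, ptmalloc2 became the default memory allocator for Linux." Is there any way for me to check this myself?

I ask because I don't seem to get any speedup from parallelizing my malloc loop in the code below: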

    #include <cstdlib>
    #include <iostream>
    #include <vector>
    
    #include <omp.h>
    #include <boost/date_time/posix_time/posix_time.hpp>
    
    // Allocates mallocCnt 100-byte blocks and then frees them, each phase in an
    // OpenMP parallel loop with `parallelism` threads, and prints both timings.
    void parallelMalloc(int parallelism, int mallocCnt = 10000000) {
    
        omp_set_num_threads(parallelism);
    
        std::vector<char*> ptrStore(mallocCnt);
    
        boost::posix_time::ptime t1 = boost::posix_time::microsec_clock::local_time();
    
        #pragma omp parallel for
        for (int i = 0; i < mallocCnt; i++) {
            ptrStore[i] = ((char*)malloc(100 * sizeof(char)));
        }
    
        boost::posix_time::ptime t2 = boost::posix_time::microsec_clock::local_time();
    
        #pragma omp parallel for
        for (int i = 0; i < mallocCnt; i++) {
            free(ptrStore[i]);
        }
    
        boost::posix_time::ptime t3 = boost::posix_time::microsec_clock::local_time();
    
        boost::posix_time::time_duration malloc_time = t2 - t1;
        boost::posix_time::time_duration free_time   = t3 - t2;
    
        std::cout << " parallelism = "  << parallelism << "\t itr = " << mallocCnt <<  "\t malloc_time = " <<
                malloc_time.total_milliseconds() << "\t free_time = " << free_time.total_milliseconds() << std::endl;
    }
    
    int main() {
        // Repeat the measurement with 1 to 16 threads.
        for (int i = 1; i <= 16; i += 1) {
            parallelMalloc(i);
        }
    }
    

This gives me the following output (times in milliseconds):

     parallelism = 1         itr = 10000000  malloc_time = 1225      free_time = 1517
     parallelism = 2         itr = 10000000  malloc_time = 1614      free_time = 1112
     parallelism = 3         itr = 10000000  malloc_time = 1619      free_time = 687
     parallelism = 4         itr = 10000000  malloc_time = 2325      free_time = 620
     parallelism = 5         itr = 10000000  malloc_time = 2233      free_time = 550
     parallelism = 6         itr = 10000000  malloc_time = 2207      free_time = 489
     parallelism = 7         itr = 10000000  malloc_time = 2778      free_time = 398
     parallelism = 8         itr = 10000000  malloc_time = 1813      free_time = 389
     parallelism = 9         itr = 10000000  malloc_time = 1997      free_time = 350
     parallelism = 10        itr = 10000000  malloc_time = 1922      free_time = 291
     parallelism = 11        itr = 10000000  malloc_time = 2480      free_time = 257
     parallelism = 12        itr = 10000000  malloc_time = 1614      free_time = 256
     parallelism = 13        itr = 10000000  malloc_time = 1387      free_time = 289
     parallelism = 14        itr = 10000000  malloc_time = 1481      free_time = 248
     parallelism = 15        itr = 10000000  malloc_time = 1252      free_time = 297
     parallelism = 16        itr = 10000000  malloc_time = 1063      free_time = 281
    

I read that "due to its threading support, ptmalloc2 became the default memory allocator for Linux." Is there any way for me to check this myself?

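glibc uses ptmalloc2 internally, and that is not a recent development. Either way, it is not hard to run getconf GNU_LIBC_VERSION and then cross-check the version to see whether ptmalloc2 is used in that release, but I would bet that you would be wasting your time.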

I ask because I don't seem to get any speedup from parallelizing my malloc loop in the code below

I turned your example into an MVCE (code omitted here for brevity) and compiled it with g++ -Wall -pedantic -O3 -pthread -fopenmp, using g++ 5.3.1. Here are my results.

With OpenMP:

     parallelism = 1     itr = 10000000  malloc_time = 746   free_time = 263
     parallelism = 2     itr = 10000000  malloc_time = 541   free_time = 267
     parallelism = 3     itr = 10000000  malloc_time = 405   free_time = 259
     parallelism = 4     itr = 10000000  malloc_time = 324   free_time = 221
     parallelism = 5     itr = 10000000  malloc_time = 330   free_time = 242
     parallelism = 6     itr = 10000000  malloc_time = 287   free_time = 244
     parallelism = 7     itr = 10000000  malloc_time = 257   free_time = 226
     parallelism = 8     itr = 10000000  malloc_time = 270   free_time = 225
     parallelism = 9     itr = 10000000  malloc_time = 253   free_time = 225
     parallelism = 10    itr = 10000000  malloc_time = 236   free_time = 226
     parallelism = 11    itr = 10000000  malloc_time = 225   free_time = 239
     parallelism = 12    itr = 10000000  malloc_time = 276   free_time = 258
     parallelism = 13    itr = 10000000  malloc_time = 241   free_time = 228
     parallelism = 14    itr = 10000000  malloc_time = 254   free_time = 225
     parallelism = 15    itr = 10000000  malloc_time = 278   free_time = 272
     parallelism = 16    itr = 10000000  malloc_time = 235   free_time = 220
    
    23.87 user 
    2.11 system 
    0:10.41 elapsed 
    249% CPU
    

Without OpenMP:

     parallelism = 1     itr = 10000000  malloc_time = 748   free_time = 263
     parallelism = 2     itr = 10000000  malloc_time = 344   free_time = 256
     parallelism = 3     itr = 10000000  malloc_time = 751   free_time = 254
     parallelism = 4     itr = 10000000  malloc_time = 339   free_time = 262
     parallelism = 5     itr = 10000000  malloc_time = 748   free_time = 253
     parallelism = 6     itr = 10000000  malloc_time = 330   free_time = 256
     parallelism = 7     itr = 10000000  malloc_time = 734   free_time = 260
     parallelism = 8     itr = 10000000  malloc_time = 334   free_time = 259
     parallelism = 9     itr = 10000000  malloc_time = 750   free_time = 256
     parallelism = 10    itr = 10000000  malloc_time = 339   free_time = 255
     parallelism = 11    itr = 10000000  malloc_time = 743   free_time = 267
     parallelism = 12    itr = 10000000  malloc_time = 342   free_time = 261
     parallelism = 13    itr = 10000000  malloc_time = 739   free_time = 252
     parallelism = 14    itr = 10000000  malloc_time = 333   free_time = 252
     parallelism = 15    itr = 10000000  malloc_time = 740   free_time = 252
     parallelism = 16    itr = 10000000  malloc_time = 330   free_time = 252
    
    13.38 user 
    4.66 system 
    0:18.08 elapsed 
    99% CPU 
    

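The parallel version is faster by about 8 seconds (0:10.41 versus 0:18.08 elapsed). Still not convinced? OK. I went ahead and grabbed dlmalloc and ran make to produce libmalloc.a. My new command looks like g++ -Wall -pedantic -O3 -pthread -fopenmp -L$HOME/Development/test/dlmalloc/lib test.cpp -lmalloc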

With OpenMP:

    parallelism = 1  itr = 10000000  malloc_time = 814   free_time = 277
    

I hit CTRL-C after 37 seconds.

Without OpenMP:

     parallelism = 1     itr = 10000000  malloc_time = 772   free_time = 271
     parallelism = 2     itr = 10000000  malloc_time = 780   free_time = 272
     parallelism = 3     itr = 10000000  malloc_time = 783   free_time = 272
     parallelism = 4     itr = 10000000  malloc_time = 792   free_time = 277
     parallelism = 5     itr = 10000000  malloc_time = 813   free_time = 281
     parallelism = 6     itr = 10000000  malloc_time = 800   free_time = 275
     parallelism = 7     itr = 10000000  malloc_time = 795   free_time = 277
     parallelism = 8     itr = 10000000  malloc_time = 790   free_time = 273
     parallelism = 9     itr = 10000000  malloc_time = 788   free_time = 277
     parallelism = 10    itr = 10000000  malloc_time = 784   free_time = 276
     parallelism = 11    itr = 10000000  malloc_time = 786   free_time = 284
     parallelism = 12    itr = 10000000  malloc_time = 807   free_time = 279
     parallelism = 13    itr = 10000000  malloc_time = 791   free_time = 277
     parallelism = 14    itr = 10000000  malloc_time = 790   free_time = 273
     parallelism = 15    itr = 10000000  malloc_time = 785   free_time = 276
     parallelism = 16    itr = 10000000  malloc_time = 787   free_time = 275
    
    6.48 user 
    11.27 system 
    0:17.81 elapsed 
    99% CPU
    

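That is quite a significant difference. I suspect that either the problem lies in your real code being more complex, or something is wrong with your benchmark.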
