Speed of memcpy() greatly influenced by different ways of malloc()

2018-06-30 19:35:16

I wrote a program to test the speed of memcpy(). However, the way the memory is allocated greatly influences the speed.

CODE

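#include <stdio.h>
#include <stdlib.h>
#include <string.h>   /* memcpy() */

int main(int argc, char *argv[]){
    unsigned char *pbuff_1;
    unsigned char *pbuff_2;
    unsigned long iters = 1000*1000;

    if(argc < 3){
        fprintf(stderr, "usage: %s <type: 0|1> <size in KB>\n", argv[0]);
        return 1;
    }

    int type = atoi(argv[1]);
    size_t buff_size = (size_t)atoi(argv[2])*1024;

    if(type == 1){
        /* one allocation; the second buffer directly follows the first */
        pbuff_1 = malloc(2*buff_size);
        pbuff_2 = pbuff_1 ? pbuff_1+buff_size : NULL;
    }else{
        /* two separate allocations */
        pbuff_1 = malloc(buff_size);
        pbuff_2 = malloc(buff_size);
    }

    if(pbuff_1 == NULL || pbuff_2 == NULL){
        fprintf(stderr, "malloc failed\n");
        return 1;
    }

    /* copy the same block over and over */
    for(unsigned long i = 0; i < iters; ++i){
        memcpy(pbuff_2, pbuff_1, buff_size);
    }

    if(type == 1){
        free(pbuff_1);
    }else{
        free(pbuff_1);
        free(pbuff_2);
    }
    return 0;
}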

The OS is Linux 2.6.35 and the compiler is GCC 4.4.5 with the options "-std=c99 -O3".

Results on my computer (memcpy of 4 KB, 1 million iterations):

time ./test.test 1 4

real    0m0.128s
user    0m0.120s
sys     0m0.000s

time ./test.test 0 4

real    0m0.422s
user    0m0.420s
sys     0m0.000s

This question is related to a previous question:

Why does the speed of memcpy() drop dramatically every 4KB?

UPDATE

The cause appears to be related to the GCC version. I compiled and ran this program with different versions of GCC:

GCC version           4.1.3       4.4.5       4.6.3
Time used (type 1)    0m0.183s    0m0.128s    0m0.110s
Time used (type 0)    0m1.788s    0m0.422s    0m0.108s

It seems GCC is getting smarter.

The specific addresses returned by malloc() are chosen by the implementation and are not always optimal for the code that uses them. As you already know, the speed of moving memory around depends greatly on cache and page effects.

Here, the specific pointers returned by malloc() are not known. You could print them with printf("%p", ptr). What is known, however, is that using a single malloc() for both blocks avoids page and cache waste between the two blocks. That alone may already explain the speed difference.
