Speed of memcpy() greatly influenced by different ways of malloc()
I wrote a program to test the speed of memcpy(). However, how the memory is allocated greatly influences the speed.
CODE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>   /* memcpy */

int main(int argc, char *argv[]) {
    if (argc < 3) {
        fprintf(stderr, "usage: %s <type: 0|1> <size in KB>\n", argv[0]);
        return 1;
    }

    unsigned char *pbuff_1;
    unsigned char *pbuff_2;
    unsigned long iters = 1000 * 1000;
    int type = atoi(argv[1]);
    int buff_size = atoi(argv[2]) * 1024;

    if (type == 1) {
        /* One allocation; the second buffer starts right after the first. */
        pbuff_1 = malloc(2 * buff_size);
        pbuff_2 = pbuff_1 + buff_size;
    } else {
        /* Two separate allocations. */
        pbuff_1 = malloc(buff_size);
        pbuff_2 = malloc(buff_size);
    }

    /* Copy the same block repeatedly. */
    for (unsigned long i = 0; i < iters; ++i) {
        memcpy(pbuff_2, pbuff_1, buff_size);
    }

    if (type == 1) {
        free(pbuff_1);
    } else {
        free(pbuff_1);
        free(pbuff_2);
    }
    return 0;
}
The OS is linux-2.6.35 and the compiler is GCC-4.4.5 with options "-std=c99 -O3".
Results on my computer (memcpy of 4 KB, 1 million iterations):
time ./test.test 1 4
real 0m0.128s
user 0m0.120s
sys 0m0.000s
time ./test.test 0 4
real 0m0.422s
user 0m0.420s
sys 0m0.000s
This question is related to a previous question:
Why does the speed of memcpy() drop dramatically every 4KB?
UPDATE
The cause is related to the GCC version. I compiled and ran this program with different versions of GCC:
GCC version      4.1.3       4.4.5       4.6.3
Time used (1)    0m0.183s    0m0.128s    0m0.110s
Time used (0)    0m1.788s    0m0.422s    0m0.108s
It seems GCC is getting smarter.
The specific addresses returned by malloc are chosen by the implementation and are not always optimal for the calling code. You already know that the speed of moving memory around depends greatly on cache and page effects.
Here, the specific pointers returned by malloc are not known. You could print them out using printf("%p", ptr). What is known, however, is that using a single malloc for both blocks avoids page and cache waste between the two blocks. That may already be the reason for the speed difference.