Why are std::fstreams so slow?
I was working on a simple parser and when profiling I observed the bottleneck is in... file read! I extracted very simple test to compare the performance of fstreams
and FILE*
when reading a big blob of data:
#include <stdio.h>
#include <chrono>
#include <fstream>
#include <iostream>
#include <functional>
void measure(const std::string& test, std::function<void()> function)
{
auto start_time = std::chrono::high_resolution_clock::now();
function();
auto duration = std::chrono::duration_cast<std::chrono::nanoseconds>(std::chrono::high_resolution_clock::now() - start_time);
std::cout<<test<<" "<<static_cast<double>(duration.count()) * 0.000001<<" ms"<<std::endl;
}
#define BUFFER_SIZE (1024 * 1024 * 1024)
int main(int argc, const char * argv[])
{
auto buffer = new char[BUFFER_SIZE];
memset(buffer, 123, BUFFER_SIZE);
measure("FILE* write", [buffer]()
{
FILE* file = fopen("test_file_write", "wb");
fwrite(buffer, 1, BUFFER_SIZE, file);
fclose(file);
});
measure("FILE* read", [buffer]()
{
FILE* file = fopen("test_file_read", "rb");
fread(buffer, 1, BUFFER_SIZE, file);
fclose(file);
});
measure("fstream write", [buffer]()
{
std::ofstream stream("test_stream_write", std::ios::binary);
stream.write(buffer, BUFFER_SIZE);
});
measure("fstream read", [buffer]()
{
std::ifstream stream("test_stream_read", std::ios::binary);
stream.read(buffer, BUFFER_SIZE);
});
delete[] buffer;
}
The results of running this code on my machine are:
FILE* write 1388.59 ms
FILE* read 1292.51 ms
fstream write 3105.38 ms
fstream read 3319.82 ms
fstream
write/read are about 2 times slower than FILE*
write/read! And this while reading a big blob of data, without any parsing or other features of fstreams
. I'm running the code on Mac OS, Intel I7 2.6GHz, 16GB 1600 MHz Ram, SSD drive. Please note that running again same code the time for FILE* read
is very low (about 200 ms) probably because the file gets cached... This is why the files opened for reading are not created using the code.
Why when reading just a blob of binary data using fstream
is so slow compared to FILE*
?
EDIT 1: I updated the code and the times. Sorry for the delay!
EDIT 2: I added command line and new results (very similar to previous ones!)
$ clang++ main.cpp -std=c++11 -stdlib=libc++ -O3
$ ./a.out
FILE* write 1417.9 ms
FILE* read 1292.59 ms
fstream write 3214.02 ms
fstream read 3052.56 ms
Following the results for the second run:
$ ./a.out
FILE* write 1428.98 ms
FILE* read 196.902 ms
fstream write 3343.69 ms
fstream read 2285.93 ms
It looks like the file gets cached when reading for both FILE*
and stream
as the time reduces with the same amount for both of them.
EDIT 3: I reduced the code to this:
FILE* file = fopen("test_file_write", "wb");
fwrite(buffer, 1, BUFFER_SIZE, file);
fclose(file);
std::ofstream stream("test_stream_write", std::ios::binary);
stream.write(buffer, BUFFER_SIZE);
And started the profiler. It seems like stream
spends lots of time in xsputn
function, and the actual write
calls have the same duration (as it should be, it's the same function...)
Running Time Self Symbol Name
3266.0ms 66.9% 0,0 std::__1::basic_ostream<char, std::__1::char_traits<char> >::write(char const*, long)
3265.0ms 66.9% 2145,0 std::__1::basic_streambuf<char, std::__1::char_traits<char> >::xsputn(char const*, long)
1120.0ms 22.9% 7,0 std::__1::basic_filebuf<char, std::__1::char_traits<char> >::overflow(int)
1112.0ms 22.7% 2,0 fwrite
1127.0ms 23.0% 0,0 fwrite
EDIT 4 For some reason this question is marked as duplicate. I wanted to point out that I don't use printf
at all, I use only std::cout
to write the time. The files used in the read
part are the output from the write
part, copied with different name to avoid caching
It would seem that, on Linux, for this large set of data, the implementation of fwrite
is much more efficient, since it uses write
rather than writev
.
I'm not sure WHY writev
is so much slower than write
, but that appears to be where the difference is. And I see absolutely no real reason as to why the fstream
needs to use that construct in this case.
This can easily be seen by using strace ./a.out
(where a.out
is the program testing this).
Output:
Fstream:
clock_gettime(CLOCK_REALTIME, {1411978373, 114560081}) = 0
open("test", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
writev(3, [{NULL, 0}, {" "..., 1073741824}], 2) = 1073741824
close(3) = 0
clock_gettime(CLOCK_REALTIME, {1411978386, 376353883}) = 0
write(1, "fstream write 13261.8 msn", 25fstream write 13261.8 ms) = 25
FILE*:
clock_gettime(CLOCK_REALTIME, {1411978386, 930326134}) = 0
open("test", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
write(3, " "..., 1073741824) = 1073741824
clock_gettime(CLOCK_REALTIME, {1411978388, 584197782}) = 0
write(1, "FILE* write 1653.87 msn", 23FILE* write 1653.87 ms) = 23
I don't have them fancy SSD drives, so my machine will be a bit slower on that - or something else is slower in my case.
As pointed out by Jan Hudec, I'm misinterpreting the results. I just wrote this:
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/uio.h>
#include <unistd.h>
#include <iostream>
#include <cstdlib>
#include <cstring>
#include <functional>
#include <chrono>
void measure(const std::string& test, std::function<void()> function)
{
auto start_time = std::chrono::high_resolution_clock::now();
function();
auto duration = std::chrono::duration_cast<std::chrono::nanoseconds>(std::chrono::high_resolution_clock::now() - start_time);
std::cout<<test<<" "<<static_cast<double>(duration.count()) * 0.000001<<" ms"<<std::endl;
}
#define BUFFER_SIZE (1024 * 1024 * 1024)
int main()
{
auto buffer = new char[BUFFER_SIZE];
memset(buffer, 0, BUFFER_SIZE);
measure("writev", [buffer]()
{
int fd = open("test", O_CREAT|O_WRONLY);
struct iovec vec[] =
{
{ NULL, 0 },
{ (void *)buffer, BUFFER_SIZE }
};
writev(fd, vec, sizeof(vec)/sizeof(vec[0]));
close(fd);
});
measure("write", [buffer]()
{
int fd = open("test", O_CREAT|O_WRONLY);
write(fd, buffer, BUFFER_SIZE);
close(fd);
});
}
It is the actual fstream
implementation that does something daft - probably copying the whole data in small chunks, somewhere and somehow, or something like that. I will try to find out further.
And the result is pretty much identical for both cases, and faster than both fstream
and FILE*
variants in the question.
Edit:
It would seem like, on my machine, right now, if you add fclose(file)
after the write, it takes approximately the same amount of time for both fstream
and FILE*
- on my system, around 13 seconds to write 1GB - with old style spinning disk type drives, not SSD.
I can however write MUCH faster using this code:
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/uio.h>
#include <unistd.h>
#include <iostream>
#include <cstdlib>
#include <cstring>
#include <functional>
#include <chrono>
void measure(const std::string& test, std::function<void()> function)
{
auto start_time = std::chrono::high_resolution_clock::now();
function();
auto duration = std::chrono::duration_cast<std::chrono::nanoseconds>(std::chrono::high_resolution_clock::now() - start_time);
std::cout<<test<<" "<<static_cast<double>(duration.count()) * 0.000001<<" ms"<<std::endl;
}
#define BUFFER_SIZE (1024 * 1024 * 1024)
int main()
{
auto buffer = new char[BUFFER_SIZE];
memset(buffer, 0, BUFFER_SIZE);
measure("writev", [buffer]()
{
int fd = open("test", O_CREAT|O_WRONLY, 0660);
struct iovec vec[] =
{
{ NULL, 0 },
{ (void *)buffer, BUFFER_SIZE }
};
writev(fd, vec, sizeof(vec)/sizeof(vec[0]));
close(fd);
});
measure("write", [buffer]()
{
int fd = open("test", O_CREAT|O_WRONLY, 0660);
write(fd, buffer, BUFFER_SIZE);
close(fd);
});
}
gives times of about 650-900 ms.
I can also edit the original program to give a time of approximately 1000ms for fwrite
- simply remove the fclose
.
I also added this method:
measure("fstream write (new)", [buffer]()
{
std::ofstream* stream = new std::ofstream("test", std::ios::binary);
stream->write(buffer, BUFFER_SIZE);
// Intentionally no delete.
});
and then it takes about 1000 ms here too.
So, my conclusion is that, somehow, sometimes, closing the file makes it flush to disk. In other cases, it doesn't. I still don't understand why...
In contrary to other answers, a big issue with large file reads comes from buffering by the C standard library. Try using low level read
/ write
calls in large chunks (1024KB) and see the performance jump.
File buffering by the C library is useful for reading or writing small chunks of data (smaller than disk block size).
On Windows I got almost a 3x performance boost dropping file buffering when reading and writing raw video streams.
I also opened the file using native OS (win32) API calls and told the OS not to cache the file as this involves yet another copy.
The stream is somehow broken on the MAC, old implementation or setup.
An old setup could cause the FILE to be written in the exe directory and the stream in the user directory, this shouldn't make any difference unless you got 2 disks or other different setting.
On my lousy Vista I get Normal buffer+Uncached:
C++ 201103
FILE* write 4756 ms
FILE* read 5007 ms
fstream write 5526 ms
fstream read 5728 ms
Normal buffer+Cached:
C++ 201103
FILE* write 4747 ms
FILE* read 454 ms
fstream write 5490 ms
fstream read 396 ms
Large Buffer+cached:
C++ 201103
5th run:
FILE* write 4760 ms
FILE* read 446 ms
fstream write 5278 ms
fstream read 369 ms
This shows that the FILE write is faster than the fstream, but slower in read than fstream ... but all numbers are within ~10% of each other.
Try adding some more buffering to your stream to see if that helps.
const int MySize = 1024*1024;
char MrBuf[MySize];
stream.rdbuf()->pubsetbuf(MrBuf, MySize);
The equivalent for FILE is
const int MySize = 1024*1024;
if (!setvbuf ( file , NULL , _IOFBF , MySize ))
DieInDisgrace();
链接地址: http://www.djcxy.com/p/31520.html