Why are std::fstreams so slow?

I was working on a simple parser and when profiling I observed the bottleneck is in... file read! I extracted very simple test to compare the performance of fstreams and FILE* when reading a big blob of data:

#include <stdio.h>
#include <chrono>
#include <fstream>
#include <iostream>
#include <functional>

void measure(const std::string& test, std::function<void()> function)
{
    auto start_time = std::chrono::high_resolution_clock::now();

    function();

    auto duration = std::chrono::duration_cast<std::chrono::nanoseconds>(std::chrono::high_resolution_clock::now() - start_time);
    std::cout<<test<<" "<<static_cast<double>(duration.count()) * 0.000001<<" ms"<<std::endl;
}

#define BUFFER_SIZE (1024 * 1024 * 1024)

int main(int argc, const char * argv[])
{
    auto buffer = new char[BUFFER_SIZE];
    memset(buffer, 123, BUFFER_SIZE);

    measure("FILE* write", [buffer]()
    {
        FILE* file = fopen("test_file_write", "wb");
        fwrite(buffer, 1, BUFFER_SIZE, file);
        fclose(file);
    });
    measure("FILE* read", [buffer]()
    {
        FILE* file = fopen("test_file_read", "rb");
        fread(buffer, 1, BUFFER_SIZE, file);
        fclose(file);
    });
    measure("fstream write", [buffer]()
    {
        std::ofstream stream("test_stream_write", std::ios::binary);
        stream.write(buffer, BUFFER_SIZE);
    });
    measure("fstream read", [buffer]()
    {
        std::ifstream stream("test_stream_read", std::ios::binary);
        stream.read(buffer, BUFFER_SIZE);
    });

    delete[] buffer;
}

The results of running this code on my machine are:

FILE* write 1388.59 ms
FILE* read 1292.51 ms
fstream write 3105.38 ms
fstream read 3319.82 ms

fstream write/read are about 2 times slower than FILE* write/read! And this while reading a big blob of data, without any parsing or other features of fstreams . I'm running the code on Mac OS, Intel I7 2.6GHz, 16GB 1600 MHz Ram, SSD drive. Please note that running again same code the time for FILE* read is very low (about 200 ms) probably because the file gets cached... This is why the files opened for reading are not created using the code.

Why when reading just a blob of binary data using fstream is so slow compared to FILE* ?

EDIT 1: I updated the code and the times. Sorry for the delay!

EDIT 2: I added command line and new results (very similar to previous ones!)

$ clang++  main.cpp -std=c++11 -stdlib=libc++ -O3
$ ./a.out
FILE* write 1417.9 ms
FILE* read 1292.59 ms
fstream write 3214.02 ms
fstream read 3052.56 ms

Following the results for the second run:

$ ./a.out
FILE* write 1428.98 ms
FILE* read 196.902 ms
fstream write 3343.69 ms
fstream read 2285.93 ms

It looks like the file gets cached when reading for both FILE* and stream as the time reduces with the same amount for both of them.

EDIT 3: I reduced the code to this:

FILE* file = fopen("test_file_write", "wb");
fwrite(buffer, 1, BUFFER_SIZE, file);
fclose(file);

std::ofstream stream("test_stream_write", std::ios::binary);
stream.write(buffer, BUFFER_SIZE);

And started the profiler. It seems like stream spends lots of time in xsputn function, and the actual write calls have the same duration (as it should be, it's the same function...)

Running    Time     Self       Symbol Name
3266.0ms   66.9%    0,0        std::__1::basic_ostream<char, std::__1::char_traits<char> >::write(char const*, long)
3265.0ms   66.9%    2145,0          std::__1::basic_streambuf<char, std::__1::char_traits<char> >::xsputn(char const*, long)
1120.0ms   22.9%    7,0                 std::__1::basic_filebuf<char, std::__1::char_traits<char> >::overflow(int)
1112.0ms   22.7%    2,0                      fwrite
1127.0ms   23.0%    0,0        fwrite

EDIT 4 For some reason this question is marked as duplicate. I wanted to point out that I don't use printf at all, I use only std::cout to write the time. The files used in the read part are the output from the write part, copied with different name to avoid caching


It would seem that, on Linux, for this large set of data, the implementation of fwrite is much more efficient, since it uses write rather than writev .

I'm not sure WHY writev is so much slower than write , but that appears to be where the difference is. And I see absolutely no real reason as to why the fstream needs to use that construct in this case.

This can easily be seen by using strace ./a.out (where a.out is the program testing this).

Output:

Fstream:

clock_gettime(CLOCK_REALTIME, {1411978373, 114560081}) = 0
open("test", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
writev(3, [{NULL, 0}, {""..., 1073741824}], 2) = 1073741824
close(3)                                = 0
clock_gettime(CLOCK_REALTIME, {1411978386, 376353883}) = 0
write(1, "fstream write 13261.8 msn", 25fstream write 13261.8 ms) = 25

FILE*:

clock_gettime(CLOCK_REALTIME, {1411978386, 930326134}) = 0
open("test", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
write(3, ""..., 1073741824) = 1073741824
clock_gettime(CLOCK_REALTIME, {1411978388, 584197782}) = 0
write(1, "FILE* write 1653.87 msn", 23FILE* write 1653.87 ms) = 23

I don't have them fancy SSD drives, so my machine will be a bit slower on that - or something else is slower in my case.

As pointed out by Jan Hudec, I'm misinterpreting the results. I just wrote this:

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/uio.h>
#include <unistd.h>
#include <iostream>
#include <cstdlib>
#include <cstring>
#include <functional>
#include <chrono>

void measure(const std::string& test, std::function<void()> function)
{
    auto start_time = std::chrono::high_resolution_clock::now();

    function();

    auto duration = std::chrono::duration_cast<std::chrono::nanoseconds>(std::chrono::high_resolution_clock::now() - start_time);
    std::cout<<test<<" "<<static_cast<double>(duration.count()) * 0.000001<<" ms"<<std::endl;
}

#define BUFFER_SIZE (1024 * 1024 * 1024)


int main()
{
    auto buffer = new char[BUFFER_SIZE];
    memset(buffer, 0, BUFFER_SIZE);

    measure("writev", [buffer]()
    {
        int fd = open("test", O_CREAT|O_WRONLY);
        struct iovec vec[] = 
        {
            { NULL, 0 },
            { (void *)buffer, BUFFER_SIZE }
        };
        writev(fd, vec, sizeof(vec)/sizeof(vec[0]));
        close(fd);
    });

    measure("write", [buffer]()
    {
        int fd = open("test", O_CREAT|O_WRONLY);
        write(fd, buffer, BUFFER_SIZE);
        close(fd);
    });
}

It is the actual fstream implementation that does something daft - probably copying the whole data in small chunks, somewhere and somehow, or something like that. I will try to find out further.

And the result is pretty much identical for both cases, and faster than both fstream and FILE* variants in the question.

Edit:

It would seem like, on my machine, right now, if you add fclose(file) after the write, it takes approximately the same amount of time for both fstream and FILE* - on my system, around 13 seconds to write 1GB - with old style spinning disk type drives, not SSD.

I can however write MUCH faster using this code:

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/uio.h>
#include <unistd.h>
#include <iostream>
#include <cstdlib>
#include <cstring>
#include <functional>
#include <chrono>

void measure(const std::string& test, std::function<void()> function)
{
    auto start_time = std::chrono::high_resolution_clock::now();

    function();

    auto duration = std::chrono::duration_cast<std::chrono::nanoseconds>(std::chrono::high_resolution_clock::now() - start_time);
    std::cout<<test<<" "<<static_cast<double>(duration.count()) * 0.000001<<" ms"<<std::endl;
}

#define BUFFER_SIZE (1024 * 1024 * 1024)


int main()
{
    auto buffer = new char[BUFFER_SIZE];
    memset(buffer, 0, BUFFER_SIZE);

    measure("writev", [buffer]()
    {
        int fd = open("test", O_CREAT|O_WRONLY, 0660);
        struct iovec vec[] = 
        {
            { NULL, 0 },
            { (void *)buffer, BUFFER_SIZE }
        };
        writev(fd, vec, sizeof(vec)/sizeof(vec[0]));
        close(fd);
    });

    measure("write", [buffer]()
    {
        int fd = open("test", O_CREAT|O_WRONLY, 0660);
        write(fd, buffer, BUFFER_SIZE);
        close(fd);
    });
}

gives times of about 650-900 ms.

I can also edit the original program to give a time of approximately 1000ms for fwrite - simply remove the fclose .

I also added this method:

measure("fstream write (new)", [buffer]()
{
    std::ofstream* stream = new std::ofstream("test", std::ios::binary);
    stream->write(buffer, BUFFER_SIZE);
    // Intentionally no delete.
});

and then it takes about 1000 ms here too.

So, my conclusion is that, somehow, sometimes, closing the file makes it flush to disk. In other cases, it doesn't. I still don't understand why...


In contrary to other answers, a big issue with large file reads comes from buffering by the C standard library. Try using low level read / write calls in large chunks (1024KB) and see the performance jump.

File buffering by the C library is useful for reading or writing small chunks of data (smaller than disk block size).

On Windows I got almost a 3x performance boost dropping file buffering when reading and writing raw video streams.

I also opened the file using native OS (win32) API calls and told the OS not to cache the file as this involves yet another copy.


The stream is somehow broken on the MAC, old implementation or setup.

An old setup could cause the FILE to be written in the exe directory and the stream in the user directory, this shouldn't make any difference unless you got 2 disks or other different setting.

On my lousy Vista I get Normal buffer+Uncached:
C++ 201103
FILE* write 4756 ms
FILE* read 5007 ms
fstream write 5526 ms
fstream read 5728 ms

Normal buffer+Cached:
C++ 201103
FILE* write 4747 ms
FILE* read 454 ms
fstream write 5490 ms
fstream read 396 ms

Large Buffer+cached:
C++ 201103
5th run:
FILE* write 4760 ms
FILE* read 446 ms
fstream write 5278 ms
fstream read 369 ms

This shows that the FILE write is faster than the fstream, but slower in read than fstream ... but all numbers are within ~10% of each other.

Try adding some more buffering to your stream to see if that helps.

const int MySize = 1024*1024;
char MrBuf[MySize];
stream.rdbuf()->pubsetbuf(MrBuf, MySize);

The equivalent for FILE is

const int MySize = 1024*1024;
if (!setvbuf ( file , NULL , _IOFBF , MySize )) 
    DieInDisgrace();
链接地址: http://www.djcxy.com/p/31520.html

上一篇: std :: ifstream明显比FILE慢吗?

下一篇: 为什么std :: fstreams很慢?