Using std::vector as a low-level buffer

The usage here is the same as in Using read() directly into a C++ std::vector, but with reallocation taken into account.

The size of the input file is unknown, so the buffer is reallocated by doubling its size whenever the file size exceeds the buffer size. Here's my code:

#include <vector>
#include <fstream>
#include <iostream>

int main()
{
    const size_t initSize = 1;
    std::vector<char> buf(initSize); // sizes buf to initSize, so &buf[0] below is valid
    std::ifstream ifile("D:\\Pictures\\input.jpg", std::ios_base::in|std::ios_base::binary);
    if (ifile)
    {
        size_t bufLen = 0;
        for (buf.reserve(1024); !ifile.eof(); buf.reserve(buf.capacity() << 1))
        {
            std::cout << buf.capacity() << std::endl;
            ifile.read(&buf[0] + bufLen, buf.capacity() - bufLen);
            bufLen += ifile.gcount();
        }
        std::ofstream ofile("rebuild.jpg", std::ios_base::out|std::ios_base::binary);
        if (ofile)
        {
            ofile.write(&buf[0], bufLen);
        }
    }
}

The program prints the vector capacity just as expected, and writes an output file of exactly the same size as the input, BUT only the bytes before offset initSize match the input, and everything afterward is zeros...

Using &buf[bufLen] in read() is definitely undefined behavior, but &buf[0] + bufLen gets the right position to write to, because contiguous allocation is guaranteed, isn't it? (Provided initSize != 0. Note that std::vector<char> buf(initSize); sizes buf to initSize. And yes, if initSize == 0, a runtime fatal error occurs in my environment.) Am I missing something? Is this also UB? Does the standard say anything about this usage of std::vector?

PS: Yes, I know we could calculate the file size first and allocate a buffer of exactly that size, but in my project the input files can be expected to nearly ALWAYS be smaller than a certain SIZE, so I can set initSize to SIZE, expect no overhead (such as the file-size calculation), and use reallocation only as a kind of "exception handling". And yes, I know I could replace reserve() with resize() and capacity() with size() and make things work with little overhead (zeroing the buffer on every resize), but I still want to get rid of any redundant operation, just a kind of paranoia...

Update 1:

In fact, we can logically deduce from the standard that &buf[0] + bufLen gets the right position; consider:

std::vector<char> buf(128);
buf.reserve(512);
char* bufPtr0 = &buf[0], *bufPtrOutofRange = &buf[0] + 200;
buf.resize(256); std::cout << "standard guarantees no reallocation" << std::endl;
char* bufPtr1 = &buf[0], *bufInRange = &buf[200]; 
if (bufPtr0 == bufPtr1)
    std::cout << "so bufPtr0 == bufPtr1" << std::endl;
std::cout << "and 200 < buf.size(), standard guarantees bufInRange == bufPtr1 + 200" << std::endl;
if (bufInRange == bufPtrOutofRange)
    std::cout << "finally we have: bufInRange == bufPtrOutofRange" << std::endl;

output:

standard guarantees no reallocation
so bufPtr0 == bufPtr1
and 200 < buf.size(), standard guarantees bufInRange == bufPtr1 + 200
finally we have: bufInRange == bufPtrOutofRange

And here 200 can be replaced with any i satisfying buf.size() <= i < buf.capacity(), and the same deduction holds.

Update 2:

Yes, I did miss something... But the problem is not contiguity (see update 1), and not even a failure to write the memory. Today I got some time to look into the problem: the program computed the right address and wrote the right data into the reserved memory, but on the next reserve(), buf was reallocated with ONLY the elements in the range [0, buf.size()) copied to the new memory. So that's the answer to the whole riddle...

Final note: If you don't need reallocation after your buffer is filled with data, you can definitely use reserve()/capacity() instead of resize()/size(), but if you do, use the latter.

example:

const size_t initSize = 32;
std::vector<char> buf(initSize);
buf.reserve(1024*100); // reserve enough space for file reading
std::ifstream ifile("D:\\Pictures\\input.jpg", std::ios_base::in|std::ios_base::binary);
if (ifile)
{
    ifile.read(&buf[0], buf.capacity());  // ok. the whole file is read into buf
    std::ofstream ofile("rebuild.jpg", std::ios_base::out|std::ios_base::binary);
    if (ofile)
    {
        ofile.write(&buf[0], ifile.gcount()); // rebuild.jpg is identical to input.jpg
    }
}
buf.reserve(1024*200); // horror! will likely lose all data in buf past offset initSize

PS: I haven't found any authoritative source (the standard, TC++PL, etc.) that explicitly agrees or disagrees with the suggestion above, but under all implementations available to me (VC++, g++, ICC), the example above works fine.

And here's another example, quoted from TC++PL, 4th ed., p. 1041; note that the first line in the function uses reserve() rather than resize():

void fill(istream& in, string& s, int max)
// use s as target for low-level input (simplified)
{
    s.reserve(max); // make sure there is enough allocated space
    in.read(&s[0],max);
    const int n = in.gcount(); // number of characters read
    s.resize(n);
    s.shrink_to_fit();  // discard excess capacity
}

reserve doesn't actually add elements to the vector; it only makes sure you won't need a reallocation when you resize it. Instead of using reserve you should use resize, then do a final resize once you know how many bytes you actually read in.

Edit: All that reserve is guaranteed to do is prevent the invalidation of iterators and pointers as you increase the size of the vector up to capacity(). It is not guaranteed to preserve the contents of the reserved bytes unless they're within size().
