Optimizing for 3D imaging processes in C++

2018-06-04 15:53:30

I am working with 3D volumetric images, possibly 256x256x256. I have 3 such volumes that I want to read in and operate on. Presently, each volume is stored as a text file of numbers which I read in using ifstream. I save it as a matrix (a class I have written that dynamically allocates a 3D array). Then I perform operations on these 3 matrices: addition, multiplication and even a Fourier transform. So far everything works, but it takes a very long time, especially the Fourier transform, since it has 6 nested loops.

I want to know how I can speed this up. Also, does the fact that I have stored the images in text files make a difference? Should I save them as binary or in some other format that is easier or faster to read? Is fstream the fastest way I can read them in? I use the same 3 matrices each time without changing them. Does that make a difference? Also, is pointer to pointer to pointer the best way to store a 3D volume? If not, what else can I do?

Also, is pointer to pointer to pointer the best way to store a 3D volume?

No, that's usually very inefficient.

If not, what else can I do?

It's likely that you will get better performance if you store it in one contiguous block and use computed offsets into the block.

I'd usually use a structure like this:

#include <cstddef>
#include <vector>

class DataBlock {
public:
  DataBlock(std::size_t in_nx, std::size_t in_ny, std::size_t in_nz)
    : nx(in_nx), ny(in_ny), nz(in_nz), data(in_nx * in_ny * in_nz, 0.0)
  {}

  // You may want to make this check bounds in debug builds.
  // x varies fastest in memory, then y, then z.
  double& at(std::size_t x, std::size_t y, std::size_t z) {
    return data[x + y * nx + z * nx * ny];
  }

  const double& at(std::size_t x, std::size_t y, std::size_t z) const {
    return data[x + y * nx + z * nx * ny];
  }

private:
  std::size_t nx;
  std::size_t ny;
  std::size_t nz;
  std::vector<double> data;

  // Don't want this class copied, so the copy constructor and assignment
  // are declared private and left undefined.
  DataBlock(const DataBlock&);
  DataBlock& operator=(const DataBlock&);
};
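As a rough illustration of why the contiguous layout helps, here is a minimal sketch that works on plain std::vector<double> buffers with the same x-fastest indexing as the class above. The names offset, add_volumes and sum_volume are made up for this example. Element-wise operations become a single flat loop, and a triple loop stays cache-friendly as long as x is the innermost index.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: linear offset of voxel (x, y, z) in a block whose
// rows have nx elements and whose slices have ny rows (x varies fastest).
inline std::size_t offset(std::size_t x, std::size_t y, std::size_t z,
                          std::size_t nx, std::size_t ny) {
    return x + nx * (y + ny * z);
}

// Element-wise sum of two volumes of equal size. Because the storage is one
// contiguous block, no 3D indexing is needed at all.
std::vector<double> add_volumes(const std::vector<double>& a,
                                const std::vector<double>& b) {
    assert(a.size() == b.size());
    std::vector<double> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[i] = a[i] + b[i];
    }
    return out;
}

// Sum of all voxels using explicit 3D loops. The loops are ordered z, y, x
// so that the innermost loop walks through memory sequentially.
double sum_volume(const std::vector<double>& v,
                  std::size_t nx, std::size_t ny, std::size_t nz) {
    assert(v.size() == nx * ny * nz);
    double total = 0.0;
    for (std::size_t z = 0; z < nz; ++z) {
        for (std::size_t y = 0; y < ny; ++y) {
            for (std::size_t x = 0; x < nx; ++x) {
                total += v[offset(x, y, z, nx, ny)];
            }
        }
    }
    return total;
}
```

Swapping the loop order so that z is innermost would touch memory with a stride of nx*ny elements per step, which is typically much slower on large volumes.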

Storing a large 3D image (256^3 elements) as plaintext is a waste of resources. Suppose each line of the file holds one value: you have to read characters until you find the end of the line (for a 3-digit number that is 4 bytes: 3 for the digits and 1 for the newline), and afterwards you have to convert those digits to a number. With a binary file, you read a fixed number of bytes and have your number directly. You could and should write and read it as a binary image.

There are several formats for doing so; the one I would recommend is the MetaImage file format supported by VTK. In this format, you have a plaintext header file and a binary file with the actual image data. The header tells you how large the image is and which data type it uses. In your program, you then read the binary data directly and store it in a 3D array.
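A minimal sketch of the header-plus-raw-data idea, using only the standard library rather than VTK or ITK. The function names write_volume and read_volume and the file name stem are hypothetical. The header keys (ObjectType, NDims, DimSize, ElementType, ElementDataFile and so on) are the ones I recall from MetaImage headers, so check them against the MetaImage documentation before relying on this; the byte-order line assumes a little-endian machine.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Writes base + ".raw" (the voxels as raw doubles) and base + ".mhd"
// (a small text header describing them). `base` is a hypothetical file
// name stem without a directory part.
bool write_volume(const std::string& base, const std::vector<double>& data,
                  std::size_t nx, std::size_t ny, std::size_t nz) {
    assert(data.size() == nx * ny * nz);

    std::ofstream raw(base + ".raw", std::ios::binary);
    raw.write(reinterpret_cast<const char*>(data.data()),
              static_cast<std::streamsize>(data.size() * sizeof(double)));

    std::ofstream hdr(base + ".mhd");
    hdr << "ObjectType = Image\n"
        << "NDims = 3\n"
        << "BinaryData = True\n"
        << "BinaryDataByteOrderMSB = False\n"  // assumes little-endian host
        << "DimSize = " << nx << ' ' << ny << ' ' << nz << '\n'
        << "ElementType = MET_DOUBLE\n"
        << "ElementDataFile = " << base << ".raw\n";

    return raw.good() && hdr.good();
}

// Reads `count` doubles from a raw file in a single call. In a real reader
// `count` would come from the DimSize line of the header.
bool read_volume(const std::string& raw_path, std::vector<double>& data,
                 std::size_t count) {
    std::ifstream raw(raw_path, std::ios::binary);
    if (!raw) {
        return false;
    }
    data.resize(count);
    const std::streamsize bytes =
        static_cast<std::streamsize>(count * sizeof(double));
    raw.read(reinterpret_cast<char*>(data.data()), bytes);
    return raw.gcount() == bytes;
}
```

Reading back is one read call for the whole volume, with no per-number parsing. The raw file is only portable between machines that share the same byte order and floating-point representation.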

If you really want to speed things up, consider CUDA or OpenCL. Element-wise operations and Fourier transforms on volumes parallelize well, so a GPU can be very fast for this kind of application.

There are several C++ libraries that can help you with reading, writing and manipulating image data, including the aforementioned VTK, and ITK.

256^3 is rather large. Parsing 256^3 text strings will take a considerable amount of time. Using binary makes reading and writing much faster because no number has to be converted to or from a string, and it also uses much less space. For example, to read the number 123 from a text file, the program has to read it as a string and convert it from decimal to binary with repeated multiplications by 10. If you had written it directly as the binary value 0b1111011, you would only need to read that one byte back into memory, with no conversion at all.

Using hexadecimal numbers may also increase reading speed, since each hex digit maps directly to a binary value, but if you need more speed a binary file is the way to go. A single fread call is enough to load the whole file (256^3 bytes = 16 MB, assuming one byte per voxel) into memory in less than 1 second. When you're done, just fwrite it back to the file. To speed up the processing itself, you can use SIMD (SSE/AVX), CUDA or another parallel processing technique. You can improve the speed even further with multithreading, or by saving only the non-zero values, because in many cases most values will be 0.
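The fread/fwrite approach might look roughly like this. It is a sketch that assumes one unsigned byte per voxel, matching the 16 MB figure above; save_bytes and load_bytes are made-up names, and the caller is assumed to know the voxel count in advance.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

// Writes the whole buffer with a single fwrite call.
bool save_bytes(const char* path, const std::vector<std::uint8_t>& v) {
    assert(path != nullptr);
    std::FILE* f = std::fopen(path, "wb");
    if (!f) {
        return false;
    }
    const std::size_t written = std::fwrite(v.data(), 1, v.size(), f);
    const bool closed = (std::fclose(f) == 0);
    return closed && written == v.size();
}

// Reads exactly `count` bytes with a single fread call. Returns false if
// the file cannot be opened or is shorter than expected.
bool load_bytes(const char* path, std::vector<std::uint8_t>& v,
                std::size_t count) {
    assert(path != nullptr);
    std::FILE* f = std::fopen(path, "rb");
    if (!f) {
        return false;
    }
    v.resize(count);
    const std::size_t got = std::fread(v.data(), 1, count, f);
    std::fclose(f);
    return got == count;
}
```

For a 256x256x256 volume, count would be 256 * 256 * 256 = 16777216 bytes.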

Another reason may be that your array is large and each dimension is a power of 2. This has been discussed in many questions on SO:

Why is there huge performance hit in 2048x2048 versus 2047x2047 array multiplication?

Why is my program slow when looping over exactly 8192 elements?

Why is transposing a matrix of 512x512 much slower than transposing a matrix of 513x513?

You may consider changing the last dimension to 257 and trying again.
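One way to try this without changing the logical image size is to pad each row in memory. The sketch below is only an illustration: PaddedVolume, its member names and the default padding of one element are hypothetical, and it assumes x is the fastest-varying index. Whether padding helps at all depends on the access pattern and the cache, so measure before and after.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A volume whose rows are allocated slightly longer than they logically are,
// so that the strides between rows and slices are no longer powers of two.
struct PaddedVolume {
    std::size_t nx, ny, nz;    // logical dimensions
    std::size_t pitch;         // allocated row length: nx + pad
    std::vector<double> data;  // pitch * ny * nz elements, zero-initialized

    PaddedVolume(std::size_t nx_, std::size_t ny_, std::size_t nz_,
                 std::size_t pad = 1)
        : nx(nx_), ny(ny_), nz(nz_), pitch(nx_ + pad),
          data(pitch * ny_ * nz_, 0.0) {}

    // Indexing uses pitch instead of nx; the padding elements are never
    // addressed through this accessor.
    double& at(std::size_t x, std::size_t y, std::size_t z) {
        assert(x < nx && y < ny && z < nz);
        return data[x + pitch * (y + ny * z)];
    }
};
```

With nx = 256 and the default padding, consecutive rows are 257 doubles apart instead of 256, at a memory cost of under half a percent.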
