Use stack or heap for MPI program variables

I recently started working with Intel MPI to parallelize a very simple flow solver. I'm wondering whether I should store my variables on the stack (i.e. declare them as datatype name[size];) or on the heap (i.e. use datatype *name = (datatype *)malloc(size*sizeof(datatype));).
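
For clarity, here is roughly what I mean by the two forms (the names and the wrapping function are just placeholders, not my actual code):

#include <stdlib.h>

/* length is only known at runtime */
void allocate_example(int length)
{
    /* "stack" version: a C99 variable-length array; it is released automatically
       when it goes out of scope, but a large length can overflow the stack */
    double a[length];
    a[0] = 0.0;

    /* "heap" version: explicit malloc/free (no cast actually needed in C) */
    double *b = malloc(length * sizeof *b);
    if (b != NULL) {
        b[0] = 0.0;
        free(b);
    }
}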

At first I used malloc, because I split the flow field into n parts, where n is the number of processes that are created, and I wanted the same code to work for every value of n. That means the size of my arrays is only known at runtime, so I obviously need dynamic memory allocation. So far so good.
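
Roughly, the split works like this (the helper below is only a sketch of the idea; my real code does this inside split_and_send_domain):

/* sketch of a block split: each rank gets about totalPoints/numProcs points,
   so the local array length is only known once numProcs is known at runtime */
void split_domain(int totalPoints, int numProcs, int rank,
                  int *start, int *length)
{
    int base = totalPoints / numProcs;   /* minimum number of points per rank */
    int rest = totalPoints % numProcs;   /* leftover points */

    *length = base + (rank < rest ? 1 : 0);   /* first 'rest' ranks get one extra point */
    *start  = rank * base + (rank < rest ? rank : rest);
}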

But that made the whole program very slow, due to the dynamic allocation. It was even slower than solving the problem in serial.

I altered my program to use array declarations instead and got the expected speed-up. But now my program cannot adapt to different starting conditions (e.g. number of processes, size of the flow field, number of grid points, ...).

Can anybody advise me on the common practice for tackling this dilemma? There are obviously a lot of flow solvers in the world that have awesome performance AND can adapt to their starting conditions.

Thanks a lot!

EDIT: I tried to simplify my code (it's not an MWE, though):

int main(int argc, char** argv)
{
    int rank, numProcs, start, length, left, right;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &numProcs);

    if (rank==0)
    {
        split_and_send_domain(&start, &length, &left, &right, numProcs);                                                                            

        // allocate memory for arrays like velocity, temperature,...
        double x1[length]; // double *x1=(double *)malloc(length*sizeof(double));
        double x2[length]; // double *x2=(double *)malloc(length*sizeof(double));
        ...
        double xn[length]; // double *xn=(double *)malloc(length*sizeof(double));

        // initialize variables like local residual, global residual, iteration step,...
        int res = 1, resGlob = 1, iter = 0, ...;
        int keepOn = 1;

        setupCalculation(start, length, left, right, x1, x2, ...); // initializes the arrays

        MPI_Barrier(MPI_COMM_WORLD);

        while (keepOn){
            iter++;
            pass_boundaries(left, right, length, utilde, rank);
            do_calculation(length, x1, x2, ...);
            calc_own_residual(length, x1, x2, ...);         
            calc_glob_residual(&resGlob, res);

            if (iter>=maxiter || resGlob<1e-8)  keepOn = 0;

            MPI_Bcast(&keepOn, 1, MPI_INT, 0, MPI_COMM_WORLD);
            MPI_Barrier(MPI_COMM_WORLD);
        }

        /* gather results & do some final calculations & output*/
    }
    else
    {
        receive_domain(&start, &length, &left, &right);                                                                         

        // allocate memory for arrays like velocity, temperature,...
        double x1[length]; // double *x1=(double *)malloc(length*sizeof(double));
        double x2[length]; // double *x2=(double *)malloc(length*sizeof(double));
        ...
        double xn[length]; // double *xn=(double *)malloc(length*sizeof(double));

        // initialize variables like residual, iteration step,...
        int res = 1, resGlob = 1;
        int keepOn = 1;

        setupCalculation(start, length, left, right, x1, x2, ...); // initializes the arrays

        MPI_Barrier(MPI_COMM_WORLD);

        while (keepOn){
            pass_boundaries(left, right, length, utilde, rank);
            do_calculation(length, x1, x2, ...);
            calc_own_residual(length, x1, x2, ...);
            calc_glob_residual(&resGlob, res);

            MPI_Bcast(&keepOn, 1, MPI_INT, 0, MPI_COMM_WORLD);
            MPI_Barrier(MPI_COMM_WORLD);
        }
    }

    MPI_Finalize();
}

When using dynamic allocation, the int length is additionally calculated at the beginning; otherwise it is set as a global const variable.
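
For completeness, the malloc variant of the allocation part looks roughly like this per rank (x1 and x2 again stand in for the real field arrays, and the function is just a sketch); note that the allocation itself happens only once, before the iteration loop:

#include <stdlib.h>

/* sketch only: heap allocation done once per rank and freed after the solve */
int run_rank(int length)
{
    double *x1 = malloc(length * sizeof *x1);
    double *x2 = malloc(length * sizeof *x2);
    if (x1 == NULL || x2 == NULL) {
        free(x1);
        free(x2);
        return -1;   /* allocation failed */
    }

    /* setupCalculation(...), iteration loop, gathering of results, ... */

    free(x1);
    free(x2);
    return 0;
}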
