What is the best algorithm for this array

What is the most efficient for speed algorithm to solve the following problem?

Given 6 arrays, D1,D2,D3,D4,D5 and D6 each containing 6 numbers like:

D1[0] = number              D2[0] = number      ......       D6[0] = number
D1[1] = another number      D2[1] = another number           ....
.....                       ....                ......       ....
D1[5] = yet another number  ....                ......       ....

Given a second array ST1, containing 1 number:

ST1[0] = 6

Given a third array ans, containing 6 numbers:

ans[0] = 3, ans[1] = 4, ans[2] = 5, ......ans[5] = 8

Using as index for the arrays D1,D2,D3,D4,D5 and D6, the number that goes from 0, to the number stored in ST1[0] minus one, in this example 6, so from 0 to 6-1, compare the ans array against each D array. The result should be 0 if one or more ans numbers are not found in any D at the same index and should be 1 if all ans numbers are found in some D at the same index. That is, return 0 if some ans[i] doesn't equal any DN[i] and return 1 if every ans[i] equals some DN[i].

My algorithm so far is:
I tried to keep everything unlooped as much as possible.

EML  := ST1[0]   //number contained in ST1[0]   
EML1 := 0        //start index for the arrays D 

While EML1 < EML
   if D1[ELM1] = ans[0] 
     goto two
   if D2[ELM1] = ans[0] 
     goto two
   if D3[ELM1] = ans[0] 
     goto two
   if D4[ELM1] = ans[0] 
     goto two
   if D5[ELM1] = ans[0] 
     goto two
   if D6[ELM1] = ans[0] 
     goto two

   ELM1 = ELM1 + 1

return 0     //If the ans[0] number is not found in either D1[0-6], D2[0-6].... D6[0-6] return 0 which will then exclude ans[0-6] numbers


two:

EML1 := 0      start index for arrays Ds 
While EML1 < EML
   if D1[ELM1] = ans[1] 
     goto three
   if D2[ELM1] = ans[1] 
     goto three
   if D3[ELM1] = ans[1] 
     goto three
   if D4[ELM1] = ans[1] 
     goto three
   if D5[ELM1] = ans[1] 
     goto three
   if D6[ELM1] = ans[1] 
     goto three
   ELM1 = ELM1 + 1

return 0    //If the ans[1] number is not found in either D1[0-6], D2[0-6]....  D6[0-6]  return 0 which will then exclude ans[0-6] numbers

three:

EML1 := 0      start index for arrays Ds 

While EML1 < EML
   if D1[ELM1] = ans[2] 
     goto four
   if D2[ELM1] = ans[2] 
     goto four
   if D3[ELM1] = ans[2] 
     goto four
   if D4[ELM1] = ans[2] 
     goto four
   if D5[ELM1] = ans[2] 
     goto four
   if D6[ELM1] = ans[2] 
     goto four
   ELM1 = ELM1 + 1

return 0   //If the ans[2] number is not found in either D1[0-6], D2[0-6]....  D6[0-6]  return 0 which will then exclude ans[0-6] numbers

four:

EML1 := 0      start index for arrays Ds 

While EML1 < EML
   if D1[ELM1] = ans[3] 
     goto five
   if D2[ELM1] = ans[3] 
     goto five
   if D3[ELM1] = ans[3] 
     goto five
   if D4[ELM1] = ans[3] 
     goto five
   if D5[ELM1] = ans[3] 
     goto five
   if D6[ELM1] = ans[3] 
     goto five
   ELM1 = ELM1 + 1

return 0 //If the ans[3] number is not found in either D1[0-6], D2[0-6]....  D6[0-6]  return 0 which will then exclude ans[0-6] numbers


five:

EML1 := 0      start index for arrays Ds 

While EML1 < EML
   if D1[ELM1] = ans[4] 
     goto six
   if D2[ELM1] = ans[4] 
     goto six
   if D3[ELM1] = ans[4] 
     goto six
   if D4[ELM1] = ans[4] 
     goto six
   if D5[ELM1] = ans[4] 
     goto six
   if D6[ELM1] = ans[4] 
     goto six
   ELM1 = ELM1 + 1

return 0  //If the ans[4] number is not found in either D1[0-6], D2[0-6]....  D6[0-6]  return 0 which will then exclude ans[0-6] numbers

six:

EML1 := 0      start index for arrays Ds 

While EML1 < EML
   if D1[ELM1] = ans[5] 
     return 1            ////If the ans[1] number is not found in either D1[0-6].....  
   if D2[ELM1] = ans[5]      return 1 which will then include ans[0-6] numbers
     return 1
   if D3[ELM1] = ans[5] 
     return 1
   if D4[ELM1] = ans[5] 
     return 1
   if D5[ELM1] = ans[5] 
     return 1
   if D6[ELM1] = ans[5] 
     return 1
   ELM1 = ELM1 + 1

return 0 

As language of choice, it would be pure c


I did a direct trivial C implementation of the algorithm provided by the original poster. It is here

As other proposed the first thing to do is to roll up the code. Unrolling is not really good for speed as it lead to code cache misses. I began by rolling inner loops and got this. Then I rolled the outer loop and removed the now useless gotos and got code below.

EDIT : I changed several time the C code because even as simple as it is there seems to be problems when JIT compiling or executing it with CUDA (and CUDA seems not to be very verbose about errors). That is why the piece of code below use globals... and that is just trivial implementation. We are not yet going for speed. It says much about premature optimization. Why bother to make it fast if we can't even make it work ? I guess there is still issues as CUDA seems to impose many restrictions on the code you can make work if I believe the Wikipedia article. Also maybe we should use float instead of int ?

#include <stdio.h>

int D1[6] = {3, 4, 5, 6, 7, 8};
int D2[6] = {3, 4, 5, 6, 7, 8};
int D3[6] = {3, 4, 5, 6, 7, 8};
int D4[6] = {3, 4, 5, 6, 7, 8};
int D5[6] = {3, 4, 5, 6, 7, 8};
int D6[6] = {3, 4, 5, 6, 7, 9};
int ST1[1] = {6};
int ans[6] = {1, 4, 5, 6, 7, 9};
int * D[6] = { D1, D2, D3, D4, D5, D6 };

/* beware D is passed through globals */
int algo(int * ans, int ELM){
    int a, e, p;

    for (a = 0 ; a < 6 ; a++){ 
        for (e = 0 ; e < ELM ; e++){
            for (p = 0 ; p < 6 ; p++){
                if (D[p][e] == ans[a]){
                    goto cont;
                }
            }
        }
        return 0; //bad row of numbers found
    cont:;
    }
    return 1;
}

int main(){
    int res;
    res = algo(ans, ST1[0]);
    printf("algo returned %dn", res);
}

Now that is interesting, because we can understand what code is doing. By the way doing this packing job I corrected several oddities in the original question. I believe it was typos as it wasn't logical at all in the global context. - goto always jump to two (it should have progressed) - the last test checked ans[0] instead of ans[5]

please Mark, correct me if I'm wrong in the above asumptions on what the original code should do and your original algorithm is typo free.

What the code is doing it for each value in ans check it is present in a two dimentional array. If any number miss it returns 0. If all numbers are found it returns 1.

What I would do to get a real fast code is not to implement it in C but in another language like python (or C++) where set is a basic data structure provided by standard libraries. Then I would build a set with all the values of the arrays (that is O(n)) and check if numbers searched are present in set or not (that is O(1)). The final implementation should be faster than the existing code at least from an algorithmic point of view.

Python example is below as it is really trivial (print true/false instead of 1/0 but you get the idea):

ans_set = set(ans)
print len(set(D1+D2+D3+D4+D5+D6).intersection(ans_set)) == len(ans_set)

Here is a possible C++ implementation using sets:

#include <iostream>
#include <set>

int algo(int * D1, int * D2, int * D3, int * D4, int * D5, int * D6, int * ans, int ELM){
    int e, p;
    int * D[6] = { D1, D2, D3, D4, D5, D6 };
    std::set<int> ans_set(ans, ans+6);

    int lg = ans_set.size();

    for (e = 0 ; e < ELM ; e++){
        for (p = 0 ; p < 6 ; p++){
            if (0 == (lg -= ans_set.erase(D[p][e]))){
                // we found all elements of ans_set
                return 1;
            }
        }
    }
    return 0; // some items in ans are missing
}

int main(){
    int D1[6] = {3, 4, 5, 6, 7, 8};
    int D2[6] = {3, 4, 5, 6, 7, 8};
    int D3[6] = {3, 4, 5, 6, 7, 8};
    int D4[6] = {3, 4, 5, 6, 7, 8};
    int D5[6] = {3, 4, 5, 6, 7, 8};
    int D6[6] = {3, 4, 5, 6, 7, 1};

    int ST1[1] = {6};

    int ans[] = {1, 4, 5, 6, 7, 8};

    int res = algo(D1, D2, D3, D4, D5, D6, ans, ST1[0]);
    std::cout << "algo returned " << res << "n";
}

We make some performance hypothesis : the content of ans should be sorted or we should construct it otherwise, we suppose that content of D1..D6 will change between calls to algo. Hence we do not bother constructing a set for it (as set construction is O(n) anyway we wouldn't gain anything if D1..D6 are changing). But if we call several times algo with the same D1..D6 and that is ans that change we should do the opposite and transform D1..D6 into one larger set that we keep available.

If I stick to CI could do it as follow:

  • copy all numbers in D1.. D6 in one unique array (using memcpy for each row)
  • sort content of this array
  • use a dichotomic search to check if number if available
  • As data size are really small here , we could also try going for micro optimizations. It could pay better here. Don't know for sure.

    EDIT2 : there is hard restrictions on the subset of C supported by CUDA. The most restrictive one is that we shouldn't use pointers to main memory. That will have to be taken into account. It explains why the current code does not work. The simplest change is probably to call it for every array D1..D6 in turn. To keep it short and avoid function call cost we may use a macro or an inline function. I will post a proposal.


    I was a little bit confused by your question, but I think I have it enough to help you get started.

    #define ROW 6
    #define COL 6
    
    int D[ROW][COL]; // This is all of your D arrays in one 2 dimensional array.
    

    Next you should probably use nested for loops. Each loop will correspond to a dimension of D . Remember that the order of your indexes matters. The easiest way to keep it straight in C is to remember that D[i] is a valid expression even when D has more than one dimension (and would evaluate to a pointer to a row : a sub array).

    If you can't change the independent D arrays to one multidimensional array you could easily make a pointer array whose members point to the heads of each of those arrays and achieve the same effect.

    Then you can use the break statement to break out of the inner loop after you have determined that the current D[i] doesn't match the ans .


    With only 36 values to compare, the most efficient approach would be to not use CUDA at all.

    Just use a CPU loop.

    If you change your inputs, I'll change my answer.

    链接地址: http://www.djcxy.com/p/12234.html

    上一篇: 为什么处理排序后的数组比未排序的数组更快?

    下一篇: 什么是这个阵列的最佳算法