什么是这个阵列的最佳算法

2018-06-03 15:05:36

速度算法解决以下问题的最有效方法是什么？

给定6个阵列，D1，D2，D3，D4，D5和D6，每个包含6个数字，如：

D1[0] = number              D2[0] = number      ......       D6[0] = number
D1[1] = another number      D2[1] = another number           ....
.....                       ....                ......       ....
D1[5] = yet another number  ....                ......       ....

给定第二个数组ST1，包含1个数字：

ST1[0] = 6

给定第三个数组ans，包含6个数字：

ans[0] = 3, ans[1] = 4, ans[2] = 5, ......ans[5] = 8

使用数组D1，D2，D3，D4，D5和D6作为索引，从0到存储在ST1 [0]中的数字减1，在这个例子6中，从0到6-1，将ans数组与每个D数组进行比较。如果一个或多个ans号码在同一索引中的任何D中都没有找到，则结果应该为0，如果在同一个索引处的某个D中找到所有ans号码，则结果应该是1。也就是说，如果某个ans [i]不等于任何DN [i]，则返回0，如果每个ans [i]等于某个DN [i]，则返回1。

我的算法到目前为止是：
我试图尽可能地避免所有的事情。

EML  := ST1[0]   //number contained in ST1[0]   
EML1 := 0        //start index for the arrays D 

While EML1 < EML
   if D1[ELM1] = ans[0] 
     goto two
   if D2[ELM1] = ans[0] 
     goto two
   if D3[ELM1] = ans[0] 
     goto two
   if D4[ELM1] = ans[0] 
     goto two
   if D5[ELM1] = ans[0] 
     goto two
   if D6[ELM1] = ans[0] 
     goto two

   ELM1 = ELM1 + 1

return 0     //If the ans[0] number is not found in either D1[0-6], D2[0-6].... D6[0-6] return 0 which will then exclude ans[0-6] numbers


two:

EML1 := 0      start index for arrays Ds 
While EML1 < EML
   if D1[ELM1] = ans[1] 
     goto three
   if D2[ELM1] = ans[1] 
     goto three
   if D3[ELM1] = ans[1] 
     goto three
   if D4[ELM1] = ans[1] 
     goto three
   if D5[ELM1] = ans[1] 
     goto three
   if D6[ELM1] = ans[1] 
     goto three
   ELM1 = ELM1 + 1

return 0    //If the ans[1] number is not found in either D1[0-6], D2[0-6]....  D6[0-6]  return 0 which will then exclude ans[0-6] numbers

three:

EML1 := 0      start index for arrays Ds 

While EML1 < EML
   if D1[ELM1] = ans[2] 
     goto four
   if D2[ELM1] = ans[2] 
     goto four
   if D3[ELM1] = ans[2] 
     goto four
   if D4[ELM1] = ans[2] 
     goto four
   if D5[ELM1] = ans[2] 
     goto four
   if D6[ELM1] = ans[2] 
     goto four
   ELM1 = ELM1 + 1

return 0   //If the ans[2] number is not found in either D1[0-6], D2[0-6]....  D6[0-6]  return 0 which will then exclude ans[0-6] numbers

four:

EML1 := 0      start index for arrays Ds 

While EML1 < EML
   if D1[ELM1] = ans[3] 
     goto five
   if D2[ELM1] = ans[3] 
     goto five
   if D3[ELM1] = ans[3] 
     goto five
   if D4[ELM1] = ans[3] 
     goto five
   if D5[ELM1] = ans[3] 
     goto five
   if D6[ELM1] = ans[3] 
     goto five
   ELM1 = ELM1 + 1

return 0 //If the ans[3] number is not found in either D1[0-6], D2[0-6]....  D6[0-6]  return 0 which will then exclude ans[0-6] numbers


five:

EML1 := 0      start index for arrays Ds 

While EML1 < EML
   if D1[ELM1] = ans[4] 
     goto six
   if D2[ELM1] = ans[4] 
     goto six
   if D3[ELM1] = ans[4] 
     goto six
   if D4[ELM1] = ans[4] 
     goto six
   if D5[ELM1] = ans[4] 
     goto six
   if D6[ELM1] = ans[4] 
     goto six
   ELM1 = ELM1 + 1

return 0  //If the ans[4] number is not found in either D1[0-6], D2[0-6]....  D6[0-6]  return 0 which will then exclude ans[0-6] numbers

six:

EML1 := 0      start index for arrays Ds 

While EML1 < EML
   if D1[ELM1] = ans[5] 
     return 1            ////If the ans[1] number is not found in either D1[0-6].....  
   if D2[ELM1] = ans[5]      return 1 which will then include ans[0-6] numbers
     return 1
   if D3[ELM1] = ans[5] 
     return 1
   if D4[ELM1] = ans[5] 
     return 1
   if D5[ELM1] = ans[5] 
     return 1
   if D6[ELM1] = ans[5] 
     return 1
   ELM1 = ELM1 + 1

return 0

作为选择的语言，它将是纯粹的c

我对原始海报提供的算法做了一个直接简单的C实现。是这里

正如其他人所建议的，首先要做的就是汇总代码。展开对速度不太好，因为它导致代码缓存未命中。我开始通过滚动内部循环，并得到这个。然后，我滚动外部循环，并删除了现在无用的gotos，并获得了下面的代码。

编辑：我改变了几次C代码，因为即使这样简单，当JIT编译或使用CUDA执行它（和CUDA似乎不是很详细的错误）似乎有问题。这就是为什么下面的代码段使用全局变量...而这只是简单的实现。我们还没有追求速度。它说很多关于过早优化。如果我们甚至无法完成工作，为什么还要加快速度？我想还是有问题，因为如果我相信维基百科的文章，CUDA似乎会对您可以开展工作的代码施加很多限制。也许我们应该使用float而不是int？

#include <stdio.h>

int D1[6] = {3, 4, 5, 6, 7, 8};
int D2[6] = {3, 4, 5, 6, 7, 8};
int D3[6] = {3, 4, 5, 6, 7, 8};
int D4[6] = {3, 4, 5, 6, 7, 8};
int D5[6] = {3, 4, 5, 6, 7, 8};
int D6[6] = {3, 4, 5, 6, 7, 9};
int ST1[1] = {6};
int ans[6] = {1, 4, 5, 6, 7, 9};
int * D[6] = { D1, D2, D3, D4, D5, D6 };

/* beware D is passed through globals */
int algo(int * ans, int ELM){
    int a, e, p;

    for (a = 0 ; a < 6 ; a++){ 
        for (e = 0 ; e < ELM ; e++){
            for (p = 0 ; p < 6 ; p++){
                if (D[p][e] == ans[a]){
                    goto cont;
                }
            }
        }
        return 0; //bad row of numbers found
    cont:;
    }
    return 1;
}

int main(){
    int res;
    res = algo(ans, ST1[0]);
    printf("algo returned %dn", res);
}

现在这很有趣，因为我们可以理解代码在做什么。顺便做这个包装工作，我纠正了原始问题中的几个古怪问题。我认为这是错字，因为在全球范围内它根本不合逻辑。 - 转到总是跳到两个（它应该有进步） - 最后一次测试检查ans [0]而不是ans [5]

请马克，纠正我，如果我错了上面的假设原代码应该做什么，你的原始算法是无字体错误。

对于每个值，代码是在做什么检查它是存在于一个二维数组中。如果任何数字未命中，则返回0.如果找到所有数字，则返回1。

我想要得到一个真正快速的代码不是用C语言实现它，而是用python（或C ++）这样的其他语言实现它，其中set是标准库提供的基本数据结构。然后，我将建立一个包含所有数组值（即O（n））的集合，并检查搜索的数字是否存在于集合中（即O（1））。至少从算法的角度来看，最终的实现应该比现有的代码更快。

Python示例如下，因为它非常简单（打印真/假而不是1/0，但您明白了）：

ans_set = set(ans)
print len(set(D1+D2+D3+D4+D5+D6).intersection(ans_set)) == len(ans_set)

这是一个使用集合的可能的C ++实现：

#include <iostream>
#include <set>

int algo(int * D1, int * D2, int * D3, int * D4, int * D5, int * D6, int * ans, int ELM){
    int e, p;
    int * D[6] = { D1, D2, D3, D4, D5, D6 };
    std::set<int> ans_set(ans, ans+6);

    int lg = ans_set.size();

    for (e = 0 ; e < ELM ; e++){
        for (p = 0 ; p < 6 ; p++){
            if (0 == (lg -= ans_set.erase(D[p][e]))){
                // we found all elements of ans_set
                return 1;
            }
        }
    }
    return 0; // some items in ans are missing
}

int main(){
    int D1[6] = {3, 4, 5, 6, 7, 8};
    int D2[6] = {3, 4, 5, 6, 7, 8};
    int D3[6] = {3, 4, 5, 6, 7, 8};
    int D4[6] = {3, 4, 5, 6, 7, 8};
    int D5[6] = {3, 4, 5, 6, 7, 8};
    int D6[6] = {3, 4, 5, 6, 7, 1};

    int ST1[1] = {6};

    int ans[] = {1, 4, 5, 6, 7, 8};

    int res = algo(D1, D2, D3, D4, D5, D6, ans, ST1[0]);
    std::cout << "algo returned " << res << "n";
}

我们做出一些性能假设：ans的内容应该被排序或者我们应该构造它，否则我们假设D1..D6的内容将在对算法的调用之间改变。因此，我们不打算为它构造一个集合（因为集合构造是O（n），无论如何，如果D1..D6正在改变，我们不会获得任何东西）。但是如果我们使用相同的D1..D6多次调用算法，并且这是变化，我们应该做相反的事情，并将D1..D6转换为我们保持可用的更大集合。

如果我坚持CI可以做到如下：

将D1 .. D6中的所有数字复制到一个唯一数组中（每行使用memcpy）

对该数组的内容进行排序

使用二分搜索来检查数字是否可用

由于这里的数据量非常小，我们也可以尝试进行微观优化。它可以在这里支付更好。不确定。

EDIT2 ：对CUDA支持的C子集有严格的限制。最具限制性的是我们不应该使用指向主内存的指针。这必须考虑在内。它解释了为什么当前的代码不起作用。最简单的改变可能是依次调用每个数组D1..D6。为了保持简短并避免函数调用成本，我们可以使用宏或内联函数。我会发布一个提案。

我对你的问题有点困惑，但我觉得我已经足够帮你开始了。

#define ROW 6
#define COL 6

int D[ROW][COL]; // This is all of your D arrays in one 2 dimensional array.

接下来你应该使用嵌套for循环。每个循环将对应于D的维度。请记住，索引的顺序很重要。在C中保持直线的最简单方法是记住D[i]是一个有效的表达式，即使D具有多个维度（并且将评估为指向行的指针：子数组）。

如果不能将独立的D数组更改为一个多维数组，您可以轻松地创建一个指针数组，其成员指向每个数组的头并达到相同的效果。

然后，您可以在确定当前D[i]与ans不匹配后，使用break语句突破内部循环。

只有36个值进行比较，最有效的方法是根本不使用CUDA。

只需使用一个CPU循环。

如果你改变你的输入，我会改变我的答案。

链接地址: http://www.djcxy.com/p/12233.html

上一篇: What is the best algorithm for this array

下一篇: Get list from pandas DataFrame column headers