Is cuda suitable for array filtering
I am working on optimizing a program with a time complexity of O(N^7). I have an array of strings, represented as 32 bit integers, where each bit corresponds to a specific character in the input string. The job is to find all combinations of input strings, where each character is present exactly once, and having all characters present. The naive solutions require 7 layers of recursion, with each layer iterating over the entire list. This pretty soon becomes extremely slow.
So I was wondering if I could use cuda to speed up the process a bit, by feeding the GPU with an array of possible strings, and a bitmask that should not be matched, and get a filtered list back, so I could speed up the recursive steps a bit.
So the question is: Is this kind of filtering suitable for parallel processing?
What I am doing in C right now is described below.
void recursive_search (unsigned int used, unsigned int *list, int listlen,
int start,unsigned int * stack, int reclevel) {
int index, newindex;
newindex = 0;
for (index=0; index< listlen; index++) {
if (!list[index] & used) {
newlist[newindex++] = list[index];
}
}
if ((newindex == 1 && (used | newlist[0])) == 0xffffffff) {
/* Hooray! We have a match */
stack[reclevel] = newlist[0];
report_match(stack);
return;
}
for (index = 0;index < newindex; index++) {
recursive_search (used | newlist[index], newlist, newindex,
index, stack, reclevel + 1);
}
}
I hope this make my question clearer.
This code section below may be converted to a copy_if statement.
for (index=0; index< listlen; index++) {
if (!list[index] & used) {
newlist[newindex++] = list[index];
}
}
And the statement is like
thrust::copy_if(list.begin(), list.end(), newlist.begin(), predicate());
So you will achieve your new list easily.
You can generate the array of possible strings on the GPU?
About the recursion:
上一篇: Cuda图像平均过滤器
下一篇: cuda适合阵列过滤