Fastest way to count the number of occurrences of a string

I was wondering what the fastest way is to count the number of occurrences of a string (needle) within another string (haystack). This is how I'm doing it:

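#include <string.h>  /* strtok_r (POSIX), strcmp */

/* Counts whole-word occurrences of word in file.
   Note: strtok_r writes '\0' over delimiters, so file must be writable. */
int findWord(char *file, char *word){
    char *fptr;
    char *current = strtok_r(file, " ,.\n", &fptr);
    int sum = 0;
    while (current != NULL){
        //printf("%s\n", current);
        if (strcmp(current, word) == 0)
            sum += 1;
        current = strtok_r(NULL, " ,.\n", &fptr);
    }
    return sum;
}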

Would it be faster to use a more complex algorithm (Boyer-Moore)? Thanks


Currently, if your program is counting the word "blah" and encounters the token "blahblah", your algorithm counts zero occurrences. If you needed it to count two, you could benefit from a more advanced approach.

If your program does what you want, you are already processing as fast as you can: the algorithm is linear in the number of characters of the longer string (the haystack), so you cannot improve on it asymptotically.

An even more interesting solution would be required to count words that overlap themselves: for example, counting occurrences of "aa" inside the string "aaaa". If you needed to return 3 in this situation, you'd need a considerably more advanced algorithm.


Would it be faster to use a more complex algorithm (Boyer-Moore)?

In your algorithm, the unit of comparison is a word rather than a character. This enables the algorithm to ignore matches that straddle a word boundary, and thus makes it run in O(n) time, where n is the length of file.

I doubt you'd be able to beat that asymptotically.

As far as lowering the multiplicative constant goes, right now your algorithm looks at every character in file twice: once in strtok_r and again in strcmp. You can eliminate that redundancy by rewriting the code to use a pair of pointers and a single for loop (figuring out the details is left as an exercise for the reader :))


Unless your system has a bad implementation of string functions, this should be roughly the fastest:

/* counts possibly overlapping matches; assumes needle is non-empty */
const char *s, *t;
size_t cnt;
for (cnt = 0, s = haystack; (t = strstr(s, needle)) != NULL; s = t + 1, cnt++);

Adjust it a bit (+strlen(needle) rather than +1) if you don't want to count overlapping matches.

Link: http://www.djcxy.com/p/85036.html
