How could I speed up comparison of std::string against string literals?

I have a bunch of code where objects of type std::string are compared for equality against string literals. Something like this:

//const std:string someString = //blahblahblah;
if( someString == "(" ) {
   //do something
} else if( someString == ")" ) {
   //do something else
} else if// this chain can be very long

The comparison time accumulates to a serious amount (yes, I profiled) and so it'd be nice to speed it up.

The code compares the string against numerous short string literals and this comparison can hardly be avoided. Leaving the string declared as std::string is most likely inevitable - there're thousands lines of code like that. Leaving string literals and comparison with == is also likely inevitable - rewriting the whole code would be a pain.

The problem is the STL implementation that comes with Visual C++11 uses somewhat strange approach. == is mapped onto std::operator==(const basic_string&, const char*) which calls basic_string::compare( const char* ) which in turn calls std::char_traits<char>( const char* ) which calls strlen() to compute the length of the string literal. Then the comparison runs for the two strings and lengths of both strings are passed into that comparison.

The compiler has a hard time analyzing all this and emits code that traverses the string literal twice. With short literals that's not much time but every comparison involves traversing the literal twice instead of once. Simply calling strcmp() would most likely be faster.

Is there anything I could do like perhaps writing a custom comparator class that would help avoid traversing the string literals twice in this scenario?


Similar to Dietmar's solution, but with slightly less editing: you can wrap the string (once) instead of each literal

#include <string>
#include <cstring>
struct FastLiteralWrapper {
    std::string const &s;

    explicit FastLiteralWrapper(std::string const &s_) : s(s_) {}

    template <std::size_t ArrayLength>
    bool operator== (char const (&other)[ArrayLength]) {
        std::size_t const StringLength = ArrayLength - 1;
        return StringLength == s.size()
            && std::memcmp(s.data(), other, StringLength) == 0;
    }
};

and your code becomes:

const std:string someStdString = "blahblahblah";
// just for the context of the comparison:
FastLiteralWrapper someString(someStdString);
if( someString == "(" ) {
   //do something
} else if( someString == ")" ) {
   //do something else
} else if// this chain can be very long

NB. the fastest solution - at the cost of more editing - is probably to build a (perfect) hash or trie mapping string literals to enumerated constants, and then just switch on the looked-up value. Long if / else if chains usually smell bad IMO.


Well, aside from C++14's string_literal , you could easily code up a solution:

  • For comparison with a single character, use a character literal and:

    bool operator==(const std::string& s, char c)
    {
      return s.size() == 1 && s[0] == c;
    }
    
  • For comparison with a string literal, you can use something like this:

    template<std::size_t N>
    bool operator==(const std::string& s, char const (&literal)[N])
    {
      return s.size() == N && std::memcmp(s.data(), literal, N-1) == 0;
    }
    
  • Disclaimer:

  • The first might even be superfluous,
  • Only do this if you measure an improvement over what you had.

  • If you have long chain of string literals to compare to there is likely some potential to deal with comparing prefixes to group common processing. Especially when comparing a known set of strings for equality with an input string, there is also the option to use a perfect hash and key the operations off an integer produced by those.

    Since the use of a perfect hash will probably have the best performance but also requires major changes of the code layout, an alternative could be to determine the size of the string literals at compile time and use this size while comparing. For example:

    class Literal {
        char const* d_base;
        std::size_t d_length;
    public:
        template <std::size_t Length>
        Literal(char const (&base)[Length]): d_base(base), d_length(Length - 1) {}
        bool operator== (std::string const& other) const {
            return other.size() == this->d_length
                && !other.memcmp(this->d_base, other.c_str(), this->d_length);
        }
        bool operator!=(std::string const& other) const { return !(*this == other); }
    };
    bool operator== (std::string const& str, Literal const& literal) {
        return literal == str;
    }
    bool operator!= (std::string const& str, Literal const& literal) {
        return !(str == literal);
    }
    

    Obviously, this assumes that your literals don't embed null characters ('') other than the implicitly added terminating null character as the static length would otherwise be distorted. Using C++11 constexpr it would be possible to guard against that possibility but the code gets somewhat more complicated without any good reason. You'd then compare your strings using something like

    if (someString == Literal("(")) {
        ...
    }
    else if (someString == Literal(")")) {
        ...
    }
    
    链接地址: http://www.djcxy.com/p/72128.html

    上一篇: 当你不遵循argv和argc的做法会发生什么

    下一篇: 我怎么能加快比较std :: string与字符串文字?