Making templatized optimization more maintainable

Sometimes a piece of code can be better-optimized by the compiler by using a templatized internal implementation for an invariant. For example, if you have a known number of channels in an image, instead of doing something like:

Image::doOperation() {
    for (unsigned int i = 0; i < numPixels; i++) {
        for (unsigned int j = 0; i j mChannels; j++) {
            // ...
        }
    }
}

you can do this:

template<unsigned int c> Image::doOperationInternal() {
    for (unsigned int i = 0; i < numPixels; i++) {
        for (unsigned int j = 0; j < c; j++) {
            // ...
        }
    }
}

Image::doOperation() {
    switch (mChannels) {
        case 1: doOperation<1>(); break;
        case 2: doOperation<2>(); break;
        case 3: doOperation<3>(); break;
        case 4: doOperation<4>(); break;
    }
}

which allows the compiler to generate different unrolled loops for the different channel counts (which can in turn vastly improve runtime efficiency and also open up different optimizations such as SIMD instructions and so forth).

However, this can often expand into some fairly large case statements, and any method which has been optimized in this way must have the unrolled case statement. So, let's say that instead we had an enum Format for the known image formats (where the value of the enum happens to map to the channel count). Since the enum only has a certain range of known values, there is a temptation to try this:

template<Image::Format f> Image::doOperationInternal() {
    for (unsigned int i = 0; i < numPixels; i++) {
        for (unsigned int j = 0; j < static_cast<unsigned int>(f); j++) {
            // ...
        }
    }
}

Image::doOperation() {
    const Format f = mFormat;
    doOperationInternal<f>();
}

However, in this case the compiler (rightfully) complains that f is not a constant expression, even though it has only a finite range and in theory the compiler could generate the switch logic to cover all of the enumerated values.

So, my question: is there an alternate approach which will allow the compiler to generate invariant-value-optimized code without requiring a switch-case explosion per function invocation?


Make jump table array, then invoke. The goal is to create an array of the various functions, then do an array lookup and call the one you want.

First, I'll do the C++11 one. C++1y contains its own integral sequence types, and has easy to write auto return types: the C++11 one will return void .

Our functor class looks something like this:

struct example_functor {
  template<unsigned N>
  static void action(double d) const {
    std::cout << N << ":" << d << "n"; // or whatever, N is a compile time constant
  }
};

In C++11, we will want some boilerplate:

template<unsigned...> struct indexes {};
template<unsigned Max, unsigned... Is> struct make_indexes:make_indexes< Max-1, Max-1, Is... > {};
template<unsigned... Is> struct make_indexes<0, Is...>:indexes<Is...> {};

to create and pattern match packs of indexes.

The interface then looks like:

template<typename Functor, unsigned Max, typename... Ts>
void invoke_jump( unsigned index, Ts&&... ts );

and is called like:

invoke_jump<example_functor, 10>( 7, 3.14 );

We first create a helper:

template<typename Functor, unsigned... Is, typename... Ts>
void do_invoke_jump( unsigned index, indexes<Is...>, Ts&&... ts ) {
  static auto table[]={ &(Functor::template action<Is>)... };
  table[index]( std::forward<Ts>(ts)... )
}
template<typename Functor, unsigned Max, typename... Ts>
void invoke_jump( unsigned index, Ts&&... ts ) {
  do_invoke_jump( index, make_indexes<Max>(), std::forward<Ts>(ts)... );
}

which creates a static table of Functor::action then does a lookup on them and invokes it.

In C++03 we don't have ... syntax, so we have to do more things manually, and no perfect forwarding. What I'll do is create a std::vector table instead.

First, a cute little program that runs Functor.action<I>() for I in [Begin, End) in order:

template<unsigned Begin, unsigned End, typename Functor>
struct ForEach:ForEach<Begin, End-1, Functor> {
  ForEach(Functor& functor):
    ForEach<Begin, End-1, Functor>(functor)
  {
    functor->template action<End-1>();
  }
};
template<unsigned Begin, typename Functor>
struct ForEach<Begin,Begin,Functor> {};

which I will admit is overly cute (the chain is implicitly created by the constructor dependencies).

We then use this to build a vector up.

template<typename Signature, typename Functor>
struct PopulateVector {
  std::vector< Signature* >* target; // change the signature here to whatever you want
  PopulateVector(std::vector< Signature* >* t):target(t) {}
  template<unsigned I>
  void action() {
    target->push_back( &(Functor::template action<I>) );
  }
};

We can then hook the two up:

template<typename Signature, typename Functor, unsigned Max>
std::vector< Signature* > make_table() {
  std::vector< Signature* > retval;
  retval.reserve(Max);
  PopulateVector<Signature, Functor> worker(&retval);
  ForEach<0, Max>( worker ); // runtime work basically done on this line
  return retval;
}

which builds our jump table as a std::vector .

We can then call the Ith element of the jump table easily.

struct example_functor {
  template<unsigned I>
  static void action() {
    std::cout << I << "n";
  }
};
void test( unsigned i ) {
  static std::vector< void(*)() > table = make_table< void(), example_functor, 100 >();
  if (i < 100)
    table[i]();
}

which when passed the integer i prints it and then a newline.

The signature of the function in the table can be whatever you want, so you can pass in a pointer to a type and invoke a method, with I being a compile-time constant. The action method does have to be static , but it can call non- static based methods of its arguments.

The big differences in C++03 is that you need different code for different signatures of the jump table, a lot of machinery (and a std::vector instead of a static array) to build the jump table.

When doing serious image processing, you'll want to have scanline functions generated this way, with per-pixel operations possibly embedded in it somewhere in the generated scanline function. Doing a jump-dispatch once per scanline is usually fast enough, unless your images are 1 pixel wide and a billion pixels tall.

The above code still needs auditing for correctness: it was written without being compiled.


Yakk的C ++ 11 / 1y技术非常棒,但如果C ++ 03版本对你来说有点太过模板欺骗,那么至少会避免复制和粘贴switch语句,并且只给你一个简单/不太优雅的版本switch语句保持:

#include<iostream>

using namespace std;

struct Foo {
    template<unsigned int c>
    static void Action() {
        std::cout << "c: " << c << endl;
    }
};

template<typename F>
void Dispatch(unsigned int c) {
    switch (c) {
    case 1: F::Action<1>(); break;
    case 2: F::Action<2>(); break;
    case 3: F::Action<3>(); break;
    }
}

int main() {
    for (int i = 0; i < 4; ++i)
        Dispatch<Foo>(i);
}

Just for the sake of completeness, here's my (temporary) solution that I went with in the meantime:

#define DISPATCH_TEMPLATE_CALL(func, args) do { 
    switch (mChannels) { 
    case 1: func<1> args; break; 
    case 2: func<2> args; break; 
    case 3: func<3> args; break; 
    case 4: func<4> args; break; 
    default: throw std::range_error("Unhandled format"); 
    } 
} while (0)

template<unsigned int n> void Image::doSomethingInternal(a, b, c) {
    // ...
}

void Image::doSomething(a, b, c) {
    DISPATCH_TEMPLATE_CALL(doSomethingInternal, (a, b, c));
}

which is obviously not a preferable approach. But it works.

链接地址: http://www.djcxy.com/p/88660.html

上一篇: Java切换枚举不会检测到所有情况都被覆盖

下一篇: 使模板化优化更易于维护