make your compiler work for you
What is Template Metaprogramming? The prefix meta- is usually used to indicate an additional level of abstraction: metadata is information about data. Similarly, metaprogramming is writing programs about programs. The emerging technique of template metaprogramming takes advantage of C++ templates to write programs that manipulate themselves or other programs at compile time.
Template metaprogramming is both a curiosity and a powerful optimisation method. The curiosity is that a standard C++ compiler is a Turing-complete interpreter for a subset of C++. In theory, any computable problem can be solved at compile time without ever executing compiled code; in practice achievable results are limited by inefficiencies of the system and details of the compiler [ 1 ] . Metaprogramming can facilitate increased performance by computing performance-critical results at compile time or using compile-time logic to choose the best algorithm for the job.
I became interested in the subject of template metaprogramming after reading two recently published books: Andrei Alexandrescu's Modern C++ Design [Alexandrescu] , and Czarnecki and Eisenecker's Generative Programming [Czarnecki-] . I highly recommend both these books; they will remind you of the richness of the C++ language.
The basic tools of metaprogramming consist of constructions that should be familiar to most C++ programmers: compile-time constants, typedefs, template specialization, and recursive templates. These are enough to provide both a looping construction (recursive templates) and a conditional (template specialization), which are, in turn, enough for a Turing-complete language.
The mechanics of metaprogramming may require a readjustment from the regular C++ programmer, since the C++ metaprogramming sub-language resembles a dynamically-typed functional language. However, with a good grasp of recursion, a few hints from functional programming languages, and a will to learn, the area can be quickly understood.
Metafunctions
One of the basic metaphors of metaprogramming is what I call template-is-function. Template-is-function is characterized by a template struct with an enum or typedef as a return value. Czarnecki and Eisenecker call such structs "metafunctions" [Czarnecki-] . An example metafunction to compute the square of N is:
template<int n> struct Sqr {enum { RET = n*n }; };
And it is used like this:
std::cout << Sqr<10>::RET << std::endl;
Note that the example above and those that follow use the "enum hack" to define integer constants. This is a workaround for compilers which do not support the standard C++ feature of initialization of static constants inside the class definition, as shown below [ 2 ] :
//alternative implementation template<int n> struct Sqr {static const int RET = n*n; };
Several parts of metafunctions are analogous to those of ordinary functions. Here is a mapping from the concepts of ordinary functions to those of metafunctions:
function -> templated type (struct or
class)
function arguments -> template arguments
function return(s) -> public enums/static
constants/typedefs
There are several things to note here. First, it is often convenient to use structs rather than classes to avoid typing the public: declaration. However, if our metafunction has something to hide (intermediate implementation details), we can use a class and encapsulate those details. Second, it is possible for a metafunction to have multiple return values. I use Czarnecki and Eisenecker's convention of naming a single return value ::RET (for return). Another popular name is ::value , which is used by Andrei Alexandrescu in Modern C++ Design [ Alexandrescu ], as well as the boost libraries [ Boost ]. Regardless of the particular convention, it is often important to have consistently named returns, as we will see.
Here is another metafunction that computes N factorial:
template<int n> struct Factorial {enum {RET = n * Factorial<n-1>::RET }; };
By itself, this would result in infinite recursion, so we add a base case through template specialization:
template<> struct Factorial<0> {enum {RET = 1 }; };
This example demonstrates two central elements of metaprogramming: looping using recursive templates, and conditionals using template specialization. As an exercise, you might write a metafunction to compute the nth member of the Fibonacci series.
So far this should be pretty comfortable territory. Now prepare yourself for a dark journey to the land of the angle brackets as we explore compile-time data structures.
Metaobjects [ 3 ]
To simulate collections and other objects at compile time we will take a hint from a functional language, in this case, Lisp. The basic data structure in Lisp is called a cons , which is more or less the equivalent of std::pair . A cons has two members, car , which is the head, and cdr , which is the tail [ 4 ] . I have followed Czarnecki and Eisenecker by adopting the more descriptive Head and Tail . We can form lists, trees, and other useful data structures by creating cons es which themselves contain cons es. We terminate these data structures with a special empty value, Nil . Our metaprogramming list basics are:
struct Nil { typedef Nil Head; typedef Nil Tail; }; template<class Head_, class Tail_=Nil> struct Cons { typedef Head_ Head; typedef Tail_ Tail; };
To enable us to hold integers in cons es, we'll define:
template <int n> struct Int {enum {RET = n }; };
Now we can represent the list 3, 1, 2, 4 as:
Cons<Int<3>, Cons<Int<1>, Cons<Int<2>, Cons<Int<4>, Nil> > > >
See what I mean about angle brackets?
Simulating cons in C++ metaprogramming demonstrates the metaphor I call "template-is-object." Multiple typedefs or enums in a struct work like the members of a C++ object. Here is the mapping from the concepts of ordinary objects to those of metaobjects:
object -> templated type (struct or
class)
constructor arguments -> template arguments
methods/member variables -> enums/static
constants/typedefs
The template-is-object metaphor works somewhat less well than template-is-function because in metaprogramming there is no way to modify member variables. Rather than true variables, functional programming has symbolic names for values. If a new value is needed, a new name must be introduced [ Czarnecki- ].
There are many interesting metafunctions that could be written to operate on lists made up of conses, e.g.,
//metafunction to compute the length of a list made up of conses template<class List> struct Length {enum {RET = 1 + Length<typename List::Tail>::RET}; }; template<> struct Length<Nil> {enum {RET = 0 }; }; //metafunction to append an object to the end of a list template<class Elt, class List> struct Append1 { typedef Cons<typename List::Head, Append1<Elt, typename List::Tail>::RET> RET; }; template<class Elt> struct Append1<Elt, Nil> { typedef Elt RET; }
A quick glance through an introductory Lisp book shows the functions member , nth , and union as interesting candidates for metafunction-writing exercises [ Graham ]. If nothing else, your excursion into metaprogramming will teach you how to use the typename keyword [ 5 ] .
How Many Queens are Too Many?
Knowing that the best way to learn is by doing, I set out to solve an example problem with metaprogramming, N queens. For those who are not familiar with the problem, the task is to find a placement of N queens on an N x N chessboard so that none of them is attacked by the other queens. This is a generalization of the 8-queens problem, which uses a standard chessboard.
The first task was to develop an ordinary solution to the problem. After some tinkering, I came up with:
//represent a queen by its row and column struct Queen { Queen(int r, int c) : row(r), col(c) { } int row; int col; }; //determine whether newQueen would attack any of queens check for horizontal // (rowdiff==0), vertical (coldiff==0), and diagonal attacks (rowdiff==coldiff) bool hasConflicts(Queen newQueen, std::list<Queen> queens){ for (std::list<Queen>::iterator ii=queens.begin(); ii!=queens.end(); ++ii) { int rowdiff = abs(ii-row - newQueen.row); int coldiff = abs(ii-col - newQueen.col); if (rowdiff==0 || coldiff==0 || rowdiff==coldiff) return true; } return false; } //display a solution (a list of queens) as pairs of coordinates separated by semicolons void printSolution(std::list<Queen> queens){ for(std::list<Queen>::iterator ii=queens.begin(); ii!=queens.end(); ++ii) { std::cout << "(" << ii-row << "," << ii-col << ");"; } std::cout << std::endl; } //place a Queen on the "board" by adding to the list queens //Recursively places the next queen and prints a solution if it exists void placeQueen(int N, int row, int col, std::list<Queen> queens) { //try the next column for additional solutions if (col N-1) placeQueen(N, row, col+1, queens); //if there is no conflict, attempt another queen in the next row if (!hasConflicts(Queen(row, col), queens)) { queens.push_back(Queen(row, col)); //if we have not yet reached a full solution, place another //queen; else, print the solution if (row N-1) placeQueen(N, row+1, 0, queens); else printSolution(queens); } } //reads the problem size from the command line and prints any solutions int main(int argc, char* argv[]) { int N = 8; if (argc > 1) N = atoi(argv[1]); placeQueen(N, 0, 0, std::list<Queen>()); }
This prints the solutions in a not-so-user-friendly format to stdout . A good test is the number of solutions (for 8 queens it should be 92). We can check by piping the output into the Unix "wc" utility, which counts lines when given the -l option.
jwalker@avakkai $a.out 8 | wc -l 92
Looks good. Now to translate it to a metaprogram. The Queen struct becomes:
template<int r, int c> struct Queen { enum { row = r }; enum { col = c }; };
To implement HasConflicts , we will need an abs equivalent:
template<int> n struct Abs {enum { RET = n < 0 ? -n : n }; };
We can replace the for loop in hasConflicts with recursion:
template<class Queen, class QueenList> class HasConflicts { typedef typename QueenList::Head Head; enum { coldiff = Abs<Queen::col - Head::col>::RET }; enum { rowdiff = Abs<Queen::row - Head::row>::RET }; public: enum {RET = coldiff==0 || rowdiff==0 || coldiff == rowdiff ? true : HasConflicts<Queen, typename QueenList::Tail>::RET }; }; template<class Queen> class HasConflicts<Queen, Nil> { public: enum { RET = false }; };
Notice that the implementation details are hidden by making them private .
A compile-time if statement will make implementing placeQueen much easier. Czarnecki and Eisenecker provide this implementation using partial specialization (as well as another which does not require partial specialization) [ Czarnecki- ]:
template<bool condition, class Then, class Else> struct IF {typedef Then RET;}; template<class Then, class Else> struct IF<false, Then, Else> {typedef Else RET; };
As Czarnecki and Eisenecker explain, when using this compile-time if statement, it is important to move as much evaluation outside the template arguments as possible. Otherwise, infinite template recursion could result. That is, instead of:
typedef typename IF<n==0, F<n>::RET, G<n> >::RET::RET RET;
The evaluation of ::RET in the Then or Else parts should be delayed:
typedef typename IF<n==0, F<n>, G<n> >::RET::RET RET;
This explains the need for a consistent naming scheme for metafunction returns. If F<> and G<> used different names for their return values, it would not be possible to move the evaluation outside IF<> . Of course, there will be occasions when it is inconvenient to have the same name for return values. Consider a case where we have received a list in the template parameter OldList and wish to conditionally append a value to it:
//won't compile typedef typename IF<n==0, OldList, Append1<Int<n>, OldList >::RET::RET RET;
Here the problem is that there is no typedef RET in OldList (which is just a cons ). To solve this problem, we can use an extra level of indirection with a "template adapter." In the case of con s, this should simply make the structure available as the typedef RET . We will name the metafunction Identity , after the Lisp function which performs the same operation.
template<class Obj> struct Identity {typedef Obj RET; };
Now the example above could correctly be written as:
typedef typename IF<n==0, Identity<OldList>, Append1<Int<n>, OldList >::RET::RET RET;
Now let's get back to N Queens. Rather than immediately printing the solutions, we will accumulate them in a list which is passed in as a template argument of PlaceQueen<> and out as the typedef RET:
// translation of dynamic placeQueen function. SolList is passed the list of existing // solutions since we cannot print them immediately when found template<int N, int col=0, int row=0, class QueenList=Nil, class SolList=Nil> class PlaceQueen { typedef Cons<Queen<row, col>, QueenList> NewQueenList; //we add any solutions found by placing in the next column as the temporary TempSol typedef typename IF<col < N - 1, PlaceQueen<N, col+1, row, QueenList, SolList>, Identity<SolList> >::RET::RET TempSol; public: typedef typename IF< HasConflicts<Queen<row, col>, QueenList>::RET, Identity<Identity<TempSol> >, IF<row < N - 1, PlaceQueen<N, 0, row+1, NewQueenList, TempSol>, Identity<Cons<NewQueenList, TempSol> > > >::RET::RET::RET RET; };
This is enough to test our algorithm. We can find out the result of the compiler's computation by asking for a nonexistent typedef, e.g., foo , in the result. This will force the compiler (gcc, at least), to output an error message containing the type name:
const int NUMQUEENS=4; int main() { PlaceQueen<NUMQUEENS>::RET::foo bar; }
Compiling, we get:
jwalker@avakkai $ g++ NQueensStatic.cpp -ftemplate-depth-100 -w NQueensStatic.cpp: In function 'int main()': NQueensStatic.cpp:85: 'foo' is not a member of type Cons<Cons<Queen<3,2>,Cons<Queen<2,0>,Cons<Queen<1,3>,Cons<Queen<0,1>,Nil> > > >, Cons<Cons<Queen<3,1>,Cons<Queen<2,3>,Cons<Queen<1,0>,Cons<Queen<0,2>,Nil> > > >,Nil> > ' NQueensStatic.cpp:85: parse error before ';'
The error message contains our solutions!
This is interesting, but not incredibly useful. What we would like to have is a way to use the information from compile-time computations at runtime. We can now use what Czarnecki and Eisenecker call "execute-loops" to statically generate code. This Foreach loop was inspired by the EWHILE , EDO , and EFOR examples in Generative Programming [ Czarnecki- ].
//compare two types for equality template<class lhs, class rhs> struct EqualType {enum {RET = false }; }; template<class lhs> struct EqualTypelhs, lhs {enum { RET = true }; }; struct EmptyStatement { static void exec() { } }; //loop over a cons-based list, executing Statement::exec() for each list member template<class Statement> struct Foreach { static void exec() { Statement::exec(); IF<EqualType<typename Statement::NextList, Nil>::RET, EmptyStatement, Foreach<typename Statement::Next> >::RET::exec(); } };
We combine this code with a custom statement to loop over each solution, and then over each queen within the solution in turn.
//statement for Foreach which prints a list of queens in // the same format as the dynamic solution template<class QueenList> struct PrintQueens { typedef typename QueenList::Head Queen; static void exec() { std::cout << "(" << Queen::row << "," << Queen::col << ");"; } typedef typename QueenList::Tail NextList; typedef PrintQueens<NextList> Next; }; //statement for Foreach which prints a list of solutions one per line template<class SolutionList> struct PrintSolutions { typedef typename SolutionList::Head Solution; static void exec() { Foreach<PrintQueens<Solution> >::exec(); std::cout << std::endl; } typedef typename SolutionList::Tail NextList; typedef PrintSolutions<NextList> Next; }; const int NUMQUEENS=4; int main() { typedef PlaceQueen<NUMQUEENS>::RET Solutions; Foreach<PrintSolutions<Solutions> >::exec(); }
Let's try it:
jwalker@avakkai $ g++ NQueensStatic.cpp -ftemplate-depth-100 -w jwalker@avakkai $ a.out (3,2);(2,0);(1,3);(0,1); (3,1);(2,3);(1,0);(0,2);
That's what we wanted. We have computed the solution at compile-time and printed it at runtime. A similar technique could be used to convert the compile-time solution into a run-time data structure for further analysis.
Now for my dirty secret. As you may have guessed based on the cautious NUMQUEENS=4 statement above, compile-time computation is not terribly efficient. Here is a comparison of compile and execution times for the static and dynamic solutions, varying the number of queens from 4 to 8 (tests were run using gcc 2.95.3 on a Sun Enterprise 420R with quad 450Mhz processors and 2GB of RAM).3
4 | 5 | 6 | 7 | 8 | |
Static Metaprogram | |||||
compile time | 1.067s | 4.655s | 31.76s | 696.8s | N/A |
compiler peak VM usage | 6.6MB | 22.2MB | 58.2MB | 982MB | N/A |
execution time | 0.009s | 0.008s | 0.009s | 0.010s | N/A |
Dynamic Program | |||||
compile time | 1.019s | 1.019s | 1.019s | 1.019s | 1.019s |
execution time | 0.009s | 0.012s | 0.020s | 0.059s | 0.246s |
In fact, I cannot even compile the 8-queens solution because the machine runs out of virtual memory after exhausting all 2GB6! While it is somewhat depressing that this canonical instance of N Queens cannot be solved, I am quite satisfied by the wealth of knowledge I learned constructing this example. As a similar exercise, it might be fun to solve the Towers of Hanoi with static metaprogramming.
Luckily, there are many useful techniques that can be performed without taxing the compiler so heavily as this toy example. Real-world uses of metaprogramming include customisation using template traits (see Andrei Alexandrescu's Loki library [ Loki ] for excellent examples of traits) and high-performance numeric computing with expression templates (for which Blitz++ [ Blitz ] is gaining fame).
References
[Alexandrescu] Andrei Alexandrescu: Modern C++ Design: Generic Programming and Design Patterns Applied, Addison-Wesley, 2001, ISBN 0-201-70431-5
[Czarnecki-] Krzysztof Czarnecki and Ulrich W. Eisenecker: Generative Programming: Methods, Tools, and Applications, Addison-Wesley, 2000, ISBN 0-201-30977-7
[Boost] "Boost C++ Libraries", http://www.boost.org
[Graham] Paul Graham: ANSI Common Lisp, Prentice Hall, 1996, ISBN 0-13-370875-6
[Williams] Anthony Williams: "Introduction to C++ Templates", Overload 45, October 2001
[Loki] http://www.awl.com/cseng/titles/0-201-70431-5/loki.zip
[Blitz] "Blitz++ Home Page", http://www.oonumerics.org/blitz/
[ 1 ] Although in practice, the implementation-dependent maximum depth for template instantiations means that this Turing machine may have a very short tape. The C++ Standard only requires a maximum depth of 17; anything else is an extension. I used gcc 2.95.3 to compile the examples in this article. Gcc has a -ftemplate-depth-xxx switch allowing the user to explicitly set the maximum depth for template instantiations.
[ 2 ] Strictly speaking, the C++ standard requires that a definition be provided for a static constant member if it is used in the program. However, the intent (and probable result using existing compilers) seems to be that the definition should only be required if the static constant is used in a way requiring storage in memory, e.g., taking the address. Additionally, curiously enough, the enum version compiles over twice as fast as the static const int version on gcc 2.95.3.
[ 3 ] Metaobjects is probably a misuse of the meta prefix. More accurately, these are metaprogramming objects.
[ 4 ] "The names car and cdr derive from the internal representation of lists in the first Lisp implementation: car stood for 'contents of the address part of the register' and cdr stood for 'contents of the decrement part of the register'" [ Graham ].
[ 5 ] The "typename" keyword must be used to qualify Dependent Names. Anthony Williams has written an informative article on C++ templates which unfortunately arrived in my mailbox too late to help me with these details [ Williams ].