Giving STL Iterators a Base Class Some little while ago, I posted a question on the accu-progquestions list regarding virtual template functions. The short answer I already knew – “it can’t be done.” The slightly longer answer was provided by Kevlin Henney – “it can be done if you know how.”
A template function cannot be made virtual because, as Kevlin put it, The dynamic binding you want ... is more significant than the polymorphism required to handle any type . The virtual function table for the class would contain the address of the function normally, but a template function may never be instantiated, or may be instantiated multiple times. What’s a poor compiler to do? Spit it out with a diagnostic, probably (which is exactly what mine did).
I thought that this could be overcome in a simple and idiomatic manner. I wasn’t far off the mark; the solution provided by Kevlin is, in his words, a collaboration of idioms. I’ll try to identify them as we go along.
The Good, the Bad and the Ugly
Consider a class to write to a file. The format of a given file is probably fixed (few files contain both fixed width and comma-separated values), but a more general solution will provide for several formats. Even better, a good solution will also provide for extensions; today you may want only CSV and fixed width, but tomorrow you may also require an ODBC data source. An obvious way to do this is provide an interface class, with pure virtual functions defining the interface to be implemented in derived classes. Each derived class has knowledge of the layout and delimiter requirements of the target.
class file_writer
{
public:
// pure virtual functions define the
// interface
};
class csv_writer : public file_writer
{
// implement virtual functions according
// to CSV format
};
class fixed_w_writer : public file_writer
{
// implement virtual functions according to
// fixed width format
};
The original requirement was for something along these lines. A record writer function would take a range of values to be written as a single record:
class file_writer
{
public:
// ...
template<typename Iterator>
virtual void write(Iterator begin, Iterator end) = 0;
};
As already mentioned, this can’t be done directly. Three possibilities immediately present themselves:
virtual void write(
std::vector<std::string>::const_iterator begin,
std::vector<std::string>::const_iterator end );
virtual void write(
std::vector<std::string>container);
Neither of these is very pleasing; both tie you, as the client, into a specific container and contained type. The first has advantages over the second, in that you, the client, can pass only a partial range if you wish. The
write()
function itself doesn’t have to change. Both suffer from having to specify the particular iterator to be used. For example, the function would need to be overloaded for a reverse iterator.
The third option is to pass each value in the container in turn to the
write()
function. This has the improvement over the previous solutions in that the container type is no longer an issue, but has other limitations, a burden on the client not least among them.
virtual void file_writer::write(const std::string & value) = 0;
// ...
std::vector<std::vector<std::string> > values;
file_writer * csv_file = new csv_file_writer (file_name);
// ...
for (std::vector<std::vector<std::string> >::iterator record = values.begin();
record != values.end(); ++record)
{
std::vector<std::string>::iterator i = record->begin();
while (i != record->end()
{
csv_file->write (*i);
++i;
if (i != record->end())
{
csv_file->write (",");
}
}
csv_file->write ("\n");
}
This code would probably do the job, but is prone to error, slightly obfuscated (which could be helped with judicious use of using declarations), and the client is required to know the format of the output file – defeating the original object.
The clue to the solution is in the requirement. Ideally, we want to be able to write:
virtual void write (any_iterator begin, any_iterator end);
So, a good starting point is a class called
any_iterator
.
The Type With No Name
The reason for writing a function which operates on an iterator range as a template function, is so that it’ll work with any iterator (provided the function itself uses a conformant interface). Given that fact, we need to be able to create an
any_iterator
from, well, any iterator. In this case, it is sufficient to be able to create one from any
Input Iterator
.
The concept of
Input Iterator
limits the scope of what we need to achieve in
any_iterator
. We need the following operations:
- default construction
- assignment/copy
- equality compare
- dereference
- member access
- pre increment
- post increment
- post increment and dereference 1
On this basis, then, our
any_iterator
class will provide the following member functions:
- any_iterator()
- any_iterator (const any_iterator &)
- operator= (const any_iterator &)
- operator== (const any_iterator &)
- operator!= (const any_iterator &)
- operator*()
- operator->()
- operator++()
- operator++(int)
This provides our interface, but we have to decide, for example, what
operator*()
should return. We can specialise the entire class on its contained type (in our original requirement,
std::string
), or we can parameterise the class on the contained type:
template<typename Contained>
class any_iterator
{
// ...
This latter method means that the signature for our
write()
function becomes
void write (any_iterator<std::string> begin, any_iterator<std::string> end)
which seems a reasonable compromise.
With this in mind, we can provide an implicit conversion from any
Input Iterator
(actually, from anything, but the restrictions will come later) using a member template function for a constructor.
template<typename Iterator>
any_iterator (const Iterator & rhs);
Painting Your Wagon
The real magic of the solution is expressed in a Pattern called “External Polymorphism” which is published as having the following intent:
Allow C++ classes unrelated by inheritance and/or having no virtual methods to be treated polymorphically. These unrelated classes can be treated in a common manner by software that uses them. [1]
Standard Library Iterators are expressly not related by inheritance; they are related by concept [2]. They do not have virtual functions, and cannot be treated polymorphically, in the late-binding sense. Our requirements for
any_iterator
are to treat
Iterators
as if they had a common ancestor. They could then be treated in the common manner described above.
class interface
{
public:
// pure virtual interface declarations
};
template<typename ConcreteType>
class adaptor : public interface
{
public:
// implementations of pure virtuals in
// base class
private:
ConcreteType adaptee;
};
The
adaptor
class has member functions which forward the call to the member function of
ConcreteType
. In this sense, this structure is similar in some respects to the Adaptor pattern [2], but it has differing motivations; it does not set out to convert the interface of
ConcreteType
in any way (although it could), only to provide
ConcreteType
with polymorphic behaviour. The client class, using a pointer to interface, can use it as if it were a
ConcreteType
.
In our
any_iterator
class,
ConcreteType
is an iterator. We need to provide in the interface those functions which will allow us to provide an iterator-like interface in
any_iterator
. A start is the following:
class wrapper
{
public:
virtual void next () = 0;
virtual std::string & current () const = 0;
virtual bool equal(const wrapper * rhs) const = 0;
virtual void assign(const wrapper * rhs) = 0;
};
template<typename Iterator>
class adaptor : public wrapper
{
public:
adaptor (const Iterator & rhs);
virtual void next ()
{
++adaptee;
}
virtual std::string & current () const
{
return *adaptee;
}
virtual bool equal (const wrapper * rhs)
{
return adaptee == static_cast<adaptor<Iterator> *>(rhs)->adaptee;
}
virtual void assign (const wrapper * rhs)
{
adaptee = static_cast<adaptor<Iterator> *>(rhs)->adaptee;
}
private:
Iterator adaptee;
};
Note the addition of a conversion constructor in adaptor’s interface; that will become clear shortly. There are a couple of gotcha’s here. The first one to go is the return from
current()
. This is the same contained type we already fixed in the
any_iterator
class by using a template parameter to the class. The easiest way to parameterise this one is to nest
interface
and
adaptor<>
inside
any_iterator
.
template<typename Contained>
class any_iterator
{
public:
// ...
private:
class wrapper
{
public:
virtual void next () = 0;
virtual Contained & current () const = 0;
virtual bool equal(const wrapper * rhs) const = 0;
virtual void assign(const wrapper * rhs) const = 0;
};
template<class Iterator>
class adaptor : public wrapper
{
// ...
};
};
The next problem is with
equal()
. The rhs pointer could have a different adaptee type to
this
– in our case, for example, a
list<>::iterator
when this one is a
vector<>::iterator
.
It is the polymorphic type which interests us, so we can make use of RTTI to determine consistency of type:
adaptor<Iterator> * tmp = dynamic_cast<adaptor<Iterator>*>(rhs);
return tmp && adaptee == tmp->adaptee;
The nature of using
dynamic_cast
on pointers means that
tmp
will be null if the conversion fails, i.e. the runtime type of
rhs
is different to
this
. To make this fail more noisily, we could have
adaptor::equal()
take a reference, and make
tmp
a reference, thus forcing
dynamic_cast
to throw
std::bad_cast
on failure.
Note that
assign()
also suffers from this problem, but we’ll address that one differently.
We now need access to this framework in the
any_iterator
class. The declarations are already nested; all that remains is to define such a thing.
template<typename Contained>
class any_iterator
{
public:
// ...
private:
class wrapper
{
// ...
};
template<typename Iterator>
class adaptor : public wrapper
{
// ...
};
wrapper * body;
};
Now the member functions of
any_iterator
can operate on body which will forward all requests to
adaptee
.
const any_iterator<Contained>
any_iterator<Contained>::operator++(int)
{
body->next();
return *this;
}
bool any_iterator<Contained>::operator==(const any_iterator & rhs)
{
return body->equal (rhs.body);
}
// ...
A Few Idioms More...
So far, so good. However, the functions we’ve created only work if we have an actual instance of
body
. Recall the converting constructor being a member template function. The template type of the Iterator is known at this point, and we can construct a new
adaptor
object accordingly:
template<typename Contained>
template<typename Iterator>
any_iterator<Contained>::any_iterator(const Iterator & rhs)
: body (new adaptor<Iterator>(rhs))
{ }
The copy constructor is more of a problem; we have no way of identifying the Iterator type for
rhs->body
any_iterator<Contained>::any_iterator(const any_iterator<Contained> & rhs)
{
// ???
}
and so have no way of creating a new instance of it. This identifies a need for a new idiom in the
adaptor
class: a Virtual Copy Constructor [3, 4, 5].
A constructor for a class cannot be virtual. Normally, you know the concrete type of the object being created, and so this doesn’t cause a problem. In some circumstances (here for example), we have only access to the base class type of the object we want to construct. It is meaningless to create an empty one, but perfectly reasonable to require a copy, or a clone. It forms part of the interface to the
wrapper
class
wrapper * wrapper::clone () const = 0;
and is implemented in
adaptor
:
template<typename Iterator>
wrapper * adaptor<Iterator>::clone () const
{
return new adaptor<Iterator>(adaptee);
}
This uses the already given conversion from
Iterator
(the type of
adaptee
), which has been used before in the conversion constructor of
any_iterator
. The return from
clone()
could be covariant, i.e. return a pointer to the actual class rather than a pointer to its base class, but there is no benefit in this, since the target for the returned pointer is a pointer to the base class; it won’t be used as part of an expression such as
Contained string_value = body->clone()->current();
(Aside to users of Borland C++Builder 4/5 – I originally declared a copy constructor for adaptor thus:
adaptor (const adaptor<Iterator> & rhs);
but the entire class refused to compile. Removing the template type from the
rhs
parameter solved the problem legally, a language construct I had not previously encountered – but the diagnostics were not helpful in reaching this conclusion!)
This allows us to write a copy constructor for
any_iterator
as
template<typename Contained>
any_iterator<Contained>::any_iterator(const any_iterator & rhs)
: body (rhs.body ? rhs.body->clone () : 0)
{ }
Having defined a copy constructor, we now need a copy assignment operator, which brings us to another idiom – Copy Before Release (CBR). Self assignment is not a problem in a class with only intrinsic data types (or those that provide the same copy semantics of intrinsics); but then, if only intrinsic data types are present, we can do without copy construction and copy assignment altogether, leaving the compiler to provide the necessary logic.
This actually leaves us at the Rule of Three – if you need any one of copy constructor, copy assignment operator, destructor, you generally need all three.[4, 6]
Since the data member of
any_iterator
is a pointer, we require a proper deep copy to be performed on assignment. We also need to ensure that assignment to self does not occur. Finally, assignment should be exception safe, meaning it will operate correctly in the presence of an exception.
The idiom which expresses all these things is Copy Before Release [7]. At its most basic level, using the current example, it is implemented as
any_iterator<Contained> &
any_iterator<Contained>::operator=(const any_iterator<Contained> & rhs)
{
wrapper * tmp = body->clone();
delete body;
body = tmp.body;
return *this;
}
As you can see, the name given this idiom is apt. It turns out that this can be simplified using a suitable copy constructor. We’ve already defined the copy constructor for
any_iterator
, and so can use that. Further more, exception safety can be guaranteed by using
std::swap()
.This leads to a remarkably simple implementation [8]:
any_iterator<Contained> &
any_iterator<Contained>::swap(any_iterator<Contained> & rhs)
{
std::swap (body, rhs.body);
return *this;
}
any_iterator<Contained> &
any_iterator<Contained>::operator=(const any_iterator<Contained> & rhs)
{
any_iterator<Contained> temp(rhs));
swap(temp);
return *this;
}
The
swap(
) member function merely swapping two pointers may look a little suspect, but the call to it provides the insight. We create a new copy of the
rhs
and swap it with this. When the call to
swap()
completes, the temporary object constructed in its arguments goes out of scope, taking its pointer with it. At that point, its
body
pointer is the memory which used to be attached to
this
, whereas
this->body
points at newly allocated memory containing a copy of
rhs
.
This safely handles null assignment, self assignment and exceptions (such as
std::bad_alloc
), as well as a successful copy. If an exception does occur, via the
any_iterator
copy constructor, then
this
will remain in its previous state, because the copy will not occur.
Conclusion
The whole premise of this technique revolves around being able to silently convert from a STL iterator to an
any_iterator
. In the wider sense [1], it can be used to superimpose polymorphic behaviour on other concrete types. This property of encouraging silent conversions is, rightly, regarded as dangerous (when C++ is described as a language which enables you to shoot yourself in the foot, this is one of the big shooters), but under specific conditions, such as those described here, it may be inescapable.
The circumstances under which it’s considered dangerous are generally where the converted type appears as a parameter to a function; however, this is exactly the circumstance under which this idiom is intended to be used. Here’s an example.
void csv_writer::write(
const any_iterator<std::string> &begin,
const any_iterator<std::string> & end)
{
// ...
}
// ...
csv_writer cw;
std::vector<std::string> records;
// initialisation of vector with strings here
// using "ordinary" STL iterators from vector.
// It's also perfectly valid to use
// reverse_iterators or const_iterators
cw.write (records.begin(), records.end());
// similarly, we can use istream_iterators
// because the basis for any_iterator is the
// Input Iterator, which is modelled by
// istream_iterator
std::ifstream strm(file_name);
cw.write(
std::istream_iterator<std::string>(strm),
std::istream_iterator<std::string> ());
This extract demonstrates the use of
any_iterator
. The intention is that client code never uses – indeed, probably doesn’t even need to know about – the
any_iterator
type directly, but is able to use it transparently because it exhibits properties to allow this.
Steve Love
Acknowledgements
My thanks to Kevlin Henney, who provided more than just “significant input” to both this article and the code that backs it, and Nigel Dickens, who provided further code and text reviews.
References
-
Cleeland, (1996) External Polymorphism, http://siesta.cs.wustl.edu/~cleeland/papers/External-Polymorphism/ExternalPolymorphism.html
-
Gamma, Helm, Johnson and Vlissides (1995), Design Patterns, Addison Wesley 3 Henney, Kevlin (1999) Overload, “Coping With Copying in C++ -Clone Alone” August 1999
-
Cline (1991) The C++ FAQ, http://www.cs.bham.ac.uk/ ~jdm/CPP/index.html
-
Koenig, Andrew and Barbara Moo (1997) Ruminations on C++, Addison Wesley/AT&T
-
Coplien, James O (1992) Advanced C++ Programming Styles & Idioms, Addison Wesley
-
Henney, Kevlin (1998) C++ Report, “Creating Stable Assignments”.
-
Sutter, Herb (1999) Exceptional C++, Addison Wesley
-
Leone, Sergio Spaghetti Western Director
-
The “post-increment and dereference” isn’t an operator in itself, but the expression
*iter++
uses bothoperator*()
andoperator++(int)
. It merely requires thatoperator++(int
) returns*this
(usually as aconst
reference). Since some standard library functions use this idiom,Input Iterators
are required to support it. (ISO, 1998). ↩