Static and dynamic languages have different trade-off. Alex Fabijanic attempts to get the best of both worlds.
As to which is more important, Dynamic or static,
both are absolutely essential,
even when they are in conflict.
~ Robert Pirsig, Metaphysics of Quality [ Pirsig ]
C++ is a statically-typed language. The static nature of the C++ type system provides a data integrity ‘safety net’. The compiler is an indispensable runtime-surprise-prevention tool and the static nature of C++ provides runtime performance gain. Before we go any further, let’s clarify the nomenclature – static typing here should not be confused with its close relative strong-typing, see the definitions on the right for details. It is precisely the ‘weaknesses’ of the type system (in combination with polymorphism and templates) that provides the functionality needed for dynamic-like behavior within a statically typed language such as C++. And there are circumstances calling for a ‘softened’ type system, where a degree of performance sacrifice is acceptable and runtime type system relaxation desirable (or even necessary). To provide generic functionality automatically adaptable to different data types at runtime within the confines of standard C++, the runtime type detection system does not suffice – one has to resort to library solutions based on various techniques described later.
Static vs. dynamic |
This classification has to do with the timing of value-to-type attachment. Static means values are attached to types (‘compiled’) at compile time. Dynamic means they are attached (‘interpreted’) at runtime. Since C++ attaches values to types at compile, it follows that C++ is a statically typed language. |
Strong vs. weak |
This classification has to do with ‘loopholes’ the programming language type system leaves open for its type system to be ‘subverted’. Both C and C++ allow different types and pointers thereof to be cast to each other. While C++ is stricter than C, it is also backward compatible. But even without the C compatibility, C++ provides ways to subvert the type system and therefore can not be considered a strongly typed language. As a (non-exhaustive) example, void* and union disqualify C and C++ from strongly-typed qualification. |
Data from external data sources arrives in a variety of types and brings along the need for efficient and transparent datatype conversion. The proliferation of web-based interfaces and databases with the addition of popular textual formats such as JSON and XML exacerbates the need for a relaxed type system with transparent and safe conversion facilities. This is a domain where dynamic languages have gained a significant footing. In order to clarify the premises for this writing, we must take a brief detour here (see ‘C$$++ Dynamism’).
C++ dynamism |
To get a broader perspective, let’s look at the need for and benefits of ‘dynamic’ typing in C++. Even the dynamic language environments are ultimately built on a statically typed foundation. Yet, when the static/dynamic langugage interaction need arises, we embed those ‘foreign’ language environments and we must speak to them in a ‘foreign tongue’ (i.e. through a specialized translation layer). Wouldn’t it be nice to (a) smooth the rough edge between the two in a reusable way, while also (b) addressing the concern of dealing with external data of different types and (c) gain a generic-purpose standard C++ ‘dynamic’ environment natively and seamlessly, as a side-effect?
While standard C++ claims to be a general-purpose language, it stops abruptly at the point where (among other things) dynamic-language-like behavior is needed – as things stand at the time of this writing, even a well-known, half-way-there oldie like
The original spark triggering this systematic overview (
|
So, back on track – is it possible to provide dynamic-like behavior within the constraints of standard ANSI/ISO C++? How can a C++ programmer accurately and efficiently transfer data from a database to XML, JSON or HTML without stumbling over the rigid C++ static type-checking mechanism at compile time while ensuring accuracy at runtime? Can type-erasure and (checked) type-conversion techniques fit the bill? Given both historical (ANSI C union and
void*
, MS COM Variant,
boost::
[variant, any,
lexical_cast
], Qt QVariant,
adobe::any_regular
) and recent (
Boost.TypeErasure
, Facebook
folly::dynamic
) development trends (including the pending Boost.Any C++ standard proposal), the need for a way around the static nature of C++ language is obvious. Since the DynamicAny [
Fabijanic08a
,
Fabijanic08b
] article, some new solutions [
Folly
] have appeared, POCO [
POCO
] has seen several release cycles and
Poco::DynamicAny
is now known under a new name –
Poco::Dynamic::Var
. Additionally, the performance and type-safety of number/string conversion has been improved by replacing
sscanf
/
sprintf
-based conversion with double-conversion [
DoubleConversion
] (also used by
folly::dynamic
).
POCO
In this article series, both externals and internals of
boost::
[
variant
,
any
,
type_erasure
],
folly::dynamic
,
Poco::Dynamic::Var
, Qt QVariant and
adobe::any_regular
are explored and compared. Design, capabilities, ease of use as well as pros and cons of each solution will be examined. Performance benchmark comparisons results will be provided as well.
POCO |
POrtable COmponents C++ Libraries are:
In regards to Boost, in spite of some functional overlapping, POCO is best thought of as a Boost complement (rather than replacement). Side-by-side use of Boost and POCO is a very common occurence. |
We will start our journey through Dynamic C++ world with a smooth sail – simple, minimalistic and well-known
boost::any
, a ‘bipolar’ class with deceptively soft, entirely type-agnostic conception and surprisingly rigid, ultra-strongly typed delivery interface (or, should we say, lack thereof). As we move on, the journey takes us into the rough waters of solutions that endeavor, each in its own way, to provide dynamic facilities within the confines of standard C++ and its static type system. The solutions gradually build on existing foundations, attacking the problem from various angles while trying to keep size, performance and datatype integrity under control.
But, first things first – let us start by looking at the concerns shaping the solutions and the ingredients they're made of.
Dynamic concerns
What are the concerns involved with dynamic behavior and how are they solved? Let’s enumerate, disect and analyze them ...
Storing value
This concern has to do with the location where the actual bits representing the value reside. Within the C++ memory model, there are two distinct choices – heap and stack – and a hybrid between the two; more on this later. There are various memory allocation optimization methods that look just like heap allocation from programmer’s standpoint but actually allocate from different places; such constructs are beyond the scope of this article. Let us just mention here that the term ‘stack’ above should be used cautiously; it is very common to refer to placement new techniques constructing objects in a dedicated storage inside a class as ‘stack-based’; that convention is used in this article as well. It is, however, important to remember that there is nothing preventing an object of such class to be allocated on the heap.
Performing operations
The most frequently encountered operations are type conversions, between string and numeric or other values. Furthermore, there are assignment, arithmetic and logical operators. There are other language operations such as bitwise but those are not of concern for this article’s theme. Finally, there are various conversions or transformations; as we will see later, some solutions even provide capability to add custom operations to types at compile time.
Retrieving value
Value retrieval ranges from a strict requirement to explicitly specify the held type, to transparent conversion between different types, sometimes with runtime exceptions thrown if conversion is impossible; with some solutions, it is very easy to venture into undefined behavior if the user is not careful. Sometimes, built-in value retrieval is readily available, while in some cases the user is required to use pre-existing or provide custom external ‘scaffolding’ in order to extract the held value.
Runtime performance
From the runtime performance standpoint, there will typically be two concerns: heap memory allocation and conversion/transformation costs. From this aspect, anything that could be done at compile time, should. Additionally, as mentioned above, small object optimization affects runtime performance in both ways – positively when heap allocation is avoided and negatively every time the value is retrieved.
Memory usage
Memory usage will vary, from the exact type size (plus platform-dependent alignment, if applicable) to a fixed size, large enough to hold the largest stack-based type supported.
Code size
The binary code size generated by various solutions will mostly be proportional to the functionality provided. For example,
boost::any
code will be small due to non-existent conversion logic.
Poco::Dynamic::Var
code will be the largest, due to exhaustive involvement in type conversions and accuracy checks. The rest of the solutions are somewhere in between.
Ease of use
Last but not least, this concerns the user experience when dealing with dynamic functionality. Some solutions have rigid compile-time constraints, while some others may exhibit surprising runtime behavior. If the user has to understand its implementation details in order to use a software component correctly, the total value of the abstraction is diminished regardless of the implementation quality. Or, as Scott Meyers succintly puts it, “ Make interfaces easy to use correctly and hard to use incorrectly ”.
Data storage
We have several choices for where and how we store the data values. Again, each comes with its own implementation and concerns.
Storing the value on the heap
If the value resides on the heap, we will pay in runtime performance for memory allocation. However, the amount of memory will be variable, commensurate with the size of held type plus platform-dependent padding/alignment.
Implementation:
-
void*
andoperator new
This technique provides dynamic-like behavior by virtue of
void*
, a C language construct allowing pointers to unknown types. The defaultoperator new
allocates memory on the heap. Due to the type-independent nature ofvoid*
, the newly created entity can be of any type, so thenew operator
can construct the type needed in the allocated memory; note the difference in word order –new operator
first callsoperator new
and, after the memory is allocated, constructs the object. From that point on, it is up to the ‘dynamic’ solution and programmer to ensure the newly created type value is properly treated. There are some variations on this theme in later described solutions and we will examine them in due course.
Concerns:
-
Allocation overhead
Memory allocation on the heap can be an expensive runtime operation; for optimization purposes the language allows overloading of the
operator new
. This allows for various schemes of memory (pre)allocation that alleviate the performance hit imposed by the defaultoperator new
. -
Memory cleanup
Memory that was allocated on the heap by
new
must be released withdelete
.
Storing the value on the stack
If the value resides on the stack, we will invariably pay the storage size of the largest value we wish to store. As mentioned earlier, although commonly referred to as ‘stack’, a more appropriate name would be ‘internal’ because there is nothing preventing the creation of object on the heap.
Implementation:
-
union
+ tagThis technique utilizes the C++
union
facility. Unlikestruct
, whose members are laid out in memory next to each other,union
can only hold one value at a time because its data members overlap. Thisunion
feature provides the same storage location for different types – a feature that can be exploited for dynamic-type-like behavior at runtime without paying the full sum-of-storage price for all the types supported. Additional tag is needed to indicate the currently active union member. In C++03 standard, the limitation is that only POD and classes with trivial construction/destruction can be used. C++11 standard relaxes the only-trivial-construction/destruction limitation. -
union
+placement new
This technique utilizes the C++
union
in combination with placementnew
. Placementnew
does not allocate memory but only constructs object in pre-allocated storage. The reason for the use ofunion
is twofold:- there is a need for a special-purpose union member ensuring proper alignment for the largest type held when it is not known at compile time
- there are ‘hybrid’ solutions, mixing types known at compile time with ‘raw’ storage for the unknown types (placement-new-constructed at runtime); the tag indicating currently active type is necessary in this case
Concerns:
-
Size
Since a
union
must accomodate the largest type supported, it has to occupy at least the largest type size. -
Alignment
In practice, the amount of space needed is often more than largest
union
member size due to platform-dependent alignment requirements. This means that, if only smaller types are used, sometimes there may be some space not effectively used but consumed nevertheless. Alignment requirements and details are beyond the scope of this writing, but let us just mention here that it is a fairly complex topic, especially in the C++03 context; for details, see [ Sutter ]. -
Destruction
When the held object is placement-new constructed in pre-allocated storage, there is no need to explicitly call
delete
. This, however, means that the destructor has to be called explicitly by the programmer.
Using a hybrid solution
The hybrid solution (also known as small object optimization, configurable at compile time) compromises, to an extent, the stack size concern in order to avoid the heap allocation penalty for types under certain size; this solution, however, imposes runtime penalties of size inspection (a) before instantiation and (b) at every value retrieval.
Implementation:
-
Small Object Optimization
This is a combined technique of heap- and stack-based storage strategies. The programmer decides and specifies at compile time the maximum object size that can be created on the stack. At runtime, based on the compile-time value, the decision is made whether the new object will be constructed on the stack or the storage for it to be constructed will be allocated on the heap.
Concerns:
-
Runtime detection performance
Obviously, every creation and retrieval of the value will incur the penalty of the value location detection. There are additional difficulties with assignment and swap operations as well as with exception safety.
-
Stack use
The fixed stack space is used indiscriminately, even when the value is allocated on the heap (in which case, the stack space usually serves as the pointer storage).
Operations
Finally, we get to choose what sort of operations can be applied to these types. These choices will strongly affect the usability of the types so care and a deep understanding must be used.
Type conversions
Type conversions are the most frequently encountered operations, the most frequent conversions being those between numbers and strings. Conversions between compatible types (e.g.
short
to
int
) can often be performed statically. If static conversion is not possible, then dynamic functionality must take its place; this typically involves parsing a string to generate a corresponding number or vice-versa – formatting a number into string. Not all solutions described here are equally cooperative in this area; they range from those not providing any (no pun intended) conversions, via those providing accompanying mechanisms for defining conversion facilities to those providing built-in conversions.
Standard language operations (+, -, ==, ...)
These operations are indispensible for built-in types. They can also be brittle due to many runtime cases where they may make no sense for the held types/values. Therefore the choice is to either not provide them at all, or provide them and throw exception at runtime if the attempted operation makes no sense for the current values. The latter behavior is consistent with the way a dynamic language would behave.
Custom operations
This is an area where things get really complicated – ‘dynamically’ attaching operations to types that do not ‘natively’support them. There are some solutions that provide this functionality. There are also some pitfalls. These will be analyzed and discussed later.
Ingredients
The ‘ingredients’ for the dynamic functionality within C++ ‘recipe’ are summarized in the following list:
- new
- placement new
- void*
- union
- virtual functions
- templates
From the entities listed above, we already discussed
new
and
union
; the ones that were not touched on so far are virtual functions and templates.
-
Virtual functions
Virtual functions are, of course, an indispensable mechanism for runtime polymorphism, providing objects with identical interface that behave differently. They help tremendously in defining conversions and other operations, where it is very convenient to provide default behavior (often throwing an exception) in the parent class and appropriately override it in descendants. Virtual functions inflict both size and performance penalty.
-
Templates
Templates are another powerful C++ mechanism providing compile-time genericity. When combined with other facilities described here, templates can produce very powerful (but often complicated) programming constructs.
Boost.Any
This well-known class has been around for a long time; at the time of this writing, it is an active proposal for standardization [
Dawes12
]. According to proposal authors,
std::any
is a container for “
Discriminated types that contain values of different types but do not attempt conversion between them
”. This classifies any as a generic (in the sense of ‘general’, not template-based) solution for the first half of the problem – how to accommodate any type in a single container. The ‘syntactic sugar’ is avoided template syntax – any itself is not a template class but it has a template constructor and assignment operator; this is conveniently used to avoid the aesthetically displeasing angle brackets:
any a = "42"; any b(42);
What happens ‘under the hood’ is:
- at compile time, assignment (or construction) code for the appropriate type is generated
- at run time, the value is assigned to a polymorphic holder instantiated on the heap.
See Listing 1 for an example.
template<typename ValueType> any(const ValueType & value):content (new holder<ValueType>(value)) { } |
Listing 1 |
Runtime dynamism is achieved through polymorphism as shown in Listing 2.
class placeholder { public: virtual ~placeholder() { } // ... virtual const std::type_info & type() const = 0; // ... } template<typename ValueType> class holder : public placeholder { public: holder(const ValueType & value):held(value) { } // ... ValueType held; }; |
Listing 2 |
Right away, it is obvious that assignment will incur a performance penalty due to heap allocation, and a size/performance penalty due to virtual inheritance of the internal placeholder. The convenience of
any
extends from the construction/assignment moment during its lifetime and stops the moment one wants to retrieve the value. Until then,
any
looks and acts like, well – any value. While it works in a wonderfully transparent manner on the assignment side, the data extraction side is out of
any
’s ‘scope of supply’ – the class does not offer value retrieval or type conversion functionality; the only way to retrieve the value is through
any_cast
– a set of free-standing functions that either return the value of the exact held type or throw if something else is requested.
Poco::Dynamic::Var
takes off where
any
stops, providing user-extensible conversion facilities for non pre-specialized types; the design, rationale, use and performance of this class hierarchy is described in a later installment of this series of articles.
Poco::Any
[
POCO.Any
] is a port of
Boost.Any
to POCO.
Boost.Variant
According to the authors [
Boost.Variant
],
Boost.Variant
class template is “
a safe, generic, stack-based discriminated union container, offering a simple solution for manipulating an object from a heterogeneous set of types in a uniform manner
”. It determines the needed storage at compile time, uses
boost::mpl
and limits the runtime capabilities to types defined at compile time.
The performance penalty of Boost.Any creation and polymorphic nature, as well as its incapability to provide reliable compile-time type detection, were the motivating factors for
boost::variant
authors. For that reason, variant is stack-based and provides reliable compile-time type detection and value extraction. There is a caveat – to enforce the ‘never empty’ requirement, variant may temporarily allocate storage on the heap to keep the old value for the case when an exception being thrown during assignment. The authors claim to have plans for alleviating this shortcoming.
Faced with a
boost::variant
, hoping it comes with built-in (or at least accompanying) type conversion facilities, a naïve user may try something like this:
variant<int, string> v = 1; string s = v; // compile error boost::get<std::string>(v); // throws
While
Boost.Variant
offers slightly more cooperation than Boost.Any on the extraction side, it is not seamless or without dangers – intuitive code won’t compile, while the next simplest way is brittle. Authors admit the shortcomings and brittleness of the above approach and provide a visitor mechanism as a vehicle to unleash the full strength of Boost.Variant. The visitor is created by inheriting from the
boost::static_visitor<>
class template (see Listing 3).
class my_visitor : public boost::static_visitor<int> { public: int operator()(int i) const { return i; } int operator()(const std::string & str) const return str.length(); } }; int main() { boost::variant< int, std::string > u ("hello world"); std::cout << u; // output: hello world int result = boost::apply_visitor( my_visitor(), u ); std::cout << result; // output: 11 (i.e., length of "hello world") } |
Listing 3 |
In order to provide the type conversions however, user must define a visitor per destination type, e.g. to facilitate the most common conversion between numbers and strings, the following minimal set of classes is needed, as seen in Listing 4.
struct string_int_converter : public boost::static_visitor<int> { int operator()(int i) const; int operator()(const std::string & str) const; int operator()(double d) const; }; struct string_dbl_converter : public boost::static_visitor<double> { double operator()(int i) const; double operator() (const std::string & str) const; double operator()(double d) const; }; struct num_string_converter : public boost::static_visitor<std::string> { std::string operator()(int i) const; std::string operator() (const std::string& str) const; std::string operator()(double d) const; }; |
Listing 4 |
Internally, the variant data is in-place constructed into the storage allocated at compile time and large enough to accommodate the largest datatype specified; storage is a union of char array plus alignment padding (see Listing 5).
variant<char>: (1) 8 variant<int>: (4) 8 variant<float>: (4) 8 variant<double>: (8) 16 variant<std::string>: (32) 40 |
Listing 5 |
Typically, depending on the size of the held type, there will be some extra space used (e.g. on 64-bit Win8/VS2012),
variant<char>
will occupy 8 bytes, while
variant<std::string>
will occupy 40 (see Listing 6).
template <std::size_t size_, std::size_t alignment_> struct aligned_storage_imp { union data_t { char buf[size_]; typename mpl::eval_if_c< alignment_ == std::size_t(-1) , mpl::identity<detail::max_align> , type_with_alignment<alignment_> >::type align_; } data_; void* address() const { return const_cast<aligned_storage_imp*>(this); } }; |
Listing 6 |
The most significant constraint of Boost.Variant is that it can only accept a predefined set of types. If a type is not explicitly listed in the declaration of the variant variable via a template instantiation, that type can not be assigned to it. The never-empty guarantee imposes some significant limitations and there is a discussion going on as to whether it is a reasonable constraint to start with. Default constructing variant as empty would alleviate this problem but it would also introduce the problem of always having to deal with empty in the visitors.
Comparison between Boost.Variant and Boost.Any
For easier understanding of the concepts behind the two classes described so far,
boost::any
is often compared to ‘type-safe
void*
’ whereas
boost:variant
is compared to ‘type-safe
union
’. While there are certainly similarities, this comparison should be taken cautiously.
Boost.Variant advantages over Boost.Any:
- guarantees the type of its content is one of a finite, user-specified set of types
- provides compile-time checked generic visitation of its content (Boost.Any provides no visitation mechanism at all; even if it did, it would need to be checked at run-time)
- offers an efficient, stack-based storage scheme (avoiding the overhead of dynamic allocation).
Boost.Any advantages over Boost.Variant:
- allows any type for its content, providing great flexibility
- provides the no-throw guarantee of exception safety for its swap operation
- no template meta-programming techniques, which avoids potentially hard-to-read error messages and significant compile-time processor and memory demands.
Conclusion
The analysis of the available techniques, solutions ‘ingredients’ and trade-offs for dynamic-language-like functionality within standard C++ was provided. Two existing solutions,
boost::variant
and
boost::any
were described and compared. In the next installment, we will look into more existing solutions with designs/funactionality comparisons and performance benchmarks. Stay tuned ...
Credits
Help in assembling and systematizing the information in this article came from numerous sources. Kevlin Henney provided feedback and constructive discussions on
boost::any
and the topic in general. Steven Watanabe provided valuable help and guidance on
boost::variant
and
boost::type_erasure
, which will be presented in the Part II. Günter Obiltschnig and Andrei Alexandrescu provided valuable feedback and encouragement. The list is, of course, not exhaustive – many other people, discussions, libraries and code samples were an indispensable source of help in gathering and systematizing the information provided in this article.
References
Boost.Any] ‘Boost.Any’, Boost C++ libraries, Kevlin Henney, http://www.boost.org/doc/libs/1_53_0/doc/html/any.html
[Boost.Variant] ‘Boost.Variant’, Boost C++ libraries, Eric Friedman and Itay Maman, http://www.boost.org/doc/libs/1_52_0/doc/html/variant.html
[Dawes12] ‘Any Library Proposal’, Revision 1, Beman Dawes and Kevlin Henney (2012) http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3390.html
[DoubleConversion] ‘Double-conversion library’, https://code.google.com/p/double-conversion/
[Fabijanic08a] ‘DynamicAny’, Part I, Alex Fabijanic, Overload 86, August 2008, http://accu.org/index.php/journals/1502
[Fabijanic08b] ‘DynamicAny’, Part II, Alex Fabijanic, Overload 87, October 2008, http://accu.org/index.php/journals/1511
[Folly] Facebook folly library, dynamic class, https://github.com/facebook/folly/blob/master/folly/docs/Dynamic.md
[Pirsig] ‘A brief summary of the Metaphysics of Quality’, Robert Pirsig (2005) http://robertpirsig.org/MOQSummary.htm
[POCO] POCO C++ libraries: http://pocoproject.org/
[POCO.Any] Poco::Any, http://pocoproject.org/docs/Poco.Any.html
[Sutter] ‘Construction Unions: A C++ Challenge’, http://www.informit.com/articles/article.aspx?p=360435
More information
[ACCU13] ‘Dynamic C++’, ACCU 2013 Conference , http://www.slideshare.net/aleks-f/dynamic-caccu2013
[C#] ‘Using Type dynamic’, C# Programming Guide , http://msdn.microsoft.com/en-us/library/dd264736.aspx