Previously we saw how to use some simple dynamic features in C++. Alex Fabijanic and Richard Saunders explore more powerful dynamic tools.
[Y]ou can’t build a system that is completely statically typed.
~ Bjarne Stroustrup [ Venners04 ]
In this installment of the ‘Dynamic C++’ series of articles, we continue to explore the dynamic solutions in C++ language. We start with
Boost
type_erasure
[
Boost.TypeErasure
], a combination of
Boost.Any
[
Boost.Any
] and
Boost.Function
[
Boost.Function
], addressing the C++ runtime polymorphism shortcomings. Next, we look into
Val
, a class at the heart of the
PicklingTools
library [
PicklingTools
] aimed at interaction with Python environments. We conclude with
Facebook
’s
folly
library solution for interfacing the world of web and JSON from C++ – the
dynamic
class.
Boost.TypeErasure
According to the author,
type_erasure
is a generalization of
Boost.Any
and
Boost.Function
classes, allowing easy composition of arbitrary type erased operations; it addresses the shortcomings of C++ runtime polymorphism, in particular:
- intrusiveness
- dynamic memory management
- inability to apply multiple independent concepts to a single object.
Library uses some advanced constructs such as concepts and template metaprogramming constructs from
boost::mpl
. In a similar fashion to
boost::variant
specifying a set of types at construction time that can be contained at runtime, the
type_erasure
library specifies at construction time a set of operations that can be performed on it at runtime. This is achieved through a vector of
concepts
provided at object declaration site as shown in Listing 1 for an
incrementable
and
ostreamable
object.
any< mpl::vector< copy_constructible<>, typeid_<>, incrementable<>, ostreamable<> > > x(10); ++x; // incrementable std::cout << x << std::endl; // ostreamable |
Listing 1 |
In the example,
copy_constructible
allows
copying
and
destruction
of the object, while
typeid_
provides
run-time type information
so that
any_cast
can be used; these effectively make
type_erasure any
equivalent to
any
[
Boost.Any
]. Additionally,
incrementable
and
ostreamable
concepts are specified, allowing incrementing and streaming of the value
x
. Operations can have arguments, so replacing
incrementable
concept
with
addable
allows adding of two
any
’s. The functionality is brittle in a subtle way, though – while adding two values of different types will compile, unfortunately it results in undefined behaviour at runtime. This problem can be alleviated by specifying
relaxed_match
concept (according to author, recently renamed to a more appropriate
relaxed
name), which causes exception to be thrown. The proper way of dealing with this problem is using
placeholders
, as shown in Listing 2.
int array[5]; typedef mpl::vector< copy_constructible<_a>, copy_constructible<_b>, typeid_<_a>, addable<_a, _b, _a> > requirements; tuple<requirements, _a, _b> t(&array[0], 2); any<requirements, _a> x(get< 0 > (t) + get< 1 >(t)); // x now holds array + 2 |
Listing 2 |
Placeholders are used extensively throughout the library. A placeholder is a substitute for a template parameter in a concept. The library automatically replaces all placeholders with the actual wrapped types.
Furthermore,
type_erasure
supports references (both
const
and non-
const
), as well as user-defined
concepts
. Listing 3 demonstrates adding
stringable
concept
to
type_erasure
, allowing a
to_string()
member function call syntax directly on an
integer
value wrapped in
any
. Things can be simplified when implementation and interface are the same (i.e. a member of
type_erasure::any
called
to_string
calls a
to_string
member of the contained type),
BOOST_TYPE_ERASURE_MEMBER
‘shortcut’ macro can be used.
template<class F, class T> struct to_string { // conversion function static T apply(const F& from, T& to) { return to = NumberFormatter::format(from); } }; namespace boost { namespace type_erasure { template<class F, class T, class Base> // binding struct concept_interface <::to_string<F, T>, Base, F> : Base { typedef typename rebind_any<Base, T>::type IntType; T to_string(IntType arg = IntType()) { return call(::to_string<C, T>(), *this, arg); } }; } typedef any<to_string<_self, std::string>, _self&> stringable; int i = 123; stringable s(i); std::string str = s.to_string(); // s == "123" |
Listing 3 |
Internally, a
void*
pointer
points to the held heap-allocated value and a static equivalent of
virtual table
serves as a
binding
for attached operations as shown in Listing 4.
// storage struct storage { storage() {} template<class T> storage(const T& arg) : data(new T(arg)) {} void* data; }; // binding of concept to actual type typedef ::boost::type_erasure::binding<Concept> table_type; // actual storage ::boost::type_erasure::detail::storage data; table_type table; |
Listing 4 |
Boost.Type erasure is an interesting and valuable ‘merger’ of
any
and
function
features, providing dynamic-language like features within the confines of standard C++. Implementation is rather complex and use has some non-intuitive weak spots that can quickly get an inexperienced user in serious trouble. A more robust interface (even if only with
_typedef_
s provided for most frequently used types) would greatly improve the usability and safety for less experienced users.
PicklingTools Val
The next reviewed solution to the dynamic typing problem is the PicklingTools [ PicklingTools ] Val [ Saunders1 ], [ Saunders2 ], [ Saunders3 ]. The PicklingTools library is an open-source library made up of Python, C++ and Java code allowing cross language communication. The PicklingTools evolved from a need to allow Python and C++ to share Python dictionaries across language boundaries. Many modern applications are built using multiple languages: Python, C++, Java, JavaScript, Lua, Icon/Unicon. The front-end languages (JavaScript, Python, Lua) tend to be dynamic languages for handling scripting and basic data flow. The back-end languages (C++, C, Java, FORTRAN) handle the heavy-lifting of fast communications, data I/O and CPU intensive work. In these hybrid systems, front-end languages need to communicate with the back-end languages intensively. Dictionaries, supported in some form by most dynamic languages, became the currency of many systems. The PicklingTools library solution focuses on the Python dictionary and C++.
The Val Name Rationale |
Why Val and not something like dynamic or any? Three reasons:
|
While making Python dictionaries easy to express in C++ was not a primary goal, given how much users enjoyed the ease of use, the C++ PicklingTools embraced them wholeheartedly. The goal, then, became to make Python dictionaries as easy to express in C++ as they are in Python.
Consider the ease of a dynamic dictionary manipulation in Python shown in Listing 5.
# Create a Python literal >>> d = { 'a': 1, 'b':2.2, 'c': ...{ 'X':1, 'Y':[1,2,3] }} >>> print d['a'] # lookup a single key: 'a' -> 1 >>> d['b'] = 3.3 # insert into dict # Also easy to lookup/insert nested entities >>> print d['c']['X'] # lookup nested key >>> d['c']['Y'] = 0 # insert nested } |
Listing 5 |
Because C++ is a statically-typed language, it requires a compile-time type for all variables. PicklingTools use Val to indicate a dynamic value.
A side-by-side comparison of basic dynamic typing in Python and C++ using Val is shown in Listing 6.
# Python >>> a = 1 >>> a = 2.2 >>> a = 'three' # a takes three different types // C++ Val a = 1; // overload constructor a = 2.2; // overload op= a = "three"; |
Listing 6 |
The
PicklingTools
Val is implemented as a
union
and a
type-tag
, where the value is constructed using
placement new
inside the
union
. The destructor has to manually notice which type to destruct (for non-POD types) and explicitly call the correct constructor. The Val is really just a dynamic container, with storage as shown in Listing 7. C++11 standard introduces
alignof
/
alignas
to address alignment concerns; in pre-C++11, although standard does not explicitly guarantee it and some experts discourage it [
GotW28
], a union with a
double
member is practically close enough guarantee of the alignment to the largest member of the union.
As can be concluded from Listing 7, by design
Val
can only contain certain types, shown in Listing 8.
struct Val { // Flags: the ascii typetag, subtype for arrays, char tag; char subtype; char isproxy; char pad; Allocator *a; // if using shared memory or // special allocator union { int_1 s; // type tag 's' int_u1 S; // type tag 'S' int_2 i; // type tag 'i' int_u2 I; // type tag 'I' int_4 l; // type tag 'l' int_u4 L; // type tag 'L' int_8 x; // type tag 'x' int_u8 X; // type tag 'X' real_4 f; // type tag 'f' real_8 d; // type tag 'd' complex_8 F; // type tag 'F' complex_16 D; // type tag 'D' char t[sizeof(Tab)]; // type tag 't', usually 32 char n[sizeof(Arr)]; // type tag 'n', usually 32 } u; }; |
Listing 7 |
// POD types int_1, int_u1, int_2, int_u2, int_4, int_u4, int_8, int_u8, real_4, real_8, complex_8, complex_16, size_t // Non-POD types string, None, Tab (like Python dictionary), Arr (like Python list) |
Listing 8 |
There are no-user defined types by design. This allows library writers to concentrate on making the interface for C++ Python dictionaries as close to Python as possible without worrying about the problems generality brings.
Listing 9 shows how easy it is to construct a
Val
from basic types and get values out by means of user-defined conversion.
// overload Val constructor on all supported types Val a = 1; Val b = 2.2; Val c = "three"; // Get a value out via user-defined conversions int_u4 i = a; |
Listing 9 |
The user-defined conversions and overloading are syntactic sugar. Of course, overloaded cast operators direct calls are really just taking advantage of ‘syntactic sugar’ for function calls as shown below:
Val v = 1; int_u4 i = v.operator int_u4();
The
Val
supports user-defined conversions for all basic types. If the conversion isn’t direct, then the user-defined conversions follow the
Principle of Least Surprise
: convert as C++ would (example below).
Val v = 3.14159265; int_u4 i4 = v; // Which conversion? As C++ would: // int_u4 i4 = static_cast<int_u4>(3.14159265);
If the conversion doesn’t make sense, behaviour is identical to that in Python – an exception is thrown, see Listing 10.
# Python >>> a = 3.14159265 >>> i4 = int(a) # converts to 3 >>> d = dict(a) # Exception! doesn't make # sense to convert to dict // C++ Val a = 3.14159265; int_u4 i4 = a; // converts to 3 Tab d = i4; // Compile-time error, can't // construct Tab from float |
Listing 10 |
The
Val
provides the basic container to support dynamic dictionaries. The C++
Tab
is equivalent to the Python
dict
(think
Tab == Table
). The keys of the C++
Tab
, aswell as the values of the dictionary are
Val
’s, allowing us to construct dynamic dictionaries:
# PythonId-1007573"> d = {'a':1, 'b':2.2, 'c':'three'} // C++ Tab d = "{'a':1, 'b':2.2, 'c':'three'}";
Note that the C++ literal is inside a string: the library overloads the constructor of the
Tab
to take a string which contains the literal. While C++11 has some great new literal constructs, it can’t quite mimic Python’s syntax.
Literal construction of a Python dictionary is supported using exactly the Python syntax: you can cut-and-paste the Python dictionary literal and paste it into the C++ quotes. There is a little Python dictionary parser embedded in the
Tab
class so that it recognizes the same syntax. Rather than invent a new syntax for literal construction, library leverages the well-known Python syntax.
Facebook folly::dynamic
The Facebook
folly::dynamic
[
Folly.Dynamic
] class is another one in the spectrum of dynamic-typing classes. The class aims to relax the static typing constraints, especially in the
JSON
format data manipulation scenarios; it provides a runtime dynamically typed value for C++, similar to the way languages with runtime type systems work (e.g. Python). It can hold types from a predetermined set (ints, bools, arrays of other dynamics, etc), similar to
boost::variant
and
PicklingTools
Val
, but the syntax is intended to be more akin to using the native type directly.
Facebook String Flavour |
Facebook
folly::fbstring
is a drop-in replacement for
std::string
, providing the benefit of significantly increased performance on virtually all important primitives. This is achieved by using a three-tiered storage strategy and cooperating with the memory allocator;
fbstring
is designed to detect use of jemalloc [
jemalloc
] and cooperate with it to significantly improve speed and memory usage.
Storage strategies
Implementation highlights
Supported architectures are x86 and x64; porting fbstring to big-endian architectures would require changes. |
An example of creating a
dynamic
holding most common types is shown below. Strings are stored internally as
fbstring
, the Facebook drop-in replacement for
std::string
.
dynamic twelve = 12; dynamic str = "string"; // fbstring dynamic nul = nullptr; dynamic boolean = false;
The library extensively uses C++11 features for both speed and syntactic advantages. For example, as shown in Listing 11, arrays can be initialized with
initializer lists
. This particular feature, however, also imposes a limitation –
dynamic
has no default constructor. The rationale for this design decision is due to the standard requirement for the expression
dynamic d = {}
to call default constructor. The conflict arises in the default construction either having to result in
d.isArray()
(a) being
false
for the expression
dynamic d = {}
or (b) being
true
for
dynamic d
. The solution the authors of
folly::dynamic
deemed most appropriate is to entirely disallow the default construction.
dynamic array = { "array ", "of ", 4, " elements" }; assert(array.size() == 4); dynamic emptyArray = {}; assert(array.empty()); |
Listing 11 |
Maps from dynamics to dynamics are called objects. As shown in Listing 12, the
dynamic::object
constant is how an empty map fromdynamics to dynamics is created. The same listing also shows how
dynamic
objects can be created by using
object::operator()
.
dynamic map = dynamic::object; map["something"] = 12; map["another_something"] = map["something"] * 2; dynamic map2 = dynamic::object ("something", 12)("another_something", 24); |
Listing 12 |
The internal
dynamic
storage is shown in Listing 13. Types that can be held are:
null
,
Array
,
bool
,
double
,
integer
(64-bit),
Object
and
String
.
enum Type { NULLT, ARRAY, BOOL, DOUBLE, INT64, OBJECT, STRING, }; // ... Type type_; union Data { explicit Data() : nul(nullptr) {} void* nul; // void* used instead of // std::nullptr_t due to gcc bug Array array; bool boolean; double doubl; int64_t integer; fbstring string; typename std::aligned_storage< sizeof(std::unordered_map<int,int>), alignof(std::unordered_map<int,int>) >::type objectBuffer; } u_; |
Listing 13 |
Examples of object and string construction are shown in Listing 14. Most notably,
ObjectImpl
is not a mere typedef but inherits from hash map; the reason for this to avoid undefined behavior of parameterizing
std::unordered_map
with an incomplete type.
struct dynamic::ObjectImpl : std::unordered_map<dynamic, dynamic> {}; // ... inline dynamic::dynamic(ObjectMaker (*)()) : type_(OBJECT) { new (getAddress<ObjectImpl>()) ObjectImpl(); } inline dynamic::dynamic(char const* s) : type_(STRING) { new (&u_.string) fbstring(s); } |
Listing 14 |
The gist of the
folly
’s conversion facilities is shown in the Listing 15. The listing shows the code involved in conversion of
dynamic
to
fbstring
. The actual code of converting ‘anything to anything’, as the documentation states, is in a separate header and too large for inclusion here. For binary/decimal and vice-versa conversion of IEEE doubles, the class uses V8 double-conversion [
Double.Conversion
].
template<> struct dynamic::GetAddrImpl<bool> { static bool* get(Data& d) { return &d.boolean; } }; template<> struct dynamic::GetAddrImpl<int64_t> { static int64_t* get(Data& d) { return &d.integer; } }; template<class T> T* dynamic::getAddress() { return GetAddrImpl<T>::get(u_); } template<class T> T* dynamic::get_nothrow() { if (type_ != TypeInfo<T>::type) { return nullptr; } return getAddress<T>(); } template<class T> T dynamic::asImpl() const { switch (type()) { case INT64: return to<T>(*get_nothrow<int64_t>()); case DOUBLE: return to<T>(*get_nothrow<double>()); case BOOL: return to<T>(*get_nothrow<bool>()); case STRING: return to<T>(*get_nothrow<fbstring>()); default: throw TypeError("int/double/bool/string", type()); } } inline fbstring dynamic::asString() const { return asImpl<fbstring>(); } |
Listing 15 |
As will be shown in one of the next installments,
dynamic
provides a very nice user interface, yet also provides a lot in terms of performance. It is a class designed with definite business goal in mind and it succeeds in that endeavor. The only downside for the whole
folly
library is a patchy build system which requires a significant effort to build the library. The library is also not portable, at least not in the out-of-the-box fashion.
Conclusion
In this installment, we reviewed three C++ dynamic typing solutions:
Boost
type_erasure
,
PicklingTools
Val
and
Facebook
folly::dynamic
. While
dynamic
and
Val
provide dynamically-typed storage within the confines of the standard C++,
type_erasure
also ventures in a new direction by adding operations to C++ types. In the next installment, we’ll look into more similar solutions, so stay tuned …
Credits
Steven Watanabe provided valuable advice on
boost::type_erasure
. The list is, of course, not inclusive - many other people, discussions, libraries and code samples were an indispensable source of help in gathering and systematizing this writing.
References and further information
[Boost.Any] http://www.boost.org/doc/libs/1_53_0/doc/html/any.html
[Boost.Function] http://www.boost.org/doc/libs/1_53_0/doc/html/function.html
[Boost.TypeErasure] http://www.boost.org/doc/libs/1_54_0/doc/html/boost_typeerasure.html
[Double.Conversion] ‘Double-conversion library’ https://code.google.com/p/double-conversion/
[Folly.Dynamic] Facebook folly library, dynamic class – https://github.com/facebook/folly/blob/master/folly/docs/Dynamic.md
[GotW28] The Fast Pimpl Idiom http://www.gotw.ca/gotw/028.htm
[jemalloc] A general-purpose scalable concurrent malloc(3) implementation http://www.canonware.com/jemalloc/
[PicklingTools] The PicklingTools Library http://www.picklingtools.com
[Saunders1] ‘Dynamic, Recursive, Heterogeneous Types in Statically-Typed Languages’, Clinton Jeffery, Richard Saunders, C++ Now 2013 Presentation, http://cppnow.org/session/dynamic-recursive-heterogeneous-types-in-statically-typed-languages/
[Saunders2] ‘Dynamic, Recursive, Heterogeneous Types in Statically-Typed Languages’ Clinton Jeffery, Richard Saunders http://cppnow.org/files/2013/03/saunders-jeffery.pdf
[Saunders3] C++ Now 2013 Presentation, Richard Saunders http://www.youtube.com/watch?v=W3TsQtnMtqg
[Venners04] ‘Abstraction and Efficiency: A Conversation with Bjarne Stroustrup’ by Bill Venners, February 16, 2004 http://www.artima.com/intv/abstreffi.html
[Wikipedia] Boyer–Moore string search algorithm http://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string_search_algorithm
Further information
‘Dynamic C++’, ACCU 2013 Conference http://www.slideshare.net/aleks-f/dynamic-caccu2013
Facebook folly library, fbstring class – https://github.com/facebook/folly/blob/master/folly/docs/FBString.md