Christopher Baus explains the advantages of using a pooled memory allocation strategy for high performance applications.
By default, the C++ list container allocates memory with the global operator new when elements are added to the list, and frees memory with the global operator delete when elements are removed. While this provides a convenient interface for most applications, high performance and long lived applications could benefit from a pooled memory allocation strategy, where all memory used by the list is preallocated at list creation time.
Most default implementations of the operators new and delete resolve to a direct call to the underlying system heap. This has the following drawbacks:
- List insertion operations may throw indeterminately.
- System heap operations create contention between otherwise unrelated threads.
- System heap operations require an often expensive system call.
- Frequent allocations and deallocations of small objects can result in fragmentation, and inefficient use of memory.
Kevlin Henney [ Henney ] says this regarding collection memory management:
If you are serious about managing the allocation of a container, then get serious and manage it: Write your own container type.
The standard list allocator interface is the logical starting point for implementing a pooled list container, but as Pablo Halpern noted in his proposal to the C++ standards committee [ Halpern, 2005 ], there some inconsistencies in the standard. The handling of allocators which compare unequal is currently under specified in C++, as is noted in issue 431 of the C++ Standard Library Active Issues List [ C++ Active Issues ]. While Howard E. Hinnant [ Hinnant ] provides guidance for a solution, it currently is not part of the standard library. Instances of pool allocators of the same type compare unequally when they allocate from different memory pools, hence it isn't possible to implement satisfactory pooled allocators given the current standard. This leaves Kevlin's suggestion to implement a custom container.
This article investigates implementing a pooled list container using the C++ standard list as the underlying data store. The objective is to provide pooling whilst delegating most functionality to the standard library.
The Interface
The pooled_list class provides a simple and familiar interface to C++ developers. With a few exceptions, pooled_list s behave similarly to standard list s. When using pooled_list , the user must first create a pool as follows:
size_t pool_size = 256; pooled_list<int>::pool list_pool(pool_size);
The pool allocates memory for both the element data and the list 's internal structure. No additional allocations will be made after the list_pool has been constructed.
One or more list s are attached to the pool:
pooled_list<int> my_list(list_pool);
The user can then operate on the list just as a standard list . For instance:
my_list.push_back(5); my_list.pop_front();
Exceptions and error handling
Using the standard list container insert operations can fail unpredictably. If memory is not available to complete the operation they will throw std::bad_alloc . In comparison, the pooled_list container provides deterministic failure. Insert operations only throw on a pooled_list with a depleted pool . Since the pooled_list takes a reference to a pool , pools must have a lifetime greater than all the lists from which they are created. If a pool is destroyed before the lists which are created from it, the behaviour is undefined.
Since memory is allocated from the C++ runtime heap to create pool s, pool construction may throw std::bad_alloc .
The semantics of operations which act upon multiple lists are affected by pooling. The operations including the overloading of merge() and splice() require that pooled_list s are allocated from the same pool. The condition is checked with an assert and violation results in undefined behaviour. These semantics are borrowed directly from Howard E. Hinnant's recommendation for swapping containers with unequal allocators [ Hinnant ].
To provide the strong exception guarantee, the assignment operator is implemented by creating a temporary copy of the right hand list, and then swapping the temporary with the left hand list. The pools of the left hand side and right hand side are also swapped (again as recommended by Hinnant). This requires that the right hand side list's pool contains at least as many free elements as are used by the right hand side list, even though fewer elements maybe required upon completion of the operation.
In explicit terms, the assignment operation involves three lists: the list being assigned from ( right ), the initial list being assigned to ( left ), and the final state of the list being assigned to ( left' ). Following the operation both left' and right will use the same pool , even if left has used a different pool . All elements will be allocated from the pool used by right . The assignment operator, again, to provide the strong exception guarantee, creates left' (as a copy of right ) before left is destroyed. For the operation to succeed, the pool used by right must contain at least the number of elements in right . When right contains more elements than left , and uses the same pool as left , it is not sufficient for the pool used by right to contain right minus left number of elements, even though that number of elements will be used (in conjunction by left' and right ) after the operation completes.
Concurrency
The global heap is a resource which is shared among all threads, and access to the heap is often serialized by the underlying operating system to prevent corruption. Microsoft [ Microsoft ] describes the problem and offers the following with respect to Windows:
A single global lock protects the heap against multi threaded usage... This lock is essential to protecting the heap data structures from random access across multiple threads. This lock can have an adverse impact on performance when heap operations are too frequent.
In all the server systems (such as IIS, MSProxy, DatabaseStacks, Network servers, Exchange, and others), the heap lock is a BIG bottleneck. The larger the number of processors, the worse the contention.
Heap access causes contention between threads which would otherwise be unrelated. The standard lists insert() and erase() operations require heap access, and hence can cause contention. Consider the following example: a program contains two threads in which each thread creates a list which is only accessed by that thread. While it is safe for either thread to insert or erase elements from its respective list, access to the heap is serialized by those operations. The result is reduced performance even though there is no data shared between the threads. A pooled_list does not access the heap directly after the pool is created, so there is no contention between lists.
Like the STL, the pooled_list class makes few guarantees about thread safety. There is no thread synchronization in the pooled_list class, hence multiple threads can not concurrently insert or erase elements in lists which share the same pool. This requires that users externally synchronize list operations when one or more lists which use the same pool are accessed from multiple threads. If pooled_list s do not share pools, there is no need to synchronize access to pooled_list s.
The implementation
Linked lists impose storage overhead beyond contiguous arrays due to their node structure. Typically a list node holds pointers to the next node in the list and the previous node in the list . A list node could be defined as shown in Listing 1.
template<typename T> struct node { node(T initial_value):element_(initial_value){} node* next_; node* previous_; T element_; } |
Listing 1 |
List allocation requires not only memory for the element data, but the list node structure as well. This is why the specification requires that allocators provide a rebind() [ Rebind ] implementation, which allows them to allocate node types. Providing allocators that only return elements of type T is insufficient. In classical C linked list implementations, such as the one provided by the Linux kernel [ LinkedList ], the user provides user allocated nodes at insertion time. With the C++ standard list , nodes are abstracted from the user.
The goal of the pooled_list implementation was to preallocate not only element data, but the node data, and hence all the memory used by the list . This can be achieved by implementing the list from scratch - creating new node types and implementing all the list operations. For the sake of expediency and correctness, I chose a hybrid approach which uses the standard list itself as the underlying structure for the pooled_list .
The pooled_list implementation uses two standard lists: the free and active lists. When created, the free list contains n number of elements and the active list is empty. When elements are inserted, the required nodes are moved from the free list to the active list. When elements are erased, the released nodes are moved from the active list to the free list. This is accomplished with the standard list's splice() operation, which allows lists to be combined in constant time without memory allocation. While this is a rudimentary implementation, it does offer some challenges in correctly providing value semantics.
A naive implementation
The structure of my first attempted implementation was similar to Listing 2.
template<typename T> class pooled_list { ... private: std::list<T> free_list_; std::list<T> active_list_; }; |
Listing 2 |
While it will work for built-in types and PODs [ POD ], it results in the constructors for objects being called when the free list is created, not when elements are inserted. Likewise, destructors for the elements are called when the free list is deleted and not when elements are erased. To solve this problem the underlying lists must contain unstructured data, which leads to the next attempt, Listing 3.
template<typename T> class pooled_list { ... iterator insert(iterator pos, const T& value) { if(!free_list_->empty()){ // use inplace new on the raw data // and splice from the free list to // the active list ... } else{ throw std::bad_alloc(); } return --pos; } ... private: std::list<char[sizeof(T)]> free_list_; std::list<char[sizeof(T)]> active_list_; }; |
Listing 3 |
By using standard lists of arrays of char the construction/destruction of elements can be constrained to insert and erase operations, properly implementing value semantics. Unfortunately this leads to some subtle alignment problems. Internally to the standard list , the char array is a member of the node, as shown in the node code example in Listing 3, and C++ does not guarantee all types T will be properly aligned. Stephen Cleary [ Cleary ] provides further discussion of alignment in his documentation for the boost pool library.
Lists of type std::list<char*> are used by the final implementation which is based on the following from Cleary's discussion:
Any block of memory allocated as an array of characters through operator
For an example, see Listing 4.
template<typename T> class pooled_list { ... pooled_list():free_pool_(pool_size, 0) { std::list<char*>::iterator cur; std::list<char*>::iterator end = free_list_.end(); for(cur = free_list_.begin(); cur != end; ++cur){ *cur = new char[sizeof(T)]; } } ... iterator insert(iterator pos, const T& value) { if(!free_list_->empty()){ // use inplace new on the raw data, // and splice from the free list to the // active list ... } else{ throw std::bad_alloc(); } return --pos; } ... private: std::list<char*> free_list_; std::list<char*> active_list_; }; |
Listing 4 |
The final implementation differs slightly in that the free_list_ is moved to a separate pool class which allows it to be shared by multiple pooled_list s. The alignment workaround does impose one pointer's worth of space overhead per element for each node used in free and active lists. This could be avoided by developing a custom list rather than using the standard list as a base implementation.
List iterators
Since the underlying list is of type std::list<char*> and not std::list<T> , iterators to the active list can not be used directly. The data must be dereferenced and cast to the type T . The boost iterator library is employed to perform the operation. This greatly simplifies the implementation at the cost of a dependency on the boost iterator library (see Listing 5).
template<typename T> class pooled_list { ... // Functor to convert iterator to underlying data type to type T. class reinterpret_raw_data_ptr { public: typedef T& result_type; T& operator()(raw_data_type& raw_data) const { return *reinterpret_cast<T*>(raw_data); } }; typedef char* raw_data_type; typedef std::list<raw_data_type> raw_data_ptr_list; typedef typename std::list<raw_data_type>::iterator raw_data_ptr_list_it; typedef boost::transform_iterator<reinterpret_raw_data_ptr, typename raw_data_ptr_list::iterator, T&, T > iterator; ... }; |
Listing 5 |
Potential enhancements
As noted in the threading section, multiple threads can not concurrently insert or erase elements in lists which share the same pool . I chose to not impose the overhead of thread synchronization by default. I do not recommend sharing pool s across threads, but this could be supported by adding a synchronization policy to the pool with no additional overhead for the default case.
Element data is allocated by the pool using new[] . This might not be sufficient for all use cases (for instance if the user wants to allocate elements from an arena or global pool ). This could be also be addressed by adding an allocation strategy to the pool . It should be noted that because the standard list is used as the underlying data structure, it would be difficult to change the allocation strategy of the node structures. Providing an alternate strategy to allocate list nodes would require a reimplementation of the list structure.
Conclusion
The C++ library specification currently requires developing custom containers to implement allocators with shared state. While it might be possible to develop pool allocators which work with existing standard library implementations, it is not be possible to guarantee that the pool allocators would work correctly across library implementations. As Kevlin says, if you are serious about memory management, you must rely on custom containers. Pool allocation is a proven strategy for many long lived, concurrent applications with high reliability and performance requirements such as network servers. The provided implementation provides a simple solution which successfully leverages the standard library for most operations. The result is a pooled list container that is compatible with any standard compliant standard library implementation.
Acknowledgements
Thanks to C++ guru Thomas Becker for discussion on data alignment and help with const_iterator syntax.
References
[ C++ Active Issues ] Issue 431, C++ Standard Library Active Issues List : http://www.open-std.org/JTC1/SC22/WG21/docs/lwg-active.html#431
[ Cleary ] Stephen Cleary : http://www.boost.org/libs/pool/doc/implementation/alignment.html
[ Halpern, 2005 ] Proposal to the C++ standards committee : http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n1850.pdf
[ Henney ] Dr Dobb's Journal : http://www.ddj.com/dept/cpp/184403779
[ Hinnant ] Hinnant, H.E. : http://www.open-std.org/jtc1/wg21/docs/papers/2004/n1599.html
[ LinkedList ] Classical C linked list : http://isis.poly.edu/kulesh/stuff/src/klist/
[ Microsoft ] Microsoft : http://msdn2.microsoft.com/en-us/library/ms810466.aspx
[ POD ] PODs : http://www.fnal.gov/docs/working-groups/fpcltf/Pkg/ISOcxx/doc/POD.html
[ Rebind ] Rebind documentation : http://msdn2.microsoft.com/en-us/library/5fk3e8ek.aspx