Boost C++ Libraries

...one of the most highly regarded and expertly designed C++ library projects in the world. Herb Sutter and Andrei Alexandrescu, C++ Coding Standards

PrevUpHomeNext

Pool in More Depth

Basic ideas behind pooling
Simple Segregated Storage
Guaranteeing Alignment - How we guarantee alignment portably.
How Contiguous Chunks are Handled
Simple Segregated Storage (Not for the faint of heart - Embedded programmers only!)
The UserAllocator Concept

Dynamic memory allocation has been a fundamental part of most computer systems since roughly 1960... 1

Everyone uses dynamic memory allocation. If you have ever called malloc or new, then you have used dynamic memory allocation. Most programmers have a tendency to treat the heap as a magic bag": we ask it for memory, and it magically creates some for us. Sometimes we run into problems because the heap is not magic.

The heap is limited. Even on large systems (i.e., not embedded) with huge amounts of virtual memory available, there is a limit. Everyone is aware of the physical limit, but there is a more subtle, 'virtual' limit, that limit at which your program (or the entire system) slows down due to the use of virtual memory. This virtual limit is much closer to your program than the physical limit, especially if you are running on a multitasking system. Therefore, when running on a large system, it is considered nice to make your program use as few resources as necessary, and release them as soon as possible. When using an embedded system, programmers usually have no memory to waste.

The heap is complicated. It has to satisfy any type of memory request, for any size, and do it fast. The common approaches to memory management have to do with splitting the memory up into portions, and keeping them ordered by size in some sort of a tree or list structure. Add in other factors, such as locality and estimating lifetime, and heaps quickly become very complicated. So complicated, in fact, that there is no known perfect answer to the problem of how to do dynamic memory allocation. The diagrams below illustrate how most common memory managers work: for each chunk of memory, it uses part of that memory to maintain its internal tree or list structure. Even when a chunk is malloc'ed out to a program, the memory manager must save some information in it - usually just its size. Then, when the block is free'd, the memory manager can easily tell how large it is.

Dynamic memory allocation is often inefficient

Because of the complication of dynamic memory allocation, it is often inefficient in terms of time and/or space. Most memory allocation algorithms store some form of information with each memory block, either the block size or some relational information, such as its position in the internal tree or list structure. It is common for such header fields to take up one machine word in a block that is being used by the program. The obvious disadvantage, then, is when small objects are dynamically allocated. For example, if ints were dynamically allocated, then automatically the algorithm will reserve space for the header fields as well, and we end up with a 50% waste of memory. Of course, this is a worst-case scenario. However, more modern programs are making use of small objects on the heap; and that is making this problem more and more apparent. Wilson et. al. state that an average-case memory overhead is about ten to twenty percent2. This memory overhead will grow higher as more programs use more smaller objects. It is this memory overhead that brings programs closer to the virtual limit.

In larger systems, the memory overhead is not as big of a problem (compared to the amount of time it would take to work around it), and thus is often ignored. However, there are situations where many allocations and/or deallocations of smaller objects are taking place as part of a time-critical algorithm, and in these situations, the system-supplied memory allocator is often too slow.

Simple segregated storage addresses both of these issues. Almost all memory overhead is done away with, and all allocations can take place in a small amount of (amortized) constant time. However, this is done at the loss of generality; simple segregated storage only can allocate memory chunks of a single size.

Simple Segregated Storage is the basic idea behind the Boost Pool library. Simple Segregated Storage is the simplest, and probably the fastest, memory allocation/deallocation algorithm. It begins by partitioning a memory block into fixed-size chunks. Where the block comes from is not important until implementation time. A Pool is some object that uses Simple Segregated Storage in this fashion. To illustrate:

Each of the chunks in any given block are always the same size. This is the fundamental restriction of Simple Segregated Storage: you cannot ask for chunks of different sizes. For example, you cannot ask a Pool of integers for a character, or a Pool of characters for an integer (assuming that characters and integers are different sizes).

Simple Segregated Storage works by interleaving a free list within the unused chunks. For example:

By interleaving the free list inside the chunks, each Simple Segregated Storage only has the overhead of a single pointer (the pointer to the first element in the list). It has no memory overhead for chunks that are in use by the process.

Simple Segregated Storage is also extremely fast. In the simplest case, memory allocation is merely removing the first chunk from the free list, a O(1) operation. In the case where the free list is empty, another block may have to be acquired and partitioned, which would result in an amortized O(1) time. Memory deallocation may be as simple as adding that chunk to the front of the free list, a O(1) operation. However, more complicated uses of Simple Segregated Storage may require a sorted free list, which makes deallocation O(N).

Simple Segregated Storage gives faster execution and less memory overhead than a system-supplied allocator, but at the loss of generality. A good place to use a Pool is in situations where many (noncontiguous) small objects may be allocated on the heap, or if allocation and deallocation of the same-sized objects happens repeatedly.

Terminology

Review the concepts section if you are not already familiar with it. Remember that block is a contiguous section of memory, which is partitioned or segregated into fixed-size chunks. These chunks are what are allocated and deallocated by the user.

Overview

Each Pool has a single free list that can extend over a number of memory blocks. Thus, Pool also has a linked list of allocated memory blocks. Each memory block, by default, is allocated using new[], and all memory blocks are freed on destruction. It is the use of new[] that allows us to guarantee alignment.

Proof of Concept: Guaranteeing Alignment

Each block of memory is allocated as a POD type (specifically, an array of characters) through operator new[]. Let POD_size be the number of characters allocated.

Predicate 1: Arrays may not have padding

This follows from the following quote:

[5.3.3/2] (Expressions::Unary expressions::Sizeof) ... When applied to an array, the result is the total number of bytes in the array. This implies that the size of an array of n elements is n times the size of an element.

Therefore, arrays cannot contain padding, though the elements within the arrays may contain padding.

Predicate 2: Any block of memory allocated as an array of characters through operator new[] (hereafter referred to as the block) is properly aligned for any object of that size or smaller

This follows from:

  • [3.7.3.1/2] (Basic concepts::Storage duration::Dynamic storage duration::Allocation functions) "... The pointer returned shall be suitably aligned so that it can be converted to a pointer of any complete object type and then used to access the object or array in the storage allocated ..."
  • [5.3.4/10] (Expressions::Unary expressions::New) "... For arrays of char and unsigned char, the difference between the result of the new-expression and the address returned by the allocation function shall be an integral multiple of the most stringent alignment requirement (3.9) of any object type whose size is no greater than the size of the array being created. [Note: Because allocation functions are assumed to return pointers to storage that is appropriately aligned for objects of any type, this constraint on array allocation overhead permits the common idiom of allocating character arrays into which objects of other types will later be placed."
Consider: imaginary object type Element of a size which is a multiple of some actual object size; assume sizeof(Element) > POD_size

Note that an object of that size can exist. One object of that size is an array of the "actual" objects.

Note that the block is properly aligned for an Element. This directly follows from Predicate 2.

Corollary 1: The block is properly aligned for an array of Elements

This follows from Predicates 1 and 2, and the following quote:

[3.9/9] (Basic concepts::Types) "An object type is a (possibly cv-qualified) type that is not a function type, not a reference type, and not a void type."

(Specifically, array types are object types.)

Corollary 2: For any pointer p and integer i, if p is properly aligned for the type it points to, then p + i (when well-defined) is properly aligned for that type; in other words, if an array is properly aligned, then each element in that array is properly aligned

There are no quotes from the Standard to directly support this argument, but it fits the common conception of the meaning of "alignment".

Note that the conditions for p + i being well-defined are outlined in [5.7/5]. We do not quote that here, but only make note that it is well-defined if p and p + i both point into or one past the same array.

Let: sizeof(Element) be the least common multiple of sizes of several actual objects (T1, T2, T3, ...)
Let: block be a pointer to the memory block, pe be (Element *) block, and pn be (Tn *) block
Corollary 3: For each integer i, such that pe + i is well-defined, then for each n, there exists some integer jn such that pn + jn is well-defined and refers to the same memory address as pe + i

This follows naturally, since the memory block is an array of Elements, and for each n, sizeof(Element) % sizeof(Tn) == 0; thus, the boundary of each element in the array of Elements is also a boundary of each element in each array of Tn.

Theorem: For each integer i, such that pe + i is well-defined, that address (pe + i) is properly aligned for each type Tn

Since pe + i is well-defined, then by Corollary 3, pn + jn is well-defined. It is properly aligned from Predicate 2 and Corollaries 1 and 2.

Use of the Theorem

The proof above covers alignment requirements for cutting chunks out of a block. The implementation uses actual object sizes of:

  • The requested object size (requested_size); this is the size of chunks requested by the user
  • void* (pointer to void); this is because we interleave our free list through the chunks
  • size_type; this is because we store the size of the next block within each memory block

Each block also contains a pointer to the next block; but that is stored as a pointer to void and cast when necessary, to simplify alignment requirements to the three types above.

Therefore, alloc_size is defined to be the largest of the sizes above, rounded up to be a multiple of all three sizes. This guarantees alignment provided all alignments are powers of two: something that appears to be true on all known platforms.

A Look at the Memory Block

Each memory block consists of three main sections. The first section is the part that chunks are cut out of, and contains the interleaved free list. The second section is the pointer to the next block, and the third section is the size of the next block.

Each of these sections may contain padding as necessary to guarantee alignment for each of the next sections. The size of the first section is number_of_chunks * lcm(requested_size, sizeof(void *), sizeof(size_type)); the size of the second section is lcm(sizeof(void *), sizeof(size_type); and the size of the third section is sizeof(size_type).

Here's an example memory block, where requested_size == sizeof(void *) == sizeof(size_type) == 4:

To show a visual example of possible padding, here's an example memory block where requested_size == 8 and sizeof(void *) == sizeof(size_type) == 4

The theorem above guarantees all alignment requirements for allocating chunks and also implementation details such as the interleaved free list. However, it does so by adding padding when necessary; therefore, we have to treat allocations of contiguous chunks in a different way.

Using array arguments similar to the above, we can translate any request for contiguous memory for n objects of requested_size into a request for m contiguous chunks. m is simply ceil(n * requested_size / alloc_size), where alloc_size is the actual size of the chunks.

To illustrate:

Here's an example memory block, where requested_size == 1 and sizeof(void *) == sizeof(size_type) == 4:

Then, when the user deallocates the contiguous memory, we can split it up into chunks again.

Note that the implementation provided for allocating contiguous chunks uses a linear instead of quadratic algorithm. This means that it may not find contiguous free chunks if the free list is not ordered. Thus, it is recommended to always use an ordered free list when dealing with contiguous allocation of chunks. (In the example above, if Chunk 1 pointed to Chunk 3 pointed to Chunk 2 pointed to Chunk 4, instead of being in order, the contiguous allocation algorithm would have failed to find any of the contiguous chunks).

Introduction

simple_segregated_storage.hpp provides a template class simple_segregated_storage that controls access to a free list of memory chunks.

Note that this is a very simple class, with unchecked preconditions on almost all its functions. It is intended to be the fastest and smallest possible quick memory allocator for example, something to use in embedded systems. This class delegates many difficult preconditions to the user (especially alignment issues). For more general usage, see the other Pool Interfaces.

Synopsis
template <typename SizeType = std::size_t>
class simple_segregated_storage
{
  private:
    simple_segregated_storage(const simple_segregated_storage &);
    void operator=(const simple_segregated_storage &);

  public:
    typedef SizeType size_type;

    simple_segregated_storage();
    ~simple_segregated_storage();

    static void * segregate(void * block,
        size_type nsz, size_type npartition_sz,
        void * end = 0);
    void add_block(void * block,
        size_type nsz, size_type npartition_sz);
    void add_ordered_block(void * block,
        size_type nsz, size_type npartition_sz);

    bool empty() const;

    void * malloc();
    void free(void * chunk);
    void ordered_free(void * chunk);
    void * malloc_n(size_type n, size_type partition_sz);
    void free_n(void * chunks, size_type n,
        size_type partition_sz);
    void ordered_free_n(void * chunks, size_type n,
        size_type partition_sz);
};
Semantics

An object of type simple_segregated_storage<SizeType> is empty if its free list is empty. If it is not empty, then it is ordered if its free list is ordered. A free list is ordered if repeated calls to malloc() will result in a constantly-increasing sequence of values, as determined by std::less<void *>. A member function is order-preserving if the free-list maintains its order orientation (that is, an ordered free list is still ordered after the member function call).

Table 1. Symbol Table

Symbol

Meaning

Store

simple_segregated_storage<SizeType>

t

value of type Store

u

value of type const Store

block, chunk, end

values of type void *

partition_sz, sz, n

values of type Store::size_type


Table 2. Template Parameters

Parameter

Default

Requirements

SizeType

std::size_t

An unsigned integral type


Table 3. Typedefs

Symbol

Type

size_type

SizeType


Table 4. Constructors, Destructors, and State

Expression

Return Type

Post-Condition

Notes

Store()

not used

empty()

Constructs a new Store

(&t)->~Store()

not used

Destructs the Store

u.empty()

bool

Returns true if u is empty. Order-preserving.


Table 5. Segregation

Expression

Return Type

Pre-Condition

Post-Condition

Semantic Equivalence

Notes

Store::segregate(block, sz, partition_sz, end)

void *

partition_sz >= sizeof(void *) partition_sz = sizeof(void *) * i, for some integer i sz >= partition_sz block is properly aligned for an array of objects of size partition_sz block is properly aligned for an array of void *

Interleaves a free list through the memory block specified by block of size sz bytes, partitioning it into as many partition_sz-sized chunks as possible. The last chunk is set to point to end, and a pointer to the first chunck is returned (this is always equal to block). This interleaved free list is ordered. O(sz).

Store::segregate(block, sz, partition_sz)

void *

Same as above

Store::segregate(block, sz, partition_sz, 0)

t.add_block(block, sz, partition_sz)

void

Same as above

!t.empty()

Segregates the memory block specified by block of size sz bytes into partition_sz-sized chunks, and adds that free list to its own. If t was empty before this call, then it is ordered after this call. O(sz).

t.add_ordered_block(block, sz, partition_sz)

void

Same as above

!t.empty()

Segregates the memory block specified by block of size sz bytes into partition_sz-sized chunks, and merges that free list into its own. Order-preserving. O(sz).


Table 6. Allocation and Deallocation

Expression

Return Type

Pre-Condition

Post-Condition

Semantic Equivalence

Notes

t.malloc()

void *

!t.empty()

Takes the first available chunk from the free list and returns it. Order-preserving. O(1).

t.free(chunk)

void

chunk was previously returned from a call to t.malloc()

!t.empty()

Places chunk back on the free list. Note that chunk may not be 0. O(1).

t.ordered_free(chunk)

void

Same as above

!t.empty()

Places chunk back on the free list. Note that chunk may not be 0. Order-preserving. O(N) with respect to the size of the free list.

t.malloc_n(n, partition_sz)

void *

Attempts to find a contiguous sequence of n partition_sz-sized chunks. If found, removes them all from the free list and returns a pointer to the first. If not found, returns 0. It is strongly recommended (but not required) that the free list be ordered, as this algorithm will fail to find a contiguous sequence unless it is contiguous in the free list as well. Order-preserving. O(N) with respect to the size of the free list.

t.free_n(chunk, n, partition_sz)

void

chunk was previously returned from a call to t.malloc_n(n, partition_sz)

!t.empty()

t.add_block(chunk, n * partition_sz, partition_sz)

Assumes that chunk actually refers to a block of chunks spanning n * partition_sz bytes; segregates and adds in that block. Note that chunk may not be 0. O(n).

t.ordered_free_n(chunk, n, partition_sz)

void

same as above

same as above

t.add_ordered_block(chunk, n * partition_sz, partition_sz)

Same as above, except it merges in the free list. Order-preserving. O(N + n) where N is the size of the free list.


Pool objects need to request memory blocks from the system, which the Pool then splits into chunks to allocate to the user. By specifying a UserAllocator template parameter to various Pool interfaces, users can control how those system memory blocks are allocated.

In the following table, UserAllocator is a User Allocator type, block is a value of type char *, and n is a value of type UserAllocator::size_type

Table 7. UserAllocator Requirements

Expression

Result

Description

UserAllocator::size_type

An unsigned integral type that can represent the size of the largest object to be allocated.

UserAllocator::difference_type

A signed integral type that can represent the difference of any two pointers.

UserAllocator::malloc(n)

char *

Attempts to allocate n bytes from the system. Returns 0 if out-of-memory.

UserAllocator::free(block)

void

block must have been previously returned from a call to UserAllocator::malloc.


There are two UserAllocator classes provided in this library: default_user_allocator_new_delete and default_user_allocator_malloc_free, both in pool.hpp. The default value for the template parameter UserAllocator is always default_user_allocator_new_delete.


PrevUpHomeNext