Guaranteeing Alignment
Terminology
Review the concepts document if you are not already familiar with it. Remember that a block is a contiguous section of memory that is partitioned (or segregated) into fixed-size chunks. These chunks are what the user allocates and deallocates.
Overview
Each Pool has a single free list that can extend over a number of memory blocks. Thus, Pool also has a linked list of allocated memory blocks. Each memory block, by default, is allocated using new[], and all memory blocks are freed on destruction. It is the use of new[] that allows us to guarantee alignment.
Proof of Concept: Guaranteeing Alignment
Each block of memory is allocated as a POD type (specifically, an array of characters) through operator new[]. Let POD_size be the number of characters allocated.
Predicate 1: Arrays may not have padding
This follows from the following quote:
[5.3.3/2] (Expressions::Unary expressions::Sizeof) "... When applied to an array, the result is the total number of bytes in the array. This implies that the size of an array of n elements is n times the size of an element."
Therefore, arrays cannot contain padding, though the elements within the arrays may contain padding.
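Predicate 1 can be checked at compile time. The following is a minimal sketch, not part of the library; the type Padded and the helper array_bytes are names invented for this illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical element type (an assumption for this example). On most
// platforms it contains internal padding between `tag` and `value`.
struct Padded {
    char tag;
    double value;
};

// sizeof applied to an array is exactly n * sizeof(element): the array
// itself adds no padding, even though each element may contain some.
static_assert(sizeof(Padded[10]) == 10 * sizeof(Padded),
              "arrays contain no padding of their own");
static_assert(sizeof(char[64]) == 64, "sizeof(char) is 1 by definition");

// Illustrative helper: the size in bytes of an array of N objects of type T.
template <typename T, std::size_t N>
constexpr std::size_t array_bytes() {
    return sizeof(T[N]);
}
```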
Predicate 2: Any block of memory allocated as an array of characters through operator new[] (hereafter referred to as the block) is properly aligned for any object of that size or smaller
This follows from:
- [3.7.3.1/2] (Basic concepts::Storage duration::Dynamic storage duration::Allocation functions) "... The pointer returned shall be suitably aligned so that it can be converted to a pointer of any complete object type and then used to access the object or array in the storage allocated ..."
- [5.3.4/10] (Expressions::Unary expressions::New) "... For arrays of char and unsigned char, the difference between the result of the new-expression and the address returned by the allocation function shall be an integral multiple of the most stringent alignment requirement (3.9) of any object type whose size is no greater than the size of the array being created. [Note: Because allocation functions are assumed to return pointers to storage that is appropriately aligned for objects of any type, this constraint on array allocation overhead permits the common idiom of allocating character arrays into which objects of other types will later be placed. ]"
Consider: imaginary object type Element of a size which is a multiple of some actual object size; assume sizeof(Element) <= POD_size, so that Predicate 2 applies to it
Note that an object of that size can exist. One object of that size is an array of the "actual" objects.
Note that the block is properly aligned for an Element. This directly follows from Predicate 2.
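As an informal runtime check of Predicate 2, the sketch below allocates a character array through new[] and tests whether its address is a multiple of the alignment of a given type. The helper names are assumptions made for this example, and the check is only meaningful for types with ordinary (fundamental) alignment; it illustrates the guarantee rather than proving it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns true if the address p is a multiple of `alignment`.
inline bool is_aligned(const void* p, std::size_t alignment) {
    assert(alignment != 0);
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Allocates a char array just large enough for one T and reports whether
// the resulting address is suitably aligned for T.
template <typename T>
bool char_block_is_aligned_for() {
    char* block = new char[sizeof(T)];
    const bool ok = is_aligned(block, alignof(T));
    delete[] block;
    return ok;
}
```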
Corollary 1: The block is properly aligned for an array of Elements
This follows from Predicates 1 and 2, and the following quote:
[3.9/9] (Basic concepts::Types) "An object type is a (possibly cv-qualified) type that is not a function type, not a reference type, and not a void type." (Specifically, array types are object types.)
Corollary 2: For any pointer p and integer i, if p is properly aligned for the type it points to, then p + i (when well-defined) is properly aligned for that type; in other words, if an array is properly aligned, then each element in that array is properly aligned
There are no quotes from the Standard to directly support this argument, but it fits the common conception of the meaning of "alignment".
Note that the conditions for p + i being well-defined are outlined in [5.7/5]. We do not quote that here, but only make note that it is well-defined if p and p + i both point into or one past the same array.
Let: sizeof(Element) be the least common multiple of sizes of several actual objects (T1, T2, T3, ...)
Let: block be a pointer to the memory block, pe be (Element *) block, and pn be (Tn *) block
Corollary 3: For each integer i such that pe + i is well-defined, and for each n, there exists some integer jn such that pn + jn is well-defined and refers to the same memory address as pe + i
This follows naturally, since the memory block is an array of Elements, and for each n, sizeof(Element) % sizeof(Tn) == 0; thus, the boundary of each element in the array of Elements is also a boundary of each element in each array of Tn.
Theorem: For each integer i, such that pe + i is well-defined, that address (pe + i) is properly aligned for each type Tn
Since pe + i is well-defined, then by Corollary 3, pn + jn is well-defined. It is properly aligned from Predicate 2 and Corollaries 1 and 2.
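The arithmetic behind Corollary 3 and the Theorem can be sketched with plain byte offsets. The function names below are hypothetical and exist only for this illustration; sizes are passed in explicitly so that unusual values (such as 7, 3, and 5, whose least common multiple is 105) can be tried.

```cpp
#include <cassert>
#include <cstddef>

// Byte offset of pe + i from the start of the block.
inline std::size_t element_offset(std::size_t i, std::size_t element_size) {
    return i * element_size;
}

// The index jn from Corollary 3: pn + jn names the same address as pe + i.
// Requires element_size to be a multiple of type_size.
inline std::size_t matching_index(std::size_t i, std::size_t element_size,
                                  std::size_t type_size) {
    assert(type_size != 0 && element_size % type_size == 0);
    return element_offset(i, element_size) / type_size;
}

// True if each of the first `count` Element boundaries is also a boundary
// in an array of objects of size type_size.
inline bool boundaries_coincide(std::size_t count, std::size_t element_size,
                                std::size_t type_size) {
    assert(type_size != 0);
    for (std::size_t i = 0; i < count; ++i) {
        if (element_offset(i, element_size) % type_size != 0) {
            return false;
        }
    }
    return true;
}
```

With sizeof(Element) == 105, the boundary pe + 2 lies at byte 210, which is index 30 in an array of 7-byte objects, index 70 in an array of 3-byte objects, and index 42 in an array of 5-byte objects.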
Use of the Theorem
The proof above covers alignment requirements for cutting chunks out of a block. The implementation uses actual object sizes of:
- The requested object size (requested_size); this is the size of chunks requested by the user
- void * (pointer to void); this is because we interleave our free list through the chunks
- size_type; this is because we store the size of the next block within each memory block
Each block also contains a pointer to the next block; but that is stored as a pointer to void and cast when necessary, to simplify alignment requirements to the three types above.
Therefore, alloc_size is defined to be the lcm of the sizes of the three types above.
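A minimal sketch of that computation follows, assuming size_type is std::size_t and using std::lcm from C++17. The free function alloc_size_for is a name chosen for this example and is not the library's interface.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>  // std::lcm (C++17)

// Assumption for this sketch: size_type is std::size_t.
using size_type = std::size_t;

// Least common multiple of the three sizes the implementation must
// accommodate: the user's chunk size, a free-list pointer, and a size_type.
inline size_type alloc_size_for(size_type requested_size) {
    assert(requested_size != 0);
    const size_type pointer_and_size =
        std::lcm(sizeof(void*), sizeof(size_type));
    return std::lcm(requested_size, pointer_and_size);
}
```

The result is always a multiple of all three sizes, so a chunk of that size can hold the user's object, a free list pointer, or a size_type without disturbing the alignment of the chunk that follows it.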
A Look at the Memory Block
Each memory block consists of three main sections. The first section is the part that chunks are cut out of, and contains the interleaved free list. The second section is the pointer to the next block, and the third section is the size of the next block.
Each of these sections may contain padding as necessary to guarantee alignment for each of the next sections. The size of the first section is number_of_chunks * lcm(requested_size, sizeof(void *), sizeof(size_type)); the size of the second section is lcm(sizeof(void *), sizeof(size_type)); and the size of the third section is sizeof(size_type).
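The three section sizes can be computed as in the following sketch. The struct BlockLayout and the function block_layout are hypothetical names for this illustration; the pointer and size_type sizes are passed as parameters (rather than taken from sizeof) so that any combination of sizes can be explored independently of the platform.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>  // std::lcm (C++17)

// Sizes, in bytes, of the three sections of a memory block.
struct BlockLayout {
    std::size_t chunks_bytes;     // first section: the chunks themselves
    std::size_t next_ptr_bytes;   // second section: pointer to next block
    std::size_t next_size_bytes;  // third section: size of next block

    std::size_t total() const {
        return chunks_bytes + next_ptr_bytes + next_size_bytes;
    }
};

// Applies the formulas from the text to explicit sizes.
inline BlockLayout block_layout(std::size_t number_of_chunks,
                                std::size_t requested_size,
                                std::size_t ptr_size,
                                std::size_t size_type_size) {
    assert(requested_size != 0 && ptr_size != 0 && size_type_size != 0);
    const std::size_t alloc_size =
        std::lcm(std::lcm(requested_size, ptr_size), size_type_size);
    return BlockLayout{number_of_chunks * alloc_size,
                       std::lcm(ptr_size, size_type_size),
                       size_type_size};
}
```

For instance, block_layout(4, 4, 4, 4) gives sections of 16, 4, and 4 bytes, and block_layout(2, 7, 3, 5) gives sections of 210, 15, and 5 bytes.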
Here's an example memory block, where requested_size == sizeof(void *) == sizeof(size_type) == 4 (FLP abbreviates the interleaved free list pointer stored in each free chunk):
| Sections | size_type alignment | void * alignment | requested_size alignment |
|---|---|---|---|
| Memory not belonging to process | | | |
| Chunks section (16 bytes) | (4 bytes) | FLP for Chunk 1 (4 bytes) | Chunk 1 (4 bytes) |
| | (4 bytes) | FLP for Chunk 2 (4 bytes) | Chunk 2 (4 bytes) |
| | (4 bytes) | FLP for Chunk 3 (4 bytes) | Chunk 3 (4 bytes) |
| | (4 bytes) | FLP for Chunk 4 (4 bytes) | Chunk 4 (4 bytes) |
| Pointer to next Block (4 bytes) | (4 bytes) | Pointer to next Block (4 bytes) | |
| Size of next Block (4 bytes) | Size of next Block (4 bytes) | | |
| Memory not belonging to process | | | |
To show a visual example of possible padding, here's an example memory block where requested_size == 8 and sizeof(void *) == sizeof(size_type) == 4:
| Sections | size_type alignment | void * alignment | requested_size alignment |
|---|---|---|---|
| Memory not belonging to process | | | |
| Chunks section (32 bytes) | (4 bytes) | FLP for Chunk 1 (4 bytes) | Chunk 1 (8 bytes) |
| | (4 bytes) | (4 bytes) | |
| | (4 bytes) | FLP for Chunk 2 (4 bytes) | Chunk 2 (8 bytes) |
| | (4 bytes) | (4 bytes) | |
| | (4 bytes) | FLP for Chunk 3 (4 bytes) | Chunk 3 (8 bytes) |
| | (4 bytes) | (4 bytes) | |
| | (4 bytes) | FLP for Chunk 4 (4 bytes) | Chunk 4 (8 bytes) |
| | (4 bytes) | (4 bytes) | |
| Pointer to next Block (4 bytes) | (4 bytes) | Pointer to next Block (4 bytes) | |
| Size of next Block (4 bytes) | Size of next Block (4 bytes) | | |
| Memory not belonging to process | | | |
Finally, here is a convoluted example where the requested_size is 7, sizeof(void *) == 3, and sizeof(size_type) == 5, showing how the least common multiple guarantees alignment requirements even in the oddest of circumstances:
| Sections | size_type alignment | void * alignment | requested_size alignment |
|---|---|---|---|
| Memory not belonging to process | | | |
| Chunks section (210 bytes) | (5 bytes) | Interleaved free list pointer for Chunk 1 (15 bytes; 3 used) | Chunk 1 (105 bytes; 7 used) |
| | (5 bytes) | | |
| | (5 bytes) | | |
| | (5 bytes) | (15 bytes) | |
| | (5 bytes) | | |
| | (5 bytes) | | |
| | (5 bytes) | (15 bytes) | |
| | (5 bytes) | | |
| | (5 bytes) | | |
| | (5 bytes) | (15 bytes) | |
| | (5 bytes) | | |
| | (5 bytes) | | |
| | (5 bytes) | (15 bytes) | |
| | (5 bytes) | | |
| | (5 bytes) | | |
| | (5 bytes) | (15 bytes) | |
| | (5 bytes) | | |
| | (5 bytes) | | |
| | (5 bytes) | (15 bytes) | |
| | (5 bytes) | | |
| | (5 bytes) | | |
| | (5 bytes) | Interleaved free list pointer for Chunk 2 (15 bytes; 3 used) | Chunk 2 (105 bytes; 7 used) |
| | (5 bytes) | | |
| | (5 bytes) | | |
| | (5 bytes) | (15 bytes) | |
| | (5 bytes) | | |
| | (5 bytes) | | |
| | (5 bytes) | (15 bytes) | |
| | (5 bytes) | | |
| | (5 bytes) | | |
| | (5 bytes) | (15 bytes) | |
| | (5 bytes) | | |
| | (5 bytes) | | |
| | (5 bytes) | (15 bytes) | |
| | (5 bytes) | | |
| | (5 bytes) | | |
| | (5 bytes) | (15 bytes) | |
| | (5 bytes) | | |
| | (5 bytes) | | |
| | (5 bytes) | (15 bytes) | |
| | (5 bytes) | | |
| | (5 bytes) | | |
| Pointer to next Block (15 bytes; 3 used) | (5 bytes) | Pointer to next Block (15 bytes; 3 used) | |
| | (5 bytes) | | |
| | (5 bytes) | | |
| Size of next Block (5 bytes; 5 used) | Size of next Block (5 bytes; 5 used) | | |
| Memory not belonging to process | | | |
How Contiguous Chunks are Handled
The theorem above guarantees all alignment requirements for allocating chunks and also implementation details such as the interleaved free list. However, it does so by adding padding when necessary; therefore, we have to treat allocations of contiguous chunks in a different way.
Using array arguments similar to the above, we can translate any request for contiguous memory for n objects of requested_size into a request for m contiguous chunks. m is simply ceil(n * requested_size / alloc_size), where alloc_size is the actual size of the chunks. To illustrate:
Here's an example memory block, where requested_size == 1 and sizeof(void *) == sizeof(size_type) == 4:
| Sections | size_type alignment | void * alignment | requested_size alignment |
|---|---|---|---|
| Memory not belonging to process | | | |
| Chunks section (16 bytes) | (4 bytes) | FLP to Chunk 2 (4 bytes) | Chunk 1 (4 bytes) |
| | (4 bytes) | FLP to Chunk 3 (4 bytes) | Chunk 2 (4 bytes) |
| | (4 bytes) | FLP to Chunk 4 (4 bytes) | Chunk 3 (4 bytes) |
| | (4 bytes) | FLP to end-of-list (4 bytes) | Chunk 4 (4 bytes) |
| Pointer to next Block (4 bytes) | (4 bytes) | Ptr to end-of-list (4 bytes) | |
| Size of next Block (4 bytes) | 0 (4 bytes) | | |
| Memory not belonging to process | | | |

After the user requests 7 contiguous elements of requested_size (so m = ceil(7 * 1 / 4) = 2 chunks), the first two chunks are handed out together:

| Sections | size_type alignment | void * alignment | requested_size alignment |
|---|---|---|---|
| Memory not belonging to process | | | |
| Chunks section (16 bytes) | (4 bytes) | (4 bytes) | 4 bytes in use by program |
| | (4 bytes) | (4 bytes) | 3 bytes in use by program (1 byte unused) |
| | (4 bytes) | FLP to Chunk 4 (4 bytes) | Chunk 3 (4 bytes) |
| | (4 bytes) | FLP to end-of-list (4 bytes) | Chunk 4 (4 bytes) |
| Pointer to next Block (4 bytes) | (4 bytes) | Ptr to end-of-list (4 bytes) | |
| Size of next Block (4 bytes) | 0 (4 bytes) | | |
| Memory not belonging to process | | | |
Then, when the user deallocates the contiguous memory, we can split it up into chunks again.
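The translation from n objects to m chunks is a ceiling division. A minimal sketch, with chunks_needed as a name chosen for this example:

```cpp
#include <cassert>
#include <cstddef>

// Number of chunks of alloc_size bytes needed to hold n contiguous objects
// of requested_size bytes each: ceil(n * requested_size / alloc_size).
// This sketch does not guard against overflow in n * requested_size.
inline std::size_t chunks_needed(std::size_t n, std::size_t requested_size,
                                 std::size_t alloc_size) {
    assert(alloc_size != 0);
    const std::size_t total_bytes = n * requested_size;
    // Integer ceiling division without adding to total_bytes first.
    return total_bytes / alloc_size + (total_bytes % alloc_size != 0 ? 1 : 0);
}
```

With requested_size == 1 and alloc_size == 4, a request for 7 objects needs 2 chunks, a request for 8 also needs 2, and a request for 9 needs 3.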
Note that the implementation provided for allocating contiguous chunks uses a linear rather than a quadratic algorithm. This means that it may not find contiguous free chunks if the free list is not ordered. Thus, it is recommended to always use an ordered free list when dealing with contiguous allocation of chunks. (In the example above, if Chunk 1 pointed to Chunk 3, which pointed to Chunk 2, which pointed to Chunk 4, instead of being in order, the contiguous allocation algorithm would have failed to find any of the contiguous chunks.)
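To see why ordering matters, consider the following simplified model of a single linear pass. It is not the library's implementation: the free list is represented as a vector of chunk numbers in list order (an assumption made for this sketch), and a run of m chunks is found only if consecutive list entries are also consecutive in memory.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Walks the free list once, in list order, looking for m entries in a row
// whose chunk numbers are consecutive. Returns the first chunk number of
// the run, or std::nullopt if the single pass finds none.
inline std::optional<std::size_t> find_contiguous(
        const std::vector<std::size_t>& free_list, std::size_t m) {
    if (m == 0 || free_list.empty()) {
        return std::nullopt;
    }
    std::size_t run_start = 0;  // position in free_list where the run begins
    for (std::size_t pos = 0; pos < free_list.size(); ++pos) {
        // The run breaks when this chunk does not directly follow the
        // previous list entry in memory.
        if (pos > 0 && free_list[pos] != free_list[pos - 1] + 1) {
            run_start = pos;
        }
        if (pos - run_start + 1 == m) {
            return free_list[run_start];
        }
    }
    return std::nullopt;
}
```

With the ordered list {1, 2, 3, 4}, a request for 2 chunks succeeds starting at Chunk 1. With the unordered list {1, 3, 2, 4}, the same request fails in this model, even though all four chunks are free and adjacent in memory.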
Copyright © 2000, 2001 Stephen Cleary (scleary AT jerviswebb DOT com)
This file can be redistributed and/or modified under the terms found in copyright.html
This software and its documentation is provided "as is" without express or implied warranty, and with no claim as to its suitability for any purpose.
