Memory allocation algorithms

simple_seq_fit: A simple shared memory management algorithm

This algorithm is a variation of sequential fit that uses a singly linked list of free memory buffers. It is based on the article about shared memory titled "Taming Shared Memory". The algorithm works as follows:

The shared memory is divided into blocks of free shared memory, each with some control data and several bytes of memory ready to be used. The control data contains a pointer (in our case an offset_ptr) to the next free block and the size of the block. The allocator consists of a singly linked list of free blocks, ordered by address. The last block always points to the first block:

simple_seq_fit memory layout:

    main      extra  allocated  free_block_1     allocated   free_block_2    allocated   free_block_3
    header    header  block       ctrl     usr     block      ctrl     usr     block      ctrl     usr
   _________  _____  _________  _______________  _________  _______________  _________  _______________
  |         ||     ||         ||         |     ||         ||         |     ||         ||         |     |
  |free|ctrl||extra||         ||next|size| mem ||         ||next|size| mem ||         ||next|size| mem |
  |_________||_____||_________||_________|_____||_________||_________|_____||_________||_________|_____|
      |                         | |                         |  |                       | |
      |_>_>_>_>_>_>_>_>_>_>_>_>_| |_>_>_>_>_>_>_>_>_>_>_>_>_|  |_>_>_>_>_>_>_>_>_>_>_>_| |
                                |                                                        |
                                |_<_<_<_<_<_<_<_<_<_<_<_<_<_<_<_<_<_<_<_<_<_<_<_<_<_<_<__|

When a user requests N bytes of memory, the allocator traverses the free block list looking for a block large enough. If the "mem" part of the block has exactly the requested size, we erase the block from the list and return a pointer to its "mem" part. If the "mem" part is bigger than needed, we split the block in two: one block of the requested size and another holding the remainder. We then erase the exactly sized block from the list and give it to the user.
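The allocation step can be sketched as follows. This is a minimal model under stated assumptions, not the library's implementation: the segment is a vector of fixed-size units, indices stand in for offset_ptr, unit 0 plays the role of the list root in the main header, and the names Ctrl, SeqFitSketch, allocate and free_units are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: a block's control data (next, size) lives in its first unit.
struct Ctrl {
    std::size_t next;  // index of the next free block (0 = back to the root)
    std::size_t size;  // block size in units, control unit included
};

const std::size_t npos = static_cast<std::size_t>(-1);

class SeqFitSketch {
public:
    explicit SeqFitSketch(std::size_t units) : mem_(units) {
        assert(units >= 3);
        mem_[0].next = 1;  mem_[0].size = 0;          // root (assumed to sit in the main header)
        mem_[1].next = 0;  mem_[1].size = units - 1;  // one big free block; the list is circular
    }

    // Returns the index of the first user unit, or npos if no free block fits.
    std::size_t allocate(std::size_t user_units) {
        const std::size_t need = user_units + 1;  // user units plus one control unit
        std::size_t prev = 0;
        for (std::size_t cur = mem_[0].next; cur != 0; prev = cur, cur = mem_[cur].next) {
            const std::size_t have = mem_[cur].size;
            if (have < need) continue;                 // too small, keep walking
            if (have == need) {
                mem_[prev].next = mem_[cur].next;      // exact fit: unlink the block
            } else {
                const std::size_t rest = cur + need;   // split: the tail stays in the list
                mem_[rest].next = mem_[cur].next;
                mem_[rest].size = have - need;
                mem_[prev].next = rest;
                mem_[cur].size = need;
            }
            return cur + 1;                            // user memory starts after the control unit
        }
        return npos;
    }

    // Sum of the sizes of all free blocks, in units.
    std::size_t free_units() const {
        std::size_t total = 0;
        for (std::size_t cur = mem_[0].next; cur != 0; cur = mem_[cur].next)
            total += mem_[cur].size;
        return total;
    }

private:
    std::vector<Ctrl> mem_;
};
```

With a 16-unit segment, requesting 3 user units splits the single 15-unit free block into a 4-unit allocated block and an 11-unit free remainder.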

When the user deallocates a block, we traverse the list (remember that it is ordered by address) and search for the block's position according to its address. Once found, we try to merge the block with adjacent free blocks if possible.
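Deallocation with merging might look like the sketch below. It assumes the same kind of simplified model: a vector of units, indices instead of offset_ptr, and unit 0 as a zero-sized list root that is never merged. Ctrl and deallocate are hypothetical names, and an allocated block is assumed to keep its size in its control unit.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical control data stored in the first unit of every block.
struct Ctrl {
    std::size_t next;  // index of the next free block (0 = back to the root)
    std::size_t size;  // block size in units, control unit included
};

// Returns the block whose user memory starts at user_index to the
// address-ordered circular free list rooted at mem[0].
void deallocate(std::vector<Ctrl>& mem, std::size_t user_index) {
    assert(user_index >= 2 && user_index < mem.size());
    const std::size_t blk = user_index - 1;  // control unit precedes user memory

    // Walk the ordered list to find the last free block located before blk.
    std::size_t prev = 0;
    while (mem[prev].next != 0 && mem[prev].next < blk)
        prev = mem[prev].next;
    const std::size_t next = mem[prev].next;  // first free block after blk, or 0

    // Merge with the following free block if the two are contiguous.
    if (next != 0 && blk + mem[blk].size == next) {
        mem[blk].size += mem[next].size;
        mem[blk].next = mem[next].next;
    } else {
        mem[blk].next = next;
    }

    // Merge with the preceding free block if contiguous (never with the root).
    if (prev != 0 && prev + mem[prev].size == blk) {
        mem[prev].size += mem[blk].size;
        mem[prev].next = mem[blk].next;
    } else {
        mem[prev].next = blk;
    }
}
```

Because the list is ordered by address, both neighbours of the freed block are known once its position is found, so each merge is a constant-time size and link update after the linear search.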

To ease implementation, the size of a free memory block is measured in multiples of "basic_size" bytes. The basic size is the size of the control block aligned to the machine's most restrictive alignment.

This algorithm has a low size overhead and is suitable for simple allocation schemes. It should only be used when size is a major concern, because its performance suffers when the memory is fragmented. Allocation and deallocation take linear time, so when the number of allocations is high, the user should choose a more performance-friendly algorithm.

On most 32-bit systems with 8-byte alignment, "basic_size" is 8 bytes. This means that an allocation request of 1 byte leads to the creation of a 16-byte block, of which 8 bytes are available to the user. An allocation of 8 bytes leads to the same 16-byte block.
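The arithmetic behind these figures can be checked with a small sketch. The helper names round_up and seq_fit_block_bytes are hypothetical, and the default values (8-byte control data, 8-byte alignment) are the 32-bit figures assumed in the text, not values queried from the library.

```cpp
#include <cstddef>

// Rounds n up to the next multiple of `multiple`.
constexpr std::size_t round_up(std::size_t n, std::size_t multiple) {
    return (n + multiple - 1) / multiple * multiple;
}

// Total bytes taken from the segment for a request, assuming one control
// block of basic_size bytes followed by user memory rounded up to basic_size.
constexpr std::size_t seq_fit_block_bytes(std::size_t request,
                                          std::size_t ctrl_bytes = 8,
                                          std::size_t alignment = 8) {
    return round_up(ctrl_bytes, alignment) +
           round_up(request, round_up(ctrl_bytes, alignment));
}

static_assert(seq_fit_block_bytes(1) == 16, "1-byte request uses a 16-byte block");
static_assert(seq_fit_block_bytes(8) == 16, "8-byte request uses the same block");
static_assert(seq_fit_block_bytes(9) == 24, "9 bytes need one more basic_size unit");
```

Under these assumptions the per-block overhead is always one basic_size unit, plus whatever padding is needed to round the request up.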

rbtree_best_fit: Best-fit logarithmic-time complexity allocation

This is an advanced algorithm that uses a red-black tree to sort the free portions of the memory segment by size, which allows allocation with logarithmic complexity. In addition, a doubly linked list of all portions of memory (free and allocated) is maintained to allow constant-time access to the previous and next blocks when merging.
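The best-fit lookup can be illustrated with an ordered container from the standard library. This is only a sketch of the idea: std::multimap stands in for the intrusive red-black tree and guarantees logarithmic lower_bound, offsets are plain integers, and the names FreeBySize and best_fit_allocate are hypothetical. The doubly linked list and block headers are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Free blocks keyed by size; the mapped value is the block's offset in the segment.
typedef std::multimap<std::size_t, std::size_t> FreeBySize;

const std::size_t npos = static_cast<std::size_t>(-1);

// Takes the smallest free block that can hold `request` bytes and returns its
// offset, or npos if none fits. Any remainder goes back into the tree.
std::size_t best_fit_allocate(FreeBySize& free_blocks, std::size_t request) {
    assert(request > 0);
    FreeBySize::iterator it = free_blocks.lower_bound(request);  // O(log n)
    if (it == free_blocks.end())
        return npos;

    const std::size_t size = it->first;
    const std::size_t offset = it->second;
    free_blocks.erase(it);

    if (size > request)  // split: re-insert the unused tail as a smaller free block
        free_blocks.insert(FreeBySize::value_type(size - request, offset + request));
    return offset;
}
```

With free blocks of 64, 16 and 32 bytes, a 20-byte request is served from the 32-byte block because it is the smallest one that fits, leaving a 12-byte free remainder.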

The data used to build the red-black tree of free nodes is overwritten by the user, since it is no longer needed once the memory is allocated. This keeps the memory size overhead down to that of the doubly linked list, which is small (two pointers). Basically, this is the scheme:

rbtree_best_fit memory layout:

   main            allocated block   free block                        allocated block  free block
   header
  _______________  _______________  _________________________________  _______________  _________________________________
 |               ||         |     ||         |                 |     ||         |     ||         |                 |     |
 |  main header  ||next|prev| mem ||next|prev|left|right|parent| mem ||next|prev| mem ||next|prev|left|right|parent| mem |
 |_______________||_________|_____||_________|_________________|_____||_________|_____||_________|_________________|_____|

This allocation algorithm is fast and scales well with big shared memory segments and large numbers of allocations. Forming a block requires a minimum memory size: the sum of the doubly linked list and red-black tree control data. The size of a block is measured in multiples of the most restrictive alignment value.

On most 32-bit systems with 8-byte alignment, the minimum size of a block is 24 bytes. When a block is allocated, the control data related to the red-black tree is overwritten by the user (because it is only needed for free blocks).

On those systems, a 1-byte allocation request means that:

24 bytes of memory from the segment are used to form a block.
16 of those bytes are usable by the user.
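These numbers can be reproduced with a short calculation. The sketch assumes the 32-bit figures given in the text: 8-byte alignment, a 24-byte minimum block, and 8 bytes of list data (two 4-byte pointers) that remain in place while the block is allocated. The function names are hypothetical and do not come from the library.

```cpp
#include <cstddef>

// Rounds n up to the next multiple of `multiple`.
constexpr std::size_t round_up(std::size_t n, std::size_t multiple) {
    return (n + multiple - 1) / multiple * multiple;
}

// Bytes taken from the segment: list data plus the request, rounded up to the
// alignment, but never less than the minimum block size.
constexpr std::size_t best_fit_block_bytes(std::size_t request,
                                           std::size_t list_bytes = 8,
                                           std::size_t min_block = 24,
                                           std::size_t alignment = 8) {
    return round_up(list_bytes + request, alignment) < min_block
               ? min_block
               : round_up(list_bytes + request, alignment);
}

// Bytes the user can actually write to: everything except the list data.
constexpr std::size_t best_fit_usable_bytes(std::size_t request,
                                            std::size_t list_bytes = 8) {
    return best_fit_block_bytes(request, list_bytes) - list_bytes;
}

static_assert(best_fit_block_bytes(1) == 24, "1-byte request forms a 24-byte block");
static_assert(best_fit_usable_bytes(1) == 16, "16 of those bytes are usable");
static_assert(best_fit_block_bytes(17) == 32, "17 bytes need the next multiple of 8");
```

Under these assumptions, any request of up to 16 bytes forms the same minimum-size 24-byte block.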

For really small allocations (<= 8 bytes), this algorithm wastes more memory than the simple sequential fit algorithm (8 bytes more). For allocations bigger than 8 bytes the memory overhead is exactly the same. This is the default allocation algorithm in Boost.Interprocess managed memory segments.
