Architecture and internals

Basic guidelines

When designing the Boost.Interprocess architecture, I followed some basic guidelines that can be summarized in these points:

Boost.Interprocess should be portable at least across UNIX and Windows systems. That means unifying not only interfaces but also behaviour. This is why Boost.Interprocess has chosen kernel or filesystem persistence for shared memory and named synchronization mechanisms. Process persistence for shared memory is also desirable, but it's difficult to achieve in UNIX systems.
Boost.Interprocess inter-process synchronization primitives should behave like thread synchronization primitives. Boost.Interprocess aims to have an interface compatible with the C++ standard thread API.
The Boost.Interprocess architecture should be modular and customizable, yet efficient. That's why Boost.Interprocess is based on templates: memory algorithms, index types, mutex types and other classes are template parameters.
The Boost.Interprocess architecture should allow the same concurrency as thread-based programming. Different mutual exclusion levels are defined so that one process can allocate raw memory while expanding a shared memory vector at the same time as another process safely searches for a named object.
Boost.Interprocess containers know nothing about Boost.Interprocess. All specific behaviour is contained in the STL-like allocators. That allows STL vendors to slightly modify (or better said, generalize) their standard container implementations and obtain a container that is fully compatible with both std::allocator and boost::interprocess::allocator. This also makes Boost.Interprocess containers compatible with standard algorithms.

Boost.Interprocess is built on three basic classes: a memory algorithm, a segment manager and a managed memory segment:

From the memory algorithm to the managed segment

The memory algorithm
The segment manager
Boost.Interprocess managed memory segments

The memory algorithm

The memory algorithm is an object that is placed in the first bytes of a shared memory/memory mapped file segment. The memory algorithm can return portions of that segment to users, marking them as used, and the user can return those portions to the memory algorithm so that it can mark them as free again. There is an exception though: some bytes beyond the end of the memory algorithm object are reserved and can't be used for this dynamic allocation. This "reserved" zone will be used to place other additional objects in a well-known place.

To sum up, a memory algorithm has the same mission as malloc/free of the standard C library, but it can only return portions of the segment where it is placed. The layout of a memory segment would be:

Layout of the memory segment:
 ____________ __________ ____________________________________________  
|            |          |                                            | 
|   memory   | reserved |  The memory algorithm will return portions | 
| algorithm  |          |  of the rest of the segment.               | 
|____________|__________|____________________________________________|

The memory algorithm takes care of memory synchronization, just like malloc/free guarantees that two threads can call malloc/free at the same time. This is usually achieved by placing a process-shared mutex as a member of the memory algorithm. Note that the memory algorithm knows nothing about the segment (whether it is shared memory, a memory mapped file, etc.). For the memory algorithm the segment is just a fixed size memory buffer.
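
To make the layout above concrete, here is a minimal sketch of a memory algorithm that lives in the first bytes of a fixed-size buffer. It is not the Boost.Interprocess implementation: toy_memory_algorithm and all its members are hypothetical names, std::mutex stands in for a process-shared mutex, and it only ever hands memory out (a real algorithm also takes portions back and marks them free).

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <new>

// Hypothetical toy "memory algorithm": constructed at offset 0 of a buffer,
// it hands out portions of the bytes that follow it and its reserved zone.
class toy_memory_algorithm {
public:
    // segment_size: size of the buffer this object is constructed in.
    // reserved_bytes: zone right after this object that is never handed out.
    toy_memory_algorithm(std::size_t segment_size, std::size_t reserved_bytes)
        : size_(segment_size),
          next_(align_up(sizeof(toy_memory_algorithm) + reserved_bytes)) {
        if (next_ > size_) next_ = size_;  // segment too small: nothing to hand out
    }

    // Returns a portion of the segment, or nullptr when not enough is left.
    void* allocate(std::size_t bytes) {
        std::lock_guard<std::mutex> guard(mutex_);  // stand-in for a process-shared mutex
        const std::size_t n = align_up(bytes);
        if (n < bytes || n > size_ - next_) return nullptr;
        void* p = base() + next_;
        next_ += n;
        return p;
    }

    // First byte of the reserved zone, a well-known place for other objects.
    void* reserved_zone() { return base() + sizeof(toy_memory_algorithm); }

private:
    static std::size_t align_up(std::size_t n) {
        const std::size_t a = alignof(std::max_align_t);
        return (n + a - 1) / a * a;
    }
    // The object sits at offset 0, so its own address is the segment base.
    unsigned char* base() { return reinterpret_cast<unsigned char*>(this); }

    std::mutex  mutex_;
    std::size_t size_;  // bookkeeping is kept as sizes and offsets, not raw pointers
    std::size_t next_;  // offset of the first byte not yet handed out
};

// The algorithm only sees "a fixed size memory buffer"; here it is a local array.
inline bool demo_memory_algorithm() {
    alignas(std::max_align_t) unsigned char segment[1024];
    auto* algo = new (segment) toy_memory_algorithm(sizeof(segment), 64);
    auto* a = static_cast<unsigned char*>(algo->allocate(100));
    auto* b = static_cast<unsigned char*>(algo->allocate(100));
    assert(a && b);
    const bool ok = a >= segment + sizeof(toy_memory_algorithm) + 64  // past the reserved zone
                 && b >= a + 100                                      // portions do not overlap
                 && b + 100 <= segment + sizeof(segment)              // still inside the segment
                 && algo->allocate(4096) == nullptr;                  // cannot exceed the segment
    algo->~toy_memory_algorithm();
    return ok;
}
```

Because the sketch never looks at how the buffer was obtained, the same class would work unchanged over shared memory or a mapped file, which is the point the paragraph above makes.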

The memory algorithm is also a configuration point for the rest of the Boost.Interprocess framework since it defines two basic types as member typedefs:

typedef /*implementation dependent*/ void_pointer;
typedef /*implementation dependent*/ mutex_family;

The void_pointer typedef defines the pointer type that will be used in the Boost.Interprocess framework (segment manager, allocators, containers). If the memory algorithm is ready to be placed in a shared memory/mapped file mapped at different base addresses, this pointer type will be defined as offset_ptr<void> or a similar relative pointer. If the memory algorithm will be used only with fixed address mapping, void_pointer can be defined as void*.
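
The reason a relative pointer survives mapping at a different base address can be shown with a small sketch. toy_offset_ptr below is a hypothetical, heavily simplified stand-in and not the real offset_ptr: it stores the distance from its own address to the target, and this sketch assumes the value 1 can be used to encode a null pointer. Copying the raw bytes with memcpy simulates the same segment appearing at another address.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <new>

// Hypothetical minimal relative pointer: stores (target address - own address).
template <class T>
class toy_offset_ptr {
public:
    toy_offset_ptr() : offset_(1) {}          // 1 encodes "null" in this sketch
    explicit toy_offset_ptr(T* p) { set(p); }
    // A copy lives at a different address, so the offset must be recomputed.
    toy_offset_ptr(const toy_offset_ptr& o) { set(o.get()); }
    toy_offset_ptr& operator=(const toy_offset_ptr& o) { set(o.get()); return *this; }

    T* get() const {
        if (offset_ == 1) return nullptr;
        return reinterpret_cast<T*>(reinterpret_cast<std::intptr_t>(this) + offset_);
    }

private:
    void set(T* p) {
        offset_ = p ? reinterpret_cast<std::intptr_t>(p) -
                      reinterpret_cast<std::intptr_t>(this)
                    : 1;
    }
    std::intptr_t offset_;
};

// An object that points at one of its own fields, as segment data often does.
struct record {
    int value;
    toy_offset_ptr<int> self;
};

inline bool demo_offset_ptr() {
    alignas(record) unsigned char mapping_a[sizeof(record)];
    alignas(record) unsigned char mapping_b[sizeof(record)];

    record* a = new (mapping_a) record{42, toy_offset_ptr<int>()};
    a->self = toy_offset_ptr<int>(&a->value);

    // Simulate the same bytes being mapped at another base address.
    std::memcpy(mapping_b, mapping_a, sizeof(record));
    record* b = reinterpret_cast<record*>(mapping_b);

    *b->self.get() = 7;  // writes through the relative pointer in the second mapping
    assert(a->self.get() == &a->value);
    return b->self.get() == &b->value && b->value == 7 && a->value == 42;
}
```

A raw int* stored in record would still point into mapping_a after the copy; the relative pointer resolves to the field in whichever mapping it is read from.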

The rest of the interface of a Boost.Interprocess memory algorithm is described in the Writing a new shared memory allocation algorithm section. For examples of memory algorithms, see the implementations of the simple_seq_fit and rbtree_best_fit classes.

The segment manager

The segment manager is an object also placed in the first bytes of the managed memory segment (shared memory, memory mapped file) that offers more sophisticated services built on top of the memory algorithm. How can both the segment manager and the memory algorithm be placed at the beginning of the segment? Because the segment manager owns the memory algorithm: the memory algorithm is embedded in the segment manager:

The layout of managed memory segment:
 _______ _________________
|       |         |       |
| some  | memory  | other |<- The memory algorithm considers 
|members|algorithm|members|   "other members" as reserved memory, so
|_______|_________|_______|   it does not use it for dynamic allocation.
|_________________________|____________________________________________
|                         |                                            |
|    segment manager      |  The memory algorithm will return portions |
|                         |  of the rest of the segment.               |
|_________________________|____________________________________________|

The segment manager initializes the memory algorithm and tells it not to use the memory where the rest of the segment manager's members are placed for dynamic allocations. The other members of the segment manager are a recursive mutex (defined by the memory algorithm's mutex_family::recursive_mutex typedef member), and two indexes (maps): one to implement named allocations, and another one to implement "unique instance" allocations.

The first index is a map with a pointer to a c-string (the name of the named object) as the key and, as the mapped value, a structure with information about the dynamically allocated object (most importantly, the address and the size of the object).
The second index is used to implement "unique instances" and is basically the same as the first index, but the name of the object comes from a typeid(T).name() operation.

The memory needed to store [name pointer, object information] pairs in the index is also allocated via the memory algorithm, so the internal indexes are just like ordinary user objects built in the segment. The rest of the memory, which stores the name of the object, the object itself, and meta-data for destruction/deallocation, is allocated using the memory algorithm in a single allocate() call.
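
The relationship between the two indexes can be sketched as follows. This is an illustration only: toy_indexes and object_info are hypothetical names, std::map with std::string keys is used for brevity (the text describes keys that are pointers to c-strings stored in the segment), and nothing here is allocated through a memory algorithm.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <typeinfo>

// Hypothetical record kept per object: where it lives and how big it is.
struct object_info {
    void*       address;
    std::size_t size;
};

// Toy stand-in for the two indexes held by a segment manager.
struct toy_indexes {
    std::map<std::string, object_info> named;   // key: name chosen by the user
    std::map<std::string, object_info> unique;  // key: typeid(T).name()

    template <class T>
    bool register_named(const std::string& name, T* obj) {
        return named.emplace(name, object_info{obj, sizeof(T)}).second;
    }

    // Same structure as the named index; only the origin of the key differs.
    template <class T>
    bool register_unique(T* obj) {
        return unique.emplace(typeid(T).name(), object_info{obj, sizeof(T)}).second;
    }

    template <class T>
    T* find_unique() {
        auto it = unique.find(typeid(T).name());
        return it == unique.end() ? nullptr : static_cast<T*>(it->second.address);
    }
};

inline bool demo_indexes() {
    toy_indexes idx;
    int a = 1, b = 2;
    double d = 3.0;
    bool ok = idx.register_named("counter", &a);
    ok = ok && !idx.register_named("counter", &b);  // a name can be used only once
    ok = ok && idx.register_unique(&d);
    ok = ok && !idx.register_unique(&d);            // only one "unique" double
    assert(idx.named.size() == 1);
    return ok && idx.find_unique<double>() == &d && idx.find_unique<int>() == nullptr;
}
```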

As seen, the segment manager knows nothing about shared memory/memory mapped files. The segment manager itself does not allocate portions of the segment, it just asks the memory algorithm to allocate the needed memory from the rest of the segment. The segment manager is a class built above the memory algorithm that offers named object construction, unique instance constructions, and many other services.

The segment manager is implemented in Boost.Interprocess by the segment_manager class.

template<class CharType 
        ,class MemoryAlgorithm
        ,template<class IndexConfig> class IndexType>
class segment_manager;

As seen, the segment manager is quite generic: we can specify the character type used to identify named objects, the memory algorithm that will dynamically control the portions of the memory segment, and the index type that will store the [name pointer, object information] mapping. We can construct our own index types as explained in the Building custom indexes section.

Boost.Interprocess managed memory segments

Boost.Interprocess managed memory segments are the classes that construct the shared memory/memory mapped file, place the segment manager there, and forward user requests to the segment manager. For example, basic_managed_shared_memory is a Boost.Interprocess managed memory segment that works with shared memory. basic_managed_mapped_file works with memory mapped files, etc...

Basically, the interface of a Boost.Interprocess managed memory segment is the same as that of the segment manager, but it also offers functions to "open", "create", or "open or create" shared memory/memory mapped file segments and initialize all needed resources. Managed memory segment classes are not built in shared memory or memory mapped files; they are normal C++ classes that store a pointer to the segment manager (which is built in shared memory or memory mapped files).

Apart from this, managed memory segments offer specific functions: managed_mapped_file offers functions to flush memory contents to the file, managed_heap_memory offers functions to expand the memory, etc...

Most of the functions of Boost.Interprocess managed memory segments can be shared between all managed memory segments, since many times they just forward the functions to the segment manager. Because of this, in Boost.Interprocess all managed memory segments derive from a common class that implements memory-independent (shared memory, memory mapped files) functions: boost::interprocess::detail::basic_managed_memory_impl

Deriving from this class, Boost.Interprocess implements several managed memory classes, for different memory backends:

basic_managed_shared_memory (for shared memory).
basic_managed_mapped_file (for memory mapped files).
basic_managed_heap_memory (for heap allocated memory).
basic_managed_external_buffer (for user provided external buffer).

Allocators and containers

Boost.Interprocess allocators
Implementation of Boost.Interprocess segregated storage pools
Implementation of Boost.Interprocess adaptive pools
Boost.Interprocess containers

Boost.Interprocess allocators

The Boost.Interprocess STL-like allocators are fairly simple and follow the usual C++ allocator approach. Normally, allocators for STL containers are built on top of the new/delete operators and, on top of those, they implement pools, arenas and other allocation tricks.

In Boost.Interprocess allocators, the approach is similar, but all allocators are based on the segment manager. The segment manager is the only component that provides everything from simple memory allocation to named object creation. Boost.Interprocess allocators always store a pointer to the segment manager, so that they can obtain memory from the segment or share a common pool between allocators.

As you can imagine, the member pointers of the allocator are not raw pointers, but pointer types defined by the segment_manager::void_pointer type. Apart from this, the pointer typedef of Boost.Interprocess allocators is also derived from segment_manager::void_pointer.

This means that if our allocation algorithm defines void_pointer as offset_ptr<void>, boost::interprocess::allocator<T> will store an offset_ptr<segment_manager> to point to the segment manager and the boost::interprocess::allocator<T>::pointer type will be offset_ptr<T>. This way, Boost.Interprocess allocators can be placed in the memory segment managed by the segment manager, that is, shared memory, memory mapped files, etc...

Implementation of Boost.Interprocess segregated storage pools

Segregated storage pools are simple and follow the classic segregated storage algorithm.

The pool allocates chunks of memory using the segment manager's raw memory allocation functions.
The chunk contains a pointer to form a singly linked list of chunks. The pool will contain a pointer to the first chunk.
The rest of the memory of the chunk is divided into nodes of the requested size, and no memory is spent on per-node overhead. Since the memory of a free node is otherwise unused, it holds a pointer that forms a singly linked list of free nodes. The pool has a pointer to the first free node.
Allocating a node is just taking the first free node from the list. If the list is empty, a new chunk is allocated, linked in the list of chunks and the new free nodes are linked in the free node list.
Deallocation returns the node to the free node list.
When the pool is destroyed, the list of chunks is traversed and memory is returned to the segment manager.

The pool is implemented by the private_node_pool and shared_node_pool classes.
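
The steps listed above can be sketched in a few lines. This is a generic illustration of the classic segregated storage technique, not the code of private_node_pool or shared_node_pool: toy_node_pool is a hypothetical name, std::malloc/std::free stand in for the segment manager's raw allocation functions, raw pointers are used instead of void_pointer, and there is no locking.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

class toy_node_pool {
public:
    toy_node_pool(std::size_t node_size, std::size_t nodes_per_chunk)
        : node_size_(round_up(node_size < sizeof(free_node) ? sizeof(free_node) : node_size)),
          nodes_per_chunk_(nodes_per_chunk ? nodes_per_chunk : 1) {}
    toy_node_pool(const toy_node_pool&) = delete;
    toy_node_pool& operator=(const toy_node_pool&) = delete;

    // On destruction the list of chunks is traversed and memory is returned.
    ~toy_node_pool() {
        while (chunks_) {
            chunk_header* next = chunks_->next;
            std::free(chunks_);
            chunks_ = next;
        }
    }

    // Allocation takes the first free node; a new chunk is added when none is left.
    void* allocate_node() {
        if (!free_nodes_ && !add_chunk()) return nullptr;
        free_node* n = free_nodes_;
        free_nodes_ = n->next;
        return n;
    }

    // Deallocation pushes the node back on the free list. The link pointer
    // is stored in the node's own (now unused) memory.
    void deallocate_node(void* p) { free_nodes_ = new (p) free_node{free_nodes_}; }

    std::size_t chunk_count() const { return chunk_count_; }

private:
    struct chunk_header { chunk_header* next; };  // singly linked list of chunks
    struct free_node    { free_node* next; };     // singly linked list of free nodes

    static std::size_t round_up(std::size_t n) {
        const std::size_t a = alignof(std::max_align_t);
        return (n + a - 1) / a * a;
    }

    bool add_chunk() {
        const std::size_t header = round_up(sizeof(chunk_header));
        void* raw = std::malloc(header + node_size_ * nodes_per_chunk_);
        if (!raw) return false;
        chunks_ = new (raw) chunk_header{chunks_};  // link the chunk in the chunk list
        ++chunk_count_;
        unsigned char* first = static_cast<unsigned char*>(raw) + header;
        for (std::size_t i = 0; i < nodes_per_chunk_; ++i)
            deallocate_node(first + i * node_size_);  // link every new node as free
        return true;
    }

    std::size_t   node_size_;
    std::size_t   nodes_per_chunk_;
    chunk_header* chunks_ = nullptr;
    free_node*    free_nodes_ = nullptr;
    std::size_t   chunk_count_ = 0;
};

inline bool demo_node_pool() {
    toy_node_pool pool(sizeof(int), 4);
    void* n[5];
    for (int i = 0; i < 4; ++i) n[i] = pool.allocate_node();
    bool ok = pool.chunk_count() == 1;
    n[4] = pool.allocate_node();              // free list is empty: a second chunk is added
    assert(n[4] != nullptr);
    ok = ok && pool.chunk_count() == 2;
    pool.deallocate_node(n[0]);
    return ok && pool.allocate_node() == n[0];  // the most recently freed node is reused first
}
```

Note that the underlying allocator is called once per chunk rather than once per node, which is why pooled allocators reduce pressure on the memory algorithm.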

Implementation of Boost.Interprocess adaptive pools

Adaptive pools are a variation of segregated lists but they have a more complicated approach:

Instead of using raw allocation, the pool allocates aligned chunks of memory using the segment manager. This is an essential feature since a node can reach its chunk information applying a simple mask to its address.
Each chunk contains pointers to form a doubly linked list of chunks and an additional pointer to create a singly linked list of the free nodes placed in that chunk. So unlike the segregated storage algorithm, the free list of nodes is implemented per chunk.
The pool maintains the chunks in increasing order of free nodes. This improves locality and minimizes the dispersion of node allocations across the chunks, facilitating the creation of totally free chunks.
The pool has a pointer to the chunk with the minimum (but not zero) free nodes. This chunk is called the "active" chunk.
Allocating a node is just returning the first free node of the "active" chunk. The list of chunks is reordered according to the free nodes count. The pointer to the "active" chunk is updated if necessary.
If the pool runs out of nodes, a new chunk is allocated, and pushed back in the list of chunks. The pointer to the "active" chunk is updated if necessary.
Deallocation returns the node to the free node list of its chunk and updates the "active" chunk accordingly.
If the number of totally free chunks exceeds the limit, chunks are returned to the segment manager.
When the pool is destroyed, the list of chunks is traversed and memory is returned to the segment manager.

The adaptive pool is implemented by the private_adaptive_node_pool and adaptive_node_pool classes.
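
The first point of the list, reaching the chunk information by masking a node address, is the part that makes the rest possible, so here is a small sketch of just that step. The names toy_chunk_header, chunk_of and chunk_alignment are hypothetical, the 4096-byte alignment is an arbitrary assumption, and std::malloc with manual rounding stands in for an aligned allocation from the segment manager.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Assumed chunk size and alignment for this sketch; must be a power of two.
constexpr std::size_t chunk_alignment = 4096;

// Hypothetical per-chunk bookkeeping placed at the start of each aligned chunk.
struct toy_chunk_header {
    toy_chunk_header* prev;        // doubly linked list of chunks
    toy_chunk_header* next;
    void*             free_nodes;  // per-chunk singly linked list of free nodes
    std::size_t       free_count;
};

// Because chunks start at multiples of chunk_alignment, clearing the low bits
// of any node address yields the address of the header of its chunk.
inline toy_chunk_header* chunk_of(void* node) {
    const auto addr = reinterpret_cast<std::uintptr_t>(node);
    const auto mask = ~(static_cast<std::uintptr_t>(chunk_alignment) - 1);
    return reinterpret_cast<toy_chunk_header*>(addr & mask);
}

inline bool demo_chunk_mask() {
    // Over-allocate, then round up to the next aligned address inside the block.
    void* raw = std::malloc(2 * chunk_alignment);
    if (!raw) return false;
    const auto base = (reinterpret_cast<std::uintptr_t>(raw) + chunk_alignment - 1)
                    & ~(static_cast<std::uintptr_t>(chunk_alignment) - 1);
    auto* chunk = new (reinterpret_cast<void*>(base))
        toy_chunk_header{nullptr, nullptr, nullptr, 0};

    unsigned char* bytes = reinterpret_cast<unsigned char*>(chunk);
    unsigned char* nodes = bytes + sizeof(toy_chunk_header);
    assert(reinterpret_cast<std::uintptr_t>(chunk) % chunk_alignment == 0);

    // Every address inside the chunk maps back to the same header.
    const bool ok = chunk_of(nodes) == chunk
                 && chunk_of(nodes + 1000) == chunk
                 && chunk_of(bytes + chunk_alignment - 1) == chunk;
    std::free(raw);
    return ok;
}
```

With this lookup available, deallocation can find the right per-chunk free list from nothing but the node pointer, without storing any per-node header.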

Boost.Interprocess containers

Boost.Interprocess containers are standard-conforming counterparts of STL containers in the boost::interprocess namespace, with these small differences:

Boost.Interprocess STL containers don't assume that memory allocated with an allocator can be deallocated with another allocator of the same type. They always compare allocators with operator==() to know if this is possible.
The pointers of the internal structures of the Boost.Interprocess containers are of the pointer type defined by the allocator of the container. This allows placing containers in managed memory segments mapped at different base addresses.

Performance of Boost.Interprocess

Performance of raw memory allocations
Performance of named allocations

This section tries to explain the performance characteristics of Boost.Interprocess, so that you can optimize Boost.Interprocess usage if you need more performance.

Performance of raw memory allocations

You can have two types of raw memory allocations with Boost.Interprocess classes:

Explicit: The user calls allocate() and deallocate() functions of managed_shared_memory/managed_mapped_file... managed memory segments. This call is translated to a MemoryAlgorithm::allocate() function, which means that you will need just the time that the memory algorithm associated with the managed memory segment needs to allocate data.
Implicit: For example, you are using boost::interprocess::allocator<...> with Boost.Interprocess containers. This allocator calls the same MemoryAlgorithm::allocate() function as the explicit method every time a vector/string has to reallocate its buffer or every time you insert an object in a node container.

If you see that memory allocation is a bottleneck in your application, you have these alternatives:

If you use map/set associative containers, try the flat_map family instead of the map family if you mainly do searches and insertions/removals are mostly done in an initialization phase. The overhead then occurs only when the ordered vector has to reallocate its storage and move data. You can also call the reserve() method of these containers when you know beforehand how much data you will insert. However, in these containers iterators are invalidated by insertions, so this substitution is only effective in some applications.
Use a Boost.Interprocess pooled allocator for node containers, because pooled allocators call allocate() only when the pool runs out of nodes. This is pretty efficient (much more than the current default general-purpose algorithm) and this can save a lot of memory. See Segregated storage node allocators and Adaptive node allocators for more information.
Write your own memory algorithm. If you have experience with memory allocation algorithms and you think another algorithm is better suited than the default one for your application, you can specify it in all Boost.Interprocess managed memory segments. See the section Writing a new shared memory allocation algorithm to know how to do this. If you think it's better than the default one for general-purpose applications, be polite and donate it to Boost.Interprocess to make it the default!
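
The first alternative relies on the fact that an ordered vector allocates once per growth step instead of once per element. The sketch below shows the idea with the standard library only: toy_flat_map is a hypothetical, minimal stand-in for a flat_map-style container (int keys and values, no allocator parameter), not the Boost implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical minimal "flat map": key/value pairs kept sorted in one vector.
class toy_flat_map {
public:
    // One up-front allocation instead of one allocation per inserted node.
    void reserve(std::size_t n) { data_.reserve(n); }

    // Keeps the vector ordered; elements after the insertion point are moved,
    // and iterators into the vector are invalidated.
    bool insert(int key, int value) {
        auto it = std::lower_bound(data_.begin(), data_.end(), key, less_key);
        if (it != data_.end() && it->first == key) return false;
        data_.insert(it, std::make_pair(key, value));
        return true;
    }

    // Binary search over contiguous memory.
    const int* find(int key) const {
        auto it = std::lower_bound(data_.begin(), data_.end(), key, less_key);
        return (it != data_.end() && it->first == key) ? &it->second : nullptr;
    }

    std::size_t capacity() const { return data_.capacity(); }

private:
    static bool less_key(const std::pair<int, int>& p, int k) { return p.first < k; }
    std::vector<std::pair<int, int>> data_;
};

inline bool demo_flat_map() {
    toy_flat_map m;
    m.reserve(100);                       // the only allocation in this demo
    const std::size_t cap = m.capacity();
    for (int k = 99; k >= 0; --k) m.insert(k, k * 2);
    const int* v = m.find(42);
    assert(v != nullptr);
    return m.capacity() == cap            // no reallocation happened during the inserts
        && *v == 84
        && !m.insert(42, 0)               // duplicate keys are rejected
        && m.find(1000) == nullptr;
}
```

The trade-off is visible in insert(): every insertion may shift elements, which is why the text recommends this substitution mainly when insertions happen in an initialization phase.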

Performance of named allocations

Boost.Interprocess allows the same parallelism as two threads writing to a common structure, except when the user creates/searches named/unique objects. The steps when creating a named object are these:

Lock a recursive mutex (so that you can make named allocations inside the constructor of the object to be created).
Try to insert the [name pointer, object information] pair in the name/object index. This insertion has to ensure that the name has not been used before, which is achieved by calling the insert() function of the index. So the time this requires depends on the index type (ordered vector, tree, hash...). This can require a call to the memory algorithm's allocation function if the index has to be reallocated, if it is node-based, if it uses pooled allocations...
Allocate a single buffer to hold the name of the object, the object itself, and meta-data for destruction (number of objects, etc...).
Call the constructors of the object being created. If it's an array, one constructor per array element.
Unlock the recursive mutex.
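
The five steps above can be sketched as follows. Everything here is a hypothetical simplification and not the segment_manager implementation: toy_segment_manager and block_header are made-up names, std::map keyed by std::string replaces the real index, std::malloc replaces the memory algorithm, objects must be trivially destructible because the toy frees buffers without running destructors, and exceptions thrown by constructors are not handled. The demo shows why the mutex is recursive: the constructor of one named object creates another named object.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <map>
#include <mutex>
#include <new>
#include <string>
#include <type_traits>

// Hypothetical meta-data stored in front of every named object.
struct block_header {
    std::size_t count;     // number of array elements
    std::size_t name_len;  // length of the name stored after the objects
};

class toy_segment_manager {
public:
    toy_segment_manager() = default;
    toy_segment_manager(const toy_segment_manager&) = delete;
    toy_segment_manager& operator=(const toy_segment_manager&) = delete;
    ~toy_segment_manager() { for (auto& entry : index_) std::free(entry.second); }

    // Creates `count` objects of type T, each constructed as T(args...).
    template <class T, class... Args>
    T* construct(const char* name, std::size_t count, const Args&... args) {
        static_assert(std::is_trivially_destructible<T>::value,
                      "toy: buffers are freed without running destructors");
        static_assert(alignof(T) <= alignof(std::max_align_t), "toy: no over-aligned types");

        std::lock_guard<std::recursive_mutex> guard(mutex_);  // step 1: lock
        auto res = index_.emplace(name, nullptr);              // step 2: insert in the index
        if (!res.second) return nullptr;                       //         name already used

        const std::size_t a = alignof(std::max_align_t);
        const std::size_t hdr = (sizeof(block_header) + a - 1) / a * a;
        const std::size_t len = std::strlen(name);
        void* raw = std::malloc(hdr + sizeof(T) * count + len + 1);  // step 3: one buffer
        if (!raw) { index_.erase(res.first); return nullptr; }

        unsigned char* bytes = static_cast<unsigned char*>(raw);
        block_header* h = new (raw) block_header{count, len};             // meta-data
        std::memcpy(bytes + hdr + sizeof(T) * count, name, len + 1);      // name
        T* objects = reinterpret_cast<T*>(bytes + hdr);
        for (std::size_t i = 0; i < count; ++i)
            new (objects + i) T(args...);             // step 4: one constructor per element
        res.first->second = h;
        return objects;
    }                                                 // step 5: guard unlocks the mutex

    std::size_t named_count() {
        std::lock_guard<std::recursive_mutex> guard(mutex_);
        return index_.size();
    }

private:
    std::recursive_mutex mutex_;
    std::map<std::string, block_header*> index_;
};

// The constructor makes a nested named allocation while the mutex is held.
struct outer {
    int* inner;
    explicit outer(toy_segment_manager* m) : inner(m->construct<int>("inner", 1)) {}
};

inline bool demo_named_construct() {
    toy_segment_manager m;
    outer* o = m.construct<outer>("outer", 1, &m);
    int* arr = m.construct<int>("array", 3);
    assert(o && arr);
    return o->inner && *o->inner == 0 && arr[2] == 0
        && m.construct<int>("array", 1) == nullptr  // duplicate name is rejected
        && m.named_count() == 3;
}
```

With a non-recursive mutex the nested construct() call in outer's constructor would deadlock, since the same thread already holds the lock.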

The steps when destroying a named object using the name of the object (destroy<T>(name)) are these:

Lock a recursive mutex.
Search in the index the entry associated to that name. Copy that information and erase the index entry. This is done using find(const key_type &) and erase(iterator) members of the index. This can require element reordering if the index is a balanced tree, an ordered vector...
Call the destructor of the object (many if it's an array).
Deallocate the memory buffer containing the name, metadata and the object itself using the allocation algorithm.
Unlock the recursive mutex.

The steps when destroying a named object using the pointer of the object (destroy_ptr(T *ptr)) are these:

Lock a recursive mutex.
Depending on the index type, this step can be different:
- If the index is a node index (marked with a boost::interprocess::is_node_index specialization): Take the iterator stored near the object and call erase(iterator). This can require element reordering if the index is a balanced tree, an ordered vector...
- If it's not a node index: Take the name stored near the object and erase the index entry calling erase(const key &). This can require element reordering if the index is a balanced tree, an ordered vector...
Call the destructor of the object (many if it's an array).
Deallocate the memory buffer containing the name, metadata and the object itself using the allocation algorithm.
Unlock the recursive mutex.
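
The difference between the two destruction paths can be sketched with a node-based std::map as the index. The names toy_index and tracked are hypothetical, the "iterator stored near the object" is simply a member of the object block here, and locking is omitted to keep the focus on the index operations.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

struct tracked;
using index_type = std::map<std::string, tracked*>;

// Hypothetical object block: the payload plus the iterator kept next to it.
struct tracked {
    int                  value;
    index_type::iterator self;  // valid because std::map iterators are stable
};

class toy_index {
public:
    toy_index() = default;
    toy_index(const toy_index&) = delete;
    toy_index& operator=(const toy_index&) = delete;
    ~toy_index() { for (auto& entry : map_) delete entry.second; }

    tracked* create(const std::string& name, int value) {
        auto res = map_.emplace(name, nullptr);
        if (!res.second) return nullptr;        // name already used
        tracked* t = new tracked{value, res.first};
        res.first->second = t;
        return t;
    }

    // Destruction by name: find(key), copy the information, erase(iterator).
    bool destroy(const std::string& name) {
        auto it = map_.find(name);
        if (it == map_.end()) return false;
        tracked* t = it->second;
        map_.erase(it);
        delete t;
        return true;
    }

    // Destruction by pointer on a node index: no search, erase(iterator) directly.
    void destroy_ptr(tracked* t) {
        map_.erase(t->self);
        delete t;
    }

    std::size_t size() const { return map_.size(); }

private:
    index_type map_;
};

inline bool demo_destroy() {
    toy_index idx;
    tracked* a = idx.create("a", 1);
    tracked* b = idx.create("b", 2);
    assert(a && b);
    const bool by_name = idx.destroy("a");  // pays for a lookup by key
    idx.destroy_ptr(b);                     // skips the lookup entirely
    return by_name && !idx.destroy("b") && idx.size() == 0;
}
```

For an index whose iterators are not stable (an ordered vector, for example) the stored iterator cannot be used, which corresponds to the second branch of the list above: the stored name is used for an erase-by-key instead.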

If you see that the performance is not good enough you have these alternatives:

Maybe the problem is that the lock is held for too long and it hurts parallelism. Try to reduce the number of named objects in the global index, and if your application serves several clients, try to build a new managed memory segment for each one instead of using a common one.
Use another Boost.Interprocess index type if you feel the default one is not fast enough. If you are still not satisfied, write your own index type. See Building custom indexes for this.
Destruction via pointer is at least as fast as using the name of the object and can be faster (in node containers, for example). So if your problem is that you make a lot of named destructions, try to use the pointer. If the index is a node index you can save some time.