...one of the most highly
regarded and expertly designed C++ library projects in the
world.
— Herb Sutter and Andrei
Alexandrescu, C++
Coding Standards
With BOOST_FIBERS_NO_ATOMICS
defined at the compiler’s command line, synchronization between fibers (in different
threads) is disabled. This is acceptable if the application is single threaded
and/or fibers are not synchronized between threads.
Memory allocation algorithm is significant for performance in a multithreaded environment, especially for Boost.Fiber where fiber stacks are allocated on the heap. The default user-level memory allocator (UMA) of glibc is ptmalloc2 but it can be replaced by another UMA that fit better for the concret work-load For instance Google’s TCmalloc enables a better performance at the skynet microbenchmark than glibc’s default memory allocator.
The fibers in a thread are coordinated by a fiber manager. Fibers trade control
cooperatively, rather than preemptively. Depending on the work-load several
strategies of scheduling the fibers are possible [10] that can be implmented on behalf of algorithm
.
work_stealing
and
numa::work_stealing
)
shared_work
)
Boost.Fiber uses internally spinlocks to protect critical regions if fibers running on different threads interact. Spinlocks are implemented as TTAS (test-test-and-set) locks, i.e. the spinlock tests the lock before calling an atomic exchange. This strategy helps to reduce the cache line invalidations triggered by acquiring/releasing the lock.
A lock is considered under contention, if a thread repeatedly fails to acquire the lock because some other thread was faster. Waiting for a short time lets other threads finish before trying to enter the critical section again. While busy waiting on the lock, relaxing the CPU (via pause/yield mnemonic) gives the CPU a hint that the code is in a spin-wait loop.
It is obvious that this strategy is useless on single core systems because
the lock can only released if the thread gives up its time slice in order to
let other threads run. The macro BOOST_FIBERS_SPIN_SINGLE_CORE replaces the
CPU hints (pause/yield mnemonic) by informing the operating system (via std::this_thread_yield()
) that the thread gives up its time slice
and the operating system switches to another thread.
The macro BOOST_FIBERS_RETRY_THRESHOLD determines how many times the CPU iterates in the spin-wait loop before yielding the thread or blocking in futex-wait. The spinlock tracks how many times the thread failed to acquire the lock. The higher the contention, the longer the thread should back-off. A “Binary Exponential Backoff” algorithm together with a randomized contention window is utilized for this purpose. BOOST_FIBERS_CONTENTION_WINDOW_THRESHOLD determines the upper limit of the contention window (expressed as the exponent for basis of two).
Boost.Fiber uses spinlocks to protect critical regions that can be used together with transactional memory (see section Speculative execution).
Note | |
---|---|
TXS is enabled if property |
Note | |
---|---|
A TSX-transaction will be aborted if the floating point state is modified inside a critical region. As a consequence floating point operations, e.g. tore/load of floating point related registers during a fiber (context) switch are disabled. |
Modern multi-socket systems are usually designed as NUMA
systems. A suitable fiber scheduler like numa::work_stealing
reduces
remote memory access (latence).
Table 1.5. Parameters that migh be defiend at compiler's command line
Parameter |
Default value |
Effect on Boost.Fiber |
---|---|---|
BOOST_FIBERS_NO_ATOMICS |
- |
no multithreading support, all atomics removed, no synchronization between fibers running in different threads |
BOOST_FIBERS_SPINLOCK_STD_MUTEX |
- |
|
BOOST_FIBERS_SPINLOCK_TTAS |
+ |
spinlock with test-test-and-swap on shared variable |
BOOST_FIBERS_SPINLOCK_TTAS_ADAPTIVE |
- |
spinlock with test-test-and-swap on shared variable, adaptive retries while busy waiting |
BOOST_FIBERS_SPINLOCK_TTAS_FUTEX |
- |
spinlock with test-test-and-swap on shared variable, suspend on futex after certain number of retries |
BOOST_FIBERS_SPINLOCK_TTAS_ADAPTIVE_FUTEX |
- |
spinlock with test-test-and-swap on shared variable, while busy waiting adaptive retries, suspend on futex certain amount of retries |
BOOST_FIBERS_SPINLOCK_TTAS + BOOST_USE_TSX |
- |
spinlock with test-test-and-swap and speculative execution (Intel TSX required) |
BOOST_FIBERS_SPINLOCK_TTAS_ADAPTIVE + BOOST_USE_TSX |
- |
spinlock with test-test-and-swap on shared variable, adaptive retries while busy waiting and speculative execution (Intel TSX required) |
BOOST_FIBERS_SPINLOCK_TTAS_FUTEX + BOOST_USE_TSX |
- |
spinlock with test-test-and-swap on shared variable, suspend on futex after certain number of retries and speculative execution (Intel TSX required) |
BOOST_FIBERS_SPINLOCK_TTAS_ADAPTIVE_FUTEX + BOOST_USE_TSX |
- |
spinlock with test-test-and-swap on shared variable, while busy waiting adaptive retries, suspend on futex certain amount of retries and speculative execution (Intel TSX required) |
BOOST_FIBERS_SPIN_SINGLE_CORE |
- |
on single core machines with multiple threads, yield thread ( |
BOOST_FIBERS_RETRY_THRESHOLD |
64 |
max number of retries while busy spinning, the use fallback |
BOOST_FIBERS_CONTENTION_WINDOW_THRESHOLD |
16 |
max size of collisions window, expressed as exponent for the basis of two |
BOOST_FIBERS_SPIN_BEFORE_SLEEP0 |
32 |
max number of retries that relax the processor before the thread sleeps for 0s |
BOOST_FIBERS_SPIN_BEFORE_YIELD |
64 |
max number of retries where the thread sleeps for 0s before yield
thread ( |