Mixed Precision Arithmetic

Mixed precision arithmetic is fully supported by the library.

There are three different forms:

Where the operands are of different precision or types.
Where the operands are of the same type and precision, but yield a higher precision result.
Where the operands or result are different but equivalent types (for example types which differ only in their memory management).

Mixing Operands of Differing Types or Precision

If the arguments to a binary operator are of different types or precision, then the operation is allowed as long as there is an unambiguous implicit conversion from one argument type to the other. In all cases the arithmetic is performed "as if" the lower precision type is promoted to the higher precision type before applying the operator. However, particular backends may optimise this and avoid actually creating a temporary if they are able to do so.

For example:

mpfr_float_50         a(2), b;
mpfr_float_100        c(3), d;
static_mpfr_float_50  e(5), f;
mpz_int               i(20);

d = a * c;  // OK, result of operand is an mpfr_float_100.
b = a * c;  // Error, can't convert the result to an mpfr_float_50 as it will lose digits.
f = e * i;  // OK, unambiguous conversion from mpz_int to static_mpfr_float_50

Operands of the Same Precision

Sometimes you want to apply an operator to two arguments of the same precision in such a way as to obtain a result of higher precision. The most common situation occurs with fixed precision integers, where you want to multiply two N-bit numbers to obtain a 2N-bit result. This is supported in this library by the following free functions:

template <class ResultType, class Source1 class Source2>
ResultType& add(ResultType& result, const Source1& a, const Source2& b);

template <class ResultType, class Source1 class Source2>
ResultType& subtract(ResultType& result, const Source1& a, const Source2& b);

template <class ResultType, class Source1 class Source2>
ResultType& multiply(ResultType& result, const Source1& a, const Source2& b);

These functions apply the named operator to the arguments a and b and store the result in result, returning result. In all cases they behave "as if" arguments a and b were first promoted to type ResultType before applying the operator, though particular backends may well avoid that step by way of an optimization.

The type ResultType must be an instance of class number, and the types Source1 and Source2 may be either instances of class number or native integer types. The latter is an optimization that allows arithmetic to be performed on native integer types producing an extended precision result.

For example:

#include <boost/multiprecision/cpp_int.hpp>

int main()
{
   using namespace boost::multiprecision;

   std::uint64_t i = (std::numeric_limits<std::uint64_t>::max)();
   std::uint64_t j = 1;

   uint128_t ui128;
   uint256_t ui256;
   //
   // Start by performing arithmetic on 64-bit integers to yield 128-bit results:
   //
   std::cout << std::hex << std::showbase << i << std::endl;
   std::cout << std::hex << std::showbase << add(ui128, i, j) << std::endl;
   std::cout << std::hex << std::showbase << multiply(ui128, i, i) << std::endl;
   //
   // The try squaring a 128-bit integer to yield a 256-bit result:
   //
   ui128 = (std::numeric_limits<uint128_t>::max)();
   std::cout << std::hex << std::showbase << multiply(ui256, ui128, ui128) << std::endl;

   return 0;
}

Produces the output:

0xffffffffffffffff
0x10000000000000000
0xFFFFFFFFFFFFFFFE0000000000000001
0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFE00000000000000000000000000000001

Mixing different, but "equivalent" types

Ordinarily, mixing types of the same precision will produce a compiler error since there is no unambiguous result type. However, there is a traits class:

namespace boost{ namespace multiprecision

template <class NumberType1, class NumberType2>
struct is_equivalent_number_type;

}
}

When it's value const-member value is true then the library will treat the types NumberType1 and NumberType2 as if they are interchangeable. This is typically used to optimise memory management by using two types with differing memory allocation strategies for different roles. Typically, we would be using a type with dymanic memory allocation and a minimal memory footprint for the main storage type (think large arrays or matrices), but a type with internal storage and no dynamic allocation (but a larger memory footprint) for a few select calculations.

There are three backends that define this trait by default:

cpp_int's, provided the two types differ only in their internal cache size.
cpp_bin_float's provided they are of the same precision.
mpfr_float's provided they are of the same precision.

In addition, while this feature can be used with expression templates turned off, this feature minimises temporaries and hence memory allocations when expression template are turned on.

By way of an example, consider the dot product of two vectors of cpp_int's, our first, fairly trivial implementation might look like this:

cpp_int dot_product_1(const std::vector<cpp_int>& v1, const std::vector<cpp_int>& v2)
{
   if (v1.size() != v2.size())
      throw std::domain_error("Mismatched arguments");

   cpp_int result;
   for (std::size_t i = 0; i < v1.size(); ++i)
      result += v1[i] * v2[i];
   //
   // Named-return value optimisation (even better than a move):
   //
   return result;
}

However, in order to reduce the need for memory allocations when constructing the temporaries needed for the multiply-and-add operations, we could use an equivalent type with a larger internal cache like this:

cpp_int dot_product_2(const std::vector<cpp_int>& v1, const std::vector<cpp_int>& v2)
{
   if (v1.size() != v2.size())
      throw std::domain_error("Mismatched arguments");
   //
   // If we know that most of our data is of a certain range of values, then we can use a cpp_int type
   // with an internal cache large enough to *probably* not require an allocation:
   //
   number<cpp_int_backend<1024> > result;
   for (std::size_t i = 0; i < v1.size(); ++i)
      result += v1[i] * v2[i];
   //
   // We can't rely on the named-return-value optimisation here, since the variable being returned
   // is a different type to the return value.  However, since these are "equivalent" types we
   // can move the result to the return value and get all the expected move-optimisations should
   // variable result have dynamically allocated:
   //
   return std::move(result);
}

Before we compare performance though, there is one other obvious thing we could try. By simply declaring a variable for the result of the intermediate multiplications, and reusing that variable each time through the loop, we might also expect to greatly reduce the number of allocations required.

cpp_int dot_product_3(const std::vector<cpp_int>& v1, const std::vector<cpp_int>& v2)
{
   if (v1.size() != v2.size())
      throw std::domain_error("Mismatched arguments");

   cpp_int result, term;
   for (std::size_t i = 0; i < v1.size(); ++i)
   {
      //
      // Re-use the same variable for the result of the multiplications, rather than rely on 
      // an internally generated temporary.  Depending on the input data, this may allocate
      // a few times depending how soon in the input vector's we encounter the largest values.
      // In the best case though, or for fairly uniformly sized input data, we will allocate 
      // only once:
      //
      term = v1[i] * v2[i];
      result += term;
   }
   //
   // Named-return value optimisation (even better than a move):
   //
   return result;
}

We'll begin by comparing how many actual allocations were required to calculate the dot product of 1000 value vectors for random data with various bit counts:

Bit Count	Allocations Count Version 1	Allocations Count Version 2	Allocations Count Version 3
32	1^[1]	0	0
64	1001	1^[2]	1
128	1002	1	2
256	1002	1	3^[3]
512	1002	1	3
1024	1002	1001^[4]	3
^[1] Here everything fits within cpp_int's default internal cache, so no allocation are required. ^[2] A single allocation for the return value. ^[3] Here the input data is such that more than one allocation is required for the temporary. ^[4] At this point we exceed the internal cache of our internal calculation type.

Timings for the three methods are as follows (MSVC-16.8.0, x64):

Bit Count	time/ms Version 1	time/ms Version 2	time/ms Version 3
32	0.021	0.021	0.021
64	0.032	0.032	0.029
128	0.099	0.041	0.041
256	0.154	0.091	0.094
512	0.323	0.270	0.269
1024	0.998	0.995	0.949

As you can see, there is a sweet spot for middling-sized integers where we gain: if the values are small, then cpp_int's own internal cache is large enough anyway, and no allocation occur. Conversely, if the values are sufficiently large, then the cost of the actual arithmetic dwarfs the memory allocation time. In this particular case, carefully writing the code (version 3) is clearly at least as good as using a separate type with a larger cache. However, there may be times when it's not practical to re-write existing code, purely to optimise it for the multiprecision use case.

A typical example where we can't rewrite our code to avoid unnecessary allocations, occurs when we're calling an external routine. For example the arc length of an ellipse with radii a and b is given by:

L(a, b) = 4aE(k)

with:

k = √(1 - b²/a²)

where E(k) is the complete elliptic integral of the second kind, which is available as a template function ellint_2 in Boost.Math.

Naively, we might implement this for use with mpfr_float like this:

template <unsigned N>
number<mpfr_float_backend<N> > elliptic_arc_length_1(const number<mpfr_float_backend<N> >& a, const number<mpfr_float_backend<N> >& b)
{
   number<mpfr_float_backend<N> > k = sqrt(1 - b * b / (a * a));
   return 4 * a * boost::math::ellint_2(k);
}

But we might also try mixing our arithmetic types - regular dynamically allocated mpfr_float's for the interface to minimise memory footprint in our external storage, and statically allocated mpfr_float's for the internal arithmetic:

template <unsigned N>
number<mpfr_float_backend<N> > elliptic_arc_length_2(const number<mpfr_float_backend<N> >& a, const number<mpfr_float_backend<N> >& b)
{
   number<mpfr_float_backend<N, allocate_stack> > k = sqrt(1 - b * b / (a * a));
   return 4 * a * boost::math::ellint_2(k);
}

The performance comparisons are surprisingly stark:

N	`number<mpfr_float_backend<N>>` / ms	`number<mpfr_float_backend<N, allocate_stack>>` / ms
30	19.5	3.1
40	12.5	6.2
50	14.4	6.6
60	18.0	9.5
70	18.0	9.6
80	20.0	12.8

As before, the results are for MSVC-16.8.0/x64, and in point of fact, the results do not always favour non-allocating types so much, it does depend very much on the special function being called and/or the arguments used.

Backends With Optimized Mixed Precision Arithmetic

The following backends have at least some direct support for mixed-precision arithmetic, and therefore avoid creating unnecessary temporaries when using the interfaces above. Therefore when using these types it's more efficient to use mixed-precision arithmetic, than it is to explicitly cast the operands to the result type:

mpfr_float, gmp_float, cpp_int.

Boost C++ Libraries