...one of the most highly
regarded and expertly designed C++ library projects in the
world.
— Herb Sutter and Andrei
Alexandrescu, C++
Coding Standards
In the table below, the cube root of 28 was computed for three fundamental types floating-point types, and one Boost.Multiprecision type cpp_bin_float using 50 decimal digit precision, using four algorithms.
The 'exact' answer was computed using a 100 decimal digit type:
cpp_bin_float_100 full_answer ("3.036588971875662519420809578505669635581453977248111123242141654169177268411884961770250390838097895");
Times were measured using Boost.Timer
using class cpu_timer
.
The cube-root function is a simple function, and is a contrived example for root-finding. It does allow us to investigate some of the factors controlling efficiency that may be extrapolated to more complex functions.
The program used was root_finding_algorithms.cpp.
100000 evaluations of each floating-point type and algorithm were used
and the CPU times were judged from repeat runs to have an uncertainty of
10 %. Comparing MSVC for double
and long double
(which are identical on this patform) may give a guide to uncertainty of
timing.
The requested precision was set as follows:
Function |
Precision Requested |
---|---|
TOMS748 |
numeric_limits<T>::digits - 2 |
Newton |
floor(numeric_limits<T>::digits * 0.6) |
Halley |
floor(numeric_limits<T>::digits * 0.4) |
Schröder |
floor(numeric_limits<T>::digits * 0.4) |
std::cbrt
appeared several times as quick
as the more general boost::math::cbrt
,
on other platforms / compiler options boost::math::cbrt
is noticeably faster. In general, the results are highly dependent
on the code-generation / processor architecture selection compiler
options used. One can assume that the standard library will have been
compiled with options nearly optimal for the platform
it was installed on, where as the user has more choice over the options
used for Boost.Math. Pick something too general/conservative and performance
suffers, while selecting options that make use of the latest instruction
set opcodes speed's things up noticeably.
boost::math::cbrt
allows use with any
user-defined floating-point type, conveniently Boost.Multiprecision.
It too can take some advantage of the good-behaviour of the cube function,
compared to the more general implementation in the nth root-finding
examples. For example, it uses a polynomial approximation to generate
a better guess than dividing the exponent by three, and can avoid the
complex checks in Newton-Raphson
iteration required to prevent the search going wildly off-track.
For a known precision, it may also be possible to fix the number of
iterations, allowing inlining and loop unrolling. It also algebraically
simplifies the Halley steps leading to a big reduction in the number
of floating point operations required compared to a "black box"
implementation that calculates the derivatives seperately and then
combines them in the Halley code. Typically, it was found that computation
using type double
took
a few times longer when using the various root-finding algorithms directly
rather than the hand coded/optimized cbrt
routine.
cpp_bin_float_50
for a precision of 50 decimal digits took a lot longer, as expected
because most computation uses software rather than 64-bit floating-point
hardware. Speeds are often more than 50 times slower.
cpp_bin_float_50
,
TOMS Algorithm
748: enclosing zeros of continuous functions was much slower
showing the benefit of using derivatives. Newton-Raphson
iteration was found to be twice as quick as either of the second-derivative
methods: this is an extreme case though, the function and its derivatives
are so cheap to compute that we're really measuring the complexity
of the boilerplate root-finding code.
Table 12.1. Cube root(28) for float, double, long double and cpp_bin_float_50
float |
double |
long d |
cpp50 |
|||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Algorithm |
Its |
Times |
Norm |
Dis |
Its |
Times |
Norm |
Dis |
Its |
Times |
Norm |
Dis |
Its |
Times |
Norm |
Dis |
||||
cbrt |
0 |
46875 |
1.0 |
0 |
0 |
46875 |
1.0 |
1 |
0 |
46875 |
1.0 |
1 |
0 |
4906250 |
1.1 |
0 |
||||
TOMS748 |
8 |
234375 |
5.0 |
-1 |
11 |
437500 |
9.3 |
2 |
11 |
437500 |
9.3 |
2 |
7 |
66218750 |
15. |
-2 |
||||
Newton |
5 |
109375 |
2.3 |
0 |
6 |
125000 |
2.7 |
0 |
6 |
140625 |
3.0 |
0 |
2 |
4531250 |
1.0 |
0 |
||||
Halley |
3 |
125000 |
2.7 |
0 |
4 |
156250 |
3.3 |
0 |
4 |
156250 |
3.3 |
0 |
2 |
10625000 |
2.3 |
0 |
||||
Schröder |
4 |
140625 |
3.0 |
0 |
5 |
187500 |
4.0 |
0 |
5 |
203125 |
4.3 |
0 |
2 |
13109375 |
2.9 |
0 |
Table 12.2. Cube root(28) for float, double, long double and cpp_bin_float_50
float |
double |
long d |
cpp50 |
|||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Algorithm |
Its |
Times |
Norm |
Dis |
Its |
Times |
Norm |
Dis |
Its |
Times |
Norm |
Dis |
Its |
Times |
Norm |
Dis |
||||
cbrt |
0 |
46875 |
1.0 |
0 |
0 |
46875 |
1.0 |
0 |
0 |
46875 |
1.0 |
0 |
0 |
3500000 |
1.1 |
0 |
||||
TOMS748 |
8 |
187500 |
4.0 |
-1 |
11 |
406250 |
8.7 |
2 |
10 |
609375 |
13. |
-1 |
7 |
44531250 |
14. |
-2 |
||||
Newton |
5 |
93750 |
2.0 |
0 |
6 |
109375 |
2.3 |
0 |
6 |
171875 |
3.7 |
0 |
2 |
3140625 |
1.0 |
-1 |
||||
Halley |
3 |
93750 |
2.0 |
0 |
4 |
125000 |
2.7 |
0 |
4 |
218750 |
4.7 |
0 |
2 |
7171875 |
2.3 |
0 |
||||
Schröder |
4 |
109375 |
2.3 |
0 |
5 |
171875 |
3.7 |
0 |
5 |
281250 |
6.0 |
0 |
2 |
8703125 |
2.8 |
0 |