...one of the most highly
regarded and expertly designed C++ library projects in the
world.
— Herb Sutter and Andrei
Alexandrescu, C++
Coding Standards
In the table below, the cube root of 28 was computed for three fundamental (built-in) types floating-point types, and one Boost.Multiprecision type cpp_bin_float using 50 decimal digit precision, using four algorithms.
The 'exact' answer was computed using a 100 decimal digit type:
cpp_bin_float_100 full_answer ("3.036588971875662519420809578505669635581453977248111123242141654169177268411884961770250390838097895");
Times were measured using Boost.Timer
using class cpu_timer
.
The cube-root function is a simple function, and is a contrived example for root-finding. It does allow us to investigate some of the factors controlling efficiency that may be extrapolated to more complex functions.
The program used was root_finding_algorithms.cpp.
100000 evaluations of each floating-point type and algorithm were used and
the CPU times were judged from repeat runs to have an uncertainty of 10 %.
Comparing MSVC for double
and
long double
(which are identical on this platform) may give a guide to uncertainty of
timing.
The requested precision was set as follows:
Function |
Precision Requested |
---|---|
TOMS748 |
numeric_limits<T>::digits - 2 |
Newton |
floor(numeric_limits<T>::digits * 0.6) |
Halley |
floor(numeric_limits<T>::digits * 0.4) |
Schröder |
floor(numeric_limits<T>::digits * 0.4) |
std::cbrt
appeared several times as quick
as the more general boost::math::cbrt
,
on other platforms / compiler options boost::math::cbrt
is noticeably faster. In general, the results are highly dependent on
the code-generation / processor architecture selection compiler options
used. One can assume that the standard library will have been compiled
with options nearly optimal for the platform it
was installed on, where as the user has more choice over the options
used for Boost.Math. Pick something too general/conservative and performance
suffers, while selecting options that make use of the latest instruction
set opcodes speed's things up noticeably.
boost::math::cbrt
allows use with any
user-defined floating-point type, conveniently Boost.Multiprecision.
It too can take some advantage of the good-behaviour of the cube function,
compared to the more general implementation in the nth root-finding examples.
For example, it uses a polynomial approximation to generate a better
guess than dividing the exponent by three, and can avoid the complex
checks in Newton-Raphson
iteration required to prevent the search going wildly off-track.
For a known precision, it may also be possible to fix the number of iterations,
allowing inlining and loop unrolling. It also algebraically simplifies
the Halley steps leading to a big reduction in the number of floating
point operations required compared to a "black box" implementation
that calculates the derivatives separately and then combines them in
the Halley code. Typically, it was found that computation using type
double
took a few times
longer when using the various root-finding algorithms directly rather
than the hand coded/optimized cbrt
routine.
cpp_bin_float_50
for a precision of 50 decimal digits took a lot longer, as expected because
most computation uses software rather than 64-bit floating-point hardware.
Speeds are often more than 50 times slower.
cpp_bin_float_50
,
TOMS Algorithm
748: enclosing zeros of continuous functions was much slower
showing the benefit of using derivatives. Newton-Raphson
iteration was found to be twice as quick as either of the second-derivative
methods: this is an extreme case though, the function and its derivatives
are so cheap to compute that we're really measuring the complexity of
the boilerplate root-finding code.
Table 10.1. Cube root(28) for float, double, long double and cpp_bin_float_50
float |
double |
long d |
cpp50 |
|||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Algorithm |
Its |
Times |
Norm |
Dis |
Its |
Times |
Norm |
Dis |
Its |
Times |
Norm |
Dis |
Its |
Times |
Norm |
Dis |
||||
cbrt |
0 |
78125 |
1.0 |
0 |
0 |
62500 |
1.0 |
1 |
0 |
93750 |
1.0 |
1 |
0 |
11890625 |
1.1 |
0 |
||||
TOMS748 |
8 |
468750 |
6.0 |
-1 |
11 |
906250 |
15. |
2 |
11 |
906250 |
9.7 |
2 |
6 |
80859375 |
7.6 |
-2 |
||||
Newton |
5 |
203125 |
2.6 |
0 |
6 |
234375 |
3.8 |
0 |
6 |
187500 |
2.0 |
0 |
2 |
10640625 |
1.0 |
0 |
||||
Halley |
3 |
234375 |
3.0 |
0 |
4 |
265625 |
4.3 |
0 |
4 |
234375 |
2.5 |
0 |
2 |
26250000 |
2.5 |
0 |
||||
Schröder |
4 |
296875 |
3.8 |
0 |
5 |
281250 |
4.5 |
0 |
5 |
234375 |
2.5 |
0 |
2 |
32437500 |
3.0 |
0 |
Table 10.2. Cube root(28) for float, double, long double and cpp_bin_float_50
float |
double |
long d |
cpp50 |
|||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Algorithm |
Its |
Times |
Norm |
Dis |
Its |
Times |
Norm |
Dis |
Its |
Times |
Norm |
Dis |
Its |
Times |
Norm |
Dis |
||||
cbrt |
0 |
30000 |
1.0 |
0 |
0 |
60000 |
1.0 |
0 |
0 |
70000 |
1.0 |
0 |
0 |
4440000 |
1.0 |
0 |
||||
TOMS748 |
8 |
220000 |
7.3 |
-1 |
11 |
370000 |
6.2 |
2 |
10 |
580000 |
8.3 |
-1 |
6 |
28360000 |
6.7 |
-2 |
||||
Newton |
5 |
120000 |
4.0 |
0 |
6 |
130000 |
2.2 |
0 |
6 |
180000 |
2.6 |
0 |
2 |
4260000 |
1.0 |
-1 |
||||
Halley |
3 |
110000 |
3.7 |
0 |
4 |
140000 |
2.3 |
0 |
4 |
230000 |
3.3 |
0 |
2 |
9210000 |
2.2 |
0 |
||||
Schröder |
4 |
120000 |
4.0 |
0 |
5 |
140000 |
2.3 |
0 |
5 |
280000 |
4.0 |
0 |
2 |
11630000 |
2.7 |
0 |