Boost C++ Libraries

...one of the most highly regarded and expertly designed C++ library projects in the world.

Comparison of Nth-root Finding Algorithms

A second example compares four generalized nth-root finding algorithms for various n-th roots (5, 7 and 13) of a single value 28.0, for four floating-point types, `float`, `double`, ```long double``` and a Boost.Multiprecision type `cpp_bin_float_50`. In each case the target accuracy was set using our "recommended" accuracy limits (or at least limits that make a good starting point - which is likely to give close to full accuracy without resorting to unnecessary iterations).

Function

Precision Requested

TOMS748

numeric_limits<T>::digits - 2

Newton

floor(numeric_limits<T>::digits * 0.6)

Halley

floor(numeric_limits<T>::digits * 0.4)

Schröder

floor(numeric_limits<T>::digits * 0.4)

Tests used Microsoft Visual Studio 2013 (Update 1) and GCC 4.9.1 using source code root_n_finding_algorithms.cpp.

The timing uncertainty (especially using MSVC) is at least 5% of normalized time 'Norm'.

To pick out the 'best' and 'worst' algorithms are highlighted in blue and red. More than one result can be 'best' when normalized times are indistinguishable within the uncertainty.

Program ..\example\root_n_finding_algorithms.cpp, Microsoft Visual C++ version 14.1, Dinkumware standard library version 650, Win32 Compiled in optimise mode., _X86_SSE2

Fraction of full accuracy 1

Table 10.3. 5th root(28) for float, double, long double and cpp_bin_float_50 types, using _X86_SSE2

 float double long d cpp50 Algo Its Times Norm Dis Its Times Norm Dis Its Times Norm Dis Its Times Norm Dis TOMS748 7 457 2.00 0 11 860 3.54 1 11 806 3.02 1 12 226875 8.11 0 Newton 3 228 1.00 0 4 243 1.00 -1 4 298 1.12 -1 6 27968 1.00 0 Halley 2 250 1.10 0 3 268 1.10 0 3 267 1.00 0 4 52812 1.89 0 Schröder 2 256 1.12 0 3 271 1.12 -1 3 270 1.01 -1 4 61406 2.20 0

Table 10.4. 7th root(28) for float, double, long double and cpp_bin_float_50 types, using _X86_SSE2

 float double long d cpp50 Algo Its Times Norm Dis Its Times Norm Dis Its Times Norm Dis Its Times Norm Dis TOMS748 12 825 3.06 1 15 1145 4.06 2 15 1159 4.17 2 14 295781 8.12 0 Newton 5 270 1.00 0 6 282 1.00 0 6 278 1.00 0 8 36406 1.00 0 Halley 4 303 1.12 0 5 329 1.17 0 5 335 1.21 0 6 78281 2.15 0 Schröder 5 340 1.26 0 6 432 1.53 0 6 367 1.32 0 7 85156 2.34 0

Table 10.5. 11th root(28) for float, double, long double and cpp_bin_float_50 types, using _X86_SSE2

 float double long d cpp50 Algo Its Times Norm Dis Its Times Norm Dis Its Times Norm Dis Its Times Norm Dis TOMS748 12 714 3.16 -2 14 909 4.19 2 14 793 3.69 2 17 211718 9.28 2 Newton 6 226 1.00 0 7 217 1.00 0 7 215 1.00 0 9 22812 1.00 0 Halley 4 262 1.16 -1 5 260 1.20 0 5 260 1.21 0 6 40781 1.79 0 Schröder 6 332 1.47 0 7 314 1.45 0 7 310 1.44 0 8 67187 2.95 0

Program root_n_finding_algorithms.cpp, Microsoft Visual C++ version 12.0, Dinkumware standard library version 610, Win32 Compiled in optimise mode., _X64_AVX

Fraction of full accuracy 1

Table 10.6. 5th root(28) for float, double, long double and cpp_bin_float_50 types, using _X64_AVX

 float double long d cpp50 Algo Its Times Norm Dis Its Times Norm Dis Its Times Norm Dis Its Times Norm Dis TOMS748 7 239 1.50 0 11 451 2.53 1 11 439 2.49 1 12 90312 7.51 0 Newton 3 159 1.00 0 4 178 1.00 -1 4 176 1.00 -1 6 12031 1.00 0 Halley 2 168 1.06 0 3 203 1.14 0 3 198 1.13 0 4 20937 1.74 0 Schröder 2 173 1.09 0 3 206 1.16 -1 3 203 1.15 -1 4 26250 2.18 0

Table 10.7. 7th root(28) for float, double, long double and cpp_bin_float_50 types, using _X64_AVX

 float double long d cpp50 Algo Its Times Norm Dis Its Times Norm Dis Its Times Norm Dis Its Times Norm Dis TOMS748 12 385 2.19 1 15 635 3.13 2 15 621 3.17 2 14 114843 6.81 0 Newton 5 176 1.00 0 6 203 1.00 0 6 196 1.00 0 8 16875 1.00 0 Halley 4 209 1.19 0 5 254 1.25 0 5 246 1.26 0 6 32343 1.92 0 Schröder 5 223 1.27 0 6 273 1.34 0 6 275 1.40 0 7 45156 2.68 0

Table 10.8. 11th root(28) for float, double, long double and cpp_bin_float_50 types, using _X64_AVX

 float double long d cpp50 Algo Its Times Norm Dis Its Times Norm Dis Its Times Norm Dis Its Times Norm Dis TOMS748 12 467 2.42 -2 14 648 3.06 2 14 640 2.99 2 17 170000 8.85 2 Newton 6 193 1.00 0 7 212 1.00 0 7 214 1.00 0 9 19218 1.00 0 Halley 4 209 1.08 -1 5 256 1.21 0 5 250 1.17 0 6 32656 1.70 0 Schröder 6 248 1.28 0 7 306 1.44 0 7 298 1.39 0 8 53437 2.78 0

Program ..\example\root_n_finding_algorithms.cpp, GNU C++ version 7.1.0, GNU libstdc++ version 20170502, Win32 Compiled in optimise mode., _X64_SSE2

Fraction of full accuracy 1

Table 10.9. 5th root(28) for float, double, long double and cpp_bin_float_50 types, using _X64_SSE2

 float double long d cpp50 Algo Its Times Norm Dis Its Times Norm Dis Its Times Norm Dis Its Times Norm Dis TOMS748 7 206 2.24 0 11 460 4.04 1 9 554 4.40 0 12 57656 8.39 0 Newton 3 92 1.00 0 4 114 1.00 -1 5 126 1.00 0 6 6875 1.00 0 Halley 2 106 1.15 0 3 134 1.18 0 3 178 1.41 0 4 12500 1.82 0 Schröder 2 126 1.37 0 3 143 1.25 -1 3 198 1.57 0 4 15312 2.23 0

Table 10.10. 7th root(28) for float, double, long double and cpp_bin_float_50 types, using _X64_SSE2

 float double long d cpp50 Algo Its Times Norm Dis Its Times Norm Dis Its Times Norm Dis Its Times Norm Dis TOMS748 12 345 2.09 1 15 615 3.14 2 13 875 3.98 0 14 70937 7.32 0 Newton 5 165 1.00 0 6 196 1.00 0 7 220 1.00 0 8 9687 1.00 0 Halley 4 193 1.17 0 5 239 1.22 0 5 298 1.35 0 6 19062 1.97 0 Schröder 5 217 1.32 0 6 270 1.38 0 6 367 1.67 0 7 27343 2.82 0

Table 10.11. 11th root(28) for float, double, long double and cpp_bin_float_50 types, using _X64_SSE2

 float double long d cpp50 Algo Its Times Norm Dis Its Times Norm Dis Its Times Norm Dis Its Times Norm Dis TOMS748 12 412 2.15 -2 14 646 2.96 2 14 1054 4.22 1 17 107187 9.53 2 Newton 6 192 1.00 0 7 218 1.00 0 7 250 1.00 0 9 11250 1.00 0 Halley 4 200 1.04 -1 5 243 1.11 0 5 345 1.38 0 6 19687 1.75 0 Schröder 6 254 1.32 0 7 321 1.47 0 7 471 1.88 0 8 33281 2.96 0

Some tentative conclusions can be drawn from this limited exercise.

• Perhaps surprisingly, there is little difference between the various algorithms for fundamental (built-in) types floating-point types. Using the first derivatives (Newton-Raphson iteration) is usually the best, but while the improvement over the no-derivative TOMS Algorithm 748: enclosing zeros of continuous functions is considerable in number of iterations, but little in execution time. This reflects the fact that the function we are finding the root for is trivial to evaluate, so runtimetimes are dominated by the time taken by the boilerplate code in each method.
• The extra cost of evaluating the second derivatives (Halley or Schröder) is usually too much for any net benefit: as with the cube root, these functors are so cheap to evaluate that the runtime is largely dominated by the complexity of the root finding method.
• For a Boost.Multiprecision floating-point type, the Newton-Raphson iteration is a clear winner with a several-fold gain over TOMS Algorithm 748: enclosing zeros of continuous functions, and again no improvement from the second-derivative algorithms.
• The run-time of 50 decimal-digit Boost.Multiprecision is about 30-fold greater than `double`.
• The column 'dis' showing the number of bits distance from the correct result. The Newton-Raphson algorithm shows a bit or two better accuracy than TOMS Algorithm 748: enclosing zeros of continuous functions.
• The goodness of the 'guess' is especially crucial for Boost.Multiprecision. Separate experiments show that evaluating the 'guess' using `double` allows convergence to the final exact result in one or two iterations. So in this contrived example, crudely dividing the exponent by N for a 'guess', it would be far better to use a `pow<double>` or , if more precise `pow<long double>`, function to estimate a 'guess'. The limitation of this tactic is that the range of possible (exponent) values may be less than the multiprecision type.
• Using floating-point extension SSE2 instructions made a modest ten-percent speedup.
• Using MSVC, there was some improvement using 64-bit, markedly for Boost.Multiprecision.
• The GCC compiler 4.9.1 using 64-bit was at least five-folder faster that 32-bit, apparently reflecting better optimization.

Clearly, your mileage will vary, but in summary, Newton-Raphson iteration seems the first choice of algorithm, and effort to find a good 'guess' the first speed-up target, especially for Boost.Multiprecision. And of course, compiler optimisation is crucial for speed.

 Copyright © 2006-2021 Nikhar Agrawal, Anton Bikineev, Matthew Borland, Paul A. Bristow, Marco Guazzone, Christopher Kormanyos, Hubert Holin, Bruno Lalande, John Maddock, Evan Miller, Jeremy Murphy, Matthew Pulver, Johan Råde, Gautam Sewani, Benjamin Sobotta, Nicholas Thompson, Thijs van den Berg, Daryle Walker and Xiaogang Zhang Distributed under the Boost Software License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)