Comparison of Cube Root Finding Algorithms

In the table below, the cube root of 28 was computed for three fundamental (built-in) types floating-point types, and one Boost.Multiprecision type cpp_bin_float using 50 decimal digit precision, using four algorithms.

The 'exact' answer was computed using a 100 decimal digit type:

cpp_bin_float_100 full_answer ("3.036588971875662519420809578505669635581453977248111123242141654169177268411884961770250390838097895");

Times were measured using Boost.Timer using class cpu_timer.

Its is the number of iterations taken to find the root.
Times is the CPU time-taken in arbitrary units.
Norm is a normalized time, in comparison to the quickest algorithm (with value 1.00).
Dis is the distance from the nearest representation of the 'exact' root in bits. Distance from the 'exact' answer is measured by using function Boost.Math float_distance. One or two bits distance means that all results are effectively 'correct'. Zero means 'exact' - the nearest representable value for the floating-point type.

The cube-root function is a simple function, and is a contrived example for root-finding. It does allow us to investigate some of the factors controlling efficiency that may be extrapolated to more complex functions.

The program used was root_finding_algorithms.cpp. 100000 evaluations of each floating-point type and algorithm were used and the CPU times were judged from repeat runs to have an uncertainty of 10 %. Comparing MSVC for double and long double (which are identical on this platform) may give a guide to uncertainty of timing.

The requested precision was set as follows:

Function	Precision Requested
TOMS748	numeric_limits<T>::digits - 2
Newton	floor(numeric_limits<T>::digits * 0.6)
Halley	floor(numeric_limits<T>::digits * 0.4)
Schröder	floor(numeric_limits<T>::digits * 0.4)

The C++ Standard cube root function std::cbrt is only defined for built-in or fundamental types, so cannot be used with any User-Defined floating-point types like Boost.Multiprecision. This, and that the cube function is so impeccably-behaved, allows the implementer to use many tricks to achieve a fast computation. On some platforms,std::cbrt appeared several times as quick as the more general boost::math::cbrt, on other platforms / compiler options boost::math::cbrt is noticeably faster. In general, the results are highly dependent on the code-generation / processor architecture selection compiler options used. One can assume that the standard library will have been compiled with options nearly optimal for the platform it was installed on, where as the user has more choice over the options used for Boost.Math. Pick something too general/conservative and performance suffers, while selecting options that make use of the latest instruction set opcodes speed's things up noticeably.
Two compilers in optimise mode were compared: GCC 4.9.1 using Netbeans IDS and Microsoft Visual Studio 2013 (Update 1) on the same hardware. The number of iterations seemed consistent, but the relative run-times surprisingly different.
boost::math::cbrt allows use with any user-defined floating-point type, conveniently Boost.Multiprecision. It too can take some advantage of the good-behaviour of the cube function, compared to the more general implementation in the nth root-finding examples. For example, it uses a polynomial approximation to generate a better guess than dividing the exponent by three, and can avoid the complex checks in Newton-Raphson iteration required to prevent the search going wildly off-track. For a known precision, it may also be possible to fix the number of iterations, allowing inlining and loop unrolling. It also algebraically simplifies the Halley steps leading to a big reduction in the number of floating point operations required compared to a "black box" implementation that calculates the derivatives separately and then combines them in the Halley code. Typically, it was found that computation using type double took a few times longer when using the various root-finding algorithms directly rather than the hand coded/optimized cbrt routine.
The importance of getting a good guess can be seen by the iteration count for the multiprecision case: here we "cheat" a little and use the cube-root calculated to double precision as the initial guess. The limitation of this tactic is that the range of possible (exponent) values may be less than the multiprecision type.
For fundamental (built-in) types, there was little to choose between the three derivative methods, but for cpp_bin_float, Newton-Raphson iteration was twice as fast. Note that the cube-root is an extreme test case as the cost of calling the functor is so cheap that the runtimes are largely dominated by the complexity of the iteration code.
Compiling with optimisation halved computation times, and any differences between algorithms became nearly negligible. The optimisation speed-up of the TOMS Algorithm 748: enclosing zeros of continuous functions was especially noticeable.
Using a multiprecision type like cpp_bin_float_50 for a precision of 50 decimal digits took a lot longer, as expected because most computation uses software rather than 64-bit floating-point hardware. Speeds are often more than 50 times slower.
Using cpp_bin_float_50, TOMS Algorithm 748: enclosing zeros of continuous functions was much slower showing the benefit of using derivatives. Newton-Raphson iteration was found to be twice as quick as either of the second-derivative methods: this is an extreme case though, the function and its derivatives are so cheap to compute that we're really measuring the complexity of the boilerplate root-finding code.
For multiprecision types only one or two extra iterations are needed to get the remaining 35 digits, whatever the algorithm used. (The time taken was of course much greater for these types).
Using a 100 decimal-digit type only doubled the time and required only a very few more iterations, so the cost of extra precision is mainly the underlying cost of computing more digits, not in the way the algorithm works. This confirms previous observations using NTL A Library for doing Number Theory high-precision types.

Program root_finding_algorithms.cpp, Microsoft Visual C++ version 14.1, Dinkumware standard library version 650, Win32, x86
1000000 evaluations of each of 5 root_finding algorithms.

Table 10.1. Cube root(28) for float, double, long double and cpp_bin_float_50

	float				double				long d				cpp50
Algorithm	Its	Times	Norm	Dis	Its	Times	Norm	Dis	Its	Times	Norm	Dis	Its	Times	Norm	Dis
cbrt	0	78125	1.0	0	0	62500	1.0	1	0	93750	1.0	1	0	11890625	1.1	0
TOMS748	8	468750	6.0	-1	11	906250	15.	2	11	906250	9.7	2	6	80859375	7.6	-2
Newton	5	203125	2.6	0	6	234375	3.8	0	6	187500	2.0	0	2	10640625	1.0	0
Halley	3	234375	3.0	0	4	265625	4.3	0	4	234375	2.5	0	2	26250000	2.5	0
Schröder	4	296875	3.8	0	5	281250	4.5	0	5	234375	2.5	0	2	32437500	3.0	0

Program root_finding_algorithms.cpp, GNU C++ version 8.2.0, GNU libstdc++ version 20180728, linux, x64
1000000 evaluations of each of 5 root_finding algorithms.

Table 10.2. Cube root(28) for float, double, long double and cpp_bin_float_50

	float				double				long d				cpp50
Algorithm	Its	Times	Norm	Dis	Its	Times	Norm	Dis	Its	Times	Norm	Dis	Its	Times	Norm	Dis
cbrt	0	30000	1.0	0	0	60000	1.0	0	0	70000	1.0	0	0	4440000	1.0	0
TOMS748	8	220000	7.3	-1	11	370000	6.2	2	10	580000	8.3	-1	6	28360000	6.7	-2
Newton	5	120000	4.0	0	6	130000	2.2	0	6	180000	2.6	0	2	4260000	1.0	-1
Halley	3	110000	3.7	0	4	140000	2.3	0	4	230000	3.3	0	2	9210000	2.2	0
Schröder	4	120000	4.0	0	5	140000	2.3	0	5	280000	4.0	0	2	11630000	2.7	0

Boost C++ Libraries