Why Is the hypot() Function So Slow?

It's not a simple sqrt function. You should check this link for the implementation of the algorithm: http://www.koders.com/c/fid7D3C8841ADC384A5F8DE0D081C88331E3909BF3A.aspx

It has a while loop to check for convergence:

/* Slower but safer algorithm due to Moler and Morrison.  Never
   produces any intermediate result greater than roughly the
   larger of X and Y.  Should converge to machine-precision
   accuracy in 3 iterations. */

double r = ratio * ratio, t, s, p = abig, q = asmall;

do {
    t = 4. + r;
    if (t == 4.)
        break;
    s = r / t;
    p += 2. * s * p;
    q *= s;
    r = (q / p) * (q / p);
} while (1);

EDIT (Update from J.M):

Here is the original Moler-Morrison paper, and here is a nice follow-up due to Dubrulle.

Why does C have a `hypot` function?

The hypot function computes sqrt(x*x + y*y) while avoiding overflow in the intermediate calculations.

Section 7.12.7.3p2 of the C standard regarding the hypot function states:

The hypot functions compute the square root of the sum of
the squares of x and y, without undue overflow or underflow. A
range error may occur.

Libc hypot function seems to return incorrect results for double type... why?

  • 0.11920465852172932 is represented by 0x1.e84324de1b576p-4 (as a double)
  • 0.11920465852172930 is represented by 0x1.e84324de1b575p-4 (as a double)
  • 0.119204658521729311251 is the long-double result, which we can assume is correct to a couple more decimal places; i.e. the exact result is closer to the rounded-up value.

Those FP bit-patterns differ only in the low bit of the mantissa (aka significand), and the exact result is between them. So they each have less than 1 ulp of rounding error, achieving what typical implementations (including glibc) aim for.

Unlike the IEEE-754 "basic" operations (add/sub/mul/div/sqrt), hypot is not required to be "correctly rounded", which means at most 0.5 ulp of error. Achieving that would be much slower for operations the hardware doesn't provide directly (e.g. doing extended-precision calculation with at least a couple of extra definitely-correct bits, so you can round to the nearest double the way the hardware does for basic operations).

It happens that in this case, the naive calculation method produced the correctly-rounded result while glibc's "safe" implementation of std::hypot (that has to avoid underflow when squaring small numbers before adding) produced a result with >0.5 but <1 ulp of error.


You didn't specify whether you were using MSVC in 32-bit mode.

Presumably 32-bit mode would be using x87 for FP math, giving extra temporary precision. Although some MSVC versions' CRT code sets the x87 FPU's internal precision to round to 53-bit mantissa after every operation, so it behaves like SSE2 using actual double, except with a wider exponent range. See Bruce Dawson's blog post.

So I don't know if there's any reason beyond luck that MSVC's std::hypot got the correctly-rounded result for this.

Note that long double in MSVC is the same type as 64-bit double; that C++ implementation doesn't expose x86 / x86-64's 80-bit hardware extended-precision type (which has a 64-bit mantissa).

When to use `std::hypot(x,y)` over `std::sqrt(x*x + y*y)`

The answer is in the documentation you quoted:

Computes the square root of the sum of the squares of x and y, without undue overflow or underflow at intermediate stages of the computation.

If x*x + y*y overflows, then carrying out the calculation manually gives the wrong answer. If you use std::hypot, however, it guarantees that the intermediate calculations will not overflow.

You can see an example of this disparity here.

If you are working with numbers which you know will not overflow the relevant representation for your platform, you can happily use the naive version.

Fastest hypotenuse in JavaScript?

In ECMAScript ES6 you can use Math.hypot:
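For example (the large values also show the same overflow safety discussed for C above):

```javascript
// Math.hypot accepts any number of arguments and avoids intermediate
// overflow, unlike Math.sqrt(x*x + y*y).
console.log(Math.hypot(3, 4));                         // 5
console.log(Math.sqrt(1e200 * 1e200 + 1e200 * 1e200)); // Infinity
console.log(Math.hypot(1e200, 1e200));                 // finite, ~1.4142e+200
```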