How to Multiply a 64 Bit Integer by a Fraction in C++ While Minimizing Error

How to multiply a 64 bit integer by a fraction in C++ while minimizing error?

Improving on the provided answer (this reduces overflow when b is big):

int64_t muldiv64(const int64_t a, const int64_t b, const int64_t d)
{
    /* find the integer and remainder portions of a/d and b/d */
    const int64_t diva = a / d;
    const int64_t moda = a % d;
    const int64_t divb = b / d;
    const int64_t modb = b % d;

    /* a*b/d = diva*b + moda*divb + moda*modb/d */
    return diva * b + moda * divb + moda * modb / d;
}

There is no need to write weird code to avoid the modulus operation: the compiler can do the substitution, and you get more readable code.

Edit:
Any more complicated code is probably not worth looking into. If more precision is needed, the better idea is probably to move to 128-bit arithmetic or to use an arbitrary-precision integer library (see http://sourceforge.net/projects/cpp-bigint/).
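
As a rough illustration of the 128-bit route mentioned above, here is a minimal sketch. It assumes a GCC- or Clang-style compiler that provides the non-standard __int128 type; muldiv64_wide is a hypothetical name, not something from the answer.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: computes a * b / d with a 128-bit intermediate product.
// Assumes the compiler provides the non-standard __int128 type (GCC, Clang).
int64_t muldiv64_wide(int64_t a, int64_t b, int64_t d)
{
    assert(d != 0);
    // Two 64-bit operands always fit in a 128-bit product.
    const __int128 product = static_cast<__int128>(a) * b;
    // The caller is responsible for making sure the quotient fits in 64 bits.
    return static_cast<int64_t>(product / d);
}
```

Like the function above, this truncates toward zero, so muldiv64_wide(-7, 3, 2) gives -10.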

How can I multiply and divide 64-bit ints accurately?

Write a = q*d + r with |r| < |d| (I'm assuming d != 0, otherwise the computation is meaningless anyway). Then (a*b*c)/d = q*b*c + (r*b*c)/d. If q*b*c overflows, the entire computation would overflow anyway, so either you don't care, or you have to check for overflow. r*b*c might still overflow, so we apply the same method again to avoid that:

int64_t q = a/d, r = a%d;
int64_t part1 = q*b*c;
int64_t q1 = (r*b)/d, r1 = (r*b)%d;
return part1 + q1*c + (r1*c)/d;
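
For reference, the fragment above can be wrapped into a complete function and tried on a case where a*b*c itself would not fit in 64 bits. This is only a sketch: muldiv3 is a hypothetical name, and it still assumes that r*b and r1*c do not overflow, which the answer does not guarantee for large d.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical wrapper around the fragment above: computes (a * b * c) / d.
// Assumes r * b and r1 * c stay within int64_t, which holds when d is small
// relative to the 64-bit range.
int64_t muldiv3(int64_t a, int64_t b, int64_t c, int64_t d)
{
    assert(d != 0);
    const int64_t q = a / d, r = a % d;            // a = q*d + r
    const int64_t part1 = q * b * c;               // exact contribution of q*d
    const int64_t q1 = (r * b) / d, r1 = (r * b) % d;  // r*b = q1*d + r1
    return part1 + q1 * c + (r1 * c) / d;
}
```

For example, muldiv3(6000000000000000000, 3, 2, 9) yields 4000000000000000000 even though the full product 3.6e19 exceeds the int64_t range.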

Fast method to multiply integer by proper fraction without floats or overflow

I've now benchmarked several possible solutions, including weird/clever ones from other sources like combining 32-bit div & mod & add or using peasant math, and here are my conclusions:

First, if you are only targeting Windows and using VSC++, just use MulDiv(). It is quite fast (faster than directly using 64-bit variables in my tests) while still being just as accurate and rounding the result for you. I could not find any superior method to do this kind of thing on Windows with VSC++, even taking into account restrictions like unsigned-only and N <= D.

However, in my case having a function with deterministic results even across platforms is even more important than speed. On another platform I was using as a test, the 64-bit divide is much, much slower than the 32-bit one when using the 32-bit libraries, and there is no MulDiv() to use. The 64-bit divide on this platform takes ~26x as long as a 32-bit divide (yet the 64-bit multiply is just as fast as the 32-bit version...).

So if your case is like mine, here are the best results I got, which turned out to be just optimizations of chux's answer.

Both of the methods I will share below make use of the following function (though the compiler-specific intrinsics only actually helped in speed with MSVC in Windows):

inline u32 bitsRequired(u32 val)
{
#ifdef _MSC_VER
DWORD r = 0;
_BitScanReverse(&r, val | 1);
return r+1;
#elif defined(__GNUC__) || defined(__clang__)
return 32 - __builtin_clz(val | 1);
#else
int r = 1;
while (val >>= 1) ++r;
return r;
#endif
}
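
To make the return values concrete, here is a small sketch of the portable fallback branch on its own. The u32 typedef is an assumption (the answer does not show how u32 is defined), and bitsRequiredPortable is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

typedef uint32_t u32;  // assumed meaning of the u32 alias used in the answer

// Portable branch only: number of bits needed to represent val,
// with 0 treated as needing 1 bit (matching the "val | 1" trick above).
inline u32 bitsRequiredPortable(u32 val)
{
    u32 r = 1;
    while (val >>= 1) ++r;
    return r;
}
```

With this definition 0 and 1 both need 1 bit, 255 needs 8, 256 needs 9, and 0xFFFFFFFF needs 32.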

Now, if x is a constant that's 16-bit in size or smaller and you can pre-compute the bits required, I found the best results in speed and accuracy from this function:

u32 multConstByPropFrac(u32 x, u32 nMaxBits, u32 n, u32 d)
{
//assert(nMaxBits == 32 - bitsRequired(x));
//assert(n <= d);
const int bitShift = bitsRequired(n) - nMaxBits;
if( bitShift > 0 )
{
n >>= bitShift;
d >>= bitShift;
}

// Remove the + d/2 part if you don't need rounding
return (x * n + d/2) / d;
}

On the platform with the slow 64-bit divide, the above function ran ~16.75x as fast as return ((u64)x * n + d/2) / d; with an average accuracy of 99.999981% when tested with a million or so randomized inputs, roughly half of which would normally have overflowed. Worst-case accuracy was 99.951172%. Accuracy here compares the difference between the returned and expected values to the range of x; for example, returning +/-1 from the expected value when x is 2048 would be 100 - (1/2048 * 100) = 99.95% accurate.
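
The accuracy figure described above can be written down as a small helper. This is a sketch of the metric as I read it from the description, not the author's benchmark code; accuracyPercent is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper mirroring the accuracy figure described above:
// the absolute error is measured relative to x, the value being scaled.
double accuracyPercent(uint32_t expected, uint32_t actual, uint32_t x)
{
    assert(x != 0);
    const double diff = expected > actual ? double(expected - actual)
                                          : double(actual - expected);
    return 100.0 - diff / x * 100.0;
}
```

An error of 1 with x = 2048 gives 99.951171875, which lines up with the quoted worst case of 99.951172%.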

For the general use case, I found the best results from the following (and without needing to restrict N <= D to boot!):

u32 scaleToFraction(u32 x, u32 n, u32 d)
{
u32 bits = bitsRequired(x);
int bitShift = bits - 16;
if( bitShift < 0 ) bitShift = 0;
int sh = bitShift;
x >>= bitShift;

bits = bitsRequired(n);
bitShift = bits - 16;
if( bitShift < 0 ) bitShift = 0;
sh += bitShift;
n >>= bitShift;

bits = bitsRequired(d);
bitShift = bits - 16;
if( bitShift < 0 ) bitShift = 0;
sh -= bitShift;
d >>= bitShift;

// Remove the + d/2 part if you don't need rounding
u32 r = (x * n + d/2) / d;
if( sh < 0 )
r >>= (-sh);
else //if( sh > 0 )
r <<= sh;

return r;
}

On the platform with the slow 64-bit divide, the above function ran ~18.5x as fast as using 64-bit variables and with 99.999426% average and 99.947479% worst-case accuracy.
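
For comparison, the 64-bit baseline that both functions are measured against can be sketched as a standalone function. The u32 and u64 typedefs are assumptions about the aliases used in the answer, and scaleToFraction64 is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

typedef uint32_t u32;  // assumed aliases; the answer does not show them
typedef uint64_t u64;

// Exact reference: widen to 64 bits so x * n cannot overflow, then round
// to nearest by adding d/2 before dividing.
u32 scaleToFraction64(u32 x, u32 n, u32 d)
{
    assert(d != 0);
    // The caller must ensure the result fits in 32 bits (always true for n <= d).
    return static_cast<u32>((static_cast<u64>(x) * n + d / 2) / d);
}
```

This is the version to use when the 64-bit divide is not a bottleneck, since it has no precision loss from shifting.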

I was able to get more speed or more accuracy by messing with the shifting, such as trying to not shift all the way down to 16-bit if it wasn't strictly necessary, but any increase in speed came at a high cost in accuracy and vice versa.

None of the other methods I tested came even close to the same speed or accuracy, most being slower than just using the 64-bit method or having huge loss in precision, so not worth going into.

Obviously, no guarantee that anyone else will get similar results on other platforms!

EDIT: Replaced some bit-twiddling hacks with plain code that actually ran faster anyway by letting the compiler do its job.

Most accurate way to do a combined multiply-and-divide operation in 64-bit?

Since this is tagged Visual C++ I'll give a solution that abuses MSVC-specific intrinsics.

This example is fairly complicated. It's a highly simplified version of the same algorithm that is used by GMP and java.math.BigInteger for large division.

Although I have a simpler algorithm in mind, it's probably about 30x slower.

This solution has the following constraints/behavior:

  • It requires x64. It will not compile on x86.
  • The divisor must not be zero.
  • The quotient saturates if it overflows 64-bits.

Note that this is for the unsigned integer case. It's trivial to build a wrapper around this to make it work for signed cases as well. This example should also produce correctly truncated results.

This code is not fully tested. However, it has passed all the test cases that I've thrown at it.
(Even cases that I've intentionally constructed to try to break the algorithm.)

#include <intrin.h>
#include <cstdint>
#include <iostream>

using std::cout;
using std::endl;

uint64_t muldiv2(uint64_t a, uint64_t b, uint64_t c){
// Normalize divisor
unsigned long shift;
_BitScanReverse64(&shift,c);
shift = 63 - shift;

c <<= shift;

// Multiply
a = _umul128(a,b,&b);
if (((b << shift) >> shift) != b){
cout << "Overflow" << endl;
return 0xffffffffffffffff;
}
b = __shiftleft128(a,b,shift);
a <<= shift;

uint32_t div;
uint32_t q0,q1;
uint64_t t0,t1;

// 1st Reduction
div = (uint32_t)(c >> 32);
t0 = b / div;
if (t0 > 0xffffffff)
t0 = 0xffffffff;
q1 = (uint32_t)t0;
while (1){
t0 = _umul128(c,(uint64_t)q1 << 32,&t1);
if (t1 < b || (t1 == b && t0 <= a))
break;
q1--;
// cout << "correction 0" << endl;
}
b -= t1;
if (t0 > a) b--;
a -= t0;

if (b > 0xffffffff){
cout << "Overflow" << endl;
return 0xffffffffffffffff;
}

// 2nd reduction
t0 = ((b << 32) | (a >> 32)) / div;
if (t0 > 0xffffffff)
t0 = 0xffffffff;
q0 = (uint32_t)t0;

while (1){
t0 = _umul128(c,q0,&t1);
if (t1 < b || (t1 == b && t0 <= a))
break;
q0--;
// cout << "correction 1" << endl;
}

// // (a - t0) gives the modulus.
// a -= t0;

return ((uint64_t)q1 << 32) | q0;
}

Note that if you don't need a perfectly truncated result, you can remove the last loop completely. If you do this, the answer will be no more than 2 larger than the correct quotient.

Test Cases:

cout << muldiv2(4984198405165151231,6132198419878046132,9156498145135109843) << endl;
cout << muldiv2(11540173641653250113, 10150593219136339683, 13592284235543989460) << endl;
cout << muldiv2(449033535071450778, 3155170653582908051, 4945421831474875872) << endl;
cout << muldiv2(303601908757, 829267376026, 659820219978) << endl;
cout << muldiv2(449033535071450778, 829267376026, 659820219978) << endl;
cout << muldiv2(1234568, 829267376026, 1) << endl;
cout << muldiv2(6991754535226557229, 7798003721120799096, 4923601287520449332) << endl;
cout << muldiv2(9223372036854775808, 2147483648, 18446744073709551615) << endl;
cout << muldiv2(9223372032559808512, 9223372036854775807, 9223372036854775807) << endl;
cout << muldiv2(9223372032559808512, 9223372036854775807, 12) << endl;
cout << muldiv2(18446744073709551615, 18446744073709551615, 9223372036854775808) << endl;

Output:

3337967539561099935
8618095846487663363
286482625873293138
381569328444
564348969767547451
1023786965885666768
11073546515850664288
1073741824
9223372032559808512
Overflow
18446744073709551615
Overflow
18446744073709551615
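
For checking results on a non-MSVC compiler, and to illustrate the signed wrapper mentioned earlier, here is a minimal sketch. It is not the algorithm above: it swaps in the non-standard unsigned __int128 type (GCC, Clang) for the intrinsics, and muldivRef, magnitude and muldivSigned are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Portable stand-in for the unsigned routine (not the algorithm above).
// Assumes the non-standard unsigned __int128 type (GCC, Clang).
uint64_t muldivRef(uint64_t a, uint64_t b, uint64_t c)
{
    assert(c != 0);
    const unsigned __int128 q = static_cast<unsigned __int128>(a) * b / c;
    // Saturate when the quotient needs more than 64 bits.
    return q > UINT64_MAX ? UINT64_MAX : static_cast<uint64_t>(q);
}

// Magnitude of a signed value as unsigned; well defined even for INT64_MIN.
static uint64_t magnitude(int64_t v)
{
    return v < 0 ? 0 - static_cast<uint64_t>(v) : static_cast<uint64_t>(v);
}

// Hypothetical signed wrapper: work on magnitudes, then restore the sign.
// The caller must ensure the true quotient fits in int64_t.
int64_t muldivSigned(int64_t a, int64_t b, int64_t c)
{
    const bool negative = ((a < 0) != (b < 0)) != (c < 0);
    const uint64_t q = muldivRef(magnitude(a), magnitude(b), magnitude(c));
    return static_cast<int64_t>(negative ? 0 - q : q);
}
```

The unsigned stand-in reproduces the listed outputs for the inputs I checked by hand, for example muldivRef(1234568, 829267376026, 1) gives 1023786965885666768, and the wrapper truncates toward zero, so muldivSigned(-7, 3, 2) is -10.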

Divide 64-bit integers as though the dividend is shifted left 64 bits, without having 128-bit types

This can be done without a multi-word division

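Suppose we want to compute ⌊2^64 × x/y⌋. Then we can transform the expression like this, where {t} denotes the fractional part of t:

$$\lfloor 2^{64} x / y \rfloor = \lfloor (\lfloor 2^{64}/y \rfloor + \{2^{64}/y\})\, x \rfloor = \lfloor 2^{64}/y \rfloor\, x + \lfloor \{2^{64}/y\}\, x \rfloor$$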

The first term is trivially done as ((-y)/y + 1)*x as per this question How to compute 2⁶⁴/n in C?
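
The (-y)/y + 1 trick works because unsigned arithmetic wraps modulo 2^64, so -y is really 2^64 - y, and (2^64 - y)/y + 1 equals ⌊2^64/y⌋. A minimal sketch, with floorPow64Div as a hypothetical name:

```cpp
#include <cassert>
#include <cstdint>

// Computes floor(2^64 / y) using only 64-bit unsigned arithmetic.
// Requires y > 1: for y == 1 the true quotient is 2^64, which does not fit
// (the expression would wrap around to 0).
uint64_t floorPow64Div(uint64_t y)
{
    assert(y > 1);
    // 0 - y wraps to 2^64 - y; dividing by y and adding 1 restores the quotient.
    return (0 - y) / y + 1;
}
```

For instance, floorPow64Div(2) is 2^63 and floorPow64Div(3) is 6148914691236517205.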

The second term is equivalent to (2^64 % y)/y*x and is a little bit trickier. I've tried various ways, but all of them need 128-bit multiplication and 128/64 division if only integer operations are used. That can be done with the algorithms for calculating MulDiv64(a, b, c) = a*b/c in the questions below:

  • Most accurate way to do a combined multiply-and-divide operation in 64-bit?
  • How to multiply a 64 bit integer by a fraction in C++ while minimizing error?
  • (a * b) / c MulDiv and dealing with overflow from intermediate multiplication
  • How can I multiply and divide 64-bit ints accurately?

However, they may be slow, and if you have those functions you can calculate the whole expression more easily, like MulDiv64(x, UINT64_MAX, y) + x/y + something, without bothering with the above transformation.

Using long double seems to be the easiest way if it has 64 bits of precision or more. So now it can be done by (2^64 % y)/(long double)y*x

uint64_t divHi64(uint64_t x, uint64_t y) {
uint64_t mod_y = UINT64_MAX % y + 1;
uint64_t result = ((-y)/y + 1)*x;
if (mod_y != y)
result += (uint64_t)((mod_y/(long double)y)*x);
return result;
}

The overflow check was omitted for simplicity. A slight modification will be needed if you need signed division.


If you're targeting 64-bit Windows with MSVC, which doesn't have __int128, it now has a 128-bit/64-bit divide intrinsic that simplifies the job significantly without a 128-bit integer type. You still need to handle overflow, though, because the div instruction will throw an exception in that case.

uint64_t divHi64(uint64_t x, uint64_t y) {
uint64_t high, remainder;
uint64_t low = _umul128(UINT64_MAX, y, &high);
if (x <= high /* && 0 <= low */)
return _udiv128(x, 0, y, &remainder);
// overflow case
errno = EOVERFLOW;
return 0;
}

The overflow check above can be simplified to checking whether x < y, because if x >= y the result will overflow.
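
On compilers without _udiv128, the same x < y check can be paired with a 128-bit reference division. The following is a sketch assuming the non-standard unsigned __int128 type (GCC, Clang); divHi64Overflows and divHi64Wide are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// True when floor(2^64 * x / y) does not fit in 64 bits, or y is zero.
// The quotient fits exactly when x < y.
bool divHi64Overflows(uint64_t x, uint64_t y)
{
    return y == 0 || x >= y;
}

// Reference computation of floor(2^64 * x / y).
// Assumes the non-standard unsigned __int128 type (GCC, Clang).
uint64_t divHi64Wide(uint64_t x, uint64_t y)
{
    assert(!divHi64Overflows(x, y));
    const unsigned __int128 shifted = static_cast<unsigned __int128>(x) << 64;
    return static_cast<uint64_t>(shifted / y);
}
```

For example, divHi64Wide(1, 2) is 2^63, while x = 3, y = 3 is rejected because the true result would be 2^64.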


See also

  • Efficient Multiply/Divide of two 128-bit Integers on x86 (no 64-bit)
  • Efficient computation of 2**64 / divisor via fast floating-point reciprocal

Exhaustive tests on 16/16-bit division show that my solution works correctly for all cases. However, you do need double even though float has more than 16 bits of precision; otherwise a result that is one less than the correct value will occasionally be returned. That may be fixed by adding an epsilon value before truncating: (uint64_t)((mod_y/(long double)y)*x + epsilon). This means you'll need __float128 (or the -m128bit-long-double option) in gcc for precise 64/64-bit output if you don't correct the result with epsilon. However, that type is available on 32-bit targets, unlike __int128, which is supported only on 64-bit targets, so life will be a bit easier. Of course, you can use the function as-is if just a very close result is needed.

Below is the code I've used for verifying

#include <cstdint>
#include <thread>
#include <iostream>
#include <limits>
#include <climits>
#include <mutex>

std::mutex print_mutex;

#define MAX_THREAD 8
#define NUM_BITS 27
#define CHUNK_SIZE (1ULL << NUM_BITS)

// typedef uint32_t T;
// typedef uint64_t T2;
// typedef double D;
typedef uint64_t T;
typedef unsigned __int128 T2; // the type twice as wide as T
typedef long double D;
// typedef __float128 D;
const D epsilon = 1e-14;
T divHi(T x, T y) {
T mod_y = std::numeric_limits<T>::max() % y + 1;
T result = ((-y)/y + 1)*x;
if (mod_y != y)
result += (T)((mod_y/(D)y)*x + epsilon);
return result;
}

void testdiv(T midpoint)
{
T begin = midpoint - CHUNK_SIZE/2;
T end = midpoint + CHUNK_SIZE/2;
for (T i = begin; i != end; i++)
{
T x = i & ((1 << NUM_BITS/2) - 1);
T y = CHUNK_SIZE/2 - (i >> NUM_BITS/2);
// if (y == 0)
// continue;
auto q1 = divHi(x, y);
T2 q2 = ((T2)x << sizeof(T)*CHAR_BIT)/y;
if (q2 != (T)q2)
{
// std::lock_guard<std::mutex> guard(print_mutex);
// std::cout << "Overflowed: " << x << '&' << y << '\n';
continue;
}
else if (q1 != q2)
{
std::lock_guard<std::mutex> guard(print_mutex);
std::cout << x << '/' << y << ": " << q1 << " != " << (T)q2 << '\n';
}
}
std::lock_guard<std::mutex> guard(print_mutex);
std::cout << "Done testing [" << begin << ", " << end << "]\n";
}

uint16_t divHi16(uint32_t x, uint32_t y) {
uint32_t mod_y = std::numeric_limits<uint16_t>::max() % y + 1;
int result = ((((1U << 16) - y)/y) + 1)*x;
if (mod_y != y)
result += (mod_y/(double)y)*x;
return result;
}

void testdiv16(uint32_t begin, uint32_t end)
{
for (uint32_t i = begin; i != end; i++)
{
uint32_t y = i & 0xFFFF;
if (y == 0)
continue;
uint32_t x = i & 0xFFFF0000;
uint32_t q2 = x/y;
if (q2 > 0xFFFF) // overflowed
continue;

uint16_t q1 = divHi16(x >> 16, y);
if (q1 != q2)
{
std::lock_guard<std::mutex> guard(print_mutex);
std::cout << x << '/' << y << ": " << q1 << " != " << q2 << '\n';
}
}
}

int main()
{
std::thread t[MAX_THREAD];
for (int i = 0; i < MAX_THREAD; i++)
t[i] = std::thread(testdiv, std::numeric_limits<T>::max()/MAX_THREAD*i);
for (int i = 0; i < MAX_THREAD; i++)
t[i].join();

std::thread t2[MAX_THREAD];
constexpr uint32_t length = std::numeric_limits<uint32_t>::max()/MAX_THREAD;
uint32_t begin, end = length;

for (int i = 0; i < MAX_THREAD - 1; i++)
{
begin = end;
end += length;
t2[i] = std::thread(testdiv16, begin, end);
}
t2[MAX_THREAD - 1] = std::thread(testdiv16, end, UINT32_MAX); // testdiv16 takes (begin, end); testdiv takes a single midpoint
for (int i = 0; i < MAX_THREAD; i++)
t2[i].join();
std::cout << "Done\n";
}

How can I compute a * b / c when both a and b are smaller than c, but a * b overflows?

I've established a solution that works in O(1) complexity (no loops):

typedef unsigned long long uint;

typedef struct
{
uint n;
uint d;
}
fraction;

uint func(uint a, uint b, uint c);
fraction reducedRatio(uint n, uint d, uint max);
fraction normalizedRatio(uint a, uint b, uint scale);
fraction accurateRatio(uint a, uint b, uint scale);
fraction toFraction(uint n, uint d);
uint roundDiv(uint n, uint d);

uint func(uint a, uint b, uint c)
{
uint hi = a > b ? a : b;
uint lo = a < b ? a : b;
fraction f = reducedRatio(hi, c, (uint)(-1) / lo);
return f.n * lo / f.d;
}

fraction reducedRatio(uint n, uint d, uint max)
{
fraction f = toFraction(n, d);
if (n > max || d > max)
f = normalizedRatio(n, d, max);
if (f.n != f.d)
return f;
return toFraction(1, 1);
}

fraction normalizedRatio(uint a, uint b, uint scale)
{
if (a <= b)
return accurateRatio(a, b, scale);
fraction f = accurateRatio(b, a, scale);
return toFraction(f.d, f.n);
}

fraction accurateRatio(uint a, uint b, uint scale)
{
uint maxVal = (uint)(-1) / scale;
if (a > maxVal)
{
uint c = a / (maxVal + 1) + 1;
a /= c; // we can now safely compute `a * scale`
b /= c;
}
if (a != b)
{
uint n = a * scale;
uint d = a + b; // can overflow
if (d >= a) // no overflow in `a + b`
{
uint x = roundDiv(n, d); // we can now safely compute `scale - x`
uint y = scale - x;
return toFraction(x, y);
}
if (n < b - (b - a) / 2)
{
return toFraction(0, scale); // `a * scale < (a + b) / 2 < MAXUINT256 < a + b`
}
return toFraction(1, scale - 1); // `(a + b) / 2 < a * scale < MAXUINT256 < a + b`
}
return toFraction(scale / 2, scale / 2); // allow reduction to `(1, 1)` in the calling function
}

fraction toFraction(uint n, uint d)
{
fraction f = {n, d};
return f;
}

uint roundDiv(uint n, uint d)
{
return n / d + n % d / (d - d / 2);
}

Here is my test:

#include <stdio.h>

int main()
{
uint a = (uint)(-1) / 3; // 0x5555555555555555
uint b = (uint)(-1) / 2; // 0x7fffffffffffffff
uint c = (uint)(-1) / 1; // 0xffffffffffffffff
printf("0x%llx", func(a, b, c)); // 0x2aaaaaaaaaaaaaaa
return 0;
}
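
A note on the roundDiv helper above: it rounds to nearest without ever forming n + d/2, which could overflow for large n. The sketch below repeats the same formula under a different, hypothetical name (roundDivDemo) and with plain unsigned long long so that it stands alone.

```cpp
#include <cassert>
#include <climits>

// Same formula as roundDiv above: n / d, plus 1 when the remainder is at
// least ceil(d / 2). d - d / 2 is ceil(d / 2), so no addition to n is needed.
unsigned long long roundDivDemo(unsigned long long n, unsigned long long d)
{
    assert(d != 0);
    return n / d + n % d / (d - d / 2);
}
```

So 7/2 rounds to 4, 4/3 rounds to 1, and ULLONG_MAX/2 rounds to 2^63 even though ULLONG_MAX + 1 would wrap.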

Multiply 64b x 32b divide by 64b integers

I ended up with a specific solution; it accepts frequencies even above 32 bits.

static uint64_t counter_and_freq_to_nanotime(uint64_t counter, uint64_t freq)
{
uint32_t div = 1, freq32;
uint64_t q, r;

while (freq >= (1ull << 32)) {
freq /= 2;
div *= 2;
}
freq32 = freq;

q = counter / freq32;
r = counter % freq32;
return (q * NANOS_IN_SEC + (r * NANOS_IN_SEC) / freq32) * div;
}

Quick benchmark (E5-2699v4, Win7 x64):

  • MFllMulDiv: ~50 ns
  • this solution: ~1.5 ns
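
To show why the quotient/remainder split matters, here is a small sketch comparing it with the naive multiply-first form for a 32-bit frequency. The constant value is an assumption (the answer uses NANOS_IN_SEC without showing its definition), and both function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumed value; the answer uses NANOS_IN_SEC without showing its definition.
static const uint64_t kNanosInSec = 1000000000ull;

// Naive form: counter * kNanosInSec wraps once counter exceeds roughly 1.8e10.
uint64_t ticksToNanosNaive(uint64_t counter, uint32_t freq)
{
    assert(freq != 0);
    return counter * kNanosInSec / freq;
}

// Split form, as in the answer: r < freq < 2^32, so r * kNanosInSec stays
// below about 4.3e18 and cannot overflow 64 bits.
uint64_t ticksToNanosSplit(uint64_t counter, uint32_t freq)
{
    assert(freq != 0);
    const uint64_t q = counter / freq;
    const uint64_t r = counter % freq;
    return q * kNanosInSec + r * kNanosInSec / freq;
}
```

With a 10 MHz counter, 25,000,000 ticks convert to 2,500,000,000 ns either way, but at 10^13 ticks only the split form returns the correct 10^15 ns.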

(a * b) / c MulDiv and dealing with overflow from intermediate multiplication

I've been tinkering with an approach that (1) multiplies a and b with the school algorithm on 21-bit limbs, then (2) does long division by c, using an unusual representation of the residual a*b - c*q that stores the high-order bits in a double and the low-order bits in a long. I don't know if it can be made competitive with standard long division, but for your enjoyment:

public class MulDiv {
public static void main(String[] args) {
java.util.Random r = new java.util.Random();
for (long i = 0; true; i++) {
if (i % 1000000 == 0) {
System.err.println(i);
}
long a = r.nextLong() >> (r.nextInt(8) * 8);
long b = r.nextLong() >> (r.nextInt(8) * 8);
long c = r.nextLong() >> (r.nextInt(8) * 8);
if (c == 0) {
continue;
}
long x = mulDiv(a, b, c);
java.math.BigInteger aa = java.math.BigInteger.valueOf(a);
java.math.BigInteger bb = java.math.BigInteger.valueOf(b);
java.math.BigInteger cc = java.math.BigInteger.valueOf(c);
java.math.BigInteger xx = aa.multiply(bb).divide(cc);
if (java.math.BigInteger.valueOf(xx.longValue()).equals(xx) && x != xx.longValue()) {
System.out.printf("a=%d b=%d c=%d: %d != %s\n", a, b, c, x, xx);
}
}
}

// Returns truncate(a b/c), subject to the precondition that the result is
// defined and can be represented as a long.
private static long mulDiv(long a, long b, long c) {
// Decompose a.
long a2 = a >> 42;
long a10 = a - (a2 << 42);
long a1 = a10 >> 21;
long a0 = a10 - (a1 << 21);
assert a == (((a2 << 21) + a1) << 21) + a0;
// Decompose b.
long b2 = b >> 42;
long b10 = b - (b2 << 42);
long b1 = b10 >> 21;
long b0 = b10 - (b1 << 21);
assert b == (((b2 << 21) + b1) << 21) + b0;
// Compute a b.
long ab4 = a2 * b2;
long ab3 = a2 * b1 + a1 * b2;
long ab2 = a2 * b0 + a1 * b1 + a0 * b2;
long ab1 = a1 * b0 + a0 * b1;
long ab0 = a0 * b0;
// Compute a b/c.
DivBy d = new DivBy(c);
d.shift21Add(ab4);
d.shift21Add(ab3);
d.shift21Add(ab2);
d.shift21Add(ab1);
d.shift21Add(ab0);
return d.getQuotient();
}
}

public strictfp class DivBy {
// Initializes n <- 0.
public DivBy(long d) {
di = d;
df = (double) d;
oneOverD = 1.0 / df;
}

// Updates n <- 2^21 n + i. Assumes |i| <= 3 (2^42).
public void shift21Add(long i) {
// Update the quotient and remainder.
q <<= 21;
ri = (ri << 21) + i;
rf = rf * (double) (1 << 21) + (double) i;
reduce();
}

// Returns truncate(n/d).
public long getQuotient() {
while (rf != (double) ri) {
reduce();
}
// Round toward zero.
if (q > 0) {
if ((di > 0 && ri < 0) || (di < 0 && ri > 0)) {
return q - 1;
}
} else if (q < 0) {
if ((di > 0 && ri > 0) || (di < 0 && ri < 0)) {
return q + 1;
}
}
return q;
}

private void reduce() {
// x is approximately r/d.
long x = Math.round(rf * oneOverD);
q += x;
ri -= di * x;
rf = repairLowOrderBits(rf - df * (double) x, ri);
}

private static double repairLowOrderBits(double f, long i) {
int e = Math.getExponent(f);
if (e < 64) {
return (double) i;
}
long rawBits = Double.doubleToRawLongBits(f);
long lowOrderBits = (rawBits >> 63) ^ (rawBits << (e - 52));
return f + (double) (i - lowOrderBits);
}

private final long di;
private final double df;
private final double oneOverD;
private long q = 0;
private long ri = 0;
private double rf = 0;
}
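
The multiplication half of this approach can be sketched in C++ to show why 21-bit limbs are convenient: every partial product is at most 42 bits, so each coefficient fits comfortably in a 64-bit integer. This is only an illustration restricted to non-negative inputs; it recombines with the non-standard __int128 type (GCC, Clang) instead of the DivBy class, and the names Limbs21, split21 and mulViaLimbs are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Three 21-bit limbs of a non-negative 63-bit value, least significant first.
struct Limbs21 { int64_t l0, l1, l2; };

Limbs21 split21(int64_t v)
{
    assert(v >= 0);  // sketch only: negative inputs are not handled here
    const int64_t mask = (int64_t(1) << 21) - 1;
    return Limbs21{ v & mask, (v >> 21) & mask, v >> 42 };
}

// School multiplication on the limbs. Each coefficient is a sum of at most
// three products below 2^42, so it stays below 2^44.
// Assumes the non-standard __int128 type for the recombined result.
__int128 mulViaLimbs(int64_t a, int64_t b)
{
    const Limbs21 x = split21(a), y = split21(b);
    const int64_t c[5] = {
        x.l0 * y.l0,
        x.l1 * y.l0 + x.l0 * y.l1,
        x.l2 * y.l0 + x.l1 * y.l1 + x.l0 * y.l2,
        x.l2 * y.l1 + x.l1 * y.l2,
        x.l2 * y.l2
    };
    __int128 result = 0;
    for (int k = 4; k >= 0; --k)
        result = result * (int64_t(1) << 21) + c[k];  // same order as shift21Add
    return result;
}
```

The Horner-style loop feeds the coefficients from most to least significant, which is the order in which mulDiv passes ab4 through ab0 to shift21Add.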

