Round Vector of Numerics to Integer While Preserving Their Sum

Round vector of numerics to integer while preserving their sum

In an even simpler form, I would say this algorithm is:

  1. Start with everything rounded down
  2. Round up the numbers with the highest fractional parts until the desired sum is reached.

This can be implemented in a vectorized way in R by:

  1. Round down with floor
  2. Order numbers by their fractional parts (using order)
  3. Use tail to grab the indices of the elements with the k largest fractional parts, where k is the amount that we need to increase the sum to reach our target value
  4. Increment the output value in each of these indices by 1

In code:

smart.round <- function(x) {
y <- floor(x)
indices <- tail(order(x-y), round(sum(x)) - sum(y))
y[indices] <- y[indices] + 1
y
}
v
# [1] 2.655087 3.721239 5.728534 9.082078 3.813063
sum(v)
# [1] 25
smart.round(v)
# [1] 2 4 6 9 4
sum(smart.round(v))
# [1] 25

How to round floats to integers while preserving their sum?

Here is one algorithm which should accomplish the task. The main difference to other algorithms is that this one rounds the numbers in correct order always. Minimizing roundoff error.

The language is some pseudo language which probably derived from JavaScript or Lua. Should explain the point. Note the one based indexing (which is nicer with x to y for loops. :p)

// Temp array with same length as fn.
tempArr = Array(fn.length)

// Calculate the expected sum.
arraySum = sum(fn)

lowerSum = 0
-- Populate temp array.
for i = 1 to fn.lengthf
tempArr[i] = { result: floor(fn[i]), // Lower bound
difference: fn[i] - floor(fn[i]), // Roundoff error
index: i } // Original index

// Calculate the lower sum
lowerSum = lowerSum + tempArr[i].result
end for

// Sort the temp array on the roundoff error
sort(tempArr, "difference")

// Now arraySum - lowerSum gives us the difference between sums of these
// arrays. tempArr is ordered in such a way that the numbers closest to the
// next one are at the top.
difference = arraySum - lowerSum

// Add 1 to those most likely to round up to the next number so that
// the difference is nullified.
for i = (tempArr.length - difference + 1) to tempArr.length
tempArr.result = tempArr.result + 1
end for

// Optionally sort the array based on the original index.
array(sort, "index")

Round Float Vector to Integer Vector

This is an integer programming problem. Branch and Bound is a simple approach that can be quite efficient in practice with good bounding conditions.

I've implemented a simple Branch and Bound algorithm. The main idea is to try the next higher and lower integer for each member of the array. At each step, we try whichever one produces less loss first. Once we've found a potential solution, we keep it if the solution is better than the best we've found so far, then we backtrack. If at any point we find that the loss of our partial solution is worse than the best total loss we've found, then we can prune that branch.

Here is a Python implementation of a basic brand-and-bound solution. There are many ways it could be optimized further, but this shows the basic idea:

from sys import stderr, stdout
from math import floor, ceil
import random

def debug(msg):
#stderr.write(msg)
pass

def loss(pf,pi):
return sum((pf[i]-pi[i])**2 for i in range(0,len(pf)))

class Solver:
def __init__(self,pf):
n = len(pf)
self.pi = [0]*n
self.pf = pf
self.min_loss = n
self.min_pi = None
self.attempt_count = 0

def test(self):
"""Test a full solution"""
pi = self.pi
pf = self.pf
assert sum(pi)==0
l = loss(pf,pi)
debug('%s: %s\n'%(l,pi))
if l<self.min_loss:
self.min_loss = l
self.min_pi = pi[:]

def attempt(self,i,value):
"""Try adding value to the partial solution"""
self.pi[i] = int(value)
self.extend(i+1)
self.attempt_count += 1

def extend(self,i):
"""Extend the partial solution"""
partial = self.pi[:i]
loss_so_far = loss(self.pf[:i],partial)
debug('%s: pi=%s\n'%(loss_so_far,partial))
if loss_so_far>=self.min_loss:
return
if i==len(self.pf)-1:
self.pi[i] = -sum(partial)
self.test()
return
value = self.pf[i]
d = value-floor(value)
if d<0.5:
# The the next lower integer first, since that causes less loss
self.attempt(i,floor(value))
self.attempt(i,ceil(value))
else:
# Try the next higher integer first
self.attempt(i,ceil(value))
self.attempt(i,floor(value))

def exampleInput(seed):
random.seed(seed)
n = 10
p = [random.uniform(-100,100) for i in range(0,n)]
average = sum(p)/n
pf = [x-average for x in p]
return pf

input = exampleInput(42)
stdout.write('input=%s\n'%input)
stdout.write('sum(input)=%s\n'%sum(input))

solver=Solver(input)
solver.extend(0)

stdout.write('best solution: %s\n'%solver.min_pi)
stdout.write('sum(best): %s\n'%sum(solver.min_pi))
stdout.write('attempts: %s\n'%solver.attempt_count)
stdout.write('loss: %s\n'%loss(input,solver.min_pi))
assert sum(solver.min_pi)==0

Rounding off a list of numbers to a user-defined step while preserving their sum

  1. Round all numbers to the nearest multiple of roundOffStep normally.
  2. If the new sum is lower than the original sum, you're done.
  3. For each number, calculate rounded_number - original_number. Sort this list of differences in decreasing order so that you can find the numbers with the largest difference.
  4. Pick the number that gives the largest difference rounded_number - original_number, and subtract roundOffStep from that number.
  5. Repeat step 4 (picking the next largest difference each time) until the new sum is less than the original.

This process should ensure that the rounded numbers are as close as possible to the originals, without going over the original sum.

Example, with roundOffStep = 5:

    Original Numbers  |   Rounded  |  Difference
----------------------+------------+--------------
29.20 | 30 | 0.80
18.25 | 20 | 1.75
14.60 | 15 | 0.40
8.76 | 10 | 1.24
2.19 | 0 | -2.19
----------------------+------------+--------------
Sum: 73 | 75 |

The sum is too large, so we pick the number giving the largest difference (18.25 which was rounded to 20) and subtract 5 to give 15. Now the sum is 70, so we're done.

Running Rounding

I would do it this way, with simple base vectorised functions:

first calculate the running total of the original numbers, and the rounded value of that running total. Then find a list of numbers that add up to this rounded running total using diff() to see how each rounded sum is larger than the last.

cum.sum <- cumsum(x$numbers)
cum.sum.rounded <- round(cum.sum)
numbers.round <- diff(cum.sum.rounded)
numbers.round <- c(cum.sum.rounded[1], numbers.round)

Check that all is as you want it:

check.cs <- cumsum(numbers.round)
all( abs(check.cs - cum.sum) <=1 )
#TRUE

How to round numbers in a vector to only show necessary decimals in R?

We may use an extra S3 class:

print.myNumeric <- function(x) cat(x, sep = "  ")

Compare printing before and after adding the class:

vec
#> [1] 0.01 0.10 1.00 10.00

class(vec) <- c("myNumeric", class(vec))

vec
#> 0.01 0.1 1 10

Basic math preserves the class and the printing:

4 * vec + 1:4
#> 1.04 2.4 7 44

C: Round signed integer to nearest multiple

In integer arithmetic, if n is positive, add m/2, else subtract m/2, then divide by m (truncating integer divide), then multiply by m:

int round_to_nearest( int n, int m )
{
return (( n + ((n < 0) ? -m : m) / 2) / m ) * m ;
}
int main()
{
int test[] = {16, 23, 22, -23, -22} ;
int m = 5 ;

for( int i = 0; i < sizeof(test) / sizeof(*test); i++ )
{
printf(" round_to_nearest( %d, %d ) = %d\n", test[i], m,
round_to_nearest( test[i], m ) ) ;
}

return 0;
}

Output of test:

 round_to_nearest( 16, 5 ) = 15
round_to_nearest( 23, 5 ) = 25
round_to_nearest( 22, 5 ) = 20
round_to_nearest( -23, 5 ) = -25
round_to_nearest( -22, 5 ) = -20

One caveat is that m must be > 0 - which in this context makes sense, I would accept that as a precondition for correct operation; checking for it as a runtime error is probably unnecessary, but you might include an assert to protect against programmer semantic error:

assert( m > 0 ) ;

Standard library asserts are removed when NDEBUG is defined - normally when debug support is disabled.

rounding of digits

This is a consequence of the way R prints atomic vectors.

With the default digits option set to 7 as you likely have, any value between -1 and 1 will print with seven decimal places. Because of the way R prints atomic vectors, all other values in the vector will also have seven decimal places. Furthermore, a value of .6264538 with digits option set to 7 must print with eight digits (0.6264538) because it must have a leading zero. There are two of these values in your rnorm() vector.

If you look at cumsum(abs(rnorm(100)))[100] alone and you can see the difference (actually it becomes the same as printed value as sum(abs(rnorm(100))), although not exactly the same value).

sum(abs(rnorm(100)))
# [1] 71.67207
cumsum(abs(rnorm(100)))[100]
# [1] 71.67207

Notice that both of these values have seven digits. Probably the most basic example of this I can think of is as follows

0.123456789
#[1] 0.1234568
1.123456789
#[1] 1.123457
11.123456789
# [1] 11.12346
## and so on ...


Related Topics



Leave a reply



Submit