Round values while preserving their rounded sum in R

After an embarrassing teleconference in which I presented a series of percentages that did not sum to 100 (as they should have), I found some R code on stackoverflow.com to help me to avoid this in the future.

In general, the sum of rounded numbers (e.g., rounded with the base::round function) is not guaranteed to equal their rounded sum. For example:

> sum(c(0.333, 0.333, 0.334))
[1] 1
> sum(round(c(0.333, 0.333, 0.334), 2))
[1] 0.99

The Stack Overflow solution applies the following algorithm:

    1. Round down to the specified number of decimal places
    2. Order numbers by their remainder values
    3. Increment the specified decimal place of the values with the 'k' largest remainders, where 'k' is the number of values that must be incremented to preserve their rounded sum

Here's the corresponding R function:

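round_preserve_sum <- function(x, digits = 0) {
  up <- 10 ^ digits
  x <- x * up   # scale so the target decimal place becomes the units digit
  y <- floor(x) # step 1: round everything down
  # steps 2-3: pick the k values with the largest remainders, where
  # k = round(sum(x)) - sum(y) is the shortfall from the rounded sum
  indices <- tail(order(x - y), round(sum(x)) - sum(y))
  y[indices] <- y[indices] + 1
  y / up        # scale back to the original magnitude
}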
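
To see what each step does, here is a sketch that traces the algorithm on the example vector, one intermediate value at a time. The variable names (scaled, rounded_down, remainders, k) are labels chosen here for illustration and do not come from the Stack Overflow answer.

```r
# Trace of the three steps on the running example, rounding to 2 decimals.
x <- c(0.333, 0.333, 0.334)
digits <- 2

up <- 10 ^ digits
scaled <- x * up                      # approximately 33.3 33.3 33.4

# Step 1: round down at the target decimal place.
rounded_down <- floor(scaled)         # 33 33 33, which sums to 99

# Step 2: the remainders decide who gets incremented.
remainders <- scaled - rounded_down   # approximately 0.3 0.3 0.4

# Step 3: k is the shortfall between the rounded sum and the floored sum.
k <- round(sum(scaled)) - sum(rounded_down)   # 100 - 99 = 1
indices <- tail(order(remainders), k)         # 3, the largest remainder

rounded_down[indices] <- rounded_down[indices] + 1
result <- rounded_down / up
print(result)
# [1] 0.33 0.33 0.34
```

Only the third value is bumped, because it has the largest remainder and only one increment is needed.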

Continuing with the example:

> sum(c(0.333, 0.333, 0.334))
[1] 1
> sum(round(c(0.333, 0.333, 0.334), 2))
[1] 0.99
> sum(round_preserve_sum(c(0.333, 0.333, 0.334), 2))
[1] 1
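
The same function can be applied to whole-number percentages, which is the situation that motivated this post. The sketch below assumes some made-up counts (133, 264 and 603), chosen only so that plain rounding falls short of 100; they are not real data.

```r
round_preserve_sum <- function(x, digits = 0) {
  up <- 10 ^ digits
  x <- x * up
  y <- floor(x)
  indices <- tail(order(x - y), round(sum(x)) - sum(y))
  y[indices] <- y[indices] + 1
  y / up
}

# Hypothetical counts, invented for illustration only.
counts <- c(133, 264, 603)
pct <- 100 * counts / sum(counts)     # approximately 13.3 26.4 60.3

plain <- round(pct)
print(plain)
# [1] 13 26 60
print(sum(plain))
# [1] 99

adjusted <- round_preserve_sum(pct)   # digits = 0: whole percentages
print(adjusted)
# [1] 13 27 60
print(sum(adjusted))
# [1] 100
```

The second value is the one incremented, since 26.4 has the largest remainder. When several remainders are tied, which of them gets incremented depends on how order() breaks ties, so the choice is arbitrary in that case.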

10 thoughts on “Round values while preserving their rounded sum in R”

  1. Neat idea! That code looks a bit odd though. When was the ">-" operator introduced? I must have missed that one. Perhaps the code should be as below (which seems to work):

    round_preserve_sum <- function(x, digits = 0) {
      up <- 10 ^ digits
      x <- x * up
      y <- floor(x)
      indices <- tail(order(x - y), round(sum(x)) - sum(y))
      y[indices] <- y[indices] + 1
      y / up
    }

  2. Most folks just include the standard disclaimer "does not sum to 100% due to rounding error" and move on 🙂. But may I ask: why did you round any variable prior to processing? That was one of the first lessons we got in undergrad physics: always carry at least two extra digits' precision until the end of the calculation. Never round in the middle.
    I read thru the linked SO post and links therein, and it still looks to me like all of this is a kludge to "fix" poorly designed processing.

    1. "But may I ask: why did you round any variable prior to processing?"

      The rounding discussed here occurs at the presentation stage (after any other processing). I agree that the greatest practical precision (e.g., double-precision floating point) should be used until then.

    2. "*Never* round in the middle of a calculation. You only round the final answer." was what I was taught in high school. I remember there were some classic examples of this, at least pre-internet; I'm not sure what is on the interwebs nowadays.

      In my current line of work, higher-level managers who aren't as number-savvy don't want to see a table where "0.037% of the patients are Native Americans over the age of 55". They don't seem to want to see any percentage lower than "0.1%", which means (you guessed it) there is going to be rounding error.

  3. What about 0.335, 0.335, 0.33? In this case the rounded numbers sum to 101%, and a longer series of rounded-up numbers could overshoot by much more.

    1. Looks like it works:

      > sum(c(0.335,0.335,0.330))
      [1] 1
      > sum(round_preserve_sum(c(0.335,0.335,0.330)))
      [1] 1
