Number of Significant Digits in Dplyr Summarise

Automatic rounding in dplyr::summarise() function

This comes down to how tibbles are printed. The numbers in the data frame still have all their decimal places; they are just not displayed when the tibble is printed.

You can use as.data.frame() or print.data.frame(), which will show more digits (depending on your getOption("digits")). You can also change the tibble settings, but my understanding is that these are always based on significant figures rather than decimal places (so your values >100 will show fewer decimal places than values <100). See
https://tibble.tidyverse.org/reference/formatting.html for tibble printing options.
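To make the significant-figures versus decimal-places distinction concrete, here is a minimal base R sketch. The vector x is made up for illustration, and it uses signif() and round() rather than tibble's own printing code, so treat it as an analogy only.

```r
# Made-up values: one below 100, one above
x <- c(1.23456789, 123.456789)

# Three significant digits: the number of decimals depends on magnitude
sig <- signif(x, 3)

# Three decimal places: the same number of decimals for every value
dec <- round(x, 3)

print(sig)
print(dec)

# Neither call modifies x; the stored values keep their full precision
print(x, digits = 10)
```

The larger value loses all of its decimals under signif() but keeps three under round(), which is the same pattern you see when a tibble is printed.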

So

df %>% group_by(group) %>% summarise(mL = round(mean(large),3), mS = round(mean(small),3)) %>%
as.data.frame()

will give you values to 3 decimal places, and

df %>% group_by(group) %>% summarise(mL = mean(large), mS = mean(small))  %>%
as.data.frame()

will show up to getOption("digits") significant digits (7 is the default).

Also, if you want to apply the same function to multiple columns in summarise(), summarise_at() can be very helpful (it still works, although dplyr 1.0.0 and later recommend across() instead), e.g.

df %>% group_by(group) %>% summarise_at(c("large","small"), ~round(mean(.),3)) %>% 
print.data.frame()
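For dplyr 1.0.0 or later, the same idea can be sketched with across(). The data frame below is hypothetical, since the question's df is not shown; only the column names group, large and small are taken from the code above.

```r
library(dplyr)

# Hypothetical data standing in for the question's `df`
df <- data.frame(
  group = c("a", "a", "b", "b"),
  large = c(100.12345, 200.54321, 300.11111, 400.22222),
  small = c(0.12345, 0.54321, 0.11111, 0.22222)
)

# Round the group means of both columns to 3 decimal places,
# then convert to a plain data frame so printing follows getOption("digits")
res <- df %>%
  group_by(group) %>%
  summarise(across(c(large, small), ~ round(mean(.x), 3))) %>%
  as.data.frame()

res
```

The rounding happens in the data itself here, so it survives whichever print method is used afterwards.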

Decimal places not showing when using dplyr summarize function in R

The issue is an option in your environment that controls the number of digits displayed when printing. You can change it by running options(digits = 5), or any higher number (up to 22), in the console.

From ?options

digits:
controls the number of significant (see signif) digits to print when printing numeric values. It is a suggestion only. Valid values are 1...22 with default 7.

After doing that if you run

library(dplyr)
NLeast_starters %>% summarize(mean_hits = round(mean(H),5))

# mean_hits
#1 123.75

you'll get the expected display of decimal places.
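If you would rather not change the global option, a small base R sketch with format() shows the same effect per call. The numbers are made up, and format() returns character strings, so this is for display only.

```r
x <- 123.456789

# `digits` counts significant digits, not decimal places
five_sig  <- format(x, digits = 5)
seven_sig <- format(x, digits = 7)

# round() never pads with zeros; trailing zeros are dropped when printing
y <- round(123.75, 5)
plain  <- format(y, digits = 10)

# nsmall asks for a minimum number of decimal places instead
padded <- format(y, nsmall = 5)

c(five_sig, seven_sig, plain, padded)
```

This is also why round(mean(H), 5) can still print as 123.75: the value has only two non-zero decimals, so there is nothing more to show.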

r - rounding in summarise()

For the tibble package you need to modify the option pillar.sigfig.

pillar.sigfig: The number of significant digits that will be printed and highlighted, default: 3

library(tibble)
options(pillar.sigfig = 10)

set.seed(1)
tibble(a = rnorm(3), b = rexp(3))
# A tibble: 3 x 2
#               a            b
#           <dbl>        <dbl>
# 1 -0.6264538107 0.4360686258
# 2  0.1836433242 2.894968537
# 3 -0.8356286124 1.229562053
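Because options() changes a global setting, you may want to restore the previous value afterwards. A minimal sketch of that pattern, using only base R (no tibble is printed here, so the placeholder comment marks where your own printing would go):

```r
# options() returns the previous values, so they can be restored later
old <- options(pillar.sigfig = 10)
current <- getOption("pillar.sigfig")

# ... print your tibbles here ...

# Put the setting back the way it was (unset if it was unset before)
options(old)
restored <- getOption("pillar.sigfig")
```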

dplyr summarise character time variable

One option is lubridate::hms() to convert those strings to numbers, but I haven't found a clean way to format the result back as "%H:%M:%S", so here are two functions I have used for various related purposes:

## simply convert "01:23:45" to 5025 (seconds) and "00:17:14.842" to 1034.842
time2num <- function(x) {
  vapply(strsplit(x, ':'), function(y) sum(as.numeric(y) * c(60*60, 60, 1)),
         numeric(1), USE.NAMES=FALSE)
}

## and back again
num2time <- function(x, digits.secs = getOption("digits.secs", 3)) {
  hr <- as.integer(x %/% 3600)
  min <- as.integer((x - 3600*hr) %/% 60)
  sec <- (x - 3600*hr - 60*min)
  if (anyNA(digits.secs)) {
    # a mostly-arbitrary determination of significant digits,
    # motivated by @Roland https://stackoverflow.com/a/27767973
    for (digits.secs in 1:6) {
      if (any(abs(signif(sec, digits.secs) - sec) > (10^(-3 - digits.secs)))) next
      digits.secs <- digits.secs - 1L
      break
    }
  }
  sec <- sprintf(paste0("%02.", digits.secs[[1]], "f"), sec)
  sec <- paste0(ifelse(grepl("^[0-9]\\.", sec), "0", ""), sec)
  out <- sprintf("%02i:%02i:%s", hr, min, sec)
  out[is.na(x)] <- NA_character_
  out
}

With these,

library(dplyr)
dat %>%
  group_by(ID) %>%
  mutate(Freq = num2time(sum(time2num(Time)), digits.secs = 0)) %>%
  ungroup()
# # A tibble: 6 x 3
# ID Time Freq
# <int> <chr> <chr>
# 1 456 0:00:01 00:02:06
# 2 456 0:02:05 00:02:06
# 3 123 0:00:14 00:00:14
# 4 756 0:03:47 00:05:44
# 5 756 0:01:56 00:05:44
# 6 756 0:00:01 00:05:44

Data

dat <- structure(list(ID = c(456L, 456L, 123L, 756L, 756L, 756L), Time = c("0:00:01", "0:02:05", "0:00:14", "0:03:47", "0:01:56", "0:00:01")), class = "data.frame", row.names = c(NA, -6L))
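If you only have whole seconds and want to avoid extra packages, the same per-ID totals can be sketched in base R. The helpers to_seconds() and to_hms() are hypothetical names introduced for this example, not part of any package, and unlike num2time() they do not handle fractional seconds or NA.

```r
dat <- data.frame(
  ID = c(456L, 456L, 123L, 756L, 756L, 756L),
  Time = c("0:00:01", "0:02:05", "0:00:14", "0:03:47", "0:01:56", "0:00:01"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: "H:MM:SS" -> seconds
to_seconds <- function(x) {
  vapply(strsplit(x, ":", fixed = TRUE),
         function(p) sum(as.numeric(p) * c(3600, 60, 1)),
         numeric(1))
}

# Hypothetical helper: whole seconds -> "HH:MM:SS"
to_hms <- function(s) {
  s <- as.integer(round(s))
  sprintf("%02d:%02d:%02d", s %/% 3600L, (s %% 3600L) %/% 60L, s %% 60L)
}

# ave() repeats each group's total on every row of that group
dat$Freq <- to_hms(ave(to_seconds(dat$Time), dat$ID, FUN = sum))
dat
```

This should reproduce the Freq column shown above, for example 00:05:44 for ID 756.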

Correcting summary in R with appropriate # of digits of precision

The default for summary.data.frame is not digits=3, but rather:

   ... max(3, getOption("digits") - 3)  # set in the argument list
> getOption("digits") # the default setting
[1] 7
> options(digits=10)
> summary(df)
       V1                   V2                 V3
 Min.   :-3.70323584   Min.   :    11.0   Min.   :6.790622e-05
 1st Qu.:-0.66847105   1st Qu.:122798.5   1st Qu.:2.497735e-01
 Median : 0.00977831   Median :247971.0   Median :5.013797e-01
 Mean   : 0.01044752   Mean   :248776.4   Mean   :5.001182e-01
 3rd Qu.: 0.68878422   3rd Qu.:374031.0   3rd Qu.:7.502424e-01
 Max.   : 3.56810079   Max.   :499931.0   Max.   :9.998686e-01
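Instead of changing the global option, you can also pass digits to summary() for a single call. A minimal sketch with made-up data (the original df is not shown, so the columns here are only stand-ins):

```r
# Made-up data standing in for the `df` in the transcript above
set.seed(42)
df <- data.frame(V1 = rnorm(100), V2 = runif(100) * 5e5)

s_default <- summary(df)               # default number of digits
s_more    <- summary(df, digits = 10)  # more digits for this call only

s_more

# The global option is left untouched
getOption("digits")
```

The exact formatting of the output can differ between R versions, so compare s_default and s_more on your own installation.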

fix r sum() auto remove the small digital .05

I believe this is just a printout issue; the sum itself still contains the .05. If you want to control the number of decimal places in the printout, you could try:

sprintf("%.2f",sum(22068.00, 144501.00,  71153.00,  26193.05,  10395.00 , 80619.00))
# [1] "354929.05"

And to change the number of decimal places, just change the precision in the format string (the first argument), i.e.:

sprintf("%.10f",sum(22068.00, 144501.00,  71153.00,  26193.05,  10395.00 , 80619.00))
#[1] "354929.0500000000"
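Keep in mind that sprintf() returns a character string, so it is best used only at the display step. A short sketch of that, plus formatC() as an alternative when you also want a thousands separator:

```r
total <- sum(22068.00, 144501.00, 71153.00, 26193.05, 10395.00, 80619.00)

# Display-only conversions: both results are character strings
as_text  <- sprintf("%.2f", total)
with_sep <- formatC(total, format = "f", digits = 2, big.mark = ",")

# The underlying value is still numeric and still carries the .05
is.numeric(total)
print(total, digits = 8)

c(as_text, with_sep)
```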

