round_any Equivalent for dplyr

round_any equivalent for dplyr?

ggplot2::cut_width, as pointed out in one of the comments, does not even return a numeric vector but a factor, so it is no real substitute.

Since round, not floor, is the default rounding method, a custom replacement (until a dplyr solution arrives) would be:

round_any <- function(x, accuracy, f = round) { f(x / accuracy) * accuracy }

This function can also be used directly from the plyr package, which contains this implementation. However, be careful when loading plyr into a workspace, as it causes naming conflicts with dplyr.
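As a quick sanity check (the input vector below is made up purely for illustration), the helper behaves like plyr::round_any for the usual rounding functions:

# uses the round_any() defined above; x is just an illustrative vector
x <- c(0.03, 1.57, 2.42)
round_any(x, 0.5)           # 0.0 1.5 2.5
round_any(x, 0.5, floor)    # 0.0 1.5 2.0
round_any(x, 0.5, ceiling)  # 0.5 2.0 2.5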

R: Is there a good replacement for plyr::rbind.fill in dplyr?

Yes. dplyr::bind_rows

Credit goes to the commenter.
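For reference, a minimal sketch (the two data frames here are made up for illustration) showing that bind_rows, like rbind.fill, fills non-matching columns with NA:

library(dplyr)

a <- data.frame(x = 1:2, y = c("a", "b"))
b <- data.frame(x = 3, z = TRUE)

bind_rows(a, b)
#   x    y    z
# 1 1    a   NA
# 2 2    b   NA
# 3 3 <NA> TRUE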

Equivalent of max and over(partition by) in R for flattening

Using data.table:

library(data.table)
setDT(df)[
  , recorded_dt := NULL][
  , lapply(.SD, \(x) sort(x, na.last = TRUE, decreasing = TRUE)[1])
  , by = .(ID, time0)]
##    ID      time0 day0 day1 day4 day30
## 1:  1 2009-01-01    A <NA>    B     D
## 2:  2 2005-02-02 <NA>    B <NA>  <NA>

The special symbol .SD represents the Subset of Data for each group, containing all columns except those used in the by = ... clause. This is why we have to remove the column recorded_dt first, so it is not carried into the aggregation.
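Alternatively (a sketch, not part of the original answer), the column can be kept out of .SD via .SDcols instead of deleting it first:

library(data.table)
# restrict .SD to every column except the grouping columns and recorded_dt
setDT(df)[
  , lapply(.SD, \(x) sort(x, na.last = TRUE, decreasing = TRUE)[1])
  , by = .(ID, time0)
  , .SDcols = setdiff(names(df), c("ID", "time0", "recorded_dt"))]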

r-dplyr equivalent of sql query returning monthly utilisation of contracts

You can get lubridate errors when working with certain date-time formats. It works if you remove as.Date and %m+%:

df %>%
  filter(start < "2016-03-01" &
         start + months(duration) >= "2016-03-01")
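As a runnable illustration (the contract table below is hypothetical; only start and duration matter), the filter keeps contracts that started before 2016-03-01 and are still running on that date:

library(dplyr)
library(lubridate)

df <- data.frame(
  id       = 1:3,
  start    = as.Date(c("2015-06-01", "2016-01-15", "2016-04-01")),
  duration = c(12, 1, 6)   # duration in months
)

df %>%
  filter(start < "2016-03-01" &
         start + months(duration) >= "2016-03-01")
# only the first contract is still active in March 2016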

Creating a new column based on multiple conditional statements in R

You can use plyr's round_any function:

df$value1 <- plyr::round_any(df$value, 0.1, ceiling)
df
#         value value1
#1  0.59465953    0.6
#2  0.10581043    0.2
#3  0.48806113    0.5
#4  0.04106798    0.1
#5  0.24026985    0.3
#6  0.08468660    0.1
#7  0.11598592    0.2
#8  0.50481103    0.6
#9  0.43194839    0.5
#10 0.16032725    0.2
#11 0.29700099    0.3
#12 0.04986834    0.1
#13 0.21233054    0.3
#14 0.58152528    0.6
#...

How to round up to the nearest 10 (or 100 or X)?

If you just want to round up to the nearest power of 10, then just define:

roundUp <- function(x) 10^ceiling(log10(x))

This actually also works when x is a vector:

> roundUp(c(0.0023, 3.99, 10, 1003))
[1] 1e-02 1e+01 1e+01 1e+04

...but if you want to round to a "nice" number, you first need to define what a "nice" number is. The following lets us define "nice" as a vector of nice base values from 1 to 10; the default covers 1, the even numbers, and 5.

roundUpNice <- function(x, nice = c(1, 2, 4, 5, 6, 8, 10)) {
  if (length(x) != 1) stop("'x' must be of length 1")
  10^floor(log10(x)) * nice[[which(x <= 10^floor(log10(x)) * nice)[[1]]]]
}

The above doesn't work when x is a vector - too late in the evening right now :) - but see the vectorised sketch after the examples below.

> roundUpNice(0.0322)
[1] 0.04
> roundUpNice(3.22)
[1] 4
> roundUpNice(32.2)
[1] 40
> roundUpNice(42.2)
[1] 50
> roundUpNice(422.2)
[1] 500
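If you do need it for a whole vector, one simple workaround (a sketch, not part of the original answer) is to apply the scalar function elementwise, e.g. with sapply:

# elementwise wrapper around the scalar roundUpNice() defined above
roundUpNiceVec <- function(x, nice = c(1, 2, 4, 5, 6, 8, 10)) {
  sapply(x, roundUpNice, nice = nice)
}

roundUpNiceVec(c(0.0322, 3.22, 32.2, 42.2, 422.2))
# [1]   0.04   4.00  40.00  50.00 500.00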

[[EDIT]]

If the question is how to round to a specified nearest value (like 10 or 100), then James's answer seems most appropriate. My version lets you take any value and automatically round it to a reasonably "nice" value. Some other good choices for the "nice" vector above are 1:10, c(1, 5, 10), and seq(1, 10, 0.1).

If you have a range of values in your plot, for example [3996.225, 40001.893], then the automatic way should take into account both the size of the range and the magnitude of the numbers. And, as noted by Hadley, the pretty() function might be what you want.
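For example, pretty() picks breakpoints at "round" values covering the range (the exact breaks depend on its n argument, so the output below is only indicative):

pretty(c(3996.225, 40001.893))
# [1]     0 10000 20000 30000 40000 50000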

How to do range grouping on a column using dplyr?

We can use cut to do the grouping. We create the 'gr' column within group_by, use summarise to count the number of elements in each group with n(), and order the output with arrange based on 'gr'.

library(dplyr)
DT %>%
  group_by(gr = cut(B, breaks = seq(0, 1, by = 0.05))) %>%
  summarise(n = n()) %>%
  arrange(as.numeric(gr))

As the initial object is a data.table, this can also be done with data.table methods (including @Frank's suggestion to use keyby):

library(data.table)
DT[, .N, keyby = .(gr = cut(B, breaks = seq(0, 1, by = 0.05)))]

EDIT:

Based on the update in the OP's post, we can subtract a small number from the seq:

lvls <- levels(cut(DT$B, seq(0, 1, by = 0.05)))
DT %>%
  group_by(gr = cut(B, breaks = seq(0, 1, by = 0.05) - .Machine$double.eps,
                    right = FALSE, labels = lvls)) %>%
  summarise(n = n()) %>%
  arrange(as.numeric(gr))
#          gr n
#1   (0,0.05] 2
#2 (0.05,0.1] 2
#3 (0.1,0.15] 3
#4 (0.15,0.2] 2
#5 (0.7,0.75] 1

