R: 'Split' Preserving Natural Order of Factors

split coerces its second argument, f, to a factor if it isn't one already, and factor sorts the levels by default. So, if you want the original order to be retained, convert the column to a factor yourself with the levels in the desired order. That is:

df$yearmon <- factor(df$yearmon, levels=unique(df$yearmon))
# now split
split(df, df$yearmon)
# $`4_2013`
#   Date.of.Inclusion Securities.Included Securities.Excluded yearmon
# 1        2013-04-01          INDUSINDBK             SIEMENS  4_2013
# 2        2013-04-01                NMDC               WIPRO  4_2013

# $`9_2012`
#   Date.of.Inclusion Securities.Included Securities.Excluded yearmon
# 3        2012-09-28               LUPIN                SAIL  9_2012
# 4        2012-09-28          ULTRACEMCO                STER  9_2012

# $`4_2012`
#   Date.of.Inclusion Securities.Included Securities.Excluded yearmon
# 5        2012-04-27          ASIANPAINT                RCOM  4_2012
# 6        2012-04-27          BANKBARODA              RPOWER  4_2012
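To see the difference the explicit levels make, here is a minimal sketch on made-up data (the toy data frame and its value column are assumptions for illustration, not the data from the question). It compares the group names produced by a plain split with those produced after fixing the levels to the order of first appearance.

```r
# Hypothetical toy data, only for illustration
df <- data.frame(yearmon = c("4_2013", "4_2013", "9_2012", "4_2012"),
                 value   = 1:4,
                 stringsAsFactors = FALSE)

# Default: split() builds the factor itself, so the levels are sorted
default_names <- names(split(df, df$yearmon))

# Explicit levels: groups follow the order of first appearance
f <- factor(df$yearmon, levels = unique(df$yearmon))
ordered_names <- names(split(df, f))

default_names  # "4_2012" "4_2013" "9_2012"
ordered_names  # "4_2013" "9_2012" "4_2012"
```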

But do not use `split`. Use `data.table` instead:

However, split tends to become very slow as the number of levels increases. So I'd suggest using data.table to subset into a list instead; I'd expect that to be much faster.

require(data.table)
dt <- data.table(df)
dt[, grp := .GRP, by = yearmon]
setkey(dt, grp)
o2 <- dt[, list(list(.SD)), by = grp]$V1

Benchmarking on huge data:

set.seed(45)
dates <- seq(as.Date("1900-01-01"), as.Date("2013-12-31"), by = "days")
ym <- do.call(paste, c(expand.grid(1:500, 1900:2013), sep="_"))

df <- data.frame(x1 = sample(dates, 1e4, TRUE), 
                 x2 = sample(letters, 1e4, TRUE), 
                 x3 = sample(10, 1e4, TRUE), 
                 yearmon = sample(ym, 1e4, TRUE), 
                 stringsAsFactors = FALSE)

require(data.table)
dt <- data.table(df)

f1 <- function(dt) {
    dt[, grp := .GRP, by = yearmon]
    setkey(dt, grp)

    o1 <- dt[, list(list(.SD)), by=grp]$V1
}

f2 <- function(df) {
    df$yearmon <- factor(df$yearmon, levels=unique(df$yearmon))
    o2 <- split(df, df$yearmon)
}

require(microbenchmark)
microbenchmark(o1 <- f1(dt), o2 <- f2(df), times = 10)

# Unit: milliseconds
#          expr        min         lq     median        uq      max neval
#  o1 <- f1(dt)   43.72995   43.85035   45.20087  715.1292 1071.976    10
#  o2 <- f2(df) 4485.34205 4916.13633 5210.88376 5763.1667 6912.741    10

Note that o1 will be an unnamed list, but you can set the names with names(o1) <- unique(dt$yearmon). This works because the list elements are in order of first appearance of yearmon, which is the same order unique returns.
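A small sketch of that naming step, assuming the data.table package is installed and using a made-up four-row table rather than the benchmark data:

```r
library(data.table)

# Hypothetical toy data, only for illustration
dt <- data.table(yearmon = c("4_2013", "9_2012", "4_2013", "4_2012"),
                 value   = 1:4)

dt[, grp := .GRP, by = yearmon]   # group id in order of first appearance
setkey(dt, grp)                   # sort rows by that id
o1 <- dt[, list(list(.SD)), by = grp]$V1

# Attach the group labels; unique() keeps the order of first appearance
names(o1) <- unique(dt$yearmon)
names(o1)  # "4_2013" "9_2012" "4_2012"
```

Each element of o1 is itself a data.table holding the rows of one group, so o1[["9_2012"]] can be used much like the corresponding element of the split result.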

keeping the original order of data after split() in R

We can split by 'study.name' converted to a factor whose levels are the unique elements of the column; unique returns the values in order of first occurrence.

split(D, factor(D$study.name, levels = unique(D$study.name)))

If we need to drop the NA (or empty-string) elements, subset the data before the split:

D1 <- subset(D, !(is.na(study.name)| study.name == ""))
split(D1, factor(D1$study.name, levels = unique(D1$study.name)))
#$Shin.Ellis
#  study.name group.name  n mpre mpos sdpre sdpos   r autoreg  t sdif F1 sdp df2 post control outcome ESL prof scope type
#1 Shin.Ellis   ME.short 13 0.34 0.72  0.37  0.34 0.5   FALSE NA   NA NA  NA  NA    1   FALSE       1   1    2     1    2
#2 Shin.Ellis    ME.long 13 0.34 0.39  0.37  0.36 0.5    TRUE NA   NA NA  NA  NA    2   FALSE       1   1    2     1    2
#3 Shin.Ellis  DCF.Short 15 0.37 0.54  0.38  0.36 0.5   FALSE NA   NA NA  NA  NA    1   FALSE       1   1    2     1    2
#4 Shin.Ellis   DCF.Long 15 0.37 0.49  0.38  0.36 0.5    TRUE NA   NA NA  NA  NA    2   FALSE       1   1    2     1    2
#5 Shin.Ellis Cont.Short 16 0.32 0.28  0.37  0.36 0.5   FALSE NA   NA NA  NA  NA    1    TRUE       1   1    2     1    2
#6 Shin.Ellis  Cont.Long 16 0.32 0.35  0.37  0.32 0.5    TRUE NA   NA NA  NA  NA    2    TRUE       1   1    2     1    2

#$Trus.Hsu
#  study.name group.name  n   mpre   mpos  sdpre  sdpos   r autoreg  t sdif F1 sdp df2 post control outcome ESL prof scope type
#8   Trus.Hsu      Exper 21 0.0799 0.1130 0.0367 0.0472 0.5   FALSE NA   NA NA  NA  NA    1   FALSE       1   2    2     2    1
#9   Trus.Hsu       Cont 26 0.0763 0.1095 0.0389 0.0537 0.5   FALSE NA   NA NA  NA  NA    1    TRUE       1   2    2     2    1

#$kabla
#   study.name group.name  n mpre mpos sdpre sdpos   r autoreg  t sdif F1 sdp df2 post control outcome ESL prof scope type
#11      kabla   ME.short 13 0.34 0.72  0.37  0.34 0.5   FALSE NA   NA NA  NA  NA    1   FALSE       1   1    3     0    1
#12      kabla    ME.long 13 0.34 0.39  0.37  0.36 0.5   FALSE NA   NA NA  NA  NA    2   FALSE       1   1    3     0    1
#13      kabla  DCF.Short 15 0.37 0.54  0.38  0.36 0.5   FALSE NA   NA NA  NA  NA    1   FALSE       1   1    3     0    1
#14      kabla   DCF.Long 15 0.37 0.49  0.38  0.36 0.5   FALSE NA   NA NA  NA  NA    2   FALSE       1   1    3     0    1
#15      kabla Cont.Short 16 0.32 0.28  0.37  0.36 0.5   FALSE NA   NA NA  NA  NA    1    TRUE       1   1    3     0    1
#16      kabla  Cont.Long 16 0.32 0.35  0.37  0.32 0.5   FALSE NA   NA NA  NA  NA    2    TRUE       1   1    3     0    1
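The effect of filtering first can be checked on a tiny stand-in for D (the six-row data frame below is an assumption made up for illustration; only the study.name column mirrors the original). Without the filter, the empty string becomes a group of its own.

```r
# Hypothetical stand-in for D, only for illustration
D <- data.frame(study.name = c("Shin.Ellis", NA, "Trus.Hsu", "", "kabla", "Shin.Ellis"),
                n          = 1:6,
                stringsAsFactors = FALSE)

# Without filtering: "" is kept as a level and gets its own list element
unfiltered <- split(D, factor(D$study.name, levels = unique(D$study.name)))

# Filter out NA and "" first, then split in order of first appearance
D1  <- subset(D, !(is.na(study.name) | study.name == ""))
res <- split(D1, factor(D1$study.name, levels = unique(D1$study.name)))

names(res)  # "Shin.Ellis" "Trus.Hsu" "kabla"
```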

Can't split dataframe into equal buckets preserving order without introducing Xn. prefix

I'm not entirely clear on what your issue is, but if you are just trying to get the sum by group, couldn't you use the following?

library(tidyverse)

df <- data.frame(actual_duration = sample(100))

df %>% 
  arrange(actual_duration) %>% 
  mutate(group = rep(1:10, each = 10)) %>% 
  group_by(group) %>% 
  summarise(sums = sum(actual_duration))

Alternatively, if you want to keep the list format:

df %>% 
  arrange(actual_duration) %>% 
  mutate(group = factor(rep(1:10, each = 10))) %>% 
  split(., .$group)  %>% 
  map(., function(x) sum(x$actual_duration))
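For comparison, the same bucketed sums can be sketched in base R without the tidyverse. This is an alternative offered here, not part of the answer above; the seed and the variable names sorted, bucket and sums are arbitrary choices.

```r
set.seed(1)  # arbitrary seed, only so the shuffle is reproducible
df <- data.frame(actual_duration = sample(100))

sorted <- sort(df$actual_duration)   # ascending order
bucket <- rep(1:10, each = 10)       # ten buckets of ten rows each

# split() on the vector returns a list named "1".."10"; sum each bucket
sums <- sapply(split(sorted, bucket), sum)
sums[["1"]]   # 55, the sum of 1..10
sums[["10"]]  # 955, the sum of 91..100
```

Because the buckets are built from an integer vector, the resulting names are just the bucket numbers, in numeric order.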

Split data frame based on group into list in defined order in R

We can order the dataset first and then split on 'cycle' by creating a factor whose levels are the unique elements in their order of appearance

t1 <- ts_df[order(ts_df$date),]
split(t1, factor(t1$cycle, levels = unique(t1$cycle)) )
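A minimal sketch of this on made-up data (the dates and the cycle labels "early", "mid" and "late" are assumptions chosen so that date order differs from alphabetical order):

```r
# Hypothetical toy data, only for illustration
ts_df <- data.frame(date  = as.Date(c("2020-03-01", "2020-01-01",
                                      "2020-02-01", "2020-01-15")),
                    cycle = c("late", "early", "mid", "early"),
                    stringsAsFactors = FALSE)

t1  <- ts_df[order(ts_df$date), ]   # sort rows by date
res <- split(t1, factor(t1$cycle, levels = unique(t1$cycle)))

names(res)  # "early" "mid" "late"
```

A plain split(t1, t1$cycle) would instead return the groups alphabetically, as "early", "late", "mid".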

Sort list of strings by order of numeric parts

If your paths have the same pattern and only the last number changes, then you can use mixedorder from the gtools package; otherwise, consider using gsub with a regular expression.

library(gtools)
L[mixedorder(sapply(L, function(x) x[1], simplify=TRUE), decreasing=FALSE)]

L is the list containing your paths.

Example:

For the sample data provided below this would be the answer:

#Original List before sorting:
# > L
# [[1]] 
# [1] "C:\\Users\\agaz\\AppData\\Local\\Temp\\Rtmp0wZI21/008947515435900b4d1a0b8d/22" 
# [2] "C:\\Users\\agaz\\AppData\\Local\\Temp\\Rtmp0wZI21/008947515435900b4d1a0b8d/30" 
#  
# [[2]] 
# [1] "C:\\Users\\agaz\\AppData\\Local\\Temp\\Rtmp0wZI21/008947515435900b4d1a0b8d/0" 
# [2] "C:\\Users\\agaz\\AppData\\Local\\Temp\\Rtmp0wZI21/008947515435900b4d1a0b8d/9" 
# [3] "C:\\Users\\agaz\\AppData\\Local\\Temp\\Rtmp0wZI21/008947515435900b4d1a0b8d/5" 
#  
# [[3]] 
# [1] "C:\\Users\\agaz\\AppData\\Local\\Temp\\Rtmp0wZI21/008947515435900b4d1a0b8d/6" 
# [2] "C:\\Users\\agaz\\AppData\\Local\\Temp\\Rtmp0wZI21/008947515435900b4d1a0b8d/7" 
# [3] "C:\\Users\\agaz\\AppData\\Local\\Temp\\Rtmp0wZI21/008947515435900b4d1a0b8d/8" 
#

Sorted list based on the first element:

L[mixedorder(sapply(L, function(x) x[1], simplify=TRUE), decreasing=FALSE)]
# [[1]] 
# [1] "C:\\Users\\agaz\\AppData\\Local\\Temp\\Rtmp0wZI21/008947515435900b4d1a0b8d/0" 
# [2] "C:\\Users\\agaz\\AppData\\Local\\Temp\\Rtmp0wZI21/008947515435900b4d1a0b8d/9" 
# [3] "C:\\Users\\agaz\\AppData\\Local\\Temp\\Rtmp0wZI21/008947515435900b4d1a0b8d/5" 
#  
# [[2]] 
# [1] "C:\\Users\\agaz\\AppData\\Local\\Temp\\Rtmp0wZI21/008947515435900b4d1a0b8d/6" 
# [2] "C:\\Users\\agaz\\AppData\\Local\\Temp\\Rtmp0wZI21/008947515435900b4d1a0b8d/7" 
# [3] "C:\\Users\\agaz\\AppData\\Local\\Temp\\Rtmp0wZI21/008947515435900b4d1a0b8d/8" 
#  
# [[3]] 
# [1] "C:\\Users\\agaz\\AppData\\Local\\Temp\\Rtmp0wZI21/008947515435900b4d1a0b8d/22" 
# [2] "C:\\Users\\agaz\\AppData\\Local\\Temp\\Rtmp0wZI21/008947515435900b4d1a0b8d/30" 
#

Sample Data

L <-
 list(c("C:\\Users\\agaz\\AppData\\Local\\Temp\\Rtmp0wZI21/008947515435900b4d1a0b8d/22",  
 "C:\\Users\\agaz\\AppData\\Local\\Temp\\Rtmp0wZI21/008947515435900b4d1a0b8d/30" 
 ), c("C:\\Users\\agaz\\AppData\\Local\\Temp\\Rtmp0wZI21/008947515435900b4d1a0b8d/0",  
 "C:\\Users\\agaz\\AppData\\Local\\Temp\\Rtmp0wZI21/008947515435900b4d1a0b8d/9",  
 "C:\\Users\\agaz\\AppData\\Local\\Temp\\Rtmp0wZI21/008947515435900b4d1a0b8d/5" 
 ), c("C:\\Users\\agaz\\AppData\\Local\\Temp\\Rtmp0wZI21/008947515435900b4d1a0b8d/6",  
 "C:\\Users\\agaz\\AppData\\Local\\Temp\\Rtmp0wZI21/008947515435900b4d1a0b8d/7",  
 "C:\\Users\\agaz\\AppData\\Local\\Temp\\Rtmp0wZI21/008947515435900b4d1a0b8d/8"))
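The regular-expression route mentioned above can be sketched in base R as follows. This assumes the number to sort on is everything after the last "/" of each element's first path; the shortened paths are made up for illustration.

```r
# Hypothetical shortened paths, only for illustration
L <- list(c("tmp/run/22", "tmp/run/30"),
          c("tmp/run/0",  "tmp/run/9"),
          c("tmp/run/6",  "tmp/run/7"))

# First path of each list element
first <- vapply(L, function(x) x[1], character(1))

# Strip everything up to the last "/" and convert the rest to a number
num <- as.numeric(sub(".*/", "", first))

sorted <- L[order(num)]
vapply(sorted, function(x) x[1], character(1))
# "tmp/run/0" "tmp/run/6" "tmp/run/22"
```

Converting to numeric before ordering is what keeps 22 after 6; ordering the raw strings would not.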

R: How to change item (or object) names of a data.table list?

Here's how I'd approach it at the moment.

require(data.table)
tmp = setDT(df)[, list(grp=list(.SD)), by=.(product, year), .SDcols=names(df)]
setattr(ans <- tmp$grp, 'names', paste(tmp$product, tmp$year, sep="."))
ans
# $b.2001
#    product value year
# 1:       b     7 2001
# 
# $a.2001
#    product value year
# 1:       a     3 2001
# 
# $b.2000
#    product value year
# 1:       b    10 2000
# 
# $a.2000
#    product value year
# 1:       a     9 2000

I've filed FR #1389 requesting a split.data.table method, with which this should be possible in one step.

But in most cases, it's easier to deal with one data.frame/data.table instead of a list. So a bit more insight into what your downstream tasks are might help figure out whether this is really necessary.
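For a plain data.frame, base R's split already yields product.year names when it is given a list of grouping columns. A minimal sketch with made-up data follows; note that the groups come out in factor-level order (first column varying fastest), not in order of appearance as in the data.table result above.

```r
# Hypothetical toy data, only for illustration
df <- data.frame(product = c("b", "a", "b", "a"),
                 value   = c(7, 3, 10, 9),
                 year    = c(2001, 2001, 2000, 2000),
                 stringsAsFactors = FALSE)

# A list of grouping columns is combined with interaction(), sep = "."
ans <- split(df, list(df$product, df$year), drop = TRUE)
names(ans)  # "a.2000" "b.2000" "a.2001" "b.2001"
```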

Divide data frame by vector, by rows and not by columns

You can just use the transpose function:

> df[,2:4] <- t(t(df[,2:4]) / div)
> df
  type  V1   V2    V3
1    A 0.1 0.01 0.001
2    B 0.1 0.01 0.001
3    C 0.1 0.01 0.001
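An equivalent way to divide each column by its own divisor is sweep. The sketch below uses made-up values for df and div (both are assumptions, since the original inputs are not shown) and checks that the two approaches agree.

```r
# Hypothetical inputs, only for illustration
df  <- data.frame(type = c("A", "B", "C"),
                  V1 = c(1, 2, 3),
                  V2 = c(10, 20, 30),
                  V3 = c(100, 200, 300))
div <- c(1, 10, 100)   # one divisor per column V1..V3

# Double transpose: recycling then runs along the original columns
by_transpose <- t(t(df[, 2:4]) / div)

# sweep over margin 2 (columns) applies div[j] to column j
by_sweep <- sweep(as.matrix(df[, 2:4]), 2, div, "/")

all.equal(unname(by_transpose), unname(by_sweep))  # TRUE
```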
