R: Split Unbalanced List in Data.Frame Column

# Split by ";" as before (as.character() guards against a factor column)
allJobs <- strsplit(as.character(df$b), ";", fixed = TRUE)

#Replicate a by the number of jobs in each case
n <- lengths(allJobs)
id <- rep(df$a, times = n)

#Turn allJobs into a vector
job <- unlist(allJobs)

#Retrieve position of each job
jobNum <- unlist(lapply(n, seq_len))

#Combine into a data frame
df2 <- data.frame(id = id, job = job, jobNum = jobNum)
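
To see the steps above end to end, here is a minimal, self-contained sketch. The input df is hypothetical (the original data is not shown). lengths() and sequence() are used as shorthand for counting the pieces per row and numbering them within each row.

```r
# Hypothetical input: one id per row, jobs packed into a ";"-separated string
df <- data.frame(a = c(101, 102, 103),
                 b = c("cook;driver", "nurse", "clerk;guard;pilot"),
                 stringsAsFactors = FALSE)

allJobs <- strsplit(df$b, ";", fixed = TRUE)  # list of character vectors
n <- lengths(allJobs)                         # number of jobs per row

df2 <- data.frame(id     = rep(df$a, times = n),  # repeat each id once per job
                  job    = unlist(allJobs),       # flatten the list
                  jobNum = sequence(n),           # 1..n within each id
                  stringsAsFactors = FALSE)
df2
```

The result has one row per job, so a row with three jobs contributes three rows that share the same id.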

How to split a column into multiple (non equal) columns in R

We could use cSplit from splitstackshape

library(splitstackshape)
cSplit(DF, "Col1", sep = ",")

Output:

   Col1_1 Col1_2 Col1_3 Col1_4
1:      a      b      c   <NA>
2:      a      b   <NA>   <NA>
3:      a      b      c      d
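
If adding a package is not an option, roughly the same wide layout can be built in base R by padding each split to the longest length. This is only a sketch of that alternative, not what cSplit does internally; the DF below is an assumption reconstructed from the output shown.

```r
# Assumed input, inferred from the output above
DF <- data.frame(Col1 = c("a,b,c", "a,b", "a,b,c,d"), stringsAsFactors = FALSE)

parts <- strsplit(DF$Col1, ",", fixed = TRUE)
width <- max(lengths(parts))                  # widest row decides the column count

# `length<-` pads shorter vectors with NA up to `width`
padded <- do.call(rbind, lapply(parts, `length<-`, width))
colnames(padded) <- paste0("Col1_", seq_len(width))

wide <- as.data.frame(padded, stringsAsFactors = FALSE)
wide
```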

How to split my data frame in equal length lists

Although it seems like an easy task, splitting a balanced panel data set into smaller balanced panels turned out to be very challenging.

@Allan Cameron's answer got the length of the list right, but not its contents. The resulting panels were unbalanced: within the same chunk each clvs had 188 or 187 observations, and datetime was not consecutive. For example, B[["1"]] contained the sequence 7:00, 13:00 and 19:00 for one clvs. With unbalanced panels, my loop calling an splm function did not work.

The solution was to use gl.unequal:

library(DTK)
f<-gl.unequal(n=6,k=c(92,92,92,92,92,91))
B<-split(bb3,f)

This way I get balanced panels, for example B[["1"]]:

head(B[["1"]])
1 07AC~ 2017~ 1      686.    684.    2.19       0 2017-02~ 2017-02-28 02:00:00
2 07AC~ 2017~ 2      665.    664.    1.79       0 2017-02~ 2017-02-28 03:00:00
3 07AC~ 2017~ 3      393.    392.    1.11       0 2017-02~ 2017-02-28 04:00:00
4 07AC~ 2017~ 4      383.    381.    1.4        0 2017-02~ 2017-02-28 05:00:00
5 07AC~ 2017~ 5      383.    381.    1.41       0 2017-02~ 2017-02-28 06:00:00
6 07AC~ 2017~ 6      389.    388.    1.07       0 2017-02~ 2017-02-28 07:00:00

library(plm)  # provides is.pbalanced()
is.pbalanced(B[["1"]])
TRUE
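
For readers without DTK, a grouping factor of this kind can also be built in base R. The sketch below assumes that gl.unequal(n, k) labels consecutive runs of rows with levels 1..n and run lengths k, which is how it is used above. The data frame bb and the group sizes are hypothetical stand-ins for bb3.

```r
# Hypothetical stand-in for bb3: 11 rows, already sorted in the desired order
bb <- data.frame(row = 1:11, value = seq(10, 110, by = 10))

k <- c(4, 4, 3)                            # unequal group sizes
stopifnot(sum(k) == nrow(bb))              # sizes must cover every row exactly once

f <- factor(rep(seq_along(k), times = k))  # 1,1,1,1,2,2,2,2,3,3,3
B <- split(bb, f)                          # list of data frames named "1", "2", "3"

sapply(B, nrow)
```

Because split() follows row order, the data must be sorted so that each run of k[i] rows forms one complete panel before the factor is applied.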

Split an uneven column in a dataframe into multiple columns in R

Using the data shown reproducibly in the Note at the end we can use read.pattern with the indicated pattern pat and then remove junk columns (every other column). The lines marked ## can be omitted if you don't require the column names to be exactly as in the question.

library(gsubfn)

pat <- 
"((\\d+ years), )?((female|male), )?((white|black), )?((stage:\\S+), )?((alive|dead), )?((\\d+) days)?"
r <- read.pattern(text = as.character(DF$Info), pattern = pat, as.is = TRUE)
DF2 <- cbind(Sample = DF$Sample, r[c(FALSE, TRUE)], stringsAsFactors = FALSE)

nc <- ncol(DF2) ## 
names(DF2)[-1] <- paste0("Info_", 1:(nc-1)) ##

DF2

giving:

   Sample   Info_1 Info_2 Info_3     Info_4 Info_5 Info_6
1 Sample1 82 years female  white stage:iiib  alive   1419
2 Sample2 53 years   male        stage:iiib  alive    792
3 Sample3 68 years female  white stage:iiic   dead    740
4 Sample4 43 years   male  white stage:iiic  alive    598
5 Sample5 74 years         white    stage:i  alive   1001
6 Sample6 37 years female  white             alive    257
7 Sample7 69 years female  black  stage:iia  alive    627

Note

The input DF in reproducible form is as follows.

Lines <- "
Sample;Info
Sample1;82 years, female, white, stage:iiib, alive, 1419 days
Sample2;53 years, male, stage:iiib, alive, 792 days
Sample3;68 years, female, white, stage:iiic, dead, 740 days
Sample4;43 years, male, white, stage:iiic, alive, 598 days
Sample5;74 years, white, stage:i, alive, 1001 days
Sample6;37 years, female, white, alive, 257 days
Sample7;69 years, female, black, stage:iia, alive, 627 days"

DF <- read.table(text = Lines, header = TRUE, sep = ";", as.is = TRUE, strip.white = TRUE)
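
As an alternative that needs no extra package, each comma-separated token can be classified by its own pattern. This is a different technique from read.pattern, sketched here in base R. The field names (age, sex and so on) are labels chosen for illustration, and unlike the output above the days field keeps its " days" suffix.

```r
# Three of the rows from the Note, including ones with missing fields
info <- c("82 years, female, white, stage:iiib, alive, 1419 days",
          "53 years, male, stage:iiib, alive, 792 days",
          "37 years, female, white, alive, 257 days")

# One pattern per field; a token belongs to the field whose pattern it matches
fields <- c(age    = "^[0-9]+ years$",
            sex    = "^(female|male)$",
            race   = "^(white|black)$",
            stage  = "^stage:",
            status = "^(alive|dead)$",
            days   = "^[0-9]+ days$")

tokens <- strsplit(info, ", ", fixed = TRUE)

# Return the first token matching `pattern`, or NA if the field is absent
pick <- function(tok, pattern) {
  hit <- grep(pattern, tok, value = TRUE)
  if (length(hit) > 0) hit[1] else NA_character_
}

out <- as.data.frame(
  lapply(fields, function(p) vapply(tokens, pick, character(1), pattern = p)),
  stringsAsFactors = FALSE)
out
```

Missing fields come out as NA rather than empty strings, and the approach does not depend on the order of the tokens.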

Split dataframe into a list with vectors of unequal lengths

Map(function(x, a, b) x[a:b], df, seq_along(df), c(3, 5, 4, 8, 10))
# $X1
# [1] 1 2 3
# $X2
# [1] 2 3 4 5
# $X3
# [1] 3 4
# $X4
# [1] 4 5 6 7 8
# $X5
# [1]  5  6  7  8  9 10
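
The call above pairs each column with a start index (its position, from seq_along(df)) and an end index (from the last argument), and keeps elements start:end of that column. A self-contained version follows, assuming a df with five columns X1..X5 that each hold 1:10. The original df is not shown, so this input is inferred from the output.

```r
# Assumed input: five identical columns X1..X5, each holding 1:10
df <- data.frame(matrix(rep(1:10, times = 5), ncol = 5))

starts <- seq_along(df)        # column i starts at element i
ends   <- c(3, 5, 4, 8, 10)    # last element to keep in each column

# Map() walks the columns, starts and ends in parallel
res <- Map(function(x, a, b) x[a:b], df, starts, ends)
lengths(res)
```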

Split a data frame by a factor and remove rows of unequal columns

Here's a base solution:

result = split(df, df$TOD)

# truncate to the fewest number of rows
result = lapply(result, head, min(sapply(result, nrow)))

result = do.call(cbind, result)
result
#   Day.TOD Day.Value Night.TOD Night.Value
# 1     Day       135     Night         145
# 2     Day       513     Night         267
# 3     Day       567     Night         589
# 4     Day       848     Night         258
# 5     Day       578     Night         278

Splitting a string column with unequal size into multiple columns using R

This is a good occasion to use the extra = 'merge' argument of separate():

library(tidyr)  # separate() comes from tidyr
library(dplyr)  # provides %>%
df %>% 
  separate(str, c('A', 'B', 'C'), sep = ";", extra = 'merge')

  no    A     B     C
1  1 M 12  M 13  <NA>
2  2 M 24  <NA>  <NA>
3  3 <NA>  <NA>  <NA>
4  4 C 12  C 50  C 78

R: Split Variable Column into multiple (unbalanced) columns by comma

From Ananda's splitstackshape package:

cSplit(df, "Events", sep=",")
#    Name Age Number First      Events_1 Events_2 Events_3 Events_4
#1: Karen  24      8     0  Triathlon/IM Marathon      10k       5k
#2:  Kurt  39      2     0 Half-Marathon      10k       NA       NA
#3: Leah   18      0     1            NA       NA       NA       NA

Or with tidyr:

separate(df, 'Events', paste("Events", 1:4, sep="_"), sep=",", extra="drop")
#   Name Age Number               Events_1 Events_2 Events_3 Events_4 First
#1 Karen  24      8           Triathlon/IM Marathon      10k       5k     0
#2  Kurt  39      2          Half-Marathon      10k     <NA>     <NA>     0
#3 Leah   18      0                     NA     <NA>     <NA>     <NA>     1

With the data.table package:

setDT(df)[, paste0("Events_", 1:4) := tstrsplit(Events, ",")][, -"Events", with = FALSE]
#    Name Age Number First               Events_1 Events_2 Events_3 Events_4
#1: Karen  24      8     0           Triathlon/IM Marathon      10k       5k
#2:  Kurt  39      2     0          Half-Marathon      10k       NA       NA
#3: Leah   18      0     1                     NA       NA       NA       NA

Data

df <- structure(list(Name = structure(1:3, .Label = c("Karen", "Kurt", 
"Leah "), class = "factor"), Age = c(24L, 39L, 18L), Number = c(8L, 
2L, 0L), Events = structure(c(3L, 2L, 1L), .Label = c("               NA", 
"         Half-Marathon,10k", "     Triathlon/IM,Marathon,10k,5k"
), class = "factor"), First = c(0L, 0L, 1L)), .Names = c("Name", 
"Age", "Number", "Events", "First"), class = "data.frame", row.names = c(NA, 
-3L))

Split column having uneven character length values into two columns - one for characters & another for numerics

As a bit of explanation: (?<=[a-z])_(?=[1-9]) matches an underscore that is preceded by a lowercase letter (the lookbehind (?<=[a-z])) and followed by a digit from 1 to 9 (the lookahead (?=[1-9])). Only the underscore itself is consumed, so that is the point where the string is split.

library(tidyr)
library(magrittr)
df %>% 
    separate(name, sep="(?<=[a-z])_(?=[1-9])", into=c("name", "year"))

   id           name year value
1 123           test 2001    15
2 123      test_area 2002    20
3 123 test_area_sqkm 2003    25
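
The same lookaround pattern can be tried in base R with strsplit(..., perl = TRUE), which makes it easy to check where the split falls. This is a sketch only; the input strings are assumptions reconstructed from the output above.

```r
# Assumed inputs, rebuilt from the name and year columns shown above
x <- c("test_2001", "test_area_2002", "test_area_sqkm_2003")

# Split only on an underscore that sits between a lowercase letter and a digit 1-9
parts <- strsplit(x, "(?<=[a-z])_(?=[1-9])", perl = TRUE)

out <- data.frame(name = vapply(parts, `[`, character(1), 1),
                  year = as.integer(vapply(parts, `[`, character(1), 2)),
                  stringsAsFactors = FALSE)
out
```

Underscores between two letters, as in test_area, are left alone because the lookahead requires a digit.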
