R: Split unbalanced list in data.frame column
# Split column b on ";" (as.character() guards against b being a factor)
allJobs <- strsplit(as.character(df$b), ";", fixed = TRUE)
# Number of jobs in each row
n <- sapply(allJobs, length)
# Repeat each value of a once per job
id <- rep(df$a, times = n)
# Flatten the list of jobs into a single vector
job <- unlist(allJobs)
# Position of each job within its original row (1, 2, ..., n[i])
jobNum <- unlist(lapply(n, seq_len))
# Combine into a long-format data frame
df2 <- data.frame(id = id, job = job, jobNum = jobNum)
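A more compact version of the same steps is sketched below on a small hypothetical df. The columns a and b are assumed, as in the code above. Base R's lengths() replaces sapply(..., length), and sequence(n) builds the same 1..n[i] runs as unlist(lapply(n, seq_len)).

```r
# Hypothetical input: column a holds ids, column b holds ";"-separated jobs
df <- data.frame(a = c(1, 2, 3), b = c("x;y;z", "x", "y;z"),
                 stringsAsFactors = FALSE)

allJobs <- strsplit(df$b, ";", fixed = TRUE)
n <- lengths(allJobs)                     # number of jobs per row
df2 <- data.frame(id = rep(df$a, times = n),
                  job = unlist(allJobs),
                  jobNum = sequence(n),   # 1..n[i] for each row, concatenated
                  stringsAsFactors = FALSE)
df2
```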
How to split a column into multiple (non equal) columns in R
We could use cSplit from the splitstackshape package:
library(splitstackshape)
cSplit(DF, "Col1",",")
Output:
Col1_1 Col1_2 Col1_3 Col1_4
1: a b c <NA>
2: a b <NA> <NA>
3: a b c d
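If adding a dependency is not an option, the same NA padding can be done in base R. This is a sketch that swaps in a different technique from cSplit. The input DF is hypothetical and was reconstructed from the output above. It relies on the fact that indexing a vector past its end returns NA.

```r
# Hypothetical input reconstructed from the output above
DF <- data.frame(Col1 = c("a,b,c", "a,b", "a,b,c,d"), stringsAsFactors = FALSE)

# Split each string on commas; rows yield vectors of different lengths
parts <- strsplit(DF$Col1, ",", fixed = TRUE)
# The longest row decides how many columns are needed
width <- max(lengths(parts))
# Indexing past the end of a vector returns NA, which pads the short rows
padded <- lapply(parts, function(p) p[seq_len(width)])
wide <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
names(wide) <- paste0("Col1_", seq_len(width))
wide
```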
How to split my data frame in equal length lists
Although it seems like an easy task, splitting balanced panel data into smaller balanced panels was very challenging.
@Allan Cameron's answer got the length of the list right but not its content: with it, my panels were unbalanced. Each clvs had 188 or 187 rows in the same chunk, and datetime was not consecutive; for example, B[["1"]] had a sequence of 7:00, 13:00 and 19:00 for one clvs. With unbalanced panels, my loop calling the splm function didn't work.
The solution was to use gl.unequal from the DTK package:
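library(DTK)  # provides gl.unequal()
# 6 groups: the first five have 92 rows each, the last has 91
f <- gl.unequal(n = 6, k = c(92, 92, 92, 92, 92, 91))
B <- split(bb3, f)
This way I get balanced panels, for example B[["1"]]:
head(B[["1"]])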
1 07AC~ 2017~ 1 686. 684. 2.19 0 2017-02~ 2017-02-28 02:00:00
2 07AC~ 2017~ 2 665. 664. 1.79 0 2017-02~ 2017-02-28 03:00:00
3 07AC~ 2017~ 3 393. 392. 1.11 0 2017-02~ 2017-02-28 04:00:00
4 07AC~ 2017~ 4 383. 381. 1.4 0 2017-02~ 2017-02-28 05:00:00
5 07AC~ 2017~ 5 383. 381. 1.41 0 2017-02~ 2017-02-28 06:00:00
6 07AC~ 2017~ 6 389. 388. 1.07 0 2017-02~ 2017-02-28 07:00:00
library(plm)  # provides is.pbalanced()
is.pbalanced(B[["1"]])
TRUE
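Below is a sketch of what the grouping factor looks like in base R. It assumes that gl.unequal(n, k) repeats level i exactly k[i] times in consecutive blocks, which is my reading of the DTK documentation and not something confirmed above. The bb3 used here is a hypothetical stand-in with the same row count, 551. Splitting on such a factor only produces balanced panels if the real bb3 is already sorted so that each block of rows forms one panel.

```r
# Chunk sizes taken from the call above; they must add up to nrow(bb3)
k <- c(92, 92, 92, 92, 92, 91)
# Base R stand-in for gl.unequal(n = 6, k = k): level i repeated k[i] times
f <- factor(rep(seq_along(k), times = k))
# Hypothetical stand-in for bb3 with the same number of rows
bb3 <- data.frame(row = seq_len(sum(k)))
B <- split(bb3, f)
sapply(B, nrow)
```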
Split an uneven column in a dataframe into multiple columns in R
Using the data shown reproducibly in the Note at the end, we can use read.pattern with the indicated pattern pat and then remove the junk columns (every other column). The lines marked ## can be omitted if you don't require the column names to be exactly as in the question.
library(gsubfn)
pat <-
"((\\d+ years), )?((female|male), )?((white|black), )?((stage:\\S+), )?((alive|dead), )?((\\d+) days)?"
r <- read.pattern(text = as.character(DF$Info), pattern = pat, as.is = TRUE)
DF2 <- cbind(Sample = DF$Sample, r[c(FALSE, TRUE)], stringsAsFactors = FALSE)
nc <- ncol(DF2) ##
names(DF2)[-1] <- paste0("Info_", 1:(nc-1)) ##
DF2
giving:
Sample Info_1 Info_2 Info_3 Info_4 Info_5 Info_6
1 Sample1 82 years female white stage:iiib alive 1419
2 Sample2 53 years male stage:iiib alive 792
3 Sample3 68 years female white stage:iiic dead 740
4 Sample4 43 years male white stage:iiic alive 598
5 Sample5 74 years white stage:i alive 1001
6 Sample6 37 years female white alive 257
7 Sample7 69 years female black stage:iia alive 627
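Here is why every other column is junk. Each optional field in pat is an outer group that includes the trailing ", ", and inside it is an inner group that holds only the value. read.pattern extracts all groups for every line. The sketch below uses base R's regexec() as an illustration, not gsubfn, and applies the same pat to Sample1's string to show the twelve groups and the even-numbered ones that are kept.

```r
pat <- "((\\d+ years), )?((female|male), )?((white|black), )?((stage:\\S+), )?((alive|dead), )?((\\d+) days)?"
x <- "82 years, female, white, stage:iiib, alive, 1419 days"   # Sample1's Info

m <- regmatches(x, regexec(pat, x, perl = TRUE))[[1]]
# m[1] is the whole match; m[-1] holds the 12 groups in order
groups <- m[-1]
# Odd groups include the trailing ", "; even groups hold just the value
fields <- groups[c(FALSE, TRUE)]
fields
```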
Note
The input DF in reproducible form is as follows:
Lines <- "
Sample;Info
Sample1;82 years, female, white, stage:iiib, alive, 1419 days
Sample2;53 years, male, stage:iiib, alive, 792 days
Sample3;68 years, female, white, stage:iiic, dead, 740 days
Sample4;43 years, male, white, stage:iiic, alive, 598 days
Sample5;74 years, white, stage:i, alive, 1001 days
Sample6;37 years, female, white, alive, 257 days
Sample7;69 years, female, black, stage:iia, alive, 627 days"
DF <- read.table(text = Lines, header = TRUE, sep = ";", as.is = TRUE, strip.white = TRUE)
Split dataframe into a list with vectors of unequal lengths
Map(function(x, a, b) x[a:b], df, seq_along(df), c(3, 5, 4, 8, 10))
# $X1
# [1] 1 2 3
# $X2
# [1] 2 3 4 5
# $X3
# [1] 3 4
# $X4
# [1] 4 5 6 7 8
# $X5
# [1] 5 6 7 8 9 10
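Map() walks the columns in parallel with a start index, which is the column position from seq_along(df), and an end index from the supplied vector. As a result, column i is cut to x[i:end[i]]. The df below is hypothetical: it has five columns X1..X5, each holding 1:10, which is consistent with the output above. The original data is not shown.

```r
# Hypothetical df: five columns X1..X5, each holding 1:10
df <- setNames(as.data.frame(replicate(5, 1:10)), paste0("X", 1:5))

# Column i is subset from position i up to the i-th end index
res <- Map(function(x, a, b) x[a:b], df, seq_along(df), c(3, 5, 4, 8, 10))
lengths(res)
```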
Split a data frame by a factor and remove rows of unequal columns
Here's a base solution:
result = split(df, df$TOD)
# truncate to the fewest number of rows
result = lapply(result, head, min(sapply(result, nrow)))
result = do.call(cbind, result)
result
# Day.TOD Day.Value Night.TOD Night.Value
# 1 Day 135 Night 145
# 2 Day 513 Night 267
# 3 Day 567 Night 589
# 4 Day 848 Night 258
# 5 Day 578 Night 278
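A variation on the same idea is to trim the rows before splitting, not after. This sketch uses ave() to number rows within each TOD group and keeps only the first min(table(df$TOD)) rows of each group. The input is hypothetical: its Day and Night values come from the output above, but the sixth Day row (999) is invented to make the groups unequal.

```r
# Hypothetical input: 6 Day rows and 5 Night rows (999 is an invented extra row)
df <- data.frame(TOD = rep(c("Day", "Night"), times = c(6, 5)),
                 Value = c(135, 513, 567, 848, 578, 999,
                           145, 267, 589, 258, 278),
                 stringsAsFactors = FALSE)

# Size of the smallest group
n <- min(table(df$TOD))
# Running row number within each TOD group (1, 2, ... per group)
pos <- ave(seq_len(nrow(df)), df$TOD, FUN = seq_along)
trimmed <- df[pos <= n, ]
result <- do.call(cbind, split(trimmed, trimmed$TOD))
result
```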
Splitting a string column with unequal size into multiple columns using R
This is a good occasion to use the extra = "merge" argument of tidyr's separate (separate comes from tidyr, not dplyr):
library(tidyr)
df %>%
separate(str, c('A', 'B', 'C'), sep= ";", extra = 'merge')
no A B C
1 1 M 12 M 13 <NA>
2 2 M 24 <NA> <NA>
3 3 <NA> <NA> <NA>
4 4 C 12 C 50 C 78
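To make the effect of extra = "merge" visible, the following base R sketch mimics it. It is an illustration, not tidyr's implementation. Each string is split into exactly length(into) pieces: surplus pieces are glued back onto the last one with the original ";", and missing pieces are padded with NA. The df is hypothetical and was reconstructed from the output above. The fourth piece "C 90" is invented so that the merge actually happens.

```r
# Hypothetical input; "C 90" is an invented fourth piece to show the merge
df <- data.frame(no = 1:4,
                 str = c("M 12;M 13", "M 24", NA, "C 12;C 50;C 78;C 90"),
                 stringsAsFactors = FALSE)
into <- c("A", "B", "C")

# Split one string into exactly n pieces: extra pieces are merged into the
# last one, missing pieces are padded with NA
split_merge <- function(s, n) {
  if (is.na(s)) return(rep(NA_character_, n))
  p <- strsplit(s, ";", fixed = TRUE)[[1]]
  if (length(p) > n) {
    p <- c(p[seq_len(n - 1)], paste(p[n:length(p)], collapse = ";"))
  }
  p[seq_len(n)]
}

parts <- t(vapply(df$str, split_merge, character(length(into)),
                  n = length(into), USE.NAMES = FALSE))
out <- cbind(df["no"], setNames(as.data.frame(parts, stringsAsFactors = FALSE), into))
out
```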
R: Split Variable Column into multiple (unbalanced) columns by comma
From Ananda's splitstackshape package:
library(splitstackshape)
cSplit(df, "Events", sep=",")
# Name Age Number First Events_1 Events_2 Events_3 Events_4
#1: Karen 24 8 0 Triathlon/IM Marathon 10k 5k
#2: Kurt 39 2 0 Half-Marathon 10k NA NA
#3: Leah 18 0 1 NA NA NA NA
Or with tidyr:
library(tidyr)
separate(df, 'Events', paste("Events", 1:4, sep="_"), sep=",", extra="drop")
# Name Age Number Events_1 Events_2 Events_3 Events_4 First
#1 Karen 24 8 Triathlon/IM Marathon 10k 5k 0
#2 Kurt 39 2 Half-Marathon 10k <NA> <NA> 0
#3 Leah 18 0 NA <NA> <NA> <NA> 1
With the data.table package:
library(data.table)
setDT(df)[, paste0("Events_", 1:4) := tstrsplit(Events, ",")][, -"Events", with = FALSE]
# Name Age Number First Events_1 Events_2 Events_3 Events_4
#1: Karen 24 8 0 Triathlon/IM Marathon 10k 5k
#2: Kurt 39 2 0 Half-Marathon 10k NA NA
#3: Leah 18 0 1 NA NA NA NA
Data
df <- structure(list(Name = structure(1:3, .Label = c("Karen", "Kurt",
"Leah "), class = "factor"), Age = c(24L, 39L, 18L), Number = c(8L,
2L, 0L), Events = structure(c(3L, 2L, 1L), .Label = c(" NA",
" Half-Marathon,10k", " Triathlon/IM,Marathon,10k,5k"
), class = "factor"), First = c(0L, 0L, 1L)), .Names = c("Name",
"Age", "Number", "Events", "First"), class = "data.frame", row.names = c(NA,
-3L))
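A further base R option, which none of the answers above uses, is to let read.csv() do the padding. With fill = TRUE, short rows get blank fields, and na.strings turns those blanks (and the literal "NA") into missing values. The sketch below takes the three Events strings exactly as they appear in the Data block, leading spaces included. The four column names are an assumption based on the longest row.

```r
# Events strings as they appear in the Data block above (note the leading spaces)
events <- c(" Triathlon/IM,Marathon,10k,5k", " Half-Marathon,10k", " NA")

wide <- read.csv(text = events, header = FALSE,
                 col.names = paste0("Events_", 1:4),
                 fill = TRUE,               # pad short rows with blank fields
                 strip.white = TRUE,        # drop the leading spaces
                 na.strings = c("", "NA"),  # blanks and "NA" become missing
                 stringsAsFactors = FALSE)
wide
```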
split column having uneven character length values into two columns - one for characters & another for numerics
As a bit of explanation, the pattern (?<=[a-z])_(?=[1-9]) matches an _ only when it is preceded by a letter (the lookbehind (?<=[a-z])) and followed by a digit (the lookahead (?=[1-9])), since that is where we want to split the string.
library(tidyr)
library(magrittr)
df %>%
separate(name, sep="(?<=[a-z])_(?=[1-9])", into=c("name", "year"))
id name year value
1 123 test 2001 15
2 123 test_area 2002 20
3 123 test_area_sqkm 2003 25
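The same lookaround pattern also works with base R's strsplit() when perl = TRUE is set. This is a sketch on a hypothetical df reconstructed from the output above. Only the _ right before the year has a letter on its left and a digit on its right, so each name splits into exactly two pieces.

```r
# Hypothetical input reconstructed from the output above
df <- data.frame(id = 123,
                 name = c("test_2001", "test_area_2002", "test_area_sqkm_2003"),
                 value = c(15, 20, 25),
                 stringsAsFactors = FALSE)

# Split only at an "_" preceded by a letter and followed by a digit
pieces <- strsplit(df$name, "(?<=[a-z])_(?=[1-9])", perl = TRUE)
df$year <- as.integer(vapply(pieces, `[`, character(1), 2))
df$name <- vapply(pieces, `[`, character(1), 1)
df[c("id", "name", "year", "value")]
```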