Split Data.Frame Based on Levels of a Factor into New Data.Frames

I think that split does exactly what you want.

Notice that X is a list of data frames, as seen by str:

X <- split(df, df$g)
str(X)

If you want individual objects named after the groups in g, you could assign the elements of X to objects with those names. This seems like extra work, though, when you can simply index the data frames in the list that split creates.

# lapply is used only to drop the grouping column g, which is no longer needed
Y <- lapply(X, function(d) d[, setdiff(names(d), "g")])

#Assign the dataframes in the list Y to individual objects
A <- Y[[1]]
B <- Y[[2]]
C <- Y[[3]]
D <- Y[[4]]
E <- Y[[5]]

# Or use lapply with assign to create all the objects at once
# (invisible() suppresses printing the list that lapply returns)
invisible(lapply(seq_along(Y), function(x) {
  assign(c("A", "B", "C", "D", "E")[x], Y[[x]], envir = .GlobalEnv)
}))

Edit: Or, even better than using lapply to assign to the global environment, use list2env:

names(Y) <- c("A", "B", "C", "D", "E")
list2env(Y, envir = .GlobalEnv)
A
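A self-contained sketch of the same workflow on a small made-up data frame, so it can be run as-is. The columns x and y and the grouping factor g are assumptions, not the asker's data. It writes the pieces into a fresh environment instead of .GlobalEnv, so existing objects named A, B or C are not overwritten by accident.

```r
# Hypothetical example data: two value columns and a grouping factor g
df <- data.frame(
  x = 1:6,
  y = c(2.5, 3.1, 4.7, 5.2, 6.8, 7.3),
  g = factor(c("A", "B", "A", "C", "B", "C"))
)

# One data frame per level of g, named after the levels
X <- split(df, df$g)

# Drop the grouping column by name rather than by position
Y <- lapply(X, function(d) d[, setdiff(names(d), "g")])

# Create objects A, B and C in a separate environment
pieces <- new.env()
list2env(Y, envir = pieces)
ls(pieces)
pieces$A
```

Passing envir = .GlobalEnv instead reproduces the behaviour of the answer above.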

Split dataframe by levels of a factor and name dataframes by those levels

You can do it with the plyr package

library(plyr)
dlply(df, .(Z))
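For comparison, base R's split() returns the same kind of result without loading plyr: a list of data frames named by the levels of the grouping column. The column name Z comes from the snippet above; the data are made up for illustration.

```r
# Hypothetical data with a grouping factor Z
df <- data.frame(
  val = c(10, 20, 30, 40, 50),
  Z   = factor(c("low", "high", "low", "mid", "high"))
)

# Named list: one data frame per level of Z
by_z <- split(df, df$Z)
names(by_z)
by_z[["low"]]
```

If separate objects are really needed, list2env(by_z, envir = .GlobalEnv) creates one per level.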

Splitting data frame into segments for each factor based on a cutoff value in a column in R

In data.table:

library(data.table)
dt[, V1 := paste0("A.", 1 + cumsum(V4 >= 0.4))]

In dplyr:

library(dplyr)
df %>%
  mutate(V1 = paste0("A.", 1 + cumsum(V4 >= 0.4)))
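Both snippets only label the segments. Here is a base R sketch that also splits the data into one data frame per segment, under the same rule: every row whose V4 is at least 0.4 starts a new segment. cumsum() over the logical vector counts how many such rows have been seen so far, so each row gets the number of the segment it belongs to. The V4 values are invented.

```r
# Made-up values for the cutoff column V4
df <- data.frame(V4 = c(0.1, 0.5, 0.2, 0.45, 0.3, 0.9))

# Segment label: goes up by one at every row with V4 >= 0.4
df$V1 <- paste0("A.", 1 + cumsum(df$V4 >= 0.4))

# One data frame per segment; levels in order of appearance,
# so a label like "A.10" would not sort before "A.2"
segments <- split(df, factor(df$V1, levels = unique(df$V1)))
sapply(segments, nrow)
```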

Split a data.frame into smaller data.frames, based on the start and end indices (held in two vectors) and using a condition

Based on the answer to the initial comment regarding row indices, and using a 3-part approach similar to @Roland's, the following should do what you want.

This creates a generic function that returns all rows from start to end (assuming the provided elements are integers).

split_data <- function(start, end, dfr) {
  dfr[start:end, ]
}

This creates a list of ALL available splits.

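# SIMPLIFY = FALSE keeps the result a list of data frames; otherwise mapply
# can collapse it into a matrix when every split has the same number of columns
split.frames <- mapply(split_data, START, END,
                       MoreArgs = list(dfr = ALL_DATA), SIMPLIFY = FALSE)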

This returns a logical vector with the ith element equal to TRUE if the ith split meets the desired condition.

cond <- sapply( split.frames, function(x){sum(x$Value)>=2} )

This returns only the splits that meet the condition.

split.frames <- split.frames[cond]
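A minimal end-to-end run of the three steps on made-up data, so each piece can be checked. ALL_DATA, START, END and the Value column are the names used above; their contents here are invented.

```r
# Hypothetical data and split boundaries
ALL_DATA <- data.frame(id = 1:8, Value = c(1, 0, 1, 1, 0, 0, 2, 1))
START <- c(1, 4, 6)
END   <- c(3, 5, 8)

split_data <- function(start, end, dfr) {
  dfr[start:end, ]
}

# Step 1: all splits, kept as a plain list of data frames
split.frames <- mapply(split_data, START, END,
                       MoreArgs = list(dfr = ALL_DATA), SIMPLIFY = FALSE)

# Step 2: TRUE where the split's Value column sums to at least 2
cond <- sapply(split.frames, function(x) sum(x$Value) >= 2)
cond

# Step 3: keep only the qualifying splits
split.frames <- split.frames[cond]
length(split.frames)
```

Here the sums are 2, 1 and 3, so the first and third splits are kept.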

EDIT #1

Per the comment about saving off the splits, it is probably best to use the str_pad() function from the R package stringr for creating the file names, but here is a base R implementation that should work for you.

nchars <- nchar(length(split.frames))

# Zero-padded format such as "%02d", so the file names sort correctly
print.expr <- paste0("%0", nchars, "d")

for (i in seq_along(split.frames)) {
  file.i <- paste0(sprintf(print.expr, i), ".dat")
  write.table(split.frames[[i]], file = file.i, sep = "\t", row.names = FALSE)
}
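To check the naming scheme without writing any files, the same sprintf() format can be applied to all indices at once, since sprintf() is vectorised. The count of 12 splits below is only an example: nchar(12) is 2, so the format becomes "%02d".

```r
n_splits <- 12  # hypothetical number of splits

print.expr <- paste0("%0", nchar(n_splits), "d")
file.names <- paste0(sprintf(print.expr, seq_len(n_splits)), ".dat")
file.names
```

Because every name has the same width, sorting them as text gives the same order as sorting by number.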

Not sure whether you want column and/or row names in your saved outputs, but I assumed yes for column names and no for row names.

Split a data frame based on some criteria

If you want the rows whose col2 values are in vec to come first, you can create two data frames: one with the rows whose values are in vec and one with the rows whose values are not.

Then you can concatenate them with rbind:

in.vec <- DF$col2 %in% vec
new.DF <- rbind(DF[in.vec, ], DF[!in.vec, ])

where DF[in.vec, ] selects the rows whose col2 values are found in vec, and DF[!in.vec, ] selects the rest.
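An equivalent one-liner, shown as an alternative rather than the answer's method: ordering by !in.vec puts the matching rows first, because FALSE sorts before TRUE. order() leaves ties in their original order, so the rows keep their relative order within each group, just as with rbind. DF, col2 and vec follow the names above; the data are made up.

```r
# Hypothetical data
DF <- data.frame(col1 = 1:5, col2 = c("a", "b", "c", "d", "e"))
vec <- c("d", "b")

in.vec <- DF$col2 %in% vec

# Rows with col2 in vec first, the rest afterwards
new.DF <- DF[order(!in.vec), ]
new.DF
```

Note that the order of the values inside vec does not matter; the matching rows stay in their original order.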

Subsetting a data.frame based on factor levels in a second data.frame

df.1[, as.character(unique(df.2$Var[which(df.2$Info == "X1")]))]

           A            C
1 0.8924861 0.7149490854
2 0.5711894 0.7200819517
3 0.7049629 0.0004052017
4 0.9188677 0.5007302717
5 0.3440664 0.9138259818
6 0.8657903 0.2724015017
7 0.7631228 0.5686033906
8 0.8388003 0.7377064163
9 0.0796059 0.6196693045
10 0.5029824 0.8717568610
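A small reproducible version, assuming df.2 stores column names of df.1 in a factor Var and group labels in Info (names from the snippet above; data invented). The as.character() call matters: indexing with a factor uses its integer codes rather than its labels, which can silently select the wrong columns.

```r
# Hypothetical data
df.1 <- data.frame(A = c(0.1, 0.2), B = c(0.3, 0.4), C = c(0.5, 0.6))
df.2 <- data.frame(
  Var  = factor(c("C", "B", "A", "C")),
  Info = c("X1", "X2", "X1", "X1")
)

# Column names listed for group "X1", as character labels
wanted <- as.character(unique(df.2$Var[which(df.2$Info == "X1")]))
wanted  # "C" "A"

df.1[, wanted]
```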

Split/subset a data frame by factors in one column

We could use split:

mylist <- split(df, df$State)

mylist
$AL
ID Rate State
1 1 24 AL
4 4 34 AL

$FL
ID Rate State
3 3 46 FL
6 6 99 FL

$MN
ID Rate State
2 2 35 MN
5 5 78 MN

To access elements by number:

mylist[[1]]

or by name:

mylist$AL
ID Rate State
1 1 24 AL
4 4 34 AL

?split

Description

split divides the data in the vector x into the groups defined by f.
The replacement forms replace values corresponding to such a division.
unsplit reverses the effect of split.
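As the help page says, unsplit() reverses split(). A quick sketch, using a data frame reconstructed from the printed output above (ID, Rate, State), that puts the pieces back in their original row order:

```r
df <- data.frame(
  ID    = 1:6,
  Rate  = c(24, 35, 46, 34, 78, 99),
  State = c("AL", "MN", "FL", "AL", "MN", "FL")
)

mylist <- split(df, df$State)
names(mylist)

# Reassemble the rows using the original grouping vector
back <- unsplit(mylist, df$State)
back
```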

Splitting a data frame based on character string

The main idea is to create a factor that defines the grouping for splitting. One way is to extract the digit pattern from the provided variable Barcode using a regular expression. Then we convert the resulting character vector of digits to a factor with as.factor().
We can, of course, use other regular expression techniques to get the job done, or more user-friendly wrapper functions from the stringr package, as in the second example (the tidyverse-ish approach).

Example 1

A base R solution using split:

# The provided data
Barcode <- c("ABCD-1", "ABCC-1", "ABCD-2", "ABCC-2", "ABCD-3", "ABCC-3",
             "ABCD-4", "ABCC-4", "ABCD-5", "ABCC-5", "ABCD-6", "ABCC-6")
bar_f <- data.frame(Barcode)

factor_for_split <- regmatches(x = bar_f$Barcode,
                               m = regexpr(pattern = "[[:digit:]]",
                                           text = bar_f$Barcode))
factor_for_split
#> [1] "1" "1" "2" "2" "3" "3" "4" "4" "5" "5" "6" "6"

# Create a list of 6 data frames as asked
lst <- split(x = bar_f, f = as.factor(factor_for_split))
lst
#> $`1`
#> Barcode
#> 1 ABCD-1
#> 2 ABCC-1
#>
#> $`2`
#> Barcode
#> 3 ABCD-2
#> 4 ABCC-2
#>
#> $`3`
#> Barcode
#> 5 ABCD-3
#> 6 ABCC-3
#>
#> $`4`
#> Barcode
#> 7 ABCD-4
#> 8 ABCC-4
#>
#> $`5`
#> Barcode
#> 9 ABCD-5
#> 10 ABCC-5
#>
#> $`6`
#> Barcode
#> 11 ABCD-6
#> 12 ABCC-6

# Edit names of the list
names(lst) <- paste0("df_", names(lst))

# Assign each data frame from the list to a data frame object in the global
# environment
for (name in names(lst)) {
  assign(name, lst[[name]])
}

Created on 2020-02-24 by the reprex package (v0.3.0)
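Another regular-expression route, sketched as an alternative rather than the method used above: strip everything up to and including the last hyphen with sub(). Unlike the single-digit pattern, this also handles multi-digit suffixes such as "ABCD-10". Building the factor with levels in order of appearance keeps "10" from sorting before "2". The barcodes below are made up.

```r
# Hypothetical barcodes, including a two-digit suffix
Barcode <- c("ABCD-1", "ABCC-1", "ABCD-2", "ABCC-2", "ABCD-10", "ABCC-10")
bar_f <- data.frame(Barcode, stringsAsFactors = FALSE)

# Keep only the text after the last hyphen: "ABCD-10" -> "10"
suffix <- sub("^.*-", "", bar_f$Barcode)
grp <- factor(suffix, levels = unique(suffix))

lst <- split(bar_f, grp)
names(lst) <- paste0("df_", names(lst))
names(lst)
```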

Example 2

And, if you prefer, here is a tidyverse-ish approach:

library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(stringr)

Barcode <- c("ABCD-1", "ABCC-1", "ABCD-2", "ABCC-2", "ABCD-3", "ABCC-3",
             "ABCD-4", "ABCC-4", "ABCD-5", "ABCC-5", "ABCD-6", "ABCC-6")
bar_f <- data.frame(Barcode)

lst <- bar_f %>%
  mutate(factor_for_split = str_extract(string = Barcode,
                                        pattern = "[[:digit:]]")) %>%
  group_split(factor_for_split)
lst
#> [[1]]
#> # A tibble: 2 x 2
#> Barcode factor_for_split
#> <fct> <chr>
#> 1 ABCD-1 1
#> 2 ABCC-1 1
#>
#> [[2]]
#> # A tibble: 2 x 2
#> Barcode factor_for_split
#> <fct> <chr>
#> 1 ABCD-2 2
#> 2 ABCC-2 2
#>
#> [[3]]
#> # A tibble: 2 x 2
#> Barcode factor_for_split
#> <fct> <chr>
#> 1 ABCD-3 3
#> 2 ABCC-3 3
#>
#> [[4]]
#> # A tibble: 2 x 2
#> Barcode factor_for_split
#> <fct> <chr>
#> 1 ABCD-4 4
#> 2 ABCC-4 4
#>
#> [[5]]
#> # A tibble: 2 x 2
#> Barcode factor_for_split
#> <fct> <chr>
#> 1 ABCD-5 5
#> 2 ABCC-5 5
#>
#> [[6]]
#> # A tibble: 2 x 2
#> Barcode factor_for_split
#> <fct> <chr>
#> 1 ABCD-6 6
#> 2 ABCC-6 6
#>
#> attr(,"ptype")
#> # A tibble: 0 x 2
#> # ... with 2 variables: Barcode <fct>, factor_for_split <chr>

# Name the list elements and assign each tibble to its own object
names(lst) <- paste0("df_", seq_along(lst))
for (name in names(lst)) {
  assign(name, lst[[name]])
}



