Split/Subset a Data Frame by Factors in One Column

Split data.frame based on levels of a factor into new data.frames

I think that split does exactly what you want.

Notice that X is a list of data frames, as seen by str:

X <- split(df, df$g)
str(X)
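As a minimal, self-contained sketch of the same idea: the data frame below is made up for illustration (the columns x and y and the five levels of g are assumptions, not the asker's data).

```r
# Hypothetical data: two value columns and a grouping column g
df <- data.frame(
  x = 1:10,
  y = seq(10, 100, by = 10),
  g = rep(c("A", "B", "C", "D", "E"), each = 2)
)

# split() returns a named list with one data frame per level of g
X <- split(df, df$g)

names(X)   # the list is named after the levels of g
X$B        # the rows of df where g == "B"
X[["B"]]   # the same piece, indexed by name
```

Because the list is already named, individual pieces can usually be used in place without creating separate objects.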

If you want individual objects named after the levels of g, you could assign the elements of the list returned by split to objects with those names, though this seems like extra work when you can just index the data frames in the list that split creates.

#I used lapply just to drop the third column g which is no longer needed.
Y <- lapply(seq_along(X), function(x) as.data.frame(X[[x]])[, 1:2])

#Assign the dataframes in the list Y to individual objects
A <- Y[[1]]
B <- Y[[2]]
C <- Y[[3]]
D <- Y[[4]]
E <- Y[[5]]

# Or use lapply with assign to create all of the objects at once
lapply(seq_along(Y), function(x) {
  assign(c("A", "B", "C", "D", "E")[x], Y[[x]], envir = .GlobalEnv)
})

Edit: Even better than using lapply to assign into the global environment, use list2env:

names(Y) <- c("A", "B", "C", "D", "E")
list2env(Y, envir = .GlobalEnv)
A
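If writing into the global environment feels too invasive, list2env can also fill a fresh environment. The sketch below uses made-up data (x, y, and g are assumed names). Because lapply is applied to the named list directly, the pieces keep their level names and names(Y) does not have to be set by hand.

```r
# Hypothetical data with a grouping column g
df <- data.frame(x = 1:4, y = c(2.5, 3.5, 4.5, 5.5), g = c("A", "A", "B", "B"))

# Split by g, then drop the grouping column from every piece
X <- split(df, df$g)
Y <- lapply(X, function(piece) piece[, c("x", "y")])

# Copy each list element into a new environment under its list name
env <- list2env(Y, envir = new.env())

ls(env)   # names of the objects that were created
env$A     # the piece for level "A"
```

Passing envir = .GlobalEnv instead of new.env() gives the behavior shown in the answer.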

Split/subset a data frame by factors in one column

We could use split:

mylist <- split(df, df$State)

mylist
$AL
ID Rate State
1 1 24 AL
4 4 34 AL

$FL
ID Rate State
3 3 46 FL
6 6 99 FL

$MN
ID Rate State
2 2 35 MN
5 5 78 MN

To access elements by number:

mylist[[1]]

or by name:

mylist$AL
ID Rate State
1 1 24 AL
4 4 34 AL

?split

Description

split divides the data in the vector x into the groups defined by f.
The replacement forms replace values corresponding to such a division.
unsplit reverses the effect of split.
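The last sentence of that description can be checked directly. The sketch below rebuilds the six-row example from the printed output above and shows that unsplit puts the pieces back in their original row order.

```r
# The example data, reconstructed from the printed output above
df <- data.frame(
  ID    = 1:6,
  Rate  = c(24, 35, 46, 34, 78, 99),
  State = c("AL", "MN", "FL", "AL", "MN", "FL")
)

mylist <- split(df, df$State)

# unsplit() needs the same grouping vector that was used for the split
restored <- unsplit(mylist, df$State)

restored$ID   # rows are back in their original order
```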

How to subset/split a dataframe of multiple columns by common number of values available in R

This should work for what you are doing. It produces a list of data frames that you can index into one at a time:

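# Count the non-missing values in every column except the first (DATE)
counts <- sapply(df[, 2:ncol(df)], function(col) sum(!is.na(col)))
# Group the column positions that share a count (lapply always returns a list)
groups <- lapply(unique(counts), function(n) which(counts == n))
# Build one data frame per group, keeping the first column in each
dfList <- lapply(groups, function(idx) df[, c(1, idx + 1), drop = FALSE])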

Output is as follows:

dfList
[[1]]
DATE A D E F
1 31/12/1999 79.5 36.7 3 6
2 03/01/2000 79.5 36.7 3 6
3 04/01/2000 79.5 36.7 3 6
4 05/01/2000 79.5 38.8 3 6
5 06/01/2000 79.5 20.3 3 6
6 07/01/2000 79.5 15.6 3 6
7 10/01/2000 79.5 5.4 3 6
8 11/01/2000 79.5 15.0 3 6
9 12/01/2000 79.5 9.3 3 6
10 13/01/2000 79.5 29.1 3 6

[[2]]
DATE B
1 31/12/1999 NA
2 03/01/2000 NA
3 04/01/2000 NA
4 05/01/2000 NA
5 06/01/2000 NA
6 07/01/2000 NA
7 10/01/2000 7
8 11/01/2000 7
9 12/01/2000 7
10 13/01/2000 7

[[3]]
DATE C G H
1 31/12/1999 NA NA NA
2 03/01/2000 NA NA NA
3 04/01/2000 325.0 961 3081.9
4 05/01/2000 322.5 945 2524.7
5 06/01/2000 327.5 952 3272.3
6 07/01/2000 327.5 941 2102.9
7 10/01/2000 327.5 946 2901.5
8 11/01/2000 327.5 888 9442.5
9 12/01/2000 331.5 870 7865.8
10 13/01/2000 334.0 853 7742.1

To retrieve only complete cases from each of the data frames in the data frame list above, you can do:

dfList <- lapply(dfList, function(x) x[complete.cases(x), ])

Resulting output will be the following list of the three data frames in this example:

[[1]]
DATE A D E F
1 31/12/1999 79.5 36.7 3 6
2 03/01/2000 79.5 36.7 3 6
3 04/01/2000 79.5 36.7 3 6
4 05/01/2000 79.5 38.8 3 6
5 06/01/2000 79.5 20.3 3 6
6 07/01/2000 79.5 15.6 3 6
7 10/01/2000 79.5 5.4 3 6
8 11/01/2000 79.5 15.0 3 6
9 12/01/2000 79.5 9.3 3 6
10 13/01/2000 79.5 29.1 3 6

[[2]]
DATE B
7 10/01/2000 7
8 11/01/2000 7
9 12/01/2000 7
10 13/01/2000 7

[[3]]
DATE C G H
3 04/01/2000 325.0 961 3081.9
4 05/01/2000 322.5 945 2524.7
5 06/01/2000 327.5 952 3272.3
6 07/01/2000 327.5 941 2102.9
7 10/01/2000 327.5 946 2901.5
8 11/01/2000 327.5 888 9442.5
9 12/01/2000 331.5 870 7865.8
10 13/01/2000 334.0 853 7742.1

You can access each of these data frames as follows:

for (i in seq_along(dfList)) print(dfList[[i]])
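The grouping and filtering steps above can also be tried end to end on a small data set. This is only a sketch: the data is made up, and it groups column names with split rather than following the answer's code line for line.

```r
# Hypothetical data: DATE plus four value columns with different NA patterns
df <- data.frame(
  DATE = c("03/01/2000", "04/01/2000", "05/01/2000", "06/01/2000"),
  A = c(1, 2, 3, 4),
  B = c(NA, NA, 5, 6),
  C = c(7, 8, 9, 10),
  D = c(NA, NA, 1, 2)
)

# Number of non-missing values in each column except DATE
counts <- colSums(!is.na(df[-1]))

# Group the column names by their count; the list names are the counts
groups <- split(names(counts), counts)

# One data frame per group, each keeping DATE
dfList <- lapply(groups, function(cols) df[, c("DATE", cols), drop = FALSE])

# Keep only the rows without missing values in each piece
completeList <- lapply(dfList, function(d) d[complete.cases(d), , drop = FALSE])

# A for loop does not auto-print, so print() is called explicitly
for (piece in completeList) print(piece)
```

With this variant the list is ordered and named by the count of non-missing values, not by the order in which the counts first appear.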

Split dataframe by levels of a factor and name dataframes by those levels

You can do it with dlply from the plyr package, which returns a list named by the levels of Z:

library(plyr)
dlply(df, .(Z))

Splitting data frame into segments for each factor based on a cutoff value in a column in R

In data.table:

library(data.table)
dt[, V1 := paste0("A.", 1 + cumsum(V4 >= 0.4))]

In dplyr:

library(dplyr)
df %>%
  mutate(V1 = paste0("A.", 1 + cumsum(V4 >= 0.4)))
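Both one-liners rely on the same trick: cumsum over a logical vector steps up by one at every row where the cutoff is met, so each such row starts a new segment. A base R sketch on made-up values (the column name V4 and the 0.4 cutoff follow the answer; the data itself is hypothetical):

```r
# Hypothetical values for the column that is tested against the cutoff
df <- data.frame(V4 = c(0.1, 0.2, 0.5, 0.1, 0.45, 0.3))

# TRUE counts as 1, so the running sum steps up at every value >= 0.4
df$V1 <- paste0("A.", 1 + cumsum(df$V4 >= 0.4))

df$V1

# The labels can then drive an ordinary split into segments
segments <- split(df, df$V1)
names(segments)
```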

In R, how to split/subset a data frame by factors in more than one column?

Another simple solution is to use by:

list.df <- by(df, INDICES = list(df$Test.Type, df$Subject), FUN = data.frame)

Results

> list.df
: Unit test 1
: English
ID Test.Type Subject Marks
1 1 Unit test 1 English 85
2 2 Unit test 1 English 75
3 3 Unit test 1 English 78
--------------------------------------------------------------------------------------------------
: Unit test 2
: English
ID Test.Type Subject Marks
4 1 Unit test 2 English 85
5 2 Unit test 2 English 75
6 3 Unit test 2 English 78
--------------------------------------------------------------------------------------------------
: Unit test 1
: Maths
ID Test.Type Subject Marks
7 1 Unit test 1 Maths 78
8 2 Unit test 1 Maths 79
9 3 Unit test 1 Maths 98
--------------------------------------------------------------------------------------------------
: Unit test 2
: Maths
ID Test.Type Subject Marks
10 1 Unit test 2 Maths 95
11 2 Unit test 2 Maths 98
12 3 Unit test 2 Maths 88

You can then access each individual dataframe by using list.df[[1]] through list.df[[4]].
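If a plain named list is preferred over a by object, split also accepts a list of grouping vectors. The sketch below rebuilds the data from the printed output above; it is an alternative to by, not what the answer itself used.

```r
# Data reconstructed from the printed output above
df <- data.frame(
  ID        = rep(1:3, times = 4),
  Test.Type = rep(c("Unit test 1", "Unit test 2", "Unit test 1", "Unit test 2"), each = 3),
  Subject   = rep(c("English", "Maths"), each = 6),
  Marks     = c(85, 75, 78, 85, 75, 78, 78, 79, 98, 95, 98, 88)
)

# One piece per combination; drop = TRUE skips combinations with no rows
pieces <- split(df, list(df$Test.Type, df$Subject), drop = TRUE)

names(pieces)                   # names join the two levels with "."
pieces[["Unit test 2.Maths"]]   # index a piece by its combined name
```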

(And thx to Richard Scriven for dputing the data in his answer.)

Subsetting a data.frame based on factor levels in a second data.frame

df.1[,unique(df.2$Var[which(df.2$Info=="X1")])]

           A            C
1 0.8924861 0.7149490854
2 0.5711894 0.7200819517
3 0.7049629 0.0004052017
4 0.9188677 0.5007302717
5 0.3440664 0.9138259818
6 0.8657903 0.2724015017
7 0.7631228 0.5686033906
8 0.8388003 0.7377064163
9 0.0796059 0.6196693045
10 0.5029824 0.8717568610
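One caveat: if df.2$Var is stored as a factor, indexing columns with it uses the factor's integer codes instead of its labels, so converting to character first is safer. A sketch with made-up data (the values in df.1 and df.2 are hypothetical; only the column names Var and Info come from the answer):

```r
# Hypothetical data: df.1 holds the values, df.2 describes its columns
df.1 <- data.frame(A = 1:3, B = 4:6, C = 7:9)
df.2 <- data.frame(
  Var  = factor(c("C", "A", "B"), levels = c("C", "A", "B")),
  Info = c("X1", "X1", "X2")
)

# Column names flagged "X1", converted from factor to character
keep <- unique(as.character(df.2$Var[df.2$Info == "X1"]))

# drop = FALSE keeps the result a data frame even for a single column
sub <- df.1[, keep, drop = FALSE]
names(sub)
```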

