Split/Subset a Data Frame by Factors in One Column

Split data.frame based on levels of a factor into new data.frames

I think that split does exactly what you want.

Notice that X, the result of split, is a list of data frames, as str shows:

X <- split(df, df$g)
str(X)

If you want individual objects named after the groups in g, you could assign the elements of X to objects with those names. That seems like extra work, though, when you can just index the data frames in the list that split creates.

# Use lapply to drop the third column, g, which is no longer needed (lapply over X keeps the list names)
Y <- lapply(X, function(d) d[, 1:2])

#Assign the dataframes in the list Y to individual objects
A <- Y[[1]]
B <- Y[[2]]
C <- Y[[3]]
D <- Y[[4]]
E <- Y[[5]]

#Or use lapply with assign to assign each piece to an object all at once
lapply(seq_along(Y), function(x) {
    assign(c("A", "B", "C", "D", "E")[x], Y[[x]], envir=.GlobalEnv)
    }
)

Edit: Even better than using lapply with assign to write to the global environment, use list2env:

names(Y) <- c("A", "B", "C", "D", "E")
list2env(Y, envir = .GlobalEnv)
A
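
A minimal, self-contained sketch of the same workflow, assuming a hypothetical toy data frame df with value columns x and y and a grouping column g. It drops g by name rather than by position, and it writes the pieces into a dedicated environment (groups_env, a hypothetical name) instead of .GlobalEnv, which keeps the workspace tidy:

```r
# Hypothetical toy data: two value columns and a grouping column g
df <- data.frame(
  x = 1:6,
  y = c(2.5, 3.1, 4.7, 1.2, 9.9, 0.4),
  g = c("A", "B", "C", "A", "B", "C"),
  stringsAsFactors = FALSE
)

# split() returns a list of data frames named after the values of g
X <- split(df, df$g)

# Drop the grouping column by name, so its position does not matter
Y <- lapply(X, function(d) d[, setdiff(names(d), "g")])

# Create one object per group inside a separate environment
groups_env <- new.env()
list2env(Y, envir = groups_env)

groups_env$A
```

Retrieve a piece with groups_env$A or get("A", envir = groups_env); passing envir = .GlobalEnv instead reproduces the answer above.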

Split/subset a data frame by factors in one column

We could use split:

mylist <- split(df, df$State)

mylist
$AL
  ID Rate State
1  1   24    AL
4  4   34    AL

$FL
  ID Rate State
3  3   46    FL
6  6   99    FL

$MN
  ID Rate State
2  2   35    MN
5  5   78    MN

To access elements by number:

mylist[[1]]

or by name:

mylist$AL
  ID Rate State
1  1   24    AL
4  4   34    AL

?split

Description

split divides the data in the vector x into the groups defined by f.
The replacement forms replace values corresponding to such a division.
unsplit reverses the effect of split.
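
To illustrate that last point, here is a small sketch using the data rebuilt from the printed output above: one group is modified through the list, and unsplit() then puts the rows back into their original order. The doubling of Rate is just an arbitrary example edit:

```r
# Data rebuilt from the printed output above
df <- data.frame(
  ID = 1:6,
  Rate = c(24, 35, 46, 34, 78, 99),
  State = c("AL", "MN", "FL", "AL", "MN", "FL"),
  stringsAsFactors = FALSE
)

mylist <- split(df, df$State)

# Arbitrary example edit: double the Rate values for FL only
mylist$FL$Rate <- mylist$FL$Rate * 2

# unsplit() reassembles the pieces in the original row order
back <- unsplit(mylist, df$State)
back
```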

How to subset/split a dataframe of multiple columns by common number of values available in R

This should work for what you are doing, and it produces a list of data frames that you can index one at a time:

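# Count non-missing values per column, skipping the first column (DATE)
n_obs <- sapply(df[, -1], function(col) sum(!is.na(col)))
col_groups <- lapply(unique(n_obs), function(n) which(n_obs == n))  # lapply always returns a list
# One data frame per group: DATE plus the columns that share that count
dfList <- lapply(col_groups, function(idx) df[, c(1, idx + 1)])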

Output is as follows:

dfList
[[1]]
         DATE    A    D E F
1  31/12/1999 79.5 36.7 3 6
2  03/01/2000 79.5 36.7 3 6
3  04/01/2000 79.5 36.7 3 6
4  05/01/2000 79.5 38.8 3 6
5  06/01/2000 79.5 20.3 3 6
6  07/01/2000 79.5 15.6 3 6
7  10/01/2000 79.5  5.4 3 6
8  11/01/2000 79.5 15.0 3 6
9  12/01/2000 79.5  9.3 3 6
10 13/01/2000 79.5 29.1 3 6

[[2]]
         DATE  B
1  31/12/1999 NA
2  03/01/2000 NA
3  04/01/2000 NA
4  05/01/2000 NA
5  06/01/2000 NA
6  07/01/2000 NA
7  10/01/2000  7
8  11/01/2000  7
9  12/01/2000  7
10 13/01/2000  7

[[3]]
         DATE     C   G      H
1  31/12/1999    NA  NA     NA
2  03/01/2000    NA  NA     NA
3  04/01/2000 325.0 961 3081.9
4  05/01/2000 322.5 945 2524.7
5  06/01/2000 327.5 952 3272.3
6  07/01/2000 327.5 941 2102.9
7  10/01/2000 327.5 946 2901.5
8  11/01/2000 327.5 888 9442.5
9  12/01/2000 331.5 870 7865.8
10 13/01/2000 334.0 853 7742.1
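
A more compact variant, sketched on a hypothetical toy data frame: split() can group the column names by their non-missing count directly. Note that the resulting list is named by the count and ordered by increasing count, not by first appearance as with the code above:

```r
# Hypothetical toy data: a DATE column plus value columns with varying NAs
df <- data.frame(
  DATE = c("03/01/2000", "04/01/2000", "05/01/2000"),
  A = c(1.5, 2.5, 3.5),
  B = c(NA, NA, 7),
  C = c(NA, 320, 325),
  D = c(36.7, 38.8, 20.3),
  stringsAsFactors = FALSE
)

# Number of non-missing values in each column except DATE
counts <- sapply(df[-1], function(col) sum(!is.na(col)))

# Group column names by their count, then build one data frame per group
col_groups <- split(names(counts), counts)
dfList <- lapply(col_groups, function(cols) df[, c("DATE", cols)])

names(dfList)
```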

To retrieve only complete cases from each of the data frames in the data frame list above, you can do:

dfList <- lapply(dfList, function(x) x[complete.cases(x), ])

Resulting output will be the following list of the three data frames in this example:

[[1]]
         DATE    A    D E F
1  31/12/1999 79.5 36.7 3 6
2  03/01/2000 79.5 36.7 3 6
3  04/01/2000 79.5 36.7 3 6
4  05/01/2000 79.5 38.8 3 6
5  06/01/2000 79.5 20.3 3 6
6  07/01/2000 79.5 15.6 3 6
7  10/01/2000 79.5  5.4 3 6
8  11/01/2000 79.5 15.0 3 6
9  12/01/2000 79.5  9.3 3 6
10 13/01/2000 79.5 29.1 3 6

[[2]]
         DATE B
7  10/01/2000 7
8  11/01/2000 7
9  12/01/2000 7
10 13/01/2000 7

[[3]]
         DATE     C   G      H
3  04/01/2000 325.0 961 3081.9
4  05/01/2000 322.5 945 2524.7
5  06/01/2000 327.5 952 3272.3
6  07/01/2000 327.5 941 2102.9
7  10/01/2000 327.5 946 2901.5
8  11/01/2000 327.5 888 9442.5
9  12/01/2000 331.5 870 7865.8
10 13/01/2000 334.0 853 7742.1

You can print each of these data frames in turn, or index a single one directly (e.g. dfList[[2]]):

for (i in seq_along(dfList)) print(dfList[[i]])

Split dataframe by levels of a factor and name dataframes by those levels

You can do it with the plyr package; dlply() returns a list of data frames named by the levels of Z:

library(plyr)
dlply(df, .(Z))
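
For comparison, base R's split() gives a similar named list without an extra package: each element is a data frame named after one level of the factor. A minimal sketch, assuming a hypothetical toy data frame with a factor column Z:

```r
# Hypothetical toy data with a factor column Z
df <- data.frame(
  value = c(10, 20, 30, 40),
  Z = factor(c("low", "high", "low", "high"))
)

# One data frame per level of Z, named by that level
pieces <- split(df, df$Z)
names(pieces)

pieces$low
```

If you really want separate objects, list2env(pieces, envir = .GlobalEnv) creates one per level.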

Splitting data frame into segments for each factor based on a cutoff value in a column in R

In data.table:

dt[, V1 := paste0("A.", 1+cumsum(V4 >= 0.4))]

In dplyr:

df %>%
  mutate(V1 = paste0("A.", 1+cumsum(V4 >= 0.4)))
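
Both versions only label the segments. A base R sketch that adds the same label and then splits into one data frame per segment, assuming a hypothetical toy column V4 (every row with V4 >= 0.4 starts a new segment):

```r
# Hypothetical toy data; a row with V4 >= 0.4 starts a new segment
df <- data.frame(V4 = c(0.1, 0.5, 0.2, 0.3, 0.6, 0.1))

# cumsum() over the logical cutoff test increments the segment number
df$V1 <- paste0("A.", 1 + cumsum(df$V4 >= 0.4))

# Keep segments in order of appearance ("A.10" would otherwise sort before "A.2")
segments <- split(df, factor(df$V1, levels = unique(df$V1)))
sapply(segments, nrow)
```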

In R, how to split/subset a data frame by factors in more than one column?

Another simple solution is to use by:

list.df <- by(df, INDICES = list(df$Test.Type, df$Subject), FUN = data.frame)

Results

> list.df
: Unit test 1
: English
  ID   Test.Type Subject Marks
1  1 Unit test 1 English    85
2  2 Unit test 1 English    75
3  3 Unit test 1 English    78
-------------------------------------------------------------------------------------------------- 
: Unit test 2
: English
  ID   Test.Type Subject Marks
4  1 Unit test 2 English    85
5  2 Unit test 2 English    75
6  3 Unit test 2 English    78
-------------------------------------------------------------------------------------------------- 
: Unit test 1
: Maths
  ID   Test.Type Subject Marks
7  1 Unit test 1   Maths    78
8  2 Unit test 1   Maths    79
9  3 Unit test 1   Maths    98
-------------------------------------------------------------------------------------------------- 
: Unit test 2
: Maths
   ID   Test.Type Subject Marks
10  1 Unit test 2   Maths    95
11  2 Unit test 2   Maths    98
12  3 Unit test 2   Maths    88

You can then access each individual dataframe by using list.df[[1]] through list.df[[4]].

(And thanks to Richard Scriven for dput-ing the data in his answer.)
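
An alternative sketch: split() also accepts a list of factors (it builds their interaction) and returns a plain named list, so pieces can be accessed by name such as "Unit test 2.Maths" as well as by position. The data below is rebuilt from the printed output above:

```r
# Data rebuilt from the printed output above
df <- data.frame(
  ID = rep(1:3, 4),
  Test.Type = rep(rep(c("Unit test 1", "Unit test 2"), each = 3), 2),
  Subject = rep(c("English", "Maths"), each = 6),
  Marks = c(85, 75, 78, 85, 75, 78, 78, 79, 98, 95, 98, 88),
  stringsAsFactors = FALSE
)

# drop = TRUE removes factor combinations that have no rows
list.split <- split(df, list(df$Test.Type, df$Subject), drop = TRUE)
names(list.split)

list.split[["Unit test 2.Maths"]]
```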

Subsetting a data.frame based on factor levels in a second data.frame

# as.character() makes sure columns are matched by name even if Var is a factor
df.1[, unique(as.character(df.2$Var[df.2$Info == "X1"]))]

           A            C
1  0.8924861 0.7149490854
2  0.5711894 0.7200819517
3  0.7049629 0.0004052017
4  0.9188677 0.5007302717
5  0.3440664 0.9138259818
6  0.8657903 0.2724015017
7  0.7631228 0.5686033906
8  0.8388003 0.7377064163
9  0.0796059 0.6196693045
10 0.5029824 0.8717568610
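
A minimal sketch, assuming hypothetical toy versions of df.1 and df.2, of why converting Var to character is a safe choice: according to ?Extract, indexing with a factor uses its integer codes rather than its labels. Here Var has levels A, C, D, so the codes for C and D are 2 and 3, which would point at columns B and C of df.1 instead of C and D:

```r
# Hypothetical toy data: df.2 maps column names (Var) to a group (Info)
df.1 <- data.frame(A = 1:2, B = 3:4, C = 5:6, D = 7:8)
df.2 <- data.frame(
  Var = factor(c("C", "D", "A")),
  Info = c("X1", "X1", "X2")
)

# Convert the selected factor values to character so columns are matched by name
keep <- unique(as.character(df.2$Var[df.2$Info == "X1"]))
df.1[, keep]
```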