Split data.frame based on levels of a factor into new data.frames
I think that split does exactly what you want.
Notice that X is a list of data frames, as seen by str:
X <- split(df, df$g)
str(X)
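For a self-contained illustration, here is a minimal sketch. The data frame below is made up for the example (the columns x and y and the factor g are hypothetical stand-ins, since the asker's data is not shown here):

```r
# Hypothetical data: two numeric columns and a grouping factor g
df <- data.frame(
  x = 1:6,
  y = c(2.5, 3.1, 4.7, 5.0, 6.2, 7.8),
  g = factor(c("A", "B", "A", "C", "B", "C"))
)

# split() returns a named list with one data frame per level of g
X <- split(df, df$g)

names(X)    # the list elements are named after the factor levels
nrow(X$A)   # number of rows that fell into group "A"
X[["B"]]    # index a single group's data frame by name
```

Because the list is already named by level, indexing with X$A or X[["A"]] is often all you need.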
If you want individual objects named after the levels of g, you could assign the elements of X from split to objects of those names, though this seems like extra work when you can just index the data frames from the list that split creates.
#I used lapply just to drop the third column g which is no longer needed.
Y <- lapply(seq_along(X), function(x) as.data.frame(X[[x]])[, 1:2])
#Assign the dataframes in the list Y to individual objects
A <- Y[[1]]
B <- Y[[2]]
C <- Y[[3]]
D <- Y[[4]]
E <- Y[[5]]
#Or use lapply with assign to assign each piece to an object all at once
lapply(seq_along(Y), function(x) {
    assign(c("A", "B", "C", "D", "E")[x], Y[[x]], envir = .GlobalEnv)
})
Edit: Or, even better than using lapply to assign to the global environment, use list2env:
names(Y) <- c("A", "B", "C", "D", "E")
list2env(Y, envir = .GlobalEnv)
A
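If writing into the global environment feels too invasive, list2env can also fill a separate environment. A minimal sketch, assuming a hypothetical named list called pieces in place of Y (both pieces and group_env are names invented for this example):

```r
# Hypothetical named list standing in for Y
pieces <- list(
  A = data.frame(x = 1:2, y = c(10, 20)),
  B = data.frame(x = 3:4, y = c(30, 40))
)

# Copy each list element into a fresh environment instead of .GlobalEnv
group_env <- list2env(pieces, envir = new.env())

ls(group_env)                 # names of the objects that were created
get("A", envir = group_env)   # retrieve one data frame by name
group_env$B$y                 # environments also support $ access
```

This keeps the workspace uncluttered while still giving each group its own named object.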
Split/subset a data frame by factors in one column
We could use split:
mylist <- split(df, df$State)
mylist
$AL
ID Rate State
1 1 24 AL
4 4 34 AL
$FL
ID Rate State
3 3 46 FL
6 6 99 FL
$MN
ID Rate State
2 2 35 MN
5 5 78 MN
To access elements by number:
mylist[[1]]
or by name:
mylist$AL
ID Rate State
1 1 24 AL
4 4 34 AL
?split
Description
split divides the data in the vector x into the groups defined by f.
The replacement forms replace values corresponding to such a division.
unsplit reverses the effect of split.
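To make the last line of that description concrete, here is a small sketch of a split / modify / unsplit round trip. The data frame is rebuilt from the printed output above; the Centered column and the names pieces and restored are my own additions for illustration:

```r
# Data frame rebuilt from the printed split() output above
df <- data.frame(
  ID    = 1:6,
  Rate  = c(24, 35, 46, 34, 78, 99),
  State = c("AL", "MN", "FL", "AL", "MN", "FL")
)

pieces <- split(df, df$State)

# Modify each piece, here by centering Rate within its state
pieces <- lapply(pieces, function(d) {
  d$Centered <- d$Rate - mean(d$Rate)
  d
})

# unsplit() reassembles the pieces in the original row order
restored <- unsplit(pieces, df$State)
restored
```

The second argument to unsplit should be the same grouping that was passed to split, otherwise the rows cannot be put back in their original positions.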
How to subset/split a dataframe of multiple columns by common number of values available in R
This should work for what you are doing; it produces a list of data frames that you can index into one at a time:
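# Count the non-NA values in every column except the first (DATE)
counts <- sapply(df[, 2:ncol(df)], function(x) sum(!is.na(x)))
# Group the column positions that share the same count
groups <- lapply(unique(counts), function(n) which(counts == n))
# Build one data frame per group, always keeping the DATE column
dfList <- lapply(groups, function(idx) df[, c(1, idx + 1), drop = FALSE])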
Output is as follows:
dfList
[[1]]
DATE A D E F
1 31/12/1999 79.5 36.7 3 6
2 03/01/2000 79.5 36.7 3 6
3 04/01/2000 79.5 36.7 3 6
4 05/01/2000 79.5 38.8 3 6
5 06/01/2000 79.5 20.3 3 6
6 07/01/2000 79.5 15.6 3 6
7 10/01/2000 79.5 5.4 3 6
8 11/01/2000 79.5 15.0 3 6
9 12/01/2000 79.5 9.3 3 6
10 13/01/2000 79.5 29.1 3 6
[[2]]
DATE B
1 31/12/1999 NA
2 03/01/2000 NA
3 04/01/2000 NA
4 05/01/2000 NA
5 06/01/2000 NA
6 07/01/2000 NA
7 10/01/2000 7
8 11/01/2000 7
9 12/01/2000 7
10 13/01/2000 7
[[3]]
DATE C G H
1 31/12/1999 NA NA NA
2 03/01/2000 NA NA NA
3 04/01/2000 325.0 961 3081.9
4 05/01/2000 322.5 945 2524.7
5 06/01/2000 327.5 952 3272.3
6 07/01/2000 327.5 941 2102.9
7 10/01/2000 327.5 946 2901.5
8 11/01/2000 327.5 888 9442.5
9 12/01/2000 331.5 870 7865.8
10 13/01/2000 334.0 853 7742.1
To retrieve only complete cases from each of the data frames in the data frame list above, you can do:
dfList <- lapply(dfList, function(x) x[complete.cases(x), ])
Resulting output will be the following list of the three data frames in this example:
[[1]]
DATE A D E F
1 31/12/1999 79.5 36.7 3 6
2 03/01/2000 79.5 36.7 3 6
3 04/01/2000 79.5 36.7 3 6
4 05/01/2000 79.5 38.8 3 6
5 06/01/2000 79.5 20.3 3 6
6 07/01/2000 79.5 15.6 3 6
7 10/01/2000 79.5 5.4 3 6
8 11/01/2000 79.5 15.0 3 6
9 12/01/2000 79.5 9.3 3 6
10 13/01/2000 79.5 29.1 3 6
[[2]]
DATE B
7 10/01/2000 7
8 11/01/2000 7
9 12/01/2000 7
10 13/01/2000 7
[[3]]
DATE C G H
3 04/01/2000 325.0 961 3081.9
4 05/01/2000 322.5 945 2524.7
5 06/01/2000 327.5 952 3272.3
6 07/01/2000 327.5 941 2102.9
7 10/01/2000 327.5 946 2901.5
8 11/01/2000 327.5 888 9442.5
9 12/01/2000 331.5 870 7865.8
10 13/01/2000 334.0 853 7742.1
You can access each of these data frames as follows:
for (i in seq_along(dfList)) print(dfList[[i]])
Split dataframe by levels of a factor and name dataframes by those levels
You can do it with the plyr package:
require(plyr)
dlply(df, .(Z))
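If you would rather not depend on plyr, base split() also names each list element after its factor level. This is a base-R alternative, not the plyr approach above. A minimal sketch, assuming a hypothetical data frame with a factor column Z (the value column and the name by_level are invented for the example):

```r
# Hypothetical data with a factor column Z
df <- data.frame(
  value = c(1.2, 3.4, 5.6, 7.8),
  Z     = factor(c("low", "high", "low", "high"))
)

by_level <- split(df, df$Z)

names(by_level)   # element names come from levels(df$Z)
by_level$high     # pick the data frame for one level
```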
Splitting data frame into segments for each factor based on a cutoff value in a column in R
In data.table:
dt[, V1 := paste0("A.", 1+cumsum(V4 >= 0.4))]
In dplyr:
df %>%
mutate(V1 = paste0("A.", 1+cumsum(V4 >= 0.4)))
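Both versions rely on the same idea: cumsum over a logical vector counts how many rows so far have met the cutoff, so every row with V4 >= 0.4 starts a new segment. A base-R sketch of that idea, using made-up V4 values (the data and the name segments are assumptions for illustration):

```r
# Hypothetical values for the cutoff column V4
df <- data.frame(V4 = c(0.1, 0.2, 0.5, 0.3, 0.4, 0.1))

# Each row with V4 >= 0.4 bumps the running count, starting a new segment
df$V1 <- paste0("A.", 1 + cumsum(df$V4 >= 0.4))
df$V1

# Splitting on the new label yields one data frame per segment
segments <- split(df, df$V1)
names(segments)
```

Note that the row that meets the cutoff belongs to the segment it starts, not to the one before it.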
In R, how to split/subset a data frame by factors in more than one column?
Another simple solution is to use by:
list.df <- by(df, INDICES = list(df$Test.Type, df$Subject), FUN = data.frame)
Results
> list.df
: Unit test 1
: English
ID Test.Type Subject Marks
1 1 Unit test 1 English 85
2 2 Unit test 1 English 75
3 3 Unit test 1 English 78
--------------------------------------------------------------------------------------------------
: Unit test 2
: English
ID Test.Type Subject Marks
4 1 Unit test 2 English 85
5 2 Unit test 2 English 75
6 3 Unit test 2 English 78
--------------------------------------------------------------------------------------------------
: Unit test 1
: Maths
ID Test.Type Subject Marks
7 1 Unit test 1 Maths 78
8 2 Unit test 1 Maths 79
9 3 Unit test 1 Maths 98
--------------------------------------------------------------------------------------------------
: Unit test 2
: Maths
ID Test.Type Subject Marks
10 1 Unit test 2 Maths 95
11 2 Unit test 2 Maths 98
12 3 Unit test 2 Maths 88
You can then access each individual data frame by using list.df[[1]] through list.df[[4]].
(And thx to Richard Scriven for dputing the data in his answer.)
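split() also accepts a list of factors, which gives a plain named list rather than a by object. A sketch of that variant, with the data frame rebuilt from the printed output above (the name list.df2 is my own):

```r
# Data frame rebuilt from the printed by() output above
df <- data.frame(
  ID        = rep(1:3, times = 4),
  Test.Type = rep(c("Unit test 1", "Unit test 2",
                    "Unit test 1", "Unit test 2"), each = 3),
  Subject   = rep(c("English", "Maths"), each = 6),
  Marks     = c(85, 75, 78, 85, 75, 78, 78, 79, 98, 95, 98, 88)
)

# Passing a list of factors splits on their interaction;
# drop = TRUE discards combinations that never occur in the data
list.df2 <- split(df, list(df$Test.Type, df$Subject), drop = TRUE)

names(list.df2)                    # names join the two levels with "."
list.df2[["Unit test 1.Maths"]]    # pick one combination by name
```

Having names on the list makes it easier to pull out a specific combination than counting positions.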
Subsetting a data.frame based on factor levels in a second data.frame
df.1[, unique(as.character(df.2$Var[df.2$Info == "X1"]))]
A C
1 0.8924861 0.7149490854
2 0.5711894 0.7200819517
3 0.7049629 0.0004052017
4 0.9188677 0.5007302717
5 0.3440664 0.9138259818
6 0.8657903 0.2724015017
7 0.7631228 0.5686033906
8 0.8388003 0.7377064163
9 0.0796059 0.6196693045
10 0.5029824 0.8717568610
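To spell out what that one-liner does: it looks up which Var entries in the second data frame have Info equal to "X1", then uses those names to select columns of the first. A minimal sketch with made-up data (the values in df.1 and df.2 here are hypothetical, and wanted is a helper name I introduced):

```r
# Hypothetical stand-ins for the two data frames
df.1 <- data.frame(A = c(0.1, 0.2), B = c(0.3, 0.4), C = c(0.5, 0.6))
df.2 <- data.frame(
  Var  = factor(c("A", "B", "C")),
  Info = factor(c("X1", "X2", "X1"))
)

# Convert to character first: indexing with a factor uses its integer
# codes, which only match column positions by coincidence
wanted <- unique(as.character(df.2$Var[df.2$Info == "X1"]))

# drop = FALSE keeps a data frame even when only one column matches
result <- df.1[, wanted, drop = FALSE]
result
```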