Split dataframe by levels of a factor and name dataframes by those levels
You can do this with the plyr package:
library(plyr)
# Returns a list of data frames, one per level of Z, named by level
dlply(df, .(Z))
Split data.frame based on levels of a factor into new data.frames
I think that split does exactly what you want.
Notice that X is a list of data frames, as seen by str:
X <- split(df, df$g)
str(X)
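The original df is not shown, so here is a minimal sketch with a hypothetical data frame (the columns x, y and g are assumptions) that shows how split names the list elements after the factor levels:

```r
# Hypothetical example data; g is the grouping factor
df <- data.frame(
  x = 1:6,
  y = c(2.5, 3.1, 4.7, 5.0, 6.2, 7.8),
  g = factor(c("A", "B", "A", "C", "B", "C"))
)

# One data frame per level of g, named by that level
X <- split(df, df$g)

names(X)     # the level names become the list names
X[["A"]]     # rows of df where g == "A"
```

Each piece can then be reached as X$A, X$B and so on, without creating separate objects.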
If you want individual objects named after the levels of g, you can assign the elements of X from split to objects of those names, though this seems like extra work when you can just index the data frames in the list that split creates.
# lapply is used here only to drop the third column, g, which is no longer needed.
Y <- lapply(seq_along(X), function(x) as.data.frame(X[[x]])[, 1:2])
# Assign the data frames in the list Y to individual objects
A <- Y[[1]]
B <- Y[[2]]
C <- Y[[3]]
D <- Y[[4]]
E <- Y[[5]]
# Or use lapply with assign to create all the objects at once
lapply(seq_along(Y), function(x) {
  assign(c("A", "B", "C", "D", "E")[x], Y[[x]], envir = .GlobalEnv)
})
Edit: Or, even better than using lapply to assign to the global environment, use list2env:
names(Y) <- c("A", "B", "C", "D", "E")
list2env(Y, envir = .GlobalEnv)
A
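If you would rather not write into the global environment, list2env can also fill a fresh environment. This is a minimal sketch with hypothetical data (the names df, pieces and group_env are assumptions, not from the answer above):

```r
# Hypothetical example data
df <- data.frame(x = 1:4, g = factor(c("A", "B", "A", "B")))

# Split only the x column by g; the list is named by the levels of g
pieces <- split(df[, "x", drop = FALSE], df$g)

# Copy each list element into a new environment under its own name
group_env <- list2env(pieces, envir = new.env())

ls(group_env)                 # names of the objects that were created
get("A", envir = group_env)   # the data frame for level "A"
```

Keeping the pieces in their own environment avoids overwriting existing objects that happen to share a level name.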
Subsetting a data.frame based on factor levels in a second data.frame
df.1[,unique(df.2$Var[which(df.2$Info=="X1")])]
A C
1 0.8924861 0.7149490854
2 0.5711894 0.7200819517
3 0.7049629 0.0004052017
4 0.9188677 0.5007302717
5 0.3440664 0.9138259818
6 0.8657903 0.2724015017
7 0.7631228 0.5686033906
8 0.8388003 0.7377064163
9 0.0796059 0.6196693045
10 0.5029824 0.8717568610
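The data frames behind that output are not shown, so the sketch below rebuilds the idea with small hypothetical versions of df.1 and df.2. It assumes df.2$Var holds column names of df.1; the as.character call is a defensive addition, because indexing with a factor uses its integer codes rather than its labels:

```r
# Hypothetical stand-ins for df.1 and df.2
set.seed(1)
df.1 <- data.frame(A = runif(3), B = runif(3), C = runif(3))
df.2 <- data.frame(Var  = c("A", "B", "C"),
                   Info = c("X1", "X2", "X1"),
                   stringsAsFactors = FALSE)

# Column names whose Info is "X1"; as.character guards against factor indexing
wanted <- unique(as.character(df.2$Var[df.2$Info == "X1"]))

# drop = FALSE keeps a data frame even if only one column matches
res <- df.1[, wanted, drop = FALSE]
names(res)
```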
Splitting data frame according to (dichotomous) values in a column
# Using data frames
DF1 <- OriginalDF[OriginalDF$SEX == 0, ]
DF2 <- OriginalDF[OriginalDF$SEX == 1, ]
# If the data is very large, consider using data.table
library(data.table)
OriginalDT <- data.table(OriginalDF)
DT1 <- OriginalDT[SEX == 0]
DT2 <- OriginalDT[SEX == 1]
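As an alternative sketch, split produces both pieces in one call. The data below is hypothetical (the answer does not show OriginalDF). One difference worth knowing: split drops rows where SEX is NA, whereas logical indexing with == returns all-NA rows for them.

```r
# Hypothetical example data with a 0/1 coded column
OriginalDF <- data.frame(ID = 1:5, SEX = c(0, 1, 1, 0, 1))

# The list names are the values of SEX converted to character
by_sex <- split(OriginalDF, OriginalDF$SEX)

DF1 <- by_sex[["0"]]   # rows where SEX == 0
DF2 <- by_sex[["1"]]   # rows where SEX == 1
```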
Split/subset a data frame by factors in one column
We could use split:
mylist <- split(df, df$State)
mylist
$AL
ID Rate State
1 1 24 AL
4 4 34 AL
$FL
ID Rate State
3 3 46 FL
6 6 99 FL
$MN
ID Rate State
2 2 35 MN
5 5 78 MN
To access elements by number:
mylist[[1]]
or by name:
mylist$AL
ID Rate State
1 1 24 AL
4 4 34 AL
?split
Description
split divides the data in the vector x into the groups defined by f.
The replacement forms replace values corresponding to such a division.
unsplit reverses the effect of split.
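To illustrate the last line of that description, here is a small sketch of unsplit. The data frame is rebuilt from the output printed above; stringsAsFactors = FALSE is an assumption made so the example behaves the same across R versions:

```r
# Data reconstructed from the printed split output above
df <- data.frame(ID    = 1:6,
                 Rate  = c(24, 35, 46, 34, 78, 99),
                 State = c("AL", "MN", "FL", "AL", "MN", "FL"),
                 stringsAsFactors = FALSE)

mylist <- split(df, df$State)

# unsplit needs the same grouping vector to put rows back in their original order
restored <- unsplit(mylist, df$State)

all(restored$ID == df$ID)   # rows are back in the original order
```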
Split a dataframe on a factor and apply a function
You can replace
sdat <- with(dat, split(dat, strat.var))
with
sdat <- split(dat, dat[strat.var])
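in myFun.
The previous code was not splitting the data as intended. Inside with, strat.var is just the string holding the column name, so split grouped every row under that single string instead of grouping by the column. As a result you were getting the sum for the whole data, i.e.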
sum(with(warpbreaks, tapply(breaks, tension, FUN=mean)))
#[1] 84.44444
Using the corrected myFun
myFun(warpbreaks, strat.var='wool', PSU='tension', var1='breaks')
#$N.h
#[1] 2
#$out
# stratum ns mns
#A A 3 93.1111111111111
#B B 3 75.7777777777778
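The body of myFun is not shown here, so the following is only a minimal sketch of the split-then-apply step, not the original function. It uses the built-in warpbreaks data and a hypothetical variable strat.var to contrast the two ways of splitting:

```r
strat.var <- "wool"   # hypothetical: the column name passed in as a string

# Splitting by the string itself puts every row in a single group
wrong <- split(warpbreaks, strat.var)
length(wrong)

# Splitting by the column that the string names gives one group per level
sdat <- split(warpbreaks, warpbreaks[strat.var])
names(sdat)

# Within each wool type, sum the mean number of breaks per tension level
mns <- sapply(sdat, function(d) sum(tapply(d$breaks, d$tension, mean)))
mns
```

The values in mns should correspond to the mns column in the output above.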
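You could also write the function with dplyr (treat the version below as a starting point to fine-tune). Note that the underscore verbs such as mutate_ and group_by_, and the lazyeval approach they rely on, are deprecated in dplyr 0.7.0 and later.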
library(lazyeval)
library(dplyr)
myFun2 <- function(dat, strat.var, PSU, var1) {
  dat %>%
    mutate_(N.h = interp(~n_distinct(var),
                         var = as.name(strat.var))) %>%
    group_by_(.dots = strat.var) %>%
    mutate_(ns = interp(~n_distinct(var), var = as.name(PSU))) %>%
    group_by_(.dots = PSU, add = TRUE) %>%
    mutate_(mns = interp(~mean(var), var = as.name(var1))) %>%
    select_(.dots = list(strat.var, 'ns', 'N.h', 'mns')) %>%
    unique() %>%
    group_by_(.dots = strat.var, 'ns', 'N.h') %>%
    summarise(mns = sum(mns))
}
myFun2(warpbreaks, 'wool', 'tension', 'breaks')
#Source: local data frame [2 x 4]
#Groups: ns, N.h
# ns N.h wool mns
#1 3 2 A 93.11111
#2 3 2 B 75.77778
Split factor levels
We can use separate_rows from tidyr, then use mutate to convert language back to a factor. The resulting language column would be a factor with a level for each individual language:
library(dplyr)
library(tidyr)
df = df %>%
separate_rows(language) %>%
mutate(language = factor(language))
Result:
respondent language
1 1 English
2 2 English
3 3 French
4 4 French
5 4 German
6 5 German
7 6 German
> df$language
[1] English English French French German German German
Levels: English French German
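For comparison, the same reshaping can be sketched in base R with strsplit. The input below is hypothetical: the original df is not shown, so the comma-and-space separator for respondent 4 is an assumption, and the result name long is made up for illustration:

```r
# Hypothetical input; respondent 4 lists two languages in one string
df <- data.frame(respondent = 1:6,
                 language   = c("English", "English", "French",
                                "French, German", "German", "German"),
                 stringsAsFactors = FALSE)

# Split each entry on a comma followed by optional whitespace
parts <- strsplit(df$language, ",\\s*")

# Repeat each respondent once per language, then rebuild the factor
long <- data.frame(
  respondent = rep(df$respondent, lengths(parts)),
  language   = factor(unlist(parts))
)

levels(long$language)
```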