Split Dataframe by Levels of a Factor and Name Dataframes by Those Levels

Split dataframe by levels of a factor and name dataframes by those levels

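You can do this with dlply from the plyr package, which splits the data frame by the factor Z and returns a list with one data frame per level:

library(plyr)
dlply(df, .(Z))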

Split data.frame based on levels of a factor into new data.frames

I think that split does exactly what you want.

Notice that X is a list of data frames, as seen by str:

X <- split(df, df$g)
str(X)
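For a concrete picture, here is a minimal sketch with a made-up data frame (the columns x and y and all values are assumptions; only the grouping column g mirrors the answer above). It shows that split returns a list named after the levels of g:

```r
# Hypothetical example data; only the column name g comes from the answer above
df <- data.frame(
  x = 1:6,
  y = c(2.5, 3.1, 4.7, 1.2, 9.9, 6.3),
  g = factor(c("A", "B", "C", "A", "B", "C"))
)

# One data frame per level of g, collected in a named list
X <- split(df, df$g)

names(X)   # the list is named after the levels of g
X$B        # the rows where g == "B"
```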

If you want individual objects named after the levels of g, you can assign the elements of X to objects of those names, though this is extra work when you can simply index the data frames in the list that split creates.

# lapply is used here only to drop the third column, g, which is no longer needed.
Y <- lapply(X, function(d) d[, 1:2])

#Assign the dataframes in the list Y to individual objects
A <- Y[[1]]
B <- Y[[2]]
C <- Y[[3]]
D <- Y[[4]]
E <- Y[[5]]

# Or use lapply with assign to create all the objects at once
lapply(seq_along(Y), function(x) {
  assign(c("A", "B", "C", "D", "E")[x], Y[[x]], envir = .GlobalEnv)
})

Edit: Even better than using lapply to assign into the global environment, use list2env:

names(Y) <- c("A", "B", "C", "D", "E")
list2env(Y, envir = .GlobalEnv)
A
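Since split already names its result by the factor levels, the renaming step can be skipped when the levels are the names you want. A minimal sketch under that assumption, with made-up data, writing into a separate environment instead of .GlobalEnv (the names pieces and group_env are hypothetical):

```r
# Hypothetical example data with a grouping factor g
df <- data.frame(
  x = 1:4,
  g = factor(c("A", "B", "A", "B"))
)

# split() names the list by the levels of g
pieces <- split(df, df$g)

# Copy the pieces into a fresh environment rather than .GlobalEnv
group_env <- list2env(pieces, envir = new.env())

ls(group_env)   # names of the objects that were created
group_env$A     # data frame for level "A"
```

Keeping the pieces in their own environment avoids overwriting objects that happen to share a name with a factor level.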

Subsetting a data.frame based on factor levels in a second data.frame

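# as.character() guards against Var being a factor, which would otherwise index by integer code
df.1[, unique(as.character(df.2$Var[which(df.2$Info == "X1")]))]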

           A            C
1 0.8924861 0.7149490854
2 0.5711894 0.7200819517
3 0.7049629 0.0004052017
4 0.9188677 0.5007302717
5 0.3440664 0.9138259818
6 0.8657903 0.2724015017
7 0.7631228 0.5686033906
8 0.8388003 0.7377064163
9 0.0796059 0.6196693045
10 0.5029824 0.8717568610
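One caveat with code of this shape: if Var is a factor, indexing with it uses the integer codes rather than the labels. A small sketch with made-up data (all values and the level order are assumptions) showing the difference and the as.character() workaround:

```r
# Hypothetical data: df.1 holds measurements, df.2 describes its columns
df.1 <- data.frame(A = 1:3, B = 4:6, C = 7:9, D = 10:12)
df.2 <- data.frame(
  Var  = factor(c("C", "D", "A"), levels = c("C", "D", "A", "B")),
  Info = c("X1", "X1", "X2")
)

wanted <- unique(df.2$Var[df.2$Info == "X1"])

# Indexing with the factor uses its integer codes (1 and 2 here)
by_code <- df.1[, wanted]

# Converting to character selects by label ("C" and "D")
by_name <- df.1[, as.character(wanted)]

names(by_code)   # "A" "B"
names(by_name)   # "C" "D"
```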

Splitting data frame according to (dichotomous) values in a column


# Using data frames
DF1 <- OriginalDF[OriginalDF$SEX == 0, ]
DF2 <- OriginalDF[OriginalDF$SEX == 1, ]

# If the data is very large, I recommend data.table
library(data.table)
OriginalDT <- data.table(OriginalDF)
DT1 <- OriginalDT[SEX == 0]
DT2 <- OriginalDT[SEX == 1]
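The same result can be had from split, which also scales to more than two values. A minimal sketch with made-up data (the ID column and the values are assumptions; SEX is coded 0/1 as above):

```r
# Hypothetical data; SEX is coded 0/1 as in the answer above
OriginalDF <- data.frame(
  ID  = 1:5,
  SEX = c(0, 1, 1, 0, 1)
)

# split() builds both subsets in one call, named "0" and "1"
by_sex <- split(OriginalDF, OriginalDF$SEX)

DF1 <- by_sex[["0"]]   # rows with SEX == 0
DF2 <- by_sex[["1"]]   # rows with SEX == 1
```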

Split/subset a data frame by factors in one column

We could use split:

mylist <- split(df, df$State)

mylist
$AL
  ID Rate State
1  1   24    AL
4  4   34    AL

$FL
  ID Rate State
3  3   46    FL
6  6   99    FL

$MN
  ID Rate State
2  2   35    MN
5  5   78    MN

To access an element by position:

mylist[[1]]

or by name:

mylist$AL
  ID Rate State
1  1   24    AL
4  4   34    AL

?split

Description

split divides the data in the vector x into the groups defined by f.
The replacement forms replace values corresponding to such a division.
unsplit reverses the effect of split.
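As a sketch of the last sentence of that description, the data frame below is rebuilt from the output shown above, split by State, and then put back together with unsplit (the name restored is made up):

```r
# Data reconstructed from the printed output above
df <- data.frame(
  ID    = 1:6,
  Rate  = c(24, 35, 46, 34, 78, 99),
  State = c("AL", "MN", "FL", "AL", "MN", "FL")
)

mylist <- split(df, df$State)

# unsplit() needs the same grouping vector that was used for split()
restored <- unsplit(mylist, df$State)

# Rows come back in their original order
restored$ID
all(restored$Rate == df$Rate)
```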

split a dataframe on a factor and apply a function

In myFun, replace

sdat <- with(dat, split(dat, strat.var))

with

sdat <- split(dat, dat[strat.var])

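The previous code did not split the data as intended. Because strat.var holds a character string, split received that string rather than the column it names, so all rows fell into a single group and you got the sum over the whole data set, i.e.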

sum(with(warpbreaks, tapply(breaks, tension, FUN=mean)))
#[1] 84.44444
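To see the difference between the two calls directly on warpbreaks, here is a short sketch (the variable names bad and good are made up for illustration):

```r
strat.var <- "wool"

# Original call: split() receives the literal string "wool", not the column,
# so every row of warpbreaks lands in a single group
bad <- with(warpbreaks, split(warpbreaks, strat.var))
length(bad)   # 1
names(bad)    # "wool"

# Corrected call: dat[strat.var] looks the column up by name
good <- split(warpbreaks, warpbreaks[strat.var])
names(good)   # "A" "B"
```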

Using the corrected myFun

myFun(warpbreaks, strat.var='wool', PSU='tension', var1='breaks')
#$N.h
#[1] 2

#$out
# stratum ns mns
#A A 3 93.1111111111111
#B B 3 75.7777777777778

You could also write the function with dplyr (the version below can be fine-tuned). Note that the underscore verbs it uses (mutate_, group_by_, select_) are deprecated as of dplyr 0.7:

library(lazyeval)
library(dplyr)
myFun2 <- function(dat, strat.var, PSU, var1) {
  dat %>%
    # number of strata
    mutate_(N.h = interp(~n_distinct(var), var = as.name(strat.var))) %>%
    group_by_(.dots = strat.var) %>%
    # number of PSUs within each stratum
    mutate_(ns = interp(~n_distinct(var), var = as.name(PSU))) %>%
    group_by_(.dots = PSU, add = TRUE) %>%
    # mean of var1 within each stratum/PSU cell
    mutate_(mns = interp(~mean(var), var = as.name(var1))) %>%
    select_(.dots = list(strat.var, 'ns', 'N.h', 'mns')) %>%
    unique() %>%
    group_by_(.dots = strat.var, 'ns', 'N.h') %>%
    summarise(mns = sum(mns))
}

myFun2(warpbreaks, 'wool', 'tension', 'breaks')
#Source: local data frame [2 x 4]
#Groups: ns, N.h

# ns N.h wool mns
#1 3 2 A 93.11111
#2 3 2 B 75.77778
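The underscore verbs and lazyeval used above have since been deprecated in dplyr. Below is a rough sketch of the same computation with current verbs, assuming dplyr 1.0 or later (for across() and the .groups argument). The names myFun3 and cell_mean are made up, and unlike myFun2 this version does not collapse duplicate cell means with unique():

```r
library(dplyr)

# myFun3 is a hypothetical name for this sketch
myFun3 <- function(dat, strat.var, PSU, var1) {
  dat %>%
    # mean of var1 within each stratum/PSU cell
    group_by(across(all_of(c(strat.var, PSU)))) %>%
    summarise(cell_mean = mean(.data[[var1]]), .groups = "drop") %>%
    # per stratum: number of PSUs and the sum of the cell means
    group_by(across(all_of(strat.var))) %>%
    summarise(ns = n(), mns = sum(cell_mean), .groups = "drop") %>%
    # number of strata (one row per stratum at this point)
    mutate(N.h = n())
}

res <- myFun3(warpbreaks, "wool", "tension", "breaks")
res
```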

Split factor levels

We can use separate_rows from tidyr then use mutate to convert language back to a factor. The resulting language column would be a factor with a level for each individual language:

library(dplyr)
library(tidyr)

df <- df %>%
  separate_rows(language) %>%
  mutate(language = factor(language))

Result:

  respondent language
1          1  English
2          2  English
3          3   French
4          4   French
5          4   German
6          5   German
7          6   German

> df$language
[1] English English French French German German German
Levels: English French German
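For comparison, the same reshaping can be sketched in base R with strsplit. The input below is an assumption: the original data is not shown, so a comma-separated language column is made up to reproduce the result above (the names parts and long are hypothetical too):

```r
# Hypothetical input; the comma separator is an assumption
df <- data.frame(
  respondent = 1:6,
  language   = c("English", "English", "French",
                 "French, German", "German", "German"),
  stringsAsFactors = FALSE
)

# Split each entry into its individual languages
parts <- strsplit(df$language, ",\\s*")

# Repeat each respondent once per language and rebuild the factor
long <- data.frame(
  respondent = rep(df$respondent, lengths(parts)),
  language   = factor(unlist(parts))
)

levels(long$language)
```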

