Split Dataframe by Levels of a Factor and Name Dataframes by Those Levels

Split dataframe by levels of a factor and name dataframes by those levels

You can do it with the plyr package:

library(plyr)
dlply(df, .(Z))
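For comparison, here is a minimal base R sketch of the same idea that avoids the dependency (plyr is retired). split() returns a list with one data frame per level of Z, named by those levels. The toy df is made up for illustration.

```r
# Hypothetical toy data: Z is the grouping factor
df <- data.frame(
  value = c(2.5, 1.0, 3.7, 4.2, 0.9),
  Z     = factor(c("a", "b", "a", "c", "b"))
)

# One data frame per level of Z; the list names are the levels
pieces <- split(df, df$Z)

names(pieces)         # "a" "b" "c"
sapply(pieces, nrow)  # rows per group
pieces$a              # the data frame for level "a"
```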

Split data.frame based on levels of a factor into new data.frames

I think that split does exactly what you want.

Note that X is a list of data frames, as str() shows:

X <- split(df, df$g)
str(X)
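One thing to watch for: if g is a factor with levels that never occur in the data, split() still returns an empty data frame for each of them. Passing drop = TRUE removes those. Here is a small sketch with made-up data:

```r
# Hypothetical data: level "C" is declared but never used
df <- data.frame(
  x = 1:4,
  g = factor(c("A", "B", "A", "B"), levels = c("A", "B", "C"))
)

X_all  <- split(df, df$g)               # includes an empty data frame for "C"
X_used <- split(df, df$g, drop = TRUE)  # only levels that actually occur

names(X_all)    # "A" "B" "C"
nrow(X_all$C)   # 0
names(X_used)   # "A" "B"
```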

If you want individual objects named after the groups in g, you can assign the elements of the list returned by split to objects with those names. This seems like extra work, though, when you can simply index the data frames in that list.

# Use lapply just to drop the third column, g, which is no longer needed
# (each element of X is already a data frame, so as.data.frame is not needed)
Y <- lapply(X, function(d) d[, 1:2])

# Assign the data frames in the list Y to individual objects
A <- Y[[1]]
B <- Y[[2]]
C <- Y[[3]]
D <- Y[[4]]
E <- Y[[5]]

# Or use lapply with assign to create all the objects at once
# (invisible() stops the list of results from being printed)
invisible(lapply(seq_along(Y), function(x) {
  assign(c("A", "B", "C", "D", "E")[x], Y[[x]], envir = .GlobalEnv)
}))

Edit: even better than using lapply to assign into the global environment, use list2env:

names(Y) <- c("A", "B", "C", "D", "E")
list2env(Y, envir = .GlobalEnv)
A
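Here is a variation, sketched on made-up data. It drops g by name instead of by position, so the code does not depend on column order. Instead of writing into .GlobalEnv, it puts the data frames into a dedicated environment. The environment name groups is just an illustrative choice.

```r
# Hypothetical data with grouping column g
df <- data.frame(
  x = c(1.2, 3.4, 5.6, 7.8, 9.0, 2.1),
  y = c("p", "q", "r", "s", "t", "u"),
  g = c("A", "A", "B", "B", "C", "C"),
  stringsAsFactors = FALSE
)

# Split by g, then drop the g column by name
X <- split(df, df$g)
Y <- lapply(X, function(d) d[setdiff(names(d), "g")])

# Create one object per group inside a separate environment
groups <- new.env()
list2env(Y, envir = groups)

ls(groups)                # "A" "B" "C"
groups$A                  # access with $
get("B", envir = groups)  # or with get()
```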

Subsetting a data.frame based on factor levels in a second data.frame

Select the columns of df.1 whose names appear in df.2$Var on the rows where Info is "X1":

df.1[, unique(df.2$Var[which(df.2$Info == "X1")])]

           A            C
1  0.8924861 0.7149490854
2  0.5711894 0.7200819517
3  0.7049629 0.0004052017
4  0.9188677 0.5007302717
5  0.3440664 0.9138259818
6  0.8657903 0.2724015017
7  0.7631228 0.5686033906
8  0.8388003 0.7377064163
9  0.0796059 0.6196693045
10 0.5029824 0.8717568610
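Here is a self-contained sketch of the same subsetting, using hypothetical stand-ins for df.1 and df.2 because the real data are not shown. It adds two safeguards. The first is as.character() on Var: indexing by a factor uses its integer codes rather than its labels. The second is drop = FALSE, so that a single matching column still comes back as a data frame.

```r
# Hypothetical stand-ins for df.1 and df.2
set.seed(42)
df.1 <- data.frame(A = runif(5), B = runif(5), C = runif(5))
df.2 <- data.frame(
  Var  = factor(c("A", "B", "C", "A")),
  Info = c("X1", "X2", "X1", "X1")
)

# Column names tagged "X1", converted from factor labels to character
keep <- unique(as.character(df.2$Var[which(df.2$Info == "X1")]))

res <- df.1[, keep, drop = FALSE]
names(res)  # "A" "C"

# With one column, drop = FALSE keeps a data frame instead of a vector
only_b <- df.1[, "B", drop = FALSE]
```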

Splitting data frame according to (dichotomous) values in a column

# Using data frames
DF1 <- OriginalDF[OriginalDF$SEX == 0, ]
DF2 <- OriginalDF[OriginalDF$SEX == 1, ]

# If the data are very large, I recommend data.table
library(data.table)
OriginalDT <- data.table(OriginalDF)
DT1 <- OriginalDT[SEX == 0]
DT2 <- OriginalDT[SEX == 1]
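A caveat for the base R version, sketched on made-up data: if SEX contains NA, logical indexing such as OriginalDF[OriginalDF$SEX == 0, ] returns an extra all-NA row for each missing value. Wrapping the condition in which() avoids this. So does using split(), which also returns both groups at once.

```r
# Hypothetical data: one SEX value is missing
OriginalDF <- data.frame(
  id  = 1:5,
  SEX = c(0, 1, NA, 0, 1)
)

# Logical indexing adds an all-NA row where SEX is NA
nrow(OriginalDF[OriginalDF$SEX == 0, ])   # 3 (2 real rows + 1 NA row)

# which() drops the NA comparisons
DF1 <- OriginalDF[which(OriginalDF$SEX == 0), ]
DF2 <- OriginalDF[which(OriginalDF$SEX == 1), ]

# split() ignores NA group values and returns both subsets in a list
parts <- split(OriginalDF, OriginalDF$SEX)
names(parts)  # "0" "1"
```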

Split/subset a data frame by factors in one column

We could use split:

mylist <- split(df, df$State)

mylist
$AL
  ID Rate State
1  1   24    AL
4  4   34    AL

$FL
  ID Rate State
3  3   46    FL
6  6   99    FL

$MN
  ID Rate State
2  2   35    MN
5  5   78    MN

To access elements by number:

mylist[[1]]

or by name:

mylist$AL
  ID Rate State
1  1   24    AL
4  4   34    AL

?split

Description

split divides the data in the vector x into the groups defined by f.
The replacement forms replace values corresponding to such a division.
unsplit reverses the effect of split.
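To try this end to end, here is df reconstructed from the printed output above (its original definition isn't shown). The sketch then does a round trip through split() and unsplit():

```r
# df reconstructed from the printed split output
df <- data.frame(
  ID    = 1:6,
  Rate  = c(24, 35, 46, 34, 78, 99),
  State = c("AL", "MN", "FL", "AL", "MN", "FL"),
  stringsAsFactors = FALSE
)

mylist <- split(df, df$State)
names(mylist)     # "AL" "FL" "MN"
mylist[["FL"]]    # by name, when the name is held in a string

# unsplit() puts the rows back in their original order
restored <- unsplit(mylist, df$State)
```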

Split a dataframe on a factor and apply a function

You can replace

sdat <- with(dat, split(dat, strat.var))

with

sdat <- split(dat, dat[strat.var])

in myFun.

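The previous code did not split as intended. Inside with(), strat.var is not a column of dat, so R uses the string itself (e.g. 'wool'), and split() puts every row into a single group. As a result you were getting the sum for the whole data, i.e.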

sum(with(warpbreaks, tapply(breaks, tension, FUN=mean)))
#[1] 84.44444

Using the corrected myFun:

myFun(warpbreaks, strat.var='wool', PSU='tension', var1='breaks')
#$N.h
#[1] 2

#$out
#  stratum ns              mns
#A       A  3 93.1111111111111
#B       B  3 75.7777777777778
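To see why the with() version fails, here is a short sketch on the built-in warpbreaks data. As in the call above, it assumes that strat.var holds the column name as a string:

```r
strat.var <- "wool"

# Inside with(), strat.var is not a column of warpbreaks, so R finds the
# string "wool" and split() puts all 54 rows into one group named "wool"
bad <- with(warpbreaks, split(warpbreaks, strat.var))
names(bad)  # "wool"

# Indexing the data by the column name splits by the values of wool
good <- split(warpbreaks, warpbreaks[[strat.var]])
names(good)  # "A" "B"

# Per stratum: sum over tension (the PSU) of the mean breaks
stratum_sums <- sapply(good, function(d) sum(tapply(d$breaks, d$tension, mean)))
stratum_sums  # A: 93.11111, B: 75.77778 (matches myFun above)
```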

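You could also write the function with dplyr (the version below can be fine-tuned). Note that the underscore verbs it uses (mutate_(), group_by_(), select_()) are deprecated in current dplyr.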

library(lazyeval)
library(dplyr)
myFun2 <- function(dat, strat.var, PSU, var1) {
  dat %>%
    mutate_(N.h = interp(~n_distinct(var),
                         var = as.name(strat.var))) %>%
    group_by_(.dots = strat.var) %>%
    mutate_(ns = interp(~n_distinct(var), var = as.name(PSU))) %>%
    group_by_(.dots = PSU, add = TRUE) %>%
    mutate_(mns = interp(~mean(var), var = as.name(var1))) %>%
    select_(.dots = list(strat.var, 'ns', 'N.h', 'mns')) %>%
    unique() %>%
    group_by_(.dots = strat.var, 'ns', 'N.h') %>%
    summarise(mns = sum(mns))
}

myFun2(warpbreaks, 'wool', 'tension', 'breaks')
#Source: local data frame [2 x 4]
#Groups: ns, N.h

#  ns N.h wool      mns
#1  3   2    A 93.11111
#2  3   2    B 75.77778
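As an alternative that avoids the deprecated verbs, here is a base R sketch of the same computation using aggregate(). It is not the original answer's code. The function name myFunBase and the output columns, chosen to mirror ns, N.h and mns, are assumptions.

```r
# Hypothetical base R counterpart of myFun2
myFunBase <- function(dat, strat.var, PSU, var1) {
  # Mean of var1 in each stratum x PSU cell
  cell <- aggregate(dat[var1], by = dat[c(strat.var, PSU)], FUN = mean)

  # Within each stratum: sum of the cell means and number of PSUs
  mns <- aggregate(cell[var1], by = cell[strat.var], FUN = sum)
  ns  <- aggregate(cell[PSU],  by = cell[strat.var], FUN = length)

  # Both aggregates group by strat.var in the same order, so rows line up
  data.frame(
    stratum = mns[[strat.var]],
    ns      = ns[[PSU]],
    N.h     = length(unique(dat[[strat.var]])),
    mns     = mns[[var1]]
  )
}

out <- myFunBase(warpbreaks, "wool", "tension", "breaks")
out
```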

Split factor levels

We can use separate_rows() from tidyr to put each language on its own row, then use mutate() to convert language back to a factor. The resulting language column is a factor with one level per individual language:

library(dplyr)
library(tidyr)

df <- df %>%
  separate_rows(language) %>%
  mutate(language = factor(language))

Result:

  respondent language
1          1  English
2          2  English
3          3   French
4          4   French
5          4   German
6          5   German
7          6   German

> df$language
[1] English English French  French  German  German  German 
Levels: English French German
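If you would rather not depend on tidyr, a base R sketch with strsplit() does the same reshaping. The input df is hypothetical because the original isn't shown: here respondent 4's cell holds "French, German". The pattern splits on runs of non-alphanumeric characters, similar in spirit to separate_rows()'s default separator.

```r
# Hypothetical input: respondent 4 listed two languages in one cell
df <- data.frame(
  respondent = 1:6,
  language   = c("English", "English", "French", "French, German",
                 "German", "German"),
  stringsAsFactors = FALSE
)

# Split each cell on runs of non-alphanumeric characters
langs <- strsplit(df$language, "[^[:alnum:]]+")

# Repeat each respondent once per language found in their cell
long <- data.frame(
  respondent = rep(df$respondent, lengths(langs)),
  language   = factor(unlist(langs))
)

levels(long$language)  # "English" "French" "German"
```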
