How to Complete Missing Factor Levels in a Data Frame

How to complete missing data in R

You can expand the data over the factor levels directly inside complete():

tidyr::complete(x, Name = factor(Name, levels = c('John', 'Dora')), 
                   fill = list(Age = 0))
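For a self-contained illustration, here is a minimal sketch of the same call. The data frame x and its values are made up for the example; only the complete() call itself comes from the answer above.

```r
library(tidyr)

# Hypothetical input: 'Dora' is a valid name but has no row yet.
x <- data.frame(Name = "John", Age = 30, stringsAsFactors = FALSE)

# Re-declaring Name as a factor with both levels makes complete()
# emit one row per level; the new row gets Age = 0 instead of NA.
res <- complete(x, Name = factor(Name, levels = c("John", "Dora")),
                fill = list(Age = 0))
res
```

The levels vector is what defines "complete" here, so any name listed there but absent from the data gets a new row.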

Insert missing rows by factor level

Use expand.grid to make a master list and then merge:

alllevs <- do.call(expand.grid, lapply(dat[c("Type","Category")], levels))
merge(dat, alllevs, all.y=TRUE)

#  Category Type Number Count
#1        X    A      1    10
#2        X    B      2    14
#3        Y    A     NA    NA
#4        Y    B      3     3
#5        Z    A      4    14
#6        Z    B     NA    NA
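To try this end to end, the sketch below rebuilds an input data frame that is consistent with the output shown above. The original dat is not given in the answer, so the values here are an assumption inferred from the non-NA rows.

```r
# Assumed input, reconstructed from the non-NA rows of the output above.
dat <- data.frame(
  Category = factor(c("X", "X", "Y", "Z")),
  Type     = factor(c("A", "B", "B", "A")),
  Number   = 1:4,
  Count    = c(10, 14, 3, 14)
)

# Every Type x Category combination, built from the factor levels.
alllevs <- do.call(expand.grid, lapply(dat[c("Type", "Category")], levels))

# all.y = TRUE keeps combinations that have no match in dat (filled with NA).
res <- merge(dat, alllevs, all.y = TRUE)
res
```

Because merge() joins on the shared column names (Category and Type), the two missing combinations appear as rows with NA in Number and Count.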

How to compare two R data frames to find missing factor levels?

Just take the set difference between the levels of the two factors.

F1 = factor(c('A', 'B', 'C'))
F2 = factor(c('B', 'C'))

setdiff(levels(F1), levels(F2))
# [1] "A"
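Applied to columns of two data frames, the same idea might look like the sketch below. The data frames df1 and df2 and the column name grp are hypothetical. Note that levels() reports declared levels, which can include levels that no longer occur in the data, so the second comparison uses the observed values instead.

```r
# Hypothetical data frames sharing a factor column 'grp'.
df1 <- data.frame(grp = factor(c("A", "B", "C")))
df2 <- data.frame(grp = factor(c("B", "C")))

# Levels declared in df1 but not in df2.
missing_levels <- setdiff(levels(df1$grp), levels(df2$grp))

# Values actually observed in df1 but not in df2 (ignores unused levels).
missing_values <- setdiff(unique(as.character(df1$grp)),
                          unique(as.character(df2$grp)))
missing_levels
missing_values
```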

Complete dataframe with missing combinations of values

You can use the tidyr::complete function:

complete(df, distance, years = full_seq(years, period = 1), fill = list(area = 0))

# A tibble: 14 x 3
   distance years  area
   <fct>    <dbl> <dbl>
 1 100         1.   40.
 2 100         2.    0.
 3 100         3.    0.
 4 100         4.    0.
 5 100         5.   50.
 6 100         6.   60.
 7 100         7.    0.
 8 NPR         1.    0.
 9 NPR         2.    0.
10 NPR         3.   10.
11 NPR         4.   20.
12 NPR         5.    0.
13 NPR         6.    0.
14 NPR         7.   30.

or slightly shorter:

complete(df, distance, years = 1:7, fill = list(area = 0))
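The input df is not shown in the answer. The sketch below assumes one that matches the non-zero rows of the output above, so the call can be run as is.

```r
library(tidyr)

# Assumed input, reconstructed from the non-zero rows of the output above.
df <- data.frame(
  distance = factor(c("100", "100", "100", "NPR", "NPR", "NPR")),
  years    = c(1, 5, 6, 3, 4, 7),
  area     = c(40, 50, 60, 10, 20, 30)
)

# full_seq() spans min(years)..max(years) in steps of 1, so every
# distance gets a row for each year from 1 to 7; new rows get area = 0.
res <- complete(df, distance, years = full_seq(years, period = 1),
                fill = list(area = 0))
res
```

full_seq() derives the range from the data, while years = 1:7 hard-codes it; the former adapts if the observed range changes.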

For loops? Including rows in a dataframe by the missing values of factor levels

You can use tidyr for this.

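First use tidyr::complete to add a row for every level of LengthClass within each ID, specifying that Count should be filled in as 0 for the new rows.

Then sort the data so that the original rows come first within each ID (the new rows have NA in Day, and NA sorts last) and use tidyr::fill to carry their values down into the other columns (everything except ID, LengthClass, and Count).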

Create Data

library(tidyr)
library(dplyr)


df <- readr::read_csv(
'ID,Day,Month,Year,Depth,Haul_number,Count,LengthClass
H111200840,11,1,2008,-80,40,4,10-20
H111200840,11,1,2008,-80,40,15,20-30
H29320105,29,3,2010,-40,5,3,50-60
H29320105,29,3,2010,-40,5,8,60-70') %>% 
  mutate(LengthClass = as.factor(LengthClass))

df
#> # A tibble: 4 x 8
#>           ID   Day Month  Year Depth Haul_number Count LengthClass
#>        <chr> <int> <int> <int> <int>       <int> <int>      <fctr>
#> 1 H111200840    11     1  2008   -80          40     4       10-20
#> 2 H111200840    11     1  2008   -80          40    15       20-30
#> 3  H29320105    29     3  2010   -40           5     3       50-60
#> 4  H29320105    29     3  2010   -40           5     8       60-70

Fill in the extra rows

df %>% 
  group_by(ID) %>% 
  complete(LengthClass, fill = list(Count = 0)) %>% 
  arrange(ID, Day) %>% 
  fill(-ID, -LengthClass, -Count, .direction = "down") %>% 
  ungroup()

#> # A tibble: 8 x 8
#>           ID LengthClass   Day Month  Year Depth Haul_number Count
#>        <chr>      <fctr> <int> <int> <int> <int>       <int> <dbl>
#> 1 H111200840       10-20    11     1  2008   -80          40     4
#> 2 H111200840       20-30    11     1  2008   -80          40    15
#> 3 H111200840       50-60    11     1  2008   -80          40     0
#> 4 H111200840       60-70    11     1  2008   -80          40     0
#> 5  H29320105       50-60    29     3  2010   -40           5     3
#> 6  H29320105       60-70    29     3  2010   -40           5     8
#> 7  H29320105       10-20    29     3  2010   -40           5     0
#> 8  H29320105       20-30    29     3  2010   -40           5     0

Fill in missing years in a data frame

We can use complete on the 'counts' data:

library(tidyr)
complete(counts, year = 1990:1999, fill = list(freq = 0))
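The 'counts' data is not shown in the answer, so here is a minimal sketch with a made-up version of it (the column names year and freq are taken from the call above; the values are hypothetical).

```r
library(tidyr)

# Hypothetical counts: only three of the ten years are present.
counts <- data.frame(year = c(1990L, 1992L, 1995L), freq = c(3, 7, 2))

# One row per year from 1990 to 1999; years not in the data get freq = 0.
res <- complete(counts, year = 1990:1999, fill = list(freq = 0))
res
```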

Drop unused factor levels in a subsetted data frame

All you should have to do is to apply factor() to your variable again after subsetting:

> subdf$letters
[1] a b c
Levels: a b c d e
> subdf$letters <- factor(subdf$letters)
> subdf$letters
[1] a b c
Levels: a b c

EDIT

From the example on the ?factor help page:

factor(ff)      # drops the levels that do not occur

For dropping levels from all factor columns in a dataframe, you can use:

subdf <- subset(df, numbers <= 3)
subdf[] <- lapply(subdf, function(x) if(is.factor(x)) factor(x) else x)
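As an alternative to the lapply() approach, base R also provides droplevels(), which drops unused levels from every factor column of a data frame in one call. The sketch below assumes a df with a five-level letters column and a numbers column, matching the console output earlier in this answer.

```r
# Assumed example data: five letters, numbered 1 to 5.
df <- data.frame(letters = factor(c("a", "b", "c", "d", "e")),
                 numbers = 1:5)

subdf <- subset(df, numbers <= 3)
levels_before <- levels(subdf$letters)  # subsetting keeps all five levels

# droplevels() removes unused levels from all factor columns.
subdf <- droplevels(subdf)
levels_after <- levels(subdf$letters)
levels_after
```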