How to Complete Missing Factor Levels in a Data Frame

How to complete missing data in R

You can expand to the full set of factor levels inside complete():

tidyr::complete(x, Name = factor(Name, levels = c('John', 'Dora')),
                fill = list(Age = 0))
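
As a minimal sketch of the call above, assuming a hypothetical one-row input x (the original data is not shown in the answer), the missing level 'Dora' becomes a new row and its Age is filled with 0:

```r
library(tidyr)

# Hypothetical input (an assumption, not the asker's data): only 'John' is present
x <- data.frame(Name = "John", Age = 30, stringsAsFactors = FALSE)

# Converting Name to a factor with both levels lets complete() expand to every level;
# fill supplies the value for columns that would otherwise be NA in the new rows
res <- complete(
  x,
  Name = factor(Name, levels = c("John", "Dora")),
  fill = list(Age = 0)
)
res
```

The levels vector is what drives the expansion, so any level listed there but absent from the data produces one extra row.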

Insert missing rows by factor level

Use expand.grid to make a master list and then merge:

alllevs <- do.call(expand.grid, lapply(dat[c("Type","Category")], levels))
merge(dat, alllevs, all.y=TRUE)

#   Category Type Number Count
# 1        X    A      1    10
# 2        X    B      2    14
# 3        Y    A     NA    NA
# 4        Y    B      3     3
# 5        Z    A      4    14
# 6        Z    B     NA    NA
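
To try this end to end, here is a self-contained sketch in base R. The data frame dat is reconstructed from the output shown above, so its exact contents and column order are an assumption:

```r
# dat reconstructed from the printed result (assumed, the original input is not shown)
dat <- data.frame(
  Category = factor(c("X", "X", "Y", "Z"), levels = c("X", "Y", "Z")),
  Type     = factor(c("A", "B", "B", "A"), levels = c("A", "B")),
  Number   = 1:4,
  Count    = c(10, 14, 3, 14)
)

# Master list: every combination of the levels of Type and Category
alllevs <- do.call(expand.grid, lapply(dat[c("Type", "Category")], levels))

# all.y = TRUE keeps every row of the master list; combinations with no match get NA
res <- merge(dat, alllevs, all.y = TRUE)
res
```

merge() joins on the columns the two data frames share (here Category and Type), which is why no by argument is needed.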

How to compare two R data frames to find missing factor levels?

Just take the set difference between the levels of the two factors.

F1 = factor(c('A', 'B', 'C'))
F2 = factor(c('B', 'C'))

setdiff(levels(F1), levels(F2))
[1] "A"
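
The same idea applied to columns of two data frames might look like the sketch below. The data frames df1 and df2 and the column name grp are hypothetical. Note that levels() also reports levels that are declared but unused, so the second comparison uses the observed values instead:

```r
# Hypothetical data frames sharing a factor column 'grp'
df1 <- data.frame(grp = factor(c("A", "B", "C")))
df2 <- data.frame(grp = factor(c("B", "C")))

# Levels declared in df1 but not in df2
missing_levels <- setdiff(levels(df1$grp), levels(df2$grp))

# Values that actually occur in df1 but not in df2, ignoring unused levels
missing_values <- setdiff(unique(as.character(df1$grp)),
                          unique(as.character(df2$grp)))
```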

Complete dataframe with missing combinations of values

You can use the tidyr::complete function:

complete(df, distance, years = full_seq(years, period = 1), fill = list(area = 0))

# A tibble: 14 x 3
   distance years  area
   <fct>    <dbl> <dbl>
 1 100         1.   40.
 2 100         2.    0.
 3 100         3.    0.
 4 100         4.    0.
 5 100         5.   50.
 6 100         6.   60.
 7 100         7.    0.
 8 NPR         1.    0.
 9 NPR         2.    0.
10 NPR         3.   10.
11 NPR         4.   20.
12 NPR         5.    0.
13 NPR         6.    0.
14 NPR         7.   30.

or slightly shorter:

complete(df, distance, years = 1:7, fill = list(area = 0))
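
For a runnable version, the input below is reconstructed from the non-zero rows of the output above, so treat it as an assumption about the original df. full_seq() builds the full sequence from the minimum to the maximum of years in steps of period:

```r
library(tidyr)

# Input reconstructed from the non-zero rows of the printed result (assumed)
df <- data.frame(
  distance = factor(c("100", "100", "100", "NPR", "NPR", "NPR")),
  years    = c(1, 5, 6, 3, 4, 7),
  area     = c(40, 50, 60, 10, 20, 30)
)

# Every distance is crossed with every year from min(years) to max(years);
# combinations that were absent get area = 0
res <- complete(df, distance, years = full_seq(years, period = 1),
                fill = list(area = 0))
res
```

Using full_seq() avoids hard-coding the range, whereas 1:7 is shorter when the bounds are known in advance.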

For loops? Including rows in a dataframe by the missing values of factor levels

You can use tidyr for this.

First use tidyr::complete to fill in all the combinations of LengthClass, specifying that Count should be filled in as 0.

Then sort the data and use tidyr::fill to fill in the same values for the other columns (other than ID, LengthClass, and Count).

Create Data

library(tidyr)
library(dplyr)


df <- readr::read_csv(
'ID,Day,Month,Year,Depth,Haul_number,Count,LengthClass
H111200840,11,1,2008,-80,40,4,10-20
H111200840,11,1,2008,-80,40,15,20-30
H29320105,29,3,2010,-40,5,3,50-60
H29320105,29,3,2010,-40,5,8,60-70') %>%
mutate(LengthClass = as.factor(LengthClass))

df
#> # A tibble: 4 x 8
#>           ID   Day Month  Year Depth Haul_number Count LengthClass
#>        <chr> <int> <int> <int> <int>       <int> <int>      <fctr>
#> 1 H111200840    11     1  2008   -80          40     4       10-20
#> 2 H111200840    11     1  2008   -80          40    15       20-30
#> 3  H29320105    29     3  2010   -40           5     3       50-60
#> 4  H29320105    29     3  2010   -40           5     8       60-70

Fill in the extra rows

df %>%
  group_by(ID) %>%
  complete(LengthClass, fill = list(Count = 0)) %>%
  arrange(ID, Day) %>%
  fill(-ID, -LengthClass, -Count, .direction = "down") %>%
  ungroup()

#> # A tibble: 8 x 8
#>           ID LengthClass   Day Month  Year Depth Haul_number Count
#>        <chr>      <fctr> <int> <int> <int> <int>       <int> <dbl>
#> 1 H111200840       10-20    11     1  2008   -80          40     4
#> 2 H111200840       20-30    11     1  2008   -80          40    15
#> 3 H111200840       50-60    11     1  2008   -80          40     0
#> 4 H111200840       60-70    11     1  2008   -80          40     0
#> 5  H29320105       50-60    29     3  2010   -40           5     3
#> 6  H29320105       60-70    29     3  2010   -40           5     8
#> 7  H29320105       10-20    29     3  2010   -40           5     0
#> 8  H29320105       20-30    29     3  2010   -40           5     0
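
A possible alternative, not part of the original answer, is to let tidyr::nesting() carry the per-haul columns along, which avoids the arrange() and fill() steps. This sketch rebuilds the same four rows with data.frame() instead of readr, and assumes the other columns are constant within each ID:

```r
library(tidyr)

# Same four rows as above, built without readr
df <- data.frame(
  ID          = c("H111200840", "H111200840", "H29320105", "H29320105"),
  Day         = c(11L, 11L, 29L, 29L),
  Month       = c(1L, 1L, 3L, 3L),
  Year        = c(2008L, 2008L, 2010L, 2010L),
  Depth       = c(-80L, -80L, -40L, -40L),
  Haul_number = c(40L, 40L, 5L, 5L),
  Count       = c(4, 15, 3, 8),
  LengthClass = factor(c("10-20", "20-30", "50-60", "60-70"))
)

# nesting() keeps only the ID/Day/... combinations that occur in the data,
# then each of them is crossed with every LengthClass level
res <- complete(
  df,
  nesting(ID, Day, Month, Year, Depth, Haul_number),
  LengthClass,
  fill = list(Count = 0)
)
res
```

The trade-off is that every column to be carried along has to be listed inside nesting(), whereas the fill() approach picks them up by exclusion.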

Fill in missing years in a data frame

We may use complete on the 'counts' data:

library(tidyr)
complete(counts, year = 1990:1999, fill = list(freq = 0))
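
If tidyr is not available, a base R sketch of the same idea is to merge against a data frame holding every year and then replace the resulting NA values. The contents of counts below are hypothetical, since the original data is not shown:

```r
# Hypothetical 'counts' data with gaps in the years (an assumption)
counts <- data.frame(year = c(1990L, 1992L, 1995L), freq = c(3, 1, 2))

# One row per year in the target range
all_years <- data.frame(year = 1990:1999)

# all.x = TRUE keeps every year; years absent from counts get freq = NA
res <- merge(all_years, counts, by = "year", all.x = TRUE)

# Replace the NA frequencies with 0
res$freq[is.na(res$freq)] <- 0
res
```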

Drop unused factor levels in a subsetted data frame

All you should have to do is to apply factor() to your variable again after subsetting:

> subdf$letters
[1] a b c
Levels: a b c d e
> subdf$letters <- factor(subdf$letters)
> subdf$letters
[1] a b c
Levels: a b c

EDIT

From the example on the ?factor help page:

factor(ff)      # drops the levels that do not occur

For dropping levels from all factor columns in a dataframe, you can use:

subdf <- subset(df, numbers <= 3)
subdf[] <- lapply(subdf, function(x) if(is.factor(x)) factor(x) else x)
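
Base R also provides droplevels(), which does the same for a single factor or for every factor column of a data frame. The sketch below uses a hypothetical df with the column names from the snippet above:

```r
# Hypothetical data matching the column names used above
df <- data.frame(
  letters = factor(c("a", "b", "c", "d", "e")),
  numbers = 1:5
)

subdf <- subset(df, numbers <= 3)
levels(subdf$letters)   # still includes the unused levels d and e

# droplevels() removes unused levels from all factor columns at once
subdf <- droplevels(subdf)
levels(subdf$letters)
```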

