# Drop Unused Factor Levels in a Subsetted Data Frame

## Drop unused factor levels in a subsetted data frame

All you should have to do is to apply factor() to your variable again after subsetting:

```
> subdf$letters
[1] a b c
Levels: a b c d e
> subdf$letters <- factor(subdf$letters)
> subdf$letters
[1] a b c
Levels: a b c
```
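For a version that runs on its own, here is a minimal sketch. The data frame `df` and its columns `letters` and `numbers` are made up for illustration, since the question's data is not shown here.

```r
# Hypothetical example data: one factor column with five levels
df <- data.frame(
  letters = factor(c("a", "b", "c", "d", "e")),
  numbers = 1:5
)

# Subsetting keeps all five levels, even though only three values remain
subdf <- subset(df, numbers <= 3)
levels_before <- levels(subdf$letters)

# Re-applying factor() rebuilds the levels from the values actually present
subdf$letters <- factor(subdf$letters)
levels_after <- levels(subdf$letters)

print(levels_before)
print(levels_after)
```

Comparing `levels_before` with `levels_after` shows that only the levels still in use survive the second call to `factor()`.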

EDIT

From the examples on the `factor` help page (`?factor`):

``factor(ff)      # drops the levels that do not occur``

For dropping levels from all factor columns in a dataframe, you can use:

```
subdf <- subset(df, numbers <= 3)
subdf[] <- lapply(subdf, function(x) if(is.factor(x)) factor(x) else x)
```
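As a quick check that this approach touches only factor columns, the sketch below uses a made-up data frame (`df`, `letters`, `group`, and `numbers` are illustrative names, not taken from the original question).

```r
# Hypothetical data: two factor columns and one integer column
df <- data.frame(
  letters = factor(c("a", "b", "c", "d", "e")),
  group   = factor(c("x", "x", "y", "y", "z")),
  numbers = 1:5
)

subdf <- subset(df, numbers <= 3)

# Assigning into subdf[] replaces the columns but keeps the data.frame structure
subdf[] <- lapply(subdf, function(x) if (is.factor(x)) factor(x) else x)

# Collect the remaining levels of every factor column
factor_levels <- lapply(Filter(is.factor, subdf), levels)
print(factor_levels)
```

The `subdf[] <-` form matters: a plain `subdf <- lapply(...)` would return a list rather than a data frame.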

## How can I drop unused levels from a data frame?

Since R 2.12.0, base R has a function for this:

``y <- droplevels(y)``
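To illustrate, `droplevels()` can be applied to a whole data frame, and its data frame method also accepts an `except` argument for columns whose levels should be left alone. A small sketch with made-up data (the column names `g`, `h`, and `n` are hypothetical):

```r
# Hypothetical data frame with two factor columns
y <- data.frame(
  g = factor(c("a", "b", "c")),
  h = factor(c("u", "v", "w")),
  n = 1:3
)
y <- y[y$n <= 2, ]   # both factors still carry three levels here

# Drop unused levels from every factor column
all_dropped <- droplevels(y)

# Drop unused levels everywhere except column 2 ("h")
keep_h <- droplevels(y, except = 2)

print(sapply(all_dropped[c("g", "h")], nlevels))
print(sapply(keep_h[c("g", "h")], nlevels))
```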

## Override [.data.frame to drop unused factor levels by default

I'd be really wary of changing the default behavior; you never know when another function you use depends on the usual default behavior. I'd instead write a similar function to your `subsetDrop` but for `[`, like

``sel <- function(x, ...) droplevels(x[...])``

Then

```
> d <- data.frame(a=factor(LETTERS[1:5]), b=factor(letters[1:5]))
> str(d[1:2,])
'data.frame':   2 obs. of  2 variables:
 $ a: Factor w/ 5 levels "A","B","C","D",..: 1 2
 $ b: Factor w/ 5 levels "a","b","c","d",..: 1 2
> str(sel(d,1:2,))
'data.frame':   2 obs. of  2 variables:
 $ a: Factor w/ 2 levels "A","B": 1 2
 $ b: Factor w/ 2 levels "a","b": 1 2
```

If you really want to change the default, you could do something like

```
foo <- `[.data.frame`
`[.data.frame` <- function(...) droplevels(foo(...))
```

but make sure you know how namespaces work: the override applies to anything called from the global environment, while the version in the base namespace is unchanged. That might be a good thing, but it's something you want to make sure you understand. After this change the output is as you'd like:

```
> str(d[1:2,])
'data.frame':   2 obs. of  2 variables:
 $ a: Factor w/ 2 levels "A","B": 1 2
 $ b: Factor w/ 2 levels "a","b": 1 2
```
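If you do experiment with the override, it helps to know how to undo it. The sketch below repeats the override on the same small data frame, then removes the masking copy with `rm()` so that the base method applies again. It assumes the code is run at top level (an interactive session or `Rscript`); the variable names `levels_with_override` and `levels_after_undo` are only for illustration.

```r
d <- data.frame(a = factor(LETTERS[1:5]), b = factor(letters[1:5]))

# Keep a reference to the original method, then mask it with a dropping version
foo <- `[.data.frame`
`[.data.frame` <- function(...) droplevels(foo(...))
levels_with_override <- nlevels(d[1:2, ]$a)

# Removing the masking copy makes dispatch fall back to the base method
rm("[.data.frame")
levels_after_undo <- nlevels(d[1:2, ]$a)

print(c(levels_with_override, levels_after_undo))
```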

## Dropping unused factor levels in data.table

We can use `.SDcols` to specify the columns of interest. It takes a vector of column names (of length one or more) or column indices. `.SD`, the Subset of Data.table, then contains only the columns specified in `.SDcols`. As there is only a single column here, extract it with `[[`, apply `droplevels` to the resulting vector, and assign (`:=`) it back to the column of interest. Note the parentheses around the object identifier `v1`: they make data.table evaluate the object and use its value as the column name, instead of creating a column named 'v1'.

``x[, (v1) := droplevels(.SD[[1]]), .SDcols = v1]``

Usually, the syntax would be

``x[, (v1) := lapply(.SD, droplevels), .SDcols = v1]``

This form works for one column or several. The only reason to extract with `[[` above is that we know there is a single column.

Another option is `get`:

``x[, (v1) :=  droplevels(get(v1))]``

where,

``v1 <- "y"``
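Putting the pieces together, here is a minimal sketch with made-up data. The table `x` and its columns `y`, `z`, and `n` are assumptions for illustration, and the code requires the data.table package to be installed.

```r
library(data.table)

# Hypothetical table with two factor columns
x <- data.table(
  y = factor(c("a", "b", "c", "d")),
  z = factor(c("p", "q", "r", "s")),
  n = 1:4
)
x <- x[n <= 2]   # subsetting leaves the unused levels in place

# Single column: extract it from .SD with [[ and drop its unused levels
v1 <- "y"
x[, (v1) := droplevels(.SD[[1]]), .SDcols = v1]

# Several columns: lapply over .SD and assign back to the same names
cols <- c("y", "z")
x[, (cols) := lapply(.SD, droplevels), .SDcols = cols]

print(sapply(x[, ..cols], nlevels))
```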

## Subsetting a data.frame based on factor levels in a second data.frame

`df.1[,unique(df.2$Var[which(df.2$Info=="X1")])]`

```
           A            C
1  0.8924861 0.7149490854
2  0.5711894 0.7200819517
3  0.7049629 0.0004052017
4  0.9188677 0.5007302717
5  0.3440664 0.9138259818
6  0.8657903 0.2724015017
7  0.7631228 0.5686033906
8  0.8388003 0.7377064163
9  0.0796059 0.6196693045
10 0.5029824 0.8717568610
```
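The one-liner takes the values of `df.2$Var` in the rows where `Info` equals `"X1"` and uses them as column names of `df.1`. A sketch with made-up data follows; the contents of `df.1` and `df.2` are assumptions, and only the names `Var` and `Info` come from the answer. One caveat: if `Var` is a factor, `[` indexes by its integer codes rather than its labels, so wrapping the selection in `as.character()` is the safer form.

```r
set.seed(1)

# Hypothetical data: df.1 holds the values, df.2 describes its columns
df.1 <- data.frame(A = runif(3), B = runif(3), C = runif(3))
df.2 <- data.frame(
  Var  = factor(c("A", "B", "C")),
  Info = c("X1", "X2", "X1")
)

# Convert to character so columns are matched by name, not by factor code
cols <- unique(as.character(df.2$Var[which(df.2$Info == "X1")]))
result <- df.1[, cols]

print(names(result))
```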

## Change factor levels and rearrange dataframe

This mistake is easy to make. You have to pass the column vector itself to `fct_relevel`, like so:

```
library(dplyr, warn.conflicts = F)
library(forcats)

df <- structure(
  list(layer = structure(
    1:5,
    .Label = c(
      'CEOS and managers',
      'Clerks and services',
      'Production',
      'Professionals',
      'Technicians'
    ),
    class = 'factor'
  )),
  row.names = c(NA, -5L),
  class = c('tbl_df', 'tbl', 'data.frame')
)

df %>%
  mutate(layer = forcats::fct_relevel(
    layer,
    c(
      'CEOS and managers',
      'Professionals',
      'Technicians',
      'Clerks and services',
      'Production'
    )
  )) %>%
  arrange(layer)
#> # A tibble: 5 x 1
#>   layer
#>   <fct>
#> 1 CEOS and managers
#> 2 Professionals
#> 3 Technicians
#> 4 Clerks and services
#> 5 Production
```

Created on 2021-01-11 by the reprex package (v0.3.0)
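For comparison, base R can produce the same ordering without forcats by passing the desired order to `factor()` via its `levels` argument. This is a sketch of an alternative, not part of the original answer; the data frame is rebuilt with `data.frame()` for brevity, and `wanted` is a name introduced here.

```r
# Same five labels as above, in their default (alphabetical) level order
df <- data.frame(layer = factor(c(
  "CEOS and managers", "Clerks and services", "Production",
  "Professionals", "Technicians"
)))

# Desired level order
wanted <- c(
  "CEOS and managers", "Professionals", "Technicians",
  "Clerks and services", "Production"
)

# Rebuild the factor with explicit levels, then sort rows by that order
df$layer <- factor(df$layer, levels = wanted)
df <- df[order(df$layer), , drop = FALSE]

print(df)
```

`drop = FALSE` keeps the result a data frame even though it has a single column.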