Replace Contents of Factor Column in R Dataframe

I bet the problem arises when you try to replace values with a new value that is not yet one of the factor's existing levels:

levels(iris$Species)
# [1] "setosa" "versicolor" "virginica"

Your example does not reproduce the problem. This works, because 'setosa' is already one of the levels:

iris$Species[iris$Species == 'virginica'] <- 'setosa'

This is what more likely creates the problem you were seeing with your own data:

iris$Species[iris$Species == 'virginica'] <- 'new.species'
# Warning message:
# In `[<-.factor`(`*tmp*`, iris$Species == "virginica", value = c(1L, :
# invalid factor level, NAs generated

It works if you first add the new value to the factor's levels:

levels(iris$Species) <- c(levels(iris$Species), "new.species")
iris$Species[iris$Species == 'virginica'] <- 'new.species'

If you want to replace "species A" with "species B" everywhere, you are better off renaming the level itself:

levels(iris$Species)[match("oldspecies",levels(iris$Species))] <- "newspecies"
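
As a minimal sketch of the difference between the two approaches (run on copies of iris; the name `new.species` is only illustrative): adding a level and then assigning leaves the old level behind as an unused level, whereas renaming the level changes every matching row in one step.

```r
# Work on copies so the built-in iris data stays untouched
a <- iris
b <- iris

# Approach 1: add the new level, then assign it to the matching rows
levels(a$Species) <- c(levels(a$Species), "new.species")
a$Species[a$Species == "virginica"] <- "new.species"
levels(a$Species)                    # "virginica" is still listed, now unused
a$Species <- droplevels(a$Species)   # remove the unused level

# Approach 2: rename the level itself
levels(b$Species)[levels(b$Species) == "virginica"] <- "new.species"

# Both end up with the same values and the same levels
identical(as.character(a$Species), as.character(b$Species))
identical(levels(a$Species), levels(b$Species))
```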

Replace values in column by factor level

We can loop over the columns and replace the levels using %in%

df1[] <- lapply(df1, function(x) {
  levels(x)[levels(x) %in% c("Yes!", "Yay")] <- "Yes"
  levels(x)[levels(x) %in% c("Nope", "Nah")] <- "No"
  x
})

To drop the unused levels we can use droplevels

df2 <- droplevels(df1)

But the levels assignment we did earlier already merges the renamed levels, so no unused levels are left over.

df1
#    Col1 Col2 Col3
# 1   Yes   No   No
# 2   Yes  Yes   No
# 3    No   No   No
# 4    No   No   No
# 5    No  Yes   No
# 6    No   No   No
# 7   Yes  Yes   No
# 8    No  Yes   No
# 9    No   No   No
# 10  Yes  Yes   No


str(df1)
#'data.frame': 10 obs. of 3 variables:
#$ Col1: Factor w/ 2 levels "No","Yes": 2 2 1 1 1 1 2 1 1 2
#$ Col2: Factor w/ 2 levels "No","Yes": 1 2 1 1 2 1 2 2 1 2
#$ Col3: Factor w/ 1 level "No": 1 1 1 1 1 1 1 1 1 1
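
The merging behaviour can be checked on a single small factor. This is only a sketch; the vector `x` is made up for illustration and is not part of the data above.

```r
# A toy factor with several spellings of yes/no
x <- factor(c("Yes!", "Nope", "Yay", "No", "Yes"))
nlevels(x)   # 5 distinct levels before renaming

# Assigning the same label to several levels merges them
levels(x)[levels(x) %in% c("Yes!", "Yay")] <- "Yes"
levels(x)[levels(x) %in% c("Nope", "Nah")] <- "No"

levels(x)         # "No" "Yes"
as.character(x)   # "Yes" "No" "Yes" "No" "Yes"
```

Note that "Nah" does not occur in `x`; the `%in%` test simply finds nothing to rename for it.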

data

set.seed(24)
# stringsAsFactors = TRUE is needed in R >= 4.0, where strings stay character by default
df1 <- data.frame(
  Col1 = sample(c("Yes", "Yes!", "Yay", "Nope", "Nah", "No"), 10, replace=TRUE),
  Col2 = sample(c("Yes", "Yes!", "Yay", "Nope", "Nah", "No"), 10, replace=TRUE),
  Col3 = sample(c("Nope", "Nah", "No"), 10, replace=TRUE),
  stringsAsFactors = TRUE
)

Replace a factor in a data frame column into a numeric value in R?

You can use as.integer, which returns the factor's underlying integer codes:

transform(df1, group=as.integer(group))
#   group year items
# 1     1 2000    12
# 2     2 2000    10
# 3     3 2000    15
# 4     1 2015     5
# 5     2 2015    10
# 6     3 2015     7
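
One thing worth keeping in mind: as.integer gives the position of each value in levels(), not the text of the label. A small sketch with a made-up factor `f` whose labels happen to be numbers shows the difference:

```r
# Hypothetical factor whose labels look like numbers
f <- factor(c("10", "20", "10", "30"))

as.integer(f)                 # level codes:  1 2 1 3
as.integer(as.character(f))   # label values: 10 20 10 30
```

If the labels themselves are the numbers you want, going through as.character first is the usual route.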

In response to the updated question, you can use a key and index it by the integer codes of your factor

key <- c(0, 5, 3)
transform(df1, group=key[as.integer(group)])

#   group year items
# 1     0 2000    12
# 2     5 2000    10
# 3     3 2000    15
# 4     0 2015     5
# 5     5 2015    10
# 6     3 2015     7

And as @SimonG says, you can reorder your factor however you want using factor and its levels argument.
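
A short sketch of how the levels argument changes the integer codes. The group labels "a", "b", "c" are assumptions for illustration, since the original data is not shown here.

```r
# Hypothetical group labels
group <- c("a", "b", "c", "a", "b", "c")

g_default <- factor(group)                          # levels sorted: a, b, c
g_custom  <- factor(group, levels = c("c", "a", "b"))  # levels in a chosen order

as.integer(g_default)   # 1 2 3 1 2 3
as.integer(g_custom)    # 2 3 1 2 3 1
```

The values are unchanged; only the order of the levels, and therefore the codes returned by as.integer, differs.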

How to replace factor levels in multiple columns of a data frame based on matches in a lookup data frame using R

library(tibble)

# Fake dataframe
df1 <- tibble(
  num_var = sample(200, 15),
  col1 = rep(c("onda","estrela","rato","caneta","ceu"), 3),
  col2 = rep(c("muro","gato","pa","rato","ceu"), 3),
  col3 = rep(c("surf","onda","dente","onda","sei"), 3),
  col4 = rep(c("onda","casa",NA,"nao","net"), 3))

# Lookup dictionary dataframe
lookup_dat <- tibble(
  lab_pt = c("onda","estrela","rato","caneta","ceu"),
  lab_en = c("wave","star","rat","pen","sky"))

# Translation by replacement using a lookup dictionary.
# Developed to generate Rmd reports with plot labels in different languages.
# Every level of df that matches lookup_df[[col_langu_in]] is replaced by the
# value in the same row of lookup_df[[col_langu_out]].
# !!!! col_langu_in and col_langu_out must be quoted column names
replace_level <- function(df, lookup_df, col_langu_in, col_langu_out) {
  # 1) Build a dictionary-style named vector from the reference df (2 cols)
  lookup_vec <- setNames(as.character(lookup_df[[col_langu_out]]),
                         lookup_df[[col_langu_in]])
  # 2) Iterate over the column names of the main df
  for (i in names(df)) {
    # Change character columns to factor before the translation
    if (is.character(df[[i]])) df[[i]] <- as.factor(df[[i]])
    # Skip columns that are not factors (e.g. numeric ones)
    if (!is.factor(df[[i]])) next
    # 3) Index of the levels of column i that appear in the dictionary
    index_match <- which(levels(df[[i]]) %in% names(lookup_vec))
    # 4) Replace the matching levels with their translation
    levels(df[[i]])[index_match] <-
      lookup_vec[levels(df[[i]])[index_match]]
  }
  df
}

# test here
replace_level(df1, lookup_dat, "lab_pt", "lab_en")
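
The core of the function is the named-vector lookup applied to the levels. Here is a minimal sketch of that step on a single factor, with a shortened, illustrative dictionary (`lookup_vec`, `f` and `hit` are names chosen for this example):

```r
# Dictionary: names are the labels to replace, values are the replacements
lookup_vec <- c(onda = "wave", rato = "rat", ceu = "sky")

f <- factor(c("onda", "muro", "rato", "ceu", "onda"))

# Which levels have an entry in the dictionary?
hit <- levels(f) %in% names(lookup_vec)

# Replace only those levels; "muro" has no entry and is left alone
levels(f)[hit] <- lookup_vec[levels(f)[hit]]

as.character(f)   # "wave" "muro" "rat" "sky" "wave"
```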

Need an efficient way to change factor values from one column of a data frame to another columns

We can use fct_collapse and it returns a factor with new levels

library(dplyr)
library(forcats)
library(magrittr)
df %<>%
mutate(B = fct_collapse(B, CHANGED = as.character(B)[A== "Kelly"]))

glimpse(df)
#Rows: 7
#Columns: 2
#$ A <fct> Jerry, Kelly, Kelly, Lion, Zebra, Bear, Kelly
#$ B <fct> Eats, CHANGED, CHANGED, Roars, Runs, Sleeps, CHANGED
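
For comparison, a base R sketch on made-up data (the values in `B` are assumptions, since the original input is not shown). It changes only the rows where A is "Kelly". fct_collapse, by contrast, renames whole levels, so any other row sharing one of Kelly's B values would change as well.

```r
# Hypothetical data: Jerry and the second Kelly row share the value "Eats"
df <- data.frame(
  A = factor(c("Jerry", "Kelly", "Lion", "Kelly")),
  B = factor(c("Eats", "Talks", "Roars", "Eats"))
)

# Add the new level first, then assign it row by row
levels(df$B) <- c(levels(df$B), "CHANGED")
df$B[df$A == "Kelly"] <- "CHANGED"
df$B <- droplevels(df$B)   # drop levels that are no longer used

as.character(df$B)   # "Eats" "CHANGED" "Roars" "CHANGED"
```

Here Jerry keeps "Eats" because the replacement is by row rather than by level.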

Replace NA in a factor column

1) addNA: If fac is a factor, addNA(fac) is the same factor but with NA added as a level. See ?addNA

To force the NA level to be 88:

facna <- addNA(fac)
levels(facna) <- c(levels(fac), 88)

giving:

> facna
[1] 1 2 3 3 4 88 2 4 88 3
Levels: 1 2 3 4 88

1a) This can be written in a single line as follows:

`levels<-`(addNA(fac), c(levels(fac), 88))

2) factor: It can also be done in one line using the various arguments of factor, like this:

factor(fac, levels = levels(addNA(fac)), labels = c(levels(fac), 88), exclude = NULL)

2a) or equivalently:

factor(fac, levels = c(levels(fac), NA), labels = c(levels(fac), 88), exclude = NULL)

3) ifelse: Another approach is:

factor(ifelse(is.na(fac), 88, paste(fac)), levels = c(levels(fac), 88))

4) forcats: The forcats package has a function for this, fct_explicit_na (deprecated since forcats 1.0.0 in favor of fct_na_value_to_level):

library(forcats)

fct_explicit_na(fac, "88")
## [1] 1 2 3 3 4 88 2 4 88 3
## Levels: 1 2 3 4 88
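
As a quick sanity check, the base R approaches (1), (2) and (3) can be compared on the same input. This is only a sketch; `new_levels`, `r1`, `r2` and `r3` are names introduced here, and `fac` is rebuilt with factor() so the block runs on its own.

```r
# Same values as the input used in this answer
fac <- factor(c(1, 2, 3, 3, 4, NA, 2, 4, NA, 3))
new_levels <- c(levels(fac), "88")

# (1) addNA, then relabel the NA level
r1 <- addNA(fac)
levels(r1) <- new_levels

# (2) factor() with explicit levels and labels
r2 <- factor(fac, levels = levels(addNA(fac)), labels = new_levels, exclude = NULL)

# (3) ifelse on the character values
r3 <- factor(ifelse(is.na(fac), "88", as.character(fac)), levels = new_levels)

identical(as.character(r1), as.character(r2))
identical(as.character(r1), as.character(r3))
levels(r1)   # "1" "2" "3" "4" "88"
```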

Note: We used the following for input fac

fac <- structure(c(1L, 2L, 3L, 3L, 4L, NA, 2L, 4L, NA, 3L), .Label = c("1", 
"2", "3", "4"), class = "factor")

Update: Have improved (1) and added (1a). Later added (4).

Replace values in columns of a data frame after match a pattern in another column

Seems to me Col1 is a factor. Try:

# Convert to character first, then flag the rows that do NOT match the pattern.
rows <- !grepl("BD|HD", as.character(mydf$`Col1`))
mydf$`Value1`[rows] <- 0
mydf$`Value2`[rows] <- 0
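
A self-contained sketch of the same idea on made-up data (the contents of `mydf` below are assumptions for illustration). Both value columns are zeroed in one assignment for the rows whose Col1 contains neither "BD" nor "HD".

```r
# Hypothetical data frame with a factor column to match against
mydf <- data.frame(
  Col1   = factor(c("BD_1", "XX_2", "HD_3", "YY_4")),
  Value1 = c(5, 6, 7, 8),
  Value2 = c(1, 2, 3, 4)
)

# TRUE for rows that do not match the pattern
rows <- !grepl("BD|HD", as.character(mydf$Col1))

# Zero both value columns for those rows
mydf[rows, c("Value1", "Value2")] <- 0

mydf$Value1   # 5 0 7 0
mydf$Value2   # 1 0 3 0
```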

How to replace all values in a column based on an ordered vector in r

I cannot read in the dta file for some reason, so below I simulate data to show you my suggestion. You start with your educ_vec vector.

educ_vec <- c("No formal schooling", "1st grade", 
"2nd grade", "3rd grade", "4th grade", "5th grade",
"6th grade", "7th grade", "8th grade", "9th grade",
"10th grade", "11th grade", "12th grade", "1 year of college",
"2 years of college", "3 years of college", "4 years of college",
"5 years of college", "6 years of college", "7 years of college",
"8 years of college")

If you look at educ_vec, it is already in the order you want:

# this is meant for 0
educ_vec[1]
[1] "No formal schooling"
# this is meant for 20
educ_vec[21]
[1] "8 years of college"

If your score is i, the new categorical value will be educ_vec[i+1]; so we can make use of this below:

library(dplyr)

set.seed(100)
gss_df <- data.frame(educ=sample(0:20,30,replace=TRUE))
gss_df %>%
  mutate(new=factor(educ_vec[educ+1],ordered = TRUE, levels = educ_vec))

educ new
1 9 9th grade
2 5 5th grade
3 15 3 years of college
4 18 6 years of college
5 13 1 year of college
6 11 11th grade
7 5 5th grade
8 3 3rd grade
9 5 5th grade
10 1 1st grade
11 6 6th grade
12 6 6th grade
13 10 10th grade
14 17 5 years of college
15 11 11th grade
16 2 2nd grade
17 18 6 years of college
18 7 7th grade
19 17 5 years of college
20 1 1st grade
21 18 6 years of college
22 3 3rd grade
23 3 3rd grade
24 19 7 years of college
25 15 3 years of college
26 20 8 years of college
27 6 6th grade
28 15 3 years of college
29 10 10th grade
30 19 7 years of college

And yes, it works even if some of the levels are not found in the data:

gss_df <- data.frame(educ=0:5)%>%
mutate(new=factor(educ_vec[educ+1],ordered = TRUE, levels = educ_vec))

educ new
1 0 No formal schooling
2 1 1st grade
3 2 2nd grade
4 3 3rd grade
5 4 4th grade
6 5 5th grade

You can see the new column is a factor with the intended categories.

str(gss_df)
'data.frame': 6 obs. of 2 variables:
$ educ: int 0 1 2 3 4 5
$ new : Ord.factor w/ 21 levels "No formal schooling"<..: 1 2 3 4 5 6

If you have scores outside 0-20, for example -1, -2, 21 or 22, then I suggest doing the following:

names(educ_vec) = 0:20
gss_df <- data.frame(educ=c(-1,0,20,21))
# you can also use mutate
gss_df$new <- educ_vec[match(gss_df$educ,names(educ_vec))]
gss_df

educ new
1 -1 <NA>
2 0 No formal schooling
3 20 8 years of college
4 21 <NA>

match() returns NA when it cannot find the corresponding name in educ_vec, so out-of-range scores become NA.


