Replace contents of factor column in R dataframe
I bet the problem is when you are trying to replace values with a new one, one that is not currently part of the existing factor's levels:
levels(iris$Species)
# [1] "setosa" "versicolor" "virginica"
Your example does not reproduce the problem, because 'setosa' is already one of the levels, so this works:
iris$Species[iris$Species == 'virginica'] <- 'setosa'
This is what more likely creates the problem you were seeing with your own data:
iris$Species[iris$Species == 'virginica'] <- 'new.species'
# Warning message:
# In `[<-.factor`(`*tmp*`, iris$Species == "virginica", value = c(1L, :
# invalid factor level, NAs generated
It will work if you first increase your factor levels:
levels(iris$Species) <- c(levels(iris$Species), "new.species")
iris$Species[iris$Species == 'virginica'] <- 'new.species'
If you want to replace "species A" with "species B" you'd be better off with
levels(iris$Species)[match("oldspecies",levels(iris$Species))] <- "newspecies"
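As a minimal sketch of that last line on real data (working on a copy called dat, a name chosen here only for illustration), renaming the level changes every row that holds it and generates no NAs:

```r
# Work on a copy so the built-in iris data set is left untouched.
dat <- iris

# Rename one level in place; every row holding that level changes with it.
levels(dat$Species)[match("virginica", levels(dat$Species))] <- "new.species"

levels(dat$Species)
table(dat$Species)
```

Because only the level label is rewritten, there is no need to add the new level first.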
Replace values in column by factor level
We can loop over the columns and replace the levels using %in%
df1[] <- lapply(df1, function(x) {
levels(x)[levels(x) %in% c("Yes!", "Yay")] <- "Yes"
levels(x)[levels(x) %in% c("Nope", "Nah")] <- "No"
x
})
To drop the unused levels we can use droplevels
df2 <- droplevels(df1)
But the assignment we did earlier already takes care of that: renaming levels to an existing label merges them, so no unused levels are left behind.
df1
# Col1 Col2 Col3
#1 Yes No No
#2 Yes Yes No
#3 No No No
#4 No No No
#5 No Yes No
#6 No No No
#7 Yes Yes No
#8 No Yes No
#9 No No No
#10 Yes Yes No
str(df1)
#'data.frame': 10 obs. of 3 variables:
#$ Col1: Factor w/ 2 levels "No","Yes": 2 2 1 1 1 1 2 1 1 2
#$ Col2: Factor w/ 2 levels "No","Yes": 1 2 1 1 2 1 2 2 1 2
#$ Col3: Factor w/ 1 level "No": 1 1 1 1 1 1 1 1 1 1
data
set.seed(24)
df1 <- data.frame(Col1 = sample(c("Yes", "Yes!", "Yay", "Nope", "Nah", "No"),
10, replace=TRUE),
Col2 = sample(c("Yes", "Yes!", "Yay", "Nope", "Nah", "No"), 10, replace=TRUE),
Col3 = sample(c("Nope", "Nah", "No"), 10, replace=TRUE)
)
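To illustrate the remark about droplevels, here is a small sketch on a made-up factor x (not the df1 above). It assumes only base R: assigning to levels() merges the duplicates, while subsetting is what leaves unused levels behind.

```r
# A small factor with the same kind of messy labels as df1.
x <- factor(c("Yes", "Yay", "Nope", "No", "Yes!"))

# Renaming levels to an existing label merges them.
levels(x)[levels(x) %in% c("Yes!", "Yay")] <- "Yes"
levels(x)[levels(x) %in% c("Nope", "Nah")] <- "No"
levels(x)              # only "No" and "Yes" remain

# Subsetting, by contrast, keeps levels that no longer occur ...
y <- x[x == "Yes"]
levels(y)              # still "No" and "Yes"

# ... and droplevels() removes them.
levels(droplevels(y))  # "Yes"
```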
Replace a factor in a data frame column into a numeric value in R?
You can use as.integer
transform(df1, group=as.integer(group))
# group year items
# 1 1 2000 12
# 2 2 2000 10
# 3 3 2000 15
# 4 1 2015 5
# 5 2 2015 10
# 6 3 2015 7
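One caveat, sketched with a made-up factor f rather than the df1 from the question: as.integer returns the internal level codes, not the labels. If the labels themselves look like numbers, convert through character first.

```r
# A factor whose labels happen to be numbers.
f <- factor(c("10", "20", "10", "30"))

# as.integer() gives the position of each value in levels(f).
codes <- as.integer(f)

# Going through character recovers the numbers shown in the labels.
values <- as.integer(as.character(f))

codes
values
```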
In response to the updated question, you can use a key and index by the integer of your factor
key <- c(0, 5, 3)
transform(df1, group=key[as.integer(group)])
# group year items
# 1 0 2000 12
# 2 5 2000 10
# 3 3 2000 15
# 4 0 2015 5
# 5 5 2015 10
# 6 3 2015 7
And as @SimonG says, you can reorder your factor however you want using factor and its argument levels.
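A minimal sketch of that reordering, assuming hypothetical group labels "a", "b" and "c" (the real labels in df1 are not shown here): the integer codes follow whatever order is given in levels.

```r
# Hypothetical groups; the default level order is alphabetical.
grp <- factor(c("a", "b", "c", "a", "b", "c"))
as.integer(grp)

# Rebuild the factor with an explicit level order; the codes follow it.
grp2 <- factor(grp, levels = c("c", "a", "b"))
as.integer(grp2)
```

The values themselves are unchanged; only the mapping from label to integer code moves.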
How to replace factor levels in multiples columns of a data frame based on the match lookup data frame using R
# Fake dataframe
library(tibble)
df1 <- tibble(
num_var = sample(200, 15),
col1 = rep(c("onda","estrela","rato","caneta","ceu"), 3),
col2 = rep(c("muro","gato","pa","rato","ceu"), 3),
col3 = rep(c("surf","onda","dente","onda","sei"), 3),
col4 = rep(c("onda","casa",NA,"nao","net"), 3))
# Lookup dictionary dataframe
lookup_dat <- tibble(
lab_pt = c("onda","estrela","rato","caneta","ceu"),
lab_en = c("wave","star","rat","pen","sky"))
#******************************************************************
#
# Translation by replacement of lookup dictionary
# Developed to generate Rmd report with labels of plots in different languages
replace_level <- function(df, lookup_df, col_langu_in, col_langu_out){
# Replace factor levels in df using a lookup data frame: every level that
# matches a value in lookup_df[[col_langu_in]] is replaced by the value in
# the same row of lookup_df[[col_langu_out]].
# Note: col_langu_in and col_langu_out must be passed as quoted strings.
# 1) Below it creates a dictionary style with the reference df (2cols)
lookup_vec <- setNames(as.character(lookup_df[[col_langu_out]]),
lookup_df[[col_langu_in]])
# 2) iterating over main df col names
for (i in names(df)) { # select cols?: names(df)[sapply(df, is.factor)]
# 3) return index of levels from df levels matching with those from
# the dictionary type to replace (for each cols of df i)
if(is.character(df[[i]])){df[i] <- as.factor(df[[i]])}
# Changing from character to factor before the translation
index_match <- which(levels(df[[i]]) %in%
names(lookup_vec))
# 4) replacing matchable levels based on the index on step 3).
# with the reference to translate
levels(df[[i]])[index_match] <-
lookup_vec[levels(df[[i]])[index_match]]}
return(df)}
# test here
replace_level(df1, lookup_dat, "lab_pt", "lab_en")
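The core of the function is a named-vector lookup applied to the levels. A stripped-down sketch on a single made-up factor f, using only base R, may make the two steps easier to follow:

```r
# Dictionary: names are the labels to look up, values are the replacements.
lookup_vec <- setNames(c("wave", "star"), c("onda", "estrela"))

f <- factor(c("onda", "muro", "estrela", "onda"))

# Positions of the levels that have an entry in the dictionary.
idx <- which(levels(f) %in% names(lookup_vec))

# Replace only those levels; "muro" has no translation and is kept as-is.
levels(f)[idx] <- unname(lookup_vec[levels(f)[idx]])

f
```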
Need an efficient way to change factor values from one column of a data frame to another columns
We can use fct_collapse, which returns a factor with new levels
library(dplyr)
library(forcats)
library(magrittr)
df %<>%
mutate(B = fct_collapse(B, CHANGED = as.character(B)[A== "Kelly"]))
glimpse(df)
#Rows: 7
#Columns: 2
#$ A <fct> Jerry, Kelly, Kelly, Lion, Zebra, Bear, Kelly
#$ B <fct> Eats, CHANGED, CHANGED, Roars, Runs, Sleeps, CHANGED
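For comparison, a base R sketch of the same idea. The input data is not shown above, so the original B values for the Kelly rows ("Talks", "Walks") are invented here purely for illustration.

```r
# Hypothetical input; only the non-Kelly B values come from the output above.
df <- data.frame(
  A = factor(c("Jerry", "Kelly", "Kelly", "Lion", "Zebra", "Bear", "Kelly")),
  B = factor(c("Eats", "Talks", "Walks", "Roars", "Runs", "Sleeps", "Talks"))
)

# Levels of B that occur in rows where A is "Kelly".
to_change <- unique(as.character(df$B[df$A == "Kelly"]))

# Collapse those levels into a single new level.
levels(df$B)[levels(df$B) %in% to_change] <- "CHANGED"

df
```

Like fct_collapse, this collapses whole levels, so a non-Kelly row that shares one of those B values would be changed as well.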
Replace NA in a factor column
1) addNA If fac is a factor, addNA(fac) is the same factor but with NA added as a level. See ?addNA
To force the NA level to be 88:
facna <- addNA(fac)
levels(facna) <- c(levels(fac), 88)
giving:
> facna
[1] 1 2 3 3 4 88 2 4 88 3
Levels: 1 2 3 4 88
1a) This can be written in a single line as follows:
`levels<-`(addNA(fac), c(levels(fac), 88))
2) factor It can also be done in one line using the various arguments of factor like this:
factor(fac, levels = levels(addNA(fac)), labels = c(levels(fac), 88), exclude = NULL)
2a) or equivalently:
factor(fac, levels = c(levels(fac), NA), labels = c(levels(fac), 88), exclude = NULL)
3) ifelse Another approach is:
factor(ifelse(is.na(fac), 88, paste(fac)), levels = c(levels(fac), 88))
4) forcats The forcats package has a function for this (fct_explicit_na is deprecated as of forcats 1.0.0 in favour of fct_na_value_to_level):
library(forcats)
fct_explicit_na(fac, "88")
## [1] 1 2 3 3 4 88 2 4 88 3
## Levels: 1 2 3 4 88
Note: We used the following for input fac
fac <- structure(c(1L, 2L, 3L, 3L, 4L, NA, 2L, 4L, NA, 3L), .Label = c("1",
"2", "3", "4"), class = "factor")
Replace values in columns of a data frame after match a pattern in another column
Seems to me Col1 is a factor. Try:
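# Convert to character first, then flag the rows that do NOT match the pattern.
rows <- !grepl("BD|HD", as.character(mydf$`Col1`))
# Zero out both value columns for the non-matching rows.
mydf$`Value1`[rows] <- 0
mydf$`Value2`[rows] <- 0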
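Since the question's data is not reproduced here, this is a self-contained sketch on a made-up mydf (the column contents are assumptions) showing what the approach does:

```r
# Hypothetical data: Col1 is a factor, Value1 and Value2 are numeric.
mydf <- data.frame(
  Col1   = factor(c("BD1", "XX2", "HD3", "ZZ4")),
  Value1 = c(10, 20, 30, 40),
  Value2 = c(1, 2, 3, 4)
)

# TRUE for rows whose Col1 contains neither "BD" nor "HD".
rows <- !grepl("BD|HD", as.character(mydf$Col1))

mydf$Value1[rows] <- 0
mydf$Value2[rows] <- 0

mydf
```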
How to replace all values in a column based on an ordered vector in r
I cannot read in the dta file for some reason, so below I simulate data to show my suggestion. You start with your educ_vec vector.
educ_vec <- c("No formal schooling", "1st grade",
"2nd grade", "3rd grade", "4th grade", "5th grade",
"6th grade", "7th grade", "8th grade", "9th grade",
"10th grade", "11th grade", "12th grade", "1 year of college",
"2 years of college", "3 years of college", "4 years of college",
"5 years of college", "6 years of college", "7 years of college",
"8 years of college")
If you look at educ_vec, it is already in the format you want
# this is meant for 0
educ_vec[1]
[1] "No formal schooling"
# this is meant for 20
educ_vec[21]
[1] "8 years of college"
If your score is i, the new categorical value will be educ_vec[i+1]; so we can make use of this below:
library(dplyr)
set.seed(100)
gss_df <- data.frame(educ=sample(0:20,30,replace=TRUE))
gss_df %>%
mutate(new=factor(educ_vec[educ+1],ordered = TRUE, levels = educ_vec))
educ new
1 9 9th grade
2 5 5th grade
3 15 3 years of college
4 18 6 years of college
5 13 1 year of college
6 11 11th grade
7 5 5th grade
8 3 3rd grade
9 5 5th grade
10 1 1st grade
11 6 6th grade
12 6 6th grade
13 10 10th grade
14 17 5 years of college
15 11 11th grade
16 2 2nd grade
17 18 6 years of college
18 7 7th grade
19 17 5 years of college
20 1 1st grade
21 18 6 years of college
22 3 3rd grade
23 3 3rd grade
24 19 7 years of college
25 15 3 years of college
26 20 8 years of college
27 6 6th grade
28 15 3 years of college
29 10 10th grade
30 19 7 years of college
And yes, it works if some of the levels are not found in the data:
gss_df <- data.frame(educ=0:5)%>%
mutate(new=factor(educ_vec[educ+1],ordered = TRUE, levels = educ_vec))
educ new
1 0 No formal schooling
2 1 1st grade
3 2 2nd grade
4 3 3rd grade
5 4 4th grade
6 5 5th grade
You can see the new column is a factor with the intended categories.
str(gss_df)
'data.frame': 6 obs. of 2 variables:
$ educ: int 0 1 2 3 4 5
$ new : Ord.factor w/ 21 levels "No formal schooling"<..: 1 2 3 4 5 6
If you have scores outside 0 to 20, for example -1, -2, 21 or 22, then I suggest doing the following:
names(educ_vec) = 0:20
gss_df <- data.frame(educ=c(-1,0,20,21))
# you can also use mutate
gss_df$new <- educ_vec[match(gss_df$educ,names(educ_vec))]
gss_df
educ new
1 -1 <NA>
2 0 No formal schooling
3 20 8 years of college
4 21 <NA>
match returns NA when it cannot find the corresponding name in educ_vec, so out-of-range scores become NA.