Replacing Numbers Within a Range with a Factor

Replacing numbers within a range with a factor

Use cut to do this in one step:

dfc <- cut(df$x, breaks=c(0, 15, 45, 56, Inf))
str(dfc)
 Factor w/ 4 levels "(0,15]","(15,45]",..: 3 4 3 2 2 4 2 2 4 4 ...

Once you are satisfied that the breaks are correctly specified, you can then also use the labels argument to relabel the levels:

dfc <- cut(df$x, breaks=c(0, 15, 45, 56, Inf), labels=paste("Age", 1:4, sep=""))
str(dfc)
 Factor w/ 4 levels "Age1","Age2",..: 3 4 3 2 2 4 2 2 4 4 ...
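
One detail worth checking before relabelling is how cut treats values that sit exactly on a break. The sketch below uses made-up ages (not the asker's data) to show that the default intervals are open on the left, so a value equal to the lowest break becomes NA unless right = FALSE or include.lowest = TRUE is used.

```r
# Hypothetical ages chosen to land exactly on the breaks
ages <- c(0, 15, 16, 45, 56, 80)
brks <- c(0, 15, 45, 56, Inf)

# Default: intervals are (a, b], so 0 falls outside the first interval
default_bins <- cut(ages, breaks = brks)

# right = FALSE: intervals are [a, b), so 0 is included and 15 moves up a bin
closed_left <- cut(ages, breaks = brks, right = FALSE)

data.frame(ages, default_bins, closed_left)
```

Which variant is appropriate depends on whether the upper or lower bound of each age band should be inclusive.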

Replace range of values for factor with levels

For numeric values, a simple ifelse() is enough to split the values into two groups.

VACounty$MedHouseIncome2012 <- ifelse(VACounty$MedHouseIncome2012 < 8000, 'low', 'high')

If you need the column as a factor, convert it afterwards:

VACounty$MedHouseIncome2012 <- factor(VACounty$MedHouseIncome2012)
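
By default factor() sorts levels alphabetically, which would put 'high' before 'low'. A minimal sketch, assuming a made-up income vector rather than the VACounty data, that fixes the level order explicitly:

```r
# Hypothetical incomes; the 8000 cutoff is taken from the answer above
income <- c(5000, 12000, 7999, 8000)

# Build the two groups and set the level order in one step
income_group <- factor(ifelse(income < 8000, "low", "high"),
                       levels = c("low", "high"))

levels(income_group)
table(income_group)
```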

Replace values in a numerical range

It’s as simple as this:

data[data > -1.5 & data < 1.5] <- 0
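
The comparison operators decide whether the endpoints are replaced. A small sketch on a made-up vector, showing that with strict inequalities a value exactly equal to 1.5 is left alone:

```r
# Hypothetical values, including one exactly on the upper bound
x <- c(-3, -1, 0.5, 1.5, 2)

# Strict inequalities: only values strictly between -1.5 and 1.5 become 0
x[x > -1.5 & x < 1.5] <- 0
x
```

Use >= and <= instead if the endpoints should be replaced as well.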

Replace contents of factor column in R dataframe

I bet the problem is when you are trying to replace values with a new one, one that is not currently part of the existing factor's levels:

levels(iris$Species)
# [1] "setosa"     "versicolor" "virginica"

The example in your question actually works, because 'setosa' is already one of the levels:

iris$Species[iris$Species == 'virginica'] <- 'setosa'

This is what more likely creates the problem you were seeing with your own data:

iris$Species[iris$Species == 'virginica'] <- 'new.species'
# Warning message:
# In `[<-.factor`(`*tmp*`, iris$Species == "virginica", value = c(1L,  :
#   invalid factor level, NAs generated

It will work if you first increase your factor levels:

levels(iris$Species) <- c(levels(iris$Species), "new.species")
iris$Species[iris$Species == 'virginica'] <- 'new.species'
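
Note that after this replacement the old level is still registered on the factor, just with zero observations. A short sketch, working on a copy of the built-in iris column so the original stays untouched, that removes the leftover level with droplevels():

```r
# Work on a copy so iris itself is not modified
sp <- iris$Species

levels(sp) <- c(levels(sp), "new.species")
sp[sp == "virginica"] <- "new.species"

table(sp)             # 'virginica' is still listed, with a count of 0
sp <- droplevels(sp)  # drop levels that no longer occur
levels(sp)
```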

If you want to rename a level outright (replace "species A" with "species B" everywhere), you'd be better off changing the level itself:

levels(iris$Species)[match("oldspecies",levels(iris$Species))] <- "newspecies"
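
As a concrete instance of the same idea, the sketch below renames 'virginica' on a copy of the iris column. The name sp is only for illustration. Because the level itself is renamed, no new level has to be added first and no NAs are generated.

```r
# Copy of the built-in column, so iris is left unchanged
sp <- iris$Species

# Rename the level in place; every 'virginica' observation follows along
levels(sp)[levels(sp) == "virginica"] <- "new.species"

levels(sp)
anyNA(sp)
```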

Is there a vectorized method for replacing factor levels in tidyverse

You can use cur_column() in dplyr, which returns the name of the column currently being processed inside across(), as the replacement value.

library(dplyr)

Health %>% mutate(across(everything(), ~replace(., . == 'yes', cur_column())))

#  Anemia BloodPressure Asthma
#  <chr>  <chr>         <chr> 
#1 Anemia no            no    
#2 no     BloodPressure no    
#3 no     no            Asthma

In base R, with lapply:

Health[] <- lapply(names(Health), function(x) 
                   replace(Health[[x]], Health[[x]] == 'yes', x))
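
Both versions above assume character columns. If the columns are factors, the column name is not an existing level, so replace() would produce NA with a warning. A hedged sketch with a reconstructed Health data frame (the values are assumed from the output shown above) that goes through character and back to factor:

```r
# Reconstructed example data; stringsAsFactors = TRUE makes the columns factors
Health <- data.frame(Anemia        = c("yes", "no", "no"),
                     BloodPressure = c("no", "yes", "no"),
                     Asthma        = c("no", "no", "yes"),
                     stringsAsFactors = TRUE)

Health[] <- lapply(names(Health), function(x) {
  col <- as.character(Health[[x]])   # leave the factor representation
  factor(replace(col, col == "yes", x))  # substitute, then rebuild the factor
})

Health
```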

Find rows with incomplete set depending on a factor, then replace values that exist by NA for the incomplete set

You can group by id and, if any value within a group is NA, replace all values in that group with NA. To apply the same function to several columns at once, use across.

library(dplyr)

df %>%
  group_by(id) %>%
  mutate(across(starts_with('var'), ~if(any(is.na(.))) NA else .))
  #for dplyr < 1.0.0 we can use `mutate_at`
  #mutate_at(vars(starts_with('var')), ~if(any(is.na(.))) NA else .)

#     id myfactor var2change var3change var4change
#  <dbl> <fct>         <dbl>      <dbl>      <dbl>
#1     1 1                10          5         NA
#2     1 2                10         10         NA
#3     2 1                NA         15          3
#4     2 2                NA         20          8
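
For readers without dplyr, roughly the same result can be had in base R with ave(). This is a sketch under assumptions: the input values below are invented so that the result matches the output above, and the column selection by the 'var' prefix mirrors starts_with('var').

```r
# Hypothetical input; only the NA pattern matters for the illustration
df <- data.frame(id         = c(1, 1, 2, 2),
                 var2change = c(10, 10, NA, 7),
                 var3change = c(5, 10, 15, 20),
                 var4change = c(NA, 2, 3, 8))

vars <- grep("^var", names(df), value = TRUE)

# Within each id, blank out the whole group if any value is missing
df[vars] <- lapply(df[vars], function(v) {
  ave(v, df$id, FUN = function(g) if (anyNA(g)) rep(NA_real_, length(g)) else g)
})

df
```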

Replace specific range of numbers with NA

We can use base R to do this. Replace the elements in the first column that are greater than 20 with NA, then take the mean while ignoring the NAs:

df[,1][df[,1] > 20] <- NA 
mean(df[,1], na.rm = TRUE)
#[1] 3

The means of all columns, again ignoring NAs:

colMeans(df, na.rm = TRUE)
#     V1       V2       V3 
#3.00000  6.00000 11.33333

Or in a single line, by subsetting instead of assigning NA:

mean(df[,1][df[,1] <= 20], na.rm = TRUE)
#[1] 3
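
If the threshold should apply to every column rather than just the first, the same indexing works on the whole data frame. A minimal sketch with invented numbers (not the asker's data):

```r
# Hypothetical data frame with one out-of-range value per column
df <- data.frame(V1 = c(1, 3, 5, 25),
                 V2 = c(4, 6, 8, 30))

# df > 20 is a logical matrix; matching cells in all columns become NA
df[df > 20] <- NA

col_means <- colMeans(df, na.rm = TRUE)
col_means
```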

How to replace all values in a column based on an ordered vector in r

I cannot read in the dta file for some reason, so below I simulate data to show you my suggestion. You start with your educ_vec vector.

educ_vec <- c("No formal schooling", "1st grade", 
"2nd grade", "3rd grade", "4th grade", "5th grade", 
"6th grade", "7th grade", "8th grade", "9th grade", 
"10th grade", "11th grade", "12th grade", "1 year of college", 
"2 years of college", "3 years of college", "4 years of college", 
"5 years of college", "6 years of college", "7 years of college", 
"8 years of college")

If you look at educ_vec, it is already in the order you want: position 1 corresponds to a score of 0 and position 21 to a score of 20.

# this is meant for 0
educ_vec[1]
[1] "No formal schooling"
# this is meant for 20
educ_vec[21]
[1] "8 years of college"

If your score is i, the new categorical value will be educ_vec[i+1]; so we can make use of this below:

library(dplyr)

set.seed(100)
gss_df <- data.frame(educ = sample(0:20, 30, replace = TRUE))
gss_df %>% 
  mutate(new = factor(educ_vec[educ + 1], ordered = TRUE, levels = educ_vec))

   educ                new
1     9          9th grade
2     5          5th grade
3    15 3 years of college
4    18 6 years of college
5    13  1 year of college
6    11         11th grade
7     5          5th grade
8     3          3rd grade
9     5          5th grade
10    1          1st grade
11    6          6th grade
12    6          6th grade
13   10         10th grade
14   17 5 years of college
15   11         11th grade
16    2          2nd grade
17   18 6 years of college
18    7          7th grade
19   17 5 years of college
20    1          1st grade
21   18 6 years of college
22    3          3rd grade
23    3          3rd grade
24   19 7 years of college
25   15 3 years of college
26   20 8 years of college
27    6          6th grade
28   15 3 years of college
29   10         10th grade
30   19 7 years of college

And yes, it works even if some of the levels do not occur in the data:

gss_df <- data.frame(educ = 0:5) %>%
  mutate(new = factor(educ_vec[educ + 1], ordered = TRUE, levels = educ_vec))

  educ                 new
1    0 No formal schooling
2    1           1st grade
3    2           2nd grade
4    3           3rd grade
5    4           4th grade
6    5           5th grade

You can see the new column is a factor with the intended categories.

str(gss_df)
'data.frame':   6 obs. of  2 variables:
 $ educ: int  0 1 2 3 4 5
 $ new : Ord.factor w/ 21 levels "No formal schooling"<..: 1 2 3 4 5 6

If you have scores outside 0-20, for example -1, -2, 21 or 22, then I suggest doing the following:

names(educ_vec) <- 0:20
gss_df <- data.frame(educ=c(-1,0,20,21))
# you can also use mutate
gss_df$new <- educ_vec[match(gss_df$educ,names(educ_vec))]
gss_df

  educ                 new
1   -1                <NA>
2    0 No formal schooling
3   20  8 years of college
4   21                <NA>

match() returns NA when it cannot find the corresponding name in educ_vec, so out-of-range scores become NA.
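
An alternative that yields an ordered factor directly is to pass the numeric codes as levels and the text as labels to factor(); codes that are not among the levels become NA. The sketch below uses a shortened, hypothetical label vector (educ_labels, codes 0 to 3) to keep it brief.

```r
# Shortened stand-in for educ_vec; codes 0..3 only
educ_labels <- c("No formal schooling", "1st grade", "2nd grade", "3rd grade")

# Scores including two that are out of range
educ <- c(-1, 0, 3, 4)

# levels = the valid codes, labels = the text each code maps to
educ_f <- factor(educ, levels = 0:3, labels = educ_labels, ordered = TRUE)

educ_f
```

This avoids the index offset (educ + 1) and handles out-of-range scores in the same call.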

Replacing values from a column using a condition in R

# reassign depth values under 10 to zero
df$depth[df$depth<10] <- 0

For the columns that are factors, you can only assign values that are existing factor levels. If you want to assign a value that isn't currently a factor level, you need to create the additional level first:

levels(df$species) <- c(levels(df$species), "unknown") 
df$species[df$depth<10]  <- "unknown"
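
Putting the two steps together on a small, made-up data frame (the species names and depths are invented for illustration):

```r
# Hypothetical data: a numeric column and a factor column
df <- data.frame(depth   = c(5, 12, 8, 20),
                 species = factor(c("cod", "hake", "cod", "sole")))

# Factor column: register the new level, then assign it
levels(df$species) <- c(levels(df$species), "unknown")
df$species[df$depth < 10] <- "unknown"

# Numeric column: plain conditional assignment
df$depth[df$depth < 10] <- 0

df
```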
