Genre/Sex to numeric in R
ifelse is pretty efficient. Try:
train$Sex_num <- ifelse(train$Sex=="male", 1, 0)
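A minimal sketch of the same call on a made-up data frame (the values in train below are hypothetical, not the asker's data). One thing it shows is that ifelse propagates NA, so a row with a missing Sex stays missing instead of silently becoming 0.

```r
# Hypothetical stand-in for the asker's data
train <- data.frame(Sex = c("male", "female", "female", NA, "male"))

# 1 for "male", 0 for any other non-missing value; NA stays NA
train$Sex_num <- ifelse(train$Sex == "male", 1, 0)
train$Sex_num
# [1]  1  0  0 NA  1
```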
How to replace 1 and 0 with male and female in R?
Your indexing wasn't working because you were indexing the entire data frame rather than the column. That is, df[df$sex == 1] was causing a problem because R doesn't know which elements you wanted to replace. You could do the following:
df$sex[df$sex == 0] <- "female"
df$sex[df$sex == 1] <- "male"
Or, you could just make the variable into a factor.
df <- data.frame(
  sex = c(1, 0, 0, 1, NA)
)
df$sex <- factor(df$sex,
                 levels = c(0, 1),
                 labels = c("female", "male"))
df
# sex
# 1 male
# 2 female
# 3 female
# 4 male
# 5 <NA>
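As a rough check, the sketch below runs both approaches on copies of the same example data and compares the labels. The names df1 and df2 are made up for illustration. Note that the element-wise replacement turns the column into a character vector, while the second approach gives a factor.

```r
df1 <- data.frame(sex = c(1, 0, 0, 1, NA))
df2 <- df1

# Approach 1: replace values in the column (the column becomes character)
df1$sex[df1$sex == 0] <- "female"
df1$sex[df1$sex == 1] <- "male"

# Approach 2: convert the column to a labelled factor
df2$sex <- factor(df2$sex, levels = c(0, 1), labels = c("female", "male"))

# Same labels either way, NA included
identical(df1$sex, as.character(df2$sex))
# [1] TRUE
```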
How to change gender column's values to numeric values in R?
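Maybe you can try this and tweak it:
# default code for anything not matched below
DT[, GENDER := 2]
DT[toupper(X) %chin% c("M","MAN","BOY") | grepl("male", X, ignore.case=TRUE), GENDER := 0]
# this rule must run after the male rule, because "female" also contains "male"
DT[toupper(X) %chin% c("F","WOMAN","GIRL") | grepl("female", X, ignore.case=TRUE), GENDER := 1]
Data (create DT before running the lines above):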
library(data.table)
DT <- data.table(X=c("Malel","male","female","m","f","Male","Female","Demiguy",
"none","Trans","Cisgender","non-binary","She/her/they/them","Other","Cis",
"SWM","NB","Genderfluid","Nonbinary/femme"))
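For readers without data.table, here is a base R sketch of the same matching logic. It is a rewrite for illustration, not the answerer's code, and the vector x is a shortened, hypothetical subset of the sample data. The order of the two assignments matters for the same reason as in the data.table version.

```r
# Hypothetical subset of the sample values
x <- c("Malel", "male", "female", "m", "f", "Male", "Female", "Other")

gender <- rep(2L, length(x))  # default code for unmatched values

is_male   <- toupper(x) %in% c("M", "MAN", "BOY") |
  grepl("male", x, ignore.case = TRUE)
is_female <- toupper(x) %in% c("F", "WOMAN", "GIRL") |
  grepl("female", x, ignore.case = TRUE)

gender[is_male]   <- 0L
gender[is_female] <- 1L  # second, because "female" also matches "male"

gender
# [1] 0 0 1 0 1 0 1 2
```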
How to convert a factor to integer\numeric without loss of information?
See the Warning section of ?factor:
In particular, as.numeric applied to a factor is meaningless, and may happen by implicit coercion. To transform a factor f to approximately its original numeric values, as.numeric(levels(f))[f] is recommended and slightly more efficient than as.numeric(as.character(f)).
The FAQ on R has similar advice.
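A small illustration of the warning, using a made-up factor f: calling as.numeric directly returns the internal integer codes, while the two forms named in the documentation recover the original values.

```r
# Hypothetical factor whose labels look like numbers
f <- factor(c("10", "20", "10", "30"))

as.numeric(f)                 # internal codes, not the values
# [1] 1 2 1 3

as.numeric(levels(f))[f]      # recommended form
# [1] 10 20 10 30

as.numeric(as.character(f))   # equivalent result
# [1] 10 20 10 30
```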
Why is as.numeric(levels(f))[f] more efficient than as.numeric(as.character(f))?
as.numeric(as.character(f)) is effectively as.numeric(levels(f)[f]), so you are performing the conversion to numeric on length(f) values, rather than on nlevels(f) values. The speed difference will be most apparent for long vectors with few levels. If the values are mostly unique, there won't be much difference in speed. However you do the conversion, this operation is unlikely to be the bottleneck in your code, so don't worry too much about it.
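To make the counting argument concrete, the sketch below builds a long factor with only three levels (the data are invented for illustration). It shows how many strings each form has to parse and confirms that the results are identical.

```r
set.seed(1)
# Hypothetical long factor with few levels
f <- factor(sample(c("1.5", "2.5", "3.5"), 1e5, replace = TRUE))

nlevels(f)  # strings parsed by as.numeric(levels(f))[f]
# [1] 3
length(f)   # strings parsed by as.numeric(as.character(f))
# [1] 100000

a <- as.numeric(levels(f))[f]
b <- as.numeric(as.character(f))
identical(a, b)
# [1] TRUE
```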
Some timings:
library(microbenchmark)
# f (a factor) and x (a vector) must already exist; their definitions are not shown here
microbenchmark(
  as.numeric(levels(f))[f],
  as.numeric(levels(f)[f]),
  as.numeric(as.character(f)),
  paste0(x),
  paste(x),
  times = 1e5
)
## Unit: microseconds
## expr min lq mean median uq max neval
## as.numeric(levels(f))[f] 3.982 5.120 6.088624 5.405 5.974 1981.418 1e+05
## as.numeric(levels(f)[f]) 5.973 7.111 8.352032 7.396 8.250 4256.380 1e+05
## as.numeric(as.character(f)) 6.827 8.249 9.628264 8.534 9.671 1983.694 1e+05
## paste0(x) 7.964 9.387 11.026351 9.956 10.810 2911.257 1e+05
## paste(x) 7.965 9.387 11.127308 9.956 11.093 2419.458 1e+05
How to change gender factor value to 0 and 1 instead of 1 and 2 for my Excel dataset in R
The main issue is that converting a factor directly to numeric/integer returns the integer storage mode values, which start from 1. Instead, we need to use as.numeric on the character-converted object, as mentioned in the documentation (?factor):
In particular, as.numeric applied to a factor is meaningless, and may happen by implicit coercion. To transform a factor f to approximately its original numeric values, as.numeric(levels(f))[f] is recommended and slightly more efficient than as.numeric(as.character(f)).
Gender_numeric <- as.numeric(as.character(factor(Dataset_A$gender,
    levels = c("female", "male"), labels = c("0", "1"))))
With a small example
v1 <- c("male", "female", "male", "female")
v2 <- factor(v1, levels = c("female", "male"), labels = c("0", "1"))
If we look at 'v2', the new levels are 0 and 1:
> v2
[1] 1 0 1 0
Levels: 0 1
But converting to integer/numeric directly returns the internal codes:
> as.integer(v2)
[1] 2 1 2 1
Instead, convert to character first:
> as.integer(as.character(v2))
[1] 1 0 1 0
Or we can do this using levels (which would be slightly faster):
> as.integer(levels(v2)[v2])
[1] 1 0 1 0
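For completeness, the form that the quoted documentation recommends converts the levels first and then indexes by the factor. A short sketch on the same small example:

```r
v1 <- c("male", "female", "male", "female")
v2 <- factor(v1, levels = c("female", "male"), labels = c("0", "1"))

# Convert the two levels to integer once, then index by the factor codes
as.integer(levels(v2))[v2]
# [1] 1 0 1 0
```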