How to Change Gender Factor into a Numerical Coding in R

Gender/Sex to numeric in R

ifelse is vectorized, so it recodes the whole column in one call. Try:

train$Sex_num <- ifelse(train$Sex=="male", 1, 0)
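A quick check on a small made-up stand-in for train (the data below is hypothetical). Note that ifelse keeps NA as NA, and that any value other than exactly "male" (for example "Male") is coded 0.

```r
# Hypothetical stand-in for the `train` data frame in the question
train <- data.frame(Sex = c("male", "female", "female", "male", NA))

# 1 where Sex is exactly "male", 0 otherwise; an NA input gives an NA output
train$Sex_num <- ifelse(train$Sex == "male", 1, 0)
print(train$Sex_num)  # 1 0 0 1 NA
```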

How to replace 1 and 0 with male and female in R?

Your indexing wasn't working because df[df$sex == 1] indexes the data frame itself rather than the sex column: with a single index, [ on a data frame selects columns, so R doesn't know which elements you want to replace. Index the column vector instead:

df$sex[df$sex == 0] <- "female"
df$sex[df$sex == 1] <- "male"
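A small run of these two assignments on made-up data shows why they work in this order: the first assignment turns the whole column into character, so the remaining 1s become "1", and the second comparison df$sex == 1 still matches them because R coerces 1 to "1" before comparing.

```r
# Hypothetical example data coded 1/0
df <- data.frame(sex = c(1, 0, 0, 1))

# Assigning a string coerces the column to character: "1" "female" "female" "1"
df$sex[df$sex == 0] <- "female"
# "1" == 1 is TRUE because 1 is coerced to "1" for the comparison
df$sex[df$sex == 1] <- "male"
print(df$sex)  # "male" "female" "female" "male"
```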

Or, you could just make the variable into a factor.

df <- data.frame(
  sex = c(1, 0, 0, 1, NA)
)
df$sex <- factor(df$sex,
                 levels = c(0, 1),
                 labels = c("female", "male"))

df
# sex
# 1 male
# 2 female
# 3 female
# 4 male
# 5 <NA>
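The levels/labels pairing is positional (the first level gets the first label), and any code not listed in levels silently becomes NA rather than raising an error. A sketch with a hypothetical stray code 2:

```r
# Hypothetical codes with one unexpected value (2)
sex_codes <- c(1, 0, 2, 1)
sex <- factor(sex_codes, levels = c(0, 1), labels = c("female", "male"))
print(sex)  # male female <NA> male

# Count codes that were present but fell outside the listed levels
n_unmatched <- sum(is.na(sex) & !is.na(sex_codes))
print(n_unmatched)  # 1
```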

How to change gender column's values to numeric values in R?

Maybe you can try this and tweak it:

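# 2 = anything not recognised as male or female
DT[, GENDER := 2]
# 0 = male; grepl("male", ...) also matches "female", so this rule must run before the female one
DT[toupper(X) %chin% c("M","MAN","BOY") | grepl("male", X, ignore.case=TRUE), GENDER := 0]
# 1 = female; overrides the rows the "male" pattern caught
DT[toupper(X) %chin% c("F","WOMAN","GIRL") | grepl("female", X, ignore.case=TRUE), GENDER := 1]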

data:

library(data.table)
DT <- data.table(X=c("Malel","male","female","m","f","Male","Female","Demiguy",
"none","Trans","Cisgender","non-binary","She/her/they/them","Other","Cis",
"SWM","NB","Genderfluid","Nonbinary/femme"))
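Run end to end with the data defined first (this only reorders the answer's code and adds a print; it needs the data.table package installed), the rules give the coding checked below. Because grepl("male", ...) also matches "female", the order of the two := lines matters.

```r
library(data.table)

DT <- data.table(X = c("Malel", "male", "female", "m", "f", "Male", "Female", "Demiguy",
                       "none", "Trans", "Cisgender", "non-binary", "She/her/they/them",
                       "Other", "Cis", "SWM", "NB", "Genderfluid", "Nonbinary/femme"))

# 2 = anything not recognised as male or female
DT[, GENDER := 2]
# 0 = male; the "male" pattern also catches "female", fixed by the next line
DT[toupper(X) %chin% c("M", "MAN", "BOY") | grepl("male", X, ignore.case = TRUE), GENDER := 0]
# 1 = female; runs last so it overrides the rows the male pattern caught
DT[toupper(X) %chin% c("F", "WOMAN", "GIRL") | grepl("female", X, ignore.case = TRUE), GENDER := 1]

print(DT)
```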

How to convert a factor to integer/numeric without loss of information?

See the Warning section of ?factor:

In particular, as.numeric applied to a factor is meaningless, and may happen by implicit coercion. To transform a factor f to approximately its original numeric values, as.numeric(levels(f))[f] is recommended and slightly more efficient than as.numeric(as.character(f)).

The FAQ on R has similar advice.
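To see what the warning means, here is a small made-up factor whose labels are numbers: as.numeric returns the internal level codes, while the recommended idiom recovers the original values.

```r
# Hypothetical factor whose labels look like numbers
f <- factor(c(10, 5, 10, 20))
levels(f)                 # "5" "10" "20" (sorted numerically, then stored as character)

# Wrong: these are the integer codes of the levels, not the values
as.numeric(f)             # 2 1 2 3

# Right: convert the levels once, then index them by the codes
as.numeric(levels(f))[f]  # 10 5 10 20
```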


Why is as.numeric(levels(f))[f] more efficient than as.numeric(as.character(f))?

as.numeric(as.character(f)) is effectively as.numeric(levels(f)[f]), so you are performing the conversion to numeric on length(f) values, rather than on nlevels(f) values. The speed difference will be most apparent for long vectors with few levels. If the values are mostly unique, there won't be much difference in speed. However you do the conversion, this operation is unlikely to be the bottleneck in your code, so don't worry too much about it.
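The equivalence can be checked directly on made-up data: as.character(f) yields the same strings as levels(f)[f], so both routes give identical numbers, but the levels-first route parses only nlevels(f) strings instead of length(f).

```r
set.seed(42)
# Hypothetical data: one million values drawn from only three distinct numbers
f <- factor(sample(c("1.5", "2", "3.25"), 1e6, replace = TRUE))

# as.character() on a factor produces the same strings as levels(f)[f]
stopifnot(identical(as.character(f), levels(f)[f]))

a <- as.numeric(levels(f))[f]     # parses nlevels(f) strings
b <- as.numeric(as.character(f))  # parses length(f) strings
stopifnot(identical(a, b))

c(nlevels = nlevels(f), length = length(f))
```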


Some timings

library(microbenchmark)
# f (a factor) and x must already exist; their definitions are not shown in this answer
microbenchmark(
  as.numeric(levels(f))[f],
  as.numeric(levels(f)[f]),
  as.numeric(as.character(f)),
  paste0(x),
  paste(x),
  times = 1e5
)
## Unit: microseconds
##                        expr   min    lq      mean median     uq      max neval
##    as.numeric(levels(f))[f] 3.982 5.120  6.088624  5.405  5.974 1981.418 1e+05
##    as.numeric(levels(f)[f]) 5.973 7.111  8.352032  7.396  8.250 4256.380 1e+05
## as.numeric(as.character(f)) 6.827 8.249  9.628264  8.534  9.671 1983.694 1e+05
##                   paste0(x) 7.964 9.387 11.026351  9.956 10.810 2911.257 1e+05
##                    paste(x) 7.965 9.387 11.127308  9.956 11.093 2419.458 1e+05
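Because the benchmark above depends on inputs that are not shown, here is a self-contained sketch using base R's system.time instead of microbenchmark (a swapped-in tool, so no extra package is needed). It assumes a long factor with few levels, the case where the difference should be largest; the timings will vary by machine, so no expected numbers are given.

```r
set.seed(1)
# Hypothetical test data: a long factor with only three distinct levels
f <- factor(sample(c("0.5", "1", "2"), 1e6, replace = TRUE))

t_levels <- system.time(a <- as.numeric(levels(f))[f])
t_char   <- system.time(b <- as.numeric(as.character(f)))

stopifnot(identical(a, b))  # same result either way
cat(sprintf("levels first: %.3f s, character first: %.3f s\n",
            t_levels[["elapsed"]], t_char[["elapsed"]]))
```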

How to change gender factor value to 0 and 1 instead of 1 and 2 for my Excel dataset in R

The main issue is that converting a factor directly to numeric/integer returns its integer codes, which start at 1. Instead, convert the factor to character first and then apply as.numeric, as described in the documentation (?factor):

In particular, as.numeric applied to a factor is meaningless, and may happen by implicit coercion. To transform a factor f to approximately its original numeric values, as.numeric(levels(f))[f] is recommended and slightly more efficient than as.numeric(as.character(f)).

Gender_numeric <- as.numeric(as.character(factor(Dataset_A$gender,
                                                 levels = c("female", "male"),
                                                 labels = c("0", "1"))))

With a small example:

v1 <- c("male", "female", "male", "female")
v2 <- factor(v1, levels = c("female", "male"), labels = c("0", "1"))

If we look at v2, its levels are now 0 and 1:

> v2
[1] 1 0 1 0
Levels: 0 1

But converting directly to integer/numeric returns the underlying codes, which start at 1:

> as.integer(v2)
[1] 2 1 2 1

Instead, convert to character first:

> as.integer(as.character(v2))
[1] 1 0 1 0

Or do it via levels, which is faster because only the levels are converted:

> as.integer(levels(v2)[v2])
[1] 1 0 1 0
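The levels-based idiom can be wrapped in a small helper; factor_to_numeric below is a hypothetical name, not a base R function. Indexing by a factor uses its integer codes, so missing values come through as NA. It only makes sense when the levels themselves look like numbers; on levels such as "male" and "female", as.numeric would return NA with a warning.

```r
# Hypothetical helper (not part of base R) wrapping the idiom from ?factor
factor_to_numeric <- function(f) {
  stopifnot(is.factor(f))
  # Convert each level once, then pick values by the factor's integer codes
  as.numeric(levels(f))[f]
}

v2 <- factor(c("male", "female", "male", NA),
             levels = c("female", "male"), labels = c("0", "1"))
factor_to_numeric(v2)  # 1 0 1 NA
```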

