Simple way of creating dummy variable in R
We can create a logical vector with df$Z < 0 and then coerce it to binary by wrapping it with +.
df$D <- +(df$Z < 0)
Or as @BenBolker mentioned, the canonical options would be
as.numeric(df$Z < 0)
or
as.integer(df$Z < 0)
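As a quick check, here is a minimal sketch comparing the three coercions. The data frame and its values are made up for illustration; only the column name Z comes from the answer above.

```r
# Hypothetical data frame; the values are invented for this sketch.
df <- data.frame(Z = c(-1.5, 0.2, -0.3, 2.0))

d_plus <- +(df$Z < 0)           # unary plus coerces the logical vector to integer
d_num  <- as.numeric(df$Z < 0)  # explicit coercion to double
d_int  <- as.integer(df$Z < 0)  # explicit coercion to integer

# All three agree on the 0/1 values; they differ only in storage type.
all(d_plus == d_num & d_num == d_int)
typeof(d_plus)
typeof(d_num)
```

The explicit as.integer or as.numeric calls say what they do, which is probably why they are called canonical; the unary plus is just shorter to type.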
Benchmarks
set.seed(42)
Z <- rnorm(1e7)
library(microbenchmark)
microbenchmark(akrun= +(Z < 0), etienne = ifelse(Z < 0, 1, 0),
times= 20L, unit='relative')
# Unit: relative
# expr min lq mean median uq max neval
# akrun 1.00000 1.00000 1.000000 1.00000 1.00000 1.000000 20
# etienne 12.20975 10.36044 9.926074 10.66976 9.32328 7.830117 20
How to create a dummy variable in R by comparing the string values in a column
Your comment leads me to believe that you have a factor variable, so you should first convert it to a character vector and then to numeric. The "random values" you are seeing are the integer indices into the factor's levels attribute:
dfrm$newcol <- (as.numeric(as.character(dfrm$oldcol)) > 55) + 0
The "+ 0" converts the logical result to numeric; the parentheses are needed because + binds more tightly than >. You could also wrap the whole comparison in as.integer or as.numeric.
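To make the factor pitfall concrete, here is a small sketch. The contents of oldcol are an assumption (numbers that were read in as a factor); only the names dfrm, oldcol and newcol come from the answer.

```r
# Hypothetical data: numeric values stored as a factor.
dfrm <- data.frame(oldcol = factor(c("60", "12", "100", "55")))

# Calling as.numeric() directly on the factor returns level indices,
# not the numbers that are printed.
idx <- as.numeric(dfrm$oldcol)

# Going through character first recovers the actual values.
vals <- as.numeric(as.character(dfrm$oldcol))

# Parenthesise the comparison before adding 0, because `+` has
# higher precedence than `>`.
dfrm$newcol <- (vals > 55) + 0
dfrm$newcol
```

Without the parentheses, vals > 55 + 0 is parsed as vals > (55 + 0) and the result stays logical.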
Create Dummy Variable if NA 1 else -1
If it is a real NA, then we can use is.na to detect the NA elements. It returns a logical vector that is TRUE for every NA and FALSE otherwise, which can be used in ifelse to change the values:
ifelse(is.na(Cox.Reg$active_task_avg_depth), 1, -1)
Or another option is to create a numeric index and change the values accordingly
c(-1, 1)[is.na(Cox.Reg$active_task_avg_depth) + 1]
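Both variants can be checked side by side on a toy column. This is a sketch only: the values below are invented, and Cox.Reg here is a stand-in for the data frame in the question.

```r
# Hypothetical stand-in for the Cox.Reg data frame.
Cox.Reg <- data.frame(active_task_avg_depth = c(2.5, NA, 4.0, NA))

by_ifelse <- ifelse(is.na(Cox.Reg$active_task_avg_depth), 1, -1)

# is.na(...) + 1 is 1 for non-missing and 2 for missing values,
# so it picks -1 or 1 out of c(-1, 1) by position.
by_index <- c(-1, 1)[is.na(Cox.Reg$active_task_avg_depth) + 1]

all(by_ifelse == by_index)
```

If the missing values are stored as the string "NA" rather than a real NA, is.na will not detect them and the column would need cleaning first.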
Using model.matrix() to create dummy variables
We could also convert to character
dataframe1$x1 <- as.character(dataframe1$x1)
> model.matrix(~x1 - 1, dataframe1)
x11 x12 x13 x14 x15
1 1 0 0 0 0
2 0 1 0 0 0
3 0 0 1 0 0
4 0 0 0 1 0
5 0 0 0 0 1
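The reason for the conversion can be seen in a short sketch, assuming that dataframe1$x1 holds the numbers 1 to 5 (the original data is not shown, so this is a guess at its shape).

```r
# Assumed shape of the data: a numeric column with values 1..5.
dataframe1 <- data.frame(x1 = 1:5)

# A numeric predictor is kept as a single column.
m_num <- model.matrix(~ x1 - 1, dataframe1)

# After converting to character, model.matrix() treats x1 as
# categorical and creates one indicator column per distinct value.
dataframe1$x1 <- as.character(dataframe1$x1)
m_chr <- model.matrix(~ x1 - 1, dataframe1)

ncol(m_num)
colnames(m_chr)
```

Converting with factor() instead of as.character() should have the same effect here.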
How do I make a dummy variable in R?
With most of R's modelling tools that have a formula interface, you don't need to create dummy variables; the underlying code that handles and interprets the formula will do this for you. If you want a dummy variable for some other reason, there are several options. The easiest (IMHO) is to use model.matrix():
set.seed(1)
dat <- data.frame(sex = sample(c("male","female"), 10, replace = TRUE))
model.matrix( ~ sex - 1, data = dat)
which gives:
> dummy <- model.matrix( ~ sex - 1, data = dat)
> dummy
sexfemale sexmale
1 0 1
2 0 1
3 1 0
4 1 0
5 0 1
6 1 0
7 1 0
8 1 0
9 1 0
10 0 1
attr(,"assign")
[1] 1 1
attr(,"contrasts")
attr(,"contrasts")$sex
[1] "contr.treatment"
> dummy[,1]
1 2 3 4 5 6 7 8 9 10
0 0 1 1 0 1 1 1 1 0
You can use either column of dummy as a numeric dummy variable; choose the column whose class you want coded as 1. dummy[,1] uses 1 to represent the female class and dummy[,2] uses 1 to represent the male class.
Cast this as a factor if you want it to be interpreted as a categorical object:
> factor(dummy[, 1])
1 2 3 4 5 6 7 8 9 10
0 0 1 1 0 1 1 1 1 0
Levels: 0 1
But that defeats the purpose of a factor; what does 0 mean again?
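If a factor really is needed, one way to keep the meaning of 0 and 1 is to attach labels. This is only a sketch: the four-row data frame is made up so that the result does not depend on the random number generator.

```r
# Hypothetical data with a fixed order instead of sample().
dat <- data.frame(sex = c("male", "female", "female", "male"))
dummy <- model.matrix(~ sex - 1, data = dat)

# Label the 0/1 codes of the "sexfemale" column so they stay readable.
female <- factor(dummy[, "sexfemale"],
                 levels = c(0, 1),
                 labels = c("male", "female"))
levels(female)
```

Of course this just rebuilds the original sex variable, which is the point being made above.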
How to create a dummy variable in R using ifelse() command
Assuming your data frame is called df, you can create your dummy variable (Vegan) using:
df$Vegan <- ifelse(df$type == "Vegan", 1, 0) # where variable type is type of restaurants
However, note that if type is stored as a factor, you can also get the coefficient for each type of restaurant (compared to the reference level) by fitting y = b0 + b1(reviews_number) + b2(type), i.e. y ~ reviews + type, as pointed out by @mlt.
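Here is a small sketch of both routes. The data frame, its values and the column names y and reviews are assumptions made for illustration; only type and Vegan come from the question.

```r
# Hypothetical restaurant data; all values are invented.
df <- data.frame(
  y       = c(3.1, 4.0, 2.5, 4.4, 3.8, 2.9),
  reviews = c(10, 25, 5, 40, 30, 8),
  type    = factor(c("Vegan", "Italian", "Vegan", "Thai", "Italian", "Thai"))
)

# Route 1: build the dummy variable by hand.
df$Vegan <- ifelse(df$type == "Vegan", 1, 0)

# Route 2: pass the factor to lm() and let the formula interface create
# the dummies; the first level ("Italian") becomes the reference level.
fit <- lm(y ~ reviews + type, data = df)
names(coef(fit))
```

With the factor in the formula, each non-reference type gets its own coefficient, so no manual dummy column is needed for the regression itself.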
Loop over data.frame columns to generate dummy variable in R
dt[, 69:135] == 1 returns TRUE where the value in columns 69:135 is 1 and FALSE otherwise.
dt[, 178:244] == 2 returns TRUE where the value in columns 178:244 is 2 and FALSE otherwise.
You can perform an AND (&) operation between them to compare them elementwise, meaning dt[, 69] & dt[, 178], dt[, 70] & dt[, 179] and so on. Take the row-wise sum of the result and mark the row as 'yes' if even a single TRUE is found in it.
dt$left_region <- ifelse(rowSums(dt[, 69:135] == 1 & dt[, 178:244] == 2) > 0, 'yes', 'no')
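The same idea on a table small enough to check by eye. This is a sketch under assumptions: columns a1:a2 stand in for columns 69:135 and b1:b2 for columns 178:244, and all values are made up.

```r
# Hypothetical miniature of dt.
dt <- data.frame(a1 = c(1, 0, 1), a2 = c(0, 1, 0),
                 b1 = c(2, 2, 0), b2 = c(0, 0, 2))

# Elementwise AND of two logical matrices: a1 pairs with b1, a2 with b2.
hit <- dt[, 1:2] == 1 & dt[, 3:4] == 2

# A row is flagged if at least one pair matches.
dt$left_region <- ifelse(rowSums(hit) > 0, "yes", "no")
dt$left_region
```

Only the first row has a pair where the first block is 1 and the matching column in the second block is 2, so only that row is flagged. Both column ranges must have the same width for the pairing to line up.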