How to Create a Single Dummy Variable with conditions in multiple columns?
You can use rowSums (a vectorized solution) like this:
set.seed(123)
dat <- matrix(sample(c(35, 1:100), size = 15 * 20, replace = TRUE), ncol = 15, byrow = TRUE)
cbind(dat,rowSums(dat[,9:15] == 35) > 0)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16]
[1,] 29 79 41 89 94 4 53 90 55 46 96 45 68 57 10 0
[2,] 90 24 4 33 96 89 69 64 100 66 71 54 60 29 14 0
[3,] 97 91 69 80 2 48 76 21 32 23 14 41 41 37 15 0
[4,] 14 23 47 26 86 4 44 80 12 56 20 12 76 90 37 0
[5,] 67 9 38 27 82 45 81 82 80 44 76 63 71 35 48 1
[6,] 22 38 61 35 11 24 67 42 79 10 43 99 90 89 17 0
[7,] 13 65 34 66 32 18 79 9 47 51 60 33 49 96 48 0
[8,] 89 92 61 41 14 94 30 6 95 72 14 55 96 59 40 0
[9,] 65 32 31 22 37 99 15 9 14 69 62 90 67 74 52 0
[10,] 66 83 79 98 44 31 41 1 18 85 23 24 7 24 73 0
[11,] 85 50 39 24 11 39 57 21 44 22 50 35 65 37 35 1
[12,] 53 74 22 41 26 63 18 87 75 67 62 37 53 88 58 0
[13,] 84 31 71 26 60 48 26 57 92 91 27 32 99 62 94 0
[14,] 47 41 66 15 57 24 97 60 52 40 88 36 29 17 17 0
[15,] 48 25 21 68 4 70 35 41 82 92 28 97 73 69 5 0
[16,] 39 48 56 70 92 62 43 54 5 26 40 19 84 15 81 0
[17,] 55 66 17 63 31 73 40 97 97 73 25 22 59 27 53 0
[18,] 79 16 40 47 87 93 89 68 95 52 58 33 35 2 50 1
[19,] 87 35 7 16 77 74 98 47 7 65 76 13 40 22 5 0
[20,] 39 6 22 5 67 30 10 7 88 76 82 99 10 10 80 0
EDIT
I replaced cbind with transform. Since the new column would be logical, I coerce it to get 0/1.
transform(dat,x=as.numeric((rowSums(dat[,9:15] == 35) > 0)))
The result is a data.frame (coerced from the matrix by transform).
EDIT2 (as suggested by @flodel)
data$indicator <- as.integer(rowSums(data[paste0("col", 9:15)] == 35) > 0)
where data is the OP's data.frame.
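As a minimal sketch of the same idea on a small data.frame (the name toy, the column names col1 to col4, and the values are made up for illustration): the comparison yields a logical matrix, rowSums counts the matches per row, and as.integer turns the > 0 test into 0/1.

```r
# Hypothetical toy data: flag rows where any of col2..col4 equals 35.
# col1 is deliberately outside the checked range.
toy <- data.frame(col1 = c(1, 2, 35),
                  col2 = c(35, 5, 7),
                  col3 = c(4, 5, 8),
                  col4 = c(3, 6, 9))

cols <- paste0("col", 2:4)

# toy[cols] == 35 is a logical matrix; rowSums counts TRUE values per row
toy$indicator <- as.integer(rowSums(toy[cols] == 35) > 0)
toy$indicator
```

Only the first row is flagged; the 35 in col1 of the third row is ignored because col1 is not among the selected columns.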
Dummy variable with multiple conditions
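We can combine | with & to create the logical expression:
# write x < -90 with a space: x <- 90 would be parsed as an assignment
# note: y > 50 & y < 45 can never be TRUE; adjust the bounds to the intended range
i1 <- with(df, (x > -100 & x < -90) | (x > -80 & x < -50) | (y > 50 & y < 45))
df$dummy_var[i1] <- 1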
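A small self-contained sketch of this pattern follows. The data, the name df_demo, and the thresholds (including the y range of 45 to 50) are assumptions chosen only so that each condition matches one row. Note the space in `x < -90`: without it, `x<-90` reads as an assignment.

```r
# Hypothetical data; thresholds are illustrative only
df_demo <- data.frame(x = c(-95, -60, 10, 0),
                      y = c(0, 0, 48, 0))

# Initialise the dummy to 0 so unmatched rows are not left as NA
df_demo$dummy_var <- 0

# Each parenthesised & block is one range; | combines the ranges
i1 <- with(df_demo,
           (x > -100 & x < -90) | (x > -80 & x < -50) | (y > 45 & y < 50))

df_demo$dummy_var[i1] <- 1
df_demo$dummy_var
```

The first three rows each satisfy one of the ranges and get 1; the last row satisfies none and stays 0.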
How to create dummy variable based on the value of two columns in R?
With tidyverse you could try the following.
Use group_by with Country to consider all the Time values within each Country.
To satisfy the DummyTime123 criteria, you need all of the values 1, 2, and 3 among the Time values within a Country. If that is TRUE, the unary + converts it to 1.
For DummyTime23, it sounds like you want both 2 and 3 in Time but do not want any value of Time to be 1. Using & you can make sure both criteria are satisfied.
Let me know if this provides the results expected.
library(tidyverse)
df %>%
  group_by(Country) %>%
  mutate(DummyTime123 = +all(1:3 %in% Time),
         DummyTime23 = +(all(2:3 %in% Time) & !any(Time == 1)))
Output
Country Time DummyTime123 DummyTime23
<chr> <dbl> <int> <int>
1 US 1 1 0
2 US 1 1 0
3 US 2 1 0
4 US 3 1 0
5 IT 1 0 0
6 IT 2 0 0
7 IT 1 0 0
8 FR 2 0 1
9 FR 3 0 1
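If you would rather not load tidyverse, the same group-wise logic can be sketched in base R with ave(), which applies a function within each group and recycles a single result across that group's rows. This is a swapped-in alternative, not part of the answer above; df_demo is a hypothetical name and its data is retyped from the output shown.

```r
# Data retyped from the output above
df_demo <- data.frame(
  Country = c("US", "US", "US", "US", "IT", "IT", "IT", "FR", "FR"),
  Time    = c(1, 1, 2, 3, 1, 2, 1, 2, 3)
)

# The logical result of FUN is stored into a numeric vector, so it becomes 0/1
df_demo$DummyTime123 <- ave(df_demo$Time, df_demo$Country,
                            FUN = function(t) all(1:3 %in% t))

df_demo$DummyTime23 <- ave(df_demo$Time, df_demo$Country,
                           FUN = function(t) all(2:3 %in% t) & !any(t == 1))
df_demo
```

FUN must be passed by name, because every unnamed argument after the first is treated as a grouping variable.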
Add a new column having a dummy variable for complete group based on a condition
You can also do it like this with pandas:
df['col_2'] = (df.groupby('id')['col_1']
                 .transform(lambda x: x.rolling(3).sum().eq(3).any())
                 .astype(int))
df
Output:
id date col_1 col_2
0 A 2015 1 1
1 A 2016 1 1
2 A 2017 1 1
3 A 2018 0 1
4 B 2015 1 0
5 B 2016 0 0
6 B 2017 1 0
7 B 2018 1 0
8 C 2015 0 1
9 C 2016 1 1
10 C 2017 1 1
11 C 2018 1 1
Creating a dummy variable based on whether words appear in multiple columns
base R
found <- sapply(dat[c("protesterdemand1", "protesterdemand2", "protesterdemand3", "protesterdemand4")],
                grepl, pattern = "political behavior|police brutality|removal of politician",
                ignore.case = TRUE) # ignore.case is just in case; adjust as needed
found
# protesterdemand1 protesterdemand2 protesterdemand3 protesterdemand4
# [1,] TRUE FALSE FALSE FALSE
# [2,] TRUE FALSE FALSE FALSE
# [3,] TRUE FALSE FALSE FALSE
# [4,] FALSE FALSE FALSE FALSE
# [5,] TRUE FALSE FALSE FALSE
# [6,] TRUE FALSE FALSE FALSE
dat$sensitive_issue <- rowSums(found) > 0
dat
# Country COWcode Year Region Protest protesterviolence protesterdemand1 protesterdemand2 protesterdemand3
# 1 Canada 20 1990 North America 1 0 political behavior, process labor wage dispute
# 2 Canada 20 1990 North America 1 0 political behavior, process
# 3 Canada 20 1990 North America 1 0 political behavior, process
# 4 Canada 20 1990 North America 1 1 land farm issue
# 5 Canada 20 1990 North America 1 1 political behavior, process
# 6 Canada 20 1990 North America 1 0 police brutality
# protesterdemand4 stateresponse1 stateresponse2 stateresponse3 stateresponse4 stateresponse5 stateresponse6 stateresponse7
# 1 ignore
# 2 ignore
# 3 ignore
# 4 accomodation
# 5 crowd dispersal arrests accomodation
# 6 crowd dispersal shootings
# participants participants_category sensitive_issue
# 1 1000s TRUE
# 2 1000 TRUE
# 3 500 TRUE
# 4 100s FALSE
# 5 950 TRUE
# 6 200 TRUE
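To see the mechanics in isolation, here is a minimal sketch on a made-up two-column data.frame (toy, demand1, and demand2 are hypothetical names; the strings are borrowed from the printed data). sapply passes each column to grepl as the text to search, and rowSums(found) > 0 is TRUE when any column matched.

```r
# Hypothetical toy data with two text columns
toy <- data.frame(
  demand1 = c("political behavior, process", "land farm issue", "police brutality"),
  demand2 = c("labor wage dispute", "", ""),
  stringsAsFactors = FALSE
)

pattern <- "political behavior|police brutality|removal of politician"

# One logical column per text column; rows line up with toy
found <- sapply(toy[c("demand1", "demand2")], grepl,
                pattern = pattern, ignore.case = TRUE)

toy$sensitive_issue <- rowSums(found) > 0
toy$sensitive_issue
```

Wrap the last expression in as.integer() if you need 0/1 rather than TRUE/FALSE.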
Create dummy variables for every unique value in a column based on a condition from a second column in R
Here is a crude way to do this
df <- data.frame(country = c("Australia", "Australia", "Australia", "Angola", "Angola", "Angola", "US", "US", "US"),
                 year = c("1945", "1946", "1947"),
                 leader = c("David", "NA", "NA", "NA", "Henry", "NA", "Tom", "NA", "Chris"),
                 natural.death = c(0, NA, NA, NA, 1, NA, 1, NA, 0),
                 gdp.growth.rate = c(1, 4, 3, 5, 6, 1, 5, 7, 9))
tmp <- which(df$natural.death == 1) # row indices of deaths
lng <- length(tmp)                  # number of deaths
# create a matrix of zeros with lng columns and append it to df
df <- cbind(df, data.frame(matrix(0, nrow = nrow(df), ncol = lng)))
# rename the newly added columns
colnames(df)[(ncol(df) - lng + 1):ncol(df)] <- paste0("id", 1:lng)
for (i in seq_len(lng)) { # loop over the new columns
  df[tmp[i], paste0("id", i)] <- 1 # set column id<i> to 1 in the row of the i-th death
}
country year leader natural.death gdp.growth.rate id1 id2
1 Australia 1945 David 0 1 0 0
2 Australia 1946 NA NA 4 0 0
3 Australia 1947 NA NA 3 0 0
4 Angola 1945 NA NA 5 0 0
5 Angola 1946 Henry 1 6 1 0
6 Angola 1947 NA NA 1 0 0
7 US 1945 Tom 1 5 0 1
8 US 1946 NA NA 7 0 0
9 US 1947 Chris 0 9 0 0
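As a possible loop-free variant of the same idea (a sketch, not the answer's method), the indicator columns can be filled in one step by indexing a zero matrix with a two-column matrix of (row, column) pairs. The data.frame toy and its values are hypothetical.

```r
# Hypothetical toy data with two natural deaths, in rows 3 and 5
toy <- data.frame(natural.death = c(0, NA, 1, NA, 1, 0))

tmp <- which(toy$natural.death == 1)  # which() drops the NA entries

# One zero column per death, named id1, id2, ...
ids <- matrix(0, nrow = nrow(toy), ncol = length(tmp),
              dimnames = list(NULL, paste0("id", seq_along(tmp))))

# Matrix indexing: row tmp[k], column k is set to 1 for every k
ids[cbind(tmp, seq_along(tmp))] <- 1

toy <- cbind(toy, ids)
toy
```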
Building dummy variable with many conditions (R)
Welcome to the world of code! R's syntax can be tricky (even for experienced coders), and dplyr adds its own quirks. First off, when you ask questions it is useful to provide code that other people can run to reproduce your data. You can learn more about that here.
Are you trying to create code that works for all possible values of DOB and ATTx? In other words, do you have a whole bunch of variables that start with ATT and you want to look at all of them? That format is called wide data, and R works much better with long data. Fortunately, the reshape2 package converts wide data to long. The code below creates a dummy variable with a value of 1 for people who were in school when they were either 19 or 20 years old.
# Load libraries
library(dplyr)
library(reshape2)
# Create a sample dataset
ATT94 <- runif(500, min = 0, max = 1) %>% round(digits = 0)
ATT96 <- runif(500, min = 0, max = 1) %>% round(digits = 0)
ATT98 <- runif(500, min = 0, max = 1) %>% round(digits = 0)
DOB <- rnorm(500, mean = 1977, sd = 5) %>% round(digits = 0)
df <- cbind(DOB, ATT94, ATT96, ATT98) %>% data.frame()
# Recode ATTx variables with the actual year
df$ATT94[df$ATT94==1] <- 1994
df$ATT96[df$ATT96==1] <- 1996
df$ATT98[df$ATT98==1] <- 1998
# Melt the data into a long format and perform requested analysis
df %>%
  melt(id = "DOB") %>%
  as_tibble() %>%
  # parentheses are required: %in% binds more tightly than -
  mutate(dummy = ifelse((value - DOB) %in% c(19, 20), 1, 0))
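One thing to watch in an age condition like this is operator precedence: in R, %in% binds more tightly than -, so `value - DOB %in% c(19, 20)` subtracts a logical vector from value instead of testing the age. A small sketch with made-up vectors shows the difference:

```r
# Hypothetical birth years and attendance years
DOB   <- c(1975, 1977, 1980)
value <- c(1994, 1996, 1998)

# Parsed as value - (DOB %in% c(19, 20)): no DOB is 19 or 20,
# so this subtracts 0 and simply returns value
wrong <- value - DOB %in% c(19, 20)

# Parenthesised: ages are 19, 19, 18, so only the first two match
right <- (value - DOB) %in% c(19, 20)

dummy <- as.integer(right)
dummy
```

Passing wrong to ifelse() would yield 1 for every row here, because any non-zero number is treated as TRUE.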