How to Create a Single Dummy Variable with Conditions in Multiple Columns

You can use rowSums (a vectorized solution) like this:

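set.seed(123)  # the values below may differ in R >= 3.6.0, which changed sample()'s default algorithm
dat <- matrix(sample(c(35, 1:100), size = 15 * 20, replace = TRUE), ncol = 15, byrow = TRUE)
cbind(dat, rowSums(dat[, 9:15] == 35) > 0)  # column 16: 1 if any of columns 9-15 equals 35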
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16]
 [1,]   29   79   41   89   94    4   53   90   55    46    96    45    68    57    10     0
 [2,]   90   24    4   33   96   89   69   64  100    66    71    54    60    29    14     0
 [3,]   97   91   69   80    2   48   76   21   32    23    14    41    41    37    15     0
 [4,]   14   23   47   26   86    4   44   80   12    56    20    12    76    90    37     0
 [5,]   67    9   38   27   82   45   81   82   80    44    76    63    71    35    48     1
 [6,]   22   38   61   35   11   24   67   42   79    10    43    99    90    89    17     0
 [7,]   13   65   34   66   32   18   79    9   47    51    60    33    49    96    48     0
 [8,]   89   92   61   41   14   94   30    6   95    72    14    55    96    59    40     0
 [9,]   65   32   31   22   37   99   15    9   14    69    62    90    67    74    52     0
[10,]   66   83   79   98   44   31   41    1   18    85    23    24     7    24    73     0
[11,]   85   50   39   24   11   39   57   21   44    22    50    35    65    37    35     1
[12,]   53   74   22   41   26   63   18   87   75    67    62    37    53    88    58     0
[13,]   84   31   71   26   60   48   26   57   92    91    27    32    99    62    94     0
[14,]   47   41   66   15   57   24   97   60   52    40    88    36    29    17    17     0
[15,]   48   25   21   68    4   70   35   41   82    92    28    97    73    69     5     0
[16,]   39   48   56   70   92   62   43   54    5    26    40    19    84    15    81     0
[17,]   55   66   17   63   31   73   40   97   97    73    25    22    59    27    53     0
[18,]   79   16   40   47   87   93   89   68   95    52    58    33    35     2    50     1
[19,]   87   35    7   16   77   74   98   47    7    65    76    13    40    22     5     0
[20,]   39    6   22    5   67   30   10    7   88    76    82    99    10    10    80     0

EDIT

I replaced cbind with transform. Since the new column would otherwise be logical, I coerce it to 0/1.

transform(dat, x = as.numeric(rowSums(dat[, 9:15] == 35) > 0))

The result is a data.frame (transform coerces the matrix to one).

EDIT2 (as suggested by @flodel)

data$indicator <- as.integer(rowSums(data[paste0("col", 9:15)] == 35) > 0)

where data is the OP's data.frame and the columns to check are named col9 to col15.
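A minimal sketch of the EDIT2 line on a small hypothetical data frame (all values are made up; only the col1 to col15 naming comes from the answer). It also shows one thing worth checking: if a checked column contains NA, == 35 yields NA there, and rowSums() then returns NA for that row unless na.rm = TRUE is set.

```r
# Hypothetical data frame with columns col1..col15, all 0 to start
data <- as.data.frame(matrix(0, nrow = 4, ncol = 15))
names(data) <- paste0("col", 1:15)

data$col12[1] <- 35   # row 1: a 35 in a checked column
data$col3[2]  <- 35   # row 2: a 35 only outside col9..col15
data$col15[3] <- NA   # row 3: an NA in a checked column, no 35
data$col9[4]  <- NA   # row 4: an NA and a 35 in checked columns
data$col10[4] <- 35

# Logical matrix: TRUE where the value is 35, NA where the data is NA
hits <- data[paste0("col", 9:15)] == 35

# Without na.rm, any NA in a row makes its sum NA (rows 3 and 4)
data$indicator_na <- as.integer(rowSums(hits) > 0)
# With na.rm = TRUE, NA cells are ignored
data$indicator <- as.integer(rowSums(hits, na.rm = TRUE) > 0)

data[c("indicator_na", "indicator")]
```

Whether an NA should count as "no 35" (na.rm = TRUE) or propagate as unknown is a decision about your data; the sketch just shows both outcomes.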

Dummy variable with multiple conditions

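We can combine & (to bound a range) with | (to accept any of several ranges) to build the logical expression:

i1 <- with(df, (x > -100 & x < -90) | (x > -80 & x < -50) | (y > 50 & y < 45))  # note: (y > 50 & y < 45) can never be TRUE; check the intended bounds
df$dummy_var <- 0       # default: no condition met
df$dummy_var[i1] <- 1   # rows meeting any of the conditions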
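A small runnable sketch of the same idea on hypothetical data. The x ranges come from the answer; the y range (45 < y < 50) is a made-up placeholder, because y > 50 & y < 45 as written can never be TRUE.

```r
# Hypothetical data with numeric columns x and y
df <- data.frame(x = c(-95, -85, -60, -20, 10),
                 y = c(  0,   0,   0,  47, 60))

# x ranges from the answer; the y range (45, 50) is a placeholder assumption
i1 <- with(df, (x > -100 & x < -90) | (x > -80 & x < -50) | (y > 45 & y < 50))

df$dummy_var <- 0        # start every row at 0
df$dummy_var[i1] <- 1    # flip the rows that meet any condition
df
```

as.integer(i1) would build the same 0/1 values in one step.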

How to create dummy variable based on the value of two columns in R?

With tidyverse you could try the following.

Use group_by with Country to consider all the Time values within each Country.

To satisfy the DummyTime123 criterion, all three values 1, 2, and 3 must appear in Time within a Country. The unary + then turns TRUE/FALSE into 1/0.

For DummyTime23, it sounds like you want both 2 and 3 in Time but do not want any values of Time to be 1. Using & you can make sure both criteria are satisfied.

Let me know if this provides the results expected.

library(tidyverse)

df %>%
  group_by(Country) %>%
  mutate(DummyTime123 = +all(1:3 %in% Time),
         DummyTime23 = +(all(2:3 %in% Time) & !any(Time == 1)))

Output

  Country  Time DummyTime123 DummyTime23
  <chr>   <dbl>        <int>       <int>
1 US          1            1           0
2 US          1            1           0
3 US          2            1           0
4 US          3            1           0
5 IT          1            0           0
6 IT          2            0           0
7 IT          1            0           0
8 FR          2            0           1
9 FR          3            0           1
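If you would rather avoid the tidyverse dependency, a base R sketch of the same two dummies can use ave(), which applies a function within each Country group and recycles the result to every row of that group. The data frame is reconstructed from the Output table above.

```r
# Data reconstructed from the Output table above
df <- data.frame(Country = c("US", "US", "US", "US", "IT", "IT", "IT", "FR", "FR"),
                 Time    = c(1, 1, 2, 3, 1, 2, 1, 2, 3))

# ave() runs FUN on the Time values of each Country and recycles the
# single 0/1 result to every row of that Country
df$DummyTime123 <- ave(df$Time, df$Country,
                       FUN = function(t) as.integer(all(1:3 %in% t)))
df$DummyTime23  <- ave(df$Time, df$Country,
                       FUN = function(t) as.integer(all(2:3 %in% t) & !any(t == 1)))
df
```

ave() returns the same type as its first argument, so these dummies are stored as doubles rather than integers.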

Add a new column having a dummy variable for complete group based on a condition

You can also do it like this (assuming rows are sorted by date within each id, since rolling() works in row order):

df['col_2'] = (df.groupby('id')['col_1']
                 .transform(lambda x: x.rolling(3).sum().eq(3).any())
                 .astype(int))
df

Output:

   id  date  col_1  col_2
0   A  2015      1      1
1   A  2016      1      1
2   A  2017      1      1
3   A  2018      0      1
4   B  2015      1      0
5   B  2016      0      0
6   B  2017      1      0
7   B  2018      1      0
8   C  2015      0      1
9   C  2016      1      1
10  C  2017      1      1
11  C  2018      1      1
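For R users, a rough base R analogue of the pandas answer, as a sketch: instead of a rolling sum it uses rle() to look for a run of at least three consecutive 1s, which is equivalent for 0/1 data without NAs. has_run3 is a hypothetical helper name, and like the pandas version the code assumes rows are already sorted by date within each id. The data is reconstructed from the Output table above.

```r
# Data reconstructed from the Output table above (rows sorted by date within id)
df <- data.frame(id    = rep(c("A", "B", "C"), each = 4),
                 date  = rep(2015:2018, times = 3),
                 col_1 = c(1, 1, 1, 0,  1, 0, 1, 1,  0, 1, 1, 1))

# Hypothetical helper: TRUE if v contains a run of three or more consecutive 1s,
# which for 0/1 data matches "some rolling 3-row sum equals 3"
has_run3 <- function(v) {
  r <- rle(v)
  any(r$values == 1 & r$lengths >= 3)
}

# ave() recycles the per-group result to every row of that id
df$col_2 <- ave(df$col_1, df$id, FUN = function(v) as.integer(has_run3(v)))
df
```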

Creating a dummy variable based on whether words appear in multiple columns

base R

found <- sapply(dat[c("protesterdemand1", "protesterdemand2", "protesterdemand3", "protesterdemand4")],
                grepl, pattern = "political behavior|police brutality|removal of politician", ignore.case = TRUE) # ignore.case = TRUE is a precaution; drop it if case matters
found
#      protesterdemand1 protesterdemand2 protesterdemand3 protesterdemand4
# [1,]             TRUE            FALSE            FALSE            FALSE
# [2,]             TRUE            FALSE            FALSE            FALSE
# [3,]             TRUE            FALSE            FALSE            FALSE
# [4,]            FALSE            FALSE            FALSE            FALSE
# [5,]             TRUE            FALSE            FALSE            FALSE
# [6,]             TRUE            FALSE            FALSE            FALSE

dat$sensitive_issue <- rowSums(found) > 0

dat
#   Country COWcode Year        Region Protest protesterviolence            protesterdemand1   protesterdemand2 protesterdemand3
# 1  Canada      20 1990 North America       1                 0 political behavior, process labor wage dispute                 
# 2  Canada      20 1990 North America       1                 0 political behavior, process                                    
# 3  Canada      20 1990 North America       1                 0 political behavior, process                                    
# 4  Canada      20 1990 North America       1                 1             land farm issue                                    
# 5  Canada      20 1990 North America       1                 1 political behavior, process                                    
# 6  Canada      20 1990 North America       1                 0            police brutality                                    
#   protesterdemand4  stateresponse1 stateresponse2 stateresponse3 stateresponse4 stateresponse5 stateresponse6 stateresponse7
# 1                           ignore                                                                                          
# 2                           ignore                                                                                          
# 3                           ignore                                                                                          
# 4                     accomodation                                                                                          
# 5                  crowd dispersal        arrests   accomodation                                                            
# 6                  crowd dispersal      shootings                                                                           
#   participants participants_category sensitive_issue
# 1        1000s                                  TRUE
# 2         1000                                  TRUE
# 3          500                                  TRUE
# 4         100s                                 FALSE
# 5          950                                  TRUE
# 6          200                                  TRUE
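A self-contained sketch of the same approach on hypothetical toy data (the values are invented; only the protesterdemand column prefix comes from the question). Selecting the columns with grep() picks up every protesterdemand column without typing the names by hand, and as.integer() turns the logical result into a 0/1 dummy.

```r
# Hypothetical toy data with three demand columns (empty string = no demand)
dat <- data.frame(
  protesterdemand1 = c("political behavior, process", "land farm issue", "Police Brutality", ""),
  protesterdemand2 = c("labor wage dispute", "", "", ""),
  protesterdemand3 = c("", "removal of politician", "", ""),
  stringsAsFactors = FALSE
)

pattern <- "political behavior|police brutality|removal of politician"
cols <- grep("^protesterdemand", names(dat), value = TRUE)  # every demand column, each once

# One logical column per demand column; a row is flagged if any column matches
found <- sapply(dat[cols], grepl, pattern = pattern, ignore.case = TRUE)
dat$sensitive_issue <- as.integer(rowSums(found) > 0)
dat$sensitive_issue
```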

Create dummy variables for every unique value in a column based on a condition from a second column in R

Here is a crude way to do this

df <- data.frame(country = c("Australia", "Australia", "Australia", "Angola", "Angola", "Angola", "US", "US", "US"),
                 year = c("1945", "1946", "1947"),
                 leader = c("David", "NA", "NA", "NA", "Henry", "NA", "Tom", "NA", "Chris"),
                 natural.death = c(0, NA, NA, NA, 1, NA, 1, NA, 0),
                 gdp.growth.rate = c(1, 4, 3, 5, 6, 1, 5, 7, 9))

tmp <- which(df$natural.death == 1)  # row indices of natural deaths (NA rows are skipped)
lng <- length(tmp)                    # number of deaths

# create a zero matrix with lng columns and append it to df
df <- cbind(df, data.frame(matrix(0, nrow = nrow(df), ncol = lng)))
# rename the newly added columns id1, id2, ... (seq_len() keeps this safe when lng is 0)
colnames(df)[ncol(df) - lng + seq_len(lng)] <- paste0("id", seq_len(lng))

for (i in seq_len(lng)) {           # loop over the new columns
  df[tmp[i], paste0("id", i)] <- 1  # row of the i-th death, column id<i>: set to 1
}

    country year leader natural.death gdp.growth.rate id1 id2
1 Australia 1945  David             0               1   0   0
2 Australia 1946     NA            NA               4   0   0
3 Australia 1947     NA            NA               3   0   0
4    Angola 1945     NA            NA               5   0   0
5    Angola 1946  Henry             1               6   1   0
6    Angola 1947     NA            NA               1   0   0
7        US 1945    Tom             1               5   0   1
8        US 1946     NA            NA               7   0   0
9        US 1947  Chris             0               9   0   0
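The loop can also be replaced by a vectorized sketch: outer() compares every row number with every death row, giving a 0/1 matrix with one column per death. This is a swapped-in technique rather than the answer's method, and the data frame below drops the leader and gdp.growth.rate columns for brevity.

```r
df <- data.frame(country = rep(c("Australia", "Angola", "US"), each = 3),
                 year = rep(c("1945", "1946", "1947"), times = 3),
                 natural.death = c(0, NA, NA, NA, 1, NA, 1, NA, 0))

ids <- which(df$natural.death == 1)   # rows with a natural death; NA rows are dropped

# outer() compares every row index with every death row:
# an nrow(df) x length(ids) matrix, converted from logical to 0/1 integers
m <- outer(seq_len(nrow(df)), ids, "==") * 1L
colnames(m) <- paste0("id", seq_along(ids))

df <- cbind(df, m)
df
```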

Building dummy variable with many conditions (R)

Welcome to the world of code! R's syntax can be tricky (even for experienced coders), and dplyr adds its own quirks. First off, when you ask a question it helps to include code that other people can run to reproduce your data (a minimal reproducible example).

Are you trying to create code that works for all possible values of DOB and ATTx? In other words, do you have a whole bunch of variables that start with ATT and you want to look at all of them? That format is called wide data, and many R tools (dplyr included) are easier to use with long data. Fortunately, the reshape2 package can convert wide data to long with melt(). The code below creates a dummy variable equal to 1 for each person-wave row in which the person was in school at age 19 or 20.

# Load libraries 
library(dplyr)
library(reshape2)

# Create a sample dataset
ATT94 <- runif(500, min = 0, max = 1) %>% round(digits = 0)
ATT96 <- runif(500, min = 0, max = 1) %>% round(digits = 0)
ATT98 <- runif(500, min = 0, max = 1) %>% round(digits = 0)
DOB <- rnorm(500, mean = 1977, sd = 5) %>% round(digits = 0)
df <- cbind(DOB, ATT94, ATT96, ATT98) %>% data.frame()

# Recode ATTx variables with the actual year
df$ATT94[df$ATT94==1] <- 1994
df$ATT96[df$ATT96==1] <- 1996
df$ATT98[df$ATT98==1] <- 1998

# Melt the data into a long format and perform requested analysis
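df %>%
  melt(id.vars = "DOB") %>%   # long format: DOB, variable (ATTx), value (0 or the year)
  as_tibble() %>%             # tbl_df() is deprecated; as_tibble() replaces it
  mutate(dummy = ifelse((value - DOB) %in% c(19, 20), 1, 0))  # parentheses needed: %in% binds tighter than -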
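Because the sample data has no person identifier, the melted result gives one dummy per person-wave row and cannot easily be collapsed back to one value per person. A minimal base R sketch that stays in wide format and returns one dummy per person, assuming the ATTx columns are 0/1 attendance flags for 1994, 1996 and 1998 (as in the sample data before recoding). The data is simulated with rbinom() instead of rounded runif(); both give 0/1 flags.

```r
# Sketch: one dummy per person, staying in wide format (base R only)
set.seed(42)
n <- 500
# Simulated sample (assumption): 0/1 attendance flags drawn with rbinom()
# instead of the answer's rounded runif(); the flags are not recoded to years
df <- data.frame(DOB   = round(rnorm(n, mean = 1977, sd = 5)),
                 ATT94 = rbinom(n, 1, 0.5),
                 ATT96 = rbinom(n, 1, 0.5),
                 ATT98 = rbinom(n, 1, 0.5))

att_cols     <- c("ATT94", "ATT96", "ATT98")
survey_years <- c(1994, 1996, 1998)   # year each ATTx column refers to

# age[i, j]: age of person i in survey year j (an n x 3 matrix)
age <- outer(df$DOB, survey_years, function(dob, yr) yr - dob)

# TRUE where person i attended in year j and was 19 or 20 at the time
hit <- (as.matrix(df[att_cols]) == 1) & (age == 19 | age == 20)

# 1 if that happened in any survey year, else 0
df$dummy <- as.integer(rowSums(hit) > 0)
table(df$dummy)
```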
