Dummify Character Column and Find Unique Values

Create dummy variables for every unique value in a column based on a condition from a second column in R

Here is a crude way to do this

df <- data.frame(
  country = c("Australia", "Australia", "Australia", "Angola", "Angola", "Angola", "US", "US", "US"),
  year = c("1945", "1946", "1947"),  # recycled to length 9
  leader = c("David", "NA", "NA", "NA", "Henry", "NA", "Tom", "NA", "Chris"),
  natural.death = c(0, NA, NA, NA, 1, NA, 1, NA, 0),
  gdp.growth.rate = c(1, 4, 3, 5, 6, 1, 5, 7, 9)
)

tmp=which(df$natural.death==1) #index of deaths
lng=length(tmp) #number of deaths

#create matrix with zeros and lng columns, append to df
df=cbind(df,data.frame(matrix(0,nrow=nrow(df),ncol=lng)))
#change the newly added column names
colnames(df)[(ncol(df)-lng+1):ncol(df)]=paste0("id",1:lng)

for (i in seq_along(tmp)) { #loop over new columns (seq_along avoids looping over 1:0 when there are no deaths)
   df[tmp[i], paste0("id", i)] = 1 #row of the i-th death, column id<i>: set to 1
}

    country year leader natural.death gdp.growth.rate id1 id2
1 Australia 1945  David             0               1   0   0
2 Australia 1946     NA            NA               4   0   0
3 Australia 1947     NA            NA               3   0   0
4    Angola 1945     NA            NA               5   0   0
5    Angola 1946  Henry             1               6   1   0
6    Angola 1947     NA            NA               1   0   0
7        US 1945    Tom             1               5   0   1
8        US 1946     NA            NA               7   0   0
9        US 1947  Chris             0               9   0   0
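The same id columns can also be built without an explicit loop. This is a minimal sketch, not the answer's original method: it swaps the loop for base R's outer(), which compares every row index against every death row and yields one 0/1 column per death. Only the columns the sketch needs are recreated here.

```r
# Rebuild the relevant part of the example data
df <- data.frame(
  country = rep(c("Australia", "Angola", "US"), each = 3),
  year = rep(c("1945", "1946", "1947"), times = 3),
  natural.death = c(0, NA, NA, NA, 1, NA, 1, NA, 0)
)

deaths <- which(df$natural.death == 1)  # which() skips NA, so this is rows 5 and 7

# outer() compares each row index with each death row: one logical column per death,
# multiplied by 1L to turn TRUE/FALSE into 1/0
ids <- outer(seq_len(nrow(df)), deaths, "==") * 1L
colnames(ids) <- paste0("id", seq_along(deaths))

df <- cbind(df, ids)
df
```

Each idN column has exactly one 1, in the row of the N-th death, which matches the loop's result.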

Create Dummies for Multiple Columns on Unique Value in a Column

I believe you can get this by combining pd.get_dummies() and df.groupby().any(). groupby().any() returns True/False, so finish by converting to int:

import pandas as pd

# df is the frame from your first example; listing the columns restricts the dummies to just those columns
df2 = pd.get_dummies(df, columns=['CTI', 'RESOLUTION'])
df2.groupby('ACCOUNT').any().astype(int)
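For readers working in R like the rest of this page, here is a base-R sketch of the same idea (one indicator per value, collapsed to "occurs at least once per ACCOUNT"). It uses table() instead of anything from pandas; the four-row data set, its values, and the helper any_dummies() are hypothetical.

```r
# Hypothetical data using the column names from the pandas answer
df <- data.frame(
  ACCOUNT    = c("A1", "A1", "A2", "A3"),
  CTI        = c("web", "phone", "web", "phone"),
  RESOLUTION = c("fixed", "fixed", "open", "open")
)

# Hypothetical helper: one 0/1 column per value of `col`, one row per `id`.
# A cell is 1 if the value occurs at least once for that id (like groupby().any()).
any_dummies <- function(data, id, col) {
  counts <- unclass(table(data[[id]], data[[col]]))  # plain integer matrix: ids x values
  flags <- (counts > 0) * 1L                         # "seen at least once" as 1/0
  colnames(flags) <- paste0(col, "_", colnames(flags))
  flags
}

result <- cbind(any_dummies(df, "ACCOUNT", "CTI"),
                any_dummies(df, "ACCOUNT", "RESOLUTION"))
result
```

Counting first and then testing counts > 0 plays the role of .any(); multiplying by 1L is the equivalent of .astype(int).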

Separate each unique value of a column into separate columns and remove original column?

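This does everything you're after:

library(fastDummies)

# Numerically encode the gear column as dummy variables (gear_3, gear_4, gear_5)
mt_cars_with_gear_dummy_variables <- fastDummies::dummy_cols(mtcars, select_columns = "gear")

# Remove the original gear column (assign the result back, otherwise the column is not dropped)
mt_cars_with_gear_dummy_variables <- mt_cars_with_gear_dummy_variables[, !names(mt_cars_with_gear_dummy_variables) %in% c("gear")]
# In recent fastDummies versions, dummy_cols(mtcars, select_columns = "gear", remove_selected_columns = TRUE) does both steps at once

mt_cars_with_gear_dummy_variables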
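If you would rather not add a dependency, the same result can be sketched in base R. This swaps fastDummies for a plain comparison per gear value; the name mt_cars_base is just for illustration.

```r
# One 0/1 column per gear value, then drop the original gear column
levs <- sort(unique(mtcars$gear))                                   # 3, 4, 5
dummies <- sapply(levs, function(lv) as.integer(mtcars$gear == lv)) # 32 x 3 integer matrix
colnames(dummies) <- paste0("gear_", levs)                          # gear_<value> naming
mt_cars_base <- cbind(mtcars[, names(mtcars) != "gear"], dummies)
head(mt_cars_base)
```

Every row has exactly one 1 across gear_3, gear_4 and gear_5, because each car has exactly one gear value.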

How to search for and extract unique values from one column in another column?

I think this works for you:

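library(dplyr)

df <- data.frame(
  Col_A = c("blue shovel 1024", "red shovel 1022", "green bucket 3021", "green rake 3021", "yellow shovel 1023"),
  Col_B = c("blue", "red", "green", "blue", "yellow")
)

mutate(df, Col_C = stringr::str_extract(
  Col_A,
  paste0("\\b(", paste0(unique(Col_B), collapse = "|"), ")\\b")))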
#                Col_A  Col_B  Col_C
# 1   blue shovel 1024   blue   blue
# 2    red shovel 1022    red    red
# 3  green bucket 3021  green  green
# 4    green rake 3021   blue  green
# 5 yellow shovel 1023 yellow yellow

Breakdown:

paste0(unique(Col_B), collapse="|") takes the words in Col_B, de-duplicates them, and concatenates them with | symbols; that is, c("blue","red","green") --> "blue|red|green". In regex, the | symbol is an "OR" operator.
\\b( and )\\b are word boundaries: they require that there is no word character immediately before (the first) or after (the second) the matched word. Adding them around the words prevents a partial match, such as blu matching inside blue (in case that ever happens). It doesn't change anything here, but it is a more defensive, specific pattern. The parentheses add grouping, which is more evident in the next point.
With all of that, the overall pattern looks something like "\\b(blue|red|green)\\b" (abbreviated). This translates to "find blue or red or green, with a word boundary on both ends of whichever one you find".
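To see what the word boundaries buy you, here is a small sketch comparing the pattern with and without \\b. It uses base R's regexpr()/regmatches() with perl = TRUE instead of stringr; the input strings and the helper extract() are hypothetical.

```r
words <- c("blue", "red", "green")
x <- c("blue shovel 1024", "shredder 2001", "greenhouse 3021")

loose  <- paste0("(", paste0(words, collapse = "|"), ")")        # no word boundaries
strict <- paste0("\\b(", paste0(words, collapse = "|"), ")\\b")  # same idea as the answer

# Hypothetical helper: first match per string, NA where nothing matches
extract <- function(pattern, x) {
  m <- regexpr(pattern, x, perl = TRUE)
  out <- rep(NA_character_, length(x))
  out[m > 0] <- regmatches(x, m)
  out
}

extract(loose, x)   # "blue" "red" "green": red and green are found inside longer words
extract(strict, x)  # "blue" NA NA: only whole words match
```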

Generate all possible dummies according to values of var in R

Here is a solution which uses strsplit() to split up the character strings and dcast() to reshape from long to wide format:

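library(data.table)
df <- data.frame(V1 = c("a,b,c,d,e,f", "a,b,c", "e,f", "b,d", "a,e"))  # sample input matching the output below
setDT(df)[, rn := .I][
  , strsplit(as.character(V1), ","), by = rn][
    , dcast(.SD, rn ~ V1, length)]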

   rn a b c d e f
1:  1 1 1 1 1 1 1
2:  2 1 1 1 0 0 0
3:  3 0 0 0 0 1 1
4:  4 0 1 0 1 0 0
5:  5 1 0 0 0 1 0

If V1 is to be included, it can be joined afterwards:

library(data.table) # version 1.11.4 used
setDT(df)[, rn := .I][
  , strsplit(as.character(V1), ","), by = rn][
    , dcast(.SD, rn ~ V1, length)][
      df, on = "rn"][
        , setcolorder(.SD, "V1")]

            V1 rn a b c d e f
1: a,b,c,d,e,f  1 1 1 1 1 1 1
2:       a,b,c  2 1 1 1 0 0 0
3:         e,f  3 0 0 0 0 1 1
4:         b,d  4 0 1 0 1 0 0
5:         a,e  5 1 0 0 0 1 0

setcolorder() is used to move the V1 column to the front.
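The same long-to-wide idea can be sketched in base R without data.table: split the strings, build a long table of (row number, value) pairs, and cross-tabulate. This swaps dcast() for table(); wrapping the row numbers in a factor keeps rows that contain no values (such as an empty string) instead of silently dropping them.

```r
V1 <- c("a,b,c,d,e,f", "a,b,c", "e,f", "b,d", "a,e")
parts <- strsplit(V1, ",")  # list of character vectors, one per row

# Long format: one (row number, value) pair per piece
long <- data.frame(rn = rep(seq_along(parts), lengths(parts)),
                   value = unlist(parts))

# Long -> wide: count each value per row; factor levels keep rows with no pieces
wide <- unclass(table(factor(long$rn, levels = seq_along(parts)), long$value))
wide
```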

creating a dummy matrix from a concatenated column

You can do:

relative <- c("aunt", "mother,grandmother", "sister,mother", "", "other")
R <- strsplit(relative, ',')
r <- unique(unlist(R))
result <- t(sapply(R, function(Ri) if (length(Ri)==0) rep(FALSE, length(r)) else r %in% Ri))
colnames(result) <- r
result
# > result
#       aunt mother grandmother sister other
# [1,]  TRUE  FALSE       FALSE  FALSE FALSE
# [2,] FALSE   TRUE        TRUE  FALSE FALSE
# [3,] FALSE   TRUE       FALSE   TRUE FALSE
# [4,] FALSE  FALSE       FALSE  FALSE FALSE
# [5,] FALSE  FALSE       FALSE  FALSE  TRUE

or, to get integers instead of logicals (unary + coerces the logical matrix to integer):

+result
# > +result
#      aunt mother grandmother sister other
# [1,]    1      0           0      0     0
# [2,]    0      1           1      0     0
# [3,]    0      1           0      1     0
# [4,]    0      0           0      0     0
# [5,]    0      0           0      0     1
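A variant worth considering is vapply(), which makes the per-row result type and length explicit and so always returns an integer matrix directly, without the special case for empty strings (r %in% character(0) is already all FALSE). This is a sketch of that alternative, not a correction of the answer above.

```r
relative <- c("aunt", "mother,grandmother", "sister,mother", "", "other")
R <- strsplit(relative, ",")
r <- unique(unlist(R))

# Each row must be an integer vector of length(r); vapply() errors otherwise
result <- t(vapply(R, function(Ri) as.integer(r %in% Ri), integer(length(r))))
colnames(result) <- r
result
```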

Storing unique values of each column (of a df) in list

Your for loop is almost right; it just needs one fix to work:

# for loop
cols = names(df)
unique_values_by_col = list()
for (i in cols) {
  x = unique(df[[i]])
  unique_values_by_col[[i]] = x
}
unique_values_by_col
# $a
# [1] A B C D
# Levels: A B C D
# 
# $b
# [1] 1 2 3 4

i is just a character string (the name of a column in df), so unique(i) doesn't make sense; you need the column itself, df[[i]].

That said, the more standard way to do this is lapply(), as shown by demirev.
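A minimal lapply() version might look like the following (not necessarily identical to demirev's answer; the data frame is hypothetical, shaped like the output above). lapply() iterates over a data frame's columns and keeps their names, so the loop collapses to one line.

```r
df <- data.frame(a = factor(c("A", "B", "C", "D", "A")),
                 b = c(1, 2, 3, 4, 2))

unique_values_by_col <- lapply(df, unique)
unique_values_by_col
```

lapply() is preferable to sapply() here: sapply() tries to simplify to a matrix whenever every column has the same number of unique values (as both columns do in this example), which would also drop the factor.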

R: Unbalanced panel, create dummy for unique observations

Using dplyr, you could avoid the loop and try this:

set.seed(123)
df <- data.frame(id = sample(1:10, 20, replace = TRUE),
                 happy = sample(c("yes", "no"), 20, replace = TRUE))

library(dplyr)
df <- df %>%
  group_by(id) %>%
  mutate(dummy = ifelse(n() >= 2, 1, 0))  # n() is the number of rows in the current id group

> df
# A tibble: 20 x 3
# Groups:   id [10]
      id happy dummy
   <int> <fct> <dbl>
 1     3 no        1
 2     8 no        0
 3     5 no        1
 4     9 no        1
 5    10 no        1
 6     1 no        1
 7     6 no        1
 8     9 no        1
 9     6 yes       1  
10     5 yes       1
11    10 no        1
12     5 no        1
13     7 no        0
14     6 no        1
15     2 yes       0
16     9 yes       1
17     3 no        1
18     1 yes       1
19     4 yes       0
20    10 yes       1

Essentially, this approach splits df by the unique values of id and then creates a column dummy that is 1 if that id occurs two or more times and 0 otherwise.
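Without dplyr, base R's ave() can compute the group size for every row directly. This is a sketch with a small hypothetical data set, swapping group_by()/mutate() for ave(..., FUN = length).

```r
df <- data.frame(id = c(3, 8, 5, 3, 5, 1),
                 happy = c("no", "no", "yes", "yes", "no", "no"))

# ave() returns, for each row, the size of that row's id group
group_size <- ave(df$id, df$id, FUN = length)
df$dummy <- as.integer(group_size >= 2)  # 1 if the id occurs two or more times
df
```

Ids 3 and 5 appear twice and get 1; ids 8 and 1 appear once and get 0.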