Create Binary Column (0/1) Based on Condition in Another Column

Create binary column (0/1) based on condition in another column

You can also use ifelse, which is a vectorized if-else function:

mydata$NewTemp <- ifelse(mydata$Temp>0, 1, 0)
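For readers working in pandas rather than R, a rough equivalent might look like the sketch below. The DataFrame and its Temp values are made up for illustration. One difference to keep in mind: R's ifelse returns NA when the condition is NA, whereas a pandas comparison against NaN evaluates to False, so a missing value falls into the 0 branch here.

```python
import numpy as np
import pandas as pd

# Hypothetical data; the Temp values are assumptions for illustration
mydata = pd.DataFrame({"Temp": [5.0, -2.0, np.nan]})

# 1 where Temp > 0, otherwise 0 (NaN > 0 is False, so NaN maps to 0)
mydata["NewTemp"] = np.where(mydata["Temp"] > 0, 1, 0)

print(mydata)
```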

Add a binary column based on a condition

Use numpy.where:

In [1923]: import numpy as np

In [1924]: df['hybrid'] = np.where(df.FUEL_TYPE.str.contains('+', regex=False), 'Y', 'N')

In [1925]: df
Out[1925]:
   FUEL_CODE   FUEL_TYPE hybrid
0          1  MARGE+PLUS      Y
1         10      DIESEL      N
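The session above can be reproduced end to end with the sketch below, which rebuilds the two-row frame shown in the output. The extra hybrid_flag column is an addition for illustration, in case a 0/1 column is wanted instead of 'Y'/'N'. Passing regex=False matters because + is a quantifier rather than a literal character when the pattern is treated as a regular expression.

```python
import numpy as np
import pandas as pd

# Rebuild the two rows shown in the output above
df = pd.DataFrame({"FUEL_CODE": [1, 10],
                   "FUEL_TYPE": ["MARGE+PLUS", "DIESEL"]})

# Boolean mask: True where FUEL_TYPE contains a literal '+'
has_plus = df["FUEL_TYPE"].str.contains("+", regex=False)

df["hybrid"] = np.where(has_plus, "Y", "N")
# hybrid_flag is a hypothetical extra column: the same mask as 0/1
df["hybrid_flag"] = has_plus.astype(int)

print(df)
```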

How to conditionally create a new column in a DataFrame based on other column values in Julia

In Julia we have the beautiful Dot Syntax which can be gracefully applied here:

julia> df[!, :x] = 2 .<= df[!, :time] .<= 4     
6-element BitVector:
0
0
1
1
1
0

or alternatively

df.x = 2 .<= df.time .<= 4
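For comparison, the same range check can be sketched in pandas with Series.between, which is inclusive on both ends by default. The time values below are assumptions, since the original data is not shown.

```python
import pandas as pd

# Hypothetical time values; the original data is not shown in the answer
df = pd.DataFrame({"time": [0, 1, 2, 3, 4, 5]})

# True where 2 <= time <= 4, then cast to 0/1
df["x"] = df["time"].between(2, 4).astype(int)

print(df)
```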

How to create a new variable based on condition from different dataframe in R

If your data is relatively small, you could try the following with dplyr, assuming the data.frames are named df and df2. It uses mutate to create the new column, and ifelse to compare each time in the first data.frame with t_start and t_end in the second.

library(dplyr) 

df %>%
  rowwise() %>%
  mutate(trial = ifelse(any(time > df2$t_start & time < df2$t_end), "wt", "iti"))

Output

  initiate  left right l_or_r  time trial
     <int> <int> <int>  <int> <dbl> <chr>
1        0     0     1      1  2.82 iti
2        0     0     1      1  2.82 iti
3        0     0     1      1  2.82 iti
4        0     0     1      1  2.83 iti
5        1     0     0      0 16.8  wt
6        1     0     0      0 16.8  wt
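The same "is this time inside any interval of the second table" check can be sketched in pandas with NumPy broadcasting instead of a row-wise loop. The column names follow the answer above; the values in df and df2 are assumptions for illustration. Note that the intermediate mask has one cell per (row, interval) pair, so this is also best suited to moderately sized data.

```python
import numpy as np
import pandas as pd

# Hypothetical data; only the column names come from the answer above
df = pd.DataFrame({"time": [2.82, 2.83, 16.8]})
df2 = pd.DataFrame({"t_start": [10.0, 30.0], "t_end": [20.0, 40.0]})

# Reshape times to a column so each one is compared against every interval
t = df["time"].to_numpy()[:, None]
inside = (t > df2["t_start"].to_numpy()) & (t < df2["t_end"].to_numpy())

# "wt" if the time falls strictly inside at least one interval, else "iti"
df["trial"] = np.where(inside.any(axis=1), "wt", "iti")

print(df)
```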

Assign values to a column depending on condition in another column in R

We can define a grouping variable that is the cumulative count of 1s in the seq column, and then assign state by group:

library(dplyr)
df %>%
  group_by(grp = cumsum(seq == 1)) %>%
  mutate(state = as.integer(any(num > 7.5))) %>%
  ungroup()
# # A tibble: 14 × 4
#      seq   num   grp state
#    <int> <dbl> <int> <int>
#  1     1   0.1     1     0
#  2     2   0.1     1     0
#  3     3   0.2     1     0
#  4     1   0       2     0
#  5     2   0       2     0
#  6     3   0       2     0
#  7     1   0.5     3     1
#  8     2   2       3     1
#  9     3   6       3     1
# 10     4   9       3     1
# 11     5  12       3     1
# 12     1   0       4     0
# 13     2   0       4     0
# 14     3   0       4     0
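The grouping idea above (start a new group at every seq == 1, then flag the whole group if any num exceeds 7.5) can be sketched in pandas as follows. The seq and num values are copied from the output above; the pandas translation itself is an illustration, not part of the original answer.

```python
import pandas as pd

# seq and num values taken from the tibble shown above
df = pd.DataFrame({
    "seq": [1, 2, 3, 1, 2, 3, 1, 2, 3, 4, 5, 1, 2, 3],
    "num": [0.1, 0.1, 0.2, 0, 0, 0, 0.5, 2, 6, 9, 12, 0, 0, 0],
})

# A new group starts each time seq restarts at 1
df["grp"] = (df["seq"] == 1).cumsum()

# Broadcast "any num > 7.5 in this group" back to every row of the group
df["state"] = (df["num"] > 7.5).groupby(df["grp"]).transform("any").astype(int)

print(df)
```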

Creating a new column based on if-elif-else condition

To formalize some of the approaches laid out above:

Create a function that operates on the rows of your dataframe like so:

def f(row):
    if row['A'] == row['B']:
        val = 0
    elif row['A'] > row['B']:
        val = 1
    else:
        val = -1
    return val

Then apply it to your dataframe passing in the axis=1 option:

In [1]: df['C'] = df.apply(f, axis=1)

In [2]: df
Out[2]:
   A  B  C
a  2  2  0
b  3  1  1
c  1  3 -1

Of course, this is not vectorized, so performance may suffer when scaled to a large number of records. Still, I think it is much more readable, especially coming from a SAS background.

Edit

Here is the vectorized version:

import numpy as np

df['C'] = np.where(
    df['A'] == df['B'], 0, np.where(
    df['A'] > df['B'], 1, -1))
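When there are more than two branches, nested np.where calls get hard to read. One possible alternative is np.select, which takes a list of conditions and a matching list of values and uses the first condition that holds for each row. The sketch below rebuilds the small frame from the output above; using np.select here is a suggestion, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Same values as the example frame shown above
df = pd.DataFrame({"A": [2, 3, 1], "B": [2, 1, 3]}, index=["a", "b", "c"])

# Conditions are checked in order; the first True one picks the value
conditions = [df["A"] == df["B"], df["A"] > df["B"]]
choices = [0, 1]

# default covers the final "else" branch (A < B)
df["C"] = np.select(conditions, choices, default=-1)

print(df)
```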

Create indicator column based on presence of 0/1 in all other columns

All the proposed solutions work, but they rely on a sum: if the sum is >= 1, then 1 appears at least once.
I finally found a solution that works the way I had in mind, i.e. also with categorical variables.
It relies on if_any and if_all:

library(tidyverse)
set.seed(1)
# create a table for a reproducible example
carac1 <- round(runif(100), 0)
carac2 <- round(runif(100), 0)
carac3 <- round(runif(100), 0)
data <- data.frame(carac1, carac2, carac3)

data <- data %>%
  mutate(carac_all = case_when(
    if_any(carac1:carac3, ~ .x == "1") ~ "yes at least one time",
    if_all(carac1:carac3, ~ .x == "0") ~ "always no",
    TRUE ~ NA_character_))
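A comparable row-wise "any" and "all" check can be sketched in pandas by comparing the whole block of columns at once and reducing along axis=1. The column names follow the R example; the values and the two output column names (any_one, all_zero) are assumptions for illustration.

```python
import pandas as pd

# Hypothetical 0/1 data using the column names from the R example
data = pd.DataFrame({
    "carac1": [0, 1, 0],
    "carac2": [0, 0, 0],
    "carac3": [0, 1, 1],
})
cols = ["carac1", "carac2", "carac3"]

# 1 if at least one of the columns equals 1 in that row
data["any_one"] = data[cols].eq(1).any(axis=1).astype(int)
# 1 if every one of the columns equals 0 in that row
data["all_zero"] = data[cols].eq(0).all(axis=1).astype(int)

print(data)
```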

Many thanks to all

Create new binary column conditional on other column in R data frame

Try this:

library(dplyr)

df <- data.frame(patient = c(1, 2, 3, 4),
                 hb_smaller_8 = c(0, 1, 0, 1))

# score is 5 where hb_smaller_8 is 1, and NA otherwise
df <- df %>% mutate(score = ifelse(hb_smaller_8 == 1, 5, NA))
> df
  patient hb_smaller_8 score
1       1            0    NA
2       2            1     5
3       3            0    NA
4       4            1     5

Creating New Column based on condition on Other Column in Pandas DataFrame

import pandas as pd 

# initialize list of lists
data = [[1,'High School',7.884], [2,'Bachelors',6.952], [3,'High School',8.185], [4,'High School',6.556],[5,'Bachelors',6.347],[6,'Master',6.794]]

# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['ID', 'Education', 'Score'])

df['Labels'] = ['Bad' if x<7.000 else 'Good' if 7.000<=x<8.000 else 'Very Good' for x in df['Score']]
df

   ID    Education  Score     Labels
0   1  High School  7.884       Good
1   2    Bachelors  6.952        Bad
2   3  High School  8.185  Very Good
3   4  High School  6.556        Bad
4   5    Bachelors  6.347        Bad
5   6       Master  6.794        Bad
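The list comprehension above loops over the scores in Python. For numeric bins like these, pd.cut is one possible vectorized alternative. The sketch below uses the same data and thresholds; right=False makes each bin include its left edge and exclude its right edge, which matches the x < 7 and 7 <= x < 8 conditions. Unlike the list comprehension, the resulting column has a categorical dtype.

```python
import numpy as np
import pandas as pd

# Same data as in the example above
data = [[1, 'High School', 7.884], [2, 'Bachelors', 6.952],
        [3, 'High School', 8.185], [4, 'High School', 6.556],
        [5, 'Bachelors', 6.347], [6, 'Master', 6.794]]
df = pd.DataFrame(data, columns=['ID', 'Education', 'Score'])

# Bins: [-inf, 7) -> Bad, [7, 8) -> Good, [8, inf) -> Very Good
df['Labels'] = pd.cut(df['Score'],
                      bins=[-np.inf, 7.0, 8.0, np.inf],
                      labels=['Bad', 'Good', 'Very Good'],
                      right=False)

print(df)
```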

