Create binary column (0/1) based on condition in another column
You can also use ifelse, which is a vectorized if-else function:
mydata$NewTemp <- ifelse(mydata$Temp>0, 1, 0)
Add a binary column based on condition
Use numpy.where:
In [1923]: import numpy as np
In [1924]: df['hybrid'] = np.where(df.FUEL_TYPE.str.contains('+', regex=False), 'Y', 'N')
In [1925]: df
Out[1925]:
   FUEL_CODE   FUEL_TYPE hybrid
0          1  MARGE+PLUS      Y
1         10      DIESEL      N
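Since the question asks for a binary column, here is a minimal self-contained sketch of the same idea that also produces a 0/1 flag. The frame is rebuilt from the output shown above; the column name hybrid_flag is only an illustrative assumption. Because '+' is a regular-expression quantifier, it is matched literally here via regex=False.

```python
import numpy as np
import pandas as pd

# Rebuild the two-row frame from the output above.
df = pd.DataFrame({"FUEL_CODE": [1, 10],
                   "FUEL_TYPE": ["MARGE+PLUS", "DIESEL"]})

# regex=False makes '+' a plain character instead of a regex quantifier.
mask = df["FUEL_TYPE"].str.contains("+", regex=False)

# 'Y'/'N' labels, as in the answer above.
df["hybrid"] = np.where(mask, "Y", "N")

# Hypothetical extra column: the same condition as a 0/1 integer flag.
df["hybrid_flag"] = mask.astype(int)
```

Casting the boolean mask with astype(int) avoids np.where entirely when only 0 and 1 are needed.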
How to conditionally create new column in dataframe based on other column values in Julia
In Julia we have the beautiful Dot Syntax which can be gracefully applied here:
julia> df[!, :x] = 2 .<= df[!, :time] .<= 4
6-element BitVector:
0
0
1
1
1
0
or alternatively
df.x = 2 .<= df.time .<= 4
How to create a new variable based on condition from different dataframe in R
If your data is relatively small, maybe try the following with dplyr. This assumes your data.frames are named df and df2. It uses mutate to create the new column, with ifelse comparing each time in the first data.frame against t_start and t_end in the second data.frame.
library(dplyr)
df %>%
  rowwise() %>%
  mutate(trial = ifelse(any(time > df2$t_start & time < df2$t_end), "wt", "iti"))
Output
  initiate  left right l_or_r  time trial
     <int> <int> <int>  <int> <dbl> <chr>
1        0     0     1      1  2.82 iti
2        0     0     1      1  2.82 iti
3        0     0     1      1  2.82 iti
4        0     0     1      1  2.83 iti
5        1     0     0      0 16.8  wt
6        1     0     0      0 16.8  wt
Assign values to a column depending on condition in another column in R
We can define a grouping variable that is the cumulative count of 1s in the seq column, and then assign state by group:
library(dplyr)
df %>%
  group_by(grp = cumsum(seq == 1)) %>%
  mutate(state = as.integer(any(num > 7.5))) %>%
  ungroup()
# # A tibble: 14 × 4
#      seq   num   grp state
#    <int> <dbl> <int> <int>
#  1     1   0.1     1     0
#  2     2   0.1     1     0
#  3     3   0.2     1     0
#  4     1   0       2     0
#  5     2   0       2     0
#  6     3   0       2     0
#  7     1   0.5     3     1
#  8     2   2       3     1
#  9     3   6       3     1
# 10     4   9       3     1
# 11     5  12       3     1
# 12     1   0       4     0
# 13     2   0       4     0
# 14     3   0       4     0
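For anyone doing this in pandas instead of dplyr, the same grouping trick might be sketched as below. This is a translation, not part of the original answer: the frame is rebuilt from the seq and num values in the output above, and a new group starts each time seq equals 1.

```python
import pandas as pd

# seq and num values copied from the tibble output above.
df = pd.DataFrame({
    "seq": [1, 2, 3, 1, 2, 3, 1, 2, 3, 4, 5, 1, 2, 3],
    "num": [0.1, 0.1, 0.2, 0, 0, 0, 0.5, 2, 6, 9, 12, 0, 0, 0],
})

# Cumulative count of rows where seq == 1: each 1 opens a new group.
df["grp"] = (df["seq"] == 1).cumsum()

# Within each group, flag every row if any num exceeds 7.5.
df["state"] = (
    (df["num"] > 7.5).groupby(df["grp"]).transform("any").astype(int)
)
```

transform broadcasts the per-group result back to every row, which plays the role of mutate after group_by.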
Creating a new column based on if-elif-else condition
To formalize some of the approaches laid out above:
Create a function that operates on the rows of your dataframe like so:
def f(row):
    if row['A'] == row['B']:
        val = 0
    elif row['A'] > row['B']:
        val = 1
    else:
        val = -1
    return val
Then apply it to your dataframe, passing in the axis=1 option:
In [1]: df['C'] = df.apply(f, axis=1)
In [2]: df
Out[2]:
   A  B  C
a  2  2  0
b  3  1  1
c  1  3 -1
Of course, this is not vectorized, so performance may suffer when scaled to a large number of records. Still, I think it is much more readable, especially coming from a SAS background.
Edit
Here is the vectorized version:
import numpy as np

df['C'] = np.where(
    df['A'] == df['B'], 0, np.where(
    df['A'] > df['B'], 1, -1))
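When there are more than two branches, nested np.where calls get hard to read. One alternative, sketched here as a suggestion and not as part of the original answer, is np.select, which takes a list of conditions and a matching list of values and applies the first condition that holds. The frame below is rebuilt from the output shown earlier.

```python
import numpy as np
import pandas as pd

# Same three-row frame as in the apply() example.
df = pd.DataFrame({"A": [2, 3, 1], "B": [2, 1, 3]}, index=["a", "b", "c"])

# Conditions are checked in order; the first match wins.
conditions = [df["A"] == df["B"], df["A"] > df["B"]]
choices = [0, 1]

# default covers the final "else" branch.
df["C"] = np.select(conditions, choices, default=-1)
```

Each extra elif becomes one more entry in conditions and choices, so the structure stays flat.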
Create indicator column based on presence of 0/1 in all other columns
All the proposed solutions work, but they rely on a sum: if the sum is >= 1, then 1 appears at least once.
I finally found a solution that works the way I had in mind, i.e. one that also handles categorical variables.
It relies on if_any and if_all:
library(tidyverse)
set.seed(1)
# create a table for a reproducible example
carac1 <- round(runif(100), 0)
carac2 <- round(runif(100), 0)
carac3 <- round(runif(100), 0)
data <- data.frame(carac1, carac2, carac3)
data <- data %>%
  mutate(carac_all = case_when(
    if_any(carac1:carac3, ~ .x == "1") ~ "yes at least one time",
    if_all(carac1:carac3, ~ .x == "0") ~ "always no",
    TRUE ~ NA_character_))
Many thanks to all
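For comparison, a rough pandas counterpart of the if_any / if_all logic could look like the sketch below. It is not from the original answer: the three-row frame is hypothetical and only there to exercise each branch, including a missing value that should fall through to NA.

```python
import pandas as pd

# Hypothetical data: one row per branch (any 1, all 0, contains a missing value).
data = pd.DataFrame({
    "carac1": [1, 0, 0],
    "carac2": [0, 0, None],
    "carac3": [0, 0, 0],
})
cols = ["carac1", "carac2", "carac3"]

# Row-wise tests across the selected columns; missing values compare as False.
any_one = data[cols].eq(1).any(axis=1)
all_zero = data[cols].eq(0).all(axis=1)

# Start with NA everywhere, then fill in the rows that match a condition.
data["carac_all"] = pd.Series(pd.NA, index=data.index, dtype="object")
data.loc[all_zero, "carac_all"] = "always no"
data.loc[any_one, "carac_all"] = "yes at least one time"
```

Rows that match neither condition keep the NA, mirroring the TRUE ~ NA_character_ fallback.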
Create new binary column conditional on other column in R data frame
Try this:
library(dplyr)

df <- data.frame(patient = c(1, 2, 3, 4),
                 hb_smaller_8 = c(0, 1, 0, 1))
df <- df %>% mutate(score = ifelse(hb_smaller_8 == 1, 5, 0))
> df
  patient hb_smaller_8 score
1       1            0     0
2       2            1     5
3       3            0     0
4       4            1     5
Creating New Column based on condition on Other Column in Pandas DataFrame
import pandas as pd
# initialize list of lists
data = [[1,'High School',7.884], [2,'Bachelors',6.952], [3,'High School',8.185], [4,'High School',6.556],[5,'Bachelors',6.347],[6,'Master',6.794]]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['ID', 'Education', 'Score'])
df['Labels'] = ['Bad' if x<7.000 else 'Good' if 7.000<=x<8.000 else 'Very Good' for x in df['Score']]
df
   ID    Education  Score     Labels
0   1  High School  7.884       Good
1   2    Bachelors  6.952        Bad
2   3  High School  8.185  Very Good
3   4  High School  6.556        Bad
4   5    Bachelors  6.347        Bad
5   6       Master  6.794        Bad
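The list comprehension above is fine for three labels. For numeric ranges, a possible vectorized alternative (a suggestion, not part of the original answer) is pd.cut with explicit bin edges. The sketch below reuses the same data; right=False makes each bin include its left edge, matching the x < 7 and 7 <= x < 8 boundaries.

```python
import numpy as np
import pandas as pd

data = [[1, 'High School', 7.884], [2, 'Bachelors', 6.952],
        [3, 'High School', 8.185], [4, 'High School', 6.556],
        [5, 'Bachelors', 6.347], [6, 'Master', 6.794]]
df = pd.DataFrame(data, columns=['ID', 'Education', 'Score'])

# Bins are [-inf, 7), [7, 8), [8, inf) because right=False closes the left edge.
df['Labels'] = pd.cut(
    df['Score'],
    bins=[-np.inf, 7.0, 8.0, np.inf],
    labels=['Bad', 'Good', 'Very Good'],
    right=False,
)
```

Note that pd.cut returns a categorical column; call astype(str) on it if plain strings are needed.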