Using If/Else on a data frame
Use ifelse:
frame$twohouses <- ifelse(frame$data>=2, 2, 1)
frame
data twohouses
1 0 1
2 1 1
3 2 2
4 3 2
5 4 2
...
16 0 1
17 2 2
18 1 1
19 2 2
20 0 1
21 4 2
The difference between if and ifelse:
if is a control flow statement, taking a single logical value as an argument.
ifelse is a vectorised function, taking vectors as all its arguments.
The help page for if, accessible via ?"if", will also point you to ?ifelse.
if else function in pandas dataframe
You can use numpy.where:
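import numpy as np

def my_fun(var1, var2, var3):
    # df is the existing DataFrame; keep the difference where it is positive, else 0
    df[var3] = np.where((df[var1] - df[var2]) > 0, df[var1] - df[var2], 0)
    return df

df1 = my_fun('age1', 'age2', 'diff')
print(df1)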
age1 age2 diff
0 23 10 13
1 45 20 25
2 21 50 0
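For a version that runs on its own, here is a minimal sketch of the same numpy.where pattern. The sample values are copied from the output above; passing the DataFrame in as an argument (rather than relying on a global df) and the name add_positive_diff are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Sample data taken from the printed output above.
df = pd.DataFrame({"age1": [23, 45, 21], "age2": [10, 20, 50]})

def add_positive_diff(frame, var1, var2, var3):
    # Hypothetical helper: store var1 - var2 where it is positive, else 0.
    delta = frame[var1] - frame[var2]
    frame[var3] = np.where(delta > 0, delta, 0)
    return frame

df1 = add_positive_diff(df, "age1", "age2", "diff")
print(df1)
```

Computing the difference once and reusing it keeps the condition and the value in sync.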
The error is explained in more detail here.
A slower solution uses apply, which needs axis=1 to process the data row by row:
def my_fun(x, var1, var2, var3):
    print(x)
    if (x[var1] - x[var2]) > 0:
        x[var3] = x[var1] - x[var2]
    else:
        x[var3] = 0
    return x

print(df.apply(lambda x: my_fun(x, 'age1', 'age2', 'diff'), axis=1))
age1 age2 diff
0 23 10 13
1 45 20 25
2 21 50 0
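A common variant of the row-wise approach returns a single value per row instead of modifying the row. The sketch below assumes the same sample data as the output above; positive_diff is a hypothetical helper name, and the final comparison only checks that the result agrees with the numpy.where version.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"age1": [23, 45, 21], "age2": [10, 20, 50]})

def positive_diff(row, var1, var2):
    # Called once per row because axis=1 is passed to apply.
    delta = row[var1] - row[var2]
    return delta if delta > 0 else 0

# Row-wise result, one Python call per row.
by_rows = df.apply(positive_diff, axis=1, args=("age1", "age2"))

# Vectorised result for comparison.
vectorised = np.where(df["age1"] - df["age2"] > 0, df["age1"] - df["age2"], 0)

print((by_rows.to_numpy() == vectorised).all())
```

Both give the same values; the row-wise version just does the work one row at a time, which is why it is slower on large frames.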
It is also possible to use loc, but because it assigns into the DataFrame that was passed in, existing data can sometimes be overwritten:
def my_fun(x, var1, var2, var3):
    print(x)
    mask = (x[var1] - x[var2]) > 0
    x.loc[mask, var3] = x[var1] - x[var2]
    x.loc[~mask, var3] = 0
    return x

print(my_fun(df, 'age1', 'age2', 'diff'))
age1 age2 diff
0 23 10 13.0
1 45 20 25.0
2 21 50 0.0
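One way to sidestep the overwriting concern is to work on a copy. The following is only a sketch under that assumption: add_diff_copy is a hypothetical name, and the sample data is taken from the output above. Initialising the new column with 0 before the masked assignment also keeps it as integers, whereas the float values in the output above most likely come from the column briefly holding NaN for the unmasked rows.

```python
import pandas as pd

df = pd.DataFrame({"age1": [23, 45, 21], "age2": [10, 20, 50]})

def add_diff_copy(frame, var1, var2, var3):
    # Work on a copy so the caller's DataFrame is left untouched.
    out = frame.copy()
    delta = out[var1] - out[var2]
    mask = delta > 0
    out[var3] = 0                       # default for rows where the mask is False
    out.loc[mask, var3] = delta[mask]   # overwrite only the rows where it is True
    return out

result = add_diff_copy(df, "age1", "age2", "diff")
print(result)
print("diff" in df.columns)  # the original frame has no new column
```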
How to use if else statement in a dataframe when comparing dates?
Next time you should REALLY provide a reproducible example; here I did it for you. My solution uses diff and ifelse, as requested.
month <- c(1,1:5,5:6)
data <- (1:8)*(1:8)
df <- data.frame(cbind(month, data))
diffs <- sapply(df, diff)
diffs <- data.frame(rbind(NA, diffs))
df$result <- ifelse(diffs$month==0, diffs$data, 0)
df
month data result
1 1 1 NA
2 1 4 3
3 2 9 0
4 3 16 0
5 4 25 0
6 5 36 0
7 5 49 13
8 6 64 0
conditional if/else statements across two data frames in R
The logic in your answer looks solid; it just doesn't yet scale to the other combinations you need. To do that, I'd reshape the data into a long form so you have one column of geographic levels and one of zones.
library(dplyr)
library(tidyr)
true_map_long <- true_map %>%
  gather(key = level, value = value, -MapSection)
obsrvd_map_long <- obsrvd_map %>%
  gather(key = level, value = value, -MapSection)
Both are shaped like:
head(true_map_long)
#> MapSection level value
#> 1 mapsection1 Country Canada
#> 2 mapsection2 Country Canada
#> 3 mapsection3 Country Canada
#> 4 mapsection4 Country UnitedStates
#> 5 mapsection5 Country UnitedStates
#> 6 mapsection1 Region Ontario
Join these two long-shaped tables by map section and level, and give appropriate suffixes to make it clearer which is which. The case_when is essentially the same, but now you're not tied to one location.
joined <- inner_join(
  true_map_long,
  obsrvd_map_long,
  by = c("MapSection", "level"),
  suffix = c("_t", "_o")
) %>%
  mutate(truth = case_when(
    value_t == value_o ~ "TP",
    is.na(value_t) == is.na(value_o) ~ "TN",
    is.na(value_t) & !is.na(value_o) ~ "FP",
    !is.na(value_t) & is.na(value_o) ~ "FN",
  ))
head(joined)
#> MapSection level value_t value_o truth
#> 1 mapsection1 Country Canada Canada TP
#> 2 mapsection2 Country Canada Canada TP
#> 3 mapsection3 Country Canada Canada TP
#> 4 mapsection4 Country UnitedStates UnitedStates TP
#> 5 mapsection5 Country UnitedStates UnitedStates TP
#> 6 mapsection1 Region Ontario Ontario TP
Then drop the value columns and spread to a wide shape again. You could do this and the joining in one step; breaking into two parts was just easier for explaining.
joined %>%
  select(-starts_with("value")) %>%
  spread(key = level, value = truth)
#> MapSection Country Region Zone
#> 1 mapsection1 TP TP TP
#> 2 mapsection2 TP TP TP
#> 3 mapsection3 TP FN TN
#> 4 mapsection4 TP TP TP
#> 5 mapsection5 TP TP FP
Created on 2019-05-31 by the reprex package (v0.3.0)
Using conditional if/else logic with pandas dataframe columns
Do not use apply, which is very slow. Use np.where:
pw2 = df.pw2.fillna(-np.inf)
df['winner'] = np.where(df.pw1 > pw2, df.Name1, df.Name2)
Since NaNs always lose, you can just fillna() them with -np.inf to yield the same logic.
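To see the fillna trick end to end, here is a small self-contained sketch. The column names come from the snippet above; the names and scores in the sample frame are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up sample data; only the column names come from the answer above.
df = pd.DataFrame({
    "Name1": ["A", "B", "C", "D"],
    "Name2": ["E", "F", "G", "H"],
    "pw1": [0.6, np.nan, 0.3, 0.7],
    "pw2": [0.4, 0.5, np.nan, 0.9],
})

# A missing pw2 becomes -inf, so any real pw1 beats it.
pw2 = df.pw2.fillna(-np.inf)

# A missing pw1 makes the comparison False, so Name2 is picked.
df["winner"] = np.where(df.pw1 > pw2, df.Name1, df.Name2)
print(df)
```

Note that when both scores are missing the comparison is also False, so this sketch falls back to Name2 for such rows.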
Looking at your code, we can point out several problems. First, you are comparing with df['pw1'] = None, which is invalid Python syntax for a comparison. You usually want to compare things using the == operator. For None, however, it is recommended to use is, as in if variable is None: (...). But you are in a pandas/numpy environment, where there are actually several representations of null values (None, NaN, NaT, etc.).
So it is preferable to check for nulls using pd.isnull() or df.isnull().
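A quick sketch of why the dedicated null checks matter: NaN does not compare equal to itself, so == cannot detect it, while pd.isnull() handles the different null representations uniformly. The sample values below are made up for illustration.

```python
import numpy as np
import pandas as pd

# NaN is never equal to anything, including itself.
print(np.nan == np.nan)

# pd.isnull() recognises None, NaN and NaT alike.
values = [None, np.nan, pd.NaT, 0.5]
flags = [bool(pd.isnull(v)) for v in values]
print(flags)

# On a Series, isnull() works element-wise.
scores = pd.Series([0.6, None, 0.3])
print(scores.isnull().tolist())
```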
Just to illustrate, this is how your code should look:
def final_winner(df):
    # df here is a single row, because apply is called with axis=1
    if pd.isnull(df['pw1']) and not pd.isnull(df['pw2']):
        return df['Name2']  # a missing pw1 loses
    elif pd.isnull(df['pw2']) and not pd.isnull(df['pw1']):
        return df['Name1']  # a missing pw2 loses
    elif df['pw2'] > df['pw1']:
        return df['Name2']
    else:
        return df['Name1']

df['winner'] = df.apply(final_winner, axis=1)
But again, definitely use np.where.
How to write an if statement for a function argument that is a dataframe
As I said in the comments, it's not possible to get the dataframe's name inside your function, but there is an elegant solution: you can use the attrs dict of a dataframe (note the warning).
def calc_mean_max(df):
    if df.attrs['name'] == "a_df":
        pass  # do formatting
    else:
        pass  # do regular calculations

a_df = pd.DataFrame(...)
a_df.attrs['name'] = 'a_df'
b_df = pd.DataFrame(...)
b_df.attrs['name'] = 'b_df'
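As a runnable illustration of the attrs idea, the sketch below fills in the two branches with placeholder behaviour. The rounding in the "formatting" branch, the column name x, and the sample values are all assumptions, since the original leaves those parts out. Using attrs.get() rather than attrs['name'] is a defensive choice so that a frame without a name does not raise KeyError.

```python
import pandas as pd

def calc_mean_max(df):
    # Hypothetical branches: round the results for a_df, return them raw otherwise.
    mean, maximum = df["x"].mean(), df["x"].max()
    if df.attrs.get("name") == "a_df":
        return round(mean, 2), round(maximum, 2)
    return mean, maximum

# Made-up sample frames; only the attrs usage follows the answer above.
a_df = pd.DataFrame({"x": [1.0, 2.0, 4.0]})
a_df.attrs["name"] = "a_df"

b_df = pd.DataFrame({"x": [1.0, 2.0, 4.0]})
b_df.attrs["name"] = "b_df"

print(calc_mean_max(a_df))
print(calc_mean_max(b_df))
```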
In R how to use an ifelse() with a vector or dataframe for classification
There is a dplyr way to do this.
library(dplyr)
sp_data %>%
  inner_join(size_data, by = c("X1" = "S1")) %>%
  mutate(X4 = case_when(X2 >= S2 ~ "above",
                        TRUE ~ "below")) %>%
  select(-S2)
X1 X2 X4
1 fish1 20 below
2 fish1 30 above
3 fish2 32 above
4 fish2 21 below
5 fish3 50 above