Using If/Else on a Data Frame

Use ifelse:

frame$twohouses <- ifelse(frame$data >= 2, 2, 1)
frame
   data twohouses
1     0         1
2     1         1
3     2         2
4     3         2
5     4         2
...
16    0         1
17    2         2
18    1         1
19    2         2
20    0         1
21    4         2
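
The frame itself isn't shown in the question; a minimal sketch that reproduces the first rows of this output, with the data values taken from the output above, would be:

# values assumed from the first rows of the output shown above
frame <- data.frame(data = c(0, 1, 2, 3, 4))
frame$twohouses <- ifelse(frame$data >= 2, 2, 1)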

The difference between if and ifelse:

  • if is a control flow statement, taking a single logical value as an argument.
  • ifelse is a vectorised function, taking vectors as all its arguments.

The help page for if, accessible via ?"if", will also point you to ?ifelse.
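
A short sketch of that difference in action (the values here are arbitrary):

x <- c(1, 3, 5)

# 'if' expects a single logical value; in recent versions of R, a condition of
# length greater than one is an error (older versions used only the first element)
if (x[1] >= 2) "at least 2" else "less than 2"
#> [1] "less than 2"

# 'ifelse' is vectorised: the condition is evaluated element-wise
ifelse(x >= 2, "at least 2", "less than 2")
#> [1] "less than 2" "at least 2"  "at least 2"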

if else function in pandas dataframe

You can use numpy.where:

def my_fun(var1, var2, var3):
    df[var3] = np.where((df[var1] - df[var2]) > 0, df[var1] - df[var2], 0)
    return df

df1 = my_fun('age1', 'age2', 'diff')
print (df1)
   age1  age2  diff
0    23    10    13
1    45    20    25
2    21    50     0

The error is better explained here.

A slower solution uses apply, where axis=1 is needed to process the data by rows:

def my_fun(x, var1, var2, var3):
    print (x)
    if (x[var1] - x[var2]) > 0:
        x[var3] = x[var1] - x[var2]
    else:
        x[var3] = 0
    return x

print (df.apply(lambda x: my_fun(x, 'age1', 'age2', 'diff'), axis=1))
   age1  age2  diff
0    23    10    13
1    45    20    25
2    21    50     0

It is also possible to use loc, but sometimes data can be overwritten:

def my_fun(x, var1, var2, var3):
    print (x)
    mask = (x[var1] - x[var2]) > 0
    x.loc[mask, var3] = x[var1] - x[var2]
    x.loc[~mask, var3] = 0

    return x

print (my_fun(df, 'age1', 'age2', 'diff'))
   age1  age2  diff
0    23    10  13.0
1    45    20  25.0
2    21    50   0.0

How to use if else statement in a dataframe when comparing dates?

Next time you should REALLY provide a reproducible example; here I did it for you. My solution uses diff and ifelse as requested.

month <- c(1, 1:5, 5:6)
data <- (1:8)*(1:8)
df <- data.frame(cbind(month, data))

diffs <- sapply(df, diff)              # row-to-row differences for each column
diffs <- data.frame(rbind(NA, diffs))  # pad with NA so rows line up with df
df$result <- ifelse(diffs$month == 0, diffs$data, 0)
df
  month data result
1     1    1     NA
2     1    4      3
3     2    9      0
4     3   16      0
5     4   25      0
6     5   36      0
7     5   49     13
8     6   64      0

conditional if/else statements across two data frames in R

The logic in your answer looks solid; it just doesn't yet scale to the other combinations you need. To do that, I'd reshape the data into a long form so you have one column of geographic levels and one of zones.
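
The original true_map and obsrvd_map aren't shown in the question; inputs along these lines reproduce the output below (the Region and Zone values other than Ontario are invented placeholders):

true_map <- data.frame(
  MapSection = paste0("mapsection", 1:5),
  Country = c("Canada", "Canada", "Canada", "UnitedStates", "UnitedStates"),
  Region  = c("Ontario", "RegionB", "RegionC", "RegionD", "RegionE"),
  Zone    = c("zone1", "zone2", NA, "zone4", NA),
  stringsAsFactors = FALSE
)
obsrvd_map <- true_map
obsrvd_map$Region[3] <- NA        # true value missing in observed -> FN
obsrvd_map$Zone[5]   <- "zone5"   # value only in observed -> FP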

library(dplyr)
library(tidyr)

true_map_long <- true_map %>%
  gather(key = level, value = value, -MapSection)
obsrvd_map_long <- obsrvd_map %>%
  gather(key = level, value = value, -MapSection)

Both are shaped like:

head(true_map_long)
#>    MapSection   level        value
#> 1 mapsection1 Country       Canada
#> 2 mapsection2 Country       Canada
#> 3 mapsection3 Country       Canada
#> 4 mapsection4 Country UnitedStates
#> 5 mapsection5 Country UnitedStates
#> 6 mapsection1  Region      Ontario

Join these two long-shaped tables by map section and level, and give appropriate suffixes to make it clearer which is which. The case_when is essentially the same, but now you're not tied to one location.

joined <- inner_join(
  true_map_long,
  obsrvd_map_long,
  by = c("MapSection", "level"),
  suffix = c("_t", "_o")
) %>%
  mutate(truth = case_when(
    value_t == value_o ~ "TP",
    is.na(value_t) == is.na(value_o) ~ "TN",
    is.na(value_t) & !is.na(value_o) ~ "FP",
    !is.na(value_t) & is.na(value_o) ~ "FN"
  ))
head(joined)
#>    MapSection   level      value_t      value_o truth
#> 1 mapsection1 Country       Canada       Canada    TP
#> 2 mapsection2 Country       Canada       Canada    TP
#> 3 mapsection3 Country       Canada       Canada    TP
#> 4 mapsection4 Country UnitedStates UnitedStates    TP
#> 5 mapsection5 Country UnitedStates UnitedStates    TP
#> 6 mapsection1  Region      Ontario      Ontario    TP

Then drop the value columns and spread back to a wide shape. You could do this and the joining in one step; breaking it into two parts just made it easier to explain.

joined %>%
  select(-starts_with("value")) %>%
  spread(key = level, value = truth)
#>    MapSection Country Region Zone
#> 1 mapsection1      TP     TP   TP
#> 2 mapsection2      TP     TP   TP
#> 3 mapsection3      TP     FN   TN
#> 4 mapsection4      TP     TP   TP
#> 5 mapsection5      TP     TP   FP

Created on 2019-05-31 by the reprex package (v0.3.0)

Using conditional if/else logic with pandas dataframe columns

Do not use apply, which is very slow. Use np.where:

pw2 = df.pw2.fillna(-np.inf)
df['winner'] = np.where(df.pw1 > pw2, df.Name1, df.Name2)

Since NaNs always lose, you can just fillna() pw2 with -np.inf to get the same logic.


Looking at your code, we can point out several problems. First, df['pw1'] = None is assignment syntax, not a comparison. You usually compare things with the == operator; for None specifically, it is recommended to use is, as in if variable is None: (...). However, you are in a pandas/numpy environment, where there are actually several kinds of null values (None, NaN, NaT, etc.).

So, it is preferable to check for nullability using pd.isnull() or df.isnull().

Just to illustrate, this is how your code should look:

def final_winner(df):
    if pd.isnull(df['pw1']) and not pd.isnull(df['pw2']):
        return df['Name2']   # pw1 is missing, so Name2 wins (NaNs always lose)
    elif pd.isnull(df['pw2']) and not pd.isnull(df['pw1']):
        return df['Name1']
    elif df['pw2'] > df['pw1']:
        return df['Name2']
    else:
        return df['Name1']

df['winner'] = df.apply(final_winner, axis=1)

But again, definitely use np.where.

How to write an if statement for a function argument that is a dataframe

As I said in a comment, it's not possible to get the dataframe's name inside your function, but there is an elegant solution: you can use the attrs dict of a DataFrame (note the warning in the docs that attrs is experimental).

def calc_mean_max(df):
    if df.attrs['name'] == "a_df":
        pass  # do formatting
    else:
        pass  # do regular calculations


a_df = pd.DataFrame(...)
a_df.attrs['name'] = 'a_df'

b_df = pd.DataFrame(...)
b_df.attrs['name'] = 'b_df'

In R how to use an ifelse() with a vector or dataframe for classification

There is a dplyr way to do this.

library(dplyr)
sp_data %>%
  inner_join(size_data, by = c("X1" = "S1")) %>%
  mutate(X4 = case_when(X2 >= S2 ~ "above",
                        TRUE ~ "below")) %>%
  select(-S2)
     X1 X2    X4
1 fish1 20 below
2 fish1 30 above
3 fish2 32 above
4 fish2 21 below
5 fish3 50 above
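
For reference, input tables along these lines reproduce that output; the original data isn't shown, so the S2 thresholds in size_data are assumptions chosen only to match the result above:

# assumed example data: X1/X2 taken from the output above, S2 thresholds invented
sp_data <- data.frame(X1 = c("fish1", "fish1", "fish2", "fish2", "fish3"),
                      X2 = c(20, 30, 32, 21, 50))
size_data <- data.frame(S1 = c("fish1", "fish2", "fish3"),
                        S2 = c(25, 30, 40))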

