Using If/Else on a Data Frame

Use ifelse:

frame$twohouses <- ifelse(frame$data >= 2, 2, 1)
frame
   data twohouses
1     0         1
2     1         1
3     2         2
4     3         2
5     4         2
...
16    0         1
17    2         2
18    1         1
19    2         2
20    0         1
21    4         2
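
The frame itself isn't shown in the question; a minimal sketch that reproduces the first rows of this output, with the data values taken from the output above, would be:

# values assumed from the first rows of the output shown above
frame <- data.frame(data = c(0, 1, 2, 3, 4))
frame$twohouses <- ifelse(frame$data >= 2, 2, 1)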

The difference between if and ifelse:

  • if is a control flow statement, taking a single logical value as an argument.
  • ifelse is a vectorised function, taking vectors as all its arguments.

The help page for if, accessible via ?"if", will also point you to ?ifelse.
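
A short sketch of that difference in action (the values here are arbitrary):

x <- c(1, 3, 5)

# 'if' expects a single logical value; in recent versions of R, a condition of
# length greater than one is an error (older versions used only the first element)
if (x[1] >= 2) "at least 2" else "less than 2"
#> [1] "less than 2"

# 'ifelse' is vectorised: the condition is evaluated element-wise
ifelse(x >= 2, "at least 2", "less than 2")
#> [1] "less than 2" "at least 2"  "at least 2"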

if else function in pandas dataframe

You can use numpy.where:

def my_fun(var1, var2, var3):
    df[var3] = np.where((df[var1] - df[var2]) > 0, df[var1] - df[var2], 0)
    return df

df1 = my_fun('age1', 'age2', 'diff')
print (df1)
   age1  age2  diff
0    23    10    13
1    45    20    25
2    21    50     0

The error is better explained here.

A slower solution uses apply, where axis=1 is needed to process the data by rows:

def my_fun(x, var1, var2, var3):
    print (x)
    if (x[var1] - x[var2]) > 0:
        x[var3] = x[var1] - x[var2]
    else:
        x[var3] = 0
    return x

print (df.apply(lambda x: my_fun(x, 'age1', 'age2', 'diff'), axis=1))
   age1  age2  diff
0    23    10    13
1    45    20    25
2    21    50     0

It is also possible to use loc, but sometimes data can be overwritten:

def my_fun(x, var1, var2, var3):
    print (x)
    mask = (x[var1] - x[var2]) > 0
    x.loc[mask, var3] = x[var1] - x[var2]
    x.loc[~mask, var3] = 0

    return x

print (my_fun(df, 'age1', 'age2', 'diff'))
   age1  age2  diff
0    23    10  13.0
1    45    20  25.0
2    21    50   0.0

How to use if else statement in a dataframe when comparing dates?

Next time you should REALLY provide a reproducible example; here I did it for you. My solution uses diff and ifelse as requested.

month <- c(1, 1:5, 5:6)
data <- (1:8)*(1:8)
df <- data.frame(cbind(month, data))

diffs <- sapply(df, diff)              # row-to-row differences for each column
diffs <- data.frame(rbind(NA, diffs))  # pad with NA so rows line up with df
df$result <- ifelse(diffs$month == 0, diffs$data, 0)
df
  month data result
1     1    1     NA
2     1    4      3
3     2    9      0
4     3   16      0
5     4   25      0
6     5   36      0
7     5   49     13
8     6   64      0

conditional if/else statements across two data frames in R

The logic in your answer looks solid; it just doesn't yet scale to the other combinations you need. To do that, I'd reshape the data into a long form so you have one column of geographic levels and one of zones.
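
The original true_map and obsrvd_map aren't shown in the question; inputs along these lines reproduce the output below (the Region and Zone values other than Ontario are invented placeholders):

true_map <- data.frame(
  MapSection = paste0("mapsection", 1:5),
  Country = c("Canada", "Canada", "Canada", "UnitedStates", "UnitedStates"),
  Region  = c("Ontario", "RegionB", "RegionC", "RegionD", "RegionE"),
  Zone    = c("zone1", "zone2", NA, "zone4", NA),
  stringsAsFactors = FALSE
)
obsrvd_map <- true_map
obsrvd_map$Region[3] <- NA        # true value missing in observed -> FN
obsrvd_map$Zone[5]   <- "zone5"   # value only in observed -> FP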

library(dplyr)
library(tidyr)

true_map_long <- true_map %>%
  gather(key = level, value = value, -MapSection)
obsrvd_map_long <- obsrvd_map %>%
  gather(key = level, value = value, -MapSection)

Both are shaped like:

head(true_map_long)
#>    MapSection   level        value
#> 1 mapsection1 Country       Canada
#> 2 mapsection2 Country       Canada
#> 3 mapsection3 Country       Canada
#> 4 mapsection4 Country UnitedStates
#> 5 mapsection5 Country UnitedStates
#> 6 mapsection1  Region      Ontario

Join these two long-shaped tables by map section and level, and give appropriate suffixes to make it clearer which is which. The case_when is essentially the same, but now you're not tied to one location.

joined <- inner_join(
  true_map_long,
  obsrvd_map_long,
  by = c("MapSection", "level"),
  suffix = c("_t", "_o")
) %>%
  mutate(truth = case_when(
    value_t == value_o ~ "TP",
    is.na(value_t) == is.na(value_o) ~ "TN",
    is.na(value_t) & !is.na(value_o) ~ "FP",
    !is.na(value_t) & is.na(value_o) ~ "FN"
  ))
head(joined)
#>    MapSection   level      value_t      value_o truth
#> 1 mapsection1 Country       Canada       Canada    TP
#> 2 mapsection2 Country       Canada       Canada    TP
#> 3 mapsection3 Country       Canada       Canada    TP
#> 4 mapsection4 Country UnitedStates UnitedStates    TP
#> 5 mapsection5 Country UnitedStates UnitedStates    TP
#> 6 mapsection1  Region      Ontario      Ontario    TP

Then drop the value columns and spread back to a wide shape. You could do this and the joining in one step; breaking it into two parts just made it easier to explain.

joined %>%
  select(-starts_with("value")) %>%
  spread(key = level, value = truth)
#>    MapSection Country Region Zone
#> 1 mapsection1      TP     TP   TP
#> 2 mapsection2      TP     TP   TP
#> 3 mapsection3      TP     FN   TN
#> 4 mapsection4      TP     TP   TP
#> 5 mapsection5      TP     TP   FP

Created on 2019-05-31 by the reprex package (v0.3.0)

Using conditional if/else logic with pandas dataframe columns

Do not use apply, which is very slow. Use np.where:

pw2 = df.pw2.fillna(-np.inf)
df['winner'] = np.where(df.pw1 > pw2, df.Name1, df.Name2)

Since NaNs always lose, you can just fillna() pw2 with -np.inf to get the same logic.


Looking at your code, we can point out several problems. First, df['pw1'] = None is assignment syntax, not a comparison. You usually compare things with the == operator; for None specifically, it is recommended to use is, as in if variable is None: (...). However, you are in a pandas/numpy environment, where there are actually several kinds of null values (None, NaN, NaT, etc.).

So, it is preferable to check for nullability using pd.isnull() or df.isnull().

Just to illustrate, this is how your code should look:

def final_winner(df):
    if pd.isnull(df['pw1']) and not pd.isnull(df['pw2']):
        return df['Name2']   # pw1 is missing, so Name2 wins (NaNs always lose)
    elif pd.isnull(df['pw2']) and not pd.isnull(df['pw1']):
        return df['Name1']
    elif df['pw2'] > df['pw1']:
        return df['Name2']
    else:
        return df['Name1']

df['winner'] = df.apply(final_winner, axis=1)

But again, definitely use np.where.

How to write an if statement for a function argument that is a dataframe

As I said in a comment, it's not possible to get the dataframe's name inside your function, but there is an elegant solution: you can use the attrs dict of a DataFrame (note the warning in the docs that attrs is experimental).

def calc_mean_max(df):
    if df.attrs['name'] == "a_df":
        pass  # do formatting
    else:
        pass  # do regular calculations


a_df = pd.DataFrame(...)
a_df.attrs['name'] = 'a_df'

b_df = pd.DataFrame(...)
b_df.attrs['name'] = 'b_df'

In R how to use an ifelse() with a vector or dataframe for classification

There is a dplyr way to do this.

library(dplyr)
sp_data %>%
  inner_join(size_data, by = c("X1" = "S1")) %>%
  mutate(X4 = case_when(X2 >= S2 ~ "above",
                        TRUE ~ "below")) %>%
  select(-S2)
     X1 X2    X4
1 fish1 20 below
2 fish1 30 above
3 fish2 32 above
4 fish2 21 below
5 fish3 50 above
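
For reference, input tables along these lines reproduce that output; the original data isn't shown, so the S2 thresholds in size_data are assumptions chosen only to match the result above:

# assumed example data: X1/X2 taken from the output above, S2 thresholds invented
sp_data <- data.frame(X1 = c("fish1", "fish1", "fish2", "fish2", "fish3"),
                      X2 = c(20, 30, 32, 21, 50))
size_data <- data.frame(S1 = c("fish1", "fish2", "fish3"),
                        S2 = c(25, 30, 40))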

