Apply Function Conditionally

Apply conditional function to a dataframe

You don't need apply() here; ifelse() is vectorized and does the job:

df$output <- ifelse(df$var3 > df$var1, df$var2*df$var4, df$var2)

Apply function conditionally

There are many ways to do this. If you need a function other than sum, just change the FUN argument, e.g. FUN=mean, FUN=var or FUN=length. Let's explore some alternatives:

The aggregate function in base R:

> aggregate(results ~ experiment, FUN=sum, data=DF)
experiment results
1 A 86.3
2 B 986.0

Or maybe tapply?

> with(DF, tapply(results, experiment, FUN=sum))
A B
86.3 986.0

Also ddply from the plyr package:

> # library(plyr)
> ddply(DF[, -2], .(experiment), numcolwise(sum))
experiment results
1 A 86.3
2 B 986.0

> ## Alternative syntax
> ddply(DF, .(experiment), summarize, sumResults = sum(results))
experiment sumResults
1 A 86.3
2 B 986.0

Also the dplyr package

> require(dplyr)
> DF %>% group_by(experiment) %>% summarise(sumResults = sum(results))
Source: local data frame [2 x 2]

experiment sumResults
1 A 86.3
2 B 986.0

Using sapply and split, equivalent to tapply.

> with(DF, sapply(split(results, experiment), sum))
A B
86.3 986.0

If you are concerned about timing, data.table is your friend:

> # library(data.table)
> DT <- data.table(DF)
> DT[, sum(results), by=experiment]
experiment V1
1: A 86.3
2: B 986.0

Not so popular, but the doBy package is nice (equivalent to aggregate, even in syntax!):

> # library(doBy)
> summaryBy(results~experiment, FUN=sum, data=DF)
experiment results.sum
1 A 86.3
2 B 986.0

The by function also helps in this situation:

> (Aggregate.sums <- with(DF, by(results, experiment, sum)))
experiment: A
[1] 86.3
-------------------------------------------------------------------------
experiment: B
[1] 986

If you want the result to be a matrix, use either cbind or rbind:

> cbind(results=Aggregate.sums)
results
A 86.3
B 986.0

sqldf from the sqldf package could also be a good option:

> library(sqldf)
> sqldf("select experiment, sum(results) `sum.results`
from DF group by experiment")
experiment sum.results
1 A 86.3
2 B 986.0

xtabs also works (only when FUN=sum)

> xtabs(results ~ experiment, data=DF)
experiment
A B
86.3 986.0

Apply a conditional function in a nested dataframe

We filter the data and then use map to loop over the list 'data'

library(dplyr)
library(purrr)
library(ggplot2)

df2 <- df %>%
filter(manufacturer %in% manufacturers_vector) %>%
mutate(out = map(data, ~ func(.x$drv, .x$cty)))

-output

df2
# A tibble: 3 x 3
# Groups: manufacturer [3]
# manufacturer data out
# <chr> <list> <list>
#1 audi <tibble [18 × 10]> <dbl [18]>
#2 chevrolet <tibble [19 × 10]> <dbl [19]>
#3 jeep <tibble [8 × 10]> <dbl [8]>

-out column output

df2$out
#[[1]]
# [1] 0 21 20 21 0 0 0 18 16 20 19 15 17 17 15 15 17 16

#[[2]]
# [1] 14 11 14 13 12 16 15 16 15 15 14 11 11 14 0 22 0 0 0

#[[3]]
#[1] 17 15 15 14 9 14 13 11

If we want to keep the original data as is, without filtering, then use map_if

df %>% 
mutate(out = map_if(data, .f = ~ func(.x$drv, .x$cty),
.p = manufacturer %in% manufacturers_vector, .else = ~ NA_real_))

-output

# A tibble: 15 x 3
# Groups: manufacturer [15]
# manufacturer data out
# <chr> <list> <list>
# 1 audi <tibble [18 × 10]> <dbl [18]>
# 2 chevrolet <tibble [19 × 10]> <dbl [19]>
# 3 dodge <tibble [37 × 10]> <dbl [1]>
# 4 ford <tibble [25 × 10]> <dbl [1]>
# 5 honda <tibble [9 × 10]> <dbl [1]>
# 6 hyundai <tibble [14 × 10]> <dbl [1]>
# 7 jeep <tibble [8 × 10]> <dbl [8]>
# 8 land rover <tibble [4 × 10]> <dbl [1]>
# 9 lincoln <tibble [3 × 10]> <dbl [1]>
#10 mercury <tibble [4 × 10]> <dbl [1]>
#11 nissan <tibble [13 × 10]> <dbl [1]>
#12 pontiac <tibble [5 × 10]> <dbl [1]>
#13 subaru <tibble [14 × 10]> <dbl [1]>
#14 toyota <tibble [34 × 10]> <dbl [1]>
#15 volkswagen <tibble [27 × 10]> <dbl [1]>

How to conditionally use `pandas.DataFrame.apply` based on values in a certain column?

Filter your dataframe first, then apply my_func to the filtered rows. Let's use query:

df1['new_column'] = df1.query('type == "A"').apply(my_func, axis=1)

Output:

   amount      back       file     front type  \
0 3 21973805 filename2 21889611 A
1 4 36403870 filename2 36357723 A
2 5 277500 filename3 196312 A
3 1 19 filename4 11 B
4 2 120 filename4 42 B
5 1 3210 filename3 1992 C

new_column
0 [21921030, 21908574, 21971743]
1 [36391053, 36371413, 36394390, 36376405]
2 [198648, 263355, 197017, 261666, 260815]
3 NaN
4 NaN
5 NaN
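
The NaN values for types B and C come from index alignment: apply only runs on the filtered rows, and the assignment matches its result back to df1 by index. A minimal, self-contained sketch of that behaviour follows. The data and the body of my_func are hypothetical stand-ins, since the original function is not shown; only the column names follow the output above.

```python
import pandas as pd

# Hypothetical data; only the column names follow the output above.
df1 = pd.DataFrame({
    "type": ["A", "A", "B", "C"],
    "front": [10, 20, 30, 40],
    "back": [15, 28, 31, 49],
})

def my_func(row):
    # Hypothetical stand-in for the real function: width of the front/back interval.
    return row["back"] - row["front"]

# apply runs only on the rows kept by query; the assignment aligns on the index,
# so rows that were filtered out end up as NaN in the new column.
df1["new_column"] = df1.query('type == "A"').apply(my_func, axis=1)
print(df1)
```

If the unmatched rows should hold something other than NaN, one option is to follow up with fillna on the new column.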

Conditional apply() in R

I think you're making it too complicated. Just calculate for all then remove those you don't want:

DT$xp_ratio_y <- DT$driv_y_experience/DT$driv_y_age
DT$xp_ratio_y[DT$driv_y_add_flg != 1] <- 0

Pandas apply but only for rows where a condition is met

The other answers are excellent, but I thought I'd add one other approach that can be faster in some circumstances – using broadcasting and masking to achieve the same result:

import numpy as np

mask = (z['b'] != 0)
z_valid = z[mask]

z['c'] = 0.0  # float default, so the masked assignment below does not change the column dtype
z.loc[mask, 'c'] = z_valid['a'] / np.log(z_valid['b'])

Especially with very large dataframes, this approach will generally be faster than solutions based on apply().
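
To see that the masked version gives the same numbers as a row-wise apply, here is a small runnable sketch. The frame z and its values are hypothetical; only the column names a, b and c come from the snippet above.

```python
import numpy as np
import pandas as pd

# Hypothetical frame; column names follow the snippet above.
z = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [0.0, 10.0, 0.0, 100.0],
})

# Boolean mask selecting the rows where the computation is defined.
mask = z["b"] != 0

# Start from a float default, then overwrite only the masked rows.
z["c"] = 0.0
z.loc[mask, "c"] = z.loc[mask, "a"] / np.log(z.loc[mask, "b"])

# Row-wise reference using apply, for comparison only.
expected = z.apply(
    lambda r: r["a"] / np.log(r["b"]) if r["b"] != 0 else 0.0,
    axis=1,
)
print(np.allclose(z["c"], expected))
```

The masked assignment calls np.log once on a whole column slice, whereas apply calls the lambda once per row, which is where the speed difference on large frames usually comes from.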

Pandas .apply with conditional if in different columns

Use the code below:

df['Testing']=df.apply(lambda x: 1 if x['Liq_Factor']=='Nan'  else x['Use']/x['Tw'], axis=1)

Updated based on the comments, capping the ratio at 1:

df['Testing']=df.apply(lambda x: 1 if x['Liq_Factor']=='Nan'  else min(x['Use']/x['Tw'],1), axis=1)
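
The same rule can also be written without apply. The sketch below uses np.where and Series.clip on hypothetical data. It assumes, as in the snippets above, that Liq_Factor holds the literal string 'Nan'; if the column holds real missing values instead, the condition would be df['Liq_Factor'].isna().

```python
import numpy as np
import pandas as pd

# Hypothetical data; column names follow the snippets above.
df = pd.DataFrame({
    "Liq_Factor": ["Nan", "0.5", "0.8"],
    "Use": [3.0, 1.0, 6.0],
    "Tw": [2.0, 4.0, 3.0],
})

# Ratio capped at 1, computed for every row at once.
ratio = (df["Use"] / df["Tw"]).clip(upper=1)

# Rows whose Liq_Factor is the literal string "Nan" get 1; all others get the capped ratio.
df["Testing"] = np.where(df["Liq_Factor"] == "Nan", 1.0, ratio)
print(df)
```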

Use an 'apply' function to perform code with conditional statements in R

Sure you can! I would first define a helper function that defines what is to be done with one specific column and then you call that function within apply:

HelperFun <- function(x) {
  # your code from above, replacing 'Seq1' by x
}
apply(First, 2, HelperFun)

