Call Apply-Like Function on Each Row of Dataframe With Multiple Arguments from Each Row

Call apply-like function on each row of dataframe with multiple arguments from each row

You can call apply on a subset of the original data.

dat <- data.frame(x=c(1,2), y=c(3,4), z=c(5,6))
apply(dat[,c('x','z')], 1, function(x) sum(x))

Or, if your function is just sum, use the vectorized version:

rowSums(dat[,c('x','z')])
[1] 6 8

If you want to use testFunc:

testFunc <- function(a, b) a + b
apply(dat[,c('x','z')], 1, function(x) testFunc(x[1],x[2]))

EDIT: To access columns by name rather than by index, you can do something like this:

testFunc <- function(a, b) a + b
apply(dat[,c('x','z')], 1, function(y) testFunc(y['z'],y['x']))

Call a custom function on each row of dataframe with multiple arguments from each row

In R, and especially in your case, you can make use of vectorised functions. They work on the complete vector, so you don't have to apply the function separately for every row, but can directly supply the complete columns:

df <- data.frame(Name=c('John','Tom','Sarah'), Quantity=c(3,4,5), Price=c(5,6,7))

my_vectorised_fun <- function(name, quantity, price) {
  sales <- quantity * price

  # check for which the name doesn't fit
  index_names <- !name %in% c("John", "Tom")
  sales[index_names] <- NA

  sales
}

library(dplyr)
df %>%
  mutate(Sales = my_vectorised_fun(Name, Quantity, Price))
#>    Name Quantity Price Sales
#> 1  John        3     5    15
#> 2   Tom        4     6    24
#> 3 Sarah        5     7    NA

Created on 2021-02-19 by the reprex package (v0.3.0)



Edit

Here is a version where you pass the complete .data pronoun to the function and only have to specify the names in the function:

df <- data.frame(Name=c('John','Tom','Sarah'), Quantity=c(3,4,5), Price=c(5,6,7))

my_vectorised_fun <- function(all_data) {
  sales <- all_data[["Quantity"]] * all_data[["Price"]]

  # check for which the name doesn't fit
  index_names <- !all_data[["Name"]] %in% c("John", "Tom")
  sales[index_names] <- NA

  sales
}

library(dplyr)
df %>%
  mutate(Sales = my_vectorised_fun(.data))
#>    Name Quantity Price Sales
#> 1  John        3     5    15
#> 2   Tom        4     6    24
#> 3 Sarah        5     7    NA


How to run function on each row of a dataframe, while using multiple arguments from that dataframe, and outputting new dataframes in a list

You can first create your new variable with mutate, then split your data.frame by a row-number index.

library(dplyr)

a %>% mutate(sum_of_rows=rowSums(.)) %>%
  split(1:nrow(a)) %>%
  setNames(paste0('r', 1:nrow(a)))

That will work if you want a whole row of the data.frame for every element of the list.

If you just want a list of data.frames with a single element each, as in your example, you can do it simply with:

rowSums(a) %>%
  as.data.frame() %>%
  split(1:nrow(.)) %>%
  setNames(paste0('r', 1:nrow(a)))

Or with just some base R:

setNames(split(as.data.frame(rowSums(a)), 1:nrow(a)), paste0('r', 1:nrow(a)))

Call R apply-like function on each column of dataframe with the remained columns as argument?

We can loop over the sequence of column indices of the dataset with sapply/lapply. For each index i, extract that column as the Y and the remaining columns (with -i), apply testfun, and assign the result into the positions -i of an already initialized numeric vector (of the same length as the number of columns). Return the vector, then transpose the output of sapply:

v1 <- numeric(ncol(df))
t(sapply(seq_along(df), function(i) {
  v1[-i] <- testfun(as.matrix(df[i]), df[-i])
  v1
}))

Output:

#     [,1] [,2] [,3]
#[1,]  0.0  8.5 12.5
#[2,]  6.5  0.0 14.5
#[3,]  8.5 12.5  0.0

Or this can be done with the tidyverse:

library(dplyr)
df %>%
  summarise(across(everything(), ~ testfun(., select(df, -cur_column()))))
#     x    y    z
#1  8.5  6.5  8.5
#2 12.5 14.5 12.5

How to apply a function with multiple arguments and create a dataframe?

Change your function to accept string arguments:

# %>% is loaded with dplyr; var_label() is assumed here to come from the labelled package
library(dplyr)

fre <- function(.data, var) {
  abc <- questionr::na.rm(.data[, var])
  abc <- questionr::freq(abc)
  abc <- cbind(Label = rownames(abc), abc)
  abc <- questionr::rename.variable(abc, "n", "Frequency")
  abc <- questionr::rename.variable(abc, "%", "Percent")
  abc <- tidyr::separate(abc, Label, into = c("Value", "Label"), sep = "] ")
  row.names(abc) <- NULL
  abc <- abc %>%
    dplyr::mutate(Value = gsub("\\[|\\]", "", Value)) %>%
    dplyr::select(Label, Value, Frequency, Percent) %>%
    dplyr::select(Label, Percent)
  abc$Percent <- paste0(round(abc$Percent), "%")
  abc <- abc %>%
    tidyr::pivot_wider(names_from = Label, values_from = Percent)
  Label <- var_label(.data[[var]])
  Name <- var
  abc <- cbind(Name, Label, abc)
  abc
}

Then pass the column names to fre as strings using lapply.

cols <- c('Q03', 'Q06', 'Q07', 'Q08', 'Q10')
result <- do.call(rbind, lapply(cols, fre, .data = dat))
# Or a bit shorter
# result <- purrr::map_df(cols, fre, .data = dat)
result

#  Name                                       Label Strongly agree Agree Neither Disagree
#1  Q03               Standard deviations excite me            19%   26%     34%      17%
#2  Q06       I have little experience of computers            27%   44%     13%      10%
#3  Q07                       All computers hate me             7%   34%     26%      24%
#4  Q08       I have never been good at mathematics            15%   58%     19%       6%
#5  Q10 Computers are useful only for playing games            14%   57%     18%      10%
#  Strongly disagree
#1                3%
#2                6%
#3                8%
#4                3%
#5                2%

Apply a function to each row in a data frame in R

You want apply (see the docs for it). apply(var,1,fun) will apply to rows, apply(var,2,fun) will apply to columns.

> apply(a,1,min)
[1] 1 0 3

Apply function to each DataFrame row, without returning a Series

This operation can already be vectorized directly across rows, so you can avoid .apply() entirely; the vectorized form is typically much faster.
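As a rough sketch of that point, the snippet below computes the same column twice, once with a row-wise .apply() and once with whole-column arithmetic. The frame and the variable names by_row and vectorized are made up for illustration; the check only shows that the two results agree, it does not measure the speed difference.

```python
import pandas as pd

# Hypothetical data, chosen only for illustration.
df = pd.DataFrame({"col1": [2, 10], "col2": [3, 12], "col3": [5, 4]})

# Row-wise: the lambda is called once per row, with that row as a Series.
by_row = df.apply(lambda row: 2 * row["col2"] * row["col3"], axis=1)

# Vectorized: one expression over whole columns, no Python-level loop.
vectorized = 2 * df["col2"] * df["col3"]

# Both approaches should produce the same values.
print(by_row.tolist() == vectorized.tolist())
```

The vectorized line is also the shorter of the two, which is usually a good sign that .apply() is not needed.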

See the canonical answer to "How to iterate over rows in a DataFrame in Pandas?"

You won't be able to avoid using memory for the results, because they need to go somewhere, but you can drop columns you no longer need before or after performing the calculation.

Just keeping the results in a DataFrame column (a Series) rather than a list of native ints saves memory. You may also find that explicitly setting or reducing the dtypes of your DataFrame is a big saving if they are not already in their most efficient types, for example going from int64 to uint16 or even uint8 (which can still hold the example values).

>>> import pandas as pd
>>> df = pd.DataFrame({"col1": [2,10], "col2": [3,12], "col3": [5,4]})
>>> df
   col1  col2  col3
0     2     3     5
1    10    12     4
>>> df["2xy"] = 2 * df["col2"] * df["col3"]
>>> df
   col1  col2  col3  2xy
0     2     3     5   30
1    10    12     4   96
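The memory advice above can be sketched roughly as follows. This assumes the same small example frame; the names before, after and small are illustrative, and uint8 is only safe here because every value fits in 0..255.

```python
import pandas as pd

df = pd.DataFrame({"col1": [2, 10], "col2": [3, 12], "col3": [5, 4]})
df["2xy"] = 2 * df["col2"] * df["col3"]

# Bytes used by the column data (index excluded) with the default dtypes.
before = df.memory_usage(index=False).sum()

# Drop a column that is no longer needed, then shrink the remaining ones.
# Assumption: all values fit in uint8 (0..255); larger values would not
# be preserved by this cast.
small = df.drop(columns=["col1"]).astype("uint8")

after = small.memory_usage(index=False).sum()
print(before, after)
```

On real data it is worth checking the value range of each column (for example with df.max()) before choosing a narrower dtype.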

How do I use multiple columns of a df as input to a function?

apply with axis=1 calls your function once for each row, but with a single argument: the row as a Series object. So you either need to revise your function definition to take a single row instead of multiple parameters, or wrap your function call in a lambda that extracts the values from each row and calls the function with them.

  1. Revising your function to take a single row

    Instead of this:

    def delta(S, K, t, r, sigma):
        # ...

    do this:

    def delta(row):
        S, K, t, r, sigma = row.tolist()
        # ...

  2. Wrapping your function call in a lambda function

    Instead of this:

    calls['delta'] = calls[['callput','underlyinglast','strike','yte','rfr','hvol90']].apply(delta,axis=1)

    do this:

    calls['delta'] = calls[['callput','underlyinglast','strike','yte','rfr','hvol90']].apply(lambda row: delta(*row), axis=1)

    (The trick there is to use lambda row: delta(*row) instead of just delta; *row "spreads" the items in row across the separate arguments of delta. For this to work, the number and order of the selected columns must match delta's parameters.)
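Both options can be sketched on a toy frame. The function weighted_sum and the columns a, b, c below are hypothetical stand-ins, not the delta function or the calls frame from the question.

```python
import pandas as pd

# Hypothetical multi-argument function standing in for the real one.
def weighted_sum(a, b, c):
    return a + 2 * b + 3 * c

df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# Option 1: a function that takes the row (a Series) and unpacks it itself.
def weighted_sum_row(row):
    a, b, c = row.tolist()
    return weighted_sum(a, b, c)

via_row = df[["a", "b", "c"]].apply(weighted_sum_row, axis=1)

# Option 2: keep the original signature and spread the row with *row.
via_lambda = df[["a", "b", "c"]].apply(lambda row: weighted_sum(*row), axis=1)

# Variant: look values up by label, so column order no longer matters.
via_names = df.apply(
    lambda row: weighted_sum(row["a"], row["b"], row["c"]), axis=1
)

print(via_row.tolist(), via_lambda.tolist(), via_names.tolist())
```

The label-based variant is a little more verbose, but it fails loudly with a KeyError if a column is missing instead of silently passing values in the wrong positions.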


