Apply a Function to Each Data Frame

Apply function to each cell in DataFrame

You can use applymap(), which is concise for your case. (In pandas 2.1 and later, applymap() is deprecated in favor of the equivalent DataFrame.map().)

df.applymap(foo_bar)

# A B C
#0 wow bar wow bar
#1 bar wow wow bar

Another option is to vectorize your function and then use the apply method:

import numpy as np
df.apply(np.vectorize(foo_bar))
# A B C
#0 wow bar wow bar
#1 bar wow wow bar
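The snippets above assume df and foo_bar already exist. Here is a self-contained sketch with a made-up foo_bar (it returns "wow" for even numbers and "bar" otherwise) and a small made-up df. It shows that the element-wise call and the vectorized apply give the same values. It uses DataFrame.map when it exists (pandas 2.1+) and falls back to applymap otherwise.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for foo_bar: "wow" for even numbers, "bar" otherwise.
def foo_bar(value):
    return "wow" if value % 2 == 0 else "bar"

# Made-up example data.
df = pd.DataFrame({"A": [2, 1], "B": [1, 2], "C": [4, 3]})

# Element-wise: DataFrame.map in pandas >= 2.1, DataFrame.applymap in older versions.
if hasattr(pd.DataFrame, "map"):
    elementwise = df.map(foo_bar)
else:
    elementwise = df.applymap(foo_bar)

# Column-wise apply with a vectorized function; each column Series is
# mapped element by element and returned as an array of the same length.
vectorized = df.apply(np.vectorize(foo_bar))

print(elementwise)
print(vectorized)
```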

How to apply a function to every element in a dataframe?

Since your problem requires access to both the index and the column labels of your df, you probably want df.apply().

df.apply() passes a pandas.Series representing each column (or each row, depending on the axis argument) to your function, so you have access to the column name and the index. df.applymap(), by contrast, passes each individual value of df, so you would not necessarily have access to the index and column name as required.

Example

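import numpy as np
import pandas as pd

def foo(name, index):
    # name: the column label; index: the row labels of that column
    return name - index

x = np.arange(0, 2.01, 0.25)  # row labels: 0.00, 0.25, ..., 2.00
y = np.arange(10, 30, 5.0)    # column labels: 10.0, 15.0, 20.0, 25.0

df = pd.DataFrame(index=x, columns=y)  # empty frame; only the labels are used

# each x inside the lambda is one column of df, passed as a Series
df.apply(lambda x: foo(x.name, x.index))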

Output

       10.0   15.0   20.0   25.0
0.00  10.00  15.00  20.00  25.00
0.25   9.75  14.75  19.75  24.75
0.50   9.50  14.50  19.50  24.50
0.75   9.25  14.25  19.25  24.25
1.00   9.00  14.00  19.00  24.00
1.25   8.75  13.75  18.75  23.75
1.50   8.50  13.50  18.50  23.50
1.75   8.25  13.25  18.25  23.25
2.00   8.00  13.00  18.00  23.00

In the example above, df.apply() passes each column of df to the lambda as a Series x, and the lambda passes that column's name (x.name) and its index (x.index) on to foo(). Within foo(), each value is computed by subtracting its own index value from its own column name. So the index value for each row is accessed using x.index, and the column value is accessed using x.name, inside the call to df.apply().

Update

Many thanks to @SyntaxError for pointing out that x.index and x.name could be passed to foo() within df.apply(), instead of feeding the entire Series (x) into the function and extracting the values there. As mentioned, this fits OP's use case much more neatly than my original response, which was largely the same but passed each Series x into foo(), which then had the responsibility of extracting x.name and x.index.
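For comparison, here is a sketch that rebuilds the same grid without apply(), using NumPy broadcasting: the column labels as a row vector minus the row labels as a column vector. This is a swapped-in alternative, not part of the original answer, and it only works this directly because foo() is simple arithmetic on the labels. The lambda's argument is named s here to avoid shadowing x.

```python
import numpy as np
import pandas as pd

x = np.arange(0, 2.01, 0.25)  # row labels (index)
y = np.arange(10, 30, 5.0)    # column labels

# apply-based version: each column Series s carries its label (s.name)
# and the row labels (s.index); the result has the same shape as the column.
df = pd.DataFrame(index=x, columns=y)
via_apply = df.apply(lambda s: s.name - s.index.to_numpy())

# Broadcasting version: (1, 4) row vector minus (9, 1) column vector -> (9, 4) grid,
# where cell [i, j] = y[j] - x[i], i.e. column label minus row label.
via_broadcast = pd.DataFrame(y[np.newaxis, :] - x[:, np.newaxis], index=x, columns=y)

print(via_broadcast)
```

The broadcasting version avoids calling a Python function once per column, which matters more as the frame grows; apply() stays the more general choice when the per-column logic cannot be written as array arithmetic.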

apply function to every element in data.frame and return data.frame

df <- data.frame(c(1,2,3), c(2,3,4))
df[] <- lapply(df, function(x) paste(x,"x", sep=""))
df

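Assigning to df[] (rather than to df) preserves the data frame's structure: lapply() returns a plain list, and df[] <- fills the existing columns with it instead of replacing df by that list.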
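For readers coming from pandas, a rough analogue as a sketch (the column names a and b are made up): converting every cell to a string and appending "x" returns a new DataFrame with the same index and columns, so the structure is kept much as with df[] in R.

```python
import pandas as pd

# Made-up data mirroring data.frame(c(1,2,3), c(2,3,4)); column names are hypothetical.
df = pd.DataFrame({"a": [1, 2, 3], "b": [2, 3, 4]})

# Convert every cell to str, then concatenate "x" element-wise.
# The result is still a DataFrame with the original index and columns.
result = df.astype(str) + "x"
print(result)
```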

Apply a function to each element of each dataframe in a list

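Use lapply() to apply the same functions to every data frame in the list. This applies both the centering and the scaling function in one pass (apply() returns a matrix, which is why each result below prints as a matrix):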

lapply(l, function(y) apply(y, 2, function(x) {
  x <- x - mean(x)   # center the column
  x / sqrt(sd(x))    # divide by the square root of its standard deviation
}))

#[[1]]
# A B
#[1,] -0.5946036 -0.8408964
#[2,] 0.5946036 0.8408964

#[[2]]
# A B
#[1,] -1.3201676 -1.3201676
#[2,] -0.4400559 -0.4400559
#[3,] 0.4400559 0.4400559
#[4,] 1.3201676 1.3201676

If you want the centered and scaled results as separate steps:

centered <- lapply(l, function(y) apply(y, 2, function(x) x - mean(x)))
scaled <- lapply(centered, function(y) apply(y, 2, function(x) x/sqrt(sd(x))))
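A comparable pandas sketch, assuming a list of DataFrames with numeric columns (the frames below are made up, since l is not shown). It uses the same formula as the R code, including the division by sqrt(sd(x)). One difference: DataFrame.apply returns a DataFrame, whereas R's apply() returns a matrix.

```python
import numpy as np
import pandas as pd

# Hypothetical list of data frames standing in for `l`.
frames = [
    pd.DataFrame({"A": [1.0, 2.0], "B": [1.0, 3.0]}),
    pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0], "B": [1.0, 2.0, 3.0, 4.0]}),
]

def center_and_scale(col):
    # Same formula as the R answer: subtract the mean, then divide by sqrt(sd).
    # pandas' std() uses ddof=1 by default, matching R's sd().
    centered = col - col.mean()
    return centered / np.sqrt(centered.std())

# Like lapply(l, function(y) apply(y, 2, ...)): one result per data frame.
result = [frame.apply(center_and_scale) for frame in frames]
print(result)
```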

Apply a function to each column of a data.frame and organize the output

In this case, you want lapply(). It handles each column of the data.frame separately (a data.frame is actually a list of equal-length vectors) and returns a list with one result per column; here, each result is a two-column data.frame.

x <- lapply(df, myfunction)

sapply() also works. The only difference is the shape of the result: sapply() tries to simplify the list it returns, so it prints differently. Compare print(x) for each approach to see the difference.

x <- sapply(df, myfunction)

Afterwards, you probably want to combine the list back into a single data.frame. You can do this with do.call():

df2 <- do.call(cbind, x)

This produces unwieldy column names. You can clear them with names():

names(df2) <- NULL
df2
# 1 5 0.0 5 0.0 5 0.0
# 2 2 0.0 2 0.0 2 0.0
# 3 -4 -4.0 -4 -4.0 -4 -4.0
# 4 -6 -8.5 -6 -8.5 -6 -8.5
# ....
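The same per-column-then-combine pattern in pandas, as a sketch: myfunction below is hypothetical (it returns the raw and the centered values of a column as a two-column DataFrame), and the data is made up. pd.concat with a dict plays the role of do.call(cbind, x); instead of producing clashing names, it keeps the original column names as the outer level of a column MultiIndex.

```python
import pandas as pd

# Hypothetical myfunction: returns a two-column DataFrame for one input column.
def myfunction(col):
    return pd.DataFrame({"raw": col, "centered": col - col.mean()})

df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 6.0, 8.0]})

# Like lapply(df, myfunction): one result per column, keyed by column name.
results = {name: myfunction(df[name]) for name in df.columns}

# Like do.call(cbind, x): place the pieces side by side.
# The dict keys become the outer level of the column MultiIndex.
combined = pd.concat(results, axis=1)
print(combined)
```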

Side Note:

If you don't have a data.frame but a matrix as input, another option would be apply() with MARGIN = 2.

x <- apply(df, MARGIN = 2, myfunction)

Although it works in this example as well, you will run into trouble when your columns have different data types, because apply() converts the data.frame to a matrix (which can hold only one type) before applying the function. It is therefore not recommended for data.frames. More info on that can be found in this detailed and easy-to-understand post!

Further reading on this:

Hadley Wickham's Advanced R. Also check out the section on data types on this site.

Peter Werner's blog post


I greatly appreciate the input of @Gregor on this post.

How to apply function to each row of dataframe in R?

You can use apply on the rows (MARGIN = 1).

apply(df, MARGIN = 1, function(x) concord(sourcevar = x[3], origin = x[1],
                                          destination = "HS4", dest.digit = x[2], all = FALSE))

However, this fails for rows whose Model is already "HS4", because there is no conversion dictionary from "HS4" to "HS4". So copy Code into a new column and apply concord() only to the rows that are not HS4:

df$New <- df$Code
to_convert <- df$Model != "HS4"  # concord() has no HS4 -> HS4 dictionary
df$New[to_convert] <- apply(df[to_convert, ], 1, \(x) concord(
  sourcevar = x[colnames(df) == "Code"], origin = x[colnames(df) == "Model"],
  destination = "HS4", dest.digit = x[colnames(df) == "Length"], all = FALSE))

  Model Length   Code    New
1   HS5      6 030299 030289
2   HS5      6 010121 010121
3   HS5      6 030448 030449
4   HS4      6 030324 030324

Apply a function to every column

In dplyr, you can use across to apply a function to multiple columns.

library(dplyr)
df <- df %>% mutate(across(starts_with('var'), ~./sd(.)))
df

# var1 var2 var3
# <dbl> <dbl> <dbl>
# 1 0.0384 0.118 0.707
# 2 0.0767 0.237 0.354
# 3 1.34 0.474 1.06
# 4 0.192 0.632 1.06
# 5 0.844 0.809 1.24
# 6 1.02 0.987 1.41

In base R, you can use lapply():

df[] <- lapply(df, function(x) x/sd(x))

To apply this only to selected columns (here, columns 1 to 168), you can do:

df[1:168] <- lapply(df[1:168], function(x) x/sd(x))
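For comparison, a pandas sketch of the same rescaling, restricted to columns whose names start with "var" (the data frame is made up). Dividing a DataFrame by a Series aligns on column labels, so each column is divided by its own standard deviation; pandas' std() uses ddof=1 by default, the same as R's sd().

```python
import pandas as pd

# Made-up data: only the columns whose names start with "var" are rescaled.
df = pd.DataFrame({
    "id": ["a", "b", "c"],
    "var1": [1.0, 2.0, 3.0],
    "var2": [2.0, 4.0, 9.0],
})

# Rough equivalent of across(starts_with('var'), ...).
var_cols = [c for c in df.columns if c.startswith("var")]

# df[var_cols].std() is a Series indexed by column name; the division
# aligns on those names, so each column is divided by its own sd.
df[var_cols] = df[var_cols] / df[var_cols].std()
print(df)
```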

Apply a function returning a data frame to each row in a data frame

You haven't shown what f does, but based on the comments it is written for data frames, so this should work:

lapply(split(d, seq_len(nrow(d))), f)

split() divides d into a list of one-row data frames, and lapply() then applies f to each of them.

You can also use by():

by(d, seq_len(nrow(d)), f)
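A rough pandas analogue of lapply(split(d, seq_len(nrow(d))), f), as a sketch assuming f takes a one-row DataFrame and returns a DataFrame; the f and the data below are hypothetical placeholders. Indexing with a list, d.iloc[[i]], keeps each row as a one-row DataFrame rather than turning it into a Series.

```python
import pandas as pd

# Hypothetical stand-in for f: takes a one-row DataFrame, returns a DataFrame.
def f(row_df):
    value = row_df["x"].iloc[0]
    return pd.DataFrame({"x": [value, value], "copy": [1, 2]})

d = pd.DataFrame({"x": [10, 20, 30]})

# Split d into one-row DataFrames and apply f to each (a list of results,
# like lapply(split(...), f)).
pieces = [f(d.iloc[[i]]) for i in range(len(d))]

# Optionally stack the pieces into a single DataFrame.
combined = pd.concat(pieces, ignore_index=True)
print(combined)
```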

how to apply a function to each row of a pandas dataframe, where the input to the function is the elements in the row in the form of a list

In your lambda, you've defined the incoming row as row, so you can just pass row.tolist():

df_sample['membership'] = df_sample.apply(
    lambda row: cluster_pred(row.tolist()), axis=1)
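A self-contained sketch of the same call; cluster_pred and the data are hypothetical stand-ins (this cluster_pred simply thresholds the sum of the row). With axis=1, apply passes each row as a Series, and tolist() turns it into a plain list of that row's values.

```python
import pandas as pd

# Hypothetical stand-in for cluster_pred: receives one row as a plain list.
def cluster_pred(values):
    return 0 if sum(values) < 10 else 1

df_sample = pd.DataFrame({"f1": [1, 5, 2], "f2": [2, 7, 3]})

# axis=1: the lambda gets each row as a Series; tolist() gives [f1, f2].
df_sample["membership"] = df_sample.apply(lambda row: cluster_pred(row.tolist()), axis=1)
print(df_sample)
```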

