Apply function to each cell in DataFrame
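You can use applymap(), which is concise for your case. Note that pandas 2.1 deprecated applymap() in favor of DataFrame.map(), which does the same thing.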
df.applymap(foo_bar)
#      A    B        C
#0   wow  bar  wow bar
#1   bar  wow  wow bar
Another option is to vectorize your function and then use the apply method:
import numpy as np
df.apply(np.vectorize(foo_bar))
#      A    B        C
#0   wow  bar  wow bar
#1   bar  wow  wow bar
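The answer above does not show foo_bar or the input frame, so here is a minimal, self-contained sketch under assumptions: foo_bar is taken to replace "foo" with "wow" in a string cell, and the input values are guessed from the printed output. It also picks DataFrame.map when the installed pandas provides it and falls back to applymap otherwise.

```python
import pandas as pd


def foo_bar(value):
    # Assumed behaviour (not shown in the answer): swap "foo" for "wow".
    return value.replace("foo", "wow")


# Assumed input, reconstructed from the output printed in the answer.
df = pd.DataFrame({
    "A": ["foo", "bar"],
    "B": ["bar", "foo"],
    "C": ["foo bar", "foo bar"],
})

# DataFrame.map was added in pandas 2.1 as the replacement for applymap;
# older versions only have applymap.
elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap
result = elementwise(foo_bar)
print(result)
```

Either method calls foo_bar once per cell and returns a new DataFrame with the same index and columns.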
How to apply a function to every element in a dataframe?
Since your problem requires access to both the index and column labels of your df, you probably want df.apply(). df.apply() passes your function a pandas.Series representing each row or column (depending on the axis argument), so you have access to the column name and the index. df.applymap(), by contrast, passes each individual value of df, so the function would not have access to the index and column name it needs.
Example
import numpy as np
import pandas as pd
def foo(name, index):
    return name - index
x = np.arange(0, 2.01, 0.25)
y = np.arange(10, 30, 5.0)
df = pd.DataFrame(index = x, columns = y)
df.apply(lambda x: foo(x.name, x.index))
Output
      10.0   15.0   20.0   25.0
0.00  10.00  15.00  20.00  25.00
0.25   9.75  14.75  19.75  24.75
0.50   9.50  14.50  19.50  24.50
0.75   9.25  14.25  19.25  24.25
1.00   9.00  14.00  19.00  24.00
1.25   8.75  13.75  18.75  23.75
1.50   8.50  13.50  18.50  23.50
1.75   8.25  13.25  18.25  23.25
2.00   8.00  13.00  18.00  23.00
In the above example, the column name and index of each Series making up df are passed to foo() by way of df.apply(). Within foo(), each value is computed by subtracting its index value from its column name. The index values are accessed with x.index and the column name with x.name inside the lambda passed to df.apply().
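For this particular computation (column label minus index label), the same table can also be built without apply. The following is only a sketch of an alternative, using NumPy broadcasting instead of the answer's approach; the names values and table are introduced here for illustration.

```python
import numpy as np
import pandas as pd

x = np.arange(0, 2.01, 0.25)  # index labels, as in the example
y = np.arange(10, 30, 5.0)    # column labels, as in the example

# Broadcasting a (1, 4) row of column labels against a (9, 1) column of
# index labels gives "column label minus index label" for every cell.
values = y[np.newaxis, :] - x[:, np.newaxis]
table = pd.DataFrame(values, index=x, columns=y)
print(table)
```

This only helps when the function can be written as array arithmetic; for arbitrary Python functions, df.apply() as shown above remains the general tool.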
Update
Many thanks to @SyntaxError for pointing out that x.index and x.name could be passed to foo() within df.apply() instead of feeding the entire Series (x) into the function and accessing the values there. As mentioned, this fits OP's use case much more neatly than my original response, which was largely the same but passed each Series x into foo(), which was then responsible for extracting x.name and x.index.
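The original response is not reproduced in the update, so the following is only a guess at what that earlier variant may have looked like: foo() receives the whole Series and reads the name and index itself. Wrapping the result in a pd.Series is a choice made for this sketch, not something taken from the answer.

```python
import numpy as np
import pandas as pd


def foo(column):
    # column.name is the column label; column.index holds the row labels.
    # Returning a Series keyed by the same index lets apply() assemble
    # the per-column results back into a DataFrame.
    return pd.Series(column.name - column.index, index=column.index)


x = np.arange(0, 2.01, 0.25)
y = np.arange(10, 30, 5.0)
df = pd.DataFrame(index=x, columns=y)

result = df.apply(foo)  # foo is called once per column (axis=0 by default)
print(result)
```

Passing only x.name and x.index, as in the updated answer, keeps foo() independent of pandas, which makes it easier to reuse and test.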
Apply function to every element in data.frame and return data.frame
df <- data.frame(c(1,2,3), c(2,3,4))
df[] <- lapply(df, function(x) paste(x,"x", sep=""))
df
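Assigning to df[] preserves the data frame's structure: lapply returns a list, and df[] <- puts that list back into the existing columns instead of replacing df with a plain list.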
Apply a function to each element of each dataframe in a list
Apply the same functions using lapply. This applies the centering and scaling steps together.
lapply(l, function(y) apply(y, 2, function(x) {
x = x - mean(x)
x/sqrt(sd(x))
}))
#[[1]]
# A B
#[1,] -0.5946036 -0.8408964
#[2,] 0.5946036 0.8408964
#[[2]]
# A B
#[1,] -1.3201676 -1.3201676
#[2,] -0.4400559 -0.4400559
#[3,] 0.4400559 0.4400559
#[4,] 1.3201676 1.3201676
If you want them separately:
centered <- lapply(l, function(y) apply(y, 2, function(x) x - mean(x)))
scaled <- lapply(centered, function(y) apply(y, 2, function(x) x/sqrt(sd(x))))
Apply a function to each column of a data.frame and organize the output
In this case, you would like to use lapply. It handles each column of the data.frame separately, since a data.frame is actually a list of equal-length vectors, and returns a two-column data.frame for each one.
x <- lapply(df, myfunction)
sapply also works just fine. The only difference is how the result is structured, so it looks different at first. See print(x) for the difference between all solutions.
x <- sapply(df, myfunction)
Afterwards you probably want to combine them from a list into a data.frame again. You can do this with do.call:
df2 <- do.call(cbind, x)
This will mess up the column names. You can change these using names:
names(df2) <- NULL
df2
# 1 5 0.0 5 0.0 5 0.0
# 2 2 0.0 2 0.0 2 0.0
# 3 -4 -4.0 -4 -4.0 -4 -4.0
# 4 -6 -8.5 -6 -8.5 -6 -8.5
# ....
Side Note:
If you don't have a data.frame but a matrix as input, another option would be apply with MARGIN = 2.
x <- apply(df, MARGIN = 2, myfunction)
Although it works in this example, you will run into trouble when your columns have differing data types, because apply converts the data.frame to a matrix before applying the function. It is therefore not recommended for data frames. More info on that can be found in this detailed and easy-to-understand post!
Further reading on this:
Hadley Wickham's Advanced R. Also check out the section on data types on this site.
Peter Werner's blog post
I greatly appreciate the input of @Gregor on this post.
How to apply function to each row of dataframe in R?
You can use apply on the rows (MARGIN = 1).
apply(df, MARGIN = 1, function(x) concord(sourcevar = x[3], origin = x[1], destination = "HS4", dest.digit = x[2], all = F))
However, this does not work because there is no conversion dictionary between "HS4" and "HS4", so you can use apply only on the rows that are not HS4:
df$New <- df$Code
df[df$Model != "HS4", ]$New <- apply(df[df$Model != "HS4", ], 1, \(x) concord(sourcevar = x[colnames(df) == "Code"],
origin = x[colnames(df) == "Model"], destination = "HS4",
dest.digit = x[colnames(df) == "Length"], all = F))
Model Length Code New
1 HS5 6 030299 030289
2 HS5 6 010121 010121
3 HS5 6 030448 030449
4 HS4 6 030324 030324
Apply a function to every column
In dplyr, you can use across to apply a function to multiple columns.
library(dplyr)
df <- df %>% mutate(across(starts_with('var'), ~./sd(.)))
df
# var1 var2 var3
# <dbl> <dbl> <dbl>
# 1 0.0384 0.118 0.707
# 2 0.0767 0.237 0.354
# 3 1.34 0.474 1.06
# 4 0.192 0.632 1.06
# 5 0.844 0.809 1.24
# 6 1.02 0.987 1.41
In base R, we can use lapply:
df[] <- lapply(df, function(x) x/sd(x))
To apply this to selected columns (1:168), you can do:
df[1:168] <- lapply(df[1:168], function(x) x/sd(x))
Apply a function returning a data frame to each row in a data frame
You haven't shown what you have in f, but based on the comments it is written for data frames, so this should work:
lapply(split(d, seq_len(nrow(d))), f)
split divides d into one-row data frames, and lapply then applies the function f to each of them.
You can also use by:
by(d, seq_len(nrow(d)), f)
How to apply a function to each row of a pandas dataframe, where the input to the function is the elements in the row in the form of a list
In your lambda, you've defined the incoming row as row, so you can just pass row.tolist():
df_sample['membership'] = df_sample.apply(
    lambda row: cluster_pred(row.tolist()), axis=1)
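Since neither cluster_pred nor df_sample is shown, here is a small runnable sketch with stand-ins for both. The placeholder cluster_pred simply returns the position of the largest value in the list; the column names f1, f2 and f3 and the data are made up for illustration.

```python
import pandas as pd


def cluster_pred(features):
    # Hypothetical placeholder for the real function: takes a plain
    # Python list and returns the position of its largest element.
    return features.index(max(features))


# Made-up sample data; the real df_sample is not shown in the question.
df_sample = pd.DataFrame({
    "f1": [0.1, 0.9],
    "f2": [0.7, 0.2],
    "f3": [0.3, 0.4],
})

# axis=1 hands each row to the lambda as a Series; tolist() converts it
# to the list that cluster_pred expects.
df_sample["membership"] = df_sample.apply(
    lambda row: cluster_pred(row.tolist()), axis=1)
print(df_sample)
```

One thing to watch: once the membership column exists, running the same line again would include that column in each row's list, so it may be safer to select the feature columns explicitly before calling apply.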