Row/Column Counter in 'Apply' Functions

What I usually do is run sapply on the row numbers 1:nrow(test) instead of on test itself, and use test[i, ] inside the function:

t(sapply(1:nrow(test), function(i) test[i,]^(1/i)))

I am not sure this is really efficient, though.

getting the column of a row in a pandas apply function

You can directly modify the row Series and return the modified row Series.

def convert(row):
    for col in row.index:
        row[col] = f'({row.name}, {col}), {row[col]}'
    return row

df = df.apply(convert, axis=1)
print(df)

           X          Y          Z
a  (a, X), 1  (a, Y), 3  (a, Z), 5
b  (b, X), 2  (b, Y), 4  (b, Z), 6
c  (c, X), 3  (c, Y), 5  (c, Z), 7
d  (d, X), 4  (d, Y), 6  (d, Z), 8
e  (e, X), 5  (e, Y), 7  (e, Z), 9
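A minimal self-contained version of this approach (the two-row frame is an assumption for illustration; it is built with dtype=object so the string assignments don't conflict with an integer dtype):

```python
import pandas as pd

# hypothetical two-row frame; object dtype so strings can replace the numbers
df = pd.DataFrame({'X': [1, 2], 'Y': [3, 4]}, index=['a', 'b'], dtype=object)

def convert(row):
    # row.name is the row's index label; row.index holds the column names
    for col in row.index:
        row[col] = f'({row.name}, {col}), {row[col]}'
    return row

df = df.apply(convert, axis=1)
print(df.loc['a', 'X'])  # (a, X), 1
```

The key point is that inside an axis=1 apply, row.name carries the row label and row.index the column labels, so both coordinates are available at once.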

How to count the iteration over an apply function for rows in pandas

I figured out a workaround for this: if you are using a dataframe, add a counter column like this

input_df['counter'] = 0

for i, row in input_df.iterrows():
    input_df.loc[i, 'counter'] = i + 1  # .loc avoids chained-assignment warnings

Your apply statement:

input_df.apply(YourFunction,axis=1)

Your calling function:

def YourFunction(row):
    print(row['counter'])
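Putting the pieces together, a runnable sketch (the three-row input_df is an assumption for illustration, and it relies on a default RangeIndex so that i runs 0, 1, 2):

```python
import pandas as pd

# hypothetical input frame with a default RangeIndex
input_df = pd.DataFrame({'value': [10, 20, 30]})

# build the counter column; .loc avoids chained-assignment warnings
input_df['counter'] = 0
for i, row in input_df.iterrows():
    input_df.loc[i, 'counter'] = i + 1

def your_function(row):
    # the running counter is now visible inside the applied function
    return f"row {row['counter']}: value {row['value']}"

out = input_df.apply(your_function, axis=1)
print(out.tolist())  # ['row 1: value 10', 'row 2: value 20', 'row 3: value 30']
```

With a default RangeIndex the loop is equivalent to the one-liner input_df['counter'] = range(1, len(input_df) + 1), which avoids iterrows entirely.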

R: Extract the row and column of the element in use when using an apply function

I'm not entirely sure what you're trying to do, but I would use a for loop here.

Pre-allocate the return matrix, and this will be very fast:

ret <- mymatrix
for (i in 1:nrow(mymatrix))
  for (j in 1:ncol(mymatrix))
    ret[i, j] <- sum(mymatrix[i, j], i, j)
#     [,1] [,2] [,3] [,4]
#[1,]    3    7   11   15
#[2,]    5    9   13   17
#[3,]    7   11   15   19

Benchmark analysis 1

I was curious, so I ran a microbenchmark analysis to compare the methods; I used a bigger 200 x 300 matrix.

mymatrix <- matrix(1:600, nrow = 200, ncol = 300)
library(microbenchmark)
res <- microbenchmark(
  for_loop = {
    ret <- mymatrix
    for (i in 1:nrow(mymatrix))
      for (j in 1:ncol(mymatrix))
        ret[i, j] <- sum(mymatrix[i, j], i, j)
  },
  expand_grid_mapply = {
    newResult <- mymatrix
    grid1 <- expand.grid(1:nrow(mymatrix), 1:ncol(mymatrix))
    newResult[] <- mapply(function(row_number, col_number) {
      sum(mymatrix[row_number, col_number], row_number, col_number)
    }, row_number = grid1$Var1, col_number = grid1$Var2)
  },
  expand_grid_apply = {
    newResult <- mymatrix
    grid1 <- expand.grid(1:nrow(mymatrix), 1:ncol(mymatrix))
    newResult[] <- apply(grid1, 1, function(x) {
      sum(mymatrix[x[1], x[2]], x[1], x[2])
    })
  },
  double_sapply = {
    sapply(1:ncol(mymatrix), function(x)
      sapply(1:nrow(mymatrix), function(y) sum(mymatrix[y, x], x, y)))
  }
)

res
#Unit: milliseconds
#               expr       min        lq      mean    median       uq       max
#           for_loop  41.42098  52.72281  56.86675  56.38992  59.1444  82.89455
# expand_grid_mapply 126.98982 161.79123 183.04251 182.80331 196.1476 332.94854
#  expand_grid_apply 295.73234 354.11661 375.39308 375.39932 391.6888 562.59317
#      double_sapply  91.80607 111.29787 120.66075 120.37219 126.0292 230.85411

library(ggplot2)
autoplot(res)


Benchmark analysis 2 (with expand.grid outside of microbenchmark)

grid1 <- expand.grid(1:nrow(mymatrix),1:ncol(mymatrix))
res <- microbenchmark(
  for_loop = {
    ret <- mymatrix
    for (i in 1:nrow(mymatrix))
      for (j in 1:ncol(mymatrix))
        ret[i, j] <- sum(mymatrix[i, j], i, j)
  },
  expand_grid_mapply = {
    newResult <- mymatrix
    newResult[] <- mapply(function(row_number, col_number) {
      sum(mymatrix[row_number, col_number], row_number, col_number)
    }, row_number = grid1$Var1, col_number = grid1$Var2)
  },
  expand_grid_apply = {
    newResult <- mymatrix
    newResult[] <- apply(grid1, 1, function(x) {
      sum(mymatrix[x[1], x[2]], x[1], x[2])
    })
  }
)

res
#Unit: milliseconds
#               expr       min        lq      mean    median        uq      max
#           for_loop  39.65599  54.52077  60.87034  59.19354  66.64983  95.7890
# expand_grid_mapply 130.33573 167.68201 194.39764 186.82411 209.33490 400.9273
#  expand_grid_apply 296.51983 373.41923 405.19549 403.36825 427.41728 597.6937

Applying the counter from collection to a column in a dataframe

IIUC, your comparison to pandas was only to explain your goal and you want to work with lists?

You can use:

l = [['FollowFriday', 'Awesome'],
     ['Covid_19', 'corona', 'Notagain'],
     ['Awesome'],
     ['FollowFriday', 'Awesome'],
     [],
     ['corona', 'Notagain'],
     ]

from collections import Counter
from itertools import chain

out = Counter(chain.from_iterable(l))

or if you have a Series of lists, use explode:

out = Counter(df['column'].explode())
# OR
out = df['column'].explode().value_counts()

output:

Counter({'Awesome': 3,
         'FollowFriday': 2,
         'corona': 2,
         'Notagain': 2,
         'Covid_19': 1})
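If the lists live in a DataFrame column, the explode route above can be sketched like this (the frame is an assumption; dropna discards the NaN that explode emits for an empty list):

```python
import pandas as pd
from collections import Counter

# hypothetical frame holding a column of lists, including an empty one
df = pd.DataFrame({'column': [['FollowFriday', 'Awesome'],
                              ['corona', 'Notagain'],
                              [],
                              ['Awesome']]})

# explode turns each list element into its own row (empty lists become NaN)
out = Counter(df['column'].explode().dropna())
print(out['Awesome'])  # 2
```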

Pandas apply function to each row by calculating multiple columns

IIUC, you can use:

out = (df
.groupby('name')
.apply(lambda g: g['amount'].mul(g['con']).sum()/g['amount'].sum())
)

output:

name
a 5.842105
b 4.571429
c 10.000000
dtype: float64
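The lambda computes, for each group, the amount-weighted average of con. A self-contained sketch with made-up numbers (the question's original data is not shown):

```python
import pandas as pd

# hypothetical data: 'con' values weighted by 'amount' within each name
df = pd.DataFrame({'name':   ['a', 'a', 'b'],
                   'amount': [2, 3, 4],
                   'con':    [10, 20, 5]})

# per group: sum(amount * con) / sum(amount), i.e. a weighted mean of 'con'
out = (df
 .groupby('name')
 .apply(lambda g: g['amount'].mul(g['con']).sum() / g['amount'].sum())
)
print(out['a'])  # (2*10 + 3*20) / (2 + 3) = 16.0
```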

Create function to count values across list of columns

rowSums can give you the results the OP is looking for. This returns the count of ratings == 4 for each row:

rowSums(df[2:5]==4)

#1 2 3 4
#1 0 3 1

Alternatively, just part of the function from the OP gives the same answer:

apply(df[2:5], 1, function(x)(sum(x==4)))
#1 2 3 4
#1 0 3 1

How can I apply a function to columns in a Pandas dataframe that includes a count of NaN in each column?

Thanks to Datanovice and vb_rises, the answer is:

df.apply(lambda x : x + df.isnull().sum(), axis=1)

If anyone has a similar question, I wanted the answer to be clear and without the need to read through the comments. I had thought that axis=1 (apply the function row by row, with each row passed as a Series indexed by the columns) was the default in Pandas, but the default for apply is actually axis=0, which passes one column at a time.
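A small sketch of the axis difference (the frame is hypothetical): with the default axis=0 the function receives one column at a time; with axis=1 it receives one row at a time, which is what lets each row be shifted by the per-column NaN counts:

```python
import pandas as pd
import numpy as np

# hypothetical frame with a single NaN in column A
df = pd.DataFrame({'A': [1.0, np.nan], 'B': [3.0, 4.0]})

nan_counts = df.isnull().sum()           # per-column NaN counts: A -> 1, B -> 0

# default axis=0: the lambda sees each *column* as a Series
col_sums = df.apply(lambda s: s.sum())   # A -> 1.0 (NaN skipped), B -> 7.0

# axis=1: the lambda sees each *row* as a Series indexed by the columns
out = df.apply(lambda row: row + nan_counts, axis=1)
print(out.loc[0, 'A'])  # 2.0  (1.0 plus the one NaN in column A)
```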

R: avoid turning one-row data frames into a vector when using apply functions

You can solve your problem by using lapply instead of sapply, and then combining the results using do.call as follows:

new_df <- as.data.frame(lapply(mydf[, -1, drop = FALSE], function(x) gsub("\\s+", "_", x)))
new_df <- do.call(cbind, new_df)
new_df
#     value1 value2
#[1,] "A_1"  "Z_1"

new_df <- cbind(mydf[, 1, drop = FALSE], new_df)
new_df
#  ID value1 value2
#1  A    A_1    Z_1

As for your question about the unpredictable behavior of sapply: the s in sapply stands for simplification, but the simplified result is not guaranteed to be a data frame. It can be a data frame, a matrix, or a vector.

According to the documentation of sapply:

sapply is a user-friendly version and wrapper of lapply by default
returning a vector, matrix or, if simplify = "array", an array if
appropriate, by applying simplify2array().

On the simplify argument:

logical or character string; should the result be simplified
to a vector, matrix or higher dimensional array if possible? For
sapply it must be named and not abbreviated. The default value, TRUE,
returns a vector or matrix if appropriate, whereas if simplify =
"array" the result may be an array of “rank” (=length(dim(.))) one
higher than the result of FUN(X[[i]]).

The Details section explains behavior that looks similar to what you experienced (emphasis mine):

Simplification in sapply is only attempted if X has length greater than zero and if the return values from all elements of X are all of the same (positive) length. If the common length is one the result is a vector, and if greater than one is a matrix with a column corresponding to each element of X.

Hadley Wickham also recommends avoiding sapply:

I recommend that you avoid sapply() because it tries to simplify the result, so it can return a list, a vector, or a matrix. This makes it difficult to program with, and it should be avoided in non-interactive settings.

He also recommends not using apply with a data frame. See Advanced R for further explanation.
