Calculate Row Sum and Product in Data.Frame

calculate row sum and product in data.frame

Try

 transform(df, sum=rowSums(df), prod=x*y*z)
# x y z sum prod
#1 1 2 3 6 6
#2 2 3 4 9 24
#3 5 1 2 8 10

Or

 transform(df, sum=rowSums(df), prod=Reduce(`*`, df))
# x y z sum prod
#1 1 2 3 6 6
#2 2 3 4 9 24
#3 5 1 2 8 10

Another option would be to use rowProds from matrixStats

 library(matrixStats)
transform(df, sum=rowSums(df), prod=rowProds(as.matrix(df)))

If you are using apply

 df[,c('sum', 'prod')] <-  t(apply(df, 1, FUN=function(x) c(sum(x), prod(x))))
df
# x y z sum prod
#1 1 2 3 6 6
#2 2 3 4 9 24
#3 5 1 2 8 10

Product Sum for two rows from two dataframes and add sum to a new column

Since you are interested in just the first row of w_df, you select just that row, multiply main_df with it, using the mul function, along the row axis; after the product computation, you can them sum along the axis.

main_df.assign(prod_sum=main_df.mul(w_df.iloc[0].array, axis=1).sum(1))

A B prod_sum
0 3 4 1.1
1 5 6 1.7
2 7 8 2.3

Pandas: Element-wise sum-product of data frame of values using a another data frame containing row weights

Try:

# Convert each list to numpy array if it's not already the case
df1 = df1.applymap(np.array)

vals = np.sum((df1.values * df2.values), axis=1)
out = pd.DataFrame.from_records(vals, index=df1.index, columns=df1.columns)

Output:

>>> out
x y z
0 6 9 12
1 30 36 42

# Intermediate result
>>> df1.values * df2.values
[[array([1, 2, 3]) array([2, 3, 4]) array([3, 4, 5])]
[array([ 8, 10, 12]) array([10, 12, 14]) array([12, 14, 16])]]

R sum of vector multiplied by each row of data frame

Try:

transform(df, prod = rowSums(sweep(df[, 3:5], 2, v, `*`)))

Output:

   prod sd  c1   c2   c3
1 0.175 NA 0.5 0.25 0.25
2 0.150 NA 0.5 0.50 0.00
3 0.200 NA 0.5 0.00 0.50

How to get sum of product from columns in 2 data frame using PySpark

Here's a simpler example derived from your sample dataframes. I think it should also be scalable to your real data.

df1.show()
+-------+------+----+-----+
|StoreId|ItemID|Date|Price|
+-------+------+----+-----+
| HH-101| item1| d_1| €9|
| HH-101| item1| d_2| €7|
+-------+------+----+-----+

df2.show()
+-------+------+---+---+
|StoreId|ItemID|d_1|d_2|
+-------+------+---+---+
| HH-101| item1| 2| 4|
| HH-101| item2| 1| 0|
+-------+------+---+---+

You can unpivot df2 using stack with a query string generated from a list comprehension of the column names, then join to df1 using the first 3 columns, group by the store id and item id, and get the sum of price * number.

result = df2.selectExpr(
'StoreId', 'ItemID',
'stack(2, ' + ', '.join(["'%s', %s" % (c, c) for c in df2.columns[2:]]) + ') as (Date, Number)'
# "stack(2, 'd_1', d_1, 'd_2', d_2) as (Date, Number)"
).join(
df1, df1.columns[:3]
).groupBy(
'StoreId', 'ItemID'
).agg(
F.expr('sum(Number * float(substr(Price, 2))) as Total')
)

result.show()
+-------+------+-----+
|StoreId|ItemID|Total|
+-------+------+-----+
| HH-101| item1| 46.0|
+-------+------+-----+

How to calculate sumproduct in pandas by column?

You can do this, assuming ID is not in the index:

 df.loc[5, :] = df.iloc[:,1:-1].mul(df['val'], axis=0).sum()

Output:

    ID      2000-01      2000-02      2000-03      2000-04      2000-05      val
0 1.0 2847.0 2861.0 2875.0 2890.0 2904.0 94717.0
1 2.0 1338.0 1343.0 1348.0 1353.0 1358.0 70105.0
2 3.0 3301.0 3311.0 3321.0 3331.0 3341.0 60307.0
3 4.0 1425.0 1422.0 1419.0 1416.0 1413.0 79888.0
5 NaN 676373596.0 678413565.0 680453534.0 682588220.0 684628189.0 NaN

Use pandas.DataFrame.mul with axis=0 then sum and let pandas intrinsic data alignment put the values in the correct column based on indexing.

Calculate sum for column in dataframe using pandas

Use Series.clip with sum:

pos = df['amount'].clip(lower=0).sum()
neg = df['amount'].clip(upper=0).sum()
print (pos)
132.87
print (neg)
-111.3

Or DataFrame.loc with sum and filtering with Series.gt for greater, ~ is used for invert mask for negative values:

m = df["amount"].gt(0)
pos = df.loc[m, "amount"].sum()
neg = df.loc[~m, "amount"].sum()
print (pos)
132.87
print (neg)
-111.3

How to replicate excel sumproduct function correctly in python?

Pandas supports (and enforces) data alignment. When you apply an operation to two data frames, the operation is applied to the rows and columns with the same index (name), not at the same position. To apply operations to a pair of columns with different names, you should extract the underlying numpy arrays from them:

# Clean the NAs
import numpy as np
df1.replace("N/A", np.nan, inplace=True)

(df1[cols_left].fillna(0).values * df1[cols_right].values).sum() / df1[cols_right].sum(1)
#0 15.25

Note that nan * 0 is still a nan. You must convert nans to finite numbers (e.g., to 0s) to obtain a numeric result.

Tidyverse solution for rowise sum of products over multiple columns

To obtain a completely generalizable and robust solution, I think it's best to transform the data frame to something more amenable to the task in hand.

df %>% 
mutate(row=row_number()) %>%
pivot_longer(
-row,
names_sep="_",
names_to=c("name", "index")
) %>%
group_by(row, index) %>%
pivot_wider(names_from=name, values_from=value)
# A tibble: 6 x 4
# Groups: row, index [6]
row index x y
<int> <chr> <dbl> <dbl>
1 1 0 5 3
2 1 1 9 3
3 1 2 2 1
4 2 0 6 2
5 2 1 1 2
6 2 2 1 3

Then calculate the sum of products...

df %>% 
mutate(row=row_number()) %>%
pivot_longer(
-row,
names_sep="_",
names_to=c("name", "index")
) %>%
group_by(row, index) %>%
pivot_wider(names_from=name, values_from=value) %>%
mutate(product=x * y) %>%
group_by(row) %>%
summarise(sum_product=sum(product))
# A tibble: 2 x 2
row sum_product
<int> <dbl>
1 1 44
2 2 17

This is robust to the number of rows, the number of variable types (eg x, y and z) and the number of indices (eg 1, 2 and 3).

Edit

My claim that the solution above is robust respect to number of variable types is false. (Because of the stage in the pipe that reads mutate(product=x * y).) Here's a solution that is, together with a modified input dataset to demonstrate that it is.

df1 <- tibble(
x_0 = c(5,6,1,-1), x_1 = c(9,1,1,3), x_2 = c(2,1,3,4),
y_0 = c(3,2,1, 2), y_1 = c(3,2,2,2), y_2 = c(1,3,2,2),
z_0 = c(4,5,1, 3), z_1 = c(3,1,2,1), z_2 = c(2,2,1,3)

)

df1 %>%
mutate(row=row_number()) %>%
pivot_longer(
-row,
names_sep="_",
names_to=c("name", "index")
) %>%
group_by(row, index) %>%
pivot_wider(names_from=name, values_from=value) %>%
group_map(
function(.x, .y, .keep=TRUE) {
.y %>% bind_cols(.x %>% mutate(product = unlist(apply(.x, 1, prod))))
}
) %>% bind_rows() %>%
group_by(row) %>%
summarise(sum_product=sum(product))
# A tibble: 4 x 2
row sum_product
<int> <dbl>
1 1 145
2 2 68
3 3 11
4 4 24


Related Topics



Leave a reply



Submit