calculate row sum and product in data.frame
Try
transform(df, sum=rowSums(df), prod=x*y*z)
# x y z sum prod
#1 1 2 3 6 6
#2 2 3 4 9 24
#3 5 1 2 8 10
Or
transform(df, sum=rowSums(df), prod=Reduce(`*`, df))
# x y z sum prod
#1 1 2 3 6 6
#2 2 3 4 9 24
#3 5 1 2 8 10
Another option would be to use rowProds
from matrixStats
library(matrixStats)
transform(df, sum=rowSums(df), prod=rowProds(as.matrix(df)))
If you are using apply
df[,c('sum', 'prod')] <- t(apply(df, 1, FUN=function(x) c(sum(x), prod(x))))
df
# x y z sum prod
#1 1 2 3 6 6
#2 2 3 4 9 24
#3 5 1 2 8 10
Product Sum for two rows from two dataframes and add sum to a new column
Since you are interested in just the first row of w_df
, you select just that row, multiply main_df
with it, using the mul function, along the row axis; after the product computation, you can them sum along the axis.
main_df.assign(prod_sum=main_df.mul(w_df.iloc[0].array, axis=1).sum(1))
A B prod_sum
0 3 4 1.1
1 5 6 1.7
2 7 8 2.3
Pandas: Element-wise sum-product of data frame of values using a another data frame containing row weights
Try:
# Convert each list to numpy array if it's not already the case
df1 = df1.applymap(np.array)
vals = np.sum((df1.values * df2.values), axis=1)
out = pd.DataFrame.from_records(vals, index=df1.index, columns=df1.columns)
Output:
>>> out
x y z
0 6 9 12
1 30 36 42
# Intermediate result
>>> df1.values * df2.values
[[array([1, 2, 3]) array([2, 3, 4]) array([3, 4, 5])]
[array([ 8, 10, 12]) array([10, 12, 14]) array([12, 14, 16])]]
R sum of vector multiplied by each row of data frame
Try:
transform(df, prod = rowSums(sweep(df[, 3:5], 2, v, `*`)))
Output:
prod sd c1 c2 c3
1 0.175 NA 0.5 0.25 0.25
2 0.150 NA 0.5 0.50 0.00
3 0.200 NA 0.5 0.00 0.50
How to get sum of product from columns in 2 data frame using PySpark
Here's a simpler example derived from your sample dataframes. I think it should also be scalable to your real data.
df1.show()
+-------+------+----+-----+
|StoreId|ItemID|Date|Price|
+-------+------+----+-----+
| HH-101| item1| d_1| €9|
| HH-101| item1| d_2| €7|
+-------+------+----+-----+
df2.show()
+-------+------+---+---+
|StoreId|ItemID|d_1|d_2|
+-------+------+---+---+
| HH-101| item1| 2| 4|
| HH-101| item2| 1| 0|
+-------+------+---+---+
You can unpivot df2
using stack
with a query string generated from a list comprehension of the column names, then join to df1
using the first 3 columns, group by the store id and item id, and get the sum of price * number
.
result = df2.selectExpr(
'StoreId', 'ItemID',
'stack(2, ' + ', '.join(["'%s', %s" % (c, c) for c in df2.columns[2:]]) + ') as (Date, Number)'
# "stack(2, 'd_1', d_1, 'd_2', d_2) as (Date, Number)"
).join(
df1, df1.columns[:3]
).groupBy(
'StoreId', 'ItemID'
).agg(
F.expr('sum(Number * float(substr(Price, 2))) as Total')
)
result.show()
+-------+------+-----+
|StoreId|ItemID|Total|
+-------+------+-----+
| HH-101| item1| 46.0|
+-------+------+-----+
How to calculate sumproduct in pandas by column?
You can do this, assuming ID is not in the index:
df.loc[5, :] = df.iloc[:,1:-1].mul(df['val'], axis=0).sum()
Output:
ID 2000-01 2000-02 2000-03 2000-04 2000-05 val
0 1.0 2847.0 2861.0 2875.0 2890.0 2904.0 94717.0
1 2.0 1338.0 1343.0 1348.0 1353.0 1358.0 70105.0
2 3.0 3301.0 3311.0 3321.0 3331.0 3341.0 60307.0
3 4.0 1425.0 1422.0 1419.0 1416.0 1413.0 79888.0
5 NaN 676373596.0 678413565.0 680453534.0 682588220.0 684628189.0 NaN
Use pandas.DataFrame.mul
with axis=0 then sum and let pandas intrinsic data alignment put the values in the correct column based on indexing.
Calculate sum for column in dataframe using pandas
Use Series.clip
with sum
:
pos = df['amount'].clip(lower=0).sum()
neg = df['amount'].clip(upper=0).sum()
print (pos)
132.87
print (neg)
-111.3
Or DataFrame.loc
with sum
and filtering with Series.gt
for greater, ~
is used for invert mask for negative values:
m = df["amount"].gt(0)
pos = df.loc[m, "amount"].sum()
neg = df.loc[~m, "amount"].sum()
print (pos)
132.87
print (neg)
-111.3
How to replicate excel sumproduct function correctly in python?
Pandas supports (and enforces) data alignment. When you apply an operation to two data frames, the operation is applied to the rows and columns with the same index (name), not at the same position. To apply operations to a pair of columns with different names, you should extract the underlying numpy arrays from them:
# Clean the NAs
import numpy as np
df1.replace("N/A", np.nan, inplace=True)
(df1[cols_left].fillna(0).values * df1[cols_right].values).sum() / df1[cols_right].sum(1)
#0 15.25
Note that nan * 0
is still a nan
. You must convert nan
s to finite numbers (e.g., to 0s) to obtain a numeric result.
Tidyverse solution for rowise sum of products over multiple columns
To obtain a completely generalizable and robust solution, I think it's best to transform the data frame to something more amenable to the task in hand.
df %>%
mutate(row=row_number()) %>%
pivot_longer(
-row,
names_sep="_",
names_to=c("name", "index")
) %>%
group_by(row, index) %>%
pivot_wider(names_from=name, values_from=value)
# A tibble: 6 x 4
# Groups: row, index [6]
row index x y
<int> <chr> <dbl> <dbl>
1 1 0 5 3
2 1 1 9 3
3 1 2 2 1
4 2 0 6 2
5 2 1 1 2
6 2 2 1 3
Then calculate the sum of products...
df %>%
mutate(row=row_number()) %>%
pivot_longer(
-row,
names_sep="_",
names_to=c("name", "index")
) %>%
group_by(row, index) %>%
pivot_wider(names_from=name, values_from=value) %>%
mutate(product=x * y) %>%
group_by(row) %>%
summarise(sum_product=sum(product))
# A tibble: 2 x 2
row sum_product
<int> <dbl>
1 1 44
2 2 17
This is robust to the number of rows, the number of variable types (eg x
, y
and z
) and the number of indices (eg 1
, 2
and 3
).
Edit
My claim that the solution above is robust respect to number of variable types is false. (Because of the stage in the pipe that reads mutate(product=x * y)
.) Here's a solution that is, together with a modified input dataset to demonstrate that it is.
df1 <- tibble(
x_0 = c(5,6,1,-1), x_1 = c(9,1,1,3), x_2 = c(2,1,3,4),
y_0 = c(3,2,1, 2), y_1 = c(3,2,2,2), y_2 = c(1,3,2,2),
z_0 = c(4,5,1, 3), z_1 = c(3,1,2,1), z_2 = c(2,2,1,3)
)
df1 %>%
mutate(row=row_number()) %>%
pivot_longer(
-row,
names_sep="_",
names_to=c("name", "index")
) %>%
group_by(row, index) %>%
pivot_wider(names_from=name, values_from=value) %>%
group_map(
function(.x, .y, .keep=TRUE) {
.y %>% bind_cols(.x %>% mutate(product = unlist(apply(.x, 1, prod))))
}
) %>% bind_rows() %>%
group_by(row) %>%
summarise(sum_product=sum(product))
# A tibble: 4 x 2
row sum_product
<int> <dbl>
1 1 145
2 2 68
3 3 11
4 4 24
Related Topics
How Do We Plot Images at Given Coordinates in R
Using R to Connect to a Sharepoint List
Package Domc Not Available for R Version 3.0.0 Warning in Install.Packages
How to Extract Multiples of a Number from a Vector
How to Combine Multiple .CSV Files in R
Glmulti and Liner Mixed Models
Str_Replace (Package Stringr) Cannot Replace Brackets in R
Remove Numbers at the Beginning and End of a String
R - Compute Cross Product of Vectors (Physics)
How to Get Confidence Interval for Smooth.Spline
Print R-Squared for All of the Models Fit with Lmlist
Replace Nan Values in a List with Zero (0)
Solving a System of Nonlinear Equations in R
Page Refresh Button in R Shiny