Sum Duplicate Row Values

sum duplicate row with condition using pandas

Can try summing only the unique values per group:

def sum_unique(s):
return s.unique().sum()

df2 = df.groupby('Name', sort=False, as_index=False).agg(
Name=('Name', 'first'),
rent=('rent', sum_unique),
sale=('sale', sum_unique)
)

df2:

  Name  rent  sale
0 A 180 7
1 B 1 4
2 M 14 20
3 O 10 1

How to SUM duplicate row in CASE statement calculation

I suspect that you want distinct:

SELECT 
user_id,
COALESCE(COUNT(DISTINCT CASE WHEN score_flag = 'Y' THEN item END), 0) AS Compliant_Count,
COALESCE(COUNT(DISTINCT CASE WHEN score_flag = 'N' THEN item END), 0) AS NonCompliant_Count,
COUNT(DISTINCT CASE WHEN score_flag IN ('Y', 'N') THEN item END) AS Total_count
FROM mytable
GROUP BY user_id

Notes:

  • there is no user table alias defined in the query, so user.id is invalid. I renamed it to user_id
  • do not use single quotes for column aliases - single quotes should used for string literals only
  • If Y and N are the only possible values, the last expression can be simplified as just COUNT(DISTINCT item)

Get the sum of all duplicate rows for a each column without hard coding the column name?

We can specify the . which signifies all the rest of the columns other than the 'name' column in aggregate

aggregate(. ~ name, data = dat, sum)
name etc4 etc1 etc2
1 A 30 19 11
2 B 2 7 5
3 C 90 12 1

Or if we need more fine control i.e if there are other columns as well and want to avoid, either subset the data with select or use cbind

aggregate(cbind(etc1, etc2, etc4) ~ name, data = dat, sum)
name etc1 etc2 etc4
1 A 19 11 30
2 B 7 5 2
3 C 12 1 90

If we need to store the names and reuse, subset the data and then convert to matrix

cname <- c("etc4",   "etc1"  )
aggregate(as.matrix(dat[cname]) ~ name, data = dat, sum)
name etc4 etc1
1 A 30 19
2 B 2 7
3 C 90 12

Or this may also be done in a faster way with fsum

library(collapse)
fsum(get_vars(dat, is.numeric), g = dat$name)
etc4 etc1 etc2
A 30 19 11
B 2 7 5
C 90 12 1

Sum values from Duplicated rows

SELECT code, name, sum(total) AS total FROM table GROUP BY code, name

Sum of values excluding rows with duplicate criteria

Sample Image

Formula in D6 is:

=SUMPRODUCT(AVERAGEIF(A2:A6;A2:A6&"";B2:B6)/COUNTIF(A2:A6;A2:A6))

Notice this will work only if all values assigned to a Record are all the same always.



Related Topics



Leave a reply



Submit