sum duplicate row with condition using pandas
Can try summing only the unique
values per group:
def sum_unique(s):
return s.unique().sum()
df2 = df.groupby('Name', sort=False, as_index=False).agg(
Name=('Name', 'first'),
rent=('rent', sum_unique),
sale=('sale', sum_unique)
)
df2
:
Name rent sale
0 A 180 7
1 B 1 4
2 M 14 20
3 O 10 1
How to SUM duplicate row in CASE statement calculation
I suspect that you want distinct
:
SELECT
user_id,
COALESCE(COUNT(DISTINCT CASE WHEN score_flag = 'Y' THEN item END), 0) AS Compliant_Count,
COALESCE(COUNT(DISTINCT CASE WHEN score_flag = 'N' THEN item END), 0) AS NonCompliant_Count,
COUNT(DISTINCT CASE WHEN score_flag IN ('Y', 'N') THEN item END) AS Total_count
FROM mytable
GROUP BY user_id
Notes:
- there is no
user
table alias defined in the query, souser.id
is invalid. I renamed it touser_id
- do not use single quotes for column aliases - single quotes should used for string literals only
- If Y and N are the only possible values, the last expression can be simplified as just
COUNT(DISTINCT item)
Get the sum of all duplicate rows for a each column without hard coding the column name?
We can specify the .
which signifies all the rest of the columns other than the 'name' column in aggregate
aggregate(. ~ name, data = dat, sum)
name etc4 etc1 etc2
1 A 30 19 11
2 B 2 7 5
3 C 90 12 1
Or if we need more fine control i.e if there are other columns as well and want to avoid, either subset
the data with select
or use cbind
aggregate(cbind(etc1, etc2, etc4) ~ name, data = dat, sum)
name etc1 etc2 etc4
1 A 19 11 30
2 B 7 5 2
3 C 12 1 90
If we need to store the names and reuse, subset the data and then convert to matrix
cname <- c("etc4", "etc1" )
aggregate(as.matrix(dat[cname]) ~ name, data = dat, sum)
name etc4 etc1
1 A 30 19
2 B 2 7
3 C 90 12
Or this may also be done in a faster way with fsum
library(collapse)
fsum(get_vars(dat, is.numeric), g = dat$name)
etc4 etc1 etc2
A 30 19 11
B 2 7 5
C 90 12 1
Sum values from Duplicated rows
SELECT code, name, sum(total) AS total FROM table GROUP BY code, name
Sum of values excluding rows with duplicate criteria
Formula in D6 is:
=SUMPRODUCT(AVERAGEIF(A2:A6;A2:A6&"";B2:B6)/COUNTIF(A2:A6;A2:A6))
Notice this will work only if all values assigned to a Record are all the same always.
Related Topics
Select All Records Don't Meet Certain Conditions in a Joined Table
Querying Where Condition to Character Length
Insert Multiple Values Using Insert into (SQL Server 2005)
Getting the Sum of a Datediff Result
How to Use Pivot in SQL Server (Without Aggregates )
Efficient Way to Store Reorderable Items in a Database
Mysql: Optimizing Finding Super Node in Nested Set Tree
Zip Code to City/State and Vice-Versa in a Database
Syntax Error in Dynamic SQL in Pl/Pgsql Function
Access SQL Query to Concatenate Rows
Combine Multiple Rows into Multiple Columns Dynamically in SQL Server
Join One Row to Multiple Rows in Another Table
How to Select the Set of Rows Where Each Item Has the Greatest Timestamp
How to Update Ms Access Database Table Using Update and Sum() Function