# Combine Results of Column One Then Sum Column 2 to List Total for Each Entry in Column One

## Combine results of column one Then sum column 2 to list total for each entry in column one

This sounds like a job for `awk` :) Pipe the output of your program to the following `awk` script:

```shell
your_program | awk '{a[$1]+=$2} END {for (name in a) print name " " a[name]}'
```

Output:

```
Sean 201
Bob 219
Jim 245
Mark 190
Richard 142
John 208
```

The `awk` script itself can be explained better in this format:

```shell
# executed on each line
{
  # 'a' is an array. It will be initialized
  # as an empty array by awk on its first usage
  # '$1' contains the first column - the name
  # '$2' contains the second column - the amount
  #
  # on every line the total score of 'name'
  # will be incremented by 'amount'
  a[$1]+=$2
}
# executed at the end of input
END{
  # print every name and its score
  for(name in a)
    print name " " a[name]
}
```

Note: to get the output sorted by score, add another pipe to `sort -r -k2`, which sorts by the second column in reverse order:

```shell
your_program | awk '{a[$1]+=$2} END {for (n in a) print n " " a[n]}' | sort -r -k2
```

Output:

```
Jim 245
Bob 219
John 208
Sean 201
Mark 190
Richard 142
```
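If you'd rather stay in Python than shell out to `awk`, the same sum-by-key aggregation can be sketched with a `defaultdict` (the input lines below are made-up sample data, assuming the `name amount` format above):

```python
from collections import defaultdict

def total_by_name(lines):
    """Sum the second column, grouped by the value in the first column."""
    totals = defaultdict(int)
    for line in lines:
        name, amount = line.split()
        totals[name] += int(amount)
    return totals

scores = total_by_name(["Jim 100", "Bob 219", "Jim 145"])
# Sort by total, descending, like `sort -r -k2`
for name, total in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(name, total)
```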

## Python Pandas - sum values of column and merge it to one

If there is multiple `PRICE` columns use:

```python
df1 = df.filter(like='PRICE')
df['PRICEFINAL'] = df1.sum(axis=1)
df = df.drop(df1.columns, axis=1)
```

If there are only two columns, sum them and remove them in one step with `DataFrame.pop`:

```python
df['PRICEFINAL'] = df.pop('PRICE1') + df.pop('PRICE2')
```
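A small end-to-end sketch of the `filter` approach, using a hypothetical frame with three `PRICE` columns (the column names and data are made up for illustration):

```python
import pandas as pd

# Hypothetical frame with several PRICE columns
df = pd.DataFrame({'ITEM': ['a', 'b'],
                   'PRICE1': [10, 20],
                   'PRICE2': [1, 2],
                   'PRICE3': [5, 5]})

price_cols = df.filter(like='PRICE')       # selects PRICE1..PRICE3
df['PRICEFINAL'] = price_cols.sum(axis=1)  # row-wise total
df = df.drop(price_cols.columns, axis=1)
print(df)
```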

## Pandas: sum up multiple columns into one column without last column

You can first select by `iloc` and then `sum`:

```python
df['Fruit Total'] = df.iloc[:, -4:-1].sum(axis=1)
print(df)
```

```
   Apples  Bananas  Grapes  Kiwis  Fruit Total
0     2.0      3.0     NaN    1.0          5.0
1     1.0      3.0     7.0    NaN         11.0
2     NaN      NaN     2.0    3.0          2.0
```

To sum all columns, use:

```python
df['Fruit Total'] = df.sum(axis=1)
```

## Can I Select DISTINCT on 2 columns and Sum grouped by 1 column in one query?

If I understand correctly, you want to consider `NULL` as a valid value. The rest is just aggregation:

```sql
select t.ohid,
       (count(distinct t.memid) +
        (case when count(*) <> count(t.memid) then 1 else 0 end)
       ) as num_memid,
       sum(t.amount) as total_amount
from #temp t
group by t.ohid
```

The `case` logic might be a bit off-putting. It is just adding `1` if any values are `NULL`.

You might find this easier to follow with two levels of aggregation:

```sql
select t.ohid, count(*), sum(amount)
from (select t.ohid, t.memid, sum(t.amount) as amount
      from #temp t
      group by t.ohid, t.memid
     ) t
group by t.ohid
```
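The NULL-counting trick can be demonstrated end to end with SQLite from Python (a sketch with made-up `ohid`/`memid`/`amount` rows; an in-memory table `temp_t` stands in for `#temp`). `COUNT(DISTINCT memid)` ignores NULLs, so the `CASE` adds 1 whenever `COUNT(*)` and `COUNT(memid)` disagree, i.e. whenever at least one `memid` is NULL:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE temp_t (ohid INTEGER, memid INTEGER, amount REAL)")
con.executemany("INSERT INTO temp_t VALUES (?, ?, ?)",
                [(1, 10, 5.0), (1, 10, 2.0), (1, None, 3.0), (2, 20, 4.0)])

rows = con.execute("""
    SELECT ohid,
           COUNT(DISTINCT memid)
             + (CASE WHEN COUNT(*) <> COUNT(memid) THEN 1 ELSE 0 END) AS num_memid,
           SUM(amount) AS total_amount
    FROM temp_t
    GROUP BY ohid
    ORDER BY ohid
""").fetchall()
print(rows)
```

For `ohid` 1 there is one real member plus one NULL, so `num_memid` comes out as 2.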

## How to join/merge and sum columns with the same name

From its current state, this should give the outcome you're looking for:

```python
df = df.set_index('Country/Region')  # optional
df.groupby(df.columns, axis=1).sum()  # Stolen from Scott Boston as it's a superior method.
```

Output:

```
                Brazil  Canada
Country/Region
Week 1               0       3
Week 2               0      17
Week 3               0      21
Week 4               0      21
Week 5               0      23
Week 6               0      85
Week 7               0     214
Week 8              12     924
Week 9             182    5350
Week 10            737   27611
Week 11           1674   75442
Week 12           2923  134133
Week 13           4516  200888
Week 14           6002  271539
Week 15           6751  341306
Week 16           7081  409938
```
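A tiny self-contained sketch of summing duplicate-named columns (made-up numbers; note that `groupby(..., axis=1)` is deprecated in recent pandas, so this version transposes, groups on the index, and transposes back):

```python
import pandas as pd

# Two columns share the name 'Canada'
df = pd.DataFrame([[0, 1, 2], [5, 6, 7]],
                  columns=['Brazil', 'Canada', 'Canada'],
                  index=['Week 1', 'Week 2'])

# Sum same-named columns together
out = df.T.groupby(level=0).sum().T
print(out)
```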

I found your dataset interesting, here's how I would clean it up from step 1:

```python
import numpy as np
import pandas as pd

df = pd.read_csv('file.csv')
df = df.set_index(['Province/State', 'Country/Region', 'Lat', 'Long']).stack().reset_index()
df.columns = ['Province/State', 'Country/Region', 'Lat', 'Long', 'date', 'value']
df['date'] = pd.to_datetime(df['date'])
df = df.set_index('date')
df = df.pivot_table(index=df.index, columns='Country/Region', values='value', aggfunc=np.sum)
print(df)
```

Output:

```
Country/Region  Afghanistan  Albania  Algeria  Andorra  Angola  ...  West Bank and Gaza  Western Sahara  Yemen  Zambia  Zimbabwe
date                                                            ...
2020-01-22                0        0        0        0       0  ...                   0               0      0       0         0
2020-01-23                0        0        0        0       0  ...                   0               0      0       0         0
2020-01-24                0        0        0        0       0  ...                   0               0      0       0         0
2020-01-25                0        0        0        0       0  ...                   0               0      0       0         0
2020-01-26                0        0        0        0       0  ...                   0               0      0       0         0
...                     ...      ...      ...      ...     ...  ...                 ...             ...    ...     ...       ...
2020-07-30            36542     5197    29831      922    1109  ...               11548              10   1726    5555      3092
2020-07-31            36675     5276    30394      925    1148  ...               11837              10   1728    5963      3169
2020-08-01            36710     5396    30950      925    1164  ...               12160              10   1730    6228      3659
2020-08-02            36710     5519    31465      925    1199  ...               12297              10   1734    6347      3921
2020-08-03            36747     5620    31972      937    1280  ...               12541              10   1734    6580      4075
```

If you now want to do weekly aggregations, it's as simple as:

```python
print(df.resample('w').sum())
```

Output:

```
Country/Region  Afghanistan  Albania  Algeria  Andorra  Angola  ...  West Bank and Gaza  Western Sahara  Yemen  Zambia  Zimbabwe
date                                                            ...
2020-01-26                0        0        0        0       0  ...                   0               0      0       0         0
2020-02-02                0        0        0        0       0  ...                   0               0      0       0         0
2020-02-09                0        0        0        0       0  ...                   0               0      0       0         0
2020-02-16                0        0        0        0       0  ...                   0               0      0       0         0
2020-02-23                0        0        0        0       0  ...                   0               0      0       0         0
2020-03-01                7        0        6        0       0  ...                   0               0      0       0         0
2020-03-08               10        0       85        7       0  ...                  43               0      0       0         0
2020-03-15               57      160      195        7       0  ...                 209               0      0       0         0
2020-03-22              175      464      705      409       5  ...                 309               0      0      11         7
2020-03-29              632     1142     2537     1618      29  ...                 559               0      0     113        31
2020-04-05             1783     2000     6875     2970      62  ...                1178               4      0     262        59
2020-04-12             3401     2864    11629     4057     128  ...                1847              30      3     279        84
2020-04-19             5838     3603    16062     4764     143  ...                2081              42      7     356       154
2020-04-26             8918     4606    21211     5087     174  ...                2353              42      7     541       200
2020-05-03            15149     5391    27943     5214     208  ...                2432              42     41     738       244
2020-05-10            25286     5871    36315     5265     274  ...                2607              42    203    1260       241
2020-05-17            39634     6321    45122     5317     327  ...                2632              42    632    3894       274
2020-05-24            61342     6798    54185     5332     402  ...                2869              45   1321    5991       354
2020-05-31            91885     7517    62849     5344     536  ...                3073              63   1932    7125       894
2020-06-07           126442     8378    68842     5868     609  ...                3221              63   3060    7623      1694
2020-06-14           159822     9689    74147     5967     827  ...                3396              63   4236    8836      2335
2020-06-21           191378    12463    79737     5981    1142  ...                4466              63   6322    9905      3089
2020-06-28           210487    15349    87615     5985    1522  ...               10242              70   7360   10512      3813
2020-07-05           224560    18707   102918     5985    2186  ...               21897              70   8450   11322      4426
2020-07-12           237087    22399   124588     5985    2940  ...               36949              70   9489   13002      6200
2020-07-19           245264    26845   149611     6098    4279  ...               52323              70  10855   16350      9058
2020-07-26           250970    31255   178605     6237    5919  ...               68154              70  11571   26749     14933
2020-08-02           255739    36370   208457     6429    7648  ...               80685              70  12023   38896     22241
2020-08-09            36747     5620    31972      937    1280  ...               12541              10   1734    6580      4075
```
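How `resample` buckets daily data into weeks can be seen on a tiny made-up frame (two hypothetical country columns over 14 days; `'W'` bins rows into weeks ending on Sunday and sums each bin):

```python
import pandas as pd

# Daily counts for two hypothetical countries
idx = pd.date_range('2020-01-01', periods=14, freq='D')
df = pd.DataFrame({'Brazil': range(14), 'Canada': range(0, 28, 2)}, index=idx)

weekly = df.resample('W').sum()  # one row per week ending on Sunday
print(weekly)
```

The 14 days span three (partly partial) weeks, so `weekly` has three rows.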

## SQL Sum amount for column with unique values

Per the explanation you provided, I think your requirement is to aggregate revenue for the records whose Col2 values match another table. If that is the case, you may try the following query.

```sql
WITH rev_calc AS (
    SELECT DISTINCT(Col2) AS Col2
    FROM table_input
    LEFT JOIN another_table
        ON another_table.Col2 = table_input.Col2
)
SELECT
    Col0,
    Col1,
    SUM(Revenue) AS total_revenue
FROM table_input
WHERE Col2 IN (SELECT Col2 FROM rev_calc)
GROUP BY Col0, Col1;
```

## Pandas - dataframe groupby - how to get sum of multiple columns

By using `apply`

```python
df.groupby(['col1', 'col2'])[['col3', 'col4']].apply(lambda x: x.astype(int).sum())
```

```
           col3  col4
col1 col2
a    c        2     4
     d        1     2
b    d        1     2
     e        2     4
```

If you want to use `agg`:

```python
df.groupby(['col1', 'col2']).agg({'col3': 'sum', 'col4': 'sum'})
```
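A runnable sketch of the `agg` variant, reconstructing sample data consistent with the output shown above (the `col3`/`col4` values are stored as strings, which is why the cast to `int` is needed before summing):

```python
import pandas as pd

df = pd.DataFrame({'col1': ['a', 'a', 'a', 'b', 'b', 'b'],
                   'col2': ['c', 'c', 'd', 'd', 'e', 'e'],
                   'col3': ['1', '1', '1', '1', '1', '1'],
                   'col4': ['2', '2', '2', '2', '2', '2']})

# Cast the string columns to int, then sum each per (col1, col2) group
out = (df.assign(col3=df['col3'].astype(int), col4=df['col4'].astype(int))
         .groupby(['col1', 'col2'])
         .agg({'col3': 'sum', 'col4': 'sum'}))
print(out)
```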

## How do I sum values in a column that match a given condition using pandas?

The essential idea here is to select the data you want to sum, and then sum them. This selection of data can be done in several different ways, a few of which are shown below.

#### Boolean indexing

Arguably the most common way to select the values is to use Boolean indexing.

With this method, you find out where column 'a' is equal to `1` and then sum the corresponding rows of column 'b'. You can use `loc` to handle the indexing of rows and columns:

```
>>> df.loc[df['a'] == 1, 'b'].sum()
15
```

The Boolean indexing can be extended to other columns. For example if `df` also contained a column 'c' and we wanted to sum the rows in 'b' where 'a' was 1 and 'c' was 2, we'd write:

```python
df.loc[(df['a'] == 1) & (df['c'] == 2), 'b'].sum()
```

#### Query

Another way to select the data is to use `query` to filter the rows you're interested in, select column 'b' and then sum:

```
>>> df.query("a == 1")['b'].sum()
15
```

Again, the method can be extended to make more complicated selections of the data:

```python
df.query("a == 1 and c == 2")['b'].sum()
```

Note this is a little more concise than the Boolean indexing approach.

#### Groupby

The alternative approach is to use `groupby` to split the DataFrame into parts according to the value in column 'a'. You can then sum each part and pull out the value that the 1s added up to:

```
>>> df.groupby('a')['b'].sum()[1]
15
```

This approach is likely to be slower than using Boolean indexing, but it is useful if you want to check the sums for other values in column `a`:

```
>>> df.groupby('a')['b'].sum()
a
1    15
2     8
```
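All three approaches agree, which is easy to confirm on a small made-up frame consistent with the sums shown above:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 1, 1, 2], 'b': [3, 4, 6, 6, 4]})

s1 = df.loc[df['a'] == 1, 'b'].sum()     # Boolean indexing
s2 = df.query("a == 1")['b'].sum()       # query
s3 = df.groupby('a')['b'].sum().loc[1]   # groupby, then pick out group 1
print(s1, s2, s3)
```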