# Combine Results of Column One Then Sum Column 2 to List Total for Each Entry in Column One

## Combine results of column one Then sum column 2 to list total for each entry in column one

This sounds like a job for `awk` :) Pipe the output of your program to the following `awk` script:

```shell
your_program | awk '{a[$1]+=$2} END {for (name in a) print name " " a[name]}'
```

Output:

```
Sean 201
Bob 219
Jim 245
Mark 190
Richard 142
John 208
```

The `awk` script itself can be explained better in this format:

```shell
# executed on each line
{
  # 'a' is an array. It will be initialized
  # as an empty array by awk on its first usage
  # '$1' contains the first column - the name
  # '$2' contains the second column - the amount
  #
  # on every line the total score of 'name'
  # will be incremented by 'amount'
  a[$1]+=$2
}
# executed at the end of input
END{
  # print every name and its score
  for(name in a)
    print name " " a[name]
}
```

Note: to get the output sorted by score, add another pipe to `sort -r -k2`, which sorts by the second column in reverse order:

```shell
your_program | awk '{a[$1]+=$2} END {for (n in a) print n " " a[n]}' | sort -r -k2
```

Output:

```
Jim 245
Bob 219
John 208
Sean 201
Mark 190
Richard 142
```
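If you'd rather stay in Python than shell out to `awk`, the same sum-by-key aggregation can be sketched with a `defaultdict` (the input lines below are made-up sample data, assuming the `name amount` format above):

```python
from collections import defaultdict

def total_by_name(lines):
    """Sum the second column, grouped by the value in the first column."""
    totals = defaultdict(int)
    for line in lines:
        name, amount = line.split()
        totals[name] += int(amount)
    return totals

scores = total_by_name(["Jim 100", "Bob 219", "Jim 145"])
# Sort by total, descending, like `sort -r -k2`
for name, total in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(name, total)
```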

## Python Pandas - sum values of column and merge it to one

If there is multiple `PRICE` columns use:

```python
df1 = df.filter(like='PRICE')
df['PRICEFINAL'] = df1.sum(axis=1)
df = df.drop(df1.columns, axis=1)
```

If there are only two columns, sum them and remove them in one step with `DataFrame.pop`:

```python
df['PRICEFINAL'] = df.pop('PRICE1') + df.pop('PRICE2')
```
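A small end-to-end sketch of the `filter` approach, using a hypothetical frame with three `PRICE` columns (the column names and data are made up for illustration):

```python
import pandas as pd

# Hypothetical frame with several PRICE columns
df = pd.DataFrame({'ITEM': ['a', 'b'],
                   'PRICE1': [10, 20],
                   'PRICE2': [1, 2],
                   'PRICE3': [5, 5]})

price_cols = df.filter(like='PRICE')       # selects PRICE1..PRICE3
df['PRICEFINAL'] = price_cols.sum(axis=1)  # row-wise total
df = df.drop(price_cols.columns, axis=1)
print(df)
```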

## Pandas: sum up multiple columns into one column without last column

You can first select by `iloc` and then `sum`:

```python
df['Fruit Total'] = df.iloc[:, -4:-1].sum(axis=1)
print(df)
```

```
   Apples  Bananas  Grapes  Kiwis  Fruit Total
0     2.0      3.0     NaN    1.0          5.0
1     1.0      3.0     7.0    NaN         11.0
2     NaN      NaN     2.0    3.0          2.0
```

To sum all columns, use:

```python
df['Fruit Total'] = df.sum(axis=1)
```

## Can I Select DISTINCT on 2 columns and Sum grouped by 1 column in one query?

If I understand correctly, you want to consider `NULL` as a valid value. The rest is just aggregation:

```sql
select t.ohid,
       (count(distinct t.memid) +
        (case when count(*) <> count(t.memid) then 1 else 0 end)
       ) as num_memid,
       sum(t.amount) as total_amount
from #temp t
group by t.ohid
```

The `case` logic might be a bit off-putting. It is just adding `1` if any values are `NULL`.

You might find this easier to follow with two levels of aggregation:

```sql
select t.ohid, count(*), sum(amount)
from (select t.ohid, t.memid, sum(t.amount) as amount
      from #temp t
      group by t.ohid, t.memid
     ) t
group by t.ohid
```
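The NULL-counting trick can be demonstrated end to end with SQLite from Python (a sketch with made-up `ohid`/`memid`/`amount` rows; an in-memory table `temp_t` stands in for `#temp`). `COUNT(DISTINCT memid)` ignores NULLs, so the `CASE` adds 1 whenever `COUNT(*)` and `COUNT(memid)` disagree, i.e. whenever at least one `memid` is NULL:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE temp_t (ohid INTEGER, memid INTEGER, amount REAL)")
con.executemany("INSERT INTO temp_t VALUES (?, ?, ?)",
                [(1, 10, 5.0), (1, 10, 2.0), (1, None, 3.0), (2, 20, 4.0)])

rows = con.execute("""
    SELECT ohid,
           COUNT(DISTINCT memid)
             + (CASE WHEN COUNT(*) <> COUNT(memid) THEN 1 ELSE 0 END) AS num_memid,
           SUM(amount) AS total_amount
    FROM temp_t
    GROUP BY ohid
    ORDER BY ohid
""").fetchall()
print(rows)
```

For `ohid` 1 there is one real member plus one NULL, so `num_memid` comes out as 2.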

## How to join/merge and sum columns with the same name

From its current state, this should give the outcome you're looking for:

```python
df = df.set_index('Country/Region')  # optional
df.groupby(df.columns, axis=1).sum()  # Stolen from Scott Boston as it's a superior method.
```

Output:

```
                Brazil  Canada
Country/Region
Week 1               0       3
Week 2               0      17
Week 3               0      21
Week 4               0      21
Week 5               0      23
Week 6               0      85
Week 7               0     214
Week 8              12     924
Week 9             182    5350
Week 10            737   27611
Week 11           1674   75442
Week 12           2923  134133
Week 13           4516  200888
Week 14           6002  271539
Week 15           6751  341306
Week 16           7081  409938
```
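A tiny self-contained sketch of summing duplicate-named columns (made-up numbers; note that `groupby(..., axis=1)` is deprecated in recent pandas, so this version transposes, groups on the index, and transposes back):

```python
import pandas as pd

# Two columns share the name 'Canada'
df = pd.DataFrame([[0, 1, 2], [5, 6, 7]],
                  columns=['Brazil', 'Canada', 'Canada'],
                  index=['Week 1', 'Week 2'])

# Sum same-named columns together
out = df.T.groupby(level=0).sum().T
print(out)
```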

I found your dataset interesting, here's how I would clean it up from step 1:

```python
import numpy as np
import pandas as pd

df = pd.read_csv('file.csv')
df = df.set_index(['Province/State', 'Country/Region', 'Lat', 'Long']).stack().reset_index()
df.columns = ['Province/State', 'Country/Region', 'Lat', 'Long', 'date', 'value']
df['date'] = pd.to_datetime(df['date'])
df = df.set_index('date')
df = df.pivot_table(index=df.index, columns='Country/Region', values='value', aggfunc=np.sum)
print(df)
```

Output:

```
Country/Region  Afghanistan  Albania  Algeria  Andorra  Angola  ...  West Bank and Gaza  Western Sahara  Yemen  Zambia  Zimbabwe
date                                                            ...
2020-01-22                0        0        0        0       0  ...                   0               0      0       0         0
2020-01-23                0        0        0        0       0  ...                   0               0      0       0         0
2020-01-24                0        0        0        0       0  ...                   0               0      0       0         0
2020-01-25                0        0        0        0       0  ...                   0               0      0       0         0
2020-01-26                0        0        0        0       0  ...                   0               0      0       0         0
...                     ...      ...      ...      ...     ...  ...                 ...             ...    ...     ...       ...
2020-07-30            36542     5197    29831      922    1109  ...               11548              10   1726    5555      3092
2020-07-31            36675     5276    30394      925    1148  ...               11837              10   1728    5963      3169
2020-08-01            36710     5396    30950      925    1164  ...               12160              10   1730    6228      3659
2020-08-02            36710     5519    31465      925    1199  ...               12297              10   1734    6347      3921
2020-08-03            36747     5620    31972      937    1280  ...               12541              10   1734    6580      4075
```

If you now want to do weekly aggregations, it's as simple as:

```python
print(df.resample('w').sum())
```

Output:

```
Country/Region  Afghanistan  Albania  Algeria  Andorra  Angola  ...  West Bank and Gaza  Western Sahara  Yemen  Zambia  Zimbabwe
date                                                            ...
2020-01-26                0        0        0        0       0  ...                   0               0      0       0         0
2020-02-02                0        0        0        0       0  ...                   0               0      0       0         0
2020-02-09                0        0        0        0       0  ...                   0               0      0       0         0
2020-02-16                0        0        0        0       0  ...                   0               0      0       0         0
2020-02-23                0        0        0        0       0  ...                   0               0      0       0         0
2020-03-01                7        0        6        0       0  ...                   0               0      0       0         0
2020-03-08               10        0       85        7       0  ...                  43               0      0       0         0
2020-03-15               57      160      195        7       0  ...                 209               0      0       0         0
2020-03-22              175      464      705      409       5  ...                 309               0      0      11         7
2020-03-29              632     1142     2537     1618      29  ...                 559               0      0     113        31
2020-04-05             1783     2000     6875     2970      62  ...                1178               4      0     262        59
2020-04-12             3401     2864    11629     4057     128  ...                1847              30      3     279        84
2020-04-19             5838     3603    16062     4764     143  ...                2081              42      7     356       154
2020-04-26             8918     4606    21211     5087     174  ...                2353              42      7     541       200
2020-05-03            15149     5391    27943     5214     208  ...                2432              42     41     738       244
2020-05-10            25286     5871    36315     5265     274  ...                2607              42    203    1260       241
2020-05-17            39634     6321    45122     5317     327  ...                2632              42    632    3894       274
2020-05-24            61342     6798    54185     5332     402  ...                2869              45   1321    5991       354
2020-05-31            91885     7517    62849     5344     536  ...                3073              63   1932    7125       894
2020-06-07           126442     8378    68842     5868     609  ...                3221              63   3060    7623      1694
2020-06-14           159822     9689    74147     5967     827  ...                3396              63   4236    8836      2335
2020-06-21           191378    12463    79737     5981    1142  ...                4466              63   6322    9905      3089
2020-06-28           210487    15349    87615     5985    1522  ...               10242              70   7360   10512      3813
2020-07-05           224560    18707   102918     5985    2186  ...               21897              70   8450   11322      4426
2020-07-12           237087    22399   124588     5985    2940  ...               36949              70   9489   13002      6200
2020-07-19           245264    26845   149611     6098    4279  ...               52323              70  10855   16350      9058
2020-07-26           250970    31255   178605     6237    5919  ...               68154              70  11571   26749     14933
2020-08-02           255739    36370   208457     6429    7648  ...               80685              70  12023   38896     22241
2020-08-09            36747     5620    31972      937    1280  ...               12541              10   1734    6580      4075
```
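How `resample` buckets daily data into weeks can be seen on a tiny made-up frame (two hypothetical country columns over 14 days; `'W'` bins rows into weeks ending on Sunday and sums each bin):

```python
import pandas as pd

# Daily counts for two hypothetical countries
idx = pd.date_range('2020-01-01', periods=14, freq='D')
df = pd.DataFrame({'Brazil': range(14), 'Canada': range(0, 28, 2)}, index=idx)

weekly = df.resample('W').sum()  # one row per week ending on Sunday
print(weekly)
```

The 14 days span three (partly partial) weeks, so `weekly` has three rows.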

## SQL Sum amount for column with unique values

Per the explanation you provided, I think your requirement is to aggregate revenue for the records whose Col2 values match another table. If that is the case, you may try the following query.

```sql
WITH rev_calc AS (
    SELECT DISTINCT(Col2) AS Col2
    FROM table_input
    LEFT JOIN another_table
        ON another_table.Col2 = table_input.Col2
)
SELECT
    Col0,
    Col1,
    SUM(Revenue) AS total_revenue
FROM table_input
WHERE Col2 IN (SELECT Col2 FROM rev_calc)
GROUP BY Col0, Col1;
```

## Pandas - dataframe groupby - how to get sum of multiple columns

By using `apply`

```python
df.groupby(['col1', 'col2'])[['col3', 'col4']].apply(lambda x: x.astype(int).sum())
```

```
           col3  col4
col1 col2
a    c        2     4
     d        1     2
b    d        1     2
     e        2     4
```

If you want to use `agg`:

```python
df.groupby(['col1', 'col2']).agg({'col3': 'sum', 'col4': 'sum'})
```
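A runnable sketch of the `agg` variant, reconstructing sample data consistent with the output shown above (the `col3`/`col4` values are stored as strings, which is why the cast to `int` is needed before summing):

```python
import pandas as pd

df = pd.DataFrame({'col1': ['a', 'a', 'a', 'b', 'b', 'b'],
                   'col2': ['c', 'c', 'd', 'd', 'e', 'e'],
                   'col3': ['1', '1', '1', '1', '1', '1'],
                   'col4': ['2', '2', '2', '2', '2', '2']})

# Cast the string columns to int, then sum each per (col1, col2) group
out = (df.assign(col3=df['col3'].astype(int), col4=df['col4'].astype(int))
         .groupby(['col1', 'col2'])
         .agg({'col3': 'sum', 'col4': 'sum'}))
print(out)
```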

## How do I sum values in a column that match a given condition using pandas?

The essential idea here is to select the data you want to sum, and then sum them. This selection of data can be done in several different ways, a few of which are shown below.

#### Boolean indexing

Arguably the most common way to select the values is to use Boolean indexing.

With this method, you find out where column 'a' is equal to `1` and then sum the corresponding rows of column 'b'. You can use `loc` to handle the indexing of rows and columns:

```
>>> df.loc[df['a'] == 1, 'b'].sum()
15
```

The Boolean indexing can be extended to other columns. For example if `df` also contained a column 'c' and we wanted to sum the rows in 'b' where 'a' was 1 and 'c' was 2, we'd write:

```python
df.loc[(df['a'] == 1) & (df['c'] == 2), 'b'].sum()
```

#### Query

Another way to select the data is to use `query` to filter the rows you're interested in, select column 'b' and then sum:

```
>>> df.query("a == 1")['b'].sum()
15
```

Again, the method can be extended to make more complicated selections of the data:

```python
df.query("a == 1 and c == 2")['b'].sum()
```

Note this is a little more concise than the Boolean indexing approach.

#### Groupby

The alternative approach is to use `groupby` to split the DataFrame into parts according to the value in column 'a'. You can then sum each part and pull out the value that the 1s added up to:

```
>>> df.groupby('a')['b'].sum()[1]
15
```

This approach is likely to be slower than using Boolean indexing, but it is useful if you want to check the sums for other values in column `a`:

```
>>> df.groupby('a')['b'].sum()
a
1    15
2     8
```
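All three approaches agree, which is easy to confirm on a small made-up frame consistent with the sums shown above:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 1, 1, 2], 'b': [3, 4, 6, 6, 4]})

s1 = df.loc[df['a'] == 1, 'b'].sum()     # Boolean indexing
s2 = df.query("a == 1")['b'].sum()       # query
s3 = df.groupby('a')['b'].sum().loc[1]   # groupby, then pick out group 1
print(s1, s2, s3)
```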