Keep Other Columns When Doing Groupby

Python Keep other columns when using sum() with groupby

Something like this? (Assuming otherstuff1 and otherstuff2 are constant within each name.)

df.groupby(['name','otherstuff1','otherstuff2'], as_index=False).sum()
Out[121]:
   name  otherstuff1  otherstuff2  value1  value2
0  Jack         1.19         2.39       2       3
1  Luke         1.08         1.08       1       1
2  Mark         3.45         3.45       0       1
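As a self-contained sketch of this trick (the sample data is invented, not from the question): because the extra columns are constant within each name, adding them to the groupby keys does not change the groups, but it does keep them in the result.

```python
import pandas as pd

# Hypothetical sample data: otherstuff1/otherstuff2 are constant per name
df = pd.DataFrame({
    'name':        ['Jack', 'Jack', 'Luke', 'Mark'],
    'otherstuff1': [1.19, 1.19, 1.08, 3.45],
    'otherstuff2': [2.39, 2.39, 1.08, 3.45],
    'value1':      [1, 1, 1, 0],
    'value2':      [1, 2, 1, 1],
})

# The constant columns become part of the key, so they survive the sum
out = df.groupby(['name', 'otherstuff1', 'otherstuff2'], as_index=False).sum()
```

If the "other" columns are not constant per group, this silently splits the groups, so it only works under that assumption.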

Keep other columns when doing groupby

Method #1: use idxmin() to get the indices of the elements of minimum diff, and then select those:

>>> df.loc[df.groupby("item")["diff"].idxmin()]
   item  diff  otherstuff
1     1     1           2
6     2    -6           2
7     3     0           0

[3 rows x 3 columns]

Method #2: sort by diff, and then take the first element in each item group:

>>> df.sort_values("diff").groupby("item", as_index=False).first()
   item  diff  otherstuff
0     1     1           2
1     2    -6           2
2     3     0           0

[3 rows x 3 columns]

Note that the resulting indices are different even though the row content is the same.
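Both methods can be reproduced end to end with a small frame constructed to match the output above (the exact input data is assumed, since the question's frame is not shown):

```python
import pandas as pd

# Hypothetical data: several rows per item, with a 'diff' to minimize
df = pd.DataFrame({
    'item':       [1, 1, 1, 2, 2, 2, 2, 3],
    'diff':       [2, 1, 3, -1, 1, 4, -6, 0],
    'otherstuff': [1, 2, 7, 0, 3, 9, 2, 0],
})

# Method 1: idxmin gives the index label of the minimum per group,
# and .loc pulls back the full rows, other columns included
res1 = df.loc[df.groupby('item')['diff'].idxmin()]

# Method 2: sort first, then take the first row of each group
res2 = df.sort_values('diff').groupby('item', as_index=False).first()
```

As the answer notes, `res1` keeps the original index labels while `res2` gets a fresh 0..n-1 index, but the row content is identical.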

Pandas group by: sum specific columns and keep other columns

Pandas has supported missing values in groupby keys since version 1.1, via the dropna parameter.

The first idea is to create a helper column new that replaces missing values with some string, e.g. miss, group by new, aggregate with GroupBy.agg (using GroupBy.first for the kept column), and finally remove the helper level with reset_index:

df = (df.assign(new=df['ColToKeep'].fillna('miss'))
        .groupby(['User', 'new'], sort=False)
        .agg({'Col1ToSum': 'sum', 'Col2ToSum': 'sum', 'ColToKeep': 'first'})
        .reset_index(level=1, drop=True)
        .reset_index())
print(df)
  User  Col1ToSum  Col2ToSum  ColToKeep
0  ABC         40        650      1.015
1  ABA        180        100      2.240
2  AAA         60         20        NaN
3  BBB         10         15        NaN
4  XYZ         10         10      1.100
5  XYZ         10         10      1.500

Another idea is to group by the filled column directly and replace miss back to NaN afterwards:

df = (df.assign(ColToKeep=df['ColToKeep'].fillna('miss'))
        .groupby(['User', 'ColToKeep'], sort=False)[['Col1ToSum', 'Col2ToSum']].sum()
        .reset_index()
        .replace({'ColToKeep': {'miss': np.nan}}))
print(df)
  User  ColToKeep  Col1ToSum  Col2ToSum
0  ABC      1.015         40        650
1  ABA      2.240        180        100
2  AAA        NaN         60         20
3  BBB        NaN         10         15
4  XYZ      1.100         10         10
5  XYZ      1.500         10         10
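On pandas 1.1 or newer, the helper column is unnecessary: `dropna=False` keeps NaN keys in the groupby directly. A minimal sketch with invented data (a subset of the shape above):

```python
import numpy as np
import pandas as pd

# Hypothetical data with NaN in the column to keep
df = pd.DataFrame({
    'User':      ['ABC', 'ABC', 'AAA', 'XYZ'],
    'ColToKeep': [1.015, 1.015, np.nan, 1.1],
    'Col1ToSum': [10, 30, 60, 10],
    'Col2ToSum': [150, 500, 20, 10],
})

# dropna=False (pandas >= 1.1) keeps groups whose key is NaN
out = (df.groupby(['User', 'ColToKeep'], dropna=False, sort=False)
         [['Col1ToSum', 'Col2ToSum']].sum()
         .reset_index())
```

The NaN key survives into the result without any fillna/replace round trip.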

How to GroupBy a Dataframe in Pandas and keep Columns

You want the following:

In [20]:
df.groupby(['Name','Type','ID']).count().reset_index()

Out[20]:
    Name   Type  ID  Count
0  Book1  ebook   1      2
1  Book2  paper   2      2
2  Book3  paper   3      1

In your case the 'Name', 'Type' and 'ID' cols match in values so we can groupby on these, call count and then reset_index.

An alternative approach would be to add the 'Count' column using transform and then call drop_duplicates:

In [25]:
df['Count'] = df.groupby(['Name'])['ID'].transform('count')
df.drop_duplicates()

Out[25]:
    Name   Type  ID  Count
0  Book1  ebook   1      2
1  Book2  paper   2      2
2  Book3  paper   3      1
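Both variants can be run against a reconstructed input (the original frame is assumed from the output; the pre-existing 'Count' column is a guess, needed for .count() to have something to tally):

```python
import pandas as pd

# Hypothetical input: duplicate rows per book, plus a 'Count' column to tally
df = pd.DataFrame({
    'Name':  ['Book1', 'Book1', 'Book2', 'Book2', 'Book3'],
    'Type':  ['ebook', 'ebook', 'paper', 'paper', 'paper'],
    'ID':    [1, 1, 2, 2, 3],
    'Count': [1, 1, 1, 1, 1],
})

# Variant 1: group on all matching columns, count, restore them via reset_index
out = df.groupby(['Name', 'Type', 'ID']).count().reset_index()

# Variant 2: transform broadcasts the count to every row, then duplicates
# collapse because the rows are now identical
df['Count'] = df.groupby('Name')['ID'].transform('count')
out2 = df.drop_duplicates()
```

Note that variant 2 only deduplicates cleanly because the rows within each group are identical in every column.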

Pandas groupby apply on one column and keeping the other columns

There is groupby().agg:

df.groupby('name').agg({
    'value1': complex_function,
    'otherstuff1': 'first',
    'otherstuff2': 'first',
})
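The answer leaves `complex_function` undefined; here is a runnable sketch with a stand-in (sum of squares) and invented sample data, showing that agg accepts any callable that maps a group's Series to a scalar:

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    'name':        ['Jack', 'Jack', 'Luke'],
    'value1':      [1, 2, 3],
    'otherstuff1': [1.19, 1.19, 1.08],
    'otherstuff2': [2.39, 2.39, 1.08],
})

def complex_function(s):
    # Placeholder aggregation: sum of squares of the group's values
    return (s ** 2).sum()

# Custom aggregation on one column, 'first' keeps the others
out = df.groupby('name').agg({
    'value1': complex_function,
    'otherstuff1': 'first',
    'otherstuff2': 'first',
})
```

Mixing callables and string aliases in one dict is allowed; each column gets its own aggregation.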

Pandas groupby multiple columns and retain all other columns

I was able to get the desired result by including the other columns in the agg function with 'first', while 'QtyOrdered' and 'QtyShipped' are aggregated with 'sum'.

ActualOrders = (PreActualOrders.groupby(['OrderNo', 'ItemCode'])
                .agg({'OrderDate': 'first', 'LineNo': 'first', 'ClientNo': 'first',
                      'QtyOrdered': 'sum', 'QtyShipped': 'sum'})
                .reset_index())

This yields the desired result:

       OrderNo  ItemCode  OrderDate  LineNo  ClientNo  QtyOrdered  QtyShipped
28255   543734   1038324  2/27/2017       3   1254787           1           1
28256   543734  10137992  2/27/2017       1   1254787           1           1
28257   543734  10137993  2/27/2017       2   1254787           1           1
28258   543735   1041106  2/27/2017       4   1816460           1           1
28259   543735   1041108  2/27/2017       3   1816460           1           1
28260   543735  10135359  2/27/2017       2   1816460           1           1
28261   543735  10137993  2/27/2017       1   1816460           1           1

The output example doesn't show any difference between QtyOrdered and QtyShipped because the number of matching cancels is very small; the rows that do have a corresponding cancel are correctly adjusted.

Groupby multiple columns and get the sum of two other columns while keeping the first occurrence of every other column

I think that using two separate operations on the groupby object and joining them afterwards is clearer than a one-liner. Here is a minimal example, grouping on one column:

import numpy as np
import pandas as pd

df = pd.DataFrame(
    [
        ("bird", "Falconiformes", 389.0, 5.5, 1),
        ("bird", "Psittaciformes", 24.0, 4.5, 2),
        ("mammal", "Carnivora", 80.2, 33.3, 1),
        ("mammal", "Primates", np.nan, 33.7, 2),
        ("mammal", "Carnivora", 58, 23, 3),
    ],
    index=["falcon", "parrot", "lion", "monkey", "leopard"],
    columns=("class", "family", "max_speed", "height", "order"),
)
print(df, "\n")

grouped = df.groupby('class')
df_sum = grouped[['max_speed', 'height']].agg('sum')
df_first = grouped['order'].first()
df_out = pd.concat([df_sum, df_first], axis=1)
print(df_out)

Output:

          class          family  max_speed  height  order
falcon     bird   Falconiformes      389.0     5.5      1
parrot     bird  Psittaciformes       24.0     4.5      2
lion     mammal       Carnivora       80.2    33.3      1
monkey   mammal        Primates        NaN    33.7      2
leopard  mammal       Carnivora       58.0    23.0      3

        max_speed  height  order
class
bird        413.0    10.0      1
mammal      138.2    90.0      1

Is there a way I can use groupby.sum and keep other columns?

You can partition by columns while keeping the other columns using transform:

df['sum'] = df.groupby([1, 2, 4])[5].transform('sum')

This will simply add a column that has the aggregation at the grouped level for all rows in the original dataframe.
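A runnable sketch of this (the frame with integer column labels is assumed, matching the answer's notation):

```python
import pandas as pd

# Hypothetical frame whose column labels are integers, as in the answer
df = pd.DataFrame({
    1: ['a', 'a', 'b'],
    2: ['x', 'x', 'y'],
    4: ['p', 'p', 'q'],
    5: [10, 20, 5],
})

# transform broadcasts each group's sum back onto every original row,
# so no rows are collapsed and all other columns are kept as-is
df['sum'] = df.groupby([1, 2, 4])[5].transform('sum')
```

This is the right tool when you want the aggregate alongside the detail rows rather than one row per group.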


