Python Keep other columns when using sum() with groupby
Something like this? (Assuming each name always has the same otherstuff1 and otherstuff2 values.)
df.groupby(['name','otherstuff1','otherstuff2'],as_index=False).sum()
Out[121]:
name otherstuff1 otherstuff2 value1 value2
0 Jack 1.19 2.39 2 3
1 Luke 1.08 1.08 1 1
2 Mark 3.45 3.45 0 1
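A minimal runnable sketch of the above, assuming hypothetical input data consistent with the output shown:

```python
import pandas as pd

# Hypothetical data matching the output above
df = pd.DataFrame({
    'name': ['Jack', 'Jack', 'Luke', 'Mark'],
    'otherstuff1': [1.19, 1.19, 1.08, 3.45],
    'otherstuff2': [2.39, 2.39, 1.08, 3.45],
    'value1': [1, 1, 1, 0],
    'value2': [1, 2, 1, 1],
})

# Grouping by the constant columns as well keeps them in the result
result = df.groupby(['name', 'otherstuff1', 'otherstuff2'], as_index=False).sum()
print(result)
```

This only works because otherstuff1 and otherstuff2 are constant within each name; if they varied, the extra keys would split the groups.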
Keep other columns when doing groupby
Method #1: use idxmin() to get the indices of the rows with minimum diff, and then select those:
>>> df.loc[df.groupby("item")["diff"].idxmin()]
item diff otherstuff
1 1 1 2
6 2 -6 2
7 3 0 0
[3 rows x 3 columns]
Method #2: sort by diff, and then take the first element in each item group:
>>> df.sort_values("diff").groupby("item", as_index=False).first()
item diff otherstuff
0 1 1 2
1 2 -6 2
2 3 0 0
[3 rows x 3 columns]
Note that the resulting indices are different even though the row content is the same.
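Both methods can be reproduced end to end with hypothetical data consistent with the outputs above:

```python
import pandas as pd

# Hypothetical data matching the examples above
df = pd.DataFrame({
    'item':       [1, 1, 1, 2, 2, 2, 2, 3],
    'diff':       [2, 1, 3, -1, 1, 4, -6, 0],
    'otherstuff': [5, 2, 9, 0, 3, 9, 2, 0],
})

# Method #1: original index labels survive
m1 = df.loc[df.groupby("item")["diff"].idxmin()]

# Method #2: as_index=False gives a fresh 0..n-1 index
m2 = df.sort_values("diff").groupby("item", as_index=False).first()

print(m1)
print(m2)
```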
Pandas group by sum specific columns and keep other columns
Pandas supports missing values in groupby (via the dropna parameter) since version 1.1.
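On pandas 1.1 or later, that support makes the helper-column workarounds below unnecessary: passing dropna=False keeps the NaN groups. A small sketch with hypothetical data:

```python
import numpy as np
import pandas as pd

# Hypothetical data with a missing value in the grouping column
df = pd.DataFrame({
    'User': ['ABC', 'ABC', 'AAA'],
    'ColToKeep': [1.015, 1.015, np.nan],
    'Col1ToSum': [10, 30, 60],
})

# dropna=False (pandas >= 1.1) keeps the NaN group instead of silently dropping it
out = df.groupby(['User', 'ColToKeep'], dropna=False, as_index=False)['Col1ToSum'].sum()
print(out)
```

With the default dropna=True, the AAA row would vanish from the result entirely.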
The first idea is to create a helper column new, replacing missing values with some string, e.g. miss, then group by new, aggregate with GroupBy.agg using GroupBy.first for the kept column, and finally remove the helper level with reset_index:
df = (df.assign(new= df['ColToKeep'].fillna('miss'))
.groupby(['User', 'new'], sort=False)
.agg({'Col1ToSum':'sum', 'Col2ToSum':'sum', 'ColToKeep':'first'})
.reset_index(level=1, drop=True)
.reset_index())
print (df)
User Col1ToSum Col2ToSum ColToKeep
0 ABC 40 650 1.015
1 ABA 180 100 2.240
2 AAA 60 20 NaN
3 BBB 10 15 NaN
4 XYZ 10 10 1.100
5 XYZ 10 10 1.500
Another idea is to replace miss back to NaN afterwards:
import numpy as np

df = (df.assign(ColToKeep = df['ColToKeep'].fillna('miss'))
        .groupby(['User', 'ColToKeep'], sort=False)[['Col1ToSum', 'Col2ToSum']].sum()
        .reset_index()
        .replace({'ColToKeep': {'miss': np.nan}}))
print (df)
User ColToKeep Col1ToSum Col2ToSum
0 ABC 1.015 40 650
1 ABA 2.240 180 100
2 AAA NaN 60 20
3 BBB NaN 10 15
4 XYZ 1.100 10 10
5 XYZ 1.500 10 10
How to GroupBy a Dataframe in Pandas and keep Columns
You want the following:
In [20]:
df.groupby(['Name','Type','ID']).count().reset_index()
Out[20]:
Name Type ID Count
0 Book1 ebook 1 2
1 Book2 paper 2 2
2 Book3 paper 3 1
In your case the 'Name', 'Type' and 'ID' columns match in values, so we can groupby on these, call count, and then reset_index.
An alternative approach would be to add the 'Count' column using transform and then call drop_duplicates:
In [25]:
df['Count'] = df.groupby(['Name'])['ID'].transform('count')
df.drop_duplicates()
Out[25]:
Name Type ID Count
0 Book1 ebook 1 2
1 Book2 paper 2 2
2 Book3 paper 3 1
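The transform approach can be run end to end with hypothetical data matching the output above:

```python
import pandas as pd

# Hypothetical data matching the output above
df = pd.DataFrame({
    'Name': ['Book1', 'Book1', 'Book2', 'Book2', 'Book3'],
    'Type': ['ebook', 'ebook', 'paper', 'paper', 'paper'],
    'ID':   [1, 1, 2, 2, 3],
})

# transform broadcasts the group count back onto every original row
df['Count'] = df.groupby('Name')['ID'].transform('count')
deduped = df.drop_duplicates()
print(deduped)
```

Because transform returns a result aligned with the original index, every row gets its group's count, and drop_duplicates then collapses the now-identical rows.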
Pandas groupby apply on one column and keeping the other columns
There is groupby().agg:
df.groupby('name').agg({
    'value1': complex_function,
    'otherstuff1': 'first',
    'otherstuff2': 'first'
})
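A runnable sketch of this pattern, with a hypothetical complex_function (the original doesn't define it; any callable taking a Series works):

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Jack', 'Jack', 'Luke'],
    'value1': [1, 3, 5],
    'otherstuff1': [1.19, 1.19, 1.08],
    'otherstuff2': [2.39, 2.39, 1.08],
})

# Hypothetical stand-in for complex_function: the range of each group
def complex_function(s):
    return s.max() - s.min()

# agg accepts a dict mapping column -> aggregation (callable or string)
out = df.groupby('name').agg({
    'value1': complex_function,
    'otherstuff1': 'first',
    'otherstuff2': 'first',
})
print(out)
```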
Pandas groupby multiple columns and retain all other columns
I was able to get the desired result by including the other columns in the agg function with 'first', while 'QtyOrdered' and 'QtyShipped' are subject to 'sum':
ActualOrders = (PreActualOrders
    .groupby(['OrderNo', 'ItemCode'])
    .agg({'OrderDate': 'first', 'LineNo': 'first', 'ClientNo': 'first',
          'QtyOrdered': 'sum', 'QtyShipped': 'sum'})
    .reset_index())
Yields my desired result of:
OrderNo ItemCode OrderDate LineNo ClientNo QtyOrdered QtyShipped
28255 543734 1038324 2/27/2017 3 1254787 1 1
28256 543734 10137992 2/27/2017 1 1254787 1 1
28257 543734 10137993 2/27/2017 2 1254787 1 1
28258 543735 1041106 2/27/2017 4 1816460 1 1
28259 543735 1041108 2/27/2017 3 1816460 1 1
28260 543735 10135359 2/27/2017 2 1816460 1 1
28261 543735 10137993 2/27/2017 1 1816460 1 1
The output example doesn't show any difference between Qty ordered and shipped because the number of matching cancels is very small. The rows which have a corresponding cancel are correctly adjusted.
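The same mixed 'first'/'sum' aggregation can be sketched with a small hypothetical frame (the real order data isn't available, so the values below are invented to show the mechanics):

```python
import pandas as pd

# Hypothetical order lines, with one duplicated (OrderNo, ItemCode) pair
pre = pd.DataFrame({
    'OrderNo':    [543734, 543734, 543735],
    'ItemCode':   [1038324, 1038324, 1041106],
    'OrderDate':  ['2/27/2017'] * 3,
    'LineNo':     [3, 3, 4],
    'ClientNo':   [1254787, 1254787, 1816460],
    'QtyOrdered': [1, 1, 1],
    'QtyShipped': [1, 1, 1],
})

# 'first' keeps per-group constants; 'sum' totals the quantity columns
orders = (pre.groupby(['OrderNo', 'ItemCode'])
             .agg({'OrderDate': 'first', 'LineNo': 'first', 'ClientNo': 'first',
                   'QtyOrdered': 'sum', 'QtyShipped': 'sum'})
             .reset_index())
print(orders)
```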
Groupby multiple columns and get the sum of two other columns while keeping the first occurrence of every other column
I think that using two separate operations on the groupby object and join them afterwards is clearer than a one-liner. Here is a minimal example, grouping on 1 column:
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [
        ("bird", "Falconiformes", 389.0, 5.5, 1),
        ("bird", "Psittaciformes", 24.0, 4.5, 2),
        ("mammal", "Carnivora", 80.2, 33.3, 1),
        ("mammal", "Primates", np.nan, 33.7, 2),
        ("mammal", "Carnivora", 58, 23, 3),
    ],
    index=["falcon", "parrot", "lion", "monkey", "leopard"],
    columns=("class", "family", "max_speed", "height", "order"),
)
print(df, "\n")
grouped = df.groupby('class')
df_sum = grouped[['max_speed', 'height']].agg('sum')
df_first = grouped['order'].first()
df_out = pd.concat([df_sum, df_first], axis=1)
print(df_out)
Output:
class family max_speed height order
falcon bird Falconiformes 389.0 5.5 1
parrot bird Psittaciformes 24.0 4.5 2
lion mammal Carnivora 80.2 33.3 1
monkey mammal Primates NaN 33.7 2
leopard mammal Carnivora 58.0 23.0 3
max_speed height order
class
bird 413.0 10.0 1
mammal 138.2 90.0 1
Is there a way I can use groupby.sum and keep other columns?
You can compute the grouped sum while keeping all the other columns by using transform:
df['sum'] = df.groupby([1, 2, 4])[5].transform('sum')
This will simply add a column that has the aggregation at the grouped level for all rows in the original dataframe.
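A runnable sketch, assuming a hypothetical frame with integer column labels as in the snippet above:

```python
import pandas as pd

# Hypothetical frame with integer column labels 1, 2, 4, 5
df = pd.DataFrame({
    1: ['a', 'a', 'b'],
    2: ['x', 'x', 'y'],
    4: [10, 10, 20],
    5: [1.0, 2.0, 5.0],
})

# transform('sum') broadcasts each group's total onto every member row
df['sum'] = df.groupby([1, 2, 4])[5].transform('sum')
print(df)
```

Unlike a plain groupby().sum(), no rows are collapsed: the frame keeps its original shape with the group total repeated on each row.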