Pandas dataframe get first row of each group
>>> df.groupby('id').first()
value
id
1 first
2 first
3 first
4 second
5 first
6 first
7 fourth
If you need id as a column:
>>> df.groupby('id').first().reset_index()
id value
0 1 first
1 2 first
2 3 first
3 4 second
4 5 first
5 6 first
6 7 fourth
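One caveat: first() returns the first non-null value in each column, which is not always the literal first row of the group. A minimal sketch contrasting it with head(1), using made-up data (this frame is hypothetical, not the one from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: the first row of group 1 has a missing value.
df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "value": [np.nan, "second", "first", "second"],
})

# first() skips nulls column by column, so group 1 yields "second".
first_non_null = df.groupby("id").first().reset_index()

# head(1) keeps the literal first row of each group, NaN included.
first_rows = df.groupby("id").head(1).reset_index(drop=True)

print(first_non_null)
print(first_rows)
```

If the data has no missing values, the two approaches return the same rows.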
To get the first n records of each group, use head():
>>> df.groupby('id').head(2).reset_index(drop=True)
id value
0 1 first
1 1 second
2 2 first
3 2 second
4 3 first
5 3 third
6 4 second
7 4 fifth
8 5 first
9 6 first
10 6 second
11 7 fourth
12 7 fifth
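A small self-contained sketch of the same call, assuming a hypothetical frame in which one group has fewer than two rows. head(2) returns whatever rows such a group has, and keeps the original row order and index labels until you reset them:

```python
import pandas as pd

# Hypothetical sample data; group 2 has only one row.
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 3, 3],
    "value": ["first", "second", "third", "first", "first", "second"],
})

# Up to two rows per group, in original order, with original index labels.
top2 = df.groupby("id").head(2)

# Renumber the rows from 0 if the old labels are not needed.
top2_clean = top2.reset_index(drop=True)

print(top2)
print(top2_clean)
```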
pandas: how do I select first row in each GROUP BY group?
As a rule, if you want the rows within each group ordered by a column that is not one of the grouping keys, sort the DataFrame before calling groupby:
In [5]: df.sort_values('B').groupby('A').first()
Out[5]:
B
A
bar 1
foo 1
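For a runnable version, here is a sketch with an assumed input frame (the question's actual data is not shown, so the values below are hypothetical). It relies on groupby preserving the row order within each group:

```python
import pandas as pd

# Hypothetical input; only the column names A and B come from the answer.
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar"],
    "B": [3, 2, 1, 1],
})

# Sort first, then take the first row per group: the smallest B for each A.
smallest_b = df.sort_values("B").groupby("A").first()

print(smallest_b)
```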
Selecting first row from each subgroup (pandas)
One way is to use groupby + idxmin to get the index of the smallest distance per group, then use loc to get the desired output:
out = df.loc[df.groupby(['date', 'p'])['distance'].idxmin()]
Output:
v p distance date
0 14.60 sst 22454.1 2021-12-30
3 1.67 wvht 23141.8 2021-12-30
6 1.70 wvht 23141.4 2021-12-31
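The two steps can be seen separately in the sketch below. The frame is a hypothetical stand-in for the question's data (same column names, invented rows), and the approach assumes the index labels are unique so that loc picks exactly one row per group:

```python
import pandas as pd

# Hypothetical rows using the column names from the answer.
df = pd.DataFrame({
    "v": [14.60, 14.50, 1.67, 1.70],
    "p": ["sst", "sst", "wvht", "wvht"],
    "distance": [22454.1, 22500.0, 23141.8, 23141.4],
    "date": ["2021-12-30", "2021-12-30", "2021-12-30", "2021-12-31"],
})

# Step 1: index label of the smallest distance within each (date, p) group.
closest_idx = df.groupby(["date", "p"])["distance"].idxmin()

# Step 2: pull the full rows for those labels.
out = df.loc[closest_idx]

print(out)
```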
Extract first record of each group dataframe pandas
This might help:
import pandas as pd
df = pd.read_excel('Example.xlsx', sheet_name='Sheet1')
df['Date']= pd.to_datetime(df['Date'])
df = df.sort_values(['F. No.', 'Date'], ascending=[True, False])
df_first = df.groupby(['F. No.'], as_index=False).head(1)
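Passing as_index=False keeps the grouping column from becoming the index when you aggregate; with .head(1) the rows come back with their original columns and index either way. Note that .head(1) returns the most recent record per group only because the data was sorted by date in descending order on the previous line.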
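To try this without the Excel file, here is a sketch that builds a small frame in memory. The column names follow the answer; the values are hypothetical:

```python
import pandas as pd

# Hypothetical stand-in for the spreadsheet contents.
df = pd.DataFrame({
    "F. No.": [101, 101, 102, 102],
    "Date": ["2021-01-05", "2021-03-01", "2021-02-10", "2021-01-20"],
})
df["Date"] = pd.to_datetime(df["Date"])

# Newest date first within each "F. No.".
df = df.sort_values(["F. No.", "Date"], ascending=[True, False])

# The first row of each group is now its most recent record.
df_first = df.groupby("F. No.").head(1)

print(df_first)
```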
get first row in a group and assign values
Use df.groupby(...).cumcount() to get a counter of rows within each group, which you can then manipulate. In the session below, np is NumPy (import numpy as np).
In [51]: df
Out[51]:
a b c
0 def 1 0
1 abc 0 1
2 def 1 0
3 abc 0 1
In [52]: df2 = df.sort_values(['a','b','c'])
In [53]: df2['result'] = df2.groupby(['a', 'b', 'c']).cumcount()
In [54]: df2['result'] = np.where(df2['result'] == 0, 1, 0)
In [55]: df2
Out[55]:
a b c result
1 abc 0 1 1
3 abc 0 1 0
0 def 1 0 1
2 def 1 0 0
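A possible variant, sketched here as an alternative rather than the answer's own method: since cumcount() is 0 exactly on the first row of each group, the flag can be built without sorting and without np.where. The frame reproduces the one shown in the session:

```python
import pandas as pd

df = pd.DataFrame({
    "a": ["def", "abc", "def", "abc"],
    "b": [1, 0, 1, 0],
    "c": [0, 1, 0, 1],
})

# cumcount() numbers rows within each group 0, 1, 2, ...; 0 marks the first.
is_first = df.groupby(["a", "b", "c"]).cumcount().eq(0)

# Store the flag as 1/0, keeping the original row order.
df["result"] = is_first.astype(int)

print(df)
```

The boolean mask can also be used directly, for example df.loc[is_first, 'some_column'] = new_value, to assign values only on the first row of each group.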
Get the first row of each group of unique values in another column
Use groupby + first:
firsts = df.groupby('col_B', as_index=False).first()
Output:
>>> firsts
col_B col_A
0 x 1
1 xx 2
2 y 4
If the order of the columns is important:
firsts = df.groupby('col_B', as_index=False).first()[df.columns]
Output:
>>> firsts
col_A col_B
0 1 x
1 2 xx
2 4 y
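Another option that keeps both the original column order and the original index labels is drop_duplicates, which retains the first row for each distinct value by default. A minimal sketch, assuming a frame shaped like the one in the question (the exact rows are a guess):

```python
import pandas as pd

# Assumed input; the question's full data is not shown.
df = pd.DataFrame({
    "col_A": [1, 2, 3, 4],
    "col_B": ["x", "xx", "xx", "y"],
})

# Keep the first row seen for each col_B value; columns stay in place.
firsts = df.drop_duplicates("col_B")

# Same rows via groupby, with the columns put back in the original order.
firsts_grouped = df.groupby("col_B", as_index=False).first()[df.columns]

print(firsts)
print(firsts_grouped)
```

Unlike first(), drop_duplicates does not skip missing values, so the two can differ when a group's first row contains NaN.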
pandas extract first row column value equal to 1 for each group
First filter only the rows with label=1, then remove duplicates per id with DataFrame.drop_duplicates:
df1 = df[df['label'].eq(1)].drop_duplicates('id')
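A quick way to check the behaviour is to run it on a tiny frame. The data here is hypothetical; only the column names id and label come from the answer:

```python
import pandas as pd

# Hypothetical sample: id 1 has its first label == 1 on the second row.
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2],
    "label": [0, 1, 1, 1, 0],
})

# Keep rows where label is 1, then the first such row per id.
df1 = df[df["label"].eq(1)].drop_duplicates("id")

print(df1)
```

An id with no label equal to 1 simply does not appear in the result.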
How to conditionally replace the first row of each group with the second row
How's something like this?
df[df.Status.eq("Active")].drop_duplicates(subset=["Group"], ignore_index=True)
Output:
Group Name Title Status
0 A Lisa n Active
1 B Kim Boss Active
Stepping through it:
df[df.Status.eq("Active")] grabs only the rows where "Status" is "Active".
drop_duplicates(subset=["Group"]) drops all rows after the first occurrence of each value in "Group", i.e. it returns the first row with Group A, then the first row with Group B, etc.
ignore_index=True discards the index of the surviving rows and renumbers them from 0. Without it the index would be [1, 3].
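The effect of ignore_index can be seen by running the chain with and without it. The frame below is a hypothetical reconstruction (the names and the "Inactive" status are invented) arranged so that the surviving rows sit at labels 1 and 3:

```python
import pandas as pd

# Hypothetical data: the first row of each group is not active.
df = pd.DataFrame({
    "Group": ["A", "A", "B", "B"],
    "Name": ["Ann", "Lisa", "Bob", "Kim"],
    "Status": ["Inactive", "Active", "Inactive", "Active"],
})

active = df[df.Status.eq("Active")]

# Without ignore_index the original labels are kept.
kept = active.drop_duplicates(subset=["Group"])

# With ignore_index=True the result is renumbered from 0.
result = active.drop_duplicates(subset=["Group"], ignore_index=True)

print(kept)
print(result)
```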
Select the first row from each group after groupby (Multiindex)
To select the first row from each year, you can do:
print(
df.reset_index(level="product")
.groupby(level="year")
.first()
.set_index(["product"], append=True)
)
Prints:
count sum
year product
2015 product A 9 23
2016 product A 7 17
2017 product B 9 32
2018 product A 3 33
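Here is a self-contained sketch of the same chain on a small, hypothetical two-level frame (index names as in the answer, values invented). It also shows groupby(level="year").head(1) as a shorter alternative that leaves the MultiIndex untouched; that variant is a suggestion, not part of the original answer:

```python
import pandas as pd

# Hypothetical data with a (year, product) MultiIndex.
index = pd.MultiIndex.from_tuples(
    [
        (2015, "product A"),
        (2015, "product B"),
        (2016, "product A"),
        (2016, "product B"),
    ],
    names=["year", "product"],
)
df = pd.DataFrame({"count": [9, 4, 7, 2], "sum": [23, 10, 17, 5]}, index=index)

# Move "product" to a column, take the first row per year, then restore it.
first_per_year = (
    df.reset_index(level="product")
    .groupby(level="year")
    .first()
    .set_index("product", append=True)
)

# Alternative: head(1) keeps the original MultiIndex rows as they are.
first_per_year_alt = df.groupby(level="year").head(1)

print(first_per_year)
print(first_per_year_alt)
```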