Pandas Dataframe Get First Row of Each Group

Pandas dataframe get first row of each group

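The input DataFrame isn't shown in the question; a minimal sketch that would reproduce the outputs below (the specific values are an assumption reconstructed from those outputs):

import pandas as pd

# Hypothetical input, reconstructed from the outputs shown below
df = pd.DataFrame({
    'id':    [1, 1, 2, 2, 3, 3, 4, 4, 5, 6, 6, 7, 7],
    'value': ['first', 'second', 'first', 'second', 'first', 'third',
              'second', 'fifth', 'first', 'first', 'second', 'fourth', 'fifth'],
})
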
>>> df.groupby('id').first()
     value
id
1    first
2    first
3    first
4   second
5    first
6    first
7   fourth

If you need id as a column:

>>> df.groupby('id').first().reset_index()
   id   value
0   1   first
1   2   first
2   3   first
3   4  second
4   5   first
5   6   first
6   7  fourth
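
Equivalently, you can pass as_index=False to groupby so that id stays a regular column without the extra reset_index call:

>>> df.groupby('id', as_index=False).first()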

To get the first n records of each group, you can use head():

>>> df.groupby('id').head(2).reset_index(drop=True)
    id   value
0    1   first
1    1  second
2    2   first
3    2  second
4    3   first
5    3   third
6    4  second
7    4   fifth
8    5   first
9    6   first
10   6  second
11   7  fourth
12   7   fifth

pandas: how do I select first row in each GROUP BY group?

Generally, if you want your data sorted within a groupby by a column that isn't one of the grouping columns, it's better to sort the DataFrame before performing the groupby:

In [5]: df.sort_values('B').groupby('A').first()
Out[5]:
     B
A
bar  1
foo  1
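
For context, a toy DataFrame along these lines would produce the output above (the values are an assumption); sorting first works because groupby preserves the order of rows within each group:

import pandas as pd

# Hypothetical data: sorting by 'B' first means first() picks the smallest 'B' per group
df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar'],
                   'B': [2, 1, 1, 3]})

df.sort_values('B').groupby('A').first()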

Selecting first row from each subgroup (pandas)

One way is to use groupby + idxmin to get the index of the smallest distance per group, then use loc to get the desired output:

out = df.loc[df.groupby(['date', 'p'])['distance'].idxmin()]

Output:

       v     p  distance        date
0  14.60   sst   22454.1  2021-12-30
3   1.67  wvht   23141.8  2021-12-30
6   1.70  wvht   23141.4  2021-12-31
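
Since the original DataFrame isn't shown, here is a minimal self-contained sketch of the same idea on made-up data; idxmin() returns the index label of the smallest distance in each (date, p) group, and loc then pulls those rows:

import pandas as pd

# Hypothetical data mirroring the columns in the question
df = pd.DataFrame({
    'v':        [14.60, 15.10, 1.67, 1.80, 1.70],
    'p':        ['sst', 'sst', 'wvht', 'wvht', 'wvht'],
    'distance': [22454.1, 25000.0, 23141.8, 24000.0, 23141.4],
    'date':     ['2021-12-30', '2021-12-30', '2021-12-30', '2021-12-30', '2021-12-31'],
})

out = df.loc[df.groupby(['date', 'p'])['distance'].idxmin()]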

Extract first record of each group dataframe pandas

This might help:

import pandas as pd

df = pd.read_excel('Example.xlsx', sheet_name='Sheet1')
df['Date']= pd.to_datetime(df['Date'])
df = df.sort_values(['F. No.', 'Date'], ascending=[True, False])
df_first = df.groupby(['F. No.'], as_index=False).head(1)

To make sure that the groupby column does not become an index, pass the as_index=False kwarg (with .head(1) the rows of the original DataFrame are returned anyway, so 'F. No.' stays a regular column). Note that .head(1) picks the intended record per group only because the data was sorted in the previous line.
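
If you don't have the Excel file at hand, the same pattern can be tried on an inline DataFrame; the column names follow the answer, but the values here are made up:

import pandas as pd

# Hypothetical stand-in for the Excel data
df = pd.DataFrame({
    'F. No.': [101, 101, 102, 102],
    'Date':   ['2021-01-01', '2021-03-01', '2021-02-01', '2021-04-01'],
})
df['Date'] = pd.to_datetime(df['Date'])

# Latest Date first within each 'F. No.', then keep one row per group
df = df.sort_values(['F. No.', 'Date'], ascending=[True, False])
df_first = df.groupby(['F. No.'], as_index=False).head(1)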

get first row in a group and assign values

Use df.groupby(...).cumcount() to get a counter of rows within each group, which you can then manipulate (the snippet below also assumes numpy is imported as np).

In [51]: df
Out[51]:
     a  b  c
0  def  1  0
1  abc  0  1
2  def  1  0
3  abc  0  1

In [52]: df2 = df.sort_values(['a','b','c'])

In [53]: df2['result'] = df2.groupby(['a', 'b', 'c']).cumcount()

In [54]: df2['result'] = np.where(df2['result'] == 0, 1, 0)

In [55]: df2
Out[55]:
     a  b  c  result
1  abc  0  1       1
3  abc  0  1       0
0  def  1  0       1
2  def  1  0       0
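
If you would rather avoid numpy here, the same flag can be built directly from the cumcount result; a small sketch of that variant:

# Equivalent without np.where: 1 for the first row of each group, 0 otherwise
df2['result'] = (df2.groupby(['a', 'b', 'c']).cumcount() == 0).astype(int)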

Get the first row of each group of unique values in another column

Use groupby + first:

firsts = df.groupby('col_B', as_index=False).first()

Output:

>>> firsts
  col_B  col_A
0     x      1
1    xx      2
2     y      4

If the original order of the columns matters, reindex the columns of the result:

firsts = df.groupby('col_B', as_index=False).first()[df.columns]

Output:

>>> firsts
   col_A  col_B
0      1      x
1      2     xx
2      4      y

pandas extract first row column value equal to 1 for each group

First filter to only the rows with label equal to 1, then remove duplicates per id with DataFrame.drop_duplicates:

df1 = df[df['label'].eq(1)].drop_duplicates('id')
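
A quick usage sketch on made-up data (the column names follow the question, the values are assumptions):

import pandas as pd

# Hypothetical data: ids 1 and 2 each have at least one row with label 1
df = pd.DataFrame({'id':    [1, 1, 1, 2, 2],
                   'label': [0, 1, 1, 1, 0]})

# keeps the first label==1 row for each id (here the rows at index 1 and 3)
df1 = df[df['label'].eq(1)].drop_duplicates('id')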

How to conditionally replace the first row of each group with the second row

How about something like this?

df[df.Status.eq("Active")].drop_duplicates(subset=["Group"], ignore_index=True)

Output:

  Group  Name Title  Status
0     A  Lisa     n  Active
1     B   Kim  Boss  Active

Stepping through it:

  • df[df.Status.eq("Active")] grabs only the rows where "Status" is "Active"
  • drop_duplicates(subset=["Group"]) drops all rows after the first occurrence of each value in "Group", e.g. it returns the first row with Group A, then the first row with Group B, etc.
  • ignore_index=True resets the index of the result so it starts back at 0. Without this the index would be [1, 3]
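
For reference, an input DataFrame along these lines would produce the output above (the inactive rows are assumptions added to make the example self-contained):

import pandas as pd

# Hypothetical input: the first row of each Group is inactive, so the second row is kept
df = pd.DataFrame({
    'Group':  ['A', 'A', 'B', 'B'],
    'Name':   ['Tom', 'Lisa', 'Sam', 'Kim'],
    'Title':  ['m', 'n', 'Clerk', 'Boss'],
    'Status': ['Inactive', 'Active', 'Inactive', 'Active'],
})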

Select the first row from each group after groupby (Multiindex)

To select the first row from each year, you can do:

print(
    df.reset_index(level="product")
    .groupby(level="year")
    .first()
    .set_index(["product"], append=True)
)

Prints:

                count  sum
year product
2015 product A      9   23
2016 product A      7   17
2017 product B      9   32
2018 product A      3   33
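
As a side note (not from the original answer), GroupBy.head(1) keeps the existing MultiIndex, so the reset_index/set_index round trip can be avoided if you only want the first row per year:

# Keeps the (year, product) MultiIndex as-is
df.groupby(level="year").head(1)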

