Pandas Dataframe Get First Row of Each Group

Pandas dataframe get first row of each group

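The input DataFrame isn't shown in the question; a minimal sketch that would reproduce the outputs below (the specific values are an assumption reconstructed from those outputs):

import pandas as pd

# Hypothetical input, reconstructed from the outputs shown below
df = pd.DataFrame({
    'id':    [1, 1, 2, 2, 3, 3, 4, 4, 5, 6, 6, 7, 7],
    'value': ['first', 'second', 'first', 'second', 'first', 'third',
              'second', 'fifth', 'first', 'first', 'second', 'fourth', 'fifth'],
})
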
>>> df.groupby('id').first()
     value
id
1    first
2    first
3    first
4   second
5    first
6    first
7   fourth

If you need id as a column:

>>> df.groupby('id').first().reset_index()
   id   value
0   1   first
1   2   first
2   3   first
3   4  second
4   5   first
5   6   first
6   7  fourth
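
Equivalently, you can pass as_index=False to groupby so that id stays a regular column without the extra reset_index call:

>>> df.groupby('id', as_index=False).first()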

To get the first n records of each group, you can use head():

>>> df.groupby('id').head(2).reset_index(drop=True)
    id   value
0    1   first
1    1  second
2    2   first
3    2  second
4    3   first
5    3   third
6    4  second
7    4   fifth
8    5   first
9    6   first
10   6  second
11   7  fourth
12   7   fifth

pandas: how do I select first row in each GROUP BY group?

Generally, if you want your data sorted within a groupby by a column that isn't one of the grouping columns, it's better to sort the DataFrame before performing the groupby:

In [5]: df.sort_values('B').groupby('A').first()
Out[5]:
     B
A
bar  1
foo  1
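
For context, a toy DataFrame along these lines would produce the output above (the values are an assumption); sorting first works because groupby preserves the order of rows within each group:

import pandas as pd

# Hypothetical data: sorting by 'B' first means first() picks the smallest 'B' per group
df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar'],
                   'B': [2, 1, 1, 3]})

df.sort_values('B').groupby('A').first()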

Selecting first row from each subgroup (pandas)

One way is to use groupby + idxmin to get the index of the smallest distance per group, then use loc to get the desired output:

out = df.loc[df.groupby(['date', 'p'])['distance'].idxmin()]

Output:

       v     p  distance        date
0  14.60   sst   22454.1  2021-12-30
3   1.67  wvht   23141.8  2021-12-30
6   1.70  wvht   23141.4  2021-12-31
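
Since the original DataFrame isn't shown, here is a minimal self-contained sketch of the same idea on made-up data; idxmin() returns the index label of the smallest distance in each (date, p) group, and loc then pulls those rows:

import pandas as pd

# Hypothetical data mirroring the columns in the question
df = pd.DataFrame({
    'v':        [14.60, 15.10, 1.67, 1.80, 1.70],
    'p':        ['sst', 'sst', 'wvht', 'wvht', 'wvht'],
    'distance': [22454.1, 25000.0, 23141.8, 24000.0, 23141.4],
    'date':     ['2021-12-30', '2021-12-30', '2021-12-30', '2021-12-30', '2021-12-31'],
})

out = df.loc[df.groupby(['date', 'p'])['distance'].idxmin()]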

Extract first record of each group dataframe pandas

This might help:

import pandas as pd

df = pd.read_excel('Example.xlsx', sheet_name='Sheet1')
df['Date']= pd.to_datetime(df['Date'])
df = df.sort_values(['F. No.', 'Date'], ascending=[True, False])
df_first = df.groupby(['F. No.'], as_index=False).head(1)

To make sure that the groupby column does not become an index, pass the as_index=False kwarg (with .head(1) the rows of the original DataFrame are returned anyway, so 'F. No.' stays a regular column). Note that .head(1) picks the intended record per group only because the data was sorted in the previous line.
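
If you don't have the Excel file at hand, the same pattern can be tried on an inline DataFrame; the column names follow the answer, but the values here are made up:

import pandas as pd

# Hypothetical stand-in for the Excel data
df = pd.DataFrame({
    'F. No.': [101, 101, 102, 102],
    'Date':   ['2021-01-01', '2021-03-01', '2021-02-01', '2021-04-01'],
})
df['Date'] = pd.to_datetime(df['Date'])

# Latest Date first within each 'F. No.', then keep one row per group
df = df.sort_values(['F. No.', 'Date'], ascending=[True, False])
df_first = df.groupby(['F. No.'], as_index=False).head(1)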

get first row in a group and assign values

Use df.groupby(...).cumcount() to get a counter of rows within each group, which you can then manipulate (the snippet below also assumes numpy is imported as np).

In [51]: df
Out[51]:
     a  b  c
0  def  1  0
1  abc  0  1
2  def  1  0
3  abc  0  1

In [52]: df2 = df.sort_values(['a','b','c'])

In [53]: df2['result'] = df2.groupby(['a', 'b', 'c']).cumcount()

In [54]: df2['result'] = np.where(df2['result'] == 0, 1, 0)

In [55]: df2
Out[55]:
     a  b  c  result
1  abc  0  1       1
3  abc  0  1       0
0  def  1  0       1
2  def  1  0       0
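
If you would rather avoid numpy here, the same flag can be built directly from the cumcount result; a small sketch of that variant:

# Equivalent without np.where: 1 for the first row of each group, 0 otherwise
df2['result'] = (df2.groupby(['a', 'b', 'c']).cumcount() == 0).astype(int)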

Get the first row of each group of unique values in another column

Use groupby + first:

firsts = df.groupby('col_B', as_index=False).first()

Output:

>>> firsts
  col_B  col_A
0     x      1
1    xx      2
2     y      4

If the original order of the columns matters, reindex the columns of the result:

firsts = df.groupby('col_B', as_index=False).first()[df.columns]

Output:

>>> firsts
   col_A  col_B
0      1      x
1      2     xx
2      4      y

pandas extract first row column value equal to 1 for each group

First filter to only the rows with label equal to 1, then remove duplicates per id with DataFrame.drop_duplicates:

df1 = df[df['label'].eq(1)].drop_duplicates('id')
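
A quick usage sketch on made-up data (the column names follow the question, the values are assumptions):

import pandas as pd

# Hypothetical data: ids 1 and 2 each have at least one row with label 1
df = pd.DataFrame({'id':    [1, 1, 1, 2, 2],
                   'label': [0, 1, 1, 1, 0]})

# keeps the first label==1 row for each id (here the rows at index 1 and 3)
df1 = df[df['label'].eq(1)].drop_duplicates('id')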

How to conditionally replace the first row of each group with the second row

How about something like this?

df[df.Status.eq("Active")].drop_duplicates(subset=["Group"], ignore_index=True)

Output:

  Group  Name Title  Status
0     A  Lisa     n  Active
1     B   Kim  Boss  Active

Stepping through it:

  • df[df.Status.eq("Active")] grabs only the rows where "Status" is "Active"
  • drop_duplicates(subset=["Group"]) drops all rows after the first occurrence of each value in "Group", e.g. it returns the first row with Group A, then the first row with Group B, etc.
  • ignore_index=True resets the index of the result so it starts back at 0. Without this the index would be [1, 3]
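
For reference, an input DataFrame along these lines would produce the output above (the inactive rows are assumptions added to make the example self-contained):

import pandas as pd

# Hypothetical input: the first row of each Group is inactive, so the second row is kept
df = pd.DataFrame({
    'Group':  ['A', 'A', 'B', 'B'],
    'Name':   ['Tom', 'Lisa', 'Sam', 'Kim'],
    'Title':  ['m', 'n', 'Clerk', 'Boss'],
    'Status': ['Inactive', 'Active', 'Inactive', 'Active'],
})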

Select the first row from each group after groupby (Multiindex)

To select the first row from each year, you can do:

print(
    df.reset_index(level="product")
    .groupby(level="year")
    .first()
    .set_index(["product"], append=True)
)

Prints:

                count  sum
year product
2015 product A      9   23
2016 product A      7   17
2017 product B      9   32
2018 product A      3   33
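
As a side note (not from the original answer), GroupBy.head(1) keeps the existing MultiIndex, so the reset_index/set_index round trip can be avoided if you only want the first row per year:

# Keeps the (year, product) MultiIndex as-is
df.groupby(level="year").head(1)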

