How to Find the First and Last Occurrences of an Element in a Data.Frame

Find first and last occurrence of an item in R dataframe

Making a few assumptions about your data:

  • week is numeric
  • item is always associated with at least one week (no NA weeks)
  • "last" is equivalent to "largest value" for week

Then this dplyr solution should work:

library(dplyr)
df %>%
  group_by(item) %>%
  summarise(diff = max(week) - min(week)) %>%
  ungroup()

# A tibble: 2 x 2
   item  diff
  <int> <dbl>
1 63230     2
2 63233     2
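For readers working in pandas rather than dplyr, the same idea might look roughly like the sketch below. The sample data is hypothetical (only the column names item and week come from the question), and it relies on the same assumptions listed above.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "item": [63230, 63230, 63230, 63233, 63233],
    "week": [1, 2, 3, 4, 6],
})

# First (smallest) and last (largest) week per item, then the span between them.
span = df.groupby("item")["week"].agg(first="min", last="max")
span["diff"] = span["last"] - span["first"]
print(span)
```

Keeping the first and last columns next to diff makes it easy to check which weeks produced each difference.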

How can I find the first and last occurrences of an element in a data.frame?

You can do this with duplicated and rev (for LAST):

> v1=c(1,1,1,2,2,3,3,3,3,4,4,5)

> data.frame(v1,FIRST=!duplicated(v1),LAST=rev(!duplicated(rev(v1))))
v1 FIRST LAST
1 1 TRUE FALSE
2 1 FALSE FALSE
3 1 FALSE TRUE
4 2 TRUE FALSE
5 2 FALSE TRUE
6 3 TRUE FALSE
7 3 FALSE FALSE
8 3 FALSE FALSE
9 3 FALSE TRUE
10 4 TRUE FALSE
11 4 FALSE TRUE
12 5 TRUE TRUE
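A pandas version of the same trick, as a minimal sketch: Series.duplicated takes a keep argument, so the reversal is not needed. The vector is the one from the R example.

```python
import pandas as pd

v1 = pd.Series([1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 5])

flags = pd.DataFrame({
    "v1": v1,
    # Not a duplicate of anything earlier -> first occurrence.
    "FIRST": ~v1.duplicated(keep="first"),
    # Not a duplicate of anything later -> last occurrence.
    "LAST": ~v1.duplicated(keep="last"),
})
print(flags)
```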

pandas - find first occurrence

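idxmax and argmax return the location of the maximal value (idxmax the index label, argmax the integer position), or the first such location if the maximal value occurs more than once. On a boolean Series, True is the maximum, so either one finds the first True.

Use idxmax on df.A.ne('a') to find the first row where A is not 'a':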

df.A.ne('a').idxmax()

3

or the numpy equivalent

(df.A.values != 'a').argmax()

3

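However, if A has already been sorted, then we can use searchsorted (older pandas versions return a one-element array, as shown here; newer versions return a scalar for a scalar input):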

df.A.searchsorted('a', side='right')

array([3])

Or the numpy equivalent

df.A.values.searchsorted('a', side='right')

3
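Here is a small self-contained sketch of both approaches. The data frame is made up for illustration (a sorted column A that starts with three 'a' values), and the mask.any() guard is an addition: idxmax on an all-False mask still returns the first label, which is easy to mistake for a match.

```python
import pandas as pd

# Hypothetical data: A is sorted and starts with a run of 'a'.
df = pd.DataFrame({"A": ["a", "a", "a", "b", "b", "c"]})

mask = df["A"].ne("a")

# idxmax gives the label of the first True. If nothing is True it would
# still return the first label, so check mask.any() before trusting it.
first_non_a = mask.idxmax() if mask.any() else None

# On sorted data a binary search finds the same boundary.
pos = int(df["A"].to_numpy().searchsorted("a", side="right"))

print(first_non_a, pos)
```

Note that first_non_a is an index label while pos is an integer position; they only coincide here because the frame has the default 0..n-1 index.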

How to obtain first and last occurrence of an item in pandas

You can try groupby and then apply a custom function f like:

import pandas as pd

def f(x):
    # Column-wise min/max over the rows where each sensor fired
    Doormin = x[x['Door'] == 1].min()
    Doormax = x[x['Door'] == 1].max()
    Coaster2min = x[x['Coaster2'] == 1].min()
    Coaster2max = x[x['Coaster2'] == 1].max()
    Coaster1min = x[x['Coaster1'] == 1].min()
    Coaster1max = x[x['Coaster1'] == 1].max()
    # One summary row per sensor: first and last SensorTime
    Door = pd.Series([Doormin['Door'], Doormin['SensorDate'], Doormin['SensorTime'], Doormax['SensorTime'], Doormin['RegisteredTime']], index=['Door','SensorDate','SensorTimeFirst','SensorTimeLast','RegisteredTime'])
    Coaster1 = pd.Series([Coaster1min['Coaster1'], Coaster1min['SensorDate'], Coaster1min['SensorTime'], Coaster1max['SensorTime'], Coaster1min['RegisteredTime']], index=['Coaster1','SensorDate','SensorTimeFirst','SensorTimeLast','RegisteredTime'])
    Coaster2 = pd.Series([Coaster2min['Coaster2'], Coaster2min['SensorDate'], Coaster2min['SensorTime'], Coaster2max['SensorTime'], Coaster2min['RegisteredTime']], index=['Coaster2','SensorDate','SensorTimeFirst','SensorTimeLast','RegisteredTime'])

    return pd.DataFrame([Door, Coaster2, Coaster1])

print(df.groupby(['User','Activity']).apply(f))

Coaster1 Coaster2 Door RegisteredTime \
User Activity
Chris coffee + hot water 0 NaN NaN 1 13:09:00
1 NaN 1 NaN 13:09:00
2 NaN NaN NaN NaN

SensorDate SensorTimeFirst SensorTimeLast
User Activity
Chris coffee + hot water 0 2015-09-21 13:05:54 13:05:56
1 2015-09-21 13:05:58 13:05:59
2 NaN NaN NaN

You can also replace NaN with 0 in the sensor columns using fillna:

df = df.groupby(['User','Activity']).apply(f)
df[['Coaster1','Coaster2','Door']] = df[['Coaster1','Coaster2','Door']].fillna(0)
print(df)
Coaster1 Coaster2 Door RegisteredTime \
User Activity
Chris coffee + hot water 0 0 0 1 13:09:00
1 0 1 0 13:09:00
2 0 0 0 NaN

SensorDate SensorTimeFirst SensorTimeLast
User Activity
Chris coffee + hot water 0 2015-09-21 13:05:54 13:05:56
1 2015-09-21 13:05:58 13:05:59
2 NaN NaN NaN
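If you only need the first and last SensorTime per sensor, a shorter route might be to reshape to long form and aggregate. This is a sketch under assumptions, not a drop-in replacement for f: the sample rows are hypothetical, times are zero-padded strings (so min and max sort correctly), and sensors that never read 1 simply drop out instead of appearing as NaN rows.

```python
import pandas as pd

# Hypothetical readings; column names follow the question.
df = pd.DataFrame({
    "User": ["Chris"] * 4,
    "Activity": ["coffee + hot water"] * 4,
    "SensorTime": ["13:05:54", "13:05:56", "13:05:58", "13:05:59"],
    "Door": [1, 1, 0, 0],
    "Coaster2": [0, 0, 1, 1],
    "Coaster1": [0, 0, 0, 0],
})

# One row per (reading, sensor), keeping only the readings where the sensor is on.
long = df.melt(id_vars=["User", "Activity", "SensorTime"],
               var_name="Sensor", value_name="On")
active = long[long["On"] == 1]

# First and last time each sensor was on, per user and activity.
summary = (active.groupby(["User", "Activity", "Sensor"])["SensorTime"]
           .agg(SensorTimeFirst="min", SensorTimeLast="max")
           .reset_index())
print(summary)
```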

Within rows of data frame, find first occurrence and longest sequence of value

Edit #2: Rewrote as combination of two summarizations.

input_tidy <- input %>%
  gather(col, val, -ID) %>%
  group_by(ID) %>%
  arrange(ID) %>%
  mutate(col_num = row_number() + 1)

input[,1] %>%
  # Combine with summary of each ID's first zero
  left_join(input_tidy %>% filter(val == 0) %>%
              summarize(first_0_name = first(col),
                        first_0_loc = first(col_num))) %>%
  # Combine with length of each ID's first post-0 streak of 1's
  left_join(input_tidy %>%
              filter(val == 1 & cumsum(val == 1 & lag(val, default = 1) == 0) == 1) %>%
              summarize(streak_1 = n()))

# A tibble: 10 x 4
ID first_0_name first_0_loc streak_1
<chr> <chr> <dbl> <int>
1 A i9 10 5
2 B i4 5 4
3 C i6 7 8
4 D i8 9 4
5 E i9 10 5
6 F NA NA NA
7 G i1 2 5
8 H i3 4 8
9 I i2 3 NA
10 J i3 4 2
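For comparison, the same row-wise logic can be sketched in Python with a plain loop per row. Everything here is illustrative: the wide frame is hypothetical, the helper name first_zero_and_streak is made up, and only first_0_name and streak_1 are reproduced (not first_0_loc).

```python
import pandas as pd

# Hypothetical wide data: one row per ID, one 0/1 column per item.
wide = pd.DataFrame(
    {"i1": [1, 0, 1], "i2": [1, 1, 1], "i3": [0, 1, 1],
     "i4": [1, 0, 1], "i5": [1, 1, 1]},
    index=["A", "G", "F"],
)

def first_zero_and_streak(row):
    """Name of the first 0 column and length of the run of 1's right after it."""
    values = row.tolist()
    if 0 not in values:
        return {"first_0_name": None, "streak_1": None}
    i = values.index(0)
    name = row.index[i]
    while i < len(values) and values[i] == 0:   # skip the zeros
        i += 1
    streak = 0
    while i < len(values) and values[i] == 1:   # count the 1's that follow
        streak += 1
        i += 1
    # A zero with no 1's after it is reported as missing, like NA in the R output.
    return {"first_0_name": name, "streak_1": streak or None}

result = pd.DataFrame(
    [first_zero_and_streak(row) for _, row in wide.iterrows()],
    index=wide.index,
)
print(result)
```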

find the first occurrence of a value (from a list of values)in a pandas dataframe and return the index of the row

We can stack, then drop_duplicates:

out = df.loc[:,'N2':].stack().drop_duplicates()
0 N2 12
N3 14
N4 40
N5 42
1 N2 5
N3 24
N4 43
N5 45
2 N2 23
N3 28
N4 38
N5 49
3 N2 11
N3 22
N5 41
4 N2 27
N3 30
N4 46
dtype: int64
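To get from that stacked result to "which row did each value first appear in", one option is to turn it into a value-to-row lookup. A minimal sketch with a made-up frame and a made-up list of wanted values:

```python
import pandas as pd

# Hypothetical data; some values repeat in later rows.
df = pd.DataFrame({"N2": [12, 5, 23], "N3": [14, 24, 12], "N4": [40, 14, 5]})

# Stacking walks the frame row by row, so drop_duplicates keeps
# the first row in which each value shows up.
first_seen = df.stack().drop_duplicates()

# Lookup table: value -> row label of its first occurrence.
row_of = pd.Series(first_seen.index.get_level_values(0),
                   index=first_seen.to_numpy())

wanted = [5, 12, 99]
result = row_of.reindex(wanted)   # NaN for values that never occur
print(result)
```

Because reindex introduces NaN for the missing value, the row labels come back as floats; drop the NaN entries and cast to int if that matters.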

Extract rows for the first occurrence of a variable in a data frame

t.first <- species[match(unique(species$Taxa), species$Taxa),]

should give you what you're looking for. match returns indices of the first match in the compared vectors, which give you the rows you need.
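In pandas, a rough counterpart of the match/unique idiom is drop_duplicates with keep='first'. The species table below is hypothetical; only the Taxa column name comes from the question.

```python
import pandas as pd

# Hypothetical data; only the Taxa column name comes from the question.
species = pd.DataFrame({
    "Taxa": ["oak", "pine", "oak", "fir", "pine"],
    "Count": [3, 5, 2, 7, 1],
})

# Keep the first row seen for each Taxa value, in original order.
t_first = species.drop_duplicates(subset="Taxa", keep="first")
print(t_first)
```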

Python pandas get first and last index, duplicate if first is also the last, of group in data frame

Using pd.concat

pd.concat([d.iloc[[0, -1]] for _, d in df.groupby('ID')])

ID Date
0 A 1/1/2015
2 A 1/3/2017
3 B 1/3/2017
3 B 1/3/2017
4 C 1/5/2016
5 C 1/7/2016

Using agg

df.groupby('ID').agg(['first', 'last']).stack().reset_index('ID')

ID Date
first A 1/1/2015
last A 1/3/2017
first B 1/3/2017
last B 1/3/2017
first C 1/5/2016
last C 1/7/2016
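A runnable sketch of the concat approach, with sample data reconstructed from the output above (the middle row for A is made up, since the question's full input is not shown):

```python
import pandas as pd

# Rows 0 and 2-5 match the output above; row 1 is a hypothetical filler.
df = pd.DataFrame({
    "ID": ["A", "A", "A", "B", "C", "C"],
    "Date": ["1/1/2015", "1/2/2016", "1/3/2017",
             "1/3/2017", "1/5/2016", "1/7/2016"],
})

# iloc[[0, -1]] picks the first and last row of each group; for a
# single-row group both positions are the same row, so it is duplicated.
ends = pd.concat([g.iloc[[0, -1]] for _, g in df.groupby("ID")])
print(ends)
```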

Access index of last element in data frame

The .iget() answer further down is now superseded by .iloc:

>>> df = pd.DataFrame({"date": range(10, 64, 8)})
>>> df.index += 17
>>> df
date
17 10
18 18
19 26
20 34
21 42
22 50
23 58
>>> df["date"].iloc[0]
10
>>> df["date"].iloc[-1]
58

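Older answer: the shortest way I could think of used .iget(), which has since been deprecated and removed from pandas, so prefer .iloc as shown above: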

>>> df = pd.DataFrame({"date": range(10, 64, 8)})
>>> df.index += 17
>>> df
date
17 10
18 18
19 26
20 34
21 42
22 50
23 58
>>> df['date'].iget(0)
10
>>> df['date'].iget(-1)
58

Alternatively:

>>> df['date'][df.index[0]]
10
>>> df['date'][df.index[-1]]
58

There's also .first_valid_index() and .last_valid_index(), but depending on whether or not you want to rule out NaNs they might not be what you want.
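A small sketch of how the two differ when the ends of a Series are NaN (the data is made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical Series with NaN at both ends.
s = pd.Series([np.nan, 18.0, 26.0, np.nan], index=[17, 18, 19, 20])

first_label = s.index[0]             # 17, regardless of NaN
last_label = s.index[-1]             # 20, regardless of NaN
first_valid = s.first_valid_index()  # 18, skips the leading NaN
last_valid = s.last_valid_index()    # 19, skips the trailing NaN

print(first_label, last_label, first_valid, last_valid)
```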

Remember that df.ix[0] doesn't give you the first row, but the row whose index label is 0 (.ix has since been removed from pandas; df.loc[0] does the same label lookup). For example, in the above case, df.ix[0] would produce

>>> df.ix[0]
Traceback (most recent call last):
File "<ipython-input-489-494245247e87>", line 1, in <module>
df.ix[0]
[...]
KeyError: 0
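With current pandas the same distinction shows up between .loc (labels) and .iloc (positions). A minimal sketch using the frame from above:

```python
import pandas as pd

df = pd.DataFrame({"date": range(10, 64, 8)})
df.index += 17   # labels now run from 17 to 23

by_position = df["date"].iloc[0]   # first row, whatever its label is
by_label = df["date"].loc[17]      # row labelled 17, which is the same row here

try:
    df.loc[0]                      # there is no row labelled 0
    missing = False
except KeyError:
    missing = True

print(by_position, by_label, missing)
```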

