How to Find the First and Last Occurrences of an Element in a Data.Frame

Find first and last occurrence of an item in R dataframe

Making a few assumptions about your data:

week is numeric
item is always associated with at least one week (no NA weeks)
"last" is equivalent to "largest value" for week

Then this dplyr solution should work:

library(dplyr)
df %>% 
  group_by(item) %>% 
  summarise(diff = max(week) - min(week)) %>%
  ungroup()

# A tibble: 2 x 2
   item  diff
  <int> <dbl>
1 63230     2
2 63233     2

How can I find the first and last occurrences of an element in a data.frame?

You can do this with duplicated. For LAST, reverse the vector with rev, flag the duplicates, then reverse the result back:

> v1=c(1,1,1,2,2,3,3,3,3,4,4,5)

> data.frame(v1,FIRST=!duplicated(v1),LAST=rev(!duplicated(rev(v1))))
   v1 FIRST  LAST
1   1  TRUE FALSE
2   1 FALSE FALSE
3   1 FALSE  TRUE
4   2  TRUE FALSE
5   2 FALSE  TRUE
6   3  TRUE FALSE
7   3 FALSE FALSE
8   3 FALSE FALSE
9   3 FALSE  TRUE
10  4  TRUE FALSE
11  4 FALSE  TRUE
12  5  TRUE  TRUE
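For comparison, here is a minimal pandas sketch of the same flags. It assumes the same v1 values as above and uses the keep argument of Series.duplicated in place of the rev trick; the flags variable name is only illustrative.

```python
import pandas as pd

# Same values as the R vector v1 above
v1 = pd.Series([1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 5])

flags = pd.DataFrame({
    "v1": v1,
    # keep="first" marks every repeat except the first one as a duplicate
    "FIRST": ~v1.duplicated(keep="first"),
    # keep="last" marks every repeat except the last one as a duplicate
    "LAST": ~v1.duplicated(keep="last"),
})
print(flags)
```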

pandas - find first occurrence

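idxmax returns the index label of the maximal value and argmax returns its position; if the maximal value occurs more than once, both return the first occurrence.

Since True is the maximum of a boolean Series, use idxmax on df.A.ne('a') to find the first row where A is not 'a':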

df.A.ne('a').idxmax()

3

or the numpy equivalent

(df.A.values != 'a').argmax()

3

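However, if A is already sorted, we can use searchsorted. With side='right' it returns the insertion point just past the last 'a', which is the position of the first value that is not 'a' (recent pandas versions return a scalar here rather than an array):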

df.A.searchsorted('a', side='right')

array([3])

Or the numpy equivalent

df.A.values.searchsorted('a', side='right')

3
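One caveat with the idxmax approach: if no row differs from 'a', the mask is all False and idxmax still returns the first label. A minimal sketch of a guarded version follows; the helper name first_not_equal and the sample data are hypothetical, not part of the original answer.

```python
import pandas as pd

def first_not_equal(series, value):
    """Return the index label of the first entry != value, or None if there is none."""
    mask = series.ne(value)
    # Without this guard, idxmax on an all-False mask returns the first label.
    if not mask.any():
        return None
    return mask.idxmax()

df = pd.DataFrame({"A": ["a", "a", "a", "b", "b"]})  # made-up example data

found = first_not_equal(df["A"], "a")            # first row where A is not 'a'
missing = first_not_equal(df["A"].head(3), "a")  # every value is 'a' here
print(found, missing)
```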

How to obtain first and last occurrence of an item in pandas

You can try groupby and then apply a custom function f, for example:

import pandas as pd

def f(x):
    Doormin = x[x['Door'] == 1].min()
    Doormax = x[x['Door'] == 1].max()
    Coaster2min = x[x['Coaster2'] == 1].min()
    Coaster2max = x[x['Coaster2'] == 1].max()    
    Coaster1min = x[x['Coaster1'] == 1].min()
    Coaster1max = x[x['Coaster1'] == 1].max()      
    Door = pd.Series([Doormin['Door'], Doormin['SensorDate'], Doormin['SensorTime'], Doormax['SensorTime'], Doormin['RegisteredTime']], index=['Door','SensorDate','SensorTimeFirst','SensorTimeLast','RegisteredTime'])
    Coaster1 = pd.Series([Coaster1min['Coaster1'], Coaster1min['SensorDate'], Coaster1min['SensorTime'], Coaster1max['SensorTime'], Coaster1min['RegisteredTime']], index=['Coaster1','SensorDate','SensorTimeFirst','SensorTimeLast','RegisteredTime'])
    Coaster2 = pd.Series([Coaster2min['Coaster2'], Coaster2min['SensorDate'], Coaster2min['SensorTime'], Coaster2max['SensorTime'], Coaster2min['RegisteredTime']], index=['Coaster2','SensorDate','SensorTimeFirst','SensorTimeLast','RegisteredTime'])

    return pd.DataFrame([Door, Coaster2, Coaster1])

print(df.groupby(['User','Activity']).apply(f))

                            Coaster1  Coaster2  Door RegisteredTime  \
User  Activity                                                        
Chris coffee + hot water 0       NaN       NaN     1       13:09:00   
                         1       NaN         1   NaN       13:09:00   
                         2       NaN       NaN   NaN            NaN   

                            SensorDate SensorTimeFirst SensorTimeLast  
User  Activity                                                         
Chris coffee + hot water 0  2015-09-21        13:05:54       13:05:56  
                         1  2015-09-21        13:05:58       13:05:59  
                         2         NaN             NaN            NaN

To show 0 instead of NaN in the sensor columns, use fillna:

df = df.groupby(['User','Activity']).apply(f)
df[['Coaster1','Coaster2','Door']] = df[['Coaster1','Coaster2','Door']].fillna(0)
print(df)
                            Coaster1  Coaster2  Door RegisteredTime  \
User  Activity                                                        
Chris coffee + hot water 0         0         0     1       13:09:00   
                         1         0         1     0       13:09:00   
                         2         0         0     0            NaN   

                            SensorDate SensorTimeFirst SensorTimeLast  
User  Activity                                                         
Chris coffee + hot water 0  2015-09-21        13:05:54       13:05:56  
                         1  2015-09-21        13:05:58       13:05:59  
                         2         NaN             NaN            NaN
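A more compact variant of the same idea is sketched below. It reshapes the sensor columns to long form and takes the min and max time per sensor. The sample rows are invented, only two sensor columns are used, and times are assumed to be zero-padded HH:MM:SS strings so that string comparison matches chronological order.

```python
import pandas as pd

# Invented sample resembling the question's layout
df = pd.DataFrame({
    "User": ["Chris"] * 4,
    "Activity": ["coffee"] * 4,
    "SensorTime": ["13:05:54", "13:05:56", "13:05:58", "13:05:59"],
    "Door": [1, 1, 0, 0],
    "Coaster2": [0, 0, 1, 1],
})

# One row per (reading, sensor) pair
long = df.melt(
    id_vars=["User", "Activity", "SensorTime"],
    value_vars=["Door", "Coaster2"],
    var_name="Sensor",
    value_name="Active",
)

# Keep only readings where the sensor fired, then take first and last time
summary = (
    long[long["Active"] == 1]
    .groupby(["User", "Activity", "Sensor"])["SensorTime"]
    .agg(SensorTimeFirst="min", SensorTimeLast="max")
    .reset_index()
)
print(summary)
```

Sensors that never fire within a group simply do not appear in the result, so no fillna step is needed in this form.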

Within rows of data frame, find first occurrence and longest sequence of value

Edit #2: Rewrote as combination of two summarizations.

library(dplyr)
library(tidyr)

input_tidy <- input %>%
  gather(col, val, -ID) %>%
  group_by(ID) %>%
  arrange(ID) %>%
  mutate(col_num = row_number() + 1) 

input[,1] %>% 
  # Combine with summary of each ID's first zero
  left_join(input_tidy %>% filter(val == 0) %>%
              summarize(first_0_name = first(col),
                        first_0_loc = first(col_num))) %>%
  # Combine with length of each ID's first post-0 streak of 1's
  left_join(input_tidy %>%
              filter(val == 1 & cumsum(val == 1 & lag(val, default = 1) == 0) == 1) %>% 
              summarize(streak_1 = n()))

# A tibble: 10 x 4
   ID    first_0_name first_0_loc streak_1
   <chr> <chr>              <dbl>    <int>
 1 A     i9                    10        5
 2 B     i4                     5        4
 3 C     i6                     7        8
 4 D     i8                     9        4
 5 E     i9                    10        5
 6 F     NA                    NA       NA
 7 G     i1                     2        5
 8 H     i3                     4        8
 9 I     i2                     3       NA
10 J     i3                     4        2

Find the first occurrence of a value (from a list of values) in a pandas dataframe and return the index of the row

We can stack the columns and then call drop_duplicates, which keeps only the first occurrence of each value:

out = df.loc[:,'N2':].stack().drop_duplicates()
out

0  N2    12
   N3    14
   N4    40
   N5    42
1  N2     5
   N3    24
   N4    43
   N5    45
2  N2    23
   N3    28
   N4    38
   N5    49
3  N2    11
   N3    22
   N5    41
4  N2    27
   N3    30
   N4    46
dtype: int64
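To go from that stacked result to the row index for each value in a given list, one option is to turn it into a value-to-row lookup. This is a sketch on a small made-up frame; the names wanted, first_row and result are illustrative.

```python
import pandas as pd

# Made-up frame; the values to look up live in columns N2 onward
df = pd.DataFrame({
    "N1": [0, 1, 2],
    "N2": [12, 5, 12],
    "N3": [14, 14, 28],
})
wanted = [12, 14, 28, 99]  # 99 never occurs

# Stack row by row, then keep the first occurrence of each value
firsts = df.loc[:, "N2":].stack().drop_duplicates()

# Lookup table: value -> row label of its first occurrence
first_row = pd.Series(firsts.index.get_level_values(0), index=firsts.to_numpy())

# NaN marks values that were not found anywhere
result = first_row.reindex(wanted)
print(result)
```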

Extract rows for the first occurrence of a variable in a data frame

t.first <- species[match(unique(species$Taxa), species$Taxa),]

This should give you what you're looking for. match returns, for each unique value, the index of its first match in species$Taxa, which gives you the rows you need.

Python pandas get first and last index, duplicate if first is also the last, of group in data frame

`pd.concat`

pd.concat([d.iloc[[0, -1]] for _, d in df.groupby('ID')])

  ID      Date
0  A  1/1/2015
2  A  1/3/2017
3  B  1/3/2017
3  B  1/3/2017
4  C  1/5/2016
5  C  1/7/2016

Using `agg`

df.groupby('ID').agg(['first', 'last']).stack().reset_index('ID')

      ID      Date
first  A  1/1/2015
last   A  1/3/2017
first  B  1/3/2017
last   B  1/3/2017
first  C  1/5/2016
last   C  1/7/2016
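A runnable sketch of the pd.concat approach, using sample data assumed to match the output shown above, plus one way to get just the first and last index labels per group:

```python
import pandas as pd

# Sample data assumed from the outputs above
df = pd.DataFrame({
    "ID": ["A", "A", "A", "B", "C", "C"],
    "Date": ["1/1/2015", "1/2/2016", "1/3/2017", "1/3/2017", "1/5/2016", "1/7/2016"],
})

# First and last row of each group; a single-row group is repeated
first_last = pd.concat([d.iloc[[0, -1]] for _, d in df.groupby("ID")])

# Only the index labels: expose the index as a column, then aggregate it
labels = df.reset_index().groupby("ID")["index"].agg(["first", "last"])

print(first_last)
print(labels)
```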

Access index of last element in data frame

The original answer (kept below) is now superseded by .iloc:

>>> df = pd.DataFrame({"date": range(10, 64, 8)})
>>> df.index += 17
>>> df
    date
17    10
18    18
19    26
20    34
21    42
22    50
23    58
>>> df["date"].iloc[0]
10
>>> df["date"].iloc[-1]
58

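Original answer (note that .iget() has since been removed from pandas, so prefer .iloc as shown above): the shortest way I can think of uses .iget():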

>>> df = pd.DataFrame({"date": range(10, 64, 8)})
>>> df.index += 17
>>> df
    date
17    10
18    18
19    26
20    34
21    42
22    50
23    58
>>> df['date'].iget(0)
10
>>> df['date'].iget(-1)
58

Alternatively:

>>> df['date'][df.index[0]]
10
>>> df['date'][df.index[-1]]
58

There's also .first_valid_index() and .last_valid_index(), but depending on whether or not you want to rule out NaNs they might not be what you want.

Remember that df.ix[0] doesn't give you the first row, but the one whose index label is 0 (in current pandas .ix has been removed, and df.loc[0] behaves the same way). For example, in the above case, df.ix[0] would produce

>>> df.ix[0]
Traceback (most recent call last):
  File "<ipython-input-489-494245247e87>", line 1, in <module>
    df.ix[0]
[...]
KeyError: 0
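The difference between positional and label-based access can be sketched with current pandas as follows, reusing the frame from above. The variable names are illustrative, and .loc stands in for the removed .ix.

```python
import pandas as pd

df = pd.DataFrame({"date": range(10, 64, 8)})
df.index += 17  # index labels now run from 17 to 23

# Positional access ignores the labels
first_value = df["date"].iloc[0]
last_value = df["date"].iloc[-1]

# The label of the last row
last_label = df.index[-1]

# Label-based access fails because no row is labelled 0
try:
    df.loc[0]
    label_zero_exists = True
except KeyError:
    label_zero_exists = False

print(first_value, last_value, last_label, label_zero_exists)
```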
