Extract Rows for the First Occurrence of a Variable in a Data Frame

Extract rows for the first occurrence of a variable in a data frame

t.first <- species[match(unique(species$Taxa), species$Taxa),]

This should give you what you're looking for. match returns, for each unique Taxa value, the index of its first occurrence in species$Taxa, and those indices are the rows you need.
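As a minimal sketch of why this works, the example below runs the same idea on a small made-up data frame. Only the Taxa column name comes from the answer; the Count column and all values are hypothetical. It also shows !duplicated(), a common base R alternative that should select the same rows.

```r
# Hypothetical stand-in for the `species` data frame
species <- data.frame(
  Taxa  = c("a", "b", "a", "c", "b"),
  Count = c(10, 20, 30, 40, 50)
)

# unique() lists each Taxa value once, in order of first appearance;
# match() then returns the position of the first occurrence of each value
first_idx <- match(unique(species$Taxa), species$Taxa)
t.first <- species[first_idx, ]

# Alternative: duplicated() flags every repeat, so negating it keeps first occurrences
t.alt <- species[!duplicated(species$Taxa), ]

print(t.first)
```

Both approaches keep the original row order, so either can be used when only the first row per value is needed.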

Extract rows for the first occurrence of a variable in a group

Using data.table:

library(data.table)
setDT(DT)
DT[, {
  # index of the first message in this group that contains ".wav"
  id <- head(grep("\\.wav", message), 1)
  list(message = message[id], timestamp = timestamp[id])
}, by = subj_trial]

#    subj_trial   message timestamp
# 1:        1_1 test1.wav    755662
# 2:        1_2 test2.wav    775662
# 3:        1_3 test3.wav    817794
# 4:        2_1 test1.wav    817347
# 5:        2_2 test2.wav    922671
# 6:        2_3 test3.wav   1036899

Extract rows for the first occurrence of a variable in a data frame when date is on the x axis and states

You can keep rows where Counts > 0 and then for each FIPS select the 1st row.

library(dplyr)
df %>%
  filter(Counts > 0) %>%
  group_by(FIPS) %>%
  slice(1L)

#    FIPS Date   Counts
#   <int> <chr>   <int>
# 1  1001 Jan_23      1
# 2  1003 Jan_22      1
# 3  1004 Jan_24      1
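The same filter-then-take-first logic can be sketched in base R. The data frame below is hypothetical (the original data is not shown in the answer), and the sketch assumes the rows are already ordered by date within each FIPS, which the dplyr version relies on as well.

```r
# Hypothetical data in the shape implied by the output above
df <- data.frame(
  FIPS   = c(1001L, 1001L, 1001L, 1003L, 1003L, 1004L, 1004L),
  Date   = c("Jan_22", "Jan_23", "Jan_24", "Jan_22", "Jan_23", "Jan_23", "Jan_24"),
  Counts = c(0L, 1L, 2L, 1L, 1L, 0L, 1L)
)

# Step 1: keep only rows with a positive count
positive <- df[df$Counts > 0, ]

# Step 2: keep the first remaining row for each FIPS
first_positive <- positive[!duplicated(positive$FIPS), ]

print(first_positive)
```

If the rows are not sorted by date, sort them first; otherwise "first" only means first in storage order.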

Extract all BUT the first occurrence of a variable in a data frame

You can do this with the data.table package quite easily.

library(data.table)
setDT(df)[, .SD[-1], by = ID]
#    ID       DATE N Price
# 1:  1 2013-03-18 1  9.99
# 2:  1 2013-04-13 2 19.99
# 3:  2 2013-05-11 2 19.99

where df is your original data. This removes the first row for each group, grouped by ID.

Another option is the dplyr package.

library(dplyr)
slice(group_by(df, ID), -1)
#      ID       DATE     N Price
#   (int)     (fctr) (int) (dbl)
# 1     1 2013-03-18     1  9.99
# 2     1 2013-04-13     2 19.99
# 3     2 2013-05-11     2 19.99

Both remove the first row of every group, so a group with only one row disappears entirely. You don't specify what should happen in that case. If you need to keep those rows, you have to account for it. Let's add a single-row group and take a look.

dff <- rbind(df, df[4, ])  # append a copy of row 4 as a sixth row
dff[6, 1] <- 3             # give it a new ID so it forms a one-row group

Then the data.table code would be

setDT(dff)[, .SD[if (.N == 1L) 1 else -1], by = ID]
#    ID       DATE N Price
# 1:  1 2013-03-18 1  9.99
# 2:  1 2013-04-13 2 19.99
# 3:  2 2013-05-11 2 19.99
# 4:  3 2013-02-18 1 18.99

and the dplyr code would be

slice(group_by(dff, ID), if (n() == 1L) 1 else -1)
#      ID       DATE     N Price
#   (dbl)     (fctr) (int) (dbl)
# 1     1 2013-03-18     1  9.99
# 2     1 2013-04-13     2 19.99
# 3     2 2013-05-11     2 19.99
# 4     3 2013-02-18     1 18.99

for those situations.

Data:

df <- structure(list(ID = c(1L, 1L, 1L, 2L, 2L), DATE = structure(c(1L, 
3L, 4L, 2L, 5L), .Label = c("2013-02-04", "2013-02-18", "2013-03-18",
"2013-04-13", "2013-05-11"), class = "factor"), N = c(3L, 1L,
2L, 1L, 2L), Price = c(29.99, 9.99, 19.99, 18.99, 19.99)), .Names = c("ID",
"DATE", "N", "Price"), class = "data.frame", row.names = c(NA,
-5L))
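For comparison, both behaviours can be sketched in base R without extra packages. This is only an illustration: the data frame is a plain re-creation of the Data above with DATE stored as character instead of factor, and the helper names all_but_first, group_size, and kept are made up for this example.

```r
# Re-creation of the data above (DATE as character for simplicity)
df <- data.frame(
  ID    = c(1L, 1L, 1L, 2L, 2L),
  DATE  = c("2013-02-04", "2013-03-18", "2013-04-13", "2013-02-18", "2013-05-11"),
  N     = c(3L, 1L, 2L, 1L, 2L),
  Price = c(29.99, 9.99, 19.99, 18.99, 19.99)
)

# duplicated() is FALSE for the first row of each ID and TRUE for later rows,
# so indexing with it drops exactly the first row per group
all_but_first <- df[duplicated(df$ID), ]

# Add a one-row group, as in the answer
dff <- rbind(df, df[4, ])
dff[6, 1] <- 3

# Number of rows in each row's group, then keep repeats plus single-row groups
group_size <- ave(seq_along(dff$ID), dff$ID, FUN = length)
kept <- dff[duplicated(dff$ID) | group_size == 1, ]

print(all_but_first)
print(kept)
```

The duplicated() approach depends on row order in the same way as the grouped versions: "first" means the first row encountered for each ID.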

find first occurrence in two variables in df

I think this is what you want:

library(tidyverse)
df %>% group_by(group, trial) %>% filter(x > 30 & y > 30) %>% slice(1:2)

Result:

# A tibble: 16 x 5
# Groups: group, trial [8]
group trial x y hour
<chr> <dbl> <dbl> <dbl> <int>
1 A 1 33.5 46.3 4
2 A 1 32.6 42.7 11
3 A 2 35.9 43.6 4
4 A 2 30.5 42.7 14
5 B 1 33.0 38.1 2
6 B 1 40.5 30.4 7
7 B 2 48.6 33.2 2
8 B 2 34.1 30.9 4
9 C 1 33.0 45.1 1
10 C 1 30.3 36.7 17
11 C 2 44.8 33.9 1
12 C 2 41.5 35.6 6
13 D 1 44.2 34.3 12
14 D 1 39.1 40.0 23
15 D 2 39.4 47.5 4
16 D 2 42.1 40.1 10

(slightly different from your results, probably a different R version)

select first occurrence of variable with prefix in dataframe

I like this explanation of the problem:

drop all variables with that prefix except the first occurrence.

select(iris, !starts_with("Sepal")[-1])
# Sepal.Length Petal.Length Petal.Width Species
# 1 5.1 1.4 0.2 setosa
# 2 4.9 1.4 0.2 setosa
# ...

starts_with("Sepal") selects all columns whose names start with "Sepal". Indexing with [-1] removes the first match from that selection, and ! then drops whatever matches remain.

It does seem a little like black magic - if we were doing this in base R, the [-1] would be appropriate if we used which() to get column indices, and the ! would be appropriate if we didn't use which() and had a logical vector, but somehow the tidyselect functionality makes it work!
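To make the base R comparison concrete, here is a sketch of both variants mentioned above: one using which() with integer indices, one using a logical vector. The variable names are made up for the example; iris is the built-in dataset.

```r
# Logical vector: TRUE for every column whose name starts with "Sepal"
is_sepal <- startsWith(names(iris), "Sepal")

# Variant 1: integer indices via which(); [-1] removes the first match,
# and the leading minus drops the remaining matches from the data frame
sepal_idx <- which(is_sepal)
by_index <- iris[, -sepal_idx[-1], drop = FALSE]

# Variant 2: stay with logical vectors; cumsum() counts matches so far,
# so a matching column with a count above 1 is a repeat
is_repeat <- is_sepal & cumsum(is_sepal) > 1
by_logical <- iris[, !is_repeat, drop = FALSE]

names(by_index)
```

One caveat with the integer variant: if only one column matches, sepal_idx[-1] is empty, and a negative empty index selects no columns at all. The logical variant does not have that problem.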

Extract rows using multiple conditions related to the order of occurrence of zero and one in R rule()

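You could create a small function that reflects your four conditions, and then apply that function by group.

f <- function(z, p) {
  p1 <- which(p == 1)                # positions where pos is 1
  z0 <- which(z == 0)                # positions where zero is 0
  c1 <- c(p1[1], p1[length(p1)])     # first and last 1
  # the position just before the first 1, if it holds a 0
  c2 <- ifelse(z[p1[1] - 1] == 0, as.integer(p1[1] - 1), as.integer(NA))
  c3 <- min(z0[which(z0 > p1[1])], na.rm = TRUE)  # first 0 after the first 1
  c4 <- max(p1, z0, na.rm = TRUE)    # last position holding a 1 or a 0
  unique(c(c1, c2, c3, c4))
}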

Now, apply that function by group:

library(dplyr)
df %>%
  group_by(ID) %>%
  filter(row_number() %in% f(zero, pos))

Output:

     ID var    zero   pos
  <dbl> <chr> <dbl> <dbl>
1    60 X2        0    NA
2    60 X3       NA     1
3    60 X6        0    NA
4    60 X9       NA     1
5    61 X1       NA     1
6    61 X4        0    NA
7    61 X9       NA     1
8    61 X10       0    NA

Or, using data.table

library(data.table)
setDT(df)[, .SD[f(zero,pos)], by=ID]

Output:

      ID    var  zero   pos
   <num> <char> <num> <num>
1:    60     X3    NA     1
2:    60     X9    NA     1
3:    60     X2     0    NA
4:    60     X6     0    NA
5:    61     X1    NA     1
6:    61     X9    NA     1
7:    61     X4     0    NA
8:    61    X10     0    NA
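The two outputs contain the same rows in a different order. The likely reason is that filter() keeps rows in their original order, while .SD[...] returns rows in the order of the indices that f() produces. The sketch below calls f() on a single made-up group to show that order; the zero and pos vectors are hypothetical, not the original data.

```r
# Same function as above
f <- function(z, p) {
  p1 <- which(p == 1)
  z0 <- which(z == 0)
  c1 <- c(p1[1], p1[length(p1)])
  c2 <- ifelse(z[p1[1] - 1] == 0, as.integer(p1[1] - 1), as.integer(NA))
  c3 <- min(z0[which(z0 > p1[1])], na.rm = TRUE)
  c4 <- max(p1, z0, na.rm = TRUE)
  unique(c(c1, c2, c3, c4))
}

# Hypothetical values for one group of ten rows
zero <- c(NA, 0, NA, NA, NA, 0, NA, 0, NA, NA)
pos  <- c(NA, NA, 1, NA, 1, NA, NA, NA, 1, NA)

# Indices come back in the order the conditions are evaluated:
# first 1, last 1, the 0 before the first 1, the first 0 after the first 1
idx <- f(zero, pos)

# Sorting restores row order, e.g. for use as .SD[sort(f(zero, pos))]
sorted_idx <- sort(idx)

print(idx)
print(sorted_idx)
```

If the data.table result should match the dplyr one row for row, wrapping the call in sort() as shown is one option.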

