Returning Above and Below Rows of Specific Rows in R Dataframe

Returning above and below rows of specific rows in R dataframe

Try that:

extract.with.context <- function(x, rows, after = 0, before = 0) {

  match.idx <- which(rownames(x) %in% rows)
  span <- seq(from = -before, to = after)
  extend.idx <- c(outer(match.idx, span, `+`))
  extend.idx <- Filter(function(i) i > 0 & i <= nrow(x), extend.idx)
  extend.idx <- sort(unique(extend.idx))

  return(x[extend.idx, , drop = FALSE])
}

dat <- data.frame(x = 1:26, row.names = letters)
extract.with.context(dat, c("a", "b", "j", "y"), after = 3, before = 1)
# x
# a 1
# b 2
# c 3
# d 4
# e 5
# i 9
# j 10
# k 11
# l 12
# m 13
# x 24
# y 25
# z 26

Select N rows above and below match

This looks like a simple question, but it is less trivial than it first appears.

The issue is that which(mtcars$vs == 1) returns a vector rather than a single value:

[1]  3  4  6  8  9 10 11 18 19 20 21 26 28 32

If another vector -1:1 (which is c(-1L, 0L, 1L)) is added to it, the normal R rules for operations on vectors of unequal lengths apply: The recycling rule says

Any short vector operands are extended by recycling their values until
they match the size of any other operands.

Therefore the shorter vector -1:1 will be recycled to the length of which(mtcars$vs == 1), i.e.,

rep(-1:1, length.out = length(which(mtcars$vs == 1)))
 [1] -1  0  1 -1  0  1 -1  0  1 -1  0  1 -1  0

Therefore, the result of

which(mtcars$vs == 1) + -1:1

is the element-wise sum of the elements of both vectors where the shorter vector has been recycled to match the length of the longer vector.

 [1]  2  4  7  7  9 11 10 18 20 19 21 27 27 32

which is probably not what the OP expected.

In addition, we get the warning

Warning message:
In which(mtcars$vs == 1) + -1:1 :
  longer object length is not a multiple of shorter object length

because which(mtcars$vs == 1) has length 14 and -1:1 has length 3.
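To make the recycling behaviour concrete, here is a small sketch with made-up vectors (idx and offsets are illustrative names, not taken from the question). It contrasts the recycled element-wise sum with what was actually wanted, namely every offset applied to every index:

```r
# Made-up example data: pretend these are the matching row numbers.
idx <- c(3L, 4L, 6L, 8L)
offsets <- -1:1            # one row above, the row itself, one row below

# Element-wise addition recycles `offsets` to length 4: -1, 0, 1, -1.
# suppressWarnings() only hides the "not a multiple" warning for this demo.
recycled <- suppressWarnings(idx + offsets)
recycled
# [1] 2 4 7 7

# What was intended: apply all offsets to each index, then tidy up.
wanted <- sort(unique(unlist(lapply(idx, function(i) i + offsets))))
wanted
# [1] 2 3 4 5 6 7 8 9
```

The recycled result silently skips most neighbours, which is why a plain `+` is not enough here.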

Solution using outer()

In order to select the N rows above and below each matching row, we need to add -N:N to each row number returned by which(mtcars$vs == 1):

outer(which(mtcars$vs == 1), -1:1, `+`)

      [,1] [,2] [,3]
 [1,]    2    3    4
 [2,]    3    4    5
 [3,]    5    6    7
 [4,]    7    8    9
 [5,]    8    9   10
 [6,]    9   10   11
 [7,]   10   11   12
 [8,]   17   18   19
 [9,]   18   19   20
[10,]   19   20   21
[11,]   20   21   22
[12,]   25   26   27
[13,]   27   28   29
[14,]   31   32   33

Now we have a matrix of all row numbers. Unfortunately, it cannot be used directly for subsetting because it contains duplicates and row numbers that do not exist in mtcars (0 or anything above nrow(mtcars)). So the result has to be "post-processed" before it can be used for subsetting.

library(magrittr) # piping used for clarity
rn <- outer(which(mtcars$vs == 1), -1:1, `+`) %>%
  as.vector() %>%
  unique() %>%
  # the predicate must return TRUE/FALSE for each candidate row number
  Filter(function(x) 1 <= x & x <= nrow(mtcars), .) %>%
  sort()

rn
 [1]  2  3  4  5  6  7  8  9 10 11 12 17 18 19 20 21 22 25 26 27 28 29 31 32
mtcars[rn, ]
                   mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4 Wag     21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
Duster 360        14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Merc 240D         24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
Merc 230          22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
Merc 280          19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
Merc 280C         17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
Merc 450SE        16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
Chrysler Imperial 14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
Fiat 128          32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
Honda Civic       30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
Toyota Corolla    33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
Toyota Corona     21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
Dodge Challenger  15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
Pontiac Firebird  19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
Fiat X1-9         27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
Porsche 914-2     26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
Lotus Europa      30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
Ford Pantera L    15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
Maserati Bora     15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
Volvo 142E        21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
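If you would rather avoid the magrittr dependency, one possible base R sketch of the same post-processing uses intersect(), which drops duplicates and out-of-range values in one step and returns the survivors in the order of its first argument. This is an alternative formulation, not the method of the answer above:

```r
# Candidate row numbers: each match plus one row above and below.
candidates <- outer(which(mtcars$vs == 1), -1:1, `+`)

# Keep only values that are valid row numbers of mtcars.
# intersect() also removes duplicates; the result follows the order of
# seq_len(nrow(mtcars)), so it is already sorted.
rn <- intersect(seq_len(nrow(mtcars)), candidates)

head(mtcars[rn, ], 3)
```

The design choice here is to start from the set of valid row numbers rather than filtering the candidates afterwards.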

Extracting n rows above and below a given row number (variable n)

A possible solution, where the row names of dat are the target row numbers and dat$length holds the n for each of them:

r <- rep(as.numeric(row.names(dat)), 2 * dat$length + 1)
u <- unlist(Map(':', -dat$length, dat$length))
idx <- r + u

Now you can extract these rows from dat.ori with:

dat.ori[idx, ]

Or:

dat.ori[r + u, ]
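As a worked sketch of the idea, assume a hypothetical dat.ori with 20 rows and a dat whose row names are the target rows ("5" and "12") with n = 1 and n = 2 respectively. These data are invented for illustration and are not from the question:

```r
# Hypothetical data: 20-row source table and two target rows with their n.
dat.ori <- data.frame(value = seq(10, 200, by = 10))
dat <- data.frame(length = c(1, 2), row.names = c("5", "12"))

# Repeat each target row number once per row in its window (2n + 1 rows).
r <- rep(as.numeric(row.names(dat)), 2 * dat$length + 1)
# Build the offsets -n:n for each target and flatten them.
u <- unlist(Map(`:`, -dat$length, dat$length))

idx <- r + u
idx
# [1]  4  5  6 10 11 12 13 14

dat.ori[idx, , drop = FALSE]
```

Note that this sketch does not guard against windows that run past the first or last row of dat.ori; such indices would need to be filtered out first.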

How to read rows above and below a specified value

Using awk with read.table

df1 <- read.table(pipe("awk 'BEGIN {FS=\" \"} {if ($1 >180 && $1 < 800) print $0}' cn_EP27_L1.sizes.txt"))
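If awk is not available, the same filter can be sketched in plain R by reading the file first and subsetting afterwards. The temporary file below is only a stand-in so the example runs on its own; in practice you would pass the real path (cn_EP27_L1.sizes.txt). It assumes the first column is numeric:

```r
# Stand-in file with made-up contents; replace `path` with the real file.
path <- tempfile(fileext = ".txt")
writeLines(c("100 a", "250 b", "799 c", "800 d", "1200 e"), path)

sizes <- read.table(path)   # columns are named V1, V2, ... by default
# Keep rows whose first column lies strictly between 180 and 800.
df1 <- sizes[sizes$V1 > 180 & sizes$V1 < 800, , drop = FALSE]
df1
```

The trade-off is that the whole file is read into memory before filtering, whereas the awk version filters while streaming.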

Select rows above and below max group value dplyr

I think your intended output is incorrect: ARG's "max" (absolute value!) is on 2020-03-23 (and -24), yet you show four rows before it and too few rows after it.

Try this:

library(dplyr)
dat %>%
  group_by(Country) %>%
  mutate(most = row_number() == which.max(abs(MobDecline))) %>%
  filter(zoo::rollapply(most, width = 7, FUN = any, fill = FALSE))
# # A tibble: 14 x 4
# # Groups: Country [2]
# Country Date MobDecline most
# <chr> <date> <dbl> <lgl>
# 1 ARG 2020-03-20 -70.3 FALSE
# 2 ARG 2020-03-21 -71.7 FALSE
# 3 ARG 2020-03-22 -75.3 FALSE
# 4 ARG 2020-03-23 -84 TRUE
# 5 ARG 2020-03-24 -84 FALSE
# 6 ARG 2020-03-25 -75.7 FALSE
# 7 ARG 2020-03-26 -76 FALSE
# 8 AUS 2020-03-30 -43.3 FALSE
# 9 AUS 2020-03-31 -45.3 FALSE
# 10 AUS 2020-04-01 -45.7 FALSE
# 11 AUS 2020-04-02 -47.7 TRUE
# 12 AUS 2020-04-03 -45.7 FALSE
# 13 AUS 2020-04-04 -46 FALSE
# 14 AUS 2020-04-05 -47.3 FALSE

(The most column can be removed afterwards; it is kept here for demonstration.)

Using zoo::rollapply is much shorter and more flexible than a version based on repeated lead and/or lag calls (which is otherwise one way to approach this).

Now, this uses which.max(abs(...)), which both assumes the maximum absolute value (you did say max, after all) and returns at most one entry per group, even when there are ties. If you need the +/- 3 rows around every tied row (so one more row is included here), then we can try to use ==, but it will at times fail (R FAQ 7.31), so I'll introduce a "tolerance", tol:

tol <- 1e-8  # assumed value, not given in the original answer; tune to your data
dat %>%
  group_by(Country) %>%
  mutate(most = MobDecline <= (min(MobDecline) + tol)) %>%
  filter(zoo::rollapply(most, width = 7, FUN = any, fill = FALSE))
# # A tibble: 15 x 4
# # Groups: Country [2]
# Country Date MobDecline most
# <chr> <date> <dbl> <lgl>
# 1 ARG 2020-03-20 -70.3 FALSE
# 2 ARG 2020-03-21 -71.7 FALSE
# 3 ARG 2020-03-22 -75.3 FALSE
# 4 ARG 2020-03-23 -84 TRUE
# 5 ARG 2020-03-24 -84 TRUE
# 6 ARG 2020-03-25 -75.7 FALSE
# 7 ARG 2020-03-26 -76 FALSE
# 8 ARG 2020-03-27 -74.3 FALSE
# 9 AUS 2020-03-30 -43.3 FALSE
# 10 AUS 2020-03-31 -45.3 FALSE
# 11 AUS 2020-04-01 -45.7 FALSE
# 12 AUS 2020-04-02 -47.7 TRUE
# 13 AUS 2020-04-03 -45.7 FALSE
# 14 AUS 2020-04-04 -46 FALSE
# 15 AUS 2020-04-05 -47.3 FALSE
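To see why == can miss a tie (the R FAQ 7.31 issue mentioned above), here is a minimal base R sketch. The vector vals and the tolerance value are made up for illustration:

```r
# Two values that "should" be equal but differ in the last binary digits.
x <- 0.1 + 0.2
x == 0.3                      # FALSE because of floating-point rounding
isTRUE(all.equal(x, 0.3))     # TRUE: all.equal() compares with a tolerance

# Same idea applied to finding all rows tied for the minimum.
tol <- 1e-8                   # hypothetical tolerance
vals <- c(-0.3, -(0.1 + 0.2), -0.2)

exact_ties <- vals == min(vals)          # only the second element matches
tol_ties   <- vals <= min(vals) + tol    # first and second both match

exact_ties
# [1] FALSE  TRUE FALSE
tol_ties
# [1]  TRUE  TRUE FALSE
```

The tolerance should be small relative to the differences you consider meaningful in your own data.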

R - How to Return All Rows Below Selected Specific Rows in a Dataframe?


Data

library(data.table)
df <- fread("page_name,activity
Home,View Page
New Project,View Page
New Project,Submit Form
New Project,View Page
Expenses,View Page
Quotes,View Page
New Project,View Page
New Project,Submit Form
New Project,View Page
Payment Claims,View Page", sep=",", header=TRUE)

dplyr solution

The lead and lag functions of dplyr are helpful in these cases.

library(dplyr)
df[lag(df$page_name,2)=="New Project" & lag(df$activity,2)=="Submit Form",]

Output

        page_name  activity
1:       Expenses View Page
2: Payment Claims View Page
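For comparison, the same lookup can be sketched in base R with plain index arithmetic instead of lag(). The data frame pages below re-creates the sample data as an ordinary data.frame, and k (how many rows below the match to return) is an illustrative name:

```r
pages <- data.frame(
  page_name = c("Home", "New Project", "New Project", "New Project", "Expenses",
                "Quotes", "New Project", "New Project", "New Project",
                "Payment Claims"),
  activity = c("View Page", "View Page", "Submit Form", "View Page", "View Page",
               "View Page", "View Page", "Submit Form", "View Page", "View Page"),
  stringsAsFactors = FALSE
)

k <- 2  # distance below each matching row

# Positions of the rows that trigger the selection.
hit <- which(pages$page_name == "New Project" & pages$activity == "Submit Form")
# Shift down by k and drop anything past the end of the table.
below <- hit + k
below <- below[below <= nrow(pages)]

pages[below, ]
```

Dropping out-of-range positions explicitly avoids the all-NA rows that indexing past the last row would otherwise produce.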

Delete rows below one in r

If you mean "keep all rows above the place where that same value is", then maybe you can try

df2[cumsum(df2[,1]==df1[1,1])==0,]
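A small sketch of how the cumsum() trick works, using made-up data (df2 and marker are hypothetical here; marker plays the role of df1[1,1]):

```r
df2 <- data.frame(id = c("a", "b", "stop", "c", "stop", "d"),
                  val = 1:6,
                  stringsAsFactors = FALSE)
marker <- "stop"

# The running count stays 0 until the first match, then is >= 1 forever.
seen <- cumsum(df2[, 1] == marker)
seen
# [1] 0 0 1 1 2 2

# Keep only the rows strictly above the first match.
above <- df2[seen == 0, ]
above
```

Because the count never returns to 0, later occurrences of the marker do not matter; only the first one defines the cut.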

How to subset N rows above a selected point in a 'tidy' dataframe

I'll use a function I wrote in a different answer, https://stackoverflow.com/a/58716950/3358272, called leadlag. The premise for that function is similar to lead or lag (in dplyr-speak) but it has a cumulative effect.

Up front: I'm assuming that this "N prior" is per-group (per stock_name), not generally throughout all stock names.

For this data, I'll add a unique id to each row and find the rows to keep:

stock.data$rn <- seq_len(nrow(stock.data))
rownums <- merge(stock.data, other_data)$rn

From there, let's lead/lag the filtering:

stock.data %>%
  group_by(stock_name) %>%
  filter(leadlag(rn %in% rownums, bef=1, aft=0)) %>%
  ungroup()
# # A tibble: 4 x 4
# stock_name price date rn
# <chr> <dbl> <date> <int>
# 1 Walmart 100 2012-01-01 1
# 2 Walmart 101 2012-03-01 2
# 3 Target 202 2012-03-01 5
# 4 Target 203 2012-04-01 6

and if you wanted N=2 before, then

stock.data %>%
  group_by(stock_name) %>%
  filter(leadlag(rn %in% rownums, bef=2, aft=0)) %>%
  ungroup()
# # A tibble: 5 x 4
# stock_name price date rn
# <chr> <dbl> <date> <int>
# 1 Walmart 100 2012-01-01 1
# 2 Walmart 101 2012-03-01 2
# 3 Target 201 2012-01-01 4
# 4 Target 202 2012-03-01 5
# 5 Target 203 2012-04-01 6

Data

stock.data <- data.frame(
  stock_name = c("Walmart","Walmart","Walmart","Target","Target","Target"),
  price = c(100,101,102,201,202,203),
  date = as.Date(c("2012-01-01", "2012-03-01", "2012-04-01", "2012-01-01",
                   "2012-03-01","2012-04-01"))
)
other_data <- data.frame(
  stock_name = c("Walmart", "Target"),
  date = as.Date(c("2012-03-01", "2012-04-01"))
)

A copy of the leadlag function defined in the other answer:

#' Lead/Lag a logical
#'
#' @param lgl logical vector
#' @param bef integer, number of elements to lead by
#' @param aft integer, number of elements to lag by
#' @return logical, same length as 'lgl'
#' @export
leadlag <- function(lgl, bef = 1, aft = 1) {
  n <- length(lgl)
  bef <- min(n, max(0, bef))
  aft <- min(n, max(0, aft))
  befx <- if (bef > 0) sapply(seq_len(bef), function(b) c(tail(lgl, n = -b), rep(FALSE, b)))
  aftx <- if (aft > 0) sapply(seq_len(aft), function(a) c(rep(FALSE, a), head(lgl, n = -a)))
  rowSums(cbind(befx, lgl, aftx), na.rm = TRUE) > 0
}

Remove Rows occurring after a String R Data frame

We can subset with row_number and which. This works as written when exactly one row has A == 'total'.

library(dplyr)

df %>% filter(row_number() < which(A=='total'))

             A     B
1    Bob Smith 01005
2   Carl Jones 01008
3 Syndey Lewis 01185
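If the marker may be missing or may occur more than once, a base R sketch with match() is a little more defensive, since match() returns only the first position and NA when there is none. The data frame below is invented for illustration:

```r
df <- data.frame(A = c("Bob Smith", "Carl Jones", "total", "Extra Row", "total"),
                 B = c("01005", "01008", "", "09999", ""),
                 stringsAsFactors = FALSE)

# Position of the first "total" row; NA if there is no such row.
first_total <- match("total", df$A)

kept <- if (is.na(first_total)) {
  df                                          # no marker: keep everything
} else {
  df[seq_len(first_total - 1), , drop = FALSE]  # rows strictly above it
}
kept
```

seq_len() is used instead of 1:(first_total - 1) so that a marker in the very first row yields zero rows rather than rows 1 and 0.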

