Select N Rows Above and Below Match

Select N rows above and below match

This seems to be a simple question but is not as trivial as presumably expected.

The issue is that which(mtcars$vs == 1) returns a vector rather than a single value:

[1]  3  4  6  8  9 10 11 18 19 20 21 26 28 32

If another vector -1:1 (which is c(-1L, 0L, 1L)) is added to it, the normal R rules for operations on vectors of unequal lengths apply: The recycling rule says

Any short vector operands are extended by recycling their values until
they match the size of any other operands.

Therefore the shorter vector -1:1 will be recycled to the length of which(mtcars$vs == 1), i.e.,

rep(-1:1, length.out = length(which(mtcars$vs == 1)))
 [1] -1  0  1 -1  0  1 -1  0  1 -1  0  1 -1  0

Therefore, the result of

which(mtcars$vs == 1) + -1:1

is the element-wise sum of the elements of both vectors where the shorter vector has been recycled to match the length of the longer vector.

 [1]  2  4  7  7  9 11 10 18 20 19 21 27 27 32

which is propably not what the OP has expected.

In addition, we get the

Warning message:

In which(mtcars$vs == 1) + -1:1 :

longer object length is not a multiple of shorter object length

because which(mtcars$vs == 1) has length 14 and -1:1 has length 3.

Solution using outer()

In order to select the N rows above and below each matching row, we need to add -N:N to each row number returned by which(mtcars$vs == 1):

outer(which(mtcars$vs == 1), -1:1, `+`)

[,1] [,2] [,3]
[1,] 2 3 4
[2,] 3 4 5
[3,] 5 6 7
[4,] 7 8 9
[5,] 8 9 10
[6,] 9 10 11
[7,] 10 11 12
[8,] 17 18 19
[9,] 18 19 20
[10,] 19 20 21
[11,] 20 21 22
[12,] 25 26 27
[13,] 27 28 29
[14,] 31 32 33

Now, we have an array of all row numbers. Unfortunately, it cannot be used directly for subsetting because it contains duplicates and there are row numbers which do not exist in mtcars. So the the result has to be "post-processed" before it can be used for subsetting.

library(magrittr) # piping used for clarity
rn <- outer(which(mtcars$vs == 1), -1:1, `+`) %>%
as.vector() %>%
unique() %>%
Filter(function(x) x[1 <= x & x <= nrow(mtcars)], .)

rn
 [1]  2  3  4  5  6  7  8  9 10 11 12 17 18 19 20 21 22 25 26 27 28 29 31 32
mtcars[rn, ]
                   mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2

How to SELECT N values ABOVE and BELOW from specific value

You can limit and offset inside your QUERY():

=QUERY(A1:C,"limit "&2+MIN(5,MATCH(D1,B:B,0))&" offset "&MAX(0,MATCH(D1,B:B,0)-5))

Sample Image
Sample Image
Sample Image
Sample Image

selecting row matching condition, and also one row above and one row below

You already have the bool so do not need to filter it by index , we can do shift for the mask

m = df['col1']<100
df = df[m.shift(-1)|m.shift()|m]

Select N rows above and below a specific row in pandas

Really simple using index.get_loc. Get the index of the label, and slice accordingly.

idx = df.index.get_loc('2015-01-17')
df.iloc[idx - 10 : idx + 10]

A
2015-01-07 0.262086
2015-01-08 0.836742
2015-01-09 0.094763
2015-01-10 0.133500
2015-01-11 0.285372
2015-01-12 0.338112
2015-01-13 0.451852
2015-01-14 0.163001
2015-01-15 0.247186
2015-01-16 0.227053
2015-01-17 0.837647
2015-01-18 0.918334
2015-01-19 0.514731
2015-01-20 0.207688
2015-01-21 0.700314
2015-01-22 0.363784
2015-01-23 0.811346
2015-01-24 0.079030
2015-01-25 0.051900
2015-01-26 0.520310

How to subset N rows above a selected point in a 'tidy' dataframe

I'll use a function I wrote in a different answer, https://stackoverflow.com/a/58716950/3358272, called leadlag. The premise for that function is similar to lead or lag (in dplyr-speak) but it has a cumulative effect.

Up front: I'm assuming that this "N prior" is per-group (per stock_name), not generally throughout all stock names.

For this data, I'll add a unique id to each row and find the rows to keep:

stock.data$rn <- seq_len(nrow(stock.data))
rownums <- merge(stock.data, other_data)$rn

From there, let's lead/lag the filtering:

stock.data %>%
group_by(stock_name) %>%
filter(leadlag(rn %in% rownums, bef=1, aft=0)) %>%
ungroup()
# # A tibble: 4 x 4
# stock_name price date rn
# <chr> <dbl> <date> <int>
# 1 Walmart 100 2012-01-01 1
# 2 Walmart 101 2012-03-01 2
# 3 Target 202 2012-03-01 5
# 4 Target 203 2012-04-01 6

and if you wanted N=2 before, then

stock.data %>%
group_by(stock_name) %>%
filter(leadlag(rn %in% rownums, bef=2, aft=0)) %>%
ungroup()
# # A tibble: 5 x 4
# stock_name price date rn
# <chr> <dbl> <date> <int>
# 1 Walmart 100 2012-01-01 1
# 2 Walmart 101 2012-03-01 2
# 3 Target 201 2012-01-01 4
# 4 Target 202 2012-03-01 5
# 5 Target 203 2012-04-01 6

Data

stock.data <- data.frame(
stock_name = c("Walmart","Walmart","Walmart","Target","Target","Target"),
price = c(100,101,102,201,202,203),
date = as.Date(c("2012-01-01", "2012-03-01", "2012-04-01", "2012-01-01",
"2012-03-01","2012-04-01"))
)
other_data <- data.frame(
stock_name = c("Walmart", "Target"),
date = as.Date(c("2012-03-01", "2012-04-01"))
)

A copy of the leadlag function defined in the other answer:

#' Lead/Lag a logical
#'
#' @param lgl logical vector
#' @param bef integer, number of elements to lead by
#' @param aft integer, number of elements to lag by
#' @return logical, same length as 'lgl'
#' @export
leadlag <- function(lgl, bef = 1, aft = 1) {
n <- length(lgl)
bef <- min(n, max(0, bef))
aft <- min(n, max(0, aft))
befx <- if (bef > 0) sapply(seq_len(bef), function(b) c(tail(lgl, n = -b), rep(FALSE, b)))
aftx <- if (aft > 0) sapply(seq_len(aft), function(a) c(rep(FALSE, a), head(lgl, n = -a)))
rowSums(cbind(befx, lgl, aftx), na.rm = TRUE) > 0
}

How to find matching row based on a condition and return N row above or below?

Let me know if this does the job. It chooses the subsequent indexes leaving off from the 3 PT ones, and then chooses all the rows in main_df with those index numbers.

final_home = home_df[home_slice]  
print(final_home.to_string()) # starting from where you left off

subsequent_rows = 3 # note it'll choose this value - 1, so pick 3 if you want 2

# returns a list of tuples that contain the ranges of indices following the initial event
index_ranges = home_df[home_df['PLAYER1_TEAM_NICKNAME'] == home_df['rebounder_team']].index.map(lambda x: range(x, x + subsequent_rows))
index_list=[]
# flatten the list of tuples to a list of all the index values we want
[index_list.extend(x) for x in index_ranges]
# go back to main_df and select all the rows with those index values
final = main_df[main_df.index.isin(index_list)]
print(final)

SELECT N rows before and after the row matching the condition?

Right, this works for me:

SELECT child.*
FROM stack as child,
(SELECT idstack FROM stack WHERE message LIKE '%hello%') as parent
WHERE child.idstack BETWEEN parent.idstack-2 AND parent.idstack+2;

Returning above and below rows of specific rows in r dataframe

Try that:

extract.with.context <- function(x, rows, after = 0, before = 0) {

match.idx <- which(rownames(x) %in% rows)
span <- seq(from = -before, to = after)
extend.idx <- c(outer(match.idx, span, `+`))
extend.idx <- Filter(function(i) i > 0 & i <= nrow(x), extend.idx)
extend.idx <- sort(unique(extend.idx))

return(x[extend.idx, , drop = FALSE])
}

dat <- data.frame(x = 1:26, row.names = letters)
extract.with.context(dat, c("a", "b", "j", "y"), after = 3, before = 1)
# x
# a 1
# b 2
# c 3
# d 4
# e 5
# i 9
# j 10
# k 11
# l 12
# m 13
# x 24
# y 25
# z 26


Related Topics



Leave a reply



Submit