Returning rows above and below specific rows in an R data frame
Try this:
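extract.with.context <- function(x, rows, after = 0, before = 0) {
  # positions of the requested rows (matched by row name)
  match.idx <- which(rownames(x) %in% rows)
  # offsets from -before to +after around each match
  span <- seq(from = -before, to = after)
  # every match position combined with every offset
  extend.idx <- c(outer(match.idx, span, `+`))
  # drop positions that fall outside the data frame
  extend.idx <- Filter(function(i) i > 0 & i <= nrow(x), extend.idx)
  # remove duplicates from overlapping windows and restore row order
  extend.idx <- sort(unique(extend.idx))
  return(x[extend.idx, , drop = FALSE])
}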
dat <- data.frame(x = 1:26, row.names = letters)
extract.with.context(dat, c("a", "b", "j", "y"), after = 3, before = 1)
# x
# a 1
# b 2
# c 3
# d 4
# e 5
# i 9
# j 10
# k 11
# l 12
# m 13
# x 24
# y 25
# z 26
Select N rows above and below match
This looks like a simple question, but it is less trivial than one might expect.
The issue is that which(mtcars$vs == 1) returns a vector rather than a single value:
[1] 3 4 6 8 9 10 11 18 19 20 21 26 28 32
If another vector, -1:1 (which is c(-1L, 0L, 1L)), is added to it, the normal R rules for operations on vectors of unequal lengths apply. The recycling rule says:
Any short vector operands are extended by recycling their values until they match the size of any other operands.
Therefore the shorter vector -1:1 will be recycled to the length of which(mtcars$vs == 1), i.e.,
rep(-1:1, length.out = length(which(mtcars$vs == 1)))
[1] -1 0 1 -1 0 1 -1 0 1 -1 0 1 -1 0
Therefore, the result of
which(mtcars$vs == 1) + -1:1
is the element-wise sum of the elements of both vectors where the shorter vector has been recycled to match the length of the longer vector.
[1] 2 4 7 7 9 11 10 18 20 19 21 27 27 32
which is probably not what the OP expected.
In addition, we get the
Warning message:
In which(mtcars$vs == 1) + -1:1 :
longer object length is not a multiple of shorter object length
because which(mtcars$vs == 1) has length 14 and -1:1 has length 3.
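To make the recycling concrete, here is a small base-R sketch using a shorter, made-up index vector (idx is hypothetical and only stands in for which(mtcars$vs == 1)). Adding -1:1 directly recycles it, whereas repeating each index once per offset gives every index its own -1, 0, +1 neighbourhood:

```r
# Hypothetical match positions, standing in for which(mtcars$vs == 1)
idx <- c(3L, 4L, 6L, 8L)

# Direct addition recycles -1:1 along idx (4 is not a multiple of 3, hence the warning)
recycled <- suppressWarnings(idx + -1:1)
print(recycled)  # 2 4 7 7

# What we actually want: each index combined with every offset in -1:1
each_idx <- rep(idx, each = 3)              # 3 3 3 4 4 4 6 6 6 8 8 8
offsets  <- rep(-1:1, times = length(idx))  # -1 0 1 -1 0 1 -1 0 1 -1 0 1
neighbours <- each_idx + offsets
print(neighbours)  # 2 3 4 3 4 5 5 6 7 7 8 9
```

The second result is the same set of numbers that outer(idx, -1:1, `+`) produces, read row by row.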
Solution using outer()
In order to select the N rows above and below each matching row, we need to add -N:N to each row number returned by which(mtcars$vs == 1):
outer(which(mtcars$vs == 1), -1:1, `+`)
[,1] [,2] [,3]
[1,] 2 3 4
[2,] 3 4 5
[3,] 5 6 7
[4,] 7 8 9
[5,] 8 9 10
[6,] 9 10 11
[7,] 10 11 12
[8,] 17 18 19
[9,] 18 19 20
[10,] 19 20 21
[11,] 20 21 22
[12,] 25 26 27
[13,] 27 28 29
[14,] 31 32 33
Now we have a matrix of all candidate row numbers. Unfortunately, it cannot be used directly for subsetting: it contains duplicates, and some row numbers (here 33) do not exist in mtcars. So the result has to be post-processed before it can be used for subsetting.
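library(magrittr) # piping used for clarity
rn <- outer(which(mtcars$vs == 1), -1:1, `+`) %>%
  as.vector() %>%   # flatten the matrix
  unique() %>%      # drop duplicate row numbers
  sort() %>%        # restore row order (unique() keeps first-seen order)
  Filter(function(x) 1 <= x & x <= nrow(mtcars), .)  # keep existing rows only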
rn
[1] 2 3 4 5 6 7 8 9 10 11 12 17 18 19 20 21 22 25 26 27 28 29 31 32
mtcars[rn, ]
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
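If you would rather not depend on magrittr, the same post-processing can be written in plain base R. A sketch with N = 1 (sort() restores row order after unique(), and the range check drops row numbers outside 1..nrow(mtcars)):

```r
idx <- which(mtcars$vs == 1)             # matching rows
rn  <- as.vector(outer(idx, -1:1, `+`))  # every row within +/- 1 of a match
rn  <- sort(unique(rn))                  # drop duplicates, restore row order
rn  <- rn[rn >= 1 & rn <= nrow(mtcars)]  # drop row numbers outside the data
res <- mtcars[rn, ]
```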
Extracting n rows above and below a given row number (variable n)
A possible solution:
r <- rep(as.numeric(row.names(dat)), 2 * dat$length + 1)
u <- unlist(Map(':', -dat$length, dat$length))
idx <- r + u
Now you can extract these rows from dat.ori with:
dat.ori[idx, ]
Or:
dat.ori[r + u, ]
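The answer does not show dat or dat.ori, so here is a sketch on made-up data, assuming (as the code implies) that dat holds the selected rows of dat.ori, keeps dat.ori's row numbers as its row names, and has a column named length giving n for each row:

```r
# Hypothetical toy data: dat.ori is the full data, dat holds the selected rows
# (row names = row numbers in dat.ori) plus a per-row 'length' column (= n)
dat.ori <- data.frame(v = letters[1:10])
dat <- data.frame(length = c(1, 2), row.names = c("3", "7"))

r <- rep(as.numeric(row.names(dat)), 2 * dat$length + 1)  # 3 3 3 7 7 7 7 7
u <- unlist(Map(':', -dat$length, dat$length))            # -1 0 1 -2 -1 0 1 2
idx <- r + u                                              # 2 3 4 5 6 7 8 9

# Guard against windows running off either end of dat.ori
idx <- idx[idx >= 1 & idx <= nrow(dat.ori)]
dat.ori[idx, , drop = FALSE]
```

If two windows can overlap, use sort(unique(idx)) so rows are not duplicated.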
How to read rows above and below a specified value
Using awk with read.table:
df1 <- read.table(pipe("awk 'BEGIN {FS=\" \"} {if ($1 >180 && $1 < 800) print $0}' cn_EP27_L1.sizes.txt"))
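If awk is not available, a pure-R alternative is to read the whole file and filter afterwards, at the cost of loading every row into memory. A sketch, using a temporary file with made-up contents in place of cn_EP27_L1.sizes.txt and the same 180/800 bounds as the awk condition:

```r
# Made-up stand-in for cn_EP27_L1.sizes.txt
path <- tempfile(fileext = ".txt")
writeLines(c("100 chrA", "200 chrB", "900 chrC", "500 chrD"), path)

df1 <- read.table(path)                    # whitespace-separated, like awk's FS=" "
df1 <- df1[df1$V1 > 180 & df1$V1 < 800, ]  # same condition as the awk script
df1
```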
Select rows above and below max group value dplyr
I think your intended output is incorrect: ARG's "max" (absolute!) value is on 2020-03-23 (and -24), yet you show four rows before it and too few rows after it.
Try this:
library(dplyr)
dat %>%
group_by(Country) %>%
mutate(most = row_number() == which.max(abs(MobDecline))) %>%
filter(zoo::rollapply(most, width = 7, FUN = any, fill = FALSE))
# # A tibble: 14 x 4
# # Groups: Country [2]
# Country Date MobDecline most
# <chr> <date> <dbl> <lgl>
# 1 ARG 2020-03-20 -70.3 FALSE
# 2 ARG 2020-03-21 -71.7 FALSE
# 3 ARG 2020-03-22 -75.3 FALSE
# 4 ARG 2020-03-23 -84 TRUE
# 5 ARG 2020-03-24 -84 FALSE
# 6 ARG 2020-03-25 -75.7 FALSE
# 7 ARG 2020-03-26 -76 FALSE
# 8 AUS 2020-03-30 -43.3 FALSE
# 9 AUS 2020-03-31 -45.3 FALSE
# 10 AUS 2020-04-01 -45.7 FALSE
# 11 AUS 2020-04-02 -47.7 TRUE
# 12 AUS 2020-04-03 -45.7 FALSE
# 13 AUS 2020-04-04 -46 FALSE
# 14 AUS 2020-04-05 -47.3 FALSE
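(most can be removed; I'm keeping it here for demonstration.)
Using zoo::rollapply is much shorter and more flexible than an approach based on repeated lead and/or lag calls (which is otherwise one way to approach this). With width = 7 and the default centred alignment, each window covers 3 rows before and 3 rows after; fill = FALSE marks the incomplete windows in the first and last 3 rows of each group as FALSE, so if a maximum falls that close to a group's edge, consider partial = TRUE.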
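For comparison, a zoo-free sketch of the same windowing idea: flag_window is a hypothetical helper (not part of dplyr or zoo) that marks every position within k rows of a TRUE. Unlike rollapply(..., fill = FALSE), it also flags rows near the start and end of the vector, which is closer to what partial = TRUE would give:

```r
# Hypothetical helper: TRUE for every position within k steps of a TRUE in lgl
flag_window <- function(lgl, k) {
  hits <- which(lgl)
  # rows = positions, columns = hits; TRUE where the position is within k of the hit
  near <- outer(seq_along(lgl), hits, function(pos, hit) abs(pos - hit) <= k)
  rowSums(near) > 0
}

x <- c(FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE)
flag_window(x, 3)
```

Inside the grouped pipeline above it would take the place of the rollapply call, e.g. filter(flag_window(most, 3)).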
Now, this is using which.max(abs(...)), which both assumes the maximum absolute value (you did say max, after all) and returns at most one entry, even when there are ties. If you need +/- 3 rows around every tied row (here that adds one more row), we could try ==, but that will at times fail on floating-point numbers (R FAQ 7.31), so I'll introduce a "tolerance". Since the MobDecline values shown are all negative, the largest absolute value is simply the minimum:
tol <- 1e-8  # assumed tolerance (not given in the original); adjust to your data's precision
dat %>%
group_by(Country) %>%
mutate(most = MobDecline <= (min(MobDecline) + tol)) %>%
filter(zoo::rollapply(most, width = 7, FUN = any, fill = FALSE))
# # A tibble: 15 x 4
# # Groups: Country [2]
# Country Date MobDecline most
# <chr> <date> <dbl> <lgl>
# 1 ARG 2020-03-20 -70.3 FALSE
# 2 ARG 2020-03-21 -71.7 FALSE
# 3 ARG 2020-03-22 -75.3 FALSE
# 4 ARG 2020-03-23 -84 TRUE
# 5 ARG 2020-03-24 -84 TRUE
# 6 ARG 2020-03-25 -75.7 FALSE
# 7 ARG 2020-03-26 -76 FALSE
# 8 ARG 2020-03-27 -74.3 FALSE
# 9 AUS 2020-03-30 -43.3 FALSE
# 10 AUS 2020-03-31 -45.3 FALSE
# 11 AUS 2020-04-01 -45.7 FALSE
# 12 AUS 2020-04-02 -47.7 TRUE
# 13 AUS 2020-04-03 -45.7 FALSE
# 14 AUS 2020-04-04 -46 FALSE
# 15 AUS 2020-04-05 -47.3 FALSE
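The reason for the tolerance (R FAQ 7.31) in a few lines: exact == on doubles can miss values that are equal in practice. The numbers below are made up for illustration, and tol is an assumed value:

```r
a <- 0.1 + 0.2
b <- 0.3
a == b                   # FALSE: binary floating-point rounding
tol <- 1e-8              # assumed tolerance; pick one suited to your data
abs(a - b) <= tol        # TRUE
isTRUE(all.equal(a, b))  # TRUE: base R's built-in tolerance comparison

# The same idea for "is this value (within tol of) the group minimum?"
v <- c(-70.3, -84, -84 + 1e-12, -75.7)  # made-up values with a near-tie
v == min(v)              # FALSE TRUE FALSE FALSE: the near-tie is missed
v <= min(v) + tol        # FALSE TRUE TRUE FALSE
```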
R - How to Return All Rows Below Selected Specific Rows in a Dataframe?
Data
library(data.table)
df <- fread("page_name,activity
Home,View Page
New Project,View Page
New Project,Submit Form
New Project,View Page
Expenses,View Page
Quotes,View Page
New Project,View Page
New Project,Submit Form
New Project,View Page
Payment Claims,View Page", sep=",", header=TRUE)
dplyr solution
The lead/lag functions of dplyr are helpful in cases like this:
library(dplyr)
df[lag(df$page_name,2)=="New Project" & lag(df$activity,2)=="Submit Form",]
Output
page_name activity
1: Expenses View Page
2: Payment Claims View Page
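Because fread() returns a data.table, the NA values that lag() produces for the first two rows are treated as FALSE when subsetting, which is why no NA rows appear above. On a plain data.frame, a logical index containing NA returns rows of NAs instead, so wrapping the condition in which() is safer. A base-R sketch (lag_n is a hypothetical stand-in for dplyr::lag):

```r
# Same data as above, as a plain data.frame
df <- data.frame(
  page_name = c("Home", "New Project", "New Project", "New Project", "Expenses",
                "Quotes", "New Project", "New Project", "New Project", "Payment Claims"),
  activity  = c("View Page", "View Page", "Submit Form", "View Page", "View Page",
                "View Page", "View Page", "Submit Form", "View Page", "View Page"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: shift x down by n >= 1 positions, padding with NA
lag_n <- function(x, n) c(rep(NA, n), head(x, -n))

hit <- which(lag_n(df$page_name, 2) == "New Project" &
             lag_n(df$activity, 2) == "Submit Form")  # which() drops the NAs
df[hit, ]
```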
Delete rows below one in r
If you mean "keep all rows above the first row where that same value appears", then you could try
df2[cumsum(df2[,1]==df1[1,1])==0,]
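How this works: df2[, 1] == df1[1, 1] is TRUE at each match, and cumsum() of that logical vector stays 0 until the first match, so == 0 keeps exactly the rows above it. A sketch on made-up data:

```r
# Hypothetical toy data: df1[1, 1] holds the value to stop at
df1 <- data.frame(id = "stop", stringsAsFactors = FALSE)
df2 <- data.frame(id = c("a", "b", "stop", "c", "stop"), n = 1:5,
                  stringsAsFactors = FALSE)

hits <- df2[, 1] == df1[1, 1]                 # FALSE FALSE TRUE FALSE TRUE
cumsum(hits)                                  # 0 0 1 1 2: stays 0 until the first match
kept <- df2[cumsum(hits) == 0, ]              # rows above the first match
kept_incl <- df2[cumsum(hits) - hits == 0, ]  # same, but also keeps the matching row
```

The matching row itself is dropped by the original expression; the kept_incl variant keeps it.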
How to subset N rows above a selected point in a 'tidy' dataframe
I'll use a function I wrote in a different answer, https://stackoverflow.com/a/58716950/3358272, called leadlag. The premise for that function is similar to lead or lag (in dplyr-speak), but it has a cumulative effect.
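To illustrate what "cumulative" means here, a base-R sketch (shift_lgl is a hypothetical helper written for this illustration, not the leadlag function itself): a single lead flags only the row right before a match, while ORing the leads 0 through N flags every row up to N before it.

```r
# Hypothetical helper: shift a logical vector by k positions, padding with FALSE.
# k > 0 behaves like lag (values move down), k < 0 like lead (values move up).
shift_lgl <- function(x, k) {
  n <- length(x)
  if (k == 0) return(x)
  if (abs(k) >= n) return(rep(FALSE, n))
  if (k > 0) c(rep(FALSE, k), head(x, -k)) else c(tail(x, k), rep(FALSE, -k))
}

x <- c(FALSE, FALSE, FALSE, TRUE, FALSE)

# A single lead marks only the row immediately before the TRUE ...
single <- shift_lgl(x, -1)                                             # F F T F F
# ... while the cumulative version ORs leads 0..2, marking up to 2 rows before it
cumulative <- Reduce(`|`, lapply(0:2, function(j) shift_lgl(x, -j)))  # F T T T F
```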
Up front: I'm assuming that this "N prior" is per group (per stock_name), not across all stock names.
For this data, I'll add a unique id to each row and find the rows to keep:
stock.data$rn <- seq_len(nrow(stock.data))
rownums <- merge(stock.data, other_data)$rn
From there, let's lead/lag the filtering:
stock.data %>%
group_by(stock_name) %>%
filter(leadlag(rn %in% rownums, bef=1, aft=0)) %>%
ungroup()
# # A tibble: 4 x 4
# stock_name price date rn
# <chr> <dbl> <date> <int>
# 1 Walmart 100 2012-01-01 1
# 2 Walmart 101 2012-03-01 2
# 3 Target 202 2012-03-01 5
# 4 Target 203 2012-04-01 6
And if you wanted N=2 before, then:
stock.data %>%
group_by(stock_name) %>%
filter(leadlag(rn %in% rownums, bef=2, aft=0)) %>%
ungroup()
# # A tibble: 5 x 4
# stock_name price date rn
# <chr> <dbl> <date> <int>
# 1 Walmart 100 2012-01-01 1
# 2 Walmart 101 2012-03-01 2
# 3 Target 201 2012-01-01 4
# 4 Target 202 2012-03-01 5
# 5 Target 203 2012-04-01 6
Data
stock.data <- data.frame(
stock_name = c("Walmart","Walmart","Walmart","Target","Target","Target"),
price = c(100,101,102,201,202,203),
date = as.Date(c("2012-01-01", "2012-03-01", "2012-04-01", "2012-01-01",
"2012-03-01","2012-04-01"))
)
other_data <- data.frame(
stock_name = c("Walmart", "Target"),
date = as.Date(c("2012-03-01", "2012-04-01"))
)
A copy of the leadlag function defined in the other answer:
#' Lead/Lag a logical
#'
#' @param lgl logical vector
#' @param bef integer, number of elements to lead by
#' @param aft integer, number of elements to lag by
#' @return logical, same length as 'lgl'
#' @export
leadlag <- function(lgl, bef = 1, aft = 1) {
n <- length(lgl)
bef <- min(n, max(0, bef))
aft <- min(n, max(0, aft))
befx <- if (bef > 0) sapply(seq_len(bef), function(b) c(tail(lgl, n = -b), rep(FALSE, b)))
aftx <- if (aft > 0) sapply(seq_len(aft), function(a) c(rep(FALSE, a), head(lgl, n = -a)))
rowSums(cbind(befx, lgl, aftx), na.rm = TRUE) > 0
}
Remove Rows occurring after a String R Data frame
We can subset with row_number and which:
library(dplyr)
df %>% filter(row_number() < which(A=='total'))
A B
1 Bob Smith 01005
2 Carl Jones 01008
3 Syndey Lewis 01185
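One caveat: if 'total' occurs more than once, which(A == 'total') returns several positions and row_number() < ... is recycled against them, so the result can be wrong. A base-R sketch on made-up data that uses only the first match (via match()):

```r
# Hypothetical toy data with two 'total' marker rows
df <- data.frame(A = c("x1", "x2", "x3", "total", "y1", "total"),
                 B = 1:6, stringsAsFactors = FALSE)

first_total <- match("total", df$A)  # position of the first 'total' (NA if absent)
kept <- if (is.na(first_total)) df else df[seq_len(first_total - 1), ]
kept
```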