Select Rows in a Dataframe in R Based on Values in One Row

Use the %in% operator. It keeps every row whose value in column a appears in the vector idx:

df[df$a %in% idx, ]
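Here is a minimal, self-contained sketch of the same idea. The data frame and the `idx` vector are hypothetical stand-ins, since the original question's data isn't shown. It also contrasts %in% with `==` when the column contains NA.

```r
# Hypothetical data: column `a` holds the values to match against
df <- data.frame(a = c(1, 2, NA, 4, 5),
                 b = c("v", "w", "x", "y", "z"),
                 stringsAsFactors = FALSE)
idx <- c(2, 4)  # hypothetical values to keep

# %in% returns TRUE or FALSE for every row, never NA
keep <- df$a %in% idx
result <- df[keep, ]
print(result)

# For comparison: `==` yields NA for the missing value, and indexing
# with NA adds an all-NA row to the result
with_eq <- df[df$a == 2, ]
print(with_eq)
```

This is why %in% is often the safer choice when the filter column can contain NA.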

How to select only the rows that have a certain value in one column in R?

There are a few ways to do this:

Base R

dfNintendo[dfNintendo$Platform %in% c("GBA", "Wii", "WiiU"), ]

or

subset(dfNintendo, Platform %in% c("GBA", "Wii", "WiiU"))

dplyr package

dplyr::filter(dfNintendo, Platform %in% c("GBA", "Wii", "WiiU"))

All three return the same rows: those whose Platform is "GBA", "Wii" or "WiiU".
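A small sketch comparing the two base R forms on a hypothetical stand-in for dfNintendo (the original data isn't shown; the game names are placeholders). dplyr::filter() is left out only so the block runs without extra packages.

```r
# Hypothetical stand-in for dfNintendo
dfNintendo <- data.frame(
  Name = c("Game 1", "Game 2", "Game 3", "Game 4", "Game 5"),
  Platform = c("Wii", "NES", "GBA", "DS", "Wii"),
  stringsAsFactors = FALSE
)
wanted <- c("GBA", "Wii", "WiiU")

# Base R indexing with a logical vector
by_index <- dfNintendo[dfNintendo$Platform %in% wanted, ]

# subset() evaluates the condition inside the data frame,
# so the column can be named without the `dfNintendo$` prefix
by_subset <- subset(dfNintendo, Platform %in% wanted)

print(by_index)
print(by_subset)
```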

Select row based on value of another row in R

Here column Z stores the number of the next row to visit. Starting from row 1, a while loop can follow those pointers until Z is NA or every row in the data frame has been selected.

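all_rows <- 1                  # start at row 1
next_row <- df$Z[all_rows]     # Z holds the number of the next row to visit

# Keep following Z while it is not NA and not every row has been selected
while (!is.na(next_row) && length(all_rows) < nrow(df)) {
  all_rows <- c(all_rows, next_row)
  next_row <- df$Z[next_row]
}

result <- df[all_rows, ]
result

# X Y Z
#1 a A 3
#3 c C 2
#2 b B 5
#5 e E NA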
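A self-contained variant, wrapped in a function. The data are hypothetical, chosen only to reproduce the output above (row 4 never appears, so its Z is a guess). As a design choice, this sketch stops when a row would be visited a second time instead of counting rows, so a cycle in Z cannot repeat rows.

```r
# Hypothetical data: Z holds the row number of the next row to visit
df <- data.frame(X = c("a", "b", "c", "d", "e"),
                 Y = c("A", "B", "C", "D", "E"),
                 Z = c(3, 5, 2, NA, NA),
                 stringsAsFactors = FALSE)

# Follow the Z pointers from `start`; stop at NA or when a row would repeat
follow_chain <- function(df, start = 1) {
  all_rows <- start
  next_row <- df$Z[start]
  while (!is.na(next_row) && !(next_row %in% all_rows)) {
    all_rows <- c(all_rows, next_row)
    next_row <- df$Z[next_row]
  }
  df[all_rows, ]
}

result <- follow_chain(df)
print(result)

# A cycle (1 -> 2 -> 1) stops instead of revisiting row 1
cyclic <- data.frame(X = c("a", "b", "c"), Z = c(2, 1, NA),
                     stringsAsFactors = FALSE)
print(follow_chain(cyclic))
```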

R: Select rows by value and always include previous row

Get the positions where time equals 13 with which(), subtract 1 to get the rows just before them, combine both sets of indices (dropping duplicates and any 0 index), and subset:

i1 <- which(df1$time == 13) 
ind <- sort(unique(i1 - rep(c(1, 0), each = length(i1))))
ind <- ind[ind >0]
df1[ind,]

Output:

  ID speed dist time
2 B 7 10 8
3 C 7 18 13
4 C 8 4 5
5 A 5 6 13
6 D 6 2 13

Data:

df1 <- structure(list(ID = c("A", "B", "C", "C", "A", "D", "E"), speed = c(4L, 
7L, 7L, 8L, 5L, 6L, 7L), dist = c(12L, 10L, 18L, 4L, 6L, 2L,
2L), time = c(4L, 8L, 13L, 5L, 13L, 13L, 9L)),
class = "data.frame", row.names = c(NA,
-7L))
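The same selection can also be written as a logical mask, with no index arithmetic: keep a row if its own time is 13 or if the next row's time is 13. This is an alternative sketch, not the answer's method; it uses %in% so that NA times would simply count as "not 13".

```r
df1 <- structure(list(ID = c("A", "B", "C", "C", "A", "D", "E"),
                      speed = c(4L, 7L, 7L, 8L, 5L, 6L, 7L),
                      dist = c(12L, 10L, 18L, 4L, 6L, 2L, 2L),
                      time = c(4L, 8L, 13L, 5L, 13L, 13L, 9L)),
                 class = "data.frame", row.names = c(NA, -7L))

is13 <- df1$time %in% 13
# TRUE for a row whose *next* row has time 13 (the last row has no next row)
next_is13 <- c(is13[-1], FALSE)

res <- df1[is13 | next_is13, ]
print(res)
```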

Select rows in a dataframe based on values of all columns

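We can replace the NAs with 0, compare every column except ID against 200, and combine the per-column results with Reduce and `&`, so a row is kept only when all of its values are below 200 (missing values count as passing):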

df[Reduce(`&`, lapply(replace(df[-1], is.na(df[-1]), 0), `<`, 200)),]
# ID col1 col2
#1 1 NA 24
#2 2 20 NA

Data:

df <- data.frame(ID=1:4, col1 = c(NA, 20, 210, 30), col2 = c(24, NA, 30, 240))
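An alternative sketch with rowSums() gives the same rows for this data: count how many value columns reach 200 in each row, ignoring NAs, and keep the rows where that count is zero.

```r
df <- data.frame(ID = 1:4, col1 = c(NA, 20, 210, 30), col2 = c(24, NA, 30, 240))

# Logical matrix of "value >= 200"; rowSums counts the TRUEs per row,
# and na.rm = TRUE treats missing values as passing
too_big <- rowSums(df[-1] >= 200, na.rm = TRUE)

result <- df[too_big == 0, ]
print(result)
```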

Select rows from a data frame based on values in a vector

Use %in% (see ?"%in%"), which is TRUE for each element of dt$fct that appears in vc:

dt[dt$fct %in% vc,]
fct X
1 a 2
3 c 3
5 c 5
7 a 7
9 c 9
10 a 1
12 c 2
14 c 4

is.element() (see ?is.element) does the same thing:

dt[is.element(dt$fct, vc),]
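A short, self-contained check with hypothetical data (the question's dt and vc aren't shown) that both forms select the same rows, plus the negated form for dropping the matching rows instead.

```r
# Hypothetical data
dt <- data.frame(fct = c("a", "b", "c", "b", "a"), X = 1:5,
                 stringsAsFactors = FALSE)
vc <- c("a", "c")

keep_in <- dt[dt$fct %in% vc, ]
keep_is <- dt[is.element(dt$fct, vc), ]

# Negate the condition to drop the matching rows instead
dropped <- dt[!(dt$fct %in% vc), ]

print(keep_in)
print(dropped)
```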

How can I get a certain value from a row in dataframe? [R]

You didn't post your data, so I put it in a .csv file and read it from my R folder on the C: drive.

There might be an easier way to do it, but this is the method I use when I have several different types (by column or row) to filter on. If you're new to R and don't have data.table or dplyr installed yet, run the install.packages() lines further down in the console first.

The result keeps the value column; the last line drops it if you don't want it.

setwd("C:/R")

library(data.table)
library(dplyr)

Table <- read.csv("Table1.csv", check.names = FALSE, fileEncoding = 'UTF-8-BOM')

# Making the data long form makes it much easier to filter as your data gets more complex.
LongForm <- melt(setDT(Table), id.vars = c("index"), variable.name = "Category")

Table1 <- as.data.table(LongForm)

# Keep the row with the highest value for each index (top_n() keeps all ties)
highest <- Table1 %>% group_by(index) %>% top_n(1, value)

# Then sort it by index
Table2 <- highest[order(highest$index, decreasing = FALSE), ]

View(Table2)

If you don't have the packages installed yet, run

install.packages("data.table")

and

install.packages("dplyr")

To drop the value column (keeping index and Category):

Table3 <- Table2[,1:2]
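If installing packages isn't an option, a base R sketch with max.col() can find the highest category per row directly on the wide table. The table below is hypothetical (the question's data isn't shown) and assumes an `index` column followed by numeric category columns. Unlike top_n(), this keeps exactly one category per index: ties go to the first column.

```r
# Hypothetical wide table: one `index` column plus one column per category
Table <- data.frame(index = 1:3,
                    A = c(5, 1, 7),
                    B = c(2, 9, 7),
                    C = c(4, 3, 1))

vals <- as.matrix(Table[-1])
# Column position of each row's maximum; ties go to the first column
best <- max.col(vals, ties.method = "first")

result <- data.frame(index = Table$index,
                     Category = colnames(vals)[best],
                     value = vals[cbind(seq_len(nrow(vals)), best)],
                     stringsAsFactors = FALSE)
print(result)
```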

Select specific rows based on previous row value (in the same column)

For the fourth example in the question, you could use which() together with lag() from dplyr to get the indices that meet your criteria, and then use those indices to subset the data frame.

# Get indices of rows that meet the condition (Type 20 right after Type 40)
ind2 <- which(df$Type == 20 & dplyr::lag(df$Type) == 40)
# Get indices of the rows just before them
ind1 <- ind2 - 1

# Subset data (sort() keeps each pair in its original order)
df[sort(c(ind1, ind2)), ]
Trial Type Correct Latency
1: 28 40 1 500
2: 29 20 1 230
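The same idea without dplyr: shifting the column down one row with c(NA, head(x, -1)) plays the role of lag(). The data frame is hypothetical, built only so that the two printed rows come out.

```r
# Hypothetical data matching the two rows printed above
df <- data.frame(Trial = 27:31,
                 Type = c(20, 40, 20, 20, 40),
                 Correct = c(1, 1, 1, 0, 1),
                 Latency = c(410, 500, 230, 380, 620))

# Base R stand-in for dplyr::lag(): shift Type down by one row
prev_type <- c(NA, head(df$Type, -1))

# which() drops the NA produced by the first row's comparison
ind2 <- which(df$Type == 20 & prev_type == 40)
ind1 <- ind2 - 1

res <- df[sort(c(ind1, ind2)), ]
print(res)
```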

Select previous and next N rows with the same value as a certain row

A solution with data.table:

# load the package & convert data to a data.table
library(data.table)
setDT(pdata)

# define shock-year and number of previous/next rows
shock <- 2018
n <- 2

# filter
pdata[, .SD[value == value[time == shock] &
              between(time, shock - n, shock + n) &
              value == rev(value)][.N > 1 & all(diff(time) == 1)],
      by = id]

which gives:

    id time value
1: 4 2016 0
2: 4 2017 0
3: 4 2018 0
4: 4 2019 0
5: 4 2020 0
6: 5 2017 0
7: 5 2018 0
8: 5 2019 0
9: 6 2017 1
10: 6 2018 1
11: 6 2019 1
12: 7 2017 1
13: 7 2018 1
14: 7 2019 1
15: 8 2016 1
16: 8 2017 1
17: 8 2018 1
18: 8 2019 1
19: 8 2020 1

Used data:

pdata <- data.frame(
id = rep(1:10, each = 5),
time = rep(2016:2020, times = 10),
value = c(c(1,1,1,0,0), c(1,1,0,0,0), c(0,0,1,0,0), c(0,0,0,0,0), c(1,0,0,0,1), c(0,1,1,1,0), c(0,1,1,1,1), c(1,1,1,1,1), c(1,0,1,1,1), c(1,1,0,1,1))
)
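Reading the output above, the kept window appears to extend the same number of years on each side of the shock year (id 7 keeps 2017-2019 even though 2020 also has value 1). That is my interpretation, not something the answer states. Under that assumption, a plain base R sketch makes the rule explicit: per id, widen the window one year at a time on both sides while both neighbours still carry the shock-year value, up to n years. The helper `symmetric_window` is a hypothetical name; pdata, shock and n are repeated so the block runs on its own.

```r
pdata <- data.frame(
  id = rep(1:10, each = 5),
  time = rep(2016:2020, times = 10),
  value = c(c(1,1,1,0,0), c(1,1,0,0,0), c(0,0,1,0,0), c(0,0,0,0,0), c(1,0,0,0,1),
            c(0,1,1,1,0), c(0,1,1,1,1), c(1,1,1,1,1), c(1,0,1,1,1), c(1,1,0,1,1))
)
shock <- 2018
n <- 2

# Hypothetical helper: grow a window around `shock` symmetrically while the
# years just outside it on both sides have the shock-year value
symmetric_window <- function(g, shock, n) {
  g <- g[order(g$time), ]
  v0 <- g$value[g$time == shock]
  if (length(v0) != 1) return(g[0, ])  # no unique shock-year row
  k <- 0
  while (k < n) {
    before <- g$value[g$time == shock - (k + 1)]
    after  <- g$value[g$time == shock + (k + 1)]
    if (length(before) == 1 && length(after) == 1 &&
        before == v0 && after == v0) {
      k <- k + 1
    } else {
      break
    }
  }
  if (k == 0) return(g[0, ])           # no matching neighbour on both sides
  g[g$time >= shock - k & g$time <= shock + k, ]
}

res <- do.call(rbind, lapply(split(pdata, pdata$id), symmetric_window,
                             shock = shock, n = n))
print(res)
```

On this data it returns the same 19 rows as the data.table version.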

How to select (four) specific rows (multiple times) based on a column value in R?

This just captures @Jasonaizkains' answer from the comments above, since pivoting isn't strictly necessary in this case. Here it is with some play data:

library(dplyr)
id <- rep(10:13, 4) # four subjects
year <- rep(2013:2016, each = 4) # four years
gender <- sample(1:2, 16, replace = TRUE) # random, so your gender values will differ
play <- tibble(id, gender, year) # data.frame of 16

play <- play[-9,] # removes row for id 10 in 2015

# Keep only ids observed in at least 4 distinct years (drops every row for id 10)
play %>% group_by(id) %>% filter(n_distinct(year) >= 4) %>% ungroup()
#> # A tibble: 12 x 3
#> id gender year
#> <int> <int> <int>
#> 1 11 1 2013
#> 2 12 2 2013
#> 3 13 2 2013
#> 4 11 1 2014
#> 5 12 2 2014
#> 6 13 1 2014
#> 7 11 2 2015
#> 8 12 2 2015
#> 9 13 2 2015
#> 10 11 2 2016
#> 11 12 2 2016
#> 12 13 1 2016
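A base R sketch of the same filter, in case dplyr isn't available: ave() computes the number of distinct years per id and repeats it on every row of that id, so the comparison yields one TRUE/FALSE per row. The set.seed() call is an addition here, only to make the random gender column reproducible.

```r
set.seed(1)  # only so the random gender column is reproducible
id <- rep(10:13, 4)
year <- rep(2013:2016, each = 4)
gender <- sample(1:2, 16, replace = TRUE)
play <- data.frame(id, gender, year)
play <- play[-9, ]  # drop id 10 in 2015

# For each row, the number of distinct years recorded for that row's id
n_years <- ave(play$year, play$id, FUN = function(y) length(unique(y)))
complete <- play[n_years >= 4, ]
print(complete)
```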