Select rows in a dataframe in r based on values in one row
Use the %in% operator:
df[df$a %in% idx,]
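A minimal, self-contained sketch of the same idea. The contents of the data frame and of idx are made up for illustration; only the names df, a and idx come from the answer above.

```r
# Hypothetical data: only the names df, a and idx come from the answer
df <- data.frame(a = c(1, 2, 3, 4, 5),
                 b = c("v", "w", "x", "y", "z"))
idx <- c(2, 4, 9)  # values to look for; 9 matches nothing

# %in% returns one TRUE/FALSE per element of df$a
keep <- df$a %in% idx
selected <- df[keep, ]
print(selected)
```

Values in idx that do not occur in df$a are simply ignored, so no error is raised for the 9.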
How to select only the rows that have a certain value in one column in R?
There are a few ways to do this:
Base R
dfNintendo[dfNintendo$Platform %in% c("GBA", "Wii", "WiiU"), ]
or
subset(dfNintendo, Platform %in% c("GBA", "Wii", "WiiU"))
dplyr package
dplyr::filter(dfNintendo, Platform %in% c("GBA", "Wii", "WiiU"))
Each of these returns only the rows whose Platform is "GBA", "Wii" or "WiiU".
select row based on value of another row in R
You can use a while loop to keep selecting rows until NA occurs or all the rows in the dataframe have been selected.
all_rows <- 1
next_row <- df$Z[all_rows]
# Continue only while there is a next row and not every row has been selected
while (!is.na(next_row) && length(all_rows) < nrow(df)) {
all_rows <- c(all_rows, next_row)
next_row <- df$Z[all_rows[length(all_rows)]]
}
result <- df[all_rows, ]
# X Y Z
#1 a A 3
#3 c C 2
#2 b B 5
#5 e E NA
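The question's data is not shown here, so the following is only a sketch on a hypothetical df rebuilt from the printed result (the Z value for row 4 is invented). It wraps the loop in a helper, follow_chain, which is a made-up name, and stops as soon as a row would be visited twice, which guards against cycles in Z.

```r
# Hypothetical input rebuilt from the printed result; Z for row 4 is invented
df <- data.frame(X = c("a", "b", "c", "d", "e"),
                 Y = c("A", "B", "C", "D", "E"),
                 Z = c(3, 5, 2, 1, NA))

# follow_chain is an illustrative helper, not part of the original answer
follow_chain <- function(data, start = 1) {
  visited <- start
  next_row <- data$Z[start]
  # Stop at NA, or as soon as a row would be visited a second time
  while (!is.na(next_row) && !(next_row %in% visited)) {
    visited <- c(visited, next_row)
    next_row <- data$Z[next_row]
  }
  data[visited, ]
}

result <- follow_chain(df)
print(result)
```

Checking membership in visited rather than counting rows means the loop also terminates when Z points back to an earlier row before every row has been selected.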
R: Select rows by value and always include previous row
Create a position index where the 'time' value is 13 using which, then subtract 1 from the index and concatenate both to subset:
i1 <- which(df1$time == 13)
ind <- sort(unique(i1 - rep(c(1, 0), each = length(i1))))
ind <- ind[ind >0]
df1[ind,]
Output:
ID speed dist time
2 B 7 10 8
3 C 7 18 13
4 C 8 4 5
5 A 5 6 13
6 D 6 2 13
Data:
df1 <- structure(list(ID = c("A", "B", "C", "C", "A", "D", "E"), speed = c(4L,
7L, 7L, 8L, 5L, 6L, 7L), dist = c(12L, 10L, 18L, 4L, 6L, 2L,
2L), time = c(4L, 8L, 13L, 5L, 13L, 13L, 9L)),
class = "data.frame", row.names = c(NA,
-7L))
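As an alternative sketch, the same rows can be picked with a logical vector instead of position indices. This is a swapped-in technique, not the answer's method, and it assumes time contains no NA values.

```r
# Same values as the df1 shown above
df1 <- data.frame(ID = c("A", "B", "C", "C", "A", "D", "E"),
                  speed = c(4L, 7L, 7L, 8L, 5L, 6L, 7L),
                  dist = c(12L, 10L, 18L, 4L, 6L, 2L, 2L),
                  time = c(4L, 8L, 13L, 5L, 13L, 13L, 9L))

hit <- df1$time == 13              # rows that match
before_hit <- c(hit[-1], FALSE)    # rows directly before a match
res <- df1[hit | before_hit, ]
print(res)
```

Shifting the logical vector by one position avoids the index-0 case, so no extra filtering step is needed when the first row matches.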
Select rows in a dataframe based on values of all columns
We can use Reduce with & to combine the per-column comparisons, after replacing NA with 0:
df[Reduce(`&`, lapply(replace(df[-1], is.na(df[-1]), 0), `<`, 200)),]
# ID col1 col2
#1 1 NA 24
#2 2 20 NA
Data:
set.seed(24)
df <- data.frame(ID=1:4, col1 = c(NA, 20, 210, 30), col2 = c(24, NA, 30, 240))
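A possible alternative, sketched with rowSums rather than Reduce: count per row how many values reach 200 and keep the rows where that count is zero. As in the answer above, NA is treated as passing the test.

```r
df <- data.frame(ID = 1:4,
                 col1 = c(NA, 20, 210, 30),
                 col2 = c(24, NA, 30, 240))

# Per row, count the non-ID values that are 200 or more; NA is ignored
too_big <- rowSums(df[-1] >= 200, na.rm = TRUE)
res <- df[too_big == 0, ]
print(res)
```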
Select rows from a data frame based on values in a vector
Have a look at ?"%in%".
dt[dt$fct %in% vc,]
fct X
1 a 2
3 c 3
5 c 5
7 a 7
9 c 9
10 a 1
12 c 2
14 c 4
You could also use ?is.element:
dt[is.element(dt$fct, vc),]
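The question's dt and vc are not shown, so here is a small sketch on invented values (only the names dt, fct, X and vc are taken from the answer) to check that the two forms select the same rows.

```r
# Hypothetical data; only the names dt, fct, X and vc come from the answer
dt <- data.frame(fct = factor(c("a", "b", "c", "b", "c")), X = 1:5)
vc <- c("a", "c")

by_in <- dt[dt$fct %in% vc, ]
by_is_element <- dt[is.element(dt$fct, vc), ]

# Compare the two selections
print(identical(by_in, by_is_element))
```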
How can I get a certain value from a row in dataframe? [R]
You didn't post your data, so I put it in a .csv file and read it from the R folder on my C: drive.
There might be an easier way to do it, but this is the method I use when I might want to sort by several different things (by column or by row). If you're new to R and don't have data.table or dplyr installed yet, run the install.packages commands below in the console first.
I left the values in the result, but the last line removes them if you don't want them.
setwd("C:/R")
library(data.table)
library(dplyr)
Table <- read.csv("Table1.csv", check.names = FALSE, fileEncoding = 'UTF-8-BOM')
#Making the data long form makes it much easier to sort as your data gets more complex.
LongForm <- melt(setDT(Table), id.vars = c("index"), variable.name = "Category")
Table1 <- as.data.table(LongForm)
#This gets you what you want.
highest <- Table1 %>% group_by(index) %>% top_n(1, value)
#Then just sort it how you wanted it to look
Table2 <- highest[order(highest$index, decreasing = FALSE), ]
View(Table2)
If you don't have the packages installed:
install.packages("data.table")
and
install.packages("dplyr")
To drop the value column:
Table3 <- Table2[,1:2]
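If the goal is just the column holding the largest value in each row, a base R sketch with max.col avoids the extra packages. This is a different technique from the melt approach above, and the table below is invented because Table1.csv is not shown; only the column name index is taken from the answer.

```r
# Hypothetical table; the original Table1.csv is not shown in the answer
tbl <- data.frame(index = 1:3,
                  A = c(5, 2, 9),
                  B = c(7, 1, 3),
                  C = c(6, 8, 4))

vals <- as.matrix(tbl[-1])                     # numeric columns only
best <- max.col(vals, ties.method = "first")   # column position of each row maximum
res <- data.frame(index = tbl$index,
                  Category = colnames(vals)[best],
                  value = vals[cbind(seq_len(nrow(vals)), best)])
print(res)
```

Note one difference in behavior: with ties.method = "first" only one column is reported per row, whereas top_n keeps every tied row.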
Select specific rows based on previous row value (in the same column)
For the fourth example, you could use which() in combination with lag() from dplyr to obtain the indices that meet your criteria. You can then use these to subset the data.frame.
# Get indices of rows that meet condition
ind2 <- which(df$Type==20 & dplyr::lag(df$Type)==40)
# Get indices of rows before the ones that meet condition
ind1 <- which(df$Type==20 & dplyr::lag(df$Type)==40)-1
# Subset data
df[c(ind1, ind2), ]
Trial Type Correct Latency
1: 28 40 1 500
2: 29 20 1 230
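For readers without dplyr, here is a sketch of the same steps in base R, where the lag is built by shifting the column by one position. The data is hypothetical: only the two rows printed above come from the answer, the rest is invented.

```r
# Hypothetical data; only the Trial 28 and 29 rows come from the answer
df <- data.frame(Trial = 26:30,
                 Type = c(20, 20, 40, 20, 40),
                 Correct = c(1, 0, 1, 1, 1),
                 Latency = c(310, 420, 500, 230, 480))

prev_type <- c(NA, head(df$Type, -1))            # Type of the previous row
ind2 <- which(df$Type == 20 & prev_type == 40)   # rows that meet the condition
ind <- sort(c(ind2 - 1, ind2))                   # add the row before each match
res <- df[ind, ]
print(res)
```

Sorting the combined indices keeps each matching row next to its predecessor when there is more than one match.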
Select previous and next N rows with the same value as a certain row
A solution with data.table:
# load the package & convert data to a data.table
library(data.table)
setDT(pdata)
# define shock-year and number of previous/next rows
shock <- 2018
n <- 2
# filter
pdata[, .SD[value == value[time == shock] &
between(time, shock - n, shock + n) &
value == rev(value)][.N > 1 & all(diff(time) == 1)]
, by = id]
which gives:
id time value
1: 4 2016 0
2: 4 2017 0
3: 4 2018 0
4: 4 2019 0
5: 4 2020 0
6: 5 2017 0
7: 5 2018 0
8: 5 2019 0
9: 6 2017 1
10: 6 2018 1
11: 6 2019 1
12: 7 2017 1
13: 7 2018 1
14: 7 2019 1
15: 8 2016 1
16: 8 2017 1
17: 8 2018 1
18: 8 2019 1
19: 8 2020 1
Used data:
pdata <- data.frame(
id = rep(1:10, each = 5),
time = rep(2016:2020, times = 10),
value = c(c(1,1,1,0,0), c(1,1,0,0,0), c(0,0,1,0,0), c(0,0,0,0,0), c(1,0,0,0,1), c(0,1,1,1,0), c(0,1,1,1,1), c(1,1,1,1,1), c(1,0,1,1,1), c(1,1,0,1,1))
)
How to select (four) specific rows (multiple times) based on a column value in R?
This captures @Jasonaizkains answer from the comments above, since pivoting is not strictly necessary in this case. Here it is with some play data:
library(dplyr)
id <- rep(10:13, 4) # four subjects
year <- rep(2013:2016, each = 4) # four years
gender <- sample(1:2, 16, replace = TRUE)
play <- tibble(id, gender, year) # data.frame of 16
play <- play[-9,] # removes row for id 10 in 2015
# Keep only ids that appear in all four years (drops every row for id 10)
play %>% group_by(id) %>% filter(n_distinct(year) >= 4) %>% ungroup()
#> # A tibble: 12 x 3
#> id gender year
#> <int> <int> <int>
#> 1 11 1 2013
#> 2 12 2 2013
#> 3 13 2 2013
#> 4 11 1 2014
#> 5 12 2 2014
#> 6 13 1 2014
#> 7 11 2 2015
#> 8 12 2 2015
#> 9 13 2 2015
#> 10 11 2 2016
#> 11 12 2 2016
#> 12 13 1 2016
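The same filter can be sketched in base R with ave, in case dplyr is not available. The gender column is left out here because it is random and not needed for the filter; the rest of the play data is rebuilt as above.

```r
id <- rep(10:13, 4)                 # four subjects
year <- rep(2013:2016, each = 4)    # four years
play <- data.frame(id, year)
play <- play[-9, ]                  # drop id 10 in 2015, as in the example above

# Number of distinct years per id, repeated for every row of that id
years_per_id <- ave(play$year, play$id, FUN = function(y) length(unique(y)))
complete <- play[years_per_id >= 4, ]
print(unique(complete$id))
```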