Subset Data Frame Using Row Names

subset a data frame by row names of different rows

Using dplyr:

library(dplyr)
DF <- data.frame(row.names=c("12a", "22a", "13a"), Name=c("12","22","13"), plot=c(25,18,9))

If you want to filter by the data frame column "Name", then:

DF.new <- DF %>% filter(Name %in% c("12", "16"))

If you want to filter by actual row.names of the df, then:

DF.new <- DF %>% filter(row.names(DF) %in% c("12a","13a"))

Or, using base R:

DF.new <- DF[DF$Name %in% c("12","13"), ]

or

DF.new <- DF[row.names(DF) %in% c("12a","13a"), ]
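To compare the base R forms, here is a minimal sketch using the DF defined above. Direct character indexing, DF[c(...), ], is not part of the answer above and is included only as an illustration: %in% keeps rows in the data frame's own order, while character indexing returns them in the order requested.

```r
# Same example data as above
DF <- data.frame(row.names = c("12a", "22a", "13a"),
                 Name = c("12", "22", "13"),
                 plot = c(25, 18, 9))

# Logical filter: rows come back in the data frame's original order
by_logical <- DF[row.names(DF) %in% c("13a", "12a"), ]

# Character index: rows come back in the order they were asked for
by_name <- DF[c("13a", "12a"), ]

row.names(by_logical)  # "12a" "13a"
row.names(by_name)     # "13a" "12a"
```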

R: Subset data.frame by exact rownames

Use the match function, which only matches row names exactly:

> d[match(c('711', '9', '1'), rownames(d)),]
[1]  2 NA  1

Which is exactly what I need.
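To see why match helps, here is a small sketch. The data frame d below is a made-up stand-in (the original d is not shown), chosen so that "9" is a partial match for the row name "99". Character row indexing on a data.frame partially matches row names, whereas match() compares whole strings.

```r
# Hypothetical one-column data frame; "9" is a prefix of the row name "99"
d <- data.frame(x = 1:3, row.names = c("1", "711", "99"))

# Character indexing partially matches row names, so "9" picks up row "99"
partial <- d["9", , drop = FALSE]
rownames(partial)  # "99"

# match() compares whole strings, so "9" finds nothing and yields NA
idx <- match(c("711", "9", "1"), rownames(d))
idx       # 2 NA 1
d[idx, ]  # 2 NA 1 (a single column is dropped to a vector)
```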

Addition:

Instead of using a data.frame, use a tibble.

From the documentation (https://cran.r-project.org/web/packages/tibble/vignettes/tibble.html):

Tibbles are also stricter with $. Tibbles never do partial matching, and will throw a warning and return NULL if the column does not exist
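For contrast, the partial matching that tibbles avoid can be seen with a plain data.frame. This is a minimal base R sketch; the column name value is a made-up example.

```r
df <- data.frame(value = 1:3)

# `$` on a data.frame partially matches column names:
# "val" is silently completed to "value"
df$val       # 1 2 3

# `[[` matches exactly by default, so the same lookup returns NULL
df[["val"]]  # NULL
```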

match row names of two data frames and subset only matching rows in R

You can directly use the rownames from b to subset z.

z[rownames(b),] 
#    Fox Prox Sox
#ABC   1    2   3
#DEF   1    1   0
#ABD   1    3   0
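A self-contained sketch of the same idea follows. The first three rows of z mirror the output above; the XYZ row and the contents of b are invented for illustration. The intersect() step is an optional guard that is not in the original answer, useful if b might contain row names that z lacks.

```r
# Hypothetical data; the first three rows of z mirror the output above
z <- data.frame(Fox = c(1, 1, 1, 0), Prox = c(2, 1, 3, 5), Sox = c(3, 0, 0, 1),
                row.names = c("ABC", "DEF", "ABD", "XYZ"))
b <- data.frame(score = c(10, 20, 30), row.names = c("ABC", "DEF", "ABD"))

# Rows of z whose names appear in b, in b's order
matched <- z[rownames(b), ]
matched

# If b may contain names that z lacks, intersect() avoids all-NA rows
common <- intersect(rownames(b), rownames(z))
z[common, ]
```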

How do I subset a data frame by row names that do not meet a condition?

In the example, the 'Year' values for each unique 'Name' are consecutive. So an easier option is to group by 'Name' and keep the groups in which the number of distinct 'Year' values, or the number of rows (n()), is less than 3:

library(dplyr)
data %>%
   group_by(Name) %>% 
   filter(n_distinct(Year) < 3)
   #or the number of rows
   # filter(n() < 3)
# A tibble: 4 x 2
# Groups:   Name [2]
#  Name   Year
#  <fct> <dbl>
#1 Dex    2000
#2 Dex    2001
#3 Lex    2001
#4 Lex    2002
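The same group-wise filter can also be written without dplyr. The sketch below uses ave() on a made-up data set; the Dex and Lex rows follow the output above, and the Max rows are invented so that one group is actually dropped.

```r
# Hypothetical data: Dex and Lex have two years each, Max has three
data <- data.frame(
  Name = c("Dex", "Dex", "Lex", "Lex", "Max", "Max", "Max"),
  Year = c(2000, 2001, 2001, 2002, 2000, 2001, 2002)
)

# Number of distinct years per Name, repeated for every row of that group
n_years <- ave(data$Year, data$Name, FUN = function(y) length(unique(y)))

# Keep only the groups with fewer than 3 distinct years
small <- data[n_years < 3, ]
small
```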

In the general case, after grouping by 'Name', take the difference between adjacent 'Year' values and check whether it equals 1 (a one-year step). Passing that logical vector to run-length encoding (rle) gives the length of each run of consecutive years; keep the 'Name' groups whose longest run is shorter than 3:

data %>%
   group_by(Name) %>% 
   filter(with(rle(c(TRUE, diff(Year)) == 1), max(lengths[values])) < 3)
# A tibble: 4 x 2
# Groups:   Name [2]
#  Name   Year
#  <fct> <dbl>
#1 Dex    2000
#2 Dex    2001
#3 Lex    2001
#4 Lex    2002
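One thing to watch in the rle() expression above: the first year after a gap is marked FALSE, so a run that begins after a gap is counted one short (for example, 2000, 2002, 2003, 2004 gives 2 rather than 3). A possible alternative, sketched here with a hypothetical helper named longest_run, labels each run with a running id and counts the members of each run.

```r
# Hypothetical helper: length of the longest run of consecutive years
longest_run <- function(years) {
  years <- sort(unique(years))
  # A new run starts at the first year and after every step that is not 1
  run_id <- cumsum(c(TRUE, diff(years) != 1))
  max(tabulate(run_id))
}

longest_run(c(2000, 2001))              # 2
longest_run(c(2000, 2002, 2003, 2004))  # 3
```

Inside the grouped pipeline this could be used as filter(longest_run(Year) < 3).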

subset dataframe based on rownames

Following on from @yeedle's solution, I modified it a little and found this worked for me:

library(dplyr)
library(tibble)  # rownames_to_column() comes from tibble
bwenv2 <- bwenv %>% 
  rownames_to_column("row_names") %>%
  semi_join(rownames_to_column(bwsp, "row_names"), by = "row_names")
rownames(bwenv2) <- bwenv2$row_names 
bwenv2 <- bwenv2 %>% select(-row_names)

bw2015 <- cbind(bwenv2, bwsp)
str(bw2015)
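A base R variant of the same idea, as a rough sketch: the two small data frames below are made-up stand-ins for bwenv and bwsp. Because cbind() pairs rows by position rather than by name, both data frames are indexed by the same vector of shared row names first.

```r
# Hypothetical stand-ins for bwenv and bwsp
bwenv <- data.frame(temp = c(10, 12, 14), row.names = c("s1", "s2", "s3"))
bwsp  <- data.frame(count = c(5, 7), row.names = c("s3", "s1"))

# Row names present in both data frames
common <- intersect(rownames(bwenv), rownames(bwsp))

# Index both by the same names so the rows line up before binding
bw <- cbind(bwenv[common, , drop = FALSE], bwsp[common, , drop = FALSE])
bw
```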

Subset Data Frame Rows by value in row.names in R

Extract the part of the data that you want to split on:

sub('\\d+', '', data$group)
#[1] "ga" "ga" "gb" "gc" "gb"

and use the above in split to divide the data into groups.

new_data <- split(data, sub('\\d+', '', data$group))
new_data
#$ga
#  x1 x2 group
#1  3  a   ga1
#2  7  b   ga2

#$gb
#  x1 x2 group
#3  1  c   gb1
#5  5  e   gb1

#$gc
#  x1 x2 group
#4  8  d   gc3

It is better to keep the data in a list. However, if you want a separate data frame for each group, you can use list2env:

list2env(new_data, .GlobalEnv)
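If the group labels live in the row names rather than in a column, the same approach applies to rownames(data). The sketch below uses made-up data; row names must be unique, so the labels differ slightly from the group column above.

```r
# Hypothetical data with the group labels stored as row names
data <- data.frame(x1 = c(3, 7, 1, 8, 5),
                   x2 = c("a", "b", "c", "d", "e"),
                   row.names = c("ga1", "ga2", "gb1", "gc3", "gb2"))

# Strip the trailing digits from each row name to get the grouping key
key <- sub("\\d+$", "", rownames(data))

parts <- split(data, key)
names(parts)        # "ga" "gb" "gc"
rownames(parts$gb)  # "gb1" "gb2"
```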

how to select row names and row for any mismatch found in a row data frame

A dplyr option:

library(dplyr)
df %>% group_by(across(everything())) %>% group_split()

#[[1]]
# A tibble: 2 x 4
#  V1    V2    V3    V4   
#  <chr> <chr> <chr> <chr>
#1 L     M     X     V    
#2 L     M     X     V    

#[[2]]
# A tibble: 2 x 4
#  V1    V2    V3    V4   
#  <chr> <chr> <chr> <chr>
#1 P     M     X     V    
#2 P     M     X     V

Subsetting a matrix by row names and column names in R

Try to filter rows and columns in this way:

matrix[rownames(matrix)%in%list_individuals,colnames(matrix)%in%list_individuals]

Only the rows and columns named in list_individuals are kept in the output.
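A runnable sketch of this filter, with invented data: the matrix is called m here, and list_individuals deliberately includes a name ("z") that the matrix does not have. The drop = FALSE argument is an addition to the expression above; it keeps the result a matrix even when only one row or column survives.

```r
# Hypothetical square matrix with named rows and columns
m <- matrix(1:16, nrow = 4,
            dimnames = list(c("a", "b", "c", "d"), c("a", "b", "c", "d")))
list_individuals <- c("b", "d", "z")  # "z" is not in the matrix

keep_rows <- rownames(m) %in% list_individuals
keep_cols <- colnames(m) %in% list_individuals

# drop = FALSE keeps the result a matrix even if one row or column remains
sub_m <- m[keep_rows, keep_cols, drop = FALSE]
sub_m
```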

Return corresponding row name instead of data in r

You can subset the row.names vector with the index of the max value of the column.

df <- data.frame(
  x = 1:100
)

row.names(df)[which(df$x == max(df$x, na.rm = TRUE))]

# "100"
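With default row names the result is just the row number as a string, so a sketch with named rows may be clearer. The data below are made up and include a tie and a missing value. The which() form returns every row that attains the maximum, while which.max(), shown as an alternative, returns only the first.

```r
# Hypothetical data with meaningful row names and a tie for the maximum
df <- data.frame(x = c(3, 9, 9, NA), row.names = c("a", "b", "c", "d"))

# All rows that attain the maximum
all_max <- row.names(df)[which(df$x == max(df$x, na.rm = TRUE))]
all_max    # "b" "c"

# which.max() returns only the first maximum and skips NA values
first_max <- row.names(df)[which.max(df$x)]
first_max  # "b"
```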