Selecting Unique Rows in Matrix Using R

Selecting unique rows in matrix using R

You can use the unique function:

unique(mat$V1) # and not matrix$v1
[1] 44 281 1312

You can also write

unique(mat)

and it will return the unique rows (I tried it on your file).

If you want to keep only the first row for each distinct value of V1, you can do this:

> mat[!duplicated(mat$V1), ]
X V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1 1547 44 14 1 2 100 17 0 0 0 0
23 5385 281 67 2 10 100 10 0 0 0 0
33 17347 1312 1 2 6 100 8 0 0 0 0
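
To make the difference between these three calls concrete, here is a minimal sketch on a small made-up data frame. The data is hypothetical and only stands in for the asker's file; the column names V1 and V2 follow the output above.

```r
# Hypothetical data standing in for the asker's file.
mat <- data.frame(
  V1 = c(44, 44, 281, 281, 1312),
  V2 = c(14, 14, 67, 70, 1)
)

# Distinct values of one column
v1_values <- unique(mat$V1)

# Drops only rows that are identical in every column (row 2 here)
unique_rows <- unique(mat)

# Keeps the first row seen for each value of V1 (rows 1, 3 and 5 here)
first_per_v1 <- mat[!duplicated(mat$V1), ]
```

Note that unique(mat) keeps both rows with V1 equal to 281 because they differ in V2, while the duplicated() version keeps only the first of them.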

Get row numbers of unique rows in a matrix

If the expected row index is 3 because the other two rows are duplicates of each other, use duplicated to get a logical index and wrap it in which to get the numeric index.

 which(!(duplicated(m)|duplicated(m,fromLast=TRUE)))
#[1] 3

If we instead treat the first occurrence of each row as unique (the 1st and 3rd rows here), then

 which(!duplicated(m))
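
The question's matrix is not shown here, so the following sketch assumes a hypothetical 3-row matrix m in which rows 1 and 2 are identical. It illustrates the two interpretations side by side.

```r
# Hypothetical matrix: rows 1 and 2 are identical, row 3 is different.
m <- rbind(c(1, 2), c(1, 2), c(3, 4))

# Rows that have no duplicate anywhere in the matrix
only_once <- which(!(duplicated(m) | duplicated(m, fromLast = TRUE)))
only_once
# [1] 3

# First occurrence of each distinct row
first_seen <- which(!duplicated(m))
first_seen
# [1] 1 3
```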

How can I extract distinct element from matrix's rows in R?

I would row-wise apply a function that only retains the m unique values and then "pad" that vector to a length N with zeros, by adding N - m zeros to the unique values:

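N <- ncol(Data_Achat2)

t(apply(Data_Achat2, 1, function(x) {
  # keep each value once, in order of first appearance
  uniques <- unique(x)
  # pad with zeros so every row has length N again
  return(c(uniques, rep(0, N - length(uniques))))
}))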

Which results in:

   [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17]    --- [,36] [,37] 
1 1349 433 405 451 0 0 0 0 0 0 0 0 0 0 0 0 0 --- 0 0
2 4890 405 416 388 464 392 393 433 453 0 0 0 0 0 0 0 0 --- 0 0
3 7881 405 384 390 395 0 0 0 0 0 0 0 0 0 0 0 0 --- 0 0
4 8081 442 405 475 464 0 0 0 0 0 0 0 0 0 0 0 0 --- 0 0
5 9465 457 417 416 391 441 392 401 432 388 395 466 464 399 475 481 0 --- 0 0
6 10626 432 390 433 0 0 0 0 0 0 0 0 0 0 0 0 0 --- 0 0
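
For a version that can be run without the original data, here is a minimal sketch of the same idea. The 2 x 4 matrix is made up for illustration and only borrows the name Data_Achat2.

```r
# Hypothetical 2 x 4 matrix standing in for Data_Achat2.
Data_Achat2 <- rbind(c(405, 433, 405, 433),
                     c(388, 388, 388, 416))
N <- ncol(Data_Achat2)

padded <- t(apply(Data_Achat2, 1, function(x) {
  uniques <- unique(x)                      # distinct values, first-seen order
  c(uniques, rep(0, N - length(uniques)))   # pad back to length N with zeros
}))
padded
#      [,1] [,2] [,3] [,4]
# [1,]  405  433    0    0
# [2,]  388  416    0    0
```

One caveat with this design: if 0 can occur as a real value in the data, the padding becomes ambiguous, and padding with NA instead of 0 may be the safer choice.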

Find unique rows

Run duplicated from both the beginning and the end of the data frame. A row is kept only if neither check returns TRUE for it:

df[!(duplicated(df) | duplicated(df, fromLast = TRUE)),]

# x y
#5 115 215
#10 521 151
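
The original data frame is not shown, so the sketch below uses a small hypothetical df with columns x and y to show how this differs from unique(df).

```r
# Hypothetical data: rows 1-2 and rows 4-5 are duplicate pairs.
df <- data.frame(x = c(1, 1, 115, 2, 2),
                 y = c(10, 10, 215, 20, 20))

# Rows that occur exactly once (every copy of a duplicated row is dropped)
singles <- df[!(duplicated(df) | duplicated(df, fromLast = TRUE)), ]
singles
#     x   y
# 3 115 215

# For comparison, unique() keeps one copy of each duplicated row
deduped <- unique(df)
```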

select unique rows containing maximum difference between column IDs

You are close - you just need to use filter rather than summarize. Like so:

data %>%
  dplyr::group_by(seqname) %>%
  dplyr::filter(start == min(start), end == max(end))

This assumes that, for each seqname, exactly one row has both the minimum start and the maximum end. If more than one row qualifies, all of them will be returned. If the minimum and the maximum fall on different rows, then none of them will be returned.

If you want the row with the min start and the row with the max end, even if they are different rows, use this as the filter instead:

    filter(start == min(start) | end == max(end))
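
To see how the "and" and "or" conditions differ inside filter, here is a minimal sketch on made-up data. The column names seqname, start and end come from the question; the values are hypothetical.

```r
library(dplyr)

# Hypothetical data: for "chr1" the minimum start and the maximum end
# sit on different rows; for "chr2" they sit on the same row.
data <- data.frame(
  seqname = c("chr1", "chr1", "chr1", "chr2", "chr2"),
  start   = c(10, 20, 30, 5, 8),
  end     = c(50, 40, 90, 100, 60)
)

# Both conditions must hold on the same row: "chr1" drops out entirely
both <- data %>%
  group_by(seqname) %>%
  filter(start == min(start), end == max(end)) %>%
  ungroup()

# Either condition is enough: "chr1" contributes two rows
either <- data %>%
  group_by(seqname) %>%
  filter(start == min(start) | end == max(end)) %>%
  ungroup()
```

Here both has a single row (the "chr2" row with start 5 and end 100), while either has three rows.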

select unique values and their corresponding values in dplyr

We can use slice_head after grouping by the 'weeknum'

library(dplyr)
df1 %>%
  group_by(weeknum) %>%
  slice_head(n = 1)

Or with distinct

df1 %>%
  distinct(weeknum, .keep_all = TRUE)

In base R, it can be done with duplicated and subset

subset(df1, !duplicated(weeknum))
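
As a rough check that the three approaches pick the same rows, here is a sketch on a hypothetical df1. Only the column name weeknum comes from the question; the value column and the data are made up. The row order of the results may differ between approaches, so the comparison sorts by weeknum first.

```r
library(dplyr)

# Hypothetical data; weeknum is deliberately not sorted.
df1 <- data.frame(weeknum = c(2, 2, 1, 1, 3),
                  value   = c("a", "b", "c", "d", "e"))

by_slice    <- df1 %>% group_by(weeknum) %>% slice_head(n = 1) %>% ungroup()
by_distinct <- df1 %>% distinct(weeknum, .keep_all = TRUE)
by_base     <- subset(df1, !duplicated(weeknum))

# Sort each result by weeknum so that only the selected rows are compared
pick <- function(d) d$value[order(d$weeknum)]
pick(by_slice)
pick(by_distinct)
pick(by_base)
```

All three keep the first row seen for each weeknum, which is "c" for week 1, "a" for week 2 and "e" for week 3 in this data.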

Calculate number of unique values in grouped matrix

Here is an option with tidyverse. Reshape to 'long' format with pivot_longer, group by 'group', and replace every 'value' that occurs more than once within its group with NA. Then group by row number, summarise with n_distinct to count the distinct remaining values per row, and bind the counts to the original data.

library(dplyr)
library(tidyr)
data %>%
  mutate(rn = row_number()) %>%
  pivot_longer(cols = starts_with('c')) %>%
  group_by(group) %>%
  mutate(value = replace(value,
                         duplicated(value) | duplicated(value, fromLast = TRUE),
                         NA)) %>%
  group_by(rn) %>%
  summarise(uniq.vals = n_distinct(value, na.rm = TRUE), .groups = 'drop') %>%
  select(uniq.vals) %>%
  bind_cols(data, .)

Output:

#   group c1 c2 c3 c4 uniq.vals
#1 1 A B C D 2
#2 1 E F G H 3
#3 1 A F C I 1
#4 1 J K L M 4
#5 2 L B C D 2
#6 2 M F X T 3
#7 2 L T C I 1
#8 2 J E V W 4
