Duplicate List Names in R

Duplicate list names in R

When a list has duplicate names and you subset it by name, only the first matching element is returned. [[ only ever returns a single element anyway, so let's look at [ instead.

l["B"]
# $B
# [1] 5

We can also see that passing c("B", "B") as the subset doesn't return both elements either, because each "B" is matched against the first element with that name.

l[c("B", "B")]
# $B
# [1] 5
#
# $B
# [1] 5

One of the safest ways to retrieve all the B elements is to use a logical subset of the names() vector. This will give us the correct elements.

l[names(l) == "B"]
# $B
# [1] 5
#
# $B
# [1] 7

This is a great example of why duplicate names should be avoided.
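
For reference, here is a minimal sketch of a list that reproduces the outputs above. The original definition of l is not shown, so the elements A and C and their values are assumptions; only the two B values (5 and 7) come from the output. The sketch also shows one possible way to detect and repair duplicated names using base R's anyDuplicated() and make.unique().

```r
# Hypothetical list with a duplicated name "B" (A and C are invented for illustration)
l <- list(A = 1, B = 5, C = 3, B = 7)

first_b <- l[["B"]]            # [[ matches only the first "B"
all_b   <- l[names(l) == "B"]  # logical subsetting keeps every "B"

# Flag duplicated names before relying on name-based access
has_dup_names <- anyDuplicated(names(l)) > 0

# make.unique() appends a numeric suffix to repeated names
names(l) <- make.unique(names(l))
names(l)
```

After the renaming step, each element can be reached by its own name, which avoids the silent first-match behaviour described above.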

Remove duplicate in a large list while keeping the named number in R

Try this:

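df <- readRDS('MEPList.rds')                      # list read from disk
df1 <- as.data.frame(do.call(rbind, df))          # stack the list elements row-wise
df2 <- df1[!duplicated(df1$V1), , drop = FALSE]   # keep the first row for each V1 value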

Output:

head(df2)
#                     V1
# GUE.NGL.mepid   197701
# GUE.NGL.mepid.1 197533
# GUE.NGL.mepid.2 197521
# GUE.NGL.mepid.3 187917
# GUE.NGL.mepid.4 124986
# GUE.NGL.mepid.5 197529

Then you could format the rownames() to get the names.
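
As one possible way to do that formatting, the sketch below strips the trailing numeric suffix from the row names and stores the result in a column. The small data frame is a hypothetical stand-in built from the first rows of the output above, and the column name `name` is an assumption. Row names must be unique, so the cleaned labels cannot go back into rownames().

```r
# Hypothetical stand-in for df2, using the first rows of the output above
df2 <- data.frame(
  V1 = c(197701, 197533, 197521),
  row.names = c("GUE.NGL.mepid", "GUE.NGL.mepid.1", "GUE.NGL.mepid.2")
)

# Remove a trailing ".<digits>" suffix, if present
clean_names <- sub("\\.[0-9]+$", "", rownames(df2))

# Keep the cleaned label as a regular column (the column name is illustrative)
df2$name <- clean_names
```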

Is there a way or a built in R function that can sum the values of duplicate names within a list?

An option using tidyverse

library(dplyr)
library(tibble)
library(tidyr)
enframe(values) %>%
  unnest(c(value)) %>%
  group_by(name) %>%
  summarise(value = sum(value)) %>%
  deframe() %>%
  as.list()
#$China
#[1] 34

#$Russia
#[1] 54

#$UK
#[1] 10

#$US
#[1] 66

Or using base R

as.list(tapply(unlist(values), names(values), sum))
#$China
#[1] 34

#$Russia
#[1] 54

#$UK
#[1] 10

#$US
#[1] 66
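
The input list is not shown above, so here is a hypothetical `values` whose sums match the printed results. The sketch also covers a case the one-liner does not handle: tapply() needs one group label per number, so if an element can hold several numbers, each name has to be repeated with rep() and lengths(). The names `values2`, `groups`, `sums` and `sums2` are illustrative.

```r
# Hypothetical input: a list with repeated names (values chosen to match the sums above)
values <- list(US = 40, China = 34, Russia = 20, UK = 10, US = 26, Russia = 34)

# Each element has length one, so names(values) lines up with unlist(values)
sums <- as.list(tapply(unlist(values), names(values), sum))

# If elements may hold several numbers, repeat each name once per number
values2 <- list(US = c(1, 2), UK = 3, US = 4)
groups  <- rep(names(values2), lengths(values2))
sums2   <- as.list(tapply(unlist(values2), groups, sum))
```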

Identifying duplicates in a list of character vectors in R

A binary output can be generated with

any(duplicated(unlist(my_list)))
# [1] TRUE

As @sindri_baldur correctly pointed out in the comments, a value repeated within a single vector is also counted as a duplicate. If only values shared across vectors should count, apply unique to each vector first:

any(duplicated(unlist(lapply(my_list, unique))))
# [1] TRUE

Or, as another base R alternative (anyDuplicated returns the index of the first duplicate, or 0 if there is none):

anyDuplicated(unlist(lapply(my_list, unique))) > 0
# [1] TRUE
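
To make the difference between the two checks concrete, here is a small sketch with a hypothetical `my_list` in which the only repeat sits inside a single vector. The input and the variable names are assumptions for illustration.

```r
# Hypothetical input: "a" repeats only inside the first vector
my_list <- list(c("a", "a"), c("b", "c"))

# Counts the within-vector repeat as a duplicate
dup_anywhere <- any(duplicated(unlist(my_list)))

# De-duplicate each vector first, so only values shared across vectors count
dup_across <- anyDuplicated(unlist(lapply(my_list, unique))) > 0
```

With this input the first check reports a duplicate and the second does not, so the choice depends on whether within-vector repeats matter for your data.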

R - Finding duplicates in list entries

You can unlist first:

unlisted <- unlist(examplelist)
unlisted[duplicated(unlisted)]
#      b1      c1      c2
#   "red" "black" "green"

unlisted[!duplicated(unlisted)]
#       a1       a2       a3       b2       b3       c3
#   "blue"    "red" "yellow"  "black"  "green"  "brown"

If you only want the vector (without the names), use unname:

unlisted <- unname(unlist(examplelist))
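
The definition of examplelist is not shown; the list below is a reconstruction that is consistent with the printed names and values, so treat it as an assumption. The sketch also shows how to get every occurrence of a repeated value (including the first one), since duplicated() on its own flags only the later occurrences.

```r
# Reconstructed (assumed) input that matches the outputs above
examplelist <- list(
  a = c("blue", "red", "yellow"),
  b = c("red", "black", "green"),
  c = c("black", "green", "brown")
)

unlisted <- unlist(examplelist)

# Later occurrences only
dups <- unlisted[duplicated(unlisted)]

# Every occurrence of any value that appears more than once
all_occurrences <- unlisted[unlisted %in% dups]
```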

Best way to delete duplicates from a list of lists in R

This thread, answered by @akrun, appears to answer the question:

Remove duplicated elements from list

To adapt it to your code:

unmcli <- unlist(mcli)
res <- Map('[', mcli, relist(!duplicated(unmcli), skeleton = mcli))

And then you could remove the third element as you described.
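
Here is a small sketch of how this behaves, assuming mcli is a list of character vectors (the actual data is not shown, so this input is hypothetical). relist() reshapes the flat logical vector back into the structure of mcli, so each vector is subset by its own keep/drop flags. Filter() with length is one possible way to drop elements that end up empty.

```r
# Hypothetical input: a list of character vectors with values repeated across elements
mcli <- list(c("a", "b"), c("b", "c"), c("a", "c"))

unmcli <- unlist(mcli)

# TRUE marks the first occurrence of each value, reshaped to match mcli
keep <- relist(!duplicated(unmcli), skeleton = mcli)

# Subset every vector by its own logical flags
res <- Map('[', mcli, keep)

# Drop elements that became empty after de-duplication
kept <- Filter(length, res)
```

In this example the third element loses all of its values, because both already appeared earlier, and Filter() removes it.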

Losing duplicate column names when flattening list-of-lists into dataframes in R

as_tibble has the parameter .name_repair. Setting that to "unique" does what you want:

nested_list %>%
  purrr::map(unlist) %>%
  purrr::map(t) %>%
  purrr::map(as_tibble, .name_repair = "unique") %>%
  dplyr::bind_rows() %>%
  readr::type_convert()

# A tibble: 2 x 4
#   name  match team.name...3 team.name...4
#   <chr> <dbl> <chr>         <chr>
# 1 joe      13 teama         teamb
# 2 tom      15 teamc         teamd

Note that .name_repair is given to the purrr::map() call, which passes it on to as_tibble().

Another tip: if you replace the last purrr::map() with purrr::map_dfr(), the rows are bound automatically and the separate bind_rows() call is no longer needed.

How to remove duplicate column names in R?

Your real dataframe is of class data.table, while your small example is not. You can try:

df[, !duplicated(colnames(df)), with = FALSE]
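
For comparison, a plain data.frame can use the same logical column index without the with argument. The sketch below is base R only and uses a hypothetical data frame; check.names = FALSE is needed to create the duplicated name in the first place.

```r
# Hypothetical data.frame with a repeated column name
df <- data.frame(a = 1:2, b = 3:4, a = 5:6, check.names = FALSE)

keep  <- !duplicated(colnames(df))   # TRUE for the first occurrence of each name
dedup <- df[, keep, drop = FALSE]    # drop = FALSE keeps a data.frame even with one column left
```

As with the data.table version, the first column of each duplicated name is kept and the later ones are dropped.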

Names of variables repeated 2 or more times in a list of data.frames in R

You could:

  1. unlist the list to get all column names as a single vector,
  2. check for the (unique) duplicate names in the vector using duplicated.

## get names
vec <- names(unlist(r, recursive = FALSE))

## return duplicates
unique(vec[duplicated(vec)])
#> [1] "AA" "BB" "CC"
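
The list r is not shown, so the sketch below uses a hypothetical list of data frames that gives the same result. It collects the names with lapply(r, names) rather than unlist(r, recursive = FALSE), because the latter prefixes each column name with the list element's name when r itself is named. Counting with table() also makes it easy to change the threshold. All variable names here are illustrative.

```r
# Hypothetical list of data.frames with overlapping column names
r <- list(
  data.frame(AA = 1, BB = 2, CC = 3),
  data.frame(AA = 1, BB = 2, DD = 4),
  data.frame(CC = 1, EE = 2)
)

# Collect the column names of every data.frame into one character vector
vec <- unlist(lapply(r, names), use.names = FALSE)

# Count how often each name occurs and keep those seen at least twice
counts   <- table(vec)
repeated <- names(counts[counts >= 2])
```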

