Duplicate list names in R
When you have duplicate names and subset by name, only the first matching element is returned. In fact, [[ will only ever give you one element anyway, so let's look at [ instead.
l["B"]
# $B
# [1] 5
We can also see that using c("B", "B") as the subset doesn't give the right result either, because R goes back and fetches the first B again.
l[c("B", "B")]
# $B
# [1] 5
#
# $B
# [1] 5
One of the safest ways to retrieve all of the B elements is to use a logical subset of the names() vector. This gives the correct elements:
l[names(l) == "B"]
# $B
# [1] 5
#
# $B
# [1] 7
This is a great example of why duplicate names should be avoided.
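A self-contained sketch of the behaviour described above. The list l is hypothetical: only the two B values (5 and 7) come from the output shown, while A and C are made up. The last step uses make.unique(), which is one way to get rid of duplicate names when renaming is acceptable.

```r
# Hypothetical list with a duplicated name "B"
l <- list(A = 1, B = 5, C = 3, B = 7)

# Subsetting by name only ever matches the first "B"
first_only <- l["B"]

# A logical index on names() keeps every element called "B"
all_b <- l[names(l) == "B"]

# One way to avoid the problem altogether: make the names unique
names(l) <- make.unique(names(l))
names(l)
# [1] "A"   "B"   "C"   "B.1"
```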
Remove duplicate in a large list while keeping the named number in R
Try this:
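# Read the list and stack its elements into a data frame
df <- readRDS('MEPList.rds')
df1 <- as.data.frame(do.call(rbind, df))
# Keep only the first occurrence of each value in V1
df2 <- df1[!duplicated(df1$V1), , drop = FALSE]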
Output:
head(df2)
                    V1
GUE.NGL.mepid   197701
GUE.NGL.mepid.1 197533
GUE.NGL.mepid.2 197521
GUE.NGL.mepid.3 187917
GUE.NGL.mepid.4 124986
GUE.NGL.mepid.5 197529
Then you could post-process the rownames() of df2 to recover the names.
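A minimal sketch of that post-processing, assuming every row name has the form group + ".mepid" + an optional ".<n>" counter, as in the output above. The small df2 below is a hypothetical stand-in built from the first three rows shown; the regular expression would need adjusting if the real names follow a different pattern.

```r
# Hypothetical stand-in for df2, using row names of the form shown above
df2 <- data.frame(V1 = c(197701, 197533, 197521),
                  row.names = c("GUE.NGL.mepid", "GUE.NGL.mepid.1", "GUE.NGL.mepid.2"))

# Strip the assumed ".mepid" suffix and any ".<n>" counter to get the group name
groups <- sub("\\.mepid(\\.[0-9]+)?$", "", rownames(df2))

# Named vector of IDs, keyed by group
named_ids <- setNames(df2$V1, groups)
```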
Is there a way or a built-in R function that can sum the values of duplicate names within a list?
An option using the tidyverse:
library(dplyr)
library(tibble)
library(tidyr)
enframe(values) %>%
unnest(c(value)) %>%
group_by(name) %>%
summarise(value = sum(value)) %>%
deframe %>%
as.list
#$China
#[1] 34
#$Russia
#[1] 54
#$UK
#[1] 10
#$US
#[1] 66
Or, using base R:
as.list(tapply(unlist(values), names(values), sum))
#$China
#[1] 34
#$Russia
#[1] 54
#$UK
#[1] 10
#$US
#[1] 66
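The tapply() version assumes each element of values holds a single number, because tapply() needs X and INDEX of the same length. If some entries hold several numbers (which the unnest() step in the tidyverse version handles), one base R option is to repeat each name with lengths() and group with split() + lapply() instead of tapply(). The values list here is hypothetical.

```r
# Hypothetical input: duplicate names, and one entry holding two numbers
values <- list(US = 10, China = c(4, 6), US = 5, UK = 2)

# Flatten the numbers and repeat each name once per number it holds
flat   <- unlist(values, use.names = FALSE)
groups <- rep(names(values), lengths(values))

# Sum within each name; split() returns the groups sorted by name
totals <- lapply(split(flat, groups), sum)
```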
Identifying duplicates in a list of character vectors in R
A single TRUE/FALSE answer can be obtained with:
any(duplicated(unlist(my_list)))
[1] TRUE
As @sindri_baldur correctly pointed out in the comments, if a value repeated within a single vector should not count as a duplicate, apply unique() to each vector first:
any(duplicated(unlist(lapply(my_list, unique))))
[1] TRUE
Or, as another base R alternative (anyDuplicated() returns the index of the first duplicate, or 0 if there is none):
anyDuplicated(unlist(lapply(my_list, unique))) > 0
[1] TRUE
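A small sketch of why the unique() step matters. The my_list below is hypothetical: "a" repeats inside x, but no value is shared between x and y, so the two checks disagree.

```r
# Hypothetical input: a repeat within x, nothing shared across vectors
my_list <- list(x = c("a", "a", "b"), y = c("c", "d"))

# Counts the within-vector repeat as a duplicate
across_raw <- any(duplicated(unlist(my_list)))

# Removing repeats inside each vector first leaves only cross-vector duplicates
across_unique <- any(duplicated(unlist(lapply(my_list, unique))))
```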
R - Finding duplicates in list entries
You can unlist first:
unlisted <- unlist(examplelist)
unlisted[duplicated(unlisted)]
# b1 c1 c2
# "red" "black" "green"
unlisted[!duplicated(unlisted)]
# a1 a2 a3 b2 b3 c3
# "blue" "red" "yellow" "black" "green" "brown"
If you only want the vector (without the names), use unname:
unlisted <- unname(unlist(examplelist))
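A runnable version of the above. The examplelist here is reconstructed from the printed output (names a1, b2, ... come from unlist() numbering each vector's elements), so it is an assumption rather than the asker's actual data. The last line shows unique(), which returns the distinct values without names.

```r
# Reconstructed to match the output above (an assumption, not the original data)
examplelist <- list(a = c("blue", "red", "yellow"),
                    b = c("red", "black", "green"),
                    c = c("black", "green", "brown"))

unlisted <- unlist(examplelist)

# Later occurrences of values already seen
dups <- unlisted[duplicated(unlisted)]

# Distinct values, in order of first appearance, without names
distinct <- unique(unname(unlisted))
```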
Best way to delete duplicates from a list of lists in R
It looks like the thread "Remove duplicated elements from list", answered by @akrun, answers this question. To adapt it to your code:
# Flatten, flag the first occurrence of each value, then map the flags back onto each element
unmcli <- unlist(mcli)
res <- Map('[', mcli, relist(!duplicated(unmcli), skeleton = mcli))
And then you could remove the third element as you described.
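A self-contained sketch of the Map()/relist() idea. The mcli below is hypothetical and, as a simplifying assumption, is a list of atomic vectors rather than a deeper list of lists. relist() (from the utils package) reshapes the flat logical vector back into the shape of mcli, so each element can be indexed with its own flags.

```r
# Hypothetical input: "y" and "x" reappear in later elements
mcli <- list(a = c("x", "y"), b = c("y", "z"), c = c("x", "w"))

unmcli <- unlist(mcli)
# TRUE for the first occurrence of each value across the whole list
keep <- relist(!duplicated(unmcli), skeleton = mcli)

# Subset each element with its matching logical vector
res <- Map(`[`, mcli, keep)
```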
Losing duplicate column names when flattening list-of-lists into dataframes in R
as_tibble() has the parameter .name_repair. Setting it to "unique" does what you want:
library(dplyr)  # attaches %>% and re-exports as_tibble()
nested_list %>%
purrr::map(unlist) %>%
purrr::map(t) %>%
purrr::map(as_tibble, .name_repair = "unique") %>%
dplyr::bind_rows() %>%
readr::type_convert()
# A tibble: 2 x 4
name match team.name...3 team.name...4
<chr> <dbl> <chr> <chr>
1 joe 13 teama teamb
2 tom 15 teamc teamd
Note that we pass this option to the purrr::map() call, which passes it on to as_tibble().
Another tip: if you replace your last purrr::map() with purrr::map_dfr(), the bind_rows() step is done automatically.
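For comparison, a base R sketch of the same flattening that keeps both columns by renaming them with make.unique(). This is a swapped-in technique, not the as_tibble() approach: the suffix is ".1" rather than "...3"/"...4". The nested_list below is hypothetical, shaped so that it flattens to the columns shown in the output above.

```r
# Hypothetical input: two sub-lists per record both flatten to "team.name"
nested_list <- list(
  list(name = "joe", match = 13, team = list(name = "teama"), team = list(name = "teamb")),
  list(name = "tom", match = 15, team = list(name = "teamc"), team = list(name = "teamd"))
)

# Flatten each record to a named character vector and stack the rows
m <- do.call(rbind, lapply(nested_list, unlist))

# Rename colliding columns instead of losing one of them
colnames(m) <- make.unique(colnames(m))

# Convert to a data frame and restore numeric columns
df <- type.convert(as.data.frame(m, stringsAsFactors = FALSE), as.is = TRUE)
```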
How to remove duplicate column names in R?
Your real data frame is of class data.table, while your small example is not. With a data.table you need with = FALSE so that the logical vector is used to select columns:
df[, !duplicated(colnames(df)), with = FALSE]
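For a plain data.frame, the same logical index works without with = FALSE. A minimal sketch with a hypothetical data frame; check.names = FALSE is only there so data.frame() keeps the duplicate name instead of renaming it.

```r
# Hypothetical data frame with a duplicated column name "a"
df <- data.frame(a = 1:2, b = 3:4, a = 5:6, check.names = FALSE)

# Keep the first column of each name; drop = FALSE keeps a data frame
df_unique <- df[, !duplicated(colnames(df)), drop = FALSE]
```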
Names of variables repeated 2 or more times in a list of data.frames in R
You could:
- unlist the list (one level only) to get all column names as a single vector,
- check that vector for duplicated names with duplicated(), keeping each one once with unique().
## get names
vec <- names(unlist(r, recursive = FALSE))
## return duplicates
unique(vec[duplicated(vec)])
#> [1] "AA" "BB" "CC"
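One caveat: if r is a named list, unlist() prefixes each column name with the element name (for example "first.AA"), so no names would repeat. A sketch that collects the column names directly avoids this; the list r below is hypothetical.

```r
# Hypothetical named list of data frames
r <- list(first  = data.frame(AA = 1, BB = 2),
          second = data.frame(AA = 3, CC = 4),
          third  = data.frame(BB = 5, CC = 6, DD = 7))

# Collect the column names themselves, ignoring the list's own names
vec <- unlist(lapply(r, names), use.names = FALSE)

# Names that occur in two or more data frames, each reported once
repeated <- unique(vec[duplicated(vec)])
```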