How to Keep Only Unique Words Within Each String in a Vector

How do I keep only unique words within each string in a vector?

Split it up (strsplit on spaces), use unique (in lapply), and paste it back together:

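# Assumed example input (not shown in the original answer)
vector <- c("hello i like to code hello", "coding is fun", "fun fun fun")
vapply(lapply(strsplit(vector, " "), unique), paste, character(1L), collapse = " ")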
# [1] "hello i like to code" "coding is fun" "fun"

## OR
vapply(strsplit(vector, " "), function(x) paste(unique(x), collapse = " "), character(1L))

Update based on comments

You can always write a custom function to use with vapply. For instance, here's a function that takes a split string, keeps only the words that are longer than minLen characters, and makes the "unique" step a user choice.

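myFun <- function(x, minLen = 3, onlyUnique = TRUE) {
  # Optionally drop repeated words, keeping the first occurrence of each
  a <- if (isTRUE(onlyUnique)) unique(x) else x
  # Keep only the words strictly longer than minLen characters
  paste(a[nchar(a) > minLen], collapse = " ")
}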

Compare the output of the following to see how it would work.

vapply(strsplit(vector, " "), myFun, character(1L))
vapply(strsplit(vector, " "), myFun, character(1L), onlyUnique = FALSE)
vapply(strsplit(vector, " "), myFun, character(1L), minLen = 0)

Keep only unique entries in a vector of strings in R

With base R, we can combine strsplit, unique, and paste0:

> sapply(strsplit(vec, ";"), function(x) paste0(unique(x), collapse = ";"))
[1] "US;DE" "AU;JP" "IN;SA;CN;RU" "PK;IQ"

Count unique words in a string using dplyr (R)

Yet another possible solution:

library(tidyverse)

# Returns the words that occur at least twice in `string`
data.frame(x = str_split(string, "\\s+", simplify = TRUE) %>% t()) %>%
  add_count(x) %>%
  filter(n >= 2) %>%
  distinct() %>%
  pull(x)

#> [1] "the" "home"

Return only the unique words

The problem is that wrapping strsplit(.) in c(.) does not change the fact that it is still a list, and unique will be operating at the list-level, not the word-level.

a <- "an apple is an apple"  # input inferred from the output shown below
c(strsplit(rep(a, 2), "\\s+"))
# [[1]]
# [1] "an" "apple" "is" "an" "apple"
# [[2]]
# [1] "an" "apple" "is" "an" "apple"
unique(c(strsplit(rep(a, 2), "\\s+")))
# [[1]]
# [1] "an" "apple" "is" "an" "apple"

Alternatives:

  1. If length(a) is always 1, then perhaps

    unique(strsplit(a, "\\s+")[[1]])
    # [1] "an" "apple" "is"

  2. If length(a) can be 2 or more and you want a list of unique words for each sentence, then

    a2 <- c("an apple is an apple", "a pear is a pear", "an orange is an orange")
    lapply(strsplit(a2, "\\s+"), unique)
    # [[1]]
    # [1] "an" "apple" "is"
    # [[2]]
    # [1] "a" "pear" "is"
    # [[3]]
    # [1] "an" "orange" "is"

    (Note: this always returns a list, regardless of the number of sentences in the input.)

  3. If length(a) can be 2 or more and you want the unique words across all sentences, then

    unique(unlist(strsplit(a2, "\\s+")))
    # [1] "an" "apple" "is" "a" "pear" "orange"

    (Note: this method also works well when length(a) is 1.)

How do I find unique words from vector and put them into another vector?

You could write something like this:

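#include <cstddef>
#include <string>
#include <vector>

// Returns the distinct words of vardi, in order of first appearance.
std::vector<std::string> findUniqueWords(const std::vector<std::string>& vardi) {
    std::vector<std::string> unikVardi;
    // Starting at 0 with an empty result also handles an empty input safely.
    for (std::size_t i = 0; i < vardi.size(); i++) {
        bool unique = true;
        for (std::size_t k = 0; k < unikVardi.size(); k++) {
            if (vardi[i] == unikVardi[k]) {
                unique = false;
                break;  // already stored, no need to keep scanning
            }
        }
        if (unique) unikVardi.push_back(vardi[i]);
    }
    return unikVardi;
}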
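
The nested loops above compare each word against every word kept so far, which can get slow for long inputs. As a possible alternative (not part of the original answer), a std::unordered_set can track the words already seen. The sketch below assumes that first-appearance order should be preserved; the function name uniqueWordsHashed is hypothetical.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper (name not from the original answer): returns the
// distinct words of `words`, in order of first appearance.
std::vector<std::string> uniqueWordsHashed(const std::vector<std::string>& words) {
    std::unordered_set<std::string> seen;  // words encountered so far
    std::vector<std::string> result;
    for (const std::string& w : words) {
        // insert() returns a pair; its bool is true only if w was not yet in the set
        if (seen.insert(w).second) {
            result.push_back(w);
        }
    }
    return result;
}
```

Calling uniqueWordsHashed({"an", "apple", "is", "an", "apple"}) should give the three words "an", "apple", "is". The trade-off is extra memory for the set in exchange for average constant-time lookups.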


