How Do I Keep Only Unique Words Within Each String in a Vector

Split it up (strsplit on spaces), use unique (in lapply), and paste it back together:

vapply(lapply(strsplit(vector, " "), unique), paste, character(1L), collapse = " ")
# [1] "hello i like to code" "coding is fun"        "fun"  

## OR
vapply(strsplit(vector, " "), function(x) paste(unique(x), collapse = " "), character(1L))
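Both one-liners split on a single space, so a double space or a leading space produces empty strings (""), and those pass through unique() and end up back in the output. Below is a minimal sketch of a reusable wrapper. It assumes that any run of whitespace should count as one separator. The name dedup_words and the sample input are hypothetical, since the original question's vector is not shown here.

```r
# Hypothetical sample input (not the question's data).
x <- c("hello i like to code hello", "coding  is fun coding", " fun fun")

# Keep the first occurrence of each word within every string.
# trimws() + "\\s+" treat leading spaces and runs of whitespace as one separator.
dedup_words <- function(strings) {
  vapply(strsplit(trimws(strings), "\\s+"),
         function(words) paste(unique(words), collapse = " "),
         character(1L))
}

dedup_words(x)
# returns c("hello i like to code", "coding is fun", "fun")
```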

Update based on comments

You can always write a custom function to pass to vapply. For instance, here's a function that takes a split string, drops words that have minLen or fewer characters, and makes de-duplication optional through the onlyUnique argument.

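myFun <- function(x, minLen = 3, onlyUnique = TRUE) {
  # Optionally de-duplicate the words before filtering
  a <- if (isTRUE(onlyUnique)) unique(x) else x
  # Keep only words longer than minLen characters and rejoin them with spaces
  paste(a[nchar(a) > minLen], collapse = " ")
}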

Compare the output of the following to see how it would work.

vapply(strsplit(vector, " "), myFun, character(1L))
vapply(strsplit(vector, " "), myFun, character(1L), onlyUnique = FALSE)
vapply(strsplit(vector, " "), myFun, character(1L), minLen = 0)
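To see what the three calls return, here is a self-contained run on a hypothetical three-string input (not the question's data). myFun is repeated from above so the block runs on its own. nchar(a) > minLen keeps only words strictly longer than minLen, so with the default of 3 a three-letter word such as "fun" is dropped and that string collapses to "".

```r
myFun <- function(x, minLen = 3, onlyUnique = TRUE) {
  a <- if (isTRUE(onlyUnique)) unique(x) else x
  paste(a[nchar(a) > minLen], collapse = " ")
}

x <- c("hello i like to code hello", "coding is fun", "fun fun")
words <- strsplit(x, " ")

# Defaults: unique words longer than 3 characters.
r_default <- vapply(words, myFun, character(1L))
# -> c("hello like code", "coding", "")

# Keep repeated words, still dropping words of 3 characters or fewer.
r_repeats <- vapply(words, myFun, character(1L), onlyUnique = FALSE)
# -> c("hello like code hello", "coding", "")

# minLen = 0 keeps every non-empty word: plain de-duplication.
r_all <- vapply(words, myFun, character(1L), minLen = 0)
# -> c("hello i like to code", "coding is fun", "fun")
```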

Keep only unique entries in a vector of strings in R

With base R, we can combine strsplit, unique, and paste0:

> sapply(strsplit(vec, ";"), function(x) paste0(unique(x), collapse = ";"))
[1] "US;DE"       "AU;JP"       "IN;SA;CN;RU" "PK;IQ"
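The entries might have stray spaces around the separator. The question's data is not shown, so this is an assumption. In that case you can split on a regular expression that absorbs the whitespace, so "US" and " US" are not treated as different entries. Below is a minimal sketch with a hypothetical helper, unique_entries. It uses vapply instead of sapply so that a zero-length input still returns a character vector, whereas sapply would return list().

```r
# Hypothetical input with inconsistent spacing around ";".
vec <- c("US;DE;US", "AU; JP ;AU", "IN;SA;CN;RU;IN")

# `sep` is pasted into a regex, so it must not be a regex metacharacter
# (";" is fine; "|" or "." would need escaping).
unique_entries <- function(x, sep = ";") {
  pattern <- paste0("\\s*", sep, "\\s*")
  vapply(strsplit(trimws(x), pattern),
         function(parts) paste0(unique(parts), collapse = sep),
         character(1L))
}

unique_entries(vec)
# returns c("US;DE", "AU;JP", "IN;SA;CN;RU")
```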

Count unique words in a string using dplyr (R)

Yet another possible solution, which returns the words that appear at least twice:

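library(tidyverse)

# One word per row in column x
data.frame(x = str_split(string, "\\s+", simplify = TRUE) %>% t) %>% 
  add_count(x) %>%      # n = number of times each word occurs
  filter(n >= 2) %>%    # keep words that occur at least twice
  distinct %>%          # drop the repeated rows
  pull(x)  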

#> [1] "the"  "home"
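For comparison, here is a base R sketch of the same idea (keep words whose count is at least 2) without tidyverse. The names repeated_words and min_count and the sample sentence are hypothetical. Filtering unique(words) keeps the result in order of first appearance, whereas the names of a table() come back sorted.

```r
repeated_words <- function(string, min_count = 2L) {
  # Trim first so a leading space does not yield an empty "" word.
  words <- strsplit(trimws(string), "\\s+")[[1]]
  counts <- table(words)
  frequent <- names(counts)[counts >= min_count]
  # unique() preserves first-appearance order; table() names are sorted.
  distinct_words <- unique(words)
  distinct_words[distinct_words %in% frequent]
}

repeated_words("the cat sat on the mat at home home")
# returns c("the", "home")
```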

Return only the unique words

The problem is that wrapping strsplit(.) in c(.) does not change the fact that the result is still a list, so unique operates on the list elements (whole character vectors), not on the individual words.

c(strsplit(rep(a, 2), "\\s+"))
# [[1]]
# [1] "an"    "apple" "is"    "an"    "apple"
# [[2]]
# [1] "an"    "apple" "is"    "an"    "apple"
unique(c(strsplit(rep(a, 2), "\\s+")))
# [[1]]
# [1] "an"    "apple" "is"    "an"    "apple"

Alternatives:

If length(a) is always 1, then perhaps

unique(strsplit(a, "\\s+")[[1]])
# [1] "an"    "apple" "is"

If length(a) can be 2 or more and you want a list of unique words for each sentence, then

a2 <- c("an apple is an apple", "a pear is a pear", "an orange is an orange")
lapply(strsplit(a2, "\\s+"), unique)
# [[1]]
# [1] "an"    "apple" "is"   
# [[2]]
# [1] "a"    "pear" "is"  
# [[3]]
# [1] "an"     "orange" "is"

(Note: this always returns a list, regardless of the number of sentences in the input.)
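The result is an unnamed list, so it can help to label each element with its sentence. lengths() then gives the number of distinct words per sentence as an integer vector. The sketch below reuses the a2 example above, redefined so the block runs on its own.

```r
a2 <- c("an apple is an apple", "a pear is a pear", "an orange is an orange")

per_sentence <- lapply(strsplit(a2, "\\s+"), unique)
# Label each element with the sentence it came from.
names(per_sentence) <- a2

# Number of distinct words in each sentence (names are carried over).
n_unique <- lengths(per_sentence)
```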

If length(a) can be 2 or more and you want the unique words across all sentences, then
```
unique(unlist(strsplit(a2, "\\s+")))
# [1] "an"     "apple"  "is"     "a"      "pear"   "orange"
```
(Note: this method also works well when length(a) is 1.)
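unique() compares strings exactly, so "An" and "an", or "apple" and "apple.", count as different words. The sentences might contain capitals or punctuation. That is an assumption, since the example data above has neither. If so, one option is to lower-case the text and split on runs of non-alphanumeric characters first. Below is a minimal sketch with a hypothetical unique_words helper.

```r
unique_words <- function(sentences) {
  # Lower-case, then split on anything that is not a letter, digit, or apostrophe.
  words <- unlist(strsplit(tolower(sentences), "[^[:alnum:]']+"))
  # A leading separator produces an empty string; drop those.
  unique(words[nzchar(words)])
}

unique_words(c("An apple is an apple.", "A pear is a pear!"))
# returns c("an", "apple", "is", "a", "pear")
```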

How do I find unique words in a vector and put them into another vector?

You could write something like this:

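#include <string>
#include <vector>

// Returns the distinct words of vardi, in order of first appearance.
// Safe for an empty input (no vardi[0] access).
std::vector<std::string> findUniqueWords(const std::vector<std::string>& vardi) {
    std::vector<std::string> unikVardi;
    for (const std::string& word : vardi) {
        bool unique = true;
        // Check whether this word has already been collected
        for (const std::string& seen : unikVardi) {
            if (word == seen) {
                unique = false;
                break;
            }
        }
        if (unique) unikVardi.push_back(word);
    }
    return unikVardi;
}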
