How do I keep only unique words within each string in a vector
Split it up (strsplit on spaces), use unique (in lapply), and paste it back together:
vapply(lapply(strsplit(vector, " "), unique), paste, character(1L), collapse = " ")
# [1] "hello i like to code" "coding is fun" "fun"
## OR
vapply(strsplit(vector, " "), function(x) paste(unique(x), collapse = " "), character(1L))
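To try the approach above end to end, here is a minimal self-contained sketch. The input `words_vec` is a hypothetical example (the asker's data is not shown), and `dedupe_words` is a name introduced only for illustration. It splits on runs of whitespace rather than a single space, which is an assumption that avoids empty tokens when words are separated by more than one space.

```r
# Hypothetical input; the original question's data is not shown here.
words_vec <- c("hello i like to code hello", "coding is fun fun", "fun fun fun")

# dedupe_words is an illustrative helper name, not part of base R.
dedupe_words <- function(x) {
  # Split each string on runs of whitespace (assumption: whitespace-delimited words).
  tokens <- strsplit(x, "\\s+")
  # Keep the first occurrence of each word, then rebuild one string per element.
  vapply(tokens, function(w) paste(unique(w), collapse = " "), character(1L))
}

dedupe_words(words_vec)
# Values returned: "hello i like to code", "coding is fun", "fun"
```

Because vapply is told to expect character(1L), it fails loudly if any element does not collapse to a single string, which sapply would not do.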
Update based on comments
You can always write a custom function to use with vapply. For instance, here's a function that takes a split string, drops words that are not longer than a given number of characters (minLen), and makes the "unique" step a user choice.
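myFun <- function(x, minLen = 3, onlyUnique = TRUE) {
  # Optionally drop repeated words, keeping the first occurrence of each
  a <- if (isTRUE(onlyUnique)) unique(x) else x
  # Keep only words strictly longer than minLen characters
  paste(a[nchar(a) > minLen], collapse = " ")
}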
Compare the output of the following to see how it would work.
vapply(strsplit(vector, " "), myFun, character(1L))
vapply(strsplit(vector, " "), myFun, character(1L), onlyUnique = FALSE)
vapply(strsplit(vector, " "), myFun, character(1L), minLen = 0)
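As a worked illustration of the three calls, the sketch below repeats the function definition so it runs on its own and applies it to a hypothetical input, `sample_vec`, which is not the asker's data.

```r
# Same definition as in the answer, repeated so this block is self-contained.
myFun <- function(x, minLen = 3, onlyUnique = TRUE) {
  a <- if (isTRUE(onlyUnique)) unique(x) else x
  paste(a[nchar(a) > minLen], collapse = " ")
}

# Hypothetical input, chosen to contain both repeats and short words.
sample_vec <- c("hello i like to code hello", "coding is fun fun")
parts <- strsplit(sample_vec, " ")

# Defaults: unique words longer than 3 characters.
res_default <- vapply(parts, myFun, character(1L))
# Values: "hello like code", "coding"

# Repeats are kept, short words are still dropped.
res_repeats <- vapply(parts, myFun, character(1L), onlyUnique = FALSE)
# Values: "hello like code hello", "coding"

# minLen = 0 keeps every non-empty word, so only de-duplication happens.
res_all <- vapply(parts, myFun, character(1L), minLen = 0)
# Values: "hello i like to code", "coding is fun"
```

Note that the comparison is `nchar(a) > minLen`, so with the default of 3 a three-letter word such as "fun" is dropped.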
Keep only unique entries in a vector of strings in R
With base R, we can combine strsplit + unique + paste0 (with collapse) to do this:
> sapply(strsplit(vec, ";"), function(x) paste0(unique(x), collapse = ";"))
[1] "US;DE" "AU;JP" "IN;SA;CN;RU" "PK;IQ"
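For a runnable version, the sketch below uses a hypothetical `vec` chosen to give the values printed above, since the asker's input is not shown. It swaps sapply for vapply and passes `fixed = TRUE` to strsplit; both are assumptions made for type safety and literal matching, not part of the original answer.

```r
# Hypothetical input with repeated country codes.
vec <- c("US;DE;US", "AU;JP;AU;JP", "IN;SA;CN;IN;RU", "PK;IQ;PK")

# fixed = TRUE treats ";" as a literal separator rather than a regular expression.
codes <- strsplit(vec, ";", fixed = TRUE)

# vapply guarantees one character string per input element.
res <- vapply(codes, function(x) paste(unique(x), collapse = ";"), character(1L))
res
# Values: "US;DE", "AU;JP", "IN;SA;CN;RU", "PK;IQ"
```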
Count unique words in a string using dplyr (R)
Yet another possible solution:
library(tidyverse)
# Split on whitespace, count each word, and keep the words that occur at least twice
data.frame(x = str_split(string, "\\s+", simplify = TRUE) %>% t) %>%
  add_count(x) %>%
  filter(n >= 2) %>%
  distinct %>%
  pull(x)
#> [1] "the" "home"
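If loading the tidyverse is not desirable, the same idea (words that occur at least twice) can be sketched in base R. This is a swapped-in technique, not the answer's method, and `string` below is a hypothetical input rather than the asker's data.

```r
# Hypothetical input in which "the" and "home" each appear twice.
string <- "the cat left the home and came back home"

# Split the single string into words on runs of whitespace.
words <- strsplit(string, "\\s+")[[1]]

# duplicated() flags second and later occurrences; %in% then selects every
# occurrence of those words, and unique() keeps them in order of first appearance.
repeated <- unique(words[words %in% words[duplicated(words)]])
repeated
# Values: "the", "home"
```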
Return only the unique words
The problem is that wrapping strsplit(.) in c(.) does not change the fact that it is still a list, and unique will be operating at the list-level, not the word-level.
c(strsplit(rep(a, 2), "\\s+"))
# [[1]]
# [1] "an" "apple" "is" "an" "apple"
# [[2]]
# [1] "an" "apple" "is" "an" "apple"
unique(c(strsplit(rep(a, 2), "\\s+")))
# [[1]]
# [1] "an" "apple" "is" "an" "apple"
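The list-level behaviour can be checked directly. In this sketch the value of `a` is an assumption inferred from the printed output above, since its definition is not shown.

```r
# Assumed value of `a`, inferred from the output shown in the answer.
a <- "an apple is an apple"

wrapped <- c(strsplit(rep(a, 2), "\\s+"))

is_still_list <- is.list(wrapped)       # TRUE: c() does not flatten a list
n_after_unique <- length(unique(wrapped))  # 1: two identical list elements collapse into one

# Flattening first makes unique() operate on individual words.
word_level <- unique(unlist(wrapped))
word_level
# Values: "an", "apple", "is"
```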
Alternatives:
If length(a) is always 1, then perhaps
unique(strsplit(a, "\\s+")[[1]])
# [1] "an" "apple" "is"
If length(a) can be 2 or more and you want a list of unique words for each sentence, then
a2 <- c("an apple is an apple", "a pear is a pear", "an orange is an orange")
lapply(strsplit(a2, "\\s+"), unique)
# [[1]]
# [1] "an" "apple" "is"
# [[2]]
# [1] "a" "pear" "is"
# [[3]]
# [1] "an" "orange" "is"
(Note: this always returns a list, regardless of the number of sentences in the input.)
If length(a) can be 2 or more and you want the unique words across all sentences, then
unique(unlist(strsplit(a2, "\\s+")))
# [1] "an" "apple" "is" "a" "pear" "orange"
(Note: this method also works well when length(a) is 1.)
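The per-sentence and across-sentences cases can be folded into one helper. This is only a sketch: `unique_words` and its `by_sentence` argument are hypothetical names introduced for illustration, and the input is a shortened example.

```r
# unique_words and by_sentence are hypothetical names, not an existing API.
unique_words <- function(x, by_sentence = TRUE) {
  tokens <- strsplit(x, "\\s+")
  if (by_sentence) {
    lapply(tokens, unique)    # always a list, one element per sentence
  } else {
    unique(unlist(tokens))    # a single character vector across all sentences
  }
}

a2 <- c("an apple is an apple", "a pear is a pear")

per_sentence <- unique_words(a2)
across_all <- unique_words(a2, by_sentence = FALSE)
across_all
# Values: "an", "apple", "is", "a", "pear"
```

Returning a list in one mode and a vector in the other mirrors the answer's alternatives; callers that need a stable return type may prefer two separate functions.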
How do I find unique words from vector and put them into another vector?
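You could write something like this:
#include <string>
#include <vector>
using namespace std;

// Returns the distinct words of vardi in order of first appearance.
// Starting with an empty result also makes an empty input safe.
vector<string> findUniqueWords(const vector<string>& vardi) {
    vector<string> unikVardi;
    for (size_t i = 0; i < vardi.size(); i++) {
        bool unique = true;
        for (size_t k = 0; k < unikVardi.size(); k++) {
            if (vardi[i] == unikVardi[k]) {
                unique = false;
                break;  // already recorded, no need to keep scanning
            }
        }
        if (unique) unikVardi.push_back(vardi[i]);
    }
    return unikVardi;
}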