Extracting Unique Numbers from String in R

Extracting unique numbers from string in R

For the second desired output (the unique individual digits), you can use gsub to remove everything from each string that's not a digit, then split what remains into single characters:

unique(as.numeric(unlist(strsplit(gsub("[^0-9]", "", unlist(ll)), ""))))
# [1] 7 6 1 5 2

For the first desired output (the unique whole numbers), use strsplit again, this time splitting on runs of non-digit characters:

unique(na.omit(as.numeric(unlist(strsplit(unlist(ll), "[^0-9]+")))))
# [1]   7 667  11   5   2

PS: Don't name your variable list, since that masks the built-in function list(). I've named your data ll.
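
As a minimal sketch, both one-liners can be tried on a small made-up list. The contents of ll below are an assumption (the original data is not shown here); they were chosen only to illustrate the difference between unique digits and unique whole numbers.

```r
# Hypothetical input; the original ll is not shown
ll <- list("ab7", "c667d", "e11 f5", "g2 h7")

# Unique digits: drop every non-digit, then split into single characters
digits <- unique(as.numeric(unlist(strsplit(gsub("[^0-9]", "", unlist(ll)), ""))))
digits
# [1] 7 6 1 5 2

# Unique whole numbers: split on runs of non-digits; the empty pieces
# become NA in as.numeric() and are dropped by na.omit()
numbers <- unique(na.omit(as.numeric(unlist(strsplit(unlist(ll), "[^0-9]+")))))
numbers
# [1]   7 667  11   5   2
```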

Extracting numbers from vectors of strings

How about one of these:

# capture the leading run of digits and replace the whole string with it
as.numeric(gsub("([0-9]+).*$", "\\1", years))

# remove the literal suffix " years old"
as.numeric(gsub(" years old", "", years))

# split on spaces and take the first element of each result
as.numeric(sapply(strsplit(years, " "), "[[", 1))
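
For illustration, here is a sketch that runs the three approaches on a made-up years vector. The sample values are an assumption (the original vector is not shown); with input of this shape, all three return the same numbers.

```r
# Hypothetical input; the original years vector is not shown
years <- c("20 years old", "1 years old", "35 years old")

by_capture <- as.numeric(gsub("([0-9]+).*$", "\\1", years))
by_removal <- as.numeric(gsub(" years old", "", years))
by_split   <- as.numeric(sapply(strsplit(years, " "), "[[", 1))

by_capture
# [1] 20  1 35
identical(by_capture, by_removal) && identical(by_capture, by_split)
# [1] TRUE
```

The second approach depends on the exact suffix, so the first or third is probably safer if the wording after the number varies.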

How to search for and extract unique values from one column in another column?

I think this works for you:

library(dplyr)

mutate(df, Col_C = stringr::str_extract(
  Col_A,
  paste0("\\b(", paste0(unique(Col_B), collapse = "|"), ")\\b")))
#                Col_A  Col_B  Col_C
# 1   blue shovel 1024   blue   blue
# 2    red shovel 1022    red    red
# 3  green bucket 3021  green  green
# 4    green rake 3021   blue  green
# 5 yellow shovel 1023 yellow yellow

Breakdown:

paste0(unique(Col_B), collapse="|") takes the words in Col_B, de-duplicates them, and concatenates them with | symbols; that is, c("blue","red","green") --> "blue|red|green". In regex, the | symbol is an "OR" operator.
\\b( and )\\b add word boundaries, meaning that there is no word-like character immediately before (first) or after (second) the pattern. Wrapping the words this way prevents a partial match such as blu inside blue (in case that ever happens); it does not change anything for this data, but it makes the pattern more defensive and specific. The parens add grouping, which is more evident in the next bullet.
With all of that, our overall pattern looks something like "\\b(blue|red|green)\\b" (abbreviated). This translates into "find blue or red or green such that there is a word-boundary on both ends of whichever one(s) you find".
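
The pattern construction described in the breakdown can be sketched in base R alone. The vectors words and text below are made up for illustration, and regexpr()/regmatches() stand in for stringr::str_extract here, so this is an approximation of the approach rather than the exact code above.

```r
# Hypothetical inputs, not the original data
words <- c("blue", "red", "green", "blue")
text  <- c("blue shovel 1024", "bluebird feeder", "green rake 3021")

# De-duplicate, join with "|", and wrap in word boundaries
pattern <- paste0("\\b(", paste0(unique(words), collapse = "|"), ")\\b")
pattern
# [1] "\\b(blue|red|green)\\b"

# First match per string, NA where nothing matches
m <- regexpr(pattern, text, perl = TRUE)
hit <- as.integer(m) > 0
res <- rep(NA_character_, length(text))
res[hit] <- regmatches(text, m)
res
# [1] "blue"  NA      "green"
```

The second element is NA because the b that follows blue in bluebird is a word character, so there is no word boundary after blue.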

Trying to extract/count the unique characters in a string (of class character)

In base R you can do:

df$char_count <- sapply(strsplit(df$Text, ""), function(x) length(unique(x)))

df
#>       Text char_count
#> 1   banana          3
#> 2 banana12          5
#> 3  Ace@343          6

Data

df <- data.frame(Text = c("banana", "banana12", "Ace@343"))
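
To see what the one-liner does, here is a short sketch that unpacks it for a single string. The variable names are only for illustration.

```r
# Splitting on the empty string yields one element per character
chars <- strsplit("banana12", "")[[1]]
chars
# [1] "b" "a" "n" "a" "n" "a" "1" "2"

# unique() keeps the first occurrence of each character
distinct <- unique(chars)
distinct
# [1] "b" "a" "n" "1" "2"

n_distinct_chars <- length(distinct)
n_distinct_chars
# [1] 5
```

Note that the comparison is case-sensitive, so "A" and "a" would count as two different characters.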

Created on 2021-11-12 by the reprex package (v2.0.0)

Extract unique numbers from a list with multiple items per line using gsub()?

You can use

v <- list(c("12", "1"), c("13", "1"), c("12", "3"))
unique(sapply(v, "[[", 1))
# => [1] "12" "13"

Note:

sapply(v, "[[", 1) - extracts the first element of each list item
unique - keeps only the distinct values.
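
As a possible variation, vapply() can replace sapply() when you want R to check that every extracted item is a single string. This is a sketch on the same v, and the conversion to numeric at the end is an optional extra step, not part of the answer above.

```r
v <- list(c("12", "1"), c("13", "1"), c("12", "3"))

# vapply() errors if any result is not a length-1 character vector
firsts <- vapply(v, `[[`, character(1), 1)
firsts
# [1] "12" "13" "12"

first_unique <- unique(firsts)
first_unique
# [1] "12" "13"

# Optional: convert to numbers if that is what you need downstream
first_unique_num <- as.numeric(first_unique)
```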

How to extract numbers from text?

We can use str_extract_all, specifying the pattern as one or more digits ([0-9]+). For a single input string the output is a list of length 1, so extract the vector with [[ and convert it to numeric.

library(stringr)
as.numeric(str_extract_all(string, "[0-9]+")[[1]])
#[1] 2016   81   64 2017   18   36

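If we are using strsplit, split on runs of non-digit characters. The [-1] drops the empty first element that strsplit produces when the string starts with a non-digit: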

as.numeric(strsplit(string, "\\D+")[[1]][-1])
#[1] 2016   81   64 2017   18   36
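
If stringr is not available, the same extraction can be sketched in base R with gregexpr() and regmatches(). The value of string below is made up (the original string is not shown), chosen so that it starts with a non-digit and contains the same six numbers.

```r
# Hypothetical input; the original string is not shown
string <- "Report 2016: 81 and 64; report 2017: 18 and 36"

# gregexpr() finds every run of digits, regmatches() pulls them out
nums <- as.numeric(regmatches(string, gregexpr("[0-9]+", string))[[1]])
nums
# [1] 2016   81   64 2017   18   36

# strsplit() on non-digits leaves an empty first element here
pieces <- strsplit(string, "\\D+")[[1]]
pieces[1]
# [1] ""
nums_split <- as.numeric(pieces[-1])
```

The [-1] in the strsplit version is only correct when the string begins with a non-digit; filtering with pieces[pieces != ""] would be a more general alternative.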

Extracting unique partial elements from vector

stringr also has the str_extract function, which can be used to extract substrings that match a regex pattern. With a positive lookbehind for / and a positive lookahead for _, you can achieve your aim.

Beginning with @Andrie's x:

str_extract(x, '(?<=/)\\d+(?=_)')

# [1] NA     "4101" "4101" "4101" "4101" "4101" "4101" "4101" "4101" "4101"

The pattern above matches one or more digits (i.e. \\d+) that are preceded by a forward slash and followed by an underscore. Versions of stringr before 1.0.0 required wrapping the pattern in perl() for the lookarounds; since 1.0.0, perl() is deprecated and the default regex engine supports lookarounds directly.
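
The same lookaround idea can be sketched in base R, where perl = TRUE is what enables lookbehind and lookahead. The vector x below is made up for illustration, since @Andrie's original x is not reproduced here.

```r
# Hypothetical input; the original x is not shown
x <- c("home", "data/4101_a.csv", "data/4101_b.csv")

# Digits preceded by "/" and followed by "_"
m <- regexpr("(?<=/)\\d+(?=_)", x, perl = TRUE)

# Fill matches into a vector of NAs so non-matching elements stay NA
ids <- rep(NA_character_, length(x))
ids[as.integer(m) > 0] <- regmatches(x, m)
ids
# [1] NA     "4101" "4101"

# Unique partial elements, ignoring non-matches
unique_ids <- unique(ids[!is.na(ids)])
unique_ids
# [1] "4101"
```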
