Extracting unique numbers from string in R
For the second answer, you can use gsub to remove everything from the string that's not a digit, then split the result into single characters as follows:
unique(as.numeric(unlist(strsplit(gsub("[^0-9]", "", unlist(ll)), ""))))
# [1] 7 6 1 5 2
For the first answer, you can similarly use strsplit, this time splitting on runs of non-digit characters:
unique(na.omit(as.numeric(unlist(strsplit(unlist(ll), "[^0-9]+")))))
# [1] 7 667 11 5 2
PS: don't name your variable list (as there's a built-in function list). I've named your data ll.
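The original data is not shown here, so the following is only a sketch with a hypothetical ll chosen to reproduce the two outputs above. It filters empty pieces with nzchar() instead of na.omit(), which is a small variation on the answer's code, not the answer's exact method.

```r
# Hypothetical input; the question's real data is not shown in this answer.
ll <- list("a7b667", "x11y5z2")
flat <- unlist(ll)

# Unique digits: strip every non-digit, then split into single characters.
digits <- unique(as.numeric(unlist(strsplit(gsub("[^0-9]", "", flat), ""))))

# Unique whole numbers: split on runs of non-digits. A string that starts
# with a non-digit yields an empty first piece, which is dropped here.
pieces <- unlist(strsplit(flat, "[^0-9]+"))
numbers <- unique(as.numeric(pieces[nzchar(pieces)]))

print(digits)   # 7 6 1 5 2
print(numbers)  # 7 667 11 5 2
```

The difference between the two results comes only from where the string is split: between every character, or only at non-digit runs.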
Extracting numbers from vectors of strings
How about
# capture the first run of digits and drop everything after it
as.numeric(gsub("([0-9]+).*$", "\\1", years))
or
# just remove the " years old" suffix
as.numeric(gsub(" years old", "", years))
or
# split on spaces and take the first element of each result
as.numeric(sapply(strsplit(years, " "), "[[", 1))
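To compare the three options side by side, here is a small sketch. The years vector is a hypothetical stand-in, since the question's data is not reproduced here, and the anchored pattern at the end is an extra variant rather than part of the answer.

```r
# Hypothetical input in the "<number> years old" shape the answer assumes.
years <- c("45 years old", "7 years old", "102 years old")

by_capture <- as.numeric(gsub("([0-9]+).*$", "\\1", years))
by_removal <- as.numeric(gsub(" years old", "", years))
by_split   <- as.numeric(sapply(strsplit(years, " "), "[[", 1))

# All three agree when every string starts with the number.
stopifnot(identical(by_capture, by_removal), identical(by_removal, by_split))
print(by_capture)  # 45 7 102

# If text can precede the number, the first pattern keeps that text.
# Anchoring at the start and skipping leading non-digits avoids this.
anchored <- as.numeric(gsub("^[^0-9]*([0-9]+).*$", "\\1", "age 45 years old"))
print(anchored)  # 45
```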
How to search for and extract unique values from one column in another column?
I think this works for you:
library(dplyr)
mutate(df, Col_C = stringr::str_extract(
  Col_A,
  paste0("\\b(", paste0(unique(Col_B), collapse = "|"), ")\\b")))
#                Col_A  Col_B  Col_C
# 1   blue shovel 1024   blue   blue
# 2    red shovel 1022    red    red
# 3  green bucket 3021  green  green
# 4    green rake 3021   blue  green
# 5 yellow shovel 1023 yellow yellow
Breakdown:
- paste0(unique(Col_B), collapse="|") takes the words in Col_B, de-duplicates them, and concatenates them all together with | symbols; that is, c("blue","red","green") --> "blue|red|green". In regex, the | symbol is an "OR" operator.
- \\b( and )\\b are word boundaries, meaning that there isn't a word-like character immediately before (first) or after (second) the pattern; by adding these around the words, we prevent a partial match of blu on blue (in case that ever happens); while it is not apparent that this changes anything here, it's a more defensive/specific pattern. The parens add grouping, more evident in the next bullet.
- With all of that, our overall pattern looks something like "\\b(blue|red|green)\\b" (abbreviated). This translates into "find blue or red or green such that there is a word boundary on both ends of whichever one(s) you find".
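The pattern construction described in the breakdown can be checked in isolation. This is a base-R sketch with hypothetical col_a and col_b vectors; it uses regexpr and regmatches in place of stringr::str_extract so that it runs without extra packages.

```r
# Hypothetical stand-ins for the two data frame columns.
col_a <- c("blue shovel 1024", "green rake 3021", "bluebird 1")
col_b <- c("blue", "red", "green", "blue")

# De-duplicate, join with "|", and wrap in word boundaries plus a group.
pattern <- paste0("\\b(", paste0(unique(col_b), collapse = "|"), ")\\b")
print(pattern)  # \\b(blue|red|green)\\b when printed as an R string

# First match per element, NA where nothing matches.
m <- regexpr(pattern, col_a, perl = TRUE)
out <- rep(NA_character_, length(col_a))
out[m > 0] <- regmatches(col_a, m)
print(out)  # "blue" "green" NA
```

The third element shows the effect of the word boundaries: "bluebird" contains "blue", but no boundary follows it, so nothing is extracted.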
Trying to extract/count the unique characters in a string (of class character)
In base R you can do:
df$char_count <- sapply(strsplit(df$Text, ""), function(x) length(unique(x)))
df
#>       Text char_count
#> 1   banana          3
#> 2 banana12          5
#> 3  Ace@343          6
Data
df <- data.frame(Text = c("banana", "banana12", "Ace@343"))
Created on 2021-11-12 by the reprex package (v2.0.0)
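The question title also mentions extracting the unique characters, not only counting them. A possible extension is sketched below; the unique_chars column name and the choice to paste the characters back into one string are assumptions made for illustration.

```r
df <- data.frame(Text = c("banana", "banana12", "Ace@343"),
                 stringsAsFactors = FALSE)

# Split each string into characters and keep the first occurrence of each.
chars <- lapply(strsplit(df$Text, ""), unique)

df$unique_chars <- sapply(chars, paste, collapse = "")  # hypothetical column
df$char_count <- lengths(chars)
print(df)
```

The comparison is case-sensitive, so "A" and "a" count as different characters; applying tolower() to the text first would merge them.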
Extract unique numbers from a list with multiple items per line using gsub()?
You can use
v <- list(c("12", "1"), c("13", "1"), c("12", "3"))
unique(sapply(v, "[[", 1))
# => [1] "12" "13"
Note:
- sapply(v, "[[", 1) gets the first item of each list element
- unique leaves only the unique values.
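Since the question asks about gsub(), here is a sketch of a regex-based variant. It assumes each entry is a single comma-separated string, which is a hypothetical input shape and not the list structure used above.

```r
# Hypothetical input: one string per line, items separated by commas.
s <- c("12,1", "13,1", "12,3")

# Remove everything from the first comma onward, then de-duplicate.
firsts <- unique(sub(",.*$", "", s))
print(firsts)  # "12" "13"
```

sub() is enough here because only one replacement per string is needed; gsub() with the same pattern would give the same result.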
How to extract numbers from text?
We can use str_extract_all, specifying the pattern as one or more digits ([0-9]+). The output will be a list of length 1, so extract the vector with [[ and convert it to numeric.
library(stringr)
as.numeric(str_extract_all(string, "[0-9]+")[[1]])
#[1] 2016 81 64 2017 18 36
If we are using strsplit, split on runs of non-digit characters; [-1] drops the first piece, which is an empty string when the input starts with a non-digit:
as.numeric(strsplit(string, "\\D+")[[1]][-1])
#[1] 2016 81 64 2017 18 36
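The question's string is not shown here, so the sketch below uses a hypothetical one that produces the same numbers. It uses base R's gregexpr and regmatches as a package-free counterpart to str_extract_all, and shows why the [-1] in the strsplit version depends on how the string starts.

```r
# Hypothetical input that starts with a non-digit.
string <- "In 2016: 81 and 64; in 2017: 18 and 36"

# Base-R extraction of every run of digits.
nums <- as.numeric(regmatches(string, gregexpr("[0-9]+", string))[[1]])
print(nums)  # 2016 81 64 2017 18 36

# strsplit leaves an empty first piece when the string starts with a non-digit.
parts <- strsplit(string, "\\D+")[[1]]
print(parts[1] == "")  # TRUE

# Dropping empty pieces by content is safe for either kind of input.
safe <- function(s) {
  p <- strsplit(s, "\\D+")[[1]]
  as.numeric(p[nzchar(p)])
}
print(safe("2016 then 81"))  # 2016 81
```

With an input such as "2016 then 81", a fixed [-1] would discard 2016, which is why filtering on nzchar() is the more defensive choice.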
Extracting unique partial elements from vector
stringr also has the str_extract function, which can be used to extract substrings that match a regex pattern. With a positive lookbehind for / and a positive lookahead for _, you can achieve your aim.
Beginning with @Andrie's x:
str_extract(x, perl('(?<=/)\\d+(?=_)'))
# [1] NA "4101" "4101" "4101" "4101" "4101" "4101" "4101" "4101" "4101"
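The pattern above matches one or more numerals (i.e. \\d+) that are preceded by a forward slash and followed by an underscore. Wrapping the pattern in perl() was required for the lookarounds in the stringr release this answer was written for; since stringr 1.0.0, patterns are handled by the ICU regex engine, which supports lookarounds directly, so the wrapper should no longer be needed.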
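The same lookaround pattern also works in base R when perl = TRUE is set. The x vector below is a hypothetical stand-in, because @Andrie's data is not reproduced here.

```r
# Hypothetical file paths; the first one has no "/<digits>_" part.
x <- c("readme.txt", "data/4101_a.csv", "data/4101_b.csv")

# Digits preceded by "/" and followed by "_"; lookarounds need perl = TRUE.
m <- regexpr("(?<=/)\\d+(?=_)", x, perl = TRUE)

# Keep one result per input, with NA where there is no match.
ids <- rep(NA_character_, length(x))
ids[m > 0] <- regmatches(x, m)
print(ids)  # NA "4101" "4101"

# The unique partial elements, ignoring non-matches.
uniq_ids <- unique(ids[!is.na(ids)])
print(uniq_ids)  # "4101"
```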