Extracting decimal numbers from a string
This approach makes the decimal point and decimal fraction optional and allows multiple numbers to be extracted:
str <- " test 3.1 test 5"
# \\.? allows at most one decimal point; [[:digit:]]* makes the fraction optional
as.numeric(unlist(regmatches(str,
  gregexpr("[[:digit:]]+\\.?[[:digit:]]*", str))))
#[1] 3.1 5.0
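The concern about negative numbers can be addressed with an optional leading minus sign. The (?>-)* below is a Perl-style atomic group (not a look-ahead), which is why perl=TRUE is needed; a plain -? would also work:
str <- " test -4.5 3.1 test 5"
as.numeric(unlist(regmatches(str,
  gregexpr("(?>-)*[[:digit:]]+\\.?[[:digit:]]*", str, perl=TRUE))))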
#[1] -4.5 3.1 5.0
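For comparison, here is a minimal sketch of the same idea in Python rather than R. It assumes a plain -? for the optional sign and groups the decimal point with its digits so that a fraction is matched only as a whole. The function name extract_numbers is hypothetical.

```python
import re

def extract_numbers(text):
    """Return every integer or decimal in text as a float, keeping a leading minus."""
    # -?          optional minus sign
    # \d+         one or more digits
    # (?:\.\d+)?  optional fraction: a dot followed by one or more digits
    return [float(m) for m in re.findall(r"-?\d+(?:\.\d+)?", text)]

print(extract_numbers(" test -4.5 3.1 test 5"))
```

Note that a hyphen directly before a digit is always read as a sign here, so a range such as "3-4" would come back as 3.0 and -4.0.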
str_extract_all with decimal numbers
You can use:
stringr::str_extract_all(DF1$Temperature, '\\d+([.,]\\d+)?')
#[[1]]
#[1] "37.8" "37.6"
#[[2]]
#[1] "37,8"
#[[3]]
#[1] "110" "38"
where:
- \\d+ matches one or more digits, followed by
- an optional group ([.,]\\d+)?, that is, a dot or comma ([.,]) followed by one or more digits (\\d+).
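The same pattern can be tried outside R. The sketch below is in Python and uses made-up input, since the contents of DF1$Temperature are not shown here. The helper names are hypothetical. One assumption worth flagging: the group is written as non-capturing, (?:...), because Python's re.findall returns only the group contents when a pattern contains a capturing group.

```python
import re

def extract_decimal_strings(text):
    """Find numbers whose decimal separator is either a dot or a comma."""
    # (?:[.,]\d+)? is the optional "separator plus digits" part.
    return re.findall(r"\d+(?:[.,]\d+)?", text)

def to_float(number_text):
    """Normalise a comma decimal separator to a dot before converting."""
    return float(number_text.replace(",", "."))

found = extract_decimal_strings("37.8 then 37,6 at 110")
print(found)
print([to_float(s) for s in found])
```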
Extract decimal Number from string in C#
Try to approach the problem this way. A decimal number has the following features:
- it starts with one or more digits (\d+)
- after that, there can be one or 0 dots (\.?)
- if a dot is present, one or more digits should also follow (\d+)
Since the last two features are related, we can put them in a group and add a ? quantifier: (\.\d+)?
So now we have the whole regex: \d+(\.\d+)?
If you want to match decimal numbers like .01 (without the 0 at the front), you can use | to mean "or" and add another case, (\.\d+). Basically: (\d+(\.\d+)?)|(\.\d+)
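The final pattern is not specific to C#. As a quick check, here is a sketch in Python (an assumption, chosen only for illustration) that applies the same alternation. The name find_decimals is hypothetical, and the groups are made non-capturing so that each match is returned whole.

```python
import re

# First alternative: digits with an optional fraction.
# Second alternative: a fraction with no leading digits, such as .01
DECIMAL = re.compile(r"\d+(?:\.\d+)?|\.\d+")

def find_decimals(text):
    """Return each decimal number found in text, as a string."""
    return [m.group(0) for m in DECIMAL.finditer(text)]

print(find_decimals("a 12 b 3.14 c .01 d 7."))
```

A trailing dot with no digits after it (the "7." above) is left out of the match, because the fraction group requires at least one digit.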
How to extract numbers from string but keep decimals?
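You need to add . and likely a space to the set of characters to keep. tr takes a plain set, so no enclosing brackets are needed (they would be kept as literal [ and ] characters):
$ echo "12.32foo 44.2 bar" | tr -dc '. [:digit:]'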
12.32 44.2
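The "delete everything except these characters" approach can also be sketched in Python, assuming a negated character class plays the role of tr -dc. The function name keep_number_chars is hypothetical.

```python
import re

def keep_number_chars(text):
    """Delete every character that is not a digit, a dot, or a space."""
    return re.sub(r"[^0-9. ]", "", text)

cleaned = keep_number_chars("12.32foo 44.2 bar")
# Splitting on whitespace separates the numbers and drops the trailing space.
print(cleaned.split())
```

Like the tr version, this keeps any stray dot or space in the input, so it suits inputs where those characters appear only inside or between numbers.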
Extracting numbers with decimals from large strings in R
With extract from tidyr, we can do:
library(dplyr)
library(tidyr)
data.frame(rs, stringsAsFactors = FALSE) %>%
  extract(rs, c("Rating", "Number_of_ratings", "Students_enrolled"),
          "(?s)(\\d\\.\\d).*?(\\d+)\\s*ratings?.*?(\\d+(?:,\\d+)?)\\s*students enrolled",
          convert = TRUE) %>%
  mutate(Students_enrolled = as.numeric(sub(",", "", Students_enrolled)))
Output:
   Rating Number_of_ratings Students_enrolled
1     4.0                 1                 9
2     4.7                 4                34
3     3.1                 5                22
4     2.4                14              2106
5     4.3                67              1287
6     4.6                 3                30
7     0.0                 0                 8
8     4.6                12                42
9     4.4                 6                41
10    4.2                12               115
11    4.8                 6                25
12    4.6                19               151
13    4.5                10               385
14    4.8               166               754
15    3.6                34              3396
Notes:
- The regular expression looks complicated, but it's really not. What extract does is take the match from each capture group (things surrounded by parentheses) and turn each one into its own column.
- (?s) is a modifier which turns on "DOTALL" mode. This allows the dot (.) to also match newline characters.
- (\\d\\.\\d) matches the Rating pattern.
- (\\d+)\\s*ratings? matches the Number_of_ratings pattern but only extracts the digits (\\d+).
- (\\d+(?:,\\d+)?)\\s*students enrolled matches the Students_enrolled pattern, but only extracts the "digits with or without comma" pattern.
- convert = TRUE attempts to convert the resulting columns to their best data type, but since there are commas in Students_enrolled, an extra mutate is needed to convert it to numeric.
- Normally, extract throws an error if the number of capture groups is not equal to the number of output columns, but since modifiers (?s) and non-capturing groups (?:...) are not considered capture groups, the capture group count matches the column count.
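To see how the three capture groups line up with the three output columns, here is a small Python sketch of the same pattern. The sample text and the function name parse_course are hypothetical, since the contents of rs are not shown here. re.DOTALL stands in for the (?s) modifier.

```python
import re

PATTERN = re.compile(
    r"(\d\.\d)"                            # group 1: the rating, e.g. 4.4
    r".*?(\d+)\s*ratings?"                 # group 2: the digits before "rating(s)"
    r".*?(\d+(?:,\d+)?)\s*students enrolled",  # group 3: digits with optional comma
    flags=re.DOTALL,                       # let . match newlines, like (?s)
)

def parse_course(text):
    """Return (rating, number_of_ratings, students_enrolled) or None."""
    m = PATTERN.search(text)
    if m is None:
        return None
    rating, n_ratings, students = m.groups()
    # The comma must be removed before the count can be converted.
    return float(rating), int(n_ratings), int(students.replace(",", ""))

sample = "Rated 4.4 out of 5\n(6 ratings)\n2,106 students enrolled"
print(parse_course(sample))
```

The non-capturing group (?:,\d+)? does not add a fourth entry to m.groups(), which mirrors the point about the capture group count matching the column count.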
Extracting numbers (in decimal and / form) from strings in R
Do you need this?
> gsub("[^0-9.<>]", "", x)
[1] "" "" "1.22" "<1.0" ">200"
Extract decimal numbers from string in Sparklyr
You could use regexpr from base R:
v <- "$170.5M"
regmatches(v, regexpr("\\d*\\.\\d", v))
# [1] "170.5"
How to extract numbers with multiple decimal points from text in Google Sheets?
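Try like:
=REGEXEXTRACT(A1;"(-*\d*[\.?\d+]+)")
Explanation:
The original, -*\d*\.?\d+, matches:
- -*: 0 to n - characters, followed by
- \d*: 0 to n decimal characters (0-9), followed by
- \.?: 0 to 1 . character(s) (it has to be escaped, otherwise it means "any character"), followed by
- \d+: 1 to n decimal characters.
We now:
- wrap \.?\d+ into a "selection" ([...])
- and match 1 to n (+) of its occurrences: [\.?\d+]+
- additionally (not mandatory) enclose all in a "capturing group" ((...)), so that we could also extract parts of it.
Note that inside [...] the ? and + are literal characters rather than quantifiers, so the class matches any run of digits, dots, ? and + characters. That is what lets it pick up values with several decimal points, but it also accepts stray ? or + characters.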
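The behaviour of the bracketed part can be checked outside Sheets. The sketch below uses Python's re module as a stand-in (an assumption; Google Sheets uses RE2, though this pattern uses only syntax both engines share). The function name first_match is hypothetical.

```python
import re

PATTERN = re.compile(r"-*\d*[\.?\d+]+")

def first_match(text):
    """Return the first match of the pattern in text, or None."""
    m = PATTERN.search(text)
    return m.group(0) if m else None

# A value with several decimal points is matched as one run.
print(first_match("version 1.2.3 released"))
# Inside [...], ? and + are literal characters, so they match too.
print(first_match("a?+b"))
```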