Extracting Decimal Numbers from a String

Extracting decimal numbers from a string

This approach makes the decimal point and decimal fraction optional and allows multiple numbers to be extracted:

str <- " test 3.1 test 5"
# One or more digits, an optional decimal point, then zero or more digits
as.numeric(unlist(regmatches(str, gregexpr("[[:digit:]]+\\.?[[:digit:]]*", str))))
#[1] 3.1 5.0

The concern about negative numbers can be addressed by also matching an optional leading minus sign (-?):

str <- " test -4.5 3.1 test 5"
as.numeric(unlist(regmatches(str, gregexpr("-?[[:digit:]]+\\.?[[:digit:]]*", str))))
#[1] -4.5 3.1 5.0
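The same idea can be wrapped in a small vectorised helper that returns one numeric vector per input string. This is a sketch only: the function name extract_decimals is hypothetical, and the pattern also accepts a bare fraction such as .25, which the answer above does not.

```r
# Hypothetical helper (not from the answer): one numeric vector per input string
extract_decimals <- function(x) {
  # optional minus sign, then digits with an optional fraction, or a bare fraction like .25
  pattern <- "-?(\\d+(\\.\\d+)?|\\.\\d+)"
  lapply(regmatches(x, gregexpr(pattern, x)), as.numeric)
}

r <- extract_decimals(c(" test -4.5 3.1 test 5", "no numbers", "value .25"))
r
```

A string with no numbers yields numeric(0). Note that a hyphen directly before a digit (as in "test-4") is read as a minus sign.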

str_extract_all with decimal numbers

You can use:

stringr::str_extract_all(DF1$Temperature, '\\d+([.,]\\d+)?')

#[[1]]
#[1] "37.8" "37.6"

#[[2]]
#[1] "37,8"

#[[3]]
#[1] "110" "38"

where:

\\d+ - one or more digits, followed by

an optional group ([.,]\\d+)? made of

[.,] - a dot or comma, and

\\d+ - one or more digits.
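If you need actual numbers rather than strings, the comma variant has to be normalised first, because as.numeric() only understands a dot. A base R sketch of the same pattern follows; the vector temperature is made up to stand in for DF1$Temperature, which is not shown.

```r
# Hypothetical sample data standing in for DF1$Temperature
temperature <- c("37.8 then 37.6", "37,8", "110 / 38")

# Same pattern as above, using base R instead of stringr
matches <- regmatches(temperature, gregexpr("\\d+([.,]\\d+)?", temperature))

# Replace a decimal comma with a dot so as.numeric() can parse it
nums <- lapply(matches, function(m) as.numeric(sub(",", ".", m, fixed = TRUE)))
nums
```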

Extract decimal number from string in C#

Try to approach the problem this way. A decimal number has the following features:

  • it starts with one or more digits (\d+)
  • after that, there can be zero or one dot (\.?)
  • if a dot is present, one or more digits must follow it (\d+)

Since the last two features go together, we can put them in a group and add a ? quantifier: (\.\d+)?.

So now we have the whole regex: \d+(\.\d+)?

If you want to match decimal numbers like .01 (without the 0 at the front), you can just use | to mean "or" and add another case (\.\d+). Basically: (\d+(\.\d+)?)|(\.\d+)
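The final pattern can be tried out in R, the main language of this page; in C# the same pattern (written as a verbatim @"..." string) would be passed to Regex.Matches. The sample string below is made up for illustration.

```r
# The pattern from the answer, written as an R string (backslashes doubled)
pattern <- "\\d+(\\.\\d+)?|\\.\\d+"
x <- "Price 12.50, discount .75, qty 3"

# gregexpr() finds every non-overlapping match; regmatches() pulls out the text
hits <- regmatches(x, gregexpr(pattern, x, perl = TRUE))[[1]]
hits
```

The second alternative only comes into play when the first cannot match at a given position, which is why .75 is found but 12.50 is not split into 12 and .50.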

How to extract numbers from string but keep decimals?

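You need to add . and probably a space to the set of characters to keep, like so:

$ echo "12.32foo 44.2 bar" | tr -dc '. [:digit:]'
12.32 44.2

-c complements the set, so -d deletes every character that is not a dot, a space, or a digit (including the trailing newline).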
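An R counterpart of the tr command can go one step further and turn the result into numbers. This is a sketch, not part of the original answer: it deletes the same characters with gsub() and then splits on spaces.

```r
# Keep only digits, dots and spaces, as the tr command does
x <- "12.32foo 44.2 bar"
kept <- gsub("[^[:digit:]. ]", "", x)    # "12.32 44.2 "

# Trim the ends, split on runs of spaces, and convert
numbers <- as.numeric(strsplit(trimws(kept), " +")[[1]])
numbers
```

As with tr, numbers separated only by letters (for example "12.32foo44.2") get glued together, so this works only when the numbers are space-separated in the input.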

Extracting numbers with decimals from large strings in R

With extract from tidyr, we can do:

library(dplyr)
library(tidyr)

data.frame(rs, stringsAsFactors = FALSE) %>%
extract(rs, c("Rating", "Number_of_ratings", "Students_enrolled"),
"(?s)(\\d\\.\\d).*?(\\d+)\\s*ratings?.*?(\\d+(?:,\\d+)?)\\s*students enrolled",
convert = TRUE) %>%
mutate(Students_enrolled = as.numeric(sub(",", "", Students_enrolled)))

Output:

   Rating Number_of_ratings Students_enrolled
1     4.0                 1                 9
2     4.7                 4                34
3     3.1                 5                22
4     2.4                14              2106
5     4.3                67              1287
6     4.6                 3                30
7     0.0                 0                 8
8     4.6                12                42
9     4.4                 6                41
10    4.2                12               115
11    4.8                 6                25
12    4.6                19               151
13    4.5                10               385
14    4.8               166               754
15    3.6                34              3396

Notes:

The regular expression looks complicated, but it's really not. extract takes the text matched by each capture group (the parts surrounded by parentheses) and turns each one into its own column.

  1. (?s) is a modifier that turns on "DOTALL" mode, which allows the dot . to also match newline characters.

  2. (\\d\\.\\d) matches the Rating pattern: a single digit, a dot, and a single digit (e.g. 4.7).

  3. (\\d+)\\s*ratings? matches the Number_of_ratings pattern (the optional s also covers "1 rating") but only extracts the digits (\\d+).

  4. (\\d+(?:,\\d+)?)\\s*students enrolled matches the Students_enrolled pattern, but only extracts the digits, with or without a thousands-separator comma.

  5. convert = TRUE attempts to convert the resulting columns to the most appropriate type, but because Students_enrolled contains commas, an extra mutate is needed to convert it to numeric.

Normally, extract throws an error if the number of capture groups does not equal the number of output columns. The modifier (?s) and the non-capturing group (?:...) are not capture groups, so the pattern has exactly three capture groups, one per column.
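Without tidyr, the same regex can be applied with base R's regexec(), which returns the full match followed by one entry per capture group. This is a sketch under assumptions: rs is not shown in the question, so the two strings below are hypothetical, and their layout (rating, count in parentheses, enrolment on the next line) is a guess.

```r
# Hypothetical course blurbs standing in for `rs`
rs <- c("4.7 (4 ratings)\n34 students enrolled",
        "2.4 (14 ratings)\n2,106 students enrolled")

pattern <- "(?s)(\\d\\.\\d).*?(\\d+)\\s*ratings?.*?(\\d+(?:,\\d+)?)\\s*students enrolled"

# Each list element: full match, then capture groups 1 to 3
m <- regmatches(rs, regexec(pattern, rs, perl = TRUE))
parts <- do.call(rbind, m)[, -1, drop = FALSE]   # drop the full-match column

result <- data.frame(
  Rating            = as.numeric(parts[, 1]),
  Number_of_ratings = as.integer(parts[, 2]),
  Students_enrolled = as.numeric(sub(",", "", parts[, 3]))  # strip the thousands comma
)
result
```

One difference from extract: a string that does not match yields character(0), which rbind() drops, so the rows would no longer line up with the input.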

Extracting numbers (in decimal and / form) from strings in R

Do you need this? It deletes every character that is not a digit, a dot, <, or >:

> gsub("[^0-9.<>]", "", x)
[1] "" "" "1.22" "<1.0" ">200"
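Since x is not shown, here is a self-contained sketch with a made-up input that gives the same result, plus one way to get plain numbers by dropping < and > too.

```r
# Hypothetical input; the question's actual `x` is not shown
x <- c("abc", "none", "1.22 mg", "<1.0", ">200 units")

# Keep only digits, dots, and the < / > signs
cleaned <- gsub("[^0-9.<>]", "", x)
cleaned

# Dropping < and > as well gives plain numbers; empty strings become NA
values <- suppressWarnings(as.numeric(gsub("[^0-9.]", "", x)))
values
```

Any stray dot in the text (such as "approx. 5") is kept too, so check the input before relying on this.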

Extract decimal numbers from string in Sparklyr

You could use regexpr from base R (this runs on a local R character vector, not inside Spark):

v <- "$170.5M"
regmatches(v, regexpr("\\d*\\.\\d+", v))
# [1] "170.5"
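The pattern above requires a decimal point, so a value like "$80M" would not match. A sketch with the fraction made optional, using hypothetical values in the same style as v:

```r
# Hypothetical values in the same "$<number><suffix>" style as v above
v <- c("$170.5M", "$2.25B", "$80M")

# Digits with an optional fractional part, so "$80M" also matches
amounts <- as.numeric(regmatches(v, regexpr("\\d+(\\.\\d+)?", v)))
amounts
```

regexpr() keeps only the first match per element, and regmatches() silently drops elements with no match, so check the length of the result against the input.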

How to extract numbers with multiple decimal points from text in Google Sheets?

Try something like:

=REGEXEXTRACT(A1;"(-*\d*[\.?\d+]+)")

Explanation:
The original:

-*\d*\.?\d+

matches:

  • -*: 0 to n - characters followed by:
  • \d*: 0 to n decimal characters (0-9), followed by:
  • \.?: 0 to 1 . character(s) (it has to be escaped, otherwise it means "any character"), followed by:
  • \d+: 1 to n decimal characters.

We now:

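  • wrap \.?\d+ in a character class ([...]); inside a class, ? and + lose their special meaning, so [\.?\d+] matches a single dot, question mark, digit, or plus sign
  • match 1 to n (+) of those characters: [\.?\d+]+, so a run with several decimal points, such as 1.2.3, is matched as a whole
  • optionally enclose everything in a capturing group ((...)); it is not required here, but groups let you extract parts of the match.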
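The behaviour of the character class can be checked in R with a PCRE pattern; for this pattern PCRE and the RE2 engine used by Google Sheets should behave the same, though that is an assumption rather than something tested in Sheets. The sample string is made up.

```r
# Same pattern as the formula, as an R string (backslashes doubled)
x <- "Version 1.2.3, offset -4.5, ends in 7."
hits <- regmatches(x, gregexpr("-*\\d*[\\.?\\d+]+", x, perl = TRUE))[[1]]
hits
```

Because the class accepts a dot on its own, a sentence-ending period is picked up too ("7."), and a lone ? or + would also match.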


  • https://www.google.com/search?q=regex+tutorial


