Regex for Floating Point

Regular expression for floating point numbers

TL;DR

Use [.] instead of \. and [0-9] instead of \d to avoid escaping issues in some languages (like Java).

Thanks to the nameless one for originally recognizing this.

One relatively simple pattern for matching a floating point number in a larger string is:

[+-]?([0-9]*[.])?[0-9]+

This will match:

  • 123
  • 123.456
  • .456
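As a minimal sketch, here is how the pattern above might be used from Java with java.util.regex. The class and method names (SimpleFloatFinder, firstMatch) are hypothetical. Matcher.find() returns the first match found anywhere in the input:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SimpleFloatFinder {
    // [.] and [0-9] need no backslashes inside a Java string literal.
    private static final Pattern FLOAT = Pattern.compile("[+-]?([0-9]*[.])?[0-9]+");

    // Returns the first float-like substring, or null if there is none.
    static String firstMatch(String input) {
        Matcher m = FLOAT.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"123", "123.456", ".456", "123."}) {
            System.out.println(s + " -> " + firstMatch(s));
        }
    }
}
```

For the last input, the match is only 123. The trailing period is left behind because [0-9]+ must come last.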


If you also want to match 123. (a period with no decimal part), then you'll need a slightly longer expression:

[+-]?([0-9]+([.][0-9]*)?|[.][0-9]+)

See pkeller's answer for a fuller explanation of this pattern

If you want to include a wider spectrum of numbers, including scientific notation and non-decimal numbers such as hex and octal, see my answer to "How do I identify if a string is a number?".

If you want to validate that an input is a number (rather than finding a number within the input), then you should surround the pattern with ^ and $, like so:

^[+-]?([0-9]+([.][0-9]*)?|[.][0-9]+)$
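A sketch of the anchored, longer pattern in Java. FloatValidator and isFloat are hypothetical names. The sketch deliberately uses find() so that the ^ and $ anchors are doing the validation work:

```java
import java.util.regex.Pattern;

class FloatValidator {
    // Anchored form of the longer pattern: accepts "123." but rejects a lone ".".
    private static final Pattern STRICT =
            Pattern.compile("^[+-]?([0-9]+([.][0-9]*)?|[.][0-9]+)$");

    static boolean isFloat(String input) {
        return STRICT.matcher(input).find();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"123.", "-0.5", "+.5", ".", "1.2.3", "abc 1.2"}) {
            System.out.println(s + " -> " + isFloat(s));
        }
    }
}
```

The first three inputs are accepted. ".", "1.2.3" and "abc 1.2" are rejected.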

Irregular Regular Expressions

"Regular expressions", as implemented in most modern languages, APIs, frameworks, libraries, etc., are based on a concept developed in formal language theory. However, software engineers have added many extensions that take these implementations far beyond the formal definition. So, while most regular expression engines resemble one another, there is actually no standard. For this reason, a lot depends on what language, API, framework or library you are using.

(Incidentally, to help reduce confusion, many have taken to using "regex" or "regexp" to describe these enhanced matching languages. See Is a Regex the Same as a Regular Expression? at RexEgg.com for more information.)

That said, most regex engines (actually, all of them, as far as I know) would accept \. as a literal period. Most likely, there's an issue with escaping.

The Trouble with Escaping

Some languages, such as JavaScript, have built-in syntax for regexes (regex literals like /[0-9]+/), so patterns don't have to be written as ordinary strings. For those languages that don't, escaping can be a problem.

This is because you are basically coding in a language within a language. Java, for example, uses \ as an escape character within its strings, so if you want to place a literal backslash character within a string, you must escape it:

// creates a single character string: "\"
String x = "\\";

However, regexes also use the \ character for escaping, so if you want to match a literal \ character, you must escape it for the regex engine, and then escape it again for Java:

// Creates a two-character string: "\\"
// When used as a regex pattern, will match a single character: "\"
String regexPattern = "\\\\";

In your case, you have probably not escaped the backslash character in the language you are programming in:

// will most likely result in an "Illegal escape character" error
String wrongPattern = "\.";
// will result in the string "\."
String correctPattern = "\\.";

All this escaping can get very confusing. If the language you are working with supports raw strings, then you should use those to cut down on the number of backslashes, but not all languages do (most notably: Java). Fortunately, there's an alternative that will work some of the time:

String correctPattern = "[.]";

For a regex engine, \. and [.] mean exactly the same thing. Note that this trick doesn't work in every case: characters such as newline (\\n), open square bracket (\\[) and backslash (\\\\ or [\\\\]) still need a backslash.
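To make the two layers of escaping concrete, here is a small Java sketch. The class, constant and method names are hypothetical. It checks that the string literal "\\." is really the two-character regex \., and that it behaves like [.]:

```java
import java.util.regex.Pattern;

class EscapingDemo {
    static final String ESCAPED = "\\.";    // Java literal for the 2-character regex \.
    static final String BRACKETED = "[.]";  // the 3-character regex [.], no backslash needed
    static final String BACKSLASH = "\\\\"; // the 2-character regex \\, which matches one backslash

    // Pattern.matches requires the whole input to match the regex.
    static boolean wholeMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(ESCAPED.length());            // 2
        System.out.println(wholeMatch(ESCAPED, "."));    // true
        System.out.println(wholeMatch(BRACKETED, "."));  // true
        System.out.println(wholeMatch(BRACKETED, "x"));  // false: [.] is a literal period
        System.out.println(wholeMatch(BACKSLASH, "\\")); // true: the input is one backslash
    }
}
```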

A Note about Matching Numbers

(Hint: It's harder than you think)

Matching a number is one of those things you'd think is quite easy with regex, but it's actually pretty tricky. Let's take a look at your approach, [-+]?[0-9]*\.?[0-9]*, piece by piece:

[-+]?

Match an optional - or +

[0-9]*

Match 0 or more sequential digits

\.?

Match an optional .

[0-9]*

Match 0 or more sequential digits

First, we can clean up this expression a bit by using a character class shorthand for the digits (note that this is also susceptible to the escaping issue mentioned above):

[0-9] = \d

I'm going to use \d below, but keep in mind that it means the same thing as [0-9]. (Well, actually, in some engines \d will match digits from all scripts, so it'll match more than [0-9] will, but that's probably not significant in your case.)

Now, if you look at this carefully, you'll realize that every single part of your pattern is optional. This pattern can match a 0-length string, a string composed only of + or -, or a string composed only of a period. This is probably not what you intended.
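A quick Java check of that claim. AllOptionalDemo and accepts are hypothetical names. Every input below, including the empty string, is accepted as a complete match:

```java
import java.util.regex.Pattern;

class AllOptionalDemo {
    // The original attempt: every piece is optional.
    static final String ALL_OPTIONAL = "[-+]?[0-9]*\\.?[0-9]*";

    // Pattern.matches requires the whole input to match.
    static boolean accepts(String input) {
        return Pattern.matches(ALL_OPTIONAL, input);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"", "+", "-", ".", "1.5"}) {
            System.out.println("\"" + s + "\" -> " + accepts(s)); // all true
        }
    }
}
```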

To fix this, it's helpful to start by "anchoring" your regex with the bare-minimum required string, probably a single digit:

\d+

Now we want to add the decimal part, but it doesn't go where you think it might:

\d+\.?\d* /* This isn't quite correct. */

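This will still match values like 123. (with a trailing period). Worse, it's got a tinge of evil about it. The period is optional, meaning that you've got two repeated classes side-by-side (\d+ and \d*). If this expression is ever repeated or nested inside another quantifier, the engine can end up trying an exponential number of ways to split the same run of digits between the two classes (catastrophic backtracking), which can open your system up to regular-expression denial-of-service (ReDoS) attacks.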

To fix this, rather than treating the period as optional, we need to treat it as required (to separate the repeated character classes) and instead make the entire decimal portion optional:

\d+(\.\d+)? /* Better. But... */

This is looking better now. We require a period between the first sequence of digits and the second, but there's a fatal flaw: we can't match .123 because a leading digit is now required.

This is actually pretty easy to fix. Instead of making the "decimal" portion of the number optional, we need to look at it as a sequence of characters: 1 or more digits that may be prefixed by a ., which may in turn be prefixed by 0 or more digits:

(\d*\.)?\d+

Now we just add the sign:

[+-]?(\d*\.)?\d+

Of course, those backslashes are pretty annoying in Java, so we can substitute in our long-form character classes:

[+-]?([0-9]*[.])?[0-9]+
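To see each step of the walkthrough side by side, here is a Java sketch that runs every candidate pattern against the same inputs as a whole-input match. PatternEvolution, CANDIDATES and wholeMatch are hypothetical names:

```java
import java.util.regex.Pattern;

class PatternEvolution {
    // Candidate patterns from the walkthrough, written as Java string literals.
    static final String[] CANDIDATES = {
        "\\d+\\.?\\d*",            // accepts "123." (trailing period)
        "\\d+(\\.\\d+)?",          // rejects "123." but also rejects ".123"
        "[+-]?(\\d*\\.)?\\d+",     // final version
        "[+-]?([0-9]*[.])?[0-9]+"  // same as above, without \d or \.
    };

    static boolean wholeMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String[] inputs = {"123", "123.", ".123", "-1.5", "."};
        for (String regex : CANDIDATES) {
            StringBuilder row = new StringBuilder(regex);
            for (String input : inputs) {
                row.append("  ").append(input).append('=').append(wholeMatch(regex, input));
            }
            System.out.println(row);
        }
    }
}
```

The last two rows are identical: both accept 123, .123 and -1.5, and both reject 123. and a lone period.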

Matching versus Validating

This has come up in the comments a couple times, so I'm adding an addendum on matching versus validating.

The goal of matching is to find some content within the input (the "needle in a haystack"). The goal of validating is to ensure that the input is in an expected format.

Regexes, by their nature, only match text. Given some input, they will either find some matching text or they will not. However, by "snapping" an expression to the beginning and ending of the input with anchors (^ and $), we can ensure that no match is found unless the entire input matches the expression, effectively using regexes to validate.

The regex described above ([+-]?([0-9]*[.])?[0-9]+) will match one or more numbers within a target string. So given the input:

apple 1.34 pear 7.98 version 1.2.3.4

The regex will match 1.34, 7.98, 1.2, .3 and .4.
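The same result can be reproduced in Java with a find() loop, which collects non-overlapping matches from left to right. FindAllFloats and findAll are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindAllFloats {
    private static final Pattern FLOAT = Pattern.compile("[+-]?([0-9]*[.])?[0-9]+");

    // Collects every non-overlapping match, scanning left to right.
    static List<String> findAll(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = FLOAT.matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findAll("apple 1.34 pear 7.98 version 1.2.3.4"));
        // prints [1.34, 7.98, 1.2, .3, .4]
    }
}
```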

To validate that a given input is a number and nothing but a number, "snap" the expression to the start and end of the input by wrapping it in anchors:

^[+-]?([0-9]*[.])?[0-9]+$

This will only find a match if the entire input is a floating point number, and will not find a match if the input contains additional characters. So, given the input 1.2, a match will be found, but given apple 1.2 pear no matches will be found.

Note that some regex engines have a validate, isMatch or similar function, which essentially does what I've described automatically, returning true if a match is found and false if no match is found. Also keep in mind that some engines allow you to set flags which change the definition of ^ and $, matching the beginning/end of a line rather than the beginning/end of the entire input. This is typically not the default, but be on the lookout for these flags.
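In Java, for example, Matcher.matches() (and the String.matches convenience method) is that validate-style function: it succeeds only if the entire input matches, whereas find() searches. Pattern.MULTILINE is the flag that makes ^ and $ match at line boundaries. A small sketch (MatchVsValidate and its fields are hypothetical names):

```java
import java.util.regex.Pattern;

class MatchVsValidate {
    static final Pattern UNANCHORED = Pattern.compile("[+-]?([0-9]*[.])?[0-9]+");
    static final Pattern ANCHORED = Pattern.compile("^[+-]?([0-9]*[.])?[0-9]+$");
    static final Pattern ANCHORED_PER_LINE =
            Pattern.compile("^[+-]?([0-9]*[.])?[0-9]+$", Pattern.MULTILINE);

    public static void main(String[] args) {
        String haystack = "apple 1.2 pear";
        System.out.println(UNANCHORED.matcher(haystack).find());    // true: a number occurs somewhere
        System.out.println(UNANCHORED.matcher(haystack).matches()); // false: matches() needs the whole input
        System.out.println(ANCHORED.matcher(haystack).find());      // false: ^ and $ pin the match

        String twoLines = "1.2\napple";
        System.out.println(ANCHORED.matcher(twoLines).find());          // false
        System.out.println(ANCHORED_PER_LINE.matcher(twoLines).find()); // true: ^ and $ now match per line
    }
}
```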

regex for floating point number less than or equal to 4.5

Consider this pattern:

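^(?:[0-3](?:\.[0-9]+)?|4(?:\.[0-4][0-9]*)?|4\.50*|-[0-9]+(?:\.[0-9]+)?)$

The four alternatives cover:

  • [0-3](?:\.[0-9]+)? - 0 to 3.999...
  • 4(?:\.[0-4][0-9]*)? - 4 and 4.0 to 4.4999...
  • 4\.50* - 4.5 (with optional trailing zeros)
  • -[0-9]+(?:\.[0-9]+)? - any negative number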

This says to match 0 through 3, followed by an optional decimal component; or 4, optionally followed by a decimal point, a digit from 0 through 4 and any further digits; or 4.5 with optional trailing zeros. The last portion of the alternation allows for any negative number.
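A sketch for trying the pattern from Java, with the backslashes doubled for the string literal. AtMostFourPointFive and accepts are hypothetical names. Because matches() already requires the whole input, the ^ and $ are redundant here but harmless:

```java
import java.util.regex.Pattern;

class AtMostFourPointFive {
    // The pattern from above, as a Java string literal.
    static final Pattern AT_MOST_4_5 = Pattern.compile(
            "^(?:[0-3](?:\\.[0-9]+)?|4(?:\\.[0-4][0-9]*)?|4\\.50*|-[0-9]+(?:\\.[0-9]+)?)$");

    static boolean accepts(String input) {
        return AT_MOST_4_5.matcher(input).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"3.99", "4", "4.49", "4.5", "4.50", "4.51", "5", "10", "-100"}) {
            System.out.println(s + " -> " + accepts(s));
        }
    }
}
```

4.51, 5 and 10 are rejected. Every other input in the list is accepted.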


But, that being said, the far easier way to do this comparison would be to use an inequality operator in your programming language. For example, the complex regex above can be replaced in Java using:

float f = 3.4f;
if (f <= 4.5f) {
    System.out.println("match");
}
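When the value arrives as a string, a minimal sketch of the same idea parses first and then compares. The helper name atMost45 is hypothetical, and it uses double rather than float (4.5 is exactly representable in both). Note that Double.parseDouble is more lenient than the regex: it also accepts forms such as .5, +1 and 1e0. It throws NumberFormatException for non-numbers, which the sketch treats as "no match":

```java
class AtMostByComparison {
    // Hypothetical helper: parse first, then compare numerically.
    static boolean atMost45(String input) {
        try {
            return Double.parseDouble(input) <= 4.5;
        } catch (NumberFormatException e) {
            return false; // not a number at all
        }
    }

    public static void main(String[] args) {
        for (String s : new String[] {"4.5", "4.51", ".5", "+1", "1e0", "apple"}) {
            System.out.println(s + " -> " + atMost45(s));
        }
    }
}
```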

Regular expression for non-zero positive floats

Use

^(?:[1-9]\d*|0(?!(?:\.0+)?$))?(?:\.\d+)?$


Explanation

--------------------------------------------------------------------------------
  ^                        the beginning of the string
--------------------------------------------------------------------------------
  (?:                      group, but do not capture (optional
                           (matching the most amount possible)):
--------------------------------------------------------------------------------
    [1-9]                    any character of: '1' to '9'
--------------------------------------------------------------------------------
    \d*                      digits (0-9) (0 or more times (matching
                             the most amount possible))
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    0                        '0'
--------------------------------------------------------------------------------
    (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
      (?:                      group, but do not capture (optional
                               (matching the most amount possible)):
--------------------------------------------------------------------------------
        \.                       '.'
--------------------------------------------------------------------------------
        0+                       '0' (1 or more times (matching the
                                 most amount possible))
--------------------------------------------------------------------------------
      )?                       end of grouping
--------------------------------------------------------------------------------
      $                        before an optional \n, and the end of
                               the string
--------------------------------------------------------------------------------
    )                        end of look-ahead
--------------------------------------------------------------------------------
  )?                       end of grouping
--------------------------------------------------------------------------------
  (?:                      group, but do not capture (optional
                           (matching the most amount possible)):
--------------------------------------------------------------------------------
    \.                       '.'
--------------------------------------------------------------------------------
    \d+                      digits (0-9) (1 or more times (matching
                             the most amount possible))
--------------------------------------------------------------------------------
  )?                       end of grouping
--------------------------------------------------------------------------------
  $                        before an optional \n, and the end of the
                           string
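A quick way to exercise the pattern from Java (class and field names are hypothetical). Because both top-level groups are optional, the pattern as given also accepts the empty string and zero values written as .0. The GUARDED variant is my own assumption, not part of the original answer. It adds a lookahead (?!\.?0*$) right after ^ that rejects inputs made only of an optional leading period followed by zeros:

```java
import java.util.regex.Pattern;

class NonZeroPositive {
    // Pattern exactly as given above.
    static final Pattern AS_GIVEN =
            Pattern.compile("^(?:[1-9]\\d*|0(?!(?:\\.0+)?$))?(?:\\.\\d+)?$");
    // Hypothetical variant: (?!\.?0*$) also rejects "", ".0", ".00" and so on.
    static final Pattern GUARDED =
            Pattern.compile("^(?!\\.?0*$)(?:[1-9]\\d*|0(?!(?:\\.0+)?$))?(?:\\.\\d+)?$");

    static boolean accepts(Pattern p, String input) {
        return p.matcher(input).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"1", "0.5", ".5", "0", "0.00", "", ".0"}) {
            System.out.println("\"" + s + "\" -> " + accepts(AS_GIVEN, s) + " / " + accepts(GUARDED, s));
        }
    }
}
```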

Regular expressions match floating point number but not integer

If your regex flavor supports lookaheads, require one of the floating-point characters before the end of the number:

((\+|-)?(?=\d*[.eE])([0-9]+\.?[0-9]*|\.[0-9]+)([eE](\+|-)?[0-9]+)?)


Here is also a slightly optimized version:

[+-]?(?=\d*[.eE])(?=\.?\d)\d*\.?\d*(?:[eE][+-]?\d+)?

We start with an optional + or -. Then we require one of the characters ., e or E after an arbitrary number of digits. Then we also require at least one digit, either right away or immediately after a leading period. Then we just match digits, an optional . and more digits. Finally comes a completely optional exponent: an e or an E, an optional + or -, and then one or more digits.
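A sketch of the optimized pattern in a Java find() loop (FloatsNotIntegers and findAll are hypothetical names). The integers 1 and 42 are skipped, while 2.5, -3e10 and .5 are found:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FloatsNotIntegers {
    // The "slightly optimized" pattern from above, as a Java string literal.
    static final Pattern FLOAT_ONLY =
            Pattern.compile("[+-]?(?=\\d*[.eE])(?=\\.?\\d)\\d*\\.?\\d*(?:[eE][+-]?\\d+)?");

    static List<String> findAll(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = FLOAT_ONLY.matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findAll("1 2.5 -3e10 .5 42")); // prints [2.5, -3e10, .5]
    }
}
```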

regular expression for finding decimal/float numbers?

Optionally match a + or - at the beginning, followed by one or more decimal digits, optionally followed by a decimal point and one or more decimal digits until the end of the string:

/^[+-]?\d+(\.\d+)?$/
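The same check in Java might look like this (a sketch; DecimalValidator and isDecimal are hypothetical names). The JavaScript /.../ delimiters are dropped. Since Matcher.matches() must consume the whole input, the ^ and $ anchors can be omitted:

```java
import java.util.regex.Pattern;

class DecimalValidator {
    // Java equivalent of /^[+-]?\d+(\.\d+)?$/; matches() anchors at both ends.
    static final Pattern DECIMAL = Pattern.compile("[+-]?\\d+(\\.\\d+)?");

    static boolean isDecimal(String input) {
        return DECIMAL.matcher(input).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"12", "-3.5", "+0.25", ".5", "1.", "1.2.3"}) {
            System.out.println(s + " -> " + isDecimal(s));
        }
    }
}
```

The first three inputs are accepted. .5, 1. and 1.2.3 are rejected, because this pattern requires digits on both sides of the decimal point.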


How to select a floating-point or integer using a regular expression from text

You have two problems. The first is that .+ is greedy, meaning that, if used to search a single line from the file, it will gobble up as many characters as it can (other than newlines) yet still secure a match, which means matching the last digit.

The second problem is that if you read the file into a string and search the string, .* will not go past the first line, because . does not match newline characters. In Ruby, that can easily be addressed by adding the multiline modifier (/m), which makes . (and therefore .*) match newline characters too.

If you read your file into a string, you could use the following regular expression to extract the characters of interest from the string.

r = /
^ # match beginning of line
[ ]* # match 0+ spaces
\| # match a toothpick
[ ]+ # match 1+ spaces
total # match 'total'
[ ]+ # match 1+ spaces
\| # match a toothpick
[ ]+ # match 1+ spaces
\K # forget everything matched so far
\d+ # match 1+ digits
(?:\.\d+) # match '.' then 1+ digits in non-capture group
? # optionally match the non-capture group
(?= # begin a positive lookahead
% # match '%'
[ ]+ # match 1+ spaces
\|[ ]* # match a toothpick then 0+ spaces
$ # match end-of-line
) # end positive lookahead
/x # free-spacing mode

I've written the regex in free-spacing mode¹ to make it self-documenting. It is conventionally written as follows.

/^ *\| +total +\| +\K\d+(?:\.\d+)?(?=% +\| *$)/

Suppose you read your file into a string held by the variable str:

str = <<~END
===> Verifying dependencies...
===> Compiling sample
===> Performing cover analysis...
|------------------------|------------|
| module                 | coverage   |
|------------------------|------------|
| sample_app             | 12.94%     |
| sample_sup             | 56.78%     |
| sample                 | 96%        |
|------------------------|------------|
| total                  | 23.02%     |
|------------------------|------------|
coverage calculated from:
/tmp/workspace/_build/test/cover/ct.coverdata
/tmp/workspace/_build/test/cover/eunit.coverdata
cover summary written to: /tmp/workspace/_build/test/cover/index.html
END

Then

str[r] #=> "23.02" 
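For readers doing the same in Java, here is a rough port, assuming the report is already in a String (CoverageTotal and totalCoverage are hypothetical names). java.util.regex does not support \K, so this sketch substitutes a capturing group. Pattern.MULTILINE gives ^ and $ the per-line behavior they have by default in Ruby:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CoverageTotal {
    // A capturing group replaces Ruby's \K; MULTILINE makes ^ and $ work per line.
    static final Pattern TOTAL = Pattern.compile(
            "^ *\\| +total +\\| +(\\d+(?:\\.\\d+)?)(?=% +\\| *$)", Pattern.MULTILINE);

    // Returns the total coverage figure, or null if no "total" row is found.
    static String totalCoverage(String report) {
        Matcher m = TOTAL.matcher(report);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String report = String.join("\n",
                "|------------------------|------------|",
                "| sample_app             | 12.94%     |",
                "|------------------------|------------|",
                "| total                  | 23.02%     |",
                "|------------------------|------------|",
                "coverage calculated from:");
        System.out.println(totalCoverage(report)); // prints 23.02
    }
}
```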

¹ In free-spacing mode all spaces are stripped out before the regex is parsed, which is why spaces that are part of the regex must be protected. I've done that by putting each space in a character class, but they could instead be escaped or \s could be used (if appropriate).

Regex that accepts floating point numbers and minus (-) sign

^[-+]?[0-9]*\.?[0-9]+$

  • ^ - start of string
  • [-+]? - 0 or 1 sign indicator
  • [0-9]* - 0 or more digits
  • \.? - an optional literal period (the backslash is needed because an unescaped . means "any character")
  • [0-9]+ - 1 or more digits
  • $ - the end of the string

If you are instead using the comma as a decimal separator, use , instead of \.

If you are using both/either, you can use [.,]
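A sketch of the three variants in Java (SignedDecimal and its fields are hypothetical names), checked as whole-input matches:

```java
import java.util.regex.Pattern;

class SignedDecimal {
    // Period only, comma only, or either, as described above.
    static final Pattern PERIOD = Pattern.compile("^[-+]?[0-9]*\\.?[0-9]+$");
    static final Pattern COMMA = Pattern.compile("^[-+]?[0-9]*,?[0-9]+$");
    static final Pattern EITHER = Pattern.compile("^[-+]?[0-9]*[.,]?[0-9]+$");

    static boolean accepts(Pattern p, String input) {
        return p.matcher(input).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"-2.5", "3,14", ".5", "-", "1,2.3"}) {
            System.out.println(s + " -> period=" + accepts(PERIOD, s)
                    + " comma=" + accepts(COMMA, s) + " either=" + accepts(EITHER, s));
        }
    }
}
```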


