Regex for Number with Decimals and Thousand Separator

Regex for number with decimals and thousand separator

/^\d{1,3}(,\d{3})*(\.\d+)?$/

About the minimum and maximum values... Well, I wouldn't do it with a regex, but you can add lookaheads at the beginning:

/^(?!0+\.00)(?=.{1,9}(\.|$))\d{1,3}(,\d{3})*(\.\d+)?$/

Note: this allows 0,999.00, so you may want to change it to:

/^(?!0+\.00)(?=.{1,9}(\.|$))(?!0(?!\.))\d{1,3}(,\d{3})*(\.\d+)?$/

which would not allow a leading 0.
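
As a quick check of the three patterns above, here is a minimal Python sketch. It assumes Python's re module, which supports the same lookaheads; the names BASIC, BOUNDED, NO_LEADING_ZERO and accepts are made up for this illustration and are not part of the answer.

```python
import re

# The three patterns from the answer, unchanged.
BASIC = re.compile(r"^\d{1,3}(,\d{3})*(\.\d+)?$")
BOUNDED = re.compile(r"^(?!0+\.00)(?=.{1,9}(\.|$))\d{1,3}(,\d{3})*(\.\d+)?$")
NO_LEADING_ZERO = re.compile(
    r"^(?!0+\.00)(?=.{1,9}(\.|$))(?!0(?!\.))\d{1,3}(,\d{3})*(\.\d+)?$"
)


def accepts(pattern, text):
    """Return True if the whole string is accepted by the pattern."""
    return pattern.match(text) is not None


if __name__ == "__main__":
    for sample in ["1,234.56", "0.00", "0,999.00", "0.50", "12,345,678.9", "1234"]:
        print(sample,
              accepts(BASIC, sample),
              accepts(BOUNDED, sample),
              accepts(NO_LEADING_ZERO, sample))
```

The second lookahead limits the integer part (commas included) to nine characters, which is why 12,345,678.9 is rejected by the bounded patterns.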

Edit:
Tests: http://jsfiddle.net/pKsYq/2/

Regular expression to match number with Decimal separator and optional Thousands separator

The second alternative doesn't match because it allows only a single \d after the decimal point. That needs to be \d+.

Then you need to wrap everything between ^ and $ in a group, so all alternatives match the entire string.

You had lots of redundant parentheses. And \d* in the last alternative should be \d+, otherwise you'll allow a number that's completely empty or just a sign.

^[+-]?([0-9]{1,3}(,[0-9]{3})*(\.[0-9]+)?|\d*\.\d+|\d+)$
  • ^ -> start of string
  • [+-]? -> matches optional + or - char
  • ([0-9]{1,3}(,[0-9]{3})*(\.[0-9]+)?|\d*\.\d+|\d+) -> whole group
    has to match [0-9]{1,3}(,[0-9]{3})*(\.[0-9]+)? or \d*\.\d+ or \d+
    • [0-9]{1,3}(,[0-9]{3})*(\.[0-9]+)? -> matches numbers with thousand separators and an optional decimal part
    • \d*\.\d+ -> matches numbers with decimal separator, and maybe digits before the decimal
    • \d+ -> matches numbers without decimal separator
  • $ -> end of string
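
The breakdown above can be tried out with a short Python sketch. The pattern is copied verbatim from the answer; the names SIGNED_NUMBER and is_number are hypothetical helpers for this example only.

```python
import re

# Pattern from the answer: optional sign, then one of three alternatives.
SIGNED_NUMBER = re.compile(
    r"^[+-]?([0-9]{1,3}(,[0-9]{3})*(\.[0-9]+)?|\d*\.\d+|\d+)$"
)


def is_number(text):
    """Return True if the text is a number in one of the accepted forms."""
    return SIGNED_NUMBER.match(text) is not None


if __name__ == "__main__":
    for sample in ["1,234.5", "-.5", "+1234", "1234.56", "12,34", "", "-"]:
        print(repr(sample), is_number(sample))
```

Because the last alternative is \d+ rather than \d*, the empty string and a bare sign are rejected.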


regex for multiple formats decimals and thousand separator

You may try the snippet below, since I don't see how to achieve this using a single regular expression.

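$string = array("0,89", "11,111,111,111", "11,111.11", "11'111.11", "11 111.11", "11.111,11", "11'111,11", "11 111,11", "11,111", "11'111", "11 111", "11.111");
// Group 1: integer part (with optional thousand separators); group 3: optional decimal part.
$pattern = "/^(\d{1,3}([,\s.']\d{3})*|\d+)([.,]\d+)?$/";
foreach ($string as $price) {
    if (!preg_match($pattern, $price, $matches)) {
        continue; // skip anything that is not a recognised price format
    }
    // Strip separators from the integer part; turn the decimal separator into a dot.
    $integer = preg_replace("/[,\s.']/", "", $matches[1]);
    $decimal = !empty($matches[3]) ? preg_replace("/[,\s.']/", ".", $matches[3]) : ".00";
    echo $matches[0], " -> ", $integer, $decimal, "<br />";
}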

Output:

0,89 -> 0.89
11,111,111,111 -> 11111111111.00
11,111.11 -> 11111.11
11'111.11 -> 11111.11
11 111.11 -> 11111.11
11.111,11 -> 11111.11
11'111,11 -> 11111.11
11 111,11 -> 11111.11
11,111 -> 11111.00
11'111 -> 11111.00
11 111 -> 11111.00
11.111 -> 11111.00

As you can see, in the first step the pattern above matches any (valid) price format and splits it into three groups. In the second step, group 1 is stripped of any separators, while the separator in group 3, if present, is replaced with a dot (.). If there is no decimal part, .00 is appended.
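
For readers who don't use PHP, the same two steps can be sketched in Python. This is a port under the assumption that Python's re behaves like preg_match for this pattern; normalize_price, PRICE and SEPARATORS are names invented for this example.

```python
import re

# Same pattern as the PHP snippet: group 1 is the integer part,
# group 3 is the optional decimal part (including its separator).
PRICE = re.compile(r"^(\d{1,3}([,\s.']\d{3})*|\d+)([.,]\d+)?$")
SEPARATORS = re.compile(r"[,\s.']")


def normalize_price(text):
    """Return the price as 'digits.decimals', or None if it does not match."""
    m = PRICE.match(text)
    if m is None:
        return None
    integer_part = SEPARATORS.sub("", m.group(1))
    # group(3) is None when there is no decimal part
    decimal_part = SEPARATORS.sub(".", m.group(3)) if m.group(3) else ".00"
    return integer_part + decimal_part


if __name__ == "__main__":
    for price in ["0,89", "11,111.11", "11.111,11", "11 111", "abc"]:
        print(price, "->", normalize_price(price))
```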


Regex for custom decimal and thousand separator

Well, your specification is ambiguous: if ',' is accepted as the decimal indicator, should 123,456 be parsed as the number 123456 or as the number 123.456 (one thousandth of it)? You can resolve the ambiguity by disallowing exactly three decimals, but at a high cost: the user has to understand that writing three decimals by mistake gives weird results under strange conditions (123,456 is parsed as 123456.0, while 123,4560 is parsed as 123.456). That is hard for a user to accept. It's more interesting to use the rule that a single , or . means a decimal point, while if both indicators appear, the first is a group separator and the second is the decimal point.
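
The rule suggested at the end of the previous paragraph can be sketched in a few lines of Python. This is only an illustration: parse_number is a hypothetical helper, and treating a separator that is repeated (as in 1,234,567) as a group separator is my assumption, since the rule itself doesn't cover that case.

```python
from decimal import Decimal


def parse_number(text):
    """Parse a number where ',' and '.' may both act as separators."""
    has_comma = "," in text
    has_dot = "." in text
    if has_comma and has_dot:
        # Both present: the one that appears last is the decimal point,
        # the other one is the group separator.
        decimal_sep = "," if text.rfind(",") > text.rfind(".") else "."
        group_sep = "." if decimal_sep == "," else ","
        text = text.replace(group_sep, "").replace(decimal_sep, ".")
    elif has_comma or has_dot:
        sep = "," if has_comma else "."
        if text.count(sep) == 1:
            # A single separator means a decimal point.
            text = text.replace(sep, ".")
        else:
            # Assumption: a repeated separator can only be a group separator.
            text = text.replace(sep, "")
    return Decimal(text)


if __name__ == "__main__":
    for sample in ["123,456", "1.234,56", "1,234.56", "1,234,567", "42"]:
        print(sample, "->", parse_number(sample))
```

Under this rule 123,456 is always read as 123.456, so the user never gets a different reading just by adding or dropping a decimal.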

IMHO I would never use the space as a decimal indicator (if you use it as a group separator, use it as the only digit group separator; some programming languages, e.g. Java, allow _ to be used as a digit group separator); nobody uses it that way. It's preferable to use no decimal indicator at all (making the number an integer scaled by 10, 100, or 1000, as desktop calculators have done for a long time), because people doing quick data entry prefer to key the extra zeros rather than move a finger to find the decimal point and then type two more digits most of the time. All the more so if they have to go to the letter keys to find the space bar. (Of course it is even harder to go there for the underscore _ character, but quick typists don't use group separators.)

On the other hand, people normally don't key thousands separators; they are there just for readability (computers print them, but never read them). In this scenario, users sometimes don't want the rigid rule of three-digit groups, but want to group digits arbitrarily. This leads to situations where the user wants groups of three to the left of the decimal point and groups of five or ten to the right (something you don't contemplate at all), making e.g. pi appear as:

3.14159 26535 89793 23846 26433 83

I agree that using the alternate decimal point as a group separator could be interesting, but on both sides of the actual decimal point, and never forcing groups of three.

Anyway, just to fit your specs, I've written the following lex(1) specification to parse your input.

pfx     [1-9][0-9]?[0-9]?
grp [0-9][0-9][0-9]
dec [0-9]*

e1 [+-]?{pfx}([.]{grp})*([,]{dec})?
e2 [+-]?{pfx}([,]{grp})*([.]{dec})?
e3 [+-]?{pfx}([ ]{grp})*([.,]{dec})?
e4 [+-]?[1-9][0-9]*([,.]{dec})?
e5 [+-]?0?([,.]{dec})?
%%
{e1}|{e2}|{e3}|{e4}|{e5} printf("\033[32m[%s]\033[m\n", yytext);
[0-9., +-]* printf("\033[31m[%s]\033[m\n", yytext);
. |
\n |
\t ;
%%
int main()
{
    yylex();
    return 0;
}

int yywrap()
{
    return 1;
}
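
If you don't have lex(1) at hand, the five definitions e1 to e5 can be approximated in Python. This is a sketch, not a replacement for the scanner: it tests one whole string at a time with re.fullmatch instead of tokenizing a stream, and classify is a hypothetical helper that reports which definitions accept the input.

```python
import re

# Building blocks, mirroring the lex definitions section.
PFX = r"[1-9][0-9]?[0-9]?"
GRP = r"[0-9]{3}"
DEC = r"[0-9]*"

ALTERNATIVES = {
    "e1": rf"[+-]?{PFX}([.]{GRP})*([,]{DEC})?",
    "e2": rf"[+-]?{PFX}([,]{GRP})*([.]{DEC})?",
    "e3": rf"[+-]?{PFX}([ ]{GRP})*([.,]{DEC})?",
    "e4": rf"[+-]?[1-9][0-9]*([,.]{DEC})?",
    "e5": rf"[+-]?0?([,.]{DEC})?",
}


def classify(text):
    """Return the names of all definitions that accept the whole string."""
    return [name for name, pattern in ALTERNATIVES.items()
            if re.fullmatch(pattern, text)]


if __name__ == "__main__":
    for sample in ["1.234,56", "1,234.56", "1 234,5", "1234.5", "0,5", "1,234", "01,234"]:
        print(repr(sample), classify(sample))
```

An input such as 1,234 is accepted by several definitions at once, which is the ambiguity discussed at the start of this answer.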

Your regular expression, complete, should be something like:

[+-]?[0-9]{1,3}([ ][0-9]{3})*([,.]([0-9]{3}[ ])*[0-9]{1,3})?|[+-]?[0-9]{1,3}([ ][0-9]{3})*([,.][0-9]{0,2})?|[+-]?[0-9]{0,2}[,.]([0-9]{3}[ ])*[0-9]{1,3}|[+-]?[0-9]{1,3}([,][0-9]{3})*([.]([0-9]{3}[,])*[0-9]{1,3})?|[+-]?[0-9]{1,3}([,][0-9]{3})*([.][0-9]{0,2})?|[+-]?[0-9]{0,2}[.]([0-9]{3}[,])*[0-9]{1,3}|[+-]?[0-9]{1,3}([.][0-9]{3})*([,]([0-9]{3}[.])*[0-9]{1,3})?|[+-]?[0-9]{1,3}([.][0-9]{3})*([,][0-9]{0,2})?|[+-]?[0-9]{0,2}[,]([0-9]{3}[.])*[0-9]{1,3}|[+-]?[0-9]*[,.][0-9]+|[+-]?[0-9]+[,.][0-9]*|[+-]?[0-9]+

Note

Some regexp libraries don't treat the | operator as commutative, as it should be in theory (the worst case I know of is regex101.com): backtracking engines take the first alternative that matches instead of the longest one, which forces you to put the operands in a particular order to match some strings. I consider this a bug in the library, but unfortunately it is widespread. The regexp above works fine with sed(1), but it doesn't match correctly on regex101 (there should be far fewer matches).
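
The ordering effect is easy to reproduce. The sketch below uses Python's re module, which is a backtracking engine and tries alternatives from left to right; first_match is a helper name made up for this example, and the two patterns are shortened versions of the last alternatives in the regexp above.

```python
import re


def first_match(pattern, text):
    """Return what the pattern matches at the start of text, or None."""
    m = re.match(pattern, text)
    return m.group(0) if m else None


if __name__ == "__main__":
    # Same two alternatives, in a different order.
    print(first_match(r"[0-9]+|[0-9]+[,.][0-9]*", "12.5"))  # integer alternative first
    print(first_match(r"[0-9]+[,.][0-9]*|[0-9]+", "12.5"))  # decimal alternative first
```

With the integer alternative first, the match stops at 12; with the decimal alternative first, the whole 12.5 is matched. A POSIX tool such as sed(1) returns the longest match in both cases.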

I've also written a bash script (shown below) that uses sed(1) with the above regexp, so you can see how it works on your own machine:

dig="[0-9]"

af0="${dig}{0,2}"
af1="${dig}{1,3}"
grp="${dig}{3}"

t01="[+-]?${af1}([ ]${grp})*([,.](${grp}[ ])*${af1})?"
t02="[+-]?${af1}([ ]${grp})*([,.]${af0})?"
t03="[+-]?${af0}[,.](${grp}[ ])*${af1}"

t04="[+-]?${af1}([,]${grp})*([.](${grp}[,])*${af1})?"
t05="[+-]?${af1}([,]${grp})*([.]${af0})?"
t06="[+-]?${af0}[.](${grp}[,])*${af1}"

t07="[+-]?${af1}([.]${grp})*([,](${grp}[.])*${af1})?"
t08="[+-]?${af1}([.]${grp})*([,]${af0})?"
t09="[+-]?${af0}[,](${grp}[.])*${af1}"

t10="[+-]?${dig}*[,.]${dig}+"
t11="[+-]?${dig}+[,.]${dig}*"
t12="[+-]?${dig}+"

s01="${t01}|${t02}|${t03}"
s02="${t04}|${t05}|${t06}"
s03="${t07}|${t08}|${t09}"
s04="${t10}|${t11}|${t12}"

reg="${s01}|${s02}|${s03}|${s04}"

echo "$reg"

sed -E -e "s/${reg}/<&>/g"

You can find all this code (and updates) here.

Regex valid numbers with thousand separator

This seems to work: ^\d+(,\d{3})*(\.\d+)?$
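
A small Python check of this pattern, as a sketch (LOOSE and is_valid are names chosen for this example). Note that the leading \d+ is unbounded, so a value like 1234,567 is accepted as well; whether that is acceptable depends on your requirements.

```python
import re

# Pattern from the answer, unchanged.
LOOSE = re.compile(r"^\d+(,\d{3})*(\.\d+)?$")


def is_valid(text):
    """Return True if the whole string is accepted by the pattern."""
    return LOOSE.match(text) is not None


if __name__ == "__main__":
    for sample in ["1,234.56", "1234", "1,23", "1234,567"]:
        print(sample, is_valid(sample))
```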

Regex to separate thousands with comma and keep two decimals

If you really insist on doing this purely in regex (and truncate instead of round the fractional digits), the only solution I can think of is to use a replacement function as the second argument to .replace():

('' + num).replace(
    /(\d)(?=(?:\d{3})+(?:\.|$))|(\.\d\d?)\d*$/g,
    function (m, s1, s2) {
        return s2 || (s1 + ',');
    }
);

This makes all your test cases pass:

function format(num) {
    return ('' + num).replace(
        /(\d)(?=(?:\d{3})+(?:\.|$))|(\.\d\d?)\d*$/g,
        function (m, s1, s2) {
            return s2 || (s1 + ',');
        }
    );
}

test(7456, "7,456");
test(45345, "45,345");
test(25.23523534, "25.23"); // truncated, not rounded
test(3333.239, "3,333.23"); // truncated, not rounded
test(234.99, "234.99");
test(2300.99, "2,300.99");
test(23123123123.22, "23,123,123,123.22");

function test(num, expected) {
    var actual = format(num);
    console.log(num + ' -> ' + expected + ' => ' + actual + ': ' + (actual === expected ? 'passed' : 'failed'));
}
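
The same replacement-function technique carries over to other languages. Here is a sketch in Python, assuming re.sub with a callable behaves like JavaScript's .replace() with a function for this pattern; format_number and PATTERN are names invented for this example.

```python
import re

# Same regex as the JavaScript version: group 1 is a digit that needs a comma
# after it, group 2 is the decimal point plus at most two fractional digits.
PATTERN = re.compile(r"(\d)(?=(?:\d{3})+(?:\.|$))|(\.\d\d?)\d*$")


def format_number(num):
    """Insert thousands separators and truncate to two decimals."""
    return PATTERN.sub(
        # group(2) is None when the first alternative matched
        lambda m: m.group(2) or m.group(1) + ",",
        str(num),
    )


if __name__ == "__main__":
    for value in [7456, "25.23523534", "3333.239", 2300.99, "23123123123.22"]:
        print(value, "->", format_number(value))
```

Some values are passed as strings here to keep the example independent of how floats are converted to text.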

Python regex: Subsentence that contains number which can have thousand separator and decimal

You may use this regex with look-around assertions:

(?<=[.,] )(?:[^,.]*?\d+(?:[.,]\d+)*)+[^.,]*(?=[,.])


RegEx Details:

  • (?<=[.,] ): Lookbehind assertion to assert that we have comma or dot followed by a space before the current position
  • (?:: Start a non-capture group
    • [^,.]*?: Match 0 or more of any character that are not , and . (lazy)
    • \d+(?:[.,]\d+)*: Match a number that may contain . or ,
  • )+: End non-capture group. + repeats this group 1+ times
  • [^.,]*: Match 0 or more of any character that are not , and .
  • (?=[,.]): Lookahead assertion to assert that we have comma or dot after the current position
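
A minimal usage sketch with re.findall, following the details above. The sample sentences and the helper name find_subsentences are made up for this illustration; since the pattern has no capturing groups, findall returns the full matches.

```python
import re

# Pattern from the answer, unchanged.
SUBSENTENCE = re.compile(r"(?<=[.,] )(?:[^,.]*?\d+(?:[.,]\d+)*)+[^.,]*(?=[,.])")


def find_subsentences(text):
    """Return the subsentences that contain at least one number."""
    return SUBSENTENCE.findall(text)


if __name__ == "__main__":
    print(find_subsentences("Hello world, the price is 1,234.56 dollars today, thanks."))
    print(find_subsentences("No numbers here, nothing to see, really."))
```

The lookbehind means the first subsentence of a text is never returned, because nothing precedes it.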

