Regular expression to match decimal numbers with comma as a separator
You may use
regmatches(x, gregexpr("\\d+(?:,\\d+)?", x))
To do the same with stringr, use stringr::str_extract_all, which "extracts all pieces of a string that match a pattern":
library(stringr)
str_extract_all(x, "\\d+(?:,\\d+)?")
Note that \d in stringr functions may match all Unicode digits, such as
0123456789٠١٢٣٤٥٦٧٨٩۰۱۲۳۴۵۶۷۸۹߀߁߂߃߄߅߆߇߈߉०१२३४५६७८९০১২৩৪৫৬৭৮৯੦੧੨੩੪੫੬੭੮੯૦૧૨૩૪૫૬૭૮૯୦୧୨୩୪୫୬୭୮୯௦௧௨௩௪௫௬௭௮௯౦౧౨౩౪౫౬౭౮౯೦೧೨೩೪೫೬೭೮೯൦൧൨൩൪൫൬൭൮൯๐๑๒๓๔๕๖๗๘๙໐໑໒໓໔໕໖໗໘໙༠༡༢༣༤༥༦༧༨༩၀၁၂၃၄၅၆၇၈၉႐႑႒႓႔႕႖႗႘႙០១២៣៤៥៦៧៨៩᠐᠑᠒᠓᠔᠕᠖᠗᠘᠙᥆᥇᥈᥉᥊᥋᥌᥍᥎᥏᧐᧑᧒᧓᧔᧕᧖᧗᧘᧙᭐᭑᭒᭓᭔᭕᭖᭗᭘᭙᮰᮱᮲᮳᮴᮵᮶᮷᮸᮹᱀᱁᱂᱃᱄᱅᱆᱇᱈᱉᱐᱑᱒᱓᱔᱕᱖᱗᱘᱙꘠꘡꘢꘣꘤꘥꘦꘧꘨꘩꣐꣑꣒꣓꣔꣕꣖꣗꣘꣙꤀꤁꤂꤃꤄꤅꤆꤇꤈꤉꩐꩑꩒꩓꩔꩕꩖꩗꩘꩙0123456789
So it is probably safer to use the explicit ASCII class:
str_extract_all(x, "[0-9]+(?:,[0-9]+)?")
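The same caveat applies outside R. As a rough illustration in Python (an assumption on my part; the answer itself only covers R), \d in a str pattern also matches Unicode decimal digits unless re.ASCII is set, while [0-9] stays ASCII-only. The sample text is made up for the example.

```python
import re

# Hypothetical sample: "12,5" and "7" use ASCII digits; the last token is
# written with Arabic-Indic digits (U+0663, U+0664).
text = "x = 12,5 and 7, then \u0663\u0664"

# \d is Unicode-aware for str patterns, so it also picks up the last token.
unicode_hits = re.findall(r"\d+(?:,\d+)?", text)

# The explicit class only accepts ASCII 0-9.
ascii_hits = re.findall(r"[0-9]+(?:,[0-9]+)?", text)

print(unicode_hits)
print(ascii_hits)  # ['12,5', '7']
```

Passing re.ASCII to the first call would make both patterns behave the same way.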
Regex a decimal number with comma
This is a very long and convoluted regular expression that fits all your requirements. It will work if your regex engine is based on PCRE (hopefully you're using PHP, Delphi or R).
(?<=[^\d,.]|^)\d{1,3}(,(\d{3}))*((?=[,.](\s|$))|(\.\d+)?(?=[^\d,.]|$))
The things that make it so long:
- Matching multiple numbers on the same line separated by only 1 character (a space) whilst not allowing partial matches requires a lookahead and a lookbehind.
- Matching numbers ending with . and , without including the . or , in the match requires another lookahead:
(?=[,.](\s|$))
Explanation
When writing this explanation I realised the \s needs to be a (\s|$) to match 1, at the very end of a string.
This part of the regex is for matching the 1 in 1, or the 1,000 in 1,000. So let's say our number is 1,000. (with the . on the end).
Up to this point the regex has matched 1,000, then it can't find another , to repeat the thousands group, so it moves on to our (?=[,.](\s|$)).
(?=....) means it's a lookahead: from where we have matched up to, look at what's coming but don't add it to the match.
So it checks if there is a , or a . and, if there is, it checks that it's immediately followed by whitespace or the end of input. In this case it is, so it leaves the match as 1,000.
Had the lookahead not matched, it would have moved on to trying to match decimal places.
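For readers who want to try this outside a PCRE engine, here is a minimal sketch in Python. It is an adaptation, not the original pattern: Python's re module rejects the variable-width lookbehind (?<=[^\d,.]|^), so the sketch swaps in the negative lookbehind (?<![\d,.]), which should accept the same positions. The groups are made non-capturing so findall() returns whole matches. The function name and sample sentence are made up for illustration.

```python
import re

# Adaptation for Python's re module: (?<![\d,.]) stands in for (?<=[^\d,.]|^).
NUMBER = re.compile(
    r"(?<![\d,.])"                # not preceded by a digit, comma or dot
    r"\d{1,3}(?:,\d{3})*"         # 1-3 digits, then comma-separated thousands groups
    r"(?:(?=[,.](?:\s|$))"        # either stop before trailing punctuation...
    r"|(?:\.\d+)?(?=[^\d,.]|$))"  # ...or take optional decimals up to a boundary
)

def find_numbers(text):
    """Return every number found in text (hypothetical helper)."""
    return NUMBER.findall(text)

print(find_numbers("Costs 1,234.56 and 12 items, or 1,000."))
# ['1,234.56', '12', '1,000']
```

The trailing dot after 1,000 is left out of the match, which is the case the lookahead explanation above walks through.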
Regular expression to match number with Decimal separator and optional Thousands separator
The reason the second alternative isn't matching is that it only allows a single \d after the decimal point. That needs to be \d+.
Then you need to wrap everything between ^ and $ in a group, so all alternatives match the entire string.
You had lots of redundant parentheses. And \d* in the last alternative should be \d+, otherwise you'll allow a number that's completely empty or just a sign.
^[+-]?([0-9]{1,3}(,[0-9]{3})*(\.[0-9]+)?|\d*\.\d+|\d+)$
^ -> start of string
[+-]? -> matches optional + or - char
([0-9]{1,3}(,[0-9]{3})*(\.[0-9]+)?|\d*\.\d+|\d+) -> whole group has to match [0-9]{1,3}(,[0-9]{3})*(\.[0-9]+)? or \d*\.\d+ or \d+
[0-9]{1,3}(,[0-9]{3})*(\.[0-9]+)? -> matches numbers with thousands separators and maybe a decimal separator
\d*\.\d+ -> matches numbers with a decimal separator, and maybe digits before it
\d+ -> matches numbers without a decimal separator
$ -> end of string
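A quick way to see the three alternatives at work is to run the pattern over a few inputs. The sketch below uses Python as an assumption (the question did not name a language); is_number is a hypothetical helper, and re.ASCII keeps \d limited to 0-9.

```python
import re

NUMBER = re.compile(
    r"^[+-]?([0-9]{1,3}(,[0-9]{3})*(\.[0-9]+)?|\d*\.\d+|\d+)$",
    re.ASCII,
)

def is_number(s):
    # fullmatch also rejects a trailing newline, which a bare $ would tolerate.
    return NUMBER.fullmatch(s) is not None

for sample in ["1,234.56", "+12", ".5", "1234", "1,23", "-", ""]:
    print(repr(sample), is_number(sample))
```

The first four samples should be accepted (one per alternative, plus a signed value), while "1,23", a lone sign and the empty string should be rejected.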
RegEx validation of decimal numbers with comma or dot
Try this regex:
/^(\d+(?:[\.\,]\d{2})?)$/
If $1 exactly matches your input string, then assume that it is validated.
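As a sketch of that check in Python (an assumed language; the answer's /.../ delimiters suggest PHP or JavaScript), the comparison of group 1 against the input is spelled out. The validate name is hypothetical.

```python
import re

AMOUNT = re.compile(r"^(\d+(?:[\.\,]\d{2})?)$")

def validate(s):
    m = AMOUNT.match(s)
    # Accept only when capture group 1 is exactly the input string.
    return m is not None and m.group(1) == s

print(validate("12,50"))  # True
print(validate("12.5"))   # False: the decimal part needs exactly two digits
```

Comparing group 1 with the input is not redundant in Python: $ also matches just before a trailing newline, so "12\n" matches the pattern but fails the comparison.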
Regular expression to match numbers with or without commas and decimals in text
EDIT: Since this has gotten a lot of views, let me start by giving everybody what they Googled for:
#ALL THESE REQUIRE THE WHOLE STRING TO BE A NUMBER
#For numbers embedded in sentences, see discussion below
#### NUMBERS AND DECIMALS ONLY ####
#No commas allowed
#Pass: (1000.0), (001), (.001)
#Fail: (1,000.0)
^\d*\.?\d+$
#No commas allowed
#Can't start with "."
#Pass: (0.01)
#Fail: (.01)
^(\d+\.)?\d+$
#### CURRENCY ####
#No commas allowed
#"$" optional
#Can't start with "."
#Either 0 or 2 decimal digits
#Pass: ($1000), (1.00), ($0.11)
#Fail: ($1.0), (1.), ($1.000), ($.11)
^\$?\d+(\.\d{2})?$
#### COMMA-GROUPED ####
#Commas required between powers of 1,000
#Can't start with "."
#Pass: (1,000,000), (0.001)
#Fail: (1000000), (1,00,00,00), (.001)
^\d{1,3}(,\d{3})*(\.\d+)?$
#Commas required
#Cannot be empty
#Pass: (1,000.100), (.001)
#Fail: (1000), ()
^(?=.)(\d{1,3}(,\d{3})*)?(\.\d+)?$
#Commas optional as long as they're consistent
#Can't start with "."
#Pass: (1,000,000), (1000000)
#Fail: (10000,000), (1,00,00)
^(\d+|\d{1,3}(,\d{3})*)(\.\d+)?$
#### LEADING AND TRAILING ZEROES ####
#No commas allowed
#Can't start with "."
#No leading zeroes in integer part
#Pass: (1.00), (0.00)
#Fail: (001)
^([1-9]\d*|0)(\.\d+)?$
#No commas allowed
#Can't start with "."
#No trailing zeroes in decimal part
#Pass: (1), (0.1)
#Fail: (1.00), (0.1000)
^\d+(\.\d*[1-9])?$
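If you want to double-check the list above before copying a pattern, a small table-driven test does the job. This is only a sketch in Python (my choice of language, not the answer's); the patterns and the Pass/Fail strings are copied from the list, and check_cases is a hypothetical helper.

```python
import re

# (pattern, strings that should pass, strings that should fail)
CASES = [
    (r"^\d*\.?\d+$", ["1000.0", "001", ".001"], ["1,000.0"]),
    (r"^(\d+\.)?\d+$", ["0.01"], [".01"]),
    (r"^\$?\d+(\.\d{2})?$", ["$1000", "1.00", "$0.11"],
     ["$1.0", "1.", "$1.000", "$.11"]),
    (r"^\d{1,3}(,\d{3})*(\.\d+)?$", ["1,000,000", "0.001"],
     ["1000000", "1,00,00,00", ".001"]),
    (r"^(?=.)(\d{1,3}(,\d{3})*)?(\.\d+)?$", ["1,000.100", ".001"], ["1000", ""]),
    (r"^(\d+|\d{1,3}(,\d{3})*)(\.\d+)?$", ["1,000,000", "1000000"],
     ["10000,000", "1,00,00"]),
    (r"^([1-9]\d*|0)(\.\d+)?$", ["1.00", "0.00"], ["001"]),
    (r"^\d+(\.\d*[1-9])?$", ["1", "0.1"], ["1.00", "0.1000"]),
]

def check_cases(cases):
    """Return (pattern, string, expected) for every example that misbehaves."""
    mismatches = []
    for pattern, should_pass, should_fail in cases:
        rx = re.compile(pattern)
        for s in should_pass:
            if rx.fullmatch(s) is None:
                mismatches.append((pattern, s, "pass"))
        for s in should_fail:
            if rx.fullmatch(s) is not None:
                mismatches.append((pattern, s, "fail"))
    return mismatches

print(check_cases(CASES))  # an empty list means every example behaves as documented
```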
Now that that's out of the way, most of the following is meant as commentary on how complex regex can get if you try to be clever with it, and why you should seek alternatives. Read at your own risk.
This is a very common task, but all the answers I see here so far will accept inputs that don't match your number format, such as ",111", "9,9,9", or even ".,,.". That's simple enough to fix, even if the numbers are embedded in other text. IMHO anything that fails to pull 1,234.56 and 1234 (and only those numbers) out of abc22 1,234.56 9.9.9.9 def 1234 is a wrong answer.
First of all, if you don't need to do this all in one regex, don't. A single regex for two different number formats is hard to maintain even when they aren't embedded in other text. What you should really do is split the whole thing on whitespace, then run two or three smaller regexes on the results. If that's not an option for you, keep reading.
Basic pattern
Considering the examples you've given, here's a simple regex that allows pretty much any integer or decimal in 0000 format and blocks everything else:
^\d*\.?\d+$
Here's one that requires 0,000 format:
^\d{1,3}(,\d{3})*(\.\d+)?$
Put them together, and commas become optional as long as they're consistent:
^(\d*\.?\d+|\d{1,3}(,\d{3})*(\.\d+)?)$
Embedded numbers
The patterns above require the entire input to be a number. You're looking for numbers embedded in text, so you have to loosen that part. On the other hand, you don't want it to see catch22 and think it's found the number 22. If you're using something with lookbehind support (like C#, .NET 4.0+), this is pretty easy: replace ^ with (?<!\S) and $ with (?!\S) and you're good to go:
(?<!\S)(\d*\.?\d+|\d{1,3}(,\d{3})*(\.\d+)?)(?!\S)
If you're working with an engine that lacks lookbehind (older JavaScript or Ruby versions, for example), things start looking more complex:
(?:^|\s)(\d*\.?\d+|\d{1,3}(?:,\d{3})*(?:\.\d+)?)(?!\S)
You'll have to use capture groups; I can't think of an alternative without lookbehind support. The numbers you want will be in Group 1 (assuming the whole match is Group 0).
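Both variants can be tried against the test sentence from earlier. The sketch below uses Python, which is an assumption (the answer talks about .NET, JavaScript and Ruby); Python's re supports the fixed-width lookbehind used here. In the first pattern the groups are made non-capturing so that findall() returns whole matches.

```python
import re

TEXT = "abc22 1,234.56 9.9.9.9 def 1234"

# Lookbehind variant: the whole match is the number.
WITH_LOOKBEHIND = re.compile(
    r"(?<!\S)(?:\d*\.?\d+|\d{1,3}(?:,\d{3})*(?:\.\d+)?)(?!\S)"
)

# Capture-group variant: the leading whitespace is consumed, the number is group 1.
WITH_GROUP = re.compile(
    r"(?:^|\s)(\d*\.?\d+|\d{1,3}(?:,\d{3})*(?:\.\d+)?)(?!\S)"
)

lookbehind_hits = WITH_LOOKBEHIND.findall(TEXT)
group_hits = WITH_GROUP.findall(TEXT)  # with one capture group, findall returns group 1

print(lookbehind_hits)  # ['1,234.56', '1234']
print(group_hits)       # ['1,234.56', '1234']
```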
Validation and more complex rules
I think that covers your question, so if that's all you need, stop reading now. If you want to get fancier, things turn very complex very quickly. Depending on your situation, you may want to block any or all of the following:
- Empty input
- Leading zeroes (e.g. 000123)
- Trailing zeroes (e.g. 1.2340000)
- Decimals starting with the decimal point (e.g. .001 as opposed to 0.001)
Just for the hell of it, let's assume you want to block the first 3, but allow the last one. What should you do? I'll tell you what you should do, you should use a different regex for each rule and progressively narrow down your matches. But for the sake of the challenge, here's how you do it all in one giant pattern:
(?<!\S)(?=.)(0|([1-9](\d*|\d{0,2}(,\d{3})*)))?(\.\d*[1-9])?(?!\S)
And here's what it means:
(?<!\S) to (?!\S) #The whole match must be surrounded by either whitespace or line boundaries. So if you see something bogus like :;:9.:, ignore the 9.
(?=.) #The whole thing can't be blank.
( #Rules for the integer part:
0 #1. The integer part could just be 0...
| #
[1-9] # ...otherwise, it can't have leading zeroes.
( #
\d* #2. It could use no commas at all...
| #
\d{0,2}(,\d{3})* # ...or it could be comma-separated groups of 3 digits each.
) #
)? #3. Or there could be no integer part at all.
( #Rules for the decimal part:
\. #1. It must start with a decimal point...
\d* #2. ...followed by a string of numeric digits only.
[1-9] #3. It can't be just the decimal point, and it can't end in 0.
)? #4. The whole decimal part is also optional. Remember, we checked at the beginning to make sure the whole thing wasn't blank.
Tested here: http://rextester.com/YPG96786
This will allow things like:
100,000
999.999
90.0009
1,000,023.999
0.111
.111
0
It will block things like:
1,1,1.111
000,001.111
999.
0.
111.110000
1.1.1.111
9.909,888
There are several ways to make this regex simpler and shorter, but understand that changing the pattern will loosen what it considers a number.
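To check the two lists above mechanically, here is a short sketch in Python (an assumed language; the linked test uses a different environment). Each token is validated on its own with fullmatch, so the whitespace lookarounds are trivially satisfied; is_valid is a hypothetical helper.

```python
import re

STRICT = re.compile(
    r"(?<!\S)(?=.)(0|([1-9](\d*|\d{0,2}(,\d{3})*)))?(\.\d*[1-9])?(?!\S)"
)

ALLOWED = ["100,000", "999.999", "90.0009", "1,000,023.999", "0.111", ".111", "0"]
BLOCKED = ["1,1,1.111", "000,001.111", "999.", "0.", "111.110000",
           "1.1.1.111", "9.909,888"]

def is_valid(token):
    # fullmatch: the pattern has to consume the whole token.
    return STRICT.fullmatch(token) is not None

print([t for t in ALLOWED if not is_valid(t)])  # []
print([t for t in BLOCKED if is_valid(t)])      # []
```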
Since some regex engines (e.g. JavaScript before ES2018 and Ruby 1.8) don't support negative lookbehind, the only way to do this correctly there is with capture groups:
(?:^|\s)(?=.)((?:0|(?:[1-9](?:\d*|\d{0,2}(?:,\d{3})*)))?(?:\.\d*[1-9])?)(?!\S)
The numbers you're looking for will be in capture group 1.
Tested here: http://rubular.com/r/3HCSkndzhT
One final note
Obviously, this is a massive, complicated, nigh-unreadable regex. I enjoyed the challenge, but you should consider whether you really want to use this in a production environment. Instead of trying to do everything in one step, you could do it in two: a regex to catch anything that might be a number, then another one to weed out whatever isn't a number. Or you could do some basic processing, then use your language's built-in number parsing functions. Your choice.
Regex that allows numbers with commas and two decimals
Try the following regex.
Regex: (?:\+|\-|\$)?\d{1,}(?:\,?\d{3})*(?:\.\d+)?%?
Explanation:
(?:\+|\-|\$)? matches either +, - or $ in front of a number; it is optional because the ? quantifier is used.
\d{1,} matches the integer part even if it doesn't contain a comma.
(?:\,?\d{3})* matches multiple occurrences of comma-separated digit groups, if present.
(?:\.\d+)? matches the optional decimal part.
%? matches an optional % character at the end.
?: marks a non-capturing group. It will match but won't store the text for back-referencing.
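Here is a small sketch of the pattern in use, written in Python as an assumption (the question did not fix a language) with a made-up sentence. Note that the pattern has no anchors or boundary checks, so it also picks digits out of the middle of a word.

```python
import re

LOOSE = re.compile(r"(?:\+|\-|\$)?\d{1,}(?:\,?\d{3})*(?:\.\d+)?%?")

text = "Paid $1,234.50, up +15% from 1000"
hits = LOOSE.findall(text)
print(hits)  # ['$1,234.50', '+15%', '1000']

# No boundaries are enforced, so embedded digits match as well.
print(LOOSE.findall("abc22"))  # ['22']
```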
Regular expression for a positive decimal comma separator with 2 decimal places MVC
You can use
^(?!0?(,0?0)?$)([0-9]{0,3}(,[0-9]{1,2})?)?$
Explanation:
^ - start of string
(?!0?(,0?0)?$) - a negative lookahead forbidding the string to equal "0", "0,0", "0,00" or even ",0"
([0-9]{0,3}(,[0-9]{1,2})?)? - optional group (matches one or zero times due to ? at the end) matching:
[0-9]{0,3} - zero to three digits
(,[0-9]{1,2})? - optionally matches a group of a comma followed by 1 or 2 digits
$ - end of string
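The effect of the lookahead is easiest to see on the zero-like inputs. A minimal sketch in Python follows (an assumption, since the question is about MVC validation, where the same pattern would sit in a validation attribute); is_positive_decimal is a hypothetical name.

```python
import re

DECIMAL = re.compile(r"^(?!0?(,0?0)?$)([0-9]{0,3}(,[0-9]{1,2})?)?$")

def is_positive_decimal(s):
    return DECIMAL.fullmatch(s) is not None

for sample in ["12,5", "999,99", "0,01", "0", "0,00", ",0", "", "1234"]:
    print(repr(sample), is_positive_decimal(sample))
```

The first three samples should be accepted. The zero-like strings and the empty string are rejected by the lookahead, and "1234" is rejected by the three-digit limit.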
Insert comma into digits and decimal number with regex
You may use this regex in JavaScript (modern JavaScript supports lookbehind):
(?<!\.\d*)(\d)(?=(?:\d{3})+(?:\.|$))
RegEx Details:
(?<!\.\d*) : Negative lookbehind to assert that we don't have a decimal point before the current position
(\d) : Match a digit and capture it in group #1
(?= : Start lookahead
(?:\d{3})+ : Make sure we have 1 or more sets of 3 digits ahead
(?:\.|$) : that are followed by a dot or end of line
) : End lookahead
Or if you're on PCRE then use:
\.\d+$(*SKIP)(*F)|(\d)(?=(?:\d{3})+(?:\.|$))
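Neither pattern works unchanged in Python's built-in re module, which has no variable-width lookbehind and no (*SKIP)(*F) verbs. As a rough alternative (a swapped-in technique, not the answer's regex), one can split off the fractional part first and apply only the lookahead to the integer part. The function name is hypothetical, and the input is assumed to be a single number with at most one dot.

```python
import re

def add_thousands_separators(number):
    # Split off the fractional part so its digits are never grouped.
    int_part, dot, frac_part = number.partition(".")
    # Insert a comma after any digit followed by one or more complete
    # groups of three digits up to the end of the integer part.
    grouped = re.sub(r"(\d)(?=(?:\d{3})+$)", r"\1,", int_part)
    return grouped + dot + frac_part

print(add_thousands_separators("1234567.891234"))  # 1,234,567.891234
print(add_thousands_separators("123"))             # 123
```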