Why is \d slower than [0-9]?
\d checks all Unicode digits, while [0-9] is limited to these 10 characters. For example, the Persian digits ۱۲۳۴۵۶۷۸۹ are Unicode digits that \d matches but [0-9] does not.
You can generate a list of all such characters using the following code:
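// Requires: using System; using System.Text; using System.Text.RegularExpressions;
var sb = new StringBuilder();
// Test every UTF-16 code unit (U+0000 through U+FFFF) against \d.
for (int i = 0; i <= char.MaxValue; i++)
{
    string str = ((char)i).ToString();
    if (Regex.IsMatch(str, @"\d"))
        sb.Append(str);
}
Console.WriteLine(sb.ToString());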
Which generates:
0123456789٠١٢٣٤٥٦٧٨٩۰۱۲۳۴۵۶۷۸۹߀߁߂߃߄߅߆߇߈߉०१२३४५६७८९০১২৩৪৫৬৭৮৯੦੧੨੩੪੫੬੭੮੯૦૧૨૩૪૫૬૭૮૯୦୧୨୩୪୫୬୭୮୯௦௧௨௩௪௫௬௭௮௯౦౧౨౩౪౫౬౭౮౯೦೧೨೩೪೫೬೭೮೯൦൧൨൩൪൫൬൭൮൯๐๑๒๓๔๕๖๗๘๙໐໑໒໓໔໕໖໗໘໙༠༡༢༣༤༥༦༧༨༩၀၁၂၃၄၅၆၇၈၉႐႑႒႓႔႕႖႗႘႙០១២៣៤៥៦៧៨៩᠐᠑᠒᠓᠔᠕᠖᠗᠘᠙᥆᥇᥈᥉᥊᥋᥌᥍᥎᥏᧐᧑᧒᧓᧔᧕᧖᧗᧘᧙᭐᭑᭒᭓᭔᭕᭖᭗᭘᭙᮰᮱᮲᮳᮴᮵᮶᮷᮸᮹᱀᱁᱂᱃᱄᱅᱆᱇᱈᱉᱐᱑᱒᱓᱔᱕᱖᱗᱘᱙꘠꘡꘢꘣꘤꘥꘦꘧꘨꘩꣐꣑꣒꣓꣔꣕꣖꣗꣘꣙꤀꤁꤂꤃꤄꤅꤆꤇꤈꤉꩐꩑꩒꩓꩔꩕꩖꩗꩘꩙0123456789
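The same distinction can be sketched in Python, whose re module treats \d as "any Unicode decimal digit" for str patterns by default. This is only an analogy to the .NET behaviour described above, not a measurement of it; the variable names are illustrative.

```python
import re

persian = "۱۲۳"  # Extended Arabic-Indic digits U+06F1..U+06F3

# For str patterns, \d matches any Unicode decimal digit by default.
unicode_match = re.fullmatch(r"\d+", persian) is not None

# The explicit class only covers the ten ASCII digits.
ascii_class_match = re.fullmatch(r"[0-9]+", persian) is not None

# The re.ASCII flag narrows \d back down to [0-9].
ascii_flag_match = re.fullmatch(r"\d+", persian, flags=re.ASCII) is not None

print(unicode_match, ascii_class_match, ascii_flag_match)  # True False False
```

If only ASCII digits are wanted, writing [0-9] (or restricting \d with an ASCII-only option where the engine offers one) states that intent explicitly.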
Why does Apache Commons consider '१२३' numeric?
Because that "CharSequence contains only Unicode digits" (quoting your linked documentation).
All of these characters return true for Character.isDigit, whose documentation says:
Some Unicode character ranges that contain digits:
- '\u0030' through '\u0039', ISO-LATIN-1 digits ('0' through '9')
- '\u0660' through '\u0669', Arabic-Indic digits
- '\u06F0' through '\u06F9', Extended Arabic-Indic digits
- '\u0966' through '\u096F', Devanagari digits
- '\uFF10' through '\uFF19', Fullwidth digits
Many other character ranges contain digits as well.
१२३ are Devanagari digits:
- १ is DEVANAGARI DIGIT ONE, \u0967
- २ is DEVANAGARI DIGIT TWO, \u0968
- ३ is DEVANAGARI DIGIT THREE, \u0969
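The code points and names above can be checked with a short Python sketch. It uses Python's unicodedata module and str.isdecimal rather than Java's Character.isDigit, so treat it as an analogous check, not the Apache Commons implementation.

```python
import unicodedata

text = "१२३"

# Collect (code point, official Unicode name, numeric digit value) per character.
details = [(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.digit(ch))
           for ch in text]
for row in details:
    print(row)

# True when every character is a Unicode decimal digit.
all_decimal = text.isdecimal()

# Python's int() also accepts Unicode decimal digits.
as_int = int(text)
print(all_decimal, as_int)  # True 123
```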
Looking for regex solution
The pattern would be [0-9]{2}\.[0-9]{2}\.[0-9]{4}
[0-9] matches any single digit from 0 to 9, and the number in curly brackets ({2} in this case) says how many times the preceding pattern must be repeated.
You need to escape the dots with a backslash because otherwise each dot is interpreted as "any character".
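A small Python sketch of why the escaping matters. The sample strings are made up for illustration; the unescaped pattern is shown only as the mistake to avoid.

```python
import re

escaped = re.compile(r"[0-9]{2}\.[0-9]{2}\.[0-9]{4}")
unescaped = re.compile(r"[0-9]{2}.[0-9]{2}.[0-9]{4}")  # dots match any character

good = "31.12.2020"
bad = "31x12y2020"

escaped_good = escaped.fullmatch(good) is not None
escaped_bad = escaped.fullmatch(bad) is not None
unescaped_bad = unescaped.fullmatch(bad) is not None

print(escaped_good, escaped_bad, unescaped_bad)  # True False True
```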
RegEx syntax and efficiency
In terms of performance, {1,} and + are equivalent, but the first has more characters to read, and {1} is never necessary. None of that makes much difference, though.
More generally, it is not just a matter of preference. If you have to match a numeric ID that can be any number from 1 up to some large value, doing it without + (or {1,}, or * after a second \d) would be difficult. Use
\d+
or
[0-9]+
or
[0-9][0-9]*
if you prefer.
Besides, [aA-zZ] matches a and Z (each of them twice, actually) plus anything between A and z, including [, ], _ and so on (see an ASCII table).
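Both points can be checked with a short Python sketch. The names sloppy, strict, and sample are illustrative only.

```python
import re

sloppy = re.compile(r"[aA-zZ]")   # really: 'a', the range A-z, and 'Z'
strict = re.compile(r"[a-zA-Z]")  # letters only

# ASCII characters the sloppy class accepts but the strict one rejects.
extras = [chr(c) for c in range(128)
          if sloppy.fullmatch(chr(c)) and not strict.fullmatch(chr(c))]
print(extras)  # ['[', '\\', ']', '^', '_', '`']

# + and {1,} find exactly the same matches.
sample = "id 7, id 42, id 1234567"
plus_matches = re.findall(r"[0-9]+", sample)
brace_matches = re.findall(r"[0-9]{1,}", sample)
print(plus_matches == brace_matches)  # True
```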
Is (*i).member less efficient than i->member
When you return a reference, that's exactly the same as passing back a pointer, pointer semantics excluded. You pass back a sizeof(void*) element, not a sizeof(yourClass).
So when you do this:
Person& Person::someFunction(){
    ...
    return *this;
}
You return a reference, and that reference has the same intrinsic size as a pointer, so there's no runtime difference.
The same goes for your use of (*i).name, but in that case you create an l-value, which then has the same semantics as a reference.
Python regex that will require at least 6 characters to return true
Try this:
regex = (r"^[a-zA-Z0-9_\s-]{6,}$")
With the re module, this matches any string made of 6 or more of the allowed characters (letters, digits, underscores, whitespace, and hyphens).
For example:
import re
txt = "string"
print(re.search(r"^[a-zA-Z0-9_\s-]{6,}$", txt))
If the string has fewer than 6 characters, there is no match and re.search returns None.
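For a plain true/false answer, the pattern can be wrapped in a small helper. This is a minimal sketch; is_long_enough is a hypothetical name, not something from the question.

```python
import re

PATTERN = re.compile(r"^[a-zA-Z0-9_\s-]{6,}$")

def is_long_enough(text):
    """Return True if text is 6+ characters, all from the allowed set."""
    return PATTERN.search(text) is not None

print(is_long_enough("string"))   # True: exactly 6 allowed characters
print(is_long_enough("short"))    # False: only 5 characters
print(is_long_enough("abc!def"))  # False: '!' is not in the allowed set
```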
How to write the regex pattern to get the matched string?
You can use
^[A-Z]+-[0-9]+\s+-\s+(?:[0-9]+[.)]\s*)?[A-Za-z]+
Explanation:
- ^ - start of string
- [A-Z]+ - 1 or more uppercase ASCII letters
- - - a hyphen
- [0-9]+ - 1 or more digits
- \s+ - 1+ whitespaces
- - - a hyphen
- \s+ - see above
- (?:[0-9]+[.)]\s*)? - an optional sequence of:
  - [0-9]+ - 1+ digits
  - [.)] - a literal . or )
  - \s* - 0+ whitespaces
- [A-Za-z]+ - 1 or more ASCII letters
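A quick way to try the pattern in Python. The sample strings below are invented for illustration, since the original input is not shown here, and matched_prefix is a hypothetical helper name.

```python
import re

PATTERN = re.compile(r"^[A-Z]+-[0-9]+\s+-\s+(?:[0-9]+[.)]\s*)?[A-Za-z]+")

def matched_prefix(text):
    """Return the part of text the pattern matches, or None."""
    m = PATTERN.match(text)
    return m.group(0) if m else None

# Hypothetical inputs: with the optional "1." part, without it, and a non-match.
samples = ["ABC-123 - 1. Title of the item", "XY-7 - Foo bar", "abc-123 - Title"]
results = [matched_prefix(s) for s in samples]
print(results)  # ['ABC-123 - 1. Title', 'XY-7 - Foo', None]
```

The match stops after the first run of letters because the pattern ends with [A-Za-z]+, which does not cross whitespace.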
Strip all numerics with length less than or greater than 6
As Eily mentioned in another comment, the first issue is \b. It is an anchor for a word boundary, so it will not match numbers that sit inside words, as in your examples.
My solution is to remove \b and, to make sure you don't get any weirdness, add a negative lookbehind at the start and a negative lookahead at the end of your pattern.
(?<!\d)(\d{1,5}|\d{7,})(?!\d)
edit: accidentally typed {1,6} instead of {1,5}
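Here is a minimal Python sketch of the pattern in use with re.sub. The input string is made up for illustration. The lookarounds keep the pattern from matching part of a 6-digit run, so only runs of exactly 6 digits survive.

```python
import re

# Digit runs of length 1-5 or 7+, not adjacent to any other digit.
NOT_SIX = re.compile(r"(?<!\d)(\d{1,5}|\d{7,})(?!\d)")

text = "abc123456def 12345 1234567 x12y"  # hypothetical input
stripped = NOT_SIX.sub("", text)
print(repr(stripped))  # 'abc123456def   xy'
```

The spaces around the removed numbers are left in place; collapsing them would be a separate step.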