Match only unicode letters
Starting with ECMAScript 2018, JavaScript finally supports Unicode property escapes natively.
For older versions, you either need to define all the relevant Unicode ranges yourself or use Steven Levithan's XRegExp package with the Unicode add-ons and its Unicode property shortcuts:
var regex = new XRegExp("^\\p{L}*$");
var a = "abcäöüéèê";
if (regex.test(a)) {
  // Match
} else {
  // No Match
}
Matching only a unicode letter in Python re
You can construct a new character class, [^\W\d_], instead of \w. Translated into English, it means "Any character that is not a non-alphanumeric character ([^\W] is the same as \w), but that is also not a digit and not an underscore".
Therefore, it will only allow Unicode letters.
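As a minimal sketch of that character class in use, assuming Python 3 and str input: LETTER and is_all_letters are names made up for this illustration, not part of the re module.

```python
import re

# Word characters (\w) minus digits (\d) minus the underscore leaves letters.
LETTER = re.compile(r"[^\W\d_]")


def is_all_letters(text: str) -> bool:
    """Return True if text is non-empty and consists only of letters."""
    return re.fullmatch(r"[^\W\d_]+", text) is not None


# Pick the letters out of a mixed string (digit and underscore are skipped).
print(LETTER.findall("x1_\u00e9"))

# Validate whole strings.
print(is_all_letters("abc\u00e4\u00f6\u00fc"))
print(is_all_letters("abc123"))
```

Using fullmatch rather than match avoids having to add ^ and $ anchors when the whole string has to consist of letters.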
Match any unicode letter?
Python's re module doesn't support Unicode properties yet. But you can compile your regex using the re.UNICODE flag, and then the character class shorthand \w will match Unicode letters, too.
Since \w will also match digits, you then need to subtract those from your character class, along with the underscore: [^\W\d_] will match any Unicode letter.
>>> import re
>>> r = re.compile(r'[^\W\d_]', re.U)
>>> r.match('x')
<_sre.SRE_Match object at 0x0000000001DBCF38>
>>> r.match(u'é')
<_sre.SRE_Match object at 0x0000000002253030>
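The effect of the flag can be checked directly. The sketch below assumes Python 3, where str patterns are already Unicode-aware without re.UNICODE, so it contrasts the default with re.ASCII instead. The variable names are illustrative only.

```python
import re

# In Python 3, a str pattern is Unicode-aware by default (re.U is implied).
unicode_letter = re.compile(r"[^\W\d_]")

# re.ASCII restricts \w, \W and \d to the ASCII range.
ascii_letter = re.compile(r"[^\W\d_]", re.ASCII)

e_acute = "\u00e9"  # the letter é

print(bool(unicode_letter.match(e_acute)))  # Unicode mode accepts é
print(bool(ascii_letter.match(e_acute)))    # ASCII mode rejects é
print(bool(ascii_letter.match("x")))        # plain ASCII letters still match
```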
Regex - Match only unicode alphabet not numbers
The regex engine needs to know that the target string is a Unicode string (to avoid interpretation errors). To do that, you can use the u modifier, which has two functions:
- it expands classical shorthand character classes like \w and \d to Unicode characters (and not only ASCII characters)
- it forces the string to be seen as a Unicode string
So you can use: /\pL+/u
Note that in your particular case, the first behavior is not needed; you can switch on only the second behavior with: /(*UTF8)\pL+/
((*UTF8) must be placed at the very beginning of the pattern)
JavaScript regex pattern for any visible unicode letter characters
Use the XRegExp library to parse your current regular expression:
var pattern = new XRegExp("^[0-9\\p{L} _.]+$");
var s = "123 Московская Street.";
if (XRegExp.test(s, pattern)) {
  console.log("Valid");
}
<script src="https://cdnjs.cloudflare.com/ajax/libs/xregexp/3.2.0/xregexp-all.min.js"></script>
Matching every Unicode letter only in HTML5 Input form
If you're using a browser that does support \p{} and doesn't require the u switch to enable it, your code works, but you should remove the brackets because they're unnecessary:
<input type="text" pattern="\p{L}+\s\p{L}+">
It worked when I tested it in Chrome.
Older JavaScript versions (before ES2018?) do not support \p{} at all, and some versions may need the u switch to enable it, which won't work here. If you really need it, I suggest that you try the solutions here: How can I use Unicode-aware regular expressions in JavaScript?.
If you just don't like digits, then you can use \D as tamas rev said in the comments. Or maybe [^\d\s] to enforce that your input isn't just spaces.
Note that only matching letters is a bad way to validate names, since it excludes names like "O'Henry". Note that forcing exactly one space to be present excludes languages where the names are not separated with a space (like in the name "蔡英文"), people who only have one name, and people whose names have more than one space ("Mary Jane", "van der Waals"). And some names do have numbers. See Falsehoods Programmers Believe About Names.
Regex: Match everything except unicode letters
The [\W\d_] pattern matches any non-word char (any char not matched with \w), any digit (\d), and _. Note that \d in a Unicode-aware Python 3 regex only matches \p{Nd} (Number, decimal):
Matches any Unicode decimal digit (that is, any character in Unicode character category [Nd]).
The chars this pattern does not remove in your string belong to the \p{No} Unicode category (numbers, other).
So, if you plan to also remove all those chars from \p{No}, you need to add them to the pattern:
r'[\u00B2\u00B3\u00B9\u00BC-\u00BE\u09F4-\u09F9\u0B72-\u0B77\u0BF0-\u0BF2\u0C78-\u0C7E\u0D58-\u0D5E\u0D70-\u0D78\u0F2A-\u0F33\u1369-\u137C\u17F0-\u17F9\u19DA\u2070\u2074-\u2079\u2080-\u2089\u2150-\u215F\u2189\u2460-\u249B\u24EA-\u24FF\u2776-\u2793\u2CFD\u3192-\u3195\u3220-\u3229\u3248-\u324F\u3251-\u325F\u3280-\u3289\u32B1-\u32BF\uA830-\uA835\U00010107-\U00010133\U00010175-\U00010178\U0001018A\U0001018B\U000102E1-\U000102FB\U00010320-\U00010323\U00010858-\U0001085F\U00010879-\U0001087F\U000108A7-\U000108AF\U000108FB-\U000108FF\U00010916-\U0001091B\U000109BC\U000109BD\U000109C0-\U000109CF\U000109D2-\U000109FF\U00010A40-\U00010A47\U00010A7D\U00010A7E\U00010A9D-\U00010A9F\U00010AEB-\U00010AEF\U00010B58-\U00010B5F\U00010B78-\U00010B7F\U00010BA9-\U00010BAF\U00010CFA-\U00010CFF\U00010E60-\U00010E7E\U00011052-\U00011065\U000111E1-\U000111F4\U0001173A\U0001173B\U000118EA-\U000118F2\U00011C5A-\U00011C6C\U00016B5B-\U00016B61\U0001D360-\U0001D371\U0001E8C7-\U0001E8CF\U0001F100-\U0001F10C\W\d_]+'
See the regex demo.
You may see the chars listed on this page.
Also, be aware of the Number, letter category; see the \p{Nl} char list here.
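If maintaining the hard-coded ranges is a concern, one possible alternative is to let the standard unicodedata module decide which leftover characters are numbers. This is only a sketch, assuming the goal is to delete the matched characters from a string. strip_non_letters is a hypothetical helper name, and the result depends on the Unicode database version bundled with your Python interpreter.

```python
import re
import unicodedata


def strip_non_letters(text: str) -> str:
    """Remove everything that is not a letter (hypothetical helper)."""
    # Step 1: drop non-word chars, decimal digits (Nd) and underscores.
    text = re.sub(r"[\W\d_]+", "", text)
    # Step 2: drop what \d misses, i.e. the remaining Number categories
    # (No such as superscripts and fractions, Nl such as Roman numerals).
    return "".join(
        ch for ch in text if not unicodedata.category(ch).startswith("N")
    )


# Input contains: a, 1, superscript two, one half, _, space, b, Roman numeral eight
print(strip_non_letters("a1\u00b2\u00bd_ b\u2167"))
```

The second pass trades a single regex for a per-character category lookup, which is slower on long inputs but does not need updating when new \p{No} or \p{Nl} characters are added to Unicode.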