Regex for accepting only Persian characters
TL;DR
The most commonly used Farsi character sets are as follows:
Use
^[آابپتثجچحخدذرزژسشصضطظعغفقکگلمنوهی]+$
for letters, or use code points depending on your regex flavor (not all engines support the \uXXXX notation):
^[\u0622\u0627\u0628\u067E\u062A-\u062C\u0686\u062D-\u0632\u0698\u0633-\u063A\u0641\u0642\u06A9\u06AF\u0644-\u0648\u06CC]+$
Use
^[۰۱۲۳۴۵۶۷۸۹]+$
for digits, or depending on your regex flavor:
^[\u06F0-\u06F9]+$
Use
[ًٌَُِّْ]
for vowels, or depending on your regex flavor:
[\u064B\u064C\u064E-\u0652]
Or combine these sets. You may also want to add other Arabic letters, such as Hamza ء, to your character set.
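Below is a rough JavaScript sketch of how the three sets above could be combined. It builds a validator for space-separated Persian words, where each letter may be followed by vowel marks, plus a separate digit check. The names `isPersianText` and `isPersianNumber`, and the choice to allow single spaces between words, are illustrative assumptions and are not part of the original answer.

```javascript
// Character sets from the TL;DR, kept as \uXXXX escapes so they survive
// editors and fonts that reorder right-to-left text.
const LETTERS = String.raw`\u0622\u0627\u0628\u067E\u062A-\u062C\u0686\u062D-\u0632\u0698\u0633-\u063A\u0641\u0642\u06A9\u06AF\u0644-\u0648\u06CC`;
const VOWELS = String.raw`\u064B\u064C\u064E-\u0652`;
const DIGITS = String.raw`\u06F0-\u06F9`;

// A word is one or more letters, each optionally followed by vowel marks.
// Letters and marks are disjoint sets, so there is only one way to split a
// word into iterations of the group.
const WORD = `(?:[${LETTERS}][${VOWELS}]*)+`;

// Hypothetical validators: words separated by single spaces, and digit-only strings.
const PERSIAN_TEXT = new RegExp(`^${WORD}(?: ${WORD})*$`);
const PERSIAN_NUMBER = new RegExp(`^[${DIGITS}]+$`);

function isPersianText(s) {
  return PERSIAN_TEXT.test(s);
}

function isPersianNumber(s) {
  return PERSIAN_NUMBER.test(s);
}

console.log(isPersianText("\u0633\u0644\u0627\u0645 \u062F\u0646\u06CC\u0627")); // "سلام دنیا" -> true
console.log(isPersianText("\u0639\u0644\u064A")); // "علي" -> false: U+064A is Arabic Yeh, not Persian ی
console.log(isPersianNumber("\u06F1\u06F4\u06F0\u06F2")); // "۱۴۰۲" -> true
console.log(isPersianNumber("\u0661\u0664\u0660\u0662")); // "١٤٠٢" -> false: Arabic-Indic digits
```

If you validate names, you may also want to allow the zero-width non-joiner (U+200C), which Persian text uses inside some words. That is left out here to stay close to the sets listed above.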
Why are [\u0600-\u06FF] and [آ-ی] both wrong?
Although \u0600-\u06FF does include گ (code point 06AF), چ (code point 0686), پ (code point 067E), and ژ (code point 0698), all answers that suggest [\u0600-\u06FF] or [آ-ی] are simply WRONG: \u0600-\u06FF contains 209 more characters than you need, and it includes digits too!
Whole story
This answer exists to fix a common misconception. Code points 0600 through 06FF do not denote the Persian/Farsi alphabet (neither does [آ-ی]). The block contains all of these characters:
[\u0600-\u0605 ؐ-ؚ\u061Cـ ۖ-\u06DD ۟-ۤ ۧ ۨ ۪-ۭ ً-ٕ ٟ ٖ-ٞ ٰ ، ؍ ٫ ٬ ؛ ؞ ؟ ۔ ٭ ٪ ؉ ؊ ؈ ؎ ؏
۞ ۩ ؆ ؇ ؋ ٠۰ ١۱ ٢۲ ٣۳ ٤۴ ٥۵ ٦۶ ٧۷ ٨۸ ٩۹ ءٴ۽ آ أ ٲ ٱ ؤ إ ٳ ئ ا ٵ ٮ ب ٻ پ ڀ
ة-ث ٹ ٺ ټ ٽ ٿ ج ڃ ڄ چ ڿ ڇ ح خ ځ ڂ څ د ذ ڈ-ڐ ۮ ر ز ڑ-ڙ ۯ س ش ښ-ڜ ۺ ص ض ڝ ڞ
ۻ ط ظ ڟ ع غ ڠ ۼ ف ڡ-ڦ ٯ ق ڧ ڨ ك ک-ڴ ػ ؼ ل ڵ-ڸ م۾ ن ں-ڽ ڹ ه ھ ہ-ۃ ۿ ەۀ وۥ ٶ
ۄ-ۇ ٷ ۈ-ۋ ۏ ى يۦ ٸ ی-ێ ې ۑ ؽ-ؿ ؠ ے ۓ \u061D]
255 characters fall under the Arabic block (0600–06FF). The Farsi alphabet has 32 letters, and together with the Farsi forms of the ten digits that makes 42. If we add the vowels (originally Arabic, and rarely written in Farsi) but leave out Tanvin (ً ٍ ٌ) and Tashdid (ّ), which are Arabic rather than Farsi diacritics, we end up with 46 characters. This means \u0600-\u06FF contains 255 - 46 = 209 more characters than you need!
۷ (code point 06F7) is the Farsi representation of the number 7, and ٧ (code point 0667) is the Arabic representation of the same number. Likewise, ۶ is the Farsi representation of 6 and ٦ is the Arabic representation of the same number. All of them reside in the 0600 through 06FF range.
The shapes of the Persian digits four (۴), five (۵), and six (۶) differ from the shapes used in Arabic, and even the digits that look identical have different code points.
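To show the digit problem concretely, this small JavaScript check tests the Persian seven (U+06F7) and the Arabic-Indic seven (U+0667) against the whole-block class and against the Persian-digit class from the TL;DR. It is a sketch, and the variable names are made up for illustration.

```javascript
// Both sevens, written as escapes: U+06F7 is Persian ۷, U+0667 is Arabic-Indic ٧.
const persianSeven = "\u06F7";
const arabicSeven = "\u0667";

const wholeBlock = /^[\u0600-\u06FF]+$/;    // the "whole Arabic block" approach
const persianDigits = /^[\u06F0-\u06F9]+$/; // Persian digits only

for (const [name, ch] of [["Persian 7", persianSeven], ["Arabic 7", arabicSeven]]) {
  const hex = ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
  console.log(name, "U+" + hex, "block:", wholeBlock.test(ch), "persian:", persianDigits.test(ch));
}
// Persian 7 U+06F7 block: true persian: true
// Arabic 7 U+0667 block: true persian: false
```

The whole-block class accepts both digits, so it cannot tell Persian input from Arabic input.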
The block also contains many other characters that do not exist in Farsi/Persian, and nobody wants to accept them when validating a first name or surname.
[آ-ی] covers 171 code points too (U+0622 through U+06CC), which is far more than anyone needs for validation. You can see them all using Unicode CLDR.
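You can verify range sizes like this yourself. The JavaScript sketch below counts the code points between two characters and how many of them belong to the strict Persian letter set from the TL;DR. The helper names are hypothetical, and the code assumes both endpoints are single BMP characters.

```javascript
// Number of code points covered by a bracket range such as [آ-ی].
function rangeSize(from, to) {
  return to.codePointAt(0) - from.codePointAt(0) + 1;
}

// Strict letter set from the TL;DR (32 letters plus آ).
const persianLetter = /[\u0622\u0627\u0628\u067E\u062A-\u062C\u0686\u062D-\u0632\u0698\u0633-\u063A\u0641\u0642\u06A9\u06AF\u0644-\u0648\u06CC]/;

// Walk every code point in the range and count the ones in the strict set.
function countPersianLettersInRange(from, to) {
  let n = 0;
  for (let cp = from.codePointAt(0); cp <= to.codePointAt(0); cp++) {
    if (persianLetter.test(String.fromCodePoint(cp))) n++;
  }
  return n;
}

const alefMadda = "\u0622"; // آ
const farsiYeh = "\u06CC";  // ی
console.log(rangeSize(alefMadda, farsiYeh));                  // 171
console.log(countPersianLettersInRange(alefMadda, farsiYeh)); // 33
console.log(rangeSize("\u0600", "\u06FF"));                    // 256 code points in U+0600..U+06FF
```

Of the 171 code points in U+0622 to U+06CC, only 33 are in the strict letter set.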
Dedicated Regular Expression for Persian alphabet
Persian characters are within the Arabic Unicode block, which ranges from U+0600 to U+06FF (written in a character class as \u0600-\u06FF).
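function just_persian(str) {
  // Whole Arabic block (U+0600..U+06FF) plus whitespace; this also accepts Arabic-only letters and digits
  var p = /^[\u0600-\u06FF\s]+$/;
  var ok = p.test(str);
  if (!ok) {
    alert("Please enter Persian characters only.");
  }
  return ok;
}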
Adapted to JavaScript from this question: Regex for check the input string is just in persian language
Regex for accepting Persian characters in address
The problem with your regex is that it contains nested quantifiers applied to optional patterns, which can cause catastrophic backtracking. Use linear logic instead:
^[\u0600-\u06FF]+(?:[\s0-9()،,-]+[\u0600-\u06FF]+)*$
Details:
- ^ - start of string
- [\u0600-\u06FF]+ - 1 or more symbols from the given Unicode range
- (?:[\s0-9()،,-]+[\u0600-\u06FF]+)* - 0 or more sequences of:
  - [\s0-9()،,-]+ - 1 or more symbols: whitespace, digits, (, ), ، (Arabic comma), , (ASCII comma), or -
  - [\u0600-\u06FF]+ - 1 or more symbols from the given Unicode range
- $ - end of string.
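Here is a quick JavaScript check of the address pattern. It is a sketch, and the sample addresses are made up. Each repetition of the group has to consume at least one separator and then at least one Arabic-block character, so the group can never match an empty string. A side effect is that the input must start and end with an Arabic-block character.

```javascript
// The address pattern from the answer above.
const address = /^[\u0600-\u06FF]+(?:[\s0-9()،,-]+[\u0600-\u06FF]+)*$/;

console.log(address.test("تهران، خیابان آزادی، پلاک ۱۲")); // true: Persian digits are in the block
console.log(address.test("پلاک 12"));  // false: may not end with ASCII digits
console.log(address.test("12 تهران")); // false: must start with an Arabic-block character
```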
Persian character regex with English and Persian numbers
Your pattern already seems to cover Persian numbers, so if all it is missing is English numerals, you may simply add them to the regex character class:
/^[\u0600-\u06FF\s0-9]+$/
Persian digits occupy code points U+06F0 through U+06F9. The Unicode code chart for the Arabic block (U+0600 to U+06FF) lists them all.
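A minimal JavaScript sketch of the extended class in use, with arbitrary sample strings:

```javascript
// The pattern from the answer: Arabic block, whitespace, and ASCII digits 0-9.
const persianWithDigits = /^[\u0600-\u06FF\s0-9]+$/;

console.log(persianWithDigits.test("پلاک ۱۲")); // true: Persian digits are already in U+0600..U+06FF
console.log(persianWithDigits.test("پلاک 12")); // true: ASCII digits come from 0-9
console.log(persianWithDigits.test("Plak 12")); // false: Latin letters are still rejected
```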
Using regex in PHP to select and echo out only Persian chars
Hello, my dear.
You can simply remove all English letters and ASCII digits from the string:
$string = preg_replace("/[a-zA-Z0-9]/", "", $string);
Or you can do the reverse:
$string = preg_replace("/[^ آ-ی]/u", "", $string);
This removes every character except spaces and the characters in the range آ-ی. The /u modifier is required so that PCRE treats the UTF-8 string as characters rather than bytes. This way you also remove Persian digits, because U+06F0 to U+06F9 lie outside the range.
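For comparison, here is a JavaScript sketch of the "reverse" approach. It keeps only spaces and the range آ-ی (U+0622 to U+06CC). `keepPersian` is a hypothetical name. Note that Arabic-Indic digits (U+0660 to U+0669) fall inside this range and are therefore kept.

```javascript
// Keep spaces and characters in U+0622..U+06CC (the range آ-ی); drop everything else.
// The u flag makes the class operate on code points.
function keepPersian(str) {
  return str.replace(/[^ \u0622-\u06CC]/gu, "");
}

console.log(JSON.stringify(keepPersian("abc سلام ۱۲۳"))); // " سلام ": Latin letters and Persian digits removed
```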
How to check Persian character format in regex
How about the regex
^(\[1\])\[[\p{L}\s]+\]$
Example: http://regex101.com/r/cU1nQ8/1
\p{L} matches any kind of letter from any language, so this pattern accepts Latin letters as well as Persian ones.
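In JavaScript, the same pattern needs the `u` flag for `\p{...}` escapes. The sketch below compares it with a stricter, hypothetical variant that uses `\p{Script=Arabic}`. That variant still accepts Arabic-only letters such as ي, so it is looser than the Persian set at the top of this page.

```javascript
// The answer's pattern; \p{...} property escapes require the u flag.
const anyLetters = /^(\[1\])\[[\p{L}\s]+\]$/u;
// Hypothetical stricter variant: letters from the Arabic script only.
const arabicScript = /^(\[1\])\[[\p{Script=Arabic}\s]+\]$/u;

console.log(anyLetters.test("[1][سلام]"));    // true
console.log(anyLetters.test("[1][Hello]"));   // true: \p{L} accepts Latin letters too
console.log(arabicScript.test("[1][سلام]"));  // true
console.log(arabicScript.test("[1][Hello]")); // false
```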
Regex for check the input string is just in persian language
Use a range from the first Persian letter to the last one, something like this:
"^[آ-ی]+$"
Without the +, the pattern matches only a single character.
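The sketch below shows why the quantifier matters and what the range still lets through. The sample words are arbitrary, and the range is written with escapes (U+0622 to U+06CC) to avoid right-to-left editing mistakes.

```javascript
// [\u0622-\u06CC] is the same range as [آ-ی], written with escapes.
const singleChar = /^[\u0622-\u06CC]$/;   // no quantifier: exactly one character
const wholeString = /^[\u0622-\u06CC]+$/; // one or more characters

const salam = "\u0633\u0644\u0627\u0645"; // "سلام"
const ali = "\u0639\u0644\u064A";         // "علي", spelled with Arabic Yeh U+064A

console.log(singleChar.test(salam));  // false: the input has four characters
console.log(wholeString.test(salam)); // true
console.log(wholeString.test(ali));   // true: U+064A also lies inside the range
```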
Regex for both Persian numbers and English numbers
A compact RegEx for this is:
[۰۱۲۳۴۵۶۷۸۹0-9]
It matches a single Persian or English digit. To validate a whole string, anchor and repeat it: ^[۰-۹0-9]+$
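The class above matches one digit anywhere in the input, which is often not what a validator wants. Here is a JavaScript sketch of the difference, with the class written using escapes:

```javascript
// Same as [۰۱۲۳۴۵۶۷۸۹0-9]: Persian digits U+06F0..U+06F9 plus ASCII 0-9.
const oneDigit = /[\u06F0-\u06F90-9]/;
// Anchored and repeated: the whole input must be Persian and/or ASCII digits.
const onlyDigits = /^[\u06F0-\u06F90-9]+$/;

console.log(oneDigit.test("abc1"));             // true: a single digit anywhere is enough
console.log(onlyDigits.test("abc1"));           // false
console.log(onlyDigits.test("\u06F1\u06F402")); // true: "۱۴02" mixes both digit sets
console.log(onlyDigits.test("\u0661\u0662"));   // false: Arabic-Indic digits ١٢ are not included
```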