Matching Unicode letter characters in PCRE/PHP
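I think the problem is much simpler than that: you forgot to specify the u modifier. The Unicode character properties are only useful in UTF-8 mode. Without u, PCRE examines the subject one byte at a time, so multibyte letters are not recognized as letters.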
Your regex should be:
// unicode letters, apostrophe, hyphen, space
$namePattern = '/^[-\' \p{L}]+$/u';
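A quick way to check the pattern above is to run it against a few sample names. The sample strings are made up for illustration, and the script assumes the PHP source file itself is saved as UTF-8.

```php
<?php
// Unicode letters, apostrophe, hyphen, space; the `u` modifier enables UTF-8 mode
$namePattern = '/^[-\' \p{L}]+$/u';

// Hypothetical sample inputs, chosen only to exercise the pattern
$samples = ["José O'Brien-Łukasz", 'Zoë', 'R2-D2'];

$results = [];
foreach ($samples as $name) {
    // preg_match() returns 1 on a match, 0 on no match, false on error
    $results[$name] = preg_match($namePattern, $name);
    echo $name, ' => ', var_export($results[$name], true), PHP_EOL;
}
```

The last sample is rejected because digits are not in the class; add \p{N} if numbers should be allowed.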
PHP - regex to allow Unicode characters
From http://php.net/manual/en/reference.pcre.pattern.modifiers.php:
u (PCRE_UTF8) This modifier turns on additional functionality of PCRE that is incompatible with Perl. Pattern and subject strings are treated as UTF-8. An invalid subject will cause the preg_* function to match nothing; an invalid pattern will trigger an error of level E_WARNING. Five and six octet UTF-8 sequences are regarded as invalid since PHP 5.3.4 (resp. PCRE 7.3 2007-08-28); formerly those have been regarded as valid UTF-8.
This means you first have to make sure the input string is valid UTF-8 text.
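One way to do that check without extra extensions is the common idiom of matching an empty pattern in u mode: preg_match() returns false when the subject is not valid UTF-8. The helper name isValidUtf8 is hypothetical; mb_check_encoding($s, 'UTF-8') is an alternative if the mbstring extension is installed.

```php
<?php
// Hypothetical helper: an empty pattern with the `u` modifier matches any
// valid UTF-8 string, and preg_match() returns false on malformed input.
function isValidUtf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

$good = "Łódź";      // valid UTF-8
$bad  = "\xC3\x28";  // 0xC3 starts a 2-byte sequence, but 0x28 is not a continuation byte

var_dump(isValidUtf8($good)); // bool(true)
var_dump(isValidUtf8($bad));  // bool(false)

// After a failed match, preg_last_error() reports why it failed
preg_match('/\p{L}/u', $bad);
$error = preg_last_error();
echo $error === PREG_BAD_UTF8_ERROR ? "bad UTF-8\n" : "other error\n";
```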
Second, have you heard of Unicode categories? If not, head to http://www.regular-expressions.info/unicode.html and search for "Unicode categories". For example, you could use \p{S} to match all symbols (or \p{Sc} for currency symbols only), or \p{L} for all letters. Your regex could probably be written as follows: /[^\p{L}\p{P}\p{N}\p{S}\p{M}]/u.
Note, though, that this will match almost nothing, because it allows almost every character: a ^ at the start of a character class (anything between [ and ]) negates it, meaning "match any character that is not in this class". Since letters, punctuation, numbers, symbols and marks are all excluded, only separators (\p{Z}, e.g. spaces) and other characters (\p{C}, e.g. control characters) are left to match.
On top of that, a character class without a quantifier matches exactly one character. To match a longer run, add a + after the closing ] so it keeps matching until it reaches a character outside the class.
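To make the last two points concrete, here is a small sketch. The input string is invented for illustration, and the u modifier is included so the properties apply to whole UTF-8 characters rather than single bytes.

```php
<?php
$input = "Price:\t€4 or £3";

// \p{Sc} is the "currency symbol" subcategory; \p{S} would also match e.g. math symbols
$count = preg_match_all('/\p{Sc}/u', $input, $m);
print_r($m[0]); // the currency symbols found

// Negated class: matches anything that is NOT a letter, punctuation,
// number, symbol or mark. The trailing + consumes a whole run at once.
$cleaned = preg_replace('/[^\p{L}\p{P}\p{N}\p{S}\p{M}]+/u', '', $input);
echo $cleaned, PHP_EOL; // the tab and spaces are gone
```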
So what exactly are you trying to achieve? If we know what you are trying to do, we may be able to suggest further improvements to your regex.
Regular expressions for a range of Unicode code points in PHP
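You can use:
$foo = preg_replace('/[^\w$\x{0080}-\x{FFFF}]+/u', '', $foo);
The negated class ([^...]+) removes every run of characters that is not one of:
- \w: a word character, i.e. [a-zA-Z0-9_] for ASCII input
- $: a literal dollar sign (it has no special meaning inside a character class)
- \x{0080}-\x{FFFF}: a character between code points U+0080 and U+FFFF
The /u modifier enables Unicode (UTF-8) support in the regex.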
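A short sketch of that replacement on a made-up string. Note that characters above U+FFFF, such as most emoji, fall outside the range and are removed as well.

```php
<?php
$foo = 'Hello, wörld! $5 😀';

// Remove every run of characters that is not a word character, a literal
// dollar sign, or a code point in U+0080..U+FFFF.
$foo = preg_replace('/[^\w$\x{0080}-\x{FFFF}]+/u', '', $foo);

echo $foo, PHP_EOL;
```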
PHP: How to match a range of unicode paired surrogates emoticons/emoji?
revo's comment above was very helpful in finding a solution:
If your PHP isn't shipped with a PCRE build for UTF-16 then you can't perform such a match. From PHP 7.0 on, you're able to use Unicode code points following this syntax: \u{XXXX}, e.g. preg_replace("~\u{1F600}~", '', $str); (mind the double quotes)
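To see why the double quotes matter, this sketch inspects what the \u{...} escape produces (PHP 7.0 or later assumed). PHP's string parser turns the escape into the character's UTF-8 bytes before any preg_* function sees the pattern; in a single-quoted string it stays literal text.

```php
<?php
// "\u{...}" is only expanded inside double-quoted strings and heredocs (PHP >= 7.0)
$emoji = "\u{1F602}";

echo bin2hex($emoji), PHP_EOL;                // the UTF-8 bytes of U+1F602
echo strlen($emoji), PHP_EOL;                 // strlen() counts bytes, not characters
echo preg_match_all('/./u', $emoji), PHP_EOL; // in u mode, "." matches one whole character

// In a single-quoted string the escape is left as-is: backslash, "u", braces and digits
$literal = '\u{1F602}';
echo strlen($literal), PHP_EOL;
```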
Since I am using PHP 7, echo "\u{1F602}"; outputs 😂 (U+1F602), as described on this PHP RFC page on Unicode escapes. The proposal was, in essence:
A new escape sequence is added for double-quoted strings and heredocs: \u{ codepoint-digits }, where codepoint-digits is composed of hexadecimal digits.
Because this escape only works in double-quoted strings, the pattern passed to preg_replace (normally single-quoted to avoid variable interpolation in double-quoted strings) now needs some preg_quote magic. This is the solution I came up with:
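$str = preg_replace(
    // single code point range: Miscellaneous Symbols, U+2600 to U+26FF
    // (PHP leaves \x{...} untouched in double quotes, so PCRE parses it)
    // http://www.fileformat.info/info/unicode/block/miscellaneous_symbols/list.htm
    "/[\x{2600}-\x{26FF}".
    // concatenated with the Emoticons range U+1F600 to U+1F64F, built from
    // PHP 7 \u{...} escapes (4-byte UTF-8 characters)
    // https://www.fileformat.info/info/unicode/block/emoticons/list.htm
    preg_quote("\u{1F600}", '/')."-".preg_quote("\u{1F64F}", '/').
    "]/u",
    '',
    $str
);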
Here's the proof of the above in 3v4l.
EDIT: a simpler solution
In another comment, revo pointed out that placing the Unicode characters directly into the regex character class also works with single-quoted strings and with older PHP versions (e.g. 4.3.4):
preg_replace('/[☀-⛿😀-🙏]/u', 'YOINK', $str);
To use PHP 7's new escape syntax, though, you still need double quotes:
preg_replace("/[\u{2600}-\u{26FF}\u{1F600}-\u{1F64F}]/u",'YOINK',$str);
Here's revo's proof in 3v4l.
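For comparison, here is a sketch using PCRE's own \x{...} escape (the same syntax as the range example earlier on this page) in a single-quoted pattern. In u mode PCRE reads \x{...} as a code point, including ones above U+FFFF, so this variant should not need PHP 7's \u{...} string escape in the pattern. Treat it as an alternative to verify on your PHP version, not as the answer's method; the sample string is made up.

```php
<?php
$str = "Sun \u{2600}, grin \u{1F600}, pray \u{1F64F}, rocket \u{1F680}";

// \x{...} is parsed by PCRE itself (not by PHP's string parser), so the
// pattern can stay single-quoted; `u` makes PCRE read code points, not bytes.
$pattern = '/[\x{2600}-\x{26FF}\x{1F600}-\x{1F64F}]/u';

$out = preg_replace($pattern, 'YOINK', $str);
echo $out, PHP_EOL; // the rocket (U+1F680) is outside both ranges and survives
```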
Regex to match letters, numbers and space, including non-ASCII characters
You can use the Unicode letter and Unicode number properties for this:
preg_match('/^([-_ \p{L}\p{N}])+$/iu', $string)
Update: You may not need a capturing group here:
preg_match('/^[-_ \p{L}\p{N}]+$/iu', $string)
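A minimal check of the non-capturing version against a few invented inputs, including non-Latin scripts and a non-ASCII digit (\p{N} covers digits from other scripts too):

```php
<?php
$pattern = '/^[-_ \p{L}\p{N}]+$/iu';

// Hypothetical inputs chosen to exercise the pattern
$inputs = [
    'Привет мир 2024', // Cyrillic letters, spaces and ASCII digits
    'テスト_file-٣',    // Katakana, underscore, Latin, hyphen, ARABIC-INDIC DIGIT THREE
    'hello!',          // "!" is punctuation, which the class does not allow
];

$results = array_map(function (string $s) use ($pattern) {
    return preg_match($pattern, $s);
}, $inputs);

var_dump($results);
```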
Weird behaviour with multibyte strings and php regex
You have to add the UTF-8 (u) modifier for tests like these, e.g. '/[£]/u'.
From the PHP docs:
u (PCRE_UTF8) This modifier turns on additional functionality of PCRE that is incompatible with Perl. Pattern and subject strings are treated as UTF-8. An invalid subject will cause the preg_* function to match nothing; an invalid pattern will trigger an error of level E_WARNING. Five and six octet UTF-8 sequences are regarded as invalid since PHP 5.3.4 (resp. PCRE 7.3 2007-08-28); formerly those have been regarded as valid UTF-8.
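The "weird behaviour" comes from byte-level matching: without u, the two UTF-8 bytes of £ (0xC2 0xA3) are treated as two separate characters in the class. A small sketch of the difference, using a made-up subject string:

```php
<?php
$subject = 'a£b';

// Without `u`, '[£]' is a class of two bytes, so each byte is replaced separately
$bytes = preg_replace('/[£]/', 'X', $subject);

// With `u`, '£' is one character (U+00A3) and is replaced once
$chars = preg_replace('/[£]/u', 'X', $subject);

echo strlen('£'), PHP_EOL; // byte length of the pound sign in UTF-8
echo $bytes, PHP_EOL;
echo $chars, PHP_EOL;
```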