preg_match with international characters and accents
Add the UTF-8 modifier flag (u) to your expression:
/^\p{L}+$/ui
There is also no need to wrap \p{L}
inside of a character class.
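As a minimal sketch of the advice above, the pattern can be wrapped in a small check. The function name isLettersOnly is hypothetical, and the sample strings assume the source file is saved as UTF-8. The i flag is left out here because \p{L} already covers both upper and lower case.

```php
<?php
// Hypothetical helper: true when $s consists only of Unicode letters.
function isLettersOnly(string $s): bool
{
    // The u flag makes PCRE treat pattern and subject as UTF-8.
    return preg_match('/^\p{L}+$/u', $s) === 1;
}

var_dump(isLettersOnly('Éloïse'));  // bool(true)
var_dump(isLettersOnly('Eloise1')); // bool(false), the digit is not a letter
```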
preg_match accented characters
This pattern should work:
/^\pL+(?>[- ']\pL+)*$/u
But feel free to adapt it for more exotic names (for example, names with a trailing quote or apostrophe).
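To see what the pattern accepts and rejects, here is a short sketch that runs it over a few sample names. The names are made up for illustration, and the file is assumed to be saved as UTF-8.

```php
<?php
// Letters, optionally followed by groups of "one separator + letters".
$pattern = '/^\pL+(?>[- \']\pL+)*$/u';

$names = ["Jean-Luc", "O'Connor", "María José", "-Anna", "Anna--Maria", "D'"];
foreach ($names as $name) {
    $ok = preg_match($pattern, $name) === 1;
    echo $name, ': ', $ok ? 'valid' : 'invalid', "\n";
}
// The first three are valid; the last three are invalid
// (leading separator, doubled separator, trailing apostrophe).
```

The last sample shows the case mentioned above: a name with a trailing apostrophe is rejected unless the pattern is adapted.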
How do I match accented characters with PHP preg?
You could use Unicode character properties to describe the characters:
/^[\p{L}-]*$/u
\p{L}
describes the class of Unicode letter characters.
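One detail worth checking with this pattern is the * quantifier, which also accepts an empty string. A small sketch, with sample inputs chosen only for illustration:

```php
<?php
$pattern = '/^[\p{L}-]*$/u';

var_dump(preg_match($pattern, 'Müller-Lüdenscheidt')); // int(1)
var_dump(preg_match($pattern, ''));                    // int(1), * allows zero characters
var_dump(preg_match('/^[\p{L}-]+$/u', ''));            // int(0), + requires at least one
```

If empty input should be rejected, switching to + is probably the simplest change.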
preg_match special characters
[\W]+
will match any non-word character.
But to match only the characters from the question, use this:
$string = 'sadw$';
if (preg_match('/[\[\]^\'£$%&*()}{@:#~?><,;|\\\\=_+¬`-]/u', $string)) {
    // this string contains at least one of these characters: [ ] ^ ' £ $ % & * ( ) } { @ : # ~ ? > < , ; | \ = _ + ¬ ` -
}
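Hand-escaping a long character list is easy to get wrong, so one alternative is to let preg_quote() do the escaping. This is only a sketch: the variable names are made up, and the list is meant to mirror the characters from the question.

```php
<?php
// The raw characters to look for (single-quoted, so only \' and \\ are escapes).
$specials = '[]^\'£$%&*(){}@:#~?<>,;|\\=_+¬`-';

// preg_quote() escapes everything that is special to PCRE, plus the delimiter.
$pattern = '/[' . preg_quote($specials, '/') . ']/u';

var_dump(preg_match($pattern, 'sadw$')); // int(1)
var_dump(preg_match($pattern, 'sadw'));  // int(0)
```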
php regex match possible accented characters
Thanks for the help, everyone, but I will end up using the first suggestion I made in my question. And thanks again, @CasimiretHippolyte, for your patience, and for making me realize that it isn't as much overkill as I thought.
Here is the final code I'm using (first the functions):
// Decompose the string, then strip the combining marks (Normalizer requires the intl extension).
function removeAccents($string)
{
    return preg_replace('/[\x{0300}-\x{036f}]/u', '', Normalizer::normalize($string, Normalizer::FORM_KD));
}
// Replace each base letter with a character class covering its accented variants.
function addAccents($string)
{
    $array1 = array('a', 'c', 'e', 'i' , 'n', 'o', 'u', 'y');
    $array2 = array('[aàáâãäå]','[cçćĉċč]','[eèéêë]','[iìíîï]','[nñ]','[oòóôõö]','[uùúûü]','[yýÿ]');
    return str_replace($array1, $array2, strtolower($string));
}
And:
$word = "something";
$word = preg_quote(trim($word), '/'); // Just in case; '/' is the delimiter used below
$word2 = addAccents(removeAccents($word)); // check all possible accents
if (!empty($word)) {
    $sentence = "/(".$word.")|(".$word2.")/ui"; // Now I'm checking my normal word and the possible variations of it.
    if (preg_match($sentence, $content)) { // $content holds the text being searched
        echo "found";
    }
}
By the way, I'm covering all possible accents from my country (and some others). You should check whether you need to extend the addAccents() function before using it.
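To make the expansion step concrete, here is a self-contained sketch of what addAccents() produces for one word. It uses strtr() with a map instead of str_replace() with two arrays, which is a swapped-in variant rather than the code above, and it skips the Normalizer step so that it runs without the intl extension. The sample word and sentence are made up.

```php
<?php
// Variant of addAccents(): one map, applied in a single pass by strtr().
function addAccents(string $s): string
{
    $map = [
        'a' => '[aàáâãäå]', 'c' => '[cçćĉċč]', 'e' => '[eèéêë]', 'i' => '[iìíîï]',
        'n' => '[nñ]', 'o' => '[oòóôõö]', 'u' => '[uùúûü]', 'y' => '[yýÿ]',
    ];
    return strtr(strtolower($s), $map);
}

$pattern = '/' . addAccents('cafe') . '/ui';
echo $pattern, "\n"; // /[cçćĉċč][aàáâãäå]f[eèéêë]/ui

var_dump(preg_match($pattern, 'Un CAFÉ, por favor')); // int(1)
```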
Using preg_match to match weird characters
You can use the u modifier (UTF-8):
~(\w+)~u
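A quick sketch of the effect, assuming a UTF-8 source file and a PHP build where the u modifier also turns on Unicode-aware \w (the case in current PHP versions). The sample text is made up.

```php
<?php
$text = 'naïve café für Ødegård';

// With u, \w also covers letters outside ASCII, so each word stays in one piece.
preg_match_all('~\w+~u', $text, $m);
print_r($m[0]); // naïve, café, für, Ødegård
```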
Why does this PHP regex not match for accented characters?
PHP regexes need delimiters, like so:
preg_match('/[ÀÁÅÃÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ]/', "gustaría");
Note that it's also preferable to use single quotes for regexes, because inside double quotes PHP could mistake a dollar sign for the start of a variable.
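The quoting point can be shown with a small sketch. The variable $end is hypothetical and stands for any variable that happens to be in scope with a matching name.

```php
<?php
// In single quotes the pattern reaches PCRE exactly as typed.
$single = '/^\d+$/';

// In double quotes, "$" followed by a name is interpolated as a variable.
$end    = 'oops';        // hypothetical variable that happens to exist
$broken = "/^\d+$end/";  // silently becomes /^\d+oops/
$fixed  = "/^\d+\$/";    // escaping the dollar sign restores the pattern

var_dump($broken === '/^\d+oops/'); // bool(true)
var_dump($fixed === $single);       // bool(true)
```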
Regex to match string with and without special/accented characters?
You can use the \p{L}
pattern to match any letter.
You have to use the u
modifier after the regular expression to enable Unicode mode.
Example: /\p{L}+/u
Edit:
Try something like this. It replaces every accented letter in the name with a sub-pattern that matches the accented letter (either as a single precomposed character or as a base letter followed by a Unicode combining mark) as well as the unaccented letter. You can then use the resulting search pattern to highlight your text.
function mbStringToArray($string)
{
    $array = array(); // initialise so that an empty string returns an empty array
    $strlen = mb_strlen($string, "UTF-8");
    while ($strlen)
    {
        $array[] = mb_substr($string, 0, 1, "UTF-8");
        $string = mb_substr($string, 1, $strlen, "UTF-8");
        $strlen = mb_strlen($string, "UTF-8");
    }
    return $array;
}
// I had to use this ugly function to remove accents as iconv didn't work properly on my test server.
// Note: utf8_encode() and utf8_decode() are deprecated as of PHP 8.2.
function stripAccents($stripAccents){
    return utf8_encode(strtr(utf8_decode($stripAccents),utf8_decode('àáâãäçèéêëìíîïñòóôõöùúûüýÿÀÁÂÃÄÇÈÉÊËÌÍÎÏÑÒÓÔÕÖÙÚÛÜÝ'),'aaaaaceeeeiiiinooooouuuuyyAAAAACEEEEIIIINOOOOOUUUUY'));
}
$clientName = 'céra';
$clientNameNoAccent = stripAccents($clientName);
$clientNameArray = mbStringToArray($clientName);
foreach ($clientNameArray as $pos => &$char)
{
    $charNA = $clientNameNoAccent[$pos];
    if ($char != $charNA)
    {
        $char = "(?:$char|$charNA|$charNA\p{M})";
    }
}
unset($char); // break the reference left over from the foreach
$clientSearchPattern = implode($clientNameArray); // c(?:é|e|e\p{M})ra
$text = 'the client name is Céra but it could be Cera or céra too.';
$search = preg_replace('/(.*?)(' . $clientSearchPattern . ')(.*?)/iu', '$1<span class="highlight">$2</span>$3', $text);
echo $search; // the client name is <span class="highlight">Céra</span> but it could be <span class="highlight">Cera</span> or <span class="highlight">céra</span> too.
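The same idea can be sketched more compactly. This is not the code above but a variant under stated assumptions: the function name accentInsensitivePattern and the tiny $map are hypothetical, preg_split('//u', ...) replaces the custom splitting helper, and each character is passed through preg_quote() so that names containing regex metacharacters stay safe.

```php
<?php
// Hypothetical helper: build a pattern matching $name with or without accents.
function accentInsensitivePattern(string $name, array $map): string
{
    $parts = [];
    // Split into UTF-8 characters.
    foreach (preg_split('//u', $name, -1, PREG_SPLIT_NO_EMPTY) as $char) {
        $quoted = preg_quote($char, '/');
        if (isset($map[$char])) {
            // Accented letter, or base letter with an optional combining mark.
            $base = $map[$char];
            $parts[] = "(?:$quoted|$base\\p{M}?)";
        } else {
            $parts[] = $quoted;
        }
    }
    return '/' . implode('', $parts) . '/iu';
}

$map = ['é' => 'e', 'è' => 'e', 'à' => 'a', 'ç' => 'c']; // deliberately tiny
$pattern = accentInsensitivePattern('céra', $map);
echo $pattern, "\n"; // /c(?:é|e\p{M}?)ra/iu

$text = 'the client name is Céra but it could be Cera or céra too.';
$highlighted = preg_replace($pattern, '<span class="highlight">$0</span>', $text);
echo $highlighted, "\n";
```

Using $0 in the replacement avoids the surrounding (.*?) groups, since preg_replace() already leaves unmatched text untouched.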
Best preg_match expression for international name
Try placing the \p{L}
and allowed symbols inside brackets:
$name = "B-jör n'Bòrg";
if (preg_match("/^[- '\p{L}]+$/u", $name)) {
echo "$name is a valid name!"; // It is
}
You may also want to add some additional checks, e.g. to make sure that a name starts and ends with a letter and not a symbol.
Edit
This makes sure that a name starts and ends with a letter and does not contain consecutive symbols:
$name = "-Björ n''Bòrg-";
if (preg_match("/^\p{L}([- ']\p{L}|\p{L})*$/u", $name)) {
echo "$name is a valid name!"; // It's not
}
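To compare the two patterns from this answer side by side, here is a short sketch that runs both over the same inputs. The variable names and the extra sample names are made up for illustration.

```php
<?php
$permissive = "/^[- '\p{L}]+$/u";              // any mix of letters and symbols
$strict     = "/^\p{L}([- ']\p{L}|\p{L})*$/u"; // letter first and last, no doubled symbols

$names = ["Björn Bòrg", "B-jör n'Bòrg", "-Björ n''Bòrg-", "Björn "];
foreach ($names as $name) {
    printf(
        "[%s] permissive=%d strict=%d\n",
        $name,
        preg_match($permissive, $name),
        preg_match($strict, $name)
    );
}
// The first two names pass both checks; the last two pass only the permissive one.
```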