Preg_Match with International Characters and Accents

preg_match with international characters and accents

Add the UTF-8 modifier flag (u) to your expression:

/^\p{L}+$/ui

There is also no need to wrap \p{L} inside of a character class.
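Here is a minimal sketch of the pattern in use. It assumes the PHP source file is saved as UTF-8, and the sample strings are made up for illustration:

```php
<?php
// \p{L} matches any Unicode letter; the u modifier makes PCRE read the
// pattern and subject as UTF-8, so "é" counts as one letter, not two bytes.
$pattern = '/^\p{L}+$/ui';

var_dump(preg_match($pattern, 'Jérôme'));   // int(1)
var_dump(preg_match($pattern, '日本語'));    // int(1): CJK ideographs are letters too
var_dump(preg_match($pattern, 'Jérôme42')); // int(0): digits are not letters
```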

preg_match accented characters

This pattern should work:

/^\pL+(?>[- ']\pL+)*$/u


Feel free to adapt it for more exotic names (for example, names ending in a quote or apostrophe, which this pattern rejects).
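This quick check shows how the pattern treats a few sample names. It is a sketch: `isValidName()` is a hypothetical helper name, and the names are invented for illustration. A doubled separator and a leading or trailing symbol are both rejected:

```php
<?php
// Hypothetical helper wrapping the pattern: letters, optionally joined by a
// single hyphen, space or apostrophe. (?>...) is an atomic group.
function isValidName(string $name): bool
{
    return preg_match('/^\pL+(?>[- \']\pL+)*$/u', $name) === 1;
}

foreach (["Jean-Luc", "O'Brien", "Zoë Saldaña", "Anne--Marie", "-Björn", "Jones'"] as $name) {
    echo $name, ': ', isValidName($name) ? 'valid' : 'invalid', "\n";
}
```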

How do I match accented characters with PHP preg?

You could use Unicode character properties to describe the characters:

/^[\p{L}-]*$/u

\p{L} describes the class of Unicode letter characters.
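The following short sketch exercises the pattern. Because it uses `*`, it also accepts the empty string. Swap `*` for `+` if at least one character is required:

```php
<?php
// Letters from any script, plus a literal hyphen (placed last in the class).
$pattern = '/^[\p{L}-]*$/u';

var_dump(preg_match($pattern, 'Jean-François')); // int(1)
var_dump(preg_match($pattern, 'Jean François')); // int(0): space is not allowed
var_dump(preg_match($pattern, ''));              // int(1): * allows zero characters
```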

preg_match special characters

[\W]+ matches one or more non-word characters (the brackets are not needed; \W+ is equivalent).

But to match only the characters from the question, use this:

$string = 'sadw$';
if (preg_match('/[\[\]^\'£$%&*()}{@:#~?><,;|\\\\=_+¬`-]/u', $string)) {
    // $string contains at least one of: [ ] ^ ' £ $ % & * ( ) { } @ : # ~ ? < > , ; | \ = _ + ¬ ` -
}
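For comparison, here is a sketch of the broader `\W` approach. It relies on PHP's u modifier also making `\w`/`\W` Unicode-aware, so accented letters count as word characters and only the punctuation and spaces come back:

```php
<?php
// Collect every run of non-word characters. With /u, "é" and "ö" are word
// characters, so only ", " and "!" are reported.
preg_match_all('/\W+/u', 'héllo, wörld!', $matches);
print_r($matches[0]);
```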

php regex match possible accented characters

Thanks for the help, everyone, but I will end up using the first suggestion I made in my question. And thanks again @CasimiretHippolyte for your patience, and for making me realize that it isn't as overkill as I thought.

Here is the final code I'm using (first the functions):

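function removeAccents($string)
{
    // Decompose (NFKD) so accents become separate combining marks, then strip the
    // Combining Diacritical Marks block (U+0300-U+036F). Requires the intl extension.
    return preg_replace('/[\x{0300}-\x{036f}]/u', '', Normalizer::normalize($string, Normalizer::FORM_KD));
}

function addAccents($string)
{
    // Map each base letter to a character class of its accented variants.
    $array1 = array('a', 'c', 'e', 'i', 'n', 'o', 'u', 'y');
    $array2 = array('[aàáâãäå]', '[cçćĉċč]', '[eèéêë]', '[iìíîï]', '[nñ]', '[oòóôõö]', '[uùúûü]', '[yýÿ]');

    return str_replace($array1, $array2, strtolower($string));
}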

And:

$word = "something";
$word = preg_quote(trim($word), '/'); // Just in case: also escapes the / delimiter
$word2 = addAccents(removeAccents($word)); // check all possible accents
if (!empty($word)) {
    // $content is the text being searched.
    $sentence = "/(" . $word . ")|(" . $word2 . ")/ui"; // Now I'm checking my normal word and the possible variations of it.
    if (preg_match($sentence, $content)) {
        echo "found";
    }
}

By the way, I'm covering all the accents used in my country (and some others). Check whether the addAccents() function needs extending for your language before using it.
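The sketch below is a compact variant of the same idea for the case where the search word is already unaccented, so it skips removeAccents() and does not need intl. `accentPattern()` is a hypothetical name. It uses `strtr()` with an array instead of `str_replace()`, and the letter map reuses the answer's `$array2` classes, so it has the same coverage limits:

```php
<?php
// Hypothetical helper: builds an accent-insensitive pattern from an UNACCENTED word.
function accentPattern(string $word): string
{
    // Escape regex metacharacters first, including the '/' delimiter.
    $quoted = preg_quote(strtolower(trim($word)), '/');
    // strtr() with an array replaces in a single pass, so a class inserted
    // for one letter is never rescanned by a later entry.
    $quoted = strtr($quoted, [
        'a' => '[aàáâãäå]', 'c' => '[cçćĉċč]', 'e' => '[eèéêë]', 'i' => '[iìíîï]',
        'n' => '[nñ]', 'o' => '[oòóôõö]', 'u' => '[uùúûü]', 'y' => '[yýÿ]',
    ]);
    return '/' . $quoted . '/iu';
}

$content = 'Nous avons bu un café à Montréal.';
echo preg_match(accentPattern('cafe'), $content) ? "found\n" : "not found\n";
echo preg_match(accentPattern('montreal'), $content) ? "found\n" : "not found\n";
```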

Using preg_match to match weird characters

You can use the u (UTF-8) modifier. In PHP it also makes \w match accented letters, not just ASCII:

~(\w+)~u

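This sketch shows the effect. It assumes a UTF-8 source file; with /u, umlauts and ß stay inside words, whereas without it the multibyte characters would typically break words apart:

```php
<?php
// Capture each word; Unicode-aware \w keeps "Grüße" and "Köln" whole.
preg_match_all('~(\w+)~u', 'Grüße aus Köln', $m);
print_r($m[1]);
```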

Why does this PHP regex not match for accented characters?

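PHP regexes need delimiters. Because the character class contains multibyte UTF-8 characters, the pattern also needs the u modifier so that each accented letter is treated as one character rather than separate bytes, like so:

preg_match('/[ÀÁÅÃÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ]/u', "gustaría");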

Note that single quotes are also preferable for regex patterns, because inside double quotes PHP may treat a dollar sign followed by a name as a variable.
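Here is a small sketch that checks which character matched. It assumes a UTF-8 source file. The character class is copied from the answer above:

```php
<?php
// Same character class as above, with the u modifier so each accented
// letter is one UTF-8 character rather than two separate bytes.
$accented = '/[ÀÁÅÃÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ]/u';

var_dump(preg_match($accented, 'gustaría', $m)); // int(1)
var_dump($m[0]);                                  // string(2) "í"
var_dump(preg_match($accented, 'gustaria'));      // int(0)
```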

Regex to match string with and without special/accented characters?

You can use the \p{L} pattern to match any letter.


You have to add the u modifier after the closing delimiter to enable Unicode (UTF-8) mode.

Example: /\p{L}+/u
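For example, this sketch pulls every run of letters out of a mixed sentence (the sample text is invented):

```php
<?php
// Extract every run of letters, regardless of script or accents.
preg_match_all('/\p{L}+/u', 'Ça va, Zoë? Très bien!', $m);
print_r($m[0]); // Ça, va, Zoë, Très, bien
```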

Edit:

Try something like this. It replaces every accented letter with a search pattern that matches the accented letter (either as a single precomposed character or as the base letter followed by a combining mark) as well as the unaccented letter. You can then use the resulting search pattern to highlight your text.

function mbStringToArray($string)
{
    // Split a UTF-8 string into an array of characters (not bytes);
    // returns an empty array for an empty string.
    return preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY);
}

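// I had to use this ugly function to remove accents, as iconv didn't work properly on my test server.
// mb_convert_encoding() replaces utf8_decode()/utf8_encode(), which are deprecated as of PHP 8.2.
function stripAccents($str)
{
    $latin1 = mb_convert_encoding($str, 'ISO-8859-1', 'UTF-8');
    $from = mb_convert_encoding('àáâãäçèéêëìíîïñòóôõöùúûüýÿÀÁÂÃÄÇÈÉÊËÌÍÎÏÑÒÓÔÕÖÙÚÛÜÝ', 'ISO-8859-1', 'UTF-8');
    return mb_convert_encoding(strtr($latin1, $from, 'aaaaaceeeeiiiinooooouuuuyyAAAAACEEEEIIIINOOOOOUUUUY'), 'UTF-8', 'ISO-8859-1');
}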

$clientName = 'céra';

$clientNameNoAccent = stripAccents($clientName);

$clientNameArray = mbStringToArray($clientName);

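foreach ($clientNameArray as $pos => &$char)
{
    $charNA = $clientNameNoAccent[$pos]; // assumes the stripped name is plain ASCII
    if ($char != $charNA)
    {
        // Accented letter: match it precomposed, unaccented, or as base letter + combining mark.
        $char = "(?:$char|$charNA|$charNA\p{M})";
    }
    else
    {
        $char = preg_quote($char, '/'); // escape any regex metacharacters
    }
}
unset($char); // break the reference left over from foreach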

$clientSearchPattern = implode($clientNameArray); // c(?:é|e|e\p{M})ra

$text = 'the client name is Céra but it could be Cera or céra too.';

$search = preg_replace('/' . $clientSearchPattern . '/iu', '<span class="highlight">$0</span>', $text);

echo $search; // the client name is <span class="highlight">Céra</span> but it could be <span class="highlight">Cera</span> or <span class="highlight">céra</span> too.

Best preg_match expression for international name

Try placing \p{L} and the allowed symbols inside a character class (square brackets):

$name = "B-jör n'Bòrg";
if (preg_match("/^[- '\p{L}]+$/u", $name)) {
echo "$name is a valid name!"; // It is
}

You may also want to add some additional checks, e.g. to make sure that names start and end with a letter and not a symbol.

Edit

This will make sure that names start and end with a letter and do not contain consecutive symbols:

$name = "-Björ n''Bòrg-";
if (preg_match("/^\p{L}([- ']\p{L}|\p{L})*$/u", $name)) {
echo "$name is a valid name!"; // It's not
}
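To see the difference between the two patterns, this sketch runs both over a few sample strings. The two names from the answer are reused, and the others are invented:

```php
<?php
$loose  = "/^[- '\p{L}]+$/u";              // any mix of letters, hyphens, spaces, apostrophes
$strict = "/^\p{L}([- ']\p{L}|\p{L})*$/u"; // starts/ends with a letter, no doubled symbols

foreach (["B-jör n'Bòrg", "-Björ n''Bòrg-", "O'Brien", "Anne--Marie"] as $name) {
    printf("%s: loose=%d strict=%d\n", $name, preg_match($loose, $name), preg_match($strict, $name));
}
```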

