How to Match Accented Characters with PHP Preg

How do I match accented characters with PHP preg?

You could use Unicode character properties to describe the characters:

/^[\p{L}-]*$/u

\p{L} describes the class of Unicode letter characters.
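As a quick illustration, here is a minimal sketch of that pattern in use. The helper name isLettersAndHyphens is made up for this example, and the source file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical helper: true when $s consists only of Unicode letters and hyphens.
// Note that the * quantifier means an empty string is accepted as well.
function isLettersAndHyphens(string $s): bool
{
    // The u modifier makes PCRE treat both pattern and subject as UTF-8.
    return preg_match('/^[\p{L}-]*$/u', $s) === 1;
}

var_dump(isLettersAndHyphens('Jean-François')); // bool(true)
var_dump(isLettersAndHyphens('Łukasz'));        // bool(true)
var_dump(isLettersAndHyphens('R2-D2'));         // bool(false), digits are not letters
```

If empty input should be rejected, swapping * for + is probably what you want.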

preg_match accented characters

This pattern should work:

/^\pL+(?>[- ']\pL+)*$/u

Feel free to adapt it for less common names (for example, names with a trailing quote or an apostrophe).
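One possible adaptation, as a sketch only: the variant below also accepts the typographic apostrophe (U+2019) as a separator and allows a single trailing apostrophe. The function name isPlausibleName and the exact set of separators are assumptions for illustration, not part of the original answer.

```php
<?php
// Hypothetical variant of the pattern above:
//   \pL+                one or more letters
//   (?>[- '’]\pL+)*     any number of "separator + letters" groups
//   ['’]?               optionally one trailing apostrophe (added here)
function isPlausibleName(string $name): bool
{
    return preg_match('/^\pL+(?>[- \'’]\pL+)*[\'’]?$/u', $name) === 1;
}

var_dump(isPlausibleName("Anne-Marie O'Neil")); // bool(true)
var_dump(isPlausibleName('D’Angelo'));          // bool(true)
var_dump(isPlausibleName("Chris'"));            // bool(true), trailing apostrophe
var_dump(isPlausibleName('Jean--Luc'));         // bool(false), doubled separator
```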

preg_match with international characters and accents

Add the UTF-8 modifier flag (u) to your expression:

/^\p{L}+$/ui

There is also no need to wrap \p{L} inside of a character class.

PHP Regex for Accented Characters

Your regex is faulty. The part а-à raises the error "Character range is out of order". I guess the - was added there by mistake.

Then a small hint: ’ (the typographic apostrophe) is not ' (the straight one).

[^a-zA-Z0-9àâáçéèèêëìîíïôòóùûüÂÊÎÔúÛÄËÏÖÜÀÆæÇÉÈŒœÙñý'’,. ] 

should work fine.

Also, if you're working with regular expressions, tools like RegExr or regex101 are really helpful.
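To show how such a negated class might be used, here is a rough sketch. The constant DISALLOWED and the function hasDisallowedChars are names invented for this example. The u modifier is added on the assumption that the input is UTF-8; without it, PCRE would treat each accented letter in the class as separate bytes.

```php
<?php
// The class from above, negated: matches any single character that is NOT allowed.
const DISALLOWED = '/[^a-zA-Z0-9àâáçéèèêëìîíïôòóùûüÂÊÎÔúÛÄËÏÖÜÀÆæÇÉÈŒœÙñý\'’,. ]/u';

// Hypothetical helper: true when $text contains at least one disallowed character.
function hasDisallowedChars(string $text): bool
{
    return preg_match(DISALLOWED, $text) === 1;
}

var_dump(hasDisallowedChars("L'été, c’est bien.")); // bool(false)
var_dump(hasDisallowedChars('prix: 10€'));          // bool(true), ":" and "€" are not listed
```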

php regex match possible accented characters

Thanks for the help, everyone, but I will end up using the first suggestion I made in my question. And thanks again @CasimiretHippolyte for your patience, and for making me realize that it isn't as much overkill as I thought.

Here is the final code I'm using (first the functions):

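// Requires the intl extension (Normalizer).
function removeAccents($string)
{
    // Decompose to NFKD, then strip the combining diacritical marks (U+0300 to U+036F).
    return preg_replace('/[\x{0300}-\x{036f}]/u', '', Normalizer::normalize($string, Normalizer::FORM_KD));
}

function addAccents($string)
{
    // Replace each base letter with a character class of its accented variants.
    $array1 = array('a', 'c', 'e', 'i', 'n', 'o', 'u', 'y');
    $array2 = array('[aàáâãäå]', '[cçćĉċč]', '[eèéêë]', '[iìíîï]', '[nñ]', '[oòóôõö]', '[uùúûü]', '[yýÿ]');

    return str_replace($array1, $array2, strtolower($string));
}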

And:

$word = "something";
$word = preg_quote(trim($word), '/'); // Escape regex metacharacters, including the / delimiter
$word2 = $this->addAccents($this->removeAccents($word)); // Check all possible accents
if (!empty($word)) {
    // Match either the word as typed or any of its accented variations.
    $sentence = "/(" . $word . ")|(" . $word2 . ")/ui";
    if (preg_match($sentence, $content)) {
        echo "found";
    }
}

By the way, I'm covering all possible accents from my country (and some others). You should check whether you need to extend the addAccents() function before using it.

preg_match - words with accents

If you want to match a specific word, with accented variations, alas you will need to define the alternatives that you will permit for each character. Example:

/h[oòóôõö]h[oòóôõö]/ui

Here's a useful reference table for the Unicode character set:

http://unicode-table.com/en/
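Writing those alternatives by hand gets tedious, so they can be generated. The sketch below is one way to do it, not the only one: the function name accentTolerantPattern and the vowel map are assumptions for illustration, the input word is assumed to be plain ASCII, and the text being searched is assumed to use precomposed (NFC) characters.

```php
<?php
// Hypothetical helper: turns a plain word into a pattern that tolerates
// accented variants of its vowels. The map is illustrative, not exhaustive.
function accentTolerantPattern(string $word): string
{
    $map = [
        'a' => '[aàáâãäå]',
        'e' => '[eèéêë]',
        'i' => '[iìíîï]',
        'o' => '[oòóôõö]',
        'u' => '[uùúûü]',
    ];
    // Escape regex metacharacters first, then expand the letters.
    // strtr() works in a single pass, so inserted classes are never re-scanned.
    $body = strtr(preg_quote(strtolower($word), '/'), $map);

    return '/' . $body . '/ui';
}

$pattern = accentTolerantPattern('hoho');
var_dump($pattern);                           // string "/h[oòóôõö]h[oòóôõö]/ui"
var_dump(preg_match($pattern, 'Hóhò') === 1); // bool(true)
var_dump(preg_match($pattern, 'haha') === 1); // bool(false)
```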

regex to also match accented characters

$search = str_replace(
    ['a', 'e', 'i', 'o', 'u', 'ñ'],
    ['[aá]', '[eé]', '[ií]', '[oó]', '[uú]', '[nñ]'],
    $search
);

This, together with the same replacement for upper case, will meet your request. A side note: the ñ replacement seems wrong to me, as 'niño' is totally different from 'nino'.

how to make preg_replace for accented characters

In your code, the variable $adress is passed to preg_replace by value. The line

preg_replace('/[é]|[è]/','e',$adress);

actually replaces the characters in a temporary copy of the $adress variable, but the result is never used.

If you want to modify it, you should assign the result of preg_replace to it:

$adress = preg_replace('/[éè]/u', 'e', $adress);

Note the use of the u flag. I have also simplified the original regular expression slightly by merging the two single-character classes into one.

Alternatively, use str_replace. It is not considered "multibyte", but it actually can be used for such replacements:

$adress = str_replace(['é', 'è'], 'e', $adress);

P.S.: consider renaming $adress to $address.
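To make the difference concrete, here is a small sketch with a made-up address string. It also shows what can go wrong when the u flag is left out: the class is then treated as a set of individual bytes, so each byte of é gets replaced separately.

```php
<?php
$adress = 'Liberté, près de la crèche'; // sample value, assumed UTF-8

// The return value is discarded here, so $adress keeps its accents.
preg_replace('/[éè]/u', 'e', $adress);
$unchanged = $adress;

// Assigning the result is what actually changes the string.
$viaPreg = preg_replace('/[éè]/u', 'e', $adress);
$viaStr  = str_replace(['é', 'è'], 'e', $adress);

// Without the u flag, é (two bytes in UTF-8) matches the class twice.
$withoutU = preg_replace('/[éè]/', 'e', 'é');

var_dump($unchanged); // string "Liberté, près de la crèche"
var_dump($viaPreg);   // string "Liberte, pres de la creche"
var_dump($viaStr);    // string "Liberte, pres de la creche"
var_dump($withoutU);  // string "ee"
```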

preg_match and UTF-8 in PHP

This looks like a "feature"; see
http://bugs.php.net/bug.php?id=37391

The 'u' switch only makes sense to PCRE; PHP itself is unaware of it.

From PHP's point of view, strings are byte sequences, so returning a byte offset seems logical (I don't say "correct").
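If a character offset is needed, one workaround is to count the characters in the bytes that precede the match. This is only a sketch: the helper name byteToCharOffset is invented here, and the subject is assumed to be valid UTF-8.

```php
<?php
// Hypothetical helper: converts a byte offset returned by preg_match()
// into a character offset by counting the UTF-8 characters before it.
function byteToCharOffset(string $subject, int $byteOffset): int
{
    // With the s and u flags, "." matches exactly one character, newlines included.
    // mb_strlen() on the same prefix would also work if mbstring is available.
    return preg_match_all('/./su', substr($subject, 0, $byteOffset));
}

$subject = 'café crème';
preg_match('/crème/u', $subject, $m, PREG_OFFSET_CAPTURE);

$byteOffset = $m[0][1];                               // é occupies two bytes
$charOffset = byteToCharOffset($subject, $byteOffset);

var_dump($byteOffset); // int(6)
var_dump($charOffset); // int(5)
```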

How do I match accented characters in preg_match()?

Use the /u modifier. That will enable Unicode for the regexes.
http://php.net/manual/en/reference.pcre.pattern.modifiers.php
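A tiny sketch of the effect, assuming the source file is saved as UTF-8: without the modifier, é is seen as two separate bytes, so a pattern that expects exactly one character does not match.

```php
<?php
// "." matches one byte without u, and one UTF-8 character with u.
$withoutU = preg_match('/^.$/', 'é');
$withU    = preg_match('/^.$/u', 'é');

var_dump($withoutU); // int(0), é is two bytes
var_dump($withU);    // int(1), é is one character
```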


