Regex to Ignore Accents? PHP

Regex to ignore accents? PHP

I don't think there is a built-in way to do that. It would be locale-dependent, and you would first need the "/u" modifier so that the pattern and subject strings are treated as UTF-8.

I would probably do something like this.

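function prepare($pattern)
{
// Each base letter maps to a class containing itself and its accented forms.
$replacements = array("a" => "[aáàäâ]",
"e" => "[eéèëê]"); // ... add further letters here
// strtr() replaces in a single pass, so text inserted for one letter is never rewritten by a later one.
return strtr($pattern, $replacements);
}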

preg_replace("/(" . prepare(preg_quote($word, "/")) . ")/ui", "<b>\\1</b>", $str);

In your case the index was different because, unless you used the mbstring functions, you were probably counting bytes, and UTF-8 uses more than one byte for accented characters.
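To illustrate the byte/character difference, here is a small sketch. It assumes the source file is saved as UTF-8 and that the mbstring extension is loaded; the sample string is made up for the example.

```php
<?php
$str = 'Céra';

// strlen() and strpos() count bytes; in UTF-8, "é" takes two bytes.
$byteLength = strlen($str);      // 5
$byteIndex  = strpos($str, 'r'); // 3

// The mbstring functions count characters instead.
$charLength = mb_strlen($str, 'UTF-8');         // 4
$charIndex  = mb_strpos($str, 'r', 0, 'UTF-8'); // 2
```

So an offset obtained from one family of functions should not be fed to the other.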

Regex to match string with and without special/accented characters?

You can use the \p{L} pattern to match any letter.

You have to add the u modifier after the regular expression to enable Unicode mode.

Example: /\p{L}+/u
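A minimal sketch of that pattern in use, assuming UTF-8 input; the sample text is invented for the example:

```php
<?php
$text = 'Céra, niño & 123';

// \p{L}+ matches runs of letters, accented or not; digits and punctuation are skipped.
preg_match_all('/\p{L}+/u', $text, $matches);

// $matches[0] now holds ['Céra', 'niño']
```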

Edit:

Try something like this. It replaces every accented letter in the search term with a group that matches the accented letter (either as a single precomposed character or as a base letter followed by a combining mark) or the unaccented letter. You can then use the resulting search pattern to highlight your text.

function mbStringToArray($string)
{
$array = array(); // start empty so that an empty string returns an array, not null
$strlen = mb_strlen($string, "UTF-8");
while($strlen)
{
$array[] = mb_substr($string, 0, 1, "UTF-8");
$string = mb_substr($string, 1, $strlen, "UTF-8");
$strlen = mb_strlen($string, "UTF-8");
}
return $array;
}

// I had to use this ugly function to remove accents as iconv didn't work properly on my test server.
// Note: utf8_encode() and utf8_decode() are deprecated as of PHP 8.2.
function stripAccents($stripAccents){
return utf8_encode(strtr(utf8_decode($stripAccents),utf8_decode('àáâãäçèéêëìíîïñòóôõöùúûüýÿÀÁÂÃÄÇÈÉÊËÌÍÎÏÑÒÓÔÕÖÙÚÛÜÝ'),'aaaaaceeeeiiiinooooouuuuyyAAAAACEEEEIIIINOOOOOUUUUY'));
}

$clientName = 'céra';

$clientNameNoAccent = stripAccents($clientName);

$clientNameArray = mbStringToArray($clientName);

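foreach($clientNameArray as $pos => &$char)
{
// the accent-free string is plain ASCII here, so byte index and character index agree
$charNA = $clientNameNoAccent[$pos];
if($char != $charNA)
{
$char = "(?:$char|$charNA|$charNA\p{M})";
}
}
unset($char); // break the reference left over from the foreach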

$clientSearchPattern = implode($clientNameArray); // c(?:é|e|e\p{M})ra

$text = 'the client name is Céra but it could be Cera or céra too.';

$search = preg_replace('/(.*?)(' . $clientSearchPattern . ')(.*?)/iu', '$1<span class="highlight">$2</span>$3', $text);

echo $search; // the client name is <span class="highlight">Céra</span> but it could be <span class="highlight">Cera</span> or <span class="highlight">céra</span> too.

Regex to also match accented characters

$search = str_replace(
['a','e','i','o','u','ñ'],
['[aá]','[eé]','[ií]','[oó]','[uú]','[nñ]'],
$search);

This, together with the same replacements for upper case, will do what you ask. A side note: the ñ replacement seems wrong to me, as 'niño' is a totally different word from 'nino'.

PHP-REGEX: accented letters matches non-accented ones, and vice versa. How to achieve this?

You can write a function that builds the regular expression from your txt_search, replacing each letter that has accented variants with a character class listing all of those variants, like this:

function search_term($txt_search) {
$search = preg_quote($txt_search, '/'); // escape regex metacharacters, assuming "/" is the delimiter used later

$search = preg_replace('/[aàáâãåäæ]/iu', '[aàáâãåäæ]', $search);
$search = preg_replace('/[eèéêë]/iu', '[eèéêë]', $search);
$search = preg_replace('/[iìíîï]/iu', '[iìíîï]', $search);
$search = preg_replace('/[oòóôõöø]/iu', '[oòóôõöø]', $search);
$search = preg_replace('/[uùúûü]/iu', '[uùúûü]', $search);
// add any other character

return $search;
}

Then you use the result as a regex on your preg_replace.
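As a rough usage sketch: the block below repeats a condensed copy of search_term() (only the a and e classes) so that it runs on its own. The sample text and the <b> wrapper are assumptions made for the example.

```php
<?php
// Condensed copy of search_term() from above, limited to two letter classes.
function search_term($txt_search) {
    $search = preg_quote($txt_search, '/');
    $search = preg_replace('/[aàáâãåäæ]/iu', '[aàáâãåäæ]', $search);
    $search = preg_replace('/[eèéêë]/iu', '[eèéêë]', $search);
    return $search;
}

$text = 'Cera, Céra and cèra';

// The i flag handles case, the u flag makes the classes work on UTF-8 characters.
$result = preg_replace('/' . search_term('cera') . '/iu', '<b>$0</b>', $text);

// $result: <b>Cera</b>, <b>Céra</b> and <b>cèra</b>
```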

How to match with regex unicode text ignoring diacritics on characters (Á É Í)

I finally found a working solution thanks to Tibor's answer here: Regex to ignore accents? PHP

My function highlights text ignoring diacritics, spaces, apostrophes and dashes:

function highlight($pattern, $string)
{
//split into characters rather than bytes so multibyte input is not broken up (mb_str_split requires PHP 7.4+)
$array = mb_str_split($pattern, 1, "UTF-8");

//add or remove characters to be ignored
$pattern=implode('[\s\'\-]*', $array);

//list of letters with diacritics
$replacements = Array("a" => "[áa]", "e"=>"[ée]", "i"=>"[íi]", "o"=>"[óo]", "u"=>"[úu]", "A" => "[ÁA]", "E"=>"[ÉE]", "I"=>"[ÍI]", "O"=>"[ÓO]", "U"=>"[ÚU]");

$pattern=str_replace(array_keys($replacements), $replacements, $pattern);

//instead of <u> you can use <b>, <i> or even <div> or <span> with css class
return preg_replace("/(" . $pattern . ")/ui", "<u>\\1</u>", $string);
}

Replacing accented characters php

I have tried all sorts based on the variations listed in the answers, but the following worked:

$unwanted_array = array(    'Š'=>'S', 'š'=>'s', 'Ž'=>'Z', 'ž'=>'z', 'À'=>'A', 'Á'=>'A', 'Â'=>'A', 'Ã'=>'A', 'Ä'=>'A', 'Å'=>'A', 'Æ'=>'A', 'Ç'=>'C', 'È'=>'E', 'É'=>'E',
'Ê'=>'E', 'Ë'=>'E', 'Ì'=>'I', 'Í'=>'I', 'Î'=>'I', 'Ï'=>'I', 'Ñ'=>'N', 'Ò'=>'O', 'Ó'=>'O', 'Ô'=>'O', 'Õ'=>'O', 'Ö'=>'O', 'Ø'=>'O', 'Ù'=>'U',
'Ú'=>'U', 'Û'=>'U', 'Ü'=>'U', 'Ý'=>'Y', 'Þ'=>'B', 'ß'=>'Ss', 'à'=>'a', 'á'=>'a', 'â'=>'a', 'ã'=>'a', 'ä'=>'a', 'å'=>'a', 'æ'=>'a', 'ç'=>'c',
'è'=>'e', 'é'=>'e', 'ê'=>'e', 'ë'=>'e', 'ì'=>'i', 'í'=>'i', 'î'=>'i', 'ï'=>'i', 'ð'=>'o', 'ñ'=>'n', 'ò'=>'o', 'ó'=>'o', 'ô'=>'o', 'õ'=>'o',
'ö'=>'o', 'ø'=>'o', 'ù'=>'u', 'ú'=>'u', 'û'=>'u', 'ü'=>'u', 'ý'=>'y', 'þ'=>'b', 'ÿ'=>'y' );
$str = strtr( $str, $unwanted_array );

Remove special characters in regex PHP that allow accented words and chinese language

$string = preg_replace('/\PL/u', '', $string);
  • L is the Unicode property for letters
  • \P (capital P) negates the property, so \PL matches any character that is not a letter
  • /u is the Unicode modifier, you need this if you want to handle Unicode characters
  • make sure $string is encoded in UTF-8

So this matches all non-letters and removes them. I can only guess that this matches what you want. See http://www.php.net/manual/en/regexp.reference.unicode.php for more attributes you could match by, e.g. /[^\pL\pS]/u would match everything except letters and "symbols".

Regex that checks upper or lower case characters with or without accents

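I see no reason why adding \s to that regex would not work; \s matches all whitespace characters. Add the u modifier as well, so that the accented letters in the class are treated as characters rather than as separate bytes.

$foo = preg_replace("/[^áéíóúÁÉÍÓÚñÑa-zA-Z\s]/u", "", $_REQUEST["bar"]);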

