PHP Replacing Special Characters Like à->A, è->E

PHP replacing special characters like à->a, è->e

There's a much easier way to do this, using iconv. Judging from the user notes, this seems to be what you want: character transliteration.

// PHP.net User notes
<?php
$string = "ʿABBĀSĀBĀD";

echo iconv('UTF-8', 'ISO-8859-1//TRANSLIT', $string);
// output: [nothing, and you get a notice]

echo iconv('UTF-8', 'ISO-8859-1//IGNORE', $string);
// output: ABBSBD

echo iconv('UTF-8', 'ISO-8859-1//TRANSLIT//IGNORE', $string);
// output: ABBASABAD
// Yay! That's what I wanted!
?>

Be very careful with your character encodings: keep the same encoding at every stage of the process (front end, form submission, source files). Older PHP versions assumed ISO-8859-1 in many places; PHP 5.4 changed the default encoding of functions such as htmlentities() and htmlspecialchars() to UTF-8, and PHP 5.6 made UTF-8 the default value of default_charset (finally!).
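One way to act on that advice is to check strings at the boundary before transliterating them. This is only a sketch: the helper names is_valid_utf8 and ensure_utf8 are made up for illustration, and treating non-UTF-8 input as ISO-8859-1 is an assumption that will not hold for every data source.

```php
<?php
// Hypothetical helpers, not part of PHP or of any library.

// True if $s is well-formed UTF-8. With the u modifier, preg_match()
// returns false for an invalid UTF-8 subject, so an empty pattern
// doubles as a validator.
function is_valid_utf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

// Returns $s as UTF-8. Anything that is not already valid UTF-8 is
// ASSUMED to be in $legacyEncoding (ISO-8859-1 unless told otherwise).
function ensure_utf8(string $s, string $legacyEncoding = 'ISO-8859-1'): string
{
    if (is_valid_utf8($s)) {
        return $s;
    }
    $converted = iconv($legacyEncoding, 'UTF-8', $s);
    if ($converted === false) {
        throw new RuntimeException('Could not convert input to UTF-8.');
    }
    return $converted;
}

// "adiós" as ISO-8859-1 bytes: the ó is the single byte 0xF3.
echo bin2hex(ensure_utf8("adi\xF3s")), "\n"; // 616469c3b373
```

Calling ensure_utf8() once on form input means every later step (iconv, strtr, preg_replace with the u modifier) sees the same encoding.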

There are a couple of functions you can play around with for ideas. The first is from CakePHP's Inflector class, and is called slug:

public static function slug($string, $replacement = '_') {
    $quotedReplacement = preg_quote($replacement, '/');

    $merge = array(
        '/[^\s\p{Ll}\p{Lm}\p{Lo}\p{Lt}\p{Lu}\p{Nd}]/mu' => ' ',
        '/\\s+/' => $replacement,
        sprintf('/^[%s]+|[%s]+$/', $quotedReplacement, $quotedReplacement) => '',
    );

    $map = self::$_transliteration + $merge;
    return preg_replace(array_keys($map), array_values($map), $string);
}

It depends on a self::$_transliteration array, which is similar to what you were doing in your question. You can see the source for Inflector on GitHub.

Another is a function I use personally, which comes from here.

function slugify($text, $strict = false) {
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
    // replace non letter or digits by -
    $text = preg_replace('~[^\\pL\d.]+~u', '-', $text);

    // trim
    $text = trim($text, '-');
    setlocale(LC_CTYPE, 'en_GB.utf8');
    // transliterate
    if (function_exists('iconv')) {
        $text = iconv('utf-8', 'us-ascii//TRANSLIT', $text);
    }

    // lowercase
    $text = strtolower($text);
    // remove unwanted characters
    $text = preg_replace('~[^-\w.]+~', '', $text);
    if (empty($text)) {
        return 'empty_$';
    }
    if ($strict) {
        $text = str_replace(".", "_", $text);
    }
    return $text;
}

Both functions transliterate arbitrary text input and turn it into 'slugs', which is a very useful thing to have in your toolbox when building web apps. Hope this helps!

Replacing accented characters php

I tried all sorts of variations on the other answers, but only the following worked:

$unwanted_array = array(    'Š'=>'S', 'š'=>'s', 'Ž'=>'Z', 'ž'=>'z', 'À'=>'A', 'Á'=>'A', 'Â'=>'A', 'Ã'=>'A', 'Ä'=>'A', 'Å'=>'A', 'Æ'=>'A', 'Ç'=>'C', 'È'=>'E', 'É'=>'E',
'Ê'=>'E', 'Ë'=>'E', 'Ì'=>'I', 'Í'=>'I', 'Î'=>'I', 'Ï'=>'I', 'Ñ'=>'N', 'Ò'=>'O', 'Ó'=>'O', 'Ô'=>'O', 'Õ'=>'O', 'Ö'=>'O', 'Ø'=>'O', 'Ù'=>'U',
'Ú'=>'U', 'Û'=>'U', 'Ü'=>'U', 'Ý'=>'Y', 'Þ'=>'B', 'ß'=>'Ss', 'à'=>'a', 'á'=>'a', 'â'=>'a', 'ã'=>'a', 'ä'=>'a', 'å'=>'a', 'æ'=>'a', 'ç'=>'c',
'è'=>'e', 'é'=>'e', 'ê'=>'e', 'ë'=>'e', 'ì'=>'i', 'í'=>'i', 'î'=>'i', 'ï'=>'i', 'ð'=>'o', 'ñ'=>'n', 'ò'=>'o', 'ó'=>'o', 'ô'=>'o', 'õ'=>'o',
'ö'=>'o', 'ø'=>'o', 'ù'=>'u', 'ú'=>'u', 'û'=>'u', 'ü'=>'u', 'ý'=>'y', 'þ'=>'b', 'ÿ'=>'y' );
$str = strtr( $str, $unwanted_array );
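For reference, here is a small self-contained run of the same strtr() technique. The map is deliberately tiny and the function name replace_accents is made up for illustration. strtr() with an array works on bytes, tries the longest keys first and never rescans text it has already replaced, so it is safe for UTF-8 as long as the source file and the input string use the same encoding.

```php
<?php
// Illustrative map only; a real one needs many more entries.
$map = array('ê' => 'e', 'à' => 'a', 'ó' => 'o', 'ß' => 'ss');

// Hypothetical wrapper around strtr(), for illustration.
function replace_accents(string $str, array $map): string
{
    return strtr($str, $map);
}

echo replace_accents('Prêt-à-porter', $map), "\n"; // Pret-a-porter
echo replace_accents('Straße', $map), "\n";        // Strasse
```

Characters that are not in the map pass through unchanged, which is why a missing entry shows up as an accent that survives the call.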

How to replace special characters with the ones they're based on in PHP?

This answer is incorrect. I didn't understand Unicode Normalization when I wrote it.
Look at francadaval's comment and link

Check out the Normalizer class to do this. The documentation is good, so I'll just link it instead of repeating things here:

http://www.php.net/manual/en/class.normalizer.php

Specifically, the normalize member of that class:

http://www.php.net/manual/en/normalizer.normalize.php

Note that Unicode normalization has several forms, and you seem to want Normalization Form KD (NFKD) Compatibility Decomposition, though you should read the documentation to make sure.

You shouldn't try to roll your own function for this: there are far too many things that can go wrong, and using the provided function is a much better idea.
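Note that normalization on its own does not remove anything: NFKD only splits a character such as ê into a base letter plus a combining mark. A common follow-up step, sketched here as an assumption about what the asker wants, is to strip the combining marks afterwards. The function name strip_diacritics is made up, and the Normalizer class requires the intl extension.

```php
<?php
// Hypothetical helper; needs the intl extension for Normalizer.
function strip_diacritics(string $text): string
{
    if (!class_exists('Normalizer')) {
        throw new RuntimeException('The intl extension is required.');
    }

    // Step 1: decompose, e.g. "ê" becomes "e" followed by U+0302.
    $decomposed = Normalizer::normalize($text, Normalizer::FORM_KD);
    if ($decomposed === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8.');
    }

    // Step 2: drop the combining marks (Unicode category Mn).
    return (string) preg_replace('/\p{Mn}+/u', '', $decomposed);
}
```

Letters such as ø or đ have no decomposition, so they come out unchanged; those still need a lookup table or iconv transliteration.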

PHP replace special characters from a string

Add the characters you want to keep to the character class in preg_replace, including their uppercase forms if needed. I edited your code:

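function clean($string) {
    // The u modifier makes PCRE treat pattern and subject as UTF-8, so each
    // Turkish letter in the class counts as one character, not two bytes.
    $string = preg_replace('/[^A-Za-z0-9\-ığşçöüÖÇŞİıĞ]/u', ' ', $string);

    return preg_replace('/-+/', '-', $string);
}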

Test:

$str='Merhaba=Türkiye 12345 çok çalış another one ! *, !@_';
var_dump(clean($str));
//Output: string(57) "Merhaba Türkiye 12345 çok çalış another one         "

How can I preg_replace special character like 'Prêt-à-porter'?

"Pr\u00eat-\u00e0-porter" is a correct JavaScript string literal representation of Prêt-à-porter. I assume you're doing a json_encode at some point along the line?

Note also that PHP's regular expressions are not Unicode-aware unless you add the u modifier to the pattern. Without it, if you are using UTF-8 (which you generally want to be), ê is not a single character to the regex engine but byte C3 followed by byte AA. That's fine for simple literal matches, but in a character class the two bytes are matched independently rather than as a sequence, which can easily break your expression.
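A small demonstration of that difference, written with byte escapes so that it does not depend on the encoding of the source file. The variable names are only for illustration.

```php
<?php
$word  = "Pr\xC3\xAAt";         // "Prêt" encoded as UTF-8
$class = "[\xC3\xAA\xC3\xA0]";  // the character class [êà] as UTF-8 bytes

// Byte mode: the class contains the bytes C3, AA and A0, so both bytes
// of ê are matched and replaced separately.
$byteMode = preg_replace('/' . $class . '/', 'X', $word);

// UTF-8 mode: the class contains the two characters ê and à.
$utf8Mode = preg_replace('/' . $class . '/u', 'X', $word);

echo $byteMode, "\n"; // PrXXt
echo $utf8Mode, "\n"; // PrXt
```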

Replacing specific special character in a string of custom tag

Finally :) The only way this worked was to convert each line I read from the text file to its hex equivalent, search for a hex pattern, and replace it with a "readable" one.

I found this somewhere here

private function string_to_hex($string){
    $hex = '';
    for ($i = 0; $i < strlen($string); $i++){
        $ord = ord($string[$i]);
        $hexCode = dechex($ord);
        // left-pad to two hex digits per byte
        $hex .= substr('0'.$hexCode, -2);
    }
    return strtoupper($hex);
}

and then replace the hex pattern

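// utf8_encode() converts ISO-8859-1 to UTF-8; it is deprecated as of PHP 8.2.
$line = utf8_encode($line);
// '2B2B' is the hex form of "++".
$hex_line = str_replace('C286C286', '2B2B', $this->string_to_hex($line));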

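Note: C286 is not the actual character I was looking for; I believe it is the UTF-8 byte sequence that utf8_encode() produced for it (utf8_encode() treats its input as ISO-8859-1, so a single byte 0x86 becomes C2 86).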

and finally convert the hex string back to a human-readable string

$line = $this->hex_to_string($hex_line);

private function hex_to_string($hex){
    $string = '';
    for ($i = 0; $i < strlen($hex) - 1; $i += 2){
        $string .= chr(hexdec($hex[$i].$hex[$i+1]));
    }
    return $string;
}

Yes, it costs me a little performance, but nothing else seems to work :(
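For comparison, the same replacement can be sketched directly on the bytes, without the hex round trip. The function name replace_double_c286 is made up for illustration. Besides being cheaper, this avoids a subtle problem: searching a hex string can match across a byte boundary, because the pattern may start on the second hex digit of a byte.

```php
<?php
// Hypothetical helper: replaces two consecutive C2 86 sequences with "++".
function replace_double_c286(string $line): string
{
    return str_replace("\xC2\x86\xC2\x86", '++', $line);
}

echo replace_double_c286("a\xC2\x86\xC2\x86b"), "\n"; // a++b

// These five bytes do not contain C2 86 at all, yet their hex form
// "1C286C286F" contains "C286C286" starting at the second hex digit.
$tricky = "\x1C\x28\x6C\x28\x6F";
var_dump(strpos(strtoupper(bin2hex($tricky)), 'C286C286')); // int(1)
var_dump(replace_double_c286($tricky) === $tricky);         // bool(true)
```

The built-in bin2hex() and hex2bin() also cover what string_to_hex() and hex_to_string() do by hand, if the hex form is still wanted for debugging.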

How to convert special characters to normal characters?

Try this; it works for me.

iconv('utf-8', 'ascii//TRANSLIT', $text);
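One caveat: iconv() returns false when the conversion fails, and what //TRANSLIT produces depends on the platform's iconv implementation and the current locale. A defensive wrapper might look like the sketch below. The function name to_ascii and the byte-stripping fallback are assumptions for illustration, not part of the answer above.

```php
<?php
// Hypothetical wrapper around iconv() that always returns 7-bit ASCII.
function to_ascii(string $text): string
{
    $out = function_exists('iconv')
        ? @iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', $text)
        : false;

    if ($out === false) {
        // Fallback: drop every byte outside the ASCII range. This loses
        // the letters entirely, so it is a last resort only.
        $out = (string) preg_replace('/[^\x00-\x7F]/', '', $text);
    }

    return $out;
}

echo to_ascii('plain text'), "\n"; // plain text
```

The exact replacement chosen for an accented letter is deliberately not shown here, because it differs between systems.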

Issues replacing special characters in PHP string

I copied and pasted your code into my editor and something interesting happened. Instead of getting adios I was getting adjiós. Notice the j in the middle, after the d. This was coming from the 'đ'=>'dj' entry in the first line of the table map. Apparently my editor changed the đ to a regular d, and then it wouldn't convert the ó. I removed this key/value pair and suddenly it worked for me. Are you sure all of your keys are correct in your editor? (Does your editor accept alternative character sets?) Here is my test file (with the đ removed):

<html>
<head>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-1">
</head>
<body>
<?php

function normalize ($string) {
$table = array(
'Š'=>'S', 'š'=>'s', 'Ð'=>'Dj', 'Ž'=>'Z', 'ž'=>'z', 'C'=>'C', 'c'=>'c', 'C'=>'C', 'c'=>'c',
'À'=>'A', 'Á'=>'A', 'Â'=>'A', 'Ã'=>'A', 'Ä'=>'A', 'Å'=>'A', 'Æ'=>'A', 'Ç'=>'C', 'È'=>'E', 'É'=>'E',
'Ê'=>'E', 'Ë'=>'E', 'Ì'=>'I', 'Í'=>'I', 'Î'=>'I', 'Ï'=>'I', 'Ñ'=>'N', 'Ò'=>'O', 'Ó'=>'O', 'Ô'=>'O',
'Õ'=>'O', 'Ö'=>'O', 'Ø'=>'O', 'Ù'=>'U', 'Ú'=>'U', 'Û'=>'U', 'Ü'=>'U', 'Ý'=>'Y', 'Þ'=>'B', 'ß'=>'Ss',
'à'=>'a', 'á'=>'a', 'â'=>'a', 'ã'=>'a', 'ä'=>'a', 'å'=>'a', 'æ'=>'a', 'ç'=>'c', 'è'=>'e', 'é'=>'e',
'ê'=>'e', 'ë'=>'e', 'ì'=>'i', 'í'=>'i', 'î'=>'i', 'ï'=>'i', 'ð'=>'o', 'ñ'=>'n', 'ò'=>'o', 'ó'=>'o',
'ô'=>'o', 'õ'=>'o', 'ö'=>'o', 'ø'=>'o', 'ù'=>'u', 'ú'=>'u', 'û'=>'u', 'ý'=>'y', 'ý'=>'y', 'þ'=>'b',
'ÿ'=>'y', 'R'=>'R', 'r'=>'r',
);

return strtr($string, $table);
}

$word = 'adiós';
$length = strlen($word);

echo 'original: '. $word;
echo '<br />';
echo 'normalized: '. normalize($word);
echo '<br />';
echo 'loop: ';

for($i = 0; $i < $length; $i++) {
echo normalize($word[$i]);
}

?>

</body>
</html>

When I loop through each character with 'd' => 'dj' in the array map, I correctly get adjios.
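The character loop in the test file works because the file is declared as ISO-8859-1, where ó is a single byte. If the same file were saved as UTF-8, $word[$i] would hand normalize() one byte at a time and the two-byte ó would never match a key. A sketch of how to iterate by character instead, using byte escapes so it does not depend on the file's encoding:

```php
<?php
$word = "adi\xC3\xB3s"; // "adiós" encoded as UTF-8

// strlen() counts bytes, so the two-byte ó makes this 6, not 5.
echo strlen($word), "\n"; // 6

// Splitting on an empty pattern with the u modifier yields one array
// element per UTF-8 character.
$chars = preg_split('//u', $word, -1, PREG_SPLIT_NO_EMPTY);
echo count($chars), "\n"; // 5

foreach ($chars as $char) {
    echo bin2hex($char), ' '; // 61 64 69 c3b3 73
}
echo "\n";
```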


