Replacing accented characters php
I have tried all sorts based on the variations listed in the answers, but the following worked:
$unwanted_array = array( 'Š'=>'S', 'š'=>'s', 'Ž'=>'Z', 'ž'=>'z', 'À'=>'A', 'Á'=>'A', 'Â'=>'A', 'Ã'=>'A', 'Ä'=>'A', 'Å'=>'A', 'Æ'=>'A', 'Ç'=>'C', 'È'=>'E', 'É'=>'E',
'Ê'=>'E', 'Ë'=>'E', 'Ì'=>'I', 'Í'=>'I', 'Î'=>'I', 'Ï'=>'I', 'Ñ'=>'N', 'Ò'=>'O', 'Ó'=>'O', 'Ô'=>'O', 'Õ'=>'O', 'Ö'=>'O', 'Ø'=>'O', 'Ù'=>'U',
'Ú'=>'U', 'Û'=>'U', 'Ü'=>'U', 'Ý'=>'Y', 'Þ'=>'B', 'ß'=>'Ss', 'à'=>'a', 'á'=>'a', 'â'=>'a', 'ã'=>'a', 'ä'=>'a', 'å'=>'a', 'æ'=>'a', 'ç'=>'c',
'è'=>'e', 'é'=>'e', 'ê'=>'e', 'ë'=>'e', 'ì'=>'i', 'í'=>'i', 'î'=>'i', 'ï'=>'i', 'ð'=>'o', 'ñ'=>'n', 'ò'=>'o', 'ó'=>'o', 'ô'=>'o', 'õ'=>'o',
'ö'=>'o', 'ø'=>'o', 'ù'=>'u', 'ú'=>'u', 'û'=>'u', 'ý'=>'y', 'þ'=>'b', 'ÿ'=>'y' );
$str = strtr( $str, $unwanted_array );
Replace Accented Characters
Trying to find a solution to your problem I have found this question:
multibyte strtr() -> mb_strtr()
In the chosen answer Alix Axel writes a function which is exactly what you need:
function Unaccent($string)
{
return preg_replace('~&([a-z]{1,2})(?:acute|cedil|circ|grave|lig|orn|ring|slash|th|tilde|uml|caron);~i', '$1', htmlentities($string, ENT_QUOTES, 'UTF-8'));
}
echo Unaccent(html_entity_decode('Áhkká'));
prints Ahkka
How do I remove accents from characters in a PHP string?
I think the problem here is that your encodings consider ä and å different symbols to 'a'. In fact, the PHP documentation for strtr offers a sample for removing accents the ugly way :(
http://ie2.php.net/strtr
Remove all special characters from a string
This should do what you're looking for:
function clean($string) {
$string = str_replace(' ', '-', $string); // Replaces all spaces with hyphens.
return preg_replace('/[^A-Za-z0-9\-]/', '', $string); // Removes special chars.
}
Usage:
echo clean('a|"bc!@£de^&$f g');
Will output: abcdef-g
Edit:
Hey, just a quick question, how can I prevent multiple hyphens from being next to each other? and have them replaced with just 1?
function clean($string) {
$string = str_replace(' ', '-', $string); // Replaces all spaces with hyphens.
$string = preg_replace('/[^A-Za-z0-9\-]/', '', $string); // Removes special chars.
return preg_replace('/-+/', '-', $string); // Replaces multiple hyphens with single one.
}
PHP replacing special characters like à->a, è->e
There's a much easier way to do this, using iconv
- from the user notes, this seems to be what you want to do: characters transliteration
// PHP.net User notes
<?php
$string = "ʿABBĀSĀBĀD";
echo iconv('UTF-8', 'ISO-8859-1//TRANSLIT', $string);
// output: [nothing, and you get a notice]
echo iconv('UTF-8', 'ISO-8859-1//IGNORE', $string);
// output: ABBSBD
echo iconv('UTF-8', 'ISO-8859-1//TRANSLIT//IGNORE', $string);
// output: ABBASABAD
// Yay! That's what I wanted!
?>
Be very conscientious with your character encodings, so you are keeping the same encoding at all stages in the process - front end, form submission, encoding of the source files. Default encoding in PHP and in forms is ISO-8859-1, before PHP 5.4 where it changed to be UTF8 (finally!).
There's a couple of functions you can play around with for ideas. First is from CakePHP's inflector class, called slug
:
public static function slug($string, $replacement = '_') {
$quotedReplacement = preg_quote($replacement, '/');
$merge = array(
'/[^\s\p{Ll}\p{Lm}\p{Lo}\p{Lt}\p{Lu}\p{Nd}]/mu' => ' ',
'/\\s+/' => $replacement,
sprintf('/^[%s]+|[%s]+$/', $quotedReplacement, $quotedReplacement) => '',
);
$map = self::$_transliteration + $merge;
return preg_replace(array_keys($map), array_values($map), $string);
}
It depends on a self::$_transliteration
array which is similar to what you were doing in your question - you can see the source for inflector on github.
Another is a function I use personally, which comes from here.
function slugify($text,$strict = false) {
$text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
// replace non letter or digits by -
$text = preg_replace('~[^\\pL\d.]+~u', '-', $text);
// trim
$text = trim($text, '-');
setlocale(LC_CTYPE, 'en_GB.utf8');
// transliterate
if (function_exists('iconv')) {
$text = iconv('utf-8', 'us-ascii//TRANSLIT', $text);
}
// lowercase
$text = strtolower($text);
// remove unwanted characters
$text = preg_replace('~[^-\w.]+~', '', $text);
if (empty($text)) {
return 'empty_$';
}
if ($strict) {
$text = str_replace(".", "_", $text);
}
return $text;
}
What those functions do is transliterate and create 'slugs' from arbitrary text input, which is a very very useful thing to have in your toolchest when making web apps. Hope this helps!
how to make preg_replace for accented characters
In your code, the variable $adress
is passed to preg_replace
by value. The line
preg_replace('/[é]|[è]/','e',$adress);
actually replaces the characters in a temporary copy of $adress
variable. But the result is unused.
If you want to modify it, you should assign the result of preg_replace
to it:
$adress = preg_replace('/[éè]/u', 'e', $adress);
Note the use of u
flag. Also, I have slightly optimized the original regular expression.
Alternatively, use str_replace
. It is not considered "multibyte", but it actually can be used for such replacements:
$adress = str_replace(['é', 'è'], 'e', $adress);
P.S.: consider renaming $adress
to $address
.
Related Topics
PHP If Statement With Multiple Conditions
PHP: Replace Umlauts With Closest 7-Bit Ascii Equivalent in an Utf-8 String
MySQLi Equivalent of MySQL_Result()
PHP, How to Catch a Division by Zero
What Is the Purpose of the Question Marks Before Type Declaration in PHP7 (String or Int)
How to Create a Simple 'Hello World' Module in Magento
How to Check Whether an Array Is Empty Using PHP
Delete Directory With Files in It
Global or Singleton For Database Connection
Efficient Jpeg Image Resizing in PHP
Why Do Some Scripts Omit the Closing PHP Tag, '≫'