Replacing Accented Characters PHP

Replacing accented characters php

I have tried all sorts based on the variations listed in the answers, but the following worked:

$unwanted_array = array(    'Š'=>'S', 'š'=>'s', 'Ž'=>'Z', 'ž'=>'z', 'À'=>'A', 'Á'=>'A', 'Â'=>'A', 'Ã'=>'A', 'Ä'=>'A', 'Å'=>'A', 'Æ'=>'A', 'Ç'=>'C', 'È'=>'E', 'É'=>'E',
                            'Ê'=>'E', 'Ë'=>'E', 'Ì'=>'I', 'Í'=>'I', 'Î'=>'I', 'Ï'=>'I', 'Ñ'=>'N', 'Ò'=>'O', 'Ó'=>'O', 'Ô'=>'O', 'Õ'=>'O', 'Ö'=>'O', 'Ø'=>'O', 'Ù'=>'U',
                            'Ú'=>'U', 'Û'=>'U', 'Ü'=>'U', 'Ý'=>'Y', 'Þ'=>'B', 'ß'=>'Ss', 'à'=>'a', 'á'=>'a', 'â'=>'a', 'ã'=>'a', 'ä'=>'a', 'å'=>'a', 'æ'=>'a', 'ç'=>'c',
                            'è'=>'e', 'é'=>'e', 'ê'=>'e', 'ë'=>'e', 'ì'=>'i', 'í'=>'i', 'î'=>'i', 'ï'=>'i', 'ð'=>'o', 'ñ'=>'n', 'ò'=>'o', 'ó'=>'o', 'ô'=>'o', 'õ'=>'o',
                            'ö'=>'o', 'ø'=>'o', 'ù'=>'u', 'ú'=>'u', 'û'=>'u', 'ý'=>'y', 'þ'=>'b', 'ÿ'=>'y' );
$str = strtr( $str, $unwanted_array );

Replace Accented Characters

Trying to find a solution to your problem I have found this question:

multibyte strtr() -> mb_strtr()

In the chosen answer Alix Axel writes a function which is exactly what you need:

function Unaccent($string)
{
    return preg_replace('~&([a-z]{1,2})(?:acute|cedil|circ|grave|lig|orn|ring|slash|th|tilde|uml|caron);~i', '$1', htmlentities($string, ENT_QUOTES, 'UTF-8'));
}

echo Unaccent(html_entity_decode('Áhkká'));

prints Ahkka

How do I remove accents from characters in a PHP string?

I think the problem here is that your encodings consider ä and å different symbols to 'a'. In fact, the PHP documentation for strtr offers a sample for removing accents the ugly way :(

http://ie2.php.net/strtr

Remove all special characters from a string

This should do what you're looking for:

function clean($string) {
   $string = str_replace(' ', '-', $string); // Replaces all spaces with hyphens.

   return preg_replace('/[^A-Za-z0-9\-]/', '', $string); // Removes special chars.
}

Usage:

echo clean('a|"bc!@£de^&$f g');

Will output: abcdef-g

Edit:

Hey, just a quick question, how can I prevent multiple hyphens from being next to each other? and have them replaced with just 1?

function clean($string) {
   $string = str_replace(' ', '-', $string); // Replaces all spaces with hyphens.
   $string = preg_replace('/[^A-Za-z0-9\-]/', '', $string); // Removes special chars.

   return preg_replace('/-+/', '-', $string); // Replaces multiple hyphens with single one.
}

PHP replacing special characters like à->a, è->e

There's a much easier way to do this, using iconv - from the user notes, this seems to be what you want to do: characters transliteration

// PHP.net User notes
<?php
    $string = "ʿABBĀSĀBĀD";

    echo iconv('UTF-8', 'ISO-8859-1//TRANSLIT', $string);
    // output: [nothing, and you get a notice]

    echo iconv('UTF-8', 'ISO-8859-1//IGNORE', $string);
    // output: ABBSBD

    echo iconv('UTF-8', 'ISO-8859-1//TRANSLIT//IGNORE', $string);
    // output: ABBASABAD
    // Yay! That's what I wanted!
?>

Be very conscientious with your character encodings, so you are keeping the same encoding at all stages in the process - front end, form submission, encoding of the source files. Default encoding in PHP and in forms is ISO-8859-1, before PHP 5.4 where it changed to be UTF8 (finally!).

There's a couple of functions you can play around with for ideas. First is from CakePHP's inflector class, called slug:

public static function slug($string, $replacement = '_') {
    $quotedReplacement = preg_quote($replacement, '/');

    $merge = array(
        '/[^\s\p{Ll}\p{Lm}\p{Lo}\p{Lt}\p{Lu}\p{Nd}]/mu' => ' ',
        '/\\s+/' => $replacement,
        sprintf('/^[%s]+|[%s]+$/', $quotedReplacement, $quotedReplacement) => '',
    );

    $map = self::$_transliteration + $merge;
    return preg_replace(array_keys($map), array_values($map), $string);
}

It depends on a self::$_transliteration array which is similar to what you were doing in your question - you can see the source for inflector on github.

Another is a function I use personally, which comes from here.

function slugify($text,$strict = false) {
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
    // replace non letter or digits by -
    $text = preg_replace('~[^\\pL\d.]+~u', '-', $text);

    // trim
    $text = trim($text, '-');
    setlocale(LC_CTYPE, 'en_GB.utf8');
    // transliterate
    if (function_exists('iconv')) {
        $text = iconv('utf-8', 'us-ascii//TRANSLIT', $text);
    }

    // lowercase
    $text = strtolower($text);
    // remove unwanted characters
    $text = preg_replace('~[^-\w.]+~', '', $text);
    if (empty($text)) {
        return 'empty_$';
    }
    if ($strict) {
        $text = str_replace(".", "_", $text);
    }
    return $text;
}

What those functions do is transliterate and create 'slugs' from arbitrary text input, which is a very very useful thing to have in your toolchest when making web apps. Hope this helps!

how to make preg_replace for accented characters

In your code, the variable $adress is passed to preg_replace by value. The line

preg_replace('/[é]|[è]/','e',$adress);

actually replaces the characters in a temporary copy of $adress variable. But the result is unused.

If you want to modify it, you should assign the result of preg_replace to it:

$adress = preg_replace('/[éè]/u', 'e', $adress);

Note the use of u flag. Also, I have slightly optimized the original regular expression.

Alternatively, use str_replace. It is not considered "multibyte", but it actually can be used for such replacements:

$adress = str_replace(['é', 'è'], 'e', $adress);

P.S.: consider renaming $adress to $address.

Replacing Accented Characters PHP