How to Sort an Array of UTF-8 Strings in PHP

How to sort an array of UTF-8 strings?

Ultimately, this problem cannot be solved in a simple way without recoding the strings (UTF-8 → Windows-1252 or ISO-8859-1), as suggested by ΤΖΩΤΖΙΟΥ, because of an apparent PHP bug discovered by Huppie.
To summarize the problem, I wrote the following snippet, which shows that the culprit is the strcoll() function when it is used with the Windows UTF-8 code page 65001.

function traceStrColl($a, $b) {
    $outValue = strcoll($a, $b);
    echo "$a $b $outValue\r\n";
    return $outValue;
}

$locale = (defined('PHP_OS') && stristr(PHP_OS, 'win')) ? 'German_Germany.65001' : 'de_DE.utf8';

$string = "ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÜabcdefghijklmnopqrstuvwxyzäöüß";
$array = array();
for ($i = 0; $i < mb_strlen($string, 'UTF-8'); $i++) {
    $array[] = mb_substr($string, $i, 1, 'UTF-8');
}
$oldLocale = setlocale(LC_COLLATE, "0");
var_dump(setlocale(LC_COLLATE, $locale));
usort($array, 'traceStrColl');
setlocale(LC_COLLATE, $oldLocale);
var_dump($array);

The result is:

string(20) "German_Germany.65001"
a B 2147483647
[...]
array(59) {
[0]=>
string(1) "c"
[1]=>
string(1) "B"
[2]=>
string(1) "s"
[3]=>
string(1) "C"
[4]=>
string(1) "k"
[5]=>
string(1) "D"
[6]=>
string(2) "ä"
[7]=>
string(1) "E"
[8]=>
string(1) "g"
[...]

The same snippet works on a Linux machine without any problems producing the following output:

string(10) "de_DE.utf8"
a B -1
[...]
array(59) {
[0]=>
string(1) "a"
[1]=>
string(1) "A"
[2]=>
string(2) "ä"
[3]=>
string(2) "Ä"
[4]=>
string(1) "b"
[5]=>
string(1) "B"
[6]=>
string(1) "c"
[7]=>
string(1) "C"
[...]

The snippet also works with strings encoded as Windows-1252 (ISO-8859-1), provided the encoding passed to the mb_* functions and the locale are changed to match.
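The recoding workaround could look roughly like the sketch below. The helper name sortViaSingleByte() and the locale name 'German_Germany.1252' are my assumptions, not part of the original answer; the sketch also assumes the mbstring extension is loaded and that every character exists in Windows-1252 (anything else would be replaced during conversion).

```php
<?php
// Hypothetical helper: sort UTF-8 strings by collating single-byte copies.
function sortViaSingleByte(array $utf8Strings, string $locale): array
{
    // Recode every string to Windows-1252 so strcoll() sees one byte per character.
    $recoded = array_map(function ($s) {
        return mb_convert_encoding($s, 'Windows-1252', 'UTF-8');
    }, $utf8Strings);

    $oldLocale = setlocale(LC_COLLATE, '0');   // remember the current locale
    setlocale(LC_COLLATE, $locale);            // returns false if the locale is unavailable
    usort($recoded, 'strcoll');
    setlocale(LC_COLLATE, $oldLocale);         // restore the previous locale

    // Convert back to UTF-8 for the rest of the application.
    return array_map(function ($s) {
        return mb_convert_encoding($s, 'UTF-8', 'Windows-1252');
    }, $recoded);
}

// Assumed Windows locale name; on Linux a single-byte locale such as de_DE.ISO-8859-1 would be needed.
$sorted = sortViaSingleByte(['Zürich', 'Ärger', 'Apfel'], 'German_Germany.1252');
print_r($sorted);
```

If the locale cannot be set, strcoll() falls back to byte order, so the result is still valid UTF-8 but not necessarily in dictionary order.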

I filed a bug report on bugs.php.net: Bug #46165 strcoll() does not work with UTF-8 strings on Windows. If you experience the same problem, you can give your feedback to the PHP team on the bug-report page (two other, probably related, bugs have been classified as bogus - I don't think that this bug is bogus ;-).

Thanks to all of you.

Utf8 Sort Array

You can call setlocale() with LC_COLLATE as the first parameter and a locale such as en_US.utf8 as the second, and then sort with usort() and strcoll(). Try this:

setlocale(LC_COLLATE, 'en_US.utf8');
$array = array('Australien','Belgien','Botswana','Brasilien','Bulgarien','Burma','China','Costa Rica','Ägypten');
usort($array, 'strcoll');
print_r($array);
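One caveat: setlocale() returns false when the requested locale is not installed, and the sort then silently uses byte order. A minimal sketch that checks for this is shown here; the helper name sortWithLocale() and the list of candidate locale names are assumptions of mine.

```php
<?php
// Hypothetical helper: sort with the first locale the system accepts.
function sortWithLocale(array $items, array $candidateLocales): array
{
    $oldLocale = setlocale(LC_COLLATE, '0');   // remember the current locale

    // setlocale() accepts an array of names and returns false if none can be set.
    $active = setlocale(LC_COLLATE, $candidateLocales);

    if ($active === false) {
        sort($items, SORT_STRING);             // fallback: plain byte-wise order
        return $items;
    }

    usort($items, 'strcoll');
    setlocale(LC_COLLATE, $oldLocale);         // restore the previous locale
    return $items;
}

// The spelling of the locale name differs between systems, so both are tried.
print_r(sortWithLocale(
    ['Belgien', 'Ägypten', 'Australien'],
    ['en_US.utf8', 'en_US.UTF-8']
));
```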


PHP | alphabetical sort of UTF-8 tags ( Class 'Collator' not found )

After asking this question and learning the answer, I needed a way to alphabetically sort an array of UTF-8 strings (Turkish input) read from a MySQL database.

I then used Fy-'s answer, and it worked.

My solution now is:

setlocale(LC_COLLATE, 'tr_TR.utf8'); // required for UTF-8 sorting
$sorgum = "...";
$bindparametre1 = '...';

if ($beyan = $db_baglanti->prepare($sorgum))
{
    /* bind parameters */
    $beyan->bind_param("s", $bindparametre1);

    /* execute statement */
    $beyan->execute();

    /* bind result variables */
    $beyan->bind_result($etiketler);

    echo "\t".'<div class="sol-icerik-kapsar">'."\r\n";
    echo "\t\t".'<h1>'.'Etiketler'.'</h1>'."\r\n";

    /* fetch values */
    $etiket_bulutu = '';
    while ($beyan->fetch())
    {
        $etiket_bulutu .= $etiketler.', ';
    }
    $etiket_bulutu = substr_replace($etiket_bulutu, '', -2); // drop the last 2 characters, i.e. the trailing ", "
    $etiketler = explode(", ", $etiket_bulutu); // build an array with one entry per tag
    $etiketler = array_unique($etiketler);
    $etiketler = array_values($etiketler); // only unique tags remain
    $etadet = count($etiketler);

    usort($etiketler, 'strcoll');

    echo "\n\t\t".'<p>'."\n";
    for ($x = 0; $x < $etadet; $x++)
    {
        echo "\t\t\t".'<a href="'.url_validate(sitenin_koku.'etiketler/'.$etiketler[$x]).'">'.convert_one_row($etiketler[$x]).'</a>, '."\n";
    }
    echo "\t\t".'</p>'."\n";
}

The key lines of my solution are setlocale(LC_COLLATE, 'tr_TR.utf8'); and usort($etiketler, 'strcoll');

regards
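For code that has to run both with and without the intl extension (the "Class 'Collator' not found" case), something like the following sketch might work. The helper name sortTags() and the sample tags are assumptions of mine; only Collator, setlocale(), usort() and strcoll() come from PHP itself.

```php
<?php
// Hypothetical helper: prefer intl's Collator, fall back to strcoll().
function sortTags(array $tags, string $icuLocale, string $posixLocale): array
{
    if (class_exists('Collator')) {
        $collator = new Collator($icuLocale);
        $collator->sort($tags);                // sorts in place and reindexes the array
        return $tags;
    }

    // intl is missing: use the C library collation instead.
    $oldLocale = setlocale(LC_COLLATE, '0');
    setlocale(LC_COLLATE, $posixLocale);       // returns false if the locale is not installed
    usort($tags, 'strcoll');
    setlocale(LC_COLLATE, $oldLocale);
    return $tags;
}

$etiketler = sortTags(['muz', 'elma', 'armut'], 'tr_TR', 'tr_TR.utf8');
print_r($etiketler);
```

The two branches take different locale names because ICU and the C library do not share a naming scheme.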

Natural sorting algorithm in PHP with support for Unicode?

Nailed it!

$array = array('Ägile', 'Ãgile', 'Test', 'カタカナ', 'かたかな', 'Ágile', 'Àgile', 'Âgile', 'Agile');

function Sortify($string)
{
    return preg_replace('~&([a-z]{1,2})(acute|cedil|circ|grave|lig|orn|ring|slash|tilde|uml);~i', '$1' . chr(255) . '$2', htmlentities($string, ENT_QUOTES, 'UTF-8'));
}

array_multisort(array_map('Sortify', $array), $array);

Output:

Array
(
[0] => Agile
[1] => Ágile
[2] => Âgile
[3] => Àgile
[4] => Ãgile
[5] => Ägile
[6] => Test
[7] => かたかな
[8] => カタカナ
)

Even better:

if (extension_loaded('intl') === true)
{
    collator_asort(collator_create('root'), $array);
}

Thanks to @tchrist!
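Since the question asks for natural sorting, here is a sketch of how digits could be handled as numbers. It uses the Collator::NUMERIC_COLLATION attribute from the intl extension; the helper name naturalSort() and the strnatcasecmp() fallback are my own additions, not part of the answer above.

```php
<?php
// Hypothetical helper: natural ("file2" before "file10") sort, Unicode-aware when intl is loaded.
function naturalSort(array $items): array
{
    if (class_exists('Collator')) {
        $collator = new Collator('root');
        // Treat runs of digits as numbers rather than as individual characters.
        $collator->setAttribute(Collator::NUMERIC_COLLATION, Collator::ON);
        $collator->sort($items);
        return $items;
    }

    // Without intl: natural order, but not Unicode-aware.
    usort($items, 'strnatcasecmp');
    return $items;
}

print_r(naturalSort(['file10', 'file2', 'file1']));
```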

Strcoll UTF-8 Sorting - PHP

OK, so in short, the problem you have is due to the values of the characters. When no matching locale is in effect (the "C" or "POSIX" locale), strcoll() behaves like strcmp() and compares strings by the byte values of their characters in the character set you are using. I am going to assume you are using UTF-8 or something very similar. Look at a UTF-8 table and notice how the value of e with an accent mark is much higher than that of Z. That is why you are having this problem. To fix it, you will have to add special cases for the accented characters, as the normal sort will not work otherwise. Basically, create your own compare function that puts e with an accent mark right after e, where f would usually be, and so on.
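A minimal sketch of such a compare function is shown here. The constant ACCENT_MAP and the function compareFolded() are hypothetical names, and the folding table is deliberately incomplete; you would have to extend it for the characters you actually need.

```php
<?php
// Hypothetical, deliberately incomplete folding table; extend as needed.
const ACCENT_MAP = [
    'é' => 'e', 'è' => 'e', 'ê' => 'e', 'É' => 'E',
    'ä' => 'a', 'Ä' => 'A',
];

function compareFolded(string $a, string $b): int
{
    // Compare the strings with accented letters replaced by their base letters.
    $result = strcmp(strtr($a, ACCENT_MAP), strtr($b, ACCENT_MAP));

    // Tie-break on the original bytes so that "e" sorts before "é".
    return $result !== 0 ? $result : strcmp($a, $b);
}

$words = ['zebra', 'école', 'eagle', 'fish'];
usort($words, 'compareFolded');
print_r($words);
```

This is still a byte-wise comparison after folding, so uppercase letters sort before lowercase ones; a locale-aware collation avoids maintaining such a table by hand.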

PHP sort array with special characters

Based on this example: Sort an array with special characters in PHP

Explanation:

  1. Get the array keys with array_keys().

  2. Sort the keys with a comparison function built on iconv() and strcmp().

  3. Iterate over the sorted keys, look up each key's value in the original array, and save the key-value pair in the result array.

Do it like this:

<?php

setlocale(LC_ALL, 'sl_SI.utf8');

$a = [
    'č' => [12],
    'a' => [23],
    'š' => [45],
    'u' => [56]
];

$index_array = array_keys($a);

function compareASCII($a, $b) {
    $at = iconv('UTF-8', 'ASCII//TRANSLIT', $a);
    $bt = iconv('UTF-8', 'ASCII//TRANSLIT', $b);
    return strcmp($at, $bt);
}

uasort($index_array, 'compareASCII');

$final_array = [];
foreach ($index_array as $index_arr) {
    $final_array[$index_arr] = $a[$index_arr];
}

print_r($final_array);

Output: https://eval.in/990872

Reference:

iconv()

strcmp()

uasort()
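The same idea can probably be written more compactly with uksort(), which sorts an array by its keys and keeps the values attached. This is only a sketch: asciiKey() is a hypothetical helper, and what ASCII//TRANSLIT produces depends on the iconv implementation and the active locale, so the resulting order may differ between systems.

```php
<?php
// Hypothetical helper: ASCII transliteration used only as a sort key.
function asciiKey(string $s): string
{
    // @ silences the notice iconv() raises for characters it cannot map.
    $t = @iconv('UTF-8', 'ASCII//TRANSLIT', $s);
    return $t === false ? $s : $t;             // keep the original string on failure
}

$a = [
    'č' => [12],
    'a' => [23],
    'š' => [45],
    'u' => [56],
];

// uksort() reorders the array by key and keeps each value with its key.
uksort($a, function ($x, $y) {
    return strcmp(asciiKey((string) $x), asciiKey((string) $y));
});

print_r($a);
```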

How to sort a collection of UTF-8 strings containing non-Latin chars in Laravel 5.3?

Here's a solid way to do it:

$blank = array();
$collection = collect([
    ["name"=>"maroon"],
    ["name"=>"zoo"],
    ["name"=>"ábel"],
    ["name"=>"élof"]
])->toArray();

$count = count($collection);

for ($x = 0; $x < $count; $x++) {
    $blank[$x] = $collection[$x]['name'];
}

$collator = collator_create('en_US');
var_export($blank);
collator_sort($collator, $blank);
var_export($blank);

dd($blank);

Outputs:

array (
0 => 'maroon',
1 => 'zoo',
2 => 'ábel',
3 => 'élof',
)array (
0 => 'ábel',
1 => 'élof',
2 => 'maroon',
3 => 'zoo',
)

Laravel Pretty Output:

array:4 [
0 => "ábel"
1 => "élof"
2 => "maroon"
3 => "zoo"
]

For further reading and reference:
http://php.net/manual/en/class.collator.php

Hope this answer helps, and sorry for the late response =)
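If you need the whole rows sorted rather than just the names, a comparator built on Collator::compare() might do. The sketch below is plain PHP without Laravel; sortRowsByName() is a hypothetical helper, and the strcmp() branch is only a byte-wise fallback for systems without the intl extension.

```php
<?php
// Hypothetical helper: sort rows by their 'name' field.
function sortRowsByName(array $rows, ?Collator $collator): array
{
    usort($rows, function (array $x, array $y) use ($collator) {
        return $collator !== null
            ? $collator->compare($x['name'], $y['name'])
            : strcmp($x['name'], $y['name']);  // byte-wise fallback without intl
    });
    return $rows;
}

$rows = [
    ['name' => 'maroon'],
    ['name' => 'zoo'],
    ['name' => 'ábel'],
    ['name' => 'élof'],
];

// 'en_US' is the same locale the answer above uses.
$collator = class_exists('Collator') ? new Collator('en_US') : null;
$names = array_column(sortRowsByName($rows, $collator), 'name');
var_export($names);
```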


