How to sort an array of UTF-8 strings?
Ultimately, this problem cannot be solved in a simple way without recoding the strings (UTF-8 → Windows-1252 or ISO-8859-1), as suggested by ΤΖΩΤΖΙΟΥ, because of an obvious PHP bug discovered by Huppie.
To summarize the problem, I created the following code snippet, which clearly demonstrates that the problem lies in the strcoll() function when the Windows UTF-8 code page 65001 is used.
function traceStrColl($a, $b) {
    $outValue=strcoll($a, $b);
    echo "$a $b $outValue\r\n"; // trace every comparison
    return $outValue;
}
// Check the prefix only: stristr(PHP_OS, 'win') would also match "Darwin" (macOS)
$locale=(strtoupper(substr(PHP_OS, 0, 3)) === 'WIN') ? 'German_Germany.65001' : 'de_DE.utf8';
$string="ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÜabcdefghijklmnopqrstuvwxyzäöüß";
// Split the string into single UTF-8 characters
$array=array();
for ($i=0; $i<mb_strlen($string, 'UTF-8'); $i++) {
    $array[]=mb_substr($string, $i, 1, 'UTF-8');
}
$oldLocale=setlocale(LC_COLLATE, "0"); // "0" returns the current locale without changing it
var_dump(setlocale(LC_COLLATE, $locale));
usort($array, 'traceStrColl');
setlocale(LC_COLLATE, $oldLocale);
var_dump($array);
The result is:
string(20) "German_Germany.65001"
a B 2147483647
[...]
array(59) {
[0]=>
string(1) "c"
[1]=>
string(1) "B"
[2]=>
string(1) "s"
[3]=>
string(1) "C"
[4]=>
string(1) "k"
[5]=>
string(1) "D"
[6]=>
string(2) "ä"
[7]=>
string(1) "E"
[8]=>
string(1) "g"
[...]
The same snippet works on a Linux machine without any problems producing the following output:
string(10) "de_DE.utf8"
a B -1
[...]
array(59) {
[0]=>
string(1) "a"
[1]=>
string(1) "A"
[2]=>
string(2) "ä"
[3]=>
string(2) "Ä"
[4]=>
string(1) "b"
[5]=>
string(1) "B"
[6]=>
string(1) "c"
[7]=>
string(1) "C"
[...]
The snippet also works when using Windows-1252 or ISO-8859-1 encoded strings (of course, the mb_* encodings and the locale must then be changed accordingly).
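The recoding workaround can be sketched roughly as follows. This is only a minimal sketch under assumptions, not the code from the bug report: the helper name sortViaCp1252 and the locale names 'German_Germany.1252' and 'de_DE.ISO-8859-1' are assumptions, and the approach only makes sense for strings whose characters all exist in Windows-1252.

```php
<?php
// Hypothetical helper: sort UTF-8 strings by comparing Windows-1252 copies of them.
function sortViaCp1252(array $utf8Strings, array $localeCandidates): array
{
    $oldLocale = setlocale(LC_COLLATE, '0');   // remember the current collation locale
    setlocale(LC_COLLATE, $localeCandidates);  // first available candidate wins; false if none

    usort($utf8Strings, function ($a, $b) {
        // iconv() returns false if a character has no Windows-1252 equivalent
        $a1252 = iconv('UTF-8', 'Windows-1252', $a);
        $b1252 = iconv('UTF-8', 'Windows-1252', $b);
        if ($a1252 === false || $b1252 === false) {
            return strcmp($a, $b);             // fall back to plain byte order
        }
        return strcoll($a1252, $b1252);        // compare the single-byte copies
    });

    setlocale(LC_COLLATE, $oldLocale);         // restore the previous locale
    return $utf8Strings;                       // the returned strings are still UTF-8
}

// Assumed locale names; adjust them to whatever your system provides.
$sorted = sortViaCp1252(
    ['Zebra', 'Äpfel', 'Apfel'],
    ['German_Germany.1252', 'de_DE.ISO-8859-1']
);
print_r($sorted);
```

Only the temporary copies are recoded, so the array you get back still contains the original UTF-8 strings.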
I filed a bug report on bugs.php.net: Bug #46165 strcoll() does not work with UTF-8 strings on Windows. If you experience the same problem, you can give your feedback to the PHP team on the bug-report page (two other, probably related, bugs have been classified as bogus - I don't think that this bug is bogus ;-).
Thanks to all of you.
Utf8 Sort Array
You can call setlocale() with LC_COLLATE as the first parameter and a locale such as en_US.utf8 as the second, and then simply sort using usort() with strcoll as the comparison function. Try it like this:
setlocale(LC_COLLATE, 'en_US.utf8');
$array = array('Australien','Belgien','Botswana','Brasilien','Bulgarien','Burma','China','Costa Rica','Ägypten');
usort($array, 'strcoll');
print_r($array);
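One caveat: setlocale() returns false when the requested locale is not installed, and strcoll() then silently keeps using whatever locale was active before. A minimal sketch that checks for this could look like the following; the helper name sortWithLocale and the alternative spelling 'en_US.UTF-8' are assumptions.

```php
<?php
// Hypothetical helper: sort with strcoll(), but fail loudly if no locale could be set.
function sortWithLocale(array $items, array $localeCandidates): array
{
    $oldLocale = setlocale(LC_COLLATE, '0');   // remember the current locale
    // setlocale() accepts an array and returns the first locale that worked, or false
    $active = setlocale(LC_COLLATE, $localeCandidates);
    if ($active === false) {
        throw new RuntimeException('None of the requested locales is installed');
    }
    usort($items, 'strcoll');
    setlocale(LC_COLLATE, $oldLocale);         // restore the previous locale
    return $items;
}

try {
    print_r(sortWithLocale(['Belgien', 'Ägypten', 'Australien'], ['en_US.utf8', 'en_US.UTF-8']));
} catch (RuntimeException $e) {
    echo $e->getMessage(), "\n";
}
```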
PHP | alphabetical sort of UTF-8 tags ( Class 'Collator' not found )
After asking this question and learning the answer to it, I needed a way to alphabetically sort a UTF-8 (Turkish input) string array from a MySQL database.
Then I used Fy-'s answer and it worked.
Now, my solution is:
setlocale(LC_COLLATE, 'tr_TR.utf8'); // required for UTF-8 sorting
$sorgum = "...";
$bindparametre1 = '...';
if ($beyan = $db_baglanti->prepare($sorgum))
{
    /* bind parameters */
    $beyan->bind_param("s", $bindparametre1);
    /* execute statement */
    $beyan->execute();
    /* bind result variables */
    $beyan->bind_result($etiketler);
    echo "\t".'<div class="sol-icerik-kapsar">'."\r\n";
    echo "\t\t".'<h1>'.'Etiketler'.'</h1>'."\r\n";
    /* fetch values */
    $etiket_bulutu = '';
    while ($beyan->fetch())
    {
        $etiket_bulutu .= $etiketler.', ';
    }
    $etiket_bulutu = substr_replace($etiket_bulutu, '', -2); // drop the last 2 characters, i.e. the trailing ", "
    $etiketler = explode(", ", $etiket_bulutu); // build an array of the individual tags
    $etiketler = array_unique($etiketler);
    $etiketler = array_values($etiketler); // only unique tags remain
    $etadet = count($etiketler);
    usort($etiketler, 'strcoll');
    echo "\n\t\t".'<p>'."\n";
    for ($x = 0; $x < $etadet; $x++)
    {
        echo "\t\t\t".'<a href="'.url_validate(sitenin_koku.'etiketler/'.$etiketler[$x]).'">'.convert_one_row($etiketler[$x]).'</a>, '."\n";
    }
    echo "\t\t".'</p>'."\n";
}
The key lines in my solution are setlocale(LC_COLLATE, 'tr_TR.utf8'); and usort($etiketler, 'strcoll');
regards
Natural sorting algorithm in PHP with support for Unicode?
Nailed it!
$array = array('Ägile', 'Ãgile', 'Test', 'カタカナ', 'かたかな', 'Ágile', 'Àgile', 'Âgile', 'Agile');
function Sortify($string)
{
    return preg_replace('~&([a-z]{1,2})(acute|cedil|circ|grave|lig|orn|ring|slash|tilde|uml);~i', '$1' . chr(255) . '$2', htmlentities($string, ENT_QUOTES, 'UTF-8'));
}
array_multisort(array_map('Sortify', $array), $array);
Output:
Array
(
[0] => Agile
[1] => Ágile
[2] => Âgile
[3] => Àgile
[4] => Ãgile
[5] => Ägile
[6] => Test
[7] => かたかな
[8] => カタカナ
)
Even better:
if (extension_loaded('intl') === true)
{
    collator_asort(collator_create('root'), $array);
}
Thanks to @tchrist!
Strcoll UTF-8 Sorting - PHP
OK, so in short, the problem you have is due to the byte values of the characters. strcoll() compares strings according to the current locale; if that locale is C or POSIX (for example because setlocale() was never called or failed), it behaves like strcmp() and compares the strings byte by byte. I am going to assume you are using UTF-8 or something very similar. Look at the UTF-8 encoding: an e with an accent mark (é) is encoded as the bytes 0xC3 0xA9, which are much higher than Z (0x5A). That is why you are having this problem. To fix it, you will have to add special cases for the accented characters, as the normal sort will not work otherwise. So, basically, create your own compare function which puts é between e and f, and so on.
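To see the byte values in question, here is a small sketch using strcmp() and sort(), which compare raw bytes regardless of the locale. The sample words are made up for illustration, and the file is assumed to be saved as UTF-8.

```php
<?php
// In UTF-8, "é" is the two bytes 0xC3 0xA9; "Z" is the single byte 0x5A.
echo bin2hex('é'), "\n"; // c3a9
echo bin2hex('Z'), "\n"; // 5a

// strcmp() compares raw bytes, so "é" sorts after "Z".
var_dump(strcmp('é', 'Z') > 0); // bool(true)

// A plain byte-wise sort therefore puts the accented word last.
$words = ['Zebra', 'école', 'Apple'];
sort($words, SORT_STRING);
print_r($words); // Apple, Zebra, école
```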
PHP sort array with special characters
Reference taken from this example: Sort an array with special characters in PHP
Explanation:
1. Get the array keys using the array_keys() method.
2. Sort the keys with a comparison based on the iconv() and strcmp() functions.
3. Iterate over the sorted key array and get the corresponding values from the initial array.
4. Save each key/value pair to your resulting array.
Do it like below:
<?php
setlocale(LC_ALL, 'sl_SI.utf8');
$a = [
'č' => [12],
'a' => [23],
'š' => [45],
'u' => [56]
];
$index_array = array_keys($a);
function compareASCII($a, $b) {
    $at = iconv('UTF-8', 'ASCII//TRANSLIT', $a);
    $bt = iconv('UTF-8', 'ASCII//TRANSLIT', $b);
    return strcmp($at, $bt);
}
uasort($index_array, 'compareASCII');
$final_array = [];
foreach($index_array as $index_arr){
    $final_array[$index_arr] = $a[$index_arr];
}
print_r($final_array);
Output:- https://eval.in/990872
Reference: iconv(), strcmp(), uasort()
How to sort a collection of UTF-8 strings containing non-Latin chars in Laravel 5.3?
Here's a solid way to do it:
$blank = array();
$collection = collect([
    ["name"=>"maroon"],
    ["name"=>"zoo"],
    ["name"=>"ábel"],
    ["name"=>"élof"]
])->toArray();
$count = count($collection);
for ($x=0; $x < $count; $x++) {
    $blank[$x] = $collection[$x]['name'];
}
$collator = collator_create('en_US');
var_export($blank);
collator_sort( $collator, $blank );
var_export( $blank );
dd($blank);
Outputs:
array (
0 => 'maroon',
1 => 'zoo',
2 => 'ábel',
3 => 'élof',
)array (
0 => 'ábel',
1 => 'élof',
2 => 'maroon',
3 => 'zoo',
)
Laravel Pretty Output:
array:4 [
0 => "ábel"
1 => "élof"
2 => "maroon"
3 => "zoo"
]
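If you want to keep the whole rows instead of only the extracted names, a possible variation is to sort the rows themselves with a comparison callback. This is a minimal sketch in plain PHP, assuming the intl extension is available; the variable name $rows is made up, and in Laravel the same callback could presumably be passed to the collection's sort() method.

```php
<?php
// Sample rows taken from the collection above.
$rows = [
    ['name' => 'maroon'],
    ['name' => 'zoo'],
    ['name' => 'ábel'],
    ['name' => 'élof'],
];

if (class_exists('Collator')) {            // the Collator class requires the intl extension
    $collator = new Collator('en_US');
    // Compare two rows by their 'name' field using the collator's rules.
    usort($rows, function ($a, $b) use ($collator) {
        return $collator->compare($a['name'], $b['name']);
    });
}

print_r(array_column($rows, 'name'));
```

Without the intl extension the rows are simply left in their original order.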
For further reading and reference:
http://php.net/manual/en/class.collator.php
Hope this answer helps, sorry for late response =)