ucfirst() Function for Multibyte Character Encodings

ucfirst() function for multibyte character encodings

Before PHP 8.4 there is no built-in mb_ucfirst() function, as you've already noticed (PHP 8.4 added one). On older versions you can fake mb_ucfirst() with two mb_substr() calls:

function mb_ucfirst($string, $encoding)
{
    $firstChar = mb_substr($string, 0, 1, $encoding);
    $then = mb_substr($string, 1, null, $encoding);
    return mb_strtoupper($firstChar, $encoding) . $then;
}
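
Declaring your own function named mb_ucfirst() fails with a "Cannot redeclare" fatal error once the built-in exists (PHP 8.4 and later). A minimal sketch of one way around that, assuming you are free to pick your own helper name: ucfirst_multibyte() below is a hypothetical name, not part of PHP or of the answer above. It calls the native function when available and falls back to the two-mb_substr approach otherwise.

```php
<?php
// Hypothetical wrapper name; not part of PHP or the original answer.
function ucfirst_multibyte(string $string, string $encoding = 'UTF-8'): string
{
    // Prefer the native implementation where it exists (PHP 8.4+).
    if (function_exists('mb_ucfirst')) {
        return mb_ucfirst($string, $encoding);
    }

    // Fallback: upper-case the first character, keep the rest untouched.
    $firstChar = mb_substr($string, 0, 1, $encoding);
    $rest = mb_substr($string, 1, null, $encoding);
    return mb_strtoupper($firstChar, $encoding) . $rest;
}

echo ucfirst_multibyte('élan'), "\n"; // Élan
```

Because the wrapper has its own name, the same file loads on both old and new PHP versions.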

UTF-8 ucfirst doesn't work

This works (I know it does; I'm using it in my own projects). On PHP 8.4 and later mb_ucfirst() is built in, so drop or rename this definition there:

function mb_ucfirst($string, $encoding='UTF-8') {
    $firstChar = mb_substr($string, 0, 1, $encoding);
    $then = mb_substr($string, 1, mb_strlen($string, $encoding)-1, $encoding);
    return mb_strtoupper($firstChar, $encoding) . $then;
} // end function mb_ucfirst

Use it as mb_ucfirst($string);

Complete example:

<?php
$string = mb_ucfirst("ååååeee");
echo $string;

function mb_ucfirst($string, $encoding='UTF-8') {
    $firstChar = mb_substr($string, 0, 1, $encoding);
    $then = mb_substr($string, 1, mb_strlen($string, $encoding)-1, $encoding);
    return mb_strtoupper($firstChar, $encoding) . $then;
} // end function mb_ucfirst
?>

ucfirst() not working properly with Scandinavian characters

Your problem here is not ucfirst(), it's strtolower(). You have to use mb_strtolower() to get your string in lower case, e.g.

echo ucfirst(mb_strtolower($str));
//           ^^^^^^^^^^^^^ See here

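ucfirst() itself only upper-cases a single-byte first character, so if the string can start with a multibyte letter you also need a multibyte version of ucfirst(). There is one in the comments of the PHP manual: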

Simple multi-bytes ucfirst():

<?php

function my_mb_ucfirst($str) {
    $fc = mb_strtoupper(mb_substr($str, 0, 1));
    return $fc.mb_substr($str, 1);
}

Code by plemieux, from the comments in the PHP manual.
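
To make the difference concrete, here is a small sketch comparing plain ucfirst() with a multibyte variant on a string that starts with a non-ASCII letter. The function name ucfirst_mb_demo() and the sample text are made up for illustration; the body is the same idea as my_mb_ucfirst() above. The "unchanged" result for ucfirst() assumes PHP 8 or later, where ucfirst() only touches ASCII letters.

```php
<?php
mb_internal_encoding('UTF-8');

// Same idea as my_mb_ucfirst() above, redefined under a hypothetical name
// so that this example is self-contained.
function ucfirst_mb_demo(string $str): string
{
    return mb_strtoupper(mb_substr($str, 0, 1)) . mb_substr($str, 1);
}

$lower = mb_strtolower('ÆBLE OG ØL'); // 'æble og øl'

echo ucfirst($lower), "\n";          // æble og øl (unchanged on PHP 8+)
echo ucfirst_mb_demo($lower), "\n";  // Æble og øl
```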

PHP function UTF-8 characters issue

The most straightforward way to make your code UTF-8 aware is to use mbstring functions instead of the plain byte-oriented ones in the three places where the latter appear:

function sentenceCase($str)
{
    $cap = true;
    $ret = '';
    for ($x = 0; $x < mb_strlen($str); $x++) { // mb_strlen instead
        $letter = mb_substr($str, $x, 1); // mb_substr instead
        if ($letter == "." || $letter == "!" || $letter == "?") {
            $cap = true;
        } elseif ($letter != " " && $cap == true) {
            $letter = mb_strtoupper($letter); // mb_strtoupper instead
            $cap = false;
        }
        $ret .= $letter;
    }
    return $ret;
}

You can then configure mbstring to work with UTF-8 strings and you are ready to go:

mb_internal_encoding('UTF-8');
echo sentenceCase("üias skdfnsknka");

Bonus solution

Specifically for UTF-8 you can also use a regular expression, which will result in less code:

$str = "üias skdfnsknka";
echo preg_replace_callback(
    '/((?:^|[!.?])\s*)(\p{Ll})/u',
    function($match) { return $match[1].mb_strtoupper($match[2], 'UTF-8'); },
    $str);
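
In that pattern the first group matches either the start of the string or one of the terminators . ! ? followed by optional whitespace, and the second group matches a single lowercase letter (\p{Ll}), which the callback upper-cases. A sketch with more than one sentence shows the effect; the wrapper name sentence_case_regex() and the sample text are made up for illustration.

```php
<?php
// Hypothetical wrapper name, for illustration only.
function sentence_case_regex(string $str): string
{
    $result = preg_replace_callback(
        '/((?:^|[!.?])\s*)(\p{Ll})/u',
        function (array $match): string {
            // $match[1]: start of string or terminator plus whitespace
            // $match[2]: the lowercase letter that starts the sentence
            return $match[1] . mb_strtoupper($match[2], 'UTF-8');
        },
        $str
    );

    // preg_replace_callback() returns null on failure (for example on
    // invalid UTF-8 input); fall back to the unmodified string then.
    return $result ?? $str;
}

echo sentence_case_regex('üias skdfnsknka. ønsker mere! ja? nej'), "\n";
// Üias skdfnsknka. Ønsker mere! Ja? Nej
```

Unlike the loop version, this only capitalizes when a lowercase letter directly follows the terminator and whitespace, so a sentence that opens with a quote or a digit is left alone.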

strtolower() for unicode/multibyte strings

Have you tried using mb_strtolower()?
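
A minimal sketch of what that looks like, with the encoding passed explicitly so the result does not depend on mb_internal_encoding(). The sample string is made up for illustration.

```php
<?php
$str = 'ÆBLEGRØD PÅ ÅEN';

// mb_strtolower() works on characters rather than bytes, so the
// non-ASCII letters are lower-cased as well.
echo mb_strtolower($str, 'UTF-8'), "\n"; // æblegrød på åen
```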

Encoding of Danish letters

Use mb_strtoupper() and specify the character encoding explicitly in both calls:

echo mb_strtoupper(mb_substr('ølstykke', 0, 1, 'UTF-8'), 'UTF-8'); // Ø

In your case you probably want the whole word back, not just its first character, so the mb_convert_case() function may help. Note that MB_CASE_TITLE capitalizes the first letter of every word and lower-cases the rest.

echo mb_convert_case('ølstykke', MB_CASE_TITLE, 'UTF-8'); // Ølstykke
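
Title case and "upper-case the first character" are not the same thing, which matters when the input has several words or mixed case. A short sketch with made-up sample strings:

```php
<?php
// MB_CASE_TITLE works per word and lower-cases everything after
// the first letter of each word.
echo mb_convert_case('ølstykke by', MB_CASE_TITLE, 'UTF-8'), "\n"; // Ølstykke By
echo mb_convert_case('øLSTYKKE', MB_CASE_TITLE, 'UTF-8'), "\n";    // Ølstykke

// Upper-casing only the first character leaves the rest untouched.
$s = 'øLSTYKKE by';
echo mb_strtoupper(mb_substr($s, 0, 1, 'UTF-8'), 'UTF-8')
    . mb_substr($s, 1, null, 'UTF-8'), "\n";                       // ØLSTYKKE by
```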

How to make first letter of a word capital?

You may want to use ucfirst().

For multibyte strings, please see this snippet.


