strtolower() for unicode/multibyte strings
Have you tried using mb_strtolower()?
Accents and UTF-8 in PHP using strtolower
Regarding the comment and edit (which was not shown in the original post) about the use of strtolower(), the manual states:
Note that 'alphabetic' is determined by the current locale. This means that e.g. in the default "C" locale, characters such as umlaut-A (Ä) will not be converted.
The manual entry for mb_strtolower(), on the other hand, says:
By contrast to strtolower(), 'alphabetic' is determined by the Unicode character properties. Thus the behaviour of this function is not affected by locale settings and it can convert any characters that have 'alphabetic' property, such as A-umlaut (Ä).
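The difference described in the two manual notes can be seen in a minimal sketch like the one below. It assumes the source file is saved as UTF-8 and the mbstring extension is loaded; the sample string and variable names are made up for illustration. What strtolower() does with the non-ASCII bytes depends on the PHP version and locale, so only the mb_strtolower() result is annotated.

```php
<?php
// Hypothetical sample input containing two non-ASCII uppercase letters.
$input = 'ÄPFEL UND ÖL';

// Byte-oriented: in the "C" locale the bytes that make up Ä and Ö are left as they are.
$byteLower = strtolower($input);

// Unicode-aware: case mapping follows Unicode character properties, not the locale.
$unicodeLower = mb_strtolower($input, 'UTF-8');

var_dump($byteLower);
var_dump($unicodeLower); // "äpfel und öl"
```

Passing the encoding explicitly avoids depending on whatever mb_internal_encoding() happens to be set to.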
ucfirst() function for multibyte character encodings
Before PHP 8.4 there is no built-in mb_ucfirst function, as you've already noticed. You can fake a mb_ucfirst with two mb_substr calls:
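// Skip the definition when PHP already ships mb_ucfirst() (PHP 8.4 and later).
if (!function_exists('mb_ucfirst')) {
    function mb_ucfirst($string, $encoding)
    {
        // First character, measured in characters rather than bytes.
        $firstChar = mb_substr($string, 0, 1, $encoding);
        // Everything after the first character.
        $then = mb_substr($string, 1, null, $encoding);
        return mb_strtoupper($firstChar, $encoding) . $then;
    }
}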
Does InnoDB store multibyte strings in expanded form in indexes?
Every character in a UTF-8 string is stored in a variable-length encoding. Each character uses 1, 2, 3, or 4 bytes depending on its code point (MySQL's legacy utf8 charset, an alias for utf8mb3, stops at 3 bytes; utf8mb4 covers the full 4-byte range). A string can mix characters of different lengths, because the leading bits of each character's first byte indicate how many bytes that character occupies.
The characters that are in the ASCII subset will only use 1 byte.
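The variable byte widths are a property of UTF-8 itself, so they can be checked outside the database. The following is a small PHP sketch, not a statement about InnoDB internals; the sample characters and variable names are chosen only for illustration.

```php
<?php
// One sample character per UTF-8 width, written as code point escapes.
$samples = [
    "a",          // U+0061, ASCII
    "\u{00E9}",   // U+00E9, e with acute accent
    "\u{20AC}",   // U+20AC, euro sign
    "\u{1F600}",  // U+1F600, an emoji outside the Basic Multilingual Plane
];

// strlen() counts bytes, mb_strlen() counts characters.
$byteLengths = array_map('strlen', $samples);
$charLengths = array_map(fn($s) => mb_strlen($s, 'UTF-8'), $samples);

print_r($byteLengths); // 1, 2, 3, 4
print_r($charLengths); // 1, 1, 1, 1
```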
Combine preg_replace and strtolower
It's better to split this into a two-liner and debug the output of each command with var_dump() in order to see what's going on:
<?php
/* string with special chars */
$string = 'abczABCZ-#+´!"§123';
$no_special_chars = preg_replace("/[^a-zA-Z]/", "", $string);
var_dump($no_special_chars); // string 'abczABCZ' (length=8)
$lowercased = strtolower($no_special_chars);
var_dump($lowercased); // string 'abczabcz' (length=8)
And maybe you noticed that you don't have to handle A-Z in the preg_replace() if you lowercase the string first.
$res = preg_replace("/[^a-z]/", "", strtolower($string));
var_dump($res); // string 'abczabcz' (length=8)
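The pattern above keeps only ASCII letters. If accented letters should survive as well, one possible variant (an assumption on my part, not part of the original answer) is to lowercase with mb_strtolower() and match on the Unicode letter property with the /u modifier. The sample string is hypothetical and the file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical input mixing accented letters, digits and punctuation.
$string = 'Ärger-123 Über!';

// Lowercase first, using Unicode-aware case mapping.
$lowercased = mb_strtolower($string, 'UTF-8');

// \p{L} matches any Unicode letter; the /u modifier makes PCRE treat the subject as UTF-8.
$lettersOnly = preg_replace('/[^\p{L}]/u', '', $lowercased);

var_dump($lettersOnly); // "ärgerüber"
```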
Detecting and removing multibyte strings in R
This is probably an encoding issue, so try changing the encoding when you load the file. Try something like this:
df <- read.csv(file_path,
               fileEncoding = "ISO-8859-1",  # re-encodes on read; try other encodings here
               header = TRUE,
               stringsAsFactors = FALSE)
ucfirst() not working properly with Scandinavian characters
Your problem here is not ucfirst(), it's strtolower(). You have to use mb_strtolower() to get your string in lower case, e.g.
echo ucfirst(mb_strtolower($str));
//^^^^^^^^^^^^^^ See here
You can also find a multibyte version of ucfirst() in the comments of the manual:
Simple multi-bytes ucfirst():
<?php
function my_mb_ucfirst($str) {
$fc = mb_strtoupper(mb_substr($str, 0, 1));
return $fc.mb_substr($str, 1);
}
Code by plemieux, from the manual comments.
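Putting the two pieces together, lowercasing and then capitalizing the first character can both be done with mbstring functions, which matters when the first character itself is multibyte. This is a rough sketch under the assumption that the input is UTF-8; the function name example_mb_ucfirst_lower is hypothetical and not part of PHP or of the manual comment.

```php
<?php
// Hypothetical helper: lowercase the whole string, then uppercase only its first character.
function example_mb_ucfirst_lower(string $str, string $encoding = 'UTF-8'): string
{
    $lower = mb_strtolower($str, $encoding);
    // Take one character (not one byte) from the front and uppercase it.
    $first = mb_strtoupper(mb_substr($lower, 0, 1, $encoding), $encoding);
    // Append the remainder unchanged.
    return $first . mb_substr($lower, 1, null, $encoding);
}

$result = example_mb_ucfirst_lower('ÅLESUND');
var_dump($result); // "Ålesund"
```

An empty string passes through unchanged, since mb_substr() simply returns an empty string for both parts.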