Using str_split on a UTF-8 encoded string
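str_split() is not multibyte-aware. It splits the string into single bytes, so every multi-byte character ends up spread across several array elements, none of which is a valid character on its own. To split into characters, use mb_str_split() (PHP 7.4+). Note that mb_split() is a different function: it splits a multibyte string by a regular expression pattern.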
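A minimal sketch of the replacement, assuming PHP 7.4+ with the mbstring extension loaded and a source file saved as UTF-8. Like str_split(), mb_str_split() accepts an optional chunk length, but it counts characters rather than bytes:

```php
<?php
// Assumes PHP 7.4+ with mbstring; the source file itself is saved as UTF-8.
$str = 'mała';

// One character per element: m, a, ł, a
print_r(mb_str_split($str, 1, 'UTF-8'));

// Two characters per element: ma, ła
print_r(mb_str_split($str, 2, 'UTF-8'));
```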
PHP str_split and UTF-8 Polish characters
str_split works on the byte level, not on the character level (despite its name). So you are actually splitting mała into its bytes, not its characters. That's why you get an array of five items instead of four: indices 2 and 3 together form the UTF-8 encoding of ł.
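To see the bytes directly, the short sketch below (assuming the source file is saved as UTF-8) prints each element that str_split() returns as hex. The two-byte sequence c5 82 is the UTF-8 encoding of ł (U+0142):

```php
<?php
// Print the raw bytes that str_split() produces for 'mała'.
foreach (str_split('mała') as $index => $byte) {
    printf("[%d] => 0x%s\n", $index, bin2hex($byte));
}
```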
You need to use either the mbstring or the iconv extension to split your string manually.
$str = 'mała';
// Length in characters, not bytes
$len = mb_strlen($str, 'UTF-8');
$result = [];
for ($i = 0; $i < $len; $i++) {
    // Take exactly one character at position $i
    $result[] = mb_substr($str, $i, 1, 'UTF-8');
}
var_dump($result);
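The same loop can be written with the iconv extension instead of mbstring. This is a sketch assuming iconv is enabled (it usually is by default):

```php
<?php
// Character-by-character split using iconv instead of mbstring.
$str = 'mała';
$len = iconv_strlen($str, 'UTF-8');          // length in characters
$result = [];
for ($i = 0; $i < $len; $i++) {
    $result[] = iconv_substr($str, $i, 1, 'UTF-8');
}
var_dump($result);
```

Both variants rescan the string from the start on every substring call, so for long strings a single-pass function such as mb_str_split() (PHP 7.4+) is likely to be faster.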
Split a UTF-8 encoded string on blank characters without knowing about UTF-8 encoding
Yes, you can.
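Multibyte sequences always consist of one lead byte (the two MSBs equal to 11) followed by one or more continuation bytes (the two MSBs equal to 10). The total length of a sequence (lead byte plus continuation bytes) equals the number of 1 bits at the top of the lead byte, before the first 0 bit appears (e.g. if the lead byte is 110xxxxx, exactly one continuation byte should follow; if it is 11110xxx, there should be exactly three continuation bytes).
Every byte of a multibyte sequence therefore has its MSB set, whereas ASCII characters, including space, tab, CR and LF, are single bytes of the form 0xxxxxxx. An ASCII blank byte can never occur inside a multibyte character, so splitting on ASCII blanks byte by byte never cuts a character in half.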
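The lead-byte rule can be sketched as a small function that counts the 1 bits before the first 0 bit. The name utf8_sequence_length() is hypothetical, and the sketch does not reject bytes that are never valid in UTF-8 (0xF8 to 0xFF):

```php
<?php
// Hypothetical helper: expected UTF-8 sequence length from its first byte.
// Returns 0 for a continuation byte (10xxxxxx), which cannot start a sequence.
// Does not validate: bytes 0xF8-0xFF would yield 5 or more.
function utf8_sequence_length(int $byte): int
{
    if (($byte & 0x80) === 0) {
        return 1;                 // 0xxxxxxx: ASCII, single byte
    }
    if (($byte & 0xC0) === 0x80) {
        return 0;                 // 10xxxxxx: continuation byte
    }
    $count = 0;
    for ($mask = 0x80; $mask > 0 && ($byte & $mask); $mask >>= 1) {
        $count++;                 // count leading 1 bits
    }
    return $count;
}

foreach (['a', 'ł', '€', '😀'] as $char) {
    // ord() returns the value of the first byte only
    printf("%s: %d byte(s)\n", $char, utf8_sequence_length(ord($char)));
}
```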
So if you find truncated multibyte sequences, or stray continuation bytes without a lead byte, your string is probably invalid anyway, and your split procedure is unlikely to damage it any further than it already is.
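A minimal sketch of such a byte-level split, with a hypothetical function name. It never decodes UTF-8: it only compares single bytes against the four ASCII blanks and copies every other byte through untouched:

```php
<?php
// Hypothetical helper: split on ASCII blanks by scanning raw bytes only.
// Safe for UTF-8 because bytes of multibyte characters all have the MSB set
// and so can never equal space (0x20), tab (0x09), LF (0x0A) or CR (0x0D).
function split_on_ascii_blanks(string $bytes): array
{
    $parts = [];
    $current = '';
    $len = strlen($bytes);             // length in bytes, not characters
    for ($i = 0; $i < $len; $i++) {
        $b = $bytes[$i];
        if ($b === ' ' || $b === "\t" || $b === "\n" || $b === "\r") {
            if ($current !== '') {
                $parts[] = $current;   // close the current token
                $current = '';
            }
        } else {
            $current .= $b;            // copy the byte through unchanged
        }
    }
    if ($current !== '') {
        $parts[] = $current;
    }
    return $parts;
}

print_r(split_on_ascii_blanks("mała  żółw\tłódź"));
```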
One thing to keep in mind, though: Unicode also defines other “blank” characters outside the ASCII-compatible range (for example U+00A0 NO-BREAK SPACE). You may want to treat them as separators as well.
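One way to handle those, sketched with PCRE in UTF-8 mode: the character class below treats ASCII whitespace (\s) and every Unicode separator (\p{Z}, which includes U+00A0 and U+3000 IDEOGRAPHIC SPACE) as blanks. Unlike the byte-level approach, this does decode UTF-8, and preg_split() returns false if the input is not valid UTF-8:

```php
<?php
// Split on ASCII whitespace plus all Unicode separator characters (category Z).
// The /u modifier makes PCRE treat pattern and subject as UTF-8.
$text = "mała\u{00A0}żółw\u{3000}łódź kot";
$words = preg_split('/[\s\p{Z}]+/u', $text, -1, PREG_SPLIT_NO_EMPTY);
print_r($words);
```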
I have some trouble with str_split, it doesn't work correctly with my language
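You need to use the multibyte string functions to manipulate Persian strings.
You can use preg_split with the u (UTF-8) modifier for your purpose (-1 means "no limit"; passing null there is deprecated as of PHP 8.1):
print_r(preg_split('//u', "رستوران ها", -1, PREG_SPLIT_NO_EMPTY));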
Output:
Array
(
[0] => ر
[1] => س
[2] => ت
[3] => و
[4] => ر
[5] => ا
[6] => ن
[7] =>
[8] => ه
[9] => ا
)
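Since PHP 7.4, mbstring also offers mb_str_split(), which should give the same result without a regular expression. A quick check, assuming the mbstring extension is loaded:

```php
<?php
$text = "رستوران ها";
$viaRegex = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
$viaMb = mb_str_split($text, 1, 'UTF-8');    // PHP 7.4+, mbstring

var_dump($viaRegex === $viaMb);              // bool(true)
echo count($viaMb), "\n";                    // 10 characters, including the space
```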
my function doesn't return some letters even though included in utf-8 (vscode, php)
str_split operates on bytes, and characters such as æ take up more than one byte in UTF-8.
So if you str_split these characters, each one gets split in two, and neither half is a valid character on its own. Just run count() on $letterarr to see that there are 9 items in the array instead of the expected 7.
The solution is to use PHP's UTF-8-aware string functions. Simply changing str_split to mb_str_split (available since PHP 7.4) will fix your code sample.
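The difference is easy to reproduce with a hypothetical sample word (not the asker's data): 'blåbær' has 6 characters, but å and æ take 2 bytes each in UTF-8:

```php
<?php
// Hypothetical sample: 6 characters, 8 bytes in UTF-8.
$word = 'blåbær';

echo count(str_split($word)), "\n";                // 8: one element per byte
echo count(mb_str_split($word, 1, 'UTF-8')), "\n"; // 6: one element per character
print_r(mb_str_split($word, 1, 'UTF-8'));
```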
Split utf8 string into array of chars
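I found out the é was not the character I expected. Apparently there is a difference between a precomposed é (U+00E9) and an e followed by a combining acute accent (U+0301), even though both render as né. I got it working by normalizing the string first.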
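A sketch of the normalization step, assuming the intl extension (which provides the Normalizer class) and PHP 7.4+ for mb_str_split(). NFC (Normalizer::FORM_C) is one reasonable choice here; the answer does not say which form was actually used:

```php
<?php
// Needs the intl extension for the Normalizer class.
if (!class_exists('Normalizer')) {
    exit("This sketch needs the intl extension.\n");
}

$decomposed = "ne\u{0301}"; // 'n', 'e', U+0301 COMBINING ACUTE ACCENT
$composed   = "n\u{00E9}";  // 'n', U+00E9 LATIN SMALL LETTER E WITH ACUTE

var_dump($decomposed === $composed);                    // bool(false)
var_dump(count(mb_str_split($decomposed, 1, 'UTF-8'))); // int(3)

// NFC composes 'e' + U+0301 into the single code point U+00E9
$normalized = Normalizer::normalize($decomposed, Normalizer::FORM_C);
var_dump($normalized === $composed);                    // bool(true)
var_dump(count(mb_str_split($normalized, 1, 'UTF-8'))); // int(2)
```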
PHP str_split on string with decoded html_entity
This should do well:
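// Guarded so it does not clash with the built-in mb_str_split() of PHP 7.4+
if (!function_exists('mb_str_split')) {
    function mb_str_split($string) {
        // Split at every position except the very start and end (u = UTF-8 mode)
        return preg_split('/(?<!^)(?!$)/u', $string);
    }
}
$string = 'My string &lsquo;to parse&rsquo;';
// The decoded curly quotes are 3-byte UTF-8 characters
$string_decoded = html_entity_decode($string, ENT_QUOTES, 'UTF-8');
$string_array = mb_str_split($string_decoded);
var_dump($string_array);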
As mentioned in the comments, you need to split the string with a multibyte-aware function (mb_str_split() on PHP 7.4+) or with a regular expression using the u modifier.
Proof: https://3v4l.org/3FRmG