What is the best way to split a string into an array of Unicode characters in PHP?
You could use the 'u' modifier with PCRE regex; see Pattern Modifiers (quoting):

u (PCRE8)
This modifier turns on additional functionality of PCRE that is incompatible with Perl. Pattern strings are treated as UTF-8. This modifier is available from PHP 4.1.0 or greater on Unix and from PHP 4.2.3 on win32. UTF-8 validity of the pattern is checked since PHP 4.3.5.

For instance, considering this code:
header('Content-type: text/html; charset=UTF-8'); // So the browser doesn't make our lives harder
$str = "abc 文字化け, efg";
$results = array();
preg_match_all('/./', $str, $results);
var_dump($results[0]);
You'll get an unusable result, because every byte of each multibyte character is matched separately:
array
0 => string 'a' (length=1)
1 => string 'b' (length=1)
2 => string 'c' (length=1)
3 => string ' ' (length=1)
4 => string '�' (length=1)
5 => string '�' (length=1)
6 => string '�' (length=1)
7 => string '�' (length=1)
8 => string '�' (length=1)
9 => string '�' (length=1)
10 => string '�' (length=1)
11 => string '�' (length=1)
12 => string '�' (length=1)
13 => string '�' (length=1)
14 => string '�' (length=1)
15 => string '�' (length=1)
16 => string ',' (length=1)
17 => string ' ' (length=1)
18 => string 'e' (length=1)
19 => string 'f' (length=1)
20 => string 'g' (length=1)
But, with this code:
header('Content-type: text/html; charset=UTF-8'); // So the browser doesn't make our lives harder
$str = "abc 文字化け, efg";
$results = array();
preg_match_all('/./u', $str, $results);
var_dump($results[0]);
(Notice the 'u' at the end of the regex.) You get what you want:
array
0 => string 'a' (length=1)
1 => string 'b' (length=1)
2 => string 'c' (length=1)
3 => string ' ' (length=1)
4 => string '文' (length=3)
5 => string '字' (length=3)
6 => string '化' (length=3)
7 => string 'け' (length=3)
8 => string ',' (length=1)
9 => string ' ' (length=1)
10 => string 'e' (length=1)
11 => string 'f' (length=1)
12 => string 'g' (length=1)
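Two caveats of the /u approach are shown in this small sketch (the sample strings are made up for illustration): without the s modifier the dot does not match a newline, so line breaks are silently dropped, and with /u a subject that is not valid UTF-8 makes preg_match_all() return false instead of matching anything.

```php
<?php
$str = "ab\ncd";

// Without "s", "." does not match "\n", so the newline is dropped
preg_match_all('/./u', $str, $withoutS);

// With "s" (dot-all), every character, including "\n", is matched
preg_match_all('/./su', $str, $withS);

echo count($withoutS[0]), "\n"; // 4
echo count($withS[0]), "\n";    // 5

// With "u", an invalid UTF-8 subject ("\xFF" is never valid UTF-8) makes the call fail
$ok = preg_match_all('/./su', "abc\xFF", $bad);
var_dump($ok);                                        // bool(false)
var_dump(preg_last_error() === PREG_BAD_UTF8_ERROR);  // bool(true)
```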
Hope this helps :-)

Split string into array based on a unicode character range in PHP
You also have to check, with a lookahead, whether the next character is a Cyrillic one. This code will do the job:
$t = preg_split('/(?<=[^а-я])(?=[а-я]+)/ius', $text, -1, PREG_SPLIT_NO_EMPTY);
It gives this output:
Array
(
[0] => «
[1] => Добрый
[2] => день!» -
[3] => сказал
[4] => он,
[5] => потянувшись…
)
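One thing to watch with the range [а-я]: it covers U+0430 to U+044F, which leaves out ё (U+0451) and, even with the i modifier, Ё, so a word containing ё can be split right after that letter. A sketch that swaps in the PCRE script property \p{Cyrillic} instead (a different technique from the answer's range; the helper name and the sample text are hypothetical):

```php
<?php
// Hypothetical helper: split $text wherever a Cyrillic letter follows a
// non-Cyrillic character. \p{Cyrillic} also matches ё/Ё, unlike [а-я].
function split_before_cyrillic(string $text): array
{
    $parts = preg_split('/(?<=\P{Cyrillic})(?=\p{Cyrillic})/u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($parts === false) {
        // preg_split() returns false when the subject is not valid UTF-8
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $parts;
}

print_r(split_before_cyrillic('ёлка и Ёж')); // ["ёлка ", "и ", "Ёж"]
```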
Here you can try it.

Convert a String into an Array of Characters - multi-byte
Just pass an empty pattern with the u modifier and the PREG_SPLIT_NO_EMPTY flag.

Otherwise, you can write a pattern with \X (the Unicode "dot") and \K (restart the full-string match). I'll include an mb_split() call and a preg_match_all() call for completeness.
Code: (Demo)
$string='先秦兩漢';
var_export(preg_split('~~u', $string, 0, PREG_SPLIT_NO_EMPTY));
echo "\n---\n";
var_export(preg_split('~\X\K~u', $string, 0, PREG_SPLIT_NO_EMPTY));
echo "\n---\n";
var_export(preg_split('~\X\K(?!$)~u', $string));
echo "\n---\n";
var_export(mb_split('\X\K(?!$)', $string));
echo "\n---\n";
var_export(preg_match_all('~\X~u', $string, $out) ? $out[0] : []);
All produce:
array (
0 => '先',
1 => '秦',
2 => '兩',
3 => '漢',
)
From https://www.regular-expressions.info/unicode.html: How to Match a Single Unicode Grapheme
Matching a single grapheme, whether it's encoded as a single code point, or as multiple code points using combining marks, is easy in Perl, PCRE, PHP, Boost, Ruby 2.0, Java 9, and the Just Great Software applications: simply use \X.
You can consider \X the Unicode version of the dot. There is one difference, though: \X always matches line break characters, whereas the dot does not match line break characters unless you enable the dot matches newline matching mode.
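The difference between splitting by code point and matching graphemes with \X only shows up with combining sequences. A small sketch, using a decomposed "é" (a plain "e" followed by U+0301 COMBINING ACUTE ACCENT) as an illustrative input; the \u{...} string escape needs PHP 7.0 or later:

```php
<?php
// One visible character (grapheme) made of two code points
$decomposed = "e\u{0301}";

// The empty pattern with /u splits between code points
$codePoints = preg_split('//u', $decomposed, -1, PREG_SPLIT_NO_EMPTY);

// \X matches a whole grapheme, so the accent stays attached to its base letter
preg_match_all('/\X/u', $decomposed, $m);
$graphemes = $m[0];

echo count($codePoints), "\n"; // 2
echo count($graphemes), "\n";  // 1
```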
UPDATE: Dharman has brought to my attention that mb_str_split() is available as of PHP 7.4. The default length parameter of the new function is 1, so the length parameter can be omitted for this case.
https://wiki.php.net/rfc/mb_str_split
Dharman's demo: https://3v4l.org/M85Fi/rfc#output
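If the code also has to run on PHP versions older than 7.4, or without the mbstring extension, one option is to fall back to the preg_split('//u', ...) approach. A minimal sketch; the function name utf8_chars() is hypothetical, and the fallback assumes the input is valid UTF-8 (it throws otherwise):

```php
<?php
// Hypothetical helper: split a UTF-8 string into code points
function utf8_chars(string $s): array
{
    if (function_exists('mb_str_split')) {
        // PHP 7.4+: length defaults to 1; the encoding is passed explicitly here
        return mb_str_split($s, 1, 'UTF-8');
    }
    // Fallback for older versions: empty pattern with the u modifier
    $chars = preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $chars;
}

print_r(utf8_chars('先秦兩漢'));
```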
Convert a String into an Array of Characters
You will want to use str_split(). It splits by bytes, so this works as expected only for single-byte strings such as plain ASCII.
$result = str_split('abcdef');
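str_split() also takes an optional second argument, the chunk length (default 1). A short sketch of both uses and of the byte-based caveat (the sample strings are made up):

```php
<?php
$chars = str_split('abcdef');     // ['a', 'b', 'c', 'd', 'e', 'f']
$pairs = str_split('abcdef', 2);  // ['ab', 'cd', 'ef']
$odd   = str_split('abcde', 2);   // ['ab', 'cd', 'e']: the last chunk may be shorter

// Because it counts bytes, the two-byte UTF-8 character "ç" is cut in half
$bytes = str_split('ça');         // 3 elements, not 2

print_r($pairs);
```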
http://us2.php.net/manual/en/function.str-split.php

PHP - Split String Into Arrays for every N characters
You could do something like this:
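$text = 'VERY LONG STRING';
$result = [];
$partial = [];
$len = 0;
foreach (explode(' ', $text) as $chunk) {
    $chunkLen = strlen($chunk);
    // Start a new group once this word would push the group past the limit,
    // but never push an empty group (e.g. when one word alone exceeds the limit)
    if ($partial && $len + $chunkLen > 5000) {
        $result[] = $partial;
        $partial = [];
        $len = 0;
    }
    $len += $chunkLen;
    $partial[] = $chunk;
}
if ($partial) {
    $result[] = $partial;
}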
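A shorter alternative uses the built-in wordwrap() (a swapped-in technique, not the loop above): it inserts a break at word boundaries so no line exceeds the given width, and exploding on that break gives chunks as strings rather than arrays of words. A sketch with a small limit so the effect is visible; note that wordwrap() counts bytes, and any newlines already present in the text also become chunk boundaries:

```php
<?php
$text = 'VERY LONG STRING';
$maxLen = 10; // small demo limit; the loop above uses 5000

// true as the fourth argument also cuts words that are longer than $maxLen
$chunks = explode("\n", wordwrap($text, $maxLen, "\n", true));

print_r($chunks); // ['VERY LONG', 'STRING']
```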
You can test it more easily if you do it with a lower max length.

PHP : Split a comma delimited & special symbol string into an array
Try this:
$str = "your string";
$arr = explode('@', $str);
$newArray = array();
foreach ($arr as $val) {
    $temp = explode(',', $val);
    $newTemp = array(); // start a fresh record on every iteration
    $newTemp['treatment'] = $temp[0];
    $newTemp['quantity'] = $temp[1];
    $newTemp['cost'] = $temp[2];
    $newTemp['discount'] = $temp[3];
    $newTemp['discount_type'] = "INR";
    $newTemp['total'] = $temp[4];
    $newTemp['note'] = $temp[5];
    $newArray[] = $newTemp;
}
var_dump($newArray);
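If some records can have fewer fields than expected, the numeric indexes above would raise "Undefined offset/array key" warnings. A sketch of a more defensive variant using array_pad() and array_combine(); the sample string is hypothetical, and the field order is the one the answer assumes:

```php
<?php
// Hypothetical input: records separated by "@", fields by ","
$str = 'Cleaning,1,500,50,450,first visit@Filling,2,800,0,1600,';

// Field names in the order the answer assumes
$keys = ['treatment', 'quantity', 'cost', 'discount', 'total', 'note'];

$newArray = [];
foreach (explode('@', $str) as $record) {
    // Pad short records (and trim long ones) so array_combine() gets equal lengths
    $fields = array_pad(explode(',', $record), count($keys), '');
    $row = array_combine($keys, array_slice($fields, 0, count($keys)));
    $row['discount_type'] = 'INR';
    $newArray[] = $row;
}

var_dump($newArray);
```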
How to split string with special character (�) using PHP
Take a look at mb_split:

array mb_split ( string $pattern , string $string [, int $limit = -1 ] )

Split a multibyte string using regular expression pattern and returns the result as an array.

Like this:
$string = "a�b�k�e";
$chunks = mb_split("�", $string);
print_r($chunks);
Outputs:
Array
(
[0] => a
[1] => b
[2] => k
[3] => e
)
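Because the separator here is a fixed string rather than a pattern, plain explode() also works: in valid UTF-8 no character's byte sequence can appear inside another character's, so byte-level matching is safe. A sketch assuming the separator is U+FFFD REPLACEMENT CHARACTER, as it is displayed in the question; it also sets mb_regex_encoding() explicitly, since mb_split() interprets the pattern in that encoding:

```php
<?php
// Assumption: the separator is U+FFFD, written as an escape to avoid ambiguity
$string = "a\u{FFFD}b\u{FFFD}k\u{FFFD}e";

// Literal multibyte separator: explode() is enough
$chunks = explode("\u{FFFD}", $string);

// mb_split() variant with the regex encoding set explicitly
mb_regex_encoding('UTF-8');
$chunks2 = mb_split("\u{FFFD}", $string);

print_r($chunks);
```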
How to split a string character by character, paying attention to special characters
str_split() has problems with Unicode strings because it splits by bytes. You can use the u modifier in preg_split() instead.
For instance:
$input = "Comment ça va?";
$letters1 = str_split($input);
$letters2 = preg_split('//u', $input, -1, PREG_SPLIT_NO_EMPTY);
print_r($letters1);
print_r($letters2);
Will output:
Array ( [0] => C [1] => o [2] => m [3] => m [4] => e
[5] => n [6] => t [7] => [8] => � [9] => �
[10] => a [11] => [12] => v [13] => a [14] => ? )
Array ( [0] => C [1] => o [2] => m [3] => m [4] => e
[5] => n [6] => t [7] => [8] => ç [9] => a
[10] => [11] => v [12] => a [13] => ? )
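The two outputs differ in length for the same reason strlen() and mb_strlen() disagree on this string: "ç" is two bytes in UTF-8. A quick check (assumes the mbstring extension is loaded):

```php
<?php
$input = "Comment ça va?";

echo strlen($input), "\n";                                                  // 15 bytes
echo mb_strlen($input, 'UTF-8'), "\n";                                      // 14 characters
echo count(preg_split('//u', $input, -1, PREG_SPLIT_NO_EMPTY)), "\n";      // 14
```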