Trim Unicode Whitespace in PHP 5.2

Trim unicode whitespace in PHP 5.2

preg_replace('/^[\pZ\pC]+|[\pZ\pC]+$/u','',$str);
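
For reuse, the pattern can be wrapped in a small helper. This is only a sketch: the function name trim_unicode is made up here, and it assumes the input is valid UTF-8. With the /u flag, preg_replace() returns null on invalid input, so the sketch falls back to the original string in that case.

```php
<?php
// Hypothetical helper (the name is illustrative): trims Unicode separators (\pZ)
// and control/format characters (\pC) from both ends of a UTF-8 string.
function trim_unicode($str)
{
    $result = preg_replace('/^[\pZ\pC]+|[\pZ\pC]+$/u', '', $str);
    // preg_replace() returns null if $str is not valid UTF-8; keep the input then.
    return $result === null ? $str : $result;
}

// "\xE2\x80\x89" is U+2009 THIN SPACE, "\xC2\xA0" is U+00A0 NO-BREAK SPACE.
var_dump(trim_unicode("\xE2\x80\x89\xC2\xA0 hello world \t\n")); // string(11) "hello world"
```

The byte escapes are used instead of \u{...} so the snippet does not depend on PHP 7 string syntax.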

Trim unicode whitespaces PHP

Unicode whitespace characters such as \u{2009} (thin space) cause problems in various places.
I would therefore replace all unicode spaces with regular spaces and then apply trim().

$string = "\u{2009}\u{2009}\u{2009}test\u{2009}\u{2009}\u{2009}string\u{2009}and XY \t\u{2009}";
//\u{2009}\u{2009}\u{2009}test\u{2009}\u{2009}\u{2009}string\u{2009}and\x20XY\x20\x09\u{2009}

$trimString = trim(preg_replace('/[\pZ\pC]/u', ' ', $string));
//test\x20\x20\x20string\x20and\x20XY

Note: The string representations in the comments were produced with debug::writeUni($string, $trimString), which comes from a separate debug class.

why is php trim is not really remove all whitespace and line breaks?

The trim function doesn't know about Unicode white spaces. You could try this:

preg_replace('/^\p{Z}+|\p{Z}+$/u', '', $str);

As taken from: Trim unicode whitespace in PHP 5.2

Otherwise, you can do a bin2hex() to find out what characters are being added at the front.
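
A minimal sketch of that bin2hex() check, using a made-up sample string that starts with a UTF-8 byte order mark:

```php
<?php
// A string that looks like "abc" but starts with a UTF-8 byte order mark.
$str = "\xEF\xBB\xBFabc";

// Dump the first few bytes as hex to see what is really at the front.
echo bin2hex(substr($str, 0, 6)), "\n"; // efbbbf616263
```

Here ef bb bf is the BOM and 61 62 63 is "abc".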

Update

Your file contains a UTF-8 BOM; to skip it while reading:

$f = fopen("file.txt", "r");
$s = fread($f, 3);
if ($s !== "\xef\xbb\xbf") {
    // BOM not found, rewind the file
    fseek($f, 0, SEEK_SET);
}
// continue reading here
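
If the contents are already in a string (for example from file_get_contents()), the same check can be done in memory. A sketch, with a made-up function name:

```php
<?php
// Hypothetical helper: removes a leading UTF-8 BOM from a string, if present.
function strip_utf8_bom($s)
{
    if (strncmp($s, "\xEF\xBB\xBF", 3) === 0) {
        // The cast guards against older PHP versions, where substr() can
        // return false when nothing is left after the offset.
        return (string) substr($s, 3);
    }
    return $s;
}

var_dump(strip_utf8_bom("\xEF\xBB\xBFabc")); // string(3) "abc"
```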

Identifying hidden characters

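What I would do is split the string into bytes and print the numeric value of each one:

$str = str_split('your string here');
foreach ($str as $char) echo ord($char), ' ';

You'll then have the byte value of each character (the ASCII code for plain ASCII text; a multi-byte UTF-8 character shows up as several values above 127). You can work backwards from there to identify the hidden character.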
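
For UTF-8 input it can be easier to look at whole code points instead of single bytes. The following is a sketch under assumptions: the helper name is invented, the input is expected to be valid UTF-8, and the decoding is done by hand so that no extension beyond PCRE is needed.

```php
<?php
// Hypothetical helper: lists the Unicode code points of a UTF-8 string.
function utf8_code_points($str)
{
    // Split into whole characters; preg_split() returns false on invalid UTF-8.
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return array();
    }
    $out = array();
    foreach ($chars as $ch) {
        $bytes = array_values(unpack('C*', $ch));
        $n = count($bytes);
        // Payload bits of the lead byte, then 6 bits from each continuation byte.
        $cp = ($n === 1) ? $bytes[0] : ($bytes[0] & (0xFF >> ($n + 1)));
        for ($i = 1; $i < $n; $i++) {
            $cp = ($cp << 6) | ($bytes[$i] & 0x3F);
        }
        $out[] = sprintf('U+%04X', $cp);
    }
    return $out;
}

echo implode(' ', utf8_code_points("\xE2\x80\x8BHi")), "\n"; // U+200B U+0048 U+0069
```

The code point can then be looked up directly, which is quicker than decoding the raw byte values by hand.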

php: how to remove starting blanks and new lines

Use trim.

Examples from php.net:

$text   = "\t\tThese are a few words :) ...  ";
$binary = "\x09Example string\x0A";
$hello = "Hello World";
var_dump($text, $binary, $hello);

print "\n";

$trimmed = trim($text);
var_dump($trimmed);

$trimmed = trim($text, " \t.");
var_dump($trimmed);

$trimmed = trim($hello, "Hdle");
var_dump($trimmed);

$trimmed = trim($hello, 'HdWr');
var_dump($trimmed);

// trim the ASCII control characters at the beginning and end of $binary
// (from 0 to 31 inclusive)
$clean = trim($binary, "\x00..\x1F");
var_dump($clean);

The above example will output:

string(32) "        These are a few words :) ...  "
string(16) " Example string
"
string(11) "Hello World"

string(28) "These are a few words :) ..."
string(24) "These are a few words :)"
string(5) "o Wor"
string(9) "ello Worl"
string(14) "Example string"
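
One tempting workaround is to pass a Unicode space in trim()'s second argument. As far as I know the character list is handled byte by byte, so this can cut other multi-byte characters in half. A small demonstration with a made-up sample string:

```php
<?php
// U+2009 THIN SPACE is the bytes E2 80 89; the euro sign is E2 82 AC.
$str = "\xE2\x80\x89\xE2\x82\xAC"; // thin space followed by the euro sign

// trim() treats the list as individual bytes, so it strips E2 80 89
// and then also the leading E2 byte of the euro sign.
$broken = trim($str, "\xE2\x80\x89");
echo bin2hex($broken), "\n"; // 82ac (no longer valid UTF-8)

// A Unicode-aware pattern removes whole characters instead.
$ok = preg_replace('/^\x{2009}+|\x{2009}+$/u', '', $str);
echo bin2hex($ok), "\n"; // e282ac
```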

Weird whitespace error in PHP

The code below works; I have tried it with the special characters we discovered in the comments. Basically, the regex removes everything that isn't a digit (0-9), and the result is then put back into your original format.

$trimmed = preg_replace('/\D+/', '', $v->cust_num);
$num = substr($trimmed,0,3)."-".substr($trimmed,3,3)."-".substr($trimmed,6,4);
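
The same idea as a reusable function, as a sketch only. The function name is invented, and the check for exactly 10 digits is an assumption based on the 3-3-4 layout above.

```php
<?php
// Hypothetical helper: returns NNN-NNN-NNNN, or null unless exactly 10 digits remain.
function format_cust_num($raw)
{
    // Drop every byte that is not an ASCII digit, including hidden whitespace.
    $digits = preg_replace('/\D+/', '', $raw);
    if (strlen($digits) !== 10) {
        return null;
    }
    return substr($digits, 0, 3) . '-' . substr($digits, 3, 3) . '-' . substr($digits, 6, 4);
}

// Input with a no-break space (C2 A0) and a zero width space (E2 80 8B).
var_dump(format_cust_num("\xC2\xA0555 123\xE2\x80\x8B4567")); // string(12) "555-123-4567"
```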

MySQL Wrong Order by varchar

Tested your data and your query, and got this result:

. . .

| 8 | World War 3 |
| 79 | W​+​J |
| 88 | ​Damaged Goods |
+----------+------------------------------------+

Notice the alignment of the right bars. There are non-printing spaces in the data. On my screen, it looks like:

. . .

| 8 | World War 3 |
| 79 | W ​+ ​J |
| 88 | ​ Damaged Goods |
+----------+------------------------------------+

The "Damaged Goods" title has an extra non-printing space at the beginning, which makes it sort after all other titles.

If I open your data in vim, I see:

(88, '<200b>Damaged Goods', 'en',

Unicode 200b is "zero width space": https://www.fileformat.info/info/unicode/char/200B/index.htm

You should do some kind of whitespace-trimming operation on your data before inserting it into the database. Unfortunately, the regular PHP trim() function doesn't do the job.

See Trim unicode whitespace in PHP 5.2 for a solution.


Re your comment:

Your use of the trim() function in PHP won't work. The PHP trim() function understands only ASCII whitespace characters, not Unicode whitespace-like characters. See http://php.net/trim for the list of characters trim looks for.
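
A sketch of cleaning such a title before it is inserted. The sample value is reconstructed from the vim output above. Note the character class: as far as I can tell, current Unicode versions classify U+200B as a format character (Cf), so \p{Z} alone may not match it, while \pC does.

```php
<?php
$title = "\xE2\x80\x8BDamaged Goods"; // starts with U+200B ZERO WIDTH SPACE

// trim() leaves the three bytes of U+200B in place.
var_dump(strlen(trim($title))); // int(16)

// Separators (\pZ) plus control/format characters (\pC) at both ends.
$clean = preg_replace('/^[\pZ\pC]+|[\pZ\pC]+$/u', '', $title);
var_dump($clean); // string(13) "Damaged Goods"
```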

Wrong result for array_count_values in PHP

Yes, those values are counted separately because of the space. Calling trim() on each word wouldn't work in your case, as it doesn't remove Unicode whitespace.

So the solution is to use preg_replace.

Refer:

Trim unicode whitespace

why is php trim is not really remove all whitespace and line breaks?
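
A sketch of how that could look with array_count_values(). The sample words and the helper name are made up for illustration.

```php
<?php
// The same word with different invisible characters attached.
$words = array("apple", "apple\xC2\xA0", "\xE2\x80\x89apple", "pear");

// Hypothetical helper: strips Unicode separators and control/format
// characters from both ends of one word.
function normalize_word($word)
{
    return preg_replace('/^[\pZ\pC]+|[\pZ\pC]+$/u', '', $word);
}

$counts = array_count_values(array_map('normalize_word', $words));
print_r($counts); // apple => 3, pear => 1
```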

Replace tabs and spaces with a single space as well as carriage returns and newlines with a single newline

First, I'd like to point out that new lines can be either \r, \n, or \r\n depending on the operating system.

My solution:

echo preg_replace('/[ \t]+/', ' ', preg_replace('/[\r\n]+/', "\n", $string));

Which could be separated into 2 lines if necessary:

$string = preg_replace('/[\r\n]+/', "\n", $string);
echo preg_replace('/[ \t]+/', ' ', $string);

Update:

An even better solution would be this one:

echo preg_replace('/[ \t]+/', ' ', preg_replace('/\s*$^\s*/m', "\n", $string));

Or:

$string = preg_replace('/\s*$^\s*/m', "\n", $string);
echo preg_replace('/[ \t]+/', ' ', $string);

I've improved the regular expression that collapses multiple line breaks into a single one. It uses the "m" modifier (which makes ^ and $ match at the start and end of each line) and is meant to remove any \s characters (space, tab, newline, carriage return) at the end of one line and the beginning of the next. This solves the problem of empty lines that contain nothing but spaces. With my previous example, a line filled with spaces would have left an extra line in the output.
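
One caveat worth testing on your own data: $ and ^ are both zero-width, so as far as I can tell $^ can only match at a position that is a line end and a line start at the same time, that is, on a completely empty line. A variant that matches the line break explicitly is sketched below; the function name is made up and this is not the answer's original code.

```php
<?php
// Hypothetical helper: collapses whitespace as described in the question.
function collapse_whitespace($string)
{
    // Any whitespace run containing at least one \r or \n becomes one "\n".
    $string = preg_replace('/\s*[\r\n]\s*/', "\n", $string);
    // Remaining runs of spaces and tabs become a single space.
    return preg_replace('/[ \t]+/', ' ', $string);
}

// Two spaces, a Windows line break, a spaces-only line, and a leading tab.
echo collapse_whitespace("a  b \r\n   \r\n\tc"), "\n"; // prints "a b" and "c" on two lines
```

Unlike the two-step version above, this also removes spaces and tabs directly before and after a line break.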


