non-breaking utf-8 0xc2a0 space and preg_replace strange behaviour
Actually, the documentation about escape sequences in PHP is wrong. When you use the \xc2\xa0 syntax, it searches for the UTF-8 bytes of the character. With the \x{c2a0} syntax, however, the value is treated as a Unicode code point and converted to its UTF-8 encoding, so \x{c2a0} means U+C2A0 rather than the non-breaking space.
A non-breaking space is U+00A0 (Unicode) but is encoded as C2 A0 in UTF-8. So if you try the pattern ~\x{00a0}~siu, it will work as expected.
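As a rough check of the point above, the following sketch compares the two code points under the u modifier. The sample string and variable names are made up for illustration, and the "\u{...}" string escape assumes PHP 7.0 or later.

```php
<?php
// Hypothetical sample text containing one UTF-8 non-breaking space (U+00A0).
$text = "foo\u{00A0}bar";

// The UTF-8 encoding of U+00A0 is the two bytes C2 A0.
var_dump(bin2hex("\u{00A0}")); // string(4) "c2a0"

// \x{00a0} with the u modifier is the code point U+00A0, so it matches.
$fixed = preg_replace('~\x{00a0}~siu', ' ', $text);
var_dump($fixed); // string(7) "foo bar"

// \x{c2a0} is the code point U+C2A0, a different character, so nothing is replaced.
$wrong = preg_replace('~\x{c2a0}~siu', ' ', $text);
var_dump($wrong === $text); // bool(true)
```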
MySQL insisting on putting weird non-breaking space characters inside my empty textareas
I AM STUPID. Christ, in a convoluted OOP structure I had done a stupid thing that caused this to keep happening. Sorry to waste your time! Can we delete this question?
PHP: preg_replace() with pattern modifier e - strange behaviour in evaluation result seems to ignore back references
Change
'"\1\2</a> (<a href=\"http://www.google.com/search?q=' . strip_tags(strtoupper('blah\2')) . '\">S</a>)"'
to
'"\1\2</a> (<a href=\"http://www.google.com/search?q=" . strip_tags(strtoupper(\'blah\2\')) . "\">S</a>)"'
The strip_tags() call needs to be part of the replacement string so that it doesn't get evaluated before the string is passed to preg_replace().
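Note that the e modifier was deprecated in PHP 5.5.0 and removed in PHP 7.0.0. On current versions, the same idea could be sketched with preg_replace_callback(), where the function calls run once per match with the captured groups available. The pattern and the input below are hypothetical, since the original pattern is not shown here; only the replacement logic follows the answer above.

```php
<?php
// Hypothetical pattern and input, for illustration only.
$pattern = '~(<a href="[^"]*">)(.*?)</a>~s';
$html    = '<a href="/x">foo <b>bar</b></a>';

$result = preg_replace_callback($pattern, function (array $m): string {
    // $m[1] and $m[2] play the role of \1 and \2 in the replacement string.
    $query = strip_tags(strtoupper('blah' . $m[2]));
    return $m[1] . $m[2] . '</a> (<a href="http://www.google.com/search?q='
        . $query . '">S</a>)';
}, $html);

echo $result, "\n";
// <a href="/x">foo <b>bar</b></a> (<a href="http://www.google.com/search?q=BLAHFOO BAR">S</a>)
```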
How to replace decoded Non-breakable space (nbsp)
Problem Explanation
The reason it's not working is that you are specifying the non-breaking space incorrectly.
The proper code for the non-breaking space in the UTF-8 encoding is 0xC2A0. It consists of two bytes, 0xC2 (194) and 0xA0 (160), so technically you're specifying only half of the character's code.
A Bit of Theory
Legacy character encodings used a constant number of bits to encode every character in their set. For example, the original ASCII encoding used 7 bits per character, and extended ASCII used 8 bits.
UTF-8 is a so-called variable-width character encoding, which means that the number of bits used to represent individual characters varies: in UTF-8, a character code consists of one to four 8-bit bytes (octets). Loosely similar to Huffman coding, the characters with the lowest code points (ASCII, which dominates English text and markup) have the shortest codes, while characters with higher code points have longer codes. The length depends on the code point, not on measured frequency, but it still helps reduce the data size of text that is mostly ASCII.
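To make the variable width concrete, here is a small sketch that prints the byte length and hex bytes of a few characters. The sample characters and array keys are chosen for illustration only, and the "\u{...}" escape assumes PHP 7.0 or later.

```php
<?php
// Illustrative samples, one for each possible UTF-8 length.
$samples = [
    'A'     => "A",          // U+0041, ASCII letter
    'NBSP'  => "\u{00A0}",   // U+00A0, non-breaking space
    'EURO'  => "\u{20AC}",   // U+20AC, euro sign
    'EMOJI' => "\u{1F600}",  // U+1F600, grinning face
];

foreach ($samples as $name => $char) {
    // strlen() counts bytes, not characters.
    printf("%-5s %d byte(s): %s\n", $name, strlen($char), bin2hex($char));
}
// A     1 byte(s): 41
// NBSP  2 byte(s): c2a0
// EURO  3 byte(s): e282ac
// EMOJI 4 byte(s): f09f9880
```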
Solution
You can replace all occurrences of the UTF-8 non-breaking space in text using a simple (and fast) str_replace or a more flexible regular expression, depending on your needs:
// faster solution
$regular_spaces = str_replace("\xc2\xa0", ' ', $original_string);
// more flexible solution
$regular_spaces = preg_replace('/\xc2\xa0/', ' ', $original_string);
Notes
Note that in the case of str_replace, you have to use double quotes (") to enclose the search string, because the function doesn't understand the textual representation of character codes and needs those codes converted into actual characters first. PHP does this automatically: strings enclosed in double quotes are processed, and special sequences (e.g. the newline character \n, textual representations of character codes, etc.) are replaced by the actual characters (e.g. 0x0A for \n in UTF-8) before the string value is used.
In contrast, the preg_replace function itself understands the textual representation of character codes, so you don't need PHP to convert them into actual characters and you can use apostrophes (single quotes, ') to enclose the search string in this case.
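The quoting difference described in these notes can be checked with a short sketch. The input string and variable names are made up for illustration; the regular expression deliberately has no u modifier, so it matches the raw bytes.

```php
<?php
// Hypothetical input: digits separated by a UTF-8 non-breaking space.
$original = "28\xc2\xa0999";

// Single quotes: str_replace looks for the literal text \xc2\xa0 and finds nothing.
$single = str_replace('\xc2\xa0', ' ', $original);

// Double quotes: PHP turns the escapes into the bytes C2 A0 before the call.
$double = str_replace("\xc2\xa0", ' ', $original);

// Single quotes with preg_replace: the regex engine interprets the escapes itself.
$regex = preg_replace('/\xc2\xa0/', ' ', $original);

var_dump($single === $original); // bool(true), nothing was replaced
var_dump($double);               // string(6) "28 999"
var_dump($regex);                // string(6) "28 999"
```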
PHP Currency with XML - Incorrect Results
I would suggest using
$product_price = preg_replace('~[ \x{00a0}]~siu', '', $product_price);
instead of
$product_price = str_replace(" ", "", $product_price);
The reason for this is that your xml file looks like:
grep price *.xml | head -n1 | hexdump -C
00000000  20 20 20 20 3c 70 72 69  63 65 3e 52 20 32 38 c2  |    <price>R 28.|
00000010  a0 39 39 39 2e 30 30 3c  2f 70 72 69 63 65 3e 0d  |.999.00</price>.|
00000020  0a                                                |.|
00000021
i.e., the space serving as the thousands separator is not an ordinary space but a non-breaking space (C2 A0 in UTF-8, as shown above). The statement str_replace(" ", "", $product_price) therefore had no effect on it, and you were effectively taking into account only the thousands (in this case "28 999"*1.15, which yields 28*1.15)...
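A minimal sketch of the difference, assuming the price text from the hexdump above has already been extracted from the XML into a string (the extraction step and the variable names $naive and $clean are not from the original):

```php
<?php
// Price text as it appears in the hexdump: "R 28", then C2 A0, then "999.00".
$product_price = "R 28\xc2\xa0999.00";

// Only the ordinary space is removed; the non-breaking space stays in place.
$naive = str_replace(" ", "", $product_price);
var_dump(bin2hex($naive)); // string(22) "523238c2a03939392e3030"

// Both the ordinary space and U+00A0 are removed.
$clean = preg_replace('~[ \x{00a0}]~siu', '', $product_price);
var_dump($clean); // string(9) "R28999.00"
```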