Does html_entity_decode Replace &nbsp; Also? If Not, How to Replace It

Does html_entity_decode replace &nbsp; also? If not, how do I replace it?

Quote from html_entity_decode() manual:

You might wonder why
trim(html_entity_decode('&nbsp;'));
doesn't reduce the string to an empty
string, that's because the '&nbsp;'
entity is not ASCII code 32 (which is
stripped by trim()) but ASCII code 160
(0xa0) in the default ISO 8859-1
characterset.

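You can use str_replace() to replace character #160 (0xA0) with a space. The charset is passed explicitly here because the single byte "\xA0" only matches ISO-8859-1 output:

<?php
// Charset passed explicitly: "\xA0" is the non-breaking space only in ISO-8859-1.
$a = html_entity_decode('>&nbsp;<', ENT_QUOTES, 'ISO-8859-1');
echo 'before ' . $a . PHP_EOL;
$a = str_replace("\xA0", ' ', $a);
echo ' after ' . $a . PHP_EOL;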
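
Since PHP 5.4 the default charset for html_entity_decode() is UTF-8, where the decoded non-breaking space is the two bytes 0xC2 0xA0 rather than the single byte 0xA0 from the quote. Here is a minimal sketch comparing the two, assuming you are free to pass the charset yourself (the variable names are only for illustration):

```php
<?php
// Decode the entity under two explicit charsets and compare the raw bytes.
$latin1 = html_entity_decode('&nbsp;', ENT_QUOTES, 'ISO-8859-1');
$utf8   = html_entity_decode('&nbsp;', ENT_QUOTES, 'UTF-8');

echo bin2hex($latin1), PHP_EOL; // a0
echo bin2hex($utf8), PHP_EOL;   // c2a0

// trim() with its default character list leaves both untouched.
var_dump(trim($latin1) === $latin1); // bool(true)
var_dump(trim($utf8) === $utf8);     // bool(true)

// Replace the byte sequence that matches the charset you decoded with.
$clean = str_replace("\xC2\xA0", ' ', html_entity_decode('>&nbsp;<', ENT_QUOTES, 'UTF-8'));
echo $clean, PHP_EOL; // > <
```

If the replacement seems to leave a stray character behind, the search bytes probably do not match the charset the string was decoded with.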

PHP preg_replace &nbsp;

I think the problem is quite simply that highlight_string() is outputting its result immediately, rather than saving it to $note.

Instead, please try the following:

$note = html_entity_decode($note);
$note = highlight_string($note, true);
$note = str_replace('&nbsp;', ' ', $note);

The difference in my code is that I use highlight_string($note, true) with the second parameter set to true. The docs shed some light about the function's behavior:

mixed highlight_string ( string $str [, bool $return = false ] )

Return
 Set this parameter to TRUE to make this function return the highlighted code.

The regex function you have in your code block might work, but since this is a simple replacement, it will suffice to use str_replace in this case, as you have tried.

PHP Parsing Problem - &nbsp; and Â

The non-breaking space is encoded in UTF-8 as two bytes: 0xC2 and 0xA0.

When those bytes are interpreted as ISO-8859-1 (a single-byte encoding) instead of UTF-8 (a multi-byte encoding), they become, respectively, the character Â and a non-breaking space.

Apparently you're parsing the HTML as UTF-8 and echoing the results as ISO-8859-1. To fix this, either parse the HTML as ISO-8859-1 or echo the results as UTF-8. I'd recommend using UTF-8 all the way. Go through the PHP UTF-8 cheatsheet to line everything up.
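
To see the mismatch for yourself, here is a small sketch that feeds the same two bytes to htmlentities() under each charset. It only illustrates the byte interpretation; the variable name is made up for the example:

```php
<?php
// A UTF-8 non-breaking space is the two bytes 0xC2 0xA0.
$nbspUtf8 = "\xC2\xA0";

// Read as ISO-8859-1, the bytes are two characters: Â and a non-breaking space.
echo htmlentities($nbspUtf8, ENT_QUOTES, 'ISO-8859-1'), PHP_EOL; // &Acirc;&nbsp;

// Read as UTF-8, they are the single character that was intended.
echo htmlentities($nbspUtf8, ENT_QUOTES, 'UTF-8'), PHP_EOL; // &nbsp;
```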

replace &nbsp; characters that are hidden in text

This solution will work, I tested it:

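// Pass real flags: null for the flags argument is deprecated since PHP 8.1.
$string = htmlentities($content, ENT_COMPAT, 'UTF-8');
$content = str_replace("&nbsp;", "", $string);
$content = html_entity_decode($content, ENT_COMPAT, 'UTF-8');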

How to check if &nbsp; exists?

First of all, people get tripped up on this all the time...

strpos($string, "&nbsp;")

If &nbsp; is at the start of your string, the result is 0 (the offset position), and 0 is loosely equal to false in a conditional expression like yours.

You need to explicitly check for false (strict check) from strpos() like this:

if (empty($string) || strpos($string, "&nbsp;") !== false || $string == " ") {
    // Do something.
}
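
A quick sketch of the pitfall, using a made-up sample string that starts with the entity:

```php
<?php
$string = '&nbsp;Hello';

// strpos() returns the offset 0 here, which is loosely equal to false.
$pos = strpos($string, '&nbsp;');
var_dump($pos);           // int(0)
var_dump($pos == false);  // bool(true): the loose check misreports "not found"
var_dump($pos !== false); // bool(true): the strict check reports "found"

// On PHP 8.0 and later, str_contains() sidesteps the problem.
if (function_exists('str_contains')) {
    var_dump(str_contains($string, '&nbsp;')); // bool(true)
}
```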

However, that is NOT your actual issue because...

You have a multibyte space, as evidenced by the fact that when you "highlight" the character with your cursor it has a character length of one, but var_dump() reports a byte count of 2.

trim() can't help you. ctype_space() can't help you. You need something that is multibyte aware.

To allow the most inclusive match, I'll employ a regular expression that will search for all whitespace characters, invisible control characters, and unused code points.

if (empty($string) || preg_match('/^[\pZ\pC]+$/u', $string)) {
    // Do something.
}

This will check if the string is truly empty or is entirely composed of one or more of the aforementioned characters.
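
Here is the same check wrapped in a function so you can try it on a few inputs. The name is_blank() is hypothetical, and unlike empty() it does not treat the string "0" as empty:

```php
<?php
// Hypothetical helper: true when the string is empty or made up only of
// Unicode separators (\pZ) and "other" code points such as controls (\pC).
function is_blank(string $string): bool
{
    return $string === '' || preg_match('/^[\pZ\pC]+$/u', $string) === 1;
}

var_dump(is_blank(''));             // bool(true)
var_dump(is_blank("\xC2\xA0"));     // bool(true): UTF-8 non-breaking space
var_dump(is_blank(" \t\n"));        // bool(true): space is \pZ, tab and newline are \pC
var_dump(is_blank("\xC2\xA0text")); // bool(false)
var_dump(is_blank('&nbsp;'));       // bool(false): the literal entity is ordinary text
```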

Here's a little demo: https://3v4l.org/u7eoK

(I don't really think this is a &nbsp; issue, so I am leaving that out of my solution.)

Scroll down this resource: https://www.regular-expressions.info/unicode.html

How to remove html special chars?

Either decode them using html_entity_decode or remove them using preg_replace:

$Content = preg_replace("/&#?[a-z0-9]+;/i","",$Content); 


EDIT: Alternative according to Jacco's comment

might be nice to replace the '+' with
{2,8} or something. This will limit
the chance of replacing entire
sentences when an unencoded '&' is
present.

$Content = preg_replace("/&#?[a-z0-9]{2,8};/i","",$Content); 
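
To show what the bounded quantifier changes, here is a sketch with a made-up sample string that has an unencoded "&" followed shortly by a ";":

```php
<?php
// Sample text with an unencoded "&" ("AT&T;") next to a real entity.
$content = 'AT&T; and fish &amp; chips';

// Unbounded quantifier: also eats the "&T;" that was never an entity.
$unbounded = preg_replace('/&#?[a-z0-9]+;/i', '', $content);

// Bounded quantifier: requires 2 to 8 characters between "&" and ";".
$bounded = preg_replace('/&#?[a-z0-9]{2,8};/i', '', $content);

echo $unbounded, PHP_EOL; // AT and fish  chips
echo $bounded, PHP_EOL;   // AT&T; and fish  chips
```

Neither pattern checks whether the name is a real entity, so treat both as heuristics.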

PHP convert html &nbsp; to space, &gt; to > etc

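Use htmlspecialchars_decode, which is the opposite of htmlspecialchars. Note that it only decodes the special characters &amp;, &quot;, &#039;, &lt; and &gt;; for &nbsp; you still need html_entity_decode.

Example from the PHP documentation page:

$str = '<p>this -&gt; &quot;</p>';
echo htmlspecialchars_decode($str);
// Output: <p>this -> "</p>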
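
A short sketch of the difference between the two decoders when the input also contains &nbsp; (the sample string is made up for the example):

```php
<?php
$str = '<p>this&nbsp;-&gt; &quot;</p>';

// htmlspecialchars_decode() leaves &nbsp; alone.
echo htmlspecialchars_decode($str), PHP_EOL;
// <p>this&nbsp;-> "</p>

// html_entity_decode() also converts &nbsp;, here to the UTF-8 bytes C2 A0,
// which can then be swapped for a plain space.
$decoded = html_entity_decode($str, ENT_QUOTES, 'UTF-8');
echo str_replace("\xC2\xA0", ' ', $decoded), PHP_EOL;
// <p>this -> "</p>
```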

How to remove &nbsp; from a UTF-8 string?

This gets tricky; it's not as straightforward as replacing a normal string.

Try this:

$str = str_replace("\xc2\xa0", ' ', $str);

Or this, though the above should work:

$nbsp = html_entity_decode("&nbsp;");
$s = html_entity_decode("[&nbsp;]");
$s = str_replace($nbsp, " ", $s);
echo $s;

@ref: https://moovwebconfluence.atlassian.net/wiki/pages/viewpage.action?pageId=1081435
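
If you are not sure whether the text holds the literal entity or the already decoded character, you can cover both in one call. A minimal sketch, assuming the input is UTF-8; nbsp_to_space() is a hypothetical name:

```php
<?php
// Hypothetical helper: turns both the literal entity and the decoded
// UTF-8 character (bytes C2 A0) into a plain ASCII space.
function nbsp_to_space(string $text): string
{
    return str_replace(['&nbsp;', "\xC2\xA0"], ' ', $text);
}

$s = nbsp_to_space("[&nbsp;] [\xC2\xA0]");
echo $s, PHP_EOL;          // [ ] [ ]
echo bin2hex($s), PHP_EOL; // 5b205d205b205d
```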

Replace &nbsp; with a blank or empty string PHP

$text_description = "&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Hello world! lorel ipsum";
$text_description = str_replace('&nbsp;', ' ', $text_description);
echo $text_description;

Output:

Hello world! lorel ipsum


