How to Strip Whitespace from a PHP Variable

How can I strip whitespace from a PHP variable?

By default, PHP's preg functions match bytes, not UTF-8 characters, and the \s meta-character only covers ASCII whitespace. The following command therefore only removes spaces, tabs, newlines, carriage returns and form feeds:

// http://stackoverflow.com/a/1279798/54964
$str=preg_replace('/\s+/', '', $str);

With UTF-8 now mainstream, this expression misses more and more whitespace. It does not fail or halt, but it leaves behind any Unicode white space that \s cannot match.

To deal with the additional types of white space defined in Unicode, a more extensive pattern is required to match and remove them.

Because regular expressions do not recognize multi-byte characters by default, each character has to be spelled out as a complete, delimited byte sequence. Matching individual bytes instead would corrupt other UTF-8 characters (for example, replacing a lone \x80 from the quad set would also strip the \x80 bytes inside smart quotes):

$cleanedstr = preg_replace(
"/(\t|\n|\v|\f|\r| |\xC2\x85|\xc2\xa0|\xe1\xa0\x8e|\xe2\x80[\x80-\x8D]|\xe2\x80\xa8|\xe2\x80\xa9|\xe2\x80\xaF|\xe2\x81\x9f|\xe2\x81\xa0|\xe3\x80\x80|\xef\xbb\xbf)+/",
"_",
$str
);

This matches tabs, newlines, vertical tabs, form feeds, carriage returns and spaces, plus the Unicode characters listed below, and replaces each run with an underscore (pass '' as the replacement to remove them instead):

next line, non-breaking space, Mongolian vowel separator, [en quad, em quad, en space, em space, three-per-em space, four-per-em space, six-per-em space, figure space, punctuation space, thin space, hair space, zero width space, zero width non-joiner, zero width joiner], line separator, paragraph separator, narrow no-break space, medium mathematical space, word joiner, ideographic space, and the zero width non-breaking space.
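
As an alternative to listing byte sequences by hand, here is a minimal sketch that uses PCRE's u (UTF-8) modifier together with Unicode properties. It assumes PHP 7 or later, valid UTF-8 input, and a PCRE build with Unicode property support. The function name stripUnicodeWhitespace is made up for this example. \p{Z} covers the Unicode separator category, and the other characters from the list above are given as explicit code points rather than relying on that category.

```php
<?php
// Hypothetical helper: removes ASCII and Unicode white space from a UTF-8 string.
function stripUnicodeWhitespace(string $str, string $replacement = ''): string
{
    // \s                 ASCII white space
    // \p{Z}              Unicode separators (spaces, line and paragraph separators)
    // \x{85}             next line
    // \x{180E}           Mongolian vowel separator
    // \x{200B}-\x{200D}  zero width space, non-joiner, joiner
    // \x{2060}           word joiner
    // \x{FEFF}           zero width no-break space (byte order mark)
    $pattern = '/[\s\p{Z}\x{85}\x{180E}\x{200B}-\x{200D}\x{2060}\x{FEFF}]+/u';

    // preg_replace() returns null on error, e.g. when $str is not valid UTF-8;
    // in that case this sketch returns the input unchanged.
    return preg_replace($pattern, $replacement, $str) ?? $str;
}

$dirty = "a\u{00A0}b\u{2003}c\u{200B}d \t\ne";
echo stripUnicodeWhitespace($dirty), "\n"; // abcde
```

Passing '_' as the second argument mirrors the underscore replacement used in the byte-sequence version.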

Many of these wreak havoc in XML files exported from automated tools or sites, where they foul up text searches and recognition. They can also be pasted invisibly into PHP source code, where paragraph and line separators cause the parser to jump to the next command, so lines of code are skipped. The result is intermittent, unexplained errors that we have begun referring to as "textually transmitted diseases".

[It's not safe to copy and paste from the web anymore. Use a character scanner to protect your code. lol]

How do I strip all spaces out of a string in PHP?

Do you just mean spaces or all whitespace?

For just spaces, use str_replace:

$string = str_replace(' ', '', $string);

For all whitespace (including tabs and line ends), use preg_replace:

$string = preg_replace('/\s+/', '', $string);
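
To make the difference concrete, here is a small sketch with a made-up input string that contains spaces, a tab and a newline:

```php
<?php
$input = " a b\tc\nd ";

// str_replace() removes only the literal space character.
$spacesOnly = str_replace(' ', '', $input);

// preg_replace() with \s+ also removes the tab and the newline.
$allWhitespace = preg_replace('/\s+/', '', $input);

var_dump($spacesOnly === "ab\tc\nd"); // bool(true)
var_dump($allWhitespace === 'abcd');  // bool(true)
```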


How to collapse multiple white spaces into one

$str = preg_replace('/\s+/', ' ', $originalString);
echo $str;

This replaces each run of whitespace with a single space.

Strip php variable, replace white spaces with dashes

This function will create an SEO-friendly string:

function seoUrl($string) {
    // Lower case everything
    $string = strtolower($string);
    // Keep only a-z, 0-9, underscores, whitespace and dashes (removes everything else)
    $string = preg_replace("/[^a-z0-9_\s-]/", "", $string);
    // Collapse runs of dashes and whitespace into a single space
    $string = preg_replace("/[\s-]+/", " ", $string);
    // Convert whitespace and underscores to dashes
    $string = preg_replace("/[\s_]/", "-", $string);
    return $string;
}

should be fine :)
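
A quick usage sketch, with the function repeated so the snippet runs on its own. The sample titles are invented. Note that leading or trailing white space in the input ends up as leading or trailing dashes, so trimming the input first may be worth considering.

```php
<?php
function seoUrl($string) {
    $string = strtolower($string);
    $string = preg_replace("/[^a-z0-9_\s-]/", "", $string);
    $string = preg_replace("/[\s-]+/", " ", $string);
    $string = preg_replace("/[\s_]/", "-", $string);
    return $string;
}

echo seoUrl("Hello,  World! PHP_is -- fun"), "\n"; // hello-world-php-is-fun

// Untrimmed input keeps a dash at each end...
echo seoUrl(" Padded Title "), "\n";               // -padded-title-

// ...so trim() before calling the function if that is not wanted.
echo seoUrl(trim(" Padded Title ")), "\n";         // padded-title
```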

Removing extra whitespaces between characters from a string using PHP

Solution found.

As mentioned, the source of my string is an editable HTML div, which I use instead of a normal HTML textarea. That makes my string very different from a hard-coded or PHP-generated string.
My string contains some white spaces encoded in a form that I still couldn't identify. There is NO WAY to see these encoded white spaces.

That's why a regex looking for '\t', '&nbsp;', ' ', '%C2%A0', etc. doesn't match them.
The urldecode function doesn't work either, as the string is not URL-encoded. Instead of guessing what is sitting in those white spaces, I just encode them to HTML entities using this:

$data = htmlentities($_POST['characters']);

Now my string outputs this:

12345&nbsp;&nbsp;&nbsp;&nbsp; 6

I still can't figure out why my first 4 white spaces were converted to &nbsp; but the 5th one appears as a normal " " white space. But at least it explains why the regex removed only 1 white space.

Then I easily strip the &nbsp; entities and extra spaces:

$stripped = trim(preg_replace('/(&nbsp;)+|\s\K\s+/', '', $data));

Now my output looks as expected:
12345 6

Let's decode those html entities (if you had any html tags in your string) back to their respective characters:

$finalString = html_entity_decode($stripped);

Now everything is just perfect.

To summarize the problem: it appears that different browsers and operating systems can replace HTML white space with different things. I think it's not a bug but rather browser/OS behavior. Just let the machine convert whatever encoding it used to a universal HTML entity, then use a regex to match that entity and remove it.
I hope this saves some people a lot of time.
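
If you would rather see what the invisible characters are instead of guessing, bin2hex() shows the raw bytes. The sketch below assumes the editable div sent UTF-8 non-breaking spaces (U+00A0, bytes C2 A0), which would be consistent with htmlentities() turning them into &nbsp;. The sample string is made up, and PHP 7 or later is assumed for the \u{...} escapes. With the u modifier the non-breaking space can then be matched directly, without the round trip through HTML entities.

```php
<?php
// Assumed sample: four non-breaking spaces followed by a regular space.
$data = "12345\u{00A0}\u{00A0}\u{00A0}\u{00A0} 6";

// Inspect the raw bytes: each non-breaking space shows up as c2a0.
$hex = bin2hex($data);
echo $hex, "\n"; // 3132333435c2a0c2a0c2a0c2a02036

// Collapse any run of regular or non-breaking white space into one space.
$clean = trim(preg_replace('/[\s\x{00A0}]+/u', ' ', $data));
echo $clean, "\n"; // 12345 6
```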

Remove excess whitespace from within a string

Not sure exactly what you want but here are two situations:

  1. If you are just dealing with excess whitespace at the beginning or end of the string you can use trim(), ltrim() or rtrim() to remove it.

  2. If you are dealing with extra spaces within a string, consider using preg_replace to replace each run of whitespace (\s+) with a single space " ".

Example:

$foo = preg_replace('/\s+/', ' ', $foo);
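
The two situations can also be combined. A small sketch with an invented input string:

```php
<?php
$foo = "  Hello \n\n  World\t!  ";

// First collapse every run of whitespace into one space,
// then strip the single space left at each end.
$foo = trim(preg_replace('/\s+/', ' ', $foo));

echo $foo, "\n"; // Hello World !
```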

Removing all space in string before the first non space character

Use the ltrim() function to remove whitespace from the beginning of a string:

$str = "        Hello World";
$sanitized_string = ltrim($str);
echo $sanitized_string; // Hello World
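
Note that ltrim() without a second argument only strips a fixed set of single-byte characters, so a leading non-breaking space survives it. Here is a hedged sketch for UTF-8 input. The helper name ltrimUnicode is made up, and valid UTF-8 and PHP 7 or later are assumed.

```php
<?php
// Hypothetical helper: strips leading ASCII and Unicode white space.
function ltrimUnicode(string $str): string
{
    // ^ anchors at the start; \p{Z} adds Unicode separators such as U+00A0.
    // On error (e.g. invalid UTF-8) preg_replace() returns null, so fall back to the input.
    return preg_replace('/^[\s\p{Z}]+/u', '', $str) ?? $str;
}

$str = "\u{00A0}   Hello World";

// The string starts with the bytes of a non-breaking space, so ltrim() strips nothing.
var_dump(ltrim($str) === $str); // bool(true)

echo ltrimUnicode($str), "\n";  // Hello World
```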

Trim space in the variable with PHP not working

You can use preg_replace to remove whitespace from your path:

$logo_file_path = '20 07170312_download.jpg';
$path = preg_replace('/\s+/', '', 'uploads/' . $logo_file_path);

echo $path;
//Output : uploads/2007170312_download.jpg

