How to Remove All Whitespace from a String

Remove all whitespace in a string

If you want to remove leading and trailing whitespace, use str.strip():

>>> "  hello  apple  ".strip()
'hello  apple'

If you want to remove all space characters, use str.replace(). Note that this removes only the ordinary ASCII space character ' ' (U+0020), not any other whitespace:

>>> "  hello  apple  ".replace(" ", "")
'helloapple'

If you want to collapse runs of whitespace into single spaces (and trim both ends), use str.split() followed by str.join():

>>> " ".join("  hello  apple  ".split())
'hello apple'
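
To illustrate the caveat about str.replace(), here is a short sketch. The sample string is invented for illustration and mixes ASCII spaces with a tab and a non-breaking space (U+00A0):

```python
# Sample text (invented for illustration): ASCII spaces, a tab, and a
# non-breaking space (U+00A0) between the two words.
text = " hello\t\u00a0apple "

# str.replace(" ", "") removes only U+0020; the tab and U+00A0 survive.
only_spaces_removed = text.replace(" ", "")

# str.split() with no arguments splits on any run of whitespace,
# so joining the pieces with "" drops all of it.
all_whitespace_removed = "".join(text.split())

print(repr(only_spaces_removed))
print(repr(all_whitespace_removed))
```

The first result still contains the tab and the non-breaking space; the second contains only the letters.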

How to strip all whitespace from string

With no sep argument, str.split() splits on runs of any whitespace, so joining the pieces removes all of it:

>>> s = " \t foo \n bar "
>>> "".join(s.split())
'foobar'

If you just want to remove spaces instead of all whitespace:

>>> s.replace(" ", "")
'\tfoo\nbar'

Premature optimization

Even though efficiency isn't the primary goal—writing clear code is—here are some initial timings:

$ python -m timeit '"".join(" \t foo \n bar ".split())'
1000000 loops, best of 3: 1.38 usec per loop
$ python -m timeit -s 'import re' 're.sub(r"\s+", "", " \t foo \n bar ")'
100000 loops, best of 3: 15.6 usec per loop

Note the regex is cached, so it's not as slow as you'd imagine. Compiling it beforehand helps some, but would only matter in practice if you call this many times:

$ python -m timeit -s 'import re; e = re.compile(r"\s+")' 'e.sub("", " \t foo \n bar ")'
100000 loops, best of 3: 7.76 usec per loop

Even though re.sub is about 11 times slower than the split/join version here, remember that your bottlenecks are almost certainly elsewhere. Most programs would not notice the difference between any of these three choices.
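
If you would rather reproduce this comparison from a script than from the shell, something like the following sketch should work. The function and constant names are made up for illustration, and the numbers will vary by machine and Python version:

```python
import re
import timeit

SAMPLE = " \t foo \n bar "
WHITESPACE_RUN = re.compile(r"\s+")  # compiled once, reused below


def with_split_join(s):
    return "".join(s.split())


def with_re_sub(s):
    return re.sub(r"\s+", "", s)


def with_compiled(s):
    return WHITESPACE_RUN.sub("", s)


CANDIDATES = (with_split_join, with_re_sub, with_compiled)

# Sanity check before timing: all three must produce the same result.
results = {func.__name__: func(SAMPLE) for func in CANDIDATES}
assert len(set(results.values())) == 1

if __name__ == "__main__":
    loops = 100_000
    for func in CANDIDATES:
        seconds = timeit.timeit(lambda: func(SAMPLE), number=loops)
        print(f"{func.__name__}: {seconds / loops * 1e6:.2f} usec per call")
```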

Remove all whitespace from string

If you want to modify the String in place, use retain. This is likely the fastest way when available.

fn remove_whitespace(s: &mut String) {
    s.retain(|c| !c.is_whitespace());
}

If you cannot modify it because you still need it or only have a &str, then you can use filter and create a new String. This will, of course, have to allocate to make the String.

fn remove_whitespace(s: &str) -> String {
    s.chars().filter(|c| !c.is_whitespace()).collect()
}
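
For comparison, the same filter-and-collect idea can be sketched in Python. This is only an analogue: str.isspace() and Rust's char::is_whitespace may not agree on every code point, so treat the two as similar rather than identical.

```python
def remove_whitespace(s: str) -> str:
    """Return a new string without the characters for which str.isspace() is true."""
    # Python strings are immutable, so this always builds a new string,
    # like the filter/collect version above.
    return "".join(ch for ch in s if not ch.isspace())


print(repr(remove_whitespace(" \t foo \n bar ")))
```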

How to remove all whitespace characters from a String?

Try using LINQ to filter out whitespace:

using System.Linq;

...

string source = "abc \t def\r\n789";
string result = string.Concat(source.Where(c => !char.IsWhiteSpace(c)));

Console.WriteLine(result);

Outcome:

abcdef789

How to remove all whitespace from a string?

In general, we want a solution that is vectorised, so here's a better test example:

whitespace <- " \t\n\r\v\f" # space, tab, newline, 
# carriage return, vertical tab, form feed
x <- c(
" x y ", # spaces before, after and in between
" \u2190 \u2192 ", # contains unicode chars
paste0( # varied whitespace
whitespace,
"x",
whitespace,
"y",
whitespace,
collapse = ""
),
NA # missing
)
## [1] " x y "
## [2] " ← → "
## [3] " \t\n\r\v\fx \t\n\r\v\fy \t\n\r\v\f"
## [4] NA

The base R approach: gsub

gsub replaces all instances of a string (fixed = TRUE) or regular expression (fixed = FALSE, the default) with another string. To remove all spaces, use:

gsub(" ", "", x, fixed = TRUE)
## [1] "xy" "←→"
## [3] "\t\n\r\v\fx\t\n\r\v\fy\t\n\r\v\f" NA

As DWin noted, in this case fixed = TRUE isn't necessary but provides slightly better performance since matching a fixed string is faster than matching a regular expression.

If you want to remove all types of whitespace, use:

gsub("[[:space:]]", "", x) # note the double square brackets
## [1] "xy" "←→" "xy" NA

gsub("\\s", "", x) # same; note the double backslash

library(regex)
gsub(space(), "", x) # same

"[:space:]" is a POSIX character class matching all whitespace characters; it is only valid inside a bracket expression, hence the double square brackets. \s is a shorthand that does the same thing and is supported by most regular-expression flavours.


The stringr approach: str_replace_all and str_trim

stringr provides more human-readable wrappers around the base R functions (though as of Dec 2014, the development version has a branch built on top of stringi, mentioned below). The equivalents of the above commands, using str_replace_all, are:

library(stringr)
str_replace_all(x, fixed(" "), "")
str_replace_all(x, space(), "")

stringr also has a str_trim function which removes only leading and trailing whitespace.

str_trim(x) 
## [1] "x y" "← →" "x \t\n\r\v\fy" NA
str_trim(x, "left")
## [1] "x y " "← → "
## [3] "x \t\n\r\v\fy \t\n\r\v\f" NA
str_trim(x, "right")
## [1] " x y" " ← →"
## [3] " \t\n\r\v\fx \t\n\r\v\fy" NA

The stringi approach: stri_replace_all_charclass and stri_trim

stringi is built upon the platform-independent ICU library, and has an extensive set of string manipulation functions. The equivalents of the above are:

library(stringi)
stri_replace_all_fixed(x, " ", "")
stri_replace_all_charclass(x, "\\p{WHITE_SPACE}", "")

Here "\\p{WHITE_SPACE}" is an alternate syntax for the set of Unicode code points considered to be whitespace, equivalent to "[[:space:]]", "\\s" and space(). For more complex regular expression replacements, there is also stri_replace_all_regex.
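
For readers comparing against another language, here is a rough Python sketch of the same element-wise, all-whitespace removal, with None standing in for NA. The helper name and its ascii_only parameter are made up for illustration. In Python's re module, \s on str patterns matches Unicode whitespace by default, and the re.ASCII flag restricts it to the ASCII whitespace characters.

```python
import re

WHITESPACE = " \t\n\r\v\f"  # space, tab, newline, carriage return, vertical tab, form feed

x = [
    " x y ",                                           # spaces before, after and in between
    " \u2190 \u2192 ",                                 # contains Unicode arrows
    WHITESPACE + "x" + WHITESPACE + "y" + WHITESPACE,  # varied whitespace
    None,                                              # missing value
]


def remove_whitespace_all(values, ascii_only=False):
    """Remove whitespace from every string in values; None is passed through."""
    flags = re.ASCII if ascii_only else 0
    return [None if v is None else re.sub(r"\s", "", v, flags=flags) for v in values]


print(remove_whitespace_all(x))
```

With ascii_only=True, a non-breaking space (U+00A0) would be left in place, which is one way to see the difference between the two whitespace definitions.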

stringi also has trim functions.

stri_trim(x)
stri_trim_both(x) # same
stri_trim(x, "left")
stri_trim_left(x) # same
stri_trim(x, "right")
stri_trim_right(x) # same

How to remove all whitespace from string?

This piece of code helped figure out exactly what kind of whitespace was present in the original query that had the join issue:

select distinct
    fieldname,
    space = iif(charindex(char(32), fieldname) > 0, 1, 0),
    horizontal_tab = iif(charindex(char(9), fieldname) > 0, 1, 0),
    vertical_tab = iif(charindex(char(11), fieldname) > 0, 1, 0),
    backspace = iif(charindex(char(8), fieldname) > 0, 1, 0),
    carriage_return = iif(charindex(char(13), fieldname) > 0, 1, 0),
    newline = iif(charindex(char(10), fieldname) > 0, 1, 0),
    formfeed = iif(charindex(char(12), fieldname) > 0, 1, 0),
    -- char(255) is a no-break space only in OEM code pages such as 437;
    -- under Windows-1252 the no-break space is char(160)
    nonbreakingspace = iif(charindex(char(255), fieldname) > 0, 1, 0)
from tablename;

It turned out there were carriage returns and line feeds in the data of one of the tables. So using @scsimon's solution, this problem was resolved by changing the join to this:

on REPLACE(REPLACE(a.fieldname, CHAR(10), ''), CHAR(13), '') = b.fieldname
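
The same kind of diagnosis can be done outside the database. The sketch below is a hypothetical Python helper (the name whitespace_report is made up here) that counts which whitespace characters occur in a value, so you can see what needs replacing before changing a join or a cleanup step:

```python
import unicodedata


def whitespace_report(value: str) -> dict:
    """Count each whitespace character in value, keyed by code point and name."""
    counts = {}
    for ch in value:
        if ch.isspace():
            # Control characters such as CR and LF have no Unicode name,
            # so fall back to an empty string for them.
            name = unicodedata.name(ch, "")
            label = f"U+{ord(ch):04X} {name}".strip()
            counts[label] = counts.get(label, 0) + 1
    return counts


# Sample value (invented): trailing space followed by CR and LF.
print(whitespace_report("abc \r\n"))
```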

Efficient way to remove ALL whitespace from String?

This is the fastest way I know of, even though you said you didn't want to use regular expressions:

Regex.Replace(XML, @"\s+", "");

Crediting @hypehuman in the comments, if you plan to do this more than once, create and store a Regex instance. This will save the overhead of constructing it every time, which is more expensive than you might think.

// Requires: using System.Text.RegularExpressions;
private static readonly Regex sWhitespace = new Regex(@"\s+");
public static string ReplaceWhitespace(string input, string replacement)
{
    return sWhitespace.Replace(input, replacement);
}

Strip all whitespace from a string

Here are some benchmarks of a few different methods for stripping all whitespace characters from a string (source data):


BenchmarkSpaceMap-8               2000    1100084 ns/op     221187 B/op     2 allocs/op
BenchmarkSpaceFieldsJoin-8        1000    2235073 ns/op    2299520 B/op    20 allocs/op
BenchmarkSpaceStringsBuilder-8    2000     932298 ns/op     122880 B/op     1 allocs/op

  • SpaceMap: uses strings.Map; gradually increases the amount of allocated space as more non-whitespace characters are encountered
  • SpaceFieldsJoin: uses strings.Fields and strings.Join; generates a lot of intermediate data
  • SpaceStringsBuilder: uses strings.Builder; performs a single allocation, but may grossly overallocate if the source string is mainly whitespace

package main_test

import (
	"strings"
	"testing"
	"unicode"
)

// data is the benchmark input string (the source data mentioned above);
// its definition is not shown here.

// SpaceMap drops whitespace by mapping each whitespace rune to -1.
func SpaceMap(str string) string {
	return strings.Map(func(r rune) rune {
		if unicode.IsSpace(r) {
			return -1
		}
		return r
	}, str)
}

// SpaceFieldsJoin splits on whitespace and joins the fields back together.
func SpaceFieldsJoin(str string) string {
	return strings.Join(strings.Fields(str), "")
}

// SpaceStringsBuilder copies non-whitespace runes into a pre-sized builder.
func SpaceStringsBuilder(str string) string {
	var b strings.Builder
	b.Grow(len(str))
	for _, ch := range str {
		if !unicode.IsSpace(ch) {
			b.WriteRune(ch)
		}
	}
	return b.String()
}

func BenchmarkSpaceMap(b *testing.B) {
	for n := 0; n < b.N; n++ {
		SpaceMap(data)
	}
}

func BenchmarkSpaceFieldsJoin(b *testing.B) {
	for n := 0; n < b.N; n++ {
		SpaceFieldsJoin(data)
	}
}

func BenchmarkSpaceStringsBuilder(b *testing.B) {
	for n := 0; n < b.N; n++ {
		SpaceStringsBuilder(data)
	}
}

