How to Strip All Whitespace from String

How to strip all whitespace from string

Taking advantage of str.split's behavior with no sep parameter:

>>> s = " \t foo \n bar "
>>> "".join(s.split())
'foobar'

If you just want to remove spaces instead of all whitespace:

>>> s.replace(" ", "")
'\tfoo\nbar'

Premature optimization

Even though efficiency isn't the primary goal—writing clear code is—here are some initial timings:

$ python -m timeit '"".join(" \t foo \n bar ".split())'
1000000 loops, best of 3: 1.38 usec per loop
$ python -m timeit -s 'import re' 're.sub(r"\s+", "", " \t foo \n bar ")'
100000 loops, best of 3: 15.6 usec per loop

Note the regex is cached, so it's not as slow as you'd imagine. Compiling it beforehand helps some, but would only matter in practice if you call this many times:

$ python -m timeit -s 'import re; e = re.compile(r"\s+")' 'e.sub("", " \t foo \n bar ")'
100000 loops, best of 3: 7.76 usec per loop

Even though re.sub is 11.3x slower, remember your bottlenecks are assuredly elsewhere. Most programs would not notice the difference between any of these 3 choices.

Remove all whitespace in a string

If you want to remove leading and ending spaces, use str.strip():

>>> "  hello  apple  ".strip()
'hello apple'

If you want to remove all space characters, use str.replace() (NB this only removes the “normal” ASCII space character ' ' U+0020 but not any other whitespace):

>>> "  hello  apple  ".replace(" ", "")
'helloapple'

If you want to remove duplicated spaces, use str.split() followed by str.join():

>>> " ".join("  hello  apple  ".split())
'hello apple'

Efficient way to remove ALL whitespace from String?

This is fastest way I know of, even though you said you didn't want to use regular expressions:

Regex.Replace(XML, @"\s+", "");

Crediting @hypehuman in the comments, if you plan to do this more than once, create and store a Regex instance. This will save the overhead of constructing it every time, which is more expensive than you might think.

private static readonly Regex sWhitespace = new Regex(@"\s+");
public static string ReplaceWhitespace(string input, string replacement)
{
return sWhitespace.Replace(input, replacement);
}

Strip all whitespace from a string

Here is some benchmarks on a few different methods for stripping all whitespace characters from a string: (source data):


BenchmarkSpaceMap-8 2000 1100084 ns/op 221187 B/op 2 allocs/op
BenchmarkSpaceFieldsJoin-8 1000 2235073 ns/op 2299520 B/op 20 allocs/op
BenchmarkSpaceStringsBuilder-8 2000 932298 ns/op 122880 B/op 1 allocs/op
  • SpaceMap: uses strings.Map; gradually increases the amount of allocated space as more non-whitespace characters are encountered
  • SpaceFieldsJoin: strings.Fields and strings.Join; generates a lot of intermediate data
  • SpaceStringsBuilder uses strings.Builder; performs a single allocation, but may grossly overallocate if the source string is mainly whitespace.
package main_test

import (
"strings"
"unicode"
"testing"
)

func SpaceMap(str string) string {
return strings.Map(func(r rune) rune {
if unicode.IsSpace(r) {
return -1
}
return r
}, str)
}

func SpaceFieldsJoin(str string) string {
return strings.Join(strings.Fields(str), "")
}

func SpaceStringsBuilder(str string) string {
var b strings.Builder
b.Grow(len(str))
for _, ch := range str {
if !unicode.IsSpace(ch) {
b.WriteRune(ch)
}
}
return b.String()
}

func BenchmarkSpaceMap(b *testing.B) {
for n := 0; n < b.N; n++ {
SpaceMap(data)
}
}

func BenchmarkSpaceFieldsJoin(b *testing.B) {
for n := 0; n < b.N; n++ {
SpaceFieldsJoin(data)
}
}

func BenchmarkSpaceStringsBuilder(b *testing.B) {
for n := 0; n < b.N; n++ {
SpaceStringsBuilder(data)
}
}

Remove all whitespace from a string

If you want to modify the String, use retain. This is likely the fastest way when available.

fn remove_whitespace(s: &mut String) {
s.retain(|c| !c.is_whitespace());
}

If you cannot modify it because you still need it or only have a &str, then you can use filter and create a new String. This will, of course, have to allocate to make the String.

fn remove_whitespace(s: &str) -> String {
s.chars().filter(|c| !c.is_whitespace()).collect()
}

Remove ALL white spaces from text

You have to tell replace() to repeat the regex:

.replace(/ /g,'')

The g character makes it a "global" match, meaning it repeats the search through the entire string. Read about this, and other RegEx modifiers available in JavaScript here.

If you want to match all whitespace, and not just the literal space character, use \s instead:

.replace(/\s/g,'')

You can also use .replaceAll if you're using a sufficiently recent version of JavaScript, but there's not really any reason to for your specific use case, since catching all whitespace requires a regex, and when using a regex with .replaceAll, it must be global, so you just end up with extra typing:

.replaceAll(/\s/g,'')

How to remove all whitespace from a string?

In general, we want a solution that is vectorised, so here's a better test example:

whitespace <- " \t\n\r\v\f" # space, tab, newline, 
# carriage return, vertical tab, form feed
x <- c(
" x y ", # spaces before, after and in between
" \u2190 \u2192 ", # contains unicode chars
paste0( # varied whitespace
whitespace,
"x",
whitespace,
"y",
whitespace,
collapse = ""
),
NA # missing
)
## [1] " x y "
## [2] " ← → "
## [3] " \t\n\r\v\fx \t\n\r\v\fy \t\n\r\v\f"
## [4] NA

The base R approach: gsub

gsub replaces all instances of a string (fixed = TRUE) or regular expression (fixed = FALSE, the default) with another string. To remove all spaces, use:

gsub(" ", "", x, fixed = TRUE)
## [1] "xy" "←→"
## [3] "\t\n\r\v\fx\t\n\r\v\fy\t\n\r\v\f" NA

As DWin noted, in this case fixed = TRUE isn't necessary but provides slightly better performance since matching a fixed string is faster than matching a regular expression.

If you want to remove all types of whitespace, use:

gsub("[[:space:]]", "", x) # note the double square brackets
## [1] "xy" "←→" "xy" NA

gsub("\\s", "", x) # same; note the double backslash

library(regex)
gsub(space(), "", x) # same

"[:space:]" is an R-specific regular expression group matching all space characters. \s is a language-independent regular-expression that does the same thing.


The stringr approach: str_replace_all and str_trim

stringr provides more human-readable wrappers around the base R functions (though as of Dec 2014, the development version has a branch built on top of stringi, mentioned below). The equivalents of the above commands, using [str_replace_all][3], are:

library(stringr)
str_replace_all(x, fixed(" "), "")
str_replace_all(x, space(), "")

stringr also has a str_trim function which removes only leading and trailing whitespace.

str_trim(x) 
## [1] "x y" "← →" "x \t\n\r\v\fy" NA
str_trim(x, "left")
## [1] "x y " "← → "
## [3] "x \t\n\r\v\fy \t\n\r\v\f" NA
str_trim(x, "right")
## [1] " x y" " ← →"
## [3] " \t\n\r\v\fx \t\n\r\v\fy" NA

The stringi approach: stri_replace_all_charclass and stri_trim

stringi is built upon the platform-independent ICU library, and has an extensive set of string manipulation functions. The equivalents of the above are:

library(stringi)
stri_replace_all_fixed(x, " ", "")
stri_replace_all_charclass(x, "\\p{WHITE_SPACE}", "")

Here "\\p{WHITE_SPACE}" is an alternate syntax for the set of Unicode code points considered to be whitespace, equivalent to "[[:space:]]", "\\s" and space(). For more complex regular expression replacements, there is also stri_replace_all_regex.

stringi also has trim functions.

stri_trim(x)
stri_trim_both(x) # same
stri_trim(x, "left")
stri_trim_left(x) # same
stri_trim(x, "right")
stri_trim_right(x) # same

How to remove all whitespace from string?

This piece of code helped figure out exactly what kind of whitespace was present in the original query that had the join issue:

select distinct
fieldname,
space = iif(charindex(char(32), fieldname) > 0, 1, 0),
horizontal_tab = iif(charindex(char(9), fieldname) > 0, 1, 0),
vertical_tab = iif(charindex(char(11), fieldname) > 0, 1, 0),
backspace = iif(charindex(char(8), fieldname) > 0, 1, 0),
carriage_return = iif(charindex(char(13), fieldname) > 0, 1, 0),
newline = iif(charindex(char(10), fieldname) > 0, 1, 0),
formfeed = iif(charindex(char(12), fieldname) > 0, 1, 0),
nonbreakingspace = iif(charindex(char(255), fieldname) > 0, 1, 0)
from tablename;

It turned out there were carriage returns and new line feeds in the data of one of the tables. So using @scsimon's solution this problem was resolved by changing the join to this:

on REPLACE(REPLACE(a.fieldname, CHAR(10), ''), CHAR(13), '') = b.fieldname

How to remove white spaces from a string in Python?

Use str.replace:

>>> s = "TN 81 NZ 0025"
>>> s.replace(" ", "")
'TN81NZ0025'

To remove all types of white-space characters use str.translate:

>>> from string import whitespace
>>> s = "TN 81 NZ\t\t0025\nfoo"
# Python 2
>>> s.translate(None, whitespace)
'TN81NZ0025foo'
# Python 3
>>> s.translate(dict.fromkeys(map(ord, whitespace)))
'TN81NZ0025foo'


Related Topics



Leave a reply



Submit