Remove all whitespace in a string
If you want to remove leading and trailing whitespace, use str.strip():
>>> " hello apple ".strip()
'hello apple'
If you want to remove all space characters, use str.replace() (NB: this only removes the “normal” ASCII space character ' ' U+0020, not any other whitespace):
>>> " hello apple ".replace(" ", "")
'helloapple'
If you want to remove duplicated spaces, use str.split() followed by str.join():
>>> " ".join(" hello apple ".split())
'hello apple'
How to strip all whitespace from string
Taking advantage of str.split's behavior with no sep parameter:
>>> s = " \t foo \n bar "
>>> "".join(s.split())
'foobar'
If you just want to remove spaces instead of all whitespace:
>>> s.replace(" ", "")
'\tfoo\nbar'
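The difference between the two approaches shows up most clearly with non-ASCII whitespace. The following is a minimal sketch; the function name remove_all_whitespace and the sample string are illustrative and not part of the original answer:

```python
def remove_all_whitespace(text: str) -> str:
    """Drop all whitespace: split() with no argument splits on any whitespace run."""
    return "".join(text.split())


# Sample with a tab, a newline, a NO-BREAK SPACE (U+00A0) and an EM SPACE (U+2003).
sample = " \t foo\u00a0\n bar\u2003"

print(repr(remove_all_whitespace(sample)))  # 'foobar'
print(repr(sample.replace(" ", "")))        # only the U+0020 spaces are gone
```

The split/join version treats anything str.isspace() considers whitespace as a separator, while replace(" ", "") touches nothing but U+0020.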
Premature optimization
Even though efficiency isn't the primary goal (writing clear code is), here are some initial timings:
$ python -m timeit '"".join(" \t foo \n bar ".split())'
1000000 loops, best of 3: 1.38 usec per loop
$ python -m timeit -s 'import re' 're.sub(r"\s+", "", " \t foo \n bar ")'
100000 loops, best of 3: 15.6 usec per loop
Note the regex is cached, so it's not as slow as you'd imagine. Compiling it beforehand helps some, but would only matter in practice if you call this many times:
$ python -m timeit -s 'import re; e = re.compile(r"\s+")' 'e.sub("", " \t foo \n bar ")'
100000 loops, best of 3: 7.76 usec per loop
Even though re.sub is 11.3x slower, remember your bottlenecks are assuredly elsewhere. Most programs would not notice the difference between any of these 3 choices.
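The same comparison can be run from inside Python with the timeit module rather than the command line. This is only a sketch: the dictionary of candidates and the loop count NUMBER are choices made here, and the absolute numbers will differ from the timings quoted above depending on machine and Python version.

```python
import re
import timeit

SAMPLE = " \t foo \n bar "
PATTERN = re.compile(r"\s+")  # compiled once, like the third timing above
NUMBER = 20_000               # calls per measurement (arbitrary choice)

candidates = {
    "split/join": lambda: "".join(SAMPLE.split()),
    "re.sub": lambda: re.sub(r"\s+", "", SAMPLE),
    "compiled": lambda: PATTERN.sub("", SAMPLE),
}

results = {}
for name, func in candidates.items():
    # Best of 3 repeats, mirroring "best of 3" in the command-line output.
    best = min(timeit.repeat(func, number=NUMBER, repeat=3))
    results[name] = best
    print(f"{name:>10}: {best / NUMBER * 1e6:.3f} usec per loop")
```

Wrapping each statement in a lambda adds a small call overhead to every measurement, so treat the output as a relative comparison only.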
Remove all whitespace from string
If you want to modify the String, use retain. This is likely the fastest way when available.
fn remove_whitespace(s: &mut String) {
    s.retain(|c| !c.is_whitespace());
}
If you cannot modify it because you still need it or only have a &str, then you can use filter and create a new String. This will, of course, have to allocate to make the String.
fn remove_whitespace(s: &str) -> String {
    s.chars().filter(|c| !c.is_whitespace()).collect()
}
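For comparison, the filtering variant can be sketched in Python. Python strings are immutable, so there is no counterpart to the in-place retain; only the allocating form applies. Note that str.isspace() and Rust's char::is_whitespace are not guaranteed to agree on every code point, so this is an analogue rather than an exact equivalent.

```python
def remove_whitespace(text: str) -> str:
    # Keep only the characters for which str.isspace() is False,
    # then join them into a new string (the original is left untouched).
    return "".join(ch for ch in text if not ch.isspace())


original = "a b\tc\nd"
cleaned = remove_whitespace(original)

print(cleaned)         # abcd
print(repr(original))  # 'a b\tc\nd' (unchanged)
```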
How to remove all whitespace characters from a String?
Try using Linq in order to filter out white spaces:
using System.Linq;
...
string source = "abc \t def\r\n789";
string result = string.Concat(source.Where(c => !char.IsWhiteSpace(c)));
Console.WriteLine(result);
Outcome:
abcdef789
How to remove all whitespace from a string?
In general, we want a solution that is vectorised, so here's a better test example:
whitespace <- " \t\n\r\v\f" # space, tab, newline,
                            # carriage return, vertical tab, form feed
x <- c(
  " x y ",           # spaces before, after and in between
  " \u2190 \u2192 ", # contains unicode chars
  paste0(            # varied whitespace
    whitespace,
    "x",
    whitespace,
    "y",
    whitespace,
    collapse = ""
  ),
  NA                 # missing
)
## [1] " x y "
## [2] " ← → "
## [3] " \t\n\r\v\fx \t\n\r\v\fy \t\n\r\v\f"
## [4] NA
The base R approach: gsub
gsub replaces all instances of a string (fixed = TRUE) or regular expression (fixed = FALSE, the default) with another string. To remove all spaces, use:
gsub(" ", "", x, fixed = TRUE)
## [1] "xy" "←→"
## [3] "\t\n\r\v\fx\t\n\r\v\fy\t\n\r\v\f" NA
As DWin noted, in this case fixed = TRUE isn't necessary but provides slightly better performance since matching a fixed string is faster than matching a regular expression.
If you want to remove all types of whitespace, use:
gsub("[[:space:]]", "", x) # note the double square brackets
## [1] "xy" "←→" "xy" NA
gsub("\\s", "", x) # same; note the double backslash
library(regex)
gsub(space(), "", x) # same
"[:space:]"
is a POSIX character class matching all whitespace characters; it must sit inside a bracket expression, hence the double square brackets above. \s is a widely supported regular-expression shorthand that does the same thing.
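The fixed-string versus regular-expression distinction carries over to other languages. As a rough Python analogue (not part of the original R answer), the sketch below applies both to a list that includes a missing value; the helper name strip_ws and the use of None to stand in for NA are assumptions made for illustration.

```python
import re
from typing import Optional

WHITESPACE = re.compile(r"\s")  # any whitespace character, like "\\s" in R


def strip_ws(value: Optional[str]) -> Optional[str]:
    """Remove all whitespace, passing missing values (None) through."""
    if value is None:
        return None
    return WHITESPACE.sub("", value)


x = [" x y ", " \u2190 \u2192 ", " \t\n\r\v\fx \t\n\r\v\fy \t\n\r\v\f", None]

# Regular expression: removes every kind of whitespace.
regex_result = [strip_ws(v) for v in x]
# Fixed string: removes only the plain space character.
fixed_result = [v.replace(" ", "") if v is not None else None for v in x]

print(regex_result)  # ['xy', '←→', 'xy', None]
print(fixed_result)
```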
The stringr approach: str_replace_all and str_trim
stringr provides more human-readable wrappers around the base R functions (though as of Dec 2014, the development version has a branch built on top of stringi, mentioned below). The equivalents of the above commands, using str_replace_all, are:
library(stringr)
str_replace_all(x, fixed(" "), "")
str_replace_all(x, space(), "")
stringr also has a str_trim function which removes only leading and trailing whitespace.
str_trim(x)
## [1] "x y" "← →" "x \t\n\r\v\fy" NA
str_trim(x, "left")
## [1] "x y " "← → "
## [3] "x \t\n\r\v\fy \t\n\r\v\f" NA
str_trim(x, "right")
## [1] " x y" " ← →"
## [3] " \t\n\r\v\fx \t\n\r\v\fy" NA
The stringi approach: stri_replace_all_charclass and stri_trim
stringi is built upon the platform-independent ICU library, and has an extensive set of string manipulation functions. The equivalents of the above are:
library(stringi)
stri_replace_all_fixed(x, " ", "")
stri_replace_all_charclass(x, "\\p{WHITE_SPACE}", "")
Here "\\p{WHITE_SPACE}" is an alternate syntax for the set of Unicode code points considered to be whitespace, equivalent to "[[:space:]]", "\\s" and space(). For more complex regular expression replacements, there is also stri_replace_all_regex.
stringi also has trim functions.
stri_trim(x)
stri_trim_both(x) # same
stri_trim(x, "left")
stri_trim_left(x) # same
stri_trim(x, "right")
stri_trim_right(x) # same
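For readers coming from Python, the both/left/right trim variants correspond roughly to str.strip, str.lstrip and str.rstrip. The sketch below is an analogue only; the sample list and the trim helper are made up for illustration, with None standing in for NA.

```python
x = [" x y ", "\t x \t y \n", None]


def trim(values, side="both"):
    """Trim whitespace on the given side of each string; None passes through."""
    method = {"both": str.strip, "left": str.lstrip, "right": str.rstrip}[side]
    return [method(v) if v is not None else None for v in values]


print(trim(x))           # ['x y', 'x \t y', None]
print(trim(x, "left"))   # ['x y ', 'x \t y \n', None]
print(trim(x, "right"))  # [' x y', '\t x \t y', None]
```

As with the R functions, whitespace between non-whitespace characters is left alone.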
How to remove all whitespace from string?
This piece of code helped figure out exactly what kind of whitespace was present in the original query that had the join issue:
select distinct
fieldname,
space = iif(charindex(char(32), fieldname) > 0, 1, 0),
horizontal_tab = iif(charindex(char(9), fieldname) > 0, 1, 0),
vertical_tab = iif(charindex(char(11), fieldname) > 0, 1, 0),
backspace = iif(charindex(char(8), fieldname) > 0, 1, 0),
carriage_return = iif(charindex(char(13), fieldname) > 0, 1, 0),
newline = iif(charindex(char(10), fieldname) > 0, 1, 0),
formfeed = iif(charindex(char(12), fieldname) > 0, 1, 0),
nonbreakingspace = iif(charindex(char(160), fieldname) > 0, 1, 0) -- NBSP is char(160) in code page 1252
from tablename;
It turned out there were carriage returns and line feeds in the data of one of the tables. So, using @scsimon's solution, the problem was resolved by changing the join to this:
on REPLACE(REPLACE(a.fieldname, CHAR(10), ''), CHAR(13), '') = b.fieldname
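The same kind of diagnosis can be done outside the database once a suspect value has been pulled out. Below is a minimal Python sketch; the function name whitespace_report and the sample value are hypothetical. It counts which whitespace characters occur in a string and then applies the same CR/LF removal as the join above.

```python
import unicodedata


def whitespace_report(text: str) -> dict:
    """Count each distinct whitespace character found in text."""
    found = {}
    for ch in text:
        if ch.isspace():
            # Control characters such as CR and LF have no Unicode name,
            # so fall back to the code point in U+XXXX form.
            name = unicodedata.name(ch, f"U+{ord(ch):04X}")
            found[name] = found.get(name, 0) + 1
    return found


value = "ab \r\ncd\u00a0"  # hypothetical field value
print(whitespace_report(value))

# Equivalent of REPLACE(REPLACE(fieldname, CHAR(10), ''), CHAR(13), '')
cleaned = value.replace("\n", "").replace("\r", "")
print(repr(cleaned))
```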
Efficient way to remove ALL whitespace from String?
This is the fastest way I know of, even though you said you didn't want to use regular expressions:
Regex.Replace(XML, @"\s+", "");
Crediting @hypehuman in the comments, if you plan to do this more than once, create and store a Regex instance. This will save the overhead of constructing it every time, which is more expensive than you might think.
private static readonly Regex sWhitespace = new Regex(@"\s+");
public static string ReplaceWhitespace(string input, string replacement)
{
    return sWhitespace.Replace(input, replacement);
}
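The store-the-compiled-pattern idea is not specific to .NET. A rough Python counterpart, with names chosen here for illustration, might look like this:

```python
import re

# Compiled once at import time and reused on every call.
_WHITESPACE = re.compile(r"\s+")


def replace_whitespace(text: str, replacement: str = "") -> str:
    """Replace each run of whitespace in text with replacement."""
    return _WHITESPACE.sub(replacement, text)


print(replace_whitespace("abc \t def\r\n789"))       # abcdef789
print(replace_whitespace("abc \t def\r\n789", "_"))  # abc_def_789
```

One caveat: Pattern.sub treats backslashes in the replacement string as escapes, so a replacement containing a backslash needs to be escaped or supplied through a function.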
Strip all whitespace from a string
Here are some benchmarks on a few different methods for stripping all whitespace characters from a string (source data):
BenchmarkSpaceMap-8 2000 1100084 ns/op 221187 B/op 2 allocs/op
BenchmarkSpaceFieldsJoin-8 1000 2235073 ns/op 2299520 B/op 20 allocs/op
BenchmarkSpaceStringsBuilder-8 2000 932298 ns/op 122880 B/op 1 allocs/op
SpaceMap: uses strings.Map; gradually increases the amount of allocated space as more non-whitespace characters are encountered
SpaceFieldsJoin: uses strings.Fields and strings.Join; generates a lot of intermediate data
SpaceStringsBuilder: uses strings.Builder; performs a single allocation, but may grossly overallocate if the source string is mainly whitespace.
package main_test

import (
	"strings"
	"testing"
	"unicode"
)

// NOTE: data (the benchmark input string) comes from the linked source data
// and is not defined in this snippet.

func SpaceMap(str string) string {
	return strings.Map(func(r rune) rune {
		if unicode.IsSpace(r) {
			return -1 // a negative value tells strings.Map to drop the rune
		}
		return r
	}, str)
}

func SpaceFieldsJoin(str string) string {
	return strings.Join(strings.Fields(str), "")
}

func SpaceStringsBuilder(str string) string {
	var b strings.Builder
	b.Grow(len(str)) // reserve room for the worst case: no whitespace at all
	for _, ch := range str {
		if !unicode.IsSpace(ch) {
			b.WriteRune(ch)
		}
	}
	return b.String()
}

func BenchmarkSpaceMap(b *testing.B) {
	for n := 0; n < b.N; n++ {
		SpaceMap(data)
	}
}

func BenchmarkSpaceFieldsJoin(b *testing.B) {
	for n := 0; n < b.N; n++ {
		SpaceFieldsJoin(data)
	}
}

func BenchmarkSpaceStringsBuilder(b *testing.B) {
	for n := 0; n < b.N; n++ {
		SpaceStringsBuilder(data)
	}
}