Fastest Way to Remove White Spaces in String

Fastest way to remove white spaces in string

I would build a custom extension method using StringBuilder, like:

public static string ExceptChars(this string str, IEnumerable<char> toExclude)
{
StringBuilder sb = new StringBuilder(str.Length);
for (int i = 0; i < str.Length; i++)
{
char c = str[i];
if (!toExclude.Contains(c))
sb.Append(c);
}
return sb.ToString();
}

Usage:

var str = s.ExceptChars(new[] { ' ', '\t', '\n', '\r' });

or to be even faster:

var str = s.ExceptChars(new HashSet<char>(new[] { ' ', '\t', '\n', '\r' }));

With the hashset version, a string of 11 millions of chars takes less than 700 ms (and I'm in debug mode)

EDIT :

Previous code is generic and allows to exclude any char, but if you want to remove just blanks in the fastest possible way you can use:

public static string ExceptBlanks(this string str)
{
StringBuilder sb = new StringBuilder(str.Length);
for (int i = 0; i < str.Length; i++)
{
char c = str[i];
switch (c)
{
case '\r':
case '\n':
case '\t':
case ' ':
continue;
default:
sb.Append(c);
break;
}
}
return sb.ToString();
}

EDIT 2 :

as correctly pointed out in the comments, the correct way to remove all the blanks is using char.IsWhiteSpace method :

public static string ExceptBlanks(this string str)
{
StringBuilder sb = new StringBuilder(str.Length);
for (int i = 0; i < str.Length; i++)
{
char c = str[i];
if(!char.IsWhiteSpace(c))
sb.Append(c);
}
return sb.ToString();
}

Efficient way to remove ALL whitespace from String?

This is fastest way I know of, even though you said you didn't want to use regular expressions:

Regex.Replace(XML, @"\s+", "");

Crediting @hypehuman in the comments, if you plan to do this more than once, create and store a Regex instance. This will save the overhead of constructing it every time, which is more expensive than you might think.

private static readonly Regex sWhitespace = new Regex(@"\s+");
public static string ReplaceWhitespace(string input, string replacement)
{
return sWhitespace.Replace(input, replacement);
}

How to strip all whitespace from string

Taking advantage of str.split's behavior with no sep parameter:

>>> s = " \t foo \n bar "
>>> "".join(s.split())
'foobar'

If you just want to remove spaces instead of all whitespace:

>>> s.replace(" ", "")
'\tfoo\nbar'

Premature optimization

Even though efficiency isn't the primary goal—writing clear code is—here are some initial timings:

$ python -m timeit '"".join(" \t foo \n bar ".split())'
1000000 loops, best of 3: 1.38 usec per loop
$ python -m timeit -s 'import re' 're.sub(r"\s+", "", " \t foo \n bar ")'
100000 loops, best of 3: 15.6 usec per loop

Note the regex is cached, so it's not as slow as you'd imagine. Compiling it beforehand helps some, but would only matter in practice if you call this many times:

$ python -m timeit -s 'import re; e = re.compile(r"\s+")' 'e.sub("", " \t foo \n bar ")'
100000 loops, best of 3: 7.76 usec per loop

Even though re.sub is 11.3x slower, remember your bottlenecks are assuredly elsewhere. Most programs would not notice the difference between any of these 3 choices.

Python fastest way to remove multiple spaces in a string

Just a small rewrite of the suggestion up there, but just because something has a small fault doesn't mean you should assume it won't work.

You could easily do something like:

front_space = lambda x:x[0]==" "
trailing_space = lambda x:x[-1]==" "
" "*front_space(text)+' '.join(text.split())+" "*trailing_space(text)

Remove whitespaces inside a string in javascript

For space-character removal use

"hello world".replace(/\s/g, "");

for all white space use the suggestion by Rocket in the comments below!

Remove all whitespace in a string

If you want to remove leading and ending spaces, use str.strip():

>>> "  hello  apple  ".strip()
'hello apple'

If you want to remove all space characters, use str.replace() (NB this only removes the “normal” ASCII space character ' ' U+0020 but not any other whitespace):

>>> "  hello  apple  ".replace(" ", "")
'helloapple'

If you want to remove duplicated spaces, use str.split() followed by str.join():

>>> " ".join("  hello  apple  ".split())
'hello apple'

Remove ALL white spaces from text

You have to tell replace() to repeat the regex:

.replace(/ /g,'')

The g character makes it a "global" match, meaning it repeats the search through the entire string. Read about this, and other RegEx modifiers available in JavaScript here.

If you want to match all whitespace, and not just the literal space character, use \s instead:

.replace(/\s/g,'')

You can also use .replaceAll if you're using a sufficiently recent version of JavaScript, but there's not really any reason to for your specific use case, since catching all whitespace requires a regex, and when using a regex with .replaceAll, it must be global, so you just end up with extra typing:

.replaceAll(/\s/g,'')

Remove all whitespace from a string

If you want to modify the String, use retain. This is likely the fastest way when available.

fn remove_whitespace(s: &mut String) {
s.retain(|c| !c.is_whitespace());
}

If you cannot modify it because you still need it or only have a &str, then you can use filter and create a new String. This will, of course, have to allocate to make the String.

fn remove_whitespace(s: &str) -> String {
s.chars().filter(|c| !c.is_whitespace()).collect()
}

c# Fastest way to remove extra white spaces

The fastest way? Iterate over the string and build a second copy in a StringBuilder character by character, only copying one space for each group of spaces.

The easier to type Replace variants will create a bucket load of extra strings (or waste time building the regex DFA).

Edit with comparison results:

Using http://ideone.com/NV6EzU, with n=50 (had to reduce it on ideone because it took so long they had to kill my process), I get:

Regex: 7771ms.

Stringbuilder: 894ms.

Which is indeed as expected, Regex is horribly inefficient for something this simple.



Related Topics



Leave a reply



Submit