Memory Efficiency and Performance of String.Replace .NET Framework

All characters in a .NET string are Unicode chars. Do you mean they're non-ASCII? That shouldn't make any difference, unless you run into composition issues, e.g. an "e + acute accent" pair not being replaced when you try to replace a precomposed "e acute".
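
To make the composition issue concrete, here is a minimal sketch. The class and helper names are made up for illustration. String.Replace(string, string) compares ordinally, so a decomposed "e" followed by U+0301 is not matched by the precomposed U+00E9 unless both sides are normalized first, for example with String.Normalize:

```csharp
using System;
using System.Text;

static class CompositionDemo
{
    // Hypothetical helper: normalizes to form C (precomposed) before replacing,
    // so "e" + U+0301 and U+00E9 are treated as the same text.
    public static string ReplaceNormalized(string input, string oldValue, string newValue)
    {
        return input.Normalize(NormalizationForm.FormC)
                    .Replace(oldValue.Normalize(NormalizationForm.FormC), newValue);
    }

    static void Main()
    {
        string decomposed = "cafe\u0301";  // 'e' followed by a combining acute accent
        string precomposed = "\u00E9";     // single precomposed character

        // Ordinal replace finds nothing because the code units differ.
        Console.WriteLine(decomposed.Replace(precomposed, "e") == decomposed); // True

        // After normalization the accented character is found and replaced.
        Console.WriteLine(ReplaceNormalized(decomposed, precomposed, "e")); // cafe
    }
}
```

Whether normalizing is appropriate depends on where the text comes from; if the input is already in a single normalization form, a plain Replace is enough.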

You could try using a regular expression with Regex.Replace, or StringBuilder.Replace. Here's sample code doing the same thing with both:

using System;
using System.Text;
using System.Text.RegularExpressions;

class Test
{
    static void Main(string[] args)
    {
        string original = "abcdefghijkl";

        Regex regex = new Regex("a|c|e|g|i|k", RegexOptions.Compiled);

        string removedByRegex = regex.Replace(original, "");
        string removedByStringBuilder = new StringBuilder(original)
            .Replace("a", "")
            .Replace("c", "")
            .Replace("e", "")
            .Replace("g", "")
            .Replace("i", "")
            .Replace("k", "")
            .ToString();

        Console.WriteLine(removedByRegex);
        Console.WriteLine(removedByStringBuilder);
    }
}

I wouldn't like to guess which is more efficient - you'd have to benchmark with your specific application. The regex way may be able to do it all in one pass, but that pass will be relatively CPU-intensive compared with each of the many replaces in StringBuilder.
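
As a starting point for that benchmark, here is a rough Stopwatch harness. It is only a sketch: the class name, the Time helper, the iteration count and the sample input are assumptions, and the input should be replaced with text typical of your application before drawing any conclusion.

```csharp
using System;
using System.Diagnostics;
using System.Text;
using System.Text.RegularExpressions;

static class ReplaceBenchmark
{
    static readonly Regex Pattern = new Regex("a|c|e|g|i|k", RegexOptions.Compiled);

    public static string ViaRegex(string input)
    {
        return Pattern.Replace(input, "");
    }

    public static string ViaStringBuilder(string input)
    {
        return new StringBuilder(input)
            .Replace("a", "").Replace("c", "").Replace("e", "")
            .Replace("g", "").Replace("i", "").Replace("k", "")
            .ToString();
    }

    // Runs f 'iterations' times and returns the elapsed milliseconds.
    public static long Time(Func<string, string> f, string input, int iterations)
    {
        f(input); // warm-up call so JIT and regex compilation are not timed
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) f(input);
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    static void Main()
    {
        string input = "abcdefghijkl"; // placeholder input
        Console.WriteLine("Regex:         {0} ms", Time(ViaRegex, input, 100000));
        Console.WriteLine("StringBuilder: {0} ms", Time(ViaStringBuilder, input, 100000));
    }
}
```

Run it in a Release build outside the debugger, and vary both the input length and the number of characters being removed, since the relative cost of the two approaches may shift with either.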

Replace {#Text} and {$Text} in a string, in a performant way

IndexOf vs Regex: tested using Stopwatch ticks over 100,000 iterations with a string of roughly 500 characters.

Method IndexOf

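// Removes {#Text} and {$Text} tokens one at a time, recursing until none remain.
// Note: assumes the string contains no stray "}" ahead of a token.
public static string Re(string str)
{
    int strSIndex = -1;
    int strEIndex = -1;

    // Find the start of a token: "{#" first, falling back to "{$".
    strSIndex = str.IndexOf("{#");
    if (strSIndex == -1) strSIndex = str.IndexOf("{$");
    if (strSIndex == -1) return str;

    strEIndex = str.IndexOf("}");
    if (strEIndex == -1) return str;

    // The first "}" precedes the "{#" found above, so it should close an earlier "{$" token.
    if (strEIndex < strSIndex)
    {
        strSIndex = str.IndexOf("{$");
        if (strSIndex == -1) return str;
    }

    // Cut the token out and process the remainder.
    str = str.Substring(0, strSIndex) + str.Substring(strEIndex + 1);

    return Re(str);
}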

Regex Method

Regex re = new Regex(@"\{(?:#|\$)(\w+)}", RegexOptions.Compiled);
str = re.Replace(str, "");

Results (few replaces):

Fn: IndexOf
Ticks: 1181967

Fn: Regex
Ticks: 1482261

Note that the regex was constructed with RegexOptions.Compiled before the timed iterations.

Results (lots of replaces):

Fn: Regex
Ticks: 19136772

Fn: IndexOf
Ticks: 37457111

Replace a character in a string without using the Replace function in .NET

You should have a look at some of the solutions discussed here:

Memory Efficiency and Performance of String.Replace .NET Framework

It mentions the use of Regex.Replace and StringBuilder.Replace.

An efficient solution to a String.Replace problem?

It doesn't help with the number of loops, but if you use a StringBuilder as an intermediate, its Replace method accepts the same (char, char) and (string, string) arguments as String.Replace.

Edit:

Not sure if it's faster, but you can use Regex.Replace with an evaluator delegate.

If you build a search regex with your keys:
(key1|key2|key3|key4...)

and then pass in the delegate to .Replace, you can return a lookup based on the Match's Value property.

// Lookup table from each key to its replacement text.
Dictionary<string, string> pairs = new Dictionary<string, string>();

public string ReplaceData(Match m)
{
    return pairs[m.Value];
}

...

pairs.Add("foo", "bar");
pairs.Add("homer", "simpson");
Regex r = new Regex("(?>foo|homer)");
MatchEvaluator myEval = new MatchEvaluator(ReplaceData);
string sOutput = r.Replace(sInput, myEval);

Performance char vs string

If you have two horses and want to know which is faster...

String replaceMe = new String('a', 10000000) +
                   new String('b', 10000000) +
                   new String('a', 10000000);

Stopwatch sw = new Stopwatch();

sw.Start();

// String replacement
if (replaceMe.Contains("a")) {
    replaceMe = replaceMe.Replace("a", "b");
}

// Char replacement (on .NET Framework, Contains('a') needs "using System.Linq;")
//if (replaceMe.Contains('a')) {
//    replaceMe = replaceMe.Replace('a', 'b');
//}

sw.Stop();

Console.Write(sw.ElapsedMilliseconds);

I got 60 ms for the char replacement and 500 ms for the string one (Core i5 3.2GHz, 64-bit, .NET 4.6). So

    replaceMe = replaceMe.Replace('a', 'b')

is about 8 times faster.

Which way of clearing a StringBuilder performs best?

Update: It turns out that you are using .NET 3.5, and Clear was added in .NET 4, so you should use Length = 0. Actually, I'd probably add an extension method named Clear to do this, since it is far more readable, in my view, than Length = 0.
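
Such an extension method could look roughly like this. The class names are arbitrary, and returning the builder is a choice made here to mirror the chaining behaviour of the built-in Clear in .NET 4:

```csharp
using System;
using System.Text;

public static class StringBuilderExtensions
{
    // Stand-in for StringBuilder.Clear on .NET 3.5. Returns the builder so
    // calls can be chained, as the built-in method in .NET 4 does.
    public static StringBuilder Clear(this StringBuilder builder)
    {
        builder.Length = 0;
        return builder;
    }
}

static class ClearDemo
{
    static void Main()
    {
        StringBuilder sb = new StringBuilder("stale contents");
        sb.Clear().Append("fresh");
        Console.WriteLine(sb); // prints "fresh"
    }
}
```

If the project later moves to .NET 4 or newer, the instance method takes precedence over the extension, so existing call sites keep compiling and the extension can simply be deleted.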


I would use none of those and instead call Clear.

Clear is a convenience method that is equivalent to setting the Length property of the current instance to 0 (zero).

I can't imagine that it's slower than any of your variants and I also can't imagine that clearing a StringBuilder instance could ever be a bottleneck. If there is a bottleneck anywhere it will be in the appending code.

If performance of clearing the object really is a bottleneck then you will need to time your code to know which variant is faster. There's never a real substitute for benchmarking when considering performance.

Using string + string + string vs using String.Replace

Your colleague is completely wrong.

He is mis-applying the fact that strings are immutable, and that appending two strings will create a third string object.

Your method (a + b + c) is the most efficient way to do this.

The compiler transforms your code into a single call to String.Concat (a fixed-arity overload for up to four operands, String.Concat(string[]) for more), which uses unsafe code to allocate a single buffer for all of the strings and copy them into the buffer.

His advice should be to use a StringBuilder when concatenating strings in a loop.

EDIT: String.Concat (which is equivalent to + concatenation, like your first example) is the fastest way to do this. Using a StringBuilder like in your edit will be slower, because it will need to resize the string during each Replace call.
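
To illustrate the distinction between the two cases, here is a small sketch. The class, method names and sample data are invented for the example: a fixed set of operands is joined in one expression, while a StringBuilder is reserved for the loop case.

```csharp
using System;
using System.Text;

static class ConcatDemo
{
    // A fixed number of operands: one expression, which the compiler turns
    // into a single String.Concat call.
    public static string Greeting(string first, string last)
    {
        return "Hello, " + first + " " + last + "!";
    }

    // An unknown number of pieces joined in a loop: StringBuilder avoids
    // allocating a new, longer string on every iteration.
    public static string JoinLines(string[] lines)
    {
        StringBuilder sb = new StringBuilder();
        foreach (string line in lines)
        {
            sb.Append(line).Append('\n');
        }
        return sb.ToString();
    }

    static void Main()
    {
        Console.WriteLine(Greeting("Ada", "Lovelace"));
        Console.Write(JoinLines(new[] { "one", "two", "three" }));
    }
}
```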


