String Output: Format or Concat in C#

String output: format or concat in C#?

Try this code.

It's a slightly modified version of your code.

  1. I removed Console.WriteLine as it's probably a few orders of magnitude slower than what I'm trying to measure.
  2. I'm starting the Stopwatch before the loop and stopping it right after it, so I don't lose precision if a single call takes, for example, 26.4 ticks to execute.
  3. Dividing the elapsed time by the number of iterations was wrong. With integer division, both 1,000 milliseconds and 100 milliseconds give 0 ms once you divide by 1,000,000.

Code:

using System;
using System.Diagnostics;
using System.Threading;

var p = new { FirstName = "Bill", LastName = "Gates" };

const int n = 1000000;
string result;

// Time n concatenations; the Stopwatch wraps the whole loop.
Stopwatch s = Stopwatch.StartNew();
for (var i = 0; i < n; i++)
    result = p.FirstName + " " + p.LastName;
s.Stop();
long cElapsedMilliseconds = s.ElapsedMilliseconds;
long cElapsedTicks = s.ElapsedTicks;

// Time n string.Format calls the same way.
s.Restart();
for (var i = 0; i < n; i++)
    result = string.Format("{0} {1}", p.FirstName, p.LastName);
s.Stop();
long fElapsedMilliseconds = s.ElapsedMilliseconds;
long fElapsedTicks = s.ElapsedTicks;

Console.Clear();
Console.WriteLine(n + " x result = string.Format(\"{0} {1}\", p.FirstName, p.LastName); took: " + fElapsedMilliseconds + "ms - " + fElapsedTicks + " ticks");
Console.WriteLine(n + " x result = (p.FirstName + \" \" + p.LastName); took: " + cElapsedMilliseconds + "ms - " + cElapsedTicks + " ticks");
Thread.Sleep(4000);

These are my results:

1000000 x result = string.Format("{0} {1}", p.FirstName, p.LastName); took: 618ms - 2213706 ticks

1000000 x result = (p.FirstName + " " + p.LastName); took: 166ms - 595610 ticks
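The integer-division pitfall from point 3 is easy to check in isolation. Here is a minimal sketch; the class and method names (TimingMath, PerIterationMsTruncated, PerIterationNs) are made up for illustration and are not part of the original code. It also converts the totals above into a per-iteration figure.

```csharp
using System;

public static class TimingMath
{
    // Integer division: the fractional part is simply discarded.
    public static long PerIterationMsTruncated(long elapsedMs, int iterations)
        => elapsedMs / iterations;

    // Scale to nanoseconds in floating point before dividing to keep the fraction.
    public static double PerIterationNs(long elapsedMs, int iterations)
        => elapsedMs * 1_000_000.0 / iterations;

    public static void Main()
    {
        const int n = 1_000_000;
        Console.WriteLine(PerIterationMsTruncated(1000, n)); // 0
        Console.WriteLine(PerIterationMsTruncated(100, n));  // 0
        Console.WriteLine(PerIterationNs(618, n));           // 618 (ns per string.Format call)
        Console.WriteLine(PerIterationNs(166, n));           // 166 (ns per concatenation)
    }
}
```

So on that run, string.Format cost roughly 618 ns per call and concatenation roughly 166 ns.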


When is it better to use String.Format vs string concatenation?

Before C# 6

To be honest, I think the first version is simpler - although I'd simplify it to:

xlsSheet.Write("C" + rowIndex, null, title);

I suspect other answers may talk about the performance hit, but to be honest it'll be minimal if present at all - and this concatenation version doesn't need to parse the format string.

Format strings are great for purposes of localisation etc, but in a case like this concatenation is simpler and works just as well.

With C# 6

String interpolation makes a lot of things simpler to read in C# 6. In this case, your second version becomes:

xlsSheet.Write($"C{rowIndex}", null, title);

which is probably the best option, IMO.

Concatenate multiple strings versus String.Format

It's largely a matter of style. However, consider more complex formatting. For example, suppose you want to format several values at once:

var s = String.Format("{0,10:N2}  {1,-20}  {2:P2}", val, description, (val/100));

Or ...

var s = val.ToString("N2").PadLeft(10) + "  " + string.Format("{0,-20}", description) + "  " + (val/100).ToString("P2");

I like the String.Format call there much better. It separates the formatting from the content in much the way that CSS separates the presentation from the HTML content. When I'm writing or examining code that formats output, I typically want to see the format at a glance. The actual values being formatted aren't relevant when I'm debugging a formatting issue.

With concatenation, I have to slog my way through the individual values to see what the format is for each item.
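To make the alignment syntax concrete, here is a small sketch of what the {index,alignment:format} pieces do. The class name, method names, and sample values are assumptions for illustration, and CultureInfo.InvariantCulture is passed so the digits come out the same on every machine.

```csharp
using System;
using System.Globalization;

public static class AlignmentDemo
{
    // {0,10:N2}: two decimals with group separators, right-aligned in 10 columns.
    public static string FormatAmount(double val)
        => string.Format(CultureInfo.InvariantCulture, "{0,10:N2}", val);

    // {0,-20}: a negative alignment left-aligns and pads on the right to 20 columns.
    public static string FormatLabel(string description)
        => string.Format(CultureInfo.InvariantCulture, "{0,-20}|", description);

    public static void Main()
    {
        Console.WriteLine(FormatAmount(1234.5));  // "  1,234.50"
        Console.WriteLine(FormatLabel("Widget")); // "Widget" padded to 20 columns, then "|"
    }
}
```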

Finally, does performance really matter here? Most programs spend precious little time formatting their output. Optimizing the formatting code seems like a huge waste of time. Write whatever is easier to maintain and prove correct.

Are there benefits to using string formatting versus string concatenation?

For me, the benefits of the string.Format variant are:

  • Improved readability
  • Easier translation

From a performance perspective, I have never done any measurements; it could well be that concatenation is faster than the string.Format variant.

String.Format vs string + string or StringBuilder?


  • The compiler will optimize as much string concatenation as it can, so for example strings that are just broken up for line break purposes can usually be optimized into a single string literal.
  • Concatenation with variables will get compiled into String.Concat.
  • StringBuilder can be a lot faster if you're doing several (more than 10 or so I guess) "modifications" to a string but it carries some extra overhead because it allocates more space than you need in its buffer and resizes its internal buffer when it needs to.

I personally use String.Format almost all of the time for two reasons:

  • It's a lot easier to maintain the format string than rearranging a bunch of variables.
  • String.Format takes an IFormatProvider which is passed to any IFormattable types embedded in the string (such as numeric types) so that you get appropriate numeric formatting for the specified culture and overall just more control over how values are formatted.

For example, since some cultures use a comma as a decimal point you would want to ensure with either StringBuilder or String.Format that you specify CultureInfo.InvariantCulture if you wanted to ensure that numbers were formatted the way you intend.
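A quick sketch of the culture point, with illustrative names (CultureDemo, Invariant, German) that are not from the original answer. It assumes the runtime has culture data for "de-DE" available; under invariant-globalization mode the German call may fail or fall back to invariant formatting.

```csharp
using System;
using System.Globalization;

public static class CultureDemo
{
    // InvariantCulture always uses '.' as the decimal separator.
    public static string Invariant(double value)
        => string.Format(CultureInfo.InvariantCulture, "{0:F2}", value);

    // German culture uses ',' as the decimal separator (assumes de-DE data is present).
    public static string German(double value)
        => string.Format(CultureInfo.GetCultureInfo("de-DE"), "{0:F2}", value);

    public static void Main()
    {
        Console.WriteLine(Invariant(1.5)); // 1.50
        Console.WriteLine(German(1.5));    // 1,50 when de-DE culture data is available
    }
}
```

Without an explicit provider, string.Format uses the current thread's culture, so the same code can produce either form depending on the machine.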

Two more things to note:

  • StringBuilder also has an AppendFormat function which gives you the flexibility of String.Format without requiring an unnecessary second buffer.
  • When using StringBuilder, make sure you don't defeat the purpose by concatenating parameters that you pass to Append. It's an easy one to miss.
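Both notes can be seen in a few lines. This is only a sketch; BuilderDemo and BuildLine are hypothetical names, and the field layout is invented for the example.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class BuilderDemo
{
    public static string BuildLine(string name, int count)
    {
        var sb = new StringBuilder();

        // Avoid: sb.Append("Name: " + name + ", ");
        // That builds a temporary string first and defeats the purpose of the builder.
        sb.Append("Name: ").Append(name).Append(", ");

        // AppendFormat writes the formatted text into the builder's own buffer.
        sb.AppendFormat(CultureInfo.InvariantCulture, "Count: {0:D3}", count);
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(BuildLine("Bill", 7)); // Name: Bill, Count: 007
    }
}
```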

Memory fragmentation when concatenating or adding strings but not with string.Format?


So a professor in university just told me that using concatenation on strings in C# (i.e. when you use the plus sign operator) creates memory fragmentation, and that I should use string.Format instead.

No, what you should do instead is do user research, set user-focussed real-world performance metrics, and measure the performance of your program against those metrics. When, and only when you find a performance problem, you should use the appropriate profiling tools to determine the cause of the performance issue. If the cause is "memory fragmentation" then address that by identifying the causes of the "fragmentation" and trying experiments to determine what techniques mitigate the effect.

Performance is not achieved by "tips and tricks" like "avoid string concatenation". Performance is achieved by applying engineering discipline to realistic problems.

To address your more specific problem: I have never heard the advice to eschew concatenation in favor of formatting for performance reasons. The advice usually given is to eschew iterated concatenation in favor of builders. Iterated concatenation is quadratic in time and space and creates collection pressure. Builders allocate unnecessary memory but are linear in typical scenarios. Neither creates fragmentation of the managed heap; iterated concatenation tends to produce contiguous blocks of garbage.
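For anyone unsure what "iterated concatenation" versus "builders" looks like in practice, here is a minimal sketch. The names JoinDemo, ConcatInLoop, and BuildInLoop are made up for illustration, and it makes no timing claims; measure your own scenario.

```csharp
using System;
using System.Text;

public static class JoinDemo
{
    // Each += copies everything accumulated so far into a brand-new string,
    // so total copying grows quadratically and every old string becomes garbage.
    public static string ConcatInLoop(int count)
    {
        string result = "";
        for (int i = 0; i < count; i++)
            result += i + ",";
        return result;
    }

    // The builder appends into a growable buffer, so total work is roughly linear.
    public static string BuildInLoop(int count)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < count; i++)
            sb.Append(i).Append(',');
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(ConcatInLoop(5));                   // 0,1,2,3,4,
        Console.WriteLine(BuildInLoop(5) == ConcatInLoop(5)); // True
    }
}
```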

The number of times I've had a performance problem that came down to unnecessary fragmentation of a managed heap is exactly one; in an early version of Roslyn we had a pattern where we would allocate a small long lived object, then a small short lived object, then a small long lived object... several hundred thousand times in a row, and the resulting maximally fragmented heap caused user-impacting performance problems on collections; we determined this by careful measurement of the performance in the relevant scenarios, not by ad hoc analysis of the code from our comfortable chairs.

The usual advice is not to avoid fragmentation, but rather to avoid pressure. We found during the design of Roslyn that pressure was far more impactful on GC performance than fragmentation, once our aforementioned allocation pattern problem was fixed.

My advice to you is to either press your professor for an explanation, or to find a professor who has a more disciplined approach to performance metrics.

Now, all that said, you should use formatting instead of concatenation, but not for performance reasons. Rather, for code readability, localizability, and similar stylistic concerns. A format string can be made into a resource, it can be localized, and so on.

Finally, I caution you that if you are putting strings together in order to build something like a SQL query or a block of HTML to be served to a user, then you want to use none of these techniques. These applications of string building have serious security impacts when you get them wrong. Use libraries and tools specifically designed for construction of those objects, rather than rolling your own with strings.

String Interpolation vs String.Format

The answer is both yes and no. ReSharper is fooling you by not showing a third variant, which is also the most performant. The two listed variants produce equal IL code, but the following will indeed give a boost:

myString += $"{x.ToString("x2")}";

Full test code

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Diagnosers;
using BenchmarkDotNet.Diagnostics.Windows;
using BenchmarkDotNet.Running;

namespace StringFormatPerformanceTest
{
    [Config(typeof(Config))]
    public class StringTests
    {
        private class Config : ManualConfig
        {
            public Config() => AddDiagnoser(MemoryDiagnoser.Default, new EtwProfiler());
        }

        [Params(42, 1337)]
        public int Data;

        [Benchmark] public string Format() => string.Format("{0:x2}", Data);
        [Benchmark] public string Interpolate() => $"{Data:x2}";
        [Benchmark] public string InterpolateExplicit() => $"{Data.ToString("x2")}";
    }

    class Program
    {
        static void Main(string[] args)
        {
            var summary = BenchmarkRunner.Run<StringTests>();
        }
    }
}

Test results

|              Method | Data |      Mean |  Gen 0 | Allocated |
|-------------------- |----- |----------:|-------:|----------:|
|              Format |   42 | 118.03 ns | 0.0178 |      56 B |
|         Interpolate |   42 | 118.36 ns | 0.0178 |      56 B |
| InterpolateExplicit |   42 |  37.01 ns | 0.0102 |      32 B |
|              Format | 1337 | 117.46 ns | 0.0176 |      56 B |
|         Interpolate | 1337 | 113.86 ns | 0.0178 |      56 B |
| InterpolateExplicit | 1337 |  38.73 ns | 0.0102 |      32 B |

The InterpolateExplicit() method is faster because calling ToString("x2") ourselves hands the interpolation a finished string, so the int no longer has to be boxed to be passed as an object argument. Boxing is costly, and the allocation per call also drops from 56 B to 32 B.

Difference between concatenation and {}

Both look similar as long as you use string types. Once you deal with different types, you will see the difference between concatenation and composite formatting.

int myInt = 2;
Console.WriteLine("This is my int {0}", myInt);

Suppose now you want to put more types inside the composite formatting:

char myChar = 'c';
bool myBool = true;

Console.WriteLine("This is my bool {0} and myChar {1}", myBool, myChar);

But Concatenation is the process of appending one string to the end of another string. When you concatenate string literals or string constants by using the + operator, the compiler creates a single string. No run time concatenation occurs. However, string variables can be concatenated only at run time. In this case, you should understand the performance implications of the various approaches.
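One way to see the compile-time versus run-time distinction: a const initializer is only accepted if the whole expression is a compile-time constant. The sketch below uses made-up names (FoldingDemo, Folded, Greet) and is meant as an illustration, not a statement about how any particular compiler version emits IL.

```csharp
using System;

public static class FoldingDemo
{
    // Both operands are constants, so this is a constant expression:
    // the compiler accepts it in a const and stores the single string "Hello, world".
    public const string Folded = "Hello, " + "world";

    // 'name' is only known at run time, so this concatenation happens at run time.
    public static string Greet(string name) => "Hello, " + name;

    public static void Main()
    {
        Console.WriteLine(Folded);                   // Hello, world
        Console.WriteLine(Greet("world") == Folded); // True (same contents)
    }
}
```

Replacing one of the literals in Folded with a non-constant variable would make the const declaration fail to compile, which is a handy check of what the compiler can fold.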

Concat two strings when initialising class object without providing the concatenated string

FullName is a computed value rather than stored data, so it should have only a getter:

public string FullName
{
    get { return string.Format("{0} {1}", FirstName, LastName); }
}

Also, this is not Java: you can use the string keyword.

UPDATE

Please stop concatenating strings and start formatting them or building them with StringBuilder.