C# FileStream: Optimal Buffer Size for Writing Large Files

File I/O with streams - best memory buffer size

Files are already buffered by the file system cache. You just need to pick a buffer size that doesn't force FileStream to make the native Windows ReadFile() API call too often to fill the buffer. Don't go below a kilobyte; more than 16 KB is a waste of memory and unfriendly to the CPU's L1 cache (which typically holds 16 or 32 KB of data).

4 KB is a traditional choice, even though it will exactly span a virtual memory page only by accident. Buffer size is difficult to profile: you'll end up measuring how long it takes to read a cached file, which runs at RAM speeds, 5 gigabytes/sec and up if the data is available in the cache. It will be in the cache the second time you run your test, but that won't happen often in a production environment. File I/O is completely dominated by the disk drive or the NIC, which are glacially slow by comparison; copying the data is peanuts. 4 KB will work fine.
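
In code, that just means passing the size to the FileStream constructor; a minimal sketch, with a hypothetical path:

using System.IO;

// bufferSize: 4096 sets FileStream's internal buffer; Read() only has to
// make the native ReadFile() call once per 4 KB of data consumed.
using (var stream = new FileStream(@"C:\temp\data.bin", FileMode.Open,
    FileAccess.Read, FileShare.Read, bufferSize: 4096))
{
    byte[] buffer = new byte[4096];
    int bytesRead;
    while ((bytesRead = stream.Read(buffer, 0, buffer.Length)) > 0)
    {
        // process the bytesRead bytes here
    }
}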

Why set a bufferSize in the FileStream ctor, if we set it later when reading?

Those are different buffers. One is the internal buffer of FileStream itself (whose size you pass to the constructor); the other is the caller's buffer (the one you pass to Read). They are not related.

Say you pass 4000 to the constructor as the internal buffer size and then call:

Read(buffer, 0, 100);

What will happen (in simplified terms, and assuming this is the first read from the stream) is that FileStream will read 4000 bytes from the file and store them in its internal buffer. Then it will write 100 bytes to the caller's buffer.

If you do

Read(buffer, 0, 8000);

It will read 4000 bytes from the file into the internal buffer, write 4000 bytes to the caller's buffer, then read the next 4000 bytes from the file into the internal buffer, and finally complete the write to the caller's buffer.

Why have that internal buffer? Because it's expensive to bother the file system for every small read. Say you read a FileStream byte by byte, 4000 times. It will bother the file system only once; the other 3999 reads are served from the internal buffer.
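
To illustrate that scenario (hypothetical file, same 4000-byte internal buffer as above):

// With a 4000-byte internal buffer, only the first ReadByte() call below
// touches the file system; the remaining 3999 are served from memory.
using (var stream = new FileStream("data.bin", FileMode.Open,
    FileAccess.Read, FileShare.Read, bufferSize: 4000))
{
    for (int i = 0; i < 4000; i++)
    {
        int b = stream.ReadByte();   // returns -1 at end of file
        if (b == -1) { break; }
    }
}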

Handling big file stream (read+write bytes)

It is better to stream the data from one file to the other, only loading small parts of it into memory:

public static void CopyFileSection(string inFile, string outFile, long startPosition, long size)
{
    // Open the files as streams
    using (var inStream = File.OpenRead(inFile))
    using (var outStream = File.OpenWrite(outFile))
    {
        // Seek to the start position
        inStream.Seek(startPosition, SeekOrigin.Begin);

        // Create a variable to track how much more to copy
        // and a buffer to temporarily store a section of the file
        long remaining = size;
        byte[] buffer = new byte[81920];

        do
        {
            // Read the smaller of 81920 or remaining, and break out of the loop
            // if we've already reached the end of the file
            int bytesRead = inStream.Read(buffer, 0, (int)Math.Min(buffer.Length, remaining));
            if (bytesRead == 0) { break; }

            // Write the buffered bytes to the output file
            outStream.Write(buffer, 0, bytesRead);
            remaining -= bytesRead;
        }
        while (remaining > 0);
    }
}

Usage:

CopyFileSection(sourcefile, outfile, offset, size);

This should be functionally equivalent to your current method, without the overhead of reading the entire file into memory regardless of its size.

Note: If you're doing this in code that uses async/await, you should change CopyFileSection to be public static async Task CopyFileSection and change inStream.Read and outStream.Write to await inStream.ReadAsync and await outStream.WriteAsync respectively.
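
In full, that async variant looks like this (a sketch; I've added the conventional Async suffix to the name):

public static async Task CopyFileSectionAsync(string inFile, string outFile, long startPosition, long size)
{
    // Note: for true overlapped IO you'd open the FileStreams with
    // useAsync: true (see the FileOptions.Asynchronous point further down)
    using (var inStream = File.OpenRead(inFile))
    using (var outStream = File.OpenWrite(outFile))
    {
        inStream.Seek(startPosition, SeekOrigin.Begin);

        long remaining = size;
        byte[] buffer = new byte[81920];

        do
        {
            int bytesRead = await inStream.ReadAsync(buffer, 0, (int)Math.Min(buffer.Length, remaining));
            if (bytesRead == 0) { break; }

            await outStream.WriteAsync(buffer, 0, bytesRead);
            remaining -= bytesRead;
        }
        while (remaining > 0);
    }
}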

What is the default buffer size for StreamWriter

Ah when documentation fails, decompile. I always forget that!

Well, don't do that. It isn't necessary anymore; you can look at the actual source code that the Microsoft programmers wrote. That's always better than decompiled code: it has comments.

Visit the Reference Source website. It was updated about a year ago; it now has a very slick browser interface that's actually faster than a decompiler. Just type StreamWriter in the search box. It takes you at most a dozen seconds to discover:

// For UTF-8, the values of 1K for the default buffer size and 4K for the
// file stream buffer size are reasonable & give very reasonable
// performance for in terms of construction time for the StreamWriter and
// write perf. Note that for UTF-8, we end up allocating a 4K byte buffer,
// which means we take advantage of adaptive buffering code.
// The performance using UnicodeEncoding is acceptable.
internal const int DefaultBufferSize = 1024;   // char[]
private const int DefaultFileStreamBufferSize = 4096;

So the default is 1024 characters for the StreamWriter. And if you write to a file instead of a stream, there's an underlying FileStream with a 4096-byte buffer; you can't change that one. It does expose a classic problem with comments: they have a knack for not being maintained and end up mismatching the code. The noodling about "adaptive buffering" isn't actually implemented. And a KiB is an animal with 1024 toes, never 1000.
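
If the defaults don't fit, you can control both buffers yourself by handing StreamWriter a FileStream you constructed; a minimal sketch (path and sizes are just examples):

using System.IO;
using System.Text;

// Outer buffer: the FileStream's byte buffer (the one a plain
// StreamWriter(path) constructor won't let you change).
// Inner buffer: the StreamWriter's character buffer, raised here to 8192.
using (var fileStream = new FileStream(@"C:\temp\log.txt", FileMode.Create,
    FileAccess.Write, FileShare.None, bufferSize: 4096))
using (var writer = new StreamWriter(fileStream, Encoding.UTF8, bufferSize: 8192))
{
    writer.WriteLine("hello");
}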

Determining buffer size when working with files in C#?

4 KB is a good choice. For more info, look at this:

File I/O with streams - best memory buffer size

Increase Speed for Streaming Large (1-10 GB) Files in .NET Core

  • Always use ConfigureAwait with await to specify thread synchronization for the async continuation.

    • Depending on the platform, omitting ConfigureAwait may default to synchronizing with the UI thread (WPF, WinForms) or with any thread (ASP.NET Core). If it's synchronizing with the UI thread inside your Stream copy operation, then it's no wonder performance takes a nose-dive.
    • If you're running code in a thread-synchronized context, then your await statements will be unnecessarily delayed because the program schedules the continuation to a thread that might be otherwise busy.
  • Use a buffer of at least a couple hundred KiB, or even a megabyte-sized buffer for async operations, not a typical 4 KiB or 80 KiB array.

    • This QA shows benchmarks that demonstrate that significantly larger buffers are necessary for async IO to have better performance than synchronous IO.
  • If you're using FileStream, ensure you used FileOptions.Asynchronous or useAsync: true, otherwise FileStream will fake its async operations by performing blocking IO on a thread-pool thread instead of using Windows' native async IO. (A sketch combining these points follows this list.)
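
Putting those points together, a sketch (the paths are placeholders and the 1 MiB size is just an example, not a measured optimum):

public static async Task CopyFileAsync(string sourcePath, string destinationPath)
{
    const int BufferSize = 1024 * 1024; // megabyte-sized, per the advice above

    // FileOptions.Asynchronous gives real async IO instead of thread-pool fakery
    using (var source = new FileStream(sourcePath, FileMode.Open, FileAccess.Read,
        FileShare.Read, BufferSize, FileOptions.Asynchronous | FileOptions.SequentialScan))
    using (var destination = new FileStream(destinationPath, FileMode.Create, FileAccess.Write,
        FileShare.None, BufferSize, FileOptions.Asynchronous))
    {
        await source.CopyToAsync(destination, BufferSize).ConfigureAwait(false);
    }
}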

With respect to your actual code: just use Stream.CopyToAsync instead of reimplementing it yourself. If you want progress reporting, consider subclassing Stream (as a proxy wrapper) instead.

Here's how I would write your code:

  1. First, add my ProxyStream class from this GitHub Gist to your project.
  2. Then subclass ProxyStream to add support for IProgress:
  3. Ensure any FileStream instances are created with FileOptions.Asynchronous | FileOptions.SequentialScan.
  4. Use CopyToAsync.
public class ProgressProxyStream : ProxyStream
{
    private readonly IProgress<(Int64 soFar, Int64? total)> progress;
    private readonly Int64? total;

    public ProgressProxyStream( Stream stream, IProgress<(Int64 soFar, Int64? total)> progress, Boolean leaveOpen )
        : base( stream, leaveOpen )
    {
        this.progress = progress ?? throw new ArgumentNullException(nameof(progress));
        this.total    = stream.CanSeek ? stream.Length : (Int64?)null;
    }

    public override Task<Int32> ReadAsync( Byte[] buffer, Int32 offset, Int32 count, CancellationToken cancellationToken )
    {
        // Report how far into the stream we are; fall back to the buffer
        // offset for non-seekable streams.
        this.progress.Report( ( this.Stream.CanSeek ? this.Stream.Position : offset, this.total ) );
        return this.Stream.ReadAsync( buffer, offset, count, cancellationToken );
    }
}
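
Usage would look something like this (a sketch assuming the ProxyStream base class from the Gist; Progress<T> marshals each callback to the captured SynchronizationContext):

var progress = new Progress<(Int64 soFar, Int64? total)>(
    p => Console.WriteLine("{0} / {1} bytes", p.soFar, p.total?.ToString() ?? "?"));

using (var source = new FileStream(sourcePath, FileMode.Open, FileAccess.Read,
    FileShare.Read, 81920, FileOptions.Asynchronous | FileOptions.SequentialScan))
using (var proxy = new ProgressProxyStream(source, progress, leaveOpen: false))
using (var destination = new FileStream(destinationPath, FileMode.Create, FileAccess.Write,
    FileShare.None, 81920, FileOptions.Asynchronous))
{
    await proxy.CopyToAsync(destination);
}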

If performance still suffers with the above ProgressProxyStream, then I'm willing to bet the bottleneck is inside the IProgress.Report callback target (which I assume is synchronised to a UI thread), in which case a better solution is to use a System.Threading.Channels.Channel<T> for the ProgressProxyStream (or even your own implementation of IProgress<T>) to dump progress reports into without blocking any other IO activity.
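
A sketch of that idea, with a channel-backed IProgress<T> whose Report never blocks (the names here are mine, not from the Gist):

using System;
using System.Threading.Channels;

public sealed class ChannelProgress<T> : IProgress<T>
{
    private readonly Channel<T> channel = Channel.CreateUnbounded<T>(
        new UnboundedChannelOptions { SingleReader = true });

    public ChannelReader<T> Reader => this.channel.Reader;

    // TryWrite on an unbounded channel always succeeds without blocking,
    // so the IO loop is never held up by a slow progress consumer.
    public void Report(T value) => this.channel.Writer.TryWrite(value);
}

The UI side can then drain the reports at its own pace with await foreach (var p in progress.Reader.ReadAllAsync()) { ... } without ever stalling the copy.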


