File I/O with streams - best memory buffer size
Files are already buffered by the file system cache. You just need to pick a buffer size that doesn't force FileStream to make the native Windows ReadFile() API call to fill the buffer too often. Don't go below a kilobyte; more than 16 KB is a waste of memory and unfriendly to the CPU's L1 cache (typically 16 or 32 KB of data).
4 KB is a traditional choice, even though it will span a virtual memory page exactly only by accident. This is difficult to profile; you'll end up measuring how long it takes to read a cached file, which runs at RAM speeds, 5 gigabytes/sec and up if the data is available in the cache. It will be in the cache the second time you run your test, but that won't happen very often in a production environment. Real file I/O is completely dominated by the disk drive or the NIC and is glacially slow; copying the data is peanuts. 4 KB will work fine.
Why set a bufferSize in the FileStream ctor, if we set it later when reading?
Those are different buffers. One is the internal buffer of the FileStream itself (whose size you pass to the constructor), and the other is the caller's buffer (the one you pass to Read). They are not related.
Say you pass 4000 to the constructor as the internal buffer size and then call:
Read(buffer, 0, 100);
What will happen (in simplified terms, and assuming this is the first read from the stream) is that FileStream will read 4000 bytes from the file and store them in its internal buffer. Then it will copy 100 bytes to the caller's buffer.
If you do
Read(buffer, 0, 8000)
it will read 4000 bytes from the file into the internal buffer, copy 4000 to the caller's buffer, then read the next 4000 bytes from the file into the internal buffer, then finish copying to the caller's buffer.
Why have that internal buffer? Because it's expensive to bother the file system for every small read. Say you read a FileStream byte by byte, 4000 times: it will bother the file system only once; the remaining 3999 reads are served from the internal buffer.
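To make the two buffers concrete, here is a small self-contained sketch. The 4000-byte internal buffer and 100-byte caller's buffer mirror the numbers above; the temp file exists only for the demo:

```csharp
using System;
using System.IO;

public static class TwoBuffersDemo
{
    public static int ReadFirstChunk()
    {
        string path = Path.GetTempFileName();
        try
        {
            File.WriteAllBytes(path, new byte[10000]); // 10,000 bytes of zeroes

            // Internal buffer: 4000 bytes, chosen via the constructor.
            using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read,
                                           FileShare.Read, bufferSize: 4000))
            {
                // Caller's buffer: 100 bytes, passed to Read. The first call
                // fills the 4000-byte internal buffer from the file; the next
                // 39 calls of 100 bytes each would be served from that buffer
                // without touching the file system again.
                var callerBuffer = new byte[100];
                return fs.Read(callerBuffer, 0, callerBuffer.Length);
            }
        }
        finally
        {
            File.Delete(path);
        }
    }

    public static void Main() => Console.WriteLine(ReadFirstChunk()); // prints 100
}
```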
Handling big file stream (read+write bytes)
It is better to stream the data from one file to the other, only loading small parts of it into memory:
public static void CopyFileSection(string inFile, string outFile, long startPosition, long size)
{
    // Open the files as streams
    using (var inStream = File.OpenRead(inFile))
    using (var outStream = File.OpenWrite(outFile))
    {
        // Seek to the start position
        inStream.Seek(startPosition, SeekOrigin.Begin);

        // Track how much is left to copy, and allocate a buffer
        // to temporarily hold a section of the file
        long remaining = size;
        byte[] buffer = new byte[81920];
        do
        {
            // Read the smaller of 81920 or remaining, and break out of
            // the loop if we've already reached the end of the file
            int bytesRead = inStream.Read(buffer, 0, (int)Math.Min(buffer.Length, remaining));
            if (bytesRead == 0) { break; }

            // Write the buffered bytes to the output file
            outStream.Write(buffer, 0, bytesRead);
            remaining -= bytesRead;
        }
        while (remaining > 0);
    }
}
Usage:
CopyFileSection(sourcefile, outfile, offset, size);
This should have equivalent functionality to your current method without the overhead of reading the entire file, regardless of its size, into memory.
Note: If you're doing this in code that uses async/await, you should change CopyFileSection to be public static async Task CopyFileSection, and change inStream.Read and outStream.Write to await inStream.ReadAsync and await outStream.WriteAsync respectively.
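Applying that note, an async variant of the method above might look like this. This is a sketch; the ConfigureAwait(false) calls are an optional extra that makes sense for library code:

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class FileCopierAsync
{
    public static async Task CopyFileSectionAsync(string inFile, string outFile, long startPosition, long size)
    {
        using (var inStream = File.OpenRead(inFile))
        using (var outStream = File.OpenWrite(outFile))
        {
            inStream.Seek(startPosition, SeekOrigin.Begin);

            long remaining = size;
            byte[] buffer = new byte[81920];
            do
            {
                // Same loop as the synchronous version, with awaited IO calls.
                int bytesRead = await inStream.ReadAsync(buffer, 0, (int)Math.Min(buffer.Length, remaining)).ConfigureAwait(false);
                if (bytesRead == 0) { break; }

                await outStream.WriteAsync(buffer, 0, bytesRead).ConfigureAwait(false);
                remaining -= bytesRead;
            }
            while (remaining > 0);
        }
    }
}
```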
What is the default buffer size for StreamWriter
Ah when documentation fails, decompile. I always forget that!
Well, don't do that. It isn't necessary anymore, you can look at the actual source code that the Microsoft programmers wrote. Always better than decompiled code, it has comments.
Visit the Reference Source website. It was updated about a year ago; it now has a very slick browser interface that's actually faster than a decompiler. Just type StreamWriter in the search box. It takes you at most a dozen seconds to discover:
// For UTF-8, the values of 1K for the default buffer size and 4K for the
// file stream buffer size are reasonable & give very reasonable
// performance for in terms of construction time for the StreamWriter and
// write perf. Note that for UTF-8, we end up allocating a 4K byte buffer,
// which means we take advantage of adaptive buffering code.
// The performance using UnicodeEncoding is acceptable.
internal const int DefaultBufferSize = 1024; // char[]
private const int DefaultFileStreamBufferSize = 4096;
So the default is 1024 characters for the StreamWriter. And if you write to a file instead of a stream, then there's a FileStream underneath with a 4096-byte buffer; you can't change that. It does expose a classic problem with comments: they have a knack for not being maintained and mismatching the code. The noodling about "adaptive buffering" isn't actually implemented. And a KiB is an animal with 1024 toes, never 1000.
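If those defaults matter for your workload, both buffer sizes can be chosen explicitly at construction time. A small sketch; the temp file and the 64 KiB figure are arbitrary choices for illustration:

```csharp
using System;
using System.IO;
using System.Text;

public static class WriterBufferDemo
{
    public static long WriteWithExplicitBuffers(string path)
    {
        // Pick the FileStream buffer yourself (here 4096, same as the
        // default), then give StreamWriter a 64 KiB char buffer on top,
        // which can help when batching many small writes.
        using (var fs = new FileStream(path, FileMode.Create, FileAccess.Write,
                                       FileShare.None, bufferSize: 4096))
        using (var writer = new StreamWriter(fs, Encoding.UTF8, bufferSize: 64 * 1024))
        {
            for (int i = 0; i < 1000; i++)
            {
                writer.WriteLine("line " + i);
            }
        }
        return new FileInfo(path).Length;
    }

    public static void Main()
    {
        string path = Path.GetTempFileName();
        Console.WriteLine(WriteWithExplicitBuffers(path) > 0); // prints True
        File.Delete(path);
    }
}
```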
Determining buffer size when working with files in C#?
4 KB is a good choice. For more info, look at this:
File I/O with streams - best memory buffer size
Greetings
Increase Speed for Streaming Large(1-10 gb) files .Net Core
- Always use ConfigureAwait with await to specify thread synchronization for the async continuation.
  - Depending on the platform, omitting ConfigureAwait may default to synchronizing with the UI thread (WPF, WinForms) or with any thread (ASP.NET Core). If it's synchronizing with the UI thread inside your Stream copy operation then it's no wonder performance takes a nose-dive.
  - If you're running code in a thread-synchronized context, then your await statements will be unnecessarily delayed because the program schedules the continuation to a thread that might be otherwise busy.
- Use a buffer sized at least a couple hundred KiB, or even a megabyte-sized buffer for async operations, not a typical 4 KiB or 80 KiB array.
  - This Q&A shows benchmarks demonstrating that significantly larger buffers are necessary for async IO to perform better than synchronous IO.
- If you're using FileStream, ensure you use FileOptions.Asynchronous or useAsync: true, otherwise FileStream will fake its async operations by performing blocking IO on a thread-pool thread instead of using Windows' native async IO.
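Those bullet points combined might look like the sketch below. The 1 MiB buffer is an arbitrary choice for illustration, and the helper name is made up:

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class AsyncCopyDemo
{
    public static async Task<long> CopyAsync(string src, string dst)
    {
        const int bufferSize = 1024 * 1024; // 1 MiB: large buffers pay off for async IO

        // FileOptions.Asynchronous gets real overlapped IO instead of
        // blocking a thread-pool thread; SequentialScan hints the OS cache.
        using (var input = new FileStream(src, FileMode.Open, FileAccess.Read, FileShare.Read,
                                          bufferSize, FileOptions.Asynchronous | FileOptions.SequentialScan))
        using (var output = new FileStream(dst, FileMode.Create, FileAccess.Write, FileShare.None,
                                           bufferSize, FileOptions.Asynchronous))
        {
            // Let the framework drive the copy loop.
            await input.CopyToAsync(output, bufferSize).ConfigureAwait(false);
        }
        return new FileInfo(dst).Length;
    }
}
```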
With respect to your actual code: just use Stream::CopyToAsync instead of reimplementing it yourself. If you want progress reporting then consider subclassing Stream (as a proxy wrapper) instead.
Here's how I would write your code:
- First, add my ProxyStream class from this GitHub Gist to your project.
- Then subclass ProxyStream to add support for IProgress:
- Ensure any FileStream instances are created with FileOptions.Asynchronous | FileOptions.SequentialScan.
- Use CopyToAsync.
public class ProgressProxyStream : ProxyStream
{
    private readonly IProgress<(Int64 soFar, Int64? total)> progress;
    private readonly Int64? total;
    private Int64 soFar;

    public ProgressProxyStream( Stream stream, IProgress<(Int64 soFar, Int64? total)> progress, Boolean leaveOpen )
        : base( stream, leaveOpen )
    {
        this.progress = progress ?? throw new ArgumentNullException(nameof(progress));
        this.total    = stream.CanSeek ? stream.Length : (Int64?)null;
    }

    public override async Task<Int32> ReadAsync( Byte[] buffer, Int32 offset, Int32 count, CancellationToken cancellationToken )
    {
        Int32 bytesRead = await this.Stream.ReadAsync( buffer, offset, count, cancellationToken ).ConfigureAwait(false);

        // Report the running total of bytes read, not the caller's buffer offset.
        this.soFar += bytesRead;
        this.progress.Report( ( this.soFar, this.total ) );
        return bytesRead;
    }
}
If performance still suffers with the above ProgressProxyStream, then I'm willing to bet the bottleneck is inside the IProgress.Report callback target (which I assume is synchronised to a UI thread), in which case a better solution is to use a System.Threading.Channels.Channel for the ProgressProxyStream (or even your own implementation of IProgress<T>) to dump progress reports into without blocking any other IO activity.
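A sketch of that Channel idea, assuming .NET Core 3.0+ (or the System.Threading.Channels package): the copy loop dumps byte counts into an unbounded channel and never blocks, while a separate consumer, standing in for the UI-side progress handler, drains it at its own pace:

```csharp
using System;
using System.Threading.Channels;
using System.Threading.Tasks;

public static class ChannelProgressDemo
{
    public static async Task<long> RunAsync()
    {
        // Unbounded: TryWrite always succeeds, so the IO loop never
        // waits on a slow progress consumer.
        var channel = Channel.CreateUnbounded<long>();

        // Consumer, standing in for the UI-side progress handler.
        Task<long> consumer = Task.Run(async () =>
        {
            long last = 0;
            await foreach (long soFar in channel.Reader.ReadAllAsync())
            {
                last = soFar; // in a real app: update the progress bar here
            }
            return last;
        });

        // Producer, standing in for the copy loop reporting bytes copied.
        for (long copied = 81920; copied <= 819200; copied += 81920)
        {
            channel.Writer.TryWrite(copied);
        }
        channel.Writer.Complete();

        return await consumer;
    }

    public static void Main() => Console.WriteLine(RunAsync().GetAwaiter().GetResult()); // prints 819200
}
```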