How to Write Super-Fast File-Streaming Code in C#

How to write super-fast file-streaming code in C#?

I don't believe there's anything within .NET to allow copying a section of a file without buffering it in memory. However, it strikes me that this is inefficient anyway, as it needs to open the input file and seek many times. If you're just splitting up the file, why not open the input file once, and then just write something like:

public static void CopySection(Stream input, string targetFile, int length)
{
    byte[] buffer = new byte[8192];

    using (Stream output = File.OpenWrite(targetFile))
    {
        int bytesRead = 1;
        // This will finish silently if we couldn't read "length" bytes.
        // An alternative would be to throw an exception.
        while (length > 0 && bytesRead > 0)
        {
            bytesRead = input.Read(buffer, 0, Math.Min(length, buffer.Length));
            output.Write(buffer, 0, bytesRead);
            length -= bytesRead;
        }
    }
}

This has a minor inefficiency in creating a buffer on each invocation - you might want to create the buffer once and pass that into the method as well:

public static void CopySection(Stream input, string targetFile,
                               int length, byte[] buffer)
{
    using (Stream output = File.OpenWrite(targetFile))
    {
        int bytesRead = 1;
        // This will finish silently if we couldn't read "length" bytes.
        // An alternative would be to throw an exception.
        while (length > 0 && bytesRead > 0)
        {
            bytesRead = input.Read(buffer, 0, Math.Min(length, buffer.Length));
            output.Write(buffer, 0, bytesRead);
            length -= bytesRead;
        }
    }
}

Note that this also closes the output stream (due to the using statement), which your original code didn't.

The important point is that this will use the operating system file buffering more efficiently, because you reuse the same input stream, instead of reopening the file at the beginning and then seeking.

I think it'll be significantly faster, but obviously you'll need to try it to see...

This assumes contiguous chunks, of course. If you need to skip bits of the file, you can do that from outside the method. Also, if you're writing very small files, you may want to optimise for that situation too - the easiest way to do that would probably be to introduce a BufferedStream wrapping the input stream.
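For example, a driver that splits a file into fixed-size contiguous chunks might look like the sketch below. The input file name and the chunk size are made up for illustration; it opens the input once and reuses one buffer across all sections, which is the whole point of the method above.

// Hypothetical driver: "input.dat" and the 1 MB chunk size are illustrative.
byte[] buffer = new byte[8192];
using (Stream input = File.OpenRead("input.dat"))
{
    int chunkSize = 1024 * 1024; // 1 MB per piece, for illustration
    long remaining = input.Length;
    int index = 0;
    while (remaining > 0)
    {
        int sectionLength = (int)Math.Min(chunkSize, remaining);
        CopySection(input, "chunk" + index + ".dat", sectionLength, buffer);
        remaining -= sectionLength;
        index++;
    }
}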

How to write a super-fast ASCII file in C# if it contains 100,000 lines of data?

What happens when you do this:

Record[] rec = new Record[100000];
Class1 cl = new Class1();
Random random = new Random();

for (int i = 0; i < 100000; i++)
{
    rec[i].num1 = random.Next();
    rec[i].var_set = cl.generateRandomString(2);
}

using (StreamWriter writer = new StreamWriter("important.txt", true))
{
    for (int i = 0; i < 100000; i++)
    {
        writer.Write(rec[i].name);
        writer.Write(" ");
        writer.Write(rec[i].var_set);
        writer.Write(" ");
        writer.Write(rec[i].num1);
        writer.Write(" ");
        writer.Write(rec[i].num2);
        writer.Write(" ");
        writer.Write(rec[i].mult);
        writer.Write(" ");
        writer.WriteLine(rec[i].rel);
    }
}

EDIT - another option:

Record[] rec = new Record[100000];
Class1 cl = new Class1();
Random random = new Random();

for (int i = 0; i < 100000; i++)
{
    rec[i].num1 = random.Next();
    rec[i].var_set = cl.generateRandomString(2);
}

File.WriteAllLines("important.txt",
    (from r in rec
     select r.name + " " + r.var_set + " " + r.num1 + " " +
            r.num2 + " " + r.mult + " " + r.rel).ToArray());

Is there a faster way to read data with a FileStream?

Well, we are working with buffered IO, so iterating byte by byte isn't that bad. But reading the data into a buffer once (if you can) is always faster: one IO call instead of many. Below I used your code first; I had to add a Seek(0) in the loop to reset the stream for each iteration.

In the next block I read all the data in at once and iterate it using AsSpan(), which is the new, fast way to iterate over an array.

using System;
using System.Diagnostics;
using System.IO;

namespace test_con
{
    class Program
    {
        static void Main(string[] args)
        {
            makedata();
            var filePath = "data.dat";
            var loop_cnt = 5000;
            using FileStream fs = new FileStream(filePath, FileMode.Open);
            bool[] buffer = new bool[fs.Length];

            Stopwatch sw = new Stopwatch();
            sw.Start();

            // Byte-by-byte iteration over the (buffered) stream.
            for (int r = 0; r < loop_cnt; r++)
            {
                int stackable = 0;
                int counter = 0;
                while ((stackable = fs.ReadByte()) != -1)
                {
                    buffer[counter] = (stackable == 1);
                    counter++;
                }
                fs.Seek(0, SeekOrigin.Begin);
            }

            Console.WriteLine($"avg iteration: {sw.Elapsed.TotalMilliseconds / loop_cnt}");

            var byte_buf = new byte[fs.Length];
            sw.Restart();

            // One read into a byte array, then iterate the array as a span.
            for (int r = 0; r < loop_cnt; r++)
            {
                fs.Seek(0, SeekOrigin.Begin);
                fs.Read(byte_buf);
                int counter = 0;
                foreach (var b in byte_buf.AsSpan())
                {
                    buffer[counter] = (b == 1);
                    counter++;
                }
            }

            Console.WriteLine($"buf avg iteration: {sw.Elapsed.TotalMilliseconds / loop_cnt}");
        }

        static void makedata()
        {
            var filePath = "data.dat";
            if (!File.Exists(filePath))
            {
                Random rnd = new Random();

                using FileStream fs = new FileStream(filePath, FileMode.CreateNew);
                for (int n = 0; n < 100000; n++)
                {
                    // Write a random mix of 0s and 1s.
                    // (The original used "rnd.Next() % 1 == 1", which is
                    // always false and would write only 1s.)
                    if (rnd.Next() % 2 == 0)
                        fs.WriteByte(0);
                    else
                        fs.WriteByte(1);
                }
            }
        }
    }
}

The output on my 2012 MacBook is:

avg iteration: 1.01832286
buf avg iteration: 0.6913623999999999

So the buffered iteration takes only about 70% of the time of the byte-by-byte stream iteration.
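If you don't need to keep the stream around at all, a single File.ReadAllBytes call does the one-IO read even more simply. A minimal sketch, reusing the same "data.dat" file from above:

// Sketch: one call reads the whole file, then convert to bools in memory.
byte[] data = File.ReadAllBytes("data.dat");
bool[] flags = new bool[data.Length];
for (int i = 0; i < data.Length; i++)
    flags[i] = (data[i] == 1);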

Alternative/faster method to slow StreamWriter to print strings to a file in C#

I profiled this and it looks like it is completely the opposite. I was able to get about 0.25 GB/s written to a standard 10K RPM drive (no SSD). It looks like you're calling this function a lot and opening a fresh connection to the file each time. Try something like this (I snipped it together quickly from a piece of old console-logging code, so it might be a bit buggy, and the error handling is certainly not complete):

public static class LogWriter
{
    // We keep a static reference to the StreamWriter so the stream stays open.
    // This could be closed when not needed, but each open() takes resources.
    private static StreamWriter writer = null;
    private static string LogFilePath = null;

    public static void Init(string FilePath)
    {
        LogFilePath = FilePath;
    }

    public static void WriteLine(string LogText)
    {
        // Create a writer if one does not exist.
        if (writer == null)
        {
            writer = new StreamWriter(File.Open(LogFilePath, FileMode.OpenOrCreate,
                FileAccess.Write, FileShare.ReadWrite));
        }
        try
        {
            // Do the actual work.
            writer.WriteLine(LogText);
        }
        catch (Exception)
        {
            // Very simplified exception logic... you might want to expand this.
            // Null the reference so the next call reopens the file instead of
            // writing to a disposed writer.
            if (writer != null)
            {
                writer.Dispose();
                writer = null;
            }
        }
    }

    // Make sure you call this before you end.
    public static void Close()
    {
        if (writer != null)
        {
            writer.Dispose();
            writer = null;
        }
    }
}
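Usage is then just a matter of initialising once, writing as often as needed, and closing at the end. A minimal sketch; the path and messages are made up:

// Minimal usage sketch (the log path is illustrative).
LogWriter.Init(@"C:\temp\app.log");
for (int i = 0; i < 100000; i++)
{
    LogWriter.WriteLine("log entry " + i);
}
LogWriter.Close(); // flush and release the file handle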

Reading and writing very large text files in C#

Not sure how much this will improve the performance, but surely opening and closing the output file for every line you want to write is not a good idea.

Instead, open both files just once and then write each line directly:

using (StreamWriter file = new StreamWriter(strFullFileName, true, System.Text.Encoding.UTF8))
using (StreamReader srStreamRdr = new StreamReader(strFileName))
{
    while ((strDataLine = srStreamRdr.ReadLine()) != null)
    {
        lngCurrNumRows++;

        // Skip the first row.
        if (lngCurrNumRows > 1)
            file.WriteLine(strDataLine);
    }
}

You could also remove the check on lngCurrNumRows by simply making one throwaway read before entering the while loop (the first ReadLine consumes the row to skip):

strDataLine = srStreamRdr.ReadLine();
if (strDataLine != null)
{
    while ((strDataLine = srStreamRdr.ReadLine()) != null)
    {
        file.WriteLine(strDataLine);
    }
}

How to write a 1GB file efficiently in C#

You may get some speedup by using PLINQ to do the work in parallel. Also, switching from a list to a hash set will greatly speed up the Contains() check; HashSet<T> is thread-safe for read-only operations.

private HashSet<string> _hshLineToRemove;

void ProcessFiles()
{
    var inputLines = File.ReadLines(_inputFileName);
    var filteredInputLines = inputLines.AsParallel().AsOrdered()
        .Where(line => !_hshLineToRemove.Contains(line));
    File.WriteAllLines(_outputFileName, filteredInputLines);
}

If it does not matter whether the output file is in the same order as the input file, you can remove the .AsOrdered() and get some additional speed.

Beyond this, you are really just I/O bound; the only way to make it any faster is to run it on faster drives.
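For completeness, here is one way the set could be populated; the file name is an assumption, since the original code doesn't show where the lines to remove come from:

// Sketch: "linesToRemove.txt" is an assumed source for the lines to filter out.
// HashSet<string>.Contains is O(1), versus O(n) for List<string>.Contains.
_hshLineToRemove = new HashSet<string>(File.ReadLines("linesToRemove.txt"));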


