Reading Large Text Files With Streams in C#

Reading and writing very large text files in C#

Not sure how much this will improve performance, but opening and closing the output file for every line you want to write is certainly not a good idea.

Instead, open both files just once and write each line directly:

string strDataLine;
long lngCurrNumRows = 0;

using (StreamWriter file = new StreamWriter(strFullFileName, true, System.Text.Encoding.UTF8))
using (StreamReader srStreamRdr = new StreamReader(strFileName))
{
    while ((strDataLine = srStreamRdr.ReadLine()) != null)
    {
        lngCurrNumRows++;

        if (lngCurrNumRows > 1)
            file.WriteLine(strDataLine);
    }
}

You could also remove the check on lngCurrNumRows by simply making an initial read before entering the while loop:

strDataLine = srStreamRdr.ReadLine();
if (strDataLine != null)
{
    while ((strDataLine = srStreamRdr.ReadLine()) != null)
    {
        file.WriteLine(strDataLine);
    }
}

How to read a large (1 GB) txt file in .NET?

If you are using .NET 4.0, try MemoryMappedFile, which is a class designed for exactly this scenario.

Otherwise, you can use StreamReader.ReadLine.
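
A minimal sketch of the memory-mapped approach, assuming a hypothetical path large.txt; the OS pages the file in on demand rather than loading it all at once:

using System;
using System.IO;
using System.IO.MemoryMappedFiles;

class MmfExample
{
    static void Main()
    {
        // Map the file into virtual memory; nothing is read until it is accessed.
        using (var mmf = MemoryMappedFile.CreateFromFile(@"large.txt", FileMode.Open))
        using (var stream = mmf.CreateViewStream())
        using (var reader = new StreamReader(stream))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                // process line here
            }
        }
    }
}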

How to load large files with StreamReader in chunks?

You can use a BufferedStream to read the data in chunks; here is the code:

private void ReadFile(string filePath)
{
    const int MAX_BUFFER = 20971520; // 20MB chunk size read from the file
    byte[] buffer = new byte[MAX_BUFFER];
    int bytesRead;

    using (FileStream fs = File.Open(filePath, FileMode.Open, FileAccess.Read))
    using (BufferedStream bs = new BufferedStream(fs))
    {
        while ((bytesRead = bs.Read(buffer, 0, MAX_BUFFER)) != 0) // reading up to 20MB at a time
        {
            // buffer[0..bytesRead) contains the chunk data; process it here.
            // Read may return fewer bytes than MAX_BUFFER, so always use bytesRead.
            // Modify MAX_BUFFER above to change the chunk size.
        }
    }
}
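
One caveat worth a sketch (my addition, not part of the original answer): if the raw chunks need to become text, decoding each chunk independently with Encoding.UTF8.GetString can garble a multi-byte character that straddles a chunk boundary. A Decoder instance carries that partial state across calls (assumes the usual System.IO and System.Text imports):

private void ReadFileAsText(string filePath)
{
    const int MAX_BUFFER = 20971520; // 20MB chunks, as above
    byte[] buffer = new byte[MAX_BUFFER];
    char[] chars = new char[Encoding.UTF8.GetMaxCharCount(MAX_BUFFER)];
    Decoder decoder = Encoding.UTF8.GetDecoder(); // remembers split characters between chunks
    int bytesRead;

    using (FileStream fs = File.Open(filePath, FileMode.Open, FileAccess.Read))
    {
        while ((bytesRead = fs.Read(buffer, 0, MAX_BUFFER)) != 0)
        {
            int charCount = decoder.GetChars(buffer, 0, bytesRead, chars, 0);
            // chars[0..charCount) holds this chunk's text; process it here
        }
    }
}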

How to read big text files in C#?

You can try reading all the lines at once:

private void Button7_Click(object sender, EventArgs e)
{
    var openFileDialog = new OpenFileDialog();
    if (openFileDialog.ShowDialog() == DialogResult.OK)
        aFlistBoxEmail.DataSource = File.ReadAllLines(openFileDialog.FileName);
}

Reading very large text files, should I be incorporating async?

Your problem isn't synchronous versus asynchronous; it's that you're reading the entire file and storing parts of it in memory before you do anything with that data.

If you read each line, process it, and write the result to another file or database as you go, then StreamReader will let you process multi-gigabyte (or terabyte) files.

There's only a problem if you store portions of the file until you finish reading it; then you can run into memory issues (although you'd be surprised how large you can let Lists and Dictionaries get before you run out of memory).

What you need to do is save your processed data as soon as you can, and not keep it in memory (or keep as little in memory as possible).
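
A minimal sketch of that read-process-write loop, assuming hypothetical input/output paths and a Transform helper standing in for whatever per-line processing you need:

using (var reader = new StreamReader("input.txt"))   // hypothetical paths
using (var writer = new StreamWriter("output.txt"))
{
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        // Only one line is held in memory at a time; the result is written
        // out immediately instead of being accumulated. Transform() is a
        // hypothetical stand-in for your processing.
        writer.WriteLine(Transform(line));
    }
}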

With files that large you may need to keep your working set (your processing data) in a database; something like SQL Server Express or SQLite would do (but again, it depends on how large your working set gets).

Hope this helps. Don't hesitate to ask further questions in the comments or edit your original question; I'll update this answer if I can help in any way.

Update - Paging/Chunking

You need to read the text file in chunks of one page, and allow the user to scroll through the "pages" in the file. As the user scrolls you read and present them with the next page.

Now, there are a couple of things you can do to help yourself. Always keep about ten pages in memory; this keeps your app responsive if the user pages up or down a couple of pages very quickly. In the application's idle time (the Application.Idle event) you can read in the next few pages, again throwing away pages that are more than five pages before or after the current page.

Paging backwards is a problem, because you don't know where each line begins or ends in the file, and therefore you don't know where each page begins or ends. So for paging backwards, as you read down through the file, keep a list of offsets to the start of each page (Stream.Position); then you can quickly Seek to a given position and read the page in from there.
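
A minimal sketch of that offset bookkeeping, assuming a hypothetical fixed page size of 100 lines, UTF-8 text with Windows-style line endings, and the usual System.IO, System.Text, and System.Collections.Generic imports:

const int LinesPerPage = 100;           // assumption for illustration
var pageOffsets = new List<long> { 0 }; // byte offset where each page starts

using (var reader = new StreamReader("large.txt")) // hypothetical path
{
    long offset = 0;
    int linesRead = 0;
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        // Count bytes ourselves: StreamReader buffers ahead, so the underlying
        // stream's Position would overshoot the line we just read.
        offset += Encoding.UTF8.GetByteCount(line) + Environment.NewLine.Length;
        if (++linesRead % LinesPerPage == 0)
            pageOffsets.Add(offset);
    }
}

// Later, to jump straight to page n: Seek the stream to pageOffsets[n]
// and read LinesPerPage lines from there.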

If you need to allow the user to search through the file, then you pretty much read through the file line by line (remembering the page offsets as you go) looking for the text; when you find something, read in and present that page.

You can speed everything up by pre-processing the file into a database; there are grid controls that will work off a dynamic dataset (they will do the paging for you), and you get the benefit of built-in searching and filtering.

So, from a certain point of view, this is reading the file asynchronously, but only from the user's point of view. From a technical point of view, we tend to mean something else when we talk about doing something asynchronously in programming.

Reading large files backwards (from end to start) in C#

Try the following code. Note that the last line of the file could be blank; I wasn't sure of the best way to handle that case.

using System;
using System.IO;
using System.Text;

namespace GetFileReverse
{
    class Program
    {
        const string FILENAME = @"c:\temp\test.txt";

        static void Main(string[] args)
        {
            GetFileReverse getFileReverse = new GetFileReverse(FILENAME);
            string line;
            while ((line = getFileReverse.ReadLine()) != null)
            {
                Console.WriteLine(line);
            }
        }
    }

    public class GetFileReverse : IDisposable
    {
        const int BUFFER_SIZE = 1024;

        private FileStream stream { get; set; }
        private string data { get; set; }
        public bool SOF { get; set; } // start of file reached
        private long position { get; set; }

        public GetFileReverse(string filename)
        {
            // File.OpenRead throws on failure rather than returning null.
            stream = File.OpenRead(filename);
            position = stream.Seek(0, SeekOrigin.End); // start at the end of the file
            SOF = false;
            data = string.Empty;
        }

        // Reads the block of up to BUFFER_SIZE bytes that ends at the current
        // position, then rewinds so the next call reads the preceding block.
        private byte[] ReadStream()
        {
            byte[] bytes = null;
            int size = BUFFER_SIZE;
            if (position != 0)
            {
                bytes = new byte[BUFFER_SIZE];
                long oldPosition = position;
                if (position >= BUFFER_SIZE)
                {
                    position = stream.Seek(-1 * BUFFER_SIZE, SeekOrigin.Current);
                }
                else
                {
                    // Fewer than BUFFER_SIZE bytes remain before the start of the file.
                    position = stream.Seek(-1 * position, SeekOrigin.Current);
                    size = (int)(oldPosition - position);
                    bytes = new byte[size];
                }
                stream.Read(bytes, 0, size);
                stream.Seek(-1 * size, SeekOrigin.Current);
            }
            return bytes;
        }

        // Returns the last unread line, or null when the file is exhausted.
        // Note: decoding each block separately assumes no multi-byte UTF-8
        // character straddles a block boundary.
        public string ReadLine()
        {
            string line;
            while (!SOF && !data.Contains("\r\n"))
            {
                byte[] bytes = ReadStream();
                if (bytes != null)
                {
                    string temp = Encoding.UTF8.GetString(bytes);
                    data = data.Insert(0, temp);
                }
                SOF = position == 0;
            }

            int lastReturn = data.LastIndexOf("\r\n");
            if (lastReturn == -1)
            {
                if (data.Length > 0)
                {
                    line = data;
                    data = string.Empty;
                }
                else
                {
                    line = null;
                }
            }
            else
            {
                line = data.Substring(lastReturn + 2);
                data = data.Remove(lastReturn);
            }

            return line;
        }

        public void Close()
        {
            stream.Close();
        }

        public void Dispose()
        {
            stream.Dispose();
            data = string.Empty;
            position = -1;
        }
    }
}

Searching numerous very large (50GB+) txt files for text matching

Assuming your file has newlines, it's simple enough to use a stream with a good buffer size. FileStream and the like have an internal buffer, and the internal mechanism reads from disk as it needs to, allowing you to read an entire file without running into the fundamental .NET array size limit or loading the whole file into memory.

Note that any single allocation over 85 KB will end up on the Large Object Heap anyway, so you might want to be mindful of the buffer size one way or another.

using var sr = new StreamReader(
    new FileStream("SomeFileName",
        FileMode.Open,
        FileAccess.Read,
        FileShare.None,
        1024 * 1024, // some nasty buffer size that you have benchmarked for your system
        FileOptions.SequentialScan));

while (!sr.EndOfStream)
{
    if (sr.ReadLine().Contains("bob"))
        return true;
}

Note: the buffer size will be key to performance here. SSDs can take a larger size than old spinning-platter HDDs; determining the right size will require benchmarking.

How could I read a very large text file using StreamReader?

Firstly, the code you posted will only put the first line of the file into the TextBox. What you want is this:

using (var reader = new StreamReader(@"C:\Test.txt"))
{
    while (!reader.EndOfStream)
        textBox1.Text += reader.ReadLine();
}

Now as for the OutOfMemoryException: I haven't tested this, but have you tried the TextBox.AppendText method instead of using +=? The latter will certainly be allocating a ton of strings, most of which are going to be nearly the length of the entire file by the time you near the end of the file.

For all I know, AppendText does this as well, but its existence leads me to suspect it was put there to deal with this scenario. I could be wrong; like I said, I haven't tested it personally.
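
For reference, a minimal sketch of the suggested alternative, assuming the same textBox1 control as above:

using (var reader = new StreamReader(@"C:\Test.txt"))
{
    while (!reader.EndOfStream)
        textBox1.AppendText(reader.ReadLine()); // avoids rebuilding the whole string each iteration
}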


