Best Way to Read a Large File into a Byte Array in C#

Best way to read a large file into a byte array in C#?

Simply replace the whole thing with:

return File.ReadAllBytes(fileName);

However, if you are concerned about memory consumption, you should not read the whole file into memory at once; read it in chunks instead. (File.ReadAllBytes also requires the whole file to fit in a single byte array, which caps out at 2 GB.)
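A minimal chunked-read sketch (the buffer size and the processing step are placeholders):

using (var stream = File.OpenRead(fileName))
{
    byte[] buffer = new byte[81920];
    int bytesRead;
    while ((bytesRead = stream.Read(buffer, 0, buffer.Length)) > 0)
    {
        // Process buffer[0..bytesRead) here instead of holding the whole file in memory
    }
}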

C# read blocks of data from file to byte array

FileStream would be a perfect choice for your case:

FileStream stream = new FileStream("Dir\File.dat", FileMode.Open, FileAccess.Read);         
byte[] block = new byte[16];
while (stream.Read(block, 0, 16) > 0) { //as long as this does not return 0, the data in the file hasn't been completely read
//Print/do anything you want with [block], your 16 bytes data are there
}

Its Read method returns 0 when there is no more data left; that is how you know the file has ended.
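For reference, each line of the sample output below can be produced inside the loop with something like this (the timestamp prefix comes from whatever logger you use; formatting via BitConverter.ToString is an assumption here):

Console.WriteLine("[{0:yyyy-MM-dd HH:mm:ss.fff} UTC] {1}",
    DateTime.UtcNow,
    BitConverter.ToString(block, 0, bytesRead).Replace("-", " "));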

Sample output (each byte converted to its hex string representation):

[2016-01-18 05:35:52.827 UTC] 89 50 4E 47 0D 0A 1A 0A 00 00 00 0D 49 48 44 52
[2016-01-18 05:35:52.829 UTC] 00 00 01 4E 00 00 00 51 08 02 00 00 00 32 C6 D8
[2016-01-18 05:35:52.829 UTC] C4 00 00 00 01 73 52 47 42 00 AE CE 1C E9 00 00
[2016-01-18 05:35:52.830 UTC] 00 04 67 41 4D 41 00 00 B1 8F 0B FC 61 05 00 00
[2016-01-18 05:35:52.830 UTC] 00 09 70 48 59 73 00 00 0E C3 00 00 0E C3 01 C7
[2016-01-18 05:35:52.830 UTC] 6F A8 64 00 00 02 F4 49 44 41 54 78 5E ED D7 3D
[2016-01-18 05:35:52.831 UTC] 4E 1B 41 18 80 E1 BD 13 92 25 EE E2 62 0B 0E 11
[2016-01-18 05:35:52.831 UTC] A5 A2 73 C5 29 28 4D C7 19 68 49 C9 09 E8 49 95
[2016-01-18 05:35:52.831 UTC] 22 45 32 FB EB D5 1A C4 2C C4 64 C7 DF F3 68 8A
[2016-01-18 05:35:52.832 UTC] F1 B7 B3 48 20 5E AF 5D BD 00 01 48 1D 42 90 3A
[2016-01-18 05:35:52.832 UTC] 84 20 75 08 41 EA 10 82 D4 21 04 A9 43 08 52 87
[2016-01-18 05:35:52.832 UTC] 10 A4 0E 21 48 1D 42 90 3A 84 20 75 08 41 EA 10
[2016-01-18 05:35:52.833 UTC] 82 D4 21 04 A9 43 08 52 87 10 A4 0E 21 48 1D 42
[2016-01-18 05:35:52.833 UTC] 90 3A 84 F0 7E EA D5 B7 A7 73 5D FD 6F 08 01 64
[2016-01-18 05:35:52.833 UTC] A5 FE F8 FC EB F9 E7 EF 33 5B 52 27 94 AC D4 67
[2016-01-18 05:35:52.833 UTC] 91 9C C7 92 3A A1 48 1D 42 90 3A 84 20 75 08 E1
[2016-01-18 05:35:52.834 UTC] 34 A9 FF B8 B9 D8 DC 3C CC 86 FF 7A DD D6 D5 C5
[2016-01-18 05:35:52.834 UTC] EE E3 6F 43 52 27 94 FF 97 FA 78 E6 63 EF 0B E9
[2016-01-18 05:35:52.834 UTC] AE 4A EA 90 AB D0 D4 9F AE 37 97 DB FA 52 EA 90
[2016-01-18 05:35:52.835 UTC] E9 74 A9 5F 6D 37 55 EB EA B6 9F 0C 3D 4F AF 1E
[2016-01-18 05:35:52.835 UTC] 36 43 F6 ED AB E9 5D D7 75 3F DA DE 0D 3F FF EE
[2016-01-18 05:35:52.835 UTC] AA AA EF 1F 76 E7 99 FA 77 B2 F5 7F 32 32 9C 2C
[2016-01-18 05:35:52.836 UTC] F5 A1 CC 14 64 CA F2 28 F5 9B 87 71 72 B8 94 9E
[2016-01-18 05:35:52.836 UTC] D5 47 77 8D 85 1F 8E DD 6F DB 37 82 33 4E FD 0F
[2016-01-18 05:35:52.836 UTC] 19 A4 BE C8 E9 3F C0 37 B9 5E DD CE 26 69 3F 4E
[2016-01-18 05:35:52.837 UTC] 8E 37 C7 67 26 57 C7 C2 A5 1E 5C 49 A9 3F EE 36
[2016-01-18 05:35:52.837 UTC] 9B DD 63 FF 62 E2 AD F9 D4 FC CC BE AE EA 7D BF
[2016-01-18 05:35:52.838 UTC] 5F E0 4B 52 7F 35 DA 71 72 BC 39 3E 73 B8 DA 3C
[2016-01-18 05:35:52.844 UTC] F9 A7 3E 5C BB D4 4B 17 2F F5 B4 EB FE EB 57 95
[2016-01-18 05:35:52.845 UTC] 7A F7 65 7B FC 28 3E 7C EA 6E AE A6 6F DA D3 8C
[2016-01-18 05:35:52.845 UTC] 0F 3D BF F6 01 7E 9E 7A BB 6F 97 A7 7A 70 9E EA
[2016-01-18 05:35:52.846 UTC] 8B 9C F0 BB 7A 6F E8 B3 A9 B7 9F 5C B6 D1 A6 F8
[2016-01-18 05:35:52.846 UTC] 93 D4 FF B8 99 DC D8 DD 25 75 DE 56 62 EA 29 D3
[2016-01-18 05:35:52.847 UTC] 51 D3 6B 33 AF EB D9 B3 3A 0D BB 41 D5 DE B3 EA
[2016-01-18 05:35:52.847 UTC] D4 4B 58 52 2F 5D D9 4F F5 6E D2 56 DD 85 DB 6C
[2016-01-18 05:35:52.848 UTC] 87 FA FB A3 E3 19 A9 7F 66 49 BD 74 45 A6 DE 04
[2016-01-18 05:35:52.848 UTC] 3D 98 65 DC 5C AA F7 D3 03 8D CD 6E 2F F5 CF 2D
[2016-01-18 05:35:52.849 UTC] A9 97 AE C0 D4 53 A5 43 B5 DD 64 96 7A 37 99 95
[2016-01-18 05:35:52.849 UTC] 3C 3D D3 90 FA C2 25 F5 D2 95 99 FA 50 69 57 FD
[2016-01-18 05:35:52.850 UTC] 24 EC 61 7B 34 EA EF 1D 49 7D E1 92 7A E9 0A 4C
[2016-01-18 05:35:52.851 UTC] BD 8D 77 D4 A7 3E 38 F4 9C 62 9E 8C BE 32 F5 73
[2016-01-18 05:35:52.851 UTC] 5D FD 6F B8 32 52 CF 54 52 EA 2B F0 7E EA 7C 31
[2016-01-18 05:35:52.852 UTC] A9 67 92 FA 22 52 5F 1D A9 67 92 FA 22 52 5F 1D
[2016-01-18 05:35:52.852 UTC] A9 67 92 FA 22 52 5F 1D A9 67 92 FA 22 52 5F 1D
[2016-01-18 05:35:52.853 UTC] A9 67 92 FA 22 52 5F 1D A9 67 92 FA 22 52 5F 1D
[2016-01-18 05:35:52.853 UTC] A9 67 92 FA 22 52 5F 9D F4 1F 4C A6 FE 4F 46 06
[2016-01-18 05:35:52.854 UTC] A9 43 08 52 87 10 A4 0E 21 48 1D 42 90 3A 84 20
[2016-01-18 05:35:52.855 UTC] 75 08 41 EA 10 82 D4 21 04 A9 43 08 52 87 10 A4
[2016-01-18 05:35:52.856 UTC] 0E 21 48 1D 42 90 3A 84 20 75 08 41 EA 10 82 D4
[2016-01-18 05:35:52.856 UTC] 21 04 A9 43 08 52 87 10 A4 0E 21 48 1D 42 90 3A
[2016-01-18 05:35:52.857 UTC] 84 20 75 08 41 EA 10 82 D4 21 04 A9 43 08 52 87
[2016-01-18 05:35:52.857 UTC] 10 A4 0E 21 48 1D 42 90 3A 84 20 75 08 41 EA 10
[2016-01-18 05:35:52.862 UTC] 82 D4 21 04 A9 43 00 2F 2F 7F 01 43 36 3E CD D7
[2016-01-18 05:35:52.862 UTC] C4 27 55 00 00 00 00 49 45 4E 44 AE 42 60 82 D7

Difficulty reading large file into byte array

With a StreamReader you can Seek on its BaseStream (place the "cursor") at any particular byte, so you can use that to walk the entire file's contents in reverse. Note that byte-wise seeking only lines up with character boundaries for single-byte encodings such as ASCII.

Example:

const int bufferSize = 1024;
string fileName = "yourfile.txt";

using (var myStream = new StreamReader(fileName))
{
    char[] buffer = new char[bufferSize];
    long position = myStream.BaseStream.Length;

    while (position > 0)
    {
        // Step back one buffer-length (or whatever is left) and read from there
        int toRead = (int)Math.Min(bufferSize, position);
        position -= toRead;
        myStream.BaseStream.Seek(position, SeekOrigin.Begin);
        myStream.DiscardBufferedData(); // keep the reader in sync after seeking the base stream
        int charsRead = myStream.Read(buffer, 0, toRead);
        // buffer[0..charsRead) now holds this block, walking backwards through the file
    }
}
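Since the goal is a byte array rather than characters, the same pattern is simpler on a raw FileStream (a sketch, assuming the same fileName and bufferSize):

using (var fs = File.OpenRead(fileName))
{
    byte[] block = new byte[bufferSize];
    long position = fs.Length;
    while (position > 0)
    {
        int toRead = (int)Math.Min(bufferSize, position);
        position -= toRead;
        fs.Seek(position, SeekOrigin.Begin);
        int bytesRead = fs.Read(block, 0, toRead);
        // block[0..bytesRead) holds this chunk, moving from the end of the file backwards
    }
}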

Read a large binary file (5GB) into a byte array in C#?

You cannot read a file that big into memory at once, so you have to either split the file into small portions and process each portion,

OR

read the file using a buffer, and once you are done with one buffer's worth of data, refill and reuse that buffer.

I faced the same issue, so I tried the buffer-based approach and it worked for me.

int Buffer_value = 1024;
byte[] Array_buffer = new byte[Buffer_value];
int bytesRead;

using (var inputTempFile = new FileStream(Path, FileMode.Open, FileAccess.Read))
{
    while ((bytesRead = inputTempFile.Read(Array_buffer, 0, Buffer_value)) > 0)
    {
        // Walk only the bytes actually read, four at a time
        for (int z = 0; z + 4 <= bytesRead; z += 4)
        {
            string temp_id = BitConverter.ToString(Array_buffer, z, 4); // e.g. "89-50-4E-47"
            string[] temp_strArrayID = temp_id.Split('-');
            string temp_ArraydataID = temp_strArrayID[0] + temp_strArrayID[1] + temp_strArrayID[2] + temp_strArrayID[3];
        }
    }
}

This way you can process your data.

In my case I was trying to store the buffered data in a List; that works fine up to about 2 GB of data, after which it throws an out-of-memory exception.

The approach I followed: read the data from the buffer, apply the needed filters, write the filtered data to a text file, and then process that file.

// Text file approach

int Buffer_value = 1024;
byte[] Array_buffer = new byte[Buffer_value];
int bytesRead;

using (var inputTempFile = new FileStream(Path, FileMode.Open, FileAccess.Read))
using (var writer = new StreamWriter(OutputPath, true)) // OutputPath: a different file than the one being read
{
    while ((bytesRead = inputTempFile.Read(Array_buffer, 0, Buffer_value)) > 0)
    {
        for (int z = 0; z + 4 <= bytesRead; z += 4)
        {
            string temp_id = BitConverter.ToString(Array_buffer, z, 4);
            string[] temp_strArrayID = temp_id.Split('-');
            string temp_ArraydataID = temp_strArrayID[0] + temp_strArrayID[1] + temp_strArrayID[2] + temp_strArrayID[3];
            if (temp_ArraydataID == "XYZ Condition")
            {
                writer.WriteLine(temp_ArraydataID);
            }
        }
    }
}

Read large file into byte array and encode it to ToBase64String

I would use two FileStreams - one to read the large file, one to write the result back out.

So, chunk by chunk, you would convert to Base64 ... then convert the resulting string to bytes ... and write. Use a chunk size that is a multiple of 3, so each chunk's Base64 output concatenates cleanly without mid-stream padding.

private static void ConvertLargeFileToBase64()
{
    // A multiple of 3 keeps '=' padding out of the middle of the output
    var buffer = new byte[3 * 4096];
    using (var fsIn = new FileStream("D:\\in.txt", FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
    using (var fsOut = new FileStream("D:\\out.txt", FileMode.CreateNew, FileAccess.Write))
    {
        int read;
        while ((read = fsIn.Read(buffer, 0, buffer.Length)) > 0)
        {
            // Encode only the bytes actually read, then get ASCII bytes for writing back to a file
            var b64 = Encoding.ASCII.GetBytes(Convert.ToBase64String(buffer, 0, read));

            // Write to the output FileStream
            fsOut.Write(b64, 0, b64.Length);
        }
    }
}
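Alternatively, the framework can do the chunking for you: wrapping the output stream in a CryptoStream with a ToBase64Transform (from System.Security.Cryptography) streams Base64 without any manual buffer math. A sketch, assuming the same paths:

using (var fsIn = File.OpenRead("D:\\in.txt"))
using (var fsOut = File.Create("D:\\out.txt"))
using (var base64Stream = new CryptoStream(fsOut, new ToBase64Transform(), CryptoStreamMode.Write))
{
    fsIn.CopyTo(base64Stream); // the transform handles the 3-byte grouping and final padding
}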

Best way to read text file into byte array in selected encoding?

Best way to read file into byte array in selected encoding?

Character Encoding is about storing text in binary form, as sequences of specific bytes for each character. Another way of thinking about it is that the Encoding system is what gives meaning to some bytes. Without the context that some bytes represents text, the bytes are just bytes.

Files are just bytes too, and they can be interpreted however you want your application to interpret them.

When you decode bytes, you are giving meaning to those bytes according to the encoding system used. For text encodings, you start with bytes and end up with characters.

You can't "decode" bytes from a file into a byte array. That doesn't give meaning to the bytes or produce any characters.

You can decode bytes into strings using a specific encoding though:

string allLinesFromFileAsAuto = File.ReadAllText(filename);                  // detects a BOM, otherwise assumes UTF-8
string allLinesFromFileAsUTF8 = File.ReadAllText(filename, Encoding.UTF8);
string allLinesFromFileAsASCII = File.ReadAllText(filename, Encoding.ASCII);

All three of these methods convert bytes from the same file into strings, but the resulting strings will be different depending on the encoding you use.
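For instance (a hypothetical two-byte file, to make the difference concrete):

byte[] raw = { 0xC3, 0xA9 }; // the UTF-8 encoding of 'é'

string asUtf8 = Encoding.UTF8.GetString(raw);                        // "é"
string asLatin1 = Encoding.GetEncoding("ISO-8859-1").GetString(raw); // "Ã©"
string asAscii = Encoding.ASCII.GetString(raw);                      // "??" (bytes above 0x7F are not ASCII)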

And what encoding does the File.ReadAllBytes(filename) method use?

File.ReadAllBytes(filename) does not use any encoding. Files are just bytes, and this method pulls all of a file's bytes into a byte array. You still have to decode those bytes into strings afterwards, and that decoding only makes sense for files that actually contain text.
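That decoding step is roughly (modulo byte-order-mark handling) what File.ReadAllText(filename, Encoding.UTF8) does for you:

byte[] raw = File.ReadAllBytes(filename);
string text = Encoding.UTF8.GetString(raw);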

I need utf-8 byte arrays to store files in db

Is this because your database uses UTF-8 encoding?

The encoding of a database defines how text is stored (as binary).
Binary data can be stored as-is, byte-for-byte, as "blobs" in most databases, regardless of the encoding.
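A minimal sketch of the two options (the choice of Encoding.ASCII for the source file is just an assumption for illustration):

// Option 1: the file is opaque binary - store the raw bytes in a blob column, no encoding involved
byte[] blob = File.ReadAllBytes(filename);

// Option 2: the column holds UTF-8 *text* - decode with the file's actual encoding, then re-encode as UTF-8
string text = File.ReadAllText(filename, Encoding.ASCII);
byte[] utf8 = Encoding.UTF8.GetBytes(text);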

Handling big file stream (read+write bytes)

It is better to stream the data from one file to the other, only loading small parts of it into memory:

public static void CopyFileSection(string inFile, string outFile, long startPosition, long size)
{
// Open the files as streams
using (var inStream = File.OpenRead(inFile))
using (var outStream = File.OpenWrite(outFile))
{
// seek to the start position
inStream.Seek(startPosition, SeekOrigin.Begin);

// Create a variable to track how much more to copy
// and a buffer to temporarily store a section of the file
long remaining = size;
byte[] buffer = new byte[81920];

do
{
// Read the smaller of 81920 or remaining and break out of the loop if we've already reached the end of the file
int bytesRead = inStream.Read(buffer, 0, (int)Math.Min(buffer.Length, remaining));
if (bytesRead == 0) { break; }

// Write the buffered bytes to the output file
outStream.Write(buffer, 0, bytesRead);
remaining -= bytesRead;
}
while (remaining > 0);
}
}

Usage:

CopyFileSection(sourcefile, outfile, offset, size);

This should be functionally equivalent to your current method, without the overhead of reading the entire file into memory, regardless of its size.

Note: If you're doing this in code that uses async/await, you should change CopyFileSection to be public static async Task CopyFileSection and change inStream.Read and outStream.Write to await inStream.ReadAsync and await outStream.WriteAsync respectively.
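Applied to the method above, that looks like:

public static async Task CopyFileSectionAsync(string inFile, string outFile, long startPosition, long size)
{
    using (var inStream = File.OpenRead(inFile))
    using (var outStream = File.OpenWrite(outFile))
    {
        inStream.Seek(startPosition, SeekOrigin.Begin);

        long remaining = size;
        byte[] buffer = new byte[81920];

        do
        {
            int bytesRead = await inStream.ReadAsync(buffer, 0, (int)Math.Min(buffer.Length, remaining));
            if (bytesRead == 0) { break; }

            await outStream.WriteAsync(buffer, 0, bytesRead);
            remaining -= bytesRead;
        }
        while (remaining > 0);
    }
}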

Reliable way to convert a file to a byte[]

byte[] bytes = System.IO.File.ReadAllBytes(filename);

That should do the trick. ReadAllBytes opens the file, reads its contents into a new byte array, then closes it. The method is documented on MSDN.

Speed up reading a file to multiple bytes array and copy the offsets by if statement

You might have fallen into the "I use bytes and arrays, so I'm low level, so the code will be fast" trap. C# and .NET provide efficient abstractions to avoid exactly this; in this case, probably a BinaryReader and its ReadString method.

Your code has problems:

  • You don't read the entire file (you stop at the last completely filled 1048576-byte buffer; the remaining bytes won't be read).
  • You read into a byte array and then convert it to a string? You should use the .NET framework's IO API (streams) and allocate strings directly.
  • You should wrap IO operations on files in a using statement to avoid resource leaks (of file handles).

Performance issues I see:

  • Allocation of a big byte array for nothing
  • Convert calls that can be avoided

Code sample (I did not compile it, but it gives you the idea of clean and efficient file reading; see my MSDN links for more examples):

string myText = string.Empty;
// BinaryReader wraps a Stream, not a file name
using (var reader = new BinaryReader(File.OpenRead(openFileDialog1.FileName)))
{
    // ReadString reads length-prefixed strings (as written by BinaryWriter.Write(string));
    // it throws at end of stream rather than returning null, so test the position instead
    while (reader.BaseStream.Position < reader.BaseStream.Length)
    {
        string line = reader.ReadString();
        myText += YourLogic(line); // use a StringBuilder if the number of string concatenations is high
    }
}

As a side note, for very big files, please consider using C# memory-mapped files.
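A minimal sketch of the memory-mapped approach (MemoryMappedFile lives in System.IO.MemoryMappedFiles; the file name and offset are placeholders):

using (var mmf = MemoryMappedFile.CreateFromFile("data.bin", FileMode.Open))
using (var accessor = mmf.CreateViewAccessor())
{
    // Read 4 bytes at an arbitrary offset without loading the whole file into memory
    int value = accessor.ReadInt32(1024);
}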

EDIT:
BinaryReader has Read methods for all the basic data types; it should let you efficiently read any type of binary file.

As for your additional question, look at ReadBytes on MSDN:

// Read (and thereby skip) the first 100 bytes:
var offsetRead = reader.ReadBytes(100);
// Read one byte
byte readByte = reader.ReadByte();
// Compare it to the expected value (myByte is whatever marker you are looking for)
if (readByte == myByte)
{
    // Copy the 100 bytes read before the marker
    var myArray = new byte[100];
    Array.Copy(offsetRead, myArray, offsetRead.Length);
    DoSomething(myArray);
}

