Reading Binary File and Looping Over Each Byte

Reading binary file and looping over each byte

Python 2.4 and Earlier

f = open("myfile", "rb")
try:
    byte = f.read(1)
    while byte != "":
        # Do stuff with byte.
        byte = f.read(1)
finally:
    f.close()

Python 2.5-2.7

with open("myfile", "rb") as f:
    byte = f.read(1)
    while byte != "":
        # Do stuff with byte.
        byte = f.read(1)

Note that the with statement is not available in versions of Python below 2.5. To use it in v 2.5 you'll need to import it:

from __future__ import with_statement

In 2.6 this is not needed.

Python 3

In Python 3, it's a bit different. We will no longer get raw characters from the stream in byte mode but byte objects, thus we need to alter the condition:

with open("myfile", "rb") as f:
    byte = f.read(1)
    while byte != b"":
        # Do stuff with byte.
        byte = f.read(1)

Or as benhoyt says, skip the not equal and take advantage of the fact that b"" evaluates to false. This makes the code compatible between 2.6 and 3.x without any changes. It would also save you from changing the condition if you go from byte mode to text or the reverse.

with open("myfile", "rb") as f:
    byte = f.read(1)
    while byte:
        # Do stuff with byte.
        byte = f.read(1)

python 3.8

From now on thanks to := operator the above code can be written in a shorter way.

with open("myfile", "rb") as f:
    while (byte := f.read(1)):
        # Do stuff with byte.

First byte skipped when reading binary file in python

The problem is:

You read the first byte before the loop, and when you enter the loop
you read another byte -> causing you to skip the first byte.

You should change it to:

f = open("GoldenFPGA.bit", "rb")
count = 0
print("#ifndef __CL_NX_BITSTREAM_HEADER_H");
print("#define __CL_NX_BITSTREAM_HEADER_H");
print("const uint8_t cl_nx_bitstream[] = ");
print("{");
print("    0x7A, 0x00, 0x00, 0x00,");
print("    ", end='')
try:
    byte = f.read(1)
    while byte:
        # Do stuff with byte.
        if byte:
            print("0x" + byte.hex() + ", ", end='')
        count = count + 1
        if count % 8 == 0:
            print("\n    ", end='')
        # read next byte
        byte = f.read(1)
finally:
    f.close()
print("\n};");
print("#endif");

Reading a binary file with python

Read the binary file content like this:

with open(fileName, mode='rb') as file: # b is important -> binary
    fileContent = file.read()

then "unpack" binary data using struct.unpack:

The start bytes: struct.unpack("iiiii", fileContent[:20])

The body: ignore the heading bytes and the trailing byte (= 24); The remaining part forms the body, to know the number of bytes in the body do an integer division by 4; The obtained quotient is multiplied by the string 'i' to create the correct format for the unpack method:

struct.unpack("i" * ((len(fileContent) -24) // 4), fileContent[20:-4])

The end byte: struct.unpack("i", fileContent[-4:])

What is the idiomatic way to iterate over a binary file?

I don't know of any built-in way to do this, but a wrapper function is easy enough to write:

def read_in_chunks(infile, chunk_size=1024*64):
    while True:
        chunk = infile.read(chunk_size)
        if chunk:
            yield chunk
        else:
            # The chunk was empty, which means we're at the end
            # of the file
            return

Then at the interactive prompt:

>>> from chunks import read_in_chunks
>>> infile = open('quicklisp.lisp')
>>> for chunk in read_in_chunks(infile):
...     print chunk
... 
<contents of quicklisp.lisp in chunks>

Of course, you can easily adapt this to use a with block:

with open('quicklisp.lisp') as infile:
    for chunk in read_in_chunks(infile):
        print chunk

And you can eliminate the if statement like this.

def read_in_chunks(infile, chunk_size=1024*64):
    chunk = infile.read(chunk_size)
    while chunk:
        yield chunk
        chunk = infile.read(chunk_size)

Reading Binary File and Looping Over Each Byte