Python How to Write to a Binary File

Python how to write to a binary file?

This is exactly what bytearray is for:

newFileByteArray = bytearray(newFileBytes)
newFile.write(newFileByteArray)

If you're using Python 3.x, you can use bytes instead (and probably ought to, as it signals your intention better). But in Python 2.x, that won't work, because bytes is just an alias for str. As usual, showing with the interactive interpreter is easier than explaining with text, so let me just do that.

Python 3.x:

>>> bytearray(newFileBytes)
bytearray(b'{\x03\xff\x00d')
>>> bytes(newFileBytes)
b'{\x03\xff\x00d'

Python 2.x:

>>> bytearray(newFileBytes)
bytearray(b'{\x03\xff\x00d')
>>> bytes(newFileBytes)
'[123, 3, 255, 0, 100]'
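
Putting the pieces together, a minimal end-to-end sketch for Python 3 might look like the following. The file name and the temporary directory are assumptions made for illustration; newFileBytes holds the values shown in the interpreter sessions above.

```python
import os
import tempfile

# Values from the interpreter sessions above; each must be in range(256).
newFileBytes = [123, 3, 255, 0, 100]

# Hypothetical output location, chosen only for this example.
path = os.path.join(tempfile.mkdtemp(), "example.bin")

# 'wb' opens the file for writing in binary mode.
with open(path, "wb") as newFile:
    newFile.write(bytes(newFileBytes))

# Read the file back to confirm that exactly five bytes were written.
with open(path, "rb") as f:
    data = f.read()

print(data)        # b'{\x03\xff\x00d'
print(list(data))  # [123, 3, 255, 0, 100]
```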

Writing to a binary file python

You appear to be having some confusion about types in Python.

The expression 'i' + 'j' concatenates two string literals; it does not touch the variables i and j. The result is the string 'ij', which is most likely written to the file as two bytes.

The variable i is already an int. You can write it to a file as a 4-byte integer in a couple of different ways (which also apply to the float j):

  1. Use the struct module, as detailed in "how to write integer number in particular no of bytes in python (file writing)". Something like this:

    import struct
    with open('binary.file', 'wb') as f:
        f.write(struct.pack("i", i))

    You would use the 'd' specifier to write j.

  2. Use the numpy module to do the writing for you, which is especially convenient since you are already using it to read the file. The method ndarray.tofile is made just for this purpose:

    import numpy as np

    i = 4
    j = 5.55
    with open('binary.file', 'wb') as f:
        np.array(i, dtype=np.uint32).tofile(f)
        np.array(j, dtype=np.float64).tofile(f)

Note that in both cases I use open as a context manager when writing the file with a with block. This ensures that the file is closed, even if an error occurs during writing.
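
To check what struct actually produces, it can help to pack and unpack in memory before touching a file. The sketch below is an illustration rather than part of the original answer: the "<" prefix is an added assumption that forces little-endian byte order with standard sizes and no padding, whereas a bare "i" uses the platform's native byte order, size and alignment.

```python
import struct

i = 4
j = 5.55

# "<" = little-endian, standard sizes, no alignment padding:
# "i" is always 4 bytes and "d" is always 8 bytes.
packed = struct.pack("<id", i, j)
print(len(packed))  # 12
print(packed[:4])   # b'\x04\x00\x00\x00'

# unpack reverses the operation and returns a tuple.
i2, j2 = struct.unpack("<id", packed)
print(i2, j2)       # 4 5.55
```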

Python writing binary

When you open a file in binary mode, then you are essentially working with the bytes type. So when you write to the file, you need to pass a bytes object, and when you read from it, you get a bytes object. In contrast, when opening the file in text mode, you are working with str objects.

So, writing “binary” is really writing a bytes string:

with open(fileName, 'br+') as f:
    f.write(b'\x07\x08\x07')
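
The difference between the two modes is easy to observe directly. In this small sketch (the temporary file name is made up for the example), a file opened in binary mode accepts bytes, rejects str with a TypeError, and returns bytes when read:

```python
import os
import tempfile

# Hypothetical file used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "mode_demo.bin")

error = None
with open(path, "wb") as f:
    f.write(b"\x07\x08\x07")         # bytes are accepted
    try:
        f.write("\x07\x08\x07")      # str is rejected in binary mode
    except TypeError as exc:
        error = exc

with open(path, "rb") as f:
    content = f.read()               # reading in binary mode yields bytes

print(type(content).__name__, content)  # bytes b'\x07\x08\x07'
```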

If you have actual integers you want to write as binary, you can use the bytes function to convert a sequence of integers into a bytes object:

>>> lst = [7, 8, 7]
>>> bytes(lst)
b'\x07\x08\x07'

Combining this, you can write a sequence of integers as a bytes object into a file opened in binary mode.


As Hyperboreus pointed out in the comments, bytes will only accept a sequence of numbers that actually fit in a byte, i.e. numbers between 0 and 255. If you want to store arbitrary (positive) integers in the way they are, without having to bother about knowing their exact size (which is required for struct), then you can easily write a helper function which splits those numbers up into separate bytes:

def splitNumber (num):
    lst = []
    while num > 0:
        lst.append(num & 0xFF)
        num >>= 8
    return lst[::-1]

bytes(splitNumber(12345678901234567890))
# b'\xabT\xa9\x8c\xeb\x1f\n\xd2'

So if you have a list of numbers, you can easily iterate over them and write each into the file; if you want to extract the numbers individually later you probably want to add something that keeps track of which individual bytes belong to which numbers.

with open(fileName, 'br+') as f:
    for number in numbers:
        f.write(bytes(splitNumber(number)))
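
One possible way to keep track of which bytes belong to which number is to write a one-byte length in front of each number. The sketch below is only one such scheme, not the method from the answer above: the helper names write_numbers and read_numbers are hypothetical, it uses int.to_bytes and int.from_bytes (Python 3.2+) instead of splitNumber, and io.BytesIO stands in for a file opened in binary mode.

```python
import io

def write_numbers(stream, numbers):
    """Write each non-negative int as <1-byte length><big-endian bytes>."""
    for n in numbers:
        size = max(1, (n.bit_length() + 7) // 8)  # at least one byte, even for 0
        stream.write(bytes([size]))               # length prefix (max 255 bytes)
        stream.write(n.to_bytes(size, "big"))

def read_numbers(stream):
    """Read numbers back until the stream is exhausted."""
    numbers = []
    while True:
        prefix = stream.read(1)
        if not prefix:                            # end of stream
            break
        numbers.append(int.from_bytes(stream.read(prefix[0]), "big"))
    return numbers

buf = io.BytesIO()  # stand-in for open(fileName, 'wb')
write_numbers(buf, [0, 255, 256, 12345678901234567890])
buf.seek(0)
print(read_numbers(buf))  # [0, 255, 256, 12345678901234567890]
```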

How to write binary file with bit length not a multiple of 8 in Python?

A file, as most OSes see them, contains a stream of bytes, sometimes referred to as characters. In some systems, there's a difference between text and binary data storage (e.g. PDP-1 uses 6 bit characters and 18 bit words), but the size of the file is counted in those bytes. For some systems, not even that level is stored, but an end-of-file character is used to mark where the data ends in the last block (be it sector, cluster or extent).

You'll need to replicate one of these methods to store a number of bits, for instance using 1-then-0s padding. The downside of that padding method is you need to find the end to know if a string of 0s (and the prior 1) form the padding, not data.

Another method might be to first store the number of bits, or just store the number of bits for each written chunk. Doing that requires an encoding such that you know the size of the size field, for instance one byte, which would imply chunks of no more than 255 bits. This length prefix method is used e.g. in Pascal strings.
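
As a rough illustration of the length-prefix idea, the sketch below stores a one-byte bit count followed by the bits padded with zeros to a whole number of bytes. The function names and the representation of bits as a string of '0' and '1' characters are assumptions chosen to keep the example dependency-free.

```python
def pack_bits(bits: str) -> bytes:
    """Encode up to 255 bits as <1-byte bit count><zero-padded data bytes>."""
    if len(bits) > 255:
        raise ValueError("a one-byte length field holds at most 255")
    padded = bits + "0" * (-len(bits) % 8)        # pad to a byte boundary
    body = bytes(int(padded[k:k + 8], 2) for k in range(0, len(padded), 8))
    return bytes([len(bits)]) + body

def unpack_bits(chunk: bytes) -> str:
    """Recover the original bits, discarding the padding."""
    count = chunk[0]
    bits = "".join(format(b, "08b") for b in chunk[1:])
    return bits[:count]

encoded = pack_bits("10110")
print(encoded)               # b'\x05\xb0'
print(unpack_bits(encoded))  # 10110
```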

You may also want to consider an established file format where bit sequences are stored, such as the serial vector format. Most of these aren't very efficient, and are designed for specific tasks (in this case, storing time series of circuit simulation).

Schemes such as these can also be generalized into the data storage formats themselves. Examples include length-prefixed strings, UTF-8 code points, BitTorrent Bencoding or Exponential-Golomb coding. That last one is relevant today because it allows an arbitrary size and is supported by the bitstring module.
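
To make the Exponential-Golomb idea concrete, here is a small order-0 encoder and decoder written without bitstring. It is a sketch for non-negative integers only, with bits represented as a string of '0' and '1' characters; the function names are made up for this example. Each code word is the binary form of n + 1, preceded by one zero for every bit after the first, so a decoder can tell where a number ends without a separate size field.

```python
def exp_golomb_encode(n: int) -> str:
    """Order-0 Exp-Golomb code word for a non-negative integer."""
    if n < 0:
        raise ValueError("only non-negative integers are supported here")
    binary = bin(n + 1)[2:]                     # binary digits of n + 1
    return "0" * (len(binary) - 1) + binary     # zero prefix gives the length

def exp_golomb_decode(bits: str):
    """Decode one code word; return (value, remaining bits)."""
    zeros = 0
    while bits[zeros] == "0":                   # count the zero prefix
        zeros += 1
    end = 2 * zeros + 1                         # total length of the code word
    return int(bits[zeros:end], 2) - 1, bits[end:]

stream = "".join(exp_golomb_encode(n) for n in [3, 0, 4])
print(stream)  # 00100100101

values = []
while stream:
    value, stream = exp_golomb_decode(stream)
    values.append(value)
print(values)  # [3, 0, 4]
```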

One reasonably easy way in bitstring might be to add an (aligned) trailing byte to the file signifying how many bits in the penultimate byte were padding:

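import bitstring

def pad(data: bitstring.BitArray):
    padding = -len(data) % 8  # zero bits needed to reach a byte boundary
    if padding:
        data.append(bitstring.Bits(uint=0, length=padding))
    data.append(bitstring.Bits(uint=padding, length=8))  # trailing count byte

def unpad(data: bitstring.BitArray):
    padding = data[-8:].uint
    del data[-8 - padding:]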

If you're reading the file piecemeal you'll have to take care to do this unpadding as you reach the last two bytes.

Here's a 1-then-0 variation:

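def pad(data: bitstring.BitArray):
    data.append(bitstring.Bits(length=1, uint=1))
    padding = -len(data) % 8  # zero bits needed to reach a byte boundary
    if padding:
        data.append(bitstring.Bits(uint=0, length=padding))

def unpad(data: bitstring.BitArray):
    last1 = data.rfind(bitstring.Bits(length=1, uint=1))[0]
    del data[last1:]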
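
For readers who do not have bitstring installed, the same 1-then-0 scheme can be sketched with the standard library alone. The names pad_bits and unpad_bits are hypothetical, and the bits are again represented as a string of '0' and '1' characters purely for readability.

```python
def pad_bits(bits: str) -> bytes:
    """Append a single 1, then zeros up to the next byte boundary."""
    padded = bits + "1"
    padded += "0" * (-len(padded) % 8)
    return int(padded, 2).to_bytes(len(padded) // 8, "big")

def unpad_bits(data: bytes) -> str:
    """Drop the trailing zeros and the 1 that precedes them."""
    bits = "".join(format(b, "08b") for b in data)
    return bits[:bits.rindex("1")]

stored = pad_bits("10110")
print(stored)              # b'\xb4'
print(unpad_bits(stored))  # 10110
```

Note that data which already fills whole bytes still gains one extra byte of padding, because the marker bit is always written.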

Different result writing to binary file with python and matlab

The characters still look very similar due to how Notepad is attempting to read integers as text, but I think it gave enough of a hint. For easier typing, let's call Matlab's text d? [@ _@ [@ and Python's text d? _@ [@ [@.

Computer memory is linear, so all multidimensional arrays are actually stored as 1D arrays. What you're seeing is NumPy arrays being C order (by default) versus Matlab matrices being Fortran order (by default). This order is how multidimensional arrays are flattened to 1D arrays in memory.

matrix    notepad text
1 2       d? _@
3 4       [@ [@

Matlab Fortran order goes by columns
1 3 2 4
d? [@ _@ [@

NumPy C order goes by rows
1 2 3 4
d? _@ [@ [@

Since you're converting code between MATLAB and Python, you should be very aware that the array orders differ. Iteration is faster when you don't jump around in memory, so nested for-loops may have to be reordered. It won't make much of a difference for vectorized code such as someScalar * myArray, because the ordering is handled for you. NumPy does provide functions and optional arguments to create Fortran-order arrays (numpy.asfortranarray(), ndarray.copy(order='F')) and to check the order (ndarray.flags.f_contiguous, ndarray.flags.c_contiguous), but coding that way is still tougher because C order is the default.
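
The two flattening orders can be compared without writing any files. The following sketch assumes NumPy is installed and uses ndarray.tobytes, whose order argument selects row-major ('C') or column-major ('F') output; the 2x2 matrix matches the example above.

```python
import numpy as np

a = np.array([[1.0, 2.0], [3.0, 4.0]])  # C order by default

c_bytes = a.tobytes(order="C")  # rows first, as NumPy lays the array out
f_bytes = a.tobytes(order="F")  # columns first, as MATLAB would

print(np.frombuffer(c_bytes, dtype=np.float64))  # [1. 2. 3. 4.]
print(np.frombuffer(f_bytes, dtype=np.float64))  # [1. 3. 2. 4.]

# The flags report how the array is laid out in memory.
print(a.flags.c_contiguous, a.flags.f_contiguous)  # True False
print(np.asfortranarray(a).flags.f_contiguous)     # True
```

Serializing the transpose in C order gives the same bytes as serializing the original in Fortran order, so a.T is one way to reproduce MATLAB's column-first layout when the output must match.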


