How to Split a Byte String into Separate Bytes in Python

How to split a byte string into separate bytes in python

You can use slicing on byte objects:

>>> value = b'\x00\x01\x00\x02\x00\x03'
>>> value[:2]
b'\x00\x01'
>>> value[2:4]
b'\x00\x02'
>>> value[-2:]
b'\x00\x03'
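If the goal is one object per byte rather than fixed-size slices, the following minimal sketch (reusing the value from above) shows two options. Indexing or iterating a bytes object yields integers in Python 3, whereas length-1 slices keep each element as a bytes object.

```python
value = b'\x00\x01\x00\x02\x00\x03'

# Iterating (or indexing) a bytes object yields ints in Python 3.
as_ints = list(value)

# Length-1 slices keep every element as a bytes object.
as_bytes = [value[i:i + 1] for i in range(len(value))]

print(as_ints)   # [0, 1, 0, 2, 0, 3]
print(as_bytes)  # [b'\x00', b'\x01', b'\x00', b'\x02', b'\x00', b'\x03']
```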

When handling these frames, however, you probably also want to know about memoryview() objects; these let you interpret the bytes as C datatypes without any extra work on your part, simply by casting a 'view' on the underlying bytes:

>>> mv = memoryview(value).cast('H')
>>> mv[0], mv[1], mv[2]
(256, 512, 768)

The mv object is now a memory view that interprets every 2 bytes as an unsigned short in the machine's native byte order (the values above come from a little-endian machine), so it has length 3 and each index yields an integer based on the underlying bytes.
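Because the cast uses native byte order, the same bytes can produce different integers on different machines. As a sketch of one alternative (not part of the original answer), the standard struct module lets you state the byte order explicitly:

```python
import struct
import sys

value = b'\x00\x01\x00\x02\x00\x03'

# memoryview.cast('H') follows the machine's native byte order.
native = list(memoryview(value).cast('H'))

# struct format prefixes make the byte order explicit.
big = list(struct.unpack('>3H', value))     # big-endian
little = list(struct.unpack('<3H', value))  # little-endian

print(sys.byteorder, native)
print(big)     # [1, 2, 3]
print(little)  # [256, 512, 768]
```

If the frames come from a network protocol, big-endian is the more likely interpretation, but that depends on the protocol in question.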

split byte string into lines

There is no reason to convert to a string. Just pass bytes arguments to split: split strings with strings, bytes with bytes.

>>> a = b'asdf\nasdf'
>>> a.split(b'\n')
[b'asdf', b'asdf']
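If the data may contain Windows-style line endings or a trailing newline, bytes.splitlines() is worth knowing about as well. A small sketch with made-up input shows the difference:

```python
data = b'asdf\r\nasdf\n'

# split(b'\n') leaves the '\r' in place and produces a trailing empty item.
by_newline = data.split(b'\n')

# splitlines() recognises '\n', '\r' and '\r\n' and drops the line endings.
by_lines = data.splitlines()

print(by_newline)  # [b'asdf\r', b'asdf', b'']
print(by_lines)    # [b'asdf', b'asdf']
```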

Stream split of a byte array in python

As per the docs, split returns a list, not a generator. You could read one byte at a time and maintain your own line buffer, though, something like:

def get_lines_buffer(bytes_):
    buff = bytearray()
    for b in bytes_:  # iterating over bytes yields ints in Python 3
        if b == ord('\n'):
            yield buff.decode('utf-8')
            buff = bytearray()
        else:
            buff.append(b)
    if buff:
        yield buff.decode('utf-8')  # yield remaining buffer


for line in get_lines_buffer(b'123\n456\n789'):
    print(line)

Or here's your find method:

def get_lines_find(bytes_):
    a, b = 0, 0
    while b < len(bytes_):
        b = bytes_.find(b'\n', a)
        if b == -1:
            b = len(bytes_)  # no further matches
        s = bytes_[a:b]
        a = b + 1
        yield s.decode('utf-8')


for line in get_lines_find(b'123\n456\n789'):
    print(line)

Comparing the two:

data = b'123\n456\n789\n' * int(1e5)


def test_buffer():
    for _ in get_lines_buffer(data):
        pass


def test_find():
    for _ in get_lines_find(data):
        pass


if __name__ == '__main__':
    import timeit

    time_buffer = timeit.timeit(
        "test_buffer()",
        setup="from __main__ import test_buffer",
        number=5)
    print(f'buffer method: {time_buffer:.3f}s')

    time_find = timeit.timeit(
        "test_find()",
        setup="from __main__ import test_find",
        number=5)
    print(f'find method: {time_find:.3f}s')

In this run, the "find" method came out a bit slower (timings vary by machine and Python version):

buffer method: 8.027s
find method: 10.370s

Also note that bytes is a built-in name, so you shouldn't use it as a variable name.
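If the data can be wrapped in (or already comes from) a binary file object, iterating over that object also yields one line at a time without building the full list first. The sketch below uses io.BytesIO as a stand-in for a real stream; iter_lines is a hypothetical helper name, not something from the original answer.

```python
import io


def iter_lines(stream):
    # A binary file object yields lines lazily, each keeping its b'\n'.
    for raw in stream:
        yield raw.rstrip(b'\n').decode('utf-8')


for line in iter_lines(io.BytesIO(b'123\n456\n789')):
    print(line)
```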

Divide 'bytes' object into chunks in Python

bytes can be tricky.

First off, encoded_array = str(int_array).encode() is not doing what you think. If you print encoded_array, you'll see that it literally converts the string representation of int_array to bytes. This is why the first value of encoded_array is [:

>>> encoded_array[:1]
b'['

Second, I'm not sure you want int_array = [i**2 for i in range(1, 100)] to do what it's doing. It creates values up to 9,801 (99**2). I'm unsure whether you want the values to be between 0 and 255, so that each element can represent a single byte. I'm going to assume you would like some data to be converted to bytes and split into 40-byte chunks, and that the data in this case is a list of integers.

First let's convert your int_array into a list of bytes objects. I'm going to convert each int into a 2-byte big-endian value.

>>> hex_array = [x.to_bytes(2, byteorder="big") for x in int_array]

Now to split the data into chunks of 40 bytes:

>>> h = []
>>> for i in range(0, len(hex_array), 20):
...     h.append(hex_array[i:i + 20])

I'm splitting by 20 because each element holds 2 bytes.

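Now you have a list of groups holding at most 40 bytes each. Time to join each group together so we can transmit them!

>>> result = []
>>> for package in h:
...     joined_package = b';'.join(package)
...     result.append(joined_package)

Note that the b';' separators add to each package's length; join with b'' instead if every package must stay within 40 bytes.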
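An alternative sketch, assuming the same int_array as in the question and no separators between values: concatenate everything into a single bytes object first, then slice it in fixed 40-byte steps. The names data, chunk_size and chunks are illustrative.

```python
int_array = [i ** 2 for i in range(1, 100)]

# Pack every integer into 2 big-endian bytes and concatenate them.
data = b''.join(x.to_bytes(2, byteorder='big') for x in int_array)

chunk_size = 40
chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

print(len(data))                 # 198
print([len(c) for c in chunks])  # [40, 40, 40, 40, 38]
```

Slicing past the end of a bytes object is safe, so the last chunk is simply shorter.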

split bytes variable on newline

You'll need to do this (see the str.split method for more details):

for word in output.decode('utf-8').split('\n'):
    print(word)

And you don't need to print word - you can do anything you want with it. This loop will iterate over every line in output.

splitting string / bytes in Python 3

Both str and bytes have a split method that requires an argument of the same type. ',' is not a bytes object, hence the complaint. You want:

deviceInfoList=readBuffer.value.split(b',')
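To see the complaint and the fix side by side, here is a minimal sketch. The contents of raw are made up for illustration and stand in for readBuffer.value.

```python
raw = b'dev1,dev2,dev3'  # hypothetical buffer contents

try:
    raw.split(',')  # str separator on a bytes object
except TypeError as exc:
    print('TypeError:', exc)

parts = raw.split(b',')  # bytes separator works
names = [p.decode('ascii') for p in parts]

print(parts)  # [b'dev1', b'dev2', b'dev3']
print(names)  # ['dev1', 'dev2', 'dev3']
```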

python split a bytes string

You guessed correctly that you have to convert (i.e. decode) the bytes returned by a binary serial read into a string that you can parse further. Using bytes.decode("ascii") should do here (no strange characters in sight).

Then, convert to float, not int. Also, splitting on whitespace is more robust to changes if you know that the first field is your value:

>>> line = b'       0.000 kg \r\n'
>>> value = float(line.split()[0].decode("ascii"))

results in 0.0
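Wrapped up as a small helper, this might look like the sketch below. parse_reading is a hypothetical name, and the assumption that the second field is the unit comes only from the sample line above.

```python
def parse_reading(line):
    # Decode first, then split on any whitespace; this also drops '\r\n'.
    fields = line.decode('ascii').split()
    value = float(fields[0])
    # Assumption: the second field, if present, is the unit.
    unit = fields[1] if len(fields) > 1 else ''
    return value, unit


print(parse_reading(b'       0.000 kg \r\n'))  # (0.0, 'kg')
```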

How to split bytes into a list of integers in Python-3?

Just use the same method as on a regular string. Split and map to int():

[int(v) for v in bytesvalue.split()]

This works because bytes has many of the same methods as str (including bytes.split()), and the int() type accepts bytes values the same way it accepts str values:

If x is not a number or if base is given, then x must be a string, bytes, or bytearray instance representing an integer literal in radix base.

Demo:

>>> bytesvalue = b'34\n44\n-28\n-63\n22\n'
>>> bytesvalue.split()
[b'34', b'44', b'-28', b'-63', b'22']
>>> [int(v) for v in bytesvalue.split()]
[34, 44, -28, -63, 22]
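The same pattern carries over to other separators and bases. A short sketch with made-up inputs:

```python
csv_values = b'34,44,-28'
numbers = [int(v) for v in csv_values.split(b',')]

hex_values = b'ff 10 7f'
from_hex = [int(v, 16) for v in hex_values.split()]

print(numbers)   # [34, 44, -28]
print(from_hex)  # [255, 16, 127]
```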

Divide byte string and return as a byte string

Convert the list of bytes into a bytes object, and you can write the whole function body in one line.

def div2bstr(bstr):
    return bytes(i//2 for i in bstr)
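A quick usage sketch with an arbitrary input shows why this works: iterating over a bytes object yields integers, and bytes() accepts any iterable of integers in the range 0 to 255.

```python
def div2bstr(bstr):
    # Iterating over bytes yields ints, so each element can be halved directly.
    return bytes(i // 2 for i in bstr)


result = div2bstr(b'\x04\x08\xff')
print(result)        # b'\x02\x04\x7f'
print(list(result))  # [2, 4, 127]
```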

Efficient way to split a bytes array then convert it to string in Python

I like your way: it is explicit, the for loop is understandable by all, and it isn't all that slow compared to other approaches.

Some suggestions I'd make would be to change your condition from if c != b'' to if c, since a non-empty bytes object is truthy, and don't name your list bytes, because you mask the built-in! Name it bt or something similar :-)

Other options include itertools.takewhile which will grab elements from an iterable as long as a predicate holds; your operation would look like:

from itertools import takewhile
"".join(s.decode('utf-8') for s in takewhile(bool, bt))

This is slightly slower but more compact; if you're a one-liner lover, this might appeal to you.

Slightly faster and also compact is using index along with a slice:

"".join(b.decode('utf-8') for b in bt[:bt.index(b'')])

While compact, it also suffers in readability.

In short, I'd go with the for loop since readability counts as very pythonic in my eyes.
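For reference, here are the three variants side by side on a small sample list. The question's original loop is not shown here, so the explicit loop below is only an assumption about its shape, and the contents of bt are made up.

```python
from itertools import takewhile

bt = [b'ab', b'cd', b'', b'ef']  # sample data; b'' marks the end

# 1. Explicit loop (assumed shape of the original approach)
parts = []
for c in bt:
    if not c:
        break
    parts.append(c.decode('utf-8'))
loop_result = ''.join(parts)

# 2. takewhile: stops at the first falsy (empty) element
takewhile_result = ''.join(s.decode('utf-8') for s in takewhile(bool, bt))

# 3. index + slice: raises ValueError if b'' is not in the list
index_result = ''.join(b.decode('utf-8') for b in bt[:bt.index(b'')])

print(loop_result, takewhile_result, index_result)  # abcd abcd abcd
```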


