How to split a byte string into separate bytes in python
You can use slicing on bytes objects:
>>> value = b'\x00\x01\x00\x02\x00\x03'
>>> value[:2]
b'\x00\x01'
>>> value[2:4]
b'\x00\x02'
>>> value[-2:]
b'\x00\x03'
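If the goal is literally one element per byte, keep in mind that indexing or iterating a bytes object in Python 3 yields integers, not length-1 bytes objects. A minimal sketch of both views (the helper name split_bytes is made up for illustration):

```python
value = b'\x00\x01\x00\x02\x00\x03'

# Iterating (or indexing) a bytes object yields ints in Python 3.
as_ints = list(value)

def split_bytes(data):
    """Return a list of length-1 bytes objects (hypothetical helper)."""
    # Slicing with a width of 1 keeps each element as bytes.
    return [data[i:i + 1] for i in range(len(data))]

as_bytes = split_bytes(value)
print(as_ints)   # [0, 1, 0, 2, 0, 3]
print(as_bytes)  # [b'\x00', b'\x01', b'\x00', b'\x02', b'\x00', b'\x03']
```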
When handling these frames, however, you probably also want to know about memoryview() objects; these let you interpret the bytes as C datatypes without any extra work on your part, simply by casting a 'view' on the underlying bytes:
>>> mv = memoryview(value).cast('H')
>>> mv[0], mv[1], mv[2]
(256, 512, 768)
The mv object is now a memory view interpreting every 2 bytes as an unsigned short (in the machine's native byte order); so it now has length 3 and each index is an integer value, based on the underlying bytes.
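Because cast('H') uses the machine's native byte order, the values shown above are what a little-endian machine reports. When the frame format specifies a byte order, struct.unpack or int.from_bytes lets you state it explicitly. A short sketch, assuming the frame is a sequence of unsigned 16-bit values:

```python
import struct

value = b'\x00\x01\x00\x02\x00\x03'

# '>' = big-endian, '<' = little-endian, 'H' = unsigned 16-bit.
count = len(value) // 2
big = struct.unpack(f'>{count}H', value)
little = struct.unpack(f'<{count}H', value)

# Same idea for a single field, using int.from_bytes.
first = int.from_bytes(value[:2], byteorder='big')

print(big)     # (1, 2, 3)
print(little)  # (256, 512, 768)
print(first)   # 1
```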
split byte string into lines
There is no reason to convert to string. Just give split bytes parameters. Split strings with strings, bytes with bytes.
>>> a = b'asdf\nasdf'
>>> a.split(b'\n')
[b'asdf', b'asdf']
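If the data may contain mixed line endings, bytes.splitlines() is worth knowing about: it splits on \n, \r\n, and \r, and it does not produce a trailing empty element. A quick comparison on made-up data:

```python
data = b'asdf\r\nqwer\nzxcv\n'

by_newline = data.split(b'\n')  # keeps '\r' and adds a trailing empty item
by_lines = data.splitlines()    # handles \r\n, \n, and \r

print(by_newline)  # [b'asdf\r', b'qwer', b'zxcv', b'']
print(by_lines)    # [b'asdf', b'qwer', b'zxcv']
```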
Stream split of a byte array in python
As per the docs, split returns a list, not a generator. You could read one byte at a time and maintain your own line buffer, though, something like:
def get_lines_buffer(bytes_):
    buff = bytearray()
    for b in bytes_:
        # Iterating over bytes yields ints in Python 3, so compare with an int
        if b == ord('\n'):
            yield buff.decode('utf-8')
            buff = bytearray()
        else:
            buff.append(b)
    if buff:
        yield buff.decode('utf-8')  # yield remaining buffer

for line in get_lines_buffer(b'123\n456\n789'):
    print(line)
Or here's your find method:
def get_lines_find(bytes_):
    a = 0
    # Loop on the start index so a trailing newline doesn't yield an extra empty line
    while a < len(bytes_):
        b = bytes_.find(b'\n', a)
        if b == -1:
            b = len(bytes_)  # no further matches
        s = bytes_[a:b]
        a = b + 1
        yield s.decode('utf-8')

for line in get_lines_find(b'123\n456\n789'):
    print(line)
Comparing the two:
data = b'123\n456\n789\n' * int(1e5)

def test_buffer():
    for _ in get_lines_buffer(data):
        pass

def test_find():
    for _ in get_lines_find(data):
        pass

if __name__ == '__main__':
    import timeit
    time_buffer = timeit.timeit(
        "test_buffer()",
        setup="from __main__ import test_buffer",
        number=5)
    print(f'buffer method: {time_buffer:.3f}s')
    time_find = timeit.timeit(
        "test_find()",
        setup="from __main__ import test_find",
        number=5)
    print(f'find method: {time_find:.3f}s')
In this run, the "find" method came out somewhat slower (exact timings depend on the machine and the Python version):
buffer method: 8.027s
find method: 10.370s
Also note that bytes is a built-in name; you shouldn't use it as a variable name.
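Another option, when the data is already in memory, is to wrap it in io.BytesIO: file-like objects hand out one line at a time when iterated. This is a sketch of an alternative and was not part of the timing comparison above; the function name iter_lines is made up here.

```python
import io

def iter_lines(data, encoding='utf-8'):
    """Lazily yield decoded lines from a bytes object (hypothetical helper)."""
    for raw in io.BytesIO(data):
        # Each raw line keeps its trailing b'\n'; strip it before decoding.
        yield raw.rstrip(b'\n').decode(encoding)

for line in iter_lines(b'123\n456\n789'):
    print(line)
```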
Divide 'bytes' object into chunks in Python
bytes can be tricky.
First off, encoded_array = str(int_array).encode() is not doing what you think. If you print encoded_array, you'll see that it's literally converting the string representation of int_array to bytes. This is why the first value of encoded_array is [:
>>> encoded_array[:1]
b'['
Second, I'm not sure you want int_array = [i**2 for i in range(1, 100)] to do what it's doing. It creates values up to 9801 (99 squared). I'm unsure whether you'd like this range to be between 0 and 255, if each element is to represent a byte. I'm going to assume you would like some data to be converted to bytes and split into 40-byte chunks, and that the data in this case is an array of integers.
First let's convert your int_array into a list of bytes objects. I'm going to convert each int into a 2-byte big-endian value (the repr shows these with hex escapes).
>>> hex_array = [x.to_bytes(2, byteorder="big") for x in int_array]
Now to split up the data into chunks of 40 bytes:
>>> h = []
>>> for i in range(0, len(hex_array), 20):
...     h.append(hex_array[i:i + 20])
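I'm splitting by 20 because each element holds 2 bytes.
Now you'll have a list of collections holding at most 40 bytes each. Time to join the collections together so we can transmit them! Note that joining with b';' adds a separator byte between elements, so each joined package ends up longer than 40 bytes; use b''.join(package) if the packages must stay at 40 bytes.
>>> result = []
>>> for package in h:
...     joined_package = b';'.join(package)
...     result.append(joined_package)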
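For raw bytes (rather than a list of 2-byte elements), the same chunking idea can be written with a stepped range and slicing. A minimal sketch, assuming 40-byte chunks where the last chunk may be shorter; chunk_bytes is a name invented for this example:

```python
def chunk_bytes(data, size):
    """Split data into consecutive chunks of at most `size` bytes."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [data[i:i + size] for i in range(0, len(data), size)]

int_array = [i**2 for i in range(1, 100)]
# Pack every int as 2 big-endian bytes, then concatenate without separators.
payload = b''.join(x.to_bytes(2, byteorder='big') for x in int_array)

chunks = chunk_bytes(payload, 40)
print(len(payload))              # 198
print([len(c) for c in chunks])  # [40, 40, 40, 40, 38]
```

Joining the chunks back together with b''.join(chunks) reproduces the original payload, which makes for an easy sanity check on the receiving side.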
split bytes variable on newline
You'll need to do this (see the str.split method for more details):
for word in output.decode('utf-8').split('\n'):
    print(word)
You don't have to print word, of course; you can do anything you want with it. This loop iterates over every line in output.
splitting string / bytes in Python 3
Both str and bytes have a split method that requires an argument of the same type. ',' is not a bytes object, hence the complaint. You want:
deviceInfoList = readBuffer.value.split(b',')
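The mismatch is easy to reproduce. In this sketch, read_buffer_value stands in for readBuffer.value, and its contents are invented for illustration:

```python
read_buffer_value = b'dev0,dev1,dev2'  # made-up sample data

try:
    read_buffer_value.split(',')       # str separator on a bytes object
except TypeError as exc:
    error = exc
    print(f'TypeError: {exc}')

# A bytes separator works as expected.
device_info_list = read_buffer_value.split(b',')
print(device_info_list)  # [b'dev0', b'dev1', b'dev2']
```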
python split a bytes string
You guessed right that you have to convert (i.e. decode) the bytes returned by a binary serial read into a string that you can parse further. Using bytes.decode("ascii") should do here (no strange characters in sight).
Then, convert to float, not int. Also, using split() is more robust to changes if you know that the first field is your value:
>>> line = b' 0.000 kg \r\n'
>>> value = float(line.split()[0].decode("ascii"))
results in 0.0
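If the device can also send lines that don't start with a number (an error code, an empty line), it may be worth wrapping the conversion. A sketch under that assumption; parse_reading is a hypothetical helper, and the second sample line is invented:

```python
def parse_reading(line):
    """Return the first whitespace-separated field as a float, or None."""
    fields = line.split()  # bytes.split() with no argument drops spaces and \r\n
    if not fields:
        return None
    try:
        return float(fields[0].decode('ascii'))
    except (UnicodeDecodeError, ValueError):
        return None

print(parse_reading(b'   0.000 kg \r\n'))  # 0.0
print(parse_reading(b'ERR \r\n'))          # None
```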
How to split bytes into a list of integers in Python-3?
Just use the same method as on a regular string. Split and map to int():
[int(v) for v in bytesvalue.split()]
This works because bytes objects have many of the same methods as str (including bytes.split()), and the int() type accepts bytes values the same way it accepts str values:
If x is not a number or if base is given, then x must be a string, bytes, or bytearray instance representing an integer literal in radix base.
Demo:
>>> bytesvalue = b'34\n44\n-28\n-63\n22\n'
>>> bytesvalue.split()
[b'34', b'44', b'-28', b'-63', b'22']
>>> [int(v) for v in bytesvalue.split()]
[34, 44, -28, -63, 22]
Divide byte string and return as a byte string
Convert the list of bytes into a bytes object, and you can write the whole function body in one line.
def div2bstr(bstr):
    return bytes(i//2 for i in bstr)
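A quick usage check with a made-up input. Each result of i // 2 stays within 0..255, so bytes() accepts the generator directly:

```python
def div2bstr(bstr):
    # Iterating bytes yields ints; halve each and rebuild a bytes object.
    return bytes(i // 2 for i in bstr)

print(div2bstr(b'\x04\x08\xff'))  # b'\x02\x04\x7f'
```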
Efficient way to split a bytes array then convert it to string in Python
I like your way: it is explicit, the for loop is understandable by all, and it isn't all that slow compared to other approaches.
Some suggestions I'd make would be to change your condition from if c != b'' to if c, since a non-empty bytes object is truthy, and don't name your list bytes; you mask the built-in! Name it bt or something similar :-)
Other options include itertools.takewhile, which will grab elements from an iterable as long as a predicate holds; your operation would look like:
"".join(s.decode('utf-8') for s in takewhile(bool, bt))
This is slightly slower but more compact; if you're a one-liner lover, this might appeal to you.
Slightly faster and also compact is using index along with a slice:
"".join(b.decode('utf-8') for b in bt[:bt.index(b'')])
While compact, it also suffers in readability.
In short, I'd go with the for loop, since readability counts as very Pythonic in my eyes.
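For comparison, the loop version might look like the sketch below. The original loop isn't shown here, so this is a reconstruction under assumptions: bt is a list of bytes objects, and an empty b'' marks the end of the useful data. The sample list and the helper name are invented.

```python
from itertools import takewhile

bt = [b'he', b'llo', b'', b'ignored']  # made-up sample data

def decode_until_empty(chunks):
    """Decode and join chunks up to the first empty one (hypothetical helper)."""
    parts = []
    for c in chunks:
        if not c:  # empty bytes objects are falsy
            break
        parts.append(c.decode('utf-8'))
    return "".join(parts)

loop_result = decode_until_empty(bt)
takewhile_result = "".join(s.decode('utf-8') for s in takewhile(bool, bt))
print(loop_result, takewhile_result)  # hello hello
```

One difference to keep in mind: the index variant raises ValueError when the list contains no b'', whereas the loop and takewhile simply consume the whole list.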