How Does Ruby handle bytes/binary?
To make a string that has an arbitrary sequence of bytes, do something like this:
binary_string = "\xE5\xA5\xBD"
The "\x" is a special escape to encode an arbitrary byte from hex, so "\xE5" means byte 0xE5.
Then try sending that string on the socket.
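A minimal sketch of the idea. The original socket setup isn't shown, so `IO.pipe` stands in for the socket here (an assumption made only so the example is self-contained):

```ruby
# The three escaped bytes happen to be the UTF-8 encoding of U+597D.
binary_string = "\xE5\xA5\xBD"

# IO.pipe is only a stand-in for a real socket in this sketch.
reader, writer = IO.pipe
reader.binmode
writer.binmode

writer.write(binary_string)
writer.close

received = reader.read
reader.close

# The bytes arrive exactly as they were written.
received_bytes = received.bytes
```

With a real socket the `write` call would look the same; only the object it is called on changes.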
In ruby, how do I turn a text representation of a byte in to a byte?
So you're using / markers, but you aren't actually asking about regexps, right?
I think this does what you want:
['FA'].pack('H*')
# => "\xFA"
There is no actual byte type in the Ruby stdlib (as far as I know), just Strings, which can be any number of bytes long (in this case, one). A single "byte" is typically represented as a 1-byte-long String in Ruby. Calling #bytesize on a String always returns its length in bytes.
"\xFA".bytesize
# => 1
Your example happens not to be a valid UTF-8 character by itself. Depending on exactly what you're doing and how your environment is set up, your string might end up being tagged with a UTF-8 encoding by default. If you are dealing with binary data and want to make sure the string is tagged as such, you might want to call #force_encoding on it to be sure. It should NOT be necessary when using #pack; the results should already be tagged as ASCII-8BIT (which has a synonym of BINARY; it's basically the "null encoding" used in Ruby for binary data).
['FA'].pack('H*').encoding
# => #<Encoding:ASCII-8BIT>
But if you're dealing with string objects holding what's meant to be binary data, not necessarily valid character data in any encoding, it is useful to know that you may sometimes need to call str.force_encoding("ASCII-8BIT") (or force_encoding("BINARY"), same thing) to make sure your string isn't tagged as a particular text encoding. Otherwise Ruby may complain when you try certain operations on it if it includes bytes that are invalid for that encoding, or in other cases possibly do the wrong thing.
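As a small illustration of the difference the encoding tag makes, the sketch below retags the same single byte and checks its validity. The variable names are made up for the example:

```ruby
# pack("H*") already returns a string tagged ASCII-8BIT (BINARY).
raw = ["FA"].pack("H*")

# Retag a copy as UTF-8: the byte is unchanged, but it is not valid UTF-8.
retagged = raw.dup.force_encoding(Encoding::UTF_8)
utf8_ok = retagged.valid_encoding?

# Retag it back as binary: any byte sequence is valid in this encoding.
binary = retagged.dup.force_encoding(Encoding::BINARY)
binary_ok = binary.valid_encoding?

# force_encoding only changes the tag, never the bytes themselves.
same_bytes = raw.bytes == binary.bytes
```

Here `utf8_ok` is false and `binary_ok` is true, while `same_bytes` confirms that no data was converted along the way.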
Actually for a regexp
Okay, you actually do want a regexp. So we have to take our string we created, and embed it in a regexp. Here's one way:
representation = "FA"
str = [representation].pack("H*")
# => "\xFA"
data = "\x01\xFA\xC2".force_encoding("BINARY")
regexp = Regexp.new(str)
data =~ regexp
# => 1 (matched on byte 1; the first byte of data is byte 0)
You see how I needed the force_encoding there on the data string; otherwise Ruby would default to it being a UTF-8 string (depending on Ruby version and environment setup), and complain that those bytes aren't valid UTF-8.
In some cases you might need to explicitly set the regexp to handle binary data too. The docs for older Ruby versions say you can pass 'n' as an extra argument to Regexp.new to do that (newer versions removed that argument and offer the Regexp::NOENCODING option instead), but I've never done it.
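One caveat worth sketching: a byte taken from arbitrary input may happen to be a regexp metacharacter. The example below is an addition, not part of the original answer, and wraps the byte in `Regexp.escape` before building the regexp. Byte 0x2E (an ASCII dot) is chosen purely for illustration:

```ruby
# String#b returns a copy of the string tagged as ASCII-8BIT.
data = "\x01\xFA\x2E".b

# A byte with no special meaning in a regexp behaves the same either way.
needle = ["FA"].pack("H*")
fa_pos = data =~ Regexp.new(Regexp.escape(needle))

# Byte 0x2E is ".", which matches any byte unless it is escaped.
dot = ["2E"].pack("H*")
unescaped_pos = data =~ Regexp.new(dot)
escaped_pos = data =~ Regexp.new(Regexp.escape(dot))
```

Unescaped, the dot matches at offset 0; escaped, it matches only the literal 0x2E byte at offset 2.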
How to unpack 4 bytes of binary data as 3 byte and 1 byte values?
That's a reasonable way. Another way (that doesn't involve backing up) would be
a, b, c = data.unpack("S>CC") # C doesn't have endianness
ab = (a << 8) + b # parentheses are required: + binds tighter than <<
Since your values are unsigned, you don't need to worry about sign extension when sticking them together.
And for completeness, you could also go in the opposite direction — unpack a single 32-bit int and split it up using bit operations.
ab, = data.unpack("L>")
a, b = ab >> 8, ab & 0xFF
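To check that both approaches agree, here is a small sketch on four sample bytes. The input and the variable names are invented for the example:

```ruby
data = "\x01\x02\x03\x04".b

# Approach 1: 16-bit big-endian value plus two single bytes, then combine.
hi16, lo8, one_byte = data.unpack("S>CC")
three_byte = (hi16 << 8) | lo8

# Approach 2: one 32-bit big-endian value, split with bit operations.
word, = data.unpack("L>")
three_byte_alt = word >> 8
one_byte_alt = word & 0xFF
```

Both routes give 0x010203 for the 3-byte value and 0x04 for the trailing byte.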
Write binary file from text string that represents hex bytes in Ruby
You have to split the string by aligned bytes in the first place.
str.
each_char. # enumerator
each_slice(2). # bytes
map { |h, l| (h.to_i(16) * 16 + l.to_i(16)) }.
pack('C*')
#⇒ "\x00\x11\x04\x05\x94\x19c(\x01\x00\e#q\x000\x03\x81\x01\n"
or, even better:
str.
scan(/../).
map { |b| b.to_i(16) }.
pack('C*')
Now you might dump this to the file using e.g. IO.binwrite.
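Putting the conversion and the write together, a sketch might look like the following. The hex string and the temporary file are placeholders for the example. `[hex].pack("H*")` is shown alongside as a shorter route to the same bytes:

```ruby
require "tempfile"

hex = "0011040594" # sample input, assumed to have an even number of hex digits

# Split into two-digit pairs, convert each to an integer, pack as bytes.
bytes = hex.scan(/../).map { |pair| pair.to_i(16) }.pack("C*")

# pack("H*") decodes the whole hex string in one step.
packed = [hex].pack("H*")

round_trip = nil
Tempfile.create("hex-demo") do |file|
  File.binwrite(file.path, bytes)       # write without any encoding conversion
  round_trip = File.binread(file.path)  # read the raw bytes back
end
```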
Searching Binary Data in Ruby
If I understand your description correctly, the whole file consists of a number of such "blocks" of a fixed structure?
In that case, I suggest scanning them one by one and skipping the ones not of interest to you. So each step should do the following:
- Read 8 bytes (using IO#read or a similar method)
- From the read header, extract the size (first 4 bytes) and the tag (second 4)
  - If the tag is the one you need, skip the following 16 bytes and read size-24 bytes.
  - If the tag is not of interest, skip the following size-16 bytes.
- Repeat.
For skipping bytes, you can use IO#seek.
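The read-header-then-skip loop can be sketched as below. The exact offsets depend on the block layout from the original question, which is not reproduced here, so this sketch uses a simplified, hypothetical layout: a 4-byte big-endian size that counts only the payload, a 4-byte tag, then the payload. The method name `payloads_for` and the tags are invented for the example.

```ruby
require "stringio"

# Collects the payload of every block whose tag matches wanted_tag.
# Hypothetical layout: [4-byte size][4-byte tag][size bytes of payload].
def payloads_for(io, wanted_tag)
  found = []
  until io.eof?
    header = io.read(8)
    break if header.nil? || header.bytesize < 8 # truncated header

    size, tag = header.unpack("Na4")
    if tag == wanted_tag
      found << io.read(size)       # block of interest: read its payload
    else
      io.seek(size, IO::SEEK_CUR)  # not of interest: jump over the payload
    end
  end
  found
end

# Build a small in-memory stream to scan.
blocks = [["SKIP", "xx"], ["DATA", "hello"], ["SKIP", "y"], ["DATA", "!"]]
stream = blocks.map { |tag, body| [body.bytesize, tag, body].pack("Na4a*") }.join
payloads = payloads_for(StringIO.new(stream), "DATA")
```

The same method works on a File opened in binary mode, since it only relies on read, seek and eof?.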
How to convert string to bytes in Ruby?
Ruby already has a String#each_byte method for iterating over the bytes, and String#bytes, which returns the same bytes as an array.
Prior to Ruby 1.9 strings were equivalent to byte arrays, i.e. a character was assumed to be a single byte. That's fine for ASCII text and various text encodings like Win-1252 and ISO-8859-1 but fails badly with Unicode, which we see more and more often on the web. Ruby 1.9+ is Unicode aware, and strings are no longer considered to be made up of bytes, but instead consist of characters, which can be multiple bytes long.
So, if you are trying to manipulate text as single bytes, you'll need to ensure your input is ASCII, or at least a single-byte-based character set. If you might have multi-byte characters, you should use String#each_char, String#split(//), or String#unpack with the U directive.
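A short sketch of the difference between bytes and characters, using a two-character string whose first character takes three bytes in UTF-8 (the sample text is arbitrary):

```ruby
text = "\u{597D}a" # one three-byte character followed by ASCII "a"

byte_values = text.bytes           # one integer per byte
characters  = text.each_char.to_a  # one string per character
via_split   = text.split(//)       # same result as each_char
code_points = text.unpack("U*")    # one Unicode code point per character
```

The string has four bytes but only two characters, so `byte_values` has four entries while the other three results have two each.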
What does // mean in String.split(//)
// is the same as using ''. Either tells split to return characters. You can also usually use chars.
Ruby: How to convert a string to binary and write it to file
Try using double quotes:
data = "BZh91AY&SY\x94$|\x0e\x00\x00\x00\x81\x00\x03"
Then do as sepp2k suggested.
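The reason the quotes matter is that escapes such as \x94 are only interpreted inside double quotes. A quick sketch with a shortened sample string:

```ruby
single = 'BZh\x94' # backslash, x, 9, 4 are kept as four literal characters
double = "BZh\x94" # \x94 becomes the single byte 0x94

single_size = single.bytesize
double_size = double.bytesize
last_byte   = double.bytes.last
```

The single-quoted version is 7 bytes long, while the double-quoted one is 4 bytes and ends in 0x94.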