How Does Ruby Handle Bytes/Binary

How Does Ruby handle bytes/binary?

To make a string that has an arbitrary sequence of bytes, do something like this:

binary_string = "\xE5\xA5\xBD"

The "\x" is a special escape to encode an arbitrary byte from hex, so "\xE5" means byte 0xE5.

Then try sending that string on the socket.
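Before putting the string on a socket, it can help to check what bytes it really holds. A minimal sketch using the same three bytes as above; the variable names are only illustrative:

```ruby
# The same three bytes written with \x escapes.
binary_string = "\xE5\xA5\xBD"

byte_values = binary_string.bytes     # integer value of each byte
byte_count  = binary_string.bytesize  # length in bytes, not characters

# String#b returns a copy tagged as binary (ASCII-8BIT); the bytes are unchanged.
raw = binary_string.b

puts byte_values.inspect  # [229, 165, 189]
puts byte_count           # 3
puts raw.encoding
```

The encoding tag changes how Ruby interprets the bytes, not the bytes themselves, so either string puts the same three bytes on the wire.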

In ruby, how do I turn a text representation of a byte in to a byte?

So you're using / markers, but you aren't actually asking about regexps, right?

I think this does what you want:

['FA'].pack('H*')
# => "\xFA"

As far as I know there is no dedicated byte type in Ruby's standard library, just Strings, which can be any number of bytes long (in this case, one). A single "byte" is typically represented as a 1-byte String. String#bytesize always returns the length in bytes.

"\xFA".bytesize
# => 1

Your example happens not to be a valid UTF-8 character by itself. Depending on exactly what you're doing and how your environment is set up, your string might end up tagged with a UTF-8 encoding by default. If you are dealing with binary data and want to make sure the string is tagged as such, you might want to call #force_encoding on it. That should NOT be necessary when using #pack here: the result should already be tagged as ASCII-8BIT (which has the alias BINARY; it's basically the "null encoding" Ruby uses for binary data).

['FA'].pack('H*').encoding
# => #<Encoding:ASCII-8BIT>

But if you're dealing with string objects holding what's meant to be binary data, not necessarily valid character data in any encoding, it is useful to know that you may sometimes need to call str.force_encoding("ASCII-8BIT") (or force_encoding("BINARY"), same thing). That makes sure your string isn't tagged as a particular text encoding. Otherwise Ruby may complain when you try certain operations on a string that includes invalid bytes for its encoding, or in other cases possibly do the wrong thing.
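A small sketch of the difference the encoding tag makes. The UTF-8 tag is forced explicitly here so the result does not depend on your environment; the variable names are only illustrative:

```ruby
# One byte, 0xFA, explicitly tagged as UTF-8 (dup so the literal itself is not modified).
tagged_utf8 = "\xFA".dup.force_encoding("UTF-8")

# String#b returns a copy of the same byte tagged as ASCII-8BIT (binary).
as_binary = tagged_utf8.b

utf8_valid   = tagged_utf8.valid_encoding?  # false: 0xFA alone is not valid UTF-8
binary_valid = as_binary.valid_encoding?    # true: any byte sequence is valid binary

# The bytes are identical; only the encoding tag differs.
same_bytes = tagged_utf8.bytes == as_binary.bytes

puts utf8_valid
puts binary_valid
puts same_bytes
```

String#b is a convenient alternative to force_encoding when you want a binary copy and would rather leave the original string alone.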

Actually for a regexp

Okay, you actually do want a regexp. So we have to take our string we created, and embed it in a regexp. Here's one way:

representation = "FA"
str = [representation].pack("H*")
# => "\xFA"
data = "\x01\xFA\xC2".force_encoding("BINARY")
regexp = Regexp.new(str)
data =~ regexp
# => 1 (matched on byte 1; the first byte of data is byte 0)

You see how I needed the force_encoding there on the data string, otherwise ruby would default to it being a UTF-8 string (depending on ruby version and environment setup), and complain that those bytes aren't valid UTF-8.

In some cases you might need to explicitly mark the regexp as binary too. You can do that with the n flag on a literal (/.../n) or by passing Regexp::NOENCODING as the options argument to Regexp.new, but I've never needed to.
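One thing to watch for when building a regexp from arbitrary bytes: some byte values are regexp metacharacters. A sketch, assuming a hypothetical input of "2E" (byte 0x2E, which is "." in ASCII), that uses Regexp.escape and Regexp::NOENCODING:

```ruby
# Hypothetical input: the hex text "2E" is byte 0x2E, i.e. "." in ASCII.
needle = ["2E"].pack("H*")
data   = "\x01\xFA\x2E".b   # binary copy of the sample data

# Unescaped, "." is a wildcard, so it matches the very first byte.
unescaped_index = data =~ Regexp.new(needle)

# Escaped, only a literal 0x2E byte matches.
binary_regexp = Regexp.new(Regexp.escape(needle), Regexp::NOENCODING)
escaped_index = data =~ binary_regexp

puts unescaped_index  # 0
puts escaped_index    # 2
```

If the bytes come from user input or a file, escaping them first is probably the safer default.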

How to unpack 4 bytes of binary data as 3 byte and 1 byte values?

That's a reasonable way. Another way (that doesn't involve backing up) would be

a, b, c = data.unpack("S>CC") # C doesn't have endianness
ab = (a << 8) | b # parentheses matter: + binds tighter than << in Ruby

Since your values are unsigned, you don't need to worry about sign extension when sticking them together.

And for completeness, you could also go in the opposite direction — unpack a single 32-bit int and split it up using bit operations.

ab, = data.unpack("L>")
a, b = ab >> 8, ab & 0xFF
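Here is a worked example of both approaches on a hypothetical 4-byte input, assuming the 3-byte value comes first and is big-endian. The variable names are mine:

```ruby
# Hypothetical input: 3-byte big-endian value 0x010203 followed by the byte 0x04.
data = "\x01\x02\x03\x04".b

# Approach 1: unpack 16 + 8 + 8 bits, then join the first two pieces.
hi16, lo8, last = data.unpack("S>CC")
three_byte = (hi16 << 8) | lo8   # shift first, then merge in the low byte

# Approach 2: unpack one 32-bit big-endian integer and split it with bit operations.
word, = data.unpack("L>")
three_byte_alt = word >> 8
last_alt       = word & 0xFF

puts three_byte      # 66051 (0x010203)
puts last            # 4
puts three_byte_alt  # 66051
puts last_alt        # 4
```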

Write binary file from text string that represents hex bytes in Ruby

You have to split the string by aligned bytes in the first place.

str.
  each_char.     # enumerator
  each_slice(2). # bytes
  map { |h, l| (h.to_i(16) * 16 + l.to_i(16)) }.
  pack('C*')

 #⇒ "\x00\x11\x04\x05\x94\x19c(\x01\x00\e#q\x000\x03\x81\x01\n"

or, even better:

str.
  scan(/../).
  map { |b| b.to_i(16) }.
  pack('C*')

Now you can dump this to a file using e.g. IO.binwrite (also available as File.binwrite).
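A shorter route that should give the same bytes is to hand the whole hex string to pack with the H* directive. A sketch with a hypothetical input string and a temporary file standing in for the real output path:

```ruby
require "tempfile"

hex = "0011040594"   # hypothetical input: hex digits, two per byte

# H* reads the string as hex digits, high nibble first.
bytes = [hex].pack("H*")

# The scan-based version from above, for comparison.
via_scan = hex.scan(/../).map { |b| b.to_i(16) }.pack("C*")

# Write in binary mode and read the file back to check the round trip.
read_back = nil
Tempfile.create("hex_demo") do |f|
  File.binwrite(f.path, bytes)
  read_back = File.binread(f.path)
end

puts bytes.bytes.inspect  # [0, 17, 4, 5, 148]
puts via_scan == bytes    # true
puts read_back == bytes   # true
```

Both versions assume an even number of hex digits. With an odd count, scan(/../) silently drops the last digit, so validate the input first if that matters.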

Searching Binary Data in Ruby

If I understand your description correctly, the whole file consists of a number of such "blocks" with a fixed structure?

In that case, I suggest scanning the blocks one by one and skipping the ones that are not of interest to you. Each step should do the following:

Read 8 bytes (using IO#read or a similar method)
From the read header, extract the size (first 4 bytes), and the tag (second 4)
1. If the tag is the one you need, skip the following 16 bytes and read size-24 bytes.
2. If the tag is not of interest, skip the following size-8 bytes.
Repeat.

For skipping bytes, you can use IO#seek.
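The loop could be sketched roughly like this. Everything about the format here is an assumption: the size field is taken to be a 4-byte big-endian unsigned integer that counts the whole block including its 8-byte header (so an unwanted block has size - 8 bytes left after the header), and the tag is taken to be 4 raw bytes. The method names and sample data are hypothetical, and a StringIO stands in for the file:

```ruby
require "stringio"

# Returns the payload of every block whose tag equals wanted_tag.
# Assumed layout: 4-byte big-endian size, 4-byte tag, 16 skipped bytes, payload.
def payloads_for(io, wanted_tag, header_size: 8, prefix_size: 16)
  payloads = []
  while (header = io.read(header_size)) && header.bytesize == header_size
    size, tag = header.unpack("L>a4")
    raise "corrupt block size #{size}" if size < header_size

    if tag == wanted_tag
      io.seek(prefix_size, IO::SEEK_CUR)                    # skip the 16 bytes
      payloads << io.read(size - header_size - prefix_size) # read size-24 bytes
    else
      io.seek(size - header_size, IO::SEEK_CUR)             # skip the rest of the block
    end
  end
  payloads
end

# Builds one sample block in the assumed layout (for the demo only).
def build_block(tag, payload, header_size: 8, prefix_size: 16)
  size = header_size + prefix_size + payload.bytesize
  [size, tag].pack("L>a4") + ("\x00" * prefix_size).b + payload.b
end

sample = build_block("SKIP", "ignored") +
         build_block("WANT", "hello") +
         build_block("WANT", "bye")

found = payloads_for(StringIO.new(sample), "WANT")
puts found.inspect  # ["hello", "bye"]
```

With a real file you would open it in binary mode, e.g. File.open(path, "rb"), and pass the file object in place of the StringIO. Adjust the unpack directive and the offsets if your format is little-endian or counts size differently.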

How to convert string to bytes in Ruby?

Ruby already has String#each_byte, which yields each byte as an integer, and String#bytes, which returns the same values as an array.

Prior to Ruby 1.9 strings were equivalent to byte arrays, i.e. a character was assumed to be a single byte. That's fine for ASCII text and various text encodings like Win-1252 and ISO-8859-1 but fails badly with Unicode, which we see more and more often on the web. Ruby 1.9+ is Unicode aware, and strings are no longer considered to be made up of bytes, but instead consist of characters, which can be multiple bytes long.

So, if you are trying to manipulate text as single bytes, you'll need to ensure your input is ASCII, or at least in a single-byte character set. If you might have multi-byte characters, you should use String#each_char, String#split(//), or String#unpack with the U directive (e.g. unpack("U*"), which returns code points).
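A quick illustration of how bytes and characters diverge once a multi-byte character is involved. The sample string is just an example:

```ruby
text = "h\u00E9llo"   # "héllo": the é takes two bytes in UTF-8

char_count  = text.length        # counts characters
byte_count  = text.bytesize      # counts bytes
byte_values = text.bytes         # one integer per byte
characters  = text.chars         # one string per character
code_points = text.unpack("U*")  # one integer per Unicode code point

puts char_count           # 5
puts byte_count           # 6
puts byte_values.inspect  # [104, 195, 169, 108, 108, 111]
puts code_points.inspect  # [104, 233, 108, 108, 111]
```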

What does // mean in String.split(//)

// is the same as using ''. Either tells split to return characters. You can also usually use chars.
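For example, all three forms give the same result on a simple string:

```ruby
word = "abc"

by_regexp = word.split(//)  # empty regexp: split between every character
by_string = word.split("")  # empty string: same effect
by_chars  = word.chars      # usually the clearest way to say it

puts by_regexp.inspect  # ["a", "b", "c"]
puts by_string.inspect  # ["a", "b", "c"]
puts by_chars.inspect   # ["a", "b", "c"]
```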

Ruby: How to convert a string to binary and write it to file

Try using double quotes:

data = "BZh91AY&SY\x94$|\x0e\x00\x00\x00\x81\x00\x03"

Then do as sepp2k suggested.
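The reason the quotes matter: Ruby only interprets \x escapes inside double quotes. A minimal comparison:

```ruby
single = '\x94'   # four characters: backslash, x, 9, 4
double = "\x94"   # one byte with the value 0x94

single_size = single.bytesize
double_size = double.bytesize
double_byte = double.bytes

puts single_size          # 4
puts double_size          # 1
puts double_byte.inspect  # [148]
```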
