Read Binary File as String in Ruby

Read binary file as string in Ruby

First, open the file in binary mode ("rb"). Then you can read the entire file with a single call:

file = File.open("path-to-file.tar.gz", "rb")
contents = file.read

That will get you the entire file in a string.

After that, you probably want to call file.close. If you don't, the file won't be closed until it is garbage-collected, which wastes a small amount of system resources (an open file descriptor) for as long as it stays open.
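Core Ruby offers two ways to avoid the manual close. The block form of File.open closes the file when the block exits. File.binread opens the file in binary mode, reads it, and closes it in one call. The helper names and the temporary file standing in for path-to-file.tar.gz are illustrative assumptions:

```ruby
require "tempfile"

# Block form: File.open yields the file and closes it when the block returns,
# even if reading raises an exception.
def read_binary(path)
  File.open(path, "rb") { |file| file.read }
end

# File.binread opens the file in binary mode, reads all of it and closes it.
def read_binary_short(path)
  File.binread(path)
end

# Usage: a temporary file (hypothetical stand-in for path-to-file.tar.gz)
# holding the first four bytes of a gzip header.
tmp = Tempfile.new("demo")
tmp.binmode
tmp.write("\x1F\x8B\x08\x00".b)
tmp.close

contents = read_binary(tmp.path)
p contents.bytesize                          # => 4
p contents.encoding == Encoding::ASCII_8BIT  # => true
p read_binary_short(tmp.path) == contents    # => true
```

Either form gives the same ASCII-8BIT string as the two-line version above, without leaving the file open.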

Automatically open a file as binary with Ruby

Actually, the previous answer by Alex D is incomplete. While it's true that Unix file systems have no separate "text" mode, Ruby does distinguish between opening files in binary and non-binary mode:

s = File.open('/tmp/test.jpg', 'r') { |io| io.read }
s.encoding
=> #<Encoding:UTF-8>

is different from (note the "rb")

s = File.open('/tmp/test.jpg', 'rb') { |io| io.read }
s.encoding
=> #<Encoding:ASCII-8BIT>

The latter, as the docs say, sets the external encoding to ASCII-8BIT, which tells Ruby not to try to interpret the result as UTF-8. You can get the same result by setting the encoding explicitly with s.force_encoding('ASCII-8BIT'). That call relabels the string without changing its bytes. On Windows, however, "r" mode also translates line endings, so only "rb" keeps the bytes intact. This matters whenever you read binary data into a string and pass it around (e.g. saving it to a database).
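Here is a small sketch of the difference. It uses a temporary file instead of /tmp/test.jpg. It also uses an explicit "r:UTF-8" mode instead of plain "r", so the result does not depend on your environment's default external encoding (an assumption of this example):

```ruby
require "tempfile"

# Write four bytes that are not valid UTF-8 (the start of a JPEG file).
tmp = Tempfile.new(["test", ".jpg"])
tmp.binmode
tmp.write("\xFF\xD8\xFF\xE0".b)
tmp.close

# Text mode with an explicit UTF-8 external encoding: the data is tagged
# UTF-8 even though the bytes are not valid UTF-8.
as_text = File.open(tmp.path, "r:UTF-8") { |io| io.read }
puts as_text.encoding          # prints UTF-8
puts as_text.valid_encoding?   # prints false

# Binary mode: the string is tagged ASCII-8BIT, where any byte sequence is valid.
as_binary = File.open(tmp.path, "rb") { |io| io.read }
puts as_binary.valid_encoding? # prints true

# force_encoding relabels the string in place (hence the dup); the bytes stay the same.
relabelled = as_text.dup.force_encoding("ASCII-8BIT")
puts relabelled == as_binary   # prints true
```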

Extract hex strings from binary file in Ruby

In order to get the bytes starting with 20 and ending with 00 you need to change the regex like this:

next unless line =~ /(.{8}2d.{4}2d.{4})20(.{4}3a.{4}3a.{4})|^20(.*?0?)0{2}/

Basically I changed only the last part of the regex from (^20.*) to ^20(.*?0?)0{2}.
Here's the explanation:

  • starting from 20 - ^20
  • match as little as possible - .*?
  • until you get to two consecutive 0s - 0{2}
  • the 0? after .*? handles the case where you have X0 00

Also, I'm not including 20 in the captured group, since you remove it later in the code anyway. That means you can drop the .gsub(/20/, '') from

p $3.gsub(/20/,"").gsub(/../) { |b| b.hex.chr }

and simply write:

p $3.gsub(/../) { |b| b.hex.chr }
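You can try the modified regex on its own. The sample lines are made up, and payload_hex and decode_hex are hypothetical helper names. The assumed input is a line of lowercase hex digits with no spaces:

```ruby
# The regex from the answer above. Only its last alternative, ^20(.*?0?)0{2},
# is exercised here, so the payload lands in the third capture group.
PATTERN = /(.{8}2d.{4}2d.{4})20(.{4}3a.{4}3a.{4})|^20(.*?0?)0{2}/

# Hypothetical helper: the hex text between the leading 20 and the 00
# terminator, or nil when the line does not match that alternative.
def payload_hex(line)
  m = PATTERN.match(line)
  m && m[3]
end

# Turn each pair of hex digits into a character, as in the answer's last line.
def decode_hex(hex)
  hex.gsub(/../) { |b| b.hex.chr }
end

hello = payload_hex("2048656c6c6f00")   # => "48656c6c6f"
puts decode_hex(hello)                  # prints Hello
p payload_hex("20a000")                 # => "a0" (the X0 00 case)
```

The pattern works on characters rather than on byte pairs, so a 0 at an odd offset can end the match early. For the bytes 20 10 0a 00 (the line "20100a00") it captures only "1".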

Write binary file from text string that represents hex bytes in Ruby

You have to split the string into aligned bytes (pairs of hex digits) first.

str.
  each_char.      # enumerator
  each_slice(2).  # bytes
  map { |h, l| h.to_i(16) * 16 + l.to_i(16) }.
  pack('C*')

#⇒ "\x00\x11\x04\x05\x94\x19c(\x01\x00\e#q\x000\x03\x81\x01\n"

or, even better:

str.
  scan(/../).
  map { |b| b.to_i(16) }.
  pack('C*')

Now you can write this to a file using, e.g., IO.binwrite.
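Here is a sketch of the whole round trip under assumed inputs (the hex string and the output file are made up). It also shows [hex].pack("H*"), a core-Ruby alternative to the scan/map/pack chain that is not part of the answer above:

```ruby
require "tempfile"

# A hypothetical hex string, two hex digits per byte.
hex = "0011040594"

# The scan/map/pack approach described above.
bytes = hex.scan(/../).map { |b| b.to_i(16) }.pack("C*")

# Array#pack with the "H*" directive (high nibble first) does the same
# conversion in one call; this is an alternative, not what the answer used.
same = [hex].pack("H*")

# File.binwrite opens the file in binary mode, so no newline or encoding
# conversion touches the bytes.
out = Tempfile.new("out.bin")
out.close
File.binwrite(out.path, bytes)

p bytes == same                  # => true
p File.binread(out.path).bytes   # => [0, 17, 4, 5, 148]
```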

Ruby: How to convert a string to binary and write it to file

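Try using double quotes. In a single-quoted string, escape sequences such as \x94 are not interpreted, so each one stays as four literal characters instead of becoming a single byte: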

data = "BZh91AY&SY\x94$|\x0e\x00\x00\x00\x81\x00\x03"

Then do as sepp2k suggested.
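This minimal example shows why the quotes matter, then writes the bytes out one possible way. Using File.binwrite and a temporary output file is an assumption here, not necessarily what sepp2k suggested:

```ruby
require "tempfile"

# Single quotes keep \x94 as four characters; double quotes turn it into one byte.
single = 'BZh91AY&SY\x94$|'
double = "BZh91AY&SY\x94$|"
p single.bytesize   # => 16
p double.bytesize   # => 13

# The full data from above, marked as binary with String#b.
data = "BZh91AY&SY\x94$|\x0e\x00\x00\x00\x81\x00\x03".b

# Hypothetical output file; File.binwrite writes the bytes unchanged.
out = Tempfile.new("data.bz2")
out.close
File.binwrite(out.path, data)
p File.binread(out.path) == data   # => true
```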

Read binary file in chunks of different size with Ruby

Just surround:

puts "############### Chunk No. 1 ######################"

chunkheader = z.read(16)
chunksize = z.read(4).unpack('H*')[0].hex
data = z.read(chunksize).unpack('H*')

puts chunkheader.unpack('H*')
puts chunksize
puts data

with a loop:

while chunkheader = z.read(16) do
  puts "############### Chunk ######################"
  chunksize = z.read(4).unpack('H*')[0].hex
  data = z.read(chunksize).unpack('H*')

  puts chunkheader.unpack('H*')
  puts chunksize
  puts data
end

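The loop above terminates once no data remains in the file: at end of file, z.read(16) returns nil, which ends the while loop. Note that the snippet is error-prone in general, since it assumes the file is not corrupted. If a chunk header reports the wrong number of bytes, the code can crash (unpack called on the nil that read returns at end of file) or silently misread the remaining data.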

But in your case it seems to be ok.
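If you do need to cope with damaged files, a sketch like the following checks every read before using it. The helper name each_chunk and the chunk layout (16-byte header, 4-byte big-endian size, data) are assumptions inferred from the code above. unpack1("N") reads the size as a 32-bit big-endian integer, which gives the same value as unpack('H*')[0].hex for four bytes:

```ruby
require "stringio"

# Hypothetical helper: yields header, size and data for each chunk, assuming
# the layout used above (16-byte header, 4-byte big-endian size, then data).
# Unlike the plain loop, it raises EOFError on a truncated chunk instead of
# crashing on nil or silently returning short data.
def each_chunk(io)
  while (header = io.read(16))
    raise EOFError, "truncated chunk header" if header.bytesize < 16

    size_field = io.read(4)
    raise EOFError, "missing chunk size" if size_field.nil? || size_field.bytesize < 4

    size = size_field.unpack1("N")   # same value as unpack('H*')[0].hex
    data = io.read(size) || "".b     # read returns nil at EOF when size > 0
    if data.bytesize < size
      raise EOFError, "chunk claims #{size} bytes, only #{data.bytesize} left"
    end

    yield header, size, data
  end
end

# Usage with two in-memory chunks; a File opened with "rb" works the same way.
sample = ("A" * 16 + [3].pack("N") + "abc" + "B" * 16 + [2].pack("N") + "\x01\x02").b

each_chunk(StringIO.new(sample)) do |header, size, data|
  puts "############### Chunk ######################"
  puts header.unpack1("H*")
  puts size
  puts data.unpack1("H*")
end
```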

Using binary data (strings in utf-8) from external file

If your file contains the literal escaped string:

\u306b\u3064\u3044\u3066

Then you will need to unescape it after reading. Ruby does this for you with string literals, which is why the second case worked for you. The following is taken from the answer to "Is this the best way to unescape unicode escape sequences in Ruby?":

file = "c:\\...\\vlmList_unicode.txt" # \u306b\u3064\u3044\u3066
data = File.open(file, 'rb') { |io|
  contents = io.read.gsub(/\\u([\da-fA-F]{4})/) { |m|
    [$1].pack("H*").unpack("n*").pack("U*")
  }
  contents.split(/\t/)
}
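You can try the same substitution without the original file. The helper name unescape_unicode_text and the tab-separated sample contents are hypothetical. For each \uXXXX escape, pack("H*") turns the four hex digits into two bytes. unpack("n*") reads those bytes back as a big-endian 16-bit code point, and pack("U*") encodes it as UTF-8. Only four-digit escapes are handled, as the regex shows:

```ruby
require "tempfile"

# Same substitution as above, wrapped in a hypothetical helper.
def unescape_unicode_text(text)
  text.gsub(/\\u([\da-fA-F]{4})/) { [$1].pack("H*").unpack("n*").pack("U*") }
end

# Hypothetical file contents: two tab-separated fields of escaped text,
# built with single quotes so the backslashes stay literal.
tmp = Tempfile.new("vlmList_unicode.txt")
tmp.write('\u306b\u3064' + "\t" + '\u3044\u3066')
tmp.close

fields = File.open(tmp.path, "rb") { |io| unescape_unicode_text(io.read).split(/\t/) }
puts fields   # the two fields, one per line
```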

Alternatively, if you would like to make it more readable, extract the substitution into a new method and add it to the String class:

class String
  def unescape_unicode
    self.gsub(/\\u([\da-fA-F]{4})/) { |m|
      [$1].pack("H*").unpack("n*").pack("U*")
    }
  end
end

Then you can call:

file = "c:\\...\\vlmList_unicode.txt" # \u306b\u3064\u3044\u3066
data = File.open(file, 'rb') { |io|
  io.read.unescape_unicode.split(/\t/)
}

