Read binary file as string in Ruby
First, open the file in binary mode. Then you can read the entire file in with a single call.
file = File.open("path-to-file.tar.gz", "rb")
contents = file.read
That will get you the entire file in a string.
After that, you probably want to call file.close. If you don’t do that, file won’t be closed until it is garbage-collected, so it would be a slight waste of system resources while it is open.
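To avoid the manual close, the block form of File.open or File.binread can be used instead. The following is a minimal sketch; the sample bytes and the temporary file are assumptions made only so the example runs on its own.

```ruby
require "tempfile"

# Hypothetical sample data; any byte sequence would do.
sample = "\x1F\x8B\x08\x00binary\x00data".b

# Create a throwaway file so the example is self-contained.
tmp = Tempfile.new("binread-demo")
tmp.binmode
tmp.write(sample)
tmp.close

# Block form: the file is closed when the block returns, even on an exception.
block_contents = File.open(tmp.path, "rb") { |f| f.read }

# File.binread opens in binary mode, reads everything, and closes the file.
binread_contents = File.binread(tmp.path)

tmp.unlink
```

Both variables end up holding the same bytes, and neither approach leaves an open file handle behind.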
Automatically open a file as binary with Ruby
Actually, the previous answer by Alex D is incomplete. While it's true that there is no "text" mode in Unix file systems, Ruby does distinguish between opening files in binary and non-binary mode:
s = File.open('/tmp/test.jpg', 'r') { |io| io.read }
s.encoding
=> #<Encoding:UTF-8>
is different from (note the "rb")
s = File.open('/tmp/test.jpg', 'rb') { |io| io.read }
s.encoding
=> #<Encoding:ASCII-8BIT>
The latter, as the docs say, sets the external encoding to ASCII-8BIT, which tells Ruby not to attempt to interpret the result as UTF-8. You can achieve the same thing by setting the encoding explicitly with s.force_encoding('ASCII-8BIT'). This is key if you want to read binary data into a string and move it around (e.g. saving it to a database).
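As a rough illustration of what the encoding label changes, the sketch below uses an in-memory string instead of /tmp/test.jpg; the four sample bytes are an assumption. Note that force_encoding only relabels the bytes, so it would not undo any newline translation that text mode may perform on Windows.

```ruby
# Hypothetical sample: the first four bytes of a typical JPEG file.
raw = "\xFF\xD8\xFF\xE0".b

# Simulate a text-mode read by labelling the same bytes as UTF-8.
as_utf8 = raw.dup.force_encoding("UTF-8")
utf8_valid = as_utf8.valid_encoding? # these bytes are not valid UTF-8

# force_encoding changes only the label; it mutates the receiver in place,
# hence the dup calls above and below.
relabelled = as_utf8.dup.force_encoding("ASCII-8BIT")
same_bytes = relabelled.bytes == raw.bytes
```

The bytes never change; only Ruby's interpretation of them does, which is why the binary label is the safer choice for data that is not text.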
Extract hex strings from binary file in Ruby
In order to get the bytes starting with 20 and ending with 00 you need to change the regex like this:
next unless line =~ /(.{8}2d.{4}2d.{4})20(.{4}3a.{4}3a.{4})|^20(.*?0?)0{2}/
Basically I changed only the last part of the regex from (^20.*) to ^20(.*?0?)0{2}.
Here's the explanation:
- starting from 20 - ^20
- match as little as possible - .*?
- until you get to two consecutive 0s - 0{2}
- the 0? after .*? handles the case where you have X0 00
Also, I'm not including 20 in the captured group since you are removing it later in the code anyway, so you can remove the .gsub(/20/, '') in
p $3.gsub(/20/,"").gsub(/../) { |b| b.hex.chr }
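The last alternative of the pattern can be tried in isolation. This sketch uses only the ^20(.*?0?)0{2} part, so the capture is $1 rather than $3, and both hex lines are made-up samples. Keep in mind that the pattern works on hex digits rather than on aligned byte pairs, so a pair such as 10 05 could still end the match early.

```ruby
# Decode a string of hex digit pairs into the characters they represent.
def decode_hex(hex)
  hex.gsub(/../) { |b| b.hex.chr }
end

# Hypothetical line: 0x20, then "Hello", then a 0x00 terminator.
line = "2048656c6c6f00"
text = decode_hex($1) if line =~ /^20(.*?0?)0{2}/

# The "X0 00" case: the payload's last byte (0x50) ends in a 0 digit.
tricky = "20485000"
with_fix    = tricky[/^20(.*?0?)0{2}/, 1] # keeps the trailing 0 of "50"
without_fix = tricky[/^20(.*?)0{2}/, 1]   # cuts the payload one digit short
```

Here with_fix holds the complete payload "4850", while without_fix stops at "485" because the lazy match ends at the first pair of zeros it finds.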
Write binary file from text string that represents hex bytes in Ruby
You have to split the string by aligned bytes in the first place.
str.
each_char. # enumerator
each_slice(2). # bytes
map { |h, l| (h.to_i(16) * 16 + l.to_i(16)) }.
pack('C*')
#⇒ "\x00\x11\x04\x05\x94\x19c(\x01\x00\e#q\x000\x03\x81\x01\n"
or, even better:
str.
scan(/../).
map { |b| b.to_i(16) }.
pack('C*')
Now you might dump this to the file using e.g. IO.binwrite.
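Putting the conversion and the write together might look like the sketch below. The hex text and the temporary file are assumptions for the sake of a runnable example; Array#pack with the "H*" directive is shown as a shorter alternative to scanning pairs by hand.

```ruby
require "tempfile"

hex = "0011040594" # hypothetical input: ten hex digits, i.e. five bytes

# Pair-wise conversion, as in the answer above.
bytes = hex.scan(/../).map { |b| b.to_i(16) }.pack("C*")

# Alternative: "H*" packs a whole hex string (high nibble first) in one step.
shortcut = [hex].pack("H*")

# Write the bytes in binary mode and read them back to check the round trip.
tmp = Tempfile.new("hex-demo")
tmp.close
written = File.binwrite(tmp.path, bytes) # returns the number of bytes written
read_back = File.binread(tmp.path)
tmp.unlink
```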
Ruby: How to convert a string to binary and write it to file
Try using double quotes:
data = "BZh91AY&SY\x94$|\x0e\x00\x00\x00\x81\x00\x03"
Then do as sepp2k suggested.
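The reason the quotes matter is that escape sequences such as \x94 are only interpreted inside double-quoted literals. A small sketch with two made-up bytes shows the difference:

```ruby
single = '\x94\x0e' # backslash, "x", "9", "4", ... kept as literal characters
double = "\x94\x0e" # two actual bytes: 0x94 and 0x0e

single_size = single.bytesize
double_size = double.bytesize
double_bytes = double.bytes
```

The single-quoted string is eight bytes of plain text, whereas the double-quoted one contains the two raw bytes the escapes describe.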
Read binary file in chunks of different size with ruby
Just surround:
puts "############### Chunk No. 1 ######################"
chunkheader = z.read(16)
chunksize = z.read(4).unpack('H*')[0].hex
data = z.read(chunksize).unpack('H*')
puts chunkheader.unpack('H*')
puts chunksize
puts data
with loop:
while chunkheader = z.read(16) do
  puts "############### Chunk ######################"
  chunksize = z.read(4).unpack('H*')[0].hex
  data = z.read(chunksize).unpack('H*')
  puts chunkheader.unpack('H*')
  puts chunksize
  puts data
end
The loop above terminates when no data remains in the file. Please note that the snippet above is error-prone in general, since it expects the file not to be corrupted and will fail if the last chunk header reports a wrong number of bytes.
But in your case it seems to be OK.
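If the file might be truncated, one option is to check the length of every read. The sketch below assumes the layout implied by the loop (a 16-byte header, a 4-byte big-endian size, then the payload); read_chunks is a hypothetical helper name, and StringIO stands in for the real file so the example runs on its own.

```ruby
require "stringio"

# Reads [16-byte header][4-byte big-endian size][payload] records until EOF.
# Raises EOFError if a record is cut short instead of returning partial data.
def read_chunks(io)
  chunks = []
  while (header = io.read(16))
    raise EOFError, "truncated chunk header" if header.bytesize < 16

    size_field = io.read(4)
    raise EOFError, "truncated size field" if size_field.nil? || size_field.bytesize < 4

    size = size_field.unpack1("N") # 32-bit unsigned, big-endian
    data = io.read(size)
    raise EOFError, "truncated payload" if data.nil? || data.bytesize < size

    chunks << [header, data]
  end
  chunks
end

# Hypothetical input: one chunk with a 3-byte payload, one with an empty payload.
stream = StringIO.new(("A" * 16) + [3].pack("N") + "abc" + ("B" * 16) + [0].pack("N"))
chunks = read_chunks(stream)
```

unpack1("N") reads the same value as unpack('H*')[0].hex does for a 4-byte field, without the detour through a hex string.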
Using binary data (strings in utf-8) from external file
If your file contains the literal escaped string:
\u306b\u3064\u3044\u3066
Then you will need to unescape it after reading. Ruby does this for you with string literals, which is why the second case worked for you. Taken from the answer to "Is this the best way to unescape unicode escape sequences in Ruby?", you can use this:
file = "c:\\...\\vlmList_unicode.txt" #\u306b\u3064\u3044\u3066
data = File.open(file, 'rb') { |io|
  contents = io.read.gsub(/\\u([\da-fA-F]{4})/) { |m|
    [$1].pack("H*").unpack("n*").pack("U*")
  }
  contents.split(/\t/)
}
Alternatively, if you would like to make it more readable, extract the substitution into a new method and add it to the String class:
class String
  def unescape_unicode
    self.gsub(/\\u([\da-fA-F]{4})/) { |m|
      [$1].pack("H*").unpack("n*").pack("U*")
    }
  end
end
Then you can call:
file = "c:\\...\\vlmList_unicode.txt" #\u306b\u3064\u3044\u3066
data = File.open(file, 'rb') { |io|
  io.read.unescape_unicode.split(/\t/)
}
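To try the substitution without a file or a change to the String class, a standalone helper can be run on an in-memory string. This is a sketch, not the code from the linked answer: it swaps the pack/unpack chain for $1.hex with pack("U"), and the tab-separated sample text is an assumption. Because the pattern matches exactly four hex digits, characters written as surrogate pairs would not be combined.

```ruby
# Replace each literal \uXXXX sequence with the character it names.
def unescape_unicode(text)
  text.gsub(/\\u([\da-fA-F]{4})/) { [$1.hex].pack("U") }
end

# Hypothetical file contents: escaped text, a tab, then a plain field.
raw = "\\u306b\\u3064\\u3044\\u3066\tplain"

fields = unescape_unicode(raw).split(/\t/)
```

After this runs, the first field contains the four decoded Japanese characters and the second field is left untouched.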