Ruby: How to Determine If File Being Read Is Binary or Text

gem install ptools
require 'ptools'
File.binary?(file)

Is there a way to find out presence of binary code in an uploaded file or text area in ruby?

You can't reliably validate this submission in JavaScript, because data can always be submitted without going through the JavaScript checks. Validate it in Rails instead. Your validation could look something like this (not tested):

class YourModel < ActiveRecord::Base
  validate :proper_string

  # Rejects the submission when its bytes are not valid UTF-8.
  def proper_string
    errors.add(:submission, "Only text allowed") unless submission.force_encoding("UTF-8").valid_encoding?
  end
end

JavaScript validation is only worth adding for usability. Binary data arriving in this field looks more like a hacking attempt than a user mistake.
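The same check can be tried outside Rails. The following is a minimal sketch; `utf8_text?` is a hypothetical helper name, not part of Rails or Ruby. It calls `dup` before `force_encoding` so the caller's string is left untouched.

```ruby
# Hypothetical helper (not part of Rails): true when the bytes form valid UTF-8.
def utf8_text?(input)
  # dup first, because force_encoding changes the receiver's encoding tag in place
  input.dup.force_encoding(Encoding::UTF_8).valid_encoding?
end

puts utf8_text?("plain text")          # ordinary text
puts utf8_text?("\xFF\xD8\xFF\xE0".b)  # leading bytes of a JPEG file
```

Note that this only rejects byte sequences that are not valid UTF-8. Control characters such as NUL are valid UTF-8, so a stricter definition of "text" would need an additional check.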

How do I distinguish between 'binary' and 'text' files?

The spreadsheet software my company makes reads a number of binary file formats as well as text files.

We first look at the first few bytes for a magic number which we recognize. If we do not recognize the magic number of any of the binary types we read, then we look at up to the first 2K bytes of the file to see whether it appears to be a UTF-8, UTF-16 or a text file encoded in the current code page of the host operating system. If it passes none of these tests, we assume that it is not a file we can deal with and throw an appropriate exception.
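The procedure described above might be sketched in Ruby roughly as follows. This is an illustration under stated assumptions, not the product's actual code: the signature table, the method names, and the returned symbols are all hypothetical, UTF-16 is recognized only by its byte-order mark, and the code-page check is left out.

```ruby
# Hypothetical signature table; a real application would list the formats it reads.
MAGIC_NUMBERS = {
  "PK\x03\x04".b        => :zip,
  "\x89PNG\r\n\x1A\n".b => :png,
  "%PDF-".b             => :pdf
}.freeze

SAMPLE_SIZE = 2048 # "up to the first 2K bytes"

# True when the sample is valid UTF-8. A full-size sample may end in the
# middle of a multi-byte character, so up to 3 trailing bytes are ignored.
def utf8_sample?(sample)
  max_trim = sample.bytesize == SAMPLE_SIZE ? 3 : 0
  (0..max_trim).any? do |trim|
    sample.byteslice(0, sample.bytesize - trim)
          .force_encoding(Encoding::UTF_8)
          .valid_encoding?
  end
end

# Classifies the leading bytes of a file: magic numbers first, then text encodings.
def sniff(sample)
  sample = sample.b # compare raw bytes regardless of the string's encoding tag
  MAGIC_NUMBERS.each { |magic, type| return type if sample.start_with?(magic) }
  return :utf16 if sample.start_with?("\xFF\xFE".b, "\xFE\xFF".b) # byte-order marks
  return :utf8 if utf8_sample?(sample)
  :unknown # the caller can raise an exception here, as described above
end

# Reads at most SAMPLE_SIZE bytes from the file and classifies them.
def sniff_file(path)
  sniff(File.binread(path, SAMPLE_SIZE).to_s) # to_s covers an empty file
end

p sniff("name,qty\nwidget,3\n")
```

Checking the magic numbers first keeps the common case cheap, since only a few bytes need to be compared before any encoding validation runs.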

find whether a zipped file is text or binary without unzipping it

the filemagic gem takes a file path

The filemagic gem's file method takes a file path, but file isn't the only method it has. A glance at the docs reveals it has an io method, too.

all I have are these Entry and InputStream classes which are unique to ruby-zip

I wouldn't say InputStream is "unique to ruby-zip." From the docs (emphasis mine):

A InputStream inherits IOExtras::AbstractInputStream in order to provide an IO-like interface for reading from a single zip entry

So FileMagic has an io method and Zip::InputStream is IO-like. That leads us to a pretty straightforward solution:

require 'filemagic'
require 'zip'

Zip::InputStream.open('/path/to/file.zip') do |io|
  entry = io.get_next_entry

  FileMagic.open(:mime) do |fm|
    p fm.io(entry.get_input_stream)
  end
end

Automatically open a file as binary with Ruby

Actually, the previous answer by Alex D is incomplete. While it's true that there is no "text" mode in Unix file systems, Ruby does distinguish between opening files in binary and non-binary mode:

s = File.open('/tmp/test.jpg', 'r') { |io| io.read }
s.encoding
=> #<Encoding:UTF-8>

is different from (note the "rb")

s = File.open('/tmp/test.jpg', 'rb') { |io| io.read }
s.encoding
=> #<Encoding:ASCII-8BIT>

The latter, as the docs say, sets the external encoding to ASCII-8BIT, which tells Ruby not to attempt to interpret the result as UTF-8. You can achieve the same thing by setting the encoding explicitly with s.force_encoding('ASCII-8BIT'). This is key if you want to read binary data into a string and move it around (e.g. saving it to a database).
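A small sketch of what force_encoding does and does not change. The byte values are only an example (the leading bytes of a JPEG file), and `Encoding::BINARY` is Ruby's alias for ASCII-8BIT. No expected output is shown for the encoding itself, since the way it is printed differs between Ruby versions.

```ruby
bytes   = "\xFF\xD8\xFF\xE0".b                        # tagged ASCII-8BIT (binary)
as_utf8 = bytes.dup.force_encoding(Encoding::UTF_8)   # same bytes, different tag

p bytes.encoding == Encoding::BINARY   # the binary tag
p as_utf8.valid_encoding?              # these bytes are not valid UTF-8
p as_utf8.bytes == bytes.bytes         # the underlying bytes are unchanged

# Re-tagging as binary gives back a string equal to the original.
p as_utf8.force_encoding(Encoding::BINARY) == bytes
```

force_encoding only changes the encoding label on the string. It never converts or drops bytes, which is why it is safe for data that has to round-trip unchanged.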

Read binary file as string in Ruby

First, you should open the file as a binary file. Then you can read the entire file in a single call.

file = File.open("path-to-file.tar.gz", "rb")
contents = file.read

That will get you the entire file in a string.

After that, you probably want to call file.close. If you don't, the file won't be closed until the File object is garbage-collected, which is a slight waste of system resources in the meantime.
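Two common ways to avoid the manual close are the block form of File.open and File.binread. The sketch below uses hypothetical helper names and a throwaway temp file in place of path-to-file.tar.gz so that it can run anywhere.

```ruby
require "tempfile"

# Hypothetical helper names, chosen for this sketch.

# Block form: Ruby closes the file when the block returns, even on an exception.
def read_with_block(path)
  File.open(path, "rb") { |file| file.read }
end

# One-call form: opens in binary mode, reads everything, and closes the file.
def read_with_binread(path)
  File.binread(path)
end

# Demo against a temporary file holding a few arbitrary bytes.
Tempfile.create("sample") do |tmp|
  tmp.binmode
  tmp.write("\x1F\x8B\x08\x00".b)
  tmp.flush

  p read_with_block(tmp.path) == read_with_binread(tmp.path)
  p read_with_binread(tmp.path).bytesize
end
```

Both forms return a string tagged ASCII-8BIT, the same as the explicit open, read, and close sequence.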


