Ruby: How to determine if file being read is binary or text
gem install ptools
require 'ptools'
File.binary?(file)
Is there a way to find out presence of binary code in an uploaded file or text area in ruby?
You can't reliably check this submission in JavaScript, because data can always be submitted without going through the JavaScript checks. So you should validate it in Rails. Your validation can be something like this (not tested).
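class YourModel < ActiveRecord::Base
  validate :proper_string

  # Rejects submissions whose bytes are not valid UTF-8.
  def proper_string
    # dup so the attribute's own encoding is not changed in place
    unless submission.to_s.dup.force_encoding("UTF-8").valid_encoding?
      errors.add(:submission, "Only text allowed")
    end
  end
end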
JavaScript validation is only useful for usability. A submission like this looks more like a hacking attempt than a user mistake.
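The encoding check used in that validation can be tried on its own, outside Rails. The following is a minimal sketch; `text_only?` is a hypothetical helper name, not part of Rails or Ruby.

```ruby
# Hypothetical helper: true if the bytes of `data` form valid UTF-8.
# dup is used so the caller's string keeps its original encoding.
def text_only?(data)
  data.dup.force_encoding(Encoding::UTF_8).valid_encoding?
end

puts text_only?("plain text")          # ordinary ASCII text
puts text_only?("\xFF\xD8\xFF\xE0".b)  # first bytes of a JPEG file
```

Note that this only rejects byte sequences that are invalid UTF-8. Binary data that happens to be valid UTF-8 (for example, a run of NUL bytes) would still pass, so a stricter check may be needed depending on the application.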
How do I distinguish between 'binary' and 'text' files?
The spreadsheet software my company makes reads a number of binary file formats as well as text files.
We first look at the first few bytes for a magic number which we recognize. If we do not recognize the magic number of any of the binary types we read, then we look at up to the first 2K bytes of the file to see whether it appears to be a UTF-8, UTF-16 or a text file encoded in the current code page of the host operating system. If it passes none of these tests, we assume that it is not a file we can deal with and throw an appropriate exception.
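The approach described above might be sketched in Ruby roughly as follows. This is an illustration under assumptions, not the spreadsheet software's actual code: the `MAGIC_NUMBERS` table, `classify`, and `utf8_text?` are hypothetical names, UTF-16 is detected only by its byte order mark, and the code-page test is left out. Where the original throws an exception, this sketch returns `:unknown`.

```ruby
# Hypothetical signature table; a real application would list the formats it reads.
MAGIC_NUMBERS = {
  "\x89PNG\r\n\x1A\n".b => :png,
  "PK\x03\x04".b        => :zip,
  "%PDF-".b             => :pdf
}.freeze

SAMPLE_SIZE = 2048 # "up to the first 2K bytes"

# True if the sample is valid UTF-8 and contains no NUL bytes.
def utf8_text?(sample)
  # A full-size sample may end in the middle of a multibyte character,
  # so also try it with up to 3 trailing bytes dropped.
  max_trim = sample.bytesize == SAMPLE_SIZE ? 3 : 0
  (0..max_trim).any? do |trim|
    candidate = sample.byteslice(0, sample.bytesize - trim).force_encoding(Encoding::UTF_8)
    candidate.valid_encoding? && !candidate.include?("\x00")
  end
end

# Returns a symbol describing the file, or :unknown if no test passes.
def classify(path)
  sample = File.open(path, "rb") { |f| f.read(SAMPLE_SIZE) } || "".b
  MAGIC_NUMBERS.each { |magic, type| return type if sample.start_with?(magic) }
  # Simplification: UTF-16 is recognized only by its byte order mark.
  return :utf16_text if sample.start_with?("\xFF\xFE".b, "\xFE\xFF".b)
  return :utf8_text if utf8_text?(sample)
  :unknown
end
```

A caller that wants the behavior described above could raise an exception when `classify` returns `:unknown`.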
find whether a zipped file is text or binary without unzipping it
the filemagic gem takes a file path
The filemagic gem's file method takes a file path, but file isn't the only method it has. A glance at the docs reveals it has an io method, too.
all I have are these Entry and InputStream classes which are unique to ruby-zip
I wouldn't say InputStream is "unique to ruby-zip." From the docs (emphasis mine):
A InputStream inherits IOExtras::AbstractInputStream in order to provide an IO-like interface for reading from a single zip entry
So FileMagic has an io method and Zip::InputStream is IO-like. That leads us to a pretty straightforward solution:
require 'filemagic'
require 'zip'

Zip::InputStream.open('/path/to/file.zip') do |io|
  entry = io.get_next_entry

  FileMagic.open(:mime) do |fm|
    p fm.io(entry.get_input_stream)
  end
end
Automatically open a file as binary with Ruby
Actually, the previous answer by Alex D is incomplete. While it's true that there is no "text" mode in Unix file systems, Ruby does distinguish between opening files in binary and non-binary mode:
s = File.open('/tmp/test.jpg', 'r') { |io| io.read }
s.encoding
=> #<Encoding:UTF-8>
is different from (note the "rb")
s = File.open('/tmp/test.jpg', 'rb') { |io| io.read }
s.encoding
=> #<Encoding:ASCII-8BIT>
The latter, as the docs say, sets the external encoding to ASCII-8BIT, which tells Ruby not to attempt to interpret the result as UTF-8. You can achieve the same thing by setting the encoding explicitly with s.force_encoding('ASCII-8BIT'). This is key if you want to read binary data into a string and move it around (e.g. saving it to a database, etc.).
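To see the difference without a JPEG at hand, here is a small sketch that writes a few non-UTF-8 bytes to a temporary file and reads them back both ways. `read_back` is a hypothetical helper, and the text-mode read uses an explicit `'r:UTF-8'` so the result does not depend on the machine's default external encoding.

```ruby
require 'tempfile'

# Hypothetical helper: writes raw bytes to a temp file and reads them back in the given mode.
def read_back(bytes, mode)
  Tempfile.create('sample') do |f|
    f.binmode
    f.write(bytes)
    f.flush
    File.open(f.path, mode) { |io| io.read }
  end
end

jpeg_header = "\xFF\xD8\xFF\xE0".b # first bytes of a JPEG file

as_text   = read_back(jpeg_header, 'r:UTF-8')
as_binary = read_back(jpeg_header, 'rb')

p as_text.encoding         # tagged as UTF-8
p as_text.valid_encoding?  # the bytes are not valid UTF-8
p as_binary.encoding       # tagged as ASCII-8BIT (binary)

# Retagging a copy gives the same string as reading in binary mode.
retagged = as_text.dup.force_encoding('ASCII-8BIT')
p retagged == as_binary
```

In both cases the bytes are identical; only the encoding tag attached to the string differs.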
Read binary file as string in Ruby
First, you should open the file as a binary file. Then you can read the entire file in, in one command.
file = File.open("path-to-file.tar.gz", "rb")
contents = file.read
That will get you the entire file in a string.
After that, you probably want to file.close. If you don’t do that, file won’t be closed until it is garbage-collected, so it would be a slight waste of system resources while it is open.
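One way to avoid the explicit close is the block form of File.open, which closes the file when the block returns. A minimal sketch follows; `read_binary` is a hypothetical helper name, and the temporary file stands in for a real path.

```ruby
require "tempfile"

# Hypothetical helper: reads a whole file as binary; the block form closes it automatically.
def read_binary(path)
  File.open(path, "rb") { |file| file.read }
end

Tempfile.create("demo") do |tmp|
  tmp.binmode
  tmp.write("\x1F\x8B\x08".b) # first bytes of a gzip stream
  tmp.flush

  contents = read_binary(tmp.path)
  puts contents.bytesize
  # File.binread does the same open-read-close in a single call.
  puts contents == File.binread(tmp.path)
end
```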