Does Ruby provide a way to do File.read() with specified encoding?
From the fine manual:
read(name, [length [, offset]], open_args) → string
Opens the file, optionally seeks to the given offset, then returns length bytes (defaulting to the rest of the file). read ensures the file is closed before returning.
If the last argument is a hash, it specifies option for internal open().
So you can say things like this:
s = File.read('pancakes', :encoding => 'iso-8859-1')
s.encoding
#<Encoding:ISO-8859-1>
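As a rough sketch of how the option behaves, the snippet below writes a small ISO-8859-1 file and reads it back two ways. The file name and sample text are made up for illustration. The "external:internal" form of :encoding asks Ruby to transcode while reading; the plain form only tags the bytes with the given encoding.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical sample file: "fòo" stored as ISO-8859-1 (ò is the single byte 0xF2).
dir  = Dir.mktmpdir
path = File.join(dir, "pancakes")
File.binwrite(path, "f\xF2o".b)

# Tag the bytes as ISO-8859-1; the bytes themselves are left unchanged.
tagged = File.read(path, encoding: "iso-8859-1")

# "external:internal" form: read as ISO-8859-1 and transcode to UTF-8.
transcoded = File.read(path, encoding: "iso-8859-1:utf-8")

p tagged.encoding      # ISO-8859-1
p transcoded.encoding  # UTF-8

FileUtils.remove_entry(dir)
```

The second form is handy when the rest of the program works in UTF-8 and only the file on disk uses a legacy encoding.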
Ruby: Is there a way to specify your encoding in File.write?
AFAIK you can't do it at the time of performing the write, but you can do it at the time of creating the File object; here is an example using UTF-8 encoding:
File.open(FILE_LOCATION, "w:UTF-8") do |f|
  f.write(....)
end
Another possibility would be to use the external_encoding option:
File.open(FILE_LOCATION, "w", external_encoding: Encoding::UTF_8)
Of course, this assumes that the data being written is a String. If you have (packed) binary data, you would use "wb" for opening the file, and syswrite instead of write to write the data to the file.
UPDATE: As engineersmnky points out in a comment, the encoding arguments can also be passed as parameters to the write method itself, for instance:
IO::write(FILE_LOCATION, data_to_write, external_encoding: Encoding::UTF_8)
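To see what the encoding options do to the bytes on disk, here is a small sketch that writes the same in-memory UTF-8 string twice and inspects the result with File.binread. The path and sample text are hypothetical; the point is that a non-UTF-8 external encoding makes Ruby transcode on write.

```ruby
require "tmpdir"
require "fileutils"

dir  = Dir.mktmpdir
path = File.join(dir, "out.txt")   # hypothetical output file
text = "f\u00F2o"                  # "fòo", UTF-8 in memory

# External encoding ISO-8859-1: the UTF-8 string is transcoded on write.
File.write(path, text, encoding: "iso-8859-1")
latin1_bytes = File.binread(path).bytes

# External encoding UTF-8: the bytes are written unchanged.
File.write(path, text, external_encoding: Encoding::UTF_8)
utf8_bytes = File.binread(path).bytes

p latin1_bytes  # three bytes, ò stored as 0xF2
p utf8_bytes    # four bytes, ò stored as 0xC3 0xB2

FileUtils.remove_entry(dir)
```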
Reading contents from UTF-16 encoded file in Ruby
The lines method is deprecated. If you expect text to be an array with lines, then use readlines.
text = File.open(filepath,"rb:UTF-16LE"){ |file| file.readlines }
As the Tin Man says, it's better practice to process each line separately, if possible:
File.open("test.csv", "rb:UTF-16LE") do |file|
  file.each do |line|
    p line
  end
end
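Strings read this way keep the UTF-16LE encoding, which is awkward to compare or combine with ordinary UTF-8 literals. One possible approach, sketched below with a made-up sample file, is to transcode to UTF-8 right after reading and only then split into lines.

```ruby
require "tmpdir"
require "fileutils"

dir  = Dir.mktmpdir
path = File.join(dir, "test.csv")

# Hypothetical sample data, stored on disk as UTF-16LE without a BOM.
File.binwrite(path, "caf\u00E9,1\nth\u00E9,2\n".encode("UTF-16LE"))

# Read as UTF-16LE (binary mode, as in the answer), then transcode to UTF-8.
raw  = File.open(path, "rb:UTF-16LE") { |file| file.read }
text = raw.encode("UTF-8")

rows = text.lines.map { |line| line.chomp.split(",") }
p rows

FileUtils.remove_entry(dir)
```

This reads the whole file into memory, so it suits small files; for large ones, the line-by-line loop above is the better fit.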
Ruby Encoding While File Writing
You need to open the file in binary to get the right encoding.
file = File.new(path, 'wb')
Check the encoding like this:
puts file.external_encoding
It should be 'ASCII-8BIT'.
Do the same with your decrypted file content; it should have the same encoding. Otherwise, you need to convert it like this:
Document.find(123).fetch_file.force_encoding('ASCII-8BIT')
You could also use File.binread(file) and File.binwrite(file, content):
http://ruby-doc.org/core-2.3.0/IO.html#method-c-binread
http://ruby-doc.org/core-2.3.0/IO.html#method-c-binwrite
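A minimal round trip with these two methods might look like the following. The payload and file name are invented for the example; the data comes back byte for byte, tagged as binary (ASCII-8BIT).

```ruby
require "tmpdir"
require "fileutils"

dir  = Dir.mktmpdir
path = File.join(dir, "blob.bin")   # hypothetical file name

# Arbitrary binary payload, including bytes that text mode could translate.
payload = [0xDE, 0xAD, 0xBE, 0xEF, 0x0A, 0x0D].pack("C*")

File.binwrite(path, payload)
read_back = File.binread(path)

p read_back.encoding      # the binary (ASCII-8BIT) encoding
p read_back == payload    # true

FileUtils.remove_entry(dir)
```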
ruby 1.9 wrong file encoding on windows
You're not specifying the encoding when you read the file. You're being very careful to specify it everywhere except there, but then you're reading it with the default encoding.
File.open('foo.txt', "w:iso-8859-1") {|f| f << 'fòo'.force_encoding('iso-8859-1')}
File.open('foo.txt', "r:iso-8859-1") {|f| puts f.read().encoding }
# => ISO-8859-1
Also note that you probably mean 'fòo'.encode('iso-8859-1') rather than 'fòo'.force_encoding('iso-8859-1'). The latter leaves the bytes unchanged, while the former transcodes the string.
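The difference is easiest to see by looking at the bytes. The sketch below assumes a UTF-8 string to start with (written with a \u escape so it does not depend on the source file's encoding).

```ruby
utf8 = "f\u00F2o"   # "fòo" as UTF-8: ò is the two bytes 0xC3 0xB2

# encode transcodes: the bytes change to the ISO-8859-1 representation.
transcoded = utf8.encode("iso-8859-1")

# force_encoding only relabels: same bytes, now read as ISO-8859-1.
relabeled = utf8.dup.force_encoding("iso-8859-1")

p utf8.bytes        # [102, 195, 178, 111]
p transcoded.bytes  # [102, 242, 111]
p relabeled.bytes   # [102, 195, 178, 111]

# Read as ISO-8859-1, the two UTF-8 bytes become two separate characters.
p relabeled.length  # 4
```

Transcoding the relabeled string back to UTF-8 yields the familiar mojibake "fÃ²o", which is usually the symptom of a force_encoding used where encode was meant.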
Update: I'll elaborate a bit since I wasn't as clear or thorough as I could have been.
If you don't specify an encoding with File.read(), the file will be read with Encoding.default_external. Since you're not setting that yourself, Ruby is using a value depending on the environment it's run in. In your Windows environment, it's IBM437; in your Cygwin environment, it's UTF-8. So my point above was that of course that's what the encoding is; it has to be, and it has nothing to do with what bytes are contained in the file. Ruby doesn't auto-detect encodings for you.
force_encoding() doesn't change the bytes in a string, it only changes the Encoding attached to those bytes. If you tell Ruby "pretend this string is ISO-8859-1", then it won't transcode them when you tell it "please write this string as ISO-8859-1".
encode() transcodes for you, as does writing to the file if you don't trick it into not doing so.
Putting those together, if you have a source file in ISO-8859-1:
# encoding: iso-8859-1
# Write in ISO-8859-1 regardless of default_external
File.open('foo.txt', "w:iso-8859-1") {|f| f << 'fòo'}
# Read in ISO-8859-1 regardless of default_external,
# transcoding if necessary to default_internal, if set
File.open('foo.txt', "r:iso-8859-1") {|f| puts f.read().encoding } # => ISO-8859-1
puts File.read('foo.txt').encoding # -> Whatever is specified by default_external
If you have a source file in UTF-8:
# encoding: utf-8
# Write in ISO-8859-1 regardless of default_external, transcoding from UTF-8
File.open('foo.txt', "w:iso-8859-1") {|f| f << 'fòo'}
# Read in ISO-8859-1 regardless of default_external,
# transcoding if necessary to default_internal, if set
File.open('foo.txt', "r:iso-8859-1") {|f| puts f.read().encoding } # => ISO-8859-1
puts File.read('foo.txt').encoding # -> Whatever is specified by default_external
Update 2, to answer your new questions:
No, the # encoding: iso-8859-1 line does not change Encoding.default_external, it only tells Ruby that the source file itself is encoded in ISO-8859-1. Simply add Encoding.default_external = "iso-8859-1" if you expect all files that you read to be stored in that encoding.
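As an illustration of that setting, the sketch below changes Encoding.default_external temporarily and reads a file without naming an encoding. The file and its contents are hypothetical, and the example assumes Encoding.default_internal is unset (the usual default), so no transcoding happens on read.

```ruby
require "tmpdir"
require "fileutils"

dir  = Dir.mktmpdir
path = File.join(dir, "foo.txt")
File.binwrite(path, "f\xF2o".b)   # "fòo" stored as ISO-8859-1

original = Encoding.default_external
begin
  # From here on, reads without an explicit encoding assume ISO-8859-1.
  Encoding.default_external = "iso-8859-1"
  default_read = File.read(path)
ensure
  # Restore the previous value so the rest of the program is unaffected.
  Encoding.default_external = original
end

p default_read.encoding         # ISO-8859-1
p default_read.valid_encoding?  # true

FileUtils.remove_entry(dir)
```

Since this is a process-wide setting, passing the encoding explicitly to each read is often the less surprising choice.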
No, I don't personally think Ruby should auto-detect encodings, but reasonable people can disagree on that one, and a discussion of "should it be so" seems off-topic here.
Personally, I use UTF-8 for everything, and in the rare circumstances that I can't control encoding, I manually set the encoding when I read the file, as demonstrated above. My source files are always in UTF-8. If you're dealing with files that you can't control and don't know the encoding of, the charguess gem or similar would be useful.
Ruby: File.read Error encoding: UTF-8
It seems you are using an older Ruby version. Try this instead:
File.read(inputfile, :encoding => "UTF-8").gsub(/<group.*?type=\"public\".*?\/>/, "")