Does Ruby Provide a Way to Do File.read() with Specified Encoding

Does Ruby provide a way to do File.read() with specified encoding?

From the fine manual:

read(name, [length [, offset]], open_args) → string

Opens the file, optionally seeks to the given offset, then returns length bytes (defaulting to the rest of the file). read ensures the file is closed before returning.

If the last argument is a hash, it specifies options for the internal open().

So you can say things like this:

s = File.read('pancakes', :encoding => 'iso-8859-1')
s.encoding
#<Encoding:ISO-8859-1>
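The :encoding option also accepts the "external:internal" form, so a read can transcode in one step. A minimal sketch (the 'pancakes' file name is reused from the example above; the bytes written first are made up for illustration):

```ruby
# Write a few raw ISO-8859-1 bytes to read back ("crêpe", with ê as byte 0xEA).
File.binwrite('pancakes', "cr\xEApe")

# "external:internal" — decode as ISO-8859-1, transcode to UTF-8 on the fly.
s = File.read('pancakes', encoding: 'iso-8859-1:utf-8')
s.encoding   # => #<Encoding:UTF-8>
s            # => "crêpe"
```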

Ruby: Is there a way to specify your encoding in File.write?

AFAIK you can't specify the encoding at the time of the write itself, but you can do it when creating the File object; here's an example using UTF-8 encoding:

File.open(FILE_LOCATION, "w:UTF-8") do |f|
  f.write(....)
end

Another possibility would be to use the external_encoding option:

File.open(FILE_LOCATION, "w", external_encoding: Encoding::UTF_8)

Of course this assumes that the data being written is a String. If you have (packed) binary data, you would use "wb" for opening the file, and syswrite instead of write to write the data to the file.
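A short sketch of that binary case (the file name and payload are made up for illustration):

```ruby
# Pack a 32-bit integer into 4 raw big-endian bytes.
data = [0xDEADBEEF].pack('N')

# "wb" disables newline translation and transcoding; syswrite emits raw bytes.
File.open('blob.bin', 'wb') { |f| f.syswrite(data) }

File.binread('blob.bin').unpack1('N')   # => 3735928559 (0xDEADBEEF)
```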

UPDATE As engineersmnky points out in a comment, the encoding arguments can also be passed as parameters to the write method itself, for instance:

IO::write(FILE_LOCATION, data_to_write, external_encoding: Encoding::UTF_8)

Reading contents from UTF-16 encoded file in Ruby

The lines method is deprecated. If you expect text to be an array of lines, use readlines.

text = File.open(filepath,"rb:UTF-16LE"){ |file| file.readlines }

As the Tin Man says, it's better practice to process each line separately, if possible:

File.open("test.csv", "rb:UTF-16LE") do |file|
  file.each do |line|
    p line
  end
end
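If the file starts with a byte order mark, the mode string can also consume it and transcode in one pass. A sketch, writing a tiny UTF-16LE sample first (the 'test.csv' name is reused from above; its contents are made up):

```ruby
# BOM (FF FE) followed by "hi" encoded as UTF-16LE.
File.binwrite('test.csv', "\xFF\xFEh\x00i\x00")

# "BOM|UTF-16LE" strips the BOM; the second encoding transcodes lines to UTF-8.
lines = File.open('test.csv', 'rb:BOM|UTF-16LE:UTF-8') { |f| f.readlines }
lines.first            # => "hi"
lines.first.encoding   # => #<Encoding:UTF-8>
```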

Ruby Encoding While File Writing

You need to open the file in binary to get the right encoding.

file = File.new(path, 'wb')

Check the encoding like this:

puts file.encoding

It should be ASCII-8BIT.
Do the same with your decrypted file content; it should have the same encoding, otherwise you need to convert it like this:

Document.find(123).fetch_file.force_encoding('ASCII-8BIT')

You could also use File.binread(file) and File.binwrite(file, content)

http://ruby-doc.org/core-2.3.0/IO.html#method-c-binread

http://ruby-doc.org/core-2.3.0/IO.html#method-c-binwrite
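A quick sketch of the round-trip (the file name is made up): binread and binwrite skip encoding conversion entirely, so the bytes come back unchanged and tagged ASCII-8BIT.

```ruby
# .b returns a binary (ASCII-8BIT) copy of the string's bytes.
content = "\x00\x01\xFFhello".b

File.binwrite('raw.dat', content)
copy = File.binread('raw.dat')

copy == content    # => true
copy.encoding      # => #<Encoding:ASCII-8BIT>
```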

ruby 1.9 wrong file encoding on windows

You're not specifying the encoding when you read the file. You're being very careful to specify it everywhere except there, but then you're reading it with the default encoding.

File.open('foo.txt', "w:iso-8859-1") {|f| f << 'fòo'.force_encoding('iso-8859-1')}
File.open('foo.txt', "r:iso-8859-1") {|f| puts f.read().encoding }

# => ISO-8859-1

Also note that you probably mean 'fòo'.encode('iso-8859-1') rather than 'fòo'.force_encoding('iso-8859-1'). The latter leaves the bytes unchanged, while the former transcodes the string.

Update: I'll elaborate a bit since I wasn't as clear or thorough as I could have been.

  1. If you don't specify an encoding with File.read(), the file will be read with Encoding.default_external. Since you're not setting that yourself, Ruby is using a value depending on the environment it's run in. In your Windows environment, it's IBM437; in your Cygwin environment, it's UTF-8. So my point above was that of course that's what the encoding is; it has to be, and it has nothing to do with what bytes are contained in the file. Ruby doesn't auto-detect encodings for you.

  2. force_encoding() doesn't change the bytes in a string, it only changes the Encoding attached to those bytes. If you tell Ruby "pretend this string is ISO-8859-1", then it won't transcode them when you tell it "please write this string as ISO-8859-1". encode() transcodes for you, as does writing to the file if you don't trick it into not doing so.
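The difference is easy to see at the byte level; a small sketch:

```ruby
s = "f\u00F2o"    # "fòo" in UTF-8; ò occupies two bytes (C3 B2)

labeled    = s.dup.force_encoding('iso-8859-1')   # same bytes, new label
transcoded = s.encode('iso-8859-1')               # bytes actually converted

labeled.bytesize      # => 4 — bytes untouched, so the label now lies about them
transcoded.bytesize   # => 3 — C3 B2 collapsed to the single Latin-1 byte F2
```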

Putting those together, if you have a source file in ISO-8859-1:

# encoding: iso-8859-1

# Write in ISO-8859-1 regardless of default_external
File.open('foo.txt', "w:iso-8859-1") {|f| f << 'fòo'}

# Read in ISO-8859-1 regardless of default_external,
# transcoding if necessary to default_internal, if set
File.open('foo.txt', "r:iso-8859-1") {|f| puts f.read().encoding } # => ISO-8859-1

puts File.read('foo.txt').encoding # -> Whatever is specified by default_external

If you have a source file in UTF-8:

# encoding: utf-8

# Write in ISO-8859-1 regardless of default_external, transcoding from UTF-8
File.open('foo.txt', "w:iso-8859-1") {|f| f << 'fòo'}

# Read in ISO-8859-1 regardless of default_external,
# transcoding if necessary to default_internal, if set
File.open('foo.txt', "r:iso-8859-1") {|f| puts f.read().encoding } # => ISO-8859-1

puts File.read('foo.txt').encoding # -> Whatever is specified by default_external

Update 2, to answer your new questions:

  1. No, the # encoding: iso-8859-1 line does not change Encoding.default_external, it only tells Ruby that the source file itself is encoded in ISO-8859-1. Simply add

    Encoding.default_external = "iso-8859-1"

    if you expect all files that you read to be stored in that encoding.

  2. No, I don't personally think Ruby should auto-detect encodings, but reasonable people can disagree on that one, and a discussion of "should it be so" seems off-topic here.

  3. Personally, I use UTF-8 for everything, and in the rare circumstances that I can't control encoding, I manually set the encoding when I read the file, as demonstrated above. My source files are always in UTF-8. If you're dealing with files that you can't control and don't know the encoding of, the charguess gem or similar would be useful.

Ruby: File.read Error encoding: UTF-8

It seems you are using an older Ruby version. Try this instead:

File.read(inputfile, :encoding => "UTF-8").gsub(/<group.*?type=\"public\".*?\/>/, "")

