Encoding Ruby on Rails Code

Encoding Ruby on Rails code?

Maybe you could host the application yourself.

This way nobody will ever have access to your code, your clients can use the application from anywhere over the Internet, and they will also pay you for support.

To host a Rails application, the easiest option is probably http://heroku.com/; alternatively, you could set up a small VPS with Apache and mod_passenger.

Ruby on Rails - Issue with encoding

You might try forcing the conversion to UTF-8 before storing the string in the database. This code converts the original string in place, replacing invalid or undefined characters:

string.encode!("UTF-8", invalid: :replace, undef: :replace).force_encoding("utf-8")

See the documentation for String#encode for more information.
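A minimal sketch that wraps the same options so they behave sensibly whatever encoding the incoming string is tagged with. The helper name to_clean_utf8 is hypothetical. One caveat: when a string is already tagged UTF-8, its bad bytes are a validity problem rather than a conversion problem, so this sketch uses String#scrub (available since Ruby 2.1) for that case and String#encode with the replace options otherwise.

```ruby
# Hypothetical helper: return a valid UTF-8 copy of +str+, replacing anything
# that cannot be represented with U+FFFD (the Unicode replacement character).
def to_clean_utf8(str)
  if str.encoding == Encoding::UTF_8
    # Already tagged UTF-8: only invalid byte sequences need replacing.
    str.scrub
  else
    # Transcode from the declared encoding; invalid or unmappable bytes are replaced.
    str.encode(Encoding::UTF_8, invalid: :replace, undef: :replace)
  end
end

binary = "caf\xE9".b                                          # raw bytes, no declared meaning
latin1 = "caf\xE9".dup.force_encoding(Encoding::ISO_8859_1)   # same bytes, declared Latin-1
broken = "caf\xE9"                                            # tagged UTF-8, but \xE9 alone is invalid

to_clean_utf8(binary)  # returns "caf\uFFFD": \xE9 has no defined mapping from binary
to_clean_utf8(latin1)  # returns "café"
to_clean_utf8(broken)  # returns "caf\uFFFD"
```

Declaring the real source encoding (the latin1 case) is what preserves the character; the replace options only keep bad input from raising.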

If your encodings match and you still have this issue, you can simply strip the non-ASCII characters from the strings with this gsub call:

x.map { |text| text.gsub(/[^\001-\176]+/, "") }  # gsub, not gsub!: gsub! returns nil when nothing matches

The regex removes every character that is not between ASCII code 1 (octal 001) and ASCII code 126 (octal 176). This effectively scrubs the string of all non-ASCII characters, as well as ASCII 0 (NUL) and 127 (DEL).

If you need "extended ASCII" for an international character set, such as ISO-8859 or Windows-1252, or even specific Unicode characters, you can widen the range in the character class to include them.
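A minimal sketch of both variants, assuming the strings are already valid UTF-8 (matching a regex against a string with invalid bytes raises ArgumentError). It uses the non-destructive gsub so that map always returns strings; the second pattern is an illustrative widening that also keeps the Latin-1 Supplement block (U+00A0 to U+00FF).

```ruby
texts = ["Caf\u00e9 \u2603 ok", "plain ASCII"]

# Keep only characters 1..126 (octal 001..176); everything else is removed.
ascii_only = texts.map { |text| text.gsub(/[^\001-\176]+/, "") }

# Widened class: also keep U+00A0..U+00FF (accented Latin-1 letters such as é).
latin1_too = texts.map { |text| text.gsub(/[^\u0001-\u007E\u00A0-\u00FF]+/, "") }

ascii_only  # returns ["Caf  ok", "plain ASCII"]
latin1_too  # returns ["Caf\u00e9  ok", "plain ASCII"]
```

Note that the snowman (U+2603) is dropped by both patterns, and the removed characters leave their neighbouring spaces behind.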

Rails View Encoding Issues

The culprit that was leaking non-UTF-8 characters into my template was an innocuous-looking meta tag for Facebook Open Graph:

%meta{property: "og:url", content: request.url}

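When the request URL is unusual (for example, it contains non-ASCII characters), request.url can return a string that is not tagged as UTF-8, and this causes the encoding issue. Changing it to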

%meta{property: "og:url", content: request.url.force_encoding('UTF-8')}

did the trick.
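A plain-Ruby sketch of the failure and the fix, outside HAML. It assumes (the answer does not confirm this) that request.url handed back a string tagged ASCII-8BIT holding UTF-8 bytes; raw_url below stands in for it, and utf8_label is a hypothetical helper. The valid_encoding? check with scrub is an extra guard added in this sketch, in case the bytes are not UTF-8 after all.

```ruby
# Stand-in for request.url: UTF-8 bytes, but tagged as binary (ASCII-8BIT).
raw_url = "http://example.com/caf\xC3\xA9".b
# Template text that already contains a non-ASCII UTF-8 character.
page = "<title>Caf\u00e9</title>"

begin
  page + raw_url
rescue Encoding::CompatibilityError => e
  puts "mixing encodings fails: #{e.message}"
end

# Hypothetical helper mirroring the fix: relabel the bytes as UTF-8,
# then replace anything that still isn't valid UTF-8.
def utf8_label(str)
  fixed = str.dup.force_encoding(Encoding::UTF_8)
  fixed.valid_encoding? ? fixed : fixed.scrub
end

html = page + utf8_label(raw_url)
puts html
```

force_encoding only changes the label, never the bytes, which is why it is the right tool when the bytes are already UTF-8.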

Ruby converting string encoding from ISO-8859-1 to UTF-8 not working

You assign a string, in UTF-8. It contains ä. UTF-8 represents ä with two bytes.

string = 'ä'
string.encoding
# => #<Encoding:UTF-8>
string.length
# => 1
string.bytes
# => [195, 164]

Then you force the bytes to be interpreted as if they were ISO-8859-1, without actually changing the underlying representation. This does not contain ä any more. It contains two characters, Ã and ¤.

string.force_encoding('iso-8859-1')
# => "\xC3\xA4"
string.length
# => 2
string.bytes
# => [195, 164]

Then you translate that into UTF-8. Since this is not reinterpretation but translation, you keep the two characters, but now encoded in UTF-8:

string = string.encode('utf-8')
# => "Ã¤" 
string.length
# => 2
string.bytes
# => [195, 131, 194, 164]

What you are missing is that you never started with an ISO-8859-1 string, as you would get from your web service; you had gibberish. Fortunately, this only affects your console tests: if you read the website's response using the proper input encoding, it should all work.

For your console test, let's demonstrate that if you start with a proper ISO-8859-1 string, it all works:

string = 'Norrlandsvägen'.encode('iso-8859-1')
# => "Norrlandsv\xE4gen"
string = string.encode('utf-8')
# => "Norrlandsvägen"
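If some text has already been through the accidental double conversion described above, the two steps can be undone in reverse order. This is a sketch that only works when the garbled text contains nothing outside ISO-8859-1; reading the data with the right encoding in the first place remains the real fix.

```ruby
original = "\u00e4"  # "ä", stored as the UTF-8 bytes [195, 164]

# The mistaken sequence: reinterpret as ISO-8859-1, then translate to UTF-8.
garbled = original.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
garbled.bytes  # returns [195, 131, 194, 164], i.e. "Ã¤"

# The reverse: translate back to ISO-8859-1 (recovering the two original
# bytes), then reinterpret those bytes as UTF-8.
repaired = garbled.encode(Encoding::ISO_8859_1).force_encoding(Encoding::UTF_8)
repaired == original  # returns true
```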

EDIT: For your specific problem, this should work:

require 'net/https'

uri = URI.parse("https://rusta.easycruit.com/intranet/careerbuilder_se/export/xml/full")
options = {
  :use_ssl => uri.scheme == 'https',
  # VERIFY_NONE skips certificate verification; prefer OpenSSL::SSL::VERIFY_PEER where possible
  :verify_mode => OpenSSL::SSL::VERIFY_NONE
}
response = Net::HTTP.start(uri.host, uri.port, options) do |https|
  https.request(Net::HTTP::Get.new(uri.path))
end
# This feed's bytes are ISO-8859-1: relabel them first, then transcode to UTF-8
body = response.body.force_encoding('ISO-8859-1').encode('UTF-8')
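Hard-coding ISO-8859-1 works for this one feed. As a more general sketch, you could take the charset from the response's Content-Type header and fall back to ISO-8859-1 when the header is missing or names an encoding Ruby doesn't know. The helper name decode_body and the fallback choice are assumptions for illustration.

```ruby
# Hypothetical helper: decode raw HTTP body bytes into UTF-8 using the charset
# the server declared, falling back to ISO-8859-1 if it is absent or unknown.
def decode_body(body, charset, fallback: Encoding::ISO_8859_1)
  encoding =
    begin
      name = charset.to_s.delete('"')  # charset values may be quoted
      name.empty? ? fallback : Encoding.find(name)
    rescue ArgumentError               # Encoding.find raises this for unknown names
      fallback
    end
  body.dup.force_encoding(encoding).encode(Encoding::UTF_8)
end

raw = "Norrlandsv\xE4gen".b    # ISO-8859-1 bytes, as they arrive off the wire
decode_body(raw, "ISO-8859-1") # returns "Norrlandsvägen"
decode_body(raw, nil)          # no charset declared: falls back to ISO-8859-1
```

With the request above, the call would be decode_body(response.body, response.type_params['charset']); Net::HTTPHeader#type_params returns the parameters of the Content-Type header.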

Rails ActiveRecord string field encoding vs Ruby String encoding

The simplest and, arguably, cleanest solution to this problem would be not to call Encoding.find directly, but to have a utility method (perhaps in a module under lib/yourapp) that knows about the encoding name differences you care about and falls back to Encoding.find for all other inputs:

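module YourApp
  module DatabaseStringEncoding
    module_function  # lets callers use YourApp::DatabaseStringEncoding.find(name)

    def find(name)
      case name
      when 'UTF8'
        Encoding::UTF_8
      # ... other database encoding names you care about
      else
        Encoding.find(name)
      end
    end
  end
end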

This is both easy to understand and easy to discover (as opposed to modifying Encoding directly, which is invisible to anyone reading the code that does the encoding). Building on such a find method, you could go further and implement a method that automatically recodes a string to the database's string encoding using YourRecord.connection.encoding.

I know it would be more exciting to get Encoding.find to do exactly what you want, but I would argue that this "dumber" approach would actually be the better one. :-)
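Taking that suggestion one step further, here is a minimal sketch of the recoding method. The module repeats the idea above so the block runs on its own; the extra name mappings (PostgreSQL-style LATIN1 and WIN1252) and the recode method are illustrative assumptions, not a Rails API. In an app you would pass YourRecord.connection.encoding as the name.

```ruby
module YourApp
  module DatabaseStringEncoding
    # Hypothetical mapping from database encoding names to Ruby encodings.
    NAME_MAP = {
      "UTF8"    => Encoding::UTF_8,
      "LATIN1"  => Encoding::ISO_8859_1,
      "WIN1252" => Encoding::Windows_1252
    }.freeze

    module_function

    # Known database names are mapped; everything else goes to Encoding.find.
    def find(name)
      NAME_MAP.fetch(name.to_s.upcase) { Encoding.find(name) }
    end

    # Hypothetical helper: transcode +str+ into the database's encoding.
    def recode(str, db_encoding_name)
      str.encode(find(db_encoding_name))
    end
  end
end

YourApp::DatabaseStringEncoding.find("UTF8")             # returns Encoding::UTF_8
YourApp::DatabaseStringEncoding.recode("\u00e4", "LATIN1").bytes  # returns [228]
```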

Ruby on Rails: UTF-8 encoding string that has %F1 in content

You need to tell Ruby what the encoding of the parsed string should be. It looks like you are working in Latin-1 ('ISO-8859-1') to start with. There are a few different options. If you want to limit this decision to just the string you are processing, you can use .force_encoding like this:

require 'cgi'
unescaped_name = CGI::unescape( "John%20Da%F1e" ).force_encoding('ISO-8859-1')
#  => "John Da\xF1e"
unescaped_name.encode('UTF-8')
#  => "John Dañe"

Note that once the encoding is set correctly, the string already contains the correct characters, but you won't necessarily see that until you convert it to an encoding your terminal can display. Where I show "John Da\xF1e", that's only because my terminal is set to display UTF-8; \xF1 is the byte for ñ in the Latin-1 encoding.

For comparison, if the same string is percent-encoded from its UTF-8 bytes (ñ becomes %C3%B1), it decodes correctly in a single step:

"John%20Da%C3%B1e"
CGI::unescape( "John%20Da%C3%B1e" )
#  => "John Dañe"
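If a parameter may arrive in either form, one option is to try UTF-8 first and fall back to Latin-1. This sketch swaps in URI.decode_www_form_component from Ruby's standard uri library, because it takes the target encoding as its second argument; the helper name decode_param is hypothetical. The heuristic can misread Latin-1 text whose bytes happen to form valid UTF-8, so treat it as a best-effort guess.

```ruby
require 'uri'

# Hypothetical helper: percent-decode a form value, trying UTF-8 first and
# falling back to ISO-8859-1 when the decoded bytes are not valid UTF-8.
def decode_param(escaped)
  utf8 = URI.decode_www_form_component(escaped, Encoding::UTF_8)
  return utf8 if utf8.valid_encoding?

  URI.decode_www_form_component(escaped, Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

decode_param("John%20Da%F1e")     # returns "John Dañe" (Latin-1 byte %F1)
decode_param("John%20Da%C3%B1e")  # returns "John Dañe" (UTF-8 bytes %C3%B1)
```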
