In Ruby/Rails, How to Encode/Escape Special Characters in Urls

In Ruby/Rails, how can I encode/escape special characters in URLs?

Ruby has the built-in URI library, and there is also the Addressable gem, in particular Addressable::URI.

I prefer Addressable::URI. It's very full featured and handles the encoding for you when you use the query_values= method.

I've seen some discussions about URI going through some growing pains, so I tend to leave it alone for encoding/escaping until these things get sorted out:

  • http://osdir.com/ml/ruby-core/2010-06/msg00324.html
  • http://osdir.com/ml/lang-ruby-core/2009-06/msg00350.html
  • http://osdir.com/ml/ruby-core/2011-06/msg00748.html
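
For comparison, here is a minimal sketch of building an escaped query string with the built-in library alone. It uses only URI.encode_www_form and URI.decode_www_form from the standard library; the build_search_url helper, the base URL, and the parameter names are made up for illustration, and this is not meant to mirror what Addressable::URI does with query_values=.

```ruby
require "uri"

# Hypothetical helper: appends a form-encoded query string to a base URL.
# Keys and values are escaped by URI.encode_www_form.
def build_search_url(base, params)
  "#{base}?#{URI.encode_www_form(params)}"
end

url = build_search_url("https://www.example.com/search",
                       { "q" => "déjà vu & more", "page" => 2 })
puts url
# => https://www.example.com/search?q=d%C3%A9j%C3%A0+vu+%26+more&page=2

# Decoding the query string recovers the original values (as strings).
decoded = URI.decode_www_form(URI.parse(url).query).to_h
puts decoded["q"]    # => déjà vu & more
puts decoded["page"] # => 2
```

Note that form encoding turns spaces into + rather than %20, which is fine in a query string but not in a path.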

Ruby - how to encode URL without re-encoding already encoded characters

I can't think of a way to do this that isn't a little bit of a kludge. So I propose a little bit of a kludge.

URI.escape appears to work the way you want in all cases except when characters are already encoded. With that in mind, we can take the result of URI.escape (URI.encode is an alias for it) and use String#gsub to "un-encode" only those characters. Note that both methods were removed in Ruby 3.0; URI::RFC2396_Parser#escape does the same job there.

The regular expression below looks for %25 (an encoded %) followed by two hex digits, turning e.g. %252f back into %2f:

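require "uri"

# Matches a double-escaped sequence such as %252f (case-insensitive).
DOUBLE_ESCAPED_EXPR = /%25([0-9a-f]{2})/i

# URI.encode/URI.escape were removed in Ruby 3.0; the RFC 2396 parser's
# escape method behaves the same way. URI_ESCAPER is just a local name.
URI_ESCAPER = URI::RFC2396_Parser.new

def escape_uri(uri)
  URI_ESCAPER.escape(uri).gsub(DOUBLE_ESCAPED_EXPR, '%\1')
end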

puts escape_uri("https://www.example.com/url-déjà-vu")
# => https://www.example.com/url-d%C3%A9j%C3%A0-vu

puts escape_uri("https://somesite.com/page?stuff=stuff&%20")
# => https://somesite.com/page?stuff=stuff&%20

puts escape_uri("http://example.com/a%2fb")
# => http://example.com/a%2fb

I don't promise that this is foolproof, but hopefully it helps.
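
One way to sanity-check the kludge is to confirm that it is idempotent: escaping an already-escaped URL should change nothing. The sketch below restates the same idea in a self-contained form; escape_once is a hypothetical name, and it calls URI::RFC2396_Parser#escape on the assumption that URI.encode is unavailable (Ruby 3.0 and later).

```ruby
require "uri"

# Hypothetical stand-in for the escape_uri method above: escape everything,
# then undo the double-escaping of sequences that were already encoded.
def escape_once(uri)
  URI::RFC2396_Parser.new.escape(uri).gsub(/%25(\h{2})/, '%\1')
end

raw   = "https://www.example.com/url-déjà-vu?q=a%20b"
once  = escape_once(raw)
twice = escape_once(once)

puts once
# => https://www.example.com/url-d%C3%A9j%C3%A0-vu?q=a%20b
puts once == twice
# => true
```

The remaining weak spot is a literal % that happens to be followed by two hex digits: it is indistinguishable from an existing escape and will be left alone.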

Unescape special characters correctly from the URL in Rails 3.0.3

You're right, it looks like you have an encoding problem somewhere. The 0xC5 character is "Å" in ISO-8859-1 (AKA Latin-1); in UTF-8 it would be %C3%85 in the URL.

I suspect that you're using JavaScript on the client side and that your JavaScript is using the old escape function to build the URL. escape has some issues with non-ASCII characters. If this is the case, then you should upgrade your JavaScript to use encodeURIComponent instead. Have a look at this little demo and you'll see what I'm talking about:

http://jsfiddle.net/ambiguous/U5A3k/

If you can't change the client-side script then you can do it the hard way in Ruby using force_encoding and encode:

>> s = CGI.unescape('%C5rhus%2C%20Denmark')
=> "\xC5rhus, Denmark"
>> s.encoding
=> #<Encoding:UTF-8>
>> s.force_encoding('iso-8859-1')
=> "\xC5rhus, Denmark"
>> s.encoding
=> #<Encoding:ISO-8859-1>
>> s.encode!('utf-8')
=> "Århus, Denmark"
>> s.encoding
=> #<Encoding:UTF-8>

You should get something like "\xC5rhus, Denmark" from params and you could unmangle that with:

s = params[:whatever].force_encoding('iso-8859-1').encode('utf-8')

Dealing with this on the server side would be a last resort though. If your client-side code is sending back incorrectly encoded data, then you'll be left with a pile of guesswork on the server to figure out what encoding was actually used to get it into the URL.
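
To make that guesswork concrete, here is a minimal sketch of a fallback that only re-interprets the bytes as Latin-1 when they are not valid UTF-8. The method name recode_param is hypothetical, and the assumption that every invalid string is really ISO-8859-1 is exactly the kind of guess described above.

```ruby
require "cgi"

# Hypothetical helper: unescape a raw URL component and return UTF-8.
# Assumption: anything that is not valid UTF-8 was sent as ISO-8859-1.
def recode_param(raw)
  s = CGI.unescape(raw)
  return s if s.valid_encoding? # already valid UTF-8, leave it alone

  # Relabel the bytes as Latin-1, then transcode them to UTF-8.
  s.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

puts recode_param("%C5rhus%2C%20Denmark")    # => Århus, Denmark
puts recode_param("%C3%85rhus%2C%20Denmark") # => Århus, Denmark
```

Checking valid_encoding? first means correctly encoded UTF-8 input passes through untouched instead of being mangled by the Latin-1 conversion.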

How to HTML encode/escape a string? Is there a built-in?

The h helper method:

<%=h "<p> will be preserved" %>
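
Outside of a view, the same escaping is available from the standard library, since h is an alias for ERB::Util.html_escape. A small sketch follows; the escape_for_html wrapper is a made-up name used only for illustration. Keep in mind that Rails 3 and later escape <%= %> output by default, so an explicit h is mostly needed in older versions.

```ruby
require "erb"

# Hypothetical wrapper around the standard-library escaper that h aliases.
def escape_for_html(text)
  ERB::Util.html_escape(text)
end

puts escape_for_html("<p> will be preserved")
# => &lt;p&gt; will be preserved
puts escape_for_html('Tom & "Jerry"')
# => Tom &amp; &quot;Jerry&quot;
```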

Rails url helper not encoding ampersands

I can't tell you a nicer way to deal with it - but I can explain why it's happening.

Ampersands are not invalid characters for a URL. Otherwise you'd have problems with: "http://host/pages/foo?bar=baz&style=foo_style" or whatever.

Edit:
Digging deeper into the source code, it looks like Rails uses CGI.escape only on parameters.

The URL-generating helpers use url_for under the covers, which eventually calls: http://apidock.com/rails/ActionController/Routing/Route/generate
That in turn calls code deep in the private methods of the source... but eventually ends up calling URI.escape
(first look in actionpack/lib/action_controller/routing/route.rb, then in actionpack/lib/action_controller/routing/segments.rb)

The end result is that on the URL itself, Rails uses URI.escape, which notably does not escape ampersands at all:

>> CGI.escape('/my_foo_&_bar')
=> "%2Fmy_foo_%26_bar"
>> URI.escape('/my_foo_&_bar')
=> "/my_foo_&_bar"
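
The same contrast can be reproduced on Ruby 3.0 and later, where URI.escape no longer exists. This is a sketch under that assumption: it uses URI::RFC2396_Parser#escape as the stand-in for URI.escape, and the two method names are hypothetical.

```ruby
require "cgi"
require "uri"

# Component-style escaping: reserved characters such as / and & are encoded.
def escape_component(value)
  CGI.escape(value)
end

# Whole-URL-style escaping: reserved characters are left untouched.
def escape_whole_url(url)
  URI::RFC2396_Parser.new.escape(url)
end

puts escape_component("/my_foo_&_bar") # => %2Fmy_foo_%26_bar
puts escape_whole_url("/my_foo_&_bar") # => /my_foo_&_bar

# URI.encode_www_form_component agrees with CGI.escape for this input.
puts URI.encode_www_form_component("/my_foo_&_bar") # => %2Fmy_foo_%26_bar
```

The difference is by design: an escaper meant for whole URLs has to leave & alone, because it cannot tell a literal ampersand from a parameter separator.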

There's currently nothing you can do about this short of filing a feature request with the Rails team.

...unless you have the option of not using ampersands in your URLs.
You can always gsub them out yourself for all URLs:

def my_clean_url(the_url)
  the_url.gsub(/&/, '_')
end

>> my_clean_url('/my_foo_&_bar')
=> "/my_foo___bar"

page_url(my_clean_url('/my_foo_&_bar'))

Ruby on rails escape unicode characters in search

Use the htmlentities gem.

To convert from HTML entity to UTF-8 char:

require 'htmlentities'
HTMLEntities.new.decode('&Uuml;') # => "Ü"

From UTF-8 to HTML entity:

require 'htmlentities'
HTMLEntities.new.encode("Ü", :named) # => "&Uuml;"

