In Ruby/Rails, how can I encode/escape special characters in URLs?
Ruby has the built-in URI library and the Addressable gem, in particular Addressable::URI.
I prefer Addressable::URI. It's very full featured and handles the encoding for you when you use the query_values= method.
I've seen some discussions about URI going through some growing pains so I tend to leave it alone for handling encoding/escaping until these things get sorted out:
- http://osdir.com/ml/ruby-core/2010-06/msg00324.html
- http://osdir.com/ml/lang-ruby-core/2009-06/msg00350.html
- http://osdir.com/ml/ruby-core/2011-06/msg00748.html
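For comparison, here is a minimal sketch of letting a library do the query-string encoding, using only the standard library's URI.encode_www_form rather than Addressable (a swapped-in technique, since Addressable is a separate gem). The host, path, and parameter names are made up for illustration.

```ruby
require "uri"

# Hypothetical parameters; the values contain characters that must be escaped.
params = [["q", "déjà vu"], ["tag", "a&b"]]

# encode_www_form percent-encodes each key and value and joins them with "&".
query = URI.encode_www_form(params)

uri = URI("https://www.example.com/search")
uri.query = query

puts query
puts uri.to_s
```

Spaces come out as "+" here because this is form-style encoding, which is what query strings normally expect.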
Ruby - how to encode URL without re-encoding already encoded characters
I can't think of a way to do this that isn't a little bit of a kludge. So I propose a little bit of a kludge.
URI.escape (URI.encode is an alias for it) appears to work the way you want in all cases except when characters are already encoded. With that in mind, we can take the result of URI.encode and use String#gsub to "un-encode" only those characters.
The regular expression below looks for %25 (an encoded %) followed by two hex digits, turning e.g. %252f back into %2f:
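require "uri"

# Matches a double-escaped sequence such as %252f (an escaped "%" plus two hex digits).
DOUBLE_ESCAPED_EXPR = /%25([0-9a-f]{2})/i

# Note: URI.encode/URI.escape were removed in Ruby 3.0, so this needs Ruby 2.x.
def escape_uri(uri)
  URI.encode(uri).gsub(DOUBLE_ESCAPED_EXPR, '%\1')
end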
puts escape_uri("https://www.example.com/url-déjà-vu")
# => https://www.example.com/url-d%C3%A9j%C3%A0-vu
puts escape_uri("https://somesite.com/page?stuff=stuff&%20")
# => https://somesite.com/page?stuff=stuff&%20
puts escape_uri("http://example.com/a%2fb")
# => http://example.com/a%2fb
I don't promise that this is foolproof, but hopefully it helps.
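URI.encode and URI.escape were removed in Ruby 3.0, so the helper as written no longer works there. The following is a minimal sketch of the same kludge for newer Rubies, assuming URI::RFC2396_Parser#escape as a stand-in for URI.encode. The names escape_uri_compat and DOUBLE_ESCAPED are hypothetical, chosen only for this illustration.

```ruby
require "uri"

# An escaped "%" (%25) followed by two hex digits, i.e. a double-escaped sequence.
DOUBLE_ESCAPED = /%25([0-9a-f]{2})/i

# Hypothetical helper: escape unsafe characters, then undo the double escaping
# of sequences that were already percent-encoded in the input.
def escape_uri_compat(uri)
  URI::RFC2396_Parser.new.escape(uri).gsub(DOUBLE_ESCAPED, '%\1')
end

puts escape_uri_compat("https://www.example.com/url-déjà-vu")
puts escape_uri_compat("https://somesite.com/page?stuff=stuff&%20")
puts escape_uri_compat("http://example.com/a%2fb")
```

The same caveat applies: a literal "%25" followed by two hex digits in the input would also be collapsed, so this is still not foolproof.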
Unescape special characters correctly from the URL in Rails 3.0.3
You're right, it looks like you have an encoding problem somewhere. The 0xC5 byte is "Å" in ISO-8859-1 (AKA Latin-1); in UTF-8 it would be %C3%85 in the URL.
I suspect that you're using JavaScript on the client side and that your JavaScript is using the old escape function to build the URL; escape has some issues with non-ASCII characters. If this is the case, then you should upgrade your JavaScript to use encodeURIComponent instead. Have a look at this little demo and you'll see what I'm talking about:
http://jsfiddle.net/ambiguous/U5A3k/
If you can't change the client-side script, then you can do it the hard way in Ruby using force_encoding and encode:
>> s = CGI.unescape('%C5rhus%2C%20Denmark')
=> "\xC5rhus, Denmark"
>> s.encoding
=> #<Encoding:UTF-8>
>> s.force_encoding('iso-8859-1')
=> "\xC5rhus, Denmark"
>> s.encoding
=> #<Encoding:ISO-8859-1>
>> s.encode!('utf-8')
=> "Århus, Denmark"
>> s.encoding
=> #<Encoding:UTF-8>
You should get something like "\xC5rhus, Denmark" from params, and you could unmangle that with:
s = params[:whatever].force_encoding('iso-8859-1').encode('utf-8')
Dealing with this on the server side would be a last resort, though. If your client-side code is sending back incorrectly encoded data, then you'll be left with a pile of guesswork on the server to figure out what encoding was actually used to get it into the URL.
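One way to keep that guesswork contained is to re-encode only when the bytes are not already valid UTF-8. This is a minimal sketch, assuming ISO-8859-1 is the only other encoding the client could have used; to_utf8 is a hypothetical helper name, not a Rails or Ruby API.

```ruby
# Hypothetical helper: return the string as UTF-8, treating bytes that are
# not valid UTF-8 as ISO-8859-1 (an assumption about the client).
def to_utf8(raw)
  s = raw.dup.force_encoding(Encoding::UTF_8)
  return s if s.valid_encoding?

  # Reinterpret the same bytes as Latin-1, then transcode to UTF-8.
  s.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

latin1_bytes = "\xC5rhus, Denmark".b  # what a Latin-1 encoded URL decodes to
puts to_utf8(latin1_bytes)
puts to_utf8("Århus, Denmark")        # already valid UTF-8, returned unchanged
```

The validity check is only a heuristic: some Latin-1 byte sequences happen to be valid UTF-8 too, so fixing the client remains the better option.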
How to HTML encode/escape a string? Is there a built-in?
The h helper method:
<%=h "<p> will be preserved" %>
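Outside a view template, the same escaping is available from plain Ruby. A small sketch, assuming the standard library's ERB::Util.html_escape (which h is an alias for in ERB::Util); the sample string is the one from the answer above.

```ruby
require "erb"

# Replaces &, <, >, " and ' with their HTML entities.
escaped = ERB::Util.html_escape("<p> will be preserved")

puts escaped
```

This prints `&lt;p&gt; will be preserved`, so the tag shows up as text instead of being interpreted as markup.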
Rails url helper not encoding ampersands
I can't tell you a nicer way to deal with it - but I can explain why it's happening.
Ampersands are not invalid characters for a URL. Otherwise you'd have problems with: "http://host/pages/foo?bar=baz&style=foo_style" or whatever.
Edit:
Digging deeper into the source code, it looks like Rails uses CGI.escape only on parameters.
The URL-generating helpers use url_for under the covers, which eventually calls: http://apidock.com/rails/ActionController/Routing/Route/generate
That calls code deep in the private methods of the source... but it eventually ends up calling URI.escape
(first look in actionpack/lib/action_controller/routing/route.rb, then in actionpack/lib/action_controller/routing/segments.rb).
The end result is that on the URL itself, Rails uses URI.escape, which notably does not escape ampersands at all:
>> CGI.escape('/my_foo_&_bar')
=> "%2Fmy_foo_%26_bar"
>> URI.escape('/my_foo_&_bar')
=> "/my_foo_&_bar"
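The same contrast can be reproduced on Ruby 3.0 and later, where URI.escape no longer exists. This is a sketch under two assumptions: URI.encode_www_form_component stands in for the strict, parameter-style escaping, and URI::RFC2396_Parser#escape stands in for the old URI.escape.

```ruby
require "uri"

path_segment = "/my_foo_&_bar"

# Form-style component encoding escapes both "/" and "&".
strict = URI.encode_www_form_component(path_segment)

# The RFC 2396 parser treats "/" and "&" as reserved characters and keeps them.
lenient = URI::RFC2396_Parser.new.escape(path_segment)

puts strict
puts lenient
```

The first line printed is `%2Fmy_foo_%26_bar` and the second is `/my_foo_&_bar`, matching the irb output above.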
There's currently nothing you can do about this without filing a feature request with the Rails team...
...unless you have the option of not using ampersands in your URLs.
You can always gsub them out yourself for all URLs:
def my_clean_url(the_url)
  the_url.gsub(/&/, '_')
end
>> my_clean_url('/my_foo_&_bar')
=> "/my_foo___bar"
page_url(my_clean_url('/my_foo_&_bar'))
Ruby on rails escape unicode characters in search
Use the htmlentities gem.
To convert from HTML entity to UTF-8 char:
require 'htmlentities'
HTMLEntities.new.decode('&Uuml;') # => "Ü"
From UTF-8 to HTML entity:
require 'htmlentities'
HTMLEntities.new.encode("Ü", :named) # => "&Uuml;"