What's the Difference Between URI.escape and CGI.escape

What's the difference between URI.escape and CGI.escape?

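There were some small differences, but the important point is that URI.escape was marked obsolete in Ruby 1.9.2 (and removed entirely in Ruby 3.0), so use CGI.escape or ERB::Util.url_encode instead. The two are not interchangeable: CGI.escape encodes a space as +, while ERB::Util.url_encode encodes it as %20.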
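As a quick illustration of the two suggested replacements, here is a minimal sketch. The input string is made up for the example; the only point is how each method treats a space.

```ruby
require "cgi"
require "erb"

# Hypothetical input chosen to include a space and some reserved characters.
input = "a b&c=d/e"

form_style    = CGI.escape(input)            # form encoding: space becomes "+"
percent_style = ERB::Util.url_encode(input)  # percent encoding: space becomes "%20"

puts form_style     # => a+b%26c%3Dd%2Fe
puts percent_style  # => a%20b%26c%3Dd%2Fe
```

CGI.escape is usually the better fit for query-string values, while the %20 form is safer inside a path segment.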

There is a long discussion on ruby-core for those interested which also mentions WEBrick::HTTPUtils.escape and WEBrick::HTTPUtils.escape_form.

What is the difference between URI.escape and URI.encode in Ruby?

There is no difference. In Ruby 1.9.3, encode is simply an alias for escape. (Both were removed in Ruby 3.0.)

[Edit] Note that both methods accept an optional second argument describing the "unsafe" characters to encode:

URI.encode('http://my.web.com', /\W/) # => "http%3A%2F%2Fmy%2Eweb%2Ecom"

Thanks @muistooshort! =)

What's the difference between CGI.unescape and URI.decode_www_form_component?

These methods are very similar. They both accept a string and an encoding and return a string in the specified encoding with the % escapes decoded. But there are differences:

Invalid escapes

URI.decode_www_form_component raises an ArgumentError if the string contains invalid escape sequences.

URI.decode_www_form_component('%xz')
# ArgumentError: invalid %-encoding (%xz)

CGI.unescape simply ignores them.

CGI.unescape('%xz')
# "%xz"

Invalid encodings

CGI.unescape ignores your specified encoding if the result would be invalid in it:

p CGI.unescape("\u263a", 'ASCII')
# "☺"

URI.decode_www_form_component doesn't care and applies the encoding anyway:

p URI.decode_www_form_component("\u263a", 'ASCII')
# "\xE2\x98\xBA"

Lastly (and I hesitate to even mention this), URI.decode_www_form_component is slightly faster because it uses a precomputed Hash to decode all 485 valid escape codes (it's case-sensitive), whereas CGI.unescape actually interprets the hex code and repacks it as a character.
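For well-formed input the two methods agree, which is easy to check. The following is a minimal sketch; the sample string is an assumption made for the example.

```ruby
require "cgi"
require "uri"

# Hypothetical form-encoded value: "+" stands for a space, %C3%A9 is "é" in UTF-8.
encoded = "caf%C3%A9+au+lait%21"

from_cgi = CGI.unescape(encoded)
from_uri = URI.decode_www_form_component(encoded)

puts from_cgi              # => café au lait!
puts from_cgi == from_uri  # => true
```

The differences described above only show up with malformed escapes or an encoding that cannot represent the decoded bytes.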

Ruby 2.7 says URI.escape is obsolete, what replaces it?

There is no official RFC 3986-compliant URI escaper in the Ruby standard library today.

See Why is URI.escape() marked as obsolete and where is this REGEXP::UNSAFE constant? for background.

Several methods are available, but as you discovered and pointed out in the comment, each of them has issues:

  • They produce deprecation warnings
  • They do not claim standards compliance
  • They are not escaping in accordance with RFC 3986
  • They are implemented in tangentially related libraries
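If none of the existing methods fits, one option is to write the escaping by hand. The sketch below is not a standard library API: the names UNRESERVED and percent_encode_rfc3986 are hypothetical, and it assumes you want to escape a single URI component by percent-encoding every byte outside the RFC 3986 "unreserved" set (letters, digits, -, ., _ and ~).

```ruby
# Hypothetical helper, not part of Ruby's standard library.
# RFC 3986 "unreserved" characters, which never need escaping.
UNRESERVED = /[A-Za-z0-9\-._~]/

def percent_encode_rfc3986(str)
  # Work byte by byte so multibyte UTF-8 characters become one %XX per byte.
  str.to_s.each_byte.map do |byte|
    char = byte.chr
    char.match?(UNRESERVED) ? char : format("%%%02X", byte)
  end.join
end

puts percent_encode_rfc3986("foo'bar baz")  # => foo%27bar%20baz
puts percent_encode_rfc3986("a-b_c.d~e")    # => a-b_c.d~e
```

This escapes reserved characters such as / and ? too, so it is meant for individual components (a path segment, a query value), not for a whole URL.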

CGI.escape and URLEncoder.encode results do not match

The %0A denotes a line break ("\n").

Perhaps you got the text from some input source (like user input, or a file), and you need to chomp the line break?

require "cgi"
hash = "gFH6B8aN+yReGkBL2QS7X4O7d98=\n"
puts "hash: " + hash
# => hash: gFH6B8aN+yReGkBL2QS7X4O7d98=
puts "escaped hash: " + CGI.escape(hash.chomp)
# => escaped hash: gFH6B8aN%2ByReGkBL2QS7X4O7d98%3D
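To see where the %0A comes from, it may help to escape the same string with and without the trailing line break. A small sketch, reusing the value from the example above (the variable names are only for illustration):

```ruby
require "cgi"

raw = "gFH6B8aN+yReGkBL2QS7X4O7d98=\n"

with_newline    = CGI.escape(raw)        # trailing "\n" is escaped as %0A
without_newline = CGI.escape(raw.chomp)  # chomp strips the line break first

puts with_newline     # => gFH6B8aN%2ByReGkBL2QS7X4O7d98%3D%0A
puts without_newline  # => gFH6B8aN%2ByReGkBL2QS7X4O7d98%3D
```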

Why doesn't URI.escape escape single quotes?

For the same reason it doesn't escape ? or / or :, and so forth. URI.escape() only escapes characters that cannot be used in URLs at all, not characters that have a special meaning.

What you're looking for is CGI.escape():

require "cgi"
CGI.escape("foo'bar\" baz")
# => "foo%27bar%22+baz"

