Percent Encoding in Ruby

As I mentioned in my comment, equating the character ä with the single byte 228 (0xE4) implies that you're dealing with the ISO 8859-1 character encoding. In UTF-8, the same character is the two bytes 0xC3 0xA4.

So, you need to tell Ruby what encoding you want for your string.

str1 = "Hullo ängstrom" # uses whatever encoding is current, generally utf-8
str2 = str1.encode('iso-8859-1')

Then you can encode it as you like:

require 'cgi'
s2c = CGI.escape str2
#=> "Hullo+%E4ngstrom"

require 'uri'
s2u = URI.escape str2 # URI.escape was removed in Ruby 3.0; this line needs an older Ruby
#=> "Hullo%20%E4ngstrom"

Then, to reverse it, you must first (a) unescape the value, and then (b) turn the encoding back into what you're used to (likely UTF-8), telling Ruby what character encoding it should interpret the codepoints as:

s3a = CGI.unescape(s2c)  #=> "Hullo \xE4ngstrom"
puts s3a.encode('utf-8','iso-8859-1')
#=> "Hullo ängstrom"

s3b = URI.unescape(s2u) #=> "Hullo \xE4ngstrom" (URI.unescape was also removed in Ruby 3.0)
puts s3b.encode('utf-8','iso-8859-1')
#=> "Hullo ängstrom"
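On Ruby 3.0 and later, where URI.escape and URI.unescape no longer exist, the same round trip can be sketched with CGI alone. The helper names latin1_escape and latin1_unescape are made up for this illustration, and the sketch assumes every character in the input exists in ISO 8859-1.

```ruby
require 'cgi'

# Hypothetical helper: percent-encode a UTF-8 string as ISO-8859-1 bytes.
def latin1_escape(utf8_string)
  CGI.escape(utf8_string.encode('ISO-8859-1'))
end

# Hypothetical helper: reverse latin1_escape and return a UTF-8 string.
def latin1_unescape(escaped)
  # The second argument tells CGI.unescape how to label the decoded bytes.
  CGI.unescape(escaped, 'ISO-8859-1').encode('UTF-8')
end

escaped = latin1_escape("Hullo ängstrom")
puts escaped                   # Hullo+%E4ngstrom
puts latin1_unescape(escaped)  # Hullo ängstrom
```

Passing the encoding to CGI.unescape explicitly avoids relying on whatever encoding label the escaped string happens to carry.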

How to URL encode a string in Ruby

require 'cgi'

str = "\x12\x34\x56\x78\x9a\xbc\xde\xf1\x23\x45\x67\x89\xab\xcd\xef\x12\x34\x56\x78\x9a".force_encoding('ASCII-8BIT')
puts CGI.escape str
#=> "%124Vx%9A%BC%DE%F1%23Eg%89%AB%CD%EF%124Vx%9A"
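To check that nothing is lost when escaping raw bytes, a quick round trip can help. This is only a sketch on a shortened, made-up byte string; it compares byte arrays so the result does not depend on how either string is labeled.

```ruby
require 'cgi'

# Arbitrary binary data, tagged as raw bytes rather than text.
raw = "\x12\x34\x56\x78\x9a\xbc\xde\xf1".b

escaped = CGI.escape(raw)
puts escaped   # %124Vx%9A%BC%DE%F1

# Compare byte arrays so the check does not depend on encoding labels.
restored = CGI.unescape(escaped)
puts restored.bytes == raw.bytes   # true
```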

URL encode every possible character

URI.escape was deprecated in favor of CGI::escape, which takes the RFC-style approach of percent-encoding every character outside a small safe set (ASCII letters, digits, _, . and -) and turning spaces into +. This is the method that does it:

# https://ruby-doc.org/stdlib-2.4.3/libdoc/cgi/rdoc/CGI/Util.html

# File cgi/util.rb, line 11
def escape(string)
  encoding = string.encoding
  string.b.gsub(/([^ a-zA-Z0-9_.-]+)/) do |m|
    '%' + m.unpack('H2' * m.bytesize).join('%').upcase
  end.tr(' ', '+').force_encoding(encoding)
end

At the end of the day, it's the server that needs fixing, not your code. If you can't change the server, you can monkeypatch or fork CGI and remove the - from the regex, or gsub() the - in the escaped result into %2D.
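Both workarounds are small enough to sketch. The helper names escape_with_hyphen and escape_every_byte are hypothetical; the second one goes further than CGI and encodes every byte, which is only worth doing if the receiving side really demands it.

```ruby
require 'cgi'

# Hypothetical helper: CGI.escape, then also encode the hyphen it leaves alone.
def escape_with_hyphen(string)
  CGI.escape(string).gsub('-', '%2D')
end

# Hypothetical helper: percent-encode every single byte, safe or not.
def escape_every_byte(string)
  string.bytes.map { |byte| format('%%%02X', byte) }.join
end

puts escape_with_hyphen('a-b c')  # a%2Db+c
puts escape_every_byte('a-b')     # %61%2D%62
```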

Ruby - how to encode URL without re-encoding already encoded characters

I can't think of a way to do this that isn't a little bit of a kludge. So I propose a little bit of a kludge.

URI.escape appears to work the way you want in all cases except when characters are already encoded. With that in mind, we can take the result of URI.encode (an alias of URI.escape; both were removed in Ruby 3.0) and use String#gsub to "un-encode" only those characters.

The regular expression below looks for %25 (an encoded %) followed by two hex digits, turning e.g. %252f back into %2f:

require "uri"

DOUBLE_ESCAPED_EXPR = /%25([0-9a-f]{2})/i

def escape_uri(uri)
  URI.encode(uri).gsub(DOUBLE_ESCAPED_EXPR, '%\1')
end

puts escape_uri("https://www.example.com/url-déjà-vu")
# => https://www.example.com/url-d%C3%A9j%C3%A0-vu

puts escape_uri("https://somesite.com/page?stuff=stuff&%20")
# => https://somesite.com/page?stuff=stuff&%20

puts escape_uri("http://example.com/a%2fb")
# => http://example.com/a%2fb

I don't promise that this is foolproof, but hopefully it helps.
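For Rubies without URI.encode, a similar effect can be sketched with String#gsub alone: escape only what still needs escaping and skip anything that already looks like a %XX escape. The names NEEDS_ESCAPING and escape_uri_once are invented for this example, and the character set is my reading of the RFC 3986 reserved and unreserved characters, so treat it as an assumption rather than a drop-in replacement.

```ruby
# Matches what still needs escaping: a % that does not start a %XX escape,
# or any character outside the RFC 3986 reserved and unreserved sets.
NEEDS_ESCAPING = %r{%(?![0-9a-fA-F]{2})|[^A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]}

# Hypothetical helper: escape a URL while leaving existing %XX escapes intact.
def escape_uri_once(uri)
  uri.gsub(NEEDS_ESCAPING) do |match|
    match.bytes.map { |byte| format('%%%02X', byte) }.join
  end
end

puts escape_uri_once("https://www.example.com/url-déjà-vu")
# https://www.example.com/url-d%C3%A9j%C3%A0-vu
puts escape_uri_once("http://example.com/a%2fb")
# http://example.com/a%2fb
puts escape_uri_once("http://example.com/100% sure")
# http://example.com/100%25%20sure
```

Like the kludge above, this cannot tell a deliberate escape from a literal % that happens to be followed by two hex digits.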

Issue with percent encoding in paperclip document.url on s3

Use URI.unescape:

<%= URI.unescape(client.document.url) %>
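URI.unescape was removed in Ruby 3.0. On newer Rubies, one possible stand-in is the RFC 2396 parser that ships with the uri library. The helper name display_url and the sample URL below are made up for illustration.

```ruby
require 'uri'

# Hypothetical helper: decode %XX escapes for display. Unlike CGI.unescape,
# the RFC 2396 parser leaves a literal + alone instead of making it a space.
def display_url(url)
  URI::RFC2396_Parser.new.unescape(url)
end

# Made-up S3-style URL, used only for illustration.
puts display_url('https://s3.amazonaws.com/my-bucket/docs/annual%20report+v2.pdf')
# https://s3.amazonaws.com/my-bucket/docs/annual report+v2.pdf
```

In a view this would be called the same way as before, for example with client.document.url as the argument.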

Ruby - URL encoding

What about CGI::escape?

You only need to encode the parameters, though.

require 'cgi'

url = "http://xyz.com/hello?"
params = "name=john&msg=hello\nJohn\n\rgoodmorning&note=last\night I went to \roger"

puts "#{url}#{CGI::escape(params)}"
# => "http://xyz.com/hello?name%3Djohn%26msg%3Dhello%0AJohn%0A%0Dgoodmorning%26note%3Dlast%0Aight+I+went+to+%0Doger"
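Note that escaping the whole parameter string also encodes the = and & separators. If the goal is a normal query string, a sketch using URI.encode_www_form escapes each key and value separately. The parameter values here are simplified stand-ins for the ones above.

```ruby
require 'uri'

base = 'http://xyz.com/hello'
params = { 'name' => 'john', 'msg' => "hello\nJohn", 'note' => 'last night' }

# encode_www_form escapes each key and value but keeps = and & as separators.
query = URI.encode_www_form(params)
puts "#{base}?#{query}"
# http://xyz.com/hello?name=john&msg=hello%0AJohn&note=last+night
```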

Is there a function to url encode dot ('.') in ruby

You actually don't need to encode the dot. After the ? in the url, / and . don't have any specific meaning.
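A quick check of how CGI treats the dot, as a sketch with a made-up value. The last line is only for the case where some consumer insists on an encoded dot, which is an assumption and not a standard requirement.

```ruby
require 'cgi'

puts CGI.escape('report.v2')     # report.v2  (the dot is left alone)
puts CGI.unescape('report%2Ev2') # report.v2  (%2E decodes to the same dot)

# Force the dot to be encoded anyway.
puts CGI.escape('report.v2').gsub('.', '%2E')  # report%2Ev2
```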

How to encode Email if it contains + to %2B in Ruby

The uri standard library has a method for that: URI::Escape#escape. URI extends the URI::Escape module, so the method is also available as URI.escape.

URI.escape('test+@gmail.com', '+') # second argument: the characters to escape with URL encoding
#=> "test%2B@gmail.com"

However like @spickermann says in the comments:

Why do you want to encode the + in the URL but not the @? @ must be encoded too.
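Following that comment, a sketch that encodes both characters, using methods that still exist in current Ruby versions. The 'email' parameter name is a placeholder.

```ruby
require 'cgi'
require 'uri'

email = 'test+@gmail.com'

puts CGI.escape(email)                      # test%2B%40gmail.com
puts URI.encode_www_form_component(email)   # test%2B%40gmail.com
puts URI.encode_www_form('email' => email)  # email=test%2B%40gmail.com
```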

Parsing string to add to URL-encoded URL

In 2019, URI.encode is obsolete and should not be used; it was removed entirely in Ruby 3.0. The original answer, which only works on older Rubies, was:

require 'uri'

URI.encode("Hello there world")
#=> "Hello%20there%20world"
URI.encode("hello there: world, how are you")
#=> "hello%20there:%20world,%20how%20are%20you"

URI.decode("Hello%20there%20world")
#=> "Hello there world"
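On Ruby 3.0 and later, one way to get the same %20-style results is the RFC 2396 parser object in the uri library. This is a sketch, not an official migration path, and the constant name LEGACY_PARSER is made up here.

```ruby
require 'uri'

# Parser object that still provides the old escape/unescape behaviour.
LEGACY_PARSER = URI::RFC2396_Parser.new

puts LEGACY_PARSER.escape('Hello there world')
# Hello%20there%20world
puts LEGACY_PARSER.escape('hello there: world, how are you')
# hello%20there:%20world,%20how%20are%20you
puts LEGACY_PARSER.unescape('Hello%20there%20world')
# Hello there world
```

If a + for spaces is acceptable, URI.encode_www_form_component and CGI.escape are simpler choices.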

