Unescaping Characters in a String with Ruby

Best way to escape and unescape strings in Ruby?

Ruby 2.5 added String#undump as a complement to String#dump:

$ irb
irb(main):001:0> dumped_newline = "\n".dump
=> "\"\\n\""
irb(main):002:0> undumped_newline = dumped_newline.undump
=> "\n"

With it, escaping and unescaping become two small helpers:

def escape(s)
  s.dump[1..-2]
end

def unescape(s)
  "\"#{s}\"".undump
end

$ irb
irb(main):001:0> escape("\n \" \\")
=> "\\n \\\" \\\\"
irb(main):002:0> unescape("\\n \\\" \\\\")
=> "\n \" \\"
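
As a quick sanity check, the two helpers should be inverses of each other. The sketch below repeats them so it runs on its own (Ruby 2.5 or later) and round-trips a few sample strings; the sample values and the `round_trip_ok` name are only illustrative.

```ruby
# Same helpers as above, repeated so the snippet is self-contained.
def escape(s)
  s.dump[1..-2]        # dump adds surrounding quotes; strip them
end

def unescape(s)
  "\"#{s}\"".undump    # undump expects the surrounding quotes back
end

# Illustrative samples: newline/quote/backslash, a tab, plain text, empty string.
samples = ["\n \" \\", "tab\there", "plain text", ""]

# Every sample should survive escape followed by unescape unchanged.
round_trip_ok = samples.all? { |s| unescape(escape(s)) == s }
puts round_trip_ok
```

Note that this only covers strings in the default UTF-8 encoding, which is what the helpers above assume.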

Unescaping special character sequences in Ruby strings

The text will be loaded exactly as it is in the file. If the file contains the two literal characters \ and n instead of a newline, that is what will be loaded. If there is a known set of escapes you want to convert, you could simply gsub them:

line='abc\ndef\tghi'
line.gsub!('\n', "\n")
line.gsub!('\t', "\t")
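
If the text can also contain an escaped backslash, chaining one gsub! per escape can misread a sequence such as \\n. A possible variation, sketched here with a hypothetical helper name `unescape_known`, does all replacements in a single pass by giving gsub a hash of replacements:

```ruby
# Hypothetical helper: converts a known set of escapes in one pass.
def unescape_known(text)
  replacements = {
    '\n'   => "\n",  # backslash + n  -> newline
    '\t'   => "\t",  # backslash + t  -> tab
    '\\\\' => "\\"   # two backslashes -> one backslash
  }
  # The regexp matches a backslash followed by n, t, or another backslash;
  # each match is looked up in the hash to find its replacement.
  text.gsub(/\\[nt\\]/, replacements)
end

line = 'abc\ndef\tghi'
puts unescape_known(line)
```

Because the string is scanned once from left to right, an escaped backslash is consumed before the character after it is considered.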

Unescaping characters in a string with Ruby

I ran into this exact problem the other day. There is a bug in the JSON parser that HTTParty uses (the Crack gem): it uses a case-sensitive regexp for the Unicode escape sequences, so because Posterous emits the hex digits A-F instead of a-f, Crack doesn't unescape them. I submitted a pull request to fix this.

In the meantime, HTTParty nicely lets you specify alternate parsers, so you can call ::JSON.parse and bypass Crack entirely, like this:

class JsonParser < HTTParty::Parser
  def json
    ::JSON.parse(body)
  end
end

class Posterous
  include HTTParty
  parser ::JsonParser

  #....
end
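
To see that the standard library parser handles both spellings, here is a small check that needs only the json library that ships with Ruby. The sample JSON text is made up for illustration; it escapes the same character once with upper-case and once with lower-case hex digits.

```ruby
require 'json'

# Single quotes keep the backslashes literal, so JSON.parse sees the
# escape sequences themselves (illustrative sample data).
json_text = '["\u00E9", "\u00e9"]'

decoded = JSON.parse(json_text)
p decoded                          # both elements decode to the same character
puts decoded[0] == decoded[1]
```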

Removing backslash (escape character) from a string

When you write:

input = "{ \"foo\": \"bar\", \"num\": 3}"

The actual string stored in input is:

{ "foo": "bar", "num": 3}

The escape \" here is interpreted by the Ruby parser so that it can distinguish the boundaries of the string (the leftmost and rightmost ") from an ordinary " character inside the string (the escaped ones).

String#delete treats its first argument as a set of characters rather than as a pattern. Every character in that set is removed wherever it occurs. So by writing

input.delete('\\"')

you get a string with every character in that set removed individually, not a string with each \" sequence removed. That is wrong for your case and may cause unexpected behavior later.

String#gsub, however, substitutes a pattern (either a regular expression or a plain string).

input.gsub('\\"', '')

means: find every \" (two characters in sequence) and replace it with an empty string. Since input contains no \, nothing is replaced. What you actually need is:

input.gsub('"', '')
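
The difference between a character set and a sequence is easy to see side by side. This is a minimal sketch; the "hello world" sample is invented for illustration, and input is the string from above.

```ruby
text = "hello world"

deleted  = text.delete('lo')     # removes every "l" and every "o"
replaced = text.gsub('lo', '')   # removes only the two-character sequence "lo"
puts deleted
puts replaced

input = "{ \"foo\": \"bar\", \"num\": 3}"

has_backslash  = input.include?("\\")          # the stored string has no backslash
unchanged      = input.gsub('\\"', '') == input # so this pattern never matches
without_quotes = input.gsub('"', '')            # this one strips the quotes
puts has_backslash
puts unchanged
puts without_quotes
```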

How can I remove escape characters from string? UTF issue?

Your text does not contain any 'escape' characters. They only appear in the .inspect representation of the string, which is what irb displays. Observe:

> s = gets
Hello "Michael"
#=> "Hello \"Michael\"\n"

> puts s
Hello "Michael"

> p s # The same as `puts s.inspect`
"Hello \"Michael\"\n"

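The same point can be checked without typing anything at a prompt. In this sketch a string literal stands in for the line read by gets, and the lengths show that the backslashes exist only in the inspect output:

```ruby
# Stand-in for the line read by gets in the session above.
s = "Hello \"Michael\"\n"

puts s.length                    # characters actually stored
puts s.inspect.length            # longer: inspect adds quotes and backslashes

puts s.include?("\\")            # no backslash in the string itself
puts s.inspect.include?("\\")    # backslashes appear only in the inspect form
```
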
However, the real answer is to process this XML file as XML. For example:

require 'nokogiri'                                # gem install nokogiri
doc = Nokogiri.XML( IO.read( 'mysonglist.xml' ) ) # Read and parse the XML file
songs = doc.css( 'Song' ) # Gives you a NodeList of song els
puts songs.map{ |s| s['name'] } # Print the name of all songs
puts songs.map{ |s| s['duration'] } # Print the durations (as strings)

mins_and_seconds = songs.map{ |s| (s['duration'].to_i/1000.0).divmod(60) }
#=> [ [ 4, 36.6 ], … ]
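
The divmod step works without Nokogiri too. As a sketch, assuming (as the division by 1000.0 suggests) that durations are stored in milliseconds, a hypothetical helper `format_duration` could turn one value into a minutes:seconds string; the 276_600 sample is made up to match the first result shown above.

```ruby
# Hypothetical helper: formats a duration in milliseconds as "m:ss.s".
def format_duration(milliseconds)
  # Convert to seconds, then split into whole minutes and leftover seconds.
  minutes, seconds = (milliseconds / 1000.0).divmod(60)
  format('%d:%04.1f', minutes, seconds)
end

puts format_duration(276_600)
```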

