How to Convert Escaped Characters

How to convert escaped characters?


>>> escaped_str = 'One \\\'example\\\''
>>> print escaped_str.encode('string_escape')
One \\\'example\\\'
>>> print escaped_str.decode('string_escape')
One 'example'

Several similar codecs are available, such as rot13 and hex.

The above is Python 2.x. Since you said in a comment that you're using Python 3.x: str objects there have no decode() method, so decoding a Unicode string is roundabout, but it's still possible. The "string_escape" codec is gone in Python 3; use "unicode_escape" instead:


Python 3.3a0 (default:b6aafb20e5f5, Jul 29 2011, 05:34:11)
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> escaped_str = "One \\\'example\\\'"
>>> import codecs
>>> print(codecs.getdecoder("unicode_escape")(escaped_str)[0])
One 'example'
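
One caveat worth sketching for Python 3, assuming the input may also contain non-ASCII characters: unicode_escape interprets its input bytes as Latin-1, so a plain UTF-8 round trip can mangle accented letters. The helper name unescape below is hypothetical and not part of the answer above.

```python
def unescape(text: str) -> str:
    """Decode backslash escapes in text (hypothetical helper)."""
    # Latin-1 keeps code points below 256 as single bytes; anything else is
    # written as a \uXXXX or \UXXXXXXXX escape by backslashreplace, which
    # unicode_escape then turns back into the original character.
    return text.encode("latin-1", "backslashreplace").decode("unicode_escape")


# The text contains a real "é" plus literal backslash escapes.
escaped_str = "caf\u00e9 \\'example\\'\\n"

# Naive round trip through UTF-8: the two bytes of "é" are read as Latin-1.
naive = escaped_str.encode("utf-8").decode("unicode_escape")
fixed = unescape(escaped_str)

print(repr(naive))
print(repr(fixed))
```

If the input is known to be pure ASCII, the simpler codecs call shown above should be enough.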

Convert escaped characters with python

(In this answer, I'm assuming you use Python 2.)

First, let me explain why your snippet returns something different than you expect:

import json

r1 = json.dumps({"detalle":"el Expediente N\u00b0\u00a030 de la Resoluci\u00f3n 11..."}, ensure_ascii=False).encode('utf8')
print(r1)
r2 = json.dumps({"detalle":u"el Expediente N\u00b0\u00a030 de la Resoluci\u00f3n 11..."}, ensure_ascii=False).encode('utf8')
print(r2)

This outputs:

{"detalle": "el Expediente N\\u00b0\\u00a030 de la Resoluci\\u00f3n 11..."}
{"detalle": "el Expediente N° 30 de la Resolución 11..."}

The difference is that in the first case the input is a byte string (str in Python 2), where \u00b0 is not an escape sequence but six literal ASCII characters, so json.dumps escapes each backslash. In the second case the u prefix makes it a unicode string, where \u00b0 denotes the single character °. The second case is what you want.

Based on this, here is what I understand from your problem:

Normally when you read a JSON file with the json module, the strings (which are escaped in the JSON file) are unescaped by the parser. If you still see escaped characters, that indicates that the strings were (accidentally?) double escaped in the JSON file. In that case, try an extra unescape with s.decode('unicode-escape'):

data["detalle"] = data["detalle"].decode('unicode-escape')

Once you have proper unicode strings loaded in Python, converting them to bytes with s.encode('utf8') and writing the result to a file is correct.
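
For readers on Python 3, the same extra unescape step might look like the sketch below. It assumes the double-escaped value is pure ASCII (otherwise encode("ascii") raises); the sample document is made up from the question's text.

```python
import json

# The document contains \\u00f3: the backslash itself was escaped,
# so the JSON parser yields a literal backslash followed by "u00f3".
raw = r'{"detalle": "Resoluci\\u00f3n 11..."}'

data = json.loads(raw)
once = data["detalle"]  # still holds the six characters \u00f3

# Python 3 str has no decode(), so go through bytes first.
fixed = once.encode("ascii").decode("unicode_escape")

print(once)
print(fixed)
```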

Python - String formatting convert escaped characters to literals

You can use either single or double quotes to delimit a string, so if your string needs to contain one kind, use the other as the delimiter. You can also use a raw string (r prefix), in which backslashes are kept as literal characters instead of starting escape sequences.

r'"{\"MachineType\":0,\"BrokenMachines\":6}"'
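
As a small illustration, the raw literal above is the same string as a regular literal with every backslash doubled. The sketch also assumes, which the answer does not state, that the text is meant to be a JSON string wrapping a JSON object, so it can be parsed in two steps.

```python
import json

raw = r'"{\"MachineType\":0,\"BrokenMachines\":6}"'
regular = '"{\\"MachineType\\":0,\\"BrokenMachines\\":6}"'

# Both literals produce exactly the same characters.
same = raw == regular

# Assumption: the value is a JSON string that itself contains JSON.
inner = json.loads(raw)       # unwraps the outer quotes and the \" escapes
machines = json.loads(inner)  # parses the inner object

print(same)
print(inner)
print(machines)
```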

How to convert a string containing escape characters to a string

If you are looking to replace all escaped character codes, not only the code for @, you can use this snippet of code to do the conversion:

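using System;
using System.Text;
using System.Text.RegularExpressions;

public static string UnescapeCodes(string src) {
    // Match a backslash followed by one or more hex digits
    var rx = new Regex("\\\\([0-9A-Fa-f]+)");
    var res = new StringBuilder();
    var pos = 0;
    foreach (Match m in rx.Matches(src)) {
        // Copy the text between the previous match and this one
        res.Append(src.Substring(pos, m.Index - pos));
        pos = m.Index + m.Length;
        // Interpret the hex digits as a character code
        res.Append((char)Convert.ToInt32(m.Groups[1].ToString(), 16));
    }
    res.Append(src.Substring(pos));
    return res.ToString();
}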

The code relies on a regular expression to find every backslash followed by a run of hex digits, converts the digits to an int, and casts the resulting value to a char.
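
The same idea can be sketched in Python with re.sub. The function name unescape_codes and the sample input are illustrative only. Note that the pattern is greedy, as in the C# version, so a hex-looking letter directly after the code is treated as part of the code.

```python
import re


def unescape_codes(src: str) -> str:
    """Replace each backslash-plus-hex-digits run with the character it encodes."""
    return re.sub(
        r"\\([0-9A-Fa-f]+)",
        lambda m: chr(int(m.group(1), 16)),
        src,
    )


# 0x40 is the code for "@"; "h" is not a hex digit, so the match stops there.
result = unescape_codes(r"name\40host")
print(result)
```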

How to convert escape characters in HTML tags?

You can use the strconv.Unquote() to do the conversion.

One thing you should be aware of is that strconv.Unquote() can only unquote strings that are in quotes (i.e. that start and end with a double quote " or a back quote `), so we have to add the quotes manually.

Example:

// Important to use backtick ` (raw string literal)
// else the compiler will unquote it (interpreted string literal)!

s := `\u003chtml\u003e`
fmt.Println(s)
s2, err := strconv.Unquote(`"` + s + `"`)
if err != nil {
    panic(err)
}
fmt.Println(s2)

Output (try it on the Go Playground):

\u003chtml\u003e
<html>
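
For comparison, a rough Python analogue of the "wrap in quotes, then unquote" trick is to hand the quoted text to a JSON parser. This is only a sketch: JSON string escapes are not identical to Go's (for example, \x is not accepted), and it assumes the text contains no unescaped double quotes.

```python
import json

# Raw string literal, so Python itself leaves the backslashes alone.
s = r"\u003chtml\u003e"

# Add the surrounding quotes manually, then let the parser unescape.
unquoted = json.loads('"' + s + '"')

print(s)
print(unquoted)
```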

Note: To do HTML text escaping and unescaping, you can use the html package. Quoting its doc:

Package html provides functions for escaping and unescaping HTML text.

But the html package (specifically html.UnescapeString()) does not decode unicode sequences of the form \uxxxx, only &#decimal; or &#xHH;.

Example:

fmt.Println(html.UnescapeString(`\u003chtml\u003e`)) // wrong
fmt.Println(html.UnescapeString(`&#60;html&#62;`)) // good
fmt.Println(html.UnescapeString(`&#x3c;html&#x3e;`)) // good

Output (try it on the Go Playground):

\u003chtml\u003e
<html>
<html>
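
Python's standard html module behaves the same way, which may help when checking the distinction outside Go. A minimal sketch:

```python
import html

# \uXXXX sequences are not HTML escapes, so they pass through untouched.
untouched = html.unescape(r"\u003chtml\u003e")

# Decimal and hexadecimal character references are decoded.
from_decimal = html.unescape("&#60;html&#62;")
from_hex = html.unescape("&#x3c;html&#x3e;")

print(untouched)
print(from_decimal)
print(from_hex)
```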

Note #2:

You should also note that if you write code like this:

s := "\u003chtml\u003e"

This quoted string will be unquoted by the compiler itself as it is an interpreted string literal, so you can't really test that. To put the escaped form in the source, either use backticks to write a raw string literal, or double the backslashes in an interpreted string literal:

s := "\u003chtml\u003e" // Interpreted string literal (unquoted by the compiler!)
fmt.Println(s)

s2 := `\u003chtml\u003e` // Raw string literal (no unquoting will take place)
fmt.Println(s2)

s3 := "\\u003chtml\\u003e" // Interpreted string literal with doubled backslashes
// (the compiler turns each \\ into a single \)
fmt.Println(s3)

Output:

<html>
\u003chtml\u003e
\u003chtml\u003e

Convert backslash-escaped characters to literals, within a string

Try using Regex.Unescape

using System.Text.RegularExpressions;
...

string result=Regex.Unescape(@"this\x20is a\ntest");

This results in:

this is a
test

https://dotnetfiddle.net/y2f5GE

It might not always work as expected, since Regex.Unescape is meant for regular-expression escapes rather than general string escapes. Please read the docs for details.

How can I convert these escape sequences back to their original characters?

The JSON is still valid, but I usually recommend Newtonsoft.Json, since it has far fewer problems. You can also use a string Replace:

var jsonNode = JsonNode.Parse(testString);
var jsonString = jsonNode.ToJsonString().Replace("\\u003C","<").Replace("\\u003E",">");

Result:

{"testString":"<...>"}
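
To see why the escaped JSON is still valid, note that any conforming parser decodes \u003C and \u003E while reading the string, so the escapes only exist in the serialized text. A quick check in Python (the sample document mirrors the result above):

```python
import json

# Serialized form with escaped angle brackets, as System.Text.Json emits it.
doc = r'{"testString":"\u003C...\u003E"}'

# Parsing decodes the escapes; no manual Replace is needed on the value.
value = json.loads(doc)["testString"]

print(value)
```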

Another option is to use UnsafeRelaxedJsonEscaping, but it is not safe in some cases, for example when the output is embedded directly in an HTML page:

var options = new JsonSerializerOptions
{
    Encoder = JavaScriptEncoder.UnsafeRelaxedJsonEscaping,
    WriteIndented = true
};
var jsonString = System.Text.Json.JsonSerializer.Serialize(jsonNode, options);

Convert data with escaped unicode characters to string

Assuming your data has the same content as something like this:

let data = #"Pla\u010daj Izbri\u0161i"#.data(using: .utf8)!
print(data as NSData) //->{length = 24, bytes = 0x506c615c7530313064616a20497a6272695c753031363169}

You can decode it in this way:

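public func decode(data: Data) throws -> String {
    guard let text = String(data: data, encoding: .utf8) else {
        // SomeError is a placeholder for your own Error type
        throw SomeError()
    }

    // Applying "Any-Hex/Java" in reverse turns \uXXXX sequences into characters
    let transform = StringTransform(rawValue: "Any-Hex/Java")
    return text.applyingTransform(transform, reverse: true) ?? text
}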
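
For comparison, the same bytes can be decoded in Python with the unicode_escape codec. This sketch assumes the payload is pure ASCII apart from the \uXXXX sequences, since the codec reads any other bytes as Latin-1.

```python
# The same 24 bytes as in the Swift example: \u sequences are literal text.
data = rb"Pla\u010daj Izbri\u0161i"

# unicode_escape turns each \uXXXX sequence into the character it names.
text = data.decode("unicode_escape")

print(len(data))
print(text)
```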

But if you really get this sort of data from a web API, you had better ask the API engineer to use a standard encoding scheme.


