How to convert escaped characters?
>>> escaped_str = 'One \\\'example\\\''
>>> print escaped_str.encode('string_escape')
One \\\'example\\\'
>>> print escaped_str.decode('string_escape')
One 'example'
Several similar codecs are available, such as rot13 and hex.
The above is Python 2.x. Since you said (below, in a comment) that you're using Python 3.x: decoding a Unicode string object is roundabout, but still possible. The "string_escape" codec does not exist in Python 3; use "unicode_escape" instead:
Python 3.3a0 (default:b6aafb20e5f5, Jul 29 2011, 05:34:11)
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> escaped_str = "One \\\'example\\\'"
>>> import codecs
>>> print(codecs.getdecoder("unicode_escape")(escaped_str)[0])
One 'example'
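For Python 3, a reusable helper might look like the sketch below. The name unescape is hypothetical, and the Latin-1 round trip is just one common workaround (an assumption about what you need, not the only option) so that non-ASCII characters already present in the string survive the "unicode_escape" decoding.

```python
def unescape(text):
    """Interpret backslash escapes in a str (Python 3)."""
    # "unicode_escape" decodes bytes as Latin-1, so encode to Latin-1 first.
    # backslashreplace turns anything outside Latin-1 into a \uXXXX escape,
    # which the decoder then converts back to the original character.
    return text.encode("latin-1", "backslashreplace").decode("unicode_escape")


escaped_str = "One \\'example\\'"
print(unescape(escaped_str))
```

Passing the str directly to the decoder, as in the session above, is fine for pure ASCII input; the extra encode step only matters when the text also contains real non-ASCII characters.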
Convert escaped characters with python
(In this answer, I'm assuming you use Python 2.)
First, let me explain why your snippet returns something different than you expect:
import json
r1 = json.dumps({"detalle":"el Expediente N\u00b0\u00a030 de la Resoluci\u00f3n 11..."}, ensure_ascii=False).encode('utf8')
print(r1)
r2 = json.dumps({"detalle":u"el Expediente N\u00b0\u00a030 de la Resoluci\u00f3n 11..."}, ensure_ascii=False).encode('utf8')
print(r2)
This outputs:
{"detalle": "el Expediente N\\u00b0\\u00a030 de la Resoluci\\u00f3n 11..."}
{"detalle": "el Expediente N° 30 de la Resolución 11..."}
The difference is that in the first case the input is a plain (byte) string in which \u00b0 is just a backslash followed by ASCII characters, while in the second case it is a unicode string containing the actual Unicode characters. The second case is what you want.
Based on this, here is what I understand from your problem:
Normally, when you read a JSON file with the json module, the strings (which are escaped in the JSON file) are unescaped by the parser. If you still see escaped characters, the strings were probably double-escaped (accidentally?) in the JSON file. In that case, try an extra unescape with s.decode('unicode-escape'):
data["detalle"] = data["detalle"].decode('unicode-escape')
Once you have proper unicode strings loaded in Python, converting them to bytes with s.encode('utf8') and writing the result to a file is correct.
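In Python 3, str has no decode method, so the extra unescape step needs a small detour. The following is a minimal sketch under stated assumptions: the raw document is a made-up stand-in for a double-escaped file, and unescape is a hypothetical helper name.

```python
import json

# Hypothetical input: a JSON document whose \uXXXX escapes were escaped twice.
raw = r'{"detalle": "el Expediente N\\u00b0\\u00a030 de la Resoluci\\u00f3n 11..."}'

data = json.loads(raw)
# After parsing, the value still contains literal backslash-u sequences.
still_escaped = data["detalle"]


def unescape(text):
    # Latin-1 round trip so that "unicode_escape" can be applied to a str.
    return text.encode("latin-1", "backslashreplace").decode("unicode_escape")


data["detalle"] = unescape(still_escaped)

# ensure_ascii=False keeps the real characters; encode to UTF-8 for writing.
encoded = json.dumps(data, ensure_ascii=False).encode("utf-8")
print(encoded.decode("utf-8"))
```

If the file was not double-escaped, json.loads alone already returns the real characters and the unescape call should be skipped.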
Python - String formatting convert escaped characters to literals
You can use either single or double quotes to delimit a string, so if your string needs to contain one kind, use the other. You can also use an r (raw) string, in which backslashes are kept as literal characters instead of starting escape sequences.
r'"{\"MachineType\":0,\"BrokenMachines\":6}"'
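A short sketch of how the three spellings differ. The variable names are illustrative only, and the outer double quotes from the line above are left out so the value can be parsed as JSON.

```python
import json

# Single quotes outside, so the double quotes inside need no escaping.
plain = '{"MachineType":0,"BrokenMachines":6}'

# Without the r prefix, \" collapses to a plain double quote.
cooked = '{\"MachineType\":0,\"BrokenMachines\":6}'

# Raw string: the backslashes stay in the value as literal characters.
raw = r'{\"MachineType\":0,\"BrokenMachines\":6}'

print(cooked == plain)            # the two spellings are the same string
print(raw.count("\\"))            # number of backslashes kept by the raw string
print(json.loads(plain)["BrokenMachines"])
```

Only the raw form keeps the backslashes, so it is the one to use when the backslashes themselves are part of the data.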
How to convert a string containing escape characters to a string
If you are looking to replace all escaped character codes, not only the code for @, you can use this snippet of code to do the conversion:
public static string UnescapeCodes(string src) {
var rx = new Regex("\\\\([0-9A-Fa-f]+)");
var res = new StringBuilder();
var pos = 0;
foreach (Match m in rx.Matches(src)) {
res.Append(src.Substring(pos, m.Index - pos));
pos = m.Index + m.Length;
res.Append((char)Convert.ToInt32(m.Groups[1].ToString(), 16));
}
res.Append(src.Substring(pos));
return res.ToString();
}
The code uses a regular expression to find every backslash followed by a run of hex digits, converts the digits to an int, and casts the resulting value to a char.
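The same idea can be sketched in Python. This is an illustrative translation, not part of the original answer; the names HEX_CODE and unescape_codes and the sample input are assumptions.

```python
import re

# A backslash followed by one or more hex digits, as in the C# pattern.
HEX_CODE = re.compile(r"\\([0-9A-Fa-f]+)")


def unescape_codes(src):
    # Replace each match with the character whose code point is the hex value.
    return HEX_CODE.sub(lambda m: chr(int(m.group(1), 16)), src)


print(unescape_codes(r"name\40host"))
```

Because the pattern is greedy, a hex-looking letter directly after a code (for example the "e" in \40example) is read as part of the code. If the codes have a fixed width, a bounded quantifier such as {2} is a safer choice.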
How to convert escape characters in HTML tags?
You can use strconv.Unquote() to do the conversion.
One thing you should be aware of is that strconv.Unquote() can only unquote strings that are in quotes (e.g. start and end with a quote char " or a back quote char `), so we have to add the quotes manually.
Example:
// Important to use backtick ` (raw string literal)
// else the compiler will unquote it (interpreted string literal)!
s := `\u003chtml\u003e`
fmt.Println(s)
s2, err := strconv.Unquote(`"` + s + `"`)
if err != nil {
panic(err)
}
fmt.Println(s2)
Output (try it on the Go Playground):
\u003chtml\u003e
<html>
Note: To do HTML text escaping and unescaping, you can use the html package. Quoting its doc:
Package html provides functions for escaping and unescaping HTML text.
But the html package (specifically html.UnescapeString()) does not decode unicode sequences of the form \uxxxx, only &#decimal; or &#xHH;.
Example:
fmt.Println(html.UnescapeString(`\u003chtml\u003e`)) // wrong
fmt.Println(html.UnescapeString(`&#60;html&#62;`)) // good
fmt.Println(html.UnescapeString(`&#x3c;html&#x3e;`)) // good
Output (try it on the Go Playground):
\u003chtml\u003e
<html>
<html>
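For readers working in Python, the same distinction can be sketched with the standard library. This is an illustrative parallel, not part of the Go answer; the JSON trick assumes the text contains no unescaped double quotes.

```python
import html
import json

# Numeric character references are decoded by html.unescape ...
print(html.unescape("&#60;html&#62;"))      # decimal form
print(html.unescape("&#x3c;html&#x3e;"))    # hexadecimal form

# ... but a \uXXXX sequence is not HTML escaping and is left untouched.
s = r"\u003chtml\u003e"
print(html.unescape(s))

# Wrapping the text in double quotes and parsing it as a JSON string
# decodes the \uXXXX sequences, much like the quote-then-Unquote approach.
print(json.loads('"' + s + '"'))
```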
Note #2:
You should also note that if you write code like this:
s := "\u003chtml\u003e"
This quoted string will be unquoted by the compiler itself because it is an interpreted string literal, so you can't really test with it. To keep the escape sequences in the source, use backticks to write a raw string literal, or double the backslashes in an interpreted string literal:
s := "\u003chtml\u003e" // Interpreted string literal (unquoted by the compiler!)
fmt.Println(s)
s2 := `\u003chtml\u003e` // Raw string literal (no unquoting will take place)
fmt.Println(s2)
s3 := "\\u003chtml\\u003e" // Interpreted string literal with doubled backslashes
// (the compiler reduces each \\ to a single \)
fmt.Println(s3)
Output:
<html>
\u003chtml\u003e
\u003chtml\u003e
Convert backslash-escaped characters to literals, within a string
Try using Regex.Unescape:
using System.Text.RegularExpressions;
...
string result=Regex.Unescape(@"this\x20is a\ntest");
This results in:
this is a
test
https://dotnetfiddle.net/y2f5GE
It might not always work as expected; please read the docs for details.
How can I convert these escape sequences back to their original characters?
The JSON is still valid. I usually recommend Newtonsoft.Json since it has far fewer problems, but you can also use a string Replace:
var jsonNode = JsonNode.Parse(testString);
var jsonString = jsonNode.ToJsonString().Replace("\\u003C","<").Replace("\\u003E",">");
Result:
{"testString":"<...>"}
Another option is to use UnsafeRelaxedJsonEscaping, but it is not safe in some cases:
var options = new JsonSerializerOptions
{
Encoder = JavaScriptEncoder.UnsafeRelaxedJsonEscaping,
WriteIndented = true
};
var jsonString = System.Text.Json.JsonSerializer.Serialize(jsonNode, options);
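The \u003C form is an ordinary JSON escape, so a JSON parser restores the original character whenever the value is read. Here is a small sketch of that in Python, for illustration only; the payload is a made-up stand-in and the "..." is kept as elided in the result above.

```python
import json

# Hypothetical payload; the "..." stands in for the elided content.
test_string = r'{"testString":"\u003C...\u003E"}'

parsed = json.loads(test_string)
print(parsed["testString"])

# ensure_ascii=False avoids \uXXXX escapes for non-ASCII characters on output.
print(json.dumps(parsed, ensure_ascii=False, separators=(",", ":")))
```

Whether the serialized text shows < or \u003C, consumers that parse the JSON see the same value, which is why the escaped form is still valid.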
Convert data with escaped unicode characters to string
Assuming your data has the same content as something like this:
let data = #"Pla\u010daj Izbri\u0161i"#.data(using: .utf8)!
print(data as NSData) //->{length = 24, bytes = 0x506c615c7530313064616a20497a6272695c753031363169}
You can decode it in this way:
public func decode(data: Data) throws -> String {
guard let text = String(data: data, encoding: .utf8) else {
throw SomeError()
}
let transform = StringTransform(rawValue: "Any-Hex/Java")
return text.applyingTransform(transform, reverse: true) ?? text
}
But if you really get this sort of data from the web API, you had better ask the API engineer to use a normal encoding scheme.
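For comparison, the same decoding can be sketched in Python with a regular expression that only touches \uXXXX sequences and leaves every other backslash alone. The names JAVA_ESCAPE and decode are assumptions made for this example.

```python
import re

# The same 24 bytes as in the Swift example above.
data = rb"Pla\u010daj Izbri\u0161i"

# Exactly four hex digits after \u, as in Java-style escapes.
JAVA_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")


def decode(data):
    text = data.decode("utf-8")
    # Replace each \uXXXX with the character for that code point.
    return JAVA_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


print(len(data))
print(decode(data))
```

This sketch does not combine surrogate pairs, so characters outside the Basic Multilingual Plane would need extra handling.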