json.loads() Decodes Only With a Raw String Literal

Python breaks parsing JSON with \" characters

You almost certainly did not escape the backslashes properly when you defined the string in Python. If you define the string properly, the JSON parses just fine:

>>> import json
>>> json_str = r'''
... {
... "publisher": "\"O'Reilly Media, Inc.\""
... }
... ''' # raw string to prevent the \" from being interpreted by Python
>>> json.loads(json_str)
{u'publisher': u'"O\'Reilly Media, Inc."'}

Note that I used a raw string literal to define the string in Python; if I did not, the \" would be interpreted by Python and a regular " would be inserted. You'd have to double the backslash otherwise:

>>> print '\"'
"
>>> print '\\"'
\"
>>> print r'\"'
\"

Reencoding the parsed Python structure back to JSON shows the backslashes re-appearing, with the repr() output for the string using the same double backslash:

>>> json.dumps(json.loads(json_str))
'{"publisher": "\\"O\'Reilly Media, Inc.\\""}'
>>> print json.dumps(json.loads(json_str))
{"publisher": "\"O'Reilly Media, Inc.\""}
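The round trip above also suggests a way to avoid hand-escaping altogether: build the value as a normal Python object and let json.dumps() add the backslashes. The following is a minimal Python 3 sketch of that idea, not part of the original session; the names data and encoded are just illustrative.

```python
import json

# A plain Python dict; the value contains real double quotes, no backslashes.
data = {"publisher": "\"O'Reilly Media, Inc.\""}

# json.dumps() inserts the backslashes that JSON requires.
encoded = json.dumps(data)
print(encoded)  # {"publisher": "\"O'Reilly Media, Inc.\""}

# Decoding the generated text gives back the original structure.
assert json.loads(encoded) == data
```

Because the escaping is generated rather than typed, it does not matter whether the surrounding Python literal is raw or not.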

If you do not escape the backslash, you'll end up with unescaped quotes:

>>> json_str_improper = '''
... {
... "publisher": "\"O'Reilly Media, Inc.\""
... }
... '''
>>> print json_str_improper

{
"publisher": ""O'Reilly Media, Inc.""
}

>>> json.loads(json_str_improper)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/json/__init__.py", line 338, in loads
return _default_decoder.decode(s)
File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/json/decoder.py", line 366, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/json/decoder.py", line 382, in raw_decode
obj, end = self.scan_once(s, idx)
ValueError: Expecting , delimiter: line 3 column 20 (char 22)

Note that the \" sequences are now printed as plain "; the backslash is gone!
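On Python 3 the same mistake can be caught in code rather than read off a traceback. The sketch below assumes Python 3.5 or later, where the decoder raises json.JSONDecodeError (a subclass of ValueError); the single-line string is a shortened stand-in for the example above.

```python
import json

# Regular (non-raw) literal: Python turns each \" into a bare " before
# the json module ever sees the text.
json_str_improper = '{"publisher": "\"O\'Reilly Media, Inc.\""}'
print(json_str_improper)  # {"publisher": ""O'Reilly Media, Inc.""}

try:
    json.loads(json_str_improper)
    error = None
except json.JSONDecodeError as exc:  # subclass of ValueError
    error = exc

# The exception carries the position where decoding stopped.
if error is not None:
    print("invalid JSON at char", error.pos, "-", error.msg)
```

Printing the string before decoding it, as done here, is usually the quickest way to see whether the backslashes survived.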

Python repr() function doesn't return \"

You just can't declare your from_page variable like that, because an escaped quotation mark in a regular string is just a quotation mark:

>>> var = '\"'
>>> len(var)
1

There is only one character in the var variable: the quotation mark. There is no backslash to convert to a double backslash. So when you declare your from_page, its content is exactly as displayed below (note the quotation marks):

'{"data": "There is start: "Hello there!". There is end."}'
  ^----^  ^----------------^            ^---------------^

You need to declare your string as 'raw', adding an 'r' character in front of it:

>>> from_page = r'{"data": "There is start: \"Hello there!\". There is end."}'
>>> from_page
'{"data": "There is start: \\"Hello there!\\". There is end."}'
>>> json.loads(from_page)
{'data': 'There is start: "Hello there!". There is end.'}
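The raw prefix only matters for text typed into Python source. If the JSON is read at runtime, for example from a file or an HTTP response, no literal is involved and the backslashes arrive intact. Here is a small sketch of that, assuming the payload sits in a file; the file name from_page.json and the temporary directory are made up for the example.

```python
import json
import os
import tempfile

# The text as it would exist on disk. The raw literal is needed here only
# because this example has to embed the text in Python source.
payload = r'{"data": "There is start: \"Hello there!\". There is end."}'

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "from_page.json")  # hypothetical file name
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(payload)
    with open(path, encoding="utf-8") as fh:
        from_page = fh.read()  # read at runtime: no escape processing

print(from_page.count("\\"))  # 2
result = json.loads(from_page)
print(result["data"])  # There is start: "Hello there!". There is end.
```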

json module bug in Python 3.4.1?

The \' sequence is invalid JSON. Single quotes do not need to be escaped, making this an invalid string escape.

You could try to repair it after the fact:

import re

data = re.sub(r"(?<!\\)\\'", "'", data)

before loading it with JSON. This replaces \' with plain ', provided the backslash wasn't already escaped by a preceding \.

Since single quotes can only appear in string values, this should be safe.
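To see the repair in action, here is a short sketch using a made-up input; the title key and its value are hypothetical, and the substitution is the same re.sub() call as above.

```python
import json
import re

# Hypothetical input: valid JSON except for the \' escape.
data = r'{"title": "Don\'t panic"}'

# The json module rejects the unknown escape sequence.
try:
    json.loads(data)
    rejected = False
except ValueError:
    rejected = True

# Replace \' with ' unless the backslash itself is preceded by a backslash.
repaired = re.sub(r"(?<!\\)\\'", "'", data)
parsed = json.loads(repaired)

print(rejected)         # True
print(parsed["title"])  # Don't panic
```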


