Python breaks parsing json with characters \"
You almost certainly did not escape the backslashes properly when you defined the string. If the string is defined correctly, the JSON parses just fine:
>>> import json
>>> json_str = r'''
... {
...     "publisher": "\"O'Reilly Media, Inc.\""
... }
... ''' # raw string to prevent the \" from being interpreted by Python
>>> json.loads(json_str)
{u'publisher': u'"O\'Reilly Media, Inc."'}
Note that I used a raw string literal to define the string in Python; if I had not, the \" would be interpreted by Python and a regular " would be inserted. Otherwise you'd have to double the backslash:
>>> print '\"'
"
>>> print '\\"'
\"
>>> print r'\"'
\"
Re-encoding the parsed Python structure back to JSON shows the backslashes reappearing, with the repr() output for the string using the same doubled backslash:
>>> json.dumps(json.loads(json_str))
'{"publisher": "\\"O\'Reilly Media, Inc.\\""}'
>>> print json.dumps(json.loads(json_str))
{"publisher": "\"O'Reilly Media, Inc.\""}
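The sessions above use Python 2 syntax. As a minimal sketch, assuming Python 3 (where print is a function and strings carry no u prefix), the same round trip can be checked like this; the variable names are illustrative:

```python
import json

# Raw string: Python leaves the \" sequences alone, so the JSON parser sees them.
json_str = r'''
{
    "publisher": "\"O'Reilly Media, Inc.\""
}
'''

parsed = json.loads(json_str)
print(parsed["publisher"])  # the decoded value contains plain double quotes

# Encoding again restores the backslash escapes in the JSON text.
encoded = json.dumps(parsed)
print(encoded)
```

Printing the encoded string shows single backslashes; only its repr() shows them doubled.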
If you do not escape the \ itself, you'll end up with unescaped quotes:
>>> json_str_improper = '''
... {
...     "publisher": "\"O'Reilly Media, Inc.\""
... }
... '''
>>> print json_str_improper
{
    "publisher": ""O'Reilly Media, Inc.""
}
>>> json.loads(json_str_improper)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/json/__init__.py", line 338, in loads
    return _default_decoder.decode(s)
  File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/json/decoder.py", line 366, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/json/decoder.py", line 382, in raw_decode
    obj, end = self.scan_once(s, idx)
ValueError: Expecting , delimiter: line 3 column 20 (char 22)
Note that the \" sequences are now printed as "; the backslash is gone!
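To see exactly where the parser gives up, the exception can be inspected. This is a sketch assuming Python 3.5 or later, where json.loads raises json.JSONDecodeError (a ValueError subclass) carrying lineno, colno and pos attributes; the location and offending names are made up for illustration:

```python
import json

# Regular (non-raw) literal: Python turns each \" into a bare " before
# the JSON parser ever sees the text.
json_str_improper = '''
{
    "publisher": "\"O'Reilly Media, Inc.\""
}
'''

try:
    json.loads(json_str_improper)
except json.JSONDecodeError as err:
    # Copy the details out; err is unbound once the except block ends.
    location = (err.lineno, err.colno, err.pos)
    offending = json_str_improper[err.pos]
    print(err.msg, location, repr(offending))
```

The reported position points at the first character after the accidental empty string "", which is where the parser expected a comma or a closing brace.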
Python repr() function doesn't return \"
You can't declare your from_page variable like that, because an escaped quotation mark in a regular string is just a quotation mark:
>>> var = '\"'
>>> len(var)
1
There is only one character in the var variable: the quotation mark. There is no backslash to convert to a double backslash. So when you declare your from_page, its content is exactly as displayed (note the quotation marks):
'{"data": "There is start: "Hello there!". There is end."}'
  ^----^  ^----------------^            ^---------------^
You need to declare your string as 'raw', adding an 'r' character in front of it:
>>> from_page = r'{"data": "There is start: \"Hello there!\". There is end."}'
>>> from_page
'{"data": "There is start: \\"Hello there!\\". There is end."}'
>>> json.loads(from_page)
{'data': 'There is start: "Hello there!". There is end.'}
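If the JSON text is produced inside Python anyway, one way to sidestep the escaping question is to let json.dumps generate it rather than typing the literal by hand. A small sketch, assuming Python 3 and reusing the from_page name for illustration; payload, raw_literal, same and restored are hypothetical names:

```python
import json

payload = {"data": 'There is start: "Hello there!". There is end.'}

# json.dumps inserts the backslashes that JSON requires around inner quotes.
from_page = json.dumps(payload)
print(from_page)

# The generated text matches the raw-string literal character for character.
raw_literal = r'{"data": "There is start: \"Hello there!\". There is end."}'
same = from_page == raw_literal

# Decoding gives back the original value, inner quotes included.
restored = json.loads(from_page)
```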
json module bug in Python 3.4.1?
The \' sequence is invalid JSON. Single quotes do not need to be escaped, so this is an invalid string escape.
You could try to repair it after the fact:
import re
data = re.sub(r"(?<!\\)\\'", "'", data)
before loading it with JSON. This replaces \' with a plain ', provided the backslash wasn't already escaped by a preceding \.
Since single quotes can only appear in string values, this should be safe.
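A quick check of that substitution, as a sketch assuming Python 3; the sample inputs and the repair_single_quotes helper name are made up for illustration:

```python
import json
import re


def repair_single_quotes(data):
    # Drop the backslash from \' unless that backslash is itself
    # preceded by another backslash.
    return re.sub(r"(?<!\\)\\'", "'", data)


# Hypothetical input containing the invalid \' escape.
broken = r'''{"publisher": "O\'Reilly"}'''
try:
    json.loads(broken)
    rejected = False
except json.JSONDecodeError:
    rejected = True

fixed = repair_single_quotes(broken)
print(fixed)
print(json.loads(fixed))

# An escaped backslash followed by a single quote is left untouched.
escaped = r'''{"path": "C:\\'"}'''
unchanged = repair_single_quotes(escaped) == escaped
```

The lookbehind only inspects one preceding character, so input with longer runs of backslashes before a quote would need a closer look before relying on this.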