How to Automatically Fix an Invalid JSON String

How do I automatically fix an invalid JSON string?

The answer by @Michael gave me an idea... not a very pretty idea, but it seems to work, at least on your example: try to parse the JSON string, and if it fails, read the position of the failing character from the exception string (see note 1), then escape the unescaped " just before that position together with its closing counterpart.

import json
import re

# s holds the broken JSON string
while True:
    try:
        result = json.loads(s)  # try to parse...
        break                   # parsing worked -> exit loop
    except ValueError as e:     # json raises ValueError (JSONDecodeError in Python 3)
        # "Expecting , delimiter: line 34 column 54 (char 1158)"
        # position of unexpected character after '"'
        unexp = int(re.findall(r'\(char (\d+)\)', str(e))[0])
        # position of unescaped '"' before that
        unesc = s.rfind(r'"', 0, unexp)
        s = s[:unesc] + r'\"' + s[unesc+1:]
        # position of corresponding closing '"' (+2 for inserted '\')
        closg = s.find(r'"', unesc + 2)
        s = s[:closg] + r'\"' + s[closg+1:]
print(result)

You may want to add some additional checks to prevent this from ending in an infinite loop (e.g., allowing at most as many repetitions as there are characters in the string). Also, this will still not work if an incorrect " is actually followed by a comma, as pointed out by @gnibbler.

Update: This seems to work pretty well now (though it is still not perfect), even if the unescaped " is followed by a comma or a closing bracket: in that case the parser will likely complain about a syntax error a little further on (expected property name, etc.), and the code traces back to the last " before that point. It also automatically escapes the corresponding closing " (assuming there is one).


1) The exception's str is "Expecting , delimiter: line XXX column YYY (char ZZZ)", where ZZZ is the position in the string where the error occurred. Note, though, that this message may depend on the version of Python, the json module, the OS, or the locale, and thus this solution may have to be adapted accordingly.
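The loop above can be wrapped in a function with the iteration cap suggested earlier. This is a minimal sketch for Python 3.5+, assuming the same repair heuristic; the function name fix_unescaped_quotes is hypothetical. It reads the failure position from the pos attribute of json.JSONDecodeError instead of parsing the message text, which sidesteps the version and locale concern in note 1.

```python
import json


def fix_unescaped_quotes(s):
    """Hypothetical helper: escape stray double quotes inside JSON string values.

    Same heuristic as the loop above, but bounded to len(s) + 1 attempts so a
    string it cannot repair raises an error instead of looping forever.
    """
    for _ in range(len(s) + 1):
        try:
            return json.loads(s)
        except json.JSONDecodeError as e:
            # e.pos is the index where parsing failed (Python 3.5+),
            # so there is no need to parse "(char N)" out of the message
            unesc = s.rfind('"', 0, e.pos)
            if unesc == -1:
                raise  # no quote to blame: re-raise the original error
            s = s[:unesc] + '\\"' + s[unesc + 1:]
            # matching closing quote; +2 skips the inserted backslash and quote
            closg = s.find('"', unesc + 2)
            if closg == -1:
                raise
            s = s[:closg] + '\\"' + s[closg + 1:]
    raise ValueError("gave up: too many repair attempts")


print(fix_unescaped_quotes('{"title": "He said "hi" to me", "id": 1}'))
```

Because each pass escapes a pair of quotes, the cap of len(s) + 1 attempts is generous; it only exists so that hopeless input fails loudly.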

Most efficient way to fix an invalid JSON

You need to run this through JavaScript. Fire up a JavaScript engine in .NET, give it the string as input, and use JavaScript's native JSON.stringify to convert it:

var obj = {
    "user":'180111',
    "title":'I\'m sure "E pluribus unum" means \'Out of Many, One.\' \n\nhttp://en.wikipedia.org/wiki/E_pluribus_unum.\n\n',
    "date":'2007/01/10 19:48:38',
    "id":"3322121",
    "previd":"112211",
    "body":"\'You\' can \"read\" more here [url=http:\/\/en.wikipedia.org\/?search=E_pluribus_unum]E pluribus unum[\/url]'s. Cheers \\*/ :\/",
    "from":"112221",
    "username":"mikethunder",
    "creationdate":"2007\/01\/10 14:04:49"
};
console.log(JSON.stringify(obj));
document.write(JSON.stringify(obj));

Python - Invalid JSON format - how to parse

It goes without saying that the better solution would be to fix the broken data at the source. But if you can't do that, you could try to fix the problem with a simple regex. Simple, as in "will fail if you throw anything more complicated at it", but likely sufficient as a quick and dirty solution:

import re
import json

with open("almost.json") as infile:
    jstring = infile.read()
data = json.loads(re.sub(r"(\w+):", r'"\1":', jstring))
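One common way the one-liner above fails is a string value that itself contains "word:", such as a URL. The sketch below uses a hypothetical input to show that failure, and a slightly tighter pattern (an assumption of this sketch, not part of the original answer) that only quotes a word when it follows "{" or ",", i.e. where an object key can start.

```python
import json
import re

# Hypothetical input: unquoted keys, plus a value that contains "http:"
jstring = '{name: "site", url: "http://example.com"}'

# The original pattern quotes every "word:" it finds, including "http:" in the value
naive = re.sub(r"(\w+):", r'"\1":', jstring)
print(naive)  # {"name": "site", "url": ""http"://example.com"}  (not valid JSON)

# Only quote a word that directly follows "{" or "," (plus optional whitespace)
safer = re.sub(r'([{,]\s*)(\w+)\s*:', r'\1"\2":', jstring)
data = json.loads(safer)
print(data)  # {'name': 'site', 'url': 'http://example.com'}
```

This is still quick and dirty: a string value containing something like ", word:" would be mangled in the same way.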

Correct an invalid Json string and convert it to C# Class

Your JSON is not valid: it has an extra pair of {} around it. Try this:

var json=...your json

json=json.Substring(1,json.Length-2);

var jsonDeserialized = JsonConvert.DeserializeObject<Data[]>(json);

and the class:

public class Data
{
    public List<object> tmcp_post_fields { get; set; }
    public int product_id { get; set; }
    public bool per_product_pricing { get; set; }
    public string cpf_product_price { get; set; }
    public bool variation_id { get; set; }
    public string form_prefix { get; set; }
    public string tc_added_in_currency { get; set; }
    public string tc_default_currency { get; set; }
}

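Another option is to insert a property name right after the opening brace with String.Insert, so the outer braces become a proper object wrapping the array. This option is even better, since nothing is cut off and the whole string is parsed as JSON (quoting the key keeps the result strictly valid JSON):

json = json.Insert(1, "\"result\":");
var jsonDeserialized = JsonConvert.DeserializeObject<Root>(json);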

and the class:

public class Root
{
    public Data[] result { get; set; }
}

Output:

{
    "result": [
        {
            "tmcp_post_fields": [],
            "product_id": 703,
            "per_product_pricing": true,
            "cpf_product_price": "45",
            "variation_id": false,
            "form_prefix": "",
            "tc_added_in_currency": "EUR",
            "tc_default_currency": "EUR"
        }
    ]
}
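For comparison, both fixes can be sketched in Python with only the standard json module. The input string below is hypothetical (a shortened array in the same shape as the question), and both options assume the stray braces are exactly the first and last characters, with no surrounding whitespace.

```python
import json

# Hypothetical input: a JSON array wrapped in an extra, stray pair of braces
raw = '{[{"product_id": 703, "per_product_pricing": true, "tc_default_currency": "EUR"}]}'

# Option 1: drop the first and last character, like Substring(1, Length - 2)
items = json.loads(raw[1:-1])

# Option 2: insert a key after the opening brace, like Insert(1, ...),
# so the outer braces become an object with a "result" property
root = json.loads(raw[:1] + '"result":' + raw[1:])

print(items[0]["product_id"])                    # 703
print(root["result"][0]["per_product_pricing"])  # True
```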

Convert invalid json into valid json

All keys (preOpen, preClose, ...) have to be strings, so they need double quotes around them.

{
    "preOpen": "900",
    "preClose": "908",
    ...
}

=== UPDATE ===

If you have an invalid JSON string, you can convert it with the following script:

$sInvalidJson = '{
    preOpen: "900",
    preClose: "908"
}';
$sValidJson = preg_replace("/(\n[\t ]*)([^\t ]+):/", "$1\"$2\":", $sInvalidJson);


(This script only works with the invalid JSON described above; for other input the pattern has to be adapted.)

=== UPDATE ===

$sInvalidJson = '{preOpen:"900",preClose:"908",mktOpen:"915",mktClose:"1530",corrOpen:"1540",corrClose:"1600",mktStatusCode:"3",status:"MARKET OPEN",time:"Jan 11, 2012 14:25:15",data:[{name:"S&P CNX NIFTY Pre Open",lastPrice:"4,863.15",change:"13.60",pChange:"0.28",imgFileName:"S&P_CNX_NIFTY_Pre_Open_open.png"},{name:"S&P CNX NIFTY",lastPrice:"4,847.85",change:"-1.70",pChange:"-0.04",imgFileName:"S&P_CNX_NIFTY_open.png"},{name:"CNX NIFTY JUNIOR",lastPrice:"8,917.00",change:"68.85",pChange:"0.78",imgFileName:"CNX_NIFTY_JUNIOR_open.png"},{name:"BANK NIFTY",lastPrice:"8,768.75",change:"33.70",pChange:"0.39",imgFileName:"BANK_NIFTY_open.png"},{name:"INDIA VIX",lastPrice:"24.61",change:"0.61",pChange:"2.54",imgFileName:"INDIA_VIX_open.png"},{name:"CNX 100",lastPrice:"4,707.85",change:"3.65",pChange:"0.08",imgFileName:"CNX_100_open.png"},{name:"S&P CNX DEFTY",lastPrice:"3,253.50",change:"30.20",pChange:"0.94",imgFileName:"S&P_CNX_DEFTY_open.png"},{name:"S&P CNX 500",lastPrice:"3,795.40",change:"10.05",pChange:"0.27",imgFileName:"S&P_CNX_500_open.png"},{name:"CNX MIDCAP",lastPrice:"6,524.90",change:"57.35",pChange:"0.89",imgFileName:"CNX_MIDCAP_open.png"},{name:"NIFTY MIDCAP 50",lastPrice:"1,926.55",change:"10.65",pChange:"0.56",imgFileName:"NIFTY_MIDCAP_50_open.png"},{name:"CNX INFRA",lastPrice:"2,262.05",change:"-3.05",pChange:"-0.13",imgFileName:"CNX_INFRA_open.png"},{name:"CNX REALTY",lastPrice:"207.70",change:"7.95",pChange:"3.98",imgFileName:"CNX_REALTY_open.png"},{name:"CNX ENERGY",lastPrice:"7,301.05",change:"37.60",pChange:"0.52",imgFileName:"CNX_ENERGY_open.png"},{name:"CNX FMCG",lastPrice:"10,235.35",change:"-62.65",pChange:"-0.61",imgFileName:"CNX_FMCG_open.png"},{name:"CNX MNC",lastPrice:"4,631.55",change:"1.60",pChange:"0.03",imgFileName:"CNX_MNC_open.png"},{name:"CNX PHARMA",lastPrice:"4,749.95",change:"2.65",pChange:"0.06",imgFileName:"CNX_PHARMA_open.png"},{name:"CNX PSE",lastPrice:"2,744.85",change:"5.55",pChange:"0.20",imgFileName:"CNX_PSE_open.png"},{name:"CNX PSU BANK",lastPrice:"2,841.10",change:"15.95",pChange:"0.56",imgFileName:"CNX_PSU_BANK_open.png"},{name:"CNX SERVICE",lastPrice:"5,900.60",change:"-11.40",pChange:"-0.19",imgFileName:"CNX_SERVICE_open.png"},{name:"CNX IT",lastPrice:"6,262.10",change:"-69.65",pChange:"-1.10",imgFileName:"CNX_IT_open.png"},{name:"CNX SMALLCAP",lastPrice:"2,963.90",change:"31.95",pChange:"1.09",imgFileName:"CNX_SMALLCAP_open.png"},{name:"CNX 200",lastPrice:"2,421.50",change:"3.80",pChange:"0.16",imgFileName:"CNX_200_open.png"},{name:"CNX AUTO",lastPrice:"3,484.30",change:"-9.25",pChange:"-0.26",imgFileName:"CNX_AUTO_open.png"},{name:"CNX MEDIA",lastPrice:"1,139.60",change:"15.65",pChange:"1.39",imgFileName:"CNX_MEDIA_open.png"},{name:"CNX METAL",lastPrice:"2,726.75",change:"40.40",pChange:"1.50",imgFileName:"CNX_METAL_open.png"}]}';
$sValidJson = preg_replace("/([{,])([a-zA-Z][^: ]+):/", "$1\"$2\":", $sInvalidJson);

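The same pattern ports directly to Python's re.sub, where PHP's "$1\"$2\":" becomes r'\1"\2":'. This is a sketch on a shortened, hypothetical excerpt in the same shape as the data above; note that commas and colons inside values such as "Jan 11, 2012 14:25:15" are left alone, because the pattern needs a letter right after the "{" or ",".

```python
import json
import re

# Shortened, hypothetical excerpt in the same shape as $sInvalidJson above
invalid = ('{preOpen:"900",preClose:"908",time:"Jan 11, 2012 14:25:15",'
           'data:[{name:"CNX IT",lastPrice:"6,262.10"}]}')

# Same pattern as the preg_replace call: a letter-initial key directly after
# "{" or ",", running up to the next ":", gets wrapped in double quotes
valid = re.sub(r'([{,])([a-zA-Z][^: ]+):', r'\1"\2":', invalid)
data = json.loads(valid)

print(data["time"])             # Jan 11, 2012 14:25:15
print(data["data"][0]["name"])  # CNX IT
```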

How to Fix JSON Key Values without double-quotes?

Using Regex:

import re
data = """{ Name: "test", Address: "xyz"}"""
print(re.sub(r"(\w+):", r'"\1":', data))

Output:

{ "Name": "test", "Address": "xyz"}

How to insert a missing delimiter character in an invalid JSON in Python?

You can slice the string at the position where the delimiter ',' is missing (stored in the pos attribute of the exception object) and join the two parts with ',':

import json

s = '[{"a":1}{"b":2}{"c":3}]'
while True:
    try:
        data = json.loads(s)
        break
    except json.decoder.JSONDecodeError as e:
        if not e.args[0].startswith("Expecting ',' delimiter:"):
            raise
        s = ','.join((s[:e.pos], s[e.pos:]))
print(s)
print(data)

This outputs:

[{"a":1},{"b":2},{"c":3}]
[{'a': 1}, {'b': 2}, {'c': 3}]

How to make an invalid JSON into valid JSON in Python

Yes, it's not valid JSON, but you can pass the string to ast.literal_eval if you surround it with brackets:

>>> s="""{'Link': 'media/pdf/details/all-india-govt-jobs/other-all-india-govt-jobs/5472540504.pdf', 'Title': 'Corrigendum'},
... {'Link': 'media/pdf/details/all-india-govt-jobs/other-all-india-govt-jobs/3901883467.pdf', 'Title': 'Notification '},
... {'Link': 'http://www.nbagr.res.in/', 'Title': ' Official Website'}"""
>>> import ast
>>> ast.literal_eval("[" + s + "]")
[{'Link': 'media/pdf/details/all-india-govt-jobs/other-all-india-govt-jobs/5472540504.pdf', 'Title': 'Corrigendum'},
{'Link': 'media/pdf/details/all-india-govt-jobs/other-all-india-govt-jobs/3901883467.pdf', 'Title': 'Notification '},
{'Link': 'http://www.nbagr.res.in/', 'Title': ' Official Website'}]
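If the goal is valid JSON text rather than Python objects, the literal_eval result can be passed straight to json.dumps, which always emits double-quoted keys and strings. A minimal sketch using two of the records above:

```python
import ast
import json

# Two of the records from above, as Python-style dict literals (single quotes)
s = """{'Link': 'media/pdf/details/all-india-govt-jobs/other-all-india-govt-jobs/5472540504.pdf', 'Title': 'Corrigendum'},
{'Link': 'http://www.nbagr.res.in/', 'Title': ' Official Website'}"""

# Wrap in brackets so the comma-separated dicts form one list literal
records = ast.literal_eval("[" + s + "]")

# Re-serialize as standard JSON text
valid_json = json.dumps(records)
print(valid_json)
```

ast.literal_eval only accepts Python literals, so unlike eval it will not execute arbitrary code found in the input.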

