Loading JSONL File as JSON Objects

Loading JSONL file as JSON objects

The splitlines method addresses that problem for you, so in general the code below will work:

import json

result = [json.loads(jline) for jline in jsonl_content.splitlines()]

If you are reading from a response object, it would be:

result = [json.loads(jline) for jline in response.read().splitlines()]
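As a minimal, self-contained illustration (the payload string here is made up), splitlines() copes with both \n and \r\n line endings:

```python
import json

# A made-up JSONL payload with mixed line endings, for illustration
jsonl_content = '{"a": 1}\r\n{"a": 2}\n{"a": 3}'

result = [json.loads(jline) for jline in jsonl_content.splitlines()]
print(result)  # [{'a': 1}, {'a': 2}, {'a': 3}]
```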

Both json.load and json.loads are unable to load my jsonl file

To read a JSONL file one has to read lines and then parse them.

import json

data = []
with open("mli_train_v1.jsonl", 'r', encoding='utf-8') as f:
    for line in f:
        data.append(json.loads(line))

Loading and parsing a JSON file with multiple JSON objects

You have a JSON Lines format text file. You need to parse your file line by line:

import json

data = []
with open('file') as f:
    for line in f:
        data.append(json.loads(line))

Each line contains valid JSON, but as a whole, it is not a valid JSON value as there is no top-level list or object definition.

Note that because the file contains JSON per line, you are saved the headaches of trying to parse it all in one go or to figure out a streaming JSON parser. You can now opt to process each line separately before moving on to the next, saving memory in the process. You probably don't want to append each result to one list and then process everything if your file is really big.
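To make that concrete, here is a sketch of a generator (the name iter_jsonl is my own) that yields one parsed object at a time instead of building a list:

```python
import json

def iter_jsonl(path):
    """Yield one parsed JSON object per line, without holding the whole file in memory."""
    with open(path, 'r', encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)
```

Each record can then be processed and discarded before the next one is read.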

If you have a file containing individual JSON objects with delimiters in between, see "How do I use the 'json' module to read in one JSON object at a time?" to parse out individual objects using a buffered method.

Python conversion from JSON to JSONL

Your input appears to be a sequence of Python objects; it certainly is not a valid JSON document.

If you have a list of Python dictionaries, then all you have to do is dump each entry into a file separately, followed by a newline:

import json

with open('output.jsonl', 'w') as outfile:
    for entry in JSON_file:
        json.dump(entry, outfile)
        outfile.write('\n')

The default configuration for the json module is to output JSON without newlines embedded.

Assuming your A, B and C names are really strings, that would produce:

{"index": 1, "met": "1043205", "no": "A"}
{"index": 2, "met": "000031043206", "no": "B"}
{"index": 3, "met": "0031043207", "no": "C"}

If you started with a JSON document containing a list of entries, just parse that document first with json.load()/json.loads().
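For instance (the input document below is made up), the conversion from a JSON array to JSONL is just parse-then-dump:

```python
import json

# Made-up JSON document containing a list of entries
json_text = '[{"index": 1, "no": "A"}, {"index": 2, "no": "B"}]'
entries = json.loads(json_text)  # parse the whole document first

with open('output.jsonl', 'w') as outfile:
    for entry in entries:
        json.dump(entry, outfile)  # one compact object ...
        outfile.write('\n')        # ... per line
```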

merge & write two jsonl (json lines) files into a new jsonl file in python3.6

It is possible that extract_json returns a generator instead of a list or dict that is JSON serializable.

Since the files are JSONL, each line is a valid JSON value, so you just need to tweak your existing code a little.

import json
import glob

result = []
for f in glob.glob("folder_with_all_jsonl/*.jsonl"):
    with open(f, 'r', encoding='utf-8-sig') as infile:
        for line in infile:
            try:
                result.append(json.loads(line))  # parse each line of the file
            except ValueError:
                print(f)  # report which file had a malformed line

# This outputs JSONL: write each entry as one JSON line
with open('merged_file.jsonl', 'w', encoding='utf-8-sig') as outfile:
    outfile.write("\n".join(map(json.dumps, result)))

Now that I think about it, you don't even have to parse the lines with json, except that doing so helps you sanitize any badly formatted JSON lines.

You could collect all the lines in one pass like this:

with open('merged_file.jsonl', 'w', encoding='utf-8-sig') as outfile:
    for f in glob.glob("folder_with_all_jsonl/*.jsonl"):
        with open(f, 'r', encoding='utf-8-sig') as infile:
            for line in infile:
                outfile.write(line)
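One caveat with raw concatenation: if a source file lacks a final newline, two JSON objects end up glued onto one line. A sketch that guards against this (the function name merge_jsonl is my own):

```python
import glob

def merge_jsonl(pattern, out_path):
    """Concatenate matching JSONL files, ensuring every record ends with a newline."""
    with open(out_path, 'w', encoding='utf-8-sig') as outfile:
        for f in glob.glob(pattern):
            with open(f, 'r', encoding='utf-8-sig') as infile:
                for line in infile:
                    outfile.write(line)
                    if not line.endswith('\n'):  # guard a missing final newline
                        outfile.write('\n')

merge_jsonl("folder_with_all_jsonl/*.jsonl", 'merged_file.jsonl')
```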

Merge multiple JSONL files from a folder using Python

You can update a main dict with every JSON object you load, like this:

import json
import glob

result = {}
for f in glob.glob("*.json"):
    with open(f) as infile:
        result.update(json.load(infile))  # merge the dicts

with open("merged_file.json", "w") as outfile:
    json.dump(result, outfile)

But note that this will overwrite duplicate keys!
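A tiny demonstration of that caveat (the sample dicts are made up): dict.update keeps only the last value seen for each key:

```python
a = {"id": 1, "name": "foo"}
b = {"id": 2, "city": "bar"}

merged = {}
merged.update(a)
merged.update(b)  # "id" from a is silently overwritten
print(merged)  # {'id': 2, 'name': 'foo', 'city': 'bar'}
```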

list of json files into jsonL file using Python

Here's how to read JSON files from a directory in Python and then output the loaded data into a single JSONL file:

import os, json

directory = '/Path/To/Your/Json/Directory'  # specify your JSON directory path here

json_list = []  # blank list for collecting the loaded JSON data
for dirpath, subdirs, files in os.walk(directory):
    for file in files:
        if file.endswith(".json"):
            with open(os.path.join(dirpath, file)) as json_file:
                data = json.load(json_file)
                json_list.append(data)

# Now, output the list of JSON data into a single JSONL file
with open('output.jsonl', 'w') as outfile:
    for entry in json_list:
        json.dump(entry, outfile)
        outfile.write('\n')

Loading a very large jsonl in pandas returns ValueError

You seem to have malformed JSON data in your file. For example, try loading the following "JSON" data - note that id 77 is malformed.

{"created_at": "2019-01-01 23:45:01", "id":1}
{"created_at": "2019-01-01 23:45:01", "id":2}
{"created_at": "2019-01-01 23:45:01", "id":3}
{"created_at": "2019-01-01 23:45:01", "id":4}
{"created_at": "2019-01-01 23:45:01", "id":5}
{"created_at": "2019-01-01 23:45:01", "id":6}
{"created_at": "2019-01-01 23:45:01", "id":7}
{"created_at": "2019-01-01 23:45:01", "id":8}
{"created_at": "2019-01-01 23:45:01", "id":11}
{"created_at": "2019-01-01 23:45:01", "id":22}
{"created_at": "2019-01-01 23:45:01", "id":33}
{"created_at": "2019-01-01 23:45:01", "id":44}
{"created_at": "2019-01-01 23:45:01", "id":55}
{"created_at": "2019-01-01 23:45:01", "id":66}
{i"created_at": "2019-01-01 23:45:01", "id":77}

{"created_at": "2019-01-01 23:45:01", "id":88}
{"created_at": "2019-01-01 23:45:01", "id":99}

Then run this code.

>>> import pandas as pd
>>> reader = pd.read_json("January.jsonl", lines=True, chunksize=1)
>>> for r in reader:
...     print(r)

And view the output:

12 2019-01-01 23:45:01  55
             created_at  id
13 2019-01-01 23:45:01  66
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/anaconda3/envs/project/lib/python3.7/site-packages/pandas/io/json/_json.py", line 779, in __next__
    obj = self._get_object_parser(lines_json)
  File "/home/user/anaconda3/envs/project/lib/python3.7/site-packages/pandas/io/json/_json.py", line 753, in _get_object_parser
    obj = FrameParser(json, **kwargs).parse()
  File "/home/user/anaconda3/envs/project/lib/python3.7/site-packages/pandas/io/json/_json.py", line 857, in parse
    self._parse_no_numpy()
  File "/home/user/anaconda3/envs/project/lib/python3.7/site-packages/pandas/io/json/_json.py", line 1089, in _parse_no_numpy
    loads(json, precise_float=self.precise_float), dtype=None
ValueError: Expected object or value

The error is the same as the one you received. You will need to find the malformed data and fix it. You could try reading the JSON data line by line to find out where the errors are, and print the offending lines to inspect them.

import json

with open("January.jsonl") as f:
    for line_no, line in enumerate(f):
        try:
            data = json.loads(line)
        except ValueError:
            print(line_no)
            print(line)
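If you would rather keep whatever pandas managed to parse before hitting the bad line, a sketch like this works (the file name demo.jsonl and its contents are made up for illustration):

```python
import pandas as pd

# Build a small JSONL file with one malformed line, for illustration
with open("demo.jsonl", "w") as f:
    f.write('{"id": 1}\n{"id": 2}\nnot json\n{"id": 3}\n')

frames = []
try:
    for chunk in pd.read_json("demo.jsonl", lines=True, chunksize=1):
        frames.append(chunk)
except ValueError:
    print("Hit a malformed line; keeping the chunks parsed so far")

df = pd.concat(frames)
print(len(df))  # number of rows parsed before the error
```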

What’s the best way to load a JSONObject from a json text file?

try this:

import java.io.InputStream;

import net.sf.json.JSONObject;
import net.sf.json.JSONSerializer;
import org.apache.commons.io.IOUtils;

public class JsonParsing {

    public static void main(String[] args) throws Exception {
        InputStream is =
            JsonParsing.class.getResourceAsStream("sample-json.txt");
        String jsonTxt = IOUtils.toString(is);

        JSONObject json = (JSONObject) JSONSerializer.toJSON(jsonTxt);
        double coolness = json.getDouble("coolness");
        int altitude = json.getInt("altitude");
        JSONObject pilot = json.getJSONObject("pilot");
        String firstName = pilot.getString("firstName");
        String lastName = pilot.getString("lastName");

        System.out.println("Coolness: " + coolness);
        System.out.println("Altitude: " + altitude);
        System.out.println("Pilot: " + lastName);
    }
}

And this is your sample-json.txt; it should be in JSON format:

{
    'foo': 'bar',
    'coolness': 2.0,
    'altitude': 39000,
    'pilot': {
        'firstName': 'Buzz',
        'lastName': 'Aldrin'
    },
    'mission': 'apollo 11'
}

