Loading JSONL file as JSON objects
splitlines() takes care of that problem for you, so in general the code below will work:
import json
result = [json.loads(jline) for jline in jsonl_content.splitlines()]
If that's the response object, the result would be:
result = [json.loads(jline) for jline in response.read().splitlines()]
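One caveat worth knowing: on a str, splitlines() also breaks at a few characters other than "\n" (for example U+2028), and JSON allows those unescaped inside string values. The sketch below shows the difference; parse_jsonl is a hypothetical helper name, not something from the answer above.

```python
import json

def parse_jsonl(content):
    # Split on "\n" only and skip blank lines (such as the one after a trailing newline).
    return [json.loads(line) for line in content.split("\n") if line.strip()]

# ensure_ascii=False leaves U+2028 unescaped inside the string value.
first = json.dumps({"text": "a\u2028b"}, ensure_ascii=False)
second = json.dumps({"text": "c"})
content = first + "\n" + second + "\n"

pieces = content.splitlines()   # also splits at U+2028, cutting the first record in two
records = parse_jsonl(content)  # splits at "\n" only, so both records stay whole
```

If your data is written with the default ensure_ascii=True, such characters are escaped and splitlines() is fine.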
Both json.load and json.loads is unable to load my jsonl file
To read a JSONL file one has to read lines and then parse them.
import json
data = []
with open("mli_train_v1.jsonl", 'r', encoding='utf-8') as f:
    for line in f:
        data.append(json.loads(line))
Loading and parsing a JSON file with multiple JSON objects
You have a JSON Lines format text file. You need to parse your file line by line:
import json
data = []
with open('file') as f:
    for line in f:
        data.append(json.loads(line))
Each line contains valid JSON, but as a whole, it is not a valid JSON value as there is no top-level list or object definition.
Note that because the file contains JSON per line, you are saved the headaches of trying to parse it all in one go or to figure out a streaming JSON parser. You can now opt to process each line separately before moving on to the next, saving memory in the process. You probably don't want to append each result to one list and then process everything if your file is really big.
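A minimal sketch of that line-at-a-time processing, assuming a UTF-8 file with one JSON value per line. The names iter_jsonl and count_records are made up for illustration.

```python
import json

def iter_jsonl(path):
    """Yield one parsed object per non-blank line of a JSONL file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():  # tolerate blank lines
                yield json.loads(line)

def count_records(path):
    # Only one record is held in memory at any time.
    return sum(1 for _ in iter_jsonl(path))
```

Because iter_jsonl is a generator, you can filter or aggregate in a for loop without ever building the full list.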
If you have a file containing individual JSON objects with delimiters in between, see the question "How do I use the 'json' module to read in one JSON object at a time?" for a way to parse out individual objects using a buffered method.
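This is not the buffered method from that question, just a simpler in-memory sketch of the same idea. It assumes the objects are separated only by whitespace and that the whole text fits in memory; iter_concatenated_json is a hypothetical name. It relies on json.JSONDecoder.raw_decode, which parses one value and reports where it stopped.

```python
import json

def iter_concatenated_json(text):
    """Yield each JSON value from text holding several values back to back."""
    decoder = json.JSONDecoder()
    pos = 0
    length = len(text)
    while pos < length:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < length and text[pos].isspace():
            pos += 1
        if pos >= length:
            break
        obj, pos = decoder.raw_decode(text, pos)  # returns (value, end index)
        yield obj
```

For example, list(iter_concatenated_json('{"a": 1} {"b": 2}')) gives the two dicts in order.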
Python conversion from JSON to JSONL
Your input appears to be a sequence of Python objects; it certainly is not a valid JSON document.
If you have a list of Python dictionaries, then all you have to do is dump each entry into a file separately, followed by a newline:
import json
with open('output.jsonl', 'w') as outfile:
    for entry in JSON_file:
        json.dump(entry, outfile)
        outfile.write('\n')
The default configuration for the json module is to output JSON without newlines embedded.
Assuming your A, B and C names are really strings, that would produce:
{"index": 1, "met": "1043205", "no": "A"}
{"index": 2, "met": "000031043206", "no": "B"}
{"index": 3, "met": "0031043207", "no": "C"}
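As a quick check that the default output stays on one physical line, here is a small sketch. The note field is invented for illustration; a newline inside a string value is written as the two-character escape \n, not as a real line break.

```python
import json

# A value that contains a real newline character.
entry = {"index": 1, "note": "first line\nsecond line"}

line = json.dumps(entry)               # default settings: no indent
pretty = json.dumps(entry, indent=2)   # indent spreads the object over several lines

round_trip = json.loads(line)          # the original newline comes back intact
```

So the one-object-per-line layout only breaks if you pass indent (or otherwise pretty-print) when dumping.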
If you started with a JSON document containing a list of entries, just parse that document first with json.load()/json.loads().
merge & write two jsonl (json lines) files into a new jsonl file in python3.6
It is possible that extract_json returns a generator instead of a list or dict, and a generator is not JSON serializable.
Since the input is JSONL, each line is a valid JSON value, so you just need to tweak your existing code a little bit.
import json
import glob
result = []
for f in glob.glob("folder_with_all_jsonl/*.jsonl"):
    with open(f, 'r', encoding='utf-8-sig') as infile:
        for line in infile.readlines():
            try:
                result.append(json.loads(line))  # read each line of the file
            except ValueError:
                print(f)
# This would output jsonl
with open('merged_file.jsonl', 'w', encoding='utf-8-sig') as outfile:
    # json.dump(result, outfile)
    # write each line as a json
    outfile.write("\n".join(map(json.dumps, result)))
Now that I think about it, you don't even have to load the lines with json; all that does is help you weed out any badly formatted JSON lines. You could collect all the lines in one shot like this:
outfile = open('merged_file.jsonl', 'w', encoding='utf-8-sig')
for f in glob.glob("folder_with_all_jsonl/*.jsonl"):
    with open(f, 'r', encoding='utf-8-sig') as infile:
        for line in infile:
            # make sure a file without a trailing newline doesn't run into the next one
            outfile.write(line if line.endswith('\n') else line + '\n')
outfile.close()
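If you want both at once (streaming line by line, but still rejecting broken lines), something along these lines might work. It is only a sketch: merge_jsonl is a hypothetical name, and unlike the snippets above it writes plain UTF-8 without a BOM, which is an assumption about what the consumer expects.

```python
import json

def merge_jsonl(paths, out_path):
    """Concatenate JSONL files; return (path, line_number) for each rejected line."""
    rejected = []
    with open(out_path, "w", encoding="utf-8") as outfile:
        for path in paths:
            # utf-8-sig strips a leading BOM if the input file has one.
            with open(path, encoding="utf-8-sig") as infile:
                for line_number, line in enumerate(infile, start=1):
                    if not line.strip():
                        continue  # ignore blank lines
                    try:
                        json.loads(line)  # validate only; the raw line is written
                    except ValueError:
                        rejected.append((path, line_number))
                        continue
                    # Normalise the line ending so a file without a trailing
                    # newline cannot glue its last record to the next file.
                    outfile.write(line.rstrip("\r\n") + "\n")
    return rejected
```

You could call it as merge_jsonl(sorted(glob.glob("folder_with_all_jsonl/*.jsonl")), "merged_file.jsonl") and inspect the returned list afterwards.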
Merge multiple JSONL files from a folder using Python
You can update a main dict with every JSON object you load, like this:
import json
import glob
result = {}
for f in glob.glob("*.json"):
    with open(f) as infile:
        result.update(json.load(infile))  # merge the dicts
with open("merged_file.json", "w") as outfile:  # text mode; json.dump writes str
    json.dump(result, outfile)
But this will overwrite keys that appear in more than one file!
list of json files into jsonL file using Python
Here's how you read json files from a directory in Python and then output the loaded json files into a single jsonl file:
import os, json
directory = '/Path/To/Your/Json/Directory'  # Specify your json directory path here
json_list = []  # Initiate a new blank list for storing json data in list format
for dirpath, subdirs, files in os.walk(directory):
    print(dirpath)
    for file in files:
        if file.endswith(".json"):
            with open(os.path.join(dirpath, file)) as json_file:
                data = json.load(json_file)
                json_list.append(data)
# Now, output the list of json data into a single jsonl file
with open('output.jsonl', 'w') as outfile:
    for entry in json_list:
        json.dump(entry, outfile)
        outfile.write('\n')
Loading a very large jsonl in pandas returns ValueError
You seem to have malformed JSON data in your file. For example, try loading the following "JSON" data - note that id 77 is malformed.
{"created_at": "2019-01-01 23:45:01", "id":1}
{"created_at": "2019-01-01 23:45:01", "id":2}
{"created_at": "2019-01-01 23:45:01", "id":3}
{"created_at": "2019-01-01 23:45:01", "id":4}
{"created_at": "2019-01-01 23:45:01", "id":5}
{"created_at": "2019-01-01 23:45:01", "id":6}
{"created_at": "2019-01-01 23:45:01", "id":7}
{"created_at": "2019-01-01 23:45:01", "id":8}
{"created_at": "2019-01-01 23:45:01", "id":11}
{"created_at": "2019-01-01 23:45:01", "id":22}
{"created_at": "2019-01-01 23:45:01", "id":33}
{"created_at": "2019-01-01 23:45:01", "id":44}
{"created_at": "2019-01-01 23:45:01", "id":55}
{"created_at": "2019-01-01 23:45:01", "id":66}
{i"created_at": "2019-01-01 23:45:01", "id":77}
{"created_at": "2019-01-01 23:45:01", "id":88}
{"created_at": "2019-01-01 23:45:01", "id":99}
Then run this code.
>>> import pandas as pd
>>> reader = pd.read_json("January.jsonl", lines=True, chunksize=1)
>>> for r in reader:
...     print(r)
And view the output:
12  2019-01-01 23:45:01  55
             created_at  id
13  2019-01-01 23:45:01  66
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/user/anaconda3/envs/project/lib/python3.7/site-packages/pandas/io/json/_json.py", line 779, in __next__
obj = self._get_object_parser(lines_json)
File "/home/user/anaconda3/envs/project/lib/python3.7/site-packages/pandas/io/json/_json.py", line 753, in _get_object_parser
obj = FrameParser(json, **kwargs).parse()
File "/home/user/anaconda3/envs/project/lib/python3.7/site-packages/pandas/io/json/_json.py", line 857, in parse
self._parse_no_numpy()
File "/home/user/anaconda3/envs/project/lib/python3.7/site-packages/pandas/io/json/_json.py", line 1089, in _parse_no_numpy
loads(json, precise_float=self.precise_float), dtype=None
ValueError: Expected object or value
The error is the same as the one you received. You will need to find the malformed data and fix it. You could try reading the JSON data line by line to find out where the errors are and extract those lines to inspect them.
import json
f = open("January.jsonl")
lines = f.readlines()
for line_no, line in enumerate(lines):
    try:
        data = json.loads(line)
    except Exception:
        print(line_no)
        print(line)
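A slightly more reusable variant, as a sketch: it collects the problems instead of printing them and includes the parser's own message. find_bad_lines is a hypothetical name, and line numbers here start at 1 (the loop above counts from 0).

```python
import json

def find_bad_lines(path):
    """Return (line_number, error_message) for every line that fails to parse."""
    problems = []
    with open(path, encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            if not line.strip():
                continue  # blank lines are skipped, not reported
            try:
                json.loads(line)
            except json.JSONDecodeError as exc:
                # exc.msg and exc.colno say what went wrong and where in the line
                problems.append((line_number, f"{exc.msg} (column {exc.colno})"))
    return problems
```

Once the list comes back empty, pd.read_json(..., lines=True) should no longer hit this particular error.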
What’s the best way to load a JSONObject from a json text file?
try this:
import java.io.InputStream;
import net.sf.json.JSONObject;
import net.sf.json.JSONSerializer;
import org.apache.commons.io.IOUtils;
public class JsonParsing {
    public static void main(String[] args) throws Exception {
        // read the whole resource file into a string
        InputStream is =
            JsonParsing.class.getResourceAsStream( "sample-json.txt");
        String jsonTxt = IOUtils.toString( is );
        JSONObject json = (JSONObject) JSONSerializer.toJSON( jsonTxt );
        double coolness = json.getDouble( "coolness" );
        int altitude = json.getInt( "altitude" );
        JSONObject pilot = json.getJSONObject("pilot");
        String firstName = pilot.getString("firstName");
        String lastName = pilot.getString("lastName");
        System.out.println( "Coolness: " + coolness );
        System.out.println( "Altitude: " + altitude );
        System.out.println( "Pilot: " + lastName );
    }
}
And this is your sample-json.txt, which should be in JSON format:
{
    'foo':'bar',
    'coolness':2.0,
    'altitude':39000,
    'pilot':
    {
        'firstName':'Buzz',
        'lastName':'Aldrin'
    },
    'mission':'apollo 11'
}