How to Extract Multiple JSON Objects from One File

How to extract multiple JSON objects from one file?

Use a JSON array, in the format:

[
{"ID":"12345","Timestamp":"20140101", "Usefulness":"Yes",
"Code":[{"event1":"A","result":"1"},…]},
{"ID":"1A35B","Timestamp":"20140102", "Usefulness":"No",
"Code":[{"event1":"B","result":"1"},…]},
{"ID":"AA356","Timestamp":"20140103", "Usefulness":"No",
"Code":[{"event1":"B","result":"0"},…]},
...
]

Then load it in your Python code:

import json

with open('file.json') as json_file:
    data = json.load(json_file)

Now data is a list of dictionaries, one for each element of the array.

You can access an element easily, e.g.:

data[0]["ID"]
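As a minimal sketch of working with the loaded list, the example below writes two of the sample records to a temporary file, loads them back, and collects one field from every record. The temporary path and the variable names ids and first_events are assumptions made for illustration, and each "Code" list is cut down to the single entry shown in the sample.

```python
import json
import os
import tempfile

# Sample records in the array format shown above; each "Code" list is
# shortened to the one entry that the sample actually shows.
records = [
    {"ID": "12345", "Timestamp": "20140101", "Usefulness": "Yes",
     "Code": [{"event1": "A", "result": "1"}]},
    {"ID": "1A35B", "Timestamp": "20140102", "Usefulness": "No",
     "Code": [{"event1": "B", "result": "1"}]},
]

# Write the array to a temporary file so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "file.json")
with open(path, "w") as json_file:
    json.dump(records, json_file)

# Load the whole array in one call; the result is a list of dicts.
with open(path) as json_file:
    data = json.load(json_file)

# Collect a top-level field and a nested field from every record.
ids = [item["ID"] for item in data]
first_events = [item["Code"][0]["event1"] for item in data]
print(ids)
print(first_events)
```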

How to extract multiple independently nested JSON objects and keys from a website using Python

IIUC, you just need to call the .find_all method on your spanList to get all the JSON objects.

Try this:

from bs4 import BeautifulSoup
import requests
import json

reno = 'https://www.foodpantries.org/ci/nv-reno'
renoContent = requests.get(reno)
renoHtml = BeautifulSoup(renoContent.text, 'html.parser')
json_scripts = renoHtml.find("div", class_="span8").find_all('script', type='application/ld+json')
data = [json.loads(script.text, strict=False) for script in json_scripts]
#use strict=False to bypass json.decoder.JSONDecodeError: Invalid control character
print(data)
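To see what strict=False changes, here is a small sketch that uses only the standard library and needs no network access. The JSON text is a made-up stand-in for the content of one script tag; it contains a raw newline inside a string value, which the default strict parser rejects.

```python
import json

# Hypothetical script content: the second "\n" puts a raw newline inside a
# string value, which strict JSON parsing does not allow.
raw = '{"name": "Food Pantry",\n "description": "Open Mon\nand Tue"}'

try:
    json.loads(raw)
    strict_ok = True
except json.JSONDecodeError as exc:
    strict_ok = False
    print("strict parsing failed:", exc)

# With strict=False, control characters are accepted inside strings.
lenient = json.loads(raw, strict=False)
print(repr(lenient["description"]))
```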

Extract Data from multiple objects in JSON variable in SQL

You need a combination of:

  1. OPENJSON() with default schema. The result is a table with columns key, value and type. The key column contains the key of each nested JSON object, the value column contains the value of each nested JSON object.
  2. OPENJSON() with explicit schema (the WITH clause) and an additional APPLY operator to parse the nested JSON objects from the first OPENJSON() call using the defined output schema:
SELECT j2.*
FROM OPENJSON(@json, '$.result') j1
OUTER APPLY OPENJSON(j1.[value]) WITH (
    id nvarchar(50) '$.management_account_id',
    lbl nvarchar(50) '$.management_account_label'
) j2

Result:

id     lbl
-----------------------------------------
6828   EXC001-00-GP Excellerate Facilities
12183  ENF001-04-GP The Zone
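The two-step logic can also be traced outside SQL Server. The Python sketch below is only an illustration of the idea, not a replacement for the query: the outer loop plays the role of the first OPENJSON() call (one key/value pair per nested object) and the inner lookup plays the role of the WITH clause. The input document is an assumption, since the original JSON is not shown; the outer keys "a" and "b" are invented, and the ids are assumed to be strings.

```python
import json

# Assumed input shape: "result" is an object whose values are nested objects.
# The outer keys ("a", "b") are hypothetical; the field values come from the
# result table above.
doc = json.loads("""
{"result": {
  "a": {"management_account_id": "6828",
        "management_account_label": "EXC001-00-GP Excellerate Facilities"},
  "b": {"management_account_id": "12183",
        "management_account_label": "ENF001-04-GP The Zone"}
}}
""")

rows = []
# Step 1: iterate over the key/value pairs under $.result.
for key, value in doc["result"].items():
    # Step 2: project each nested object onto the output columns.
    rows.append({
        "id": value.get("management_account_id"),
        "lbl": value.get("management_account_label"),
    })

for row in rows:
    print(row["id"], row["lbl"])
```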

How to read multiple nested json objects in one file extract by pyspark to dataframe in Azure databricks?

  1. Read the file into an RDD first. Each line is read as a string.
  2. Convert each JSON string into a native Python datatype using json.loads().
  3. Convert the RDD into a dataframe; toDF() can infer the schema directly.
  4. Using the answer from "Flatten Spark Dataframe column of map/dictionary into multiple columns", explode the Data column into multiple columns. This works provided your Id column is unique. Note that explode returns key and value columns for each entry in a map-type column.
  5. Repeat step 4 to explode the Properties column.

Solution:

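import json
from pyspark.sql import functions as F

# sc is the SparkContext that Databricks notebooks predefine
rdd = sc.textFile("demo_files/Test20191023.log")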
df = rdd.map(lambda x: json.loads(x)).toDF()
df.show()
# +--------------------+----------+--------------------+----------+
# | Data| EventType| Id| Timestamp|
# +--------------------+----------+--------------------+----------+
# |[MessageTemplate ...|3735091736|event-c20b9c7eac0...|2019-03-19|
# |[MessageTemplate ...|3735091737|event-d20b9c7eac0...|2019-03-18|
# |[MessageTemplate ...|3735091738|event-e20b9c7eac0...|2019-03-17|
# +--------------------+----------+--------------------+----------+

data_exploded = df.select('Id', 'EventType', "Timestamp", F.explode('Data'))\
    .groupBy('Id', 'EventType', "Timestamp").pivot('key').agg(F.first('value'))
# Note: the result has two Id columns, which may cause ambiguity problems
data_exploded.show()

# +--------------------+----------+----------+--------+-----+---------------+--------------------+
# | Id| EventType| Timestamp| Id|Level|MessageTemplate| Properties|
# +--------------------+----------+----------+--------+-----+---------------+--------------------+
# |event-c20b9c7eac0...|3735091736|2019-03-19|event-c2| 2| Test1|{CorrId=d69b7489,...|
# |event-d20b9c7eac0...|3735091737|2019-03-18|event-d2| 2| Test1|{CorrId=f69b7489,...|
# |event-e20b9c7eac0...|3735091738|2019-03-17|event-e2| 1| Test1|{CorrId=g69b7489,...|
# +--------------------+----------+----------+--------+-----+---------------+--------------------+
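The flattening idea can be checked on a single record without a Spark cluster. The sketch below is a plain-Python stand-in, not PySpark code: it parses one log line with json.loads() and flattens the nested dictionaries, prefixing nested keys with their parent key so that the inner Id does not collide with the outer one. The log line and the flatten helper are hypothetical; the field names follow the columns shown above and the values are invented.

```python
import json

# Hypothetical log line; field names follow the columns shown above,
# values are invented for illustration.
line = ('{"Id": "event-1", "EventType": 1, "Timestamp": "2019-03-19", '
        '"Data": {"Id": "d-1", "Level": 2, "MessageTemplate": "Test1", '
        '"Properties": {"CorrId": "abc"}}}')


def flatten(record, parent_key="", sep="_"):
    """Flatten nested dicts into one level, prefixing nested keys with the parent key."""
    flat = {}
    for key, value in record.items():
        name = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects such as Data and Properties.
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


row = flatten(json.loads(line))
print(row)
```

Prefixing is one way to sidestep the duplicate Id column noted above; in Spark the same effect would need the exploded columns to be renamed before or after the pivot.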

