Python - How to Convert JSON File to Dataframe

Creating a DataFrame from a list of dictionaries:

import pandas as pd
data = [{'name': 'vikash', 'age': 27}, {'name': 'Satyam', 'age': 14}]
df = pd.DataFrame.from_dict(data, orient='columns')

df
Out[4]:
     name  age
0  vikash   27
1  Satyam   14

If you have nested columns then you first need to normalize the data:

data = [
    {
        'name': {
            'first': 'vikash',
            'last': 'singh'
        },
        'age': 27
    },
    {
        'name': {
            'first': 'satyam',
            'last': 'singh'
        },
        'age': 14
    }
]

df = pd.json_normalize(data)  # json_normalize already returns a DataFrame

df
Out[8]:
   age name.first name.last
0   27     vikash     singh
1   14     satyam     singh

Source:

  • pandas.DataFrame.from_dict
  • pandas.json_normalize
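
As a small variation on the nested example above, the sketch below passes the `sep` argument of `pd.json_normalize` so that nested keys are joined with an underscore instead of a dot. The data is the same as in the answer; only the separator choice is an illustration.

```python
import pandas as pd

# Same nested records as in the answer above.
data = [
    {'name': {'first': 'vikash', 'last': 'singh'}, 'age': 27},
    {'name': {'first': 'satyam', 'last': 'singh'}, 'age': 14},
]

# sep controls how nested keys are joined into column names.
df = pd.json_normalize(data, sep='_')
print(sorted(df.columns))
```

Underscore-separated names can be handy if you want to access columns as attributes, for example `df.name_first`.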

JSON to pandas DataFrame

I found a quick and easy solution to what I wanted using json_normalize(), included in pandas 1.0.1.

from urllib.request import Request, urlopen  # Python 3 replacement for urllib2
import json

import pandas as pd

path1 = '42.974049,-81.205203|42.974298,-81.195755'
request = Request('http://maps.googleapis.com/maps/api/elevation/json?locations=' + path1 + '&sensor=false')
response = urlopen(request)
elevations = response.read()
data = json.loads(elevations)
df = pd.json_normalize(data['results'])

This gives a nice flattened dataframe with the json data that I got from the Google Maps API.
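
To try the same flattening step without calling the web service, here is a minimal offline sketch. The `sample` string is a made-up response body that only imitates the shape of such a result (a `results` list whose items contain a nested `location` object); the field values are assumptions for illustration, not real API output.

```python
import json

import pandas as pd

# Hypothetical response body, shaped like an elevation lookup result.
sample = '''
{"results": [
  {"elevation": 243.3, "location": {"lat": 42.974049, "lng": -81.205203}, "resolution": 19.1},
  {"elevation": 244.1, "location": {"lat": 42.974298, "lng": -81.195755}, "resolution": 19.1}
 ],
 "status": "OK"}
'''

data = json.loads(sample)

# Only the list under "results" is normalized; nested "location" keys
# become the columns "location.lat" and "location.lng".
df = pd.json_normalize(data['results'])
print(df.columns.tolist())
```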

Read json file as pandas dataframe?

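If you open the file as binary ('rb'), you will get bytes. Open it in text mode instead (the old 'rU' mode is deprecated and was removed in Python 3.11):

with open('C:/Users/Alberto/nutrients.json', 'r') as f:
    lines = f.readlines()  # str lines rather than bytes

You can also let pandas read the file directly; lines=True tells it to expect one JSON object per line: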

df = pd.read_json('C:/Users/Alberto/nutrients.json', lines=True)
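
A minimal sketch of what `lines=True` expects, using an in-memory buffer instead of a file on disk. The two records are invented sample data, not the contents of nutrients.json.

```python
import io

import pandas as pd

# Two records, one JSON object per line (hypothetical sample data).
text = '{"id": 1, "description": "Cheese"}\n{"id": 2, "description": "Butter"}\n'

# Each line is parsed as a separate record and becomes one row.
df = pd.read_json(io.StringIO(text), lines=True)
print(df)
```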

Convert JSON data to Pandas DataFrame where keys and values are in different sections of JSON

Construct a DataFrame by extracting the values under the "values" key; assign column names using the list under "my_data_columns_headers" key, which is under the "my_data" key.

out = pd.DataFrame(pd.Series(data['values']).str.get('data').tolist(), columns=data['my_data']['my_data_columns_headers'])

Output:

  heading 1 heading 2 heading 3 heading 4 heading 5
0   value 1   value 2   value 3   value 4   value 5
1   value 1   value 2   value 3   value 4   value 5
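
For readers who want to run this end to end, the sketch below builds a small dictionary with the layout described above and extracts the rows with a plain list comprehension, which should be equivalent to the `pd.Series(...).str.get('data')` step. The contents of `data` are hypothetical; only the key names come from the answer.

```python
import pandas as pd

# Hypothetical payload with the layout described above:
# headers live under my_data, rows live under values[i]['data'].
data = {
    'my_data': {'my_data_columns_headers': ['heading 1', 'heading 2']},
    'values': [
        {'data': ['value 1', 'value 2']},
        {'data': ['value 3', 'value 4']},
    ],
}

# Pull the list stored under "data" out of every item.
rows = [item['data'] for item in data['values']]
out = pd.DataFrame(rows, columns=data['my_data']['my_data_columns_headers'])
print(out)
```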

how to convert a json file into a pandas dataframe

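Your file is not JSON: JSON requires property names to be enclosed in double quotes. You can work around it, though, if each line is a valid Python literal.

import pandas as pd

with open('data.txt') as f:
    # eval runs each line as Python code, so only use it on files you trust
    data = [eval(line) for line in f]
df = pd.DataFrame(data)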
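
If the file comes from a source you do not fully trust, a variant of the same workaround is to parse each line with `ast.literal_eval`, which accepts Python literals but does not execute arbitrary code. This is a sketch under the assumption that every line is a dict literal with single-quoted keys; the `lines` list stands in for the file contents and is made up.

```python
import ast
import json

import pandas as pd

# Hypothetical file contents: single-quoted keys, so not valid JSON.
lines = ["{'name': 'John', 'age': 30}", "{'name': 'Joe', 'age': 20}"]

# Confirm that the standard JSON parser rejects such a line.
try:
    json.loads(lines[0])
    is_json = True
except json.JSONDecodeError:
    is_json = False

# literal_eval only accepts literals (dicts, lists, strings, numbers, ...).
data = [ast.literal_eval(line) for line in lines]
df = pd.DataFrame(data)
print(is_json)
print(df)
```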

Problem when trying to convert a JSON file to a DataFrame with pandas

I'm not sure what final result you are after, but running the code below should at least show you how the data is structured.

import pandas as pd
import json
# from pandas.io.json import json_normalize  # not needed; use pd.json_normalize
pd.options.display.max_colwidth = 100  # default is 50
pd.options.display.max_rows = 250  # default is 60

data = pd.read_json('response_1660310720193.json')
df = pd.json_normalize(data['event']).T
print(df)

Output:

eventPublishTime                                                                0
eventSubmissionTime 0
correlationId string
eventName string
senderOrgName string
senderOrgTypes [string]
originatorId string
eventOccurrenceTime 1589574600000
eventOccurrenceTime8601 2020-05-15T15:30:00.000-05:00
fromOceanAggregator false
originatorName string

<truncated>

declarationRef string
transportEvents [{'eventAction': 'Arrival', 'transpo...
doc.description string

<truncated>

You should see 'transportEvents' in df, so you can drill one level deeper to get that data:

df1 = pd.json_normalize(df.T['transportEvents'].explode().tolist()).T
df1.columns = ['transportEvents']
print(df1)

transportEvents
eventAction Arrival
transportMode Rail
eventOccurrenceTime8601 2018-03-13T11:30:00.000-05:00
transportPlanSequenceNumber 1
transportationPhase Import
vehicleId JEV4568
vehicleName Vehicle Name
voyageId 1234
emptyIndicator Laden
location.unlocode NLRTM

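After this exploration you know which keys to pass to pd.json_normalize(): the record path is the first key, event, followed by the nested key transportEvents. This gives the same output as above, provided the function receives the parsed JSON rather than the DataFrame returned by pd.read_json():

with open('response_1660310720193.json') as f:
    raw = json.load(f)  # plain dicts/lists, which a record path can walk

df1 = pd.json_normalize(raw, ['event', 'transportEvents']).T
df1.columns = ['transportEvents']
print(df1)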
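
Here is a self-contained sketch of the record-path idea on a tiny payload. The `raw` structure is a hypothetical, heavily trimmed stand-in with the same nesting (event, then transportEvents); the `meta` argument is an optional extra that copies a field from the parent event onto every row.

```python
import pandas as pd

# Hypothetical, trimmed payload: event -> transportEvents (a list of records).
raw = [{
    'event': {
        'eventName': 'string',
        'transportEvents': [
            {'eventAction': 'Arrival', 'transportMode': 'Rail',
             'location': {'unlocode': 'NLRTM'}},
            {'eventAction': 'Departure', 'transportMode': 'Rail',
             'location': {'unlocode': 'DEHAM'}},
        ],
    }
}]

# record_path walks event -> transportEvents; each list item becomes a row.
# meta pulls event.eventName from the parent and repeats it per row.
events = pd.json_normalize(
    raw,
    record_path=['event', 'transportEvents'],
    meta=[['event', 'eventName']],
)
print(events)
```

Without the `.T` used in the answer, each transport event is one row and each field is one column, which is usually the more convenient shape for further processing.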

Convert JSON file to Pandas dataframe

You could use this:

def flatten_dict(d):
    """ Returns list of lists from given dictionary """
    l = []
    for k, v in sorted(d.items()):
        if isinstance(v, dict):
            flatten_v = flatten_dict(v)
            for my_l in reversed(flatten_v):
                my_l.insert(0, k)

            l.extend(flatten_v)

        elif isinstance(v, list):
            for l_val in v:
                l.append([k, l_val])

        else:
            l.append([k, v])

    return l

This function receives a dictionary (including nesting where values could also be lists) and flattens it to a list of lists.

Then, you can simply:

df = pd.DataFrame(flatten_dict(my_dict))

Where my_dict is your JSON object.
Taking your example, what you get when you run print(df) is:

          0        1             2         3     4
0  country1  AdUnit1  floor_price1  feature1  1111
1  country1  AdUnit1  floor_price1  feature2  1112
2  country1  AdUnit1  floor_price2  feature1  1121
3  country1  AdUnit2  floor_price1  feature1  1211
4  country1  AdUnit2  floor_price2  feature1  1221
5  country2  AdUnit1  floor_price1  feature1  2111
6  country2  AdUnit1  floor_price1  feature2  2112

When you create the DataFrame, you can also name its columns and index.
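
A minimal sketch of naming the columns and the index, repeating `flatten_dict` so the block runs on its own. The nested `my_dict` is a shortened, made-up version of the example, and the column names (`country`, `ad_unit`, and so on) and the index name `row` are assumptions chosen for illustration.

```python
import pandas as pd


def flatten_dict(d):
    """Return a list of lists, one per leaf value, with the key path prepended."""
    rows = []
    for k, v in sorted(d.items()):
        if isinstance(v, dict):
            nested = flatten_dict(v)
            for row in nested:
                row.insert(0, k)
            rows.extend(nested)
        elif isinstance(v, list):
            for item in v:
                rows.append([k, item])
        else:
            rows.append([k, v])
    return rows


# Hypothetical, shortened version of the nested structure in the example.
my_dict = {
    'country1': {'AdUnit1': {'floor_price1': {'feature1': 1111, 'feature2': 1112}}},
    'country2': {'AdUnit1': {'floor_price1': {'feature1': 2111}}},
}

# Column names are illustrative; one name per nesting level plus the value.
df = pd.DataFrame(
    flatten_dict(my_dict),
    columns=['country', 'ad_unit', 'floor_price', 'feature', 'value'],
)
df.index.name = 'row'
print(df)
```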

Convert json file to dataframe and remove whitespaces and newlines from value

@chitown88's answer is probably faster, but if you want to do it with a regex you can do it like this:

df.replace(r'(^\s+|\s+$)', '', regex=True, inplace=True)

Output:

   name  age       car
0  John   30       Bmw
1   Joe   20  mercedes
2  Alex   18     tesla
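
To see the regex at work, the sketch below applies it to a small frame with stray spaces, tabs, and newlines. The sample values are invented, and the non-inplace form is used so the original frame stays available for comparison.

```python
import pandas as pd

# Hypothetical frame with leading/trailing whitespace in the string values.
df = pd.DataFrame({
    'name': [' John\n', 'Joe  '],
    'age': [30, 20],
    'car': ['\tBmw', ' mercedes '],
})

# Remove whitespace at the start or end of every string cell;
# numeric cells are left untouched.
cleaned = df.replace(r'^\s+|\s+$', '', regex=True)
print(cleaned)
```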

Convert JSON into dataframe

One possible approach is to create a DataFrame from the value under "Results" (this will create a column named "index") and build another DataFrame with the "index" column and join it back to the original DataFrame:

df = pd.DataFrame(data['Results'])
df = df.join(pd.DataFrame(df['index'].tolist())).drop(columns=['prediction_interval', 'index'])
df['SaleDate'] = pd.to_datetime(df['SaleDate'], unit='ms')

Output:

    forecast   SaleDate  OfferingGroupId
0   2.163242 2022-02-08                0
1  16.354220 2022-02-09                1
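
The same steps can be run on a small made-up payload. In this sketch the shape of `data['Results']` (parallel lists, with a dict per row under "index") is an assumption inferred from the description above, and the `prediction_interval` values are placeholders.

```python
import pandas as pd

# Hypothetical payload; timestamps are milliseconds since the Unix epoch.
data = {'Results': {
    'forecast': [2.163242, 16.354220],
    'prediction_interval': ['[0.9, 3.4]', '[14.1, 18.6]'],
    'index': [
        {'SaleDate': 1644278400000, 'OfferingGroupId': 0},
        {'SaleDate': 1644364800000, 'OfferingGroupId': 1},
    ],
}}

df = pd.DataFrame(data['Results'])

# Expand the dicts in the "index" column into their own columns,
# join them back by row position, then drop the columns no longer needed.
expanded = pd.DataFrame(df['index'].tolist())
df = df.join(expanded).drop(columns=['prediction_interval', 'index'])
df['SaleDate'] = pd.to_datetime(df['SaleDate'], unit='ms')
print(df)
```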

