Python - How to convert JSON File to Dataframe
Creating a DataFrame from a list of dictionaries:
import pandas as pd
data = [{'name': 'vikash', 'age': 27}, {'name': 'Satyam', 'age': 14}]
df = pd.DataFrame.from_dict(data, orient='columns')
df
Out[4]:
     name  age
0  vikash   27
1  Satyam   14
If you have nested columns then you first need to normalize the data:
data = [
{
'name': {
'first': 'vikash',
'last': 'singh'
},
'age': 27
},
{
'name': {
'first': 'satyam',
'last': 'singh'
},
'age': 14
}
]
df = pd.json_normalize(data)  # json_normalize already returns a DataFrame
df
Out[8]:
   age name.first name.last
0   27     vikash     singh
1   14     satyam     singh
Source:
pandas.DataFrame.from_dict
pandas.json_normalize
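For an actual file on disk, a minimal sketch would parse the file with the json module first and then normalize the result. The file name people.json and the temporary directory are hypothetical and exist only to keep the example self-contained.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sample records, written to a temporary file for the demo.
records = [
    {"name": {"first": "vikash", "last": "singh"}, "age": 27},
    {"name": {"first": "satyam", "last": "singh"}, "age": 14},
]
path = Path(tempfile.mkdtemp()) / "people.json"
path.write_text(json.dumps(records), encoding="utf-8")

# Parse the file into Python objects, then flatten nested keys into columns.
with path.open(encoding="utf-8") as f:
    loaded = json.load(f)

df_from_file = pd.json_normalize(loaded)
print(df_from_file)
```

Nested keys end up as dotted column names such as name.first, so no manual unpacking is needed.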
JSON to pandas DataFrame
I found a quick and easy solution to what I wanted using json_normalize(), included in pandas 1.0.1.
from urllib.request import Request, urlopen  # Python 3 replacement for urllib2
import json
import pandas as pd
path1 = '42.974049,-81.205203|42.974298,-81.195755'
request=Request('http://maps.googleapis.com/maps/api/elevation/json?locations='+path1+'&sensor=false')
response = urlopen(request)
elevations = response.read()
data = json.loads(elevations)
df = pd.json_normalize(data['results'])
This gives a nice flattened dataframe with the json data that I got from the Google Maps API.
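To see what the flattening does without calling the API, here is an offline sketch. The payload is illustrative only: it mimics the shape of an elevation response (a "results" list of nested objects), and the elevation and resolution numbers are made up.

```python
import json

import pandas as pd

# Illustrative payload, not real API output.
raw = """
{
  "results": [
    {"elevation": 243.3, "location": {"lat": 42.974049, "lng": -81.205203}, "resolution": 19.1},
    {"elevation": 248.5, "location": {"lat": 42.974298, "lng": -81.195755}, "resolution": 19.1}
  ],
  "status": "OK"
}
"""
payload = json.loads(raw)

# Each item in "results" becomes a row; the nested "location" dict
# becomes the columns location.lat and location.lng.
elev_df = pd.json_normalize(payload["results"])
print(sorted(elev_df.columns))
```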
Read json file as pandas dataframe?
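If you open the file as binary ('rb'), you will get bytes. How about opening it in text mode (the old 'rU' mode was removed in Python 3.11):
import json
with open('C:/Users/Alberto/nutrients.json', 'r') as f:
    data = json.load(f)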
As noted in this answer, you can also use pandas directly (lines=True expects one JSON object per line):
df = pd.read_json('C:/Users/Alberto/nutrients.json', lines=True)
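Whether lines=True is needed depends on the file layout. A small sketch with in-memory text standing in for the file (the records are hypothetical) shows both cases:

```python
from io import StringIO

import pandas as pd

# One JSON object per line (JSON Lines): needs lines=True.
jsonl_text = '{"id": 1, "description": "Cheese"}\n{"id": 2, "description": "Butter"}'
df_lines = pd.read_json(StringIO(jsonl_text), lines=True)

# A single JSON array of objects: the default lines=False handles it.
array_text = '[{"id": 1, "description": "Cheese"}, {"id": 2, "description": "Butter"}]'
df_array = pd.read_json(StringIO(array_text))

print(df_lines)
print(df_array)
```

If read_json fails on your file with the default settings, checking whether the file is really JSON Lines is a reasonable first step.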
Convert JSON data to Pandas DataFrame where keys and values are in different sections of JSON
Construct a DataFrame by extracting the values under the "values" key; assign column names using the list under "my_data_columns_headers" key, which is under the "my_data" key.
out = pd.DataFrame(pd.Series(data['values']).str.get('data').tolist(), columns=data['my_data']['my_data_columns_headers'])
Output:
   heading 1 heading 2 heading 3 heading 4 heading 5
0    value 1   value 2   value 3   value 4   value 5
1    value 1   value 2   value 3   value 4   value 5
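For reference, here is a self-contained sketch of the same idea. The shape of data is an assumption inferred from the key names mentioned above (a "values" list whose items each hold a "data" list, and headers under "my_data"), trimmed to three columns. A plain list comprehension is used in place of Series.str.get; it should produce the same rows.

```python
import pandas as pd

# Assumed input shape, reconstructed from the key names only.
data = {
    "my_data": {
        "my_data_columns_headers": ["heading 1", "heading 2", "heading 3"],
    },
    "values": [
        {"data": ["value 1", "value 2", "value 3"]},
        {"data": ["value 1", "value 2", "value 3"]},
    ],
}

# Pull the row values out of each item, then attach the headers as column names.
rows = [item["data"] for item in data["values"]]
out = pd.DataFrame(rows, columns=data["my_data"]["my_data_columns_headers"])
print(out)
```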
how to convert a json file into a pandas dataframe
Your file is not JSON. JSON expects property names to be enclosed in double quotes. But, you can work around it.
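import ast
import pandas as pd
with open('data.txt') as f:
    # ast.literal_eval is swapped in for eval here: it parses Python literals only and never executes code
    data = [ast.literal_eval(line) for line in f]
df = pd.DataFrame(data)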
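To illustrate why ast.literal_eval is a safer parser than eval for this kind of file, here is a small sketch. The lines are hypothetical stand-ins for the file contents, assuming each line is a Python-style dict literal with single-quoted keys.

```python
import ast

import pandas as pd

# Hypothetical file contents: one Python-style dict literal per line.
lines = [
    "{'name': 'John', 'age': 30}",
    "{'name': 'Joe', 'age': 20}",
]

# literal_eval accepts only literals (dicts, lists, strings, numbers, ...).
parsed = [ast.literal_eval(line) for line in lines]
df_literal = pd.DataFrame(parsed)
print(df_literal)

# Anything that is not a literal is rejected instead of being executed.
try:
    ast.literal_eval("__import__('os').getcwd()")
    rejected = False
except (ValueError, SyntaxError):
    rejected = True
print(rejected)
```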
problem when try to convert json file to dataframe with pandas
I'm not sure what you want the final outcome to be, but if you run the code below you should get something back:
import pandas as pd
import json
# from pandas.io.json import json_normalize  # not needed; use pd.json_normalize
pd.options.display.max_colwidth = 100  # default=50
pd.options.display.max_rows = 250  # default=60
# Load the file as a plain dict so json_normalize can walk its nested keys
with open('response_1660310720193.json') as f:
    data = json.load(f)
df = pd.json_normalize(data['event']).T
print(df)
Output:
eventPublishTime 0
eventSubmissionTime 0
correlationId string
eventName string
senderOrgName string
senderOrgTypes [string]
originatorId string
eventOccurrenceTime 1589574600000
eventOccurrenceTime8601 2020-05-15T15:30:00.000-05:00
fromOceanAggregator false
originatorName string
<truncated>
declarationRef string
transportEvents [{'eventAction': 'Arrival', 'transpo...
doc.description string
<truncated>
You should see 'transportEvents' in the df, so if you drill deeper you can get the data:
df1 = pd.json_normalize(df.T['transportEvents'].explode().tolist()).T
df1.columns = ['transportEvents']
print(df1)
transportEvents
eventAction Arrival
transportMode Rail
eventOccurrenceTime8601 2018-03-13T11:30:00.000-05:00
transportPlanSequenceNumber 1
transportationPhase Import
vehicleId JEV4568
vehicleName Vehicle Name
voyageId 1234
emptyIndicator Laden
location.unlocode NLRTM
After the above exploration, you know the correct keys to pass to pd.json_normalize(): the first key event, nested with the second key transportEvents. This gives the same output as above:
df1 = pd.json_normalize(data, ['event', 'transportEvents']).T  # .T puts one field per row, as in the output above
df1.columns = ['transportEvents']
print(df1)
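The record path approach can also carry parent fields along with each nested record. The sketch below uses a hypothetical, trimmed-down payload with the same nesting (the second event and its values are invented) and the record_path and meta parameters of pd.json_normalize:

```python
import pandas as pd

# Hypothetical payload that mirrors the event -> transportEvents nesting.
payload = {
    "event": {
        "eventName": "string",
        "transportEvents": [
            {"eventAction": "Arrival", "transportMode": "Rail",
             "location": {"unlocode": "NLRTM"}},
            {"eventAction": "Departure", "transportMode": "Rail",
             "location": {"unlocode": "DEHAM"}},
        ],
    }
}

# record_path walks down to the list of records (one row per list item);
# meta copies a field from the parent level onto every row.
events = pd.json_normalize(
    payload,
    record_path=["event", "transportEvents"],
    meta=[["event", "eventName"]],
)
print(events)
```

Without the transpose, each transport event is a row, which is usually the more convenient shape once there is more than one event.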
Convert JSON file to Pandas dataframe
You could use this:
def flatten_dict(d):
    """ Returns list of lists from given dictionary """
    l = []
    for k, v in sorted(d.items()):
        if isinstance(v, dict):
            flatten_v = flatten_dict(v)
            for my_l in reversed(flatten_v):
                my_l.insert(0, k)
            l.extend(flatten_v)
        elif isinstance(v, list):
            for l_val in v:
                l.append([k, l_val])
        else:
            l.append([k, v])
    return l
This function receives a dictionary (including nesting where values could also be lists) and flattens it to a list of lists.
Then, you can simply:
df = pd.DataFrame(flatten_dict(my_dict))
Where my_dict is your JSON object.
Taking your example, what you get when you run print(df) is:
          0        1             2         3     4
0  country1  AdUnit1  floor_price1  feature1  1111
1  country1  AdUnit1  floor_price1  feature2  1112
2  country1  AdUnit1  floor_price2  feature1  1121
3  country1  AdUnit2  floor_price1  feature1  1211
4  country1  AdUnit2  floor_price2  feature1  1221
5  country2  AdUnit1  floor_price1  feature1  2111
6  country2  AdUnit1  floor_price1  feature2  2112
When you create the DataFrame, you can also name its columns and index.
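A sketch of the naming step follows. The nested dictionary is reconstructed from the printed output, so treat it as an assumption about the original input. The column names and the index name are hypothetical, and flatten_dict is rewritten compactly here with the intent of keeping the same behavior.

```python
import pandas as pd


def flatten_dict(d):
    """Return one list per leaf value, prefixed by the keys leading to it."""
    rows = []
    for k, v in sorted(d.items()):
        if isinstance(v, dict):
            rows.extend([k] + sub for sub in flatten_dict(v))
        elif isinstance(v, list):
            rows.extend([k, item] for item in v)
        else:
            rows.append([k, v])
    return rows


# Assumed input, rebuilt from the output table.
my_dict = {
    "country1": {
        "AdUnit1": {
            "floor_price1": {"feature1": 1111, "feature2": 1112},
            "floor_price2": {"feature1": 1121},
        },
        "AdUnit2": {
            "floor_price1": {"feature1": 1211},
            "floor_price2": {"feature1": 1221},
        },
    },
    "country2": {
        "AdUnit1": {"floor_price1": {"feature1": 2111, "feature2": 2112}},
    },
}

# Hypothetical column and index names.
df_named = pd.DataFrame(
    flatten_dict(my_dict),
    columns=["country", "ad_unit", "floor_price", "feature", "value"],
)
df_named.index.name = "row"
print(df_named)
```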
Convert json file to dataframe and remove whitespaces and newlines from value
@chitown88's answer is probably faster, but if you want to do it using regex you can do it like this:
df.replace(r'(^\s+|\s+$)', '', regex=True, inplace=True)
Output:
   name  age       car
0  John   30       Bmw
1   Joe   20  mercedes
2  Alex   18     tesla
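To try the pattern on its own, here is a small sketch. The names and cars are taken from the output above, but the stray spaces, tabs, and newlines are invented for the demo.

```python
import pandas as pd

# Sample frame with made-up leading/trailing whitespace in the string columns.
messy = pd.DataFrame({
    "name": [" John\n", "Joe  ", "\tAlex"],
    "age": [30, 20, 18],
    "car": ["Bmw \n", " mercedes", "tesla\n\n"],
})

# ^\s+ matches whitespace at the start, \s+$ at the end; both are replaced
# with an empty string. Numeric columns are left untouched.
clean = messy.replace(r"(^\s+|\s+$)", "", regex=True)
print(clean)
```

This version returns a new DataFrame instead of using inplace=True, which makes it easier to compare before and after.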
Convert JSON into dataframe
One possible approach is to create a DataFrame from the value under "Results" (this will create a column named "index"), build another DataFrame from the "index" column, and join it back to the original DataFrame:
df = pd.DataFrame(data['Results'])
df = df.join(pd.DataFrame(df['index'].tolist())).drop(columns=['prediction_interval', 'index'])
df['SaleDate'] = pd.to_datetime(df['SaleDate'], unit='ms')
Output:
    forecast   SaleDate  OfferingGroupId
0   2.163242 2022-02-08                0
1  16.354220 2022-02-09                1
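Here is a runnable sketch of those steps. The structure of data is an assumption inferred from the column names used above, the prediction_interval values are made up, and the SaleDate values are the millisecond timestamps that correspond to the dates in the output.

```python
import pandas as pd

# Assumed input shape; "index" holds one dict per row.
data = {
    "Results": {
        "forecast": [2.163242, 16.354220],
        "prediction_interval": ["[0.98, 3.34]", "[15.17, 17.53]"],
        "index": [
            {"SaleDate": 1644278400000, "OfferingGroupId": 0},
            {"SaleDate": 1644364800000, "OfferingGroupId": 1},
        ],
    }
}

df = pd.DataFrame(data["Results"])

# Expand the dicts in "index" into their own columns and join them back,
# then drop the columns that are no longer needed.
expanded = pd.DataFrame(df["index"].tolist())
df = df.join(expanded).drop(columns=["prediction_interval", "index"])

# Epoch milliseconds -> datetime.
df["SaleDate"] = pd.to_datetime(df["SaleDate"], unit="ms")
print(df)
```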