How to normalize json correctly by Python Pandas
You can pass data directly, without any extra parameters. In pandas 1.0 and later, use pd.json_normalize; pd.io.json.json_normalize is deprecated.
df = pd.json_normalize(data)
df
complete mid.c mid.h mid.l mid.o time volume
0 True 119.743 119.891 119.249 119.341 1488319200.000000000 14651
1 True 119.893 119.954 119.552 119.738 1488348000.000000000 10738
2 True 119.946 120.221 119.840 119.888 1488376800.000000000 10041
If you want to change the column order, use df.reindex:
df = df.reindex(columns=['time', 'volume', 'complete', 'mid.h', 'mid.l', 'mid.c', 'mid.o'])
df
time volume complete mid.h mid.l mid.c mid.o
0 1488319200.000000000 14651 True 119.891 119.249 119.743 119.341
1 1488348000.000000000 10738 True 119.954 119.552 119.893 119.738
2 1488376800.000000000 10041 True 120.221 119.840 119.946 119.888
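The dotted column names such as mid.c come from json_normalize flattening nested dictionaries. Here is a minimal sketch, assuming each record has a nested "mid" dict; the input structure is an assumption, with values copied from the first two output rows:

```python
import pandas as pd

# Assumed input shape: each record has a nested "mid" dict.
data = [
    {"complete": True,
     "mid": {"c": "119.743", "h": "119.891", "l": "119.249", "o": "119.341"},
     "time": "1488319200.000000000", "volume": 14651},
    {"complete": True,
     "mid": {"c": "119.893", "h": "119.954", "l": "119.552", "o": "119.738"},
     "time": "1488348000.000000000", "volume": 10738},
]

# Nested keys are joined to their parent key with "." by default.
df = pd.json_normalize(data)

# reindex(columns=...) returns the columns in exactly the requested order.
order = ["time", "volume", "complete", "mid.h", "mid.l", "mid.c", "mid.o"]
df = df.reindex(columns=order)
print(df)
```

Note that reindex silently creates an all-NaN column for any name that does not exist, so a typo in the list will not raise an error.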
Pandas: JSON Normalize with brackets around the JSON?
If your JSON objects are in the xd column, you can extract them. Each cell holds a list whose first element is a dictionary, so taking that first element from every row gives a list of dictionaries, which can be passed directly to the DataFrame constructor.
list_of_dicts = list(map(lambda l: l[0], df['xd'].to_list()))
expected = pd.DataFrame(list_of_dicts)
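As a small illustration, here is the same idea on a made-up frame. The column contents are hypothetical, and the sketch assumes every cell of xd is a one-element list:

```python
import pandas as pd

# Hypothetical frame: every cell of "xd" is a one-element list holding a dict.
df = pd.DataFrame({"xd": [[{"a": 1, "b": "x"}], [{"a": 2, "b": "y"}]]})

# Take the first (only) element of each list, then build a frame from the dicts.
list_of_dicts = [cell[0] for cell in df["xd"]]
expected = pd.DataFrame(list_of_dicts)
print(expected)
```

If some cells could be empty lists, cell[0] would raise an IndexError, so the rows would need filtering first.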
Does this answer your question?
How to read and normalize following json in pandas?
Here is another way:
df = pd.read_json(r'C:\path\file.json')
final=df.stack().str[0].unstack()
final=final.assign(cities=final['cities'].str.split(',')).explode('cities')
final=final.assign(**pd.DataFrame(final.pop('user').str[0].tolist()))
print(final)
session_id unix_timestamp cities user_id joining_date \
0 X061RFWB06K9V 1442503708 New York NY 2024 2015-03-22
0 X061RFWB06K9V 1442503708 Newark NJ 2024 2015-03-22
1 5AZ2X2A9BHH5U 1441353991 New York NY 2024 2015-03-22
1 5AZ2X2A9BHH5U 1441353991 Jersey City NJ 2024 2015-03-22
1 5AZ2X2A9BHH5U 1441353991 Philadelphia PA 2024 2015-03-22
country
0 UK
0 UK
1 UK
1 UK
1 UK
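The split-then-explode step is what turns one row per session into one row per city. A minimal sketch on made-up data (the column values and the ", " separator are assumptions):

```python
import pandas as pd

# Hypothetical data: one comma-separated string of cities per session.
df = pd.DataFrame({
    "session_id": ["s1", "s2"],
    "cities": ["New York NY, Newark NJ", "Jersey City NJ"],
})

# str.split turns each string into a list; explode gives each list item its own row.
out = df.assign(cities=df["cities"].str.split(", ")).explode("cities")
print(out)
```

The original index label is repeated for every exploded row, which is why the output above shows 0, 0, 1, 1, 1.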
Normalizing nested JSON object into Pandas dataframe
Personally, I would not use pd.json_normalize for this case. Your JSON is quite complex, and unless you're really experienced with json_normalize, the following code may take less time to understand for the average dev. In fact, you don't even need to see the JSON to understand exactly what this code does (although it would certainly help ;).
First, we can extract the objects (portfolios and their children) from the JSON into a list, and use a series of steps to get them in the right form and order:
def prep_obj(o):
    """Prepares an object (portfolio/child) from the JSON to be inserted into a dataframe."""
    return {
        'New Entity Group': o['name'],
    } | o['columns']
# Get a list of lists, where each sub-list contains the portfolio object at index 0 and then the portfolio object's children:
groups = [[prep_obj(o), *[prep_obj(child) for child in o['children']]] for o in api_response['data']['attributes']['total']['children']]
# Sort the portfolio groups by their number:
groups.sort(key=lambda g: int(g[0]['New Entity Group'].split('_')[1]))
# Reverse the children of each portfolio group:
groups = [[g[0]] + g[1:][::-1] for g in groups]
# Flatten out the groups into one large list of objects:
objects = [obj for group in groups for obj in group]
# The above is exactly equivalent to the following:
# objects = []
# for group in groups:
#     for obj in group:
#         objects.append(obj)
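The | operator used in prep_obj merges two dicts and needs Python 3.9 or newer. On older interpreters, dict unpacking gives the same result. A small sketch, where prep_obj_compat and the keys under 'columns' are made up for illustration:

```python
def prep_obj_compat(o):
    """Same result as {'New Entity Group': o['name']} | o['columns'], without the | operator."""
    return {"New Entity Group": o["name"], **o["columns"]}

# Hypothetical object with the same two keys that prep_obj reads.
sample = {"name": "Portfolio_1", "columns": {"value": 260786, "twr": -0.4455}}
row = prep_obj_compat(sample)
print(row)
```

In both forms, a key in o['columns'] with the same name as an earlier key would overwrite it.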
Next, create the dataframe:
# Create a mapping for column names so that their display names can be used:
mapping = {col['key']: col['display_name'] for col in api_response['meta']['columns']}
# Create a dataframe from the list of objects:
df = pd.DataFrame(objects)
# Correct column names:
df = df.rename(mapping, axis=1)
# Reorder columns:
column_names = ["New Entity Group", "Entity ID", "Adjusted Value (1/31/2022, No Div, USD)", "Adjusted TWR (Current Quarter, No Div, USD)", "Adjusted TWR (YTD, No Div, USD)", "Annualized Adjusted TWR (Since Inception, No Div, USD)", "Inception Date", "Risk Target"]
df = df[column_names]
And formatting:
def format_twr_col(col):
    return (
        col
        .abs()
        .mul(100)
        .round(2)
        .pipe(lambda s: s.where(s.eq(0) | s.isna(), '(' + s.astype(str) + '%)'))
        .pipe(lambda s: s.where(s.ne(0) | s.isna(), s.astype(str) + '%'))
        .fillna('-')
    )
def format_value_col(col):
    positive_mask = col.ge(0)
    col[positive_mask] = (
        col[positive_mask]
        .round()
        .astype(int)
        .map('${:,}'.format)
    )
    col[~positive_mask] = (
        col[~positive_mask]
        .astype(float)
        .round()
        .astype(int)
        .abs()
        .map('(${:,})'.format)
    )
    return col
df['Adjusted TWR (Current Quarter, No Div, USD)'] = format_twr_col(df['Adjusted TWR (Current Quarter, No Div, USD)'])
df['Annualized Adjusted TWR (Since Inception, No Div, USD)'] = format_twr_col(df['Annualized Adjusted TWR (Since Inception, No Div, USD)'])
df['Adjusted TWR (YTD, No Div, USD)'] = format_twr_col(df['Adjusted TWR (YTD, No Div, USD)'])
df['Adjusted Value (1/31/2022, No Div, USD)'] = format_value_col(df['Adjusted Value (1/31/2022, No Div, USD)'].copy())
df['Inception Date'] = pd.to_datetime(df['Inception Date']).dt.strftime('%b %d, %Y')
df['Entity ID'] = df['Entity ID'].fillna('')
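An alternative to the masked assignments in format_value_col is to format one value at a time and apply that with Series.map. This is only a sketch: format_value is a hypothetical helper, and mapping missing values to '-' is an assumption that the code above does not make.

```python
import pandas as pd

def format_value(v):
    """Format a number as whole dollars, with negatives in parentheses."""
    if pd.isna(v):
        return "-"  # assumption: show missing values as a dash
    text = "${:,}".format(abs(int(round(v))))
    return f"({text})" if v < 0 else text

values = pd.Series([260786.4, -1.2, 18396664.0])
formatted = values.map(format_value)
print(formatted.tolist())
```

Because map builds a new Series, the numeric column is never partially overwritten with strings.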
And... voilà:
>>> pd.options.display.max_columns = None
>>> df
New Entity Group Entity ID Adjusted Value (1/31/2022, No Div, USD) Adjusted TWR (Current Quarter, No Div, USD) Adjusted TWR (YTD, No Div, USD) Annualized Adjusted TWR (Since Inception, No Div, USD) Inception Date Risk Target
0 Portfolio_1 $260,786 (44.55%) (44.55%) (44.55%) Apr 07, 2021 N/A
1 The FW Irrev Family Tr 9552252 $260,786 0.0% 0.0% 0.0% Jan 11, 2022 N/A
2 Portfolio_2 $18,396,664 (5.78%) (5.78%) (5.47%) Sep 03, 2021 Growth
3 FW DAF 10946585 $18,396,664 (5.78%) (5.78%) (5.47%) Sep 03, 2021 Growth
4 Portfolio_3 $60,143,818 (4.42%) (4.42%) (7.75%) Dec 17, 2020 NaN
5 The FW Family Trust 13014080 $475,356 (6.1%) (6.1%) (3.97%) Apr 09, 2021 Aggressive
6 FW Liquid Fund LP 13396796 $52,899,527 (4.15%) (4.15%) (4.15%) Dec 30, 2021 Aggressive
7 FW Holdings No. 2 LLC 8413655 $6,768,937 (0.77%) (0.77%) (11.84%) Mar 05, 2021 N/A
8 FW and FR Joint 9957007 ($1) - - - Dec 21, 2021 N/A
How to normalize a nested json with json_normalize
- Use pandas.json_normalize()
- The following code uses pandas v.1.2.4
- If you don't want the other columns, remove the list of keys assigned to meta
- Use pandas.DataFrame.drop to remove any other unwanted columns from df.
import pandas as pd
df = pd.json_normalize(data, record_path=['results', 'docs'], meta=[['results', 'name'], 'numberOfResults'])
display(df)
id type category media label title subtitle results.name numberOfResults
0 RAKDI342342 Culture Culture unknown exampellabel testtitle and titletest Archive single 376
1 GUI6N5QHBPTO6GJ66VP5OXB7GKX6J7ER Culture Culture image more label als example test the second title picture single 376
Data
- The posted JSON / Dict is not correctly formed
- Assuming the following corrected form
data = \
{'numberOfResults': 376,
'results': [{'docs': [{'category': 'Culture',
'id': 'RAKDI342342',
'label': 'exampellabel',
'media': 'unknown',
'subtitle': 'Archive',
'title': 'testtitle and titletest',
'type': 'Culture'},
{'category': 'Culture',
'id': 'GUI6N5QHBPTO6GJ66VP5OXB7GKX6J7ER',
'label': 'more label als example',
'media': 'image',
'subtitle': 'picture',
'title': 'test the second title',
'type': 'Culture'}],
'name': 'single'}]}
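The two options from the list above (leaving out meta, or dropping columns afterwards) can be sketched as follows. The data here is a trimmed-down copy of the corrected dict with fewer keys per document, and dropping 'media' is just an example choice:

```python
import pandas as pd

# Trimmed-down copy of the corrected data (fewer keys per document).
data = {
    "numberOfResults": 376,
    "results": [{
        "name": "single",
        "docs": [
            {"id": "RAKDI342342", "type": "Culture", "media": "unknown"},
            {"id": "GUI6N5QHBPTO6GJ66VP5OXB7GKX6J7ER", "type": "Culture", "media": "image"},
        ],
    }],
}

# Without meta, only the keys of each record under results -> docs become columns.
docs_only = pd.json_normalize(data, record_path=["results", "docs"])

# With meta, then dropping a column that is not needed afterwards.
with_meta = pd.json_normalize(
    data,
    record_path=["results", "docs"],
    meta=[["results", "name"], "numberOfResults"],
).drop(columns=["media"])
print(with_meta)
```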
Pandas JSON Normalize - Choose Correct Record Path
You can try to apply the following function to your json:
def flatten_nested_json_df(df):
    # Note: DataFrame.applymap is deprecated since pandas 2.1; DataFrame.map is its replacement there.
    df = df.reset_index()
    # Columns in which every value is a list:
    s = (df.applymap(type) == list).all()
    list_columns = s[s].index.tolist()
    # Columns in which every value is a dict:
    s = (df.applymap(type) == dict).all()
    dict_columns = s[s].index.tolist()
    # Keep flattening until no list or dict columns remain:
    while len(list_columns) > 0 or len(dict_columns) > 0:
        new_columns = []
        for col in dict_columns:
            horiz_exploded = pd.json_normalize(df[col]).add_prefix(f'{col}.')
            horiz_exploded.index = df.index
            df = pd.concat([df, horiz_exploded], axis=1).drop(columns=[col])
            new_columns.extend(horiz_exploded.columns) # inplace
        for col in list_columns:
            #print(f"exploding: {col}")
            df = df.drop(columns=[col]).join(df[col].explode().to_frame())
            new_columns.append(col)
        s = (df[new_columns].applymap(type) == list).all()
        list_columns = s[s].index.tolist()
        s = (df[new_columns].applymap(type) == dict).all()
        dict_columns = s[s].index.tolist()
    return df
Call it like this:
df1 = flatten_nested_json_df(df)
where df is the result of an initial normalization:
df = pd.json_normalize(json)
That should give you all the information contained in your JSON.
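To make one pass of that loop concrete, here is a hand-written version on a tiny made-up record: explode the list column, then spread the resulting dicts into columns. The record and its key names are hypothetical.

```python
import pandas as pd

# Hypothetical nested record: a list of dicts under "items".
raw = [{"id": 1, "items": [{"sku": "a", "qty": 2}, {"sku": "b", "qty": 1}]}]

df = pd.json_normalize(raw)                       # "items" is still a column of lists
df = df.explode("items").reset_index(drop=True)   # one row per list element (now dicts)

# Spread the dicts into prefixed columns and attach them side by side.
items = pd.json_normalize(df["items"].tolist()).add_prefix("items.")
flat = pd.concat([df.drop(columns=["items"]), items], axis=1)
print(flat)
```

The function above repeats these two moves until no column consists entirely of lists or dicts.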
How do I unpack multiple levels using json_normalize in python pandas?
Fix your dictionary first. It's not consistent; this makes it consistent:
for i, x in enumerate(data):
    x = x['Source'][0]['Movies']
    if not isinstance(x, list):
        data[i]['Source'][0]['Movies'] = [x]
Then json_normalize works just fine:
df = pd.json_normalize(data, ['Source','Movies'], ['Name', 'Year', 'Location'])
print(df)
Output:
MovieNumber Money Percent Name Year Location
0 1 1000 10 Rocco 2020 Itay
1 1 2000 10 Anja 2021 Germany
2 2 3000 10 Anja 2021 Germany
3 1 1000 10 Kasia 2021 Poland
4 2 1000 10 Kasia 2021 Poland
5 3 1000 10 Kasia 2021 Poland
Here is what my code actually did. Before:
[
    {
        "Name": "Rocco",
        "Year": 2020,
        "Location": "Itay",
        "Source": [
            {
                "Movies":  # Here, Movies isn't a list.
                    {"MovieNumber": 1, "Money": 1000, "Percent": 10}
            }
        ]
    },
    {
        "Name": "Anja",
        "Year": 2021,
        "Location": "Germany",
        "Source": [
            {
                "Movies": [  # Here, Movies is a list.
                    {"MovieNumber": 1, "Money": 2000, "Percent": 10},
                    {"MovieNumber": 2, "Money": 3000, "Percent": 10}
                ]
            }
        ]
    }
]
After:
[
    {
        "Name": "Rocco",
        "Year": 2020,
        "Location": "Itay",
        "Source": [
            {
                "Movies": [  # Now this is a list.
                    {"MovieNumber": 1, "Money": 1000, "Percent": 10}
                ]
            }
        ]
    },
    {
        "Name": "Anja",
        "Year": 2021,
        "Location": "Germany",
        "Source": [
            {
                "Movies": [  # And this remains unchanged.
                    {"MovieNumber": 1, "Money": 2000, "Percent": 10},
                    {"MovieNumber": 2, "Money": 3000, "Percent": 10}
                ]
            }
        ]
    }
]
So all I did was force all Source.Movies to be lists, by putting the contents in a list if it wasn't already a list.
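The same fix can be written with a small helper. In this sketch, as_list is a hypothetical name and the data is a shortened version of the records above. Unlike the loop shown earlier, it walks every entry in Source rather than only index 0.

```python
import pandas as pd

def as_list(value):
    """Return value unchanged if it is a list, otherwise wrap it in one."""
    return value if isinstance(value, list) else [value]

# Shortened version of the records above (fewer keys).
data = [
    {"Name": "Rocco", "Source": [{"Movies": {"MovieNumber": 1, "Money": 1000}}]},
    {"Name": "Anja", "Source": [{"Movies": [{"MovieNumber": 1, "Money": 2000},
                                             {"MovieNumber": 2, "Money": 3000}]}]},
]

for record in data:
    for source in record["Source"]:   # handles more than one Source entry
        source["Movies"] = as_list(source["Movies"])

df = pd.json_normalize(data, ["Source", "Movies"], ["Name"])
print(df)
```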
Python pandas normalize this Json into pandas
Use:
gateio = pd.json_normalize(e)
gateio.columns = gateio.columns.str.split('.', expand=True)
df = gateio.rename_axis(('symbol', None), axis=1).stack(0).droplevel(0).reset_index()
print(df)
symbol baseVolume high24hr highestBid \
0 100x_usdt 0 0
1 10set_eth 0 0
2 10set_usdt 78055.955772115 2.334 2.3189
3 1art_usdt 84629.671759612 0.020476 0.020051
4 1earth_eth 0 0
... ... ... ...
3023 zrx_usd 378.6665316 0.3075 0.3036
3024 zrx_usdt 21064.601829316 0.3074 0.3038
3025 zsc_eth 6.5764445243 0.00000006666 0.00000005859
3026 zsc_usdt 12105.551030017 0.000099271 0.00009592
3027 ztg_usdt 17735.456307939 0.10993 0.0993
last low24hr lowestAsk percentChange \
0 0.00000001677 0 0
1 0 0 0
2 2.3258 2.25 2.3315 0.54
3 0.020139 0.019922 0.020318 -0.62
4 0 0 0
... ... ... ...
3023 0.3053 0.2919 0.3048 4.05
3024 0.3046 0.2923 0.3043 4.35
3025 0.00000006116 0.00000005942 0.00000006438 -7.91
3026 0.000098951 0.000095918 0.000101036 2.53
3027 0.09977 0.09929 0.1003 -7.96
quoteVolume result
0 0 true
1 0 true
2 34176.76678812 true
3 4186530.9550705 true
4 0 true
... ...
3023 1250.925 true
3024 69748.810196325 true
3025 105661371 true
3026 125394404.8585 true
3027 169037.51711601 true
[3028 rows x 10 columns]
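Another idea is to create the DataFrame with the constructor and then pivot it. pivot is called with keyword arguments here because pandas 2.0 and later no longer accept them positionally:
import requests
gateio = requests.get("https://data.gateapi.io/api2/1/tickers")
e = gateio.json()
df = pd.DataFrame([(k, k1, v1) for k, v in e.items() for k1, v1 in v.items()]).pivot(index=0, columns=1, values=2)
print(df)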
1 baseVolume high24hr highestBid last \
0
100x_usdt 0 0 0.00000001677
10set_eth 0 0 0
10set_usdt 77135.369425029 2.334 2.3189 2.324
1art_usdt 85135.129113461 0.020476 0.020073 0.020231
1earth_eth 0 0 0
... ... ... ...
zrx_usd 378.7539874 0.3075 0.3031 0.3036
zrx_usdt 20969.605384316 0.3074 0.3034 0.3048
zsc_eth 6.54257544205 0.00000006666 0.00000005891 0.00000006175
zsc_usdt 12071.777701317 0.000099271 0.00009592 0.00009804
ztg_usdt 17614.164813459 0.10918 0.0993 0.0998
1 low24hr lowestAsk percentChange quoteVolume result
0
100x_usdt 0 0 0 true
10set_eth 0 0 0 true
10set_usdt 2.25 2.3303 0.31 33779.242174485 true
1art_usdt 0.019922 0.02037 0.32 4211596.8280705 true
1earth_eth 0 0 0 true
... ... ... ... ...
zrx_usd 0.2919 0.3046 3.47 1251.201 true
zrx_usdt 0.2923 0.3041 4.27 69423.160196325 true
zsc_eth 0.00000005942 0.00000006479 -7.18 105182158 true
zsc_usdt 0.000095918 0.000100982 1.6 125041663.4785 true
ztg_usdt 0.09929 0.1002 -8.8 167942.13011601 true
[3028 rows x 9 columns]
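For a dict of dicts like this one, DataFrame.from_dict with orient="index" is another option that the answers above do not use. A sketch on a made-up two-symbol excerpt; the values are strings here by assumption, so the numeric columns are converted explicitly:

```python
import pandas as pd

# Hypothetical excerpt with the same shape as the ticker response:
# {symbol: {field: value, ...}, ...}
e = {
    "zrx_usdt": {"last": "0.3046", "percentChange": "4.35"},
    "ztg_usdt": {"last": "0.09977", "percentChange": "-7.96"},
}

# Outer keys become the index; rename_axis + reset_index turns them into a column.
df = pd.DataFrame.from_dict(e, orient="index").rename_axis("symbol").reset_index()

# Convert the string values in the numeric columns to numbers.
df[["last", "percentChange"]] = df[["last", "percentChange"]].apply(pd.to_numeric)
print(df)
```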
How to normalize a nested JSON key into a pandas dataframe
- The 'results' key is a 1-element list, so 'members' can be normalized by selecting the 'members' key from the dict at index 0.
import pandas as pd
import requests
# Requesting data through the API
payload = {'X-API-Key': '...'}
terms = '"trade war"AND"China"'
index = str(0) # 440 is last offset for this call
response = requests.get('https://api.propublica.org/congress/v1/116/house/members.json', headers=payload)
# extract the json data from the response
json_data = response.json()
# normalize only members
members = pd.json_normalize(data=json_data['results'][0]['members'])
# alternatively: normalize members and the preceding keys
members = pd.json_normalize(data=json_data['results'][0], record_path=['members'], meta=['congress', 'chamber', 'num_results', 'offset'])
display(members)
id title short_title api_uri first_name middle_name last_name suffix date_of_birth gender party leadership_role twitter_account facebook_account youtube_account govtrack_id cspan_id votesmart_id icpsr_id crp_id google_entity_id fec_candidate_id url rss_url contact_form in_office cook_pvi dw_nominate ideal_point seniority next_election total_votes missed_votes total_present last_updated ocd_id office phone fax state district at_large geoid missed_votes_pct votes_with_party_pct votes_against_party_pct
0 A000374 Representative Rep. https://api.propublica.org/congress/v1/members/A000374.json Ralph None Abraham None 1954-09-16 M R RepAbraham CongressmanRalphAbraham None 412630 76236 155414 21522 N00036633 /m/012dwd7_ H4LA05221 https://abraham.house.gov https://abraham.house.gov/rss.xml None False R+15 0.541 None 6 2020 954.0 377.0 0.0 2020-12-31 18:30:50 -0500 ocd-division/country:us/state:la/cd:5 417 Cannon House Office Building 202-225-8490 None LA 5 False 2205 39.52 94.93 4.90
1 A000370 Representative Rep. https://api.propublica.org/congress/v1/members/A000370.json Alma None Adams None 1946-05-27 F D None RepAdams CongresswomanAdams None 412607 76386 5935 21545 N00035451 /m/02b45d H4NC12100 https://adams.house.gov https://adams.house.gov/rss.xml None False D+18 -0.465 None 8 2020 954.0 26.0 0.0 2020-12-31 18:30:55 -0500 ocd-division/country:us/state:nc/cd:12 2436 Rayburn House Office Building 202-225-1510 None NC 12 False 3712 2.73 99.24 0.65
2 A000055 Representative Rep. https://api.propublica.org/congress/v1/members/A000055.json Robert B. Aderholt None 1965-07-22 M R None Robert_Aderholt RobertAderholt RobertAderholt 400004 45516 441 29701 N00003028 /m/024p03 H6AL04098 https://aderholt.house.gov https://aderholt.house.gov/rss.xml None False R+30 0.369 None 24 2020 954.0 71.0 0.0 2020-12-31 18:30:49 -0500 ocd-division/country:us/state:al/cd:4 1203 Longworth House Office Building 202-225-4876 None AL 4 False 0104 7.44 93.60 6.29
3 A000371 Representative Rep. https://api.propublica.org/congress/v1/members/A000371.json Pete None Aguilar None 1979-06-19 M D None reppeteaguilar reppeteaguilar None 412615 79994 70114 21506 N00033997 /m/0jwv0xf H2CA31125 https://aguilar.house.gov https://aguilar.house.gov/rss.xml None False D+8 -0.291 None 6 2020 954.0 9.0 0.0 2020-12-31 18:30:52 -0500 ocd-division/country:us/state:ca/cd:31 109 Cannon House Office Building 202-225-3201 None CA 31 False 0631 0.94 97.45 2.44
4 A000372 Representative Rep. https://api.propublica.org/congress/v1/members/A000372.json Rick None Allen None 1951-11-07 M R None reprickallen CongressmanRickAllen None 412625 62545 136062 21516 N00033720 /m/0127y9dk H2GA12121 https://allen.house.gov None None False R+9 0.679 None 6 2020 954.0 15.0 0.0 2020-12-31 18:30:49 -0500 ocd-division/country:us/state:ga/cd:12 2400 Rayburn House Office Building 202-225-2823 None GA 12 False 1312 1.57 92.26 7.63
5 A000376 Representative Rep. https://api.propublica.org/congress/v1/members/A000376.json Colin None Allred None 1983-04-15 M D None RepColinAllred None None 412828 None 177357 None N00040989 /m/03d066b H8TX32098 https://allred.house.gov None None False R+5 NaN None 2 2020 954.0 29.0 0.0 2020-12-31 18:30:52 -0500 ocd-division/country:us/state:tx/cd:32 328 Cannon House Office Building 202-225-2231 None TX 32 False 4832 3.04 97.72 2.17
6 A000367 Representative Rep. https://api.propublica.org/congress/v1/members/A000367.json Justin None Amash None 1980-04-18 M I justinamash repjustinamash repjustinamash 412438 1033767 105566 21143 N00031938 /m/0c00p_n https://amash.house.gov https://amash.house.gov/rss.xml None False R+6 NaN None 10 2020 524.0 0.0 10.0 2020-12-31 18:30:47 -0500 ocd-division/country:us/state:mi/cd:3 None None None MI 3 False 2603 0.00 58.49 41.51
7 A000367 Representative Rep. https://api.propublica.org/congress/v1/members/A000367.json Justin None Amash None 1980-04-18 M R justinamash repjustinamash repjustinamash 412438 1033767 105566 21143 N00031938 /m/0c00p_n H0MI03126 https://amash.house.gov https://amash.house.gov/rss.xml None False None 0.654 None 10 2020 430.0 0.0 5.0 2020-12-28 21:04:36 -0500 ocd-division/country:us/state:mi/cd:3 106 Cannon House Office Building 202-225-3831 None MI 3 False 2603 0.00 61.97 37.79
8 A000369 Representative Rep. https://api.propublica.org/congress/v1/members/A000369.json Mark None Amodei None 1958-06-12 M R None MarkAmodeiNV2 MarkAmodeiNV2 markamodeinv2 412500 62817 12537 21196 N00031177 /m/03bzdkn H2NV02395 https://amodei.house.gov https://amodei.house.gov/rss/news-releases.xml None False R+7 0.384 None 10 2020 954.0 36.0 0.0 2020-12-31 18:30:49 -0500 ocd-division/country:us/state:nv/cd:2 104 Cannon House Office Building 202-225-6155 None NV 2 False 3202 3.77 92.63 7.26
9 A000377 Representative Rep. https://api.propublica.org/congress/v1/members/A000377.json Kelly None Armstrong None 1976-10-08 M R None RepArmstrongND None None 412794 None 139338 None N00042868 /g/11hcszksh3 H8ND00096 https://armstrong.house.gov None None False R+16 NaN None 2 2020 954.0 33.0 0.0 2020-12-31 18:30:49 -0500 ocd-division/country:us/state:nd/cd:1 1004 Longworth House Office Building 202-225-2611 None ND At-Large True 3800 3.46 93.31 6.58
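The two calls differ only in whether the keys that sit next to 'members' are carried along. A sketch that needs no API key, using a made-up and heavily trimmed payload with the same nesting (all values below are illustrative):

```python
import pandas as pd

# Hypothetical, heavily trimmed payload with the same nesting as the response.
json_data = {
    "status": "OK",
    "results": [{
        "congress": "116",
        "chamber": "House",
        "members": [
            {"id": "A000374", "last_name": "Abraham", "state": "LA"},
            {"id": "A000370", "last_name": "Adams", "state": "NC"},
        ],
    }],
}

first = json_data["results"][0]   # 'results' is a one-element list

# Only the member records:
members_only = pd.json_normalize(first["members"])

# Member records plus the sibling keys, repeated on every row:
members = pd.json_normalize(first, record_path=["members"], meta=["congress", "chamber"])
print(members)
```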