How to Normalize JSON Correctly by Python Pandas

How to normalize json correctly by Python Pandas

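You can pass the data as-is, without any extra parameters. Note that pd.io.json.json_normalize has been deprecated since pandas 1.0; use the top-level pd.json_normalize instead.

df = pd.json_normalize(data)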
df

   complete    mid.c    mid.h    mid.l    mid.o                  time  volume
0      True  119.743  119.891  119.249  119.341  1488319200.000000000   14651
1      True  119.893  119.954  119.552  119.738  1488348000.000000000   10738
2      True  119.946  120.221  119.840  119.888  1488376800.000000000   10041

If you want to change the column order, use df.reindex:

df = df.reindex(columns=['time', 'volume', 'complete', 'mid.h', 'mid.l', 'mid.c', 'mid.o'])
df

                   time  volume  complete    mid.h    mid.l    mid.c    mid.o
0  1488319200.000000000   14651      True  119.891  119.249  119.743  119.341
1  1488348000.000000000   10738      True  119.954  119.552  119.893  119.738
2  1488376800.000000000   10041      True  120.221  119.840  119.946  119.888
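As a minimal sketch of what happens here, assuming records shaped like the output above (the input data is hypothetical, since the original JSON is not shown): nested keys under "mid" are flattened into dotted column names, and the sep parameter changes the separator.

```python
import pandas as pd

# Hypothetical records; the real input is not shown in the question.
data = [
    {"complete": True, "mid": {"c": 119.743, "h": 119.891, "l": 119.249, "o": 119.341},
     "time": "1488319200.000000000", "volume": 14651},
    {"complete": True, "mid": {"c": 119.893, "h": 119.954, "l": 119.552, "o": 119.738},
     "time": "1488348000.000000000", "volume": 10738},
]

# Nested "mid" keys become "mid.c", "mid.h", "mid.l", "mid.o".
df = pd.json_normalize(data)

# Same flattening, but nested names are joined with "_" instead of ".".
df_sep = pd.json_normalize(data, sep="_")

print(sorted(df.columns))
print(sorted(df_sep.columns))
```

The column order can differ between pandas versions, which is one more reason to reorder explicitly with reindex.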

Pandas: JSON Normalize with brackets around the JSON?

If your JSON objects are under the xd column, you can extract them into a list of dictionaries, which can then be used to create a DataFrame directly.

# Take the first element of each bracketed cell in df['xd'].
list_of_dicts = [cell[0] for cell in df['xd'].to_list()]
expected = pd.DataFrame(list_of_dicts)

Does this answer your question?
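A small self-contained sketch of the same idea, assuming each cell of xd is a one-element list wrapping a dict (the sample values are made up). It uses the .str[0] accessor to unwrap the brackets and json_normalize to build the frame, which would also flatten nested keys if there were any.

```python
import pandas as pd

# Hypothetical frame: every cell of "xd" is a one-element list holding a dict.
df = pd.DataFrame({"xd": [[{"a": 1, "b": "x"}], [{"a": 2, "b": "y"}]]})

# .str[0] indexes into each list element-wise, removing the outer brackets.
unwrapped = df["xd"].str[0]

# Build a DataFrame from the resulting list of dicts.
expected = pd.json_normalize(unwrapped.tolist())
print(expected)
```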

How to read and normalize following json in pandas?

Here is another way:

df = pd.read_json(r'C:\path\file.json')

final=df.stack().str[0].unstack()
final=final.assign(cities=final['cities'].str.split(',')).explode('cities')
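# Reuse final's index (duplicated by explode) so the user columns align row by row instead of by label.
final=final.assign(**pd.DataFrame(final.pop('user').str[0].tolist(), index=final.index))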
print(final)

      session_id unix_timestamp            cities  user_id joining_date  \
0  X061RFWB06K9V     1442503708       New York NY     2024   2015-03-22   
0  X061RFWB06K9V     1442503708         Newark NJ     2024   2015-03-22   
1  5AZ2X2A9BHH5U     1441353991       New York NY     2024   2015-03-22   
1  5AZ2X2A9BHH5U     1441353991    Jersey City NJ     2024   2015-03-22   
1  5AZ2X2A9BHH5U     1441353991   Philadelphia PA     2024   2015-03-22   

  country  
0      UK  
0      UK  
1      UK  
1      UK  
1      UK

Normalizing nested JSON object into Pandas dataframe

Personally, I would not use pd.json_normalize for this case. Your JSON is quite complex, and unless you're really experienced with json_normalize, the following code may take less time to understand for the average dev. In fact, you don't even need to see the JSON to understand exactly what this code does (although it would certainly help ;).

First, we can extract the objects (portfolios and their children) from the JSON into a list, and use a series of steps to get them in the right form and order:

def prep_obj(o):
    """Prepares an object (portfolio/child) from the JSON to be inserted into a dataframe."""
    return {
        'New Entity Group': o['name'],
    } | o['columns']

# Get a list of lists, where each sub-list contains the portfolio object at index 0 and then the portfolio object's children:
groups = [[prep_obj(o), *[prep_obj(child) for child in o['children']]] for o in api_response['data']['attributes']['total']['children']]

# Sort the portfolio groups by their number:
groups.sort(key=lambda g: int(g[0]['New Entity Group'].split('_')[1]))

# Reverse the children of each portfolio group:
groups = [[g[0]] + g[1:][::-1] for g in groups]

# Flatten out the groups into one large list of objects:
objects = [obj for group in groups for obj in group]
# The above is exactly equivalent to the following:
#   objects = []
#   for group in groups:
#       for obj in group:
#           objects.append(obj)

Next, create the dataframe:

# Create a mapping for column names so that their display names can be used:
mapping = {col['key']: col['display_name'] for col in api_response['meta']['columns']}

# Create a dataframe from the list of objects:
df = pd.DataFrame(objects)

# Correct column names:
df = df.rename(mapping, axis=1)
# Reorder columns:
column_names = ["New Entity Group", "Entity ID", "Adjusted Value (1/31/2022, No Div, USD)", "Adjusted TWR (Current Quarter, No Div, USD)", "Adjusted TWR (YTD, No Div, USD)", "Annualized Adjusted TWR (Since Inception, No Div, USD)", "Inception Date", "Risk Target"]
df = df[column_names]

And formatting:

def format_twr_col(col):
    return (
        col
        .abs()
        .mul(100)
        .round(2)
        .pipe(lambda s: s.where(s.eq(0) | s.isna(), '(' + s.astype(str) + '%)'))
        .pipe(lambda s: s.where(s.ne(0) | s.isna(), s.astype(str) + '%'))
        .fillna('-')
    )

def format_value_col(col):
    positive_mask = col.ge(0)

    col[positive_mask] = (
        col[positive_mask]
        .round()
        .astype(int)
        .map('${:,}'.format)
    )

    col[~positive_mask] = (
        col[~positive_mask]
        .astype(float)
        .round()
        .astype(int)
        .abs()
        .map('(${:,})'.format)
    )
    
    return col

df['Adjusted TWR (Current Quarter, No Div, USD)'] = format_twr_col(df['Adjusted TWR (Current Quarter, No Div, USD)'])
df['Annualized Adjusted TWR (Since Inception, No Div, USD)'] = format_twr_col(df['Annualized Adjusted TWR (Since Inception, No Div, USD)'])
df['Adjusted TWR (YTD, No Div, USD)'] = format_twr_col(df['Adjusted TWR (YTD, No Div, USD)'])

df['Adjusted Value (1/31/2022, No Div, USD)'] = format_value_col(df['Adjusted Value (1/31/2022, No Div, USD)'].copy())

df['Inception Date'] = pd.to_datetime(df['Inception Date']).dt.strftime('%b %d, %Y')

df['Entity ID'] = df['Entity ID'].fillna('')

And... voilà:

>>> pd.options.display.max_columns = None
>>> df
         New Entity Group Entity ID Adjusted Value (1/31/2022, No Div, USD)  Adjusted TWR (Current Quarter, No Div, USD) Adjusted TWR (YTD, No Div, USD)  Annualized Adjusted TWR (Since Inception, No Div, USD) Inception Date  Risk Target
0             Portfolio_1                                          $260,786                                     (44.55%)                        (44.55%)                                            (44.55%)       Apr 07, 2021          N/A
1  The FW Irrev Family Tr   9552252                                $260,786                                         0.0%                            0.0%                                                0.0%       Jan 11, 2022          N/A
2             Portfolio_2                                       $18,396,664                                      (5.78%)                         (5.78%)                                             (5.47%)       Sep 03, 2021       Growth
3                  FW DAF  10946585                             $18,396,664                                      (5.78%)                         (5.78%)                                             (5.47%)       Sep 03, 2021       Growth
4             Portfolio_3                                       $60,143,818                                      (4.42%)                         (4.42%)                                             (7.75%)       Dec 17, 2020          NaN
5     The FW Family Trust  13014080                                $475,356                                       (6.1%)                          (6.1%)                                             (3.97%)       Apr 09, 2021   Aggressive
6       FW Liquid Fund LP  13396796                             $52,899,527                                      (4.15%)                         (4.15%)                                             (4.15%)       Dec 30, 2021   Aggressive
7   FW Holdings No. 2 LLC   8413655                              $6,768,937                                      (0.77%)                         (0.77%)                                            (11.84%)       Mar 05, 2021          N/A
8         FW and FR Joint   9957007                                    ($1)                                            -                               -                                                   -       Dec 21, 2021          N/A

How to normalize a nested json with json_normalize

Use pandas.json_normalize()
The following code uses pandas v.1.2.4
If you don't want the other columns, remove the list of keys assigned to meta
Use pandas.DataFrame.drop to remove any other unwanted columns from df.

import pandas as pd

df = pd.json_normalize(data, record_path=['results', 'docs'], meta=[['results', 'name'], 'numberOfResults'])

display(df)
                                 id     type category    media                   label                    title subtitle results.name numberOfResults
0                       RAKDI342342  Culture  Culture  unknown            exampellabel  testtitle and titletest  Archive       single             376
1  GUI6N5QHBPTO6GJ66VP5OXB7GKX6J7ER  Culture  Culture    image  more label als example    test the second title  picture       single             376

Data

The posted JSON / dict is not correctly formed, so the following corrected form is assumed:

data = \
{'numberOfResults': 376,
 'results': [{'docs': [{'category': 'Culture',
                        'id': 'RAKDI342342',
                        'label': 'exampellabel',
                        'media': 'unknown',
                        'subtitle': 'Archive',
                        'title': 'testtitle and titletest',
                        'type': 'Culture'},
                       {'category': 'Culture',
                        'id': 'GUI6N5QHBPTO6GJ66VP5OXB7GKX6J7ER',
                        'label': 'more label als example',
                        'media': 'image',
                        'subtitle': 'picture',
                        'title': 'test the second title',
                        'type': 'Culture'}],
              'name': 'single'}]}
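To illustrate how record_path and meta interact, here is a sketch with a made-up payload that has two entries under results. The values are hypothetical; only the structure mirrors the data above. Each meta value is repeated for every record that came from the same parent.

```python
import pandas as pd

# Hypothetical payload with the same nesting as the corrected data above.
data = {
    "numberOfResults": 3,
    "results": [
        {"name": "single",
         "docs": [{"id": "a1", "type": "Culture"}, {"id": "a2", "type": "Culture"}]},
        {"name": "series",
         "docs": [{"id": "b1", "type": "Sport"}]},
    ],
}

df = pd.json_normalize(
    data,
    record_path=["results", "docs"],                 # one row per entry in each "docs" list
    meta=[["results", "name"], "numberOfResults"],   # parent values copied onto each row
)
print(df)
```

The nested meta path ["results", "name"] shows up as the column results.name, while the top-level key keeps its own name.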

Pandas JSON Normalize - Choose Correct Record Path

You can try to apply the following function to your json:

def flatten_nested_json_df(df):
    df = df.reset_index()
    s = (df.applymap(type) == list).all()
    list_columns = s[s].index.tolist()
    
    s = (df.applymap(type) == dict).all()
    dict_columns = s[s].index.tolist()

    
    while len(list_columns) > 0 or len(dict_columns) > 0:
        new_columns = []

        for col in dict_columns:
            horiz_exploded = pd.json_normalize(df[col]).add_prefix(f'{col}.')
            horiz_exploded.index = df.index
            df = pd.concat([df, horiz_exploded], axis=1).drop(columns=[col])
            new_columns.extend(horiz_exploded.columns) # inplace

        for col in list_columns:
            #print(f"exploding: {col}")
            df = df.drop(columns=[col]).join(df[col].explode().to_frame())
            new_columns.append(col)

        s = (df[new_columns].applymap(type) == list).all()
        list_columns = s[s].index.tolist()

        s = (df[new_columns].applymap(type) == dict).all()
        dict_columns = s[s].index.tolist()
    return df

by doing this:

df1= flatten_nested_json_df(df)

where

df = pd.json_normalize(json)

That should give you all the information contained in your json.
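The function above repeats two basic steps until nothing nested is left: explode for list columns and json_normalize for dict columns. A single pass of each, on a tiny made-up input, might look like this (a sketch of the idea, not a replacement for the function):

```python
import pandas as pd

# Hypothetical input: "tags" is a list of dicts inside each record.
raw = [
    {"id": 1, "tags": [{"k": "a"}, {"k": "b"}]},
    {"id": 2, "tags": [{"k": "c"}]},
]

df = pd.json_normalize(raw)                     # "tags" is still a column of lists
df = df.explode("tags", ignore_index=True)      # one row per list element (now dicts)

# Expand the dict column horizontally and prefix the new columns with the old name.
tags = pd.json_normalize(df["tags"].tolist()).add_prefix("tags.")
flat = pd.concat([df.drop(columns="tags"), tags], axis=1)
print(flat)
```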

How do I unpack multiple levels using json_normalize in python pandas?

Fix your dictionary first: its structure is not consistent. The following loop makes it consistent:

for i, x in enumerate(data):
    x = x['Source'][0]['Movies']
    if not isinstance(x, list):
        data[i]['Source'][0]['Movies'] = [x]

Then json_normalize works just fine:

df = pd.json_normalize(data, ['Source','Movies'], ['Name', 'Year', 'Location'])
print(df)

Output:

   MovieNumber  Money  Percent   Name  Year Location
0            1   1000       10  Rocco  2020     Itay
1            1   2000       10   Anja  2021  Germany
2            2   3000       10   Anja  2021  Germany
3            1   1000       10  Kasia  2021   Poland
4            2   1000       10  Kasia  2021   Poland
5            3   1000       10  Kasia  2021   Poland

Here is what my code actually did. Before:

[
  {
    "Name": "Rocco",
    "Year": 2020,
    "Location": "Itay",
    "Source": [
      {
        "Movies": # Here, Movies isn't a list.
          {"MovieNumber": 1, "Money": 1000, "Percent": 10}
      }
    ]
  },
  {
    "Name": "Anja",
    "Year": 2021,
    "Location": "Germany",
    "Source": [
      {
        "Movies": [ # Here, Movies is a list.
          {"MovieNumber": 1, "Money": 2000, "Percent": 10},
          {"MovieNumber": 2, "Money": 3000, "Percent": 10}
        ]
      }
    ]
  }
]

After:

[
  {
    "Name": "Rocco",
    "Year": 2020,
    "Location": "Itay",
    "Source": [
      {
        "Movies": [ # Now this is a list.
          {"MovieNumber": 1, "Money": 1000, "Percent": 10}
        ]
      }
    ]
  },
  {
    "Name": "Anja",
    "Year": 2021,
    "Location": "Germany",
    "Source": [
      {
        "Movies": [ # And this remains unchanged.
          {"MovieNumber": 1, "Money": 2000, "Percent": 10},
          {"MovieNumber": 2, "Money": 3000, "Percent": 10 }
        ]
      }
    ]
  }
]

So all I did was force all Source.Movies to be lists, by putting the contents in a list if it wasn't already a list.
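The same fix can be written with a small helper. This is only a sketch: as_list is a hypothetical name, and unlike the loop above it walks every entry in Source rather than only index 0. The sample data is the two records shown above.

```python
import pandas as pd

def as_list(value):
    """Return value unchanged if it is already a list, otherwise wrap it in one."""
    return value if isinstance(value, list) else [value]

data = [
    {"Name": "Rocco", "Year": 2020, "Location": "Itay",
     "Source": [{"Movies": {"MovieNumber": 1, "Money": 1000, "Percent": 10}}]},
    {"Name": "Anja", "Year": 2021, "Location": "Germany",
     "Source": [{"Movies": [{"MovieNumber": 1, "Money": 2000, "Percent": 10},
                            {"MovieNumber": 2, "Money": 3000, "Percent": 10}]}]},
]

# Force every Source[*].Movies value to be a list.
for record in data:
    for source in record["Source"]:
        source["Movies"] = as_list(source["Movies"])

df = pd.json_normalize(data, ["Source", "Movies"], ["Name", "Year", "Location"])
print(df)
```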

Python pandas normalize this Json into pandas

Use:

gateio = pd.json_normalize(e)
gateio.columns = gateio.columns.str.split('.', expand=True)
df = gateio.rename_axis(('symbol', None), axis=1).stack(0).droplevel(0).reset_index()

print(df)
          symbol       baseVolume       high24hr     highestBid  \
0      100x_usdt                0              0                  
1      10set_eth                0              0                  
2     10set_usdt  78055.955772115          2.334         2.3189   
3      1art_usdt  84629.671759612       0.020476       0.020051   
4     1earth_eth                0              0                  
         ...              ...            ...            ...   
3023     zrx_usd      378.6665316         0.3075         0.3036   
3024    zrx_usdt  21064.601829316         0.3074         0.3038   
3025     zsc_eth     6.5764445243  0.00000006666  0.00000005859   
3026    zsc_usdt  12105.551030017    0.000099271     0.00009592   
3027    ztg_usdt  17735.456307939        0.10993         0.0993   

               last        low24hr      lowestAsk percentChange  \
0     0.00000001677              0                            0   
1                 0              0                            0   
2            2.3258           2.25         2.3315          0.54   
3          0.020139       0.019922       0.020318         -0.62   
4                 0              0                            0   
            ...            ...            ...           ...   
3023         0.3053         0.2919         0.3048          4.05   
3024         0.3046         0.2923         0.3043          4.35   
3025  0.00000006116  0.00000005942  0.00000006438         -7.91   
3026    0.000098951    0.000095918    0.000101036          2.53   
3027        0.09977        0.09929         0.1003         -7.96   

          quoteVolume result  
0                   0   true  
1                   0   true  
2      34176.76678812   true  
3     4186530.9550705   true  
4                   0   true  
              ...    ...  
3023         1250.925   true  
3024  69748.810196325   true  
3025        105661371   true  
3026   125394404.8585   true  
3027  169037.51711601   true  

[3028 rows x 10 columns]

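Another idea is to create the DataFrame with the constructor and then pivot:

gateio = requests.get("https://data.gateapi.io/api2/1/tickers")
e = gateio.json()
# One (symbol, field, value) row per entry; pivot arguments must be keywords in pandas 2.0+.
df = pd.DataFrame([(k,k1, v1) for k, v in e.items() for k1, v1 in v.items()]).pivot(index=0, columns=1, values=2)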
print(df)
1                baseVolume       high24hr     highestBid           last  \
0                                                                          
100x_usdt                 0              0                 0.00000001677   
10set_eth                 0              0                             0   
10set_usdt  77135.369425029          2.334         2.3189          2.324   
1art_usdt   85135.129113461       0.020476       0.020073       0.020231   
1earth_eth                0              0                             0   
                    ...            ...            ...            ...   
zrx_usd         378.7539874         0.3075         0.3031         0.3036   
zrx_usdt    20969.605384316         0.3074         0.3034         0.3048   
zsc_eth       6.54257544205  0.00000006666  0.00000005891  0.00000006175   
zsc_usdt    12071.777701317    0.000099271     0.00009592     0.00009804   
ztg_usdt    17614.164813459        0.10918         0.0993         0.0998   

1                 low24hr      lowestAsk percentChange      quoteVolume result  
0                                                                               
100x_usdt               0                            0                0   true  
10set_eth               0                            0                0   true  
10set_usdt           2.25         2.3303          0.31  33779.242174485   true  
1art_usdt        0.019922        0.02037          0.32  4211596.8280705   true  
1earth_eth              0                            0                0   true  
                  ...            ...           ...              ...    ...  
zrx_usd            0.2919         0.3046          3.47         1251.201   true  
zrx_usdt           0.2923         0.3041          4.27  69423.160196325   true  
zsc_eth     0.00000005942  0.00000006479         -7.18        105182158   true  
zsc_usdt      0.000095918    0.000100982           1.6   125041663.4785   true  
ztg_usdt          0.09929         0.1002          -8.8  167942.13011601   true  

[3028 rows x 9 columns]
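For a dict of dicts keyed by symbol, a further option (not one of the two approaches above) is DataFrame.from_dict with orient="index". The sketch below uses a made-up two-symbol payload instead of calling the API, so the values are hypothetical.

```python
import pandas as pd

# Hypothetical payload with the same shape as the ticker response: {symbol: {field: value}}.
e = {
    "btc_usdt": {"last": "30000", "high24hr": "31000", "result": "true"},
    "eth_usdt": {"last": "2000", "high24hr": "2100", "result": "true"},
}

# Outer keys become the index; name it "symbol" and turn it into a regular column.
df = pd.DataFrame.from_dict(e, orient="index").rename_axis("symbol").reset_index()
print(df)
```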

How to normalize a nested JSON key into a pandas dataframe

The 'results' key is a one-element list, so 'members' can be normalized by selecting the 'members' key from the dict at index 0.

import pandas as pd
import requests

# Requesting data through the API
payload = {'X-API-Key': '...'} 
terms = '"trade war"AND"China"'
index = str(0)  # 440 is last offset for this call

response = requests.get('https://api.propublica.org/congress/v1/116/house/members.json', headers=payload)

# extract the json data from the response
json_data = response.json()

# normalize only members
members = pd.json_normalize(data=json_data['results'][0]['members'])

# alternatively: normalize members and the preceding keys
members = pd.json_normalize(data=json_data['results'][0], record_path=['members'], meta=['congress', 'chamber', 'num_results', 'offset'])

display(members)

        id           title short_title                                                      api_uri first_name middle_name  last_name suffix date_of_birth gender party leadership_role  twitter_account         facebook_account youtube_account govtrack_id cspan_id votesmart_id icpsr_id     crp_id google_entity_id fec_candidate_id                          url                                         rss_url contact_form  in_office cook_pvi  dw_nominate ideal_point seniority next_election  total_votes  missed_votes  total_present               last_updated                                  ocd_id                                office         phone   fax state  district  at_large geoid  missed_votes_pct  votes_with_party_pct  votes_against_party_pct
0  A000374  Representative        Rep.  https://api.propublica.org/congress/v1/members/A000374.json      Ralph        None    Abraham   None    1954-09-16      M     R                       RepAbraham  CongressmanRalphAbraham            None      412630    76236       155414    21522  N00036633      /m/012dwd7_        H4LA05221    https://abraham.house.gov               https://abraham.house.gov/rss.xml         None      False     R+15        0.541        None         6          2020        954.0         377.0            0.0  2020-12-31 18:30:50 -0500   ocd-division/country:us/state:la/cd:5      417 Cannon House Office Building  202-225-8490  None    LA         5     False  2205             39.52                 94.93                     4.90
1  A000370  Representative        Rep.  https://api.propublica.org/congress/v1/members/A000370.json       Alma        None      Adams   None    1946-05-27      F     D            None         RepAdams       CongresswomanAdams            None      412607    76386         5935    21545  N00035451        /m/02b45d        H4NC12100      https://adams.house.gov                 https://adams.house.gov/rss.xml         None      False     D+18       -0.465        None         8          2020        954.0          26.0            0.0  2020-12-31 18:30:55 -0500  ocd-division/country:us/state:nc/cd:12    2436 Rayburn House Office Building  202-225-1510  None    NC        12     False  3712              2.73                 99.24                     0.65
2  A000055  Representative        Rep.  https://api.propublica.org/congress/v1/members/A000055.json     Robert          B.   Aderholt   None    1965-07-22      M     R            None  Robert_Aderholt           RobertAderholt  RobertAderholt      400004    45516          441    29701  N00003028        /m/024p03        H6AL04098   https://aderholt.house.gov              https://aderholt.house.gov/rss.xml         None      False     R+30        0.369        None        24          2020        954.0          71.0            0.0  2020-12-31 18:30:49 -0500   ocd-division/country:us/state:al/cd:4  1203 Longworth House Office Building  202-225-4876  None    AL         4     False  0104              7.44                 93.60                     6.29
3  A000371  Representative        Rep.  https://api.propublica.org/congress/v1/members/A000371.json       Pete        None    Aguilar   None    1979-06-19      M     D            None   reppeteaguilar           reppeteaguilar            None      412615    79994        70114    21506  N00033997       /m/0jwv0xf        H2CA31125    https://aguilar.house.gov               https://aguilar.house.gov/rss.xml         None      False      D+8       -0.291        None         6          2020        954.0           9.0            0.0  2020-12-31 18:30:52 -0500  ocd-division/country:us/state:ca/cd:31      109 Cannon House Office Building  202-225-3201  None    CA        31     False  0631              0.94                 97.45                     2.44
4  A000372  Representative        Rep.  https://api.propublica.org/congress/v1/members/A000372.json       Rick        None      Allen   None    1951-11-07      M     R            None     reprickallen     CongressmanRickAllen            None      412625    62545       136062    21516  N00033720      /m/0127y9dk        H2GA12121      https://allen.house.gov                                            None         None      False      R+9        0.679        None         6          2020        954.0          15.0            0.0  2020-12-31 18:30:49 -0500  ocd-division/country:us/state:ga/cd:12    2400 Rayburn House Office Building  202-225-2823  None    GA        12     False  1312              1.57                 92.26                     7.63
5  A000376  Representative        Rep.  https://api.propublica.org/congress/v1/members/A000376.json      Colin        None     Allred   None    1983-04-15      M     D            None   RepColinAllred                     None            None      412828     None       177357     None  N00040989       /m/03d066b        H8TX32098     https://allred.house.gov                                            None         None      False      R+5          NaN        None         2          2020        954.0          29.0            0.0  2020-12-31 18:30:52 -0500  ocd-division/country:us/state:tx/cd:32      328 Cannon House Office Building  202-225-2231  None    TX        32     False  4832              3.04                 97.72                     2.17
6  A000367  Representative        Rep.  https://api.propublica.org/congress/v1/members/A000367.json     Justin        None      Amash   None    1980-04-18      M     I                      justinamash           repjustinamash  repjustinamash      412438  1033767       105566    21143  N00031938       /m/0c00p_n                       https://amash.house.gov                 https://amash.house.gov/rss.xml         None      False      R+6          NaN        None        10          2020        524.0           0.0           10.0  2020-12-31 18:30:47 -0500   ocd-division/country:us/state:mi/cd:3                                  None          None  None    MI         3     False  2603              0.00                 58.49                    41.51
7  A000367  Representative        Rep.  https://api.propublica.org/congress/v1/members/A000367.json     Justin        None      Amash   None    1980-04-18      M     R                      justinamash           repjustinamash  repjustinamash      412438  1033767       105566    21143  N00031938       /m/0c00p_n        H0MI03126      https://amash.house.gov                 https://amash.house.gov/rss.xml         None      False     None        0.654        None        10          2020        430.0           0.0            5.0  2020-12-28 21:04:36 -0500   ocd-division/country:us/state:mi/cd:3      106 Cannon House Office Building  202-225-3831  None    MI         3     False  2603              0.00                 61.97                    37.79
8  A000369  Representative        Rep.  https://api.propublica.org/congress/v1/members/A000369.json       Mark        None     Amodei   None    1958-06-12      M     R            None    MarkAmodeiNV2            MarkAmodeiNV2   markamodeinv2      412500    62817        12537    21196  N00031177       /m/03bzdkn        H2NV02395     https://amodei.house.gov  https://amodei.house.gov/rss/news-releases.xml         None      False      R+7        0.384        None        10          2020        954.0          36.0            0.0  2020-12-31 18:30:49 -0500   ocd-division/country:us/state:nv/cd:2      104 Cannon House Office Building  202-225-6155  None    NV         2     False  3202              3.77                 92.63                     7.26
9  A000377  Representative        Rep.  https://api.propublica.org/congress/v1/members/A000377.json      Kelly        None  Armstrong   None    1976-10-08      M     R            None   RepArmstrongND                     None            None      412794     None       139338     None  N00042868    /g/11hcszksh3        H8ND00096  https://armstrong.house.gov                                            None         None      False     R+16          NaN        None         2          2020        954.0          33.0            0.0  2020-12-31 18:30:49 -0500   ocd-division/country:us/state:nd/cd:1  1004 Longworth House Office Building  202-225-2611  None    ND  At-Large      True  3800              3.46                 93.31                     6.58
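The two calls can be tried offline with a stand-in payload. The sketch below assumes the response has the shape described above (a one-element results list whose dict holds members plus a few scalar keys); the member ids and names are invented.

```python
import pandas as pd

# Hypothetical stand-in for response.json(); only the nesting mirrors the real API.
json_data = {
    "status": "OK",
    "results": [{
        "congress": "116", "chamber": "House", "num_results": 2, "offset": 0,
        "members": [
            {"id": "X000001", "last_name": "Doe"},
            {"id": "X000002", "last_name": "Roe"},
        ],
    }],
}

# Normalize only the members.
members_only = pd.json_normalize(json_data["results"][0]["members"])

# Normalize the members and carry the sibling keys along as extra columns.
with_meta = pd.json_normalize(
    json_data["results"][0],
    record_path=["members"],
    meta=["congress", "chamber", "num_results", "offset"],
)
print(with_meta)
```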
