Flatten Nested Lists in a List

How do I make a flat list out of a list of lists?

Given a list of lists l,

flat_list = [item for sublist in l for item in sublist]

which means:

flat_list = []
for sublist in l:
    for item in sublist:
        flat_list.append(item)

is faster than the shortcuts posted so far. (l is the list to flatten.)

Here is the corresponding function:

def flatten(l):
    return [item for sublist in l for item in sublist]

As evidence, you can use the timeit module in the standard library:

$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' '[item for sublist in l for item in sublist]'
10000 loops, best of 3: 143 usec per loop
$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' 'sum(l, [])'
1000 loops, best of 3: 969 usec per loop
$ python -mtimeit -s'from functools import reduce' -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' 'reduce(lambda x,y: x+y,l)'
1000 loops, best of 3: 1.1 msec per loop
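The same comparison can also be run from a script through the timeit module's Python API. This is a minimal sketch for Python 3, assuming the same test data; the loop count (NUMBER) is an arbitrary choice, and absolute timings depend on the machine and Python version, so no expected output is given.

```python
import timeit
from functools import reduce  # reduce is not a builtin in Python 3

l = [[1, 2, 3], [4, 5, 6], [7], [8, 9]] * 99

candidates = {
    "list comprehension": lambda: [item for sublist in l for item in sublist],
    "sum(l, [])": lambda: sum(l, []),
    "reduce(+)": lambda: reduce(lambda x, y: x + y, l),
}

NUMBER = 200  # calls per timing run (arbitrary)
for name, fn in candidates.items():
    # timeit.repeat returns the total time of each run; taking the minimum
    # gives the least noisy estimate, like the "best of N" the CLI reports
    best = min(timeit.repeat(fn, number=NUMBER, repeat=3))
    print(f"{name}: {best / NUMBER * 1e6:.1f} usec per loop")
```

All three candidates build the same flat list; only the amount of copying differs.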

Explanation: the shortcuts based on + (including its implied use in sum) are, of necessity, O(L**2) when there are L sublists. As the intermediate result list keeps getting longer, each step allocates a new intermediate result list object, and all the items in the previous intermediate result must be copied over (plus the few new ones added at the end). So, for simplicity and without real loss of generality, say you have L sublists of I items each: the first I items are copied back and forth L-1 times, the second I items L-2 times, and so on. The total number of copies is I times the sum of x for x from 1 to L-1, i.e., I * L*(L-1)/2, which is roughly I * (L**2)/2.

The list comprehension just generates one list, once, and copies each item over (from its original place of residence to the result list) also exactly once.
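The counting argument can be checked directly. This sketch (the helper name count_carried_over is made up for illustration) concatenates with + the way sum(l, []) does and counts how many already-accumulated items get copied into each new intermediate list. With L sublists of I items each, the count comes out to exactly I * L*(L-1)/2, while the flat result holds I * L items, each of which the list comprehension copies once.

```python
def count_carried_over(sublists):
    """Concatenate like sum(sublists, []) and count re-copied items."""
    result, carried = [], 0
    for sublist in sublists:
        carried += len(result)      # every item already in result is copied again
        result = result + sublist   # + allocates a brand-new list each step
    return result, carried


L, I = 100, 3
l = [[0] * I for _ in range(L)]
flat, carried = count_carried_over(l)
print(carried, I * L * (L - 1) // 2)  # 14850 14850  (quadratic in L)
print(len(flat))                      # 300  (linear: one copy per item)
```

Doubling L roughly quadruples the carried-over count, while the number of items copied by the comprehension only doubles.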

Flatten nested lists in a list

In R, loop over the top-level list with lapply, flatten each element with unlist(recursive = TRUE), and wrap each result back in a list:

lapply(LIST2, function(i) list(unlist(i, recursive = TRUE)))

Flatten nested list in pandas containing NaN

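This generator should work for arbitrarily nested lists; strings and bytes are yielded whole rather than iterated character by character: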

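from collections.abc import Iterable

def flatten(l):
    # Recurse into any iterable except str/bytes: a one-character string
    # iterates to itself, so recursing into strings would never terminate.
    for el in l:
        if isinstance(el, Iterable) and not isinstance(el, (str, bytes)):
            yield from flatten(el)
        else:
            yield el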
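A quick check of the generator on a plain nested list, outside pandas (the sample input is made up for illustration). Strings come back whole, tuples are flattened like lists because they are also Iterable, and since flatten is a generator its output is wrapped in list():

```python
from collections.abc import Iterable


def flatten(l):
    # Same generator as above: recurse into any iterable except str/bytes
    for el in l:
        if isinstance(el, Iterable) and not isinstance(el, (str, bytes)):
            yield from flatten(el)
        else:
            yield el


nested = [1, [2, [3, 'ab']], (4, [5]), 'cd', []]
print(list(flatten(nested)))  # [1, 2, 3, 'ab', 4, 5, 'cd']
```

A float NaN is not Iterable, so it is yielded as an ordinary item; that is why the NaN values still have to be filtered out in the DataFrame step.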

Recreating your DataFrame:

import pandas as pd
df = pd.DataFrame([[[[float('nan')], [float('nan'), 'DE']]],
                   [[[float('nan'), ['IT', 'DE']]]],
                   [[[['FR']]]],
                   [[[['AE'], float('nan'), ['AE', 'MT'], ['MX']]]]],
                  columns=['country'])

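# dict.fromkeys drops duplicates like set but keeps first-seen order, so the output order is stable
df['country'] = (df['country']
                 .apply(lambda x: list(dict.fromkeys(flatten(x))))
                 .apply(lambda x: [i for i in x if str(i) != 'nan']))

gives the following output: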

        country
0          [DE]
1      [IT, DE]
2          [FR]
3  [AE, MT, MX]

Flattening List of Dict containing multiple nested lists using pandas json_normalize

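You can explode each list-of-records column into one row per record, expand each record dict into columns with apply(pd.Series), prefix the new column names, and join everything back on the original index: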

df = pd.json_normalize(users_info)
df_addresses = df['Addresses.records'].explode().apply(pd.Series)
df_addresses.rename(columns={col:f'Addresses.{col}' for col in df_addresses.columns}, inplace=True)

df_education = df['Education.records'].explode().apply(pd.Series)
df_education.rename(columns={col:f'Education.{col}' for col in df_education.columns}, inplace=True)

cols = [col for col in df.columns if col not in ['Addresses.records', 'Education.records']]
df = df[cols].join(df_addresses).join(df_education)
df.dropna(axis=1, how='all', inplace=True)
print(df)

OUTPUT

     Id  Name  Country.Name  Addresses.addressId  Addresses.line1  Addresses.city  Education.Degree  Education.Id
0    21   ABC     Country 1                   12         xyz, 102             PQR         Bachelors            45
0    21   ABC     Country 1                   12         xyz, 102             PQR           Masters            49
0    21   ABC     Country 1                   13         YTR, 102             NMS         Bachelors            45
0    21   ABC     Country 1                   13         YTR, 102             NMS           Masters            49
1    26   PEW     Country 2                   10          BTR, 12             UYT         Bachelors            45
1    26   PEW     Country 2                   10          BTR, 12             UYT           Masters            49
1    26   PEW     Country 2                  123          MEQW, 6             KJH         Bachelors            45
1    26   PEW     Country 2                  123          MEQW, 6             KJH           Masters            49
2   214   TUF           NaN                  NaN              NaN             NaN         Bachelors            45
2   214   TUF           NaN                  NaN              NaN             NaN           Masters            49
3  2609   JJU     Country 2                   10          BTR, 12             UYT               NaN           NaN
3  2609   JJU     Country 2                  123          MEQW, 6             KJH               NaN           NaN
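The answer does not show users_info. The sketch below runs the same explode / expand / prefix / join steps on a small hypothetical users_info whose shape is only inferred from the column names in the output above; the helper name expand_records is also made up. It swaps the rename dict comprehension for DataFrame.add_prefix, which produces the same column names.

```python
import pandas as pd

# Hypothetical input: structure inferred from the output's column names,
# not taken from the original question.
users_info = [
    {"Id": 21, "Name": "ABC", "Country": {"Name": "Country 1"},
     "Addresses": {"records": [
         {"addressId": 12, "line1": "xyz, 102", "city": "PQR"},
         {"addressId": 13, "line1": "YTR, 102", "city": "NMS"}]},
     "Education": {"records": [
         {"Degree": "Bachelors", "Id": 45},
         {"Degree": "Masters", "Id": 49}]}},
    {"Id": 26, "Name": "PEW", "Country": {"Name": "Country 2"},
     "Addresses": {"records": [
         {"addressId": 10, "line1": "BTR, 12", "city": "UYT"}]},
     "Education": {"records": [
         {"Degree": "Bachelors", "Id": 45}]}},
]


def expand_records(df, col, prefix):
    """Hypothetical helper: one row per record in df[col], one column per key."""
    records = df[col].explode()  # repeats the original index once per record
    return records.apply(pd.Series).add_prefix(f"{prefix}.")


df = pd.json_normalize(users_info)  # nested dicts become "Country.Name" etc.
addresses = expand_records(df, "Addresses.records", "Addresses")
education = expand_records(df, "Education.records", "Education")

base = df.drop(columns=["Addresses.records", "Education.records"])
# join aligns on the index, so each user's addresses and degrees are paired
# up: 2 x 2 = 4 rows for user 21 and 1 x 1 = 1 row for user 26
result = base.join(addresses).join(education)
print(result)
```

This sketch leaves out the dropna step; in the original it drops any column that ends up entirely NaN, which can happen when some users have no records.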

