Converting a list of lists into a 2D numpy array
If your lists are NOT all the same length at every nesting level, you can't do a traditional conversion to a NumPy array, because an array with 2 or more dimensions must be rectangular: every row along a given axis has the same number of elements.
So you can't convert [[1,2],[3,4,5]] to a 2D numpy array directly. In older NumPy versions, applying np.array gives you a 2-element numpy array where each element is a list object: array([list([1, 2]), list([3, 4, 5])], dtype=object). NumPy 1.19 and later warn about this, and NumPy 1.24 and later raise a ValueError unless you pass dtype=object. I believe this is the issue you are facing.
You can't create a 2D matrix that looks like this, for example -
[[1,2,3,?],
[4,5,6,7]]
What you may need to do is pad each inner list to a fixed length (equal lengths at each nesting level) before converting to a NumPy array.
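As a minimal sketch of the padding idea for the 2D case, assuming right-padding with a fill value is acceptable for your data (the helper name pad_rows and the default fill value 0 are hypothetical, not from the original answer):

```python
import numpy as np

def pad_rows(rows, fill_value=0):
    """Right-pad each inner list so that all rows share one length."""
    # Width of the longest row; 0 if there are no rows at all
    width = max((len(row) for row in rows), default=0)
    # Start from an array filled with the padding value
    padded = np.full((len(rows), width), fill_value)
    # Copy each row into the left part of its slot
    for i, row in enumerate(rows):
        padded[i, :len(row)] = row
    return padded

ragged = [[1, 2, 3], [4, 5, 6, 7]]
padded = pad_rows(ragged)
print(padded)
```

Note that the array's dtype follows fill_value here, so an integer fill value would truncate float data; passing something like fill_value=np.nan gives a float array instead.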
I would recommend iterating over each of the lists of lists of lists as done in the code I have written below to flatten your data, then transforming it the way you want.
If your lists are of the same length, then there should not be a problem with numpy version 1.18.5 or above.
import numpy as np
a = [[[1,2],[3,4]],[[5,6],[7,8]]]
np.array(a)
array([[[1, 2],
        [3, 4]],
       [[5, 6],
        [7, 8]]])
However, if you still can't work with the list of lists of lists directly, you may need to iterate over each element to flatten the list and then convert it into a numpy array with the required shape, as below -
a = [[[1,2],[3,4]],[[5,6],[7,8]]]
flat_a = [item for sublist in a for subsublist in sublist for item in subsublist]
np.array(flat_a).reshape(2,2,2)
array([[[1, 2],
        [3, 4]],
       [[5, 6],
        [7, 8]]])
converting list of lists into 1-D numpy array of lists
In your first case, np.array gives us a warning (in new enough numpy versions). That should tell us something - using np.array to make ragged arrays is not ideal. np.array is meant to create regular multidimensional arrays, with numeric (or string) dtypes. Creating an object dtype array like this is a fallback option.
In [96]: sample_list = [["hello", "world"], ["foo"], ["alpha", "beta", "gamma"], []]
In [97]: arr = np.array(sample_list)
<ipython-input-97-ec7d58f98892>:1: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
arr = np.array(sample_list)
In [98]: arr
Out[98]:
array([list(['hello', 'world']), list(['foo']),
       list(['alpha', 'beta', 'gamma']), list([])], dtype=object)
In many ways such an array is a debased list, not a true array.
In the second case it can work as intended (by the developers, if not you!):
In [99]: sample_list = [["hello"], ["world"], ["foo"], ["bar"]]
In [100]: arr = np.array(sample_list)
In [101]: arr
Out[101]:
array([['hello'],
       ['world'],
       ['foo'],
       ['bar']], dtype='<U5')
To work around that, I recommend making an object dtype array of the right size, and populating it from the list:
In [102]: arr = np.empty(len(sample_list), object)
In [103]: arr
Out[103]: array([None, None, None, None], dtype=object)
In [104]: arr[:] = sample_list
In [105]: arr
Out[105]:
array([list(['hello']), list(['world']), list(['foo']), list(['bar'])],
      dtype=object)
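The same idea can be wrapped in a small helper. This is only a sketch: the name as_object_array is hypothetical, and it fills the array element by element rather than with the slice assignment shown above, so that each inner list is stored as-is whether or not the lists share a length.

```python
import numpy as np

def as_object_array(rows):
    """Return a 1-D object array whose elements are the items of rows."""
    out = np.empty(len(rows), dtype=object)
    # Integer indexing stores each inner list as a single Python object
    for i, row in enumerate(rows):
        out[i] = row
    return out

ragged = as_object_array([["hello", "world"], ["foo"], ["alpha", "beta", "gamma"], []])
regular = as_object_array([["hello"], ["world"], ["foo"], ["bar"]])
print(ragged.shape, regular.shape)
```

Both results are 1-D, so code that consumes them does not need to care whether the input happened to be rectangular.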
Converting a List of Lists into a numpy array
To create a list of numpy arrays (one array per inner list):
import numpy
# arrays is your list of lists
np_arrays = []
for array in arrays:
    np_arrays.append(numpy.array(array))
Make a numpy array of sets from a list of lists
This can be done efficiently using the Union-Find algorithm from graph theory (see https://www.geeksforgeeks.org/union-find-algorithm-set-2-union-by-rank/)
We consider each sublist as a vertex in a graph.
Two vertices are connected if their sublists overlap (i.e. intersect).
Union-find provides an efficient method of finding the disjoint groups of connected vertices, i.e. the groups of sublists that overlap directly or through a chain of other sublists.
from collections import defaultdict
# a structure to represent a graph
class Graph:
    def __init__(self, num_of_v):
        self.num_of_v = num_of_v
        # graph is represented as a mapping from
        # each vertex to the vertices it connects to
        self.edges = defaultdict(list)
    def add_edge(self, u, v):
        self.edges[u].append(v)
class Subset:
    def __init__(self, parent, rank):
        self.parent = parent
        self.rank = rank
    def __repr__(self):
        # __repr__ must return a string; str() falls back to it as well
        return 'Subset(parent='+str(self.parent)+', rank='+str(self.rank)+ ')'
# A utility function to find the set of an element
# node (uses the path compression technique)
def find(subsets, node):
    if subsets[node].parent != node:
        subsets[node].parent = find(subsets, subsets[node].parent)
    return subsets[node].parent
# A function that does the union of the two sets
# rooted at u and v (uses union by rank)
def union(subsets, u, v):
    # Attach smaller rank tree under root
    # of high rank tree (Union by Rank)
    if subsets[u].rank > subsets[v].rank:
        subsets[v].parent = u
    elif subsets[v].rank > subsets[u].rank:
        subsets[u].parent = v
    # If ranks are same, then make one as
    # root and increment its rank by one
    else:
        subsets[v].parent = u
        subsets[u].rank += 1
def find_disjoint_sets(graph):
    # Create one singleton set per vertex
    subsets = []
    for u in range(graph.num_of_v):
        subsets.append(Subset(u, 0))
    # Iterate through all edges of the graph and
    # find the sets of both vertices of every edge;
    # if the sets differ, merge them.
    for u in graph.edges:
        for v in graph.edges[u]:
            # Look up both roots for every edge: an earlier
            # union may have changed the root of u's set
            u_rep = find(subsets, u)
            v_rep = find(subsets, v)
            if u_rep != v_rep:
                union(subsets, u_rep, v_rep)
    return subsets
def generate_groups(lst):
    """ Finds disjoint sublists in lst. Performs a union of sublists that intersect """
    # Generate graph
    g = Graph(len(lst))
    # Loop over all pairs of sublists,
    # Place an edge in the graph for sublists that intersect
    for i1, v1 in enumerate(lst):
        set_v1 = set(v1)
        for i2, v2 in enumerate(lst):
            if i2 > i1 and set_v1.intersection(v2):
                g.add_edge(i1, i2)
    # Disjoint subsets of sublists
    subsets = find_disjoint_sets(g)
    # Union of sublists which are non-disjoint (i.e. have the same parent)
    d = {}
    for i in range(len(lst)):
        sublist_index = find(subsets, i)
        if sublist_index not in d:
            d[sublist_index] = set()
        d[sublist_index] = d[sublist_index].union(lst[i])
    return d
# Test Code
lst = [[2],[5],[5,8,16],[7,9,12],[9,20]]
d = generate_groups(lst)
print(d)
Output
{0: {2}, 1: {8, 16, 5}, 3: {9, 12, 20, 7}}
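Since the question asks for a numpy array of sets rather than a dict, the result can be copied into a 1-D object array. A minimal sketch, assuming the dict printed above (the names groups and set_array are hypothetical, and the dict is written out literally so the snippet runs on its own):

```python
import numpy as np

# The grouping printed above, written out as a literal
groups = {0: {2}, 1: {8, 16, 5}, 3: {9, 12, 20, 7}}

# Allocate an object array first, then store one set per slot,
# so numpy never tries to interpret the sets itself
set_array = np.empty(len(groups), dtype=object)
for i, group in enumerate(groups.values()):
    set_array[i] = group

print(set_array)
```

Each element is still an ordinary Python set, so set operations such as set_array[1] & {5, 8} work on individual elements, but numpy cannot vectorize over them.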
Convert a list of lists to numpy array in python
You can set the dtype to object.
>>> import numpy as np
>>> np.array([[1, 2, 3, (2, 4)], [3, 4, 8, 9], [2, 3, 5, (3, 7)]], dtype=object)
array([[1, 2, 3, (2, 4)],
       [3, 4, 8, 9],
       [2, 3, 5, (3, 7)]], dtype=object)
Note that there's probably not a good reason to create this array in the first place. The main strength of numpy is fast operations on flat sequences of numeric data; with dtype=object you are storing pointers to full-fledged Python objects, just like in a list.
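To make the difference concrete, here is a small sketch comparing the object array with a purely numeric one. The names mixed and numeric are hypothetical, and the numeric array simply drops the last column of the example data:

```python
import numpy as np

# Object array: the tuples stay ordinary Python tuples inside the array
mixed = np.array([[1, 2, 3, (2, 4)], [3, 4, 8, 9], [2, 3, 5, (3, 7)]], dtype=object)
print(mixed.dtype, mixed.shape)
print(type(mixed[0, 3]))

# Numeric array: a homogeneous integer block that numpy can operate on directly
numeric = np.array([[1, 2, 3], [3, 4, 8], [2, 3, 5]])
print(numeric.dtype.kind)   # 'i' for a signed integer dtype
print(numeric.sum(axis=0))  # column sums computed in compiled code
```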
Here is a good answer explaining the object dtype.
Convert list of lists of lists to 2D np array
The pandas DataFrame constructor is really flexible. You can pass almost any list to it.
df = pd.DataFrame(lst)
df.shape # (4, 5)
df
But as the other comments say, there's not much you could do with this dataframe. One of the main reasons to store data as a df is to use vectorized methods but that's not possible with this.
A more sensible approach is to construct a multi-index dataframe where each "column" in lst is its own column.
# reshape 3D -> 2D + build df
df = pd.DataFrame(np.reshape(lst, (len(lst), -1)))
# convert the columns to a 5x3 multi-index
df.columns = pd.MultiIndex.from_arrays(np.divmod(df.columns, len(lst[0][0])))
df
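Because lst is not shown in the answer, here is a self-contained sketch of the same reshape under an assumed shape of 4 rows, 5 groups per row and 3 values per group (the data is made up, and pd.MultiIndex.from_product is used here in place of np.divmod to build the same two-level column index):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for lst: 4 rows, each holding 5 groups of 3 values
lst = np.arange(4 * 5 * 3).reshape(4, 5, 3).tolist()
n_groups, n_items = len(lst[0]), len(lst[0][0])

# reshape 3D -> 2D: one row per outer list, groups laid out side by side
flat = np.reshape(lst, (len(lst), -1))

# two-level columns: (group index, position inside the group)
columns = pd.MultiIndex.from_product([range(n_groups), range(n_items)])
df = pd.DataFrame(flat, columns=columns)
print(df.shape)
```

With this layout, df[1] selects the three columns of group 1 and df[(1, 0)] selects a single numeric column, so vectorized methods work again.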