Python 2D List Performance, Without Numpy

Python 2D list performance, without numpy

So obviously, I would like to avoid the first solution, especially since this will be running for sizes up to 100k. However, I also do not want to use too many dependencies.

You must choose which of these is more important to you. Numpy has better performance precisely because it doesn't use the builtin Python types and uses its own types that are optimized for numerical work. If your data are going to be numeric and you're going to have 100k rows/columns, you will see a gigantic performance increase with numpy. If you want to avoid the numpy dependency, you will have to live with reduced performance. (Obviously you can always write your own Python libraries or C extensions to optimize for your particular use case, but these will then be dependencies like any other.)
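As a rough, machine-dependent illustration of that gap (the 1000x1000 size and the timing harness here are just an assumed example, not from the original answer), summing a grid stored as a list of lists versus a NumPy array:

import time
import numpy as np

n = 1000
lol = [[float(i * n + j) for j in range(n)] for i in range(n)]
arr = np.array(lol)

t0 = time.perf_counter()
total_py = sum(sum(row) for row in lol)   # pure-Python nested sum over the list of lists
t1 = time.perf_counter()
total_np = arr.sum()                      # single vectorized call, loops run in C
t2 = time.perf_counter()

print(f"list of lists: {t1 - t0:.4f} s, numpy: {t2 - t1:.4f} s")

The exact numbers vary by machine, but the NumPy version is typically an order of magnitude faster or more, and the gap widens as the data grows.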

Personally I would recommend you just use numpy. It is so widely used that anyone who is considering using a library that deals with 100k multidimensional arrays probably already has numpy installed.

Sub matrix of a list of lists (without numpy)

In [74]: [row[2:5] for row in LoL[1:4]]
Out[74]: [[2, 3, 4], [2, 3, 4], [2, 3, 4]]

You could also mimic NumPy's syntax by defining a subclass of list:

class LoL(list):
    def __init__(self, *args):
        list.__init__(self, *args)

    def __getitem__(self, item):
        try:
            return list.__getitem__(self, item)
        except TypeError:
            rows, cols = item
            return [row[cols] for row in self[rows]]

lol = LoL([list(range(10)) for i in range(10)])
print(lol[1:4, 2:5])

also yields

[[2, 3, 4], [2, 3, 4], [2, 3, 4]]

Using the LoL subclass won't win any speed tests:

In [85]: %timeit [row[2:5] for row in x[1:4]]
1000000 loops, best of 3: 538 ns per loop
In [82]: %timeit lol[1:4, 2:5]
100000 loops, best of 3: 3.07 us per loop

but speed isn't everything -- sometimes readability is more important.

Faster way to access 2D lists/arrays and do calculations in Python?

Some general tips:

  1. Try to optimize your algorithm as best you can.
  2. If possible, use PyPy instead of regular CPython. This usually requires no code modification, provided all your external dependencies work with it.
  3. Static typing and compilation to C can add a further boost, but require some simple code modifications. For this you can use Cython.

Note that step 1 is often the hardest and most time-consuming, and yields only a small boost if your code is already good, while steps 2 and 3 can give a dramatic boost without much additional effort.
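As a small illustration of step 2, a plain nested loop like the sketch below runs unchanged under PyPy and typically gets a large speedup from its JIT; the function and sizes are assumed examples, not from the original answer.

def row_sums(grid):
    # plain-Python nested loop: no code changes needed for PyPy
    sums = []
    for row in grid:
        total = 0.0
        for value in row:
            total += value
        sums.append(total)
    return sums

grid = [[float(i + j) for j in range(1000)] for i in range(1000)]
print(row_sums(grid)[:3])

For step 3, the Cython example further down (the toarr function) shows the kind of static typing those code modifications involve.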

Fast 2-dimensional array (matrix) in Python without C extensions

The array module might be faster when the matrix gets large, because it can pack values more compactly; it can be used with the values[j*width + i] convention. But no, there's no multidimensional array in the Python standard library, probably because (a) Numpy already fills that niche effectively and (b) you can always make a list of lists if performance is not paramount.
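A minimal sketch of that convention, using array.array as a flat buffer with manual 2D indexing (the width, height, and helper names here are just placeholders):

from array import array

width, height = 100, 50
# 'd' = C double; one flat buffer packs values far more compactly than a list of lists
values = array('d', [0.0] * (width * height))

def get(i, j):
    return values[j * width + i]

def put(i, j, v):
    values[j * width + i] = v

put(3, 2, 7.5)
print(get(3, 2))   # 7.5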

The fastest option really depends on the algorithm. The dict-based approach might actually be the fastest when the matrices you're handling are very sparse (which in DP algorithms they're usually not, at least not in the ones I saw).
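For the sparse case, a dict keyed by (row, col) tuples, with missing entries treated as zero, looks roughly like this (a sketch, not from the original answer):

# sparse matrix as {(row, col): value}; absent keys are implicitly zero
sparse = {}
sparse[(2, 5)] = 3.5
sparse[(100, 7)] = -1.0

def get(i, j):
    return sparse.get((i, j), 0.0)

print(get(2, 5))   # 3.5
print(get(0, 0))   # 0.0 -- never stored, costs no memory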

Python, divide all numbers in a 2d list by 10 and return a 2d list

That's generally solved by using comprehensions, in this case a nested list-comprehension:

>>> from __future__ import division   # for python-2.x compatibility
>>> [[item / 10 for item in subl] for subl in a]
[[0.2, 0.3, 0.4], [0.9, 0.1, 0.2]]

That's probably faster than map and avoids all the lambda functions.
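For comparison, the map-based version the comprehension avoids would look something like this, assuming a = [[2, 3, 4], [9, 1, 2]] as implied by the output above:

a = [[2, 3, 4], [9, 1, 2]]

# nested map + lambda equivalent of the nested list comprehension
result = list(map(lambda subl: list(map(lambda item: item / 10, subl)), a))
print(result)   # [[0.2, 0.3, 0.4], [0.9, 0.1, 0.2]]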

what if instead of 10, I wanted to use the total of all the values in the nested 2d list?

Calculate the total using sum and a nested generator expression:

>>> sum_ = sum(item for subl in a for item in subl)
>>> [[item / sum_ for item in subl] for subl in a]
[[0.09523809523809523, 0.14285714285714285, 0.19047619047619047],
[0.42857142857142855, 0.047619047619047616, 0.09523809523809523]]

But with NumPy arrays it's even easier. NumPy is a third-party package, but it is very powerful and fast:

>>> import numpy as np
>>> arr = np.array(a)
>>> arr / 10. # element-wise division
array([[ 0.2, 0.3, 0.4],
[ 0.9, 0.1, 0.2]])

>>> arr / arr.sum() # sum over all elements then element-wise division
array([[ 0.0952381 , 0.14285714, 0.19047619],
[ 0.42857143, 0.04761905, 0.0952381 ]])

why is converting a long 2D list to numpy array so slow?

Implementing this in Cython without the extra checking involved to determine dimensionality, etc. nearly eliminates the time difference you are seeing.
Here's the .pyx file I used to verify that.

from numpy cimport ndarray as ar
import numpy as np
cimport cython

@cython.boundscheck(False)
@cython.wraparound(False)
def toarr(xy):
    cdef int i, j, h = len(xy), w = len(xy[0])
    cdef ar[double, ndim=2] new = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            new[i, j] = xy[i][j]
    return new

I would assume that the extra time is spent in checking the length and content of each sublist in order to determine the datatype, dimension, and size of the desired array.
When there are only two sublists, it only has to check two lengths to determine the number of columns in the array, instead of checking 1000000 of them.
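A quick way to see that effect is to convert many short sublists versus a few long ones holding the same total number of elements; the sizes mirror the 1,000,000-sublist case described above, and the exact timings will vary by machine:

import timeit
import numpy as np

tall = [[float(j) for j in range(2)] for _ in range(1_000_000)]   # 1,000,000 rows x 2 cols
wide = [[float(j) for j in range(1_000_000)] for _ in range(2)]   # 2 rows x 1,000,000 cols

print(timeit.timeit(lambda: np.array(tall), number=3))   # slower: 1,000,000 sublists to inspect
print(timeit.timeit(lambda: np.array(wide), number=3))   # faster: only 2 sublists to inspect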

Python/Numpy: Build 2D array without adding duplicate rows (for triangular mesh)

Jaime has shown a neat trick which can be used to view a 2D array as a 1D array with items that correspond to rows of the 2D array. This trick can allow you to apply numpy functions which take 1D arrays as input (such as np.unique) to higher dimensional arrays.

If the order of the rows in unified_verts does not matter (as long as ref_list is correct with respect to unified_verts), then you could use np.unique along with Jaime's trick like this:

def unify2(raw_data):
    dtype = np.dtype((np.void, raw_data.shape[1] * raw_data.dtype.itemsize))
    uniq, inv = np.unique(raw_data.view(dtype), return_inverse=True)
    uniq = uniq.view(raw_data.dtype).reshape(-1, raw_data.shape[1])
    return uniq, inv

The result is the same in the sense that the raw_data can be reconstructed from the return values of unify (or unify2):

unified, ref = unify(raw_data)
uniq, inv = unify2(raw_data)
assert np.allclose(uniq[inv], unified[ref]) # raw_data

On my machine, unified, ref = unify(raw_data) requires about 51.390s, while uniq, inv = unify2(raw_data) requires about 0.133s (~ 386x speedup).
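On NumPy 1.13 and later, np.unique accepts an axis argument, which removes the need for the void-view trick altogether; a minimal sketch with made-up data (the ordering of the unique rows may differ from the view-based version):

import numpy as np

raw_data = np.array([[0., 0., 0.],
                     [1., 0., 0.],
                     [0., 0., 0.],    # duplicate of row 0
                     [1., 0., 0.]])   # duplicate of row 1

uniq, inv = np.unique(raw_data, axis=0, return_inverse=True)
print(uniq)                    # the unique rows
print(uniq[inv.ravel()])       # reconstructs raw_data row for row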

alternative to numpy.argwhere to speed up for loop in python

Make a sample 2d array:

In [584]: arr = np.random.rand(1000,1000)

Find a small proportion of them:

In [587]: np.where(arr>.999)
Out[587]:
(array([ 1, 1, 1, ..., 997, 999, 999], dtype=int32),
array([273, 471, 584, ..., 745, 310, 679], dtype=int32))
In [588]: _[0].shape
Out[588]: (1034,)

Time various pieces of argwhere:

In [589]: timeit arr>.999
2.65 ms ± 116 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [590]: timeit np.count_nonzero(arr>.999)
2.79 ms ± 26 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [591]: timeit np.nonzero(arr>.999)
6 ms ± 10 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [592]: timeit np.argwhere(arr>.999)
6.06 ms ± 58.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

So about 1/3 of the time is spent doing the > test, and the rest in finding the True elements. Turning the where tuple into a 2 column array is fast.
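That last step is essentially what argwhere does: the documentation describes np.argwhere(a) as nearly equivalent to np.transpose(np.nonzero(a)). A quick self-contained check:

import numpy as np

arr = np.random.rand(1000, 1000)
mask = arr > .999
assert np.array_equal(np.argwhere(mask), np.transpose(np.nonzero(mask)))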

Now if the goal was to just find the first > value, argmax is fast.

In [593]: np.argmax(arr>.999)
Out[593]: 1273 # can unravel this to (1,273)
In [594]: timeit np.argmax(arr>.999)
2.76 ms ± 143 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

argmax short circuits, so the actual run time will vary on when it finds the first value.
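To turn that flat argmax result into a (row, col) pair, as the comment above hints, unravel_index does the conversion; a small self-contained sketch:

import numpy as np

arr = np.random.rand(1000, 1000)
flat_idx = np.argmax(arr > .999)                  # index of the first True in flattened order
row, col = np.unravel_index(flat_idx, arr.shape)  # e.g. 1273 -> (1, 273) for a 1000-wide array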

flatnonzero is faster than where:

In [595]: np.flatnonzero(arr>.999)
Out[595]: array([ 1273, 1471, 1584, ..., 997745, 999310, 999679], dtype=int32)
In [596]: timeit np.flatnonzero(arr>.999)
3.05 ms ± 26.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [599]: np.unravel_index(np.flatnonzero(arr>.999),arr.shape)
Out[599]:
(array([ 1, 1, 1, ..., 997, 999, 999], dtype=int32),
array([273, 471, 584, ..., 745, 310, 679], dtype=int32))
In [600]: timeit np.unravel_index(np.flatnonzero(arr>.999),arr.shape)
3.05 ms ± 3.58 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [601]: timeit np.transpose(np.unravel_index(np.flatnonzero(arr>.999),arr.shape))
3.1 ms ± 5.86 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

This is the same as np.argwhere(arr>.999).

Interesting, the flatnonzero approach cuts the time in half! I didn't expect such a big improvement.


Comparing the iteration speeds:

Iteration on the 2d array from argwhere:

In [607]: pixels = np.argwhere(arr>.999)
In [608]: timeit [pixel for pixel in pixels]
347 µs ± 5.29 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Iterating on the tuple from where with the zip(*) transpose:

In [609]: idx = np.where(arr>.999)
In [610]: timeit [pixel for pixel in zip(*idx)]
256 µs ± 147 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Iterating on an array is often a little slower than iterating on a list, or in this case zipped arrays.

In [611]: [pixel for pixel in pixels][:5]
Out[611]:
[array([ 1, 273], dtype=int32),
array([ 1, 471], dtype=int32),
array([ 1, 584], dtype=int32),
array([ 1, 826], dtype=int32),
array([ 2, 169], dtype=int32)]
In [612]: [pixel for pixel in zip(*idx)][:5]
Out[612]: [(1, 273), (1, 471), (1, 584), (1, 826), (2, 169)]

One is a list of arrays, the other a list of tuples. But turning those tuples into arrays (individually) is slow:

In [614]: timeit [np.array(pixel) for pixel in zip(*idx)]
2.26 ms ± 4.94 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Iterating on the flat nonzero array is faster:

In [617]: fdx = np.flatnonzero(arr>.999)
In [618]: fdx[:5]
Out[618]: array([1273, 1471, 1584, 1826, 2169], dtype=int32)
In [619]: timeit [i for i in fdx]
112 µs ± 23.5 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

but applying unravel to those values individually will take time.

def foo(idx):    # a simplified unravel
    return idx // 1000, idx % 1000

In [628]: timeit [foo(i) for i in fdx]
1.12 ms ± 1.02 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Add this 1 ms to the 3 ms needed to generate fdx, and the flatnonzero approach might still come out ahead. But at best we are talking about a 2x speed improvement.

How to define a two-dimensional array?

You're technically trying to index an uninitialized list. You have to initialize the outer list with inner lists before adding items; the idiomatic way to do that in Python is a list comprehension:

# Creates a list containing 5 lists, each of 8 items, all set to 0
w, h = 8, 5
Matrix = [[0 for x in range(w)] for y in range(h)]

# You can now add items to the list:

Matrix[0][0] = 1
Matrix[6][0] = 3 # IndexError: list index out of range (there are only 5 rows)
Matrix[0][6] = 3 # valid (each row has 8 columns)

Note that the matrix is "y" address major; in other words, the "y index" comes before the "x index".

print(Matrix[0][0]) # prints 1
x, y = 0, 6
print(Matrix[x][y]) # prints 3; be careful with indexing!

Although you can name the indices as you wish, looking at it this way helps avoid the confusion that can arise with indexing if you use "x" for both the inner and outer lists and want a non-square Matrix.
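One related pitfall worth noting (not in the original answer): building the matrix with list multiplication makes every row the same list object, so an assignment shows up in all rows.

w, h = 8, 5

bad = [[0] * w] * h          # h references to the SAME inner list
bad[0][0] = 1
print(bad[1][0])             # 1 -- every "row" changed

good = [[0] * w for _ in range(h)]   # h independent rows
good[0][0] = 1
print(good[1][0])            # 0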


