How does __contains__ work for ndarrays?
I found the source for ndarray.__contains__ in numpy/core/src/multiarray/sequence.c. As a comment in the source states, thing in x is equivalent to (x == thing).any() for an ndarray x, regardless of the dimensions of x and thing. This only makes sense when thing is a scalar; the results of broadcasting when thing isn't a scalar cause the weird results I observed, as well as oddities like array([1, 2, 3]) in array(1) that I didn't think to try. The exact source is:
static int
array_contains(PyArrayObject *self, PyObject *el)
{
    /* equivalent to (self == el).any() */
    int ret;
    PyObject *res, *any;
    res = PyArray_EnsureAnyArray(PyObject_RichCompare((PyObject *)self,
                                                      el, Py_EQ));
    if (res == NULL) {
        return -1;
    }
    any = PyArray_Any((PyArrayObject *)res, NPY_MAXDIMS, NULL);
    Py_DECREF(res);
    ret = PyObject_IsTrue(any);
    Py_DECREF(any);
    return ret;
}
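To see the equivalence from the Python side, here is a small sketch. The array x and the test values are made up for illustration; the point is only that `in` and `(x == thing).any()` agree, including in the non-scalar cases where broadcasting gives surprising answers.

```python
import numpy as np

# Illustrative data, not taken from the original question.
x = np.array([[1, 2], [3, 4]])

# Scalar case: behaves like ordinary membership.
scalar_in = 2 in x
scalar_eq = bool((x == 2).any())

# Non-scalar case: [1, 5] is broadcast against each row of x, and a single
# matching position (the 1 in column 0) is enough to make the result True,
# even though [1, 5] is not a row of x.
row_in = [1, 5] in x
row_eq = bool((x == np.array([1, 5])).any())

# The oddity mentioned above: a 0-d array "contains" a longer array.
odd_in = np.array([1, 2, 3]) in np.array(1)

print(scalar_in, scalar_eq)
print(row_in, row_eq)
print(odd_in)
```

If you actually want to test whether a whole row is present, something like (x == thing).all(axis=1).any() is closer to the intent than `in`.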
Concatenate a NumPy array to another NumPy array
In [1]: import numpy as np
In [2]: a = np.array([[1, 2, 3], [4, 5, 6]])
In [3]: b = np.array([[9, 8, 7], [6, 5, 4]])
In [4]: np.concatenate((a, b))
Out[4]:
array([[1, 2, 3],
       [4, 5, 6],
       [9, 8, 7],
       [6, 5, 4]])
or this:
In [1]: a = np.array([1, 2, 3])
In [2]: b = np.array([4, 5, 6])
In [3]: np.vstack((a, b))
Out[3]:
array([[1, 2, 3],
       [4, 5, 6]])
How do I count the occurrence of a certain item in an ndarray?
Using numpy.unique:
import numpy
a = numpy.array([0, 3, 0, 1, 0, 1, 2, 1, 0, 0, 0, 0, 1, 3, 4])
unique, counts = numpy.unique(a, return_counts=True)
>>> dict(zip(unique, counts))
{0: 7, 1: 4, 2: 1, 3: 2, 4: 1}
Non-numpy method using collections.Counter:
import collections, numpy
a = numpy.array([0, 3, 0, 1, 0, 1, 2, 1, 0, 0, 0, 0, 1, 3, 4])
counter = collections.Counter(a)
>>> counter
Counter({0: 7, 1: 4, 3: 2, 2: 1, 4: 1})
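If you only need the count for one particular value rather than a full tally, a boolean mask is usually enough. This is a different technique from the two above (it uses numpy.count_nonzero rather than numpy.unique or Counter), sketched here on the same array:

```python
import numpy as np

a = np.array([0, 3, 0, 1, 0, 1, 2, 1, 0, 0, 0, 0, 1, 3, 4])

# a == 0 is a boolean array; count_nonzero counts its True entries.
count_zero = np.count_nonzero(a == 0)

# Summing the mask works too, since True counts as 1.
count_one = int((a == 1).sum())

print(count_zero, count_one)  # 7 4
```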
Why does numpy have a corresponding function for many ndarray methods?
As others have noted, the identically-named NumPy functions and array methods are often equivalent (they end up calling the same underlying code). One might be preferred over the other if it makes for easier reading.
However, in some instances the two behave slightly differently. In particular, using the ndarray method sometimes emphasises the fact that the method is modifying the array in-place.
For example, np.resize returns a new array with the specified shape. On the other hand, ndarray.resize changes the shape of the array in-place. The fill values used in each case are also different: np.resize fills the extra space with repeated copies of the original data, while ndarray.resize pads with zeros.
Similarly, a.sort() sorts the array a in-place, while np.sort(a) returns a sorted copy.
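A short sketch of both pairs, using small made-up arrays. Note that refcheck=False is passed to ndarray.resize here only so the example runs in interactive sessions and debuggers, where extra references to the array would otherwise make the in-place resize raise an error.

```python
import numpy as np

# --- sort: function returns a copy, method works in place ---
a = np.array([3, 1, 2])
sorted_copy = np.sort(a)      # new array; a is still [3, 1, 2]
a_before = a.copy()
method_result = a.sort()      # sorts a itself and returns None

# --- resize: function repeats the data, method pads with zeros ---
b = np.array([1, 2, 3])
grown_copy = np.resize(b, (2, 3))   # [[1, 2, 3], [1, 2, 3]]; b unchanged
b.resize((2, 3), refcheck=False)    # b is now [[1, 2, 3], [0, 0, 0]]

print(a_before, sorted_copy, a, method_result)
print(grown_copy)
print(b)
```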
Python numpy array of numpy arrays
Never append to numpy arrays in a loop: it is the one operation that NumPy is very bad at compared with basic Python. This is because you are making a full copy of the data on each append, which will cost you quadratic time.
Instead, just append your arrays to a Python list and convert it at the end; the result is simpler and faster:
a = []
while ...:
    b = ...  # NumPy array
    a.append(b)
a = np.asarray(a)
As for why your code doesn't work: np.append doesn't behave like list.append at all. In particular, it won't create new dimensions when appending. You would have to create the initial array with two dimensions, then append with an explicit axis argument.
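For completeness, here is roughly what the np.append route would look like. The row values are invented for illustration, and this is shown only to make the behaviour concrete; the list-based approach above is still the one to prefer.

```python
import numpy as np

# Start from an empty 2-D array with the right number of columns.
rows = np.empty((0, 3), dtype=int)

for new_row in ([1, 2, 3], [4, 5, 6]):
    # The appended value must itself be 2-D, and axis must be given;
    # every call copies the whole array.
    rows = np.append(rows, [new_row], axis=0)

print(rows.shape)  # (2, 3)

# Without axis, np.append flattens both inputs instead of adding a row.
flat = np.append(np.array([1, 2, 3]), [[4, 5, 6]])
print(flat.shape)  # (6,)
```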
How to convert list of numpy arrays into single numpy array?
In general you can concatenate a whole sequence of arrays along any axis:
numpy.concatenate( LIST, axis=0 )
but you do have to worry about the shape and dimensionality of each array in the list (for a 2-dimensional 3x5 output, you need to ensure that they are all 2-dimensional n-by-5 arrays already). If you want to concatenate 1-dimensional arrays as the rows of a 2-dimensional output, you need to expand their dimensionality.
As Jorge's answer points out, there is also the function stack, introduced in numpy 1.10:
numpy.stack( LIST, axis=0 )
This takes the complementary approach: it creates a new view of each input array and adds an extra dimension (in this case, on the left, so each n-element 1D array becomes a 1-by-n 2D array) before concatenating. It will only work if all the input arrays have the same shape.
vstack (or equivalently row_stack) is often an easier-to-use solution because it will take a sequence of 1- and/or 2-dimensional arrays and expand the dimensionality automatically where necessary and only where necessary, before concatenating the whole list together. Where a new dimension is required, it is added on the left. Again, you can concatenate a whole list at once without needing to iterate:
numpy.vstack( LIST )
This flexible behavior is also exhibited by the syntactic shortcut numpy.r_[ array1, ...., arrayN ] (note the square brackets). This is good for concatenating a few explicitly-named arrays but is no good for your situation because this syntax will not accept a sequence of arrays, like your LIST.
There is also an analogous function column_stack and shortcut c_[...], for horizontal (column-wise) stacking, as well as an almost-analogous function hstack, although for some reason the latter is less flexible (it is stricter about input arrays' dimensionality, and tries to concatenate 1-D arrays end-to-end instead of treating them as columns).
Finally, in the specific case of vertical stacking of 1-D arrays, the following also works:
numpy.array( LIST )
...because arrays can be constructed out of a sequence of other arrays, adding a new dimension to the beginning.
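To make the differences concrete, here is a small sketch that runs each of the functions above on the same list of two 1-D arrays and records the resulting shape. The contents of LIST are made up for the example.

```python
import numpy as np

# Two 1-D arrays of length 3 (illustrative data).
LIST = [np.arange(3), np.arange(3, 6)]

shapes = {
    # joins end-to-end along the only existing axis
    "concatenate": np.concatenate(LIST, axis=0).shape,    # (6,)
    # adds a new leading axis, so each array becomes a row
    "stack": np.stack(LIST, axis=0).shape,                # (2, 3)
    "vstack": np.vstack(LIST).shape,                      # (2, 3)
    "array": np.array(LIST).shape,                        # (2, 3)
    # treats each 1-D array as a column
    "column_stack": np.column_stack(LIST).shape,          # (3, 2)
    # concatenates 1-D arrays end-to-end
    "hstack": np.hstack(LIST).shape,                      # (6,)
}

for name, shape in shapes.items():
    print(f"{name:>12}: {shape}")
```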
What is the fastest way to stack numpy arrays in a loop?
What @hpaulj was trying to say with "Stick with list append when doing loops." is:
# use a normal list
result_arr = []
for label in labels_set:
    data_transform = pca.fit_transform(data_sub_tfidf)
    # append the data_transform object to that list
    # Note: this is not np.append(), which is slow here
    result_arr.append(data_transform)
# and stack it after the loop
# This prevents slow memory allocation in the loop.
# So only one large chunk of memory is allocated since
# the final size of the concatenated array is known.
result_arr = np.concatenate(result_arr)
# or (pick one; each expects the list, not the already-joined array)
# result_arr = np.stack(result_arr, axis=0)
# or
# result_arr = np.vstack(result_arr)
Your arrays don't really have different dimensions. They have one different dimension, the other one is identical. And in that case you can always stack along the "different" dimension.
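As a quick illustration of that last point, assume two made-up arrays that share the number of columns but differ in the number of rows. Joining along the differing axis works with concatenate or vstack, while stack refuses because it requires identical shapes.

```python
import numpy as np

# Hypothetical per-label results: same column count, different row counts.
parts = [np.zeros((2, 3)), np.ones((4, 3))]

# Joining along axis 0 (the "different" dimension) works.
joined = np.concatenate(parts, axis=0)
joined_v = np.vstack(parts)
print(joined.shape, joined_v.shape)  # (6, 3) (6, 3)

# np.stack needs every input to have exactly the same shape.
try:
    np.stack(parts, axis=0)
    stack_failed = False
except ValueError:
    stack_failed = True
print(stack_failed)  # True
```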