Initialize a Numpy Array

numpy.zeros

Return a new array of given shape and type, filled with zeros.

or

numpy.ones

Return a new array of given shape and type, filled with ones.

or

numpy.empty

Return a new array of given shape and type, without initializing entries.
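A minimal sketch of all three (the 2x3 shape and the dtype argument are chosen just for illustration):

```python
import numpy as np

z = np.zeros((2, 3))             # 2x3 array of 0.0 (float64 by default)
o = np.ones((2, 3), dtype=int)   # 2x3 array of integer 1s
e = np.empty((2, 3))             # 2x3 array; contents are arbitrary until assigned
```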


However, the list-style pattern of building an array by appending elements one at a time is rarely used in numpy, because it's less efficient (numpy datatypes map closely onto the underlying C arrays). Instead, you should preallocate the array at the size you need it to be, and then fill in the rows. You can use numpy.append if you must, though.
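A hedged sketch of the preallocate-then-fill pattern (the shape and the per-row computation are made up for illustration):

```python
import numpy as np

n_rows, n_cols = 5, 3
out = np.empty((n_rows, n_cols))    # allocate once, at the final size

for i in range(n_rows):
    out[i] = np.arange(n_cols) * i  # fill row i in place
```

This avoids the repeated reallocation that np.append incurs, since np.append copies the whole array on every call.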

How to initialize a numpy array with lists

If you wish to use a list of lists:

import numpy as np
l = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
np.array(l)
# array([[1, 2, 3],
#        [2, 3, 4],
#        [3, 4, 5]])

If you have multiple lists of the same length:

import numpy as np
l1 = [1, 2, 3]
l2 = [2, 3, 4]
l3 = [3, 4, 5]
np.array([l1, l2, l3])
# array([[1, 2, 3],
#        [2, 3, 4],
#        [3, 4, 5]])

How to initialize 2D numpy array

Found out the answer myself:
The code below does what I want, and shows that I can build a plain Python list ("a") and have it turn into a numpy array. My other code draws the array to a window; it drew the grid upside down, which is why I added the last line of code.

import numpy

# generate grid
a = []
allZeroes = []
allOnes = []

for i in range(0, 800):
    allZeroes.append(0)
    allOnes.append(1)

# append 400 rows of 800 zeroes per row
for i in range(0, 400):
    a.append(allZeroes)

# append 400 rows of 800 ones per row
for i in range(0, 400):
    a.append(allOnes)

# So this is a 2D 800 x 800 array: zeros on the top half, ones on the bottom half.
array = numpy.array(a)

# Need to flip the array so my other code that draws
# this array will draw it right-side up
array = numpy.flipud(array)
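For comparison, a sketch of the same grid built from NumPy primitives instead of Python loops (same 800 x 800 shape as above):

```python
import numpy as np

# zeros on top, ones on the bottom, then flipped as in the loop version
grid = np.vstack([np.zeros((400, 800)), np.ones((400, 800))])
grid = np.flipud(grid)
```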

NumPy array initialization with tuple (fill with identical tuples)

You can, with the correct dtype. With 'f,f' you can initialise the array with tuples of floats; see Data type objects (dtype) for more.

np.full((3,2), np.nan, dtype='f,f')

array([[(nan, nan), (nan, nan)],
       [(nan, nan), (nan, nan)],
       [(nan, nan), (nan, nan)]], dtype=[('f0', '<f4'), ('f1', '<f4')])
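The same structured dtype also accepts an ordinary tuple as a fill value via slice assignment; a small sketch (the values 1.5 and 2.5 are arbitrary):

```python
import numpy as np

a = np.zeros((3, 2), dtype='f,f')
a[:] = (1.5, 2.5)   # broadcast one tuple into every cell
# each cell is now the pair (1.5, 2.5); its fields are named 'f0' and 'f1'
```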

How do I initialize a numpy array starting at a particular number?

arange takes an optional start argument.

start = 13  # any number works here
np.arange(start, start + 32).reshape(4, 8)

# array([[13, 14, 15, 16, 17, 18, 19, 20],
#        [21, 22, 23, 24, 25, 26, 27, 28],
#        [29, 30, 31, 32, 33, 34, 35, 36],
#        [37, 38, 39, 40, 41, 42, 43, 44]])
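arange also takes an optional step, should you need a stride other than 1:

```python
import numpy as np

np.arange(13, 29, 2)   # start, stop (exclusive), step
# array([13, 15, 17, 19, 21, 23, 25, 27])
```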

Initializing series object using numpy?

When my_data contains non-homogeneous items, e.g. a mixture of numbers
and strings:

labels = ['a', 'b', 'c', 'd', 'e']
my_data = [10, 20, 30, 'xx', 12.55]
arr = np.array(my_data)
s = pd.Series(my_data, index=labels)

and print arr, you will get:

array(['10', '20', '30', 'xx', '12.55'], dtype='<U11')

Note that each item in arr is a string.

At first glance, the same holds for s. When you print it, you will get:

a       10
b       20
c       30
d       xx
e    12.55
dtype: object

When you look at the items themselves, it is not obvious what their type is, but look
at the bottom line: dtype: object.
The first thought is "actually a string", but as a matter of fact it means
"it depends on the particular cell".

To confirm it, take a look at individual cells:

type(s['a']) yields int, type(s['d']) yields str and
type(s['e']) yields float (each of them is a descendant of object).

And now try the homogeneous variant:

my_data = [10, 20, 30, 4.12, 12.55]

(all int or float; the other "initial" instructions stay as above).

Now when you print arr, you will get:

array([10.  , 20.  , 30.  ,  4.12, 12.55])

so all elements are coerced to the closest common type, in this
case float.

When you print s, the result is:

a    10.00
b    20.00
c    30.00
d     4.12
e    12.55
dtype: float64

so its type is inherited from arr.

This time, when you print type(s['a']) (or that of any other cell),
you will get float.

Note also the following difference between a plain Python list and a NumPy array:

  • in a list each element has its own type,
  • in a Numpy array the type is assigned to the array, i.e.
    all its elements have the same type (although they can be subtypes
    of the "basic" type for the whole array).
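This difference can be checked directly (a small sketch with made-up values):

```python
import numpy as np

l = [10, 'xx', 4.12]             # each element keeps its own type
arr = np.array([10, 20, 4.12])   # one dtype for the whole array

print([type(x).__name__ for x in l])   # ['int', 'str', 'float']
print(arr.dtype)                       # float64
```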

So when you create a Series or DataFrame from Numpy array (1-D or 2-D
respectively):

  • a Series object inherits its type from the source array,
  • each column of a DataFrame also inherits its type from this array.

Of course, you can also create a DataFrame from a number of separate 1-D
NumPy arrays (sources for the columns), each with its own type; the
resulting DataFrame will then inherit the source types, separately
for each column, from the respective NumPy arrays.
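A sketch of that last case, with two made-up source arrays of different dtypes:

```python
import numpy as np
import pandas as pd

ints = np.array([1, 2, 3])
floats = np.array([0.5, 1.5, 2.5])

df = pd.DataFrame({'a': ints, 'b': floats})
# df.dtypes: column 'a' has an integer dtype, column 'b' is float64,
# each inherited from its source array
```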

Edit following the question

Only as late as version 1.0 did Pandas introduce some new, experimental
dtypes, among them string (just what you ask for).

Apparently the Pandas authors recognized that there is a need for an "explicit"
string, not "any object, maybe a string".

But these changes are being introduced stepwise, and for now they do not cover
the existing methods that read content from files.
E.g. read_csv operates "the old way": if some column is of a
non-numerical, non-datelike type, the object type is assumed.

To allow conversion of such columns to the "new" dtypes, a convert_dtypes()
method has been added, to be called e.g. after read_csv; it attempts
to change the type of each column to one of the "new" dtypes (where possible).
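A minimal sketch of convert_dtypes() on an object column (requires Pandas >= 1.0; the values are made up):

```python
import pandas as pd

s = pd.Series(['x', 'y', None])   # dtype: object
s2 = s.convert_dtypes()           # dtype: string (the new extension dtype)
```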

To get a more complete picture of what was recently added, and how to use it,
read the Pandas documentation about the new dtypes, the NA scalar and
working with missing data.

Create numpy matrix filled with NaNs

You rarely need loops for vector operations in numpy.
You can create an uninitialized array and assign to all entries at once:

>>> a = numpy.empty((3, 3,))
>>> a[:] = numpy.nan
>>> a
array([[ NaN,  NaN,  NaN],
       [ NaN,  NaN,  NaN],
       [ NaN,  NaN,  NaN]])

I have timed the alternatives a[:] = numpy.nan here and a.fill(numpy.nan) as posted by Blaenk:

$ python -mtimeit "import numpy as np; a = np.empty((100,100));" "a.fill(np.nan)"
10000 loops, best of 3: 54.3 usec per loop
$ python -mtimeit "import numpy as np; a = np.empty((100,100));" "a[:] = np.nan"
10000 loops, best of 3: 88.8 usec per loop

The timings show a preference for ndarray.fill(..) as the faster alternative. OTOH, I like numpy's convenience implementation where you can assign values to whole slices at a time; the code's intention is very clear.

Note that ndarray.fill performs its operation in-place, so numpy.empty((3,3,)).fill(numpy.nan) will instead return None.
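Note also that on any reasonably recent NumPy, numpy.full does the allocate-and-fill in one call:

```python
import numpy as np

a = np.full((3, 3), np.nan)   # allocate and fill with NaN in one step
```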


