NumPy

Introduction

Python was not designed with number crunching in mind. In fact, if you try to use built-in lists for numeric computing, you will soon notice several drawbacks:

Operations with lists are slow
Lists do not store numeric data efficiently
Performing elementwise operations is cumbersome

Consider the following example – a list x containing numbers from 1 to 10:

x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

Let’s try to increment each element in the list by 1. Intuitively, we might try this as follows:

x + 1  # error

TypeError: can only concatenate list (not "int") to list

Apparently, this results in an error. The + operator concatenates two lists or adds two integers, but it does not know what to do with a list and an integer. We could use a list comprehension to perform the calculation:

[n + 1 for n in x]

[2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

Although this works, it is quite a lot to type for such a (seemingly) basic operation. Using the built-in map() function does not make things simpler either:

list(map(lambda n: n + 1, x))

[2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

Even though the array module in the standard library features an array data type (which stores elements of the same type much more efficiently than lists), elementwise operations with a simple syntax are still not supported.
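To see this limitation in practice, here is a minimal sketch using the standard library's array type (the variable names are only illustrative). Adding an integer fails just as it does for lists, so an explicit loop is still needed:

```python
from array import array

# A compact, typed container from the standard library ("i" = signed int)
values = array("i", [1, 2, 3, 4, 5])

# Elementwise addition with a scalar is not defined for array.array
try:
    values + 1
    raised = False
except TypeError:
    raised = True

# We still need an explicit loop (here: a generator expression)
incremented = array("i", (v + 1 for v in values))

print(raised)       # True
print(incremented)  # array('i', [2, 3, 4, 5, 6])
```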

NumPy solves all of these issues. The importance of this third-party package cannot be overstated: without NumPy, Python would not be one of the most popular programming languages for data science today. Many scientific packages like SciPy, pandas, and Scikit-Learn rely on NumPy under the hood, so it makes sense to learn the basics even if you do not plan to work with NumPy directly. This is the goal of this chapter.

The n-dimensional array

Before we discuss NumPy in detail, here is a quick teaser of how you can use it in our previous example:

import numpy as np
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
x + 1

array([ 2,  3,  4,  5,  6,  7,  8,  9, 10, 11])

Voilà! We added 1 to each element in the array by just typing x + 1 – no list comprehensions, lambda functions, or other gymnastics required!

Of course, this does not work out of the box, because we need to import NumPy before we can use it. By convention, this is done with import numpy as np.

The foundation of NumPy is the n-dimensional homogeneous array (ndarray for short, or just array). This data type is a highly efficient container for homogeneous data, which means that all elements must be of the same type (typically, elements are numeric types such as integers or floats). The data is also structured into dimensions (also called axes), so an array can be one-dimensional, two-dimensional, three-dimensional, and so on (n-dimensional in general).

Let’s take a look at our array x that we have just created:

type(x)

<class 'numpy.ndarray'>

Alright, so this is an object of type numpy.ndarray – a NumPy array!

Each object of type numpy.ndarray has a dtype attribute, which indicates the data type of its elements. In our example, all items are integers (represented by 64 bits):

x.dtype

dtype('int64')

Our array looks like a list, so it should have only one dimension. The shape attribute contains the number of elements in each dimension:

x.shape

(10,)

Indeed, x has only one dimension containing 10 elements.

Note

The shape attribute is always a tuple, even if the array has only one dimension. In this case, the tuple contains only one element, which specifies the number of items in that dimension.

In addition to dtype and shape, the following attributes are also available:

ndim contains the number of dimensions (axes) (this is equal to the length of the shape tuple)
size contains the total number of elements in the array (this is equal to the product of the individual shape elements)
itemsize contains the size of one element in bytes (this is normally apparent from the dtype, for example int64 means that one element occupies 64 bits, which corresponds to 8 bytes)
nbytes contains the total size of the array in bytes (this is equal to size * itemsize)

These are the attributes for our example array x:

x.ndim

1

x.size

10

x.itemsize

8

x.nbytes

80
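The relationships between these attributes can also be checked on a two-dimensional array. The following sketch uses a small, made-up array of floats (the name a is just for illustration):

```python
import numpy as np

# A hypothetical 2 x 3 array of 64-bit floats
a = np.array([[1.5, 2.5, 3.5], [4.5, 5.5, 6.5]])

print(a.shape)     # (2, 3)
print(a.ndim)      # 2, the length of the shape tuple
print(a.size)      # 6, the product of the shape elements (2 * 3)
print(a.itemsize)  # 8, because float64 occupies 8 bytes
print(a.nbytes)    # 48, which equals size * itemsize
```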

Creating arrays

From existing sequences

The np.array() function takes a sequence (such as a list) and produces a NumPy array. We already saw how this function generates a one-dimensional array from a simple list of numbers:

x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
x

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

If we pass a list of lists, we can create a two-dimensional array:

y = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]])
y

array([[ 1,  2],
       [ 3,  4],
       [ 5,  6],
       [ 7,  8],
       [ 9, 10]])

This array consists of five rows and two columns, as can be verified by inspecting its shape:

y.shape

(5, 2)

In other words, the first dimension contains five elements, and the second dimension contains two elements.

Tip

In any n-dimensional array with at least two dimensions, the last two dimensions can be interpreted as rows and columns, respectively.

More deeply nested lists are mapped to additional dimensions. Here’s a three-dimensional array:

z = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
z

array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

The shape of z is:

z.shape

(2, 2, 3)

This can be interpreted as two tables with two rows and three columns each (remember that the last two dimensions correspond to rows and columns).

All arrays we have created so far contain integers (because the lists we used to initialize them contained only integers):

x.dtype

dtype('int64')

y.dtype

dtype('int64')

z.dtype

dtype('int64')

We can also create arrays consisting of floating point numbers, for example:

f = np.array([[1.1, 5.2, -8.3], [-4.4, 15.5, 9.6]])
f

array([[ 1.1,  5.2, -8.3],
       [-4.4, 15.5,  9.6]])

f.dtype

dtype('float64')

We can even specify the desired data type explicitly:

g = np.array([1, 2, 3], dtype=float)
g

array([1., 2., 3.])

Notice the decimal points in the output; the data type is therefore:

g.dtype

dtype('float64')

Besides the standard Python data types int and float, NumPy also supports more specific types such as np.int32, np.int64, np.float32, and np.float64. In fact, float maps to np.float64, and on most platforms int maps to np.int64 (older NumPy versions on Windows used np.int32 instead).
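As a short illustration of these more specific types, the following sketch creates arrays with smaller element types and converts one of them with the astype() method (which is not covered elsewhere in this chapter and returns a converted copy):

```python
import numpy as np

small = np.array([1, 2, 3], dtype=np.int8)      # 8-bit integers
single = np.array([1, 2, 3], dtype=np.float32)  # 32-bit floats

print(small.itemsize)   # 1 (byte per element)
print(single.itemsize)  # 4 (bytes per element)

# astype() returns a new array; the original keeps its data type
as_float = small.astype(np.float64)
print(as_float.dtype)  # float64
print(small.dtype)     # int8
```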

Number ranges

The np.arange() function creates a one-dimensional array with equally-spaced numbers:

np.arange(10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

The function accepts custom values for the start, end, and step size of the sequence:

np.arange(start=3, stop=11, step=0.8)

array([ 3. ,  3.8,  4.6,  5.4,  6.2,  7. ,  7.8,  8.6,  9.4, 10.2])

Note

The stop value is exclusive, so the sequence will end before reaching this value.
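A quick sketch with integer steps makes the exclusive stop value easy to see. If the stop value should be part of the result, one common workaround (an assumption here, not something prescribed by NumPy) is to extend stop by one step:

```python
import numpy as np

evens = np.arange(0, 10, 2)  # 10 itself is not included
print(evens)  # [0 2 4 6 8]

# Extending stop by one step includes the value 10
with_stop = np.arange(0, 10 + 2, 2)
print(with_stop)  # [ 0  2  4  6  8 10]
```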

If you need a specific number of equally-spaced values within a given range, np.linspace() and np.logspace() are useful. For example, the following command creates an array with eight equally-spaced values between 13 and 14:

np.linspace(start=13, stop=14, num=8)

array([13.        , 13.14285714, 13.28571429, 13.42857143, 13.57142857,
       13.71428571, 13.85714286, 14.        ])

Note

Here, the stop value is inclusive, so the sequence will end at this value.
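If the stop value should be excluded after all, np.linspace() accepts an endpoint argument. The following sketch compares both variants on a simple range from 0 to 1:

```python
import numpy as np

# Default: stop is included, so the values are 0, 0.25, 0.5, 0.75, 1
closed = np.linspace(0, 1, num=5)

# endpoint=False: stop is excluded, so the values are 0, 0.2, 0.4, 0.6, 0.8
half_open = np.linspace(0, 1, num=5, endpoint=False)

print(closed[-1])     # 1.0
print(half_open[-1])  # 0.8
```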

Tip

The np.linspace() function is helpful when you want to evaluate a function at many points. The following example plots a complete period of a sine using 100 equally-spaced points:

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
x = np.linspace(0, 2 * np.pi, 100)  # 100 values from 0 to 2𝜋
ax.plot(x, np.sin(x))

Note that we use Matplotlib to plot the sine function values for these 100 points with a line graph, but don’t worry too much about the plot right now. We will discuss Matplotlib in more detail later.

Next, let’s create a logarithmically-spaced array with 16 numbers ranging from 10⁻¹ to 10⁷:

np.logspace(start=-1, stop=7, num=16)

array([1.00000000e-01, 3.41454887e-01, 1.16591440e+00, 3.98107171e+00,
       1.35935639e+01, 4.64158883e+01, 1.58489319e+02, 5.41169527e+02,
       1.84784980e+03, 6.30957344e+03, 2.15443469e+04, 7.35642254e+04,
       2.51188643e+05, 8.57695899e+05, 2.92864456e+06, 1.00000000e+07])

Note how numbers are automatically displayed in scientific notation to accommodate the broad range with a fixed number of digits.
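One way to think about np.logspace() is that start and stop are exponents (with base 10 by default), and the exponents themselves are linearly spaced. The following sketch checks this interpretation against the example above:

```python
import numpy as np

exponents = np.linspace(-1, 7, num=16)  # linearly spaced exponents
manual = 10.0 ** exponents              # raise 10 to each exponent
auto = np.logspace(-1, 7, num=16)

print(np.allclose(manual, auto))  # True
```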

Filled arrays

Sometimes, it is necessary to create an array consisting of all zeros, all ones, or any arbitrary fixed value. This can be achieved with np.zeros(), np.ones(), and np.full(). If you only want to pre-allocate an array of a given size and do not care which values it contains initially, you can use np.empty().

np.zeros((2, 3))  # 2 rows, 3 columns

array([[0., 0., 0.],
       [0., 0., 0.]])

Note

In the previous example, we passed the tuple (2, 3) as the first argument to create the desired array. To prevent Python from interpreting these numbers as separate arguments, it is necessary to enclose this tuple within parentheses. In other words, np.zeros(2, 3) does not work! To make it really explicit, we can use a keyword argument in our function call:

np.zeros(shape=(2, 3))

The default data type is np.float64 (a floating point number with 64 bits, also known as double), but you can specify the desired type with the dtype argument:

np.zeros((2, 3), dtype=np.int64)

array([[0, 0, 0],
       [0, 0, 0]])

Creating an array with all ones works similarly:

np.ones((3, 3))

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

Next, we create an array filled with the number 61:

np.full((2, 2), 61)

array([[61, 61],
       [61, 61]])

We can also just allocate an array if we do not care about its initial values (which is usually a bit faster than filling with some predefined value):

np.empty((4, 3))

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

Note that those values are arbitrary and might be different on your computer, so we have to make sure to populate the array later with the desired values.
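A typical pattern is to allocate the array first and then overwrite every element. Here is a minimal sketch with made-up values, which assigns complete rows using indexing (discussed later in this chapter):

```python
import numpy as np

result = np.empty((2, 3))  # contents are arbitrary at this point

# Overwrite every element before reading anything from the array
result[0] = np.arange(3)       # first row: 0, 1, 2
result[1] = np.arange(3) * 10  # second row: 0, 10, 20

print(result.tolist())  # [[0.0, 1.0, 2.0], [0.0, 10.0, 20.0]]
```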

Finally, np.eye() is a nice shortcut to create a square two-dimensional “identity” array (with ones on the diagonal and zeros elsewhere):

np.eye(5)

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])

Random numbers

The numpy.random module provides functions to generate (pseudo-)random numbers from a variety of probability distributions. This works as follows: first, we create a generator, which we can then use to draw random numbers from a specific probability distribution. In the following example, we use the default generator provided by default_rng(). We also set the random seed of this generator (42 in this example, but the specific value does not matter), which means that we will get the exact same random numbers every time we run our code. This is important for reproducible results.

from numpy.random import default_rng

rng = default_rng(42)
x1 = rng.standard_normal(size=10)
x2 = rng.uniform(size=(2, 3))
x3 = rng.integers(low=-3, high=99, size=(2, 5))

x1

array([ 0.30471708, -1.03998411,  0.7504512 ,  0.94056472, -1.95103519,
       -1.30217951,  0.1278404 , -0.31624259, -0.01680116, -0.85304393])

x2

array([[0.37079802, 0.92676499, 0.64386512],
       [0.82276161, 0.4434142 , 0.22723872]])

x3

array([[ 6, 53, 87,  3, 84],
       [81, 25, 61, 13, 74]])
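The effect of the seed can be verified directly. In the following sketch, two generators created with the same seed produce identical numbers, whereas a different seed (43, chosen arbitrarily) yields a different sequence:

```python
import numpy as np
from numpy.random import default_rng

first = default_rng(42).standard_normal(size=5)
second = default_rng(42).standard_normal(size=5)  # same seed
other = default_rng(43).standard_normal(size=5)   # different seed

print(np.array_equal(first, second))  # True
print(np.array_equal(first, other))   # False
```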

Shape and reshape

Every array has a certain shape, which determines how its values are structured into dimensions. The desired shape can be passed as an argument when creating an array, but it can also be changed later on. The shape attribute of an array returns the current shape as a tuple, listing the number of elements in each dimension.

We already saw how to create various arrays with different shapes using the shape argument:

np.zeros(shape=(2, 3))

array([[0., 0., 0.],
       [0., 0., 0.]])

np.ones(shape=(3, 2))

array([[1., 1.],
       [1., 1.],
       [1., 1.]])

np.full(shape=(2, 4), fill_value=2)

array([[2, 2, 2, 2],
       [2, 2, 2, 2]])

np.empty(shape=(2, 3))

array([[1., 1., 1.],
       [1., 1., 1.]])

Functions that create arrays with random numbers use the size argument to specify the desired shape (in the following examples, we are re-using the rng generator created in the previous section):

rng.standard_normal(size=(2, 5))

array([[-0.68092954,  1.22254134, -0.15452948, -0.42832782, -0.35213355],
       [ 0.53230919,  0.36544406,  0.41273261,  0.430821  ,  2.1416476 ]])

rng.uniform(size=(2, 2))

array([[0.96750973, 0.32582536],
       [0.37045971, 0.46955581]])

rng.integers(low=-3, high=99, size=(2, 5))

array([[78, 16, 44, 10, 67],
       [45, 30, 20, 54, 65]])

However, we can always change the current shape of an array to a new compatible shape. Here, compatible means that the total number of elements must stay the same, so we could reshape an array with 3 rows and 4 columns to 6 rows and 2 columns (the total number of elements remains 12).

There are three main ways to change the shape of an array:

Assign a new value (tuple) to the shape attribute
Use the resize() method
Use the reshape() method

Let’s take a look at each of these options in turn with the following (3, 4) example array:

x = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
x

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

x.shape

(3, 4)

If we want to reshape this array to 4 rows and 3 columns, we can directly manipulate its shape:

x.shape = (4, 3)
x

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

This modifies the shape of the array in place. Note that the way the array is stored in memory does not change, so this is a very fast operation.

Another way to change the shape is to call the resize() method with the new shape, which will also modify the array in place:

x.resize((2, 6))
x

array([[ 1,  2,  3,  4,  5,  6],
       [ 7,  8,  9, 10, 11, 12]])

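Finally, the reshape() method takes a new shape just like resize(), but it does not modify the array in place. Instead, it returns a new array object with the requested shape. Whenever possible, this new array is a view that shares its data with the original, so no values are copied; NumPy only makes a copy if the requested shape cannot be expressed as a view: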

x.reshape((3, 4))

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

x  # still (2, 6)

array([[ 1,  2,  3,  4,  5,  6],
       [ 7,  8,  9, 10, 11, 12]])
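Whether reshape() actually copies the underlying data can be checked with np.shares_memory() (a function not used elsewhere in this chapter). A minimal sketch, assuming a contiguous array like the one above; in this case the reshaped array turns out to be a view of the same data:

```python
import numpy as np

x = np.arange(1, 13).reshape((2, 6))
r = x.reshape((3, 4))

print(r is x)                  # False: reshape() returns a new array object
print(np.shares_memory(x, r))  # True: both objects use the same data here
print(x.shape, r.shape)        # (2, 6) (3, 4)
```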

Of course you can always re-bind the existing name to the new array like this:

x = x.reshape((3, 4))
x

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

We do not even have to specify the complete shape when using the reshape() method (this also works when assigning to the shape attribute, but resize() does not accept -1). Since the total number of elements must stay the same, we can set one dimension in the new shape tuple to -1, which means that its size will be calculated automatically:

x.reshape((6, -1))  # -1 is inferred to mean 2 here

array([[ 1,  2],
       [ 3,  4],
       [ 5,  6],
       [ 7,  8],
       [ 9, 10],
       [11, 12]])
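A few more variations, sketched with the same 12 elements: -1 can appear in any position, a single -1 flattens the array, and a shape that does not divide the number of elements evenly raises an error:

```python
import numpy as np

x = np.arange(1, 13).reshape((3, 4))

print(x.reshape((2, -1)).shape)  # (2, 6)
print(x.reshape((-1, 3)).shape)  # (4, 3)
print(x.reshape(-1).shape)       # (12,), a flat one-dimensional array

# 12 elements cannot be arranged in 5 rows
try:
    x.reshape((5, -1))
    failed = False
except ValueError:
    failed = True
print(failed)  # True
```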

Indexing and slicing

We can pull out one or more items of an array using indexing and slicing. This works very similarly to lists, at least for one-dimensional arrays. Let’s start with a simple one-dimensional array:

x = np.arange(3, 19, 2, dtype=np.int64)
x

array([ 3,  5,  7,  9, 11, 13, 15, 17])

Python uses zero-based indexing, so the first item is given by:

x[0]

np.int64(3)

Similarly, we can index other positions of the array:

x[2]

np.int64(7)

Negative indexes count from the end of the array, so the last item is:

x[-1]

np.int64(17)

Slices pull out multiple elements with the : operator to indicate the desired range (and an optional step size). Note that the stop index is exclusive.

x[1:5]  # start and stop

array([ 5,  7,  9, 11])

x[1:5:2]  # start, stop, and step

array([5, 9])

x[::-1]

array([17, 15, 13, 11,  9,  7,  5,  3])
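To illustrate the similarity to lists, the following sketch applies the same slices to a list and to an array with identical contents (tolist() converts the array result back to a list for comparison):

```python
import numpy as np

values = [3, 5, 7, 9, 11, 13, 15, 17]
arr = np.array(values)

print(values[1:5:2])        # [5, 9]
print(arr[1:5:2].tolist())  # [5, 9]

print(values[-3:])          # [13, 15, 17]
print(arr[-3:].tolist())    # [13, 15, 17]
```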

When there is more than one dimension, we can provide indices or slices for each dimension (separated by commas):

y = np.arange(12).reshape((3, 4))
y

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

y[1, 0]  # row 1, column 0

np.int64(4)

y[1:, 1:3]  # rows 1 through the last, columns 1 and 2

array([[ 5,  6],
       [ 9, 10]])

y[:, 3]  # column 3 (the fourth one)

array([ 3,  7, 11])

y[1, :]  # row 1 (the second row)

array([4, 5, 6, 7])

If you do not provide indices or slices for some axes (or just use :), these are considered complete slices (so all elements in the missing dimensions are selected):

y[0]  # first row, equivalent to y[0, :]

array([0, 1, 2, 3])

If you want to skip multiple axes, instead of providing a : for each axis, you can also use ... (three dots, also called ellipsis). For example, let’s create a five-dimensional array:

x = np.arange(720).reshape((3, 4, 5, 2, 6))  # five dimensions (axes)

The following slices can be abbreviated using ... notation:

x[1, 2, :, :, :] is equal to x[1, 2, ...] and x[1, 2]
x[:, 1, :, :, 4] is equal to x[:, 1, ..., 4]
x[:, :, :, :, 3] is equal to x[..., 3]
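These equivalences can be verified with np.array_equal(), which compares two arrays element by element. The following sketch also prints the resulting shapes, showing that indexed axes are dropped:

```python
import numpy as np

x = np.arange(720).reshape((3, 4, 5, 2, 6))

same_1 = np.array_equal(x[1, 2, :, :, :], x[1, 2, ...])
same_2 = np.array_equal(x[:, 1, :, :, 4], x[:, 1, ..., 4])
same_3 = np.array_equal(x[:, :, :, :, 3], x[..., 3])
print(same_1, same_2, same_3)  # True True True

print(x[1, 2].shape)          # (5, 2, 6)
print(x[:, 1, ..., 4].shape)  # (3, 5, 2)
print(x[..., 3].shape)        # (3, 4, 5, 2)
```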

Fancy indexing

In contrast to lists, we can even use arrays (or lists) as indices inside the square brackets to pull out several individual elements. This is called fancy indexing.

x = np.arange(10, 18, dtype=np.int64)
x

array([10, 11, 12, 13, 14, 15, 16, 17])

x[[1, 5, 1, 0]]  # elements 1, 5, 1, and 0

array([11, 15, 11, 10])

It is also possible to use boolean values in fancy indexing. This can be used to filter values in an array, because the result will exclusively contain values corresponding to True locations:

x[[True, False, False, False, True, False, True, False]]

array([10, 14, 16])

Because every comparison yields a boolean array, this approach can be used to filter an array by a condition:

x > 15  # boolean array

array([False, False, False, False, False, False,  True,  True])

x[x > 15]

array([16, 17])

We can even use indexing in an assignment. For example, we could set all odd numbers to -1 like this:

x[x % 2 != 0] = -1  # % is the remainder operator
x

array([10, -1, 12, -1, 14, -1, 16, -1])
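Boolean arrays can also be combined before using them as an index. This goes slightly beyond the examples above: the sketch uses the elementwise operators & (and) and ~ (not), and each comparison needs its own parentheses because of operator precedence:

```python
import numpy as np

x = np.arange(10, 18)

# True where both conditions hold
mask = (x > 11) & (x < 16)
print(x[mask])  # [12 13 14 15]

# ~mask inverts the boolean array, so all other elements are set to 0
x[~mask] = 0
print(x)  # [ 0  0 12 13 14 15  0  0]
```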

Array operations

Elementwise operations

As a general rule, NumPy carries out operations element by element. If two arrays have identical shapes, this is pretty straightforward:

x = np.arange(1, 7).reshape((2, 3))
x

array([[1, 2, 3],
       [4, 5, 6]])

y = np.arange(7, 13).reshape((2, 3))
y

array([[ 7,  8,  9],
       [10, 11, 12]])

x + y

array([[ 8, 10, 12],
       [14, 16, 18]])

x - y

array([[-6, -6, -6],
       [-6, -6, -6]])

x * y

array([[ 7, 16, 27],
       [40, 55, 72]])

x / y

array([[0.14285714, 0.25      , 0.33333333],
       [0.4       , 0.45454545, 0.5       ]])

These operations are vectorized, which means that they are automatically applied to all array elements without us having to write a manual loop. Because the actual loops run in compiled code inside NumPy, vectorized operations are typically much faster than equivalent Python loops (their speed is comparable to C or Fortran).

Important

Whenever possible, avoid loops over arrays! Loops are very slow and inefficient compared to vectorized operations. It is very rarely necessary to use loops with NumPy arrays, so if you find yourself writing a loop, consider whether you can use vectorized operations instead.
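As an illustration, the following sketch computes the same result twice, once with an explicit loop and once with a vectorized expression (the calculation itself is arbitrary). Both give identical values, but the vectorized version is shorter and avoids the Python-level loop:

```python
import numpy as np

x = np.arange(1_000)

# Explicit loop: every element is processed individually in Python
looped = np.empty_like(x)
for i in range(x.size):
    looped[i] = x[i] ** 2 + 1

# Vectorized: one expression operates on the whole array
vectorized = x ** 2 + 1

print(np.array_equal(looped, vectorized))  # True
```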

Vector and matrix operations

Note that NumPy does not attach special meanings to 1D or 2D arrays. A 1D array is not interpreted as a vector, and similarly, a 2D array is not interpreted as a matrix. However, it is of course possible to perform vector and matrix operations with special notation. Specifically, the dot product between two vectors can be computed with the @ operator:

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
a @ b

np.int64(32)

Alternatively, we can use the np.dot() function or the .dot() method of an array:

np.dot(a, b)

np.int64(32)

a.dot(b)

np.int64(32)

Similarly, the @ operator computes matrix multiplication if the operands are 2D arrays:

A = np.arange(6).reshape((2, 3))
B = np.arange(6).reshape((3, 2))
A @ B

array([[10, 13],
       [28, 40]])

Note that if the shapes are not compatible with matrix multiplication (the number of columns of the first operand must equal the number of rows of the second operand), we will get an error.
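The following sketch contrasts matrix multiplication with elementwise multiplication for the arrays defined above and shows what happens when the shapes are not compatible:

```python
import numpy as np

A = np.arange(6).reshape((2, 3))
B = np.arange(6).reshape((3, 2))

print((A @ B).shape)  # (2, 2): (2, 3) times (3, 2)
print((B @ A).shape)  # (3, 3): (3, 2) times (2, 3)
print((A * A).shape)  # (2, 3): * multiplies element by element

# (2, 3) times (2, 3) is not a valid matrix product
try:
    A @ A
    failed = False
except ValueError:
    failed = True
print(failed)  # True
```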

In summary, we saw that if two arrays have the same shape, operations are performed element by element. Things get a little more complicated when the two arrays have different shapes – we will discuss this so-called broadcasting soon. But before that, there’s even more we can do with just a single array using its special array methods.

Array methods

There are many array methods that perform some kind of computation across all elements (regardless of the shape), such as calculating the total sum, minimum, maximum, or mean:

x = np.arange(1, 13).reshape((3, 4))
x

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

x.sum()

np.int64(78)

x.mean()

np.float64(6.5)

However, we can also perform these operations along specific dimensions. For example, by computing the sum over axis 0 (the rows), we get the column sums:

x.sum(axis=0)

array([15, 18, 21, 24])

Similarly, we can compute the row means if we perform the operation over the columns (axis 1):

x.mean(axis=1)

array([ 2.5,  6.5, 10.5])

Tip

Remember that the specified axis will disappear from the result!

This also works for more than one axis simultaneously:

y = np.arange(60).reshape((4, 3, 5))  # four 3 x 5 arrays
y.mean(axis=(1, 2))  # compute the means of the four arrays

array([ 7., 22., 37., 52.])

In this example, y has shape (4, 3, 5). Because we compute the mean across axes 1 and 2 (and remembering that Python starts counting at zero), this leaves only axis 0. Because axis 0 has four elements, we get four individual means.
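A simple way to keep track of the axes is to look at the shape of the result. The following sketch uses the same array y and sums over different axes:

```python
import numpy as np

y = np.arange(60).reshape((4, 3, 5))

print(y.sum(axis=0).shape)       # (3, 5): axis 0 disappears
print(y.sum(axis=1).shape)       # (4, 5): axis 1 disappears
print(y.sum(axis=2).shape)       # (4, 3): axis 2 disappears
print(y.sum(axis=(1, 2)).shape)  # (4,): axes 1 and 2 disappear
print(y.sum().shape)             # (): no axes remain, the result is a scalar
```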

Universal functions

Other useful functions are directly available in the numpy namespace, for example np.sin(), np.cos(), np.exp(), np.sqrt(), and so on. They are vectorized (and therefore operate on all elements) and are referred to as universal functions (ufuncs for short).

np.sqrt(x)

array([[1.        , 1.41421356, 1.73205081, 2.        ],
       [2.23606798, 2.44948974, 2.64575131, 2.82842712],
       [3.        , 3.16227766, 3.31662479, 3.46410162]])
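Universal functions keep the shape of their input and can be combined freely. The following sketch (with arbitrarily chosen values) uses np.allclose() to compare results, because floating point calculations are usually not exact:

```python
import numpy as np

angles = np.array([0, np.pi / 2, np.pi])
sines = np.sin(angles)
print(np.allclose(sines, [0, 1, 0]))  # True

# The result has the same shape as the input
x = np.arange(1, 13).reshape((3, 4))
roots = np.sqrt(x)
print(roots.shape)                # (3, 4)
print(np.allclose(roots ** 2, x)) # True
```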

Broadcasting

Even when two array shapes are different, calculations still work if certain conditions are met (meaning that their shapes are compatible). The rules for this so-called broadcasting are:

If two arrays have a different number of dimensions, the shape of the array with fewer dimensions is padded with dimensions of size 1 on the left until both arrays have the same number of dimensions.
Dimensions with size 1 are then stretched (their elements are repeated) to match the size of the other array in that dimension.

If the shapes of the two arrays are the same after broadcasting, NumPy is able to compute the result. If the dimensions still do not match, the operation will result in an error (“operands could not be broadcast together”).
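These rules can be tried out without creating any arrays. The following sketch uses np.broadcast_shapes(), which is available in NumPy 1.20 or later (an assumption about the installed version) and returns the shape that results from broadcasting the given shapes:

```python
import numpy as np

print(np.broadcast_shapes((2, 3), (3,)))       # (2, 3)
print(np.broadcast_shapes((3, 2, 3), (2, 3)))  # (3, 2, 3)
print(np.broadcast_shapes((3, 1), (4,)))       # (3, 4)

# Shapes (2, 4) and (3,) are not compatible
try:
    np.broadcast_shapes((2, 4), (3,))
    compatible = True
except ValueError:
    compatible = False
print(compatible)  # False
```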

Let’s illustrate these rules with some examples:

x = np.ones((2, 3), dtype=int)  # shape (2, 3)
y = np.array([1, 2, 3])  # shape (3,) -> (1, 3) -> (2, 3)
x + y  # works

array([[2, 3, 4],
       [2, 3, 4]])

x = np.ones((2, 4), dtype=int)  # shape (2, 4)
y = np.array([1, 2, 3])  # shape (3,) -> (1, 3) -> (2, 3)
x + y  # does not work because shapes (2, 3) and (2, 4) do not match

ValueError: operands could not be broadcast together with shapes (2,4) (3,)

x = np.arange(18).reshape((3, 2, 3))  # shape (3, 2, 3)
y = np.arange(6).reshape((2, 3))  # shape (2, 3) -> (1, 2, 3) -> (3, 2, 3)
x + y  # works

array([[[ 0,  2,  4],
        [ 6,  8, 10]],

       [[ 6,  8, 10],
        [12, 14, 16]],

       [[12, 14, 16],
        [18, 20, 22]]])
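Broadcasting can also stretch both operands at the same time. In the following sketch (with made-up values), a column of shape (3, 1) and a one-dimensional array of shape (4,) are combined into a (3, 4) table:

```python
import numpy as np

col = np.arange(3).reshape((3, 1))  # shape (3, 1)
row = np.arange(4)                  # shape (4,) -> (1, 4)

# (3, 1) and (1, 4) are both stretched to (3, 4)
table = col * 10 + row
print(table.shape)  # (3, 4)
print(table)
# [[ 0  1  2  3]
#  [10 11 12 13]
#  [20 21 22 23]]
```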

Useful functions

This section summarizes some useful functions that we have not discussed so far.

Finding unique values

A common task in many data analysis pipelines is to determine the unique elements in an array. The np.unique() function does exactly that (and more):

x = np.array([5, 7, 2, 5, 1, 3, 5, 5, 2, 1, 7, 7, 2, 2])
np.unique(x)

array([1, 2, 3, 5, 7])

It is also possible to count the number of items for each unique value:

np.unique(x, return_counts=True)

(array([1, 2, 3, 5, 7]), array([2, 4, 1, 4, 3]))

Used like this, the function returns a tuple, where the first element corresponds to the unique elements, and the second element contains their frequencies. In this example, we can see that 1 occurs 2 times, 2 occurs 4 times, and so on.
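The returned tuple can be unpacked into two names and, if desired, combined into a dictionary that maps each unique value to its frequency. This is just one possible way to post-process the result:

```python
import numpy as np

x = np.array([5, 7, 2, 5, 1, 3, 5, 5, 2, 1, 7, 7, 2, 2])
values, counts = np.unique(x, return_counts=True)

# tolist() converts NumPy integers to plain Python integers
frequencies = dict(zip(values.tolist(), counts.tolist()))
print(frequencies)  # {1: 2, 2: 4, 3: 1, 5: 4, 7: 3}

print(len(values))  # 5, the number of unique elements
```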

Repeating an array

The np.tile() function creates a new array by repeating a given array a certain number of times:

a = np.eye(2, dtype=int)
a

array([[1, 0],
       [0, 1]])

np.tile(a, 2)

array([[1, 0, 1, 0],
       [0, 1, 0, 1]])

np.tile(a, (2, 1))

array([[1, 0],
       [0, 1],
       [1, 0],
       [0, 1]])

np.tile(a, (2, 4))

array([[1, 0, 1, 0, 1, 0, 1, 0],
       [0, 1, 0, 1, 0, 1, 0, 1],
       [1, 0, 1, 0, 1, 0, 1, 0],
       [0, 1, 0, 1, 0, 1, 0, 1]])

Sorting an array

The sort() method sorts an array in place:

x = rng.integers(low=0, high=100, size=15)
x

array([94, 43, 16, 83, 62, 70,  9, 31, 76, 83, 43, 80, 84, 38, 89])

x.sort()
x

array([ 9, 16, 31, 38, 43, 43, 62, 70, 76, 80, 83, 83, 84, 89, 94])
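If the original order should be preserved, the np.sort() function (as opposed to the sort() method) returns a sorted copy and leaves its input untouched. A short sketch with a few made-up values:

```python
import numpy as np

x = np.array([94, 43, 16, 83, 62])

sorted_copy = np.sort(x)  # returns a new, sorted array
print(sorted_copy)  # [16 43 62 83 94]
print(x)            # [94 43 16 83 62]

x.sort()  # sorts x itself
print(x)  # [16 43 62 83 94]
```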

Sampling from an array

Given an array and a random generator, it is possible to create a random sample from the array (with or without replacement) using the choice() method of the random generator:

x = np.array([0, 1])
rng.choice(x, size=20)  # rng is defined in a previous example

array([0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1])

rng.choice(np.arange(20), size=10, replace=False)

array([19,  0,  8, 17,  1,  3, 14, 11,  6,  7])
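Sampling without replacement means that no element can be drawn twice. The following sketch checks this and also shows that requesting more elements than the array contains fails in this case (the generator is re-created here so that the example is self-contained):

```python
import numpy as np
from numpy.random import default_rng

rng = default_rng(42)

sample = rng.choice(np.arange(20), size=10, replace=False)
print(len(np.unique(sample)))  # 10, so all sampled values are distinct

# We cannot draw 10 distinct values from only 5 elements
try:
    rng.choice(np.arange(5), size=10, replace=False)
    too_large = False
except ValueError:
    too_large = True
print(too_large)  # True
```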

Shuffling an array

Similarly, the shuffle() method shuffles a given array in place:

x = np.arange(10)
rng.shuffle(x)
x

array([6, 9, 3, 5, 1, 8, 0, 4, 7, 2])
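If the original array should stay unchanged, the permutation() method of the generator returns a shuffled copy instead. A minimal sketch (again with a freshly created generator):

```python
import numpy as np
from numpy.random import default_rng

rng = default_rng(42)
x = np.arange(10)

shuffled = rng.permutation(x)  # shuffled copy, x is not modified

print(np.array_equal(x, np.arange(10)))      # True
print(np.array_equal(np.sort(shuffled), x))  # True, same elements
```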

Additional resources

The official NumPy website contains excellent documentation and many tutorials. I specifically recommend the following tutorials for beginners:

If you are coming from MATLAB, this tutorial is for you:

NumPy for MATLAB users

Exercises

100 NumPy Exercises (Solutions)