Lecture 4: NumPy and Matplotlib
These are two of the most fundamental parts of the scientific Python "ecosystem". Almost everything else is built on top of them.
To install Matplotlib
Open a Terminal window, activate the rcaes_env environment, and install Matplotlib from the conda-forge channel:
conda activate rcaes_env
conda install -c conda-forge matplotlib
import numpy as np
What did we just do? We imported a package. This brings new variables (mostly functions) into our interpreter, and we access them through the np prefix:
# find out what's in numpy
dir(np)
# find out what version we have
np.__version__
The NumPy documentation is crucial!
The core class is the NumPy ndarray (n-dimensional array): a multidimensional container of homogeneous items, i.e. all values in the array have the same type and size. These arrays can be one-dimensional (a single row or column vector), two-dimensional (m rows x n columns), three-dimensional (arrays within arrays), or have even more dimensions.
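To see both properties (number of dimensions and a single shared dtype) in action, here is a short sketch using small made-up arrays. The ndim attribute reports the number of dimensions, and building an array from a mix of integers and floats shows NumPy choosing one common dtype for every element.

```python
import numpy as np

# Arrays of increasing dimensionality; .ndim reports the number of dimensions
v = np.array([1, 2, 3])                 # 1-D
m = np.array([[1, 2, 3], [4, 5, 6]])    # 2-D: 2 rows x 3 columns
t = np.zeros((2, 3, 4))                 # 3-D
print(v.ndim, m.ndim, t.ndim)           # 1 2 3

# All elements share one dtype: mixing ints and floats upcasts everything to float
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)                      # float64
```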
Create an array from a list
# create an array from a list
a = np.array([9,0,2,1,0])
# find out the datatype
a.dtype
# find out the shape
a.shape
# what type of object is the shape? (a tuple)
type(a.shape)
# another array with a different datatype and shape
b = np.array([[5,3,1,9],[9,2,3,0]], dtype=np.float64)
# array with 3 rows x 4 columns
a_2d = np.array([[3,2,0,1],[9,1,8,7],[4,0,1,6]])
array([[3, 2, 0, 1], [9, 1, 8, 7], [4, 0, 1, 6]])
# check dtype and shape
b.dtype, b.shape
(dtype('float64'), (2, 4))
Important Concept: The fastest varying dimension is the last dimension! The outer level of the hierarchy is the first dimension. (This is called "C-style" or row-major ordering.)
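One way to see this ordering, sketched with the a_2d array from above: flattening with ravel() walks through the elements in C order, so the column index changes fastest. Fortran ("F") order is shown only for contrast.

```python
import numpy as np

a_2d = np.array([[3, 2, 0, 1],
                 [9, 1, 8, 7],
                 [4, 0, 1, 6]])

# ravel() flattens in C order by default: the last index (column) changes fastest
print(a_2d.ravel())           # [3 2 0 1 9 1 8 7 4 0 1 6]

# For comparison, Fortran ("F") order lets the first index (row) change fastest
print(a_2d.ravel(order='F'))  # [3 9 4 2 1 0 0 8 1 1 7 6]
```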
Create arrays using functions
There are lots of ways to create arrays.
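# create some uniform arrays
c = np.zeros((9,9))                          # 9x9 array of zeros (float64)
d = np.ones((3,6,3), dtype=np.complex128)    # 3-D array of complex ones
e = np.full((3,3), np.pi)                    # 3x3 array filled with pi
e_ones = np.ones_like(c)                     # ones with the same shape and dtype as c
f = np.zeros_like(d)                         # zeros with the same shape and dtype as d
g = np.random.rand(3,4)                      # 3x4 uniform random values in [0, 1)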
The np.arange() function is used to generate an array with evenly spaced values within a given interval. np.arange() can be used with one, two, or three parameters to specify the start, stop, and step values. If only one value is passed to the function, it will be interpreted as the stop value:
# create some ranges
np.arange(10)
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
# arange is left inclusive, right exclusive
np.arange(2, 4, 0.25)
array([2. , 2.25, 2.5 , 2.75, 3. , 3.25, 3.5 , 3.75])
Similarly, the np.linspace() function is used to construct an array with evenly spaced numbers over a given interval. However, instead of the step parameter, np.linspace() takes a num parameter to specify the number of samples within the given interval:
# linearly spaced
np.linspace(2, 4, 20)
array([2. , 2.10526316, 2.21052632, 2.31578947, 2.42105263, 2.52631579, 2.63157895, 2.73684211, 2.84210526, 2.94736842, 3.05263158, 3.15789474, 3.26315789, 3.36842105, 3.47368421, 3.57894737, 3.68421053, 3.78947368, 3.89473684, 4. ])
Note that unlike np.arange(), np.linspace() includes the stop value by default (this can be changed by passing endpoint=False). Finally, while np.arange() accepts a non-integer step (e.g. 0.25, as above), floating-point round-off can make the number of elements it returns hard to predict, so np.linspace() is recommended whenever a non-integer step is desired.
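A small sketch of the round-off problem, using illustrative numbers not taken from the lecture: the length of an arange result is computed as ceil((stop - start) / step), and with a float step that quotient can land just above a whole number.

```python
import numpy as np

# (1.3 - 1) / 0.1 evaluates to 3.0000000000000004 in floating point,
# so arange rounds the length up and returns 4 values instead of 3
r = np.arange(1, 1.3, 0.1)
print(len(r))   # 4: the 4th value is approximately 1.3, although stop is exclusive

# linspace fixes the number of samples instead, so there is no ambiguity
s = np.linspace(1, 1.2, 3)
print(len(s))   # 3
```

Specifying the number of samples (linspace) rather than the step size (arange) is what removes the ambiguity.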
np.linspace(2,4,20, endpoint = False)
array([2. , 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 3. , 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9])
Create two-dimensional grids
x = np.linspace(-4, 4, 9)
y = np.linspace(-5, 5, 11)
x_2d, y_2d = np.meshgrid(x, y)
array([[-4., -3., -2., -1., 0., 1., 2., 3., 4.], [-4., -3., -2., -1., 0., 1., 2., 3., 4.], [-4., -3., -2., -1., 0., 1., 2., 3., 4.], [-4., -3., -2., -1., 0., 1., 2., 3., 4.], [-4., -3., -2., -1., 0., 1., 2., 3., 4.], [-4., -3., -2., -1., 0., 1., 2., 3., 4.], [-4., -3., -2., -1., 0., 1., 2., 3., 4.], [-4., -3., -2., -1., 0., 1., 2., 3., 4.], [-4., -3., -2., -1., 0., 1., 2., 3., 4.], [-4., -3., -2., -1., 0., 1., 2., 3., 4.], [-4., -3., -2., -1., 0., 1., 2., 3., 4.]])
array([[-5., -5., -5., -5., -5., -5., -5., -5., -5.], [-4., -4., -4., -4., -4., -4., -4., -4., -4.], [-3., -3., -3., -3., -3., -3., -3., -3., -3.], [-2., -2., -2., -2., -2., -2., -2., -2., -2.], [-1., -1., -1., -1., -1., -1., -1., -1., -1.], [ 0., 0., 0., 0., 0., 0., 0., 0., 0.], [ 1., 1., 1., 1., 1., 1., 1., 1., 1.], [ 2., 2., 2., 2., 2., 2., 2., 2., 2.], [ 3., 3., 3., 3., 3., 3., 3., 3., 3.], [ 4., 4., 4., 4., 4., 4., 4., 4., 4.], [ 5., 5., 5., 5., 5., 5., 5., 5., 5.]])
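Note the shapes above: with its default settings np.meshgrid puts y along the rows and x along the columns. The sketch below rebuilds the same grid and compares it with the indexing='ij' option of np.meshgrid, which makes the first dimension follow the first input instead.

```python
import numpy as np

x = np.linspace(-4, 4, 9)    # 9 points
y = np.linspace(-5, 5, 11)   # 11 points

# Default indexing='xy': rows follow y and columns follow x -> shape (len(y), len(x))
x_2d, y_2d = np.meshgrid(x, y)
print(x_2d.shape)            # (11, 9)

# indexing='ij': the first dimension follows x -> shape (len(x), len(y))
x_ij, y_ij = np.meshgrid(x, y, indexing='ij')
print(x_ij.shape)            # (9, 11)

# The two layouts are transposes of each other
print(np.array_equal(x_ij, x_2d.T))   # True
```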
Basic indexing is similar to indexing lists:
# get some individual elements of x_2d
x_2d[0,0], x_2d[-1,-1], x_2d[3,-5]
(-4.0, 4.0, 0.0)
# get some whole rows and columns
x_2d[0].shape, x_2d[:,-1].shape
((9,), (11,))
# get some ranges
x_2d[3:10, 3:5].shape
(7, 2)
There are many advanced ways to index arrays. You can read about them in the manual. Here is one example.
# use a boolean array as an index
idx = x_2d<0
x_2d[idx]   # 1-D array containing only the negative values
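A slightly fuller sketch of boolean indexing on the same x_2d grid: the mask has the same shape as the array, selecting with it returns a flattened 1-D array, and the same mask can appear on the left-hand side of an assignment. The name clipped is just for illustration.

```python
import numpy as np

x_2d, y_2d = np.meshgrid(np.linspace(-4, 4, 9), np.linspace(-5, 5, 11))

idx = x_2d < 0                # boolean array with the same shape as x_2d
print(idx.dtype, idx.shape)   # bool (11, 9)

# Indexing with the mask returns a 1-D array of the selected elements
neg = x_2d[idx]
print(neg.shape)              # (44,): 4 negative columns x 11 rows

# A mask can also select elements to modify in place (here on a copy)
clipped = x_2d.copy()
clipped[clipped < 0] = 0
print(clipped.min())          # 0.0
```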
# two dimensional grids
x = np.linspace(-2*np.pi, 2*np.pi, 100)
y = np.linspace(-np.pi, np.pi, 50)
xx, yy = np.meshgrid(x, y)
xx.shape, yy.shape
((50, 100), (50, 100))
f = np.sin(xx) * np.cos(0.5*yy)
At this point you might be curious what these arrays "look" like, so we need to introduce some visualization.
from matplotlib import pyplot as plt
# %matplotlib inline
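A minimal first look at f, sketched with the object-oriented plt.subplots style that appears later in this lecture; the grid is rebuilt so the block runs on its own. The matplotlib.use("Agg") line is an assumption that only lets the sketch run outside a notebook; leave it out in Jupyter, where it would suppress inline figures.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")   # assumption: non-interactive backend; omit in Jupyter
from matplotlib import pyplot as plt

x = np.linspace(-2*np.pi, 2*np.pi, 100)
y = np.linspace(-np.pi, np.pi, 50)
xx, yy = np.meshgrid(x, y)
f = np.sin(xx) * np.cos(0.5*yy)

# pcolormesh colors each grid cell by the value of f; colorbar adds the scale
fig, ax = plt.subplots()
pc = ax.pcolormesh(xx, yy, f)
fig.colorbar(pc, ax=ax)
ax.set_xlabel('x')
ax.set_ylabel('y')
```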
Manipulating array dimensions
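# transpose
f.T
# Flip the array up/down (reverse the order of the rows)
np.flipud(f)
# reshape an array (wrong size): f has 50*100 = 5000 elements, which do not fit into 8x9
try:
    g = np.reshape(f, (8,9))
except ValueError as err:
    print(err)
# reshape an array (right size) and mess it up
g = np.reshape(f, (200,25))
# tile an array: stack 6 copies of f on top of each other
f.shape
(50, 100)
np.tile(f, (6,1)).shape
(300, 100)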
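Broadcasting is an efficient way to do arithmetic (e.g. multiplication) on arrays of different but compatible shapes, without explicitly copying data to make the shapes match.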

# multiply f by x
print(f.shape, x.shape)
g = f * x
(50, 100) (100,) (50, 100)
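# multiply f by y
print(f.shape, y.shape)
try:
    h = f * y
except ValueError as err:
    print(err)   # fails: the trailing dimensions (100 and 50) are incompatible
(50, 100) (50,)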
# use newaxis special syntax
h = f * y[:,np.newaxis]
(50, 100)
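The rule behind these three examples can be checked without building any arrays. This sketch uses np.broadcast_shapes, which assumes NumPy 1.20 or newer: shapes are compared from the trailing (rightmost) dimension backwards, two dimensions are compatible if they are equal or one of them is 1, and missing leading dimensions count as 1.

```python
import numpy as np

# f * x: (100,) is treated as (1, 100), which stretches along the rows
print(np.broadcast_shapes((50, 100), (100,)))    # (50, 100)

# f * y[:, np.newaxis]: y becomes (50, 1), which stretches along the columns
print(np.broadcast_shapes((50, 100), (50, 1)))   # (50, 100)

# f * y: the trailing dimensions 100 and 50 differ and neither is 1
try:
    np.broadcast_shapes((50, 100), (50,))
except ValueError as err:
    print("incompatible:", err)
```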
Reduction Operations
# sum
g.sum()
# mean
g.mean()
# std
g.std()
# apply on just one axis
# Mean of each row (calculated across columns)
g_xmean = g.mean(axis=1)
# Mean of each column (calculated across rows)
g_ymean = g.mean(axis=0)
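The axis argument names the dimension that gets collapsed. A small sketch with a made-up 2x3 array makes the direction easy to check by hand:

```python
import numpy as np

g = np.array([[1., 2., 3.],
              [4., 5., 6.]])   # shape (2, 3): 2 rows, 3 columns

print(g.mean())              # 3.5: reduce over every element
print(g.mean(axis=1))        # [2. 5.]: collapses axis 1 (columns), one value per row
print(g.mean(axis=0))        # [2.5 3.5 4.5]: collapses axis 0 (rows), one value per column
print(g.sum(axis=0).shape)   # (3,): the reduced axis disappears from the shape
```

This is why g_xmean above has one value per y (row) and pairs with y in the plot, while g_ymean has one value per x (column).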
plt.plot(x, g_ymean)
plt.plot(g_xmean, y)
Missing data
Most real-world datasets, environmental or otherwise, have data gaps. Data can be missing for any number of reasons, including observations not being recorded or data corruption. While a cell corresponding to a data gap may just be left blank in a spreadsheet, when imported into Python, there must be some way to handle "blank" or missing values.
Missing data should not be replaced with zeros, as 0 can be a valid value in many datasets (e.g. temperature, precipitation). Instead, the convention is to fill all missing data with the constant NaN. NaN stands for "Not a Number" and is implemented in NumPy as np.nan.
NaNs are handled differently by different packages. In NumPy, arithmetic and standard reductions (sum, mean, etc.) involving a NaN value return nan:
data = np.array([[2., 2.7, 1.89],
                 [1.1, 0.0, np.nan],
                 [3.2, 0.74, 2.1]])
np.mean(data)   # nan
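To skip missing values instead of propagating them, NumPy provides NaN-aware versions of the reductions (np.nanmean, np.nansum, np.nanstd, ...). A short sketch on the same data array; note that NaN compares unequal even to itself, so detection should use np.isnan rather than ==.

```python
import numpy as np

data = np.array([[2., 2.7, 1.89],
                 [1.1, 0.0, np.nan],
                 [3.2, 0.74, 2.1]])

print(np.mean(data))          # nan: the single NaN propagates
print(np.nanmean(data))       # mean of the 8 valid values (about 1.716)

# NaN is not equal to anything, including itself, so test for it with np.isnan
print(np.nan == np.nan)       # False
print(np.isnan(data).sum())   # 1 missing value
```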
More Matplotlib
Figure and Axes
The figure is the highest level of organization of matplotlib objects. If we want, we can create a figure explicitly.
fig = plt.figure()
# add_axes takes [left, bottom, width, height] as fractions of the figure size
ax1 = fig.add_axes([0, 0, 0.5, 1])
Subplot syntax is one way to specify the creation of multiple axes.
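fig = plt.figure(figsize=(12, 6))   # figsize is (width, height) in inches
ax1 = fig.add_subplot(1, 2, 1)      # a grid of 1 row x 2 columns, first panel
ax2 = fig.add_subplot(1, 2, 2)      # second panel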
There is a shorthand for doing this all at once.
This is our recommended way to create new figures!
fig, ax = plt.subplots()
fig, axes = plt.subplots(ncols=2, figsize=(8, 4), subplot_kw={'facecolor': 'g'})
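What plt.subplots hands back depends on the layout, which is worth checking before indexing into it. A sketch, assuming the non-interactive Agg backend only so it runs outside a notebook (omit that line in Jupyter); the panel title is an arbitrary label.

```python
import matplotlib
matplotlib.use("Agg")   # assumption: non-interactive backend; omit in Jupyter
from matplotlib import pyplot as plt

# With several rows and columns, plt.subplots returns a 2-D NumPy array of Axes
fig, axes = plt.subplots(nrows=2, ncols=3, figsize=(9, 5))
print(axes.shape)                  # (2, 3)

# Index it like any NumPy array: row 1, column 2 is the bottom-right panel
axes[1, 2].set_title('bottom right')

# A single row (or column) gives a 1-D array of Axes
fig_row, row = plt.subplots(ncols=2)
print(row.shape)                   # (2,)

# A single panel gives one Axes object rather than an array
fig_one, ax = plt.subplots()
ax.plot([0, 1], [0, 1])
```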
Drawing into Axes
All plots are drawn into axes. It is easiest to understand how matplotlib works if you use the object-oriented style.
# create some data to plot
import numpy as np
x = np.linspace(-np.pi, np.pi, 100)
y = np.cos(x)
z = np.sin(6*x)
fig, ax = plt.subplots()
ax.plot(x, y)
This does the same thing as the pyplot-style call below, which draws into the "current" axes:
plt.plot(x, y)
This starts to matter when we have multiple axes to worry about.
fig, axes = plt.subplots(figsize=(8, 4), ncols=2)
ax0, ax1 = axes
ax0.plot(x, y)
ax1.plot(x, z)
Labeling Plots
fig, axes = plt.subplots(figsize=(8, 4), ncols=2)
ax0, ax1 = axes
ax0.plot(x, y)
ax0.set_title('x vs. y')
ax1.plot(x, z)
ax1.set_title('x vs. z')
# squeeze everything in
fig.tight_layout()
Customizing Line Plots
fig, ax = plt.subplots()
ax.plot(x, y, x, z)
It's simple to switch axes
fig, ax = plt.subplots()
ax.plot(y, x, z, x)
Line Styles
fig, axes = plt.subplots(figsize=(16, 5), ncols=3)
axes[0].plot(x, y, linestyle='dashed')
axes[0].plot(x, z, linestyle='--')
axes[1].plot(x, y, linestyle='dotted')
axes[1].plot(x, z, linestyle=':')
axes[2].plot(x, y, linestyle='dashdot', linewidth=5)
axes[2].plot(x, z, linestyle='-.', linewidth=0.5)
As described in the colors documentation, there are some special codes for commonly used colors:
- b: blue
- g: green
- r: red
- c: cyan
- m: magenta
- y: yellow
- k: black
- w: white
fig, ax = plt.subplots()
ax.plot(x, y, color='k')
ax.plot(x, z, color='r')
Other ways to specify colors:
fig, axes = plt.subplots(figsize=(16, 5), ncols=3)
# grayscale
axes[0].plot(x, y, color='0.8')
axes[0].plot(x, z, color='0.2')
# RGB tuple
axes[1].plot(x, y, color=(1, 0, 0.7))
axes[1].plot(x, z, color=(0, 0.4, 0.3))
# HTML hex code
axes[2].plot(x, y, color='#00dcba')
axes[2].plot(x, z, color='#b029ee')
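There is a default color cycle built into Matplotlib (stored in plt.rcParams['axes.prop_cycle']). Its colors, in order, are:
- '#1f77b4': blue
- '#ff7f0e': orange
- '#2ca02c': green
- '#d62728': red
- '#9467bd': purple
- '#8c564b': brown
- '#e377c2': pink
- '#7f7f7f': gray
- '#bcbd22': olive
- '#17becf': cyan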
fig, ax = plt.subplots(figsize=(12, 10))
for factor in np.linspace(0.2, 1, 11):
    ax.plot(x, factor*y)
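The loop above draws 11 lines, one more than the 10 colors in the default cycle, so the colors wrap around. A sketch that inspects the cycle and the colors actually assigned, assuming the default style (no custom style sheet) and the Agg backend only so it runs outside a notebook. The strings 'C0' through 'C9' refer to cycle entries by position.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")   # assumption: non-interactive backend; omit in Jupyter
from matplotlib import pyplot as plt

# The default cycle lives in rcParams; by_key() turns it into a dict of lists
colors = plt.rcParams['axes.prop_cycle'].by_key()['color']
print(len(colors))     # 10 with the default style

x = np.linspace(-np.pi, np.pi, 100)
y = np.cos(x)
fig, ax = plt.subplots()
lines = [ax.plot(x, factor*y)[0] for factor in np.linspace(0.2, 1, 11)]

# With 11 lines and 10 colors, the 11th line reuses the first color
print(lines[0].get_color(), lines[10].get_color())

# 'C3' means "the 4th color of the current cycle"
ax.plot(x, y, color='C3')
```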
There are lots of different markers available in Matplotlib!
fig, axes = plt.subplots(figsize=(12, 5), ncols=2)
axes[0].plot(x[:20], y[:20], marker='.')
axes[0].plot(x[:20], z[:20], marker='o')
axes[1].plot(x[:20], z[:20], marker='^',
             markersize=10, markerfacecolor='r')
Labels, Ticks, and Gridlines
fig, ax = plt.subplots(figsize=(12, 7))
ax.plot(x, y)
# raw strings (r'...') keep backslashes intact for the LaTeX-style math text
ax.set_title(r'A complicated math function: $f(x) = \cos(x)$')
ax.set_xticks(np.pi * np.array([-1, 0, 1]))
ax.set_xticklabels([r'$-\pi$', '0', r'$\pi$'])
ax.set_yticks([-1, 0, 1])
ax.set_yticks(np.arange(-1, 1.1, 0.2), minor=True)
#ax.set_xticks(np.arange(-3, 3.1, 0.2), minor=True)
ax.grid(which='minor', linestyle='--')
ax.grid(which='major', linewidth=2)
Axis Limits
fig, ax = plt.subplots()
ax.plot(x, y, x, z)
ax.set_xlim(-5, 5)
ax.set_ylim(-3, 3)
fig, ax = plt.subplots()
ax.plot(x, y, x, z)
ax.set_xlim(-5, 5)
ax.set_ylim(-100, 100)
Text Annotations
fig, ax = plt.subplots()
ax.plot(x, y)
ax.text(-3, 0.3, 'hello world')
ax.annotate('the maximum', xy=(0, 1),
xytext=(0, 0), arrowprops={'facecolor': 'k'})
fig, ax = plt.subplots()
ax.plot(x, y)
# transform=ax.transAxes: (0.1, 0.9) is a fraction of the axes box, not a data coordinate
ax.text(0.1, 0.9, 'hello world', transform=ax.transAxes)
ax.annotate('the maximum', xy=(0, 1),
xytext=(0, 0), arrowprops={'facecolor': 'k'})
fig, ax = plt.subplots()
# c maps the values of x to marker colors; s sets the marker area
splot = ax.scatter(y, z, c=x, s=(100*z**2 + 5))
fig.colorbar(splot)
Bar Plots
labels = ['first', 'second', 'third']
values = [10, 5, 30]
fig, axes = plt.subplots(figsize=(10, 5), ncols=2)
axes[0].bar(labels, values)
axes[1].barh(labels, values)
x1d = np.linspace(-2*np.pi, 2*np.pi, 100)
y1d = np.linspace(-np.pi, np.pi, 50)
xx, yy = np.meshgrid(x1d, y1d)
f = np.cos(xx) * np.sin(yy)
(50, 100)
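fig, ax = plt.subplots(figsize=(12,4), ncols=2)
# imshow puts row 0 at the top by default (origin='upper')
ax[0].imshow(f)
# origin='lower' puts row 0 at the bottom, matching the direction of y in the grid
ax[1].imshow(f, origin='lower')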
fig, ax = plt.subplots(ncols=2, figsize=(12, 5))
pc0 = ax[0].pcolormesh(x1d, y1d, f)
pc1 = ax[1].pcolormesh(xx, yy, f)
fig.colorbar(pc0, ax=ax[0])
fig.colorbar(pc1, ax=ax[1])
x_sm, y_sm, f_sm = xx[:10, :10], yy[:10, :10], f[:10, :10]
fig, ax = plt.subplots(figsize=(12,5), ncols=2)
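# shading='nearest': each (x, y) point is the center of a cell, so all of f_sm is drawn
ax[0].pcolormesh(x_sm, y_sm, f_sm, edgecolors='k', shading = 'nearest')
# shading='flat': (x, y) are cell corners, so C needs one fewer row and column than X and Y
ax[1].pcolormesh(x_sm, y_sm, f_sm[:-1, :-1], edgecolors='k', shading = 'flat')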
