Chapter 9 - Math and Stats in Python


Math in Python and NumPy

Python's built-in math module and NumPy both provide common mathematical functions:

In [1]:

import math
import numpy as np
math.exp( 2 ), np.exp( 2 )

Out[1]:

(7.38905609893065, 7.38905609893065)

Question: What is the key difference, as emphasized in the course notes, between using Python's math module and using NumPy?

(In other words, if we already had the math module, why did someone build NumPy?)
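One way to see the difference in practice, as a small sketch with arbitrary values: `math.exp` handles one number at a time, while `np.exp` applies to every element of an array in a single call (a vectorized operation that runs in compiled code rather than a Python loop).

```python
import math
import numpy as np

values = np.array([0.0, 1.0, 2.0])

# NumPy: one call applies exp to every element (vectorized) and returns an array
vectorized = np.exp(values)
print(vectorized)

# math: exp accepts a single number, so an array needs a Python-level loop
looped = [math.exp(v) for v in values]
print(looped)

# Handing a multi-element array straight to math.exp fails
rejected = False
try:
    math.exp(values)
except TypeError as err:
    rejected = True
    print("math.exp rejected the array:", err)
```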

Question: To benefit from NumPy's speed, we'll need to use NumPy arrays. How can you use each of the following to get a NumPy array?

np.array()
np.arange()
np.linspace()
np.random.rand()
pandas DataFrames
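A sketch of each option. The DataFrame `df` and its columns are made up for illustration; with a real DataFrame you would select your own columns.

```python
import numpy as np
import pandas as pd

a = np.array([3, 1, 4, 1, 5])    # from an existing Python list
b = np.arange(0, 10, 2)          # start, stop, step: 0, 2, 4, 6, 8 (stop excluded)
c = np.linspace(0, 1, 5)         # 5 evenly spaced points from 0 to 1, both ends included
d = np.random.rand(3)            # 3 uniform random values in [0, 1)

# A hypothetical small DataFrame, just for illustration
df = pd.DataFrame({"x": [1, 2, 3], "y": [4.0, 5.0, 6.0]})
e = df["x"].to_numpy()           # one column as a 1-D array
f = df.to_numpy()                # the whole table as a 2-D array

for name, arr in [("a", a), ("b", b), ("c", c), ("d", d), ("e", e), ("f", f)]:
    print(name, type(arr).__name__, arr.shape)
```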

GB213 in Python

We will not be reviewing this in class, because you had an entire course on GB213.

For how to do GB213 topics in Python, see the GB213 review page in our course notes.

Are there any questions on that content now?

Binding function arguments

Binding an argument to a function f means creating a new function g by supplying a value for the argument, so that it doesn't need to be specified when calling the new function g.
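A minimal sketch of binding, using a toy function `f` made up for illustration: `functools.partial` supplies leading arguments, and a `lambda` can fix any argument you choose.

```python
from functools import partial

def f(a, b):
    """A toy two-argument function, used only to illustrate binding."""
    return 10 * a + b

# Bind a = 4 with functools.partial, so g(b) == f(4, b)
g = partial(f, 4)

# The same idea written as a lambda, which can bind any argument (here b = 7)
h = lambda a: f(a, 7)

print(g(2))  # 42
print(h(3))  # 37
```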

Example 1: Let's say we have a data set that is too large to use in full, so we wish to sample from it. For simplicity, I'll use a small data set here, but only for example purposes. NumPy makes sampling easy, as follows.

In [2]:

# tiny example:
data = [ 35.75, 57.27, 58.59, 24.35, 16.33, 9.38, 8.78, 35.24 ]

# pick 3 random values, with replacement:
np.random.choice( data, 3 )

Out[2]:

array([16.33, 16.33, 57.27])

A statistical technique called bootstrapping requires sampling from the same data set repeatedly. Let's say we wanted to do that here. How would we use Python's partial function to create a function that would let us choose however many random items we want from data?
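A possible answer, sketched with the same `data`: `functools.partial` fixes `np.random.choice`'s first argument, leaving the sample size to be supplied at call time. The bootstrap loop at the end is only a minimal illustration (1000 resamples is an arbitrary choice), not a full bootstrap analysis.

```python
from functools import partial
import numpy as np

data = [35.75, 57.27, 58.59, 24.35, 16.33, 9.38, 8.78, 35.24]

# Bind np.random.choice's first argument (the values to sample from) to data;
# the first remaining parameter is the number of items to draw
sample_from_data = partial(np.random.choice, data)

print(sample_from_data(3))   # 3 random values, drawn with replacement
print(sample_from_data(10))  # 10 random values

# Bootstrapping sketch: resample the full data set many times, keep each mean
boot_means = [sample_from_data(len(data)).mean() for _ in range(1000)]
print(np.mean(boot_means), np.std(boot_means))
```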

Example 2: Consider Python's built-in round(x,n) function, which rounds a number x to n digits after the decimal point.

In [3]:

round( 58.23 )

Out[3]:

58

How would you create a new function that rounds any input to 3 digits after the decimal point? (Note that when given positional values, Python's partial function binds only the first argument(s) of a function, so partial(round, 3) would fix x at 3 rather than n.)
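Two possible approaches, sketched with arbitrary test values. A `lambda` (or a small `def`) can fix any argument. And although `partial` binds only leading arguments when given positional values, `round`'s second parameter is named `ndigits`, so binding it by keyword also works.

```python
from functools import partial

# Option 1: a lambda that fixes n = 3 (binds the *second* argument)
round3 = lambda x: round(x, 3)

# Option 2: partial with a keyword argument; round's second parameter is ndigits
round3_kw = partial(round, ndigits=3)

print(round3(3.14159))     # 3.142
print(round3_kw(2.71828))  # 2.718
```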

Curve fitting¶

We will be doing an exercise using the function that you prepared for class today, which can get the COVID-19 time series data from any state in the US.

Recall curve fitting with SciPy from today's notes. We will try to fit a logistic model to a state's data. In math, the logistic curve is:

$$ y=\frac{\beta_0}{1+e^{\beta_1(-x+\beta_2)}} $$
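A short worked check of what each parameter does, assuming $\beta_1>0$. At $x=\beta_2$ the exponent is zero; far to the right the exponential vanishes; far to the left it grows without bound:

$$ y(\beta_2)=\frac{\beta_0}{1+e^{0}}=\frac{\beta_0}{2}, \qquad \lim_{x\to\infty} y=\beta_0, \qquad \lim_{x\to-\infty} y=0, \qquad \left.\frac{dy}{dx}\right|_{x=\beta_2}=\frac{\beta_0\beta_1}{4} $$

So $\beta_0$ is the curve's upper limit, $\beta_2$ is the $x$ at which it reaches half of that limit, and $\beta_1$ sets how steeply it climbs there.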

In Python, it is:

In [4]:

def logistic_curve ( x, β0, β1, β2 ):
    return β0 / ( 1 + np.exp( β1*(-x+β2) ) )

Exercise 1:

(This follows the 4-step process in the curve fitting section of today's notes.)

1. Load the notebook or script containing the function you wrote for homework that gets COVID vaccination data for a state, and run that function to fetch the data for a state of your choosing.
2. We will use a logistic model, so add the code from the previous slide to your notebook or script.
3. Import the curve_fit function (from scipy.optimize) and use it to find the $\beta$s, as suggested in the curve fitting section of today's notes. To choose somewhat sensible initial guesses for the $\beta$s, the Project 1 assignment suggests:
   - $\beta_0=$ the maximum number of vaccinations in the data, data.max()
   - $\beta_1=1$
   - $\beta_2=$ the halfway point from the beginning of the data until now, len(data) / 2
4. Plot the model as shown in the final section of the course notes for today.
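A sketch of the fitting step, assuming x is the day number 0, 1, 2, … for each data point. Because your homework data isn't reproduced here, the `data` below is SYNTHETIC (a known logistic curve plus noise, not real vaccination numbers), so you can check that the fit recovers the parameters it was built from.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic_curve(x, β0, β1, β2):
    return β0 / (1 + np.exp(β1 * (-x + β2)))

# SYNTHETIC stand-in for one state's data (not real vaccination numbers):
# a known logistic curve plus noise, with x = day number 0, 1, 2, ...
rng = np.random.default_rng(0)
xs = np.arange(100)
data = logistic_curve(xs, 5000, 0.5, 50) + rng.normal(0, 50, xs.size)

# Initial guesses as suggested in the list above
initial_guess = [data.max(), 1, len(data) / 2]

# curve_fit returns the best-fit parameters and their estimated covariance
found_betas, covariance = curve_fit(logistic_curve, xs, data, p0=initial_guess)
β0, β1, β2 = found_betas
print(f"β0 = {β0:.1f}, β1 = {β1:.3f}, β2 = {β2:.1f}")

def fit_model(x):
    """The fitted curve as a function of x alone."""
    return logistic_curve(x, β0, β1, β2)
```

With real data, replace `data` with your fetched values and `xs` with `np.arange(len(data))`.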

Exercise 2: Take the code you wrote in Exercise 1 and abstract it into two functions.

One function takes a state abbreviation as input and does steps 1-3, producing a fit model as output, which is a function that takes a single x as input and gives the corresponding (predicted) y as output.
Another takes as input the data and the model function and produces a plot showing the scattered data and the model as a curve, on the same plot.
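A minimal sketch of the two functions. It assumes your homework function takes a state abbreviation and returns a 1-D sequence of daily values; since that function isn't reproduced here, the sketch defines a hypothetical placeholder `get_state_data` that returns synthetic numbers so the code runs on its own. The names `fit_state_model` and `plot_model` are also just illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

def logistic_curve(x, β0, β1, β2):
    return β0 / (1 + np.exp(β1 * (-x + β2)))

def get_state_data(state):
    """HYPOTHETICAL stand-in for your homework function.

    It ignores `state` and returns synthetic logistic-shaped values
    (not real vaccination data) so this sketch runs on its own.
    """
    rng = np.random.default_rng(0)
    days = np.arange(120)
    return 3000 / (1 + np.exp(0.5 * (-days + 60))) + rng.normal(0, 30, days.size)

def fit_state_model(state):
    """Steps 1-3: fetch one state's data, fit the logistic curve, return a model."""
    data = np.asarray(get_state_data(state), dtype=float)
    xs = np.arange(len(data))
    p0 = [data.max(), 1, len(data) / 2]
    betas, _ = curve_fit(logistic_curve, xs, data, p0=p0)

    def model(x):
        # Predicted y for x (a single number or an array), using the fitted betas
        return logistic_curve(x, *betas)

    return model

def plot_model(data, model):
    """Step 4: scatter the data and draw the model curve on the same axes."""
    xs = np.arange(len(data))
    _, ax = plt.subplots()
    ax.scatter(xs, data, s=10, label="data")
    ax.plot(xs, model(xs), color="red", label="logistic model")
    ax.legend()
    return ax

# Usage (with the placeholder data)
data = get_state_data("MA")
model = fit_state_model("MA")
plot_model(data, model)
```

Returning the inner `model` function (a closure over the fitted βs) is what makes the first function's output a function of a single x, as the exercise asks. Replace the placeholder with your real function before trying several states.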

Run your new function on several states to see how well or poorly it works in various situations.