Python and NumPy both provide modules for using common mathematical functions:
import math
import numpy as np
math.exp( 2 ), np.exp( 2 )
(7.38905609893065, 7.38905609893065)
Question: What is the key difference, as emphasized in the course notes, between using Python's math module and using NumPy?
(In other words, if we had math already, why did someone build numpy?)
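One way to explore this question is to apply each function to many values at once. The sketch below is only an illustration (the inputs and variable names are arbitrary choices, not taken from the course notes): `math.exp` handles one number per call, while `np.exp` accepts a whole array and works elementwise.

```python
import math
import numpy as np

values = np.linspace(0, 1, 5)        # five evenly spaced inputs

# math.exp works on one number at a time, so we need an explicit loop:
with_math = [math.exp(v) for v in values]

# np.exp takes the whole array and computes every result in one call:
with_numpy = np.exp(values)

print(with_numpy)
print(np.allclose(with_math, with_numpy))   # checks that both approaches agree
```

On five values the difference is invisible, but the same pattern applied to millions of values is where the single array call tends to pay off.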
Question: To benefit from NumPy's speed, we'll need to use NumPy arrays. How can you use each of the following to get a NumPy array?
np.array()
np.arange()
np.linspace()
np.random.rand()
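As a starting point for discussion, here is a minimal sketch of one call to each of the four functions. The specific arguments are arbitrary examples, and each function accepts other argument combinations too.

```python
import numpy as np

a = np.array([1, 2, 3])      # convert an existing Python list
b = np.arange(0, 10, 2)      # start, stop (exclusive), step: 0, 2, 4, 6, 8
c = np.linspace(0, 1, 5)     # 5 evenly spaced values from 0 to 1, both ends included
d = np.random.rand(4)        # 4 uniform random values in [0, 1)

for arr in (a, b, c, d):
    print(type(arr).__name__, arr)
```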
We will not be reviewing this in class, because you had an entire course on GB213.
To apply what you learned in GB213 using Python, see the GB213 review page in our course notes.
Are there any questions on that content now?
Binding an argument to a function f means creating a new function g by supplying a value for the argument, so that it doesn't need to be specified when calling the new function g.
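To make the definition concrete, here is a small sketch using `partial` from Python's `functools` module. The function `power` is a made-up example for illustration, not something from the course notes; it plays the role of f, and `two_to_the` plays the role of g.

```python
from functools import partial

# f: a hypothetical two-argument function
def power(base, exponent):
    return base ** exponent

# g: bind base=2, so only exponent remains to be supplied
two_to_the = partial(power, 2)

print(two_to_the(10))   # same as power(2, 10)
```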
Example 1: Let's say we have a data set we wish to sample from, because it is too large. For simplicity here, I'll use a small data set, but that's just for example purposes. NumPy makes it easy to sample, as follows.
# tiny example:
data = [ 35.75, 57.27, 58.59, 24.35, 16.33, 9.38, 8.78, 35.24 ]
# pick 3 random values, with replacement:
np.random.choice( data, 3 )
array([16.33, 16.33, 57.27])
A statistical technique called bootstrapping requires sampling from the same data set repeatedly. Let's say we wanted to do that here. How would we use Python's partial function to create a function that would let us choose however many random items we want from data?
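One possible answer, sketched under the assumption that we want to bind only the data and leave the sample size free. The name `sample_from_data` is my own choice for illustration.

```python
from functools import partial
import numpy as np

data = [35.75, 57.27, 58.59, 24.35, 16.33, 9.38, 8.78, 35.24]

# Bind the first argument (the data to sample from); the sample size stays free.
sample_from_data = partial(np.random.choice, data)

print(sample_from_data(3))    # 3 random values, with replacement
print(sample_from_data(5))    # 5 random values, with replacement
```

Because the results are random, your output will differ from run to run.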
Example 2: Consider Python's built-in round(x,n) function that rounds a number x to n digits after the decimal.
round( 58.23 )
58
How would you create a new function that rounds any input to 3 digits after the decimal? (Note that Python's partial function binds positional arguments only from the left, so it can't fix the second argument positionally while leaving the first one free.)
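One possible approach is to wrap `round` in a small function of our own that supplies the second argument. The names `round3` and `round3_lambda` below are made up for this sketch.

```python
# A small wrapper function fixes n = 3 and leaves x free:
def round3(x):
    return round(x, 3)

# The same idea written as a one-line lambda:
round3_lambda = lambda x: round(x, 3)

print(round3(3.14159265))
print(round3_lambda(2.718281828))
```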
We will be doing an exercise using the function that you prepared for class today, which can get the COVID-19 time series data from any state in the US.
Recall curve fitting with SciPy from today's notes. We will try to fit a logistic model to a state's data. In math, the logistic curve is:
$$ y=\frac{\beta_0}{1+e^{\beta_1(-x+\beta_2)}} $$
In Python, it is:
def logistic_curve ( x, β0, β1, β2 ):
    return β0 / ( 1 + np.exp( β1*(-x+β2) ) )
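To get a feel for what each parameter does, here is a quick sketch that evaluates the curve at a few points. The parameter values are made up for illustration and are not fitted to any data. For positive β1, the curve rises toward β0, and at x = β2 the exponent is zero, so the output is exactly half of β0.

```python
import numpy as np

def logistic_curve(x, β0, β1, β2):
    return β0 / (1 + np.exp(β1 * (-x + β2)))

# Illustrative parameter values (made up for this sketch, not fitted to any data):
β0, β1, β2 = 1000.0, 0.2, 50.0

x = np.array([0, 25, 50, 75, 100])
print(logistic_curve(x, β0, β1, β2))      # rises from near 0 toward β0

# At x = β2 the exponent is zero, so the curve is exactly half of β0:
print(logistic_curve(β2, β0, β1, β2))
```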
Exercise 1:
(This follows the 4-step process in the curve fitting section of today's notes.)
Use the curve_fit function to find the $\beta$s, as suggested in the curve fitting section of today's notes. To choose somewhat sensible initial guesses for the $\beta$s, the Project 1 assignment suggests values such as:
data.max()
len(data) / 2
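Here is a rough sketch of what the fitting step might look like. It uses synthetic data in place of a real state's time series, so you can run it without the data-loading function. Two things here are my own assumptions rather than anything stated above: pairing `data.max()` with β0 and `len(data) / 2` with β2, and using 0.5 as the guess for β1.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic_curve(x, β0, β1, β2):
    return β0 / (1 + np.exp(β1 * (-x + β2)))

# Synthetic stand-in for a state's time series (NOT real COVID-19 data):
rng = np.random.default_rng(0)
xs = np.arange(100)
data = logistic_curve(xs, 1000.0, 0.2, 50.0) + rng.normal(0, 5, size=xs.size)

# data.max() and len(data) / 2 come from the text above; the middle guess
# of 0.5 and the assignment of guesses to β0 and β2 are assumptions.
initial_guesses = [data.max(), 0.5, len(data) / 2]

found_betas, covariance = curve_fit(logistic_curve, xs, data, p0=initial_guesses)
print(found_betas)   # should land near the values used to generate the data
```

With real data the fit may be much less tidy, and it is worth plotting the fitted curve against the data before trusting it.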
Exercise 2: Take the code you wrote in Exercise 1 and abstract it into two functions.
The fitted model should be a function that takes x as input and gives the corresponding (predicted) y as output.
Run your new function on several states to see how well or poorly it works in various situations.
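One possible way to split the work is sketched below. The division into `fit_logistic` and `make_model`, and both of those names, are my assumptions for illustration; your own split may reasonably differ. The example again uses synthetic data rather than a real state's series, and the β1 guess of 0.5 is an assumption.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic_curve(x, β0, β1, β2):
    return β0 / (1 + np.exp(β1 * (-x + β2)))

def fit_logistic(data):
    """Return the fitted (β0, β1, β2) for a 1-D sequence of values."""
    data = np.asarray(data, dtype=float)
    xs = np.arange(len(data))
    guesses = [data.max(), 0.5, len(data) / 2]   # 0.5 is an assumed guess
    betas, _ = curve_fit(logistic_curve, xs, data, p0=guesses)
    return betas

def make_model(betas):
    """Return a function mapping x to the predicted y for the given βs."""
    β0, β1, β2 = betas
    return lambda x: logistic_curve(x, β0, β1, β2)

# Synthetic example data (NOT a real state's series):
example = logistic_curve(np.arange(80), 500.0, 0.3, 40.0)
model = make_model(fit_logistic(example))
print(model(40))    # prediction at the midpoint of the synthetic curve
```

Keeping the fitting and the prediction separate means you can fit once per state and then reuse the returned model for as many x values as you like.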