Jupyter lets you do all this in one place:

  • write and run code
  • write explanations of code and data, including with mathematical formulas
  • view tables, plots, and other visualizations of data
  • interact with certain types of data visualizations

Notice that these activities can be summarized as "math, stats, coding, visualizing, and explaining."

These are the bread and butter of data science, which is why people use Jupyter.

Jupyter notebooks contain input-output pairs, like this:

In [1]:
import numpy as np
np.random.normal( 0, 1, 20 )
Out[1]:
array([-1.5528848 ,  0.12957022, -0.91807189, -0.60007748, -0.50769516,
        0.67810184,  0.16894859,  2.25543421,  1.0799978 ,  1.57562487,
       -0.94228657,  0.80518276, -0.67946273,  0.87518185, -0.40811446,
       -0.65385804, -1.28785419,  0.7768802 , -0.36092852, -0.0990758 ])
In [2]:
import matplotlib.pyplot as plt
plt.plot( np.random.normal( 0, 1, 20 ) )
Out[2]:
[<matplotlib.lines.Line2D at 0x121307860>]
[Plot: in the notebook, a line chart of the 20 random values appears here.]

Important!!!

(Well, okay, it's important if you're running Jupyter on your own computer.)

Jupyter is made of two pieces:

  1. The notebook interface, which shows you a document with code and visualizations in it, called "a Jupyter notebook." (This runs on your computer.)
  2. The engine behind the notebook, which runs your code and does its work invisibly in the background; this engine is called the "kernel." (This can run in the cloud or on your computer, your choice.)
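To make the interface/kernel split concrete, here is a toy sketch of what a kernel does with each cell: it runs the code, remembers every variable, and hands back the value of the cell's last expression (which the interface then shows as Out[n]). The function name `run_cell` and the dictionary `kernel_memory` are made up for this illustration, and the real Jupyter kernel is far more elaborate; treat this as a simplified model only.

```python
import ast

def run_cell(source, namespace):
    """Toy stand-in for a kernel: run one cell and return its output value.

    `namespace` plays the role of the kernel's memory, so variables
    defined in one cell remain available to later cells.
    """
    tree = ast.parse(source)
    final_expr = None
    # If the cell ends with a bare expression, set it aside so its value
    # can be returned, the way a notebook displays Out[n].
    if tree.body and isinstance(tree.body[-1], ast.Expr):
        final_expr = ast.Expression(tree.body.pop().value)
    # Run all the remaining statements in the shared namespace.
    exec(compile(tree, "<cell>", "exec"), namespace)
    if final_expr is not None:
        return eval(compile(final_expr, "<cell>", "eval"), namespace)
    return None  # a cell ending in a statement has no Out[n]

kernel_memory = {}
out1 = run_cell("x = 2\nx * 21", kernel_memory)  # ends with an expression
out2 = run_cell("y = x + 1", kernel_memory)      # ends with an assignment
print(out1)                # 42
print(out2)                # None
print(kernel_memory["y"])  # 3, because x was remembered from the first cell
```

The second cell can use `x` only because the kernel kept it in memory after the first cell ran. That shared memory is what makes notebooks convenient, and it is also the source of the pitfalls listed later.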

Jupyter in the cloud

Let's start with a cloud provider, for simplicity. (You won't have to worry about starting up or shutting down the kernel; the cloud provider handles that.)

You can get a free Deepnote account using your Bentley email, which doubles as a Google account.

Jupyter on your computer

Later, if you choose, you can install Jupyter on your laptop.

See the instructions in the course notes for doing so.

Why scientists love Jupyter notebooks

  1. Markdown cells let the author explain the motivation behind each computation, and the interpretation of the results.
  2. Unlike a script, the notebook is run interactively, so lengthy computations can be done just once, and then built on without re-running the whole notebook (at least, not very often).
  3. Good notebooks are self-documenting. Scientists can write “lab reports,” data scientists can write “reports” (typically to other tech people), and computer scientists can write “documentation.”
  4. Anyone can re-run the notebook as a reproducibility check. Even the author can re-run it later, on new or updated data, to get the latest results. (Contrast this with a spreadsheet.)
  5. Visualizations are included inline and automatically updated with all other computations.
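One caveat about point 4: a notebook that draws random numbers (like the example cells earlier) will produce different values on every re-run unless the random number generator is seeded. Here is a minimal sketch using Python's built-in `random` module; the helper name `draw_sample` and the seed values are arbitrary choices for this illustration. NumPy offers analogous seeding, for example through `np.random.default_rng(seed)`.

```python
import random

def draw_sample(seed, n=20):
    """Draw n values from a standard normal distribution, reproducibly."""
    rng = random.Random(seed)  # a private generator with a fixed seed
    return [rng.gauss(0, 1) for _ in range(n)]

first_run = draw_sample(seed=2024)
second_run = draw_sample(seed=2024)  # simulates re-running the notebook

print(first_run == second_run)  # True: same seed, same numbers
```

With the seed written into the notebook, anyone who re-runs it gets the same sample, so the numbers in the explanations still match the numbers in the output.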

See these interesting example data science notebooks.

Why scientists hate Jupyter notebooks

  1. With a script, you have to understand only the script. With a notebook, you also have to know what code you’ve already run and what it did. This is easy to mess up, for example by:
    • Running cells out of order
    • Running cells and then changing or deleting them, or forgetting to re-run them after editing them
  2. Because a notebook encourages experimentation, it seems informal, and so people don’t always apply good coding practices, such as abstraction and testing.
  3. Notebooks lack helpful coding features that IDEs have, and that help you avoid errors.
  4. Sharing only the notebook file gives the illusion of reproducibility, since the recipient may not have the same data or software setup needed to re-run it. This problem is decreasing as more people use sandboxed computation in the cloud.
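The hidden-state problem in point 1 is easy to reproduce. In the sketch below, each string stands in for one notebook cell and a plain dictionary stands in for the kernel's memory; the names `cells`, `kernel`, and `fresh_kernel` are invented for this illustration, and a real kernel is more complicated than a dictionary.

```python
# Two "cells": B depends on a variable defined in A.
cells = {
    "A": "price = 10",
    "B": "total = price * 3",
}

# Run A, then B, in order. Everything is consistent.
kernel = {}
exec(cells["A"], kernel)
exec(cells["B"], kernel)
total_after_first_run = kernel["total"]
print(total_after_first_run)  # 30

# Edit cell A and re-run it, but forget to re-run cell B.
cells["A"] = "price = 20"
exec(cells["A"], kernel)
stale_total = kernel["total"]
print(stale_total)  # still 30, even though the notebook now says price = 20

# In a fresh kernel, run B before A: the variable does not exist yet.
fresh_kernel = {}
try:
    exec(cells["B"], fresh_kernel)
    out_of_order_error = None
except NameError as err:
    out_of_order_error = str(err)
print(out_of_order_error)
```

In both failure cases the text of the notebook looks fine; only the kernel's memory is out of sync with it. Restarting the kernel and running every cell from top to bottom is the usual way to check that the two agree.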