Chapter 7 - Abstraction¶


Abstract/concrete spectrum¶

$$ \text{Concrete (or specific)} ~ ~ \longleftrightarrow ~ ~ \text{Abstract (or general)} $$

Abstraction means moving $\to$rightward$\to$ on the spectrum.

Generalization is another word for the same thing.

Example of generalization in a mathematical formula¶

Specific:

If it's $50^\circ$F outside, we can convert it by computing $\frac59(50-32)=\frac59(18)=10$, so it's $10^\circ$C outside.

(Abstraction/generalization: Replace the specific constant 50 with a variable that could stand for any temperature.)

General:

Any temperature can be converted from degrees Fahrenheit to degrees Celsius using $C=\frac59(F-32)$.
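
The same step from specific to general can be written in code. The sketch below is only an illustration; the function name `fahrenheit_to_celsius` is our own choice and does not come from the slides.

```python
def fahrenheit_to_celsius(fahrenheit):
    """Convert any temperature from degrees Fahrenheit to degrees Celsius."""
    # Same formula as C = 5/9 (F - 32), with the variable F in place of the constant 50
    return (fahrenheit - 32) * 5 / 9

# The specific case from above is now just one call among many
print(fahrenheit_to_celsius(50))
print(fahrenheit_to_celsius(212))
```

The specific computation becomes one call to the general function, and any other temperature can be converted without rewriting the arithmetic.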

Example of generalization in programming¶

Non-generalized code, converting columns of currencies from text to float:

df['Tuition'] = df['Tuition'].str.replace( "$", "" )
df['Tuition'] = df['Tuition'].str.replace( ",", "" )
df['Tuition'] = df['Tuition'].astype( float )
df['Fees'] = df['Fees'].str.replace( "$", "" )
df['Fees'] = df['Fees'].str.replace( ",", "" )
df['Fees'] = df['Fees'].astype( float )
df['Books'] = df['Books'].str.replace( "$", "" )
df['Books'] = df['Books'].str.replace( ",", "" )
df['Books'] = df['Books'].astype( float )
df['Room and board'] = df['Room and board'].str.replace( "$", "" )
df['Room and board'] = df['Room and board'].str.replace( ",", "" )
df['Room and board'] = df['Room and board'].astype( float )

Example of generalization in programming¶

Highlighting the repetition:

df['Tuition'       ] = df['Tuition'       ].str.replace( "$", "" )
df['Tuition'       ] = df['Tuition'       ].str.replace( ",", "" )
df['Tuition'       ] = df['Tuition'       ].astype( float )

df['Fees'          ] = df['Fees'          ].str.replace( "$", "" )
df['Fees'          ] = df['Fees'          ].str.replace( ",", "" )
df['Fees'          ] = df['Fees'          ].astype( float )

df['Books'         ] = df['Books'         ].str.replace( "$", "" )
df['Books'         ] = df['Books'         ].str.replace( ",", "" )
df['Books'         ] = df['Books'         ].astype( float )

df['Room and board'] = df['Room and board'].str.replace( "$", "" )
df['Room and board'] = df['Room and board'].str.replace( ",", "" )
df['Room and board'] = df['Room and board'].astype( float )

Example of generalization in programming¶

Creating a variable for what's changing really highlights the repetition:

column = 'Tuition'
df[column] = df[column].str.replace( "$", "" )
df[column] = df[column].str.replace( ",", "" )
df[column] = df[column].astype( float )
column = 'Fees'
df[column] = df[column].str.replace( "$", "" )
df[column] = df[column].str.replace( ",", "" )
df[column] = df[column].astype( float )
column = 'Books'
df[column] = df[column].str.replace( "$", "" )
df[column] = df[column].str.replace( ",", "" )
df[column] = df[column].astype( float )
column = 'Room and board'
df[column] = df[column].str.replace( "$", "" )
df[column] = df[column].str.replace( ",", "" )
df[column] = df[column].astype( float )

Example of generalization in programming¶

Use that variable to create a function and call it as many times as needed:

def simplify_currency ( column ):
    df[column] = df[column].str.replace( "$", "" )
    df[column] = df[column].str.replace( ",", "" )
    df[column] = df[column].astype( float )

simplify_currency( 'Tuition' )
simplify_currency( 'Fees' )
simplify_currency( 'Books' )
simplify_currency( 'Room and board' )

Benefits of abstraction¶

The total number of lines of code decreases, and many lines get shorter, so the code is much more readable.
What the code is doing is clearer, because we've given it a name (in this case, simplify_currency).
It isn't always obvious in the original version that the same procedure is repeated four times. In the new version, the repetition is obvious.
If you later need to change how you simplify currency, you have to make that change in only one place (inside the function). Before, you would have had to make the same change four times.
Also, if you tried to make a change to the code later but accidentally missed one of the four copies, you'd have broken code and might not realize it.
You could share this same function with other notebooks or other coders if needed.
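
Sharing a function is easier when it does not depend on a variable that happens to exist in one notebook. The following is a minimal sketch, assuming we pass the DataFrame in as a parameter (the version on the earlier slide instead reads a global `df`). The `costs` data is made up for illustration, and `regex=False` is passed explicitly so that `"$"` is always treated as a literal character.

```python
import pandas as pd

def simplify_currency(df, column):
    """Convert a text column such as "$1,234.50" to floats, modifying df in place."""
    cleaned = df[column].str.replace("$", "", regex=False)  # drop dollar signs
    cleaned = cleaned.str.replace(",", "", regex=False)     # drop thousands separators
    df[column] = cleaned.astype(float)

# Hypothetical sample data, not taken from the course dataset
costs = pd.DataFrame({
    "Tuition": ["$10,000", "$12,500.50"],
    "Fees": ["$300", "$1,250"],
})

simplify_currency(costs, "Tuition")
simplify_currency(costs, "Fees")
print(costs.dtypes)
```

Because the function receives everything it needs as arguments, it could be pasted into another notebook and used on a DataFrame with any name.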

Learn this reflex¶

$$ \text{When I want to copy-and-paste code...} $$
$$ \downarrow $$
$$ \text{...instead I will create a function.} $$

Second example of generalization in programming¶

As an alternative to the first example, you could create a loop instead:

for column in [ 'Tuition', 'Fees', 'Books', 'Room and board' ]:
    df[column] = df[column].str.replace( "$", "" )
    df[column] = df[column].str.replace( ",", "" )
    df[column] = df[column].astype( float )

Second example of generalization in programming¶

Or you could create both a loop and a function:

def simplify_currency ( column ):
    df[column] = df[column].str.replace( "$", "" )
    df[column] = df[column].str.replace( ",", "" )
    df[column] = df[column].astype( float )

for column in [ 'Tuition', 'Fees', 'Books', 'Room and board' ]:
    simplify_currency( column )

Second example of generalization in programming¶

And if there had been too many columns to fit on one line, then:

def simplify_currency ( column ):
    df[column] = df[column].str.replace( "$", "" )
    df[column] = df[column].str.replace( ",", "" )
    df[column] = df[column].astype( float )

columns_to_simplify = [
    'Tuition',
    'Fees',
    'Books',
    'Room and board'
    # add as many as needed
]

for column in columns_to_simplify:
    simplify_currency( column )
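
One further step of abstraction, not shown in the slides, is to move the loop inside the function so that it accepts a whole list of column names. The sketch below assumes a hypothetical function name `simplify_currencies`, a DataFrame passed as a parameter, and made-up sample data.

```python
import pandas as pd

def simplify_currencies(df, columns):
    """Convert each listed text column (e.g. "$1,234") to floats, modifying df in place."""
    for column in columns:
        cleaned = df[column].str.replace("$", "", regex=False)
        cleaned = cleaned.str.replace(",", "", regex=False)
        df[column] = cleaned.astype(float)

# Hypothetical sample data, not taken from the course dataset
expenses = pd.DataFrame({
    "Books": ["$450", "$1,020"],
    "Room and board": ["$8,900", "$11,250.75"],
    "School": ["A", "B"],  # not a currency column, so it is left out of the list
})

simplify_currencies(expenses, ["Books", "Room and board"])
print(expenses)
```

Whether the loop belongs inside or outside the function is a judgment call. Keeping it outside leaves the function simpler, while moving it inside makes each call site shorter.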

Exercise 1¶

Examine the file in-class-exercise-1.ipynb in this Deepnote project, together with the corresponding dataset of baseball statistics, player-batting-2015.csv.

Duplicate the project for your own use.
Run the entire notebook to be sure that it succeeds, including finding the input dataset and producing a cleaned version as output.
Could any code in that file benefit from abstraction?
Introduce the necessary abstraction to improve the code.
Ensure the file still runs. (You should probably delete the cleaned version of the data before re-running, to be sure that the notebook still creates the cleaned version successfully.)

Exercise 2¶

Examine the file in-class-exercise-2.ipynb in the same Deepnote project as in the previous exercise, which will read the cleaned dataset produced when you ran in-class-exercise-1.ipynb.

Run the entire notebook to be sure that it succeeds, including finding the input dataset and producing a folder of images as output.
Could any code in that file benefit from abstraction?
Introduce the necessary abstraction to improve the code.
Ensure the file still runs. (You should probably delete the folder of images before re-running, to be sure that the notebook still creates them successfully.)

Exercise 3¶

The homework assignment you turned in today required documenting some code that does several things, including (a) creating a pair of overlapping histograms for two subsamples, (b) computing the means of those two subsamples, and (c) performing a hypothesis test on those two subsamples.

But what if we wanted to perform the same test on subsamples other than the high/low minority percent areas that the notebook used?

(continued on next slide)

Exercise 3, continued¶

Abstract the code that creates the plot into a function that could do so for any two sub-populations. (Before you begin: what parameters should it accept?) Once you've done so, call the function with the appropriate inputs so that it creates the same plot that it did before you did the abstraction.
Abstract the code that computes the two means and does the hypothesis test into a function that could do so for any two sub-populations. Ensure the function returns multiple values, so that the caller gets everything they need. Once you've done so, call the function with the appropriate inputs and then update your comments that interpret the outputs accordingly.
Now that you've got re-usable functions, apply them to answer these questions, with accompanying plots:
- Is the mean interest rate statistically significantly different between applications from males vs. females?
- Is the mean property value statistically significantly different between Asian applicants and non-Asian applicants?
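
The exercise asks for a function that returns multiple values. As a reminder of the mechanism only, here is a minimal sketch using made-up numbers and a hypothetical function `compare_means`. It returns the two means and their difference, and it deliberately leaves out the hypothesis test, which is part of the exercise.

```python
from statistics import mean

def compare_means(sample_a, sample_b):
    """Return the mean of each sample and the difference between them."""
    mean_a = mean(sample_a)
    mean_b = mean(sample_b)
    # Separating values with commas returns them together as a tuple
    return mean_a, mean_b, mean_a - mean_b

# Made-up numbers for illustration; the caller unpacks the tuple into three names
mean_a, mean_b, difference = compare_means([4.0, 5.0, 6.0], [3.0, 3.5, 4.0])
print(mean_a, mean_b, difference)
```

A test statistic and p-value could be returned in the same way, by adding them to the returned tuple.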