$$ \text{Concrete (or specific)} ~ ~ \longleftrightarrow ~ ~ \text{Abstract (or general)} $$
Abstraction means moving $\to$ rightward $\to$ on the spectrum.
Generalization is another word for the same thing.
Specific:
If it's $50^\circ$F outside, we can compute $\frac59(50-32)=\frac59(18)=10$ and find that it's $10^\circ$C outside.
(Abstraction/generalization: Replace the specific constant 50 with a variable that could stand for any temperature.)
General:
Any temperature can be converted from degrees Fahrenheit to degrees Celsius using $C=\frac59(F-32)$.
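The same move from specific to general can be sketched in Python. The function name `fahrenheit_to_celsius` is made up for this illustration; the formula is the one above, with the multiplication done before the division so that the worked example comes out exactly.

```python
# Specific: convert exactly one temperature, 50 degrees Fahrenheit.
celsius_for_50 = (50 - 32) * 5 / 9

# General: replace the constant 50 with a parameter that can be any temperature.
def fahrenheit_to_celsius(fahrenheit):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (fahrenheit - 32) * 5 / 9

print(celsius_for_50)               # the specific case
print(fahrenheit_to_celsius(50))    # the general function, applied to the same case
print(fahrenheit_to_celsius(212))   # ...and to a different temperature
```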
Non-generalized code, converting columns of currencies from text to float:
df['Tuition'] = df['Tuition'].str.replace( "$", "" )
df['Tuition'] = df['Tuition'].str.replace( ",", "" )
df['Tuition'] = df['Tuition'].astype( float )
df['Fees'] = df['Fees'].str.replace( "$", "" )
df['Fees'] = df['Fees'].str.replace( ",", "" )
df['Fees'] = df['Fees'].astype( float )
df['Books'] = df['Books'].str.replace( "$", "" )
df['Books'] = df['Books'].str.replace( ",", "" )
df['Books'] = df['Books'].astype( float )
df['Room and board'] = df['Room and board'].str.replace( "$", "" )
df['Room and board'] = df['Room and board'].str.replace( ",", "" )
df['Room and board'] = df['Room and board'].astype( float )
Highlighting the repetition:
df['Tuition'       ] = df['Tuition'       ].str.replace( "$", "" )
df['Tuition'       ] = df['Tuition'       ].str.replace( ",", "" )
df['Tuition'       ] = df['Tuition'       ].astype( float )
df['Fees'          ] = df['Fees'          ].str.replace( "$", "" )
df['Fees'          ] = df['Fees'          ].str.replace( ",", "" )
df['Fees'          ] = df['Fees'          ].astype( float )
df['Books'         ] = df['Books'         ].str.replace( "$", "" )
df['Books'         ] = df['Books'         ].str.replace( ",", "" )
df['Books'         ] = df['Books'         ].astype( float )
df['Room and board'] = df['Room and board'].str.replace( "$", "" )
df['Room and board'] = df['Room and board'].str.replace( ",", "" )
df['Room and board'] = df['Room and board'].astype( float )
Creating a variable for what's changing really highlights the repetition:
column = 'Tuition'
df[column] = df[column].str.replace( "$", "" )
df[column] = df[column].str.replace( ",", "" )
df[column] = df[column].astype( float )
column = 'Fees'
df[column] = df[column].str.replace( "$", "" )
df[column] = df[column].str.replace( ",", "" )
df[column] = df[column].astype( float )
column = 'Books'
df[column] = df[column].str.replace( "$", "" )
df[column] = df[column].str.replace( ",", "" )
df[column] = df[column].astype( float )
column = 'Room and board'
df[column] = df[column].str.replace( "$", "" )
df[column] = df[column].str.replace( ",", "" )
df[column] = df[column].astype( float )
Use that variable to create a function and call it as many times as needed:
def simplify_currency ( column ):
    df[column] = df[column].str.replace( "$", "" )
    df[column] = df[column].str.replace( ",", "" )
    df[column] = df[column].astype( float )
simplify_currency( 'Tuition' )
simplify_currency( 'Fees' )
simplify_currency( 'Books' )
simplify_currency( 'Room and board' )
$$ \text{When I want to copy-and-paste code...} $$
$$ \downarrow $$
$$ \text{...instead I will create a function.} $$
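To try the function version end to end, here is a minimal self-contained sketch. The sample values in `df` are invented for illustration only. The `regex=False` argument is an addition not in the code above: it asks pandas to treat `"$"` as a literal character rather than a regular-expression anchor, whatever the installed pandas version's default is.

```python
import pandas as pd

# Invented sample data, standing in for the real dataset
df = pd.DataFrame({
    'Tuition': ['$12,000', '$9,500.50'],
    'Fees':    ['$1,200', '$850'],
})

def simplify_currency(column):
    # Strip the dollar sign and thousands separators, then convert text to float
    df[column] = df[column].str.replace("$", "", regex=False)
    df[column] = df[column].str.replace(",", "", regex=False)
    df[column] = df[column].astype(float)

# One call per column replaces three copied-and-pasted lines per column
simplify_currency('Tuition')
simplify_currency('Fees')
print(df)
```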
As an alternative to the first example, you could create a loop instead:
for column in [ 'Tuition', 'Fees', 'Books', 'Room and board' ]:
    df[column] = df[column].str.replace( "$", "" )
    df[column] = df[column].str.replace( ",", "" )
    df[column] = df[column].astype( float )
Or you could create both a loop and a function:
def simplify_currency ( column ):
    df[column] = df[column].str.replace( "$", "" )
    df[column] = df[column].str.replace( ",", "" )
    df[column] = df[column].astype( float )
for column in [ 'Tuition', 'Fees', 'Books', 'Room and board' ]:
    simplify_currency( column )
And if there had been too many columns to fit on one line, then:
def simplify_currency ( column ):
    df[column] = df[column].str.replace( "$", "" )
    df[column] = df[column].str.replace( ",", "" )
    df[column] = df[column].astype( float )
columns_to_simplify = [
    'Tuition',
    'Fees',
    'Books',
    'Room and board'
    # add as many as needed
]
for column in columns_to_simplify:
    simplify_currency( column )
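One further step along the same spectrum, offered as a sketch rather than as part of the original example: the versions above all work on one particular DataFrame named `df`. Making the DataFrame a parameter as well lets the same function clean any table. The name `simplify_currencies`, the `costs` data, and the choice to return a copy are all assumptions made for this illustration.

```python
import pandas as pd

def simplify_currencies(data, columns):
    """Return a copy of data with each listed column converted from currency text to float."""
    result = data.copy()  # leave the caller's DataFrame untouched
    for column in columns:
        cleaned = result[column].str.replace("$", "", regex=False)
        cleaned = cleaned.str.replace(",", "", regex=False)
        result[column] = cleaned.astype(float)
    return result

# Invented sample data, for illustration only
costs = pd.DataFrame({
    'Tuition':        ['$30,000', '$28,500'],
    'Room and board': ['$11,250.75', '$10,000'],
})

clean = simplify_currencies(costs, ['Tuition', 'Room and board'])
print(clean)
```

Returning a new DataFrame instead of modifying a global one is a design choice: it makes the function usable on several datasets in the same notebook and keeps the original text columns available for checking.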
Examine the file in-class-exercise-1.ipynb in this Deepnote project, together with the corresponding dataset of baseball statistics, player-batting-2015.csv.
Examine the file in-class-exercise-2.ipynb in the same Deepnote project as in the previous exercise, which will read the cleaned dataset produced when you ran in-class-exercise-1.ipynb.
The homework assignment you turned in today required documenting some code that does several things, including (a) creating a pair of overlapping histograms for two subsamples, (b) computing the mean of those two subsamples, and (c) performing a hypothesis test on those two subsamples.
But what if we wanted to perform the same test on subsamples other than the high/low minority percent areas that the notebook used?
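One possible shape for such a generalization, sketched with the standard library only. Which hypothesis test the homework notebook used is not stated here, so Welch's t statistic stands in as an assumption; the function name `compare_subsamples` and the sample numbers are likewise invented, and the histogram step is left out.

```python
from math import sqrt
from statistics import mean, variance

def compare_subsamples(sample_a, sample_b):
    """Return (mean of a, mean of b, Welch's t statistic) for any two samples."""
    mean_a, mean_b = mean(sample_a), mean(sample_b)
    # Standard error of the difference in means, without assuming equal variances
    standard_error = sqrt(variance(sample_a) / len(sample_a)
                          + variance(sample_b) / len(sample_b))
    return mean_a, mean_b, (mean_a - mean_b) / standard_error

# Invented numbers; any pair of subsamples could be passed in instead
group_1 = [2.0, 4.0, 6.0]
group_2 = [1.0, 2.0, 3.0]
mean_1, mean_2, t_statistic = compare_subsamples(group_1, group_2)
print(mean_1, mean_2, t_statistic)
```

Because the two subsamples are parameters, the same function applies to any split of the data, not just the one hard-coded in the notebook.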
(continued on next slide)