{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Homework Exercise from Week 3\n", "\n", "This file re-uses a lot of the code from the in-class exercises from Week 2, so the first half of it you've already seen. There are two exceptions:\n", "\n", " 1. I have updated all the comments throughout the file to follow the best practices taught in [Chapter 5 of the class notes](https://nathancarter.github.io/MA346-course-notes/_build/html/chapter-5-before-and-after.html). Specifically:\n", " * Before every code cell, I've included the motivation for why we're running it.\n", " * After every code cell, I've interpreted the output for the reader.\n", " 2. I have added new code at the end of the file that wasn't there before. This code is either uncommented or very poorly commented. It's your job to:\n", " * Read all that code until you understand it. (Feel free to utilize office hours, email, Teams, etc. if you need help.)\n", " * Add in comments that follow the best practices covered in class and summarized above.\n", " * Submit your work by publishing it on Deepnote and emailing me the link to the published version.\n", "\n", "This file should be used in the same folder as [the CSV file of mortgage applications](https://nathancarter.github.io/MA346-course-notes/_static/practice-project-dataset-1.csv) discussed in [Chapter 4 of the class notes](https://nathancarter.github.io/MA346-course-notes/_build/html/chapter-4-review-of-python-and-pandas.html).\n", "\n", "---\n", "\n", "# HERE'S THE PART YOU'VE ALREADY SEEN:\n", "\n", "---\n", "\n", "We begin by loading the mortgage dataset, which also requires importing the pandas library." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "df = pd.read_csv( 'practice-project-dataset-1.csv' )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "No output from this cell means it succeeded without error. 
The variable `df` now contains all the data.\n", "\n", "But what's in the dataset? Let's explore." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| | Unnamed: 0 | Unnamed: 0.1 | activity_year | lei | derived_msa_md | state_code | county_code | census_tract | conforming_loan_limit | derived_loan_product_type | ... | denial_reason_2 | denial_reason_3 | denial_reason_4 | tract_population | tract_minority_population_percent | ffiec_msa_md_median_family_income | tract_to_msa_income_percentage | tract_owner_occupied_units | tract_one_to_four_family_homes | tract_median_age_of_housing_units |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 80545 | 80545 | 2018 | 5493002QI2ILHHZH8D20 | 31084 | CA | 6037.0 | 6.037603e+09 | C | Conventional:First Lien | ... | NaN | NaN | NaN | 7029 | 96.76 | 69300 | 65 | 885 | 1363 | 51 |
| 1 | 62888 | 62888 | 2018 | 549300ALNLUNS3Y53T24 | 44060 | WA | 53063.0 | 5.306301e+10 | C | Conventional:First Lien | ... | NaN | NaN | NaN | 7568 | 11.19 | 64000 | 138 | 2002 | 2338 | 22 |
| 2 | 140260 | 140260 | 2018 | 549300PUSSF737Y6XW86 | 12060 | GA | 13223.0 | 1.322312e+10 | C | Conventional:First Lien | ... | NaN | NaN | NaN | 11924 | 26.72 | 74400 | 125 | 3082 | 3775 | 13 |
| 3 | 108456 | 108456 | 2018 | JJKC32MCHWDI71265Z06 | 17900 | SC | 45079.0 | 4.507901e+10 | C | Conventional:First Lien | ... | NaN | NaN | NaN | 6860 | 53.85 | 68800 | 128 | 1815 | 2465 | 13 |
| 4 | 82467 | 82467 | 2018 | 5493002UNUIL8WHZAD63 | 31140 | KY | 21185.0 | 2.118503e+10 | C | Conventional:First Lien | ... | NaN | NaN | NaN | 4719 | 6.42 | 70400 | 161 | 1412 | 1616 | 33 |

5 rows × 101 columns
\n", "