Recall the summary of visualization techniques given in the course notes for today.
Let's review it here.

If you want to see this | Then use this |
---|---|
Just the distribution's quartiles and outliers | Box plot |
Simple approximation of the distribution | Histogram |
Very good approximation of the distribution, maybe very wide | Swarm plot |
Good approximation of the distribution, not too wide | Strip plot |
Good approximation of a large distribution, smoothed | Violin plot |
Whether the distribution is approximately normal | Overlapping ECDFs |
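
The last row of this table relies on the ECDF (empirical cumulative distribution function). As a minimal sketch of what that check compares, the code below computes an ECDF by hand and measures how far it strays from a normal CDF with the same mean and standard deviation. The function names and the sample data are illustrative assumptions, not part of the course materials, and in practice you would overlay the two curves on a plot rather than reduce them to one number.

```python
from statistics import NormalDist, mean, stdev


def ecdf(values):
    """Return the sorted values and, for each, the fraction of data <= it."""
    xs = sorted(values)
    n = len(xs)
    ys = [(i + 1) / n for i in range(n)]
    return xs, ys


def max_gap_from_normal(values):
    """Largest vertical gap between the data's ECDF and the CDF of a
    normal distribution with the same mean and standard deviation."""
    xs, ys = ecdf(values)
    reference = NormalDist(mean(values), stdev(values))
    return max(abs(y - reference.cdf(x)) for x, y in zip(xs, ys))


# Hypothetical sample: 1,000 draws from a normal distribution.
sample = NormalDist(mu=70, sigma=5).samples(1000, seed=0)
print(round(max_gap_from_normal(sample), 3))  # a small gap suggests near-normal data
```

When the two curves nearly coincide, the data is approximately normal; a large, systematic gap suggests it is not.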

If you want to see this | Then use this |
---|---|
A graph of the data when the data is a function | Line plot |
The shape of the data when the data is a relation | Scatter plot |
The shape of the data when the data is a relation, plus each variable's distribution | Joint plot |
The line of best fit through the data | Regression plot (`sns.lmplot`) |

If you want to see this | Then use this |
---|---|
The quartiles and outliers of each distribution | Side-by-side box plots |
Simple approximation of the distributions | Histograms with side-by-side bars |
Very good approximation of each distribution (can't fit too many) | Side-by-side swarm plots |
Good approximation of each distribution (can fit more) | Side-by-side strip plots |
Good approximation if the distributions are large (will be smoothed) | Side-by-side violin plots |
The shape of all possible two-column relationships | Pair plot |
A measurement of all possible correlations | Heat map of correlation coefficients |
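
The last row of this table, a heat map of correlation coefficients, is just a colored grid of pairwise Pearson correlations. As a rough sketch of what goes into that grid, the code below computes the matrix by hand for a small made-up dataset. The column names and values are hypothetical. With a pandas DataFrame you would more likely call `DataFrame.corr()` and pass the result to `sns.heatmap`.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def correlation_matrix(columns):
    """Map each pair of column names to its correlation coefficient."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical data, invented only for illustration.
data = {
    "hours_slept": [8, 7, 6, 5, 4],
    "stress_score": [2, 3, 5, 6, 9],
    "credits": [12, 15, 15, 18, 12],
}
matrix = correlation_matrix(data)
for name, row in matrix.items():
    print(name, [round(r, 2) for r in row.values()])
```

Each cell of the heat map corresponds to one entry of `matrix`. Values near 1 or -1 mark strong linear relationships, and values near 0 mark weak ones.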

Data: A series of 300 temperature readings from a single, stationary sensor at regular time intervals
Goal: To see the change in temperature over time
Which visualization type should I choose? Recall our options from the tables above.

Data: A series of 100,000 temperature readings from a single, stationary sensor at regular time intervals
Goal: The distribution of temperature values over that time interval
Which visualization type should I choose? Recall our options from the tables above.

Data: A large dataset about students who visited the wellness center with stress-related concerns, including columns about their demographic information, health history, academic record, and extracurricular activities
Goal: Ideas for how to predict risk for students who may be under too much pressure
Which visualization type should I choose? Recall our options from the tables above.

Data: The baseball salaries we investigated in Week 3 of this course
Goal: See changes in the distribution of batter salaries throughout the 2000s
Which visualization type should I choose? Recall our options from the tables above.

Data: The baseball salaries we investigated in Week 3 of this course
Goal: We want to do hypothesis testing on salaries of different groups, and need to ensure approximate normality of distributions first
Which visualization type should I choose? Recall our options from the tables above.

The homework from Week 3 included documenting some code that compared two groups within a home mortgage dataset, using two histograms on one graph. Today we'll extend that work, so your instructor will provide a copy of the solutions for you to use as a starting point.
Write a function that behaves as follows:
It might look like this:

    def compare_two_groups(group1, group2, name1, name2):
        pass  # replace this with actual code
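
One possible way to start filling in the stub is sketched below. It assumes the function should overlay two histograms on one graph, as in the Week 3 homework; the required behavior is not listed here, so treat this as a starting shape rather than the expected solution. The `bins` parameter and the `shared_bin_edges` helper are additions made for illustration, and the sketch assumes matplotlib is installed and that both groups are non-empty sequences of numbers whose values are not all identical.

```python
def shared_bin_edges(group1, group2, bins=20):
    """Evenly spaced bin edges spanning both groups, so the two
    histograms are drawn against the same bins."""
    low = min(min(group1), min(group2))
    high = max(max(group1), max(group2))
    width = (high - low) / bins
    return [low + i * width for i in range(bins + 1)]


def compare_two_groups(group1, group2, name1, name2, bins=20):
    """Overlay histograms of two groups on one graph (assumed behavior)."""
    # Imported here so the helper above can be used without matplotlib.
    import matplotlib.pyplot as plt

    edges = shared_bin_edges(group1, group2, bins)
    fig, ax = plt.subplots()
    # Semi-transparent bars keep both groups visible where they overlap.
    ax.hist(group1, bins=edges, alpha=0.5, label=name1)
    ax.hist(group2, bins=edges, alpha=0.5, label=name2)
    ax.set_xlabel("Value")
    ax.set_ylabel("Count")
    ax.legend()
    return ax
```

Using the same bin edges for both groups keeps the bars comparable. If each group were binned separately, differences in bar height could come from the binning rather than from the data.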