Missing values
Consider whether each of the following decisions is reasonable, and be prepared to say why.
- In a spreadsheet of daily volunteer hours logged by National Honor Society students, some values are missing. The staff member in charge of the spreadsheet replaced each missing value with that student's mean across all days.
- In a database of historical immigration records, some records mark a person's age with an X. An analyst, before performing statistics on the data, wants to replace all X values with NaN.
- A data scientist is fitting a multivariable linear model to predict the rate of reported intolerant behaviors from variables measuring each employer's culture and its commitment to diversity messaging and training. Some rows in the dataset of companies contain one or more missing values. The data scientist plans to drop these rows before fitting the model.
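
To make the three scenarios concrete, here is a minimal sketch of what each operation might look like in pandas. All of the data, column names, and variable names below are invented for illustration and do not come from any real dataset. The code shows only the mechanics; it does not settle whether each choice is reasonable.

```python
import numpy as np
import pandas as pd

# Scenario 1 (hypothetical data): fill each missing value with
# that student's own mean across the days that were logged.
hours = pd.DataFrame({
    "student": ["Ana", "Ana", "Ana", "Ben", "Ben", "Ben"],
    "hours": [2.0, np.nan, 4.0, 1.0, 3.0, np.nan],
})
# transform("mean") returns one value per row, ignoring NaN within each group.
student_mean = hours.groupby("student")["hours"].transform("mean")
hours["hours_filled"] = hours["hours"].fillna(student_mean)

# Scenario 2 (hypothetical data): ages recorded as text, with "X" as a placeholder.
records = pd.DataFrame({"age": ["34", "X", "27", "X"]})
# mask() blanks out the entries where the condition is True,
# then to_numeric converts the remaining strings to numbers.
age_without_x = records["age"].mask(records["age"] == "X")
records["age"] = pd.to_numeric(age_without_x)

# Scenario 3 (hypothetical data): drop every row with at least one missing value.
companies = pd.DataFrame({
    "culture_score": [3.5, np.nan, 4.1, 2.8],
    "training_hours": [10.0, 8.0, np.nan, 12.0],
    "incident_rate": [0.12, 0.30, 0.08, 0.25],
})
complete = companies.dropna()
rows_dropped = len(companies) - len(complete)
```

Comparing `rows_dropped` with `len(companies)` shows how much data the third approach discards, and calling `records["age"].mean()` after the second step shows that pandas skips NaN values by default when computing statistics. Both observations may be useful when arguing for or against each choice.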