| First | Last | Day | Sales |
|---|---|---|---|
| Amy | Smith | Monday | 39 |
| Amy | Smith | Tuesday | 68 |
| Amy | Smith | Wednesday | 10 |
| Bob | Jones | Monday | 93 |
| Bob | Jones | Tuesday | 85 |
| Bob | Jones | Wednesday | 0 |
From the reading, what is the value of this form of data?
And what is the other name for this form of data?
| First | Last | Monday | Tuesday | Wednesday |
|---|---|---|---|---|
| Amy | Smith | 39 | 68 | 10 |
| Bob | Jones | 93 | 85 | 0 |
From the reading, what is the value of this form of data?
And what are the verbs used to convert from tall to wide form, or wide to tall form?
Names (arguments to `pivot`)
- `index`
- `columns`
- `values`

Requirements
- The mapping from `index` and `columns` to `values` must be a function.

Guarantees
- ... `columns` column.

Names (arguments to `melt`)
- `id_vars`
- `value_vars`

Requirements
- The `id_vars` uniquely identify each row.
- The mapping from `id_vars` to each `value_vars` column is a function.

Guarantees
- The `value_vars` column headers will be merged into one single column entitled `variable`.
- The `value_vars` column entries will be merged into one single column entitled `value`.
- The `id_vars` entries will be replicated so that the result is still a function from `id_vars` and `variable` to `value`.

Same as pivot, except:
- The mapping from `index` and `columns` need not be a function.
- ... `aggfunc`.

Pivot tables are extremely common for summarizing data, especially since there are so many different aggregation functions. pandas has a list of built-in ones, and you can also code your own.
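The requirements and guarantees above can be tried out on the small sales data from the first table. The following is a minimal sketch, not a definitive recipe: the variable names (`tall`, `wide`, `melted`, `totals`, `pivot_failed`) are chosen here for illustration, and passing a list as `index` to `pivot` assumes a reasonably recent pandas version.

```python
import pandas as pd

# Tall-form data copied from the first table above.
tall = pd.DataFrame({
    "First": ["Amy", "Amy", "Amy", "Bob", "Bob", "Bob"],
    "Last": ["Smith", "Smith", "Smith", "Jones", "Jones", "Jones"],
    "Day": ["Monday", "Tuesday", "Wednesday"] * 2,
    "Sales": [39, 68, 10, 93, 85, 0],
})

# pivot: each (First, Last, Day) combination appears once, so the
# mapping from index and columns to values is a function.
wide = tall.pivot(index=["First", "Last"], columns="Day", values="Sales")
wide = wide.reset_index()   # turn First/Last back into ordinary columns
wide.columns.name = None    # drop the leftover "Day" label on the column axis

# melt: with default settings, the day headers land in a column named
# "variable" and the sales numbers in a column named "value".
melted = wide.melt(id_vars=["First", "Last"],
                   value_vars=["Monday", "Tuesday", "Wednesday"])

# pivot refuses input where index and columns do not determine a single
# value: here each (First, Last) pair occurs three times.
try:
    tall.pivot(index="First", columns="Last", values="Sales")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table accepts the same kind of input and combines the duplicates
# with the aggregation function given in aggfunc.
totals = tall.pivot_table(index=["First", "Last"], values="Sales", aggfunc="sum")

print(wide)
print(melted)
print(totals)
```

Passing `var_name="Day"` and `value_name="Sales"` to `melt` would restore the original column names instead of the defaults.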
After the break in class today, you'll be diving into working on some datasets for practice.
To prepare for that, let's do a few exercises for discussion, to refresh your memory on other pandas tools, functions, and syntax.
Which of the following sentences correctly describes the uses of the pandas indexers `loc` and `iloc`?
- Given a DataFrame `df`, you can use `df.loc[...]` to look up rows, columns, or cells by their names, and `df.iloc[...]` to look up rows, columns, or cells by their zero-based numerical index.
- Given a DataFrame `df`, you can use `df.loc[...]` to access rows and `df.iloc[...]` to access columns.
- Given a DataFrame `df`, you can use `df.loc[...]` to look up one or more rows by integer index and `df.iloc[...]` to do the same, but counting from the end of the DataFrame (iloc = "inverted loc").
- Given a DataFrame `df`, you can use `df[...]`, `df.loc[...]`, and `df.iloc[...]` interchangeably to get access to individual entries in the DataFrame.

If you have a DataFrame in the variable `df`, which of the following are situations in which you would want to execute the code `df["sales"] = 0`?
Assume we have a DataFrame `df` with several columns, including `"Salary"` and `"Job Title"`. How would we find the salaries of anyone whose job title is `"Engineer"`? (Fill in the blanks.)
indices = df[___________] == __________
salaries = df.loc[____________________]
What happens when we run the code `df["column"].apply( f )`?
- It replaces each entry `x` in `df["column"]` with the result of `f(x)`.
- It computes `f(df["column"])`.
- It computes `f(x)` for each entry `x` in `df["column"]`.
- It replaces `df["column"]` with the result of `f(df["column"])`.

Assume that we have read a DataFrame `df` from a CSV file, and provided no default index, so that its index is the integers from 0 to 9.
Assume further that the rows in `df` each represent data collected in one particular year. The data were collected beginning with the year 1970, and repeating the data collection every five years, so that the first row is from 1970, the second row is from 1975, and so on.
We want the index of `df` to represent the year of data collection, which is not currently stored in any of the columns of the DataFrame. Which of the following pieces of code would accomplish that goal?
# Option 1:
df.index = df.index*5 + 1965
df.index.name = 'Year'
# Option 2:
df.index = df.index*5 + 1970
df.index.name = 'Year'
# Option 3:
df.index = range(0,50,5) + 1965
df.index.name = 'Year'
# Option 4:
df.index = range(0,50,5) + 1970
df.index.name = 'Year'