In the course notes, we discussed a recommender system, which takes as input a user's preferences about past products (movies, in the chapter) and recommends new products based on those preferences.
Question 1. We stored previous users' preferences in a matrix for reference in helping make recommendations. What were the row and column headings in our preferences matrix? (Not the specific entries, of course, but what real-world groups did they represent?)
|   | ? | ? | ? |
|---|---|---|---|
| ? | 1 | 0 | 0 |
| ? | 0 | 1 | 1 |
| ? | 1 | 0 | 1 |
When a new user arrived, we needed their preferences as an input to our recommendation algorithm.
Question 2. In what format did we expect/store those preferences?
We used matrix multiplication (written in Python with the `@` symbol) to "combine" the preferences matrix with the new user's preferences, as in `prefs_matrix @ new_user_prefs` (see the short runnable sketch after Question 6).
Question 3. What type of result did this create? (The specific values are not important, but what kind of object is it, what shape, etc.?)
$$ \texttt{prefs\_matrix @ new\_user\_prefs}=\left[\begin{array}{ccc}1&0&0\\0&1&1\\1&0&1\end{array}\right]\left[\begin{array}{c}1\\0\\0\end{array}\right]=\text{?} $$

Question 4. Why did we normalize the rows of the preferences matrix?
Question 5. What does it mean to normalize a row of a matrix?
Question 6. When we had a perfectly good user preferences matrix, with precise data from each user, why did we choose to replace it with an approximation?
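If you want to experiment while thinking about Question 3, here is the chapter's small example reproduced as a minimal NumPy sketch; the variable names match the snippet above.

```python
import numpy as np

# The chapter's small example: rows are previous users, columns are products.
prefs_matrix = np.array([[1, 0, 0],
                         [0, 1, 1],
                         [1, 0, 1]])

# A new user who liked only the first product (written as a column vector,
# to match the equation above).
new_user_prefs = np.array([[1],
                           [0],
                           [0]])

# "Combine" the two with matrix multiplication.
result = prefs_matrix @ new_user_prefs
print(result)          # what kind of object is this? (Question 3)
print(result.shape)
```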
Recall that you did your work in two phases.
Why do you think the instructions were explicitly split into those two steps? Why couldn't we just randomly sample jams from the overall data set in just one step instead?
In class today we'll be repeating the example from the Chapter 16 notes, but with a real dataset (the songs data you prepared for class today).
Our schedule will be like this, working in groups:
You will probably want to do today's work in a new Jupyter notebook or Python script, beginning by importing the CSV file you prepared for class today, `jam-sample.csv`.
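For example, a minimal way to load it with pandas (assuming the file sits in the same folder as your notebook):

```python
import pandas as pd

# Load the edge list you prepared; each row records one user liking ("jamming") one song.
jams = pd.read_csv("jam-sample.csv")
jams.head()
```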
Convert the edge list you created for today into an adjacency matrix, as described in this section of the chapter notes.
As a quick check, you can compute `your_matrix.shape` and verify that the number of rows matches the number of distinct users in your sample and the number of columns matches the number of distinct songs.
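Here is one possible sketch using pandas, assuming your edge list has columns named `user` and `song` (adjust the names to match your file):

```python
import pandas as pd

# Read the edge list: one row per (user, song) pair.
edges = pd.read_csv("jam-sample.csv")

# Cross-tabulate users against songs: entry (u, s) counts how often user u jammed song s.
prefs = pd.crosstab(edges["user"], edges["song"])

# Turn counts into 0/1 entries and pull out the underlying NumPy array.
prefs_matrix = (prefs.to_numpy() > 0).astype(float)

print(prefs_matrix.shape)   # (number of distinct users, number of distinct songs)
```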
Normalize the rows of your adjacency matrix, as described in this section of the chapter notes.
You can verify that this worked by computing the norms of the rows again after you've done the work, and all should be almost exactly 1.0. (If our computers were perfect, they'd all be exactly 1.0, but there is some slight imprecision in using any finite computing device.)
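A minimal sketch of one way to do this with NumPy, assuming `prefs_matrix` is the adjacency matrix from the previous step:

```python
import numpy as np

# Compute the Euclidean norm of every row, then divide each row by its norm.
# keepdims=True makes the division broadcast row-by-row.
norms = np.linalg.norm(prefs_matrix, axis=1, keepdims=True)
norms[norms == 0] = 1.0            # guard against all-zero rows, just in case
prefs_matrix = prefs_matrix / norms

# Verify: every row's norm should now be (almost exactly) 1.0.
print(np.linalg.norm(prefs_matrix, axis=1))
```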
To create an approximation of our adjacency matrix, we'll use the Singular Value Decomposition (SVD) introduced in the course notes.
Create the SVD for your matrix, using the technique from this section of the chapter notes.
Once you've done so, you can check to see if your results make sense by computing the shapes of your $U$, $\Sigma$, and $V$ matrices.
- `U.shape` should be $n\times n$ for some value of $n$ in the range 1000-2000, the number of rows in your original preferences matrix.
- `V.shape` should be $m\times m$ for some value of $m$ in the range 15000-30000, the number of columns in your original preferences matrix.
- `Σ.shape` should be $(n,)$: a single number, the same $n$ value from `U.shape`, because NumPy stores the singular values as a one-dimensional array.
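A sketch of this check, assuming `prefs_matrix` is your normalized matrix and you compute the SVD with NumPy as in the chapter notes (depending on your $m$, the full $V$ can take a while to compute):

```python
import numpy as np

# Full SVD: U is n-by-n, Σ is the 1-D array of singular values, Vt is V transposed (m-by-m).
U, Σ, Vt = np.linalg.svd(prefs_matrix, full_matrices=True)

print(U.shape)    # (n, n): one row and column per user
print(Vt.shape)   # (m, m): one row and column per song
print(Σ.shape)    # (n,): one singular value per row, since n < m here
```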
In this section of the chapter notes, we learned how to write a function $\rho$ that takes as input the number of singular values we plan to remove, and it lets us know the error level of the resulting approximation.
Bring that function into your work and verify that you can run it and it produces sensible results. Examples:
- `ρ(0)` should give 0 (no error if we remove nothing)
- `ρ(1)` should give a very small number
- `ρ(1000)` (or whatever your $n$ value is) should give a value close to 1.0

Create a table of $\rho$ values, showing $\rho(i)$ for various $i$ between 1 and $n$. Python's `range()` function can be useful here, especially since its third parameter can create ranges with large jumps.
```python
>>> list(range(1, 1001, 100))
[1, 101, 201, 301, 401, 501, 601, 701, 801, 901]
```
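In case it helps to see the shape of such a function, here is one plausible sketch, assuming $\rho(i)$ is the relative error (in the Frobenius-norm sense) from dropping the $i$ smallest singular values; your chapter's definition may differ, so prefer the one from the notes.

```python
import numpy as np

# Σ is the 1-D array of singular values from np.linalg.svd, sorted largest-first.
def ρ(i):
    dropped = Σ[len(Σ) - i:]                    # the i smallest singular values
    return np.sqrt(np.sum(dropped**2) / np.sum(Σ**2))

# A quick table of error levels, stepping through i in large jumps.
for i in range(1, len(Σ) + 1, 100):
    print(i, ρ(i))
```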
Choose a value of $i$ to use going forward, one that keeps the error level below 0.5.
In this section of the chapter notes, we saw code for creating an approximate version of the preferences matrix, by dropping the $i$ lowest singular values.
Apply that technique to your preferences matrix now.
As a quick check, if you compute the shape of the resulting matrix, it should be the same as the shape of the original preferences matrix.
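A sketch of the reconstruction, assuming `U`, `Σ`, and `Vt` come from `np.linalg.svd` as above and `i` is the number of smallest singular values you chose to drop:

```python
import numpy as np

k = len(Σ) - i                                        # how many singular values to keep
approx_matrix = U[:, :k] @ np.diag(Σ[:k]) @ Vt[:k, :]

# Quick check: same shape as the original preferences matrix.
print(approx_matrix.shape, prefs_matrix.shape)
```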
We'll take our 10-minute break at this point.
We'll want to be able to create realistic preferences vectors about the songs in our dataset. Recall that a preferences vector is mostly zeros, but has a few ones to indicate which songs the user liked.
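One way to build such a vector, assuming `prefs` is the pandas cross-tabulation from earlier, so that `prefs.columns` lists the songs; the song titles below are placeholders for songs you actually like.

```python
import numpy as np

# Start with all zeros: one entry per song in the dataset.
pv = np.zeros(len(prefs.columns))

# Hypothetical titles; replace with songs that actually appear in your dataset.
my_songs = ["Song A", "Song B", "Song C"]

song_index = {song: j for j, song in enumerate(prefs.columns)}
for song in my_songs:
    pv[song_index[song]] = 1.0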
Recall that we can compute how similar your preferences are to those of the users in our dataset with a single matrix multiplication.
If your approximated preferences matrix is called `A` and your preferences vector is called `pv`, then compute `A @ pv`. The result associates each user in the `jam-sample.csv` file with a number that represents their similarity to your preferences.

Now that we know which users share your preferences, let's see what songs they like. These will be your recommendations.
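Here is one plausible way to put the pieces together, assuming `A` is the approximated preferences matrix (called `approx_matrix` in the earlier sketch), `pv` is your preferences vector, and `prefs` still holds the user and song labels; the choice of 20 similar users is an arbitrary cutoff you can adjust.

```python
import numpy as np

# One similarity score per user in the sample.
similarity = A @ pv

# Take the most similar users (20 is an arbitrary cutoff).
top_users = np.argsort(similarity)[::-1][:20]

# Add up those users' rows of the (normalized) preferences matrix
# to see which songs they like most.
song_scores = prefs_matrix[top_users].sum(axis=0)

# The 25 highest-scoring songs become your recommendations.
top_songs = np.argsort(song_scores)[::-1][:25]
recommendations = [prefs.columns[j] for j in top_songs]
print(recommendations)
```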
Create a playlist (e.g., with Spotify, Apple Music, etc.) from your 25 song recommendations.
Shuffle play.
How did the recommender system do?
Be ready to tell us at the end of class!
Our recommender system is not specific to songs. It used absolutely no data about the content of any song: it didn't know the year, artist, speed, or genre, or even that the items in the matrix were songs.
The way it was able to make reasonable recommendations was just based on the associations of users with songs.
The system we just built is a small example, and real recommender systems used by large online retailers are much more complex, but they are built on foundations like the one we just used.
Discussion: What are some ways to make the simple recommender system we just built more powerful?