Neural Networks 101: Part 12
Collaborative Filtering
Collaborative Filtering is a technique that analyzes patterns in user-to-item relationships. By identifying the underlying patterns in historical user data, it can make predictions about future user actions.
These patterns are typically relationships between users and items; in essence, the model learns the patterns of their interactions.
Collaborative Filtering works by finding patterns between Latent Factors and a target. A Latent Factor represents a characteristic of the relationship, e.g. a user preference. A target represents the item, e.g. a type of movie or a rating of a movie.
A common example is Netflix or Amazon, or any platform with a recommendation system, where user ratings for products or services are used to create recommendations for other users. For example, suppose you watched a science fiction film, and another user who watched or highly rated the same film also watched a science fiction series. Collaborative Filtering would predict that you (the user) would also be interested in that series, generating the prediction from your Latent Factors and the targets.
Latent Factors
Latent Factors are initially randomized variables used to model the hidden relationship between user and item; they essentially capture patterns of interaction or interest. For example, Users and Movies would be Latent Factors and the target would be the ratings of movies.
Each variable gets its own vector of randomized numbers, e.g.
user = [x1, x2, x3, x4]
movies = [x1, x2, x3, x4]
We would have a matrix of Latent Factors for each user and movie.
Embedding
An Embedding is the row looked up, by index, in the matrix of Latent Factors for a specific user/record.
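As a minimal sketch (the sizes and variable names here are hypothetical), an Embedding lookup is just row indexing into the Latent Factor matrix:

```python
import random

random.seed(0)

n_users, n_factors = 4, 3  # hypothetical sizes

# Latent factor matrix: one row of random factors per user
user_factors = [[random.uniform(-1, 1) for _ in range(n_factors)]
                for _ in range(n_users)]

# The Embedding for user 2 is simply row 2 of the matrix
user_id = 2
user_embedding = user_factors[user_id]
```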
Dot Product
A dot product multiplies two vectors element-wise and sums the results, producing a scalar (single number) output.
In Collaborative Filtering, this is used to calculate a score between two latent factor vectors, indicating a predicted preference.
\[ \text{score} = u \cdot i = \sum_{k=1}^{n} u_k \times i_k \]
For example, with two latent factor vectors, user and movie:
user = [0.5, 0.1, 0.3]
movie = [0.6, 0.4, 0.2]
score = (0.5 × 0.6) + (0.1 × 0.4) + (0.3 × 0.2) = 0.3 + 0.04 + 0.06 = 0.4
We can imagine that these latent factors might represent user interest in certain genres of movies. Since the score is quite low (0.4), this dot product result would resolve into a weak recommendation.
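The worked example above can be reproduced in a few lines of plain Python:

```python
user = [0.5, 0.1, 0.3]
movie = [0.6, 0.4, 0.2]

# Multiply element-wise, then sum: the dot product
score = sum(u * m for u, m in zip(user, movie))
print(round(score, 2))  # → 0.4
```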
Weight Decay
Weight Decay is a regularization technique used to mitigate overfitting. It applies a penalty to the weight calculation to prevent the weights from growing too large, which can lead to overfitting and to overshooting during optimization.
The Weight Decay parameter is usually exposed as a hyperparameter that can be fine-tuned.
\[ \text{parameters.grad} = \nabla L + wd \times 2 \times \text{parameters} \]
Where:
- \( \nabla L \) is the gradient of the loss with respect to the model parameters, calculated during backpropagation.
- \( wd \times 2 \times \text{parameters} \) is the additional term from the weight decay, adjusting the gradients to discourage large weights.
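As a small numeric sketch of this gradient adjustment (all values below are made up for illustration):

```python
wd = 0.1  # weight decay hyperparameter (hypothetical value)

params = [0.8, -0.5, 0.3]          # current parameter values
loss_grads = [0.02, -0.01, 0.05]   # ∇L from backpropagation (made-up numbers)

# grad = ∇L + wd * 2 * parameter: large weights get pushed back toward zero
grads = [g + wd * 2 * p for g, p in zip(loss_grads, params)]
```

Note that the penalty grows with the parameter's magnitude, so the largest weights receive the strongest pull toward zero.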
It is also important to note that the Weight Decay hyperparameter is sometimes applied in the loss calculation itself, for the same purpose of keeping training stable and preventing overfitting.
\[ \text{parameters.grad} += wd \times 2 \times \text{parameters} \]
Collaborative Filtering Steps
Now that we understand Collaborative Filtering, let's go through the steps to train a model.
We will use the problem of users and movie ratings; we want to create predictions in order to recommend other movies.
Our Latent Factors will be the Users and Movies. The target will be the Ratings. We'll use pseudocode to demonstrate the intent.
- Generate Latent Factors for Users and Movies; we need to choose a size for the latent factors.
- This will create a tensor of initially randomized values for the given number of users and movies.
- We essentially end up creating a matrix of (100 × 50) for users and (500 × 50) for movies
users = randomized_tensor(latent_factor_size=50, num_users=100)
movies = randomized_tensor(latent_factor_size=50, num_movies=500)
- Initialize a bias for each user and each movie
user_bias = randomized_tensor(num_users)
movie_bias = randomized_tensor(num_movies)
- Generate a prediction using the Dot Product
- For simplicity, we’ll choose Embeddings from the Latent Factor matrices but this would be trained as a batch
r = random()
result = (users[r] * movies[r]).sum()
result += user_bias[r] + movie_bias[r]
- Pass the result through a sigmoid function to stabilize the output and constrain it to the rating range
result = sigmoid(result, range=(0, 5.5))
- Calculate the loss based on the result
target = user_rating
loss = loss_fn(result, target)
- Calculate the gradients based on the loss, then step the parameters by the learning rate
weight_decay = 0.1
parameters -= learning_rate * calc_grads(weight_decay, loss, parameters)
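The steps above can be sketched as a small, self-contained NumPy script. This is a simplified illustration under stated assumptions, not a production recommender: the toy ratings, the hyperparameter values, the `sigmoid_range` helper, and the hand-derived SGD loop (in place of a framework optimizer) are all made up for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sizes from the steps above; lr and epochs are made-up values
n_users, n_movies, n_factors = 100, 500, 50
lr, wd, epochs = 0.05, 0.1, 20

# Steps 1-2: randomized latent factor matrices (100 x 50 and 500 x 50)
users = rng.normal(0, 0.1, (n_users, n_factors))
movies = rng.normal(0, 0.1, (n_movies, n_factors))

# Step 3: one bias per user and per movie
user_bias = np.zeros(n_users)
movie_bias = np.zeros(n_movies)

# Toy training data: (user_id, movie_id, rating) triples, fully made up
data = [(rng.integers(n_users), rng.integers(n_movies), rng.integers(1, 6))
        for _ in range(200)]

def sigmoid_range(x, lo=0.0, hi=5.5):
    # Squash the raw score into the valid rating range (0, 5.5)
    return lo + (hi - lo) / (1.0 + np.exp(-x))

for _ in range(epochs):
    for u, m, rating in data:
        # Step 4: prediction = dot product of the two embeddings plus biases
        raw = users[u] @ movies[m] + user_bias[u] + movie_bias[m]
        # Step 5: constrain the prediction to the rating range
        pred = sigmoid_range(raw)

        # Step 6: MSE loss gradient, chained through the sigmoid
        err = 2.0 * (pred - rating)
        s = pred / 5.5
        d_raw = err * 5.5 * s * (1.0 - s)

        # Step 7: gradients with the weight decay term (grad += wd * 2 * params)
        g_user = d_raw * movies[m] + wd * 2 * users[u]
        g_movie = d_raw * users[u] + wd * 2 * movies[m]

        # SGD update
        users[u] -= lr * g_user
        movies[m] -= lr * g_movie
        user_bias[u] -= lr * d_raw
        movie_bias[m] -= lr * d_raw
```

In practice the per-rating loop would be replaced with batched matrix operations and a framework optimizer, but the update rule is the same.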
Limitations
Collaborative Filtering is a useful technique for finding patterns in user interaction but has limitations.
The “cold start problem” refers to the fact that new users need a built-up interaction history before correlations for recommendations can be found. In addition, predictions cannot be made for brand-new input/users, since this requires existing Embeddings in the Latent Factor matrix. Essentially, to make a prediction for a brand new user, you would theoretically have to retrain the whole model.
There are some solutions, such as asking users for their preferences when they first sign up. This essentially bootstraps a new user Embedding, which can be trained further the more they use the service.
This is why, on certain services like Netflix, new users are prompted to answer a few questions about their interests and favourite genres. This initial input helps bootstrap the user's Embeddings.