How We Personalized Tasty Tag Search Results

Jarvis Miller
BuzzFeed Tech
Aug 20, 2018


The Tasty app just celebrated its first birthday! There are many great things about the app, but there is definitely room for improvement.

Tasty app

For example, when searching for recipes via our tags, the results are ordered by the most recent recipes posted. If I usually click on healthy smoothies while searching on the ‘drinks’ tag, I might be a bit annoyed if I search ‘drinks’ again and the first ten results are all cocktails just because they were the most recently posted. Ideally, my search results would be personalized by taking into account my previous interactions. Personalization of results can help improve search metrics in a couple of ways:

  • Decrease the time spent scrolling until you find a recipe you want to click (minimize impressions until a click)
  • Decrease the proportion of times one searches for a recipe and exits without clicking anything (minimize the exit ratio)

This article will introduce the technical steps taken to achieve this goal, how Tasty data was prepped to build the model, and how to interpret the results.

What sort of data are we working with?

When trying to build a model, it’s important to figure out how much data you have as well as the quality of that data. To personalize search results based on each user’s previous interactions with your content (in our case, recipes), you should have an idea of which items a user interacted with, and how they felt about that interaction.

There are two kinds of data available for how a user felt about an interaction: explicit and implicit feedback. Explicit feedback comes from interactions such as rating a show on Netflix or giving a recipe a thumbs up. Explicit feedback is much richer since it tells us concretely whether the user had a positive or negative experience, but the data is often very scarce. I've personally used Netflix and Amazon for years but have never rated anything! Implicit feedback involves interactions such as clicking on a link, purchasing an item, or viewing a video. Implicit feedback is far more plentiful, but it can be difficult to infer whether I liked a movie or recipe just because I viewed it.

The Tasty app only recently incorporated tips and ratings, so explicit data would definitely be too sparse to use; I had to rely on implicit feedback instead. The implicit feedback used is the number of times someone clicked on a recipe in a given time span. This isn't the end of the world: companies like Hulu have also recognized the abundance of implicit feedback and use it in their recommendations!

Since the data represents the number of clicks for each user-recipe interaction, it forms a large MxN matrix, denoted R, where M is the number of users and N is the number of recipes. Below is a toy example of what it might look like.

User-item click data

Each entry represents the number of clicks for that user-recipe combination. Using click data from the past 30 days, the matrix had about 1 billion entries, and a large percentage of these entries were 0. Thus, the implicit ratings matrix was very large and mostly empty, meaning the computer allocates a lot of space for information that isn't very helpful for personalizing tag search. Recall that the goal is to personalize tag search based on previous click history. For each user, we want to use the non-zero ratings to predict an implicit rating for recipes that the user did not interact with; then we can sort recipes by the predicted ratings. To make better use of this large, sparse matrix, I use Matrix Factorization, which decomposes a large matrix into a product of smaller matrices, to get latent/hidden features of each user and each item. This can approximate how a user might interact with an item they've not previously engaged with.
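To make this concrete, here is one way such a sparse matrix could be assembled with SciPy; the click log, its column names, and the values are hypothetical, not Tasty's actual schema:

import pandas as pd
from scipy.sparse import csr_matrix

# hypothetical click log: one row per user-recipe pair
clicks = pd.DataFrame({'user_id':   ['a', 'a', 'b', 'c'],
                       'recipe_id': [101, 102, 101, 103],
                       'n_clicks':  [3, 1, 2, 5]})

# map raw ids to consecutive row/column indices
users_arr = clicks['user_id'].unique()
recipes_arr = clicks['recipe_id'].unique()
rows = clicks['user_id'].map({u: i for i, u in enumerate(users_arr)})
cols = clicks['recipe_id'].map({r: i for i, r in enumerate(recipes_arr)})

# MxN implicit ratings matrix R; only the non-zero clicks are stored
R = csr_matrix((clicks['n_clicks'], (rows, cols)),
               shape=(len(users_arr), len(recipes_arr)))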

Matrix Factorization (MF) & Alternating Least Squares (ALS)

As previously mentioned, our implicit ratings matrix is large and sparse, meaning we're storing a lot of values that aren't helpful for personalizing tag search. Instead, we can factor our MxN matrix into an MxK matrix U and a KxN matrix V such that R ≈ U*V.

Factorizing the matrix into user and recipe vectors

Instead of one sparse matrix, Matrix Factorization yields two dense matrices containing K latent features for each of our users and items. We can then approximate user f's interest in all recipes via the f'th row of the product of U and V. To get these user and item vectors, we must minimize a cost function that contains MxN terms. This huge number of terms rules out most direct optimization techniques such as stochastic gradient descent, so Alternating Least Squares is used instead: it lets us solve for one feature vector at a time, which means it can be run in parallel! To do this, we randomly initialize U and solve for V. Then we use the current approximation of V to solve for U. The pattern repeats until convergence, approximating R as well as we can.
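To make the alternation concrete, here is a minimal NumPy sketch of plain ALS on a small dense matrix. It leaves out the confidence weighting that the implicit-feedback formulation adds for click counts (the package introduced below handles that), and every name in it is illustrative:

import numpy as np

def als(R, K=10, reg=0.1, iters=15, seed=0):
    """Alternately solve the regularized least-squares problems
    for U (MxK) and V (KxN) so that R is approximated by U @ V."""
    M, N = R.shape
    rng = np.random.default_rng(seed)
    U = rng.normal(scale=0.1, size=(M, K))
    I = reg * np.eye(K)
    for _ in range(iters):
        # fix U, solve for all item columns of V at once
        V = np.linalg.solve(U.T @ U + I, U.T @ R)
        # fix V, solve for all user rows of U at once
        U = np.linalg.solve(V @ V.T + I, V @ R.T).T
    return U, V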

Implementation and Evaluation

Ben Frederickson, a software developer in Vancouver, released an amazing package called implicit. Getting the user and recipe feature vectors from scratch is non-trivial, and his package simplifies things.

import implicit

# initialize the ALS model
model = implicit.als.AlternatingLeastSquares(factors=num_factors, regularization=regularization, iterations=num_iters)

# fit the model on the sparse user-recipe training matrix
model.fit(training_matrix)

# inspect the learned user and item feature vectors
print(model.user_factors, model.item_factors)

Here is a brief explanation of the parameters used when initializing the model:

  • Factors: The value of K, which represents the number of latent features for the user and recipe vectors
  • Regularization: Trades off between a biased model and a model with high variance. Increasing this value may increase bias but decrease variance.
  • Iterations: The number of times to alternate between solving for the user feature vector and recipe feature vector in alternating least squares.

Model Evaluation

Usually in Machine Learning applications, you train your model on training data then evaluate it on data that has never been seen. This can be done by randomly subsetting the data to create a training and testing set. The setup looks a bit like this.

Typical data split into training and testing set

Our application needs a slightly different setup because we need all of the user/recipe interactions to properly factorize the matrix and get our feature vectors. Instead, we hide a randomly selected percentage of user/recipe interactions from the model during the training phase. The testing set is a binary version of the original ratings matrix: whether a recipe was clicked or not. Once the training matrix has been factorized, we check during the test phase how many of the recommended recipes (using the feature vectors from the training matrix) were actually clicked, according to the testing set. The setup looks a bit like this:

Getting training and testing data
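As a rough sketch of that masking step, assuming the clicks live in a SciPy sparse matrix (the function name and pct_mask parameter are mine, not the production code):

import numpy as np
from scipy.sparse import csr_matrix

def mask_interactions(R, pct_mask=0.2, seed=0):
    """Zero out a random share of non-zero user/recipe interactions
    for training; keep a binarized copy of R as the test set."""
    test = R.copy()
    test.data[:] = 1  # binary: clicked or not

    train = R.copy().tolil()  # LIL makes sparsity changes cheap
    user_idx, item_idx = R.nonzero()
    rng = np.random.default_rng(seed)
    n_mask = int(np.ceil(pct_mask * len(user_idx)))
    masked = rng.choice(len(user_idx), size=n_mask, replace=False)
    for i in masked:
        train[user_idx[i], item_idx[i]] = 0

    train = csr_matrix(train)
    train.eliminate_zeros()
    return train, test, masked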

Now that we have our training set, test set, and model, what can we compare against? Common practice is to compare the model's recommendations with always recommending the most popular recipes. We did this for the four most-searched Tasty tags: within each clicked tag, we rank all recipes with the model and compare that to ranking by popularity. The metric we use to compare is the area under the Receiver Operating Characteristic (ROC) curve. A greater area under the curve (AUC) means recipes that end up being clicked appear nearer the top of the recommended list.
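One way to compute both AUCs for a single user is with scikit-learn's roc_auc_score; this sketch assumes train and test are the sparse matrices from the masking step and that user_vecs/item_vecs come from the fitted model:

import numpy as np
from sklearn.metrics import roc_auc_score

def user_aucs(user_ind, train, test, user_vecs, item_vecs):
    """Return (model AUC, popularity AUC) over the recipes this
    user has no training interaction with."""
    unseen = np.ravel(train[user_ind].toarray()) == 0
    actual = np.ravel(test[user_ind].toarray())[unseen]  # 1 if clicked
    if actual.sum() == 0:
        return None  # no masked clicks left to evaluate for this user
    model_scores = (user_vecs[user_ind] @ item_vecs.T)[unseen]
    popularity = np.ravel(train.sum(axis=0))[unseen]  # clicks per recipe
    return (roc_auc_score(actual, model_scores),
            roc_auc_score(actual, popularity))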

Note that since the recipes are masked randomly, there are cases where masked recipes are filtered out because they aren't under the specified tag. For example, if a user had one masked recipe under the ‘dinner’ tag, then when I compute the mean AUC for predictions under the ‘dessert’ tag, that user will not have any clicked recipes in the test set, since their masked recipe was filtered out for not being a dessert. For this reason, I also calculate the proportion of cases (denoted prop_non_nan) where users don’t have their masked recipes filtered out, which gives a sort of ‘trust’ in the comparison between AUC values. The first column represents the AUC for the recommendation system and the second represents the AUC for recommending the most popular recipes first. Note that the photo below shows a preliminary evaluation on a sample of users and doesn’t reflect the true evaluation metrics.

Preliminary evaluation

How to get Recommendations

Now that we know our model at least improves on recommending the most popular recipes within each tag, let’s explore how this gets used in the app. I learned a lot during this part of the project, since I did not have much data engineering experience prior to this internship. We now have user and recipe feature vectors that will be updated every day to take into account incoming data as users continue to click and new recipes continue to be published. We use these to get a recommendation vector:

import numpy as np
from sklearn.preprocessing import MinMaxScaler

# using the user_id, find the user's row in the user-factor matrix
user_ind = np.where(users_arr == user_id)[0][0]
user_vec = user_vecs[user_ind, :]

# dot product of the user vector with all item vectors,
# then scale the scores to lie in [0, 1]
rec_vector = user_vec.dot(item_vecs.T)
min_max = MinMaxScaler()
rec_vector_scaled = min_max.fit_transform(rec_vector.reshape(-1, 1))[:, 0]

What to do with the recipes that a user has already clicked? There’s a chance those recipes would consistently resurface to the top, which defeats the purpose of the project. For now, I take all the indices in rec_vector_scaled that correspond to previously clicked recipes and set those values to 0, so they fall to the bottom of the barrel. If a user wants to revisit an old recipe, they can use the ‘Recently Viewed’ tab in Tasty or ‘like’ the recipe and effectively save it.
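That step is only a couple of lines, reusing product_train (the training matrix) and the scaled vector from above:

# column indices of recipes this user already clicked in training
already_clicked = product_train[user_ind, :].nonzero()[1]

# push previously clicked recipes to the bottom of the ranking
rec_vector_scaled[already_clicked] = 0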

Now that we have our recommendation vector, we can create a dataframe with the recipe IDs and their recommendation scores, and then merge it with another dataframe containing the tags for each recipe. This lets us subset to only the recipes carrying the tags the user is searching. Here is a small example of how things work:

# pick a random user and simulate a search on the 'dinner' tag
rand_user = np.random.choice(users_arr, 1)
clicked_tag = 'dinner'

# what this user previously clicked under the tag...
get_prev_clicks(rand_user, product_train, users_arr, recipes_arr, item_lookup, clicked_tag)

# ...versus the top ten recommendations for the same tag
rec_items(rand_user, product_train, user_vecs, item_vecs, users_arr, recipes_arr, item_lookup, clicked_tag, recipe_tags, num_items=10)

The previous clicks from the user can be categorized into ‘easy weeknight dinners’. This is captured in the recommendations as well.

Perfect! Given an arbitrary user_id and set of clicked tags, we can generate recipe recommendations using that user’s previous interactions! I thought my project was good to go and could be integrated into the Tasty app. Then I talked to my mentor, and she mentioned that I’d have to think through what should be precomputed and stored in a database versus what should be computed and served while the user interacts with the app. If I had to compute a recommendation vector, merge it into a dataframe, subset to recipes with the clicked tag(s), and then serve those recipe ids each time someone added or removed a tag, they might see a significant delay of a few seconds.

A solution I thought of was to precompute a recommendation lookup table for every user containing the 500 most recommended recipes. It would look something like this:

Example user lookup table

These recipes will probably be split among a few tags with some overlap (e.g., 200 of them have the dinner tag, 300 are vegetarian, 100 are vegan, etc.), but this number is justified because the majority of users tend to search among only a few tags, and they rarely scroll anywhere near the 500th recipe. Now, every time a user clicks a tag, we can query Elasticsearch (a distributed, RESTful search and analytics engine) for all recipes with the tag, join with the recipes in the lookup table, and serve the results. Since the lookup table is quite small, the join is quick and users are served the top recipes almost immediately.
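As a sketch of that serving-time join in pandas, with both inputs stubbed (in production the recipe ids come from the Elasticsearch tag query and the per-user lookup table comes from the database):

import pandas as pd

# precomputed per-user lookup table: top 500 recipes by score
user_lookup = pd.DataFrame({'recipe_id': [101, 102, 103],
                            'score': [0.94, 0.81, 0.77]})

# recipe ids Elasticsearch returned for the clicked tag (stubbed)
tagged_ids = pd.DataFrame({'recipe_id': [102, 103, 555]})

# inner join keeps only recommended recipes carrying the tag,
# sorted by the precomputed recommendation score
results = (user_lookup.merge(tagged_ids, on='recipe_id')
                      .sort_values('score', ascending=False))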

Cool, what’s next?!

The next steps involve questions I had never thought about until this internship. How much improvement would justify the operational cost of the integration (BigQuery, AWS clusters, etc.)? How much improvement do we expect to see? To answer these questions, I talked to a senior product manager to get an idea of the metrics the team would want to see improved. It’s always good for a data scientist to understand how the product should be impacted by their work. With these metrics in hand, I can now run an A/B test to see whether the recommendation system has a significant and practical impact on the way people use the app!

Additionally, a few things could potentially improve the recommendations, such as augmenting the score based on a recipe’s historical performance (is this a recipe that people actually watch if they click on it?) or based on how recently the recipe was published. For example, people who use the app frequently might be looking for the newest thing to cook for dinner, so there might be a way to promote recently published content without sticking to reverse-chronological order.

Lastly, this project produced vectors of latent features for both users and recipes. These could be useful for lots of other tasks, such as choosing which recipe should be featured first on the “Discover” page, or adding new carousels (a row of recipes in the app with a theme such as ‘steak dinner’) based on which recipes our users tend to click. So many possibilities and fun ways to utilize this project!

I learned a ton working on this project and had fun amidst the bugs and frustration! I hope you all have fun using these recommendations to discover your next favorite meal!

Interested in becoming an intern yourself? Keep an eye out for internship postings on the BuzzFeed jobs page here!

To keep in touch with us here and find out what’s going on at BuzzFeed Tech, be sure to follow us on Twitter @BuzzFeedExp where a member of our Tech team takes over the handle for a week!
