Exploration-Exploitation Problem in Recommender Systems: Leveraging Uncertainty for Better Recommendations
In the world of recommender systems, the exploration-exploitation problem is a constant challenge. How do we balance recommending items that we know perform well with exploring new, potentially better options? In a joint post with Inbar Naor, we explore how uncertainty can help us solve this problem.
At Taboola, our goal is to recommend items that users will find relevant. This relevance is measured through Click Through Rate (CTR) — the probability of a user clicking on a recommended item. However, we don’t have perfect information about the CTR of all items. This is where uncertainty comes into play.
We face a familiar scenario — like choosing an ice cream flavor out of many options. Do we stick with our favorite, known flavor, or do we explore new, unknown flavors in search of a better option? These strategies — exploitation and exploration — are essential in recommender systems as well.
One simple approach to tackling this challenge is the ϵ-greedy algorithm, where a percentage of traffic is allocated to exploring new items randomly. The rest is reserved for exploiting known, high-performing items. This serves as a baseline for more sophisticated methods.
One such method is the Upper Confidence Bound (UCB) algorithm, which uses uncertainty to determine which items to recommend. By balancing the expected CTR with the confidence bound, UCB guides us towards exploring new items with potential high performance. Thompson Sampling is another approach that incorporates the entire estimated distribution of an item’s CTR.
However, in a dynamic environment like Taboola where new items enter and leave the system daily, relying solely on empirical data may not be sufficient. We need a way to estimate the CTR of new items without showing them to users. This is where neural networks come into play, allowing us to use model estimations to make informed decisions about exploring new items with uncertainty.
To measure the success of our exploration efforts, we use an exploration throughput metric at Taboola. This metric helps us evaluate how well different models are exploring new items while maintaining good performance. We found that models using the UCB approach strike a good balance between exploring new items and recommending high-performing items.
The exploration-exploitation problem remains a fascinating challenge for many companies in the recommender systems domain. By leveraging uncertainty and advanced algorithms like UCB, we can continue to improve our recommendation systems and provide the best service to users. Stay tuned for our next post where we’ll dive deeper into the models used for estimating CTR and uncertainty.