Until recently, big data processing has been dominated by batch systems like MapReduce and Spark, which allow us to periodically process a large amount of data very efficiently. As a result, most of today’s machine learning workload is done in batches. For example, a model might generate predictions once a day and be updated with new training data once a month.
While batch-first machine learning still works for many companies, this paradigm often leads to suboptimal model performance and lost business opportunities. In the coming year, I hope that more companies will deploy models that can generate predictions in real time and update more frequently to adapt to changing environments.
Consider an ecommerce website where half the visitors are new users or existing users who aren’t logged in. Because the site doesn’t yet know anything about these visitors, there are no recommendations personalized to them until the next batch of predictions is computed. By then, it’s likely that many of these visitors will have left without making a purchase because they didn’t find anything relevant to them.
In the last couple of years, technically progressive companies have moved toward real-time machine learning. The first level is online prediction. These companies use streaming technologies like Kafka and Kinesis to capture and process a visitor’s activities on their sites — often called behavioral data — in real time. This enables them to extract online features and combine them with batch features to generate predictions tailored to a specific visitor based on their activities. Companies that have switched to online prediction, including Coveo, eBay, Faire, Stripe, and Netflix, have seen more accurate predictions. This leads to higher conversion rates, higher retention rates, and eventually higher revenue. Online inference also enables sophisticated evaluation techniques like contextual bandits, which can determine the best-performing model using much less data than traditional A/B testing.
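To make the idea concrete, here is a minimal sketch of what online prediction can look like: behavioral events are consumed from a stream, turned into online features from a short rolling window, combined with precomputed batch features, and scored on the fly. The Kafka topic name, feature names, batch-feature lookup, and scoring function are all illustrative assumptions, not any particular company’s pipeline.

```python
# Minimal online-prediction sketch: stream events -> online features ->
# combine with batch features -> score. Assumes a local Kafka broker and
# the kafka-python client (pip install kafka-python).
import json
from collections import defaultdict, deque

from kafka import KafkaConsumer

# Batch features: precomputed offline (e.g., nightly) and loaded into a fast
# store. A plain dict keyed by session/user id stands in for that store here.
batch_features = {"user_123": {"lifetime_purchases": 4, "avg_order_value": 37.5}}

# Online features: derived from the visitor's most recent activity.
recent_events = defaultdict(lambda: deque(maxlen=50))

def online_features(session_id):
    events = recent_events[session_id]
    views = sum(1 for e in events if e["type"] == "product_view")
    carts = sum(1 for e in events if e["type"] == "add_to_cart")
    return {"views_last_50": views, "carts_last_50": carts}

def predict(features):
    # Stand-in for a trained recommendation/ranking model.
    return 0.1 * features.get("views_last_50", 0) + 0.5 * features.get("carts_last_50", 0)

consumer = KafkaConsumer(
    "behavioral-events",                      # assumed topic name
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for message in consumer:
    event = message.value                     # e.g., {"session_id": ..., "type": ...}
    sid = event["session_id"]
    recent_events[sid].append(event)

    # Merge batch and online features, then score immediately for this visitor.
    features = {**batch_features.get(sid, {}), **online_features(sid)}
    print(sid, predict(features))
```

The key design point is that the batch features capture slow-moving history while the online features capture what the visitor is doing right now, so even a first-time visitor gets a non-trivial prediction within their session.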
The next level of real-time machine learning is continual learning. While machine learning practitioners understand that data distributions shift continually and models go stale, the vast majority of models in production today can’t adapt to these shifts. The more the distribution shifts, the worse the model performs. Frequent retraining can help combat this, but the holy grail is to automatically and continually update the model with new data whenever it shows signs of going stale.
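One simple way to get a feel for “signs of going stale” is to track a rolling performance metric on recently labeled predictions and flag the model when it drops below a threshold. The window size, threshold, and retrain hook below are illustrative assumptions; real systems often also monitor feature and prediction distributions directly.

```python
# Minimal staleness check: rolling accuracy over recent labeled predictions.
from collections import deque

WINDOW = 1000          # number of recent labeled examples to track
THRESHOLD = 0.80       # accuracy below this suggests the model has gone stale

recent_outcomes = deque(maxlen=WINDOW)

def record_outcome(prediction, label):
    """Record whether a prediction matched the eventual label."""
    recent_outcomes.append(int(prediction == label))

def is_stale():
    if len(recent_outcomes) < WINDOW:
        return False   # not enough evidence yet
    return sum(recent_outcomes) / len(recent_outcomes) < THRESHOLD

def maybe_trigger_update(retrain_fn):
    """Kick off an update (e.g., fine-tuning on fresh data) when stale."""
    if is_stale():
        retrain_fn()
```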
Continual learning not only helps improve performance, but it can also reduce training costs. When you retrain your model once a month, you may need to train it from scratch on a lot of data. However, with continual learning, you may only need to fine-tune it with a much smaller amount of new data.
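The cost difference comes from warm-starting: instead of refitting on months of history, you update the existing model on only the newest slice of data. Here is a small sketch of that contrast using scikit-learn’s `partial_fit`; the data shapes and sizes are placeholders for illustration.

```python
# Continual learning sketch: warm-start an SGD-based model on a small slice
# of new data instead of retraining from scratch on all historical data.
import numpy as np
from sklearn.linear_model import SGDClassifier

classes = np.array([0, 1])

# Initial training pass over historical data (the expensive, batch step).
X_hist = np.random.rand(10_000, 20)
y_hist = np.random.randint(0, 2, 10_000)
model = SGDClassifier()
model.partial_fit(X_hist, y_hist, classes=classes)

# Later, when fresh data arrives (or staleness is detected), update the
# existing weights on the much smaller new slice.
X_new = np.random.rand(500, 20)
y_new = np.random.randint(0, 2, 500)
model.partial_fit(X_new, y_new)
```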
A handful of companies have used continual learning successfully, including Alibaba, ByteDance, and Tencent. However, it requires heavy infrastructure investment and a mental shift. Therefore, it still meets with a lot of resistance, and I don’t expect many companies to embrace it for at least a few years.
In 2022, I expect a lot more companies to move toward online prediction, thanks to increasingly mature streaming technologies and a growing number of success stories. And the same underlying streaming infrastructure can be leveraged for real-time model analytics.
Chip Huyen works on a startup that helps companies move toward real-time machine learning. She teaches Machine Learning Systems Design at Stanford University.