Dear friends,
In earlier letters, I discussed some differences between developing traditional software and AI products, including the challenges of unclear technical feasibility, complex product specification, and the need for data to start development. This time, let’s examine a further challenge: the additional cost of maintenance.
Some engineers think that when you deploy an AI system, you’re done. But when you first deploy, you may only be halfway to the goal. Substantial work lies ahead in monitoring and maintaining the system. Here are some reasons why:
- Data drift. The model was trained on a certain distribution of inputs, but this distribution changes over time. For example, a model may have learned to estimate demand for electricity from historical data, but climate change is causing unprecedented changes to weather, so the model’s accuracy degrades.
- Concept drift. The model was trained to learn an x->y mapping, but the statistical relationship between x and y changes, so the same input x now demands a different prediction y. For example, a model that predicts housing prices based on square footage will lose accuracy as inflation causes prices to rise.
- Changing requirements. The model was built to perform a particular task, but the product team decides to modify its capabilities. For instance, a model detects construction workers who wander into a dangerous area without a hard hat for more than 5 seconds. But safety requirements change, and now it must flag hatless workers who enter the area for more than 3 seconds. (This issue sometimes manifests as concept drift, but I put it in a different category because it’s often driven by changes in the product specification rather than changes in the world.)
Detecting concept and data drift is challenging, because AI systems have unclear boundary conditions. For traditional software, boundary conditions — the range of valid inputs — are usually easy to specify. But for AI software trained on a given data distribution, it’s challenging to recognize when the data distribution has changed sufficiently to compromise performance.
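There’s no single standard way to catch this, but a lightweight starting point is to compare the distribution of each input feature seen in production against the distribution in the training set. Here’s a minimal sketch in Python using a two-sample Kolmogorov-Smirnov test; the feature (temperature, echoing the electricity-demand example above) and the threshold are illustrative, not prescriptive:

```python
import numpy as np
from scipy.stats import ks_2samp

def check_feature_drift(train_values, live_values, p_threshold=0.01):
    """Flag drift in one numeric feature with a two-sample KS test.

    Returns True if the live distribution differs from the training
    distribution enough to warrant a closer look.
    """
    statistic, p_value = ks_2samp(train_values, live_values)
    return p_value < p_threshold

# Toy data: temperatures the demand model was trained on vs. temperatures
# observed recently (warmer and more volatile).
rng = np.random.default_rng(0)
train_temps = rng.normal(loc=15.0, scale=8.0, size=5_000)
live_temps = rng.normal(loc=18.5, scale=9.5, size=1_000)

if check_feature_drift(train_temps, live_temps):
    print("Input distribution has shifted -- consider retraining or reweighting.")
```

For categorical features, a chi-squared test or a population stability index plays the same role; the point is to track the inputs, not just the model’s accuracy, since labels often arrive too late to tell you something has changed.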
This problem is exacerbated when one AI system’s output is used as another AI’s input in what’s known as a data cascade. For example, one system may detect people and a second may determine whether each person detected is wearing a hard hat. If the first system changes — say, you upgrade to a better person detector — the second may experience data drift, causing the whole system to degrade.
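One way to catch cascade problems earlier is to monitor the interface between the two systems rather than only the end-to-end output. The sketch below is hypothetical (the detector and classifier are stand-in stubs, and the baseline numbers are made up), but it shows the idea: log simple statistics of the first model’s outputs and compare them against what the second model saw during training.

```python
import numpy as np

rng = np.random.default_rng(0)

def detect_people(frame):
    """Stand-in for a person detector: returns a list of cropped image arrays."""
    n = rng.integers(0, 4)
    return [np.zeros((rng.integers(120, 240), 90, 3)) for _ in range(n)]

def wears_hard_hat(crop):
    """Stand-in for the downstream hard-hat classifier."""
    return bool(rng.integers(0, 2))

# Statistic of the detector's output recorded when the hard-hat classifier
# was trained (this number is made up for illustration).
BASELINE_MEAN_CROP_HEIGHT = 180.0

def run_pipeline(frames):
    crop_heights, alerts = [], []
    for frame in frames:
        for crop in detect_people(frame):
            crop_heights.append(crop.shape[0])
            alerts.append(not wears_hard_hat(crop))

    # If the upstream detector is swapped out, the crops it produces may
    # change (size, framing, frequency) even though the raw video did not.
    # Watching this intermediate distribution surfaces cascade drift early.
    if crop_heights and abs(np.mean(crop_heights) - BASELINE_MEAN_CROP_HEIGHT) > 30:
        print("Detector output no longer matches the classifier's training data.")
    return alerts

run_pipeline(frames=[None] * 100)  # frames are placeholders in this sketch
```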
Even if we detect these issues, our tools for fixing them are immature. Over the past few decades, software engineers have developed relatively sophisticated tools for versioning, maintaining, and collaborating on code. We have processes and tools that can help you fix a bug in code that a teammate wrote 2 years ago. But AI systems require both code and data. If you need to fix a few training examples that a teammate collected and labeled 2 years ago, will you be able to find the documentation and the exact version of the data? Can you verify that your changes are sound and retrain the model on the revised dataset? Tools for data management, unlike tools for code management, are still nascent.
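Practices here are still settling (tools such as DVC aim to fill this gap), but even a lightweight convention helps: record a content hash of the exact training data, and who labeled it, alongside every model you ship, so that two years later you can at least confirm which data a model was trained on. A minimal sketch, with hypothetical file names and fields:

```python
import hashlib
import json
from pathlib import Path

def fingerprint_dataset(path):
    """Content hash of a dataset file, so it can be pinned next to a model."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def record_training_run(model_name, dataset_path, labeler, out="runs.json"):
    """Append a small provenance record: which data, who labeled it."""
    record = {
        "model": model_name,
        "dataset": str(dataset_path),
        "dataset_sha256": fingerprint_dataset(dataset_path),
        "labeled_by": labeler,
    }
    runs = json.loads(Path(out).read_text()) if Path(out).exists() else []
    runs.append(record)
    Path(out).write_text(json.dumps(runs, indent=2))

# Usage (paths and names are hypothetical):
# record_training_run("hard_hat_v3", "data/labels_2023.csv", labeler="teammate@example.com")
```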
Beyond data maintenance, we still have traditional software maintenance to deal with. For instance, many teams had to upgrade from TensorFlow 1 to TensorFlow 2.
These problems will recede as data-centric AI tools and methodologies evolve. But for now, being aware of them and planning projects around them can help you build better models and reduce costs.
Keep learning!
Andrew