Dear friends,
Machine learning development is highly iterative. Rather than designing a grand system, spending months to build it, and then launching it and hoping for the best, it’s usually better to build a quick-and-dirty system, get feedback, and use that feedback to improve the system.
The iterative aspect of machine learning applies to many steps. For example:
Data labeling: It’s hard to come up with fully fleshed-out labeling guidelines that result in clean and consistent labels on your first attempt. It might be better to use an initial set of guidelines to label some data, see what problems arise, and then improve the guidelines.
Model training: Building an AI system requires deciding what data, hyperparameters, and model architecture to use. Rather than overthinking these choices, it’s often better to train an initial model, then use error analysis to drive improvements.
Deployment and monitoring: When deploying a machine learning system, you might implement dashboards that track various metrics to try to spot concept drift or data drift. For example, if you’re building a product recommendation system, you might track both software metrics such as queries per second and statistical metrics such as how often the system recommends products of different categories. What metrics should we track? Rather than try to design the perfect set of dashboards before launch, I find it more fruitful to pick a very large set of metrics, evolve them, and prune the ones that prove less useful.
Iteration is helpful in other phases of machine learning development as well. It make sense to take an empirical, experimental approach to decision making whenever:
- Multiple options are available and it's hard to know the best choice in advance.
- We can run experiments to get data quickly about the performance of different options.
These two properties hold true for many steps in a typical ML project.
One implication is that, if we can build tools and processes that enable high-throughput experimentation, we can make faster progress. For instance, if you have an MLOps platform that enables you to quickly train and evaluate new models, this will allow you to improve models more quickly.
This principle applies to other aspects of ML development that are iterative. That’s why time spent optimizing your team's capacity to run many experiments can pay off well.
Keep learning!
Andrew