Deploying models for practical use is an industrial concern that generally goes unaddressed in research. As a result, publications on the subject tend to come from the major AI companies. These companies have built platforms to manage model design, training, deployment, and maintenance on a large scale, and their writings offer insight into current practices and issues. Beyond that, a few intrepid researchers have developed techniques that are proving critical in real-world applications.
The High-Interest Credit Card of Technical Debt: The notion of technical debt — hidden costs incurred by building a good-enough system that contains bugs or lacks functionality that becomes essential in due course — is familiar in software development. The authors argue that machine learning’s dependence on external code and real-world data makes these costs even more difficult to discover before the bill comes due. They offer a roadmap to finding and mitigating them, emphasizing the need to pay careful attention to inputs and outputs, as changing anything — training data, input structure, external code dependencies — causes other changes to ripple through the system.
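As a toy illustration of that ripple effect (not an example from the paper), the sketch below degrades one feature produced by an upstream pipeline and watches the weight a linear model assigns to a second, correlated feature shift in response; the data and features are invented for this example.

```python
# Toy illustration (not from the paper): a change to one input ripples into
# the weights learned for other, correlated inputs.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
a = rng.normal(size=5000)                  # feature produced by an upstream pipeline
b = 0.8 * a + 0.2 * rng.normal(size=5000)  # a second feature, correlated with a
y = a + b + 0.1 * rng.normal(size=5000)    # target depends on both

clean = LinearRegression().fit(np.column_stack([a, b]), y)

# Upstream change: the pipeline producing feature `a` becomes noisier.
a_degraded = a + rng.normal(size=5000)
degraded = LinearRegression().fit(np.column_stack([a_degraded, b]), y)

# The weight on `b` shifts even though nothing about `b` changed.
print("weight on b with clean a:   ", round(clean.coef_[1], 2))
print("weight on b with degraded a:", round(degraded.coef_[1], 2))
```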
Towards ML Engineering: Google offers this synopsis of TensorFlow Extended (TFX), a scaffold atop the TensorFlow framework that tracks data statistics and model behavior and automates various parts of a machine learning pipeline. During data collection, TFX compares incoming data with training data to evaluate its value for further training. During training, it tests models to make sure performance improves with each new version of a model.
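A minimal sketch of that data check, using the TensorFlow Data Validation library that ships alongside TFX; the CSV paths are placeholders.

```python
import tensorflow_data_validation as tfdv

# Summarize the training data and infer a schema (types, ranges, expected values).
train_stats = tfdv.generate_statistics_from_csv(data_location="train.csv")
schema = tfdv.infer_schema(statistics=train_stats)

# Summarize newly collected data and compare it against the training schema.
new_stats = tfdv.generate_statistics_from_csv(data_location="incoming.csv")
anomalies = tfdv.validate_statistics(statistics=new_stats, schema=schema)

# Mismatches (missing columns, out-of-range values, unexpected categories)
# surface as anomalies before the new data reaches training.
print(anomalies)
```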
The Winding Road to Better Learning Infrastructure: Spotify built a hybrid platform on TensorFlow Extended and Kubeflow that encapsulates functions like data preprocessing, model training, and model validation as components, allowing for reuse and reproducibility. The platform tracks each component’s use to provide a catalog of experiments, helping engineers avoid redundant experiments and learn from earlier efforts. It also helped the company discover a rogue pipeline that was triggered every five minutes for a few weeks.
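The sketch below shows how such reusable components might be wired together with the Kubeflow Pipelines SDK (kfp v2); the component names and placeholder bodies are illustrative, not Spotify's actual pipeline.

```python
from kfp import dsl

@dsl.component
def preprocess(raw_path: str) -> str:
    # Placeholder: clean and featurize raw data, return the processed path.
    return raw_path + ".processed"

@dsl.component
def train(data_path: str) -> str:
    # Placeholder: fit a model and return the path to the saved artifact.
    return data_path + ".model"

@dsl.component
def validate(model_path: str) -> bool:
    # Placeholder: evaluate the candidate model before promoting it.
    return True

@dsl.pipeline(name="example-training-pipeline")
def training_pipeline(raw_path: str):
    # Each step is a reusable component; outputs flow to the next step.
    processed = preprocess(raw_path=raw_path)
    model = train(data_path=processed.output)
    validate(model_path=model.output)
```

In practice the pipeline is compiled and submitted to a Kubeflow cluster, where each run is recorded, which is what makes the catalog of experiments described above possible.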
Introducing FBLearner Flow: Facebook found that tweaking existing machine learning models yielded better performance than creating new ones. FBLearner Flow encourages such recycling company-wide, lowering the expertise required to take advantage of machine learning. The platform provides an expansive collection of algorithms to use and modify. It also manages the intricate details of scheduling experiments and executing them in parallel across many machines, along with dashboards for tracking the results.
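As a local-scale, hypothetical stand-in for that pattern, the sketch below reuses a single training routine and runs several parameterized variants in parallel; FBLearner Flow itself fans such runs out across many machines and surfaces the results in dashboards, and nothing here reflects its actual API.

```python
# Hypothetical stand-in: reuse one training routine, run variants in parallel,
# and collect the results for comparison. A platform would distribute these
# runs across a cluster rather than local processes.
from concurrent.futures import ProcessPoolExecutor
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score


def run_experiment(c: float) -> tuple[float, float]:
    """Reused training routine; only the regularization strength is tweaked."""
    X, y = load_digits(return_X_y=True)
    model = LogisticRegression(C=c, max_iter=2000)
    score = cross_val_score(model, X, y, cv=3).mean()
    return c, score


if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(run_experiment, [0.01, 0.1, 1.0, 10.0]))
    for c, score in sorted(results, key=lambda r: -r[1]):
        print(f"C={c:<6} accuracy={score:.3f}")
```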
Scaling Machine Learning as a Service: Models in development should train on batches of data for computational efficiency, whereas models in production should deliver inferences to users as fast as possible — that’s the idea behind Uber’s machine learning platform. During experimentation, code draws data from SQL databases, computes features, and stores them. Later, the features can be reused by deployed models for rapid prediction, ensuring that feature computation is consistent between testing and production.
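A minimal sketch of that feature-store pattern, with invented rider features and an in-memory SQLite table standing in for Uber's actual storage: features are computed once in batch, persisted by key, and looked up at serving time so online predictions use the same values seen in training.

```python
import sqlite3

def compute_features(trip_rows):
    """Batch feature computation, e.g. over rows extracted from a SQL database."""
    by_rider = {}
    for rider_id, fare, minutes in trip_rows:
        stats = by_rider.setdefault(rider_id, {"trips": 0, "fare": 0.0, "minutes": 0.0})
        stats["trips"] += 1
        stats["fare"] += fare
        stats["minutes"] += minutes
    return {
        rider_id: (s["trips"], s["fare"] / s["trips"], s["minutes"] / s["trips"])
        for rider_id, s in by_rider.items()
    }

# Store the batch-computed features, keyed by rider.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE features (rider_id TEXT PRIMARY KEY, trips INT, avg_fare REAL, avg_minutes REAL)")
features = compute_features([("r1", 12.5, 18), ("r1", 9.0, 11), ("r2", 30.0, 42)])
db.executemany("INSERT INTO features VALUES (?, ?, ?, ?)",
               [(rid, *vals) for rid, vals in features.items()])

# At serving time, the deployed model looks up precomputed features by key,
# so online predictions reuse exactly the values computed during experimentation.
row = db.execute("SELECT trips, avg_fare, avg_minutes FROM features WHERE rider_id = ?", ("r1",)).fetchone()
print("features for r1:", row)
```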
A Unified Approach to Interpreting Model Predictions: Why did the model make the decision it did? That question is pressing as machine learning becomes more widely deployed. To help answer it, production platforms are starting to integrate Shapley Additive Explanations (SHAP). This method uses an explainable model such as linear regression to mimic a black-box model’s output. The explainable model is built by feeding perturbed inputs to the black-box model and measuring how its output changes in response to the perturbations. Once the explainable model is built, ranking the features that most influenced the decision highlights potential bias in the original model.
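A minimal sketch using the open-source shap library, with a scikit-learn classifier standing in for the black-box model; the KernelExplainer perturbs inputs and fits a weighted linear model to the black box's responses, and the final ranking surfaces the most influential features.

```python
import numpy as np
import shap
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

# Any black-box predictor works; here a random forest stands in for one.
X, y = load_breast_cancer(return_X_y=True)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# KernelExplainer perturbs inputs around a background sample and fits a
# weighted linear model to the black box's responses.
background = shap.sample(X, 50)  # small background set keeps perturbation cheap
explainer = shap.KernelExplainer(lambda data: model.predict_proba(data)[:, 1], background)
shap_values = explainer.shap_values(X[:5], nsamples=200)

# Rank features by mean absolute contribution across the explained rows.
ranking = np.argsort(np.abs(shap_values).mean(axis=0))[::-1]
print("Most influential feature indices:", ranking[:5])
```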