![](https://crypto4nerd.com/wp-content/uploads/2024/04/1FEP7_2n50c0vwQt8lOwI_Q-1024x585.png)
Modeling a machine learning system is an iterative process.
Like modeling, deploying the system is also iterative.
It usually takes a few iterations to choose the right set of metrics to monitor.
After choosing the right set of metrics, you can set a threshold for each metric.
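As a rough illustration of per-metric thresholds, here is a minimal sketch. The metric names and limits below are hypothetical and chosen only for the example; in practice they would come out of the iterations described above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Threshold:
    """Allowed range for one monitored metric; either bound may be left open."""
    minimum: Optional[float] = None
    maximum: Optional[float] = None

    def is_violated(self, value: float) -> bool:
        if self.minimum is not None and value < self.minimum:
            return True
        if self.maximum is not None and value > self.maximum:
            return True
        return False


def check_metrics(observed: Dict[str, float],
                  thresholds: Dict[str, Threshold]) -> List[str]:
    """Return the names of metrics whose observed value is outside its range."""
    return [name for name, value in observed.items()
            if name in thresholds and thresholds[name].is_violated(value)]


# Hypothetical metric names and limits (assumptions, not from the article).
THRESHOLDS = {
    "server_latency_ms": Threshold(maximum=500.0),
    "fraction_missing_input": Threshold(maximum=0.05),
    "fraction_null_output": Threshold(maximum=0.02),
}

observed = {
    "server_latency_ms": 620.0,
    "fraction_missing_input": 0.01,
    "fraction_null_output": 0.03,
}
alerts = check_metrics(observed, THRESHOLDS)
print(alerts)
```

Keeping the thresholds in one table makes them easy to adjust as you learn which alarms are useful and which are noise.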
When a model needs to be updated, you can either retrain it manually or retrain it automatically.
Pipeline Monitoring:
Let’s take a speech recognition system as an example.
In practice, such a system is rarely implemented as a single model; it is usually a more complex pipeline.
With a voice activity detection (VAD) module, a speech recognition system that runs in the cloud does not need to stream as much audio, which saves bandwidth.
VAD looks at a long stream of audio on the cell phone and clips or shortens the audio to just the part where someone is talking and streams only that to the cloud server to perform speech recognition.
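To make the clipping step concrete, here is a minimal sketch of a VAD. It assumes a simple energy threshold, which is only one possible approach and is not necessarily what a production VAD uses. The function names, frame size, and threshold are all hypothetical.

```python
import math


def frame_energy(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(s * s for s in frame) / len(frame)


def clip_to_speech(samples, frame_size=160, energy_threshold=0.01):
    """Keep the audio from the first to the last frame whose energy exceeds
    the threshold. Returns an empty list if no frame does."""
    active_starts = []
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        if frame and frame_energy(frame) > energy_threshold:
            active_starts.append(start)
    if not active_starts:
        return []
    first = active_starts[0]
    last_end = min(active_starts[-1] + frame_size, len(samples))
    return samples[first:last_end]


# Synthetic audio: silence, then a 440 Hz tone standing in for speech, then silence.
sample_rate = 16000
silence = [0.0] * 1600
tone = [0.5 * math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(3200)]
audio = silence + tone + silence

clipped = clip_to_speech(audio)
print(len(audio), len(clipped))
```

Only `clipped` would be streamed to the server in this sketch, so the silent portions never use any bandwidth.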
Changes to the first module (e.g., VAD) may affect the performance of the second module (e.g., speech recognition). For example, on some cell phones the VAD might clip audio differently, leaving more silence at the start or the end of the clip. If the VAD’s output changes, the input to the speech recognition module changes too, which can degrade the performance of the speech recognition system.
User profile example: Consider a user profile that is built from information users have provided or from their use of certain products.
If the user data changes, the system may lose the ability to predict whether a user owns a car. The percentage of "unknown" values then increases, the input to the recommender system changes, and the quality of the product recommendations might suffer.
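One way to catch this chain of effects early is to monitor the fraction of "unknown" values in the profile attribute. The sketch below assumes profiles are plain dictionaries. The attribute name `owns_car`, the helper names, and the allowed increase of 0.10 are hypothetical.

```python
def unknown_fraction(profiles, attribute, unknown_value="unknown"):
    """Fraction of profiles where `attribute` is missing or marked unknown."""
    if not profiles:
        return 0.0
    unknown = sum(
        1 for p in profiles if p.get(attribute, unknown_value) == unknown_value
    )
    return unknown / len(profiles)


def unknown_rate_alert(baseline_profiles, current_profiles, attribute,
                       max_increase=0.10):
    """True if the unknown fraction rose by more than `max_increase`
    (an absolute difference) compared with the baseline."""
    baseline = unknown_fraction(baseline_profiles, attribute)
    current = unknown_fraction(current_profiles, attribute)
    return (current - baseline) > max_increase


# Hypothetical data: 2 of 10 unknown at baseline, 5 of 10 unknown now.
baseline_profiles = [{"owns_car": "yes"}] * 5 + [{"owns_car": "no"}] * 3 \
    + [{"owns_car": "unknown"}] * 2
current_profiles = [{"owns_car": "yes"}] * 3 + [{"owns_car": "no"}] * 2 \
    + [{"owns_car": "unknown"}] * 5

alert = unknown_rate_alert(baseline_profiles, current_profiles, "owns_car")
print(alert)
```

An alert here points at the user profile stage, before the drop in recommendation quality becomes visible downstream.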
When building complex machine learning pipelines, which can have both machine learning-based and non-machine learning-based components throughout the pipeline, it is useful to brainstorm metrics to monitor that can detect changes, including concept drift and data drift.
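As one possible way to detect data drift in a monitored value, the sketch below computes a two-sample Kolmogorov-Smirnov statistic between a reference window and a current window. The choice of this statistic, the sample values (trailing silence in milliseconds), and the alert limit of 0.3 are assumptions made for illustration, not something the pipeline above prescribes.

```python
from bisect import bisect_right


def ks_statistic(reference, current):
    """Two-sample Kolmogorov-Smirnov statistic: the largest absolute gap
    between the empirical CDFs of the two samples (0 = identical, 1 = disjoint)."""
    ref = sorted(reference)
    cur = sorted(current)
    largest_gap = 0.0
    # The gap between two step functions can only change at a data point,
    # so it is enough to evaluate both CDFs at every observed value.
    for x in ref + cur:
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_cur = bisect_right(cur, x) / len(cur)
        largest_gap = max(largest_gap, abs(cdf_ref - cdf_cur))
    return largest_gap


# Hypothetical values: trailing silence (ms) in clips before and after a change.
reference_window = [100, 120, 110, 130, 90, 105, 115, 125]
current_window = [300, 320, 310, 290, 330, 305, 315, 325]

drift_score = ks_statistic(reference_window, current_window)
drift_detected = drift_score > 0.3  # arbitrary limit for this example
print(drift_score, drift_detected)
```

The same check can be run at each stage boundary, for example on the VAD output and on the user profile attributes, so that a change is caught close to where it starts.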
Metrics to monitor: The metrics to monitor are the same types discussed above:
- Software metrics
- Input metrics
- Output metrics
Figure out how quickly the data changes.
- The rate at which the data changes is problem-dependent.
For some applications, the data changes over time scales of months or years. For others, the data changes within minutes.
Note: 1. User data generally changes relatively slowly.
2. Enterprise data (B2B applications) can change very quickly.