05A: Monitor (Models)
Materials:
Date: Tuesday, 27-Aug-2024.
Pre-work:
- Review Diagnostics of ML Systems from CS329s
In-class:
- Recap tests for monitoring data quality in the ETL stage
- Understand the different types of drift (label, covariate, concept) and the ways to detect them that may be important in the model development stage
- Collecting data in the training stage - see Training Data, from CS329s
- Data Programming - programmatically create labels. Snorkel first introduced these ideas in the NLP space.
- Synthetic Data Generation using LLMs is a promising, emerging application of LLMs. For example, we can create labels for a piece of text by prompting an LLM.
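
As a rough illustration of the drift-detection item above, the sketch below compares a reference (training) sample against a live sample using only the Python standard library. It assumes covariate drift is checked one numeric feature at a time with a two-sample Kolmogorov-Smirnov statistic, and label drift with the total variation distance between label frequencies. The function names, toy data, and the threshold are hypothetical choices for illustration, not part of the course materials.

```python
from bisect import bisect_right
from collections import Counter


def ks_statistic(reference, current):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    ref = sorted(reference)
    cur = sorted(current)
    points = sorted(set(ref) | set(cur))
    return max(
        abs(bisect_right(ref, x) / len(ref) - bisect_right(cur, x) / len(cur))
        for x in points
    )


def label_shift(reference_labels, current_labels):
    """Total variation distance between label distributions (0 = same, 1 = disjoint)."""
    ref = Counter(reference_labels)
    cur = Counter(current_labels)
    classes = set(ref) | set(cur)
    return 0.5 * sum(
        abs(ref[c] / len(reference_labels) - cur[c] / len(current_labels))
        for c in classes
    )


# Toy data (hypothetical): one numeric feature and a binary label.
train_feature = list(range(100))
live_feature = [x + 50 for x in train_feature]  # same shape, shifted upward

train_labels = ["spam"] * 20 + ["ham"] * 80
live_labels = ["spam"] * 50 + ["ham"] * 50

DRIFT_THRESHOLD = 0.2  # assumed cutoff; tune per feature in practice

covariate_drift = ks_statistic(train_feature, live_feature)
label_drift = label_shift(train_labels, live_labels)

print("covariate drift flagged:", covariate_drift > DRIFT_THRESHOLD)
print("label drift flagged:", label_drift > DRIFT_THRESHOLD)
```

Concept drift is not covered by this sketch: it concerns a change in the relationship between inputs and labels, so detecting it generally requires ground-truth labels for live data, for example by tracking model accuracy over time.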
Post-class:
- Review Training Data, from CS329s
- Review Feature Engineering, from CS329s