05A: Monitor (Models)

Materials:

Date: Tuesday, 27-Aug-2024.

Pre-work:

  1. Review Diagnostics of ML Systems from CS329s

In-Class:

  1. Recap tests for monitoring data quality in the ETL stage
  2. Understand the types of drift (label, covariate, concept) and how to detect each during the model development stage
  3. Collecting data in the training stage - see Training Data, from CS329s
  4. Data Programming - create labels programmatically. Snorkel first introduced these ideas in the NLP space.
  5. Synthetic Data Generation using LLMs is a promising, emerging application of LLMs. For example, we can create labels for a piece of text by prompting an LLM.
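
Example for item 2 (drift detection). One common way to check a single numeric feature for covariate drift is a two-sample Kolmogorov-Smirnov test between a reference (training) sample and a current (serving) sample. The sketch below is a minimal standard-library version, not a method prescribed by the course materials. The function names `ks_statistic` and `drift_detected` and the default `alpha` are assumptions made for illustration, and the threshold uses the large-sample approximation.

```python
import bisect
import math
import random


def ks_statistic(reference, current):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    ref = sorted(reference)
    cur = sorted(current)
    d = 0.0
    # The largest gap is always attained at one of the observed values.
    for x in ref + cur:
        cdf_ref = bisect.bisect_right(ref, x) / len(ref)
        cdf_cur = bisect.bisect_right(cur, x) / len(cur)
        d = max(d, abs(cdf_ref - cdf_cur))
    return d


def drift_detected(reference, current, alpha=0.05):
    """Flag drift when the KS statistic exceeds the asymptotic critical value.

    alpha is a hypothetical default; choose it to match your alerting budget.
    """
    n, m = len(reference), len(current)
    # Large-sample critical value: c(alpha) * sqrt((n + m) / (n * m)),
    # with c(alpha) = sqrt(-ln(alpha / 2) / 2).
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2))
    threshold = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, current) > threshold


if __name__ == "__main__":
    rng = random.Random(0)
    train = [rng.gauss(0.0, 1.0) for _ in range(500)]
    serving = [rng.gauss(1.0, 1.0) for _ in range(500)]  # mean shifted by 1
    print("KS statistic:", round(ks_statistic(train, serving), 3))
    print("drift detected:", drift_detected(train, serving))
```

This only covers one feature at a time and says nothing about label or concept drift, which need labels (or a proxy for them) to detect. In practice a library routine such as SciPy's `ks_2samp` would normally replace the hand-written statistic.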
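
Example for item 4 (data programming). The idea is to write several small, noisy heuristics ("labeling functions") that each vote on a label or abstain, then combine the votes into a weak label. The sketch below is plain Python and does not use Snorkel's API. The spam/ham task, the three heuristics, and the majority-vote combiner are all assumptions chosen for illustration. Snorkel itself learns a label model over the votes instead of taking a simple majority.

```python
from collections import Counter

# Hypothetical label scheme for a toy spam-detection task.
ABSTAIN, HAM, SPAM = -1, 0, 1


def lf_contains_link(text):
    """Heuristic: messages with a URL are likely spam."""
    return SPAM if "http://" in text or "https://" in text else ABSTAIN


def lf_mentions_prize(text):
    """Heuristic: prize or winner language is likely spam."""
    lowered = text.lower()
    return SPAM if "prize" in lowered or "winner" in lowered else ABSTAIN


def lf_short_reply(text):
    """Heuristic: very short messages are likely genuine replies."""
    return HAM if len(text.split()) <= 4 else ABSTAIN


LABELING_FUNCTIONS = [lf_contains_link, lf_mentions_prize, lf_short_reply]


def weak_label(text):
    """Combine labeling-function votes by majority; abstain on ties or no votes."""
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return ABSTAIN  # conflicting heuristics with equal support
    return counts[0][0]


if __name__ == "__main__":
    for message in [
        "You are a winner! Claim at http://example.com",
        "see you soon",
        "prize inside",
    ]:
        print(repr(message), "->", weak_label(message))
```

Examples where every function abstains, or where the votes tie, stay unlabeled. That is usually preferable to forcing a guess, since the weak labels feed directly into training.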

Post-class:

  1. Review Training Data, from CS329s
  2. Review Feature Engineering, from CS329s