Full Semester Course

Below is one way to organize the content into a 14-week, full-semester course with 4 contact hours per week. It divides roughly into three parts.

Overview

Prereqs

  • Exposure to, and skill in, data handling and building models in Python and PyTorch
  • Exposure to, and skill in, developing code using Python, Git, and IDEs like VS Code
  • A foundation course in Machine Learning, Deep Learning, Data Modeling, or working with (Big) Data

Part-1: Essentials

  • Topics
    • basic principles and MLOps with Open Source Software
    • two assignments
  • Learning Outcomes: students will be able to
    • deploy models with logging, documentation, unit tests, and APIs
    • explain MLOps in terms of a conceptual framework

Part-2: Full Stack MLOps

  • Topics
    • holistic understanding of ML development, beyond chasing typical performance metrics
    • one assignment, one mini project and a midterm
  • Learning Outcomes: students will be able to
    • deploy models, observe their performance, make improvements, and redeploy them
    • ensure that the ML pipeline is reproducible
    • incorporate principles from Responsible AI and build ML systems that can consist of many models and tools

Part-3: Application

  • Topics
    • practice, Cloud solutions
    • capstone project and presentations
    • invited lectures from Industry
  • Learning Outcomes: students will be able to
    • frame, discover, develop, deploy, monitor, improve, re-deploy and maintain an ML Application
    • approach the problem holistically, optimize RoI

Grading:

  • 10%: Scribe lecture notes
  • 30%: Three assignments, each 10%
  • 15%: Midterm mini project
  • 20%: In-class midterm (MCQs, fill-in-the-blanks, data interpretation)
  • 25%: Capstone project

Suggested Schedule (WIP)

Part-1: Essentials
Week Topics
01 Discovery
1. Course Objectives and ML Recap
2. ML Lifecycle, Fullstack ML Infrastructure
3. DAGs, Software 1.0 vs 2.0, Tool Ecosystem, Project Setup
4. Project Canvas & Human-centered Design
  Assignment-01: Build a model that is well documented, modular, testable and functional
02 Data Engineering
1. Design Patterns & Considerations, Data Models
2. ETL (with Flyte/dbt) and Feature Store (Chronon)
3. Data Versioning with DVC/Kedro and Logging
4. Feature Engineering with TFX/DFL/Encodings
  No Assignment:
03 Model Development & Experimentation
1. Design Patterns & Considerations
2. Developing and Managing multiple models (with Hydra)
3. Model versioning with MLflow
4. DoEs, Experiment tracking with WandB
  Assignment-02: Build: run multiple experiments, benchmark with a baseline, pick a top performing model
04 Deploy & Serve
1. Design Patterns & Considerations; Deploy with Docker
2. Model Serving (FastAPI, Flask)
3. Build a demo with Gradio
  No Assignment:
05 Evaluation & CI/CD
1. Design Patterns & Considerations
2. CI/CD with GitHub Actions
3. Model Evaluation and benchmarking
4. A/B Testing
  Assignment-03: Build: test multiple models, and based on performance, roll out the best performing model for all users
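As a rough illustration of what Assignment-03 asks for, the sketch below splits users between candidate models with a stable hash and then picks the variant with the best observed success rate. It is a minimal sketch, not the course's reference solution: the function names `assign_variant` and `pick_winner`, the `min_samples` threshold, and the notion of a boolean "success" per prediction are all assumptions made for illustration. A real rollout would normally add a significance test before promoting a model to all users.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def assign_variant(user_id: str, variants: List[str]) -> str:
    """Deterministically map a user to one variant using a stable hash.

    The same user always lands on the same model, so their experience
    does not flip between requests.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return variants[int(digest, 16) % len(variants)]


def pick_winner(
    outcomes: List[Tuple[str, bool]], min_samples: int = 30
) -> Optional[str]:
    """Return the variant with the highest success rate.

    `outcomes` is a list of (variant, success) pairs. Returns None when
    there is no data or any observed variant has fewer than `min_samples`
    trials (a hypothetical guard against deciding too early).
    """
    counts: Dict[str, List[int]] = defaultdict(lambda: [0, 0])
    for variant, success in outcomes:
        counts[variant][0] += int(success)  # successes
        counts[variant][1] += 1             # trials
    if not counts or any(trials < min_samples for _, trials in counts.values()):
        return None
    return max(counts, key=lambda v: counts[v][0] / counts[v][1])
```

In use, each incoming request would call `assign_variant(user_id, ["model_a", "model_b"])` to choose which model serves it, log the outcome, and periodically call `pick_winner` on the logged outcomes to decide which model to roll out to everyone.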
Part-2: Full Stack MLOps
Week Topics
06 Performance Scaling, Continuous Testing
1. Design Patterns & Considerations
2. Scaling training and serving with Metaflow/TrueFoundry
3. Continuous Testing
4. RoI on experiments (no free lunch)
  No Assignment:
07 Observability, Reproducibility
1. Design Patterns & Considerations
2. Statistical tests for Model Drift, Data Drift
3. Monitoring drift with Alibi
4. R4 framework
  Midterm mini project:
1. Build an ML pipeline that is reproducible, and
2. Implement the “Replace” strategy: identify the predictions that were wrong, remove those data points, and redeploy the model
08 Trustworthy ML
1. Design Patterns & Considerations
2. Conformalization for Statistical Guarantees OOD
3. Human-in-the-Loop, Abstention, System of Models
  No Assignment
09 Responsible ML
1. Design Patterns & Considerations
2. Fairness, Safety, Alignment
3. Fairness with IBM AI Fairness 360
  No Assignment:
10 Data Centric AI and Pipeline Debugging
1. Automated Debug of Data and Pretrained Models
2. Human side of AI
3. Data Cards, Model Cards, Modeling Cards
  Midterm: in-class
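To make the Week 07 topic "Statistical tests for Model Drift, Data Drift" more concrete, here is a minimal sketch of one such test: a two-sample Kolmogorov-Smirnov check on a single numeric feature, written with the standard library only. The choice of the KS test, the function names `ks_statistic` and `drift_detected`, and the default `alpha` are assumptions for illustration; the course may use other tests or a library for this. The critical value uses the usual large-sample approximation, so it is only a rough guide for small samples.

```python
import math
from bisect import bisect_right
from typing import Sequence


def ks_statistic(reference: Sequence[float], current: Sequence[float]) -> float:
    """Largest absolute gap between the two empirical CDFs."""
    ref = sorted(reference)
    cur = sorted(current)
    n, m = len(ref), len(cur)
    gap = 0.0
    # Both empirical CDFs are step functions that only change at sample
    # points, so checking every sample point is enough to find the maximum.
    for x in ref + cur:
        f_ref = bisect_right(ref, x) / n
        f_cur = bisect_right(cur, x) / m
        gap = max(gap, abs(f_ref - f_cur))
    return gap


def drift_detected(
    reference: Sequence[float], current: Sequence[float], alpha: float = 0.05
) -> bool:
    """Flag drift when the KS statistic exceeds the asymptotic critical value."""
    n, m = len(reference), len(current)
    # Large-sample critical value: c(alpha) * sqrt((n + m) / (n * m)),
    # with c(alpha) = sqrt(-ln(alpha / 2) / 2).
    c_alpha = math.sqrt(-math.log(alpha / 2) / 2)
    critical = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, current) > critical
```

Here `reference` would hold a feature's values from the training data and `current` the same feature's values from recent production traffic; a `True` result is a signal to investigate, not proof that the model has degraded.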
Part-3: Practice
Week Topics
11 Case Study:
Putting It Together using OSS tools
Develop a RAG Chatbot using Mistral-7B
12 MLOps on Cloud Platforms
[Databricks, Google Vertex AI, AWS SageMaker, MS Azure, TrueFoundry, Outerbounds]
1. ETL and Feature Store
2. Train, Deploy, Monitor
3. A/B testing
4. CI/CD under drift
13 Practitioner Talks and Ask Me Anything Sessions
1. Healthcare
2. Retail/e-commerce
3. Logistics/Supply Chain
4. Agriculture
14 Project Presentations by Teams

openly,
The Saddle Point