Full Semester Course
The content below is organized as a 14-week, full-semester course with 4 contact hours. It is divided roughly into three parts.
Overview
Prereqs
- Exposure and skill in data handling, building models in Python, PyTorch
- Exposure and skill in developing code using Python, Git, IDEs like VS Code
- A foundation course in Machine Learning, Deep Learning, Data Modeling, working with (Big) Data
Part-1: Essentials
- Topics
- basic principles and MLOps with Open Source Software
- two assignments
- Learning Outcomes: students will be able to
- deploy models with logging, documentation, unit tests, and APIs
- understand a conceptual framework for MLOps
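
As an illustration of the Part-1 outcome (a model that is documented, logged, and unit tested), here is a minimal sketch using only the Python standard library. The `predict` function, its linear model, and the test names are hypothetical stand-ins chosen for this example; they are not part of the course material.

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("model")


def predict(features: list[float], weights: list[float], bias: float = 0.0) -> float:
    """Return the linear score w . x + b for a single example.

    Raises ValueError if features and weights differ in length.
    """
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    score = sum(w * x for w, x in zip(weights, features)) + bias
    # Log every prediction so behaviour can be audited after deployment.
    logger.info("predict: n_features=%d score=%.4f", len(features), score)
    return score


def test_predict_known_value() -> None:
    # 1*2 + 2*3 + 0.5 = 8.5
    assert predict([2.0, 3.0], [1.0, 2.0], bias=0.5) == 8.5


def test_predict_rejects_length_mismatch() -> None:
    try:
        predict([1.0], [1.0, 2.0])
    except ValueError:
        return
    raise AssertionError("expected ValueError")
```

The two `test_` functions follow the naming convention that pytest discovers automatically, so the same file can serve as both module and test suite in a small assignment.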
Part-2: Full Stack MLOps
- Topics
- holistic understanding of ML development, beyond chasing typical performance metrics
- one assignment, one mini project and a midterm
- Learning Outcomes: students will be able to
- deploy models, observe their performance, make improvements, redeploy them.
- ensure that the ML pipeline is reproducible.
- incorporate principles from Responsible AI and build ML systems which can consist of many models and tools.
Part-3: Application
- Topics
- practice, Cloud solutions
- capstone project and presentations
- invited lectures from Industry
- Learning Outcomes: students will be able to
- frame, discover, develop, deploy, monitor, improve, re-deploy and maintain an ML Application
- approach the problem holistically, optimize RoI
Grading:
- 10%: Scribe lecture notes
- 30%: Three assignments, each 10%
- 15%: Midterm mini project
- 20%: In-class midterm (MCQs, fill-in-the-blanks, data interpretation)
- 25%: Capstone project
Suggested Schedule (WIP)
Week | Topics |
---|---|
01 | Discovery 1. Course Objectives and ML Recap 2. ML Lifecycle, Fullstack ML Infrastructure 3. DAGs, Software 1.0 vs 2.0, Tool Ecosystem, Project Setup 4. Project Canvas & Human-centered Design |
Assignment-01: Build a model that is well documented, modular, testable and functional | |
02 | Data Engineering 1. Design Patterns & Considerations, Data Models 2. ETL (with Flyte/dbt) and Feature Store (Chronon) 3. Data Versioning with DVC/Kedro and Logging 4. Feature Engineering with TFX/DFL/Encodings |
No Assignment: | |
03 | Model Development & Experimentation 1. Design Patterns & Considerations 2. Developing and Managing multiple models (with Hydra) 3. Model versioning with MLflow 4. DoEs, Experiment tracking with WandB |
Assignment-02: Build: run multiple experiments, benchmark with a baseline, pick a top performing model | |
04 | Deploy & Serve 1. Design Patterns & Considerations 2. Deploy with Docker 3. Model Serving (FastAPI, Flask) 4. Build a demo with Gradio |
No Assignment: | |
05 | Evaluation & CI/CD 1. Design Patterns & Considerations 2. CI/CD with GitHub Actions 3. Model Evaluation and benchmarking 4. A/B Testing |
Assignment-03: Build: test multiple models, and based on performance, roll out the best performing model for all users |
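
The rollout decision in Assignment-03 can be sketched as an A/B test on success rates. The example below is a minimal sketch, assuming each model's performance is summarized as a count of successes out of a number of served requests and that a two-proportion z-test is an acceptable criterion. The function names and the 0.05 significance level are assumptions made for illustration, not requirements of the assignment.

```python
import math


def two_proportion_z_test(success_a: int, n_a: int,
                          success_b: int, n_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for H0: A and B share one success rate."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    # Standard error under H0; zero only if every trial failed or every trial succeeded.
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF expressed through the error function.
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    return z, 2 * (1 - cdf)


def choose_model(success_a: int, n_a: int, success_b: int, n_b: int,
                 alpha: float = 0.05) -> str:
    """Roll out challenger "B" only if it is significantly better than "A"."""
    z, p_value = two_proportion_z_test(success_a, n_a, success_b, n_b)
    return "B" if (p_value < alpha and z > 0) else "A"
```

Keeping the incumbent unless the challenger wins significantly is one conservative design choice; other policies (for example, sequential tests or bandits) are equally valid for the assignment.
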
Week | Topics |
---|---|
06 | Performance Scaling, Continuous Testing 1. Design Patterns & Considerations 2. Scaling training and serving with Metaflow/TrueFoundry 3. Continuous Testing 4. RoI on experiments (no free lunch) |
No Assignment: | |
07 | Observability, Reproducibility 1. Design Patterns & Considerations 2. Statistical tests for Model Drift, Data Drift 3. Monitoring drift with Alibi 4. R4 framework |
Midterm mini project: Build 1. an ML pipeline that is reproducible, and 2. a “Replace” strategy: identify the data points where predictions were wrong, remove them, and redeploy the model | |
08 | Trustworthy ML 1. Design Patterns & Considerations 2. Conformalization for Statistical Guarantees OOD 3. Human-in-the-Loop, Abstention, System of Models |
No Assignment | |
09 | Responsible ML 1. Design Patterns & Considerations 2. Fairness, Safety, Alignment 3. Fairness with IBM AI Fairness 360 |
No Assignment: | |
10 | Data Centric AI and Pipeline Debugging 1. Automated Debug of Data and Pretrained Models 2. Human side of AI 3. Data Cards, Model Cards, Modeling Cards |
Midterm: in-class |
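
To make the Week 07 topic of statistical tests for data drift concrete, here is a minimal sketch of a two-sample Kolmogorov-Smirnov check on a single numeric feature. It uses only the standard library rather than Alibi, and the function names are hypothetical. The constant 1.358 is the usual large-sample critical coefficient for a 5% significance level, so the decision rule is an approximation for small samples.

```python
import math
from bisect import bisect_right


def ks_statistic(reference: list[float], current: list[float]) -> float:
    """Largest gap between the empirical CDFs of the two samples."""
    ref, cur = sorted(reference), sorted(current)
    d = 0.0
    # The supremum of |F_ref - F_cur| is attained at one of the observed points.
    for x in ref + cur:
        f_ref = bisect_right(ref, x) / len(ref)
        f_cur = bisect_right(cur, x) / len(cur)
        d = max(d, abs(f_ref - f_cur))
    return d


def drift_detected(reference: list[float], current: list[float],
                   c_alpha: float = 1.358) -> bool:
    """Flag drift when the KS statistic exceeds the asymptotic critical value."""
    n, m = len(reference), len(current)
    threshold = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, current) > threshold
```

In a monitoring setup, `reference` would typically hold feature values from training time and `current` a recent window of production inputs, with one check per feature.
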
Week | Topics |
---|---|
11 | Case Study: Putting It Together using OSS tools Develop a RAG Chatbot using Mistral-7B |
12 | MLOps on Cloud Platforms [Databricks, Google Vertex, AWS SageMaker, MS Azure, TrueFoundry, Outerbounds] 1. ETL and Feature Store 2. Train, Deploy, Monitor 3. A/B testing 4. CI/CD under drift |
13 | Practitioner Talks and Ask Me Anything Sessions 1. Healthcare 2. Retail/e-commerce 3. Logistics/Supply Chain 4. Agriculture |
14 | Project Presentations by Teams |
The Saddle Point