Image ML Pod
A modular framework to simplify your image dataset workflows.
Image ML Pod is a ready-to-use framework designed to make image-based machine learning pipelines easier, faster, and more scalable. From preprocessing to training, inference, and deployment, the pod provides tools and templates so you can focus on building models instead of managing workflows.
Why Image ML Pod?
- Prebuilt Kedro Pipelines: Modular workflows for preprocessing, training, inference, and postprocessing.
- Seamless Integration: Built-in support for HuggingFace datasets, MLFlow, FastAPI, and Docker.
- Cutting-Edge Features:
  - Out-of-Distribution (OOD) detection.
  - Conformal predictions for reliability.
  - Explainability with Integrated Gradients.
- Scalable Deployment: Easily bootstrap APIs or explore microservices architecture.
- Time-Saving: Spend less time on setup and more on experimentation.
Key Features
- Modular Design: Use only the pipelines you need, customize nodes, and add new ones effortlessly.
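- Automatic OOD Detection: Flag inputs that fall outside the training distribution, with templates for MSP (maximum softmax probability), RMD (relative Mahalanobis distance), and MultiMahalanobis detectors.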
- Experiment Tracking: MLFlow integration lets you log hyperparameters, metrics, and models.
- FastAPI Integration: Bootstrap APIs directly from inference pipelines.
- Docker Support: Build and deploy your applications seamlessly with GPU compatibility.
- Conformal Predictions: Generate reliable prediction sets with torchcp.
- Explainability: Use Captum’s Integrated Gradients to interpret your model’s decisions.
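To make the OOD detection feature more concrete, here is a minimal sketch of the idea behind an MSP (maximum softmax probability) detector. It is written in plain Python for illustration and is not the pod's own implementation; the function names and the 0.5 threshold are assumptions chosen for this example.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def msp_score(logits):
    """Confidence score: the largest softmax probability."""
    return max(softmax(logits))


def is_ood(logits, threshold=0.5):
    """Flag an input as out-of-distribution when confidence is low.

    The threshold is a hypothetical value; in practice it would be
    tuned on held-out data.
    """
    return msp_score(logits) < threshold


# Illustrative logits for a 3-class model (made-up numbers).
confident = [8.0, 0.5, 0.1]   # one class clearly dominates
uncertain = [1.0, 1.1, 0.9]   # no class stands out

print(is_ood(confident), is_ood(uncertain))
```

A low maximum probability means the model does not strongly prefer any class, which MSP treats as a sign that the input may lie outside the training distribution.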
How It Works
Framework Overview
- Data Handling: HuggingFace datasets integration for seamless loading and processing of image datasets.
- Preprocessing: Ready-to-use pipelines for image transforms, OOD detection, and data augmentation.
- Training: Kedro pipelines with placeholders for custom models and training logic.
- Inference: FastAPI server integration for real-time inference.
- Postprocessing: Enhance predictions with conformal methods and explainability tools.
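The conformal postprocessing step can be illustrated with a small split-conformal sketch. The pod relies on torchcp for this; the code below does not use torchcp and is only a plain-Python illustration of the general technique, with a simple nonconformity score (one minus the probability of the true class). All function names and numbers are assumptions made up for the example.

```python
import math


def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Compute the score threshold q_hat from a held-out calibration set."""
    # Nonconformity score: 1 - probability assigned to the true class.
    scores = sorted(1.0 - p[y] for p, y in zip(cal_probs, cal_labels))
    n = len(scores)
    # Finite-sample corrected quantile rank.
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points: fall back to the full label set.
        return 1.0
    return scores[rank - 1]


def prediction_set(probs, q_hat):
    """Return every class whose nonconformity score is within q_hat."""
    return [k for k, p in enumerate(probs) if 1.0 - p <= q_hat]


# Made-up calibration data for a 3-class problem.
cal_probs = [
    [0.9, 0.05, 0.05],
    [0.2, 0.7, 0.1],
    [0.1, 0.3, 0.6],
    [0.5, 0.4, 0.1],
]
cal_labels = [0, 1, 2, 1]

q_hat = conformal_threshold(cal_probs, cal_labels, alpha=0.25)
ambiguous_set = prediction_set([0.5, 0.45, 0.05], q_hat)
confident_set = prediction_set([0.97, 0.02, 0.01], q_hat)
print(q_hat, ambiguous_set, confident_set)
```

An ambiguous prediction yields a larger set and a confident one yields a smaller set, which is what makes the set size a usable signal of reliability.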
Demos
Prebuilt Pipelines
- Load an image dataset with HuggingFace’s ImageFolder.
- Train a model and log results with MLFlow.
- Deploy the inference pipeline as a FastAPI server.
Example Commands
# Generate conformal predictions
kedro run --pipeline=inf_pred_postprocessing
# Launch the FastAPI server
uvicorn src.image_ml_pod_fastapi.app:app --host 0.0.0.0 --port 8000
Customization
Adding Custom Nodes
- Modify existing Kedro nodes or add new ones in the pipeline YAML files.
- Use the provided templates for:
  - Data Preprocessing: Add torchvision transforms or custom logic.
  - OOD Detection: Train your own detectors.
  - Postprocessing: Implement explainability or custom logging.
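As a rough illustration of what a custom preprocessing node might look like: a Kedro node wraps a plain Python function, so custom logic can be written and tested on its own before it is wired into a pipeline. The function below and the dataset names in the comment are hypothetical and not part of the pod's templates.

```python
def normalize_pixels(image, mean=0.5, std=0.5):
    """Scale 8-bit pixel values to [0, 1], then standardize them.

    `image` is a nested list of rows of pixel intensities (0-255).
    The default mean and std are illustrative values only.
    """
    return [[(px / 255.0 - mean) / std for px in row] for row in image]


# Registering the function in a pipeline would look roughly like this
# (requires Kedro; the dataset names are hypothetical):
#
#   from kedro.pipeline import node
#   node(func=normalize_pixels, inputs="raw_image", outputs="normalized_image")

sample = [[0, 255], [255, 0]]
normalized = normalize_pixels(sample)
print(normalized)
```

Keeping node functions free of side effects like this makes them easy to unit-test outside of a pipeline run.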
Extending Pipelines
Add or remove nodes by editing the catalog.yml and pipeline configuration files. Example:
my_image_dataset:
  type: image_ml_pod.datasets.HFImageFolderDataSet
  data_dir: data/01_raw/images
Deployment
Running Locally
Run the FastAPI server for inference:
uvicorn src.image_ml_pod_fastapi.app:app --host 0.0.0.0 --port 8000
Dockerized Deployment
Build the Docker image:
docker build -t image-ml-pod .
Run the Docker container with GPU support:
docker run -p 8000:8000 --gpus all image-ml-pod
Documentation
We use Quarto and Quartodoc to generate up-to-date documentation directly from the codebase. To view:
# Generate docs
quartodoc build
# Preview as a website
quarto preview
License
This project is licensed under the MIT License. See the LICENSE file for details.
Acknowledgments
Built with love using Kedro, HuggingFace, MLFlow, FastAPI, and more. Special thanks to the open-source community for providing the tools that made this possible.