A private research build — not publicly hosted. This write-up focuses on the architecture and the engineering.
The problem
Fine-tuning small LLM adapters is easy to start and painful to operate. The same LoRA job has to run reproducibly on whatever GPU you can get (a Modal A10G today, a Kaggle T4 tomorrow, an Apple-Silicon laptop offline). Every run's hyperparameters, loss, cost, and eval results have to be recorded. Bad adapters must be stopped before they ship. And you want to serve dozens of adapters without paying for dozens of GPUs. Foundry is the control plane for all of that.
The approach
A self-built MLOps platform — "adapter lifecycle tracker" — with two surfaces (a Click CLI and a FastAPI service + web UI) over shared data, training, and inference layers, glued together by a pluggable compute-backend abstraction. One command takes a curated corpus to a tracked, evaluated, deployable adapter.
Architecture
- Compute backends: a single `ComputeBackend` protocol (trigger / status / logs / download / cancel) implemented for Modal (serverless A10G GPUs + Volumes), Kaggle, Colab, and local Apple-Silicon MLX, with a polling `RunManager` that reconciles runs across all of them.
- Training layer: a provider-agnostic `TrainingConfig` and a `train_lora` dispatcher that auto-detects hardware capability and picks framework + precision: Unsloth + TRL `SFTTrainer` with 4-bit LoRA on CUDA, HuggingFace transformers/PEFT, or MLX locally, including a workaround for the Turing-T4 bf16/GradScaler bug.
- Inference: two modes from the same adapters: a Modal + vLLM OpenAI-compatible server for cloud, and a local MLX multi-adapter server that loads one base model and hot-swaps LoRA tensors per request (~100 ms), choosing the adapter via the OpenAI `model` field or an `X-Adapter` header. ("Deploy thousands of models for the cost of one.")
- Tracking & governance: every run persisted in SQLite (hyperparameters, loss history, cost), an LLM-judge eval-policy engine, and eval-gated deployment that refuses to promote an adapter unless its latest verdict is `PASSED`.
- Data layer: corpus discovery + a human-in-the-loop curation workflow (per-record keep / drop / flag → filtered corpus), a scrape→parse→merge→quality-scan data pipeline with versioning, and idempotent migrations that reconstruct historical experiment rows.
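To make the compute-backend idea concrete, here is a minimal sketch of what a five-operation protocol and a polling reconciler could look like. Only the names `ComputeBackend` and `RunManager` and the five operation names come from the write-up; the method signatures, the `RunStatus` enum, and the in-memory `FakeBackend` are assumptions made for illustration, not Foundry's actual code.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Protocol


class RunStatus(str, Enum):
    # Hypothetical status vocabulary; the write-up does not list one.
    QUEUED = "queued"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    CANCELLED = "cancelled"


TERMINAL = {RunStatus.SUCCEEDED, RunStatus.FAILED, RunStatus.CANCELLED}


class ComputeBackend(Protocol):
    """The five operations named in the write-up; signatures are assumed."""

    def trigger(self, config: dict) -> str: ...
    def status(self, run_id: str) -> RunStatus: ...
    def logs(self, run_id: str) -> str: ...
    def download(self, run_id: str, dest: str) -> str: ...
    def cancel(self, run_id: str) -> None: ...


class FakeBackend:
    """In-memory stand-in: a run succeeds after a fixed number of polls."""

    def __init__(self, polls_to_finish: int = 2) -> None:
        self.polls_to_finish = polls_to_finish
        self._polls: dict[str, int] = {}
        self._cancelled: set[str] = set()

    def trigger(self, config: dict) -> str:
        run_id = f"run-{len(self._polls) + 1}"
        self._polls[run_id] = 0
        return run_id

    def status(self, run_id: str) -> RunStatus:
        if run_id in self._cancelled:
            return RunStatus.CANCELLED
        self._polls[run_id] += 1
        if self._polls[run_id] >= self.polls_to_finish:
            return RunStatus.SUCCEEDED
        return RunStatus.RUNNING

    def logs(self, run_id: str) -> str:
        return f"{run_id}: polled {self._polls[run_id]} time(s)"

    def download(self, run_id: str, dest: str) -> str:
        return f"{dest}/{run_id}/adapter"

    def cancel(self, run_id: str) -> None:
        self._cancelled.add(run_id)


@dataclass
class RunManager:
    """Tracks (backend name, run id) pairs and refreshes them by polling."""

    backends: dict[str, ComputeBackend]
    runs: dict[tuple[str, str], RunStatus] = field(default_factory=dict)

    def submit(self, backend_name: str, config: dict) -> str:
        run_id = self.backends[backend_name].trigger(config)
        self.runs[(backend_name, run_id)] = RunStatus.QUEUED
        return run_id

    def reconcile(self) -> None:
        """One polling pass: ask each backend about its non-terminal runs."""
        for (name, run_id), state in list(self.runs.items()):
            if state not in TERMINAL:
                self.runs[(name, run_id)] = self.backends[name].status(run_id)
```

Because the protocol is structural, a real Modal or Kaggle implementation would not need to inherit from anything; it only has to expose the same five methods, which is what lets one manager loop treat every provider the same way.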
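The hardware-aware dispatch in the training layer can be illustrated with a small decision function. This is a sketch under stated assumptions, not the actual `train_lora` logic: the `Hardware` and `Plan` types, the rule for when transformers/PEFT is used instead of Unsloth, and the idea that the T4 workaround amounts to falling back to fp16 on pre-Ampere GPUs are all guesses for illustration. The one hard fact it relies on is that native bf16 support starts at CUDA compute capability 8.0 (a T4 is 7.5, an A10G is 8.6).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Hardware:
    """Hypothetical description of the detected machine."""

    kind: str  # "cuda", "apple", or "cpu"
    compute_capability: Optional[Tuple[int, int]] = None
    has_unsloth: bool = False


@dataclass(frozen=True)
class Plan:
    """Hypothetical result of the dispatch decision."""

    framework: str
    precision: str
    load_in_4bit: bool


def pick_plan(hw: Hardware) -> Plan:
    """Choose framework and precision from the detected hardware."""
    if hw.kind == "cuda":
        # bf16 is native only on Ampere (8.0) and newer; older cards such
        # as the Turing T4 (7.5) fall back to fp16 in this sketch.
        cc = hw.compute_capability
        precision = "bf16" if cc is not None and cc >= (8, 0) else "fp16"
        # Assumed rule: prefer Unsloth when it is installed.
        framework = "unsloth+trl" if hw.has_unsloth else "transformers+peft"
        return Plan(framework, precision, load_in_4bit=True)
    if hw.kind == "apple":
        # Assumed: MLX path without 4-bit bitsandbytes-style loading.
        return Plan("mlx", "fp16", load_in_4bit=False)
    # Assumed CPU fallback so the function is total.
    return Plan("transformers+peft", "fp32", load_in_4bit=False)
```

Keeping the decision in a pure function like this makes it easy to unit-test each provider's hardware profile without a GPU attached.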
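Per-request adapter selection in the local multi-adapter server might look roughly like the following. The write-up says the adapter is chosen via the OpenAI `model` field or an `X-Adapter` header but does not say which wins when both are present; this sketch assumes the header takes precedence. The function name `resolve_adapter`, the `BASE_MODEL` sentinel, and the adapter name `pentest` in the usage lines are hypothetical.

```python
from typing import Mapping, Optional

BASE_MODEL = "base"  # hypothetical name meaning "serve without an adapter"


def resolve_adapter(
    body: Mapping[str, object],
    headers: Mapping[str, str],
    known_adapters: set,
) -> Optional[str]:
    """Return the adapter to hot-swap in, or None for the bare base model.

    Assumed precedence: X-Adapter header first, then the request's
    OpenAI-style "model" field.
    """
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {key.lower(): value for key, value in headers.items()}
    candidate = lowered.get("x-adapter") or body.get("model")
    if candidate is None or candidate == BASE_MODEL:
        return None
    if candidate not in known_adapters:
        raise KeyError(f"unknown adapter: {candidate!r}")
    return str(candidate)


# Usage: an unmodified OpenAI client selects an adapter via "model" alone.
known = {"buddhism", "pentest"}
chosen = resolve_adapter({"model": "buddhism"}, {}, known)
```

Accepting the `model` field means existing OpenAI clients work unchanged, while the header gives proxies a way to override the choice without rewriting the request body.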
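The eval-gated deployment rule (promote only when the latest verdict is `PASSED`) is simple enough to sketch against SQLite directly. The table layout, the function names, and the `PromotionRefused` exception below are assumptions for illustration; the only detail taken from the write-up is that the most recent verdict, not any earlier one, decides.

```python
import sqlite3
from typing import Optional

# Hypothetical schema: one row per eval verdict, newest row has the highest id.
SCHEMA = """
CREATE TABLE evals (
    id      INTEGER PRIMARY KEY AUTOINCREMENT,
    adapter TEXT NOT NULL,
    verdict TEXT NOT NULL
);
"""


class PromotionRefused(Exception):
    """Raised when an adapter's latest verdict is not PASSED."""


def record_verdict(conn: sqlite3.Connection, adapter: str, verdict: str) -> None:
    conn.execute(
        "INSERT INTO evals (adapter, verdict) VALUES (?, ?)", (adapter, verdict)
    )


def latest_verdict(conn: sqlite3.Connection, adapter: str) -> Optional[str]:
    row = conn.execute(
        "SELECT verdict FROM evals WHERE adapter = ? ORDER BY id DESC LIMIT 1",
        (adapter,),
    ).fetchone()
    return row[0] if row else None


def promote(conn: sqlite3.Connection, adapter: str) -> str:
    """Refuse promotion unless the most recent eval verdict is PASSED."""
    verdict = latest_verdict(conn, adapter)
    if verdict != "PASSED":
        raise PromotionRefused(f"{adapter}: latest verdict is {verdict!r}")
    return f"{adapter} promoted"


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
record_verdict(conn, "buddhism", "FAILED")
record_verdict(conn, "buddhism", "PASSED")  # a later pass supersedes the failure
result = promote(conn, "buddhism")
```

Note that an adapter with no eval rows at all is also refused, since a missing verdict is not `PASSED`; whether Foundry treats unevaluated adapters that way is not stated in the write-up.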
Where it fits
Foundry is domain-agnostic at the core (its example adapter is literally "buddhism"), but it was built to train pentest-specialist adapters on the PDCA traces that ChakravyuhRift produces. It is the training half of that self-improvement loop.