Foundry — LoRA Fine-Tuning & Adapter-Lifecycle Platform

A private research build — not publicly hosted. This write-up focuses on the architecture and the engineering.

The problem

Fine-tuning small LLM adapters is easy to start and painful to operate. The same LoRA job has to run reproducibly on whatever GPU is available (a Modal A10G today, a Kaggle T4 tomorrow, an Apple-Silicon laptop offline). Every run's hyperparameters, loss, cost, and eval results have to be recorded. Bad adapters must be stopped before they ship. And serving dozens of adapters should not mean paying for dozens of GPUs. Foundry is the control plane for all of that.

The approach

A self-built MLOps platform — "adapter lifecycle tracker" — with two surfaces (a Click CLI and a FastAPI service + web UI) over shared data, training, and inference layers, glued together by a pluggable compute-backend abstraction. One command takes a curated corpus to a tracked, evaluated, deployable adapter.

Architecture

Compute backends — a single ComputeBackend protocol (trigger / status / logs / download / cancel) implemented for Modal (serverless A10G GPUs + Volumes), Kaggle, Colab, and local Apple-Silicon MLX, with a polling RunManager that reconciles runs across all of them.
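As a rough illustration of the backend abstraction described above, the sketch below shows one way a five-method protocol and a polling reconciler could fit together. Only the names ComputeBackend and RunManager and the five operations come from the write-up; the method signatures, the status strings, the FakeBackend stand-in, and the "backend:run_id" key format are assumptions made for the example, not Foundry's actual code.

```python
from typing import Dict, List, Protocol

# Assumed status vocabulary; the write-up does not list the real states.
TERMINAL = {"succeeded", "failed", "cancelled"}


class ComputeBackend(Protocol):
    """The five operations named in the write-up; signatures are assumed."""

    def trigger(self, config: dict) -> str: ...
    def status(self, run_id: str) -> str: ...
    def logs(self, run_id: str) -> str: ...
    def download(self, run_id: str, dest_dir: str) -> str: ...
    def cancel(self, run_id: str) -> None: ...


class FakeBackend:
    """Hypothetical in-memory backend: each status() call advances one step."""

    def __init__(self, steps=("running", "succeeded")):
        self._steps = steps
        self._pos: Dict[str, int] = {}
        self._cancelled = set()

    def trigger(self, config: dict) -> str:
        run_id = f"run-{len(self._pos) + 1}"
        self._pos[run_id] = 0
        return run_id

    def status(self, run_id: str) -> str:
        if run_id in self._cancelled:
            return "cancelled"
        pos = self._pos[run_id]
        # Advance, but never move past the final step.
        self._pos[run_id] = min(pos + 1, len(self._steps) - 1)
        return self._steps[pos]

    def logs(self, run_id: str) -> str:
        return f"{run_id}: no logs in the fake backend"

    def download(self, run_id: str, dest_dir: str) -> str:
        return f"{dest_dir}/{run_id}/adapter"

    def cancel(self, run_id: str) -> None:
        self._cancelled.add(run_id)


class RunManager:
    """Tracks runs across several backends and refreshes them by polling."""

    def __init__(self, backends: Dict[str, ComputeBackend]):
        self.backends = backends
        self.runs: Dict[str, dict] = {}

    def submit(self, backend_name: str, config: dict) -> str:
        run_id = self.backends[backend_name].trigger(config)
        key = f"{backend_name}:{run_id}"
        self.runs[key] = {"backend": backend_name, "run_id": run_id, "status": "queued"}
        return key

    def reconcile(self) -> List[str]:
        """One polling pass: refresh non-terminal runs, return the keys that changed."""
        changed = []
        for key, rec in self.runs.items():
            if rec["status"] in TERMINAL:
                continue
            new_status = self.backends[rec["backend"]].status(rec["run_id"])
            if new_status != rec["status"]:
                rec["status"] = new_status
                changed.append(key)
        return changed
```

Because the manager only ever talks to the protocol, adding another provider in this sketch means writing one more class with the same five methods; a real deployment would call reconcile() on a timer and persist the status changes.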
Training layer — a provider-agnostic TrainingConfig and a train_lora dispatcher that auto-detects hardware capability and picks framework and precision: Unsloth + TRL SFTTrainer with 4-bit LoRA on CUDA, Hugging Face transformers/PEFT, or MLX locally. It includes a workaround for the bf16/GradScaler bug on Turing T4 GPUs.
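The dispatch decision can be pictured as a small pure function from detected hardware to a training plan. The sketch below is illustrative only: Hardware, Plan, and pick_plan are hypothetical names, the rule that transformers/PEFT is the fallback when Unsloth is unavailable is an assumption, and the MLX precision label is a placeholder. The one hardware fact it relies on is that native bf16 support starts with CUDA compute capability 8.0 (Ampere), so a Turing T4 (7.5) is routed to fp16 here; the write-up does not say how Foundry's own workaround is implemented.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Hardware:
    kind: str                                   # "cuda" or "apple_silicon"
    compute_capability: Optional[Tuple[int, int]] = None
    unsloth_available: bool = False             # hypothetical capability flag


@dataclass(frozen=True)
class Plan:
    framework: str
    precision: str
    load_in_4bit: bool


def pick_plan(hw: Hardware) -> Plan:
    """Choose framework and precision from detected hardware (assumed rules)."""
    if hw.kind == "apple_silicon":
        # Assumption: MLX manages its own dtypes; no 4-bit CUDA quantization here.
        return Plan(framework="mlx", precision="mlx-default", load_in_4bit=False)

    if hw.kind == "cuda":
        if hw.compute_capability is None:
            raise ValueError("CUDA hardware needs a compute capability")
        # bf16 is native from Ampere (8.0) onward; older GPUs such as the
        # T4 (7.5) fall back to fp16 in this sketch.
        precision = "bf16" if hw.compute_capability >= (8, 0) else "fp16"
        # Assumption: prefer Unsloth + TRL when importable, else transformers/PEFT.
        framework = "unsloth_trl" if hw.unsloth_available else "transformers_peft"
        return Plan(framework=framework, precision=precision, load_in_4bit=True)

    raise ValueError(f"unsupported hardware kind: {hw.kind!r}")
```

Keeping the choice in a pure function like this makes it easy to unit-test the T4 path without a GPU, since the detection step and the decision step are separate.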
Inference — two serving modes for the same adapters: a Modal + vLLM OpenAI-compatible server in the cloud, and a local MLX multi-adapter server that loads one base model and hot-swaps LoRA tensors per request (~100 ms), choosing the adapter via the OpenAI model field or an X-Adapter header. ("Deploy thousands of models for the cost of one.")
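The per-request adapter choice can be sketched without any model code. In the example below, resolve_adapter and AdapterSwitcher are hypothetical names; the precedence (the X-Adapter header wins over the model field) and the rule that naming the base model means "no adapter" are assumptions, since the write-up only says that either field may select the adapter. The switcher merely counts swaps where a real server would load LoRA tensors.

```python
from typing import Dict, Optional, Set


def resolve_adapter(
    body: Dict[str, object],
    headers: Dict[str, str],
    adapters: Set[str],
    base_model: str,
) -> Optional[str]:
    """Return the adapter name for a request, or None for the bare base model."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    requested = lowered.get("x-adapter") or body.get("model")
    if not requested or requested == base_model:
        return None
    if requested not in adapters:
        raise KeyError(f"unknown adapter: {requested}")
    return str(requested)


class AdapterSwitcher:
    """Keeps one base model resident and swaps only when the adapter changes."""

    def __init__(self) -> None:
        self.active: Optional[str] = None
        self.swaps = 0

    def activate(self, name: Optional[str]) -> None:
        if name != self.active:
            # A real server would replace the LoRA tensors here.
            self.active = name
            self.swaps += 1
```

Consecutive requests for the same adapter cost nothing extra in this sketch, which is the property that lets one resident base model serve many adapters.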
Tracking & governance — every run persisted in SQLite (hyperparameters, loss history, cost), an LLM-judge eval-policy engine, and eval-gated deployment that refuses to promote an adapter unless its latest verdict is PASSED.
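A minimal sketch of the eval gate, using Python's built-in sqlite3 module, might look like the following. The table and column names, the PromotionRefused exception, and the helper functions are all hypothetical; only the rule itself (promotion is refused unless the adapter's latest verdict is PASSED) comes from the write-up.

```python
import sqlite3

# Hypothetical schema for illustration; Foundry's real tables are not shown
# in the write-up.
SCHEMA = """
CREATE TABLE evals (
    id      INTEGER PRIMARY KEY AUTOINCREMENT,
    adapter TEXT NOT NULL,
    verdict TEXT NOT NULL
);
CREATE TABLE deployments (
    adapter TEXT PRIMARY KEY,
    eval_id INTEGER NOT NULL
);
"""


class PromotionRefused(Exception):
    """Raised when an adapter's latest eval verdict is not PASSED."""


def record_verdict(conn: sqlite3.Connection, adapter: str, verdict: str) -> int:
    cur = conn.execute(
        "INSERT INTO evals (adapter, verdict) VALUES (?, ?)", (adapter, verdict)
    )
    conn.commit()
    return cur.lastrowid


def promote(conn: sqlite3.Connection, adapter: str) -> int:
    """Deploy the adapter only if its most recent verdict is PASSED."""
    row = conn.execute(
        "SELECT id, verdict FROM evals WHERE adapter = ? ORDER BY id DESC LIMIT 1",
        (adapter,),
    ).fetchone()
    if row is None:
        raise PromotionRefused(f"{adapter}: no eval recorded")
    eval_id, verdict = row
    if verdict != "PASSED":
        raise PromotionRefused(f"{adapter}: latest verdict is {verdict}")
    conn.execute(
        "INSERT OR REPLACE INTO deployments (adapter, eval_id) VALUES (?, ?)",
        (adapter, eval_id),
    )
    conn.commit()
    return eval_id
```

Checking only the latest verdict means an earlier PASSED cannot carry a later regression into deployment: a newer FAILED row blocks promotion until a fresh PASSED is recorded.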
Data layer — corpus discovery + a human-in-the-loop curation workflow (per-record keep / drop / flag → filtered corpus), a scrape→parse→merge→quality-scan data pipeline with versioning, and idempotent migrations that reconstruct historical experiment rows.
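The curation step can be illustrated as a filter over per-record decisions plus a content-derived version id. Everything named here (apply_curation, corpus_version, the "id" field, the 12-character hash prefix) is hypothetical, and two behaviours are assumptions rather than facts from the write-up: flagged records are held back for review instead of entering the filtered corpus, and records with no decision yet are treated as flagged.

```python
import hashlib
import json
from typing import Dict, List, Tuple

Record = Dict[str, object]


def apply_curation(
    records: List[Record], decisions: Dict[str, str]
) -> Tuple[List[Record], List[Record]]:
    """Split records into (kept, flagged) according to keep / drop / flag."""
    kept: List[Record] = []
    flagged: List[Record] = []
    for rec in records:
        # Assumption: an undecided record is held for review, not dropped.
        decision = decisions.get(str(rec["id"]), "flag")
        if decision == "keep":
            kept.append(rec)
        elif decision == "flag":
            flagged.append(rec)
        elif decision != "drop":
            raise ValueError(f"unknown decision {decision!r} for record {rec['id']}")
    return kept, flagged


def corpus_version(records: List[Record]) -> str:
    """Derive a short, deterministic version id from the corpus content."""
    payload = json.dumps(records, sort_keys=True, ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]
```

Hashing the filtered records gives every curated corpus a stable identifier, so a tracked run can record exactly which version of the data it was trained on.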

Where it fits

Foundry is domain-agnostic at the core (its example adapter is literally "buddhism") but was built to train the pentest-specialist adapters that ChakravyuhRift's PDCA traces produce — the training half of that self-improvement loop.
