{"componentChunkName":"component---src-templates-project-template-js","path":"/projects/foundry-lora-fine-tuning-adapter-lifecycle-platform","result":{"data":{"project":{"title":"Foundry — LoRA Fine-Tuning & Adapter-Lifecycle Platform","slug":"foundry-lora-fine-tuning-adapter-lifecycle-platform","description":"A private research build. A self-built MLOps platform that trains, tracks, evaluates, and serves LoRA adapters for small LLMs across heterogeneous compute: Modal serverless GPUs, Kaggle, and local Apple-Silicon MLX. Exposes a Click CLI and a FastAPI service over a pluggable compute-backend abstraction, with run tracking in SQLite, eval-gated deployment, and OpenAI-compatible inference. Built to train ChakravyuhRift's pentest-specialist adapters, but architecturally domain-agnostic.","caseStudy":"> A private research build — not publicly hosted. This write-up focuses on the architecture and the engineering.\n\n## The problem\n\nFine-tuning LoRA adapters for small LLMs is easy to start and painful to operate: the same LoRA job has to run reproducibly across whatever GPU you can get (a Modal A10G today, a Kaggle T4 tomorrow, an Apple-Silicon laptop offline), every run's hyperparameters, loss history, cost, and eval results need to be recorded, bad adapters must be stopped before they ship, and you want to serve dozens of them without paying for dozens of GPUs. Foundry is the control plane for all of that.\n\n## The approach\n\nA self-built MLOps platform — \"adapter lifecycle tracker\" — with two surfaces (a Click **CLI** and a **FastAPI** service + web UI) over shared data, training, and inference layers, glued together by a **pluggable compute-backend abstraction**. 
One command takes a curated corpus to a tracked, evaluated, deployable adapter.\n\n## Architecture\n\n- **Compute backends** — a single `ComputeBackend` protocol (trigger / status / logs / download / cancel) implemented for **Modal** (serverless A10G GPUs + Volumes), **Kaggle**, and **local Apple-Silicon MLX** (a Colab backend is scaffolded but stubbed), with a polling `RunManager` that reconciles runs across them.\n- **Training layer** — a provider-agnostic `TrainingConfig` and a `train_lora` dispatcher that auto-detects hardware capability and picks framework + precision: **Unsloth + TRL `SFTTrainer`** with 4-bit QLoRA on CUDA, Hugging Face Transformers/PEFT, or MLX locally, plus a workaround for the Turing-T4 bf16/GradScaler bug.\n- **Inference** — two paths: a Modal + **vLLM** OpenAI-compatible server for **single-model** cloud inference, and a local **MLX multi-adapter server** that loads one base model and **hot-swaps LoRA tensors per request (~100ms, one active adapter at a time)**, choosing the adapter via the OpenAI `model` field or an `X-Adapter` header, so many adapters are served for roughly the memory footprint of one base model.\n- **Tracking & governance** — every run persisted in SQLite (hyperparameters, loss history, cost), an **LLM-judge eval-policy engine**, and **eval-gated deployment** that refuses to promote an adapter unless its latest verdict is `PASSED`.\n- **Data layer** — corpus discovery + a human-in-the-loop curation workflow (per-record keep / drop / flag → filtered corpus), a scrape→parse→merge→quality-scan data pipeline with versioning, and idempotent migrations that reconstruct historical experiment rows.\n\n## Where it fits\n\nFoundry is domain-agnostic at the core (its example adapter is \"buddhism\"), but it was built to train pentest-specialist adapters on the PDCA traces that [ChakravyuhRift](/projects/chakravyuhrift-autonomous-offensive-security-agent-platform/) produces, making it the training half of that self-improvement 
loop.\n","gallery":[],"date_start":"2025","date_end":null,"hours":null,"client":null,"tags":["ai","mlops","llm","fine-tuning","research"],"outcomes":["Provider-agnostic LoRA training layer that auto-detects hardware capability and dispatches to the right framework/precision (Unsloth+TRL on CUDA, HF transformers/PEFT, or Apple MLX), including a Turing-T4 bf16/GradScaler workaround","ComputeBackend protocol with working Modal, Kaggle, and local-MLX implementations (Colab backend stubbed) plus a polling RunManager — one CLI command uploads data to a Modal Volume, spawns an A10G Unsloth+TRL SFT job, streams loss events, and downloads adapter weights","Multi-adapter inference via a local MLX server that hot-swaps LoRA tensors over one shared base model (~100ms, one active adapter at a time) selected via the OpenAI model field / X-Adapter header, plus a Modal+vLLM OpenAI-compatible server for single-model cloud inference","Experiment-tracking + governance: SQLite-persisted runs (hyperparameters, loss history, cost), an LLM-judge eval-policy engine, and eval-gated deployment that blocks promotion unless the latest verdict is PASSED","Human-in-the-loop data-curation workflow (per-record keep/drop/flag → filtered corpus) across CLI and FastAPI, with idempotent evidence-backfill migrations reconstructing historical experiment rows"],"tech_stack":["Python","FastAPI","LoRA / SFT Fine-Tuning","Unsloth / TRL","Modal","vLLM","Apple MLX","SQLite","Click CLI"],"links":[],"image":{"childImageSharp":{"fluid":{"tracedSVG":"data:image/svg+xml,%3csvg%20xmlns='http://www.w3.org/2000/svg'%20width='400'%20height='229'%20viewBox='0%200%20400%20229'%20preserveAspectRatio='none'%3e%3cpath%20d='M158%20142h-15c-11%200-14%200-14%202%200%201%2037%202%2038%200s-7-4-9-2'%20fill='%23d3d3d3'%20fill-rule='evenodd'/%3e%3c/svg%3e","aspectRatio":1.7543859649122806,"src":"/static/3c098a2752e6375f215e068ef759a3ec/ee604/foundry.png","srcSet":"/static/3c098a2752e6375f215e068ef759a3ec/69585/foundry.png 
200w,\n/static/3c098a2752e6375f215e068ef759a3ec/497c6/foundry.png 400w,\n/static/3c098a2752e6375f215e068ef759a3ec/ee604/foundry.png 800w","srcWebp":"/static/3c098a2752e6375f215e068ef759a3ec/58556/foundry.webp","srcSetWebp":"/static/3c098a2752e6375f215e068ef759a3ec/61e93/foundry.webp 200w,\n/static/3c098a2752e6375f215e068ef759a3ec/1f5c5/foundry.webp 400w,\n/static/3c098a2752e6375f215e068ef759a3ec/58556/foundry.webp 800w","sizes":"(max-width: 800px) 100vw, 800px"}}},"stack_icons":[{"name":"Python","icon":{"childImageSharp":{"fixed":{"tracedSVG":"data:image/svg+xml,%3csvg%20xmlns='http://www.w3.org/2000/svg'%20width='24'%20height='24'%20viewBox='0%200%2024%2024'%20preserveAspectRatio='none'%3e%3cpath%20d='M1%201v22c2%202%2020%201%2022-1s3-22%201-21H1'%20fill='%23d3d3d3'%20fill-rule='evenodd'/%3e%3c/svg%3e","width":24,"height":24,"src":"/static/64d0f7b1b208f14bd8dd5134b3ed7ff5/6d1ba/python.png","srcSet":"/static/64d0f7b1b208f14bd8dd5134b3ed7ff5/6d1ba/python.png 1x,\n/static/64d0f7b1b208f14bd8dd5134b3ed7ff5/a9c35/python.png 1.5x,\n/static/64d0f7b1b208f14bd8dd5134b3ed7ff5/559c9/python.png 2x","srcWebp":"/static/64d0f7b1b208f14bd8dd5134b3ed7ff5/f8bad/python.webp","srcSetWebp":"/static/64d0f7b1b208f14bd8dd5134b3ed7ff5/f8bad/python.webp 1x,\n/static/64d0f7b1b208f14bd8dd5134b3ed7ff5/f81b6/python.webp 1.5x,\n/static/64d0f7b1b208f14bd8dd5134b3ed7ff5/804d1/python.webp 2x"}}}},{"name":"LLM","icon":{"childImageSharp":{"fixed":{"tracedSVG":"data:image/svg+xml,%3csvg%20xmlns='http://www.w3.org/2000/svg'%20width='24'%20height='24'%20viewBox='0%200%2024%2024'%20preserveAspectRatio='none'%3e%3cpath%20d='M1%201v22c2%202%2020%201%2022-1s3-22%201-21H1'%20fill='%23d3d3d3'%20fill-rule='evenodd'/%3e%3c/svg%3e","width":24,"height":24,"src":"/static/88083f797bbb622a09f48a92d99d6231/6d1ba/llm.png","srcSet":"/static/88083f797bbb622a09f48a92d99d6231/6d1ba/llm.png 1x,\n/static/88083f797bbb622a09f48a92d99d6231/a9c35/llm.png 1.5x,\n/static/88083f797bbb622a09f48a92d99d6231/559c9/llm.png 
2x","srcWebp":"/static/88083f797bbb622a09f48a92d99d6231/f8bad/llm.webp","srcSetWebp":"/static/88083f797bbb622a09f48a92d99d6231/f8bad/llm.webp 1x,\n/static/88083f797bbb622a09f48a92d99d6231/f81b6/llm.webp 1.5x,\n/static/88083f797bbb622a09f48a92d99d6231/804d1/llm.webp 2x"}}}},{"name":"Embeddings","icon":{"childImageSharp":{"fixed":{"tracedSVG":"data:image/svg+xml,%3csvg%20xmlns='http://www.w3.org/2000/svg'%20width='24'%20height='24'%20viewBox='0%200%2024%2024'%20preserveAspectRatio='none'%3e%3cpath%20d='M1%201v22c2%202%2020%201%2022-1s3-22%201-21H1'%20fill='%23d3d3d3'%20fill-rule='evenodd'/%3e%3c/svg%3e","width":24,"height":24,"src":"/static/b0940db6930f1b27d8451c404e8a5e5c/6d1ba/embeddings.png","srcSet":"/static/b0940db6930f1b27d8451c404e8a5e5c/6d1ba/embeddings.png 1x,\n/static/b0940db6930f1b27d8451c404e8a5e5c/a9c35/embeddings.png 1.5x,\n/static/b0940db6930f1b27d8451c404e8a5e5c/559c9/embeddings.png 2x","srcWebp":"/static/b0940db6930f1b27d8451c404e8a5e5c/f8bad/embeddings.webp","srcSetWebp":"/static/b0940db6930f1b27d8451c404e8a5e5c/f8bad/embeddings.webp 1x,\n/static/b0940db6930f1b27d8451c404e8a5e5c/f81b6/embeddings.webp 1.5x,\n/static/b0940db6930f1b27d8451c404e8a5e5c/804d1/embeddings.webp 2x"}}}},{"name":"API","icon":{"childImageSharp":{"fixed":{"tracedSVG":"data:image/svg+xml,%3csvg%20xmlns='http://www.w3.org/2000/svg'%20width='24'%20height='24'%20viewBox='0%200%2024%2024'%20preserveAspectRatio='none'%3e%3cpath%20d='M1%201v22c2%202%2020%201%2022-1s3-22%201-21H1'%20fill='%23d3d3d3'%20fill-rule='evenodd'/%3e%3c/svg%3e","width":24,"height":24,"src":"/static/fb76c0da90f90c016ad01e4dc810443f/6d1ba/api.png","srcSet":"/static/fb76c0da90f90c016ad01e4dc810443f/6d1ba/api.png 1x,\n/static/fb76c0da90f90c016ad01e4dc810443f/a9c35/api.png 1.5x,\n/static/fb76c0da90f90c016ad01e4dc810443f/559c9/api.png 2x","srcWebp":"/static/fb76c0da90f90c016ad01e4dc810443f/f8bad/api.webp","srcSetWebp":"/static/fb76c0da90f90c016ad01e4dc810443f/f8bad/api.webp 
1x,\n/static/fb76c0da90f90c016ad01e4dc810443f/f81b6/api.webp 1.5x,\n/static/fb76c0da90f90c016ad01e4dc810443f/804d1/api.webp 2x"}}}}]}},"pageContext":{"slug":"foundry-lora-fine-tuning-adapter-lifecycle-platform"}},"staticQueryHashes":["3724428426"]}