AQA — System of Record for Agentic Verification

The problem

When coding agents write and run tests, the verification itself becomes throwaway. A failure gets diagnosed and fixed, then the reasoning evaporates, so the next time a similar failure appears an agent re-derives the root cause from scratch. Green CI is not proof of reliable software either: agents tend to mark their own homework, and a flaky or cascading failure can hide a real regression behind a false green. AQA is built to make verification institutional: a durable, attributable, queryable record of what was tested, what failed, and why.

The approach

AQA is a TestLink-style test-management system designed for the agentic coding loop, reachable by agents over MCP (not just humans in a dashboard). It sits one layer above the test runner and tracks plans, cases, dependencies, run history, and requirements-to-test traceability. Its distinguishing feature is verification memory: a failed run's root-cause reasoning is stored and semantically recalled, so a recurrence comes back with its earlier diagnosis instead of being re-investigated.
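
As a rough illustration of the recall step, semantic recall is commonly done as a nearest-neighbour search over embedding vectors. The sketch below assumes that approach. The FailureRecord type, the recall function, the three-dimensional toy vectors, and the 0.8 threshold are all hypothetical stand-ins, not AQA's actual schema or API.

```python
import math
from dataclasses import dataclass


@dataclass
class FailureRecord:
    """Hypothetical stored diagnosis for one failed run."""
    case_id: str
    root_cause: str
    embedding: list[float]  # toy vector; a real system would store model embeddings


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recall(query: list[float], memory: list[FailureRecord],
           threshold: float = 0.8) -> list[tuple[float, FailureRecord]]:
    """Return stored diagnoses at or above the threshold, most similar first."""
    scored = [(cosine_similarity(query, rec.embedding), rec) for rec in memory]
    hits = [pair for pair in scored if pair[0] >= threshold]
    return sorted(hits, key=lambda pair: pair[0], reverse=True)


memory = [
    FailureRecord("login-01", "session cookie expired mid-test", [0.9, 0.1, 0.0]),
    FailureRecord("upload-07", "temp dir not writable in CI", [0.0, 0.2, 0.9]),
]

# A new failure whose (toy) embedding sits close to the login diagnosis.
hits = recall([0.8, 0.2, 0.0], memory)
for score, rec in hits:
    print(f"{rec.case_id}: {rec.root_cause} (similarity {score:.2f})")
```

Only the login diagnosis clears the threshold here, so the recurrence comes back with its earlier root cause attached rather than as a blank failure.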

Architecture

One transport-agnostic service layer is exposed three ways — a REST API (FastAPI), an MCP server (FastMCP, 43 tools), and a CLI (Typer) — so humans and agents operate on the same state. Data lives in PostgreSQL with pgvector; failure root-cause prose is embedded locally with all-MiniLM-L6-v2 for semantic similarity search, and artifacts (traces, logs, screenshots) go to S3-compatible blob storage.
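
The "one service layer, three transports" shape can be sketched in plain Python. This is an assumption-laden illustration, not AQA's code: RunService and the three adapter functions are hypothetical, and the adapters are ordinary functions standing in for a FastAPI route, a FastMCP tool, and a Typer command rather than using those libraries.

```python
import json
from dataclasses import dataclass, field


@dataclass
class RunService:
    """Hypothetical service layer: owns state and rules, knows nothing about transports."""
    runs: list[dict] = field(default_factory=list)

    def record_run(self, case_id: str, passed: bool) -> dict:
        run = {"run_id": len(self.runs) + 1, "case_id": case_id, "passed": passed}
        self.runs.append(run)
        return run


# Thin adapters: each one only translates its transport's input and output.
def rest_record_run(service: RunService, body: str) -> str:
    """Stand-in for a REST handler: JSON request body in, JSON response out."""
    payload = json.loads(body)
    return json.dumps(service.record_run(payload["case_id"], payload["passed"]))


def tool_record_run(service: RunService, case_id: str, passed: bool) -> dict:
    """Stand-in for an MCP tool: typed arguments in, structured result out."""
    return service.record_run(case_id, passed)


def cli_record_run(service: RunService, argv: list[str]) -> str:
    """Stand-in for a CLI command: argv strings in, one line of text out."""
    case_id, outcome = argv
    run = service.record_run(case_id, outcome == "pass")
    status = "passed" if run["passed"] else "failed"
    return f"run {run['run_id']}: {run['case_id']} {status}"


service = RunService()
rest_record_run(service, '{"case_id": "login-01", "passed": true}')
tool_record_run(service, "upload-07", False)
print(cli_record_run(service, ["login-01", "pass"]))  # run 3: login-01 passed
print(len(service.runs))  # 3: all three transports wrote to the same state
```

Keeping the adapters this thin is what lets a human at the CLI and an agent over MCP see identical run history, since no rule lives in a transport.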

Four capabilities define it:

Regression memory — search_similar_failures / get_known_regressions recall prior diagnoses and cached fix-paths, and log the re-investigations they avoid.
Blind-spot radar — requirements → coverage links surface tests that don't exist yet.
Doer ≠ checker — a claim/verify protocol with identity enforcement, so work isn't self-certified.
Dependency gating — a dependency-aware run manifest and cascade-blocking that kills false-green CI when an upstream case fails.
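
Two of these capabilities reduce to small checks that can be sketched directly. The code below is a minimal illustration under stated assumptions: the manifest format (case mapped to the cases it depends on), the function names blocked_by_cascade and verify, and the agent identifiers are all hypothetical, not AQA's actual interface.

```python
from collections import deque


def blocked_by_cascade(depends_on: dict[str, list[str]], failed: set[str]) -> set[str]:
    """Return every case that transitively depends on a failed case."""
    # Invert the edges: upstream case -> cases that depend on it directly.
    dependents: dict[str, list[str]] = {}
    for case, upstreams in depends_on.items():
        for upstream in upstreams:
            dependents.setdefault(upstream, []).append(case)

    blocked: set[str] = set()
    queue = deque(failed)
    while queue:
        current = queue.popleft()
        for case in dependents.get(current, []):
            if case not in blocked and case not in failed:
                blocked.add(case)
                queue.append(case)
    return blocked


def verify(claimed_by: str, verifier: str) -> None:
    """Reject self-certification: the verifier must differ from the claimant."""
    if verifier == claimed_by:
        raise PermissionError(f"{verifier} cannot verify their own claim")


# Hypothetical manifest: checkout needs login, receipt needs checkout.
manifest = {"login": [], "checkout": ["login"], "receipt": ["checkout"], "search": []}
blocked = blocked_by_cascade(manifest, failed={"login"})
print(sorted(blocked))  # ['checkout', 'receipt']

verify(claimed_by="agent-a", verifier="agent-b")  # different identities: allowed
try:
    verify(claimed_by="agent-a", verifier="agent-a")
except PermissionError as err:
    print(err)  # agent-a cannot verify their own claim
```

Marking downstream cases as blocked rather than passed or skipped is the point of the gate: a run with a failed upstream case cannot report green for work that never meaningfully executed.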

A note on rigor

AQA dogfoods itself: a Stop hook blocks "done" while coverage gaps are open, and CI records its own runs. The result is a higher reliability floor (no silent regressions, no invisible blind spots, no self-graded work) that acts as a ratchet, compounding across commits and sessions.
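
The decision behind such a gate can be sketched as a pure function. This shows only the logic, under assumptions: the requirement IDs, the links mapping from requirement to test cases, and the names coverage_gaps and may_finish are hypothetical, and the wiring into an actual Stop hook is deliberately left out.

```python
def coverage_gaps(requirements: list[str], links: dict[str, list[str]]) -> list[str]:
    """Requirements with no linked test case, in their original order."""
    return [req for req in requirements if not links.get(req)]


def may_finish(requirements: list[str], links: dict[str, list[str]]) -> tuple[bool, str]:
    """Gate decision: allow 'done' only when every requirement has coverage."""
    gaps = coverage_gaps(requirements, links)
    if gaps:
        return False, "open coverage gaps: " + ", ".join(gaps)
    return True, "no open coverage gaps"


# Hypothetical requirement IDs and requirement -> test-case links.
requirements = ["REQ-1", "REQ-2", "REQ-3"]
links = {"REQ-1": ["login-01"], "REQ-2": []}

allowed, reason = may_finish(requirements, links)
print(allowed, reason)  # False open coverage gaps: REQ-2, REQ-3

# Once every requirement has at least one linked case, the gate opens.
links["REQ-2"] = ["upload-07"]
links["REQ-3"] = ["search-02"]
print(may_finish(requirements, links))  # (True, 'no open coverage gaps')
```

A requirement with an empty link list counts as a gap just like one that is missing entirely, which is what keeps a placeholder entry from passing as coverage.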

