{"componentChunkName":"component---src-templates-blog-template-js","path":"/blog/i-tried-to-make-my-ai-lie","result":{"data":{"blog":{"content":"# I built an AI that scores my fit for a job — then I tried to make it lie\n\nThere's an MCP server behind this site. Point Claude or ChatGPT at `mcp.nishanttiwari.com/mcp` and it can query my real skills and projects, and run a tool — `match_role` — that scores how well I fit a job description. I built it to be useful to a recruiter's AI. So the question that matters isn't \"does it work?\" It's \"does it lie?\"\n\nHere's how I found out it did.\n\n## The test that mattered\n\nEvery test I'd run fed it roles I'm a good fit for — cloud architecture, AI platforms, DevOps. All came back strong, all true. None tested the case that actually matters: a role I *can't* do.\n\nSo I passed it a Senior Rust Engineer job whose required skills were Rust, Tokio, Go, Elixir, and Systems Programming. I've never shipped Rust. The honest answer is \"no.\"\n\nIt returned: **overall_score 71.52, verdict good_match, hire_signal \"likely_yes\", zero skill gaps.** \"Likely yes\" for a Rust role, to a recruiter, about a man with no Rust — through the *exact* interface a real recruiter's AI would use. That's the worst thing an AI-facing profile can do.\n\n## What was actually happening\n\n`match_role` had a semantic layer: a small local embedding model (MiniLM, 22M parameters) that, for any skill it couldn't match exactly, found the \"nearest\" skill I *do* have and counted it as covered. Rust mapped to Linux (cosine similarity around 0.30), Tokio to Boto3, Go to Google Cloud, Elixir to Lambda. It then **overwrote the job's requirements with my own skills** and checked them all off. 
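Here is a minimal Python sketch contrasting the nearest-neighbor fallback above with the exact-name-plus-alias matching described under The actual fix. It illustrates the idea under stated assumptions and is not the real `match_role` code. The 3-d vectors are made-up stand-ins for MiniLM embeddings. `semantic_match`, `verified_match`, `EMBED`, and `ALIASES` are hypothetical names.

```python
import math

# Hypothetical 3-d vectors standing in for real embeddings (e.g. from MiniLM).
# The values are invented; the failure shown below does not depend on them.
EMBED = {
    'Rust': (0.9, 0.2, 0.1),
    'Tokio': (0.8, 0.3, 0.2),
    'Go': (0.7, 0.1, 0.5),
    'Elixir': (0.6, 0.4, 0.3),
    'Linux': (0.5, 0.8, 0.1),
    'Boto3': (0.3, 0.7, 0.6),
    'Google Cloud': (0.6, 0.2, 0.8),
    'Lambda': (0.4, 0.5, 0.7),
}

# Hypothetical alias table: a profile skill on the left also verifies the vendor on the right.
ALIASES = {'s3': 'aws', 'lambda': 'aws', 'boto3': 'aws'}


def cosine(a, b):
    # Cosine similarity between two equal-length, non-zero vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def semantic_match(required, profile):
    # Flawed pattern: a requirement without an exact hit is mapped to the
    # nearest profile skill and counted as covered. gaps is never appended to,
    # because max() always returns some profile skill.
    covered, gaps = {}, []
    for req in required:
        if req in profile:
            covered[req] = req
        else:
            covered[req] = max(profile, key=lambda s: cosine(EMBED[req], EMBED[s]))
    return covered, gaps


def verified_match(required, profile):
    # Honest pattern: case-insensitive exact names plus vendor aliases.
    # Anything that cannot be verified is reported as a gap.
    have = {s.lower() for s in profile}
    have |= {ALIASES[s] for s in have if s in ALIASES}
    covered = [r for r in required if r.lower() in have]
    gaps = [r for r in required if r.lower() not in have]
    note = 'This is evidence, not a verdict: the calling model makes the fit call.'
    return covered, gaps, note


required = ['Rust', 'Tokio', 'Go', 'Elixir']
profile = ['Linux', 'Boto3', 'Google Cloud', 'Lambda']
print(semantic_match(required, profile))  # gaps list is [] whatever the vectors
print(verified_match(required, profile))  # all four requirements come back as gaps
```

Whatever numbers go into `EMBED`, `semantic_match` cannot report a gap, because `max()` always returns some profile skill to stand in for the requirement. `verified_match` returns all four requirements as gaps, plus a note that leaves the fit call to the calling model.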
Of course there were no gaps — it had quietly swapped the question for one it could answer.\n\n## \"Just use a better model\" doesn't work\n\nI measured the cosine-similarity gap between false pairs (Rust/Linux) and true paraphrases (Software Architecture/Solution Architecture) on three models:\n\n- **MiniLM:** false and true pairs *overlap* — no clean line.\n- **all-mpnet-base-v2:** separates them by a razor-thin margin of 0.047, too narrow to trust.\n- **bge-base:** worse — it inflates everything (Rust/Linux jumps to 0.66).\n\nShort skill strings live in too tight a band for any of the models I tried to cleanly tell \"same skill, different words\" from \"different skill, same field.\" A bigger model just re-tunes the same coin flip.\n\n## The actual fix: know who's smart in the room\n\nHere's the thing I'd missed: the model *calling* my tool is a frontier LLM — already in the loop, already reading the job description. It knows perfectly well that Rust is a systems language and Linux is an OS. I was using a 22-million-parameter model to make a judgment the model on the other end makes effortlessly.\n\nSo I deleted the embedding layer. Now the server does one honest job: deterministic, verifiable matching (exact skill names plus vendor aliases, such as S3 mapping to AWS). A requirement it can't verify is reported as an **honest gap**, with an explicit note: *this is evidence, not a verdict — you make the fit call.* The fuzzy part goes to the model that's actually good at it.\n\n## The result\n\nSame Rust job, today: **weak_match, \"unlikely,\" all five requirements flagged as gaps.** And a real LLM client, run end-to-end, read the evidence and concluded on its own: *\"not a good fit — zero verified Rust; his strengths are cloud, Python, and AI work, not systems languages.\"* Correct. The server stopped pretending to be smart, and the answer got smarter.\n\n## The lesson I keep relearning\n\nThe same principle bit me twice here. Don't make the server parse the job description — the calling LLM is the better reader. 
Don't make the server judge skill-equivalence — the calling LLM is the better judge. The server's job is to be the source of truth, stated plainly; judgment belongs to whoever in the pipeline is best at it, and increasingly that's the model already in the loop, not the bespoke code.\n\nI shipped this to my own résumé tool, which now happily tells you when I'm wrong for a role. That makes it more trustworthy, not less. The bug was never the embarrassing part — shipping a confident lie and never checking would have been."}},"pageContext":{"slug":"i-tried-to-make-my-ai-lie"}},"staticQueryHashes":[]}