Case study · /00
Personal project (open source) · 2026

Tender Response Assistant — Evidence-Bound Drafting Pipeline for Industrial Bid Managers

Full-stack drafting copilot for industrial-automation bid managers: 60–200-page PDF in, verbatim-cited requirements, evidence-bound first-pass answers and a reviewable DOCX out — not a chat-bot.

Development · Architect & sole developer
Next.js 15 App Router · TypeScript strict · Supabase (Postgres + Storage) · OpenRouter gateway · OpenAI SDK · Claude Sonnet 4.6 · Claude 3 Haiku · Zod · pdf-parse · docx · Tailwind (OKLCH tokens) · Vercel Pro
/01

Problem

Industrial-automation and warehouse-logistics suppliers respond to public tenders that arrive as 60–200-page PDFs containing 80–400 individual requirements, scattered across lots, evaluation criteria and required-document lists. Bid managers spend roughly a week per tender turning that into a structured response: locating every requirement, deciding which ones the company can credibly cover from its capability matrix, drafting first-pass answers tied to evidence, and packaging everything into a submission DOCX. The product is built for that role specifically, with Bosch's industrial-automation profile as the concrete demo target.

Procurement is one of the domains where a confident-sounding LLM is a liability, not an asset. A drafted answer that quietly overclaims a capability the company doesn't actually have can disqualify the bid or expose the supplier to contractual risk. So three things had to be true at once: every extracted requirement must carry the exact source quote from the PDF (no paraphrase), every drafted answer must be traceable to a row in the capability matrix or be flagged for human decision, and the long-running fan-out work (200+ Haiku calls per draft pass) must survive Vercel timeouts and recover cleanly. None of these were optional.

/02

Solution

A four-stage pipeline runs each tender through extract → match → draft → risks, each as its own API route with its own Markdown prompt in lib/prompts/. Extract is one Sonnet 4.6 call (~80 s p50) returning metadata, lots, requirements, required documents and evaluation criteria. Match chunks requirements 20 at a time and fans the chunks out to Haiku via Promise.all, classifying each requirement as fully / partial / not_covered / unclear, with a gap_description and a confidence. Draft runs Haiku through a Semaphore(5) at ~1.8 s per requirement, behind a hard evidence guard: it auto-tags [REQUIRES BID MANAGER DECISION] when match_status is fully or partial but evidence_used is empty, and it always blocks not_covered drafts, showing the gap_description as the visible reason. Risks runs last and emits severity + recommended_action. Every LLM call goes through one client that handles placeholder substitution, JSON mode + Zod validation, one corrective retry, a 65 s + retry on 429, and structured logging.
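The two fan-out shapes described above (chunked Promise.all for match, a concurrency cap of 5 for draft) can be sketched as follows. This is a minimal illustration, not the project's code: the Semaphore implementation and the helper names chunk, matchAll and draftAll are assumptions, and matchChunk / draftOne stand in for the real Haiku calls.

```typescript
// Split items into fixed-size chunks (the case study uses 20 requirements per match chunk).
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (size < 1) throw new RangeError("size must be >= 1");
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

// Minimal counting semaphore (hypothetical implementation):
// at most `limit` tasks hold a permit at any moment.
class Semaphore {
  private available: number;
  private readonly waiters: Array<() => void> = [];

  constructor(limit: number) {
    this.available = limit;
  }

  private async acquire(): Promise<void> {
    if (this.available > 0) {
      this.available--;
      return;
    }
    await new Promise<void>((resolve) => this.waiters.push(resolve));
  }

  private release(): void {
    const next = this.waiters.shift();
    if (next) next(); // hand the permit directly to the next waiter
    else this.available++;
  }

  async run<T>(task: () => Promise<T>): Promise<T> {
    await this.acquire();
    try {
      return await task();
    } finally {
      this.release();
    }
  }
}

// Match: every chunk is in flight at once.
async function matchAll<R, M>(
  reqs: readonly R[],
  matchChunk: (c: R[]) => Promise<M[]>, // stand-in for one Haiku call per chunk
): Promise<M[]> {
  const perChunk = await Promise.all(chunk(reqs, 20).map((c) => matchChunk(c)));
  return ([] as M[]).concat(...perChunk);
}

// Draft: one call per requirement, never more than 5 running concurrently.
async function draftAll<R, D>(
  reqs: readonly R[],
  draftOne: (r: R) => Promise<D>, // stand-in for one Haiku call per requirement
): Promise<D[]> {
  const sem = new Semaphore(5);
  return Promise.all(reqs.map((r) => sem.run(() => draftOne(r))));
}
```

Promise.all preserves input order in both helpers, so results line up with the requirements that produced them even though the calls finish out of order.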

/02b

Architecture

Browser → Next.js 15 App Router on Vercel Pro → Supabase Postgres + Storage. LLM traffic is fanned out from the four pipeline routes through a single lib/llm/client.ts to OpenRouter (Claude Sonnet 4.6 for extract, Claude 3 Haiku for match / draft / risks) with Zod validation, JSON-mode and request_logs persistence on every call.
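The validate-then-retry behaviour of the single client can be sketched roughly as below. Everything here is illustrative: the {{name}} placeholder syntax, the function names, and the injected complete / parse / log callbacks are assumptions. In the real project, complete would be the OpenAI SDK call routed through OpenRouter, parse would be a Zod schema's parse, and log would write to request_logs. Timeout and 429 handling are left out of the sketch.

```typescript
type Complete = (prompt: string) => Promise<string>;

// Replace {{name}} placeholders; fail loudly on a missing variable.
function fillPlaceholders(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (_match, key: string) => {
    const value = vars[key];
    if (value === undefined) throw new Error(`Missing prompt variable: ${key}`);
    return value;
  });
}

// One call, JSON parse + schema validation, and exactly one corrective retry.
async function callValidated<T>(opts: {
  template: string;
  vars: Record<string, string>;
  complete: Complete; // hypothetical transport
  parse: (value: unknown) => T; // stands in for a Zod schema's parse
  log?: (entry: Record<string, unknown>) => void; // stands in for request_logs
}): Promise<T> {
  const prompt = fillPlaceholders(opts.template, opts.vars);
  let lastError = "";

  for (let attempt = 0; attempt < 2; attempt++) {
    // On the second attempt, tell the model why its first reply was rejected.
    const input =
      attempt === 0
        ? prompt
        : `${prompt}\n\nYour previous reply was rejected: ${lastError}\nReturn only valid JSON matching the schema.`;
    const raw = await opts.complete(input);
    try {
      const value = opts.parse(JSON.parse(raw));
      opts.log?.({ attempt, ok: true });
      return value;
    } catch (err) {
      lastError = err instanceof Error ? err.message : String(err);
      opts.log?.({ attempt, ok: false, error: lastError });
    }
  }
  throw new Error(`LLM output failed validation after one corrective retry: ${lastError}`);
}
```

Feeding the validation error back into the second attempt is one common way to implement a corrective retry; whether the project phrases the correction this way is not stated in the source.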

/03

Key metrics

~145 s: end-to-end pipeline · ~100-requirement tender
~7 days: manual baseline replaced
4 stages: extract → match → draft → risks
8 tabs: reviewer surface, editorial UI
/06

Key decisions

Evidence guard as contract, not post-processing

The model is asked to mark requires_bid_manager_decision itself and to emit evidence_used; the guard then verifies the flag against the evidence array. Treating this as a contract — and overriding the draft when the two disagree — keeps the safety floor at the type level instead of relying on prompt discipline.
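A guard of this kind could look roughly like the sketch below. The field names match_status, gap_description, evidence_used and requires_bid_manager_decision come from this write-up; the type names, the blocked / reason fields, the choice to clear the answer of a blocked draft, and the pass-through handling of unclear are assumptions made for illustration.

```typescript
type MatchStatus = "fully" | "partial" | "not_covered" | "unclear";

interface Match {
  match_status: MatchStatus;
  gap_description: string;
}

interface Draft {
  answer: string;
  evidence_used: string[];
  requires_bid_manager_decision: boolean; // self-reported by the model
}

// Hypothetical output shape: the draft plus the guard's verdict.
interface GuardedDraft extends Draft {
  blocked: boolean;
  reason?: string;
}

const DECISION_TAG = "[REQUIRES BID MANAGER DECISION]";

function applyEvidenceGuard(match: Match, draft: Draft): GuardedDraft {
  // not_covered drafts are always blocked; the gap is shown as the reason.
  // Assumption: a blocked draft carries no answer text at all.
  if (match.match_status === "not_covered") {
    return {
      answer: "",
      evidence_used: [],
      requires_bid_manager_decision: true,
      blocked: true,
      reason: match.gap_description,
    };
  }

  // A coverage claim with no evidence overrides whatever flag the model set.
  const claimsCoverage = match.match_status === "fully" || match.match_status === "partial";
  if (claimsCoverage && draft.evidence_used.length === 0) {
    const answer = draft.answer.startsWith(DECISION_TAG)
      ? draft.answer
      : `${DECISION_TAG} ${draft.answer}`;
    return { ...draft, answer, requires_bid_manager_decision: true, blocked: false };
  }

  // Assumption: everything else (including "unclear") passes through unchanged.
  return { ...draft, blocked: false };
}
```

Because the guard is a pure function of the match and the draft, it can be unit-tested without any model call, which is what lets it act as a contract rather than a prompt instruction.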

Sonnet 4.6 for extract, Haiku for everything else

Extraction is one expensive call where verbatim-quote fidelity matters most and latency is amortised over the whole pipeline. Match / draft / risks are fan-out workloads where Haiku's lower cost and faster time to first token (TTFT) pay back across hundreds of calls per tender. Splitting the model choice by stage made the cost / latency / quality envelope tractable on Vercel Pro.
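A back-of-envelope latency budget shows how the figures quoted in this case study fit together for a ~100-requirement tender. The numbers are the ones stated here (~80 s extract p50, ~1.8 s per draft, concurrency 5, chunks of 20, ~145 s end to end); the assumption that the semaphore stays fully packed, and the function names, are illustrative only.

```typescript
// Figures quoted in the case study.
const EXTRACT_P50_S = 80;
const DRAFT_S_PER_REQUIREMENT = 1.8;
const DRAFT_CONCURRENCY = 5;
const MATCH_CHUNK_SIZE = 20;
const END_TO_END_S = 145;

// Assumption: all 5 permits are always busy, so drafts finish in ceil(n / 5) waves.
function draftWallClockSeconds(n: number): number {
  return Math.ceil(n / DRAFT_CONCURRENCY) * DRAFT_S_PER_REQUIREMENT;
}

// Number of parallel match calls for n requirements.
function matchChunkCount(n: number): number {
  return Math.ceil(n / MATCH_CHUNK_SIZE);
}

const requirements = 100;
const draftS = draftWallClockSeconds(requirements); // 20 waves x 1.8 s, about 36 s
const sequentialDraftS = requirements * DRAFT_S_PER_REQUIREMENT; // about 180 s without the semaphore
const remainingS = END_TO_END_S - EXTRACT_P50_S - draftS; // about 29 s left for match, risks and I/O

console.log({
  matchChunks: matchChunkCount(requirements), // 5 chunks in one Promise.all
  draftS,
  sequentialDraftS,
  remainingS,
});
```

Under these assumptions a sequential draft pass alone (about 180 s) would exceed the whole ~145 s end-to-end figure, which is why the cheap, fast model plus bounded concurrency matters for the fan-out stages.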

Structural JSON balancer over 'just ask for more tokens'

Chunked match responses occasionally hit token limits mid-array. Bumping max_tokens delays the problem rather than fixing it. A deterministic balancer that rewinds to the last fully closed element keeps every complete entry (N-1 of the N received) on each truncation, makes the failure mode debuggable, and stops a whole chunk being lost to a single missing brace.
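One way to implement such a balancer is a single pass that tracks nesting depth and string state, remembers where the last top-level element closed, and cuts there. The sketch below assumes the response is a top-level JSON array of objects; the function name and return shape are hypothetical, and the project's actual balancer may handle more response shapes.

```typescript
// Salvage the complete elements of a possibly truncated top-level JSON array.
function salvageTruncatedArray(raw: string): { json: string; truncated: boolean } {
  if (raw.trim().charAt(0) !== "[") {
    throw new Error("expected a top-level JSON array"); // assumption of this sketch
  }

  let depth = 0;
  let inString = false;
  let escaped = false;
  let lastElementEnd = -1; // index just past the last element that closed at depth 1

  for (let i = 0; i < raw.length; i++) {
    const ch = raw.charAt(i);
    if (inString) {
      // Brackets inside string literals must not affect the depth count.
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') {
      inString = true;
    } else if (ch === "{" || ch === "[") {
      depth++;
    } else if (ch === "}" || ch === "]") {
      depth--;
      if (depth === 1) lastElementEnd = i + 1; // an element of the outer array just closed
      if (depth === 0) return { json: raw.slice(0, i + 1), truncated: false }; // array is complete
    }
  }

  // Truncated: rewind to the last fully closed element and close the array.
  if (lastElementEnd === -1) return { json: "[]", truncated: true };
  return { json: raw.slice(0, lastElementEnd) + "]", truncated: true };
}

// Example: the third element is cut off mid-string, the first two survive.
const salvaged = salvageTruncatedArray('[{"id":1},{"id":2},{"id":3,"gap":"no ');
console.log(salvaged.truncated, JSON.parse(salvaged.json));
```

Returning the truncated flag alongside the repaired JSON lets the caller log the event and re-queue only the requirements that were lost, rather than the whole chunk.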

RLS off, single passcode gate, for the demo

Single-tenant demo for a portfolio audience: Supabase RLS is disabled and access is gated by a shared passcode, with the service role used server-side. This is recorded explicitly as a gap rather than dressed up; multi-tenant work would re-introduce RLS, per-user JWT claims and an audit trail before anything else.

/07

Reflection

The interesting part of this project was not the LLM plumbing — it was deciding where the model is allowed to make claims at all. Once the evidence guard, the verbatim-quote rule and the [REQUIRES BID MANAGER DECISION] marker were in place, every other decision (which model per stage, how to chunk, when to retry, how to recover from a killed Vercel function) became local. The product is honest about being reviewer-in-the-loop because the architecture forces it to be.