Building a 5-Agent Pipeline in Production: Inside the AI Business Analyst

Personal AI — multi-agent product architecture built by Cinnaboner

Key takeaway: A production-grade AI agent pipeline uses 5 specialised agents (three running in parallel, then two in sequence) with fallback logic and source labelling, not a single-prompt demo, and finishes in 45–90 seconds per request.

Most agent demos you see on LinkedIn are single-turn toys. One model, one prompt, one answer, a screen recording that ends before anything breaks. Production looks different. We shipped a 5-agent pipeline that runs on every AI Business Analyst request, finishes in 45 to 90 seconds, and holds up when one of the agents has a bad day.

This is the actual architecture. What each agent does, why it's separate, and the small engineering disciplines that keep the whole thing stable.

One mega-prompt is the wrong answer

The first version of this tool was a single enormous prompt. Paste a URL, stuff everything into context, ask for a full business audit, render the response. It worked in the demo. It fell apart in production.

Three problems showed up fast. First, the model had to juggle five different frames of thinking at once — business model, competitors, SEO, strategy, prioritization — and it got worse at all of them simultaneously. Second, when any one section went wrong, the whole output was suspect. Third, we couldn't run anything in parallel, so every audit took three minutes and felt broken.

The fix was obvious in retrospect. Split the work. Give each agent one job. Run the independent ones in parallel. Make the prompts smaller, the outputs stricter, and the failure surface narrower.

The five agents and what they actually do

The pipeline has five specialized agents behind one endpoint. Here is what each one owns.

Business Model (BM). Takes the scraped site content and produces a Lean Canvas plus a Unit Economics block. Segments, problem, value prop, channels, cost structure, revenue streams. Structured JSON, every field typed. It runs in parallel with Competitor and Digital Presence.

Competitor (CP). Returns five qualitative competitor cards. We deliberately don't scrape each competitor's site — we don't have permission to, and the data we could get would be shallow. Instead the agent produces five named, plausible competitors with positioning notes and a threat level. This is explicitly labeled as LLM-inferred in the final report, because honesty about provenance is the only way any of this stays useful.

Digital Presence (DP). The biggest single agent. It produces E-E-A-T scores, a GEO readiness view, brand voice observations, and keyword seed ideas. All grounded on the same snapshot the other agents see — the scraped DOM, the tech stack, the PageSpeed numbers.

Checklist. Runs after BM, CP, and DP finish. It takes their combined output and produces 50 ICE-scored action items. Impact, Confidence, Ease, each on a 1–10 scale. The reason this one waits for the parallel stage is simple: it's a prioritization agent, and you can't prioritize against context you don't have yet.
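To make the ICE idea concrete, here is a minimal sketch of scoring and ranking checklist items. The article doesn't say how the three scores are combined, so multiplying them is an assumption, and the names iceScore and rankChecklist are hypothetical rather than taken from the pipeline.

```javascript
// Assumption: the three 1-10 scores are multiplied into a single ICE value.
// The pipeline's real combination rule is not documented in this article.
function iceScore({ impact, confidence, ease }) {
  for (const [name, value] of Object.entries({ impact, confidence, ease })) {
    if (!Number.isFinite(value) || value < 1 || value > 10) {
      throw new RangeError(`${name} must be a number from 1 to 10, got ${value}`);
    }
  }
  return impact * confidence * ease;
}

// Returns a new array sorted highest ICE first; the input is left untouched.
function rankChecklist(items) {
  return items
    .map((item) => ({ ...item, ice: iceScore(item) }))
    .sort((a, b) => b.ice - a.ice);
}
```

Validating the range up front means an out-of-scale score from the model fails loudly instead of quietly skewing the ranking.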

Assembler. The final pass. Takes everything — the three parallel agents, the checklist, the raw snapshot — and produces the human-facing pieces. Executive summary, maturity assessment, SWOT, strategic direction, and the documentation block. This is the one agent whose job is narrative synthesis, not structured extraction.

Total runtime on production hardware: 45–90 seconds. A chunk of that is the parallel phase, which is only parallel because each agent owns a small, self-contained piece.
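The staging described above can be sketched in a few lines. This is an illustration, not the production orchestrator: runPipeline and the keys on the agents object are hypothetical names, and the agent functions are passed in so the control flow can be read without any model calls.

```javascript
// Hypothetical orchestration sketch. Each agent is an injected async function.
async function runPipeline(snapshot, agents) {
  // Stage 1: BM, CP and DP only need the snapshot, so they run concurrently.
  const [bm, cp, dp] = await Promise.all([
    agents.businessModel(snapshot),
    agents.competitor(snapshot),
    agents.digitalPresence(snapshot),
  ]);

  // Stage 2: the Checklist prioritizes against the combined stage 1 output.
  const checklist = await agents.checklist({ snapshot, bm, cp, dp });

  // Stage 3: the Assembler sees everything and writes the narrative layer.
  const narrative = await agents.assembler({ snapshot, bm, cp, dp, checklist });

  return { snapshot, bm, cp, dp, checklist, narrative };
}
```

Note that Promise.all rejects as soon as any one call rejects, so a real pipeline would still need per-agent error handling around stage 1. It is left out here to keep the staging visible.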

Why separate agents, not separate sections of one prompt

There's a reasonable-sounding alternative: one model call, carefully structured, with section headers. We tried it. It doesn't work for three reasons.

Context bleed. When you ask one model to produce a Lean Canvas and a competitor analysis in the same call, the competitor section subtly influences the Lean Canvas. The model tries to be consistent across sections, and consistency masquerades as quality until you read it carefully.

Failure blast radius. If a single call errors out halfway through, you lose everything. With five calls, a bad Competitor response doesn't poison the Business Model output.
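One way to keep a failure contained is to settle every call independently. The sketch below assumes each agent is wrapped in a function returning a promise; runIsolated and the result shape are hypothetical, not the pipeline's actual code.

```javascript
// Runs named agent calls concurrently and keeps each outcome separate, so one
// rejected call cannot discard the results of the others.
async function runIsolated(tasks) {
  const names = Object.keys(tasks);
  // The async wrapper turns a synchronous throw into a rejection as well.
  const settled = await Promise.allSettled(
    names.map(async (name) => tasks[name]())
  );
  const results = {};
  settled.forEach((outcome, i) => {
    results[names[i]] =
      outcome.status === "fulfilled"
        ? { ok: true, value: outcome.value }
        : { ok: false, error: String(outcome.reason) };
  });
  return results;
}
```

With this shape, a failed Competitor call shows up as { ok: false, ... } next to an intact Business Model result, and the caller decides whether to retry, degrade, or label the gap.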

Prompt maintainability. Five agents means five focused prompts of 40–80 lines each. One mega-prompt means one 400-line monster that nobody wants to touch. The mega-prompt rots faster.

Strict JSON and the jsonCall.js insurance policy

Every agent in the pipeline is instructed to return strict JSON. Not "JSON wrapped in prose", not "here's a code block of JSON, let me explain", just a JSON object matching a schema. This is non-negotiable, because the frontend binds directly to the typed Report object.

The reality is models slip. Even with temperature pinned and clear instructions, one in fifty responses arrives with a stray sentence before the opening brace or a trailing comment after the closing one. Catching those slips is the job of utils/jsonCall.js. It wraps every model call, tries JSON.parse first, and if that fails, extracts the largest JSON-shaped substring and tries again. It also enforces a schema check before returning.
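Here is a minimal sketch of that parse-then-recover step. It is not the contents of utils/jsonCall.js: parseModelJson is a hypothetical name, "largest JSON-shaped substring" is approximated as the span from the first opening brace to the last closing brace, and the schema check is reduced to a list of required top-level keys.

```javascript
// Hypothetical stand-in for the parsing half of a jsonCall-style wrapper.
function parseModelJson(raw, requiredKeys = []) {
  let parsed;
  try {
    parsed = JSON.parse(raw);
  } catch {
    // Fallback: take the span from the first "{" to the last "}" and retry.
    const start = raw.indexOf("{");
    const end = raw.lastIndexOf("}");
    if (start === -1 || end <= start) {
      throw new SyntaxError("No JSON object found in model response");
    }
    parsed = JSON.parse(raw.slice(start, end + 1));
  }
  // Minimal schema check (assumption): a plain object with the required keys.
  if (parsed === null || typeof parsed !== "object" || Array.isArray(parsed)) {
    throw new TypeError("Model response is not a JSON object");
  }
  const missing = requiredKeys.filter((key) => !(key in parsed));
  if (missing.length > 0) {
    throw new TypeError(`Model response is missing keys: ${missing.join(", ")}`);
  }
  return parsed;
}
```

The brace-span heuristic handles a stray sentence before the object or a comment after it. It will still fail if the surrounding prose itself contains braces, which is why the second JSON.parse is allowed to throw rather than return something half-right.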

This is boring infrastructure. It's also the difference between a demo and a shipping product. You can either trust the model every time, or you can trust it ninety-eight percent of the time and have a safety net for the rest.

Grounded input: the snapshot every agent sees

The other non-negotiable is that every agent reads from the same grounded snapshot. Before any LLM call happens, the pipeline does two things in parallel: it scrapes the target site with Cheerio, and it calls Google PageSpeed Insights. The scrape pulls real signals — title tag, meta description, H1s, alt text coverage, JSON-LD schema, tech stack from script tags, SKU count from /products/* links, blog velocity from <time datetime> on /blog or /news, review counts from aggregateRating markup when present. PageSpeed contributes Core Web Vitals and the Performance score.
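As an illustration of one of those signals, here is a sketch of deriving a SKU count from /products/* links. It assumes the scraper has already collected the page's href values (for example with Cheerio); countSkus is a hypothetical helper, and treating zero matches as "no signal" rather than "zero products" is an assumption.

```javascript
// Counts distinct same-origin /products/<slug> paths among a page's hrefs.
function countSkus(hrefs, pageUrl) {
  const base = new URL(pageUrl);
  const slugs = new Set();
  for (const href of hrefs) {
    let url;
    try {
      url = new URL(href, base); // resolves relative links against the page
    } catch {
      continue; // skip malformed hrefs
    }
    if (url.origin !== base.origin) continue; // ignore other sites
    const match = url.pathname.match(/^\/products\/([^/]+)\/?$/);
    if (match) slugs.add(match[1]);
  }
  // null means "no signal", so downstream code can report it as undetected.
  return slugs.size > 0 ? slugs.size : null;
}
```

Deduplicating by slug keeps the same product linked from the nav, the footer, and a tracking URL from being counted three times.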

That snapshot gets injected into every agent's system prompt. It's the ground truth. And here's the critical rule: when a signal isn't in the snapshot, the agent is instructed to output "Not detectable" rather than make something up. So if the scraper couldn't find a blog, the digital presence agent doesn't invent a posting cadence — it says it couldn't be detected.

This one rule, "Not detectable" over fabrication, is the single highest-leverage instruction in the whole system. Every agent prompt inherits it from systemBase.js, so there's no version of the pipeline where an agent is allowed to guess.
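A sketch of how a shared base rule and the snapshot might be combined into a system prompt follows. The wording of SYSTEM_BASE, the function names, and the prompt layout are all assumptions for illustration; the real systemBase.js is not shown in this article.

```javascript
// Hypothetical shared rule, standing in for what systemBase.js provides.
const SYSTEM_BASE =
  'If a signal is missing from the snapshot, output "Not detectable". Never guess.';

// Renders the snapshot as text and marks absent signals explicitly, so the
// model sees the gap instead of an empty field it might fill in.
function renderSnapshot(snapshot) {
  return Object.entries(snapshot)
    .map(([key, value]) => {
      const missing =
        value === null ||
        value === undefined ||
        value === "" ||
        (Array.isArray(value) && value.length === 0);
      return `${key}: ${missing ? "Not detectable" : JSON.stringify(value)}`;
    })
    .join("\n");
}

// Every agent gets the same base rule and snapshot, plus its own prompt.
function buildSystemPrompt(agentPrompt, snapshot) {
  return [SYSTEM_BASE, "SNAPSHOT", renderSnapshot(snapshot), agentPrompt].join("\n\n");
}
```

Writing the marker into the snapshot text, on top of stating the rule, gives the model the exact phrase to echo back when a signal is absent.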

A worked example: what actually happens on a request

A user pastes https://example-saas.com into the intake form. Here's the sequence.

The request hits POST /api/analyse. We validate the URL, normalize it, and kick off scrape + PageSpeed concurrently. Roughly 3–8 seconds later, we have a snapshot object: page title, meta, tech stack detected, perf score 62, no schema, no blog found.

We fan out. BM, CP, and DP all start at the same time. Each gets the snapshot plus its own specialized prompt. Each call goes through jsonCall.js. If PageSpeed had failed, the snapshot would still arrive — just with perfScore: null — and the downstream agents would label the performance section "Not measured" instead of blocking the whole report.
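The PageSpeed fallback in that step could look something like the sketch below. buildSnapshot, scrapeSite, and fetchPageSpeed are hypothetical names, and the choice to abort when the scrape itself fails is an assumption; the article only describes the PageSpeed case.

```javascript
// scrapeSite and fetchPageSpeed are hypothetical injected async functions.
async function buildSnapshot(url, { scrapeSite, fetchPageSpeed }) {
  // Both lookups start together; allSettled waits for both without
  // letting one failure cancel the other.
  const [scrape, pagespeed] = await Promise.allSettled([
    (async () => scrapeSite(url))(),
    (async () => fetchPageSpeed(url))(),
  ]);
  // Assumption: without scraped content there is nothing to analyse, so a
  // scrape failure aborts the request.
  if (scrape.status === "rejected") throw scrape.reason;
  return {
    url,
    ...scrape.value,
    // A PageSpeed failure degrades to null instead of blocking the report.
    perfScore: pagespeed.status === "fulfilled" ? pagespeed.value.perfScore : null,
  };
}
```

Downstream code can then branch on perfScore === null to print "Not measured" for the performance section.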

Once those three return, the Checklist agent fires with all three outputs as context. 50 ICE-scored items come back. Then the Assembler runs with everything — snapshot, BM, CP, DP, checklist — and produces the narrative layer.

The final Report object is typed, shaped, and ready for the frontend. The React components bind directly to the JSON. No transformation layer, no post-processing. What the pipeline produces is what the user sees.

What this buys you

Speed, because parallelism. Stability, because failures stay contained. Honesty, because grounding plus "Not detectable" kills the worst hallucinations at the source. Maintainability, because each agent is a small prompt file you can actually read.

A production multi-agent system isn't a prompt engineering problem. It's a small systems engineering problem wearing a prompt engineering hat.

If you're building something like this and it falls over at 3 a.m. on a Tuesday, we've already debugged most of the interesting failure modes.

Ship agent products clients can actually trust.

Grounded inputs. Strict JSON. Honest provenance. We build these for a living.
