The coordination layer for AI taskforces.

Humans specify what. Taskforce coordinates how.

A coordination protocol and runtime that lets a higher-level AI supervisor reliably govern specialized workers, tools, and evaluations. Inspectable, safe, and robust.

Today, multi-step AI agents are powerful but messy. As workflows scale, they become hard to trust: they invent arguments, loop on failures, or quietly execute unauthorized actions.

Taskforce establishes formal execution contracts, handles the tedious repair loops automatically, and escalates to a human only for strategic or irreversible milestones.

§ 01 Story of a Loop: Sourcing & Dealflow

Let's optimize a manual grunt process.

Sourcing early-stage startups and matching them to investors for an AI-native VC fund is a fragile house of cards under typical improvisational agent frameworks.

The Slow Manual Pipeline — Before Taskforce
1. Scattered Sourcing

Associates scrape directories manually. The unconstrained LLM improvises founder records, invents profiles, and clogs the CRM with duplicate data.

Model guesses details
2. Intake Friction

When parsing pitch deck PDFs, models frequently miscalculate runway or confuse Gross Merchandise Value (GMV) with Annual Recurring Revenue (ARR).

GMV confused with ARR
3. Silent Failures

Metric claims are supposed to be cross-checked against sources. If an API connection drops, scripts ignore the error and route unverified, hallucinated metrics directly to partners.

No source verification
4. 2 AM Slack Pings

The script breaks because a model returned its payload wrapped in markdown fences instead of pure JSON. Developers wake up in the middle of the night to fix it.

Markdown breaks parser
The Bounded Runtime — After Taskforce Playbook

We encapsulated this sourcing handbook into an immutable playbook.

Taskforce locks the execution boundaries. The strategic L2 supervisor checks schemas, dispatches tasks to specialized L3 workers, repairs validation anomalies autonomously, and escalates to human gates only for strategic decisions.

Playbook Execution Lifecycle
Playbook: market-intel
Node 01 · L2 Supervisor · Secured

Initialize Playbook

Taskforce loads the playbook schema and locks execution boundaries against the Hub registry. The agent is strictly barred from calling unapproved tools or creating unverified loops.

worker.collect: verified · eval.claims: verified · gate.human: verified
Node 02 · L3 Worker · Active

Specialized Dispatch

The L3 worker fetches unstructured files and maps them to the expected schemas. Under the task's constraints, output structures are normalized and strictly validated.

arXiv API: 8 sources
GitHub API: 4 matches
Web Intake: Verified
Node 03 · L3 Judge · Halted

Contract Check Failed

The contract judge scans output text and detects a factual grounding gap: the assertion "ABRT is deprecated" has no source evidence. Execution halts.

Factual Grounding: 0.45 · Required: 0.90
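
The halt described in this node can be sketched as a simple threshold check. This is a minimal illustration, assuming the judge emits a single grounding score between 0 and 1. The names `ContractViolation`, `GroundingReport`, and `check_grounding` are hypothetical and are not taken from any published Taskforce API; only the 0.45 and 0.90 figures come from the node above.

```python
from dataclasses import dataclass


class ContractViolation(Exception):
    """Raised when an output fails its contract (hypothetical name)."""


@dataclass(frozen=True)
class GroundingReport:
    score: float     # judge's factual-grounding score, assumed to lie in [0, 1]
    required: float  # minimum score the contract demands


def check_grounding(report: GroundingReport) -> None:
    """Halt by raising instead of letting an ungrounded output through."""
    if report.score < report.required:
        raise ContractViolation(
            f"factual grounding {report.score:.2f} < required {report.required:.2f}"
        )


try:
    check_grounding(GroundingReport(score=0.45, required=0.90))
    halted = False
except ContractViolation as exc:
    halted = True
    message = str(exc)
```

Raising an exception, rather than returning a flag, means a caller cannot accidentally ignore the failed check.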
Node 04 · L2 Supervisor · Repairing

Autonomous Self-Repair

The Taskforce L2 supervisor intercepts the audit exception. Instead of crashing, it feeds the validation report back to the worker, dispatches a patch work order, and commands a corrective rewrite.

L2 Supervisor → worker.rewrite → Verify
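
One way to picture this repair step is as a bounded retry loop that passes the judge's report back to the worker. The sketch below is illustrative only: `run_with_repair`, `RepairExhausted`, the `max_repairs` budget, and the toy worker and judge are all assumptions made for the example, not Taskforce's actual interfaces.

```python
from typing import Callable, Optional


class RepairExhausted(Exception):
    """Raised when the repair budget runs out (hypothetical name)."""


def run_with_repair(
    worker: Callable[[Optional[str]], str],
    judge: Callable[[str], tuple[float, str]],
    required: float = 0.90,
    max_repairs: int = 2,
) -> tuple[str, int]:
    """Run the worker, re-dispatching with the judge's report until it passes.

    Returns the accepted output and the number of repairs that were needed.
    """
    feedback: Optional[str] = None
    for attempt in range(max_repairs + 1):
        output = worker(feedback)      # the first call carries no feedback
        score, report = judge(output)
        if score >= required:
            return output, attempt
        feedback = report              # feed the validation report back
    raise RepairExhausted(f"still below {required} after {max_repairs} repairs")


# Toy stand-ins: the worker drops the unsupported claim once it sees feedback.
def toy_worker(feedback: Optional[str]) -> str:
    return "sourced summary" if feedback else "ABRT is deprecated"


def toy_judge(output: str) -> tuple[float, str]:
    if "ABRT is deprecated" in output:
        return 0.45, 'no source evidence for "ABRT is deprecated"'
    return 0.98, "ok"


result, repairs = run_with_repair(toy_worker, toy_judge)
```

The explicit budget is the design point: a repair loop that cannot run forever fails clearly once the budget is spent instead of burning tokens on retries.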
Node 05 · L3 Judge · Passed

Re-Audit: Factual Grounding Verified

The corrected output is re-run through evaluation. Factual grounding now scores 0.98, above the required 0.90 threshold, so the node passes safely.

Factual Grounding: 0.98 · Required: 0.90
Node 06 · Strategic Gate · Awaiting Operator

Strategic Human Gate

Low-level issues are managed. However, high-stakes external actions (dispatching the finalized deal sheets) require an explicit human signature.

Operator signature required
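
A human gate of this kind could be sketched as a commit function that refuses irreversible actions until a signature is supplied. Everything named here (`Action`, `commit`, `SignatureRequired`, the example signature string) is hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


class SignatureRequired(Exception):
    """Raised when an irreversible action has no operator sign-off."""


@dataclass(frozen=True)
class Action:
    name: str
    irreversible: bool


def commit(action: Action, operator_signature: Optional[str] = None) -> str:
    """Run low-stakes actions directly; block irreversible ones until signed."""
    if action.irreversible and not operator_signature:
        raise SignatureRequired(f"{action.name} needs an operator signature")
    signer = operator_signature or "auto"
    return f"{action.name} committed (approved by {signer})"


send = Action(name="dispatch_deal_sheets", irreversible=True)
try:
    commit(send)
    blocked = False
except SignatureRequired:
    blocked = True

receipt = commit(send, operator_signature="operator@example.com")
```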
Node 07 · L2 Supervisor

Committed

Deal sheet successfully committed to production channels. Playbook logs are locked, and a transparent runtime receipt is generated.

§ 02 Core Operational Thesis

Autonomy without contracts is chaos.

Most developer frameworks optimize for maximum agent autonomy—giving models tools and letting them improvise. But in enterprise systems, unconstrained autonomy creates silent failures and unpredictable operational overhead.

Improvisational Frameworks: Chaotic

LLMs figuring out the pipeline on the fly.

When agents randomly choose tools, synthesize inputs, or attempt recursive repairs, they enter unstable execution paths.

  • Models invent tool payloads or make up parameters, throwing runtime exceptions.
  • Infinite recursion loops consume significant token budget on trivial syntax retries.
  • Silent degraded paths generate fake or mock data to bypass failed API steps.
  • Humans are dragged in for low-level mechanical fixes (e.g. "repair this JSON field format").
RESULT: Systems that make for amazing sandbox demos, but are fundamentally untrustworthy in production.
Taskforce Governance: Disciplined

Execution constrained by immutable Playbooks.

Taskforce constrains LLMs inside strict, inspectable boundaries. The system operates inside predefined capabilities approved in the Hub.

  • All capabilities, tools, and schemas are pre-approved and locked in the Taskforce Hub.
  • Autonomous repair protocols automatically resolve low-level schema issues without human noise.
  • Failure-path honesty: if a required resource or credential is missing, the run fails clearly and halts.
  • Humans answer only the "WHAT" (approving high-stakes, irreversible, or strategic outputs).
RESULT: Complete audit records, extreme runtime reliability, and zero-compromise system contracts.
§ 03 The Manifesto

Honesty is a runtime property, not just a team value.

01

No Silent Degradations

If an API call fails or resources are offline, the system throws an explicit error immediately. We do not mask failures with synthetic placeholder responses.
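
In code, this rule amounts to converting a dropped connection into an explicit, named failure rather than a fallback value. The sketch below assumes a generic fetch callable; `fetch_verified` and `ResourceUnavailable` are hypothetical names used only for this example.

```python
from typing import Callable


class ResourceUnavailable(Exception):
    """Raised when a required dependency cannot be reached (hypothetical name)."""


def fetch_verified(fetch: Callable[[], dict], source: str) -> dict:
    """Return real data or fail loudly; never substitute a placeholder."""
    try:
        return fetch()
    except ConnectionError as exc:
        # The anti-pattern would be `return {"arr": 0, "mock": True}` here.
        raise ResourceUnavailable(f"{source} is unreachable") from exc


def offline_api() -> dict:
    raise ConnectionError("connection dropped")


try:
    fetch_verified(offline_api, source="metrics API")
    outcome = "silently degraded"
except ResourceUnavailable as exc:
    outcome = f"halted: {exc}"
```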

02

Predefined Capabilities

Supervisors look up approved workers and playbooks from the central Hub. A model cannot invent new tools or improvise actions beyond its registry contract.

People will not build serious, long-term companies on AI infrastructure that quietly substitutes fake behavior or skips the hard parts.

Taskforce works under a zero-pretence rule. If a required credential, eval score, or human signature is missing, the workflow fails clearly. This uncompromising predictability is the bedrock of enterprise trust.

"We make agentic work inspectable, repeatable, and completely safe by replacing endless LLM trial-and-error with formal software boundaries."
§ 04 Architectural Blueprint

Inspired by how real organizations operate.

We separate strategic coordination from low-level narrow execution. This L2/L3 division ensures complete accountability and granular inspectability.

Strategic Layer

The Supervisor

Acts like an architect or mission commander. It is given a playbook, maps the execution nodes, monitors tasks, handles repair loops, and escalates when necessary.

L2 Runtime Core Duties:
1. Bounded Assignment
Translates high-level tasks into discrete, isolated worker contracts with strict schemas.
2. Input/Output Check
Validates all worker payloads against local JSON/Zod schemas before routing.
3. Auto-Repair Loops
Intercepts failures (API timeouts, grounding issues) and dispatches specific correction tasks.
4. Strategic Escalation
Stops the execution loop and alerts the human operator before irreversible, high-stakes actions.
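
The Input/Output Check duty can be illustrated with a strict payload validator. The text above mentions JSON/Zod schemas; to stay dependency-free, this sketch swaps in a hand-rolled field-to-type map in Python, which is an assumption for illustration and not how Taskforce defines schemas. `DEAL_SCHEMA` and `validate_payload` are hypothetical names.

```python
import json

# Hypothetical contract for one worker's output: field name -> required type.
DEAL_SCHEMA = {"company": str, "arr_usd": int, "sources": list}


def validate_payload(raw: str, schema: dict) -> dict:
    """Parse raw worker output and check it against the schema.

    Raises ValueError on invalid JSON, missing fields, extra fields,
    or wrong types, so nothing malformed is routed onward.
    """
    data = json.loads(raw)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("payload must be a JSON object")
    missing = schema.keys() - data.keys()
    extra = data.keys() - schema.keys()
    if missing or extra:
        raise ValueError(f"missing={sorted(missing)} extra={sorted(extra)}")
    for field, expected in schema.items():
        if not isinstance(data[field], expected):
            raise ValueError(f"{field} must be {expected.__name__}")
    return data


ok = validate_payload(
    '{"company": "Acme", "arr_usd": 1200000, "sources": ["deck.pdf"]}',
    DEAL_SCHEMA,
)

try:
    # Chatty text around the JSON is rejected rather than guessed at.
    validate_payload('Here is the JSON: {"company": "Acme"}', DEAL_SCHEMA)
    chatty_rejected = False
except ValueError:
    chatty_rejected = True
```

Rejecting extra fields as well as missing ones keeps a worker from smuggling unreviewed data past the contract.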
§ 05 Immutable Capabilities

Taskforce Hub.

The Hub is an internal, strict capability registry. It is the source of truth for playbooks, tools, schemas, and evaluators. Taskforce references this directory to check permissions, enforce contracts, and route tasks.

Local Authority: A centralized location for teams to register capabilities, keeping executions completely hermetic.
Kind      Identifier                Status
playbook  market-intel.v1           approved
worker    worker.collect@1.2        approved
worker    worker.score@2.1          approved
worker    worker.rewrite@1.0        approved
eval      eval.claims@0.9           approved
tool      tool.arxiv.query          approved
tool      tool.github.search        approved
worker    worker.voice@0.4          review
pattern   fail.grounding-mismatch   indexed
pattern   fail.api-quota-limit      indexed
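
A registry lookup over these entries might look like the sketch below. The identifiers and statuses are copied from the capability rows listed above; the dictionary layout, the `resolve` function, and the `CapabilityDenied` exception are assumptions made for illustration.

```python
# Capability rows from the registry listing; the dict layout is an assumption.
HUB = {
    "market-intel.v1": ("playbook", "approved"),
    "worker.collect@1.2": ("worker", "approved"),
    "worker.score@2.1": ("worker", "approved"),
    "worker.rewrite@1.0": ("worker", "approved"),
    "eval.claims@0.9": ("eval", "approved"),
    "tool.arxiv.query": ("tool", "approved"),
    "tool.github.search": ("tool", "approved"),
    "worker.voice@0.4": ("worker", "review"),
}


class CapabilityDenied(Exception):
    """Raised for unregistered or unapproved capabilities (hypothetical name)."""


def resolve(identifier: str) -> str:
    """Return the identifier only if the Hub lists it as approved."""
    entry = HUB.get(identifier)
    if entry is None:
        raise CapabilityDenied(f"{identifier} is not registered")
    kind, status = entry
    if status != "approved":
        raise CapabilityDenied(f"{identifier} ({kind}) has status {status!r}")
    return identifier


approved = resolve("worker.collect@1.2")
```

Under this sketch, `worker.voice@0.4` is refused because it is still in review, and an invented identifier such as `tool.invented` is refused because it was never registered.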
§ 06 Bimodal Mechanics

Separating Design Mode from Execution Mode.

Execution mode: when the Playbook is known

The factory supervisor.

L2 works directly under the pre-approved Playbook contract: dispatching work packages, verifying schemas, running self-repair loops, and stopping execution at strategic human boundaries.

follow rules · dispatch jobs · verify schemas · auto-repair · request signature
Design mode: when the Playbook is unknown

The system architect.

L2 acts as a system designer. It analyzes the mission objective, maps missing worker requirements, structures schemas, tests the evaluation gates, and prompts the human to sign off before deployment.

analyze mission · scaffold worker specs · write tests · ratify playbook
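
The split between the two modes reduces to one question: is there an approved Playbook for this mission? A minimal sketch, assuming the set of approved playbook identifiers is available to the supervisor; `Mode` and `select_mode` are hypothetical names.

```python
from enum import Enum


class Mode(Enum):
    EXECUTION = "execution"  # Playbook is known: follow the contract
    DESIGN = "design"        # Playbook is unknown: draft one for sign-off


def select_mode(playbook_id: str, approved_playbooks: set[str]) -> Mode:
    """Pick execution mode only when the playbook is already approved."""
    return Mode.EXECUTION if playbook_id in approved_playbooks else Mode.DESIGN


approved_playbooks = {"market-intel.v1"}
known = select_mode("market-intel.v1", approved_playbooks)
unknown = select_mode("new-mission", approved_playbooks)
```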
§ 07 Private Sandbox Evaluation

Build serious AI operations on a runtime that refuses to pretend.

We are deploying private sandbox environments for developers, AI-native founders, and investors. Enter your corporate credentials to deploy a sample Taskforce playbook.