Labs

Labs are the company’s R&D function: where it tries things, runs evals, gathers evidence, and decides whether a proposed change is good enough to reach production.

What a lab is

A lab is a durable environment for controlled learning — the place where the company organizes questions, experiments, specimens, instruments, evidence, results, and warrants around a target of improvement. It’s not a single experiment, eval, or test; it’s the place where proposed changes are handled as claims that must earn their evidence before production adoption.

Labs are part of the exocorp’s anatomy — an immune system and a learning engine in one — not an optional engineering habit. Without labs, self-modification becomes taste: agents change prompts because the new wording feels better; skills get attached because they sound useful; workflows accumulate because nobody pruned them. With labs, the company has a disciplined way to ask what actually changed.

The pieces of a lab

  • Specimens: Test cases. The inputs you’re testing against: real transcripts, customer conversations, synthetic cases, historical failures.
  • Levers: The things the company might change. Prompts, skills, models, workflows, triggers, session policies, harness assignments. Each lever can be independently varied.
  • Evals: Instruments. Each eval has a range, a calibration, known noise, and known blind spots. Evals measure specific behavior; they don’t measure truth.
  • Baselines: The comparison standard. Current production, a simple alternative, the previous model, doing nothing. Without a baseline, you mistake attractiveness for improvement.
  • Warrants: Narrow conclusions, scoped and time-bounded. A warrant says: this specific conclusion is supported by this evidence, under these limits, for this scope, until this expiry or review condition. Warrants are how evidence becomes adoption.
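
To make two of these pieces concrete, here is a minimal Python sketch of an eval, a warrant, and a baseline comparison. Every name in it (Eval, Warrant, is_valid, beats_baseline, and all field names) is hypothetical and chosen for illustration; none of it is the product’s actual schema. It also assumes that an eval’s "known noise" can be expressed as a single score margin, which is a simplification.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Eval:
    """An instrument: measures one behavior, with known noise and blind spots."""
    name: str
    noise: float                   # assumed: score wobble between identical runs
    blind_spots: tuple[str, ...]   # behaviors this eval cannot see


@dataclass(frozen=True)
class Warrant:
    """A narrow conclusion tied to its evidence, scope, and expiry."""
    conclusion: str
    evidence: tuple[str, ...]      # ids of the results that support it
    scope: str
    expires: date

    def is_valid(self, today: date, scope: str) -> bool:
        # A warrant applies only inside its own scope and up to its expiry.
        return scope == self.scope and today <= self.expires


def beats_baseline(candidate: float, baseline: float, ev: Eval) -> bool:
    # Count a difference as improvement only when it exceeds the eval's noise.
    return candidate - baseline > ev.noise


# Hypothetical example values.
tone_eval = Eval("support-tone", noise=0.03, blind_spots=("factual accuracy",))
warrant = Warrant(
    conclusion="Prompt v2 improves tone on refund requests",
    evidence=("result-001",),
    scope="refund requests",
    expires=date(2025, 6, 30),
)
```

The design point is that a warrant carries its own limits: asking is_valid for a different scope, or after the expiry date, returns False rather than letting the conclusion stretch beyond its evidence.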

Common lab shapes

  • Organization-dynamics labs: Test topology, attention patterns, workflows, cognition, learning speed. Should this team be split? Does this workflow actually improve decision quality?
  • Competitor-benchmark labs: External market and capability tests. How does the company’s output compare against alternatives a customer might choose?
  • Customer-engagement labs: Retention, trust, channel quality. Do changes to outbound communication actually move what we care about?
  • Skill-adoption labs: Test community capabilities against the company’s own use cases. Will this new skill add real capability, or just complexity?

How evidence flows

A lab produces results. Results that pass review become warrants. Warrants that pass approval become promotions — into a team’s skill set, into company doctrine, into a workflow version, into a harness assignment, or into a product change. Results that fail become archived negative evidence (with date, model, harness, version, cases, assumptions, failure mode) so future agents don’t repeat the same attractive mistake.
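
The flow above can be sketched as a small routing function plus an archive record. This is an illustrative Python sketch, not the product’s implementation: Stage, NegativeEvidence, next_stage, and the field names are hypothetical. It reads "results that fail" as results that fail review, and it assumes a warrant that is not approved simply stays a warrant, since the text does not say what happens in that case.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from enum import Enum


class Stage(Enum):
    ARCHIVED = "archived negative evidence"
    WARRANT = "warrant"
    PROMOTION = "promotion"


@dataclass(frozen=True)
class NegativeEvidence:
    """Archive record; the fields mirror the list in the paragraph above."""
    recorded_on: date
    model: str
    harness: str
    version: str
    cases: tuple[str, ...]
    assumptions: tuple[str, ...]
    failure_mode: str


def next_stage(passed_review: bool, passed_approval: bool) -> Stage:
    # A result that fails review never becomes a warrant.
    if not passed_review:
        return Stage.ARCHIVED
    # Assumption: a reviewed warrant that is not approved stays a warrant.
    if not passed_approval:
        return Stage.WARRANT
    return Stage.PROMOTION
```

Keeping the archive record structured, rather than a free-text note, is what would let a later agent search for a past failure by model, harness, or failure mode before repeating the experiment.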

What you operate on as the human

You propose which questions become experiments. You understand eval scope and blind spots before trusting results. You review promotion paths — where does a result go? When risk is legal, ethical, or brand-related, the company surfaces results to you for human judgment before production adoption.
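
The escalation rule in the last sentence can be pictured as a simple check. In this hedged Python sketch, the constant and function names, and the idea that results carry risk tags as strings, are all assumptions made for illustration; only the three risk categories come from the text.

```python
from __future__ import annotations

# Risk categories the text names as requiring human judgment.
HUMAN_JUDGMENT_RISKS = frozenset({"legal", "ethical", "brand"})


def needs_human_judgment(risk_tags: set[str]) -> bool:
    """True when any tag on a result falls in a human-judgment category."""
    return not HUMAN_JUDGMENT_RISKS.isdisjoint(risk_tags)
```

Under these assumptions, a result tagged {"legal", "latency"} would be surfaced to you before production adoption, while one tagged only {"latency"} would not.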
