Diagnosing failures

Step-by-step guides for the most common failure patterns. Start by figuring out which symptom you’re seeing, then follow the diagnosis path.

A run failed

A failed run lands in the Inbox. Click into the run detail and look at:

  • The error message. Sometimes obvious: provider rate limit, API key expired, plugin endpoint unreachable. Fix the underlying cause (rotate credentials, restore the dependency) and the agent can retry.
  • The work item it was acting on. Check the work item’s context. Was the task text unambiguous? Was the team’s mandate appropriate for this work? If not, the problem is upstream: fix the mandate or transfer the work to a different team.
  • The agent. One failure is noise. The agent detail page shows recent failures across runs. Three failures in a row on different work items is a signal that the agent itself isn’t set up correctly: wrong provider, wrong skills, or wrong authority.
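The "three failures in a row on different work items" rule of thumb can be sketched as a small check. This is only an illustration: the `Run` record and its field names are hypothetical and do not come from the product, and the streak length of three simply mirrors the guidance above.

```python
from dataclasses import dataclass


@dataclass
class Run:
    # Hypothetical record; field names are assumptions for illustration.
    work_item_id: str
    failed: bool


def agent_looks_misconfigured(recent_runs: list[Run], streak: int = 3) -> bool:
    """Return True if the latest `streak` runs all failed on distinct work items.

    `recent_runs` must be ordered oldest to newest.
    """
    latest = recent_runs[-streak:]
    if len(latest) < streak:
        return False  # not enough history to call it a pattern
    all_failed = all(run.failed for run in latest)
    distinct_items = len({run.work_item_id for run in latest}) == streak
    return all_failed and distinct_items
```

Repeated failures on the same work item are deliberately excluded here, because they point at the item (a retry loop) rather than at the agent.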

A work item is stuck

The Stuck filter in the Work view shows work items that aren’t progressing. Drill in:

  • What state is it in? Waiting means it’s blocked on something, and the waiting condition tells you what. In-progress with no recent run activity means an agent picked it up and hasn’t finished, so check the run history.
  • What blocked it? For waiting items: is the dependency real? If you’re waiting on a customer response that never came, the work item should probably be canceled or have its scope reduced. If you’re waiting on an internal team, talk to the CEO about whether to push.
  • Retry loop. Multiple failed runs against the same item mean the agent keeps trying and failing. Either you intervene (change the mandate, cancel the item, transfer it to a different team) or you let it keep failing, in which case watch the budget.
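The triage above can be written down as a minimal sketch. Everything named here is an assumption made for illustration: the `WorkItem` fields, the state labels, the one-hour idle window, and the threshold of two failed runs standing in for "multiple" are not documented product behavior.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class WorkItem:
    # Hypothetical shape; field names and state labels are assumptions.
    state: str                          # "waiting" or "in_progress"
    waiting_condition: Optional[str]
    last_run_at: Optional[datetime]
    failed_run_count: int


def triage(
    item: WorkItem,
    now: datetime,
    idle_after: timedelta = timedelta(hours=1),  # assumed idle window
    retry_threshold: int = 2,                    # assumed meaning of "multiple"
) -> str:
    """Classify a stuck work item into one of the diagnosis paths."""
    if item.failed_run_count >= retry_threshold:
        return "retry_loop"  # intervene, or let it fail and watch the budget
    if item.state == "waiting":
        return "blocked"     # check whether the dependency is real
    if item.state == "in_progress" and (
        item.last_run_at is None or now - item.last_run_at > idle_after
    ):
        return "idle"        # check the run history
    return "ok"
```

The retry-loop check comes first because it is the case that keeps consuming budget while nothing is decided.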

A team is degraded or unhealthy

Team health rolls up budget pressure, failed runs, waiting work, and heartbeat status. Drill into the team detail page:

  • Read the problems list. The team detail page surfaces specific problem types: budget alert, failed runs (count), waiting work piling up (count), heartbeats failing (count). Each one is its own diagnosis path.
  • Look at recent runs. If multiple agents on this team are failing, the underlying problem is probably mandate, authority, or capability. The team may need different tools, or its mandate may be wrong for the work.
  • Look at the budget detail. If the team is hard-stopped, it can’t admit new work. Either raise the limit or accept that the team will be paused until the next budget period.
  • Talk to the CEO. Ask for a synthesis, for example “What’s going on with the marketing team?” The CEO has the cross-team context.

The runtime is unreachable

The exocorp shows as Running on the platform dashboard, but the operator portal isn’t responding and a warning banner appears on the company detail page. This is usually transient (a deploy in flight, a network blip, a restart midway). Wait 30 to 60 seconds and try again.
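If you script this check instead of refreshing by hand, the wait-and-retry step might look like the sketch below. The `probe` callable is a placeholder you would supply yourself, for example a function that requests your portal URL and returns whether it answered. No platform API is assumed, and the attempt count and delay are illustrative defaults within the 30 to 60 second window.

```python
import time
from typing import Callable


def wait_for_portal(
    probe: Callable[[], bool],
    attempts: int = 3,
    delay_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `probe` up to `attempts` times, pausing between tries.

    Returns True as soon as the probe succeeds, False if every try fails.
    """
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(delay_seconds)  # no pause after the final attempt
    return False
```

A `False` result after a few attempts is the point at which the issue stops looking transient.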

If it persists:

  • Check release status. The company detail page surfaces release status. If a failed provision job is blocking, the error and the failed phase are recorded there.
  • Try a restart. Restart uses the last successful provision, so it is fast and safe for transient issues. See Pause, restart, rollback.
  • Try a rollback. If a recent update caused the issue, roll back to the previous known-good version.
  • Reach out to platform support. Persistent unreachability after a restart or rollback is a platform-level issue.

The company is busy but not shipping

The hardest failure mode: the exocorp looks active but isn’t actually producing value. Symptoms:

  • Lots of work items, weak sources. Most new work items trace back to internal pressure (“we should probably…”) rather than external signals (customer messages, metric changes, promises). The company is generating work from mission-text hunger, not from contact with reality.
  • Lots of discussion, no sharpening. Comments piling up on work items, beliefs not being tested in labs, reviews showing “same as last time.” The company is agreeing with itself.
  • More agents, no new thinking. New roles created but they’re behaving like existing ones. Headcount growth without capability differentiation.

The fix here isn’t a quick lifecycle action. It’s a direction conversation. Talk to the CEO. Ask why the company is producing what it’s producing. Sharpen the mandates that aren’t differentiating real work from internal motion.
