Budget and cost tracking¶

Overview¶

Reyn tracks LLM token usage and USD cost per session, per-agent, and per-model. Token and USD totals accumulate as LLM calls complete; configured caps refuse or warn before any call (or spawn) that would exceed them. The system is entirely opt-in: without a cost: block in reyn.yaml, runs are unlimited.

`reyn.yaml` schema¶

All budget configuration lives under the top-level cost: key. Every field is optional; omitting a sub-key (or setting its hard_limit to null) means unlimited for that dimension.

cost:
  # Per-agent caps — ledger-backed (survive restart + crash); cleared in
  # memory by /budget reset
  per_agent_tokens:
    hard_limit: 50000    # refuse after this many tokens for one agent
    warn_ratio: 0.8      # warn at 80% of hard_limit (default: 0.8)
  per_agent_cost_usd:
    hard_limit: 2.00     # refuse after $2.00 spent by one agent
    warn_ratio: 0.8

  # Per-model rate limit (hard cap, calls per 60-second window)
  rate_limit_per_minute:
    openai/gpt-4o: 60
  rate_limit_warn_ratio: 0.8   # warn at 80% of rate limit (default: 0.8)

  # Daily / monthly quotas — persistent across process restarts (PR25)
  # Stored in .reyn/state/budget_ledger.jsonl; auto-reset at midnight /
  # month boundary (local time).
  daily_tokens:
    hard_limit: 100000   # refuse after 100k tokens today
    warn_ratio: 0.8
  daily_cost_usd:
    hard_limit: 5.00     # refuse after $5.00 today
  monthly_tokens:
    hard_limit: 1000000  # refuse after 1M tokens this month
  monthly_cost_usd:
    hard_limit: 50.00    # refuse after $50.00 this month

Migration note: router_invocations_per_turn was moved from cost: to safety.loop. Use safety.loop.max_router_calls_per_turn instead. See Reference: reyn.yaml — safety block.

Field reference¶

Field	Scope	Persists	Resets
`per_agent_tokens`	per agent	ledger file	`/budget reset`
`per_agent_cost_usd`	per agent	ledger file	`/budget reset`
`rate_limit_per_minute`	per model	in-memory (60s window)	automatic sliding window
`rate_limit_warn_ratio`	global	—	—
`daily_tokens`	process-global	ledger file	midnight (local time)
`daily_cost_usd`	process-global	ledger file	midnight (local time)
`monthly_tokens`	process-global	ledger file	1st of month (local time)
`monthly_cost_usd`	process-global	ledger file	1st of month (local time)

Each cap dimension has two optional sub-fields:

Sub-field	Type	Default	Description
`hard_limit`	float or null	null (unlimited)	Refuse the next LLM call or spawn when this value is reached or exceeded.
`warn_ratio`	float	0.8	Emit a warning when usage reaches `hard_limit * warn_ratio`. A warning is emitted at most once per dimension per session.
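
The interaction between the two sub-fields can be sketched as follows. This is a minimal illustration, not Reyn's implementation: the `Cap` class and `evaluate` function are hypothetical names introduced here, and the once-per-session warning suppression is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cap:
    """Hypothetical mirror of one cap dimension from the cost: block."""
    hard_limit: Optional[float] = None  # None (null in YAML) means unlimited
    warn_ratio: float = 0.8             # documented default


def evaluate(cap: Cap, used: float) -> str:
    """Classify current usage as "ok", "warn", or "refuse"."""
    if cap.hard_limit is None:
        return "ok"                      # unlimited dimension
    if used >= cap.hard_limit:
        return "refuse"                  # reached or exceeded the hard limit
    if used >= cap.hard_limit * cap.warn_ratio:
        return "warn"                    # past the warn threshold, call still proceeds
    return "ok"
```

With `hard_limit: 50000` and the default `warn_ratio`, the warn threshold is 40,000 tokens, which matches the "warn at 40,000" figure in the `/budget` output.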

USD cost calculation¶

USD cost is estimated via LiteLLM's pricing lookup after each call. Both proxy-mode (LiteLLM) and direct-API paths are supported. If the lookup returns no price for the model in use, the USD counter stays at $0.0000 and only tokens accumulate. Token counts are always reliable regardless of pricing availability.
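
The missing-price fallback described above can be sketched as below. To stay self-contained, the example replaces the LiteLLM lookup with a hypothetical in-memory price table; the model name, the prices, and the `estimate_cost_usd` function are all illustrative assumptions, not real pricing data or Reyn's API.

```python
# Hypothetical per-token USD prices as (input, output). These are placeholder
# values standing in for whatever the real pricing lookup returns.
EXAMPLE_PRICES = {"example/priced-model": (2.5e-06, 1.0e-05)}


def estimate_cost_usd(model, input_tokens, output_tokens, prices=EXAMPLE_PRICES):
    """Estimate the USD cost of one call; 0.0 when the model has no known price."""
    price = prices.get(model)
    if price is None:
        return 0.0  # USD counter stays flat; tokens are still counted elsewhere
    in_price, out_price = price
    return input_tokens * in_price + output_tokens * out_price
```

A consequence worth noting: with an unpriced model, USD caps never trigger, so token caps are the safer guard in that situation.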

Slash commands¶

While reyn chat is running, /cost and /budget expose the budget state, and /budget reset clears the in-memory per-agent counters.

`/cost`¶

One-line summary for the currently attached agent:

/cost

Example output:

alice: 12,450 tokens, $0.0187  (this session)

Reports the per-agent counters for this agent. These are restored from the ledger on startup (so they accumulate across restarts) and cleared in memory by /budget reset. Returns nothing when no cost: block is configured (unlimited mode).

`/budget`¶

Full breakdown across all dimensions and all agents seen this session:

/budget

Example output:

Usage (process invocation):

  Today (2026-05-09):   tokens 12,450 / 100,000 (12%) | $0.0187 / $5.00 (0%)
  Month (2026-05):      tokens 12,450 / 1,000,000 (1%) | $0.0187 / $50.00 (0%)

  alice (attached)
    tokens:       12,450 / 50,000  (warn at 40,000)
    cost:         $0.0187 / $2.00     (warn at $1.60)

  Per-chain skill calls:
    chain-abc/direct_llm:  2 / 5

  Rate limit (last minute):
    openai/gpt-4o:  14 / 60  (warn at 48)

  Reset counters with `/budget reset`.

The "Today / Month" section appears only when daily_* or monthly_* caps are configured and at least one LLM call has been made since startup.

`/budget reset`¶

Clear the in-memory per-agent counters:

/budget reset

Daily and monthly counters are not affected — they are backed by the persistent ledger (.reyn/state/budget_ledger.jsonl) and auto-reset at period boundaries. To clear them, delete or archive the ledger file while the process is stopped.

Cap tiers¶

Each dimension has two tiers:

Soft warn: emitted once when usage crosses hard_limit * warn_ratio. The LLM call proceeds; a [budget warn] status message is shown to the user in the REPL and an event is written to the event log.

Hard refuse: emitted when usage reaches or exceeds hard_limit. The LLM call is refused before it is made (no tokens are consumed). A [budget exceeded] message is shown to the user with current usage, the triggered dimension, and a list of suggested next steps:

[budget exceeded] agent 'alice' is over the hard limit.

  Triggered:  per_agent_tokens (50,123/50,000)
  Also used:  $0.0374

The next LLM call has been refused.

What you can do:
  • Raise the limit in `reyn.yaml` or `reyn.local.yaml` (cost: section)
  • Reset counters with `/budget reset`
  • Restart `reyn chat` (limits are per-process)
  • See current usage with `/budget`

For rate-limit violations (rate_limit_per_minute), calls are refused until enough earlier calls have aged out of the 60-second sliding window. There is no automatic sleep or throttle: the user or calling code must retry.
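
A refuse-only sliding-window limiter of this kind might look like the sketch below. `SlidingWindowLimiter` and `try_acquire` are hypothetical names chosen for illustration; the injectable `clock` parameter is an assumption added so the behaviour can be tested without waiting.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Refuse calls beyond `limit` per `window_s` seconds; never sleeps."""

    def __init__(self, limit, window_s=60.0, clock=time.monotonic):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self.calls = deque()  # timestamps of accepted calls, oldest first

    def try_acquire(self):
        """Return True and record the call, or False if the window is full."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self.calls and now - self.calls[0] >= self.window_s:
            self.calls.popleft()
        if len(self.calls) >= self.limit:
            return False  # caller decides when to retry
        self.calls.append(now)
        return True
```

Because the limiter only answers yes or no, any back-off policy stays with the caller, which is consistent with the "no automatic throttle" behaviour described above.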

Events emitted¶

Event	When emitted
`router_retry_exhausted`	`safety.loop.max_router_calls_per_turn` cap is reached; carries `count`, `cap`, `last_reason`
`budget_reset`	`/budget reset` is executed; carries `before` snapshot of counters

Warning and refusal events are surfaced as outbox messages to the user rather than as distinct event types. The budget_warned / budget_refused signal is embedded in the outbox message text and the BudgetCheck return value that the runtime inspects.

Cross-link: reference/runtime/events.md

Per-call accumulation¶

Counters update after each LLM call completes successfully:

Token usage (input_tokens + output_tokens) is added to the per-agent accumulators.
USD cost is estimated via LiteLLM pricing and added to the USD accumulators.
A record is appended to .reyn/state/budget_ledger.jsonl (fsync'd for durability). The daily / monthly / per-agent counters are reconstructed from these records on the next startup.
The updated counters are checked against warn thresholds; any newly crossed threshold emits a warning outbox message (once per dimension per session).

Pre-call checks run before the call: if a hard cap is already exceeded, the call is refused at that point — no tokens are consumed.
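
The post-call steps listed above can be sketched roughly as follows. This is an illustration under assumptions, not Reyn's code: `record_call`, its parameters, and the in-memory `totals` / `warned` structures are hypothetical, and only the per-agent token warn threshold is modelled.

```python
import json
import os
from datetime import datetime


def record_call(ledger_path, totals, warned, agent, model,
                input_tokens, output_tokens, cost_usd, warn_at_tokens=None):
    """Accumulate one completed call, append it durably, and report a new warning.

    Returns True only when this call newly crosses the token warn threshold.
    """
    # Steps 1 and 2: add tokens and estimated USD to the per-agent accumulators.
    tokens = input_tokens + output_tokens
    entry = totals.setdefault(agent, {"tokens": 0, "cost_usd": 0.0})
    entry["tokens"] += tokens
    entry["cost_usd"] += cost_usd

    # Step 3: append one JSON line and force it to disk.
    record = {
        "ts": datetime.now().astimezone().isoformat(timespec="seconds"),
        "agent": agent,
        "model": model,
        "tokens": tokens,
        "cost_usd": cost_usd,
    }
    with open(ledger_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
        f.flush()             # push Python's buffer to the OS
        os.fsync(f.fileno())  # ask the OS to flush the file to disk

    # Step 4: warn at most once per (agent, dimension) in this session.
    key = (agent, "per_agent_tokens")
    if warn_at_tokens is not None and entry["tokens"] >= warn_at_tokens and key not in warned:
        warned.add(key)
        return True
    return False
```

Calling `os.fsync` on every append trades some latency for durability, which fits the crash-survival guarantee the ledger is meant to provide.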

Ledger file¶

Budget counters persist across process restarts — and crashes — via the fsync-per-append .reyn/state/budget_ledger.jsonl. One record per LLM call:

{"ts": "2026-05-09T10:23:00+09:00", "agent": "alice", "model": "openai/gpt-4o", "tokens": 312, "cost_usd": 0.00234}

Legacy note: a pre-existing ledger may also contain skill-spawn records ({"kind": "spawn", ...}) written before the per-chain skill-spawn cap was removed. They are no longer written; hydrate skips them on read.

Records are fsync'd on append. On startup, Reyn re-aggregates from the ledger: today's and this month's totals (period-filtered) for the daily / monthly caps, and the cumulative per-agent token and USD totals. The ledger is the cap-critical source of truth; .reyn/state/budget_state.json is a throttled best-effort cache layered on top (it can lag the ledger by up to a second, so the ledger value always wins on recovery). The ledger is append-only and grows at roughly a few MB per month; it can be manually archived if needed (stop the process first, or wait for the period rollover).
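
The startup re-aggregation can be sketched as a single pass over the ledger lines. The function below borrows the name `hydrate` from this page, but its signature, return shape, and internals are assumptions made for illustration.

```python
import json
from datetime import datetime


def hydrate(lines, now):
    """Rebuild daily, monthly, and per-agent totals from ledger lines.

    `now` is a timezone-aware datetime; period boundaries use its timezone.
    """
    def zero():
        return {"tokens": 0, "cost_usd": 0.0}

    daily, monthly, per_agent = zero(), zero(), {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        rec = json.loads(line)
        if rec.get("kind") == "spawn":
            continue  # legacy skill-spawn record, skipped on read
        ts = datetime.fromisoformat(rec["ts"]).astimezone(now.tzinfo)
        # Per-agent totals are cumulative; daily / monthly are period-filtered.
        buckets = [per_agent.setdefault(rec["agent"], zero())]
        if (ts.year, ts.month) == (now.year, now.month):
            buckets.append(monthly)
            if ts.date() == now.date():
                buckets.append(daily)
        for bucket in buckets:
            bucket["tokens"] += rec["tokens"]
            bucket["cost_usd"] += rec["cost_usd"]
    return daily, monthly, per_agent
```

Because the period filter is applied at read time, the daily and monthly counters "reset" simply by no longer matching older records; nothing needs to be rewritten in the ledger.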

Per-agent cap recovery semantics¶

per_agent_tokens and per_agent_cost_usd are lifetime/persistent — they are reconstructed from the all-time durable ledger on every startup and survive crash and restart unchanged.

They do not reset per-conversation. The counters accumulate continuously and are only cleared explicitly by /budget reset (in-memory clear) or by archiving the ledger file.

Contrast with daily / monthly caps, which auto-reset at their period boundary (midnight or 1st of month, local time) regardless of process restarts or crashes.

Crash-recovery guarantee: a crash cannot lower a per-agent cap counter below its durable ledger value. On recovery, load_state (the throttled best-effort cache) is merged with hydrate (the ledger) using max() — so a stale or garbage-corrupted state file can never cause the cap to under-count spending and permit an over-budget call. Rationale: crash recovery must be complete; a crash that resets a lifetime cap would allow unbounded over-spend in the window before a human notices.
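
The max() merge can be illustrated with a small sketch. `merge_counters` and the dictionary layout are hypothetical; the point is only that taking the larger value per field means the recovered counter is never below the ledger's figure.

```python
def merge_counters(cached, ledger):
    """Merge cache-derived and ledger-derived per-agent counters with max()."""
    merged = {}
    for agent in set(cached) | set(ledger):
        c = cached.get(agent, {})
        l = ledger.get(agent, {})
        merged[agent] = {
            # The result can exceed the ledger value but never fall below it.
            "tokens": max(c.get("tokens", 0), l.get("tokens", 0)),
            "cost_usd": max(c.get("cost_usd", 0.0), l.get("cost_usd", 0.0)),
        }
    return merged
```

Note the asymmetry: a corrupted cache holding an inflated value would over-count and could refuse calls early, but it cannot under-count, which is the failure mode the guarantee is designed to rule out.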

What is not yet implemented¶

Be aware of the following limitations:

Auto-throttle — when a rate limit is hit, Reyn refuses the call rather than sleeping until the window opens. The caller must retry.
Cross-process / multi-tenant budgets — each reyn chat or reyn web process maintains its own in-memory counters and only picks up another live process's ledger records on its next startup (hydrate). Concurrently running processes therefore enforce every cap independently in real time; the shared ledger reconciles daily / monthly / per-agent totals only when a process restarts.