Skill Self-Improvement¶

Reyn skills can improve themselves from execution traces — automatically, with full version archiving and one-command rollback. The entire process runs as a governed skill (skill_improver) rather than as a side effect outside the OS, which means every improvement passes through the permission model, is subject to user-approval gating, and is linked to the execution history via skill_version_hash. Five components landed together as FP-0006 on 2026-05-15.

Unlike Hermes GEPA — which triggers self-improvement as an unrestricted side effect after 5+ tool calls — Reyn's design treats skill improvement as a first-class, operator-governed operation. See Comparison with Hermes GEPA below.

How it fits together¶

skill_improver (stdlib skill)
    │
    ├─ optional: collect_traces phase ──► recall(sources=["events"]) → traces_summary.md
    │       (FP-0006 C — requires FP-0009 events index)
    │
    ├─ run_and_eval / plan_improvements / apply_improvements
    │
    └─ finalize
           ├─ snapshot pre-apply skill.md → .reyn/skill-versions/<name>/v<N>.md  (FP-0006 B)
           ├─ ask_user gate (config: on_propose)                                  (FP-0006 D)
           └─ apply
              → run_skill_started events carry skill_version_hash                 (FP-0006 A)

Audit + recovery:
    reyn skill versions <name>   list saved versions      (FP-0006 E)
    reyn skill rollback <name>   restore previous version (FP-0006 E)
    → emits skill_rolled_back P6 event                    (FP-0006 E + follow-up)

The collect_traces phase is optional — it depends on Operational Intelligence (FP-0009) having indexed the events log. When the index is absent, skill_improver falls back to running run_and_eval directly without a trace-driven context.

Components at a glance¶

Component	What it adds	Source
A	`skill_version_hash` field on every `run_skill_started` event	`src/reyn/op_runtime/run_skill.py`
B	`.reyn/skill-versions/<name>/v<N>.md` snapshot + `current` pointer	`skill_improver/version_snapshot.py` + `phases/finalize.md`
C	`collect_traces` phase (recall path + raw-events fallback)	`skill_improver/trace_collector.py` + `phases/collect_traces.md`
D	`on_propose: ask_user\\|auto\\|disabled` config + finalize gate	`src/reyn/config.py` `SelfImprovementConfig` + `phases/finalize.md`
E	`reyn skill versions / rollback` CLI	`src/reyn/cli/commands/skill.py`

Workflow walk-through¶

A typical self-improvement run for a project skill called my_skill:

1. Invoke skill_improver

reyn run skill_improver '{"target": "my_skill", "improvement_source": "traces"}'

2. Collect traces

skill_improver calls recall(sources=["events"], query="my_skill failure patterns") to retrieve a structured summary of recent runs — phase paths, error types, cost, pass rates grouped by skill_version_hash. The result lands in the workspace as traces_summary.md.

3. Plan and apply improvements

plan_improvements drafts concrete changes to my_skill/skill.md (instructions, phase graph, or eval criteria). apply_improvements writes the revised file via a write_file Control IR op — gated by the permission model like any other write.

4. Run eval

run_and_eval runs my_skill against its eval set and computes a pass-rate score. If the score is below the acceptance threshold configured in skill_improver's eval criteria, apply_improvements retries up to the configured iteration limit.

5. Finalize — version snapshot + user gate

On approval threshold reached, finalize:

Reads the current my_skill/skill.md and writes it to .reyn/skill-versions/my_skill/v2.md.
Updates the current pointer file to "3" (the new version number after apply).
If on_propose: ask_user (default), issues an ask_user intervention:

Apply v3 to my_skill? (eval score: 0.85 → 0.92)
[Apply] [Discard]

On approval, writes the improved skill.md back to reyn/project/my_skill/.

6. Version hash on next run

The next invocation of my_skill emits a run_skill_started event with skill_version_hash set to the sha256 of the new skill.md. reyn eval compare can now group runs by hash to detect regressions automatically.

7. Rollback if needed

reyn skill versions my_skill
#   v1  2026-05-01  (initial save)
#   v2  2026-05-05  improvement: instruction improvement in plan_improvements phase
#   v3  2026-05-09  improvement: failure pattern handling via collect_traces  ← current

reyn skill rollback my_skill --to v2

Rollback writes the archived v2.md back to reyn/project/my_skill/skill.md via a write_file op (permission-checked), then emits a skill_rolled_back P6 event:

{"skill": "my_skill", "from_version": 3, "to_version": 2, "reason": "user rollback"}

Configuration (`reyn.yaml`)¶

self_improvement:
  on_propose: ask_user   # ask_user | auto | disabled (default: ask_user)
  max_versions: 10       # cap on saved versions per skill (default: 10)

Mode	Behaviour
`ask_user`	Default. `finalize` pauses and shows the improvement diff + eval delta. The user approves or discards before any change lands.
`auto`	`finalize` applies without prompting. Intended for CI pipelines or scheduled batch runs where operator trust is established.
`disabled`	`skill_improver` runs through all phases and emits the proposed diff as an artifact, but never writes back to the skill. Dry-run mode.

When max_versions is reached, finalize deletes the oldest saved version (v1) before writing the new snapshot.

Permission model integration¶

The permission model handles meta-improvement and stdlib protection without any special-case logic:

Meta-improvement is auto-禁止 by default. src/reyn/stdlib/ is outside the default write zone. Attempting to improve skill_improver itself — or any other stdlib skill — results in a PermissionError at the write_file op dispatch stage, with no special check required in the OS layer (P7 compliant).

Stdlib rollback is refused by the CLI itself. reyn skill rollback only operates on reyn/project/ and reyn/local/ skills. Stdlib skills (src/reyn/stdlib/skills/) are ship-bundled and immutable. Users who want to customise a stdlib skill should copy it to reyn/project/<name>/ first — the skill resolution order (reyn/project/ > reyn/local/ > src/reyn/stdlib/skills/) ensures the project copy takes precedence.

on_propose: auto requires operator trust. The default ask_user mode is appropriate for interactive use. Switch to auto only in environments where the operator has reviewed the improvement pipeline and accepts autonomous writes — for example, a nightly CI job that runs skill_improver after evaluating a week of traces.

Comparison with Hermes GEPA¶

Hermes' GEPA mechanism triggers improvement as an unrestricted side effect outside the agent runtime. Reyn's approach treats improvement as a governed skill execution.

	Hermes GEPA	Reyn `skill_improver`
Execution model	Side effect outside the OS	Stdlib skill — governed by OS runtime
Trigger	Automatic after 5+ tool calls	User-invoked or cron (FP-0001)
Permission check	None	`write_file` op → Permission model
User approval	Not possible	`on_propose: ask_user\\|auto\\|disabled`
Change record	None	`skill_improved` event in P6 audit log
Recovery	Difficult (no change record)	`reyn skill rollback` + P6 event trace
Reproducibility	Not guaranteed	Every run linked to version via `skill_version_hash`
Meta-improvement	Unrestricted	Prohibited by default via Permission model

For the full Hermes GEPA analysis see docs/deep-dives/research/competitive/hermes-agent.md.