skill_improver: review criteria
Use this list when you (the skill_improver skill) audit an existing skill. Each criterion maps to one or more of the P1–P8 principles. Findings should be specific, file-anchored, and actionable.
Phase markdown
For every phases/<name>.md:
- [ ] Frontmatter declares only `input` (and optionally `preprocessor`, `role`, `can_finish`). No output schema. [P1]
- [ ] No enumeration of next-phase artifact fields in the body. [P1, P8]
- [ ] No description of Control IR format (e.g. "emit `{kind: file, op: write, ...}`"). [P8]
- [ ] No reference to a sibling phase's name. Transitions live in the skill graph. [P1]
- [ ] Body covers WHAT to do, WHEN to pick which candidate, and domain rules — nothing else.
- [ ] If the phase has `can_finish: true`, instructions clearly state when to finish vs. continue.
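
The frontmatter criterion above is mechanical enough to sketch as a script. The following is a minimal illustration, not the tool's own linter: it assumes frontmatter is a leading block delimited by `---` lines with unindented `key: value` entries, and the function names `frontmatter_keys` and `audit_frontmatter` are hypothetical.

```python
# Hypothetical helper: flags frontmatter keys outside the allowed set.
# Assumes a leading '---' delimited block with unindented 'key: value' lines.
ALLOWED_KEYS = {"input", "preprocessor", "role", "can_finish"}


def frontmatter_keys(markdown_text: str) -> list[str]:
    """Return the top-level keys found in the leading frontmatter block."""
    lines = markdown_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return []
    keys = []
    for line in lines[1:]:
        if line.strip() == "---":
            break
        # Skip blank lines, indented (nested) lines, comments, and list items.
        is_top_level = bool(line) and not line[0].isspace() and line[0] not in "#-"
        if is_top_level and ":" in line:
            keys.append(line.split(":", 1)[0].strip())
    return keys


def audit_frontmatter(markdown_text: str) -> list[str]:
    """Return one issue string per violation of the frontmatter criterion."""
    keys = frontmatter_keys(markdown_text)
    issues = [
        f"unexpected frontmatter key '{k}' [P1]"
        for k in keys
        if k not in ALLOWED_KEYS
    ]
    if "input" not in keys:
        issues.append("frontmatter does not declare 'input' [P1]")
    return issues


example = "---\ninput: draft\ncan_finish: true\noutput_schema: report\n---\nBody.\n"
for issue in audit_frontmatter(example):
    print(issue)
```

The remaining phase criteria (no sibling phase names, no Control IR descriptions in the body) need a reading of the prose and are not covered by this sketch.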
Skill markdown
For skill.md:
- [ ] `entry`, `graph`, `final_output` all present. [P2]
- [ ] `final_output` is a single artifact type.
- [ ] Every phase referenced in `graph` has a corresponding `phases/<name>.md`.
- [ ] Every phase whose `can_finish: true` has a path to `end` in the graph.
- [ ] No unreachable phases.
- [ ] No self-loops (graph entries listing themselves).
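
The three graph criteria (reachability, a path to `end`, no self-loops) can be illustrated with a short traversal. This is a sketch under assumptions: the graph is taken to be a mapping from phase name to a list of successor names, with the string `"end"` as the terminal marker. The real `graph` layout in skill.md may differ, and `audit_graph` is a hypothetical name.

```python
from collections import deque


def reachable(graph: dict[str, list[str]], start: str) -> set[str]:
    """Breadth-first search: every node reachable from start, including start."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in graph.get(current, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def audit_graph(entry: str, graph: dict[str, list[str]], can_finish: set[str]) -> list[str]:
    """Return issue strings for self-loops, unreachable phases, and dead ends.

    Assumed shape: graph maps phase -> successor phases; "end" is terminal.
    """
    issues = []
    for phase, successors in graph.items():
        if phase in successors:
            issues.append(f"self-loop: {phase}")
    from_entry = reachable(graph, entry)
    for phase in graph:
        if phase not in from_entry:
            issues.append(f"unreachable: {phase}")
    for phase in sorted(can_finish):
        if "end" not in reachable(graph, phase):
            issues.append(f"no path to end: {phase}")
    return issues


graph = {"draft": ["review"], "review": ["draft", "end"], "orphan": ["orphan"]}
print(audit_graph("draft", graph, {"review"}))
```

In this example the `draft`/`review` cycle is accepted because it passes through two phases; only a phase that lists itself counts as a self-loop.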
Artifact schemas
For each artifacts/<name>.yaml:
- [ ] One artifact = one purpose. No kitchen-sink shapes.
- [ ] `required` is minimal; optional fields are marked optional.
- [ ] Field names are lowercase_snake_case.
- [ ] No "quality_notes" / "revision_reason" / other meta-feedback fields baked in. [P7]
- [ ] No decision fields with skill-specific values (`revise`, `redo`). [P7]
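
Two of the schema criteria, the naming rule and the ban on meta-feedback fields, can be checked from field names alone. The sketch below assumes the field names have already been extracted from the YAML. The set `META_FIELDS` holds only the two examples named in the list, and `audit_field_names` is a hypothetical helper.

```python
import re

# lowercase_snake_case: starts with a letter, words separated by single underscores.
SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(_[a-z0-9]+)*")

# Only the examples named in the criteria; other meta-feedback fields need human judgment.
META_FIELDS = {"quality_notes", "revision_reason"}


def audit_field_names(names: list[str]) -> list[str]:
    """Return issue strings for naming and meta-feedback violations."""
    issues = []
    for name in names:
        if not SNAKE_CASE.fullmatch(name):
            issues.append(f"'{name}' is not lowercase_snake_case")
        if name in META_FIELDS:
            issues.append(f"'{name}' is a meta-feedback field [P7]")
    return issues


print(audit_field_names(["title", "revisionReason", "quality_notes"]))
```

Whether an artifact serves a single purpose, or whether a field encodes a skill-specific decision, still requires reading the schema in context.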
Preprocessor
For each preprocessor step:
- [ ] The step kind is one of `run_skill`, `iterate`, `validate`, `lint_plan`, `python`.
- [ ] `into` doesn't collide with an existing input artifact key.
- [ ] If `python`: matching `permissions.python` entry, `.py` file exists, function defined, `output_schema` declared.
- [ ] If `python` `mode: safe`: no banned constructs in the AST (`open`, `eval`, `subprocess`, etc.).
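
To make the AST criterion concrete, here is a minimal scan using Python's standard `ast` module. It is an illustration rather than the linter's actual implementation: `BANNED_NAMES` contains only the three names listed above (the full list is elided by "etc."), and `banned_constructs` is a hypothetical function name.

```python
import ast

# Only the names mentioned in the checklist; the real banned list is longer.
BANNED_NAMES = {"open", "eval", "subprocess"}


def banned_constructs(source: str) -> list[tuple[int, str]]:
    """Return sorted (line, name) pairs for banned names and imports in source."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name) and node.id in BANNED_NAMES:
            # Bare references such as open(...) or eval(...).
            hits.append((node.lineno, node.id))
        elif isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] in BANNED_NAMES:
                    hits.append((node.lineno, alias.name))
        elif isinstance(node, ast.ImportFrom):
            if node.module and node.module.split(".")[0] in BANNED_NAMES:
                hits.append((node.lineno, node.module))
    return sorted(hits)


source = (
    "import subprocess\n"
    "\n"
    "def run(path):\n"
    "    data = open(path).read()\n"
    "    return eval(data)\n"
)
print(banned_constructs(source))
```

A name-based scan like this can be evaded (for example through `getattr` or `__import__`), so it is best read as a first-pass review aid, not a sandbox.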
Eval-friendliness
- [ ] At least one path produces an output that can be judged against rubric criteria.
- [ ] Phase boundaries align with what a judge would assess (one phase = one judgeable output).
- [ ] No criteria that would be vacuously satisfied by any reasonable LLM output.
Output of a review
For each finding:
[severity] <file>:<line or section>
Issue: <what's wrong>
Why: <which principle / pitfall this hits>
Fix: <concrete change>
Severity:
- `error`: violates P1–P8 or breaks linting; must fix.
- `warning`: likely cause of bugs but not a hard violation.
- `info`: stylistic or polish-level.
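
If findings are collected programmatically, the four-line format and the severity levels could be modeled as a small record type. This is one possible sketch: the `Finding` class, its `render` method, and `has_blocking_errors` are hypothetical names, and the sample finding is invented for illustration.

```python
from dataclasses import dataclass

SEVERITIES = ("error", "warning", "info")


@dataclass
class Finding:
    severity: str
    location: str  # "<file>:<line or section>"
    issue: str
    why: str
    fix: str

    def __post_init__(self) -> None:
        if self.severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {self.severity!r}")

    def render(self) -> str:
        """Format the finding in the four-line layout used by the review."""
        return "\n".join(
            [
                f"[{self.severity}] {self.location}",
                f"Issue: {self.issue}",
                f"Why: {self.why}",
                f"Fix: {self.fix}",
            ]
        )


def has_blocking_errors(findings: list[Finding]) -> bool:
    """True if any finding has severity 'error' (must fix or explicitly waive)."""
    return any(f.severity == "error" for f in findings)


finding = Finding(
    severity="error",
    location="phases/draft.md:frontmatter",
    issue="Frontmatter declares an output schema.",
    why="P1",
    fix="Remove the output schema key from the frontmatter.",
)
print(finding.render())
```

Rejecting unknown severities at construction time keeps the three levels closed, so a report cannot contain an ad hoc level such as "critical".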
After auditing
- [ ] Run `reyn lint <skill>` and include any new findings in the report.
- [ ] If proposing changes, write them as diffs the user can review (don't silently rewrite).
- [ ] Don't claim "ready" until errors are resolved or explicitly waived.