Builder loop
Clarify the aim, map the terrain, choose a solution level, define evidence, execute, review, and preserve what survives.
- Best for a real codebase improvement
- Produces a complete artifact set
- Teaches when to stop, dissent, or salvage
A practical curriculum for builders using LLMs inside real systems: state the intent, select context, compare solution levels, verify the work, and preserve what the next session needs.
The tutorial starts from the failure everyone recognizes: the model produces a decent-looking patch, the chat summary sounds confident, and nobody can tell whether the work served the aim.
The fix is not more ceremony. It is a compact loop in which every artifact records a decision, an assumption, an evidence check, and who consumes it next.
Start with the full builder loop when you have a project slice. Use the focused tracks when the active bottleneck is reusable interfaces or eval design.
- Builder loop: Clarify the aim, map the terrain, choose a solution level, define evidence, execute, review, and preserve what survives.
- Reusable interfaces: Turn selected context into a checkable prompt, then decide what stays dynamic and what deserves a skill or subagent boundary.
- Eval design: Turn “the agent passed” into a purpose, signal, fixture set, grader, threshold, action policy, and failure-reading loop.
The tutorial uses the Open Horizons loop at different altitudes. Review, dissent, and salvage are available at any point; they are checks, not final ceremony.
Name the behavior change, fit the model to the task, select context, and assemble a prompt with reviewer checks.
Map constraints and terrain, choose a problem statement, then compare band-aid, local optimum, reframe, and redesign options.
Define checks before implementation, delegate one bounded slice, and review against the aim rather than the summary.
Promote repeated procedures to skills, bounded roles to subagents, and hard-won lessons to durable knowledge.
Do not use a blank repo. Judgment needs terrain: tests, multiple subsystems, a known annoyance, and enough history that technical debt is not hypothetical.
Write one sentence that names the outcome, not the activity. Then jot down a short burst of likely causes, files, checks, and failure modes, and pause before going further.
Name the work the model is suited to do, the context it needs, what it must not infer, and how a reviewer can check it.
Select context with provenance: project shape, relevant files, constraints, prior attempts, landmines, checks, and stop triggers.
Combine stable instructions, dynamic context, success criteria, examples where needed, output contract, and missing-evidence behavior.
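One way to keep the assembled prompt checkable is to treat each part as a named field and join them in a fixed order. The sketch below is illustrative only: `ContextItem`, `PromptSpec`, `assemble_prompt`, and the section titles are hypothetical names, not part of the tutorial or of any library.

```python
from dataclasses import dataclass, field


@dataclass
class ContextItem:
    """One piece of selected context plus its provenance."""
    kind: str      # e.g. "relevant file", "constraint", "landmine"
    content: str
    source: str    # hypothetical provenance label: a path, a commit, a person


@dataclass
class PromptSpec:
    stable_instructions: str
    success_criteria: list[str]
    output_contract: str
    missing_evidence_behavior: str
    context: list[ContextItem] = field(default_factory=list)
    examples: list[str] = field(default_factory=list)  # only where needed


def assemble_prompt(spec: PromptSpec) -> str:
    """Join the parts in a fixed order so a reviewer can check each one."""
    sections = [("Instructions", spec.stable_instructions)]
    if spec.context:
        lines = [f"- [{c.kind}] {c.content} (source: {c.source})" for c in spec.context]
        sections.append(("Context", "\n".join(lines)))
    sections.append(("Success criteria", "\n".join(f"- {s}" for s in spec.success_criteria)))
    if spec.examples:
        sections.append(("Examples", "\n\n".join(spec.examples)))
    sections.append(("Output contract", spec.output_contract))
    sections.append(("If evidence is missing", spec.missing_evidence_behavior))
    return "\n\n".join(f"## {title}\n{body}" for title, body in sections)
```

Keeping provenance on every context item means a reviewer can ask where a claim came from without rereading the chat. The examples section is omitted when empty, matching "examples where needed".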
Turn the intent note into an aim statement with current state, desired state, mechanism, assumptions, feedback signal, and guardrails.
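The six parts of the aim statement can be held in a small record so that gaps show up before any work starts. This is a minimal sketch under assumptions: the class name `AimStatement` and the `missing_fields` helper are hypothetical, and the field names simply mirror the list above.

```python
from dataclasses import dataclass, fields


@dataclass
class AimStatement:
    current_state: str
    desired_state: str
    mechanism: str
    assumptions: list[str]
    feedback_signal: str
    guardrails: list[str]

    def missing_fields(self) -> list[str]:
        """Return the names of fields left empty, in declaration order."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


aim = AimStatement(
    current_state="Flaky retries hide real failures",
    desired_state="Failures surface within one run",
    mechanism="",            # not yet decided
    assumptions=[],          # none written down yet
    feedback_signal="CI failure rate",
    guardrails=["No public API changes"],
)
print(aim.missing_fields())
```

An empty mechanism or assumptions list is a useful stop signal: the aim is not ready to hand to a model yet.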
Map systems, actors, repeated symptoms, constraints, assumptions to test, existing evidence, prior attempts, and blast radius.
Compare symptom, systems, and maintainer-outcome framings. Select one, reject the others, and name what evidence would invalidate it.
Generate breadth before judgment. Score band-aid, local optimum, reframe, and redesign options against impact, cost, testability, reversibility, and maintenance burden.
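Scoring the options can be as simple as a table with one row per solution level. The sketch below assumes a 1-5 scale where higher is always better, so `cost` and `maintenance_burden` are scored as "how cheap" and "how light"; the scale, the weights, and the names `Option`, `total`, and `rank` are illustrative assumptions rather than anything the tutorial prescribes.

```python
from dataclasses import dataclass
from typing import Optional

# Criteria named in the text. Higher is better for every one (an assumption).
CRITERIA = ("impact", "cost", "testability", "reversibility", "maintenance_burden")


@dataclass
class Option:
    level: str               # "band-aid", "local optimum", "reframe", or "redesign"
    summary: str
    scores: dict[str, int]   # hypothetical 1-5 score per criterion


def total(option: Option, weights: Optional[dict[str, float]] = None) -> float:
    """Weighted sum over the criteria; unweighted criteria count once."""
    weights = weights or {}
    return sum(option.scores[c] * weights.get(c, 1.0) for c in CRITERIA)


def rank(options: list[Option], weights: Optional[dict[str, float]] = None) -> list[Option]:
    """Order options from highest to lowest total score."""
    return sorted(options, key=lambda o: total(o, weights), reverse=True)
```

The ranking is an aid to judgment, not a replacement for it. Changing the weights and watching the order flip is a quick way to see which criterion is actually driving the choice.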
Define the old behavior that should now fail, the invariant that must hold, positive and negative cases, threshold, action policy, and residual risk.
Hand off a bounded slice, run the checks, review the diff against the aim, use dissent when confidence outruns scrutiny, and salvage when work drifts.
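Defining checks before implementation can be sketched as a list of positive and negative cases, a pass-rate threshold, and an action policy that turns the number into a decision. Everything named here (`Case`, `run_checks`, `decide`, the `accepts` stand-in, and the two-way policy) is a hypothetical illustration, not the tutorial's own harness.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Case:
    name: str
    value: str
    expect_pass: bool   # positive cases should be accepted, negative ones rejected


def run_checks(check: Callable[[str], bool], cases: list[Case]) -> float:
    """Return the fraction of cases where the check matched the expectation."""
    if not cases:
        raise ValueError("define at least one case before implementing")
    correct = sum(1 for c in cases if check(c.value) == c.expect_pass)
    return correct / len(cases)


def decide(pass_rate: float, threshold: float) -> str:
    """Hypothetical two-way action policy."""
    return "proceed to review" if pass_rate >= threshold else "stop and salvage"


def accepts(raw: str) -> bool:
    # Stand-in for the behavior under test: the new code rejects blank input.
    return bool(raw.strip())


cases = [
    Case("typical input", "deploy", True),
    Case("old behavior that should now fail", "   ", False),
]
rate = run_checks(accepts, cases)
print(rate, decide(rate, threshold=1.0))
```

Writing the negative case first makes the old behavior an explicit failure rather than something the summary has to be trusted on.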
Use this path when a one-off prompt starts becoming an interface the next session should reuse. The rule is simple: context stays current, prompts assemble the current request, skills preserve repeated procedures, and subagents preserve bounded roles.
Start by naming fitness for purpose, not by choosing a metric. Prefer harness-executed outcome checks and app or user signals before reaching for model graders.
A future agent or maintainer should recover the aim, constraints, chosen framing, evidence checks, role boundaries, and failure modes without rediscovering the whole problem.
The tutorial is designed to be used while working, not consumed as homework. Read the deep dive when that step becomes the active bottleneck.
npx skills add open-horizon-labs/skills -g -a claude-code -y