# Modal LoRA Plan for Wonton-Soup Tactic Provider
This plan defines a reproducible path to train a new Lean tactic provider with Modal.
## Goal
Train a model that outputs n next-tactic candidates for MCTS expansion in wonton-soup.
Runtime contract:
- Input: current Lean goal state
- Output: ranked `(tactic, score)` candidates
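The runtime contract above can be sketched as a provider interface. This is a minimal illustration, not the project's actual API: the `TacticProvider` protocol, the `suggest` method name, and the `ConstantProvider` stand-in are all hypothetical.

```python
from typing import List, Protocol, Tuple


class TacticProvider(Protocol):
    """Hypothetical interface implied by the runtime contract:
    a Lean goal state in, ranked (tactic, score) candidates out."""

    def suggest(self, goal_state: str, n: int) -> List[Tuple[str, float]]:
        """Return up to n candidates, sorted by descending score."""
        ...


class ConstantProvider:
    """Trivial stand-in used only to illustrate the contract shape."""

    def suggest(self, goal_state: str, n: int) -> List[Tuple[str, float]]:
        # A real provider would score model generations; here the list is fixed.
        candidates = [("simp", 0.9), ("ring", 0.5), ("omega", 0.3)]
        return candidates[:n]
```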
## Constraints
- Current `export-learning` rows include `goal_type` and tactic outcomes, but not full per-node hypothesis strings.
- V1 training therefore uses `goal_type` as the state representation.
- V2 should add richer state logging (full hypotheses) if V1 is promising.
## Data Pipeline
- Run `wonton.py export-learning` on one or more completed runs.
- Convert exports to SFT JSONL rows in miniCTX-style prompt shape:
  - prompt: `[CTX]` theorem header up to `sorry`, then `[STATE]` goal-only line: `⊢ <goal_type>`
  - target: `[TAC] <tactic> [/TAC]`
- Train split and eval split are created in the Modal job.
Default label policy: `committed_success` (only successful tactics that were committed in the tree).
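The conversion step can be sketched as a per-row function that applies the `committed_success` filter and emits the miniCTX-style prompt shape. The export field names (`success`, `committed`, `theorem_header`, `goal_type`, `tactic`) are assumptions about the `export-learning` schema, not confirmed names.

```python
from typing import Optional


def to_sft_row(export_row: dict) -> Optional[dict]:
    """Convert one export-learning row into a miniCTX-style SFT example.

    Field names on `export_row` are assumed, not taken from the real schema.
    Returns None for rows dropped by the committed_success label policy.
    """
    # Label policy: keep only successful tactics that were committed in the tree.
    if not (export_row.get("success") and export_row.get("committed")):
        return None
    prompt = (
        f"[CTX] {export_row['theorem_header']} "
        f"[STATE] ⊢ {export_row['goal_type']}"
    )
    return {"prompt": prompt, "completion": f"[TAC] {export_row['tactic']} [/TAC]"}
```

Each returned dict would then be serialized as one JSONL line for the training set.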
## Training Pipeline (Modal)
- Upload dataset JSONL to Modal Volume.
- Launch Unsloth LoRA training (default GPU: `L40S`, optional `A100-80GB`).
- Save adapter checkpoints and run manifest to output Volume.
- Resume from latest checkpoint when requested.
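The resume step needs a deterministic way to pick the newest checkpoint in the output Volume. A minimal sketch, assuming Hugging Face Trainer's default `checkpoint-<step>` directory layout (an assumption; the actual layout depends on the training script):

```python
import re
from pathlib import Path
from typing import Optional


def latest_checkpoint(output_dir: Path) -> Optional[Path]:
    """Return the highest-step `checkpoint-<step>` directory, or None.

    Directory naming follows HF Trainer defaults; adjust if the
    training script writes a different layout.
    """
    candidates = []
    for p in output_dir.glob("checkpoint-*"):
        m = re.fullmatch(r"checkpoint-(\d+)", p.name)
        if m and p.is_dir():
            candidates.append((int(m.group(1)), p))
    if not candidates:
        return None
    # max() compares by step number, the first tuple element.
    return max(candidates)[1]
```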
## Integration Path
- Add provider that loads base model + adapter and returns ranked tactic candidates.
- Add provider option to `orchestrator/lean.py` provider registry and CLI validation.
- Benchmark against `reprover` and `deepseek` on identical corpus/budget/seeds.
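A provider that samples multiple generations from the model needs to merge them into the ranked candidate list the contract requires. A sketch of that post-processing step; keeping the max score per deduplicated tactic is a design assumption (mean or sum pooling are equally plausible), and the model-sampling side is omitted:

```python
from typing import List, Tuple


def rank_candidates(
    samples: List[Tuple[str, float]], n: int
) -> List[Tuple[str, float]]:
    """Merge sampled (tactic, score) generations into a ranked,
    deduplicated candidate list of at most n entries."""
    best = {}  # tactic text -> best score seen
    for tactic, score in samples:
        tactic = tactic.strip()
        if tactic and (tactic not in best or score > best[tactic]):
            best[tactic] = score
    ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:n]
```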
## Acceptance Gates
- Primary: solve-rate delta on `research` corpus.
- Secondary:
- tactic validity rate
- per-call latency
- candidate diversity
## Security / Secrets
- Do not hardcode tokens in code, files, or logs.
- Use Modal Secrets (for example `hf`) and read via environment variables at runtime.
- If a token has been pasted in chat/history, rotate it.
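Reading the secret at runtime can be as simple as the following. The env var name `HF_TOKEN` is an assumption (it depends on how the Modal Secret named `hf` is defined); the point is that the token is never written into code, files, or logs.

```python
import os


def hf_token() -> str:
    """Read the Hugging Face token injected by a Modal Secret.

    HF_TOKEN is an assumed env var name; match it to the keys of the
    Secret attached to the Modal function. Fail loudly rather than
    falling back to a hardcoded value.
    """
    token = os.environ.get("HF_TOKEN")
    if not token:
        raise RuntimeError("HF_TOKEN not set; attach the Modal Secret to the function")
    return token
```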