Benchmark Card (Draft): `goal_to_tactic_proposal_v1`

Identity

Input unit: one row from index.jsonl.
Output unit: one attempt record per (item_id, sample_index).
Adapters: mock (deterministic local baseline), ollama (local inference endpoint), openai (OpenAI-compatible chat completions endpoint).
Verification modes: none, synthetic, repo_replay.

Row loading validates required fields and fails on malformed rows.
Selection is deterministic from split_policy, goal_slice, seed, and optional max_items.
Multi-sample runs use deterministic per-attempt seeds from (seed, item_id, sample_index).
Verification only runs when generation had no error and tactic is non-empty and does not contain sorry/admit.
repo_replay can load per-repo profile config (schema_version: 1) to override replay policy by repo_remote and/or repo_lean_version.

Proposal metrics (summary.json):
- valid_rate, nonempty_rate, contains_sorry_rate
- latency (latency_ms_mean, latency_ms_p50, latency_ms_p95)
- generation error taxonomy (generation_error_kinds, generation_error_domains)
Verification metrics (summary.json):
- attempted/success/error rates and latency
- verification_pass_at_k_* using item-level success aggregation
- verification error taxonomy (verification_error_kinds, verification_error_domains)
Output files:
- attempts.jsonl
- summary.json

selected_rows_hash pins the selected item set.
Optional deterministic sharding (--shard-count, --shard-index) partitions selected items by stable item_id hashing.
Replay profile provenance is explicit in artifacts (profile_config_path, profile_match_counts, and per-attempt repo_replay_profile_id).
Pass@k values are bounded to samples_per_item.
Frozen split generation writes split_manifest.json and contamination_report.json with source index SHA-256 and split config.

Tactic extraction keeps only the first tactic line from model output.
synthetic verification checks reconstructed scripts from goal_text; it is not equivalent to repo-context replay.
repo_replay verifies patched-file Lean execution at repo@commit; it is first-pass and not full-project CI orchestration.