5.4 Pro Review Packet

This packet is the minimal handoff for reviewing the current state of the equational-theories-distillation addendum.

Objective

Build the strongest possible plain-text cheatsheet for the SAIR Equational Theories Stage 1 competition.

The operating assumption is:

  • inference-time model gets no tools
  • submission is a plain-text cheatsheet under 10 KB
  • the public benchmark is a teacher surface, not the final target
  • the real goal is a compact decision procedure that generalizes beyond the public split

Ground Truth Surface

The addendum pins four source assets:

  • public normal.jsonl
  • public hard.jsonl
  • equations.txt
  • graph.json

The public benchmark maps exactly into the fixed 4694-law implication graph:

  • public label agreement with graph.json: 1200/1200

So the graph is the offline teacher surface.

Current System

Current pipeline:

  1. map every public problem into the 4694-law universe
  2. mine theorem-backed always-true source-law families from the full graph
  3. evaluate each pair against a fixed 10-table 2-element basis
  4. mine exact source-row semantics over the same 2-element theory classes
  5. mine false implications by explicit finite countermodels
  6. mine nontrivial short kernel bridges for the true-side residual
  7. extend that into a tiny 2-kernel micro-rewrite layer
  8. distill the surviving rules into a compact cheatsheet draft

False-side search stack:

  • fixed 10-table 2-element pair evaluator
  • greedy 6-operation 2-element witness battery
  • exhaustive 3-element brute force
  • residual SAT search at size 4
  • residual SAT search at size 5

True-side stack:

  • collapse family detection
  • mixed-self-reference-with-singleton family detection
  • direct substitution-instance matcher
  • one-hole contextual-instance matcher
  • nontrivial short-kernel bridge search
  • exact source-row semantics under the 2-element theory classes

Strongest Results

True side:

  • collapse family: source law x = t with x absent from t
    • full graph: 816/816 imply every target law
    • public true coverage: 243/574
  • mixed-self-reference-with-singleton family:
    • source law has form x = t
    • every occurrence of x in t lies on a mixed path whose root-to-leaf trace uses both L and R
    • some other variable appears exactly once in t
    • the x-path set does not contain both exact LR and RL
    • full graph: 680/680 imply every target law
    • public true coverage: 188/574
  • one-hole context closure: 6/574
  • combined theorem-backed true coverage: 433/574 (75.44%)
  • remaining theorem-uncovered public true cases: 141

Source-row semantics:

  • there are 10 distinct 2-element theory classes induced by the 16 binary operations
  • for 2407/4694 source laws, the full implication row is exactly the intersection of the satisfying 2-element theories
  • on the public set this is exact for 669/1200 problems
  • public exact split: 558 true, 111 false
  • among the 141 theorem-uncovered public true cases, exact source-row semantics explains 126
  • dominant exact public source fingerprints:
    • constant: 92 problems, 48 true
    • right_projection: 39 problems, 30 true
    • left_projection: 36 problems, 29 true
    • not_right + right_projection: 32 problems, 13 true
    • left_projection + not_left: 18 problems, 6 true

Pair evaluator:

  • a fixed 10-representative 2-element basis separates 467/626 public false problems
  • this is 13 better than the old greedy six-table battery
  • the current cheatsheet draft now uses this full basis instead of source-bucket lookup as the main offline evaluator

Collision analysis:

  • there are 182 raw source-fingerprint buckets
  • only 51 are single-row buckets
  • 131 buckets are internally divergent
  • only 51 laws live in single-row buckets; the other 4643 laws live in divergent buckets
  • worst collision buckets:
    • empty fingerprint: 1558 laws, 1125 row signatures
    • constant: 671 laws, 579 row signatures
    • left_projection: 220 laws, 199 row signatures
    • right_projection: 220 laws, 195 row signatures
    • xor_xnor: 182 laws, 179 row signatures

False side:

  • 10-table pair evaluator: 467/626
  • greedy 2-element battery: 454/626
  • exhaustive 3-element search: +84
  • size-4 SAT at 1000 ms: +51
  • size-5 SAT at 1000 ms: +16
  • total explained false cases: 605/626 (96.65%)
  • remaining unexplained public false cases: 21
  • increasing the unresolved size-5 cases from 1000 ms to 3000 ms added 0 new witnesses
  • residual size-4 pair statuses: 20 sat / 1 unknown / 16 unsat
  • residual size-5 pair statuses: 7 sat / 3 unsat / 7 unknown
  • residual search is now deduped by exact source-target pair
    • the final 21 false cases are only 10 unique pairs over 9 unique sources

Kernel proof layer:

  • exclude the vacuous case where the kernel is just the target law itself
  • restrict kernels to operation count <= 4
  • nontrivial kernel bridges explain 13/15 current residual true problems
  • that covers 10/12 residual true unique pairs
  • teacher-side candidate surface becomes 1177/1200
  • a 2-kernel micro-rewrite layer closes both remaining true pairs:
    • 711 -> 3567
    • 2170 -> 4640
  • teacher-side candidate surface after the micro-rewrite layer: 1179/1200
  • remaining candidate tail after the proof layers: 21 false and 0 true

Combined public decision surface:

  • theorem-backed TRUE + exact source-row semantics + explicit false witnesses = 1164/1200
  • unresolved exact-decision tail: 15 true and 21 false
  • unresolved true fingerprints:
    • xor_xnor: 7
    • right_projection: 3
    • constant: 3
    • not_left_and_right_left_implies_right + right_projection: 1
    • left_projection + not_left: 1
  • unresolved false fingerprints:
    • empty fingerprint: 9
    • xor_xnor: 5
    • right_projection: 4
    • left_projection: 3

False-tail followup:

  • size-5 residual statuses are 7 sat / 3 unsat / 7 unknown
  • canonicalizing the 7 size-5 unknown pairs to the smallest mutually implied source and target laws changes 1 of them from unknown to unsat
  • canonical followup statuses: 6 unknown / 1 unsat

Current Direction

The addendum is now centered on a compact pair evaluator:

  • theorem-backed all-target TRUE families first
  • then the fixed 10-table 2-element separator basis
  • then direct substitution/context checks
  • then a short-kernel proof layer with a tiny 2-kernel micro-rewrite extension
  • false-side search quality matters more than bigger timeout because the size-5 tail now has explicit unknown cases
  • source-row semantics remains the teacher-side lens, but raw fingerprint buckets are no longer treated as the final offline abstraction

Review Questions

Please focus on these:

  1. Is the new pair-evaluator framing actually the right Stage 1 abstraction, or is there a stronger compact evaluator to distill?
  2. Is the current short-kernel + 2-kernel micro-rewrite layer the right shallow proof-search mechanism, or should it be replaced by something cleaner?
  3. What is the best next false-side move for the residual 21 cases given the 7 unknown size-5 statuses: better SAT encoding, symmetry breaking, a different solver, or a different small semantic basis?
  4. Is canonicalization before model search enough for the current unknown tail, or should the solver surface itself change next?
  5. Is the current cheatsheet draft actually aligned with the competition objective as an offline decision procedure, or is there a better distillation target?

Key Files

Start here:

  • README.md
  • docs/public-benchmark-notes.md
  • docs/cheatsheet-v0.txt
  • equational_theories_distillation/source_row_semantics.py
  • equational_theories_distillation/source_patterns.py
  • equational_theories_distillation/analysis.py
  • equational_theories_distillation/cheatsheet.py
  • tmp/equational-theories-distillation/analysis/public-analysis.json

Repro

cd addenda/equational-theories-distillation
uv sync
uv run ruff check .
uv run ty check
uv run python -m pytest
uv run python -m equational_theories_distillation analyze \
  --size4-sat-timeout-ms 1000 \
  --size5-sat-timeout-ms 1000
uv run python -m equational_theories_distillation draft-cheatsheet