5.4 Pro Review Packet
This packet is the minimal handoff for reviewing the current state of the equational-theories-distillation addendum.
Objective
Build the strongest possible plain-text cheatsheet for the SAIR Equational Theories Stage 1 competition.
The operating assumption is:
- inference-time model gets no tools
- submission is a plain-text cheatsheet under
10 KB - the public benchmark is a teacher surface, not the final target
- the real goal is a compact decision procedure that generalizes beyond the public split
Ground Truth Surface
The addendum pins four source assets:
- public
normal.jsonl - public
hard.jsonl equations.txtgraph.json
The public benchmark maps exactly into the fixed 4694-law implication graph:
- public label agreement with
graph.json:1200/1200
So the graph is the offline teacher surface.
Current System
Current pipeline:
- map every public problem into the
4694-law universe - mine theorem-backed always-true source-law families from the full graph
- evaluate each pair against a fixed
10-table2-element basis - mine exact source-row semantics over the same
2-element theory classes - mine false implications by explicit finite countermodels
- mine nontrivial short kernel bridges for the true-side residual
- extend that into a tiny
2-kernel micro-rewrite layer - distill the surviving rules into a compact cheatsheet draft
False-side search stack:
- fixed
10-table2-element pair evaluator - greedy
6-operation2-element witness battery - exhaustive
3-element brute force - residual SAT search at size
4 - residual SAT search at size
5
True-side stack:
- collapse family detection
- mixed-self-reference-with-singleton family detection
- direct substitution-instance matcher
- one-hole contextual-instance matcher
- nontrivial short-kernel bridge search
- exact source-row semantics under the
2-element theory classes
Strongest Results
True side:
- collapse family: source law
x = twithxabsent fromt- full graph:
816/816imply every target law - public true coverage:
243/574
- full graph:
- mixed-self-reference-with-singleton family:
- source law has form
x = t - every occurrence of
xintlies on a mixed path whose root-to-leaf trace uses bothLandR - some other variable appears exactly once in
t - the x-path set does not contain both exact
LRandRL - full graph:
680/680imply every target law - public true coverage:
188/574
- source law has form
- one-hole context closure:
6/574 - combined theorem-backed true coverage:
433/574(75.44%) - remaining theorem-uncovered public true cases:
141
Source-row semantics:
- there are
10distinct2-element theory classes induced by the16binary operations - for
2407/4694source laws, the full implication row is exactly the intersection of the satisfying2-element theories - on the public set this is exact for
669/1200problems - public exact split:
558true,111false - among the
141theorem-uncovered public true cases, exact source-row semantics explains126 - dominant exact public source fingerprints:
constant:92problems,48trueright_projection:39problems,30trueleft_projection:36problems,29truenot_right + right_projection:32problems,13trueleft_projection + not_left:18problems,6true
Pair evaluator:
- a fixed
10-representative2-element basis separates467/626public false problems - this is
13better than the old greedy six-table battery - the current cheatsheet draft now uses this full basis instead of source-bucket lookup as the main offline evaluator
Collision analysis:
- there are
182raw source-fingerprint buckets - only
51are single-row buckets 131buckets are internally divergent- only
51laws live in single-row buckets; the other4643laws live in divergent buckets - worst collision buckets:
- empty fingerprint:
1558laws,1125row signatures constant:671laws,579row signaturesleft_projection:220laws,199row signaturesright_projection:220laws,195row signaturesxor_xnor:182laws,179row signatures
- empty fingerprint:
False side:
10-table pair evaluator:467/626- greedy
2-element battery:454/626 - exhaustive
3-element search:+84 - size-
4SAT at1000 ms:+51 - size-
5SAT at1000 ms:+16 - total explained false cases:
605/626(96.65%) - remaining unexplained public false cases:
21 - increasing the unresolved size-
5cases from1000 msto3000 msadded0new witnesses - residual size-
4pair statuses:20 sat / 1 unknown / 16 unsat - residual size-
5pair statuses:7 sat / 3 unsat / 7 unknown - residual search is now deduped by exact source-target pair
- the final
21false cases are only10unique pairs over9unique sources
- the final
Kernel proof layer:
- exclude the vacuous case where the kernel is just the target law itself
- restrict kernels to operation count
<= 4 - nontrivial kernel bridges explain
13/15current residual true problems - that covers
10/12residual true unique pairs - teacher-side candidate surface becomes
1177/1200 - a
2-kernel micro-rewrite layer closes both remaining true pairs:711 -> 35672170 -> 4640
- teacher-side candidate surface after the micro-rewrite layer:
1179/1200 - remaining candidate tail after the proof layers:
21false and0true
Combined public decision surface:
- theorem-backed TRUE + exact source-row semantics + explicit false witnesses =
1164/1200 - unresolved exact-decision tail:
15true and21false - unresolved true fingerprints:
xor_xnor:7right_projection:3constant:3not_left_and_right_left_implies_right + right_projection:1left_projection + not_left:1
- unresolved false fingerprints:
- empty fingerprint:
9 xor_xnor:5right_projection:4left_projection:3
- empty fingerprint:
False-tail followup:
- size-
5residual statuses are7 sat / 3 unsat / 7 unknown - canonicalizing the
7size-5unknown pairs to the smallest mutually implied source and target laws changes1of them fromunknowntounsat - canonical followup statuses:
6 unknown / 1 unsat
Current Direction
The addendum is now centered on a compact pair evaluator:
- theorem-backed all-target TRUE families first
- then the fixed
10-table2-element separator basis - then direct substitution/context checks
- then a short-kernel proof layer with a tiny
2-kernel micro-rewrite extension - false-side search quality matters more than bigger timeout because the size-
5tail now has explicitunknowncases - source-row semantics remains the teacher-side lens, but raw fingerprint buckets are no longer treated as the final offline abstraction
Review Questions
Please focus on these:
- Is the new pair-evaluator framing actually the right Stage 1 abstraction, or is there a stronger compact evaluator to distill?
- Is the current short-kernel +
2-kernel micro-rewrite layer the right shallow proof-search mechanism, or should it be replaced by something cleaner? - What is the best next false-side move for the residual
21cases given the7 unknownsize-5statuses: better SAT encoding, symmetry breaking, a different solver, or a different small semantic basis? - Is canonicalization before model search enough for the current unknown tail, or should the solver surface itself change next?
- Is the current cheatsheet draft actually aligned with the competition objective as an offline decision procedure, or is there a better distillation target?
Key Files
Start here:
README.mddocs/public-benchmark-notes.mddocs/cheatsheet-v0.txtequational_theories_distillation/source_row_semantics.pyequational_theories_distillation/source_patterns.pyequational_theories_distillation/analysis.pyequational_theories_distillation/cheatsheet.pytmp/equational-theories-distillation/analysis/public-analysis.json
Repro
cd addenda/equational-theories-distillation
uv sync
uv run ruff check .
uv run ty check
uv run python -m pytest
uv run python -m equational_theories_distillation analyze \
--size4-sat-timeout-ms 1000 \
--size5-sat-timeout-ms 1000
uv run python -m equational_theories_distillation draft-cheatsheet