GeodeBench v0 (MVP)
- S slice (quadratic): t_2 only; target alpha.
- G slice (multivariate): small t_2, t_3 nonzero; target alpha.
Tasks:
- Coefficient prediction: recover truncated coefficients from samples.
- Geode recovery: predict alpha from slice inputs.
- Invariance checks: evaluate symmetry/generalization splits.
Generate data (small demo):
python bench/generate_slices.py --degrees 3,5,8 --trials 10 --out docs/assets/geode_slices.csv
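A quick way to sanity-check the demo output, assuming only that the generator writes a flat CSV at the path above (column names are whatever the generator emits):

```python
import pandas as pd

# Inspect the demo slice data produced by bench/generate_slices.py
df = pd.read_csv("docs/assets/geode_slices.csv")
print(df.shape)                 # rows x columns
print(df.columns.tolist())      # actual column names from the generator
print(df.head())
```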
Large-scale generator with splits/shards:
python bench/generate_geodebench.py \
--degrees 3,5,8,12,16,20 \
--trials 1000 \
--slices S,FUSS3,FUSS4,BITRI,MIXED \
--out_prefix docs/assets/geodebench \
--shard_size 50000 \
--seed 0
# writes docs/assets/geodebench_part-0000.csv, ...
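For the sharded output, one way to load all parts into a single frame (a minimal sketch; adjust the glob pattern if you change --out_prefix):

```python
import glob
import pandas as pd

# Collect all shards written by bench/generate_geodebench.py and concatenate them
parts = sorted(glob.glob("docs/assets/geodebench_part-*.csv"))
df = pd.concat((pd.read_csv(p) for p in parts), ignore_index=True)
print(len(parts), "shards,", len(df), "rows")
```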
Splits:
- easy: S, FUSSd slices
- medium: BITRI
- hard: MIXED (and unseen higher degrees)
Invariance (scaffolding):
- Euler residual |(V - E + F) - 1|, averaged over small layering levels; lower is better (a minimal sketch follows).
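A minimal sketch of the residual above, assuming per-level vertex/edge/face counts are available as arrays (the argument names here are illustrative, not the benchmark's schema):

```python
import numpy as np

def euler_residual(V, E, F):
    """Mean of |(V - E + F) - 1| across layering levels (lower is better)."""
    V, E, F = np.asarray(V), np.asarray(E), np.asarray(F)
    return float(np.mean(np.abs((V - E + F) - 1)))

# Example with three small layering levels (each satisfies V - E + F = 1, so residual is 0)
print(euler_residual(V=[4, 9, 16], E=[6, 16, 28], F=[3, 8, 13]))
```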
Starter notebook: notebooks/GeodeBench_Starter.ipynb
Baselines:
# Linear baseline
python scripts/baseline_transformer.py --in docs/assets/geode_slices.csv --out_csv docs/assets/gb_baseline_demo.csv
# Tiny Transformer (PyTorch)
python scripts/baseline_tiny_transformer.py --in docs/assets/geode_slices.csv --out_csv docs/assets/gb_tinytx_demo.csv --epochs 10 --hidden 64 --heads 4 --layers 2
# Sharded input example
python scripts/baseline_tiny_transformer.py --in docs/assets/geodebench_part-0000.csv,docs/assets/geodebench_part-0001.csv --out_csv docs/assets/gb_tinytx_sharded.csv --epochs 3
Leaderboard submission (CSV):
- Columns: Method,Split,Metric,Score,NumExamples (an example writer is sketched below)
- Validate locally:
python scripts/leaderboard_validate.py --submission docs/assets/gb_tinytx_sharded.csv
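A minimal sketch of writing a submission with the columns above; the method name, split label, and score are placeholders, not real results, and the exact metric string accepted by the validator should be checked against scripts/leaderboard_validate.py:

```python
import csv

# Placeholder row: replace Method/Split/Score/NumExamples with your own results.
rows = [
    {"Method": "my_model", "Split": "easy", "Metric": "MAE(alpha)",
     "Score": 0.123, "NumExamples": 1000},
]

with open("docs/assets/my_submission.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["Method", "Split", "Metric", "Score", "NumExamples"])
    writer.writeheader()
    writer.writerows(rows)
```

Then run the validator above on docs/assets/my_submission.csv.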
Head-to-head and break-even (predict+polish):
# Generate quick Newton vs Hybrid CSV
python scripts/bench_newton_vs_hybrid.py --degrees 3,5,8 --trials 3 --out docs/assets/newton_vs_hybrid_quick.csv
# Compute break-even vs Newton assuming AI inference = 0.05 ms and a 40% reduction in polish time (factor 0.6)
python scripts/bench_h2h_predict_polish.py \
--bench_csv docs/assets/newton_vs_hybrid_quick.csv \
--base_method hybrid \
--compare_method newton \
--inference_ms 0.05 \
--polish_factor 0.6 \
--out_prefix docs/assets/h2h_quick
# Outputs: docs/assets/h2h_quick_detail.csv, h2h_quick_summary.csv
Interpretation:
- On small degrees and CPU, Newton is extremely fast; AI+polish won’t beat it unless inference is near-zero and polish reduces time drastically.
- On harder distributions (clusters, |t|≈1) or with GPU-batched polish, the break-even moves in favor of AI+polish.
Why predict+polish makes sense
We compare Newton-only time t_newton to predict+polish time t_ai = t_infer + s_polish × f_reduction × t_polish_baseline, where:
- t_infer: model inference time per instance
- s_polish: hardware scaling (e.g., GPU speedup for polish)
- f_reduction: reduction in polish work from better seeds (measured)
- t_polish_baseline: polish time from a standard seed
Predict+polish wins once t_ai < t_newton. This holds in practical regimes (a small numeric sketch follows the list):
- Batched workloads (vision calibration/inversion; spectral/AR roots)
- Streaming/nearby instances (control pole placement)
- Tough root geometry (clusters, near-multiples, |t|≈1)
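A minimal sketch of the comparison, using the inference and polish settings from the quick-bench command above; the Newton and polish baseline times here are placeholders, not measurements:

```python
def predict_polish_time_ms(t_infer, s_polish, f_reduction, t_polish_baseline):
    """t_ai = t_infer + s_polish * f_reduction * t_polish_baseline (all times in ms)."""
    return t_infer + s_polish * f_reduction * t_polish_baseline

# Placeholder per-instance times (ms); substitute measured values from the benchmark CSVs.
t_newton = 0.40
t_ai = predict_polish_time_ms(t_infer=0.05, s_polish=1.0, f_reduction=0.6, t_polish_baseline=0.30)
print(f"t_ai = {t_ai:.3f} ms ->", "predict+polish wins" if t_ai < t_newton else "Newton wins")
```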
Next in docs: we will report measured f_reduction from learned seeds and GPU s_polish, plus exact thresholds per degree and toughness.
Break-even results (early)
Artifacts:
- Cluster epsilon summary: docs/assets/h2h_tough_hd_cluster_epsilon_summary.csv
- Scale ratio summary: docs/assets/h2h_tough_hd_scale_ratio_summary.csv
- Plots: docs/assets/h2h_cluster_epsilon.png, docs/assets/h2h_scale_ratio.png
Highlights:
- deg=32 (cluster): first epsilon where AI+polish wins is ~1e-5 (with GPU-like polish scaling).
- deg=32 (ill-scaled): first scale ratio where AI+polish wins is ~1.0 (with GPU-like polish scaling).
- deg=24: no wins yet on the sampled grid; tighter epsilons and more seeds may change this.
High-degree batched polish (CPU vs MPS)
Plot: docs/assets/high_degree_cpu_vs_mps.png
Measured (MPS on Apple Silicon; a batched-polish sketch follows the list):
- deg=64, B=1024: near parity.
- deg=128, B=1024: ~2.9× faster vs CPU.
- deg=256, B=512: ~2.1× faster vs CPU.
- deg=512, B=512: ~3.6× faster vs CPU.
- deg=1024, B=256: ~3.4× faster vs CPU.
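The timings above come from the repo's own benchmark scripts; purely as an illustration of what "batched polish" means here, a minimal PyTorch sketch of vectorized Newton refinement is shown below (real and imaginary parts are kept separate because complex dtypes are only partially supported on MPS; this is not the repo's implementation):

```python
import torch

def newton_polish(cr, ci, zr, zi, iters=20):
    """Batched Newton polish of root seeds for monomial-basis polynomials.

    cr, ci: (B, d+1) real/imag coefficient tensors, highest degree first.
    zr, zi: (B, d)   real/imag parts of the seed roots to refine.
    """
    for _ in range(iters):
        pr = torch.zeros_like(zr); pi = torch.zeros_like(zi)   # p(z)
        dr = torch.zeros_like(zr); di = torch.zeros_like(zi)   # p'(z)
        for k in range(cr.shape[1]):                           # Horner: dp = dp*z + p; p = p*z + c_k
            dr, di = dr * zr - di * zi + pr, dr * zi + di * zr + pi
            pr, pi = pr * zr - pi * zi + cr[:, k:k + 1], pr * zi + pi * zr + ci[:, k:k + 1]
        denom = dr * dr + di * di + 1e-30                      # Newton step: z <- z - p / p'
        zr, zi = zr - (pr * dr + pi * di) / denom, zi - (pi * dr - pr * di) / denom
    return zr, zi

device = "mps" if torch.backends.mps.is_available() else "cpu"
B, d = 512, 256                                                # batch size and degree, as in the table above
cr = torch.randn(B, d + 1, device=device); ci = torch.randn(B, d + 1, device=device)
zr = torch.randn(B, d, device=device);     zi = torch.randn(B, d, device=device)
zr, zi = newton_polish(cr, ci, zr, zi)
```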
Newton vs AI+Polish at high degrees
Plot: docs/assets/newton_vs_ai_polish_hd.png
- Newton modeled from degrees 32/64 (quadratic trend) vs AI+Polish per-instance time derived from batched CPU and MPS runs (inference=0.05ms, polish_factor=0.6); the extrapolation is sketched after these bullets.
- Shows the crossover where AI+Polish overtakes Newton at higher degrees, especially with MPS.
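A minimal sketch of that modeling, with placeholder timings (the real numbers live in the benchmark CSVs): fit t_newton(d) ≈ c·d² from two measured degrees, derive the AI+Polish per-instance time from a batched-polish run, and scan for the crossover degree.

```python
# Placeholder measurements (ms); replace with values from the benchmark outputs.
newton_ms = {32: 0.010, 64: 0.041}                                 # per-instance Newton time at two degrees
batch_polish_ms = {64: 20.0, 128: 80.0, 256: 400.0, 512: 1600.0}   # batched polish wall time per degree
batch_size = {64: 1024, 128: 1024, 256: 512, 512: 512}
inference_ms, polish_factor = 0.05, 0.6

# Quadratic trend for Newton: t_newton(d) ~= c * d^2, with c averaged over the measured degrees.
c = sum(t / d**2 for d, t in newton_ms.items()) / len(newton_ms)

for d in sorted(batch_polish_ms):
    t_newton = c * d**2
    t_ai = inference_ms + polish_factor * batch_polish_ms[d] / batch_size[d]
    winner = "AI+Polish" if t_ai < t_newton else "Newton"
    print(f"deg={d}: newton={t_newton:.3f} ms, ai+polish={t_ai:.3f} ms -> {winner}")
```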
Guidance: For high-degree, batched workloads in control/filters/PDE/crypto, MPS gains are material; predict+polish is compelling when paired with batched GPU polish.
Leaderboard (v0 preview)
Method | Split | Metric | Score |
---|---|---|---|
Linear baseline | Random (S+G) | MAE(alpha) |
Naive OEIS-style | Symmetry holdout | MAE(alpha) |
Tiny Transformer (stub) | Symmetry holdout | MAE(alpha) |