AI+Polish: Learned Warm-Starts + Classical Refinement
AI+Polish combines small learned predictors with proven iterative solvers to achieve fast, reliable polynomial root-finding at scale.
What “AI+Polish” means
- AI: a compact supervised model (linear/MLP/tiny Transformer) predicts good initializations or step hints from the polynomial’s coefficients or Geode-derived features (e.g., t-slices, layerings).
- Polish: a classical, numerically robust finisher (Aberth–Ehrlich, Halley, Newton) refines those warm-starts to machine-precision roots.
You retain solver guarantees (tiny residuals) while reducing iterations and stalls, especially in hard regimes (clusters, near-multiples, |t|≈1) and at high degrees.
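To make the division of labor concrete, here is a minimal sketch of the loop in NumPy: `predict_seeds` is a stand-in for the learned predictor (a classical Cauchy-radius seeding, so the example runs without any model), and `aberth_polish` is a plain Aberth–Ehrlich finisher. Both function names are illustrative, not the geodepoly API.

```python
import numpy as np

def predict_seeds(coeffs):
    """Stand-in for the learned predictor: roots of unity scaled to the Cauchy radius."""
    n = len(coeffs) - 1
    radius = 1.0 + max(abs(c / coeffs[-1]) for c in coeffs[:-1])
    return radius * np.exp(2j * np.pi * (np.arange(n) + 0.25) / n)

def aberth_polish(coeffs, seeds, iters=100, tol=1e-14):
    """Classical Aberth-Ehrlich refinement of seed roots of p(x) = sum coeffs[k] * x**k."""
    p = np.polynomial.polynomial.Polynomial(coeffs)
    dp = p.deriv()
    z = np.array(seeds, dtype=complex)
    for _ in range(iters):
        w = p(z) / dp(z)                      # Newton corrections
        diff = z[:, None] - z[None, :]        # pairwise differences z_k - z_j
        np.fill_diagonal(diff, 1.0)           # placeholder to avoid divide-by-zero
        recip = 1.0 / diff
        np.fill_diagonal(recip, 0.0)          # drop the j == k term
        step = w / (1.0 - w * recip.sum(axis=1))
        z -= step
        if np.max(np.abs(step)) < tol:
            break
    return z

coeffs = [-2.0, 0.0, 0.0, 1.0]                # x^3 - 2, low-to-high order
roots = aberth_polish(coeffs, predict_seeds(coeffs))
print(np.sort_complex(roots))                 # ~1.26 and a complex-conjugate pair
```

In the AI+Polish setting, `predict_seeds` is replaced by the trained model (or the analytic S/Qcubic seeds), and the finisher is batched on GPU; the structure of the loop stays the same.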
What parts of the Geode paper we use
- Series reversion (Lagrange inversion): Hyper‑Catalan S[t2,t3,…] provides analytic seeds; resummation (Padé/Borel/auto) stabilizes near |t|≈1.
- Geode t-mapping and center selection: minimize |t| = |-a0/a1| over candidate centers; S-evaluations yield the local step y = t·α (see the sketch after this list).
- Qcubic (Bi–Tri) approximant: fast cubic Geode approximant to predict α and displacements when full S is heavy.
- Geode factorization (S−1 = S1·G): motivates invariant targets/features (α, slices) for training and underpins GeodeBench symmetry tasks.
- Layerings and Euler invariance: vertex/edge/face layerings and V−E+F=1 create structure-aware features, regularizers, and split criteria.
- Convergence guidance: radius/weight arguments determine when to apply resummation, how many series passes to take, and when to fall back to Aberth.
Net: AI predicts better seeds/centers/α using S/Qcubic/Geode structure; polish (Aberth/Halley) finishes to machine precision.
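Here is the center-selection sketch referenced above: shift the polynomial to each candidate center, read off t = -b0/b1 from the shifted coefficients, and keep the center with the smallest |t|. The candidate set below (the root centroid plus a small ring of offsets) is an illustrative choice, not one prescribed by the paper.

```python
import math
import numpy as np
from numpy.polynomial import polynomial as P

def shifted_coeffs(coeffs, c):
    """Coefficients b_k of p(x + c), low-to-high order (Taylor shift via derivatives)."""
    p = P.Polynomial(coeffs)
    out = [p(c)]
    for k in range(1, len(coeffs)):
        p = p.deriv()
        out.append(p(c) / math.factorial(k))
    return np.array(out)

def best_center(coeffs, candidates):
    """Pick the candidate center minimizing |t| = |-b0/b1|."""
    best = None
    for c in candidates:
        b = shifted_coeffs(coeffs, c)
        if b[1] == 0:                          # b1 = 0: t undefined, skip this center
            continue
        t = -b[0] / b[1]
        if best is None or abs(t) < best[1]:
            best = (c, abs(t), t)
    return best                                # (center, |t|, t)

coeffs = [1.0, -3.0, 0.0, 1.0]                 # x^3 - 3x + 1
n = len(coeffs) - 1
centroid = -coeffs[-2] / (n * coeffs[-1])      # mean of the roots
candidates = [centroid + d for d in (0, 0.5, -0.5, 0.5j, -0.5j)]
center, abs_t, t = best_center(coeffs, candidates)
print(center, abs_t)                           # chosen center and its |t|
```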
What type of AI? Does it need training?
- Model types: linear baselines, small MLPs, or tiny Transformers over coefficient features or Geode features.
- Training: optional.
  - No training: analytic S/Qcubic already give strong seeds; polish alone works.
  - With training: small models (trained on GeodeBench or domain data) reduce iterations and variance, improving wall-clock time, especially with batched GPU polish.
- Inputs/outputs: inputs are coeffs or Geode features; outputs are a few complex scalars (centers, α, seeds, damping) used to initialize polish.
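For concreteness, a shape-level sketch of such a predictor in PyTorch follows; the architecture, feature map, and three-output head are illustrative assumptions rather than a fixed design of this project.

```python
import torch
import torch.nn as nn

class SeedMLP(nn.Module):
    """Tiny MLP: coefficient features in, a few complex warm-start scalars out."""
    def __init__(self, degree: int, n_outputs: int = 3, hidden: int = 64):
        super().__init__()
        in_dim = 2 * (degree + 1)                     # real + imag parts of each coefficient
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.GELU(),
            nn.Linear(hidden, hidden), nn.GELU(),
            nn.Linear(hidden, 2 * n_outputs),         # each complex output as (re, im)
        )

    def forward(self, coeffs: torch.Tensor) -> torch.Tensor:
        # coeffs: (batch, degree+1), complex, low-to-high order; normalize to monic scale
        monic = coeffs / coeffs[:, -1:]
        feats = torch.view_as_real(monic).flatten(1)  # (batch, 2*(degree+1)) real features
        out = self.net(feats)
        return torch.view_as_complex(out.view(out.shape[0], -1, 2))  # (batch, n_outputs)

model = SeedMLP(degree=8)
batch = torch.randn(16, 9, dtype=torch.complex64)     # 16 random degree-8 polynomials
hints = model(batch)                                   # (16, 3) complex, e.g. center, alpha, damping
```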
Where it’s practical (wins)
- High-degree batched workloads (control, filters, PDE, crypto): measured MPS wins vs CPU at degrees 128–1024; predict+polish leverages batched polish.
- Tough geometry: clustered or near-multiple roots, ill-scaled coefficients, and |t|≈1 regimes.
- Streaming/nearby instances: consecutive problems share structure (e.g., pole placement); warm-starts cut iterations.
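A small illustration of the streaming case: the previous instance's roots seed the next one, so a handful of corrective steps suffice. The Newton finisher and the drifting family below are stand-ins for the batched polish and a real parameter sweep.

```python
import numpy as np

def newton_polish(coeffs, seeds, iters=8):
    """A few Newton steps from the given seeds (stand-in for the real finisher)."""
    p = np.polynomial.polynomial.Polynomial(coeffs)
    dp = p.deriv()
    z = np.array(seeds, dtype=complex)
    for _ in range(iters):
        z = z - p(z) / dp(z)
    return z

base = np.array([1.0, -3.0, 0.0, 1.0])        # x^3 - 3x + 1, low-to-high order
roots = np.roots(base[::-1])                  # cold start on the first instance
for k in range(1, 5):
    coeffs = base.copy()
    coeffs[0] += 0.01 * k                     # drift the constant term slightly
    roots = newton_polish(coeffs, roots)      # warm-start from the previous solution
    print(k, np.round(np.sort_complex(roots), 6))
```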
Break-even intuition
We compare the Newton-only time t_newton to the predict+polish time t_ai:
- t_ai = t_infer + s_polish × f_reduction × t_polish_baseline
- t_infer: model inference time per instance (O(10–100 µs) for tiny models)
- s_polish: hardware scaling factor from batched polish (e.g., the MPS/CUDA-to-CPU time ratio)
- f_reduction: measured fraction of baseline polish iterations/time remaining after better seeds
- t_polish_baseline: polish time from a standard seed
Break-even holds when t_ai < t_newton. In practice this shows up at higher degrees and batch sizes, or under hard coefficient distributions (a worked numeric example follows the figure list). See:
- `docs/assets/newton_vs_ai_polish_hd.png` (modeled Newton vs AI+Polish, CPU and MPS, at high degrees)
- `docs/assets/h2h_cluster_epsilon.png`, `docs/assets/h2h_scale_ratio.png` (toughness sweeps)
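A back-of-the-envelope instance of the comparison above; every number is an illustrative placeholder, not a measurement from the plots.

```python
# Illustrative placeholders only: substitute measured values for your workload.
t_newton          = 2.0e-3    # Newton-only time per instance (s)
t_infer           = 50e-6     # tiny-model inference per instance (s)
t_polish_baseline = 1.8e-3    # polish time from a standard seed (s)
s_polish          = 0.25      # hardware scale: batched MPS/CUDA polish time vs CPU
f_reduction       = 0.6       # fraction of baseline polish work left after warm-starting

t_ai = t_infer + s_polish * f_reduction * t_polish_baseline
print(f"t_ai = {t_ai * 1e3:.3f} ms vs t_newton = {t_newton * 1e3:.3f} ms "
      f"-> AI+Polish wins: {t_ai < t_newton}")
```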
Getting started
- Run baselines: `scripts/baseline_transformer.py`, `scripts/baseline_tiny_transformer.py`.
- Generate data: `bench/generate_geodebench.py` (S/FUSS/Bi‑Tri/Mixed slices; splits; invariance).
- Use polish finishers: `method="hybrid"|"aberth"|"dk"` with `resum="auto"` for robust seeds.
- Batched polish: Torch MPS/CUDA via `geodepoly.batched` for throughput.
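Putting the options together, a hedged usage sketch: the `method=` and `resum=` options and the `geodepoly.batched` module come from the list above, but the entry-point names `geodepoly.solve` and `geodepoly.batched.solve_batch` are assumptions for illustration; check the project docs for the actual signatures.

```python
# NOTE: `geodepoly.solve` and `geodepoly.batched.solve_batch` are assumed names
# for illustration; only method=/resum= and the geodepoly.batched module are
# documented on this page.
import geodepoly

coeffs = [1.0, -3.0, 0.0, 1.0]                # x^3 - 3x + 1, low-to-high order
roots = geodepoly.solve(coeffs, method="hybrid", resum="auto")

# Batched polish on MPS/CUDA for throughput (again, an assumed batched signature):
# import torch
# batch = torch.randn(1024, 9, dtype=torch.complex64)   # 1024 degree-8 polynomials
# all_roots = geodepoly.batched.solve_batch(batch, method="aberth", device="mps")
```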
References
- Paper mapping: `docs/paper_guide.md`
- AI overview and use cases: `docs/ai_overview.md`, `docs/ai_use_cases.md`
- GeodeBench: `docs/geodebench.md`