AI+Polish: Learned Warm-Starts + Classical Refinement
AI+Polish combines small learned predictors with proven iterative solvers to achieve fast, reliable polynomial root-finding at scale.
What “AI+Polish” means
- AI: a compact supervised model (linear/MLP/tiny Transformer) predicts good initializations or step hints from the polynomial’s coefficients or Geode-derived features (e.g., t-slices, layerings).
- Polish: a classical, numerically robust finisher (Aberth–Ehrlich, Halley, Newton) refines those warm-starts to machine-precision roots.
You retain solver guarantees (tiny residuals) while reducing iterations and stalls, especially in hard regimes (clusters, near-multiples, |t|≈1) and at high degrees.
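To make the division of labor concrete, here is a minimal sketch of the loop in NumPy: `predict_seeds` is a stand-in for the learned predictor (a classical Cauchy-radius seeding, so the example runs without any model), and `aberth_polish` is a plain Aberth–Ehrlich finisher. Both function names are illustrative, not the geodepoly API.

```python
import numpy as np

def predict_seeds(coeffs):
    """Stand-in for the learned predictor: roots of unity scaled to the Cauchy radius."""
    n = len(coeffs) - 1
    radius = 1.0 + max(abs(c / coeffs[-1]) for c in coeffs[:-1])
    return radius * np.exp(2j * np.pi * (np.arange(n) + 0.25) / n)

def aberth_polish(coeffs, seeds, iters=100, tol=1e-14):
    """Classical Aberth-Ehrlich refinement of seed roots of p(x) = sum coeffs[k] * x**k."""
    p = np.polynomial.polynomial.Polynomial(coeffs)
    dp = p.deriv()
    z = np.array(seeds, dtype=complex)
    for _ in range(iters):
        w = p(z) / dp(z)                      # Newton corrections
        diff = z[:, None] - z[None, :]        # pairwise differences z_k - z_j
        np.fill_diagonal(diff, 1.0)           # placeholder to avoid divide-by-zero
        recip = 1.0 / diff
        np.fill_diagonal(recip, 0.0)          # drop the j == k term
        step = w / (1.0 - w * recip.sum(axis=1))
        z -= step
        if np.max(np.abs(step)) < tol:
            break
    return z

coeffs = [-2.0, 0.0, 0.0, 1.0]                # x^3 - 2, low-to-high order
roots = aberth_polish(coeffs, predict_seeds(coeffs))
print(np.sort_complex(roots))                 # ~1.26 and a complex-conjugate pair
```

In the AI+Polish setting, `predict_seeds` is replaced by the trained model (or the analytic S/Qcubic seeds), and the finisher is batched on GPU; the structure of the loop stays the same.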
What parts of the Geode paper we use
- Series reversion (Lagrange inversion): Hyper‑Catalan S[t2,t3,…] provides analytic seeds; resummation (Padé/Borel/auto) stabilizes near |t|≈1.
- Geode t-mapping and center selection: minimize |t| = |-a0/a1| over candidate centers; S-evaluations yield the local step y = t·α (see the sketch after this list).
- Qcubic (Bi–Tri) approximant: fast cubic Geode approximant to predict α and displacements when full S is heavy.
- Geode factorization (S−1 = S1·G): motivates invariant targets/features (α, slices) for training and underpins GeodeBench symmetry tasks.
- Layerings and Euler invariance: vertex/edge/face layerings and V−E+F=1 create structure-aware features, regularizers, and split criteria.
- Convergence guidance: radius/weight arguments determine when to apply resummation, how many series passes to take, and when to fall back to Aberth.
Net: AI predicts better seeds/centers/α using S/Qcubic/Geode structure; polish (Aberth/Halley) finishes to machine precision.
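Here is the center-selection sketch referenced above: shift the polynomial to each candidate center, read off t = -b0/b1 from the shifted coefficients, and keep the center with the smallest |t|. The candidate set below (the root centroid plus a small ring of offsets) is an illustrative choice, not one prescribed by the paper.

```python
import math
import numpy as np
from numpy.polynomial import polynomial as P

def shifted_coeffs(coeffs, c):
    """Coefficients b_k of p(x + c), low-to-high order (Taylor shift via derivatives)."""
    p = P.Polynomial(coeffs)
    out = [p(c)]
    for k in range(1, len(coeffs)):
        p = p.deriv()
        out.append(p(c) / math.factorial(k))
    return np.array(out)

def best_center(coeffs, candidates):
    """Pick the candidate center minimizing |t| = |-b0/b1|."""
    best = None
    for c in candidates:
        b = shifted_coeffs(coeffs, c)
        if b[1] == 0:                          # b1 = 0: t undefined, skip this center
            continue
        t = -b[0] / b[1]
        if best is None or abs(t) < best[1]:
            best = (c, abs(t), t)
    return best                                # (center, |t|, t)

coeffs = [1.0, -3.0, 0.0, 1.0]                 # x^3 - 3x + 1
n = len(coeffs) - 1
centroid = -coeffs[-2] / (n * coeffs[-1])      # mean of the roots
candidates = [centroid + d for d in (0, 0.5, -0.5, 0.5j, -0.5j)]
center, abs_t, t = best_center(coeffs, candidates)
print(center, abs_t)                           # chosen center and its |t|
```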
What type of AI? Does it need training?
- Model types: linear baselines, small MLPs, or tiny Transformers over coefficient features or Geode features.
- Training: optional.
  - No training: analytic S/Qcubic already give strong seeds; polish alone works.
  - With training: small models (trained on GeodeBench or domain data) reduce iterations and variance, improving wall-clock time, especially with batched GPU polish.
- Inputs/outputs: inputs are coeffs or Geode features; outputs are a few complex scalars (centers, α, seeds, damping) used to initialize polish.
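For concreteness, a shape-level sketch of such a predictor in PyTorch follows; the architecture, feature map, and three-output head are illustrative assumptions rather than a fixed design of this project.

```python
import torch
import torch.nn as nn

class SeedMLP(nn.Module):
    """Tiny MLP: coefficient features in, a few complex warm-start scalars out."""
    def __init__(self, degree: int, n_outputs: int = 3, hidden: int = 64):
        super().__init__()
        in_dim = 2 * (degree + 1)                     # real + imag parts of each coefficient
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.GELU(),
            nn.Linear(hidden, hidden), nn.GELU(),
            nn.Linear(hidden, 2 * n_outputs),         # each complex output as (re, im)
        )

    def forward(self, coeffs: torch.Tensor) -> torch.Tensor:
        # coeffs: (batch, degree+1), complex, low-to-high order; normalize to monic scale
        monic = coeffs / coeffs[:, -1:]
        feats = torch.view_as_real(monic).flatten(1)  # (batch, 2*(degree+1)) real features
        out = self.net(feats)
        return torch.view_as_complex(out.view(out.shape[0], -1, 2))  # (batch, n_outputs)

model = SeedMLP(degree=8)
batch = torch.randn(16, 9, dtype=torch.complex64)     # 16 random degree-8 polynomials
hints = model(batch)                                   # (16, 3) complex, e.g. center, alpha, damping
```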
Where it’s practical (wins)
- High-degree batched workloads (control, filters, PDE, crypto): measured MPS wins vs CPU at degrees 128–1024; predict+polish leverages batched polish.
- Tough geometry: clustered or near-multiple roots, ill-scaled coefficients, and |t|≈1 regimes.
- Streaming/nearby instances: consecutive problems share structure (e.g., pole placement); warm-starts cut iterations.
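A small illustration of the streaming case: the previous instance's roots seed the next one, so a handful of corrective steps suffice. The Newton finisher and the drifting family below are stand-ins for the batched polish and a real parameter sweep.

```python
import numpy as np

def newton_polish(coeffs, seeds, iters=8):
    """A few Newton steps from the given seeds (stand-in for the real finisher)."""
    p = np.polynomial.polynomial.Polynomial(coeffs)
    dp = p.deriv()
    z = np.array(seeds, dtype=complex)
    for _ in range(iters):
        z = z - p(z) / dp(z)
    return z

base = np.array([1.0, -3.0, 0.0, 1.0])        # x^3 - 3x + 1, low-to-high order
roots = np.roots(base[::-1])                  # cold start on the first instance
for k in range(1, 5):
    coeffs = base.copy()
    coeffs[0] += 0.01 * k                     # drift the constant term slightly
    roots = newton_polish(coeffs, roots)      # warm-start from the previous solution
    print(k, np.round(np.sort_complex(roots), 6))
```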
Break-even intuition
We compare the Newton-only time t_newton to the predict+polish time t_ai:
- t_ai = t_infer + s_polish × f_reduction × t_polish_baseline
- t_infer: model inference time per instance (O(10–100 µs) for tiny models)
- s_polish: hardware scaling factor from batched polish (e.g., the MPS/CUDA-to-CPU time ratio)
- f_reduction: measured fraction of baseline polish iterations/time remaining after better seeds
- t_polish_baseline: polish time from a standard seed
Break-even holds when t_ai < t_newton. In practice this shows up at higher degrees and batch sizes, or under hard coefficient distributions (a worked numeric example follows the figure list). See:
- `docs/assets/newton_vs_ai_polish_hd.png` (modeled Newton vs AI+Polish, CPU and MPS, at high degrees)
- `docs/assets/h2h_cluster_epsilon.png`, `docs/assets/h2h_scale_ratio.png` (toughness sweeps)
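A back-of-the-envelope instance of the comparison above; every number is an illustrative placeholder, not a measurement from the plots.

```python
# Illustrative placeholders only: substitute measured values for your workload.
t_newton          = 2.0e-3    # Newton-only time per instance (s)
t_infer           = 50e-6     # tiny-model inference per instance (s)
t_polish_baseline = 1.8e-3    # polish time from a standard seed (s)
s_polish          = 0.25      # hardware scale: batched MPS/CUDA polish time vs CPU
f_reduction       = 0.6       # fraction of baseline polish work left after warm-starting

t_ai = t_infer + s_polish * f_reduction * t_polish_baseline
print(f"t_ai = {t_ai * 1e3:.3f} ms vs t_newton = {t_newton * 1e3:.3f} ms "
      f"-> AI+Polish wins: {t_ai < t_newton}")
```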
Getting started
- Run baselines: `scripts/baseline_transformer.py`, `scripts/baseline_tiny_transformer.py`.
- Generate data: `bench/generate_geodebench.py` (S/FUSS/Bi‑Tri/Mixed slices; splits; invariance).
- Use polish finishers: `method="hybrid"|"aberth"|"dk"` with `resum="auto"` for robust seeds.
- Batched polish: Torch MPS/CUDA via `geodepoly.batched` for throughput.
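Putting the options together, a hedged usage sketch: the `method=` and `resum=` options and the `geodepoly.batched` module come from the list above, but the entry-point names `geodepoly.solve` and `geodepoly.batched.solve_batch` are assumptions for illustration; check the project docs for the actual signatures.

```python
# NOTE: `geodepoly.solve` and `geodepoly.batched.solve_batch` are assumed names
# for illustration; only method=/resum= and the geodepoly.batched module are
# documented on this page.
import geodepoly

coeffs = [1.0, -3.0, 0.0, 1.0]                # x^3 - 3x + 1, low-to-high order
roots = geodepoly.solve(coeffs, method="hybrid", resum="auto")

# Batched polish on MPS/CUDA for throughput (again, an assumed batched signature):
# import torch
# batch = torch.randn(1024, 9, dtype=torch.complex64)   # 1024 degree-8 polynomials
# all_roots = geodepoly.batched.solve_batch(batch, method="aberth", device="mps")
```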
References
- Paper mapping: `docs/paper_guide.md`
- AI overview and use cases: `docs/ai_overview.md`, `docs/ai_use_cases.md`
- GeodeBench: `docs/geodebench.md`