Description
Problem
The benchmark suite (./mfc.sh bench) currently uses --case-optimization for all 5 benchmark cases on both the PR and MASTER branches. This conflates two concerns:
- Performance regression detection — comparing PR vs MASTER grind times
- Case-optimization correctness — verifying that case-optimized binaries produce correct results
Case optimization generates a custom binary per case by injecting 20+ Fypp macros at compile time (WENO order, num_dims, num_fluids, viscous, mhd, etc.). This gives ~10x runtime speedup but requires a full rebuild for every benchmark case. The build phase dominates wall time — on Phoenix GPU-OMP, building 5 case-optimized binaries takes ~34 minutes, while the benchmarks themselves run in ~12 minutes.
The consequence: benchmark CI jobs are slow, fragile, and more likely to hit SLURM time limits or preemption windows (especially CPU benchmarks which need 3+ hours total). Meanwhile, performance regressions are detectable without case optimization — the relative difference between PR and MASTER is what matters, not absolute grind time.
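As a rough illustration of why the relative comparison can survive dropping case optimization, here is a minimal sketch. The function names, the grind-time numbers, and the 10% threshold are all hypothetical and are not taken from the ./mfc.sh bench implementation. It also assumes the generic build slows PR and MASTER by roughly the same factor, which may not hold for every code path.

```python
import math


def relative_change(pr_grind: float, master_grind: float) -> float:
    """Fractional change of PR grind time relative to MASTER (positive = slower)."""
    if master_grind <= 0:
        raise ValueError("master grind time must be positive")
    return (pr_grind - master_grind) / master_grind


def find_regressions(pr: dict, master: dict, threshold: float = 0.10) -> dict:
    """Return {case: relative change} for cases where PR is slower than
    MASTER by more than `threshold` (hypothetical default of 10%)."""
    regressions = {}
    for case, pr_grind in pr.items():
        if case not in master:
            continue  # no baseline to compare against
        change = relative_change(pr_grind, master[case])
        if change > threshold:
            regressions[case] = change
    return regressions


# Made-up grind times (arbitrary units), for illustration only.
master_optimized = {"ibm": 2.0, "igr": 4.0}
pr_optimized = {"ibm": 2.5, "igr": 4.04}

# Assumption: the generic build is uniformly ~10x slower on both sides.
master_generic = {case: 10 * t for case, t in master_optimized.items()}
pr_generic = {case: 10 * t for case, t in pr_optimized.items()}

# The same case is flagged either way, with the same relative change.
flagged_optimized = find_regressions(pr_optimized, master_optimized)
flagged_generic = find_regressions(pr_generic, master_generic)
```

Under that uniform-slowdown assumption, the absolute grind times change by an order of magnitude but the set of flagged cases does not.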
Proposed changes
1. Remove --case-optimization from benchmarks
Run ./mfc.sh bench without --case-optimization. PR and MASTER are then each built once as a generic (non-optimized) binary, so both sides are measured under the same conditions and relative performance comparisons remain valid. Benefits:
- One build instead of five — a single generic binary serves all benchmark cases
- Faster CI — eliminates ~30 min of redundant builds per side (PR + MASTER)
- More robust — shorter SLURM jobs are less likely to hit time limits or preemption
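The ~30 min figure can be sanity-checked from the Phoenix GPU-OMP numbers quoted in the Problem section. This is back-of-the-envelope arithmetic only. It assumes a single generic build costs about as much as one case-optimized build, and it covers build time only, not any change in benchmark run time.

```python
# Figure quoted in the Problem section (Phoenix GPU-OMP).
CASE_OPT_BUILD_MIN = 34.0  # minutes to build 5 case-optimized binaries
NUM_CASES = 5

# Assumption: one generic build takes about as long as one case-optimized build.
per_build_min = CASE_OPT_BUILD_MIN / NUM_CASES
saved_per_side_min = CASE_OPT_BUILD_MIN - per_build_min

print(f"per build:                  ~{per_build_min:.1f} min")
print(f"saved per side:             ~{saved_per_side_min:.1f} min")
print(f"saved per job (PR + MASTER): ~{2 * saved_per_side_min:.1f} min")
```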
2. Add case-optimization correctness tests
Currently, --case-optimization is never tested for correctness in CI. A Fypp macro bug in case-optimized codegen could silently produce wrong results.
Add a small set of tests (based on the existing benchmark cases) that build with --case-optimization at small grid sizes and compare against golden files. The current 5 benchmark cases (5eq_rk3_weno3_hllc, viscous_weno5_sgb_acoustic, ibm, hypo_hll, igr) already cover the key case-optimization parameter space — just run them small and fast.
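A golden-file check of this kind could look roughly like the sketch below. The data layout (a mapping from variable name to a flat list of floats), the function name, and the tolerances are assumptions for illustration; they are not the format or API of the existing test suite.

```python
import math


def compare_to_golden(result: dict, golden: dict,
                      rel_tol: float = 1e-12, abs_tol: float = 1e-12) -> list:
    """Compare a run's output against stored golden values.

    Both arguments map a variable name to a list of floats (hypothetical
    layout). Returns a list of mismatch descriptions; an empty list means
    the case-optimized run reproduced the golden values within tolerance.
    """
    mismatches = []
    for name, expected in golden.items():
        actual = result.get(name)
        if actual is None:
            mismatches.append(f"{name}: missing from result")
            continue
        if len(actual) != len(expected):
            mismatches.append(f"{name}: length {len(actual)} != {len(expected)}")
            continue
        for i, (a, e) in enumerate(zip(actual, expected)):
            if not math.isclose(a, e, rel_tol=rel_tol, abs_tol=abs_tol):
                # Report only the first differing entry per variable.
                mismatches.append(f"{name}[{i}]: {a!r} != {e!r}")
                break
    return mismatches


# Made-up values standing in for a small-grid run.
golden = {"rho": [1.0, 0.5, 0.25], "pres": [101325.0, 101300.0]}
good_run = {"rho": [1.0, 0.5, 0.25], "pres": [101325.0, 101300.0]}
bad_run = {"rho": [1.0, 0.5, 0.30]}

ok = compare_to_golden(good_run, golden)   # empty list
bad = compare_to_golden(bad_run, golden)   # one entry for rho, one for pres
```

Returning the full list of mismatches rather than stopping at the first makes it easier to tell a single-variable macro bug from a wholesale divergence.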
Summary
After the split:
- Test suite validates: "does --case-optimization produce the same answers as the generic build?"
- Benchmark suite validates: "did this PR introduce a performance regression?"
Each does one job well instead of both doing both poorly.