Separate case-optimization correctness testing from performance benchmarks #1275

@sbryngelson

Description

Problem

The benchmark suite (./mfc.sh bench) currently uses --case-optimization for all 5 benchmark cases on both the PR and MASTER branches. This conflates two concerns:

  1. Performance regression detection — comparing PR vs MASTER grind times
  2. Case-optimization correctness — verifying that case-optimized binaries produce correct results

Case optimization generates a custom binary per case by injecting 20+ Fypp macros at compile time (WENO order, num_dims, num_fluids, viscous, mhd, etc.). This gives ~10x runtime speedup but requires a full rebuild for every benchmark case. The build phase dominates wall time — on Phoenix GPU-OMP, building 5 case-optimized binaries takes ~34 minutes, while the benchmarks themselves run in ~12 minutes.
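To make the mechanism concrete, here is a rough Python analogy (not the actual Fypp macros or MFC code): case optimization bakes per-case parameters into the source at build time instead of reading them at runtime, which is what lets the compiler fully specialize and unroll the hot loops. The template, function names, and parameter are all hypothetical.

```python
# Rough analogy of compile-time specialization: the parameter is
# substituted into the source text before "compilation" (here, exec),
# so the generated function has a fixed trip count rather than a
# runtime branch. Fypp does the same for the Fortran kernels.

TEMPLATE = """
def weno_reconstruct(values):
    # weno_order is fixed at build time -> fixed-width stencil loop
    stencil_width = {weno_order}
    return [sum(values[i:i + stencil_width]) / stencil_width
            for i in range(len(values) - stencil_width + 1)]
"""

def build_specialized(weno_order):
    # Analogous to building one case-optimized binary per benchmark case.
    namespace = {}
    exec(TEMPLATE.format(weno_order=weno_order), namespace)
    return namespace["weno_reconstruct"]

weno3 = build_specialized(3)
```

The cost shown in the issue follows directly: every distinct parameter set requires its own `build_specialized` call, i.e. its own full rebuild.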

The consequence: benchmark CI jobs are slow, fragile, and more likely to hit SLURM time limits or preemption windows (especially the CPU benchmarks, which need 3+ hours total). Meanwhile, performance regressions are detectable without case optimization: the relative difference between PR and MASTER grind times is what matters, not the absolute grind time.
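The relative-comparison argument can be sketched in a few lines. This is illustrative only (the function names and the 5% threshold are made up, not MFC's actual bench logic):

```python
# Sketch: regression detection only needs the PR/MASTER ratio, so the
# comparison is valid as long as BOTH sides use the same binary flavor,
# whether that is case-optimized or generic.

def relative_slowdown(pr_grind_time, master_grind_time):
    """Fractional slowdown of the PR relative to MASTER."""
    return (pr_grind_time - master_grind_time) / master_grind_time

def is_regression(pr_grind_time, master_grind_time, threshold=0.05):
    # Flag the PR if it is more than `threshold` (e.g. 5%) slower.
    return relative_slowdown(pr_grind_time, master_grind_time) > threshold
```

Dropping case optimization scales both grind times by roughly the same factor, so the ratio, and therefore the regression verdict, is unchanged.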

Proposed changes

1. Remove --case-optimization from benchmarks

Run ./mfc.sh bench without --case-optimization. Both PR and MASTER use the same generic (non-optimized) binary, so relative performance comparisons remain valid. Benefits:

  • One build instead of five — a single generic binary serves all benchmark cases
  • Faster CI — eliminates ~30 min of redundant builds per side (PR + MASTER)
  • More robust — shorter SLURM jobs are less likely to hit time limits or preemption

2. Add case-optimization correctness tests

Currently, --case-optimization is never tested for correctness in CI. A Fypp macro bug in case-optimized codegen could silently produce wrong results.

Add a small set of tests (based on the existing benchmark cases) that build with --case-optimization at small grid sizes and compare against golden files. The current 5 benchmark cases (5eq_rk3_weno3_hllc, viscous_weno5_sgb_acoustic, ibm, hypo_hll, igr) already cover the key case-optimization parameter space — just run them small and fast.
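A minimal sketch of the proposed correctness check, assuming the existing golden-file mechanism can be reused (the function and tolerances below are hypothetical, not MFC's actual test harness): run each small case once with the generic build to produce the golden data and once with --case-optimization, then require element-wise agreement within a floating-point tolerance.

```python
# Hypothetical golden-file comparison for case-optimized output.
# A Fypp codegen bug would show up here as a tolerance violation.
import math

def outputs_match(golden, optimized, rel_tol=1e-10, abs_tol=1e-12):
    """Compare two flat lists of solution values element-wise."""
    if len(golden) != len(optimized):
        return False
    return all(math.isclose(g, o, rel_tol=rel_tol, abs_tol=abs_tol)
               for g, o in zip(golden, optimized))
```

The tolerances would need tuning per case, since compiler specialization can legitimately reorder floating-point operations.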

Summary

After the split:

  • Test suite validates: "does --case-optimization produce the same answers as the generic build?"
  • Benchmark suite validates: "did this PR introduce a performance regression?"

Each suite does one job well, instead of one suite doing both jobs poorly.
