Separate case-optimization correctness testing from performance benchmarks #1275

@sbryngelson

Description

Problem

The benchmark suite (./mfc.sh bench) currently uses --case-optimization for all 5 benchmark cases on both the PR and MASTER branches. This conflates two concerns:

  1. Performance regression detection — comparing PR vs MASTER grind times
  2. Case-optimization correctness — verifying that case-optimized binaries produce correct results

Case optimization generates a custom binary per case by injecting 20+ Fypp macros at compile time (WENO order, num_dims, num_fluids, viscous, mhd, etc.). This gives ~10x runtime speedup but requires a full rebuild for every benchmark case. The build phase dominates wall time — on Phoenix GPU-OMP, building 5 case-optimized binaries takes ~34 minutes, while the benchmarks themselves run in ~12 minutes.
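To make the mechanism concrete, here is a rough Python analogy (not the actual Fypp macros or MFC code): case optimization bakes per-case parameters into the source at build time instead of reading them at runtime, which is what lets the compiler fully specialize and unroll the hot loops. The template, function names, and parameter are all hypothetical.

```python
# Rough analogy of compile-time specialization: the parameter is
# substituted into the source text before "compilation" (here, exec),
# so the generated function has a fixed trip count rather than a
# runtime branch. Fypp does the same for the Fortran kernels.

TEMPLATE = """
def weno_reconstruct(values):
    # weno_order is fixed at build time -> fixed-width stencil loop
    stencil_width = {weno_order}
    return [sum(values[i:i + stencil_width]) / stencil_width
            for i in range(len(values) - stencil_width + 1)]
"""

def build_specialized(weno_order):
    # Analogous to building one case-optimized binary per benchmark case.
    namespace = {}
    exec(TEMPLATE.format(weno_order=weno_order), namespace)
    return namespace["weno_reconstruct"]

weno3 = build_specialized(3)
```

The cost shown in the issue follows directly: every distinct parameter set requires its own `build_specialized` call, i.e. its own full rebuild.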

The consequence: benchmark CI jobs are slow, fragile, and more likely to hit SLURM time limits or preemption windows (especially the CPU benchmarks, which need 3+ hours total). Meanwhile, performance regressions are detectable without case optimization: the relative difference between PR and MASTER grind times is what matters, not the absolute grind time.
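The relative-comparison argument can be sketched in a few lines. This is illustrative only (the function names and the 5% threshold are made up, not MFC's actual bench logic):

```python
# Sketch: regression detection only needs the PR/MASTER ratio, so the
# comparison is valid as long as BOTH sides use the same binary flavor,
# whether that is case-optimized or generic.

def relative_slowdown(pr_grind_time, master_grind_time):
    """Fractional slowdown of the PR relative to MASTER."""
    return (pr_grind_time - master_grind_time) / master_grind_time

def is_regression(pr_grind_time, master_grind_time, threshold=0.05):
    # Flag the PR if it is more than `threshold` (e.g. 5%) slower.
    return relative_slowdown(pr_grind_time, master_grind_time) > threshold
```

Dropping case optimization scales both grind times by roughly the same factor, so the ratio, and therefore the regression verdict, is unchanged.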

Proposed changes

1. Remove --case-optimization from benchmarks

Run ./mfc.sh bench without --case-optimization. Both PR and MASTER use the same generic (non-optimized) binary, so relative performance comparisons remain valid. Benefits:

  • One build instead of five — a single generic binary serves all benchmark cases
  • Faster CI — eliminates ~30 min of redundant builds per side (PR + MASTER)
  • More robust — shorter SLURM jobs are less likely to hit time limits or preemption

2. Add case-optimization correctness tests

Currently, --case-optimization is never tested for correctness in CI. A Fypp macro bug in case-optimized codegen could silently produce wrong results.

Add a small set of tests (based on the existing benchmark cases) that build with --case-optimization at small grid sizes and compare against golden files. The current 5 benchmark cases (5eq_rk3_weno3_hllc, viscous_weno5_sgb_acoustic, ibm, hypo_hll, igr) already cover the key case-optimization parameter space — just run them small and fast.
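A minimal sketch of the proposed correctness check, assuming the existing golden-file mechanism can be reused (the function and tolerances below are hypothetical, not MFC's actual test harness): run each small case once with the generic build to produce the golden data and once with --case-optimization, then require element-wise agreement within a floating-point tolerance.

```python
# Hypothetical golden-file comparison for case-optimized output.
# A Fypp codegen bug would show up here as a tolerance violation.
import math

def outputs_match(golden, optimized, rel_tol=1e-10, abs_tol=1e-12):
    """Compare two flat lists of solution values element-wise."""
    if len(golden) != len(optimized):
        return False
    return all(math.isclose(g, o, rel_tol=rel_tol, abs_tol=abs_tol)
               for g, o in zip(golden, optimized))
```

The tolerances would need tuning per case, since compiler specialization can legitimately reorder floating-point operations.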

Summary

After the split:

  • Test suite validates: "does --case-optimization produce the same answers as the generic build?"
  • Benchmark suite validates: "did this PR introduce a performance regression?"

Each suite does one job well, instead of one suite doing both jobs poorly.
