Skip to content

Add gcov-based test pruning with coverage cache#1284

Open
sbryngelson wants to merge 7 commits intoMFlowCode:masterfrom
sbryngelson:gcov-test-pruning-v3
Open

Add gcov-based test pruning with coverage cache#1284
sbryngelson wants to merge 7 commits intoMFlowCode:masterfrom
sbryngelson:gcov-test-pruning-v3

Conversation

@sbryngelson
Copy link
Member

@sbryngelson sbryngelson commented Mar 2, 2026

Summary

  • File-level gcov coverage cache maps 555 test UUIDs to exercised .fpp source files (gzip JSON, 11KB, committed to repo)
  • --only-changes flag prunes tests by intersecting PR-changed files against coverage cache
  • --build-coverage-cache flag + 3-phase parallel cache builder (prepare, run, gcov collect)
  • New rebuild-cache CI job runs on Phoenix via SLURM when cases.py or Fortran dependency graph changes
  • Dep-change detection greps PR/push diffs for added use/include statements
  • Conservative fallbacks: missing cache runs all, missing sim coverage includes test, ALWAYS_RUN_ALL files trigger full suite
  • Remove continue-on-error from github CI job (fixes auto-cancellation)
  • 53 unit tests cover core coverage logic
  • TEMP: duplicate use in m_bubbles.fpp + remove CMakeLists.txt from ALWAYS_RUN_ALL to test the full cache rebuild pipeline in CI

Replaces #1283.

Test plan

  • CI lint checks pass
  • rebuild-cache job triggers (dep_changed detection)
  • Github test jobs download cache artifact and prune tests via --only-changes
  • Revert TEMP changes before merge

🤖 Generated with Claude Code

@github-actions
Copy link

github-actions bot commented Mar 2, 2026

Claude Code Review

Head SHA: c78d04bf37264c0b5c1890d7e0043ece051c7886
Files changed: 16

  • .github/file-filter.yml
  • .github/workflows/frontier/test.sh
  • .github/workflows/frontier_amd/test.sh
  • .github/workflows/phoenix/rebuild-cache.sh (new)
  • .github/workflows/phoenix/submit.sh
  • .github/workflows/phoenix/test.sh
  • .github/workflows/test.yml
  • .gitignore
  • CMakeLists.txt
  • src/simulation/m_bubbles.fpp
  • toolchain/mfc/cli/commands.py
  • toolchain/mfc/test/case.py
  • toolchain/mfc/test/coverage.py (new, 664 lines)
  • toolchain/mfc/test/test.py
  • toolchain/mfc/test/test_coverage_cache.json.gz (new)
  • toolchain/mfc/test/test_coverage_unit.py (new, 656 lines)

Summary

  • Adds a file-level gcov coverage cache (gzip JSON) mapping 555 test UUIDs to exercised .fpp sources, enabling PR-scoped test pruning via --only-changes.
  • Introduces --build-coverage-cache (3-phase: prepare, run, gcov) and rebuild-cache CI job that triggers on dep-graph changes; commits updated cache to master on push.
  • Conservative fallbacks: missing cache runs all; tests not in cache are included; simulation-file changes conservatively include tests with no simulation coverage.
  • 53 unit tests cover core coverage logic with offline mocks; test.py grows --only-changes branch in __filter.
  • Two TEMP changes are present and must be reverted before merge (acknowledged in PR description).

Findings

1. TEMP changes must not be merged — src/simulation/m_bubbles.fpp

File: src/simulation/m_bubbles.fpp, diff line +419

Duplicate use statement is explicitly called out in the PR as a TEMP artifact for pipeline testing. Must be reverted. Will fail lint/precheck in strict compilers.

2. TEMP change — CMakeLists.txt absent from ALWAYS_RUN_ALL

File: toolchain/mfc/test/coverage.py, around line 604–617
CMakeLists.txt was removed from ALWAYS_RUN_ALL (per PR body) to force the full cache rebuild to trigger in CI. CMake flag changes (optimisation levels, feature flags) affect test correctness and must be in ALWAYS_RUN_ALL before merge. If this entry is missing post-merge, a CMakeLists change could silently skip all tests.

3. Security: rebuild-cache job checks out fork code with contents: write

File: .github/workflows/test.yml, lines 197–232

For a pull_request event from a fork, github.event.pull_request.head.sha is the fork's commit. The job then executes .github/workflows/phoenix/rebuild-cache.sh from that checkout — i.e., code from the fork — under contents: write permissions on a self-hosted runner. A malicious fork PR that passes the cases_py or dep_changed trigger condition could push arbitrary commits to the base repo. Consider:

  • Pinning the checkout to github.base_ref / github.sha (base branch) instead of the fork's head SHA for the CI scripts, or
  • Switching to pull_request_target with explicit trust gating, or
  • Restricting contents: write to push/dispatch events only.

4. Unhandled FileNotFoundError in load_coverage_cache

File: toolchain/mfc/test/coverage.py, line ~1107

If toolchain/mfc/test/cases.py does not exist (unusual but possible in a partial checkout or CI artefact context), this raises an unhandled FileNotFoundError. Should be wrapped in try/except consistent with the surrounding cache-load error handling, returning None on failure.

5. _run_single_test_direct: missing binary is a silent coverage gap

File: toolchain/mfc/test/coverage.py, line ~869–871

When a binary is missing, the test continues with zero (or partial) coverage for that target. The resulting cache entry will be under-populated, causing the pruning step to incorrectly skip tests that exercise the missing binary. Consider recording a distinct marker (e.g., None) for tests whose cache entries are incomplete so filter_tests_by_coverage can treat them conservatively.

6. Phase labels are confusing

File: toolchain/mfc/test/coverage.py, lines ~975, ~985, ~1027

There are 3 phases (0, 1, 2) but the denominator is 2. Use 1-indexed Phase 1/3 / Phase 2/3 / Phase 3/3 to match the expected convention.


Minor / Non-blocking

  • find_gcov_binary unused parameter (coverage.py line ~643): _root_dir: str = "" is never read inside the function. The pylint disable comment acknowledges this but the parameter adds noise; remove it or add a TODO comment explaining future intent.
  • Python toolchain files not in ALWAYS_RUN_ALL: toolchain/mfc/test/test.py and toolchain/mfc/cli/commands.py are not listed. A change to those files would fall through to the .fpp-only filter, potentially skipping all tests. If intentional (they don't change test enumeration), document it; otherwise add them.
  • dep-check regex in test.yml (line ~172): The grep pattern uses -P (PCRE) which may not be available on all CI runner images (BSD grep, etc.). Consider -E extended regex or guard with grep --version check.

@github-actions
Copy link

github-actions bot commented Mar 2, 2026

Claude Code Review

Head SHA: 110e00c

Files changed: 16

  • .github/file-filter.yml
  • .github/workflows/frontier/test.sh (+ frontier_amd)
  • .github/workflows/phoenix/rebuild-cache.sh (new)
  • .github/workflows/phoenix/submit.sh
  • .github/workflows/phoenix/test.sh
  • .github/workflows/test.yml
  • .gitignore
  • CMakeLists.txt
  • src/simulation/m_bubbles.fpp
  • toolchain/mfc/cli/commands.py
  • toolchain/mfc/test/case.py, test.py, coverage.py (new), test_coverage_unit.py (new), test_coverage_cache.json.gz (new)

Summary

  • Adds file-level gcov coverage cache mapping 555 test UUIDs → covered .fpp source files, committed as a gzip binary blob.
  • New --only-changes flag filters the test suite at PR time by intersecting changed files against the coverage cache.
  • New --build-coverage-cache flag runs a 3-phase parallel cache builder (prepare → run → gcov collect).
  • New rebuild-cache CI job triggers on Phoenix via SLURM when cases.py or Fortran use/include statements change.
  • 53 unit tests cover the core coverage logic. Conservative fallbacks are well-designed (missing cache → run all; no sim coverage recorded → include conservatively).

Findings

🔴 Must Revert Before Merge (PR explicitly acknowledges these as TEMP)

1. Duplicate use in src/simulation/m_bubbles.fpp — line 419 in diff
use m_helper_basic appears twice. Harmless at runtime but will likely generate a compiler warning and violates the codebase style. Must be removed before merge.

2. Stripped ALWAYS_RUN_ALLtoolchain/mfc/test/coverage.py lines 606–611
macros.fpp, cases.py, case.py, definitions.py, input.py, case_validator.py, case.fpp, and coverage.py were intentionally removed to exercise CI pruning. The corresponding unit tests (lines ~1545–1556) also now test that macros.fpp does not trigger all, which is backward from the intended post-merge behaviour.

3. Empty ALWAYS_RUN_ALL_PREFIXEScoverage.py line 618
The cmake/ prefix was removed as a TEMP change. The full list must be restored, including at minimum cmake/ and toolchain/ prefix coverage.


🟡 Significant Issues

4. contents: write permission on PR-triggered rebuild-cache job — test.yml line 197
The rebuild-cache job grants permissions: contents: write unconditionally. For push/workflow_dispatch events this is required (to commit the updated cache to master). However, PRs that trigger this job (when cases_py or dep_changed is true) also receive the elevated permission, even though only the Commit Cache to Master step (guarded by if: github.event_name == 'push') uses it. Consider scoping write permission more narrowly or acknowledging this as an accepted risk.

5. submit.sh partition change affects all CPU test jobs — test.yml line 207 / submit.sh
The partition was changed from cpu-small (24 cores, 2 GB/core) to cpu-gnr (--exclusive, Granite Rapids). This change is shared by all CPU jobs submitted through submit.sh, not only the new cache-rebuild job. The existing Phoenix CPU test job will now also land on GNR nodes. If this is intentional, it should be called out explicitly; if not, a separate SLURM configuration for the cache rebuild path is safer.

6. Dep-change detection grep can fire on comment lines — test.yml line 172
grep -qP '^\+\s*(use[\s,]+\w|...)\' matches any added line starting with + that contains use <word>. This would false-positive on Fortran comment lines or documentation that happens to start with + and include the word use. The false positive only triggers an unnecessary cache rebuild (no correctness issue), but it could be tightened by also excluding !\s* prefixes.


🟢 Minor / Improvement Opportunities

7. extra_args parameter silently unused — coverage.py line 934
build_coverage_cache(root_dir, cases, extra_args=None, n_jobs=None) accepts extra_args (passed from ARG("--") in test.py) but never uses it. The pylint: disable=unused-argument suppresses the warning, but if extra args are needed in the future (e.g., to forward --mpi flags to test runs), this slot is ready. If truly not needed, the parameter could be removed.

8. cons.raw.file manipulation is implementation-fragile — coverage.py lines 913–923
Redirecting console output by directly replacing cons.raw.file relies on internal rich console state. If the console implementation changes this will silently break. A context manager or logging.disable() pattern would be more robust, though this is low priority.

9. FYPP_GCOV_OPTS is set inside a GNU-only block but used unconditionally — CMakeLists.txt line 407
The variable is only set when CMAKE_Fortran_COMPILER_ID STREQUAL "GNU" and MFC_GCov is true; for other compilers it will be empty. CMake treats unset/empty variables as empty strings, so ${FYPP_GCOV_OPTS} in HANDLE_SOURCES safely expands to nothing. This is correct but a comment clarifying the intentional empty-on-non-GNU case would help reviewers.


Pre-merge Checklist

  • Revert duplicate use m_helper_basic in m_bubbles.fpp
  • Restore ALWAYS_RUN_ALL to full production set (macros.fpp, cases.py, case.py, definitions.py, input.py, case_validator.py, case.fpp, coverage.py)
  • Restore ALWAYS_RUN_ALL_PREFIXES to include at least cmake/ and toolchain/
  • Update unit tests to match restored ALWAYS_RUN_ALL (macros.fpp should trigger all again)
  • Confirm partition change in submit.sh is intentional for all CPU jobs

@github-actions
Copy link

github-actions bot commented Mar 2, 2026

Claude Code Review

Head SHA: dbe7098f05f65cc1b71a3130d3756c2822310ff9

Files changed: 16

  • .github/file-filter.yml
  • .github/workflows/frontier/test.sh (+ frontier_amd variant)
  • .github/workflows/phoenix/rebuild-cache.sh (new)
  • .github/workflows/phoenix/submit.sh
  • .github/workflows/phoenix/test.sh
  • .github/workflows/test.yml
  • CMakeLists.txt
  • src/simulation/m_bubbles.fpp
  • toolchain/mfc/cli/commands.py
  • toolchain/mfc/test/case.py
  • toolchain/mfc/test/coverage.py (new, 661 lines)
  • toolchain/mfc/test/test.py
  • toolchain/mfc/test/test_coverage_cache.json.gz (new)
  • toolchain/mfc/test/test_coverage_unit.py (new, 608 lines)

Summary:

  • Adds gcov-based file-level test pruning: builds a coverage map (UUID -> covered .fpp files), then for PRs intersects changed files against that map to run only affected tests
  • New --only-changes / --build-coverage-cache CLI flags with conservative fallbacks (missing cache -> run all, test not in cache -> include)
  • New rebuild-cache CI job on Phoenix that rebuilds and commits the cache when cases.py or Fortran dependency graph changes
  • 53 unit tests cover the core pruning logic
  • Three deliberate TEMP hacks left in to exercise the pruning path in CI (acknowledged in PR description; must be reverted before merge)

Findings

1. TEMP changes must be reverted before merge (3 items)

src/simulation/m_bubbles.fpp:19 — Duplicate use m_helper_basic added intentionally as a dep-change trigger. This will fail precheck / linting and is semantically redundant. Must be removed.

toolchain/mfc/test/coverage.py:604–618ALWAYS_RUN_ALL is stripped to only GPU macro files, and ALWAYS_RUN_ALL_PREFIXES is emptied, so changes to macros.fpp, cases.py, CMakeLists.txt, cmake/ etc. currently do not trigger the full suite. The corresponding unit tests are also adjusted (test_coverage_unit.py:1549, 1725). Both need to be restored before the pruning logic is safe for production.


2. _prepare_test unconditionally applies POST_PROCESS_OUTPUT_PARAMS to all tests

toolchain/mfc/test/coverage.py:904:

case.params.update(get_post_process_mods(case.params))

get_post_process_mods is documented as returning params needed when post_process is a target, and it always returns POST_PROCESS_OUTPUT_PARAMS (sets parallel_io=T, prim_vars_wrt=T, etc.). This is applied to every test case in the coverage builder, even tests that never use post_process. Two consequences:

  • Tests run with different parameters than their golden-file configurations, potentially changing code paths exercised and making the coverage map inaccurate.
  • Tests that explicitly set parallel_io=F will have that overridden.

The application should be conditional on whether the test case includes post_process as a target, mirroring the generated case.py logic it references.


3. Direct git push to master from CI with contents: write

.github/workflows/test.yml:224–231:

permissions:
  contents: write
...
git push

The rebuild-cache job commits directly to master on push events, bypassing PR review, branch protection rules, and pre-commit hooks. This is an intentional design choice, but worth confirming it is compatible with branch protection settings on the repo (e.g., required_status_checks, require_pull_request_reviews). If branch protection is enforced, the push will silently fail. The job is gated to github.repository == 'MFlowCode/MFC' which is correct. The [skip ci] commit message suffix should also match whatever skip-CI convention is active.


4. exec()-based fallback import in unit tests

toolchain/mfc/test/test_coverage_unit.py:~183:

exec(compile(_src, _COVERAGE_PATH, "exec"), _globals)

The fallback path string-replaces import lines in coverage.py source then execs it. This is fragile: any change to the exact spelling of an import statement silently breaks the fallback (the except ImportError: pass swallows it). If the importlib approach works reliably when the package is properly installed, the fallback can be removed. If both paths are needed, the fallback should at least raise on failure rather than silently substituting partial stubs.


Nit

phoenix/test.sh — inside $(( )), an unset variable expands to 0, not the shell default. The expression $(( SLURM_CPUS_ON_NODE > 64 ? 64 : ${SLURM_CPUS_ON_NODE:-8} )) will evaluate to 64 when SLURM_CPUS_ON_NODE is unset (because 0 > 64 is false, but the else branch ${SLURM_CPUS_ON_NODE:-8} is expanded before the arithmetic, so it actually is 8 there — but this mixing of shell expansion and arithmetic is confusing). Clearer: ncpus=${SLURM_CPUS_ON_NODE:-8}; n_test_threads=$(( ncpus > 64 ? 64 : ncpus )).

@sbryngelson sbryngelson marked this pull request as ready for review March 2, 2026 03:49
Copilot AI review requested due to automatic review settings March 2, 2026 03:49
@qodo-code-review
Copy link
Contributor

Review Summary by Qodo

Add gcov-based test pruning with file-level coverage cache and CI integration

✨ Enhancement 🧪 Tests

Grey Divider

Walkthroughs

Description
• Add gcov-based file-level coverage cache for intelligent test pruning
  - 555 test UUIDs mapped to exercised .fpp source files (11KB gzip JSON)
  - --only-changes flag prunes tests by intersecting PR-changed files against cache
  - --build-coverage-cache flag with 3-phase parallel builder (prepare, run, gcov collect)
• New rebuild-cache CI job on Phoenix via SLURM when cases.py or Fortran dependencies change
  - Dependency-change detection greps PR/push diffs for added use/include statements
  - Conservative fallbacks: missing cache runs all, missing sim coverage includes test
• Extract post_process output params into shared constants for coverage builder consistency
• Add 53 unit tests covering core coverage logic with offline mocks
• Fix CI auto-cancellation by removing continue-on-error from github test job
• Update CMake gcov build flags: disable -march=native and LTO to avoid assembler errors
Diagram
flowchart LR
  A["Build with --gcov"] -->|Phase 1| B["Prepare Tests"]
  B -->|Phase 2| C["Run Tests Isolated"]
  C -->|Phase 3| D["Collect Coverage"]
  D --> E["Cache: UUID → Files"]
  E --> F["PR: Get Changed Files"]
  F --> G["Filter Tests"]
  G --> H["Run Only Affected"]
Loading

Grey Divider

File Changes

1. toolchain/mfc/test/coverage.py ✨ Enhancement +661/-0

New file-level gcov coverage cache builder and filter

toolchain/mfc/test/coverage.py


2. toolchain/mfc/test/test_coverage_unit.py 🧪 Tests +608/-0

53 unit tests for coverage logic with offline mocks

toolchain/mfc/test/test_coverage_unit.py


3. toolchain/mfc/test/case.py ✨ Enhancement +42/-18

Extract post_process params into shared constants

toolchain/mfc/test/case.py


View more (11)
4. toolchain/mfc/test/test.py ✨ Enhancement +66/-1

Integrate --only-changes and --build-coverage-cache flags

toolchain/mfc/test/test.py


5. toolchain/mfc/cli/commands.py ✨ Enhancement +25/-0

Add CLI arguments for coverage cache building and pruning

toolchain/mfc/cli/commands.py


6. .github/workflows/test.yml ⚙️ Configuration changes +122/-8

Add rebuild-cache job and coverage artifact download to CI

.github/workflows/test.yml


7. .github/workflows/phoenix/rebuild-cache.sh ⚙️ Configuration changes +21/-0

New SLURM script to build coverage cache on Phoenix

.github/workflows/phoenix/rebuild-cache.sh


8. .github/workflows/phoenix/submit.sh ⚙️ Configuration changes +3/-3

Update SLURM partition to cpu-gnr with exclusive access

.github/workflows/phoenix/submit.sh


9. .github/workflows/phoenix/test.sh ⚙️ Configuration changes +10/-2

Add --only-changes flag for PR test pruning

.github/workflows/phoenix/test.sh


10. .github/workflows/frontier/test.sh ⚙️ Configuration changes +8/-2

Add --only-changes flag for PR test pruning

.github/workflows/frontier/test.sh


11. .github/workflows/frontier_amd/test.sh ⚙️ Configuration changes +8/-2

Add --only-changes flag for PR test pruning

.github/workflows/frontier_amd/test.sh


12. .github/file-filter.yml ⚙️ Configuration changes +3/-0

Add cases.py to file change detection filters

.github/file-filter.yml


13. CMakeLists.txt 🐞 Bug fix +26/-11

Disable -march=native and LTO for gcov builds, add Fypp line markers

CMakeLists.txt


14. src/simulation/m_bubbles.fpp Miscellaneous +1/-0

Duplicate use statement for testing cache rebuild pipeline

src/simulation/m_bubbles.fpp


Grey Divider

Qodo Logo

@qodo-code-review
Copy link
Contributor

qodo-code-review bot commented Mar 2, 2026

Code Review by Qodo

🐞 Bugs (4) 📘 Rule violations (0) 📎 Requirement gaps (0)

Grey Divider


Action required

1. Non-fpp PR skips tests 🐞 Bug ✓ Correctness
Description
With --only-changes, the filter path skips *all* tests when no .fpp files changed; CI now
enables --only-changes for PRs, so many PRs (CMake/Python/workflow/.f90) can end up with zero
functional tests. This is made worse by the TEMP-reduced ALWAYS_RUN_ALL/prefixes that currently
won’t force a full run for infrastructure changes.
Code

toolchain/mfc/test/test.py[R111-145]

+    # --only-changes: filter based on file-level gcov coverage
+    if ARG("only_changes"):
+        from .coverage import (  # pylint: disable=import-outside-toplevel
+            load_coverage_cache, get_changed_files,
+            should_run_all_tests, filter_tests_by_coverage,
+        )
+
+        cache = load_coverage_cache(common.MFC_ROOT_DIR)
+        if cache is None:
+            cons.print("[yellow]Coverage cache missing or stale.[/yellow]")
+            cons.print("[yellow]Run: ./mfc.sh build --gcov -j 8 && ./mfc.sh test --build-coverage-cache[/yellow]")
+            cons.print("[yellow]Falling back to full test suite.[/yellow]")
+        else:
+            changed_files = get_changed_files(common.MFC_ROOT_DIR, ARG("changes_branch"))
+
+            if changed_files is None:
+                cons.print("[yellow]git diff failed — falling back to full test suite.[/yellow]")
+            elif should_run_all_tests(changed_files):
+                cons.print()
+                cons.print("[bold cyan]Coverage Change Analysis[/bold cyan]")
+                cons.print("-" * 50)
+                cons.print("[yellow]Infrastructure or macro file changed — running full test suite.[/yellow]")
+                cons.print("-" * 50)
+            else:
+                changed_fpp = {f for f in changed_files if f.endswith(".fpp")}
+                if not changed_fpp:
+                    cons.print()
+                    cons.print("[bold cyan]Coverage Change Analysis[/bold cyan]")
+                    cons.print("-" * 50)
+                    cons.print("[green]No .fpp source changes detected — skipping all tests.[/green]")
+                    cons.print("-" * 50)
+                    cons.print()
+                    skipped_cases += cases
+                    cases = []
+                else:
Evidence
--only-changes explicitly skips all tests when changed_fpp is empty; CI injects --only-changes
for PRs, and CI’s change filter (checkall) includes many non-.fpp changes
(python/cmake/yml/.f90) that will still trigger test jobs but then run zero tests. Meanwhile, the
escape hatch list meant to force full runs is explicitly TEMP-stripped in this PR.

toolchain/mfc/test/test.py[111-145]
.github/workflows/test.yml[255-281]
.github/file-filter.yml[4-38]
toolchain/mfc/test/coverage.py[39-56]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`--only-changes` currently skips the entire suite when there are no `.fpp` changes, but CI now enables this flag for PRs. This can result in CI running **zero functional tests** for many PRs (Python/toolchain, CMake, workflow changes, `.f90` changes) unless those paths are explicitly classified as “always run all”.

### Issue Context
- CI’s `checkall` filter triggers test jobs for many non-`.fpp` changes.
- The pruning logic only looks at `.fpp` changes and explicitly empties the test list when there are none.
- The intended escape hatch list is TEMP-stripped in this PR.

### Fix Focus Areas
- toolchain/mfc/test/test.py[111-145]
- toolchain/mfc/test/coverage.py[39-56]
- .github/workflows/test.yml[255-281]
- .github/file-filter.yml[4-38]

### Suggested direction
- In `__filter`, if `--only-changes` is set and there are **no relevant source changes**, default to a safe behavior (e.g., run full suite, or run a minimal smoke subset) unless the change-set is clearly “docs-only”.
- Restore/expand `ALWAYS_RUN_ALL`/`ALWAYS_RUN_ALL_PREFIXES` to include infrastructure inputs that can invalidate the cache or the toolchain itself (e.g., `CMakeLists.txt`, `toolchain/**`, `.github/workflows/**`, etc.).
- Consider updating CI to set `ONLY_CHANGES` only when the PR actually changes Fortran sources (or when a fresh cache artifact is present).

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


2. Cache staleness too weak 🐞 Bug ✓ Correctness
Description
Cache validity is only checked via cases.py hash, so dependency-graph/build changes can leave a
stale cache that still “looks valid” and causes incorrect pruning. CI will still pass
--only-changes even when the cache rebuild job is skipped or fails to provide a new artifact.
Code

toolchain/mfc/test/coverage.py[R520-549]

+def load_coverage_cache(root_dir: str) -> Optional[dict]:
+    """
+    Load the coverage cache, returning None if missing or stale.
+
+    Staleness is detected by comparing the SHA256 of cases.py at cache-build time
+    against the current cases.py. Auto-converts old line-level format if needed.
+    """
+    if not COVERAGE_CACHE_PATH.exists():
+        return None
+
+    try:
+        with gzip.open(COVERAGE_CACHE_PATH, "rt", encoding="utf-8") as f:
+            cache = json.load(f)
+    except (OSError, gzip.BadGzipFile, json.JSONDecodeError, UnicodeDecodeError):
+        cons.print("[yellow]Warning: Coverage cache is unreadable or corrupt.[/yellow]")
+        return None
+
+    cases_py = Path(root_dir) / "toolchain/mfc/test/cases.py"
+    try:
+        current_hash = hashlib.sha256(cases_py.read_bytes()).hexdigest()
+    except FileNotFoundError:
+        cons.print("[yellow]Warning: cases.py not found; cannot verify cache staleness.[/yellow]")
+        return None
+    stored_hash = cache.get("_meta", {}).get("cases_hash", "")
+
+    if current_hash != stored_hash:
+        cons.print("[yellow]Warning: Coverage cache is stale (cases.py changed).[/yellow]")
+        return None
+
+    return _normalize_cache(cache)
Evidence
The toolchain accepts the cache as long as cases.py hash matches; it does not incorporate
dependency graph signatures or relevant source/build metadata. Meanwhile CI enables --only-changes
for PRs and only downloads a rebuilt cache artifact when rebuild-cache succeeds, so failure/skip
cases can still run with a stale committed cache and prune tests incorrectly.

toolchain/mfc/test/coverage.py[524-549]
.github/workflows/test.yml[95-136]
.github/workflows/test.yml[190-197]
.github/workflows/test.yml[255-281]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
The coverage cache can become stale due to Fortran dependency graph changes or other build/source changes without modifying `cases.py`. Because `load_coverage_cache` only checks `cases.py` hash, the toolchain may treat a stale cache as valid and incorrectly prune tests.

### Issue Context
- CI attempts to rebuild cache on `dep_changed`, but `--only-changes` is still enabled on PRs regardless, and cache downloads are best-effort.
- If rebuild fails or the artifact is missing, CI may still prune using an older committed cache that is no longer correct.

### Fix Focus Areas
- toolchain/mfc/test/coverage.py[520-549]
- .github/workflows/test.yml[95-149]
- .github/workflows/test.yml[190-197]
- .github/workflows/test.yml[255-281]

### Suggested direction
- Add additional `_meta` fields to the cache (e.g., hash of a dependency-graph input, or a hash of relevant Fortran source/include trees) and validate them in `load_coverage_cache`.
- In CI, if `dep_changed == true`, only enable `--only-changes` when the rebuilt cache artifact was successfully produced and downloaded; otherwise force full suite (or omit `--only-changes`).
- Consider making artifact download non-optional (remove `continue-on-error`) when pruning is enabled based on it.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


3. Duplicate USE statement 🐞 Bug ✓ Correctness
Description
m_bubbles.fpp now contains a duplicate use m_helper_basic, which should be reverted before merge
as noted in the PR description. Leaving it in can introduce compiler warnings or stricter build
failures depending on flags.
Code

src/simulation/m_bubbles.fpp[19]

+    use m_helper_basic
Evidence
The module imports m_helper_basic twice consecutively in the same scope.

src/simulation/m_bubbles.fpp[16-20]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`src/simulation/m_bubbles.fpp` has a duplicate `use m_helper_basic` statement.

### Issue Context
The PR description marks this as TEMP to exercise CI dep-change detection; it should not be merged.

### Fix Focus Areas
- src/simulation/m_bubbles.fpp[16-20]ുഞ്ഞ

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools



Remediation recommended

4. Empty coverage causes skips 🐞 Bug ⛯ Reliability
Description
The cache builder records empty coverage for tests where gcov collection fails, and pruning treats
that as definitive “covers nothing,” which can incorrectly skip tests for non-simulation changes.
Only simulation-file changes have a conservative include fallback for missing simulation coverage.
Code

toolchain/mfc/test/coverage.py[R640-652]

+        test_file_set = set(test_files)
+
+        # If simulation files changed but this test has no simulation coverage,
+        # include it conservatively — the cache build likely failed for this test.
+        if changed_sim and not any(f.startswith("src/simulation/") for f in test_file_set):
+            to_run.append(case)
+            n_no_sim_coverage += 1
+            continue
+
+        if test_file_set & changed_fpp:
+            to_run.append(case)
+        else:
+            skipped.append(case)
Evidence
Cache build stores coverage lists even when empty, only warning if many tests are missing coverage.
During pruning, an empty list becomes an empty set and will not intersect changed_fpp, so the test
is skipped unless the special simulation-only fallback applies.

toolchain/mfc/test/coverage.py[470-488]
toolchain/mfc/test/coverage.py[640-653]
toolchain/mfc/test/coverage.py[642-647]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
A partially broken cache build can silently produce empty coverage lists for some tests. The pruning logic then interprets empty coverage as “this test doesn’t touch any changed file,” which can skip tests incorrectly.

### Issue Context
- Cache build accepts partial results (warns only when coverage is sparse).
- Pruning only has a conservative fallback for missing **simulation** coverage.

### Fix Focus Areas
- toolchain/mfc/test/coverage.py[456-488]
- toolchain/mfc/test/coverage.py[640-653]

### Suggested direction
- Track collection failures explicitly (e.g., store `None` or a flag for tests where gcov failed) and include those tests conservatively when any relevant source changes occur.
- Expand the conservative fallback beyond `src/simulation/` to other critical areas (`src/common/`, `src/pre_process/`, `src/post_process/`).
- Optionally: make cache build fail if any test has zero `.gcda`/coverage, if correctness is more important than best-effort caching.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Grey Divider

ⓘ The new review experience is currently in Beta. Learn more

Grey Divider

Qodo Logo

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a gcov-derived, file-level coverage cache to enable pruning the MFC test suite to only those tests that exercise files changed in a PR, with CI support to rebuild and distribute the cache when it becomes stale.

Changes:

  • Introduces toolchain/mfc/test/coverage.py (cache build/load + changed-file detection + coverage-based test filtering) and commits a gzipped JSON cache.
  • Integrates new CLI flags --only-changes and --build-coverage-cache into the test runner and CLI schema.
  • Updates CI workflows/scripts to rebuild the cache on dependency graph changes and to run pruned test suites on PRs.

Reviewed changes

Copilot reviewed 14 out of 16 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
toolchain/mfc/test/coverage.py Implements coverage cache build/load + changed-file diffing + pruning logic (ALWAYS_RUN_ALL safeguards).
toolchain/mfc/test/test_coverage_unit.py Adds offline unit tests covering diff parsing, gcov JSON parsing, cache normalization, and pruning behavior.
toolchain/mfc/test/test.py Wires --only-changes pruning and --build-coverage-cache cache generation into the test command.
toolchain/mfc/cli/commands.py Adds CLI arguments/examples for coverage cache building and pruning.
toolchain/mfc/test/case.py Factors post-process output parameter mods into reusable constants/helpers used by the cache builder.
toolchain/mfc/test/test_coverage_cache.json.gz Adds committed baseline coverage cache artifact.
.github/workflows/test.yml Adds rebuild-cache job + dependency-change detection + artifact flow; enables pruning on PRs.
.github/workflows/phoenix/rebuild-cache.sh SLURM-executed rebuild script: clean, gcov build, build cache in parallel.
.github/workflows/phoenix/test.sh Enables pruning on PRs and scales CPU thread count based on allocation.
.github/workflows/phoenix/submit.sh Adjusts CPU SLURM submission options for cache rebuild workload.
.github/workflows/frontier/test.sh Enables pruning on PRs.
.github/workflows/frontier_amd/test.sh Enables pruning on PRs.
.github/file-filter.yml Adds cases_py filter to trigger cache rebuilds when test definitions change.
CMakeLists.txt Improves gcov build reliability (disable -march=native/LTO for gcov; add Fypp line-marker flag).
.gitignore Ignores legacy raw (non-gz) cache file.
src/simulation/m_bubbles.fpp TEMP change: duplicated use for pipeline exercise.

Comment on lines +41 to +56
# toolchain files define or change the set of tests themselves.
# TEMP: stripped to GPU macros only so CI exercises the pruning logic.
# Restore full list before merge.
ALWAYS_RUN_ALL = frozenset([
"src/common/include/parallel_macros.fpp",
"src/common/include/acc_macros.fpp",
"src/common/include/omp_macros.fpp",
"src/common/include/shared_parallel_macros.fpp",
])

# Directory prefixes: any changed file under these paths triggers full suite.
# Note: src/simulation/include/ (.fpp files like inline_riemann.fpp) is NOT
# listed here — Fypp line markers (--line-marker-format=gfortran5) correctly
# attribute included file paths, so gcov coverage tracks them accurately.
# TEMP: cmake/ prefix removed to exercise pruning in CI. Restore before merge.
ALWAYS_RUN_ALL_PREFIXES = ()
Copy link

Copilot AI Mar 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ALWAYS_RUN_ALL / ALWAYS_RUN_ALL_PREFIXES are explicitly marked TEMP and currently exclude toolchain/build-system paths. If this ships, --only-changes can incorrectly skip tests for changes to CMake/toolchain/CI files, creating false-green CI. Restore the full ALWAYS_RUN_ALL list and prefixes before merge (as described in the PR description).

Suggested change
# toolchain files define or change the set of tests themselves.
# TEMP: stripped to GPU macros only so CI exercises the pruning logic.
# Restore full list before merge.
ALWAYS_RUN_ALL = frozenset([
"src/common/include/parallel_macros.fpp",
"src/common/include/acc_macros.fpp",
"src/common/include/omp_macros.fpp",
"src/common/include/shared_parallel_macros.fpp",
])
# Directory prefixes: any changed file under these paths triggers full suite.
# Note: src/simulation/include/ (.fpp files like inline_riemann.fpp) is NOT
# listed here — Fypp line markers (--line-marker-format=gfortran5) correctly
# attribute included file paths, so gcov coverage tracks them accurately.
# TEMP: cmake/ prefix removed to exercise pruning in CI. Restore before merge.
ALWAYS_RUN_ALL_PREFIXES = ()
# toolchain / build-system files define or change the set of tests themselves.
ALWAYS_RUN_ALL = frozenset([
# GPU macro files: coverage does not see directive-only changes.
"src/common/include/parallel_macros.fpp",
"src/common/include/acc_macros.fpp",
"src/common/include/omp_macros.fpp",
"src/common/include/shared_parallel_macros.fpp",
# Core build configuration.
"CMakeLists.txt",
])
# Directory prefixes: any changed file under these paths triggers full suite.
# Note: src/simulation/include/ (.fpp files like inline_riemann.fpp) is NOT
# listed here — Fypp line markers (--line-marker-format=gfortran5) correctly
# attribute included file paths, so gcov coverage tracks them accurately.
ALWAYS_RUN_ALL_PREFIXES = (
# CMake modules and support scripts.
"cmake/",
# Python toolchain (build, test harness, case validator, etc.).
"toolchain/",
# CI configuration directories.
".github/",
".gitlab/",
".circleci/",
)

Copilot uses AI. Check for mistakes.
Comment on lines +224 to +232
# TEMP: macros.fpp, cases.py, case.py, definitions.py, input.py,
# case_validator.py, case.fpp, coverage.py, cmake/ removed from
# ALWAYS_RUN_ALL to exercise pruning in CI. Restore before merge.

def test_macros_fpp_does_not_trigger_all(self):
assert should_run_all_tests(
{"src/common/include/macros.fpp"}
) is False

Copy link

Copilot AI Mar 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The TEMP expectations here (e.g., macros.fpp not triggering should_run_all_tests, cases.py removed from ALWAYS_RUN_ALL) encode behavior that the PR description says must be reverted before merge. These tests will either fail once ALWAYS_RUN_ALL is restored, or worse, lock in unsafe pruning rules if they stay. Please remove/adjust the TEMP tests when restoring the full ALWAYS_RUN_ALL set/prefixes.

Suggested change
# TEMP: macros.fpp, cases.py, case.py, definitions.py, input.py,
# case_validator.py, case.fpp, coverage.py, cmake/ removed from
# ALWAYS_RUN_ALL to exercise pruning in CI. Restore before merge.
def test_macros_fpp_does_not_trigger_all(self):
assert should_run_all_tests(
{"src/common/include/macros.fpp"}
) is False

Copilot uses AI. Check for mistakes.

# pylint: disable=too-many-branches, too-many-statements, trailing-whitespace
# pylint: disable=too-many-branches,too-many-locals,too-many-statements,trailing-whitespace
def __filter(cases_) -> typing.List[TestCase]:
Copy link

Copilot AI Mar 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

__filter is annotated as returning typing.List[TestCase], but it actually returns a tuple (selected_cases, skipped_cases). Please fix the return type annotation to match the real return value (or adjust the function to return only the list).

Suggested change
def __filter(cases_) -> typing.List[TestCase]:
def __filter(cases_) -> typing.Tuple[typing.List[TestCase], typing.List[TestCase]]:

Copilot uses AI. Check for mistakes.
DIFF=""
fi
if echo "$DIFF" | \
grep -qP '^\+\s*(use[\s,]+\w|#:include\s|include\s+['"'"'"])'; then
Copy link

Copilot AI Mar 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dependency-change detector comment says it detects added/removed use/include, but the grep only matches added lines (^+). Removals (or modifications where a use is deleted) can also change the dependency graph and should trigger a cache rebuild. Consider matching both ^+ and ^- (while excluding diff headers like +++/---).

Suggested change
grep -qP '^\+\s*(use[\s,]+\w|#:include\s|include\s+['"'"'"])'; then
grep -qP '^[+-]\s*(use[\s,]+\w|#:include\s|include\s+['"'"'"])'; then

Copilot uses AI. Check for mistakes.
Comment on lines +527 to +531
if not COVERAGE_CACHE_PATH.exists():
return None

try:
with gzip.open(COVERAGE_CACHE_PATH, "rt", encoding="utf-8") as f:
Copy link

Copilot AI Mar 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

load_coverage_cache(root_dir) ignores its root_dir argument when locating the cache file because COVERAGE_CACHE_PATH is computed from common.MFC_ROOT_DIR at import time. This makes the function harder to reuse/test with alternate roots and is inconsistent with the root_dir parameter (used for staleness checks). Consider computing the cache path from root_dir inside load_coverage_cache (or make the function take an explicit cache_path).

Suggested change
if not COVERAGE_CACHE_PATH.exists():
return None
try:
with gzip.open(COVERAGE_CACHE_PATH, "rt", encoding="utf-8") as f:
cache_path = Path(root_dir) / "toolchain/mfc/test/test_coverage_cache.json.gz"
if not cache_path.exists():
return None
try:
with gzip.open(cache_path, "rt", encoding="utf-8") as f:

Copilot uses AI. Check for mistakes.
Comment on lines +400 to +401
# TEMP: test_non_fpp_always_run_all_detected removed (cases.py not
# in ALWAYS_RUN_ALL during CI pruning test). Restore before merge.
Copy link

Copilot AI Mar 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comments below indicate TEMP removal of ALWAYS_RUN_ALL coverage for non-.fpp files (like cases.py). As written, this reduces safety (toolchain changes may skip tests) and conflicts with the intended conservative fallback rules. Please restore the non-.fpp ALWAYS_RUN_ALL coverage and reintroduce/update the removed test case(s).

Suggested change
# TEMP: test_non_fpp_always_run_all_detected removed (cases.py not
# in ALWAYS_RUN_ALL during CI pruning test). Restore before merge.
def test_non_fpp_always_run_all_detected(self):
"""
Non-.fpp toolchain files like cases.py are part of ALWAYS_RUN_ALL.
Changing them must force should_run_all_tests() to return True so
that toolchain logic changes never silently skip tests.
"""
changed = {"toolchain/mfc/cases.py"}
assert should_run_all_tests(changed) is True

Copilot uses AI. Check for mistakes.
use m_variables_conversion !< State variables type conversion procedures

use m_helper_basic !< Functions to compare floating point numbers
use m_helper_basic
Copy link

Copilot AI Mar 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This introduces a duplicate use m_helper_basic statement. Even if accepted by some compilers, it is redundant noise and (per the PR description) is a TEMP change that should be reverted before merge.

Suggested change
use m_helper_basic

Copilot uses AI. Check for mistakes.
Comment on lines +80 to +84
if [ "${{ github.event_name }}" = "pull_request" ]; then
DIFF=$(gh pr diff ${{ github.event.pull_request.number }})
elif [ "${{ github.event_name }}" = "push" ]; then
DIFF=$(git diff ${{ github.event.before }}..${{ github.event.after }} 2>/dev/null || echo "")
else
Copy link

Copilot AI Mar 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the push event path, git diff ${{ github.event.before }}..${{ github.event.after }} can fail with the default shallow checkout (fetch-depth=1), yielding an empty DIFF and missing real dependency-graph changes. Consider fetching the required commits (e.g., set checkout fetch-depth: 0 / fetch the before SHA) or using the GitHub API/gh to get the push diff reliably.

Copilot uses AI. Check for mistakes.
# Override Release -O3 with -O1 for gcov: coverage instrumentation is
# inaccurate at -O3, and aggressive codegen (e.g. AVX-512 FP16 on
# Granite Rapids) can emit instructions that older assemblers reject.
set(CMAKE_Fortran_FLAGS_RELEASE "-O1 -DNDEBUG" CACHE STRING "" FORCE)
Copy link

Copilot AI Mar 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For gcov builds, this force-overrides CMAKE_Fortran_FLAGS_RELEASE in the cache. That can unexpectedly discard user/toolchain-provided Release flags (and makes it harder to compose flags in downstream builds). Prefer appending or conditionally adjusting optimization flags (e.g., replace -O3 with -O1) rather than clobbering the entire Release flag string.

Suggested change
set(CMAKE_Fortran_FLAGS_RELEASE "-O1 -DNDEBUG" CACHE STRING "" FORCE)
# Adjust existing flags instead of clobbering user/toolchain settings.
set(_mfc_orig_ftn_release_flags "${CMAKE_Fortran_FLAGS_RELEASE}")
if(_mfc_orig_ftn_release_flags MATCHES "-O[0-9]")
string(REGEX REPLACE "-O[0-9]" "-O1" CMAKE_Fortran_FLAGS_RELEASE "${_mfc_orig_ftn_release_flags}")
else()
set(CMAKE_Fortran_FLAGS_RELEASE "${_mfc_orig_ftn_release_flags} -O1")
endif()
if(NOT CMAKE_Fortran_FLAGS_RELEASE MATCHES "-DNDEBUG")
set(CMAKE_Fortran_FLAGS_RELEASE "${CMAKE_Fortran_FLAGS_RELEASE} -DNDEBUG")
endif()

Copilot uses AI. Check for mistakes.
@github-actions
Copy link

github-actions bot commented Mar 2, 2026

Claude Code Review

Head SHA: dbe7098f05f65cc1b71a3130d3756c2822310ff9
Files changed: 16

Changed files
  • .github/file-filter.yml
  • .github/workflows/frontier/test.sh
  • .github/workflows/frontier_amd/test.sh
  • .github/workflows/phoenix/rebuild-cache.sh
  • .github/workflows/phoenix/submit.sh
  • .github/workflows/phoenix/test.sh
  • .github/workflows/test.yml
  • .gitignore
  • CMakeLists.txt
  • src/simulation/m_bubbles.fpp
  • toolchain/mfc/cli/commands.py
  • toolchain/mfc/test/case.py
  • toolchain/mfc/test/coverage.py
  • toolchain/mfc/test/test.py
  • toolchain/mfc/test/test_coverage_cache.json.gz
  • toolchain/mfc/test/test_coverage_unit.py

Summary

  • Adds a gcov-based file-level test pruning system: a gzipped JSON cache maps each test UUID to the .fpp source files it executes
  • New --only-changes flag intersects PR-changed .fpp files against the cache and skips unaffected tests
  • New --build-coverage-cache flag drives a 3-phase parallel cache builder (prepare → run → gcov collect)
  • New rebuild-cache CI job regenerates the cache on Phoenix when cases.py or Fortran dependency graph changes
  • 53 unit tests cover the core pruning logic; coverage.py is well-structured with conservative fallbacks

Findings

1. SECURITY — Self-hosted runner checkout of untrusted PR head SHA with write permissions (BLOCKER)

File: .github/workflows/test.yml, rebuild-cache job (lines ~200–232)

The rebuild-cache job:

  • Checks out untrusted fork code (github.event.pull_request.head.sha) from a PR author
  • Runs it on the self-hosted Phoenix runner (runs-on: group: phoenix, labels: gt)
  • Has permissions: contents: write

A malicious PR could inject arbitrary code into rebuild-cache.sh, submit.sh, or CMakeLists.txt and have it executed on the self-hosted runner with repository write access. This is a well-documented attack pattern for pull_request_target + self-hosted runners.

Fix: Either (a) only trigger rebuild-cache on push/workflow_dispatch events (not pull_request), (b) use a separate trusted workflow triggered by workflow_run after the PR is approved, or (c) ensure the checkout always uses github.sha (base branch code), never the fork head SHA.


2. MUST REVERT — Multiple TEMP changes bypass the safety net (BLOCKER before merge)

The PR body lists these as temporary and the test plan says "Revert TEMP changes before merge", but they are present in this diff:

a) Duplicate use m_helper_basic in src/simulation/m_bubbles.fpp (line 419 of diff)
Explicitly a TEMP change to trigger dep-change detection in CI. Dead code that must not be merged.

b) ALWAYS_RUN_ALL stripped to GPU macros only (toolchain/mfc/test/coverage.py, lines 606–618):

ALWAYS_RUN_ALL = frozenset([
    # Only 4 GPU macro files; missing: macros.fpp, cases.py, case.py,
    # definitions.py, input.py, case_validator.py, cmake/ ...
])
ALWAYS_RUN_ALL_PREFIXES = ()  # TEMP: cmake/ removed

After revert, the full list must be restored (src/common/include/macros.fpp, toolchain/mfc/test/cases.py, cmake/ prefix, etc.). Unit tests for these (e.g., test_macros_fpp_does_not_trigger_all asserting False) must also flip back.


3. dep-change detection grep pattern is too broad

File: .github/workflows/test.yml, line ~172

The grep pattern '^\+\s*(use[\s,]+\w|#:include\s|include\s+['"'"'"])' matches any new line starting with use <identifier> in any file in the PR diff — including Python, YAML, and Markdown. A Python PR adding use_my_var = True would false-positive trigger a 4-hour cache rebuild. The pattern should gate on file extension (.fpp or .f90) or filter the diff to Fortran files first.


4. rebuild-cache blocks all downstream test jobs even when not triggered

File: .github/workflows/test.yml, lines ~238–244 and ~295–303

Both github and self jobs now need: [rebuild-cache] with an always() guard. If rebuild-cache isn't triggered (it's conditional), GitHub Actions treats a skipped job as result == 'skipped', which is handled by != 'cancelled' correctly. However, when rebuild-cache is triggered and runs the full 240-minute SLURM job, all downstream test jobs block for up to 4 hours. Consider whether the github matrix (Ubuntu/macOS) truly needs to wait for the Phoenix cache before running.


5. New .fpp files silently skip all existing tests

File: toolchain/mfc/test/coverage.py, lines ~1181–1183

When a brand-new .fpp source file is added (no existing test covers it in the cache), all existing tests are skipped because no coverage entry contains the new file path. This is acknowledged in test_new_fpp_file_no_coverage_skips, but means a PR adding a new Fortran module will pass CI with zero simulation tests run (assuming no new test UUID is added). The conservative fallback for tests-not-in-cache helps for new test UUIDs, but not for new files covered only by existing tests. Consider emitting a warning when 0 tests are selected.


6. CMakeLists.txt: CACHE STRING "" FORCE silently overrides user-set flags

File: CMakeLists.txt, line ~361

set(CMAKE_Fortran_FLAGS_RELEASE "-O1 -DNDEBUG" CACHE STRING "" FORCE)

FORCE unconditionally overwrites any -DCMAKE_Fortran_FLAGS_RELEASE=... the user may have passed. This is the right approach for the gcov use-case, but the behavior is non-obvious. At minimum expand the comment to note that user-passed flags are intentionally overridden for coverage builds.


Minor / Non-blocking

  • toolchain/mfc/test/test.py line ~1302: The --build-coverage-cache path calls build() for each unique slug regardless of --no-build. If intentional, document it.
  • coverage.py line ~967: max(os.cpu_count() or 1, 1) — the outer max(..., 1) is redundant since or 1 already guarantees ≥ 1.
  • test.py lines ~1265–1273: Caller pre-checks not changed_fpp before calling filter_tests_by_coverage, which also checks it internally. Harmless duplication but produces two different console messages for the same condition.

Overall: The core coverage pruning logic is sound, well-tested, and the conservative fallbacks (missing cache → run all, missing sim coverage → include conservatively) are appropriate. The self-hosted runner security concern in the CI workflow must be resolved before merge, and the TEMP changes must be reverted as noted in the PR checklist.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 2, 2026

📝 Walkthrough

Walkthrough

This pull request implements file-level gcov-based coverage analysis for selective test execution. It introduces a new coverage.py module that builds coverage caches during CI, enabling PR workflows to run only tests affected by changed files. Supporting changes include: new CLI arguments (--build-coverage-cache, --only-changes), workflow updates across Frontier and Phoenix to apply pruning logic, a cache rebuild script, CMakeLists.txt gcov instrumentation handling, and CI pipeline modifications in test.yml to download/upload coverage artifacts. The changes maintain backward compatibility while adding test filtering capabilities for improved CI efficiency.

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 58.21% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check ❓ Inconclusive The description covers key aspects (features, test plan, temporary changes) but lacks explicit sections matching the template structure (Type of change, Testing details, Checklist items). Reorganize description to match template sections: add Type of change, expand Testing section with specific test results, and complete the Checklist with explicit checkboxes.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely summarizes the main change: adding gcov-based test pruning with a coverage cache mechanism.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (3)
src/simulation/m_bubbles.fpp (1)

19-19: Remove duplicate module import.

Line 19 repeats use m_helper_basic already declared on Line 18. This is redundant and can cause avoidable compiler warnings/noise across toolchains.

Proposed fix
-    use m_helper_basic
.github/workflows/phoenix/test.sh (1)

54-56: Potential issue with unset variable in arithmetic comparison.

When SLURM_CPUS_ON_NODE is unset, the comparison SLURM_CPUS_ON_NODE > 64 evaluates the unset variable as 0, but the default :-8 is only applied in the else branch. This works but is fragile. Consider applying the default first:

Suggested improvement
-# Use up to 64 parallel test threads on CPU (GNR nodes have 192 cores).
-# Cap at 64 to avoid overwhelming MPI's ORTE daemons with concurrent launches.
-n_test_threads=$(( SLURM_CPUS_ON_NODE > 64 ? 64 : ${SLURM_CPUS_ON_NODE:-8} ))
+# Use up to 64 parallel test threads on CPU (GNR nodes have 192 cores).
+# Cap at 64 to avoid overwhelming MPI's ORTE daemons with concurrent launches.
+_cpus=${SLURM_CPUS_ON_NODE:-8}
+n_test_threads=$(( _cpus > 64 ? 64 : _cpus ))
toolchain/mfc/test/test_coverage_unit.py (1)

74-75: Use underscore-prefixed stub args to keep lint clean.

Optional cleanup to silence ARG005 without changing behavior.

♻️ Suggested tweak
-_case_stub.input_bubbles_lagrange = lambda case: None
-_case_stub.get_post_process_mods = lambda params: {}
+_case_stub.input_bubbles_lagrange = lambda _case: None
+_case_stub.get_post_process_mods = lambda _params: {}

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4ee892c and dbe7098.

⛔ Files ignored due to path filters (1)
  • toolchain/mfc/test/test_coverage_cache.json.gz is excluded by !**/*.gz
📒 Files selected for processing (15)
  • .github/file-filter.yml
  • .github/workflows/frontier/test.sh
  • .github/workflows/frontier_amd/test.sh
  • .github/workflows/phoenix/rebuild-cache.sh
  • .github/workflows/phoenix/submit.sh
  • .github/workflows/phoenix/test.sh
  • .github/workflows/test.yml
  • .gitignore
  • CMakeLists.txt
  • src/simulation/m_bubbles.fpp
  • toolchain/mfc/cli/commands.py
  • toolchain/mfc/test/case.py
  • toolchain/mfc/test/coverage.py
  • toolchain/mfc/test/test.py
  • toolchain/mfc/test/test_coverage_unit.py

Comment on lines +42 to +56
# TEMP: stripped to GPU macros only so CI exercises the pruning logic.
# Restore full list before merge.
ALWAYS_RUN_ALL = frozenset([
"src/common/include/parallel_macros.fpp",
"src/common/include/acc_macros.fpp",
"src/common/include/omp_macros.fpp",
"src/common/include/shared_parallel_macros.fpp",
])

# Directory prefixes: any changed file under these paths triggers full suite.
# Note: src/simulation/include/ (.fpp files like inline_riemann.fpp) is NOT
# listed here — Fypp line markers (--line-marker-format=gfortran5) correctly
# attribute included file paths, so gcov coverage tracks them accurately.
# TEMP: cmake/ prefix removed to exercise pruning in CI. Restore before merge.
ALWAYS_RUN_ALL_PREFIXES = ()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Do not merge with TEMP ALWAYS_RUN_ALL relaxations in place.

The reduced set and empty prefixes materially weaken conservative full-suite fallbacks and can create false negatives in PR validation.

Comment on lines +568 to +584
for ref in [compare_branch, f"origin/{compare_branch}"]:
merge_base_result = subprocess.run(
["git", "merge-base", ref, "HEAD"],
capture_output=True, text=True, cwd=root_dir, timeout=30, check=False
)
if merge_base_result.returncode == 0:
break
else:
return None
merge_base = merge_base_result.stdout.strip()
if not merge_base:
return None

diff_result = subprocess.run(
["git", "diff", merge_base, "HEAD", "--name-only", "--no-color"],
capture_output=True, text=True, cwd=root_dir, timeout=30, check=False
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Preserve fallback contract by catching git subprocess exceptions.

get_changed_files() currently returns None on git failure codes, but TimeoutExpired/OSError can still bubble and hard-fail instead of falling back.

🛡️ Proposed fix
     # Try local branch first, then origin/ remote ref (CI shallow clones).
     for ref in [compare_branch, f"origin/{compare_branch}"]:
-        merge_base_result = subprocess.run(
-            ["git", "merge-base", ref, "HEAD"],
-            capture_output=True, text=True, cwd=root_dir, timeout=30, check=False
-        )
+        try:
+            merge_base_result = subprocess.run(
+                ["git", "merge-base", ref, "HEAD"],
+                capture_output=True, text=True, cwd=root_dir, timeout=30, check=False
+            )
+        except (subprocess.TimeoutExpired, OSError):
+            continue
         if merge_base_result.returncode == 0:
             break
     else:
         return None
@@
-    diff_result = subprocess.run(
-        ["git", "diff", merge_base, "HEAD", "--name-only", "--no-color"],
-        capture_output=True, text=True, cwd=root_dir, timeout=30, check=False
-    )
+    try:
+        diff_result = subprocess.run(
+            ["git", "diff", merge_base, "HEAD", "--name-only", "--no-color"],
+            capture_output=True, text=True, cwd=root_dir, timeout=30, check=False
+        )
+    except (subprocess.TimeoutExpired, OSError):
+        return None
     if diff_result.returncode != 0:
         return None
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
for ref in [compare_branch, f"origin/{compare_branch}"]:
merge_base_result = subprocess.run(
["git", "merge-base", ref, "HEAD"],
capture_output=True, text=True, cwd=root_dir, timeout=30, check=False
)
if merge_base_result.returncode == 0:
break
else:
return None
merge_base = merge_base_result.stdout.strip()
if not merge_base:
return None
diff_result = subprocess.run(
["git", "diff", merge_base, "HEAD", "--name-only", "--no-color"],
capture_output=True, text=True, cwd=root_dir, timeout=30, check=False
)
for ref in [compare_branch, f"origin/{compare_branch}"]:
try:
merge_base_result = subprocess.run(
["git", "merge-base", ref, "HEAD"],
capture_output=True, text=True, cwd=root_dir, timeout=30, check=False
)
except (subprocess.TimeoutExpired, OSError):
continue
if merge_base_result.returncode == 0:
break
else:
return None
merge_base = merge_base_result.stdout.strip()
if not merge_base:
return None
try:
diff_result = subprocess.run(
["git", "diff", merge_base, "HEAD", "--name-only", "--no-color"],
capture_output=True, text=True, cwd=root_dir, timeout=30, check=False
)
except (subprocess.TimeoutExpired, OSError):
return None
🧰 Tools
🪛 Ruff (0.15.2)

[error] 569-569: subprocess call: check for execution of untrusted input

(S603)


[error] 570-570: Starting a process with a partial executable path

(S607)


[error] 581-581: subprocess call: check for execution of untrusted input

(S603)


[error] 582-582: Starting a process with a partial executable path

(S607)

Comment on lines +619 to +621
changed_fpp = {f for f in changed_files if f.endswith(".fpp")}
if not changed_fpp:
return [], list(cases)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Do not skip all tests for non-.fpp source edits.

If a PR changes src/**/*.f90 but no .fpp, this returns “run none,” which can bypass testing for real code changes.

💡 Conservative fallback tweak
     changed_fpp = {f for f in changed_files if f.endswith(".fpp")}
+    changed_fortran_nonfpp = {
+        f for f in changed_files
+        if f.startswith("src/") and f.endswith((".f90", ".f"))
+    }
+    if changed_fortran_nonfpp and not changed_fpp:
+        # Cache currently maps .fpp coverage only; non-.fpp source edits are not representable.
+        return list(cases), []
     if not changed_fpp:
         return [], list(cases)

Comment on lines +143 to +145
skipped_cases += cases
cases = []
else:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Avoid passing empty pruned sets into shard validation.

When pruning yields zero runnable tests, this path keeps flowing and can hit the shard guard at Line 190, raising an error for valid “nothing to run” cases (e.g., docs-only PRs or sparse shard hits).

💡 Proposed fix
                 if not changed_fpp:
                     cons.print()
                     cons.print("[bold cyan]Coverage Change Analysis[/bold cyan]")
                     cons.print("-" * 50)
                     cons.print("[green]No .fpp source changes detected — skipping all tests.[/green]")
                     cons.print("-" * 50)
                     cons.print()
-                    skipped_cases += cases
-                    cases = []
+                    return [], skipped_cases + cases
                 else:
                     cons.print()
                     cons.print("[bold cyan]Coverage Change Analysis[/bold cyan]")
                     cons.print("-" * 50)
                     for fpp_file in sorted(changed_fpp):
                         cons.print(f"  [green]*[/green] {fpp_file}")

                     cases, new_skipped = filter_tests_by_coverage(cases, cache, changed_files)
                     skipped_cases += new_skipped
                     cons.print(f"\n[bold]Tests to run: {len(cases)} / {len(cases) + len(new_skipped)}[/bold]")
                     cons.print("-" * 50)
                     cons.print()
+                    if not cases:
+                        return [], skipped_cases

Also applies to: 152-154

Comment on lines +111 to +145
# --only-changes: filter based on file-level gcov coverage
if ARG("only_changes"):
from .coverage import ( # pylint: disable=import-outside-toplevel
load_coverage_cache, get_changed_files,
should_run_all_tests, filter_tests_by_coverage,
)

cache = load_coverage_cache(common.MFC_ROOT_DIR)
if cache is None:
cons.print("[yellow]Coverage cache missing or stale.[/yellow]")
cons.print("[yellow]Run: ./mfc.sh build --gcov -j 8 && ./mfc.sh test --build-coverage-cache[/yellow]")
cons.print("[yellow]Falling back to full test suite.[/yellow]")
else:
changed_files = get_changed_files(common.MFC_ROOT_DIR, ARG("changes_branch"))

if changed_files is None:
cons.print("[yellow]git diff failed — falling back to full test suite.[/yellow]")
elif should_run_all_tests(changed_files):
cons.print()
cons.print("[bold cyan]Coverage Change Analysis[/bold cyan]")
cons.print("-" * 50)
cons.print("[yellow]Infrastructure or macro file changed — running full test suite.[/yellow]")
cons.print("-" * 50)
else:
changed_fpp = {f for f in changed_files if f.endswith(".fpp")}
if not changed_fpp:
cons.print()
cons.print("[bold cyan]Coverage Change Analysis[/bold cyan]")
cons.print("-" * 50)
cons.print("[green]No .fpp source changes detected — skipping all tests.[/green]")
cons.print("-" * 50)
cons.print()
skipped_cases += cases
cases = []
else:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Action required

1. Non-fpp pr skips tests 🐞 Bug ✓ Correctness

With --only-changes, the filter path skips *all* tests when no .fpp files changed; CI now
enables --only-changes for PRs, so many PRs (CMake/Python/workflow/.f90) can end up with zero
functional tests. This is made worse by the TEMP-reduced ALWAYS_RUN_ALL/prefixes that currently
won’t force a full run for infrastructure changes.
Agent Prompt
### Issue description
`--only-changes` currently skips the entire suite when there are no `.fpp` changes, but CI now enables this flag for PRs. This can result in CI running **zero functional tests** for many PRs (Python/toolchain, CMake, workflow changes, `.f90` changes) unless those paths are explicitly classified as “always run all”.

### Issue Context
- CI’s `checkall` filter triggers test jobs for many non-`.fpp` changes.
- The pruning logic only looks at `.fpp` changes and explicitly empties the test list when there are none.
- The intended escape hatch list is TEMP-stripped in this PR.

### Fix Focus Areas
- toolchain/mfc/test/test.py[111-145]
- toolchain/mfc/test/coverage.py[39-56]
- .github/workflows/test.yml[255-281]
- .github/file-filter.yml[4-38]

### Suggested direction
- In `__filter`, if `--only-changes` is set and there are **no relevant source changes**, default to a safe behavior (e.g., run full suite, or run a minimal smoke subset) unless the change-set is clearly “docs-only”.
- Restore/expand `ALWAYS_RUN_ALL`/`ALWAYS_RUN_ALL_PREFIXES` to include infrastructure inputs that can invalidate the cache or the toolchain itself (e.g., `CMakeLists.txt`, `toolchain/**`, `.github/workflows/**`, etc.).
- Consider updating CI to set `ONLY_CHANGES` only when the PR actually changes Fortran sources (or when a fresh cache artifact is present).

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

Comment on lines +520 to +549
def load_coverage_cache(root_dir: str) -> Optional[dict]:
"""
Load the coverage cache, returning None if missing or stale.

Staleness is detected by comparing the SHA256 of cases.py at cache-build time
against the current cases.py. Auto-converts old line-level format if needed.
"""
if not COVERAGE_CACHE_PATH.exists():
return None

try:
with gzip.open(COVERAGE_CACHE_PATH, "rt", encoding="utf-8") as f:
cache = json.load(f)
except (OSError, gzip.BadGzipFile, json.JSONDecodeError, UnicodeDecodeError):
cons.print("[yellow]Warning: Coverage cache is unreadable or corrupt.[/yellow]")
return None

cases_py = Path(root_dir) / "toolchain/mfc/test/cases.py"
try:
current_hash = hashlib.sha256(cases_py.read_bytes()).hexdigest()
except FileNotFoundError:
cons.print("[yellow]Warning: cases.py not found; cannot verify cache staleness.[/yellow]")
return None
stored_hash = cache.get("_meta", {}).get("cases_hash", "")

if current_hash != stored_hash:
cons.print("[yellow]Warning: Coverage cache is stale (cases.py changed).[/yellow]")
return None

return _normalize_cache(cache)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Action required

2. Cache staleness too weak 🐞 Bug ✓ Correctness

Cache validity is only checked via cases.py hash, so dependency-graph/build changes can leave a
stale cache that still “looks valid” and causes incorrect pruning. CI will still pass
--only-changes even when the cache rebuild job is skipped or fails to provide a new artifact.
Agent Prompt
### Issue description
The coverage cache can become stale due to Fortran dependency graph changes or other build/source changes without modifying `cases.py`. Because `load_coverage_cache` only checks `cases.py` hash, the toolchain may treat a stale cache as valid and incorrectly prune tests.

### Issue Context
- CI attempts to rebuild cache on `dep_changed`, but `--only-changes` is still enabled on PRs regardless, and cache downloads are best-effort.
- If rebuild fails or the artifact is missing, CI may still prune using an older committed cache that is no longer correct.

### Fix Focus Areas
- toolchain/mfc/test/coverage.py[520-549]
- .github/workflows/test.yml[95-149]
- .github/workflows/test.yml[190-197]
- .github/workflows/test.yml[255-281]

### Suggested direction
- Add additional `_meta` fields to the cache (e.g., hash of a dependency-graph input, or a hash of relevant Fortran source/include trees) and validate them in `load_coverage_cache`.
- In CI, if `dep_changed == true`, only enable `--only-changes` when the rebuilt cache artifact was successfully produced and downloaded; otherwise force full suite (or omit `--only-changes`).
- Consider making artifact download non-optional (remove `continue-on-error`) when pruning is enabled based on it.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

use m_variables_conversion !< State variables type conversion procedures

use m_helper_basic !< Functions to compare floating point numbers
use m_helper_basic
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Action required

3. Duplicate use statement 🐞 Bug ✓ Correctness

m_bubbles.fpp now contains a duplicate use m_helper_basic, which should be reverted before merge
as noted in the PR description. Leaving it in can introduce compiler warnings or stricter build
failures depending on flags.
Agent Prompt
### Issue description
`src/simulation/m_bubbles.fpp` has a duplicate `use m_helper_basic` statement.

### Issue Context
The PR description marks this as TEMP to exercise CI dep-change detection; it should not be merged.

### Fix Focus Areas
- src/simulation/m_bubbles.fpp[16-20]ുഞ്ഞ

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

@github-actions
Copy link

github-actions bot commented Mar 2, 2026

Claude Code Review

Head SHA: 0e5c6cbe35d223300de588660cb91e43fc7b5e0e
Files changed: 16 — coverage.py (661 lines, new), test_coverage_unit.py (608 lines, new), test.yml, test.py, CMakeLists.txt, case.py, commands.py, rebuild-cache.sh, test_coverage_cache.json.gz, shell scripts, m_bubbles.fpp, .gitignore, file-filter.yml

Summary:

  • Adds a gcov-based file-level coverage cache (555 test UUIDs → covered .fpp files) to enable test pruning on PRs
  • New --only-changes flag intersects PR-changed .fpp files against the cache; conservative fallbacks for missing/stale cache or git failures
  • New rebuild-cache CI job runs on Phoenix via SLURM when cases.py or Fortran dependency graph changes; commits cache to master on pushes, uploads as artifact on PRs
  • CMakeLists.txt correctly disables -march=native and LTO for gcov builds to avoid assembler issues on Granite Rapids
  • 53 unit tests verify core coverage logic entirely offline

Findings

[BLOCKER] TEMP code must be removed before merge

The PR body's test plan explicitly says "Revert TEMP changes before merge," but three TEMP items remain:

  1. src/simulation/m_bubbles.fpp line 19 — intentional duplicate use m_helper_basic added to exercise cache rebuild. This must be removed; it will cause a lint/precheck failure on master.
  2. toolchain/mfc/test/coverage.py lines 604–611 — ALWAYS_RUN_ALL is stripped to GPU macros only (comment: "Restore full list before merge"). Missing entries: toolchain/mfc/test/cases.py, CMakeLists.txt, toolchain/ prefix, etc.
  3. toolchain/mfc/test/coverage.py line 618 — ALWAYS_RUN_ALL_PREFIXES = () is an empty tuple (comment: "cmake/ prefix removed to exercise pruning in CI. Restore before merge"). An empty tuple means CMakeLists.txt changes will never trigger full suite.

[SECURITY] Self-hosted runner checks out PR head SHA

.github/workflows/test.yml, rebuild-cache job, line ~203:

ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }}

The job runs on group: phoenix (self-hosted HPC runner) and checks out untrusted fork code on PR events. This is a standard self-hosted runner + pull_request event security risk: a malicious PR could exfiltrate secrets or abuse cluster allocations. Consider using pull_request_target with the base SHA for cache rebuild, or restricting the job to github.event_name == 'push' and workflow_dispatch only (artifacts path for PRs could instead download a pre-built cache from master).


[BUG] _prepare_test unconditionally applies post_process output params

toolchain/mfc/test/coverage.py line 904:

case.params.update(get_post_process_mods(case.params))

This is applied to all test cases regardless of whether post_process is in that test's targets. get_post_process_mods returns {'parallel_io': 'T', 'cons_vars_wrt': 'T', 'prim_vars_wrt': 'T', ...}. For tests that don't run post_process, injecting parallel_io: T changes simulation behavior (writes field data), altering runtime and potentially coverage. This makes the cache represent different runs than what normal ./mfc.sh test executes. The check should be conditional on whether the test includes post_process as a target.


[MINOR] Phase 2 cap comment says "Phase 1"

toolchain/mfc/test/coverage.py lines 958–959:

# Cap Phase 1 parallelism: each test spawns MPI processes (~500MB each)...
phase1_jobs = min(n_jobs, 32)

Phase 1 (test preparation) is a sequential for loop (lines 970–974). This cap applies to Phase 2 (parallel test execution via ThreadPoolExecutor(max_workers=phase1_jobs)). The variable name and comment should reference Phase 2.


[MINOR] Dep-change detection comment says "added/removed" but only checks + lines

.github/workflows/test.yml lines 162–163:

# Detect added/removed use/include statements that change the
# Fortran dependency graph

The grep -qP '^\+\s*...' pattern only matches added lines. Removed use statements don't trigger a rebuild. This is conservatively correct (no false negatives for the critical direction), but the comment is inaccurate. Consider updating to "Detect added use/include statements...".


Overall this is a well-designed feature with good conservative fallbacks. The TEMP items are the primary blocker — the duplicate use in m_bubbles.fpp will fail precheck on master, and the stripped ALWAYS_RUN_ALL list means CMakeLists.txt and toolchain changes won't trigger full suite after merge. Recommend addressing those before merging.

@codecov
Copy link

codecov bot commented Mar 2, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 44.04%. Comparing base (ab5082e) to head (0e5c6cb).
⚠️ Report is 3 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1284   +/-   ##
=======================================
  Coverage   44.04%   44.04%           
=======================================
  Files          70       70           
  Lines       20499    20499           
  Branches     1993     1993           
=======================================
  Hits         9028     9028           
  Misses      10330    10330           
  Partials     1141     1141           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

sbryngelson and others added 6 commits March 2, 2026 10:45
- File-level gcov coverage cache maps 555 test UUIDs to exercised .fpp
  source files (gzip JSON, 11KB, committed to repo)
- --only-changes flag prunes tests by intersecting PR-changed files
  against coverage cache; --build-coverage-cache builds the cache
- New rebuild-cache CI job runs on Phoenix via SLURM when cases.py or
  Fortran dependency graph changes (on both PRs and master pushes)
- Dep-change detection greps PR/push diffs for added use/include
  statements that would invalidate the coverage cache
- Conservative fallbacks: missing cache runs all, missing sim coverage
  includes test, ALWAYS_RUN_ALL files trigger full suite
- Remove continue-on-error from github CI job (fixes auto-cancellation)
- TEMP: duplicate use in m_bubbles.fpp + remove CMakeLists.txt from
  ALWAYS_RUN_ALL to test the full cache rebuild pipeline in CI
- 53 unit tests cover core coverage logic

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The coverage cache builder's _prepare_test was generating simulation
.inp files without post_process output params (cons_vars_wrt, parallel_io,
etc.). Without these, simulation doesn't write output files and
post_process fails.

Extract post_process param dicts into shared constants in case.py
(POST_PROCESS_OUTPUT_PARAMS, POST_PROCESS_3D_PARAMS, POST_PROCESS_OFF_PARAMS)
and a get_post_process_mods() function. Both the generated case.py template
and coverage.py now use the same source of truth.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…uard

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Heavy 3D QBMM tests with gcov instrumentation need more time.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Stale LTO objects from a different node architecture cause linker
failures that persist across retries when only staging/ and install/
are removed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@sbryngelson sbryngelson force-pushed the gcov-test-pruning-v3 branch from 8a62499 to eda7082 Compare March 2, 2026 15:45
@github-actions
Copy link

github-actions bot commented Mar 2, 2026

Claude Code Review

Head SHA: eda7082
Files changed: 18

Changed files
  • .github/file-filter.yml
  • .github/workflows/frontier/build.sh
  • .github/workflows/frontier/test.sh
  • .github/workflows/frontier_amd/build.sh
  • .github/workflows/frontier_amd/test.sh
  • .github/workflows/phoenix/rebuild-cache.sh
  • .github/workflows/phoenix/submit.sh
  • .github/workflows/phoenix/test.sh
  • .github/workflows/test.yml
  • .gitignore
  • CMakeLists.txt
  • src/simulation/m_bubbles.fpp
  • toolchain/mfc/cli/commands.py
  • toolchain/mfc/test/case.py
  • toolchain/mfc/test/coverage.py
  • toolchain/mfc/test/test.py
  • toolchain/mfc/test/test_coverage_cache.json.gz
  • toolchain/mfc/test/test_coverage_unit.py

Summary:

  • Adds coverage.py (661 lines) implementing a 3-phase gcov-based test pruning system: build with coverage, run all tests with isolated GCOV_PREFIX dirs, collect per-test .fpp file coverage.
  • Introduces --build-coverage-cache and --only-changes CLI flags for targeted test execution on PRs.
  • Adds a rebuild-cache CI job (Phoenix, SLURM) that auto-commits updated cache on pushes to master.
  • Removes continue-on-error: true from the github CI job — test failures now block the run.
  • Upgrades Phoenix CPU partition from cpu-small to cpu-gnr (exclusive Granite Rapids) and scales parallel threads up to 64.

Findings

1. TEMP code must be reverted before merge — unchecked PR TODO

The PR description explicitly lists "Revert TEMP changes before merge" as an unchecked checklist item. Three locations have deliberate temporary regressions:

src/simulation/m_bubbles.fpp:463 — duplicate use m_helper_basic inserted to trigger a dep-change detection in CI. This violates Fortran module conventions and will produce a compiler warning (or error on strict compilers).

toolchain/mfc/test/coverage.py:649–655ALWAYS_RUN_ALL is stripped to GPU macros only. The comment says: "TEMP: stripped to GPU macros only so CI exercises the pruning logic. Restore full list before merge." Missing entries include toolchain/mfc/test/cases.py, CMakeLists.txt, any Fypp include files under cmake/, and the toolchain itself. Without these, a PR that only modifies cases.py or CMakeLists.txt (but not the GPU macros) will incorrectly prune tests via the coverage logic.

toolchain/mfc/test/coverage.py:661–662ALWAYS_RUN_ALL_PREFIXES = () is empty. The comment says: "TEMP: cmake/ prefix removed to exercise pruning in CI. Restore before merge." CMakeLists.txt changes control what gets compiled; they must always run the full suite.

Action required: Revert all three TEMP changes before this is merged. The PR should not merge with an unchecked TODO item in the test plan.


2. filter_tests_by_coverage silently skips all tests when no .fpp files changed

coverage.py:1226–1227:

changed_fpp = {f for f in changed_files if f.endswith(".fpp")}
if not changed_fpp:
    return [], list(cases)

If a PR changes only Python files (toolchain/mfc/...), YAML, shell scripts, or CMake files — but cases.py is not in ALWAYS_RUN_ALL (currently it isn't, due to TEMP) — --only-changes will return zero tests to run. This is correct only if ALWAYS_RUN_ALL includes all toolchain files that affect test outcomes. The TEMP stripping of ALWAYS_RUN_ALL makes this a real risk right now.

After the TEMP is reverted, this should be re-verified: confirm that toolchain/mfc/test/cases.py is in ALWAYS_RUN_ALL so that test-definition changes force a full run.


3. rebuild-cache writes to master via bot commit — branch protection interaction

test.yml:265–276:

- name: Commit Cache to Master
  if: github.event_name == 'push' || github.event_name == 'workflow_dispatch'
  run: |
    git config user.name "github-actions[bot]"
    git push

This bot push to master will fail silently if branch protection rules require PRs or signed commits. The git diff --cached --quiet guard prevents an empty commit, but if the push itself fails (e.g., due to a concurrent push to master creating a non-fast-forward), the step exits non-zero and fails the job. Consider adding || echo "Push failed — cache will be rebuilt on next run." or using a separate dedicated branch for the cache file to avoid contention.


4. should_run_all_tests with empty ALWAYS_RUN_ALL_PREFIXES

coverage.py:1207–1208:

return any(f.startswith(ALWAYS_RUN_ALL_PREFIXES) for f in changed_files)

When ALWAYS_RUN_ALL_PREFIXES = () (an empty tuple), str.startswith(()) always returns False. This is correct Python semantics, but it means the any(...) call always returns False, silently defeating the prefix check. The code is correct in behavior but fragile — add a comment explaining the empty-tuple edge case is intentional (currently the TEMP comment is the only documentation).


5. Phase 2 parallelism cap comment is misleading

coverage.py:1001–1003:

# Cap Phase 1 parallelism: each test spawns MPI processes (~500MB each)...
phase1_jobs = min(n_jobs, 32)

The variable is named phase1_jobs but is used for Phase 2 (the parallel test execution via ThreadPoolExecutor). Phase 1 (preparation) is a plain for loop with zero parallelism. The comment says "Cap Phase 1 parallelism" which is misleading. Minor but worth fixing for maintainability.


6. _prepare_test unconditionally applies POST_PROCESS_OUTPUT_PARAMS

coverage.py:948:

case.params.update(get_post_process_mods(case.params))

This injects post_process output params (parallel_io: T, cons_vars_wrt: T, etc.) into every test case unconditionally, regardless of whether the test normally runs post_process. This diverges from what the test golden files were generated with and could cause simulation to write extra output files, potentially altering behavior for tests that don't use post_process. The intent is correct (post_process needs data), but this should at minimum be gated on whether post_process is actually a target for this test case.


Minor Observations

  • The dep-check step in test.yml:211 references ${{ github.event.before }} and ${{ github.event.after }} without quoting or default values — the || echo "" fallback handles the missing-SHA case correctly, but an explicit ${{ github.event.before || '' }} would be more readable.
  • coverage.py:974: ppn is accessed via getattr(case, 'ppn', 1) rather than case.ppn. If ppn is always set on TestCase, the getattr default is unnecessary. If it can be absent, this is correct defensive code.
  • The build call inside build_coverage_cache (via test.py:1349) iterates [PRE_PROCESS, SIMULATION, POST_PROCESS] but _prepare_test also prepares a SYSCHECK binary. If SYSCHECK needs a special build slug it may not be available.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions
Copy link

github-actions bot commented Mar 2, 2026

Claude Code Review

Head SHA: 4d94545
Files changed: 18
Key files: toolchain/mfc/test/coverage.py (+668), toolchain/mfc/test/test_coverage_unit.py (+608), .github/workflows/test.yml (+122/-8), CMakeLists.txt (+26/-11), toolchain/mfc/test/case.py (+42/-18), toolchain/mfc/cli/commands.py (+25), toolchain/mfc/test/test.py (+66/-1), toolchain/mfc/test/test_coverage_cache.json.gz (binary)


Summary

  • Adds gcov-based file-level coverage cache (555-test map, gzip JSON committed to repo) to support --only-changes test pruning on PRs
  • New rebuild-cache CI job on Phoenix SLURM node rebuilds cache when cases.py or Fortran use/include graph changes; commits back to master on push events
  • Conservative fallbacks: missing cache → full suite; test not in cache → include; simulation coverage absent but sim files changed → include
  • 53 unit tests cover filter_tests_by_coverage, should_run_all_tests, _parse_gcov_json_output, and _normalize_cache
  • PR description explicitly flags TEMP changes that must be reverted before merge

Findings

🔴 Must Fix Before Merge

1. TEMP changes are present and must be reverted
The PR description explicitly lists these as TEMP and requires reverting before merge:

  • src/simulation/m_bubbles.fpp:19 — duplicate use m_helper_basic (will cause a compiler warning/error and is intentionally invalid Fortran)
  • toolchain/mfc/test/coverage.py:650–662ALWAYS_RUN_ALL is stripped to only 4 GPU macro files; ALWAYS_RUN_ALL_PREFIXES = () (empty). Missing entries: src/common/include/macros.fpp, toolchain/mfc/test/cases.py, toolchain/mfc/test/case.py, toolchain/mfc/params/definitions.py, toolchain/mfc/run/input.py, toolchain/mfc/case_validator.py, cmake/ prefix directory
  • toolchain/mfc/test/test_coverage_unit.py:1600–1607 — test assertions for macros.fpp and cases.py triggering full suite are disabled with TEMP comments, and test_non_fpp_always_run_all_detected is deleted

Until reverted, changing build-system files (CMakeLists.txt, cmake/, toolchain Python) will incorrectly prune tests instead of triggering the full suite.


🟡 Design / Correctness Concerns

2. _prepare_test applies get_post_process_mods() unconditionally to every test (coverage.py:950)

case.params.update(get_post_process_mods(case.params))

This injects parallel_io=T, cons_vars_wrt=T, etc. into every case—including tests that don't run post_process. The actual test runner applies these params only when post_process is a target. As a result, the coverage cache is built with parameters that differ from real test runs, meaning coverage recorded for simulation code paths may not reflect what the actual test suite exercises. For tests whose simulation run writes different data with these params enabled, the coverage map may be inaccurate or over-inclusive.

3. Brand-new .fpp file skips all tests (coverage.py:1218–1265, documented in test_coverage_unit.py:1765)
If a PR adds a new .fpp source file (e.g., a new physics module), no existing test covers it, so test_file_set & changed_fpp is empty for every test and all tests are skipped. The conservative fallback for "no simulation coverage" (n_no_sim_coverage) does not trigger if the test has other simulation-file coverage—it only fires when the test has zero simulation coverage. The result is a new file introduction gets zero test coverage from --only-changes, which defeats the safety guarantee of the system. Consider adding a rule: if changed_fpp contains files not appearing anywhere in the cache, fall back to running the full suite.

4. rebuild-cache job checks out PR fork code on self-hosted runner (test.yml:247)

ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }}

For pull_request events from external contributors (forks), this checks out and builds + executes the fork's code on a privileged self-hosted Phoenix HPC node (with contents: write permission). While pull_request events don't expose secrets, running untrusted Fortran code on shared cluster infrastructure (with MPI, filesystem access, and potential side-effects) is a supply-chain risk. Consider restricting rebuild-cache to push and workflow_dispatch events only, or requiring approval for fork PRs via pull_request_target with environment: protected gating.


🟢 Minor / Improvement Opportunities

5. Dep-change regex may produce false positives (test.yml:215–216)

grep -qP '^\+\s*(use[\s,]+\w|#:include\s|include\s+['"'"'"])'

This runs on the full diff including Python, YAML, and Markdown files. A docstring like # use this pattern or a Python line use_flag = True would match if it starts the line. Consider filtering to only *.fpp diff hunks before applying the pattern.

6. rebuild-cache dependency makes phoenix and checkall jobs require it (test.yml:282–288)
Both downstream jobs check needs.rebuild-cache.result != 'cancelled'. GitHub marks skipped jobs as skipped, not cancelled, so a skipped rebuild-cache (no dep change) will not block them. This is correct, but worth noting: if rebuild-cache fails (hardware issue, test failure), downstream jobs still proceed without a fresh cache. This is intentional (fall back to full suite), but it means a failing cache build is silent—consider logging a warning when the cache artifact is absent despite rebuild-cache having run.

7. phase1_jobs cap comment is slightly misleading (coverage.py:1003–1005)
The comment says "each test spawns MPI processes (~500MB each)" but each test runs 3 targets (pre_process + simulation + post_process) sequentially. The memory estimate is per-target, not per-test. With 32 workers × 3 binaries × ~500MB, peak RAM could be ~48 GB. On GNR nodes this is likely fine, but the comment should reflect the actual multiplier.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

2 participants