Skip to content

fix(execution): report cancellation durability truthfully#3550

Open
PlaneInABottle wants to merge 5 commits intosimstudioai:stagingfrom
PlaneInABottle:fix/cancel-durability
Open

fix(execution): report cancellation durability truthfully#3550
PlaneInABottle wants to merge 5 commits intosimstudioai:stagingfrom
PlaneInABottle:fix/cancel-durability

Conversation

@PlaneInABottle
Copy link

Summary

  • make the cancel-execution response reflect whether the Redis-backed cancellation marker was durably recorded
  • return durability metadata (durablyRecorded, reason) and avoid misleading success logging when persistence fails
  • fix the targeted cancellation tests by hoisting Vitest mocks in cancellation.test.ts

Problem / current behavior

  • the cancel-execution API previously implied success after attempting cancellation even when the Redis-backed marker was not durably recorded
  • when Redis was unavailable or the write failed, callers and operators could read the response/logging as a successful cancellation even though no durable cancellation marker existed
  • the targeted cancellation tests also needed a follow-up fix so the mocked module state was initialized correctly under Vitest

Proposed fix / behavior after

  • update the cancellation helper to return structured durability results instead of a bare boolean
  • make the cancel route set success based on whether the cancellation marker was durably recorded, and include durablyRecorded plus a failure reason in the response
  • downgrade the non-durable path from misleading success reporting to warning-level reporting
  • hoist the Vitest mocks in apps/sim/lib/execution/cancellation.test.ts so the targeted tests reliably validate the success and failure outcomes

Why this matters

  • prevents false-positive cancellation reporting
  • makes Redis availability/write failures visible and actionable to API consumers and operators
  • preserves regression coverage for the durability contract this branch introduces

Files changed

  • apps/sim/app/api/workflows/[id]/executions/[executionId]/cancel/route.ts
  • apps/sim/app/api/workflows/[id]/executions/[executionId]/cancel/route.test.ts
  • apps/sim/lib/execution/cancellation.ts
  • apps/sim/lib/execution/cancellation.test.ts

Tests run

  • cd /Users/Y_ALTAY1/Documents/simWoot/sim/apps/sim && bun run test -- lib/execution/cancellation.test.ts 'app/api/workflows/[id]/executions/[executionId]/cancel/route.test.ts'

Notes / non-goals / follow-ups

  • this PR is intentionally scoped to truthful cancellation durability reporting plus the follow-up test-only fix
  • it does not redesign broader cancellation orchestration, retries, or worker-side handling beyond surfacing whether the Redis-backed marker was durably written
  • the second commit on this branch is test-only and limited to hoisting Vitest mocks in cancellation.test.ts

@cursor
Copy link

cursor bot commented Mar 12, 2026

PR Summary

Medium Risk
Touches workflow execution cancellation paths and SSE streaming lifecycle; incorrect registration/unregistration or response semantics could lead to leaked abort handlers or misleading cancellation behavior.

Overview
Makes execution cancellation reporting truthful and observable by changing markExecutionCancelled to return structured durability results (durablyRecorded + reason) and updating the cancel API response/logging to reflect Redis availability/write failures.

Adds an in-process fallback for manual/SSE runs via a new manual-cancellation registry: the execute route registers an aborter per executionId and the cancel route can now immediately abort locally when Redis durability is unavailable.

Includes new/updated Vitest coverage for the cancel API and cancellation helpers, plus a test mock update to provide AuthType in the hybrid auth mock.

Written by Cursor Bugbot for commit 7aed473. This will update automatically on new commits. Configure here.

@vercel
Copy link

vercel bot commented Mar 12, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
docs Skipped Skipped Mar 13, 2026 11:00am

Request Review

@icecrasher321
Copy link
Collaborator

@PlaneInABottle please make this point towards staging as well

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 12, 2026

Greptile Summary

This PR improves cancellation durability reporting by refactoring markExecutionCancelled to return a structured discriminated union (ExecutionCancellationRecordResult) instead of a bare boolean, and propagates that result through the cancel API route so callers can distinguish between "durably recorded", "Redis unavailable", and "Redis write failed" outcomes. The fix is well-scoped and the core cancellation.ts changes are clean; however, one semantic issue exists in the route response.

Key changes:

  • cancellation.ts: markExecutionCancelled now returns { durablyRecorded, reason } instead of boolean; the exported discriminated union type is a good contract
  • route.ts: success is now gated on durablyRecorded, and a reason field surfaces the failure mode — but redisAvailable is incorrectly set to cancellation.durablyRecorded, conflating "is Redis reachable" with "did the write succeed"
  • route.test.ts / cancellation.test.ts: New test files with vi.hoisted mocks correctly cover all three outcome variants, though the route tests mirror the redisAvailable bug

Confidence Score: 3/5

  • Safe to merge with a minor fix — the redisAvailable field in the response incorrectly reports false when Redis is reachable but the write fails, misleading API consumers.
  • The core logic in cancellation.ts is correct and well-typed. The route improvement is meaningful (truthful success flag, reason field). The only defect is that redisAvailable is aliased to durablyRecorded rather than being derived from reason !== 'redis_unavailable', producing a self-contradictory response body in the redis_write_failed path. The tests reinforce this incorrect behaviour, so the bug would persist without a targeted fix.
  • apps/sim/app/api/workflows/[id]/executions/[executionId]/cancel/route.ts (line 52) and the corresponding assertion in route.test.ts (lines 77 and 99).

Important Files Changed

Filename Overview
apps/sim/app/api/workflows/[id]/executions/[executionId]/cancel/route.ts Durability reporting upgraded to use structured result; however redisAvailable is incorrectly aliased to durablyRecorded, misreporting Redis availability when the write fails after a successful connection.
apps/sim/lib/execution/cancellation.ts Clean refactor: markExecutionCancelled now returns a discriminated union ExecutionCancellationRecordResult with durablyRecorded and typed reason; all three paths (unavailable, success, write error) are correctly handled.
apps/sim/app/api/workflows/[id]/executions/[executionId]/cancel/route.test.ts New test file with good coverage of all three cancellation outcomes; the redisAvailable: false expectation for the redis_write_failed case mirrors the bug in the route, so it will pass but test incorrect behaviour.
apps/sim/lib/execution/cancellation.test.ts New test file; mocks are properly hoisted with vi.hoisted, covering all three result variants of the refactored markExecutionCancelled.

Sequence Diagram

sequenceDiagram
    participant Client
    participant CancelRoute as POST /cancel
    participant markExecutionCancelled
    participant Redis

    Client->>CancelRoute: POST (workflowId, executionId)
    CancelRoute->>CancelRoute: checkHybridAuth
    CancelRoute->>CancelRoute: authorizeWorkflowByWorkspacePermission
    CancelRoute->>markExecutionCancelled: markExecutionCancelled(executionId)

    alt Redis client is null
        markExecutionCancelled-->>CancelRoute: { durablyRecorded: false, reason: 'redis_unavailable' }
    else Redis client exists
        markExecutionCancelled->>Redis: SET execution:cancel:{id} 1 EX 3600
        alt Write succeeds
            Redis-->>markExecutionCancelled: OK
            markExecutionCancelled-->>CancelRoute: { durablyRecorded: true, reason: 'recorded' }
        else Write throws
            Redis-->>markExecutionCancelled: Error
            markExecutionCancelled-->>CancelRoute: { durablyRecorded: false, reason: 'redis_write_failed' }
        end
    end

    CancelRoute-->>Client: 200 { success, executionId, redisAvailable, durablyRecorded, reason }
Loading

Last reviewed commit: 47dadaa

@PlaneInABottle PlaneInABottle changed the base branch from main to staging March 12, 2026 23:26
@PlaneInABottle
Copy link
Author

@icecrasher321 Done — the PR has been retargeted to staging.

@icecrasher321 icecrasher321 changed the title fix: report cancellation durability truthfully fix(execution): report cancellation durability truthfully Mar 12, 2026
@icecrasher321
Copy link
Collaborator

bugbot run

Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

@icecrasher321
Copy link
Collaborator

Screenshot 2026-03-13 at 3 05 16 AM

Seems to be a problem on staging too -- but client side aborts using the start/stop button in the top right in the canvas are not being marked as Cancelled reliably in the logs. Try with wait block for 10s -- and cancel midway -- it should say Cancelled always, but sometimes it says Info.

@PlaneInABottle

@vercel
Copy link

vercel bot commented Mar 13, 2026

Someone is attempting to deploy a commit to the Sim Team on Vercel.

A member of the Team first needs to authorize it.

@PlaneInABottle
Copy link
Author

PlaneInABottle commented Mar 13, 2026

Follow-up fix pushed in 0b8b084c2 for the staging symptom where a manual run could still finish as Info after pressing Stop during a wait block.

Root cause I verified locally:

  • the cancel route was only writing the Redis cancellation marker
  • on the affected runs, Redis was unavailable, so the marker was never durably recorded
  • the in-flight manual SSE execution therefore never observed cancellation and the wait block completed normally, which let the response block run and the execution finalize as successful/info

What changed:

  • added a tiny in-memory registry for active manual SSE execution aborters in apps/sim/lib/execution/manual-cancellation.ts
  • manual execution now registers its local abort controller in apps/sim/app/api/workflows/[id]/execute/route.ts
  • the cancel route now falls back to that local in-process abort in apps/sim/app/api/workflows/[id]/executions/[executionId]/cancel/route.ts when Redis durability is unavailable
  • the existing Redis durability fields stay intact; the response now also includes locallyAborted

Validation:

  • bun run --cwd apps/sim test -- app/api/workflows/[id]/executions/[executionId]/cancel/route.test.ts lib/execution/cancellation.test.ts
  • bun run --cwd apps/sim type-check
  • analyzer review: approved

This keeps the PR on the same cancellation durability theme, but closes the practical gap for local/manual executions when Redis is down.

@PlaneInABottle
Copy link
Author

Screenshot 2026-03-13 at 3 05 16 AM

Seems to be a problem on staging too -- but client side aborts using the start/stop button in the top right in the canvas are not being marked as Cancelled reliably in the logs. Try with wait block for 10s -- and cancel midway -- it should say Cancelled always, but sometimes it says Info.

@PlaneInABottle

I think they were a bit unrelated but patched the issue and added to this pr.

resim

it now shows cancelled

PlaneInABottle and others added 5 commits March 13, 2026 13:57
Return explicit durability results for execution cancellation so success only reflects persisted cancellation state instead of best-effort Redis availability.
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Abort active manual SSE executions locally when Redis cannot durably record the cancellation marker so the run still finalizes as cancelled instead of completing normally.
Keep the rebased async execute route test aligned with the current hybrid auth module exports so it exercises the queueing path instead of failing at import time.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants