Add --with-heartbeat flag for heartbeat logging during long-running tasks#714
Add --with-heartbeat flag for heartbeat logging during long-running tasks#714ay901246 wants to merge 1 commit intocloudfoundry:mainfrom
Conversation
|
Discussed with the team offline - the general use case is |
|
Note - once this is merged, we also need to update https://github.com/cloudfoundry/docs-bosh/blob/master/content/cli-v2.md |
3d887a0 to
ae86fc9
Compare
|
Can we take a look at the name? |
…rrands Errands produce no CLI output while the Agent executes the script. This silence causes CI/CD systems to kill the process due to inactivity timeouts and leaves operators unsure whether the task is still running. Add an opt-in --with-heartbeat flag to `bosh run-errand` that prints periodic heartbeat status lines while a task is processing or queued. bosh run-errand smoke_tests --with-heartbeat bosh run-errand smoke_tests --with-heartbeat=10 Output: Task 185528 | 16:16:23 | Task state: processing (5s elapsed) No Director changes required — uses existing task API fields (state, started_at). The HeartbeatReporter interface extends TaskReporter via safe type assertion so no existing code is affected. Throttling is handled in the reporter via timestamp comparison. Made-with: Cursor
ae86fc9 to
cab07db
Compare
Alphasite
left a comment
There was a problem hiding this comment.
Some structural nits, but nothing awful. If you want to defer this to a seperate pr, feel free.
| return c.sessionImpl() | ||
| } | ||
|
|
||
| func (c Cmd) sessionImpl() *SessionImpl { |
There was a problem hiding this comment.
Do we need this method here?
There was a problem hiding this comment.
I think this makes sense to change - if it's more "Go" to cast the interface to a concrete *SessionImpl (like sess, ok := c.session().(*SessionImpl)) instead of an explicit method to return the concrete ... struct? class? interface? then I think we can do this.
| TaskOutputChunk(int, []byte) | ||
| } | ||
|
|
||
| // HeartbeatReporter is an optional extension of TaskReporter. Implementations |
There was a problem hiding this comment.
We only have 2 implementations of this interface (the reporter and a noop reporter, and there is an EnableHeartbeat is there any reason to ad an interface here? it feels like an unnecessary abstraction.
(I tend to prefer to fewer interfaces and abstractions where i can)
There was a problem hiding this comment.
I don't necessarily agree with this - we just had some churn around removing bosh task <id> --poll-state which shrinks the radius of this change, and if we remove this, we'd have to rely on heartbeatSeconds <= 0 being the only gate as to whether we have heartbeats or not, which kind of feels like a "side-car" driving the logic rather than explicitly saying "you're a heartbeat reporter, you mind heartbeats".
Optionally, here's how AI suggested a reply to this (in case my lack of go-understanding muddies the communication here):
I'm hesitant to merge this into the main TaskReporter interface. Since we narrowed the scope of this feature to just run-errand (dropping bosh task ), heartbeats are highly specific to a single workflow.
If we merge them, we'd have to rely on the internal heartbeatSeconds <= 0 state as the only gatekeeper for whether heartbeats happen or not. That feels a bit like a 'side-car' driving the logic. I prefer using the Go idiom of optional interfaces (like io.WriterTo extending io.Writer), which explicitly checks for structural capability rather than internal state. It also protects any future reporters (like a JSON or webhook reporter) from being forced to implement a dummy heartbeat method just to satisfy the core interface.
Let me know what you think!
Also, it feels like making a separate interface for this adheres more to the "i" in SOLID (interface segregation).
| fakedir "github.com/cloudfoundry/bosh-cli/v7/director/directorfakes" | ||
| ) | ||
|
|
||
| type heartbeatCall struct { |
There was a problem hiding this comment.
i feel like this could also be solved with the counterfeiter generated mock rather than building a custom one, although i do see what you're going for.
There was a problem hiding this comment.
I think(?) this can only be solved that way if we merge the heartbeat reporter into the task reporter?
I think if we can agree on changing point 1 to explicitly cast rather than utilize a method to return the concrete implementation, but also agree on leaving 2 and 3, then I can make these changes and we can approve/push. If you disagree, let's hop on a quick call Monday to hash it out, in case I'm misunderstanding. |
Rev 4
Usage update:
Rev 3
descriptionfrom tests, task response, and no longer passing it through heartbeat reporterRevision 2 (per reviewer feedback)
Removed --poll-state from bosh task — scoped to bosh run-errand only
Removed task description from heartbeat output to avoid redundancy with the event log
Kept elapsed time for operator visibility
Heartbeat output is now minimal:
Task 185528 | 16:16:23 | Task state: processing (5s elapsed)
Files changed in this revision: cmd/cmd.go, cmd/opts/opts.go, cmd/opts/opts_test.go, ui/task/reporter.go, ui/task/reporter_test.go (net -16 lines)
Summary
Related to issue #715
Errands and other long-running BOSH tasks produce no CLI output while the Agent executes the errand script. This silence causes CI/CD systems (Concourse, GitHub Actions, Jenkins) to kill the process due to inactivity timeouts, and leaves human operators unsure whether the task is still running.
This PR adds an opt-in
--poll-stateflag tobosh run-errandandbosh taskthat prints periodic heartbeat status lines while a task is in theprocessingorqueuedstate. The flag is fully backward-compatible — without it, behavior is identical to today.Usage
Default 30s interval
bosh run-errand smoke_tests --poll-stateCustom 10s interval
bosh run-errand smoke_tests --poll-state=10Reconnect to an in-progress task
bosh task 185528 --poll-stateOutput format
Heartbeat lines match the existing BOSH event log format:
Task 185528 | 16:16:18 | Task state: queuedTask 185528 | 16:16:23 | Task state: processing (5s elapsed) (run errand hello-errand from deployment my-errand-deployment)Architecture
state,description,started_at).HeartbeatReporter): ExtendsTaskReportervia safe type assertion — does not break any existing code.ReporterImpl— notime.Sleepin the UI layer, theWaitForCompletionloop spins at its normal rate and the reporter only prints when the interval has passed.sync.Mutexfor concurrent access to heartbeat state.boshui.UI: Ensures--jsonmode works correctly.Files changed
director/interfaces.goHeartbeatReporterinterfacedirector/task_client_request.godescription/started_attotaskShortRespui/task/reporter.goEnableHeartbeat()+TaskHeartbeat()with throttlingcmd/opts/opts.go--poll-stateflag onRunErrandOptsandTaskOptscmd/cmd.gocmd/session.gotaskReporterforrun-errandpathTest plan
Unit tests (included)
startedAtshows elapsed timestartedAt(value 0) — no elapsed, no "queued" confusionWaitForCompletionemits heartbeats when reporter implementsHeartbeatReporterWaitForCompletiondoes not emit when reporter does not implement the interfacePollStateon bothTaskOptsandRunErrandOpts