Only increment stats when the worker acknowledged the test #373

ChrisBr · 2026-02-08T20:47:11Z

Only increment error stats when the worker acknowledged the test otherwise we end up with an incorrect counter.

ChrisBr · 2026-02-08T20:47:51Z

ruby/lib/minitest/queue/runner.rb

          attributes.merge(type: type)
        end.compact
-        File.open(queue_config.warnings_file, 'w') do |f|
+        File.open(queue_config.warnings_file, 'a') do |f|


We call this multiple times so we need to use append.

kangze-jia · 2026-02-09T21:20:00Z

ruby/lib/ci/queue/redis/build_record.rb

-          @queue.increment_test_failed if acknowledged == 1
+          if acknowledged
+            # if another worker already acknowledged the test, we don't need to update the global stats or increment the test failed count
+            record_stats(stats)


This may not be enough.

Considering this scenario:

Worker A runs test T1 (reclaim), fails, calls record_error. acknowledged = 0 → we don’t call record_stats.

BuildStatusRecorder has already done self.failures += 1 (or self.errors += 1) for T1 in record() before calling build.record_error.

Worker A then runs test T2, fails, calls record_error. acknowledged = 1 → we do call record_stats(stats).
stats is built from current self.failures / self.errors, which still include T1. So we send e.g. failures: 2 (T1 + T2) for worker A.

Redis ends up with worker A’s failures = 2, but only one of those (T2) was a first ack; T1 was a duplicate and was never supposed to be counted in stats.

So we over-count whenever a worker has both a duplicate ack and a later first ack: the duplicate stays in the in-memory counter and is included in the next record_stats.

Yes this could be a problem. In that case you need to flip the logging around which makes the diff / refactor a lot more complicated.

Something like this https://github.com/Shopify/ci-queue/compare/cbruckmayer/fix-logging-of-tests?expand=1

Sure, I can continue the work on your branch.

I would probably keep it simple and remove the ignored state and introduce it in a later dedicated PR.

kangze-jia · 2026-02-09T21:22:15Z

ruby/lib/ci/queue/redis/build_record.rb

        end

        def record_error(id, payload, stats: nil)
-          acknowledged, _ = redis.pipelined do |pipeline|


Current code uses one pipeline: acknowledge + record_stats together. The new code goes more round trip which may cause performance regression.

Yes, if you want to keep it pipelined you need to inline it into the lua script which is a lot more complicated as you cannot rely on the result of the previous command in a pipeline. I don't think it's a significant performance regression so I prefer simplicity here.

Make sense. We can start simple first

… duplicate ack - Redis BuildRecord#record_error: run acknowledge first, then only record_stats and increment_test_failed when SADD indicates we were the first to ack. Return boolean so callers can roll back local state. - Minitest BuildStatusRecorder: when record_error returns false (duplicate ack), decrement local errors/failures so reporter totals stay correct when the same test is run by multiple workers.

ChrisBr · 2026-02-10T17:30:09Z

ruby/lib/ci/queue/redis/build_record.rb

        def record_error(id, payload, stats: nil)
-          acknowledged, _ = redis.pipelined do |pipeline|
+          # Run acknowledge first so we know whether we're the first to ack
+          acknowledged = redis.pipelined do |pipeline|


If there is only one command, you don't need a pipeline.

Yeah, you are right. This lua script is a single round-trip even though it does several Redis calls inside.

I will remove pipeline in here.

ChrisBr · 2026-02-10T17:30:16Z

ruby/lib/ci/queue/redis/build_record.rb

            @queue.acknowledge(id, error: payload, pipeline: pipeline)
-            record_stats(stats, pipeline: pipeline)
-          end
+          end.first


Yeah, it is not necessary. Removing it along with pipeline.

ChrisBr · 2026-02-10T17:30:31Z

ruby/lib/ci/queue/redis/build_record.rb

-          nil
+          if acknowledged
+            # We were the first to ack; another worker already ack'd would get falsy from SADD
+            redis.pipelined do |pipeline|


If there is only one command, we don't need a pipeline.

inside record_stats method (this is not lua script), it does multiple commands (hset and expire). So we can get benefit with pipeline. I prefer to keep it in here.

ChrisBr · 2026-02-10T17:32:03Z

ruby/lib/ci/queue/redis/build_record.rb

+            @queue.increment_test_failed
+          end
+          # Return so caller can roll back local counter when not acknowledged
+          !!acknowledged


Do we need to update the other implementations of record_error too?

Don't think so.

lib/ci/queue/build_record.rb (base BuildRecord): Static/local queue, single process, no Redis, no distributed workers.

lib/ci/queue/redis/grind_record.rb (GrindRecord): Grind mode (run tests many times). Different model from the main queue. It just lpushes onto an error list and updates stats. No “processed” set or SADD-based first-ack semantics.

What I mean is: Are they used with the reporter because they all return nil now which means we would always decrement the error count.

…gle command)

ChrisBr · 2026-02-10T20:52:30Z

ruby/lib/ci/queue/redis/build_record.rb

+            @queue.increment_test_failed
+          end
+          # Return so caller can roll back local counter when not acknowledged
+          !!acknowledged


!! is this needed?

!! does not change whether the value is “success” or “failure”. It only turns that into a real boolean.

acknowledge.lua already returns a boolean, so the !! is not a must. However, this can make sure the future change to acknowledge.lua will be safe.

- Redis BuildRecord: record_success returns !!acknowledged; record_requeue returns true - Base BuildRecord: record_error, record_success return true; add record_requeue (no-op, returns true) - BuildStatusRecorder: build pending stats, call record_* once, increment local counters only when acknowledged (avoids duplicate count for success/error) - Add tests: duplicate success/error no double-count, build record return booleans, static build returns true

ChrisBr · 2026-02-11T09:21:35Z

ruby/lib/minitest/queue/build_status_recorder.rb


      private

+      def stat_delta(counter, test)


Why is this needed?

Only increment counts when we acknowledge

928f0b2

ChrisBr commented Feb 8, 2026

View reviewed changes

ChrisBr mentioned this pull request Feb 8, 2026

Only increment counts when we acknowledge #372

Closed

ChrisBr requested a review from thadcraft-shopify February 8, 2026 21:03

Fix appending to warning file

14cf9e8

ChrisBr force-pushed the cbruckmayer/only-increment-on-ack-v2 branch from a9c8024 to 14cf9e8 Compare February 9, 2026 10:37

thadcraft-shopify approved these changes Feb 9, 2026

View reviewed changes

kangze-jia reviewed Feb 9, 2026

View reviewed changes

ChrisBr commented Feb 10, 2026

View reviewed changes

Simplify record_error: call acknowledge directly (no pipeline for sin…

102952b

…gle command)

ChrisBr commented Feb 10, 2026

View reviewed changes

ChrisBr commented Feb 11, 2026

View reviewed changes

ruby/lib/minitest/queue/build_status_recorder.rb

private

def stat_delta(counter, test)

Copy link

Contributor Author

ChrisBr Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this needed?

Only increment stats when the worker acknowledged the test #373

Are you sure you want to change the base?

Only increment stats when the worker acknowledged the test #373

Conversation

ChrisBr commented Feb 8, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants