
[SPIKE] MongoDB Persistence #5281

Draft

johnsimons wants to merge 23 commits into master from cloudxp-679-warwick

Conversation

@johnsimons (Member) commented Jan 29, 2026

MongoDB Persistence Spike (Audit Instance)

Adds a MongoDB persistence layer for ServiceControl.Audit, replacing RavenDB with support for multiple MongoDB wire protocol-compatible databases.

Supported and Tested Databases

  • MongoDB Community / Enterprise (self-hosted) — full feature set
  • MongoDB Atlas (cloud-hosted SaaS) — full feature set
  • Azure DocumentDB (managed) — PostgreSQL-based, MongoDB wire protocol
  • Amazon DocumentDB (managed) — MongoDB-compatible
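Because these products differ in feature support, the persistence layer detects which one it is talking to at connect time (see "Automatic product detection" under Key Features). As a hedged illustration only — the fields inspected and the capability matrix below are assumptions, not the spike's actual logic — a classifier over a `buildInfo`-style server reply might look like:

```python
from enum import Enum


class DatabaseProduct(Enum):
    MONGODB = "MongoDB"
    AMAZON_DOCUMENTDB = "Amazon DocumentDB"
    AZURE_DOCUMENTDB = "Azure DocumentDB"


# Hypothetical capability matrix: the features the PR says are adapted
# per product (bulk writes, text indexes, $facet aggregation). The
# actual per-product support sets are assumptions for illustration.
CAPABILITIES = {
    DatabaseProduct.MONGODB: {"bulk_writes", "text_indexes", "facet"},
    DatabaseProduct.AMAZON_DOCUMENTDB: {"bulk_writes"},
    DatabaseProduct.AZURE_DOCUMENTDB: {"bulk_writes"},
}


def detect_product(build_info: dict) -> DatabaseProduct:
    """Classify the backing database from a buildInfo-style reply.

    NOTE: the string heuristics here are illustrative assumptions,
    not the spike's real detection code.
    """
    blob = " ".join(str(v) for v in build_info.values()).lower()
    if "amazon" in blob or "docdb" in blob:
        return DatabaseProduct.AMAZON_DOCUMENTDB
    if "azure" in blob or "cosmos" in blob or "postgres" in blob:
        return DatabaseProduct.AZURE_DOCUMENTDB
    return DatabaseProduct.MONGODB
```

Keeping detection in one place lets the ingestion and query code ask "does this server support `$facet`?" rather than branching on product names throughout.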

Key Features

  • Flexible body storage - three modes: None (metadata only), Database (inline in document), Blob (Azure Blob Storage)
  • Automatic product detection - detects database product and adapts feature usage (bulk writes, text indexes, $facet aggregation)
  • Adaptive backpressure - WiredTiger cache monitoring (self-hosted) or latency-based detection (cloud) to pause ingestion under load
  • Deadlock resilience - automatic retry with exponential backoff for bulk write deadlocks (critical for cloud databases)
  • Full-text search - single-phase (inline bodies) or two-phase (external bodies) search across metadata and message bodies
  • Auto-tuning - TargetMessageIngestionRate auto-calculates batch sizes, writer counts, and concurrency levels
  • TTL-based expiry - automatic document expiration via MongoDB TTL indexes, no cleanup jobs needed
  • Optimistic inserts for processed messages - defaults to insert-only until a duplicate-key error occurs, then falls back to an update; saves I/O on the vast majority of message inserts
  • Batched async body writer - bounded channel with configurable parallelism isolates body upload latency from ingestion throughput
  • Redesigned audit ingestion pipeline - rearchitected from single-channel/single-reader to a three-stage pipeline (message channel → batch assembler → parallel writer pool) with configurable batch size, timeout, and writer count
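The three-stage pipeline described in the last bullet can be sketched as follows. This is a minimal Python/asyncio illustration rather than the spike's C# implementation; the function names and the simulated bulk write are assumptions:

```python
import asyncio


async def batch_assembler(messages: asyncio.Queue, batches: asyncio.Queue,
                          batch_size: int, batch_timeout: float) -> None:
    """Stage 2: group messages into batches, flushing a partial batch
    when batch_timeout elapses (mirrors AuditIngestionBatchTimeout)."""
    batch = []
    while True:
        try:
            msg = await asyncio.wait_for(messages.get(), timeout=batch_timeout)
        except asyncio.TimeoutError:
            if batch:                      # flush partial batch on timeout
                await batches.put(batch)
                batch = []
            continue
        if msg is None:                    # shutdown sentinel
            if batch:
                await batches.put(batch)
            await batches.put(None)
            return
        batch.append(msg)
        if len(batch) >= batch_size:       # flush full batch
            await batches.put(batch)
            batch = []


async def writer(batches: asyncio.Queue, written: list) -> None:
    """Stage 3: one of N parallel writers performing the bulk write."""
    while True:
        batch = await batches.get()
        if batch is None:
            await batches.put(None)        # let sibling writers shut down too
            return
        await asyncio.sleep(0)             # stand-in for a real bulk write
        written.append(batch)


async def run_pipeline(msgs, batch_size=50, batch_timeout=0.1, writer_count=4):
    messages, batches, written = asyncio.Queue(), asyncio.Queue(), []
    pool = [asyncio.create_task(writer(batches, written))
            for _ in range(writer_count)]
    assembler = asyncio.create_task(
        batch_assembler(messages, batches, batch_size, batch_timeout))
    for m in msgs:                         # Stage 1: message channel
        await messages.put(m)
    await messages.put(None)
    await assembler
    await asyncio.gather(*pool)
    return written
```

Decoupling the stages with bounded queues is what lets batch size, timeout, and writer count be tuned independently, as the configuration table below reflects.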

New Configuration Settings

| Setting | Default | Description |
| --- | --- | --- |
| `Database/BodyStorageType` | `Database` | Where to store message bodies: `None`, `Database`, or `Blob` |
| `Database/BodyWriterBatchSize` | auto | Body documents per bulk write batch (auto-tuned from target rate) |
| `Database/BodyWriterParallelWriters` | auto | Parallel background body writer tasks (auto-tuned from target rate) |
| `Database/BodyWriterBatchTimeout` | 500ms | Max wait time before flushing a partial body batch |
| `Database/BlobConnectionString` | (none) | Azure Blob Storage connection string (required when `BodyStorageType=Blob`) |
| `Database/BlobContainerName` | `message-bodies` | Azure Blob Storage container name |
| `TargetMessageIngestionRate` | (not set) | Target msgs/sec — auto-calculates batch size, writers, and concurrency when set |
| `MaximumConcurrencyLevel` | (transport default) | Transport concurrency slots. Auto: ceiling(rate × estimatedLatency × 1.5) |
| `AuditIngestionBatchSize` | 50 (auto: rate × 0.05, clamped 50–200) | Messages per persistence batch |
| `AuditIngestionMaxParallelWriters` | 4 (auto: max(2, ceiling(rate / 1000)), max 16) | Number of parallel writer tasks |
| `AuditIngestionBatchTimeout` | 100ms | Max wait before flushing a partial batch (range: 10ms–5s) |
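To make the auto-tuning column concrete, here is a small worked example applying the formulas from the table. The 50 ms latency estimate is an arbitrary assumption for illustration, not a value from the spike:

```python
import math


def auto_tune(target_rate: float, estimated_latency_s: float = 0.05) -> dict:
    """Apply the auto-tuning formulas from the settings table above."""
    return {
        # AuditIngestionBatchSize: rate x 0.05, clamped to 50-200
        "batch_size": min(200, max(50, int(target_rate * 0.05))),
        # AuditIngestionMaxParallelWriters: max(2, ceil(rate / 1000)), capped at 16
        "writers": min(16, max(2, math.ceil(target_rate / 1000))),
        # MaximumConcurrencyLevel: ceil(rate x estimatedLatency x 1.5)
        "concurrency": math.ceil(target_rate * estimated_latency_s * 1.5),
    }
```

For example, `TargetMessageIngestionRate=2000` with a 50 ms latency estimate yields a batch size of 100, 2 parallel writers, and a concurrency level of 150.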

Performance Observations

  • Local MongoDB: ~2,000 msg/s (4 writers)
  • Azure DocumentDB M30: ~546 msg/s (4 writers)
  • Full-text indexing roughly doubles write I/O

Scope

  • ServiceControl.Audit persistence — fully implemented
  • ServiceControl primary instance — not yet migrated (still RavenDB)

Spike Details and Tests

@warwickschroeder changed the title from [SPIKE] On MongoDB to [SPIKE] MongoDB Persistence on Feb 17, 2026
