
Context Map — Audit Trail Platform

The context map is the single, authoritative view of how the Audit Trail Platform is partitioned into bounded contexts and how those contexts collaborate. It makes responsibilities and seams explicit—showing ownership, upstream/downstream dependencies, collaboration styles, contract touchpoints, and tenancy/security boundaries across Audit.Gateway, Audit.Ingestion, Audit.Policy, Audit.Integrity, Audit.Projection, Audit.Query, Audit.Search, Audit.Export, and Audit.Admin. Use it to align design, reviews, and incident triage.

This page is written for platform and domain architects, microservice owners, SRE/operations, and security/compliance reviewers (and is a quick on-ramp for new contributors). Start with “Bounded Contexts (at a glance)” to learn what each context owns, then follow labeled edges to see collaboration styles and jump into contracts/events for exact schemas.

Bounded Contexts (at a glance)

The table captures responsibility, I/O surfaces, persistence anchors, and how tenant isolation is enforced. One-line Published Language entries define the canonical nouns/verbs used inside each context.

| Context | Responsibility | Primary Interfaces (OHS/Contracts) | Persistence (authoritative) | Tenancy notes |
|---|---|---|---|---|
| Audit.Gateway | Front-door for append/query; enforces authn/z, rate limits, edition/feature checks; normalizes requests and emits intent. | HTTP OHS: /audit/append, /audit/query · Async out: audit.append (intent) · Policies consulted per call. | Stateless; relies on downstream. | Per-tenant authentication + edition gates; injects TenantId, CorrelationId; rejects cross-tenant access. |
| Audit.Ingestion | Validates and accepts append requests; deduplicates via idempotency keys; persists canonical audit entries. | Async in: audit.append · HTTP/gRPC internal: AcceptAppend, Replay · Async out: audit.appended. | Append-only event store (immutability), durable queue for staging. | Partitioned by TenantId; idempotency scope {TenantId, ProducerAppId, Source, Key}; strict RLS on store. |
| Audit.Policy | Centralizes policy: retention, PII classification/redaction, schema/edition gates; “can I append/query?” decisions. | gRPC/HTTP: CheckPolicy, GetRetention, Classify · Async: policy.changed. | Config DB for policies + KV cache; policy versioning tracked. | Policies scoped by tenant and edition; defaults at platform → overridden per tenant. |
| Audit.Integrity | Produces tamper-evidence (hash chains, Merkle roots, anchors); verifies integrity on demand; evidence ledger. | Async in: audit.appended · HTTP/gRPC: VerifyEntry, VerifyRange, GetEvidence. | Evidence ledger (hash chain/Merkle) + anchor journal (e.g., periodic root). | Evidence keyed by TenantId; cross-tenant proofs disallowed; verification requires tenant context. |
| Audit.Projection | Builds/maintains read models/materialized views for fast query; handles rebuild/replay. | Async in: audit.appended, policy.changed · gRPC internal: Rebuild, Checkpoint. | Read models (per-tenant tables) + checkpoint store. | Per-tenant physical/logical partitioning; rebuilds isolated per tenant; back-pressure aware. |
| Audit.Query | Query and retrieval over projections; pagination, filtering, joins with evidence on demand. | HTTP/gRPC OHS: /audit/search, /audit/{id}, /audit/verify · Async out: request→Export. | Query DB (authoritative for reads) referencing projections; optional cache. | Server-enforced tenant filters; ABAC claims (tenant, role, scope); no cross-tenant joins. |
| Audit.Search | Full-text and faceted search over audit content; powers free-text and advanced filters. | HTTP internal: SearchIndex, Suggest · Async in: audit.appended · Async out: search.indexed. | Search index (per-tenant index/partition) + queue for indexing. | Separate index per tenant or partitioned by TenantId; query-time filters enforced. |
| Audit.Export | Asynchronous export of query results (CSV/JSON/Parquet); packaging, signing, delivery & callbacks. | HTTP OHS: /export/jobs · Async in: export requests · Webhook: export.completed (signed). | Staging store + object storage for artifacts; job metadata DB. | Exports scoped to requesting tenant; artifacts stored in tenant bucket/prefix; time-bound signed URLs. |
| Audit.Admin | Tenant onboarding, keys/credentials, contract catalogs, schema registry pointers, operational toggles. | HTTP OHS: /admin/tenants, /admin/contracts, /admin/policies · Async out: tenant.updated. | Admin DB for tenants, client apps, contract metadata. | Tenant registry is authoritative; ensures residency/region and edition applied across contexts. |

Published Language — one-liners

  • Audit.Gateway: “Append entries; submit queries; enforce edition/tenant guardrails.”
  • Audit.Ingestion: “Accept intent; deduplicate by idempotency key; persist canonical AuditEntry.”
  • Audit.Policy: “Classify fields; decide (allow/deny); retain/expire by RetentionPolicy; redact by DataClass.”
  • Audit.Integrity: “Hash entries; link chains; anchor roots; verify evidence for a ProofRange.”
  • Audit.Projection: “Project events; materialize views; checkpoint progress; rebuild deterministically.”
  • Audit.Query: “Filter and retrieve AuditRecords; paginate; optionally verify on read.”
  • Audit.Search: “Index content; tokenize; rank and facet results for SearchQuery.”
  • Audit.Export: “Package results; sign artifacts; deliver via ExportJob with callback.”
  • Audit.Admin: “Onboard tenants; register contracts; configure keys and govern editions.”

Collaboration Styles per Edge

Mini-catalog (with ATP examples)

  • Open Host Service (OHS) — A well-documented API a context exposes to the world.
    Example: Audit.Query offers /audit/search for read models via Gateway.

  • Published Language (PL) — A shared, versioned vocabulary/schema used between contexts.
    Example: Audit.Admin publishes TenantUpdated with canonical fields (tenantId, edition, region).

  • Customer–Supplier (CS) — Downstream (customer) drives expectations; upstream (supplier) commits to meet them.
    Example: Audit.Query (customer) asks Audit.Integrity (supplier) to verify evidence.

  • Conformist (CONF) — Consumer voluntarily adopts the upstream model to reduce translation/latency.
    Example: Audit.Gateway conforms to Audit.Query’s request/response shapes.

  • Anti-Corruption Layer (ACL): a translation shield that isolates our domain model from a foreign one.
    Example: External publishers → Audit.Gateway (adapters normalize into AppendIntent).

  • Event Choreography (CHOREO) — Asynchronous collaboration via events; no central orchestrator.
    Example: Audit.Ingestion emits audit.appended; Integrity, Projection, and Search react.

Rule of one label per edge. Some edges mix interface and posture (e.g., OHS + Conformist). For clarity, we assign a single canonical style per edge below and reflect the dominant characteristic on the diagram label.


Canonical styles per pair (with rationale)

| From → To | Style | Rationale (1-liner) |
|---|---|---|
| Gateway → Ingestion | CS | Gateway depends on Ingestion to accept intents and meet throughput/latency expectations. |
| Gateway → Query | CONF | Gateway adopts Query’s request/response shapes to stay thin and avoid translation. |
| Gateway → Export | CONF | Gateway forwards job creation using Export’s native API contract. |
| Ingestion → Policy | CONF | Ingestion conforms to Policy’s decision interface to make synchronous “allow/deny/redact” cheap. |
| Ingestion → Integrity | CHOREO | Integrity passively reacts to audit.appended to build hash chains without coupling. |
| Ingestion → Projection | CHOREO | Projection updates read models on audit.appended, enabling replay/rebuild. |
| Ingestion → Search | CHOREO | Search indexing is event-driven for elasticity and back-pressure handling. |
| Policy → Projection | CHOREO | Policy changes (policy.changed) drive projection rebuilds without sync coupling. |
| Query → Integrity | CS | Query requests on-read verification; Integrity commits to provide proofs. |
| Query → Export | CS | Query initiates long-running export jobs; Export provides job lifecycle guarantees. |
| Admin → Policy | PL | Admin publishes tenant/policy state in a canonical schema Policy consumes. |
| Admin → Gateway | PL | Admin’s tenant/edition updates propagate as canonical events that Gateway understands. |
| External Systems → Gateway | ACL | Gateway shields core domain by translating foreign payloads into AppendIntent. |

Mermaid diagram (edge labels show the canonical style)

graph LR
  %% Legend: CS=Customer–Supplier, CONF=Conformist, ACL=Anti-Corruption Layer, CHOREO=Event Choreography, OHS=Open Host Service, PL=Published Language

  subgraph Clients
    U[API Clients / SDKs]
    EXT[External Systems]
  end

  subgraph Audit Trail Platform
    GW[Audit.Gateway]
    ING[Audit.Ingestion]
    POL[Audit.Policy]
    INT[Audit.Integrity]
    PRJ[Audit.Projection]
    QRY[Audit.Query]
    SRCH[Audit.Search]
    EXP[Audit.Export]
    ADM[Audit.Admin]
  end

  %% Client to Gateway (public API surface)
  U -->|OHS| GW

  %% Gateway collaborations
  GW -->|CS| ING
  GW -->|CONF| QRY
  GW -->|CONF| EXP

  %% Ingestion collaborations
  ING -->|CONF| POL
  ING -->|CHOREO| INT
  ING -->|CHOREO| PRJ
  ING -->|CHOREO| SRCH

  %% Policy broadcasts
  POL -->|CHOREO| PRJ

  %% Query collaborations
  QRY -->|CS| INT
  QRY -->|CS| EXP

  %% Admin publications
  ADM -->|PL| POL
  ADM -->|PL| GW

  %% External integrations
  EXT -->|ACL| GW

Notes

  • When an edge is CONF, the consumer follows the supplier’s contract as-is; API shape lives with the supplier’s OHS and is versioned there.
  • CHOREO edges imply replayability and idempotency requirements on consumers; see “Reliability Notes” for DLQ/back-pressure specifics.
  • PL edges point to canonical schemas (tenant, edition, residency) owned by Audit.Admin; versioning is additive-first with deprecation windows.

Upstream/Downstream Matrix & Criticality

This matrix makes directionality and dependency explicit for each context and tags the operational criticality of every edge.

Legend:

⚡ LS: latency-sensitive (inline call on hot path) · 📦 TS: throughput-sensitive (sustained high volume) · ⏳ BA: batch/async (queued or long-running)

| Context | Upstream (depends on) | Downstream (depends on it) | Critical contracts (examples) |
|---|---|---|---|
| Audit.Gateway | Audit.Admin: PL ⏳ | Audit.Ingestion: CS 📦; Audit.Query: CONF ⚡; Audit.Export: CONF ⚡ | HTTP OHS: /audit/append, /audit/query, /export/jobs; async out: audit.append (intent); consumes tenant.updated (PL). |
| Audit.Ingestion | Audit.Gateway: CS 📦; Audit.Policy: CONF ⚡ | Audit.Integrity: CHOREO ⏳; Audit.Projection: CHOREO ⏳; Audit.Search: CHOREO ⏳ | Async in: audit.append; sync: CheckPolicy (gRPC/HTTP); async out: audit.appended. |
| Audit.Policy | Audit.Admin: PL ⏳ | Audit.Ingestion: CONF ⚡; Audit.Projection: CHOREO ⏳ | Sync: CheckPolicy, GetRetention, Classify; async out: policy.changed. |
| Audit.Integrity | Audit.Ingestion: CHOREO ⏳ | Audit.Query: CS ⚡ | Async in: audit.appended; HTTP/gRPC: VerifyEntry, VerifyRange, GetEvidence. |
| Audit.Projection | Audit.Ingestion: CHOREO ⏳; Audit.Policy: CHOREO ⏳ | Audit.Query (reads projections) ⚡ | Async in: audit.appended, policy.changed; internal: Rebuild, Checkpoint; read models/tables (authoritative for reads). |
| Audit.Query | Audit.Projection (read models) ⚡; Audit.Integrity: CS ⚡ | Audit.Gateway: OHS ⚡; Audit.Export: CS ⏳ | HTTP/gRPC OHS: /audit/search, /audit/{id}, /audit/verify; sync to Integrity on-read; async/sync to Export to start jobs. |
| Audit.Search | Audit.Ingestion: CHOREO ⏳ | Audit.Query: OHS ⚡ | Async in: audit.appended; HTTP internal: SearchIndex, Suggest; async out: search.indexed. |
| Audit.Export | Audit.Query: CS ⏳ | Audit.Gateway: OHS ⚡; Webhook recipients: Webhook ⏳ | HTTP OHS: /export/jobs, /export/jobs/{id}; webhook: export.completed (signed); artifacts in object storage. |
| Audit.Admin | (none) | Audit.Policy: PL ⏳; Audit.Gateway: PL ⏳ | HTTP OHS: /admin/tenants, /admin/contracts, /admin/policies; async out: tenant.updated, contract.updated. |

Operational guidance

  • Treat ⚡ LS edges as part of your p95/p99 SLO budgets (timeouts, retries, circuit breakers).
  • For 📦 TS edges, prefer queue/bulk APIs, shard keys, and idempotency; measure drain rates.
  • ⏳ BA edges must have DLQ/retry/replay documented (see “Reliability Notes: Hot Paths & Recovery”).


Contracts & Events: Where to Look

Source of truth for contracts lives under docs/domain/contracts/ and event semantics under docs/domain/events-catalog.md. This section only pins touchpoints so you can jump to the exact files.

Audit.Gateway

Audit.Ingestion

Audit.Policy

Audit.Integrity

Audit.Projection

Audit.Query

Audit.Search

Audit.Export

Audit.Admin


Versioning & Discovery (watermark)

  • HTTP/gRPC: SemVer in Accept (e.g., application/vnd.connectsoft.audit.search+json;v=1) and/or path (/v1/...). Additive-first; breaking changes → new vN surface.
  • Events/Topics: Subject suffix .vN (e.g., audit.appended.v1). Schema ID carried in metadata (schemaId, schemaHash).
  • Deprecation windows: Minimum 180 days; both vN and vN+1 live in parallel; announce via contract.updated (Admin).
  • Registry & “current” pointers:

Always link specs from code repos to these canonical files. If you must drift, open an ADR and reference it in Evolution & ADR Links.


Tenancy & Ownership

Each context declares what it owns (authoritative sources of truth) and what it derives (indices, projections, caches). Tenant isolation is enforced end-to-end via keys, partitioning, and RLS/filters.

| Context | Owns (authoritative) | Indices / Projections (derived) | Tenant keying (partitioning / RLS / filters) | Cross-tenant rules |
|---|---|---|---|---|
| Audit.Gateway | None (stateless for business data). | May persist AccessLog/RateLimit counters (operational). | All inbound calls must carry TenantId; Gateway injects TenantId & CorrelationId downstream; rejects missing/mismatched tenant claims. | Disallowed. Gateway enforces per-request tenant scoping; no cross-tenant fan-out. |
| Audit.Ingestion | AuditEntry (append-only, immutable), AppendReceipt (ack metadata). | Staging queue only (transient). | Physical/logical partition by TenantId; RLS on event store; idempotency scope {TenantId, ProducerAppId, Source, IdempotencyKey}. | Disallowed. Replay & rebuild are tenant-scoped jobs. |
| Audit.Policy | PolicyDefinition, RetentionPolicy, ClassificationPolicy (incl. versions). | KV/cache of compiled policies per tenant/version. | Policies keyed {TenantId, Edition} with platform defaults and per-tenant overrides; reads require tenant match. | No cross-tenant decisions; multi-tenant policy reads restricted to platform admins (read-only). |
| Audit.Integrity | EvidenceLink, ChainSegment, MerkleRootAnchor. | Verification caches (proof memoization) per tenant. | Separate chain per TenantId (and optionally per region/namespace); proof queries require matching TenantId. | Disallowed. No proofs across tenants; anchors never mix tenants. |
| Audit.Projection | None (does not create new business truth). | AuditRecordView (per-tenant tables), PolicyView; Checkpoint state. | Per-tenant schemas or table partition by TenantId; RLS on all views; rebuild jobs are tenant-scoped. | Disallowed. Aggregates and joins are tenant-bounded. |
| Audit.Query | No business truth; may own QueryAuditLog (requests), AccessAudit. | Uses Projection as source of truth for reads. | All queries require tenant filters injected server-side; ABAC checks on roles/scopes. | Disallowed by default. Exception: platform auditors with explicit platform-admin scope and “multi-tenant read” feature flag (read-only). |
| Audit.Search | No business truth. | AuditSearchDoc (per-tenant index/partition), suggest dictionaries. | Dedicated index per tenant or partition key TenantId; query-time tenant filter enforced. | Disallowed. No cross-tenant search indices or queries. |
| Audit.Export | ExportJob, ExportArtifact (metadata). | Temporary assembly areas; signed URLs; delivery receipts. | Artifact storage path tenants/{TenantId}/exports/{JobId}; keys/secrets resolved per tenant; webhook signing keys per tenant. | Disallowed. Exports cannot include records from multiple tenants. |
| Audit.Admin | Tenant registry, Edition mapping, ClientApp credentials, ContractDescriptor. | Catalog caches for quick lookup. | TenantId is authoritative here; residency/region and edition enforced at provisioning time for downstream stores. | Limited to platform staff; writes are tenant-scoped; reads across tenants only for admin console. |

Idempotency & Correlation (ingest/query requirements)

  • Idempotency (Append)

    • Header/metadata: X-Idempotency-Key (opaque), plus producer metadata ProducerAppId, Source.
    • Scope: {TenantId, ProducerAppId, Source, IdempotencyKey}.
    • Retention: configurable; default ≥ 7 days to survive network retries and DLQ replays.
    • Behavior: On duplicate, Ingestion returns the original AppendReceipt (HTTP 200) without writing a new AuditEntry.
  • Correlation & Tracing

    • Accept X-Correlation-Id (sticky across contexts) and W3C traceparent (OpenTelemetry).
    • Gateway creates values if missing, propagates via event metadata (correlationId, traceId) and request headers.
    • All domain events (audit.appended, policy.changed, etc.) must include {tenantId, correlationId, causationId?, traceId}.
  • Tenant Header & Claims

    • X-Tenant-Id (or embedded in client credentials) is mandatory at Gateway; downstream services do not trust caller-supplied tenant values—Gateway signs/enriches them.
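The duplicate-handling rule for appends can be sketched as follows. This is a minimal in-memory illustration, not the Ingestion implementation: the class IngestionSketch, the method accept_append, and the shape of AppendReceipt are hypothetical names chosen for the example, and retention of idempotency records (default ≥ 7 days) is not modelled.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

# Scope from the text: (TenantId, ProducerAppId, Source, IdempotencyKey)
IdempotencyScope = Tuple[str, str, str, str]


@dataclass(frozen=True)
class AppendReceipt:
    # Hypothetical receipt shape; real fields live in the contracts.
    entry_id: str
    tenant_id: str


@dataclass
class IngestionSketch:
    """In-memory stand-in for the append-only store (illustrative only)."""
    receipts: Dict[IdempotencyScope, AppendReceipt] = field(default_factory=dict)
    entries: List[Tuple[str, str, Any]] = field(default_factory=list)

    def accept_append(self, tenant_id: str, producer_app_id: str, source: str,
                      idempotency_key: str, payload: Any):
        """Return (receipt, was_duplicate)."""
        scope = (tenant_id, producer_app_id, source, idempotency_key)
        existing = self.receipts.get(scope)
        if existing is not None:
            # Duplicate: hand back the original receipt, write nothing new.
            return existing, True
        receipt = AppendReceipt(entry_id=str(uuid.uuid4()), tenant_id=tenant_id)
        self.entries.append((receipt.entry_id, tenant_id, payload))
        self.receipts[scope] = receipt
        return receipt, False
```

Because TenantId is part of the scope, the same key sent by two tenants produces two distinct entries, which keeps deduplication from becoming a cross-tenant channel.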

Residency & Data Locality (enforced by Admin)

  • Region binding (per tenant) drives storage/account selection for Ingestion store, Integrity ledger, Projection DB, Search index, and Export bucket/prefix.
  • Cross-region moves require Admin workflow (deactivate → migrate → reactivate) and explicit ADR.

Rule of thumb: If a context “creates original business facts,” it owns them; everything else is derived and must be reproducible from owned facts + policies.


Security Overlays (Zero-Trust)

We treat every hop as hostile by default. Controls are enforced at the edge (Gateway), inside the mesh (service→service), and at data layers. Workloads authenticate with workload identity; traffic is mTLS; requests are authorized with ABAC/RBAC and tenant/edition guards; sensitive fields are classified/redacted at policy checkpoints.


Overlay diagram


flowchart LR
  subgraph EDGE["Edge / Public Ingress"]
    C[Clients/SDKs]
  end

  subgraph GZ["Gateway Zone (mTLS ingress)"]
    GW[Audit.Gateway<br />AuthN: OAuth2/JWT<br />AuthZ: ABAC/RBAC<br />Rate limit<br />Schema & edition gates]
  end

  subgraph MESH["Service Mesh / mTLS"]
    ING[Audit.Ingestion<br />Idempotency + tenancy inject]
    POL[Audit.Policy<br />Classification & decisions]
    INT[Audit.Integrity<br />Hash/Merkle proofs]
    PRJ[Audit.Projection]
    QRY[Audit.Query]
    SRCH[Audit.Search]
    EXP[Audit.Export<br />Signed webhooks]
    ADM[Audit.Admin<br />Tenant/Edition registry]
  end

  C ---|TLS| GW
  GW ---|mTLS + ABAC| ING
  ING ---|mTLS - CONF| POL
  ING -. events (signed, tenant-scoped) .-> INT
  ING -. events (signed, tenant-scoped) .-> PRJ
  ING -. events (signed, tenant-scoped) .-> SRCH
  QRY ---|mTLS - CS| INT
  QRY ---|mTLS - CS| EXP
  ADM -. PL events .-> GW
  ADM -. PL events .-> POL

  subgraph DATA["KMS-protected Data Layers"]
    ES[(Event Store / Ledger)]
    RM[(Read Models)]
    IDX[(Search Index)]
    OBJ[(Object Storage — Exports)]
  end

  ING --- ES
  PRJ --- RM
  SRCH --- IDX
  EXP --- OBJ

Boundary policies (who enforces what)

Client → Gateway (public edge)

  • Transport: TLS 1.2+; HSTS at edge; strict ALPN/ciphers.
  • AuthN: OAuth2/JWT (aud/iss/exp/nbf checked); optional mTLS for partner apps.
  • AuthZ: ABAC (tenant, roles/scopes, edition features).
  • Tenancy: Gateway injects TenantId, rejects cross-tenant hints; correlates X-Correlation-Id.
  • Validation: JSON schema & contract version; edition gates; size limits; content scanning (optional).
  • Rate limiting: Token-bucket per tenant + client with burst/steady; 429 + Retry-After.
  • PII hooks: Request models pre-classified; drop disallowed fields before emit.
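The rate-limiting rule above (token bucket per tenant + client, 429 with Retry-After) can be sketched like this. The class names, the capacity and refill values, and the (status, headers) return shape are assumptions made for the example; the real Gateway limits and response format are defined elsewhere.

```python
import math
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class TokenBucket:
    capacity: float        # burst size
    refill_per_sec: float  # steady rate
    tokens: float
    updated_at: float

    def try_acquire(self, now: float, cost: float = 1.0) -> Tuple[bool, int]:
        """Return (allowed, retry_after_seconds)."""
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated_at = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True, 0
        # Whole seconds until enough tokens have been refilled.
        return False, math.ceil((cost - self.tokens) / self.refill_per_sec)


class RateLimiter:
    """One bucket per (tenant, client) pair; hypothetical helper."""

    def __init__(self, capacity: float, refill_per_sec: float):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.buckets: Dict[Tuple[str, str], TokenBucket] = {}

    def check(self, tenant_id: str, client_id: str, now: float):
        key = (tenant_id, client_id)
        bucket = self.buckets.get(key)
        if bucket is None:
            bucket = TokenBucket(self.capacity, self.refill_per_sec, self.capacity, now)
            self.buckets[key] = bucket
        allowed, retry_after = bucket.try_acquire(now)
        if allowed:
            return 200, {}
        return 429, {"Retry-After": str(retry_after)}
```

Passing the clock in as `now` keeps the sketch deterministic; a real limiter would read a monotonic clock and would likely keep separate write and read buckets, as the controls matrix requires.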

Gateway → Ingestion (hot path)

  • mTLS + workload identity: service-to-service with SPIFFE/SPIRE or AAD Workload Identity.
  • AuthZ: ABAC check (tenant/edition) survives hop via signed claims.
  • Idempotency: Required X-Idempotency-Key; scope {TenantId, ProducerAppId, Source, Key}.
  • Observability: W3C traceparent propagated; structured audit log.

Ingestion ⇢ {Integrity, Projection, Search} (event edges)

  • Transport: Broker auth (SAS/AAD), topic-level ACLs; mTLS where supported.
  • Envelope: Events signed (producer key id + hash) and include {tenantId, schemaId, correlationId}.
  • PII: Fields already classified/redacted by Policy at accept time.
  • Replay/DLQ: Tenant-scoped DLQ; poison-pill quarantine; ordered replays per partition key.

Ingestion → Policy (decision call)

  • mTLS + workload identity; timeout budget small (latency-sensitive).
  • Cache: Negative/positive decision caching with TTL; eTags/version.
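A decision cache along these lines could look like the sketch below. It caches both allow and deny results with a TTL and drops a tenant's entries when a newer policy version is announced. DecisionCache, its method names, and the (decision, policy_version) return shape of the remote call are assumptions for illustration, not the Policy contract.

```python
from typing import Callable, Dict, Tuple

Decision = Tuple[str, int]  # (decision, policy_version); assumed shape


class DecisionCache:
    def __init__(self, ttl_seconds: float,
                 check_policy: Callable[[str, str], Decision]):
        self.ttl = ttl_seconds
        # Stand-in for the remote CheckPolicy call: (tenant_id, action) -> Decision
        self.check_policy = check_policy
        # key -> (decision, policy_version, expires_at)
        self._entries: Dict[Tuple[str, str], Tuple[str, int, float]] = {}

    def decide(self, tenant_id: str, action: str, now: float) -> str:
        key = (tenant_id, action)
        hit = self._entries.get(key)
        if hit is not None and now < hit[2]:
            return hit[0]  # positive and negative decisions are both served
        decision, version = self.check_policy(tenant_id, action)
        self._entries[key] = (decision, version, now + self.ttl)
        return decision

    def on_policy_changed(self, tenant_id: str, new_version: int) -> None:
        """Evict this tenant's entries computed under an older policy version."""
        stale = [k for k, v in self._entries.items()
                 if k[0] == tenant_id and v[1] < new_version]
        for k in stale:
            del self._entries[k]
```

Keying by tenant keeps one tenant's decisions from ever answering another tenant's request, and version-based eviction lets a policy.changed event take effect before the TTL runs out.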

Query → Integrity (on-read verify)

  • mTLS; AuthZ requires same-tenant proof requests.
  • Proofs: Range proofs returned with evidenceHash, chainId, window.
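To make the idea of a tenant-scoped range proof concrete, the sketch below builds a simple SHA-256 hash chain and verifies a sub-range of it. It is deliberately simplified: Merkle roots and anchors are omitted, and the canonical encoding (sorted-key JSON joined with the previous hash and tenant id) is an assumption for the example, not the platform's evidence format.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # assumed starting value for an empty chain


def entry_hash(prev_hash: str, tenant_id: str, entry: Dict) -> str:
    """Hash one entry together with the previous link and the tenant id."""
    canonical = json.dumps(entry, sort_keys=True, separators=(",", ":"))
    material = f"{prev_hash}|{tenant_id}|{canonical}".encode("utf-8")
    return hashlib.sha256(material).hexdigest()


def build_chain(tenant_id: str, entries: List[Dict]) -> List[str]:
    """Return one link per entry; link i depends on every entry up to i."""
    links: List[str] = []
    prev = GENESIS
    for entry in entries:
        prev = entry_hash(prev, tenant_id, entry)
        links.append(prev)
    return links


def verify_range(tenant_id: str, entries: List[Dict], links: List[str],
                 start: int, end: int, prev_hash: str) -> bool:
    """Check entries[start:end] against links, given the link before start."""
    prev = prev_hash
    for i in range(start, end):
        if entry_hash(prev, tenant_id, entries[i]) != links[i]:
            return False
        prev = links[i]
    return True
```

Mixing the tenant id into every link means a proof request carrying the wrong tenant cannot verify, which matches the same-tenant requirement on this edge.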

Query → Export; Export → Webhook recipients

  • Export API: mTLS; ABAC on job scope; encryption configuration set by tenant policy.
  • Artifacts: KMS envelope encryption; per-tenant bucket/prefix.
  • Webhooks: HMAC-SHA256 signature over canonical payload; headers X-Export-Signature, X-Export-Timestamp; 5-min skew window; retries with exponential backoff.
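A receiver-side check for the signed webhook might be sketched as follows. The header names and the 5-minute skew window come from the text above; the canonical payload form used here (timestamp, a dot, then the raw body) and the hex encoding of the signature are assumptions, since the actual canonicalization is defined in the Export contract.

```python
import hashlib
import hmac
from typing import Dict

MAX_SKEW_SECONDS = 5 * 60  # 5-minute skew window


def sign(secret: bytes, timestamp: int, body: bytes) -> str:
    """HMAC-SHA256 over the assumed canonical form '<timestamp>.<raw body>'."""
    message = str(timestamp).encode("ascii") + b"." + body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify(secret: bytes, headers: Dict[str, str], body: bytes, now: int) -> bool:
    """Return True only for a fresh, correctly signed delivery."""
    try:
        timestamp = int(headers["X-Export-Timestamp"])
        received = headers["X-Export-Signature"]
    except (KeyError, ValueError):
        return False
    if abs(now - timestamp) > MAX_SKEW_SECONDS:
        return False  # too old or too far in the future
    expected = sign(secret, timestamp, body)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode("utf-8"), received.encode("utf-8"))
```

Including the timestamp in the signed material is what makes the skew check meaningful: an attacker cannot refresh the timestamp on a captured delivery without invalidating the signature.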

Admin → {Gateway, Policy} (PL events)

  • Publisher: Admin signs tenant.updated, contract.updated; consumers verify signature + version.
  • Controls: Edition/residency changes require multi-party approval (break-glass logged).

Mandatory controls matrix

| Area | Control | Enforced by |
|---|---|---|
| Transport | TLS at edge; mTLS in mesh | Ingress/Gateway, Mesh/Sidecars |
| Workload identity | SPIFFE/SPIRE or AAD Workload Identity; no static secrets in pods | Platform IAM |
| AuthN/AuthZ | OAuth2/JWT (clients); ABAC/RBAC (services) | Gateway, Services |
| Tenancy | Server-enforced tenant filter; signed tenant claims | Gateway, All services |
| PII/Classification | Policy classifies; Ingestion applies redaction before persist | Policy, Ingestion |
| Rate limiting | Per-tenant/client; separate write vs read buckets | Gateway |
| Data-at-rest | KMS keys per store (event store, read models, search, exports) | Platform/KMS |
| Secrets | Central vault; short-lived tokens; no inline secrets | Platform |
| Logging | Structured, redacted logs; no PII beyond policy | All services |
| Integrity | Hash chains, Merkle roots, anchored periodically | Integrity |

Notes & defaults

  • Default-deny on network and IAM. Only declared edges are allowed.
  • Additive-first versioning; breaking changes require new vN and ADR.
  • Residency and region enforced by Admin at provisioning; data paths are tenant-scoped.
  • Back-pressure policies (429/deferral/DLQ) must not leak cross-tenant timing channels.

See also: Tenancy & Ownership for partitioning/RLS, and Reliability Notes for DLQ/replay guarantees.


Reliability Notes: Hot Paths & Recovery

This section names the golden paths, sets p95 targets, and documents back-pressure + replay entry points so SREs and service owners share the same operational contract.

SLIs & scope

  • Availability (per OHS): successful responses / total, excluding client 4xx (except 429).
  • Latency (p95) measured server-side within the same region; excludes client/network RTT.
  • Durability of append: an append is “accepted” once persisted to the Ingestion event store; downstream consumers are eventually consistent.

Golden Path A — Append → Accept → (event) Project/Search/Integrity

User intent: client app appends an audit entry.

Targets

  • Gateway POST /audit/append (sync mode): p95 ≤ 150 ms, p99 ≤ 300 ms.
  • If Gateway sheds to async mode (intent enqueue): p95 ≤ 60 ms for 202 Ack; Ingestion accept within p95 ≤ 2 s end-to-end (queue to persist).

Back-pressure & controls

  • Gateway: per-tenant/token buckets; on breach → 429 with Retry-After. Adaptive mode: switches from sync to async (enqueue audit.append intent) when local queue > threshold.
  • Gateway → Ingestion (sync path): if Policy/Ingress budget exceeded → 503 with Retry-After; clients retry with same Idempotency-Key.
  • Ingestion: write throttles to event store; if pressure → stage to durable queue; consumers slowed via prefetch & concurrency limits.
  • Events (Ingestion → {Projection, Search, Integrity}): ASB DLQ after N attempts (default 10, exponential backoff); deferral for out-of-order ranges; tenant-partitioned subscriptions.
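The retry-then-DLQ behaviour on event edges can be sketched in a broker-agnostic way. The default of 10 attempts and exponential backoff come from the text; the full-jitter variant, the base and cap values, and the function names are assumptions for the example, and a managed broker would normally track delivery counts itself.

```python
import random
from typing import Any, Callable, Tuple


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 60.0,
                  rng: Callable[[], float] = random.random) -> float:
    """Full-jitter exponential backoff: uniform in [0, min(cap, base * 2**attempt))."""
    return rng() * min(cap, base * (2 ** attempt))


def consume(message: Any, handler: Callable[[Any], None], max_attempts: int = 10,
            sleep: Callable[[float], None] = lambda seconds: None) -> Tuple[str, int]:
    """Try the handler up to max_attempts times.

    Returns ("ok", attempts_used) on success, or ("dlq", max_attempts) when the
    message should be moved to the dead-letter queue.
    """
    for attempt in range(max_attempts):
        try:
            handler(message)
            return "ok", attempt + 1
        except Exception:
            # Wait before the next try, but not after the final failure.
            if attempt + 1 < max_attempts:
                sleep(backoff_delay(attempt))
    return "dlq", max_attempts
```

The handler must be idempotent for this to be safe, because a retry after a partial failure will re-apply the same message.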

Replay / rebuild

  • Re-ingest (safe): Ingestion.Replay(tenantId, fromOffset, toOffset?), idempotent by {TenantId, ProducerAppId, Source, IdempotencyKey}.
  • Projection: Projection.Rebuild(tenantId, checkpoint?) — drains from event store; checkpoints per tenant.
  • Search: Search.Reindex(tenantId, range?) — idempotent on documentId + version.
  • Integrity: Integrity.Repair(tenantId, gapRange) — recomputes chains and anchors; no cross-tenant proofs.

Observability

  • Required dimensions: {tenantId, route, mode(sync|async), status, idempotencyHit(bool)}; export drain rate and oldest message age for audit.append.


Golden Path B — Query → (read models) → (optional) Verify

User intent: list/filter audit records, optionally verify integrity on read.

Targets

  • POST /audit/search: p95 ≤ 250 ms for page sizes ≤ 100.
  • GET /audit/records/{id} + POST /audit/verify: p95 ≤ 200 ms for typical proof windows (≤ 1 k entries).

Back-pressure & controls

  • Gateway: per-tenant read buckets; expensive queries are shaped (server-side limits) or return 429 with guidance.
  • Query: protects read models with max rows / timeouts; cache hot filters; coalesce repeated requests by tenantId+hash(query) window.
  • Query → Integrity: if verify exceeds budget → return partial set + verificationPending=true (option), or 202 to async verify with callback.

Replay / rebuild

  • Source of truth is Projection; if gaps detected → trigger Projection.Rebuild(tenantId).
  • Integrity verification can be deferred: VerifyRangeAsync(ticket); result cached by (tenantId, rangeHash).

Observability

  • Export SLIs: qps, hitRatio(cache), p95, timeouts, verifyRate, verifyLatency.


Golden Path C — Export (long-running)

User intent: export query results to CSV/JSON/Parquet with signed delivery.

Targets

  • POST /export/jobs: p95 ≤ 120 ms to accept & enqueue.
  • Job start SLA: p95 ≤ 60 s under steady load; completion depends on dataset size; progress exposed via /export/jobs/{id}.

Back-pressure & controls

  • Bounded concurrency per tenant; queue length SLO and age alerts.
  • If object storage throttles → exponential backoff; partial chunks checkpointed.
  • Webhook retries (HMAC signed) with jittered backoff; DLQ on receiver 4xx/5xx after budget.

Replay / rebuild

  • Export.Resume(jobId): idempotent chunking; artifacts content-addressed.
  • Regenerate artifact: Export.Rerun(jobId) stores new version, previous retained by retention policy.

Observability

  • SLIs per tenant: queuedJobs, runningJobs, meanChunkTime, artifactSize, webhookFailures.

Golden Path D — Admin Updates (tenant/edition/policy)

User intent: update tenant/edition or policy; propagate safely.

Targets

  • POST /admin/policies or /admin/tenants: p95 ≤ 200 ms to persist and emit *.updated.
  • Downstream consumption SLO: p95 ≤ 30 s for Projection/Search to reflect policy changes.

Back-pressure & controls

  • Admin changes are rate-limited; a circuit breaker prevents mass invalidation storms.
  • Consumers apply changes with version gating; if behind → queue defers until consistent.

Replay / rebuild

  • Admin.Rebroadcast(tenantId, type, version) seeds re-delivery.
  • Projection.Rebuild if classification/retention policy changed (ensures derived state consistency).

Observability

  • Track policy version adoption per service; alert on lag (e.g., > 2 versions).

Back-pressure & DLQ Matrix (summary)

| Edge | Mechanism | Default policy |
|---|---|---|
| Client → Gateway | 429 + Retry-After; adaptive degrade to async appends | Per-tenant token bucket; separate write/read buckets |
| Gateway ↔ Policy (sync) | 503 + retry with jitter; small timeouts; circuit breaker | Budget 50 ms median; fallback = enqueue intent |
| Gateway → Ingestion (sync) | 503 + retry w/ idempotency; or enqueue | Idempotent keys; drop duplicates by scope |
| Ingestion → Event Bus | Outbox + publish retry; handoff DLQ | Max attempts 10; DLQ w/ diagnostics and sample payload |
| Bus → Consumer | retry/backoff; DLQ/deferral | Tenant partition key; replay safe |
| Query → Integrity | Timeout budget; partial results or async verify | SLA ties to page size/proof window |
| Export → Webhook | Signed retries; DLQ after budget | 6 attempts, exponential backoff, 5-minute max skew |

Operator playbook pointers

  • Rebuild: Start with Projection, then Search, then re-issue VerifyRange if needed.
  • Drain: Pause producers for a tenant using Gateway admission control, then Replay from last good checkpoint.
  • Hot shard: Enable producer-side backoff; increase consumer concurrency for that partition only; avoid global scale-out first.

All paths rely on idempotency + tenant partitioning to make replay safe. If any component cannot guarantee this, file an ADR and link it in Evolution & ADR Links.


Evolution & ADR Links

We evolve edges additive-first and treat the context map as a governed contract. Any deviation or breaking move must carry an ADR.


Versioning & change rules

  • APIs (HTTP/gRPC)

    • Allowed (non-breaking): add optional fields; add new endpoints; widen enum with default; increase limits with server-side caps.
    • Breaking → new major vN: remove/rename fields; change semantics; tighten validation; change status codes.
    • Surface major in path (/v2/...) or Accept header (v=2); run vN and vN+1 in parallel for ≥ 180 days.
  • Events/Topics

    • Name as domain.event.vN; additive changes stay within vN; breaking changes → new subject .vN+1.
    • Carry schemaId, schemaHash, tenantId, correlationId.
    • Dual-publish during migrations; consumers opt-in per version.
  • Contracts registry

    • Update contracts/index.md and contracts/registry.json with each change; bump SemVer.
    • Emit contract.updated.v1 from Admin announcing availability/deprecation window.
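The naming rules above lend themselves to a small parser that tooling could use to route by major version. This is a sketch only: the accepted subject grammar (lowercase dot-separated segments followed by .vN) and the function names are assumptions, and the Accept handling covers just the v=N parameter form shown in this document.

```python
import re
from typing import Optional, Tuple

# Assumed grammar: at least two lowercase segments, then ".v<major>"
_SUBJECT = re.compile(
    r"^(?P<name>[a-z][a-z0-9]*(?:\.[a-z][a-z0-9]*)+)\.v(?P<major>[1-9][0-9]*)$"
)


def parse_subject(subject: str) -> Tuple[str, int]:
    """Split 'audit.appended.v1' into ('audit.appended', 1)."""
    match = _SUBJECT.match(subject)
    if match is None:
        raise ValueError(f"subject does not follow domain.event.vN: {subject!r}")
    return match.group("name"), int(match.group("major"))


def accept_major(accept: str) -> Optional[int]:
    """Extract the v=N parameter from a single media type, if present."""
    params = [part.strip() for part in accept.split(";")][1:]
    for param in params:
        key, _, value = param.partition("=")
        value = value.strip()
        if key.strip().lower() == "v" and re.fullmatch(r"[0-9]+", value):
            return int(value)
    return None
```

For example, a consumer that opts in per version during a dual-publish window can compare the parsed major against the versions it supports before subscribing.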

Edge evolution policies (what’s allowed)

  • Add a new edge: allowed if tenancy, security, and SLO budget are documented (see checklist below).
  • Change edge style (e.g., CONF → ACL): require ADR with rationale (coupling, translation needs, upstream churn).
  • Remove an edge: only after window closes and all consumers are verified migrated (evidence via dashboards).
  • Raise criticality (e.g., BA → LS): needs capacity/SLO analysis and load test artifacts.
  • Residency/region impact: requires migration plan + data path verification; Admin gated.

When to introduce an ACL (decision cues)

Introduce an Anti-Corruption Layer if any of the following hold:

  • Upstream “Published Language” conflicts with our domain terms or policy model.
  • Upstream schema churns frequently or has weak versioning guarantees.
  • Security/classification requirements differ (need pre-ingest redaction/validation).
  • We need canary or translation for rollout without exposing internals.

If none apply and latency is critical, prefer Conformist.


“Propose a new edge” — PR checklist (copy into PR description)

  • Schema & Contracts

    • Contract file(s) under docs/domain/contracts/... with examples and validation (JSON Schema/Proto).
    • Event subject named x.y.vN; schema registered in contracts/registry.json.
    • Compatibility statement (additive vs breaking) and deprecation plan (if replacing an existing edge).
  • Tenancy & Data

    • tenantId propagation from Gateway; server-enforced filters/RLS noted.
    • Idempotency scope defined (if write/append): {TenantId, ProducerAppId, Source, Key}.
    • Residency/region path (storage/index/bucket prefix).
  • Security

    • AuthN (OAuth2/JWT or workload identity) and AuthZ posture (ABAC/RBAC).
    • PII classification hooks; redaction policy at accept time.
    • KMS usage for any data-at-rest or artifact produced; secret source (Vault).
  • Observability

    • W3C trace propagation (traceparent); correlation fields in events.
    • Metrics: QPS, p95, error rate, DLQ age / drain rate; logs redaction noted.
    • Dashboards/alerts updated (golden signals).
  • SLO & Reliability

    • Proposed p95/p99; capacity estimate; back-pressure behavior (429/503, deferral).
    • Retry policy, DLQ policy, replay entry points; idempotency verified.
  • Docs & Governance

    • context-map.md edges/labels updated with style & criticality.
    • events-catalog.md entry added/updated.
    • Runbook link (rebuild/replay).
    • ADR linked (see below).

ADR links

  • docs/adr/2025-10-22-gw-query-conformist.md — Adopt Conformist on Gateway→Query to minimize latency.
  • docs/adr/2025-10-22-introduce-acl-external-publishers.md — Add ACL adapters at Gateway for external systems.
  • docs/adr/2025-10-22-event-versioning-v2-for-audit-appended.md — Promote audit.appended to v2 with new required field evidenceHint.

Tip: Use the ADR template (docs/adr/_template.md) and tag with context-map, contracts, security, tenancy.


Governance hooks (automation)

  • CI checks:
    • Contract lint (schema validity), breaking-change detector (forbidden field deletes), link checker for docs.
    • “Contracts registry up-to-date” gate; CODEOWNERS require sign-off from Domain, SRE, Security.
  • Release:
    • Dual-run vN & vN+1; publish adoption dashboards; emit contract.updated.
    • Set deprecation tombstones with removal date; auto-open tracking issue.
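A breaking-change detector of the kind named in the CI checks could start from something like the sketch below. It compares two JSON-Schema-like dictionaries and reports removed fields, changed types, and newly required fields. The function name and the message strings are illustrative, and only flat top-level properties are inspected; a real gate would also walk nested schemas and enums.

```python
from typing import Dict, List


def breaking_changes(old: Dict, new: Dict) -> List[str]:
    """List changes between two schema versions that would break consumers."""
    problems: List[str] = []
    old_props = old.get("properties", {})
    new_props = new.get("properties", {})

    for name, spec in old_props.items():
        if name not in new_props:
            problems.append(f"removed field: {name}")
        elif spec.get("type") != new_props[name].get("type"):
            problems.append(f"type changed: {name}")

    # A field that becomes required breaks producers still on the old shape.
    newly_required = set(new.get("required", [])) - set(old.get("required", []))
    for name in sorted(newly_required):
        problems.append(f"new required field: {name}")

    return problems
```

An empty result corresponds to an additive change that can stay within vN; any reported problem is a cue to publish a new .vN+1 subject or API surface and open an ADR.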

If a PR passes this section’s checklist and CI gates, reviewers can approve without additional architecture meetings.