Context Map: Audit Trail Platform
The context map is the single, authoritative view of how the Audit Trail Platform is partitioned into bounded contexts and how those contexts collaborate. It makes responsibilities and seams explicit—showing ownership, upstream/downstream dependencies, collaboration styles, contract touchpoints, and tenancy/security boundaries across Audit.Gateway, Audit.Ingestion, Audit.Policy, Audit.Integrity, Audit.Projection, Audit.Query, Audit.Search, Audit.Export, and Audit.Admin. Use it to align design, reviews, and incident triage.
This page is written for platform and domain architects, microservice owners, SRE/operations, and security/compliance reviewers (and is a quick on-ramp for new contributors). Start with “Bounded Contexts (at a glance)” to learn what each context owns, then follow labeled edges to see collaboration styles and jump into contracts/events for exact schemas.
Bounded Contexts (at a glance)
The table captures responsibility, I/O surfaces, persistence anchors, and how tenant isolation is enforced. One-line Published Language entries define the canonical nouns/verbs used inside each context.
| Context | Responsibility | Primary Interfaces (OHS/Contracts) | Persistence (authoritative) | Tenancy notes |
|---|---|---|---|---|
| Audit.Gateway | Front-door for append/query; enforces authn/z, rate limits, edition/feature checks; normalizes requests and emits intent. | HTTP OHS: /audit/append, /audit/query · Async out: audit.append (intent) · Policies consulted per call. | Stateless; relies on downstream. | Per-tenant authentication + edition gates; injects TenantId, CorrelationId; rejects cross-tenant access. |
| Audit.Ingestion | Validates and accepts append requests; deduplicates via idempotency keys; persists canonical audit entries. | Async in: audit.append · HTTP/gRPC internal: AcceptAppend, Replay · Async out: audit.appended. | Append-only event store (immutability), durable queue for staging. | Partitioned by TenantId; idempotency scope {TenantId, Source, Key}; strict RLS on store. |
| Audit.Policy | Centralizes policy: retention, PII classification/redaction, schema/edition gates; “can I append/query?” decisions. | gRPC/HTTP: CheckPolicy, GetRetention, Classify · Async: policy.changed. | Config DB for policies + KV cache; policy versioning tracked. | Policies scoped by tenant and edition; defaults at platform → overridden per tenant. |
| Audit.Integrity | Produces tamper-evidence (hash chains, Merkle roots, anchors); verifies integrity on demand; evidence ledger. | Async in: audit.appended · HTTP/gRPC: VerifyEntry, VerifyRange, GetEvidence. | Evidence ledger (hash chain/Merkle) + anchor journal (e.g., periodic root). | Evidence keyed by TenantId; cross-tenant proofs disallowed; verification requires tenant context. |
| Audit.Projection | Builds/maintains read models/materialized views for fast query; handles rebuild/replay. | Async in: audit.appended, policy.changed · gRPC internal: Rebuild, Checkpoint. | Read models (per-tenant tables) + checkpoint store. | Per-tenant physical/logical partitioning; rebuilds isolated per tenant; back-pressure aware. |
| Audit.Query | Query and retrieval over projections; pagination, filtering, joins with evidence on demand. | HTTP/gRPC OHS: /audit/search, /audit/{id}, /audit/verify · Async out: request→Export. | Query DB (authoritative for reads) referencing projections; optional cache. | Server-enforced tenant filters; ABAC claims (tenant, role, scope); no cross-tenant joins. |
| Audit.Search | Full-text and faceted search over audit content; powers free-text and advanced filters. | HTTP internal: SearchIndex, Suggest · Async in: audit.appended · Async out: search.indexed. | Search index (per-tenant index/partition) + queue for indexing. | Separate index per tenant or partitioned by TenantId; query-time filters enforced. |
| Audit.Export | Asynchronous export of query results (CSV/JSON/Parquet); packaging, signing, delivery & callbacks. | HTTP OHS: /export/jobs · Async in: export requests · Webhook: export.completed (signed). | Staging store + object storage for artifacts; job metadata DB. | Exports scoped to requesting tenant; artifacts stored in tenant bucket/prefix; time-bound signed URLs. |
| Audit.Admin | Tenant onboarding, keys/credentials, contract catalogs, schema registry pointers, operational toggles. | HTTP OHS: /admin/tenants, /admin/contracts, /admin/policies · Async out: tenant.updated. | Admin DB for tenants, client apps, contract metadata. | Tenant registry is authoritative; ensures residency/region and edition applied across contexts. |
Published Language: one-liners
- Audit.Gateway — “Append entries; submit queries; enforce edition/tenant guardrails.”
- Audit.Ingestion — “Accept intent; deduplicate by idempotency key; persist canonical AuditEntry.”
- Audit.Policy — “Classify fields; decide (allow/deny); retain/expire by RetentionPolicy; redact by DataClass.”
- Audit.Integrity — “Hash entries; link chains; anchor roots; verify evidence for a ProofRange.”
- Audit.Projection — “Project events; materialize views; checkpoint progress; rebuild deterministically.”
- Audit.Query — “Filter and retrieve AuditRecords; paginate; optionally verify on read.”
- Audit.Search — “Index content; tokenize; rank and facet results for SearchQuery.”
- Audit.Export — “Package results; sign artifacts; deliver via ExportJob with callback.”
- Audit.Admin — “Onboard tenants; register contracts; configure keys and govern editions.”
Collaboration Styles per Edge
Mini-catalog (with ATP examples)
- Open Host Service (OHS): A well-documented API a context exposes to the world.
  Example: Audit.Query offers /audit/search for read models via Gateway.
- Published Language (PL): A shared, versioned vocabulary/schema used between contexts.
  Example: Audit.Admin publishes TenantUpdated with canonical fields (tenantId, edition, region).
- Customer–Supplier (CS): Downstream (customer) drives expectations; upstream (supplier) commits to meet them.
  Example: Audit.Query (customer) asks Audit.Integrity (supplier) to verify evidence.
- Conformist (CONF): Consumer voluntarily adopts the upstream model to reduce translation/latency.
  Example: Audit.Gateway conforms to Audit.Query’s request/response shapes.
- Anti-Corruption Layer (ACL): Translation shield that isolates the domain model from a foreign one.
  Example: External publishers → Audit.Gateway (adapters normalize into AppendIntent).
- Event Choreography (CHOREO): Asynchronous collaboration via events; no central orchestrator.
  Example: Audit.Ingestion emits audit.appended; Integrity, Projection, and Search react.
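As a rough illustration of the choreography style, the sketch below wires three consumers to one topic with no orchestrator in between. The `EventBus` class and the `entryId` field are hypothetical stand-ins for illustration only; the real platform uses a message broker, and the normative event fields live in the event catalog.

```python
from collections import defaultdict
from typing import Callable


class EventBus:
    """Hypothetical in-memory bus; stands in for the real broker."""

    def __init__(self) -> None:
        self._subscribers: dict = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        # The publisher does not know who reacts; it only emits the fact.
        for handler in self._subscribers[topic]:
            handler(event)


bus = EventBus()
reactions: list = []

# Integrity, Projection, and Search each subscribe independently.
for consumer in ("Integrity", "Projection", "Search"):
    bus.subscribe(
        "audit.appended",
        lambda event, name=consumer: reactions.append(f"{name}:{event['entryId']}"),
    )

# Ingestion emits once; all three consumers react.
bus.publish("audit.appended", {"tenantId": "t1", "entryId": "e-1"})
```

Adding a fourth consumer requires no change to the publisher, which is the property the CHOREO edges rely on.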
Rule of one label per edge. Some edges mix interface and posture (e.g., OHS + Conformist). For clarity, we assign a single canonical style per edge below and reflect the dominant characteristic on the diagram label.
Canonical styles per pair (with rationale)
| From → To | Style | Rationale (1-liner) |
|---|---|---|
| Gateway → Ingestion | CS | Gateway depends on Ingestion to accept intents and meet throughput/latency expectations. |
| Gateway → Query | CONF | Gateway adopts Query’s request/response shapes to stay thin and avoid translation. |
| Gateway → Export | CONF | Gateway forwards job creation using Export’s native API contract. |
| Ingestion → Policy | CONF | Ingestion conforms to Policy’s decision interface to make synchronous “allow/deny/redact” cheap. |
| Ingestion → Integrity | CHOREO | Integrity passively reacts to audit.appended to build hash chains without coupling. |
| Ingestion → Projection | CHOREO | Projection updates read models on audit.appended, enabling replay/rebuild. |
| Ingestion → Search | CHOREO | Search indexing is event-driven for elasticity and back-pressure handling. |
| Policy → Projection | CHOREO | Policy changes (policy.changed) drive projection rebuilds without sync coupling. |
| Query → Integrity | CS | Query requests on-read verification; Integrity commits to provide proofs. |
| Query → Export | CS | Query initiates long-running export jobs; Export provides job lifecycle guarantees. |
| Admin → Policy | PL | Admin publishes tenant/policy state in a canonical schema Policy consumes. |
| Admin → Gateway | PL | Admin’s tenant/edition updates propagate as canonical events that Gateway understands. |
| External Systems → Gateway | ACL | Gateway shields core domain by translating foreign payloads into AppendIntent. |
Mermaid diagram (edge labels show the canonical style)
graph LR
%% Legend: CS=Customer–Supplier, CONF=Conformist, ACL=Anti-Corruption Layer, CHOREO=Event Choreography, OHS=Open Host Service, PL=Published Language
subgraph Clients
U[API Clients / SDKs]
EXT[External Systems]
end
subgraph Audit Trail Platform
GW[Audit.Gateway]
ING[Audit.Ingestion]
POL[Audit.Policy]
INT[Audit.Integrity]
PRJ[Audit.Projection]
QRY[Audit.Query]
SRCH[Audit.Search]
EXP[Audit.Export]
ADM[Audit.Admin]
end
%% Client to Gateway (public API surface)
U -->|OHS| GW
%% Gateway collaborations
GW -->|CS| ING
GW -->|CONF| QRY
GW -->|CONF| EXP
%% Ingestion collaborations
ING -->|CONF| POL
ING -->|CHOREO| INT
ING -->|CHOREO| PRJ
ING -->|CHOREO| SRCH
%% Policy broadcasts
POL -->|CHOREO| PRJ
%% Query collaborations
QRY -->|CS| INT
QRY -->|CS| EXP
%% Admin publications
ADM -->|PL| POL
ADM -->|PL| GW
%% External integrations
EXT -->|ACL| GW
Notes
- When an edge is CONF, the consumer follows the supplier’s contract as-is; API shape lives with the supplier’s OHS and is versioned there.
- CHOREO edges imply replayability and idempotency requirements on consumers; see “Reliability Notes” for DLQ/back-pressure specifics.
- PL edges point to canonical schemas (tenant, edition, residency) owned by Audit.Admin; versioning is additive-first with deprecation windows.
Upstream/Downstream Matrix & Criticality
This matrix makes directionality and dependency explicit for each context and tags the operational criticality of every edge.
Legend:
- ⚡ LS: latency-sensitive (inline call on hot path)
- 📦 TS: throughput-sensitive (sustained high volume)
- ⏳ BA: batch/async (queued or long-running)
| Context | Upstream (depends on) | Downstream (depends on it) | Critical contracts (examples) |
|---|---|---|---|
| Audit.Gateway | Audit.Admin — PL ⏳ | Audit.Ingestion — CS 📦; Audit.Query — CONF ⚡; Audit.Export — CONF ⚡ | HTTP OHS: /audit/append, /audit/query, /export/jobs; async out: audit.append (intent); consumes tenant.updated (PL). |
| Audit.Ingestion | Audit.Gateway — CS 📦; Audit.Policy — CONF ⚡ | Audit.Integrity — CHOREO ⏳; Audit.Projection — CHOREO ⏳; Audit.Search — CHOREO ⏳ | Async in: audit.append; sync: CheckPolicy (gRPC/HTTP); async out: audit.appended. |
| Audit.Policy | Audit.Admin — PL ⏳ | Audit.Ingestion — CONF ⚡; Audit.Projection — CHOREO ⏳ | Sync: CheckPolicy, GetRetention, Classify; async out: policy.changed. |
| Audit.Integrity | Audit.Ingestion — CHOREO ⏳ | Audit.Query — CS ⚡ | Async in: audit.appended; HTTP/gRPC: VerifyEntry, VerifyRange, GetEvidence. |
| Audit.Projection | Audit.Ingestion — CHOREO ⏳; Audit.Policy — CHOREO ⏳ | Audit.Query — (reads projections) ⚡ | Async in: audit.appended, policy.changed; internal: Rebuild, Checkpoint; read models/tables (authoritative for reads). |
| Audit.Query | Audit.Projection — (read models) ⚡; Audit.Integrity — CS ⚡ | Audit.Gateway — OHS ⚡; Audit.Export — CS ⏳ | HTTP/gRPC OHS: /audit/search, /audit/{id}, /audit/verify; sync to Integrity on-read; async/sync to Export to start jobs. |
| Audit.Search | Audit.Ingestion — CHOREO ⏳ | Audit.Query — OHS ⚡ | Async in: audit.appended; HTTP internal: SearchIndex, Suggest; async out: search.indexed. |
| Audit.Export | Audit.Query — CS ⏳ | Audit.Gateway — OHS ⚡; Webhook recipients — Webhook ⏳ | HTTP OHS: /export/jobs, /export/jobs/{id}; webhook: export.completed (signed); artifacts in object storage. |
| Audit.Admin | — | Audit.Policy — PL ⏳; Audit.Gateway — PL ⏳ | HTTP OHS: /admin/tenants, /admin/contracts, /admin/policies; async out: tenant.updated, contract.updated. |
Operational guidance.
- Treat ⚡ LS edges as part of your p95/p99 SLO budgets (timeouts, retries, circuit breakers).
- For 📦 TS edges, prefer queue/bulk APIs, shard keys, and idempotency; measure drain rates.
- ⏳ BA edges must have DLQ/retry/replay documented (see “Reliability Notes: Hot Paths & Recovery”).
Contracts & Events: Where to Look
The source of truth for contracts lives under docs/domain/contracts/, and event semantics live in docs/domain/events-catalog.md. This section only pins touchpoints so you can jump to the exact files.
Audit.Gateway
- HTTP (OHS)
  - POST /audit/append → contracts/gateway/http/append.v1.md
  - POST /audit/query → contracts/gateway/http/query.v1.md
- Async (Intent topic)
  - audit.append → events-catalog.md#auditappend-intent
- Notes
  - Normalizes external payloads into AppendIntent (see ACL adapter stubs: contracts/gateway/acl/).
Audit.Ingestion
- gRPC/HTTP (internal)
  - AcceptAppend, Replay → contracts/ingestion/grpc/acceptappend.v1.proto
- Async (domain events)
  - audit.appended.v1 → events-catalog.md#auditappended
- Idempotency
  - Key = {tenantId, source, idempotencyKey} (see header/metadata spec: contracts/shared/idempotency.md).
Audit.Policy
- gRPC/HTTP (decisions/config)
  - CheckPolicy, GetRetention, Classify → contracts/policy/grpc/checkpolicy.v1.proto
- Async
  - policy.changed.v1 → events-catalog.md#policychanged
Audit.Integrity
- HTTP/gRPC (verification)
  - GET /integrity/entries/{id}/verify
  - POST /integrity/ranges/verify → contracts/integrity/http/verify.v1.md
- Async
  - Consumes audit.appended.v1 (build chains) → events-catalog.md#auditappended
Audit.Projection
- Async (builders)
  - Consumes audit.appended.v1, policy.changed.v1
- Internal ops
  - Rebuild, Checkpoint → contracts/projection/internal/rebuild.v1.md
Audit.Query
- HTTP/gRPC (OHS)
  - POST /audit/search
  - GET /audit/records/{id}
  - POST /audit/verify → contracts/query/http/search.v1.md
- Async (export trigger)
  - Emits export.requested.v1 (optional) → events-catalog.md#exportrequested
Audit.Search
- HTTP (internal)
  - POST /search/index
  - GET /search/suggest → contracts/search/http/index.v1.md
- Async
  - Consumes audit.appended.v1; emits search.indexed.v1 → events-catalog.md#searchindexed
Audit.Export
- HTTP (OHS)
  - POST /export/jobs
  - GET /export/jobs/{id} → contracts/export/http/jobs.v1.md
- Webhooks (signed)
  - export.completed.v1 (HMAC-SHA256 over payload; header X-Export-Signature) → contracts/export/webhooks/export.completed.v1.md
Audit.Admin
- HTTP (OHS)
  - /admin/tenants, /admin/contracts, /admin/policies → contracts/admin/http/
- Async (PL events)
  - tenant.updated.v1, contract.updated.v1 → events-catalog.md#tenantupdated
Versioning & Discovery (watermark)
- HTTP/gRPC: SemVer in Accept (e.g., application/vnd.connectsoft.audit.search+json;v=1) and/or path (/v1/...). Additive-first; breaking changes → new vN surface.
- Events/Topics: Subject suffix .vN (e.g., audit.appended.v1). Schema ID carried in metadata (schemaId, schemaHash).
- Deprecation windows: Minimum 180 days; both vN and vN+1 live in parallel; announce via contract.updated (Admin).
- Registry & “current” pointers:
  - Index: contracts/index.md lists all surfaces by context.
  - Machine-readable: contracts/registry.json exposes latest versions and schema IDs.
  - Event catalog: events-catalog.md is normative for names, required fields, and semantics.
Always link specs from code repos to these canonical files. If you must drift, open an ADR and reference it in Evolution & ADR Links.
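To make the two version carriers concrete, here is a minimal sketch of how a consumer might read the version from an Accept media type and from an event subject. The helper names `accept_version` and `split_subject` are hypothetical, and the parsing is deliberately simplified (a single media type, no quality values); the contract files remain the authority on the exact format.

```python
import re
from typing import Optional, Tuple


def accept_version(accept: str) -> Optional[int]:
    """Return the v= parameter of a single media type, or None when absent."""
    for param in accept.split(";")[1:]:
        name, _, value = param.strip().partition("=")
        if name == "v" and value.isdigit():
            return int(value)
    return None


# Subject names end in ".vN", for example "audit.appended.v1".
_SUBJECT = re.compile(r"^(?P<name>.+)\.v(?P<version>\d+)$")


def split_subject(subject: str) -> Tuple[str, int]:
    """Split an event subject into (base name, version)."""
    match = _SUBJECT.match(subject)
    if match is None:
        raise ValueError(f"subject has no .vN suffix: {subject!r}")
    return match.group("name"), int(match.group("version"))
```

A consumer that supports both vN and vN+1 during a deprecation window could dispatch on the integer returned here rather than on string comparison.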
Tenancy & Ownership
Each context declares what it owns (authoritative sources of truth) and what it derives (indices, projections, caches). Tenant isolation is enforced end-to-end via keys, partitioning, and RLS/filters.
| Context | Owns (authoritative) | Indices / Projections (derived) | Tenant keying (partitioning / RLS / filters) | Cross-tenant rules |
|---|---|---|---|---|
| Audit.Gateway | None (stateless for business data). May persist AccessLog/RateLimit counters (operational). | None | All inbound calls must carry TenantId; Gateway injects TenantId & CorrelationId downstream; rejects missing/mismatched tenant claims. | Disallowed. Gateway enforces per-request tenant scoping; no cross-tenant fan-out. |
| Audit.Ingestion | AuditEntry (append-only, immutable), AppendReceipt (ack metadata). | Staging queue only (transient). | Physical/logical partition by TenantId; RLS on event store; idempotency scope {TenantId, ProducerAppId, Source, IdempotencyKey}. | Disallowed. Replay & rebuild are tenant-scoped jobs. |
| Audit.Policy | PolicyDefinition, RetentionPolicy, ClassificationPolicy (incl. versions). | KV/cache of compiled policies per tenant/version. | Policies keyed {TenantId, Edition} with platform defaults and per-tenant overrides; reads require tenant match. | No cross-tenant decisions; multi-tenant policy reads restricted to platform admins (read-only). |
| Audit.Integrity | EvidenceLink, ChainSegment, MerkleRootAnchor. | Verification caches (proof memoization) per tenant. | Separate chain per TenantId (and optionally per region/namespace); proof queries require matching TenantId. | Disallowed. No proofs across tenants; anchors never mix tenants. |
| Audit.Projection | None (does not create new business truth). | AuditRecordView (per-tenant tables), PolicyView; Checkpoint state. | Per-tenant schemas or table partition by TenantId; RLS on all views; rebuild jobs are tenant-scoped. | Disallowed. Aggregates and joins are tenant-bounded. |
| Audit.Query | No business truth; may own QueryAuditLog (requests), AccessAudit. | Uses Projection as source of truth for reads. | All queries require tenant filters injected server-side; ABAC checks on roles/scopes. | Disallowed by default. Exception: platform auditors with explicit platform-admin scope and “multi-tenant read” feature flag (read-only). |
| Audit.Search | No business truth. | AuditSearchDoc (per-tenant index/partition), suggest dictionaries. | Dedicated index per tenant or partition key TenantId; query-time tenant filter enforced. | Disallowed. No cross-tenant search indices or queries. |
| Audit.Export | ExportJob, ExportArtifact (metadata). | Temporary assembly areas; signed URLs; delivery receipts. | Artifact storage path tenants/{TenantId}/exports/{JobId}; keys/secrets resolved per tenant; webhook signing keys per tenant. | Disallowed. Exports cannot include records from multiple tenants. |
| Audit.Admin | Tenant registry, Edition mapping, ClientApp credentials, ContractDescriptor. | Catalog caches for quick lookup. | TenantId is authoritative here; residency/region and edition enforced at provisioning time for downstream stores. | Limited to platform staff; writes are tenant-scoped; reads across tenants only for admin console. |
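The Audit.Query row says tenant filters are injected server-side. A minimal sketch of that idea follows, assuming the verified claims carry a `tenant` value and the query is a plain dictionary; both shapes, and the function name `scope_query`, are hypothetical and chosen only for illustration.

```python
def scope_query(claims: dict, query: dict) -> dict:
    """Return a copy of the query pinned to the caller's tenant.

    The tenant always comes from verified claims, never from the
    caller-supplied query body.
    """
    tenant = claims.get("tenant")
    if not tenant:
        raise PermissionError("missing tenant claim")
    requested = query.get("tenantId")
    if requested is not None and requested != tenant:
        # A mismatched tenant hint is rejected rather than silently rewritten.
        raise PermissionError("cross-tenant query rejected")
    scoped = dict(query)
    scoped["tenantId"] = tenant
    return scoped
```

The platform-auditor exception described in the table would need an explicit, separately authorized code path; it is intentionally not modeled here.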
Idempotency & Correlation (ingest/query requirements)
- Idempotency (Append)
  - Header/metadata: X-Idempotency-Key (opaque), plus producer metadata ProducerAppId, Source.
  - Scope: {TenantId, ProducerAppId, Source, IdempotencyKey}.
  - Retention: configurable; default ≥ 7 days to survive network retries and DLQ replays.
  - Behavior: On duplicate, Ingestion returns the original AppendReceipt (HTTP 200) without writing a new AuditEntry.
- Correlation & Tracing
  - Accept X-Correlation-Id (sticky across contexts) and W3C traceparent (OpenTelemetry).
  - Gateway creates values if missing, propagates via event metadata (correlationId, traceId) and request headers.
  - All domain events (audit.appended, policy.changed, etc.) must include {tenantId, correlationId, causationId?, traceId}.
- Tenant Header & Claims
  - X-Tenant-Id (or embedded in client credentials) is mandatory at Gateway; downstream services do not trust caller-supplied tenant values; Gateway signs/enriches them.
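The duplicate-handling behavior described above can be sketched as follows. This is an in-memory illustration under stated assumptions: `IngestionSketch` and its field names are hypothetical, receipts never expire (the real retention window is configurable), and a random UUID stands in for whatever correlation value Gateway would generate.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass(frozen=True)
class AppendReceipt:
    entry_id: str
    correlation_id: str


@dataclass
class IngestionSketch:
    entries: list = field(default_factory=list)   # stands in for the event store
    receipts: dict = field(default_factory=dict)  # idempotency scope -> receipt

    def accept_append(
        self,
        tenant_id: str,
        producer_app_id: str,
        source: str,
        idempotency_key: str,
        payload: dict,
        correlation_id: Optional[str] = None,
    ) -> Tuple[AppendReceipt, bool]:
        """Return (receipt, was_duplicate)."""
        scope = (tenant_id, producer_app_id, source, idempotency_key)
        existing = self.receipts.get(scope)
        if existing is not None:
            # Duplicate: hand back the original receipt, write nothing.
            return existing, True
        correlation_id = correlation_id or str(uuid.uuid4())
        receipt = AppendReceipt(entry_id=str(uuid.uuid4()), correlation_id=correlation_id)
        self.entries.append(
            {
                "tenantId": tenant_id,
                "entryId": receipt.entry_id,
                "correlationId": correlation_id,
                "payload": payload,
            }
        )
        self.receipts[scope] = receipt
        return receipt, False
```

Because the tenant is part of the scope tuple, the same key sent by two tenants produces two entries, which matches the tenant-isolation rule.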
Residency & Data Locality (enforced by Admin)
- Region binding (per tenant) drives storage/account selection for Ingestion store, Integrity ledger, Projection DB, Search index, and Export bucket/prefix.
- Cross-region moves require Admin workflow (deactivate → migrate → reactivate) and explicit ADR.
Rule of thumb: If a context “creates original business facts,” it owns them; everything else is derived and must be reproducible from owned facts + policies.
Security Overlays (Zero-Trust)
We treat every hop as hostile by default. Controls are enforced at the edge (Gateway), inside the mesh (service→service), and at data layers. Workloads authenticate with workload identity; traffic is mTLS; requests are authorized with ABAC/RBAC and tenant/edition guards; sensitive fields are classified/redacted at policy checkpoints.
Overlay diagram
flowchart LR
subgraph EDGE["Edge / Public Ingress"]
C[Clients/SDKs]
end
subgraph GZ["Gateway Zone (mTLS ingress)"]
GW[Audit.Gateway<br />AuthN: OAuth2/JWT<br />AuthZ: ABAC/RBAC<br />Rate limit<br />Schema & edition gates]
end
subgraph MESH["Service Mesh / mTLS"]
ING[Audit.Ingestion<br />Idempotency + tenancy inject]
POL[Audit.Policy<br />Classification & decisions]
INT[Audit.Integrity<br />Hash/Merkle proofs]
PRJ[Audit.Projection]
QRY[Audit.Query]
SRCH[Audit.Search]
EXP[Audit.Export<br />Signed webhooks]
ADM[Audit.Admin<br />Tenant/Edition registry]
end
C ---|TLS| GW
GW ---|mTLS + ABAC| ING
ING ---|mTLS - CONF| POL
ING -. events (signed, tenant-scoped) .-> INT
ING -. events (signed, tenant-scoped) .-> PRJ
ING -. events (signed, tenant-scoped) .-> SRCH
QRY ---|mTLS - CS| INT
QRY ---|mTLS - CS| EXP
ADM -. PL events .-> GW
ADM -. PL events .-> POL
subgraph DATA["KMS-protected Data Layers"]
ES[(Event Store / Ledger)]
RM[(Read Models)]
IDX[(Search Index)]
OBJ[(Object Storage — Exports)]
end
ING --- ES
PRJ --- RM
SRCH --- IDX
EXP --- OBJ
Boundary policies (who enforces what)
Client → Gateway (public edge)
- Transport: TLS 1.2+; HSTS at edge; strict ALPN/ciphers.
- AuthN: OAuth2/JWT (aud/iss/exp/nbf checked); optional mTLS for partner apps.
- AuthZ: ABAC (tenant, roles/scopes, edition features).
- Tenancy: Gateway injects TenantId, rejects cross-tenant hints; correlates X-Correlation-Id.
- Validation: JSON schema & contract version; edition gates; size limits; content scanning (optional).
- Rate limiting: Token-bucket per tenant + client with burst/steady; 429 + Retry-After.
- PII hooks: Request models pre-classified; drop disallowed fields before emit.
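The rate-limiting bullet above can be illustrated with a small token bucket keyed by tenant and client. This is a sketch, not the Gateway's implementation: the class names, the one-token-per-request cost, and the injected `now` clock (used so the behavior is deterministic) are all assumptions.

```python
import math
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class TokenBucket:
    rate: float        # steady refill rate, tokens per second
    burst: float       # bucket capacity
    tokens: float      # tokens currently available
    updated_at: float  # time of the last refill

    def try_acquire(self, now: float) -> Tuple[bool, int]:
        """Return (allowed, retry_after_seconds)."""
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.updated_at = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True, 0
        # Seconds until one full token is available, rounded up.
        return False, math.ceil((1.0 - self.tokens) / self.rate)


class TenantRateLimiter:
    def __init__(self, rate: float, burst: float) -> None:
        self.rate, self.burst = rate, burst
        self.buckets: Dict[Tuple[str, str], TokenBucket] = {}

    def check(self, tenant_id: str, client_id: str, now: float) -> Tuple[int, dict]:
        """Return (HTTP status, extra response headers)."""
        key = (tenant_id, client_id)
        bucket = self.buckets.get(key)
        if bucket is None:
            bucket = self.buckets[key] = TokenBucket(self.rate, self.burst, self.burst, now)
        allowed, retry_after = bucket.try_acquire(now)
        if allowed:
            return 200, {}
        return 429, {"Retry-After": str(retry_after)}
```

Keeping one bucket per (tenant, client) pair means one noisy client cannot exhaust another tenant's budget. Separate write and read buckets, as the controls matrix requires, would add the route class to the key.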
Gateway → Ingestion (hot path)
- mTLS + workload identity: service-to-service with SPIFFE/SPIRE or AAD Workload Identity.
- AuthZ: ABAC check (tenant/edition) survives hop via signed claims.
- Idempotency: Required X-Idempotency-Key; scope {TenantId, ProducerAppId, Source, Key}.
- Observability: W3C traceparent propagated; structured audit log.
Ingestion ⇢ {Integrity, Projection, Search} (event edges)
- Transport: Broker auth (SAS/AAD), topic-level ACLs; mTLS where supported.
- Envelope: Events signed (producer key id + hash) and include {tenantId, schemaId, correlationId}.
- PII: Fields already classified/redacted by Policy at accept time.
- Replay/DLQ: Tenant-scoped DLQ; poison-pill quarantine; ordered replays per partition key.
Ingestion → Policy (decision call)
- mTLS + workload identity; timeout budget small (latency-sensitive).
- Cache: Negative/positive decision caching with TTL; eTags/version.
Query → Integrity (on-read verify)
- mTLS; AuthZ requires same-tenant proof requests.
- Proofs: Range proofs returned with evidenceHash, chainId, window.
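For readers unfamiliar with hash-chain evidence, the sketch below shows the general technique behind range verification: each entry's hash covers the previous hash, so a window can be checked by re-deriving hashes from the link just before it. The canonical JSON encoding, the all-zero genesis value, and the function names are assumptions for illustration; Merkle roots and anchoring are not modeled, and the actual proof format is defined by the Integrity contracts.

```python
import hashlib
import json
from typing import List

GENESIS = "0" * 64  # assumed starting link for an empty chain


def entry_hash(prev_hash: str, entry: dict) -> str:
    """Hash an entry together with the previous link."""
    canonical = json.dumps(entry, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("ascii") + canonical).hexdigest()


def build_chain(entries: List[dict]) -> List[str]:
    """Return one hash per entry, each linked to its predecessor."""
    hashes, prev = [], GENESIS
    for entry in entries:
        prev = entry_hash(prev, entry)
        hashes.append(prev)
    return hashes


def verify_range(entries: List[dict], hashes: List[str], start: int, end: int) -> bool:
    """Check entries[start:end] against the recorded hashes."""
    prev = GENESIS if start == 0 else hashes[start - 1]
    for i in range(start, end):
        prev = entry_hash(prev, entries[i])
        if prev != hashes[i]:
            return False
    return True
```

A tampered entry fails verification for any window that includes it, while windows that start after it still verify against the recorded links.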
Query → Export; Export → Webhook recipients
- Export API: mTLS; ABAC on job scope; encryption conf set by tenant policy.
- Artifacts: KMS envelope encryption; per-tenant bucket/prefix.
- Webhooks: HMAC-SHA256 signature over canonical payload; headers X-Export-Signature, X-Export-Timestamp; 5-min skew window; retries with exponential backoff.
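A receiver-side check for the signed webhook might look like the sketch below. The header names and the 5-minute window come from the bullet above; everything else is an assumption made for illustration, in particular the `timestamp.body` message layout, the hex encoding of the signature, and the Unix-seconds timestamp. The webhook contract file is the authority on the canonical payload.

```python
import hashlib
import hmac

MAX_SKEW_SECONDS = 300  # the 5-minute skew window


def sign(secret: bytes, timestamp: int, body: bytes) -> str:
    """Assumed canonical form: '<timestamp>.<raw body>', hex-encoded HMAC-SHA256."""
    message = str(timestamp).encode("ascii") + b"." + body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify(secret: bytes, headers: dict, body: bytes, now: int) -> bool:
    """Return True only for a fresh, correctly signed delivery."""
    try:
        timestamp = int(headers["X-Export-Timestamp"])
        signature = headers["X-Export-Signature"]
    except (KeyError, ValueError):
        return False
    if abs(now - timestamp) > MAX_SKEW_SECONDS:
        return False  # too old or too far in the future
    expected = sign(secret, timestamp, body)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8"))
```

Binding the timestamp into the signed message is what makes the skew window meaningful: a replayed delivery cannot simply carry a fresh timestamp header.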
Admin → {Gateway, Policy} (PL events)
- Publisher: Admin signs tenant.updated, contract.updated; consumers verify signature + version.
- Controls: Edition/residency changes require multi-party approval (break-glass logged).
Mandatory controls matrix
| Area | Control | Enforced by |
|---|---|---|
| Transport | TLS at edge; mTLS in mesh | Ingress/Gateway, Mesh/Sidecars |
| Workload identity | SPIFFE/SPIRE or AAD Workload Identity; no static secrets in pods | Platform IAM |
| AuthN/AuthZ | OAuth2/JWT (clients); ABAC/RBAC (services) | Gateway, Services |
| Tenancy | Server-enforced tenant filter; signed tenant claims | Gateway, All services |
| PII/Classif. | Policy classifies; Ingestion applies redaction before persist | Policy, Ingestion |
| Rate limiting | Per-tenant/client; separate write vs read buckets | Gateway |
| Data-at-rest | KMS keys per store (event store, read models, search, exports) | Platform/KMS |
| Secrets | Central vault; short-lived tokens; no inline secrets | Platform |
| Logging | Structured, redacted logs; no PII beyond policy | All services |
| Integrity | Hash chains, Merkle roots, anchored periodically | Integrity |
Notes & defaults
- Default-deny on network and IAM. Only declared edges are allowed.
- Additive-first versioning; breaking changes require new vN and ADR.
- Residency and region enforced by Admin at provisioning; data paths are tenant-scoped.
- Back-pressure policies (429/deferral/DLQ) must not leak cross-tenant timing channels.
See also: Tenancy & Ownership for partitioning/RLS, and Reliability Notes for DLQ/replay guarantees.
Reliability Notes: Hot Paths & Recovery
This section names the golden paths, sets p95 targets, and documents back-pressure + replay entry points so SREs and service owners share the same operational contract.
SLIs & scope
- Availability (per OHS): successful responses / total, excluding client 4xx (except 429).
- Latency (p95) measured server-side within the same region; excludes client/network RTT.
- Durability of append: an append is “accepted” once persisted to the Ingestion event store; downstream consumers are eventually consistent.
Golden Path A: Append → Accept → (event) Project/Search/Integrity
User intent: client app appends an audit entry.
Targets
- Gateway POST /audit/append (sync mode): p95 ≤ 150 ms, p99 ≤ 300 ms.
- If Gateway sheds to async mode (intent enqueue): p95 ≤ 60 ms for 202 Ack; Ingestion accept within p95 ≤ 2 s end-to-end (queue to persist).
Back-pressure & controls
- Gateway: per-tenant/token buckets; on breach → 429 with Retry-After. Adaptive mode: switches from sync to async (enqueue audit.append intent) when local queue > threshold.
- Gateway → Ingestion (sync path): if Policy/Ingress budget exceeded → 503 with Retry-After; clients retry with same Idempotency-Key.
- Ingestion: write throttles to event store; if pressure → stage to durable queue; consumers slowed via prefetch & concurrency limits.
- Events (Ingestion → {Projection, Search, Integrity}): ASB DLQ after N attempts (default 10, exponential backoff); deferral for out-of-order ranges; tenant-partitioned subscriptions.
Replay / rebuild
- Re-ingest (safe): Ingestion.Replay(tenantId, fromOffset, toOffset?) is idempotent by {TenantId, ProducerAppId, Source, IdempotencyKey}.
- Projection: Projection.Rebuild(tenantId, checkpoint?) drains from event store; checkpoints per tenant.
- Search: Search.Reindex(tenantId, range?) is idempotent on documentId+version.
- Integrity: Integrity.Repair(tenantId, gapRange) recomputes chains and anchors; no cross-tenant proofs.
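A tenant-scoped, checkpointed rebuild can be sketched as below. The list-based event store, the `entryId` and `action` fields, and the offset-as-checkpoint convention are assumptions for illustration; the real Rebuild operation is specified in the Projection contract.

```python
from typing import Dict, List


def rebuild(event_store: List[dict], tenant_id: str, view: Dict[str, dict], checkpoint: int = 0) -> int:
    """Apply one tenant's events from `checkpoint` onward.

    Returns the new checkpoint (the next offset to read). Writes are
    upserts keyed by entryId, so replaying the same range twice leaves
    the view unchanged.
    """
    for offset in range(checkpoint, len(event_store)):
        event = event_store[offset]
        if event["tenantId"] != tenant_id:
            continue  # rebuilds never touch another tenant's events
        view[event["entryId"]] = {"action": event["action"], "offset": offset}
    return len(event_store)
```

Starting from checkpoint 0 gives a full rebuild; passing the last returned checkpoint gives an incremental catch-up that only reads new events.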
Observability
- Required dimensions: {tenantId, route, mode(sync|async), status, idempotencyHit(bool)}; export drain rate and oldest message age for audit.append.
Golden Path B: Query → (read models) → (optional) Verify
User intent: list/filter audit records, optionally verify integrity on read.
Targets
- POST /audit/search: p95 ≤ 250 ms for page sizes ≤ 100.
- GET /audit/records/{id} + POST /audit/verify: p95 ≤ 200 ms for typical proof windows (≤ 1 k entries).
Back-pressure & controls
- Gateway: per-tenant read buckets; expensive queries are shaped (server-side limits) or return 429 with guidance.
- Query: protects read models with max rows / timeouts; cache hot filters; coalesce repeated requests by tenantId+hash(query) window.
- Query → Integrity: if verify exceeds budget → return partial set + verificationPending=true (option), or 202 to async verify with callback.
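The coalescing control listed above (repeated requests keyed by tenantId+hash(query) within a window) can be sketched as follows. The canonical JSON form of the query, the class name `CoalescingCache`, and the injected `now` clock are assumptions for illustration. This version is single-threaded and only reuses completed results; a production version would also need to share in-flight requests.

```python
import hashlib
import json
from typing import Any, Callable, Dict, Tuple


def coalesce_key(tenant_id: str, query: dict) -> str:
    """Key two logically equal queries identically, regardless of field order."""
    canonical = json.dumps(query, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"{tenant_id}:{digest}"


class CoalescingCache:
    def __init__(self, window_seconds: float) -> None:
        self.window = window_seconds
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get_or_run(self, tenant_id: str, query: dict, run: Callable[[], Any], now: float) -> Any:
        key = coalesce_key(tenant_id, query)
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self.window:
            return hit[1]  # reuse the result computed within the window
        result = run()
        self._entries[key] = (now, result)
        return result
```

Including the tenant in the key keeps the cache from ever serving one tenant's result to another, which the tenancy rules require.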
Replay / rebuild
- Source of truth is Projection; if gaps detected → trigger Projection.Rebuild(tenantId).
- Integrity verification can be deferred: VerifyRangeAsync(ticket); result cached by (tenantId, rangeHash).
Observability
- Export SLIs: qps, hitRatio(cache), p95, timeouts, verifyRate, verifyLatency.
Golden Path C: Export (long-running)
User intent: export query results to CSV/JSON/Parquet with signed delivery.
Targets
- POST /export/jobs: p95 ≤ 120 ms to accept & enqueue.
- Job start SLA: p95 ≤ 60 s under steady load; completion depends on dataset size; progress exposed via /export/jobs/{id}.
Back-pressure & controls
- Bounded concurrency per tenant; queue length SLO and age alerts.
- If object storage throttles → exponential backoff; partial chunks checkpointed.
- Webhook retries (HMAC signed) with jittered backoff; DLQ on receiver 4xx/5xx after budget.
Replay / rebuild
- Export.Resume(jobId) — idempotent chunking; artifacts content-addressed.
- Regenerate artifact: Export.Rerun(jobId) stores new version, previous retained by retention policy.
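Content addressing is what makes Export.Resume idempotent: writing the same chunk twice yields the same key. The sketch below assumes SHA-256 and a `tenant/sha256/digest` key layout; both, along with the `ArtifactStore` class, are hypothetical.

```python
import hashlib


def artifact_key(tenant_id: str, chunk: bytes) -> str:
    """Derive the storage key from the chunk content, prefixed by tenant."""
    digest = hashlib.sha256(chunk).hexdigest()
    return f"{tenant_id}/sha256/{digest}"


class ArtifactStore:
    """Toy object store; a resumed job can re-put chunks without duplicating them."""

    def __init__(self):
        self.objects = {}

    def put(self, tenant_id: str, chunk: bytes) -> str:
        key = artifact_key(tenant_id, chunk)
        self.objects.setdefault(key, chunk)  # second write of same content is a no-op
        return key


store = ArtifactStore()
k1 = store.put("t1", b"id,action\n1,login\n")
k2 = store.put("t1", b"id,action\n1,login\n")  # resumed job rewrites the same chunk
k3 = store.put("t2", b"id,action\n1,login\n")  # same bytes, different tenant prefix
```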
Observability
- SLIs per tenant: queuedJobs, runningJobs, meanChunkTime, artifactSize, webhookFailures.
Golden Path D — Admin Updates (tenant/edition/policy)¶
User intent: update tenant/edition or policy; propagate safely.
Targets
- POST /admin/policies or /admin/tenants: p95 ≤ 200 ms to persist and emit *.updated.
- Downstream consumption SLO: p95 ≤ 30 s for Projection/Search to reflect policy changes.
Back-pressure & controls
- Admin changes are rate-limited; a circuit breaker prevents mass invalidation storms.
- Consumers apply changes with version gating; if behind → queue defers until consistent.
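Version gating on the consumer side might look like the sketch below. It assumes policy versions are contiguous integers per tenant and that a consumer buffers out-of-order updates in memory; `PolicyConsumer` and its return values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PolicyConsumer:
    """Applies *.updated events strictly in version order (hypothetical sketch)."""
    applied_version: int = 0
    current: dict = field(default_factory=dict)
    deferred: dict = field(default_factory=dict)

    def on_update(self, version: int, payload: dict) -> str:
        if version <= self.applied_version:
            return "duplicate"            # redelivery: already applied
        if version > self.applied_version + 1:
            self.deferred[version] = payload
            return "deferred"             # behind: wait for the missing version
        self._apply(version, payload)
        # Drain any buffered updates that are now contiguous.
        while self.applied_version + 1 in self.deferred:
            nxt = self.applied_version + 1
            self._apply(nxt, self.deferred.pop(nxt))
        return "applied"

    def _apply(self, version: int, payload: dict) -> None:
        self.applied_version = version
        self.current = payload


consumer = PolicyConsumer()
r1 = consumer.on_update(2, {"retentionDays": 90})   # arrives early
r2 = consumer.on_update(1, {"retentionDays": 30})   # fills the gap, then drains v2
r3 = consumer.on_update(1, {"retentionDays": 30})   # redelivered
```

The same `applied_version` counter is a natural source for the adoption metric described under Observability.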
Replay / rebuild
- Admin.Rebroadcast(tenantId, type, version) seeds re-delivery.
- Projection.Rebuild if classification/retention policy changed (ensures derived state consistency).
Observability
- Track policy version adoption per service; alert on lag (e.g., > 2 versions).
Back-pressure & DLQ Matrix (summary)¶
| Edge | Mechanism | Default policy |
|---|---|---|
| Client → Gateway | 429 + Retry-After; adaptive degrade to async appends | Per-tenant token bucket; separate write/read buckets |
| Gateway ↔ Policy (sync) | 503 + retry with jitter; small timeouts; circuit breaker | Budget 50 ms median; fallback = enqueue intent |
| Gateway → Ingestion (sync) | 503 + retry w/ idempotency; or enqueue | Idempotent keys; drop duplicates by scope |
| Ingestion → Event Bus | Outbox + publish retry; handoff DLQ | Max attempts 10; DLQ w/ diagnostics and sample payload |
| Bus → Consumers | Consumer retry/backoff; DLQ/deferral | Tenant partition key; replay safe |
| Query → Integrity | Timeout budget; partial results or async verify | SLA ties to page size/proof window |
| Export → Webhook | Signed retries; DLQ after budget | 6 attempts, exponential backoff, 5-minute max skew |
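The Client → Gateway row (per-tenant token bucket, separate write/read buckets, 429 + Retry-After) can be sketched as below. Capacities and refill rates are placeholder numbers, and `TokenBucket` and `TenantLimiter` are hypothetical names; the caller is assumed to pass in the current time in seconds.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    capacity: float
    refill_per_s: float
    tokens: float
    updated_at: float

    def try_acquire(self, now: float, cost: float = 1.0):
        """Return (allowed, retry_after_seconds)."""
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_s)
        self.updated_at = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True, 0.0
        # Time until enough tokens have refilled; maps to the Retry-After header.
        return False, (cost - self.tokens) / self.refill_per_s


class TenantLimiter:
    """One bucket per (tenant, kind) so reads cannot starve writes or vice versa."""

    def __init__(self, write=(2.0, 1.0), read=(5.0, 5.0)):
        self.limits = {"write": write, "read": read}  # (capacity, refill/s), placeholders
        self.buckets = {}

    def check(self, tenant_id: str, kind: str, now: float):
        key = (tenant_id, kind)
        if key not in self.buckets:
            capacity, rate = self.limits[kind]
            self.buckets[key] = TokenBucket(capacity, rate, capacity, now)
        return self.buckets[key].try_acquire(now)


limiter = TenantLimiter()
a = limiter.check("t1", "write", 0.0)
b = limiter.check("t1", "write", 0.0)
c = limiter.check("t1", "write", 0.0)   # bucket empty: caller would answer 429
d = limiter.check("t1", "read", 0.0)    # read bucket is independent
```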
Operator playbook pointers¶
- Rebuild: Start with Projection, then Search, then re-issue VerifyRange if needed.
- Drain: Pause producers for a tenant using Gateway admission control, then Replay from last good checkpoint.
- Hot shard: Enable producer-side backoff; increase consumer concurrency for that partition only; avoid global scale-out first.
All paths rely on idempotency + tenant partitioning to make replay safe. If any component cannot guarantee this, file an ADR and link it in Evolution & ADR Links.
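As a sketch of that guarantee, the store below deduplicates appends on the Ingestion idempotency scope {TenantId, Source, Key} listed in the bounded-contexts table. The `IdempotencyStore` class and its in-memory dictionary are hypothetical; a real store would be durable and enforce the same uniqueness constraint.

```python
from typing import Tuple


class IdempotencyStore:
    """Remembers the entry id first accepted for each (tenant, source, key) scope."""

    def __init__(self):
        self._seen = {}

    def accept(self, tenant_id: str, source: str, key: str, entry_id: str) -> Tuple[str, bool]:
        """Return (entry_id_to_use, was_duplicate)."""
        scope = (tenant_id, source, key)
        existing = self._seen.get(scope)
        if existing is not None:
            return existing, True   # replayed append: return the original entry
        self._seen[scope] = entry_id
        return entry_id, False


store = IdempotencyStore()
first = store.accept("t1", "billing", "k-42", "entry-1")
replayed = store.accept("t1", "billing", "k-42", "entry-2")   # same scope, replay
other_tenant = store.accept("t2", "billing", "k-42", "entry-3")
```

Because the tenant is the leading element of the scope, a replay for one tenant can never collide with another tenant's keys.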
Evolution & ADR Links¶
We evolve edges additively-first and treat the context map as a governed contract. Any deviation or breaking move must carry an ADR.
Versioning & change rules¶
- APIs (HTTP/gRPC)
  - Allowed (non-breaking): add optional fields; add new endpoints; widen enum with default; increase limits with server-side caps.
  - Breaking → new major vN: remove/rename fields; change semantics; tighten validation; change status codes.
  - Surface major in path (/v2/...) or Accept header (v=2); run vN and vN+1 in parallel for ≥ 180 days.
- Events/Topics
  - Name as domain.event.vN; additive changes stay within vN; breaking changes → new subject .vN+1.
  - Carry schemaId, schemaHash, tenantId, correlationId.
  - Dual-publish during migrations; consumers opt-in per version.
- Contracts registry
  - Update contracts/index.md and contracts/registry.json with each change; bump SemVer.
  - Emit contract.updated.v1 from Admin announcing availability/deprecation window.
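The event rules above lend themselves to a small lint. The sketch below checks the domain.event.vN naming, the four required envelope fields, and derives the subject pair for dual-publishing. The regular expression (lowercase alphanumeric segments) and all function names are assumptions, not the registry's actual validation.

```python
import re

# Assumed shape: lowercase segments, e.g. "audit.appended.v2".
SUBJECT_RE = re.compile(
    r"^(?P<domain>[a-z][a-z0-9]*)\.(?P<event>[a-z][a-z0-9]*)\.v(?P<major>[1-9][0-9]*)$"
)
REQUIRED_FIELDS = ("schemaId", "schemaHash", "tenantId", "correlationId")


def parse_subject(subject: str):
    """Split 'domain.event.vN' into (domain, event, major) or raise ValueError."""
    match = SUBJECT_RE.match(subject)
    if match is None:
        raise ValueError(f"subject does not follow domain.event.vN: {subject!r}")
    return match["domain"], match["event"], int(match["major"])


def missing_envelope_fields(envelope: dict) -> list:
    """List required envelope fields that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not envelope.get(name)]


def dual_publish_subjects(subject: str) -> list:
    """During a migration, publish to both vN and vN+1."""
    domain, event, major = parse_subject(subject)
    return [subject, f"{domain}.{event}.v{major + 1}"]


parsed = parse_subject("audit.appended.v2")
missing = missing_envelope_fields({"schemaId": "s-1", "tenantId": "t1"})
subjects = dual_publish_subjects("audit.appended.v1")
```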
Edge evolution policies (what’s allowed)¶
- Add a new edge: allowed if tenancy, security, and SLO budget are documented (see checklist below).
- Change edge style (e.g., CONF→ACL): requires an ADR with rationale (coupling, translation needs, upstream churn).
- Remove an edge: only after the deprecation window closes and all consumers are verified migrated (evidence via dashboards).
- Raise criticality (e.g., BA→LS): needs capacity/SLO analysis and load test artifacts.
- Residency/region impact: requires migration plan + data path verification; Admin gated.
When to introduce an ACL (decision cues)¶
Introduce an Anti-Corruption Layer if any of the following hold:
- Upstream “Published Language” conflicts with our domain terms or policy model.
- Upstream schema churns frequently or has weak versioning guarantees.
- Security/classification requirements differ (need pre-ingest redaction/validation).
- We need canary or translation for rollout without exposing internals.
If none apply and latency is critical, prefer Conformist.
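The decision cues can be encoded as a small helper for review tooling. This is only a sketch: the `EdgeSignals` fields paraphrase the four cues, and the "UNDECIDED" result for the case this page does not cover (no cue applies, latency not critical) is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EdgeSignals:
    language_conflict: bool = False        # Published Language clashes with our terms
    schema_churn: bool = False             # frequent churn or weak versioning upstream
    security_mismatch: bool = False        # differing security/classification needs
    needs_translation_or_canary: bool = False
    latency_critical: bool = False


def recommend_style(signals: EdgeSignals) -> str:
    """Return 'ACL', 'CONF', or 'UNDECIDED' following the cues above."""
    if any((signals.language_conflict, signals.schema_churn,
            signals.security_mismatch, signals.needs_translation_or_canary)):
        return "ACL"
    if signals.latency_critical:
        return "CONF"
    return "UNDECIDED"  # not specified on this page; left to the ADR discussion


external_publisher = recommend_style(EdgeSignals(schema_churn=True, latency_critical=True))
gateway_to_query = recommend_style(EdgeSignals(latency_critical=True))
```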
“Propose a new edge” — PR checklist (copy into PR description)¶
- Schema & Contracts
  - Contract file(s) under docs/domain/contracts/... with examples and validation (JSON Schema/Proto).
  - Event subject named x.y.vN; schema registered in contracts/registry.json.
  - Compatibility statement (additive vs breaking) and deprecation plan (if replacing an existing edge).
- Tenancy & Data
  - tenantId propagation from Gateway; server-enforced filters/RLS noted.
  - Idempotency scope defined (if write/append): {TenantId, ProducerAppId, Source, Key}.
  - Residency/region path (storage/index/bucket prefix).
- Security
  - AuthN (OAuth2/JWT or workload identity) and AuthZ posture (ABAC/RBAC).
  - PII classification hooks; redaction policy at accept time.
  - KMS usage for any data-at-rest or artifact produced; secret source (Vault).
- Observability
  - W3C trace propagation (traceparent); correlation fields in events.
  - Metrics: QPS, p95, error rate, DLQ age / drain rate; logs redaction noted.
  - Dashboards/alerts updated (golden signals).
- SLO & Reliability
  - Proposed p95/p99; capacity estimate; back-pressure behavior (429/503, deferral).
  - Retry policy, DLQ policy, replay entry points; idempotency verified.
- Docs & Governance
  - context-map.md edges/labels updated with style & criticality.
  - events-catalog.md entry added/updated.
  - Runbook link (rebuild/replay).
  - ADR linked (see below).
ADRs for deviations (link examples)¶
- docs/adr/2025-10-22-gw-query-conformist.md: Adopt Conformist on Gateway→Query to minimize latency.
- docs/adr/2025-10-22-introduce-acl-external-publishers.md: Add ACL adapters at Gateway for external systems.
- docs/adr/2025-10-22-event-versioning-v2-for-audit-appended.md: Promote audit.appended to v2 with new required field evidenceHint.
Tip: Use the ADR template (docs/adr/_template.md) and tag with context-map, contracts, security, tenancy.
Governance hooks (automation)¶
- CI checks:
  - Contract lint (schema validity), breaking-change detector (forbidden field deletes), link checker for docs.
  - “Contracts registry up-to-date” gate; CODEOWNERS require sign-off from Domain, SRE, Security.
- Release:
  - Dual-run vN & vN+1; publish adoption dashboards; emit contract.updated.
  - Set deprecation tombstones with removal date; auto-open tracking issue.
If a PR passes this section’s checklist and CI gates, reviewers can approve without additional architecture meetings.
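For reference, a breaking-change detector of the kind named in the CI checks could start from the sketch below. It assumes contracts are JSON-Schema-like dictionaries with `properties` and `required`, and it flags only three cases (field removal, type change, newly required field); the function name and the example schemas are hypothetical.

```python
def breaking_changes(old: dict, new: dict) -> list:
    """Compare two schema versions and list changes that require a new major."""
    problems = []
    old_props = old.get("properties", {})
    new_props = new.get("properties", {})
    for name in sorted(old_props):
        if name not in new_props:
            problems.append(f"removed field: {name}")
        elif old_props[name].get("type") != new_props[name].get("type"):
            problems.append(f"changed type: {name}")
    # A field that becomes required tightens validation for existing producers.
    newly_required = set(new.get("required", [])) - set(old.get("required", []))
    for name in sorted(newly_required):
        problems.append(f"new required field: {name}")
    return problems


# Example modeled on the audit.appended v2 ADR: evidenceHint becomes required.
v1 = {
    "properties": {"tenantId": {"type": "string"}, "action": {"type": "string"}},
    "required": ["tenantId"],
}
v2 = {
    "properties": {
        "tenantId": {"type": "string"},
        "action": {"type": "string"},
        "evidenceHint": {"type": "string"},
    },
    "required": ["tenantId", "evidenceHint"],
}
report = breaking_changes(v1, v2)
```

A non-empty report would fail the gate unless the change ships under a new major (vN+1) with a linked ADR.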