Data Model - Audit Trail Platform (ATP)¶
This document defines the canonical data model for the Audit Trail Platform (ATP). It is the contract for how audit facts are written, stored, projected, searched, exported, and governed across tenants.
Purpose¶
- Provide a single source of truth for entities, value objects, and wire contracts used by ATP (write-path facts, read models, policies, proofs, exports).
- Enable safe evolution (versioning and backward compatibility) across microservices, storage engines, and client SDKs.
- Align teams on ubiquitous language (UL) used in ATP and referenced by the HLD and Context Map.
Scope¶
This document covers:
- Canonical write model (AuditRecord) and its component value objects (Actor, ResourceRef, Correlation, Decision).
- Policy and governance models (Classification, Redaction, Retention, Legal Hold).
- Integrity structures (proof references, segments, Merkle roots) and export manifests.
- Tenancy, partitioning, projections, and optional search index mappings.
- Event contracts, validation and limits, schema evolution strategy.
- Golden fixtures and conformance assets for CI.
Non-Goals¶
- Runtime behavior of services (covered in HLD and component docs).
- API endpoint routing, authz middleware, or deployment topologies.
- Vendor-specific storage tuning beyond documented size budgets.
- Business analytics models unrelated to audit semantics.
Modeling Principles & Conventions¶
This section defines canonical rules for names, time, identifiers, types, enums, nullability, casing, and cross-representation alignment (JSON, C# gRPC code-first, storage).
Conventions (Cheat-Sheet)¶
| Topic | Rule (MUST unless noted) | Rationale | Example |
|---|---|---|---|
| Character encoding | UTF-8 everywhere | Interop, signatures | — |
| Time | RFC3339/ISO-8601 in UTC with Z suffix; accept offsets at ingress but normalize to UTC | Consistency, ordering | 2025-10-22T14:05:13.481Z |
| Clock skew | Accept up to ±5 min skew; record both producer and gateway times when relevant | Resilience | createdAt vs observedAt |
| IDs (general) | Opaque strings; ASCII [A-Za-z0-9._-]; length ≤ 128 | Safe in logs/URLs | exp_01HFJ0..., user.42 |
| Record IDs | ULID (26 chars, Crockford base32) for append-ordered facts | K-sortable by time; unique | 01JD0R3R9E7Z4XTKP5B7XH3FXF |
| Correlation | traceId (W3C, 16 bytes as 32 lowercase hex chars), spanId (8 bytes as 16 hex chars), requestId (string), causationId (ULID) | Debuggability | traceId="4fd5…" |
| Idempotency | idempotencyKey (≤ 128 chars) on the write path | Safe retries | POST /records with key |
| Booleans | Never tri-state in JSON; nullable only if “unknown” is a distinct state | Avoid ambiguity | isRedacted: true / false |
| Numbers | 64-bit integers; decimals as strings with a regex if precision matters | Avoid JS rounding | "0.0001", "123.45" |
| Maps | String keys; typed values; ≤ 100 entries unless stated | Bound payloads | { "tags": { "env":"prod" } } |
| Arrays | Deterministic order if order matters; empty arrays allowed; omit when unknown | Predictability | events: [] |
| Nullability | Prefer omit over null; if null is used, define its semantics | Diff friendliness | (see Nullability & Field Presence) |
| Casing (JSON) | lowerCamelCase field names; kebab-case schema filenames | Ecosystem norms | auditRecordId, audit-record.v1.json |
| Casing (C# gRPC code-first & Protobuf) | PascalCase for classes, properties, methods, enum types & values, and protobuf field names (when emitted). | .NET idioms; code-first parity | AuditRecord.CreatedAt, AppendAsync(...) |
| Database schema | PascalCase for table names (plural) and column names | Readability; tooling parity | Table AuditRecords, column CreatedAt |
| Resource types | PascalCase singular nouns; namespaced where needed | Clarity | Patient, Vetspire.Appointment |
| Enum safety | Include Unknown = 0; never reuse numbers; reserve removed values | Wire compat | (see Enums & Versioning) |
| Problem+JSON | RFC 9457 application/problem+json for API errors; include traceId | Operability | (see Problem+JSON) |
| Extensibility | Additive evolution; unknown fields ignored; never break required invariants | Compatibility | C19 rules apply |
Concrete size and count limits are finalized in C20 (Validation, Limits & Canonicalization).
Time & Clocks¶
- Ingress: Accept RFC3339 timestamps with offsets and normalize to UTC on write.
- Canonical fields:
  - createdAt: producer-asserted time (UTC)
  - observedAt: gateway receipt time (UTC)
  - effectiveAt: policy/application effective time (UTC)
- Skew: Allow ±5 minutes; larger deltas are annotated in validation.warnings or routed to a suspect queue.
Example (JSON)
{
"auditRecordId": "01JD0R3R9E7Z4XTKP5B7XH3FXF",
"createdAt": "2025-10-22T14:05:13.481Z",
"observedAt": "2025-10-22T14:05:14.022Z"
}
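The ingress rules above can be sketched in C# as follows, assuming the gateway parses producer timestamps with DateTimeOffset and applies the ±5 minute window from the conventions table. TimestampNormalizer and SkewStatus are hypothetical names, not part of the ATP contract; annotating validation.warnings or routing a Suspect result to a queue is left to the caller.

```csharp
using System;
using System.Globalization;

// Hypothetical result of the skew check (illustrative names only).
public enum SkewStatus { WithinTolerance, Suspect }

public static class TimestampNormalizer
{
    // Tolerance from the conventions table: ±5 minutes.
    public static readonly TimeSpan MaxSkew = TimeSpan.FromMinutes(5);

    // Parses an RFC 3339 timestamp (offset allowed) and normalizes it to UTC.
    // AssumeUniversal only matters if the input carries no offset at all.
    public static DateTimeOffset ParseToUtc(string rfc3339) =>
        DateTimeOffset.Parse(rfc3339, CultureInfo.InvariantCulture, DateTimeStyles.AssumeUniversal)
                      .ToUniversalTime();

    // Canonical wire form: UTC, millisecond precision, "Z" suffix.
    public static string Format(DateTimeOffset value) =>
        value.ToUniversalTime().ToString("yyyy-MM-dd'T'HH:mm:ss.fff'Z'", CultureInfo.InvariantCulture);

    // Compares producer createdAt with gateway observedAt in either direction.
    public static SkewStatus CheckSkew(DateTimeOffset createdAt, DateTimeOffset observedAt) =>
        (observedAt - createdAt).Duration() <= MaxSkew
            ? SkewStatus.WithinTolerance
            : SkewStatus.Suspect;
}
```

For example, "2025-10-22T16:05:13.481+02:00" normalizes to "2025-10-22T14:05:13.481Z", and the createdAt/observedAt pair in the JSON example above is well within tolerance.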
Identifiers¶
- ULID for append facts: auditRecordId MUST be a ULID to preserve time-ordered writes.
- Opaque everywhere else: IDs are never parsed for meaning; semantics live in adjacent fields.
- Idempotency: Producers SHOULD send idempotencyKey; gateways reject duplicates within a configured window (C20).
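To make the ULID layout concrete, here is a minimal sketch of an encoder: a 48-bit millisecond Unix timestamp in the first 10 characters and 80 random bits in the last 16, both in Crockford base32. UlidSketch is a hypothetical helper (assumes .NET 6 or later). It omits the ULID spec's monotonic mode, so two IDs minted in the same millisecond are not guaranteed to sort in creation order.

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical helper: minimal ULID encoder (48-bit ms timestamp + 80 random bits,
// Crockford base32). Not monotonic within a single millisecond.
public static class UlidSketch
{
    private const string Alphabet = "0123456789ABCDEFGHJKMNPQRSTVWXYZ";

    public static string NewUlid() => NewUlid(DateTimeOffset.UtcNow);

    public static string NewUlid(DateTimeOffset timestamp)
    {
        long ms = timestamp.ToUnixTimeMilliseconds();
        if (ms < 0 || ms > 0xFFFF_FFFF_FFFFL)
            throw new ArgumentOutOfRangeException(nameof(timestamp), "ULID time must fit in 48 bits.");

        var chars = new char[26];

        // First 10 chars: the timestamp, most significant 5-bit group first,
        // so lexical order follows time order.
        for (int i = 9; i >= 0; i--)
        {
            chars[i] = Alphabet[(int)(ms & 0x1F)];
            ms >>= 5;
        }

        // Last 16 chars: 80 bits (10 bytes) of cryptographic randomness, 5 bits per char.
        var random = new byte[10];
        RandomNumberGenerator.Fill(random);
        for (int i = 0; i < 16; i++)
        {
            int bitIndex = i * 5;
            int byteIndex = bitIndex / 8;
            int bitOffset = bitIndex % 8;
            // Read a 16-bit big-endian window so a group spanning two bytes is covered.
            int window = (random[byteIndex] << 8)
                       | (byteIndex + 1 < random.Length ? random[byteIndex + 1] : 0);
            chars[10 + i] = Alphabet[(window >> (11 - bitOffset)) & 0x1F];
        }

        return new string(chars);
    }
}
```

Because the timestamp occupies the leading characters and the alphabet is in ASCII order, an ordinal comparison of two ULIDs from different milliseconds follows their creation time, which is what makes them usable as append-ordered keys.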
Validation
- Regex (IDs): ^[A-Za-z0-9._-]{1,128}$
- Regex (ULID): ^[0-9A-HJKMNP-TV-Z]{26}$
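A hedged sketch of the two regexes as reusable .NET validators; IdRules is a hypothetical name. One .NET-specific detail: $ in a .NET regex also matches just before a trailing newline, so the sketch anchors with \z to reject a value such as "user.42" followed by a newline.

```csharp
using System.Text.RegularExpressions;

// Hypothetical validators mirroring the ID and ULID regexes.
public static class IdRules
{
    // Opaque IDs: ASCII letters, digits, '.', '_', '-'; 1..128 chars. \z = absolute end.
    private static readonly Regex OpaqueId =
        new(@"^[A-Za-z0-9._-]{1,128}\z", RegexOptions.CultureInvariant);

    // ULID: 26 Crockford base32 chars (no I, L, O, U).
    // A stricter parser would also require the first char to be 0-7 (128-bit range).
    private static readonly Regex Ulid =
        new(@"^[0-9A-HJKMNP-TV-Z]{26}\z", RegexOptions.CultureInvariant);

    public static bool IsValidOpaqueId(string? value) => value is not null && OpaqueId.IsMatch(value);

    public static bool IsValidUlid(string? value) => value is not null && Ulid.IsMatch(value);
}
```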
JSON ↔ C# gRPC Code-First ↔ Protobuf¶
| Concept | JSON | C# (code-first) | Protobuf (if emitted) | Notes |
|---|---|---|---|---|
| Timestamp | RFC3339 string (UTC) | DateTimeOffset / Timestamp wrapper | google.protobuf.Timestamp | REST renders RFC3339; gRPC binary uses Timestamp |
| 64-bit int | JSON number | long / ulong | int64 / uint64 | JS numbers are exact only up to 2^53; the proto3 JSON mapping emits int64/uint64 as strings |
| Decimal (precise) | string with pattern ^-?[0-9]+(\.[0-9]+)?$ | string | string | Avoid precision loss |
| Map | JSON object | Dictionary<string,T> | map<string, T> | String keys only |
| Optional | Omit field | Nullable types (string? / T?) | optional / field presence | Prefer omit in JSON |
| Oneof | Presence discriminator | OneOf pattern / union type | oneof | Mutually exclusive |
| Enum | JSON string literal (REST) | enum DecisionOutcome { Unknown = 0, ... } | enum (0 = Unknown) | REST SHOULD expose strings |
| Naming | lowerCamelCase | PascalCase | PascalCase (fields too) + json_name in lowerCamelCase | Keep REST/JSON consistent |
C# code-first example (message + service)
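using System;
using System.Runtime.Serialization; // [DataContract], [DataMember]
using System.ServiceModel; // [ServiceContract]
using System.Threading.Tasks;
using ProtoBuf.Grpc; // CallContext (protobuf-net.Grpc)
[DataContract]
public sealed class AuditRecord
{
[DataMember(Order = 1)] public string AuditRecordId { get; init; } = default!;
[DataMember(Order = 2)] public string TenantId { get; init; } = default!;
[DataMember(Order = 3)] public DateTimeOffset CreatedAt { get; init; }
}
[ServiceContract]
public interface IAuditIngestionService
{
// AppendResult: ingestion response contract, not shown in this excerpt.
Task<AppendResult> AppendAsync(AuditRecord record, CallContext context = default);
}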
If emitting .proto from code-first: keep PascalCase names, but set json_name to lowerCamelCase to align REST:
syntax = "proto3";
import "google/protobuf/timestamp.proto";
message AuditRecord {
string AuditRecordId = 1 [json_name = "auditRecordId"];
string TenantId = 2 [json_name = "tenantId"];
google.protobuf.Timestamp CreatedAt = 3 [json_name = "createdAt"];
}
JSON naming policy: REST services MUST apply a lowerCamelCase policy (e.g., System.Text.Json JsonNamingPolicy.CamelCase) so C# CreatedAt → JSON createdAt.
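One way to wire this up with System.Text.Json, assuming .NET 6 or later. AtpJson, UtcZDateTimeOffsetConverter, and RecordHeader are hypothetical names. Besides the camelCase policy, the options omit null members (prefer omit over null) and serialize enums as strings. The custom converter is there because System.Text.Json writes a zero-offset DateTimeOffset as +00:00 rather than with the Z suffix the conventions require.

```csharp
using System;
using System.Globalization;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical converter: always writes DateTimeOffset as UTC with a "Z" suffix.
public sealed class UtcZDateTimeOffsetConverter : JsonConverter<DateTimeOffset>
{
    public override DateTimeOffset Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options) =>
        reader.GetDateTimeOffset().ToUniversalTime();

    public override void Write(Utf8JsonWriter writer, DateTimeOffset value, JsonSerializerOptions options) =>
        writer.WriteStringValue(
            value.ToUniversalTime().ToString("yyyy-MM-dd'T'HH:mm:ss.fff'Z'", CultureInfo.InvariantCulture));
}

// Hypothetical shared options for ATP REST endpoints.
public static class AtpJson
{
    public static readonly JsonSerializerOptions Options = new()
    {
        PropertyNamingPolicy = JsonNamingPolicy.CamelCase,            // CreatedAt -> createdAt
        DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull, // prefer omit over null
        Converters = { new UtcZDateTimeOffsetConverter(), new JsonStringEnumConverter() }
    };
}

// Illustrative subset of AuditRecord, not the full contract.
public sealed class RecordHeader
{
    public string AuditRecordId { get; init; } = default!;
    public DateTimeOffset CreatedAt { get; init; }
    public DateTimeOffset? ObservedAt { get; init; }
}
```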
Nullability & Field Presence¶
- Prefer omitting the field in JSON when a value is unknown or not applicable.
- Use null only when it is meaningful (e.g., redacted); the schema MUST define its semantics.
- Booleans: avoid tri-state; use an enum (Yes | No | Unknown) when needed.
- C#: enable nullable reference types; mark optional members as string? or T?.
- Protobuf: use optional for presence; do not overload zero/empty values.
Example
Enums & Versioning¶
- Shape:
  - Types and values in PascalCase (DecisionOutcome.Allow).
  - Value 0 is Unknown and MUST exist.
- Stability: Never reuse numeric values; reserve removed ones.
- JSON representation: REST SHOULD emit/accept string names.
- Evolution: Additive; consumers must treat unknown values as Unknown or a safe default.
C#
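A minimal C# sketch of these rules. Allow appears in this document; Deny is an assumed second value for illustration only. DecisionOutcomeReader is a hypothetical tolerant reader that maps unrecognized input, including names added by newer producers, to Unknown instead of failing.

```csharp
using System;

// Hypothetical enum following the rules above. "Allow" appears in this document;
// "Deny" is an assumed value for illustration.
public enum DecisionOutcome
{
    Unknown = 0, // MUST exist; also the fallback for unrecognized input
    Allow = 1,
    Deny = 2,
    // If a value is ever removed, keep its number reserved; never reuse it.
}

public static class DecisionOutcomeReader
{
    // Tolerant reader: exact, case-sensitive names map to their value; names not
    // defined here, numbers outside the defined set, or combined names -> Unknown.
    public static DecisionOutcome Parse(string? name) =>
        Enum.TryParse(name, ignoreCase: false, out DecisionOutcome value)
        && Enum.IsDefined(typeof(DecisionOutcome), value)
            ? value
            : DecisionOutcome.Unknown;
}
```

A fallback like this matters on the consumer side because JsonStringEnumConverter throws on string values it does not recognize, which would turn an additive producer change into a breaking one.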
Naming & Casing Details¶
- JSON fields: lowerCamelCase (auditRecordId, tenantId, createdAt).
- C# & Protobuf (code-first): PascalCase for classes, properties, methods, and protobuf field names when generated.
- Database schema:
  - Tables: PascalCase plural (AuditRecords, ExportJobs)
  - Columns: PascalCase (AuditRecordId, TenantId, CreatedAt)
  - Indexes: IX_<Table>_<Col1>_<Col2> (IX_AuditRecords_TenantId_CreatedAt)
  - Foreign keys: FK_<FromTable>_<ToTable>_<Column>
DDL sketch
CREATE TABLE dbo.AuditRecords (
AuditRecordId CHAR(26) NOT NULL, -- ULID
TenantId NVARCHAR(128) NOT NULL, -- tenantId allows up to 128 chars
CreatedAt DATETIME2(3) NOT NULL,
CONSTRAINT PK_AuditRecords PRIMARY KEY (AuditRecordId)
);
CREATE INDEX IX_AuditRecords_TenantId_CreatedAt
ON dbo.AuditRecords (TenantId, CreatedAt);
Problem+JSON (API Error Hints)¶
Use RFC 9457 application/problem+json with:
- type, title, status, detail, instance, traceId
- errors map for field-level validation messages
Example
{
"type": "urn:connectsoft:errors:validation",
"title": "Invalid audit record",
"status": 400,
"detail": "createdAt must be a valid RFC3339 timestamp in UTC (Z).",
"instance": "/records/ingest",
"traceId": "4fd5d2ac7a0b8f1f4fd5d2ac7a0b8f1f",
"errors": { "createdAt": ["Expected UTC (Z) suffix"] }
}
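For services that do not use ASP.NET Core's built-in ProblemDetails type, the wire shape above can be produced with a plain DTO, as in this sketch (assumes .NET 6 or later; AtpProblem and AtpProblems are hypothetical names). The traceId and errors extension members sit next to the RFC 9457 members, and PropertyNamingPolicy leaves dictionary keys such as createdAt unchanged.

```csharp
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical DTO: RFC 9457 members plus the ATP extensions traceId and errors.
public sealed class AtpProblem
{
    public string Type { get; init; } = "about:blank"; // RFC 9457 default when no type is given
    public string Title { get; init; } = "";
    public int Status { get; init; }
    public string? Detail { get; init; }
    public string? Instance { get; init; }
    public string? TraceId { get; init; }
    public IDictionary<string, string[]>? Errors { get; init; }
}

public static class AtpProblems
{
    public const string MediaType = "application/problem+json";

    private static readonly JsonSerializerOptions Json = new()
    {
        PropertyNamingPolicy = JsonNamingPolicy.CamelCase,            // Title -> title; dictionary keys untouched
        DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull
    };

    // Builds a 400 validation problem with the same shape as the example above.
    public static string Validation(
        string detail, IDictionary<string, string[]> errors, string instance, string traceId) =>
        JsonSerializer.Serialize(new AtpProblem
        {
            Type = "urn:connectsoft:errors:validation",
            Title = "Invalid audit record",
            Status = 400,
            Detail = detail,
            Instance = instance,
            TraceId = traceId,
            Errors = errors
        }, Json);
}
```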
Canonical AuditRecord (Entity)¶
Defines the canonical write model appended by producers to represent a single, immutable audit fact within a tenant.
Overview¶
An AuditRecord captures who (actor) did what (action) to which resource (resource), when (createdAt), and in what context (correlation, optional decision).
Records are append-only and immutable once accepted; deduplication relies on auditRecordId (ULID) and/or an idempotencyKey.
Key invariants:
- Immutability: No in-place updates after acceptance; corrections are new records linked via correlation.causationId.
- Tenant isolation: tenantId is mandatory and non-transitive across joins.
- Time: All canonical timestamps are UTC (see Principles).
- Size bounds: Concrete limits are finalized in the Validation & Limits section; design for bounded maps/arrays.
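The immutability and deduplication invariants can be illustrated with an in-memory sketch. AppendOnlyLog, AppendRequest, and AppendStatus are hypothetical; the sketch treats a repeated auditRecordId or a repeated (tenantId, idempotencyKey) pair as a replay, exposes no update or delete operation, and omits the configurable dedup window (C20). A durable store would enforce the same rules with the primary key and filtered unique index in the storage mapping.

```csharp
using System;
using System.Collections.Generic;

public enum AppendStatus { Appended, Duplicate }

// Hypothetical, trimmed-down record shape: only the fields the invariants need.
public sealed record AppendRequest(string AuditRecordId, string TenantId, string? IdempotencyKey);

public sealed class AppendOnlyLog
{
    private readonly Dictionary<string, AppendRequest> _byId = new(StringComparer.Ordinal);
    private readonly HashSet<(string TenantId, string Key)> _keys = new();
    private readonly object _gate = new();

    public AppendStatus Append(AppendRequest record)
    {
        if (string.IsNullOrEmpty(record.TenantId))
            throw new ArgumentException("tenantId is mandatory.", nameof(record));

        lock (_gate)
        {
            // Same ULID, or same idempotency key within the tenant => replay of an accepted fact.
            if (_byId.ContainsKey(record.AuditRecordId)) return AppendStatus.Duplicate;
            if (record.IdempotencyKey is { } key && _keys.Contains((record.TenantId, key)))
                return AppendStatus.Duplicate;

            _byId.Add(record.AuditRecordId, record);
            if (record.IdempotencyKey is { } k) _keys.Add((record.TenantId, k));
            return AppendStatus.Appended; // no update/delete API: accepted records are immutable
        }
    }

    public int Count { get { lock (_gate) return _byId.Count; } }
}
```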
Fields¶
| JSON field (lowerCamel) | C# property (PascalCase) | Type | Req. | Description & rules |
|---|---|---|---|---|
| auditRecordId | AuditRecordId | string (ULID) | ✓ | Primary identifier for the record; 26-char Crockford ULID. |
| tenantId | TenantId | string | ✓ | Logical tenant key; opaque, ASCII [A-Za-z0-9._-]{1,128}. |
| createdAt | CreatedAt | timestamp | ✓ | Producer-asserted time of the audited action (UTC). |
| observedAt | ObservedAt | timestamp | | Gateway receipt time (UTC). |
| actor | Actor | object | ✓ | Actor that initiated the action; see Actor Model. |
| resource | Resource | object | ✓ | Target of the action; see ResourceRef Model. |
| action | Action | string | ✓ | Verb, or noun.verb taxonomy (e.g., create, appointment.update). Lowercase; max 64 chars. |
| decision | Decision | object | | Authorization outcome for access events; see Decision & Access Outcome. |
| correlation | Correlation | object | | Trace/request correlation; includes traceId, requestId, causationId. |
| idempotencyKey | IdempotencyKey | string | | Producer key for safe retries; ≤ 128 chars, ASCII [A-Za-z0-9._-]. |
| attributes | Attributes | map<string,string> | | Flat tags for quick filtering (e.g., env=prod, region=us-central); values ≤ 256 chars. |
| delta | Delta | object | | Field-level changes; see Deltas (before/after). |
| request | Request | object | | Optional ingress context (e.g., ip, userAgent); subject to classification/redaction. |
| schemaVersion | SchemaVersion | string | | Schema identifier and version for the record shape (e.g., audit-record.v1). |
Note: actor, resource, decision, correlation, and delta are defined in their respective sections and referenced here to keep the write model canonical and stable.
JSON Schema (v1)¶
{
"$id": "urn:connectsoft:schemas:audit-record.v1.json",
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "AuditRecord",
"type": "object",
"additionalProperties": false,
"properties": {
"auditRecordId": {
"type": "string",
"description": "ULID primary identifier",
"pattern": "^[0-9A-HJKMNP-TV-Z]{26}$"
},
"tenantId": {
"type": "string",
"minLength": 1,
"maxLength": 128,
"pattern": "^[A-Za-z0-9._-]+$"
},
"createdAt": { "type": "string", "format": "date-time" },
"observedAt": { "type": "string", "format": "date-time" },
"actor": { "$ref": "urn:connectsoft:schemas:partials/actor.v1.json" },
"resource": { "$ref": "urn:connectsoft:schemas:partials/resource-ref.v1.json" },
"action": {
"type": "string",
"minLength": 1,
"maxLength": 64,
"pattern": "^[a-z]+([.][a-z][a-z0-9_-]+)?$"
},
"decision": { "$ref": "urn:connectsoft:schemas:partials/decision.v1.json" },
"correlation": { "$ref": "urn:connectsoft:schemas:partials/correlation.v1.json" },
"idempotencyKey": {
"type": "string",
"maxLength": 128,
"pattern": "^[A-Za-z0-9._-]+$"
},
"attributes": {
"type": "object",
"maxProperties": 100,
"additionalProperties": { "type": "string", "maxLength": 256 }
},
"delta": { "$ref": "urn:connectsoft:schemas:partials/delta.v1.json" },
"request": {
"type": "object",
"additionalProperties": false,
"properties": {
"ip": { "type": "string", "format": "ipv4" },
"userAgent": { "type": "string", "maxLength": 512 }
}
},
"schemaVersion": { "type": "string", "pattern": "^audit-record\\.v[0-9]+$" }
},
"required": [
"auditRecordId",
"tenantId",
"createdAt",
"actor",
"resource",
"action"
]
}
C# (gRPC code-first) shape¶
[DataContract]
public sealed class AuditRecord
{
[DataMember(Order = 1)] public string AuditRecordId { get; init; } = default!;
[DataMember(Order = 2)] public string TenantId { get; init; } = default!;
[DataMember(Order = 3)] public DateTimeOffset CreatedAt { get; init; }
[DataMember(Order = 4)] public DateTimeOffset? ObservedAt { get; init; }
[DataMember(Order = 5)] public Actor Actor { get; init; } = default!;
[DataMember(Order = 6)] public ResourceRef Resource { get; init; } = default!;
[DataMember(Order = 7)] public string Action { get; init; } = default!;
[DataMember(Order = 8)] public Decision? Decision { get; init; }
[DataMember(Order = 9)] public Correlation? Correlation { get; init; }
[DataMember(Order = 10)] public string? IdempotencyKey { get; init; }
[DataMember(Order = 11)] public IReadOnlyDictionary<string, string>? Attributes { get; init; }
[DataMember(Order = 12)] public Delta? Delta { get; init; }
[DataMember(Order = 13)] public RequestContext? Request { get; init; }
[DataMember(Order = 14)] public string? SchemaVersion { get; init; } = "audit-record.v1";
}
[DataContract]
public sealed class RequestContext
{
[DataMember(Order = 1)] public string? Ip { get; init; }
[DataMember(Order = 2)] public string? UserAgent { get; init; }
}
JSON serialization MUST apply a camelCase naming policy so AuditRecordId → auditRecordId, etc. The database schema uses PascalCase table/column names (see below).
Protobuf (optional emission)¶
If you emit .proto from the code-first model, keep PascalCase for message/field names and set json_name to lowerCamelCase for REST:
syntax = "proto3";
package connectsoft.audit.v1;
import "google/protobuf/timestamp.proto";
message AuditRecord {
string AuditRecordId = 1 [json_name = "auditRecordId"];
string TenantId = 2 [json_name = "tenantId"];
google.protobuf.Timestamp CreatedAt = 3 [json_name = "createdAt"];
google.protobuf.Timestamp ObservedAt = 4 [json_name = "observedAt"];
Actor Actor = 5 [json_name = "actor"];
ResourceRef Resource = 6 [json_name = "resource"];
string Action = 7 [json_name = "action"];
Decision Decision = 8 [json_name = "decision"];
Correlation Correlation = 9 [json_name = "correlation"];
string IdempotencyKey = 10 [json_name = "idempotencyKey"];
map<string,string> Attributes = 11 [json_name = "attributes"];
Delta Delta = 12 [json_name = "delta"];
RequestContext Request = 13 [json_name = "request"];
string SchemaVersion = 14 [json_name = "schemaVersion"];
}
Examples¶
Minimal (required only)
{
"auditRecordId": "01JE1X7F3Q5X1X3ZQ1TF9Q4Q7J",
"tenantId": "splootvets",
"createdAt": "2025-10-22T14:05:13.481Z",
"actor": { "id": "user_123", "type": "User", "display": "Alex" },
"resource": { "type": "Vetspire.Appointment", "id": "A-9981" },
"action": "create"
}
Rich (with decision, correlation, delta)
{
"auditRecordId": "01JE1X8MFT7Z7P8K9V7E7Q0Q2B",
"tenantId": "splootvets",
"createdAt": "2025-10-22T14:15:01.210Z",
"observedAt": "2025-10-22T14:15:01.742Z",
"actor": { "id": "svc-gw", "type": "Service", "display": "Gateway" },
"resource": { "type": "Vetspire.Appointment", "id": "A-9981", "path": "/status" },
"action": "appointment.update",
"decision": { "outcome": "Allow", "reason": "Policy.Grant" },
"correlation": {
"traceId": "a3f2ccbd7a1f4c0f8b1e9f4d1b7f2a66",
"requestId": "REQ-7f3b4a",
"causationId": "01JE1X8M9QNG3B2W6J5SP2K5H5"
},
"idempotencyKey": "A-9981.status.20251022T141501Z",
"attributes": { "env": "prod", "region": "us-central" },
"delta": {
"fields": {
"status": { "before": "Scheduled", "after": "Booked" }
}
},
"request": { "ip": "203.0.113.27", "userAgent": "Mozilla/5.0" },
"schemaVersion": "audit-record.v1"
}
Storage mapping (authoritative store)¶
Storage names use PascalCase (tables/columns). Immutability is enforced via append-only semantics and constraints.
CREATE TABLE dbo.AuditRecords (
AuditRecordId CHAR(26) NOT NULL, -- ULID
TenantId NVARCHAR(128) NOT NULL,
CreatedAt DATETIME2(3) NOT NULL,
ObservedAt DATETIME2(3) NULL,
Actor NVARCHAR(MAX) NOT NULL, -- JSON (Actor)
Resource NVARCHAR(1024) NOT NULL, -- JSON or structured FK; start JSON
Action NVARCHAR(64) NOT NULL,
Decision NVARCHAR(MAX) NULL, -- JSON (Decision)
Correlation NVARCHAR(1024) NULL, -- JSON (ids + producer) or separate table if needed
IdempotencyKey NVARCHAR(128) NULL,
Attributes NVARCHAR(MAX) NULL, -- JSON (tags)
Delta NVARCHAR(MAX) NULL, -- JSON (field changes)
Request NVARCHAR(1024) NULL, -- JSON (IP/UA) — subject to redaction
SchemaVersion NVARCHAR(32) NULL,
CONSTRAINT PK_AuditRecords PRIMARY KEY (AuditRecordId)
);
CREATE INDEX IX_AuditRecords_TenantId_CreatedAt
ON dbo.AuditRecords (TenantId, CreatedAt);
CREATE UNIQUE INDEX UX_AuditRecords_Tenant_Idempotency
ON dbo.AuditRecords (TenantId, IdempotencyKey)
WHERE IdempotencyKey IS NOT NULL;
Notes: You may start with JSON columns for Actor, Decision, Correlation, Attributes, Delta, and later project them into read models (see Projections).
Actor Model¶
Defines the subject that initiated the action: a human user, a backend service, or a scheduled/background job. The Actor travels inside AuditRecord to capture identity, roles, and provenance (identity provider details, client information, and optional on-behalf-of linkage).
Overview¶
An Actor is a compact, PII-aware identity envelope designed for long-term immutable storage:
- Types: User, Service, Job (plus Unknown for safety).
- Identity stability: Prefer stable, opaque IDs (never parse IDs for meaning).
- Provenance: Map OIDC/identity token claims to normalized fields (Issuer, Subject, ClientId, etc.).
- Impersonation / On-Behalf-Of: When a service acts for a user, capture OnBehalfOf with a minimal ref.
- PII minimization: Email is optional and may be redacted; EmailHash (SHA-256 hex) supports joins without storing the plaintext address. An unsalted hash of a guessable address can be reversed by dictionary lookup, so it is pseudonymous, not anonymous.
Fields¶
| JSON (lowerCamel) | C# (PascalCase) | Type | Req. | Description & rules |
|---|---|---|---|---|
| id | Id | string | ✓ | Stable opaque identifier of the actor (ASCII [A-Za-z0-9._-]{1,128}). |
| type | Type | enum | ✓ | One of Unknown, User, Service, Job. |
| display | Display | string | | Friendly label (max 128). May be redacted on read. |
| email | Email | string (email) | | Optional; PII. If present, apply classification/redaction policy. |
| emailHash | EmailHash | string (hex64) | | SHA-256 of the lowercased, trimmed email; recommended when email is omitted or redacted. |
| roles | Roles | array<string> | | Role names (≤ 32 items; each ≤ 64 chars, ASCII [A-Za-z0-9._:-]). |
| provenance | Provenance | object | | Identity provider details (issuer, subject, clientId, authType, sessionId, tokenId). |
| onBehalfOf | OnBehalfOf | object | | Minimal actor ref when this actor acted for another principal (e.g., a service acting for a user). |
PII fields (display, email) are subject to classification & redaction policies defined elsewhere.
JSON Schema (partial, v1)¶
{
"$id": "urn:connectsoft:schemas:partials/actor.v1.json",
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "Actor",
"type": "object",
"additionalProperties": false,
"properties": {
"id": {
"type": "string",
"minLength": 1,
"maxLength": 128,
"pattern": "^[A-Za-z0-9._-]+$"
},
"type": {
"type": "string",
"enum": ["Unknown", "User", "Service", "Job"]
},
"display": { "type": "string", "maxLength": 128 },
"email": { "type": "string", "format": "email", "maxLength": 254 },
"emailHash": {
"type": "string",
"pattern": "^[a-f0-9]{64}$"
},
"roles": {
"type": "array",
"maxItems": 32,
"items": {
"type": "string",
"maxLength": 64,
"pattern": "^[A-Za-z0-9._:-]+$"
}
},
"provenance": {
"type": "object",
"additionalProperties": false,
"properties": {
"issuer": { "type": "string", "maxLength": 256 },
"subject": { "type": "string", "maxLength": 128 },
"clientId": { "type": "string", "maxLength": 128 },
"authType": { "type": "string", "maxLength": 64 },
"sessionId": { "type": "string", "maxLength": 128 },
"tokenId": { "type": "string", "maxLength": 128 }
}
},
"onBehalfOf": {
"type": "object",
"additionalProperties": false,
"properties": {
"id": {
"type": "string",
"minLength": 1,
"maxLength": 128,
"pattern": "^[A-Za-z0-9._-]+$"
},
"type": { "type": "string", "enum": ["Unknown", "User", "Service", "Job"] },
"display": { "type": "string", "maxLength": 128 }
},
"required": ["id", "type"]
}
},
"required": ["id", "type"]
}
C# (gRPC code-first)¶
[DataContract]
public sealed class Actor
{
[DataMember(Order = 1)] public string Id { get; init; } = default!;
[DataMember(Order = 2)] public ActorType Type { get; init; } = ActorType.Unknown;
[DataMember(Order = 3)] public string? Display { get; init; }
[DataMember(Order = 4)] public string? Email { get; init; } // PII; may be redacted on read
[DataMember(Order = 5)] public string? EmailHash { get; init; } // SHA-256 hex of normalized email
[DataMember(Order = 6)] public IReadOnlyList<string>? Roles { get; init; }
[DataMember(Order = 7)] public ActorProvenance? Provenance { get; init; }
[DataMember(Order = 8)] public ActorRef? OnBehalfOf { get; init; }
}
[DataContract]
public enum ActorType
{
Unknown = 0,
User = 1,
Service = 2,
Job = 3
}
[DataContract]
public sealed class ActorProvenance
{
[DataMember(Order = 1)] public string? Issuer { get; init; } // e.g., https://login.microsoftonline.com/<tenant>/v2.0
[DataMember(Order = 2)] public string? Subject { get; init; } // OIDC 'sub'
[DataMember(Order = 3)] public string? ClientId { get; init; } // OIDC 'azp' or 'client_id'
[DataMember(Order = 4)] public string? AuthType { get; init; } // e.g., "OIDC", "PAT", "mTLS", "APIKey"
[DataMember(Order = 5)] public string? SessionId { get; init; } // OIDC 'sid'
[DataMember(Order = 6)] public string? TokenId { get; init; } // OIDC 'jti'
}
[DataContract]
public sealed class ActorRef
{
[DataMember(Order = 1)] public string Id { get; init; } = default!;
[DataMember(Order = 2)] public ActorType Type { get; init; } = ActorType.Unknown;
[DataMember(Order = 3)] public string? Display { get; init; }
}
JSON serialization MUST use a camelCase policy so EmailHash → emailHash. Database columns follow PascalCase if stored separately (ActorEmailHash, etc.), but typically Actor is embedded as JSON within AuditRecords.
Protobuf (optional emission)¶
syntax = "proto3";
package connectsoft.audit.v1;
message Actor {
string Id = 1 [json_name = "id"];
ActorType Type = 2 [json_name = "type"];
string Display = 3 [json_name = "display"];
string Email = 4 [json_name = "email"];
string EmailHash = 5 [json_name = "emailHash"];
repeated string Roles = 6 [json_name = "roles"];
ActorProvenance Provenance = 7 [json_name = "provenance"];
ActorRef OnBehalfOf = 8 [json_name = "onBehalfOf"];
}
enum ActorType {
ActorType_Unknown = 0;
ActorType_User = 1;
ActorType_Service = 2;
ActorType_Job = 3;
}
message ActorProvenance {
string Issuer = 1 [json_name = "issuer"];
string Subject = 2 [json_name = "subject"];
string ClientId = 3 [json_name = "clientId"];
string AuthType = 4 [json_name = "authType"];
string SessionId = 5 [json_name = "sessionId"];
string TokenId = 6 [json_name = "tokenId"];
}
message ActorRef {
string Id = 1 [json_name = "id"];
ActorType Type = 2 [json_name = "type"];
string Display = 3 [json_name = "display"];
}
Examples¶
User actor with redacted email (hash only)
{
"id": "user_123",
"type": "User",
"display": "Alex",
"emailHash": "7c4a8d09ca3762af61e59520943dc26494f8941b000000000000000000000000",
"roles": ["Member", "Veterinarian"],
"provenance": {
"issuer": "https://login.microsoftonline.com/contoso/v2.0",
"subject": "a1b2c3d4-...-z9",
"authType": "OIDC",
"sessionId": "S-abc123",
"tokenId": "JTI-789"
}
}
Service acting on behalf of a user
{
"id": "svc-gateway",
"type": "Service",
"display": "Gateway",
"roles": ["System"],
"provenance": {
"clientId": "gw-client",
"authType": "mTLS"
},
"onBehalfOf": {
"id": "user_123",
"type": "User",
"display": "Alex"
}
}
Scheduled job
Validation rules (summary)¶
- id matches ^[A-Za-z0-9._-]{1,128}$.
- roles: ≤ 32 items; each at most 64 chars, ASCII [A-Za-z0-9._:-].
- If email is present, compute emailHash = SHA256(lowercase(trim(email))) as lowercase hex, and prefer masking email on read according to policy.
- onBehalfOf requires both id and type.
Use these rules when validating AuditRecord.Actor. Redaction/PII handling is governed by the platform policies referenced elsewhere in the documentation.
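The emailHash rule can be sketched as below, assuming .NET 6 or later and UTF-8 encoding of the normalized address (consistent with the UTF-8 convention). EmailHashing is a hypothetical helper; ToLowerInvariant keeps the hash independent of server culture.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper: emailHash = SHA256(lowercase(trim(email))) as lowercase hex.
public static class EmailHashing
{
    public static string Compute(string email)
    {
        ArgumentNullException.ThrowIfNull(email);
        // Normalize: trim surrounding whitespace, then lowercase with the invariant culture
        // so the result does not depend on the server's locale.
        string normalized = email.Trim().ToLowerInvariant();
        byte[] digest = SHA256.HashData(Encoding.UTF8.GetBytes(normalized));
        // Convert.ToHexString returns uppercase; the schema expects ^[a-f0-9]{64}$.
        return Convert.ToHexString(digest).ToLowerInvariant();
    }
}
```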
ResourceRef Model¶
Standardizes how a record points to the target resource affected by an action: its type, identifier, and an optional path to a sub-element. Includes an optional tenantScopedId for efficient partition/index keys.
Overview¶
A ResourceRef is a compact, immutable pointer:
- Type describes the domain kind (e.g., Appointment, ExportJob) and may be namespaced (e.g., Vetspire.Appointment).
- Id is an opaque identifier meaningful to the producing system; it is never parsed for semantics.
- Path optionally narrows the reference to a field or sub-resource using a JSON Pointer–style path.
- Tenant scope is explicit via the parent AuditRecord.tenantId; tenantScopedId is an optional, derived convenience key.
Fields¶
| JSON (lowerCamel) | C# (PascalCase) | Type | Req. | Description & rules |
|---|---|---|---|---|
| type | Type | string | ✓ | PascalCase singular; may be a dotted namespace (^[A-Z][A-Za-z0-9]*(\.[A-Z][A-Za-z0-9]*)*$). Examples: Appointment, Vetspire.Appointment. |
| id | Id | string | ✓ | Opaque ASCII identifier [A-Za-z0-9._:-]{1,128}; must not contain /. |
| path | Path | string | | Optional JSON Pointer–style path to a sub-element. Use /-separated tokens; escape ~ as ~0 and / as ~1. Examples: /status, /lines/0/price. |
| tenantScopedId | TenantScopedId | string | | Optional derived key for indexing/partitioning. Canonical form: <tenantId>:<type>:<id> (ASCII, max 160). Not authoritative. |
PII note: Some id values can be PII or otherwise sensitive (for example, identifiers derived from personal data). Classification/redaction policies apply on read paths.
JSON Schema (partial, v1)¶
{
"$id": "urn:connectsoft:schemas:partials/resource-ref.v1.json",
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "ResourceRef",
"type": "object",
"additionalProperties": false,
"properties": {
"type": {
"type": "string",
"pattern": "^[A-Z][A-Za-z0-9]*(\\.[A-Z][A-Za-z0-9]*)*$",
"minLength": 1,
"maxLength": 128
},
"id": {
"type": "string",
"pattern": "^[A-Za-z0-9._:-]{1,128}$"
},
"path": {
"type": "string",
"pattern": "^(/([^/~]|~[01])*)*$",
"maxLength": 256
},
"tenantScopedId": {
"type": "string",
"pattern": "^[A-Za-z0-9._:-]{1,160}$"
}
},
"required": ["type", "id"]
}
C# (gRPC code-first)¶
[DataContract]
public sealed class ResourceRef
{
[DataMember(Order = 1)] public string Type { get; init; } = default!; // PascalCase, optional dotted namespace
[DataMember(Order = 2)] public string Id { get; init; } = default!; // Opaque; no slashes
[DataMember(Order = 3)] public string? Path { get; init; } // JSON Pointer–style (e.g., "/status", "/lines/0/price")
[DataMember(Order = 4)] public string? TenantScopedId { get; init; } // "<tenantId>:<type>:<id>" (optional, derived)
}
public static class ResourceRefRules
{
public const string TypePattern = "^[A-Z][A-Za-z0-9]*(\\.[A-Z][A-Za-z0-9]*)*$";
public const string IdPattern = "^[A-Za-z0-9._:-]{1,128}$";
public const string PathPattern = "^(/([^/~]|~[01])*)*$";
public static string MakeTenantScopedId(string tenantId, string type, string id)
=> $"{tenantId}:{type}:{id}";
}
JSON serialization MUST use a camelCase policy so TenantScopedId → tenantScopedId. If the fields are stored separately, database columns follow PascalCase (e.g., ResourceType, ResourceId, ResourcePath, ResourceTenantScopedId); alternatively, embed the whole Resource object as JSON in AuditRecords.
Protobuf (optional emission)¶
syntax = "proto3";
package connectsoft.audit.v1;
message ResourceRef {
string Type = 1 [json_name = "type"]; // PascalCase, may be namespaced
string Id = 2 [json_name = "id"]; // Opaque; no '/'
string Path = 3 [json_name = "path"]; // JSON Pointer–style
string TenantScopedId = 4 [json_name = "tenantScopedId"]; // Derived convenience key
}
Examples¶
Simple resource
Namespaced external resource
Sub-resource path (JSON Pointer semantics)
With tenant-scoped key
{
"type": "Vetspire.Appointment",
"id": "A-9981",
"tenantScopedId": "splootvets:Vetspire.Appointment:A-9981"
}
Array element path
Validation rules (summary)¶
- type must match PascalCase (with optional dotted namespaces).
- id must match ^[A-Za-z0-9._:-]{1,128}$ and must not include /.
- path uses a JSON Pointer–style subset (e.g., /segment/0/name); escape / as ~1 and ~ as ~0.
- tenantScopedId is optional and derived, and must not be the sole source of tenant enforcement (tenant isolation is governed by AuditRecord.tenantId and storage-level policies).
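A minimal sketch of the path rules following RFC 6901 escaping: encode ~ before /, and decode in the reverse order, so tokens round-trip. ResourcePath is a hypothetical helper; it does not validate type or id, which the ResourceRefRules patterns above already cover.

```csharp
using System.Linq;

// Hypothetical helpers for ResourceRef.path using RFC 6901 token escaping.
public static class ResourcePath
{
    // Encode '~' first, then '/', so "a/b~c" becomes "a~1b~0c".
    public static string EscapeToken(string token) =>
        token.Replace("~", "~0").Replace("/", "~1");

    // Decode in the reverse order: "~1" -> "/", then "~0" -> "~".
    public static string UnescapeToken(string token) =>
        token.Replace("~1", "/").Replace("~0", "~");

    // Joins raw tokens into a pointer, e.g. ("lines", "0", "price") -> "/lines/0/price".
    // No tokens yields "", which RFC 6901 treats as the whole document.
    public static string Build(params string[] tokens) =>
        string.Concat(tokens.Select(t => "/" + EscapeToken(t)));

    // Splits a pointer back into raw tokens (assumes it already matches PathPattern).
    public static string[] Split(string path) =>
        path.Length == 0
            ? new string[0]
            : path.Substring(1).Split('/').Select(UnescapeToken).ToArray();
}
```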
Correlation & Provenance¶
Defines how an AuditRecord links to distributed traces, requests, and causal chains, and how the producer (service or component) is identified. Standardized correlation enables stitching records across gateways, services, jobs, and exports.
Overview¶
Correlation captures where the record came from and how it relates to other work:
- Trace: W3C Trace Context (traceId, optional spanId) to join with OpenTelemetry.
- Request: A stable requestId for the inbound operation (HTTP/gRPC/message).
- Causation: A causationId (ULID) referencing the prior record/command that triggered this fact.
- Producer: The service/runtime that produced the record (name, version, instance, env, region).
Fields¶
| JSON (lowerCamel) | C# (PascalCase) | Type | Req. | Description & rules |
|---|---|---|---|---|
| traceId | TraceId | string (hex32) | ✓ | W3C 16-byte trace ID rendered as 32 lowercase hex chars: ^[0-9a-f]{32}$. |
| spanId | SpanId | string (hex16) | | W3C 8-byte span ID: ^[0-9a-f]{16}$. Optional on the write path. |
| requestId | RequestId | string | | Stable per inbound request. ASCII [A-Za-z0-9._-]{1,128}. |
| causationId | CausationId | string (ULID) | | Links to the cause (prior record, command, or message). |
| producer | Producer | object | | Producer identity: service name, version, instance, environment, region, zone. |
traceId is required in the stored record (see the schema below). Producers SHOULD send it; if it is missing at ingress, the gateway MUST start a new trace and set traceId.
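A gateway-side sketch of that rule, assuming only version-00 traceparent headers are accepted and .NET 6 or later. TraceContextSketch is hypothetical. It returns the caller's parent span ID from the header; whether ATP stores that value or the gateway's own span in spanId is left open here. Malformed or all-zero IDs are treated as missing, since W3C Trace Context requires receivers to ignore such headers.

```csharp
using System;
using System.Security.Cryptography;
using System.Text.RegularExpressions;

// Hypothetical helper: take traceId from a W3C traceparent header (version 00),
// or start a new trace when the header is missing or invalid.
public static class TraceContextSketch
{
    // version-traceid-parentid-flags, lowercase hex only.
    private static readonly Regex TraceParent =
        new(@"^00-([0-9a-f]{32})-([0-9a-f]{16})-[0-9a-f]{2}\z", RegexOptions.CultureInvariant);

    public static (string TraceId, string? ParentSpanId) Resolve(string? traceparent)
    {
        if (traceparent is not null)
        {
            var m = TraceParent.Match(traceparent.Trim());
            string traceId = m.Groups[1].Value, spanId = m.Groups[2].Value;
            // All-zero trace or parent IDs are invalid per W3C Trace Context.
            if (m.Success && traceId != new string('0', 32) && spanId != new string('0', 16))
                return (traceId, spanId);
        }
        // Missing or invalid: start a new trace with no parent span.
        return (NewHex(16), null);
    }

    private static string NewHex(int bytes)
    {
        string hex;
        do
        {
            hex = Convert.ToHexString(RandomNumberGenerator.GetBytes(bytes)).ToLowerInvariant();
        } while (hex.Trim('0').Length == 0); // reject the (astronomically unlikely) all-zero ID
        return hex;
    }
}
```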
JSON Schema (partial, v1)¶
{
"$id": "urn:connectsoft:schemas:partials/correlation.v1.json",
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "Correlation",
"type": "object",
"additionalProperties": false,
"properties": {
"traceId": {
"type": "string",
"pattern": "^[0-9a-f]{32}$"
},
"spanId": {
"type": "string",
"pattern": "^[0-9a-f]{16}$"
},
"requestId": {
"type": "string",
"maxLength": 128,
"pattern": "^[A-Za-z0-9._-]+$"
},
"causationId": {
"type": "string",
"pattern": "^[0-9A-HJKMNP-TV-Z]{26}$"
},
"producer": {
"$ref": "urn:connectsoft:schemas:partials/producer.v1.json"
}
},
"required": ["traceId"]
}
Producer schema (partial, v1)
{
"$id": "urn:connectsoft:schemas:partials/producer.v1.json",
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "Producer",
"type": "object",
"additionalProperties": false,
"properties": {
"service": { "type": "string", "minLength": 1, "maxLength": 128, "pattern": "^[A-Za-z0-9._-]+$" },
"version": { "type": "string", "maxLength": 64 },
"environment": { "type": "string", "maxLength": 32, "description": "e.g., dev, staging, prod" },
"instanceId": { "type": "string", "maxLength": 128, "description": "Hostname or pod name" },
"region": { "type": "string", "maxLength": 32, "description": "e.g., us-central, westeurope" },
"zone": { "type": "string", "maxLength": 32, "description": "Optional availability zone" }
},
"required": ["service"]
}
C# (gRPC code-first)
[DataContract]
public sealed class Correlation
{
[DataMember(Order = 1)] public string TraceId { get; init; } = default!; // hex32
[DataMember(Order = 2)] public string? SpanId { get; init; } // hex16
[DataMember(Order = 3)] public string? RequestId { get; init; } // ASCII token
[DataMember(Order = 4)] public string? CausationId { get; init; } // ULID
[DataMember(Order = 5)] public Producer? Producer { get; init; }
}
[DataContract]
public sealed class Producer
{
[DataMember(Order = 1)] public string Service { get; init; } = default!;
[DataMember(Order = 2)] public string? Version { get; init; }
[DataMember(Order = 3)] public string? Environment { get; init; } // dev|staging|prod
[DataMember(Order = 4)] public string? InstanceId { get; init; } // hostname/pod
[DataMember(Order = 5)] public string? Region { get; init; } // e.g., us-central
[DataMember(Order = 6)] public string? Zone { get; init; } // e.g., us-central1-a
}
JSON serialization MUST use camelCase (`TraceId` → `traceId`). Database columns follow PascalCase if denormalized (e.g., `TraceId`, `SpanId`, `RequestId`, `CausationId`, `ProducerJSON`).
Protobuf (optional emission)
syntax = "proto3";
package connectsoft.audit.v1;
message Correlation {
string TraceId = 1 [json_name = "traceId"]; // hex32
string SpanId = 2 [json_name = "spanId"]; // hex16
string RequestId = 3 [json_name = "requestId"];
string CausationId = 4 [json_name = "causationId"]; // ULID
Producer Producer = 5 [json_name = "producer"];
}
message Producer {
string Service = 1 [json_name = "service"];
string Version = 2 [json_name = "version"];
string Environment = 3 [json_name = "environment"];
string InstanceId = 4 [json_name = "instanceId"];
string Region = 5 [json_name = "region"];
string Zone = 6 [json_name = "zone"];
}
Propagation rules
HTTP/gRPC ingress
- Parse the W3C Trace Context header `traceparent` (and `tracestate` if present).
- If `traceparent` is missing, start a new trace and set `traceId`; generate a server `spanId`.
- Derive `requestId` from `x-request-id` if present; otherwise use a gateway-generated token.
- Attach `producer` at the component that creates the `AuditRecord` (e.g., gateway or service).

Async messaging / background jobs
- When publishing/consuming messages, propagate `traceId` via transport headers. If new work is caused by a prior record/command, set `causationId` to the ULID of that prior entity.
- Background/scheduled jobs SHOULD start a new span and link to the originating trace if known. Otherwise they start a new trace (a `traceId` is still required) and set only `producer`, leaving `causationId` empty.

Cross-service calls
- Always forward `traceparent` and `tracestate`.
- Each service uses a new `spanId`; `traceId` remains stable.
- If an error response is returned, include `requestId` (and `traceId`) in the Problem+JSON body for user correlation.
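The ingress rules above can be sketched as a small C# helper. This is a minimal sketch and not the platform's implementation. The `TraceContext` name is hypothetical. The helper targets .NET 6+ (`Convert.ToHexString`, `RandomNumberGenerator.GetBytes`) and accepts only the exact four-field `traceparent` layout of W3C Trace Context version `00`.

```csharp
using System;
using System.Security.Cryptography;
using System.Text.RegularExpressions;

// Hypothetical helper: parses a W3C traceparent header and derives ingress IDs.
public static class TraceContext
{
    // version-traceid-parentid-flags, lowercase hex (version 00 layout only).
    private static readonly Regex Traceparent = new(
        @"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})\z",
        RegexOptions.CultureInvariant);

    public static bool TryParse(string? header, out string traceId, out string parentSpanId)
    {
        traceId = parentSpanId = string.Empty;
        if (header is null) return false;

        var m = Traceparent.Match(header.Trim());
        if (!m.Success || m.Groups[1].Value == "ff") return false; // version ff is invalid

        var t = m.Groups[2].Value;
        var p = m.Groups[3].Value;
        // All-zero trace or parent IDs are invalid per the spec.
        if (t == new string('0', 32) || p == new string('0', 16)) return false;

        traceId = t;
        parentSpanId = p;
        return true;
    }

    // Random lowercase hex ID: 16 bytes -> traceId (hex32), 8 bytes -> spanId (hex16).
    public static string NewHexId(int byteCount) =>
        Convert.ToHexString(RandomNumberGenerator.GetBytes(byteCount)).ToLowerInvariant();

    // Keeps the caller's traceId when the header is valid, otherwise starts a new trace.
    // The server always gets its own spanId.
    public static (string TraceId, string SpanId) ForIngress(string? traceparent)
    {
        var traceId = TryParse(traceparent, out var parsed, out _) ? parsed : NewHexId(16);
        return (traceId, NewHexId(8));
    }
}
```

Rejecting malformed headers and starting a fresh trace, rather than failing the request, is an assumption that matches the "if `traceparent` is missing" rule above.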
Header & attribute mapping (reference)
| Surface | From | To | Notes |
|---|---|---|---|
| HTTP | `traceparent` | `correlation.traceId` / `spanId` | W3C propagation |
| HTTP (legacy) | `x-request-id` / `x-correlation-id` | `correlation.requestId` | Accept either; prefer `x-request-id` |
| gRPC | metadata `traceparent` | `correlation.traceId` / `spanId` | Same format |
| Messaging | transport headers (e.g., `traceparent`) | `correlation.traceId` / `spanId` | Preserve on publish/consume |
| Internal | prior record/command ULID | `correlation.causationId` | Link causal chain |
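The `requestId` row of the mapping can be sketched as follows. This is a hypothetical helper and not a platform API.

- It prefers `x-request-id`, falls back to `x-correlation-id`, and otherwise calls a caller-supplied generator.
- Silently ignoring header values that fail the ASCII token rule is an assumption; a stricter gateway might reject them instead.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: derives correlation.requestId from inbound headers.
public static class RequestIdResolver
{
    // Same ASCII token rule as the schema; \z avoids accepting a trailing newline.
    private static readonly Regex Token = new(@"^[A-Za-z0-9._-]{1,128}\z");

    // Preferred header first, then the legacy alias.
    private static readonly string[] HeaderNames = { "x-request-id", "x-correlation-id" };

    // Pass a case-insensitive dictionary: HTTP header names are case-insensitive.
    public static string Resolve(IReadOnlyDictionary<string, string> headers, Func<string> generate)
    {
        foreach (var name in HeaderNames)
        {
            if (headers.TryGetValue(name, out var value) && Token.IsMatch(value))
                return value;
        }
        return generate(); // gateway-generated token
    }
}
```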
Stitched examples
1) Gateway appends record at ingress
{
"correlation": {
"traceId": "a3f2ccbd7a1f4c0f8b1e9f4d1b7f2a66",
"spanId": "9b1e9f4d1b7f2a66",
"requestId": "REQ-7f3b4a",
"producer": { "service": "gateway", "version": "1.12.0", "environment": "prod", "instanceId": "gw-7c9ff7", "region": "us-central" }
}
}
2) Ingestion service persists derived record (same trace, new span)
{
"correlation": {
"traceId": "a3f2ccbd7a1f4c0f8b1e9f4d1b7f2a66",
"spanId": "c0f8b1e9f4d1b7f2",
"requestId": "REQ-7f3b4a",
"causationId": "01JE1X8M9QNG3B2W6J5SP2K5H5",
"producer": { "service": "ingestion", "version": "2.4.3", "environment": "prod", "instanceId": "ing-42a1", "region": "us-central" }
}
}
3) Projector handles event asynchronously (same trace, new span; causal link to append)
{
"correlation": {
"traceId": "a3f2ccbd7a1f4c0f8b1e9f4d1b7f2a66",
"spanId": "7a1f4c0f8b1e9f4d",
"causationId": "01JE1X8MFT7Z7P8K9V7E7Q0Q2B",
"producer": { "service": "projector", "version": "0.9.0", "environment": "prod", "instanceId": "proj-0a77", "region": "us-central" }
}
}
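Examples 2 and 3 follow one pattern: keep the trace, open a new span, point `causationId` at the causing record, and stamp the new producer. This is a minimal sketch using C# records. `CorrelationInfo` and `ProducerInfo` are hypothetical, simplified stand-ins for the `Correlation`/`Producer` contracts, not those classes themselves.

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical, simplified stand-ins for the Correlation/Producer contracts above.
public sealed record ProducerInfo(string Service, string? Version = null, string? Environment = null);

public sealed record CorrelationInfo(
    string TraceId, string? SpanId, string? RequestId, string? CausationId, ProducerInfo? Producer);

public static class CorrelationFlow
{
    // Downstream work keeps traceId (and requestId, when present), opens a new span,
    // points causationId at the record that caused it, and stamps its own producer.
    public static CorrelationInfo ForDownstream(
        CorrelationInfo upstream, string causingRecordUlid, ProducerInfo producer) =>
        upstream with
        {
            SpanId = Convert.ToHexString(RandomNumberGenerator.GetBytes(8)).ToLowerInvariant(),
            CausationId = causingRecordUlid,
            Producer = producer,
        };
}
```

The projector example above omits `requestId`. Carrying it forward, as this sketch does, is a design choice rather than a stated rule.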
Validation rules (summary)
- `traceId`: `^[0-9a-f]{32}$`; `spanId`: `^[0-9a-f]{16}$`.
- `requestId`: ASCII token `^[A-Za-z0-9._-]{1,128}$`.
- `causationId`: ULID pattern `^[0-9A-HJKMNP-TV-Z]{26}$`.
- `producer.service` is required; other producer fields are optional.
- `traceId` SHOULD be present on all records; when absent at ingress, generate one and propagate it forward.
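The summary rules translate directly into a validator. This is a hypothetical sketch that returns the names of failing fields. It uses `\z` instead of `$` so that a trailing newline cannot pass. It treats `traceId` as mandatory, as for a record that has already passed ingress. Producer checks are omitted for brevity.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical validator mirroring the correlation summary rules.
public static class CorrelationValidator
{
    private static readonly Regex Hex32 = new(@"^[0-9a-f]{32}\z");
    private static readonly Regex Hex16 = new(@"^[0-9a-f]{16}\z");
    private static readonly Regex Token = new(@"^[A-Za-z0-9._-]{1,128}\z");
    private static readonly Regex Ulid = new(@"^[0-9A-HJKMNP-TV-Z]{26}\z"); // Crockford base32, no I/L/O/U

    public static List<string> Validate(string? traceId, string? spanId, string? requestId, string? causationId)
    {
        var errors = new List<string>();
        if (traceId is null || !Hex32.IsMatch(traceId)) errors.Add("traceId"); // required after ingress
        if (spanId is not null && !Hex16.IsMatch(spanId)) errors.Add("spanId");
        if (requestId is not null && !Token.IsMatch(requestId)) errors.Add("requestId");
        if (causationId is not null && !Ulid.IsMatch(causationId)) errors.Add("causationId");
        return errors;
    }
}
```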
Decision & Access Outcome
Represents the authorization outcome associated with an access attempt or policy check performed during an action. It is optional on `AuditRecord` but SHOULD be populated by components that enforce/consult policy (gateway, authz service, PDP).
Overview
- Outcome enum: `Allow`, `Deny`, `NotApplicable` (`Unknown` for forward-compat).
- Reasoning: Capture a compact reason code (machine-friendly) and optional human detail.
- Evidence: Optionally reference the policy/rule and engine used for the evaluation.
- Attributes: Small key/value map with parameters relevant to the decision (e.g., `scope=appointments:read`, `mfa=required`).
- Time: Include `evaluatedAt` (UTC) for the moment the decision was made.
Fields
| JSON (lowerCamel) | C# (PascalCase) | Type | Req. | Description & rules |
|---|---|---|---|---|
| `outcome` | `Outcome` | enum | ✓ | One of `Unknown`, `Allow`, `Deny`, `NotApplicable`. |
| `reasonCode` | `ReasonCode` | string | | Machine-friendly code (≤128 chars). Pattern `^[A-Za-z][A-Za-z0-9]*(\.[A-Za-z][A-Za-z0-9_-]*)*$` (e.g., `Policy.Grant`, `Policy.Deny.Scope`). |
| `reason` | `Reason` | string | | Human-readable explanation (≤512 chars). |
| `attributes` | `Attributes` | map<string, string> | | Small K/V context (≤32 entries, each value ≤256 chars). No secrets/PII. |
| `policyRef` | `PolicyRef` | object | | Points to the policy and rule that produced the outcome. |
| `engine` | `Engine` | object | | Evaluator identity (name/version/mode). |
| `evaluatedAt` | `EvaluatedAt` | timestamp | | RFC3339 UTC with `Z`. Defaults to `observedAt` when omitted. |
PolicyRef
- `id` (string ≤128, required), `version` (string ≤32), `ruleId` (string ≤64), `name` (string ≤128, optional).
Engine
- `name` (required, ≤64; e.g., `pdp`, `opa`, `authz-gw`), `version` (e.g., `2.1.0`), `mode` (e.g., `enforce`, `audit`).
JSON Schema (v1)
{
"$id": "urn:connectsoft:schemas:partials/decision.v1.json",
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "Decision",
"type": "object",
"additionalProperties": false,
"properties": {
"outcome": {
"type": "string",
"enum": ["Unknown", "Allow", "Deny", "NotApplicable"]
},
"reasonCode": {
"type": "string",
"maxLength": 128,
"pattern": "^[A-Za-z][A-Za-z0-9]*(\\.[A-Za-z][A-Za-z0-9_-]*)*$"
},
"reason": { "type": "string", "maxLength": 512 },
"attributes": {
"type": "object",
"maxProperties": 32,
"additionalProperties": { "type": "string", "maxLength": 256 }
},
"policyRef": {
"type": "object",
"additionalProperties": false,
"properties": {
"id": { "type": "string", "maxLength": 128, "minLength": 1 },
"version": { "type": "string", "maxLength": 32 },
"ruleId": { "type": "string", "maxLength": 64 },
"name": { "type": "string", "maxLength": 128 }
},
"required": ["id"]
},
"engine": {
"type": "object",
"additionalProperties": false,
"properties": {
"name": { "type": "string", "maxLength": 64 },
"version": { "type": "string", "maxLength": 32 },
"mode": { "type": "string", "maxLength": 32 }
},
"required": ["name"]
},
"evaluatedAt": { "type": "string", "format": "date-time" }
},
"required": ["outcome"]
}
C# (gRPC code-first)
[DataContract]
public sealed class Decision
{
[DataMember(Order = 1)] public DecisionOutcome Outcome { get; init; } = DecisionOutcome.Unknown;
[DataMember(Order = 2)] public string? ReasonCode { get; init; } // e.g., "Policy.Grant", "Policy.Deny.Scope"
[DataMember(Order = 3)] public string? Reason { get; init; } // human readable
[DataMember(Order = 4)] public IReadOnlyDictionary<string, string>? Attributes { get; init; }
[DataMember(Order = 5)] public PolicyRef? PolicyRef { get; init; }
[DataMember(Order = 6)] public DecisionEngine? Engine { get; init; }
[DataMember(Order = 7)] public DateTimeOffset? EvaluatedAt { get; init; }
}
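// [DataContract] enums need [EnumMember] on each member to be serializable by DataContractSerializer.
[DataContract]
public enum DecisionOutcome
{
[EnumMember] Unknown = 0,
[EnumMember] Allow = 1,
[EnumMember] Deny = 2,
[EnumMember] NotApplicable = 3
}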
[DataContract]
public sealed class PolicyRef
{
[DataMember(Order = 1)] public string Id { get; init; } = default!;
[DataMember(Order = 2)] public string? Version { get; init; }
[DataMember(Order = 3)] public string? RuleId { get; init; }
[DataMember(Order = 4)] public string? Name { get; init; }
}
[DataContract]
public sealed class DecisionEngine
{
[DataMember(Order = 1)] public string Name { get; init; } = default!; // "pdp", "opa", "authz-gw"
[DataMember(Order = 2)] public string? Version { get; init; } // "2.1.0"
[DataMember(Order = 3)] public string? Mode { get; init; } // "enforce" | "audit"
}
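JSON serialization MUST use camelCase; DB columns remain PascalCase if denormalized (`DecisionOutcome`, `DecisionReasonCode`, …). Enums serialize as their names (e.g., `"Allow"`) to match the schema; with System.Text.Json this requires `JsonStringEnumConverter`. Avoid persisting large `attributes`; keep ≤32 entries.
Protobuf (optional emission)
syntax = "proto3";
package connectsoft.audit.v1;
import "google/protobuf/timestamp.proto";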
message Decision {
DecisionOutcome Outcome = 1 [json_name = "outcome"];
string ReasonCode = 2 [json_name = "reasonCode"];
string Reason = 3 [json_name = "reason"];
map<string, string> Attributes = 4 [json_name = "attributes"];
PolicyRef PolicyRef = 5 [json_name = "policyRef"];
DecisionEngine Engine = 6 [json_name = "engine"];
google.protobuf.Timestamp EvaluatedAt = 7 [json_name = "evaluatedAt"];
}
enum DecisionOutcome {
DecisionOutcome_Unknown = 0;
DecisionOutcome_Allow = 1;
DecisionOutcome_Deny = 2;
DecisionOutcome_NotApplicable = 3;
}
message PolicyRef {
string Id = 1 [json_name = "id"];
string Version = 2 [json_name = "version"];
string RuleId = 3 [json_name = "ruleId"];
string Name = 4 [json_name = "name"];
}
message DecisionEngine {
string Name = 1 [json_name = "name"];
string Version = 2 [json_name = "version"];
string Mode = 3 [json_name = "mode"];
}
Examples
Allow with explicit policy & scope
{
"outcome": "Allow",
"reasonCode": "Policy.Grant",
"attributes": { "scope": "appointments:read", "tenantPolicy": "TIER_2" },
"policyRef": { "id": "access-policy-main", "version": "2025-10-01", "ruleId": "R-ALLOW-APPT-READ" },
"engine": { "name": "pdp", "version": "2.1.0", "mode": "enforce" },
"evaluatedAt": "2025-10-22T14:05:14.022Z"
}
Deny due to missing scope
{
"outcome": "Deny",
"reasonCode": "Policy.Deny.Scope",
"reason": "Caller lacks scope appointments:write",
"attributes": { "requiredScope": "appointments:write" },
"policyRef": { "id": "access-policy-main", "version": "2025-10-01", "ruleId": "R-DENY-MISSING-SCOPE" },
"engine": { "name": "authz-gw", "version": "1.12.0", "mode": "enforce" }
}
Not applicable (non-access event)
Auditing guidance
- Populate `decision` only when an access control or policy evaluation occurred for the audited action.
- Prefer a stable `reasonCode` taxonomy for analytics (e.g., `Policy.Grant`, `Policy.Deny.Scope`, `Policy.Deny.MFARequired`, `Policy.NA.EventType`).
- Keep `attributes` small and non-sensitive (no tokens, no secrets, no full PII).
- If the decision depends on tenant edition/feature flags, include a neutral attribute (e.g., `edition=Pro`) but do not duplicate entire policy documents.
- For on-behalf-of flows, store the acting principal in `actor` and use `attributes` to hint at delegation constraints if needed (e.g., `delegation=limited`).
- Ensure `evaluatedAt` (UTC) is set by the component making the decision; if absent, readers may fall back to `observedAt`.
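Several of these guidelines can be checked mechanically before a `decision` is attached. The sketch below is hypothetical: `DecisionChecks` is not an existing component.

- It reuses the schema's `reasonCode` pattern and the `attributes` caps.
- It applies the `evaluatedAt` → `observedAt` reader fallback.
- It does not attempt to detect secrets or PII inside attribute values.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical guard checks for the Decision guidance; not part of the contract itself.
public static class DecisionChecks
{
    // Same reasonCode pattern as the schema, anchored with \z.
    private static readonly Regex ReasonCodePattern =
        new(@"^[A-Za-z][A-Za-z0-9]*(\.[A-Za-z][A-Za-z0-9_-]*)*\z");

    public static bool IsValidReasonCode(string code) =>
        code.Length <= 128 && ReasonCodePattern.IsMatch(code);

    // attributes: at most 32 entries, each value at most 256 chars.
    public static bool AttributesWithinBudget(IReadOnlyDictionary<string, string>? attributes) =>
        attributes is null || (attributes.Count <= 32 && attributes.Values.All(v => v.Length <= 256));

    // Readers fall back to the record's observedAt when evaluatedAt is absent; result is UTC.
    public static DateTimeOffset EffectiveEvaluatedAt(DateTimeOffset? evaluatedAt, DateTimeOffset observedAt) =>
        (evaluatedAt ?? observedAt).ToUniversalTime();
}
```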
Data Classification & Redaction Rules
Defines the sensitivity taxonomy (`DataClass`) and the redaction rules applied to fields at write/read/export time. The goal is to minimize exposure of PII/Secrets while keeping audit facts useful and verifiable.
Overview
- Classification first: Every sensitive field is labeled with a `DataClass`.
- Rule-driven transforms: A `RedactionRule` with a `kind` (`None` | `Hash` | `Mask` | `Drop` | `Tokenize`) and optional `params` dictates how a field's value is handled.
- Where applied:
  - Write path (ingestion): classification is attached; only classes marked "never store raw" are transformed at write time (e.g., `Credential`).
  - Read path (APIs/exports): rules are enforced based on the caller's clearance/role/tenant policy and the effective RedactionPlan.
- Determinism: Hashes are deterministic across tenants only when required (e.g., for global joins); otherwise they are tenant-salted.
DataClass (taxonomy)
| Enum value | Meaning | Examples |
|---|---|---|
| `Public` | Non-sensitive; safe to expose | Action verbs, non-PII tags |
| `Internal` | Operational metadata; limited exposure | `requestId`, `instanceId` |
| `Personal` | Light PII | Display name, city |
| `Sensitive` | PII / financial data; strict protection | Email, phone, address, last4 PAN |
| `Credential` | Secrets/tokens/keys; never store raw | API keys, OAuth tokens, passwords |
| `Phi` | Health information; regulated | Diagnosis notes, vitals |

Classification is monotonic: components may upgrade it (e.g., `Personal` → `Sensitive`) but must not downgrade it.
Rule matrix (default posture)
| DataClass | Write-time (store) | Read: Privileged (auditor) | Read: Standard (tenant user) | Notes |
|---|---|---|---|---|
| Public | `None` | `None` | `None` | — |
| Internal | `None` | `None` | `Mask` (optional) | Mask hostname/pod if needed |
| Personal | `None` | `Mask` or `Hash` | `Mask` | Keeps analytics utility |
| Sensitive | `None` | `Hash` | `Mask`/`Drop` | Prefer tenant-salted `Hash` |
| Credential | `Hash` or `Drop` | `Drop` | `Drop` | Do not persist raw secrets |
| Phi | `None` | `Mask`/`Tokenize` | `Mask`/`Drop` | Tokenize when longitudinal joins are required |
These are defaults. Tenants/editions may override via policy (see ClassificationPolicy).
JSON Schemas (partials, v1)
data-class.v1.json
{
"$id": "urn:connectsoft:schemas:policy/data-class.v1.json",
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "DataClass",
"type": "string",
"enum": ["Public", "Internal", "Personal", "Sensitive", "Credential", "Phi"]
}
redaction-rule.v1.json
{
"$id": "urn:connectsoft:schemas:policy/redaction-rule.v1.json",
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "RedactionRule",
"type": "object",
"additionalProperties": false,
"properties": {
"kind": {
"type": "string",
"enum": ["None", "Hash", "Mask", "Drop", "Tokenize"]
},
"params": {
"type": "object",
"additionalProperties": { "type": "string", "maxLength": 128 }
}
},
"required": ["kind"]
}
classification-policy.v1.json
{
"$id": "urn:connectsoft:schemas:policy/classification-policy.v1.json",
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "ClassificationPolicy",
"type": "object",
"additionalProperties": false,
"properties": {
"id": { "type": "string", "maxLength": 128 },
"version": { "type": "integer", "minimum": 1 },
"effectiveFromUtc": { "type": "string", "format": "date-time" },
"rulesByClass": {
"type": "object",
"additionalProperties": { "$ref": "urn:connectsoft:schemas:policy/redaction-rule.v1.json" }
},
"overridesByField": {
"type": "object",
"additionalProperties": { "$ref": "urn:connectsoft:schemas:policy/redaction-rule.v1.json" }
},
"defaultByField": {
"type": "object",
"additionalProperties": { "$ref": "urn:connectsoft:schemas:policy/data-class.v1.json" }
}
},
"required": ["id", "version"]
}
`defaultByField` labels specific fields (e.g., `actor.email` = `Sensitive`). `rulesByClass` sets class-wide defaults. `overridesByField` takes precedence over class rules.
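C# (gRPC code-first)
// [DataContract] enums need [EnumMember] on each member to be serializable by DataContractSerializer.
[DataContract]
public enum DataClass
{
[EnumMember] Public = 0,
[EnumMember] Internal = 1,
[EnumMember] Personal = 2,
[EnumMember] Sensitive = 3,
[EnumMember] Credential = 4,
[EnumMember] Phi = 5
}
[DataContract]
public enum RedactionKind
{
[EnumMember] None = 0,
[EnumMember] Hash = 1,
[EnumMember] Mask = 2,
[EnumMember] Drop = 3,
[EnumMember] Tokenize = 4
}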
[DataContract]
public sealed class RedactionRule
{
[DataMember(Order = 1)] public RedactionKind Kind { get; init; } = RedactionKind.None;
[DataMember(Order = 2)] public IReadOnlyDictionary<string,string>? Params { get; init; }
// Params examples:
// Hash: { "alg":"SHA256", "tenantSalt":"<optional>" }
// Mask: { "showFirst":"2", "showLast":"4", "replacement":"*" }
// Tokenize: { "provider":"FPE", "tokenSet":"email", "context":"<tenantId>" }
}
[DataContract]
public sealed class ClassificationPolicy
{
[DataMember(Order = 1)] public string Id { get; init; } = default!;
[DataMember(Order = 2)] public int Version { get; init; }
[DataMember(Order = 3)] public DateTimeOffset? EffectiveFromUtc { get; init; }
[DataMember(Order = 4)] public IReadOnlyDictionary<DataClass, RedactionRule>? RulesByClass { get; init; }
[DataMember(Order = 5)] public IReadOnlyDictionary<string, RedactionRule>? OverridesByField { get; init; } // "actor.email", "request.ip"
[DataMember(Order = 6)] public IReadOnlyDictionary<string, DataClass>? DefaultByField { get; init; } // default classification tags
}
JSON serialization uses camelCase (`overridesByField`, `rulesByClass`). Storage columns remain PascalCase when denormalized (e.g., `DataClass`, `RedactionPlan`); otherwise the data is embedded as JSON.
Protobuf (optional emission)
syntax = "proto3";
package connectsoft.audit.v1;
import "google/protobuf/timestamp.proto";
message RedactionRule {
RedactionKind Kind = 1 [json_name = "kind"];
map<string,string> Params = 2 [json_name = "params"];
}
enum RedactionKind {
RedactionKind_None = 0;
RedactionKind_Hash = 1;
RedactionKind_Mask = 2;
RedactionKind_Drop = 3;
RedactionKind_Tokenize = 4;
}
enum DataClass {
DataClass_Public = 0;
DataClass_Internal = 1;
DataClass_Personal = 2;
DataClass_Sensitive = 3;
DataClass_Credential = 4;
DataClass_Phi = 5;
}
message ClassificationPolicy {
string Id = 1 [json_name = "id"];
int32 Version = 2 [json_name = "version"];
google.protobuf.Timestamp EffectiveFromUtc = 3 [json_name = "effectiveFromUtc"];
map<int32, RedactionRule> RulesByClass = 4 [json_name = "rulesByClass"]; // key = DataClass numeric
map<string, RedactionRule> OverridesByField = 5 [json_name = "overridesByField"];
map<string, int32> DefaultByField = 6 [json_name = "defaultByField"]; // value = DataClass numeric
}
Evaluation & precedence
- Determine field class: `overridesByField` → `defaultByField` → inferred (component hints) → fallback `Internal`.
- Select rule: `overridesByField` (if present) → `rulesByClass[class]` → default posture table.
- Apply location:
  - Write path: apply only for `Credential` (hash/drop) and explicit write-time overrides. Store the classification tag alongside the value.
  - Read path: apply the selected rule based on caller clearance and edition policy (e.g., auditors may see `Hash` while users see `Mask`).
  - Export: use the export-specific plan (often equal to read), plus downstream data-sharing agreements.

Redaction behavior
- `Hash`: Deterministic `SHA256` (hex). Prefer tenant-salted (`HMAC-SHA256`) unless global joins are required.
- `Mask`: Keep edges via `showFirst`/`showLast`; replace the middle with `replacement` (default `*`).
- `Drop`: Remove the field entirely from the payload.
- `Tokenize`: Replace with a reversible token through a tokenization provider (e.g., FPE); store provider metadata in `params`.
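Rule selection and the `None`/`Hash`/`Mask`/`Drop` behaviors can be sketched together. This sketch makes several assumptions:

- `RedactionKind` and `RedactionRule` here are simplified stand-ins for the contracts above, and class names are plain strings as in the policy JSON.
- The tenant salt is passed in by the caller; key management is out of scope.
- When `showFirst + showLast` would reveal the whole value, the sketch masks everything. The document does not specify this edge case.
- `Tokenize` is omitted because it needs an external provider.
- It requires .NET 6+ (`HMACSHA256.HashData`).

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

// Simplified stand-ins for the RedactionKind/RedactionRule contracts above.
public enum RedactionKind { None, Hash, Mask, Drop, Tokenize }

public sealed record RedactionRule(RedactionKind Kind, IReadOnlyDictionary<string, string>? Params = null);

public static class Redactor
{
    // Rule precedence: per-field override, then the class-wide rule, then the default posture.
    public static RedactionRule SelectRule(
        string fieldPath, string dataClass,
        IReadOnlyDictionary<string, RedactionRule> overridesByField,
        IReadOnlyDictionary<string, RedactionRule> rulesByClass,
        RedactionRule defaultPosture)
    {
        if (overridesByField.TryGetValue(fieldPath, out var fieldRule)) return fieldRule;
        if (rulesByClass.TryGetValue(dataClass, out var classRule)) return classRule;
        return defaultPosture;
    }

    // Returns the transformed value, or null for Drop (the caller removes the field).
    public static string? Apply(string value, RedactionRule rule, byte[] tenantSalt) => rule.Kind switch
    {
        RedactionKind.None => value,
        RedactionKind.Drop => null,
        RedactionKind.Hash => Hash(value, rule.Params, tenantSalt),
        RedactionKind.Mask => Mask(value, rule.Params),
        _ => throw new NotSupportedException("Tokenize requires an external tokenization provider."),
    };

    // HMAC-SHA256 with the tenant salt unless params explicitly ask for plain SHA256 (global joins).
    private static string Hash(string value, IReadOnlyDictionary<string, string>? p, byte[] tenantSalt)
    {
        var bytes = Encoding.UTF8.GetBytes(value);
        var digest = Param(p, "alg") == "SHA256"
            ? SHA256.HashData(bytes)
            : HMACSHA256.HashData(tenantSalt, bytes);
        return Convert.ToHexString(digest).ToLowerInvariant();
    }

    // Keeps showFirst/showLast characters and replaces the middle.
    // ASSUMPTION: if nothing would be hidden, the whole value is masked.
    private static string Mask(string value, IReadOnlyDictionary<string, string>? p)
    {
        int first = int.TryParse(Param(p, "showFirst"), out var f) && f > 0 ? f : 0;
        int last = int.TryParse(Param(p, "showLast"), out var l) && l > 0 ? l : 0;
        char replacement = Param(p, "replacement") is { Length: > 0 } r ? r[0] : '*';

        if (first + last >= value.Length) return new string(replacement, value.Length);
        return value[..first] + new string(replacement, value.Length - first - last) + value[^last..];
    }

    private static string? Param(IReadOnlyDictionary<string, string>? p, string key) =>
        p != null && p.TryGetValue(key, out var v) ? v : null;
}
```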
Example policy & usage
Policy (JSON)
{
"id": "policy-default",
"version": 3,
"rulesByClass": {
"Personal": { "kind": "Mask", "params": { "showFirst": "1" } },
"Sensitive": { "kind": "Hash", "params": { "alg": "HMAC-SHA256" } },
"Credential": { "kind": "Drop" },
"Phi": { "kind": "Tokenize", "params": { "provider": "FPE", "tokenSet": "phi" } }
},
"overridesByField": {
"actor.email": { "kind": "Hash", "params": { "alg": "HMAC-SHA256" } },
"request.ip": { "kind": "Mask", "params": { "showLast": "4" } }
},
"defaultByField": {
"actor.display": "Personal",
"actor.email": "Sensitive",
"request.ip": "Sensitive"
}
}
Original stored record (excerpt)
{
"actor": { "id": "user_123", "display": "Alex", "email": "alex@example.com" },
"request": { "ip": "203.0.113.27", "userAgent": "Mozilla/5.0" }
}
Redacted for standard tenant user (read path)
{
"actor": { "id": "user_123", "display": "A***", "email": "df12c0...a9f" },
"request": { "ip": "********3.27", "userAgent": "Mozilla/5.0" }
}
Redacted for auditor (higher clearance)
{
"actor": { "id": "user_123", "display": "Alex", "email": "df12c0...a9f" },
"request": { "ip": "203.0.113.27", "userAgent": "Mozilla/5.0" }
}
Implementation notes
- Tagging: Persist classification per field in projections or alongside values (e.g., a parallel metadata map) to support dynamic plans.
- Search: Index only post-redaction values where applicable (e.g., email hashes), and avoid indexing raw sensitive fields.
- Logging: Apply the same plan to logs; ensure log redactors mirror these rules.
- Testing: Include golden fixtures verifying the same input produces expected redacted outputs for each clearance profile.
- Backfills: When policy versions change, re-project read models; never retroactively de-hash or re-expose dropped secrets.
Deltas (before/after)
Captures safe field-level changes for an audited action. Each entry expresses what changed on a field or sub-field path, optionally with a redaction hint to guide read/export transformations without exposing raw sensitive data.
Overview
- Minimal, explicit changes only: include only fields that changed.
- Typed values: `before`/`after` may be JSON scalars or small structured fragments.
- Paths: Keys may be simple field names (`status`) or JSON Pointer–style paths (`/lines/0/price`).
- PII-aware: When raw values must not be stored, carry hashes and a `redactionHint` instead of raw `before`/`after`.
- Bounded: Payload sizes and counts are capped; large objects should use hashes and set `truncated=true`.
Fields
| JSON (lowerCamel) | C# (PascalCase) | Type | Req. | Description & rules |
|---|---|---|---|---|
| `fields` | `Fields` | map<string, DeltaField> | ✓ | Map of field/path → change record. Keys: `^[A-Za-z][A-Za-z0-9._-]{0,63}$` or JSON Pointer (`/…`). |
DeltaField
| JSON | C# | Type | Req. | Description |
|---|---|---|---|---|
| `before` | `Before` | any | | Previous value (scalar/object/array/null). Omit if hashed/dropped. |
| `after` | `After` | any | | New value (scalar/object/array/null). Omit if hashed/dropped. |
| `beforeHash` | `BeforeHash` | string | | Hex SHA-256 (or HMAC) of the previous value when raw is not stored. |
| `afterHash` | `AfterHash` | string | | Hex SHA-256 (or HMAC) of the new value when raw is not stored. |
| `algorithm` | `Algorithm` | string | | Hash algorithm (`SHA256`, `HMAC-SHA256`, …). |
| `truncated` | `Truncated` | bool | | `true` if value(s) were truncated to fit caps. |
| `redactionHint` | `RedactionHint` | object | | Hint for read/export (class/kind). See RedactionHint below. |

RedactionHint
| Field | Type | Description |
|---|---|---|
| `class` | `DataClass` | Classification of the field (e.g., `Sensitive`, `Credential`). |
| `applied` | `RedactionKind` | Transformation already applied to stored values (`Hash`, `Drop`, `Mask`, `Tokenize`, `None`). |
| `note` | string | Optional operator note (≤256 chars). |

When `applied` is `Hash`/`Drop`, prefer `beforeHash`/`afterHash` over raw values. When `Mask` is applied at read time only, `before`/`after` may still be stored raw, depending on policy (see Classification).
JSON Schema (v1)
{
"$id": "urn:connectsoft:schemas:partials/delta.v1.json",
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "Delta",
"type": "object",
"additionalProperties": false,
"properties": {
"fields": {
"type": "object",
"maxProperties": 64,
"propertyNames": { "maxLength": 256 },
"additionalProperties": false,
"patternProperties": {
"^(?:[A-Za-z][A-Za-z0-9._-]{0,63}|(/([^/~]|~[01])*)+)$": {
"type": "object",
"additionalProperties": false,
"properties": {
"before": { "type": ["string","number","integer","boolean","object","array","null"] },
"after": { "type": ["string","number","integer","boolean","object","array","null"] },
"beforeHash": { "type": "string", "pattern": "^[a-f0-9]{64}$" },
"afterHash": { "type": "string", "pattern": "^[a-f0-9]{64}$" },
"algorithm": { "type": "string", "maxLength": 32 },
"truncated": { "type": "boolean" },
"redactionHint": {
"type": "object",
"additionalProperties": false,
"properties": {
"class": { "$ref": "urn:connectsoft:schemas:policy/data-class.v1.json" },
"applied": { "type": "string", "enum": ["None","Hash","Mask","Drop","Tokenize"] },
"note": { "type": "string", "maxLength": 256 }
}
}
}
}
}
}
},
"required": ["fields"]
}
String values SHOULD be ≤ 2048 chars; arrays ≤ 50 elements; objects ≤ 32 properties (excess SHOULD be replaced with a hashed representation and `truncated=true`). Concrete limits are finalized in Validation & Limits.
C# (gRPC code-first)
using System.Collections.Generic;
using System.Runtime.Serialization;
using System.Text.Json;
[DataContract]
public sealed class Delta
{
[DataMember(Order = 1)]
public IReadOnlyDictionary<string, DeltaField> Fields { get; init; } =
new Dictionary<string, DeltaField>();
}
[DataContract]
public sealed class DeltaField
{
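// Arbitrary JSON values; prefer JsonElement to preserve types without rehydration.
// protobuf-net has no built-in mapping for JsonElement, so the gRPC path may need a surrogate (e.g., to google.protobuf.Value).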
[DataMember(Order = 1)] public JsonElement? Before { get; init; }
[DataMember(Order = 2)] public JsonElement? After { get; init; }
// If redaction applied at write-time
[DataMember(Order = 3)] public string? BeforeHash { get; init; } // hex
[DataMember(Order = 4)] public string? AfterHash { get; init; } // hex
[DataMember(Order = 5)] public string? Algorithm { get; init; } // "SHA256", "HMAC-SHA256"
[DataMember(Order = 6)] public bool? Truncated { get; init; }
[DataMember(Order = 7)] public RedactionHint? RedactionHint { get; init; }
}
[DataContract]
public sealed class RedactionHint
{
[DataMember(Order = 1)] public DataClass? Class { get; init; }
[DataMember(Order = 2)] public RedactionKind? Applied { get; init; }
[DataMember(Order = 3)] public string? Note { get; init; }
}
JSON serialization MUST use camelCase. Database projections may store `Delta` as JSON (`NVARCHAR(MAX)`/`JSONB`) with separate computed columns for common keys if needed.
Protobuf (optional emission)
syntax = "proto3";
package connectsoft.audit.v1;
import "google/protobuf/struct.proto";
message Delta {
map<string, DeltaField> Fields = 1 [json_name = "fields"];
}
message DeltaField {
google.protobuf.Value Before = 1 [json_name = "before"]; // optional
google.protobuf.Value After = 2 [json_name = "after"]; // optional
string BeforeHash = 3 [json_name = "beforeHash"];
string AfterHash = 4 [json_name = "afterHash"];
string Algorithm = 5 [json_name = "algorithm"];
bool Truncated = 6 [json_name = "truncated"];
RedactionHint RedactionHint = 7 [json_name = "redactionHint"];
}
message RedactionHint {
DataClass Class = 1 [json_name = "class"];
RedactionKind Applied = 2 [json_name = "applied"];
string Note = 3 [json_name = "note"];
}
Examples
1) Simple scalar change
2) JSON Pointer path to sub-field
3) Sensitive field (hash-only at write-time)
{
"fields": {
"actor.email": {
"beforeHash": "df12c0a5f7...a9f0c4b1e3d2c1b0a9f8e7d6c5b4a3f20123456789abcdef0",
"afterHash": "3f0c4b1e3d...df12c0a5f7e9a8b7c6d5e4f30123456789abcdef0a9f8e7d6",
"algorithm": "HMAC-SHA256",
"redactionHint": { "class": "Sensitive", "applied": "Hash" }
}
}
}
4) Large object truncated with hash
{
"fields": {
"profile": {
"beforeHash": "7b2a...c9e",
"after": { "display": "Alex", "city": "Denver" },
"algorithm": "SHA256",
"truncated": true,
"redactionHint": { "class": "Personal", "applied": "Hash", "note": "Oversize object summarized" }
}
}
}
Budgets & caps
- Max changed fields per record: 64.
- Max key length: 64 (simple) or JSON Pointer ≤ 256 chars.
- String value length: ≤ 2048 chars per value; larger values SHOULD be hashed with `truncated=true`.
- Array length: ≤ 50 elements (store a diff of impacted indices when possible).
- Object property count: ≤ 32 properties (beyond that, prefer a hash summary).
- Hash algorithm: `SHA256` or `HMAC-SHA256` (tenant salt preferred for `Sensitive`/`Personal`).
- Computation: Delta computation happens at write time; there is no deferred "re-diff" during reads.
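A write-time diff that respects the string cap can be sketched over flat string maps. This is a hypothetical simplification, and its assumptions are:

- `DeltaFieldSketch` stands in for `DeltaField` restricted to string values.
- An absent key and a null value are treated alike.
- An oversize side is replaced by its plain SHA-256 hash (the tenant-salted HMAC variant is left out).
- Exceeding 64 changed fields throws instead of summarizing.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

// Hypothetical, simplified stand-in for DeltaField restricted to string values.
public sealed record DeltaFieldSketch(
    string? Before, string? After, string? BeforeHash, string? AfterHash, string? Algorithm, bool Truncated);

public static class DeltaBuilder
{
    public const int MaxFields = 64;
    public const int MaxStringLength = 2048;

    // Emits only changed keys; an oversize side is replaced by its SHA-256 hash and flagged truncated.
    public static Dictionary<string, DeltaFieldSketch> Diff(
        IReadOnlyDictionary<string, string?> before, IReadOnlyDictionary<string, string?> after)
    {
        var keys = new SortedSet<string>(before.Keys, StringComparer.Ordinal);
        keys.UnionWith(after.Keys);

        var result = new Dictionary<string, DeltaFieldSketch>();
        foreach (var key in keys)
        {
            before.TryGetValue(key, out var b);
            after.TryGetValue(key, out var a);
            if (string.Equals(b, a, StringComparison.Ordinal)) continue; // unchanged: omit

            if (result.Count == MaxFields)
                throw new InvalidOperationException($"More than {MaxFields} changed fields.");

            bool bigBefore = b is { Length: > MaxStringLength };
            bool bigAfter = a is { Length: > MaxStringLength };
            result[key] = new DeltaFieldSketch(
                Before: bigBefore ? null : b,
                After: bigAfter ? null : a,
                BeforeHash: bigBefore ? Sha256Hex(b!) : null,
                AfterHash: bigAfter ? Sha256Hex(a!) : null,
                Algorithm: bigBefore || bigAfter ? "SHA256" : null,
                Truncated: bigBefore || bigAfter);
        }
        return result;
    }

    private static string Sha256Hex(string value) =>
        Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(value))).ToLowerInvariant();
}
```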
Validation rules (summary)
- Keys match either the simple field pattern `^[A-Za-z][A-Za-z0-9._-]{0,63}$` or a JSON Pointer (`^(/([^/~]|~[01])*)+$`).
- If `before`/`after` are omitted, at least one of `beforeHash`/`afterHash` MUST be present.
- When `algorithm` is present, any hash field MUST be 64 hex chars.
- `redactionHint.class`/`applied` values align with the platform enums (`DataClass`, `RedactionKind`).
- `truncated=true` MUST be set when caps are exceeded and a hash summary replaces raw content.
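The key and hash rules can be checked as follows. This is a hypothetical sketch. It interprets "`before`/`after` omitted" as both being omitted. It checks hash format whether or not `algorithm` is set, as the schema's `pattern` does.

```csharp
using System.Text.RegularExpressions;

// Hypothetical checks for the Delta validation summary.
public static class DeltaRules
{
    private static readonly Regex SimpleKey = new(@"^[A-Za-z][A-Za-z0-9._-]{0,63}\z");
    private static readonly Regex PointerKey = new(@"^(/([^/~]|~[01])*)+\z"); // RFC 6901 escapes ~0 and ~1
    private static readonly Regex Hex64 = new(@"^[a-f0-9]{64}\z");

    public static bool IsValidKey(string key) =>
        SimpleKey.IsMatch(key) || (key.Length <= 256 && PointerKey.IsMatch(key));

    // Any hash present must be hex64; with no raw values, at least one hash is required.
    public static bool IsConsistent(bool hasBefore, bool hasAfter, string? beforeHash, string? afterHash)
    {
        if (beforeHash is not null && !Hex64.IsMatch(beforeHash)) return false;
        if (afterHash is not null && !Hex64.IsMatch(afterHash)) return false;
        return hasBefore || hasAfter || beforeHash is not null || afterHash is not null;
    }
}
```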
Integrity Structures
Defines the objects that provide tamper-evidence for appended audit facts: a per-record IntegrityRef, segment/block containers, a chained Merkle root, and the proof material required to verify end-to-end integrity.
Overview
- Canonical hashing: Each record is hashed in a canonical JSON form (UTF-8, RFC 8785/JCS style). The `integrity` field itself is excluded from the hash input.
- Segments → Blocks → Chain: Records are batched into Segments (rolling windows by count/time). Segment leaves form a Merkle tree with a `rootHash`. Multiple segments roll into an IntegrityBlock, which carries the `blockRoot` and a signature and points to the previous block (hash chain).
- Proofs: Each record stores a compact Merkle proof (`leafHash`, `merklePath`) sufficient to recompute the segment root and validate it against the block.
- Algorithms: Default hash `SHA256` (hex); signatures via detached CMS/PKCS#7 or Ed25519. Parameters are part of the manifest for forward compatibility.
IntegrityRef (per record)
Minimal proof pointer & path placed on each AuditRecord.
| JSON (lowerCamel) | C# (PascalCase) | Type | Req. | Description |
|---|---|---|---|---|
| `blockId` | `BlockId` | string (ULID) | ✓ | The enclosing IntegrityBlock identifier. |
| `segmentId` | `SegmentId` | string (ULID) | ✓ | The segment identifier inside the block. |
| `leafIndex` | `LeafIndex` | integer | ✓ | Zero-based index of the record leaf within the segment. |
| `leafHash` | `LeafHash` | string (hex64) | ✓ | Hash of the canonical record bytes. |
| `algo` | `Algo` | string | | Hash algorithm; default `SHA256`. |
| `merklePath` | `MerklePath` | array<PathNode> | ✓ | Sibling hashes to climb from the leaf to the segment root. |

PathNode
- `pos`: `"L"` or `"R"` (sibling position relative to the running hash)
- `hash`: hex64 sibling hash
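Verifying a record against its segment root means walking `merklePath` from `leafHash`. This section does not define the internal-node rule, so the sketch assumes parent = SHA-256 over the concatenated raw 32-byte child digests, with no domain-separation prefix. An RFC 6962-style prefix would change every result. `MerkleProof` is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;

public sealed record PathNode(string Pos, string Hash);

public static class MerkleProof
{
    // ASSUMPTION: parent = SHA256(left || right) over raw 32-byte digests, no prefix byte.
    public static string Combine(string leftHex, string rightHex)
    {
        var buffer = new byte[64];
        Convert.FromHexString(leftHex).CopyTo(buffer, 0);
        Convert.FromHexString(rightHex).CopyTo(buffer, 32);
        return Convert.ToHexString(SHA256.HashData(buffer)).ToLowerInvariant();
    }

    // pos = "L": sibling sits on the left of the running hash; "R": on the right.
    public static string ComputeRoot(string leafHash, IEnumerable<PathNode> merklePath)
    {
        var running = leafHash;
        foreach (var node in merklePath)
        {
            running = node.Pos switch
            {
                "L" => Combine(node.Hash, running),
                "R" => Combine(running, node.Hash),
                _ => throw new ArgumentException($"Invalid pos '{node.Pos}'."),
            };
        }
        return running;
    }

    public static bool Verify(string leafHash, IEnumerable<PathNode> merklePath, string segmentRoot) =>
        string.Equals(ComputeRoot(leafHash, merklePath), segmentRoot, StringComparison.Ordinal);
}
```

For a leaf at index 2 in a four-leaf segment, the path is its right sibling (leaf 3) followed by the left subtree root (hash of leaves 0 and 1).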
Segment & Block
IntegritySegment
| Field | Type | Description |
|---|---|---|
| `segmentId` | ULID | Unique segment id. |
| `blockId` | ULID | Owning block. |
| `algo` | string | Hash algorithm (`SHA256`). |
| `leafCount` | int | Number of leaves in the tree. |
| `rootHash` | hex64 | Merkle root for this segment. |
| `startedAt` / `closedAt` | timestamp | Segment window bounds (UTC). |

IntegrityBlock
| Field | Type | Description |
|---|---|---|
| `blockId` | ULID | Unique block id. |
| `tenantId` | string | Tenant scope for the block. |
| `algo` | string | Hash algorithm for all segments in this block. |
| `segmentCount` | int | Count of segments sealed into the block. |
| `blockRoot` | hex64 | Hash over the (ordered) segment roots (e.g., a Merkle tree of segment roots). |
| `prevBlockRoot` | hex64 | Previous block's `blockRoot` (forms a chain). |
| `signature` | object | Detached signature over `blockRoot` + header. |
| `signingKeyId` | string | Key identifier (KMS/Key Vault/JWKS `kid`). |
| `startedAt` / `sealedAt` | timestamp | Block time bounds (UTC). |
| `region` / `environment` | string | Operational labels (optional). |

Ordering: Segment roots are included in ascending `segmentId` (or time) order to compute `blockRoot`. The block header (`blockId`, `tenantId`, `algo`, segment root list digest, `prevBlockRoot`, timestamps) is the signed content.
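The ordering rule and the `prevBlockRoot` chain can be sketched as follows. This sketch covers only the hash side; signature verification over the header is out of scope. Its assumptions are:

- Internal nodes use the same SHA-256 concatenation rule as a per-record proof check.
- An unpaired node at the end of a tree level is promoted unchanged. This is one common convention and is not stated in this document.

`BlockChain` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;

// Hypothetical helpers for blockRoot and the prevBlockRoot chain.
public static class BlockChain
{
    // ASSUMPTION: parent = SHA256(left || right) over raw 32-byte digests.
    public static string Combine(string leftHex, string rightHex)
    {
        var buffer = new byte[64];
        Convert.FromHexString(leftHex).CopyTo(buffer, 0);
        Convert.FromHexString(rightHex).CopyTo(buffer, 32);
        return Convert.ToHexString(SHA256.HashData(buffer)).ToLowerInvariant();
    }

    // ASSUMPTION: an unpaired node at the end of a level is promoted unchanged.
    public static string MerkleRoot(IReadOnlyList<string> leaves)
    {
        if (leaves.Count == 0) throw new ArgumentException("At least one leaf is required.");
        var level = leaves.ToList();
        while (level.Count > 1)
        {
            var next = new List<string>((level.Count + 1) / 2);
            for (int i = 0; i < level.Count; i += 2)
                next.Add(i + 1 < level.Count ? Combine(level[i], level[i + 1]) : level[i]);
            level = next;
        }
        return level[0];
    }

    // Segment roots in ascending segmentId order; ULIDs sort by timestamp under ordinal comparison.
    public static string ComputeBlockRoot(IEnumerable<(string SegmentId, string RootHash)> segments) =>
        MerkleRoot(segments.OrderBy(s => s.SegmentId, StringComparer.Ordinal).Select(s => s.RootHash).ToList());

    // Every block after the first must point at its predecessor's blockRoot.
    public static bool IsChained(IReadOnlyList<(string BlockRoot, string? PrevBlockRoot)> blocks)
    {
        for (int i = 1; i < blocks.Count; i++)
        {
            if (!string.Equals(blocks[i].PrevBlockRoot, blocks[i - 1].BlockRoot, StringComparison.Ordinal))
                return false;
        }
        return true;
    }
}
```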
JSON Schemas (partials, v1)
integrity-ref.v1.json
{
"$id": "urn:connectsoft:schemas:integrity/integrity-ref.v1.json",
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "IntegrityRef",
"type": "object",
"additionalProperties": false,
"properties": {
"blockId": { "type": "string", "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$" },
"segmentId": { "type": "string", "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$" },
"leafIndex": { "type": "integer", "minimum": 0 },
"leafHash": { "type": "string", "pattern": "^[a-f0-9]{64}$" },
"algo": { "type": "string", "enum": ["SHA256"] },
"merklePath": {
"type": "array",
"maxItems": 64,
"items": {
"type": "object",
"additionalProperties": false,
"properties": {
"pos": { "type": "string", "enum": ["L","R"] },
"hash": { "type": "string", "pattern": "^[a-f0-9]{64}$" }
},
"required": ["pos","hash"]
}
}
},
"required": ["blockId","segmentId","leafIndex","leafHash","merklePath"]
}
integrity-segment.v1.json
{
"$id": "urn:connectsoft:schemas:integrity/segment.v1.json",
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "IntegritySegment",
"type": "object",
"additionalProperties": false,
"properties": {
"segmentId": { "type": "string", "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$" },
"blockId": { "type": "string", "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$" },
"algo": { "type": "string", "enum": ["SHA256"] },
"leafCount": { "type": "integer", "minimum": 1 },
"rootHash": { "type": "string", "pattern": "^[a-f0-9]{64}$" },
"startedAt": { "type": "string", "format": "date-time" },
"closedAt": { "type": "string", "format": "date-time" }
},
"required": ["segmentId","blockId","algo","leafCount","rootHash","startedAt","closedAt"]
}
integrity-block.v1.json
{
"$id": "urn:connectsoft:schemas:integrity/block.v1.json",
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "IntegrityBlock",
"type": "object",
"additionalProperties": false,
"properties": {
"blockId": { "type": "string", "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$" },
"tenantId": { "type": "string", "pattern": "^[A-Za-z0-9._-]{1,128}$" },
"algo": { "type": "string", "enum": ["SHA256"] },
"segmentCount": { "type": "integer", "minimum": 1 },
"blockRoot": { "type": "string", "pattern": "^[a-f0-9]{64}$" },
"prevBlockRoot":{ "type": "string", "pattern": "^[a-f0-9]{64}$" },
"signature": {
"type": "object",
"additionalProperties": false,
"properties": {
"scheme": { "type": "string", "enum": ["Ed25519","PKCS7"] },
"value": { "type": "string", "contentEncoding": "base64" }
},
"required": ["scheme","value"]
},
"signingKeyId": { "type": "string", "maxLength": 128 },
"startedAt": { "type": "string", "format": "date-time" },
"sealedAt": { "type": "string", "format": "date-time" },
"region": { "type": "string", "maxLength": 32 },
"environment": { "type": "string", "maxLength": 32 }
},
"required": ["blockId","tenantId","algo","segmentCount","blockRoot","prevBlockRoot","signature","signingKeyId","startedAt","sealedAt"]
}
C# (gRPC code-first)¶
using System;
using System.Collections.Generic;
using System.Runtime.Serialization;
[DataContract]
public sealed class IntegrityRef
{
[DataMember(Order = 1)] public string BlockId { get; init; } = default!; // ULID
[DataMember(Order = 2)] public string SegmentId { get; init; } = default!; // ULID
[DataMember(Order = 3)] public int LeafIndex { get; init; }
[DataMember(Order = 4)] public string LeafHash { get; init; } = default!; // hex SHA256
[DataMember(Order = 5)] public string? Algo { get; init; } = "SHA256";
[DataMember(Order = 6)] public IReadOnlyList<MerklePathNode> MerklePath { get; init; } = Array.Empty<MerklePathNode>();
}
[DataContract]
public sealed class MerklePathNode
{
[DataMember(Order = 1)] public string Pos { get; init; } = default!; // "L" | "R"
[DataMember(Order = 2)] public string Hash { get; init; } = default!; // hex
}
[DataContract]
public sealed class IntegritySegment
{
[DataMember(Order = 1)] public string SegmentId { get; init; } = default!;
[DataMember(Order = 2)] public string BlockId { get; init; } = default!;
[DataMember(Order = 3)] public string Algo { get; init; } = "SHA256";
[DataMember(Order = 4)] public int LeafCount { get; init; }
[DataMember(Order = 5)] public string RootHash { get; init; } = default!;
[DataMember(Order = 6)] public DateTimeOffset StartedAt { get; init; }
[DataMember(Order = 7)] public DateTimeOffset ClosedAt { get; init; }
}
[DataContract]
public sealed class IntegrityBlock
{
[DataMember(Order = 1)] public string BlockId { get; init; } = default!;
[DataMember(Order = 2)] public string TenantId { get; init; } = default!;
[DataMember(Order = 3)] public string Algo { get; init; } = "SHA256";
[DataMember(Order = 4)] public int SegmentCount { get; init; }
[DataMember(Order = 5)] public string BlockRoot { get; init; } = default!;
[DataMember(Order = 6)] public string PrevBlockRoot { get; init; } = default!;
[DataMember(Order = 7)] public Signature Signature { get; init; } = new();
[DataMember(Order = 8)] public string SigningKeyId { get; init; } = default!;
[DataMember(Order = 9)] public DateTimeOffset StartedAt { get; init; }
[DataMember(Order = 10)] public DateTimeOffset SealedAt { get; init; }
[DataMember(Order = 11)] public string? Region { get; init; }
[DataMember(Order = 12)] public string? Environment { get; init; }
}
[DataContract]
public sealed class Signature
{
[DataMember(Order = 1)] public string Scheme { get; init; } = "Ed25519"; // or "PKCS7"
[DataMember(Order = 2)] public string Value { get; init; } = default!; // base64
}
JSON serialization uses camelCase; database columns stay PascalCase (`BlockId`, `BlockRoot`, …).
Protobuf (optional emission)¶
syntax = "proto3";
package connectsoft.audit.v1;
import "google/protobuf/timestamp.proto";
message IntegrityRef {
string BlockId = 1 [json_name = "blockId"];
string SegmentId = 2 [json_name = "segmentId"];
int32 LeafIndex = 3 [json_name = "leafIndex"];
string LeafHash = 4 [json_name = "leafHash"];
string Algo = 5 [json_name = "algo"];
repeated MerklePathNode MerklePath = 6 [json_name = "merklePath"];
}
message MerklePathNode {
string Pos = 1 [json_name = "pos"]; // "L" | "R"
string Hash = 2 [json_name = "hash"]; // hex
}
message IntegritySegment {
string SegmentId = 1 [json_name = "segmentId"];
string BlockId = 2 [json_name = "blockId"];
string Algo = 3 [json_name = "algo"];
int32 LeafCount = 4 [json_name = "leafCount"];
string RootHash = 5 [json_name = "rootHash"];
google.protobuf.Timestamp StartedAt = 6 [json_name = "startedAt"];
google.protobuf.Timestamp ClosedAt = 7 [json_name = "closedAt"];
}
message IntegrityBlock {
string BlockId = 1 [json_name = "blockId"];
string TenantId = 2 [json_name = "tenantId"];
string Algo = 3 [json_name = "algo"];
int32 SegmentCount = 4 [json_name = "segmentCount"];
string BlockRoot = 5 [json_name = "blockRoot"];
string PrevBlockRoot = 6 [json_name = "prevBlockRoot"];
Signature Signature = 7 [json_name = "signature"];
string SigningKeyId = 8 [json_name = "signingKeyId"];
google.protobuf.Timestamp StartedAt = 9 [json_name = "startedAt"];
google.protobuf.Timestamp SealedAt = 10 [json_name = "sealedAt"];
string Region = 11 [json_name = "region"];
string Environment = 12 [json_name = "environment"];
}
message Signature {
string Scheme = 1 [json_name = "scheme"]; // "Ed25519" | "PKCS7"
string Value = 2 [json_name = "value"]; // base64
}
Examples¶
Per-record reference (embedded in AuditRecord)
{
"integrity": {
"blockId": "01JE3G8T1E4J9BW6VQ4G6S1Q2C",
"segmentId": "01JE3G8T7P0A7P9F3Q1H6X9V2Z",
"leafIndex": 17,
"leafHash": "3a8f0e9a58f3b3d6e1c0a9f7b6c5d4e3a2b1c0d9e8f7a6b5c4d3e2f1a0b9c8d7",
"algo": "SHA256",
"merklePath": [
{ "pos": "L", "hash": "8f2a..." },
{ "pos": "R", "hash": "5c91..." }
]
}
}
Segment manifest
{
"segmentId": "01JE3G8T7P0A7P9F3Q1H6X9V2Z",
"blockId": "01JE3G8T1E4J9BW6VQ4G6S1Q2C",
"algo": "SHA256",
"leafCount": 512,
"rootHash": "0f6d4a...e91c",
"startedAt": "2025-10-22T14:00:00Z",
"closedAt": "2025-10-22T14:05:00Z"
}
Block header (sealed)
{
"blockId": "01JE3G8T1E4J9BW6VQ4G6S1Q2C",
"tenantId": "splootvets",
"algo": "SHA256",
"segmentCount": 8,
"blockRoot": "6a4b3c...0d2e",
"prevBlockRoot": "5b3a2c...9c1f",
"signature": { "scheme": "Ed25519", "value": "MEUCIQDv..." },
"signingKeyId": "kv:prod/atp-integrity/ed25519-2025-01",
"startedAt": "2025-10-22T14:00:00Z",
"sealedAt": "2025-10-22T14:10:00Z",
"region": "us-central",
"environment": "prod"
}
Verification flow (reader/exporter)¶
- Canonicalize the target record JSON (excluding its `integrity` node) → bytes (UTF-8).
- Compute `leafHash = SHA256(bytes)`; compare it to `integrity.leafHash`.
- Climb the Merkle path: iteratively hash the current node with each sibling, using the recorded `pos` to decide concatenation order, to obtain the segment root.
- Fetch the segment manifest; compare the computed root to `IntegritySegment.rootHash`.
- Fetch the block; recompute `blockRoot` from the ordered segment roots; compare it to `IntegrityBlock.blockRoot`.
- Verify the signature using `signingKeyId` (resolved from Key Vault/JWKS).
- Optionally verify the chain: `block.prevBlockRoot` equals the prior block’s `blockRoot`.
- Verification passes only if all steps succeed.
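The hashing steps of this flow (leaf hash, path climb, and comparison with the segment root) can be sketched in Python. None of the following is fixed by the document:

- Canonicalization is approximated with sorted keys and compact separators. This is close to RFC 8785 (JCS) but not identical to it.
- `pos` names the side on which the sibling sits.
- Records are plain dicts.

The block-root, signature, and chain checks are omitted because they depend on the key source.

```python
import hashlib
import json


def canonical_bytes(record: dict) -> bytes:
    # Assumed canonical form: drop `integrity`, sort keys, no extra whitespace, UTF-8.
    # This approximates, but is not identical to, RFC 8785 (JCS).
    body = {k: v for k, v in record.items() if k != "integrity"}
    return json.dumps(body, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def climb_path(leaf_hash_hex: str, merkle_path: list[dict]) -> str:
    """Recompute the segment root from a leaf hash and its sibling path."""
    node = bytes.fromhex(leaf_hash_hex)
    for step in merkle_path:
        sibling = bytes.fromhex(step["hash"])
        if step["pos"] == "L":      # sibling sits to the left (assumption)
            node = hashlib.sha256(sibling + node).digest()
        elif step["pos"] == "R":    # sibling sits to the right
            node = hashlib.sha256(node + sibling).digest()
        else:
            raise ValueError(f"bad pos: {step['pos']!r}")
    return node.hex()


def verify_record(record: dict, segment_root_hex: str) -> bool:
    """Leaf-hash and Merkle-path checks only; block and signature checks are omitted."""
    ref = record["integrity"]
    leaf = hashlib.sha256(canonical_bytes(record)).hexdigest()
    if leaf != ref["leafHash"]:
        return False
    if len(ref["merklePath"]) > 64:  # budget: at most 64 path nodes per record
        return False
    return climb_path(leaf, ref["merklePath"]) == segment_root_hex
```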
Storage mapping (authoritative)¶
CREATE TABLE dbo.IntegrityBlocks (
BlockId CHAR(26) NOT NULL,
TenantId NVARCHAR(64) NOT NULL,
Algo NVARCHAR(16) NOT NULL, -- "SHA256"
SegmentCount INT NOT NULL,
BlockRoot CHAR(64) NOT NULL, -- hex
PrevBlockRoot CHAR(64) NOT NULL,
SignatureScheme NVARCHAR(16) NOT NULL,
SignatureValue VARBINARY(MAX) NOT NULL,
SigningKeyId NVARCHAR(128) NOT NULL,
StartedAt DATETIME2(3) NOT NULL,
SealedAt DATETIME2(3) NOT NULL,
Region NVARCHAR(32) NULL,
Environment NVARCHAR(32) NULL,
CONSTRAINT PK_IntegrityBlocks PRIMARY KEY (BlockId)
);
CREATE TABLE dbo.IntegritySegments (
SegmentId CHAR(26) NOT NULL,
BlockId CHAR(26) NOT NULL,
Algo NVARCHAR(16) NOT NULL,
LeafCount INT NOT NULL,
RootHash CHAR(64) NOT NULL,
StartedAt DATETIME2(3) NOT NULL,
ClosedAt DATETIME2(3) NOT NULL,
CONSTRAINT PK_IntegritySegments PRIMARY KEY (SegmentId),
CONSTRAINT FK_IntegritySegments_Blocks FOREIGN KEY (BlockId) REFERENCES dbo.IntegrityBlocks(BlockId)
);
The per-record `IntegrityRef` is embedded on `AuditRecords` (JSON). Optionally project `BlockId`/`SegmentId`/`LeafIndex` into columns for faster lookups.
Budgets & caps¶
- Max Merkle depth: 64 path nodes per record.
- Target segment size: 2^N leaves (e.g., 512 or 1024) or a 5-minute window, whichever is reached first.
- Block closure: fixed schedule (e.g., 10 minutes) or ~8 segments, with immediate seal/sign.
- Hash algorithm: `SHA256` (hex) for all leaves and nodes; other FIPS-approved algorithms may be introduced via `algo`.
- Signature: Ed25519 by default; PKCS#7 (CMS) is supported for enterprise HSM workflows.
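The segment-closing rule and its effect on path length can be sketched in Python. The defaults (512 leaves, 5 minutes) are the example values from this list, not mandated ones, and the function names are hypothetical.

In a tree of `n` leaves, a leaf's path has at most `ceil(log2 n)` sibling nodes. A 512-leaf segment therefore needs at most 9, which leaves wide headroom under the 64-node cap.

```python
from datetime import datetime, timedelta, timezone


def should_close_segment(leaf_count: int, started_at: datetime, now: datetime,
                         max_leaves: int = 512,
                         max_window: timedelta = timedelta(minutes=5)) -> bool:
    """Close when the leaf target or the time window is reached, whichever comes first."""
    return leaf_count >= max_leaves or now - started_at >= max_window


def path_length(leaf_count: int) -> int:
    """Upper bound on sibling nodes per leaf path: ceil(log2(leaf_count))."""
    return (leaf_count - 1).bit_length()
```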
Notes¶
- Privacy: Canonical hashing operates on the stored representation; if write-time redaction hashes a field, the record’s leaf hash reflects the redacted value (by design).
- Portability: Block headers and segment manifests are self-describing; exporters include them alongside data packages.
- Forward-compat: New algorithms or layouts must be additive; verifiers fall back to manifest parameters when present.
---
Retention Policy Model¶
Expresses the RetentionPolicy aggregate, its scopes & windows, revisioning (revision with effectiveFromUtc), and the evaluation inputs/results used to decide when an AuditRecord becomes eligible for purge and when it must be kept (WORM-like minimums).
Overview¶
- Min/Max windows: Policies define a minimum keep window (no purge before) and an optional maximum window (target purge at).
- Rule engine: A policy contains ordered rules whose scopes match record facets (`resource.type`, `action`, data classes, attributes); each rule provides a window.
- Revisions: Policies evolve by incrementing `revision` (forward-only) and setting `effectiveFromUtc`. Re-evaluation may extend keep times but must not shorten previously committed ones.
- Holds: A LegalHold (see Legal Hold Model) supersedes retention; evaluation must surface `state=OnHold`.
- Clocks: Windows are typically anchored at `createdAt` (configurable per rule).
Model¶
RetentionWindow
- `minDays` (int ≥ 0): Minimum days to retain (WORM).
- `maxDays` (int ≥ `minDays`, optional): If set, target purge after this many days.
- `anchor` (`CreatedAt` | `ObservedAt` | `EffectiveAt`): Clock the window is measured from; default `CreatedAt`.
- `jitterDays` (int ≥ 0, optional): Randomized offset to spread purge load (applied at evaluation time).
RetentionScope
- `resourceTypes` (array of PascalCase names, may include dotted namespaces; supports a `*` suffix wildcard, e.g., `Vetspire.*`).
- `actions` (array of `verb` or `verb.noun` strings; supports a `*` suffix wildcard, e.g., `appointment.*`).
- `dataClasses` (array of `DataClass` values; matches if the record (or its delta) contains any of these classes).
- `attributes` (map of key → value or glob pattern; matched against `AuditRecord.attributes`).
- An empty scope matches all records.
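The following Python sketch shows how a scope might be matched against a record probe. It rests on these assumptions:

- Populated facets are ANDed together, and entries within a facet are ORed.
- The probe is a flat dict with `resourceType`, `action`, `dataClasses`, and `attributes`.
- Attribute globs use `fnmatch.fnmatchcase`, which also treats `[...]` as a character class, slightly beyond the documented `*`/`?`.

The function names are hypothetical.

```python
from fnmatch import fnmatchcase


def match_name(pattern: str, value: str) -> bool:
    """Exact match, or prefix match when the pattern ends in a single '*'."""
    if pattern.endswith("*"):
        return value.startswith(pattern[:-1])  # "Vetspire.*" -> prefix "Vetspire."
    return value == pattern


def scope_matches(scope: dict, record: dict) -> bool:
    """True if every populated facet matches; an empty scope matches everything."""
    types = scope.get("resourceTypes")
    if types and not any(match_name(p, record["resourceType"]) for p in types):
        return False
    actions = scope.get("actions")
    if actions and not any(match_name(p, record["action"]) for p in actions):
        return False
    classes = scope.get("dataClasses")
    if classes and not set(classes) & set(record.get("dataClasses", [])):
        return False  # intersection must be non-empty
    attrs = record.get("attributes", {})
    for key, pattern in (scope.get("attributes") or {}).items():
        if key not in attrs or not fnmatchcase(attrs[key], pattern):
            return False  # key must exist and value/glob must match
    return True
```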
RetentionRule
- `id` (string ≤ 64)
- `description` (optional)
- `priority` (int, lower wins; default 100)
- `enabled` (bool, default `true`)
- `stopProcessing` (bool, default `true`)
- `scope` (`RetentionScope`)
- `window` (`RetentionWindow`)
RetentionPolicy
- `id` (string ≤ 128)
- `tenantId` (string; optional for global policies)
- `revision` (int ≥ 1)
- `effectiveFromUtc` (timestamp)
- `defaultWindow` (`RetentionWindow`)
- `rules` (array of `RetentionRule`)
JSON Schemas (v1)¶
retention-policy.v1.json
{
"$id": "urn:connectsoft:schemas:policy/retention-policy.v1.json",
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "RetentionPolicy",
"type": "object",
"additionalProperties": false,
"properties": {
"id": { "type": "string", "maxLength": 128, "minLength": 1 },
"tenantId": { "type": "string", "maxLength": 128, "pattern": "^[A-Za-z0-9._-]+$" },
"revision": { "type": "integer", "minimum": 1 },
"effectiveFromUtc": { "type": "string", "format": "date-time" },
"defaultWindow": { "$ref": "urn:connectsoft:schemas:policy/retention-window.v1.json" },
"rules": {
"type": "array",
"items": { "$ref": "urn:connectsoft:schemas:policy/retention-rule.v1.json" },
"maxItems": 200
}
},
"required": ["id","revision","effectiveFromUtc","defaultWindow"]
}
retention-window.v1.json
{
"$id": "urn:connectsoft:schemas:policy/retention-window.v1.json",
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "RetentionWindow",
"type": "object",
"additionalProperties": false,
"properties": {
"minDays": { "type": "integer", "minimum": 0, "maximum": 36500 },
"maxDays": { "type": "integer", "minimum": 0, "maximum": 36500 },
"anchor": {
"type": "string",
"enum": ["CreatedAt","ObservedAt","EffectiveAt"],
"default": "CreatedAt"
},
"jitterDays": { "type": "integer", "minimum": 0, "maximum": 30 }
},
"required": ["minDays"]
}
retention-rule.v1.json
{
"$id": "urn:connectsoft:schemas:policy/retention-rule.v1.json",
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "RetentionRule",
"type": "object",
"additionalProperties": false,
"properties": {
"id": { "type": "string", "maxLength": 64, "minLength": 1 },
"description": { "type": "string", "maxLength": 256 },
"priority": { "type": "integer", "minimum": 0, "default": 100 },
"enabled": { "type": "boolean", "default": true },
"stopProcessing": { "type": "boolean", "default": true },
"scope": { "$ref": "urn:connectsoft:schemas:policy/retention-scope.v1.json" },
"window": { "$ref": "urn:connectsoft:schemas:policy/retention-window.v1.json" }
},
"required": ["id","scope","window"]
}
retention-scope.v1.json
{
"$id": "urn:connectsoft:schemas:policy/retention-scope.v1.json",
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "RetentionScope",
"type": "object",
"additionalProperties": false,
"properties": {
"resourceTypes": {
"type": "array",
"items": { "type": "string", "pattern": "^[A-Z][A-Za-z0-9]*(\\.[A-Z][A-Za-z0-9]*)*(\\.\\*|\\*)?$" },
"maxItems": 64
},
"actions": {
"type": "array",
"items": { "type": "string", "pattern": "^[a-z]+([.][a-z][a-z0-9_-]+)?(\\.\\*|\\*)?$" },
"maxItems": 64
},
"dataClasses": {
"type": "array",
"items": { "$ref": "urn:connectsoft:schemas:policy/data-class.v1.json" },
"maxItems": 16
},
"attributes": {
"type": "object",
"additionalProperties": { "type": "string", "maxLength": 128 }
}
}
}
Evaluation I/O¶
EvaluationInput (what the engine needs)
- `policyId` (string), `revision` (int): optional; if omitted, the engine picks the latest revision effective for the tenant at `nowUtc`.
- `nowUtc` (timestamp).
- `record` (subset of `AuditRecord` metadata): `tenantId`, `createdAt`, `observedAt`, `effectiveAt` (optional), `action`, `resource.type` (sent as `resourceType`), `attributes`, `dataClasses` (set of classes observed on fields/delta), `legalHold` (bool or hold refs).
EvaluationResult
- `state`: `Active` | `OnHold` | `Eligible` | `Purged` | `Error`.
- `eligibleAt` (timestamp): when the record first becomes eligible for purge (after the min window).
- `keepUntil` (timestamp | null): hard no-purge-before time (`minDays` window).
- `purgeAfter` (timestamp | null): target purge time if `maxDays` exists (plus jitter).
- `matchedRuleId` (string | null), `appliedWindow` (copy of the window with effective values), `policyId`, `revision`.
- `reasons` (array): brief explanations (e.g., `Matched rule R-APPT-READ`, `LegalHold active`).
- `errors` (array, optional).
retention-eval-input.v1.json
{
"$id": "urn:connectsoft:schemas:policy/retention-eval-input.v1.json",
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "RetentionEvaluationInput",
"type": "object",
"additionalProperties": false,
"properties": {
"policyId": { "type": "string" },
"revision": { "type": "integer", "minimum": 1 },
"nowUtc": { "type": "string", "format": "date-time" },
"record": {
"type": "object",
"additionalProperties": false,
"properties": {
"tenantId": { "type": "string", "pattern": "^[A-Za-z0-9._-]{1,128}$" },
"createdAt": { "type": "string", "format": "date-time" },
"observedAt": { "type": "string", "format": "date-time" },
"effectiveAt": { "type": "string", "format": "date-time" },
"action": { "type": "string" },
"resourceType": { "type": "string" },
"attributes": { "type": "object", "additionalProperties": { "type": "string" } },
"dataClasses": {
"type": "array",
"items": { "$ref": "urn:connectsoft:schemas:policy/data-class.v1.json" },
"uniqueItems": true
},
"legalHold": { "type": "boolean" }
},
"required": ["tenantId","createdAt","action","resourceType"]
}
},
"required": ["nowUtc","record"]
}
retention-eval-result.v1.json
{
"$id": "urn:connectsoft:schemas:policy/retention-eval-result.v1.json",
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "RetentionEvaluationResult",
"type": "object",
"additionalProperties": false,
"properties": {
"state": { "type": "string", "enum": ["Active","OnHold","Eligible","Purged","Error"] },
"eligibleAt": { "type": "string", "format": "date-time" },
"keepUntil": { "type": ["string","null"], "format": "date-time" },
"purgeAfter": { "type": ["string","null"], "format": "date-time" },
"matchedRuleId": { "type": ["string","null"] },
"appliedWindow": { "$ref": "urn:connectsoft:schemas:policy/retention-window.v1.json" },
"policyId": { "type": "string" },
"revision": { "type": "integer" },
"reasons": { "type": "array", "items": { "type": "string" }, "maxItems": 10 },
"errors": { "type": "array", "items": { "type": "string" } }
},
"required": ["state","eligibleAt","policyId","revision"]
}
C# (gRPC code-first)¶
using System;
using System.Collections.Generic;
using System.Runtime.Serialization;
[DataContract]
public sealed class RetentionPolicy
{
[DataMember(Order = 1)] public string Id { get; init; } = default!;
[DataMember(Order = 2)] public string? TenantId { get; init; }
[DataMember(Order = 3)] public int Revision { get; init; }
[DataMember(Order = 4)] public DateTimeOffset EffectiveFromUtc { get; init; }
[DataMember(Order = 5)] public RetentionWindow DefaultWindow { get; init; } = new();
[DataMember(Order = 6)] public IReadOnlyList<RetentionRule> Rules { get; init; } = Array.Empty<RetentionRule>();
}
[DataContract]
public sealed class RetentionRule
{
[DataMember(Order = 1)] public string Id { get; init; } = default!;
[DataMember(Order = 2)] public string? Description { get; init; }
[DataMember(Order = 3)] public int Priority { get; init; } = 100;
[DataMember(Order = 4)] public bool Enabled { get; init; } = true;
[DataMember(Order = 5)] public bool StopProcessing { get; init; } = true;
[DataMember(Order = 6)] public RetentionScope Scope { get; init; } = new();
[DataMember(Order = 7)] public RetentionWindow Window { get; init; } = new();
}
[DataContract]
public sealed class RetentionWindow
{
[DataMember(Order = 1)] public int MinDays { get; init; } // >= 0
[DataMember(Order = 2)] public int? MaxDays { get; init; } // >= MinDays
[DataMember(Order = 3)] public string Anchor { get; init; } = "CreatedAt"; // "CreatedAt"|"ObservedAt"|"EffectiveAt"
[DataMember(Order = 4)] public int? JitterDays { get; init; } // optional
}
[DataContract]
public sealed class RetentionScope
{
[DataMember(Order = 1)] public IReadOnlyList<string>? ResourceTypes { get; init; } // "Appointment", "Vetspire.*"
[DataMember(Order = 2)] public IReadOnlyList<string>? Actions { get; init; } // "create", "appointment.*"
[DataMember(Order = 3)] public IReadOnlyList<DataClass>? DataClasses { get; init; }// from taxonomy
[DataMember(Order = 4)] public IReadOnlyDictionary<string,string>? Attributes { get; init; }
}
[DataContract]
public sealed class RetentionEvaluationInput
{
[DataMember(Order = 1)] public string? PolicyId { get; init; }
[DataMember(Order = 2)] public int? Revision { get; init; }
[DataMember(Order = 3)] public DateTimeOffset NowUtc { get; init; }
[DataMember(Order = 4)] public RetentionRecordProbe Record { get; init; } = new();
}
[DataContract]
public sealed class RetentionRecordProbe
{
[DataMember(Order = 1)] public string TenantId { get; init; } = default!;
[DataMember(Order = 2)] public DateTimeOffset CreatedAt { get; init; }
[DataMember(Order = 3)] public DateTimeOffset? ObservedAt { get; init; }
[DataMember(Order = 4)] public DateTimeOffset? EffectiveAt { get; init; }
[DataMember(Order = 5)] public string Action { get; init; } = default!;
[DataMember(Order = 6)] public string ResourceType { get; init; } = default!;
[DataMember(Order = 7)] public IReadOnlyDictionary<string,string>? Attributes { get; init; }
[DataMember(Order = 8)] public IReadOnlyList<DataClass>? DataClasses { get; init; }
[DataMember(Order = 9)] public bool? LegalHold { get; init; }
}
[DataContract]
public sealed class RetentionEvaluationResult
{
[DataMember(Order = 1)] public string State { get; init; } = "Active"; // "Active"|"OnHold"|"Eligible"|"Purged"|"Error"
[DataMember(Order = 2)] public DateTimeOffset EligibleAt { get; init; }
[DataMember(Order = 3)] public DateTimeOffset? KeepUntil { get; init; }
[DataMember(Order = 4)] public DateTimeOffset? PurgeAfter { get; init; }
[DataMember(Order = 5)] public string? MatchedRuleId { get; init; }
[DataMember(Order = 6)] public RetentionWindow AppliedWindow { get; init; } = new();
[DataMember(Order = 7)] public string PolicyId { get; init; } = default!;
[DataMember(Order = 8)] public int Revision { get; init; }
[DataMember(Order = 9)] public IReadOnlyList<string>? Reasons { get; init; }
[DataMember(Order = 10)] public IReadOnlyList<string>? Errors { get; init; }
}
JSON uses camelCase; the database uses PascalCase (`RetentionPolicies`, with columns like `EffectiveFromUtc` and `Revision`).
Protobuf (optional emission)¶
syntax = "proto3";
package connectsoft.policy.v1;
import "google/protobuf/timestamp.proto";
import "google/protobuf/wrappers.proto";
message RetentionWindow {
int32 MinDays = 1 [json_name = "minDays"];
google.protobuf.Int32Value MaxDays = 2 [json_name = "maxDays"];
string Anchor = 3 [json_name = "anchor"]; // "CreatedAt"|"ObservedAt"|"EffectiveAt"
google.protobuf.Int32Value JitterDays = 4 [json_name = "jitterDays"];
}
message RetentionScope {
repeated string ResourceTypes = 1 [json_name = "resourceTypes"];
repeated string Actions = 2 [json_name = "actions"];
repeated string DataClasses = 3 [json_name = "dataClasses"]; // enum names
map<string,string> Attributes = 4 [json_name = "attributes"];
}
message RetentionRule {
string Id = 1 [json_name = "id"];
string Description = 2 [json_name = "description"];
int32 Priority = 3 [json_name = "priority"];
bool Enabled = 4 [json_name = "enabled"];
bool StopProcessing = 5 [json_name = "stopProcessing"];
RetentionScope Scope = 6 [json_name = "scope"];
RetentionWindow Window = 7 [json_name = "window"];
}
message RetentionPolicy {
string Id = 1 [json_name = "id"];
string TenantId = 2 [json_name = "tenantId"];
int32 Revision = 3 [json_name = "revision"];
google.protobuf.Timestamp EffectiveFromUtc = 4 [json_name = "effectiveFromUtc"];
RetentionWindow DefaultWindow = 5 [json_name = "defaultWindow"];
repeated RetentionRule Rules = 6 [json_name = "rules"];
}
message RetentionEvaluationInput {
string PolicyId = 1 [json_name = "policyId"];
google.protobuf.Int32Value Revision = 2 [json_name = "revision"];
google.protobuf.Timestamp NowUtc = 3 [json_name = "nowUtc"];
RetentionRecordProbe Record = 4 [json_name = "record"];
}
message RetentionRecordProbe {
string TenantId = 1 [json_name = "tenantId"];
google.protobuf.Timestamp CreatedAt = 2 [json_name = "createdAt"];
google.protobuf.Timestamp ObservedAt = 3 [json_name = "observedAt"];
google.protobuf.Timestamp EffectiveAt = 4 [json_name = "effectiveAt"];
string Action = 5 [json_name = "action"];
string ResourceType = 6 [json_name = "resourceType"];
map<string,string> Attributes = 7 [json_name = "attributes"];
repeated string DataClasses = 8 [json_name = "dataClasses"];
bool LegalHold = 9 [json_name = "legalHold"];
}
message RetentionEvaluationResult {
string State = 1 [json_name = "state"];
google.protobuf.Timestamp EligibleAt = 2 [json_name = "eligibleAt"];
google.protobuf.Timestamp KeepUntil = 3 [json_name = "keepUntil"];
google.protobuf.Timestamp PurgeAfter = 4 [json_name = "purgeAfter"];
string MatchedRuleId = 5 [json_name = "matchedRuleId"];
RetentionWindow AppliedWindow = 6 [json_name = "appliedWindow"];
string PolicyId = 7 [json_name = "policyId"];
int32 Revision = 8 [json_name = "revision"];
repeated string Reasons = 9 [json_name = "reasons"];
repeated string Errors = 10 [json_name = "errors"];
}
Examples¶
Policy
{
"id": "policy-default",
"tenantId": "splootvets",
"revision": 4,
"effectiveFromUtc": "2025-10-01T00:00:00Z",
"defaultWindow": { "minDays": 90 },
"rules": [
{
"id": "R-APPT-READ",
"description": "Shorter retention for read-only appointment views",
"priority": 10,
"scope": { "resourceTypes": ["Vetspire.Appointment"], "actions": ["appointment.read"] },
"window": { "minDays": 30, "maxDays": 365, "anchor": "CreatedAt", "jitterDays": 7 }
},
{
"id": "R-CREDENTIALS",
"description": "Long retention for credential-related events",
"priority": 20,
"scope": { "dataClasses": ["Credential"] },
"window": { "minDays": 3650 }
}
]
}
Evaluation input
{
"nowUtc": "2025-10-22T14:30:00Z",
"record": {
"tenantId": "splootvets",
"createdAt": "2025-10-02T10:00:00Z",
"action": "appointment.read",
"resourceType": "Vetspire.Appointment",
"dataClasses": ["Personal"],
"legalHold": false
}
}
Evaluation result (`purgeAfter` is shown without jitter; `jitterDays` may shift it)
{
"state": "Active",
"eligibleAt": "2025-11-01T10:00:00Z",
"keepUntil": "2025-11-01T10:00:00Z",
"purgeAfter": "2026-10-02T10:00:00Z",
"matchedRuleId": "R-APPT-READ",
"appliedWindow": { "minDays": 30, "maxDays": 365, "anchor": "CreatedAt", "jitterDays": 7 },
"policyId": "policy-default",
"revision": 4,
"reasons": ["Matched rule R-APPT-READ"]
}
Evaluation result with legal hold
{
"state": "OnHold",
"eligibleAt": "2025-12-31T10:00:00Z",
"keepUntil": "2025-12-31T10:00:00Z",
"purgeAfter": null,
"matchedRuleId": "R-APPT-READ",
"appliedWindow": { "minDays": 30, "maxDays": 365 },
"policyId": "policy-default",
"revision": 4,
"reasons": ["LegalHold active"]
}
Evaluation semantics¶
- Select the policy by tenant where `nowUtc` ≥ `effectiveFromUtc`; pick the latest `revision` that is effective.
- Evaluate `enabled=true` rules in ascending `priority` and find those whose `scope` matches the probe:
  - `resourceTypes`/`actions`: exact or `*` suffix wildcard match.
  - `dataClasses`: non-empty intersection.
  - `attributes`: the key must exist and the value/glob must match.
- If the first matching rule has `stopProcessing=true`, apply its window alone; otherwise combine the matched windows conservatively: `minDays` = max of matched mins; `maxDays` = min of matched maxes (when both present).
- If no rule matches, use `defaultWindow`.
- Compute times from the selected `anchor` (default `createdAt`), adding `jitterDays` if present.
- Monotonicity: if the record already has a committed `KeepUntil`, a new evaluation can extend it (take the later date) but must not reduce it.
- Legal hold: if a hold is present, set `state=OnHold`, keep `keepUntil` for reference, and set `purgeAfter=null`.
- State:
  - `Active` if `nowUtc` < `eligibleAt`.
  - `Eligible` if `nowUtc` ≥ `eligibleAt` and no hold is present; deletion may proceed any time ≥ `keepUntil` (ideally at/after `purgeAfter` if set).
  - `Purged` is assigned only by the lifecycle once deletion is complete.
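The following Python sketch covers rule selection, window combination, time computation, monotonicity, legal hold, and state. Policy/revision selection is not included. It is illustrative only and assumes the following:

- Timestamps are already parsed into timezone-aware `datetime` values.
- The scope matcher is injected as a callable.
- When windows combine, `anchor`/`jitterDays` come from the first matched rule, and `maxDays` is the minimum over the rules that declare one.
- The caller supplies the random jitter offset (clamped to `[0, jitterDays]`) and it is applied only to `purgeAfter`, matching the example where `keepUntil` carries no jitter.
- A missing anchor clock falls back to `createdAt`.

```python
from datetime import datetime, timedelta
from typing import Callable, Optional

ANCHOR_FIELDS = {"CreatedAt": "createdAt", "ObservedAt": "observedAt",
                 "EffectiveAt": "effectiveAt"}
Matcher = Callable[[dict, dict], bool]  # injected scope matcher (hypothetical)


def combine(windows: list[dict]) -> dict:
    """Conservative merge: the longest minimum and the shortest declared maximum win."""
    maxes = [w["maxDays"] for w in windows if w.get("maxDays") is not None]
    merged = dict(windows[0])  # anchor/jitter come from the first match (assumption)
    merged["minDays"] = max(w["minDays"] for w in windows)
    merged["maxDays"] = min(maxes) if maxes else None
    return merged


def select_window(policy: dict, record: dict, matches: Matcher) -> tuple:
    """Walk enabled rules by ascending priority; stop after a stopProcessing match."""
    rules = sorted((r for r in policy.get("rules", []) if r.get("enabled", True)),
                   key=lambda r: r.get("priority", 100))
    matched = []
    for rule in rules:
        if matches(rule.get("scope", {}), record):
            matched.append(rule)
            if rule.get("stopProcessing", True):
                break
    if not matched:
        return None, dict(policy["defaultWindow"])
    return matched[0]["id"], combine([r["window"] for r in matched])


def evaluate(policy: dict, record: dict, now: datetime, matches: Matcher,
             committed_keep_until: Optional[datetime] = None, jitter_days: int = 0) -> dict:
    rule_id, w = select_window(policy, record, matches)
    result = {"policyId": policy["id"], "revision": policy["revision"],
              "matchedRuleId": rule_id, "appliedWindow": w,
              "reasons": [f"Matched rule {rule_id}" if rule_id else "Default window"]}
    if w.get("maxDays") is not None and w["maxDays"] < w["minDays"]:
        return {**result, "state": "Error", "errors": ["maxDays < minDays"]}
    clock = ANCHOR_FIELDS[w.get("anchor", "CreatedAt")]
    anchor = record.get(clock) or record["createdAt"]  # fallback is an assumption
    keep_until = anchor + timedelta(days=w["minDays"])
    if committed_keep_until is not None:
        keep_until = max(keep_until, committed_keep_until)  # monotonic: never shorten
    purge_after = None
    if w.get("maxDays") is not None:
        jitter = min(max(jitter_days, 0), w.get("jitterDays") or 0)  # caller-chosen offset
        purge_after = max(anchor + timedelta(days=w["maxDays"] + jitter), keep_until)
    if record.get("legalHold"):
        state, purge_after = "OnHold", None
        result["reasons"].append("LegalHold active")
    else:
        state = "Active" if now < keep_until else "Eligible"
    return {**result, "state": state, "eligibleAt": keep_until,
            "keepUntil": keep_until, "purgeAfter": purge_after}
```

Run against the policy and evaluation input examples in this section, the sketch reproduces the documented `keepUntil` (2025-11-01T10:00:00Z) and `purgeAfter` (2026-10-02T10:00:00Z) when no jitter is applied.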
Storage mapping¶
CREATE TABLE dbo.RetentionPolicies (
Id NVARCHAR(128) NOT NULL,
TenantId NVARCHAR(128) NULL,
Revision INT NOT NULL,
EffectiveFromUtc DATETIME2(0) NOT NULL,
DefaultWindow NVARCHAR(256) NOT NULL, -- JSON (MinDays, MaxDays, Anchor, JitterDays)
RulesJson NVARCHAR(MAX) NOT NULL, -- JSON array of rules
CONSTRAINT PK_RetentionPolicies PRIMARY KEY (Id, Revision)
);
CREATE TABLE dbo.RecordRetention (
AuditRecordId CHAR(26) NOT NULL, -- ULID
TenantId NVARCHAR(64) NOT NULL,
PolicyId NVARCHAR(128) NOT NULL,
Revision INT NOT NULL,
MatchedRuleId NVARCHAR(64) NULL,
KeepUntil DATETIME2(0) NOT NULL,
EligibleAt DATETIME2(0) NOT NULL,
PurgeAfter DATETIME2(0) NULL,
State NVARCHAR(16) NOT NULL, -- "Active"|"OnHold"|"Eligible"|"Purged"
LastEvaluatedAt DATETIME2(0) NOT NULL,
CONSTRAINT PK_RecordRetention PRIMARY KEY (AuditRecordId),
INDEX IX_RecordRetention_Tenant_Eligible (TenantId, EligibleAt),
INDEX IX_RecordRetention_Tenant_KeepUntil (TenantId, KeepUntil)
);
Policies are immutable per `(Id, Revision)`. New revisions create new rows; readers pick the latest effective revision at evaluation time.
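Picking the effective revision can be sketched in Python as follows. The function name is hypothetical, and the sketch assumes these points, none of which the document specifies:

- Rows are dicts with parsed timestamps.
- A tenant-specific policy is preferred, with a fallback to global rows (no `tenantId`).
- When several policy ids qualify, the most recently effective one wins.

```python
from datetime import datetime
from typing import Optional


def select_policy(revisions: list[dict], tenant_id: str, now: datetime,
                  policy_id: Optional[str] = None) -> Optional[dict]:
    """Return the latest revision effective at `now`, or None if nothing applies.

    Tenant rows are tried first, then global rows (tenantId absent/None);
    that fallback order is an assumption of this sketch.
    """
    for owner in (tenant_id, None):
        candidates = [
            p for p in revisions
            if p.get("tenantId") == owner
            and (policy_id is None or p["id"] == policy_id)
            and p["effectiveFromUtc"] <= now
        ]
        if candidates:
            # Rows are immutable per (Id, Revision); the latest effective revision wins.
            return max(candidates, key=lambda p: (p["effectiveFromUtc"], p["revision"]))
    return None
```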
Budgets & caps¶
- Max rules per policy: 200.
- Max `resourceTypes`/`actions` per rule: 64 each.
- `minDays` ≤ 36500 (100 years); `maxDays` ≤ 36500.
- `jitterDays` ≤ 30.
- Wildcards are limited to a suffix `*` (no glob in the middle), so each pattern check reduces to an exact or prefix comparison.
Validation rules (summary)¶
- `revision` strictly increases; `effectiveFromUtc` must be ≥ the previous revision’s effective time.
- `maxDays` (when present) must be ≥ `minDays`.
- Combining multiple rules with `stopProcessing=false` must produce a valid window (i.e., `minDays` ≤ `maxDays` if `maxDays` exists).
- Monotonicity: a persisted `KeepUntil` may only move forward in time upon re-evaluation.
- Legal hold sets `state=OnHold` regardless of computed times; `PurgeAfter` becomes `null`.
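A minimal Python sketch of these checks, using the caps from the budget list. The function names and error strings are hypothetical. The combined-window check is better done at evaluation time, once the matched rules are known, so it is not repeated here.

```python
def validate_window(w: dict, label: str = "window") -> list[str]:
    """Per-window checks: ranges from the budget caps and maxDays >= minDays."""
    errors = []
    if not 0 <= w["minDays"] <= 36500:
        errors.append(f"{label}.minDays out of range")
    if w.get("maxDays") is not None:
        if w["maxDays"] > 36500:
            errors.append(f"{label}.maxDays exceeds 36500")
        if w["maxDays"] < w["minDays"]:
            errors.append(f"{label}.maxDays < minDays")
    if not 0 <= (w.get("jitterDays") or 0) <= 30:
        errors.append(f"{label}.jitterDays out of range")
    return errors


def validate_revision(new: dict, previous: dict = None) -> list[str]:
    """Validate a new policy revision, optionally against the previous one."""
    errors = validate_window(new["defaultWindow"], "defaultWindow")
    rules = new.get("rules", [])
    if len(rules) > 200:
        errors.append("too many rules")
    for r in rules:
        errors += validate_window(r["window"], f"rules[{r['id']}].window")
    if previous is not None:
        if new["revision"] <= previous["revision"]:
            errors.append("revision must strictly increase")
        if new["effectiveFromUtc"] < previous["effectiveFromUtc"]:
            errors.append("effectiveFromUtc moved backwards")
    return errors


def commit_keep_until(persisted, computed):
    """Monotonicity: a persisted KeepUntil can only move forward in time."""
    return computed if persisted is None else max(persisted, computed)
```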
Legal Hold Model¶
Defines the model for placing and releasing legal holds that suspend purge eligibility for matching records. A hold carries a scope (which records it applies to), case metadata (`caseId`, `reason`), provenance (who placed/released it), and timing (placed/released/expiry).
Overview¶
- Effect: While a hold is `Active`, any matching record is not purgeable regardless of retention windows; evaluation surfaces `state=OnHold`.
- Scope-first: Holds target records by resource type, action, attributes, data classes, and a time range on record clocks.
- Prospective & retrospective: Holds can apply to existing records, future records, or both.
- Provenance: The placing/releasing principal is captured via a compact `ActorRef`.
- Immutability: Core metadata is immutable after placement; only the state can change (`Active` → `Released`/`Expired`) via explicit transitions.
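The state rules can be sketched in Python as an immutable-copy transition function. The sketch assumes the following:

- Holds are dicts.
- Only `Active` → `Released`/`Expired` is permitted, and `version` is bumped on each change.
- Scheduled expiry is applied lazily at read time (a background job could persist `Expired` instead).
- An `Active` hold counts as expired at or after `expiresAt`.

Names are hypothetical.

```python
from datetime import datetime
from typing import Optional

# Only Active -> Released/Expired is modeled; any other move is rejected (assumption).
ALLOWED = {("Active", "Released"), ("Active", "Expired")}


def transition(hold: dict, new_state: str, at: datetime,
               by: Optional[dict] = None) -> dict:
    """Return a copy of the hold in `new_state`; core metadata is never edited."""
    if (hold["state"], new_state) not in ALLOWED:
        raise ValueError(f"illegal transition {hold['state']} -> {new_state}")
    updated = dict(hold, state=new_state, version=hold["version"] + 1)
    if new_state == "Released":
        updated["releasedAt"] = at
        if by is not None:
            updated["releasedBy"] = by  # ActorRef of the releasing principal
    return updated


def effective_state(hold: dict, now: datetime) -> str:
    """Read-time view: an Active hold at or after expiresAt counts as Expired."""
    expires = hold.get("expiresAt")
    if hold["state"] == "Active" and expires is not None and now >= expires:
        return "Expired"
    return hold["state"]
```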
Fields¶
| JSON (lowerCamel) | C# (PascalCase) | Type | Req. | Description & rules |
|---|---|---|---|---|
| `holdId` | `HoldId` | string (ULID) | ✓ | Unique identifier of the hold. |
| `tenantId` | `TenantId` | string | ✓ | Tenant scope; holds are tenant-local unless using a separate supervisory tenant. |
| `caseId` | `CaseId` | string | ✓ | External case or matter reference (≤64). |
| `reason` | `Reason` | string | ✓ | Short justification (≤256). |
| `note` | `Note` | string | | Optional longer note (≤1024). |
| `state` | `State` | enum | ✓ | `Active` \| `Released` \| `Expired`. |
| `scope` | `Scope` | object | ✓ | Matching rules (see Scope). |
| `appliesTo` | `AppliesTo` | enum | ✓ | `Existing` \| `Future` \| `Both`. Default `Both`. |
| `placedAt` | `PlacedAt` | timestamp | ✓ | UTC time when the hold was placed. |
| `placedBy` | `PlacedBy` | `ActorRef` | ✓ | Who placed the hold. |
| `releasedAt` | `ReleasedAt` | timestamp | | UTC time when released (if any). |
| `releasedBy` | `ReleasedBy` | `ActorRef` | | Who released the hold (if any). |
| `expiresAt` | `ExpiresAt` | timestamp | | Optional scheduled expiry; once passed, the state becomes `Expired`. |
| `version` | `Version` | integer | ✓ | Monotonic counter; incremented on each state change. |
Scope
| Field | Type | Description |
|---|---|---|
| `resourceTypes` | `array<string>` | PascalCase, dotted namespace allowed; `*` suffix wildcard (e.g., `Vetspire.*`). |
| `actions` | `array<string>` | `verb` or `verb.noun`; `*` suffix wildcard. |
| `attributes` | `map<string,string>` | Matched against `AuditRecord.attributes` (exact, or glob `*`/`?` allowed in values). |
| `dataClasses` | `array<DataClass>` | Matches if any class appears on the record/delta. |
| `timeRange` | object | `anchor` (`CreatedAt`, `ObservedAt`, or `EffectiveAt`; default `CreatedAt`) plus optional `from`/`to` timestamps; inclusive range. |
An empty `scope` matches all records in the tenant.
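Deciding whether a hold covers a record can be sketched in Python. Facet matching (resource types, actions, attributes, data classes) is injected as a callable, so only the hold-specific parts are shown: state, expiry, tenant, `timeRange`, and `appliesTo`.

The sketch assumes the following, none of which the document defines:

- `appliesTo` is decided by comparing the record's `createdAt` with the hold's `placedAt`.
- A record lacking the chosen anchor clock does not match.
- Timestamps are parsed `datetime` values.

Names are hypothetical.

```python
from datetime import datetime
from typing import Callable

ANCHOR_FIELDS = {"CreatedAt": "createdAt", "ObservedAt": "observedAt",
                 "EffectiveAt": "effectiveAt"}


def in_time_range(time_range: dict, record: dict) -> bool:
    """Inclusive [from, to] check on the chosen record clock; missing bounds are open."""
    if not time_range:
        return True
    ts = record.get(ANCHOR_FIELDS[time_range.get("anchor", "CreatedAt")])
    if ts is None:
        return False  # record lacks that clock (assumption: no match)
    lo, hi = time_range.get("from"), time_range.get("to")
    return (lo is None or lo <= ts) and (hi is None or ts <= hi)


def hold_applies(hold: dict, record: dict, now: datetime,
                 facets_match: Callable[[dict, dict], bool]) -> bool:
    """True if an in-force hold covers the record."""
    expires = hold.get("expiresAt")
    if hold["state"] != "Active" or (expires is not None and now >= expires):
        return False
    if hold["tenantId"] != record["tenantId"]:
        return False  # holds are tenant-local
    scope = hold.get("scope", {})
    if not facets_match(scope, record) or not in_time_range(scope.get("timeRange"), record):
        return False
    existed = record["createdAt"] <= hold["placedAt"]  # Existing vs Future (assumption)
    return {"Existing": existed, "Future": not existed, "Both": True}[hold.get("appliesTo", "Both")]
```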
JSON Schemas (v1)¶
legal-hold.v1.json
{
"$id": "urn:connectsoft:schemas:policy/legal-hold.v1.json",
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "LegalHold",
"type": "object",
"additionalProperties": false,
"properties": {
"holdId": { "type": "string", "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$" },
"tenantId": { "type": "string", "pattern": "^[A-Za-z0-9._-]{1,128}$" },
"caseId": { "type": "string", "maxLength": 64, "minLength": 1 },
"reason": { "type": "string", "maxLength": 256, "minLength": 1 },
"note": { "type": "string", "maxLength": 1024 },
"state": { "type": "string", "enum": ["Active","Released","Expired"] },
"scope": { "$ref": "urn:connectsoft:schemas:policy/legal-hold-scope.v1.json" },
"appliesTo":{ "type": "string", "enum": ["Existing","Future","Both"], "default": "Both" },
"placedAt": { "type": "string", "format": "date-time" },
"placedBy": { "$ref": "urn:connectsoft:schemas:partials/actor-ref.v1.json" },
"releasedAt": { "type": "string", "format": "date-time" },
"releasedBy": { "$ref": "urn:connectsoft:schemas:partials/actor-ref.v1.json" },
"expiresAt": { "type": "string", "format": "date-time" },
"version": { "type": "integer", "minimum": 1 }
},
"required": ["holdId","tenantId","caseId","reason","state","scope","appliesTo","placedAt","placedBy","version"],
"dependentRequired": { "releasedAt": ["releasedBy"], "releasedBy": ["releasedAt"] }
}
legal-hold-scope.v1.json
{
"$id": "urn:connectsoft:schemas:policy/legal-hold-scope.v1.json",
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "LegalHoldScope",
"type": "object",
"additionalProperties": false,
"properties": {
"resourceTypes": {
"type": "array",
"items": { "type": "string", "pattern": "^[A-Z][A-Za-z0-9]*(\\.[A-Z][A-Za-z0-9]*)*(\\.?\\*)?$" },
"maxItems": 64
},
"actions": {
"type": "array",
"items": { "type": "string", "pattern": "^[a-z]+([.][a-z][a-z0-9_-]+)?([.]?\\*)?$" },
"maxItems": 64
},
"attributes": {
"type": "object",
"additionalProperties": { "type": "string", "maxLength": 128 }
},
"dataClasses": {
"type": "array",
"items": { "$ref": "urn:connectsoft:schemas:policy/data-class.v1.json" },
"maxItems": 16
},
"timeRange": {
"type": "object",
"additionalProperties": false,
"properties": {
"anchor": { "type": "string", "enum": ["CreatedAt","ObservedAt","EffectiveAt"], "default": "CreatedAt" },
"from": { "type": "string", "format": "date-time" },
"to": { "type": "string", "format": "date-time" }
}
}
}
}
actor-ref.v1.json (partial)
Minimal reference used in admin objects to avoid full PII.
{
"$id": "urn:connectsoft:schemas:partials/actor-ref.v1.json",
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "ActorRef",
"type": "object",
"additionalProperties": false,
"properties": {
"id": { "type": "string", "maxLength": 128, "pattern": "^[A-Za-z0-9._-]+$" },
"type": { "type": "string", "enum": ["Unknown","User","Service","Job"] },
"display": { "type": "string", "maxLength": 128 }
},
"required": ["id","type"]
}
C# (gRPC code-first)¶
using System;
using System.Collections.Generic;
using System.Runtime.Serialization;

[DataContract]
public sealed class LegalHold
{
[DataMember(Order = 1)] public string HoldId { get; init; } = default!; // ULID
[DataMember(Order = 2)] public string TenantId { get; init; } = default!;
[DataMember(Order = 3)] public string CaseId { get; init; } = default!;
[DataMember(Order = 4)] public string Reason { get; init; } = default!;
[DataMember(Order = 5)] public string? Note { get; init; }
[DataMember(Order = 6)] public LegalHoldState State { get; init; } = LegalHoldState.Active;
[DataMember(Order = 7)] public LegalHoldScope Scope { get; init; } = new();
[DataMember(Order = 8)] public AppliesTo AppliesTo { get; init; } = AppliesTo.Both;
[DataMember(Order = 9)] public DateTimeOffset PlacedAt { get; init; }
[DataMember(Order = 10)] public ActorRef PlacedBy { get; init; } = new();
[DataMember(Order = 11)] public DateTimeOffset? ReleasedAt { get; init; }
[DataMember(Order = 12)] public ActorRef? ReleasedBy { get; init; }
[DataMember(Order = 13)] public DateTimeOffset? ExpiresAt { get; init; }
[DataMember(Order = 14)] public int Version { get; init; } = 1;
}
[DataContract]
public enum LegalHoldState { Active = 0, Released = 1, Expired = 2 }
[DataContract]
public enum AppliesTo { Existing = 0, Future = 1, Both = 2 }
[DataContract]
public sealed class LegalHoldScope
{
[DataMember(Order = 1)] public IReadOnlyList<string>? ResourceTypes { get; init; }
[DataMember(Order = 2)] public IReadOnlyList<string>? Actions { get; init; }
[DataMember(Order = 3)] public IReadOnlyDictionary<string,string>? Attributes { get; init; }
[DataMember(Order = 4)] public IReadOnlyList<DataClass>? DataClasses { get; init; }
[DataMember(Order = 5)] public TimeRange? TimeRange { get; init; }
}
[DataContract]
public sealed class TimeRange
{
[DataMember(Order = 1)] public string Anchor { get; init; } = "CreatedAt"; // CreatedAt|ObservedAt|EffectiveAt
[DataMember(Order = 2)] public DateTimeOffset? From { get; init; }
[DataMember(Order = 3)] public DateTimeOffset? To { get; init; }
}
// Minimal ActorRef (same shape as in the Actor model section; ActorType is defined there).
[DataContract]
public sealed class ActorRef
{
[DataMember(Order = 1)] public string Id { get; init; } = default!;
[DataMember(Order = 2)] public ActorType Type { get; init; } = ActorType.Unknown;
[DataMember(Order = 3)] public string? Display { get; init; }
}
JSON serialization uses camelCase; the database keeps PascalCase (table `LegalHolds`, columns such as `HoldId`, `PlacedAt`, `State`).
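A minimal sketch of the JSON side of this convention, assuming System.Text.Json on .NET 6 or later. `HoldDto`, `HoldState`, `UtcZConverter`, and `LegalHoldJson` are hypothetical names for illustration, not the ATP types; the full contract is `LegalHold` above. The string-enum converter matters because C# enums otherwise serialize as integers, while the schema expects names such as `"Active"`; the timestamp converter normalizes offsets to UTC with a `Z` suffix.

```csharp
using System;
using System.Globalization;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical, trimmed-down stand-in for the LegalHold contract (illustration only).
public enum HoldState { Active = 0, Released = 1, Expired = 2 }

public sealed record HoldDto(string HoldId, string CaseId, HoldState State, DateTimeOffset PlacedAt, int Version);

// Writes timestamps in UTC with a Z suffix; on read, accepts any RFC3339 offset and normalizes to UTC.
public sealed class UtcZConverter : JsonConverter<DateTimeOffset>
{
    public override DateTimeOffset Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options) =>
        DateTimeOffset.Parse(reader.GetString()!, CultureInfo.InvariantCulture, DateTimeStyles.AssumeUniversal)
                      .ToUniversalTime();

    public override void Write(Utf8JsonWriter writer, DateTimeOffset value, JsonSerializerOptions options) =>
        writer.WriteStringValue(value.UtcDateTime.ToString("yyyy'-'MM'-'dd'T'HH':'mm':'ss'.'fff'Z'", CultureInfo.InvariantCulture));
}

public static class LegalHoldJson
{
    public static readonly JsonSerializerOptions Options = new()
    {
        PropertyNamingPolicy = JsonNamingPolicy.CamelCase,             // HoldId -> "holdId"
        DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull,  // omit absent optional fields
        Converters = { new JsonStringEnumConverter(), new UtcZConverter() } // enums as "Active", not 0
    };

    public static string Serialize(HoldDto hold) => JsonSerializer.Serialize(hold, Options);

    public static HoldDto Deserialize(string json) => JsonSerializer.Deserialize<HoldDto>(json, Options)!;
}
```

Always writing three fractional digits is a choice made here for deterministic output; the schema itself only requires `date-time`.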
Protobuf (optional emission)¶
syntax = "proto3";
package connectsoft.policy.v1;
import "google/protobuf/timestamp.proto";
message LegalHold {
string HoldId = 1 [json_name = "holdId"];
string TenantId = 2 [json_name = "tenantId"];
string CaseId = 3 [json_name = "caseId"];
string Reason = 4 [json_name = "reason"];
string Note = 5 [json_name = "note"];
LegalHoldState State = 6 [json_name = "state"];
LegalHoldScope Scope = 7 [json_name = "scope"];
AppliesTo AppliesTo = 8 [json_name = "appliesTo"];
google.protobuf.Timestamp PlacedAt = 9 [json_name = "placedAt"];
ActorRef PlacedBy = 10 [json_name = "placedBy"];
google.protobuf.Timestamp ReleasedAt = 11 [json_name = "releasedAt"];
ActorRef ReleasedBy = 12 [json_name = "releasedBy"];
google.protobuf.Timestamp ExpiresAt = 13 [json_name = "expiresAt"];
int32 Version = 14 [json_name = "version"];
}
enum LegalHoldState { LegalHoldState_Active = 0; LegalHoldState_Released = 1; LegalHoldState_Expired = 2; }
enum AppliesTo { AppliesTo_Existing = 0; AppliesTo_Future = 1; AppliesTo_Both = 2; }
message LegalHoldScope {
repeated string ResourceTypes = 1 [json_name = "resourceTypes"];
repeated string Actions = 2 [json_name = "actions"];
map<string,string> Attributes = 3 [json_name = "attributes"];
repeated string DataClasses = 4 [json_name = "dataClasses"]; // enum names
TimeRange TimeRange = 5 [json_name = "timeRange"];
}
message TimeRange {
string Anchor = 1 [json_name = "anchor"]; // "CreatedAt"|"ObservedAt"|"EffectiveAt"
google.protobuf.Timestamp From = 2 [json_name = "from"];
google.protobuf.Timestamp To = 3 [json_name = "to"];
}
message ActorRef {
string Id = 1 [json_name = "id"];
string Type = 2 [json_name = "type"];
string Display = 3 [json_name = "display"];
}
State machine¶
stateDiagram-v2
[*] --> Active: Place
Active --> Released: Release
Active --> Expired: Now >= ExpiresAt (auto)
Released --> [*]
Expired --> [*]
Transition rules
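- Place: Create a `LegalHold` in `Active` with `placedAt`, `placedBy`, and `version` 1.
- Release: Only from `Active`; set `state=Released`, `releasedAt`, and `releasedBy`, and increment `version`.
- Expire: If `expiresAt` is set and the current time reaches it, transition an `Active` hold to `Expired` and increment `version`; repeating the check has no further effect (idempotent).
- Immutability: `tenantId`, `caseId`, `reason`, `scope`, and `appliesTo` are immutable after placement.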
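The transition rules can be sketched as pure functions over an immutable record. This is a hypothetical illustration: `Hold`, `HoldTransitions`, and the plain-string `ReleasedBy` are simplifications rather than the ATP types, and persistence, optimistic concurrency on `version`, and authorization are left out.

```csharp
using System;

// Hypothetical minimal hold shape; ReleasedBy is simplified to an actor id string.
public enum HoldState { Active, Released, Expired }

public sealed record Hold(string HoldId, HoldState State, DateTimeOffset? ExpiresAt,
                          DateTimeOffset? ReleasedAt, string? ReleasedBy, int Version);

public static class HoldTransitions
{
    // Release: allowed only from Active; releasedAt/releasedBy are set together and version is bumped.
    public static Hold Release(Hold hold, DateTimeOffset now, string releasedBy)
    {
        if (hold.State != HoldState.Active)
            throw new InvalidOperationException($"Cannot release a hold in state {hold.State}.");
        return hold with
        {
            State = HoldState.Released,
            ReleasedAt = now.ToUniversalTime(),
            ReleasedBy = releasedBy,
            Version = hold.Version + 1
        };
    }

    // Expire: Active -> Expired once now >= expiresAt. Idempotent: any other input is returned unchanged.
    public static Hold ExpireIfDue(Hold hold, DateTimeOffset now) =>
        hold.State == HoldState.Active && hold.ExpiresAt is { } expiresAt && now >= expiresAt
            ? hold with { State = HoldState.Expired, Version = hold.Version + 1 }
            : hold;
}
```

Returning the same instance when nothing changes lets a scheduler run `ExpireIfDue` repeatedly and persist only when the reference differs.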
Matching semantics¶
A record is under hold if at least one `Active` hold satisfies all of the following:
- `tenantId` matches;
- `scope` matches (`resource.type`, `action`, `attributes` glob, `dataClasses`, and `timeRange` evaluated against the selected anchor time); and
- if `appliesTo = Existing`, the record's anchor time (per `timeRange.anchor`, default `CreatedAt`) is ≤ `placedAt`; if `Future`, the anchor time is ≥ `placedAt`; `Both` applies no such restriction.
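A sketch of the matching predicate, assuming that every non-empty scope criterion must match (AND) while entries within one list are alternatives (OR); the text above does not spell out this combination rule. `RecordView`, `ActiveHold`, `HoldScope`, and `HoldMatcher` are hypothetical names, `dataClasses` matching is omitted for brevity, and the record is assumed to expose its already-selected anchor time.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public enum AppliesTo { Existing, Future, Both }

// Hypothetical, simplified shapes (no dataClasses; anchor time already chosen per timeRange.anchor).
public sealed record HoldScope(
    IReadOnlyList<string>? ResourceTypes = null,
    IReadOnlyList<string>? Actions = null,
    IReadOnlyDictionary<string, string>? Attributes = null,
    DateTimeOffset? From = null,
    DateTimeOffset? To = null);

public sealed record ActiveHold(string TenantId, HoldScope Scope, AppliesTo AppliesTo, DateTimeOffset PlacedAt);

public sealed record RecordView(string TenantId, string ResourceType, string Action,
    IReadOnlyDictionary<string, string> Attributes, DateTimeOffset AnchorTime);

public static class HoldMatcher
{
    // "Vetspire.*" / "appointment.*": a trailing '*' matches any suffix; otherwise exact ordinal match.
    public static bool SuffixWildcard(string pattern, string value) =>
        pattern.EndsWith('*')
            ? value.StartsWith(pattern[..^1], StringComparison.Ordinal)
            : string.Equals(pattern, value, StringComparison.Ordinal);

    // Attribute values may use '*' (any run of characters) and '?' (exactly one character).
    public static bool Glob(string pattern, string value) =>
        Regex.IsMatch(value,
            "^" + Regex.Escape(pattern).Replace(@"\*", ".*").Replace(@"\?", ".") + "$",
            RegexOptions.Singleline);

    public static bool IsUnderHold(RecordView r, ActiveHold h)
    {
        if (r.TenantId != h.TenantId) return false;
        var s = h.Scope;
        if (s.ResourceTypes is { Count: > 0 } types && !types.Any(p => SuffixWildcard(p, r.ResourceType))) return false;
        if (s.Actions is { Count: > 0 } actions && !actions.Any(p => SuffixWildcard(p, r.Action))) return false;
        if (s.Attributes is { Count: > 0 } attrs &&
            !attrs.All(kv => r.Attributes.TryGetValue(kv.Key, out var v) && Glob(kv.Value, v))) return false;
        if (s.From is { } from && r.AnchorTime < from) return false; // inclusive lower bound
        if (s.To is { } to && r.AnchorTime > to) return false;       // inclusive upper bound
        return h.AppliesTo switch
        {
            AppliesTo.Existing => r.AnchorTime <= h.PlacedAt,
            AppliesTo.Future => r.AnchorTime >= h.PlacedAt,
            _ => true // Both
        };
    }
}
```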
Examples¶
Place a hold over October appointment events for a case
{
"holdId": "01JE4MVQ20P5P8J4V9Q9T0T3FH",
"tenantId": "splootvets",
"caseId": "CASE-2025-001",
"reason": "Litigation hold for October appointments",
"state": "Active",
"scope": {
"resourceTypes": ["Vetspire.Appointment"],
"actions": ["appointment.*"],
"timeRange": { "anchor": "CreatedAt", "from": "2025-10-01T00:00:00Z", "to": "2025-10-31T23:59:59.999Z" }
},
"appliesTo": "Both",
"placedAt": "2025-10-22T10:00:00Z",
"placedBy": { "id": "legal.ops", "type": "Service", "display": "Legal Ops" },
"version": 1
}
Release the hold (changed fields only)
{
"holdId": "01JE4MVQ20P5P8J4V9Q9T0T3FH",
"state": "Released",
"releasedAt": "2025-12-01T09:00:00Z",
"releasedBy": { "id": "user_789", "type": "User", "display": "Attorney Smith" },
"version": 2
}
Storage mapping¶
CREATE TABLE dbo.LegalHolds (
HoldId CHAR(26) NOT NULL, -- ULID
TenantId NVARCHAR(128) NOT NULL,
CaseId NVARCHAR(64) NOT NULL,
Reason NVARCHAR(256) NOT NULL,
Note NVARCHAR(1024) NULL,
State NVARCHAR(16) NOT NULL, -- Active|Released|Expired
ScopeJson NVARCHAR(MAX) NOT NULL, -- JSON (LegalHoldScope)
AppliesTo NVARCHAR(16) NOT NULL, -- Existing|Future|Both
PlacedAt DATETIME2(0) NOT NULL,
PlacedBy NVARCHAR(512) NOT NULL, -- JSON (ActorRef); id and display are up to 128 chars each, plus JSON overhead
ReleasedAt DATETIME2(0) NULL,
ReleasedBy NVARCHAR(512) NULL, -- JSON (ActorRef)
ExpiresAt DATETIME2(0) NULL,
Version INT NOT NULL,
CONSTRAINT PK_LegalHolds PRIMARY KEY (HoldId),
INDEX IX_LegalHolds_Tenant_State (TenantId, State),
INDEX IX_LegalHolds_ExpiresAt (ExpiresAt)
);
Optionally materialize a membership table to snapshot matches for reporting or export:
CREATE TABLE dbo.LegalHoldAssignments (
HoldId CHAR(26) NOT NULL,
AuditRecordId CHAR(26) NOT NULL,
TenantId NVARCHAR(128) NOT NULL,
AssignedAt DATETIME2(0) NOT NULL,
UnassignedAt DATETIME2(0) NULL,
CONSTRAINT PK_LegalHoldAssignments PRIMARY KEY (HoldId, AuditRecordId),
INDEX IX_LegalHoldAssignments_Tenant (TenantId),
INDEX IX_LegalHoldAssignments_Record (AuditRecordId)
);
The assignment table is maintained by a background matcher that:
- On place: backfills existing matches per `scope` (unless `appliesTo = Future`) and begins streaming matching future records (unless `appliesTo = Existing`).
- On release/expire: sets `UnassignedAt` on active assignments.
Budgets & caps¶
- Max active holds per tenant: 1,000.
- Max resourceTypes/actions per scope: 64 each.
- `note` ≤ 1024 chars.
- `timeRange` width is unbounded, but large ranges increase backfill cost; prefer explicit ranges per case.
Validation rules (summary)¶
- `holdId` ULID format; `tenantId` token pattern.
- Immutable after placement: `tenantId`, `caseId`, `reason`, `scope`, `appliesTo`.
- `releasedAt` requires `releasedBy`; both are set together.
- When `expiresAt` passes, state auto-transitions to `Expired` (idempotent).
- Matching uses UTC and the specified `timeRange.anchor`.
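These rules translate into a small validator. The sketch below is hypothetical (`HoldSnapshot` and `LegalHoldValidator` are illustrative names); it reuses the regular expressions from the JSON Schema and compares the scope as stored JSON text, whereas a production check would compare canonicalized JSON.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical flattened view of a LegalHold used only for validation.
public sealed record HoldSnapshot(string HoldId, string TenantId, string CaseId, string Reason,
    string ScopeJson, string AppliesTo, DateTimeOffset? ReleasedAt, string? ReleasedBy);

public static class LegalHoldValidator
{
    static readonly Regex Ulid = new("^[0-9A-HJKMNP-TV-Z]{26}$", RegexOptions.Compiled);
    static readonly Regex TenantToken = new("^[A-Za-z0-9._-]{1,128}$", RegexOptions.Compiled);

    // Returns an empty list when valid; pass the stored version as `previous` to check immutability on update.
    public static List<string> Validate(HoldSnapshot h, HoldSnapshot? previous = null)
    {
        var errors = new List<string>();
        if (!Ulid.IsMatch(h.HoldId)) errors.Add("holdId: not a ULID");
        if (!TenantToken.IsMatch(h.TenantId)) errors.Add("tenantId: invalid token");
        if (h.ReleasedAt.HasValue != (h.ReleasedBy is not null))
            errors.Add("releasedAt and releasedBy must be set together");
        if (previous is not null)
        {
            if (h.TenantId != previous.TenantId) errors.Add("tenantId is immutable");
            if (h.CaseId != previous.CaseId) errors.Add("caseId is immutable");
            if (h.Reason != previous.Reason) errors.Add("reason is immutable");
            if (h.ScopeJson != previous.ScopeJson) errors.Add("scope is immutable"); // text comparison only
            if (h.AppliesTo != previous.AppliesTo) errors.Add("appliesTo is immutable");
        }
        return errors;
    }
}
```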
Export Models & Manifests¶
Defines the Export domain: the long-running ExportJob that selects and packages records, the per-package ExportManifest (verifiable metadata + integrity), and the delivery/signature envelopes. Exports respect Legal Holds, Retention, and the effective Redaction Plan.
Overview¶
- Job orchestration: A durable `ExportJob` scans by filter/window and emits Packages (shards) for parallel delivery.
- Deterministic packages: Each package ships data (e.g., JSONL) plus a signed manifest with content hashes and integrity roots.
- Resume safety: Jobs are resumable via a compact resumeToken (ULID high-watermark + time watermarks).
- Compliance: Exports may include integrity proofs (block/segment roots + signature) for end-to-end verification.
- Redaction: Data materializes under a specific Redaction Plan (policy id/revision).
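The resume token is opaque; the text only says it carries a ULID high-watermark plus time watermarks. One possible encoding, purely an assumption for illustration (the `v1|<ulid>|<unix-ms>` layout and the `ResumeToken`/`ResumePoint` names are hypothetical), relies on ULIDs sorting lexicographically in time order so that "resume after" is an ordinal string comparison.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical checkpoint: last record emitted plus a single time watermark.
public sealed record ResumePoint(string LastRecordId, DateTimeOffset TimeWatermark);

public static class ResumeToken
{
    // Layout (assumed): "v1|<ULID>|<unix ms>", base64url without padding.
    public static string Encode(ResumePoint p)
    {
        var raw = $"v1|{p.LastRecordId}|{p.TimeWatermark.ToUnixTimeMilliseconds().ToString(CultureInfo.InvariantCulture)}";
        return Convert.ToBase64String(Encoding.UTF8.GetBytes(raw)).TrimEnd('=').Replace('+', '-').Replace('/', '_');
    }

    public static ResumePoint Decode(string token)
    {
        var b64 = token.Replace('-', '+').Replace('_', '/');
        b64 = b64.PadRight(b64.Length + (4 - b64.Length % 4) % 4, '='); // restore padding
        var parts = Encoding.UTF8.GetString(Convert.FromBase64String(b64)).Split('|');
        if (parts.Length != 3 || parts[0] != "v1") throw new FormatException("Unsupported resume token.");
        return new ResumePoint(parts[1],
            DateTimeOffset.FromUnixTimeMilliseconds(long.Parse(parts[2], CultureInfo.InvariantCulture)));
    }

    // Crockford base32 ULIDs sort by time under ordinal comparison.
    public static bool IsAfter(string recordId, ResumePoint p) =>
        string.CompareOrdinal(recordId, p.LastRecordId) > 0;
}
```

The version prefix leaves room to change the layout later without breaking jobs that are paused mid-flight.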
ExportJob¶
| JSON (lowerCamel) | C# (PascalCase) | Type | Req. | Description |
|---|---|---|---|---|
| `jobId` | `JobId` | ULID | ✓ | Unique job id. |
| `tenantId` | `TenantId` | string | ✓ | Tenant scope. |
| `createdAt` | `CreatedAt` | timestamp | ✓ | Job creation time (UTC). |
| `createdBy` | `CreatedBy` | `ActorRef` | ✓ | Requestor (minimal). |
| `state` | `State` | enum | ✓ | `Pending`, `Running`, `Paused`, `Completed`, `Failed`, or `Canceled`. |
| `stateReason` | `StateReason` | string | | Short human-readable reason (≤256). |
| `filter` | `Filter` | `ExportFilter` | ✓ | Selection (see below). |
| `format` | `Format` | enum | ✓ | `Jsonl` or `Parquet` (default `Jsonl`). |
| `compression` | `Compression` | enum | | `None` or `Gzip` (default `Gzip`). |
| `encryption` | `Encryption` | object | | Package encryption (see below). |
| `includeIntegrity` | `IncludeIntegrity` | bool | | Include proofs (default `true`). |
| `sealedThrough` | `SealedThrough` | timestamp | | Only include records whose `IntegrityBlock.SealedAt` ≤ `sealedThrough`. |
| `redactionPlan` | `RedactionPlan` | object | ✓ | `{ "id": string, "revision": int }`. |
| `delivery` | `Delivery` | object | ✓ | Where/how to deliver (descriptor). |
| `packageBytesTarget` | `PackageBytesTarget` | int | | Target uncompressed bytes per package (e.g., 512 MiB). |
| `maxPackages` | `MaxPackages` | int | | Safety cap on the number of packages. |
| `progress` | `Progress` | object | | `{ "records": long, "bytes": long, "packages": int }` (updated as the job runs). |
| `resumeToken` | `ResumeToken` | string | | Opaque token to resume from a checkpoint. |
| `callbacks` | `Callbacks` | `array<Callback>` | | Webhook notifications (events below). |
| `lastUpdatedAt` | `LastUpdatedAt` | timestamp | ✓ | Monotonic update time. |
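`packageBytesTarget` and `maxPackages` suggest a greedy cut of the ordered record stream into packages. A sketch under that assumption (`PackagePlanner` and `PackagePlan` are hypothetical names; the input is the uncompressed size of each serialized record, in export order):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical plan entry: a contiguous slice of the ordered record stream.
public sealed record PackagePlan(int PackageIndex, int FirstRecord, int RecordCount, long BytesUncompressed);

public static class PackagePlanner
{
    public static List<PackagePlan> Plan(IReadOnlyList<long> recordBytes, long packageBytesTarget, int? maxPackages = null)
    {
        if (packageBytesTarget <= 0) throw new ArgumentOutOfRangeException(nameof(packageBytesTarget));
        var plans = new List<PackagePlan>();
        int start = 0;
        long bytes = 0;
        for (int i = 0; i < recordBytes.Count; i++)
        {
            // Close the current package before adding a record would exceed the target.
            if (i > start && bytes + recordBytes[i] > packageBytesTarget)
            {
                plans.Add(new PackagePlan(plans.Count, start, i - start, bytes));
                start = i;
                bytes = 0;
            }
            bytes += recordBytes[i];
        }
        if (recordBytes.Count > start) plans.Add(new PackagePlan(plans.Count, start, recordBytes.Count - start, bytes));
        if (maxPackages is { } cap && plans.Count > cap)
            throw new InvalidOperationException($"Export would need {plans.Count} packages; maxPackages is {cap}.");
        return plans;
    }
}
```

Because a package is closed only when it already holds at least one record, a single record larger than the target still ships, alone, in its own package.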
ExportFilter
- `timeRange`: `{ "from": ts?, "to": ts?, "anchor": "CreatedAt"|"ObservedAt"|"EffectiveAt" }`
- `resourceTypes`: array of PascalCase type names (supports suffix `*`)
- `actions`: array of `verb` or `verb.noun` (supports suffix `*`)
- `attributes`: map of key → value/glob (matched against `AuditRecord.attributes`)
- `dataClasses`: array of `DataClass` (matches if any is present on the record/delta)
- `legalHoldOnly`: bool (return only records currently under hold)
Encryption (per package)
- `scheme`: `None` | `AES256-GCM`
- `keyId`: KMS/KeyVault/JWKS id of the wrapping key (when `AES256-GCM`)
- `wrappedKey`: base64, envelope-encrypted DEK (set by the writer; optional in the job)
Delivery (descriptor)
- `kind`: `S3` | `GCS` | `AzureBlob` | `Sftp` | `HttpsCallback`
- `path`: bucket/container + prefix, or remote path
- `credentialsRef`: secret reference (never inline secrets)
- `callback` (for `HttpsCallback`): `{ "url": "...", "auth": { "kind": "Hmac", "secretRef": "...", "header": "X-Signature" } }`
Callback events
- `ExportJob.Started` | `Paused` | `Resumed` | `Completed` | `Failed` | `Canceled`
- `ExportPackage.Ready` | `Delivered` | `Failed`
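For `Hmac` callback auth, the algorithm and encoding are not specified here. Assuming HMAC-SHA256 over the raw UTF-8 request body, hex-encoded into the configured header (default `X-Signature`), signing and verification might look like this. `CallbackSigner` is a hypothetical name; the APIs used are standard .NET 5+ (`HMACSHA256`, `Convert.ToHexString`, `CryptographicOperations.FixedTimeEquals`).

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class CallbackSigner
{
    // Assumption: HMAC-SHA256 over the exact UTF-8 body bytes, lowercase hex in the signature header.
    public static string Sign(byte[] secret, string body)
    {
        using var hmac = new HMACSHA256(secret);
        return Convert.ToHexString(hmac.ComputeHash(Encoding.UTF8.GetBytes(body))).ToLowerInvariant();
    }

    // Receiver side: recompute, then compare in constant time to avoid leaking a matching prefix length.
    public static bool Verify(byte[] secret, string body, string headerValue)
    {
        var expected = Encoding.ASCII.GetBytes(Sign(secret, body));
        var actual = Encoding.ASCII.GetBytes(headerValue.Trim().ToLowerInvariant());
        return CryptographicOperations.FixedTimeEquals(expected, actual);
    }
}
```

The secret itself would be resolved from `secretRef`/`hmacSecretRef` at send time, consistent with the rule that secrets are never inlined.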
JSON Schemas (v1)¶
export-job.v1.json
{
"$id": "urn:connectsoft:schemas:export/export-job.v1.json",
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "ExportJob",
"type": "object",
"additionalProperties": false,
"properties": {
"jobId": { "type": "string", "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$" },
"tenantId": { "type": "string", "pattern": "^[A-Za-z0-9._-]{1,128}$" },
"createdAt": { "type": "string", "format": "date-time" },
"createdBy": { "$ref": "urn:connectsoft:schemas:partials/actor-ref.v1.json" },
"state": { "type": "string", "enum": ["Pending","Running","Paused","Completed","Failed","Canceled"] },
"stateReason": { "type": "string", "maxLength": 256 },
"filter": { "$ref": "urn:connectsoft:schemas:export/export-filter.v1.json" },
"format": { "type": "string", "enum": ["Jsonl","Parquet"], "default": "Jsonl" },
"compression": { "type": "string", "enum": ["None","Gzip"], "default": "Gzip" },
"encryption": { "$ref": "urn:connectsoft:schemas:export/encryption.v1.json" },
"includeIntegrity": { "type": "boolean", "default": true },
"sealedThrough": { "type": "string", "format": "date-time" },
"redactionPlan": {
"type": "object",
"additionalProperties": false,
"properties": {
"id": { "type": "string", "maxLength": 128 },
"revision": { "type": "integer", "minimum": 1 }
},
"required": ["id","revision"]
},
"delivery": { "$ref": "urn:connectsoft:schemas:export/delivery.v1.json" },
"packageBytesTarget": { "type": "integer", "minimum": 1 },
"maxPackages": { "type": "integer", "minimum": 1 },
"progress": {
"type": "object",
"additionalProperties": false,
"properties": {
"records": { "type": "integer", "minimum": 0 },
"bytes": { "type": "integer", "minimum": 0 },
"packages": { "type": "integer", "minimum": 0 }
}
},
"resumeToken": { "type": "string", "maxLength": 256 },
"callbacks": {
"type": "array",
"items": { "$ref": "urn:connectsoft:schemas:export/callback.v1.json" },
"maxItems": 8
},
"lastUpdatedAt": { "type": "string", "format": "date-time" }
},
"required": ["jobId","tenantId","createdAt","createdBy","state","filter","format","delivery","redactionPlan","lastUpdatedAt"]
}
export-filter.v1.json
{
"$id": "urn:connectsoft:schemas:export/export-filter.v1.json",
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "ExportFilter",
"type": "object",
"additionalProperties": false,
"properties": {
"timeRange": {
"type": "object",
"additionalProperties": false,
"properties": {
"from": { "type": "string", "format": "date-time" },
"to": { "type": "string", "format": "date-time" },
"anchor": { "type": "string", "enum": ["CreatedAt","ObservedAt","EffectiveAt"], "default": "CreatedAt" }
}
},
"resourceTypes": { "type": "array", "items": { "type": "string" }, "maxItems": 64 },
"actions": { "type": "array", "items": { "type": "string" }, "maxItems": 64 },
"attributes": { "type": "object", "additionalProperties": { "type": "string" } },
"dataClasses": { "type": "array", "items": { "$ref": "urn:connectsoft:schemas:policy/data-class.v1.json" }, "maxItems": 16 },
"legalHoldOnly": { "type": "boolean", "default": false }
}
}
delivery.v1.json
{
"$id": "urn:connectsoft:schemas:export/delivery.v1.json",
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "Delivery",
"type": "object",
"additionalProperties": false,
"properties": {
"kind": { "type": "string", "enum": ["S3","GCS","AzureBlob","Sftp","HttpsCallback"] },
"path": { "type": "string", "maxLength": 512 },
"credentialsRef": { "type": "string", "maxLength": 128 },
"callback": {
"type": "object",
"additionalProperties": false,
"properties": {
"url": { "type": "string", "format": "uri" },
"auth": {
"type": "object",
"additionalProperties": false,
"properties": {
"kind": { "type": "string", "enum": ["Hmac"] },
"secretRef": { "type": "string", "maxLength": 128 },
"header": { "type": "string", "maxLength": 64 }
}
}
}
}
},
"required": ["kind","path"]
}
encryption.v1.json
{
"$id": "urn:connectsoft:schemas:export/encryption.v1.json",
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "Encryption",
"type": "object",
"additionalProperties": false,
"properties": {
"scheme": { "type": "string", "enum": ["None","AES256-GCM"], "default": "None" },
"keyId": { "type": "string", "maxLength": 128 },
"wrappedKey": { "type": "string", "contentEncoding": "base64" }
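},
"if": { "properties": { "scheme": { "const": "AES256-GCM" } }, "required": ["scheme"] },
"then": { "required": ["keyId"] }
}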
callback.v1.json
{
"$id": "urn:connectsoft:schemas:export/callback.v1.json",
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "Callback",
"type": "object",
"additionalProperties": false,
"properties": {
"event": { "type": "string" },
"url": { "type": "string", "format": "uri" },
"hmacSecretRef": { "type": "string", "maxLength": 128 },
"header": { "type": "string", "maxLength": 64 }
},
"required": ["event","url"]
}
ExportManifest (per package)¶
Describes the payload file(s), their hashes, package boundaries, and integrity proofs (if included). The manifest itself can be signed.
| Field (JSON) | Type | Req. | Description |
|---|---|---|---|
| `schemaVersion` | string | ✓ | e.g., `export-manifest.v1`. |
| `jobId` | ULID | ✓ | Back-reference to the job. |
| `packageId` | ULID | ✓ | Unique package id. |
| `tenantId` | string | ✓ | Tenant scope. |
| `createdAt` | timestamp | ✓ | Package creation time (UTC). |
| `packageIndex` | int | ✓ | 0-based sequence within the job. |
| `packageCount` | int | | Total packages (known once the job completes). |
| `format` | enum | ✓ | `Jsonl` or `Parquet`. |
| `compression` | enum | ✓ | `None` or `Gzip`. |
| `encryption` | object | | Same shape as on the job (as finally applied). |
| `redactionPlan` | object | ✓ | `{ "id": string, "revision": int }`. |
| `recordCount` | long | ✓ | Number of records in the package. |
| `bytesUncompressed` | long | ✓ | Sum of raw (uncompressed) bytes. |
| `content` | `array<ContentFile>` | ✓ | One or more files (shards) in this package. |
| `bounds` | object | ✓ | `{ "minRecordId": ULID, "maxRecordId": ULID, "from": ts?, "to": ts? }`. |
| `integrity` | object | | `{ "blocks": [...], "segments": [...] }` minimal bundle (see below). |
| `contentHash` | string | ✓ | SHA-256 over the concatenated content files (post-compression, pre-encryption). |
| `signature` | `Signature` | | Detached signature over the manifest's canonical JSON bytes, computed with the `signature` field itself omitted. |
ContentFile
- `name` (string): filename (e.g., `export_<jobId>_<index>.jsonl.gz`)
- `uri` (string): delivery URI (`s3://…` or `https://…`)
- `bytes` (long): size of the stored file
- `records` (long): number of records in the file
- `sha256` (string): SHA-256 of the stored file bytes
integrity bundle (minimal, to validate records within the package)
- `segments`: array of `{ "segmentId": ULID, "rootHash": hex64, "blockId": ULID }` (deduplicated)
- `blocks`: array of `{ "blockId": ULID, "blockRoot": hex64, "prevBlockRoot": hex64, "signature": { "scheme": "Ed25519"|"PKCS7", "value": base64 }, "signingKeyId": string }`
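A structural check of the integrity bundle can run before any cryptographic verification. The sketch below (hypothetical `IntegrityBundleCheck`, `SegmentEntry`, and `BlockEntry`) checks hash formats, duplicate ids, that every segment points at a block present in the bundle, and, as an added assumption that blocks form a linear chain, that no two blocks share a `prevBlockRoot`. Verifying block signatures against `signingKeyId` is out of scope here.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical, signature-free views of the bundle entries.
public sealed record SegmentEntry(string SegmentId, string RootHash, string BlockId);
public sealed record BlockEntry(string BlockId, string BlockRoot, string PrevBlockRoot);

public static class IntegrityBundleCheck
{
    static readonly Regex Hex64 = new("^[a-f0-9]{64}$");

    public static List<string> Check(IReadOnlyList<SegmentEntry> segments, IReadOnlyList<BlockEntry> blocks)
    {
        var errors = new List<string>();
        var blocksById = new Dictionary<string, BlockEntry>(StringComparer.Ordinal);
        var successorOf = new Dictionary<string, string>(StringComparer.Ordinal); // prevBlockRoot -> blockId
        foreach (var b in blocks)
        {
            if (!blocksById.TryAdd(b.BlockId, b)) errors.Add($"duplicate block {b.BlockId}");
            if (!Hex64.IsMatch(b.BlockRoot) || !Hex64.IsMatch(b.PrevBlockRoot))
                errors.Add($"block {b.BlockId}: malformed root hash");
            // Assumption: a linear chain, so two blocks must never extend the same predecessor.
            if (!successorOf.TryAdd(b.PrevBlockRoot, b.BlockId))
                errors.Add($"fork: {successorOf[b.PrevBlockRoot]} and {b.BlockId} share prevBlockRoot");
        }
        var seenSegments = new HashSet<string>(StringComparer.Ordinal);
        foreach (var s in segments)
        {
            if (!seenSegments.Add(s.SegmentId)) errors.Add($"duplicate segment {s.SegmentId}"); // bundle is deduplicated
            if (!Hex64.IsMatch(s.RootHash)) errors.Add($"segment {s.SegmentId}: malformed rootHash");
            if (!blocksById.ContainsKey(s.BlockId)) errors.Add($"segment {s.SegmentId}: block {s.BlockId} not in bundle");
        }
        return errors;
    }
}
```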
Signature
- `scheme`: `Ed25519` | `PKCS7`
- `value`: base64 detached signature
Manifest file naming
- `export_<jobId>_<packageIndex>.manifest.json` (optionally with a `.sig` file alongside for the detached signature)
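The two hashes and the naming rule can be sketched as follows, assuming .NET 5+ (`SHA256.HashData`, `IncrementalHash`, `Convert.ToHexString`); `ManifestHashing` is a hypothetical helper. Feeding the files through `IncrementalHash` in manifest order yields the same digest as hashing their concatenation, so the package never has to be held in one buffer.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;

public static class ManifestHashing
{
    // Lowercase hex, matching the "^[a-f0-9]{64}$" pattern in the manifest schema.
    static string Hex(byte[] hash) => Convert.ToHexString(hash).ToLowerInvariant();

    // ContentFile.sha256: hash of the stored file bytes.
    public static string FileSha256(byte[] storedBytes) => Hex(SHA256.HashData(storedBytes));

    // contentHash: SHA-256 over the content files concatenated in manifest order
    // (pass each file's compressed, not-yet-encrypted bytes).
    public static string ContentHash(IEnumerable<byte[]> filesInOrder)
    {
        using var hash = IncrementalHash.CreateHash(HashAlgorithmName.SHA256);
        foreach (var file in filesInOrder) hash.AppendData(file);
        return Hex(hash.GetHashAndReset());
    }

    // Naming convention from the text: export_<jobId>_<packageIndex>.manifest.json
    public static string ManifestName(string jobId, int packageIndex) =>
        $"export_{jobId}_{packageIndex}.manifest.json";
}
```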
export-manifest.v1.json
{
"$id": "urn:connectsoft:schemas:export/export-manifest.v1.json",
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "ExportManifest",
"type": "object",
"additionalProperties": false,
"properties": {
"schemaVersion": { "type": "string", "pattern": "^export-manifest\\.v[0-9]+$" },
"jobId": { "type": "string", "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$" },
"packageId": { "type": "string", "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$" },
"tenantId": { "type": "string", "pattern": "^[A-Za-z0-9._-]{1,128}$" },
"createdAt": { "type": "string", "format": "date-time" },
"packageIndex": { "type": "integer", "minimum": 0 },
"packageCount": { "type": "integer", "minimum": 1 },
"format": { "type": "string", "enum": ["Jsonl","Parquet"] },
"compression": { "type": "string", "enum": ["None","Gzip"] },
"encryption": { "$ref": "urn:connectsoft:schemas:export/encryption.v1.json" },
"redactionPlan": {
"type": "object",
"properties": { "id": { "type": "string" }, "revision": { "type": "integer" } },
"required": ["id","revision"],
"additionalProperties": false
},
"recordCount": { "type": "integer", "minimum": 0 },
"bytesUncompressed": { "type": "integer", "minimum": 0 },
"content": {
"type": "array",
"items": {
"type": "object", "additionalProperties": false,
"properties": {
"name": { "type": "string" },
"uri": { "type": "string" },
"bytes": { "type": "integer", "minimum": 0 },
"records": { "type": "integer", "minimum": 0 },
"sha256": { "type": "string", "pattern": "^[a-f0-9]{64}$" }
},
"required": ["name","uri","bytes","records","sha256"]
}
},
"bounds": {
"type": "object",
"additionalProperties": false,
"properties": {
"minRecordId": { "type": "string", "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$" },
"maxRecordId": { "type": "string", "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$" },
"from": { "type": "string", "format": "date-time" },
"to": { "type": "string", "format": "date-time" }
},
"required": ["minRecordId","maxRecordId"]
},
"integrity": {
"type": "object",
"additionalProperties": false,
"properties": {
"segments": {
"type": "array",
"items": {
"type": "object", "additionalProperties": false,
"properties": {
"segmentId": { "type": "string", "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$" },
"rootHash": { "type": "string", "pattern": "^[a-f0-9]{64}$" },
"blockId": { "type": "string", "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$" }
},
"required": ["segmentId","rootHash","blockId"]
}
},
"blocks": {
"type": "array",
"items": {
"type": "object", "additionalProperties": false,
"properties": {
"blockId": { "type": "string", "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$" },
"blockRoot": { "type": "string", "pattern": "^[a-f0-9]{64}$" },
"prevBlockRoot": { "type": "string", "pattern": "^[a-f0-9]{64}$" },
"signature": {
"type": "object", "additionalProperties": false,
"properties": {
"scheme": { "type": "string", "enum": ["Ed25519","PKCS7"] },
"value": { "type": "string", "contentEncoding": "base64" }
},
"required": ["scheme","value"]
},
"signingKeyId": { "type": "string" }
},
"required": ["blockId","blockRoot","prevBlockRoot","signature","signingKeyId"]
}
}
}
},
"contentHash": { "type": "string", "pattern": "^[a-f0-9]{64}$" },
"signature": {
"type": "object", "additionalProperties": false,
"properties": {
"scheme": { "type": "string", "enum": ["Ed25519","PKCS7"] },
"value": { "type": "string", "contentEncoding": "base64" }
},
"required": ["scheme","value"]
}
},
"required": ["schemaVersion","jobId","packageId","tenantId","createdAt","packageIndex","format","compression","redactionPlan","recordCount","bytesUncompressed","content","bounds","contentHash"]
}
C# (gRPC code-first)¶
using System;
using System.Collections.Generic;
using System.Runtime.Serialization;

[DataContract]
public sealed class ExportJob
{
[DataMember(Order = 1)] public string JobId { get; init; } = default!; // ULID
[DataMember(Order = 2)] public string TenantId { get; init; } = default!;
[DataMember(Order = 3)] public DateTimeOffset CreatedAt { get; init; }
[DataMember(Order = 4)] public ActorRef CreatedBy { get; init; } = new();
[DataMember(Order = 5)] public ExportJobState State { get; init; } = ExportJobState.Pending;
[DataMember(Order = 6)] public string? StateReason { get; init; }
[DataMember(Order = 7)] public ExportFilter Filter { get; init; } = new();
[DataMember(Order = 8)] public ExportFormat Format { get; init; } = ExportFormat.Jsonl;
[DataMember(Order = 9)] public Compression Compression { get; init; } = Compression.Gzip;
[DataMember(Order = 10)] public Encryption? Encryption { get; init; }
[DataMember(Order = 11)] public bool IncludeIntegrity { get; init; } = true;
[DataMember(Order = 12)] public DateTimeOffset? SealedThrough { get; init; }
[DataMember(Order = 13)] public RedactionPlanRef RedactionPlan { get; init; } = new();
[DataMember(Order = 14)] public Delivery Delivery { get; init; } = new();
[DataMember(Order = 15)] public int? PackageBytesTarget { get; init; }
[DataMember(Order = 16)] public int? MaxPackages { get; init; }
[DataMember(Order = 17)] public ExportProgress? Progress { get; init; }
[DataMember(Order = 18)] public string? ResumeToken { get; init; }
[DataMember(Order = 19)] public IReadOnlyList<Callback>? Callbacks { get; init; }
[DataMember(Order = 20)] public DateTimeOffset LastUpdatedAt { get; init; }
}
public enum ExportJobState { Pending=0, Running=1, Paused=2, Completed=3, Failed=4, Canceled=5 }
public enum ExportFormat { Jsonl=0, Parquet=1 }
public enum Compression { None=0, Gzip=1 }
[DataContract] public sealed class RedactionPlanRef { [DataMember(Order = 1)] public string Id { get; init; } = default!; [DataMember(Order = 2)] public int Revision { get; init; } }
[DataContract] public sealed class ExportProgress { [DataMember(Order = 1)] public long Records { get; init; } [DataMember(Order = 2)] public long Bytes { get; init; } [DataMember(Order = 3)] public int Packages { get; init; } }
[DataContract]
public sealed class ExportFilter
{
[DataMember(Order = 1)] public TimeRange? TimeRange { get; init; }
[DataMember(Order = 2)] public IReadOnlyList<string>? ResourceTypes { get; init; }
[DataMember(Order = 3)] public IReadOnlyList<string>? Actions { get; init; }
[DataMember(Order = 4)] public IReadOnlyDictionary<string,string>? Attributes { get; init; }
[DataMember(Order = 5)] public IReadOnlyList<DataClass>? DataClasses { get; init; }
[DataMember(Order = 6)] public bool? LegalHoldOnly { get; init; }
}
[DataContract] public sealed class Encryption { [DataMember(Order = 1)] public string Scheme { get; init; } = "None"; [DataMember(Order = 2)] public string? KeyId { get; init; } [DataMember(Order = 3)] public string? WrappedKey { get; init; } }
[DataContract]
public sealed class Delivery
{
[DataMember(Order = 1)] public string Kind { get; init; } = default!; // "S3"|"GCS"|...
[DataMember(Order = 2)] public string Path { get; init; } = default!;
[DataMember(Order = 3)] public string? CredentialsRef { get; init; }
[DataMember(Order = 4)] public DeliveryCallback? Callback { get; init; }
}
[DataContract] public sealed class DeliveryCallback { [DataMember(Order = 1)] public string Url { get; init; } = default!; [DataMember(Order = 2)] public HmacAuth? Auth { get; init; } }
[DataContract] public sealed class HmacAuth { [DataMember(Order = 1)] public string Kind { get; init; } = "Hmac"; [DataMember(Order = 2)] public string SecretRef { get; init; } = default!; [DataMember(Order = 3)] public string Header { get; init; } = "X-Signature"; }
[DataContract] public sealed class Callback { [DataMember(Order = 1)] public string Event { get; init; } = default!; [DataMember(Order = 2)] public string Url { get; init; } = default!; [DataMember(Order = 3)] public string? HmacSecretRef { get; init; } [DataMember(Order = 4)] public string? Header { get; init; } }
Manifest
using System;
using System.Collections.Generic;
using System.Runtime.Serialization;

[DataContract]
public sealed class ExportManifest
{
[DataMember(Order = 1)] public string SchemaVersion { get; init; } = "export-manifest.v1";
[DataMember(Order = 2)] public string JobId { get; init; } = default!;
[DataMember(Order = 3)] public string PackageId { get; init; } = default!;
[DataMember(Order = 4)] public string TenantId { get; init; } = default!;
[DataMember(Order = 5)] public DateTimeOffset CreatedAt { get; init; }
[DataMember(Order = 6)] public int PackageIndex { get; init; }
[DataMember(Order = 7)] public int? PackageCount { get; init; }
[DataMember(Order = 8)] public ExportFormat Format { get; init; } = ExportFormat.Jsonl;
[DataMember(Order = 9)] public Compression Compression { get; init; } = Compression.Gzip;
[DataMember(Order = 10)] public Encryption? Encryption { get; init; }
[DataMember(Order = 11)] public RedactionPlanRef RedactionPlan { get; init; } = new();
[DataMember(Order = 12)] public long RecordCount { get; init; }
[DataMember(Order = 13)] public long BytesUncompressed { get; init; }
[DataMember(Order = 14)] public IReadOnlyList<ContentFile> Content { get; init; } = Array.Empty<ContentFile>();
[DataMember(Order = 15)] public ExportBounds Bounds { get; init; } = new();
[DataMember(Order = 16)] public IntegrityBundle? Integrity { get; init; }
[DataMember(Order = 17)] public string ContentHash { get; init; } = default!;
[DataMember(Order = 18)] public Signature? Signature { get; init; }
}
[DataContract] public sealed class ContentFile { [DataMember(Order = 1)] public string Name { get; init; } = default!; [DataMember(Order = 2)] public string Uri { get; init; } = default!; [DataMember(Order = 3)] public long Bytes { get; init; } [DataMember(Order = 4)] public long Records { get; init; } [DataMember(Order = 5)] public string Sha256 { get; init; } = default!; }
[DataContract] public sealed class ExportBounds { [DataMember(Order = 1)] public string MinRecordId { get; init; } = default!; [DataMember(Order = 2)] public string MaxRecordId { get; init; } = default!; [DataMember(Order = 3)] public DateTimeOffset? From { get; init; } [DataMember(Order = 4)] public DateTimeOffset? To { get; init; } }
[DataContract] public sealed class IntegrityBundle { [DataMember(Order = 1)] public IReadOnlyList<SegmentRef>? Segments { get; init; } [DataMember(Order = 2)] public IReadOnlyList<BlockRef>? Blocks { get; init; } }
[DataContract] public sealed class SegmentRef { [DataMember(Order = 1)] public string SegmentId { get; init; } = default!; [DataMember(Order = 2)] public string RootHash { get; init; } = default!; [DataMember(Order = 3)] public string BlockId { get; init; } = default!; }
[DataContract] public sealed class BlockRef { [DataMember(Order = 1)] public string BlockId { get; init; } = default!; [DataMember(Order = 2)] public string BlockRoot { get; init; } = default!; [DataMember(Order = 3)] public string PrevBlockRoot { get; init; } = default!; [DataMember(Order = 4)] public Signature Signature { get; init; } = new(); [DataMember(Order = 5)] public string SigningKeyId { get; init; } = default!; }
[DataContract] public sealed class Signature { [DataMember(Order = 1)] public string Scheme { get; init; } = "Ed25519"; [DataMember(Order = 2)] public string Value { get; init; } = default!; }
Protobuf (optional emission)¶
syntax = "proto3";
package connectsoft.export.v1;
import "google/protobuf/timestamp.proto";
message ExportJob {
string JobId = 1 [json_name = "jobId"];
string TenantId = 2 [json_name = "tenantId"];
google.protobuf.Timestamp CreatedAt = 3 [json_name = "createdAt"];
ActorRef CreatedBy = 4 [json_name = "createdBy"];
string State = 5 [json_name = "state"]; // Pending|Running|...
string StateReason = 6 [json_name = "stateReason"];
ExportFilter Filter = 7 [json_name = "filter"];
string Format = 8 [json_name = "format"]; // Jsonl|Parquet
string Compression = 9 [json_name = "compression"]; // None|Gzip
Encryption Encryption = 10 [json_name = "encryption"];
bool IncludeIntegrity = 11 [json_name = "includeIntegrity"];
google.protobuf.Timestamp SealedThrough = 12 [json_name = "sealedThrough"];
RedactionPlanRef RedactionPlan = 13 [json_name = "redactionPlan"];
Delivery Delivery = 14 [json_name = "delivery"];
int32 PackageBytesTarget = 15 [json_name = "packageBytesTarget"];
int32 MaxPackages = 16 [json_name = "maxPackages"];
string ResumeToken = 17 [json_name = "resumeToken"];
google.protobuf.Timestamp LastUpdatedAt = 18 [json_name = "lastUpdatedAt"];
}
message ExportFilter {
TimeRange TimeRange = 1 [json_name = "timeRange"];
repeated string ResourceTypes = 2 [json_name = "resourceTypes"];
repeated string Actions = 3 [json_name = "actions"];
map<string,string> Attributes = 4 [json_name = "attributes"];
repeated string DataClasses = 5 [json_name = "dataClasses"];
bool LegalHoldOnly = 6 [json_name = "legalHoldOnly"];
}
message ExportManifest {
string SchemaVersion = 1 [json_name = "schemaVersion"];
string JobId = 2 [json_name = "jobId"];
string PackageId = 3 [json_name = "packageId"];
string TenantId = 4 [json_name = "tenantId"];
google.protobuf.Timestamp CreatedAt = 5 [json_name = "createdAt"];
int32 PackageIndex = 6 [json_name = "packageIndex"];
int32 PackageCount = 7 [json_name = "packageCount"];
string Format = 8 [json_name = "format"];
string Compression = 9 [json_name = "compression"];
Encryption Encryption = 10 [json_name = "encryption"];
RedactionPlanRef RedactionPlan = 11 [json_name = "redactionPlan"];
int64 RecordCount = 12 [json_name = "recordCount"];
int64 BytesUncompressed = 13 [json_name = "bytesUncompressed"];
repeated ContentFile Content = 14 [json_name = "content"];
ExportBounds Bounds = 15 [json_name = "bounds"];
IntegrityBundle Integrity = 16 [json_name = "integrity"];
string ContentHash = 17 [json_name = "contentHash"];
Signature Signature = 18 [json_name = "signature"];
}
message ContentFile { string Name = 1 [json_name = "name"]; string Uri = 2 [json_name = "uri"]; int64 Bytes = 3 [json_name = "bytes"]; int64 Records = 4 [json_name = "records"]; string Sha256 = 5 [json_name = "sha256"]; }
message ExportBounds { string MinRecordId = 1 [json_name = "minRecordId"]; string MaxRecordId = 2 [json_name = "maxRecordId"]; google.protobuf.Timestamp From = 3 [json_name = "from"]; google.protobuf.Timestamp To = 4 [json_name = "to"]; }
message IntegrityBundle { repeated SegmentRef Segments = 1 [json_name = "segments"]; repeated BlockRef Blocks = 2 [json_name = "blocks"]; }
message SegmentRef { string SegmentId = 1 [json_name = "segmentId"]; string RootHash = 2 [json_name = "rootHash"]; string BlockId = 3 [json_name = "blockId"]; }
message BlockRef { string BlockId = 1 [json_name = "blockId"]; string BlockRoot = 2 [json_name = "blockRoot"]; string PrevBlockRoot = 3 [json_name = "prevBlockRoot"]; Signature Signature = 4 [json_name = "signature"]; string SigningKeyId = 5 [json_name = "signingKeyId"]; }
message Encryption { string Scheme = 1 [json_name = "scheme"]; string KeyId = 2 [json_name = "keyId"]; string WrappedKey = 3 [json_name = "wrappedKey"]; }
message RedactionPlanRef { string Id = 1 [json_name = "id"]; int32 Revision = 2 [json_name = "revision"]; }
message Delivery { string Kind = 1 [json_name = "kind"]; string Path = 2 [json_name = "path"]; string CredentialsRef = 3 [json_name = "credentialsRef"]; }
message Signature { string Scheme = 1 [json_name = "scheme"]; string Value = 2 [json_name = "value"]; }
Examples¶
Job request (JSON)
{
"jobId": "01JE5N3WTQ4J9V7M4A1ZP6D9TQ",
"tenantId": "splootvets",
"createdAt": "2025-10-22T15:00:00Z",
"createdBy": { "id": "user_321", "type": "User", "display": "Ops Analyst" },
"state": "Pending",
"filter": {
"timeRange": { "from": "2025-10-01T00:00:00Z", "to": "2025-10-21T23:59:59Z", "anchor": "CreatedAt" },
"resourceTypes": ["Vetspire.Appointment"],
"actions": ["appointment.*"],
"dataClasses": ["Personal","Sensitive"]
},
"format": "Jsonl",
"compression": "Gzip",
"includeIntegrity": true,
"sealedThrough": "2025-10-21T23:59:59Z",
"redactionPlan": { "id": "policy-default", "revision": 3 },
"delivery": { "kind": "S3", "path": "s3://exports/splootvets/2025-10/" },
"packageBytesTarget": 536870912
}
Manifest (per package)
{
"schemaVersion": "export-manifest.v1",
"jobId": "01JE5N3WTQ4J9V7M4A1ZP6D9TQ",
"packageId": "01JE5N7C3Q9Z2K8R1V0M5D4N6P",
"tenantId": "splootvets",
"createdAt": "2025-10-22T15:10:12Z",
"packageIndex": 0,
"format": "Jsonl",
"compression": "Gzip",
"redactionPlan": { "id": "policy-default", "revision": 3 },
"recordCount": 125000,
"bytesUncompressed": 412345678,
"content": [
{
"name": "export_01JE5N3WTQ_000.jsonl.gz",
"uri": "s3://exports/splootvets/2025-10/export_01JE5N3WTQ_000.jsonl.gz",
"bytes": 98765432,
"records": 125_000,
"sha256": "9e1d4c0b...ab7f"
}
],
"bounds": {
"minRecordId": "01JDZZZZZZZZZZZZZZZZZZZZZZ",
"maxRecordId": "01JE0000000000000000000000",
"from": "2025-10-01T00:00:00Z",
"to": "2025-10-10T00:00:00Z"
},
"integrity": {
"segments": [
{ "segmentId": "01JE5H....1A", "rootHash": "ab12...ff", "blockId": "01JE5H....B1" }
],
"blocks": [
{
"blockId": "01JE5H....B1",
"blockRoot": "a0b1c2...9d",
"prevBlockRoot": "90fe12...aa",
"signature": { "scheme": "Ed25519", "value": "MEYCIQ..." },
"signingKeyId": "kv:prod/atp-integrity/ed25519-2025-01"
}
]
},
"contentHash": "0a1b2c3d4e5f...fe",
"signature": { "scheme": "Ed25519", "value": "MC4CFQ..." }
}
Package file (JSONL snippet)
{"auditRecordId":"01JE...","tenantId":"splootvets", "...": "..."}
{"auditRecordId":"01JE...","tenantId":"splootvets", "...": "..."}
State machine¶
stateDiagram-v2
[*] --> Pending
Pending --> Running: Start
Running --> Paused: Pause
Paused --> Running: Resume
Running --> Completed: All packages delivered
Running --> Failed: Error (with reason)
Pending --> Canceled: Cancel
Running --> Canceled: Cancel
Paused --> Canceled: Cancel
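The diagram can be enforced with a small transition table. The C# below is a minimal sketch that models the states as an enum. ExportJobState and ExportJobTransitions are hypothetical names, not part of the contract. It treats Completed, Failed, and Canceled as terminal because the diagram shows no edges leaving them.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical types; states and edges mirror the stateDiagram above.
public enum ExportJobState { Pending, Running, Paused, Completed, Failed, Canceled }

public static class ExportJobTransitions
{
    private static readonly Dictionary<ExportJobState, ExportJobState[]> Allowed = new()
    {
        [ExportJobState.Pending] = new[] { ExportJobState.Running, ExportJobState.Canceled },
        [ExportJobState.Running] = new[] { ExportJobState.Paused, ExportJobState.Completed, ExportJobState.Failed, ExportJobState.Canceled },
        [ExportJobState.Paused] = new[] { ExportJobState.Running, ExportJobState.Canceled },
        // Terminal states: the diagram defines no outgoing edges for these.
        [ExportJobState.Completed] = Array.Empty<ExportJobState>(),
        [ExportJobState.Failed] = Array.Empty<ExportJobState>(),
        [ExportJobState.Canceled] = Array.Empty<ExportJobState>(),
    };

    public static bool CanTransition(ExportJobState from, ExportJobState to) =>
        Array.IndexOf(Allowed[from], to) >= 0;

    // Throws rather than silently accepting an edge the diagram does not define.
    public static ExportJobState Transition(ExportJobState from, ExportJobState to) =>
        CanTransition(from, to)
            ? to
            : throw new InvalidOperationException($"Illegal export job transition {from} -> {to}");
}
```

Keeping the edges in data rather than in scattered if statements makes it easy to compare the code with the diagram during review.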
Resume tokens¶
Opaque resumeToken encodes the last committed checkpoint:
- lastRecordId (ULID) and watermark (UTC) of the anchor clock,
- packageIndex and byte offset (for partial-file resume), if supported,
- an HMAC for tamper detection.
Writers MUST treat resumeToken as opaque. On resume, they MUST validate its HMAC and verify that the checkpoint advances monotonically.
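The sketch below shows one possible token layout. It assumes the checkpoint is serialized as JSON and the token is base64url(body) + "." + base64url(HMAC-SHA256(key, body)). ResumeCheckpoint, ResumeTokenCodec, and this layout are illustrative assumptions, not the production format. The sketch covers the two checks above: a constant-time HMAC comparison, and a guard that rejects a token moving the checkpoint behind the last committed one.

```csharp
using System;
using System.Security.Cryptography;
using System.Text.Json;

// Hypothetical checkpoint shape; fields follow the list above.
public sealed record ResumeCheckpoint(string LastRecordId, DateTimeOffset Watermark, int PackageIndex, long ByteOffset);

public static class ResumeTokenCodec
{
    // Assumed layout: base64url(JSON body) + "." + base64url(HMAC-SHA256(key, body)).
    public static string Encode(ResumeCheckpoint checkpoint, byte[] key)
    {
        byte[] body = JsonSerializer.SerializeToUtf8Bytes(checkpoint);
        return ToBase64Url(body) + "." + ToBase64Url(HMACSHA256.HashData(key, body));
    }

    // Verifies the HMAC, then rejects tokens that would move the checkpoint backwards.
    public static ResumeCheckpoint Decode(string token, byte[] key, ResumeCheckpoint? lastCommitted)
    {
        string[] parts = token.Split('.');
        if (parts.Length != 2) throw new FormatException("Malformed resume token.");
        byte[] body = FromBase64Url(parts[0]);
        byte[] mac = FromBase64Url(parts[1]);
        // Constant-time comparison so timing does not reveal how many MAC bytes matched.
        if (!CryptographicOperations.FixedTimeEquals(mac, HMACSHA256.HashData(key, body)))
            throw new CryptographicException("Resume token HMAC mismatch.");
        ResumeCheckpoint checkpoint = JsonSerializer.Deserialize<ResumeCheckpoint>(body)
            ?? throw new FormatException("Empty resume token.");
        // ULIDs sort lexicographically in time order, so an ordinal compare detects regressions.
        if (lastCommitted is not null &&
            (string.CompareOrdinal(checkpoint.LastRecordId, lastCommitted.LastRecordId) < 0 ||
             checkpoint.Watermark < lastCommitted.Watermark))
            throw new InvalidOperationException("Resume token does not advance the checkpoint.");
        return checkpoint;
    }

    private static string ToBase64Url(byte[] bytes) =>
        Convert.ToBase64String(bytes).TrimEnd('=').Replace('+', '-').Replace('/', '_');

    private static byte[] FromBase64Url(string text)
    {
        string b64 = text.Replace('-', '+').Replace('_', '/');
        return Convert.FromBase64String(b64.PadRight(b64.Length + (4 - b64.Length % 4) % 4, '='));
    }
}
```

Clients never parse the body. Only the service holding the key decodes it, which keeps the token opaque even though its payload is plain JSON.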
Storage mapping¶
CREATE TABLE dbo.ExportJobs (
JobId CHAR(26) NOT NULL,
TenantId NVARCHAR(128) NOT NULL,
State NVARCHAR(16) NOT NULL,
StateReason NVARCHAR(256) NULL,
CreatedAt DATETIME2(0) NOT NULL,
CreatedBy NVARCHAR(256) NOT NULL, -- JSON (ActorRef)
FilterJson NVARCHAR(MAX) NOT NULL,
Format NVARCHAR(16) NOT NULL, -- Jsonl|Parquet
Compression NVARCHAR(16) NOT NULL, -- None|Gzip
EncryptionJson NVARCHAR(256) NULL,
IncludeIntegrity BIT NOT NULL DEFAULT 1,
SealedThrough DATETIME2(0) NULL,
RedactionPlan NVARCHAR(64) NOT NULL, -- "id:revision"
DeliveryJson NVARCHAR(512) NOT NULL,
PackageBytesTarget INT NULL,
MaxPackages INT NULL,
ProgressJson NVARCHAR(128) NULL,
ResumeToken NVARCHAR(256) NULL,
LastUpdatedAt DATETIME2(0) NOT NULL,
CONSTRAINT PK_ExportJobs PRIMARY KEY (JobId),
INDEX IX_ExportJobs_Tenant_State (TenantId, State)
);
CREATE TABLE dbo.ExportPackages (
PackageId CHAR(26) NOT NULL,
JobId CHAR(26) NOT NULL,
TenantId NVARCHAR(128) NOT NULL,
PackageIndex INT NOT NULL,
ManifestJson NVARCHAR(MAX) NOT NULL,
DeliveredAt DATETIME2(0) NULL,
DeliveryResult NVARCHAR(512) NULL, -- etag/url/etc
CONSTRAINT PK_ExportPackages PRIMARY KEY (PackageId),
CONSTRAINT FK_ExportPackages_Jobs FOREIGN KEY (JobId) REFERENCES dbo.ExportJobs(JobId),
CONSTRAINT UQ_ExportPackages_Job_PackageIndex UNIQUE (JobId, PackageIndex) -- packageIndex unique per job
);
Validation rules (summary)¶
- sealedThrough ≤ current time; when set, only include records with IntegrityBlock.SealedAt ≤ sealedThrough.
- Legal hold respected: records under active hold are excluded unless the export purpose is a hold export (then include and label them).
- Determinism: contentHash and per-file sha256 must match the delivered bytes; the manifest signature verifies over the manifest's canonical JSON.
- Redaction: exported records MUST reflect the specified redactionPlan; no raw PII beyond the plan.
- Package bounds: minRecordId ≤ maxRecordId; packageIndex unique per job.
- Encryption: when AES256-GCM, each file has its own nonce/IV; wrappedKey is present or retrievable via keyId.
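A consumer can run the determinism and package-bounds rules as mechanical checks after download. The sketch below assumes hex-encoded SHA-256 values and canonical uppercase ULIDs. ContentFileInfo and ManifestChecks are hypothetical helpers. The sketch does not verify contentHash or the manifest signature, because their canonicalization rules are not restated here.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;

// Hypothetical trimmed mirror of ContentFile (only the fields checked here).
public sealed record ContentFileInfo(string Name, long Bytes, string Sha256);

public static class ManifestChecks
{
    // Determinism: delivered bytes must match the manifest's size and hex SHA-256.
    public static bool FileMatches(ContentFileInfo file, byte[] delivered)
    {
        if (delivered.LongLength != file.Bytes) return false;
        string actual = Convert.ToHexString(SHA256.HashData(delivered));
        return string.Equals(actual, file.Sha256, StringComparison.OrdinalIgnoreCase);
    }

    // Package bounds: fixed-length uppercase ULIDs compare ordinally in time order.
    public static bool BoundsValid(string minRecordId, string maxRecordId) =>
        string.CompareOrdinal(minRecordId, maxRecordId) <= 0;

    // packageIndex must be unique per job.
    public static bool PackageIndexesUnique(IEnumerable<int> packageIndexes)
    {
        var seen = new HashSet<int>();
        foreach (int index in packageIndexes)
            if (!seen.Add(index)) return false;
        return true;
    }
}
```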
Tenancy Keys & Partitioning¶
Defines the tenant identity (tenantId) and the partition/sharding strategies used to enforce isolation, enable predictable scalability, and satisfy data residency constraints across storage and compute.
Overview¶
- Tenant-first: All authoritative writes and read models are keyed by tenantId. Cross-tenant joins are prohibited.
- Predictable locality: Partition primarily by tenantId, secondarily by time (createdAt / ULID time) to keep pruning cheap.
- Shard ring: For horizontally scaled backends, map tenants to logical shards via a stable, HMAC-based hashing scheme.
- Row-Level Security (RLS): Enforce tenant isolation at the database layer via session-scoped predicates/policies.
- Residency: Each tenant declares a home region and allowed regions; data placement honors these rules end-to-end.
tenantId rules¶
| Aspect | Rule |
|---|---|
| Shape | Opaque ASCII token: ^[A-Za-z0-9._-]{1,128}$ |
| Stability | Immutable for the life of the tenant. No rename-in-place (use migration tooling if absolutely necessary). |
| Case | Case-sensitive by default (treat as opaque); do not normalize at write. |
| Exposure | Safe to appear in indexes, URIs, file paths. Do not embed secrets or PII. |
| Scope | Unique within the platform (global). |
| Derivatives | tenantScopedId = "<tenantId>:<type>:<id>" (see ResourceRef); tenantHash = HMAC-SHA256(tenantId, shardSecret) used for sharding only. |
JSON uses lowerCamel (tenantId); database tables/columns use PascalCase (TenantId) per conventions.
Sharding & partitioning¶
Logical shard assignment
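- Compute tenantHash = HMAC-SHA256(tenantId, shardSecret) (hex; shardSecret is the HMAC key), then:
  - shardId = (uint32)first4Bytes(tenantHash) % ringSize, reading the first 4 bytes big-endian (as ShardRing.ComputeShardId does).
  - ringVersion increments when the fleet is rebalanced; the mapping is persisted for auditability.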
Physical partitioning (authoritative store)
- Partition key: TenantId
- Secondary prune: CreatedAt (or UlidTime derived from AuditRecordId)
- Indexes:
  - (TenantId, CreatedAt) for range scans
  - (TenantId, IdempotencyKey) filtered unique (when present)
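UlidTime can be recovered from AuditRecordId without a lookup. The first 10 Crockford base32 characters of a ULID encode a 48-bit Unix timestamp in milliseconds. The sketch below uses a hypothetical helper name, UlidTime. It decodes strictly, so it rejects characters outside the Crockford alphabet instead of applying the I/L/O aliases.

```csharp
using System;

public static class UlidTime
{
    private const string Crockford = "0123456789ABCDEFGHJKMNPQRSTVWXYZ";

    // Decodes the 48-bit millisecond timestamp held in the first 10 characters of a ULID.
    public static DateTimeOffset FromUlid(string ulid)
    {
        if (ulid is null || ulid.Length != 26)
            throw new ArgumentException("A ULID has exactly 26 characters.", nameof(ulid));
        long ms = 0;
        for (int i = 0; i < 10; i++)
        {
            int digit = Crockford.IndexOf(char.ToUpperInvariant(ulid[i]));
            if (digit < 0) throw new FormatException($"Invalid ULID character '{ulid[i]}'.");
            ms = ms * 32 + digit; // 10 digits x 5 bits = 50 bits, so a long cannot overflow
        }
        // A valid ULID timestamp fits in 48 bits (first character at most '7').
        if (ms >= 1L << 48) throw new FormatException("ULID timestamp exceeds 48 bits.");
        return DateTimeOffset.FromUnixTimeMilliseconds(ms);
    }
}
```

For example, the ULID 01ARYZ6S41TSV4RRFFQ69G5FAV decodes to 1469918176385 ms since the Unix epoch.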
Hotspot guidance
- ULIDs are time-ordered; to avoid hot partitions, always prefix by TenantId and use time-bucketed partitions (e.g., monthly).
- Large “whale” tenants may receive dedicated shards (explicit shardId override) while retaining the same logical model.
Data residency¶
Residency policy (per tenant)
| Field (JSON) | Type | Req. | Description |
|---|---|---|---|
| homeRegion | string | ✓ | Canonical region (e.g., eu-west-1, us-central). |
| allowedRegions | string[] | ✓ | Regions where data-at-rest may reside. |
| pinToHome | bool | | If true, authoritative data is stored only in homeRegion. |
| replication | enum | | None, AsyncCrossRegion, or MultiActive. |
| exceptions | object | | Categories allowed to cross borders (e.g., "telemetry": "aggregated-only"). |
Readers/writers must respect residency at ingress, storage, index, backup, and export time.
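The sketch below illustrates a placement check. It assumes region codes compare as exact, case-sensitive strings. ResidencyPolicy is a trimmed, hypothetical stand-in for TenantResidencyPolicy. The authoritative flag separates authoritative writes, which are subject to pinToHome, from other data at rest.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical trimmed stand-in for TenantResidencyPolicy.
public sealed record ResidencyPolicy(string HomeRegion, IReadOnlyList<string> AllowedRegions, bool PinToHome);

public static class ResidencyGuard
{
    // Data at rest may only land in allowedRegions; pinned tenants keep
    // authoritative data in homeRegion only.
    public static bool MayPlace(ResidencyPolicy policy, string targetRegion, bool authoritative)
    {
        if (!policy.AllowedRegions.Contains(targetRegion, StringComparer.Ordinal)) return false;
        if (authoritative && policy.PinToHome)
            return string.Equals(targetRegion, policy.HomeRegion, StringComparison.Ordinal);
        return true;
    }
}
```

With the eucorp policy from the examples, an authoritative write to eu-central-1 is refused because pinToHome is true, while non-authoritative data may still reside there.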
JSON Schemas (partials, v1)¶
tenant-context.v1.json
Context carried internally to route requests and validate RLS.
{
"$id": "urn:connectsoft:schemas:tenancy/tenant-context.v1.json",
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "TenantContext",
"type": "object",
"additionalProperties": false,
"properties": {
"tenantId": { "type": "string", "pattern": "^[A-Za-z0-9._-]{1,128}$" },
"shardId": { "type": "integer", "minimum": 0 },
"ringVersion": { "type": "integer", "minimum": 1 },
"homeRegion": { "type": "string", "maxLength": 32 },
"effectiveRegion": { "type": "string", "maxLength": 32 }
},
"required": ["tenantId","shardId","ringVersion"]
}
residency-policy.v1.json
{
"$id": "urn:connectsoft:schemas:tenancy/residency-policy.v1.json",
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "TenantResidencyPolicy",
"type": "object",
"additionalProperties": false,
"properties": {
"tenantId": { "type": "string", "pattern": "^[A-Za-z0-9._-]{1,128}$" },
"homeRegion": { "type": "string" },
"allowedRegions": { "type": "array", "items": { "type": "string" }, "minItems": 1 },
"pinToHome": { "type": "boolean", "default": false },
"replication": { "type": "string", "enum": ["None","AsyncCrossRegion","MultiActive"], "default": "None" },
"exceptions": { "type": "object", "additionalProperties": { "type": "string" } },
"version": { "type": "integer", "minimum": 1 }
},
"required": ["tenantId","homeRegion","allowedRegions","version"]
}
shard-mapping.v1.json
{
"$id": "urn:connectsoft:schemas:tenancy/shard-mapping.v1.json",
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "ShardMapping",
"type": "object",
"additionalProperties": false,
"properties": {
"ringVersion": { "type": "integer", "minimum": 1 },
"ringSize": { "type": "integer", "minimum": 1 },
"assignments": {
"type": "array",
"items": {
"type": "object",
"additionalProperties": false,
"properties": {
"tenantId": { "type": "string", "pattern": "^[A-Za-z0-9._-]{1,128}$" },
"shardId": { "type": "integer", "minimum": 0 }
},
"required": ["tenantId","shardId"]
}
}
},
"required": ["ringVersion","ringSize","assignments"]
}
C# (gRPC code-first)¶
using System;
using System.Collections.Generic;
using System.Runtime.Serialization;
[DataContract]
public sealed class TenantContext
{
[DataMember(Order = 1)] public string TenantId { get; init; } = default!;
[DataMember(Order = 2)] public int ShardId { get; init; }
[DataMember(Order = 3)] public int RingVersion { get; init; }
[DataMember(Order = 4)] public string? HomeRegion { get; init; }
[DataMember(Order = 5)] public string? EffectiveRegion { get; init; }
}
[DataContract]
public sealed class TenantResidencyPolicy
{
[DataMember(Order = 1)] public string TenantId { get; init; } = default!;
[DataMember(Order = 2)] public string HomeRegion { get; init; } = default!;
[DataMember(Order = 3)] public IReadOnlyList<string> AllowedRegions { get; init; } = Array.Empty<string>();
[DataMember(Order = 4)] public bool PinToHome { get; init; }
[DataMember(Order = 5)] public string Replication { get; init; } = "None"; // None|AsyncCrossRegion|MultiActive
[DataMember(Order = 6)] public IReadOnlyDictionary<string,string>? Exceptions { get; init; }
[DataMember(Order = 7)] public int Version { get; init; } = 1;
}
public static class ShardRing
{
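// Derive a stable shard from tenantId: HMAC-SHA256 keyed with shardSecret, first 4 bytes (big-endian) mod ringSize
public static int ComputeShardId(string tenantId, int ringSize, byte[] shardSecret)
{
System.ArgumentNullException.ThrowIfNull(tenantId);
System.ArgumentNullException.ThrowIfNull(shardSecret);
// Guard against a division by zero / negative ring
if (ringSize <= 0) throw new System.ArgumentOutOfRangeException(nameof(ringSize), "ringSize must be positive.");
using var hmac = new System.Security.Cryptography.HMACSHA256(shardSecret);
var bytes = System.Text.Encoding.UTF8.GetBytes(tenantId);
var hash = hmac.ComputeHash(bytes);
var value = System.Buffers.Binary.BinaryPrimitives.ReadUInt32BigEndian(hash.AsSpan(0, 4));
return (int)(value % (uint)ringSize);
}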
}
JSON serialization MUST use camelCase. Database schema and columns use PascalCase (e.g., Tenants, TenantId, HomeRegion, ShardId).
RLS (Row-Level Security) notes¶
PostgreSQL
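-- Session setup: with is_local = TRUE the setting lasts only for the current transaction,
-- so the application must set it inside every transaction before querying
SELECT set_config('app.tenant_id', :tenant_id, TRUE);
ALTER TABLE "AuditRecords" ENABLE ROW LEVEL SECURITY;
-- Without FORCE, the table owner bypasses RLS
ALTER TABLE "AuditRecords" FORCE ROW LEVEL SECURITY;
-- missing_ok = TRUE returns NULL (no rows match) instead of raising when the setting is absent
CREATE POLICY tenant_isolation ON "AuditRecords"
USING ("TenantId" = current_setting('app.tenant_id', TRUE));
-- Optional read-only policy for auditors scoped to a list (array literal, e.g. '{tenantA,tenantB}')
CREATE POLICY auditor_scope ON "AuditRecords"
FOR SELECT
USING ("TenantId" = ANY (NULLIF(current_setting('app.auditor_tenants', TRUE), '')::text[]));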
SQL Server
-- Session context set by application (runtime, per session):
EXEC sp_set_session_context @key = N'tenant_id', @value = @tenantId;
-- Deployment DDL below; CREATE FUNCTION must be the first statement in its batch
GO
CREATE FUNCTION dbo.fn_tenantPredicate(@TenantId AS NVARCHAR(128))
RETURNS TABLE WITH SCHEMABINDING
AS RETURN SELECT 1 AS fn_result
WHERE @TenantId = CONVERT(NVARCHAR(128), SESSION_CONTEXT(N'tenant_id'));
GO
CREATE SECURITY POLICY dbo.TenantSecurityPolicy
ADD FILTER PREDICATE dbo.fn_tenantPredicate(TenantId) ON dbo.AuditRecords,
ADD BLOCK PREDICATE dbo.fn_tenantPredicate(TenantId) ON dbo.AuditRecords
WITH (STATE = ON);
Operational guardrails
- Always set the session tenant context before any query.
- Use least-privilege service accounts; forbid BYPASSRLS-equivalent privileges.
- Mirror the same predicates in reporting/BI and CDC pipelines.
Storage mapping (authoritative)¶
SQL (illustrative)
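-- Tenant-first indexes plus an optional persisted month bucket (CreatedMonth) for pruning;
-- physical monthly partitioning (partition function/scheme on CreatedAt) is deployment-specific and not shown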
CREATE TABLE dbo.AuditRecords (
AuditRecordId CHAR(26) NOT NULL,
TenantId NVARCHAR(128) NOT NULL,
CreatedAt DATETIME2(3) NOT NULL,
-- ... other columns (see AuditRecord)
CONSTRAINT PK_AuditRecords PRIMARY KEY (AuditRecordId)
);
CREATE INDEX IX_AuditRecords_Tenant_CreatedAt
ON dbo.AuditRecords (TenantId, CreatedAt);
-- Optional computed 'CreatedMonth' for partition pruning
ALTER TABLE dbo.AuditRecords ADD CreatedMonth AS (CONVERT(CHAR(7), CreatedAt, 126)) PERSISTED;
CREATE INDEX IX_AuditRecords_Tenant_CreatedMonth ON dbo.AuditRecords (TenantId, CreatedMonth);
Object storage (packages/exports)
Prefix paths with tenant and region to keep listings cheap and enforce residency:
Search & index tenants¶
- Create one index per tenant (aliasing pattern) when using full-text engines, e.g., audit-{tenantId}, or audit-{region}-{tenantId} when residency matters.
- Alternatively, multi-tenant indices must carry tenantId in their index templates/mappings, apply it as a hard filter on every query, and be protected by index-level RBAC.
Examples¶
Tenant context resolved for routing
{
"tenantId": "splootvets",
"shardId": 7,
"ringVersion": 3,
"homeRegion": "us-central",
"effectiveRegion": "us-central"
}
Residency policy
{
"tenantId": "eucorp",
"homeRegion": "eu-west-1",
"allowedRegions": ["eu-west-1","eu-central-1"],
"pinToHome": true,
"replication": "None",
"version": 2
}
Validation rules (summary)¶
- tenantId matches ^[A-Za-z0-9._-]{1,128}$; no whitespace or slashes.
- TenantContext.shardId is in [0, ringSize-1]; ringVersion is strictly positive.
- The write path requires a resolved TenantContext; reject writes lacking one.
- Residency: effectiveRegion ∈ allowedRegions; when pinToHome=true, authoritative writes must target homeRegion.
- RLS policies must be present and ON for all multi-tenant tables, including projections and CDC shadow tables.
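The first two rules can be checked before any routing happens. The sketch below uses hypothetical names. It anchors the tenantId pattern with \A and \z rather than ^ and $, because in .NET regular expressions $ also matches just before a trailing newline. With $, the value "splootvets\n" would pass.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TenantContextValidator
{
    // \A and \z anchor the whole string; "$" would also accept a trailing "\n".
    private static readonly Regex TenantIdPattern = new(@"\A[A-Za-z0-9._-]{1,128}\z", RegexOptions.CultureInvariant);

    public static bool IsValidTenantId(string? tenantId) =>
        tenantId is not null && TenantIdPattern.IsMatch(tenantId);

    // shardId must fall in [0, ringSize - 1] and ringVersion must be strictly positive.
    public static bool IsValid(string? tenantId, int shardId, int ringVersion, int ringSize) =>
        IsValidTenantId(tenantId) && ringSize > 0 && shardId >= 0 && shardId < ringSize && ringVersion > 0;
}
```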
Authoritative Stores (Write Path)¶
Models the append-only source of truth for audit facts. The write path persists a canonical AuditRecord (minus late-bound integrity) and enforces WORM (Write-Once-Read-Many) semantics with minimal indexes for durable ingestion and backpressure-friendly throughput.
Overview¶
- Append-only facts: New rows are inserted; no UPDATE/DELETE. Late-bound materials (e.g., integrity proofs) land in sidecar append tables.
- Tenant-first: All rows are keyed by TenantId (see Tenancy Keys & Partitioning). Cross-tenant joins are prohibited.
- Canonical JSON: Store the full record as canonical JSON (JCS/RFC 8785) in PayloadJson using camelCase field names; the integrity node is excluded at write.
- Idempotency: A per-tenant IdempotencyKey supports safe retries; duplicates are ignored and the existing AuditRecordId is returned.
- Minimal indexing: Only the keys needed for durability, dedupe, and range scans on time. All query-optimized shapes live in read projections (query path).
C# properties / gRPC code-first: PascalCase. JSON payload: lowerCamel. Tables/columns: PascalCase.
Logical model (authoritative)¶
AuditRecords (append-only, authoritative fact)
| Column (PascalCase) | Type | Req. | Description |
|---|---|---|---|
| AuditRecordId | ULID (CHAR(26)) | ✓ | Primary key (time-ordered). |
| TenantId | string | ✓ | Tenant scope token. |
| CreatedAt | timestamp (UTC) | ✓ | When the producer says this fact occurred. |
| ObservedAt | timestamp (UTC) | ✓ | When the platform observed/accepted it. |
| EffectiveAt | timestamp (UTC) | | Optional domain-effective time. |
| Action | string | ✓ | verb or verb.noun (lowercase). |
| ResourceType | string | ✓ | From resource.type (PascalCase). |
| ResourceId | string | ✓ | From resource.id (opaque). |
| ResourcePath | string | | Optional JSON-Pointer-style path. |
| ActorId | string | ✓ | From actor.id (opaque). |
| ActorType | enum | ✓ | Unknown, User, Service, or Job. |
| CorrelationTraceId | hex32 | ✓ | W3C trace id. |
| CorrelationRequestId | string | | Optional request token. |
| DecisionOutcome | enum | | If present on write (Allow, Deny, NotApplicable, or Unknown). |
| IdempotencyKey | string | | Optional per-tenant dedupe key. |
| SchemaVersion | smallint | ✓ | AuditRecord schema version embedded in PayloadJson. |
| PayloadJson | JSON/JSONB/NVARCHAR(MAX) | ✓ | Entire canonical AuditRecord JSON without integrity. |
| PayloadBytes | int | ✓ | Raw payload size (bytes), for budgeting/backpressure. |
RecordIntegrity (sidecar, append-only; set by Integrity Service post-seal)
| Column | Type | Req. | Description |
|---|---|---|---|
| AuditRecordId | ULID | ✓ | FK → AuditRecords. |
| BlockId | ULID | ✓ | Integrity block that sealed this record. |
| SegmentId | ULID | ✓ | Segment containing the leaf. |
| LeafIndex | int | ✓ | Zero-based leaf index in segment. |
| LeafHash | hex64 | ✓ | SHA-256 of canonical record bytes (no integrity). |
| Algo | string | ✓ | SHA256. |
| MerklePathJson | JSON | ✓ | Array of { pos: "L" or "R", hash: hex64 }. |
| SealedAt | timestamp (UTC) | ✓ | When the block was sealed/signed. |
Keeping integrity in a sidecar maintains strict WORM for AuditRecords while still allowing verifiable proofs.
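The verification sketch below shows how LeafHash and MerklePathJson combine into a proof against the segment's root hash. It assumes pos names the side on which the sibling sits. It also assumes each parent is SHA-256(left || right) over raw 32-byte digests, with no domain-separation prefix. The Integrity Service's actual node-hashing rule may differ; RFC 6962, for example, prefixes leaf and interior hashes. Treat MerkleStep and MerkleProof as illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;

// One MerklePathJson entry: Pos is "L" or "R", Hash is a hex SHA-256 digest.
public sealed record MerkleStep(string Pos, string Hash);

public static class MerkleProof
{
    // Assumption: Pos is the sibling's side; parent = SHA-256(left || right), no prefixes.
    public static bool Verify(byte[] canonicalRecordBytes, IReadOnlyList<MerkleStep> path, string expectedRootHex)
    {
        byte[] node = SHA256.HashData(canonicalRecordBytes); // LeafHash
        foreach (MerkleStep step in path)
        {
            byte[] sibling = Convert.FromHexString(step.Hash);
            byte[] combined = new byte[node.Length + sibling.Length];
            if (step.Pos == "L")
            {
                sibling.CopyTo(combined, 0);
                node.CopyTo(combined, sibling.Length);
            }
            else if (step.Pos == "R")
            {
                node.CopyTo(combined, 0);
                sibling.CopyTo(combined, node.Length);
            }
            else throw new FormatException($"Unknown path position '{step.Pos}'.");
            node = SHA256.HashData(combined);
        }
        return CryptographicOperations.FixedTimeEquals(node, Convert.FromHexString(expectedRootHex));
    }
}
```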
C# (persistence rows; gRPC code-first)¶
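using System;
using System.Runtime.Serialization;
// ActorType and DecisionOutcome are assumed to be the enums defined with the canonical AuditRecord model.
[DataContract]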
public sealed class AuditRecordRow
{
[DataMember(Order = 1)] public string AuditRecordId { get; init; } = default!; // ULID
[DataMember(Order = 2)] public string TenantId { get; init; } = default!;
[DataMember(Order = 3)] public DateTimeOffset CreatedAt { get; init; }
[DataMember(Order = 4)] public DateTimeOffset ObservedAt { get; init; }
[DataMember(Order = 5)] public DateTimeOffset? EffectiveAt { get; init; }
[DataMember(Order = 6)] public string Action { get; init; } = default!;
[DataMember(Order = 7)] public string ResourceType { get; init; } = default!;
[DataMember(Order = 8)] public string ResourceId { get; init; } = default!;
[DataMember(Order = 9)] public string? ResourcePath { get; init; }
[DataMember(Order = 10)] public string ActorId { get; init; } = default!;
[DataMember(Order = 11)] public ActorType ActorType { get; init; } = ActorType.Unknown;
[DataMember(Order = 12)] public string CorrelationTraceId { get; init; } = default!;
[DataMember(Order = 13)] public string? CorrelationRequestId { get; init; }
[DataMember(Order = 14)] public DecisionOutcome? DecisionOutcome { get; init; }
[DataMember(Order = 15)] public string? IdempotencyKey { get; init; }
[DataMember(Order = 16)] public short SchemaVersion { get; init; } = 1;
[DataMember(Order = 17)] public string PayloadJson { get; init; } = default!; // canonical JSON (lowerCamel)
[DataMember(Order = 18)] public int PayloadBytes { get; init; }
}
[DataContract]
public sealed class RecordIntegrityRow
{
[DataMember(Order = 1)] public string AuditRecordId { get; init; } = default!;
[DataMember(Order = 2)] public string BlockId { get; init; } = default!;
[DataMember(Order = 3)] public string SegmentId { get; init; } = default!;
[DataMember(Order = 4)] public int LeafIndex { get; init; }
[DataMember(Order = 5)] public string LeafHash { get; init; } = default!; // hex SHA-256
[DataMember(Order = 6)] public string Algo { get; init; } = "SHA256";
[DataMember(Order = 7)] public string MerklePathJson { get; init; } = default!; // JSON array
[DataMember(Order = 8)] public DateTimeOffset SealedAt { get; init; }
}
PayloadJson follows the canonical JSON rules from Modeling Principles & Conventions. SchemaVersion must match the embedded auditRecord.schemaVersion.
Storage mapping (PostgreSQL)¶
-- Authoritative facts (append-only)
CREATE TABLE "AuditRecords" (
"AuditRecordId" CHAR(26) PRIMARY KEY,
"TenantId" TEXT NOT NULL,
"CreatedAt" TIMESTAMPTZ NOT NULL,
"ObservedAt" TIMESTAMPTZ NOT NULL,
"EffectiveAt" TIMESTAMPTZ NULL,
"Action" TEXT NOT NULL,
"ResourceType" TEXT NOT NULL,
"ResourceId" TEXT NOT NULL,
"ResourcePath" TEXT NULL,
"ActorId" TEXT NOT NULL,
"ActorType" SMALLINT NOT NULL, -- enum ordinal
"CorrelationTraceId" CHAR(32) NOT NULL, -- hex
"CorrelationRequestId" TEXT NULL,
"DecisionOutcome" SMALLINT NULL, -- enum ordinal
"IdempotencyKey" TEXT NULL,
"SchemaVersion" SMALLINT NOT NULL DEFAULT 1,
"PayloadJson" JSONB NOT NULL, -- JSONB does not preserve key order or whitespace: re-canonicalize (JCS) before hashing
"PayloadBytes" INTEGER NOT NULL
);
-- Sidecar integrity (append-only)
CREATE TABLE "RecordIntegrity" (
"AuditRecordId" CHAR(26) PRIMARY KEY REFERENCES "AuditRecords"("AuditRecordId"),
"BlockId" CHAR(26) NOT NULL,
"SegmentId" CHAR(26) NOT NULL,
"LeafIndex" INTEGER NOT NULL,
"LeafHash" CHAR(64) NOT NULL, -- hex
"Algo" TEXT NOT NULL DEFAULT 'SHA256',
"MerklePathJson" JSONB NOT NULL, -- [{pos:'L'|'R',hash:'...'}, ...]
"SealedAt" TIMESTAMPTZ NOT NULL
);
-- Minimal indexes for durability & idempotency
CREATE INDEX "IX_Audit_Tenant_CreatedAt" ON "AuditRecords" ("TenantId","CreatedAt");
CREATE INDEX "IX_Audit_Tenant_Trace" ON "AuditRecords" ("TenantId","CorrelationTraceId");
CREATE UNIQUE INDEX "UX_Audit_Tenant_Idem" ON "AuditRecords" ("TenantId","IdempotencyKey")
WHERE "IdempotencyKey" IS NOT NULL;
-- WORM enforcement: block UPDATE/DELETE; allow INSERT only
CREATE OR REPLACE FUNCTION fn_auditrecords_block_ud() RETURNS trigger AS $$
BEGIN
RAISE EXCEPTION 'WORM: AuditRecords are append-only';
END; $$ LANGUAGE plpgsql;
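CREATE TRIGGER "trg_auditrecords_no_update"
BEFORE UPDATE OR DELETE ON "AuditRecords" FOR EACH ROW EXECUTE FUNCTION fn_auditrecords_block_ud();
-- Row-level triggers do not fire on TRUNCATE; block it with a statement-level trigger
CREATE TRIGGER "trg_auditrecords_no_truncate"
BEFORE TRUNCATE ON "AuditRecords" FOR EACH STATEMENT EXECUTE FUNCTION fn_auditrecords_block_ud();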
CREATE OR REPLACE FUNCTION fn_recordintegrity_block_ud() RETURNS trigger AS $$
BEGIN
RAISE EXCEPTION 'WORM: RecordIntegrity is append-only';
END; $$ LANGUAGE plpgsql;
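CREATE TRIGGER "trg_recordintegrity_no_update"
BEFORE UPDATE OR DELETE ON "RecordIntegrity" FOR EACH ROW EXECUTE FUNCTION fn_recordintegrity_block_ud();
-- Row-level triggers do not fire on TRUNCATE; block it with a statement-level trigger
CREATE TRIGGER "trg_recordintegrity_no_truncate"
BEFORE TRUNCATE ON "RecordIntegrity" FOR EACH STATEMENT EXECUTE FUNCTION fn_recordintegrity_block_ud();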
Storage mapping (SQL Server)¶
-- Authoritative facts (append-only)
CREATE TABLE dbo.AuditRecords (
AuditRecordId CHAR(26) NOT NULL CONSTRAINT PK_AuditRecords PRIMARY KEY,
TenantId NVARCHAR(128) NOT NULL,
CreatedAt DATETIME2(3) NOT NULL,
ObservedAt DATETIME2(3) NOT NULL,
EffectiveAt DATETIME2(3) NULL,
Action NVARCHAR(64) NOT NULL,
ResourceType NVARCHAR(128) NOT NULL,
ResourceId NVARCHAR(128) NOT NULL,
ResourcePath NVARCHAR(256) NULL,
ActorId NVARCHAR(128) NOT NULL,
ActorType SMALLINT NOT NULL, -- enum ordinal
CorrelationTraceId CHAR(32) NOT NULL,
CorrelationRequestId NVARCHAR(128) NULL,
DecisionOutcome SMALLINT NULL, -- enum ordinal
IdempotencyKey NVARCHAR(128) NULL,
SchemaVersion SMALLINT NOT NULL CONSTRAINT DF_AuditRecords_SchemaVersion DEFAULT (1),
PayloadJson NVARCHAR(MAX) NOT NULL,
PayloadBytes INT NOT NULL
);
-- Sidecar integrity
CREATE TABLE dbo.RecordIntegrity (
AuditRecordId CHAR(26) NOT NULL CONSTRAINT PK_RecordIntegrity PRIMARY KEY
CONSTRAINT FK_RecordIntegrity_Audit FOREIGN KEY REFERENCES dbo.AuditRecords(AuditRecordId),
BlockId CHAR(26) NOT NULL,
SegmentId CHAR(26) NOT NULL,
LeafIndex INT NOT NULL,
LeafHash CHAR(64) NOT NULL,
Algo NVARCHAR(16) NOT NULL CONSTRAINT DF_RecordIntegrity_Algo DEFAULT ('SHA256'),
MerklePathJson NVARCHAR(MAX) NOT NULL,
SealedAt DATETIME2(3) NOT NULL
);
-- Minimal indexes
CREATE INDEX IX_Audit_Tenant_CreatedAt ON dbo.AuditRecords (TenantId, CreatedAt);
CREATE INDEX IX_Audit_Tenant_Trace ON dbo.AuditRecords (TenantId, CorrelationTraceId);
CREATE UNIQUE INDEX UX_Audit_Tenant_Idem ON dbo.AuditRecords (TenantId, IdempotencyKey) WHERE IdempotencyKey IS NOT NULL;
-- WORM enforcement via INSTEAD OF triggers (CREATE TRIGGER must start its own batch)
GO
CREATE TRIGGER trg_AuditRecords_NoUpdateDelete ON dbo.AuditRecords
INSTEAD OF UPDATE, DELETE AS
BEGIN
RAISERROR ('WORM: AuditRecords are append-only', 16, 1);
END;
GO
CREATE TRIGGER trg_RecordIntegrity_NoUpdateDelete ON dbo.RecordIntegrity
INSTEAD OF UPDATE, DELETE AS
BEGIN
RAISERROR ('WORM: RecordIntegrity is append-only', 16, 1);
END;
GO
Write path flow (high level)¶
- Ingress (Gateway/Service) validates and canonicalizes the AuditRecord (JSON, no integrity), assigns AuditRecordId (ULID), and computes PayloadBytes.
- Idempotency check: if an IdempotencyKey is provided, the insert relies on the unique (TenantId, IdempotencyKey) index; on conflict, return the existing AuditRecordId.
- Insert into AuditRecords with minimal indexes only (low write amplification).
- The Integrity Service batches records into segments/blocks, seals them, and appends a RecordIntegrity row for each included AuditRecordId, carrying LeafHash, MerklePathJson, and SealedAt.
- Projectors build query-optimized read models asynchronously (see Read Models & Projections).
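The idempotency step behaves like a get-or-add on (TenantId, IdempotencyKey). The in-memory sketch below shows those semantics only. IdempotentAppender is hypothetical. In the real store, the guarantee comes from the UX_Audit_Tenant_Idem unique index, not from application state.

```csharp
using System;
using System.Collections.Concurrent;

// In-memory stand-in for the filtered unique index UX_Audit_Tenant_Idem.
public sealed class IdempotentAppender
{
    private readonly ConcurrentDictionary<(string TenantId, string Key), string> _byKey = new();

    // Returns the AuditRecordId that owns the key. Inserted is true when the caller's
    // new record won (and should be written); false when a retry hit an existing key.
    public (string AuditRecordId, bool Inserted) Append(string tenantId, string? idempotencyKey, string newAuditRecordId)
    {
        if (idempotencyKey is null) return (newAuditRecordId, true); // no key: no dedupe
        string owner = _byKey.GetOrAdd((tenantId, idempotencyKey), newAuditRecordId);
        return (owner, owner == newAuditRecordId);
    }
}
```

Because the key includes TenantId, two tenants may reuse the same IdempotencyKey without colliding.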
Budgets & caps¶
- Max PayloadBytes at write: 256 KiB (see Performance & Size Budgets). Oversized records must be pre-redacted or summarized (hash + Delta.truncated=true).
- Max write QPS per tenant (soft): tiered by edition; apply backpressure when PayloadBytes or QPS budgets are exceeded.
- IdempotencyKey TTL: keep the unique key for ≥ 24 hours (configurable) to absorb retries safely.
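The sketch below shows the size gate. It assumes PayloadBytes is the UTF-8 byte length of the canonical JSON, which fits the UTF-8-everywhere convention but is an assumption here. It also treats the 256 KiB cap as inclusive. PayloadBudget is a hypothetical helper. Counting bytes rather than characters matters because a non-ASCII code point takes 2 to 4 bytes in UTF-8.

```csharp
using System;
using System.Text;

public static class PayloadBudget
{
    // 256 KiB cap from the budget above (assumed inclusive).
    public const int MaxPayloadBytes = 256 * 1024;

    // PayloadBytes measured as the UTF-8 byte length of the canonical JSON.
    public static int MeasureBytes(string canonicalJson) => Encoding.UTF8.GetByteCount(canonicalJson);

    public static bool WithinBudget(string canonicalJson, out int payloadBytes)
    {
        payloadBytes = MeasureBytes(canonicalJson);
        return payloadBytes <= MaxPayloadBytes;
    }
}
```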
WORM guidance & operational controls¶
- SQL-layer WORM: Use INSTEAD OF UPDATE/DELETE triggers (SQL Server) or BEFORE UPDATE/DELETE triggers (PostgreSQL) to block mutations.
- ACLs: Only the ingestion service account has INSERT; reporting users get SELECT only. Deny schema-altering privileges.
- Physical immutability (optional): Stream all inserted rows into an object-storage WORM tier (e.g., S3 Object Lock / Azure Immutable Blob) for secondary immutability and eDiscovery exports.
- Retention: Deletions happen only via lifecycle after records are Eligible and not OnHold (see Retention Policy & Legal Hold). Lifecycle hard-deletes the record from AuditRecords together with its RecordIntegrity row. The WORM triggers above block DELETE and the foreign key does not cascade, so this path needs an explicit, audited exemption and must remove the RecordIntegrity row first.
Validation rules (summary)¶
- TenantId is required and must pass the tenancy predicate (RLS).
- CreatedAt ≤ ObservedAt ≤ now (CreatedAt may lead ObservedAt by at most the ±5 min clock-skew allowance); EffectiveAt ≤ CreatedAt (if present).
- ResourceType is PascalCase; Action is a lowercase verb or verb.noun.
- CorrelationTraceId is hex32 (W3C).
- IdempotencyKey is unique per (TenantId, IdempotencyKey) when not null.
- PayloadJson must validate against AuditRecord v{SchemaVersion} and use lowerCamel property names.
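The validator sketch below covers the shape and time rules. Its regular expressions are assumptions inferred from the examples: dotted PascalCase such as Vetspire.Appointment, and a lowercase verb or verb.noun. The trace-id check follows the W3C Trace Context form of 32 lowercase hex characters, not all zeros. The CreatedAt check tolerates the ±5 min clock skew from the conventions. WritePathRules is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class WritePathRules
{
    private static readonly TimeSpan MaxSkew = TimeSpan.FromMinutes(5);
    // Assumed shapes: dotted PascalCase segments and lowercase verb[.noun].
    private static readonly Regex ResourceTypeRx = new(@"\A[A-Z][A-Za-z0-9]*(\.[A-Z][A-Za-z0-9]*)*\z");
    private static readonly Regex ActionRx = new(@"\A[a-z][a-z0-9]*(\.[a-z][a-z0-9]*)?\z");
    // W3C Trace Context trace-id: 32 lowercase hex characters, not all zeros.
    private static readonly Regex TraceIdRx = new(@"\A[0-9a-f]{32}\z");

    // Returns the names of the fields that violate a rule (empty when valid).
    public static IReadOnlyList<string> Validate(
        string action, string resourceType, string traceId,
        DateTimeOffset createdAt, DateTimeOffset observedAt, DateTimeOffset? effectiveAt, DateTimeOffset now)
    {
        var errors = new List<string>();
        if (!ActionRx.IsMatch(action)) errors.Add("action");
        if (!ResourceTypeRx.IsMatch(resourceType)) errors.Add("resourceType");
        if (!TraceIdRx.IsMatch(traceId) || traceId == new string('0', 32)) errors.Add("correlationTraceId");
        // CreatedAt <= ObservedAt, tolerating producer clock skew; ObservedAt <= now.
        if (createdAt > observedAt + MaxSkew) errors.Add("createdAt");
        if (observedAt > now) errors.Add("observedAt");
        if (effectiveAt is { } eff && eff > createdAt) errors.Add("effectiveAt");
        return errors;
    }
}
```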
Read Models & Projections (Query Path)¶
Defines query-optimized projections used by APIs, consoles, and exports. Projections are derived, denormalized, and rebuildable from the authoritative append store. They support seek-pagination, per-tenant watermarks, and idempotent upserts.
JSON: lowerCamel. C#/gRPC (code-first): PascalCase. Tables/columns: PascalCase.
Overview¶
- Shapes:
  - Events: flat, filterable event stream per tenant for search/list views.
  - Resource Timeline: fast per-resource history (resource.type + resource.id).
  - Actor Activity: fast per-actor history.
- Selective fields: Only hot fields are projected (the canonical record stays in the authoritative store).
- At-least-once projectors: Use idempotent upserts keyed by (TenantId, AuditRecordId).
- Watermarks & checkpoints per tenant and projection, for resumable processing and rebuilds.
- Seek-pagination: stable order (CreatedAt, AuditRecordId); opaque base64url cursor.
Canonical event projection¶
AuditEvents (one row per AuditRecord)
| Column (PascalCase) | Type | Req. | Notes |
|---|---|---|---|
| TenantId | string | ✓ | Partition/RLS key. |
| AuditRecordId | ULID | ✓ | Unique; PK with tenant. |
| CreatedAt | timestamp (UTC) | ✓ | Primary sort key. |
| ObservedAt | timestamp (UTC) | ✓ | Secondary time. |
| Action | string | ✓ | verb or verb.noun (lowercase). |
| ResourceType | string | ✓ | PascalCase. |
| ResourceId | string | ✓ | Opaque id. |
| ActorId | string | ✓ | Opaque id. |
| ActorType | smallint | ✓ | Enum ordinal. |
| DecisionOutcome | smallint | | Enum ordinal, if present. |
| ChangedFields | nvarchar/json | | e.g., ["status","/lines/0/price"] (summary). |
| DataClassFlags | smallint | | Bitmask over DataClass (Public=1, Internal=2, Personal=4, Sensitive=8, Credential=16, Phi=32). |
| CorrelationTraceId | char(32) | ✓ | hex32. |
| IntegrityBlockId | ULID | | Optional mirror for join-free proof lookups. |
| PayloadBytes | int | ✓ | For paging/budget hints. |
Indexes
- PK: (TenantId, AuditRecordId)
- Sort/filter: IX_AuditEvents_Tenant_CreatedAt (TenantId, CreatedAt DESC, AuditRecordId DESC)
- Selectivity helpers:
  - (TenantId, ResourceType, ResourceId, CreatedAt DESC, AuditRecordId DESC)
  - (TenantId, ActorId, CreatedAt DESC, AuditRecordId DESC)
  - (TenantId, DecisionOutcome)
  - (TenantId, DataClassFlags)
ChangedFields is a compact array (≤64 entries) extracted from Delta.fields keys.
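The sketch below shows how a projector might fold a record's data classes into DataClassFlags, using the bit values listed in the table. The [Flags] enum and DataClassMapping are hypothetical. Input names are assumed to match the DataClass names exactly, as in the export filter example (["Personal","Sensitive"]).

```csharp
using System;
using System.Collections.Generic;

// Bit values from the DataClassFlags column description.
[Flags]
public enum DataClassFlags : short
{
    None = 0, Public = 1, Internal = 2, Personal = 4, Sensitive = 8, Credential = 16, Phi = 32
}

public static class DataClassMapping
{
    // Folds data class names into the bitmask; unknown names are rejected.
    public static DataClassFlags ToFlags(IEnumerable<string> dataClasses)
    {
        var flags = DataClassFlags.None;
        foreach (string name in dataClasses)
        {
            if (!Enum.TryParse(name, ignoreCase: false, out DataClassFlags one) || !Enum.IsDefined(one) || one == DataClassFlags.None)
                throw new ArgumentException($"Unknown data class '{name}'.");
            flags |= one;
        }
        return flags;
    }

    // Query helper: does a row's mask include any of the requested classes?
    public static bool HasAny(DataClassFlags row, DataClassFlags requested) => (row & requested) != 0;
}
```

For example, ["Personal","Sensitive"] maps to 4 | 8 = 12.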
Resource timeline projection¶
ResourceEvents (subset tuned for GET /resources/{type}/{id}/events)
| Column | Type | Req. | Notes |
|---|---|---|---|
| `TenantId` | string | ✓ | |
| `ResourceType` | string | ✓ | |
| `ResourceId` | string | ✓ | |
| `Seq` | bigint | ✓ | Monotonic per (Tenant, Resource); gapless on a best-effort basis. |
| `CreatedAt` | timestamp | ✓ | |
| `AuditRecordId` | ULID | ✓ | |
| `Action` | string | ✓ | |
| `ActorId` | string | ✓ | |
| `DecisionOutcome` | smallint | | |
| `ChangedFields` | nvarchar/json | | |
Indexes
- PK: `(TenantId, ResourceType, ResourceId, Seq)`
- Seek: `(TenantId, ResourceType, ResourceId, CreatedAt DESC, AuditRecordId DESC)`
Actor activity projection¶
ActorEvents (subset tuned for GET /actors/{actorId}/events)
| Column | Type | Req. |
|---|---|---|
| `TenantId` | string | ✓ |
| `ActorId` | string | ✓ |
| `Seq` | bigint | ✓ |
| `CreatedAt` | timestamp | ✓ |
| `AuditRecordId` | ULID | ✓ |
| `Action` | string | ✓ |
| `ResourceType` | string | ✓ |
| `ResourceId` | string | ✓ |
| `DecisionOutcome` | smallint | |
Indexes
- PK: `(TenantId, ActorId, Seq)`
- Seek: `(TenantId, ActorId, CreatedAt DESC, AuditRecordId DESC)`
Watermarks & checkpoints¶
ProjectionCheckpoints (one row per tenant × projection)
| Column | Type | Req. | Notes |
|---|---|---|---|
| `Projection` | string | ✓ | e.g., `AuditEvents`, `ResourceEvents`, `ActorEvents`. |
| `TenantId` | string | ✓ | |
| `HighWaterRecordId` | ULID | ✓ | Last fully applied `AuditRecordId`. |
| `HighWaterObservedAt` | timestamp | ✓ | Tie-break / time sanity check. |
| `Version` | int | ✓ | Projection schema version. |
| `UpdatedAt` | timestamp | ✓ | Monotonic clock. |
| `RebuildToken` | nvarchar | | Opaque state during a rebuild (optional). |
Semantics
- At-least-once: projectors may re-process a record, so every target table uses an UPSERT on `(TenantId, AuditRecordId)` or `(TenantId, Key, Seq)` with idempotent content.
- Rebuild: reset the checkpoint to the floor (`HighWaterRecordId = 000…`) and stream forward; hold a writer-exclusive lease so that two writers never update the same projection concurrently.
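A minimal in-memory sketch of these checkpoint semantics, assuming records are applied in ULID order (if they are applied in `(TenantId, CreatedAt, AuditRecordId)` order instead, the comparison key would have to change). `CheckpointStore` is a hypothetical name; the 26-zero floor used here is the lowest canonical ULID string.

```csharp
using System;
using System.Collections.Generic;

// In-memory stand-in for a ProjectionCheckpoints row.
public sealed record Checkpoint(string HighWaterRecordId, DateTimeOffset HighWaterObservedAt, int Version);

public sealed class CheckpointStore
{
    // Assumed floor value: 26 zeros sorts before every real ULID.
    public const string Floor = "00000000000000000000000000";
    private readonly Dictionary<(string Tenant, string Projection), Checkpoint> _rows = new();

    public Checkpoint Get(string tenant, string projection) =>
        _rows.TryGetValue((tenant, projection), out var cp) ? cp : new Checkpoint(Floor, DateTimeOffset.MinValue, 1);

    // Advance only forward. Canonical ULIDs use Crockford base32, whose alphabet is in
    // ASCII order, so ordinal string comparison matches ULID order.
    public bool TryAdvance(string tenant, string projection, string recordId, DateTimeOffset observedAt)
    {
        var current = Get(tenant, projection);
        if (string.CompareOrdinal(recordId, current.HighWaterRecordId) <= 0)
            return false; // replay of an already-applied record: watermark unchanged
        _rows[(tenant, projection)] = current with { HighWaterRecordId = recordId, HighWaterObservedAt = observedAt };
        return true;
    }

    // Rebuild: reset to the floor and record the new projection version.
    public void ResetForRebuild(string tenant, string projection, int newVersion) =>
        _rows[(tenant, projection)] = new Checkpoint(Floor, DateTimeOffset.MinValue, newVersion);
}
```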
Pagination cursors (seek)¶
Sort order: `(CreatedAt ASC, AuditRecordId ASC)` for forward listings; both keys `DESC` for reverse listings.
Cursor payload (binary layout)
```text
{ version:1, direction:'f'|'b', createdAtUtc: int64 (ms, big-endian), auditRecordId: 26-byte ULID (ASCII) }
```
Encoded as base64url; opaque to clients.
Request parameters
- `cursor` (string, optional)
- `limit` (1–1000; default 100)
- `direction` (`forward` | `backward`; default `forward`)

Next cursor generation
- Forward paging: encode the last row's `(CreatedAt, AuditRecordId)`.
- Backward paging: encode the first row's keys.
Cursors are per-tenant; APIs must enforce that the cursor’s tenant matches the request’s tenant.
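Seek pagination can be illustrated without a database. This sketch (hypothetical `SeekPager` and `PageRow`) pages an in-memory sequence forward using the keyset predicate `(CreatedAt, AuditRecordId) > cursor`; a SQL implementation would express the same predicate against the indexes above. Encoding the returned key into an opaque cursor is left out, and treating a short page as the last page is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed record PageRow(DateTimeOffset CreatedAt, string AuditRecordId);

public static class SeekPager
{
    // Keyset comparison on (CreatedAt, AuditRecordId); the ULID tie-break makes the order total.
    private static int Compare(PageRow r, DateTimeOffset ts, string id)
    {
        int c = r.CreatedAt.CompareTo(ts);
        return c != 0 ? c : string.CompareOrdinal(r.AuditRecordId, id);
    }

    // Returns the rows strictly after `after`, plus the key for the next cursor (null when done).
    public static (IReadOnlyList<PageRow> Page, (DateTimeOffset, string)? Next) Forward(
        IEnumerable<PageRow> rows, (DateTimeOffset Ts, string Id)? after, int limit)
    {
        limit = Math.Clamp(limit, 1, 1000); // page-size budget
        var page = rows
            .Where(r => after is null || Compare(r, after.Value.Ts, after.Value.Id) > 0)
            .OrderBy(r => r.CreatedAt)
            .ThenBy(r => r.AuditRecordId, StringComparer.Ordinal)
            .Take(limit)
            .ToList();
        // Forward paging: the next cursor carries the last row's keys.
        (DateTimeOffset, string)? next = page.Count == limit ? (page[^1].CreatedAt, page[^1].AuditRecordId) : null;
        return (page, next);
    }
}
```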
JSON Schemas (partials, v1)¶
events-list-response.v1.json
```json
{
  "$id": "urn:connectsoft:schemas:read/events-list-response.v1.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "EventsListResponse",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "items": {
      "type": "array",
      "items": { "$ref": "urn:connectsoft:schemas:read/event-row.v1.json" }
    },
    "next": { "type": "string" },
    "prev": { "type": "string" },
    "count": { "type": "integer", "minimum": 0 }
  },
  "required": ["items"]
}
```
event-row.v1.json (projection row)
```json
{
  "$id": "urn:connectsoft:schemas:read/event-row.v1.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "EventRow",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "auditRecordId": { "type": "string", "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$" },
    "createdAt": { "type": "string", "format": "date-time" },
    "observedAt": { "type": "string", "format": "date-time" },
    "action": { "type": "string" },
    "resourceType": { "type": "string" },
    "resourceId": { "type": "string" },
    "actorId": { "type": "string" },
    "actorType": { "type": "string" },
    "decisionOutcome": { "type": "string" },
    "changedFields": { "type": "array", "items": { "type": "string" }, "maxItems": 64 },
    "dataClassFlags": { "type": "integer", "minimum": 0 }
  },
  "required": ["auditRecordId","createdAt","action","resourceType","resourceId","actorId"]
}
```
C# (gRPC code-first)¶
```csharp
using System;
using System.Collections.Generic;
using System.Runtime.Serialization;

[DataContract]
public sealed class ListEventsRequest
{
    [DataMember(Order = 1)] public string TenantId { get; init; } = default!;
    [DataMember(Order = 2)] public string? Cursor { get; init; }
    [DataMember(Order = 3)] public int Limit { get; init; } = 100;
    [DataMember(Order = 4)] public string Direction { get; init; } = "forward"; // forward|backward
    // Optional filters (applied when generating the page)
    [DataMember(Order = 5)] public string? ResourceType { get; init; }
    [DataMember(Order = 6)] public string? ResourceId { get; init; }
    [DataMember(Order = 7)] public string? ActorId { get; init; }
    [DataMember(Order = 8)] public string? Action { get; init; }
    [DataMember(Order = 9)] public short? DecisionOutcome { get; init; }
    [DataMember(Order = 10)] public short? DataClassFlags { get; init; } // bitmask
    [DataMember(Order = 11)] public DateTimeOffset? From { get; init; }
    [DataMember(Order = 12)] public DateTimeOffset? To { get; init; }
}

[DataContract]
public sealed class EventRow
{
    [DataMember(Order = 1)] public string AuditRecordId { get; init; } = default!;
    [DataMember(Order = 2)] public DateTimeOffset CreatedAt { get; init; }
    [DataMember(Order = 3)] public DateTimeOffset ObservedAt { get; init; }
    [DataMember(Order = 4)] public string Action { get; init; } = default!;
    [DataMember(Order = 5)] public string ResourceType { get; init; } = default!;
    [DataMember(Order = 6)] public string ResourceId { get; init; } = default!;
    [DataMember(Order = 7)] public string ActorId { get; init; } = default!;
    [DataMember(Order = 8)] public string ActorType { get; init; } = "Unknown";
    [DataMember(Order = 9)] public string? DecisionOutcome { get; init; }
    [DataMember(Order = 10)] public IReadOnlyList<string>? ChangedFields { get; init; }
    [DataMember(Order = 11)] public short? DataClassFlags { get; init; }
}

[DataContract]
public sealed class ListEventsResponse
{
    [DataMember(Order = 1)] public IReadOnlyList<EventRow> Items { get; init; } = Array.Empty<EventRow>();
    [DataMember(Order = 2)] public string? Next { get; init; }
    [DataMember(Order = 3)] public string? Prev { get; init; }
    [DataMember(Order = 4)] public int Count { get; init; }
}
```
Cursor utility (example)
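```csharp
using System;
using System.Buffers.Binary;
using System.Text;

public static class CursorCodec
{
    // Layout: version (1) | direction 'f'/'b' (1) | createdAtUtc ms, big-endian int64 (8) | ULID ASCII (26)
    private const byte Version = 1;
    private const int Length = 1 + 1 + 8 + 26;
    public static string Encode(DateTimeOffset createdAt, string auditRecordId, bool forward = true)
    {
        if (auditRecordId?.Length != 26) throw new ArgumentException("Expected a 26-char ULID.", nameof(auditRecordId));
        Span<byte> buf = stackalloc byte[Length];
        buf[0] = Version;
        buf[1] = forward ? (byte)'f' : (byte)'b';
        BinaryPrimitives.WriteInt64BigEndian(buf.Slice(2, 8), createdAt.ToUnixTimeMilliseconds()); // host-endianness independent
        Encoding.ASCII.GetBytes(auditRecordId, buf.Slice(10, 26));
        return Convert.ToBase64String(buf).Replace('+', '-').Replace('/', '_').TrimEnd('=');
    }
    public static (DateTimeOffset ts, string id, bool forward) Decode(string cursor)
    {
        var b64 = cursor.Replace('-', '+').Replace('_', '/');
        var bytes = Convert.FromBase64String(b64.PadRight(b64.Length + (4 - b64.Length % 4) % 4, '=')); // restore padding
        if (bytes.Length != Length || bytes[0] != Version || (bytes[1] != 'f' && bytes[1] != 'b'))
            throw new FormatException("Malformed or unsupported cursor.");
        var ts = DateTimeOffset.FromUnixTimeMilliseconds(BinaryPrimitives.ReadInt64BigEndian(bytes.AsSpan(2, 8)));
        return (ts, Encoding.ASCII.GetString(bytes, 10, 26), bytes[1] == 'f');
    }
}
```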
Storage mapping (SQL Server / PostgreSQL)¶
SQL Server (illustrative)
```sql
CREATE TABLE dbo.AuditEvents (
    TenantId           NVARCHAR(128) NOT NULL,
    AuditRecordId      CHAR(26)      NOT NULL,
    CreatedAt          DATETIME2(3)  NOT NULL,
    ObservedAt         DATETIME2(3)  NOT NULL,
    Action             NVARCHAR(64)  NOT NULL,
    ResourceType       NVARCHAR(128) NOT NULL,
    ResourceId         NVARCHAR(128) NOT NULL,
    ActorId            NVARCHAR(128) NOT NULL,
    ActorType          SMALLINT      NOT NULL,
    DecisionOutcome    SMALLINT      NULL,
    ChangedFields      NVARCHAR(MAX) NULL,  -- JSON array; 64 entries x 128 chars exceeds NVARCHAR(4000)
    DataClassFlags     SMALLINT      NULL,
    CorrelationTraceId CHAR(32)      NOT NULL,
    IntegrityBlockId   CHAR(26)      NULL,
    PayloadBytes       INT           NOT NULL,
    CONSTRAINT PK_AuditEvents PRIMARY KEY (TenantId, AuditRecordId)
);
CREATE INDEX IX_AuditEvents_Tenant_CreatedAt ON dbo.AuditEvents (TenantId, CreatedAt DESC, AuditRecordId DESC);
CREATE INDEX IX_AuditEvents_Tenant_Res ON dbo.AuditEvents (TenantId, ResourceType, ResourceId, CreatedAt DESC, AuditRecordId DESC);
CREATE INDEX IX_AuditEvents_Tenant_Actor ON dbo.AuditEvents (TenantId, ActorId, CreatedAt DESC, AuditRecordId DESC);
```
PostgreSQL (illustrative)
```sql
CREATE TABLE "AuditEvents" (
    "TenantId"           TEXT        NOT NULL,
    "AuditRecordId"      CHAR(26)    NOT NULL,
    "CreatedAt"          TIMESTAMPTZ NOT NULL,
    "ObservedAt"         TIMESTAMPTZ NOT NULL,
    "Action"             TEXT        NOT NULL,
    "ResourceType"       TEXT        NOT NULL,
    "ResourceId"         TEXT        NOT NULL,
    "ActorId"            TEXT        NOT NULL,
    "ActorType"          SMALLINT    NOT NULL,
    "DecisionOutcome"    SMALLINT    NULL,
    "ChangedFields"      JSONB       NULL,
    "DataClassFlags"     SMALLINT    NULL,
    "CorrelationTraceId" CHAR(32)    NOT NULL,
    "IntegrityBlockId"   CHAR(26)    NULL,
    "PayloadBytes"       INTEGER     NOT NULL,
    PRIMARY KEY ("TenantId","AuditRecordId")
);
CREATE INDEX "IX_AE_Tenant_CreatedAt" ON "AuditEvents" ("TenantId","CreatedAt" DESC,"AuditRecordId" DESC);
CREATE INDEX "IX_AE_Tenant_Res" ON "AuditEvents" ("TenantId","ResourceType","ResourceId","CreatedAt" DESC,"AuditRecordId" DESC);
CREATE INDEX "IX_AE_Tenant_Actor" ON "AuditEvents" ("TenantId","ActorId","CreatedAt" DESC,"AuditRecordId" DESC);
```
Apply RLS policies keyed on `TenantId` exactly as in the authoritative store.
Projection build rules¶
- Input: stream authoritative `AuditRecords` ordered by `(TenantId, CreatedAt, AuditRecordId)` (or by ULID time).
- For each record:
  - Compute `ChangedFields` = keys of `Delta.fields` (bounded to 64).
  - Compute `DataClassFlags` from the record's classification tags.
  - Upsert into `AuditEvents`.
  - Upsert into `ResourceEvents` when `resource` is present, assigning `Seq = lastSeq + 1` per `(TenantId, ResourceType, ResourceId)`.
  - Upsert into `ActorEvents` with `Seq = lastSeq + 1` per `(TenantId, ActorId)`.
- Checkpoint: after a successful batch, advance `ProjectionCheckpoints.HighWater*`.
Idempotency
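- All upserts keyed by `(TenantId, AuditRecordId)` must be deterministic: repeated processing of the same record produces the same row.
- For `ResourceEvents` and `ActorEvents`, a replayed record must reuse the `Seq` it was first assigned instead of receiving `lastSeq + 1` again; otherwise every replay appends a duplicate timeline row.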
Purges
- When lifecycle purges authoritative rows, delete corresponding projection rows (foreign-key cascade or projector “tombstone” stream).
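The replay hazard for `Seq` is easiest to see in code. The sketch below (hypothetical `TimelineProjector`) remembers the `Seq` first assigned to each `(TenantId, AuditRecordId)`, so re-processing a record returns that value instead of appending a new row. In a database, a unique index on `(TenantId, AuditRecordId)` in `ResourceEvents` would play the role of the `_assigned` map; that index is an assumption, not part of the schema above. `ActorEvents` would follow the same pattern keyed by `(TenantId, ActorId)`.

```csharp
using System.Collections.Generic;

public sealed class TimelineProjector
{
    // Last Seq handed out per (TenantId, ResourceType, ResourceId).
    private readonly Dictionary<(string, string, string), long> _lastSeq = new();
    // Seq already assigned per (TenantId, AuditRecordId); stands in for a unique index.
    private readonly Dictionary<(string, string), long> _assigned = new();

    public long Apply(string tenantId, string resourceType, string resourceId, string auditRecordId)
    {
        // Replay of an already-projected record: return the original Seq, append nothing.
        if (_assigned.TryGetValue((tenantId, auditRecordId), out var existing))
            return existing;

        var key = (tenantId, resourceType, resourceId);
        var seq = _lastSeq.TryGetValue(key, out var last) ? last + 1 : 1;
        _lastSeq[key] = seq;
        _assigned[(tenantId, auditRecordId)] = seq;
        return seq;
    }
}
```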
API examples¶
List tenant events (forward, first page)
```json
{
  "items": [
    {
      "auditRecordId": "01JE6KQQD0Q0J5VQ8WJ6T1S9FX",
      "createdAt": "2025-10-22T15:43:11.281Z",
      "observedAt": "2025-10-22T15:43:11.500Z",
      "action": "appointment.update",
      "resourceType": "Vetspire.Appointment",
      "resourceId": "A-9981",
      "actorId": "user_123",
      "actorType": "User",
      "decisionOutcome": "Allow",
      "changedFields": ["status","/lines/0/price"],
      "dataClassFlags": 12
    }
  ],
  "next": "eyJ2IjoxLCJkIjoiZiIsInQiOjE3Mjk2NTUyOTEyODEsImlkIjoiMDFKRTZLU...",
  "count": 1
}
```
List resource timeline (seek)
```json
{
  "items": [
    {
      "auditRecordId": "01JE6KR2P2FT7DSSX9W7EJQ2DT",
      "createdAt": "2025-10-22T15:45:00.002Z",
      "action": "appointment.read",
      "actorId": "svc_gw",
      "decisionOutcome": "NotApplicable",
      "resourceType": "Vetspire.Appointment",
      "resourceId": "A-9981"
    }
  ],
  "next": "eyJ2IjoxLCJkIjoiZiIsInQiOjE3Mjk2NTU0MDAwMDIsImlkIjoiMDFKRTZL..."
}
```
Budgets & caps¶
- Page size: 1–1000 (default 100).
- `ChangedFields`: ≤ 64 entries; each string ≤ 128 chars.
- Projection rebuilds: parallelized per tenant/shard; no cross-tenant fan-in.
- Checkpoint lag SLO: configurable (e.g., p95 < 60 seconds from authoritative write).
Validation rules (summary)¶
- Per-tenant RLS is applied to all projection tables.
- Cursors must decode to monotonic coordinates and match the request's tenant.
- Projectors must never mutate authoritative payloads; projection rows are corrected only by delete/rebuild, never by manual edits.
- During rebuilds, target tables can be shadowed (`…_Rebuild`) and swapped atomically.
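The cursor payload defined under Pagination cursors carries no tenant id, so "match the request tenant" needs some binding between the two. One option, sketched here under that assumption, is to append an HMAC-SHA256 over the tenant id and the payload, computed with a server-side key. `TenantBoundCursor` and the key handling are hypothetical; key rotation is not shown.

```csharp
using System;
using System.Buffers.Binary;
using System.Security.Cryptography;
using System.Text;

public static class TenantBoundCursor
{
    private const int MacLength = 32; // HMAC-SHA256 output size in bytes

    // Returns payload || HMAC-SHA256(key, len(tenant) || tenant || payload).
    public static byte[] Seal(byte[] key, string tenantId, byte[] payload)
    {
        var mac = Mac(key, tenantId, payload);
        var sealedBytes = new byte[payload.Length + MacLength];
        payload.CopyTo(sealedBytes, 0);
        mac.CopyTo(sealedBytes, payload.Length);
        return sealedBytes;
    }

    // Recomputes the MAC for the request's tenant; a cursor minted for another tenant fails.
    public static bool TryOpen(byte[] key, string tenantId, byte[] sealedBytes, out byte[] payload)
    {
        payload = Array.Empty<byte>();
        if (sealedBytes.Length < MacLength) return false;
        var body = sealedBytes.AsSpan(0, sealedBytes.Length - MacLength).ToArray();
        var expected = Mac(key, tenantId, body);
        // Constant-time comparison so timing does not reveal how many MAC bytes matched.
        if (!CryptographicOperations.FixedTimeEquals(expected, sealedBytes.AsSpan(body.Length)))
            return false;
        payload = body;
        return true;
    }

    private static byte[] Mac(byte[] key, string tenantId, byte[] payload)
    {
        var tenant = Encoding.UTF8.GetBytes(tenantId);
        var input = new byte[4 + tenant.Length + payload.Length];
        // Length-prefix the tenant so the (tenant, payload) boundary is unambiguous.
        BinaryPrimitives.WriteInt32BigEndian(input.AsSpan(0, 4), tenant.Length);
        tenant.CopyTo(input, 4);
        payload.CopyTo(input, 4 + tenant.Length);
        return HMACSHA256.HashData(key, input);
    }
}
```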
Search Index Schema (Optional)¶
Defines per-tenant search indexes to power full-text search, filtering, and type-ahead suggestions over projected audit events. Search indexes are derived, redacted, and rebuildable; they MUST never store more than the effective Redaction Plan allows.
JSON docs use lowerCamel. C# POCOs (for producers/clients) use PascalCase. Index names and fields include `tenantId` for strict multi-tenancy.
Overview¶
- Per-tenant aliasing: prefer one index alias per tenant, mapping either to a dedicated physical index or to a filtered multi-tenant index.
- Fields for search: action, resource, actor, time, decision outcome, changed fields, and a compact `searchText` blob for catch-all text queries.
- Suggest: `completion` suggesters for resource IDs and actor IDs; `search_as_you_type` or edge n-grams for action/resource types.
- Analyzers: email/URL-aware tokenization; keyword + lowercase normalizers for exact filters; a hierarchical analyzer for `resource.path`.
- Lifecycle: rollover by size/time; an ILM policy deletes indices at or before the Retention window (often ≤ authoritative retention).
- Reindex strategy: versioned index names with per-tenant write/read aliases; zero-downtime rebuild plus alias swap.
Index naming & tenancy¶
- Dedicated index per tenant (preferred for hundreds to thousands of tenants): `audit-{tenantId}-v{schemaVersion}-{yyyy.MM}` (monthly rollover)
  - Write alias: `audit-{tenantId}-write`
  - Read alias: `audit-{tenantId}`
- Shared (multi-tenant) index (for tens of thousands of tenants or more): `audit-shared-v{schemaVersion}-{yyyy.MM}` with filtered read aliases:
  - `audit-{tenantId}` alias → filter `term: { tenantId: "<tenantId>" }`

⚠️ All queries must enforce a mandatory `tenantId` term filter; index-level RBAC is required.
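Index and alias names can be derived mechanically from these patterns. In this hedged sketch (hypothetical `AuditIndexNames`), tenant ids that are not already lowercase are rejected. Elasticsearch/OpenSearch index names must be lowercase, and silently lowercasing could map two distinct tenants onto the same index.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class AuditIndexNames
{
    // Lowercase-only subset of the envelope's tenantId pattern (assumption for index-safe names).
    private static readonly Regex TenantPattern = new(@"^[a-z0-9._-]{1,128}\z");

    public static string Physical(string tenantId, int schemaVersion, DateTimeOffset createdAt) =>
        $"audit-{Checked(tenantId)}-v{schemaVersion}-{Month(createdAt)}";

    public static string SharedPhysical(int schemaVersion, DateTimeOffset createdAt) =>
        $"audit-shared-v{schemaVersion}-{Month(createdAt)}";

    public static string WriteAlias(string tenantId) => $"audit-{Checked(tenantId)}-write";
    public static string ReadAlias(string tenantId) => $"audit-{Checked(tenantId)}";

    // Monthly rollover bucket, always computed in UTC.
    private static string Month(DateTimeOffset t) =>
        t.UtcDateTime.ToString("yyyy.MM", CultureInfo.InvariantCulture);

    private static string Checked(string tenantId) =>
        TenantPattern.IsMatch(tenantId)
            ? tenantId
            : throw new ArgumentException($"tenantId '{tenantId}' is not a valid lowercase index segment.", nameof(tenantId));
}
```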
Indexed document (logical shape)¶
| Field (JSON) | Type | Notes |
|---|---|---|
| `tenantId` | keyword | Hard filter for ALL queries. |
| `auditRecordId` | keyword | ULID string; unique within tenant. |
| `createdAt` | date | UTC; primary sort. |
| `observedAt` | date | Secondary time. |
| `action` | text + keyword | `verb` or `verb.noun`; keyword subfield for exact matches. |
| `resourceType` | text (search_as_you_type) + keyword | PascalCase; search-as-you-type and exact. |
| `resourceId` | keyword + completion | Opaque id; suggester enabled. |
| `resourcePath` | text (path_hierarchy) | Optional path (JSON Pointer style). |
| `actorId` | keyword + completion | Opaque id; suggester enabled. |
| `actorType` | keyword | Enum name. |
| `decisionOutcome` | keyword | Enum name, if present. |
| `changedFields` | keyword | Multi-valued; from `delta.fields` keys. |
| `dataClassFlags` | integer | Bitmask for quick filtering. |
| `payloadBytes` | integer | For query budgeting. |
| `searchText` | text | Concatenated, redacted text for catch-all queries. |
| `schemaVersion` | short | Index doc version for reindexing. |
Only redacted values are indexed (e.g., hashed emails, masked IPs). Never index raw PII beyond what the Redaction Plan allows.
OpenSearch/Elasticsearch mapping (template)¶
```json
{
  "index_patterns": ["audit-*"],
  "template": {
    "settings": {
      "index.lifecycle.name": "audit-ilm",
      "index.refresh_interval": "5s",
      "analysis": {
        "analyzer": {
          "edge_en": { "tokenizer": "edge_ngram_tok", "filter": ["lowercase"] },
          "path_hierarchy_an": { "tokenizer": "path_hierarchy", "filter": ["lowercase"] }
        },
        "tokenizer": {
          "edge_ngram_tok": { "type": "edge_ngram", "min_gram": 2, "max_gram": 15, "token_chars": ["letter","digit"] }
        },
        "normalizer": {
          "kw_lower": { "type": "custom", "filter": ["lowercase"] }
        }
      }
    },
    "mappings": {
      "dynamic": "false",
      "properties": {
        "tenantId": { "type": "keyword", "normalizer": "kw_lower" },
        "auditRecordId": { "type": "keyword" },
        "createdAt": { "type": "date" },
        "observedAt": { "type": "date" },
        "action": {
          "type": "text",
          "fields": { "kw": { "type": "keyword", "normalizer": "kw_lower" }, "sug": { "type": "search_as_you_type" } }
        },
        "resourceType": {
          "type": "text",
          "fields": { "kw": { "type": "keyword" }, "sug": { "type": "search_as_you_type" } }
        },
        "resourceId": {
          "type": "keyword",
          "fields": { "suggest": { "type": "completion" } }
        },
        "resourcePath": { "type": "text", "analyzer": "path_hierarchy_an" },
        "actorId": {
          "type": "keyword",
          "fields": { "suggest": { "type": "completion" } }
        },
        "actorType": { "type": "keyword" },
        "decisionOutcome": { "type": "keyword" },
        "changedFields": { "type": "keyword" },
        "dataClassFlags": { "type": "integer" },
        "payloadBytes": { "type": "integer" },
        "searchText": { "type": "text" },
        "schemaVersion": { "type": "short" }
      }
    }
  },
  "priority": 500
}
```
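ILM policy example (Elasticsearch ILM; OpenSearch uses Index State Management (ISM), which has a different policy format)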
```json
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_primary_shard_size": "30gb", "max_age": "7d" }
        }
      },
      "warm": { "min_age": "30d", "actions": { "forcemerge": { "max_num_segments": 1 } } },
      "delete": { "min_age": "90d", "actions": { "delete": {} } }
    }
  }
}
```
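Tune the delete phase's `min_age` so that deletion never happens later than the tenant's Retention allows. For a rolled-over index, `min_age` counts from the rollover time, so the oldest document in that index can be up to one rollover period (`max_age`, 7d here) older than the index age; set `min_age` ≤ retention − `max_age`. For tenants with longer retention, attach a custom ILM policy to that tenant's indices (e.g., via a tenant-specific index template).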
C# document contract (producer/client)¶
```csharp
using System;
using System.Collections.Generic;

public sealed class SearchEvent
{
    public string TenantId { get; init; } = default!;
    public string AuditRecordId { get; init; } = default!;
    public DateTimeOffset CreatedAt { get; init; }
    public DateTimeOffset ObservedAt { get; init; }
    public string Action { get; init; } = default!;
    public string ResourceType { get; init; } = default!;
    public string ResourceId { get; init; } = default!;
    public string? ResourcePath { get; init; }
    public string ActorId { get; init; } = default!;
    public string ActorType { get; init; } = "Unknown";
    public string? DecisionOutcome { get; init; }
    public IReadOnlyList<string>? ChangedFields { get; init; }
    public short? DataClassFlags { get; init; }
    public int PayloadBytes { get; init; }
    public string SearchText { get; init; } = ""; // redacted concat
    public short SchemaVersion { get; init; } = 1;
}
```
When serializing to JSON for indexing, ensure lowerCamel property names (e.g., System.Text.Json with `PropertyNamingPolicy = JsonNamingPolicy.CamelCase`).
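A sketch of that serializer configuration. `SearchEventDoc` is a trimmed, hypothetical stand-in for `SearchEvent`, and omitting null fields from the index document is an assumption rather than a documented requirement.

```csharp
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

// Trimmed, hypothetical copy of SearchEvent for the sketch.
public sealed class SearchEventDoc
{
    public string TenantId { get; init; } = "";
    public string AuditRecordId { get; init; } = "";
    public IReadOnlyList<string>? ChangedFields { get; init; }
    public short? DataClassFlags { get; init; }
}

public static class SearchEventJson
{
    private static readonly JsonSerializerOptions Options = new()
    {
        PropertyNamingPolicy = JsonNamingPolicy.CamelCase,                 // TenantId -> tenantId
        DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull,      // omit absent optional fields
    };

    public static string Serialize(SearchEventDoc doc) => JsonSerializer.Serialize(doc, Options);
}
```

Caching a single `JsonSerializerOptions` instance, as above, avoids rebuilding serializer metadata on every call.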
Ingest pipeline (recommended)¶
- Redaction step: apply the Redaction Plan to each field destined for search (e.g., hash emails, mask IPs).
- `searchText` construction: concatenate safe fields (`action`, `resourceType`, `resourceId`, `actorId`, `changedFields`, small text fragments from `delta`) into a single text field.
- Suggest inputs: `resourceId.suggest` and `actorId.suggest` are multi-fields, so they are populated automatically from `resourceId` and `actorId`. Supplying extra suggest inputs (synonyms/aliases) would require a separate top-level `completion` field.
- `DataClassFlags`: compute the bitmask from the record's classifications for fast filtering.
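The redaction and `searchText` steps might look like the following. The two rules shown (SHA-256 hashing of email addresses, zeroing the last IPv4 octet) and their regular expressions are illustrative placeholders for a tenant's actual Redaction Plan; `SearchTextBuilder` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;
using System.Text.RegularExpressions;

public static class SearchTextBuilder
{
    private static readonly Regex Email = new(@"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}");
    private static readonly Regex Ipv4 = new(@"\b(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.\d{1,3}\b");

    // Placeholder rules: emails become "email:<sha256 hex>", IPv4 addresses lose their last octet.
    public static string Redact(string value)
    {
        var hashed = Email.Replace(value, m =>
            "email:" + Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(m.Value.ToLowerInvariant()))).ToLowerInvariant());
        return Ipv4.Replace(hashed, m => $"{m.Groups[1].Value}.{m.Groups[2].Value}.{m.Groups[3].Value}.0");
    }

    // Concatenates the safe fields, redacting each one before it reaches the index.
    public static string Build(string action, string resourceType, string resourceId, string actorId,
                               IEnumerable<string> changedFields, IEnumerable<string> deltaFragments)
    {
        var parts = new[] { action, resourceType, resourceId, actorId }
            .Concat(changedFields)
            .Concat(deltaFragments)
            .Where(s => !string.IsNullOrWhiteSpace(s))
            .Select(Redact);
        return string.Join(" ", parts);
    }
}
```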
Queries (examples)¶
Tenant-scoped free text + filters
```json
{
  "query": {
    "bool": {
      "must": [{ "query_string": { "query": "book* OR changedFields:status", "fields": ["searchText","action","resourceType.sug"] } }],
      "filter": [
        { "term": { "tenantId": "splootvets" } },
        { "term": { "resourceType.kw": "Vetspire.Appointment" } },
        { "range": { "createdAt": { "gte": "now-7d/d" } } }
      ]
    }
  },
  "sort": [{ "createdAt": "desc" }, { "auditRecordId": "desc" }],
  "size": 50
}
```
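Type-ahead suggestion for resourceId
```json
{
  "suggest": {
    "resid": { "prefix": "A-99", "completion": { "field": "resourceId.suggest", "skip_duplicates": true, "size": 5 } }
  }
}
```
Completion suggesters do not run inside the query's `filter` clause, so the mandatory `tenantId` term cannot be attached to them. Scope suggestions by targeting a dedicated per-tenant index, or add a `tenantId` category context to the completion fields in shared indices.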
Per-actor activity (exact)
```json
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "tenantId": "splootvets" } },
        { "term": { "actorId": "user_123" } }
      ]
    }
  }
}
```
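To make the mandatory `tenantId` filter hard to forget, queries can be built through a helper that always injects it. This sketch uses `System.Text.Json.Nodes`; `TenantQuery` is a hypothetical name, and it emits the same `bool`/`filter` shape and sort as the examples above.

```csharp
using System;
using System.Text.Json.Nodes;

public static class TenantQuery
{
    // Wraps optional scoring clauses and extra filters in a bool query whose
    // filter list always starts with the tenantId term.
    public static JsonObject Scope(string tenantId, JsonNode? must, params JsonNode[] filters)
    {
        if (string.IsNullOrWhiteSpace(tenantId))
            throw new ArgumentException("tenantId is required", nameof(tenantId));

        var filterArray = new JsonArray { new JsonObject { ["term"] = new JsonObject { ["tenantId"] = tenantId } } };
        foreach (var f in filters) filterArray.Add(f);

        var boolQuery = new JsonObject { ["filter"] = filterArray };
        if (must is not null) boolQuery["must"] = new JsonArray { must };

        return new JsonObject
        {
            ["query"] = new JsonObject { ["bool"] = boolQuery },
            // Stable seek order: createdAt, then auditRecordId as tie-break.
            ["sort"] = new JsonArray { new JsonObject { ["createdAt"] = "desc" }, new JsonObject { ["auditRecordId"] = "desc" } },
        };
    }
}
```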
Reindex strategy (zero-downtime)¶
- Bump `schemaVersion` when mappings or analysis change.
- Create a new physical index `audit-{tenantId}-v{N+1}-000001` with the updated template.
- Backfill by replaying the projection stream (or use `_reindex` from the old alias to the new write alias, with a reprocessor that re-applies redaction).
- Optionally dual-write to the old and new write aliases for a while so both converge.
- Swap the read alias `audit-{tenantId}` to point exclusively to v{N+1}.
- Freeze and delete the old indices after verification (doc-count and checksum comparisons, sample queries).

Keep a compatibility window in which both versions are queryable if the API exposes an `indexVersion` parameter.
Retention for index docs¶
- Hot: 0–7 days (fast refresh, frequent rollovers).
- Warm: 7–30 days (force-merge, slower refresh).
- Delete: at or before the record's Retention `keepUntil`, unless tenant policy requires full parity.
- Legal hold: optionally pin the affected indices with an index block, or copy matching docs to a hold index until release.
Sharding & capacity¶
- Target primary shard size 20–50 GB post-merge; prefer more, smaller shards for heavy-ingest tenants.
- Refresh interval: `5s` in the hot phase, `60s` in the warm phase.
- Limit doc size: ≤ 8 KiB typical (search docs are compact); avoid embedding full `delta` bodies and index only keys or a summary.
Validation rules (summary)¶
- All queries include a `tenantId` term (or go through filtered read aliases).
- Only redacted values are indexed; no raw secrets, credentials, or PHI.
- `schemaVersion` in each doc matches the current template version.
- An ILM policy is applied to all indices; rollover and delete actions succeed before shard limits are reached.
- Reindex tooling verifies doc-count parity (± expected drops due to ILM) and a sample checksum of `auditRecordId` sets.
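The parity and checksum checks could be implemented as below. The digest is order-independent because it sorts ids ordinally before hashing with SHA-256. The set difference excludes ids expected to have been dropped by ILM. `ReindexCheck` is a hypothetical name, and fetching ids from each index is out of scope.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

public static class ReindexCheck
{
    // Same id set in any order -> same digest. Duplicates are kept, so a doubled doc changes the digest.
    public static string Digest(IEnumerable<string> auditRecordIds)
    {
        var joined = string.Join("\n", auditRecordIds.OrderBy(id => id, StringComparer.Ordinal));
        return Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(joined)));
    }

    // Ids present in the old index but missing from the new one, ignoring expected ILM drops.
    public static IReadOnlyCollection<string> Missing(
        IEnumerable<string> oldIds, IEnumerable<string> newIds, IEnumerable<string> expectedDrops)
    {
        var present = new HashSet<string>(newIds, StringComparer.Ordinal);
        present.UnionWith(expectedDrops);
        return oldIds.Where(id => !present.Contains(id)).ToList();
    }
}
```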
Event Contracts (Published Language)¶
Catalog of domain events the Audit Trail Platform (ATP) emits/consumes. Events use a common envelope, are tenant-scoped, and are designed for at-least-once delivery with backward-compatible evolution.
JSON uses lowerCamel; C# (gRPC code-first) uses PascalCase. Protobuf fields use PascalCase with `json_name` mapped to lowerCamel.
Overview¶
- Transport-agnostic payloads (Kafka/NATS/Service Bus friendly).
- Per-tenant partitioning (partition key = `tenantId`).
- Correlation-friendly, with an OTel-compatible `traceId` (see Correlation & Provenance).
- Small, focused `data` sections; large artifacts are referenced via URIs (e.g., export files).
- Versioned schemas with additive-first evolution.
Envelope¶
All events share a minimal, stable header plus a type-specific data object.
| JSON (lowerCamel) | C# (PascalCase) | Type | Req. | Notes |
|---|---|---|---|---|
| `eventId` | `EventId` | ULID | ✓ | Unique per event (not per record). |
| `eventType` | `EventType` | string | ✓ | Namespaced, e.g., `connectsoft.audit.v1/AuditRecord.Appended`. |
| `tenantId` | `TenantId` | string | ✓ | Partition/authorization key. |
| `publishedAt` | `PublishedAt` | timestamp | ✓ | UTC time the event was published. |
| `traceId` | `TraceId` | hex32 | ✓ | W3C trace id (from correlation). |
| `causationId` | `CausationId` | ULID | | Id of the event that caused this emission (if any). |
| `schemaVersion` | `SchemaVersion` | string | ✓ | e.g., `event-envelope.v1`. |
| `producer` | `Producer` | string | ✓ | Logical service name/version. |
| `data` | `Data` | object | ✓ | Type-specific payload (below). |
CloudEvents mapping (optional): `eventId → id`, `eventType → type`, `publishedAt → time`, `tenantId → subject` (`"tenant:<id>"`), `producer → source`, `data → data`.
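A sketch of this optional mapping (hypothetical `CloudEventsMapping`). CloudEvents 1.0 requires the `id`, `source`, `specversion`, and `type` attributes. The sketch adds `specversion` and `datacontenttype` and formats `time` as RFC 3339 UTC with millisecond precision. It does not map `traceId` or `causationId`, which the mapping above leaves unspecified.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class CloudEventsMapping
{
    public static Dictionary<string, object> ToAttributes(
        string eventId, string eventType, string tenantId, DateTimeOffset publishedAt, string producer, object data)
    {
        return new Dictionary<string, object>
        {
            ["specversion"] = "1.0",                 // required by CloudEvents 1.0
            ["id"] = eventId,
            ["type"] = eventType,
            ["source"] = producer,                   // must be a URI-reference
            ["subject"] = $"tenant:{tenantId}",
            ["time"] = publishedAt.UtcDateTime.ToString("yyyy-MM-dd'T'HH:mm:ss.fff'Z'", CultureInfo.InvariantCulture),
            ["datacontenttype"] = "application/json",
            ["data"] = data,
        };
    }
}
```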
Event types¶
1) AuditRecord.Appended – emitted when an audit fact is durably persisted to the authoritative store.
data:
- `auditRecordId` (ULID)
- `createdAt`, `observedAt` (timestamps)
- `action` (string), `resourceType` (string), `resourceId` (string)
- `actorId` (string), `actorType` (enum name)
- `hasDelta` (bool), `dataClassFlags` (int bitmask)
- `payloadBytes` (int)
2) AuditRecord.Accepted – idempotent acknowledgment for producers; emitted for every submission, including retries and duplicates.
data:
- `auditRecordId` (ULID)
- `idempotencyKey` (string, optional)
- `status` (`"Created"` | `"Duplicate"`)
- `createdAt`, `observedAt`
3) Projection.Updated – a projection row has been upserted (e.g., AuditEvents).
data:
- `projection` (`"AuditEvents"` | `"ResourceEvents"` | `"ActorEvents"`)
- `auditRecordId` (ULID)
- `checkpoint` (object): `{ "highWaterRecordId": ULID, "highWaterObservedAt": timestamp, "version": int }`
4) Integrity.ProofComputed – an IntegrityBlock has been sealed and its proofs are available.
data:
- `blockId` (ULID), `sealedAt` (timestamp)
- `segmentCount` (int), `recordCount` (long)
- `blockRoot` (hex64), `prevBlockRoot` (hex64)
- `signature`: `{ "scheme": "Ed25519" | "PKCS7", "signingKeyId": string }`
5) Export.Requested – an export job has been created or started.
data:
- `jobId` (ULID), `createdAt` (timestamp)
- `filter` (object; summarized)
- `format` (`"Jsonl"` | `"Parquet"`), `includeIntegrity` (bool)
- `redactionPlan`: `{ "id": string, "revision": int }`
6) Export.Completed – an export job has finished (completed, failed, or canceled).
data:
- `jobId` (ULID), `state` (`"Completed"` | `"Failed"` | `"Canceled"`), `reason` (string, optional)
- `packageCount` (int), `recordCount` (long), `bytesUncompressed` (long)
- `manifests` (array of URIs or ids)
7) Policy.Changed – a policy revision becomes effective (Retention/Redaction/Residency).
data:
- `kind` (`"Retention"` | `"Redaction"` | `"Residency"`)
- `id` (string), `revision` (int), `effectiveFromUtc` (timestamp)
- `previousRevision` (int, optional)
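The `AuditRecord.Accepted` semantics (Created on first sighting, Duplicate on retry) can be sketched with an in-memory ledger keyed by `(tenantId, idempotencyKey)`. The ledger, its name, and the behavior when no key is supplied are assumptions. A real implementation would persist the mapping atomically with the record.

```csharp
using System;
using System.Collections.Generic;

public sealed record AcceptedAck(string AuditRecordId, string? IdempotencyKey, string Status, DateTimeOffset ObservedAt);

public sealed class IdempotencyLedger
{
    private readonly Dictionary<(string Tenant, string Key), string> _byKey = new();

    // First sighting stores the new id and acks "Created"; a retry with the same key
    // acks "Duplicate" and echoes the AuditRecordId assigned the first time.
    public AcceptedAck Accept(string tenantId, string? idempotencyKey, string newAuditRecordId, DateTimeOffset observedAt)
    {
        if (idempotencyKey is null)
            return new AcceptedAck(newAuditRecordId, null, "Created", observedAt); // no key: cannot dedupe
        if (_byKey.TryGetValue((tenantId, idempotencyKey), out var existing))
            return new AcceptedAck(existing, idempotencyKey, "Duplicate", observedAt);
        _byKey[(tenantId, idempotencyKey)] = newAuditRecordId;
        return new AcceptedAck(newAuditRecordId, idempotencyKey, "Created", observedAt);
    }
}
```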
Topics & partitioning (illustrative)¶
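- `atp.audit.v1` → `AuditRecord.*` (partition key = `tenantId`)
- `atp.integrity.v1` → `Integrity.*` (partition key = `tenantId`; secondary routing by `blockId` if supported)
- `atp.projection.v1` → `Projection.*` (partition key = `tenantId`)
- `atp.export.v1` → `Export.*` (partition key = `tenantId`)
- `atp.policy.v1` → `Policy.*` (partition key = `tenantId`)

The transport preserves ordering only within a partition; do not rely on cross-partition ordering. Because the partition key is `tenantId`, this gives per-tenant ordering within each topic, but no ordering across topics.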
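Routing per this table can be derived from `eventType`. This hedged sketch (hypothetical `EventRouting`) strips the namespace prefix, maps the event family to its topic, and always uses `tenantId` as the partition key. The transport's own partitioner then maps that key to a partition.

```csharp
using System;
using System.Collections.Generic;

public static class EventRouting
{
    // Event family (text before the first '.') -> topic, per the table above.
    private static readonly Dictionary<string, string> Topics = new(StringComparer.Ordinal)
    {
        ["AuditRecord"] = "atp.audit.v1",
        ["Integrity"] = "atp.integrity.v1",
        ["Projection"] = "atp.projection.v1",
        ["Export"] = "atp.export.v1",
        ["Policy"] = "atp.policy.v1",
    };

    // eventType is namespaced, e.g. "connectsoft.audit.v1/AuditRecord.Appended".
    public static (string Topic, string PartitionKey) Route(string eventType, string tenantId)
    {
        var name = eventType[(eventType.LastIndexOf('/') + 1)..];
        var dot = name.IndexOf('.');
        var family = dot < 0 ? name : name[..dot];
        if (!Topics.TryGetValue(family, out var topic))
            throw new ArgumentException($"No topic mapped for event type '{eventType}'.", nameof(eventType));
        return (topic, tenantId); // partition key = tenantId for every topic
    }
}
```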
JSON Schemas (v1)¶
event-envelope.v1.json
```json
{
  "$id": "urn:connectsoft:schemas:events/event-envelope.v1.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "EventEnvelope",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "eventId": { "type": "string", "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$" },
    "eventType": { "type": "string", "maxLength": 128 },
    "tenantId": { "type": "string", "pattern": "^[A-Za-z0-9._-]{1,128}$" },
    "publishedAt": { "type": "string", "format": "date-time" },
    "traceId": { "type": "string", "pattern": "^[a-f0-9]{32}$" },
    "causationId": { "type": "string", "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$" },
    "schemaVersion": { "type": "string", "pattern": "^event-envelope\\.v[0-9]+$" },
    "producer": { "type": "string", "maxLength": 64 },
    "data": { "type": "object" }
  },
  "required": ["eventId","eventType","tenantId","publishedAt","traceId","schemaVersion","producer","data"]
}
```
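Producers can pre-validate envelope headers against the same patterns before publishing. This sketch (hypothetical `EnvelopeValidator`) mirrors the string constraints of `event-envelope.v1.json`. It is not a general JSON Schema validator, and it assumes `publishedAt` and `data` are already enforced by their typed representations.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class EnvelopeValidator
{
    // Patterns from event-envelope.v1.json; \z replaces $ because .NET's $ also matches before a trailing newline.
    private static readonly Regex UlidPattern = new(@"^[0-9A-HJKMNP-TV-Z]{26}\z");
    private static readonly Regex TenantPattern = new(@"^[A-Za-z0-9._-]{1,128}\z");
    private static readonly Regex TraceIdPattern = new(@"^[a-f0-9]{32}\z");
    private static readonly Regex SchemaVersionPattern = new(@"^event-envelope\.v[0-9]+\z");

    public static IReadOnlyList<string> Validate(string eventId, string eventType, string tenantId, string traceId,
                                                 string? causationId, string schemaVersion, string producer)
    {
        var errors = new List<string>();
        if (!UlidPattern.IsMatch(eventId ?? "")) errors.Add("eventId: must be a 26-char Crockford ULID");
        if (eventType is null || eventType.Length > 128) errors.Add("eventType: required, max 128 chars");
        if (!TenantPattern.IsMatch(tenantId ?? "")) errors.Add("tenantId: 1-128 chars of [A-Za-z0-9._-]");
        if (!TraceIdPattern.IsMatch(traceId ?? "")) errors.Add("traceId: 32 lowercase hex chars");
        if (causationId is not null && !UlidPattern.IsMatch(causationId)) errors.Add("causationId: must be a ULID when present");
        if (!SchemaVersionPattern.IsMatch(schemaVersion ?? "")) errors.Add("schemaVersion: event-envelope.vN");
        if (producer is null || producer.Length > 64) errors.Add("producer: required, max 64 chars");
        return errors;
    }
}
```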
auditrecord.appended.v1.json
```json
{
  "$id": "urn:connectsoft:schemas:events/auditrecord.appended.v1.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "AuditRecord.Appended",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "auditRecordId": { "type": "string", "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$" },
    "createdAt": { "type": "string", "format": "date-time" },
    "observedAt": { "type": "string", "format": "date-time" },
    "action": { "type": "string" },
    "resourceType": { "type": "string" },
    "resourceId": { "type": "string" },
    "actorId": { "type": "string" },
    "actorType": { "type": "string" },
    "hasDelta": { "type": "boolean" },
    "dataClassFlags": { "type": "integer", "minimum": 0 },
    "payloadBytes": { "type": "integer", "minimum": 0 }
  },
  "required": ["auditRecordId","createdAt","observedAt","action","resourceType","resourceId","actorId","actorType","payloadBytes"]
}
```
auditrecord.accepted.v1.json
```json
{
  "$id": "urn:connectsoft:schemas:events/auditrecord.accepted.v1.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "AuditRecord.Accepted",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "auditRecordId": { "type": "string", "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$" },
    "idempotencyKey": { "type": "string", "maxLength": 128 },
    "status": { "type": "string", "enum": ["Created","Duplicate"] },
    "createdAt": { "type": "string", "format": "date-time" },
    "observedAt": { "type": "string", "format": "date-time" }
  },
  "required": ["auditRecordId","status","observedAt"]
}
```
projection.updated.v1.json
```json
{
  "$id": "urn:connectsoft:schemas:events/projection.updated.v1.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Projection.Updated",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "projection": { "type": "string", "enum": ["AuditEvents","ResourceEvents","ActorEvents"] },
    "auditRecordId": { "type": "string", "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$" },
    "checkpoint": {
      "type": "object",
      "additionalProperties": false,
      "properties": {
        "highWaterRecordId": { "type": "string", "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$" },
        "highWaterObservedAt": { "type": "string", "format": "date-time" },
        "version": { "type": "integer", "minimum": 1 }
      },
      "required": ["highWaterRecordId","highWaterObservedAt","version"]
    }
  },
  "required": ["projection","auditRecordId","checkpoint"]
}
```
integrity.proofcomputed.v1.json
```json
{
  "$id": "urn:connectsoft:schemas:events/integrity.proofcomputed.v1.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Integrity.ProofComputed",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "blockId": { "type": "string", "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$" },
    "sealedAt": { "type": "string", "format": "date-time" },
    "segmentCount": { "type": "integer", "minimum": 1 },
    "recordCount": { "type": "integer", "minimum": 1 },
    "blockRoot": { "type": "string", "pattern": "^[a-f0-9]{64}$" },
    "prevBlockRoot": { "type": "string", "pattern": "^[a-f0-9]{64}$" },
    "signature": {
      "type": "object",
      "additionalProperties": false,
      "properties": {
        "scheme": { "type": "string", "enum": ["Ed25519","PKCS7"] },
        "signingKeyId": { "type": "string", "maxLength": 128 }
      },
      "required": ["scheme","signingKeyId"]
    }
  },
  "required": ["blockId","sealedAt","segmentCount","recordCount","blockRoot","prevBlockRoot","signature"]
}
```
export.requested.v1.json
```json
{
  "$id": "urn:connectsoft:schemas:events/export.requested.v1.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Export.Requested",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "jobId": { "type": "string", "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$" },
    "createdAt": { "type": "string", "format": "date-time" },
    "format": { "type": "string", "enum": ["Jsonl","Parquet"] },
    "includeIntegrity": { "type": "boolean" },
    "filter": { "type": "object" },
    "redactionPlan": {
      "type": "object",
      "additionalProperties": false,
      "properties": { "id": { "type": "string" }, "revision": { "type": "integer" } },
      "required": ["id","revision"]
    }
  },
  "required": ["jobId","createdAt","format","redactionPlan"]
}
```
export.completed.v1.json
```json
{
  "$id": "urn:connectsoft:schemas:events/export.completed.v1.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Export.Completed",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "jobId": { "type": "string", "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$" },
    "state": { "type": "string", "enum": ["Completed","Failed","Canceled"] },
    "reason": { "type": "string", "maxLength": 256 },
    "packageCount": { "type": "integer", "minimum": 0 },
    "recordCount": { "type": "integer", "minimum": 0 },
    "bytesUncompressed": { "type": "integer", "minimum": 0 },
    "manifests": { "type": "array", "items": { "type": "string" }, "maxItems": 1000 }
  },
  "required": ["jobId","state"]
}
```
policy.changed.v1.json
```json
{
  "$id": "urn:connectsoft:schemas:events/policy.changed.v1.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Policy.Changed",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "kind": { "type": "string", "enum": ["Retention","Redaction","Residency"] },
    "id": { "type": "string", "maxLength": 128 },
    "revision": { "type": "integer", "minimum": 1 },
    "effectiveFromUtc": { "type": "string", "format": "date-time" },
    "previousRevision": { "type": "integer", "minimum": 1 }
  },
  "required": ["kind","id","revision","effectiveFromUtc"]
}
```
C# (gRPC code-first)¶
[DataContract]
public sealed class EventEnvelope<T>
{
[DataMember(Order = 1)] public string EventId { get; init; } = default!; // ULID
[DataMember(Order = 2)] public string EventType { get; init; } = default!;
[DataMember(Order = 3)] public string TenantId { get; init; } = default!;
[DataMember(Order = 4)] public DateTimeOffset PublishedAt { get; init; }
[DataMember(Order = 5)] public string TraceId { get; init; } = default!; // hex32
[DataMember(Order = 6)] public string? CausationId { get; init; } // ULID
[DataMember(Order = 7)] public string SchemaVersion { get; init; } = "event-envelope.v1";
[DataMember(Order = 8)] public string Producer { get; init; } = default!;
[DataMember(Order = 9)] public T Data { get; init; } = default!;
}
[DataContract] public sealed class AuditRecordAppended {
[DataMember(Order = 1)] public string AuditRecordId { get; init; } = default!;
[DataMember(Order = 2)] public DateTimeOffset CreatedAt { get; init; }
[DataMember(Order = 3)] public DateTimeOffset ObservedAt { get; init; }
[DataMember(Order = 4)] public string Action { get; init; } = default!;
[DataMember(Order = 5)] public string ResourceType { get; init; } = default!;
[DataMember(Order = 6)] public string ResourceId { get; init; } = default!;
[DataMember(Order = 7)] public string ActorId { get; init; } = default!;
[DataMember(Order = 8)] public string ActorType { get; init; } = "Unknown";
[DataMember(Order = 9)] public bool HasDelta { get; init; }
[DataMember(Order = 10)] public short? DataClassFlags { get; init; }
[DataMember(Order = 11)] public int PayloadBytes { get; init; }
}
[DataContract] public sealed class AuditRecordAccepted {
[DataMember(Order = 1)] public string AuditRecordId { get; init; } = default!;
[DataMember(Order = 2)] public string? IdempotencyKey { get; init; }
[DataMember(Order = 3)] public string Status { get; init; } = "Created"; // Created|Duplicate
[DataMember(Order = 4)] public DateTimeOffset? CreatedAt { get; init; }
[DataMember(Order = 5)] public DateTimeOffset ObservedAt { get; init; }
}
[DataContract] public sealed class ProjectionUpdated {
[DataMember(Order = 1)] public string Projection { get; init; } = default!; // AuditEvents|ResourceEvents|ActorEvents
[DataMember(Order = 2)] public string AuditRecordId { get; init; } = default!;
[DataMember(Order = 3)] public ProjectionCheckpoint Checkpoint { get; init; } = new();
}
[DataContract] public sealed class ProjectionCheckpoint {
[DataMember(Order = 1)] public string HighWaterRecordId { get; init; } = default!;
[DataMember(Order = 2)] public DateTimeOffset HighWaterObservedAt { get; init; }
[DataMember(Order = 3)] public int Version { get; init; }
}
[DataContract] public sealed class IntegrityProofComputed {
[DataMember(Order = 1)] public string BlockId { get; init; } = default!;
[DataMember(Order = 2)] public DateTimeOffset SealedAt { get; init; }
[DataMember(Order = 3)] public int SegmentCount { get; init; }
[DataMember(Order = 4)] public long RecordCount { get; init; }
[DataMember(Order = 5)] public string BlockRoot { get; init; } = default!;
[DataMember(Order = 6)] public string PrevBlockRoot { get; init; } = default!;
[DataMember(Order = 7)] public SignatureHeader Signature { get; init; } = new();
}
[DataContract] public sealed class SignatureHeader {
[DataMember(Order = 1)] public string Scheme { get; init; } = "Ed25519";
[DataMember(Order = 2)] public string SigningKeyId { get; init; } = default!;
}
[DataContract] public sealed class ExportRequested {
[DataMember(Order = 1)] public string JobId { get; init; } = default!;
[DataMember(Order = 2)] public DateTimeOffset CreatedAt { get; init; }
[DataMember(Order = 3)] public string Format { get; init; } = "Jsonl";
[DataMember(Order = 4)] public bool IncludeIntegrity { get; init; }
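[DataMember(Order = 5)] public object? Filter { get; init; } // summary; code-first serializers generally need a concrete type here (the .proto emits google.protobuf.Struct)
[DataMember(Order = 6)] public RedactionPlanRef RedactionPlan { get; init; } = new();
}
[DataContract] public sealed class RedactionPlanRef {
[DataMember(Order = 1)] public string Id { get; init; } = default!;
[DataMember(Order = 2)] public int Revision { get; init; }
}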
[DataContract] public sealed class ExportCompleted {
[DataMember(Order = 1)] public string JobId { get; init; } = default!;
[DataMember(Order = 2)] public string State { get; init; } = "Completed"; // Completed|Failed|Canceled
[DataMember(Order = 3)] public string? Reason { get; init; }
[DataMember(Order = 4)] public int? PackageCount { get; init; }
[DataMember(Order = 5)] public long? RecordCount { get; init; }
[DataMember(Order = 6)] public long? BytesUncompressed { get; init; }
[DataMember(Order = 7)] public IReadOnlyList<string>? Manifests { get; init; }
}
[DataContract] public sealed class PolicyChanged {
[DataMember(Order = 1)] public string Kind { get; init; } = default!; // Retention|Redaction|Residency
[DataMember(Order = 2)] public string Id { get; init; } = default!;
[DataMember(Order = 3)] public int Revision { get; init; }
[DataMember(Order = 4)] public DateTimeOffset EffectiveFromUtc { get; init; }
[DataMember(Order = 5)] public int? PreviousRevision { get; init; }
}
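Protobuf (optional emission)
syntax = "proto3";
package connectsoft.events.v1;
import "google/protobuf/any.proto";
import "google/protobuf/struct.proto";
import "google/protobuf/timestamp.proto";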
message EventEnvelope {
string EventId = 1 [json_name = "eventId"];
string EventType = 2 [json_name = "eventType"];
string TenantId = 3 [json_name = "tenantId"];
google.protobuf.Timestamp PublishedAt = 4 [json_name = "publishedAt"];
string TraceId = 5 [json_name = "traceId"];
string CausationId = 6 [json_name = "causationId"];
string SchemaVersion = 7 [json_name = "schemaVersion"];
string Producer = 8 [json_name = "producer"];
google.protobuf.Any Data = 9 [json_name = "data"];
}
message AuditRecordAppended {
string AuditRecordId = 1 [json_name = "auditRecordId"];
google.protobuf.Timestamp CreatedAt = 2 [json_name = "createdAt"];
google.protobuf.Timestamp ObservedAt = 3 [json_name = "observedAt"];
string Action = 4 [json_name = "action"];
string ResourceType = 5 [json_name = "resourceType"];
string ResourceId = 6 [json_name = "resourceId"];
string ActorId = 7 [json_name = "actorId"];
string ActorType = 8 [json_name = "actorType"];
bool HasDelta = 9 [json_name = "hasDelta"];
int32 DataClassFlags = 10 [json_name = "dataClassFlags"];
int32 PayloadBytes = 11 [json_name = "payloadBytes"];
}
message AuditRecordAccepted {
string AuditRecordId = 1 [json_name = "auditRecordId"];
string IdempotencyKey = 2 [json_name = "idempotencyKey"];
string Status = 3 [json_name = "status"]; // Created|Duplicate
google.protobuf.Timestamp CreatedAt = 4 [json_name = "createdAt"];
google.protobuf.Timestamp ObservedAt = 5 [json_name = "observedAt"];
}
message ProjectionUpdated {
string Projection = 1 [json_name = "projection"];
string AuditRecordId = 2 [json_name = "auditRecordId"];
ProjectionCheckpoint Checkpoint = 3 [json_name = "checkpoint"];
}
message ProjectionCheckpoint {
string HighWaterRecordId = 1 [json_name = "highWaterRecordId"];
google.protobuf.Timestamp HighWaterObservedAt = 2 [json_name = "highWaterObservedAt"];
int32 Version = 3 [json_name = "version"];
}
message IntegrityProofComputed {
string BlockId = 1 [json_name = "blockId"];
google.protobuf.Timestamp SealedAt = 2 [json_name = "sealedAt"];
int32 SegmentCount = 3 [json_name = "segmentCount"];
int64 RecordCount = 4 [json_name = "recordCount"];
string BlockRoot = 5 [json_name = "blockRoot"];
string PrevBlockRoot = 6 [json_name = "prevBlockRoot"];
SignatureHeader Signature = 7 [json_name = "signature"];
}
message SignatureHeader {
string Scheme = 1 [json_name = "scheme"];
string SigningKeyId = 2 [json_name = "signingKeyId"];
}
message ExportRequested {
string JobId = 1 [json_name = "jobId"];
google.protobuf.Timestamp CreatedAt = 2 [json_name = "createdAt"];
string Format = 3 [json_name = "format"];
bool IncludeIntegrity = 4 [json_name = "includeIntegrity"];
google.protobuf.Struct Filter = 5 [json_name = "filter"];
RedactionPlanRef RedactionPlan = 6 [json_name = "redactionPlan"];
}
message ExportCompleted {
string JobId = 1 [json_name = "jobId"];
string State = 2 [json_name = "state"]; // Completed|Failed|Canceled
string Reason = 3 [json_name = "reason"];
int32 PackageCount = 4 [json_name = "packageCount"];
int64 RecordCount = 5 [json_name = "recordCount"];
int64 BytesUncompressed = 6 [json_name = "bytesUncompressed"];
repeated string Manifests = 7 [json_name = "manifests"];
}
message RedactionPlanRef { string Id = 1 [json_name = "id"]; int32 Revision = 2 [json_name = "revision"]; }
message PolicyChanged {
string Kind = 1 [json_name = "kind"]; // Retention|Redaction|Residency
string Id = 2 [json_name = "id"];
int32 Revision = 3 [json_name = "revision"];
google.protobuf.Timestamp EffectiveFromUtc = 4 [json_name = "effectiveFromUtc"];
int32 PreviousRevision = 5 [json_name = "previousRevision"];
}
Examples
AuditRecord.Appended
{
"eventId": "01JE7DE6X1J6J7KJ6G7VQ5T5S4",
"eventType": "connectsoft.audit.v1/AuditRecord.Appended",
"tenantId": "splootvets",
"publishedAt": "2025-10-22T16:05:12.345Z",
"traceId": "3e1f2d0c9b8a7f6e5d4c3b2a19081716",
"schemaVersion": "event-envelope.v1",
"producer": "ingress-gw/2.4.1",
"data": {
"auditRecordId": "01JE7D9ZQ1D7J6H2DZX7HQB6XB",
"createdAt": "2025-10-22T16:05:12.100Z",
"observedAt": "2025-10-22T16:05:12.320Z",
"action": "appointment.update",
"resourceType": "Vetspire.Appointment",
"resourceId": "A-9981",
"actorId": "user_123",
"actorType": "User",
"hasDelta": true,
"dataClassFlags": 12,
"payloadBytes": 1536
}
}
Integrity.ProofComputed
{
"eventId": "01JE7E1V2Q3W4E5R6T7Y8V9K0P",
"eventType": "connectsoft.audit.v1/Integrity.ProofComputed",
"tenantId": "splootvets",
"publishedAt": "2025-10-22T16:10:00.000Z",
"traceId": "9c8b7a6f5e4d3c2b1a09182736455443",
"schemaVersion": "event-envelope.v1",
"producer": "integrity-svc/1.3.0",
"data": {
"blockId": "01JE7E0B5V2C6M9N3X7Z4K2J8M",
"sealedAt": "2025-10-22T16:09:58.000Z",
"segmentCount": 8,
"recordCount": 1024,
"blockRoot": "8a7b6c...d1",
"prevBlockRoot": "7f6e5d...aa",
"signature": { "scheme": "Ed25519", "signingKeyId": "kv:prod/atp-integrity/ed25519-2025-01" }
}
}
Reprocessing & idempotency
- Consumers MUST treat events as delivered at-least-once and be idempotent, keyed on (tenantId, auditRecordId) or the event's business id.
- AuditRecord.Accepted may be emitted for duplicates (status = "Duplicate").
- Projection.Updated is informational; rebuilds should not depend on it for correctness.
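A minimal consumer-side sketch of the idempotency rule. It assumes an in-memory key set; a real consumer would persist processed keys (for example, alongside its projection checkpoint) so that deduplication survives restarts. IdempotentConsumer and TryHandle are hypothetical names, not part of an ATP SDK.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical consumer-side guard: runs the handler at most once per
// (tenantId, auditRecordId), even when the broker redelivers an event.
public sealed class IdempotentConsumer
{
    // In-memory only for illustration; a real consumer persists processed keys.
    private readonly ConcurrentDictionary<(string TenantId, string AuditRecordId), byte> _seen = new();

    // Returns true if the handler ran, false if the event was a duplicate and was skipped.
    public bool TryHandle(string tenantId, string auditRecordId, Action handler)
    {
        var key = (tenantId, auditRecordId);
        // TryAdd is atomic, so only the first delivery for a key proceeds.
        if (!_seen.TryAdd(key, 0))
            return false;
        try
        {
            handler();
        }
        catch
        {
            // Un-mark the key so a redelivery can retry the failed event.
            _seen.TryRemove(key, out _);
            throw;
        }
        return true;
    }
}
```

Because the key includes tenantId, the same auditRecordId arriving for two different tenants is handled twice, which matches the tenant-scoped key above.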
Evolution & compatibility
- New fields are additive and optional.
- Breaking changes require a new schema id (e.g., …v2) and a new eventType suffix.
- Producers MUST keep emitting previous versions until all critical consumers upgrade (dual-publish window).
Validation rules (summary)
- The envelope is present and valid; eventType matches the data schema.
- tenantId is non-empty and matches the tenancy pattern.
- traceId is W3C hex32; eventId and causationId are ULIDs.
- The topic routing key must be tenantId.
- Events must not contain raw data beyond the active Redaction Plan.
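The envelope header checks above can be sketched as follows. The patterns are taken from this document: the ULID character class comes from the JSON Schema snippet and the tenantId pattern from the validation matrix. The codes eventId.invalid and causationId.invalid are assumptions, because only tenantId.invalid and traceId.invalid are listed in the matrix. EnvelopeRules is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical envelope header checks; patterns mirror this document's JSON Schema
// and validation matrix. \z is used instead of $ because .NET's $ also matches
// just before a trailing newline.
public static class EnvelopeRules
{
    private static readonly Regex Ulid = new(@"^[0-9A-HJKMNP-TV-Z]{26}\z");   // Crockford base32 (no I, L, O, U)
    private static readonly Regex Hex32 = new(@"^[a-f0-9]{32}\z");           // W3C trace-id, lowercase
    private static readonly Regex Tenant = new(@"^[A-Za-z0-9._-]{1,128}\z"); // tenantId pattern

    // Returns failure codes in a fixed order; an empty list means the header fields are valid.
    public static List<string> Check(string eventId, string tenantId, string traceId, string? causationId)
    {
        var errors = new List<string>();
        if (!Ulid.IsMatch(eventId)) errors.Add("eventId.invalid");        // code name is an assumption
        if (!Tenant.IsMatch(tenantId)) errors.Add("tenantId.invalid");
        if (!Hex32.IsMatch(traceId)) errors.Add("traceId.invalid");
        if (causationId is not null && !Ulid.IsMatch(causationId))
            errors.Add("causationId.invalid");                              // code name is an assumption
        return errors;
    }
}
```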
Schema Evolution & Compatibility
Defines how models change safely across JSON, Protobuf, SQL, search indexes, and events. Our strategy is additive-first, forward-tolerant, and backward-compatible with clear deprecation timelines and a resolvable schema registry.
JSON uses lowerCamel; C#/gRPC code-first uses PascalCase; DB tables/columns use PascalCase.
Principles
- Additive first: Prefer adding optional fields over changing or removing existing ones.
- Must-ignore: Readers must ignore unknown fields and preserve them when round-tripping (where the transport supports it).
- Deterministic hashing: Integrity hashes are computed over canonical JSON (RFC 8785) of the authoritative payload, excluding integrity. Never mutate stored payloads for backfills.
- Version in-payload: Domain objects include a schemaVersion (string, e.g., auditrecord.v1). Event envelopes include their own version id (e.g., event-envelope.v1).
- Registry-resolvable: Every schema's $id is a stable URN resolvable via the Schema Registry.
- Compatibility gates: CI enforces additive changes; breaking changes require a new major schema id (…v2) and a migration plan.
Versioning scheme
| Artifact | Versioning | Compatibility |
|---|---|---|
| JSON Schemas | …vN in $id and schemaVersion field | Add fields/enums → backward/forward OK; breaking change → new vN+1 id |
| Protobuf (proto3) | Field numbers immutable; reserved on removal | Add fields with new numbers; never reuse numbers; mark obsolete via deprecated |
| C# (gRPC code-first) | DataMember(Order=N) immutable | Add optional props; do not change orders; use nullable/wrapper types for presence |
| SQL | Online, additive DDL | Add nullable columns/tables/indexes; avoid PK changes; never shrink/widen enums by ordinal |
| Events | Type name suffix and schema id (e.g., …/AuditRecord.Appended + …v1) | Dual-publish window for v1/v2; consumers must accept both |
| Search Index | Template schemaVersion in docs; index name carries version | Reindex + alias swap; never change analyzer on live index |
Schema registry
- URN format: urn:connectsoft:schemas/<domain>/<name>.vN.json
  - Examples: urn:connectsoft:schemas/domain/auditrecord.v1.json, urn:connectsoft:schemas/events/export.completed.v1.json
- Resolution: URNs resolve to signed JSON files in the registry (git + object store).
- Pinning: Producers/consumers pin to a commit digest or signed manifest for reproducibility.
- Discovery: Each payload includes a schemaVersion matching a registry $id (minus the urn: prefix if desired).
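The following sketch splits a registry URN in the format above into its parts. It assumes the domain and name segments are lowercase ASCII with dots or hyphens; the document does not specify a character set. SchemaUrn is a hypothetical type. Its SchemaVersion property shows how the in-payload string (e.g., auditrecord.v1) relates to the registry $id.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical parser for registry URNs: urn:connectsoft:schemas/<domain>/<name>.vN.json
public sealed record SchemaUrn(string Domain, string Name, int Major)
{
    // Lowercase segments with dots/hyphens are an assumption; the registry may allow more.
    private static readonly Regex Pattern = new(
        @"^urn:connectsoft:schemas/(?<domain>[a-z0-9-]+)/(?<name>[a-z0-9.-]+)\.v(?<major>[0-9]+)\.json\z");

    public static SchemaUrn Parse(string urn)
    {
        var m = Pattern.Match(urn);
        if (!m.Success) throw new FormatException($"Not a schema URN: {urn}");
        return new SchemaUrn(m.Groups["domain"].Value, m.Groups["name"].Value, int.Parse(m.Groups["major"].Value));
    }

    // The matching in-payload schemaVersion string, e.g. "auditrecord.v1".
    public string SchemaVersion => $"{Name}.v{Major}";

    // Round-trips to the URN form used as the schema $id.
    public override string ToString() => $"urn:connectsoft:schemas/{Domain}/{SchemaVersion}.json";
}
```

The name group is greedy but backtracks to the last ".vN.json" suffix, so dotted names such as export.completed parse correctly.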
Patterns by artifact
JSON (domain, admin, responses)
- Add: new optional properties and new enum literals (receivers treat unknown values as "Unknown" or ignore them).
- Never:
  - change the meaning or shape of a property,
  - tighten required-ness,
  - remove properties without a deprecation window.
- Extensibility: reserve an ext object for tenant/vendor fields. All schemas set additionalProperties: false at the root, except for "ext": { "additionalProperties": true }.
Example (v1 → v2 bump)
// v1
{ "$id": "urn:connectsoft:schemas/domain/auditrecord.v1.json", ... }
// v2 (breaking due to renamed field 'actor' -> 'principal')
{ "$id": "urn:connectsoft:schemas/domain/auditrecord.v2.json", ... }
Protobuf (proto3)
- Add: new fields with new numbers; use wrapper types (google.protobuf.*Value) when presence matters.
- Remove: do not remove fields outright; mark them deprecated = true, and when they are finally dropped, declare their numbers and names as reserved.
- Enums: the first member is UNSPECIFIED = 0; only append new members at the end; never renumber.
C# (gRPC code-first)
- Keep DataMember(Order = N) stable forever.
- Use bool?, int?, etc., or wrapper classes to model presence.
- Mark old members [Obsolete("use NewField")] and keep them readable until sunset.
SQL
- Additive only on hot paths: new nullable columns with defaults, new tables, indexes created online.
- Never update authoritative payload columns post-write (WORM).
- Migrations: shadow tables for rebuilds; double-write temporarily if needed; swap atomically.
Events
- New fields: additive; consumers must ignore unknown fields.
- Breaking: create a new event schema id and/or eventType (…v2), dual-publish for a defined window, then sunset v1.
Deprecation lifecycle
| Stage | Signal | Producer behavior | Consumer expectation | Typical duration |
|---|---|---|---|---|
| Proposed | Changelog entry | No change | Awareness | — |
| Deprecated | Schema annotations + docs | Keep writing old field; start writing new | Read both | 1–2 minors (≤ 6 months) |
| Sunset | Date announced | Stop writing deprecated field/event; emit v2 only | Must read new form | 1 minor (≤ 3 months) |
| Removed | Changelog & major | Field/event removed entirely | Must be updated | Next major |
Signaling mechanisms
- JSON schemas carry "deprecationNote": "…".
- gRPC/C#: [Obsolete] attributes.
- REST/streaming APIs may include Sunset headers or metadata in Problem+JSON errors.
schemaVersion and registry pointers (in-payload)
- Domain payloads include:
{
"schemaVersion": "auditrecord.v1",
"schemaRef": "urn:connectsoft:schemas/domain/auditrecord.v1.json"
}
- Event envelopes include schemaVersion: "event-envelope.v1".
- Projections may cache the source schemaVersion for debugging and replays.
Evolution playbook
- Author the change as additive; update the JSON Schema and the Protobuf or C# contracts.
- Register the new or updated schema in the registry; bump the version if the change is breaking.
- Implement dual-read/write when renaming or moving fields:
  - Write both oldField and newField for the deprecation window.
  - Reads prefer newField ?? oldField.
- Backfill only in projections. Never mutate authoritative JSON; doing so invalidates integrity proofs.
- Roll out behind a feature flag or edition gate when applicable.
- Sunset: remove dual-write and emit v2 only; keep readers tolerant for an additional buffer period.
Compatibility matrix (quick ref)
| Change | JSON | Protobuf | C# | SQL | Events | Allowed? |
|---|---|---|---|---|---|---|
| Add optional field | ✓ | ✓ (new number) | ✓ | ✓ (nullable) | ✓ | Yes |
| Add enum value | ✓ (tolerate unknown) | ✓ (append) | ✓ | n/a | ✓ | Yes |
| Rename field | via dual write, then v2 | add new, deprecate old | add new prop | new column; backfill only in projections | new event type | With plan |
| Remove field | deprecate → v2 | reserved number/name | [Obsolete] then remove in major | drop column after purge/rebuild | stop emitting v1 | Breaking |
| Change type | v2 | new field | new prop | new column | v2 | Breaking |
| Tighten required-ness | v2 | n/a | n/a | NOT on authoritative | v2 | Breaking |
CI checks & tooling
- jsonschema-compat: validates additive-only updates between vN and the working copy.
- protoc lints: enforce reserved numbers, forbid renumbering, require UNSPECIFIED = 0.
- contract tests: golden samples (see Fixtures, Samples & Test Data) round-trip through serializers/deserializers; consumers must pass unknown-field tolerance tests.
- hash guard: recompute canonical JSON → SHA-256 and assert no change for historical fixtures.
- DB migrator: dry-run additive DDL; ensure online flags; verify RLS remains intact.
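jsonschema-compat is the CI tool named above. The code below is not its implementation. It is only an illustrative sketch of the kind of rule such a tool enforces, restricted to top-level properties and required. It uses System.Text.Json.Nodes (available in .NET 6 and later). SchemaCompat is a hypothetical name, and type changes, enum removals and nested schemas are deliberately out of scope.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json.Nodes;

// Illustrative additive-only check between two JSON Schema documents (top level only).
public static class SchemaCompat
{
    // Returns breaking-change findings; an empty list means the change looks additive.
    public static List<string> FindBreakingChanges(string oldSchemaJson, string newSchemaJson)
    {
        var oldSchema = JsonNode.Parse(oldSchemaJson)!.AsObject();
        var newSchema = JsonNode.Parse(newSchemaJson)!.AsObject();
        var findings = new List<string>();

        // Removing a property breaks readers that still expect it.
        var newProps = PropertyNames(newSchema);
        foreach (var removed in PropertyNames(oldSchema).Where(p => !newProps.Contains(p)).OrderBy(p => p, StringComparer.Ordinal))
            findings.Add($"property removed: {removed}");

        // A newly required entry breaks producers that omit it (tightened required-ness).
        var oldRequired = Required(oldSchema);
        foreach (var added in Required(newSchema).Where(r => !oldRequired.Contains(r)).OrderBy(r => r, StringComparer.Ordinal))
            findings.Add($"required tightened: {added}");

        return findings;
    }

    private static HashSet<string> PropertyNames(JsonObject schema) =>
        schema["properties"] is JsonObject props ? props.Select(p => p.Key).ToHashSet() : new HashSet<string>();

    private static HashSet<string> Required(JsonObject schema) =>
        schema["required"] is JsonArray req ? req.Select(n => n!.GetValue<string>()).ToHashSet() : new HashSet<string>();
}
```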
Examples
JSON: additive field
// v1
{ "actor": { "id": "user_1", "type": "User" } }
// v1 additive (OK)
{ "actor": { "id": "user_1", "type": "User", "emailHash": "b109f3..."} }
Protobuf: rename via additive + deprecate
C#: dual read/write shim
[DataContract]
public sealed class Actor
{
[DataMember(Order = 1)] public string Id { get; init; } = default!;
[DataMember(Order = 2)] public string Type { get; init; } = "Unknown";
[DataMember(Order = 99)] [Obsolete("Use Display")]
public string? Name { get; init; } // deprecated
[DataMember(Order = 100)]
public string? Display { get; init; }
}
// Reader preference
var display = actor.Display ?? actor.Name;
SQL: additive column (projection only)
ALTER TABLE dbo.AuditEvents ADD UADevice NVARCHAR(64) NULL;
-- Populate via projector; do not touch authoritative payload.
Validation rules (summary)
- schemaVersion is present and resolvable via a registry $id.
- Unknown JSON fields do not cause validation failure (except where explicitly disallowed); they flow into ext or are ignored.
- Protobuf messages never reuse field numbers; removed fields are reserved.
- Authoritative payloads are never mutated post-write; all backfills happen in projections or sidecars.
- Event consumers and REST clients tolerate unknown fields and new enum values.
- Breaking changes require new …vN+1 identifiers and a documented migration & sunset plan.
Validation, Limits & Canonicalization
Centralizes constraints and normalization rules applied on the write path (ingress), mirrored by projectors and API responses. Ensures every AuditRecord is well-formed, bounded, redacted, and canonical before persistence and hashing.
JSON uses lowerCamel; C#/gRPC code-first uses PascalCase; tables/columns use PascalCase. Canonical JSON for hashing follows JCS (RFC 8785), with the integrity node excluded.
Record-level budgets
| Area | Limit | Notes |
|---|---|---|
| Payload size (PayloadBytes) | ≤ 262,144 bytes (256 KiB) | Hard cap at ingress (reject with 413-equivalent). |
| Attributes count (attributes) | ≤ 64 pairs | Keys must be simple tokens (see pattern). |
| Attribute key length | ≤ 64 chars | Pattern: `^[a-z][a-z0-9._-]{0,63}$` (ASCII). |
| Attribute value length | ≤ 256 chars | UTF-8 NFC normalized (see below). |
| action | ≤ 64 chars | Canonicalized to lowercase verb or verb.noun. |
| resource.type | ≤ 128 chars | Canonicalized to PascalCase dotted segments. |
| resource.id | ≤ 128 chars | Opaque; case-preserving; no whitespace. |
| resource.path | ≤ 512 chars | JSON Pointer-like; normalized (see below). |
| actor.id | ≤ 128 chars | Opaque; case-preserving; no whitespace. |
| actor.type | enum | One of Unknown, User, Service, Job. |
| Idempotency key | ≤ 128 chars | ASCII visible; unique per tenant (soft TTL ≥ 24h). |
| Correlation traceId | hex32 | Lowercase 32 hex (W3C). |
| Correlation requestId | ≤ 128 chars | Freeform token; trimmed. |
| Timestamps precision | ms | ISO-8601 UTC (Z) with millisecond precision. |
Clock & time sanity
- ObservedAt: set by the platform to now (UTC), with millisecond precision.
- CreatedAt: producer-supplied; must satisfy:
  - createdAt ≤ now + 2m (future skew tolerance),
  - createdAt ≥ now - 365d (hard past bound; older events are rejected unless they arrive through a special backfill path).
- EffectiveAt (if present): effectiveAt ≤ createdAt.
- All comparisons use UTC; values are rounded to millisecond precision for storage and hashing.
If createdAt > now + 2m, reject (createdAt.futureBeyondSkew). If effectiveAt > createdAt, reject (effectiveAt.afterCreatedAt).
Canonicalization pipeline (ingress)
- Unicode: Normalize all free-text strings to NFC; strip leading/trailing whitespace; collapse internal runs of whitespace to a single space.
- Action: lower-case and validate against ^[a-z]+(\.[a-z0-9_-]+)?$.
  - Examples: create, appointment.read, user.reset_password (underscore allowed after the dot).
- Resource type: split on ".", PascalCase each segment, then rejoin with ".". Validate against ^[A-Z][A-Za-z0-9]*(\.[A-Z][A-Za-z0-9]*)*$.
  - Example: vetspire.appointment → Vetspire.Appointment.
- Resource path (optional): normalize the JSON Pointer-like value:
  - Ensure it starts with /, decode/encode escapes per RFC 6901, and remove a trailing / unless the path is the root.
- Attributes:
  - Keys → ASCII, pattern ^[a-z][a-z0-9._-]{0,63}$.
  - Values → UTF-8 NFC, ≤ 256 chars; drop non-printable control characters.
- Correlation: traceId → lowercase hex32, reject non-hex; requestId → trim and squash whitespace, ≤ 128.
- IP addresses (if provided under conventional keys such as client.ip, server.ip):
  - Parse; if IPv4-mapped IPv6, convert to IPv4.
  - IPv4: dotted decimal, no leading zeros; IPv6: RFC 5952 canonical form (lowercase hex, zero-compression).
- User agent (if present as client.userAgent):
  - Remove control characters; truncate to 256 chars; optionally parse into a structured UA in projections.
- Numbers: preserve numeric types; reject NaN/Infinity.
- JCS: materialize canonical JSON for hashing: sorted keys, no insignificant whitespace, timestamps as ISO-8601 UTC with ms precision, strings as NFC.
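The JCS step can be approximated as follows. Object members are sorted by UTF-16 code units (the ordering RFC 8785 specifies), insignificant whitespace is removed, and the top-level integrity node is excluded before hashing with SHA-256. This is a simplified sketch, not a conforming RFC 8785 implementation. String escaping and number serialization come from System.Text.Json (numbers are written as parsed), and strings are assumed to be NFC-normalized already by the canonicalizers. As a result, byte-for-byte agreement with other JCS libraries is not guaranteed. CanonicalHash is a hypothetical name. The code needs .NET 5 or later for SHA256.HashData and Convert.ToHexString.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Security.Cryptography;
using System.Text.Encodings.Web;
using System.Text.Json;

// Simplified canonical-JSON hasher approximating RFC 8785: members sorted ordinally,
// no whitespace, top-level "integrity" excluded. Not byte-identical to full JCS.
public static class CanonicalHash
{
    public static string Sha256Hex(string json)
    {
        using var doc = JsonDocument.Parse(json);
        using var buffer = new MemoryStream();
        using (var writer = new Utf8JsonWriter(buffer, new JsonWriterOptions
               { Indented = false, Encoder = JavaScriptEncoder.UnsafeRelaxedJsonEscaping }))
        {
            Write(writer, doc.RootElement, isRoot: true);
        } // disposing the writer flushes it into the buffer
        return Convert.ToHexString(SHA256.HashData(buffer.ToArray())).ToLowerInvariant();
    }

    private static void Write(Utf8JsonWriter w, JsonElement e, bool isRoot)
    {
        switch (e.ValueKind)
        {
            case JsonValueKind.Object:
                w.WriteStartObject();
                foreach (var p in e.EnumerateObject()
                             .Where(p => !(isRoot && p.Name == "integrity"))  // proofs are not self-hashed
                             .OrderBy(p => p.Name, StringComparer.Ordinal))   // UTF-16 code-unit order
                {
                    w.WritePropertyName(p.Name);
                    Write(w, p.Value, isRoot: false);
                }
                w.WriteEndObject();
                break;
            case JsonValueKind.Array:
                w.WriteStartArray();
                foreach (var item in e.EnumerateArray()) Write(w, item, isRoot: false);
                w.WriteEndArray();
                break;
            default:
                e.WriteTo(w); // strings, numbers, true/false/null as parsed
                break;
        }
    }
}
```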
Delta & redaction caps
- delta.fields map: ≤ 256 entries.
- Field key length: ≤ 128 (JSON Pointer or dotted path, normalized to one style in the model).
- Each before/after scalar string ≤ 1024; longer content must be redacted/truncated and marked with redactionHint = "Truncated".
- Binary values are not allowed; base64 strings must be ≤ 2 KiB after encoding.
- If a delta entry violates limits, drop the value, keep the key, and attach a redactionHint.
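The sketch below implements the per-value cap. It assumes a simple DeltaValue shape with Before, After and RedactionHint, which is hypothetical (the canonical delta model is defined elsewhere), and it measures length in UTF-16 chars. It applies only the truncate-and-mark rule; the 256-entry and base64 caps are not enforced here.

```csharp
using System.Collections.Generic;

// Hypothetical delta value shape; the canonical delta model may use different names.
public sealed record DeltaValue(string? Before, string? After, string? RedactionHint = null);

public static class DeltaCaps
{
    public const int MaxScalarLength = 1024; // before/after cap, counted in UTF-16 chars (assumption)

    // Keeps every key; oversized before/after strings are truncated and the entry is
    // marked with RedactionHint = "Truncated".
    public static Dictionary<string, DeltaValue> Apply(IReadOnlyDictionary<string, DeltaValue> fields)
    {
        var result = new Dictionary<string, DeltaValue>(fields.Count);
        foreach (var (key, value) in fields)
        {
            bool tooLong = (value.Before?.Length ?? 0) > MaxScalarLength
                        || (value.After?.Length ?? 0) > MaxScalarLength;
            result[key] = tooLong
                ? value with { Before = Cut(value.Before), After = Cut(value.After), RedactionHint = "Truncated" }
                : value;
        }
        return result;
    }

    // Cuts to the cap without splitting a UTF-16 surrogate pair.
    private static string? Cut(string? s)
    {
        if (s is null || s.Length <= MaxScalarLength) return s;
        int end = char.IsHighSurrogate(s[MaxScalarLength - 1]) ? MaxScalarLength - 1 : MaxScalarLength;
        return s[..end];
    }
}
```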
Validation matrix (selected)
| Field | Pattern / Rule | Failure code |
|---|---|---|
| tenantId | `^[A-Za-z0-9._-]{1,128}$` | tenantId.invalid |
| action | `^[a-z]+(\.[a-z0-9_-]+)?$` | action.invalid |
| resource.type | `^[A-Z][A-Za-z0-9]*(\.[A-Z][A-Za-z0-9]*)*$` | resource.type.invalid |
| resource.id | no spaces, ≤ 128 | resource.id.invalid |
| actor.id | no spaces, ≤ 128 | actor.id.invalid |
| actor.type | enum | actor.type.invalid |
| correlation.traceId | hex32 | traceId.invalid |
| createdAt | ≤ now + 2m | createdAt.futureBeyondSkew |
| effectiveAt | ≤ createdAt | effectiveAt.afterCreatedAt |
| attributes.*.key | `^[a-z][a-z0-9._-]{0,63}$` | attributes.key.invalid |
| attributes.*.value | ≤ 256, printable | attributes.value.invalid |
| payloadBytes | ≤ 256 KiB | payload.tooLarge |
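The attributes rows of the matrix can be checked with a helper like the following. AttributeRules is hypothetical. "Printable" is approximated as "contains no control characters". Failures are returned as (code, JSON Pointer) pairs so they can populate the errors array of a Problem+JSON response. Pointer segments are escaped per RFC 6901: ~ becomes ~0 and / becomes ~1.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical checks for the attributes.*.key and attributes.*.value rows.
public static class AttributeRules
{
    private static readonly Regex KeyPattern = new(@"^[a-z][a-z0-9._-]{0,63}\z");
    public const int MaxValueLength = 256;

    // Returns (failure code, JSON Pointer) pairs for a Problem+JSON "errors" array.
    public static List<(string Code, string Pointer)> Validate(IReadOnlyDictionary<string, string> attributes)
    {
        var errors = new List<(string Code, string Pointer)>();
        foreach (var (key, value) in attributes)
        {
            // RFC 6901 escaping: "~" -> "~0" first, then "/" -> "~1".
            var pointer = "/attributes/" + key.Replace("~", "~0").Replace("/", "~1");
            if (!KeyPattern.IsMatch(key)) errors.Add(("attributes.key.invalid", pointer));
            // "Printable" is approximated as: no control characters.
            if (value.Length > MaxValueLength || value.Any(char.IsControl))
                errors.Add(("attributes.value.invalid", pointer));
        }
        return errors;
    }
}
```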
Problem+JSON error hints (ingress)
Type base: urn:connectsoft:errors/validation/{code}
Example (oversized payload)
{
"type": "urn:connectsoft:errors/validation/payload.tooLarge",
"title": "Payload exceeds 256 KiB",
"status": 413,
"detail": "Submitted AuditRecord is 312,884 bytes.",
"instance": "/ingest/records",
"limitBytes": 262144
}
Example (bad action)
{
"type": "urn:connectsoft:errors/validation/action.invalid",
"title": "Invalid action",
"status": 400,
"detail": "Expected 'verb' or 'verb.noun' lowercase.",
"errors": [{ "pointer": "/action", "reason": "regex" }]
}
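The following sketch builds these responses with System.Text.Json. Per RFC 9457 (which obsoletes RFC 7807), extension members such as limitBytes or errors are serialized at the top level of the problem object, and [JsonExtensionData] produces that shape. ValidationProblem is a hypothetical type, not ASP.NET Core's ProblemDetails. Null members are omitted from the output.

```csharp
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical Problem+JSON shape for ingress validation errors (RFC 9457 member names).
public sealed class ValidationProblem
{
    public const string TypeBase = "urn:connectsoft:errors/validation/";

    [JsonPropertyName("type")] public string Type { get; init; } = "";
    [JsonPropertyName("title")] public string Title { get; init; } = "";
    [JsonPropertyName("status")] public int Status { get; init; }
    [JsonPropertyName("detail")] public string? Detail { get; init; }
    [JsonPropertyName("instance")] public string? Instance { get; init; }

    // Extension members (e.g., limitBytes, errors) are written at the top level.
    [JsonExtensionData] public Dictionary<string, object?> Extensions { get; init; } = new();

    public static ValidationProblem For(string code, string title, int status) =>
        new() { Type = TypeBase + code, Title = title, Status = status };

    public string ToJson() => JsonSerializer.Serialize(this,
        new JsonSerializerOptions { DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull });
}
```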
JSON Schema (snippets, v1 addenda)
Add the following constraints to auditrecord.v1.json:
{
"properties": {
"action": { "type": "string", "maxLength": 64, "pattern": "^[a-z]+(\\.[a-z0-9_-]+)?$" },
"resource": {
"type": "object",
"properties": {
"type": { "type": "string", "maxLength": 128, "pattern": "^[A-Z][A-Za-z0-9]*(\\.[A-Z][A-Za-z0-9]*)*$" },
"id": { "type": "string", "maxLength": 128, "pattern": "^(?!.*\\s).+$" },
"path": { "type": "string", "maxLength": 512 }
}
},
"actor": {
"type": "object",
"properties": {
"id": { "type": "string", "maxLength": 128, "pattern": "^(?!.*\\s).+$" },
"type": { "type": "string", "enum": ["Unknown","User","Service","Job"] }
}
},
"attributes": {
"type": "object",
"propertyNames": { "pattern": "^[a-z][a-z0-9._-]{0,63}$" },
"additionalProperties": { "type": "string", "maxLength": 256 }
},
"correlation": {
"type": "object",
"properties": {
"traceId": { "type": "string", "pattern": "^[a-f0-9]{32}$" },
"requestId": { "type": "string", "maxLength": 128 },
"causationId":{ "type": "string", "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$" }
}
}
}
}
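C# canonicalizers (gRPC code-first)
using System;
using System.ComponentModel.DataAnnotations; // ValidationException
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
public static class Canon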
{
// Unicode NFC + trim + collapse inner whitespace
public static string NormalizeText(string s)
{
if (string.IsNullOrWhiteSpace(s)) return string.Empty;
var nfc = s.Normalize(NormalizationForm.FormC).Trim();
return Regex.Replace(nfc, @"\s+", " ");
}
public static string CanonicalizeAction(string action)
{
var s = NormalizeText(action).ToLowerInvariant();
if (!Regex.IsMatch(s, "^[a-z]+(\\.[a-z0-9_-]+)?$")) throw new ValidationException("action.invalid");
return s;
}
public static string CanonicalizeResourceType(string type)
{
var s = NormalizeText(type);
var segs = s.Split('.', StringSplitOptions.RemoveEmptyEntries)
.Select(Pascalize);
var joined = string.Join('.', segs);
if (!Regex.IsMatch(joined, "^[A-Z][A-Za-z0-9]*(\\.[A-Z][A-Za-z0-9]*)*$"))
throw new ValidationException("resource.type.invalid");
return joined;
static string Pascalize(string x) =>
Regex.Replace(x, @"(^|[_\-\s]+)([a-zA-Z0-9])", m => m.Groups[2].Value.ToUpperInvariant()) // word caps
.Replace("_", "").Replace("-", "").Replace(" ", "");
}
public static string CanonicalizeTraceId(string hex32)
{
var s = NormalizeText(hex32).ToLowerInvariant();
if (!Regex.IsMatch(s, "^[a-f0-9]{32}$")) throw new ValidationException("traceId.invalid");
return s;
}
public static string CanonicalizeIp(string ip)
{
if (string.IsNullOrWhiteSpace(ip)) return ip;
if (System.Net.IPAddress.TryParse(ip.Trim(), out var addr))
{
if (addr.IsIPv4MappedToIPv6) addr = addr.MapToIPv4();
return addr.AddressFamily == System.Net.Sockets.AddressFamily.InterNetworkV6
? addr.ToString()!.ToLowerInvariant() // .NET outputs RFC 5952-ish
: addr.ToString(); // IPv4 dot-decimal
}
throw new ValidationException("ip.invalid");
}
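// Truncate to at most max UTF-16 chars without splitting a surrogate pair.
public static string Truncate(string s, int max)
{
if (max <= 0) return string.Empty;
if (s.Length <= max) return s;
var cut = char.IsHighSurrogate(s[max - 1]) ? max - 1 : max;
return s.Substring(0, cut);
}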
public static (DateTimeOffset createdAt, DateTimeOffset observedAt, DateTimeOffset? effectiveAt)
ValidateClocks(DateTimeOffset createdAt, DateTimeOffset? effectiveAt, DateTimeOffset nowUtc)
{
if (createdAt > nowUtc.AddMinutes(2)) throw new ValidationException("createdAt.futureBeyondSkew");
// Hard past bound (365d); the backfill path bypasses this check. The failure code name is an assumption.
if (createdAt < nowUtc.AddDays(-365)) throw new ValidationException("createdAt.beyondPastBound");
if (effectiveAt is { } e && e > createdAt) throw new ValidationException("effectiveAt.afterCreatedAt");
var observedAt = nowUtc; // set by platform
return (RoundMs(createdAt), RoundMs(observedAt), effectiveAt is null ? null : RoundMs(effectiveAt.Value));
// Drops sub-millisecond ticks (truncation toward the past) and normalizes the offset to UTC.
static DateTimeOffset RoundMs(DateTimeOffset t) => new DateTimeOffset(t.UtcDateTime.AddTicks(-(t.UtcDateTime.Ticks % TimeSpan.TicksPerMillisecond)), TimeSpan.Zero);
}
}
Apply canonicalizers before computing the canonical JSON and PayloadBytes. Reject on the first hard violation and include Problem+JSON hints.
UA, IP & path normalization (field guidance)
- Prefer attributes["client.ip"], attributes["server.ip"], attributes["client.userAgent"].
- Store normalized IPs and truncated UA (≤ 256 chars).
- For resource sub-paths, prefer JSON Pointer (e.g., /lines/0/price), not dotted paths.
Consistency & projection mirroring
- Projections assume canonicalized inputs; they do not re-normalize, only validate invariants when enriching (e.g., computing ChangedFields, bitmasks).
- Search docs and exports must reflect only the post-redaction, canonical values.
Operational guards
- Ingress backpressure: reject when payloads regularly approach the 256 KiB ceiling; emit Validation.Alert metrics.
- Shadow validation: periodically sample stored payloads and re-run canonicalizers to detect drift.
- Schema pinning: assert that auditRecord.schemaVersion matches the registry; reject unknown majors.
Privacy & PII Inventory
Maps all AuditRecord fields to sensitivity classes and defines data minimization and mask-on-read behavior. Inventory drives redaction at write, project, search, export, and read time.
JSON uses lowerCamel; C#/gRPC uses PascalCase; DB tables/columns use PascalCase.
DataClassFlags is a bitmask on projections: Public=1, Internal=2, Personal=4, Sensitive=8, Credential=16, Phi=32.
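The bit values above map naturally onto a [Flags] enum. For example, the dataClassFlags value 12 in the AuditRecord.Appended example decodes to Personal | Sensitive. The short backing type matches the short? DataClassFlags member on AuditRecordAppended. The None = 0 member and the RequiresMasking rule are assumptions added for illustration.

```csharp
using System;

// Bit values as listed above; None = 0 is an added convenience member (assumption).
[Flags]
public enum DataClassFlags : short
{
    None = 0,
    Public = 1,
    Internal = 2,
    Personal = 4,
    Sensitive = 8,
    Credential = 16,
    Phi = 32,
}

public static class DataClassFlagsExtensions
{
    // Hypothetical rule: any class beyond Public/Internal needs a masking profile on read.
    public static bool RequiresMasking(this DataClassFlags flags) =>
        (flags & (DataClassFlags.Personal | DataClassFlags.Sensitive |
                  DataClassFlags.Credential | DataClassFlags.Phi)) != 0;
}
```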
Data classes (recap)
| Class | Description | Examples | Default read mask |
|---|---|---|---|
| Public | Harmless metadata | action, resource.type | None |
| Internal | Ops-only, non-PII | correlation ids, shard ids | None |
| Personal | Direct or indirect identifiers (PII) | names, emails, IPs, device ids | Partial mask / hash |
| Sensitive | High-risk PII | SSN, national id, exact GPS | Tokenize / drop |
| Credential | Secrets/tokens/passwords | API keys, session tokens | Drop at write |
| Phi | Regulated health info | diagnosis, treatment codes | Tokenize / policy-gated |
Redaction rules per class are defined in Data Classification & Redaction Rules. This section focuses on where those classes apply.
Field → DataClass inventory (canonical AuditRecord)
| Path | Type | DataClass | Minimization posture |
|---|---|---|---|
| auditRecordId | ULID | Internal | Keep. |
| tenantId | token | Internal | Keep. |
| schemaVersion | string | Internal | Keep. |
| createdAt, observedAt, effectiveAt | timestamp | Internal | Round to ms; keep. |
| action | string | Public | Keep lowercased. |
| resource.type | string | Public | Keep PascalCase. |
| resource.id | string | Personal (pseudonymous) | Keep opaque id; never expand. |
| resource.path | string | Internal | Normalize JSON Pointer; keep. |
| actor.id | string | Personal (pseudonymous) | Keep opaque id. |
| actor.type | enum | Public | Keep. |
| actor.display | string | Personal | Partial mask or hash (UI default: mask). |
| decision.outcome | enum | Internal | Keep. |
| decision.reason | string | Internal | Truncate to ≤ 256. |
| correlation.traceId | hex32 | Internal | Keep lowercase. |
| correlation.requestId | string | Internal | Keep trimmed. |
| correlation.causationId | ULID | Internal | Keep. |
| producer | string | Internal | Keep. |
| attributes["client.ip"] | ip | Personal | IPv4/6 canonical; mask on read (e.g., /24, /64). |
| attributes["server.ip"] | ip | Internal | Keep canonical. |
| attributes["client.userAgent"] | string | Personal | Truncate to 256; mask on read (suffix elide). |
| attributes["email"] / *.email | string | Personal | Store hash(email); optionally store masked a***z@d***.tld. |
| attributes["phone"] / *.phone | string | Personal | Normalize to E.164; store masked; hash full value. |
| attributes["geo.lat"], attributes["geo.lon"] | number | Sensitive | Quantize; tokenize or drop if policy forbids. |
| attributes["secret"], ["password"], ["token"], ["apiKey"] | string | Credential | Drop at write; optionally keep a hash-of-hash fingerprint. |
| delta.fields[*].before/after | scalar | Class of field | Enforce per-field class; redact or tokenize large strings. |
| integrity.* | proof | Internal | Keep (sidecar) once sealed. |
Attribute classification uses key heuristics (see below) plus tenant policy overrides.
Attribute classification heuristics
When attributes are free-form, apply pattern-based classification at ingress:
| Pattern | DataClass | Rule |
|---|---|---|
| `(?i)\b(email\|e-mail)\b` | Personal | Store SHA-256(email) + masked printable (e.g., f***@example.com). |
| `(?i)\b(phone\|mobile\|msisdn)\b` | Personal | Normalize E.164; store masked + hash. |
| `(?i)\b(ssn\|nin\|national[_-]?id)\b` | Sensitive | Tokenize (format-preserving if needed) or drop. |
| `(?i)\b(password\|secret\|api[_-]?key\|token\|credential\|bearer)\b` | Credential | Drop value; keep indicator present=true. |
| `(?i)\b(ip\|client\.ip\|remote[_-]?addr)\b` | Personal | Canonicalize; mask on read. |
| `(?i)\b(gps\|geo\.(lat\|lon)\|location)\b` | Sensitive | Quantize (e.g., 2 decimals) + tokenize or drop. |
| `(?i)\b(name\|first[_-]?name\|last[_-]?name\|full[_-]?name)\b` | Personal | Mask on read; optional hash. |
Tenants can override via a classificationOverrides map (key → DataClass), versioned in policy.
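The following sketch turns the heuristics table into a key classifier. It assumes that tenant overrides win and that the most restrictive class applies when several patterns match; the table does not define precedence. DataClass and AttributeClassifier are hypothetical names. One caveat: \b treats _ as a word character, so a key such as session_token does not match the credential pattern. Keys like that would need a classificationOverrides entry or a broader pattern.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public enum DataClass { Public, Internal, Personal, Sensitive, Credential, Phi }

// Hypothetical key-based classifier built from the heuristics table.
public static class AttributeClassifier
{
    // Most restrictive class first, so a key matching several patterns gets the strictest class (assumption).
    private static readonly (Regex Pattern, DataClass Class)[] Rules =
    {
        (new Regex(@"(?i)\b(password|secret|api[_-]?key|token|credential|bearer)\b"), DataClass.Credential),
        (new Regex(@"(?i)\b(ssn|nin|national[_-]?id)\b"), DataClass.Sensitive),
        (new Regex(@"(?i)\b(gps|geo\.(lat|lon)|location)\b"), DataClass.Sensitive),
        (new Regex(@"(?i)\b(email|e-mail)\b"), DataClass.Personal),
        (new Regex(@"(?i)\b(phone|mobile|msisdn)\b"), DataClass.Personal),
        (new Regex(@"(?i)\b(ip|client\.ip|remote[_-]?addr)\b"), DataClass.Personal),
        (new Regex(@"(?i)\b(name|first[_-]?name|last[_-]?name|full[_-]?name)\b"), DataClass.Personal),
    };

    // Tenant overrides (classificationOverrides) win over heuristics; null means no rule matched.
    public static DataClass? Classify(string key, IReadOnlyDictionary<string, DataClass>? overrides = null)
    {
        if (overrides is not null && overrides.TryGetValue(key, out var forced)) return forced;
        foreach (var (pattern, cls) in Rules)
            if (pattern.IsMatch(key)) return cls;
        return null;
    }
}
```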
Default redaction on write
Apply these write-time transformations before storage/hashing:
| DataClass | Write action |
|---|---|
| Credential | Drop value; store {present:true} and optional sha256(sha256(value)) fingerprint. |
| Sensitive | Tokenize or hash (irreversible) per tenant policy; optional bucketing (e.g., age bands). |
| Personal | Store value if necessary for audit, but also store masked variant for read; hash common identifiers. |
| Internal/Public | Keep as-is (normalized). |
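For Credential values, the write action keeps only a presence marker and an optional sha256(sha256(value)) fingerprint. The sketch below makes three assumptions: the input is UTF-8, the outer hash is taken over the raw bytes of the inner digest (not its hex form), and the output is lowercase hex. CredentialRedactor and CredentialMarker are hypothetical names. Keep in mind that an unkeyed hash of a low-entropy secret, such as a short password, can be brute-forced offline. A keyed HMAC is a common alternative if the fingerprint must resist guessing.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Presence marker stored in place of a dropped credential value.
public sealed record CredentialMarker(bool Present, string? Fingerprint);

public static class CredentialRedactor
{
    // Drops the raw value; optionally keeps sha256(sha256(utf8(value))) as lowercase hex.
    public static CredentialMarker Redact(string? value, bool withFingerprint = true)
    {
        if (string.IsNullOrEmpty(value)) return new CredentialMarker(false, null);
        if (!withFingerprint) return new CredentialMarker(true, null);
        byte[] inner = SHA256.HashData(Encoding.UTF8.GetBytes(value));
        byte[] outer = SHA256.HashData(inner); // hash-of-hash over the raw inner digest (assumption)
        return new CredentialMarker(true, Convert.ToHexString(outer).ToLowerInvariant());
    }
}
```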
Masking on read (role × purpose)¶
Server-side readers must apply a masking profile derived from role and purpose-of-use.
| Profile | Intended users | Personal | Sensitive | Credential | Phi |
|---|---|---|---|---|---|
| `Safe` (default) | Console users, search | Mask (email `a***z@e***.com`; IP /24 or /64) | Tokenized | Omit | Tokenized |
| `Support` | Support tickets | Mask | Omit | Omit | Omit |
| `Investigator` (JIT) | Security/IR with approval | Unmask (JIT, logged) | Tokenized | Omit | Tokenized |
| `Raw` (policy-gated) | Legal export with basis | Unmask | Unmask per DPA | Omit | Unmask if lawful |
All unmask operations are just-in-time (JIT), time-bound, and audited (who, when, purpose, scope).
API contract (hint)¶
Reads accept an optional Redaction header or query parameter:
Redaction: profile=Safe|Support|Investigator|Raw; purpose="Incident #1234"; expiry=2025-10-31T23:59:59Z
The server may downgrade the requested profile based on tenant policy and the caller's role.
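The header handling and server-side downgrade can be sketched as follows. `ResolveRedaction` is a hypothetical helper, and the sketch rests on four assumptions. First, the caller's permitted profiles (tenant policy × role) are passed in. Second, unmasking profiles (`Investigator`, `Raw`) need a purpose and an unexpired expiry to satisfy the JIT rule. Third, every failure falls back to `Safe`. Fourth, parameters are separated by `;`, so a quoted value that itself contains `;` is not handled.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper: resolve the effective profile from the Redaction header.
static (string Profile, string? Purpose, DateTimeOffset? Expiry) ResolveRedaction(
    string? header, ISet<string> allowedProfiles, DateTimeOffset nowUtc)
{
    string profile = "Safe";
    string? purpose = null;
    DateTimeOffset? expiry = null;

    foreach (var part in (header ?? "").Split(';', StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries))
    {
        int eq = part.IndexOf('=');
        if (eq <= 0) continue;
        string value = part[(eq + 1)..].Trim().Trim('"');
        switch (part[..eq].Trim().ToLowerInvariant())
        {
            case "profile": profile = value; break;
            case "purpose": purpose = value; break;
            case "expiry":
                if (DateTimeOffset.TryParse(value, CultureInfo.InvariantCulture, DateTimeStyles.AssumeUniversal, out var parsed))
                    expiry = parsed;
                break;
        }
    }

    bool permitted = profile == "Safe" || allowedProfiles.Contains(profile);
    // Unmasking must be just-in-time: a stated purpose and an expiry still in the future.
    bool unmasks = profile is "Investigator" or "Raw";
    bool timeBound = expiry is { } exp && exp > nowUtc && !string.IsNullOrWhiteSpace(purpose);

    return !permitted || (unmasks && !timeBound)
        ? ("Safe", purpose, expiry)   // server-side downgrade
        : (profile, purpose, expiry);
}

Console.WriteLine(ResolveRedaction("profile=Raw", new HashSet<string> { "Raw" }, DateTimeOffset.UtcNow).Profile); // Safe
```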
JSON: classification hints (partials)¶
Add optional hints to AuditRecord for explicit tagging and masking provenance:
{
"classification": {
"record": ["Internal","Personal"],
"fields": {
"actor.display": ["Personal"],
"attributes.client.ip": ["Personal"],
"attributes.email": ["Personal"]
},
"policyRef": { "id": "policy-default", "revision": 3 }
},
"redaction": {
"planId": "policy-default",
"appliedAt": "2025-10-22T12:00:00Z",
"profile": "Safe"
}
}
C# helpers (masking)¶
using System;
using System.Net;
using System.Net.Sockets;

public enum RedactionProfile { Safe, Support, Investigator, Raw }

public static class Mask
{
    // Per the masking table, Personal data is unmasked only for Investigator (JIT) and Raw reads.
    private static bool Unmasks(RedactionProfile p) =>
        p is RedactionProfile.Investigator or RedactionProfile.Raw;

    public static string Email(string email, RedactionProfile p) =>
        Unmasks(p) ? email : MaskEmail(email);

    // Safe/Support: truncate IPv4 to /24 and IPv6 to /64.
    public static string Ip(string ip, RedactionProfile p)
    {
        if (Unmasks(p)) return ip;
        if (!IPAddress.TryParse(ip, out var addr)) return ip;

        var bytes = addr.GetAddressBytes();
        if (addr.AddressFamily == AddressFamily.InterNetwork)
        {
            bytes[3] = 0;                         // 203.0.113.42 -> 203.0.113.0/24
            return $"{new IPAddress(bytes)}/24";
        }
        // IPv6: zero the low 64 bits on the raw bytes. Splitting the text form on ':'
        // breaks on compressed addresses such as "2001:db8::1".
        Array.Clear(bytes, 8, 8);                 // 2001:db8::1 -> 2001:db8::/64
        return $"{new IPAddress(bytes)}/64";
    }

    // Fixed-width "***" so the mask does not leak lengths: alice@example.com -> a***e@e***.com.
    // Malformed input is fully masked instead of being returned unmasked.
    private static string MaskEmail(string email)
    {
        var parts = email.Split('@');
        if (parts.Length != 2 || parts[0].Length == 0) return "***";
        var local = parts[0].Length <= 2 ? "***" : $"{parts[0][0]}***{parts[0][^1]}";
        var labels = parts[1].Split('.');
        var domain = labels.Length >= 2 && labels[0].Length > 0 ? $"{labels[0][0]}***.{labels[^1]}" : "***";
        return $"{local}@{domain}";
    }
}
Search & exports¶
- Search index: index only masked or hashed forms for Personal/Sensitive; never index Credential.
- Exports: honor the job's `redactionPlan`. For legal-hold exports, default to `Investigator` or `Raw` only when a lawful basis exists and is recorded in the job metadata.
SQL masking views (illustrative)¶
CREATE VIEW dbo.AuditEvents_Masked AS
SELECT
TenantId, AuditRecordId, CreatedAt, ObservedAt, Action, ResourceType, ResourceId,
CASE WHEN DataClassFlags & 4 = 4 THEN -- Personal
CONCAT(LEFT(ActorId, 2), '***') ELSE ActorId END AS ActorId,
DecisionOutcome, ChangedFields, DataClassFlags
FROM dbo.AuditEvents;
Use DB views for BI/reporting contexts that cannot call service-side maskers.
Examples¶
Stored authoritative JSON (after write-time minimization)
{
"actor": { "id": "user_123", "type": "User", "display": "A. Smith" },
"attributes": {
"client.ip": "203.0.113.42",
"client.userAgent": "Mozilla/5.0 ...",
"email": "sha256:2c26b46b68ffc68ff99b453c1d304134..."
},
"resource": { "type": "Vetspire.Appointment", "id": "A-9981" },
"action": "appointment.update",
"schemaVersion": "auditrecord.v1"
}
Read (profile=Safe)
{
"actor": { "id": "user_123", "type": "User", "display": "A***h" },
"attributes": {
"client.ip": "203.0.113.0/24",
"client.userAgent": "Mozilla/5.0 …(masked)",
"email.masked": "a***h@e***.com"
}
}
Validation rules (summary)¶
- Credential values are never persisted; reject or drop with `redactionHint="Dropped"`.
- Sensitive values require tokenization or hashing at write unless tenant policy allows storage with heightened controls.
- Personal values default to mask on read; store hashed surrogates for join-free investigations.
- All unmask operations require purpose, approver (if configured), expiry, and generate read-access audit events.
- DataClassFlags on projections reflect the union of classes present on the record and its delta.
Data Lifecycle & States¶
Models the end-to-end lifecycle of an AuditRecord: append → accepted → projected → sealed → eligible → purged, with export possible at any point after acceptance. Defines clocks, transitions, and a durable lifecycle transition log.
JSON uses lowerCamel; C#/gRPC uses PascalCase; DB tables/columns use PascalCase. Times are ISO-8601 UTC with ms precision. “OnHold” is an overlay that blocks purge (see Legal Hold).
Overview¶
- Authoritative write appends a canonical JSON payload (no integrity) and emits Accepted.
- Projectors upsert query shapes and advance per-tenant watermarks.
- Integrity Service batches leaves into segments/blocks and seals proofs (sidecar).
- Retention Evaluator computes the earliest EligibleAt date respecting policy & legal holds.
- Lifecycle performs purge once records are Eligible and not OnHold.
- Export can occur anytime after Accepted; manifests reference integrity material when available.
- All steps produce idempotent lifecycle transition entries.
Clocks¶
| Name | Source | Purpose | Rule |
|---|---|---|---|
| `createdAt` | Producer | Domain time of the event | ≤ now + 2 min skew; ms precision. |
| `observedAt` | Platform | Ingress time | Set on accept. |
| `sealedAt` | Integrity | Time the block was sealed | From Integrity Service. |
| `eligibleAt` | Retention | Earliest purge date | From evaluator: policy × attributes × state. |
| `purgedAt` | Lifecycle | Authoritative deletion time | Set by lifecycle job. |
| `exportedAt` | Export | Package/manifest creation time | Per package. |
| `projectedAt` | Projector | Projection upsert time | Optional; usually implicit via checkpoints. |
State model¶
We model monotonic states plus overlays. A record may be exported multiple times, and OnHold can toggle independently.
stateDiagram-v2
[*] --> Appended: Ingest Append
Appended --> Accepted: Durable write
Accepted --> Projected: Projectors upsert rows
Projected --> Sealed: Integrity sealed (sidecar)
Sealed --> Eligible: eligibleAt reached (retention evaluator)
Accepted --> Eligible: (path when sealing disabled)
Eligible --> Purged: Lifecycle purge
Accepted --> Exported: Export produces package(s)
Projected --> Exported
Sealed --> Exported
Exported --> Exported: Subsequent exports
note right of Eligible: Blocked by OnHold overlay
state OnHold <<choice>>
Derived “current state” (live query):
- `Purged` if the record is not found in the authoritative store and a `Purged` transition exists.
- Else `Eligible` if `now ≥ eligibleAt` and no holds are active.
- Else `Sealed` if `RecordIntegrity` exists.
- Else `Projected` if projection checkpoints have reached the record.
- Else Accepted once durable.
- Appended is transient (pre-commit).
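The derivation rules can be expressed as one ordered function. This is a minimal sketch. `DeriveState` is hypothetical, and its boolean inputs stand in for lookups against the authoritative store, the lifecycle log, `RecordIntegrity`, and projection checkpoints. One consequence of the rule order is that a sealed record past `eligibleAt` but still on hold reports `Sealed`, which matches the API example below (`"state": "Sealed", "onHold": true`).

```csharp
using System;

// Hypothetical helper mirroring the derivation rules, checked in order.
static string DeriveState(
    bool foundInAuthoritativeStore, bool hasPurgedTransition,
    DateTimeOffset nowUtc, DateTimeOffset? eligibleAt, int activeHolds,
    bool hasRecordIntegrity, bool checkpointsReachedRecord, bool durable)
{
    if (!foundInAuthoritativeStore && hasPurgedTransition) return "Purged";
    if (eligibleAt is { } due && nowUtc >= due && activeHolds == 0) return "Eligible";
    if (hasRecordIntegrity) return "Sealed";          // on-hold records past eligibleAt fall through here
    if (checkpointsReachedRecord) return "Projected";
    return durable ? "Accepted" : "Appended";         // Appended is transient (pre-commit)
}

Console.WriteLine(DeriveState(true, false, DateTimeOffset.UtcNow, null, 0, false, true, true)); // Projected
```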
Lifecycle transitions¶
Transitions are recorded in a durable log (append-only). Each transition is idempotent (same key → same effect).
| Event | When | Required fields (JSON) |
|---|---|---|
| `Accepted` | After authoritative insert | `auditRecordId`, `observedAt` |
| `Projected` | After all required projections upsert | `auditRecordId`, `projectedAt`, `projections` |
| `Sealed` | After integrity proof computed | `auditRecordId`, `sealedAt`, `blockId`, `segmentId`, `leafHash` |
| `EligibleComputed` | When evaluator computes date | `auditRecordId`, `eligibleAt`, `policyId`, `policyRevision` |
| `OnHoldApplied` | When a hold starts matching | `auditRecordId`, `holdId`, `placedAt` |
| `OnHoldReleased` | When the last matching hold ends | `auditRecordId`, `holdId`, `releasedAt` |
| `Purged` | After successful authoritative delete | `auditRecordId`, `purgedAt`, `reason` |
| `Exported` | Per package including the record | `auditRecordId`, `exportedAt`, `jobId`, `packageId`, `manifestUri` |
`OnHold` is modeled as log entries; “currently on hold” is computed as (applied − released) across matching holds.
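The "applied − released" computation can be sketched over the transition log. This is a minimal sketch that assumes each hold writes its own `OnHoldApplied`/`OnHoldReleased` entries keyed by `holdId`, and that the latest entry per hold (by `At`) decides whether that hold is still active. `ActiveHolds` is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: holdIds whose latest transition (by At) is OnHoldApplied.
static IReadOnlySet<string> ActiveHolds(IEnumerable<(string Kind, string HoldId, DateTimeOffset At)> transitions) =>
    transitions
        .Where(t => t.Kind is "OnHoldApplied" or "OnHoldReleased")
        .GroupBy(t => t.HoldId)
        .Select(g => g.OrderBy(t => t.At).Last())   // OrderBy is stable: equal At keeps log order
        .Where(latest => latest.Kind == "OnHoldApplied")
        .Select(latest => latest.HoldId)
        .ToHashSet();

// The record is "currently on hold" when the set is non-empty.
Console.WriteLine(ActiveHolds(Array.Empty<(string, string, DateTimeOffset)>()).Count); // 0
```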
JSON Schemas (v1)¶
lifecycle-transition.v1.json
{
  "$id": "urn:connectsoft:schemas/lifecycle/lifecycle-transition.v1.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "LifecycleTransition",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "tenantId": { "type": "string", "pattern": "^[A-Za-z0-9._-]{1,128}$" },
    "eventId": { "type": "string", "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$" },
    "auditRecordId": { "type": "string", "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$" },
    "kind": { "type": "string", "enum": ["Accepted","Projected","Sealed","EligibleComputed","OnHoldApplied","OnHoldReleased","Purged","Exported"] },
    "at": { "type": "string", "format": "date-time" },
    "traceId": { "type": "string", "pattern": "^[a-f0-9]{32}$" },
    "producer": { "type": "string", "maxLength": 64 },
    "data": { "type": "object", "additionalProperties": true, "description": "Kind-specific payload" }
  },
  "required": ["tenantId","eventId","auditRecordId","kind","at"]
}
Kind-specific data payloads
- `Projected`: `{ "projections": ["AuditEvents","ResourceEvents"], "projectedAt": ts }`
- `Sealed`: `{ "blockId": ULID, "segmentId": ULID, "leafHash": "hex64" }`
- `EligibleComputed`: `{ "eligibleAt": ts, "policyId": "id", "policyRevision": 3, "basis": "RuleName" }`
- `Purged`: `{ "reason": "Retention|GDPR.Request|Admin.Purge", "jobId": ULID? }`
- `Exported`: `{ "jobId": ULID, "packageId": ULID, "manifestUri": "s3://…", "exportedAt": ts }`
C# (gRPC code-first)¶
[DataContract]
public sealed class LifecycleTransition
{
[DataMember(Order = 1)] public string TenantId { get; init; } = default!;
[DataMember(Order = 2)] public string EventId { get; init; } = default!; // ULID
[DataMember(Order = 3)] public string AuditRecordId { get; init; } = default!; // ULID
[DataMember(Order = 4)] public string Kind { get; init; } = default!; // enum name
[DataMember(Order = 5)] public DateTimeOffset At { get; init; }
[DataMember(Order = 6)] public string? TraceId { get; init; } // hex32
[DataMember(Order = 7)] public string? Producer { get; init; }
[DataMember(Order = 8)] public Dictionary<string, object>? Data { get; init; }
}
public static class LifecycleKinds
{
public const string Accepted = "Accepted";
public const string Projected = "Projected";
public const string Sealed = "Sealed";
public const string EligibleComputed = "EligibleComputed";
public const string OnHoldApplied = "OnHoldApplied";
public const string OnHoldReleased = "OnHoldReleased";
public const string Purged = "Purged";
public const string Exported = "Exported";
}
Storage mapping (SQL)¶
LifecycleTransitions (append-only, per tenant × record)
CREATE TABLE dbo.LifecycleTransitions (
EventId CHAR(26) NOT NULL, -- ULID
TenantId NVARCHAR(128) NOT NULL,
AuditRecordId CHAR(26) NOT NULL,
Kind NVARCHAR(32) NOT NULL,
At DATETIME2(3) NOT NULL,
TraceId CHAR(32) NULL,
Producer NVARCHAR(64) NULL,
DataJson NVARCHAR(MAX) NULL,
CONSTRAINT PK_LifecycleTransitions PRIMARY KEY (EventId),
INDEX IX_Life_Tenant_Record_At (TenantId, AuditRecordId, At),
INDEX IX_Life_Tenant_Kind_At (TenantId, Kind, At DESC)
);
The lifecycle log outlives the authoritative row by policy (e.g., 2 years) and contains no raw PII beyond opaque ids.
Eligibility & TTLs¶
Evaluator inputs
- Record clocks (`createdAt`, `effectiveAt`), `action`, `resource.type`, `attributes`, the tenant policy (`RetentionPolicy`), and active holds.

Outputs
- An `eligibleAt` timestamp and its `basis` (rule id), recomputed whenever the policy or holds change.
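Below is one possible retention rule, shown only to make the evaluator's output concrete. It is consistent with the API example further down, where `createdAt` 2025-10-22T12:00:03Z yields `eligibleAt` 2026-10-22T00:00:00Z. The anchor (`effectiveAt ?? createdAt`), the truncation to the UTC day, and the day-based retention period are all assumptions, and `ComputeEligibleAt` is a hypothetical helper. The clamp enforces the documented rule `eligibleAt ≥ observedAt`.

```csharp
using System;

// Hypothetical rule: truncate the anchor clock to its UTC day, add the retention period,
// and never return a value earlier than observedAt.
static DateTimeOffset ComputeEligibleAt(DateTimeOffset createdAt, DateTimeOffset? effectiveAt,
                                        DateTimeOffset observedAt, int retentionDays)
{
    DateTimeOffset anchor = (effectiveAt ?? createdAt).ToUniversalTime();
    var dayStart = new DateTimeOffset(anchor.Year, anchor.Month, anchor.Day, 0, 0, 0, TimeSpan.Zero);
    DateTimeOffset eligibleAt = dayStart.AddDays(retentionDays);
    return eligibleAt >= observedAt ? eligibleAt : observedAt;   // eligibleAt ≥ observedAt
}

Console.WriteLine(ComputeEligibleAt(
    DateTimeOffset.Parse("2025-10-22T12:00:03.100Z"), null,
    DateTimeOffset.Parse("2025-10-22T12:00:03.300Z"), retentionDays: 365).ToString("o"));
// 2026-10-22T00:00:00.0000000+00:00
```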
Typical SLOs & TTLs
| Stage | SLO target | TTL / cadence |
|---|---|---|
| Projected | p95 < 60 s from `observedAt` | Continuous |
| Sealed | p95 < 10 min or block size threshold | Batching |
| EligibleComputed | Within 24 h of write or policy change | Daily evaluator sweep |
| Purged | Within 7 d of `now ≥ eligibleAt` (no holds) | Daily lifecycle |
| Lifecycle log retention | ≥ 2 y post-purge (configurable) | Separate policy |
When `includeIntegrity=false`, the path Accepted → Eligible remains valid; `Sealed` is optional.
Current state API (hint)¶
GET /tenants/{tenantId}/records/{auditRecordId}/lifecycle
Response shape:
{
"state": "Sealed",
"onHold": true,
"clocks": {
"createdAt": "2025-10-22T12:00:03.100Z",
"observedAt": "2025-10-22T12:00:03.300Z",
"sealedAt": "2025-10-22T12:06:00.000Z",
"eligibleAt": "2026-10-22T00:00:00.000Z",
"purgedAt": null
},
"transitions": [
{ "kind": "Accepted", "at": "2025-10-22T12:00:03.300Z" },
{ "kind": "Projected", "at": "2025-10-22T12:00:45.020Z", "data": { "projections": ["AuditEvents"] } },
{ "kind": "Sealed", "at": "2025-10-22T12:06:00.000Z", "data": { "blockId": "01JE7E0...", "segmentId": "01JE7DZ..." } },
{ "kind": "EligibleComputed", "at": "2025-10-23T00:05:00.000Z", "data": { "eligibleAt": "2026-10-22T00:00:00Z", "policyId": "ret-std", "policyRevision": 3 } },
{ "kind": "OnHoldApplied", "at": "2025-11-01T09:00:00.000Z", "data": { "holdId": "01JEAH..." } }
]
}
Transition sources (wiring)¶
- `Accepted`: write-path completion; emits the `AuditRecord.Accepted` event.
- `Projected`: projector completion; may be inferred from `Projection.Updated`.
- `Sealed`: from `Integrity.ProofComputed` (matched by `AuditRecordId` via segment membership).
- `EligibleComputed`: Retention Service evaluation.
- `OnHold*`: Legal Hold matcher.
- `Purged`: Lifecycle job after a successful delete.
- `Exported`: Export job on package completion.

All sources produce a `LifecycleTransition` row with `traceId` and `producer` set.
Edge cases & rules¶
- Late arrivals/backfills: state reconstruction relies on log order by `At`; idempotency on `(EventId)` makes replays safe.
- Hold toggling: `OnHoldApplied`/`OnHoldReleased` can bracket any state; `Eligible` does not lead to purge until no holds remain.
- Export pre-seal: allowed; the manifest may omit the integrity bundle. Later exports SHOULD include it once sealed.
- Purge vs. export: purge never deletes the lifecycle log or export manifests; it deletes only the authoritative payload and projections.
- Rebuilds: if the lifecycle log is lost, recompute it from the authoritative store, sidecars, and events; reseal times cannot be reconstructed, so keep them as null with a `basis="Rebuilt"` note.
Validation rules (summary)¶
- Lifecycle transitions must be append-only; block UPDATE/DELETE.
- `At` must be monotonic non-decreasing per `(TenantId, AuditRecordId, Kind)`; ties are allowed across kinds.
- `Purged` requires the record to be absent from the authoritative store.
- `EligibleComputed.eligibleAt` must be ≥ `observedAt` and derived from an existing policy revision.
- The lifecycle log must contain no raw PII beyond opaque identifiers.
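The append-only, idempotency, and monotonic-`At` rules can be sketched with an in-memory stand-in for `dbo.LifecycleTransitions`. This is illustrative only. `Append` and its result codes (`Appended`, `Duplicate`, `Conflict`, `Rejected`) are hypothetical. A replay with the same `EventId` and identical content is a no-op ("same key → same effect"), while the same `EventId` with different content is flagged.

```csharp
using System;
using System.Collections.Generic;

// In-memory stand-in for dbo.LifecycleTransitions (illustrative only).
var byEventId = new Dictionary<string, (string Tenant, string Record, string Kind, DateTimeOffset At)>();
var lastAt = new Dictionary<(string Tenant, string Record, string Kind), DateTimeOffset>();

// Hypothetical result codes: Appended, Duplicate (same key, same effect), Conflict, Rejected.
string Append(string eventId, string tenantId, string recordId, string kind, DateTimeOffset at)
{
    if (byEventId.TryGetValue(eventId, out var existing))
        return existing == (tenantId, recordId, kind, at) ? "Duplicate" : "Conflict";

    var key = (tenantId, recordId, kind);
    if (lastAt.TryGetValue(key, out var previous) && at < previous)
        return "Rejected";                  // would break non-decreasing At for this kind

    byEventId[eventId] = (tenantId, recordId, kind, at);
    lastAt[key] = at;
    return "Appended";                      // there is no update or delete path
}

Console.WriteLine(Append("01JE7K7G7B0Q3E5M7Z8X9V1C2D", "splootvets", "01JE7K4J9F9D0S6E7X5Q1A3BCP",
                         "Accepted", DateTimeOffset.Parse("2025-10-22T12:00:03.300Z"))); // Appended
```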
Performance & Size Budgets¶
Establishes size ceilings, throughput targets, compaction windows, and tiering to keep ingestion smooth, search snappy, and costs predictable.
JSON uses lowerCamel; C#/gRPC uses PascalCase; DB tables/columns use PascalCase. Sizes use binary units (KiB, MiB, GiB). Values are defaults—tenants/editions may override within safe bounds.
Quick budgets (at a glance)¶
| Area | Target / Ceiling | Notes |
|---|---|---|
| Authoritative write (`PayloadBytes`) | ≤ 256 KiB (hard) | Reject > 256 KiB at ingress. See Validation & Canonicalization. |
| AuditEvents row | ≤ 512 B typical | Derived, compact; no large blobs. |
| Search doc | ≤ 8 KiB | Redacted & compact fields only. |
| Projection lag (p95) | < 60 s | From ObservedAt → visible in projections. |
| Integrity seal lag (p95) | < 10 min | Block/segment thresholds or time window. |
| Per-tenant ingest | burst ≤ 2,000 rps; sustained ≤ 500 rps | Edition/tier gates; per-shard back-pressure. |
| Global ingest | plan for ≥ 50k rps | Scale shards linearly; see shard ring. |
| Export package target | 512 MiB raw per package | Balanced for network & resume safety. |
| Search shard size | 20–50 GiB after merge | Per index rollover/ILM. |
| Hot data window | 7–30 days | Fast storage, high refresh. |
| Warm window | 30–180 days | Cheaper storage, slower refresh. |
| Cold/Archive | 6–84 months | Object storage snapshots / parquet exports. |
Authoritative store budgets¶
Row shape & bytes
- `AuditRecordRow.PayloadJson` dominates size; keep envelopes lean.
- Ceilings (hard):
  - `PayloadBytes` ≤ 262,144 (256 KiB)
  - `Attributes` ≤ 64 pairs (key ≤ 64, value ≤ 256)
  - `Delta.fields` ≤ 256 entries; values ≤ 1,024 chars each (post-redaction)
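An ingress check for these ceilings can be sketched as follows. `CheckCeilings` is hypothetical, and only `payload.tooLarge` is an error code this document defines; the others are placeholders. `PayloadBytes` is measured as the UTF-8 byte length of the canonical JSON. Key and value limits are counted in UTF-16 characters here, which is an assumption since the unit is not specified.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical ingress check; only "payload.tooLarge" is a documented error code.
static List<string> CheckCeilings(string canonicalJson,
                                  IReadOnlyDictionary<string, string> attributes,
                                  IReadOnlyDictionary<string, (string? Before, string? After)> deltaFields)
{
    const int MaxPayloadBytes = 262_144;               // 256 KiB, hard
    const int MaxAttributes = 64, MaxKeyLength = 64, MaxValueLength = 256;
    const int MaxDeltaFields = 256, MaxDeltaValueLength = 1_024;

    var errors = new List<string>();
    if (Encoding.UTF8.GetByteCount(canonicalJson) > MaxPayloadBytes) errors.Add("payload.tooLarge");

    if (attributes.Count > MaxAttributes) errors.Add("attributes.tooMany");
    foreach (var (key, value) in attributes)
        if (key.Length > MaxKeyLength || value.Length > MaxValueLength) errors.Add($"attributes.tooLong:{key}");

    if (deltaFields.Count > MaxDeltaFields) errors.Add("delta.tooManyFields");
    foreach (var (field, change) in deltaFields)
        if ((change.Before?.Length ?? 0) > MaxDeltaValueLength || (change.After?.Length ?? 0) > MaxDeltaValueLength)
            errors.Add($"delta.valueTooLong:{field}");

    return errors;
}

Console.WriteLine(CheckCeilings("{}", new Dictionary<string, string>(),
                                new Dictionary<string, (string?, string?)>()).Count); // 0
```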
Indexes (minimal)
- `(TenantId, CreatedAt)` – primary scan path
- `(TenantId, CorrelationTraceId)` – OTel correlation
- `(TenantId, IdempotencyKey)` (filtered unique) – dedupe
Selectivity guidance
- Composite keys always start with `TenantId` for RLS pruning.
- Avoid additional secondary indexes unless filter selectivity is < 10% over tenant partitions.
Write amplification guardrails
- Max 2 secondary indexes on the authoritative table.
- Aim for < 1.5× WAL/redo amplification per insert.
Integrity service sizing¶
| Unit | Target | Trigger |
|---|---|---|
| Segment | ~64 KiB–8 MiB of leaves or 4k records, whichever first | Start a new segment when either threshold is hit. |
| Block | ≤ 1,000,000 records or ≤ 10 min window | Seal and sign; emit Integrity.ProofComputed. |
- Keep `SealedAt` jitter < 2 min to smooth proof availability.
- Segment and block sizes are tunable per shard to meet the p95 seal lag.
Projections (read models)¶
Tables
`AuditEvents` (primary), `ResourceEvents`, `ActorEvents`.
Storage budgets
- Row ≤ 512 B typical; avoid wide JSON blobs.
- `ChangedFields` ≤ 64 keys; each key ≤ 128 chars.
Indexes
- `(TenantId, CreatedAt DESC, AuditRecordId DESC)` – universal seek
- Per-resource and per-actor composites (see Read Models).
Checkpoint SLOs
- Per projection × tenant checkpoint advances at least every 5 s under load.
Search index budgets¶
- Doc size ≤ 8 KiB after analysis.
- Refresh interval: 5 s (hot), 60 s (warm).
- Primary shard size: 20–50 GiB post-merge.
- Rollover: 30 GiB or 7 d (first wins).
- ILM delete: ≤ authoritative retention for the tenant.
Selectivity hints
- Always include `tenantId` as a must clause or alias filter.
- Field cardinality:
  - `resourceType.kw` high (1e2–1e4) ✅
  - `action.kw` medium (1e1–1e3) ✅
  - `decisionOutcome` low (3–4) – use as a filter, not a sort.
Export sizing¶
- Package raw bytes target: 512 MiB.
- Compression: default Gzip → expect 3–6× reduction on JSONL.
- Concurrency: up to 8 packages in-flight per job per shard.
- Resume granularity: file offset checkpoints or package boundaries only.
Hot / Warm / Cold tiers¶
| Tier | What | Storage | Policy |
|---|---|---|---|
| Hot | Authoritative + projections for most-recent window | SSD / premium DB tier; search hot indices | Fast ingest; frequent compaction; refresh 5s |
| Warm | Older projections & search indices | General SSD; colder DB tier | Lower refresh; force-merge; fewer replicas |
| Cold | Historical snapshots | Object storage (WORM optional) | Parquet/JSONL exports; integrity bundles persisted |
Typical windows (suggested defaults)
- Hot: 0–30 d, Warm: 30–180 d, Cold: > 180 d (subject to tenant retention).
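Tier placement by record age can be sketched with the suggested default windows. In this minimal sketch, `TierFor` is hypothetical, the windows are treated as half-open intervals ([0, 30) d Hot, [30, 180) d Warm), and tenant retention is assumed to still cap how long any record is kept.

```csharp
using System;

// Hypothetical helper: Hot for ages in [0, hotDays), Warm in [hotDays, warmDays), Cold beyond.
static string TierFor(DateTimeOffset createdAt, DateTimeOffset nowUtc, int hotDays = 30, int warmDays = 180)
{
    TimeSpan age = nowUtc - createdAt;
    if (age < TimeSpan.FromDays(hotDays)) return "Hot";
    if (age < TimeSpan.FromDays(warmDays)) return "Warm";
    return "Cold";   // still subject to tenant retention; a record may be purged before reaching Cold
}

Console.WriteLine(TierFor(DateTimeOffset.Parse("2025-10-01T00:00:00Z"),
                          DateTimeOffset.Parse("2025-10-22T00:00:00Z"))); // Hot
```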
Compaction & maintenance windows¶
- Vacuum/Autovacuum (PG) / Index Rebuild (SQL Server): nightly maintenance window per shard (staggered).
- Projection compaction: weekly `CLUSTER`/`REINDEX` (PG) or `REORGANIZE` (SQL Server) when bloat > 20%.
- Search force-merge: in the warm phase, merge to 1 segment per shard once write-complete.
Throughput & concurrency targets¶
| Dimension | Target | Notes |
|---|---|---|
| Single shard ingest | ≥ 3k rps sustained | With 2 secondary indexes and WAL sync. |
| Latency (p95) | Ingest < 50 ms; Project < 60 s; Seal < 10 min | From ObservedAt. |
| Tenant burst | ≤ 2k rps for 60 s | Token bucket; smooth via queue. |
| Exporter | ≥ 200 MiB/s per shard (network bound) | Multi-part uploads. |
Sizing heuristic
- Records/day = `rps * 86,400`.
- Authoritative storage/day ≈ `avgPayloadBytes * records/day * replicationFactor`.
- Ensure shard disks stay < 70% full at monthly peak.
Back-pressure & throttling¶
Ingress token bucket (per tenant)
- Capacity = `burstRps * 60`.
- Fill rate = `sustainedRps`.
- When the bucket is empty, return HTTP 429 / `Problem+JSON` and include `retryAfter`.
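The per-tenant bucket can be sketched as a closure, using the capacity and fill rate defined above. `CreateTokenBucket` is a hypothetical helper. The caller passes the current time in explicitly, which keeps the behavior deterministic and testable. When the bucket is empty, the returned `RetryAfter` is the time until one token refills, which is the value to put in the 429 response.

```csharp
using System;

// Hypothetical helper: per-tenant token bucket with capacity = burstRps * 60, refilled at sustainedRps.
static Func<DateTimeOffset, (bool Allowed, TimeSpan RetryAfter)> CreateTokenBucket(
    double burstRps, double sustainedRps, DateTimeOffset startUtc)
{
    double capacity = burstRps * 60;
    double tokens = capacity;           // start full
    DateTimeOffset last = startUtc;

    return nowUtc =>
    {
        // Refill for the elapsed time since the last call, capped at capacity.
        double elapsed = Math.Max(0, (nowUtc - last).TotalSeconds);
        tokens = Math.Min(capacity, tokens + elapsed * sustainedRps);
        if (nowUtc > last) last = nowUtc;

        if (tokens >= 1)
        {
            tokens -= 1;
            return (true, TimeSpan.Zero);
        }
        // Empty: answer 429 (Problem+JSON) with this retryAfter.
        return (false, TimeSpan.FromSeconds((1 - tokens) / sustainedRps));
    };
}

var bucket = CreateTokenBucket(burstRps: 2_000, sustainedRps: 500, startUtc: DateTimeOffset.UtcNow);
Console.WriteLine(bucket(DateTimeOffset.UtcNow).Allowed); // True
```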
System-wide pressure signals (any trips → slow producers)
- WAL/transaction log > 80% of burst buffer.
- DB queue depth p95 > 200 ms for 1 min.
- CPU > 75% for 5 min on ingest nodes.
- Disk IO latency > 20 ms p95.
- Search indexing backlog > 15 min behind.
Shedding order
- Defer/slow export workers.
- Reduce search refresh rate (hot from 5s → 30s).
- Throttle tenant bursts (429).
- Pause non-critical projectors (actor/resource timelines) before `AuditEvents`.
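The trip check and shedding order can be sketched as a single decision function. `SheddingPlan` and its input shape are hypothetical. The sketch assumes each input is already aggregated over the window named in the signal list (for example, CPU averaged over 5 min, DB queue p95 over 1 min). It returns the full ordered list of actions, and applying them one step at a time is left to the caller.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical controller: if any pressure signal trips, return the shedding actions in order.
static IReadOnlyList<string> SheddingPlan(double walPercentOfBurstBuffer, double dbQueueP95Ms,
                                          double ingestCpuPercent, double diskIoP95Ms, double searchBacklogMinutes)
{
    bool tripped = walPercentOfBurstBuffer > 80
                || dbQueueP95Ms > 200
                || ingestCpuPercent > 75
                || diskIoP95Ms > 20
                || searchBacklogMinutes > 15;
    if (!tripped) return Array.Empty<string>();

    // Same order as the shedding list above.
    return new[]
    {
        "Defer/slow export workers",
        "Reduce hot search refresh (5s -> 30s)",
        "Throttle tenant bursts (429)",
        "Pause non-critical projectors before AuditEvents",
    };
}

Console.WriteLine(SheddingPlan(85, 50, 40, 5, 1).Count); // 4
```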
Example capacity plan (rule-of-thumb)¶
Assume avgPayload = 2 KiB, global 20k rps, replication x3:
- Authoritative/day ≈ 2 KiB × 20k × 86,400 × 3 ≈ 10.6 × 10¹² bytes ≈ 9.7 TiB/day.
- With a 30-day hot window → ~290 TiB hot tier (pre-compaction).
- Search docs (~1.2 KiB/doc) × 20k rps → ~1.9 TiB/day per copy (~5.8 TiB/day at ×3) before merges; warm merge reduces this by ~30–40%.

Shards: target ≤ 8 TiB hot data/shard → need ~37 shards for the hot tier. Scale projectors and integrity workers per shard.
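The sizing heuristic can be checked in code by reproducing the rule-of-thumb plan. This is a sketch: `DailyVolume` is a hypothetical helper, the inputs are the plan's stated assumptions (2 KiB payload, 20k rps, ×3 replication, 30-day hot window, ≤ 8 TiB hot data per shard), and sizes use binary units.

```csharp
using System;

const double TiB = 1024.0 * 1024 * 1024 * 1024;

// Hypothetical helper: records/day = rps * 86,400; bytes/day = avgPayloadBytes * records/day * replicationFactor.
static (long RecordsPerDay, long BytesPerDay) DailyVolume(long avgPayloadBytes, long rps, int replicationFactor)
{
    long recordsPerDay = rps * 86_400;
    return (recordsPerDay, avgPayloadBytes * recordsPerDay * replicationFactor);
}

var (records, bytes) = DailyVolume(avgPayloadBytes: 2 * 1024, rps: 20_000, replicationFactor: 3);
double tibPerDay = bytes / TiB;               // ≈ 9.66 TiB/day
double hotTib = tibPerDay * 30;              // ≈ 290 TiB for a 30-day hot window
int shards = (int)Math.Ceiling(hotTib / 8);  // 37 shards at ≤ 8 TiB each
Console.WriteLine($"{records:N0} records/day, {tibPerDay:F2} TiB/day, {hotTib:F0} TiB hot, {shards} shards");
```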
Monitoring & alerts (key SLOs)¶
- Ingest: p95 < 50 ms; error rate < 0.1%.
- Projection lag: p95 < 60 s; 99.9% < 5 min.
- Seal lag: p95 < 10 min.
- Search freshness: hot alias max age < 2 min.
- Export: time-to-first-package < 2 min; throughput ≥ 100 MiB/s/job.
Alert when:
- Any SLO violated for 5 consecutive minutes.
- Disk utilization > 80% or predicted > 90% within 7 days.
Validation rules (summary)¶
- Reject writes over 256 KiB; surface `payload.tooLarge`.
- Enforce index minimalism on the authoritative store (≤ 2 secondaries).
- Search docs over 8 KiB are dropped with a reprocessor retry (after additional redaction).
- Integrity uses configured segment/block thresholds; seal window must keep p95 < 10 min.
- Export packages respect 512 MiB target and resume tokens; per-tenant concurrency caps apply.
- Back-pressure must prefer fairness: no single tenant can starve others.
Fixtures, Samples & Test Data¶
Provides golden artifacts and repeatable generators for developers, CI, and integration partners. Artifacts cover authoritative writes, projections, search docs, events, and exports—validated against the Schema Registry and produced with deterministic seeds.
JSON uses lowerCamel; C#/gRPC uses PascalCase; tables/columns use PascalCase. Payloads follow canonical JSON (JCS/RFC 8785) where noted.
Principles¶
- Deterministic: identical inputs → identical outputs (fixed RNG seed, fixed clock anchors).
- Tenant-scoped: fixtures always set `tenantId`; RLS-safe.
- Redaction-aware: no raw secrets; Personal/Sensitive fields are masked, hashed, or tokenized per policy.
- Minimal & real-ish: small enough to grok; realistic enough to catch edge cases (delta, holds, idempotency).
- Cross-form parity: JSONL ↔ projections ↔ events ↔ search docs ↔ exports represent the same facts.
Directory layout¶
/fixtures
  /schemas
    json/…      # JSON Schemas (registry-resolved copies)
    avro/…      # Avro equivalents (subset)
    proto/…     # Optional .proto (published language)
  /authoritative
    minimal-10.jsonl
    delta-redaction-50.jsonl
    hotday-1k.jsonl
    idempotency-dupes-5.jsonl
  /projections
    audit-events-*.csv
    resource-events-*.csv
  /search
    docs-*.jsonl
  /events
    appended-*.jsonl
    accepted-*.jsonl
  /sql
    postgres-seed.sql
    sqlserver-seed.sql
  /exports
    manifest-sample.json
    package-README.md
File naming: *-v{schemaVersion}-{yyyymmdd}.(json|jsonl|csv|sql) when applicable.
Authoritative JSONL (golden)¶
/fixtures/authoritative/minimal-10.jsonl
{"tenantId":"splootvets","schemaVersion":"auditrecord.v1","auditRecordId":"01JE7K4J9F9D0S6E7X5Q1A3BCP","createdAt":"2025-10-22T12:00:03.100Z","observedAt":"2025-10-22T12:00:03.300Z","action":"user.create","resource":{"type":"Iam.User","id":"U-1001"},"actor":{"id":"svc_gw","type":"Service","display":"ingress-gw"},"attributes":{"client.ip":"203.0.113.42","client.userAgent":"Mozilla/5.0"}}
{"tenantId":"splootvets","schemaVersion":"auditrecord.v1","auditRecordId":"01JE7K4JAMJQ2B1NE8V3V7Y5ND","createdAt":"2025-10-22T12:01:00.000Z","observedAt":"2025-10-22T12:01:00.120Z","action":"appointment.update","resource":{"type":"Vetspire.Appointment","id":"A-9981","path":"/status"},"actor":{"id":"user_123","type":"User","display":"A. Smith"},"decision":{"outcome":"Allow"},"delta":{"fields":{"status":{"before":"Pending","after":"Booked"}}},"attributes":{"email":"sha256:2c26b46b68ffc68ff99b453c1d304134","client.ip":"2001:db8::1"}}
/fixtures/authoritative/delta-redaction-50.jsonl
Contains variations:
- credential-like keys (dropped at write, `redactionHint=Dropped`)
- large strings (truncated), base64 caps, path normalization cases
- correlation with `traceId`, `requestId`, causation chains
/fixtures/authoritative/idempotency-dupes-5.jsonl
Same record repeated with idempotencyKey to validate dedupe behavior.
Each JSONL line validates against `urn:connectsoft:schemas/domain/auditrecord.v1.json`. Timestamps use ms precision.
Projections (derived CSV)¶
/fixtures/projections/audit-events-minimal-10.csv
TenantId,AuditRecordId,CreatedAt,ObservedAt,Action,ResourceType,ResourceId,ActorId,ActorType,DecisionOutcome,ChangedFields,DataClassFlags,CorrelationTraceId,PayloadBytes
splootvets,01JE7K4J9F9D0S6E7X5Q1A3BCP,2025-10-22T12:00:03.100Z,2025-10-22T12:00:03.300Z,user.create,Iam.User,U-1001,svc_gw,Service,,[],2,,512
splootvets,01JE7K4JAMJQ2B1NE8V3V7Y5ND,2025-10-22T12:01:00.000Z,2025-10-22T12:01:00.120Z,appointment.update,Vetspire.Appointment,A-9981,user_123,User,Allow,["status"],12,,1536
/fixtures/projections/resource-events-*.csv and actor-events-*.csv include monotonic Seq per key.
Search docs (redacted JSONL)¶
/fixtures/search/docs-minimal-10.jsonl
{"tenantId":"splootvets","auditRecordId":"01JE7K4J9F9D0S6E7X5Q1A3BCP","createdAt":"2025-10-22T12:00:03.100Z","observedAt":"2025-10-22T12:00:03.300Z","action":"user.create","resourceType":"Iam.User","resourceId":"U-1001","actorId":"svc_gw","actorType":"Service","changedFields":[],"dataClassFlags":2,"searchText":"user.create Iam.User U-1001 svc_gw"}
{"tenantId":"splootvets","auditRecordId":"01JE7K4JAMJQ2B1NE8V3V7Y5ND","createdAt":"2025-10-22T12:01:00.000Z","observedAt":"2025-10-22T12:01:00.120Z","action":"appointment.update","resourceType":"Vetspire.Appointment","resourceId":"A-9981","actorId":"user_123","actorType":"User","decisionOutcome":"Allow","changedFields":["status"],"dataClassFlags":12,"searchText":"appointment.update Vetspire.Appointment A-9981 user_123 status Booked"}
Event streams (enveloped JSONL)¶
/fixtures/events/appended-minimal-10.jsonl (one per authoritative line)
{"eventId":"01JE7K7G7B0Q3E5M7Z8X9V1C2D","eventType":"connectsoft.audit.v1/AuditRecord.Appended","tenantId":"splootvets","publishedAt":"2025-10-22T12:00:03.350Z","traceId":"3e1f2d0c9b8a7f6e5d4c3b2a19081716","schemaVersion":"event-envelope.v1","producer":"ingress-gw/2.4.1","data":{"auditRecordId":"01JE7K4J9F9D0S6E7X5Q1A3BCP","createdAt":"2025-10-22T12:00:03.100Z","observedAt":"2025-10-22T12:00:03.300Z","action":"user.create","resourceType":"Iam.User","resourceId":"U-1001","actorId":"svc_gw","actorType":"Service","hasDelta":false,"dataClassFlags":2,"payloadBytes":512}}
/fixtures/events/accepted-minimal-10.jsonl mirrors Accepted acks with status.
SQL seeds¶
/fixtures/sql/postgres-seed.sql
-- RLS/session context assumed set earlier (see Tenancy)
INSERT INTO "AuditRecords"
("AuditRecordId","TenantId","CreatedAt","ObservedAt","EffectiveAt","Action","ResourceType","ResourceId","ResourcePath","ActorId","ActorType","CorrelationTraceId","CorrelationRequestId","DecisionOutcome","IdempotencyKey","SchemaVersion","PayloadJson","PayloadBytes")
VALUES
('01JE7K4J9F9D0S6E7X5Q1A3BCP','splootvets','2025-10-22T12:00:03.100Z','2025-10-22T12:00:03.300Z',NULL,'user.create','Iam.User','U-1001',NULL,'svc_gw',1,'3e1f2d0c9b8a7f6e5d4c3b2a19081716',NULL,NULL,NULL,1,'{"tenantId":"splootvets","schemaVersion":"auditrecord.v1","auditRecordId":"01JE7K4J9F9D0S6E7X5Q1A3BCP","createdAt":"2025-10-22T12:00:03.100Z","observedAt":"2025-10-22T12:00:03.300Z","action":"user.create","resource":{"type":"Iam.User","id":"U-1001"},"actor":{"id":"svc_gw","type":"Service","display":"ingress-gw"}}',512);
INSERT INTO "AuditEvents"
("TenantId","AuditRecordId","CreatedAt","ObservedAt","Action","ResourceType","ResourceId","ActorId","ActorType","DecisionOutcome","ChangedFields","DataClassFlags","CorrelationTraceId","PayloadBytes")
VALUES
('splootvets','01JE7K4J9F9D0S6E7X5Q1A3BCP','2025-10-22T12:00:03.100Z','2025-10-22T12:00:03.300Z','user.create','Iam.User','U-1001','svc_gw',1,NULL,'[]',2,'3e1f2d0c9b8a7f6e5d4c3b2a19081716',512);
/fixtures/sql/sqlserver-seed.sql provides equivalent INSERT statements with NVARCHAR types.
Schemas (JSON & Avro)¶
/fixtures/schemas/json/auditrecord.v1.json → registry-resolved copy (read-only).
/fixtures/schemas/avro/auditrecord.v1.avsc (subset for export pipelines):
{
"type":"record","name":"AuditRecord","namespace":"connectsoft.domain.v1",
"fields":[
{"name":"tenantId","type":"string"},
{"name":"auditRecordId","type":"string"},
{"name":"schemaVersion","type":"string"},
{"name":"createdAt","type":"string"},
{"name":"observedAt","type":"string"},
{"name":"action","type":"string"},
{"name":"resource","type":{"type":"record","name":"Resource","fields":[
{"name":"type","type":"string"},
{"name":"id","type":"string"},
{"name":"path","type":["null","string"],"default":null}
]}},
{"name":"actor","type":{"type":"record","name":"Actor","fields":[
{"name":"id","type":"string"},
{"name":"type","type":"string"},
{"name":"display","type":["null","string"],"default":null}
]}},
{"name":"decision","type":["null",{"type":"record","name":"Decision","fields":[
{"name":"outcome","type":"string"},
{"name":"reason","type":["null","string"],"default":null}
]}],"default":null}
]
}
Export manifest sample¶
/fixtures/exports/manifest-sample.json
{
"jobId":"01JE7M2F2N7QW8E9R0T1Y2U3I4",
"tenantId":"splootvets",
"createdAt":"2025-10-22T12:10:00Z",
"format":"Jsonl",
"packages":[
{"packageId":"01JE7M2PAK0001","uri":"s3://bucket/us-central/splootvets/exports/2025/10/22/job-01JE7M2/part-0001.jsonl.gz","recordCount":500000,"sha256":"…"},
{"packageId":"01JE7M2PAK0002","uri":"s3://bucket/us-central/splootvets/exports/2025/10/22/job-01JE7M2/part-0002.jsonl.gz","recordCount":120345,"sha256":"…"}
],
"integrity":{"blockIds":["01JE7E0B5V2C6M9N3X7Z4K2J8L"],"signature":{"scheme":"Ed25519","signingKeyId":"kv:prod/atp-integrity/ed25519-2025-01"}}
}
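Producing one package entry for this manifest can be sketched with the standard .NET gzip and SHA-256 APIs. `WritePackage` is hypothetical. The sketch makes three assumptions: the manifest's `sha256` covers the compressed package bytes as stored, input lines are already-redacted canonical JSON, and `packageId`/`uri` assignment happens elsewhere. Lines are written with `\n` so the output does not depend on the host OS.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper: gzip a JSONL package and return what a manifest entry needs.
static (byte[] Package, long RecordCount, string Sha256Hex) WritePackage(IEnumerable<string> jsonLines)
{
    using var buffer = new MemoryStream();
    long count = 0;
    using (var gzip = new GZipStream(buffer, CompressionLevel.Optimal, leaveOpen: true))
    using (var writer = new StreamWriter(gzip, new UTF8Encoding(false)) { NewLine = "\n" })
    {
        foreach (var line in jsonLines)
        {
            writer.WriteLine(line);
            count++;
        }
    } // disposing flushes the writer and completes the gzip stream

    byte[] package = buffer.ToArray();
    string sha = Convert.ToHexString(SHA256.HashData(package)).ToLowerInvariant();
    return (package, count, sha);
}

Console.WriteLine(WritePackage(new[] { "{\"auditRecordId\":\"01JE7K4J9F9D0S6E7X5Q1A3BCP\"}" }).RecordCount); // 1
```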
Generators (C#, deterministic)¶
/fixtures/Generators.cs (excerpt)
using System;
using System.Collections.Generic;
using System.Runtime.Serialization;

[DataContract] public sealed class AuditRecord { /* domain contract (PascalCase) */ }

public static class FixtureGen
{
    // Deterministic: same tenantId/anchor/count/seed -> same records, including ULIDs.
    public static IEnumerable<AuditRecord> Minimal(string tenantId, DateTimeOffset anchorUtc, int count, int seed = 1337)
    {
        var rnd = new Random(seed);
        for (var i = 0; i < count; i++)
        {
            var createdAt = anchorUtc.AddMilliseconds(i);

            // Derive the ULID's random part from the seeded RNG; NewUlid(timestamp) alone draws
            // fresh randomness and breaks determinism. Assumes the Cysharp Ulid package, whose
            // NewUlid(DateTimeOffset, ReadOnlySpan<byte>) overload takes 10 random bytes.
            var randomness = new byte[10];
            rnd.NextBytes(randomness);
            var id = Ulid.NewUlid(createdAt, randomness);

            var (verb, rtype, rid) = rnd.Next(2) == 0
                ? ("user.create", "Iam.User", $"U-{1000 + i}")
                : ("appointment.update", "Vetspire.Appointment", $"A-{9000 + i}");
            var isService = i % 3 == 0;

            yield return new AuditRecord
            {
                TenantId = tenantId,
                SchemaVersion = "auditrecord.v1",
                AuditRecordId = id.ToString(),
                CreatedAt = createdAt,
                ObservedAt = createdAt.AddMilliseconds(200),
                Action = verb,
                Resource = new() { Type = rtype, Id = rid },
                Actor = isService
                    ? new() { Id = "svc_gw", Type = "Service", Display = "ingress-gw" }
                    : new() { Id = "user_123", Type = "User", Display = "A. Smith" },
                Decision = i % 2 == 0 ? new() { Outcome = "Allow" } : null
            };
        }
    }
}
Emit JSONL canonically
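using System;
using System.IO;
using System.Text;
using System.Text.Json;
using System.Text.Json.Serialization;
var options = new JsonSerializerOptions
{
    PropertyNamingPolicy = JsonNamingPolicy.CamelCase,
    DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull, // omit absent optionals such as decision
};
// System.Text.Json writes DateTimeOffset as "+00:00"; the fixtures expect ms precision with "Z", so add a converter.
await using var w = new StreamWriter("authoritative/minimal-10.jsonl", false, new UTF8Encoding(false)) { NewLine = "\n" };
foreach (var r in FixtureGen.Minimal("splootvets", DateTimeOffset.Parse("2025-10-22T12:00:00Z"), 10))
    await w.WriteLineAsync(JsonSerializer.Serialize(r, options));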
Compute `PayloadBytes` as the UTF-8 byte length of the canonical JSON. Integrity hashing uses the JCS form of that JSON.
Validation harness¶
- JSON Schema: validate all JSON/JSONL via the registry copies in `/fixtures/schemas/json`.
- Cross-shape parity:
  - For each authoritative line, look up the same `auditRecordId` in projections, search docs, and events.
  - Assert invariants: `createdAt ≤ observedAt`, classification flags, `changedFields` derivation.
- RLS sanity: run seeds through both the Postgres and SQL Server scripts with tenant context set; `SELECT COUNT(*)` per tenant must match line counts.
- Redaction: verify that no `Credential` keys/values appear in any artifact; verify IP masks in read profiles.
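Two of these harness checks can be sketched with `System.Text.Json`. The first asserts parity and the `createdAt ≤ observedAt` invariant for each authoritative line. The second flags credential-like keys that still carry a raw string value, so a dropped `{ "present": true }` indicator passes. `CheckFixtures` is hypothetical. Passing in the set of derived ids, and reusing the Credential key heuristic from the classification table, are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;
using System.Text.RegularExpressions;

// Hypothetical harness helper: parity + clock invariant + raw-credential scan.
static List<string> CheckFixtures(IEnumerable<string> authoritativeJsonl, ISet<string> derivedIds)
{
    var problems = new List<string>();
    var credentialKey = new Regex(@"(?i)\b(password|secret|api[_-]?key|token|credential|bearer)\b");

    void ScanKeys(JsonElement element, string id)
    {
        if (element.ValueKind == JsonValueKind.Object)
        {
            foreach (var p in element.EnumerateObject())
            {
                if (credentialKey.IsMatch(p.Name) && p.Value.ValueKind == JsonValueKind.String)
                    problems.Add($"{id}: raw value under credential key '{p.Name}'");
                ScanKeys(p.Value, id);
            }
        }
        else if (element.ValueKind == JsonValueKind.Array)
        {
            foreach (var item in element.EnumerateArray()) ScanKeys(item, id);
        }
    }

    foreach (var line in authoritativeJsonl.Where(l => !string.IsNullOrWhiteSpace(l)))
    {
        using var doc = JsonDocument.Parse(line);
        var root = doc.RootElement;
        string id = root.GetProperty("auditRecordId").GetString()!;
        if (root.GetProperty("createdAt").GetDateTimeOffset() > root.GetProperty("observedAt").GetDateTimeOffset())
            problems.Add($"{id}: createdAt > observedAt");
        if (!derivedIds.Contains(id)) problems.Add($"{id}: missing from derived artifacts");
        ScanKeys(root, id);
    }
    return problems;
}

Console.WriteLine(CheckFixtures(Array.Empty<string>(), new HashSet<string>()).Count); // 0
```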
Test matrix (scenarios)¶
| ID | Case | Purpose |
|---|---|---|
| T01 | Minimal create/read | Happy path; schema/clock sanity |
| T02 | Update with delta | ChangedFields extraction; search doc text |
| T03 | Idempotent retry | Accepted(Duplicate) emission; unique key |
| T04 | Redaction at write | Credential drop; email hash/mask |
| T05 | IPv6 + UA | Canonicalization; truncation |
| T06 | Legal hold overlay | Lifecycle blocks purge |
| T07 | No-seal tenant | Eligible path without integrity sidecar |
| T08 | Large payload near 256 KiB | Back-pressure/error surfacing |
| T09 | Residency pin | Export target path prefixes contain region/tenant |
| T10 | Policy change | Policy.Changed effect on eligibility |
Notes¶
- Golden fixtures are immutable—add new files for new cases/versions; never rewrite in place.
- Keep a small README in each folder explaining origin, schema version, and generation seed.
- For CI, treat `/fixtures/**` changes as contract-affecting; require approval to merge.