Data Model - Audit Trail Platform (ATP)

This document defines the canonical data model for the Audit Trail Platform (ATP). It is the contract for how audit facts are written, stored, projected, searched, exported, and governed across tenants.


Purpose

  • Provide a single source of truth for entities, value objects, and wire contracts used by ATP (write-path facts, read models, policies, proofs, exports).
  • Enable safe evolution (versioning and backward compatibility) across microservices, storage engines, and client SDKs.
  • Align teams on ubiquitous language (UL) used in ATP and referenced by the HLD and Context Map.

Scope

This document covers:

  • Canonical write model (AuditRecord) and its aggregates (Actor, ResourceRef, Correlation, Decision).
  • Policy and governance models (Classification, Redaction, Retention, Legal Hold).
  • Integrity structures (proof references, segments, Merkle roots) and export manifests.
  • Tenancy, partitioning, projections, and optional search index mappings.
  • Event contracts, validation and limits, schema evolution strategy.
  • Golden fixtures and conformance assets for CI.

Non-Goals

  • Runtime behavior of services (covered in HLD and component docs).
  • API endpoint routing, authz middleware, or deployment topologies.
  • Vendor-specific storage tuning beyond documented size budgets.
  • Business analytics models unrelated to audit semantics.

Modeling Principles & Conventions

This section defines canonical rules for names, time, identifiers, types, enums, nullability, casing, and cross-representation alignment (JSON, C# gRPC code-first, storage).


Conventions (Cheat-Sheet)

| Topic | Rule (MUST unless noted) | Rationale | Example |
|---|---|---|---|
| Character encoding | UTF-8 everywhere | Interop, signatures | |
| Time | RFC3339/ISO-8601 in UTC with Z suffix; accept offsets at ingress but normalize to UTC | Consistency, ordering | 2025-10-22T14:05:13.481Z |
| Clock skew | Accept up to ±5 min skew; record both producer and gateway times when relevant | Resilience | createdAt vs observedAt |
| IDs (general) | Opaque strings; ASCII [A-Za-z0-9._-]; length ≤ 128 | Safe in logs/URLs | exp_01HFJ0..., user.42 |
| Record IDs | ULID (26 chars, Crockford base32) for append-ordered facts | K-sort by time; unique | 01JD0R3R9E7Z4XTKP5B7XH3FXF |
| Correlation | traceId (W3C 16-byte hex), spanId (8-byte), requestId (string), causationId (ULID) | Debuggability | traceId="4fd5…" |
| Idempotency | idempotencyKey (≤128 chars) on write-path | Safe retries | POST /records with key |
| Booleans | Never tri-state in JSON; nullable only if “unknown” is distinct | Avoid ambiguity | isRedacted: true or false |
| Numbers | 64-bit integers; decimals as strings with regex if precision matters | Avoid JS rounding | "0.0001", "123.45" |
| Maps | String keys; typed values; ≤ 100 entries unless stated | Bound payloads | { "tags": { "env":"prod" } } |
| Arrays | Deterministic order if order matters; empty arrays allowed; omit when unknown | Predictability | events: [] |
| Nullability | Prefer omit over null; if null, define semantics | Diff friendliness | (see below) |
| Casing (JSON) | lowerCamelCase field names; kebab-case schema filenames | Ecosystem norms | auditRecordId, audit-record.v1.json |
| Casing (C# gRPC code-first & Protobuf) | PascalCase for classes, properties, methods, enum types & values, and protobuf field names (when emitted) | .NET idioms; code-first parity | AuditRecord.CreatedAt, AppendAsync(...) |
| Database schema | PascalCase for table names (plural) and column names | Readability; tooling parity | Table AuditRecords, column CreatedAt |
| Resource types | PascalCase singular nouns; namespaced where needed | Clarity | Patient, Vetspire.Appointment |
| Enum safety | Include Unknown = 0; never reuse numbers; reserve removed values | Wire compat | (see Enums) |
| Problem+JSON | RFC 9457 application/problem+json for API errors; include traceId | Operability | (see Errors) |
| Extensibility | Additive evolution; unknown fields ignored; never break required invariants | Compatibility | C19 rules apply |

Concrete size/count limits are finalized in C20 (Validation, Limits & Canonicalization).


Time & Clocks

  • Ingress: Accept RFC3339 timestamps with offsets and normalize to UTC on write.
  • Canonical fields:
    • createdAt — producer-asserted (UTC)
    • observedAt — gateway receipt (UTC)
    • effectiveAt — policy/application effective (UTC)
  • Skew: Allow ±5 minutes between createdAt and observedAt; records with a larger delta are annotated in validation.warnings or routed to a suspect queue.

Example (JSON)

{
  "auditRecordId": "01JD0R3R9E7Z4XTKP5B7XH3FXF",
  "createdAt": "2025-10-22T14:05:13.481Z",
  "observedAt": "2025-10-22T14:05:14.022Z"
}
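
The ingress rules above (accept offsets, normalize to UTC, flag skew beyond ±5 minutes) can be sketched as follows. This is a minimal illustration, not the gateway's actual code: the function names parse_rfc3339, normalize_utc, and exceeds_skew are hypothetical, and only the standard library is used.

```python
from datetime import datetime, timedelta, timezone

MAX_SKEW = timedelta(minutes=5)  # the ±5 minute allowance from the rules above


def parse_rfc3339(value: str) -> datetime:
    """Parse an RFC3339 timestamp with an offset or Z suffix into an aware UTC datetime."""
    # Older Python versions reject a trailing "Z" in fromisoformat,
    # so rewrite it as an explicit +00:00 offset first.
    if value.endswith(("Z", "z")):
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        raise ValueError("timestamp must carry an offset or a Z suffix")
    return parsed.astimezone(timezone.utc)


def normalize_utc(value: str) -> str:
    """Render any accepted timestamp in the canonical form: UTC, milliseconds, Z suffix."""
    utc = parse_rfc3339(value)
    return utc.isoformat(timespec="milliseconds").replace("+00:00", "Z")


def exceeds_skew(created_at: str, observed_at: str) -> bool:
    """True when producer and gateway times differ by more than the allowed skew."""
    delta = abs(parse_rfc3339(observed_at) - parse_rfc3339(created_at))
    return delta > MAX_SKEW


print(normalize_utc("2025-10-22T16:05:13.481+02:00"))
```

A record for which exceeds_skew returns True would be the one annotated or routed to the suspect queue; what happens next is left to the validation rules in C20.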


Identifiers

  • ULID for append facts: auditRecordId MUST be a ULID to preserve time-ordered writes.
  • Opaque everywhere else: IDs are never parsed for meaning; semantics live in adjacent fields.
  • Idempotency: Producers SHOULD send idempotencyKey; gateways reject duplicates within a configured window (C20).

Validation

  • Regex (IDs): ^[A-Za-z0-9._-]{1,128}$
  • Regex (ULID): ^[0-9A-HJKMNP-TV-Z]{26}$
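
To make the two patterns concrete, the sketch below validates IDs and builds a ULID by hand (48-bit millisecond timestamp followed by 80 random bits, Crockford base32). The helper names new_ulid, is_ulid, and is_opaque_id are hypothetical; a real producer would more likely use a maintained ULID library, and this sketch does not handle monotonic ordering within the same millisecond.

```python
import os
import re
import time
from typing import Optional

CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"  # no I, L, O, U
ULID_RE = re.compile(r"[0-9A-HJKMNP-TV-Z]{26}")
ID_RE = re.compile(r"[A-Za-z0-9._-]{1,128}")


def new_ulid(timestamp_ms: Optional[int] = None) -> str:
    """Build a ULID: 48-bit timestamp (ms) in the high bits, 80 random bits below."""
    if timestamp_ms is None:
        timestamp_ms = time.time_ns() // 1_000_000
    value = (timestamp_ms << 80) | int.from_bytes(os.urandom(10), "big")
    chars = []
    for _ in range(26):  # 26 symbols x 5 bits covers the 128-bit value
        chars.append(CROCKFORD[value & 31])
        value >>= 5
    return "".join(reversed(chars))


def is_ulid(value: str) -> bool:
    return ULID_RE.fullmatch(value) is not None


def is_opaque_id(value: str) -> bool:
    return ID_RE.fullmatch(value) is not None


earlier, later = new_ulid(1_000), new_ulid(2_000)
print(earlier < later)  # ULIDs sort lexicographically by timestamp
```

Because the timestamp occupies the leading characters, plain string comparison orders records by creation time, which is what the append-ordered requirement relies on.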

JSON ↔ C# gRPC Code-First ↔ Protobuf

| Concept | JSON | C# (code-first) | Protobuf (if emitted) | Notes |
|---|---|---|---|---|
| Timestamp | RFC3339 string (UTC) | DateTimeOffset/Timestamp wrapper | google.protobuf.Timestamp | REST renders RFC3339; gRPC binary uses Timestamp |
| 64-bit int | JSON number | long / ulong | int64 / uint64 | JS clients may stringify |
| Decimal (precise) | string with pattern ^-?[0-9]+(\\.[0-9]+)?$ | string | string | Avoid precision loss |
| Map | JSON object | Dictionary<string,T> | map<string, T> | String keys only |
| Optional | Omit field | nullable refs string? / T? | optional / presence | Prefer omit in JSON |
| Oneof | Presence discriminator | C# OneOf pattern / union type | oneof | Mutually exclusive |
| Enum | JSON string literal (REST) | enum DecisionOutcome { Unknown=0, ... } | enum (0=Unknown) | REST SHOULD expose strings |
| Naming | lowerCamelCase | PascalCase | PascalCase (fields too) + json_name lowerCamelCase | Keep REST/JSON consistent |

C# code-first example (message + service)

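// Assumes protobuf-net.Grpc for code-first gRPC; AppendResult is defined elsewhere.
using System;
using System.Runtime.Serialization;
using System.ServiceModel;
using System.Threading.Tasks;
using ProtoBuf.Grpc;

[DataContract]
public sealed class AuditRecord
{
    // Order values are wire field numbers; keep them aligned with the emitted .proto.
    [DataMember(Order = 1)] public string AuditRecordId { get; init; } = default!;
    [DataMember(Order = 2)] public string TenantId { get; init; } = default!;
    [DataMember(Order = 3)] public DateTimeOffset CreatedAt { get; init; }
}

[ServiceContract]
public interface IAuditIngestionService
{
    Task<AppendResult> AppendAsync(AuditRecord record, CallContext context = default);
}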

If emitting .proto from code-first: keep PascalCase names, but set json_name to lowerCamelCase to align REST:

message AuditRecord {
  string AuditRecordId = 1 [json_name = "auditRecordId"];
  string TenantId      = 2 [json_name = "tenantId"];
  google.protobuf.Timestamp CreatedAt = 3 [json_name = "createdAt"];
}

JSON naming policy: REST services MUST apply a lowerCamelCase policy (e.g., System.Text.Json JsonNamingPolicy.CamelCase) so C# CreatedAt → JSON createdAt.


Nullability & Field Presence

  • Prefer omit in JSON for unknown/not applicable.
  • Use null only when meaningful (e.g., redacted): schema MUST define semantics.
  • Booleans: avoid tri-state; use an enum (Yes|No|Unknown) when needed.
  • C#: enable nullable ref types; mark optional as string?, T?.
  • Protobuf: use optional for presence; do not overload zero/empty.

Example (email is explicitly null: the value is unknown to the reader, e.g., redacted)

{
  "actor": {
    "id": "user_123",
    "email": null
  }
}
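
The omit-versus-null rule can be sketched as a small serializer step. This is an illustration under assumptions: to_wire and its explicit_null parameter are hypothetical names, and the set of fields allowed to carry a meaningful null would come from the schema.

```python
import json
from typing import Any, Dict, FrozenSet


def to_wire(fields: Dict[str, Any], explicit_null: FrozenSet[str] = frozenset()) -> str:
    """Serialize to JSON, omitting None values unless the schema gives null a meaning."""
    out = {}
    for name, value in fields.items():
        if value is None and name not in explicit_null:
            continue  # unknown or not applicable: leave the key out entirely
        out[name] = value  # real values, plus nulls that carry defined semantics
    return json.dumps(out, sort_keys=True)


actor = {"id": "user_123", "display": None, "email": None}
print(to_wire(actor, explicit_null=frozenset({"email"})))
```

Here display is simply absent from the output, while email is emitted as null because the caller declared that null has defined semantics for it.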

Enums & Versioning

  • Shape:
    • Types/values in PascalCase (DecisionOutcome.Allow)
    • Value 0 is Unknown and MUST exist
  • Stability: Never reuse numeric values; reserve removed ones.
  • JSON representation: REST SHOULD emit/accept string names.
  • Evolution: Additive; consumers must treat unknowns as Unknown or a safe default.

C#

public enum DecisionOutcome
{
    Unknown = 0,
    Allow   = 1,
    Deny    = 2,
    NotApplicable = 3
}
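
On the consumer side, the rule that unrecognized values map to Unknown can be sketched like this. The Python enum mirrors the C# definition above for illustration only, and parse_outcome is a hypothetical helper name.

```python
from enum import IntEnum


class DecisionOutcome(IntEnum):
    # Numeric values mirror the C# enum; numbers are never reused.
    Unknown = 0
    Allow = 1
    Deny = 2
    NotApplicable = 3


def parse_outcome(name: str) -> DecisionOutcome:
    """Map a REST string literal to the enum, treating unrecognized names as Unknown."""
    try:
        return DecisionOutcome[name]
    except KeyError:
        # A newer producer may emit a value this consumer does not know yet.
        return DecisionOutcome.Unknown


print(parse_outcome("Allow").name, parse_outcome("Escalate").name)
```

An older consumer that receives a value added in a later schema version therefore degrades to Unknown instead of failing, which is what makes additive enum evolution safe.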

Naming & Casing Details

  • JSON fields: lowerCamelCase (auditRecordId, tenantId, createdAt).
  • C# & Protobuf (code-first): PascalCase for classes, properties, methods, and protobuf field names when generated.
  • Database schema:
    • Tables: PascalCase plural (AuditRecords, ExportJobs)
    • Columns: PascalCase (AuditRecordId, TenantId, CreatedAt)
    • Indexes: IX_<Table>_<Col1>_<Col2> (IX_AuditRecords_TenantId_CreatedAt)
    • Foreign keys: FK_<FromTable>_<ToTable>_<Column>

DDL sketch

CREATE TABLE dbo.AuditRecords (
  AuditRecordId CHAR(26)     NOT NULL, -- ULID
  TenantId      NVARCHAR(128) NOT NULL, -- matches the 128-char ID limit
  CreatedAt     DATETIME2(3) NOT NULL,
  CONSTRAINT PK_AuditRecords PRIMARY KEY (AuditRecordId)
);
CREATE INDEX IX_AuditRecords_TenantId_CreatedAt
  ON dbo.AuditRecords (TenantId, CreatedAt);

Problem+JSON (API Error Hints)

Use RFC 9457 application/problem+json with:

  • type, title, status, detail, instance, traceId
  • errors map for field-level validation messages

Example

{
  "type": "urn:connectsoft:errors:validation",
  "title": "Invalid audit record",
  "status": 400,
  "detail": "createdAt must be a valid RFC3339 timestamp in UTC (Z).",
  "instance": "/records/ingest",
  "traceId": "a3f2ccbd7a1f4c0f8b1e9f4d1b7f2a66",
  "errors": { "createdAt": ["Expected UTC (Z) suffix"] }
}
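
A validation failure can be turned into this shape with a small builder. The sketch below is illustrative: validation_problem is a hypothetical helper, and the type URN and title are copied from the example above rather than taken from a published error catalog.

```python
import json
from typing import Dict, List

PROBLEM_CONTENT_TYPE = "application/problem+json"


def validation_problem(
    detail: str, instance: str, trace_id: str, errors: Dict[str, List[str]]
) -> Dict[str, object]:
    """Build an RFC 9457 problem document with the traceId and errors extensions."""
    return {
        "type": "urn:connectsoft:errors:validation",
        "title": "Invalid audit record",
        "status": 400,
        "detail": detail,
        "instance": instance,
        "traceId": trace_id,  # extension member: lets operators join the error to a trace
        "errors": errors,     # extension member: field name -> list of messages
    }


problem = validation_problem(
    detail="createdAt must be a valid RFC3339 timestamp in UTC (Z).",
    instance="/records/ingest",
    trace_id="a3f2ccbd7a1f4c0f8b1e9f4d1b7f2a66",
    errors={"createdAt": ["Expected UTC (Z) suffix"]},
)
print(PROBLEM_CONTENT_TYPE, json.dumps(problem))
```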

Canonical AuditRecord (Entity)

Defines the canonical write model appended by producers to represent a single, immutable audit fact within a tenant.


Overview

An AuditRecord captures who (actor) did what (action) to which resource (resource), when (createdAt), and in what context (correlation, optional decision).
Records are append-only and immutable once accepted; deduplication relies on auditRecordId (ULID) and/or an idempotencyKey.

Key invariants:

  • Immutability: No in-place updates after acceptance; corrections are new records linked via correlation.causationId.
  • Tenant isolation: tenantId is mandatory and non-transitive across joins.
  • Time: All canonical timestamps are UTC (see Principles).
  • Size bounds: Concrete limits finalized in the Validation & Limits section; design for bounded maps/arrays.

Fields

| JSON field (lowerCamel) | C# property (PascalCase) | Type | Req. | Description & rules |
|---|---|---|---|---|
| auditRecordId | AuditRecordId | string (ULID) | Yes | Primary identifier for the record; 26-char Crockford ULID. |
| tenantId | TenantId | string | Yes | Logical tenant key; opaque, ASCII [A-Za-z0-9._-]{1,128}. |
| createdAt | CreatedAt | timestamp | Yes | Producer-asserted time of the audited action (UTC). |
| observedAt | ObservedAt | timestamp | No | Gateway receipt time (UTC). |
| actor | Actor | object | Yes | Actor that initiated the action; see Actor Model (code-first class / schema ref). |
| resource | Resource | object | Yes | Target of the action; see ResourceRef. |
| action | Action | string | Yes | Verb or noun.verb taxonomy (e.g., create, appointment.update). Lowercase; max 64 chars. |
| decision | Decision | object | No | Authorization outcome for access events; see Decision & Access Outcome. |
| correlation | Correlation | object | No | Trace/request correlation; includes traceId, requestId, causationId. |
| idempotencyKey | IdempotencyKey | string | No | Producer key for safe retries; ≤128 chars. |
| attributes | Attributes | map | No | Flat tags for quick filtering (e.g., env=prod, region=us-central). |
| delta | Delta | object | No | Field-level changes; see Deltas (before/after). |
| request | Request | object | No | Optional ingress context (e.g., ip, userAgent); subject to classification/redaction. |
| schemaVersion | SchemaVersion | string | No | Versioned schema identifier for the record shape (e.g., audit-record.v1). |

Note: actor, resource, decision, correlation, and delta are defined in their respective sections and referenced here to keep the write model canonical and stable.


JSON Schema (v1)

{
  "$id": "urn:connectsoft:schemas:audit-record.v1.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "AuditRecord",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "auditRecordId": {
      "type": "string",
      "description": "ULID primary identifier",
      "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$"
    },
    "tenantId": {
      "type": "string",
      "minLength": 1,
      "maxLength": 128,
      "pattern": "^[A-Za-z0-9._-]+$"
    },
    "createdAt": { "type": "string", "format": "date-time" },
    "observedAt": { "type": "string", "format": "date-time" },
    "actor": { "$ref": "urn:connectsoft:schemas:partials/actor.v1.json" },
    "resource": { "$ref": "urn:connectsoft:schemas:partials/resource-ref.v1.json" },
    "action": {
      "type": "string",
      "minLength": 1,
      "maxLength": 64,
      "pattern": "^[a-z]+([.][a-z][a-z0-9_-]+)?$"
    },
    "decision": { "$ref": "urn:connectsoft:schemas:partials/decision.v1.json" },
    "correlation": { "$ref": "urn:connectsoft:schemas:partials/correlation.v1.json" },
    "idempotencyKey": {
      "type": "string",
      "maxLength": 128,
      "pattern": "^[A-Za-z0-9._-]+$"
    },
    "attributes": {
      "type": "object",
      "additionalProperties": { "type": "string", "maxLength": 256 }
    },
    "delta": { "$ref": "urn:connectsoft:schemas:partials/delta.v1.json" },
    "request": {
      "type": "object",
      "additionalProperties": false,
      "properties": {
        "ip": { "type": "string", "format": "ipv4" },
        "userAgent": { "type": "string", "maxLength": 512 }
      }
    },
    "schemaVersion": { "type": "string", "pattern": "^audit-record\\.v[0-9]+$" }
  },
  "required": [
    "auditRecordId",
    "tenantId",
    "createdAt",
    "actor",
    "resource",
    "action"
  ]
}
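
For a quick pre-flight check on the producer side, the required list and the string patterns from the schema can be applied by hand. This is only a sketch covering a subset of the schema; check_record is a hypothetical helper, and a real service would more likely run a full JSON Schema validator against the document above.

```python
import re
from typing import Any, Dict, List

REQUIRED = ("auditRecordId", "tenantId", "createdAt", "actor", "resource", "action")

# Patterns copied from the schema; fullmatch() supplies the ^...$ anchoring.
PATTERNS = {
    "auditRecordId": re.compile(r"[0-9A-HJKMNP-TV-Z]{26}"),
    "tenantId": re.compile(r"[A-Za-z0-9._-]{1,128}"),
    "action": re.compile(r"[a-z]+([.][a-z][a-z0-9_-]+)?"),
    "idempotencyKey": re.compile(r"[A-Za-z0-9._-]{1,128}"),
}


def check_record(record: Dict[str, Any]) -> Dict[str, List[str]]:
    """Return field -> messages for missing required fields and pattern violations."""
    errors: Dict[str, List[str]] = {}
    for name in REQUIRED:
        if name not in record:
            errors.setdefault(name, []).append("required")
    for name, pattern in PATTERNS.items():
        if name not in record:
            continue
        value = record[name]
        if not isinstance(value, str) or pattern.fullmatch(value) is None:
            errors.setdefault(name, []).append("does not match pattern")
    action = record.get("action")
    if isinstance(action, str) and len(action) > 64:
        errors.setdefault("action", []).append("longer than 64 characters")
    return errors


minimal = {
    "auditRecordId": "01JE1X7F3Q5X1X3ZQ1TF9Q4Q7J",
    "tenantId": "splootvets",
    "createdAt": "2025-10-22T14:05:13.481Z",
    "actor": {"id": "user_123", "type": "User"},
    "resource": {"type": "Vetspire.Appointment", "id": "A-9981"},
    "action": "create",
}
print(check_record(minimal))
```

The returned map has the same field-to-messages shape as the errors member of the problem+json response, so it can be passed through unchanged.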

C# (gRPC code-first) shape

[DataContract]
public sealed class AuditRecord
{
    [DataMember(Order = 1)] public string AuditRecordId { get; init; } = default!;
    [DataMember(Order = 2)] public string TenantId { get; init; } = default!;
    [DataMember(Order = 3)] public DateTimeOffset CreatedAt { get; init; }
    [DataMember(Order = 4)] public DateTimeOffset? ObservedAt { get; init; }

    [DataMember(Order = 5)] public Actor Actor { get; init; } = default!;
    [DataMember(Order = 6)] public ResourceRef Resource { get; init; } = default!;
    [DataMember(Order = 7)] public string Action { get; init; } = default!;

    [DataMember(Order = 8)] public Decision? Decision { get; init; }
    [DataMember(Order = 9)] public Correlation? Correlation { get; init; }

    [DataMember(Order = 10)] public string? IdempotencyKey { get; init; }
    [DataMember(Order = 11)] public IReadOnlyDictionary<string, string>? Attributes { get; init; }
    [DataMember(Order = 12)] public Delta? Delta { get; init; }

    [DataMember(Order = 13)] public RequestContext? Request { get; init; }
    [DataMember(Order = 14)] public string? SchemaVersion { get; init; } = "audit-record.v1";
}

[DataContract]
public sealed class RequestContext
{
    [DataMember(Order = 1)] public string? Ip { get; init; }
    [DataMember(Order = 2)] public string? UserAgent { get; init; }
}

JSON serialization MUST apply a camelCase naming policy so AuditRecordId → auditRecordId, etc. Database schema uses PascalCase table/column names (see below).


Protobuf (optional emission)

If you emit .proto from the code-first model, keep PascalCase for message/field names and set json_name to lowerCamelCase for REST:

syntax = "proto3";
package connectsoft.audit.v1;

import "google/protobuf/timestamp.proto";

message AuditRecord {
  string AuditRecordId = 1 [json_name = "auditRecordId"];
  string TenantId      = 2 [json_name = "tenantId"];
  google.protobuf.Timestamp CreatedAt  = 3 [json_name = "createdAt"];
  google.protobuf.Timestamp ObservedAt = 4 [json_name = "observedAt"];

  Actor      Actor    = 5 [json_name = "actor"];
  ResourceRef Resource = 6 [json_name = "resource"];
  string     Action   = 7 [json_name = "action"];

  Decision   Decision   = 8 [json_name = "decision"];
  Correlation Correlation = 9 [json_name = "correlation"];

  string IdempotencyKey = 10 [json_name = "idempotencyKey"];
  map<string,string> Attributes = 11 [json_name = "attributes"];
  Delta Delta = 12 [json_name = "delta"];

  RequestContext Request = 13 [json_name = "request"];
  string SchemaVersion = 14 [json_name = "schemaVersion"];
}

Examples

Minimal (required only)

{
  "auditRecordId": "01JE1X7F3Q5X1X3ZQ1TF9Q4Q7J",
  "tenantId": "splootvets",
  "createdAt": "2025-10-22T14:05:13.481Z",
  "actor": { "id": "user_123", "type": "User", "display": "Alex" },
  "resource": { "type": "Vetspire.Appointment", "id": "A-9981" },
  "action": "create"
}

Rich (with decision, correlation, delta)

{
  "auditRecordId": "01JE1X8MFT7Z7P8K9V7E7Q0Q2B",
  "tenantId": "splootvets",
  "createdAt": "2025-10-22T14:15:01.210Z",
  "observedAt": "2025-10-22T14:15:01.742Z",
  "actor": { "id": "svc-gw", "type": "Service", "display": "Gateway" },
  "resource": { "type": "Vetspire.Appointment", "id": "A-9981", "path": "/status" },
  "action": "appointment.update",
  "decision": { "outcome": "Allow", "reason": "Policy.Grant" },
  "correlation": {
    "traceId": "a3f2ccbd7a1f4c0f8b1e9f4d1b7f2a66",
    "requestId": "REQ-7f3b4a",
    "causationId": "01JE1X8M9QNG3B2W6J5SP2K5H5"
  },
  "idempotencyKey": "A-9981.status.2025-10-22T14-15-01Z",
  "attributes": { "env": "prod", "region": "us-central" },
  "delta": {
    "fields": {
      "status": { "before": "Scheduled", "after": "Booked" }
    }
  },
  "request": { "ip": "203.0.113.27", "userAgent": "Mozilla/5.0" },
  "schemaVersion": "audit-record.v1"
}

Storage mapping (authoritative store)

Storage names use PascalCase (tables/columns). Immutability is enforced via append-only semantics and constraints.

CREATE TABLE dbo.AuditRecords (
  AuditRecordId CHAR(26)     NOT NULL, -- ULID
  TenantId      NVARCHAR(128) NOT NULL,     -- matches tenantId maxLength (128)
  CreatedAt     DATETIME2(3) NOT NULL,
  ObservedAt    DATETIME2(3) NULL,
  Actor         NVARCHAR(MAX) NOT NULL,     -- JSON (Actor)
  Resource      NVARCHAR(MAX) NOT NULL,     -- JSON or structured FK; start JSON (type + id + path can exceed 512 chars)
  Action        NVARCHAR(64)  NOT NULL,
  Decision      NVARCHAR(MAX) NULL,         -- JSON (Decision)
  Correlation   NVARCHAR(MAX) NULL,         -- JSON (ids + producer) or separate table if needed
  IdempotencyKey NVARCHAR(128) NULL,
  Attributes    NVARCHAR(MAX) NULL,         -- JSON (tags)
  Delta         NVARCHAR(MAX) NULL,         -- JSON (field changes)
  Request       NVARCHAR(1024) NULL,        -- JSON (IP/UA) — subject to redaction
  SchemaVersion NVARCHAR(32) NULL,
  CONSTRAINT PK_AuditRecords PRIMARY KEY (AuditRecordId)
);

CREATE INDEX IX_AuditRecords_TenantId_CreatedAt
  ON dbo.AuditRecords (TenantId, CreatedAt);

CREATE UNIQUE INDEX UX_AuditRecords_Tenant_Idempotency
  ON dbo.AuditRecords (TenantId, IdempotencyKey)
  WHERE IdempotencyKey IS NOT NULL;

Notes: You may start with JSON columns for Actor, Decision, Correlation, Attributes, Delta, and later project them into read models (see Projections).
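
The effect of the filtered unique index on retries can be tried locally. The sketch below uses SQLite (via Python's sqlite3 module) purely as a stand-in for the authoritative store: the table is reduced to four columns, SQLite's partial index plays the role of the filtered index, and open_store and append are hypothetical helper names.

```python
import sqlite3
from typing import Optional


def open_store() -> sqlite3.Connection:
    """Create an in-memory table with the same uniqueness rules as the DDL above."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE AuditRecords (
            AuditRecordId  TEXT NOT NULL PRIMARY KEY,
            TenantId       TEXT NOT NULL,
            CreatedAt      TEXT NOT NULL,
            IdempotencyKey TEXT NULL
        );
        CREATE UNIQUE INDEX UX_AuditRecords_Tenant_Idempotency
            ON AuditRecords (TenantId, IdempotencyKey)
            WHERE IdempotencyKey IS NOT NULL;
        """
    )
    return conn


def append(conn: sqlite3.Connection, record_id: str, tenant_id: str,
           created_at: str, idempotency_key: Optional[str] = None) -> bool:
    """Insert a record; return False when it duplicates an ID or a (tenant, key) pair."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO AuditRecords VALUES (?, ?, ?, ?)",
                (record_id, tenant_id, created_at, idempotency_key),
            )
        return True
    except sqlite3.IntegrityError:
        return False


store = open_store()
print(append(store, "01JE1X7F3Q5X1X3ZQ1TF9Q4Q7J", "splootvets", "2025-10-22T14:05:13.481Z", "key-1"))
print(append(store, "01JE1X8MFT7Z7P8K9V7E7Q0Q2B", "splootvets", "2025-10-22T14:05:14.000Z", "key-1"))
```

The second call is rejected because the same tenant reuses key-1; the same key under a different tenant, or records without a key, are unaffected by the index.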


Actor Model

Defines the subject that initiated the action: a human user, a backend service, or a scheduled/background job. The Actor travels inside AuditRecord to capture identity, roles, and provenance (identity provider details, client information, and optional on-behalf-of linkage).


Overview

An Actor is a compact, PII-aware identity envelope designed for long-term immutable storage:

  • Types: User, Service, Job (plus Unknown for safety).
  • Identity stability: Prefer stable, opaque IDs (never parse IDs for meaning).
  • Provenance: Map OIDC/identity tokens to normalized fields (Issuer, Subject, ClientId, etc.).
  • Impersonation / On-Behalf-Of: When a service acts for a user, capture OnBehalfOf with a minimal ref.
  • PII minimization: Email is optional and may be redacted; EmailHash (SHA-256 hex) supports joins without exposing PII.

Fields

| JSON (lowerCamel) | C# (PascalCase) | Type | Req. | Description & rules |
|---|---|---|---|---|
| id | Id | string | Yes | Stable opaque identifier of the actor (ASCII [A-Za-z0-9._-]{1,128}). |
| type | Type | enum | Yes | One of Unknown, User, Service, Job. |
| display | Display | string | No | Friendly label (max 128). May be redacted on read. |
| email | Email | string (email) | No | Optional; PII. If present, consider redaction/policy. |
| emailHash | EmailHash | string (hex64) | No | SHA-256 of lowercase trimmed email; recommended when email is omitted/redacted. |
| roles | Roles | array<string> | No | Role names (≤32 items; each ≤64 chars, ASCII [A-Za-z0-9._:-]). |
| provenance | Provenance | object | No | Identity provider details (issuer, subject, clientId, authType, sessionId, tokenId). |
| onBehalfOf | OnBehalfOf | object | No | Minimal actor ref when this actor acted for another principal (e.g., service for user). |

PII fields (email, potentially display) are subject to classification & redaction policies defined elsewhere.


JSON Schema (partial, v1)

{
  "$id": "urn:connectsoft:schemas:partials/actor.v1.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Actor",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "id": {
      "type": "string",
      "minLength": 1,
      "maxLength": 128,
      "pattern": "^[A-Za-z0-9._-]+$"
    },
    "type": {
      "type": "string",
      "enum": ["Unknown", "User", "Service", "Job"]
    },
    "display": { "type": "string", "maxLength": 128 },
    "email": { "type": ["string", "null"], "format": "email", "maxLength": 254 },
    "emailHash": {
      "type": "string",
      "pattern": "^[a-f0-9]{64}$"
    },
    "roles": {
      "type": "array",
      "maxItems": 32,
      "items": {
        "type": "string",
        "maxLength": 64,
        "pattern": "^[A-Za-z0-9._:-]+$"
      }
    },
    "provenance": {
      "type": "object",
      "additionalProperties": false,
      "properties": {
        "issuer": { "type": "string", "maxLength": 256 },
        "subject": { "type": "string", "maxLength": 128 },
        "clientId": { "type": "string", "maxLength": 128 },
        "authType": { "type": "string", "maxLength": 64 },
        "sessionId": { "type": "string", "maxLength": 128 },
        "tokenId": { "type": "string", "maxLength": 128 }
      }
    },
    "onBehalfOf": {
      "type": "object",
      "additionalProperties": false,
      "properties": {
        "id": {
          "type": "string",
          "minLength": 1,
          "maxLength": 128,
          "pattern": "^[A-Za-z0-9._-]+$"
        },
        "type": { "type": "string", "enum": ["Unknown", "User", "Service", "Job"] },
        "display": { "type": "string", "maxLength": 128 }
      },
      "required": ["id", "type"]
    }
  },
  "required": ["id", "type"]
}

C# (gRPC code-first)

[DataContract]
public sealed class Actor
{
    [DataMember(Order = 1)] public string Id { get; init; } = default!;
    [DataMember(Order = 2)] public ActorType Type { get; init; } = ActorType.Unknown;
    [DataMember(Order = 3)] public string? Display { get; init; }

    [DataMember(Order = 4)] public string? Email { get; init; }           // PII; may be redacted on read
    [DataMember(Order = 5)] public string? EmailHash { get; init; }       // SHA-256 hex of normalized email

    [DataMember(Order = 6)] public IReadOnlyList<string>? Roles { get; init; }
    [DataMember(Order = 7)] public ActorProvenance? Provenance { get; init; }
    [DataMember(Order = 8)] public ActorRef? OnBehalfOf { get; init; }
}

[DataContract]
public enum ActorType
{
    Unknown = 0,
    User    = 1,
    Service = 2,
    Job     = 3
}

[DataContract]
public sealed class ActorProvenance
{
    [DataMember(Order = 1)] public string? Issuer { get; init; }     // e.g., https://login.microsoftonline.com/<tenant>/v2.0
    [DataMember(Order = 2)] public string? Subject { get; init; }    // OIDC 'sub'
    [DataMember(Order = 3)] public string? ClientId { get; init; }   // OIDC 'azp' or 'client_id'
    [DataMember(Order = 4)] public string? AuthType { get; init; }   // e.g., "OIDC", "PAT", "mTLS", "APIKey"
    [DataMember(Order = 5)] public string? SessionId { get; init; }  // OIDC 'sid'
    [DataMember(Order = 6)] public string? TokenId { get; init; }    // OIDC 'jti'
}

[DataContract]
public sealed class ActorRef
{
    [DataMember(Order = 1)] public string Id { get; init; } = default!;
    [DataMember(Order = 2)] public ActorType Type { get; init; } = ActorType.Unknown;
    [DataMember(Order = 3)] public string? Display { get; init; }
}

JSON serialization MUST use camelCase policy so EmailHash → emailHash. Database columns follow PascalCase if stored separately (ActorEmailHash, etc.), but typically Actor is embedded as JSON within AuditRecords.


Protobuf (optional emission)

syntax = "proto3";
package connectsoft.audit.v1;

message Actor {
  string Id = 1 [json_name = "id"];
  ActorType Type = 2 [json_name = "type"];
  string Display = 3 [json_name = "display"];
  string Email = 4 [json_name = "email"];
  string EmailHash = 5 [json_name = "emailHash"];
  repeated string Roles = 6 [json_name = "roles"];
  ActorProvenance Provenance = 7 [json_name = "provenance"];
  ActorRef OnBehalfOf = 8 [json_name = "onBehalfOf"];
}

enum ActorType {
  ActorType_Unknown = 0;
  ActorType_User    = 1;
  ActorType_Service = 2;
  ActorType_Job     = 3;
}

message ActorProvenance {
  string Issuer = 1 [json_name = "issuer"];
  string Subject = 2 [json_name = "subject"];
  string ClientId = 3 [json_name = "clientId"];
  string AuthType = 4 [json_name = "authType"];
  string SessionId = 5 [json_name = "sessionId"];
  string TokenId = 6 [json_name = "tokenId"];
}

message ActorRef {
  string Id = 1 [json_name = "id"];
  ActorType Type = 2 [json_name = "type"];
  string Display = 3 [json_name = "display"];
}

Examples

User actor with redacted email (hash only; the emailHash value is truncated here, real values are 64 lowercase hex characters)

{
  "id": "user_123",
  "type": "User",
  "display": "Alex",
  "email": null,
  "emailHash": "7c4a8d09ca3762af61e59520943dc26494f8941b...",
  "roles": ["Member", "Veterinarian"],
  "provenance": {
    "issuer": "https://login.microsoftonline.com/contoso/v2.0",
    "subject": "a1b2c3d4-...-z9",
    "authType": "OIDC",
    "sessionId": "S-abc123",
    "tokenId": "JTI-789"
  }
}

Service acting on behalf of a user

{
  "id": "svc-gateway",
  "type": "Service",
  "display": "Gateway",
  "roles": ["System"],
  "provenance": {
    "clientId": "gw-client",
    "authType": "mTLS"
  },
  "onBehalfOf": {
    "id": "user_123",
    "type": "User",
    "display": "Alex"
  }
}

Scheduled job

{
  "id": "job-export-2025-10-22T0200Z",
  "type": "Job",
  "display": "Nightly Export"
}

Validation rules (summary)

  • id pattern ^[A-Za-z0-9._-]{1,128}$.
  • roles ≤ 32 items; each max 64 chars, ASCII [A-Za-z0-9._:-].
  • If email present, compute emailHash = SHA256(lowercase(trim(email))) and prefer masking email on read according to policy.
  • onBehalfOf requires both id and type.

Use these rules when validating AuditRecord.Actor. Redaction/PII handling is governed by the platform policies referenced elsewhere in the documentation.
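
The emailHash rule (SHA-256 over the trimmed, lowercased address) can be sketched as follows. The helper names email_hash and redact_email are hypothetical, and the redaction step shown is only one possible reading of "prefer masking email on read"; the actual policy lives in the classification and redaction rules.

```python
import hashlib
from typing import Any, Dict


def email_hash(email: str) -> str:
    """SHA-256 hex digest of the trimmed, lowercased email address."""
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def redact_email(actor: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the actor with email nulled out and emailHash populated."""
    redacted = dict(actor)
    email = redacted.get("email")
    if email:
        redacted["emailHash"] = email_hash(email)
        redacted["email"] = None  # explicit null: the value exists but is withheld
    return redacted


masked = redact_email({"id": "user_123", "type": "User", "email": " Alex@Example.com "})
print(masked["email"], len(masked["emailHash"]))
```

Because normalization happens before hashing, differently cased or padded spellings of the same address produce the same emailHash, which is what allows joins without exposing the address.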


ResourceRef Model

Standardizes how a record points to the target resource affected by an action: its type, identifier, and an optional path to a sub-element. Includes an optional tenantScopedId for efficient partition/index keys.


Overview

A ResourceRef is a compact, immutable pointer:

  • Type describes the domain kind (e.g., Appointment, ExportJob) and may be namespaced (e.g., Vetspire.Appointment).
  • Id is an opaque identifier meaningful to the producing system—never parsed for semantics.
  • Path optionally narrows the reference to a field or sub-resource using a JSON Pointer–style path.
  • Tenant scope is explicit via the parent AuditRecord.tenantId; tenantScopedId is an optional, derived convenience key.

Fields

| JSON (lowerCamel) | C# (PascalCase) | Type | Req. | Description & rules |
|---|---|---|---|---|
| type | Type | string | Yes | PascalCase singular; may be dotted namespace (^[A-Z][A-Za-z0-9]*(\\.[A-Z][A-Za-z0-9]*)*$). Examples: Appointment, Vetspire.Appointment. |
| id | Id | string | Yes | Opaque ASCII identifier [A-Za-z0-9._:-]{1,128}; must not contain /. |
| path | Path | string | No | Optional JSON Pointer–style path to a sub-element. Use /-separated tokens; escape ~ as ~0 and / as ~1. Examples: /status, /lines/0/price. |
| tenantScopedId | TenantScopedId | string | No | Optional derived key for indexing/partitioning. Canonical form: <tenantId>:<type>:<id> (ASCII, max 160). Not authoritative. |

PII note: Some id values can be PII or sensitive (e.g., emails). Classification/redaction policies apply on read paths.


JSON Schema (partial, v1)

{
  "$id": "urn:connectsoft:schemas:partials/resource-ref.v1.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "ResourceRef",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "type": {
      "type": "string",
      "pattern": "^[A-Z][A-Za-z0-9]*(\\.[A-Z][A-Za-z0-9]*)*$",
      "minLength": 1,
      "maxLength": 128
    },
    "id": {
      "type": "string",
      "pattern": "^[A-Za-z0-9._:-]{1,128}$"
    },
    "path": {
      "type": "string",
      "pattern": "^(/([^/~]|~[01])*)*$",
      "maxLength": 256
    },
    "tenantScopedId": {
      "type": "string",
      "pattern": "^[A-Za-z0-9._:-]{1,160}$"
    }
  },
  "required": ["type", "id"]
}

C# (gRPC code-first)

[DataContract]
public sealed class ResourceRef
{
    [DataMember(Order = 1)] public string Type { get; init; } = default!; // PascalCase, optional dotted namespace
    [DataMember(Order = 2)] public string Id { get; init; } = default!;   // Opaque; no slashes
    [DataMember(Order = 3)] public string? Path { get; init; }            // JSON Pointer–style (e.g., "/status", "/lines/0/price")
    [DataMember(Order = 4)] public string? TenantScopedId { get; init; }  // "<tenantId>:<type>:<id>" (optional, derived)
}

public static class ResourceRefRules
{
    public const string TypePattern = "^[A-Z][A-Za-z0-9]*(\\.[A-Z][A-Za-z0-9]*)*$";
    public const string IdPattern   = "^[A-Za-z0-9._:-]{1,128}$";
    public const string PathPattern = "^(/([^/~]|~[01])*)*$";

    public static string MakeTenantScopedId(string tenantId, string type, string id)
        => $"{tenantId}:{type}:{id}";
}

JSON serialization MUST use camelCase policy so TenantScopedId → tenantScopedId. Database columns follow PascalCase if stored separately (e.g., ResourceType, ResourceId, ResourcePath, ResourceTenantScopedId) or embed the whole Resource object as JSON in AuditRecords.


Protobuf (optional emission)

syntax = "proto3";
package connectsoft.audit.v1;

message ResourceRef {
  string Type = 1 [json_name = "type"];            // PascalCase, may be namespaced
  string Id   = 2 [json_name = "id"];              // Opaque; no '/'
  string Path = 3 [json_name = "path"];            // JSON Pointer–style
  string TenantScopedId = 4 [json_name = "tenantScopedId"]; // Derived convenience key
}

Examples

Simple resource

{ "type": "Appointment", "id": "A-9981" }

Namespaced external resource

{ "type": "Vetspire.Appointment", "id": "24017" }

Sub-resource path (JSON Pointer semantics)

{ "type": "Vetspire.Appointment", "id": "A-9981", "path": "/status" }

With tenant-scoped key

{
  "type": "Vetspire.Appointment",
  "id": "A-9981",
  "tenantScopedId": "splootvets:Vetspire.Appointment:A-9981"
}

Array element path

{ "type": "ExportJob", "id": "01JE2N6M2G8WQ3E8J0A5N6D4QS", "path": "/packages/0/hash" }

Validation rules (summary)

  • type must match PascalCase (with optional dotted namespaces).
  • id must match ^[A-Za-z0-9._:-]{1,128}$ and must not include /.
  • path uses a JSON Pointer–style subset: /segment/0/name; escape / as ~1 and ~ as ~0.
  • tenantScopedId is optional, derived, and must not be the sole source of tenant enforcement (tenant isolation is governed by AuditRecord.tenantId and storage-level policies).
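
The path escaping and the derived key can be sketched together. The names make_path, escape_token, and make_tenant_scoped_id are hypothetical; the length check reflects the 160-character limit stated above, which a maximal tenantId, type, and id would exceed, so the sketch rejects such keys rather than truncating them.

```python
import re

TYPE_RE = re.compile(r"[A-Z][A-Za-z0-9]*(\.[A-Z][A-Za-z0-9]*)*")
ID_RE = re.compile(r"[A-Za-z0-9._:-]{1,128}")
PATH_RE = re.compile(r"(/([^/~]|~[01])*)*")


def escape_token(token: str) -> str:
    """Escape one path token: ~ first (as ~0), then / (as ~1)."""
    return token.replace("~", "~0").replace("/", "~1")


def make_path(*tokens: object) -> str:
    """Join tokens into a JSON Pointer-style path such as /lines/0/price."""
    return "".join("/" + escape_token(str(token)) for token in tokens)


def make_tenant_scoped_id(tenant_id: str, type_: str, id_: str) -> str:
    """Build <tenantId>:<type>:<id>, rejecting inputs that break the stated rules."""
    if TYPE_RE.fullmatch(type_) is None or ID_RE.fullmatch(id_) is None:
        raise ValueError("type or id does not match the ResourceRef patterns")
    key = f"{tenant_id}:{type_}:{id_}"
    if len(key) > 160:
        raise ValueError("tenantScopedId exceeds 160 characters")
    return key


print(make_path("lines", 0, "price"))
print(make_tenant_scoped_id("splootvets", "Vetspire.Appointment", "A-9981"))
```

Escaping ~ before / matters: doing it in the other order would turn the ~1 produced for a slash into ~01.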

Correlation & Provenance

Defines how an AuditRecord links to distributed traces, requests, and causal chains, and how the producer (service or component) is identified. Standardized correlation enables stitching records across gateways, services, jobs, and exports.


Overview

Correlation captures where the record came from and how it relates to other work:

  • Trace: W3C Trace Context (traceId, optional spanId) to join with OpenTelemetry.
  • Request: A stable requestId for the inbound operation (HTTP/gRPC/message).
  • Causation: A causationId (ULID) referencing the prior record/command that triggered this fact.
  • Producer: The service/runtime that produced the record (name, version, instance, env, region).

Fields

| JSON (lowerCamel) | C# (PascalCase) | Type | Req. | Description & rules |
|---|---|---|---|---|
| traceId | TraceId | string (hex32) | Yes | W3C 16-byte ID rendered as 32 lowercase hex chars: ^[0-9a-f]{32}$. |
| spanId | SpanId | string (hex16) | No | W3C 8-byte span ID: ^[0-9a-f]{16}$. Optional on write path. |
| requestId | RequestId | string | No | Stable per inbound request. ASCII [A-Za-z0-9._-]{1,128}. |
| causationId | CausationId | string (ULID) | No | Links to the cause (prior record/command/message). |
| producer | Producer | object | No | Producer identity: service name/version/instance/env/region. |

traceId SHOULD be present on all records. If missing at ingress, the gateway MUST create a new trace and set traceId.


JSON Schema (partial, v1)

{
  "$id": "urn:connectsoft:schemas:partials/correlation.v1.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Correlation",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "traceId": {
      "type": "string",
      "pattern": "^[0-9a-f]{32}$"
    },
    "spanId": {
      "type": "string",
      "pattern": "^[0-9a-f]{16}$"
    },
    "requestId": {
      "type": "string",
      "maxLength": 128,
      "pattern": "^[A-Za-z0-9._-]+$"
    },
    "causationId": {
      "type": "string",
      "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$"
    },
    "producer": {
      "$ref": "urn:connectsoft:schemas:partials/producer.v1.json"
    }
  },
  "required": ["traceId"]
}

Producer schema (partial, v1)

{
  "$id": "urn:connectsoft:schemas:partials/producer.v1.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Producer",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "service": { "type": "string", "minLength": 1, "maxLength": 128, "pattern": "^[A-Za-z0-9._-]+$" },
    "version": { "type": "string", "maxLength": 64 },
    "environment": { "type": "string", "maxLength": 32 },     // e.g., dev|staging|prod
    "instanceId": { "type": "string", "maxLength": 128 },      // hostname/pod
    "region": { "type": "string", "maxLength": 32 },           // e.g., us-central, westeurope
    "zone": { "type": "string", "maxLength": 32 }              // optional AZ
  },
  "required": ["service"]
}

C# (gRPC code-first)

[DataContract]
public sealed class Correlation
{
    [DataMember(Order = 1)] public string TraceId { get; init; } = default!;        // hex32
    [DataMember(Order = 2)] public string? SpanId { get; init; }                    // hex16
    [DataMember(Order = 3)] public string? RequestId { get; init; }                 // ASCII token
    [DataMember(Order = 4)] public string? CausationId { get; init; }               // ULID
    [DataMember(Order = 5)] public Producer? Producer { get; init; }
}

[DataContract]
public sealed class Producer
{
    [DataMember(Order = 1)] public string Service { get; init; } = default!;
    [DataMember(Order = 2)] public string? Version { get; init; }
    [DataMember(Order = 3)] public string? Environment { get; init; }   // dev|staging|prod
    [DataMember(Order = 4)] public string? InstanceId { get; init; }    // hostname/pod
    [DataMember(Order = 5)] public string? Region { get; init; }        // e.g., us-central
    [DataMember(Order = 6)] public string? Zone { get; init; }          // e.g., us-central1-a
}

JSON serialization MUST use camelCase (TraceId → traceId). Database columns follow PascalCase if denormalized (e.g., TraceId, SpanId, RequestId, CausationId, Producer JSON).


Protobuf (optional emission)

syntax = "proto3";
package connectsoft.audit.v1;

message Correlation {
  string TraceId = 1 [json_name = "traceId"];     // hex32
  string SpanId = 2 [json_name = "spanId"];       // hex16
  string RequestId = 3 [json_name = "requestId"];
  string CausationId = 4 [json_name = "causationId"]; // ULID
  Producer Producer = 5 [json_name = "producer"];
}

message Producer {
  string Service = 1 [json_name = "service"];
  string Version = 2 [json_name = "version"];
  string Environment = 3 [json_name = "environment"];
  string InstanceId = 4 [json_name = "instanceId"];
  string Region = 5 [json_name = "region"];
  string Zone = 6 [json_name = "zone"];
}

Propagation rules

HTTP/gRPC ingress

  • Parse W3C Trace Context header traceparent (and tracestate if present).
  • If traceparent is missing, start a new trace, set traceId, and generate a server spanId.
  • Derive requestId from x-request-id if present; otherwise use a gateway-generated token.
  • Attach producer at the component that creates the AuditRecord (e.g., gateway or service).
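
The ingress steps above can be sketched as follows. This is a simplified illustration under stated assumptions: the type and method names are hypothetical, the traceparent parse checks only the overall shape and the all-zero trace ID, a fresh server spanId is generated in every case, and the "REQ-" prefix for generated request IDs is invented for the example.

```csharp
#nullable enable
using System;
using System.Security.Cryptography;
using System.Text.RegularExpressions;

// Hypothetical result type for the correlation values derived at ingress.
public sealed record IngressCorrelation(string TraceId, string SpanId, string RequestId);

public static class IngressCorrelationFactory
{
    // Shape of a W3C traceparent value: version-traceId-parentId-flags (simplified check).
    private static readonly Regex TraceParent = new(
        @"^[0-9a-f]{2}-([0-9a-f]{32})-([0-9a-f]{16})-[0-9a-f]{2}\z", RegexOptions.CultureInvariant);

    private static readonly Regex RequestIdPattern =
        new(@"^[A-Za-z0-9._-]{1,128}\z", RegexOptions.CultureInvariant);

    private static readonly string AllZeroTraceId = new('0', 32);

    public static IngressCorrelation FromHeaders(string? traceparent, string? xRequestId)
    {
        var match = traceparent is null ? Match.Empty : TraceParent.Match(traceparent);

        // Reuse the inbound trace when the header is well formed; otherwise start a new trace.
        var traceId = match.Success && match.Groups[1].Value != AllZeroTraceId
            ? match.Groups[1].Value
            : RandomHex(16);

        // Assumption: the server always records its own span rather than the caller's parent ID.
        var spanId = RandomHex(8);

        // Prefer the caller's x-request-id when it passes the requestId rule.
        var requestId = xRequestId is not null && RequestIdPattern.IsMatch(xRequestId)
            ? xRequestId
            : "REQ-" + RandomHex(8); // hypothetical gateway token format

        return new IngressCorrelation(traceId, spanId, requestId);
    }

    private static string RandomHex(int byteCount) =>
        Convert.ToHexString(RandomNumberGenerator.GetBytes(byteCount)).ToLowerInvariant();
}
```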

Async messaging / background jobs

  • When publishing/consuming messages, propagate traceId via transport headers; if new work is caused by a prior record/command, set causationId to the ULID of that prior entity.
  • Background/scheduled jobs SHOULD start a new span and link to the originating trace if known; otherwise they start a new trace, so the record carries only a fresh traceId and producer (no requestId or causationId).

Cross-service calls

  • Always forward traceparent and tracestate.
  • Each service uses a new spanId; traceId remains stable.
  • If an error response is returned, include requestId (and traceId) in Problem+JSON for user correlation.
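
A minimal sketch of the forwarding rule, assuming a hypothetical helper that builds the outbound traceparent value: the traceId is carried over unchanged and a new spanId is generated for the calling service. The version ("00") and flag values follow the W3C Trace Context format; tracestate is forwarded as received and is not shown.

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical helper; not an OpenTelemetry or ATP API.
public static class TraceContextForwarder
{
    // Same traceId, fresh 8-byte spanId rendered as 16 lowercase hex chars.
    public static string NextTraceParent(string traceId, bool sampled = true)
    {
        var spanId = Convert.ToHexString(RandomNumberGenerator.GetBytes(8)).ToLowerInvariant();
        var flags = sampled ? "01" : "00";
        return $"00-{traceId}-{spanId}-{flags}";
    }
}
```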

Header & attribute mapping (reference)

| Surface | From | To | Notes |
| --- | --- | --- | --- |
| HTTP | traceparent | correlation.traceId/spanId | W3C propagation |
| HTTP (legacy) | x-request-id / x-correlation-id | correlation.requestId | Accept either; prefer x-request-id |
| gRPC | metadata traceparent | correlation.traceId/spanId | Same format |
| Messaging | transport headers (e.g., traceparent) | correlation.traceId/spanId | Preserve on publish/consume |
| Internal | prior record/command ULID | correlation.causationId | Link causal chain |

Stitched examples

1) Gateway appends record at ingress

{
  "correlation": {
    "traceId": "a3f2ccbd7a1f4c0f8b1e9f4d1b7f2a66",
    "spanId": "9b1e9f4d1b7f2a66",
    "requestId": "REQ-7f3b4a",
    "producer": { "service": "gateway", "version": "1.12.0", "environment": "prod", "instanceId": "gw-7c9ff7", "region": "us-central" }
  }
}

2) Ingestion service persists derived record (same trace, new span)

{
  "correlation": {
    "traceId": "a3f2ccbd7a1f4c0f8b1e9f4d1b7f2a66",
    "spanId": "c0f8b1e9f4d1b7f2",
    "requestId": "REQ-7f3b4a",
    "causationId": "01JE1X8M9QNG3B2W6J5SP2K5H5",
    "producer": { "service": "ingestion", "version": "2.4.3", "environment": "prod", "instanceId": "ing-42a1", "region": "us-central" }
  }
}

3) Projector handles event asynchronously (same trace, new span; causal link to append)

{
  "correlation": {
    "traceId": "a3f2ccbd7a1f4c0f8b1e9f4d1b7f2a66",
    "spanId": "7a1f4c0f8b1e9f4d",
    "causationId": "01JE1X8MFT7Z7P8K9V7E7Q0Q2B",
    "producer": { "service": "projector", "version": "0.9.0", "environment": "prod", "instanceId": "proj-0a77", "region": "us-central" }
  }
}

Validation rules (summary)

  • traceId: ^[0-9a-f]{32}$; spanId: ^[0-9a-f]{16}$.
  • requestId: ASCII token ^[A-Za-z0-9._-]{1,128}$.
  • causationId: ULID pattern ^[0-9A-HJKMNP-TV-Z]{26}$.
  • producer.service required; other producer fields optional.
  • traceId SHOULD be present on all records; when absent at ingress, generate one and propagate forward.
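
As an illustration of the summary above, the sketch below checks the four identifier fields and returns the names of the fields that fail. The class name and the error-list shape are assumptions for the example; producer validation is left out for brevity.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical validator; returns the names of fields that violate the rules.
public static class CorrelationValidator
{
    // \z instead of $ so a trailing newline does not slip through in .NET.
    private static readonly Regex TraceIdPattern = new(@"^[0-9a-f]{32}\z");
    private static readonly Regex SpanIdPattern = new(@"^[0-9a-f]{16}\z");
    private static readonly Regex RequestIdPattern = new(@"^[A-Za-z0-9._-]{1,128}\z");
    private static readonly Regex UlidPattern = new(@"^[0-9A-HJKMNP-TV-Z]{26}\z");

    public static IReadOnlyList<string> Validate(
        string? traceId, string? spanId, string? requestId, string? causationId)
    {
        var errors = new List<string>();

        // traceId is required by the schema; the other fields are checked only when present.
        if (traceId is null || !TraceIdPattern.IsMatch(traceId)) errors.Add("traceId");
        if (spanId is not null && !SpanIdPattern.IsMatch(spanId)) errors.Add("spanId");
        if (requestId is not null && !RequestIdPattern.IsMatch(requestId)) errors.Add("requestId");
        if (causationId is not null && !UlidPattern.IsMatch(causationId)) errors.Add("causationId");

        return errors;
    }
}
```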

Decision & Access Outcome

Represents the authorization outcome associated with an access attempt or policy check performed during an action. It is optional on AuditRecord but SHOULD be populated by components that enforce/consult policy (gateway, authz service, PDP).


Overview

  • Outcome enum: Allow, Deny, NotApplicable (Unknown for forward-compat).
  • Reasoning: Capture a compact reason code (machine-friendly) and optional human detail.
  • Evidence: Optionally reference the policy/rule and engine used to evaluate.
  • Attributes: Small key/value map with parameters relevant to the decision (e.g., scope=appointments:read, mfa=required).
  • Time: Include evaluatedAt (UTC) for the moment the decision was made.

Fields

| JSON (lowerCamel) | C# (PascalCase) | Type | Req. | Description & rules |
| --- | --- | --- | --- | --- |
| outcome | Outcome | enum | Yes | One of Unknown, Allow, Deny, NotApplicable. |
| reasonCode | ReasonCode | string | No | Machine-friendly code. Pattern ^[A-Za-z][A-Za-z0-9]*(\.[A-Za-z][A-Za-z0-9_-]*)*$ (e.g., Policy.Grant, Policy.Deny.Scope). |
| reason | Reason | string | No | Human-readable explanation (≤512 chars). |
| attributes | Attributes | map<string,string> | No | Small K/V context (≤32 entries, value ≤256). No secrets/PII. |
| policyRef | PolicyRef | object | No | Points to policy & rule that produced the outcome. |
| engine | Engine | object | No | Evaluator identity (name/version/mode). |
| evaluatedAt | EvaluatedAt | timestamp | No | RFC3339 UTC Z. Defaults to observedAt when omitted. |

PolicyRef - id (string ≤128), version (string ≤32), ruleId (string ≤64), name (string ≤128, optional).

Engine - name (e.g., pdp, opa, authz-gw), version (e.g., 2.1.0), mode (e.g., enforce, audit).


JSON Schema (v1)

{
  "$id": "urn:connectsoft:schemas:partials/decision.v1.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Decision",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "outcome": {
      "type": "string",
      "enum": ["Unknown", "Allow", "Deny", "NotApplicable"]
    },
    "reasonCode": {
      "type": "string",
      "maxLength": 128,
      "pattern": "^[A-Za-z][A-Za-z0-9]*(\\.[A-Za-z][A-Za-z0-9_-]*)*$"
    },
    "reason": { "type": "string", "maxLength": 512 },
    "attributes": {
      "type": "object",
      "maxProperties": 32,
      "additionalProperties": { "type": "string", "maxLength": 256 }
    },
    "policyRef": {
      "type": "object",
      "additionalProperties": false,
      "properties": {
        "id":      { "type": "string", "maxLength": 128, "minLength": 1 },
        "version": { "type": "string", "maxLength": 32 },
        "ruleId":  { "type": "string", "maxLength": 64 },
        "name":    { "type": "string", "maxLength": 128 }
      },
      "required": ["id"]
    },
    "engine": {
      "type": "object",
      "additionalProperties": false,
      "properties": {
        "name":    { "type": "string", "maxLength": 64 },
        "version": { "type": "string", "maxLength": 32 },
        "mode":    { "type": "string", "maxLength": 32 }
      },
      "required": ["name"]
    },
    "evaluatedAt": { "type": "string", "format": "date-time" }
  },
  "required": ["outcome"]
}

C# (gRPC code-first)

[DataContract]
public sealed class Decision
{
    [DataMember(Order = 1)] public DecisionOutcome Outcome { get; init; } = DecisionOutcome.Unknown;
    [DataMember(Order = 2)] public string? ReasonCode { get; init; }   // e.g., "Policy.Grant", "Policy.Deny.Scope"
    [DataMember(Order = 3)] public string? Reason { get; init; }       // human readable
    [DataMember(Order = 4)] public IReadOnlyDictionary<string, string>? Attributes { get; init; }
    [DataMember(Order = 5)] public PolicyRef? PolicyRef { get; init; }
    [DataMember(Order = 6)] public DecisionEngine? Engine { get; init; }
    [DataMember(Order = 7)] public DateTimeOffset? EvaluatedAt { get; init; }
}

[DataContract]
public enum DecisionOutcome
{
    Unknown       = 0,
    Allow         = 1,
    Deny          = 2,
    NotApplicable = 3
}

[DataContract]
public sealed class PolicyRef
{
    [DataMember(Order = 1)] public string Id { get; init; } = default!;
    [DataMember(Order = 2)] public string? Version { get; init; }
    [DataMember(Order = 3)] public string? RuleId { get; init; }
    [DataMember(Order = 4)] public string? Name { get; init; }
}

[DataContract]
public sealed class DecisionEngine
{
    [DataMember(Order = 1)] public string Name { get; init; } = default!;   // "pdp", "opa", "authz-gw"
    [DataMember(Order = 2)] public string? Version { get; init; }           // "2.1.0"
    [DataMember(Order = 3)] public string? Mode { get; init; }              // "enforce" | "audit"
}

JSON serialization MUST use camelCase; DB columns remain PascalCase if denormalized (DecisionOutcome, DecisionReasonCode, …). Avoid persisting large attributes; keep ≤32 entries.


Protobuf (optional emission)

syntax = "proto3";
package connectsoft.audit.v1;

import "google/protobuf/timestamp.proto";

message Decision {
  DecisionOutcome Outcome = 1 [json_name = "outcome"];
  string ReasonCode = 2 [json_name = "reasonCode"];
  string Reason = 3 [json_name = "reason"];
  map<string, string> Attributes = 4 [json_name = "attributes"];
  PolicyRef PolicyRef = 5 [json_name = "policyRef"];
  DecisionEngine Engine = 6 [json_name = "engine"];
  google.protobuf.Timestamp EvaluatedAt = 7 [json_name = "evaluatedAt"];
}

enum DecisionOutcome {
  DecisionOutcome_Unknown = 0;
  DecisionOutcome_Allow = 1;
  DecisionOutcome_Deny = 2;
  DecisionOutcome_NotApplicable = 3;
}

message PolicyRef {
  string Id = 1 [json_name = "id"];
  string Version = 2 [json_name = "version"];
  string RuleId = 3 [json_name = "ruleId"];
  string Name = 4 [json_name = "name"];
}

message DecisionEngine {
  string Name = 1 [json_name = "name"];
  string Version = 2 [json_name = "version"];
  string Mode = 3 [json_name = "mode"];
}

Examples

Allow with explicit policy & scope

{
  "outcome": "Allow",
  "reasonCode": "Policy.Grant",
  "attributes": { "scope": "appointments:read", "tenantPolicy": "TIER_2" },
  "policyRef": { "id": "access-policy-main", "version": "2025-10-01", "ruleId": "R-ALLOW-APPT-READ" },
  "engine": { "name": "pdp", "version": "2.1.0", "mode": "enforce" },
  "evaluatedAt": "2025-10-22T14:05:14.022Z"
}

Deny due to missing scope

{
  "outcome": "Deny",
  "reasonCode": "Policy.Deny.Scope",
  "reason": "Caller lacks scope appointments:write",
  "attributes": { "requiredScope": "appointments:write" },
  "policyRef": { "id": "access-policy-main", "version": "2025-10-01", "ruleId": "R-DENY-MISSING-SCOPE" },
  "engine": { "name": "authz-gw", "version": "1.12.0", "mode": "enforce" }
}

Not applicable (non-access event)

{ "outcome": "NotApplicable", "reasonCode": "Policy.NA.EventType" }

Auditing guidance

  • Populate decision only when an access control or policy evaluation occurred for the audited action.
  • Prefer a stable reasonCode taxonomy for analytics (e.g., Policy.Grant, Policy.Deny.Scope, Policy.Deny.MFARequired, Policy.NA.EventType).
  • Keep attributes small and non-sensitive (no tokens, no secrets, no full PII).
  • If the decision depends on tenant edition/feature flags, include a neutral attribute (e.g., edition=Pro) but do not duplicate entire policy documents.
  • For on-behalf-of flows, store the acting principal in actor and use attributes to hint at delegation constraints if needed (e.g., delegation=limited).
  • Ensure evaluatedAt (UTC) is set by the component making the decision; if absent, readers may fall back to observedAt.
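
The field limits and the evaluatedAt fallback can be sketched as below. This is an illustrative helper with hypothetical names, not the platform's validator. Note that string.Length counts UTF-16 code units, while JSON Schema maxLength counts characters, so the two can differ for non-BMP text.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper mirroring the Decision limits (names are illustrative).
public static class DecisionRules
{
    // Same pattern as the schema, with a single backslash outside JSON escaping.
    private static readonly Regex ReasonCodePattern =
        new(@"^[A-Za-z][A-Za-z0-9]*(\.[A-Za-z][A-Za-z0-9_-]*)*\z", RegexOptions.CultureInvariant);

    public static bool IsValidReasonCode(string code) =>
        code.Length <= 128 && ReasonCodePattern.IsMatch(code);

    // At most 32 entries, each value at most 256 chars.
    public static bool AttributesWithinLimits(IReadOnlyDictionary<string, string> attributes) =>
        attributes.Count <= 32 && attributes.Values.All(v => v.Length <= 256);

    // Readers fall back to observedAt when the decision carries no evaluatedAt.
    public static DateTimeOffset EffectiveEvaluatedAt(DateTimeOffset? evaluatedAt, DateTimeOffset observedAt) =>
        (evaluatedAt ?? observedAt).ToUniversalTime();
}
```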

Data Classification & Redaction Rules

Defines the sensitivity taxonomy (DataClass) and the redaction rules applied to fields at write/read/export time. The goal is to minimize exposure of PII/Secrets while keeping audit facts useful and verifiable.


Overview

  • Classification first: Every sensitive field is labeled with a DataClass.
  • Rule-driven transforms: A RedactionRule with a kind (None|Hash|Mask|Drop|Tokenize) and optional params dictates how a field’s value is handled.
  • Where applied:
    • Write path (ingestion): classification is attached; only classes marked “never store raw” are transformed at write-time (e.g., Credential).
    • Read path (APIs/exports): rules are enforced based on the caller’s clearance/role/tenant policy and the effective RedactionPlan.
  • Determinism: Hashes are tenant-salted by default; use an unsalted hash (deterministic across tenants) only when cross-tenant joins are required.

DataClass (taxonomy)

| Enum value | Meaning | Examples |
| --- | --- | --- |
| Public | Non-sensitive; safe to expose | Action verbs, non-PII tags |
| Internal | Operational metadata; limited exposure | requestId, instanceId |
| Personal | PII light | Display name, city |
| Sensitive | PII / financial / strict protection | Email, phone, address, last4 PAN |
| Credential | Secrets/tokens/keys; never store raw | API keys, OAuth tokens, passwords |
| Phi | Health information; regulated | Diagnosis notes, vitals |

Classification is monotonic: components may upgrade (e.g., Personal → Sensitive) but must not downgrade.
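
A minimal sketch of the upgrade-only rule. It assumes the enum's numeric order doubles as the sensitivity rank; the document does not state how Credential and Phi compare, so treat that ordering as an assumption of this example. The helper name is hypothetical.

```csharp
// Mirrors the DataClass taxonomy; numeric order is ASSUMED to be the sensitivity rank.
public enum DataClass
{
    Public = 0,
    Internal = 1,
    Personal = 2,
    Sensitive = 3,
    Credential = 4,
    Phi = 5
}

// Hypothetical helper: merging a proposed label never lowers the current one.
public static class ClassificationMerge
{
    public static DataClass Upgrade(DataClass current, DataClass proposed) =>
        proposed > current ? proposed : current;
}
```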


Rule matrix (default posture)

| DataClass | Write-time (store) | Read: Privileged (auditor) | Read: Standard (tenant user) | Notes |
| --- | --- | --- | --- | --- |
| Public | None | None | None | |
| Internal | None | None | Mask (optional) | Mask hostname/pod if needed |
| Personal | None | Mask or Hash | Mask | Keep analytics utility |
| Sensitive | None | Hash | Mask/Drop | Prefer tenant-salted Hash |
| Credential | Hash or Drop (write) | Drop | Drop | Do not persist raw secrets |
| Phi | None | Mask/Tokenize | Mask/Drop | Tokenize when longitudinal joins required |

These are defaults. Tenants/editions may override via policy (see ClassificationPolicy).


JSON Schemas (partials, v1)

data-class.v1.json

{
  "$id": "urn:connectsoft:schemas:policy/data-class.v1.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "DataClass",
  "type": "string",
  "enum": ["Public", "Internal", "Personal", "Sensitive", "Credential", "Phi"]
}

redaction-rule.v1.json

{
  "$id": "urn:connectsoft:schemas:policy/redaction-rule.v1.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "RedactionRule",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "kind": {
      "type": "string",
      "enum": ["None", "Hash", "Mask", "Drop", "Tokenize"]
    },
    "params": {
      "type": "object",
      "additionalProperties": { "type": "string", "maxLength": 128 }
    }
  },
  "required": ["kind"]
}

classification-policy.v1.json

{
  "$id": "urn:connectsoft:schemas:policy/classification-policy.v1.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "ClassificationPolicy",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "id": { "type": "string", "maxLength": 128 },
    "version": { "type": "integer", "minimum": 1 },
    "effectiveFromUtc": { "type": "string", "format": "date-time" },
    "rulesByClass": {
      "type": "object",
      "additionalProperties": { "$ref": "urn:connectsoft:schemas:policy/redaction-rule.v1.json" }
    },
    "overridesByField": {
      "type": "object",
      "additionalProperties": { "$ref": "urn:connectsoft:schemas:policy/redaction-rule.v1.json" }
    },
    "defaultByField": {
      "type": "object",
      "additionalProperties": { "$ref": "urn:connectsoft:schemas:policy/data-class.v1.json" }
    }
  },
  "required": ["id", "version"]
}

defaultByField labels specific fields (e.g., actor.email = Sensitive). rulesByClass sets class-wide defaults. overridesByField takes precedence over class rules.


C# (gRPC code-first)

[DataContract]
public enum DataClass
{
    Public     = 0,
    Internal   = 1,
    Personal   = 2,
    Sensitive  = 3,
    Credential = 4,
    Phi        = 5
}

[DataContract]
public enum RedactionKind
{
    None     = 0,
    Hash     = 1,
    Mask     = 2,
    Drop     = 3,
    Tokenize = 4
}

[DataContract]
public sealed class RedactionRule
{
    [DataMember(Order = 1)] public RedactionKind Kind { get; init; } = RedactionKind.None;
    [DataMember(Order = 2)] public IReadOnlyDictionary<string,string>? Params { get; init; }
    // Params examples:
    // Hash: { "alg":"SHA256", "tenantSalt":"<optional>" }
    // Mask: { "showFirst":"2", "showLast":"4", "replacement":"*" }
    // Tokenize: { "provider":"FPE", "tokenSet":"email", "context":"<tenantId>" }
}

[DataContract]
public sealed class ClassificationPolicy
{
    [DataMember(Order = 1)] public string Id { get; init; } = default!;
    [DataMember(Order = 2)] public int Version { get; init; }
    [DataMember(Order = 3)] public DateTimeOffset? EffectiveFromUtc { get; init; }

    [DataMember(Order = 4)] public IReadOnlyDictionary<DataClass, RedactionRule>? RulesByClass { get; init; }
    [DataMember(Order = 5)] public IReadOnlyDictionary<string, RedactionRule>? OverridesByField { get; init; } // "actor.email", "request.ip"
    [DataMember(Order = 6)] public IReadOnlyDictionary<string, DataClass>? DefaultByField { get; init; }       // default classification tags
}

JSON serialization uses camelCase (overridesByField, rulesByClass). Storage columns remain PascalCase when denormalized (e.g., DataClass, RedactionPlan), otherwise embedded JSON.


Protobuf (optional emission)

syntax = "proto3";
package connectsoft.audit.v1;

import "google/protobuf/timestamp.proto";

message RedactionRule {
  RedactionKind Kind = 1 [json_name = "kind"];
  map<string,string> Params = 2 [json_name = "params"];
}

enum RedactionKind {
  RedactionKind_None     = 0;
  RedactionKind_Hash     = 1;
  RedactionKind_Mask     = 2;
  RedactionKind_Drop     = 3;
  RedactionKind_Tokenize = 4;
}

enum DataClass {
  DataClass_Public     = 0;
  DataClass_Internal   = 1;
  DataClass_Personal   = 2;
  DataClass_Sensitive  = 3;
  DataClass_Credential = 4;
  DataClass_Phi        = 5;
}

message ClassificationPolicy {
  string Id = 1 [json_name = "id"];
  int32 Version = 2 [json_name = "version"];
  google.protobuf.Timestamp EffectiveFromUtc = 3 [json_name = "effectiveFromUtc"];
  map<int32, RedactionRule> RulesByClass = 4 [json_name = "rulesByClass"]; // key = DataClass numeric
  map<string, RedactionRule> OverridesByField = 5 [json_name = "overridesByField"];
  map<string, int32> DefaultByField = 6 [json_name = "defaultByField"];    // value = DataClass numeric
}

Evaluation & precedence

  1. Determine field class: overridesByField → defaultByField → inferred (component hints) → fallback Internal.
  2. Select rule: overridesByField (if present) → rulesByClass[class] → default posture table.
  3. Apply location:
    • Write path: apply only for Credential (hash/drop) and explicit write-time overrides. Store classification tag alongside value.
    • Read path: apply selected rule based on caller clearance and edition policy (e.g., auditors may see Hash vs users see Mask).
    • Export: use export-specific plan (often equal to read), plus downstream data sharing agreements.
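
Steps 1 and 2 can be sketched as a small resolver. This is an illustration under assumptions: the types are trimmed stand-ins for the contracts above, the resolver name is hypothetical, and RedactionKind.None stands in for the default posture table (which also depends on caller clearance). Because overridesByField carries rules rather than classes, the sketch consults it only when selecting the rule.

```csharp
#nullable enable
using System.Collections.Generic;

public enum DataClass { Public, Internal, Personal, Sensitive, Credential, Phi }
public enum RedactionKind { None, Hash, Mask, Drop, Tokenize }

// Trimmed stand-in for the RedactionRule contract.
public sealed record RedactionRule(RedactionKind Kind, IReadOnlyDictionary<string, string>? Params = null);

// Hypothetical resolver for the precedence steps.
public sealed class PolicyResolver
{
    private readonly IReadOnlyDictionary<DataClass, RedactionRule> _rulesByClass;
    private readonly IReadOnlyDictionary<string, RedactionRule> _overridesByField;
    private readonly IReadOnlyDictionary<string, DataClass> _defaultByField;

    public PolicyResolver(
        IReadOnlyDictionary<DataClass, RedactionRule> rulesByClass,
        IReadOnlyDictionary<string, RedactionRule> overridesByField,
        IReadOnlyDictionary<string, DataClass> defaultByField)
    {
        _rulesByClass = rulesByClass;
        _overridesByField = overridesByField;
        _defaultByField = defaultByField;
    }

    // Step 1: policy label, then component hint, then the Internal fallback.
    public DataClass ResolveClass(string field, DataClass? inferred = null) =>
        _defaultByField.TryGetValue(field, out var labeled) ? labeled : inferred ?? DataClass.Internal;

    // Step 2: field override, then class rule, then a placeholder for the default posture.
    public RedactionRule ResolveRule(string field, DataClass? inferred = null)
    {
        if (_overridesByField.TryGetValue(field, out var fieldRule)) return fieldRule;
        var dataClass = ResolveClass(field, inferred);
        return _rulesByClass.TryGetValue(dataClass, out var classRule)
            ? classRule
            : new RedactionRule(RedactionKind.None); // stand-in for the default posture table
    }
}
```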

Redaction behavior

  • Hash: Deterministic SHA256 (hex). Prefer tenant-salted (HMAC-SHA256) unless global joins are required.
  • Mask: Keep edges via showFirst / showLast; replace middle with replacement (default *).
  • Drop: Remove the field entirely from the payload.
  • Tokenize: Replace with a reversible token through a tokenization provider (e.g., FPE); store provider metadata in params.
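
The Hash, Mask, and Drop behaviors can be sketched as below; Tokenize is omitted because it depends on an external provider. The class and method names are hypothetical. One assumption goes beyond the text: when showFirst + showLast would reveal the whole value, the sketch masks every character instead.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

// Hypothetical redaction helpers (illustrative names).
public static class Redactors
{
    // Tenant-salted hash: HMAC-SHA256 keyed by the tenant salt, lowercase hex.
    public static string HashTenantSalted(string value, byte[] tenantSalt)
    {
        using var hmac = new HMACSHA256(tenantSalt);
        return Convert.ToHexString(hmac.ComputeHash(Encoding.UTF8.GetBytes(value))).ToLowerInvariant();
    }

    // Unsalted SHA-256 for cases that need joins across tenants.
    public static string HashGlobal(string value) =>
        Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(value))).ToLowerInvariant();

    // Keeps the first/last characters and replaces the middle.
    public static string Mask(string value, int showFirst = 0, int showLast = 0, char replacement = '*')
    {
        if (showFirst < 0 || showLast < 0) throw new ArgumentOutOfRangeException(nameof(showFirst));

        // Assumption: fail closed when the visible edges would cover the whole value.
        if (showFirst + showLast >= value.Length) return new string(replacement, value.Length);

        var hidden = value.Length - showFirst - showLast;
        return value[..showFirst] + new string(replacement, hidden) + value[^showLast..];
    }

    // Drop removes the field from the payload entirely.
    public static bool Drop(IDictionary<string, object?> payload, string field) => payload.Remove(field);
}
```

For example, Mask("Alex", showFirst: 1) keeps only the first character, and HashTenantSalted yields different digests for the same value under different tenant salts.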

Example policy & usage

Policy (JSON)

{
  "id": "policy-default",
  "version": 3,
  "rulesByClass": {
    "Personal": { "kind": "Mask", "params": { "showFirst": "1", "showLast": "3" } },
    "Sensitive": { "kind": "Hash", "params": { "alg": "HMAC-SHA256" } },
    "Credential": { "kind": "Drop" },
    "Phi": { "kind": "Tokenize", "params": { "provider": "FPE", "tokenSet": "phi" } }
  },
  "overridesByField": {
    "actor.email": { "kind": "Hash", "params": { "alg": "HMAC-SHA256" } },
    "request.ip":  { "kind": "Mask", "params": { "showLast": "4" } }
  },
  "defaultByField": {
    "actor.display": "Personal",
    "actor.email": "Sensitive",
    "request.ip": "Sensitive"
  }
}

Original stored record (excerpt)

{
  "actor": { "id": "user_123", "display": "Alex", "email": "alex@example.com" },
  "request": { "ip": "203.0.113.27", "userAgent": "Mozilla/5.0" }
}

Redacted for standard tenant user (read path)

{
  "actor": { "id": "user_123", "display": "A***", "email": "df12c0...a9f" },
  "request": { "ip": "***.***.**3.27", "userAgent": "Mozilla/5.0" }
}

Redacted for auditor (higher clearance)

{
  "actor": { "id": "user_123", "display": "Alex", "email": "df12c0...a9f" },
  "request": { "ip": "203.0.113.27", "userAgent": "Mozilla/5.0" }
}

Implementation notes

  • Tagging: Persist classification per field in projections or alongside values (e.g., a parallel metadata map) to support dynamic plans.
  • Search: Index only post-redaction values where applicable (e.g., email hashes), and avoid indexing raw sensitive fields.
  • Logging: Apply the same plan to logs; ensure log redactors mirror these rules.
  • Testing: Include golden fixtures verifying the same input produces expected redacted outputs for each clearance profile.
  • Backfills: When policy versions change, re-project read models; never retroactively de-hash or re-expose dropped secrets.

Deltas (before/after)

Captures safe field-level changes for an audited action. Each entry expresses what changed on a field or sub-field path, optionally with a redaction hint to guide read/export transformations without exposing raw sensitive data.


Overview

  • Minimal, explicit changes only: include only fields that changed.
  • Typed values: before/after may be JSON scalars or small structured fragments.
  • Paths: Keys may be simple field names (status) or JSON Pointer–style paths (/lines/0/price).
  • PII-aware: When raw values must not be stored, carry hashes and a redactionHint instead of raw before/after.
  • Bounded: Payload sizes and counts are capped; large objects should use hashes and set truncated=true.

Fields

| JSON (lowerCamel) | C# (PascalCase) | Type | Req. | Description & rules |
| --- | --- | --- | --- | --- |
| fields | Fields | map<string, DeltaField> | Yes | Map of field/path → change record. Keys: ^[A-Za-z][A-Za-z0-9._-]{0,63}$ or JSON Pointer (/…) |

DeltaField

| JSON | C# | Type | Req. | Description |
| --- | --- | --- | --- | --- |
| before | Before | any | No | Previous value (scalar/object/array/null). Omit if hashed/dropped. |
| after | After | any | No | New value (scalar/object/array/null). Omit if hashed/dropped. |
| beforeHash | BeforeHash | string | No | Hex SHA-256 (or HMAC) of previous value when raw not stored. |
| afterHash | AfterHash | string | No | Hex SHA-256 (or HMAC) of new value when raw not stored. |
| algorithm | Algorithm | string | No | Hash algo (SHA256, HMAC-SHA256, …). |
| truncated | Truncated | bool | No | true if value(s) truncated to fit caps. |
| redactionHint | RedactionHint | object | No | Hint for read/export (class/kind). See below. |

RedactionHint

| Field | Type | Description |
| --- | --- | --- |
| class | DataClass | Classification of the field (e.g., Sensitive, Credential). |
| applied | RedactionKind | Transformation already applied to stored values (Hash, Drop, Mask, Tokenize, None). |
| note | string | Optional operator note (≤256 chars). |

When applied is Hash/Drop, prefer beforeHash/afterHash instead of raw values. When Mask is applied at read-time only, before/after may still be stored raw depending on policy (see Classification).


JSON Schema (v1)

{
  "$id": "urn:connectsoft:schemas:partials/delta.v1.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Delta",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "fields": {
      "type": "object",
      "maxProperties": 64,
      "patternProperties": {
        "^(?:[A-Za-z][A-Za-z0-9._-]{0,63}|(/([^/~]|~[01])*)+)$": {
          "type": "object",
          "additionalProperties": false,
          "properties": {
            "before": { "type": ["string","number","integer","boolean","object","array","null"] },
            "after":  { "type": ["string","number","integer","boolean","object","array","null"] },
            "beforeHash": { "type": "string", "pattern": "^[a-f0-9]{64}$" },
            "afterHash":  { "type": "string", "pattern": "^[a-f0-9]{64}$" },
            "algorithm":   { "type": "string", "maxLength": 32 },
            "truncated":   { "type": "boolean" },
            "redactionHint": {
              "type": "object",
              "additionalProperties": false,
              "properties": {
                "class":   { "$ref": "urn:connectsoft:schemas:policy/data-class.v1.json" },
                "applied": { "type": "string", "enum": ["None","Hash","Mask","Drop","Tokenize"] },
                "note":    { "type": "string", "maxLength": 256 }
              }
            }
          }
        }
      },
      "additionalProperties": false
    }
  },
  "required": ["fields"]
}

String values SHOULD be ≤ 2048 chars; arrays ≤ 50 elements; objects ≤ 32 properties (excess SHOULD be replaced with a hashed representation and truncated=true). Concrete limits are finalized in Validation & Limits.


C# (gRPC code-first)

using System.Collections.Generic;
using System.Runtime.Serialization;
using System.Text.Json;

[DataContract]
public sealed class Delta
{
    [DataMember(Order = 1)]
    public IReadOnlyDictionary<string, DeltaField> Fields { get; init; } =
        new Dictionary<string, DeltaField>();
}

[DataContract]
public sealed class DeltaField
{
    // Arbitrary JSON values; prefer JsonElement to preserve types without rehydration.
    [DataMember(Order = 1)] public JsonElement? Before { get; init; }
    [DataMember(Order = 2)] public JsonElement? After  { get; init; }

    // If redaction applied at write-time
    [DataMember(Order = 3)] public string? BeforeHash { get; init; }  // hex
    [DataMember(Order = 4)] public string? AfterHash  { get; init; }  // hex
    [DataMember(Order = 5)] public string? Algorithm  { get; init; }  // "SHA256", "HMAC-SHA256"

    [DataMember(Order = 6)] public bool? Truncated { get; init; }
    [DataMember(Order = 7)] public RedactionHint? RedactionHint { get; init; }
}

[DataContract]
public sealed class RedactionHint
{
    [DataMember(Order = 1)] public DataClass? Class { get; init; }
    [DataMember(Order = 2)] public RedactionKind? Applied { get; init; }
    [DataMember(Order = 3)] public string? Note { get; init; }
}

JSON serialization MUST use camelCase. Database projections may store Delta as JSON (NVARCHAR(MAX)/JSONB) with separate computed columns for common keys if needed.


Protobuf (optional emission)

syntax = "proto3";
package connectsoft.audit.v1;

import "google/protobuf/struct.proto";

message Delta {
  map<string, DeltaField> Fields = 1 [json_name = "fields"];
}

message DeltaField {
  google.protobuf.Value Before = 1 [json_name = "before"]; // optional
  google.protobuf.Value After  = 2 [json_name = "after"];  // optional
  string BeforeHash = 3 [json_name = "beforeHash"];
  string AfterHash  = 4 [json_name = "afterHash"];
  string Algorithm  = 5 [json_name = "algorithm"];
  bool Truncated    = 6 [json_name = "truncated"];
  RedactionHint RedactionHint = 7 [json_name = "redactionHint"];
}

message RedactionHint {
  DataClass Class     = 1 [json_name = "class"];
  RedactionKind Applied = 2 [json_name = "applied"];
  string Note         = 3 [json_name = "note"];
}

Examples

1) Simple scalar change

{
  "fields": {
    "status": { "before": "Scheduled", "after": "Booked" }
  }
}

2) JSON Pointer path to sub-field

{
  "fields": {
    "/lines/0/price": { "before": 99.0, "after": 109.0 }
  }
}

3) Sensitive field (hash-only at write-time)

{
  "fields": {
    "actor.email": {
      "beforeHash": "df12c0a5f7...a9f0c4b1e3d2c1b0a9f8e7d6c5b4a3f20123456789abcdef0",
      "afterHash":  "3f0c4b1e3d...df12c0a5f7e9a8b7c6d5e4f30123456789abcdef0a9f8e7d6",
      "algorithm": "HMAC-SHA256",
      "redactionHint": { "class": "Sensitive", "applied": "Hash" }
    }
  }
}

4) Large object truncated with hash

{
  "fields": {
    "profile": {
      "beforeHash": "7b2a...c9e",
      "after": { "display": "Alex", "city": "Denver" },
      "algorithm": "SHA256",
      "truncated": true,
      "redactionHint": { "class": "Personal", "applied": "Hash", "note": "Oversize object summarized" }
    }
  }
}

Budgets & caps

  • Max changed fields per record: 64.
  • Max key length: 64 (simple) or JSON Pointer ≤ 256 chars.
  • String value length: ≤ 2048 chars per value; larger values SHOULD be hashed with truncated=true.
  • Array length: ≤ 50 elements (store diff of impacted indices when possible).
  • Object property count: ≤ 32 properties (beyond that, prefer hash summary).
  • Hash algorithm: SHA256 or HMAC-SHA256 (tenant salt preferred for Sensitive/Personal).
  • Computation: Delta computation happens at write-time; no deferred “re-diff” during reads.
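
A minimal sketch of the string cap, assuming a hypothetical capture helper that runs at write-time: values within the budget are kept raw, and oversize values are replaced by an unsalted SHA-256 digest with the truncated flag set. A tenant-salted HMAC could be substituted for Sensitive/Personal fields.

```csharp
#nullable enable
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical shape for one side (before or after) of a captured change.
public sealed record DeltaSide(string? Value, string? Hash, bool Truncated);

public static class DeltaCaps
{
    public const int MaxStringLength = 2048;

    // Keeps the raw string within the cap; otherwise stores only its SHA-256 (lowercase hex).
    public static DeltaSide Capture(string? value)
    {
        if (value is null || value.Length <= MaxStringLength)
            return new DeltaSide(value, null, false);

        var hash = Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(value))).ToLowerInvariant();
        return new DeltaSide(null, hash, true);
    }
}
```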

Validation rules (summary)

  • Keys match either simple field pattern ^[A-Za-z][A-Za-z0-9._-]{0,63}$ or JSON Pointer (^(/([^/~]|~[01])*)+$).
  • If before/after omitted, at least one of beforeHash/afterHash MUST be present.
  • When algorithm is present, any hash field supplied MUST be 64 hex chars.
  • redactionHint.class/applied values align with the platform enums (DataClass, RedactionKind).
  • truncated=true MUST be set when caps are exceeded and a hash summary replaces raw content.
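
The key and hash rules can be sketched as below. The input record and validator names are hypothetical, and the sketch reads "before/after omitted" as "both omitted", which is an interpretation rather than something the text states. The JSON Pointer length cap of 256 comes from the budgets listed earlier.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical, trimmed view of a DeltaField for validation purposes.
public sealed record DeltaFieldInput(
    bool HasBefore, bool HasAfter, string? BeforeHash, string? AfterHash, string? Algorithm);

public static class DeltaValidator
{
    private static readonly Regex SimpleKey = new(@"^[A-Za-z][A-Za-z0-9._-]{0,63}\z");
    private static readonly Regex PointerKey = new(@"^(/([^/~]|~[01])*)+\z");
    private static readonly Regex Hex64 = new(@"^[a-f0-9]{64}\z");

    public static bool IsValidKey(string key) =>
        SimpleKey.IsMatch(key) || (key.Length <= 256 && PointerKey.IsMatch(key));

    public static IReadOnlyList<string> Validate(string key, DeltaFieldInput field)
    {
        var errors = new List<string>();
        if (!IsValidKey(key)) errors.Add("key");

        // Interpretation: with neither raw value stored, at least one hash must be present.
        var hasHash = field.BeforeHash is not null || field.AfterHash is not null;
        if (!field.HasBefore && !field.HasAfter && !hasHash) errors.Add("missing value or hash");

        // When an algorithm is declared, every supplied hash must be 64 hex chars.
        if (field.Algorithm is not null)
        {
            foreach (var hash in new[] { field.BeforeHash, field.AfterHash })
                if (hash is not null && !Hex64.IsMatch(hash)) errors.Add("hash format");
        }

        return errors;
    }
}
```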

Integrity Structures

Defines the objects that provide tamper-evidence for appended audit facts: a per-record IntegrityRef, segment/block containers, a chained Merkle root, and the proof material required to verify end-to-end integrity.


Overview

  • Canonical hashing: Each record is hashed in a canonical JSON form (UTF-8, RFC 8785/JCS style). The integrity field itself is excluded from the hash input.
  • Segments → Blocks → Chain: Records are batched into Segments (rolling windows by count/time). Each segment's leaves form a Merkle tree with a rootHash. Multiple segments roll into an IntegrityBlock that carries the blockRoot and a signature and points to the previous block via prevBlockRoot (hash chain).
  • Proofs: Each record stores a compact Merkle proof (leafHash, merklePath) sufficient to recompute the segment root, which is then validated against the block.
  • Algorithms: Default hash SHA256 (hex); signatures via detached CMS/PKCS#7 or Ed25519. Parameters are part of the manifest for forward compatibility.
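As a rough illustration of canonical hashing, the sketch below hashes a record with its integrity node removed. It is a simplification under stated assumptions: `leaf_hash` is a hypothetical helper, and `json.dumps` with sorted keys only approximates RFC 8785 (JCS) for simple payloads; JCS number formatting and its key ordering by UTF-16 code units are not reproduced here.

```python
import hashlib
import json


def leaf_hash(record: dict) -> str:
    """Hash a record in a canonical JSON form, excluding the integrity node."""
    # The integrity field is not part of the hash input.
    body = {k: v for k, v in record.items() if k != "integrity"}
    # Assumption: sorted keys + compact separators stand in for full JCS.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Because key order and the integrity node do not affect the result, two serializations of the same fact produce the same leaf hash, which is the property the proof structures rely on.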

IntegrityRef (per record)

Minimal proof pointer & path placed on each AuditRecord.

| JSON (lowerCamel) | C# (PascalCase) | Type | Req. | Description |
|---|---|---|---|---|
| blockId | BlockId | string (ULID) | Yes | The enclosing IntegrityBlock identifier. |
| segmentId | SegmentId | string (ULID) | Yes | The segment identifier inside the block. |
| leafIndex | LeafIndex | integer | Yes | Zero-based index of the record leaf within the segment. |
| leafHash | LeafHash | string (hex64) | Yes | Hash of the canonical record bytes. |
| algo | Algo | string | No | Hash algorithm, default SHA256. |
| merklePath | MerklePath | array<PathNode> | Yes | Sibling hashes to climb from leaf to the segment root. |

PathNode

  • pos: "L" or "R" (sibling position relative to the running hash)
  • hash: hex64 sibling hash

Segment & Block

IntegritySegment

| Field | Type | Description |
|---|---|---|
| segmentId | ULID | Unique segment id. |
| blockId | ULID | Owning block. |
| algo | string | Hash algorithm (SHA256). |
| leafCount | int | Number of leaves in the tree. |
| rootHash | hex64 | Merkle root for this segment. |
| startedAt / closedAt | timestamp | Segment window bounds (UTC). |

IntegrityBlock

| Field | Type | Description |
|---|---|---|
| blockId | ULID | Unique block id. |
| tenantId | string | Tenant scope for the block. |
| algo | string | Hash algorithm for all segments in this block. |
| segmentCount | int | Count of segments sealed into the block. |
| blockRoot | hex64 | Hash over (ordered) segment roots (e.g., Merkle of segment roots). |
| prevBlockRoot | hex64 | Previous block’s blockRoot (forms a chain). |
| signature | object | Detached signature over blockRoot + header. |
| signingKeyId | string | Key identifier (KMS / Key Vault / JWKS kid). |
| startedAt / sealedAt | timestamp | Block time bounds (UTC). |
| region / environment | string | Operational labels (optional). |

Ordering: Segment roots are included in ascending segmentId (or time) to compute blockRoot. The block header (blockId, tenantId, algo, segment root list digest, prevBlockRoot, timestamps) is the signed content.
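One way to picture the ordering rule is the sketch below, which derives a blockRoot as a Merkle root over segment roots sorted by segmentId. Several details are assumptions rather than ATP's specified layout: parent nodes are taken as SHA256(left || right) over raw bytes, an unpaired node is carried up unchanged, and the helper names `merkle_root` and `block_root` are hypothetical.

```python
import hashlib


def merkle_root(hex_leaves: list[str]) -> str:
    """Fold hex-encoded leaves pairwise into a single hex root."""
    if not hex_leaves:
        raise ValueError("at least one leaf is required")
    level = [bytes.fromhex(h) for h in hex_leaves]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            # Assumption: parent = SHA256(left || right) on raw bytes.
            nxt.append(hashlib.sha256(level[i] + level[i + 1]).digest())
        if len(level) % 2 == 1:
            # Assumption: an unpaired node is promoted unchanged.
            nxt.append(level[-1])
        level = nxt
    return level[0].hex()


def block_root(segments: list[dict]) -> str:
    """Compute a block root from segment manifests, ordered by segmentId."""
    ordered = sorted(segments, key=lambda s: s["segmentId"])
    return merkle_root([s["rootHash"] for s in ordered])
```

Sorting inside `block_root` makes the result independent of the order in which segment manifests were fetched, so writers and verifiers agree as long as they share the same ordering rule.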


JSON Schemas (partials, v1)

integrity-ref.v1.json

{
  "$id": "urn:connectsoft:schemas:integrity/integrity-ref.v1.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "IntegrityRef",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "blockId":   { "type": "string", "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$" },
    "segmentId": { "type": "string", "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$" },
    "leafIndex": { "type": "integer", "minimum": 0 },
    "leafHash":  { "type": "string", "pattern": "^[a-f0-9]{64}$" },
    "algo":      { "type": "string", "enum": ["SHA256"] },
    "merklePath": {
      "type": "array",
      "maxItems": 64,
      "items": {
        "type": "object",
        "additionalProperties": false,
        "properties": {
          "pos":  { "type": "string", "enum": ["L","R"] },
          "hash": { "type": "string", "pattern": "^[a-f0-9]{64}$" }
        },
        "required": ["pos","hash"]
      }
    }
  },
  "required": ["blockId","segmentId","leafIndex","leafHash","merklePath"]
}

integrity-segment.v1.json

{
  "$id": "urn:connectsoft:schemas:integrity/segment.v1.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "IntegritySegment",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "segmentId": { "type": "string", "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$" },
    "blockId":   { "type": "string", "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$" },
    "algo":      { "type": "string", "enum": ["SHA256"] },
    "leafCount": { "type": "integer", "minimum": 1 },
    "rootHash":  { "type": "string", "pattern": "^[a-f0-9]{64}$" },
    "startedAt": { "type": "string", "format": "date-time" },
    "closedAt":  { "type": "string", "format": "date-time" }
  },
  "required": ["segmentId","blockId","algo","leafCount","rootHash","startedAt","closedAt"]
}

integrity-block.v1.json

{
  "$id": "urn:connectsoft:schemas:integrity/block.v1.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "IntegrityBlock",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "blockId":      { "type": "string", "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$" },
    "tenantId":     { "type": "string", "pattern": "^[A-Za-z0-9._-]{1,128}$" },
    "algo":         { "type": "string", "enum": ["SHA256"] },
    "segmentCount": { "type": "integer", "minimum": 1 },
    "blockRoot":    { "type": "string", "pattern": "^[a-f0-9]{64}$" },
    "prevBlockRoot":{ "type": "string", "pattern": "^[a-f0-9]{64}$" },
    "signature": {
      "type": "object",
      "additionalProperties": false,
      "properties": {
        "scheme": { "type": "string", "enum": ["Ed25519","PKCS7"] },
        "value":  { "type": "string", "contentEncoding": "base64" }
      },
      "required": ["scheme","value"]
    },
    "signingKeyId": { "type": "string", "maxLength": 128 },
    "startedAt":    { "type": "string", "format": "date-time" },
    "sealedAt":     { "type": "string", "format": "date-time" },
    "region":       { "type": "string", "maxLength": 32 },
    "environment":  { "type": "string", "maxLength": 32 }
  },
  "required": ["blockId","tenantId","algo","segmentCount","blockRoot","prevBlockRoot","signature","signingKeyId","startedAt","sealedAt"]
}

C# (gRPC code-first)

using System;
using System.Collections.Generic;
using System.Runtime.Serialization;

[DataContract]
public sealed class IntegrityRef
{
    [DataMember(Order = 1)] public string BlockId { get; init; } = default!;   // ULID
    [DataMember(Order = 2)] public string SegmentId { get; init; } = default!; // ULID
    [DataMember(Order = 3)] public int LeafIndex { get; init; }
    [DataMember(Order = 4)] public string LeafHash { get; init; } = default!;  // hex SHA256
    [DataMember(Order = 5)] public string? Algo { get; init; } = "SHA256";
    [DataMember(Order = 6)] public IReadOnlyList<MerklePathNode> MerklePath { get; init; } = Array.Empty<MerklePathNode>();
}

[DataContract]
public sealed class MerklePathNode
{
    [DataMember(Order = 1)] public string Pos { get; init; } = default!;   // "L" | "R"
    [DataMember(Order = 2)] public string Hash { get; init; } = default!;  // hex
}

[DataContract]
public sealed class IntegritySegment
{
    [DataMember(Order = 1)] public string SegmentId { get; init; } = default!;
    [DataMember(Order = 2)] public string BlockId { get; init; } = default!;
    [DataMember(Order = 3)] public string Algo { get; init; } = "SHA256";
    [DataMember(Order = 4)] public int LeafCount { get; init; }
    [DataMember(Order = 5)] public string RootHash { get; init; } = default!;
    [DataMember(Order = 6)] public DateTimeOffset StartedAt { get; init; }
    [DataMember(Order = 7)] public DateTimeOffset ClosedAt { get; init; }
}

[DataContract]
public sealed class IntegrityBlock
{
    [DataMember(Order = 1)] public string BlockId { get; init; } = default!;
    [DataMember(Order = 2)] public string TenantId { get; init; } = default!;
    [DataMember(Order = 3)] public string Algo { get; init; } = "SHA256";
    [DataMember(Order = 4)] public int SegmentCount { get; init; }
    [DataMember(Order = 5)] public string BlockRoot { get; init; } = default!;
    [DataMember(Order = 6)] public string PrevBlockRoot { get; init; } = default!;
    [DataMember(Order = 7)] public Signature Signature { get; init; } = new();
    [DataMember(Order = 8)] public string SigningKeyId { get; init; } = default!;
    [DataMember(Order = 9)] public DateTimeOffset StartedAt { get; init; }
    [DataMember(Order = 10)] public DateTimeOffset SealedAt { get; init; }
    [DataMember(Order = 11)] public string? Region { get; init; }
    [DataMember(Order = 12)] public string? Environment { get; init; }
}

[DataContract]
public sealed class Signature
{
    [DataMember(Order = 1)] public string Scheme { get; init; } = "Ed25519"; // or "PKCS7"
    [DataMember(Order = 2)] public string Value { get; init; } = default!;   // base64
}

JSON serialization uses camelCase; database columns stay PascalCase (BlockId, BlockRoot, …).


Protobuf (optional emission)

syntax = "proto3";
package connectsoft.audit.v1;

import "google/protobuf/timestamp.proto";

message IntegrityRef {
  string BlockId = 1 [json_name = "blockId"];
  string SegmentId = 2 [json_name = "segmentId"];
  int32 LeafIndex = 3 [json_name = "leafIndex"];
  string LeafHash = 4 [json_name = "leafHash"];
  string Algo = 5 [json_name = "algo"];
  repeated MerklePathNode MerklePath = 6 [json_name = "merklePath"];
}

message MerklePathNode {
  string Pos = 1 [json_name = "pos"];    // "L" | "R"
  string Hash = 2 [json_name = "hash"];  // hex
}

message IntegritySegment {
  string SegmentId = 1 [json_name = "segmentId"];
  string BlockId = 2 [json_name = "blockId"];
  string Algo = 3 [json_name = "algo"];
  int32 LeafCount = 4 [json_name = "leafCount"];
  string RootHash = 5 [json_name = "rootHash"];
  google.protobuf.Timestamp StartedAt = 6 [json_name = "startedAt"];
  google.protobuf.Timestamp ClosedAt = 7 [json_name = "closedAt"];
}

message IntegrityBlock {
  string BlockId = 1 [json_name = "blockId"];
  string TenantId = 2 [json_name = "tenantId"];
  string Algo = 3 [json_name = "algo"];
  int32 SegmentCount = 4 [json_name = "segmentCount"];
  string BlockRoot = 5 [json_name = "blockRoot"];
  string PrevBlockRoot = 6 [json_name = "prevBlockRoot"];
  Signature Signature = 7 [json_name = "signature"];
  string SigningKeyId = 8 [json_name = "signingKeyId"];
  google.protobuf.Timestamp StartedAt = 9 [json_name = "startedAt"];
  google.protobuf.Timestamp SealedAt = 10 [json_name = "sealedAt"];
  string Region = 11 [json_name = "region"];
  string Environment = 12 [json_name = "environment"];
}

message Signature {
  string Scheme = 1 [json_name = "scheme"]; // "Ed25519" | "PKCS7"
  string Value = 2 [json_name = "value"];   // base64
}

Examples

Per-record reference (embedded in AuditRecord)

{
  "integrity": {
    "blockId": "01JE3G8T1E4J9BW6VQ4G6S1Q2C",
    "segmentId": "01JE3G8T7P0A7P9F3Q1H6X9V2Z",
    "leafIndex": 17,
    "leafHash": "3a8f0e9a58f3b3d6e1c0a9f7b6c5d4e3a2b1c0d9e8f7a6b5c4d3e2f1a0b9c8d7",
    "algo": "SHA256",
    "merklePath": [
      { "pos": "L", "hash": "8f2a..." },
      { "pos": "R", "hash": "5c91..." }
    ]
  }
}

Segment manifest

{
  "segmentId": "01JE3G8T7P0A7P9F3Q1H6X9V2Z",
  "blockId": "01JE3G8T1E4J9BW6VQ4G6S1Q2C",
  "algo": "SHA256",
  "leafCount": 512,
  "rootHash": "0f6d4a...e91c",
  "startedAt": "2025-10-22T14:00:00Z",
  "closedAt": "2025-10-22T14:05:00Z"
}

Block header (sealed)

{
  "blockId": "01JE3G8T1E4J9BW6VQ4G6S1Q2C",
  "tenantId": "splootvets",
  "algo": "SHA256",
  "segmentCount": 8,
  "blockRoot": "6a4b3c...0d2e",
  "prevBlockRoot": "5b3a2c...9c1f",
  "signature": { "scheme": "Ed25519", "value": "MEUCIQDv..." },
  "signingKeyId": "kv:prod/atp-integrity/ed25519-2025-01",
  "startedAt": "2025-10-22T14:00:00Z",
  "sealedAt": "2025-10-22T14:10:00Z",
  "region": "us-central",
  "environment": "prod"
}

Verification flow (reader/exporter)

  1. Canonicalize the target record JSON (excluding its integrity node) → bytes (UTF-8).
  2. Compute leafHash = SHA256(bytes); compare to integrity.leafHash.
  3. Climb the Merkle path: iteratively hash with sibling nodes using the recorded pos to reach segment.rootHash.
  4. Fetch the segment manifest; compare computed root to IntegritySegment.rootHash.
  5. Fetch the block; recompute blockRoot from ordered segment roots; compare to IntegrityBlock.blockRoot.
  6. Verify signature using signingKeyId (from Key Vault/JWKS).
  7. Optionally verify the chain: block.prevBlockRoot equals the prior block’s blockRoot.
  8. Verification passes only if all steps succeed.
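Steps 3, 4 and 7 can be sketched as follows. The node-combination rule is an assumption (parent = SHA256(left || right) over raw bytes, with pos giving the side the sibling sits on); the document does not fix it. The function names are hypothetical, and signature verification (step 6) is deliberately left out.

```python
import hashlib


def climb(leaf_hash_hex: str, merkle_path: list[dict]) -> str:
    """Recompute a segment root from a leaf hash and its sibling path."""
    running = bytes.fromhex(leaf_hash_hex)
    for node in merkle_path:
        sibling = bytes.fromhex(node["hash"])
        if node["pos"] == "L":
            # Sibling is on the left of the running hash.
            running = hashlib.sha256(sibling + running).digest()
        elif node["pos"] == "R":
            # Sibling is on the right of the running hash.
            running = hashlib.sha256(running + sibling).digest()
        else:
            raise ValueError(f"unknown pos: {node['pos']!r}")
    return running.hex()


def verify_against_segment(integrity: dict, segment: dict) -> bool:
    """Steps 3-4: the climbed root must equal the segment manifest's rootHash."""
    return climb(integrity["leafHash"], integrity["merklePath"]) == segment["rootHash"]


def chain_is_linked(previous_block: dict, block: dict) -> bool:
    """Step 7: a block must point at the prior block's blockRoot."""
    return block["prevBlockRoot"] == previous_block["blockRoot"]
```

A verifier would call `verify_against_segment` only after recomputing leafHash from the canonical record bytes (steps 1-2); a proof that climbs to the right root says nothing if the leaf itself was not rechecked.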

Storage mapping (authoritative)

CREATE TABLE dbo.IntegrityBlocks (
  BlockId      CHAR(26)     NOT NULL,
  TenantId     NVARCHAR(128) NOT NULL, -- tenantId schema allows up to 128 chars
  Algo         NVARCHAR(16) NOT NULL,  -- "SHA256"
  SegmentCount INT          NOT NULL,
  BlockRoot    CHAR(64)     NOT NULL,  -- hex
  PrevBlockRoot CHAR(64)    NOT NULL,
  SignatureScheme NVARCHAR(16) NOT NULL,
  SignatureValue  VARBINARY(MAX) NOT NULL,
  SigningKeyId NVARCHAR(128) NOT NULL,
  StartedAt    DATETIME2(3) NOT NULL,
  SealedAt     DATETIME2(3) NOT NULL,
  Region       NVARCHAR(32) NULL,
  Environment  NVARCHAR(32) NULL,
  CONSTRAINT PK_IntegrityBlocks PRIMARY KEY (BlockId)
);

CREATE TABLE dbo.IntegritySegments (
  SegmentId   CHAR(26)     NOT NULL,
  BlockId     CHAR(26)     NOT NULL,
  Algo        NVARCHAR(16) NOT NULL,
  LeafCount   INT          NOT NULL,
  RootHash    CHAR(64)     NOT NULL,
  StartedAt   DATETIME2(3) NOT NULL,
  ClosedAt    DATETIME2(3) NOT NULL,
  CONSTRAINT PK_IntegritySegments PRIMARY KEY (SegmentId),
  CONSTRAINT FK_IntegritySegments_Blocks FOREIGN KEY (BlockId) REFERENCES dbo.IntegrityBlocks(BlockId)
);

Per-record IntegrityRef is embedded on AuditRecords (JSON). Optionally project BlockId/SegmentId/LeafIndex into columns for faster lookups.


Budgets & caps

  • Max Merkle depth: 64 path nodes per record (matches maxItems on merklePath).
  • Target segment size: 2^N leaves (e.g., 512 or 1024) or 5-minute window, whichever closes first.
  • Block closure: fixed schedule (e.g., 10 minutes) or ~8 segments, with immediate seal/sign.
  • Hash algorithm: SHA256 (hex) for all leaves and nodes; FIPS-approved variants may be introduced via algo.
  • Signature: Ed25519 default; PKCS#7 (CMS) supported for enterprise HSM workflows.

Notes

  • Privacy: Canonical hashing operates on the stored representation; if write-time redaction hashes a field, the record’s leaf hash reflects the redacted value (by design).
  • Portability: Block headers and segment manifests are self-describing; exporters include them alongside data packages.
  • Forward-compat: New algorithms or layouts must be additive; verifiers fall back to manifest parameters when present.


Retention Policy Model

Expresses the RetentionPolicy aggregate, its scopes & windows, revisioning (revision with effectiveFromUtc), and the evaluation inputs/results used to decide when an AuditRecord becomes eligible for purge and when it must be kept (WORM-like minimums).


Overview

  • Min/Max windows: Policies define a minimum keep window (no purge before) and an optional maximum window (target purge at).
  • Rule engine: A policy contains ordered rules with scopes matching record facets (resource.type, action, data classes, attributes), each providing a window.
  • Revisions: Policies evolve by incrementing revision (forward-only) and setting effectiveFromUtc. Re-evaluation may extend keep times but must not shorten previously committed ones.
  • Holds: LegalHold (elsewhere) supersedes retention—evaluation must surface state=OnHold.
  • Clocks: Windows are typically anchored at createdAt (can be configured per rule).

Model

RetentionWindow

  • minDays (int ≥ 0): Minimum days to retain (WORM).
  • maxDays (int ≥ minDays, optional): If set, target purge after this many days.
  • anchor (CreatedAt | ObservedAt | EffectiveAt), default CreatedAt.
  • jitterDays (int ≥ 0, optional): Randomized offset to spread purge load (applied at evaluation time).

RetentionScope

  • resourceTypes (array of PascalCase names, may include dotted namespaces; supports * suffix wildcard, e.g., Vetspire.*).
  • actions (array of verb or verb.noun strings; supports * suffix wildcard, e.g., appointment.*).
  • dataClasses (array of DataClass values; matches if the record (or its delta) contains any of these classes).
  • attributes (map of key → value or glob pattern; matches AuditRecord.attributes).
  • Empty scope matches all records.
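A scope match could look like the sketch below. It assumes that facets are AND-combined while values inside one facet are OR-combined, and that attribute globs follow Python's `fnmatch` dialect; neither is stated by the model, and `scope_matches` is a hypothetical name. The probe is a dict shaped like the evaluation input's record.

```python
from fnmatch import fnmatchcase


def matches_pattern(value: str, pattern: str) -> bool:
    """Exact match, or prefix match when the pattern ends with '*'."""
    if pattern.endswith("*"):
        return value.startswith(pattern[:-1])
    return value == pattern


def scope_matches(scope: dict, probe: dict) -> bool:
    """Return True when every populated facet of the scope matches the probe."""
    resource_types = scope.get("resourceTypes")
    if resource_types and not any(matches_pattern(probe["resourceType"], p) for p in resource_types):
        return False
    actions = scope.get("actions")
    if actions and not any(matches_pattern(probe["action"], p) for p in actions):
        return False
    classes = scope.get("dataClasses")
    if classes and not set(classes) & set(probe.get("dataClasses") or []):
        return False
    probe_attrs = probe.get("attributes") or {}
    for key, pattern in (scope.get("attributes") or {}).items():
        # The key must exist and its value must match the value or glob.
        if key not in probe_attrs or not fnmatchcase(probe_attrs[key], pattern):
            return False
    return True
```

An empty scope falls through every check and returns True, which mirrors the "empty scope matches all records" rule.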

RetentionRule

  • id (string ≤ 64), description (optional), priority (int, lower wins), enabled (bool, default true), stopProcessing (bool, default true), scope (RetentionScope), window (RetentionWindow).

RetentionPolicy

  • id (string ≤ 128), tenantId (string; optional for global policies), revision (int ≥ 1), effectiveFromUtc (timestamp), defaultWindow (RetentionWindow), rules (array<RetentionRule>).

JSON Schemas (v1)

retention-policy.v1.json

{
  "$id": "urn:connectsoft:schemas:policy/retention-policy.v1.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "RetentionPolicy",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "id": { "type": "string", "maxLength": 128, "minLength": 1 },
    "tenantId": { "type": "string", "maxLength": 128, "pattern": "^[A-Za-z0-9._-]+$" },
    "revision": { "type": "integer", "minimum": 1 },
    "effectiveFromUtc": { "type": "string", "format": "date-time" },
    "defaultWindow": { "$ref": "urn:connectsoft:schemas:policy/retention-window.v1.json" },
    "rules": {
      "type": "array",
      "items": { "$ref": "urn:connectsoft:schemas:policy/retention-rule.v1.json" },
      "maxItems": 200
    }
  },
  "required": ["id","revision","effectiveFromUtc","defaultWindow"]
}

retention-window.v1.json

{
  "$id": "urn:connectsoft:schemas:policy/retention-window.v1.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "RetentionWindow",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "minDays": { "type": "integer", "minimum": 0 },
    "maxDays": { "type": "integer", "minimum": 0 },
    "anchor": {
      "type": "string",
      "enum": ["CreatedAt","ObservedAt","EffectiveAt"],
      "default": "CreatedAt"
    },
    "jitterDays": { "type": "integer", "minimum": 0 }
  },
  "required": ["minDays"]
}

retention-rule.v1.json

{
  "$id": "urn:connectsoft:schemas:policy/retention-rule.v1.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "RetentionRule",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "id": { "type": "string", "maxLength": 64, "minLength": 1 },
    "description": { "type": "string", "maxLength": 256 },
    "priority": { "type": "integer", "minimum": 0, "default": 100 },
    "enabled": { "type": "boolean", "default": true },
    "stopProcessing": { "type": "boolean", "default": true },
    "scope": { "$ref": "urn:connectsoft:schemas:policy/retention-scope.v1.json" },
    "window": { "$ref": "urn:connectsoft:schemas:policy/retention-window.v1.json" }
  },
  "required": ["id","scope","window"]
}

retention-scope.v1.json

{
  "$id": "urn:connectsoft:schemas:policy/retention-scope.v1.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "RetentionScope",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "resourceTypes": {
      "type": "array",
      "items": { "type": "string", "pattern": "^[A-Z][A-Za-z0-9]*(\\.[A-Z][A-Za-z0-9]*)*(\\.\\*|\\*)?$" },
      "maxItems": 64
    },
    "actions": {
      "type": "array",
      "items": { "type": "string", "pattern": "^[a-z]+([.][a-z][a-z0-9_-]+)?([.]?\\*)?$" },
      "maxItems": 64
    },
    "dataClasses": {
      "type": "array",
      "items": { "$ref": "urn:connectsoft:schemas:policy/data-class.v1.json" },
      "maxItems": 16
    },
    "attributes": {
      "type": "object",
      "additionalProperties": { "type": "string", "maxLength": 128 }
    }
  }
}

Evaluation I/O

EvaluationInput (what the engine needs)

  • policyId (string), revision (int) (optional; if omitted, engine picks the latest effective for the tenant at nowUtc).
  • nowUtc (timestamp).
  • record (subset of AuditRecord metadata): tenantId, createdAt, observedAt, effectiveAt (optional), action, resourceType (taken from resource.type), attributes, dataClasses (set of classes observed on fields/delta), legalHold (bool or hold refs).

EvaluationResult

  • state: Active | OnHold | Eligible | Purged | Error.
  • eligibleAt (timestamp): when record first becomes eligible for purge (after min window).
  • keepUntil (timestamp | null): hard no-purge-before time (minDays window).
  • purgeAfter (timestamp | null): target purge time if maxDays exists (plus jitter).
  • matchedRuleId (string | null), appliedWindow (copy of window with effective values), policyId, revision.
  • reasons (array): brief explanations (e.g., Matched rule R-APPT-READ, LegalHold active).
  • errors (array, optional).

retention-eval-input.v1.json

{
  "$id": "urn:connectsoft:schemas:policy/retention-eval-input.v1.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "RetentionEvaluationInput",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "policyId": { "type": "string" },
    "revision": { "type": "integer", "minimum": 1 },
    "nowUtc": { "type": "string", "format": "date-time" },
    "record": {
      "type": "object",
      "additionalProperties": false,
      "properties": {
        "tenantId": { "type": "string", "pattern": "^[A-Za-z0-9._-]{1,128}$" },
        "createdAt": { "type": "string", "format": "date-time" },
        "observedAt": { "type": "string", "format": "date-time" },
        "effectiveAt": { "type": "string", "format": "date-time" },
        "action": { "type": "string" },
        "resourceType": { "type": "string" },
        "attributes": { "type": "object", "additionalProperties": { "type": "string" } },
        "dataClasses": {
          "type": "array",
          "items": { "$ref": "urn:connectsoft:schemas:policy/data-class.v1.json" },
          "uniqueItems": true
        },
        "legalHold": { "type": "boolean" }
      },
      "required": ["tenantId","createdAt","action","resourceType"]
    }
  },
  "required": ["nowUtc","record"]
}

retention-eval-result.v1.json

{
  "$id": "urn:connectsoft:schemas:policy/retention-eval-result.v1.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "RetentionEvaluationResult",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "state": { "type": "string", "enum": ["Active","OnHold","Eligible","Purged","Error"] },
    "eligibleAt": { "type": "string", "format": "date-time" },
    "keepUntil": { "type": ["string","null"], "format": "date-time" },
    "purgeAfter": { "type": ["string","null"], "format": "date-time" },
    "matchedRuleId": { "type": ["string","null"] },
    "appliedWindow": { "$ref": "urn:connectsoft:schemas:policy/retention-window.v1.json" },
    "policyId": { "type": "string" },
    "revision": { "type": "integer" },
    "reasons": { "type": "array", "items": { "type": "string" }, "maxItems": 10 },
    "errors": { "type": "array", "items": { "type": "string" } }
  },
  "required": ["state","eligibleAt","policyId","revision"]
}

C# (gRPC code-first)

using System;
using System.Collections.Generic;
using System.Runtime.Serialization;

[DataContract]
public sealed class RetentionPolicy
{
    [DataMember(Order = 1)] public string Id { get; init; } = default!;
    [DataMember(Order = 2)] public string? TenantId { get; init; }
    [DataMember(Order = 3)] public int Revision { get; init; }
    [DataMember(Order = 4)] public DateTimeOffset EffectiveFromUtc { get; init; }
    [DataMember(Order = 5)] public RetentionWindow DefaultWindow { get; init; } = new();
    [DataMember(Order = 6)] public IReadOnlyList<RetentionRule> Rules { get; init; } = Array.Empty<RetentionRule>();
}

[DataContract]
public sealed class RetentionRule
{
    [DataMember(Order = 1)] public string Id { get; init; } = default!;
    [DataMember(Order = 2)] public string? Description { get; init; }
    [DataMember(Order = 3)] public int Priority { get; init; } = 100;
    [DataMember(Order = 4)] public bool Enabled { get; init; } = true;
    [DataMember(Order = 5)] public bool StopProcessing { get; init; } = true;
    [DataMember(Order = 6)] public RetentionScope Scope { get; init; } = new();
    [DataMember(Order = 7)] public RetentionWindow Window { get; init; } = new();
}

[DataContract]
public sealed class RetentionWindow
{
    [DataMember(Order = 1)] public int MinDays { get; init; }  // >= 0
    [DataMember(Order = 2)] public int? MaxDays { get; init; } // >= MinDays
    [DataMember(Order = 3)] public string Anchor { get; init; } = "CreatedAt"; // "CreatedAt"|"ObservedAt"|"EffectiveAt"
    [DataMember(Order = 4)] public int? JitterDays { get; init; } // optional
}

[DataContract]
public sealed class RetentionScope
{
    [DataMember(Order = 1)] public IReadOnlyList<string>? ResourceTypes { get; init; } // "Appointment", "Vetspire.*"
    [DataMember(Order = 2)] public IReadOnlyList<string>? Actions { get; init; }       // "create", "appointment.*"
    [DataMember(Order = 3)] public IReadOnlyList<DataClass>? DataClasses { get; init; }// from taxonomy
    [DataMember(Order = 4)] public IReadOnlyDictionary<string,string>? Attributes { get; init; }
}

[DataContract]
public sealed class RetentionEvaluationInput
{
    [DataMember(Order = 1)] public string? PolicyId { get; init; }
    [DataMember(Order = 2)] public int? Revision { get; init; }
    [DataMember(Order = 3)] public DateTimeOffset NowUtc { get; init; }
    [DataMember(Order = 4)] public RetentionRecordProbe Record { get; init; } = new();
}

[DataContract]
public sealed class RetentionRecordProbe
{
    [DataMember(Order = 1)] public string TenantId { get; init; } = default!;
    [DataMember(Order = 2)] public DateTimeOffset CreatedAt { get; init; }
    [DataMember(Order = 3)] public DateTimeOffset? ObservedAt { get; init; }
    [DataMember(Order = 4)] public DateTimeOffset? EffectiveAt { get; init; }
    [DataMember(Order = 5)] public string Action { get; init; } = default!;
    [DataMember(Order = 6)] public string ResourceType { get; init; } = default!;
    [DataMember(Order = 7)] public IReadOnlyDictionary<string,string>? Attributes { get; init; }
    [DataMember(Order = 8)] public IReadOnlyList<DataClass>? DataClasses { get; init; }
    [DataMember(Order = 9)] public bool? LegalHold { get; init; }
}

[DataContract]
public sealed class RetentionEvaluationResult
{
    [DataMember(Order = 1)] public string State { get; init; } = "Active"; // "Active"|"OnHold"|"Eligible"|"Purged"|"Error"
    [DataMember(Order = 2)] public DateTimeOffset EligibleAt { get; init; }
    [DataMember(Order = 3)] public DateTimeOffset? KeepUntil { get; init; }
    [DataMember(Order = 4)] public DateTimeOffset? PurgeAfter { get; init; }
    [DataMember(Order = 5)] public string? MatchedRuleId { get; init; }
    [DataMember(Order = 6)] public RetentionWindow AppliedWindow { get; init; } = new();
    [DataMember(Order = 7)] public string PolicyId { get; init; } = default!;
    [DataMember(Order = 8)] public int Revision { get; init; }
    [DataMember(Order = 9)] public IReadOnlyList<string>? Reasons { get; init; }
    [DataMember(Order = 10)] public IReadOnlyList<string>? Errors { get; init; }
}

JSON uses camelCase; database uses PascalCase (RetentionPolicies, columns like EffectiveFromUtc, Revision).


Protobuf (optional emission)

syntax = "proto3";
package connectsoft.policy.v1;

import "google/protobuf/timestamp.proto";
import "google/protobuf/wrappers.proto";

message RetentionWindow {
  int32 MinDays = 1 [json_name = "minDays"];
  google.protobuf.Int32Value MaxDays = 2 [json_name = "maxDays"];
  string Anchor = 3 [json_name = "anchor"]; // "CreatedAt"|"ObservedAt"|"EffectiveAt"
  google.protobuf.Int32Value JitterDays = 4 [json_name = "jitterDays"];
}

message RetentionScope {
  repeated string ResourceTypes = 1 [json_name = "resourceTypes"];
  repeated string Actions = 2 [json_name = "actions"];
  repeated string DataClasses = 3 [json_name = "dataClasses"]; // enum names
  map<string,string> Attributes = 4 [json_name = "attributes"];
}

message RetentionRule {
  string Id = 1 [json_name = "id"];
  string Description = 2 [json_name = "description"];
  int32 Priority = 3 [json_name = "priority"];
  bool Enabled = 4 [json_name = "enabled"];
  bool StopProcessing = 5 [json_name = "stopProcessing"];
  RetentionScope Scope = 6 [json_name = "scope"];
  RetentionWindow Window = 7 [json_name = "window"];
}

message RetentionPolicy {
  string Id = 1 [json_name = "id"];
  string TenantId = 2 [json_name = "tenantId"];
  int32 Revision = 3 [json_name = "revision"];
  google.protobuf.Timestamp EffectiveFromUtc = 4 [json_name = "effectiveFromUtc"];
  RetentionWindow DefaultWindow = 5 [json_name = "defaultWindow"];
  repeated RetentionRule Rules = 6 [json_name = "rules"];
}

message RetentionEvaluationInput {
  string PolicyId = 1 [json_name = "policyId"];
  google.protobuf.Int32Value Revision = 2 [json_name = "revision"];
  google.protobuf.Timestamp NowUtc = 3 [json_name = "nowUtc"];
  RetentionRecordProbe Record = 4 [json_name = "record"];
}

message RetentionRecordProbe {
  string TenantId = 1 [json_name = "tenantId"];
  google.protobuf.Timestamp CreatedAt = 2 [json_name = "createdAt"];
  google.protobuf.Timestamp ObservedAt = 3 [json_name = "observedAt"];
  google.protobuf.Timestamp EffectiveAt = 4 [json_name = "effectiveAt"];
  string Action = 5 [json_name = "action"];
  string ResourceType = 6 [json_name = "resourceType"];
  map<string,string> Attributes = 7 [json_name = "attributes"];
  repeated string DataClasses = 8 [json_name = "dataClasses"];
  bool LegalHold = 9 [json_name = "legalHold"];
}

message RetentionEvaluationResult {
  string State = 1 [json_name = "state"];
  google.protobuf.Timestamp EligibleAt = 2 [json_name = "eligibleAt"];
  google.protobuf.Timestamp KeepUntil = 3 [json_name = "keepUntil"];
  google.protobuf.Timestamp PurgeAfter = 4 [json_name = "purgeAfter"];
  string MatchedRuleId = 5 [json_name = "matchedRuleId"];
  RetentionWindow AppliedWindow = 6 [json_name = "appliedWindow"];
  string PolicyId = 7 [json_name = "policyId"];
  int32 Revision = 8 [json_name = "revision"];
  repeated string Reasons = 9 [json_name = "reasons"];
  repeated string Errors = 10 [json_name = "errors"];
}

Examples

Policy

{
  "id": "policy-default",
  "tenantId": "splootvets",
  "revision": 4,
  "effectiveFromUtc": "2025-10-01T00:00:00Z",
  "defaultWindow": { "minDays": 90 },
  "rules": [
    {
      "id": "R-APPT-READ",
      "description": "Shorter retention for read-only appointment views",
      "priority": 10,
      "scope": { "resourceTypes": ["Vetspire.Appointment"], "actions": ["appointment.read"] },
      "window": { "minDays": 30, "maxDays": 365, "anchor": "CreatedAt", "jitterDays": 7 }
    },
    {
      "id": "R-CREDENTIALS",
      "description": "Long retention for credential-related events",
      "priority": 20,
      "scope": { "dataClasses": ["Credential"] },
      "window": { "minDays": 3650 }
    }
  ]
}

Evaluation input

{
  "nowUtc": "2025-10-22T14:30:00Z",
  "record": {
    "tenantId": "splootvets",
    "createdAt": "2025-10-02T10:00:00Z",
    "action": "appointment.read",
    "resourceType": "Vetspire.Appointment",
    "dataClasses": ["Personal"],
    "legalHold": false
  }
}

Evaluation result (purgeAfter is shown without jitter; with jitterDays set it may shift by up to that many days)

{
  "state": "Active",
  "eligibleAt": "2025-11-01T10:00:00Z",
  "keepUntil": "2025-11-01T10:00:00Z",
  "purgeAfter": "2026-10-02T10:00:00Z",
  "matchedRuleId": "R-APPT-READ",
  "appliedWindow": { "minDays": 30, "maxDays": 365, "anchor": "CreatedAt", "jitterDays": 7 },
  "policyId": "policy-default",
  "revision": 4,
  "reasons": ["Matched rule R-APPT-READ"]
}

Evaluation result with legal hold

{
  "state": "OnHold",
  "eligibleAt": "2025-12-31T10:00:00Z",
  "keepUntil": "2025-12-31T10:00:00Z",
  "purgeAfter": null,
  "matchedRuleId": "R-APPT-READ",
  "appliedWindow": { "minDays": 30, "maxDays": 365 },
  "policyId": "policy-default",
  "revision": 4,
  "reasons": ["LegalHold active"]
}

Evaluation semantics

  1. Select policy by tenant and nowUtc ≥ effectiveFromUtc; pick the latest revision that is effective.
  2. Find rule(s) by ascending priority among enabled=true rules whose scope matches probe:
    • resourceTypes/actions: exact or * suffix wildcard match.
    • dataClasses: intersection non-empty.
    • attributes: key must exist and value/glob must match.
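  3. Apply the first matching rule's window if that rule has stopProcessing=true; otherwise continue to the next matching rules and combine their windows conservatively:
    • minDays = max of matched mins; maxDays = min of matched maxes (when both present).
    • If no rule matches, use defaultWindow.
  4. Compute times from the selected anchor (default createdAt): keepUntil = anchor + minDays and, when maxDays is set, purgeAfter = anchor + maxDays, plus a randomized offset of up to jitterDays if present.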
  5. Monotonicity: if the record already has a committed KeepUntil, new evaluation can extend (take later date) but must not reduce it.
  6. Legal hold: If a hold is present, set state=OnHold, keep keepUntil for reference, and set purgeAfter=null.
  7. State:
    • Active if nowUtc < eligibleAt.
    • Eligible if nowUtc ≥ eligibleAt and no hold; delete may proceed any time ≥ keepUntil (and ideally at/after purgeAfter if set).
    • Purged only assigned by lifecycle once deletion is complete.
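
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the platform's evaluator: the names Window, combine, and evaluate are hypothetical, jitter is left out so the output is deterministic, and eligibleAt is assumed to equal keepUntil (as in the example result above).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass(frozen=True)
class Window:
    min_days: int
    max_days: Optional[int] = None


def combine(windows: List[Window]) -> Window:
    """Step 3: conservative merge (largest minDays, smallest maxDays)."""
    min_days = max(w.min_days for w in windows)
    maxes = [w.max_days for w in windows if w.max_days is not None]
    max_days = min(maxes) if maxes else None
    if max_days is not None and min_days > max_days:
        raise ValueError("combined window is invalid: minDays > maxDays")
    return Window(min_days, max_days)


def evaluate(created_at, now_utc, window, legal_hold=False, committed_keep_until=None):
    """Steps 4-7: compute times from the createdAt anchor and derive the state."""
    keep_until = created_at + timedelta(days=window.min_days)
    # Step 5 (monotonicity): a committed KeepUntil may be extended, never reduced.
    if committed_keep_until is not None:
        keep_until = max(keep_until, committed_keep_until)
    eligible_at = keep_until  # assumption: eligibility starts when the keep period ends
    purge_after = None
    if window.max_days is not None:
        purge_after = created_at + timedelta(days=window.max_days)
    if legal_hold:
        # Step 6: keepUntil stays for reference, purgeAfter is cleared.
        state, purge_after = "OnHold", None
    elif now_utc < eligible_at:
        state = "Active"
    else:
        state = "Eligible"
    return {"state": state, "eligibleAt": eligible_at,
            "keepUntil": keep_until, "purgeAfter": purge_after}


# Inputs taken from the evaluation example above.
created = datetime(2025, 10, 2, 10, 0, tzinfo=timezone.utc)
now = datetime(2025, 10, 22, 14, 30, tzinfo=timezone.utc)
result = evaluate(created, now, Window(30, 365))
held = evaluate(created, now, Window(30, 365), legal_hold=True)
```

Keeping combine separate from evaluate mirrors the split between rule selection (steps 2-3) and time computation (steps 4-7), so each can be tested on its own.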

Storage mapping

CREATE TABLE dbo.RetentionPolicies (
  Id              NVARCHAR(128) NOT NULL,
  TenantId        NVARCHAR(128) NULL,
  Revision        INT           NOT NULL,
  EffectiveFromUtc DATETIME2(0) NOT NULL,
  DefaultWindow   NVARCHAR(256) NOT NULL, -- JSON (MinDays, MaxDays, Anchor, JitterDays)
  RulesJson       NVARCHAR(MAX) NOT NULL, -- JSON array of rules
  CONSTRAINT PK_RetentionPolicies PRIMARY KEY (Id, Revision)
);

CREATE TABLE dbo.RecordRetention (
  AuditRecordId CHAR(26)     NOT NULL, -- ULID
  TenantId      NVARCHAR(128) NOT NULL, -- matches the 128-char tenantId token limit
  PolicyId      NVARCHAR(128) NOT NULL,
  Revision      INT          NOT NULL,
  MatchedRuleId NVARCHAR(64) NULL,
  KeepUntil     DATETIME2(0) NOT NULL,
  EligibleAt    DATETIME2(0) NOT NULL,
  PurgeAfter    DATETIME2(0) NULL,
  State         NVARCHAR(16) NOT NULL, -- "Active"|"OnHold"|"Eligible"|"Purged"
  LastEvaluatedAt DATETIME2(0) NOT NULL,
  CONSTRAINT PK_RecordRetention PRIMARY KEY (AuditRecordId),
  INDEX IX_RecordRetention_Tenant_Eligible (TenantId, EligibleAt),
  INDEX IX_RecordRetention_Tenant_KeepUntil (TenantId, KeepUntil)
);

Policies are immutable per (Id, Revision). New revisions create new rows; readers pick the latest effective revision at evaluation time.


Budgets & caps

  • Max rules per policy: 200.
  • Max resourceTypes/actions per rule: 64 each.
  • minDays ≤ 36500 (100 years); maxDays ≤ 36500.
  • jitterDays ≤ 30.
  • Wildcards limited to suffix * (no glob in middle) to keep evaluation O(1) per candidate list.

Validation rules (summary)

  • revision strictly increases; effectiveFromUtc must be ≥ previous revision’s effective time.
  • maxDays (when present) must be ≥ minDays.
  • Combining multiple rules with stopProcessing=false must produce a valid window (i.e., minDays ≤ maxDays if maxDays exists).
  • Monotonicity: persisted KeepUntil may only move forward in time upon re-evaluation.
  • Legal hold sets state=OnHold regardless of computed times; PurgeAfter becomes null.
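
As a rough illustration of how the budgets and validation rules could be enforced at policy-save time, the sketch below returns a list of violations. The function names and the policy field names used here (rules, id, window, scope, defaultWindow) are assumptions for the example and should be checked against the retention policy schema.

```python
from datetime import datetime
from typing import List, Optional

MAX_RULES = 200
MAX_SCOPE_ITEMS = 64
MAX_DAYS = 36500
MAX_JITTER_DAYS = 30


def _parse(ts: str) -> datetime:
    # fromisoformat() rejects a trailing "Z" before Python 3.11.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def validate_window(window: dict, where: str) -> List[str]:
    errors = []
    min_days = window.get("minDays", 0)
    max_days = window.get("maxDays")
    if not 0 <= min_days <= MAX_DAYS:
        errors.append(f"{where}: minDays out of range")
    if max_days is not None:
        if max_days > MAX_DAYS:
            errors.append(f"{where}: maxDays exceeds {MAX_DAYS}")
        if max_days < min_days:
            errors.append(f"{where}: maxDays must be >= minDays")
    if window.get("jitterDays", 0) > MAX_JITTER_DAYS:
        errors.append(f"{where}: jitterDays exceeds {MAX_JITTER_DAYS}")
    return errors


def validate_policy(policy: dict, previous: Optional[dict] = None) -> List[str]:
    errors = validate_window(policy["defaultWindow"], "defaultWindow")
    rules = policy.get("rules", [])
    if len(rules) > MAX_RULES:
        errors.append(f"more than {MAX_RULES} rules")
    for rule in rules:
        where = f"rule {rule.get('id', '?')}"
        errors += validate_window(rule.get("window", {}), where)
        for key in ("resourceTypes", "actions"):
            if len(rule.get("scope", {}).get(key, [])) > MAX_SCOPE_ITEMS:
                errors.append(f"{where}: more than {MAX_SCOPE_ITEMS} {key}")
    if previous is not None:
        if policy["revision"] <= previous["revision"]:
            errors.append("revision must strictly increase")
        if _parse(policy["effectiveFromUtc"]) < _parse(previous["effectiveFromUtc"]):
            errors.append("effectiveFromUtc precedes the previous revision")
    return errors
```

Returning every violation at once, rather than raising on the first, tends to be friendlier for policy authors who submit a whole revision in one request.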

Locks the model for placing and releasing legal holds that suspend purge eligibility for matching records. A hold defines a scope (what records it applies to), case metadata (caseId, reason), provenance (who placed/released), and timing (placed/released/expiry).


Overview

  • Effect: While a hold is Active, any matching record is not purgeable regardless of retention windows; evaluation surfaces state=OnHold.
  • Scope-first: Holds target records by resource type, action, attributes, data classes, and time range on record clocks.
  • Prospective & retrospective: Holds can apply to existing records, future records, or both.
  • Provenance: Capture the placing/releasing principal via a compact ActorRef.
  • Immutability: Core metadata is immutable after placement; only state can change (Active → Released/Expired) via explicit transitions.

Fields

JSON (lowerCamel) C# (PascalCase) Type Req. Description & rules
holdId HoldId string (ULID) Unique identifier of the hold.
tenantId TenantId string Tenant scope; holds are tenant-local unless using a separate supervisory tenant.
caseId CaseId string External case or matter reference (≤64).
reason Reason string Short justification (≤256).
note Note string Optional longer note (≤1024).
state State enum Active | Released | Expired.
scope Scope object Matching rules (see Scope).
appliesTo AppliesTo enum Existing | Future | Both. Default Both.
placedAt PlacedAt timestamp UTC when hold was placed.
placedBy PlacedBy ActorRef Who placed the hold.
releasedAt ReleasedAt timestamp UTC when released (if any).
releasedBy ReleasedBy ActorRef Who released the hold (if any).
expiresAt ExpiresAt timestamp Optional scheduled expiry; when passed, state becomes Expired.
version Version integer Monotonic counter; increment on each state change.

Scope

Field Type Description
resourceTypes array<string> PascalCase, dotted namespace allowed; * suffix wildcard (e.g., Vetspire.*).
actions array<string> verb or verb.noun; * suffix wildcard.
attributes map<string,string> Match against AuditRecord.attributes (exact or glob */? allowed in values).
dataClasses array<DataClass> Matches if any class appears on the record/delta.
timeRange object { "anchor": "CreatedAt"|"ObservedAt"|"EffectiveAt", "from": ts?, "to": ts? } — inclusive range.

Empty scope matches all records in the tenant.


JSON Schemas (v1)

legal-hold.v1.json

{
  "$id": "urn:connectsoft:schemas:policy/legal-hold.v1.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "LegalHold",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "holdId":   { "type": "string", "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$" },
    "tenantId": { "type": "string", "pattern": "^[A-Za-z0-9._-]{1,128}$" },
    "caseId":   { "type": "string", "maxLength": 64, "minLength": 1 },
    "reason":   { "type": "string", "maxLength": 256, "minLength": 1 },
    "note":     { "type": "string", "maxLength": 1024 },
    "state":    { "type": "string", "enum": ["Active","Released","Expired"] },
    "scope":    { "$ref": "urn:connectsoft:schemas:policy/legal-hold-scope.v1.json" },
    "appliesTo":{ "type": "string", "enum": ["Existing","Future","Both"], "default": "Both" },
    "placedAt": { "type": "string", "format": "date-time" },
    "placedBy": { "$ref": "urn:connectsoft:schemas:partials/actor-ref.v1.json" },
    "releasedAt": { "type": "string", "format": "date-time" },
    "releasedBy": { "$ref": "urn:connectsoft:schemas:partials/actor-ref.v1.json" },
    "expiresAt":  { "type": "string", "format": "date-time" },
    "version":  { "type": "integer", "minimum": 1 }
  },
  "required": ["holdId","tenantId","caseId","reason","state","scope","appliesTo","placedAt","placedBy","version"]
}

legal-hold-scope.v1.json

{
  "$id": "urn:connectsoft:schemas:policy/legal-hold-scope.v1.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "LegalHoldScope",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "resourceTypes": {
      "type": "array",
      "items": { "type": "string", "pattern": "^[A-Z][A-Za-z0-9]*(\\.[A-Z][A-Za-z0-9]*)*(\\.?\\*)?$" },
      "maxItems": 64
    },
    "actions": {
      "type": "array",
      "items": { "type": "string", "pattern": "^[a-z]+([.][a-z][a-z0-9_-]+)?([.]?\\*)?$" },
      "maxItems": 64
    },
    "attributes": {
      "type": "object",
      "additionalProperties": { "type": "string", "maxLength": 128 }
    },
    "dataClasses": {
      "type": "array",
      "items": { "$ref": "urn:connectsoft:schemas:policy/data-class.v1.json" },
      "maxItems": 16
    },
    "timeRange": {
      "type": "object",
      "additionalProperties": false,
      "properties": {
        "anchor": { "type": "string", "enum": ["CreatedAt","ObservedAt","EffectiveAt"], "default": "CreatedAt" },
        "from":   { "type": "string", "format": "date-time" },
        "to":     { "type": "string", "format": "date-time" }
      }
    }
  }
}

actor-ref.v1.json (partial)

Minimal reference used in admin objects to avoid full PII.

{
  "$id": "urn:connectsoft:schemas:partials/actor-ref.v1.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "ActorRef",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "id":   { "type": "string", "maxLength": 128, "pattern": "^[A-Za-z0-9._-]+$" },
    "type": { "type": "string", "enum": ["Unknown","User","Service","Job"] },
    "display": { "type": "string", "maxLength": 128 }
  },
  "required": ["id","type"]
}

C# (gRPC code-first)

[DataContract]
public sealed class LegalHold
{
    [DataMember(Order = 1)]  public string HoldId { get; init; } = default!; // ULID
    [DataMember(Order = 2)]  public string TenantId { get; init; } = default!;
    [DataMember(Order = 3)]  public string CaseId { get; init; } = default!;
    [DataMember(Order = 4)]  public string Reason { get; init; } = default!;
    [DataMember(Order = 5)]  public string? Note { get; init; }
    [DataMember(Order = 6)]  public LegalHoldState State { get; init; } = LegalHoldState.Active;
    [DataMember(Order = 7)]  public LegalHoldScope Scope { get; init; } = new();
    [DataMember(Order = 8)]  public AppliesTo AppliesTo { get; init; } = AppliesTo.Both;
    [DataMember(Order = 9)]  public DateTimeOffset PlacedAt { get; init; }
    [DataMember(Order = 10)] public ActorRef PlacedBy { get; init; } = new();
    [DataMember(Order = 11)] public DateTimeOffset? ReleasedAt { get; init; }
    [DataMember(Order = 12)] public ActorRef? ReleasedBy { get; init; }
    [DataMember(Order = 13)] public DateTimeOffset? ExpiresAt { get; init; }
    [DataMember(Order = 14)] public int Version { get; init; } = 1;
}

[DataContract]
public enum LegalHoldState { Active = 0, Released = 1, Expired = 2 }

[DataContract]
public enum AppliesTo { Existing = 0, Future = 1, Both = 2 }

[DataContract]
public sealed class LegalHoldScope
{
    [DataMember(Order = 1)] public IReadOnlyList<string>? ResourceTypes { get; init; }
    [DataMember(Order = 2)] public IReadOnlyList<string>? Actions { get; init; }
    [DataMember(Order = 3)] public IReadOnlyDictionary<string,string>? Attributes { get; init; }
    [DataMember(Order = 4)] public IReadOnlyList<DataClass>? DataClasses { get; init; }
    [DataMember(Order = 5)] public TimeRange? TimeRange { get; init; }
}

[DataContract]
public sealed class TimeRange
{
    [DataMember(Order = 1)] public string Anchor { get; init; } = "CreatedAt"; // CreatedAt|ObservedAt|EffectiveAt
    [DataMember(Order = 2)] public DateTimeOffset? From { get; init; }
    [DataMember(Order = 3)] public DateTimeOffset? To { get; init; }
}

// Reuse ActorRef from the Actor model section (minimal).
[DataContract]
public sealed class ActorRef
{
    [DataMember(Order = 1)] public string Id { get; init; } = default!;
    [DataMember(Order = 2)] public ActorType Type { get; init; } = ActorType.Unknown;
    [DataMember(Order = 3)] public string? Display { get; init; }
}

JSON serialization uses camelCase; DB sticks to PascalCase (LegalHolds, columns like HoldId, PlacedAt, State).


Protobuf (optional emission)

syntax = "proto3";
package connectsoft.policy.v1;

import "google/protobuf/timestamp.proto";

message LegalHold {
  string HoldId = 1 [json_name = "holdId"];
  string TenantId = 2 [json_name = "tenantId"];
  string CaseId = 3 [json_name = "caseId"];
  string Reason = 4 [json_name = "reason"];
  string Note = 5 [json_name = "note"];
  LegalHoldState State = 6 [json_name = "state"];
  LegalHoldScope Scope = 7 [json_name = "scope"];
  AppliesTo AppliesTo = 8 [json_name = "appliesTo"];
  google.protobuf.Timestamp PlacedAt = 9 [json_name = "placedAt"];
  ActorRef PlacedBy = 10 [json_name = "placedBy"];
  google.protobuf.Timestamp ReleasedAt = 11 [json_name = "releasedAt"];
  ActorRef ReleasedBy = 12 [json_name = "releasedBy"];
  google.protobuf.Timestamp ExpiresAt = 13 [json_name = "expiresAt"];
  int32 Version = 14 [json_name = "version"];
}

enum LegalHoldState { LegalHoldState_Active = 0; LegalHoldState_Released = 1; LegalHoldState_Expired = 2; }
enum AppliesTo { AppliesTo_Existing = 0; AppliesTo_Future = 1; AppliesTo_Both = 2; }

message LegalHoldScope {
  repeated string ResourceTypes = 1 [json_name = "resourceTypes"];
  repeated string Actions = 2 [json_name = "actions"];
  map<string,string> Attributes = 3 [json_name = "attributes"];
  repeated string DataClasses = 4 [json_name = "dataClasses"]; // enum names
  TimeRange TimeRange = 5 [json_name = "timeRange"];
}

message TimeRange {
  string Anchor = 1 [json_name = "anchor"]; // "CreatedAt"|"ObservedAt"|"EffectiveAt"
  google.protobuf.Timestamp From = 2 [json_name = "from"];
  google.protobuf.Timestamp To = 3 [json_name = "to"];
}

message ActorRef {
  string Id = 1 [json_name = "id"];
  string Type = 2 [json_name = "type"];
  string Display = 3 [json_name = "display"];
}

State machine

stateDiagram-v2
  [*] --> Active: Place
  Active --> Released: Release
  Active --> Expired: Now >= ExpiresAt (auto)
  Released --> [*]
  Expired --> [*]

Transition rules

  • Place: Create LegalHold in Active with placedAt, placedBy.
  • Release: Set state=Released, releasedAt, releasedBy, and increment version.
  • Expire: If expiresAt is set and the current time reaches it (now ≥ expiresAt), transition to Expired and increment version; repeating the check is a no-op (idempotent).
  • Immutability: tenantId, caseId, reason, scope, and appliesTo are immutable after placement.
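
The transitions can be modelled as pure functions over an immutable value, which makes the idempotency of expiry easy to see. The sketch below is illustrative only: the Hold dataclass is a trimmed, hypothetical stand-in for LegalHold, and release and expire_if_due are not names from the platform.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Hold:
    hold_id: str
    state: str
    placed_at: datetime
    version: int = 1
    released_at: Optional[datetime] = None
    released_by: Optional[str] = None
    expires_at: Optional[datetime] = None


def release(hold: Hold, now: datetime, actor_id: str) -> Hold:
    """Active -> Released; sets releasedAt/releasedBy together and bumps version."""
    if hold.state != "Active":
        raise ValueError(f"cannot release a hold in state {hold.state}")
    return replace(hold, state="Released", released_at=now,
                   released_by=actor_id, version=hold.version + 1)


def expire_if_due(hold: Hold, now: datetime) -> Hold:
    """Active -> Expired once now >= expiresAt; a no-op in every other case."""
    if hold.state == "Active" and hold.expires_at is not None and now >= hold.expires_at:
        return replace(hold, state="Expired", version=hold.version + 1)
    return hold


placed = datetime(2025, 10, 22, 10, 0, tzinfo=timezone.utc)
hold = Hold("01JE4MVQ20P5P8J4V9Q9T0T3FH", "Active", placed,
            expires_at=datetime(2026, 1, 1, tzinfo=timezone.utc))
released = release(hold, datetime(2025, 12, 1, 9, 0, tzinfo=timezone.utc), "user_789")
```

Because Released and Expired are terminal, both functions leave a non-Active hold untouched (or reject it), so a scheduler can call expire_if_due repeatedly without side effects.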

Matching semantics

A record is under hold if at least one Active hold satisfies all of the following:

  • tenantId matches,
  • scope matches (resource.type, action, attributes glob, dataClasses, and timeRange against the selected anchor time), and
  • if appliesTo = Existing, the record’s anchor time is ≤ placedAt; if Future, anchor time is ≥ placedAt; Both ignores this bifurcation.
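
A minimal Python sketch of these matching rules follows. It assumes a flat record shape like the one in the retention evaluation input (tenantId, resourceType, action, createdAt, dataClasses) and uses fnmatch for attribute globs, which also interprets [seq] patterns beyond the documented * and ?. All function names are hypothetical.

```python
from datetime import datetime
from fnmatch import fnmatchcase


def _parse(ts: str) -> datetime:
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def _suffix_match(pattern: str, value: str) -> bool:
    # Only a trailing * is treated as a wildcard.
    if pattern.endswith("*"):
        return value.startswith(pattern[:-1])
    return pattern == value


def _anchor_time(record: dict, anchor: str) -> datetime:
    # "CreatedAt" -> "createdAt" (JSON uses lowerCamel).
    return _parse(record[anchor[0].lower() + anchor[1:]])


def scope_matches(scope: dict, record: dict) -> bool:
    types = scope.get("resourceTypes")
    if types and not any(_suffix_match(p, record["resourceType"]) for p in types):
        return False
    actions = scope.get("actions")
    if actions and not any(_suffix_match(p, record["action"]) for p in actions):
        return False
    for key, pattern in scope.get("attributes", {}).items():
        value = record.get("attributes", {}).get(key)
        if value is None or not fnmatchcase(value, pattern):
            return False
    classes = scope.get("dataClasses")
    if classes and not set(classes) & set(record.get("dataClasses", [])):
        return False
    time_range = scope.get("timeRange")
    if time_range:
        t = _anchor_time(record, time_range.get("anchor", "CreatedAt"))
        if "from" in time_range and t < _parse(time_range["from"]):
            return False
        if "to" in time_range and t > _parse(time_range["to"]):
            return False
    return True  # an empty scope matches every record in the tenant


def is_under_hold(hold: dict, record: dict) -> bool:
    if hold["state"] != "Active" or hold["tenantId"] != record["tenantId"]:
        return False
    if not scope_matches(hold["scope"], record):
        return False
    anchor = hold["scope"].get("timeRange", {}).get("anchor", "CreatedAt")
    t, placed = _anchor_time(record, anchor), _parse(hold["placedAt"])
    applies_to = hold.get("appliesTo", "Both")
    if applies_to == "Existing":
        return t <= placed
    if applies_to == "Future":
        return t >= placed
    return True


HOLD = {
    "tenantId": "splootvets", "state": "Active", "appliesTo": "Both",
    "placedAt": "2025-10-22T10:00:00Z",
    "scope": {"resourceTypes": ["Vetspire.Appointment"], "actions": ["appointment.*"],
              "timeRange": {"anchor": "CreatedAt", "from": "2025-10-01T00:00:00Z",
                            "to": "2025-10-31T23:59:59Z"}},
}
RECORD = {"tenantId": "splootvets", "createdAt": "2025-10-02T10:00:00Z",
          "action": "appointment.read", "resourceType": "Vetspire.Appointment",
          "dataClasses": ["Personal"]}
```

With these inputs, is_under_hold(HOLD, RECORD) holds for the October record and stops holding once createdAt falls outside the time range or the tenant differs.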

Examples

Place a hold over October appointment events for a case

{
  "holdId": "01JE4MVQ20P5P8J4V9Q9T0T3FH",
  "tenantId": "splootvets",
  "caseId": "CASE-2025-001",
  "reason": "Litigation hold for October appointments",
  "state": "Active",
  "scope": {
    "resourceTypes": ["Vetspire.Appointment"],
    "actions": ["appointment.*"],
    "timeRange": { "anchor": "CreatedAt", "from": "2025-10-01T00:00:00Z", "to": "2025-10-31T23:59:59Z" }
  },
  "appliesTo": "Both",
  "placedAt": "2025-10-22T10:00:00Z",
  "placedBy": { "id": "legal.ops", "type": "Service", "display": "Legal Ops" },
  "version": 1
}

Release the hold

{
  "holdId": "01JE4MVQ20P5P8J4V9Q9T0T3FH",
  "state": "Released",
  "releasedAt": "2025-12-01T09:00:00Z",
  "releasedBy": { "id": "user_789", "type": "User", "display": "Attorney Smith" },
  "version": 2
}

Storage mapping

CREATE TABLE dbo.LegalHolds (
  HoldId     CHAR(26)      NOT NULL,  -- ULID
  TenantId   NVARCHAR(128) NOT NULL,
  CaseId     NVARCHAR(64)  NOT NULL,
  Reason     NVARCHAR(256) NOT NULL,
  Note       NVARCHAR(1024) NULL,
  State      NVARCHAR(16)  NOT NULL,  -- Active|Released|Expired
  ScopeJson  NVARCHAR(MAX) NOT NULL,  -- JSON (LegalHoldScope)
  AppliesTo  NVARCHAR(16)  NOT NULL,  -- Existing|Future|Both
  PlacedAt   DATETIME2(0)  NOT NULL,
  PlacedBy   NVARCHAR(256) NOT NULL,  -- JSON (ActorRef)
  ReleasedAt DATETIME2(0)  NULL,
  ReleasedBy NVARCHAR(256) NULL,      -- JSON (ActorRef)
  ExpiresAt  DATETIME2(0)  NULL,
  Version    INT           NOT NULL,
  CONSTRAINT PK_LegalHolds PRIMARY KEY (HoldId),
  INDEX IX_LegalHolds_Tenant_State (TenantId, State),
  INDEX IX_LegalHolds_ExpiresAt (ExpiresAt)
);

Optionally materialize a membership table to snapshot matches for reporting or export:

CREATE TABLE dbo.LegalHoldAssignments (
  HoldId        CHAR(26)     NOT NULL,
  AuditRecordId CHAR(26)     NOT NULL,
  TenantId      NVARCHAR(128) NOT NULL,
  AssignedAt    DATETIME2(0) NOT NULL,
  UnassignedAt  DATETIME2(0) NULL,
  CONSTRAINT PK_LegalHoldAssignments PRIMARY KEY (HoldId, AuditRecordId),
  INDEX IX_LegalHoldAssignments_Tenant (TenantId),
  INDEX IX_LegalHoldAssignments_Record (AuditRecordId)
);

The assignment table is maintained by a background matcher that:

  • On place: backfills existing matches per scope and begins streaming future records.
  • On release/expire: sets UnassignedAt for active assignments.

Budgets & caps

  • Max active holds per tenant: 1,000.
  • Max resourceTypes/actions per scope: 64 each.
  • note ≤ 1024 chars.
  • timeRange width is unbounded, but large ranges increase backfill cost—prefer explicit ranges per case.

Validation rules (summary)

  • holdId ULID format; tenantId token pattern.
  • Immutable after placement: tenantId, caseId, reason, scope, appliesTo.
  • releasedAt requires releasedBy; both set together.
  • When expiresAt passes, state auto-transitions to Expired (idempotent).
  • Matching uses UTC and the specified timeRange.anchor.

Export Models & Manifests

Defines the Export domain: the long-running ExportJob that selects and packages records, the per-package ExportManifest (verifiable metadata + integrity), and the delivery/signature envelopes. Exports respect Legal Holds, Retention, and the effective Redaction Plan.


Overview

  • Job orchestration: A durable ExportJob scans by filter/window and emits Packages (shards) for parallel delivery.
  • Deterministic packages: Each package ships data (e.g., JSONL) plus a signed manifest with content hashes and integrity roots.
  • Resume safety: Jobs are resumable via a compact resumeToken (ULID high-watermark + time watermarks).
  • Compliance: Exports may include integrity proofs (block/segment roots + signature) for end-to-end verification.
  • Redaction: Data materializes under a specific Redaction Plan (policy id/revision).

ExportJob

JSON (lowerCamel) C# (PascalCase) Type Req. Description
jobId JobId ULID Unique job id.
tenantId TenantId string Tenant scope.
createdAt CreatedAt timestamp Job creation (UTC).
createdBy CreatedBy ActorRef Requestor (minimal).
state State enum Pending | Running | Paused | Completed | Failed | Canceled.
stateReason StateReason string Short human reason (≤256).
filter Filter ExportFilter Selection (see below).
format Format enum Jsonl | Parquet (default Jsonl).
compression Compression enum None | Gzip (default Gzip).
encryption Encryption object Package encryption (see below).
includeIntegrity IncludeIntegrity bool Include proofs (default true).
sealedThrough SealedThrough timestamp Only include records whose IntegrityBlock.SealedAt ≤ sealedThrough.
redactionPlan RedactionPlan object { "id": string, "revision": int }.
delivery Delivery object Where/how to deliver (descriptor).
packageBytesTarget PackageBytesTarget int Target uncompressed bytes per package (e.g., 512 MiB).
maxPackages MaxPackages int Safety cap.
progress Progress object { "records": long, "bytes": long, "packages": int } (updated as job runs).
resumeToken ResumeToken string Opaque token to resume from checkpoint.
callbacks Callbacks array<Callback> Webhook notifications (events below).
lastUpdatedAt LastUpdatedAt timestamp Monotonic update time.
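
To illustrate how packageBytesTarget and maxPackages could interact, here is a small greedy planner. It is a sketch under stated assumptions: the function name is hypothetical, the target is treated as a soft ceiling, and a single record larger than the target is given its own package rather than being rejected.

```python
from typing import List, Optional


def plan_packages(record_sizes: List[int], package_bytes_target: int,
                  max_packages: Optional[int] = None) -> List[List[int]]:
    """Group record indexes into packages of at most ~package_bytes_target bytes."""
    packages: List[List[int]] = []
    current: List[int] = []
    current_bytes = 0
    for index, size in enumerate(record_sizes):
        # Close the current package when the next record would overshoot the target.
        if current and current_bytes + size > package_bytes_target:
            packages.append(current)
            current, current_bytes = [], 0
        current.append(index)
        current_bytes += size
    if current:
        packages.append(current)
    if max_packages is not None and len(packages) > max_packages:
        raise ValueError(f"plan needs {len(packages)} packages, cap is {max_packages}")
    return packages


plan = plan_packages([40, 40, 40, 100, 10], package_bytes_target=100)
```

Records stay in input order, so per-package bounds such as minimum and maximum record id remain contiguous.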

ExportFilter

  • timeRange: { "from": ts?, "to": ts?, "anchor": "CreatedAt"|"ObservedAt"|"EffectiveAt" }
  • resourceTypes — array of PascalCase (supports suffix *)
  • actions — array of verb or verb.noun (supports suffix *)
  • attributes — map of key→value/glob (match against AuditRecord.attributes)
  • dataClasses — array of DataClass (match if any present on record/delta)
  • legalHoldOnly — bool (return only records currently under hold)

Encryption (per package)

  • scheme: None | AES256-GCM
  • keyId — KMS/KeyVault/JWKS id of wrapping key (when AES256-GCM)
  • wrappedKey — base64, envelope-encrypted DEK (writer only; optional in job)

Delivery (descriptor)

  • kind: S3 | GCS | AzureBlob | Sftp | HttpsCallback
  • path: bucket/container + prefix or remote path
  • credentialsRef: secret reference (never inline secrets)
  • callback (for HttpsCallback): { "url": "...", "auth": { "kind": "Hmac", "secretRef": "...", "header": "X-Signature" } }
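
For the Hmac callback auth kind, a sender and receiver could compute and check the signature header roughly as follows. The descriptor names only the kind and the header, so the digest (HMAC-SHA256), the hex encoding, and the payload fields shown here are assumptions for illustration.

```python
import hashlib
import hmac
import json


def sign_callback(body: bytes, secret: bytes) -> str:
    """Return a hex HMAC-SHA256 over the exact bytes that go on the wire."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_callback(body: bytes, secret: bytes, signature: str) -> bool:
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(sign_callback(body, secret), signature)


# The secret would be resolved from secretRef at runtime, never inlined in the job.
secret = b"resolved-from-secretRef"
body = json.dumps({"event": "ExportPackage.Ready", "packageIndex": 0},
                  separators=(",", ":")).encode("utf-8")
headers = {"X-Signature": sign_callback(body, secret)}
```

Signing the serialized bytes, not the parsed object, keeps the check independent of JSON key ordering or whitespace on the receiving side.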

Callback events

  • ExportJob.Started|Paused|Resumed|Completed|Failed|Canceled
  • ExportPackage.Ready|Delivered|Failed

JSON Schemas (v1)

export-job.v1.json

{
  "$id": "urn:connectsoft:schemas:export/export-job.v1.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "ExportJob",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "jobId": { "type": "string", "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$" },
    "tenantId": { "type": "string", "pattern": "^[A-Za-z0-9._-]{1,128}$" },
    "createdAt": { "type": "string", "format": "date-time" },
    "createdBy": { "$ref": "urn:connectsoft:schemas:partials/actor-ref.v1.json" },
    "state": { "type": "string", "enum": ["Pending","Running","Paused","Completed","Failed","Canceled"] },
    "stateReason": { "type": "string", "maxLength": 256 },
    "filter": { "$ref": "urn:connectsoft:schemas:export/export-filter.v1.json" },
    "format": { "type": "string", "enum": ["Jsonl","Parquet"], "default": "Jsonl" },
    "compression": { "type": "string", "enum": ["None","Gzip"], "default": "Gzip" },
    "encryption": { "$ref": "urn:connectsoft:schemas:export/encryption.v1.json" },
    "includeIntegrity": { "type": "boolean", "default": true },
    "sealedThrough": { "type": "string", "format": "date-time" },
    "redactionPlan": {
      "type": "object",
      "additionalProperties": false,
      "properties": {
        "id": { "type": "string", "maxLength": 128 },
        "revision": { "type": "integer", "minimum": 1 }
      },
      "required": ["id","revision"]
    },
    "delivery": { "$ref": "urn:connectsoft:schemas:export/delivery.v1.json" },
    "packageBytesTarget": { "type": "integer", "minimum": 1 },
    "maxPackages": { "type": "integer", "minimum": 1 },
    "progress": {
      "type": "object",
      "additionalProperties": false,
      "properties": {
        "records": { "type": "integer", "minimum": 0 },
        "bytes": { "type": "integer", "minimum": 0 },
        "packages": { "type": "integer", "minimum": 0 }
      }
    },
    "resumeToken": { "type": "string", "maxLength": 256 },
    "callbacks": {
      "type": "array",
      "items": { "$ref": "urn:connectsoft:schemas:export/callback.v1.json" },
      "maxItems": 8
    },
    "lastUpdatedAt": { "type": "string", "format": "date-time" }
  },
  "required": ["jobId","tenantId","createdAt","createdBy","state","filter","format","delivery","redactionPlan","lastUpdatedAt"]
}

export-filter.v1.json

{
  "$id": "urn:connectsoft:schemas:export/export-filter.v1.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "ExportFilter",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "timeRange": {
      "type": "object",
      "additionalProperties": false,
      "properties": {
        "from": { "type": "string", "format": "date-time" },
        "to": { "type": "string", "format": "date-time" },
        "anchor": { "type": "string", "enum": ["CreatedAt","ObservedAt","EffectiveAt"], "default": "CreatedAt" }
      }
    },
    "resourceTypes": { "type": "array", "items": { "type": "string" }, "maxItems": 64 },
    "actions": { "type": "array", "items": { "type": "string" }, "maxItems": 64 },
    "attributes": { "type": "object", "additionalProperties": { "type": "string" } },
    "dataClasses": { "type": "array", "items": { "$ref": "urn:connectsoft:schemas:policy/data-class.v1.json" }, "maxItems": 16 },
    "legalHoldOnly": { "type": "boolean", "default": false }
  }
}

delivery.v1.json

{
  "$id": "urn:connectsoft:schemas:export/delivery.v1.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Delivery",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "kind": { "type": "string", "enum": ["S3","GCS","AzureBlob","Sftp","HttpsCallback"] },
    "path": { "type": "string", "maxLength": 512 },
    "credentialsRef": { "type": "string", "maxLength": 128 },
    "callback": {
      "type": "object",
      "additionalProperties": false,
      "properties": {
        "url": { "type": "string", "format": "uri" },
        "auth": {
          "type": "object",
          "additionalProperties": false,
          "properties": {
            "kind": { "type": "string", "enum": ["Hmac"] },
            "secretRef": { "type": "string", "maxLength": 128 },
            "header": { "type": "string", "maxLength": 64 }
          }
        }
      }
    }
  },
  "required": ["kind","path"]
}

encryption.v1.json

{
  "$id": "urn:connectsoft:schemas:export/encryption.v1.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Encryption",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "scheme": { "type": "string", "enum": ["None","AES256-GCM"], "default": "None" },
    "keyId": { "type": "string", "maxLength": 128 },
    "wrappedKey": { "type": "string", "contentEncoding": "base64" }
  }
}

callback.v1.json

{
  "$id": "urn:connectsoft:schemas:export/callback.v1.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Callback",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "event": { "type": "string" },
    "url": { "type": "string", "format": "uri" },
    "hmacSecretRef": { "type": "string", "maxLength": 128 },
    "header": { "type": "string", "maxLength": 64 }
  },
  "required": ["event","url"]
}

ExportManifest (per package)

Describes the payload file(s), their hashes, package boundaries, and integrity proofs (if included). The manifest itself can be signed.

Field (JSON) Type Req. Description
schemaVersion string e.g., export-manifest.v1.
jobId ULID Back-reference to job.
packageId ULID Unique package id.
tenantId string Tenant scope.
createdAt timestamp Package creation time (UTC).
packageIndex int 0-based sequence within job.
packageCount int Total packages (known when job completes).
format enum Jsonl | Parquet.
compression enum None | Gzip.
encryption object Same shape as job (final applied).
redactionPlan object { "id": string, "revision": int }.
recordCount long Number of records in package.
bytesUncompressed long Sum of raw bytes.
content array<ContentFile> One or more files (shards) in this package.
bounds object { "minRecordId": ULID, "maxRecordId": ULID, "from": ts?, "to": ts? }.
integrity object { "blocks": [...], "segments": [...] } minimal bundle (see below).
contentHash string SHA256 over concatenated content files (post-compression, pre-encryption).
signature Signature Detached signature of manifest JSON canonical bytes.

ContentFile

  • name (string) — filename (e.g., export_<jobId>_<index>.jsonl.gz)
  • uri (string) — delivery URI (s3://… or https://…)
  • bytes (long) — size of the stored file
  • records (long) — number of records in file
  • sha256 (string) — SHA256 of the stored file bytes
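
The per-file sha256 and the manifest-level contentHash can be computed with the standard library alone. In the sketch below the helper names are hypothetical, and the files are assumed to be concatenated in the order they appear in content.

```python
import hashlib
from typing import Iterable


def file_sha256(data: bytes) -> str:
    """Lowercase hex digest, matching the ^[a-f0-9]{64}$ pattern in the schema."""
    return hashlib.sha256(data).hexdigest()


def content_hash(files: Iterable[bytes]) -> str:
    """SHA-256 over the content files concatenated in manifest order."""
    digest = hashlib.sha256()
    for data in files:
        digest.update(data)  # streaming update equals hashing the concatenation
    return digest.hexdigest()


def content_entry(name: str, uri: str, data: bytes, records: int) -> dict:
    return {"name": name, "uri": uri, "bytes": len(data),
            "records": records, "sha256": file_sha256(data)}


shards = [b'{"id":1}\n', b'{"id":2}\n']
manifest_hash = content_hash(shards)
```

For large shards the same update loop can be fed fixed-size chunks read from disk, so no file needs to be held in memory in full.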

integrity bundle (minimal, to validate records within package)

  • segments: array of { "segmentId": ULID, "rootHash": hex64, "blockId": ULID } (deduplicated)
  • blocks: array of { "blockId": ULID, "blockRoot": hex64, "prevBlockRoot": hex64, "signature": { "scheme": "Ed25519"|"PKCS7", "value": base64 }, "signingKeyId": string }
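
Before verifying signatures, a consumer may want a cheap structural check of the bundle. The sketch below is one possible approach, with a hypothetical function name: it reports segments that point at blocks missing from the bundle and flags adjacent blocks (in ULID order) whose prevBlockRoot does not equal the preceding blockRoot. Because the bundle is minimal, an unlinked pair may simply mean intermediate blocks were omitted, so it is a hint, not proof of tampering. Signature verification is out of scope here.

```python
from typing import Dict, List


def check_bundle(integrity: dict) -> Dict[str, List[str]]:
    # ULIDs sort lexicographically by creation time, so this yields chain order.
    blocks = sorted(integrity.get("blocks", []), key=lambda b: b["blockId"])
    known = {b["blockId"] for b in blocks}
    dangling = [s["segmentId"] for s in integrity.get("segments", [])
                if s["blockId"] not in known]
    unlinked = [cur["blockId"] for prev, cur in zip(blocks, blocks[1:])
                if cur["prevBlockRoot"] != prev["blockRoot"]]
    return {"danglingSegments": dangling, "unlinkedBlocks": unlinked}


# Placeholder ids and hashes, for illustration only.
bundle = {
    "blocks": [
        {"blockId": "B1", "blockRoot": "aa" * 32, "prevBlockRoot": "00" * 32},
        {"blockId": "B2", "blockRoot": "bb" * 32, "prevBlockRoot": "aa" * 32},
    ],
    "segments": [
        {"segmentId": "S1", "rootHash": "cc" * 32, "blockId": "B1"},
        {"segmentId": "S2", "rootHash": "dd" * 32, "blockId": "B9"},
    ],
}
report = check_bundle(bundle)
```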

Signature

  • scheme: Ed25519 | PKCS7
  • value: base64 detached signature

Manifest file naming

  • export_<jobId>_<packageIndex>.manifest.json (optionally .sig alongside for detached signature)

export-manifest.v1.json

{
  "$id": "urn:connectsoft:schemas:export/export-manifest.v1.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "ExportManifest",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "schemaVersion": { "type": "string", "pattern": "^export-manifest\\.v[0-9]+$" },
    "jobId": { "type": "string", "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$" },
    "packageId": { "type": "string", "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$" },
    "tenantId": { "type": "string", "pattern": "^[A-Za-z0-9._-]{1,128}$" },
    "createdAt": { "type": "string", "format": "date-time" },
    "packageIndex": { "type": "integer", "minimum": 0 },
    "packageCount": { "type": "integer", "minimum": 1 },
    "format": { "type": "string", "enum": ["Jsonl","Parquet"] },
    "compression": { "type": "string", "enum": ["None","Gzip"] },
    "encryption": { "$ref": "urn:connectsoft:schemas:export/encryption.v1.json" },
    "redactionPlan": {
      "type": "object",
      "properties": { "id": { "type": "string" }, "revision": { "type": "integer" } },
      "required": ["id","revision"],
      "additionalProperties": false
    },
    "recordCount": { "type": "integer", "minimum": 0 },
    "bytesUncompressed": { "type": "integer", "minimum": 0 },
    "content": {
      "type": "array",
      "items": {
        "type": "object", "additionalProperties": false,
        "properties": {
          "name": { "type": "string" },
          "uri":  { "type": "string" },
          "bytes": { "type": "integer", "minimum": 0 },
          "records": { "type": "integer", "minimum": 0 },
          "sha256": { "type": "string", "pattern": "^[a-f0-9]{64}$" }
        },
        "required": ["name","uri","bytes","records","sha256"]
      }
    },
    "bounds": {
      "type": "object",
      "additionalProperties": false,
      "properties": {
        "minRecordId": { "type": "string", "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$" },
        "maxRecordId": { "type": "string", "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$" },
        "from": { "type": "string", "format": "date-time" },
        "to": { "type": "string", "format": "date-time" }
      },
      "required": ["minRecordId","maxRecordId"]
    },
    "integrity": {
      "type": "object",
      "additionalProperties": false,
      "properties": {
        "segments": {
          "type": "array",
          "items": {
            "type": "object", "additionalProperties": false,
            "properties": {
              "segmentId": { "type": "string", "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$" },
              "rootHash": { "type": "string", "pattern": "^[a-f0-9]{64}$" },
              "blockId": { "type": "string", "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$" }
            },
            "required": ["segmentId","rootHash","blockId"]
          }
        },
        "blocks": {
          "type": "array",
          "items": {
            "type": "object", "additionalProperties": false,
            "properties": {
              "blockId": { "type": "string", "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$" },
              "blockRoot": { "type": "string", "pattern": "^[a-f0-9]{64}$" },
              "prevBlockRoot": { "type": "string", "pattern": "^[a-f0-9]{64}$" },
              "signature": {
                "type": "object", "additionalProperties": false,
                "properties": {
                  "scheme": { "type": "string", "enum": ["Ed25519","PKCS7"] },
                  "value": { "type": "string", "contentEncoding": "base64" }
                },
                "required": ["scheme","value"]
              },
              "signingKeyId": { "type": "string" }
            },
            "required": ["blockId","blockRoot","prevBlockRoot","signature","signingKeyId"]
          }
        }
      }
    },
    "contentHash": { "type": "string", "pattern": "^[a-f0-9]{64}$" },
    "signature": {
      "type": "object", "additionalProperties": false,
      "properties": {
        "scheme": { "type": "string", "enum": ["Ed25519","PKCS7"] },
        "value": { "type": "string", "contentEncoding": "base64" }
      }
    }
  },
  "required": ["schemaVersion","jobId","packageId","tenantId","createdAt","packageIndex","format","compression","redactionPlan","recordCount","bytesUncompressed","content","bounds","contentHash"]
}

C# (gRPC code-first)

[DataContract]
public sealed class ExportJob
{
    [DataMember(Order = 1)]  public string JobId { get; init; } = default!;      // ULID
    [DataMember(Order = 2)]  public string TenantId { get; init; } = default!;
    [DataMember(Order = 3)]  public DateTimeOffset CreatedAt { get; init; }
    [DataMember(Order = 4)]  public ActorRef CreatedBy { get; init; } = new();
    [DataMember(Order = 5)]  public ExportJobState State { get; init; } = ExportJobState.Pending;
    [DataMember(Order = 6)]  public string? StateReason { get; init; }

    [DataMember(Order = 7)]  public ExportFilter Filter { get; init; } = new();
    [DataMember(Order = 8)]  public ExportFormat Format { get; init; } = ExportFormat.Jsonl;
    [DataMember(Order = 9)]  public Compression Compression { get; init; } = Compression.Gzip;
    [DataMember(Order = 10)] public Encryption? Encryption { get; init; }
    [DataMember(Order = 11)] public bool IncludeIntegrity { get; init; } = true;
    [DataMember(Order = 12)] public DateTimeOffset? SealedThrough { get; init; }
    [DataMember(Order = 13)] public RedactionPlanRef RedactionPlan { get; init; } = new();
    [DataMember(Order = 14)] public Delivery Delivery { get; init; } = new();

    [DataMember(Order = 15)] public int? PackageBytesTarget { get; init; }
    [DataMember(Order = 16)] public int? MaxPackages { get; init; }
    [DataMember(Order = 17)] public ExportProgress? Progress { get; init; }
    [DataMember(Order = 18)] public string? ResumeToken { get; init; }
    [DataMember(Order = 19)] public IReadOnlyList<Callback>? Callbacks { get; init; }
    [DataMember(Order = 20)] public DateTimeOffset LastUpdatedAt { get; init; }
}

public enum ExportJobState { Pending=0, Running=1, Paused=2, Completed=3, Failed=4, Canceled=5 }
public enum ExportFormat { Jsonl=0, Parquet=1 }
public enum Compression { None=0, Gzip=1 }

[DataContract] public sealed class RedactionPlanRef { [DataMember(Order = 1)] public string Id { get; init; } = default!; [DataMember(Order = 2)] public int Revision { get; init; } }
[DataContract] public sealed class ExportProgress { [DataMember(Order = 1)] public long Records { get; init; } [DataMember(Order = 2)] public long Bytes { get; init; } [DataMember(Order = 3)] public int Packages { get; init; } }

[DataContract]
public sealed class ExportFilter
{
    [DataMember(Order = 1)] public TimeRange? TimeRange { get; init; }
    [DataMember(Order = 2)] public IReadOnlyList<string>? ResourceTypes { get; init; }
    [DataMember(Order = 3)] public IReadOnlyList<string>? Actions { get; init; }
    [DataMember(Order = 4)] public IReadOnlyDictionary<string,string>? Attributes { get; init; }
    [DataMember(Order = 5)] public IReadOnlyList<DataClass>? DataClasses { get; init; }
    [DataMember(Order = 6)] public bool? LegalHoldOnly { get; init; }
}

[DataContract] public sealed class Encryption { [DataMember(Order = 1)] public string Scheme { get; init; } = "None"; [DataMember(Order = 2)] public string? KeyId { get; init; } [DataMember(Order = 3)] public string? WrappedKey { get; init; } }

[DataContract]
public sealed class Delivery
{
    [DataMember(Order = 1)] public string Kind { get; init; } = default!;   // "S3"|"GCS"|...
    [DataMember(Order = 2)] public string Path { get; init; } = default!;
    [DataMember(Order = 3)] public string? CredentialsRef { get; init; }
    [DataMember(Order = 4)] public DeliveryCallback? Callback { get; init; }
}
[DataContract] public sealed class DeliveryCallback { [DataMember(Order = 1)] public string Url { get; init; } = default!; [DataMember(Order = 2)] public HmacAuth? Auth { get; init; } }
[DataContract] public sealed class HmacAuth { [DataMember(Order = 1)] public string Kind { get; init; } = "Hmac"; [DataMember(Order = 2)] public string SecretRef { get; init; } = default!; [DataMember(Order = 3)] public string Header { get; init; } = "X-Signature"; }
[DataContract] public sealed class Callback { [DataMember(Order = 1)] public string Event { get; init; } = default!; [DataMember(Order = 2)] public string Url { get; init; } = default!; [DataMember(Order = 3)] public string? HmacSecretRef { get; init; } [DataMember(Order = 4)] public string? Header { get; init; } }

Manifest

[DataContract]
public sealed class ExportManifest
{
    [DataMember(Order = 1)]  public string SchemaVersion { get; init; } = "export-manifest.v1";
    [DataMember(Order = 2)]  public string JobId { get; init; } = default!;
    [DataMember(Order = 3)]  public string PackageId { get; init; } = default!;
    [DataMember(Order = 4)]  public string TenantId { get; init; } = default!;
    [DataMember(Order = 5)]  public DateTimeOffset CreatedAt { get; init; }
    [DataMember(Order = 6)]  public int PackageIndex { get; init; }
    [DataMember(Order = 7)]  public int? PackageCount { get; init; }
    [DataMember(Order = 8)]  public ExportFormat Format { get; init; } = ExportFormat.Jsonl;
    [DataMember(Order = 9)]  public Compression Compression { get; init; } = Compression.Gzip;
    [DataMember(Order = 10)] public Encryption? Encryption { get; init; }
    [DataMember(Order = 11)] public RedactionPlanRef RedactionPlan { get; init; } = new();
    [DataMember(Order = 12)] public long RecordCount { get; init; }
    [DataMember(Order = 13)] public long BytesUncompressed { get; init; }
    [DataMember(Order = 14)] public IReadOnlyList<ContentFile> Content { get; init; } = Array.Empty<ContentFile>();
    [DataMember(Order = 15)] public ExportBounds Bounds { get; init; } = new();
    [DataMember(Order = 16)] public IntegrityBundle? Integrity { get; init; }
    [DataMember(Order = 17)] public string ContentHash { get; init; } = default!;
    [DataMember(Order = 18)] public Signature? Signature { get; init; }
}

[DataContract] public sealed class ContentFile { [DataMember(Order = 1)] public string Name { get; init; } = default!; [DataMember(Order = 2)] public string Uri { get; init; } = default!; [DataMember(Order = 3)] public long Bytes { get; init; } [DataMember(Order = 4)] public long Records { get; init; } [DataMember(Order = 5)] public string Sha256 { get; init; } = default!; }
[DataContract] public sealed class ExportBounds { [DataMember(Order = 1)] public string MinRecordId { get; init; } = default!; [DataMember(Order = 2)] public string MaxRecordId { get; init; } = default!; [DataMember(Order = 3)] public DateTimeOffset? From { get; init; } [DataMember(Order = 4)] public DateTimeOffset? To { get; init; } }
[DataContract] public sealed class IntegrityBundle { [DataMember(Order = 1)] public IReadOnlyList<SegmentRef>? Segments { get; init; } [DataMember(Order = 2)] public IReadOnlyList<BlockRef>? Blocks { get; init; } }
[DataContract] public sealed class SegmentRef { [DataMember(Order = 1)] public string SegmentId { get; init; } = default!; [DataMember(Order = 2)] public string RootHash { get; init; } = default!; [DataMember(Order = 3)] public string BlockId { get; init; } = default!; }
[DataContract] public sealed class BlockRef { [DataMember(Order = 1)] public string BlockId { get; init; } = default!; [DataMember(Order = 2)] public string BlockRoot { get; init; } = default!; [DataMember(Order = 3)] public string PrevBlockRoot { get; init; } = default!; [DataMember(Order = 4)] public Signature Signature { get; init; } = new(); [DataMember(Order = 5)] public string SigningKeyId { get; init; } = default!; }
[DataContract] public sealed class Signature { [DataMember(Order = 1)] public string Scheme { get; init; } = "Ed25519"; [DataMember(Order = 2)] public string Value { get; init; } = default!; }

Protobuf (optional emission)

syntax = "proto3";
package connectsoft.export.v1;

import "google/protobuf/timestamp.proto";

message ExportJob {
  string JobId = 1 [json_name = "jobId"];
  string TenantId = 2 [json_name = "tenantId"];
  google.protobuf.Timestamp CreatedAt = 3 [json_name = "createdAt"];
  ActorRef CreatedBy = 4 [json_name = "createdBy"];
  string State = 5 [json_name = "state"]; // Pending|Running|...
  string StateReason = 6 [json_name = "stateReason"];
  ExportFilter Filter = 7 [json_name = "filter"];
  string Format = 8 [json_name = "format"]; // Jsonl|Parquet
  string Compression = 9 [json_name = "compression"]; // None|Gzip
  Encryption Encryption = 10 [json_name = "encryption"];
  bool IncludeIntegrity = 11 [json_name = "includeIntegrity"];
  google.protobuf.Timestamp SealedThrough = 12 [json_name = "sealedThrough"];
  RedactionPlanRef RedactionPlan = 13 [json_name = "redactionPlan"];
  Delivery Delivery = 14 [json_name = "delivery"];
  int32 PackageBytesTarget = 15 [json_name = "packageBytesTarget"];
  int32 MaxPackages = 16 [json_name = "maxPackages"];
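  // Field numbers 17 (Progress) and 19 (Callbacks) are left unused here so the numbers below match the C# DataMember orders.
  string ResumeToken = 18 [json_name = "resumeToken"];
  google.protobuf.Timestamp LastUpdatedAt = 20 [json_name = "lastUpdatedAt"];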
}

message ExportFilter {
  TimeRange TimeRange = 1 [json_name = "timeRange"];
  repeated string ResourceTypes = 2 [json_name = "resourceTypes"];
  repeated string Actions = 3 [json_name = "actions"];
  map<string,string> Attributes = 4 [json_name = "attributes"];
  repeated string DataClasses = 5 [json_name = "dataClasses"];
  bool LegalHoldOnly = 6 [json_name = "legalHoldOnly"];
}

message ExportManifest {
  string SchemaVersion = 1 [json_name = "schemaVersion"];
  string JobId = 2 [json_name = "jobId"];
  string PackageId = 3 [json_name = "packageId"];
  string TenantId = 4 [json_name = "tenantId"];
  google.protobuf.Timestamp CreatedAt = 5 [json_name = "createdAt"];
  int32 PackageIndex = 6 [json_name = "packageIndex"];
  int32 PackageCount = 7 [json_name = "packageCount"];
  string Format = 8 [json_name = "format"];
  string Compression = 9 [json_name = "compression"];
  Encryption Encryption = 10 [json_name = "encryption"];
  RedactionPlanRef RedactionPlan = 11 [json_name = "redactionPlan"];
  int64 RecordCount = 12 [json_name = "recordCount"];
  int64 BytesUncompressed = 13 [json_name = "bytesUncompressed"];
  repeated ContentFile Content = 14 [json_name = "content"];
  ExportBounds Bounds = 15 [json_name = "bounds"];
  IntegrityBundle Integrity = 16 [json_name = "integrity"];
  string ContentHash = 17 [json_name = "contentHash"];
  Signature Signature = 18 [json_name = "signature"];
}

message ContentFile { string Name = 1 [json_name = "name"]; string Uri = 2 [json_name = "uri"]; int64 Bytes = 3 [json_name = "bytes"]; int64 Records = 4 [json_name = "records"]; string Sha256 = 5 [json_name = "sha256"]; }
message ExportBounds { string MinRecordId = 1 [json_name = "minRecordId"]; string MaxRecordId = 2 [json_name = "maxRecordId"]; google.protobuf.Timestamp From = 3 [json_name = "from"]; google.protobuf.Timestamp To = 4 [json_name = "to"]; }
message IntegrityBundle { repeated SegmentRef Segments = 1 [json_name = "segments"]; repeated BlockRef Blocks = 2 [json_name = "blocks"]; }
message SegmentRef { string SegmentId = 1 [json_name = "segmentId"]; string RootHash = 2 [json_name = "rootHash"]; string BlockId = 3 [json_name = "blockId"]; }
message BlockRef { string BlockId = 1 [json_name = "blockId"]; string BlockRoot = 2 [json_name = "blockRoot"]; string PrevBlockRoot = 3 [json_name = "prevBlockRoot"]; Signature Signature = 4 [json_name = "signature"]; string SigningKeyId = 5 [json_name = "signingKeyId"]; }

message Encryption { string Scheme = 1 [json_name = "scheme"]; string KeyId = 2 [json_name = "keyId"]; string WrappedKey = 3 [json_name = "wrappedKey"]; }
message RedactionPlanRef { string Id = 1 [json_name = "id"]; int32 Revision = 2 [json_name = "revision"]; }
message Delivery { string Kind = 1 [json_name = "kind"]; string Path = 2 [json_name = "path"]; string CredentialsRef = 3 [json_name = "credentialsRef"]; }
message Signature { string Scheme = 1 [json_name = "scheme"]; string Value = 2 [json_name = "value"]; }

Examples

Job request (JSON)

{
  "jobId": "01JE5N3WTQ4J9V7M4A1ZP6D9TQ",
  "tenantId": "splootvets",
  "createdAt": "2025-10-22T15:00:00Z",
  "createdBy": { "id": "user_321", "type": "User", "display": "Ops Analyst" },
  "state": "Pending",
  "filter": {
    "timeRange": { "from": "2025-10-01T00:00:00Z", "to": "2025-10-21T23:59:59Z", "anchor": "CreatedAt" },
    "resourceTypes": ["Vetspire.Appointment"],
    "actions": ["appointment.*"],
    "dataClasses": ["Personal","Sensitive"]
  },
  "format": "Jsonl",
  "compression": "Gzip",
  "includeIntegrity": true,
  "sealedThrough": "2025-10-21T23:59:59Z",
  "redactionPlan": { "id": "policy-default", "revision": 3 },
  "delivery": { "kind": "S3", "path": "s3://exports/splootvets/2025-10/" },
  "packageBytesTarget": 536870912
}

Manifest (per package)

{
  "schemaVersion": "export-manifest.v1",
  "jobId": "01JE5N3WTQ4J9V7M4A1ZP6D9TQ",
  "packageId": "01JE5N7C3Q9Z2K8R1V0M5D4N6P",
  "tenantId": "splootvets",
  "createdAt": "2025-10-22T15:10:12Z",
  "packageIndex": 0,
  "format": "Jsonl",
  "compression": "Gzip",
  "redactionPlan": { "id": "policy-default", "revision": 3 },
  "recordCount": 125000,
  "bytesUncompressed": 412345678,
  "content": [
    {
      "name": "export_01JE5N3WTQ_000.jsonl.gz",
      "uri": "s3://exports/splootvets/2025-10/export_01JE5N3WTQ_000.jsonl.gz",
      "bytes": 98765432,
      "records": 125000,
      "sha256": "9e1d4c0b...ab7f"
    }
  ],
  "bounds": {
    "minRecordId": "01JDZZZZZZZZZZZZZZZZZZZZZZ",
    "maxRecordId": "01JE0000000000000000000000",
    "from": "2025-10-01T00:00:00Z",
    "to": "2025-10-10T00:00:00Z"
  },
  "integrity": {
    "segments": [
      { "segmentId": "01JE5H....1A", "rootHash": "ab12...ff", "blockId": "01JE5H....B1" }
    ],
    "blocks": [
      {
        "blockId": "01JE5H....B1",
        "blockRoot": "a0b1c2...9d",
        "prevBlockRoot": "90fe12...aa",
        "signature": { "scheme": "Ed25519", "value": "MEYCIQ..." },
        "signingKeyId": "kv:prod/atp-integrity/ed25519-2025-01"
      }
    ]
  },
  "contentHash": "0a1b2c3d4e5f...fe",
  "signature": { "scheme": "Ed25519", "value": "MC4CFQ..." }
}

Package file (JSONL snippet)

{"auditRecordId":"01JE...","tenantId":"splootvets", "...": "..."}
{"auditRecordId":"01JE...","tenantId":"splootvets", "...": "..."}

State machine

stateDiagram-v2
  [*] --> Pending
  Pending --> Running: Start
  Running --> Paused: Pause
  Paused --> Running: Resume
  Running --> Completed: All packages delivered
  Running --> Failed: Error (with reason)
  Pending --> Canceled: Cancel
  Running --> Canceled: Cancel
  Paused --> Canceled: Cancel
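
The transitions in the diagram can be captured as a lookup table so that illegal state changes are rejected before they are persisted. The sketch below is illustrative Python rather than ATP code: ALLOWED_TRANSITIONS, can_transition, and transition are hypothetical names, and Completed, Failed, and Canceled are treated as terminal only because the diagram shows no edges leaving them.

```python
from enum import Enum


class ExportJobState(Enum):
    # Ordinals mirror the C# ExportJobState enum.
    PENDING = 0
    RUNNING = 1
    PAUSED = 2
    COMPLETED = 3
    FAILED = 4
    CANCELED = 5


S = ExportJobState

# One entry per state; the value is the set of states reachable in one step.
ALLOWED_TRANSITIONS = {
    S.PENDING: {S.RUNNING, S.CANCELED},
    S.RUNNING: {S.PAUSED, S.COMPLETED, S.FAILED, S.CANCELED},
    S.PAUSED: {S.RUNNING, S.CANCELED},
    S.COMPLETED: set(),  # assumed terminal
    S.FAILED: set(),     # assumed terminal
    S.CANCELED: set(),   # assumed terminal
}


def can_transition(current: ExportJobState, target: ExportJobState) -> bool:
    """Return True when the diagram contains an edge current -> target."""
    return target in ALLOWED_TRANSITIONS[current]


def transition(current: ExportJobState, target: ExportJobState) -> ExportJobState:
    """Return the new state, or raise if the diagram has no such edge."""
    if not can_transition(current, target):
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target
```

Keeping the table in one place makes it straightforward to compare against the diagram during review.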

Resume tokens

Opaque resumeToken encodes the last committed checkpoint:

  • lastRecordId (ULID) and watermark (UTC) of the anchor clock,
  • packageIndex and byte offset (for partial file resume) if supported,
  • HMAC for tamper detection.

Writers MUST treat resumeToken as opaque, verify its HMAC before resuming, and reject any token that does not advance monotonically from the last committed checkpoint.
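
As a rough illustration of the token rules above, the following Python sketch signs a checkpoint, verifies it, and checks monotonic advance. The wire layout (a 32-byte HMAC-SHA256 tag followed by compact JSON, base64url-encoded), the byteOffset field name, and all function names are assumptions made for this example; the actual ATP token format is not specified here.

```python
import base64
import hashlib
import hmac
import json


def encode_resume_token(checkpoint: dict, secret: bytes) -> str:
    """Serialize a checkpoint and prepend an HMAC-SHA256 tag (assumed layout)."""
    body = json.dumps(checkpoint, sort_keys=True, separators=(",", ":")).encode("utf-8")
    tag = hmac.new(secret, body, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(tag + body).decode("ascii")


def decode_resume_token(token: str, secret: bytes) -> dict:
    """Verify the tag in constant time, then return the checkpoint."""
    raw = base64.urlsafe_b64decode(token.encode("ascii"))
    tag, body = raw[:32], raw[32:]
    expected = hmac.new(secret, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("resume token failed HMAC validation")
    return json.loads(body)


def is_monotonic_advance(previous: dict, candidate: dict) -> bool:
    """ULIDs of equal length and case sort lexicographically in time order."""
    return (
        candidate["lastRecordId"] >= previous["lastRecordId"]
        and candidate["packageIndex"] >= previous["packageIndex"]
    )


checkpoint = {
    "lastRecordId": "01JE5N3WTQ4J9V7M4A1ZP6D9TQ",
    "watermark": "2025-10-21T23:59:59Z",
    "packageIndex": 0,
    "byteOffset": 0,  # hypothetical field name for partial-file resume
}
token = encode_resume_token(checkpoint, b"demo-secret")
restored = decode_resume_token(token, b"demo-secret")
```

Callers outside the export service would only ever see the token string; the checkpoint fields stay internal.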


Storage mapping

CREATE TABLE dbo.ExportJobs (
  JobId           CHAR(26)      NOT NULL,
  TenantId        NVARCHAR(128) NOT NULL,
  State           NVARCHAR(16)  NOT NULL,
  StateReason     NVARCHAR(256) NULL,
  CreatedAt       DATETIME2(0)  NOT NULL,
  CreatedBy       NVARCHAR(256) NOT NULL,   -- JSON (ActorRef)
  FilterJson      NVARCHAR(MAX) NOT NULL,
  Format          NVARCHAR(16)  NOT NULL,   -- Jsonl|Parquet
  Compression     NVARCHAR(16)  NOT NULL,   -- None|Gzip
  EncryptionJson  NVARCHAR(256) NULL,
  IncludeIntegrity BIT          NOT NULL DEFAULT 1,
  SealedThrough   DATETIME2(0)  NULL,
  RedactionPlan   NVARCHAR(64)  NOT NULL,   -- "id:revision"
  DeliveryJson    NVARCHAR(512) NOT NULL,
  PackageBytesTarget INT        NULL,
  MaxPackages     INT           NULL,
  ProgressJson    NVARCHAR(128) NULL,
  ResumeToken     NVARCHAR(256) NULL,
  LastUpdatedAt   DATETIME2(0)  NOT NULL,
  CONSTRAINT PK_ExportJobs PRIMARY KEY (JobId),
  INDEX IX_ExportJobs_Tenant_State (TenantId, State)
);

CREATE TABLE dbo.ExportPackages (
  PackageId        CHAR(26)     NOT NULL,
  JobId            CHAR(26)     NOT NULL,
  TenantId         NVARCHAR(128) NOT NULL,
  PackageIndex     INT          NOT NULL,
  ManifestJson     NVARCHAR(MAX) NOT NULL,
  DeliveredAt      DATETIME2(0)  NULL,
  DeliveryResult   NVARCHAR(512) NULL,     -- etag/url/etc
  CONSTRAINT PK_ExportPackages PRIMARY KEY (PackageId),
  CONSTRAINT FK_ExportPackages_Jobs FOREIGN KEY (JobId) REFERENCES dbo.ExportJobs(JobId),
  CONSTRAINT UQ_ExportPackages_Job_Index UNIQUE (JobId, PackageIndex)  -- packageIndex is unique per job
);

Validation rules (summary)

  • sealedThrough ≤ current time; only include records with IntegrityBlock.SealedAt ≤ sealedThrough when set.
  • Legal hold respected: records under active hold are excluded unless export purpose is a hold export (then include and label).
  • Determinism: contentHash and per-file sha256 must match delivered bytes; manifest signature verifies canonical JSON.
  • Redaction: exported records MUST reflect the specified redactionPlan; no raw PII beyond the plan.
  • Package bounds: minRecordId ≤ maxRecordId; packageIndex unique per job.
  • Encryption: when AES256-GCM, each file has its own nonce/IV; wrappedKey present or retrievable via keyId.
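
A consumer-side check of the determinism and bounds rules might look like the Python sketch below. It is a hedged illustration: check_manifest is a hypothetical helper, the files argument (file name to delivered bytes) stands in for whatever download mechanism is used, and the rule that per-file records sum to recordCount is an assumption drawn from the example manifest rather than a stated requirement. Signature and contentHash verification are out of scope here.

```python
import hashlib
from typing import Dict, List


def check_manifest(manifest: dict, files: Dict[str, bytes]) -> List[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems: List[str] = []

    # Package bounds: ULIDs compare lexicographically in time order.
    bounds = manifest["bounds"]
    if bounds["minRecordId"] > bounds["maxRecordId"]:
        problems.append("bounds: minRecordId is greater than maxRecordId")

    total_records = 0
    for entry in manifest["content"]:
        name = entry["name"]
        data = files.get(name)
        if data is None:
            problems.append(f"{name}: file was not delivered")
            continue
        # Determinism: size and SHA-256 must match the delivered bytes.
        if len(data) != entry["bytes"]:
            problems.append(f"{name}: byte length differs from manifest")
        if hashlib.sha256(data).hexdigest() != entry["sha256"]:
            problems.append(f"{name}: sha256 differs from manifest")
        total_records += entry["records"]

    # Assumption: per-file record counts add up to the package recordCount.
    if total_records != manifest["recordCount"]:
        problems.append("recordCount does not equal the sum of per-file records")
    return problems
```

Returning every problem instead of failing on the first one gives operators a complete picture of a damaged package in a single pass.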

Tenancy Keys & Partitioning

Defines the tenant identity (tenantId) and the partition/sharding strategies used to enforce isolation, enable predictable scalability, and satisfy data residency constraints across storage and compute.


Overview

  • Tenant-first: All authoritative writes and read models are keyed by tenantId. Cross-tenant joins are prohibited.
  • Predictable locality: Partition primarily by tenantId, secondarily by time (createdAt / ULID time) to keep pruning cheap.
  • Shard ring: For horizontally scaled backends, map tenants to logical shards via a stable, HMAC-based hashing scheme.
  • Row-Level Security (RLS): Enforce tenant isolation at the database layer via session-scoped predicates/policies.
  • Residency: Each tenant declares a home region and allowed regions; data placement honors these rules end-to-end.

tenantId rules

Aspect Rule
Shape Opaque ASCII token: ^[A-Za-z0-9._-]{1,128}$
Stability Immutable for the life of the tenant. No rename-in-place (use migration tooling if absolutely necessary).
Case Case-sensitive by default (treat as opaque); do not normalize at write.
Exposure Safe to appear in indexes, URIs, file paths. Do not embed secrets or PII.
Scope Unique within the platform (global).
Derivatives tenantScopedId = "<tenantId>:<type>:<id>" (see ResourceRef); tenantHash = HMAC-SHA256(tenantId, shardSecret) used for sharding only.

JSON uses lowerCamel (tenantId); database tables/columns use PascalCase (TenantId) per conventions.


Sharding & partitioning

Logical shard assignment

  • Compute tenantHash = HMAC-SHA256 over the UTF-8 bytes of tenantId, keyed with shardSecret, then:
    • shardId = uint32(first 4 bytes of tenantHash, read big-endian) % ringSize
    • ringVersion increments when the fleet is rebalanced; mapping is persisted for auditability.

Physical partitioning (authoritative store)

  • Partition key: TenantId
  • Secondary prune: CreatedAt (or UlidTime derived from AuditRecordId)
  • Indexes:
    • (TenantId, CreatedAt) for range scans
    • (TenantId, IdempotencyKey) filtered unique (when present)

Hotspot guidance

  • ULIDs are time-ordered; to avoid hot partitions, always prefix by TenantId and use time-bucketed partitions (e.g., monthly).
  • Large “whale” tenants may receive dedicated shards (explicit shardId override) while retaining the same logical model.
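
To make the "UlidTime derived from AuditRecordId" idea concrete, the Python sketch below decodes the 48-bit millisecond timestamp carried in the first ten Crockford Base32 characters of a ULID and derives a monthly bucket prefixed by tenant. The function names and the "YYYY-MM" bucket format are assumptions for illustration only.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

# Crockford Base32 alphabet used by ULID (no I, L, O, U).
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ulid_time(ulid: str) -> datetime:
    """Decode the millisecond timestamp from the first 10 characters."""
    if len(ulid) != 26:
        raise ValueError("a ULID has exactly 26 characters")
    millis = 0
    for ch in ulid[:10].upper():
        millis = millis * 32 + CROCKFORD.index(ch)  # ValueError on bad chars
    return EPOCH + timedelta(milliseconds=millis)


def partition_key(tenant_id: str, audit_record_id: str) -> Tuple[str, str]:
    """Tenant first, then a monthly time bucket (hypothetical bucket format)."""
    return tenant_id, ulid_time(audit_record_id).strftime("%Y-%m")
```

Because the tenant comes first in the key, time-ordered ULIDs from different tenants do not pile onto the same partition.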

Data residency

Residency policy (per tenant)

Field (JSON) | Type | Req. | Description
homeRegion | string | yes | Canonical region (e.g., eu-west-1, us-central).
allowedRegions | string[] | yes | Regions where data-at-rest may reside.
pinToHome | bool | no | If true, authoritative data is stored only in homeRegion.
replication | enum | no | None, AsyncCrossRegion, or MultiActive.
exceptions | object | no | Categories allowed to cross borders (e.g., "telemetry": "aggregated-only").

Readers/writers must respect residency at ingress, storage, index, backup, and export time.


JSON Schemas (partials, v1)

tenant-context.v1.json
Context carried internally to route requests and validate RLS.

{
  "$id": "urn:connectsoft:schemas:tenancy/tenant-context.v1.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "TenantContext",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "tenantId": { "type": "string", "pattern": "^[A-Za-z0-9._-]{1,128}$" },
    "shardId": { "type": "integer", "minimum": 0 },
    "ringVersion": { "type": "integer", "minimum": 1 },
    "homeRegion": { "type": "string", "maxLength": 32 },
    "effectiveRegion": { "type": "string", "maxLength": 32 }
  },
  "required": ["tenantId","shardId","ringVersion"]
}

residency-policy.v1.json

{
  "$id": "urn:connectsoft:schemas:tenancy/residency-policy.v1.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "TenantResidencyPolicy",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "tenantId": { "type": "string", "pattern": "^[A-Za-z0-9._-]{1,128}$" },
    "homeRegion": { "type": "string" },
    "allowedRegions": { "type": "array", "items": { "type": "string" }, "minItems": 1 },
    "pinToHome": { "type": "boolean", "default": false },
    "replication": { "type": "string", "enum": ["None","AsyncCrossRegion","MultiActive"], "default": "None" },
    "exceptions": { "type": "object", "additionalProperties": { "type": "string" } },
    "version": { "type": "integer", "minimum": 1 }
  },
  "required": ["tenantId","homeRegion","allowedRegions","version"]
}

shard-mapping.v1.json

{
  "$id": "urn:connectsoft:schemas:tenancy/shard-mapping.v1.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "ShardMapping",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "ringVersion": { "type": "integer", "minimum": 1 },
    "ringSize": { "type": "integer", "minimum": 1 },
    "assignments": {
      "type": "array",
      "items": {
        "type": "object",
        "additionalProperties": false,
        "properties": {
          "tenantId": { "type": "string", "pattern": "^[A-Za-z0-9._-]{1,128}$" },
          "shardId": { "type": "integer", "minimum": 0 }
        },
        "required": ["tenantId","shardId"]
      }
    }
  },
  "required": ["ringVersion","ringSize","assignments"]
}

C# (gRPC code-first)

[DataContract]
public sealed class TenantContext
{
    [DataMember(Order = 1)] public string TenantId { get; init; } = default!;
    [DataMember(Order = 2)] public int ShardId { get; init; }
    [DataMember(Order = 3)] public int RingVersion { get; init; }
    [DataMember(Order = 4)] public string? HomeRegion { get; init; }
    [DataMember(Order = 5)] public string? EffectiveRegion { get; init; }
}

[DataContract]
public sealed class TenantResidencyPolicy
{
    [DataMember(Order = 1)] public string TenantId { get; init; } = default!;
    [DataMember(Order = 2)] public string HomeRegion { get; init; } = default!;
    [DataMember(Order = 3)] public IReadOnlyList<string> AllowedRegions { get; init; } = Array.Empty<string>();
    [DataMember(Order = 4)] public bool PinToHome { get; init; }
    [DataMember(Order = 5)] public string Replication { get; init; } = "None"; // None|AsyncCrossRegion|MultiActive
    [DataMember(Order = 6)] public IReadOnlyDictionary<string,string>? Exceptions { get; init; }
    [DataMember(Order = 7)] public int Version { get; init; } = 1;
}

public static class ShardRing
{
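    // Derive a stable shard from tenantId:
    // HMAC-SHA256 keyed with shardSecret, first 4 bytes read big-endian, modulo ringSize.
    public static int ComputeShardId(string tenantId, int ringSize, byte[] shardSecret)
    {
        if (ringSize <= 0) throw new System.ArgumentOutOfRangeException(nameof(ringSize));

        using var hmac = new System.Security.Cryptography.HMACSHA256(shardSecret);
        var bytes = System.Text.Encoding.UTF8.GetBytes(tenantId);
        var hash = hmac.ComputeHash(bytes);
        var value = System.Buffers.Binary.BinaryPrimitives.ReadUInt32BigEndian(hash.AsSpan(0, 4));
        return (int)(value % ringSize); // result is always in [0, ringSize-1]
    }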
}

JSON serialization MUST use camelCase. Database schema and columns use PascalCase (e.g., Tenants, TenantId, HomeRegion, ShardId).


RLS (Row-Level Security) notes

PostgreSQL

-- Session setup: is_local = TRUE scopes the value to the current transaction,
-- so the application must set it at the start of every transaction
SELECT set_config('app.tenant_id', :tenant_id, TRUE);

ALTER TABLE "AuditRecords" ENABLE ROW LEVEL SECURITY;

CREATE POLICY tenant_isolation ON "AuditRecords"
  USING ("TenantId" = current_setting('app.tenant_id'));

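-- Optional read-only policy for auditors scoped to a list.
-- app.auditor_tenants holds an array literal such as '{tenantA,tenantB}';
-- missing_ok = true makes an unset value match no rows instead of raising an error.
CREATE POLICY auditor_scope ON "AuditRecords"
  FOR SELECT
  USING ("TenantId" = ANY (current_setting('app.auditor_tenants', true)::text[]));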

SQL Server

-- Session context set by application (@read_only = 1 prevents later changes on the same session):
EXEC sp_set_session_context @key = N'tenant_id', @value = @tenantId, @read_only = 1;

CREATE FUNCTION dbo.fn_tenantPredicate(@TenantId AS NVARCHAR(128))
RETURNS TABLE WITH SCHEMABINDING
AS RETURN SELECT 1 AS fn_result
WHERE @TenantId = CONVERT(NVARCHAR(128), SESSION_CONTEXT(N'tenant_id'));

CREATE SECURITY POLICY dbo.TenantSecurityPolicy
ADD FILTER PREDICATE dbo.fn_tenantPredicate(TenantId) ON dbo.AuditRecords,
ADD BLOCK PREDICATE dbo.fn_tenantPredicate(TenantId) ON dbo.AuditRecords
WITH (STATE = ON);

Operational guardrails

  • Always set the session tenant context before any query.
  • Use least-privilege service accounts; forbid BYPASSRLS-equivalent privileges.
  • Mirror the same predicates in reporting/BI and CDC pipelines.

Storage mapping (authoritative)

SQL (illustrative)

-- Monthly partitioning by CreatedAt in addition to TenantId indexes
CREATE TABLE dbo.AuditRecords (
  AuditRecordId CHAR(26)     NOT NULL,
  TenantId      NVARCHAR(128) NOT NULL,
  CreatedAt     DATETIME2(3) NOT NULL,
  -- ... other columns (see AuditRecord)
  CONSTRAINT PK_AuditRecords PRIMARY KEY (AuditRecordId)
);

CREATE INDEX IX_AuditRecords_Tenant_CreatedAt
  ON dbo.AuditRecords (TenantId, CreatedAt);

-- Optional computed 'CreatedMonth' for partition pruning
ALTER TABLE dbo.AuditRecords ADD CreatedMonth AS (CONVERT(CHAR(7), CreatedAt, 126)) PERSISTED;
CREATE INDEX IX_AuditRecords_Tenant_CreatedMonth ON dbo.AuditRecords (TenantId, CreatedMonth);

Object storage (packages/exports)

Prefix paths with region and tenant to keep listings cheap and enforce residency:

s3://{bucket}/{region}/{tenantId}/exports/{yyyy}/{MM}/{jobId}/...
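
A small helper can build this prefix while re-checking the tenantId shape before it is placed in a path. The Python sketch below is illustrative only; export_prefix is a hypothetical name, and using the job creation time (in UTC) for the {yyyy}/{MM} segments is an assumption.

```python
import re
from datetime import datetime, timezone

# Same shape as the tenantId rule in this document.
TENANT_ID = re.compile(r"[A-Za-z0-9._-]{1,128}")


def export_prefix(bucket: str, region: str, tenant_id: str,
                  created_at: datetime, job_id: str) -> str:
    """Build s3://{bucket}/{region}/{tenantId}/exports/{yyyy}/{MM}/{jobId}/."""
    if not TENANT_ID.fullmatch(tenant_id):
        raise ValueError("tenantId does not match the required pattern")
    utc = created_at.astimezone(timezone.utc)  # expects a timezone-aware value
    return (
        f"s3://{bucket}/{region}/{tenant_id}/exports/"
        f"{utc:%Y}/{utc:%m}/{job_id}/"
    )
```

For example, a job created in October 2025 for tenant eucorp in eu-west-1 would land under a prefix ending in eucorp/exports/2025/10/ followed by the job ULID.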

Search & index tenants

  • Create one index per tenant (aliasing pattern) when using full-text engines, e.g., audit-{tenantId} or audit-{region}-{tenantId} when residency matters.
  • Alternatively, multi-tenant indices must include a hard filter (tenantId) in index templates/mappings and be protected by index-level RBAC.

Examples

Tenant context resolved for routing

{
  "tenantId": "splootvets",
  "shardId": 7,
  "ringVersion": 3,
  "homeRegion": "us-central",
  "effectiveRegion": "us-central"
}

Residency policy

{
  "tenantId": "eucorp",
  "homeRegion": "eu-west-1",
  "allowedRegions": ["eu-west-1","eu-central-1"],
  "pinToHome": true,
  "replication": "None",
  "version": 2
}

Validation rules (summary)

  • tenantId matches ^[A-Za-z0-9._-]{1,128}$; no whitespace or slashes.
  • TenantContext.shardId in [0, ringSize-1]; ringVersion strictly positive.
  • Write path requires a resolved TenantContext; reject writes lacking one.
  • Residency: effectiveRegion ∈ allowedRegions; when pinToHome=true, authoritative writes must target homeRegion.
  • RLS policies must be present and ON for all multi-tenant tables, including projections and CDC shadow tables.
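
The checks in this list that depend only on the TenantContext and the residency policy can be sketched as a single function. The Python below is a hedged illustration: validate_tenant_context and its parameters are hypothetical, a missing effectiveRegion is treated as a failure on the assumption that the write path always resolves one, and the RLS rule is not covered because it is a database-side concern.

```python
import re
from typing import List

TENANT_ID = re.compile(r"[A-Za-z0-9._-]{1,128}")


def validate_tenant_context(ctx: dict, policy: dict, ring_size: int,
                            authoritative_write: bool = False) -> List[str]:
    """Return a list of rule violations; empty means the context is usable."""
    problems: List[str] = []

    if not TENANT_ID.fullmatch(ctx.get("tenantId", "")):
        problems.append("tenantId does not match the required pattern")
    if not 0 <= ctx.get("shardId", -1) < ring_size:
        problems.append("shardId is outside [0, ringSize-1]")
    if ctx.get("ringVersion", 0) < 1:
        problems.append("ringVersion must be strictly positive")

    region = ctx.get("effectiveRegion")
    if region not in policy["allowedRegions"]:
        problems.append("effectiveRegion is not in allowedRegions")
    # pinToHome only constrains authoritative writes in this sketch.
    if authoritative_write and policy.get("pinToHome", False) \
            and region != policy["homeRegion"]:
        problems.append("pinToHome requires writes to target homeRegion")
    return problems
```

A write handler would typically reject the request when the returned list is non-empty and log the individual violations.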

Authoritative Stores (Write Path)

Models the append-only source of truth for audit facts. The write path persists a canonical AuditRecord (minus late-bound integrity) and enforces WORM (Write-Once-Read-Many) semantics with minimal indexes for durable ingestion and backpressure-friendly throughput.


Overview

  • Append-only facts: New rows are inserted; no UPDATE/DELETE. Late-bound materials (e.g., integrity proofs) land in sidecar append tables.
  • Tenant-first: All rows are keyed by TenantId (see Tenancy Keys & Partitioning). Cross-tenant joins are prohibited.
  • Canonical JSON: Store the full record as canonical JSON (JCS/RFC8785) in PayloadJson using camelCase field names; the integrity node is excluded at write.
  • Idempotency: A per-tenant IdempotencyKey supports safe retries; duplicates are ignored and the existing AuditRecordId is returned.
  • Minimal indexing: Only the keys needed for durability, dedupe, and range scans on time. All query-optimized shapes live in read projections (query path).
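
The append-only and idempotency rules above can be illustrated with a small in-memory model. The Python sketch below is not the ATP write path: InMemoryAuditStore and its members are hypothetical, and json.dumps with sorted keys and compact separators only approximates canonical JSON (full RFC 8785 also fixes number formatting and key ordering by UTF-16 code units).

```python
import json
from typing import Dict, Optional, Tuple


class InMemoryAuditStore:
    """Toy append-only store keyed by AuditRecordId with per-tenant dedupe."""

    def __init__(self) -> None:
        self.rows: Dict[str, dict] = {}
        self._by_idem: Dict[Tuple[str, str], str] = {}

    def append(self, audit_record_id: str, tenant_id: str, record: dict,
               idempotency_key: Optional[str] = None) -> str:
        # Duplicate retry: ignore the write and return the existing id.
        if idempotency_key is not None:
            existing = self._by_idem.get((tenant_id, idempotency_key))
            if existing is not None:
                return existing
        if audit_record_id in self.rows:
            raise ValueError("AuditRecordId already exists (append-only)")

        # The integrity node is late-bound and excluded at write time.
        payload = {k: v for k, v in record.items() if k != "integrity"}
        payload_json = json.dumps(payload, sort_keys=True,
                                  separators=(",", ":"), ensure_ascii=False)
        self.rows[audit_record_id] = {
            "TenantId": tenant_id,
            "IdempotencyKey": idempotency_key,
            "PayloadJson": payload_json,
            "PayloadBytes": len(payload_json.encode("utf-8")),
        }
        if idempotency_key is not None:
            self._by_idem[(tenant_id, idempotency_key)] = audit_record_id
        return audit_record_id
```

The store offers no update or delete method, which mirrors the insert-only contract; the idempotency index is scoped by tenant, so the same key used by two tenants does not collide.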

C# properties / gRPC code-first: PascalCase. JSON payload: lowerCamel. Tables/columns: PascalCase.


Logical model (authoritative)

AuditRecords (append-only, authoritative fact)

Column (PascalCase) | Type | Req. | Description
AuditRecordId | ULID (CHAR(26)) | yes | Primary key (time-ordered).
TenantId | string | yes | Tenant scope token.
CreatedAt | timestamp(UTC) | yes | When the producer says this fact occurred.
ObservedAt | timestamp(UTC) | yes | When the platform observed/accepted it.
EffectiveAt | timestamp(UTC) | no | Optional domain-effective time.
Action | string | yes | verb or verb.noun (lowercase).
ResourceType | string | yes | From resource.type (PascalCase).
ResourceId | string | yes | From resource.id (opaque).
ResourcePath | string | no | Optional JSON-Pointer-style path.
ActorId | string | yes | From actor.id (opaque).
ActorType | enum | yes | Unknown, User, Service, or Job.
CorrelationTraceId | hex32 | yes | W3C trace id.
CorrelationRequestId | string | no | Optional request token.
DecisionOutcome | enum | no | If present on write (Allow, Deny, NotApplicable, or Unknown).
IdempotencyKey | string | no | Optional per-tenant dedupe key.
SchemaVersion | smallint | yes | AuditRecord schema version embedded in PayloadJson.
PayloadJson | JSON/JSONB/NVARCHAR(MAX) | yes | Entire canonical AuditRecord JSON without integrity.
PayloadBytes | int | yes | Raw payload size (bytes), for budgeting/backpressure.

RecordIntegrity (sidecar, append-only; set by Integrity Service post-seal)

Column | Type | Req. | Description
AuditRecordId | ULID | yes | FK → AuditRecords.
BlockId | ULID | yes | Integrity block that sealed this record.
SegmentId | ULID | yes | Segment containing the leaf.
LeafIndex | int | yes | Zero-based leaf index in segment.
LeafHash | hex64 | yes | SHA-256 of canonical record bytes (no integrity).
Algo | string | yes | SHA256.
MerklePathJson | JSON | yes | Array of { pos: "L" or "R", hash: hex64 }.
SealedAt | timestamp(UTC) | yes | When the block was sealed/signed.

Keeping integrity in a sidecar maintains strict WORM for AuditRecords while still allowing verifiable proofs.
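
A verifier can recompute a segment root from LeafHash and MerklePathJson and compare it with the segment rootHash. The Python sketch below shows the general technique under two assumptions that this document does not state: pos names the side on which the sibling hash sits, and a parent is SHA-256(left || right) with no domain-separation prefix. merkle_root_from_path is a hypothetical name.

```python
import hashlib
from typing import Dict, List


def merkle_root_from_path(leaf_hash_hex: str, path: List[Dict[str, str]]) -> str:
    """Fold the sibling hashes into the leaf hash, bottom-up, to get the root."""
    node = bytes.fromhex(leaf_hash_hex)
    for step in path:
        sibling = bytes.fromhex(step["hash"])
        if step["pos"] == "L":
            # Assumption: sibling is the left child, current node the right.
            node = hashlib.sha256(sibling + node).digest()
        elif step["pos"] == "R":
            # Assumption: sibling is the right child, current node the left.
            node = hashlib.sha256(node + sibling).digest()
        else:
            raise ValueError("pos must be 'L' or 'R'")
    return node.hex()


def verify_leaf(leaf_hash_hex: str, path: List[Dict[str, str]],
                expected_root_hex: str) -> bool:
    """True when the recomputed root equals the published segment root."""
    return merkle_root_from_path(leaf_hash_hex, path) == expected_root_hex.lower()
```

If the real tree uses leaf/node prefixes or a different ordering convention, only the two hashing lines would need to change.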


C# (persistence rows; gRPC code-first)

[DataContract]
public sealed class AuditRecordRow
{
    [DataMember(Order = 1)]  public string AuditRecordId { get; init; } = default!; // ULID
    [DataMember(Order = 2)]  public string TenantId { get; init; } = default!;
    [DataMember(Order = 3)]  public DateTimeOffset CreatedAt { get; init; }
    [DataMember(Order = 4)]  public DateTimeOffset ObservedAt { get; init; }
    [DataMember(Order = 5)]  public DateTimeOffset? EffectiveAt { get; init; }

    [DataMember(Order = 6)]  public string Action { get; init; } = default!;
    [DataMember(Order = 7)]  public string ResourceType { get; init; } = default!;
    [DataMember(Order = 8)]  public string ResourceId { get; init; } = default!;
    [DataMember(Order = 9)]  public string? ResourcePath { get; init; }

    [DataMember(Order = 10)] public string ActorId { get; init; } = default!;
    [DataMember(Order = 11)] public ActorType ActorType { get; init; } = ActorType.Unknown;

    [DataMember(Order = 12)] public string CorrelationTraceId { get; init; } = default!;
    [DataMember(Order = 13)] public string? CorrelationRequestId { get; init; }

    [DataMember(Order = 14)] public DecisionOutcome? DecisionOutcome { get; init; }

    [DataMember(Order = 15)] public string? IdempotencyKey { get; init; }
    [DataMember(Order = 16)] public short SchemaVersion { get; init; } = 1;

    [DataMember(Order = 17)] public string PayloadJson { get; init; } = default!; // canonical JSON (lowerCamel)
    [DataMember(Order = 18)] public int PayloadBytes { get; init; }
}

[DataContract]
public sealed class RecordIntegrityRow
{
    [DataMember(Order = 1)]  public string AuditRecordId { get; init; } = default!;
    [DataMember(Order = 2)]  public string BlockId { get; init; } = default!;
    [DataMember(Order = 3)]  public string SegmentId { get; init; } = default!;
    [DataMember(Order = 4)]  public int    LeafIndex { get; init; }
    [DataMember(Order = 5)]  public string LeafHash { get; init; } = default!;   // hex SHA-256
    [DataMember(Order = 6)]  public string Algo { get; init; } = "SHA256";
    [DataMember(Order = 7)]  public string MerklePathJson { get; init; } = default!; // JSON array
    [DataMember(Order = 8)]  public DateTimeOffset SealedAt { get; init; }
}

PayloadJson follows the canonical JSON rules from Modeling Principles & Conventions. SchemaVersion must match the embedded auditRecord.schemaVersion.


Storage mapping (PostgreSQL)

-- Authoritative facts (append-only)
CREATE TABLE "AuditRecords" (
  "AuditRecordId"  CHAR(26)      PRIMARY KEY,
  "TenantId"       TEXT          NOT NULL,
  "CreatedAt"      TIMESTAMPTZ   NOT NULL,
  "ObservedAt"     TIMESTAMPTZ   NOT NULL,
  "EffectiveAt"    TIMESTAMPTZ   NULL,

  "Action"         TEXT          NOT NULL,
  "ResourceType"   TEXT          NOT NULL,
  "ResourceId"     TEXT          NOT NULL,
  "ResourcePath"   TEXT          NULL,

  "ActorId"        TEXT          NOT NULL,
  "ActorType"      SMALLINT      NOT NULL,  -- enum ordinal

  "CorrelationTraceId"  CHAR(32) NOT NULL,  -- hex
  "CorrelationRequestId" TEXT    NULL,

  "DecisionOutcome" SMALLINT     NULL,      -- enum ordinal

  "IdempotencyKey"  TEXT         NULL,
  "SchemaVersion"   SMALLINT     NOT NULL DEFAULT 1,

  "PayloadJson"     JSONB        NOT NULL,
  "PayloadBytes"    INTEGER      NOT NULL
);

-- Sidecar integrity (append-only)
CREATE TABLE "RecordIntegrity" (
  "AuditRecordId" CHAR(26)    PRIMARY KEY REFERENCES "AuditRecords"("AuditRecordId"),
  "BlockId"       CHAR(26)    NOT NULL,
  "SegmentId"     CHAR(26)    NOT NULL,
  "LeafIndex"     INTEGER     NOT NULL,
  "LeafHash"      CHAR(64)    NOT NULL,   -- hex
  "Algo"          TEXT        NOT NULL DEFAULT 'SHA256',
  "MerklePathJson" JSONB      NOT NULL,   -- [{pos:'L'|'R',hash:'...'}, ...]
  "SealedAt"      TIMESTAMPTZ NOT NULL
);

-- Minimal indexes for durability & idempotency
CREATE INDEX "IX_Audit_Tenant_CreatedAt" ON "AuditRecords" ("TenantId","CreatedAt");
CREATE INDEX "IX_Audit_Tenant_Trace"      ON "AuditRecords" ("TenantId","CorrelationTraceId");
CREATE UNIQUE INDEX "UX_Audit_Tenant_Idem" ON "AuditRecords" ("TenantId","IdempotencyKey")
  WHERE "IdempotencyKey" IS NOT NULL;

-- WORM enforcement: block UPDATE/DELETE; allow INSERT only
CREATE OR REPLACE FUNCTION fn_auditrecords_block_ud() RETURNS trigger AS $$
BEGIN
  RAISE EXCEPTION 'WORM: AuditRecords are append-only';
END; $$ LANGUAGE plpgsql;

CREATE TRIGGER "trg_auditrecords_no_update"
  BEFORE UPDATE OR DELETE ON "AuditRecords" FOR EACH ROW EXECUTE FUNCTION fn_auditrecords_block_ud();

CREATE OR REPLACE FUNCTION fn_recordintegrity_block_ud() RETURNS trigger AS $$
BEGIN
  RAISE EXCEPTION 'WORM: RecordIntegrity is append-only';
END; $$ LANGUAGE plpgsql;

CREATE TRIGGER "trg_recordintegrity_no_update"
  BEFORE UPDATE OR DELETE ON "RecordIntegrity" FOR EACH ROW EXECUTE FUNCTION fn_recordintegrity_block_ud();

Storage mapping (SQL Server)

-- Authoritative facts (append-only)
CREATE TABLE dbo.AuditRecords (
  AuditRecordId   CHAR(26)      NOT NULL CONSTRAINT PK_AuditRecords PRIMARY KEY,
  TenantId        NVARCHAR(128) NOT NULL,
  CreatedAt       DATETIME2(3)  NOT NULL,
  ObservedAt      DATETIME2(3)  NOT NULL,
  EffectiveAt     DATETIME2(3)  NULL,

  Action          NVARCHAR(64)  NOT NULL,
  ResourceType    NVARCHAR(128) NOT NULL,
  ResourceId      NVARCHAR(128) NOT NULL,
  ResourcePath    NVARCHAR(256) NULL,

  ActorId         NVARCHAR(128) NOT NULL,
  ActorType       SMALLINT      NOT NULL,   -- enum ordinal

  CorrelationTraceId  CHAR(32)  NOT NULL,
  CorrelationRequestId NVARCHAR(128) NULL,

  DecisionOutcome  SMALLINT     NULL,       -- enum ordinal

  IdempotencyKey   NVARCHAR(128) NULL,
  SchemaVersion    SMALLINT     NOT NULL CONSTRAINT DF_AuditRecords_SchemaVersion DEFAULT (1),

  PayloadJson      NVARCHAR(MAX) NOT NULL,
  PayloadBytes     INT           NOT NULL
);

-- Sidecar integrity
CREATE TABLE dbo.RecordIntegrity (
  AuditRecordId   CHAR(26)     NOT NULL CONSTRAINT PK_RecordIntegrity PRIMARY KEY
                               CONSTRAINT FK_RecordIntegrity_Audit FOREIGN KEY REFERENCES dbo.AuditRecords(AuditRecordId),
  BlockId         CHAR(26)     NOT NULL,
  SegmentId       CHAR(26)     NOT NULL,
  LeafIndex       INT          NOT NULL,
  LeafHash        CHAR(64)     NOT NULL,
  Algo            NVARCHAR(16) NOT NULL CONSTRAINT DF_RecordIntegrity_Algo DEFAULT ('SHA256'),
  MerklePathJson  NVARCHAR(MAX) NOT NULL,
  SealedAt        DATETIME2(3) NOT NULL
);

-- Minimal indexes
CREATE INDEX IX_Audit_Tenant_CreatedAt ON dbo.AuditRecords (TenantId, CreatedAt);
CREATE INDEX IX_Audit_Tenant_Trace     ON dbo.AuditRecords (TenantId, CorrelationTraceId);
CREATE UNIQUE INDEX UX_Audit_Tenant_Idem ON dbo.AuditRecords (TenantId, IdempotencyKey) WHERE IdempotencyKey IS NOT NULL;

-- WORM enforcement via INSTEAD OF triggers.
-- CREATE TRIGGER must be the first statement in its batch, so each trigger
-- sits between GO separators (GO is a sqlcmd/SSMS batch separator, not T-SQL).
GO
CREATE TRIGGER trg_AuditRecords_NoUpdateDelete ON dbo.AuditRecords
INSTEAD OF UPDATE, DELETE AS
BEGIN
  RAISERROR ('WORM: AuditRecords are append-only', 16, 1);
END;
GO

CREATE TRIGGER trg_RecordIntegrity_NoUpdateDelete ON dbo.RecordIntegrity
INSTEAD OF UPDATE, DELETE AS
BEGIN
  RAISERROR ('WORM: RecordIntegrity is append-only', 16, 1);
END;
GO

Write path flow (high level)

  1. Ingress (Gateway/Service) validates & canonicalizes AuditRecord (JSON, no integrity), assigns AuditRecordId (ULID), computes PayloadBytes.
  2. Idempotency check: If IdempotencyKey provided, attempt insert with unique (TenantId, IdempotencyKey); on conflict, return the existing AuditRecordId.
  3. Insert into AuditRecords with minimal indexes only (low write amplification).
  4. Integrity Service batches sealed segments/blocks and appends a row to RecordIntegrity for each AuditRecordId included, carrying LeafHash, MerklePathJson, and SealedAt.
  5. Projectors build query-optimized read models asynchronously (see Read Models & Projections).
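
Step 2 of the flow above can be modeled in memory as follows. This is only a sketch: the dictionary stands in for the unique (TenantId, IdempotencyKey) index, and IdempotentAppendStore and Append are hypothetical names, not part of the ATP services.

```csharp
using System;
using System.Collections.Concurrent;

public sealed class IdempotentAppendStore
{
    // Stand-in for the unique (TenantId, IdempotencyKey) index on AuditRecords.
    private readonly ConcurrentDictionary<(string TenantId, string Key), string> _byKey = new();

    // Returns the AuditRecordId that ends up stored and whether this call created it.
    public (string AuditRecordId, bool Created) Append(
        string tenantId, string? idempotencyKey, string newAuditRecordId)
    {
        if (idempotencyKey is null)
            return (newAuditRecordId, true); // no key supplied: every call is a new fact

        // GetOrAdd behaves like "insert; on conflict return the existing row".
        var stored = _byKey.GetOrAdd((tenantId, idempotencyKey), newAuditRecordId);
        return (stored, stored == newAuditRecordId);
    }
}
```

A retry that reuses the same key within the same tenant gets the original AuditRecordId back with Created = false, while the same key under a different tenant is treated as a new record.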

Budgets & caps

  • Max PayloadBytes at write: 256 KiB (see Performance & Size Budgets). Producers must pre-redact or summarize oversized records (hash + Delta.truncated=true) before submitting them.
  • Max write QPS per tenant (soft): tiered by edition; apply backpressure when PayloadBytes or QPS budgets are exceeded.
  • IdempotencyKey TTL: keep the unique key for ≥ 24 hours (configurable) to absorb retries safely.

WORM guidance & operational controls

  • SQL-layer WORM: Use INSTEAD OF UPDATE/DELETE triggers (SQL Server) or BEFORE UPDATE/DELETE triggers (PostgreSQL) to block mutations.
  • ACLs: Only the ingestion service account has INSERT; reporting users get SELECT only. Deny schema-altering privileges.
  • Physical immutability (optional): Stream all inserted rows into an object-storage WORM tier (e.g., S3 Object Lock / Azure Immutable Blob) for secondary immutability and eDiscovery exports.
  • Retention: Deletions happen only via lifecycle after records are Eligible and not OnHold (see Retention Policy & Legal Hold). Lifecycle performs hard delete from AuditRecords and cascades RecordIntegrity.

Validation rules (summary)

  • TenantId required and must pass tenancy predicate (RLS).
  • CreatedAt ≤ ObservedAt ≤ now; EffectiveAtCreatedAt (if present).
  • ResourceType PascalCase; Action lowercase verb or verb.noun.
  • CorrelationTraceId is hex32 (W3C).
  • IdempotencyKey unique per (TenantId, IdempotencyKey) when not null.
  • PayloadJson must validate against AuditRecord v{SchemaVersion} and use lowerCamel property names.
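
A few of these rules are easy to express as plain checks. The sketch below covers only the tenant, timestamp, action, and trace-id rules; the regular expressions are assumptions about what "lowercase verb or verb.noun" and "hex32" mean in practice (letters only, lowercase hex), and AuditRecordRules is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class AuditRecordRules
{
    // Assumed patterns: 32 lowercase hex characters; "verb" or "verb.noun" in lowercase letters.
    private static readonly Regex TraceId = new("^[0-9a-f]{32}$");
    private static readonly Regex ActionName = new(@"^[a-z]+(\.[a-z]+)?$");

    // Returns the violated rules; an empty list means these checks passed.
    public static IReadOnlyList<string> Validate(
        string tenantId, DateTimeOffset createdAt, DateTimeOffset observedAt,
        string action, string correlationTraceId, DateTimeOffset now)
    {
        var errors = new List<string>();
        if (string.IsNullOrWhiteSpace(tenantId)) errors.Add("TenantId is required");
        if (createdAt > observedAt) errors.Add("CreatedAt must be <= ObservedAt");
        if (observedAt > now) errors.Add("ObservedAt must be <= now");
        if (!ActionName.IsMatch(action)) errors.Add("Action must be a lowercase verb or verb.noun");
        if (!TraceId.IsMatch(correlationTraceId)) errors.Add("CorrelationTraceId must be hex32");
        return errors;
    }
}
```

Returning every violation at once, rather than failing on the first, lets the gateway report a complete problem list to the producer in one response.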

Read Models & Projections (Query Path)

Defines query-optimized projections used by APIs, consoles, and exports. Projections are derived, denormalized, and rebuildable from the authoritative append store. They support seek-pagination, per-tenant watermarks, and idempotent upserts.

JSON: lowerCamel. C#/gRPC (code-first): PascalCase. Tables/columns: PascalCase.


Overview

  • Shapes:
    • Events: flat, filterable event stream per tenant for search/list views.
    • Resource Timeline: fast per-resource history (resource.type + resource.id).
    • Actor Activity: fast per-actor history.
  • Selective fields: Only hot fields are projected; the canonical payload stays in the authoritative store.
  • At-least-once projectors: Use idempotent upserts keyed by (TenantId, AuditRecordId).
  • Watermarks & checkpoints per tenant and projection, for resumable processing & rebuilds.
  • Seek-pagination: stable order (CreatedAt, AuditRecordId); opaque base64url cursor.

Canonical event projection

AuditEvents (one row per AuditRecord)

Column (PascalCase) Type Req. Notes
TenantId string Partition/RLS key.
AuditRecordId ULID Unique; PK with tenant.
CreatedAt timestamp(UTC) Primary sort key.
ObservedAt timestamp(UTC) Secondary time.
Action string verb or verb.noun (lowercase).
ResourceType string PascalCase.
ResourceId string Opaque id.
ActorId string Opaque id.
ActorType smallint Enum ordinal.
DecisionOutcome smallint Enum ordinal, if present.
ChangedFields nvarchar/json e.g., ["status","/lines/0/price"] (summary).
DataClassFlags smallint Bitmask over DataClass (Public=1, Internal=2, Personal=4, Sensitive=8, Credential=16, Phi=32).
CorrelationTraceId char(32) hex32.
IntegrityBlockId ULID Optional mirror for join-free proof lookups.
PayloadBytes int For paging/budget hints.

Indexes

  • PK: (TenantId, AuditRecordId)
  • Sort/filter: IX_AuditEvents_Tenant_CreatedAt (TenantId, CreatedAt DESC, AuditRecordId DESC)
  • Selectivity helpers:
    • (TenantId, ResourceType, ResourceId, CreatedAt DESC, AuditRecordId DESC)
    • (TenantId, ActorId, CreatedAt DESC, AuditRecordId DESC)
    • (TenantId, DecisionOutcome)
    • (TenantId, DataClassFlags)

ChangedFields is a compact array (≤64 entries) extracted from Delta.fields keys.
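
The DataClassFlags bitmask maps naturally onto a flags enum. The sketch below uses the bit values listed in the table; the None member and the helper names (DataClassFlagsHelper, Combine, Has) are additions for illustration only.

```csharp
using System;

[Flags]
public enum DataClass : short
{
    None = 0,        // illustrative addition; not listed in the table
    Public = 1,
    Internal = 2,
    Personal = 4,
    Sensitive = 8,
    Credential = 16,
    Phi = 32
}

public static class DataClassFlagsHelper
{
    // ORs the given classes into the SMALLINT value stored in DataClassFlags.
    public static short Combine(params DataClass[] classes)
    {
        DataClass mask = DataClass.None;
        foreach (var c in classes) mask |= c;
        return (short)mask;
    }

    // True when every bit of `c` is set in `flags`.
    public static bool Has(short flags, DataClass c) => ((DataClass)flags & c) == c;
}
```

For example, a record tagged Personal and Sensitive combines to 4 + 8 = 12, and Has(12, DataClass.Phi) is false.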


Resource timeline projection

ResourceEvents (subset tuned for GET /resources/{type}/{id}/events)

Column Type Req. Notes
TenantId string
ResourceType string
ResourceId string
Seq bigint Monotonic per (Tenant,Resource) (gapless best-effort).
CreatedAt timestamp
AuditRecordId ULID
Action string
ActorId string
DecisionOutcome smallint
ChangedFields nvarchar/json

Indexes

  • PK: (TenantId, ResourceType, ResourceId, Seq)
  • Seek: (TenantId, ResourceType, ResourceId, CreatedAt DESC, AuditRecordId DESC)

Actor activity projection

ActorEvents (subset tuned for GET /actors/{actorId}/events)

Column Type Req.
TenantId string
ActorId string
Seq bigint
CreatedAt timestamp
AuditRecordId ULID
Action string
ResourceType string
ResourceId string
DecisionOutcome smallint

Indexes

  • PK: (TenantId, ActorId, Seq)
  • Seek: (TenantId, ActorId, CreatedAt DESC, AuditRecordId DESC)

Watermarks & checkpoints

ProjectionCheckpoints (one row per tenant × projection)

Column Type Req. Notes
Projection string e.g., AuditEvents, ResourceEvents, ActorEvents.
TenantId string
HighWaterRecordId ULID Last fully applied AuditRecordId.
HighWaterObservedAt timestamp Tie-break/time sanity.
Version int Projection schema/version.
UpdatedAt timestamp Monotonic clock.
RebuildToken nvarchar Opaque state during rebuild (optional).

Semantics

  • At-least-once: projectors may re-process a record; all target tables use UPSERT on (TenantId, AuditRecordId) or (TenantId,Key,Seq) with idempotent content.
  • Rebuild: set checkpoint to floor (HighWaterRecordId = 000…), stream forward; keep writer-exclusive lease to avoid double writers.
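
The at-least-once semantics above can be illustrated with an in-memory projector. This is a sketch under simplifying assumptions: one dictionary stands in for a projection table keyed by (TenantId, AuditRecordId), another for ProjectionCheckpoints, and ULIDs are compared as ordinal strings. ProjectedRow and InMemoryProjector are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public sealed record ProjectedRow(string TenantId, string AuditRecordId, string Action);

public sealed class InMemoryProjector
{
    private readonly Dictionary<(string, string), ProjectedRow> _rows = new();
    private readonly Dictionary<string, string> _highWater = new(); // tenant -> HighWaterRecordId

    public int RowCount => _rows.Count;

    public string? HighWater(string tenantId) =>
        _highWater.TryGetValue(tenantId, out var id) ? id : null;

    // Assumes every row in the batch belongs to tenantId.
    public void ApplyBatch(string tenantId, IEnumerable<ProjectedRow> batch)
    {
        string? last = null;
        foreach (var row in batch)
        {
            // Upsert: replaying the same record overwrites the row with identical content.
            _rows[(row.TenantId, row.AuditRecordId)] = row;
            if (last is null || string.CompareOrdinal(row.AuditRecordId, last) > 0)
                last = row.AuditRecordId;
        }

        // Advance the checkpoint only after the whole batch applied, and never move it backwards.
        if (last is not null &&
            (!_highWater.TryGetValue(tenantId, out var current) || string.CompareOrdinal(last, current) > 0))
        {
            _highWater[tenantId] = last;
        }
    }
}
```

Applying the same batch twice leaves the row count and the checkpoint unchanged, which is the property a redelivering transport relies on.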

Pagination cursors (seek)

Sort order: (CreatedAt ASC, AuditRecordId ASC) for forward paging; (CreatedAt DESC, AuditRecordId DESC) for backward paging.

Cursor payload (binary layout)
{ version:1, direction:'f'|'b', createdAtUtc: int64 (ms), auditRecordId: 26-byte ULID }
Encoded as base64url; opaque to clients.

Request parameters

  • cursor (string, optional)
  • limit (1–1000; default 100)
  • direction (forward|backward; default forward)

Next cursor generation

  • For forward paging: take the last row’s (CreatedAt, AuditRecordId) and encode.
  • For backward paging: use the first row’s keys.

Cursors are per-tenant; APIs must enforce that the cursor’s tenant matches the request’s tenant.
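
To make the seek semantics concrete, the sketch below pages forward over an in-memory list using the (CreatedAt, AuditRecordId) pair as the position. A real implementation would push the same comparison into the SQL WHERE clause; EventKey and SeekPaging are hypothetical names, and tenant scoping is assumed to have happened already.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed record EventKey(DateTimeOffset CreatedAt, string AuditRecordId);

public static class SeekPaging
{
    // Returns up to `limit` keys strictly after `after` in (CreatedAt, AuditRecordId) order.
    public static List<EventKey> NextPage(IEnumerable<EventKey> rows, EventKey? after, int limit)
    {
        limit = Math.Clamp(limit, 1, 1000); // documented page-size bounds
        return rows
            .OrderBy(r => r.CreatedAt)
            .ThenBy(r => r.AuditRecordId, StringComparer.Ordinal)
            .Where(r => after is null || Compare(r, after) > 0)
            .Take(limit)
            .ToList();
    }

    // Orders by time first; AuditRecordId breaks ties between rows with equal CreatedAt.
    private static int Compare(EventKey a, EventKey b)
    {
        int byTime = a.CreatedAt.CompareTo(b.CreatedAt);
        return byTime != 0 ? byTime : string.CompareOrdinal(a.AuditRecordId, b.AuditRecordId);
    }
}
```

The last key of each page becomes the position for the next call, so rows that share a CreatedAt value are neither skipped nor repeated.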


JSON Schemas (partials, v1)

events-list-response.v1.json

{
  "$id": "urn:connectsoft:schemas:read/events-list-response.v1.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "EventsListResponse",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "items": {
      "type": "array",
      "items": { "$ref": "urn:connectsoft:schemas:read/event-row.v1.json" }
    },
    "next": { "type": "string" },
    "prev": { "type": "string" },
    "count": { "type": "integer", "minimum": 0 }
  },
  "required": ["items"]
}

event-row.v1.json (projection row)

{
  "$id": "urn:connectsoft:schemas:read/event-row.v1.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "EventRow",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "auditRecordId": { "type": "string", "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$" },
    "createdAt": { "type": "string", "format": "date-time" },
    "observedAt": { "type": "string", "format": "date-time" },
    "action": { "type": "string" },
    "resourceType": { "type": "string" },
    "resourceId": { "type": "string" },
    "actorId": { "type": "string" },
    "actorType": { "type": "string" },
    "decisionOutcome": { "type": "string" },
    "changedFields": { "type": "array", "items": { "type": "string" }, "maxItems": 64 },
    "dataClassFlags": { "type": "integer", "minimum": 0 }
  },
  "required": ["auditRecordId","createdAt","action","resourceType","resourceId","actorId"]
}

C# (gRPC code-first)

[DataContract]
public sealed class ListEventsRequest
{
    [DataMember(Order = 1)] public string TenantId { get; init; } = default!;
    [DataMember(Order = 2)] public string? Cursor { get; init; }
    [DataMember(Order = 3)] public int Limit { get; init; } = 100;
    [DataMember(Order = 4)] public string Direction { get; init; } = "forward"; // forward|backward

    // Optional filters (applied when generating the page)
    [DataMember(Order = 5)] public string? ResourceType { get; init; }
    [DataMember(Order = 6)] public string? ResourceId { get; init; }
    [DataMember(Order = 7)] public string? ActorId { get; init; }
    [DataMember(Order = 8)] public string? Action { get; init; }
    [DataMember(Order = 9)] public short? DecisionOutcome { get; init; }
    [DataMember(Order = 10)] public short? DataClassFlags { get; init; } // bitmask
    [DataMember(Order = 11)] public DateTimeOffset? From { get; init; }
    [DataMember(Order = 12)] public DateTimeOffset? To { get; init; }
}

[DataContract]
public sealed class EventRow
{
    [DataMember(Order = 1)]  public string AuditRecordId { get; init; } = default!;
    [DataMember(Order = 2)]  public DateTimeOffset CreatedAt { get; init; }
    [DataMember(Order = 3)]  public DateTimeOffset ObservedAt { get; init; }
    [DataMember(Order = 4)]  public string Action { get; init; } = default!;
    [DataMember(Order = 5)]  public string ResourceType { get; init; } = default!;
    [DataMember(Order = 6)]  public string ResourceId { get; init; } = default!;
    [DataMember(Order = 7)]  public string ActorId { get; init; } = default!;
    [DataMember(Order = 8)]  public string ActorType { get; init; } = "Unknown";
    [DataMember(Order = 9)]  public string? DecisionOutcome { get; init; }
    [DataMember(Order = 10)] public IReadOnlyList<string>? ChangedFields { get; init; }
    [DataMember(Order = 11)] public short? DataClassFlags { get; init; }
}

[DataContract]
public sealed class ListEventsResponse
{
    [DataMember(Order = 1)] public IReadOnlyList<EventRow> Items { get; init; } = Array.Empty<EventRow>();
    [DataMember(Order = 2)] public string? Next { get; init; }
    [DataMember(Order = 3)] public string? Prev { get; init; }
    [DataMember(Order = 4)] public int Count { get; init; }
}

Cursor utility (example)

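using System;
using System.Buffers.Binary;
using System.Text;

public static class CursorCodec
{
    // Layout: version (1 byte) | direction 'f'/'b' (1) | createdAt Unix ms, big-endian (8) | ULID as ASCII (26)
    private const byte Version = 1;
    private const int Length = 1 + 1 + 8 + 26;

    public static string Encode(DateTimeOffset createdAt, string auditRecordId, bool forward = true)
    {
        if (auditRecordId is null || auditRecordId.Length != 26) throw new ArgumentException("Expected a 26-character ULID.", nameof(auditRecordId));
        Span<byte> buf = stackalloc byte[Length];
        buf[0] = Version;
        buf[1] = forward ? (byte)'f' : (byte)'b';
        BinaryPrimitives.WriteInt64BigEndian(buf.Slice(2, 8), createdAt.ToUnixTimeMilliseconds()); // same bytes on any CPU
        Encoding.ASCII.GetBytes(auditRecordId, buf.Slice(10, 26));
        return Convert.ToBase64String(buf).Replace('+', '-').Replace('/', '_').TrimEnd('=');
    }

    public static (DateTimeOffset ts, string id, bool forward) Decode(string cursor)
    {
        var s = cursor.Replace('-', '+').Replace('_', '/');
        var bytes = Convert.FromBase64String(s.PadRight(s.Length + (4 - s.Length % 4) % 4, '='));
        if (bytes.Length != Length || bytes[0] != Version || (bytes[1] != 'f' && bytes[1] != 'b')) throw new FormatException("Malformed cursor.");
        var ts = DateTimeOffset.FromUnixTimeMilliseconds(BinaryPrimitives.ReadInt64BigEndian(bytes.AsSpan(2, 8)));
        return (ts, Encoding.ASCII.GetString(bytes, 10, 26), bytes[1] == 'f');
    }
}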

Storage mapping (SQL Server / PostgreSQL)

SQL Server (illustrative)

CREATE TABLE dbo.AuditEvents (
  TenantId        NVARCHAR(128) NOT NULL,
  AuditRecordId   CHAR(26)      NOT NULL,
  CreatedAt       DATETIME2(3)  NOT NULL,
  ObservedAt      DATETIME2(3)  NOT NULL,
  Action          NVARCHAR(64)  NOT NULL,
  ResourceType    NVARCHAR(128) NOT NULL,
  ResourceId      NVARCHAR(128) NOT NULL,
  ActorId         NVARCHAR(128) NOT NULL,
  ActorType       SMALLINT      NOT NULL,
  DecisionOutcome SMALLINT      NULL,
  ChangedFields   NVARCHAR(2000) NULL,    -- JSON array
  DataClassFlags  SMALLINT      NULL,
  CorrelationTraceId CHAR(32)   NOT NULL,
  IntegrityBlockId CHAR(26)     NULL,
  PayloadBytes    INT           NOT NULL,
  CONSTRAINT PK_AuditEvents PRIMARY KEY (TenantId, AuditRecordId)
);
CREATE INDEX IX_AuditEvents_Tenant_CreatedAt ON dbo.AuditEvents (TenantId, CreatedAt DESC, AuditRecordId DESC);
CREATE INDEX IX_AuditEvents_Tenant_Res ON dbo.AuditEvents (TenantId, ResourceType, ResourceId, CreatedAt DESC, AuditRecordId DESC);
CREATE INDEX IX_AuditEvents_Tenant_Actor ON dbo.AuditEvents (TenantId, ActorId, CreatedAt DESC, AuditRecordId DESC);

PostgreSQL (illustrative)

CREATE TABLE "AuditEvents" (
  "TenantId"        TEXT NOT NULL,
  "AuditRecordId"   CHAR(26) NOT NULL,
  "CreatedAt"       TIMESTAMPTZ NOT NULL,
  "ObservedAt"      TIMESTAMPTZ NOT NULL,
  "Action"          TEXT NOT NULL,
  "ResourceType"    TEXT NOT NULL,
  "ResourceId"      TEXT NOT NULL,
  "ActorId"         TEXT NOT NULL,
  "ActorType"       SMALLINT NOT NULL,
  "DecisionOutcome" SMALLINT NULL,
  "ChangedFields"   JSONB NULL,
  "DataClassFlags"  SMALLINT NULL,
  "CorrelationTraceId" CHAR(32) NOT NULL,
  "IntegrityBlockId" CHAR(26) NULL,
  "PayloadBytes"    INTEGER NOT NULL,
  PRIMARY KEY ("TenantId","AuditRecordId")
);
CREATE INDEX "IX_AE_Tenant_CreatedAt" ON "AuditEvents" ("TenantId","CreatedAt" DESC,"AuditRecordId" DESC);
CREATE INDEX "IX_AE_Tenant_Res" ON "AuditEvents" ("TenantId","ResourceType","ResourceId","CreatedAt" DESC,"AuditRecordId" DESC);
CREATE INDEX "IX_AE_Tenant_Actor" ON "AuditEvents" ("TenantId","ActorId","CreatedAt" DESC,"AuditRecordId" DESC);

Apply RLS policies keyed on TenantId exactly as in the authoritative store.


Projection build rules

  • Input: stream authoritative AuditRecords ordered by (TenantId, CreatedAt, AuditRecordId) (or by ULID time).
  • For each record:
    • Compute ChangedFields = keys of Delta.fields (bounded to 64).
    • Compute DataClassFlags from the record’s classification tags.
    • Upsert into AuditEvents.
    • Upsert into ResourceEvents when resource present; append a Seq = lastSeq+1 per (TenantId, ResourceType, ResourceId).
    • Upsert into ActorEvents with Seq = lastSeq+1 per (TenantId, ActorId).
  • Checkpoint: After a successful batch, advance ProjectionCheckpoints.HighWater*.

Idempotency

  • All upserts keyed by (TenantId, AuditRecordId) must be deterministic; repeated processing of the same record produces the same row.

Purges

  • When lifecycle purges authoritative rows, delete corresponding projection rows (foreign-key cascade or projector “tombstone” stream).

API examples

List tenant events (forward, first page)

{
  "items": [
    {
      "auditRecordId": "01JE6KQQD0Q0J5VQ8WJ6T1S9FX",
      "createdAt": "2025-10-22T15:43:11.281Z",
      "observedAt": "2025-10-22T15:43:11.500Z",
      "action": "appointment.update",
      "resourceType": "Vetspire.Appointment",
      "resourceId": "A-9981",
      "actorId": "user_123",
      "actorType": "User",
      "decisionOutcome": "Allow",
      "changedFields": ["status","/lines/0/price"],
      "dataClassFlags": 12
    }
  ],
  "next": "eyJ2IjoxLCJkIjoiZiIsInQiOjE3Mjk2NTUyOTEyODEsImlkIjoiMDFKRTZLU..." ,
  "count": 1
}

List resource timeline (seek)

{
  "items": [
    {
      "auditRecordId": "01JE6KR2P2FT7DSSX9W7EJQ2DT",
      "createdAt": "2025-10-22T15:45:00.002Z",
      "action": "appointment.read",
      "actorId": "svc_gw",
      "decisionOutcome": "NotApplicable",
      "resourceType": "Vetspire.Appointment",
      "resourceId": "A-9981"
    }
  ],
  "next": "eyJ2IjoxLCJkIjoiZiIsInQiOjE3Mjk2NTU0MDAwMDIsImlkIjoiMDFKRTZL..."
}

Budgets & caps

  • Page size: 1–1000 (default 100).
  • ChangedFields: ≤ 64 entries; strings ≤ 128 chars each.
  • Projection rebuilds: parallelized per tenant/shard; no cross-tenant fan-in.
  • Checkpoint lag SLO: configurable (e.g., p95 < 60 seconds from authoritative write).

Validation rules (summary)

  • Per-tenant RLS applied for all projection tables.
  • Cursors must decode to monotonic coordinates and match the request tenant.
  • Projectors must never mutate authoritative payloads; projections are delete/rebuild only.
  • During rebuilds, target tables can be shadowed (…_Rebuild) and swapped atomically.

Search Index Schema (Optional)

Defines per-tenant search indexes to power full-text search, filtering, and type-ahead suggestions over projected audit events. Search indexes are derived, redacted, and rebuildable; they MUST never store more than the effective Redaction Plan allows.

JSON docs use lowerCamel. C# POCOs (for producers/clients) use PascalCase. Index names and fields include tenantId for strict multi-tenancy.


Overview

  • Per-tenant aliasing: Prefer one index alias per tenant (either mapping to a dedicated physical index or a filtered multi-tenant index).
  • Fields for search: action, resource, actor, time, decision outcome, changed fields, and a compact searchText blob for catch-all text queries.
  • Suggest: completion suggesters for resource IDs and actor IDs; search_as_you_type or edge-ngrams for action/resource types.
  • Analyzers: email/URL aware tokenization; keyword+lowercase normalizers for exact filters; hierarchical analyzer for resource.path.
  • Lifecycle: rollover by size/time; ILM policy deletes indices at or before the Retention window (often ≤ authoritative retention).
  • Reindex strategy: versioned index names with write/read aliases per tenant; zero-downtime rebuild + alias swap.

Index naming & tenancy

  • Dedicated index per tenant (preferred for 100s–1Ks of tenants):
    audit-{tenantId}-v{schemaVersion}-{yyyy.MM} (monthly rollover)
    Aliases:
    • write alias: audit-{tenantId}-write
    • read alias: audit-{tenantId}
  • Shared (multi-tenant) index (for 10Ks+ tenants):
    audit-shared-v{schemaVersion}-{yyyy.MM} with filtered read aliases:
    audit-{tenantId} alias → filter term: { tenantId: "<tenantId>" }
    ⚠️ Every query must include a tenantId term in its filter (or must) clause; index-level RBAC is required.
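
The dedicated-index naming pattern can be captured in a small helper so producers and tooling agree on it. This is a sketch: AuditIndexNames is a hypothetical name, and lowercasing the tenant id is an assumption made because Elasticsearch and OpenSearch index names must be lowercase.

```csharp
using System;
using System.Globalization;

public static class AuditIndexNames
{
    // audit-{tenantId}-v{schemaVersion}-{yyyy.MM}, using the UTC month of the given instant.
    public static string Physical(string tenantId, int schemaVersion, DateTimeOffset at)
    {
        var month = at.UtcDateTime.ToString("yyyy.MM", CultureInfo.InvariantCulture);
        return $"audit-{Normalize(tenantId)}-v{schemaVersion}-{month}";
    }

    public static string WriteAlias(string tenantId) => $"audit-{Normalize(tenantId)}-write";

    public static string ReadAlias(string tenantId) => $"audit-{Normalize(tenantId)}";

    // Assumption: tenant ids are lowercased to satisfy index-name rules.
    private static string Normalize(string tenantId) => tenantId.ToLowerInvariant();
}
```

Deriving the month from UTC keeps index boundaries aligned with the UTC timestamps stored in createdAt.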


Indexed document (logical shape)

Field (JSON) Type Notes
tenantId keyword Hard filter for ALL queries.
auditRecordId keyword ULID string; unique within tenant.
createdAt date UTC; primary sort.
observedAt date Secondary time.
action text + keyword verb or verb.noun; keyword subfield for exact.
resourceType text(search_as_you_type) + keyword PascalCase; search-as-you-type and exact.
resourceId keyword + completion Opaque id; suggester enabled.
resourcePath text(path_hierarchy) Optional path (JSON Pointer style).
actorId keyword + completion Opaque id; suggester enabled.
actorType keyword Enum name.
decisionOutcome keyword Enum name if present.
changedFields keyword Multi-valued; from delta.fields keys.
dataClassFlags integer Bitmask for quick filtering.
payloadBytes integer For query budgeting.
searchText text Concatenated, redacted text for catch-all queries.
schemaVersion short Index doc version for reindexing.

Only redacted values are indexed (e.g., hashed email, masked IP). Never index raw PII beyond what the Redaction Plan allows.


OpenSearch/Elasticsearch mapping (template)

{
  "index_patterns": ["audit-*"],
  "template": {
    "settings": {
      "index.lifecycle.name": "audit-ilm",
      "index.refresh_interval": "5s",
      "analysis": {
        "analyzer": {
          "edge_en": { "tokenizer": "edge_ngram_tok", "filter": ["lowercase"] },
          "path_hierarchy_an": { "tokenizer": "path_hierarchy", "filter": ["lowercase"] }
        },
        "tokenizer": {
          "edge_ngram_tok": { "type": "edge_ngram", "min_gram": 2, "max_gram": 15, "token_chars": ["letter","digit"] }
        },
        "normalizer": {
          "kw_lower": { "type": "custom", "filter": ["lowercase"] }
        }
      }
    },
    "mappings": {
      "dynamic": "false",
      "properties": {
        "tenantId":      { "type": "keyword", "normalizer": "kw_lower" },
        "auditRecordId": { "type": "keyword" },
        "createdAt":     { "type": "date" },
        "observedAt":    { "type": "date" },
        "action": {
          "type": "text",
          "fields": { "kw": { "type": "keyword", "normalizer": "kw_lower" }, "sug": { "type": "search_as_you_type" } }
        },
        "resourceType": {
          "type": "text",
          "fields": { "kw": { "type": "keyword" }, "sug": { "type": "search_as_you_type" } }
        },
        "resourceId": {
          "type": "keyword",
          "fields": { "suggest": { "type": "completion" } }
        },
        "resourcePath": { "type": "text", "analyzer": "path_hierarchy_an" },
        "actorId": {
          "type": "keyword",
          "fields": { "suggest": { "type": "completion" } }
        },
        "actorType":      { "type": "keyword" },
        "decisionOutcome":{ "type": "keyword" },
        "changedFields":  { "type": "keyword" },
        "dataClassFlags": { "type": "integer" },
        "payloadBytes":   { "type": "integer" },
        "searchText":     { "type": "text" },
        "schemaVersion":  { "type": "short" }
      }
    }
  },
  "priority": 500
}

ILM policy example

{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_primary_shard_size": "30gb", "max_age": "7d" }
        }
      },
      "warm": { "min_age": "30d", "actions": { "forcemerge": { "max_num_segments": 1 } } },
      "delete": { "min_age": "90d", "actions": { "delete": {} } }
    }
  }
}

Tune the delete phase's min_age so that it does not exceed the tenant's Retention window. For tenants with longer retention, apply a custom ILM policy bound to their alias.


C# document contract (producer/client)

public sealed class SearchEvent
{
    public string TenantId { get; init; } = default!;
    public string AuditRecordId { get; init; } = default!;
    public DateTimeOffset CreatedAt { get; init; }
    public DateTimeOffset ObservedAt { get; init; }

    public string Action { get; init; } = default!;
    public string ResourceType { get; init; } = default!;
    public string ResourceId { get; init; } = default!;
    public string? ResourcePath { get; init; }

    public string ActorId { get; init; } = default!;
    public string ActorType { get; init; } = "Unknown";
    public string? DecisionOutcome { get; init; }

    public IReadOnlyList<string>? ChangedFields { get; init; }
    public short? DataClassFlags { get; init; }
    public int PayloadBytes { get; init; }

    public string SearchText { get; init; } = "";    // redacted concat
    public short SchemaVersion { get; init; } = 1;
}

When serializing to JSON for indexing, ensure lowerCamel property names (e.g., System.Text.Json with PropertyNamingPolicy = CamelCase).


  • Redaction step: apply the Redaction Plan to each field destined for search (e.g., hash email, mask IP).
  • searchText construction: concatenate safe fields (action, resourceType, resourceId, actorId, changedFields, small text fragments from delta) into a single text field.
  • Suggest inputs: populate resourceId.suggest and actorId.suggest with the same values; optionally include synonyms/aliases.
  • DataClassFlags: compute bitmask from record classifications for fast filtering.

Queries (examples)

Tenant-scoped free text + filters

{
  "query": {
    "bool": {
      "must": [{ "query_string": { "query": "book* OR booked", "fields": ["searchText","action","resourceType.sug"] } }],
      "filter": [
        { "term": { "tenantId": "splootvets" } },
        { "term": { "resourceType.kw": "Vetspire.Appointment" } },
        { "range": { "createdAt": { "gte": "now-7d/d" } } }
      ]
    }
  },
  "sort": [{ "createdAt": "desc" }, { "auditRecordId": "desc" }],
  "size": 50
}

Type-ahead suggestion for resourceId

{
  "suggest": {
    "resid": { "prefix": "A-99", "completion": { "field": "resourceId.suggest", "skip_duplicates": true, "size": 5 } }
  }
}

Per-actor activity (exact)

{
  "query": {
    "bool": {
      "filter": [
        { "term": { "tenantId": "splootvets" } },
        { "term": { "actorId": "user_123" } }
      ]
    }
  }
}

Reindex strategy (zero-downtime)

  1. Bump schemaVersion when mapping/analysis changes.
  2. Create new physical index audit-{tenantId}-v{N+1}-000001 with the updated template.
  3. Backfill by replaying projection stream (or use _reindex from old alias → new write alias with a reprocessor that re-applies redaction).
  4. Dual write temporarily (optional) to old and new write aliases to converge.
  5. Swap read alias audit-{tenantId} to point exclusively to v{N+1}.
  6. Freeze and delete old indices post verification (checksum doc counts, sample queries).

Keep a compat window where both versions are queryable if you expose API indexVersion parameters.


Retention for index docs

  • Hot: 0–7 days (fast refresh, frequent rollovers).
  • Warm: 7–30 days (force-merge, slower refresh).
  • Delete: at or before the record’s Retention keepUntil, unless tenant policy requires full parity.
  • Legal hold: optionally pin affected shards via index block or copy matching docs to a hold index until release.

Sharding & capacity

  • Target primary shard size 20–50 GB post-merge; prefer more, smaller shards for heavy-ingest tenants.
  • Refresh interval 5s in hot phase, 60s in warm phase.
  • Limit doc size: ≤ 8 KiB typical (search doc is compact); avoid embedding full delta bodies—only keys/summary.

Validation rules (summary)

  • All queries include tenantId term (or use filtered read aliases).
  • Only redacted values are indexed; no secrets/credentials/PHI raw values.
  • schemaVersion in the doc matches the current template version.
  • ILM policy applied to all indices; rollover and delete actions succeed before shard limits.
  • Reindex tooling verifies doc count parity (± expected drops due to ILM) and sample checksum of auditRecordId sets.

Event Contracts (Published Language)

Catalog of domain events the Audit Trail Platform (ATP) emits/consumes. Events use a common envelope, are tenant-scoped, and are designed for at-least-once delivery with backward-compatible evolution.

JSON uses lowerCamel; C# (gRPC code-first) uses PascalCase. Protobuf fields use PascalCase with json_name mapped to lowerCamel.


Overview

  • Transport-agnostic payloads (Kafka/NATS/Service Bus friendly).
  • Per-tenant partitioning (partition key = tenantId).
  • Correlation-friendly with OTel-compatible traceId (see Correlation & Provenance).
  • Small, focused data sections; large artifacts referenced via URIs (e.g., export files).
  • Versioned schemas with additive-first evolution.

Envelope

All events share a minimal, stable header plus a type-specific data object.

JSON (lowerCamel) C# (PascalCase) Type Req. Notes
eventId EventId ULID Unique per event (not per record).
eventType EventType string Namespaced, e.g., connectsoft.audit.v1/AuditRecord.Appended.
tenantId TenantId string Partition/authorization key.
publishedAt PublishedAt timestamp UTC time the event was published.
traceId TraceId hex32 W3C trace id (from correlation).
causationId CausationId ULID Event id that caused this emission (if any).
schemaVersion SchemaVersion string e.g., event-envelope.v1.
producer Producer string Logical service name/version.
data Data object Type-specific payload (below).

CloudEvents mapping (optional): eventId → id, eventType → type, publishedAt → time, tenantId → subject ("tenant:<id>"), producer → source, data → data.
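
The CloudEvents mapping above could be applied with a small adapter like the one below. It is a sketch: EventEnvelope here is a trimmed, hypothetical record with only the mapped fields, and the output is a plain attribute dictionary rather than any particular CloudEvents SDK type. specversion is set to "1.0" because CloudEvents requires that attribute.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Trimmed, hypothetical envelope carrying only the fields the mapping uses.
public sealed record EventEnvelope(
    string EventId, string EventType, string TenantId,
    DateTimeOffset PublishedAt, string Producer, object Data);

public static class CloudEventsMapper
{
    public static Dictionary<string, object> ToCloudEvent(EventEnvelope e) => new()
    {
        ["specversion"] = "1.0",
        ["id"] = e.EventId,
        ["type"] = e.EventType,
        // RFC 3339 UTC timestamp with millisecond precision and a Z suffix.
        ["time"] = e.PublishedAt.UtcDateTime.ToString("yyyy-MM-dd'T'HH:mm:ss.fff'Z'", CultureInfo.InvariantCulture),
        ["subject"] = $"tenant:{e.TenantId}",
        ["source"] = e.Producer,
        ["data"] = e.Data
    };
}
```

Fields without a CloudEvents counterpart in the mapping (traceId, causationId, schemaVersion) are left out here; they would need extension attributes if carried at all.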


Event types

1) AuditRecord.Appended – emitted when an audit fact is durably persisted to the authoritative store.
data:

  • auditRecordId (ULID)
  • createdAt, observedAt (timestamps)
  • action (string), resourceType (string), resourceId (string)
  • actorId (string), actorType (enum name)
  • hasDelta (bool), dataClassFlags (int bitmask)
  • payloadBytes (int)

2) AuditRecord.Accepted – idempotent ack for producers; emitted even on retry/duplicate.
data:

  • auditRecordId (ULID)
  • idempotencyKey (string?)
  • status ("Created" | "Duplicate")
  • createdAt, observedAt

3) Projection.Updated – a projection row has been upserted (e.g., AuditEvents).
data:

  • projection ("AuditEvents" | "ResourceEvents" | "ActorEvents")
  • auditRecordId (ULID)
  • checkpoint (object) { "highWaterRecordId": ULID, "highWaterObservedAt": timestamp, "version": int }

4) Integrity.ProofComputed – an IntegrityBlock sealed; proofs available.
data:

  • blockId (ULID), sealedAt (timestamp)
  • segmentCount (int), recordCount (long)
  • blockRoot (hex64), prevBlockRoot (hex64)
  • signature { "scheme": "Ed25519"|"PKCS7", "signingKeyId": string }

5) Export.Requested – an export job created/started.
data:

  • jobId (ULID), createdAt (timestamp)
  • filter (object; summarized)
  • format ("Jsonl"|"Parquet"), includeIntegrity (bool)
  • redactionPlan { "id": string, "revision": int }

6) Export.Completed – export job finished (success or failure).
data:

  • jobId (ULID), state ("Completed"|"Failed"|"Canceled"), reason (string?)
  • packageCount (int), recordCount (long), bytesUncompressed (long)
  • manifests (array of URIs or ids)

7) Policy.Changed – a policy revision becomes effective (Retention/Redaction/Residency).
data:

  • kind ("Retention"|"Redaction"|"Residency")
  • id (string), revision (int), effectiveFromUtc (timestamp)
  • previousRevision (int?)

Topics & partitioning (illustrative)

  • atp.audit.v1 → AuditRecord.* (partition key = tenantId)
  • atp.integrity.v1 → Integrity.* (partition key = tenantId, secondary route by blockId if supported)
  • atp.projection.v1 → Projection.* (partition key = tenantId)
  • atp.export.v1 → Export.* (partition key = tenantId)
  • atp.policy.v1 → Policy.* (partition key = tenantId)

Ordering within a partition is preserved by transport; do not rely on cross-partition ordering.
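To illustrate why keying by tenantId preserves per-tenant ordering, here is a sketch of a stable key-to-partition mapping. The function `partition_for` and the SHA-256 based hash are assumptions for illustration only; real brokers apply their own partitioners, and this is not their algorithm.

```python
import hashlib


def partition_for(tenant_id: str, partition_count: int) -> int:
    """Deterministically map a tenant to one partition index."""
    if partition_count <= 0:
        raise ValueError("partition_count must be positive")
    # A stable digest (unlike Python's salted hash()) gives the same
    # result in every process, so one tenant always lands on one partition.
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count
```

Because every event of a tenant maps to the same partition, the transport's in-partition ordering applies to that tenant's events; events of different tenants may land on different partitions and carry no relative order.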


JSON Schemas (v1)

event-envelope.v1.json

{
  "$id": "urn:connectsoft:schemas:events/event-envelope.v1.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "EventEnvelope",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "eventId": { "type": "string", "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$" },
    "eventType": { "type": "string", "maxLength": 128 },
    "tenantId": { "type": "string", "pattern": "^[A-Za-z0-9._-]{1,128}$" },
    "publishedAt": { "type": "string", "format": "date-time" },
    "traceId": { "type": "string", "pattern": "^[a-f0-9]{32}$" },
    "causationId": { "type": "string", "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$" },
    "schemaVersion": { "type": "string", "pattern": "^event-envelope\\.v[0-9]+$" },
    "producer": { "type": "string", "maxLength": 64 },
    "data": { "type": "object" }
  },
  "required": ["eventId","eventType","tenantId","publishedAt","traceId","schemaVersion","producer","data"]
}
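For readers who want to see the envelope constraints applied without a JSON Schema library, the following sketch re-implements the patterns, length caps, required list, and `additionalProperties: false` rule from the schema above using only the standard library. The function name `envelope_errors` and the error strings are hypothetical, and the `date-time` format of `publishedAt` is deliberately not checked here.

```python
import re
from typing import Any, Dict, List

ULID = re.compile(r"[0-9A-HJKMNP-TV-Z]{26}")
PATTERNS = {
    "eventId": ULID,
    "causationId": ULID,
    "tenantId": re.compile(r"[A-Za-z0-9._-]{1,128}"),
    "traceId": re.compile(r"[a-f0-9]{32}"),
    "schemaVersion": re.compile(r"event-envelope\.v[0-9]+"),
}
MAX_LENGTH = {"eventType": 128, "producer": 64}
REQUIRED = ("eventId", "eventType", "tenantId", "publishedAt",
            "traceId", "schemaVersion", "producer", "data")
ALLOWED = set(REQUIRED) | {"causationId"}


def envelope_errors(envelope: Dict[str, Any]) -> List[str]:
    """Return a list of violations; an empty list means the envelope passed."""
    errors = [f"{name}: missing" for name in REQUIRED if name not in envelope]
    errors += [f"{name}: not allowed" for name in envelope if name not in ALLOWED]
    for name, value in envelope.items():
        if name == "data":
            if not isinstance(value, dict):
                errors.append("data: expected object")
        elif name in ALLOWED and not isinstance(value, str):
            errors.append(f"{name}: expected string")
        elif name in PATTERNS and not PATTERNS[name].fullmatch(value):
            errors.append(f"{name}: pattern mismatch")
        elif name in MAX_LENGTH and len(value) > MAX_LENGTH[name]:
            errors.append(f"{name}: too long")
    return errors
```

`fullmatch` is used instead of `^...$` anchors so that a trailing newline cannot slip past the pattern.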

auditrecord.appended.v1.json

{
  "$id": "urn:connectsoft:schemas:events/auditrecord.appended.v1.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "AuditRecord.Appended",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "auditRecordId": { "type": "string", "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$" },
    "createdAt": { "type": "string", "format": "date-time" },
    "observedAt": { "type": "string", "format": "date-time" },
    "action": { "type": "string" },
    "resourceType": { "type": "string" },
    "resourceId": { "type": "string" },
    "actorId": { "type": "string" },
    "actorType": { "type": "string" },
    "hasDelta": { "type": "boolean" },
    "dataClassFlags": { "type": "integer", "minimum": 0 },
    "payloadBytes": { "type": "integer", "minimum": 0 }
  },
  "required": ["auditRecordId","createdAt","observedAt","action","resourceType","resourceId","actorId","actorType","payloadBytes"]
}

auditrecord.accepted.v1.json

{
  "$id": "urn:connectsoft:schemas:events/auditrecord.accepted.v1.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "AuditRecord.Accepted",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "auditRecordId": { "type": "string", "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$" },
    "idempotencyKey": { "type": "string", "maxLength": 128 },
    "status": { "type": "string", "enum": ["Created","Duplicate"] },
    "createdAt": { "type": "string", "format": "date-time" },
    "observedAt": { "type": "string", "format": "date-time" }
  },
  "required": ["auditRecordId","status","observedAt"]
}

projection.updated.v1.json

{
  "$id": "urn:connectsoft:schemas:events/projection.updated.v1.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Projection.Updated",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "projection": { "type": "string", "enum": ["AuditEvents","ResourceEvents","ActorEvents"] },
    "auditRecordId": { "type": "string", "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$" },
    "checkpoint": {
      "type": "object",
      "additionalProperties": false,
      "properties": {
        "highWaterRecordId": { "type": "string", "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$" },
        "highWaterObservedAt": { "type": "string", "format": "date-time" },
        "version": { "type": "integer", "minimum": 1 }
      },
      "required": ["highWaterRecordId","highWaterObservedAt","version"]
    }
  },
  "required": ["projection","auditRecordId","checkpoint"]
}

integrity.proofcomputed.v1.json

{
  "$id": "urn:connectsoft:schemas:events/integrity.proofcomputed.v1.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Integrity.ProofComputed",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "blockId": { "type": "string", "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$" },
    "sealedAt": { "type": "string", "format": "date-time" },
    "segmentCount": { "type": "integer", "minimum": 1 },
    "recordCount": { "type": "integer", "minimum": 1 },
    "blockRoot": { "type": "string", "pattern": "^[a-f0-9]{64}$" },
    "prevBlockRoot": { "type": "string", "pattern": "^[a-f0-9]{64}$" },
    "signature": {
      "type": "object",
      "additionalProperties": false,
      "properties": {
        "scheme": { "type": "string", "enum": ["Ed25519","PKCS7"] },
        "signingKeyId": { "type": "string", "maxLength": 128 }
      },
      "required": ["scheme","signingKeyId"]
    }
  },
  "required": ["blockId","sealedAt","segmentCount","recordCount","blockRoot","prevBlockRoot","signature"]
}

export.requested.v1.json

{
  "$id": "urn:connectsoft:schemas:events/export.requested.v1.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Export.Requested",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "jobId": { "type": "string", "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$" },
    "createdAt": { "type": "string", "format": "date-time" },
    "format": { "type": "string", "enum": ["Jsonl","Parquet"] },
    "includeIntegrity": { "type": "boolean" },
    "filter": { "type": "object" },
    "redactionPlan": {
      "type": "object",
      "additionalProperties": false,
      "properties": { "id": { "type": "string" }, "revision": { "type": "integer" } },
      "required": ["id","revision"]
    }
  },
  "required": ["jobId","createdAt","format","redactionPlan"]
}

export.completed.v1.json

{
  "$id": "urn:connectsoft:schemas:events/export.completed.v1.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Export.Completed",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "jobId": { "type": "string", "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$" },
    "state": { "type": "string", "enum": ["Completed","Failed","Canceled"] },
    "reason": { "type": "string", "maxLength": 256 },
    "packageCount": { "type": "integer", "minimum": 0 },
    "recordCount": { "type": "integer", "minimum": 0 },
    "bytesUncompressed": { "type": "integer", "minimum": 0 },
    "manifests": { "type": "array", "items": { "type": "string" }, "maxItems": 1000 }
  },
  "required": ["jobId","state"]
}

policy.changed.v1.json

{
  "$id": "urn:connectsoft:schemas:events/policy.changed.v1.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Policy.Changed",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "kind": { "type": "string", "enum": ["Retention","Redaction","Residency"] },
    "id": { "type": "string", "maxLength": 128 },
    "revision": { "type": "integer", "minimum": 1 },
    "effectiveFromUtc": { "type": "string", "format": "date-time" },
    "previousRevision": { "type": "integer", "minimum": 1 }
  },
  "required": ["kind","id","revision","effectiveFromUtc"]
}

C# (gRPC code-first)

[DataContract]
public sealed class EventEnvelope<T>
{
    [DataMember(Order = 1)] public string EventId { get; init; } = default!;   // ULID
    [DataMember(Order = 2)] public string EventType { get; init; } = default!;
    [DataMember(Order = 3)] public string TenantId { get; init; } = default!;
    [DataMember(Order = 4)] public DateTimeOffset PublishedAt { get; init; }
    [DataMember(Order = 5)] public string TraceId { get; init; } = default!;   // hex32
    [DataMember(Order = 6)] public string? CausationId { get; init; }          // ULID
    [DataMember(Order = 7)] public string SchemaVersion { get; init; } = "event-envelope.v1";
    [DataMember(Order = 8)] public string Producer { get; init; } = default!;
    [DataMember(Order = 9)] public T Data { get; init; } = default!;
}

[DataContract] public sealed class AuditRecordAppended {
    [DataMember(Order = 1)] public string AuditRecordId { get; init; } = default!;
    [DataMember(Order = 2)] public DateTimeOffset CreatedAt { get; init; }
    [DataMember(Order = 3)] public DateTimeOffset ObservedAt { get; init; }
    [DataMember(Order = 4)] public string Action { get; init; } = default!;
    [DataMember(Order = 5)] public string ResourceType { get; init; } = default!;
    [DataMember(Order = 6)] public string ResourceId { get; init; } = default!;
    [DataMember(Order = 7)] public string ActorId { get; init; } = default!;
    [DataMember(Order = 8)] public string ActorType { get; init; } = "Unknown";
    [DataMember(Order = 9)] public bool HasDelta { get; init; }
    [DataMember(Order = 10)] public int? DataClassFlags { get; init; }   // int bitmask; matches JSON integer and proto int32
    [DataMember(Order = 11)] public int PayloadBytes { get; init; }
}

[DataContract] public sealed class AuditRecordAccepted {
    [DataMember(Order = 1)] public string AuditRecordId { get; init; } = default!;
    [DataMember(Order = 2)] public string? IdempotencyKey { get; init; }
    [DataMember(Order = 3)] public string Status { get; init; } = "Created"; // Created|Duplicate
    [DataMember(Order = 4)] public DateTimeOffset? CreatedAt { get; init; }
    [DataMember(Order = 5)] public DateTimeOffset ObservedAt { get; init; }
}

[DataContract] public sealed class ProjectionUpdated {
    [DataMember(Order = 1)] public string Projection { get; init; } = default!; // AuditEvents|ResourceEvents|ActorEvents
    [DataMember(Order = 2)] public string AuditRecordId { get; init; } = default!;
    [DataMember(Order = 3)] public ProjectionCheckpoint Checkpoint { get; init; } = new();
}
[DataContract] public sealed class ProjectionCheckpoint {
    [DataMember(Order = 1)] public string HighWaterRecordId { get; init; } = default!;
    [DataMember(Order = 2)] public DateTimeOffset HighWaterObservedAt { get; init; }
    [DataMember(Order = 3)] public int Version { get; init; }
}

[DataContract] public sealed class IntegrityProofComputed {
    [DataMember(Order = 1)] public string BlockId { get; init; } = default!;
    [DataMember(Order = 2)] public DateTimeOffset SealedAt { get; init; }
    [DataMember(Order = 3)] public int SegmentCount { get; init; }
    [DataMember(Order = 4)] public long RecordCount { get; init; }
    [DataMember(Order = 5)] public string BlockRoot { get; init; } = default!;
    [DataMember(Order = 6)] public string PrevBlockRoot { get; init; } = default!;
    [DataMember(Order = 7)] public SignatureHeader Signature { get; init; } = new();
}
[DataContract] public sealed class SignatureHeader {
    [DataMember(Order = 1)] public string Scheme { get; init; } = "Ed25519";
    [DataMember(Order = 2)] public string SigningKeyId { get; init; } = default!;
}

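[DataContract] public sealed class ExportRequested {
    [DataMember(Order = 1)] public string JobId { get; init; } = default!;
    [DataMember(Order = 2)] public DateTimeOffset CreatedAt { get; init; }
    [DataMember(Order = 3)] public string Format { get; init; } = "Jsonl";
    [DataMember(Order = 4)] public bool IncludeIntegrity { get; init; }
    [DataMember(Order = 5)] public object? Filter { get; init; } // summary
    [DataMember(Order = 6)] public RedactionPlanRef RedactionPlan { get; init; } = new();
}
// Mirrors the RedactionPlanRef message in the Protobuf contract.
[DataContract] public sealed class RedactionPlanRef {
    [DataMember(Order = 1)] public string Id { get; init; } = default!;
    [DataMember(Order = 2)] public int Revision { get; init; }
}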

[DataContract] public sealed class ExportCompleted {
    [DataMember(Order = 1)] public string JobId { get; init; } = default!;
    [DataMember(Order = 2)] public string State { get; init; } = "Completed"; // Completed|Failed|Canceled
    [DataMember(Order = 3)] public string? Reason { get; init; }
    [DataMember(Order = 4)] public int? PackageCount { get; init; }
    [DataMember(Order = 5)] public long? RecordCount { get; init; }
    [DataMember(Order = 6)] public long? BytesUncompressed { get; init; }
    [DataMember(Order = 7)] public IReadOnlyList<string>? Manifests { get; init; }
}

[DataContract] public sealed class PolicyChanged {
    [DataMember(Order = 1)] public string Kind { get; init; } = default!; // Retention|Redaction|Residency
    [DataMember(Order = 2)] public string Id { get; init; } = default!;
    [DataMember(Order = 3)] public int Revision { get; init; }
    [DataMember(Order = 4)] public DateTimeOffset EffectiveFromUtc { get; init; }
    [DataMember(Order = 5)] public int? PreviousRevision { get; init; }
}

Protobuf (optional emission)

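syntax = "proto3";
package connectsoft.events.v1;

// Well-known types used by the messages below.
import "google/protobuf/any.proto";
import "google/protobuf/struct.proto";
import "google/protobuf/timestamp.proto";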

message EventEnvelope {
  string EventId = 1 [json_name = "eventId"];
  string EventType = 2 [json_name = "eventType"];
  string TenantId = 3 [json_name = "tenantId"];
  google.protobuf.Timestamp PublishedAt = 4 [json_name = "publishedAt"];
  string TraceId = 5 [json_name = "traceId"];
  string CausationId = 6 [json_name = "causationId"];
  string SchemaVersion = 7 [json_name = "schemaVersion"];
  string Producer = 8 [json_name = "producer"];
  google.protobuf.Any Data = 9 [json_name = "data"];
}

message AuditRecordAppended {
  string AuditRecordId = 1 [json_name = "auditRecordId"];
  google.protobuf.Timestamp CreatedAt = 2 [json_name = "createdAt"];
  google.protobuf.Timestamp ObservedAt = 3 [json_name = "observedAt"];
  string Action = 4 [json_name = "action"];
  string ResourceType = 5 [json_name = "resourceType"];
  string ResourceId = 6 [json_name = "resourceId"];
  string ActorId = 7 [json_name = "actorId"];
  string ActorType = 8 [json_name = "actorType"];
  bool HasDelta = 9 [json_name = "hasDelta"];
  int32 DataClassFlags = 10 [json_name = "dataClassFlags"];
  int32 PayloadBytes = 11 [json_name = "payloadBytes"];
}

message AuditRecordAccepted {
  string AuditRecordId = 1 [json_name = "auditRecordId"];
  string IdempotencyKey = 2 [json_name = "idempotencyKey"];
  string Status = 3 [json_name = "status"]; // Created|Duplicate
  google.protobuf.Timestamp CreatedAt = 4 [json_name = "createdAt"];
  google.protobuf.Timestamp ObservedAt = 5 [json_name = "observedAt"];
}

message ProjectionUpdated {
  string Projection = 1 [json_name = "projection"];
  string AuditRecordId = 2 [json_name = "auditRecordId"];
  ProjectionCheckpoint Checkpoint = 3 [json_name = "checkpoint"];
}
message ProjectionCheckpoint {
  string HighWaterRecordId = 1 [json_name = "highWaterRecordId"];
  google.protobuf.Timestamp HighWaterObservedAt = 2 [json_name = "highWaterObservedAt"];
  int32 Version = 3 [json_name = "version"];
}

message IntegrityProofComputed {
  string BlockId = 1 [json_name = "blockId"];
  google.protobuf.Timestamp SealedAt = 2 [json_name = "sealedAt"];
  int32 SegmentCount = 3 [json_name = "segmentCount"];
  int64 RecordCount = 4 [json_name = "recordCount"];
  string BlockRoot = 5 [json_name = "blockRoot"];
  string PrevBlockRoot = 6 [json_name = "prevBlockRoot"];
  SignatureHeader Signature = 7 [json_name = "signature"];
}
message SignatureHeader {
  string Scheme = 1 [json_name = "scheme"];
  string SigningKeyId = 2 [json_name = "signingKeyId"];
}

message ExportRequested {
  string JobId = 1 [json_name = "jobId"];
  google.protobuf.Timestamp CreatedAt = 2 [json_name = "createdAt"];
  string Format = 3 [json_name = "format"];
  bool IncludeIntegrity = 4 [json_name = "includeIntegrity"];
  google.protobuf.Struct Filter = 5 [json_name = "filter"];
  RedactionPlanRef RedactionPlan = 6 [json_name = "redactionPlan"];
}
message ExportCompleted {
  string JobId = 1 [json_name = "jobId"];
  string State = 2 [json_name = "state"]; // Completed|Failed|Canceled
  string Reason = 3 [json_name = "reason"];
  int32 PackageCount = 4 [json_name = "packageCount"];
  int64 RecordCount = 5 [json_name = "recordCount"];
  int64 BytesUncompressed = 6 [json_name = "bytesUncompressed"];
  repeated string Manifests = 7 [json_name = "manifests"];
}
message RedactionPlanRef { string Id = 1 [json_name = "id"]; int32 Revision = 2 [json_name = "revision"]; }

message PolicyChanged {
  string Kind = 1 [json_name = "kind"]; // Retention|Redaction|Residency
  string Id = 2 [json_name = "id"];
  int32 Revision = 3 [json_name = "revision"];
  google.protobuf.Timestamp EffectiveFromUtc = 4 [json_name = "effectiveFromUtc"];
  int32 PreviousRevision = 5 [json_name = "previousRevision"];
}

Examples

AuditRecord.Appended

{
  "eventId": "01JE7DE6X1J6J7KJ6G7VQ5T5S4",
  "eventType": "connectsoft.audit.v1/AuditRecord.Appended",
  "tenantId": "splootvets",
  "publishedAt": "2025-10-22T16:05:12.345Z",
  "traceId": "3e1f2d0c9b8a7f6e5d4c3b2a19081716",
  "schemaVersion": "event-envelope.v1",
  "producer": "ingress-gw/2.4.1",
  "data": {
    "auditRecordId": "01JE7D9ZQ1D7J6H2DZX7HQB6XB",
    "createdAt": "2025-10-22T16:05:12.100Z",
    "observedAt": "2025-10-22T16:05:12.320Z",
    "action": "appointment.update",
    "resourceType": "Vetspire.Appointment",
    "resourceId": "A-9981",
    "actorId": "user_123",
    "actorType": "User",
    "hasDelta": true,
    "dataClassFlags": 12,
    "payloadBytes": 1536
  }
}

Integrity.ProofComputed

{
  "eventId": "01JE7E1V2Q3W4E5R6T7Y8A9B0C",
  "eventType": "connectsoft.audit.v1/Integrity.ProofComputed",
  "tenantId": "splootvets",
  "publishedAt": "2025-10-22T16:10:00Z",
  "traceId": "9c8b7a6f5e4d3c2b1a09182736455443",
  "schemaVersion": "event-envelope.v1",
  "producer": "integrity-svc/1.3.0",
  "data": {
    "blockId": "01JE7E0B5V2C6M9N3X7Z4K2J8M",
    "sealedAt": "2025-10-22T16:09:58Z",
    "segmentCount": 8,
    "recordCount": 1024,
    "blockRoot": "8a7b6c...d1",
    "prevBlockRoot": "7f6e5d...aa",
    "signature": { "scheme": "Ed25519", "signingKeyId": "kv:prod/atp-integrity/ed25519-2025-01" }
  }
}

Reprocessing & idempotency

  • Consumers MUST treat events as at-least-once and be idempotent using (tenantId, auditRecordId) or the event’s business id.
  • AuditRecord.Accepted may be emitted for duplicates (status="Duplicate").
  • Projection.Updated is informational; rebuilds should not depend on it for correctness.
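A minimal sketch of an at-least-once consumer that deduplicates on (tenantId, auditRecordId). The class name `IdempotentConsumer` and the in-memory set are assumptions for illustration; a real consumer would keep the seen-keys in a durable store shared across restarts.

```python
from typing import Any, Dict, List, Set, Tuple


class IdempotentConsumer:
    """Applies each (tenantId, auditRecordId) at most once."""

    def __init__(self) -> None:
        self._seen: Set[Tuple[str, str]] = set()
        self.applied: List[Dict[str, Any]] = []

    def handle(self, envelope: Dict[str, Any]) -> bool:
        """Return True if the event was applied, False if it was a redelivery."""
        key = (envelope["tenantId"], envelope["data"]["auditRecordId"])
        if key in self._seen:
            return False  # duplicate delivery: safe to acknowledge and skip
        self._seen.add(key)
        self.applied.append(envelope["data"])
        return True
```

The key includes tenantId so that identical record ids in different tenants are never treated as duplicates of each other.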

Evolution & compatibility

  • New fields are additive and optional.
  • Breaking changes require a new schema id (e.g., …v2) and a new eventType suffix.
  • Producers MUST keep emitting previous versions until all critical consumers upgrade (dual-publish window).

Validation rules (summary)

  • Envelope is present and valid; eventType matches the data schema.
  • tenantId is non-empty and matches tenancy pattern.
  • traceId is W3C hex32; eventId/causationId are ULIDs.
  • Topic routing key must be tenantId.
  • Events must not contain raw data beyond the active Redaction Plan.

Schema Evolution & Compatibility

Defines how models change safely across JSON, Protobuf, SQL, search indexes, and events. Our strategy is additive-first, forward-tolerant, and backward-compatible with clear deprecation timelines and a resolvable schema registry.

JSON uses lowerCamel; C#/gRPC code-first uses PascalCase; DB tables/columns use PascalCase.


Principles

  • Additive first: Prefer adding optional fields over changing/removing existing ones.
  • Must-ignore: Readers must ignore unknown fields and preserve them when round-tripping (where the transport supports it).
  • Deterministic hashing: Integrity hashes are computed over canonical JSON (RFC8785) of the authoritative payload excluding integrity. Never mutate stored payloads for backfills.
  • Version in-payload: Domain objects include a schemaVersion (string, e.g., auditrecord.v1). Event envelopes include their own version id (e.g., event-envelope.v1).
  • Registry-resolvable: Every schema’s $id is a stable URN resolvable via the Schema Registry.
  • Compatibility gates: CI enforces additive changes; breaking changes require a new major schema id (…v2) and migration plan.
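The deterministic-hashing principle can be sketched as follows. This only approximates RFC 8785: Python's `json.dumps` with sorted keys and compact separators matches JCS for payloads whose keys are ASCII and whose values are strings, integers, booleans, nulls, and nested objects or arrays, but it does not reproduce JCS number formatting or UTF-16 key ordering in general. The function names are hypothetical; SHA-256 is the digest named under CI checks later in this section.

```python
import hashlib
import json
from typing import Any, Dict


def canonical_bytes(payload: Dict[str, Any]) -> bytes:
    """Serialize the payload deterministically, excluding the integrity node."""
    body = {key: value for key, value in payload.items() if key != "integrity"}
    text = json.dumps(
        body,
        sort_keys=True,          # stable key order at every nesting level
        separators=(",", ":"),   # no insignificant whitespace
        ensure_ascii=False,      # keep non-ASCII characters as UTF-8
        allow_nan=False,         # NaN/Infinity are rejected, not serialized
    )
    return text.encode("utf-8")


def integrity_hash(payload: Dict[str, Any]) -> str:
    """Hex SHA-256 over the canonical form."""
    return hashlib.sha256(canonical_bytes(payload)).hexdigest()
```

Two payloads that differ only in key order, or only in the contents of `integrity`, hash to the same value, which is why stored payloads must never be rewritten during backfills.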

Versioning scheme

| Artifact | Versioning | Compatibility |
|---|---|---|
| JSON Schemas | …vN in $id and schemaVersion field | Add fields/enums → backward/forward OK; breaking change → new vN+1 id |
| Protobuf (proto3) | Field numbers immutable; reserved on removal | Add fields with new numbers; never reuse numbers; mark obsolete via deprecated |
| C# (gRPC code-first) | DataMember(Order=N) immutable | Add optional props; do not change orders; use nullable/wrapper types for presence |
| SQL | Online, additive DDL | Add nullable columns/tables/indexes; avoid PK changes; never shrink/widen enums by ordinal |
| Events | Type name suffix and schema id (e.g., …/AuditRecord.Appended + …v1) | Dual-publish window for v1/v2; consumers must accept both |
| Search Index | Template schemaVersion in docs; index name carries version | Reindex + alias swap; never change analyzer on live index |

Schema registry

  • URN format: urn:connectsoft:schemas/<domain>/<name>.vN.json
    Examples:
    • urn:connectsoft:schemas/domain/auditrecord.v1.json
    • urn:connectsoft:schemas/events/export.completed.v1.json
  • Resolution: URNs resolve to signed JSON files in the registry (git + object store).
  • Pinning: Producers/consumers pin to a commit digest or signed manifest for reproducibility.
  • Discovery: Each payload includes schemaVersion matching a registry $id (minus the urn: prefix if desired).
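A small parser for the URN format above, as a sketch. `SchemaRef` and `parse_schema_urn` are hypothetical names, and the character classes for `<domain>` and `<name>` are assumptions inferred from the examples. The event schema `$id` values earlier in this document are written `urn:connectsoft:schemas:events/...`, so the pattern accepts either `/` or `:` after `schemas`.

```python
import re
from typing import NamedTuple


class SchemaRef(NamedTuple):
    domain: str
    name: str
    version: int

    @property
    def schema_version(self) -> str:
        """Short form carried in payloads, e.g. 'auditrecord.v1'."""
        return f"{self.name}.v{self.version}"


_URN = re.compile(
    r"urn:connectsoft:schemas[/:](?P<domain>[a-z0-9-]+)/"
    r"(?P<name>[a-z0-9.-]+)\.v(?P<version>[0-9]+)\.json"
)


def parse_schema_urn(urn: str) -> SchemaRef:
    """Split a schema URN into domain, name, and major version."""
    match = _URN.fullmatch(urn)
    if match is None:
        raise ValueError(f"not a schema URN: {urn!r}")
    return SchemaRef(match["domain"], match["name"], int(match["version"]))
```

For example, `parse_schema_urn("urn:connectsoft:schemas/domain/auditrecord.v1.json").schema_version` yields the `auditrecord.v1` form used in the in-payload `schemaVersion` field.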

Patterns by artifact

JSON (domain, admin, responses)

  • Add: new optional properties, new enum literals (receivers treat unknown as "Unknown" or ignore).
  • Never:
    • change meaning/shape of a property,
    • tighten required-ness,
    • remove properties without a deprecation window.
  • Extensibility: reserve ext object for tenant/vendor fields:
      { "ext": { "vendorX.ticketId": "ABC-123" } }
    

All schemas: additionalProperties: false at root, except allow "ext": { "additionalProperties": true }.

Example (v1 → v2 bump)

// v1
{ "$id": "urn:connectsoft:schemas/domain/auditrecord.v1.json", ... }

// v2 (breaking due to renamed field 'actor' -> 'principal')
{ "$id": "urn:connectsoft:schemas/domain/auditrecord.v2.json", ... }

Protobuf (proto3)

  • Add: new fields with new numbers; defaulting via wrapper types (google.protobuf.*Value) when presence matters.
  • Remove: avoid removing fields; mark them deprecated = true first. If a field is eventually deleted after sunset, reserve its number and name so neither can be reused:

reserved 12;        // number of removed field
reserved "oldName"; // name of removed field

  • Enums: include UNSPECIFIED = 0; only append new members at the end; never renumber.

C# (gRPC code-first)

  • Keep DataMember(Order = N) stable forever.
  • Use bool?, int?, etc., or wrapper classes to model presence.
  • Mark old members [Obsolete("use NewField")] and keep them readable until sunset.

SQL

  • Additive only on hot paths: new nullable columns with defaults, new tables, indexes created online.
  • Never update authoritative payload columns post-write (WORM).
  • Migrations: shadow tables for rebuilds; double-write temporarily if needed; swap atomically.

Events

  • New fields: additive → consumers must ignore unknown.
  • Breaking: create a new event schema id and/or eventType (…v2), dual-publish for a defined window, then sunset v1.
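The "ignore unknown" rule for consumers can be sketched as a tolerant reader for AuditRecord.Appended data. The function name `read_appended` is hypothetical; the known field list comes from the event definition earlier in this document, and the actor type literals (Unknown, User, Service, Job) come from the record-level budgets. Mapping an unrecognized enum literal to "Unknown" follows the JSON pattern described above.

```python
from typing import Any, Dict, Tuple

KNOWN_FIELDS = {
    "auditRecordId", "createdAt", "observedAt", "action", "resourceType",
    "resourceId", "actorId", "actorType", "hasDelta", "dataClassFlags",
    "payloadBytes",
}
ACTOR_TYPES = {"Unknown", "User", "Service", "Job"}


def read_appended(data: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split data into (known, unknown) without failing on new fields."""
    known = {k: v for k, v in data.items() if k in KNOWN_FIELDS}
    # Unknown fields are kept aside so they can be preserved on round-trip.
    unknown = {k: v for k, v in data.items() if k not in KNOWN_FIELDS}
    if known.get("actorType") not in ACTOR_TYPES:
        known["actorType"] = "Unknown"  # tolerate enum literals added later
    return known, unknown
```

A consumer written this way keeps working when a producer starts emitting an additive field or a new enum literal during a dual-publish window.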

Deprecation lifecycle

| Stage | Signal | Producer behavior | Consumer expectation | Typical duration |
|---|---|---|---|---|
| Proposed | Changelog entry | No change | Awareness | |
| Deprecated | Schema annotations + docs | Keep writing old field; start writing new | Read both | 1–2 minors (≤ 6 months) |
| Sunset | Date announced | Stop writing deprecated field/event; emit v2 only | Must read new form | 1 minor (≤ 3 months) |
| Removed | Changelog & major | Field/event removed entirely | Must be updated | Next major |

Signaling mechanisms

  • JSON schemas carry "deprecationNote": "…".
  • gRPC/C#: [Obsolete] attributes.
  • REST/streaming APIs may include Sunset headers or metadata in Problem+JSON errors.

schemaVersion and registry pointers (in-payload)

  • Domain payloads include:

{
  "schemaVersion": "auditrecord.v1",
  "schemaRef": "urn:connectsoft:schemas/domain/auditrecord.v1.json"
}

  • Event envelopes include schemaVersion: "event-envelope.v1".
  • Projections may cache schemaVersion of source for debugging and replays.


Evolution playbook

  1. Author change as additive; update JSON Schema & Protobuf or C# contracts.
  2. Register new/updated schema in the registry; bump version if breaking.
  3. Implement dual-read/write when renaming/moving fields:
    • Write both oldField and newField for the deprecation window.
    • Read prefers newField ?? oldField.
  4. Backfill only in projections. Never mutate authoritative JSON or it will invalidate integrity proofs.
  5. Roll out behind a feature flag or edition gate when applicable.
  6. Sunset: remove dual-write, emit v2 only; keep readers tolerant for an additional buffer.

Compatibility matrix (quick ref)

| Change | JSON | Protobuf | C# | SQL | Events | Allowed? |
|---|---|---|---|---|---|---|
| Add optional field | ✓ | ✓ (new number) | ✓ | ✓ (nullable) | ✓ | Yes |
| Add enum value | ✓ (tolerate unknown) | ✓ (append) | ✓ | n/a | ✓ | Yes |
| Rename field | via dual write, then v2 | add new, deprecate old | add new prop | new column; backfill only in projections | new event type | With plan |
| Remove field | deprecate → v2 | reserved number/name | [Obsolete] then remove in major | drop column after purge/rebuild | stop emitting v1 | Breaking |
| Change type | v2 | new field | new prop | new column | v2 | Breaking |
| Tighten required-ness | v2 | n/a | n/a | NOT on authoritative | v2 | Breaking |

CI checks & tooling

  • jsonschema-compat: validates additive-only updates between vN and working copy.
  • protoc lints: enforce reserved numbers, forbid renumbering, require UNSPECIFIED=0.
  • contract tests: golden samples (see Fixtures, Samples & Test Data) round-trip through serializers/deserializers; consumers must pass unknown-field tolerance tests.
  • hash guard: recompute canonical JSON → SHA-256 and assert no change for historical fixtures.
  • DB migrator: dry-run additive DDL; ensure online flags; verify RLS remains intact.
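The document does not specify how jsonschema-compat works internally. As a rough sketch of what an additive-only gate could check, the function below compares two schema dicts and reports removed properties, changed types, removed enum literals, and newly required fields. The name `breaking_changes` and the report strings are hypothetical, and only the `properties`, `type`, `enum`, and `required` keywords are examined.

```python
from typing import Any, Dict, List


def breaking_changes(old: Dict[str, Any], new: Dict[str, Any], path: str = "") -> List[str]:
    """List non-additive differences between two JSON Schema fragments."""
    problems: List[str] = []
    old_props = old.get("properties", {})
    new_props = new.get("properties", {})
    for name, old_def in old_props.items():
        here = f"{path}/{name}"
        if name not in new_props:
            problems.append(f"{here}: property removed")
            continue
        new_def = new_props[name]
        if old_def.get("type") != new_def.get("type"):
            problems.append(f"{here}: type changed")
        old_enum, new_enum = old_def.get("enum"), new_def.get("enum")
        if old_enum is not None and new_enum is not None:
            for literal in old_enum:
                if literal not in new_enum:
                    problems.append(f"{here}: enum literal {literal!r} removed")
        # Recurse so nested objects are held to the same rules.
        problems.extend(breaking_changes(old_def, new_def, here))
    for name in sorted(set(new.get("required", [])) - set(old.get("required", []))):
        problems.append(f"{path}/{name}: newly required")
    return problems
```

An empty result means the working copy only added optional material relative to vN; anything else would call for a new …vN+1 id under the rules above.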

Examples

JSON: additive field

// v1
{ "actor": { "id": "user_1", "type": "User" } }

// v1 additive (OK)
{ "actor": { "id": "user_1", "type": "User", "emailHash": "b109f3..."} }

Protobuf: rename via additive + deprecate

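// Old
string Actor = 7;

// New (after sunset: field 7 is removed and its number and name are reserved)
string Principal = 24; // added during the deprecation window
reserved 7;
reserved "Actor";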

C#: dual read/write shim

public sealed class Actor
{
    [DataMember(Order = 1)] public string Id { get; init; } = default!;
    [DataMember(Order = 2)] public string Type { get; init; } = "Unknown";

    [DataMember(Order = 99)] [Obsolete("Use Display")]
    public string? Name { get; init; }   // deprecated

    [DataMember(Order = 100)]
    public string? Display { get; init; }
}

// Reader preference
var display = actor.Display ?? actor.Name;

SQL: additive column (projection only)

ALTER TABLE dbo.AuditEvents ADD UADevice NVARCHAR(64) NULL;
-- Populate via projector; do not touch authoritative payload.

Validation rules (summary)

  • schemaVersion present and resolvable via registry $id.
  • Unknown JSON fields do not cause validation failure (except where explicitly disallowed); they flow into ext or are ignored.
  • Protobuf messages never reuse field numbers; removed fields are reserved.
  • Authoritative payloads are never mutated post-write; all backfills happen in projections or sidecars.
  • Event consumers and REST clients tolerate unknown fields and new enum values.
  • Breaking changes require new …vN+1 identifiers and a documented migration & sunset plan.

Validation, Limits & Canonicalization

Centralizes constraints and normalization rules applied on the write path (ingress), mirrored by projectors and API responses. Ensures every AuditRecord is well-formed, bounded, redacted, and canonical before persistence and hashing.

JSON uses lowerCamel; C#/gRPC code-first uses PascalCase; tables/columns use PascalCase. Canonical JSON follows JCS (RFC 8785) for hashing; the hash input excludes the integrity node.


Record-level budgets

| Area | Limit | Notes |
|---|---|---|
| Payload size (PayloadBytes) | ≤ 262,144 bytes (256 KiB) | Hard cap at ingress (reject with 413-equivalent). |
| Attributes count (attributes) | ≤ 64 pairs | Keys must be simple tokens (see pattern). |
| Attribute key length | ≤ 64 chars | Pattern: ^[a-z][a-z0-9._-]{0,63}$ (ASCII). |
| Attribute value length | ≤ 256 chars | UTF-8 NFC normalized (see below). |
| action | ≤ 64 chars | Canonicalized to lowercase verb or verb.noun. |
| resource.type | ≤ 128 chars | Canonicalized to PascalCase dotted segments. |
| resource.id | ≤ 128 chars | Opaque; case-preserving; no whitespace. |
| resource.path | ≤ 512 chars | JSON Pointer-like; normalized (see below). |
| actor.id | ≤ 128 chars | Opaque; case-preserving; no whitespace. |
| actor.type | enum | One of Unknown, User, Service, Job. |
| Idempotency key | ≤ 128 chars | ASCII visible; unique per tenant (soft TTL ≥ 24h). |
| Correlation traceId | hex32 | Lowercase 32 hex (W3C). |
| Correlation requestId | ≤ 128 chars | Freeform token; trimmed. |
| Timestamps precision | ms | ISO-8601 UTC (Z) with millisecond precision. |

Clock & time sanity

  • ObservedAt: set by platform to now (UTC) (ms precision).
  • CreatedAt: producer-supplied; must satisfy:
    • createdAt ≤ now + 2m (future skew tolerance),
    • createdAt ≥ now - 365d (hard past bound; older events are rejected unless they arrive via a special backfill path).
  • EffectiveAt (if present): effectiveAt ≤ createdAt.
  • All comparisons use UTC; rounding to millisecond precision for storage and hashing.

If createdAt > now + 2m, reject (createdAt.futureBeyondSkew). If effectiveAt > createdAt, reject (effectiveAt.afterCreatedAt).
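The time rules above can be sketched as a single check. The function names are hypothetical, and so is the failure code `createdAt.tooOld` (the document names codes only for the future-skew and effectiveAt cases). `to_utc_ms` truncates to milliseconds, which is an assumption about how the millisecond rounding is done.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

FUTURE_SKEW = timedelta(minutes=2)
PAST_BOUND = timedelta(days=365)


def to_utc_ms(value: datetime) -> datetime:
    """Convert to UTC and drop sub-millisecond digits (truncation assumed)."""
    value = value.astimezone(timezone.utc)
    return value.replace(microsecond=value.microsecond // 1000 * 1000)


def time_failure(created_at: datetime, now: datetime,
                 effective_at: Optional[datetime] = None) -> Optional[str]:
    """Return a failure code, or None when all time checks pass."""
    if created_at > now + FUTURE_SKEW:
        return "createdAt.futureBeyondSkew"
    if created_at < now - PAST_BOUND:
        return "createdAt.tooOld"  # hypothetical code, not defined in this document
    if effective_at is not None and effective_at > created_at:
        return "effectiveAt.afterCreatedAt"
    return None
```

All three arguments are expected to be timezone-aware; comparing aware datetimes in Python is done on the underlying instant, so mixed offsets compare correctly before normalization.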


Canonicalization pipeline (ingress)

  1. Unicode: Normalize all free-text strings to NFC; strip leading/trailing whitespace; collapse internal runs of whitespace to single spaces.
  2. Action: lower-case and validate: ^[a-z]+(\.[a-z0-9_-]+)?$.
    • Examples: create, appointment.read, user.reset_password (underscore allowed after the dot).
  3. Resource type: split on ".", PascalCase each segment, then rejoin with ".". Validate: ^[A-Z][A-Za-z0-9]*(\.[A-Z][A-Za-z0-9]*)*$.
    • Example: vetspire.appointment → Vetspire.Appointment.
  4. Resource path (optional): normalize JSON Pointer-like value:
    • Ensure it starts with /, decode/encode escapes per RFC 6901, and remove trailing / unless root.
  5. Attributes:
    • Keys → ASCII, pattern ^[a-z][a-z0-9._-]{0,63}$.
    • Values → UTF-8 NFC, ≤256; drop non-printable control chars.
  6. Correlation:
    • traceId → lowercase hex32; reject non-hex;
    • requestId → trim and squash whitespace; ≤128.
  7. IP addresses (if provided under conventional keys like client.ip, server.ip):
    • Parse; if IPv4-mapped IPv6, convert to IPv4;
    • IPv4: dotted decimal, no leading zeros; IPv6: RFC 5952 canonical form (lowercase hex, zero-compression).
  8. User agent (if present as client.userAgent):
    • Remove controls; truncate to 256 chars; optional parse to structured UA in projections.
  9. Numbers: preserve numeric types; reject NaN/Infinity.
  10. JCS: materialize canonical JSON for hashing: sorted keys, no insignificant whitespace, timestamps as ISO-8601 UTC with ms precision, strings as NFC.
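A few of the pipeline steps (1, 2, 3, and 7) sketched with the standard library. All function names are hypothetical. "PascalCase each segment" is interpreted here as upper-casing the first character of each dotted segment, which is an assumption; IPv6 output relies on Python's `ipaddress` compressed form rather than a dedicated RFC 5952 formatter.

```python
import ipaddress
import re
import unicodedata

ACTION = re.compile(r"[a-z]+(\.[a-z0-9_-]+)?")
RESOURCE_TYPE = re.compile(r"[A-Z][A-Za-z0-9]*(\.[A-Z][A-Za-z0-9]*)*")


def canonical_text(value: str) -> str:
    """Step 1: NFC, trim, and collapse whitespace runs to single spaces."""
    return " ".join(unicodedata.normalize("NFC", value).split())


def canonical_action(value: str) -> str:
    """Step 2: lower-case, then validate against the action pattern."""
    action = canonical_text(value).lower()
    if len(action) > 64 or not ACTION.fullmatch(action):
        raise ValueError("action.invalid")
    return action


def canonical_resource_type(value: str) -> str:
    """Step 3: capitalize each dotted segment, then validate."""
    segments = canonical_text(value).split(".")
    result = ".".join(seg[:1].upper() + seg[1:] for seg in segments)
    if len(result) > 128 or not RESOURCE_TYPE.fullmatch(result):
        raise ValueError("resource.type.invalid")
    return result


def canonical_ip(value: str) -> str:
    """Step 7: parse, unwrap IPv4-mapped IPv6, and print in compact form."""
    address = ipaddress.ip_address(value.strip())
    if isinstance(address, ipaddress.IPv6Address) and address.ipv4_mapped is not None:
        address = address.ipv4_mapped
    return str(address)
```

The raised messages reuse the failure codes from the validation matrix so a caller could turn them directly into an error response.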

Delta & redaction caps

  • delta.fields map: ≤ 256 entries.
  • Field key length: ≤ 128 (JSON Pointer or dotted path, normalized to one style in the model).
  • Each before/after scalar string ≤ 1024; longer content must be redacted/truncated and marked with redactionHint = "Truncated".
  • Binary values are not allowed; base64 strings must be ≤ 2 KiB after encoding.
  • If a delta entry violates limits, drop the value, keep the key, and attach redactionHint.
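
The caps above can be sketched as a single pass over delta.fields. This is a minimal illustration, not the platform's implementation: the DeltaEntry record and DeltaCaps class are hypothetical names, rejecting an oversized map or key with ArgumentException is an assumption (the document names no failure code for those cases), and "Truncated" is reused as the hint because it is the only redactionHint value this list mentions.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical shape for one delta entry; the real model may differ.
public sealed record DeltaEntry(string? Before, string? After, string? RedactionHint = null);

public static class DeltaCaps
{
    public const int MaxEntries = 256;       // delta.fields map size
    public const int MaxKeyLength = 128;     // field key length
    public const int MaxScalarLength = 1024; // before/after string length

    public static Dictionary<string, DeltaEntry> Apply(IReadOnlyDictionary<string, DeltaEntry> fields)
    {
        // Assumption: exceeding the map or key caps is a hard rejection.
        if (fields.Count > MaxEntries)
            throw new ArgumentException($"delta.fields has {fields.Count} entries; limit is {MaxEntries}.");

        var result = new Dictionary<string, DeltaEntry>(StringComparer.Ordinal);
        foreach (var (key, entry) in fields)
        {
            if (key.Length > MaxKeyLength)
                throw new ArgumentException($"Delta field key exceeds {MaxKeyLength} characters.");

            bool tooLong = (entry.Before?.Length ?? 0) > MaxScalarLength
                        || (entry.After?.Length ?? 0) > MaxScalarLength;

            // Oversized values are dropped, the key is kept, and a hint records why.
            result[key] = tooLong ? new DeltaEntry(null, null, "Truncated") : entry;
        }
        return result;
    }
}
```

Entries within the limits pass through unchanged, so projections can still compute ChangedFields from the surviving keys.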

Validation matrix (selected)

| Field | Pattern / Rule | Failure code |
| --- | --- | --- |
| tenantId | ^[A-Za-z0-9._-]{1,128}$ | tenantId.invalid |
| action | ^[a-z]+(\.[a-z0-9_-]+)?$ | action.invalid |
| resource.type | ^[A-Z][A-Za-z0-9]*(\.[A-Z][A-Za-z0-9]*)*$ | resource.type.invalid |
| resource.id | no spaces, ≤128 | resource.id.invalid |
| actor.id | no spaces, ≤128 | actor.id.invalid |
| actor.type | enum | actor.type.invalid |
| correlation.traceId | hex32 | traceId.invalid |
| createdAt | ≤ now + 2m | createdAt.futureBeyondSkew |
| effectiveAt | ≤ createdAt | effectiveAt.afterCreatedAt |
| attributes.*.key | ^[a-z][a-z0-9._-]{0,63}$ | attributes.key.invalid |
| attributes.*.value | ≤256, printable | attributes.value.invalid |
| payloadBytes | ≤ 256 KiB | payload.tooLarge |

Problem+JSON error hints (ingress)

Type base: urn:connectsoft:errors/validation/{code}

Example (oversized payload)

{
  "type": "urn:connectsoft:errors/validation/payload.tooLarge",
  "title": "Payload exceeds 256 KiB",
  "status": 413,
  "detail": "Submitted AuditRecord is 312,884 bytes.",
  "instance": "/ingest/records",
  "extensions": { "limitBytes": 262144 }
}

Example (bad action)

{
  "type": "urn:connectsoft:errors/validation/action.invalid",
  "title": "Invalid action",
  "status": 400,
  "detail": "Expected 'verb' or 'verb.noun' lowercase.",
  "errors": [{ "pointer": "/action", "reason": "regex" }]
}

JSON Schema (snippets, v1 addenda)

Add the following constraints to auditrecord.v1.json:

{
  "properties": {
    "action": { "type": "string", "maxLength": 64, "pattern": "^[a-z]+(\\.[a-z0-9_-]+)?$" },
    "resource": {
      "type": "object",
      "properties": {
        "type": { "type": "string", "maxLength": 128, "pattern": "^[A-Z][A-Za-z0-9]*(\\.[A-Z][A-Za-z0-9]*)*$" },
        "id":   { "type": "string", "maxLength": 128, "pattern": "^(?!.*\\s).+$" },
        "path": { "type": "string", "maxLength": 512 }
      }
    },
    "actor": {
      "type": "object",
      "properties": {
        "id":   { "type": "string", "maxLength": 128, "pattern": "^(?!.*\\s).+$" },
        "type": { "type": "string", "enum": ["Unknown","User","Service","Job"] }
      }
    },
    "attributes": {
      "type": "object",
      "propertyNames": { "pattern": "^[a-z][a-z0-9._-]{0,63}$" },
      "additionalProperties": { "type": "string", "maxLength": 256 }
    },
    "correlation": {
      "type": "object",
      "properties": {
        "traceId":    { "type": "string", "pattern": "^[a-f0-9]{32}$" },
        "requestId":  { "type": "string", "maxLength": 128 },
        "causationId":{ "type": "string", "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$" }
      }
    }
  }
}

C# canonicalizers (gRPC code-first)

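using System;
using System.ComponentModel.DataAnnotations; // assumed source of ValidationException; swap in your own type if different
using System.Linq;
using System.Text;                            // NormalizationForm
using System.Text.RegularExpressions;

public static class Canon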
{
    // Unicode NFC + trim + collapse inner whitespace
    public static string NormalizeText(string s)
    {
        if (string.IsNullOrWhiteSpace(s)) return string.Empty;
        var nfc = s.Normalize(NormalizationForm.FormC).Trim();
        return Regex.Replace(nfc, @"\s+", " ");
    }

    public static string CanonicalizeAction(string action)
    {
        var s = NormalizeText(action).ToLowerInvariant();
        if (!Regex.IsMatch(s, "^[a-z]+(\\.[a-z0-9_-]+)?$")) throw new ValidationException("action.invalid");
        return s;
    }

    public static string CanonicalizeResourceType(string type)
    {
        var s = NormalizeText(type);
        var segs = s.Split('.', StringSplitOptions.RemoveEmptyEntries)
                    .Select(Pascalize);
        var joined = string.Join('.', segs);
        if (!Regex.IsMatch(joined, "^[A-Z][A-Za-z0-9]*(\\.[A-Z][A-Za-z0-9]*)*$"))
            throw new ValidationException("resource.type.invalid");
        return joined;

        static string Pascalize(string x) =>
            Regex.Replace(x, @"(^|[_\-\s]+)([a-zA-Z0-9])", m => m.Groups[2].Value.ToUpperInvariant()) // word caps
                .Replace("_", "").Replace("-", "").Replace(" ", "");
    }

    public static string CanonicalizeTraceId(string hex32)
    {
        var s = NormalizeText(hex32).ToLowerInvariant();
        if (!Regex.IsMatch(s, "^[a-f0-9]{32}$")) throw new ValidationException("traceId.invalid");
        return s;
    }

    public static string CanonicalizeIp(string ip)
    {
        if (string.IsNullOrWhiteSpace(ip)) return ip;
        if (System.Net.IPAddress.TryParse(ip.Trim(), out var addr))
        {
            if (addr.IsIPv4MappedToIPv6) addr = addr.MapToIPv4();
            return addr.AddressFamily == System.Net.Sockets.AddressFamily.InterNetworkV6
                ? addr.ToString()!.ToLowerInvariant()            // .NET outputs RFC 5952-ish
                : addr.ToString();                                // IPv4 dot-decimal
        }
        throw new ValidationException("ip.invalid");
    }

    public static string Truncate(string s, int max) =>
        s.Length <= max ? s : s.Substring(0, max);

    public static (DateTimeOffset createdAt, DateTimeOffset observedAt, DateTimeOffset? effectiveAt)
        ValidateClocks(DateTimeOffset createdAt, DateTimeOffset? effectiveAt, DateTimeOffset nowUtc)
    {
        if (createdAt > nowUtc.AddMinutes(2)) throw new ValidationException("createdAt.futureBeyondSkew");
        if (effectiveAt is { } e && e > createdAt) throw new ValidationException("effectiveAt.afterCreatedAt");
        var observedAt = nowUtc; // set by platform
        return (RoundMs(createdAt), RoundMs(observedAt), effectiveAt is null ? null : RoundMs(effectiveAt.Value));

        static DateTimeOffset RoundMs(DateTimeOffset t) => new DateTimeOffset(t.UtcDateTime.AddTicks(-(t.UtcDateTime.Ticks % TimeSpan.TicksPerMillisecond)), TimeSpan.Zero);
    }
}

Apply canonicalizers before computing the canonical JSON and PayloadBytes. Reject on first hard violation; include Problem+JSON hints.
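
The Canon helpers do not cover step 5 of the pipeline (attributes), so here is a minimal sketch of how it might look. CanonAttributes is a hypothetical name, ValidationException is assumed to be the System.ComponentModel.DataAnnotations type, and measuring the 256 limit in UTF-16 code units is an assumption. The key pattern ends in \z rather than $ because in .NET, $ also matches just before a trailing newline.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

public static class CanonAttributes
{
    // Same pattern as the validation matrix, anchored with \z so "key\n" is rejected.
    private static readonly Regex KeyPattern =
        new Regex(@"^[a-z][a-z0-9._-]{0,63}\z", RegexOptions.CultureInvariant);

    public static SortedDictionary<string, string> Canonicalize(
        IEnumerable<KeyValuePair<string, string>> attributes)
    {
        // Ordinal ordering compares UTF-16 code units, which is the key order JCS expects.
        var result = new SortedDictionary<string, string>(StringComparer.Ordinal);
        foreach (var (key, raw) in attributes)
        {
            if (!KeyPattern.IsMatch(key))
                throw new ValidationException("attributes.key.invalid");

            // NFC-normalize, then drop control characters.
            var nfc = (raw ?? string.Empty).Normalize(NormalizationForm.FormC);
            var cleaned = new string(nfc.Where(c => !char.IsControl(c)).ToArray());

            if (cleaned.Length > 256)
                throw new ValidationException("attributes.value.invalid");

            result[key] = cleaned;
        }
        return result;
    }
}
```

Returning a sorted map keeps the attribute order stable before the canonical JSON is materialized for hashing.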


UA, IP & path normalization (field guidance)

  • Prefer attributes["client.ip"], attributes["server.ip"], attributes["client.userAgent"].
  • Store normalized IPs and truncated UA (≤256).
  • For resource sub-paths, prefer JSON Pointer (e.g., /lines/0/price), not dotted paths.

Consistency & projection mirroring

  • Projections assume canonicalized inputs; do not re-normalize, only validate invariants when enriching (e.g., computing ChangedFields, bitmasks).
  • Search docs and exports must reflect the post-redaction, canonical values only.

Operational guards

  • Ingress backpressure: reject when payloads regularly near the 256 KiB ceiling; emit Validation.Alert metrics.
  • Shadow validation: periodically sample stored payloads and re-run canonicalizers to detect drift.
  • Schema pinning: assert auditRecord.schemaVersion matches registry; reject unknown majors.

Privacy & PII Inventory

Maps all AuditRecord fields to sensitivity classes and defines data minimization and mask-on-read behavior. Inventory drives redaction at write, project, search, export, and read time.

JSON uses lowerCamel; C#/gRPC uses PascalCase; DB tables/columns use PascalCase. DataClassFlags is a bitmask on projections: Public=1, Internal=2, Personal=4, Sensitive=8, Credential=16, Phi=32.


Data classes (recap)

| Class | Description | Examples | Default read mask |
| --- | --- | --- | --- |
| Public | Harmless metadata | action, resource.type | None |
| Internal | Ops-only, non-PII | correlation ids, shard ids | None |
| Personal | Direct or indirect identifiers (PII) | names, emails, IPs, device ids | Partial mask / hash |
| Sensitive | High-risk PII | SSN, national id, exact GPS | Tokenize / drop |
| Credential | Secrets/tokens/passwords | API keys, session tokens | Drop at write |
| Phi | Regulated health info | diagnosis, treatment codes | Tokenize / policy-gated |

Redaction rules per class are defined in Data Classification & Redaction Rules. This section focuses on where those classes apply.


Field → DataClass inventory (canonical AuditRecord)

| Path | Type | DataClass | Minimization posture |
| --- | --- | --- | --- |
| auditRecordId | ULID | Internal | Keep. |
| tenantId | token | Internal | Keep. |
| schemaVersion | string | Internal | Keep. |
| createdAt, observedAt, effectiveAt | timestamp | Internal | Round to ms; keep. |
| action | string | Public | Keep lowercased. |
| resource.type | string | Public | Keep PascalCase. |
| resource.id | string | Personal (pseudonymous) | Keep opaque id; never expand. |
| resource.path | string | Internal | Normalize JSON Pointer; keep. |
| actor.id | string | Personal (pseudonymous) | Keep opaque id. |
| actor.type | enum | Public | Keep. |
| actor.display | string | Personal | Partial mask or hash (UI default: mask). |
| decision.outcome | enum | Internal | Keep. |
| decision.reason | string | Internal | Truncate ≤256. |
| correlation.traceId | hex32 | Internal | Keep lowercase. |
| correlation.requestId | string | Internal | Keep trimmed. |
| correlation.causationId | ULID | Internal | Keep. |
| producer | string | Internal | Keep. |
| attributes["client.ip"] | ip | Personal | IPv4/6 canonical; mask on read (e.g., /24, /64). |
| attributes["server.ip"] | ip | Internal | Keep canonical. |
| attributes["client.userAgent"] | string | Personal | Truncate 256; mask on read (suffix elide). |
| attributes["email"] / *.email | string | Personal | Store hash(email); optionally store masked a***z@d***.tld. |
| attributes["phone"] / *.phone | string | Personal | Normalize E.164; store masked; hash full value. |
| attributes["geo.lat"] / attributes["geo.lon"] | number | Sensitive | Quantize; tokenize or drop if policy forbids. |
| attributes["secret"] / ["password"] / ["token"] / ["apiKey"] | string | Credential | Drop at write; keep a hash-of-hash fingerprint optionally. |
| delta.fields[*].before/after | scalar | Class of field | Enforce per-field class; large strings redact or tokenize. |
| integrity.* | proof | Internal | Keep (sidecar) once sealed. |

Attribute classification uses key heuristics (see below) plus tenant policy overrides.


Attribute classification heuristics

When attributes are free-form, apply pattern-based classification at ingress:

Pattern DataClass Rule
(?i)\b(email|e-mail)\b Personal Store SHA-256(email) + masked printable (e.g., f***@example.com).
(?i)\b(phone|mobile|msisdn)\b Personal Normalize E.164; store masked + hash.
(?i)\b(ssn|nin|national[_-]?id)\b Sensitive Tokenize (format-preserving if needed) or drop.
(?i)\b(password|secret|api[_-]?key|token|credential|bearer)\b Credential Drop value; keep indicator present=true.
(?i)\b(ip|client\.ip|remote[_-]?addr)\b Personal Canonicalize; mask on read.
(?i)\b(gps|geo\.(lat|lon)|location)\b Sensitive Quantize (e.g., 2 decimals) + tokenize or drop.
(?i)\b(name|first[_-]?name|last[_-]?name|full[_-]?name)\b Personal Mask on read; optional hash.

Tenants can override via a classificationOverrides map (key → DataClass), versioned in policy.
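
One way the heuristics and the override map could fit together is sketched below. The regexes are copied from the table above and the flag values from the DataClassFlags definition earlier in this section. Everything else is an assumption for illustration: the AttributeClassifier name, checking the most restrictive class first, letting overrides win over patterns, and defaulting unmatched keys to Internal.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

[Flags]
public enum DataClassFlags { None = 0, Public = 1, Internal = 2, Personal = 4, Sensitive = 8, Credential = 16, Phi = 32 }

public static class AttributeClassifier
{
    // Ordered from most to least restrictive (assumed precedence).
    private static readonly (Regex Pattern, DataClassFlags Class)[] Rules =
    {
        (new Regex(@"(?i)\b(password|secret|api[_-]?key|token|credential|bearer)\b"), DataClassFlags.Credential),
        (new Regex(@"(?i)\b(ssn|nin|national[_-]?id)\b"), DataClassFlags.Sensitive),
        (new Regex(@"(?i)\b(gps|geo\.(lat|lon)|location)\b"), DataClassFlags.Sensitive),
        (new Regex(@"(?i)\b(email|e-mail)\b"), DataClassFlags.Personal),
        (new Regex(@"(?i)\b(phone|mobile|msisdn)\b"), DataClassFlags.Personal),
        (new Regex(@"(?i)\b(ip|client\.ip|remote[_-]?addr)\b"), DataClassFlags.Personal),
        (new Regex(@"(?i)\b(name|first[_-]?name|last[_-]?name|full[_-]?name)\b"), DataClassFlags.Personal),
    };

    public static DataClassFlags Classify(
        string key, IReadOnlyDictionary<string, DataClassFlags>? overrides = null)
    {
        // Tenant overrides take priority over pattern matches.
        if (overrides != null && overrides.TryGetValue(key, out var forced)) return forced;

        foreach (var (pattern, dataClass) in Rules)
            if (pattern.IsMatch(key)) return dataClass;

        return DataClassFlags.Internal; // assumed default for unmatched keys
    }

    // Union of classes across keys, as stored in DataClassFlags on projections.
    public static DataClassFlags Union(IEnumerable<string> keys)
    {
        var flags = DataClassFlags.None;
        foreach (var key in keys) flags |= Classify(key);
        return flags;
    }
}
```

Because \b treats the underscore as a word character, a key such as client_email does not match the email pattern, while user.email does. Tenants relying on underscore-separated keys would need an override or an adjusted pattern.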


Default redaction on write

Apply these write-time transformations before storage/hashing:

| DataClass | Write action |
| --- | --- |
| Credential | Drop value; store {present:true} and optional sha256(sha256(value)) fingerprint. |
| Sensitive | Tokenize or hash (irreversible) per tenant policy; optional bucketing (e.g., age bands). |
| Personal | Store value if necessary for audit, but also store masked variant for read; hash common identifiers. |
| Internal/Public | Keep as-is (normalized). |
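
The hashing steps in this table can be sketched as follows, assuming .NET 5 or later (for SHA256.HashData and Convert.ToHexString). WriteTimeRedaction is a hypothetical name. The "sha256:" prefix and lowercase hex follow the stored-JSON example later in this section. Applying the outer hash to the raw digest bytes rather than to the hex text is an assumption.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class WriteTimeRedaction
{
    // Hash a common identifier (e.g. an email) into a joinable surrogate.
    public static string HashIdentifier(string value) =>
        "sha256:" + Convert.ToHexString(
            SHA256.HashData(Encoding.UTF8.GetBytes(value))).ToLowerInvariant();

    // sha256(sha256(value)) fingerprint for a dropped credential.
    public static string CredentialFingerprint(string secret)
    {
        byte[] inner = SHA256.HashData(Encoding.UTF8.GetBytes(secret));
        byte[] outer = SHA256.HashData(inner); // assumption: hash the raw digest bytes
        return "sha256:" + Convert.ToHexString(outer).ToLowerInvariant();
    }
}
```

A plain SHA-256 of a low-entropy identifier such as an email address can be reversed by dictionary lookup, so a keyed hash (for example HMAC with a per-tenant key) may be a better fit where that risk matters.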

Masking on read (role × purpose)

Server-side readers must apply a masking profile derived from role and purpose-of-use.

| Profile | Intended users | Personal | Sensitive | Credential | Phi |
| --- | --- | --- | --- | --- | --- |
| Safe (default) | Console users, search | Mask (email a***z@e***.com, IP /24 or /64) | Tokenized | Omit | Tokenized |
| Support | Support tickets | Mask | Omit | Omit | Omit |
| Investigator (JIT) | Security/IR with approval | Unmask (JIT logged) | Tokenized | Omit | Tokenized |
| Raw (policy-gated) | Legal export with basis | Unmask | Unmask per DPA | Omit | Unmask if lawful |

All unmask operations are just-in-time (JIT), time-bound, and audited (who, when, purpose, scope).


API contract (hint)

Reads accept an optional Redaction header or query parameter:

Redaction: profile=Safe|Support|Investigator|Raw; purpose="Incident #1234"; expiry=2025-10-31T23:59:59Z

Server may downgrade profile based on tenant policy and user role.


JSON: classification hints (partials)

Add optional hints to AuditRecord for explicit tagging and masking provenance:

{
  "classification": {
    "record": ["Internal","Personal"],
    "fields": {
      "actor.display": ["Personal"],
      "attributes.client.ip": ["Personal"],
      "attributes.email": ["Personal"]
    },
    "policyRef": { "id": "policy-default", "revision": 3 }
  },
  "redaction": {
    "planId": "policy-default",
    "appliedAt": "2025-10-22T12:00:00Z",
    "profile": "Safe"
  }
}

C# helpers (masking)

public enum RedactionProfile { Safe, Support, Investigator, Raw }

public static class Mask
{
    public static string Email(string email, RedactionProfile p) =>
        p switch {
            RedactionProfile.Safe or RedactionProfile.Support => MaskEmail(email),
            RedactionProfile.Investigator or RedactionProfile.Raw => email,
            _ => MaskEmail(email)
        };

    public static string Ip(string ip, RedactionProfile p)
    {
        if (p == RedactionProfile.Raw || p == RedactionProfile.Investigator) return ip;
        if (System.Net.IPAddress.TryParse(ip, out var addr))
        {
            if (addr.AddressFamily == System.Net.Sockets.AddressFamily.InterNetwork)
            {
                var oct = addr.ToString().Split('.');
                return $"{oct[0]}.{oct[1]}.{oct[2]}.0/24";
            }
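            // IPv6 /64: zero the low 64 bits on the raw bytes. Splitting the text form
            // on ':' breaks for compressed addresses such as 2001:db8::1.
            var bytes = addr.GetAddressBytes();
            System.Array.Clear(bytes, 8, 8);
            return $"{new System.Net.IPAddress(bytes)}/64";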
        }
        return ip;
    }

    private static string MaskEmail(string email)
    {
        var parts = email.Split('@');
        if (parts.Length != 2) return email;
        string m(string s) => s.Length <= 2 ? new string('*', s.Length) :
                              $"{s[0]}{new string('*', s.Length - 2)}{s[^1]}";
        var local = m(parts[0]);
        var dom = parts[1].Split('.');
        var domainMasked = dom.Length >= 2 ? $"{m(dom[0])}.{dom[^1]}" : m(parts[1]);
        return $"{local}@{domainMasked}";
    }
}

Search & exports

  • Search index: index only masked or hashed forms for Personal/Sensitive; never index Credential.
  • Exports: honor the job’s redactionPlan. For legal hold exports, default to Investigator or Raw only when lawful basis exists and is recorded in the job metadata.

SQL masking views (illustrative)

CREATE VIEW dbo.AuditEvents_Masked AS
SELECT
  TenantId, AuditRecordId, CreatedAt, ObservedAt, Action, ResourceType, ResourceId,
  CASE WHEN DataClassFlags & 4 = 4 THEN -- Personal
       CONCAT(LEFT(ActorId, 2), '***') ELSE ActorId END AS ActorId,
  DecisionOutcome, ChangedFields, DataClassFlags
FROM dbo.AuditEvents;

Use DB views for BI/reporting contexts that cannot call service-side maskers.


Examples

Stored authoritative JSON (after write-time minimization)

{
  "actor": { "id": "user_123", "type": "User", "display": "A. Smith" },
  "attributes": {
    "client.ip": "203.0.113.42",
    "client.userAgent": "Mozilla/5.0 ...",
    "email": "sha256:2c26b46b68ffc68ff99b453c1d304134..."
  },
  "resource": { "type": "Vetspire.Appointment", "id": "A-9981" },
  "action": "appointment.update",
  "schemaVersion": "auditrecord.v1"
}

Read (profile=Safe)

{
  "actor": { "id": "user_123", "type": "User", "display": "A***h" },
  "attributes": {
    "client.ip": "203.0.113.0/24",
    "client.userAgent": "Mozilla/5.0 …(masked)",
    "email.masked": "a***h@e***.com"
  }
}

Validation rules (summary)

  • Credential values are never persisted; reject or drop with redactionHint="Dropped".
  • Sensitive values require tokenize/hash at write unless tenant policy allows storage with heightened controls.
  • Personal values default to mask on read; store hashed surrogates for join-free investigations.
  • All unmask operations require purpose, approver (if configured), expiry, and generate read-access audit events.
  • DataClassFlags on projections reflect the union of classes present on the record and its delta.

Data Lifecycle & States

Models the end-to-end lifecycle of an AuditRecord from append → accepted → projected → sealed → eligible → purged → exported. Defines clocks, transitions, and a durable lifecycle transition log.

JSON uses lowerCamel; C#/gRPC uses PascalCase; DB tables/columns use PascalCase. Times are ISO-8601 UTC with ms precision. “OnHold” is an overlay that blocks purge (see Legal Hold).


Overview

  • Authoritative write appends a canonical JSON payload (no integrity) and emits Accepted.
  • Projectors upsert query shapes and advance per-tenant watermarks.
  • Integrity Service batches leaves into segments/blocks and seals proofs (sidecar).
  • Retention Evaluator computes the earliest EligibleAt date respecting policy & legal holds.
  • Lifecycle performs purge once records are Eligible and not OnHold.
  • Export can occur anytime after Accepted; manifests reference integrity material when available.
  • All steps produce idempotent lifecycle transition entries.

Clocks

| Name | Source | Purpose | Rule |
| --- | --- | --- | --- |
| createdAt | Producer | Domain time of the event | ≤ now + 2m skew; ms precision. |
| observedAt | Platform | Ingress time | Set on accept. |
| sealedAt | Integrity | Time the block was sealed | From Integrity Service. |
| eligibleAt | Retention | Earliest purge date | From evaluator: policy × attributes × state. |
| purgedAt | Lifecycle | Authoritative deletion time | Set by lifecycle job. |
| exportedAt | Export | Package/manifests creation time | Per package. |
| projectedAt | Projector | Projection upsert time | Optional; usually implicit via checkpoints. |

State model

We model monotonic states plus overlays. A record may be exported multiple times, and OnHold can toggle independently.

stateDiagram-v2
  [*] --> Appended: Ingest Append
  Appended --> Accepted: Durable write
  Accepted --> Projected: Projectors upsert rows
  Projected --> Sealed: Integrity sealed (sidecar)
  Sealed --> Eligible: Retention evaluator computes date reached
  Accepted --> Eligible: (path when sealing disabled)
  Eligible --> Purged: Lifecycle purge
  Accepted --> Exported: Export produces package(s)
  Projected --> Exported
  Sealed --> Exported
  Exported --> Exported: Subsequent exports
  note right of Eligible: Blocked by OnHold overlay
  state OnHold <<choice>>

Derived “current state” (live query):

  • Purged if not found in authoritative store and a Purged transition exists.
  • Else Eligible if now ≥ eligibleAt and no active holds.
  • Else Sealed if RecordIntegrity exists.
  • Else Projected if checkpoints ≥ record.
  • Else Accepted once durable.
  • Appended is transient (pre-commit).
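
The precedence of these checks can be written as a small resolver. This is a sketch under stated assumptions: LifecycleFacts and LifecycleStateResolver are hypothetical names, and each boolean stands in for a lookup (authoritative store, transition log, RecordIntegrity sidecar, projector checkpoints) that the real service would perform.

```csharp
using System;

public enum LifecycleState { Appended, Accepted, Projected, Sealed, Eligible, Purged }

// Hypothetical snapshot of the facts the checks rely on.
public sealed record LifecycleFacts(
    bool FoundInStore,
    bool HasPurgedTransition,
    DateTimeOffset? EligibleAt,
    bool HasActiveHold,
    bool HasRecordIntegrity,
    bool CheckpointCoversRecord,
    bool IsDurable);

public static class LifecycleStateResolver
{
    public static LifecycleState Derive(LifecycleFacts f, DateTimeOffset nowUtc)
    {
        // Checks run from the most advanced state down; the first match wins.
        if (!f.FoundInStore && f.HasPurgedTransition) return LifecycleState.Purged;
        if (f.EligibleAt is { } eligibleAt && nowUtc >= eligibleAt && !f.HasActiveHold)
            return LifecycleState.Eligible;
        if (f.HasRecordIntegrity) return LifecycleState.Sealed;
        if (f.CheckpointCoversRecord) return LifecycleState.Projected;
        return f.IsDurable ? LifecycleState.Accepted : LifecycleState.Appended;
    }
}
```

With an active hold, a sealed record whose eligibleAt has passed still resolves to Sealed, which is how the OnHold overlay blocks purge without becoming a state of its own.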

Lifecycle transitions

Transitions are recorded in a durable log (append-only). Each transition is idempotent (same key → same effect).

| Event | When | Required fields (JSON) |
| --- | --- | --- |
| Accepted | After authoritative insert | auditRecordId, observedAt |
| Projected | After all required projections upsert | auditRecordId, projectedAt, projections |
| Sealed | After integrity proof computed | auditRecordId, sealedAt, blockId, segmentId, leafHash |
| EligibleComputed | When evaluator computes date | auditRecordId, eligibleAt, policyId, policyRevision |
| OnHoldApplied | When a hold starts matching | auditRecordId, holdId, placedAt |
| OnHoldReleased | When the last matching hold ends | auditRecordId, holdId, releasedAt |
| Purged | After successful authoritative delete | auditRecordId, purgedAt, reason |
| Exported | Per package including the record | auditRecordId, exportedAt, jobId, packageId, manifestUri |

OnHold is modeled as entries; “currently on hold” is computed as (applied − released) across matching holds.
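
A minimal sketch of that computation follows, assuming the caller supplies the record's hold transitions ordered by At as (Kind, HoldId) pairs. HoldOverlay is a hypothetical name and the tuple shape is illustrative rather than the stored row format.

```csharp
using System;
using System.Collections.Generic;

public static class HoldOverlay
{
    // Replays OnHoldApplied/OnHoldReleased entries and reports whether any hold is still open.
    public static bool IsOnHold(IEnumerable<(string Kind, string HoldId)> transitionsOrderedByAt)
    {
        var active = new HashSet<string>(StringComparer.Ordinal);
        foreach (var (kind, holdId) in transitionsOrderedByAt)
        {
            if (kind == "OnHoldApplied") active.Add(holdId);           // duplicate applies are no-ops
            else if (kind == "OnHoldReleased") active.Remove(holdId);  // release closes only its own hold
        }
        return active.Count > 0;
    }
}
```

Tracking hold ids in a set, rather than counting applies minus releases, keeps the result stable when the same transition is replayed.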


JSON Schemas (v1)

lifecycle-transition.v1.json

{
  "$id": "urn:connectsoft:schemas/lifecycle/lifecycle-transition.v1.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "LifecycleTransition",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "tenantId": { "type": "string", "pattern": "^[A-Za-z0-9._-]{1,128}$" },
    "eventId":  { "type": "string", "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$" },
    "auditRecordId": { "type": "string", "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$" },
    "kind": { "type": "string", "enum": ["Accepted","Projected","Sealed","EligibleComputed","OnHoldApplied","OnHoldReleased","Purged","Exported"] },
    "at": { "type": "string", "format": "date-time" },
    "traceId": { "type": "string", "pattern": "^[a-f0-9]{32}$" },
    "producer": { "type": "string", "maxLength": 64 },
    "data": { "type": "object", "additionalProperties": true, "description": "Kind-specific payload" }
  },
  "required": ["tenantId","eventId","auditRecordId","kind","at"]
}

Kind-specific data payloads

  • Projected: { "projections": ["AuditEvents","ResourceEvents"], "projectedAt": ts }
  • Sealed: { "blockId": ULID, "segmentId": ULID, "leafHash": "hex64" }
  • EligibleComputed: { "eligibleAt": ts, "policyId": "id", "policyRevision": 3, "basis": "RuleName" }
  • Purged: { "reason": "Retention|GDPR.Request|Admin.Purge", "jobId": ULID? }
  • Exported: { "jobId": ULID, "packageId": ULID, "manifestUri": "s3://…", "exportedAt": ts }

C# (gRPC code-first)

[DataContract]
public sealed class LifecycleTransition
{
    [DataMember(Order = 1)]  public string TenantId { get; init; } = default!;
    [DataMember(Order = 2)]  public string EventId { get; init; } = default!;       // ULID
    [DataMember(Order = 3)]  public string AuditRecordId { get; init; } = default!; // ULID
    [DataMember(Order = 4)]  public string Kind { get; init; } = default!;          // enum name
    [DataMember(Order = 5)]  public DateTimeOffset At { get; init; }
    [DataMember(Order = 6)]  public string? TraceId { get; init; }                  // hex32
    [DataMember(Order = 7)]  public string? Producer { get; init; }
    [DataMember(Order = 8)]  public Dictionary<string, object>? Data { get; init; }
}

public static class LifecycleKinds
{
    public const string Accepted = "Accepted";
    public const string Projected = "Projected";
    public const string Sealed = "Sealed";
    public const string EligibleComputed = "EligibleComputed";
    public const string OnHoldApplied = "OnHoldApplied";
    public const string OnHoldReleased = "OnHoldReleased";
    public const string Purged = "Purged";
    public const string Exported = "Exported";
}

Storage mapping (SQL)

LifecycleTransitions (append-only, per tenant × record)

CREATE TABLE dbo.LifecycleTransitions (
  EventId        CHAR(26)      NOT NULL,  -- ULID
  TenantId       NVARCHAR(128) NOT NULL,
  AuditRecordId  CHAR(26)      NOT NULL,
  Kind           NVARCHAR(32)  NOT NULL,
  At             DATETIME2(3)  NOT NULL,
  TraceId        CHAR(32)      NULL,
  Producer       NVARCHAR(64)  NULL,
  DataJson       NVARCHAR(MAX) NULL,

  CONSTRAINT PK_LifecycleTransitions PRIMARY KEY (EventId),
  INDEX IX_Life_Tenant_Record_At (TenantId, AuditRecordId, At),
  INDEX IX_Life_Tenant_Kind_At (TenantId, Kind, At DESC)
);

The lifecycle log outlives the authoritative row by policy (e.g., 2 years) and contains no raw PII beyond opaque ids.


Eligibility & TTLs

Evaluator inputs

  • Record clocks (createdAt, effectiveAt), action, resource.type, attributes, tenant policy (RetentionPolicy), and active holds.

Outputs

  • eligibleAt timestamp and basis (rule id). Recomputed on policy change or hold changes.

Typical SLOs & TTLs

| Stage | SLO target | TTL / cadence |
| --- | --- | --- |
| Projected | p95 < 60s from observedAt | Continuous |
| Sealed | p95 < 10m or block size threshold | Batching |
| EligibleComputed | within 24h of write or policy change | Daily evaluator sweep |
| Purged | within 7d of now ≥ eligibleAt (no holds) | Daily lifecycle |
| Lifecycle log retention | ≥ 2y post-purge (configurable) | Separate policy |

When includeIntegrity=false, the path Accepted → Eligible remains valid; Sealed is optional.


Current state API (hint)

GET /tenants/{tenantId}/records/{auditRecordId}/lifecycle

Response shape:

{
  "state": "Sealed",
  "onHold": true,
  "clocks": {
    "createdAt": "2025-10-22T12:00:03.100Z",
    "observedAt": "2025-10-22T12:00:03.300Z",
    "sealedAt": "2025-10-22T12:06:00.000Z",
    "eligibleAt": "2026-10-22T00:00:00.000Z",
    "purgedAt": null
  },
  "transitions": [
    { "kind": "Accepted", "at": "2025-10-22T12:00:03.300Z" },
    { "kind": "Projected", "at": "2025-10-22T12:00:45.020Z", "data": { "projections": ["AuditEvents"] } },
    { "kind": "Sealed", "at": "2025-10-22T12:06:00.000Z", "data": { "blockId": "01JE7E0...", "segmentId": "01JE7DZ..." } },
    { "kind": "EligibleComputed", "at": "2025-10-23T00:05:00.000Z", "data": { "eligibleAt": "2026-10-22T00:00:00Z", "policyId": "ret-std", "policyRevision": 3 } },
    { "kind": "OnHoldApplied", "at": "2025-11-01T09:00:00.000Z", "data": { "holdId": "01JEAH..." } }
  ]
}

Transition sources (wiring)

  • Accepted: write path completion; emits AuditRecord.Accepted event.
  • Projected: projector completion; may be inferred from Projection.Updated.
  • Sealed: from Integrity.ProofComputed (match by AuditRecordId via segment membership).
  • EligibleComputed: Retention Service evaluation.
  • OnHold*: Legal Hold matcher.
  • Purged: Lifecycle job after successful delete.
  • Exported: Export job on package completion.

All sources produce a LifecycleTransition row with traceId and producer set.


Edge cases & rules

  • Late arrivals/backfills: State reconstruction relies on log order by At; idempotency on (EventId) ensures safe replays.
  • Hold toggling: OnHoldApplied/OnHoldReleased can bracket any state; Eligible does not imply Purge until no holds remain.
  • Export pre-seal: Allowed; manifest may omit integrity bundle. Later exports SHOULD include it once sealed.
  • Purge vs Export: Purge never deletes lifecycle log or export manifests; it deletes authoritative payload and projections only.
  • Rebuilds: If the lifecycle log is lost, recompute from authoritative store + sidecars + events; reseal times cannot be reconstructed, so keep them as null with a basis="Rebuilt" note.

Validation rules (summary)

  • Lifecycle transitions must be append-only; block UPDATE/DELETE.
  • At must be monotonic non-decreasing per (TenantId, AuditRecordId, Kind); ties allowed across kinds.
  • Purged requires record absence in authoritative store.
  • EligibleComputed.eligibleAt must be ≥ observedAt and derived from an existing policy revision.
  • Lifecycle log must contain no raw PII beyond opaque identifiers.

Performance & Size Budgets

Establishes size ceilings, throughput targets, compaction windows, and tiering to keep ingestion smooth, search snappy, and costs predictable.

JSON uses lowerCamel; C#/gRPC uses PascalCase; DB tables/columns use PascalCase. Sizes use binary units (KiB, MiB, GiB). Values are defaults—tenants/editions may override within safe bounds.


Quick budgets (at a glance)

| Area | Target / Ceiling | Notes |
| --- | --- | --- |
| Authoritative write (PayloadBytes) | ≤ 256 KiB (hard) | Reject > 256 KiB at ingress. See Validation & Canonicalization. |
| AuditEvents row | ≤ 512 B typical | Derived, compact; no large blobs. |
| Search doc | ≤ 8 KiB | Redacted & compact fields only. |
| Projection lag (p95) | < 60 s | From ObservedAt → visible in projections. |
| Integrity seal lag (p95) | < 10 min | Block/segment thresholds or time window. |
| Per-tenant ingest | burst ≤ 2,000 rps; sustained ≤ 500 rps | Edition/tier gates; per-shard back-pressure. |
| Global ingest | plan for ≥ 50k rps | Scale shards linearly; see shard ring. |
| Export package | target 512 MiB raw per package | Balanced for network & resume safety. |
| Search shard size | 20–50 GiB after merge | Per index rollover/ILM. |
| Hot data window | 7–30 days | Fast storage, high refresh. |
| Warm window | 30–180 days | Cheaper storage, slower refresh. |
| Cold/Archive | 6–84 months | Object storage snapshots / parquet exports. |

Authoritative store budgets

Row shape & bytes

  • AuditRecordRow.PayloadJson dominates size; keep envelopes lean.
  • Ceilings (hard):
    • PayloadBytes ≤ 262,144
    • Attributes ≤ 64 pairs (key ≤ 64, value ≤ 256)
    • Delta.fields ≤ 256 entries, values ≤ 1,024 chars each (post-redaction)

Indexes (minimal)

  • (TenantId, CreatedAt) – primary scan path
  • (TenantId, CorrelationTraceId) – OTel correlation
  • (TenantId, IdempotencyKey) (filtered unique) – dedupe

Selectivity guidance

  • Composite keys always start with TenantId for RLS pruning.
  • Avoid additional secondary indexes unless filter selectivity < 10% over tenant partitions.

Write amplification guardrails

  • Max 2 secondary indexes on the authoritative table.
  • Aim for < 1.5× WAL/redo amplification per insert.

Integrity service sizing

| Unit | Target | Trigger |
| --- | --- | --- |
| Segment | ~64 KiB–8 MiB of leaves or 4k records, whichever first | Start a new segment when either threshold is hit. |
| Block | ≤ 1,000,000 records or ≤ 10 min window | Seal and sign; emit Integrity.ProofComputed. |

  • Keep SealedAt jitter < 2 min to smooth proof availability.
  • Segment & block sizes are tunable per shard to meet p95 seal lag.

Projections (read models)

Tables

  • AuditEvents (primary), ResourceEvents, ActorEvents.

Storage budgets

  • Row ≤ 512 B typical; avoid wide JSON blobs.
  • ChangedFields ≤ 64 keys, string key ≤ 128 chars.

Indexes

  • (TenantId, CreatedAt DESC, AuditRecordId DESC) – universal seek
  • Per-resource and per-actor composites (see Read Models).

Checkpoint SLOs

  • Per projection × tenant checkpoint advances at least every 5 s under load.

Search index budgets

  • Doc size ≤ 8 KiB after analysis.
  • Refresh interval: 5 s (hot), 60 s (warm).
  • Primary shard size: 20–50 GiB post-merge.
  • Rollover: 30 GiB or 7 d (first wins).
  • ILM delete: ≤ authoritative retention for the tenant.

Selectivity hints

  • Always include tenantId as a must clause or alias filter.
  • Field-cardinality:
    • resourceType.kw high (1e2–1e4) ✅
    • action.kw medium (1e1–1e3) ✅
    • decisionOutcome low (3–4) – use as a filter, not sort.

Export sizing

  • Package raw bytes target: 512 MiB.
  • Compression: default Gzip → expect 3–6× reduction on JSONL.
  • Concurrency: up to 8 packages in-flight per job per shard.
  • Resume granularity: file offset checkpoints or package boundaries only.

Hot / Warm / Cold tiers

| Tier | What | Storage | Policy |
| --- | --- | --- | --- |
| Hot | Authoritative + projections for most-recent window | SSD / premium DB tier; search hot indices | Fast ingest; frequent compaction; refresh 5s |
| Warm | Older projections & search indices | General SSD; colder DB tier | Lower refresh; force-merge; fewer replicas |
| Cold | Historical snapshots | Object storage (WORM optional) | Parquet/JSONL exports; integrity bundles persisted |

Typical windows (suggested defaults)

  • Hot: 0–30 d, Warm: 30–180 d, Cold: > 180 d (subject to tenant retention).

Compaction & maintenance windows

  • Vacuum/Autovacuum (PG) / Index Rebuild (SQL Server): nightly maintenance window per shard (staggered).
  • Projection compaction: weekly CLUSTER/REINDEX (PG) or REORGANIZE (SQL Server) when bloat > 20%.
  • Search force-merge: warm phase to 1 segment per shard once write-complete.

Throughput & concurrency targets

| Dimension | Target | Notes |
| --- | --- | --- |
| Single shard ingest | 3k rps sustained | With 2 secondary indexes and WAL sync. |
| Latency (p95) | Ingest < 50 ms; Project < 60 s; Seal < 10 min | From ObservedAt. |
| Tenant burst | 2k rps for 60 s | Token bucket; smooth via queue. |
| Exporter | 200 MiB/s per shard (network bound) | Multi-part uploads. |

Sizing heuristic

  • Records/day = rps * 86,400.
  • Authoritative storage/day ≈ avgPayloadBytes * records/day * replicationFactor.
  • Ensure shard disks stay < 70% full at monthly peak.
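
The heuristic is easy to get wrong by mixing decimal and binary units, so a small calculator may help. This sketch reports TiB, since this section uses binary units. CapacityPlan and its method names are hypothetical.

```csharp
using System;

public static class CapacityPlan
{
    private const double BytesPerTiB = 1024d * 1024 * 1024 * 1024;

    // Records/day = rps * 86,400.
    public static double RecordsPerDay(double rps) => rps * 86_400;

    // Authoritative storage/day in TiB = avgPayloadBytes * records/day * replicationFactor.
    public static double StorageTiBPerDay(double avgPayloadBytes, double rps, int replicationFactor) =>
        avgPayloadBytes * RecordsPerDay(rps) * replicationFactor / BytesPerTiB;

    // Total for a retention window, before compaction.
    public static double WindowTiB(double tibPerDay, int days) => tibPerDay * days;
}
```

For example, CapacityPlan.StorageTiBPerDay(2048, 20_000, 3) evaluates to roughly 9.66 TiB/day, which is about 10.6 TB in decimal units.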

Back-pressure & throttling

Ingress token bucket (per tenant)

  • Capacity = burstRps * 60.
  • Fill rate = sustainedRps.
  • HTTP 429/Problem+JSON when empty; include retryAfter.
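
A per-tenant bucket with these parameters might look like the sketch below. TenantTokenBucket is a hypothetical name. Assumptions: each request costs one token, the bucket starts full, and the caller passes the current time and serializes access, because this version is not thread-safe.

```csharp
using System;

public sealed class TenantTokenBucket
{
    private readonly double _capacity;      // burstRps * 60
    private readonly double _fillPerSecond; // sustainedRps
    private double _tokens;
    private DateTimeOffset _last;

    public TenantTokenBucket(int burstRps, int sustainedRps, DateTimeOffset nowUtc)
    {
        if (burstRps <= 0 || sustainedRps <= 0)
            throw new ArgumentOutOfRangeException(nameof(sustainedRps), "Rates must be positive.");
        _capacity = burstRps * 60d;
        _fillPerSecond = sustainedRps;
        _tokens = _capacity; // assumption: start full
        _last = nowUtc;
    }

    // Returns true when the request is admitted; otherwise retryAfter says
    // how long until one token is available (for the 429 response).
    public bool TryAcquire(DateTimeOffset nowUtc, out TimeSpan retryAfter)
    {
        double elapsed = (nowUtc - _last).TotalSeconds;
        if (elapsed > 0)
        {
            _tokens = Math.Min(_capacity, _tokens + elapsed * _fillPerSecond);
            _last = nowUtc;
        }

        if (_tokens >= 1)
        {
            _tokens -= 1;
            retryAfter = TimeSpan.Zero;
            return true;
        }

        retryAfter = TimeSpan.FromSeconds((1 - _tokens) / _fillPerSecond);
        return false;
    }
}
```

With the default limits (burst 2,000 rps, sustained 500 rps) the bucket holds 120,000 tokens, which matches a 2k rps burst lasting 60 s.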

System-wide pressure signals (any trips → slow producers)

  • WAL/transaction log > 80% of burst buffer.
  • DB queue depth p95 > 200 ms for 1 min.
  • CPU > 75% for 5 min on ingest nodes.
  • Disk IO latency > 20 ms p95.
  • Search indexing backlog > 15 min behind.

Shedding order

  1. Defer/slow export workers.
  2. Reduce search refresh rate (hot from 5s → 30s).
  3. Throttle tenant bursts (429).
  4. Pause non-critical projectors (actor/resource timelines) before AuditEvents.

Example capacity plan (rule-of-thumb)

Assume avgPayload = 2 KiB, global 20k rps, replication x3:

  • Authoritative/day ≈ 2 KiB × 20k × 86,400 × 3 ≈ 9.7 TiB/day.
  • With 30-day hot window → ~290 TiB hot-tier (pre-compaction).
  • Search docs (~1.2 KiB/doc) × 20k rps → ~1.9 TiB/day per copy before merges (~5.8 TiB/day if replicated x3); warm merge reduces by ~30–40%.

Shards: target ≤ 8 TiB hot data/shard → need ~37 shards for the hot tier (~40 with headroom). Scale projectors & integrity workers per shard.


Monitoring & alerts (key SLOs)

  • Ingest: p95 < 50 ms; error rate < 0.1%.
  • Projection lag: p95 < 60 s; 99.9% < 5 min.
  • Seal lag: p95 < 10 min.
  • Search freshness: hot alias max age < 2 min.
  • Export: time-to-first-package < 2 min; throughput ≥ 100 MiB/s/job.

Alert when:

  • Any SLO violated for 5 consecutive minutes.
  • Disk utilization > 80% or predicted > 90% within 7 days.
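The two alert conditions can be sketched as follows. Both helpers are hypothetical; the disk prediction assumes simple linear growth, which the document does not prescribe, and utilization is expressed as a fraction of capacity.

```python
def violated_for_consecutive(minutes_ok: list, window: int = 5) -> bool:
    """True when the most recent `window` per-minute SLO checks all failed.

    `minutes_ok` holds one boolean per minute, oldest first (True = SLO met).
    """
    if len(minutes_ok) < window:
        return False
    return not any(minutes_ok[-window:])


def disk_alert(current: float, daily_growth: float, horizon_days: int = 7) -> bool:
    """Alert when utilization is above 80% now or predicted above 90% within the horizon."""
    predicted = current + daily_growth * horizon_days  # assumption: linear extrapolation
    return current > 0.80 or predicted > 0.90
```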

Validation rules (summary)

  • Reject writes over 256 KiB; surface payload.tooLarge.
  • Enforce index minimalism on authoritative store (≤ 2 secondaries).
  • Search docs over 8 KiB are dropped and queued for a reprocessor retry after additional redaction.
  • Integrity uses configured segment/block thresholds; seal window must keep p95 < 10 min.
  • Export packages respect 512 MiB target and resume tokens; per-tenant concurrency caps apply.
  • Back-pressure must prefer fairness: no single tenant can starve others.

Fixtures, Samples & Test Data

This section provides golden artifacts and repeatable generators for developers, CI, and integration partners. The artifacts cover authoritative writes, projections, search docs, events, and exports; all are validated against the Schema Registry and produced with deterministic seeds.

JSON uses lowerCamel; C#/gRPC uses PascalCase; tables/columns use PascalCase. Payloads follow canonical JSON (JCS/RFC 8785) where noted.


Principles

  • Deterministic: identical inputs → identical outputs (fixed RNG seed, fixed clock anchors).
  • Tenant-scoped: fixtures always set tenantId; RLS-safe.
  • Redaction-aware: no raw secrets; Personal/Sensitive fields are masked, hashed, or tokenized per policy.
  • Minimal & real-ish: small enough to grok; realistic enough to catch edge cases (delta, holds, idempotency).
  • Cross-form parity: JSONL ↔ projections ↔ events ↔ search docs ↔ exports represent the same facts.

Directory layout

/fixtures
  /schemas
    json/…           # JSON Schemas (registry-resolved copies)
    avro/…           # Avro equivalents (subset)
    proto/…          # Optional .proto (published language)
  /authoritative
    minimal-10.jsonl
    delta-redaction-50.jsonl
    hotday-1k.jsonl
    idempotency-dupes-5.jsonl
  /projections
    audit-events-*.csv
    resource-events-*.csv
  /search
    docs-*.jsonl
  /events
    appended-*.jsonl
    accepted-*.jsonl
  /sql
    postgres-seed.sql
    sqlserver-seed.sql
  /exports
    manifest-sample.json
    package-README.md

File naming: *-v{schemaVersion}-{yyyymmdd}.(json|jsonl|csv|sql) when applicable.


Authoritative JSONL (golden)

/fixtures/authoritative/minimal-10.jsonl

{"tenantId":"splootvets","schemaVersion":"auditrecord.v1","auditRecordId":"01JE7K4J9F9D0S6E7X5Q1A3BCP","createdAt":"2025-10-22T12:00:03.100Z","observedAt":"2025-10-22T12:00:03.300Z","action":"user.create","resource":{"type":"Iam.User","id":"U-1001"},"actor":{"id":"svc_gw","type":"Service","display":"ingress-gw"},"attributes":{"client.ip":"203.0.113.42","client.userAgent":"Mozilla/5.0"}}
{"tenantId":"splootvets","schemaVersion":"auditrecord.v1","auditRecordId":"01JE7K4JAMJQ2B1NE8V3V7Y5ND","createdAt":"2025-10-22T12:01:00.000Z","observedAt":"2025-10-22T12:01:00.120Z","action":"appointment.update","resource":{"type":"Vetspire.Appointment","id":"A-9981","path":"/status"},"actor":{"id":"user_123","type":"User","display":"A. Smith"},"decision":{"outcome":"Allow"},"delta":{"fields":{"status":{"before":"Pending","after":"Booked"}}},"attributes":{"email":"sha256:2c26b46b68ffc68ff99b453c1d304134","client.ip":"2001:db8::1"}}

/fixtures/authoritative/delta-redaction-50.jsonl contains these variations:

  • credential-like keys (dropped at write, redactionHint=Dropped)
  • large strings (truncated), base64 caps, path normalization cases
  • correlation with traceId, requestId, causation chains

/fixtures/authoritative/idempotency-dupes-5.jsonl repeats the same record with the same idempotencyKey to validate dedupe behavior.
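A test against the idempotency fixture might flag repeats like this. It is a sketch under two assumptions that the document does not state: the dedupe key is (tenantId, idempotencyKey), and records without an idempotencyKey are never treated as duplicates. The function name is hypothetical.

```python
def mark_duplicates(records):
    """Return [(record, is_duplicate)] in input order.

    Assumption: uniqueness is scoped to (tenantId, idempotencyKey).
    """
    seen = set()
    results = []
    for rec in records:
        key = rec.get("idempotencyKey")
        if key is None:
            results.append((rec, False))   # nothing to dedupe on
            continue
        scoped = (rec["tenantId"], key)
        results.append((rec, scoped in seen))
        seen.add(scoped)
    return results
```

Run over the five-line fixture, the expectation would be one first-time write followed by four duplicates.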

Each JSONL line validates against urn:connectsoft:schemas/domain/auditrecord.v1.json. Timestamps use ms precision.


Projections (derived CSV)

/fixtures/projections/audit-events-minimal-10.csv

TenantId,AuditRecordId,CreatedAt,ObservedAt,Action,ResourceType,ResourceId,ActorId,ActorType,DecisionOutcome,ChangedFields,DataClassFlags,CorrelationTraceId,PayloadBytes
splootvets,01JE7K4J9F9D0S6E7X5Q1A3BCP,2025-10-22T12:00:03.100Z,2025-10-22T12:00:03.300Z,user.create,Iam.User,U-1001,svc_gw,Service,,[],2,,512
splootvets,01JE7K4JAMJQ2B1NE8V3V7Y5ND,2025-10-22T12:01:00.000Z,2025-10-22T12:01:00.120Z,appointment.update,Vetspire.Appointment,A-9981,user_123,User,Allow,["status"],12,,1536

/fixtures/projections/resource-events-*.csv and actor-events-*.csv include a monotonic Seq per key.
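The derivation from an authoritative line to a projection row can be sketched as below. The column subset, the Seq key (tenant, resource type, resource id), the starting value of 1, and the ordering by CreatedAt then AuditRecordId are all assumptions made for illustration; DataClassFlags and PayloadBytes are omitted because their derivation depends on policy and canonicalization.

```python
import json
from collections import defaultdict


def to_audit_event_row(record: dict) -> dict:
    """Flatten one authoritative record into AuditEvents-style columns (subset)."""
    delta_fields = (record.get("delta") or {}).get("fields", {})
    return {
        "TenantId": record["tenantId"],
        "AuditRecordId": record["auditRecordId"],
        "CreatedAt": record["createdAt"],
        "ObservedAt": record["observedAt"],
        "Action": record["action"],
        "ResourceType": record["resource"]["type"],
        "ResourceId": record["resource"]["id"],
        "ActorId": record["actor"]["id"],
        "ActorType": record["actor"]["type"],
        "DecisionOutcome": (record.get("decision") or {}).get("outcome", ""),
        "ChangedFields": json.dumps(list(delta_fields), separators=(",", ":")),
    }


def assign_seq(rows, key_columns=("TenantId", "ResourceType", "ResourceId")):
    """Attach a 1-based, strictly increasing Seq per key (hypothetical key choice)."""
    counters = defaultdict(int)
    out = []
    # Fixed-format UTC timestamps sort correctly as strings.
    for row in sorted(rows, key=lambda r: (r["CreatedAt"], r["AuditRecordId"])):
        key = tuple(row[c] for c in key_columns)
        counters[key] += 1
        out.append({**row, "Seq": counters[key]})
    return out
```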


Search docs (redacted JSONL)

/fixtures/search/docs-minimal-10.jsonl

{"tenantId":"splootvets","auditRecordId":"01JE7K4J9F9D0S6E7X5Q1A3BCP","createdAt":"2025-10-22T12:00:03.100Z","observedAt":"2025-10-22T12:00:03.300Z","action":"user.create","resourceType":"Iam.User","resourceId":"U-1001","actorId":"svc_gw","actorType":"Service","changedFields":[],"dataClassFlags":2,"searchText":"user.create Iam.User U-1001 svc_gw"}
{"tenantId":"splootvets","auditRecordId":"01JE7K4JAMJQ2B1NE8V3V7Y5ND","createdAt":"2025-10-22T12:01:00.000Z","observedAt":"2025-10-22T12:01:00.120Z","action":"appointment.update","resourceType":"Vetspire.Appointment","resourceId":"A-9981","actorId":"user_123","actorType":"User","decisionOutcome":"Allow","changedFields":["status"],"dataClassFlags":12,"searchText":"appointment.update Vetspire.Appointment A-9981 user_123 status Booked"}

Event streams (enveloped JSONL)

/fixtures/events/appended-minimal-10.jsonl (one per authoritative line)

{"eventId":"01JE7K7G7B0Q3E5M7Z8X9V1C2D","eventType":"connectsoft.audit.v1/AuditRecord.Appended","tenantId":"splootvets","publishedAt":"2025-10-22T12:00:03.350Z","traceId":"3e1f2d0c9b8a7f6e5d4c3b2a19081716","schemaVersion":"event-envelope.v1","producer":"ingress-gw/2.4.1","data":{"auditRecordId":"01JE7K4J9F9D0S6E7X5Q1A3BCP","createdAt":"2025-10-22T12:00:03.100Z","observedAt":"2025-10-22T12:00:03.300Z","action":"user.create","resourceType":"Iam.User","resourceId":"U-1001","actorId":"svc_gw","actorType":"Service","hasDelta":false,"dataClassFlags":2,"payloadBytes":512}}

/fixtures/events/accepted-minimal-10.jsonl mirrors Accepted acks with status.


SQL seeds

/fixtures/sql/postgres-seed.sql

-- RLS/session context assumed set earlier (see Tenancy)
INSERT INTO "AuditRecords"
("AuditRecordId","TenantId","CreatedAt","ObservedAt","EffectiveAt","Action","ResourceType","ResourceId","ResourcePath","ActorId","ActorType","CorrelationTraceId","CorrelationRequestId","DecisionOutcome","IdempotencyKey","SchemaVersion","PayloadJson","PayloadBytes")
VALUES
('01JE7K4J9F9D0S6E7X5Q1A3BCP','splootvets','2025-10-22T12:00:03.100Z','2025-10-22T12:00:03.300Z',NULL,'user.create','Iam.User','U-1001',NULL,'svc_gw',1,'3e1f2d0c9b8a7f6e5d4c3b2a19081716',NULL,NULL,NULL,1,'{"tenantId":"splootvets","schemaVersion":"auditrecord.v1","auditRecordId":"01JE7K4J9F9D0S6E7X5Q1A3BCP","createdAt":"2025-10-22T12:00:03.100Z","observedAt":"2025-10-22T12:00:03.300Z","action":"user.create","resource":{"type":"Iam.User","id":"U-1001"},"actor":{"id":"svc_gw","type":"Service","display":"ingress-gw"}}',512);

INSERT INTO "AuditEvents"
("TenantId","AuditRecordId","CreatedAt","ObservedAt","Action","ResourceType","ResourceId","ActorId","ActorType","DecisionOutcome","ChangedFields","DataClassFlags","CorrelationTraceId","PayloadBytes")
VALUES
('splootvets','01JE7K4J9F9D0S6E7X5Q1A3BCP','2025-10-22T12:00:03.100Z','2025-10-22T12:00:03.300Z','user.create','Iam.User','U-1001','svc_gw',1,NULL,'[]',2,'3e1f2d0c9b8a7f6e5d4c3b2a19081716',512);

/fixtures/sql/sqlserver-seed.sql provides equivalent INSERT statements with NVARCHAR types.


Schemas (JSON & Avro)

/fixtures/schemas/json/auditrecord.v1.json → registry-resolved copy (read-only).

/fixtures/schemas/avro/auditrecord.v1.avsc (subset for export pipelines):

{
  "type":"record","name":"AuditRecord","namespace":"connectsoft.domain.v1",
  "fields":[
    {"name":"tenantId","type":"string"},
    {"name":"auditRecordId","type":"string"},
    {"name":"schemaVersion","type":"string"},
    {"name":"createdAt","type":"string"},
    {"name":"observedAt","type":"string"},
    {"name":"action","type":"string"},
    {"name":"resource","type":{"type":"record","name":"Resource","fields":[
      {"name":"type","type":"string"},
      {"name":"id","type":"string"},
      {"name":"path","type":["null","string"],"default":null}
    ]}},
    {"name":"actor","type":{"type":"record","name":"Actor","fields":[
      {"name":"id","type":"string"},
      {"name":"type","type":"string"},
      {"name":"display","type":["null","string"],"default":null}
    ]}},
    {"name":"decision","type":["null",{"type":"record","name":"Decision","fields":[
      {"name":"outcome","type":"string"},
      {"name":"reason","type":["null","string"],"default":null}
    ]}],"default":null}
  ]
}

Export manifest sample

/fixtures/exports/manifest-sample.json

{
  "jobId":"01JE7M2F2N7QW8E9R0T1Y2U3I4",
  "tenantId":"splootvets",
  "createdAt":"2025-10-22T12:10:00Z",
  "format":"Jsonl",
  "packages":[
    {"packageId":"01JE7M2PAK0001","uri":"s3://bucket/us-central/splootvets/exports/2025/10/22/job-01JE7M2/part-0001.jsonl.gz","recordCount":500000,"sha256":"…"},
    {"packageId":"01JE7M2PAK0002","uri":"s3://bucket/us-central/splootvets/exports/2025/10/22/job-01JE7M2/part-0002.jsonl.gz","recordCount":120345,"sha256":"…"}
  ],
  "integrity":{"blockIds":["01JE7E0B5V2C6M9N3X7Z4K2J8L"],"signature":{"scheme":"Ed25519","signingKeyId":"kv:prod/atp-integrity/ed25519-2025-01"}}
}
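A consumer could sanity-check a manifest along these lines. The helper names are hypothetical, and the sketch assumes each package's sha256 is the lowercase hex digest of the stored (compressed) object bytes, which the sample does not confirm.

```python
import hashlib


def total_records(manifest: dict) -> int:
    """Sum recordCount across all packages in the manifest."""
    return sum(p["recordCount"] for p in manifest["packages"])


def sha256_hex(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()


def verify_package(package: dict, content: bytes) -> bool:
    """Compare downloaded bytes against the package's declared digest."""
    return sha256_hex(content) == package["sha256"].lower()
```

For the sample above, `total_records` would return 620345 (500000 + 120345).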

Generators (C#, deterministic)

/fixtures/Generators.cs (excerpt)

using System;
using System.Collections.Generic;
using System.Runtime.Serialization;

[DataContract] public sealed class AuditRecord { /* domain contract (PascalCase) */ }

public static class FixtureGen
{
    // Yields `count` records spaced 1 ms apart from `anchorUtc`; the same seed gives the same sequence.
    public static IEnumerable<AuditRecord> Minimal(string tenantId, DateTimeOffset anchorUtc, int count, int seed = 1337)
    {
        var rnd = new Random(seed);
        for (var i = 0; i < count; i++)
        {
            var createdAt = anchorUtc.AddMilliseconds(i);
            // NOTE: only the timestamp part of the ULID is pinned here. For byte-identical fixtures
            // the random part must also be derived from `rnd` (via the ULID library's seeded-randomness hook).
            var id = Ulid.NewUlid(createdAt);

            // Pick the record shape from the seeded RNG.
            var (verb, rtype, rid) = rnd.Next(2) == 0
                ? ("user.create", "Iam.User", $"U-{1000 + i}")
                : ("appointment.update", "Vetspire.Appointment", $"A-{9000 + i}");
            var isService = i % 3 == 0;

            yield return new AuditRecord {
                TenantId = tenantId,
                SchemaVersion = "auditrecord.v1",
                AuditRecordId = id.ToString(),
                CreatedAt = createdAt,
                ObservedAt = createdAt.AddMilliseconds(200), // ObservedAt trails CreatedAt by a fixed 200 ms
                Action = verb,
                Resource = new() { Type = rtype, Id = rid },
                Actor = new() { Id = isService ? "svc_gw" : "user_123", Type = isService ? "Service" : "User", Display = isService ? "ingress-gw" : "A. Smith" },
                Decision = i % 2 == 0 ? new() { Outcome = "Allow" } : null
            };
        }
    }
}

Emit JSONL canonically

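// Requires: using System.Text; using System.Text.Json; using System.Text.Json.Serialization;
var options = new JsonSerializerOptions
{
    PropertyNamingPolicy = JsonNamingPolicy.CamelCase,
    DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull, // omit absent members such as decision
    WriteIndented = false
};
// UTF-8 without BOM and fixed "\n" line endings keep the output byte-identical across platforms.
await using var w = new StreamWriter("authoritative/minimal-10.jsonl", false, new UTF8Encoding(false)) { NewLine = "\n" };
foreach (var r in FixtureGen.Minimal("splootvets", DateTimeOffset.Parse("2025-10-22T12:00:00Z"), 10))
{
    await w.WriteLineAsync(JsonSerializer.Serialize(r, options));
}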

Compute PayloadBytes as the UTF-8 byte length of the canonical JSON. Integrity hashing uses the JCS form of that JSON.
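As a rough cross-check outside the C# tooling, PayloadBytes can be approximated with the standard library. This is not a full JCS (RFC 8785) implementation: sorted keys and compact separators match JCS for payloads made of strings, integers, booleans, nulls, and nested objects or arrays, but floating-point number formatting and key ordering for characters outside the Basic Multilingual Plane can differ. The function names are hypothetical.

```python
import json


def canonical_json(obj) -> str:
    """Approximate JCS: sorted keys, no insignificant whitespace, raw UTF-8 text."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def payload_bytes(obj) -> int:
    """UTF-8 byte length of the canonical form."""
    return len(canonical_json(obj).encode("utf-8"))


if __name__ == "__main__":
    sample = {"b": 1, "a": "é"}
    print(canonical_json(sample))   # {"a":"é","b":1}
    print(payload_bytes(sample))    # 16
```

Measuring bytes rather than characters matters as soon as a payload contains non-ASCII text, as the two-byte "é" in the sample shows.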


Validation harness

  • JSON Schema: validate all JSON/JSONL via the registry copies in /fixtures/schemas/json.
  • Cross-shape parity:
    • For each authoritative line, look up the same auditRecordId in projections, search docs, and events.
    • Assert invariants: createdAt ≤ observedAt, classification flags, changedFields derivation.
  • RLS sanity: run seeds through both Postgres and SQL Server scripts with tenant context set; SELECT COUNT(*) per tenant must match line counts.
  • Redaction: verify no Credential keys/values appear in any artifact; verify IP masks in read profiles.
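The cross-shape parity step can be sketched as a function that returns a list of problems instead of failing on the first one. It assumes the fixtures are already parsed into dictionaries (CSV rows keyed by PascalCase column, JSONL lines keyed by lowerCamel) and covers only the presence check, the createdAt ≤ observedAt invariant, and changedFields derivation; classification flags are left out because their rules live in policy. The function names are hypothetical.

```python
from datetime import datetime


def parse_utc(ts: str) -> datetime:
    """Parse an RFC 3339 UTC timestamp with a Z suffix."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def check_parity(authoritative, projections, search_docs, events) -> list:
    """Return human-readable problems; an empty list means the shapes agree."""
    indexes = {
        "projections": {r["AuditRecordId"] for r in projections},
        "search": {d["auditRecordId"] for d in search_docs},
        "events": {e["data"]["auditRecordId"] for e in events},
    }
    docs_by_id = {d["auditRecordId"]: d for d in search_docs}
    problems = []
    for rec in authoritative:
        rid = rec["auditRecordId"]
        if parse_utc(rec["createdAt"]) > parse_utc(rec["observedAt"]):
            problems.append(f"{rid}: createdAt is after observedAt")
        for shape, ids in indexes.items():
            if rid not in ids:
                problems.append(f"{rid}: missing from {shape}")
        # changedFields should list exactly the keys under delta.fields.
        expected = list((rec.get("delta") or {}).get("fields", {}))
        doc = docs_by_id.get(rid)
        if doc is not None and doc.get("changedFields", []) != expected:
            problems.append(f"{rid}: changedFields mismatch in search doc")
    return problems
```

In CI, a non-empty result would fail the build and print every mismatch at once, which is usually easier to act on than stopping at the first assertion.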

Test matrix (scenarios)

| ID | Case | Purpose |
| --- | --- | --- |
| T01 | Minimal create/read | Happy path; schema/clock sanity |
| T02 | Update with delta | ChangedFields extraction; search doc text |
| T03 | Idempotent retry | Accepted(Duplicate) emission; unique key |
| T04 | Redaction at write | Credential drop; email hash/mask |
| T05 | IPv6 + UA | Canonicalization; truncation |
| T06 | Legal hold overlay | Lifecycle blocks purge |
| T07 | No-seal tenant | Eligible path without integrity sidecar |
| T08 | Large payload near 256 KiB | Back-pressure/error surfacing |
| T09 | Residency pin | Export target path prefixes contain region/tenant |
| T10 | Policy change | Policy.Changed effect on eligibility |

Notes

  • Golden fixtures are immutable—add new files for new cases/versions; never rewrite in place.
  • Keep a small README in each folder explaining origin, schema version, and generation seed.
  • For CI, treat /fixtures/** changes as contract-affecting; require approval to merge.