PII Redaction & Classification - Audit Trail Platform (ATP)

Minimize exposure at write, enforce least-privilege at read — ATP protects sensitive data through classification and deterministic redaction.


Purpose & Scope

  • Define ATP's data classification taxonomy aligned with ConnectSoft.Extensions.Compliance.
  • Specify redaction policies per data class (write-time vs read-time).
  • Describe enforcement mechanisms across ingestion, storage, query, and export.
  • Outline integration with ConnectSoft microservice template patterns and compliance extensions.

Classification Model & Taxonomy

Data Classes (canonical)

| Class | Examples | Default Handling |
|---|---|---|
| public | doc titles, generic counters | no redaction |
| internal | service diagnostics, non-PII metadata | minimize on logs |
| restricted | tenant ids, correlation ids, access routes | hash in logs, mask on export (non-DSAR) |
| secret.pi | email, phone, postal address, national IDs | hash or tokenize; drop where not needed |
| secret.phi | medical history, lab results | tokenize or redact; DSAR exception path |
| secret.keys | tokens, secrets, key material (never persisted in payloads) | never log; block storage by policy |

Field Catalog (schema-tag driven)

  • Source of truth: Schema Registry (per bounded context), fields tagged with:
    • class, detector, redactionTemplate, purposeHints[], allowedExports[]
  • Runtime augmentation: detectors (regex/statistical/ML) confirm & enrich tags for free-text fields (comments, notes).

Schema tag example

field: "client.email"
class: "secret.pi"
detector: ["email.regex.v2"]
redactionTemplate: "pii-default/email"
purposeHints: ["contact", "notification"]
allowedExports: ["dsar"]

Redaction Policy Model

Modes

  • Write-time minimization: before commit to hot store (preferred).
  • Read-time redaction: applied by PEP-2 at query/export boundaries.
  • Log-time redaction: always on for structured logs/metrics/traces.

Templates (declarative)

policyVersion: 1.0.0
redactTemplates:
  pii-default:
    email:   { mode: hash, salt: tenant }      # HMAC-SHA256(tenantSalt, value)
    phone:   { mode: hash, salt: tenant }
    address: { mode: drop }
    ssn:     { mode: mask, keepLast: 4 }
  phi-default:
    diagnosis: { mode: tokenize, vault: "tok/pii" }
    notes:     { mode: drop-if-not-purpose, purpose: "clinical" }
logPolicy:
  drop: ["secret.keys", "secret.phi"]           # never log
  hash: ["secret.pi", "restricted"]

Determinism: hashing must be tenant-salted. Tokenization uses a jurisdiction-local vault and reversible flows require dual approval.
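Here is a minimal C# sketch of how a service might apply one field rule from these templates. `RedactionMode`, `FieldRule`, and `TemplateApplier` are hypothetical names, not ConnectSoft.Extensions.Compliance APIs. Two behaviors are assumptions:

- Mask turns every character before the last `keepLast` into `*`.
- Tokenization is left to the vault.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical model of one template entry, e.g. ssn: { mode: mask, keepLast: 4 }.
public enum RedactionMode { Hash, Mask, Drop, DropIfNotPurpose, Tokenize }

public sealed record FieldRule(RedactionMode Mode, int KeepLast = 0, string? Purpose = null);

public static class TemplateApplier
{
    // Returns the value to persist or emit; null means the field is dropped.
    public static string? Apply(FieldRule rule, string value, byte[] tenantSalt, string callerPurpose) =>
        rule.Mode switch
        {
            // Tenant-salted HMAC-SHA256, lowercase hex: stable within a tenant, different across tenants.
            RedactionMode.Hash => Convert.ToHexString(
                HMACSHA256.HashData(tenantSalt, Encoding.UTF8.GetBytes(value))).ToLowerInvariant(),
            RedactionMode.Mask => MaskAllButLast(value, rule.KeepLast),
            RedactionMode.Drop => null,
            // Keep the value only when the caller's purpose matches the rule's purpose.
            RedactionMode.DropIfNotPurpose => callerPurpose == rule.Purpose ? value : null,
            // Reversible tokenization belongs to the jurisdiction-local vault; not sketched here.
            RedactionMode.Tokenize => throw new NotSupportedException("tokenize requires the vault"),
            _ => throw new ArgumentOutOfRangeException(nameof(rule)),
        };

    // Replaces every character except the last `keepLast` with '*', preserving length.
    private static string MaskAllButLast(string value, int keepLast)
    {
        int visible = Math.Clamp(keepLast, 0, value.Length);
        return new string('*', value.Length - visible) + value[(value.Length - visible)..];
    }
}
```

With `ssn: { mode: mask, keepLast: 4 }`, the input `123-45-6789` becomes `*******6789`. The `notes` rule drops the value unless the caller's purpose is `clinical`.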


Enforcement Across the Core Path

flowchart LR
  A[Client/API] -->|JWT+mTLS| G[Gateway • PEP-1]
  G --> I["Ingestion • PEP-2 (write-time minimization)"]
  I --> H[(Hot/WORM)]
  H --> Q["Query • PEP-2 (read-time redaction)"]
  Q --> E["Export • PEP-2 (template + route guard)"]
  subgraph Control Plane
    P[Policy/OPA]
    R[Schema Registry]
    V[Tokenization Vault]
  end
  G --- P
  I --- P
  Q --- P
  E --- P
  I --- R
  Q --- R
  E --- V

PEP-1 (Gateway)

  • Rejects calls lacking purpose, tenant_id, or region_code.
  • Annotates requests with signed X-Policy-* headers (no PII).
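The two PEP-1 duties can be sketched as follows. The sketch assumes claims arrive as a string dictionary and that the gateway signs a canonical string of the X-Policy-* values with HMAC-SHA256. `GatewayPolicy`, the canonical string format, and the key handling are illustrative only; this document does not specify the signing scheme.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

// Hypothetical PEP-1 helper; attribute names follow the bullets above.
public static class GatewayPolicy
{
    private static readonly string[] Required = { "purpose", "tenant_id", "region_code" };

    // Returns required attributes that are absent or blank; an empty list means "admit".
    public static IReadOnlyList<string> MissingAttributes(IReadOnlyDictionary<string, string?> claims)
    {
        var missing = new List<string>();
        foreach (var name in Required)
            if (!claims.TryGetValue(name, out var v) || string.IsNullOrWhiteSpace(v))
                missing.Add(name);
        return missing;
    }

    // Signs the non-PII policy values so PEP-2 can detect header tampering.
    public static string Sign(string tenantId, string regionCode, string purpose, byte[] gatewayKey)
    {
        var canonical = $"tenant={tenantId};region={regionCode};purpose={purpose}";
        return Convert.ToBase64String(HMACSHA256.HashData(gatewayKey, Encoding.UTF8.GetBytes(canonical)));
    }

    // Constant-time comparison avoids leaking how many signature bytes matched.
    public static bool Verify(string tenantId, string regionCode, string purpose, string signature, byte[] gatewayKey)
    {
        var expected = Convert.FromBase64String(Sign(tenantId, regionCode, purpose, gatewayKey));
        byte[] actual;
        try { actual = Convert.FromBase64String(signature); }
        catch (FormatException) { return false; }
        return CryptographicOperations.FixedTimeEquals(expected, actual);
    }
}
```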

PEP-2 (Services)

  • Write path: apply minimization by template & schema tags before append.
  • Read path: apply redaction per class, purpose, and export route.
  • Export path: require redact template unless purpose=dsar_export.
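The export-path rule is sketched below over a simplified request shape; `ExportRequest` and `ExportGuard` are hypothetical. The sketch returns reason codes in the style of the `guard.blocked{reason=...}` metric. It treats any cross-region target as denied and leaves out the explicit-approval path.

```csharp
using System;

// Hypothetical request shape; route handling is reduced to "target must equal tenant region".
public sealed record ExportRequest(string Purpose, string? RedactTemplate, string TenantRegion, string TargetRegion);

public static class ExportGuard
{
    // Returns a decision plus a reason code.
    public static (bool Allowed, string Reason) Evaluate(ExportRequest req)
    {
        if (!string.Equals(req.TenantRegion, req.TargetRegion, StringComparison.OrdinalIgnoreCase))
            return (false, "route_cross_region");
        // A redact template is mandatory unless the caller's purpose is dsar_export.
        if (req.Purpose != "dsar_export" && string.IsNullOrEmpty(req.RedactTemplate))
            return (false, "missing_template");
        return (true, "ok");
    }
}
```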

Detectors & Connectors

  • Regex/Rule detectors: email/phone/SSN/IBAN patterns (versioned).
  • Statistical: entropy + dictionary checks for IDs; low false positive budget.
  • NLP (optional): PHI entities (medication, condition) in free text.
  • ConnectSoft.Extensions.Compliance providers:
    • IRedactor (hash/mask/drop/tokenize), ILogRedactor, IClassifier.
    • Built-ins: EmailRedactor, PhoneRedactor, SecretScrubber, TenantSaltProvider.
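The regex and statistical detectors can be pictured as below. Only the `email.regex.v2` id comes from this document. The following are all assumptions for illustration:

- the email pattern and the `secret.entropy.v1` id;
- the 20-character and 4.0-bit thresholds;
- the confidence values.

"Entropy" here is Shannon entropy in bits per character, H = -Σ p(c) log2 p(c).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical detector output: which detector fired, the proposed class, and a score.
public sealed record Detection(string DetectorId, string DataClass, double Confidence);

public static class Detectors
{
    private static readonly Regex Email = new(@"^[^@\s]+@[^@\s]+\.[^@\s]+$", RegexOptions.Compiled);

    // Shannon entropy in bits per character.
    public static double ShannonEntropy(string s)
    {
        if (s.Length == 0) return 0;
        return s.GroupBy(c => c)
                .Select(g => (double)g.Count() / s.Length)
                .Sum(p => -p * Math.Log2(p));
    }

    public static IEnumerable<Detection> Detect(string value)
    {
        if (Email.IsMatch(value))
            yield return new Detection("email.regex.v2", "Sensitive", 0.99);
        // Long, space-free, high-entropy strings look like keys; a dictionary check would
        // further reduce false positives on natural-language identifiers.
        if (value.Length >= 20 && !value.Contains(' ') && ShannonEntropy(value) >= 4.0)
            yield return new Detection("secret.entropy.v1", "Credential", 0.8);
    }
}
```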

C# integration (logging-safe)

// Injected from ConnectSoft.Extensions.Compliance
public sealed class EvidenceLogger(ILogRedactor redactor, ILogger<EvidenceLogger> log)
{
    public void LogAccepted(AppendRequest req)
    {
        var safe = redactor.RedactObject(req);   // respects schema tags + templates
        log.LogInformation("append.accepted {@request}", safe);
    }
}

Tokenization & Vault

  • Use when: business processes require reversible lookups (e.g., dedup, DSAR trace-backs).
  • Requirements:
    • Vault keys are tenant-scoped, region-anchored; mTLS + workload identity.
    • No direct export of tokens → plain; only with dual-approval and purpose=dsar_export.
  • Format: tok:{namespace}:{alg}:{ciphertext} (no raw hints).
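Handling the token envelope is plain string work. The sketch assumes that the namespace and algorithm segments never contain `:`. `VaultToken` is a hypothetical type. The ciphertext is treated as opaque because the vault produces it.

```csharp
using System;

// Hypothetical envelope for tok:{namespace}:{alg}:{ciphertext}.
public readonly record struct VaultToken(string Namespace, string Algorithm, string Ciphertext)
{
    public override string ToString() => $"tok:{Namespace}:{Algorithm}:{Ciphertext}";

    public static bool TryParse(string s, out VaultToken token)
    {
        token = default;
        // Split into at most 4 parts so any ':' inside the ciphertext stays part of it.
        var parts = s.Split(':', 4);
        if (parts.Length != 4 || parts[0] != "tok") return false;
        if (Array.Exists(parts, p => p.Length == 0)) return false;   // no empty segments
        token = new VaultToken(parts[1], parts[2], parts[3]);
        return true;
    }
}
```

`TryParse` lets PEP-2 tell tokens apart from raw values without ever calling the vault. Raw values such as `john.doe@example.com` fail to parse.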

Policy-as-Code (OPA/Rego)

Export guard (PII/PHI)

package atp.export

import future.keywords.in

default allow = false
deny[msg] {
  input.resource.class in {"secret.pi","secret.phi"}
  input.token.purpose != "dsar_export"
  msg := "export purpose not permitted for secret class"
}
deny[msg] {
  input.resource.class == "secret.pi"
  input.export.redactTemplate == ""
  msg := "missing redact template for PII export"
}
allow { not deny[_] }

Write-minimize guard

package atp.write

import future.keywords.in

deny[msg] {
  input.op == "append"
  some f in input.payload.fields
  data.schema[f].class in {"secret.pi","secret.phi"}
  not input.payload.minimized[f]
  msg := sprintf("field %s not minimized", [f])
}

Observability & Evidence

  • Metrics: redaction.applied.count, redaction.dropped.count, classification.detected.count, tokenize.ops.count, log.pii_blocked.count.
  • Decision log (PII-safe):
{
  "ts":"2025-10-29T08:45:15Z",
  "policyVersion":"1.0.0",
  "decision":"allow",
  "op":"export",
  "class":"secret.pi",
  "redactTemplate":"pii-default",
  "tenantId":"7c1a-…",
  "regionCode":"EU",
  "purpose":"dsar_export",
  "correlationId":"6b3f-…"
}
  • Evidence packs: redact template snapshot, schema tag versions, vault key lineage (ids only), OPA bundle policyVersion.

Developer Experience & Contracts

  • SDK helpers (ConnectSoft.Extensions.Compliance):
    • RedactObject<T>(obj, purpose), RedactJson(json, template), HashDeterministic(value, tenantSalt).
  • API contracts:
    • Request/response DTOs must carry classification hints for free-text fields (e.g., contentClassHint="secret.phi").
    • Services must not echo raw values for classes secret.* in errors or logs.

DTO hint example

{
  "notes": "…",
  "notesClassHint": "secret.phi",
  "purpose": "clinical"
}

Test Matrix (samples)

| Scenario | Expect |
|---|---|
| Append with email and phone (no purpose) | Deny @ PEP-1 (missing purpose) |
| Append with email (write-minimize enabled) | Stored as hash/token only |
| Query warm model (class secret.pi, purpose default) | Fields masked/hashed |
| Export DSAR (tenant EU) | Allow + pii-default template + in-region |
| Export without template, class secret.pi, purpose not DSAR | Deny |
| Logs include PHI | Block + log.pii_blocked alert |

Risks & Mitigations

  • Detector drift / false negatives → versioned detectors, canary audits, periodic sampling of free-text with manual review.
  • Inconsistent templates → OPA lint in CI to validate schema↔template mappings.
  • Token vault misuse → dual-approval for detokenization; rate limits; SIEM alerts on anomaly.
  • Developer bypass → SDK shims at HTTP layer; automated PII linters in CI; contract tests on deny paths.

Artifacts Produced

  • platform/pii-redaction-classification.md (this doc).
  • Redaction templates: policies/redaction/*.yaml.
  • Schema tag updates across services.
  • OPA bundles: bundles/atp.export.rego, bundles/atp.write.rego.
  • Compliance SDK samples: /samples/compliance/RedactionPlayground.

Acceptance (Done When)

  • Every service exposes a classification map (fields → classes) and passes deny-path contract tests.
  • Write-time minimization is enabled for secret.* in ingestion.
  • Exports of secret.* require purpose and template; DSAR path verified end-to-end.
  • Logs/traces are PII-safe with measurable log.pii_blocked.count == 0 (steady state).
  • Evidence packs include policyVersion, schema tag checksums, and OPA signatures.

Cross-References

  • Security & Compliance (control framework, threat model) → platform/security-compliance.md
  • Data Residency & Retention (export routes, DSAR, holds) → platform/data-residency-retention.md
  • Tenancy & ABAC Guards (purpose binding, region checks) → platform/multitenancy-tenancy.md
  • Observability (SIEM signals, dashboards) → operations/observability.md
  • Key Management (vault, hashing salts, jurisdiction) → hardening/key-rotation.md

Guardrails (quick checklist)

  • Apply write-time minimization for secret.* before append.
  • Require purpose and redact template on all sensitive reads/exports (except DSAR purpose).
  • Logs/traces never contain raw secret.*; use tenant-salted hashes.
  • Tokenization is jurisdiction-local, rate-limited, and dual-approved for detokenization.
  • Policies and detectors are versioned, signed, and tested in CI with deny-path coverage.

Classification Taxonomy Overview

ATP uses a platform-wide classification standard built on the Microsoft.Extensions.Compliance foundation and surfaced via ConnectSoft.Extensions.Compliance. Classification is deterministic, versioned, and monotonic (risk can be elevated but never downgraded). Every classified value carries metadata used for guards, redaction, and evidence.

Canonical Classes

| Class | Meaning (short) | Typical Examples |
|---|---|---|
| Public | Non-sensitive | Doc titles, non-PII tags |
| Internal | Operational metadata; low risk | requestId, instanceId, pod names |
| Personal | Identifiable, lower risk | Display name, city, device ID, IP address |
| Sensitive | Higher-risk PII/financial | Email, phone, postal address, last4 PAN |
| Credential | Secrets/tokens/keys; never store raw | API keys, OAuth tokens, passwords, JWTs |
| PHI | Health information (regulated) | Patient IDs, diagnosis, vitals |

Services may upgrade (e.g., Personal → Sensitive) but must never downgrade a class.

Metadata Model (attached to fields)

Classification is persisted alongside values and included in decision/evidence logs.

{
  "path": "user.email",
  "dataClass": "Sensitive",
  "source": "schema|detector|override",
  "detectorId": "email.regex.v2",
  "confidence": 0.99,
  "policyVersion": "1.0.0",
  "upgradedFrom": "Personal",
  "upgradedBy": "ingestion-guard",
  "classifiedAt": "2025-10-29T09:15:00Z"
}
  • path: JSON path/property name.
  • dataClass: one of the canonical classes above.
  • source: how it was decided (schema tag, detector, tenant override).
  • policyVersion: ties to policy bundle; enables replay.
  • upgradedFrom / upgradedBy: present only on upgrades.
  • confidence: detectors provide a score (schema/override = 1.0).

Monotonic Classification (never-downgrade guard)

  • Allowed transitions: Public → Internal → Personal → Sensitive → PHI, plus * → Credential (terminal).
  • Forbidden: any transition that reduces sensitivity.

Policy-as-code (Rego sketch)

package atp.classification

default allow = false

# allowed escalation chain
rank = {"Public":0,"Internal":1,"Personal":2,"Sensitive":3,"PHI":4,"Credential":5}

deny[msg] {
  input.prevClass != ""
  rank[input.newClass] < rank[input.prevClass]
  msg := sprintf("downgrade blocked: %v → %v", [input.prevClass, input.newClass])
}

allow {
  not deny[_]
}
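Services that check transitions before calling OPA can apply the same never-downgrade rule in C#, as sketched here. The underlying values of the `DataClass` enum mirror the `rank` map above. `ClassificationGuard` is a hypothetical name.

```csharp
using System;

// Underlying values mirror the Rego rank map; Credential is the terminal (highest) class.
public enum DataClass { Public = 0, Internal = 1, Personal = 2, Sensitive = 3, PHI = 4, Credential = 5 }

public static class ClassificationGuard
{
    // Allowed when there is no previous class or the new class is at least as sensitive.
    public static bool IsAllowed(DataClass? previous, DataClass next) =>
        previous is null || next >= previous.Value;

    // Returns the new class, or throws with the same message shape as the Rego deny rule.
    public static DataClass Apply(DataClass? previous, DataClass next) =>
        IsAllowed(previous, next)
            ? next
            : throw new InvalidOperationException($"downgrade blocked: {previous} → {next}");
}
```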

Foundation & Alignment

  • Microsoft.Extensions.Compliance provides base taxonomy, redactors, and logging hooks.
  • ConnectSoft.Extensions.Compliance adds:
    • Attribute model for DTOs (e.g., [EmailData], [SecretData], [HealthData]).
    • Deterministic hashing utilities with tenant-scoped salts.
    • Policy loader & cache keyed by policyVersion.

DTO example (attributes)

public sealed class AppendUserEventDto
{
    [EmailData]      public string Email { get; init; } = "";
    [PhoneData]      public string? Phone { get; init; }
    [SecretData]     public string? OAuthToken { get; init; }  // never logged
    [HealthData]     public string? DiagnosisNote { get; init; }
    public string? City { get; init; } // Personal by schema tag
}

Classification Sources & Precedence

  1. Schema tags (authoritative, versioned with contracts)
  2. Tenant overrides (may only upgrade)
  3. Detectors (regex/statistical/NLP) for free text & unknown fields

Precedence: the most restrictive class wins. All decisions record source and policyVersion.
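Here is a sketch of most-restrictive-wins resolution that also keeps provenance. It assumes candidates are supplied in precedence order: schema, then tenant override, then detectors. On a tie, the more authoritative source is therefore recorded. `Candidate` and `ClassResolver` are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public enum DataClass { Public, Internal, Personal, Sensitive, PHI, Credential }

// One proposed class from a source; Source follows the metadata model (schema|detector|override).
public sealed record Candidate(DataClass Class, string Source, double Confidence = 1.0);

public static class ClassResolver
{
    // Strictly-greater comparison keeps the earliest (most authoritative) candidate on ties.
    public static Candidate Resolve(IEnumerable<Candidate> candidates)
    {
        Candidate? best = null;
        foreach (var c in candidates)
            if (best is null || c.Class > best.Class)
                best = c;
        return best ?? throw new ArgumentException("no classification source", nameof(candidates));
    }
}
```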

Storage & Evidence

  • At rest: store raw (except Credential, which is dropped/hashed) + classification metadata.
  • In logs: only metadata and redacted values; never raw for Credential/PHI.
  • Evidence: classification decisions stream (PII-safe) includes path, dataClass, source, policyVersion, and upgrade info.

Quick Checklist

  • Classes use canonical names above; no ad-hoc labels.
  • Decisions carry policyVersion and source; detectors must include detectorId.
  • No downgrades — enforced by guard; upgrades are logged with provenance.
  • DTOs/Contracts marked with compliance attributes; logs use compliance redactors.
  • Credential values are never persisted or logged in raw form.

Data Class Definitions

| Data Class | Meaning | Examples | Write-Time Action | Read-Time Action |
|---|---|---|---|---|
| Public | Non-sensitive; safe to expose | Action verbs, non-PII tags | None | None |
| Internal | Operational metadata; limited exposure | requestId, instanceId, pod names | None | Mask (optional) |
| Personal | PII light; identifiable but lower risk | Display name, city, device ID | None (store raw) | Hash or Mask |
| Sensitive | PII/financial; strict protection | Email, phone, address, last4 PAN | None (store raw + masked/hashed variant) | Hash or Mask |
| Credential | Secrets/tokens/keys; never store raw | API keys, OAuth tokens, passwords, JWTs | Hash (fingerprint) or Drop | Drop (never returned) |
| PHI | Health information; regulated (HIPAA) | Diagnosis notes, vitals, patient IDs | None (store raw, classified) | Mask or Tokenize |

Defaults & Hints

  • Public/Internal: logged freely (PII-safe format); Internal may be masked in public exports.
  • Personal: default Mask for user-facing reads; Hash for analytics joins.
  • Sensitive: compute pre-redacted variant at write for hot paths (email/phone).
  • Credential: persist only a double-hash fingerprint or a presence flag (present=true); never echo.
  • PHI: Mask for most roles; Tokenize when reversible joins are required (vault-scoped to region/tenant).

Redactor presets (ConnectSoft.Extensions.Compliance)

redactors:
  Public:     none
  Internal:   mask(edge=0..2)           # optional
  Personal:   mask(showFirst=1, showLast=1) | hash(tenantSalted)
  Sensitive:  email|phone specialized redactors; else mask | hash(tenantSalted)
  Credential: secret.erase | jwt.header-only | fingerprint(sha256^2)
  PHI:        mask(all) | tokenize(fpe, vault=region/tenant)

DTO attribute mapping (example)

public sealed class EvidenceDto
{
    public string Title { get; init; } = "";              // Public
    [PersonalData] public string? City { get; init; }     // Personal
    [EmailData]    public string? Email { get; init; }    // Sensitive
    [SecretData]   public string? ApiKey { get; init; }   // Credential (dropped/hashed)
    [HealthData]   public string? Diagnosis { get; init; } // PHI
}

Enforcement notes

  • Never downgrade class (Personal → Public is blocked); upgrades are logged with upgradedFrom.
  • Errors & logs must respect class redaction (Credential/PHI never appear raw).
  • Exports require an explicit redaction template for Personal/Sensitive/PHI; Credential is always dropped.

Redaction Strategies & Techniques

ATP applies deterministic redaction with idempotent results. Redaction is policy-driven and evaluated at PEP-2 (write/read/export) and at log sinks. Where hashing is used, it is tenant-salted to prevent cross-tenant joins.

Redaction kinds (canonical)

  • None: pass-through. Used for Public and sometimes Internal in trusted contexts.
  • Hash: HMAC-SHA256(tenantSalt, value); stable within a tenant, different across tenants/regions.
  • Mask: preserve edges (showFirst/showLast) and structure; middle replaced with *.
  • Drop: remove value entirely; may keep a present=true indicator or fingerprint.
  • Tokenize: reversible token using format-preserving encryption (FPE) via a region/tenant-scoped vault.

Fingerprint: sha256(sha256(value)) (no salt) recorded only for audit correlation; never used for joins.

Deterministic hashing (tenant-salted)

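using System;
using System.Security.Cryptography;
using System.Text;

// IKeyVault is an assumed abstraction over the HSM/KMS salt store, not a documented API.
public interface IKeyVault { byte[] GetTenantSalt(string tenantId); }

public static string HashTenantScoped(string value, string tenantId, IKeyVault kv)
{
    // Salt is HSM/KMS-backed per-tenant, per-region (rotated via key lineage)
    var salt = kv.GetTenantSalt(tenantId);
    using var hmac = new HMACSHA256(salt);
    return Convert.ToHexString(hmac.ComputeHash(Encoding.UTF8.GetBytes(value))).ToLowerInvariant();
}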
  • Rotation: tenant salts rotate via key lineage. If raw values are retained, hashes need not be recomputed on access; otherwise, run a background re-hash job.
  • Cross-tenant privacy: prevents joins between tenants on hashed PII.
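A self-contained sketch can check the cross-tenant property. `InMemoryTenantSalts` is a hypothetical test stand-in for the HSM/KMS-backed provider; real salts should never be generated or held this way in production. `HMACSHA256.HashData` and `RandomNumberGenerator.GetBytes` are standard .NET 6+ APIs.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

// Test-only salt store: one random 256-bit salt per tenant, created on first use.
public sealed class InMemoryTenantSalts
{
    private readonly Dictionary<string, byte[]> _salts = new();

    public byte[] GetTenantSalt(string tenantId)
    {
        if (!_salts.TryGetValue(tenantId, out var salt))
        {
            salt = RandomNumberGenerator.GetBytes(32);
            _salts[tenantId] = salt;
        }
        return salt;
    }
}

public static class TenantHash
{
    // Same algorithm as HashTenantScoped: HMAC-SHA256 keyed by the tenant salt, lowercase hex.
    public static string Compute(string value, byte[] tenantSalt) =>
        Convert.ToHexString(HMACSHA256.HashData(tenantSalt, Encoding.UTF8.GetBytes(value))).ToLowerInvariant();
}
```

The same email hashes identically inside `tenant-a`, so it still supports joins there. It hashes to an unrelated value in `tenant-b`, which is what prevents cross-tenant joins.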

Masking patterns (presets)

| Type | Example input | Mask rule (default) | Output example |
|---|---|---|---|
| Email | john.doe@example.com | showFirst=1, showLast=1 | j**e@example.com |
| Phone | +1-555-123-4567 | showLast=4 | ****4567 |
| PAN | 4532-1234-5678-9010 | keep groups, last4 | **** **** **** 9010 |
| Name | Rachel Green | first initial only | R***** G**** |
| IPv4 | 192.168.1.42 | /24 | 192.168.1.x |
| IPv6 | 2001:db8:85a3::7334 | /64 | 2001:db8:85a3::/64 |
| GUID | 550e8400-e29b-41d4-a716-... | prefix only | 550e8400-**** |
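Three of the presets can be sketched as follows. The fixed-width fillers (`**` for the email local part, `****` for phones) are inferred from the example outputs. `MaskPresets` is a hypothetical helper, not the ConnectSoft redactor. Fixed-width fillers also make these presets idempotent: masking an already masked value returns it unchanged.

```csharp
using System;

public static class MaskPresets
{
    // showFirst=1, showLast=1 on the local part, domain preserved.
    public static string Email(string email)
    {
        int at = email.LastIndexOf('@');
        if (at <= 0) return "****";                        // not an email: mask fully
        string local = email[..at], domain = email[at..];
        string head = local[..1], tail = local.Length > 1 ? local[^1..] : "";
        return head + "**" + tail + domain;
    }

    // showLast=4 behind a fixed "****" prefix.
    public static string PhoneLast4(string phone) =>
        phone.Length <= 4 ? "****" : "****" + phone[^4..];

    // First initial of each word, remaining letters replaced one-for-one (length preserved).
    public static string Name(string name) =>
        string.Join(' ', Array.ConvertAll(name.Split(' ', StringSplitOptions.RemoveEmptyEntries),
            w => w[..1] + new string('*', w.Length - 1)));
}
```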

Tokenization (reversible, under controls)

  • When: analytics joins, DSAR tracebacks where hashing is insufficient.
  • How: FPE FF3-1 in a jurisdiction-local vault; keys are tenant-scoped.
  • Format: tok:{ns}:{alg}:{ciphertext} — no raw hints or partial plain text.
  • Controls: dual-approval for detokenization, rate limits, SIEM alerts on spikes.

ConnectSoft.Extensions.Compliance redactors (catalog)

  • EmailRedactor → j**e@example.com
  • PhoneLast4Redactor → ****4567
  • PanLast4Redactor → **** **** **** 1234
  • JwtRedactor → <jwt>.<redacted>.<redacted> (strict mode: erased)
  • IpAddressRedactor → IPv4 /24, IPv6 /64
  • GuidRedactor → prefix only XXXXXXXX-****
  • SecretRedactor → erased or fixed mask ****
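The sketch below gives illustrative equivalents of the IP and JWT behaviors in the catalog, built on `System.Net.IPAddress`. `NetworkRedactors` is hypothetical. The `<redacted>` placeholder and the full masking of unparsable input are assumptions.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

public static class NetworkRedactors
{
    // IPv4 keeps the /24 network (last octet -> x); IPv6 keeps the /64 prefix.
    public static string Ip(string input)
    {
        if (!IPAddress.TryParse(input, out var ip)) return "****";
        byte[] b = ip.GetAddressBytes();
        if (ip.AddressFamily == AddressFamily.InterNetwork)
            return $"{b[0]}.{b[1]}.{b[2]}.x";
        Array.Clear(b, 8, 8);                              // zero the 64-bit interface identifier
        return new IPAddress(b) + "/64";
    }

    // Header-only mode keeps the header segment; strict mode (or a malformed token) erases everything.
    public static string Jwt(string token, bool strict = false)
    {
        var parts = token.Split('.');
        if (strict || parts.Length != 3) return "<redacted>";
        return $"{parts[0]}.<redacted>.<redacted>";
    }
}
```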

DI usage

builder.Services.AddConnectSoftCompliance(cfg =>
{
    cfg.UseDefaultRedactors();          // email/phone/ip/jwt/secret/...
    cfg.UseTenantSaltProvider();        // HSM/KMS-backed
    cfg.StrictSecrets();                // drop secrets at logs by default
});

Policy mapping (template excerpt)

policyVersion: 1.0.0
rulesByClass:
  Public:     { kind: None }
  Internal:   { kind: Mask, params: { showFirst: 0, showLast: 2 }, logSafe: true }
  Personal:   { kind: Mask, params: { showFirst: 1, showLast: 1 } }
  Sensitive:  { kind: Hash, params: { algorithm: HMAC-SHA256, tenantSalted: true } }
  Credential: { kind: Drop }
  PHI:        { kind: Mask, params: { showFirst: 0, showLast: 0 } }
overridesByField:
  email:  { kind: Mask, params: { showFirst: 1, showLast: 1, preserveDomain: true } }
  phone:  { kind: Hash, params: { algorithm: HMAC-SHA256, tenantSalted: true } }

Read vs. write semantics

  • Write-time: minimize Credential (drop/hash); optionally precompute masked variants for hot fields (email/phone).
  • Read-time: apply template by caller purpose (default, dsar_export, audit) and role (auditor/standard).
  • Logs: always redacted; secret.* is erased; Sensitive hashed or masked.

Contract-safe logging

[LoggerMessage(EventId = 1201, Level = LogLevel.Information,
    Message = "export.request {TenantId} {RegionCode} {@Query}")]
public static partial void ExportRequested(ILogger logger, string tenantId, string regionCode, [LogRedacted] ExportQuery query);
  • [LogRedacted] leverages schema tags + templates to scrub nested DTOs.

Edge cases & rules

  • Idempotency: masking an already masked input must yield the same result.
  • Truncation safety: ensure masked strings preserve length/format where validators expect it (e.g., PAN groups).
  • Binary payloads: do not attempt content redaction; require metadata classification or deny.
  • Free text: combine detectors (regex + allowlist) to reduce false positives; default to more restrictive.

Metrics & alerts

  • redaction.applied.count, redaction.dropped.count, tokenize.ops.count, log.pii_blocked.count
  • Alert when:
    • log.pii_blocked.count > 0 (raw PII in logs)
    • tokenization rate spikes > baseline for a tenant
    • hash failures / missing tenant salt

Test vectors (quick)

tests:
  - input: "john.doe@example.com"
    class: Sensitive
    mask:  "j**e@example.com"
    hash:  "hmac256(tenantSalt, value)"           # value changes with salt
  - input: "+1-555-123-4567"
    class: Sensitive
    mask:  "****4567"
  - input: "sk_live_abc123"
    class: Credential
    drop:  true

Guardrails (checklist)

  • Hashing for joins is tenant-salted; never reuse salts across tenants/regions.
  • Credentials are never stored or echoed; keep only present=true or fingerprint.
  • PHI defaults to mask; tokenization requires vault + dual approval.
  • Logs/traces are PII-safe by construction (attributes + redactor pipeline).
  • Redaction policies are versioned & signed; evaluations include policyVersion.

Classification Policy Model

A declarative, versioned policy defines how ATP classifies and redacts fields at write/read/export. Policies are signed, monotonic (no downgrades), and include an effectiveFromUtc for deterministic replay.

# ClassificationPolicy (v1)
id: policy-atp-v1
version: 1
effectiveFromUtc: 2025-01-01T00:00:00Z
defaultByField:
  email: Sensitive
  phone: Sensitive
  ip: Personal
  userId: Personal
  password: Credential
  apiKey: Credential
  jwt: Credential
  healthNote: PHI
rulesByClass:
  Sensitive:
    kind: Hash  # write: store raw + hash; read: return hash for auditors
    params:
      algorithm: HMAC-SHA256
      tenantSalted: true
  Credential:
    kind: Drop  # write: hash only (double SHA-256 fingerprint); read: drop
    params: {}
  PHI:
    kind: Mask  # read: mask for non-privileged; tokenize for analytics
    params:
      showFirst: 0
      showLast: 0
overridesByField:
  email:
    kind: Mask
    params:
      showFirst: 1
      showLast: 1
      preserveDomain: true

Evaluation order (most restrictive wins)

  1. overridesByField (per-field explicit rule)
  2. defaultByField (schema-driven default class)
  3. Heuristics/Detectors (fallback for free-text; may only upgrade)
  4. rulesByClass (class-wide redaction behavior)

Monotonicity is enforced by guards: once a field is Sensitive, it cannot be downgraded to Personal or below.
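The following sketch walks through the evaluation order above, assuming the YAML has already been loaded into dictionaries. The `ClassificationPolicy` and `Rule` types are hypothetical. The fail-closed fallbacks are also assumptions:

- an unknown field with no detection is treated as Sensitive;
- a class with no rule is dropped.

```csharp
using System;
using System.Collections.Generic;

public enum DataClass { Public, Internal, Personal, Sensitive, PHI, Credential }

public sealed record Rule(string Kind, IReadOnlyDictionary<string, string>? Params = null);

public sealed class ClassificationPolicy
{
    public Dictionary<string, DataClass> DefaultByField { get; } = new();
    public Dictionary<DataClass, Rule> RulesByClass { get; } = new();
    public Dictionary<string, Rule> OverridesByField { get; } = new();

    // Schema default first; a detector result may only raise the class, never lower it.
    public DataClass ClassOf(string field, DataClass? detected)
    {
        if (DefaultByField.TryGetValue(field, out var cls))
            return detected is { } d && d > cls ? d : cls;
        return detected ?? DataClass.Sensitive;            // unknown and undetected: fail closed
    }

    // Field override beats the class rule; Credential is always Drop regardless of overrides.
    public Rule RuleFor(string field, DataClass cls)
    {
        if (cls == DataClass.Credential) return new Rule("Drop");
        if (OverridesByField.TryGetValue(field, out var rule)) return rule;
        return RulesByClass.TryGetValue(cls, out rule) ? rule : new Rule("Drop");
    }
}
```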

Semantics & behaviors

  • Sensitive
    • Write: persist raw value and deterministic tenant-salted HMAC for joins.
    • Read (auditor): return hash; Standard roles get mask per export/template.
  • Credential
    • Write: drop raw; persist only a double-SHA-256 fingerprint or present=true.
    • Read: always drop (never returned).
  • PHI
    • Read: mask for non-privileged; tokenize for analytics when allowed by route/purpose.
    • Write: tagged as PHI; raw retained according to regulatory profile (see residency/retention).

Field override example: email uses Mask on read with the domain preserved, even though the Sensitive class rule prefers Hash for auditors. This yields recognizable outputs (e.g., j**e@example.com) in approved contexts, while auditor paths still see hashes when the purpose requires it.

Validation & policy guardrails

  • Required keys: id, version, effectiveFromUtc, rulesByClass.
  • Enums: kind ∈ {None, Hash, Mask, Drop, Tokenize}; dataClass ∈ {Public, Internal, Personal, Sensitive, Credential, PHI}.
  • Hash params must include algorithm and tenantSalted: true for Sensitive/Personal joins.
  • Credential class can only be Drop (optional fingerprint) — no Mask/Hash on read.
  • Effective dating: requests carry X-Policy-Version; services reject stale or invalid versions at enforce stages.
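The structural checks can be sketched as a validator that returns every error instead of stopping at the first, which suits CI lint output. `PolicyDoc`, `RuleSpec`, and `PolicyValidator` are hypothetical, and params are simplified to strings. The Hash check is applied to every class that uses Hash, a slight generalization of the rule above.

```csharp
using System;
using System.Collections.Generic;

public sealed record RuleSpec(string Kind, IReadOnlyDictionary<string, string> Params);

public sealed record PolicyDoc(string? Id, int? Version, DateTimeOffset? EffectiveFromUtc,
    IReadOnlyDictionary<string, RuleSpec>? RulesByClass);

public static class PolicyValidator
{
    static readonly HashSet<string> Kinds = new() { "None", "Hash", "Mask", "Drop", "Tokenize" };
    static readonly HashSet<string> Classes = new() { "Public", "Internal", "Personal", "Sensitive", "Credential", "PHI" };

    public static List<string> Validate(PolicyDoc p)
    {
        var errors = new List<string>();
        if (string.IsNullOrEmpty(p.Id)) errors.Add("missing id");
        if (p.Version is null) errors.Add("missing version");
        if (p.EffectiveFromUtc is null) errors.Add("missing effectiveFromUtc");
        if (p.RulesByClass is null) { errors.Add("missing rulesByClass"); return errors; }

        foreach (var (cls, rule) in p.RulesByClass)
        {
            if (!Classes.Contains(cls)) errors.Add($"unknown class {cls}");
            if (!Kinds.Contains(rule.Kind)) errors.Add($"{cls}: unknown kind {rule.Kind}");
            if (cls == "Credential" && rule.Kind != "Drop") errors.Add("Credential must be Drop");
            // Hash rules must name the algorithm and be tenant-salted.
            if (rule.Kind == "Hash" &&
                (!rule.Params.ContainsKey("algorithm") ||
                 !rule.Params.TryGetValue("tenantSalted", out var salted) || salted != "true"))
                errors.Add($"{cls}: Hash requires algorithm and tenantSalted: true");
        }
        return errors;
    }
}
```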

Example decision (PII-safe log)

{
  "ts":"2025-10-29T09:45:00Z",
  "policyVersion":"1",
  "path":"user.email",
  "prevClass":"Personal",
  "newClass":"Sensitive",
  "ruleSource":"defaultByField",
  "redaction":{"kind":"Mask","template":"email(preserveDomain=true)"},
  "tenantId":"7c1a-…",
  "regionCode":"EU",
  "correlationId":"c9f4-…",
  "decision":"allow"
}

Policy ops (lifecycle)

  • Authoring: YAML in VCS; PRs require security/compliance review and CI lint (schema + monotonic checks).
  • Signing: build creates a signed bundle; services verify signature at load.
  • Rollout: draft → canary → enforce with deny/allow drift SLOs; auto-rollback on drift > 2 pp.
  • Migration: class changes trigger background re-classification & optional backfill of pre-computed variants (email/phone masks).
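The rollback rule reduces to comparing deny rates in percentage points. The minimal sketch below assumes the canary and enforced bundles are evaluated over the same request sample. `CanaryGate` is a hypothetical name.

```csharp
using System;

public static class CanaryGate
{
    public const double MaxDriftPercentagePoints = 2.0;

    public static double DenyRatePercent(long denies, long total) =>
        total == 0 ? 0 : 100.0 * denies / total;

    // Absolute difference catches both directions: a bundle that suddenly allows much more
    // is as suspicious as one that denies much more.
    public static bool ShouldRollback(long baselineDenies, long baselineTotal, long canaryDenies, long canaryTotal)
    {
        double drift = Math.Abs(DenyRatePercent(canaryDenies, canaryTotal)
                              - DenyRatePercent(baselineDenies, baselineTotal));
        return drift > MaxDriftPercentagePoints;
    }
}
```

Example: the baseline denies 50 of 10,000 requests (0.5%) and the canary denies 300 of 10,000 (3.0%). That is 2.5 pp of drift, so the canary is rolled back. At 200 of 10,000 (2.0%), the drift is 1.5 pp and the rollout continues.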

Tests (must pass in CI)

  • Snapshot tests: sample records produce stable redaction outputs by policyVersion.
  • Deny-path: attempts to downgrade or read Credential values fail with reason code.
  • Cross-tenant hashing: identical inputs yield different HMACs across tenants.

Acceptance (done when)

  • Policy bundle validates, signs, and loads; services stamp policyVersion on decisions.
  • Sensitive/Credential/PHI behaviors match the table above in unit + integration tests.
  • Evidence logs show no downgrades, and auditor reads for Sensitive fields return hash/mask per purpose/template.

Cross-references: Security & Compliance · Data Residency & Retention · Multitenancy & ABAC Guards · Observability


Write-Time Classification & Redaction

Write-path logic minimizes sensitive values before persistence. Decisions are policy-driven, idempotent, and stamped with policyVersion and provenance (schema/detector/override). Guards enforce monotonic classification and block storage of raw Credential data.

Ingestion pipeline

  1. Producer hints
    The producer may attach optional classification hints in AuditRecord.metadata (e.g., notesClassHint=PHI, emailClassHint=Sensitive). Hints can only upgrade risk.
  2. Policy evaluation (PEP-2)
    The ingestion service loads the active Classification Policy and resolves each field’s dataClass from: schema tags → tenant overrides → producer hints → detectors. Most restrictive wins.
  3. Write-time redaction (selective)
    • Credential → Drop or Fingerprint: erase the value (preferred) or persist a double-SHA-256 fingerprint; keep present=true.
    • Sensitive (policy-enabled) → Precompute variants: persist raw plus a masked/hashed variant for hot reads (e.g., email/phone).
    • Personal/Internal/Public → store raw with classification tag (no downgrade allowed).
  4. Persist with metadata
    Store the record and its classification metadata (per-field) for replay and evidence. Emit a PII-safe decision log.

Storage sketch (per field)

{
  "path":"user.email",
  "class":"Sensitive",
  "raw":"john.doe@example.com",
  "masked":"j**e@example.com",
  "hash":"hmac256(tenantSalt, value)",
  "policyVersion":"1.0.0",
  "source":"schema",
  "classifiedAt":"2025-10-29T10:05:00Z"
}

Guard (pseudo)

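// Write-time guard (pseudocode): entity, policy, value, and the helpers are placeholders.
if (dataClass == DataClass.Credential)
{
    entity.SetFlag("present", true);              // keep only a presence indicator
    entity.SetFingerprint(DoubleSha256(value));   // optional audit-correlation fingerprint
    value = null;                                 // raw dropped, never persisted
}
else if (dataClass == DataClass.Sensitive && policy.PrecomputeVariants)
{
    entity.SetMasked(EmailRedactor.Mask(value));          // field-specific masked variant
    entity.SetHash(HmacTenantSalted(value, tenantId));    // tenant-salted join key
}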
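Here is a runnable version of the guard, assuming a simple record for the per-field storage shape. `StoredField` and `WriteMinimizer` are hypothetical. The fingerprint is unsalted double SHA-256, as described under redaction strategies. The masker is passed in so that field-specific redactors can be plugged in.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public enum DataClass { Public, Internal, Personal, Sensitive, PHI, Credential }

// Per-field storage shape, following the storage sketch above.
public sealed record StoredField(string Path, DataClass Class, string? Raw,
    string? Masked = null, string? Hash = null, string? Fingerprint = null, bool Present = true);

public static class WriteMinimizer
{
    // sha256(sha256(value)), unsalted: audit correlation only, never a join key.
    public static string DoubleSha256(string value) =>
        Convert.ToHexString(SHA256.HashData(SHA256.HashData(Encoding.UTF8.GetBytes(value)))).ToLowerInvariant();

    public static StoredField Minimize(string path, DataClass cls, string value, byte[] tenantSalt,
        Func<string, string> mask, bool precomputeVariants)
    {
        if (cls == DataClass.Credential)
            return new StoredField(path, cls, Raw: null, Fingerprint: DoubleSha256(value));   // raw dropped

        if (cls == DataClass.Sensitive && precomputeVariants)
            return new StoredField(path, cls, value, Masked: mask(value),
                Hash: Convert.ToHexString(HMACSHA256.HashData(tenantSalt, Encoding.UTF8.GetBytes(value))).ToLowerInvariant());

        return new StoredField(path, cls, value);   // stored raw with its classification tag
    }
}
```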

Heuristic classification (fallback)

Used only when no explicit classification is present; outputs include detectorId and confidence. Heuristics cannot downgrade.

  • (?i)\b(email|e-mail)\b → Sensitive
  • (?i)\b(phone|mobile)\b → Sensitive
  • (?i)\b(password|secret|api[_-]?key|token)\b → Credential
  • (?i)\b(ssn|nin|national[_-]?id)\b → Sensitive (upgrade to special handling if configured)
  • (?i)\b(ip|client\.ip)\b → Personal
  • (?i)\b(health|diagnosis|vitals)\b → PHI
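The heuristics translate directly into .NET regular expressions. This sketch evaluates every rule and keeps the most restrictive hit; the `name.*.v1` detector ids are hypothetical. `\b` treats `_` as a word character, so a field like `user_email` does not match the email rule and falls through to the value-based detectors.

```csharp
using System;
using System.Text.RegularExpressions;

public enum DataClass { Public, Internal, Personal, Sensitive, PHI, Credential }

public static class FieldNameHeuristics
{
    private static readonly (Regex Pattern, DataClass Class, string Id)[] Rules =
    {
        (new Regex(@"(?i)\b(email|e-mail)\b"), DataClass.Sensitive, "name.email.v1"),
        (new Regex(@"(?i)\b(phone|mobile)\b"), DataClass.Sensitive, "name.phone.v1"),
        (new Regex(@"(?i)\b(password|secret|api[_-]?key|token)\b"), DataClass.Credential, "name.secret.v1"),
        (new Regex(@"(?i)\b(ssn|nin|national[_-]?id)\b"), DataClass.Sensitive, "name.natid.v1"),
        (new Regex(@"(?i)\b(ip|client\.ip)\b"), DataClass.Personal, "name.ip.v1"),
        (new Regex(@"(?i)\b(health|diagnosis|vitals)\b"), DataClass.PHI, "name.phi.v1"),
    };

    // Keeps the most restrictive match, so rule order can never lower the result.
    public static (DataClass Class, string DetectorId)? Classify(string fieldPath)
    {
        (DataClass Class, string DetectorId)? best = null;
        foreach (var (pattern, cls, id) in Rules)
            if (pattern.IsMatch(fieldPath) && (best is null || cls > best.Value.Class))
                best = (cls, id);
        return best;
    }
}
```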

Detector decision (PII-safe log)

{
  "ts":"2025-10-29T10:06:00Z",
  "path":"attributes.user_email",
  "detectorId":"email.regex.v2",
  "confidence":0.99,
  "prevClass":"", 
  "newClass":"Sensitive",
  "policyVersion":"1.0.0",
  "decision":"upgrade"
}

Evidence & idempotency

  • Idempotent writes: reprocessing the same payload yields the same masked/hash outputs by policyVersion.
  • Evidence: every write emits classification.decided (counts by class) and write.minimized (counts by kind: drop/mask/hash).
  • Replay: historical records can be re-evaluated safely when the policy version changes (background re-classification job).

Guardrails (write path)

  • Never persist raw Credential values; keep only presence/fingerprint.
  • Precompute variants only for approved fields (email/phone); avoid storage bloat.
  • Heuristics upgrade only; schema/overrides take precedence.
  • Stamp all decisions with policyVersion and source; store per-field metadata.

Read-Time Redaction & Enforcement

Read-time enforcement applies after auth/authz and before any serialization or export leaves the service boundary. Plans are policy-driven, role/purpose-aware, and stamped with policyVersion.

Query pipeline

  1. Authentication
    Verify caller identity (JWT + optional DPoP/mTLS). Extract tenant_id, region_code, roles, scopes, and purpose.
  2. Authorization (ABAC)
    Enforce tenant/region scope and clearance for requested data classes (Public/Internal/Personal/Sensitive/Credential/PHI) using PEP-2. Deny on cross-tenant/region or insufficient clearance.
  3. Redaction plan selection
    Select a plan from policy based on role + purpose + edition:
      • Auditor/Privileged → Sensitive returns hash, Credential dropped, PHI masked.
      • Standard user → Sensitive masked, Credential dropped, PHI masked/dropped by edition.
      • Public API → Only Public/Internal (Internal may be masked); all others omitted.
  4. Apply redaction
    Server-side masking/hashing/tokenization executed on the result model before serialization; logs use redacted objects.
  5. Audit meta-access
    Emit PII-safe decision logs: who accessed which classes, policyVersion, purpose, and counts by class.

Clearance matrix (default)

| Data Class | Public API | Standard User | Auditor/Privileged |
|---|---|---|---|
| Public | allow | allow | allow |
| Internal | mask/omit | mask/allow | allow |
| Personal | omit | mask | mask/hash |
| Sensitive | omit | mask | hash |
| Credential | omit | drop | drop |
| PHI | omit | mask/drop | mask |

Break-glass is not available here; it’s governed separately (time-bound, dual-control, fully audited).
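The clearance matrix can be encoded as a lookup table, so redaction plans become data rather than scattered conditionals. Where a cell lists two options, this sketch takes the first and assumes edition or purpose selects the alternative. `Caller`, `ReadAction`, and `ClearanceMatrix` are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public enum DataClass { Public, Internal, Personal, Sensitive, PHI, Credential }
public enum Caller { PublicApi, StandardUser, Auditor }        // column order of the matrix
public enum ReadAction { Allow, Mask, Hash, Drop, Omit }

public static class ClearanceMatrix
{
    private static readonly Dictionary<DataClass, ReadAction[]> Rows = new()
    {
        //                               PublicApi         StandardUser      Auditor
        [DataClass.Public]     = new[] { ReadAction.Allow, ReadAction.Allow, ReadAction.Allow },
        [DataClass.Internal]   = new[] { ReadAction.Mask,  ReadAction.Mask,  ReadAction.Allow },
        [DataClass.Personal]   = new[] { ReadAction.Omit,  ReadAction.Mask,  ReadAction.Mask },
        [DataClass.Sensitive]  = new[] { ReadAction.Omit,  ReadAction.Mask,  ReadAction.Hash },
        [DataClass.Credential] = new[] { ReadAction.Omit,  ReadAction.Drop,  ReadAction.Drop },
        [DataClass.PHI]        = new[] { ReadAction.Omit,  ReadAction.Mask,  ReadAction.Mask },
    };

    public static ReadAction For(Caller caller, DataClass cls) => Rows[cls][(int)caller];
}
```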

Policy sketch (Rego)

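package atp.query

import future.keywords.in

default allow = false

all_classes := {"Public","Internal","Personal","Sensitive","Credential","PHI"}

# auditors may view every class except Credential
can_view[class] {
  input.token.role == "auditor"
  some class in all_classes
  class != "Credential"
}
# standard users see the same classes; values are redacted per the plan below
can_view[class] {
  input.token.role == "user"
  some class in {"Public","Internal","Personal","Sensitive","PHI"}
}

# Rego has no ternary operator; choose the Sensitive action with else
sensitive_action = "hash" {
  input.token.role == "auditor"
} else = "mask"

redact := {
  "Public":     "none",
  "Internal":   "mask",
  "Personal":   "mask",
  "Sensitive":  sensitive_action,
  "Credential": "drop",
  "PHI":        "mask"
}

allow {
  input.resource.tenantId == input.token.tenant_id
  input.resource.regionCode == input.token.region_code
}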

Server-side application (C#)

var plan = redactionPlanner.For(token.Role, token.Purpose, policyVersion);
var result = await repository.QueryAsync(q, token.TenantId, token.RegionCode, ct);

// Apply per-field plan before serialization/logging
var safe = redactor.Apply(result, plan);

return Results.Ok(safe);

Access decision (PII-safe log)

{
  "ts":"2025-10-29T10:25:03Z",
  "policyVersion":"1.0.0",
  "op":"query",
  "tenantId":"7c1a-…",
  "classes": {"Personal":128,"Sensitive":42,"PHI":3,"Credential":0},
  "redaction": {"mask":170,"hash":0,"drop":3},
  "purpose":"default",
  "role":"user",
  "correlationId":"b7a9-…",
  "decision":"allow"
}

Export pipeline

  • Route & purpose checks: Exports are in-region by default. Enforce purpose (e.g., dsar_export, audit_attestation) and export route policies; deny cross-region unless explicitly allowed.

  • Redaction + overrides: Apply the same read-time plan plus export-specific overrides (e.g., stricter masking for third-party recipients). Credential is always dropped.

  • Manifest & evidence: Every export includes a signed manifest with policyVersion, counts by data class, redaction summary, and watermarking metadata.

Export manifest (example)

export:
  id: "exp-2025-10-29-001"
  tenantId: "7c1a-…"
  regionCode: "EU"
  purpose: "dsar_export"
  policyVersion: "1.0.0"
  route: "in_region"
  classes:
    Personal: 120
    Sensitive: 38
    PHI: 2
  redaction:
    mask: 140
    hash: 38
    drop: 2
  watermark:
    subject: "auditor@firm.example"
    requestId: "c0f1-…"
  signature: "cosign:…"

  • Data sharing agreements (DSA): When exporting to third parties, enforce DSA-bound templates (e.g., hash-only for Sensitive, mask-only for PHI), apply watermarks, and restrict fields to the minimum required.
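ATP signs manifests with cosign; the property that matters (any change to the signed manifest bytes breaks verification) can be demonstrated with the .NET ECDSA primitives alone. A minimal sketch, assuming an in-memory P-256 key stands in for the KMS/HSM key and that `System.Text.Json` output of a fixed-shape object serves as the canonical form; `Canonical` is a hypothetical helper, and a production canonicalization scheme would need to be pinned down more strictly.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;
using System.Text.Json;

// Hypothetical canonical form: compact JSON of a fixed-shape object (property order = declaration order).
static byte[] Canonical(object document) =>
    Encoding.UTF8.GetBytes(JsonSerializer.Serialize(document));

// Stand-in for the KMS/HSM-held signing key.
using var signingKey = ECDsa.Create(ECCurve.NamedCurves.nistP256);

var manifest = new
{
    id = "exp-2025-10-29-001",
    regionCode = "EU",
    purpose = "dsar_export",
    policyVersion = "1.0.0",
    route = "in_region",
    classes = new { Personal = 120, Sensitive = 38, PHI = 2 },
};

byte[] signature = signingKey.SignData(Canonical(manifest), HashAlgorithmName.SHA256);

// Recipients verify with the public key only.
using var verifier = ECDsa.Create(signingKey.ExportParameters(includePrivateParameters: false));

bool original = verifier.VerifyData(Canonical(manifest), signature, HashAlgorithmName.SHA256);
var tampered = manifest with { route = "cross_region" };   // e.g., a route rewritten after signing
bool tamperedOk = verifier.VerifyData(Canonical(tampered), signature, HashAlgorithmName.SHA256);

Console.WriteLine($"original={original} tampered={tamperedOk}");   // original=True tampered=False
```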

Metrics & alerts

  • query.redaction.applied.count{kind=mask|hash|drop}
  • export.manifest.generated.count
  • guard.blocked{reason=class_clearance|route_cross_region|missing_template}
  • Alerts on:
    • export attempted without manifest/template,
    • cross-region export without approval,
    • unexpected rise in drop for Credential (indicates producer leakage).

Guardrails (read/export)

  • Redaction is server-side and pre-serialization; clients never receive raw Sensitive/PHI/Credential.
  • Exports carry a signed manifest with policyVersion and watermark; Credential is always dropped.
  • Purpose/route must be explicit; cross-region routes are deny-by-default.
  • Decision logs are PII-safe and include class counts and redaction summary for evidence.

Policy-as-Code & Versioning

Policies are declarative artifacts (YAML/JSON) kept in version control, signed on release, and referenced at runtime by policyVersion. Old policies are immutable for replay. Any change can trigger background re-classification with idempotent jobs and PII-safe evidence logs.

Repository layout (source of truth)

/policies/classification/
  policy-atp-v1.yaml          # current
  policy-atp-v0.yaml          # immutable history
  schema/classification.v1.json
/signing/
  cosign.pub                  # verifier
  cosign.keyref               # KMS/HSM key ref (no raw key in repo)
/bundles/
  atp-classification-v1.sig   # signed artifact (publish target)

Policy metadata (required fields)

id: policy-atp-v1
version: 1
effectiveFromUtc: 2025-01-01T00:00:00Z
author: dpo@connectsoft.ai
rulesByClass: { ... }           # None|Mask|Hash|Drop|Tokenize
defaultByField: { ... }         # email/phone/ip/...
overridesByField: { ... }       # field-specific behaviors
monotonic: true                 # never-downgrade enforcement
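`monotonic: true` can be checked in CI by comparing each field's class in the candidate policy against the previous version. A minimal sketch, assuming both versions have been flattened into field-to-class maps and using the class ranking from the Rego sketches in this document; treating a removed field as a downgrade is an assumption, and `FindDowngrades` is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Class ranking shared with the Rego sketches: higher = more restrictive.
var rank = new Dictionary<string, int>
{
    ["Public"] = 0, ["Internal"] = 1, ["Personal"] = 2,
    ["Sensitive"] = 3, ["PHI"] = 4, ["Credential"] = 5,
};

// Hypothetical helper: lists every field whose class would go down from `previous` to `candidate`.
// A field missing from the candidate counts as a downgrade (its protection silently disappears).
List<string> FindDowngrades(IReadOnlyDictionary<string, string> previous,
                            IReadOnlyDictionary<string, string> candidate) =>
    previous
        .Where(p => !candidate.TryGetValue(p.Key, out var cls) || rank[cls] < rank[p.Value])
        .Select(p => candidate.TryGetValue(p.Key, out var cls)
            ? $"{p.Key}: {p.Value} -> {cls}"
            : $"{p.Key}: {p.Value} -> (removed)")
        .ToList();

var v0 = new Dictionary<string, string>
{
    ["user.email"] = "Sensitive", ["client.ip"] = "Personal", ["session.token"] = "Credential",
};
var v1 = new Dictionary<string, string>
{
    ["user.email"] = "Personal", ["client.ip"] = "Sensitive",   // email downgraded, ip upgraded
};

var downgrades = FindDowngrades(v0, v1);
downgrades.ForEach(Console.WriteLine);   // a CI gate would fail when downgrades.Count > 0
```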

CI/CD (lint → sign → publish)

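# .azure-pipelines/policy-release.yml
# Multi-line commands use block scalars (|) so the shell receives real line continuations.
stages:
- stage: validate
  jobs:
  - job: lint_and_schema
    steps:
      - script: yq eval '.' policies/classification/policy-atp-v1.yaml
      - script: |
          ajv validate -s policies/classification/schema/classification.v1.json \
                       -d policies/classification/policy-atp-v1.yaml
      # --fail exits non-zero when the query is undefined, i.e. when tests.pass is not true
      - script: opa eval --fail -i tests/fixtures/sample.json -d policies/tests/monotonic.rego "data.tests.pass == true"
- stage: sign
  dependsOn: validate
  jobs:
  - job: cosign_sign
    steps:
      - script: |
          mkdir -p bundles
          cosign sign-blob --yes --key $(KMS_KEYREF) \
            --output-signature bundles/atp-classification-v1.sig \
            policies/classification/policy-atp-v1.yaml
      # Stages run on separate agents: pass the signature to the publish stage as an artifact.
      - publish: bundles
        artifact: policy-signature
- stage: publish
  dependsOn: sign
  jobs:
  - job: push_bundle
    steps:
      - download: current
        artifact: policy-signature
      # oras takes <file>:<mediaType>; the media types below are illustrative.
      - script: |
          mkdir -p bundles
          cp $(Pipeline.Workspace)/policy-signature/atp-classification-v1.sig bundles/
          oras push $(POLICY_REGISTRY)/atp/classification:v1 \
            --artifact-type application/vnd.atp.policy \
            policies/classification/policy-atp-v1.yaml:application/vnd.atp.policy.v1+yaml \
            bundles/atp-classification-v1.sig:application/vnd.atp.policy.sig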

Runtime verification

  • Services fetch the bundle by immutable tag (v1) or digest.
  • Verify cosign signature and schema before activating.
  • Stamp all decisions with policyVersion.
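Pinning the bundle digest gives a simple fail-closed activation check that complements signature verification. A minimal sketch, assuming the bundle bytes are already fetched and the expected `sha256:<hex>` digest comes from trusted configuration; cosign verification itself is not reproduced, and `Digest` and `ActivateIfDigestMatches` are hypothetical helpers.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper: OCI-style digest string, "sha256:" + lowercase hex.
static string Digest(byte[] bytes) =>
    "sha256:" + Convert.ToHexString(SHA256.HashData(bytes)).ToLowerInvariant();

// Hypothetical helper: returns the policy text only when the digest matches; otherwise throws (fail closed).
static string ActivateIfDigestMatches(byte[] bytes, string expectedDigest)
{
    var actual = Digest(bytes);
    if (!string.Equals(actual, expectedDigest, StringComparison.Ordinal))
        throw new InvalidOperationException($"enforce_unavailable: digest {actual} != {expectedDigest}");
    return Encoding.UTF8.GetString(bytes);
}

var bundle = Encoding.UTF8.GetBytes("id: policy-atp-v1\nversion: 1\nmonotonic: true\n");
var pinned = Digest(bundle);   // in practice recorded at publish time, never recomputed from fetched bytes

Console.WriteLine(ActivateIfDigestMatches(bundle, pinned).Length);

var tamperedBundle = Encoding.UTF8.GetBytes("id: policy-atp-v1\nversion: 1\nmonotonic: false\n");
var rejected = false;
try { ActivateIfDigestMatches(tamperedBundle, pinned); }
catch (InvalidOperationException) { rejected = true; }
Console.WriteLine($"tampered bundle rejected: {rejected}");   // tampered bundle rejected: True
```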

Rollout strategy

  • States: draft → canary → enforce.
  • Canary scope: small tenant set per region; measure allow/deny drift and redaction deltas.
  • Abort if: abac.deny.delta_pp > 2, log.pii_blocked.count > 0, or export guard violations.
  • Promote on green metrics; record ADR with evidence links.

rollout:
  canaryTenants: ["t-eu-01","t-us-02"]
  observe:
    - abac.deny.delta_pp <= 2
    - redaction.applied.count.delta <= 10%
    - export.guard.blocked == 0
  abortIf:
    - log.pii_blocked.count > 0
    - residency.cross_region.allow > 0
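The canary gates reduce to a small decision function. A minimal sketch that combines the prose abort list with the YAML thresholds, assuming metrics arrive as a name-to-value map, that `redaction.applied.count.delta` is an absolute fraction (0.10 = 10%), and that missing data blocks promotion instead of passing; `Decide` is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: returns "abort", "promote", or "observe" (keep watching) for one canary window.
static string Decide(IReadOnlyDictionary<string, double> m)
{
    bool Over(string name, double limit) => m.TryGetValue(name, out var v) && v > limit;

    // Any known violation aborts immediately, even if other metrics are missing.
    if (Over("log.pii_blocked.count", 0) || Over("residency.cross_region.allow", 0) ||
        Over("abac.deny.delta_pp", 2) || Over("export.guard.blocked", 0))
        return "abort";

    // Promotion needs every gate present and green; missing data means keep observing.
    string[] gates =
    {
        "log.pii_blocked.count", "residency.cross_region.allow",
        "abac.deny.delta_pp", "export.guard.blocked", "redaction.applied.count.delta",
    };
    foreach (var gate in gates)
        if (!m.ContainsKey(gate)) return "observe";

    return Math.Abs(m["redaction.applied.count.delta"]) <= 0.10 ? "promote" : "observe";
}

var green = new Dictionary<string, double>
{
    ["log.pii_blocked.count"] = 0, ["residency.cross_region.allow"] = 0,
    ["abac.deny.delta_pp"] = 1.2, ["export.guard.blocked"] = 0,
    ["redaction.applied.count.delta"] = 0.04,
};
Console.WriteLine(Decide(green)); // promote
```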

Immutable history & replay

  • Prior versions remain read-only; used for audit replay to explain past outputs.
  • Replays load the historical bundle and produce deterministic redaction given the timestamp.

Migration (re-classification jobs)

  • Trigger when rulesByClass or overridesByField change.
  • Idempotent: re-running yields identical stored variants by policyVersion.
  • Scoped: per-tenant, per-region; rate-limited to protect hot paths.
  • Evidence: policy.migration.started/completed with counts and failure reasons.

migration:
  reason: "email override: Mask (preserveDomain=true)"
  scope: { regionCode: "EU", tenants: ["*"] }
  batchSize: 500
  throttleRps: 50
  retry: { maxAttempts: 3, backoff: jittered }
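The idempotency and throttling properties can be sketched as a batch loop that skips records already stamped with the target policyVersion and paces batches to stay under `throttleRps`. A minimal sketch, assuming an in-memory map from record ID to stamped policyVersion, `throttleRps` counted in records per second, an injected `sleep` callback so the pacing is testable, and a full-jitter exponential backoff (200 ms base, 10 s cap) for retries; `Migrate`, `Backoff`, and these constants are hypothetical, not taken from ATP.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: re-stamps every record not yet at targetVersion and returns how many changed.
// Records already at targetVersion are skipped, so a re-run after success is a no-op.
static int Migrate(Dictionary<string, string> stampedVersion, string targetVersion,
                   int batchSize, double throttleRps, Action<TimeSpan> sleep)
{
    var pending = stampedVersion.Where(r => r.Value != targetVersion).Select(r => r.Key).ToList();
    var changed = 0;
    foreach (var batch in pending.Chunk(batchSize))
    {
        foreach (var id in batch)
        {
            // Real job: re-run classification/redaction for this record under targetVersion.
            stampedVersion[id] = targetVersion;
            changed++;
        }
        // Pace batches so the average rate stays at or below throttleRps records per second.
        sleep(TimeSpan.FromSeconds(batch.Length / throttleRps));
    }
    return changed;
}

// Hypothetical retry delay: uniform in [0, min(10 s, 200 ms * 2^(attempt-1))).
static TimeSpan Backoff(int attempt, Random rng)
{
    var capMs = Math.Min(10_000, 200 * Math.Pow(2, attempt - 1));
    return TimeSpan.FromMilliseconds(rng.NextDouble() * capMs);
}

var store = Enumerable.Range(1, 1_200).ToDictionary(i => $"rec-{i}", _ => "0.9.0");
var slept = TimeSpan.Zero;

int first = Migrate(store, "1.0.0", batchSize: 500, throttleRps: 50, sleep: d => slept += d);
int second = Migrate(store, "1.0.0", batchSize: 500, throttleRps: 50, sleep: d => slept += d);
Console.WriteLine($"first={first} second={second} paced={slept.TotalSeconds}s"); // first=1200 second=0 paced=24s
```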

Meta-audit of policy changes

Every change is logged to the meta-audit stream:

{
  "ts":"2025-10-29T10:40:00Z",
  "actor":"compliance-approver@connectsoft.ai",
  "action":"policy.update",
  "policyId":"policy-atp-v1",
  "version":1,
  "digest":"sha256:…",
  "sig":"cosign:…",
  "effectiveFromUtc":"2025-11-01T00:00:00Z",
  "justification":"Tighten email masking for UI outputs",
  "adr":"ADR-SEC-0123"
}

Guardrails (quick checklist)

  • Policies are signed, schema-valid, and monotonic.
  • Services refuse unsigned/unknown policyVersion in enforce mode.
  • All decisions include policyVersion; exports embed a manifest with the same.
  • Migrations are throttled, idempotent, and evidenced.
  • Old policy files are immutable; replay uses historical bundles.

Acceptance (done when)

  • CI lints, validates, signs, and publishes the policy; services successfully load & verify.
  • Canary completes with green SLOs; ADR recorded; policyVersion visible in decision logs.
  • Migration (if required) finishes with evidence and zero PII-in-logs alerts.

Cross-references

Security & Compliance · Data Residency & Retention · Least Privilege & Policy Enforcement · Observability


Tenant-Specific Overrides

Overrides let tenants tighten classification/redaction for their data while preserving platform guarantees. They are upgrade-only (never downgrade), scoped to tenantId (and optionally stream/path), approved via a dual-control workflow, and stored in the Tenant Configuration Service with versioned history.

Model & precedence

  • Edition defaults: baseline per edition (e.g., Free = strict masking; Enterprise = configurable within guardrails).
  • Tenant overrides: may only upgrade sensitivity or strengthen redaction (e.g., Mask → Hash, keepLast=4 → keepLast=0).
  • Precedence (most restrictive wins):
    1. Tenant override (approved, active window)
    2. Global policy overridesByField
    3. Global policy defaultByField / detectors
  • Scope: tenantId + optional streamId or jsonPath (e.g., user.email, attributes.medical_notes).

Configuration schema (Tenant Configuration Service)

tenantId: "t-7c1a"
edition: "Enterprise"
overrides:
  - id: "ovr-001"
    path: "user.email"
    class: "Sensitive"              # cannot be lower than baseline
    redaction:
      kind: "Mask"
      params: { showFirst: 1, showLast: 1, preserveDomain: true }
    scope: { streams: ["audit.evidence.*"] }
    effectiveFromUtc: "2025-11-01T00:00:00Z"
    expiresUtc: "2026-11-01T00:00:00Z"
    status: "approved"
    approvals:
      - { actor: "dpo@connectsoft.ai", ts: "2025-10-29T08:10:00Z" }
      - { actor: "security@connectsoft.ai", ts: "2025-10-29T08:12:00Z" }
  - id: "ovr-002"
    path: "attributes.session_token"
    class: "Credential"
    redaction:
      kind: "Drop"                  # escalate to drop at read if missed at write
    status: "approved"
    approvals:
      - { actor: "dpo@connectsoft.ai", ts: "2025-10-29T08:15:00Z" }

Approval workflow

  1. Request (tenant admin) → submits override in portal/API with justification and impact.
  2. Validation (automated) → schema & monotonic checks; path exists; no downgrade.
  3. Dual approval → Compliance (DPO) and Security sign-off (dual-control).
  4. Activation → publish to config store; propagate to services via signed config bundle.
  5. Review/expiry → automatic reminders 30 days before expiresUtc; renewal requires re-approval.
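The dual-control rule in steps 3 and 4 is easy to state as a predicate evaluated at activation time. A minimal sketch, assuming each approval records the approver and a role label (`dpo` or `security`), that requesters cannot approve their own change, and that the current time must fall inside the override window; `CanActivate` and the requester address are hypothetical. Under these assumptions an override carrying a single approval, such as `ovr-002` in the configuration example, stays inactive.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical check: true only with independent DPO + Security approvals and inside the active window.
static bool CanActivate(string requester, IReadOnlyList<(string Actor, string Role)> approvals,
                        DateTimeOffset now, DateTimeOffset effectiveFrom, DateTimeOffset? expires)
{
    // Self-approvals never count toward dual control.
    var independent = approvals
        .Where(a => !string.Equals(a.Actor, requester, StringComparison.OrdinalIgnoreCase))
        .ToList();

    bool dualControl =
        independent.Any(a => a.Role == "dpo") &&
        independent.Any(a => a.Role == "security") &&
        independent.Select(a => a.Actor.ToLowerInvariant()).Distinct().Count() >= 2;

    bool inWindow = now >= effectiveFrom && (expires is null || now < expires);
    return dualControl && inWindow;
}

var windowStart = DateTimeOffset.Parse("2025-11-01T00:00:00Z");
var windowEnd = DateTimeOffset.Parse("2026-11-01T00:00:00Z");
var checkTime = DateTimeOffset.Parse("2025-12-01T00:00:00Z");

var ovr001 = new List<(string Actor, string Role)>
{
    ("dpo@connectsoft.ai", "dpo"), ("security@connectsoft.ai", "security"),
};
var ovr002 = new List<(string Actor, string Role)> { ("dpo@connectsoft.ai", "dpo") };

Console.WriteLine(CanActivate("admin@t-7c1a.example", ovr001, checkTime, windowStart, windowEnd)); // True
Console.WriteLine(CanActivate("admin@t-7c1a.example", ovr002, checkTime, windowStart, windowEnd)); // False
```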

Meta-audit event (PII-safe)

{
  "ts":"2025-10-29T08:12:00Z",
  "tenantId":"t-7c1a",
  "action":"override.approved",
  "overrideId":"ovr-001",
  "path":"user.email",
  "class":"Sensitive",
  "redaction":"Mask(preserveDomain=true)",
  "approvers":["dpo@connectsoft.ai","security@connectsoft.ai"],
  "effectiveFromUtc":"2025-11-01T00:00:00Z",
  "expiresUtc":"2026-11-01T00:00:00Z"
}

Enforcement (guards)

Never-downgrade + most-restrictive-wins (Rego sketch)

package atp.overrides

import rego.v1

# ranks for monotonic enforcement
rank := {"Public": 0, "Internal": 1, "Personal": 2, "Sensitive": 3, "PHI": 4, "Credential": 5}

default allow := false

# No tenant override supplied: fall back to the global policy class.
override_class := object.get(input, "tenantOverrideClass", input.policyClass)

# Merge global policy decision with tenant override; choose most restrictive
# (Rego has no ternary operator, so use an else chain).
effective_class := override_class if {
  rank[override_class] > rank[input.policyClass]
} else := input.policyClass

deny contains msg if {
  rank[override_class] < rank[input.policyClass]
  msg := sprintf("override would downgrade %v → %v", [input.policyClass, override_class])
}

allow if {
  count(deny) == 0
}

Service usage (C#)

var baseline = classifier.Decide(path, baselinePolicy, payload);
var ovr = tenantOverrides.GetFor(tenantId, path);
var effective = MostRestrictive(baseline, ovr); // monotonic check inside

var redacted = redactor.Apply(value, effective.RedactionPlan);

Edition-based defaults (examples)

Edition    | Personal | Sensitive        | Credential         | PHI
Free       | Mask     | Mask             | Drop               | Mask
Business   | Mask     | Hash (tenant)    | Drop + fingerprint | Mask/Tokenize
Enterprise | Mask     | Hash or Tokenize | Drop + fingerprint | Tokenize (vault)

Tenants can tighten (e.g., Free → Hash for Sensitive) but cannot loosen edition defaults.

Caching & rollout

  • Overrides cached per (tenantId, policyVersion) with short TTL (e.g., 60s) and event-driven invalidation on approval/revocation.
  • Services fetch signed override bundles; reject unsigned or stale bundles in enforce mode.
  • Canary flag override.canary=true supports staged activation for high-risk paths.

Testing

  • Unit: override cannot reduce class; redaction plan resolves to most-restrictive outcome.
  • Integration: queries/exports reflect override immediately after cache invalidation; manifests include override IDs.
  • Chaos: simulate missing override bundle; verify fail-closed to baseline (more restrictive wins).

Metrics & SLOs

  • override.active.count{tenant=*}; override.decision.hit.rate; override.downgrade.blocked.count
  • SLO: override propagation P95 < 2m from approval to enforcement.
  • Alerts on:
    • blocked downgrades,
    • unsigned bundle load attempts,
    • cache staleness beyond SLO.

Guardrails (quick checklist)

  • Overrides are upgrade-only, time-bound, and dual-approved.
  • Enforcement is most restrictive wins; baseline policy always applies if override missing.
  • Bundles are signed; services fail closed on verification errors.
  • Exports include override IDs in the manifest; evidence logs capture effective class/redaction source.
  • Expiring overrides trigger renewal or auto-revert to edition defaults.

Integration with ConnectSoft.Extensions.Compliance

ATP services embed ConnectSoft.Extensions.Compliance through the microservice template so classification and redaction are consistent across ingestion, query, export, and logging. The template wires dependency injection, attributes on DTOs, policy loading, and logging sinks that automatically apply redactors.

Template integration (DI & configuration)

Program.cs / Startup.cs

var builder = WebApplication.CreateBuilder(args);

// 1) Compliance services: taxonomy, classifiers, redactors, policy loader
builder.Services.AddConnectSoftCompliance(builder.Configuration, builder.Environment)
    .AddDefaultTaxonomy()                  // ConnectSoftTaxonomy + mappings
    .AddDefaultRedactors()                 // Email/Phone/PAN/JWT/IP/Guid/Secret
    .AddTenantSaltProvider()               // HSM/KMS-backed tenant salt
    .AddPolicyBundles(options =>
    {
        options.ClassificationBundle = builder.Configuration["Compliance:ClassificationBundle"]; // e.g., "registry://atp/classification:v1"
        options.VerifySignatures = true;
    });

// 2) Structured logging with auto-redaction
builder.Services.AddLogging(logging =>
{
    logging.ClearProviders();
    logging.AddJsonConsole();
}).AddRedactionForLogging();               // plugs into Microsoft.Extensions.Logging

// 3) MVC controllers (MapControllers requires AddControllers)
builder.Services.AddControllers();

var app = builder.Build();
app.MapControllers();
app.Run();

appsettings.json (excerpt)

{
  "Compliance": {
    "ClassificationBundle": "registry://atp/classification:v1",
    "StrictSecrets": true,
    "Logging": { "RedactBodies": true, "MaxValueLength": 256 }
  }
}

Attributes on contracts (DTOs)

Use attributes from ConnectSoft.Extensions.Compliance to declare classes for fields. These guide both write-time minimization and read-time redaction.

public sealed class EvidenceAppendDto
{
    public string Title { get; init; } = "";                 // Public

    [EmailData]            // -> Sensitive (policy controls mask/hash)
    public string? Email { get; init; }

    [PhoneData]            // -> Sensitive (hash or mask)
    public string? Phone { get; init; }

    [SecretData]           // -> Credential (never persisted/logged raw)
    public string? ApiKey { get; init; }

    [HealthData]           // -> PHI (mask/tokenize by role/purpose)
    public string? DiagnosisNote { get; init; }

    // Free-text with an explicit hint; service may upgrade, never downgrade
    [ClassHint(DataClass.Personal)]
    public string? Notes { get; init; }
}

Logging integration (source-generated + redaction)

All structured logs flow through the compliance redactor. Mark sensitive parameters or entire objects; the logger will apply the correct template per policyVersion.

public static partial class EvidenceLogs
{
    [LoggerMessage(EventId = 2101, Level = LogLevel.Information,
        Message = "append.accepted {TenantId} {RegionCode} {@Request}")]
    public static partial void AppendAccepted(
        ILogger logger,
        string tenantId,
        string regionCode,
        [LogRedacted] EvidenceAppendDto request);  // auto-redacts nested fields
}

  • [LogRedacted] leverages DTO attributes and schema tags to redact nested structures.
  • StrictSecrets=true guarantees Credential-class fields are erased in logs.

Taxonomy alignment (ATP ↔ ConnectSoftTaxonomy)

ATP DataClass | ConnectSoftTaxonomy                                     | Notes (default)
Public        | Public                                                  | No redaction
Internal      | Internal                                                | Optional mask in public exports
Personal      | Email, Phone, PersonName, IpAddress, DeviceId           | Mask / tenant-salted hash
Sensitive     | PostalAddress, PaymentCardPan, BankAccount, FinancialId | Hash/Mask per policy
Credential    | Secret, OAuthToken, Jwt, SessionId                      | Drop or fingerprint only
PHI           | HealthInfo                                              | Mask or tokenize

Mapper registration (usually implicit via AddDefaultTaxonomy):

services.AddComplianceTaxonomy(map =>
{
    map.Map(DataClass.Personal,  ConnectSoftTaxonomy.Email, ConnectSoftTaxonomy.Phone, ConnectSoftTaxonomy.PersonName, ConnectSoftTaxonomy.IpAddress, ConnectSoftTaxonomy.DeviceId);
    map.Map(DataClass.Sensitive, ConnectSoftTaxonomy.PostalAddress, ConnectSoftTaxonomy.PaymentCardPan, ConnectSoftTaxonomy.BankAccount, ConnectSoftTaxonomy.FinancialId);
    map.Map(DataClass.Credential,ConnectSoftTaxonomy.Secret, ConnectSoftTaxonomy.OAuthToken, ConnectSoftTaxonomy.Jwt, ConnectSoftTaxonomy.SessionId);
    map.Map(DataClass.PHI,       ConnectSoftTaxonomy.HealthInfo);
});

Redactor reuse (built-ins)

  • EmailRedactor → j**e@example.com
  • PhoneLast4Redactor → ****4567
  • PanLast4Redactor → **** **** **** 1234
  • JwtRedactor → <jwt>.<redacted>.<redacted> (strict mode: erased)
  • IpAddressRedactor → IPv4 /24, IPv6 /64
  • GuidRedactor → XXXXXXXX-****
  • SecretRedactor → erased ("") or ****
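To make these formats concrete, here are standalone approximations of three of them. They are illustrative re-implementations, not the ConnectSoft.Extensions.Compliance code, and assume the email mask keeps the first and last character of the local part, the phone mask keeps the last four digits, and IP truncation zeroes everything past the /24 (IPv4) or /64 (IPv6) prefix; `MaskEmail`, `MaskPhoneLast4`, and `TruncateIp` are hypothetical names, and services should still use the built-ins.

```csharp
using System;
using System.Linq;
using System.Net;
using System.Net.Sockets;

// Hypothetical: j**e@example.com style, keeps first/last char of the local part.
static string MaskEmail(string email)
{
    var at = email.IndexOf('@');
    if (at <= 0) return "****";                        // not an email: mask fully
    var local = email[..at];
    var masked = local.Length <= 2
        ? new string('*', local.Length)
        : local[0] + new string('*', local.Length - 2) + local[^1];
    return masked + email[at..];
}

// Hypothetical: ****4567 style, keeps only the last four digits.
static string MaskPhoneLast4(string phone)
{
    var digits = new string(phone.Where(char.IsDigit).ToArray());
    return digits.Length <= 4 ? "****" : "****" + digits[^4..];
}

// Hypothetical: truncate to the network prefix, IPv4 /24 or IPv6 /64.
static string TruncateIp(string ip)
{
    var address = IPAddress.Parse(ip);
    var bytes = address.GetAddressBytes();
    var keep = address.AddressFamily == AddressFamily.InterNetworkV6 ? 8 : 3;
    Array.Clear(bytes, keep, bytes.Length - keep);
    return $"{new IPAddress(bytes)}/{keep * 8}";
}

Console.WriteLine(MaskEmail("jane@example.com"));      // j**e@example.com
Console.WriteLine(MaskPhoneLast4("+1-555-123-4567"));  // ****4567
Console.WriteLine(TruncateIp("192.168.1.42"));         // 192.168.1.0/24
```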

Manual use from DI (when needed)

public sealed class ManualMasker(IRedactorFactory factory)
{
    private readonly IRedactor _email = factory.Get("EmailRedactor");
    public string MaskEmail(string email) => _email.Redact(email);
}

Middleware & serialization

  • ASP.NET Core: response serialization runs after redaction; never serialize raw Sensitive/PHI/Credential.
  • System.Text.Json converters in the template apply field-level redaction for [LogRedacted] payloads and for export manifests.
  • OpenTelemetry: compliance integration scrubs attributes/spans; deny adding raw Credential/PHI to spans.

Policy-aware planners

Redaction plans depend on role, purpose, edition, and policyVersion:

var planner = services.GetRequiredService<IRedactionPlanner>();
var plan = planner.For(token.Role, token.Purpose, policyVersion);       // e.g., user/default → mask; auditor → hash
var safe = redactor.Apply(result, plan);                                // pre-serialization application

Unit test example (logs are PII-safe)

[Fact]
public void Logs_erase_credentials()
{
    var dto = new EvidenceAppendDto { Email = "jane@ex.com", ApiKey = "sk_live_123" };
    var logger = TestLogger.CreateWithCompliance(); // template helper

    EvidenceLogs.AppendAccepted(logger, "t-1", "EU", dto);

    var entry = logger.LastJson();
    Assert.DoesNotContain("sk_live_123", entry);
    Assert.Contains("\"Email\":\"j**e@ex.com\"", entry);
}

Guardrails (quick checklist)

  • Register AddConnectSoftCompliance(...) in every service; enforce signed policy bundles.
  • Mark DTOs with compliance attributes; prefer [LogRedacted] for complex objects.
  • Use built-in redactors; do not hand-roll masking for standard types.
  • Ensure logging, tracing, and metrics pipelines pass through the compliance redactor.
  • Redaction planning is server-side; never rely on client masking.

Cross-references

Policy-as-Code & Versioning · Write-Time Classification & Redaction · Read-Time Redaction & Enforcement · Security & Compliance · Data Residency & Retention


Classification Enforcement Mechanisms

Enforcement happens at predictable policy enforcement points (PEPs) with deny-by-default semantics. Guards are fail-closed, evaluated before persistence and before serialization/export. All decisions are deterministic by policyVersion and produce PII-safe evidence.

Guards & Middleware

Ingestion guard (PEP-2 / write)

  • Validates presence of required fields and classification for known schema paths.
  • Applies write-time minimization (Credential drop/fingerprint; Sensitive pre-variants).
  • Blocks downgrades and unknown classifications; upgrades are logged with provenance.
  • Emits classification.decided + write.minimized counters.

Query guard (PEP-2 / read)

  • Enforces tenant/region scope and clearance by data class (role + purpose + edition).
  • Applies server-side redaction before serialization; logs contain only redacted objects.
  • Denies requests lacking purpose or requesting classes above caller clearance.

Export guard (PEP-2 / export)

  • Requires redaction template (except purpose=dsar_export where a DSAR plan applies).
  • Enforces in-region routes by default; cross-region requires explicit approval and watermarking.
  • Produces a signed manifest (policyVersion, class counts, redaction summary, watermark).

ASP.NET Core middleware (sketch)

app.Use(async (ctx, next) =>
{
    var token = await auth.ExtractAsync(ctx);
    if (!purposeValidator.IsValid(token.Purpose))
        throw new GuardViolation("missing_or_invalid_purpose");

    var plan = planner.For(token.Role, token.Purpose, policy.Version);
    ctx.Items["RedactionPlan"] = plan;
    await next();

    // Handlers stash their result in ctx.Items instead of writing the response,
    // so the only object ever serialized is the redacted one.
    if (!ctx.Response.HasStarted && ctx.Items.TryGetValue("ResultBody", out var body) && body is not null)
    {
        var safe = redactor.Apply(body, plan);
        await ctx.Response.WriteAsJsonAsync(safe);
    }
});

Policy Evaluation Engine

Policy decisions are expressed in OPA/Rego and executed within the service (embedded OPA or library evaluation). Inputs include token claims, tenant/region, purpose, policyVersion, edition, and requested classes.

Never-downgrade & most-restrictive-wins

package atp.enforce

import rego.v1

rank := {"Public": 0, "Internal": 1, "Personal": 2, "Sensitive": 3, "PHI": 4, "Credential": 5}

default allow := false

deny contains msg if {
  input.prevClass != ""
  rank[input.newClass] < rank[input.prevClass]
  msg := sprintf("downgrade blocked %v→%v", [input.prevClass, input.newClass])
}

# effectiveClass = max(prevClass, policyClass, tenantOverrideClass).
# Rego has no lambdas; collect known classes and pick the highest rank.
candidates := {c |
  some c in [input.policyClass, object.get(input, "prevClass", ""), object.get(input, "tenantOverrideClass", "")]
  rank[c] >= 0
}

effective_class := c if {
  some c in candidates
  rank[c] == max({rank[x] | some x in candidates})
}

allow if {
  count(deny) == 0
}

Clearance + redaction mapping

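package atp.query

import rego.v1

classes := {"Public", "Internal", "Personal", "Sensitive", "PHI", "Credential"}

can_view contains class if {
  input.role == "auditor"
  some class in classes
  class != "Credential"
}

can_view contains class if {
  input.role == "user"
  some class in {"Public", "Internal", "Personal", "Sensitive", "PHI"}
}

sensitive_action := "hash" if {
  input.role == "auditor"
} else := "mask"

actions := {
  "Public": "none",
  "Internal": "mask",
  "Personal": "mask",
  "Sensitive": sensitive_action,
  "Credential": "drop",
  "PHI": "mask",
}

redact[class] := actions[class] if {
  some class in classes
}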

Export guard

package atp.export

import rego.v1

default allow := false

deny contains msg if {
  input.route.crossRegion
  not input.approval
  msg := "cross-region export without approval"
}

# Missing template or purpose is treated as empty, so the guard fails closed.
deny contains msg if {
  some c in input.classes
  c in {"Sensitive", "PHI"}
  object.get(input, "redactTemplate", "") == ""
  object.get(input, "purpose", "") != "dsar_export"
  msg := "missing redact template for sensitive export"
}

allow if {
  count(deny) == 0
}

Deterministic evaluation & evidence

  • Determinism: f(input, policyVersion) → (effectiveClass, redactionAction) is pure.
  • Evidence log (PII-safe) includes: policyVersion, source (schema/override/detector), effectiveClass, redactionAction, role, purpose, countsByClass, correlationId.

{
  "ts":"2025-10-29T11:05:42Z",
  "op":"query",
  "tenantId":"t-7c1a",
  "policyVersion":"1.0.0",
  "role":"user",
  "purpose":"default",
  "classes":{"Personal":54,"Sensitive":12,"PHI":1,"Credential":0},
  "redaction":{"mask":67,"hash":0,"drop":0},
  "decision":"allow",
  "correlationId":"c7e4-…"
}

Caching & invalidation

  • Redaction plans are cached per (caller, tenant, edition, policyVersion, purpose) with short TTL (e.g., 60s).
  • Event-driven invalidation on policy bundle rotation or tenant override approvals (signed event from control plane).
  • Fail-closed: if the cache is stale and the policy bundle cannot be verified, revert to more restrictive defaults.

var key = CacheKey.From(token, policy.Version);
var plan = cache.GetOrCreate(key, _ => planner.For(token.Role, token.Purpose, policy.Version));
policy.Events.OnPolicyRotated += (_, v) => cache.RemoveWhere(k => k.PolicyVersion != v);
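The cache behavior described above (short TTL, policyVersion in the key, eviction on rotation) can be sketched without any framework dependency. A minimal sketch, assuming a plain dictionary store, an explicit `now` argument in place of a clock, and a string stand-in for the real plan; `GetPlan` and `OnPolicyRotated` here are hypothetical local helpers, and a service would more likely build on `IMemoryCache`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var ttl = TimeSpan.FromSeconds(60);
var entries = new Dictionary<(string Tenant, string Role, string Purpose, string PolicyVersion),
                             (string Plan, DateTimeOffset Expires)>();
var computeCount = 0;

// Hypothetical: returns the cached plan while fresh; recomputes on a miss or after the TTL elapses.
string GetPlan((string Tenant, string Role, string Purpose, string PolicyVersion) key, DateTimeOffset now)
{
    if (entries.TryGetValue(key, out var cached) && now < cached.Expires) return cached.Plan;
    computeCount++;
    var plan = $"{key.Role}/{key.Purpose}@{key.PolicyVersion}";   // stand-in for planner.For(...)
    entries[key] = (plan, now + ttl);
    return plan;
}

// Hypothetical: on bundle rotation, evict every entry computed under a different policyVersion.
void OnPolicyRotated(string newVersion)
{
    foreach (var stale in entries.Keys.Where(k => k.PolicyVersion != newVersion).ToList())
        entries.Remove(stale);
}

var t0 = DateTimeOffset.Parse("2025-10-29T11:00:00Z");
var cacheKey = ("t-7c1a", "user", "default", "1.0.0");
GetPlan(cacheKey, t0);                   // miss: computed
GetPlan(cacheKey, t0.AddSeconds(30));    // hit
GetPlan(cacheKey, t0.AddSeconds(61));    // expired: recomputed
OnPolicyRotated("1.1.0");
Console.WriteLine($"computed={computeCount} cached={entries.Count}");   // computed=2 cached=0
```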

Failure modes & responses

  • Unknown/unsigned policy → 503 enforce_unavailable (fail-closed).
  • Missing purpose → 400 invalid_purpose.
  • Clearance failure → 403 class_not_permitted.
  • Export without template → 409 export_template_required.
  • Cross-region without approval → 403 route_not_permitted.

Metrics & SLOs

  • guard.blocked{reason=*}; plan.cache.hit_rate; policy.bundle.verify.latency
  • SLOs: P95 plan compute < 5 ms; policy rotation propagation < 2 min; guard decision latency < 10 ms.
  • Alerts on:
    • guard.blocked{reason=class_not_permitted} spike,
    • plan.cache.hit_rate < 0.9,
    • unsigned bundle load attempt.

Guardrails (quick checklist)

  • Guards are deny-by-default and fail-closed on policy/override fetch errors.
  • Server-side redaction only; never rely on client masking.
  • Every decision includes policyVersion, role, purpose, and class counts.
  • Cache keys include policyVersion; rotate and invalidate on bundle updates.
  • Export manifests are signed, watermarked, and reference the enforced policyVersion.

Cross-references

Read-Time Redaction & Enforcement · Write-Time Classification & Redaction · Policy-as-Code & Versioning · Security & Compliance


Anonymization vs Pseudonymization

Definitions

  • Anonymization — Irreversible transformation that removes/obliterates identifiers so re-identification is not reasonably possible. Under GDPR, anonymized data is not personal data.
  • Pseudonymization — Replacement of identifiers with stable surrogates (tokens/hashes) so records remain linkable under controlled keys/mappings. Under GDPR, pseudonymized data remains personal data (risk-reduced, stronger controls required).

ATP usage patterns

Anonymization (irreversible)

  • Use double-hash fingerprints: sha256(sha256(value)).
  • No keys retained; only a presence flag or fingerprint for dedup/audit correlation.
  • Typical for Credential class (secrets/tokens) and any field that must never be reconstructed.

Pseudonymization (deterministic & controlled)

  • Use HMAC-SHA256(tenantSalt, value) for joinability within the same tenant.
  • Not mathematically reversible; operational re-identification requires a lookup table, or the tenant key plus the original value (to re-derive and compare the HMAC).
  • Prevents cross-tenant joins by design (different salts/keys per tenant/region).
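Side by side, the two deterministic techniques differ in exactly the way the bullets describe: the double-hash fingerprint needs no key and is identical everywhere, while the tenant-keyed HMAC matches only within one tenant. A minimal sketch, assuming random 32-byte per-tenant keys (in ATP these would come from KMS/HSM) and lowercase hex encoding; `Fingerprint` and `Pseudonym` are hypothetical names.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical: anonymization fingerprint sha256(sha256(value)); keyless, identical in every tenant.
static string Fingerprint(string value) =>
    Convert.ToHexString(SHA256.HashData(SHA256.HashData(Encoding.UTF8.GetBytes(value)))).ToLowerInvariant();

// Hypothetical: pseudonym = HMAC-SHA256 keyed by the tenant salt; joinable only inside that tenant.
static string Pseudonym(byte[] tenantSalt, string value) =>
    Convert.ToHexString(HMACSHA256.HashData(tenantSalt, Encoding.UTF8.GetBytes(value))).ToLowerInvariant();

// Stand-ins for per-tenant keys that ATP would keep in KMS/HSM.
byte[] saltA = RandomNumberGenerator.GetBytes(32);
byte[] saltB = RandomNumberGenerator.GetBytes(32);

var email = "jane.doe@example.com";
Console.WriteLine(Pseudonym(saltA, email) == Pseudonym(saltA, email)); // True: same tenant, same pseudonym
Console.WriteLine(Pseudonym(saltA, email) == Pseudonym(saltB, email)); // False: no cross-tenant join
Console.WriteLine(Fingerprint(email).Length);                           // 64 (hex of 32 bytes)
```

One design note: an unkeyed fingerprint of a guessable value (a short PIN, a known email address) can be recovered by hashing candidates, so the double-hash pattern fits high-entropy secrets best.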

Tokenization (reversible under governance)

  • Use format-preserving encryption (FPE, e.g., FF3-1) with tenant+region-scoped KMS/HSM keys.
  • Enables controlled de-tokenization via dual-approval workflows for PHI/financial analytics or DSAR trace-backs.
  • Tokens carry no plaintext hints (e.g., tok:{ns}:{alg}:{ciphertext}).

Compliance alignment

  • GDPR Art. 4(5) — Pseudonymization reduces risk but does not exempt from data protection obligations. ATP treats pseudonymized outputs as still protected, subject to ABAC, purpose limits, and export guards.
  • HIPAA Safe Harbor — De-identification requires removal of 18 identifiers. ATP supports Safe Harbor exports with:
    • explicit identifier drop lists (names, full addresses, device IDs, biometric identifiers, etc.),
    • date generalization (year only), and
    • optional tokenization for operational fields where a reversible surrogate is permitted internally but not shared externally.
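The drop list and year-only dates can be sketched as a transformation over a flat record. A minimal sketch, assuming field names are known up front and dates are ISO-8601 strings; the drop list is a short illustrative subset of the 18 identifiers, `DeIdentify` is a hypothetical helper, and other Safe Harbor rules (for example, ZIP codes truncated to three digits and ages over 89) are not covered.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Illustrative subset of Safe Harbor identifiers; the real list covers all 18 categories.
var dropList = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
{
    "name", "streetAddress", "email", "phone", "deviceId", "biometricId", "mrn",
};

// Dates tied to the individual: only the year may leave ATP.
var dateFields = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "birthDate", "admissionDate" };

// Hypothetical helper producing the Safe Harbor export view of one record.
Dictionary<string, string> DeIdentify(IReadOnlyDictionary<string, string> record)
{
    var result = new Dictionary<string, string>();
    foreach (var (field, value) in record)
    {
        if (dropList.Contains(field)) continue;   // identifier: removed outright
        if (dateFields.Contains(field))
        {
            // Keep the year only; unparseable dates are dropped rather than passed through.
            if (DateTime.TryParse(value, CultureInfo.InvariantCulture, DateTimeStyles.None, out var date))
                result[field] = date.Year.ToString(CultureInfo.InvariantCulture);
            continue;
        }
        result[field] = value;
    }
    return result;
}

var exported = DeIdentify(new Dictionary<string, string>
{
    ["name"] = "Jane Doe",
    ["birthDate"] = "1984-03-17",
    ["diagnosisCode"] = "E11.9",
    ["deviceId"] = "dev-42",
});
Console.WriteLine(string.Join(", ", exported));   // [birthDate, 1984], [diagnosisCode, E11.9]
```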

Selection guide (when to use what)

Goal                               | Technique            | Joinability | Re-identification   | Typical classes
Never recover; prove non-retention | Double-hash          | No          | No                  | Credential
Tenant-local joins; low risk       | HMAC (tenant-salted) | Yes (local) | By lookup only      | Personal, Sensitive
Reversible under strict control    | Tokenization (FPE)   | Yes         | With KMS + approval | PHI, Financial identifiers
Public sharing / Safe Harbor       | De-ident w/ masking  | Limited     | No (if compliant)   | PHI (export), mixed datasets

Examples

# Anonymize credential
credential:
  store:
    fingerprint: sha256(sha256(value))
    present: true
  read: drop

# Pseudonymize email within tenant boundary
email:
  write:
    hmac: HMAC-SHA256(tenantSalt, value)   # joinable across this tenant only
    masked: "j**e@example.com"            # precomputed for UI
  read:
    standard: masked
    auditor: hmac

# Tokenize PAN for finance analytics
paymentCardPan:
  write:
    token: fpe(ff3-1, key=tenantRegionKey)
  read:
    standard: last4-mask
    analytics: token (no detokenization)
    detokenize:
      require: dual-approval, mTLS, JIT grant, audit log

Guardrails (quick checklist)

  • Use double-hash for anything that must be permanently irrecoverable (e.g., secrets).
  • Use HMAC with tenant-scoped salts for deterministic joins; never share HMACs across tenants/regions.
  • Reserve tokenization for governed workflows; dual-approval and watermarked detokenization only.
  • Safe Harbor exports enforce an identifier drop list and date generalization; tokens are not detokenized outside ATP.
  • Evidence logs capture method (anonymize/pseudonymize/tokenize), policyVersion, and approval IDs (for any detokenization).

Cross-references

Policy-as-Code & Versioning · Read-Time Redaction & Enforcement · Data Residency & Retention · Security & Compliance


Attribute-Based Classification (Dynamic)

Free-Form Attributes

Audit records may include an open-ended attributes bag (string keys → scalar/array/object values). ATP classifies these dynamically using key-based rules and detectors.

  • Key-to-class rules (policy-driven): map attribute names (or regexes) to DataClass. Examples:
    • attributes["user_email"] → Sensitive → apply EmailRedactor.
    • attributes["session_token"] → Credential → drop at write, keep present=true.
  • Tenant overrides: tenants can supply a classificationOverrides map (key → DataClass) that may only upgrade sensitivity (never downgrade).
  • Detectors: if no key rule exists, fall back to detectors (regex/statistical/NLP) to infer class; record detectorId and confidence.

Policy excerpt (key rules & detectors)

dynamic:
  keyRules:
    - match: "^user[_-]?email$"     # exact/regex
      class: Sensitive
      redaction: { kind: Mask, params: { showFirst: 1, showLast: 1, preserveDomain: true } }
    - match: "^session[_-]?token$"
      class: Credential
      redaction: { kind: Drop }
    - match: "^(ip|client[_-]?ip)$"
      class: Personal
      redaction: { kind: Mask, params: { ipv4Cidr: 24, ipv6Cidr: 64 } }
  detectors:
    email:   { regex: "(?i)[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,}", class: Sensitive }
    phone:   { regex: "(?i)\\+?[0-9][0-9\\-() ]{7,}",                 class: Sensitive }
    secret:  { regex: "(?i)(api[_-]?key|token|password|secret)",      class: Credential }

Example input → decisions

{
  "attributes": {
    "user_email": "jane.doe@example.com",
    "session_token": "sk_live_123",
    "client_ip": "192.168.1.42",
    "comment": "call me at +1-555-123-4567"
  }
}

Resulting decisions

[
  {"path":"attributes.user_email","class":"Sensitive","redaction":"Mask(email)","source":"keyRule"},
  {"path":"attributes.session_token","class":"Credential","redaction":"Drop","source":"keyRule"},
  {"path":"attributes.client_ip","class":"Personal","redaction":"Mask(ipv4/24)","source":"keyRule"},
  {"path":"attributes.comment","class":"Sensitive","redaction":"Mask(phone)","source":"detector","detectorId":"phone.regex.v1","confidence":0.97}
]
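These decisions can be reproduced with a small evaluator: key rules run first, and only when no key rule matches do the value detectors run, keeping the most restrictive hit. A minimal sketch reusing the regexes from the policy excerpt, assuming the class ranking from the Rego sketches and a fallback to `Internal` when nothing matches; `Classify` is a hypothetical helper, and confidence scoring is omitted.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

var rank = new Dictionary<string, int>
{
    ["Public"] = 0, ["Internal"] = 1, ["Personal"] = 2,
    ["Sensitive"] = 3, ["PHI"] = 4, ["Credential"] = 5,
};

// Key rules and detectors from the policy excerpt (YAML string escaping removed).
var keyRules = new (Regex Match, string Class)[]
{
    (new Regex("^user[_-]?email$"), "Sensitive"),
    (new Regex("^session[_-]?token$"), "Credential"),
    (new Regex("^(ip|client[_-]?ip)$"), "Personal"),
};
var detectors = new (string Id, Regex Pattern, string Class)[]
{
    ("email", new Regex(@"(?i)[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}"), "Sensitive"),
    ("phone", new Regex(@"(?i)\+?[0-9][0-9\-() ]{7,}"), "Sensitive"),
    ("secret", new Regex("(?i)(api[_-]?key|token|password|secret)"), "Credential"),
};

// Hypothetical evaluator returning (class, source) for one attribute.
(string Class, string Source) Classify(string key, string value)
{
    // 1) A key rule wins outright for semi-structured bags.
    var rule = keyRules.FirstOrDefault(r => r.Match.IsMatch(key));
    if (rule.Match is not null) return (rule.Class, "keyRule");

    // 2) Otherwise run every detector and keep the most restrictive hit (upgrade-only).
    var hit = detectors
        .Where(d => d.Pattern.IsMatch(value))
        .OrderByDescending(d => rank[d.Class])
        .Select(d => (d.Class, $"detector:{d.Id}"))
        .FirstOrDefault();
    return hit.Item1 is not null ? hit : ("Internal", "default");   // assumed fallback
}

var attributes = new Dictionary<string, string>
{
    ["user_email"] = "jane.doe@example.com",
    ["session_token"] = "sk_live_123",
    ["client_ip"] = "192.168.1.42",
    ["comment"] = "call me at +1-555-123-4567",
};
foreach (var (k, v) in attributes)
    Console.WriteLine($"attributes.{k}: {Classify(k, v)}");
```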

Service sketch (C#)

foreach (var (k, v) in record.Attributes.ToList())   // snapshot (System.Linq): minimization may drop entries
{
    var rule = policy.Dynamic.KeyRules.Match(k);
    var decision = rule is not null
        ? classifier.FromKeyRule(k, v, rule)
        : classifier.FromDetectors(k, v, policy.Dynamic.Detectors);

    applyWriteMinimization(decision);  // drop/hash/mask per decision
    metadata.Add(decision.ToMetadata(policy.Version));
}

Structural Hints

Classification can also target nested JSON paths using a JSONPath-like syntax with wildcards:

  • Exact path: user.profile.email → Sensitive (EmailRedactor)
  • Array elements: items[].buyer.email → apply to each element
  • Wildcard: attributes.*.token → any token field under attributes → Credential

Policy excerpt (path rules & precedence)

dynamic:
  pathRules:
    - path: "user.profile.email"
      class: Sensitive
      redaction: { kind: Mask, params: { showFirst: 1, showLast: 1, preserveDomain: true } }
    - path: "items[].buyer.email"
      class: Sensitive
      redaction: { kind: Hash, params: { algorithm: HMAC-SHA256, tenantSalted: true } }
    - path: "attributes.*.token"
      class: Credential
      redaction: { kind: Drop }

precedence:
  order: [tenantOverride, pathRule, keyRule, detector]
  monotonic: true   # never-downgrade

Array semantics

  • Rules with [] apply to every element at that position.
  • If element type mismatches (e.g., not an object or missing key), rule is skipped (no downgrade).
  • Per-element decisions are recorded separately in metadata (path includes index).
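Matching a concrete path such as `items[3].buyer.email` against a rule pattern only needs a segment-by-segment comparison. A minimal sketch, assuming dotted paths without escaping, `*` matching exactly one segment, and `name[]` matching `name[<index>]`; `Matches` is a hypothetical helper and ignores JSONPath features the policy does not use.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical matcher: "*" = any single segment, "name[]" = name[<index>], otherwise exact.
static bool Matches(string pattern, string path)
{
    var p = pattern.Split('.');
    var s = path.Split('.');
    if (p.Length != s.Length) return false;

    for (var i = 0; i < p.Length; i++)
    {
        if (p[i] == "*") continue;
        if (p[i].EndsWith("[]", StringComparison.Ordinal))
        {
            var name = Regex.Escape(p[i][..^2]);
            if (!Regex.IsMatch(s[i], $@"^{name}\[\d+\]$")) return false;
        }
        else if (!string.Equals(p[i], s[i], StringComparison.Ordinal)) return false;
    }
    return true;
}

foreach (var candidate in new[] { "items[0].buyer.email", "items[3].buyer.email", "items.buyer.email" })
    Console.WriteLine($"{candidate}: {Matches("items[].buyer.email", candidate)}");
```

A non-matching element (for example, a missing index) simply yields `false`, so the rule is skipped rather than downgrading anything, in line with the array semantics above.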

Metadata example for arrays

{
  "path":"items[3].buyer.email",
  "class":"Sensitive",
  "source":"pathRule",
  "policyVersion":"1.0.0",
  "hash":"hmac256(tenantSalt, value)"
}

Safety, Limits & Abuse Resistance

  • Key budget: cap processed dynamic keys per record (e.g., max 128) to prevent “attribute bombs”.
  • Name length: enforce sane bounds (e.g., key name ≤ 128 chars) and an ASCII allowlist to avoid parser tricks.
  • Reserved prefixes: block keys beginning with __, _sys, atp., or connectsoft. in user payloads.
  • Allowlist (optional): per-tenant allowlist for attribute keys; unknown keys default to Internal (masked in exports) unless detectors upgrade them.
  • Type guards: binary/blob fields in attributes are not scanned; require explicit classification hints or are rejected.
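A minimal sketch of the key-budget, name-length, and reserved-prefix guards follows. It assumes the example limits above (128 keys, 128-character names) and applies the prefixes literally. The function name, the `GuardResult` shape, the printable-ASCII check, and the `invalid_charset` reason are illustrative assumptions. The other reason strings mirror the `guard.blocked{reason=...}` metric.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

MAX_KEYS = 128        # example key budget from the text above
MAX_KEY_LENGTH = 128  # example key-name bound from the text above
RESERVED_PREFIXES = ("__", "_sys", "atp.", "connectsoft.")

@dataclass
class GuardResult:
    accepted: Dict[str, object] = field(default_factory=dict)
    blocked: List[Tuple[str, str]] = field(default_factory=list)  # (key or summary, reason)

def guard_attributes(attributes: Dict[str, object]) -> GuardResult:
    """Apply key-budget, name-length/charset, and reserved-prefix guards in order."""
    result = GuardResult()
    for index, (key, value) in enumerate(attributes.items()):
        if index >= MAX_KEYS:
            # Budget exhausted: stop processing, record a single evidence entry.
            result.blocked.append((f"<{len(attributes) - MAX_KEYS} keys>", "attribute_bomb"))
            break
        if len(key) > MAX_KEY_LENGTH:
            # Truncate in evidence so an oversized key cannot bloat the logs.
            result.blocked.append((key[:MAX_KEY_LENGTH], "oversize_key"))
            continue
        if not (key.isascii() and key.isprintable()):
            result.blocked.append((key, "invalid_charset"))  # hypothetical reason
            continue
        if key.startswith(RESERVED_PREFIXES):
            result.blocked.append((key, "reserved_prefix"))
            continue
        result.accepted[key] = value
    return result
```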

Precedence & Determinism

  • Most-restrictive wins with the following order: tenantOverride > pathRule > keyRule > detector.
  • Evaluations are pure per policyVersion: same input + same policy → same outputs.
  • All decisions record source, detectorId (if used), and confidence.
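Precedence, most-restrictive-wins, and monotonicity can be combined in a pure function. This is a minimal sketch of one reading of these rules. It assumes a hypothetical rank order Public < Internal < Personal < Sensitive < PHI < Credential; the section does not state where PHI and Credential sit relative to each other. Candidate classes are collected per source. The most restrictive class wins, and precedence only decides which source is credited on a tie. A transition below a previously recorded class is refused.

```python
from typing import Dict, Optional, Tuple

# Hypothetical restrictiveness ranking; higher means more restrictive.
RANK = {"Public": 0, "Internal": 1, "Personal": 2, "Sensitive": 3, "PHI": 4, "Credential": 5}
# Precedence order from the policy; earlier sources are credited on ties.
PRECEDENCE = ("tenantOverride", "pathRule", "keyRule", "detector")

def resolve(candidates: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Return (class, source) for one field from {source: class} candidates."""
    best: Optional[Tuple[str, str]] = None
    for source in PRECEDENCE:
        cls = candidates.get(source)
        if cls is None:
            continue
        # Strictly greater: on equal rank the higher-precedence source is kept.
        if best is None or RANK[cls] > RANK[best[0]]:
            best = (cls, source)
    return best

def try_transition(prev_class: str, new_class: str) -> Tuple[bool, str]:
    """Monotonic guard: upgrades and no-ops pass, downgrades are denied."""
    if RANK[new_class] < RANK[prev_class]:
        return False, f"downgrade {prev_class} -> {new_class} denied"
    return True, "ok"
```

Because `resolve` depends only on its input and the constant tables, the same candidates under the same policy always yield the same `(class, source)` pair, which is the determinism property stated above.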

Tests (must-have)

  • Key-rule hit beats detector; detector upgrades when no rules exist.
  • Path-rule applies to every array element; missing elements do not cause downgrades.
  • Oversized key names → rejected; too many keys → truncated processing with evidence.
  • Reserved prefix keys are blocked with guard.blocked{reason=reserved_prefix}.

Metrics & Evidence

  • dynamic.keyRule.hit.count, dynamic.pathRule.hit.count, dynamic.detector.hit.count
  • guard.blocked{reason=attribute_bomb|reserved_prefix|oversize_key}
  • Decision logs include per-field path, class, source, policyVersion, and for arrays, the index.

Guardrails (quick checklist)

  • Enforce key budget, name length, and reserved prefixes.
  • Prefer path rules for stable schemas; use key rules for semi-structured bags.
  • Detectors upgrade-only; never rely on them to relax protection.
  • Persist per-field metadata with full path to make audits and replays exact.
  • Keep policy precedence and monotonic enforcement consistent across services.

Cross-references

Policy-as-Code & Versioning · Tenant-Specific Overrides · Write-Time Classification & Redaction · Read-Time Redaction & Enforcement


Testing & Validation

Validation spans unit, integration, and chaos layers. All tests are deterministic by policyVersion, PII-safe, and produce evidence artifacts (snapshots, manifests, logs). CI gates on schema/signature validation, monotonicity, and redaction correctness.

Unit tests

Redactor tests

  • Idempotency: masking twice yields same output.
  • Length/format: PAN groups preserved; IPv4 /24, IPv6 /64.
  • Edge cases: empty/null, already-masked, invalid formats, Unicode.
  • Fuzzing: random strings for resilience; ensure no exceptions/leaks.
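A small reference masker can be used to check the idempotency and edge-case expectations. This Python sketch is an assumption modelled on the `j**e@example.com` examples in this document: it keeps the first and last characters of the local part, inserts a fixed `**` filler, and preserves the domain. It is not the ConnectSoft `EmailRedactor`. The fixed-width filler is what makes it idempotent, because re-masking `j**e` keeps `j`, `**`, and `e`.

```python
from typing import Optional

FILLER = "**"  # fixed width, so the output never reveals the local-part length

def mask_email(value: Optional[str]) -> Optional[str]:
    """Mask the local part of an email address, preserving the domain."""
    if not value:
        return value  # empty/None pass through unchanged
    local, sep, domain = value.rpartition("@")
    if not sep or not local or not domain:
        return FILLER  # invalid format: reveal nothing
    if len(local) <= 2:
        masked_local = FILLER  # too short to show first/last safely
    else:
        masked_local = local[0] + FILLER + local[-1]
    return f"{masked_local}@{domain}"
```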

Classification tests

  • Heuristic patterns (email/phone/secret/ip/PHI) map to expected DataClass.
  • Precedence: tenantOverride > pathRule > keyRule > detector.
  • Monotonic: downgrade attempts are denied and logged.

Policy tests

  • Schema-valid + signature verified before load.
  • Snapshot tests: apply policy-atp-v1 to fixtures; compare redacted JSON snapshots.

MSTest examples

using System.Text.RegularExpressions;
using Microsoft.VisualStudio.TestTools.UnitTesting;

[TestClass]
public sealed class RedactorTests
{
    [TestMethod]
    public void Email_Mask_IsIdempotent()
    {
        var r = Redactors.Email();
        var once = r.Redact("john.doe@example.com");
        var twice = r.Redact(once);
        Assert.AreEqual("j**e@example.com", once);
        Assert.AreEqual(once, twice);
    }

    [TestMethod]
    public void Credential_IsErased_FromLogs()
    {
        var dto = new EvidenceAppendDto { ApiKey = "sk_live_123" };
        var logger = TestLogger.WithCompliance(strictSecrets: true);

        EvidenceLogs.AppendAccepted(logger, "t-1", "EU", dto);
        var entry = logger.LastJson();

        StringAssert.DoesNotMatch(entry, new Regex("sk_live_123"));
        StringAssert.Contains(entry, "\"ApiKey\":\"\"");
    }
}

[TestClass]
public sealed class ClassificationPolicyTests
{
    [TestMethod]
    public void Monotonic_Downgrade_IsBlocked()
    {
        var engine = PolicyEngine.LoadSigned("policy-atp-v1");
        var result = engine.TryTransition(prevClass: "Sensitive", newClass: "Personal", out var reason);
        Assert.IsFalse(result);
        StringAssert.Contains(reason, "downgrade");
    }
}

Integration tests

Ingestion (write-path)

  • Submit a payload containing email, phone, session_token, diagnosis.
  • Expect:
    • session_token (Credential) dropped; present=true, fingerprint optional.
    • email (Sensitive) stored raw, plus masked/hash variants when the policy enables pre-compute.
    • Per-field metadata persisted with policyVersion, source, classifiedAt.

Query (read-path)

  • Same record, query with Standard vs Auditor role.
  • Expect:
    • Standard: email masked, PHI masked, Credential dropped.
    • Auditor: email hash, PHI masked, Credential dropped.
  • Logs contain only redacted objects; access decision includes class counts.

Export

  • Generate DSAR export; verify:
    • Signed manifest (policyVersion, class counts, redaction summary).
    • Route is in-region; watermark present.
    • No raw Credential/PHI in payload.

MSTest example (end-to-end)

[TestMethod]
public async Task ReadPath_StandardVsAuditor_ProducesDifferentRedaction()
{
    var id = await IngestionFixture.AppendAsync(new {
        user = new { email = "jane@ex.com" },
        attributes = new { session_token = "sk_live_123" , client_ip = "192.168.1.42" },
        diagnosis = "PHI: mild"
    });

    var std = await QueryFixture.GetAsync(id, role: "user");
    var aud = await QueryFixture.GetAsync(id, role: "auditor");

    Assert.AreEqual("j**e@ex.com", std.user.email);
    StringAssert.Matches(aud.user.email, new Regex("^[a-f0-9]{64}$")); // HMAC
    Assert.IsFalse(JsonContains(std, "sk_live_123"));
    Assert.IsFalse(JsonContains(aud, "sk_live_123"));
}

Chaos testing

Policy version skew

  • Old client sends X-Policy-Version: 0 while service enforces v1.
  • Expect graceful deny or auto-upgrade to v1 (configurable), never downgrade.

Missing classification

  • Omit hints and schema tags; rely on detectors.
  • Expect upgrade-only classification and safe defaults (e.g., Internal masked) if nothing matches.

Malformed redaction

  • Feed already-masked inputs and strange formats.
  • Expect idempotent outputs; no exceptions; no leakage to logs.

Tokenization / KMS unavailability

  • Simulate vault key unavailable.
  • Expect fail-closed: mask instead of tokenizing; emit alert.
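Under fail-closed behaviour, a vault or KMS outage must still produce a protected value. This is a minimal sketch that assumes a hypothetical `tokenize` callable, which raises `VaultUnavailable` when keys cannot be reached, and a hypothetical `alert` hook. On an outage the fallback is a fixed mask, never the raw value. Any other error propagates, so the write fails rather than passing data through unprotected.

```python
from typing import Callable

class VaultUnavailable(Exception):
    """Raised by the (hypothetical) tokenization client when keys are unreachable."""

def protect(value: str, tokenize: Callable[[str], str], alert: Callable[[str], None]) -> str:
    """Tokenize if possible; on vault outage fail closed to a mask and alert."""
    try:
        return tokenize(value)
    except VaultUnavailable:
        alert("tokenization.fallback_mask")  # hypothetical alert name
        return "****"  # fixed-width mask: leaks neither content nor length
```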

Residency guard interaction

  • Attempt cross-region export without approval.
  • Expect block with reason route_not_permitted.

SpecFlow (illustrative)

Scenario: Cross-region export without approval is blocked
  Given a tenant with region "EU"
  And an export request routed to "US"
  When I generate the export without approval
  Then the response status is 403
  And the error reason is "route_not_permitted"
  And no export manifest is produced

Fixtures & snapshots

Sample fixture

user:
  email: jane.doe@example.com
attributes:
  session_token: sk_live_123
  client_ip: 192.168.1.42
diagnosis: "PHI: mild"

Snapshot layout

/tests/__snapshots__/
  policy-v1.ingest.json
  policy-v1.query.standard.json
  policy-v1.query.auditor.json
  policy-v1.export.dsar.manifest.yaml

Metrics & CI gates

  • Metrics asserted in tests:
    • log.pii_blocked.count == 0
    • redaction.applied.count{kind=mask|hash|drop} > 0 for relevant paths
    • plan.cache.hit_rate >= 0.9 (in integration runs)
  • CI gates:
    • Policy schema + signature verify step passes.
    • Snapshot drift requires approval (PR label: policy-change).
    • Coverage thresholds (example): Statements ≥ 80%, Branches ≥ 70%.

Acceptance (done when)

  • Unit + integration + chaos suites green across services.
  • Snapshots recorded for policyVersion and stored as artifacts.
  • No raw Credential/PHI in test logs; all exports include signed manifests.
  • Downgrade attempts are blocked and evidenced; detectors behave upgrade-only.

Cross-references

Policy-as-Code & Versioning · Write-Time Classification & Redaction · Read-Time Redaction & Enforcement · Security & Compliance · Data Residency & Retention


Performance & Scalability

Redaction/classification must scale linearly with event volume while keeping latency predictable and cost bounded. ATP combines lazy techniques (compute on demand) with pre-computed hot-path variants, strict caching, and streaming application to avoid large intermediate allocations.

Write-path optimization

  • Lazy redaction (default)

    • Persist raw + classification tag; compute masks/hashes at read/export.
    • Best for cold/rarely-read fields and when policies evolve frequently.
    • Reduces write CPU; avoids backfills on policy tweaks.
  • Pre-compute for hot fields

    • For high-frequency lookups (e.g., email, phone), store masked and HMAC variants alongside raw.
    • Toggle via policy flag (rulesByClass.*.precompute: true) or per-field override.
    • Avoids repeated crypto on read, improving P95 read latency.
  • Batch classification for backfills

    • Run deterministic, idempotent workers (KEDA/Hangfire/Functions) that:
      • fetch records by (tenantId, regionCode, updatedAt) windows,
      • classify in stable order (e.g., recordId ASC) to keep snapshots reproducible,
      • checkpoint progress and use idempotency keys so retries are effectively exactly-once.
    • Parallelize by tenant shard; cap concurrency per tenant to prevent hot-spotting.
  • Policy caching

    • Cache policy bundle + parsed decision tables in-memory with digest keys.
    • Warm on startup; event-driven invalidation on PolicyRotated.
    • Keep a read-only previous bundle for graceful rollbacks.
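A minimal sketch of one backfill window follows. Records are filtered by `(tenantId, regionCode, updatedAt)`, processed in stable `recordId` order, and checkpointed with idempotency keys. The `Record` shape, the in-memory `done` set (standing in for a durable idempotency store), and the key format are hypothetical. Scheduling through KEDA, Hangfire, or Functions is out of scope.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Set

@dataclass(frozen=True)
class Record:
    tenant_id: str
    region_code: str
    record_id: str
    updated_at: int  # epoch seconds (illustrative)

def backfill_window(records: Iterable[Record], tenant_id: str, region_code: str,
                    start: int, end: int, policy_version: str,
                    done: Set[str], classify: Callable[[Record], object]) -> List[str]:
    """Classify one (tenant, region, [start, end)) window in stable recordId order.

    `done` stands in for a durable idempotency-key store: a retried window
    skips records already processed under the same policy version.
    """
    window = sorted(
        (r for r in records
         if r.tenant_id == tenant_id and r.region_code == region_code
         and start <= r.updated_at < end),
        key=lambda r: r.record_id,  # stable order keeps snapshots reproducible
    )
    processed = []
    for r in window:
        key = f"{policy_version}:{r.tenant_id}:{r.record_id}"
        if key in done:
            continue
        classify(r)
        done.add(key)  # checkpoint only after the record is classified
        processed.append(r.record_id)
    return processed
```

Including `policy_version` in the idempotency key means a new policy bundle can re-run the same window without colliding with checkpoints from the previous version.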

Strategy matrix

Field frequency | Policy churn | Strategy
Hot (p95 read) | Low | Pre-compute variants
Hot | High | Lazy + short TTL caches
Warm | Any | Lazy
Cold | Any | Lazy

Read-path optimization

  • Redaction-plan cache
    • Key: (tenantId, role, purpose, edition, policyVersion).
    • TTL: 60–120s with event invalidation on policy/override change.
    • Store the compiled plan (delegates) to avoid per-call rule resolution.
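// Key covers every dimension that changes the compiled plan.
var key = (token.TenantId, token.Role, token.Purpose, token.Edition, policy.Version);
var plan = cache.GetOrCreate(key, entry =>
{
    // Absolute TTL keeps even hot entries within the 60–120s window; policy/override events evict earlier.
    entry.AbsoluteExpirationRelativeToNow = TimeSpan.FromSeconds(120);
    // Compile from the same dimensions as the key so tenant overrides and editions apply.
    return planner.For(token.TenantId, token.Role, token.Purpose, token.Edition, policy.Version); // precompiled steps
});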
  • Streaming redaction
    • Apply masking/hashing during serialization to avoid materializing large objects.
await using var stream = ctx.Response.BodyWriter.AsStream();
await using var writer = new Utf8JsonWriter(stream, new JsonWriterOptions { Indented = false });
redactor.WriteRedacted(writer, result, plan); // writes field-by-field, no big buffers
await writer.FlushAsync();
  • Projection-first queries

    • Fetch only needed fields already redacted by the storage engine (e.g., computed masked_email column) to minimize transfer and CPU.
    • Use read replicas in-region to isolate heavy export scans from hot read APIs.
  • Crypto pooling

    • Pool HMACSHA256 instances (per key) and reuse buffers; avoid per-call allocations.
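The crypto-pooling idea can be shown with a Python analogue. Python's standard `hmac` module lets a keyed HMAC object be built once and then cloned with `copy()`, which reuses the computed key schedule instead of re-keying on every call. This is a sketch of the technique, not the .NET `HMACSHA256` pooling itself. The `TenantHasher` class is hypothetical, and the raw key bytes stand in for material that ATP would obtain through a KMS key handle.

```python
import hashlib
import hmac
from typing import Dict

class TenantHasher:
    """Per-tenant keyed HMAC-SHA256 templates, cloned per call."""

    def __init__(self) -> None:
        self._templates: Dict[str, hmac.HMAC] = {}

    def register(self, tenant_id: str, key: bytes) -> None:
        # Keyed once per tenant; placeholder for a KMS-resolved key.
        self._templates[tenant_id] = hmac.new(key, digestmod=hashlib.sha256)

    def fingerprint(self, tenant_id: str, value: str) -> str:
        mac = self._templates[tenant_id].copy()  # reuse keyed state, template untouched
        mac.update(value.encode("utf-8"))
        return mac.hexdigest()  # 64 hex chars, matching the ^[a-f0-9]{64}$ test pattern
```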

Monitoring

  • Core metrics
    • redaction.ops.count{kind=mask|hash|drop|tokenize}
    • plan.cache.hit_rate (target ≥ 0.90)
    • policy.eval.latency.p95 (target ≤ 5 ms)
    • serialization.stream.bytes and stream.flush.latency.p95
    • unclassified.field.rate (alarm if > 0.1% of fields)
    • policy.version.skew.count (client-declared vs. enforced policy version)
  • SLOs (defaults)
    • Read API: P95 end-to-end ≤ 120 ms, P99 ≤ 250 ms
    • Export job: sustained throughput ≥ 50k records/min per shard
    • Policy rotation propagation: ≤ 2 minutes to 95% of instances
  • Alerts
    • unclassified.field.rate > 0.1% → investigate producer/schema drift.
    • plan.cache.hit_rate < 0.9 → cache sizing/regression.
    • policy.version.skew.count > 0 (sustained) → client upgrade or compat shim.

Guardrails (quick checklist)

  • Prefer lazy redaction unless the field is a confirmed hot path.
  • Limit pre-computed variants to allowlisted fields (email/phone) to avoid storage bloat.
  • Cache redaction plans by (tenant, role, purpose, edition, policyVersion); invalidate on events.
  • Stream redaction before bytes hit the wire; never allocate giant intermediate models.
  • Keep crypto tenant-keyed and pooled; avoid per-request key fetches (use KMS key handles).
  • Partition batch jobs by tenant/region; cap per-tenant concurrency to prevent noisy neighbors.

Cross-references

Read-Time Redaction & Enforcement · Write-Time Classification & Redaction · Policy-as-Code & Versioning · Data Residency & Retention


Governance & Continuous Improvement

A lightweight but rigorous governance loop keeps classification/redaction accurate, auditable, and adaptable. Reviews are calendar-driven with evidence artifacts, changes flow via ADRs + signed bundles, and improvements are guided by incidents, tenant needs, and auditor input.

Policy review cadence

  • Quarterly (taxonomy & drift)
    • Validate current data classes against platform usage; add/rename only with ADR.
    • Review unclassified.field.rate, detector false positives/negatives, and override trends.
    • Deliverables: Review minutes, policy diff plan, updated risk register entries.
  • Annual (effectiveness & risk)
    • Re-assess re-identification risk (linkage attacks) and masking efficacy.
    • Run Safe Harbor export audits; verify tokenization governance and detokenization approvals.
    • Deliverables: effectiveness report, control tests, auditor-ready evidence pack.
  • Ad-hoc (regulatory/event-driven)
    • Triggered by law changes (e.g., new PII definitions) or high-severity incidents.
    • SLA: proposal ≤ 5 business days, canary ≤ 15 days, enforce ≤ 30 days (region-dependent).

Feedback loops

  • Incident-driven
    • Any classification/redaction miss → post-incident review with corrective actions:
      • policy tweak, new detector or path rule, or producer schema contract update.
    • Requires an ADR and a signed policy bundle release; link to meta-audit event.
  • Tenant requests
    • Tenants may request upgrades (never downgrade) via portal/API.
    • Dual approval (DPO + Security). SLA: P95 ≤ 2 business days to decision.
  • Auditor feedback
    • Recommendations become backlog items with explicit control mapping (GDPR/HIPAA/SOC2).
    • Track to completion with evidence (tests, manifests, dashboards).

Change management workflow

Propose (ADR + test plan) → Assess (risk/impact) → Approve (DPO + Security sign-off) → Rollout (canary, then enforce) → Verify (SLOs green) → Record (meta-audit + evidence)

  • Gate checks: schema-valid, signature-ready, monotonicity pass, replay determinism.
  • PR checklist: policy-change label, snapshots updated, canary tenants listed, rollback plan.
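The "monotonicity pass" gate can be sketched as a diff between two policy bundles in the Appendix C shape. The check flags any field whose default class would drop. This is a minimal sketch under stated assumptions. It uses the same hypothetical rank order as elsewhere in this document (Public < Internal < Personal < Sensitive < PHI < Credential) and compares only `defaultByField`. It treats a removed field as a violation, because the field would then fall back to detectors or defaults, possibly at a lower class.

```python
from typing import Dict, List

# Hypothetical restrictiveness ranking; higher means more restrictive.
RANK = {"Public": 0, "Internal": 1, "Personal": 2, "Sensitive": 3, "PHI": 4, "Credential": 5}

def monotonicity_violations(old: Dict, new: Dict) -> List[str]:
    """List fields whose default class would be lowered (or dropped) by `new`."""
    violations = []
    old_fields = old.get("defaultByField", {})
    new_fields = new.get("defaultByField", {})
    for field_name, old_class in old_fields.items():
        new_class = new_fields.get(field_name)
        if new_class is None:
            violations.append(f"{field_name}: removed (was {old_class})")
        elif RANK[new_class] < RANK[old_class]:
            violations.append(f"{field_name}: {old_class} -> {new_class}")
    return violations  # CI gate passes only when this list is empty
```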

Metrics & SLOs

  • Governance
    • policy.review.completed.count (quarterly/annual)
    • policy.change.lead_time.p95 (proposal → enforce) — target ≤ 30d
    • override.approval.time.p95 — target ≤ 2d
  • Quality
    • unclassified.field.rate — target < 0.1%
    • detector.fp_rate / detector.fn_rate — tracked per detector
    • export.manifest.missing.rate: target 0
  • Resilience
    • policy.rotation.propagation.p95: target ≤ 2m
    • downgrade.blocked.count: non-zero values show the guard is working; investigate spikes

Roadmap

  • AI-assisted classification
    • Content-aware models (PII/PHI detection beyond key names) with human-in-the-loop review and explainability; models run in shadow mode before enforce.
  • Quantum-safe hashing
    • Evaluate SHA-3/Keccak families and plan for post-quantum KDFs for long-lived evidence; dual-write fingerprints during migration.
  • Differential privacy
    • Add DP mechanisms for aggregations over Personal/Sensitive; privacy budgets per tenant/export purpose.
  • Policy simulation & what-if
    • Pre-enforce simulators to estimate mask/hash/drop deltas, export impact, and override conflicts.
  • Formal verification (select rules)
    • Prove monotonicity and most-restrictive-wins invariants on core Rego modules.

Artifacts (evidence)

  • Review minutes (Quarterly/Annual), ADR links, signed policy bundles, test reports, dashboard screenshots, and export manifests with policy versions.

Guardrails (quick checklist)

  • Reviews produce actionable diffs or explicit “no change” decisions with rationale.
  • All changes flow via ADR + signed bundle, never ad-hoc toggles.
  • Canary before enforce; auto-rollback on drift or guard spikes.
  • Incidents always create feedback tasks; track to closure with tests/evidence.
  • Metrics and SLOs visible on a Compliance Dashboard (tenant & region scoped).

Cross-references

Security & Compliance · Policy-as-Code & Versioning · Tenant-Specific Overrides · Data Residency & Retention · Operations / Observability


Appendix A — ConnectSoft Taxonomy Mapping

ATP DataClass | ConnectSoft Taxonomy | Default Redactor
Personal | Email | EmailRedactor
Personal | Phone | PhoneLast4Redactor
Personal | PersonName | Mask (first initial)
Personal | IpAddress | IpAddressRedactor
Personal | DeviceId (GUID) | GuidRedactor
Sensitive | PostalAddress | Mask (city only)
Sensitive | PaymentCardPan | PanLast4Redactor
Sensitive | BankAccount | Drop or Tokenize
Sensitive | FinancialId | Hash (tenant-salted)
Credential | Secret | SecretRedactor (erase)
Credential | OAuthToken | SecretRedactor (erase)
Credential | Jwt | JwtRedactor (header only)
Credential | SessionId | Drop
PHI | HealthInfo | Mask or Tokenize

Appendix B — Redaction Examples

Email

  • Input: john.doe@example.com
  • Masked: j**e@example.com (EmailRedactor)
  • Hashed: a3f8b...9c2e1 (SHA-256, tenant-salted)

Phone

  • Input: +1-555-123-4567
  • Masked: ****4567 (PhoneLast4Redactor)

Payment Card PAN

  • Input: 4532-1234-5678-9010
  • Masked: **** **** **** 9010 (PanLast4Redactor)
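The phone and PAN examples can be reproduced with the last-four maskers sketched below. These are minimal Python sketches, not the ConnectSoft `PhoneLast4Redactor` or `PanLast4Redactor`. They assume a 16-digit, four-group PAN layout, so 15-digit card formats are not modelled. Inputs already in masked form are returned unchanged, which keeps both functions idempotent.

```python
import re

_MASKED_PHONE = re.compile(r"\*{4}\d{4}")
_MASKED_PAN = re.compile(r"(\*{4} ){3}\d{4}")

def _digits(value: str) -> str:
    return re.sub(r"\D", "", value)  # strip separators such as '+', '-', ' '

def mask_phone_last4(value: str) -> str:
    """Keep only the last four digits of a phone number."""
    if _MASKED_PHONE.fullmatch(value):
        return value  # already masked: idempotent
    digits = _digits(value)
    return "****" + digits[-4:] if len(digits) > 4 else "****"

def mask_pan_last4(value: str) -> str:
    """Render a card number as three masked groups plus the last four digits."""
    if _MASKED_PAN.fullmatch(value):
        return value  # already masked: idempotent
    digits = _digits(value)
    if len(digits) < 12:
        return "**** **** **** ****"  # too short to be a PAN: reveal nothing
    return "**** **** **** " + digits[-4:]
```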

JWT

  • Input: eyJhbGci....<payload>.<signature>
  • Masked: <jwt>.<redacted>.<redacted> (JwtRedactor)
  • Strict: (erased in dev strict mode)

IP Address

  • Input: 192.168.1.42
  • Masked: 192.168.1.x (IpAddressRedactor, /24)
  • Input: 2001:0db8:85a3::8a2e:0370:7334
  • Masked: 2001:db8:85a3::/64
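Python's standard `ipaddress` module can reproduce the IP examples: IPv4 reduced to its /24 with the last octet shown as `x`, and IPv6 reduced to its /64 network. This is a sketch, not the ConnectSoft `IpAddressRedactor`. Already-masked inputs are re-parsed and masked again rather than trusted. As a result, a value such as `2001:db8::1/128` cannot slip through with a narrower prefix, and the function stays idempotent.

```python
import ipaddress

def mask_ip(value: str) -> str:
    """IPv4 -> a.b.c.x (/24); IPv6 -> its /64 network. Raises ValueError on junk."""
    if value.endswith(".x"):
        value = value[:-2] + ".0"  # re-parse an already-masked IPv4
    host = value.split("/", 1)[0]  # ignore any supplied prefix; always re-mask
    addr = ipaddress.ip_address(host)
    if addr.version == 4:
        a, b, c, _ = str(addr).split(".")
        return f"{a}.{b}.{c}.x"
    return str(ipaddress.ip_network(f"{addr}/64", strict=False))
```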

GUID

  • Input: 550e8400-e29b-41d4-a716-446655440000
  • Masked: 550e8400-**** (GuidRedactor, prefix only)

Secret

  • Input: sk_live_abc123...
  • Masked: (SecretRedactor, erased) or **** (FixedMaskRedactor)

Appendix C — Classification Policy Schema (JSON)

{
  "$schema": "https://connectsoft.ai/schemas/classification-policy.v1.json",
  "id": "policy-atp-v1",
  "version": 1,
  "effectiveFromUtc": "2025-01-01T00:00:00Z",
  "author": "dpo@connectsoft.ai",
  "defaultByField": {
    "email": "Sensitive",
    "phone": "Sensitive",
    "ip": "Personal",
    "userId": "Personal",
    "password": "Credential",
    "apiKey": "Credential",
    "jwt": "Credential",
    "healthNote": "PHI"
  },
  "rulesByClass": {
    "Public": { "kind": "None", "params": {} },
    "Internal": { "kind": "None", "params": {} },
    "Personal": { "kind": "Mask", "params": { "showFirst": 1, "showLast": 1 } },
    "Sensitive": { "kind": "Hash", "params": { "algorithm": "HMAC-SHA256", "tenantSalted": true } },
    "Credential": { "kind": "Drop", "params": {} },
    "PHI": { "kind": "Mask", "params": { "showFirst": 0, "showLast": 0 } }
  },
  "overridesByField": {
    "email": { "kind": "Mask", "params": { "showFirst": 1, "showLast": 1, "preserveDomain": true } }
  }
}

Appendix D — Heuristic Classification Patterns

Pattern (regex) → inferred DataClass: notes

  • (?i)\b(email|e-mail)\b → Sensitive: apply EmailRedactor
  • (?i)\b(phone|mobile|msisdn)\b → Sensitive: normalize to E.164; PhoneLast4Redactor
  • (?i)\b(ssn|nin|national[_-]?id)\b → Sensitive: consider upgrade to special handling
  • (?i)\b(password|secret|api[_-]?key|token|credential|bearer)\b → Credential: drop value; keep indicator present=true
  • (?i)\b(ip|client\.ip|remote[_-]?addr)\b → Personal: IpAddressRedactor
  • (?i)\b(gps|geo\.(lat|lon)|location)\b → Sensitive: quantize (2 decimals) + tokenize, or drop
  • (?i)\b(name|first[_-]?name|last[_-]?name|full[_-]?name)\b → Personal: mask on read; optional hash
  • (?i)\b(health|diagnosis|vitals|patient)\b → PHI: HIPAA-compliant masking/tokenization
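The heuristic patterns can be applied to field names as an upgrade-only detector. This is a minimal sketch that writes the patterns with plain `|` alternation and takes the most restrictive class when several match. The rank order is the same hypothetical one used elsewhere in this document. The sketch also adds an assumption that the patterns do not state: `_` is normalized to `-` before matching. Without that step, `\b` never fires inside snake_case keys such as `session_token` or `client_ip`, because `_` is a word character. camelCase keys such as `healthNote` still do not match.

```python
import re
from typing import Optional

RANK = {"Public": 0, "Internal": 1, "Personal": 2, "Sensitive": 3, "PHI": 4, "Credential": 5}
PATTERNS = [
    (re.compile(r"(?i)\b(email|e-mail)\b"), "Sensitive"),
    (re.compile(r"(?i)\b(phone|mobile|msisdn)\b"), "Sensitive"),
    (re.compile(r"(?i)\b(ssn|nin|national[_-]?id)\b"), "Sensitive"),
    (re.compile(r"(?i)\b(password|secret|api[_-]?key|token|credential|bearer)\b"), "Credential"),
    (re.compile(r"(?i)\b(ip|client\.ip|remote[_-]?addr)\b"), "Personal"),
    (re.compile(r"(?i)\b(gps|geo\.(lat|lon)|location)\b"), "Sensitive"),
    (re.compile(r"(?i)\b(name|first[_-]?name|last[_-]?name|full[_-]?name)\b"), "Personal"),
    (re.compile(r"(?i)\b(health|diagnosis|vitals|patient)\b"), "PHI"),
]

def classify_key(key: str) -> Optional[str]:
    """Return the most restrictive class whose pattern matches `key`, if any."""
    normalized = key.replace("_", "-")  # assumption: let \b see snake_case boundaries
    matches = [cls for pattern, cls in PATTERNS if pattern.search(normalized)]
    return max(matches, key=RANK.__getitem__) if matches else None
```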

Appendix E — Read-Time Redaction Matrix

DataClass | Privileged (Auditor) | Standard (Tenant User) | Public API
Public | None | None | None
Internal | None | Mask (optional) | Omit
Personal | Mask or Hash | Mask | Omit
Sensitive | Hash (tenant-salted) | Mask or Drop | Omit
Credential | Drop | Drop | Omit
PHI | Mask or Tokenize | Mask or Drop | Omit
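The read-time matrix can be encoded as data, with lookups that fail closed. This is a minimal sketch. Where the matrix offers a choice ("Mask or Hash", "Mask (optional)"), the sketch takes the first option, or the protective one for the optional case. That choice is an assumption, not a documented rule. Unknown classes or audiences resolve to `Omit`.

```python
from typing import Dict

# Appendix E as data; "A or B" cells resolved to the first option (assumption).
MATRIX: Dict[str, Dict[str, str]] = {
    "Public":     {"auditor": "None", "standard": "None", "public": "None"},
    "Internal":   {"auditor": "None", "standard": "Mask", "public": "Omit"},
    "Personal":   {"auditor": "Mask", "standard": "Mask", "public": "Omit"},
    "Sensitive":  {"auditor": "Hash", "standard": "Mask", "public": "Omit"},
    "Credential": {"auditor": "Drop", "standard": "Drop", "public": "Omit"},
    "PHI":        {"auditor": "Mask", "standard": "Mask", "public": "Omit"},
}

def read_action(data_class: str, audience: str) -> str:
    """Resolve the read-time action; unknown classes or audiences fail closed."""
    return MATRIX.get(data_class, {}).get(audience, "Omit")
```

With these defaults, a Sensitive email is hashed for auditors and masked for standard users, matching the read-path integration expectations.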

Appendix F — Cross-Reference Map

Topic | Primary Implementation Doc | Notes
Classification policy model | architecture/data-model.md §Classification | Schema definitions
Write-time redaction | architecture/hld.md §Data Classification | Ingestion pipeline
Tenant isolation & ABAC | platform/multitenancy-tenancy.md | Role-based redaction
Encryption & key management | data-residency-retention.md §5 | Tenant-scoped keys for HMAC
Privacy compliance | platform/privacy-gdpr-hipaa-soc2.md | GDPR/HIPAA alignment
Logging redaction | operations/observability.md | Structured logging with compliance
Template integration | implementation/template-integration.md | ConnectSoft microservice setup