PII Redaction & Classification - Audit Trail Platform (ATP)¶
Minimize exposure at write, enforce least-privilege at read — ATP protects sensitive data through classification and deterministic redaction.
Purpose & Scope¶
- Define ATP's data classification taxonomy aligned with ConnectSoft.Extensions.Compliance.
- Specify redaction policies per data class (write-time vs read-time).
- Describe enforcement mechanisms across ingestion, storage, query, and export.
- Integrate with ConnectSoft microservice template patterns and compliance extensions.
Classification Model & Taxonomy¶
Data Classes (canonical)¶
| Class | Examples | Default Handling |
|---|---|---|
| public | doc titles, generic counters | no redaction |
| internal | service diagnostics, non-PII metadata | minimize on logs |
| restricted | tenant ids, correlation ids, access routes | hash in logs, mask on export (non-DSAR) |
| secret.pi | email, phone, postal address, national IDs | hash or tokenize; drop where not needed |
| secret.phi | medical history, lab results | tokenize or redact; DSAR exception path |
| secret.keys | tokens, secrets, key material (never persisted in payloads) | never log; block storage by policy |
Field Catalog (schema-tag driven)¶
- Source of truth: Schema Registry (per bounded context), fields tagged with class, detector, redactionTemplate, purposeHints[], allowedExports[].
- Runtime augmentation: detectors (regex/statistical/ML) confirm & enrich tags for free-text fields (comments, notes).
Schema tag example
field: "client.email"
class: "secret.pi"
detector: ["email.regex.v2"]
redactionTemplate: "pii-default/email"
purposeHints: ["contact", "notification"]
allowedExports: ["dsar"]
Redaction Policy Model¶
Modes¶
- Write-time minimization: before commit to hot store (preferred).
- Read-time redaction: applied by PEP-2 at query/export boundaries.
- Log-time redaction: always on for structured logs/metrics/traces.
Templates (declarative)¶
policyVersion: 1.0.0
redactTemplates:
  pii-default:
    email: { mode: hash, salt: tenant }   # HMAC-SHA256(tenantSalt, value)
    phone: { mode: hash, salt: tenant }
    address: { mode: drop }
    ssn: { mode: mask, keepLast: 4 }
  phi-default:
    diagnosis: { mode: tokenize, vault: "tok/pii" }
    notes: { mode: drop-if-not-purpose, purpose: "clinical" }
logPolicy:
  drop: ["secret.keys", "secret.phi"]   # never log
  hash: ["secret.pi", "restricted"]
Determinism: hashing must be tenant-salted. Tokenization uses a jurisdiction-local vault and reversible flows require dual approval.
Enforcement Across the Core Path¶
flowchart LR
A[Client/API] -->|JWT+mTLS| G[Gateway • PEP-1]
G --> I["Ingestion • PEP-2 (write-time minimization)"]
I --> H[(Hot/WORM)]
H --> Q["Query • PEP-2 (read-time redaction)"]
Q --> E["Export • PEP-2 (template + route guard)"]
subgraph Control Plane
P[Policy/OPA]
R[Schema Registry]
V[Tokenization Vault]
end
G --- P
I --- P
Q --- P
E --- P
I --- R
Q --- R
E --- V
PEP-1 (Gateway)
- Rejects calls lacking purpose, tenant_id, or region_code.
- Annotates requests with signed X-Policy-* headers (no PII).
PEP-2 (Services)
- Write path: apply minimization by template & schema tags before append.
- Read path: apply redaction per class, purpose, and export route.
- Export path: require a redact template unless purpose=dsar_export.
Detectors & Connectors¶
- Regex/Rule detectors: email/phone/SSN/IBAN patterns (versioned).
- Statistical: entropy + dictionary checks for IDs; low false positive budget.
- NLP (optional): PHI entities (medication, condition) in free text.
- ConnectSoft.Extensions.Compliance providers:
  - IRedactor (hash/mask/drop/tokenize), ILogRedactor, IClassifier.
  - Built-ins: EmailRedactor, PhoneRedactor, SecretScrubber, TenantSaltProvider.
C# integration (logging-safe)
// Injected from ConnectSoft.Extensions.Compliance
public sealed class EvidenceLogger(ILogRedactor redactor, ILogger<EvidenceLogger> log)
{
public void LogAccepted(AppendRequest req)
{
var safe = redactor.RedactObject(req); // respects schema tags + templates
log.LogInformation("append.accepted {@request}", safe);
}
}
Tokenization & Vault¶
- Use when: business processes require reversible lookups (e.g., dedup, DSAR trace-backs).
- Requirements:
  - Vault keys are tenant-scoped, region-anchored; mTLS + workload identity.
  - No direct detokenization on export (token → plain); permitted only with dual-approval and purpose=dsar_export.
- Format: tok:{namespace}:{alg}:{ciphertext} (no raw hints).
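The token layout above can be checked before a value is handed to any vault call. The following is a minimal sketch, not the SDK's parser: `TryParseToken` is a hypothetical helper, and the namespace and algorithm values in the usage line (`pii`, `ff3-1`) are made up for illustration.

```csharp
using System;

// Hypothetical helper: splits a token of the form tok:{namespace}:{alg}:{ciphertext}.
// Returns false for anything that does not match the four-part layout.
bool TryParseToken(string token, out (string Namespace, string Alg, string Ciphertext) parts)
{
    parts = default;
    if (string.IsNullOrEmpty(token)) return false;

    // Limit to 4 segments so a ciphertext that itself contains ':' stays intact.
    var segments = token.Split(':', 4);
    if (segments.Length != 4 || segments[0] != "tok") return false;
    if (segments[1].Length == 0 || segments[2].Length == 0 || segments[3].Length == 0) return false;

    parts = (segments[1], segments[2], segments[3]);
    return true;
}

var ok = TryParseToken("tok:pii:ff3-1:Zk3x9Q", out var parsed);
Console.WriteLine(ok ? $"{parsed.Namespace} / {parsed.Alg}" : "not a token");
```

Rejecting anything that is not in token form at this boundary means a raw value passed by mistake fails fast instead of reaching the detokenization path.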
Policy-as-Code (OPA/Rego)¶
Export guard (PII/PHI)
package atp.export
import future.keywords.in
default allow = false
deny[msg] {
    input.resource.class in {"secret.pi", "secret.phi"}
    input.token.purpose != "dsar_export"
    msg := "export purpose not permitted for secret class"
}
deny[msg] {
    input.resource.class == "secret.pi"
    input.export.redactTemplate == ""
    msg := "missing redact template for PII export"
}
# count(deny) == 0 avoids an unsafe wildcard inside a negated expression
allow { count(deny) == 0 }
Write-minimize guard
package atp.write
import future.keywords.in
# Deny appends that carry a PII/PHI field which has not been minimized.
deny[msg] {
    input.op == "append"
    f := input.payload.fields[_]
    data.schema[f].class in {"secret.pi", "secret.phi"}
    not input.payload.minimized[f]
    msg := sprintf("field %s not minimized", [f])
}
Observability & Evidence¶
- Metrics: redaction.applied.count, redaction.dropped.count, classification.detected.count, tokenize.ops.count, log.pii_blocked.count.
- Decision log (PII-safe):
{
"ts":"2025-10-29T08:45:15Z",
"policyVersion":"1.0.0",
"decision":"allow",
"op":"export",
"class":"secret.pi",
"redactTemplate":"pii-default",
"tenantId":"7c1a-…",
"regionCode":"EU",
"purpose":"dsar_export",
"correlationId":"6b3f-…"
}
- Evidence packs: redact template snapshot, schema tag versions, vault key lineage (ids only), OPA bundle policyVersion.
Developer Experience & Contracts¶
- SDK helpers (ConnectSoft.Extensions.Compliance): RedactObject<T>(obj, purpose), RedactJson(json, template), HashDeterministic(value, tenantSalt).
- API contracts:
  - Request/response DTOs must carry classification hints for free-text fields (e.g., contentClassHint="secret.phi").
  - Services must not echo raw values for classes secret.* in errors or logs.
DTO hint example
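As a rough illustration of such a hint, the sketch below serializes a request whose free-text field is accompanied by a class hint. Only `contentClassHint` and its value come from the contract text above; the other property names and values are hypothetical placeholders.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical request shape: only contentClassHint is taken from the contract text.
var appendRequest = new Dictionary<string, object?>
{
    ["action"] = "note.added",
    ["notes"] = "free-text body supplied by the caller",
    ["contentClassHint"] = "secret.phi" // tells PEP-2 how to treat the free-text field
};

string json = JsonSerializer.Serialize(appendRequest);
Console.WriteLine(json);
```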
Test Matrix (samples)¶
| Scenario | Expect |
|---|---|
| Append with email and phone (no purpose) | Deny @ PEP-1 (missing purpose) |
| Append with email → write-minimize enabled | Stored as hash/token only |
| Query warm model → class secret.pi, purpose default | Fields masked/hashed |
| Export DSAR (tenant EU) | Allow + pii-default template + in-region |
| Export without template, class secret.pi, purpose not DSAR | Deny |
| Logs include PHI | Block + log.pii_blocked alert |
Risks & Mitigations¶
- Detector drift / false negatives → versioned detectors, canary audits, periodic sampling of free-text with manual review.
- Inconsistent templates → OPA lint in CI to validate schema↔template mappings.
- Token vault misuse → dual-approval for detokenization; rate limits; SIEM alerts on anomaly.
- Developer bypass → SDK shims at HTTP layer; automated PII linters in CI; contract tests on deny paths.
Artifacts Produced¶
- platform/pii-redaction-classification.md (this doc).
- Redaction templates: policies/redaction/*.yaml.
- Schema tag updates across services.
- OPA bundles: bundles/atp.export.rego, bundles/atp.write.rego.
- Compliance SDK samples: /samples/compliance/RedactionPlayground.
Acceptance (Done When)¶
- Every service exposes a classification map (fields → classes) and passes deny-path contract tests.
- Write-time minimization is enabled for secret.* in ingestion.
- Exports of secret.* require purpose and template; DSAR path verified end-to-end.
- Logs/traces are PII-safe with measurable log.pii_blocked.count == 0 (steady state).
- Evidence packs include policyVersion, schema tag checksums, and OPA signatures.
Cross-References¶
- Security & Compliance (control framework, threat model) → platform/security-compliance.md
- Data Residency & Retention (export routes, DSAR, holds) → platform/data-residency-retention.md
- Tenancy & ABAC Guards (purpose binding, region checks) → platform/multitenancy-tenancy.md
- Observability (SIEM signals, dashboards) → operations/observability.md
- Key Management (vault, hashing salts, jurisdiction) → hardening/key-rotation.md
Guardrails (quick checklist)¶
- Apply write-time minimization for secret.* before append.
- Require purpose and redact template on all sensitive reads/exports (except DSAR purpose).
- Logs/traces never contain raw secret.*; use tenant-salted hashes.
- Tokenization is jurisdiction-local, rate-limited, and dual-approved for detokenization.
- Policies and detectors are versioned, signed, and tested in CI with deny-path coverage.
Classification Taxonomy Overview¶
ATP uses a platform-wide classification standard built on the Microsoft.Extensions.Compliance foundation and surfaced via ConnectSoft.Extensions.Compliance. Classification is deterministic, versioned, and monotonic (risk can be elevated but never downgraded). Every classified value carries metadata used for guards, redaction, and evidence.
Canonical Classes¶
| Class | Meaning (short) | Typical Examples |
|---|---|---|
| Public | Non-sensitive | Doc titles, non-PII tags |
| Internal | Operational metadata; low risk | requestId, instanceId, pod names |
| Personal | Identifiable, lower risk | Display name, city, device ID, IP address |
| Sensitive | Higher-risk PII/financial | Email, phone, postal address, last4 PAN |
| Credential | Secrets/tokens/keys — never store raw | API keys, OAuth tokens, passwords, JWTs |
| PHI | Health information (regulated) | Patient IDs, diagnosis, vitals |
Services may upgrade (e.g., Personal → Sensitive) but must never downgrade a class.
Metadata Model (attached to fields)¶
Classification is persisted alongside values and included in decision/evidence logs.
{
"path": "user.email",
"dataClass": "Sensitive",
"source": "schema|detector|override",
"detectorId": "email.regex.v2",
"confidence": 0.99,
"policyVersion": "1.0.0",
"upgradedFrom": "Personal",
"upgradedBy": "ingestion-guard",
"classifiedAt": "2025-10-29T09:15:00Z"
}
- path: JSON path/property name.
- dataClass: one of the canonical classes above.
- source: how it was decided (schema tag, detector, tenant override).
- policyVersion: ties to policy bundle; enables replay.
- upgradedFrom / upgradedBy: present only on upgrades.
- confidence: detectors provide a score (schema/override = 1.0).
Monotonic Classification (never-downgrade guard)¶
- Allowed transitions: Public → Internal → Personal → Sensitive → PHI, plus * → Credential (terminal).
- Forbidden: any transition that reduces sensitivity.
Policy-as-code (Rego sketch)
package atp.classification
default allow = false
# allowed escalation chain (higher rank = more restrictive)
rank = {"Public": 0, "Internal": 1, "Personal": 2, "Sensitive": 3, "PHI": 4, "Credential": 5}
deny[msg] {
    input.prevClass != ""
    rank[input.newClass] < rank[input.prevClass]
    msg := sprintf("downgrade blocked: %v → %v", [input.prevClass, input.newClass])
}
# count(deny) == 0 avoids an unsafe wildcard inside a negated expression
allow {
    count(deny) == 0
}
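The same never-downgrade check can be mirrored in service code, for example in a unit test next to the policy bundle. This is a sketch under the assumption that the rank table from the Rego sketch is the single ordering in use; `IsAllowedTransition` is a hypothetical helper, not part of ConnectSoft.Extensions.Compliance.

```csharp
using System;
using System.Collections.Generic;

// Rank table copied from the Rego sketch; higher means more restrictive.
var rank = new Dictionary<string, int>
{
    ["Public"] = 0, ["Internal"] = 1, ["Personal"] = 2,
    ["Sensitive"] = 3, ["PHI"] = 4, ["Credential"] = 5
};

// Returns true when moving from prevClass to newClass does not reduce sensitivity.
// An empty prevClass means the field has not been classified yet.
bool IsAllowedTransition(string prevClass, string newClass)
{
    if (!rank.TryGetValue(newClass, out var next)) return false; // unknown label: reject
    if (string.IsNullOrEmpty(prevClass)) return true;
    return rank.TryGetValue(prevClass, out var prev) && next >= prev;
}

Console.WriteLine(IsAllowedTransition("Personal", "Sensitive")); // True
Console.WriteLine(IsAllowedTransition("Sensitive", "Personal")); // False
```

Unlike the Rego sketch, this version also rejects labels that are missing from the rank table, which is one way to enforce the "no ad-hoc labels" rule at the same point.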
Foundation & Alignment¶
- Microsoft.Extensions.Compliance provides base taxonomy, redactors, and logging hooks.
- ConnectSoft.Extensions.Compliance adds:
  - Attribute model for DTOs (e.g., [EmailData], [SecretData], [HealthData]).
  - Deterministic hashing utilities with tenant-scoped salts.
  - Policy loader & cache keyed by policyVersion.
DTO example (attributes)
public sealed class AppendUserEventDto
{
[EmailData] public string Email { get; init; } = "";
[PhoneData] public string? Phone { get; init; }
[SecretData] public string? OAuthToken { get; init; } // never logged
[HealthData] public string? DiagnosisNote { get; init; }
public string? City { get; init; } // Personal by schema tag
}
Classification Sources & Precedence¶
- Schema tags (authoritative, versioned with contracts)
- Tenant overrides (may only upgrade)
- Detectors (regex/statistical/NLP) for free text & unknown fields
Precedence: the most restrictive class wins. All decisions record source and policyVersion.
Storage & Evidence¶
- At rest: store raw (except Credential, which is dropped/hashed) + classification metadata.
- In logs: only metadata and redacted values; never raw for Credential/PHI.
- Evidence: classification decisions stream (PII-safe) includes path, dataClass, source, policyVersion, and upgrade info.
Quick Checklist¶
- Classes use canonical names above; no ad-hoc labels.
- Decisions carry policyVersion and source; detectors must include detectorId.
- No downgrades: enforced by guard; upgrades are logged with provenance.
- DTOs/Contracts marked with compliance attributes; logs use compliance redactors.
- Credential values are never persisted or logged in raw form.
Data Class Definitions¶
| Data Class | Meaning | Examples | Write-Time Action | Read-Time Action |
|---|---|---|---|---|
| Public | Non-sensitive; safe to expose | Action verbs, non-PII tags | None | None |
| Internal | Operational metadata; limited exposure | requestId, instanceId, pod names | None | Mask (optional) |
| Personal | PII light; identifiable but lower risk | Display name, city, device ID | None (store raw) | Hash or Mask |
| Sensitive | PII/financial; strict protection | Email, phone, address, last4 PAN | None (store raw + masked/hashed variant) | Hash or Mask |
| Credential | Secrets/tokens/keys; never store raw | API keys, OAuth tokens, passwords, JWTs | Hash (fingerprint) or Drop | Drop (never returned) |
| PHI | Health information; regulated (HIPAA) | Diagnosis notes, vitals, patient IDs | None (store raw, classified) | Mask or Tokenize |
Defaults & Hints¶
- Public/Internal: logged freely (PII-safe format); Internal may be masked in public exports.
- Personal: default Mask for user-facing reads; Hash for analytics joins.
- Sensitive: compute pre-redacted variant at write for hot paths (email/phone).
- Credential: persist only a double-hash fingerprint or a presence flag (present=true); never echo.
- PHI: Mask for most roles; Tokenize when reversible joins are required (vault-scoped to region/tenant).
Redactor presets (ConnectSoft.Extensions.Compliance)¶
redactors:
  Public: none
  Internal: mask(edge=0..2)   # optional
  Personal: mask(showFirst=1, showLast=1) | hash(tenantSalted)
  Sensitive: email|phone specialized redactors; else mask | hash(tenantSalted)
  Credential: secret.erase | jwt.header-only | fingerprint(sha256^2)
  PHI: mask(all) | tokenize(fpe, vault=region/tenant)
DTO attribute mapping (example)¶
public sealed class EvidenceDto
{
public string Title { get; init; } = ""; // Public
[PersonalData] public string? City { get; init; } // Personal
[EmailData] public string? Email { get; init; } // Sensitive
[SecretData] public string? ApiKey { get; init; } // Credential (dropped/hashed)
[HealthData] public string? Diagnosis { get; init; }// PHI
}
Enforcement notes¶
- Never downgrade class (Personal → Public is blocked); upgrades are logged with upgradedFrom.
- Errors & logs must respect class redaction (Credential/PHI never appear raw).
- Exports require an explicit redaction template for Personal/Sensitive/PHI; Credential is always dropped.
Redaction Strategies & Techniques¶
ATP applies deterministic redaction with idempotent results. Redaction is policy-driven and evaluated at PEP-2 (write/read/export) and at log sinks. Where hashing is used, it is tenant-salted to prevent cross-tenant joins.
Redaction kinds (canonical)¶
- None: pass-through. Used for Public and sometimes Internal in trusted contexts.
- Hash: HMAC-SHA256(tenantSalt, value); stable within a tenant, different across tenants/regions.
- Mask: preserve edges (showFirst/showLast) and structure; middle replaced with *.
- Drop: remove value entirely; may keep a present=true indicator or fingerprint.
- Tokenize: reversible token using format-preserving encryption (FPE) via a region/tenant-scoped vault.
Fingerprint: sha256(sha256(value)) (no salt) recorded only for audit correlation; never used for joins.
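The fingerprint described above could be computed roughly as follows. This sketch assumes .NET 5 or later (for `SHA256.HashData` and `Convert.ToHexString`) and assumes the second round hashes the raw digest bytes rather than their hex encoding; the text does not specify which, so treat that as an open choice. `Fingerprint` is a hypothetical name.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Unsalted double SHA-256, hex-encoded in lower case.
// Assumption: the second round runs over the 32 raw digest bytes of the first.
string Fingerprint(string value)
{
    byte[] first = SHA256.HashData(Encoding.UTF8.GetBytes(value));
    byte[] second = SHA256.HashData(first);
    return Convert.ToHexString(second).ToLowerInvariant();
}

Console.WriteLine(Fingerprint("sk_live_abc123").Length); // 64 hex characters
```

Because the fingerprint is unsalted, equal secrets produce equal fingerprints across tenants, which is why it is limited to audit correlation and kept out of joins.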
Deterministic hashing (tenant-salted)¶
using System;
using System.Security.Cryptography;
using System.Text;
public static string HashTenantScoped(string value, string tenantId, IKeyVault kv)
{
    // Salt is HSM/KMS-backed per-tenant, per-region (rotated via key lineage)
    byte[] salt = kv.GetTenantSalt(tenantId); // assumed to return the raw salt bytes
    using var hmac = new HMACSHA256(salt);
    return Convert.ToHexString(hmac.ComputeHash(Encoding.UTF8.GetBytes(value))).ToLowerInvariant();
}
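- Rotation: rotate tenant salts via key lineage. Where the raw value is retained, a background job can recompute hashes under the new salt. Where only the hash is stored it cannot be recomputed, so record the salt version with each hash and resolve lookups against the matching version.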
- Cross-tenant privacy: prevents joins between tenants on hashed PII.
Masking patterns (presets)¶
| Type | Example input | Mask rule (default) | Output example |
|---|---|---|---|
| Email | john.doe@example.com | showFirst=1, showLast=1 | j**e@example.com |
| Phone | +1-555-123-4567 | showLast=4 | ****4567 |
| PAN | 4532-1234-5678-9010 | keep groups, last4 | **** **** **** 9010 |
| Name | Rachel Green | first initial only | R***** G***** |
| IPv4 | 192.168.1.42 | /24 | 192.168.1.x |
| IPv6 | 2001:db8:85a3::7334 | /64 | 2001:db8:85a3::/64 |
| GUID | 550e8400-e29b-41d4-a716-... | prefix only | 550e8400-**** |
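Two of the presets in the table can be sketched as plain functions. These are illustrative stand-ins, not the built-in EmailRedactor or PhoneLast4Redactor; `MaskEmail` and `MaskPhone` are hypothetical names, and the fixed-width `**` and `****` fills are chosen to reproduce the table's outputs.

```csharp
using System;
using System.Linq;

// Email preset: keep the first and last character of the local part and the full domain.
string MaskEmail(string email)
{
    int at = email.LastIndexOf('@');
    if (at < 2) return "***"; // no '@' or a local part too short to keep both edges
    string local = email.Substring(0, at);
    return $"{local[0]}**{local[^1]}{email.Substring(at)}";
}

// Phone preset: keep only the last four digits.
string MaskPhone(string phone)
{
    string digits = new string(phone.Where(char.IsDigit).ToArray());
    if (digits.Length < 4) return "****";
    return "****" + digits.Substring(digits.Length - 4);
}

Console.WriteLine(MaskEmail("john.doe@example.com")); // j**e@example.com
Console.WriteLine(MaskPhone("+1-555-123-4567"));      // ****4567
```

Both functions return their own output unchanged when applied a second time, which is the idempotency property the redaction pipeline relies on.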
Tokenization (reversible, under controls)¶
- When: analytics joins, DSAR tracebacks where hashing is insufficient.
- How: FPE FF3-1 in a jurisdiction-local vault; keys are tenant-scoped.
- Format: tok:{ns}:{alg}:{ciphertext} (no raw hints or partial plain text).
- Controls: dual-approval for detokenization, rate limits, SIEM alerts on spikes.
ConnectSoft.Extensions.Compliance redactors (catalog)¶
- EmailRedactor → j**e@example.com
- PhoneLast4Redactor → ****4567
- PanLast4Redactor → **** **** **** 1234
- JwtRedactor → <jwt>.<redacted>.<redacted> (strict mode: erased)
- IpAddressRedactor → IPv4 /24, IPv6 /64
- GuidRedactor → prefix only XXXXXXXX-****
- SecretRedactor → erased or fixed mask ****
DI usage
builder.Services.AddConnectSoftCompliance(cfg =>
{
cfg.UseDefaultRedactors(); // email/phone/ip/jwt/secret/...
cfg.UseTenantSaltProvider(); // HSM/KMS-backed
cfg.StrictSecrets(); // drop secrets at logs by default
});
Policy mapping (template excerpt)¶
policyVersion: 1.0.0
rulesByClass:
  Public: { kind: None }
  Internal: { kind: Mask, params: { showFirst: 0, showLast: 2 }, logSafe: true }
  Personal: { kind: Mask, params: { showFirst: 1, showLast: 1 } }
  Sensitive: { kind: Hash, params: { algorithm: HMAC-SHA256, tenantSalted: true } }
  Credential: { kind: Drop }
  PHI: { kind: Mask, params: { showFirst: 0, showLast: 0 } }
overridesByField:
  email: { kind: Mask, params: { showFirst: 1, showLast: 1, preserveDomain: true } }
  phone: { kind: Hash, params: { algorithm: HMAC-SHA256, tenantSalted: true } }
Read vs. write semantics¶
- Write-time: minimize Credential (drop/hash); optionally precompute masked variants for hot fields (email/phone).
- Read-time: apply template by caller purpose (default, dsar_export, audit) and role (auditor/standard).
- Logs: always redacted; Credential is erased; Sensitive is hashed or masked.
Contract-safe logging¶
[LoggerMessage(EventId = 1201, Level = LogLevel.Information,
Message = "export.request {TenantId} {RegionCode} {@Query}")]
public static partial void ExportRequested(ILogger logger, string tenantId, string regionCode, [LogRedacted] ExportQuery query);
[LogRedacted] leverages schema tags + templates to scrub nested DTOs.
Edge cases & rules¶
- Already masked inputs must remain idempotent — masking twice yields same result.
- Truncation safety: ensure masked strings preserve length/format where validators expect it (e.g., PAN groups).
- Binary payloads: do not attempt content redaction; require metadata classification or deny.
- Free text: combine detectors (regex + allowlist) to reduce false positives; default to more restrictive.
Metrics & alerts¶
- redaction.applied.count, redaction.dropped.count, tokenize.ops.count, log.pii_blocked.count
- Alert when:
  - log.pii_blocked.count > 0 (raw PII in logs)
  - tokenization rate spikes > 3× baseline for a tenant
  - hash failures / missing tenant salt
Test vectors (quick)¶
tests:
  - input: "john.doe@example.com"
    class: Sensitive
    mask: "j**e@example.com"
    hash: "hmac256(tenantSalt, value)"   # value changes with salt
  - input: "+1-555-123-4567"
    class: Sensitive
    mask: "****4567"
  - input: "sk_live_abc123"
    class: Credential
    drop: true
Guardrails (checklist)¶
- Hashing for joins is tenant-salted; never reuse salts across tenants/regions.
- Credentials are never stored or echoed; keep only present=true or fingerprint.
- PHI defaults to mask; tokenization requires vault + dual approval.
- Logs/traces are PII-safe by construction (attributes + redactor pipeline).
- Redaction policies are versioned & signed; evaluations include policyVersion.
Classification Policy Model¶
A declarative, versioned policy defines how ATP classifies and redacts fields at write/read/export. Policies are signed, monotonic (no downgrades), and include an effectiveFromUtc for deterministic replay.
# ClassificationPolicy (v1)
id: policy-atp-v1
version: 1
effectiveFromUtc: 2025-01-01T00:00:00Z
defaultByField:
  email: Sensitive
  phone: Sensitive
  ip: Personal
  userId: Personal
  password: Credential
  apiKey: Credential
  jwt: Credential
  healthNote: PHI
rulesByClass:
  Sensitive:
    kind: Hash   # write: store raw + hash; read: return hash for auditors
    params:
      algorithm: HMAC-SHA256
      tenantSalted: true
  Credential:
    kind: Drop   # write: hash only (double SHA-256 fingerprint); read: drop
    params: {}
  PHI:
    kind: Mask   # read: mask for non-privileged; tokenize for analytics
    params:
      showFirst: 0
      showLast: 0
overridesByField:
  email:
    kind: Mask
    params:
      showFirst: 1
      showLast: 1
      preserveDomain: true
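Because each policy carries an effectiveFromUtc, a replay can pick the version that was in force when a record was written. The sketch below shows one way to do that selection; the catalog is hypothetical (only the 2025-01-01 date for version 1 comes from the policy above, the version 0 date is invented), and `PolicyVersionAt` is not an SDK function.

```csharp
using System;
using System.Linq;

// Hypothetical catalog of published policies: (version, effectiveFromUtc).
var policies = new[]
{
    (Version: 0, EffectiveFromUtc: new DateTimeOffset(2024, 1, 1, 0, 0, 0, TimeSpan.Zero)),
    (Version: 1, EffectiveFromUtc: new DateTimeOffset(2025, 1, 1, 0, 0, 0, TimeSpan.Zero)),
};

// Picks the newest policy already in force at the given instant, or null if none applies.
int? PolicyVersionAt(DateTimeOffset instant)
{
    var inForce = policies.Where(p => p.EffectiveFromUtc <= instant)
                          .OrderByDescending(p => p.EffectiveFromUtc)
                          .ToList();
    return inForce.Count == 0 ? (int?)null : inForce[0].Version;
}

Console.WriteLine(PolicyVersionAt(new DateTimeOffset(2025, 10, 29, 9, 45, 0, TimeSpan.Zero))); // 1
```

Returning null rather than falling back to the latest version keeps a replay from silently applying a policy that did not exist at the time of the original decision.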
Evaluation order (most restrictive wins)¶
- overridesByField (per-field explicit rule)
- defaultByField (schema-driven default class)
- Heuristics/Detectors (fallback for free-text; may only upgrade)
- rulesByClass (class-wide redaction behavior)
Monotonicity is enforced by guards: once a field is Sensitive, it cannot be downgraded to Personal or below.
Semantics & behaviors¶
- Sensitive
  - Write: persist raw value and deterministic tenant-salted HMAC for joins.
  - Read (auditor): return hash; standard roles get mask per export/template.
- Credential
  - Write: drop raw; persist only a double-SHA-256 fingerprint or present=true.
  - Read: always drop (never returned).
- PHI
  - Write: tagged as PHI; raw retained according to regulatory profile (see residency/retention).
  - Read: mask for non-privileged; tokenize for analytics when allowed by route/purpose.
Field override example
email uses Mask on read with domain preserved, even though the class rule for Sensitive prefers Hash for auditors. This enables recognizable outputs (e.g., j**e@example.com) in approved contexts while preserving auditor paths that still see hashes when required by purpose.
Validation & policy guardrails¶
- Required keys: id, version, effectiveFromUtc, rulesByClass.
- Enums: kind ∈ {None, Hash, Mask, Drop, Tokenize}; dataClass ∈ {Public, Internal, Personal, Sensitive, Credential, PHI}.
- Hash params must include algorithm and tenantSalted: true for Sensitive/Personal joins.
- Credential class can only be Drop (optional fingerprint); no Mask/Hash on read.
- Effective dating: requests carry X-Policy-Version; service refuses stale/bad versions on enforce stages.
Example decision (PII-safe log)¶
{
"ts":"2025-10-29T09:45:00Z",
"policyVersion":"1",
"path":"user.email",
"prevClass":"Personal",
"newClass":"Sensitive",
"ruleSource":"defaultByField",
"redaction":{"kind":"Mask","template":"email(preserveDomain=true)"},
"tenantId":"7c1a-…",
"regionCode":"EU",
"correlationId":"c9f4-…",
"decision":"allow"
}
Policy ops (lifecycle)¶
- Authoring: YAML in VCS; PRs require security/compliance review and CI lint (schema + monotonic checks).
- Signing: build creates a signed bundle; services verify signature at load.
- Rollout: draft → canary → enforce with deny/allow drift SLOs; auto-rollback on drift > 2 pp.
- Migration: class changes trigger background re-classification & optional backfill of pre-computed variants (email/phone masks).
Tests (must pass in CI)¶
- Snapshot tests: sample records produce stable redaction outputs by policyVersion.
- Deny-path: attempts to downgrade or read Credential values fail with reason code.
- Cross-tenant hashing: identical inputs yield different HMACs across tenants.
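The cross-tenant hashing test might look like the following sketch. It assumes .NET 6 or later (for the static `HMACSHA256.HashData` and `RandomNumberGenerator.GetBytes`); the random salts are stand-ins for values that would come from the HSM/KMS-backed salt provider, and `HashFor` is a hypothetical helper.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Stand-in salts; in ATP these would come from the tenant salt provider.
byte[] saltTenantA = RandomNumberGenerator.GetBytes(32);
byte[] saltTenantB = RandomNumberGenerator.GetBytes(32);

// HMAC-SHA256(tenantSalt, value), hex-encoded in lower case.
string HashFor(byte[] tenantSalt, string value) =>
    Convert.ToHexString(HMACSHA256.HashData(tenantSalt, Encoding.UTF8.GetBytes(value))).ToLowerInvariant();

string email = "john.doe@example.com";
bool stableWithinTenant = HashFor(saltTenantA, email) == HashFor(saltTenantA, email);
bool differsAcrossTenants = HashFor(saltTenantA, email) != HashFor(saltTenantB, email);
Console.WriteLine($"stable={stableWithinTenant}, differs={differsAcrossTenants}");
```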
Acceptance (done when)¶
- Policy bundle validates, signs, and loads; services stamp policyVersion on decisions.
- Sensitive/Credential/PHI behaviors match the table above in unit + integration tests.
- Evidence logs show no downgrades, and auditor reads for Sensitive fields return hash/mask per purpose/template.
Cross-references
Security & Compliance · Data Residency & Retention · Multitenancy & ABAC Guards · Observability
Write-Time Classification & Redaction¶
Write-path logic minimizes sensitive values before persistence. Decisions are policy-driven, idempotent, and stamped with policyVersion and provenance (schema/detector/override). Guards enforce monotonic classification and block storage of raw Credential data.
Ingestion pipeline¶
- Producer hints
  The producer may attach optional classification hints in AuditRecord.metadata (e.g., notesClassHint=PHI, emailClassHint=Sensitive). Hints can only upgrade risk.
- Policy evaluation (PEP-2)
  The ingestion service loads the active Classification Policy and resolves each field's dataClass from: schema tags → tenant overrides → producer hints → detectors. Most restrictive wins.
- Write-time redaction (selective)
  - Credential → Drop or Fingerprint: erase value (preferred) or persist double-SHA-256 fingerprint; keep present=true.
  - Sensitive (policy-enabled) → Precompute variants: persist raw plus masked/hashed variant for hot reads (e.g., email/phone).
  - Personal/Internal/Public → store raw with classification tag (no downgrade allowed).
- Persist with metadata
  Store the record and its classification metadata (per-field) for replay and evidence. Emit a PII-safe decision log.
Storage sketch (per field)
{
"path":"user.email",
"class":"Sensitive",
"raw":"john.doe@example.com",
"masked":"j**e@example.com",
"hash":"hmac256(tenantSalt, value)",
"policyVersion":"1.0.0",
"source":"schema",
"classifiedAt":"2025-10-29T10:05:00Z"
}
Guard (pseudo)
// dataClass is the resolved class for this field ("class" is a reserved word in C#)
if (dataClass == DataClass.Credential)
{
    entity.SetFlag("present", true);
    entity.SetFingerprint(DoubleSha256(value)); // optional
    value = null; // raw dropped
}
else if (dataClass == DataClass.Sensitive && policy.PrecomputeVariants)
{
    entity.SetMasked(EmailRedactor.Mask(value));
    entity.SetHash(HmacTenantSalted(value, tenantId));
}
Heuristic classification (fallback)¶
Used only when no explicit classification is present; outputs include detectorId and confidence. Heuristics cannot downgrade.
- (?i)\b(email|e-mail)\b → Sensitive
- (?i)\b(phone|mobile)\b → Sensitive
- (?i)\b(password|secret|api[_-]?key|token)\b → Credential
- (?i)\b(ssn|nin|national[_-]?id)\b → Sensitive (upgrade to special handling if configured)
- (?i)\b(ip|client\.ip)\b → Personal
- (?i)\b(health|diagnosis|vitals)\b → PHI
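The name-based rules above can be exercised with a small classifier. This is a sketch only: `ClassifyByName` is a hypothetical helper, the rank values reuse the never-downgrade ordering, and `RegexOptions.IgnoreCase` stands in for the inline `(?i)` flag.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Name-based rules in the order listed above.
var rules = new (string Pattern, string DataClass)[]
{
    (@"\b(email|e-mail)\b", "Sensitive"),
    (@"\b(phone|mobile)\b", "Sensitive"),
    (@"\b(password|secret|api[_-]?key|token)\b", "Credential"),
    (@"\b(ssn|nin|national[_-]?id)\b", "Sensitive"),
    (@"\b(ip|client\.ip)\b", "Personal"),
    (@"\b(health|diagnosis|vitals)\b", "PHI"),
};

// Higher rank means more restrictive, matching the never-downgrade ordering.
var rank = new Dictionary<string, int>
{
    ["Personal"] = 2, ["Sensitive"] = 3, ["PHI"] = 4, ["Credential"] = 5
};

// Returns the most restrictive class whose pattern matches, or null when nothing matches.
string? ClassifyByName(string fieldPath)
{
    string? best = null;
    foreach (var (pattern, dataClass) in rules)
    {
        if (!Regex.IsMatch(fieldPath, pattern, RegexOptions.IgnoreCase)) continue;
        if (best is null || rank[dataClass] > rank[best]) best = dataClass;
    }
    return best;
}

Console.WriteLine(ClassifyByName("client.email") ?? "(no match)"); // Sensitive
Console.WriteLine(ClassifyByName("user_email") ?? "(no match)");   // (no match)
```

The second call shows a limit of `\b`: an underscore counts as a word character, so snake_case names such as user_email are not caught by these name patterns and depend on schema tags or value-based detectors instead.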
Detector decision (PII-safe log)
{
"ts":"2025-10-29T10:06:00Z",
"path":"attributes.user_email",
"detectorId":"email.regex.v2",
"confidence":0.99,
"prevClass":"",
"newClass":"Sensitive",
"policyVersion":"1.0.0",
"decision":"upgrade"
}
Evidence & idempotency¶
- Idempotent writes: reprocessing the same payload yields the same masked/hash outputs by policyVersion.
- Evidence: every write emits classification.decided (counts by class) and write.minimized (counts by kind: drop/mask/hash).
- Replay: historical records can be re-evaluated safely when the policy version changes (background re-classification job).
Guardrails (write path)¶
- Never persist raw Credential values; keep only presence/fingerprint.
- Precompute variants only for approved fields (email/phone); avoid storage bloat.
- Heuristics upgrade only; schema/overrides take precedence.
- Stamp all decisions with policyVersion and source; store per-field metadata.
Read-Time Redaction & Enforcement¶
Read-time enforcement applies after auth/authz and before any serialization or export leaves the service boundary. Plans are policy-driven, role/purpose-aware, and stamped with policyVersion.
Query pipeline¶
- Authentication
  Verify caller identity (JWT + optional DPoP/mTLS). Extract tenant_id, region_code, roles, scopes, and purpose.
- Authorization (ABAC)
  Enforce tenant/region scope and clearance for requested data classes (Public/Internal/Personal/Sensitive/Credential/PHI) using PEP-2. Deny on cross-tenant/region or insufficient clearance.
- Redaction plan selection
  Select a plan from policy based on role + purpose + edition:
  - Auditor/Privileged → Sensitive returns hash, Credential dropped, PHI masked.
  - Standard user → Sensitive masked, Credential dropped, PHI masked/dropped by edition.
  - Public API → Only Public/Internal (Internal may be masked); all others omitted.
- Apply redaction
  Server-side masking/hashing/tokenization executed on the result model before serialization; logs use redacted objects.
- Audit meta-access
  Emit PII-safe decision logs: who accessed which classes, policyVersion, purpose, and counts by class.
Clearance matrix (default)
| Data Class | Public API | Standard User | Auditor/Privileged |
|---|---|---|---|
| Public | allow | allow | allow |
| Internal | mask/omit | mask/allow | allow |
| Personal | omit | mask | mask/hash |
| Sensitive | omit | mask | hash |
| Credential | omit | drop | drop |
| PHI | omit | mask/drop | mask |
Break-glass is not available here; it’s governed separately (time-bound, dual-control, fully audited).
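The default matrix can be held as a lookup table that fails closed. The sketch below transcribes the matrix under two assumptions: where a cell lists two options the first is used, and the tier keys (`public-api`, `standard`, `auditor`) are hypothetical names for the three columns. `ActionFor` is not an SDK function.

```csharp
using System;
using System.Collections.Generic;

// Default actions per (data class, caller tier), transcribed from the clearance matrix.
var matrix = new Dictionary<(string DataClass, string Tier), string>
{
    [("Public", "public-api")] = "allow",    [("Public", "standard")] = "allow",    [("Public", "auditor")] = "allow",
    [("Internal", "public-api")] = "mask",   [("Internal", "standard")] = "mask",   [("Internal", "auditor")] = "allow",
    [("Personal", "public-api")] = "omit",   [("Personal", "standard")] = "mask",   [("Personal", "auditor")] = "mask",
    [("Sensitive", "public-api")] = "omit",  [("Sensitive", "standard")] = "mask",  [("Sensitive", "auditor")] = "hash",
    [("Credential", "public-api")] = "omit", [("Credential", "standard")] = "drop", [("Credential", "auditor")] = "drop",
    [("PHI", "public-api")] = "omit",        [("PHI", "standard")] = "mask",        [("PHI", "auditor")] = "mask",
};

// Unknown classes or tiers fall back to "omit" so that gaps fail closed.
string ActionFor(string dataClass, string tier) =>
    matrix.TryGetValue((dataClass, tier), out var action) ? action : "omit";

Console.WriteLine(ActionFor("Sensitive", "auditor"));  // hash
Console.WriteLine(ActionFor("Sensitive", "standard")); // mask
```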
Policy sketch (Rego)
package atp.query
default allow = false
# Classes a role may request; Credential is never viewable.
viewable = {"Public", "Internal", "Personal", "Sensitive", "PHI"}
can_view[class] {
    input.token.role == "auditor"
    class := viewable[_]
}
can_view[class] {
    input.token.role == "user"
    class := viewable[_]
}
# Rego has no ternary operator: Sensitive is hashed for auditors, masked otherwise.
sensitive_action = "hash" { input.token.role == "auditor" }
sensitive_action = "mask" { input.token.role != "auditor" }
redact = {
    "Public": "none",
    "Internal": "mask",
    "Personal": "mask",
    "Sensitive": sensitive_action,
    "Credential": "drop",
    "PHI": "mask"
}
allow {
    input.resource.tenantId == input.token.tenant_id
    input.resource.regionCode == input.token.region_code
}
Server-side application (C#)
var plan = redactionPlanner.For(token.Role, token.Purpose, policyVersion);
var result = await repository.QueryAsync(q, token.TenantId, token.RegionCode, ct);
// Apply per-field plan before serialization/logging
var safe = redactor.Apply(result, plan);
return Results.Ok(safe);
Access decision (PII-safe log)
{
"ts":"2025-10-29T10:25:03Z",
"policyVersion":"1.0.0",
"op":"query",
"tenantId":"7c1a-…",
"classes": {"Personal":128,"Sensitive":42,"PHI":3,"Credential":0},
"redaction": {"mask":170,"hash":0,"drop":3},
"purpose":"default",
"role":"user",
"correlationId":"b7a9-…",
"decision":"allow"
}
Export pipeline¶
- Route & purpose checks
  Exports are in-region by default. Enforce purpose (e.g., dsar_export, audit_attestation) and export route policies; deny cross-region unless explicitly allowed.
- Redaction + overrides
  Apply the same read-time plan plus export-specific overrides (e.g., stricter masking for third-party recipients). Credential always dropped.
- Manifest & evidence
  Every export includes a signed manifest with policyVersion, counts by data class, redaction summary, and watermarking metadata.
Export manifest (example)
export:
  id: "exp-2025-10-29-001"
  tenantId: "7c1a-…"
  regionCode: "EU"
  purpose: "dsar_export"
  policyVersion: "1.0.0"
  route: "in_region"
  classes:
    Personal: 120
    Sensitive: 38
    PHI: 2
  redaction:
    mask: 140
    hash: 38
    drop: 2
  watermark:
    subject: "auditor@firm.example"
    requestId: "c0f1-…"
  signature: "cosign:…"
- Data sharing agreements (DSA)
  When exporting to third parties, enforce DSA-bound templates (e.g., hash-only for Sensitive, mask-only for PHI), apply watermarks, and restrict fields to the minimum required.
Metrics & alerts¶
- query.redaction.applied.count{kind=mask|hash|drop}
- export.manifest.generated.count
- guard.blocked{reason=class_clearance|route_cross_region|missing_template}
- Alerts on:
  - export attempted without manifest/template,
  - cross-region export without approval,
  - unexpected rise in drop for Credential (indicates producer leakage).
Guardrails (read/export)¶
- Redaction is server-side and pre-serialization; clients never receive raw Sensitive/PHI/Credential.
- Exports carry a signed manifest with policyVersion and watermark; Credential is always dropped.
- Purpose/route must be explicit; cross-region routes are deny-by-default.
- Decision logs are PII-safe and include class counts and redaction summary for evidence.
Policy-as-Code & Versioning¶
Policies are declarative artifacts (YAML/JSON) kept in version control, signed on release, and referenced at runtime by policyVersion. Old policies are immutable for replay. Any change can trigger background re-classification with idempotent jobs and PII-safe evidence logs.
Repository layout (source of truth)¶
/policies/classification/
  policy-atp-v1.yaml             # current
  policy-atp-v0.yaml             # immutable history
  schema/classification.v1.json
/signing/
  cosign.pub                     # verifier
  cosign.keyref                  # KMS/HSM key ref (no raw key in repo)
/bundles/
  atp-classification-v1.sig      # signed artifact (publish target)
Policy metadata (required fields)¶
id: policy-atp-v1
version: 1
effectiveFromUtc: 2025-01-01T00:00:00Z
author: dpo@connectsoft.ai
rulesByClass: { ... } # None|Mask|Hash|Drop|Tokenize
defaultByField: { ... } # email/phone/ip/...
overridesByField: { ... } # field-specific behaviors
monotonic: true # never-downgrade enforcement
CI/CD (lint → sign → publish)¶
# .azure-pipelines/policy-release.yml
stages:
  - stage: validate
    jobs:
      - job: lint_and_schema
        steps:
          - script: yq eval '.' policies/classification/policy-atp-v1.yaml
          # Folded scalars (>-) join lines with spaces; a trailing "\" in a plain YAML scalar would escape the space instead
          - script: >-
              ajv validate -s policies/classification/schema/classification.v1.json
              -d policies/classification/policy-atp-v1.yaml
          # --fail makes the step exit non-zero when data.tests.pass is undefined
          - script: opa eval --fail -i tests/fixtures/sample.json -d policies/tests/monotonic.rego "data.tests.pass"
  - stage: sign
    dependsOn: validate
    jobs:
      - job: cosign_sign
        steps:
          - script: >-
              cosign sign-blob --key $(KMS_KEYREF) --output-signature bundles/atp-classification-v1.sig
              policies/classification/policy-atp-v1.yaml
  - stage: publish
    dependsOn: sign
    jobs:
      - job: push_bundle
        steps:
          - script: >-
              oras push $(POLICY_REGISTRY)/atp/classification:v1
              --artifact-type application/vnd.atp.policy
              policies/classification/policy-atp-v1.yaml:policy.yaml
              bundles/atp-classification-v1.sig:policy.sig
Runtime verification
- Services fetch the bundle by immutable tag (v1) or digest.
- Verify cosign signature and schema before activating.
- Stamp all decisions with policyVersion.
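The activation checks listed above can be sketched as a small gate that runs before a service switches to a new bundle. This is an illustrative sketch under stated assumptions: the helper names and the pinned-digest parameter are hypothetical, the required fields are taken from the policy metadata section, and cosign signature verification is assumed to happen in a separate step that is not shown here.

```python
import hashlib
import hmac

# Required metadata fields, as listed under "Policy metadata (required fields)".
REQUIRED_FIELDS = (
    "id", "version", "effectiveFromUtc", "author",
    "rulesByClass", "defaultByField", "overridesByField", "monotonic",
)


def missing_fields(policy):
    """Return the required fields absent from a parsed policy document."""
    return [name for name in REQUIRED_FIELDS if name not in policy]


def digest_matches(raw_bytes, pinned_digest):
    """Compare the bundle's sha256 digest with the digest the service was told to load."""
    actual = "sha256:" + hashlib.sha256(raw_bytes).hexdigest()
    return hmac.compare_digest(actual, pinned_digest)


def can_activate(policy, raw_bytes, pinned_digest):
    """Fail closed: any missing field, non-monotonic policy, or digest mismatch blocks activation."""
    if missing_fields(policy):
        return False
    if policy.get("monotonic") is not True:
        return False
    return digest_matches(raw_bytes, pinned_digest)
```

Loading by digest rather than by tag means a re-pushed tag cannot silently change what a service enforces.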
Rollout strategy¶
- States: draft → canary → enforce.
- Canary scope: small tenant set per region; measure allow/deny drift and redaction deltas.
- Abort if: abac.deny.delta_pp > 2, log.pii_blocked.count > 0, or export guard violations.
- Promote on green metrics; record ADR with evidence links.
rollout:
  canaryTenants: ["t-eu-01","t-us-02"]
  observe:
    - abac.deny.delta_pp <= 2
    - redaction.applied.count.delta <= 10%
    - export.guard.blocked == 0
  abortIf:
    - log.pii_blocked.count > 0
    - residency.cross_region.allow > 0
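The rollout gates above lend themselves to a mechanical check over collected canary metrics. The sketch below is an assumption-laden illustration: the metric key redaction.applied.count.delta_pct, the function name, and the three-way outcome are hypothetical. In particular, treating a failed observe gate as "hold" rather than "abort" is a choice of this sketch; a stricter reading of the bullets would abort on any violation.

```python
def evaluate_canary(metrics):
    """Return (decision, violated_gates) for one canary observation window."""
    abort_gates = {
        "log.pii_blocked.count > 0": metrics["log.pii_blocked.count"] > 0,
        "residency.cross_region.allow > 0": metrics["residency.cross_region.allow"] > 0,
    }
    observe_gates = {
        "abac.deny.delta_pp <= 2": metrics["abac.deny.delta_pp"] <= 2,
        "redaction.applied.count.delta <= 10%": metrics["redaction.applied.count.delta_pct"] <= 10,
        "export.guard.blocked == 0": metrics["export.guard.blocked"] == 0,
    }
    # Abort conditions are checked first: they always win over observe gates.
    tripped = [gate for gate, hit in abort_gates.items() if hit]
    if tripped:
        return "abort", tripped
    failed = [gate for gate, ok in observe_gates.items() if not ok]
    if failed:
        return "hold", failed
    return "promote", []
```

Returning the violated gates alongside the decision gives the ADR something concrete to link to as evidence.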
Immutable history & replay¶
- Prior versions remain read-only; used for audit replay to explain past outputs.
- Replays load the historical bundle and produce deterministic redaction given the timestamp.
Migration (re-classification jobs)¶
- Trigger when rulesByClass or overridesByField change.
- Idempotent: re-running yields identical stored variants by policyVersion.
- Scoped: per-tenant, per-region; rate-limited to protect hot paths.
- Evidence: policy.migration.started/completed with counts and failure reasons.
migration:
  reason: "email override: Mask (preserveDomain=true)"
  scope: { regionCode: "EU", tenants: ["*"] }
  batchSize: 500
  throttleRps: 50
  retry: { maxAttempts: 3, backoff: jittered }
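The idempotency requirement is easiest to see in code: if stored variants are keyed by record and policyVersion, a re-run finds the key and skips the write. The sketch below assumes an in-memory dict as the store and a caller-supplied redact function, both hypothetical; throttling and retries from the configuration above are intentionally omitted.

```python
def migrate(records, store, policy_version, redact, batch_size=500):
    """Re-classify records in batches; safe to re-run for the same policy_version.

    `store` maps (record_id, policy_version) -> redacted variant (hypothetical layout).
    """
    written = skipped = 0
    for start in range(0, len(records), batch_size):
        for record in records[start:start + batch_size]:
            key = (record["id"], policy_version)
            if key in store:
                # Already migrated under this policy version: do not rewrite.
                skipped += 1
                continue
            store[key] = redact(record["value"])
            written += 1
    # PII-safe evidence: counts only, no field values.
    return {"policyVersion": policy_version, "written": written, "skipped": skipped}
```

Running the job twice with the same inputs leaves the store unchanged on the second pass, which is the property the evidence counts are meant to demonstrate.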
Meta-audit of policy changes¶
Every change is logged to the meta-audit stream:
{
"ts":"2025-10-29T10:40:00Z",
"actor":"compliance-approver@connectsoft.ai",
"action":"policy.update",
"policyId":"policy-atp-v1",
"version":1,
"digest":"sha256:…",
"sig":"cosign:…",
"effectiveFromUtc":"2025-11-01T00:00:00Z",
"justification":"Tighten email masking for UI outputs",
"adr":"ADR-SEC-0123"
}
Guardrails (quick checklist)¶
- Policies are signed, schema-valid, and monotonic.
- Services refuse unsigned/unknown policyVersion in enforce mode.
- All decisions include policyVersion; exports embed a manifest with the same.
- Migrations are throttled, idempotent, and evidenced.
- Old policy files are immutable; replay uses historical bundles.
Acceptance (done when)¶
- CI lints, validates, signs, and publishes the policy; services successfully load & verify.
- Canary completes with green SLOs; ADR recorded; policyVersion visible in decision logs.
- Migration (if required) finishes with evidence and zero PII-in-logs alerts.
Cross-references
Security & Compliance · Data Residency & Retention · Least Privilege & Policy Enforcement · Observability
Tenant-Specific Overrides¶
Overrides let tenants tighten classification/redaction for their data while preserving platform guarantees. They are upgrade-only (never downgrade), scoped to tenantId (and optionally stream/path), approved via a dual-control workflow, and stored in the Tenant Configuration Service with versioned history.
Model & precedence¶
- Edition defaults: baseline per edition (e.g., Free = strict masking; Enterprise = configurable within guardrails).
- Tenant overrides: may only upgrade sensitivity or strengthen redaction (e.g., Mask → Hash, keepLast=4 → keepLast=0).
- Precedence (most restrictive wins):
  - Tenant override (approved, active window)
  - Global policy overridesByField
  - Global policy defaultByField / detectors
- Scope: tenantId + optional streamId or jsonPath (e.g., user.email, attributes.medical_notes).
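The precedence model reduces to two operations: rejecting an override that ranks below the baseline, and taking the highest-ranked class among the sources that are present. The sketch below reuses the rank order from the Rego guard in the Enforcement section; the function names are hypothetical.

```python
# Rank order as used by the Rego guards in this document.
RANK = {"Public": 0, "Internal": 1, "Personal": 2, "Sensitive": 3, "PHI": 4, "Credential": 5}


def validate_override(baseline, override):
    """Reject an override that would lower the class (upgrade-only rule)."""
    if RANK[override] < RANK[baseline]:
        raise ValueError(f"override would downgrade {baseline} -> {override}")


def effective_class(*candidates):
    """Most restrictive wins; sources that supplied no class (None) are ignored."""
    present = [c for c in candidates if c is not None]
    if not present:
        raise ValueError("no classification available")
    return max(present, key=RANK.__getitem__)
```

For example, effective_class("Personal", "Sensitive", None) resolves to "Sensitive", and a missing tenant override simply leaves the global policy result in place.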
Configuration schema (Tenant Configuration Service)¶
tenantId: "t-7c1a"
edition: "Enterprise"
overrides:
  - id: "ovr-001"
    path: "user.email"
    class: "Sensitive"            # cannot be lower than baseline
    redaction:
      kind: "Mask"
      params: { showFirst: 1, showLast: 1, preserveDomain: true }
    scope: { streams: ["audit.evidence.*"] }
    effectiveFromUtc: "2025-11-01T00:00:00Z"
    expiresUtc: "2026-11-01T00:00:00Z"
    status: "approved"
    approvals:
      - { actor: "dpo@connectsoft.ai", ts: "2025-10-29T08:10:00Z" }
      - { actor: "security@connectsoft.ai", ts: "2025-10-29T08:12:00Z" }
  - id: "ovr-002"
    path: "attributes.session_token"
    class: "Credential"
    redaction:
      kind: "Drop"                # escalate to drop at read if missed at write
    status: "approved"
    approvals:
      - { actor: "dpo@connectsoft.ai", ts: "2025-10-29T08:15:00Z" }
Approval workflow¶
- Request (tenant admin) → submits override in portal/API with justification and impact.
- Validation (automated) → schema & monotonic checks; path exists; no downgrade.
- Dual approval → Compliance (DPO) and Security sign-off (dual-control).
- Activation → publish to config store; propagate to services via signed config bundle.
- Review/expiry → automatic reminders 30 days before expiresUtc; renewal requires re-approval.
Meta-audit event (PII-safe)
{
"ts":"2025-10-29T08:12:00Z",
"tenantId":"t-7c1a",
"action":"override.approved",
"overrideId":"ovr-001",
"path":"user.email",
"class":"Sensitive",
"redaction":"Mask(preserveDomain=true)",
"approvers":["dpo@connectsoft.ai","security@connectsoft.ai"],
"effectiveFromUtc":"2025-11-01T00:00:00Z",
"expiresUtc":"2026-11-01T00:00:00Z"
}
Enforcement (guards)¶
Never-downgrade + most-restrictive-wins (Rego sketch)
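package atp.overrides

# ranks for monotonic enforcement
rank := {"Public":0,"Internal":1,"Personal":2,"Sensitive":3,"PHI":4,"Credential":5}

default allow = false

# Merge global policy decision with tenant override; choose most restrictive.
# Rego has no ternary operator, so each branch is its own rule body.
effective_class := input.tenantOverrideClass {
    rank[input.tenantOverrideClass] > rank[input.policyClass]
}

effective_class := input.policyClass {
    rank[input.tenantOverrideClass] <= rank[input.policyClass]
}

# No override supplied: the baseline applies
effective_class := input.policyClass {
    not input.tenantOverrideClass
}

deny[msg] {
    ovr := input.tenantOverrideClass
    rank[ovr] < rank[input.policyClass]
    msg := sprintf("override would downgrade %v → %v", [input.policyClass, ovr])
}

# A wildcard inside a negation is unsafe in Rego; test the deny set directly
allow { count(deny) == 0 }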
Service usage (C#)
var baseline = classifier.Decide(path, baselinePolicy, payload);
var ovr = tenantOverrides.GetFor(tenantId, path);
var effective = MostRestrictive(baseline, ovr); // monotonic check inside
var redacted = redactor.Apply(value, effective.RedactionPlan);
Edition-based defaults (examples)¶
| Edition | Personal | Sensitive | Credential | PHI |
|---|---|---|---|---|
| Free | Mask | Mask | Drop | Mask |
| Business | Mask | Hash (tenant) | Drop + fingerprint | Mask/Tokenize |
| Enterprise | Mask | Hash or Tokenize | Drop + fingerprint | Tokenize (vault) |
Tenants can tighten (e.g., Free → Hash for Sensitive) but cannot loosen edition defaults.
Caching & rollout¶
- Overrides cached per (tenantId, policyVersion) with short TTL (e.g., 60s) and event-driven invalidation on approval/revocation.
- Services fetch signed override bundles; reject unsigned or stale bundles in enforce mode.
- Canary flag override.canary=true supports staged activation for high-risk paths.
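The caching behaviour described above combines a short TTL with explicit invalidation when an approval or revocation event arrives. The class below is a minimal sketch of that combination; the class name, the loader callback, and the injectable clock are assumptions for illustration, and bundle signature checks are out of scope here.

```python
import time


class OverrideCache:
    """TTL cache keyed by (tenantId, policyVersion) with event-driven invalidation."""

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so tests can control time
        self._entries = {}           # key -> (stored_at, overrides)

    def get(self, tenant_id, policy_version, loader):
        key = (tenant_id, policy_version)
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        # Miss or expired: reload from the configuration service.
        overrides = loader(tenant_id, policy_version)
        self._entries[key] = (now, overrides)
        return overrides

    def invalidate_tenant(self, tenant_id):
        """Call from the approval/revocation event handler."""
        for key in [k for k in self._entries if k[0] == tenant_id]:
            del self._entries[key]
```

The TTL bounds staleness when an event is lost, while invalidation keeps propagation well inside the SLO when events arrive normally.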
Testing¶
- Unit: override cannot reduce class; redaction plan resolves to most-restrictive outcome.
- Integration: queries/exports reflect override immediately after cache invalidation; manifests include override IDs.
- Chaos: simulate missing override bundle; verify fail-closed to baseline (more restrictive wins).
Metrics & SLOs¶
- override.active.count{tenant=*}; override.decision.hit.rate; override.downgrade.blocked.count
- SLO: override propagation P95 < 2m from approval to enforcement.
- Alerts on:
  - blocked downgrades,
  - unsigned bundle load attempts,
  - cache staleness beyond SLO.
Guardrails (quick checklist)¶
- Overrides are upgrade-only, time-bound, and dual-approved.
- Enforcement is most restrictive wins; baseline policy always applies if override missing.
- Bundles are signed; services fail closed on verification errors.
- Exports include override IDs in the manifest; evidence logs capture effective class/redaction source.
- Expiring overrides trigger renewal or auto-revert to edition defaults.
Integration with ConnectSoft.Extensions.Compliance¶
ATP services embed ConnectSoft.Extensions.Compliance through the microservice template so classification and redaction are consistent across ingestion, query, export, and logging. The template wires dependency injection, attributes on DTOs, policy loading, and logging sinks that automatically apply redactors.
Template integration (DI & configuration)¶
Program.cs / Startup.cs
var builder = WebApplication.CreateBuilder(args);
// 1) Compliance services: taxonomy, classifiers, redactors, policy loader
builder.Services.AddConnectSoftCompliance(builder.Configuration, builder.Environment)
.AddDefaultTaxonomy() // ConnectSoftTaxonomy + mappings
.AddDefaultRedactors() // Email/Phone/PAN/JWT/IP/Guid/Secret
.AddTenantSaltProvider() // HSM/KMS-backed tenant salt
.AddPolicyBundles(options =>
{
options.ClassificationBundle = builder.Configuration["Compliance:ClassificationBundle"]; // e.g., "registry://atp/classification:v1"
options.VerifySignatures = true;
});
// 2) Structured logging with auto-redaction
builder.Services.AddLogging(logging =>
{
logging.ClearProviders();
logging.AddJsonConsole();
}).AddRedactionForLogging(); // plugs into Microsoft.Extensions.Logging
var app = builder.Build();
app.MapControllers();
app.Run();
appsettings.json (excerpt)
{
"Compliance": {
"ClassificationBundle": "registry://atp/classification:v1",
"StrictSecrets": true,
"Logging": { "RedactBodies": true, "MaxValueLength": 256 }
}
}
Attributes on contracts (DTOs)¶
Use attributes from ConnectSoft.Extensions.Compliance to declare classes for fields. These guide both write-time minimization and read-time redaction.
public sealed class EvidenceAppendDto
{
public string Title { get; init; } = ""; // Public
[EmailData] // -> Sensitive (policy controls mask/hash)
public string? Email { get; init; }
[PhoneData] // -> Sensitive (hash or mask)
public string? Phone { get; init; }
[SecretData] // -> Credential (never persisted/logged raw)
public string? ApiKey { get; init; }
[HealthData] // -> PHI (mask/tokenize by role/purpose)
public string? DiagnosisNote { get; init; }
// Free-text with an explicit hint; service may upgrade, never downgrade
[ClassHint(DataClass.Personal)]
public string? Notes { get; init; }
}
Logging integration (source-generated + redaction)¶
All structured logs flow through the compliance redactor. Mark sensitive parameters or entire objects; the logger will apply the correct template per policyVersion.
public static partial class EvidenceLogs
{
[LoggerMessage(EventId = 2101, Level = LogLevel.Information,
Message = "append.accepted {TenantId} {RegionCode} {@Request}")]
public static partial void AppendAccepted(
ILogger logger,
string tenantId,
string regionCode,
[LogRedacted] EvidenceAppendDto request); // auto-redacts nested fields
}
- [LogRedacted] leverages DTO attributes and schema tags to redact nested structures.
- StrictSecrets=true guarantees Credential-class fields are erased in logs.
Taxonomy alignment (ATP ↔ ConnectSoftTaxonomy)¶
| ATP DataClass | ConnectSoftTaxonomy | Notes (default) |
|---|---|---|
| Public | Public | No redaction |
| Internal | Internal | Optional mask in public exports |
| Personal | Email, Phone, PersonName, IpAddress, DeviceId | Mask / tenant-salted hash |
| Sensitive | PostalAddress, PaymentCardPan, BankAccount, FinancialId | Hash/Mask per policy |
| Credential | Secret, OAuthToken, Jwt, SessionId | Drop or fingerprint only |
| PHI | HealthInfo | Mask or tokenize |
Mapper registration (usually implicit via AddDefaultTaxonomy):
services.AddComplianceTaxonomy(map =>
{
map.Map(DataClass.Personal, ConnectSoftTaxonomy.Email, ConnectSoftTaxonomy.Phone, ConnectSoftTaxonomy.PersonName, ConnectSoftTaxonomy.IpAddress, ConnectSoftTaxonomy.DeviceId);
map.Map(DataClass.Sensitive, ConnectSoftTaxonomy.PostalAddress, ConnectSoftTaxonomy.PaymentCardPan, ConnectSoftTaxonomy.BankAccount, ConnectSoftTaxonomy.FinancialId);
map.Map(DataClass.Credential,ConnectSoftTaxonomy.Secret, ConnectSoftTaxonomy.OAuthToken, ConnectSoftTaxonomy.Jwt, ConnectSoftTaxonomy.SessionId);
map.Map(DataClass.PHI, ConnectSoftTaxonomy.HealthInfo);
});
Redactor reuse (built-ins)¶
- EmailRedactor → j**e@example.com
- PhoneLast4Redactor → ****4567
- PanLast4Redactor → **** **** **** 1234
- JwtRedactor → <jwt>.<redacted>.<redacted> (strict mode: erased)
- IpAddressRedactor → IPv4 /24, IPv6 /64
- GuidRedactor → XXXXXXXX-****
- SecretRedactor → erased ("") or ****
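To make the output shapes above concrete, here is a language-neutral sketch of three of the built-ins. It does not reproduce the ConnectSoft.Extensions.Compliance implementation: the function names are hypothetical, the fixed two-star mask is inferred from the j**e@example.com example, and rendering a truncated IP address as a CIDR network string is an assumption.

```python
import ipaddress


def mask_email(email: str) -> str:
    """Keep the first and last character of the local part and the full domain."""
    local, sep, domain = email.partition("@")
    if not sep:
        return "**"                      # not an email: reveal nothing
    masked = "**" if len(local) < 3 else f"{local[0]}**{local[-1]}"
    return f"{masked}@{domain}"


def phone_last4(phone: str) -> str:
    """Keep only the last four digits."""
    digits = "".join(ch for ch in phone if ch.isdigit())
    return "****" + digits[-4:]


def truncate_ip(value: str) -> str:
    """Generalize an address to its /24 (IPv4) or /64 (IPv6) network."""
    ip = ipaddress.ip_address(value)
    prefix = 24 if ip.version == 4 else 64
    return str(ipaddress.ip_network(f"{ip}/{prefix}", strict=False))
```

Because the email mask has a fixed width, applying it to an already masked value returns the same string, which matches the idempotency expectation in the unit tests later in this document.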
Manual use from DI (when needed)
public sealed class ManualMasker(IRedactorFactory factory)
{
private readonly IRedactor _email = factory.Get("EmailRedactor");
public string MaskEmail(string email) => _email.Redact(email);
}
Middleware & serialization¶
- ASP.NET Core: response serialization runs after redaction; never serialize raw Sensitive/PHI/Credential.
- System.Text.Json converters in the template apply field-level redaction for [LogRedacted] payloads and for export manifests.
- OpenTelemetry: compliance integration scrubs attributes/spans; deny adding raw Credential/PHI to spans.
Policy-aware planners¶
Redaction plans depend on role, purpose, edition, and policyVersion:
var planner = services.GetRequiredService<IRedactionPlanner>();
var plan = planner.For(token.Role, token.Purpose, policyVersion); // e.g., user/default → mask; auditor → hash
var safe = redactor.Apply(result, plan); // pre-serialization application
Unit test example (logs are PII-safe)¶
[Fact]
public void Logs_erase_credentials()
{
var dto = new EvidenceAppendDto { Email = "jane@ex.com", ApiKey = "sk_live_123" };
var logger = TestLogger.CreateWithCompliance(); // template helper
EvidenceLogs.AppendAccepted(logger, "t-1", "EU", dto);
var entry = logger.LastJson();
Assert.DoesNotContain("sk_live_123", entry);
Assert.Contains("\"Email\":\"j**e@ex.com\"", entry);
}
Guardrails (quick checklist)¶
- Register AddConnectSoftCompliance(...) in every service; enforce signed policy bundles.
- Mark DTOs with compliance attributes; prefer [LogRedacted] for complex objects.
- Use built-in redactors; do not hand-roll masking for standard types.
- Ensure logging, tracing, and metrics pipelines pass through the compliance redactor.
- Redaction planning is server-side; never rely on client masking.
Cross-references
Policy-as-Code & Versioning · Write-Time Classification & Redaction · Read-Time Redaction & Enforcement · Security & Compliance · Data Residency & Retention
Classification Enforcement Mechanisms¶
Enforcement happens at predictable policy enforcement points (PEPs) with deny-by-default semantics. Guards are fail-closed, evaluated before persistence and before serialization/export. All decisions are deterministic by policyVersion and produce PII-safe evidence.
Guards & Middleware¶
Ingestion guard (PEP-1 / write)
- Validates presence of required fields and classification for known schema paths.
- Applies write-time minimization (Credential drop/fingerprint; Sensitive pre-variants).
- Blocks downgrades and unknown classifications; upgrades are logged with provenance.
- Emits classification.decided + write.minimized counters.
Query guard (PEP-2 / read)
- Enforces tenant/region scope and clearance by data class (role + purpose + edition).
- Applies server-side redaction before serialization; logs contain only redacted objects.
- Denies requests lacking purpose or requesting classes above caller clearance.
Export guard (PEP-2 / export)
- Requires redaction template (except purpose=dsar_export where a DSAR plan applies).
- Enforces in-region routes by default; cross-region requires explicit approval and watermarking.
- Produces a signed manifest (policyVersion, class counts, redaction summary, watermark).
ASP.NET Core middleware (sketch)
app.Use(async (ctx, next) =>
{
var token = await auth.ExtractAsync(ctx);
if (!purposeValidator.IsValid(token.Purpose))
throw new GuardViolation("missing_or_invalid_purpose");
ctx.Items["RedactionPlan"] = planner.For(token.Role, token.Purpose, policy.Version);
await next();
// Apply server-side redaction just before write-out
if (ctx.Items.TryGetValue("ResultBody", out var body))
ctx.Items["ResultBody"] = redactor.Apply(body, (RedactionPlan)ctx.Items["RedactionPlan"]);
});
Policy Evaluation Engine¶
Policy decisions are expressed in OPA/Rego and executed within the service (embedded OPA or library evaluation). Inputs include token claims, tenant/region, purpose, policyVersion, edition, and requested classes.
Never-downgrade & most-restrictive-wins
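package atp.enforce

rank := {"Public":0,"Internal":1,"Personal":2,"Sensitive":3,"PHI":4,"Credential":5}

default allow = false

deny[msg] {
    input.prevClass != ""
    rank[input.newClass] < rank[input.prevClass]
    msg := sprintf("downgrade blocked %v→%v", [input.prevClass, input.newClass])
}

# Candidate classes actually supplied (empty or unknown values are skipped)
candidates[c] { c := input.prevClass; rank[c] }
candidates[c] { c := input.policyClass; rank[c] }
candidates[c] { c := input.tenantOverrideClass; rank[c] }

# effectiveClass = max(prevClass, policyClass, tenantOverrideClass) by rank.
# Rego has no inline functions or ternaries, so pick the candidate whose rank is highest.
effective_class := cls {
    top := max({rank[c] | candidates[c]})
    candidates[cls]
    rank[cls] == top
}

allow { count(deny) == 0 }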
Clearance + redaction mapping
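package atp.query

classes := {"Public","Internal","Personal","Sensitive","PHI","Credential"}

# Auditors may view every class except Credential
can_view[class] {
    input.role == "auditor"
    classes[class]
    class != "Credential"
}

can_view[class] {
    input.role == "user"
    viewable := {"Public","Internal","Personal","Sensitive","PHI"}
    viewable[class]
}

default_action := {
    "Public": "none",
    "Internal": "mask",
    "Personal": "mask",
    "Sensitive": "mask",
    "Credential": "drop",
    "PHI": "mask"
}

auditor_sensitive(class) {
    class == "Sensitive"
    input.role == "auditor"
}

# Auditors see a hash for Sensitive; everything else follows the default table
redact[class] := "hash" {
    class := "Sensitive"
    auditor_sensitive(class)
}

redact[class] := action {
    action := default_action[class]
    not auditor_sensitive(class)
}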
Export guard
package atp.export

default allow = false

protected := {"Sensitive","PHI"}

deny[msg] {
    input.route.crossRegion
    not input.approval
    msg := "cross-region export without approval"
}

deny[msg] {
    c := input.classes[_]
    protected[c]
    input.redactTemplate == ""
    input.purpose != "dsar_export"
    msg := "missing redact template for sensitive export"
}

allow { count(deny) == 0 }
Deterministic evaluation & evidence¶
- Determinism: f(input, policyVersion) → (effectiveClass, redactionAction) is pure.
- Evidence log (PII-safe) includes: policyVersion, source (schema/override/detector), effectiveClass, redactionAction, role, purpose, countsByClass, correlationId.
{
"ts":"2025-10-29T11:05:42Z",
"op":"query",
"tenantId":"t-7c1a",
"policyVersion":"1.0.0",
"role":"user",
"purpose":"default",
"classes":{"Personal":54,"Sensitive":12,"PHI":1,"Credential":0},
"redaction":{"mask":55,"hash":12,"drop":1},
"decision":"allow",
"correlationId":"c7e4-…"
}
Caching & invalidation¶
- Redaction plans are cached per (caller, tenant, edition, policyVersion, purpose) with short TTL (e.g., 60s).
- Event-driven invalidation on policy bundle rotation or tenant override approvals (signed event from control plane).
- Fail-closed: if the cache is stale and the policy bundle cannot be verified, revert to more restrictive defaults.
var key = CacheKey.From(token, policy.Version);
var plan = cache.GetOrCreate(key, _ => planner.For(token.Role, token.Purpose, policy.Version));
policy.Events.OnPolicyRotated += (_, v) => cache.RemoveWhere(k => k.PolicyVersion != v);
Failure modes & responses¶
- Unknown/unsigned policy → 503 enforce_unavailable (fail-closed).
- Missing purpose → 400 invalid_purpose.
- Clearance failure → 403 class_not_permitted.
- Export without template → 409 export_template_required.
- Cross-region without approval → 403 route_not_permitted.
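The failure modes above form a small lookup table, which keeps guard code and error responses in sync. The sketch below uses hypothetical failure keys for the left-hand side; the status codes and error codes are the ones listed. Mapping any unrecognized failure to 503 is an assumption of this sketch, chosen to stay fail-closed.

```python
# Failure kind (hypothetical key) -> (HTTP status, error code from the list above)
FAILURE_RESPONSES = {
    "unknown_or_unsigned_policy": (503, "enforce_unavailable"),
    "missing_purpose": (400, "invalid_purpose"),
    "clearance_failure": (403, "class_not_permitted"),
    "export_without_template": (409, "export_template_required"),
    "cross_region_without_approval": (403, "route_not_permitted"),
}


def to_response(failure_kind):
    """Translate a guard failure into a response body; unknown kinds fail closed."""
    status, error = FAILURE_RESPONSES.get(failure_kind, (503, "enforce_unavailable"))
    return {"status": status, "error": error}
```

Keeping the table in one place also makes it straightforward to assert in tests that every guard.blocked reason has a defined response.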
Metrics & SLOs¶
- guard.blocked{reason=*}; plan.cache.hit_rate; policy.bundle.verify.latency
- SLOs: P95 plan compute < 5 ms; policy rotation propagation < 2 min; guard decision latency < 10 ms.
- Alerts on:
  - guard.blocked{reason=class_not_permitted} spike,
  - plan.cache.hit_rate < 0.9,
  - unsigned bundle load attempt.
Guardrails (quick checklist)¶
- Guards are deny-by-default and fail-closed on policy/override fetch errors.
- Server-side redaction only; never rely on client masking.
- Every decision includes policyVersion, role, purpose, and class counts.
- Cache keys include policyVersion; rotate and invalidate on bundle updates.
- Export manifests are signed, watermarked, and reference the enforced policyVersion.
Cross-references
Read-Time Redaction & Enforcement · Write-Time Classification & Redaction · Policy-as-Code & Versioning · Security & Compliance
Anonymization vs Pseudonymization¶
Definitions¶
- Anonymization — Irreversible transformation that removes/obliterates identifiers so re-identification is not reasonably possible. Under GDPR, anonymized data is not personal data.
- Pseudonymization — Replacement of identifiers with stable surrogates (tokens/hashes) so records remain linkable under controlled keys/mappings. Under GDPR, pseudonymized data remains personal data (risk-reduced, stronger controls required).
ATP usage patterns¶
Anonymization (irreversible)
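- Use double-hash fingerprints: sha256(sha256(value)). Because the hash is unkeyed, it is irreversible in practice only for high-entropy inputs such as secrets and tokens; low-entropy values (emails, phone numbers) can be recovered by guessing and belong under pseudonymization instead.
- No keys retained; only a presence flag or fingerprint for dedup/audit correlation.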
- Typical for Credential class (secrets/tokens) and any field that must never be reconstructed.
Pseudonymization (deterministic & controlled)
- Use HMAC-SHA256(tenantSalt, value) for joinability within the same tenant.
- Not mathematically reversible; operational re-identification requires a lookup table or the original value (to re-derive the HMAC).
- Prevents cross-tenant joins by design (different salts/keys per tenant/region).
Tokenization (reversible under governance)
- Use format-preserving encryption (FPE, e.g., FF3-1) with tenant+region-scoped KMS/HSM keys.
- Enables controlled de-tokenization via dual-approval workflows for PHI/financial analytics or DSAR trace-backs.
- Tokens carry no plaintext hints (e.g., tok:{ns}:{alg}:{ciphertext}).
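The first two patterns above need nothing beyond standard hashing primitives. The sketch below shows a double-hash fingerprint and a tenant-salted HMAC; the function names are hypothetical, and feeding the raw inner digest (rather than its hex form) into the outer hash is an assumption, since the document only specifies sha256(sha256(value)). Tokenization with FPE is not shown because it requires a vetted cryptographic library and KMS-managed keys.

```python
import hashlib
import hmac


def fingerprint(value: str) -> str:
    """Double-hash fingerprint for Credential-class values: sha256(sha256(value))."""
    inner = hashlib.sha256(value.encode("utf-8")).digest()
    return hashlib.sha256(inner).hexdigest()


def pseudonym(tenant_salt: bytes, value: str) -> str:
    """Tenant-scoped pseudonym: HMAC-SHA256(tenantSalt, value)."""
    return hmac.new(tenant_salt, value.encode("utf-8"), hashlib.sha256).hexdigest()
```

The same value under two tenant salts yields unrelated pseudonyms, which is what prevents cross-tenant joins, while repeated calls with one salt stay stable so records remain joinable inside the tenant.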
Compliance alignment¶
- GDPR Art. 4(5) — Pseudonymization reduces risk but does not exempt from data protection obligations. ATP treats pseudonymized outputs as still protected, subject to ABAC, purpose limits, and export guards.
- HIPAA Safe Harbor — De-identification requires removal of 18 identifiers. ATP supports Safe Harbor exports with:
- explicit identifier drop lists (names, full addresses, device IDs, biometric identifiers, etc.),
- date generalization (year only), and
- optional tokenization for operational fields where a reversible surrogate is permitted internally but not shared externally.
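The two mechanical parts of a Safe Harbor export, dropping listed identifiers and generalizing dates to the year, can be sketched as a single pass over a record. The field names in the drop list below are purely illustrative and do not cover the full set of 18 identifier categories; a real drop list would come from policy.

```python
from datetime import date

# Illustrative drop list only; hypothetical field names, not the full Safe Harbor set.
DROP_FIELDS = frozenset({"name", "address", "deviceId", "biometricId"})


def safe_harbor(record: dict) -> dict:
    """Drop listed identifiers and reduce date values to the year."""
    out = {}
    for key, value in record.items():
        if key in DROP_FIELDS:
            continue
        # datetime is a subclass of date, so both are generalized here.
        out[key] = value.year if isinstance(value, date) else value
    return out
```

Fields that carry reversible tokens internally would be added to the drop list for external recipients, in line with the last bullet above.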
Selection guide (when to use what)¶
| Goal | Technique | Joinability | Re-identification | Typical Classes |
|---|---|---|---|---|
| Never recover; prove non-retain | Double-hash | No | No | Credential |
| Tenant-local joins; low risk | HMAC (tenant-salted) | Yes (local) | By lookup only | Personal, Sensitive |
| Reversible under strict control | Tokenization (FPE) | Yes | With KMS + approval | PHI, Financial identifiers |
| Public sharing / Safe Harbor | De-ident w/ masking | Limited | No (if compliant) | PHI (export), mixed datasets |
Examples¶
# Anonymize credential
credential:
  store:
    fingerprint: sha256(sha256(value))
    present: true
  read: drop
# Pseudonymize email within tenant boundary
email:
  write:
    hmac: HMAC-SHA256(tenantSalt, value)   # joinable across this tenant only
    masked: "j**e@example.com"             # precomputed for UI
  read:
    standard: masked
    auditor: hmac
# Tokenize PAN for finance analytics
paymentCardPan:
  write:
    token: fpe(ff3-1, key=tenantRegionKey)
  read:
    standard: last4-mask
    analytics: token (no detokenization)
  detokenize:
    require: dual-approval, mTLS, JIT grant, audit log
Guardrails (quick checklist)¶
- Use double-hash for anything that must be permanently irrecoverable (e.g., secrets).
- Use HMAC with tenant-scoped salts for deterministic joins; never share HMACs across tenants/regions.
- Reserve tokenization for governed workflows; dual-approval and watermarked detokenization only.
- Safe Harbor exports enforce an identifier drop list and date generalization; tokens are not detokenized outside ATP.
- Evidence logs capture method (anonymize/pseudonymize/tokenize), policyVersion, and approval IDs (for any detokenization).
Cross-references
Policy-as-Code & Versioning · Read-Time Redaction & Enforcement · Data Residency & Retention · Security & Compliance
Attribute-Based Classification (Dynamic)¶
Free-Form Attributes¶
Audit records may include an open-ended attributes bag (string keys → scalar/array/object values). ATP classifies these dynamically using key-based rules and detectors.
- Key-to-class rules (policy-driven): map attribute names (or regexes) to DataClass. Examples:
  - attributes["user_email"] → Sensitive → apply EmailRedactor.
  - attributes["session_token"] → Credential → drop at write, keep present=true.
- Tenant overrides: tenants can supply a classificationOverrides map (key → DataClass) that may only upgrade sensitivity (never downgrade).
- Detectors: if no key rule exists, fall back to detectors (regex/statistical/NLP) to infer class; record detectorId and confidence.
Policy excerpt (key rules & detectors)
dynamic:
  keyRules:
    - match: "^user[_-]?email$"           # exact/regex
      class: Sensitive
      redaction: { kind: Mask, params: { showFirst: 1, showLast: 1, preserveDomain: true } }
    - match: "^session[_-]?token$"
      class: Credential
      redaction: { kind: Drop }
    - match: "^(ip|client[_-]?ip)$"
      class: Personal
      redaction: { kind: Mask, params: { ipv4Cidr: 24, ipv6Cidr: 64 } }
  detectors:
    email: { regex: "(?i)[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,}", class: Sensitive }
    phone: { regex: "(?i)\\+?[0-9][0-9\\-() ]{7,}", class: Sensitive }
    secret: { regex: "(?i)(api[_-]?key|token|password|secret)", class: Credential }
Example input → decisions
{
"attributes": {
"user_email": "jane.doe@example.com",
"session_token": "sk_live_123",
"client_ip": "192.168.1.42",
"comment": "call me at +1-555-123-4567"
}
}
[
{"path":"attributes.user_email","class":"Sensitive","redaction":"Mask(email)","source":"keyRule"},
{"path":"attributes.session_token","class":"Credential","redaction":"Drop","source":"keyRule"},
{"path":"attributes.client_ip","class":"Personal","redaction":"Mask(ipv4/24)","source":"keyRule"},
{"path":"attributes.comment","class":"Sensitive","redaction":"Mask(phone)","source":"detector","detectorId":"phone.regex.v1","confidence":0.97}
]
Service sketch (C#)
foreach (var (k, v) in record.Attributes)
{
var rule = policy.Dynamic.KeyRules.Match(k);
var decision = rule is not null
? classifier.FromKeyRule(k, v, rule)
: classifier.FromDetectors(k, v, policy.Dynamic.Detectors);
applyWriteMinimization(decision); // drop/hash/mask per decision
metadata.Add(decision.ToMetadata(policy.Version));
}
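The same flow, key rules first and detectors as a fallback, can be exercised end to end with the regexes from the policy excerpt. This is a sketch under stated assumptions: detectors are applied to the attribute value only, the most restrictive detector hit wins, and keys with no rule and no detector hit default to Internal (following the allowlist bullet later in this section). The detectorId values here are shortened names, not the versioned IDs used in real decisions.

```python
import re

RANK = {"Public": 0, "Internal": 1, "Personal": 2, "Sensitive": 3, "PHI": 4, "Credential": 5}

# Patterns copied from the policy excerpt above.
KEY_RULES = [
    (re.compile(r"^user[_-]?email$"), "Sensitive"),
    (re.compile(r"^session[_-]?token$"), "Credential"),
    (re.compile(r"^(ip|client[_-]?ip)$"), "Personal"),
]
DETECTORS = [
    ("email", re.compile(r"(?i)[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}"), "Sensitive"),
    ("phone", re.compile(r"(?i)\+?[0-9][0-9\-() ]{7,}"), "Sensitive"),
    ("secret", re.compile(r"(?i)(api[_-]?key|token|password|secret)"), "Credential"),
]


def classify(key, value):
    """Return a decision dict for one attribute: key rule first, then detectors."""
    for pattern, data_class in KEY_RULES:
        if pattern.search(key):
            return {"class": data_class, "source": "keyRule"}
    text = str(value)
    hits = [(name, cls) for name, pattern, cls in DETECTORS if pattern.search(text)]
    if hits:
        # Several detectors may fire; keep the most restrictive class.
        name, cls = max(hits, key=lambda hit: RANK[hit[1]])
        return {"class": cls, "source": "detector", "detectorId": name}
    return {"class": "Internal", "source": "default"}
```

Applied to the example input above, the comment attribute is classified by the phone detector while the other three attributes are decided by key rules.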
Structural Hints¶
Classification can also target nested JSON paths using a JSONPath-like syntax with wildcards:
- Exact path: user.profile.email → Sensitive (EmailRedactor)
- Array elements: items[].buyer.email → apply to each element
- Wildcard: attributes.*.token → any token field under attributes → Credential
Policy excerpt (path rules & precedence)
dynamic:
  pathRules:
    - path: "user.profile.email"
      class: Sensitive
      redaction: { kind: Mask, params: { showFirst: 1, showLast: 1, preserveDomain: true } }
    - path: "items[].buyer.email"
      class: Sensitive
      redaction: { kind: Hash, params: { algorithm: HMAC-SHA256, tenantSalted: true } }
    - path: "attributes.*.token"
      class: Credential
      redaction: { kind: Drop }
  precedence:
    order: [tenantOverride, pathRule, keyRule, detector]
    monotonic: true   # never-downgrade
Array semantics
- Rules with [] apply to every element at that position.
- If element type mismatches (e.g., not an object or missing key), rule is skipped (no downgrade).
- Per-element decisions are recorded separately in metadata (path includes index).
Metadata example for arrays
{
"path":"items[3].buyer.email",
"class":"Sensitive",
"source":"pathRule",
"policyVersion":"1.0.0",
"hash":"hmac256(tenantSalt, value)"
}
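Matching a rule path against a concrete, indexed path such as items[3].buyer.email can be done by compiling the rule into a regular expression. The sketch below is one possible approach, not the engine's actual matcher: the function names are hypothetical, and treating * as matching exactly one path segment is an assumption.

```python
import re


def compile_rule(rule_path: str) -> "re.Pattern[str]":
    """Compile a rule path with [] and * into a regex over concrete paths."""
    parts = []
    for segment in rule_path.split("."):
        is_array = segment.endswith("[]")
        name = segment[:-2] if is_array else segment
        # "*" matches a single segment name; anything else is taken literally.
        pattern = r"[^.\[\]]+" if name == "*" else re.escape(name)
        if is_array:
            pattern += r"\[\d+\]"        # any concrete element index
        parts.append(pattern)
    return re.compile(r"\.".join(parts))


def matches(rule_path: str, concrete_path: str) -> bool:
    """True when the whole concrete path is covered by the rule."""
    return compile_rule(rule_path).fullmatch(concrete_path) is not None
```

Requiring a full match keeps a rule for items[].buyer.email from accidentally applying to a sibling such as items[3].buyer.email_verified.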
Safety, Limits & Abuse Resistance¶
- Key budget: cap processed dynamic keys per record (e.g., max 128) to prevent “attribute bombs”.
- Name length: enforce sane bounds (e.g., key name ≤ 128 chars) and ASCII whitelist to avoid parser tricks.
- Reserved prefixes: block keys beginning with __, _sys, atp., connectsoft. from user payloads.
- Allowlist (optional): per-tenant allowlist for attribute keys; unknown keys default to Internal (masked in exports) unless detectors upgrade.
- Type guards: binary/blob fields in attributes are not scanned; they require explicit classification hints or are rejected.
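The limits above can be enforced in one screening pass before any classification work starts. The sketch below uses the example bounds from the bullets (128 keys, 128 characters) and the guard.blocked reasons named in the Metrics & Evidence subsection; the function name is hypothetical, and the invalid_key reason for non-ASCII names is an assumption, since the document does not name one.

```python
MAX_KEYS = 128
MAX_KEY_LENGTH = 128
RESERVED_PREFIXES = ("__", "_sys", "atp.", "connectsoft.")


def screen_keys(keys):
    """Return (accepted_keys, blocked_reasons) for one record's attribute keys."""
    accepted, reasons = [], []
    for key in keys:
        if len(accepted) >= MAX_KEYS:
            # Key budget exhausted: stop processing and record evidence once.
            reasons.append("attribute_bomb")
            break
        if len(key) > MAX_KEY_LENGTH:
            reasons.append("oversize_key")
        elif not key.isascii():
            reasons.append("invalid_key")      # hypothetical reason code
        elif key.startswith(RESERVED_PREFIXES):
            reasons.append("reserved_prefix")
        else:
            accepted.append(key)
    return accepted, reasons
```

Only the accepted keys are handed to the classifier, so an oversized or hostile payload costs a bounded amount of work.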
Precedence & Determinism¶
- Most-restrictive wins with the following order: tenantOverride > pathRule > keyRule > detector.
- Evaluations are pure per policyVersion: same input + same policy → same outputs.
- All decisions record source, detectorId (if used), and confidence.
Tests (must-have)¶
- Key-rule hit beats detector; detector upgrades when no rules exist.
- Path-rule applies to every array element; missing elements do not cause downgrades.
- Oversized key names → rejected; too many keys → truncated processing with evidence.
- Reserved prefix keys are blocked with guard.blocked{reason=reserved_prefix}.
Metrics & Evidence¶
- dynamic.keyRule.hit.count, dynamic.pathRule.hit.count, dynamic.detector.hit.count
- guard.blocked{reason=attribute_bomb|reserved_prefix|oversize_key}
- Decision logs include per-field path, class, source, policyVersion, and for arrays, the index.
Guardrails (quick checklist)¶
- Enforce key budget, name length, and reserved prefixes.
- Prefer path rules for stable schemas; use key rules for semi-structured bags.
- Detectors upgrade-only; never rely on them to relax protection.
- Persist per-field metadata with full path to make audits and replays exact.
- Keep policy precedence and monotonic enforcement consistent across services.
Cross-references
Policy-as-Code & Versioning · Tenant-Specific Overrides · Write-Time Classification & Redaction · Read-Time Redaction & Enforcement
Testing & Validation¶
Validation spans unit, integration, and chaos layers. All tests are deterministic by policyVersion, PII-safe, and produce evidence artifacts (snapshots, manifests, logs). CI gates on schema/signature validation, monotonicity, and redaction correctness.
Unit tests¶
Redactor tests
- Idempotency: masking twice yields same output.
- Length/format: PAN groups preserved; IPv4 /24, IPv6 /64.
- Edge cases: empty/null, already-masked, invalid formats, Unicode.
- Fuzzing: random strings for resilience; ensure no exceptions/leaks.
Classification tests
- Heuristic patterns (email/phone/secret/ip/PHI) map to expected DataClass.
- Precedence: tenantOverride > pathRule > keyRule > detector.
- Monotonic: downgrade attempts are denied and logged.
Policy tests
- Schema-valid + signature verified before load.
- Snapshot tests: apply policy-atp-v1 to fixtures; compare redacted JSON snapshots.
MSTest examples
using System.Text.RegularExpressions;
using Microsoft.VisualStudio.TestTools.UnitTesting;
[TestClass]
public sealed class RedactorTests
{
[TestMethod]
public void Email_Mask_IsIdempotent()
{
var r = Redactors.Email();
var once = r.Redact("john.doe@example.com");
var twice = r.Redact(once);
Assert.AreEqual("j**e@example.com", once);
Assert.AreEqual(once, twice);
}
[TestMethod]
public void Credential_IsErased_FromLogs()
{
var dto = new EvidenceAppendDto { ApiKey = "sk_live_123" };
var logger = TestLogger.WithCompliance(strictSecrets: true);
EvidenceLogs.AppendAccepted(logger, "t-1", "EU", dto);
var entry = logger.LastJson();
StringAssert.DoesNotMatch(entry, new Regex("sk_live_123"));
StringAssert.Contains(entry, "\"ApiKey\":\"\"");
}
}
[TestClass]
public sealed class ClassificationPolicyTests
{
[TestMethod]
public void Monotonic_Downgrade_IsBlocked()
{
var engine = PolicyEngine.LoadSigned("policy-atp-v1");
var result = engine.TryTransition(prevClass: "Sensitive", newClass: "Personal", out var reason);
Assert.IsFalse(result);
StringAssert.Contains(reason, "downgrade");
}
}
Integration tests¶
Ingestion (write-path)
- Submit a payload containing `email`, `phone`, `session_token`, `diagnosis`.
- Expect:
    - `session_token` (Credential) dropped; `present=true`, fingerprint optional.
    - `email` (Sensitive) stored raw and masked/hash variants (when policy enabled).
    - Per-field metadata persisted with `policyVersion`, `source`, `classifiedAt`.
Query (read-path)
- Same record, query with Standard vs Auditor role.
- Expect:
    - Standard: `email` masked, `PHI` masked, `Credential` dropped.
    - Auditor: `email` hash, `PHI` masked, `Credential` dropped.
- Logs contain only redacted objects; access decision includes class counts.
Export
- Generate DSAR export; verify:
- Signed manifest (policyVersion, class counts, redaction summary).
- Route is in-region; watermark present.
- No raw Credential/PHI in payload.
MSTest example (end-to-end)
[TestMethod]
public async Task ReadPath_StandardVsAuditor_ProducesDifferentRedaction()
{
var id = await IngestionFixture.AppendAsync(new {
user = new { email = "jane@ex.com" },
attributes = new { session_token = "sk_live_123" , client_ip = "192.168.1.42" },
diagnosis = "PHI: mild"
});
var std = await QueryFixture.GetAsync(id, role: "user");
var aud = await QueryFixture.GetAsync(id, role: "auditor");
Assert.AreEqual("j**e@ex.com", std.user.email);
StringAssert.Matches(aud.user.email, new Regex("^[a-f0-9]{64}$")); // HMAC
Assert.IsFalse(JsonContains(std, "sk_live_123"));
Assert.IsFalse(JsonContains(aud, "sk_live_123"));
}
Chaos testing¶
Policy version skew
- Old client sends `X-Policy-Version: 0` while service enforces `v1`.
- Expect graceful deny or auto-upgrade to v1 (configurable), never downgrade.
Missing classification
- Omit hints and schema tags; rely on detectors.
- Expect upgrade-only classification and safe defaults (e.g., `Internal` masked) if nothing matches.
Malformed redaction
- Feed already-masked inputs and strange formats.
- Expect idempotent outputs; no exceptions; no leakage to logs.
Tokenization / KMS unavailability
- Simulate vault key unavailable.
- Expect fail-closed: mask instead of tokenizing; emit alert.
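The fail-closed expectation above can be sketched as a small wrapper. This is an illustration under stated assumptions: `FailClosedSketch`, the `tokenize` delegate (standing in for the vault/KMS call), and the `onAlert` hook are all hypothetical names, and the eight-character mask is an arbitrary choice.

```csharp
using System;

public static class FailClosedSketch
{
    // tokenize: hypothetical delegate that calls the vault/KMS and throws
    //           when the key is unavailable.
    // onAlert:  hypothetical hook that emits the alert.
    // On any failure the raw value is never returned; a fixed-width mask is.
    public static string TokenizeOrMask(string value,
                                        Func<string, string> tokenize,
                                        Action<Exception> onAlert)
    {
        try
        {
            return tokenize(value);
        }
        catch (Exception ex)
        {
            onAlert(ex);                  // the alert carries the exception, not the raw value
            return new string('*', 8);    // fixed width: hides both content and length
        }
    }
}
```

Catching every exception type is deliberate in this sketch, since any failure on the tokenization path should degrade to masking rather than expose the raw value; a real implementation may want to distinguish transient faults for retry.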
Residency guard interaction
- Attempt cross-region export without approval.
- Expect block with reason `route_not_permitted`.
SpecFlow (illustrative)
Scenario: Cross-region export without approval is blocked
Given a tenant with region "EU"
And an export request routed to "US"
When I generate the export without approval
Then the response status is 403
And the error reason is "route_not_permitted"
And no export manifest is produced
Fixtures & snapshots¶
Sample fixture
user:
email: jane.doe@example.com
attributes:
session_token: sk_live_123
client_ip: 192.168.1.42
diagnosis: "PHI: mild"
Snapshot layout
/tests/__snapshots__/
policy-v1.ingest.json
policy-v1.query.standard.json
policy-v1.query.auditor.json
policy-v1.export.dsar.manifest.yaml
Metrics & CI gates¶
- Metrics asserted in tests:
    - `log.pii_blocked.count == 0`
    - `redaction.applied.count{kind=mask|hash|drop} > 0` for relevant paths
    - `plan.cache.hit_rate >= 0.9` (in integration runs)
- CI gates:
    - Policy schema + signature verify step passes.
    - Snapshot drift requires approval (PR label: `policy-change`).
    - Coverage thresholds (example): Statements ≥ 80%, Branches ≥ 70%.
Acceptance (done when)¶
- Unit + integration + chaos suites green across services.
- Snapshots recorded for `policyVersion` and stored as artifacts.
- No raw Credential/PHI in test logs; all exports include signed manifests.
- Downgrade attempts are blocked and evidenced; detectors behave upgrade-only.
Cross-references
Policy-as-Code & Versioning · Write-Time Classification & Redaction · Read-Time Redaction & Enforcement · Security & Compliance · Data Residency & Retention
Performance & Scalability¶
Redaction/classification must scale linearly with event volume while keeping latency predictable and cost bounded. ATP combines lazy techniques (compute on demand) with pre-computed hot-path variants, strict caching, and streaming application to avoid large intermediate allocations.
Write-path optimization¶
- Lazy redaction (default)
    - Persist raw + classification tag; compute masks/hashes at read/export.
    - Best for cold/rarely-read fields and when policies evolve frequently.
    - Reduces write CPU; avoids backfills on policy tweaks.
- Pre-compute for hot fields
    - For high-frequency lookups (e.g., `email`, `phone`) store `masked`/`hmac` alongside raw.
    - Toggle via policy flag (`rulesByClass.*.precompute: true`) or per-field override.
    - Avoids repeated crypto on read, improving P95 read latency.
- Batch classification for backfills
    - Run deterministic, idempotent workers (KEDA/Hangfire/Functions) that:
        - fetch records by `(tenantId, regionCode, updatedAt)` windows,
        - classify in stable order (e.g., `recordId ASC`) to keep snapshots reproducible,
        - checkpoint with exactly-once semantics (idempotency keys).
    - Parallelize by tenant shard; cap concurrency per tenant to prevent hot-spotting.
- Policy caching
    - Cache policy bundle + parsed decision tables in-memory with digest keys.
    - Warm on startup; event-driven invalidation on `PolicyRotated`.
    - Keep a read-only previous bundle for graceful rollbacks.
Strategy matrix
| Field frequency | Policy churn | Strategy |
|---|---|---|
| Hot (p95 read) | Low | Pre-compute variants |
| Hot | High | Lazy + short TTL caches |
| Warm | Any | Lazy |
| Cold | Any | Lazy |
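The strategy matrix can be encoded directly as a lookup, which keeps the decision testable. The enum and method names below are hypothetical and exist only for this sketch.

```csharp
// Hypothetical types for illustration; names are not part of the ATP policy schema.
public enum FieldHeat { Hot, Warm, Cold }
public enum PolicyChurn { Low, High }
public enum RedactionStrategy { PrecomputeVariants, LazyWithShortTtlCache, Lazy }

public static class StrategySketch
{
    // Mirrors the strategy matrix: only hot fields deviate from plain lazy redaction.
    public static RedactionStrategy Choose(FieldHeat heat, PolicyChurn churn) =>
        (heat, churn) switch
        {
            (FieldHeat.Hot, PolicyChurn.Low)  => RedactionStrategy.PrecomputeVariants,
            (FieldHeat.Hot, PolicyChurn.High) => RedactionStrategy.LazyWithShortTtlCache,
            _                                 => RedactionStrategy.Lazy,
        };
}
```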
Read-path optimization¶
- Redaction-plan cache
    - Key: `(tenantId, role, purpose, edition, policyVersion)`.
    - TTL: 60–120s with event invalidation on policy/override change.
    - Store the compiled plan (delegates) to avoid per-call rule resolution.
var key = (token.TenantId, token.Role, token.Purpose, token.Edition, policy.Version);
var plan = cache.GetOrCreate(key, entry =>
{
entry.AbsoluteExpirationRelativeToNow = TimeSpan.FromSeconds(120); // hard TTL upper bound; a sliding window would never expire a hot plan
return planner.For(token.Role, token.Purpose, policy.Version); // precompiled steps
});
- Streaming redaction
- Apply masking/hashing during serialization to avoid materializing large objects.
await using var stream = ctx.Response.BodyWriter.AsStream();
await using var writer = new Utf8JsonWriter(stream, new JsonWriterOptions { Indented = false });
redactor.WriteRedacted(writer, result, plan); // writes field-by-field, no big buffers
await writer.FlushAsync();
- Projection-first queries
    - Fetch only needed fields already redacted by the storage engine (e.g., computed `masked_email` column) to minimize transfer and CPU.
    - Use read replicas in-region to isolate heavy export scans from hot read APIs.
- Crypto pooling
    - Pool `HMACSHA256` instances (per key) and reuse buffers; avoid per-call allocations.
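A minimal sketch of the crypto-pooling idea follows, assuming .NET 5 or later (for `Convert.ToHexString`). `HmacPool` is a hypothetical class, one pool per tenant key; it pools the `HMACSHA256` instances but, for brevity, still allocates the input and digest byte arrays on each call, so it does not show buffer reuse.

```csharp
using System;
using System.Collections.Concurrent;
using System.Security.Cryptography;
using System.Text;

// Hypothetical pool for a single tenant key. Instances are rented one caller
// at a time because HMACSHA256 instances are not thread-safe.
public sealed class HmacPool : IDisposable
{
    private readonly byte[] _key;
    private readonly ConcurrentBag<HMACSHA256> _idle = new ConcurrentBag<HMACSHA256>();

    public HmacPool(byte[] tenantKey) => _key = (byte[])tenantKey.Clone();

    // Returns the HMAC-SHA256 of the UTF-8 bytes of value as 64 lowercase hex characters.
    public string HashHex(string value)
    {
        if (!_idle.TryTake(out var hmac)) hmac = new HMACSHA256(_key);
        try
        {
            byte[] digest = hmac.ComputeHash(Encoding.UTF8.GetBytes(value));
            return Convert.ToHexString(digest).ToLowerInvariant();
        }
        finally
        {
            _idle.Add(hmac);   // ComputeHash resets the instance, so it can be reused
        }
    }

    public void Dispose()
    {
        while (_idle.TryTake(out var hmac)) hmac.Dispose();
    }
}
```

The lowercase hex output matches the `^[a-f0-9]{64}$` shape asserted in the end-to-end read-path test.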
Monitoring¶
- Core metrics
    - `redaction.ops.count{kind=mask|hash|drop|tokenize}`
    - `plan.cache.hit_rate` (target ≥ 0.90)
    - `policy.eval.latency.p95` (target ≤ 5 ms)
    - `serialization.stream.bytes` and `stream.flush.latency.p95`
    - `unclassified.field.rate` (alarm if > 0.1% of fields)
    - `policy.version.skew.count` (client vs enforce)
- SLOs (defaults)
    - Read API: P95 end-to-end ≤ 120 ms, P99 ≤ 250 ms
    - Export job: sustained throughput ≥ 50k records/min per shard
    - Policy rotation propagation: ≤ 2 minutes to 95% of instances
- Alerts
    - `unclassified.field.rate > 0.1%` → investigate producer/schema drift.
    - `plan.cache.hit_rate < 0.9` → cache sizing/regression.
    - `policy.version.skew.count > 0` (sustained) → client upgrade or compat shim.
Guardrails (quick checklist)¶
- Prefer lazy redaction unless the field is a confirmed hot path.
- Limit pre-computed variants to whitelisted fields (email/phone) to avoid storage bloat.
- Cache redaction plans by (tenant, role, purpose, edition, policyVersion); invalidate on events.
- Stream redaction before bytes hit the wire; never allocate giant intermediate models.
- Keep crypto tenant-keyed and pooled; avoid per-request key fetches (use KMS key handles).
- Partition batch jobs by tenant/region; cap per-tenant concurrency to prevent noisy neighbors.
Cross-references
Read-Time Redaction & Enforcement · Write-Time Classification & Redaction · Policy-as-Code & Versioning · Data Residency & Retention
Governance & Continuous Improvement¶
A lightweight but rigorous governance loop keeps classification/redaction accurate, auditable, and adaptable. Reviews are calendar-driven with evidence artifacts, changes flow via ADRs + signed bundles, and improvements are guided by incidents, tenant needs, and auditor input.
Policy review cadence¶
- Quarterly (taxonomy & drift)
- Validate current data classes against platform usage; add/rename only with ADR.
- Review unclassified.field.rate, detector false positives/negatives, and override trends.
- Deliverables: Review minutes, policy diff plan, updated risk register entries.
- Annual (effectiveness & risk)
- Re-assess re-identification risk (linkage attacks) and masking efficacy.
- Run Safe Harbor export audits; verify tokenization governance and detokenization approvals.
- Deliverables: effectiveness report, control tests, auditor-ready evidence pack.
- Ad-hoc (regulatory/event-driven)
- Triggered by law changes (e.g., new PII definitions) or high-severity incidents.
- SLA: proposal ≤ 5 business days, canary ≤ 15 days, enforce ≤ 30 days (region-dependent).
Feedback loops¶
- Incident-driven
- Any classification/redaction miss → post-incident review with corrective actions:
- policy tweak, new detector or path rule, or producer schema contract update.
- Requires an ADR and a signed policy bundle release; link to meta-audit event.
- Tenant requests
- Tenants may request upgrades (never downgrade) via portal/API.
- Dual approval (DPO + Security). SLA: P95 ≤ 2 business days to decision.
- Auditor feedback
- Recommendations become backlog items with explicit control mapping (GDPR/HIPAA/SOC2).
- Track to completion with evidence (tests, manifests, dashboards).
Change management workflow¶
Propose → Assess → Approve → Rollout → Verify → Record
| | | | | |
ADR Risk/Impact DPO+Sec Canary SLOs Meta-audit
& Test Plan sign-off + Enforce green + Evidence
- Gate checks: schema-valid, signature-ready, monotonicity pass, replay determinism.
- PR checklist: `policy-change` label, snapshots updated, canary tenants listed, rollback plan.
Metrics & SLOs¶
- Governance
    - `policy.review.completed.count` (quarterly/annual)
    - `policy.change.lead_time.p95` (proposal → enforce): target ≤ 30d
    - `override.approval.time.p95`: target ≤ 2d
- Quality
    - `unclassified.field.rate`: target < 0.1%
    - `detector.fp_rate` / `detector.fn_rate`: tracked per detector
    - `export.manifest.missing.rate`: 0
- Resilience
    - `policy.rotation.propagation.p95`: ≤ 2m
    - `downgrade.blocked.count`: non-zero values show the guard is working; investigate spikes
Roadmap¶
- AI-assisted classification
- Content-aware models (PII/PHI detection beyond key names) with human-in-the-loop review and explainability; models run in shadow mode before enforce.
- Quantum-safe hashing
- Evaluate SHA-3/Keccak families and plan for post-quantum KDFs for long-lived evidence; dual-write fingerprints during migration.
- Differential privacy
- Add DP mechanisms for aggregations over Personal/Sensitive; privacy budgets per tenant/export purpose.
- Policy simulation & what-if
- Pre-enforce simulators to estimate mask/hash/drop deltas, export impact, and override conflicts.
- Formal verification (select rules)
- Prove monotonicity and most-restrictive-wins invariants on core Rego modules.
Artifacts (evidence)¶
- Review minutes (Quarterly/Annual), ADR links, signed policy bundles, test reports, dashboard screenshots, and export manifests with policy versions.
Guardrails (quick checklist)¶
- Reviews produce actionable diffs or explicit “no change” decisions with rationale.
- All changes flow via ADR + signed bundle, never ad-hoc toggles.
- Canary before enforce; auto-rollback on drift or guard spikes.
- Incidents always create feedback tasks; track to closure with tests/evidence.
- Metrics and SLOs visible on a Compliance Dashboard (tenant & region scoped).
Cross-references
Security & Compliance · Policy-as-Code & Versioning · Tenant-Specific Overrides · Data Residency & Retention · Operations / Observability
Appendix A — ConnectSoft Taxonomy Mapping¶
| ATP DataClass | ConnectSoftTaxonomy | Default Redactor |
|---|---|---|
| Personal | Email | EmailRedactor |
| Personal | Phone | PhoneLast4Redactor |
| Personal | PersonName | Mask (first initial) |
| Personal | IpAddress | IpAddressRedactor |
| Personal | DeviceId (GUID) | GuidRedactor |
| Sensitive | PostalAddress | Mask (city only) |
| Sensitive | PaymentCardPan | PanLast4Redactor |
| Sensitive | BankAccount | Drop or Tokenize |
| Sensitive | FinancialId | Hash (tenant-salted) |
| Credential | Secret | SecretRedactor (erase) |
| Credential | OAuthToken | SecretRedactor (erase) |
| Credential | Jwt | JwtRedactor (header only) |
| Credential | SessionId | Drop |
| PHI | HealthInfo | Mask or Tokenize |
Appendix B — Redaction Examples¶
Email¶
- Input: `john.doe@example.com`
- Masked: `j**e@example.com` (EmailRedactor)
- Hashed: `a3f8b...9c2e1` (SHA-256, tenant-salted)
Phone¶
- Input: `+1-555-123-4567`
- Masked: `****4567` (PhoneLast4Redactor)
Payment Card PAN¶
- Input: `4532-1234-5678-9010`
- Masked: `**** **** **** 9010` (PanLast4Redactor)
JWT¶
- Input: `eyJhbGci....<payload>.<signature>`
- Masked: `<jwt>.<redacted>.<redacted>` (JwtRedactor)
- Strict: (erased in dev strict mode)
IP Address¶
- Input: `192.168.1.42`
- Masked: `192.168.1.x` (IpAddressRedactor, /24)
- Input: `2001:0db8:85a3::8a2e:0370:7334`
- Masked: `2001:db8:85a3::/64`
GUID¶
- Input: `550e8400-e29b-41d4-a716-446655440000`
- Masked: `550e8400-****` (GuidRedactor, prefix only)
Secret¶
- Input: `sk_live_abc123...`
- Masked: (SecretRedactor, erased) or `****` (FixedMaskRedactor)
Appendix C — Classification Policy Schema (JSON)¶
{
"$schema": "https://connectsoft.ai/schemas/classification-policy.v1.json",
"id": "policy-atp-v1",
"version": 1,
"effectiveFromUtc": "2025-01-01T00:00:00Z",
"author": "dpo@connectsoft.ai",
"defaultByField": {
"email": "Sensitive",
"phone": "Sensitive",
"ip": "Personal",
"userId": "Personal",
"password": "Credential",
"apiKey": "Credential",
"jwt": "Credential",
"healthNote": "PHI"
},
"rulesByClass": {
"Public": { "kind": "None", "params": {} },
"Internal": { "kind": "None", "params": {} },
"Personal": { "kind": "Mask", "params": { "showFirst": 1, "showLast": 1 } },
"Sensitive": { "kind": "Hash", "params": { "algorithm": "HMAC-SHA256", "tenantSalted": true } },
"Credential": { "kind": "Drop", "params": {} },
"PHI": { "kind": "Mask", "params": { "showFirst": 0, "showLast": 0 } }
},
"overridesByField": {
"email": { "kind": "Mask", "params": { "showFirst": 1, "showLast": 1, "preserveDomain": true } }
}
}
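To make the relationship between the three sections of the schema concrete, here is a minimal lookup sketch using `System.Text.Json`. It assumes that an entry in `overridesByField` takes priority over the class rule reached via `defaultByField` and `rulesByClass`; that priority is inferred from the section names, and `PolicyLookupSketch.ResolveKind` is a hypothetical helper rather than part of the platform.

```csharp
using System;
using System.Text.Json;

public static class PolicyLookupSketch
{
    // Returns the redaction "kind" for a field.
    // Assumed order: overridesByField first, then the rule for the field's
    // default class. Returns null when the field is not listed in the policy.
    public static string ResolveKind(JsonElement policy, string field)
    {
        if (policy.TryGetProperty("overridesByField", out JsonElement overrides) &&
            overrides.TryGetProperty(field, out JsonElement fieldOverride))
        {
            return fieldOverride.GetProperty("kind").GetString();
        }

        if (policy.TryGetProperty("defaultByField", out JsonElement defaults) &&
            defaults.TryGetProperty(field, out JsonElement dataClass) &&
            policy.TryGetProperty("rulesByClass", out JsonElement rules) &&
            rules.TryGetProperty(dataClass.GetString(), out JsonElement rule))
        {
            return rule.GetProperty("kind").GetString();
        }

        return null;
    }
}
```

Applied to the policy above, `email` would resolve to `Mask` (field override), `phone` to `Hash` (via `Sensitive`), and `jwt` to `Drop` (via `Credential`).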
Appendix D — Heuristic Classification Patterns¶
| Pattern (Regex) | Inferred DataClass | Notes |
|---|---|---|
| `(?i)\b(email\|e-mail)\b` | Sensitive | Apply EmailRedactor |
| `(?i)\b(phone\|mobile\|msisdn)\b` | Sensitive | Normalize E.164; PhoneLast4Redactor |
| `(?i)\b(ssn\|nin\|national[_-]?id)\b` | Sensitive | Consider upgrade to special handling |
| `(?i)\b(password\|secret\|api[_-]?key\|token\|credential\|bearer)\b` | Credential | Drop value; keep indicator present=true |
| `(?i)\b(ip\|client\.ip\|remote[_-]?addr)\b` | Personal | IpAddressRedactor |
| `(?i)\b(gps\|geo\.(lat\|lon)\|location)\b` | Sensitive | Quantize (2 decimals) + tokenize or drop |
| `(?i)\b(name\|first[_-]?name\|last[_-]?name\|full[_-]?name)\b` | Personal | Mask on read; optional hash |
| `(?i)\b(health\|diagnosis\|vitals\|patient)\b` | PHI | HIPAA-compliant masking/tokenization |
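A few of the patterns above can be exercised with a small key classifier. This is a sketch under assumptions: `HeuristicSketch` is a hypothetical name, only a subset of the rows is included, the first-match-wins order is chosen for the example, and the `\|` sequences in the table are markdown cell escapes that become a plain `|` in code.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HeuristicSketch
{
    // Subset of the table's patterns. Evaluation order (first match wins)
    // is an assumption of this sketch.
    private static readonly (Regex Pattern, string DataClass)[] Rules =
    {
        (new Regex(@"(?i)\b(password|secret|api[_-]?key|token|credential|bearer)\b"), "Credential"),
        (new Regex(@"(?i)\b(health|diagnosis|vitals|patient)\b"), "PHI"),
        (new Regex(@"(?i)\b(email|e-mail)\b"), "Sensitive"),
        (new Regex(@"(?i)\b(phone|mobile|msisdn)\b"), "Sensitive"),
        (new Regex(@"(?i)\b(ip|client\.ip|remote[_-]?addr)\b"), "Personal"),
    };

    // Returns the inferred class for a key name, or null when nothing matches.
    public static string Classify(string key)
    {
        foreach (var (pattern, dataClass) in Rules)
        {
            if (pattern.IsMatch(key)) return dataClass;
        }
        return null;
    }
}
```

One behavior worth testing explicitly: in .NET regular expressions the underscore is a word character, so `\b` does not match between `_` and a letter. Keys such as `session_token` or `client_ip` are therefore not matched by `\b(token)\b` or `\b(ip)\b`, whereas `user.email` and `client.ip` are. Snake-case keys like these need an explicit key rule or an adjusted pattern.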
Appendix E — Read-Time Redaction Matrix¶
| DataClass | Privileged (Auditor) | Standard (Tenant User) | Public API |
|---|---|---|---|
| Public | None | None | None |
| Internal | None | Mask (optional) | Omit |
| Personal | Mask or Hash | Mask | Omit |
| Sensitive | Hash (tenant-salted) | Mask or Drop | Omit |
| Credential | Drop | Drop | Omit |
| PHI | Mask or Tokenize | Mask or Drop | Omit |
Appendix F — Cross-Reference Map¶
| Topic | Primary Implementation Doc | Notes |
|---|---|---|
| Classification policy model | architecture/data-model.md §Classification | Schema definitions |
| Write-time redaction | architecture/hld.md §Data Classification | Ingestion pipeline |
| Tenant isolation & ABAC | platform/multitenancy-tenancy.md | Role-based redaction |
| Encryption & key management | data-residency-retention.md §5 | Tenant-scoped keys for HMAC |
| Privacy compliance | platform/privacy-gdpr-hipaa-soc2.md | GDPR/HIPAA alignment |
| Logging redaction | operations/observability.md | Structured logging with compliance |
| Template integration | implementation/template-integration.md | ConnectSoft microservice setup |