Security & Compliance - Audit Trail Platform (ATP)
ATP is security-first by design — every layer enforces defense-in-depth, least privilege, and continuous verification.
Purpose & Scope
This document defines ATP’s security posture and the control framework that governs it. It consolidates what we secure, how we secure it, and how we prove it — while deferring deep technical details to the specialized docs it references.
What this document covers
- Establish ATP’s security architecture across network, identity, application, data, and operations.
- Define a control framework (preventive/detective/corrective/deterrent) and how controls are owned, tested, and evidenced.
- Lay out the threat model (actors, assets, trust boundaries) and incident response at a platform level.
- Set the compliance attestation strategy (GDPR, HIPAA, SOC 2, ISO 27001) and evidence sources.
- Link to detailed guides instead of duplicating:
  - Tenancy & ABAC → multitenancy-tenancy.md
  - Residency & retention → data-residency-retention.md
  - Zero Trust & hardening → zero-trust.md
  - Key management & rotation → key-rotation.md
  - PII classification & redaction → pii-redaction-classification.md
  - Backups/restore/eDiscovery → backups-restore-ediscovery.md
Out of scope (referenced elsewhere)
- Implementation minutiae of guards, crypto primitives, or service-specific configs.
- Day-2 operational runbooks beyond security (scaling, performance tuning, generic SRE playbooks).
Readers & ownership
- Platform Security (owners): policies, control registry, audits.
- SRE/Operations: detection engineering, response playbooks, drills.
- Product/Engineering: secure SDLC, boundary enforcement in services.
- Compliance/Legal/DPO: framework mappings, exceptions, auditor interface.
Artifacts produced
- Security Control Inventory (IDs → owner, evidence, test cadence).
- Threat Model (trust boundaries + risk matrix).
- Incident Response Playbooks (cross-tenant, tampering, exfil, key compromise).
- Compliance Crosswalks (GDPR/HIPAA/SOC2/ISO) with evidence pointers.
- Attestation Pack templates (monthly/quarterly), CI/CD security gates.
Acceptance (done when)
- Control inventory is complete, owned, and each control has evidence sources and test cadence.
- Threat model states actors/vectors/assets, trust boundaries, risks, and mitigations.
- Incident response is actionable, tested, and cross-linked to on-call docs.
- Compliance mappings point to policies, logs, metrics, manifests, ADRs — no dead links.
Security Architecture Overview
ATP applies defense-in-depth across five layers (network → identity → application → data → operations) with Zero Trust defaults (see zero-trust.md). Security is enforced at every boundary of the core path: external API → ingestion → storage → query → export. Control-plane policies (OPA/Rego) govern the data plane, and all decisions are observable and auditable.
Layers & Principles
- Network
  - VNet/VPC isolation per environment and region; private endpoints to data stores; service mesh mTLS and policy (deny-by-default).
  - Ingress via API Gateway with WAF, rate-limits, bot/DoS protections, and IP allow-lists for operator endpoints.
- Identity
  - OIDC for users; workload identities for services; short-lived tokens with purpose and residency claims.
  - mTLS between services; SPIFFE/SPIRE (or equivalent) SVIDs for service identity in mesh.
- Application
  - PEP-1 (Gateway) for coarse residency/tenancy guards; PEP-2 (Service) for fine-grained ABAC, classification, and quota checks.
  - Policy-as-code (OPA/Rego) embedded and versioned; decisions stamped with policyVersion.
- Data
  - WORM evidence segments; hash-chained integrity with regional anchors; envelope encryption with tenant-scoped keys.
  - Residency-aware storage and replication; export routes constrained by profile.
- Operations
  - Immutable, structured security logs; SIEM pipelines; alerts on guard violations, key ops, anomaly spikes.
  - Runbooks and playbooks for incident response; continuous verification and drills.
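The Data layer's hash-chained integrity can be pictured with a small Python sketch: each digest covers the previous one, so changing any earlier record changes the final head that an anchor would sign. The SHA-256 "prev || record" framing and the all-zero genesis value are assumptions for illustration; ATP's actual segment format, Merkle trees, and HSM anchor signing are not shown.

```python
from __future__ import annotations

import hashlib

GENESIS = bytes(32)  # assumed all-zero link that starts a new segment


def chain(records: list[bytes]) -> list[bytes]:
    """Running digests: h_i = SHA-256(h_{i-1} || record_i), starting from GENESIS."""
    digests: list[bytes] = []
    prev = GENESIS
    for record in records:
        prev = hashlib.sha256(prev + record).digest()  # prev is fixed-width, so framing is unambiguous
        digests.append(prev)
    return digests


def verify(records: list[bytes], anchored_head: bytes) -> bool:
    """Recompute the chain and compare its head with the value that was anchored."""
    digests = chain(records)
    head = digests[-1] if digests else GENESIS
    return head == anchored_head


if __name__ == "__main__":
    events = [b'{"op":"append","seq":1}', b'{"op":"append","seq":2}']
    head = chain(events)[-1]  # the value an anchor would sign and timestamp
    print(verify(events, head))                                   # True
    print(verify([events[0], b'{"op":"append","seq":3}'], head))  # False: tampered record
```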
Trust Boundaries & Core Path
```mermaid
flowchart LR
    U[Client / Integrations] -->|JWT + mTLS| G["API Gateway (PEP-1)"]
    G --> I["Ingestion Services (PEP-2)"]
    I --> S[(Hot Storage - WORM)]
    S --> Q["Query/Read Models (PEP-2)"]
    Q --> E["Export/eDiscovery (PEP-2)"]
    subgraph Control Plane
        P["Policy Engine (OPA/Rego)"]
        K[KMS/HSM]
        C[Residency & Retention Catalog]
        M[Monitoring/SIEM]
    end
    G --- P
    I --- P
    Q --- P
    E --- P
    S --- K
    E --- K
    G --- M
    I --- M
    Q --- M
    E --- M
    C --- G
    C --- I
    C --- Q
    C --- E
```
Boundaries
- External → Gateway: authentication, WAF, rate-limits, coarse tenancy/residency guard.
- Gateway → Services: purpose-bound requests with signed context headers (X-Policy-*), correlation IDs.
- Services → Data Stores: private network only; attribute checks (tenant, region, category) before read/write.
Control Plane vs Data Plane
- Control Plane
  - Policy Registry (residency, retention, ABAC), Key Management (per-tenant/region), Catalogs (tenants, regions, silos), Observability (metrics/logs/events).
  - Drives decisions; emits evidence (policy changes, approvals, rotations, drill reports).
- Data Plane
  - Executes guards and modes: append, seal, verify, purge/redact, export.
  - Mutations are ledgered and idempotent; decisions reference control-plane policyVersion.
Enforcement Points (PEP) & Decisions (PDP)
- PEP-1 Gateway
  - Enforces deny-by-default, residency route checks, DDoS/WAF, purpose binding, and basic quota.
  - Annotates requests with decision context; drops or quarantines on mismatch.
- PEP-2 Services
  - Performs fine-grained ABAC (tenantId, regionCode, dataSiloId, category), classification/redaction checks, and export route validation.
- PDP (Policy Engine)
  - Co-located or sidecar OPA with signed bundles; hot-reload on policy updates; short decision cache with instant revocation on changes.
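One way to read "short decision cache with instant revocation on changes" is a PEP-side cache whose entries expire after a few seconds and are all dropped the moment a bundle with a new policyVersion loads. The Python sketch below assumes that model; the DecisionCache class, its methods, and the 5-second default TTL are hypothetical.

```python
from __future__ import annotations

import time
from collections.abc import Callable, Hashable


class DecisionCache:
    """Short-lived allow/deny cache that is wiped whenever policyVersion changes."""

    def __init__(self, ttl_seconds: float = 5.0, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._policy_version: str | None = None
        self._entries: dict[Hashable, tuple[bool, float]] = {}

    def on_bundle_loaded(self, policy_version: str) -> None:
        # Hot-reload hook: a different bundle version revokes every cached decision at once.
        if policy_version != self._policy_version:
            self._policy_version = policy_version
            self._entries.clear()

    def get(self, key: Hashable) -> bool | None:
        """Return the cached decision, or None when absent or expired (caller asks the PDP)."""
        hit = self._entries.get(key)
        if hit is None:
            return None
        decision, expires_at = hit
        if self._clock() >= expires_at:
            del self._entries[key]
            return None
        return decision

    def put(self, key: Hashable, decision: bool) -> None:
        self._entries[key] = (decision, self._clock() + self._ttl)
```

Keying entries by the full decision input (tenant, region, purpose, operation, resource attributes) keeps one caller's cached allow from ever answering another caller's request, which is the cache-bleed path listed under cross-tenant leakage.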
Key & Secrets Path
- KMS/HSM per region; KEKs are region-anchored, DEKs are tenant-scoped and rotated aggressively.
- No plaintext secrets in code or images; bootstrap via identity (Managed Identity) + short-lived tokens.
- Key operations (wrap/unwrap/sign) logged with key lineage and surfaced to SIEM.
Integration Points
- ConnectSoft Identity Context: source of user/service identities, entitlements, and purpose claims.
- KMS/HSM: encryption at rest, anchor signing, manifest signing; rotation cadences from key-rotation.md.
- Monitoring/Alerting: unified security signals (guard denials, anomaly spikes, key ops, export attempts) with routed on-call.
- Data Residency & Retention: region topology, replication posture, purge/export modes (see data-residency-retention.md).
Guardrails (quick checklist)
- Every request carries purpose and residency claims; decisions are deny-by-default without them.
- All data-plane calls traverse private networks with mTLS; no public egress to data stores.
- Policy bundles are signed, versioned, and audited; PEPs stamp policyVersion into logs.
- Keys are region-anchored; no cross-border unwrap; all key ops audited.
- Security telemetry is first-class: structured, PII-safe, and correlated end-to-end.
Threat Model & Attack Surface
We model threats with STRIDE (Spoofing, Tampering, Repudiation, Information disclosure, DoS, Elevation of privilege) and a kill-chain lens (recon → initial access → execution → persistence → exfiltration/impact). We keep the scope anchored to ATP’s trust boundaries, assets, and decision points.
Assets & Trust Boundaries
Crown-jewel assets
- Hot evidence (WORM segments) and anchors (signed, timestamped).
- Residency & retention catalogs (govern movement, purge eligibility).
- KMS/HSM keys (tenant DEKs, region KEKs, anchor signing keys).
- Policy bundles (OPA/Rego) and attestation manifests.
- Export bundles (NDJSON/Parquet) + signed manifests.
- Identity tokens (user/service), break-glass approvals.
Trust boundaries (high-level)
```mermaid
flowchart LR
    ext[External Clients/Integrations]
    gw["API Gateway (PEP-1)"]
    svc["Microservices (PEP-2)"]
    hot[(Hot/WORM Storage)]
    rm[(Read Models/Indexes)]
    exp[Export/eDiscovery]
    kms[[KMS/HSM]]
    opa[[Policy Engine/OPA]]
    obs[[SIEM/Observability]]
    cat[[Residency/Retention Catalogs]]
    ext -->|Authn + mTLS| gw --> svc
    svc --> hot
    svc --> rm
    svc --> exp
    svc --- kms
    gw --- opa
    svc --- opa
    gw --- obs
    svc --- obs
    exp --- kms
    svc --- cat
```
Threat Actors (representative)
- External attackers (targeted or opportunistic, incl. bots).
- Malicious tenants (abuse APIs to escape scope or exfiltrate).
- Insiders (over-privileged operators/devs; compromised accounts).
- Supply chain (poisoned dependencies/containers, CI artifact tampering).
- Compromised integrations (leaky webhooks, abused API keys).
- Cloud control-plane misuse (misconfiguration, stale IAM roles).
Attack Vectors (STRIDE mapped)
| Vector | STRIDE | Examples in ATP context | Primary mitigations |
|---|---|---|---|
| Unauthorized access | S/E | Token theft/replay, client secret leakage, JWT forgery attempts | OIDC, mTLS, short-lived tokens, purpose binding, JTI replay checks |
| Tampering | T | Modify evidence prior to seal; anchor substitution; policy bundle tampering | WORM, Merkle/anchors (HSM-signed), signed policy bundles, canary verification |
| Repudiation | R | Delete/alter security logs; deny actions taken | Append-only logs, meta-audit stream, signed manifests, time-stamps |
| Information disclosure | I | Cross-tenant query, cross-region export, DSAR over-breadth | ABAC tenancy/residency guards, export routes, redaction templates |
| DoS / resource exhaustion | D | Hot-shard floods, export storms, query fan-outs | Rate-limits, quotas, back-pressure, governors, cost guards |
| Elevation of privilege | E | Break-glass abuse, role drift, policy bypass | Dual-approval + TTL, least-privilege, OPA at PEP-1/PEP-2, continuous attestations |
ATP-Specific Threats (focus items)
- Cross-tenant leakage
  - Why it matters: ATP is multi-tenant, so any leakage is critical.
  - Paths: mis-scoped queries, missing tenantId/dataSiloId checks, cache bleed.
  - Controls: PEP-1/PEP-2 ABAC, per-tenant caches, tenancy tags in indices, contract tests.
- Integrity compromise of audit evidence
  - Paths: anchor key misuse, Merkle root mismatch, TSA failures, “re-seal” attempts.
  - Controls: WORM stores, HSM sign-only keys per region, bridge anchors for moves, scheduled verification with quarantine.
- Retention bypass or purge abuse
  - Paths: policy relaxation, silent purge without ledger, export-then-purge without verify.
  - Controls: policy-as-code with bounds, purge ledger, export verification gates, dual approval for sensitive categories.
- Residency violations / exfiltration
  - Paths: cross-region reads/exports, replica hydration to disallowed regions.
  - Controls: residency profiles, export routes, deny-by-default cross-region, cost guards, network egress allow-lists.
- Break-glass misuse
  - Paths: over-broad scopes, unlimited TTL, lack of follow-up.
  - Controls: scoped approvals, ≤4h TTL, evidence hooks, mandatory post-mortem.
- Supply-chain insertion
  - Paths: poisoned dependency/container, tampered OPA bundles.
  - Controls: SBOM, sigstore/cosign, verified builds, signed OPA bundles, admission policies.
Abuse Stories (examples to test)
- “As a malicious tenant, I try to read US data with an EU token.” → Expect deny with cross_region_blocked.
- “As an operator, I request break-glass to export via the global route.” → Expect deny unless the route is permitted + dual approval + TTL.
- “As an attacker with a leaked JWT, I call purge.” → Expect deny (missing purpose/scope) + alert + token revoke.
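The abuse stories above map naturally onto deny-path tests. In this Python sketch, decide is a hypothetical stand-in for the PEP-1/PEP-2 policy call, and the tests are written as pytest-style functions; the Request fields, the purge.execute scope, the single permitted in_region route, and every reason code except cross_region_blocked are assumptions, not ATP's real interface.

```python
from __future__ import annotations

from dataclasses import dataclass, field

PERMITTED_EXPORT_ROUTES = {"in_region"}  # assumed residency profile, e.g. gdpr-standard


@dataclass
class Request:
    op: str  # "read" | "export" | "purge" (illustrative set)
    token_tenant: str
    token_region: str
    resource_tenant: str
    resource_region: str
    purpose: str = ""
    scopes: set[str] = field(default_factory=set)
    export_route: str | None = None
    break_glass: bool = False
    approvals: int = 0


def decide(req: Request) -> tuple[bool, str]:
    """Hypothetical deny-by-default decision returning (allowed, reason)."""
    if req.token_tenant != req.resource_tenant:
        return False, "tenant_mismatch"
    if req.token_region != req.resource_region:
        return False, "cross_region_blocked"
    if not req.purpose:
        return False, "missing_purpose"
    if req.break_glass and req.approvals < 2:
        return False, "break_glass_not_dual_approved"
    if req.op == "export" and req.export_route not in PERMITTED_EXPORT_ROUTES:
        return False, "export_route_blocked"
    if req.op == "purge" and "purge.execute" not in req.scopes:
        return False, "missing_scope"
    return True, "allow"


def test_eu_token_cannot_read_us_data():
    req = Request("read", "t-1", "EU", "t-1", "US", purpose="default")
    assert decide(req) == (False, "cross_region_blocked")


def test_break_glass_global_export_denied_without_permitted_route():
    req = Request("export", "t-1", "EU", "t-1", "EU", purpose="dsar_export",
                  export_route="global", break_glass=True, approvals=2)
    assert decide(req) == (False, "export_route_blocked")


def test_leaked_jwt_cannot_purge():
    req = Request("purge", "t-1", "EU", "t-1", "EU")  # no purpose, no purge scope
    assert decide(req)[0] is False
```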
Risk Scoring & Prioritization
Scale: Likelihood (L) and Impact (I) from 1–5. Risk = L × I. Priority: ≥16 = Critical, 12–15 = High, 8–11 = Medium, ≤7 = Low. We track Residual after controls.
| Threat | L | I | Risk | Residual | Owner | Next action |
|---|---|---|---|---|---|---|
| Cross-tenant leakage | 3 | 5 | 15 (High) | 9 (Med) | Platform Sec | Expand contract tests to cache layer; add per-tenant cache keys to lint |
| Integrity compromise | 3 | 5 | 15 (High) | 8 (Med) | Core Eng | Increase verify-on-read coverage to 20% for sensitive categories |
| Residency violation | 2 | 5 | 10 (Med) | 6 (Low) | SRE | Enforce cost estimate gate on all cross-region attempts |
| Retention bypass | 3 | 4 | 12 (High) | 8 (Med) | Data Gov | Require dual approval for evidence.hot purge; tighten dry-run diff thresholds |
| Break-glass misuse | 2 | 4 | 8 (Med) | 5 (Low) | Security Ops | Auto-create post-mortem tasks; add scope validators to approval UI |
| Supply-chain insertion | 3 | 4 | 12 (High) | 9 (Med) | Platform Sec | Gate deploys on cosign verification + SBOM diff alerts |
Risk register entries link to controls, tests, and evidence (SIEM queries, manifests, ADRs). Residual risk is recalculated after each control change.
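The scoring rule is simple enough to encode directly, which makes it easy to recompute the Risk column whenever Likelihood or Impact is re-rated. This Python sketch applies Risk = L × I and the priority bands stated above; the function name is hypothetical.

```python
def risk_score(likelihood: int, impact: int) -> tuple[int, str]:
    """Risk = L x I on 1-5 scales, banded as Critical >= 16, High 12-15, Medium 8-11, Low <= 7."""
    if not (1 <= likelihood <= 5 and 1 <= impact <= 5):
        raise ValueError("likelihood and impact must both be in 1..5")
    score = likelihood * impact
    if score >= 16:
        band = "Critical"
    elif score >= 12:
        band = "High"
    elif score >= 8:
        band = "Medium"
    else:
        band = "Low"
    return score, band


if __name__ == "__main__":
    print(risk_score(3, 5))  # (15, 'High'), matching the cross-tenant leakage row
```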
Detection & Response Hooks (per threat)
- Cross-tenant attempts → abac.decision_denied{reason="tenant_mismatch"} alerts; dashboard by tenant/region.
- Integrity failures → integrity.violation_detected; auto-quarantine; run verifier; block exports.
- Retention anomalies → retention.dryrun.anomaly=true when >35% swing; require governance review.
- Cross-region attempts → abac.decision_denied{reason="cross_region_blocked"} + cost guard logs.
- Break-glass → break_glass.used with TTL countdown; reminder + post-mortem ticket at expiry.
- Supply chain → CI gate fails on unsigned images/OPA bundles; runtime admission webhooks deny.
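The retention hook's ">35% swing" needs a precise definition before it can alert. The Python sketch below assumes the swing is the relative change in purge-eligible record count between consecutive dry-runs; both that interpretation and the zero-baseline rule are assumptions, and the function name is hypothetical.

```python
def dryrun_anomaly(previous_eligible: int, current_eligible: int, threshold: float = 0.35) -> bool:
    """Flag a retention dry-run whose purge-eligible count moves by more than `threshold`."""
    if previous_eligible == 0:
        # Assumed rule: anything becoming eligible from a zero baseline warrants review.
        return current_eligible > 0
    swing = abs(current_eligible - previous_eligible) / previous_eligible
    return swing > threshold
```

Using the absolute change flags sharp drops as well as spikes; a sudden fall in eligible records can indicate a relaxed or misapplied policy just as a spike can.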
Assumptions & Constraints
- Cloud provider physical security is out of scope; we trust IaaS attestations.
- All services run with managed identities; no long-lived static secrets.
- Data stores are reached only via private network; no public data-plane endpoints.
- Policy bundles and images are signed; unsigned artifacts cannot run.
Evidence (what we capture)
- Structured decision logs (ABAC, retention, export routes) with policyVersion.
- Anchor signatures, TSA receipts, and verification reports.
- Purge ledgers and export manifests (signed, checksummed).
- Control tests: residency conformance, retention dry-runs, DR drill reports.
- Supply-chain attestations: SBOMs, cosign signatures, provenance.
Security Control Framework
ATP’s controls are organized as preventive, detective, corrective, and deterrent measures, mapped to industry frameworks (NIST CSF, CIS Controls, ISO/IEC 27001 Annex A). Controls are policy-as-code, owned, tested on a defined cadence, and evidenced via immutable telemetry.
Control Categories (how we secure)
- Preventive — stop bad things by default: Zero Trust network, ABAC at gateways/services, per-tenant encryption keys, WORM storage, egress allow-lists.
- Detective — see what matters quickly: SIEM pipelines, anomaly detectors, guard-violation alerts, integrity verification jobs.
- Corrective — recover to safe state: DR playbooks, restore to quarantine, purge ledgers with replay/rebuild of read models.
- Deterrent — raise the bar & trace actions: dual-approval + TTL for break-glass, signed policies/images, visible audit trails and watermarked exports.
Framework Cross-Mapping (anchor points)
| Domain | NIST CSF | CIS v8 (examples) | ISO/IEC 27001:2022 (Annex A) | ATP Control Examples |
|---|---|---|---|---|
| Asset/Context | ID.AM | 1, 2 | A.5.9, A.5.12 | Residency/retention catalogs; control registry |
| Access Control | PR.AC | 6 | A.5.15, A.5.16 | ABAC tenancy/residency guards; purpose-bound tokens |
| Data Security | PR.DS | 3 | A.8.24, A.5.10 | Per-tenant/region encryption; WORM; redaction |
| Detection | DE.AE, DE.CM | 8, 13 | A.8.15, A.8.16 | Guard-violation alerts; SIEM correlation, anomaly detection |
| Response | RS.RP, RS.MI | 17 | A.5.24, A.5.26 | IR playbooks (cross-tenant, tampering, exfil); kill switches |
| Recovery | RC.RP | 11 | A.5.30, A.8.13 | Region-coherent backups; DR drills (read-only promotion) |
| Change/Config | PR.IP | 4, 16 | A.8.9, A.8.32 | Signed OPA bundles/images; CI policy gates; ADRs |
| Supplier/SBOM | ID.SC | 15 | A.5.20, A.5.19 | Cosign-verified images, SBOM diff alerts, license checks |
| Logging/Evidence | PR.PT, DE.AE | 8 | A.8.15 | Meta-audit stream; signed manifests; evidence packs |
Detailed mappings for GDPR/HIPAA/SOC 2/ISO are maintained in privacy-gdpr-hipaa-soc2.md.
Control Inventory (samples)
| Control ID | Category | Description | Owner | Evidence Sources | Test Cadence |
|---|---|---|---|---|---|
| AC-ATP-001 | Preventive | ABAC enforcement at Gateway (PEP-1): tenant/region/purpose | Platform Security | ABAC decision logs, Rego bundle signature, gateway policyVersion | Continuous + per-release contract tests |
| DS-ATP-010 | Preventive | Per-tenant envelope encryption; region-anchored KEKs | Platform Security | KMS/HSM audit logs, key rotation manifests | Daily key checks + quarterly rotation drill |
| IN-ATP-020 | Detective | Integrity verification (on-read/sample/scheduled) | Core Eng | integrity.verify.* metrics, violation events, verifier reports | Daily sample + weekly full for last week |
| RE-ATP-030 | Corrective | Restore to quarantine namespace (read-only) | SRE | Restore logs, hash/anchor/TSA verification evidence | Monthly restore drill per region |
| EX-ATP-040 | Deterrent | Break-glass dual approval with TTL and scope | Security Ops | Approval records, break_glass.used events, audit trail | Continuous; monthly review of usage |
| RS-ATP-050 | Preventive | Residency profiles with export route gating | Data Governance | Residency decisions, export manifests, policy registry | Continuous; conformance suite nightly |
| CH-ATP-060 | Preventive | Signed OPA bundles and container images | Platform Security | Cosign attestations, admission controller logs | Per-build; gate blocks unsigned |
| LG-ATP-070 | Detective | Meta-audit stream (append-only) for guard decisions | ATP Team | Structured logs with correlationId & policyVersion | Continuous; weekly integrity spot-check |
Control Registry Schema (sketch)
```yaml
id: "AC-ATP-001"
name: "Gateway ABAC Enforcement"
category: preventive
frameworks:
  nist_csf: ["PR.AC", "PR.PT"]
  cis_v8: [6]
  iso27001: ["A.5.15", "A.5.16"]
owner: "Platform Security"
policyRefs:
  - "platform/multitenancy-tenancy.md#guards"
  - "platform/data-residency-retention.md#residency-aware-access-controls"
evidence:
  logs:
    queries:
      - name: "abac_denies_24h"
        query: 'decision=="deny" and reason endswith "_blocked"'
  metrics: ["abac.allow.count", "abac.deny.count"]
  artifacts: ["rego-bundle.sig", "policyVersion"]
tests:
  type: ["contract", "integration"]
  cadence: { continuous: true, nightly: true }
  successCriteria:
    - "deny rate baseline ±5pp"
    - "no unsigned bundle admitted"
risk:
  inherent: high
  residual: medium
```
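The registry guardrail (named owner, policy reference, evidence query, and test cadence for every control) can be checked mechanically in CI. This Python sketch assumes the YAML has already been parsed into plain dicts by a YAML library and only checks that those fields are present and non-empty; the function name and the sample dict are hypothetical.

```python
from __future__ import annotations


def registry_gaps(control: dict) -> list[str]:
    """List guardrail fields missing from one parsed control-registry entry."""
    gaps: list[str] = []
    if not control.get("owner"):
        gaps.append("owner")
    if not control.get("policyRefs"):
        gaps.append("policyRefs")
    queries = control.get("evidence", {}).get("logs", {}).get("queries", [])
    if not any(q.get("query") for q in queries):
        gaps.append("evidence.logs.queries")
    if not any(control.get("tests", {}).get("cadence", {}).values()):
        gaps.append("tests.cadence")
    return gaps


# Parsed form of the AC-ATP-001 entry above (trimmed to the checked fields).
AC_ATP_001 = {
    "id": "AC-ATP-001",
    "owner": "Platform Security",
    "policyRefs": ["platform/multitenancy-tenancy.md#guards"],
    "evidence": {"logs": {"queries": [
        {"name": "abac_denies_24h", "query": 'decision=="deny" and reason endswith "_blocked"'},
    ]}},
    "tests": {"cadence": {"continuous": True, "nightly": True}},
}

if __name__ == "__main__":
    print(registry_gaps(AC_ATP_001))  # []
```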
Evidence & Test Cadence (how we prove it)
- Evidence sources: SIEM queries, KMS/HSM logs, signed manifests (exports/purge/anchors), purge ledgers, verification reports, ADRs and policy diffs.
- Cadence:
  - Continuous: guard decisions, SIEM correlations, unsigned artifact blocks.
  - Nightly: residency conformance suite; integrity sample verify; dependency scan.
  - Weekly: retention dry-run stats review; key/anchor log review.
  - Monthly/Quarterly: restore and DR drills; control attestation pack generation.
- Gates: CI blocks on unsigned images/OPA bundles, policy lint failures, residency test regressions.
Cross-References (implementation details)
- Tenancy & ABAC guards → multitenancy-tenancy.md
- Residency & retention controls → data-residency-retention.md
- Zero Trust/service mesh & boundary hardening → zero-trust.md
- Key rotation & escrow → key-rotation.md
- Observability & evidence packs → observability.md, backups-restore-ediscovery.md
Guardrails (quick checklist)
- Every control has a named owner, policy reference, evidence query, and test cadence.
- Controls are policy-as-code with signatures and versioning; changes require ADRs.
- CI/CD gates enforce signatures, lints, and conformance tests before deploy.
- Evidence is immutable, PII-safe, and tied to policyVersion for audit traceability.
Authentication & Authorization Strategy
ATP uses token-based authentication with RBAC for coarse permissions and ABAC for contextual enforcement (see multitenancy-tenancy.md §5). All service-to-service calls are sender-constrained (mTLS or DPoP), and purpose-bound tokens are required at every boundary. Break-glass is time-boxed, dual-controlled, and fully audited (see §19 in multitenancy-tenancy.md).
Identities & Token Types
- Human users
  - OIDC interactive flows (Auth Code + PKCE).
  - JWTs carry tenant, scopes, and entitlements resolved from the Identity context.
  - Short-lived access tokens (≤ 15 min) + refresh tokens (rotating).
- Workloads (services, jobs, agents)
  - mTLS between services; client credentials to obtain short-lived access tokens (≤ 5 min).
  - Managed identities (where supported) for secret-less bootstrap.
  - For ingestion agents, optionally scoped API keys (HMAC) with tight IP/rate/TTL bounds.
- Delegation (On-Behalf-Of)
  - OBO exchange produces a down-scoped service token that encodes the original user (via the act/obo claim) and strips disallowed scopes.
Claims Model (sketch)
Tokens are purpose-bound and residency-aware. Minimal claim set (the // comments are annotations, so the block is not strict JSON):
```jsonc
{
  "iss": "https://id.connectsoft.example",
  "sub": "svc-query",              // user or workload identity
  "aud": "atp-api",
  "exp": 1730190000,
  "nbf": 1730189100,
  "jti": "4d1c...",
  "tenant_id": "7c1a-...",
  "edition": "enterprise",
  "scopes": ["evidence.read", "index.query"],
  "entitlements": ["ediscovery.viewer"],
  "purpose": "default",            // e.g., dsar_export, ops_triage, dr_failover_review
  "region_code": "EU",
  "data_silo_id": "silo-7c1a...",
  "policy_version": "3.3.0",
  "break_glass": false,
  "obo": { "sub": "user@tenant", "amr": ["mfa"] }  // present for OBO flows
}
```
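After a token's signature has been verified against JWKS, the claim set above still needs content checks. This Python sketch assumes the minimal claim list shown (minus the optional obo, break_glass, and descriptive members), the ≤ 15 min human and ≤ 5 min workload lifetime caps stated under Identities & Token Types, and a 60-second clock-skew allowance; the function and constant names are hypothetical.

```python
from __future__ import annotations

REQUIRED_CLAIMS = ("iss", "sub", "aud", "exp", "nbf", "jti", "tenant_id",
                   "purpose", "region_code", "policy_version")
MAX_LIFETIME_SECONDS = {"human": 15 * 60, "workload": 5 * 60}


def claim_problems(claims: dict, principal: str, now: int, skew: int = 60) -> list[str]:
    """Check claim presence, lifetime cap, and validity window (signature checked elsewhere)."""
    problems = [f"missing:{name}" for name in REQUIRED_CLAIMS if claims.get(name) in (None, "")]
    exp, nbf = claims.get("exp"), claims.get("nbf")
    if isinstance(exp, int) and isinstance(nbf, int):
        if exp - nbf > MAX_LIFETIME_SECONDS[principal]:
            problems.append("lifetime_exceeded")
        if now + skew < nbf:
            problems.append("not_yet_valid")
        if now - skew >= exp:
            problems.append("expired")
    return problems


if __name__ == "__main__":
    sample = {"iss": "https://id.connectsoft.example", "sub": "svc-query", "aud": "atp-api",
              "nbf": 1730189100, "exp": 1730189400, "jti": "4d1c...", "tenant_id": "7c1a-...",
              "purpose": "default", "region_code": "EU", "policy_version": "3.3.0"}
    print(claim_problems(sample, "workload", now=1730189200))                      # []
    print(claim_problems({**sample, "purpose": ""}, "workload", now=1730189200))  # ['missing:purpose']
```

An empty purpose is treated the same as a missing one, which keeps the check aligned with the deny-by-default rule for purpose claims.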
RBAC + ABAC (how decisions are made)
- RBAC grants baseline abilities (e.g., ediscovery.viewer, retention.operator).
- ABAC enforces tenant/region/category/purpose at request time:
  - Same-region writes only; cross-region reads/exports deny-by-default unless profile allows.
  - Exports must match an allowed route (in_region, same_code, global) per residency profile.
  - Missing purpose or residency attributes → deny.
Rego snippet (gateway PEP-1, sketch)
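```rego
package atp.authz

import rego.v1

default allow := false

# Reads: same tenant, same region, and an explicit purpose claim.
allow if {
	input.op == "read"
	input.token.tenant_id == input.resource.tenantId
	input.token.region_code == input.resource.regionCode
	input.token.purpose != ""
}

# Exports: same tenant, DSAR purpose, and a route the residency profile permits.
# residencyProfile is assumed to be resolved by the gateway from the Residency Catalog.
allow if {
	input.op == "export"
	input.token.tenant_id == input.resource.tenantId
	input.token.purpose == "dsar_export"
	input.export.route == allowed_route[input.resource.residencyProfile]
}

allowed_route := {"gdpr-standard": "in_region"}
```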
Service-to-Service Patterns
- mTLS + client credentials
  - Workload obtains a sender-constrained access token (mTLS or DPoP).
  - PEP verifies TLS binding and token audience; rejects bearer-only tokens on internal hops.
- Scoped API keys (ingestion only)
  - HMAC-signed requests with key id (kid) and rotating secret.
  - Constraints baked into key metadata: tenant, category, IP/CIDR, max QPS, expiry.
  - Keys can only append; no read/export scopes.
HMAC header (example)
```http
Authorization: ATP-HMAC kid=ing-01,ts=1730189300,nonce=2f9c...,sig=base64(hmac_sha256(secret, method|path|ts|nonce|sha256(body)))
```
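To make the header format concrete, here is a Python sketch of signing and verifying it. It assumes the string-to-sign is the five fields joined by "|" with the body digest as lowercase hex, a ±300 s timestamp window, and an in-memory nonce set; ATP's exact canonicalization, key lookup, and replay store are not specified in this document, and the function names are hypothetical.

```python
from __future__ import annotations

import base64
import hashlib
import hmac


def string_to_sign(method: str, path: str, ts: int, nonce: str, body: bytes) -> bytes:
    """Assumed canonical form: METHOD|path|ts|nonce|hex(sha256(body)), UTF-8 encoded."""
    body_digest = hashlib.sha256(body).hexdigest()
    return f"{method.upper()}|{path}|{ts}|{nonce}|{body_digest}".encode("utf-8")


def sign(secret: bytes, method: str, path: str, ts: int, nonce: str, body: bytes) -> str:
    mac = hmac.new(secret, string_to_sign(method, path, ts, nonce, body), hashlib.sha256)
    return base64.b64encode(mac.digest()).decode("ascii")


def authorization_header(kid: str, secret: bytes, method: str, path: str,
                         body: bytes, ts: int, nonce: str) -> str:
    """Value for the Authorization header (without the 'Authorization: ' prefix)."""
    sig = sign(secret, method, path, ts, nonce, body)
    return f"ATP-HMAC kid={kid},ts={ts},nonce={nonce},sig={sig}"


def verify(header: str, secrets_by_kid: dict[str, bytes], method: str, path: str,
           body: bytes, now: int, seen_nonces: set[str], max_skew: int = 300) -> bool:
    """Check scheme, key id, timestamp window, nonce replay, and signature."""
    scheme, _, params = header.partition(" ")
    if scheme != "ATP-HMAC":
        return False
    try:
        fields = dict(part.split("=", 1) for part in params.split(","))
        kid, ts, nonce, sig = fields["kid"], int(fields["ts"]), fields["nonce"], fields["sig"]
    except (KeyError, ValueError):
        return False  # malformed header
    secret = secrets_by_kid.get(kid)
    if secret is None or abs(now - ts) > max_skew or nonce in seen_nonces:
        return False  # unknown/rotated key, stale timestamp, or replayed nonce
    expected = sign(secret, method, path, ts, nonce, body)
    if not hmac.compare_digest(expected.encode(), sig.encode()):
        return False
    seen_nonces.add(nonce)  # remember the nonce only after the signature checks out
    return True
```

In a real deployment the nonce store would likely need to be shared across gateway instances, and it only has to retain nonces for the length of the timestamp window.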
Propagation & Context
- Gateway adds signed context headers (immutable within hop): X-Policy-Decision, X-Policy-Version, X-Region-Code, X-Data-Silo-Id, X-Correlation-Id.
- Services must re-evaluate ABAC with local resource attributes; never trust caller’s tenant/region blindly.
Token Lifetimes & Rotation
- Access tokens: ≤ 15 min (humans), ≤ 5 min (workloads); clock-skew tolerant.
- Refresh tokens: rotating; revoke on logout / risk events.
- JWKS: kid pinning; staged key rotation with overlap; reject stale kid after grace period.
- DPoP/mTLS binding: required for privileged scopes (export, purge, residency admin).
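"Reject stale kid after grace period" can be expressed as a small acceptance check that runs before signature verification. This Python sketch assumes each pinned key records when it was retired; the SigningKey type and kid_acceptable function are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SigningKey:
    kid: str
    retired_at: float | None = None  # None while the key is current


def kid_acceptable(kid: str, keys: dict[str, SigningKey], now: float, grace_seconds: float) -> bool:
    """Accept a token's kid only if pinned and still current or inside the rotation grace window."""
    key = keys.get(kid)
    if key is None:
        return False  # unknown kid: not in the pinned set
    if key.retired_at is None:
        return True
    return now < key.retired_at + grace_seconds
```

One reasonable choice is a grace window at least as long as the longest access-token lifetime (15 min here), so tokens minted just before a rotation can finish their normal life.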
Break-Glass Workflow (controlled exception)
- Request: operator submits justification, scope (tenantId, regionCode, category, op), and desired TTL (≤ 4h).
- Dual approval: Security + DPO/Legal; system issues a scoped token with break_glass=true and approval_id.
- Enforcement: PEPs only allow routes covered by the approval; all calls write explicit evidence (break_glass.used).
- Aftercare: auto-expiry → revoke; create post-mortem ticket with linked logs/metrics.
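The request and approval rules above can be validated before any break-glass token is minted. This Python sketch assumes approvals are recorded as approver id → role, uses hypothetical role names security and dpo_legal, and treats blank or "*" scope fields as too broad; none of these names come from ATP's actual approval service.

```python
from __future__ import annotations

from dataclasses import dataclass, field

MAX_TTL_SECONDS = 4 * 60 * 60
REQUIRED_APPROVER_ROLES = {"security", "dpo_legal"}  # hypothetical role identifiers


@dataclass
class BreakGlassRequest:
    requester: str
    justification: str
    tenant_id: str
    region_code: str
    category: str
    op: str
    ttl_seconds: int
    approvals: dict[str, str] = field(default_factory=dict)  # approver id -> role


def issuance_problems(req: BreakGlassRequest) -> list[str]:
    """Return reasons to refuse a scoped break-glass token; an empty list means it may be issued."""
    problems: list[str] = []
    if not req.justification.strip():
        problems.append("missing_justification")
    scope = (req.tenant_id, req.region_code, req.category, req.op)
    if any(not part or part == "*" for part in scope):
        problems.append("scope_not_specific")  # no blank or wildcard scope fields
    if not 0 < req.ttl_seconds <= MAX_TTL_SECONDS:
        problems.append("ttl_out_of_bounds")
    # Self-approval does not count; both required roles must come from other people.
    roles = {role for approver, role in req.approvals.items() if approver != req.requester}
    if not REQUIRED_APPROVER_ROLES <= roles:
        problems.append("dual_approval_missing")
    return problems
```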
Common Flows
```mermaid
sequenceDiagram
    autonumber
    participant U as User
    participant ID as Identity (OIDC)
    participant GW as API Gateway (PEP-1)
    participant S as Service (PEP-2)
    participant K as KMS/HSM
    U->>ID: Auth Code + PKCE
    ID-->>U: Access Token (JWT)
    U->>GW: Request (JWT)
    GW->>GW: RBAC + ABAC (purpose/tenant/region)
    GW->>S: Forward + signed X-Policy-* headers (mTLS)
    S->>S: ABAC on resource attrs
    S->>K: (optional) Sign/Wrap (mTLS)
    S-->>GW: Response + decision logs
    GW-->>U: Response
```
Guardrails (quick checklist)
- Require purpose and residency claims for all sensitive routes; deny-by-default without them.
- All internal hops are sender-constrained (mTLS/DPoP); bearer-only tokens are rejected inside the mesh.
- Enforce OBO down-scoping; never forward end-user tokens directly to storage.
- API keys are append-only, scoped, rotated, and rate-limited; no read/export via keys.
- Break-glass tokens are time-boxed, scope-limited, dual-approved, and leave a full audit trail.
- Token/JWKS rotation is staged with overlap; stale kid is blocked after grace.
Network & Boundary Controls
ATP's network follows a hub–spoke, private-by-default design. All data-plane paths are east–west over private networks with mTLS in the mesh; ingress is only through the API Gateway; egress is restricted to an explicit allowlist. Cross-region topology aligns with data-residency-retention.md §3.
Topology & Isolation
- Per-region VPC/VNet isolation with separate spokes for:
  - Gateway (ingress), Services (ingestion/query/export), and Data (Hot/WORM, Read Models).
- Private endpoints/Private Link to all stateful stores (no public data-plane endpoints).
- Hub provides shared services: egress gateway, DNS, SIEM collectors, policy distribution.
- Peering is intra-region only; cross-region traffic uses private interconnect and is governed by residency policies.
```mermaid
flowchart LR
    subgraph "Region (EU West)"
        HUB[[Hub]]
        GW["API Gateway (WAF/DDoS/PEP-1)"]
        SVC1["Ingestion Svc (PEP-2)"]
        SVC2["Query Svc (PEP-2)"]
        HOT["(Hot/WORM Storage)<br/>Private Link"]
        WARM["(Read Models)<br/>Private Link"]
        EGR[Egress Gateway]
        DNS[Private DNS]
        HUB---GW
        HUB---EGR
        HUB---DNS
        GW---SVC1
        GW---SVC2
        SVC1---HOT
        SVC2---WARM
    end
    subgraph "Region (EU North)"
        GWN[Gateway Mirror]
        SVCN[Read Replica]
        WARMR["(Warm Replica)<br/>Private Link"]
    end
    SVC2 ==>|Private Interconnect / Mesh mTLS| SVCN
```
Ingress (API Gateway)
- WAF with OWASP CRS (SQLi, XSS, RCE, SSRF) + custom rules:
  - Block query strings with known attack payloads, XML entity expansion, over-sized bodies.
- DDoS protections, rate limiting (per IP/tenant), and bot management.
- TLS: modern ciphers, HSTS, TLS ≥ 1.2 (prefer 1.3); optional SPKI pinning for operator portals.
- Operator endpoints behind IP allowlists and MFA-backed auth flows.
- Context sealing: adds signed X-Policy-* headers and correlation IDs for downstream enforcement.
East–West (Service Mesh Policies)
- mTLS STRICT for all service-to-service traffic; deny-by-default authorization in mesh.
- Workload identity per service (SPIFFE/SPIRE or equivalent) → policy references identity + namespace.
- L7 authz: only allow paths/verbs required for the contract; block wildcards.
- Sidecar egress: forced through the egress gateway; direct internet blocked.
Data Plane Boundaries
- Private Link to data stores; storage accounts/network rules deny public access.
- Attribute checks at boundary: tenant/region/category must match before read/write.
- WORM backends enforce immutability at storage layer; admins cannot bypass via network.
Egress Controls (Exfiltration Prevention)
- Allowlist-only: outbound flows restricted to approved FQDNs/CIDRs (e.g., timestamping authority, compliance webhooks).
- DNS: private resolvers with split-horizon; public DNS calls blocked except for allowlisted domains.
- NAT/Egress gateway: single choke point with logging; no direct pod-to-internet.
- Data residency hooks: cross-region routes are denied by default; if allowed, path must be private and same RegionCode.
Egress allowlist (sketch)
```yaml
egress:
  default: deny
  allow:
    - name: tsa-rfc3161
      host: tsa.example.tld
      ports: [443]
      purpose: integrity_timestamp
    - name: audit-webhook
      host: audit.soc2-partner.tld
      ports: [443]
      purpose: evidence_delivery
dns:
  blockPublic: true
  privateZones: [ "svc.cluster.local", "priv.atp.internal" ]
mesh:
  mtls: strict
  outbound:
    viaEgressGateway: true
```
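The allowlist is enforced by the egress gateway and mesh, but its default-deny semantics are easy to state as code. This Python sketch mirrors the two sample rules and assumes exact hostname matching (case-insensitive, trailing dot ignored) with no wildcards; the EgressRule type and egress_decision function are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class EgressRule:
    name: str
    host: str
    ports: frozenset[int]
    purpose: str


# Mirrors the sample allowlist; everything else falls through to default deny.
ALLOWLIST = (
    EgressRule("tsa-rfc3161", "tsa.example.tld", frozenset({443}), "integrity_timestamp"),
    EgressRule("audit-webhook", "audit.soc2-partner.tld", frozenset({443}), "evidence_delivery"),
)


def egress_decision(host: str, port: int) -> tuple[bool, str]:
    """Return (allowed, rule name or 'default_deny') using exact host + port matching."""
    normalized = host.lower().rstrip(".")
    for rule in ALLOWLIST:
        if normalized == rule.host and port in rule.ports:
            return True, rule.name
    return False, "default_deny"
```

Exact matching avoids the classic mistakes of substring or unanchored suffix checks, which can admit look-alike hosts such as tsa.example.tld.attacker.example or eviltsa.example.tld.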
Regional Network Topology (residency-aware)
- Authoritative writer stays in the tenant’s CloudRegion; replicas (if any) live only in permitted RegionCodes.
- No cross-family peering (e.g., EU↔US) unless the residency profile explicitly allows and a legal basis is recorded.
- Failover posture defaults to read-only (see DR sections), with routing changes broadcast via the Residency Catalog.
Operational Controls & Evidence
- Continuous validation: policy conformance tests (deny direct public egress, deny storage public endpoints).
- Telemetry: network.egress.blocked.count, waf.blocked.count, mesh.denied.count, private_link.bytes.
- Forensics: packet captures at the egress gateway (on-demand), flow logs, and WAF request samples (redacted).
Guardrails (quick checklist)
- All stateful services use Private Link; public access disabled at resource level.
- Ingress only through the API Gateway (WAF/DDoS/rate-limit); operator endpoints IP-allowlisted.
- Mesh enforces mTLS STRICT + deny-by-default; internal calls are sender-constrained.
- Egress is allowlisted and routed via a single gateway; DNS is private and controlled.
- Cross-region connectivity respects residency profiles; unauthorized paths are blocked and logged.
Application Security & Secure SDLC
ATP’s SDLC bakes security in from design → code → build → deploy → operate. We enforce secure defaults, verify via automated gates, and record decisions in security-focused ADRs.
Secure Coding Standards
- Validate at boundaries: length/range/format, strict allowlists, reject-by-default.
- Encode on output: HTML/JS/URL/SQL contexts; never concatenate into queries.
- Parameterized queries everywhere (ORM or parameterized SQL); no string interpolation.
- Canonicalize inputs before comparison; normalize Unicode; trim invisible characters.
- Least privilege for service accounts; deny filesystem/network access by default.
- Safe deserialization: use allowlists; avoid dynamic type binders.
- Secrets never logged; structured logs use redaction providers.
- CSRF/CORS: same-site cookies, anti-forgery tokens, explicit origins allowlist.
- Do not trust client-side checks; repeat server-side verification.
- Crypto: approved algorithms only; use platform crypto APIs; no homegrown crypto.
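Canonicalizing before comparison can be sketched with Python's standard unicodedata module: dropping format (Cf) and control (Cc) characters removes zero-width and bidi-override characters, and NFKC then folds compatibility forms (such as full-width letters) to their plain equivalents. Whether to case-fold depends on whether the identifier is case-sensitive, so it is a parameter here; the function name is hypothetical, and stripping all Cf characters assumes the input is an identifier rather than free text.

```python
import unicodedata

_STRIPPED_CATEGORIES = {"Cf", "Cc"}  # format (zero-width, bidi overrides) and control characters


def canonicalize(value: str, fold_case: bool = False) -> str:
    """Return a canonical form of an identifier for equality checks on both sides."""
    # Remove invisible characters first so NFKC can compose sequences they were splitting.
    visible = "".join(ch for ch in value if unicodedata.category(ch) not in _STRIPPED_CATEGORIES)
    normalized = unicodedata.normalize("NFKC", visible).strip()
    return normalized.casefold() if fold_case else normalized
```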
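Example (C# parameterized query, ADO.NET with MySqlConnector)
```csharp
// Requires the MySqlConnector package (MySql.Data exposes the same type under MySql.Data.MySqlClient).
using MySqlConnector;

// Tenant and time window are bound as parameters; nothing is concatenated into the SQL text.
using var cmd = new MySqlCommand(
    "SELECT * FROM Events WHERE TenantId = @tenant AND CreatedAt >= @from AND CreatedAt < @to",
    conn);
cmd.Parameters.AddWithValue("@tenant", tenantId);
cmd.Parameters.AddWithValue("@from", fromUtc);
cmd.Parameters.AddWithValue("@to", toUtc);
```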
Example (output encoding in Razor)
Dependency & Supply Chain Hygiene
- SBOM generation per build (CycloneDX/SPDX) and diff alerts in CI.
- SAST on every PR; DAST (baseline + authenticated) per release.
- Container/IaC scanning (Dockerfiles, Helm/Bicep/Terraform) with policy gates.
- Sigstore/cosign verification for images and signed OPA policy bundles.
- Pinned versions; no latest tags. Lockfiles committed. Transitive deps monitored.
- License compliance: allowlist; flag copyleft where redistribution applies.
Azure DevOps pipeline (sketch)
```yaml
stages:
  - stage: security_checks
    jobs:
      - job: sbom_sast_iac
        steps:
          - script: dotnet tool restore # restore local tools from the repo's tool manifest
          - script: dotnet build /p:ContinuousIntegrationBuild=true
          - script: dotnet tool run cyclonedx # SBOM (command name depends on the tool manifest)
          - script: dotnet tool run securityscan # SAST (e.g., security code scan)
          - script: trivy fs --scanners vuln,secret,misconfig --exit-code 1 . # deps + secrets + IaC misconfig
          - script: trivy image --severity HIGH,CRITICAL --exit-code 1 $(imageName) # container scan
      - job: dast
        steps:
          - script: zap-baseline.py -t $(DEPLOYED_URL) -r zap.html
            condition: and(succeeded(), eq(variables['RunDAST'], 'true'))
  - stage: sign_and_verify
    jobs:
      - job: cosign_verify
        steps:
          - script: cosign verify --key $(COSIGN_PUB) $(imageName)
```
Secrets Management¶
- No secrets in code, images, or pipelines. Use Managed Identity → Key Vault/HSM.
- Short-lived credentials only; rotate keys automatically; alert on static creds.
- Secret discovery scans in CI; policy blocks merges if detected.
- App config uses Key Vault references; services fetch at startup with retry/jitter.
- Zero plaintext: TLS in transit, encrypted at rest; sensitive config redacted in logs.
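The "fetch at startup with retry/jitter" rule can be sketched as exponential backoff with full jitter. The `fetch` callable is a hypothetical stand-in for a Key Vault SDK call, and the attempt count and delays are assumptions; a real client would retry only transient errors rather than every `Exception`.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def fetch_with_retry(
    fetch: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 0.2,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fetch` until it succeeds, backing off with full jitter between tries."""
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: fail startup rather than run without secrets
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, cap))  # full jitter: spread retries over [0, cap]
    raise RuntimeError("attempts must be >= 1")
```

Full jitter keeps many replicas that restart together from hammering the vault in lockstep.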
Code Review & Security ADRs¶
- Two-person rule for security-affecting changes (auth, crypto, policy, network).
- Security checklist on PRs:
- Input validation/encoding ✔
- Authorization at boundary (PEP-1/PEP-2) ✔
- Residency/tenancy attributes plumbed ✔
- Secrets removed; config via Key Vault ✔
- Logging PII-safe; correlation IDs present ✔
- Tests added: happy path, deny path, negative/abuse stories ✔
- Security ADR required when changing: authN/Z flows, encryption, data movement, egress, policy engines.
- Template fields: Context, Decision, Alternatives, Risks, Rollout & Rollback, Evidence hooks.
CI/CD Security Gates¶
- Fail fast on: unsigned image/policy bundle, high/critical CVE, secret leak, failing SAST/DAST, policy lint errors.
- Policy-as-code enforcement (OPA) for: residency routes, network egress, image provenance, SBOM allowlist.
- Canary with guards: enable new controls for a cohort; auto-rollback on guard regressions (deny spike, integrity fail).
Policy lint (Rego sketch)
```rego
package ci.guardrails

import rego.v1

deny contains msg if {
	input.container.tag == "latest"
	msg := "no 'latest' tags"
}

# A missing or false verification flag both count as unsigned.
deny contains msg if {
	not input.image.signature.verified
	msg := "unsigned image"
}

deny contains msg if {
	some cve in input.sbom.cvss
	cve.score >= 7.0
	msg := "high CVE present"
}
```
Security Testing (what we automate)¶
- Unit/contract tests for authZ decisions, tenancy/residency ABAC, classification redaction.
- Integration tests with private endpoints and mTLS; ensure bearer-only tokens fail in mesh.
- Abuse tests: injection payloads, path traversal, oversized body, replay with stale `jti`.
- Regression packs for past incidents or near-misses.
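The "replay with stale `jti`" abuse test presumes server-side replay tracking. A minimal sketch, assuming an in-memory cache keyed by `jti` that remembers each token until its `exp`; the class name and clock injection are illustrative, and a multi-replica service would need a shared store.

```python
import time
from typing import Callable, Dict


class JtiReplayCache:
    """Reject expired tokens and any jti already seen while its token is still valid."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._seen: Dict[str, float] = {}  # jti -> token expiry (epoch seconds)
        self._clock = clock

    def accept(self, jti: str, exp: float) -> bool:
        now = self._clock()
        # Evict entries whose tokens have expired; those can no longer be replayed.
        self._seen = {k: v for k, v in self._seen.items() if v > now}
        if exp <= now or jti in self._seen:
            return False  # expired token or replayed jti
        self._seen[jti] = exp
        return True
```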
Developer Experience & Guardrails¶
- Secure templates and snippets (parameterized queries, encoding helpers, HTTP clients with mTLS).
- Pre-commit hooks: secret scans, formatting, linting.
- Local dev: seeded test identities, fake Key Vault, local OPA bundle; no real secrets.
- Education: OWASP Top 10 refreshers, crypto dos/don’ts, “how to file a Security ADR”.
Evidence & Acceptance¶
- Artifacts: SBOMs, scan reports, cosign attestations, policy bundle signatures, SAST/DAST dashboards.
- Logs/metrics: `security.scan.failed.count`, `image.unsigned.blocked.count`, `secrets.leak.detected`.
- Done when: the PR passes all security gates, has the security checklist ticked, and changes are recorded in a Security ADR if applicable.
Cross-References¶
- Residency-aware access & data guards → data-residency-retention.md
- Tenancy & ABAC → multitenancy-tenancy.md
- Zero Trust & mesh hardening → zero-trust.md
- Key rotation & escrow → key-rotation.md
Data Security Controls¶
ATP treats data security as a layered control: strong cryptography (at-rest & in-transit), classification & redaction at boundaries, immutability & integrity for evidence, and encrypted backups with secure restore. See data-residency-retention.md §5 and key-rotation.md for keying details; see tamper-evidence.md for integrity; see backups-restore-ediscovery.md for backup/restore.
Encryption (At-Rest)¶
- Envelope encryption:
- DEK (per-tenant, per-artifact class): AES-256-GCM, rotated aggressively (e.g., daily or per-segment).
- KEK (per-region): HSM-anchored RSA-OAEP-256 or AES-KW; no cross-border unwrap.
- Sign-only keys (HSM) for anchors/manifests; separate from KEKs (SoD).
- Key lineage in metadata: `keyId`, `keyVersion`, `alg`, `iv`, `aadHash`, `createdAt`, `regionCode`.
- Policy (excerpt)
```yaml
keys:
  tenantScoped: true
  regionAnchored: true
  algorithms: { encrypt: AES-256-GCM, wrap: RSA-OAEP-256, sign: ECDSA-P256-SHA256 }
  rotation:
    dek: P1D
    kek: P90D
    overlapGrace: P7D
  escrow:
    jurisdiction: match(regionCode)
    dualControl: true
```
- Stores: WORM segments (hot), immutable object storage (cold), indices (warm) all encrypted with tenant DEKs wrapped by regional KEKs.
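One way to read the `rotation` block is as a lifecycle per key version. The sketch below assumes `overlapGrace` is the period after rotation during which the previous version may still decrypt but no longer encrypts; the state names and the day-only duration parser are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone


def parse_days(iso_duration: str) -> timedelta:
    """Parse day-only ISO 8601 durations such as 'P90D' (other forms are out of scope)."""
    match = re.fullmatch(r"P(\d+)D", iso_duration)
    if not match:
        raise ValueError(f"unsupported duration: {iso_duration}")
    return timedelta(days=int(match.group(1)))


def key_state(created_at: datetime, now: datetime, rotation: str, grace: str) -> str:
    """Classify a key version as active, decrypt-only (overlap), or retired."""
    age = now - created_at
    if age < parse_days(rotation):
        return "active"        # may wrap/encrypt new data
    if age < parse_days(rotation) + parse_days(grace):
        return "decrypt_only"  # overlap window: existing data still readable
    return "retired"           # outside overlap: data assumed re-wrapped under a newer version
```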
Artifact encryption header (example)
```json
{
  "tenantId": "7c1a-...",
  "category": "evidence.hot",
  "regionCode": "EU",
  "keyId": "k-eu-01",
  "keyVersion": "8",
  "alg": "AES-256-GCM",
  "iv": "b64:...",
  "aadHash": "sha256:2a9e...",
  "sealedAt": "2025-10-29T07:55:11Z"
}
```
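The header fields can be tied to the ciphertext through AES-GCM associated data, with `aadHash` recording which values were bound. A stdlib-only sketch, assuming the bound fields are the six listed in `AAD_FIELDS` and canonical JSON is used; the actual encryption call is omitted, and the "no cross-border unwrap" rule is shown as a simple region comparison.

```python
import hashlib
import json

# Header fields bound into the AEAD associated data (assumed selection).
AAD_FIELDS = ("tenantId", "category", "regionCode", "keyId", "keyVersion", "alg")


def aad_bytes(header: dict) -> bytes:
    """Canonical JSON of the bound fields: sorted keys, no whitespace."""
    bound = {field: header[field] for field in AAD_FIELDS}
    return json.dumps(bound, sort_keys=True, separators=(",", ":")).encode("utf-8")


def aad_hash(header: dict) -> str:
    return "sha256:" + hashlib.sha256(aad_bytes(header)).hexdigest()


def check_unwrap(header: dict, kek_region: str) -> None:
    """Refuse unwrap when the KEK region differs from the artifact's region or fields drifted."""
    if header["regionCode"] != kek_region:
        raise PermissionError("cross-border unwrap blocked")
    if header.get("aadHash") != aad_hash(header):
        raise ValueError("header fields do not match sealed aadHash")
```

The same `aad_bytes` value would be passed as associated data to the AES-GCM encrypt and decrypt calls, so editing any bound field also makes decryption fail.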
Encryption (In-Transit)¶
- mTLS everywhere in mesh; TLS ≥ 1.2 (prefer 1.3) at ingress; HSTS on public endpoints.
- Sender-constrained tokens (mTLS/DPoP) for privileged scopes (export/purge/residency-admin).
- SPKI pinning optional for operator consoles; cipher suites with forward secrecy only.
- Private endpoints to data stores; no public data-plane listeners.
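For a workload that terminates TLS itself rather than relying on a mesh sidecar, the in-transit rules map onto Python's stdlib `ssl` module roughly as below. Certificate and CA paths are deployment-specific and left as a comment; the cipher string restricts TLS 1.2 to ECDHE (forward-secret) suites, while TLS 1.3 suites are always forward-secret.

```python
import ssl


def mesh_server_context() -> ssl.SSLContext:
    """Server-side context: TLS 1.2 minimum, client certificate (mTLS) required."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # prefer 1.3; never below 1.2
    ctx.verify_mode = ssl.CERT_REQUIRED            # reject peers without a client cert
    ctx.set_ciphers("ECDHE+AESGCM:ECDHE+CHACHA20")  # forward-secret TLS 1.2 suites only
    # ctx.load_cert_chain(certfile, keyfile) and ctx.load_verify_locations(cafile)
    # would load the workload identity and the mesh CA here.
    return ctx
```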
Classification & Redaction¶
- Data classes: `public`, `internal`, `restricted`, `secret.pi` (PII), `secret.phi` (health), `secret.keys` (cryptographic material).
- Classification sources: schema registry tags + runtime detectors for PII fields (email, phone, address, IDs).
- Redaction templates applied at export and in logs:
  - Hash (irreversible, per-tenant salt/pepper): `sha256(salt_tenant || value)`.
  - Mask (keep last N chars): e.g., `****1234`.
  - Drop (remove field).
  - Tokenize (vault-backed, reversible with approval).
- Template (example)
```yaml
redactTemplates:
  pii-default:
    email:   { mode: hash }
    phone:   { mode: hash }
    address: { mode: drop }
    ssn:     { mode: mask, keepLast: 4 }
```
Export guard (Rego sketch)
```rego
package atp.export

import rego.v1

default allow := false

# deny export of secret classes when no purpose is declared
deny contains msg if {
	input.resource.class in {"secret.pi", "secret.phi"}
	object.get(input, ["token", "purpose"], "") == ""
	msg := "export purpose missing for secret class"
}

# require a redact template for PII unless the purpose is DSAR
deny contains msg if {
	input.resource.class == "secret.pi"
	object.get(input, ["token", "purpose"], "") != "dsar_export"
	object.get(input, ["export", "redactTemplate"], "") == ""
	msg := "missing redact template for PII export"
}

allow if count(deny) == 0
```
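Applying the `pii-default` template can be sketched as below. The hash mode follows the `sha256(salt_tenant || value)` form listed above (HMAC-SHA256 with a per-tenant key is a common alternative); passing unclassified fields through unchanged and rejecting `tokenize` (which needs a vault) are assumptions of this sketch.

```python
import hashlib
from typing import Any, Dict

# Mirrors the `pii-default` template above.
PII_DEFAULT: Dict[str, Dict[str, Any]] = {
    "email": {"mode": "hash"},
    "phone": {"mode": "hash"},
    "address": {"mode": "drop"},
    "ssn": {"mode": "mask", "keepLast": 4},
}


def redact(record: Dict[str, str], template: Dict[str, Dict[str, Any]],
           tenant_salt: bytes) -> Dict[str, str]:
    out: Dict[str, str] = {}
    for field, value in record.items():
        rule = template.get(field)
        if rule is None:
            out[field] = value  # unclassified field passes through (assumption)
        elif rule["mode"] == "hash":
            digest = hashlib.sha256(tenant_salt + value.encode("utf-8")).hexdigest()
            out[field] = "sha256:" + digest  # sha256(salt_tenant || value)
        elif rule["mode"] == "mask":
            keep = rule.get("keepLast", 4)
            tail = value[-keep:] if keep > 0 else ""
            out[field] = "*" * (len(value) - len(tail)) + tail
        elif rule["mode"] == "drop":
            continue  # field removed entirely
        else:
            raise ValueError(f"unsupported mode {rule['mode']!r}")
    return out
```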
Immutability & Integrity (Evidence)¶
- WORM: append-only segments; admins cannot rewrite historical data.
- Merkle chains per stream/segment; anchors signed by regional HSM keys; optional RFC 3161 TSA timestamping.
- Verification:
- On-write: verify segment before seal.
- On-read sampling: verify Merkle path and anchor signature.
- Scheduled: rolling verification that reaches 100% coverage within each policy window.
- Migration: bridge anchors with provenance when moving regions; old anchors preserved.
Integrity manifest (excerpt)
```yaml
anchor:
  id: "A_000145"
  stream: "aud.gateway"
  region: "westeurope"
  merkleRoot: "b64:R_n"
  signedBy: "hsm-eu-01"
  tsa: { token: "b64:..." }
  policyVersion: "3.3.0"
```
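The on-read check ("verify Merkle path") can be sketched with SHA-256 and leaf/node domain separation. Pairing an odd node with itself is one common convention and an assumption here (RFC 6962 handles odd counts differently); anchor signature and TSA verification are not shown.

```python
import hashlib
from typing import List, Tuple


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves: List[bytes]) -> bytes:
    level = [_h(b"\x00" + leaf) for leaf in leaves]  # 0x00 prefix marks leaves
    if not level:
        raise ValueError("empty segment")
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # odd node paired with itself (assumed convention)
        level = [_h(b"\x01" + level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def merkle_proof(leaves: List[bytes], index: int) -> List[Tuple[bytes, bool]]:
    """Sibling hashes from leaf to root; the flag is True when the sibling is on the right."""
    level = [_h(b"\x00" + leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sibling = index ^ 1
        proof.append((level[sibling], sibling > index))
        level = [_h(b"\x01" + level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof


def verify_path(leaf: bytes, proof: List[Tuple[bytes, bool]], root: bytes) -> bool:
    node = _h(b"\x00" + leaf)
    for sibling, sibling_is_right in proof:
        node = _h(b"\x01" + node + sibling) if sibling_is_right else _h(b"\x01" + sibling + node)
    return node == root
```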
Backup Encryption & Secure Restore¶
- Backups: region-local, immutable class, encrypted with tenant DEKs / region KEKs; manifests signed.
- Restore: lands in quarantine (read-only); verify checksums, Merkle roots, anchor signatures, TSA receipts; only then promote.
- Access: restore operations require scoped tokens; logs include `tenantId`, `regionCode`, `policyVersion`, `keyId`, `keyVersion`.
Backup policy (excerpt)
```yaml
backup:
  regionLocal: true
  encrypt: { keyIdFrom: tenant_scope, hsm: true }
  sign: { cms: true, tsa: true }
restore:
  quarantine: { posture: read_only, ttl: P7D }
  verification: [ checksum, merkle, anchor, tsa ]
```
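The quarantine-then-promote rule can be sketched as running every configured verification and promoting only when all pass. The check callables are hypothetical stand-ins for real checksum, Merkle, anchor, and TSA verifiers; a missing check counts as a failure so quarantine stays the default.

```python
from typing import Callable, Dict, List

# Verification order from the policy excerpt above.
VERIFICATION_STEPS: List[str] = ["checksum", "merkle", "anchor", "tsa"]


def promote_restore(checks: Dict[str, Callable[[], bool]]) -> Dict[str, object]:
    """Run every check (for a complete report); promote only when all of them pass."""
    results: Dict[str, bool] = {}
    for step in VERIFICATION_STEPS:
        check = checks.get(step)
        results[step] = bool(check()) if check else False  # missing check = fail
    promoted = all(results.values())
    return {"state": "promoted" if promoted else "quarantined", "results": results}
```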
Evidence & Monitoring¶
- Metrics: `encryption.key.rotate.count`, `integrity.verify.fail.count`, `export.redacted.fields`, `backup.success.count`, `restore.verify.fail.count`.
- Logs: key ops with lineage (`keyId`, `keyVersion`), export redaction maps (no raw values), integrity verify outcomes.
- Alerts: integrity failure, unsigned artifact detected, KEK rotation overdue, cross-border unwrap attempt.
Guardrails (quick checklist)¶
- Never store or transmit plaintext secrets; use Managed Identity + Key Vault/HSM.
- Encryption keys are tenant-scoped (DEK) and region-anchored (KEK); no cross-border unwrap.
- Evidence is immutable; all integrity proofs are verifiable offline (anchors + TSA).
- Exports of sensitive classes require purpose and redaction templates; DSAR is the exception pathway.
- Backups are encrypted & signed; restores are quarantined and verified before promotion.
Least Privilege & Policy Enforcement¶
ATP enforces minimum necessary access with time-bound, purpose-bound grants. Authorization is expressed as policy-as-code (OPA/Rego) and applied at every boundary (gateway, services, storage, export). Every decision is captured in a meta-audit stream tied to policyVersion.
Principles¶
- Deny by default; no ambient privileges.
- Just-Enough Authorization (JEA): scopes and actions are as narrow as possible.
- Just-In-Time (JIT) elevation: short TTL, automatic expiry, dual approval for sensitive ops.
- Purpose binding: tokens must include a `purpose` claim aligned with the requested operation.
- Contextual ABAC: decisions consider `tenantId`, `regionCode`, `dataSiloId`, `category`, `edition`, and `purpose`.
JIT Elevation (privilege broker)¶
- Operators request elevation with scope, resource, purpose, and TTL (≤ 4h).
- Dual approval for admin/sensitive categories (purge, export, residency admin).
- Broker issues a down-scoped token (`break_glass=false` for routine JIT; `true` only for emergency flows) with: an `entitlements` subset, `purpose`, `resource_scope` (tenant/region/category), `ttl`, `approval_id`.
- PEPs enforce TTL and scope; all calls emit `elevation.used` evidence and attach `approval_id`.
Privilege grant (example)
```json
{
  "approval_id": "apr_01J6...",
  "requestor": "alice@ops",
  "approvers": ["sec.ops", "dpo"],
  "scope": ["retention.simulate"],
  "resource_scope": { "tenantId": "7c1a-...", "regionCode": "EU", "category": "evidence.hot" },
  "purpose": "ops_triage",
  "ttl": "PT1H"
}
```
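A broker-side validation of such a grant might look like the sketch below: TTL capped at 4h, purpose required, and two approvers other than the requestor for sensitive scopes. The sensitive scope prefixes are illustrative (derived from purge/export/residency admin) and the duration parser only handles `PTnH`/`PTnM` forms.

```python
import re
from datetime import timedelta
from typing import Any, Dict

MAX_TTL = timedelta(hours=4)
SENSITIVE_PREFIXES = ("purge", "export", "residency.admin")  # illustrative


def parse_pt(duration: str) -> timedelta:
    """Parse time-only ISO 8601 durations like 'PT1H' or 'PT30M'."""
    m = re.fullmatch(r"PT(?:(\d+)H)?(?:(\d+)M)?", duration)
    if not m or not any(m.groups()):
        raise ValueError(f"unsupported duration: {duration}")
    return timedelta(hours=int(m.group(1) or 0), minutes=int(m.group(2) or 0))


def validate_grant(grant: Dict[str, Any]) -> None:
    if parse_pt(grant["ttl"]) > MAX_TTL:
        raise ValueError("ttl exceeds 4h")
    if not grant.get("purpose"):
        raise ValueError("purpose required")
    sensitive = any(s.startswith(SENSITIVE_PREFIXES) for s in grant["scope"])
    approvers = set(grant.get("approvers", [])) - {grant["requestor"]}  # no self-approval
    if sensitive and len(approvers) < 2:
        raise ValueError("dual approval required for sensitive scope")
```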
Policy-as-Code (OPA/Rego)¶
- Policies cover tenancy/residency, classification/redaction, quota/cost, and rate limits.
- Bundles are signed, versioned, and hot-reloaded; decisions stamp `policyVersion`.
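Rego (ABAC + purpose, sketch)
```rego
package atp.guard

import rego.v1

default allow := false

deny contains msg if {
	object.get(input, ["token", "purpose"], "") == ""
	msg := "missing purpose"
}

# Defaults make a missing claim or attribute fail closed instead of silently passing.
deny contains msg if {
	object.get(input, ["token", "tenant_id"], null) != object.get(input, ["resource", "tenantId"], "")
	msg := "tenant mismatch"
}

deny contains msg if {
	object.get(input, ["token", "region_code"], null) != object.get(input, ["resource", "regionCode"], "")
	msg := "cross-region blocked"
}

allow if {
	input.op == "read"
	"evidence.read" in input.token.scopes
	count(deny) == 0
}
```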
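Rego (quota + egress guard, sketch)
```rego
package atp.quota

import rego.v1

# Unknown tenants get zero remaining quota, so a missing entry blocks rather than allows.
deny contains msg if {
	input.op == "export"
	remaining := object.get(data.quota, [input.token.tenant_id, "export_monthly_remaining"], 0)
	input.request.bytes > remaining
	msg := "export quota exceeded"
}

deny contains msg if {
	input.op == "export"
	allowed := object.get(data.residency, [input.token.residency_profile, "allowed_route"], "")
	input.route != allowed
	msg := "export route not allowed"
}
```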
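The deny paths called out for contract testing (cross-tenant, cross-region, missing purpose) can be mirrored in plain Python so the expected outcomes are pinned down as unit tests. This mirror is illustrative; running the same cases with `opa test` against the Rego itself is the more direct option.

```python
from typing import Any, Dict, List


def guard_denies(inp: Dict[str, Any]) -> List[str]:
    """Python mirror of the atp.guard deny rules (fail closed on missing claims)."""
    token = inp.get("token", {})
    resource = inp.get("resource", {})
    reasons = []
    if not token.get("purpose"):
        reasons.append("missing purpose")
    tenant = token.get("tenant_id")
    if tenant is None or tenant != resource.get("tenantId"):
        reasons.append("tenant mismatch")
    region = token.get("region_code")
    if region is None or region != resource.get("regionCode"):
        reasons.append("cross-region blocked")
    return reasons


def guard_allows(inp: Dict[str, Any]) -> bool:
    scopes = inp.get("token", {}).get("scopes", [])
    return inp.get("op") == "read" and "evidence.read" in scopes and not guard_denies(inp)
```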
Runtime Enforcement (PEP-1 / PEP-2)¶
- PEP-1 (Gateway): coarse checks (authn, rate/WAF, tenancy/residency route, basic quota). Adds signed `X-Policy-*` headers and `X-Correlation-Id`.
- PEP-2 (Services): fine-grained ABAC on resource attributes (tenant/region/category), classification/redaction, export routes, cost/egress checks.
- Data boundary: verify attributes again before any read/write; WORM backends enforce immutability irrespective of caller.
Decision Logging (meta-audit stream)¶
Every guard produces a structured decision:
```jsonc
{
  "ts": "2025-10-29T08:45:15Z",
  "policyVersion": "3.3.0",
  "decision": "deny|allow|quarantine",
  "reason": "cross_region_blocked|quota_exceeded|ok",
  "tenantId": "7c1a-...",
  "regionCode": "EU",
  "category": "evidence.hot",
  "op": "read|write|export|purge",
  "purpose": "dsar_export",
  "correlationId": "6b3f-...",
  "approval_id": "apr_01J6...",  // present when JIT or break-glass applies
  "latencyMs": 7
}
```
- No raw PII; sensitive fields are hashed/redacted per log policy.
- Decisions are immutable (append-only sink) and feed SIEM and compliance evidence packs.
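The append-only sink can additionally be made tamper-evident by hash-chaining records, so that any edit to a past decision breaks verification. This is a sketch of one such technique (an assumption, not necessarily how the ATP sink is built); decision dicts are expected to already carry hashed/redacted fields.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS = "0" * 64


class DecisionLog:
    """Append-only, hash-chained sink: each record commits to its predecessor."""

    def __init__(self) -> None:
        self._records: List[Dict[str, Any]] = []

    def append(self, decision: Dict[str, Any]) -> str:
        prev = self._records[-1]["hash"] if self._records else GENESIS
        body = json.dumps(decision, sort_keys=True, separators=(",", ":"))
        digest = hashlib.sha256((prev + body).encode("utf-8")).hexdigest()
        self._records.append({"prev": prev, "body": body, "hash": digest})
        return digest

    def verify(self) -> bool:
        prev = GENESIS
        for rec in self._records:
            expected = hashlib.sha256((prev + rec["body"]).encode("utf-8")).hexdigest()
            if rec["prev"] != prev or rec["hash"] != expected:
                return False
            prev = rec["hash"]
        return True
```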
Operational Flow (example)¶
- Client calls Gateway with JWT (purpose, tenant, region).
- PEP-1 evaluates OPA; on allow → forwards with signed context.
- Service re-evaluates PEP-2 against resource; if export, checks redaction and quota/route.
- Decision is logged; on deny, returns `403 GuardViolation` with a reason code.
Metrics & Alerts¶
- Metrics: `abac.allow.count`, `abac.deny.count`, `quota.violation.count`, `elevation.used.count`.
- Alerts:
  - Spike in deny for `tenant_mismatch` (> +5 pp over baseline).
  - Quota exceed events without prior cost estimate.
  - Elevation used near expiry without closure (post-mortem reminder).
CI/CD Integration¶
- Policy lint and bundle signature checks as pipeline gates.
- Contract tests for deny paths (cross-tenant, cross-region, missing purpose).
- Canary policy rollout with automatic rollback on deny/allow drift.
Guardrails (quick checklist)¶
- Access is deny-by-default; tokens must include purpose and residency claims.
- Elevations are JIT, scoped, TTL-bound, and dual-approved when sensitive.
- Policies are signed & versioned; PEPs stamp `policyVersion` and log all decisions.
- Enforcement occurs at gateway, service, and data boundaries; no single gate is trusted alone.
- Meta-audit stream contains decision-grade evidence with no raw PII; feeds SIEM and compliance packs.
Security Monitoring & Detection¶
ATP emits decision-grade telemetry and correlates it in a SIEM-first pipeline. Signals come from the gateway (PEP-1), services (PEP-2), policy engine, KMS/HSM, WAF, mesh, and egress gateway. All events carry correlation IDs, classification tags, and policyVersion to enable high-precision detection. See also observability.md and alerts-slos.md.
Telemetry Model (event schema, excerpt)¶
```jsonc
{
  "ts": "2025-10-29T08:59:11Z",
  "tenantId": "7c1a-...",
  "regionCode": "EU",
  "service": "exporter",
  "boundary": "PEP-2",
  "category": "security.guard",  // guard|authn|authz|integrity|keyops|egress|waf
  "event": "abac.decision_denied",
  "decision": "deny",
  "reason": "cross_region_blocked",
  "policyVersion": "3.3.0",
  "correlationId": "c-6b3f...",
  "subject": "svc-query",
  "purpose": "dsar_export",
  "route": "EU->US",
  "class": ["sec", "privacy"],  // classification tags
  "severity": "medium",
  "labels": { "edition": "enterprise", "dataSiloId": "silo-7c1a" },
  "network": { "srcIp": "hash:...", "ua": "hash:..." }
}
```
- PII-safe: sensitive fields hashed/redacted per log policy.
- Dimensions: `tenantId`, `regionCode`, `category`, `boundary`, `policyVersion`, `purpose`, `decision`.
SIEM Integration (sources & flows)¶
- Sources: Gateway/WAF, PEP-2 services, OPA/Policy bundles, KMS/HSM key ops, mesh authz, egress gateway, cloud control-plane.
- Transport: reliable shipper (structured JSON), regional in-region sinks (residency-aligned).
- Normalization: taxonomy to ECS/OCSF-like fields; enrich with tenant profile, residency profile, and asset inventory.
- Retention: ≥ 180 days online, ≥ 365 days archive (write-once), aligned to data-residency-retention.md.
Detections (rules & analytics)¶
Guard violations (ABAC)
```kusto
guards
| where event == "abac.decision_denied"
| summarize cnt = count() by reason, tenantId, regionCode, bin(ts, 15m)
| join kind=inner (
    baseline_abac_denies
    | project reason, tenantId, regionCode, p95 = p95_denies_15m
  ) on reason, tenantId, regionCode
| where cnt > p95 * 1.5
```
Failed auth spikes
```kusto
auth
| where result == "fail"
| summarize fails = count() by tenantId, bin(ts, 5m)
| join kind=leftouter (
    auth
    | where result == "success"
    | summarize succ = count() by tenantId, bin(ts, 5m)
  ) on tenantId, ts
| extend succ = coalesce(succ, 0)  // windows with zero successes are kept, not dropped
| extend ratio = todouble(fails) / (succ + 1)
| where fails > 50 and ratio > 3.0
```
Egress anomaly (possible exfil)
```kusto
egress
| summarize bytes = sum(bytes) by tenantId, route, bin(ts, 1h)
| join kind=leftouter (egress_baseline) on tenantId, route
| where bytes > baseline_bytes * 2 and route !in ("in_region", "same_code")
```
Key operations anomaly
```kusto
keyops
| where op in ("unwrap", "sign") and result == "success"
| summarize cnt = count() by keyId, bin(ts, 10m)
| join kind=inner (keyops_baseline) on keyId  // avg_cnt_7d per keyId, computed by a windowed baseline job
| where cnt > 3 * avg_cnt_7d
```
Sigma (break-glass outside TTL)
```yaml
title: Break-glass Used Outside Approved TTL
id: 56d8a1b0-1f4b-4af8-9d5a-ttl
status: experimental
logsource:
  product: atp
  service: guards
detection:
  sel_event:
    event: "break_glass.used"
  sel_ttl:
    ttl_remaining_seconds|lte: 0
  condition: sel_event and sel_ttl
level: high
fields:
  - tenantId
  - regionCode
  - approval_id
  - subject
  - purpose
```
Anomaly Detection & ML¶
- Seasonal baselines for: ABAC denies, export bytes per route, key ops/min, WAF blocks/IP.
- Entity analytics: per tenant/identity impossible travel (region drift), purpose drift (new purposes for identity), and role drift (entitlements change spikes).
- Correlation: a chain of failed auth → ABAC denies → egress attempt within one `correlationId` window raises severity.
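The correlation rule can be sketched as an ordered-subsequence match per `correlationId`. The event names `auth.failed` and `egress.attempt` and the 15-minute window are hypothetical; only `abac.decision_denied` comes from the telemetry schema above.

```python
from typing import Dict, Iterable, List, Tuple

# Ordered stages of the escalation chain (first and last names are hypothetical).
CHAIN = ("auth.failed", "abac.decision_denied", "egress.attempt")


def _chain_within(seq: List[Tuple[float, str]], window_s: float) -> bool:
    """True if CHAIN occurs in order, starting at some auth failure, within window_s."""
    for i, (start, event) in enumerate(seq):
        if event != CHAIN[0]:
            continue
        stage = 1
        for ts, ev in seq[i + 1:]:
            if ts - start > window_s:
                break
            if ev == CHAIN[stage]:
                stage += 1
                if stage == len(CHAIN):
                    return True
    return False


def escalate(events: Iterable[Tuple[str, float, str]], window_s: float = 900) -> List[str]:
    """Return correlationIds whose events complete the chain within the window."""
    by_corr: Dict[str, List[Tuple[float, str]]] = {}
    for corr_id, ts, event in events:
        by_corr.setdefault(corr_id, []).append((ts, event))
    return sorted(c for c, seq in by_corr.items() if _chain_within(sorted(seq), window_s))
```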
Alerting & Response¶
- Severities: Critical (key compromise, integrity violation), High (cross-region export allowed under abnormal volume), Medium (deny spikes), Low (noise indicators).
- Routing: SecOps on-call (PagerDuty/Teams), SRE for availability-coupled signals, DPO/Legal for privacy signals.
- Alert payload includes: SIEM query link, recent decision logs, policyVersion, runbook link, and recommended next actions.
Alert policy (sketch)
```yaml
alerts:
  - name: integrity_violation
    if: 'integrity.verify.fail.count > 0'
    severity: critical
    route: secops
    runbook: 'hardening/tamper-evidence.md#incident'
  - name: cross_region_export_spike
    if: 'egress.bytes.above_baseline and route not in ["in_region","same_code"]'
    severity: high
    route: secops+dpo
    runbook: 'platform/data-residency-retention.md#observability--compliance-evidence'
  - name: abac_deny_surge
    if: 'abac.deny.count.delta_pp > 5'
    severity: medium
    route: sre
    runbook: 'operations/alerts-slos.md#abac-deny-surge'
```
Dashboards (security view)¶
- Guard Posture: allow/deny trend, reasons heatmap, top tenants/regions by denies.
- Auth Posture: fail/success ratio, MFA prompts, OBO usage, token anomalies (stale `kid`, missing DPoP).
- Egress & Residency: bytes by route, cross-region attempts (blocked/allowed), TSA latency tiles.
- Key Ops & Integrity: rotations, unwrap/sign counts, verification pass/fail heatmap.
Tuning & Suppression¶
- Context-aware suppression for approved pentests, DR drills, and load tests (tagged by `purpose`).
- Auto-closure: if the post-incident remediation ADR is merged and there is no recurrence in 30 days.
- Feedback loop: each false positive requires a rule note and either scope refinement or baseline adjustment.
KPIs & SLO Hooks¶
- MTTD/MTTR per severity band.
- Precision/Recall for top 5 rules (quarterly review).
- Coverage: % of guard decisions correlated in SIEM within 60s.
- Freshness: policy bundle signature lag < 5 min.
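MTTD/MTTR per severity band can be computed from incident timestamps as sketched below. This assumes MTTD runs from incident start to detection and MTTR from detection to resolution; some teams measure MTTR from incident start instead.

```python
from datetime import datetime
from statistics import mean
from typing import Dict, List


def mttd_mttr(incidents: List[Dict[str, object]]) -> Dict[str, Dict[str, float]]:
    """Mean time to detect / resolve, in minutes, grouped by severity band."""
    by_sev: Dict[str, List[Dict[str, object]]] = {}
    for inc in incidents:
        by_sev.setdefault(str(inc["severity"]), []).append(inc)
    out: Dict[str, Dict[str, float]] = {}
    for sev, items in by_sev.items():
        detect = [(i["detected"] - i["started"]).total_seconds() / 60 for i in items]
        resolve = [(i["resolved"] - i["detected"]).total_seconds() / 60 for i in items]
        out[sev] = {"mttd_min": mean(detect), "mttr_min": mean(resolve)}
    return out
```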
Evidence & Compliance¶
- SIEM reports feed monthly Security Evidence Packs: detection stats, incident timelines, remediation status, and rule version diffs.
- All alerts/events are region-local and included in auditor-ready exports with signatures.
Guardrails (quick checklist)¶
- All security events carry correlationId, tenantId, regionCode, and policyVersion.
- Detection rules honor residency and never require raw PII (use hashes/tokens).
- Alerts must link to a runbook and policy references; paging only on actionable signals.
- Baselines and rules are versioned; changes recorded in a Security ADR.
Incident Response & Forensics¶
ATP treats incidents as time-critical, evidence-driven operations. We detect early, contain fast, investigate with immutable artifacts, and remediate with policy changes captured in Security ADRs. Every step produces auditor-ready evidence.
Detection (how we know)¶
- Signals: alerts from SIEM rules (guard violations, key ops anomalies), integrity verification failures, WAF spikes, egress anomalies, IdP risk events.
- Sources: Gateway/PEP-1, Services/PEP-2, OPA decisions, KMS/HSM logs, mesh authz, egress gateway, cloud control plane.
- Auto-triage: incidents are auto-classified with severity and routed to on-call; correlation uses `correlationId`, `tenantId`, `regionCode`, `policyVersion`.
Severity & targets (example)
| Sev | Description | Targets | Initial comms |
|---|---|---|---|
| S0 | Active data breach, key compromise, cross-tenant exfiltration | 15 min containment | 30 min internal, 72h regulator (jurisdictional) |
| S1 | Integrity violation, residency breach prevented, mass auth spike | 60 min containment | 2h internal |
| S2 | Misconfig, partial outage, false-positive run | 4 h containment | By EOD |
Containment (stop the bleeding)¶
- Tenant isolation: dynamic OPA override → deny for `{tenantId: X}` on read/export/purge.
- Kill switches at the Gateway:
  - Block cross-region routes, disable export categories, throttle ingestion for hot shard.
- Token revocation:
  - Revoke refresh tokens for affected identities; rotate API keys; enforce re-auth with MFA.
- Key containment:
  - Suspend KEK unwrap/sign for suspected `keyId`; rotate DEKs for impacted tenants; escrow unaffected.
- Network choke points:
  - Egress allowlist tightened; specific FQDN/CIDR blocks; WAF rule elevation; mesh policy set to deny-by-default for suspicious identities.
Containment policy (sketch)
```yaml
containment:
  tenantBlock:
    tenantId: "7c1a-..."
    ops: [ "read", "export", "purge" ]
    reason: "cross-tenant attempt"
    ttl: "PT4H"
  gateway:
    exportRoutes: { allow: [ "in_region" ], deny: [ "same_code", "global" ] }
  keys:
    suspend:
      - keyId: "hsm-eu-01"
        ops: [ "unwrap" ]
        scope: [ "tenant:7c1a-..." ]
        ttl: "PT2H"
```
Investigation (prove what happened)¶
- Golden rule: preserve evidence; no live-fixing in the blast zone without snapshot.
- Immutable audit trail:
- Decision logs (ABAC, exports), integrity manifests, anchor/TSA receipts, KMS/HSM key ops, gateway WAF logs.
- Chain-of-custody:
- Hash and sign collected artifacts; timestamp (RFC 3161 optional); store in write-once evidence vault; record access with dual-control.
Evidence bundle (manifest)
```yaml
forensicsBundle:
  id: "ir_2025-10-29_7c1a"
  severity: S1
  scope: { tenantId: "7c1a-...", region: "westeurope", from: "2025-10-29T08:00Z", to: "2025-10-29T10:00Z" }
  artifacts:
    - guards.log.ndjson.sig
    - waf.sample.json.gz.sig
    - keyops.log.ndjson.sig
    - integrity.verify.report.json.sig
    - export.manifests/*.sig
  hash: "sha256:..."
  tsa: "b64:..."
  reviewers: ["forensics.lead", "ic"]
```
- Timeline reconstruction:
  - Join on `correlationId` and time bins; verify the sequence of allow/deny decisions, key ops, and egress.
- Integrity validation:
  - Re-verify affected segments/anchors; quarantine anything with mismatched Merkle paths.
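The "hash collected artifacts" step of chain-of-custody can be sketched as a per-artifact SHA-256 plus a bundle hash over the canonical artifact map, which is what later gets signed and timestamped. Signing, TSA, and vault storage are out of scope here; `build_manifest` is a hypothetical helper.

```python
import hashlib
import json
from pathlib import Path
from typing import Dict, List


def sha256_file(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):  # stream large logs in 64 KiB chunks
            h.update(chunk)
    return "sha256:" + h.hexdigest()


def build_manifest(bundle_id: str, artifacts: List[Path]) -> Dict[str, object]:
    """Hash each artifact, then hash the canonical name->digest map into one bundle hash."""
    entries = {p.name: sha256_file(p) for p in sorted(artifacts)}
    canonical = json.dumps(entries, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return {"id": bundle_id, "artifacts": entries,
            "hash": "sha256:" + hashlib.sha256(canonical).hexdigest()}
```

Because the bundle hash covers every artifact digest, altering any single file after collection changes the value that was signed.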
Remediation (return to safe & prevent recurrence)¶
- Compensating controls:
  - Tighten policies (e.g., export route to `in_region` only), raise WAF sensitivity, reduce token TTLs, enforce DPoP for privileged routes.
- Rotation & cleanup:
- Rotate KEKs/DEKs as needed; re-issue tokens/keys; deprecate vulnerable routes.
- Fix & validate:
- Patch/rollback services or policy bundles; run conformance suites (residency, authz, integrity) and post-fix canary.
- Security ADR:
- Record root cause, alternatives, chosen change, rollout/rollback plan, evidence queries.
Roles & RACI¶
| Role | Responsibilities |
|---|---|
| Incident Commander (IC) | Owns timeline, decisions, severity, comms; delegates tasks |
| Forensics Lead | Evidence capture, chain-of-custody, timeline, findings |
| Ops/SRE Lead | Containment (network/mesh/gateway), DR posture changes |
| Security Engineering | Policy/OPA, key ops coordination with KMS/HSM |
| Comms Lead | Stakeholder updates, regulator/customer notices with Legal/DPO |
| Scribe | Real-time log of actions/decisions, artifact index |
Communications & Notifications¶
- Internal: war room channel with audited bot; updates at fixed intervals per severity.
- External: regulator/customer notices coordinated with Legal/DPO; content must include scope, data categories, controls in place, and remediation.
- Residency-aware: notifications follow regional requirements (see privacy-gdpr-hipaa-soc2.md).
Playbooks (common scenarios)¶
- Cross-tenant access attempt (blocked)
  - Detect `abac.decision_denied{reason="tenant_mismatch"}` surge → isolate tenant route → verify logs → confirm no `allow` → close with tuning if necessary.
- Tampering / integrity violation
  - Detect `integrity.violation_detected` → quarantine stream → re-verify anchors → check HSM key lineage → rebuild read models from last good segment → publish incident report.
- Data breach / exfil suspicion
  - Egress anomaly to non-allowed route → kill cross-region exports → rotate credentials → capture packet samples → notify per jurisdiction if confirmed.
- Key compromise indicators
  - Abnormal `unwrap`/`sign` rates → suspend key ops → rotate KEKs/DEKs → re-seal anchors with new signer → attest rotations.
- Supply-chain artifact tampering
  - CI gate or admission controller denial → freeze deploys → verify cosign/SBOM diffs → roll back images/policies → update allowlists.
Flow (generic incident)
```mermaid
sequenceDiagram
    autonumber
    participant DET as Detection (SIEM)
    participant IC as Incident Commander
    participant CT as Containment
    participant FR as Forensics
    participant RM as Remediation
    DET-->>IC: Alert (severity, signals, links)
    IC->>CT: Activate playbook (isolate tenant/export route)
    CT-->>IC: Contained, switches active
    IC->>FR: Capture evidence + timeline
    FR-->>IC: Findings (scope, root cause, impact)
    IC->>RM: Implement fix (policy/service/keys)
    RM-->>IC: Conformance green, canary pass
    IC-->>All: Close + ADR + evidence pack
```
Runbooks & Automation Hooks¶
- Buttons (secured runbooks):
  - Isolate Tenant → writes OPA override; TTL-bound.
  - Disable Export Route (`same_code`, `global`) globally or per-tenant.
  - Revoke Tokens for subject/tenant.
  - Suspend Key Ops for `keyId`/tenant scope.
- Backout: timeboxed; all switches auto-expire unless renewed with approval.
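The "auto-expire unless renewed with approval" rule can be sketched as a switch whose activity is derived from its expiry time, so no cleanup job is needed for it to back out. The `KillSwitch` class and its fields are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class KillSwitch:
    name: str
    expires_at: datetime
    renewal_approval: Optional[str] = None

    def active(self, now: datetime) -> bool:
        # Timeboxed: past expiry the switch is inactive without any manual backout.
        return now < self.expires_at

    def renew(self, now: datetime, ttl: timedelta, approval_id: str) -> None:
        if not approval_id:
            raise PermissionError("renewal requires an approval_id")
        self.expires_at = now + ttl
        self.renewal_approval = approval_id
```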
KPIs & Post-incident¶
- KPIs: MTTD, MTTR, % incidents with complete evidence bundle, % closed with ADR, repeat rate within 30/90 days.
- Post-incident review: within 5 business days; track actions in backlog with owners and due dates; link to evidence and ADR.
Acceptance (done when)¶
- Containment active within target window; evidence bundle complete and signed.
- Root cause, impact, and scope confirmed; regulators/customers notified if required.
- Fix deployed and verified via conformance suites; Security ADR merged.
- Kill switches rolled back or codified as permanent policy changes.
- Lessons learned captured; detections tuned; playbook updated.
Cross-references observability.md · alerts-slos.md · tamper-evidence.md · data-residency-retention.md · key-rotation.md · privacy-gdpr-hipaa-soc2.md
Vulnerability & Patch Management¶
We maintain a continuous posture for discovering, triaging, and remediating vulnerabilities across code, dependencies, images, OS, IaC, and policies. Patching favors zero-downtime, canary rollouts, and automatic rollback with auditable evidence.
Coverage & Scanners¶
- Dependencies: NuGet/NPM/containers — SBOM generated each build; diffed on PRs.
- Container images: base + app layers; pinned by digest, rebuilt nightly.
- OS & runtimes: distro/APK/apt packages; language runtimes (dotnet/node).
- IaC: Dockerfiles, Helm, Bicep/Terraform — misconfig & secret scans.
- Policies: OPA bundles signed; verify signature and policy lints on PR/CI.
CVE Intake & Prioritization¶
- Feeds: CVE/NVD, vendor advisories, CISA KEV, ecosystem advisories.
- Exploit signals: EPSS percentile, KEV listing, public PoC.
- Asset criticality: internet-exposed? PII/PHI adjacent? tenancy boundary?
Risk score (sketch)

Risk = max(CVSS, SevMap) × (1 + EPSSfactor + ExploitBonus) × AssetCriticality

- EPSSfactor: 0.25 if EPSS ≥ 0.7; 0.1 if 0.5 ≤ EPSS < 0.7; otherwise 0.
- ExploitBonus: +0.25 if KEV-listed or a public PoC exists.
- AssetCriticality: 1.5 (internet-exposed), 1.25 (tenancy boundary), 1.0 otherwise.
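The scoring formula translates directly into code. This sketch assumes `sev_map` is a vendor severity already mapped onto the 0 to 10 CVSS scale, and that internet exposure takes precedence over the tenancy-boundary multiplier when both apply.

```python
def epss_factor(epss: float) -> float:
    if epss >= 0.7:
        return 0.25
    if epss >= 0.5:
        return 0.1
    return 0.0


def asset_criticality(internet_exposed: bool, tenancy_boundary: bool) -> float:
    if internet_exposed:
        return 1.5
    return 1.25 if tenancy_boundary else 1.0


def risk_score(cvss: float, sev_map: float, epss: float, kev_or_poc: bool,
               internet_exposed: bool = False, tenancy_boundary: bool = False) -> float:
    exploit_bonus = 0.25 if kev_or_poc else 0.0
    return (max(cvss, sev_map)
            * (1 + epss_factor(epss) + exploit_bonus)
            * asset_criticality(internet_exposed, tenancy_boundary))
```

For example, a CVSS 9.8 issue with EPSS 0.8, a KEV listing, and internet exposure scores 9.8 × 1.5 × 1.5 ≈ 22.05.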
Remediation SLAs (default)¶
| Class (example) | Trigger | SLA to Mitigate | Mitigation options |
|---|---|---|---|
| Critical | KEV or EPSS ≥ 0.7 or CVSS ≥ 9 | 48h | Patch/upgrade, config block, feature flag off, WAF rule |
| High | CVSS 7–8.9 | 7d | Patch/upgrade; config hardening |
| Medium | CVSS 4–6.9 | 30d | Patch in next scheduled window |
| Low | CVSS <4 | 90d | Batch with routine updates |
Exceptions require Security Ops + Product approval, an expiry, and a compensating control; tracked in the exception registry.
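The SLA table can be encoded as a classifier plus a due-date calculation. This sketch assumes the SLA clock starts at CVE intake and uses the table's thresholds as lower bounds (CVSS 9.0 and above is Critical, 7.0 High, 4.0 Medium).

```python
from datetime import datetime, timedelta
from typing import Tuple

SLA = {"Critical": timedelta(hours=48), "High": timedelta(days=7),
       "Medium": timedelta(days=30), "Low": timedelta(days=90)}


def sla_class(cvss: float, epss: float, kev: bool) -> str:
    if kev or epss >= 0.7 or cvss >= 9.0:
        return "Critical"
    if cvss >= 7.0:
        return "High"
    if cvss >= 4.0:
        return "Medium"
    return "Low"


def mitigation_due(intake: datetime, cvss: float, epss: float, kev: bool) -> Tuple[str, datetime]:
    cls = sla_class(cvss, epss, kev)
    return cls, intake + SLA[cls]
```

Note that a high EPSS alone lifts a medium-CVSS finding into the 48h Critical lane.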
Pipeline Gates (CI/CD)¶
- Build stage: SBOM, dependency scan, IaC scan, container scan; fail on High/Critical unless approved exception.
- Sign/attest: cosign signatures for images; signed OPA bundles.
- Deploy stage: policy guard — rejects unsigned/unknown digest; canary guarded by SLOs & security metrics.
```yaml
stages:
- stage: vuln_scans
  jobs:
  - job: sbom_and_scans
    steps:
    - script: dotnet build /p:ContinuousIntegrationBuild=true
    - script: cyclonedx create --out sbom.json
    - script: trivy fs --exit-code 1 .
    - script: trivy image --exit-code 1 $(IMAGE)
    - script: checkov -d infra/   # IaC
- stage: sign_and_verify
  jobs:
  - job: sign
    steps:
    - script: cosign sign --key $(COSIGN_KEY) $(IMAGE_DIGEST)
    # Policy bundles are files, not OCI images: build the tarball and sign it as a blob.
    - script: opa build -b policies/ -o bundle.tar.gz && cosign sign-blob --key $(COSIGN_KEY) --output-signature bundle.tar.gz.sig bundle.tar.gz
- stage: deploy_canary
  condition: succeeded()
  jobs:
  - deployment: canary
    environment: atp-canary   # hypothetical name; deployment jobs require an environment
    strategy:
      runOnce:
        deploy:
          steps:
          - script: helm upgrade ... --set image.digest=$(IMAGE_DIGEST) --set canary=1
          - script: ./gates.sh security --denyDrift=5 --abacDenyDelta=5
```
Patch Windows & Rollouts¶
- Zero-downtime strategy: rolling updates; workloads replicated; backward-compatible migrations.
- Progressive: 1% → 10% → 50% → 100% with automated health & security gates.
- Automatic rollback triggers:
- Guard deny spike > +5 pp vs baseline,
- Integrity verify failures > 0.1% sample,
- Error budget burn rate > 2× normal.
Policy (excerpt)
```yaml
patch:
  strategy: progressive
  steps: [0.01, 0.1, 0.5, 1.0]
  securityGates:
    abacDenyDeltaPpMax: 5
    integrityFailRateMax: 0.001
    unsignedArtifact: block
  rollback:
    auto: true
    reasonCapture: required
  maintenanceWindows:
    preferred:
      - fri_22_02_local
    blackout:
      - last_week_of_qtr
```
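The progressive rollout and its rollback triggers can be sketched as a loop over traffic steps that stops at the first failing gate. Thresholds mirror the policy excerpt; `burnRateMax` (2× normal) comes from the trigger list rather than the excerpt, and the metric key names and `metrics_at` callable are hypothetical.

```python
from typing import Callable, Dict, List

GATES = {"abacDenyDeltaPpMax": 5.0, "integrityFailRateMax": 0.001, "burnRateMax": 2.0}


def gate_failures(m: Dict[str, float], gates: Dict[str, float] = GATES) -> List[str]:
    failures = []
    # Deny rates are percentages, so their difference is in percentage points.
    if m["deny_rate_pct"] - m["baseline_deny_rate_pct"] > gates["abacDenyDeltaPpMax"]:
        failures.append("abac_deny_delta")
    if m["integrity_fail_rate"] > gates["integrityFailRateMax"]:
        failures.append("integrity_fail_rate")
    if m["burn_rate"] > gates["burnRateMax"]:
        failures.append("error_budget_burn")
    if m.get("unsigned_artifacts", 0) > 0:
        failures.append("unsigned_artifact")
    return failures


def progressive_rollout(steps: List[float],
                        metrics_at: Callable[[float], Dict[str, float]]) -> Dict[str, object]:
    """Advance through traffic steps; roll back at the first step whose gates fail."""
    for step in steps:
        failed = gate_failures(metrics_at(step))
        if failed:
            return {"state": "rolled_back", "at": step, "reason": failed}
    return {"state": "complete", "at": steps[-1], "reason": []}
```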
Coordination (ConnectSoft Platform Security)¶
- Shared base images and common libraries patched centrally; ATP inherits via digest pin bumps.
- Weekly Patch Board: review new CVEs, SLA status, exceptions expiring.
- Change comms: security bulletin to affected teams; link to Security ADR and rollout plan.
Exception & Risk Acceptance¶
```yaml
exception:
  id: "EXC-2025-014"
  cve: "CVE-2025-12345"
  reason: "Upstream fix pending; WAF rule blocks vector"
  scope: { service: "exporter", region: "westeurope" }
  compensating: [ "WAF-Rule-1123", "rate-limit-tighten" ]
  owner: "Platform Security"
  expiresAt: "2025-11-15T00:00:00Z"
  reviewEvery: "P7D"
```
- Exceptions auto-page owners 7 days before expiry; cannot be extended without dual approval.
Evidence & Reporting¶
- Artifacts: SBOMs, scan reports, cosign attestations, digest pins, deployment manifests, rollback records.
- Dashboards: open vulns by severity/age, MTTR, SLA compliance %, exposure days, top offending packages.
- Logs: `vuln.scan.failed`, `image.unsigned.blocked`, `policy.bundle.unsigned.blocked`, `rollback.triggered`.
For Hotfixes (out-of-band)¶
- Use hotfix lane with same gates; scope to affected services/regions; create post-incident Security ADR.
Mermaid Flow (triage → rollout)¶
```mermaid
flowchart TD
    A[CVE Intake] --> B[Triage & Score]
    B -->|Critical/High| C[Create Fix PR]
    C --> D[Scans + SBOM + Sign]
    D --> E[Canary Rollout]
    E -->|Gates Pass| F[Full Rollout]
    E -->|Gate Fails| R[Auto Rollback]
    F --> G[Close & Evidence Pack]
    R --> C
```
Guardrails (quick checklist)¶
- Images & policy bundles are signed; digests pinned; no `latest` tags.
- High/Critical vulns block CI/CD unless an exception with expiry and compensating controls exists.
- Rollouts are progressive with security gates; automatic rollback is configured and tested.
- Exceptions live in a registry, are time-boxed, and reviewed weekly; alerts fire before expiry.
- Evidence (SBOMs, scans, signatures, rollout states) is retained and included in monthly security packs.
Cross-references Application Security & Secure SDLC · Security Monitoring & Detection · Incident Response & Forensics · zero-trust.md
Compliance Attestation Strategy¶
ATP proves compliance continuously, not just at audit time. We maintain a live mapping to frameworks (GDPR, HIPAA, SOC 2, ISO 27001), curate signed evidence, and support time-bound, read-only auditor workflows — all backed by dashboards that reflect real-time control posture. See privacy-gdpr-hipaa-soc2.md for per-framework details.
Scope & Mapping (where each proof comes from)¶
| Framework focus | Primary controls (examples) | Live evidence sources | Cross-refs |
|---|---|---|---|
| GDPR: residency, storage limitation, rights | Residency profiles, retention policy-as-code, DSAR pipeline, redaction | Residency decisions, purge ledgers, DSAR manifests, export route logs | data-residency-retention.md |
| HIPAA: safeguards, integrity, access | ABAC, immutability + anchors, key ops auditing, restore quarantine | ABAC decision logs, integrity verify reports, KMS/HSM logs, restore manifests | tamper-evidence.md |
| SOC 2: Security/Availability/PI/Conf/Privacy | Zero Trust, DR drills, logging/monitoring, change control | WAF/mesh/egress logs, DR drill reports, SIEM detections, ADRs | observability.md |
| ISO 27001: Annex A controls | Risk management, supplier, crypto, deletion | Risk register, vendor assessments, crypto key lineage, deletion evidence | key-rotation.md |
The authoritative mapping lives in the Control Registry, where each control lists its framework tags and evidence queries.
Evidence Collection (how we curate proofs)¶
- Artifacts are region-local, append-only, and signed:
- Policy bundles + signatures (`policyVersion`, bundle digest)
- ABAC decisions (allow/deny with reason codes)
- Residency decisions, purge ledgers, DSAR/export manifests
- Integrity proofs (Merkle roots, anchor signatures, TSA receipts)
- KMS/HSM key operations (wrap/unwrap/sign) with key lineage
- DR/restore drill reports, change records (ADRs), CI/CD gate outcomes
- Packaging:
- Monthly Evidence Pack (ops + security): guard stats, scans, rotations, drills
- Quarterly Attestation Pack (framework-aligned): crosswalk + curated artifacts
- Stored in write-once class, CMS-signed; optional RFC 3161 timestamps
- Retention: online ≥ 180 days, archive ≥ 12–24 months (per framework & contract)
Evidence pack manifest (sketch)
attestationPack:
  id: "att-2025Q4-eu"
  scope: { regionCode: "EU", tenants: ["*"] }
  frameworks: ["GDPR","SOC2"]
  policyVersion: "3.3.0"
  artifacts:
    - abac.decisions.2025Q4.ndjson.sig
    - residency.decisions.2025Q4.ndjson.sig
    - dsar.manifests.2025Q4/*.sig
    - retention.purge.ledger.2025Q4.csv.sig
    - integrity.verify.report.2025Q4.json.sig
    - kms.keyops.2025Q4.ndjson.sig
    - dr.drill.2025-11-12.report.pdf.sig
  tsa: "b64:..."
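This sketch shows how the artifact digests behind such a manifest could be computed and re-verified using only SHA-256 from the standard library. Canonical JSON (sorted keys, compact separators) is an assumption. It is used here so the manifest digest is reproducible. CMS signing and RFC 3161 timestamping would be applied to that digest and are not shown.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def build_manifest(pack_id: str, policy_version: str, artifacts: dict):
    """Return (manifest, manifest_digest) for a mapping of name -> artifact bytes."""
    entries = [
        {"name": name, "sha256": sha256_hex(blob), "size": len(blob)}
        for name, blob in sorted(artifacts.items())
    ]
    manifest = {"id": pack_id, "policyVersion": policy_version, "artifacts": entries}
    # Canonical JSON (sorted keys, no insignificant whitespace) so identical
    # content always yields the same digest; a CMS signature or RFC 3161
    # timestamp would then cover this digest (not shown).
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return manifest, sha256_hex(canonical)


def verify_artifacts(manifest: dict, artifacts: dict) -> bool:
    """Re-hash every artifact and compare with the manifest entries."""
    expected = {e["name"]: e["sha256"] for e in manifest["artifacts"]}
    actual = {name: sha256_hex(blob) for name, blob in artifacts.items()}
    return expected == actual


arts = {
    "abac.decisions.2025Q4.ndjson": b'{"decision":"allow"}\n',
    "kms.keyops.2025Q4.ndjson": b'{"op":"unwrap"}\n',
}
manifest, digest = build_manifest("att-2025Q4-eu", "3.3.0", arts)
print(verify_artifacts(manifest, arts))  # True
```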
Auditor Workflows (read-only, time-bound, watermarked)¶
- Access model
- Read-only auditor role (no break-glass, no export without watermark)
- Time-boxed (e.g., 14 days), region-scoped, tenant-scoped
- Purpose-bound tokens (`purpose: "audit_attestation"`)
- Watermarked exports
- All auditor exports are in-region, redacted by default, watermarked (auditor ID, request ID, timestamp)
- Manifests are signed and logged under `audit.export.*`
- Chain-of-custody
- Each data pull is hashed, signed, timestamped; access logged with correlation IDs
sequenceDiagram
autonumber
participant A as Auditor
participant SEC as SecOps (Approval)
participant GW as API Gateway (PEP-1)
participant SV as Services (PEP-2)
participant PK as Policy/KMS
A->>SEC: Request read-only access (scope, TTL, region)
SEC-->>A: Approved token (purpose=audit_attestation)
A->>GW: Query evidence endpoints
GW->>SV: Forward with signed X-Policy-* headers
SV->>PK: Verify policyVersion, sign manifest
SV-->>A: Watermarked export + signed manifest
Auditor checklist (excerpt)
auditorAccess:
  scope: { regionCode: "EU", tenants: ["s*","t*"] }
  ttl: P14D
  watermark: required
  export:
    route: in_region
    redactTemplate: pii-default
  logs: pii-safe
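The auditor checklist can be read as a sequence of deny checks. This sketch makes several assumptions: a grant dict with `issuedAt` and `ttl` fields, and `read` and `export` as the only permitted operations. Tenant patterns such as `s*` are matched with shell-style globbing via `fnmatch`. Reason codes other than `cross_region_blocked` are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from fnmatch import fnmatchcase

MAX_TTL = timedelta(days=14)          # time-box from the checklist (ttl: P14D)
ALLOWED_OPS = {"read", "export"}      # no mutating operations for auditors


def auditor_decision(grant: dict, request: dict, now: datetime):
    """Return (allowed, reason) for a request made with an auditor grant."""
    if grant["purpose"] != "audit_attestation":
        return False, "purpose_mismatch"
    if grant["ttl"] > MAX_TTL or now >= grant["issuedAt"] + grant["ttl"]:
        return False, "grant_expired"
    if request["op"] not in ALLOWED_OPS:
        return False, "read_only"
    if request["regionCode"] != grant["regionCode"]:
        return False, "cross_region_blocked"
    if not any(fnmatchcase(request["tenantId"], p) for p in grant["tenants"]):
        return False, "tenant_out_of_scope"
    if request["op"] == "export" and not request.get("watermark"):
        return False, "watermark_required"
    return True, "ok"


grant = {
    "purpose": "audit_attestation",
    "issuedAt": datetime(2025, 11, 1, tzinfo=timezone.utc),
    "ttl": timedelta(days=14),
    "regionCode": "EU",
    "tenants": ["s*", "t*"],
}
now = datetime(2025, 11, 5, tzinfo=timezone.utc)
print(auditor_decision(grant, {"op": "read", "regionCode": "EU", "tenantId": "t-eu-01"}, now))
# (True, 'ok')
```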
Continuous Compliance Dashboards (live posture)¶
- Framework posture tiles: % controls passing per framework (last 24h/7d/30d)
- Key indicators:
- Policy bundle signature freshness (< 5 min lag)
- Residency conformance pass rate
- Retention dry-run anomalies (|Δ| > 5 pp between eligible and purged)
- Integrity verification failure rate
- Key rotation schedule adherence
- DR drill freshness (last drill per region)
- Drill-down: control → evidence query → artifact → signature & TSA receipt
- Exceptions view: open risk acceptances with expiry and compensating controls
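A posture tile reduces to a per-framework pass ratio over the controls tagged with that framework. The framework tags on the sample controls below are illustrative only and are not taken from the Control Registry. A control tagged with several frameworks counts toward each of them.

```python
from collections import defaultdict


def framework_posture(controls):
    """Return {framework: % of tagged controls currently passing}."""
    total = defaultdict(int)
    passing = defaultdict(int)
    for control in controls:
        for framework in control["frameworks"]:
            total[framework] += 1
            passing[framework] += int(control["passing"])
    return {fw: round(100.0 * passing[fw] / total[fw], 1) for fw in sorted(total)}


controls = [
    {"id": "IAM-01", "frameworks": ["SOC2", "ISO27001"], "passing": True},
    {"id": "EGR-01", "frameworks": ["GDPR", "SOC2"], "passing": False},
    {"id": "INT-01", "frameworks": ["HIPAA", "SOC2"], "passing": True},
]
print(framework_posture(controls))
```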
Attestation Process (cadence & roles)¶
- Monthly: Evidence Pack generation (automated), SecOps review, findings triage
- Quarterly: Framework Attestation Pack, control sampling, auditor window
- Annually / on major change: penetration test + DPIA refresh (where required)
- Roles:
- Compliance: owns framework mapping and auditor interface
- SecOps: owns evidence packs and access provisioning
- SRE: owns DR/restore evidence, availability artifacts
- Platform Security: owns policy registry & signatures
- Legal/DPO: owns jurisdictional notices and DPIA
Interfaces (read-only endpoints)¶
- `/evidence/abac/decisions?from=&to=&tenantId=&regionCode=`
- `/evidence/residency/decisions?...`
- `/evidence/retention/purge-ledger?...`
- `/evidence/integrity/reports?...`
- `/evidence/keyops/logs?...`
- All endpoints enforce auditor role, purpose, region, rate limits, and watermarking on export.
Acceptance (done when)¶
- Framework crosswalks link each control → evidence query → artifact with owners and cadence
- Monthly/quarterly packs are generated, signed, stored in write-once, and retrievable by auditors
- Auditor workflow is time-bound, read-only, watermarked, and captured in logs with correlation IDs
- Dashboards reflect real-time control posture and surface expiring exceptions and stale drills
Guardrails (quick checklist)¶
- Evidence is region-local, PII-safe, signed, and timestamped.
- Auditor exports are in-region, watermarked, and redacted by default.
- No auditor action can mutate systems; break-glass is not available to auditor roles.
- Exceptions are time-boxed with compensating controls and show up in posture dashboards.
Cross-references privacy-gdpr-hipaa-soc2.md · data-residency-retention.md · observability.md · alerts-slos.md · tamper-evidence.md · key-rotation.md
Third-Party & Supply Chain Security¶
We secure the supply chain from code → build → artifact → deploy → runtime and govern third-party risk across cloud providers, SaaS, SDKs, libraries, and CI/CD tooling. Vendors are tiered by data sensitivity and blast radius; artifacts must be signed, attested, and SBOM-tracked; contracts encode security requirements.
Principles¶
- Trust is earned and re-verified: signatures, provenance, and reputation checks.
- Minimal blast radius: least privilege, constrained egress, segregated secrets per vendor.
- Documented accountability: subprocessor register, DPAs/BAAs, incident SLAs, data maps.
- Continuous monitoring: detect typosquat, dependency confusion, malicious updates.
Vendor Risk Assessment (tiers & cadence)¶
| Tier | Example vendors | Data & access | Due diligence | Review cadence |
|---|---|---|---|---|
| T1 — Critical | Cloud/KMS, IdP, logging/SIEM | Prod data, keys, or auth | SOC2/ISO certs, pen test, DPA/BAA, residency & deletion guarantees | Quarterly + annual onsite/virtual |
| T2 — Sensitive | SDKs, observability SaaS, build infra | Metadata/telemetry, limited PII | Security questionnaire, SBOM, SLA on incidents | Semiannual |
| T3 — Low | Dev utilities, docs tooling | No prod access | Basic questionnaire, OSS posture | Annual |
Onboarding checklist (excerpt)
vendorOnboarding:
  riskTier: T1|T2|T3
  controls:
    - soc2_or_iso27001: { required_for: [T1, T2] }
    - dpa_baa_signed: { required_for: [T1] }
    - residency_terms: match(platform_profiles)
    - incident_sla_hours: { critical: 24, high: 72 }
    - data_deletion_sla_days: 30
    - sbom_provided: true
    - vuln_disclosure_policy: public_or_private
  technical:
    - ip_allowlist_configured: true
    - egress_allowlist_entry: created
    - api_keys_scoped_rotated: true
    - audit_logs_available: true
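This sketch expresses the onboarding gate as a gap report. It assumes the checklist above has been flattened into a dict. Tier applicability comes from `required_for`. Applying the technical items to every tier is an assumption, because the excerpt does not scope them.

```python
TIERED_CONTROLS = {
    "soc2_or_iso27001": {"T1", "T2"},
    "dpa_baa_signed": {"T1"},
}
TECHNICAL = [
    "ip_allowlist_configured",
    "egress_allowlist_entry",
    "api_keys_scoped_rotated",
    "audit_logs_available",
]
MAX_SLA_HOURS = {"critical": 24, "high": 72}


def onboarding_gaps(vendor: dict) -> list:
    """List onboarding items that are missing for the vendor's risk tier."""
    tier, controls = vendor["riskTier"], vendor["controls"]
    gaps = [name for name, tiers in TIERED_CONTROLS.items()
            if tier in tiers and not controls.get(name)]
    # A missing or slower-than-contract incident SLA is a gap.
    sla = controls.get("incident_sla_hours", {})
    if any(sla.get(sev, float("inf")) > limit for sev, limit in MAX_SLA_HOURS.items()):
        gaps.append("incident_sla_hours")
    gaps += [item for item in TECHNICAL if not vendor["technical"].get(item)]
    return gaps


siem_vendor = {
    "riskTier": "T1",
    "controls": {"soc2_or_iso27001": True, "dpa_baa_signed": False,
                 "incident_sla_hours": {"critical": 24, "high": 72}},
    "technical": {item: True for item in TECHNICAL},
}
print(onboarding_gaps(siem_vendor))  # ['dpa_baa_signed']
```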
SBOM, License & Provenance¶
- SBOMs (CycloneDX/SPDX) are generated on every build and verified on deploy; SBOM diffs must be reviewed on PRs.
- License compliance: allowlist/denylist; flag copyleft where redistribution applies; legal review for exceptions.
- Artifact signing & provenance:
- cosign signatures for container images and OPA bundles.
- SLSA/in-toto attestations (builder, source repo, commit, workflow run).
- NuGet/npm sources pinned; package signature/integrity verified; no `latest`.
Policy (sketch)
supplyChain:
  requireSignatures: [ container_image, opa_bundle ]
  requireAttestations: [ slsa_provenance ]
  sbom:
    format: cyclonedx
    verifyOnDeploy: true
  packageSources:
    nuget: [ "https://api.nuget.org/v3/index.json" ]
    npm: [ "https://registry.npmjs.org" ]
  pinning:
    container: digest_only
    packages: lockfile_required
  licenses:
    allow: [ "MIT","Apache-2.0","BSD-3-Clause" ]
    deny: [ "AGPL-3.0" ]
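The pinning and license rules can be checked mechanically before signature verification, which cosign performs and which is not reproduced here. The regex for a digest-pinned reference and the `legal_review` fallback for unlisted licenses are assumptions of this sketch. SPDX has deprecated the bare `AGPL-3.0` identifier in favor of `AGPL-3.0-only` and `AGPL-3.0-or-later`, so the deny set lists all three.

```python
import re

# A reference pinned by digest: <anything without spaces or '@'>@sha256:<64 hex chars>
DIGEST_PINNED = re.compile(r"[^\s@]+@sha256:[0-9a-f]{64}")

LICENSE_ALLOW = {"MIT", "Apache-2.0", "BSD-3-Clause"}
# Bare "AGPL-3.0" is a deprecated SPDX id; also deny its current spellings.
LICENSE_DENY = {"AGPL-3.0", "AGPL-3.0-only", "AGPL-3.0-or-later"}


def is_digest_pinned(image_ref: str) -> bool:
    """True only for digest references; tags such as ':latest' are rejected."""
    return DIGEST_PINNED.fullmatch(image_ref) is not None


def license_verdict(spdx_id: str) -> str:
    if spdx_id in LICENSE_DENY:
        return "deny"
    if spdx_id in LICENSE_ALLOW:
        return "allow"
    return "legal_review"   # anything unlisted goes to legal review


print(is_digest_pinned("registry.example.com/atp/exporter:latest"))  # False
print(license_verdict("Apache-2.0"))                                 # allow
```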
Contractual Security Requirements¶
- Incident notification: Critical within 24h, High within 72h; share IoCs and containment steps.
- Data handling: processing instructions, residency alignment, subprocessor disclosure, secure deletion within 30 days of termination.
- Availability: SLA for uptime and support; RPO/RTO if vendor hosts critical path.
- Compliance: maintain valid SOC 2/ISO; annual pen test; provide report extracts under NDA.
- Audit: ATP may conduct security review; provide API/audit logs for the scoped period.
Monitoring for Supply Chain Attacks¶
- Dependency hijacking & typosquatting
- Alert on new package names with small Levenshtein distance to existing deps.
- Block installs from unapproved registries or namespace changes.
- Malicious updates
- Alert on sudden publisher change, unusual permission requests, or telemetry domains added.
- Quarantine builds on SBOM delta spikes (e.g., +30% new transitive deps).
- CI/CD artifact tampering
- Admission controller denies unsigned or unknown-digest images/policies.
- Compare provenance (repo URL, commit SHA, workflow run ID) to expected patterns.
Detections (examples)
// New or suspicious package
// levenshtein() and nearestKnown are assumed to come from a user-defined function / lookup
sbom_deltas
| where change == "added"
| extend dist = levenshtein(packageName, nearestKnown)
| where dist <= 2 or sourceRegistry !in ('nuget.org', 'npmjs.org')
// Unsigned artifact admitted (should be impossible)
deploy_events
| where imageSignatureVerified == false or opaBundleVerified == false
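The queries above rely on a `levenshtein` helper. Below is a Python sketch of the same checks. It uses a standard two-row dynamic-programming edit distance and applies the registry/typosquat rule with the `<= 2` threshold from the query. The SBOM spike rule is read as "newly added transitive dependencies amount to at least 30% of the previous transitive count"; that reading is an assumption.

```python
APPROVED_REGISTRIES = {"nuget.org", "npmjs.org"}


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insert, delete, and substitute (two-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def is_suspicious(package: str, registry: str, known: set, max_dist: int = 2) -> bool:
    """Flag an added package from an unapproved registry or one that resembles a known dep."""
    if registry not in APPROVED_REGISTRIES:
        return True
    if package in known:
        return False
    return any(levenshtein(package, k) <= max_dist for k in known)


def sbom_delta_spike(prev_transitive: int, added_transitive: int, ratio: float = 0.30) -> bool:
    """True when newly added transitive deps reach `ratio` of the previous count."""
    if prev_transitive == 0:
        return added_transitive > 0
    return added_transitive / prev_transitive >= ratio


print(is_suspicious("lodahs", "npmjs.org", {"lodash"}))  # True (distance 2)
```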
Operational Practices¶
- Quarantine lane for new deps/vendors → staged rollout behind feature flag.
- Egress only to allowlisted vendor endpoints; DNS split-horizon; no wildcard egress.
- Secrets per vendor integration: scoped, rotated, and stored in Key Vault/HSM; no credential reuse.
- Vendor keys: rotated on staff changes or incident; access reviewed quarterly.
Subprocessor Register & Changes¶
- Maintain an internal register of all subprocessors, published to customers as a visible summary, recording:
- Services provided, data categories, residency, contact, certifications, last review date.
- Change notifications to customers per contract; allow opt-out where required.
Evidence & Acceptance¶
- Artifacts: vendor questionnaires, certifications (SOC/ISO), DPAs/BAAs, SBOMs, cosign/verifier logs, provenance attestations, license scans, allowlist policies.
- Dashboards: open vendor risks by tier, SBOM drift, unsigned artifact blocks, registry changes, license violations.
- Done when:
- All T1/T2 vendors have current attestations; DPAs/BAAs executed.
- All deployable artifacts are signed + attested; SBOM verified at deploy.
- Package sources pinned; dependency monitoring rules active; no wildcards.
- Subprocessor register is up to date and linked from customer docs.
Guardrails (quick checklist)¶
- Only approved registries and pinned digests; no `latest`, no unverified publisher changes.
- Images and policy bundles must be signed; admission denies unsigned/unknown artifacts.
- SBOMs are generated, diffed, and verified; license allowlist enforced.
- Vendor integrations have scoped secrets, egress allowlists, and incident SLAs in contract.
- Supply-chain detections are on by default; quarantine suspicious updates for manual review.
Cross-references Application Security & Secure SDLC · Vulnerability & Patch Management · Security Monitoring & Detection · observability.md
Security Testing & Validation¶
Security is verified continuously with layered tests: unit/contract/integration suites, chaos experiments, independent penetration testing, and compliance test packs that prove control effectiveness. Tests are policy-aware (carry policyVersion) and run in CI/CD and scheduled jobs.
Test Strategy (what we cover)¶
- Auth/AuthZ flows: token minting, OBO down-scoping, RBAC/ABAC decisions at PEP-1/PEP-2.
- Tenancy & residency guards: cross-tenant and cross-region deny paths, export-route gating.
- Classification/redaction: required templates, DSAR exceptions, log redaction.
- Integrity & immutability: WORM append/deny, Merkle path checks, anchor/TSA verification.
- Keying: per-tenant DEKs, region KEKs, rotation windows, sender-constrained tokens (mTLS/DPoP).
- Network: WAF rules, mTLS-only mesh, egress allowlist enforcement.
Automated Tests¶
Unit & Contract (fast)
- ABAC/Rego: allow/deny matrices for common operations.
- Token claims: required `purpose`, `tenant_id`, `region_code`, `entitlements`.
- Export routes: `in_region` only unless the residency profile allows otherwise.
# test/authz_test.rego
package atp.tests

import rego.v1

import data.atp.authz as pol

# Use a local variable: the compiler rejects variables that shadow `input`.
test_read_same_tenant_and_region_allows if {
	req := {
		"op": "read",
		"token": {"tenant_id": "t1", "region_code": "EU", "scopes": ["evidence.read"], "purpose": "default"},
		"resource": {"tenantId": "t1", "regionCode": "EU"},
	}
	pol.allow with input as req
}

test_cross_region_blocks if {
	req := {
		"op": "read",
		"token": {"tenant_id": "t1", "region_code": "EU", "scopes": ["evidence.read"], "purpose": "default"},
		"resource": {"tenantId": "t1", "regionCode": "US"},
	}
	not pol.allow with input as req
}
Integration (real boundaries)
- Run against ephemeral env with private endpoints and mesh mTLS.
- Assert bearer-only tokens fail inside mesh; sender-constrained tokens pass.
- Export attempts to disallowed routes return `403 GuardViolation` with a reason.
- name: integration:mesh-mtls
  run: dotnet test tests/Integration.Mesh.Tests.csproj --filter "Category=Mesh"
Deny/Allow Matrix (sample)
| Operation | Token Purpose | Region Route | Expect |
|---|---|---|---|
| read evidence | default | EU→EU | allow |
| read evidence | default | EU→US | deny (cross_region_blocked) |
| export PII | dsar_export | EU→EU | allow (redaction disabled) |
| export PII | default | EU→EU | deny (missing redactTemplate) |
| purge hot | retention_admin | EU | allow if cutoff & approvals |
| purge hot | default | EU | deny (scope) |
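Table-driven tests make the matrix executable. The real decisions are made by the Rego policy at PEP-1/PEP-2. `decide` below is a hypothetical Python stand-in that encodes only the six rows above. The "cutoff & approvals" condition is assumed to mean a reached cutoff plus at least two approvals.

```python
def decide(op, purpose, src_region, dst_region, ctx=None):
    """Toy stand-in for the PEP decision; returns (effect, reason)."""
    ctx = ctx or {}
    if src_region != dst_region:
        return "deny", "cross_region_blocked"
    if op == "read_evidence":
        return "allow", None
    if op == "export_pii":
        if purpose == "dsar_export":
            return "allow", None          # DSAR exports skip redaction
        if not ctx.get("redactTemplate"):
            return "deny", "missing_redactTemplate"
        return "allow", None
    if op == "purge_hot":
        if purpose != "retention_admin":
            return "deny", "scope"
        if ctx.get("cutoffReached") and ctx.get("approvals", 0) >= 2:
            return "allow", None
        return "deny", "preconditions_unmet"
    return "deny", "unknown_op"           # default deny


MATRIX = [
    ("read_evidence", "default", "EU", "EU", {}, "allow"),
    ("read_evidence", "default", "EU", "US", {}, "deny"),
    ("export_pii", "dsar_export", "EU", "EU", {}, "allow"),
    ("export_pii", "default", "EU", "EU", {}, "deny"),
    ("purge_hot", "retention_admin", "EU", "EU", {"cutoffReached": True, "approvals": 2}, "allow"),
    ("purge_hot", "default", "EU", "EU", {}, "deny"),
]

for op, purpose, src, dst, ctx, expected in MATRIX:
    effect, reason = decide(op, purpose, src, dst, ctx)
    assert effect == expected, (op, purpose, src, dst, reason)
print("matrix ok")
```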
Chaos Experiments (resilience of controls)¶
- Credential rotation: roll the JWKS `kid` and ensure no stale keys are accepted; rotate the DPoP key mid-session.
- Key unavailability: suspend KEK unwrap in region; verify write/read degrade modes and deny-with-evidence.
- Network partitions: isolate the OPA sidecar; ensure it fails closed with reason `pdp_unreachable`.
- Egress lockdown: remove an allowlist entry; assert outbound calls fail and alerts fire.
- WORM stress: simulate storage retry/fail; ensure no duplicate/mutable writes (idempotency preserved).
- Anchor signer outage: defer seal with quarantine; verify queued seals upon recovery.
Chaos policy (excerpt)
chaos:
  blastRadius: "single service / single region"
  schedule: "off-peak"
  abortOn: [ "abac.deny.count.delta_pp > 5", "error_budget_burn > 2x" ]
  evidence:
    collect: [ "guards", "keyops", "integrity", "egress" ]
Penetration Testing & Bug Bounty¶
- Cadence: annual external pen test + post-major change (auth/crypto/network).
- Scope: public API, gateway/WAF, auth flows, multi-tenant isolation, export workflows.
- Rules of engagement (ROE): no production data mutation; dedicated staging with production-like controls; auditor tokens with watermarking.
- Bug bounty: coordinated disclosure; severity mapped to IR SLAs; researcher safe harbor.
Deliverables: findings, proof-of-concept, exploitation path, recommended fix, retest evidence. All high/critical findings require a Security ADR and are tracked to closure.
Compliance Test Suites (control effectiveness)¶
- Residency conformance: synthetic tenants per region; cross-region deny assertions.
- Retention dry-runs: compare eligible vs. purged; investigate deltas beyond ±5 pp.
- Key rotation checks: verify DEK/KEK schedules, escrow attestations.
- Evidence completeness: monthly pack contains required artifacts; manifest signatures valid.
conformance:
  residency:
    tenants: ["t-eu","t-us"]
    tests:
      - route: EU->US
        expect: deny
  retention:
    dryRunWindowDays: 7
    deltaThresholdPp: 5
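One way to express the dry-run comparison is shown below. It assumes "delta" means the difference between purged and eligible records as a share of the dry-run population. The function names are hypothetical, and the 5 pp default mirrors `deltaThresholdPp`.

```python
def retention_delta_pp(total: int, eligible: int, purged: int) -> float:
    """Purged minus eligible, in percentage points of the dry-run population."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * (purged - eligible) / total


def needs_investigation(total: int, eligible: int, purged: int, threshold_pp: float = 5.0) -> bool:
    """True when the purge diverges from eligibility by more than the threshold."""
    return abs(retention_delta_pp(total, eligible, purged)) > threshold_pp


print(retention_delta_pp(1000, 200, 140))  # -6.0 (under-purge)
```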
CI/CD Integration¶
- Stages: unit/rego → integration (ephemeral env) → chaos (canary env) → pen-test gate (scheduled).
- Gates: block deploy if deny/allow drift > 5 pp, or if any chaos test yields open control failure.
- Artifacts: test reports, OPA bundle signatures, SIEM links; all archived into evidence packs.
- stage: security_tests
  jobs:
    - job: rego_unit
      steps:
        - script: opa test policies/ test/ -v
    - job: dotnet_unit
      steps:
        - script: dotnet test tests/Unit
    - job: integration_ephemeral
      steps:
        - script: ./scripts/provision-ephemeral.sh
        - script: dotnet test tests/Integration
        - script: ./scripts/teardown-ephemeral.sh
          condition: always()   # tear down even when tests fail
KPIs & SLOs¶
- Coverage: ≥ 90% of guard paths have tests (incl. deny paths).
- Drift: allow/deny decision drift ≤ 2 pp week-over-week.
- Chaos: ≥ 1 control-focused chaos drill per region/month; aborts < 5%.
- Pen test: all High/Critical remediated within SLA; retest pass.
Evidence & Acceptance¶
- Test outputs, chaos logs, pen-test reports, and conformance results are retained, signed, and referenced by `policyVersion`.
- “Done” when:
- All suites pass; gates green; no High/Critical open.
- Evidence is bundled into the current Security Pack.
- Any material changes result in a Security ADR.
Cross-references Application Security & Secure SDLC · Least Privilege & Policy Enforcement · Security Monitoring & Detection · Incident Response & Forensics · data-residency-retention.md
Governance, Risk & Continuous Improvement¶
ATP evolves through clear ownership, measurable risk reduction, and versioned security decisions. We maintain a forward-looking security roadmap, a living risk register, and an ADR-driven decision record. Progress is reviewed with stakeholders quarterly and tied to concrete KPIs.
Security Roadmap (next 12–18 months)¶
- Crypto agility & quantum-safe readiness
- Pluggable crypto providers; hybrid (ECDSA + Dilithium) anchor signatures.
- KEK migration plan; algorithm identifiers embedded in manifests; rollback strategy.
- PQC testbed with offline verification harness; performance budget established.
- AI-assisted detection
- Entity behavior baselines (tenant/identity/purpose) with drift scoring.
- LLM-assisted triage summaries sourced from structured evidence; no raw PII.
- Auto-enrichment (IoCs, change diffs, policy deltas) appended to alerts.
- Adaptive policies
- Risk-aware ABAC (e.g., shorten token TTL on anomaly, force DPoP for export).
- Contextual egress gates (tighten routes on surge; relax post-drill automatically).
- Progressive enforcement: `report → warn → enforce` with tracked false positives.
- Attestation automation
- “One-click” Evidence Pack generation with per-framework crosswalks.
- Signed control proofs (policyVersion, SIEM query hash, artifact digests).
Roadmap tracker (excerpt)
roadmap:
  - id: CRYPTO-AGILITY-01
    goal: "Hybrid anchors (ECDSA+Dilithium) in EU region"
    due: "2026-03-31"
    kpi: "100% anchors dual-signed; verify latency < +10%"
    owner: "Platform Security"
  - id: DET-AI-02
    goal: "LLM triage summaries for S1+"
    due: "2026-01-15"
    kpi: "MTTD -20%; triage time -30%"
    owner: "SecOps"
Risk Register (living view)¶
- Inputs: threat intel, incident post-mortems, pen test findings, supply-chain advisories, KPI drift.
- Scoring: Likelihood × Impact (1–5); Residual after controls; review monthly.
- States: `Open → Mitigating → Verified → Accepted (time-boxed)`.
- Links: each risk ties to controls, tests, evidence queries, and ADRs.
Template
risk:
  id: R-2025-023
  title: "Break-glass scope misuse"
  drivers: ["Incident review 2025-10-12"]
  inherent: { likelihood: 3, impact: 4 }
  controls: ["EX-ATP-040","AC-ATP-001"]
  plan:
    actions:
      - "Add scope validator to approval UI"
      - "TTL hard-cap to 4h platform-wide"
    owner: "Security Ops"
    due: "2025-11-15"
  residual: { likelihood: 2, impact: 3 }
  evidence: ["siem:break_glass.used", "policy:approval_validator.v2"]
  status: "Mitigating"
  exceptions: []
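The scoring rule is simple enough to compute directly. This sketch applies Likelihood × Impact to the template above (inherent 3 × 4 = 12, residual 2 × 3 = 6). It also adds the past-due KPI from "KPIs & Signals", assuming that KPI's denominator is the set of risks in the `Mitigating` state. Function names are hypothetical.

```python
from datetime import date


def risk_score(likelihood: int, impact: int) -> int:
    """Likelihood x Impact on the 1-5 scales used by the register."""
    for value in (likelihood, impact):
        if not 1 <= value <= 5:
            raise ValueError(f"score component out of range: {value}")
    return likelihood * impact


def pct_mitigating_past_due(risks, today: date) -> float:
    """Share of 'Mitigating' risks whose plan due date has passed (target < 10%)."""
    mitigating = [r for r in risks if r["status"] == "Mitigating"]
    if not mitigating:
        return 0.0
    late = sum(1 for r in mitigating if r["due"] < today)
    return 100.0 * late / len(mitigating)


r = {"id": "R-2025-023", "status": "Mitigating", "due": date(2025, 11, 15),
     "inherent": (3, 4), "residual": (2, 3)}
print(risk_score(*r["inherent"]), risk_score(*r["residual"]))  # 12 6
```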
ADR-Driven Security Decisions¶
- When required: auth/crypto changes, policy model updates, egress rules, residency posture, incident-driven compensating controls.
- Format: Context → Decision → Alternatives → Risks → Rollout/Rollback → Evidence hooks (SIEM queries, policyVersion, manifests).
- Versioning: policies and ADRs share a semantic version; PEPs stamp `policyVersion` into decision logs.
- Governance: security ADRs require dual approval (Platform Security + Product/Compliance).
Policy Lifecycle & Change Control¶
- Stages: `draft → canary → enforce`.
- Deprecation windows: minimum 30 days for breaking policy changes (export routes, token requirements).
- Policy bundles: signed, reproducible build, immutably archived; rollback bundles retained ≥ 12 months.
- Drift monitoring: allow/deny delta ≤ 2 pp week-over-week, else auto-rollback.
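Here is a sketch of the drift check. It assumes weekly `(allow, deny)` decision counts per policy version are available, and the function names are hypothetical. Called with `limit_pp=5.0`, the same function would express the CI/CD deploy gate described under Security Testing & Validation.

```python
def deny_rate(allow: int, deny: int) -> float:
    total = allow + deny
    return deny / total if total else 0.0


def drift_pp(prev_week: tuple, this_week: tuple) -> float:
    """Week-over-week change in deny rate, in percentage points.

    Allow rate = 1 - deny rate, so the allow-rate drift has the same magnitude.
    """
    return abs(deny_rate(*this_week) - deny_rate(*prev_week)) * 100.0


def should_rollback(prev_week: tuple, this_week: tuple, limit_pp: float = 2.0) -> bool:
    return drift_pp(prev_week, this_week) > limit_pp


print(should_rollback((9000, 1000), (8700, 1300)))  # True: deny rate 10% -> 13%
```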
Policy release (sketch)
policyRelease:
  version: "3.4.0"
  changes:
    - "Require DPoP for export.* routes"
    - "Quotas per edition tightened"
  rollout:
    canaryTenants: ["t-eu-01","t-us-02"]
    metrics: ["abac.deny.count","egress.bytes","export.latency.p95"]
    abortIf:
      - "abac.deny.delta_pp > 5"
      - "egress.route.violations > 0"
Quarterly Security Reviews¶
- Participants: Platform Security, SecOps, SRE, Product, Compliance/Legal, Data Governance.
- Agenda:
- KPIs: MTTD/MTTR, allow/deny drift, integrity failure rate, key rotation adherence.
- Risks: top open risks, exceptions expiring, pen test & bug bounty status.
- Roadmap: progress vs. targets; reprioritize based on incidents and business goals.
- Controls: effectiveness review; propose retire/replace/tune.
- Artifacts: minutes, action register, updated roadmap, updated risk entries, ADR references.
KPIs & Signals¶
- Risk: % risks in Mitigating past due (< 10%); residual risk trend ↓ quarter-over-quarter.
- Controls: policy signature freshness < 5 min; residency conformance ≥ 99.9%.
- Incidents: MTTD/MTTR targets met for S0/S1; repeat incident rate < 5% per quarter.
- Testing: chaos drills ≥ 1/region/month; pen test High/Critical remediated within SLA.
- Compliance: evidence pack on time; auditor access windows met with zero scope creep.
Operating Model & RACI (summary)¶
| Area | Owner | Consulted | Informed |
|---|---|---|---|
| Policies & ADRs | Platform Security | Product, Compliance | SRE, Eng |
| Risk Register | SecOps | Platform Security, Compliance | Product |
| Roadmap | Platform Security + Product | SRE, Compliance | Execs |
| Quarterly Review | Security Lead (chair) | All above | All teams |
Evidence & Acceptance¶
- Roadmap, risks, ADRs, policy bundles, and quarterly minutes are signed, timestamped, and discoverable.
- “Done” for the quarter when:
- Roadmap milestones updated; risks re-scored; exceptions reviewed/renewed or closed.
- KPIs reported; deltas explained; remediations tracked to owners/dates.
- All policy changes have **ADRs** and **evidence hooks**; canary/enforce states recorded.
Cross-references Security Control Framework · Least Privilege & Policy Enforcement · Security Monitoring & Detection · Incident Response & Forensics · Vulnerability & Patch Management · data-residency-retention.md
Guardrails (quick checklist)¶
- Every material security change has an ADR, a versioned policy, and evidence hooks.
- Risks are time-bound with owners and actions; residual risk tracked post-mitigation.
- Policy rollouts are canaried with explicit abort conditions; rollbacks are rehearsed.
- Quarterly reviews produce signed minutes, updated roadmap, and closed-loop actions.
Appendix A — Security Control Inventory (Summary)¶
| Control ID | Category | Description | Owner | Evidence | Test Freq |
|---|---|---|---|---|---|
| IAM-01 | Preventive | MFA for admin ops | Identity | Auth logs | Monthly |
| NET-01 | Preventive | mTLS inter-service | Platform | Cert rotation | Quarterly |
| ENC-01 | Preventive | Per-tenant encryption | Platform | KMS logs | Continuous |
| DET-01 | Detective | Anomaly detection | SRE | Alert history | Weekly |
| AUD-01 | Detective | Meta-audit stream | ATP | Audit records | Continuous |
| IAM-02 | Deterrent | JIT elevation (dual-approval, TTL) | SecOps | elevation.used events | Continuous |
| OPA-01 | Preventive | ABAC at PEP-1/2 (policy-as-code) | Platform | ABAC decision logs, bundle sigs | Per release |
| DLP-01 | Preventive | Redaction enforcement on exports | DataGov | Export manifests, redact maps | Continuous |
| KMS-02 | Preventive | Key rotation & escrow (DEK/KEK) | Platform | Rotation manifests, HSM ops | Quarterly |
| INT-01 | Detective | Integrity verification (Merkle/anchors) | Core Eng | Verifier reports, TSA receipts | Daily/Weekly |
| DR-01 | Corrective | Quarantined restore (verify before use) | SRE | Restore logs, verify checklist | Monthly |
| EGR-01 | Preventive | Egress allowlist (deny-by-default) | Platform | Egress gateway logs, DNS policy | Continuous |
Full inventory is maintained in the security control registry and includes owners, evidence queries, acceptance criteria, and ADR links.
Appendix B — Cross-Reference Map¶
| Topic | Primary Document | Notes |
|---|---|---|
| Tenant isolation & guards | multitenancy-tenancy.md | ABAC, break-glass |
| Encryption & key mgmt | data-residency-retention.md §5, key-rotation.md | Per-tenant keys, rotation |
| PII handling | pii-redaction-classification.md | Classification, redaction |
| GDPR/HIPAA/SOC2 mapping | privacy-gdpr-hipaa-soc2.md | Detailed compliance |
| Integrity & tamper evidence | tamper-evidence.md | Hash chains, anchors |
| Zero Trust architecture | zero-trust.md | Principles, patterns |
| Backup security | backups-restore-ediscovery.md | Encrypted backups |
| Observability | observability.md | Security logs, metrics |
Appendix C — Threat Model Summary (High-Level)¶
| Threat | Impact | Likelihood | Mitigations |
|---|---|---|---|
| Cross-tenant access | Critical | Low | Tenancy guards (§5), ABAC |
| Tampering audit records | Critical | Low | Immutability, hash chains |
| Data exfiltration | High | Medium | Egress controls, encryption, DLP |
| Insider threat | High | Low | Least privilege, dual-control, audit |
| DoS/resource exhaustion | Medium | Medium | Rate limits, quotas, backpressure |
See “Threat Model & Attack Surface” for detailed actors, paths, and residual risk notes; definitions for Impact/Likelihood and scoring live in the risk register.