Security & Compliance - Audit Trail Platform (ATP)¶

ATP is security-first by design — every layer enforces defense-in-depth, least privilege, and continuous verification.

Purpose & Scope¶

This document defines ATP’s security posture and the control framework that governs it. It consolidates what we secure, how we secure it, and how we prove it — while deferring deep technical details to the specialized docs it references.

What this document covers

Establish ATP’s security architecture across network, identity, application, data, and operations.
Define a control framework (preventive/detective/corrective/deterrent) and how controls are owned, tested, and evidenced.
Lay out the threat model (actors, assets, trust boundaries) and incident response at a platform level.
Set the compliance attestation strategy (GDPR, HIPAA, SOC 2, ISO 27001) and evidence sources.
Link to detailed guides instead of duplicating:
- Tenancy & ABAC → multitenancy-tenancy.md
- Residency & retention → data-residency-retention.md
- Zero Trust & hardening → zero-trust.md
- Key management & rotation → key-rotation.md
- PII classification & redaction → pii-redaction-classification.md
- Backups/restore/eDiscovery → backups-restore-ediscovery.md

Out of scope (referenced elsewhere)

Implementation minutiae of guards, crypto primitives, or service-specific configs.
Day-2 operational runbooks beyond security (scaling, performance tuning, generic SRE playbooks).

Readers & ownership

Platform Security (owners): policies, control registry, audits.
SRE/Operations: detection engineering, response playbooks, drills.
Product/Engineering: secure SDLC, boundary enforcement in services.
Compliance/Legal/DPO: framework mappings, exceptions, auditor interface.

Artifacts produced

Security Control Inventory (IDs → owner, evidence, test cadence).
Threat Model (trust boundaries + risk matrix).
Incident Response Playbooks (cross-tenant, tampering, exfil, key compromise).
Compliance Crosswalks (GDPR/HIPAA/SOC2/ISO) with evidence pointers.
Attestation Pack templates (monthly/quarterly), CI/CD security gates.

Acceptance (done when)

Control inventory is complete, owned, and each control has evidence sources and test cadence.
Threat model states actors/vectors/assets, trust boundaries, risks, and mitigations.
Incident response is actionable, tested, and cross-linked to on-call docs.
Compliance mappings point to policies, logs, metrics, manifests, ADRs — no dead links.

Security Architecture Overview¶

ATP applies defense-in-depth across five layers — network → identity → application → data → operations — with Zero Trust defaults (see zero-trust.md). Security is enforced at every boundary of the core path: external API → ingestion → storage → query → export. Control-plane policies (OPA/Rego) govern the data-plane, and all decisions are observable and auditable.

Layers & Principles¶

Network
- VNet/VPC isolation per environment and region; private endpoints to data stores; service mesh mTLS and policy (deny-by-default).
- Ingress via API Gateway with WAF, rate-limits, bot/DoS protections, and IP allow-lists for operator endpoints.
Identity
- OIDC for users; workload identities for services; short-lived tokens with purpose and residency claims.
- mTLS between services; SPIFFE/SPIRE (or equivalent) SVIDs for service identity in mesh.
Application
- PEP-1 (Gateway) for coarse residency/tenancy guards; PEP-2 (Service) for fine-grained ABAC, classification, and quota checks.
- Policy-as-code (OPA/Rego) embedded and versioned; decisions stamped with policyVersion.
Data
- WORM evidence segments; hash-chained integrity with regional anchors; envelope encryption with tenant-scoped keys.
- Residency-aware storage and replication; export routes constrained by profile.
Operations
- Immutable, structured security logs; SIEM pipelines; alerts on guard violations, key ops, anomaly spikes.
- Runbooks and playbooks for incident response; continuous verification and drills.
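
The hash-chained integrity mentioned under Data can be pictured with a minimal sketch. It assumes a simple linear SHA-256 chain over JSON-serialized records; the `GENESIS` seed and all function names are hypothetical, and regional anchors (HSM signatures, timestamps) are not shown.

```python
import hashlib
import json
from typing import Optional

GENESIS = "0" * 64  # hypothetical seed for the first link in a segment


def link_hash(prev_hash: str, record: dict) -> str:
    """Hash a record together with the previous link so edits ripple forward."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def build_chain(records: list) -> list:
    """Return one chained hash per record, in order."""
    hashes, prev = [], GENESIS
    for record in records:
        prev = link_hash(prev, record)
        hashes.append(prev)
    return hashes


def first_broken_link(records: list, hashes: list) -> Optional[int]:
    """Return the index of the first record that fails verification, else None."""
    if len(records) != len(hashes):
        return min(len(records), len(hashes))
    prev = GENESIS
    for i, record in enumerate(records):
        if link_hash(prev, record) != hashes[i]:
            return i
        prev = hashes[i]
    return None
```

Changing any earlier record changes every later hash, so a verifier only needs the stored hashes (and, in ATP, the signed anchor) to locate the first tampered entry.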

Trust Boundaries & Core Path¶

flowchart LR
  U[Client / Integrations] -->|JWT + mTLS| G["API Gateway (PEP-1)"]
  G --> I["Ingestion Services (PEP-2)"]
  I --> S[(Hot Storage - WORM)]
  S --> Q["Query/Read Models (PEP-2)"]
  Q --> E["Export/eDiscovery (PEP-2)"]
  subgraph Control Plane
    P["Policy Engine (OPA/Rego)"]
    K[KMS/HSM]
    C[Residency & Retention Catalog]
    M[Monitoring/SIEM]
  end
  G --- P
  I --- P
  Q --- P
  E --- P
  S --- K
  E --- K
  G --- M
  I --- M
  Q --- M
  E --- M
  C --- G
  C --- I
  C --- Q
  C --- E

Boundaries

External → Gateway: authentication, WAF, rate-limits, coarse tenancy/residency guard.
Gateway → Services: purpose-bound requests with signed context headers (X-Policy-*), correlation IDs.
Services → Data Stores: private network only; attribute checks (tenant, region, category) before read/write.

Control Plane vs Data Plane¶

Control Plane
- Policy Registry (residency, retention, ABAC), Key Management (per-tenant/region), Catalogs (tenants, regions, silos), Observability (metrics/logs/events).
- Drives decisions; emits evidence (policy changes, approvals, rotations, drill reports).
Data Plane
- Executes guards and modes: append, seal, verify, purge/redact, export.
- Mutations are ledgered and idempotent; decisions reference control-plane policyVersion.

Enforcement Points (PEP) & Decisions (PDP)¶

PEP-1 Gateway
- Enforces deny-by-default, residency route checks, DDoS/WAF, purpose binding, and basic quota.
- Annotates requests with decision context; drops or quarantines on mismatch.
PEP-2 Services
- Performs fine-grained ABAC (tenantId, regionCode, dataSiloId, category), classification/redaction checks, and export route validation.
PDP (Policy Engine)
- Co-located or sidecar OPA with signed bundles; hot-reload on policy updates; short decision cache with instant revocation on changes.
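
A short decision cache that is flushed on policy change could look roughly like the sketch below. The class name, the 5-second TTL, and the injectable clock are assumptions for illustration; the actual PDP integration is not described in this document.

```python
import time


class DecisionCache:
    """Short-lived cache of policy decisions, flushed when policyVersion changes."""

    def __init__(self, ttl_seconds=5.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so tests can control time
        self._policy_version = None
        self._entries = {}

    def on_policy_update(self, policy_version):
        # A new bundle version invalidates every cached decision at once.
        if policy_version != self._policy_version:
            self._policy_version = policy_version
            self._entries.clear()

    def put(self, key, decision):
        self._entries[key] = (decision, self._clock() + self._ttl)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        decision, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]   # expired: force re-evaluation
            return None
        return decision
```

A cache miss (`None`) means the PEP must ask the policy engine again, so stale allows cannot outlive either the TTL or a bundle update.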

Key & Secrets Path¶

KMS/HSM per region; KEKs are region-anchored, DEKs are tenant-scoped and rotated on the cadences defined in key-rotation.md.
No plaintext secrets in code or images; bootstrap via identity (Managed Identity) + short-lived tokens.
Key operations (wrap/unwrap/sign) logged with key lineage and surfaced to SIEM.

Integration Points¶

ConnectSoft Identity Context: source of user/service identities, entitlements, and purpose claims.
KMS/HSM: encryption at rest, anchor signing, manifest signing; rotation cadences from key-rotation.md.
Monitoring/Alerting: unified security signals (guard denials, anomaly spikes, key ops, export attempts) with routed on-call.
Data Residency & Retention: region topology, replication posture, purge/export modes (see data-residency-retention.md).

Guardrails (quick checklist)¶

Every request carries purpose and residency claims; decisions are deny-by-default without them.
All data-plane calls traverse private networks with mTLS; no public egress to data stores.
Policy bundles are signed, versioned, and audited; PEPs stamp policyVersion into logs.
Keys are region-anchored; no cross-border unwrap; all key ops audited.
Security telemetry is first-class: structured, PII-safe, and correlated end-to-end.

Threat Model & Attack Surface¶

We model threats with STRIDE (Spoofing, Tampering, Repudiation, Information disclosure, Denial of service, Elevation of privilege) and a kill-chain lens (recon → initial access → execution → persistence → exfiltration/impact). Scope stays anchored to ATP’s trust boundaries, assets, and decision points.

Assets & Trust Boundaries¶

Crown-jewel assets

Hot evidence (WORM segments) and anchors (signed, timestamped).
Residency & retention catalogs (govern movement, purge eligibility).
KMS/HSM keys (tenant DEKs, region KEKs, anchor signing keys).
Policy bundles (OPA/Rego) and attestation manifests.
Export bundles (NDJSON/Parquet) + signed manifests.
Identity tokens (user/service), break-glass approvals.

Trust boundaries (high-level)

flowchart LR
  ext[External Clients/Integrations]
  gw["API Gateway (PEP-1)"]
  svc["Microservices (PEP-2)"]
  hot[(Hot/WORM Storage)]
  rm[(Read Models/Indexes)]
  exp[Export/eDiscovery]
  kms[[KMS/HSM]]
  opa[[Policy Engine/OPA]]
  obs[[SIEM/Observability]]
  cat[[Residency/Retention Catalogs]]

  ext -->|Authn + mTLS| gw --> svc
  svc --> hot
  svc --> rm
  svc --> exp
  svc --- kms
  gw --- opa
  svc --- opa
  gw --- obs
  svc --- obs
  exp --- kms
  svc --- cat

Threat Actors (representative)¶

External attackers (targeted or opportunistic, incl. bots).
Malicious tenants (abuse APIs to escape scope or exfiltrate).
Insiders (over-privileged operators/devs; compromised accounts).
Supply chain (poisoned dependencies/containers, CI artifact tampering).
Compromised integrations (leaky webhooks, abused API keys).
Cloud control-plane misuse (misconfiguration, stale IAM roles).

Attack Vectors (STRIDE mapped)¶

Vector	STRIDE	Examples in ATP context	Primary mitigations
Unauthorized access	S/E	Token theft/replay, client secret leakage, JWT forgery attempts	OIDC, mTLS, short-lived tokens, purpose binding, JTI replay checks
Tampering	T	Modify evidence prior to seal; anchor substitution; policy bundle tampering	WORM, Merkle/anchors (HSM-signed), signed policy bundles, canary verification
Repudiation	R	Delete/alter security logs; deny actions taken	Append-only logs, meta-audit stream, signed manifests, time-stamps
Information disclosure	I	Cross-tenant query, cross-region export, DSAR over-breadth	ABAC tenancy/residency guards, export routes, redaction templates
DoS / resource exhaustion	D	Hot-shard floods, export storms, query fan-outs	Rate-limits, quotas, back-pressure, governors, cost guards
Elevation of privilege	E	Break-glass abuse, role drift, policy bypass	Dual-approval + TTL, least-privilege, OPA at PEP-1/PEP-2, continuous attestations

ATP-Specific Threats (focus items)¶

Cross-tenant leakage
- Why it matters: Multi-tenant platform; any leakage is critical.
- Paths: mis-scoped queries, missing tenantId/dataSiloId checks, cache bleed.
- Controls: PEP-1/PEP-2 ABAC, per-tenant caches, tenancy tags in indices, contract tests.
Integrity compromise of audit evidence
- Paths: anchor key misuse, Merkle root mismatch, TSA failures, “re-seal” attempts.
- Controls: WORM stores, HSM sign-only keys per region, bridge anchors for moves, scheduled verification with quarantine.
Retention bypass or purge abuse
- Paths: policy relaxation, silent purge without ledger, export-then-purge without verify.
- Controls: policy-as-code with bounds, purge ledger, export verification gates, dual approval for sensitive categories.
Residency violations / exfiltration
- Paths: cross-region reads/exports, replica hydration to disallowed regions.
- Controls: residency profiles, export routes, deny-by-default cross-region, cost guards, network egress allow-lists.
Break-glass misuse
- Paths: over-broad scopes, unlimited TTL, lack of follow-up.
- Controls: scoped approvals, ≤4h TTL, evidence hooks, mandatory post-mortem.
Supply-chain insertion
- Paths: poisoned dependency/container, tampered OPA bundles.
- Controls: SBOM, sigstore/cosign, verified builds, signed OPA bundles, admission policies.

Abuse Stories (examples to test)¶

“As a malicious tenant, I try to read US data with an EU token.” → Expect deny with cross_region_blocked.
“As an operator, I request break-glass to export global.” → Expect deny unless route permitted + dual approval + TTL.
“As an attacker with a leaked JWT, I call purge.” → Expect deny (missing purpose/scope) + alert + token revoke.
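
Abuse stories like these translate naturally into deny-path tests. The sketch below is a stand-in decision function, not ATP's policy engine: the `Request` shape, the `REQUIRED_SCOPE` mapping, and the reasons `missing_purpose` and `missing_scope` are hypothetical, while `tenant_mismatch` and `cross_region_blocked` are the reasons used elsewhere in this document.

```python
from dataclasses import dataclass

# Hypothetical mapping from operation to the scope it requires.
REQUIRED_SCOPE = {"read": "evidence.read", "purge": "retention.purge"}


@dataclass(frozen=True)
class Request:
    op: str
    purpose: str
    scopes: tuple
    token_tenant: str
    token_region: str
    resource_tenant: str
    resource_region: str


def decide(req: Request) -> tuple:
    """Deny-by-default decision returning (allowed, reason)."""
    if not req.purpose:
        return (False, "missing_purpose")
    required = REQUIRED_SCOPE.get(req.op)
    if required is None or required not in req.scopes:
        return (False, "missing_scope")
    if req.token_tenant != req.resource_tenant:
        return (False, "tenant_mismatch")
    if req.token_region != req.resource_region:
        return (False, "cross_region_blocked")
    return (True, "allowed")


# Story 1: an EU token tries to read US data.
eu_reads_us = Request("read", "default", ("evidence.read",), "t1", "EU", "t1", "US")
# Story 3: a leaked read-only JWT is used to call purge.
leaked_purge = Request("purge", "default", ("evidence.read",), "t1", "EU", "t1", "EU")
```

Each story becomes one fixture plus one expected reason, which keeps the deny path covered as policies evolve.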

Risk Scoring & Prioritization¶

Scale: Likelihood (L) and Impact (I) from 1–5. Risk = L × I. Priority: ≥16 = Critical, 12–15 = High, 8–11 = Medium, ≤7 = Low. We track Residual after controls.

Threat	L	I	Risk	Residual	Owner	Next action
Cross-tenant leakage	3	5	15 (High)	9 (Med)	Platform Sec	Expand contract tests to cache layer; add per-tenant cache keys to lint
Integrity compromise	3	5	15 (High)	8 (Med)	Core Eng	Increase verify-on-read coverage to 20% for sensitive categories
Residency violation	2	5	10 (Med)	6 (Low)	SRE	Enforce cost estimate gate on all cross-region attempts
Retention bypass	3	4	12 (High)	8 (Med)	Data Gov	Require dual approval for `evidence.hot` purge; tighten dry-run diff thresholds
Break-glass misuse	2	4	8 (Med)	5 (Low)	Security Ops	Auto-create post-mortem tasks; add scope validators to approval UI
Supply-chain insertion	3	4	12 (High)	9 (Med)	Platform Sec	Gate deploys on cosign verification + SBOM diff alerts

Risk register entries link to controls, tests, and evidence (SIEM queries, manifests, ADRs). Residual risk is recalculated after each control change.
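
The scoring rule above is simple enough to encode directly, which helps keep the register and its priority labels consistent. A minimal sketch (function names are illustrative):

```python
def risk_score(likelihood: int, impact: int) -> int:
    """Risk = L x I, with both factors on the 1-5 scale."""
    if not (1 <= likelihood <= 5 and 1 <= impact <= 5):
        raise ValueError("likelihood and impact must be between 1 and 5")
    return likelihood * impact


def priority(score: int) -> str:
    """Map a score to the bands: >=16 Critical, 12-15 High, 8-11 Medium, <=7 Low."""
    if score >= 16:
        return "Critical"
    if score >= 12:
        return "High"
    if score >= 8:
        return "Medium"
    return "Low"
```

For example, cross-tenant leakage (L=3, I=5) scores 15 and lands in High, while its residual of 9 lands in Medium, matching the table.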

Detection & Response Hooks (per threat)¶

Cross-tenant attempts → abac.decision_denied{reason="tenant_mismatch"} alerts; dashboard by tenant/region.
Integrity failures → integrity.violation_detected; auto-quarantine; run verifier; block exports.
Retention anomalies → retention.dryrun.anomaly=true when >35% swing; require governance review.
Cross-region attempts → abac.decision_denied{reason="cross_region_blocked"} + cost guard logs.
Break-glass → break_glass.used with TTL countdown; reminder + post-mortem ticket at expiry.
Supply chain → CI gate fails on unsigned images/OPA bundles; runtime admission webhooks deny.
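
The retention anomaly hook can be sketched as a comparison between two dry-run results. This assumes "swing" means the relative change in purge-candidate count between consecutive dry-runs; the document does not define it, so treat the function and its parameters as hypothetical.

```python
def dryrun_anomaly(previous_count: int, current_count: int, threshold: float = 0.35) -> bool:
    """Flag a dry-run whose candidate count moved by more than the threshold."""
    if previous_count == 0:
        # No baseline: any candidates at all deserve a governance look.
        return current_count > 0
    swing = abs(current_count - previous_count) / previous_count
    return swing > threshold
```

A result of `True` would set retention.dryrun.anomaly=true and route the run to governance review.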

Assumptions & Constraints¶

Cloud provider physical security is out of scope; we trust IaaS attestations.
All services run with managed identities; no long-lived static secrets.
Data stores are reached only via private network; no public data-plane endpoints.
Policy bundles and images are signed; unsigned artifacts cannot run.

Evidence (what we capture)¶

Structured decision logs (ABAC, retention, export routes) with policyVersion.
Anchor signatures, TSA receipts, and verification reports.
Purge ledgers and export manifests (signed, checksummed).
Control tests: residency conformance, retention dry-runs, DR drill reports.
Supply-chain attestations: SBOMs, cosign signatures, provenance.

Security Control Framework¶

ATP’s controls are organized as preventive, detective, corrective, and deterrent measures, mapped to industry frameworks (NIST CSF, CIS Controls, ISO/IEC 27001 Annex A). Controls are policy-as-code, owned, tested on a defined cadence, and evidenced via immutable telemetry.

Control Categories (how we secure)¶

Preventive — stop bad things by default: Zero Trust network, ABAC at gateways/services, per-tenant encryption keys, WORM storage, egress allow-lists.
Detective — see what matters quickly: SIEM pipelines, anomaly detectors, guard-violation alerts, integrity verification jobs.
Corrective — recover to safe state: DR playbooks, restore to quarantine, purge ledgers with replay/rebuild of read models.
Deterrent — raise the bar & trace actions: dual-approval + TTL for break-glass, signed policies/images, visible audit trails and watermarked exports.

Framework Cross-Mapping (anchor points)¶

Domain	NIST CSF	CIS v8 (examples)	ISO/IEC 27001:2022 (Annex A)	ATP Control Examples
Asset/Context	ID.AM	1, 2	A.5.9, A.5.12	Residency/retention catalogs; control registry
Access Control	PR.AC	6	A.5.15, A.5.16	ABAC tenancy/residency guards; purpose-bound tokens
Data Security	PR.DS	3	A.8.24, A.5.10	Per-tenant/region encryption; WORM; redaction
Detection	DE.AE, DE.CM	8	A.8.15	Guard-violation alerts; SIEM correlation, anomaly detection
Response	RS.RP, RS.MI	17	A.5.24, A.5.26	IR playbooks (cross-tenant, tampering, exfil); kill switches
Recovery	RC.RP	11	A.5.30, A.8.13	Region-coherent backups; DR drills (read-only promotion)
Change/Config	PR.IP	4, 5	A.8.9, A.8.32	Signed OPA bundles/images; CI policy gates; ADRs
Supplier/SBOM	ID.SC	15	A.5.20, A.5.19	Cosign-verified images, SBOM diff alerts, license checks
Logging/Evidence	DE.AE	8	A.8.15	Meta-audit stream; signed manifests; evidence packs

Detailed mappings for GDPR/HIPAA/SOC 2/ISO are maintained in privacy-gdpr-hipaa-soc2.md.

Control Inventory (samples)¶

Control ID	Category	Description	Owner	Evidence Sources	Test Cadence
AC-ATP-001	Preventive	ABAC enforcement at Gateway (PEP-1): tenant/region/purpose	Platform Security	ABAC decision logs, Rego bundle signature, gateway policyVersion	Continuous + per-release contract tests
DS-ATP-010	Preventive	Per-tenant envelope encryption; region-anchored KEKs	Platform Security	KMS/HSM audit logs, key rotation manifests	Daily key checks + quarterly rotation drill
IN-ATP-020	Detective	Integrity verification (on-read/sample/scheduled)	Core Eng	`integrity.verify.*` metrics, violation events, verifier reports	Daily sample + weekly full verification covering the previous week
RE-ATP-030	Corrective	Restore to quarantine namespace (read-only)	SRE	Restore logs, hash/anchor/TSA verification evidence	Monthly restore drill per region
EX-ATP-040	Deterrent	Break-glass dual approval with TTL and scope	Security Ops	Approval records, `break_glass.used` events, audit trail	Continuous; monthly review of usage
RS-ATP-050	Preventive	Residency profiles with export route gating	Data Governance	Residency decisions, export manifests, policy registry	Continuous; conformance suite nightly
CH-ATP-060	Preventive	Signed OPA bundles and container images	Platform Security	Cosign attestations, admission controller logs	Per-build; gate blocks unsigned
LG-ATP-070	Detective	Meta-audit stream (append-only) for guard decisions	ATP Team	Structured logs with correlationId & policyVersion	Continuous; weekly integrity spot-check

Control Registry Schema (sketch)¶

id: "AC-ATP-001"
name: "Gateway ABAC Enforcement"
category: preventive
frameworks:
  nist_csf: ["PR.AC", "PR.PT"]
  cis_v8: [6]
  iso27001: ["A.5.15", "A.5.16"]
owner: "Platform Security"
policyRefs:
  - "platform/multitenancy-tenancy.md#guards"
  - "platform/data-residency-retention.md#residency-aware-access-controls"
evidence:
  logs:
    queries:
      - name: "abac_denies_24h"
        query: 'decision=="deny" and reason endswith "_blocked"'
  metrics: ["abac.allow.count", "abac.deny.count"]
  artifacts: ["rego-bundle.sig", "policyVersion"]
tests:
  type: ["contract", "integration"]
  cadence: { continuous: true, nightly: true }
  successCriteria:
    - "deny rate baseline ±5pp"
    - "no unsigned bundle admitted"
risk:
  inherent: high
  residual: medium
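
A registry entry in this shape can be linted for the fields every control is expected to carry (owner, policy reference, evidence query, test cadence). The sketch below works on an already-parsed entry (a plain dict); the function name and messages are illustrative.

```python
REQUIRED_FIELDS = ("id", "owner", "policyRefs", "evidence", "tests")


def registry_gaps(control: dict) -> list:
    """Return human-readable gaps for one control entry; empty list means complete."""
    gaps = ["missing " + field for field in REQUIRED_FIELDS if not control.get(field)]
    evidence = control.get("evidence") or {}
    if evidence and not (evidence.get("logs") or {}).get("queries"):
        gaps.append("no evidence query")
    tests = control.get("tests") or {}
    if tests and not tests.get("cadence"):
        gaps.append("no test cadence")
    return gaps
```

Running this over the whole registry in CI would surface unowned or unevidenced controls before an audit does.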

Evidence & Test Cadence (how we prove it)¶

Evidence sources: SIEM queries, KMS/HSM logs, signed manifests (exports/purge/anchors), purge ledgers, verification reports, ADRs and policy diffs.
Cadence:
- Continuous: guard decisions, SIEM correlations, unsigned artifact blocks.
- Nightly: residency conformance suite; integrity sample verify; dependency scan.
- Weekly: retention dry-run stats review; key/anchor log review.
- Monthly/Quarterly: restore and DR drills; control attestation pack generation.
Gates: CI blocks on unsigned images/OPA bundles, policy lint failures, residency test regressions.

Cross-References (implementation details)¶

Tenancy & ABAC guards → multitenancy-tenancy.md
Residency & retention controls → data-residency-retention.md
Zero Trust/service mesh & boundary hardening → zero-trust.md
Key rotation & escrow → key-rotation.md
Observability & evidence packs → observability.md, backups-restore-ediscovery.md

Guardrails (quick checklist)¶

Every control has a named owner, policy reference, evidence query, and test cadence.
Controls are policy-as-code with signatures and versioning; changes require ADRs.
CI/CD gates enforce signatures, lints, and conformance tests before deploy.
Evidence is immutable, PII-safe, and tied to policyVersion for audit traceability.

Authentication & Authorization Strategy¶

ATP uses token-based authentication with RBAC for coarse permissions and ABAC for contextual enforcement (see multitenancy-tenancy.md §5). All service-to-service calls are sender-constrained (mTLS or DPoP), and purpose-bound tokens are required at every boundary. Break-glass is time-boxed, dual-controlled, and fully audited (see §19 in multitenancy-tenancy.md).

Identities & Token Types¶

Human users
- OIDC interactive flows (Auth Code + PKCE).
- JWTs carry tenant, scopes, and entitlements resolved from the Identity context.
- Short-lived access tokens (≤ 15 min) + refresh tokens (rotating).
Workloads (services, jobs, agents)
- mTLS between services; client credentials to obtain short-lived access tokens (≤ 5 min).
- Managed identities (where supported) for secret-less bootstrap.
- For ingestion agents, optionally scoped API keys (HMAC) with tight IP/rate/TTL bounds.
Delegation (On-Behalf-Of)
- OBO exchange produces a down-scoped service token that encodes the original user (via act/obo claim) and strips disallowed scopes.

Claims Model (sketch)¶

Tokens are purpose-bound and residency-aware. Minimal claim set:

{
  "iss": "https://id.connectsoft.example",
  "sub": "svc-query",                 // user or workload identity
  "aud": "atp-api",
  "exp": 1730189400,                  // nbf + 5 min, within the workload token limit
  "nbf": 1730189100,
  "jti": "4d1c...",
  "tenant_id": "7c1a-...",
  "edition": "enterprise",
  "scopes": ["evidence.read", "index.query"],
  "entitlements": ["ediscovery.viewer"],
  "purpose": "default",               // e.g., dsar_export, ops_triage, dr_failover_review
  "region_code": "EU",
  "data_silo_id": "silo-7c1a...",
  "policy_version": "3.3.0",
  "break_glass": false,
  "obo": { "sub": "user@tenant", "amr": ["mfa"] }   // present for OBO flows
}
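
A PEP would typically reject tokens that lack the minimal claim set or exceed the lifetime caps before evaluating policy. The following sketch assumes the token signature has already been verified and that lifetime is measured from `nbf` (the sketch above carries no `iat`); the names `REQUIRED_CLAIMS` and `claim_problems` are hypothetical.

```python
REQUIRED_CLAIMS = ("iss", "sub", "aud", "exp", "nbf", "jti",
                   "tenant_id", "purpose", "region_code")

# Lifetime caps stated for human and workload access tokens.
MAX_LIFETIME_SECONDS = {"human": 15 * 60, "workload": 5 * 60}


def claim_problems(claims: dict, kind: str, now: int) -> list:
    """Return a list of problems; an empty list means the claims pass basic checks."""
    problems = ["missing:" + c for c in REQUIRED_CLAIMS if claims.get(c) in (None, "")]
    if problems:
        return problems
    if not (claims["nbf"] <= now < claims["exp"]):
        problems.append("not_valid_now")
    if claims["exp"] - claims["nbf"] > MAX_LIFETIME_SECONDS[kind]:
        problems.append("lifetime_too_long")
    return problems
```

Any non-empty result should map to a deny, consistent with the rule that missing purpose or residency attributes are never defaulted.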

RBAC + ABAC (how decisions are made)¶

RBAC grants baseline abilities (e.g., ediscovery.viewer, retention.operator).
ABAC enforces tenant/region/category/purpose at request time:
- Same-region writes only; cross-region reads/exports deny-by-default unless profile allows.
- Exports must match an allowed route (in_region, same_code, global) per residency profile.
- Missing purpose or residency attributes → deny.

Rego snippet (gateway PEP-1, sketch)

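package atp.authz

# Deny unless one of the allow rules below matches.
default allow = false

# Reads: tenant and region must match the resource, and a purpose must be present.
allow {
  input.op == "read"
  input.token.tenant_id == input.resource.tenantId
  input.token.region_code == input.resource.regionCode
  input.token.purpose != ""
}

# Exports: DSAR purpose, same tenant, and a route permitted by the residency profile.
allow {
  input.op == "export"
  input.token.purpose == "dsar_export"
  input.token.tenant_id == input.resource.tenantId
  input.export.route == allowed_route[input.token.residency_profile]
}

# Allowed export route per residency profile (one entry shown).
allowed_route["gdpr-standard"] = "in_region"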

Service-to-Service Patterns¶

mTLS + client credentials
- Workload obtains sender-constrained access token (mTLS or DPoP).
- PEP verifies TLS binding and token audience; rejects bearer-only tokens on internal hops.
Scoped API keys (ingestion only)
- HMAC-signed requests with key id (kid) and rotating secret.
- Constraints baked into key metadata: tenant, category, IP/CIDR, max QPS, expiry.
- Keys can only append; no read/export scopes.

HMAC header (example)

Authorization: ATP-HMAC kid=ing-01,ts=1730189300,nonce=2f9c...,sig=base64(hmac_sha256(secret, method|path|ts|nonce|sha256(body)))
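
The header above can be produced and checked with the standard library. This sketch assumes the signed fields are joined with a literal `|`, that `sha256(body)` is hex-encoded, and that a 300-second timestamp window applies; none of these details are specified in this document, and nonce replay tracking is left out.

```python
import base64
import hashlib
import hmac


def sign_request(secret: bytes, method: str, path: str, ts: int, nonce: str, body: bytes) -> str:
    """Compute base64(hmac_sha256(secret, method|path|ts|nonce|sha256(body)))."""
    body_digest = hashlib.sha256(body).hexdigest()  # assumption: hex-encoded body hash
    message = "|".join([method, path, str(ts), nonce, body_digest]).encode("utf-8")
    mac = hmac.new(secret, message, hashlib.sha256).digest()
    return base64.b64encode(mac).decode("ascii")


def authorization_header(kid: str, secret: bytes, method: str, path: str,
                         ts: int, nonce: str, body: bytes) -> str:
    sig = sign_request(secret, method, path, ts, nonce, body)
    return f"ATP-HMAC kid={kid},ts={ts},nonce={nonce},sig={sig}"


def verify_request(secret: bytes, method: str, path: str, ts: int, nonce: str,
                   body: bytes, sig: str, now: int, max_skew: int = 300) -> bool:
    """Reject stale timestamps, then compare signatures in constant time."""
    if abs(now - ts) > max_skew:
        return False
    expected = sign_request(secret, method, path, ts, nonce, body)
    return hmac.compare_digest(expected, sig)
```

Because the body hash is part of the signed message, a captured header cannot be reused with a different payload.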

Propagation & Context¶

Gateway adds signed context headers (immutable within hop):
- X-Policy-Decision, X-Policy-Version, X-Region-Code, X-Data-Silo-Id, X-Correlation-Id.
Services must re-evaluate ABAC with local resource attributes; never trust caller’s tenant/region blindly.

Token Lifetimes & Rotation¶

Access tokens: ≤ 15 min (humans), ≤ 5 min (workloads); clock-skew tolerant.
Refresh tokens: rotating; revoke on logout / risk events.
JWKS: kid pinning; staged key rotation with overlap; reject stale kid after grace period.
DPoP/mTLS binding: required for privileged scopes (export, purge, residency admin).

Break-Glass Workflow (controlled exception)¶

Request: operator submits justification, scope (tenantId, regionCode, category, op), desired TTL (≤ 4h).
Dual approval: Security + DPO/Legal; system issues a scoped token with break_glass=true and approval_id.
Enforcement: PEPs only allow routes covered by the approval; all calls write explicit evidence (break_glass.used).
Aftercare: auto-expiry → revoke; create post-mortem ticket with linked logs/metrics.
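
The enforcement step can be sketched as a pure check of an approval record against the attempted call. The `Approval` fields and the approver role names (`security`, `dpo`) are assumptions made for illustration; only the 4-hour TTL cap, dual approval, and scope tuple come from the workflow above.

```python
from dataclasses import dataclass

MAX_TTL_SECONDS = 4 * 60 * 60                        # TTL cap from the workflow (<= 4h)
REQUIRED_APPROVERS = frozenset({"security", "dpo"})  # hypothetical role names


@dataclass(frozen=True)
class Approval:
    approval_id: str
    tenant_id: str
    region_code: str
    category: str
    op: str
    approvers: frozenset
    issued_at: int       # epoch seconds
    ttl_seconds: int


def break_glass_allows(approval: Approval, tenant_id: str, region_code: str,
                       category: str, op: str, now: int) -> bool:
    """Allow only calls inside the approved scope, within TTL, with both approvals."""
    if approval.ttl_seconds > MAX_TTL_SECONDS:
        return False
    if not REQUIRED_APPROVERS <= approval.approvers:
        return False
    if not (approval.issued_at <= now < approval.issued_at + approval.ttl_seconds):
        return False
    approved_scope = (approval.tenant_id, approval.region_code, approval.category, approval.op)
    return approved_scope == (tenant_id, region_code, category, op)
```

Every call that passes this check would still emit break_glass.used so the exception remains visible in the audit trail.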

Common Flows¶

sequenceDiagram
  autonumber
  participant U as User
  participant ID as Identity (OIDC)
  participant GW as API Gateway (PEP-1)
  participant S as Service (PEP-2)
  participant K as KMS/HSM
  U->>ID: Auth Code + PKCE
  ID-->>U: Access Token (JWT)
  U->>GW: Request (JWT)
  GW->>GW: RBAC + ABAC (purpose/tenant/region)
  GW-->>S: Forward + signed X-Policy-* headers (mTLS)
  S->>S: ABAC on resource attrs
  S-->>K: (optional) Sign/Wrap (mTLS)
  S-->>GW: Response + decision logs

Guardrails (quick checklist)¶

Require purpose and residency claims for all sensitive routes; deny-by-default without them.
All internal hops are sender-constrained (mTLS/DPoP); bearer-only tokens are rejected inside the mesh.
Enforce OBO down-scoping; never forward end-user tokens directly to storage.
API keys are append-only, scoped, rotated, and rate-limited; no read/export via keys.
Break-glass tokens are time-boxed, scope-limited, dual-approved, and leave a full audit trail.
Token/JWKS rotation is staged with overlap; stale kid is blocked after grace.

Network & Boundary Controls¶

ATP's network follows a hub–spoke, private-by-default design. All data-plane paths are east–west over private networks with mTLS in the mesh; ingress is only through the API Gateway; egress is restricted to an explicit allowlist. Cross-region topology aligns with data-residency-retention.md §3.

Topology & Isolation¶

Per-region VPC/VNet isolation with separate spokes for:
- Gateway (ingress), Services (ingestion/query/export), and Data (Hot/WORM, Read Models).
Private endpoints/Private Link to all stateful stores (no public data-plane endpoints).
Hub provides shared services: egress gateway, DNS, SIEM collectors, policy distribution.
Peering is intra-region only; cross-region traffic uses private interconnect and is governed by residency policies.

flowchart LR
  subgraph "Region (EU West)"
    HUB[[Hub]]
    GW["API Gateway (WAF/DDoS/PEP-1)"]
    SVC1["Ingestion Svc (PEP-2)"]
    SVC2["Query Svc (PEP-2)"]
    HOT["(Hot/WORM Storage)<br/>Private Link"]
    WARM["(Read Models)<br/>Private Link"]
    EGR[Egress Gateway]
    DNS[Private DNS]
    HUB---GW
    HUB---EGR
    HUB---DNS
    GW---SVC1
    GW---SVC2
    SVC1---HOT
    SVC2---WARM
  end
  subgraph "Region (EU North)"
    GWN[Gateway Mirror]
    SVCN[Read Replica]
    WARMR["(Warm Replica)<br/>Private Link"]
  end
  SVC2==Private Interconnect / Mesh mTLS==>SVCN

Ingress (API Gateway)¶

WAF with OWASP CRS (SQLi, XSS, RCE, SSRF) + custom rules:
- Block query strings with known attack payloads, XML entity expansion, over-sized bodies.
DDoS protections, rate limiting (per IP/tenant), and bot management.
TLS: modern ciphers, HSTS, TLS ≥ 1.2 (prefer 1.3); optional SPKI pinning for operator portals.
Operator endpoints behind IP allowlists and MFA-backed auth flows.
Context sealing: adds signed X-Policy-* headers and correlation IDs for downstream enforcement.

East–West (Service Mesh Policies)¶

mTLS STRICT for all service-to-service traffic; deny-by-default authorization in mesh.
Workload identity per service (SPIFFE/SPIRE or equivalent) → policy references identity + namespace.
L7 authz: only allow paths/verbs required for the contract; block wildcards.
Sidecar egress: forced through the egress gateway; direct internet blocked.

Data Plane Boundaries¶

Private Link to data stores; storage accounts/network rules deny public access.
Attribute checks at boundary: tenant/region/category must match before read/write.
WORM backends enforce immutability at storage layer; admins cannot bypass via network.

Egress Controls (Exfiltration Prevention)¶

Allowlist-only: outbound flows restricted to approved FQDNs/CIDRs (e.g., timestamping authority, compliance webhooks).
DNS: private resolvers with split-horizon; public DNS calls blocked except for allowlisted domains.
NAT/Egress gateway: single choke point with logging; no direct pod-to-internet.
Data residency hooks: cross-region routes are denied by default; if allowed, path must be private and same RegionCode.

Egress allowlist (sketch)

egress:
  default: deny
  allow:
    - name: tsa-rfc3161
      host: tsa.example.tld
      ports: [443]
      purpose: integrity_timestamp
    - name: audit-webhook
      host: audit.soc2-partner.tld
      ports: [443]
      purpose: evidence_delivery
dns:
  blockPublic: true
  privateZones: [ "svc.cluster.local", "priv.atp.internal" ]
mesh:
  mtls: strict
  outbound:
    viaEgressGateway: true
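
The default-deny semantics of this allowlist can be expressed as a lookup keyed by host and port. The sketch mirrors the two entries above as an in-memory table; how the egress gateway actually loads and enforces the policy is not covered here, and `egress_decision` is a hypothetical name.

```python
# (host, port) -> purpose, mirroring the allow entries in the policy sketch.
EGRESS_ALLOW = {
    ("tsa.example.tld", 443): "integrity_timestamp",
    ("audit.soc2-partner.tld", 443): "evidence_delivery",
}


def egress_decision(host: str, port: int) -> tuple:
    """Return ("allow", purpose) for allowlisted destinations, else ("deny", None)."""
    key = (host.lower().rstrip("."), port)  # normalize case and trailing dot
    purpose = EGRESS_ALLOW.get(key)
    if purpose is None:
        return ("deny", None)
    return ("allow", purpose)
```

Returning the purpose alongside the decision lets the gateway log why a flow was permitted, not just that it was.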

Regional Network Topology (residency-aware)¶

Authoritative writer stays in the tenant’s CloudRegion; replicas (if any) live only in permitted RegionCodes.
No cross-family peering (e.g., EU↔US) unless the residency profile explicitly allows and a legal basis is recorded.
Failover posture defaults to read-only (see DR sections), with routing changes broadcast via the Residency Catalog.

Operational Controls & Evidence¶

Continuous validation: policy conformance tests (deny direct public egress, deny storage public endpoints).
Telemetry: network.egress.blocked.count, waf.blocked.count, mesh.denied.count, private_link.bytes.
Forensics: packet captures at the egress gateway (on-demand), flow logs, and WAF request samples (redacted).

Guardrails (quick checklist)¶

All stateful services use Private Link; public access disabled at resource level.
Ingress only through the API Gateway (WAF/DDoS/rate-limit); operator endpoints IP-allowlisted.
Mesh enforces mTLS STRICT + deny-by-default; internal calls are sender-constrained.
Egress is allowlisted and routed via a single gateway; DNS is private and controlled.
Cross-region connectivity respects residency profiles; unauthorized paths are blocked and logged.

Application Security & Secure SDLC¶

ATP’s SDLC bakes security in from design → code → build → deploy → operate. We enforce secure defaults, verify via automated gates, and record decisions in security-focused ADRs.

Secure Coding Standards¶

Validate at boundaries: length/range/format, strict allowlists, reject-by-default.
Encode on output: HTML/JS/URL/SQL contexts; never concatenate into queries.
Parameterized queries everywhere (ORM or parameterized SQL); no string interpolation.
Canonicalize inputs before comparison; normalize Unicode; trim invisible characters.
Least privilege for service accounts; deny filesystem/network access by default.
Safe deserialization: use allowlists; avoid dynamic type binders.
Secrets never logged; structured logs use redaction providers.
CSRF/CORS: same-site cookies, anti-forgery tokens, explicit origins allowlist.
Do not trust client-side checks; repeat server-side verification.
Crypto: approved algorithms only; use platform crypto APIs; no homegrown crypto.
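
The canonicalization rule (normalize Unicode, trim invisible characters, then validate) can be illustrated with a small helper. This is a sketch for identifier-like fields, assuming NFKC normalization and a 256-character limit; both choices and the function name are illustrative, not ATP requirements.

```python
import unicodedata


def canonicalize(value: str, max_length: int = 256) -> str:
    """Normalize a text input before comparison; reject empty or over-long results."""
    normalized = unicodedata.normalize("NFKC", value)
    # Drop control (Cc) and format (Cf) characters, e.g. zero-width spaces.
    cleaned = "".join(
        ch for ch in normalized if unicodedata.category(ch) not in ("Cc", "Cf")
    ).strip()
    if not cleaned or len(cleaned) > max_length:
        raise ValueError("input is empty or too long after canonicalization")
    return cleaned
```

Comparing canonical forms prevents look-alike inputs (full-width letters, hidden zero-width characters) from slipping past allowlist checks.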

Example (C# parameterized query, ADO.NET)

// Values are bound as parameters, so user input never becomes part of the SQL text.
// MySqlCommand comes from the MySQL ADO.NET provider; conn is an open connection.
using var cmd = new MySqlCommand(
    "SELECT * FROM Events WHERE TenantId = @tenant AND CreatedAt >= @from AND CreatedAt < @to",
    conn);
cmd.Parameters.AddWithValue("@tenant", tenantId);
cmd.Parameters.AddWithValue("@from", fromUtc);
cmd.Parameters.AddWithValue("@to", toUtc);

Example (output encoding in Razor)

<!-- Razor HTML encodes by default -->
<div class="event">@Model.Summary</div>

Dependency & Supply Chain Hygiene¶

SBOM generation per build (CycloneDX/SPDX) and diff alerts in CI.
SAST on every PR; DAST (baseline + authenticated) per release.
Container/IaC scanning (Dockerfiles, Helm/Bicep/Terraform) with policy gates.
Sigstore/cosign verification for images and signed OPA policy bundles.
Pinned versions; no latest. Lockfiles committed. Transitive deps monitored.
License compliance: allowlist; flag copyleft where redistribution applies.

Azure DevOps pipeline (sketch)

stages:
- stage: security_checks
  jobs:
  - job: sbom_sast_iac
    steps:
    - script: dotnet build /p:ContinuousIntegrationBuild=true
    - script: dotnet tool run cyclonedx   # SBOM
    - script: dotnet tool run securityscan # SAST (e.g., security code scan)
    - script: trivy fs --exit-code 1 .     # Dependency vulnerabilities + secrets scan
    - script: trivy config --exit-code 1 . # IaC misconfiguration scan
    - script: trivy image --exit-code 1 $(imageName) # Container scan
  - job: dast
    steps:
    - script: zap-baseline.py -t $(DEPLOYED_URL) -r zap.html
      condition: and(succeeded(), eq(variables['RunDAST'], 'true'))
- stage: sign_and_verify
  jobs:
  - job: cosign_verify
    steps:
    - script: cosign verify --key $(COSIGN_PUB) $(imageName)

Secrets Management¶

No secrets in code, images, or pipelines. Use Managed Identity → Key Vault/HSM.
Short-lived credentials only; rotate keys automatically; alert on static creds.
Secret discovery scans in CI; policy blocks merges if detected.
App config uses Key Vault references; services fetch at startup with retry/jitter.
Zero plaintext: TLS in transit, encrypted at rest; sensitive config redacted in logs.
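
The "fetch at startup with retry/jitter" behaviour can be sketched as below. It assumes exponential backoff with full jitter; `fetch_with_retry`, its parameters, and the injected `sleep` hook are hypothetical names for illustration, and the actual Key Vault client call is left to the caller.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def fetch_with_retry(
    fetch: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 0.2,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch() until it succeeds, sleeping a random (jittered) delay between tries."""
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception as exc:  # a real client would catch only transient errors
            last_error = exc
            if attempt == attempts - 1:
                break
            # Full jitter: pick uniformly between 0 and the capped exponential delay.
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, cap))
    raise RuntimeError("secret fetch failed after retries") from last_error
```

Jitter spreads out the retries of many service instances starting at once, so a brief vault outage does not turn into a synchronized retry storm.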

Code Review & Security ADRs¶

Two-person rule for security-affecting changes (auth, crypto, policy, network).
Security checklist on PRs:
- Input validation/encoding ✔
- Authorization at boundary (PEP-1/PEP-2) ✔
- Residency/tenancy attributes plumbed ✔
- Secrets removed; config via Key Vault ✔
- Logging PII-safe; correlation IDs present ✔
- Tests added: happy path, deny path, negative/abuse stories ✔
Security ADR required when changing: authN/Z flows, encryption, data movement, egress, policy engines.
- Template fields: Context, Decision, Alternatives, Risks, Rollout & Rollback, Evidence hooks.

CI/CD Security Gates¶

Fail fast on: unsigned image/policy bundle, high/critical CVE, secret leak, failing SAST/DAST, policy lint errors.
Policy-as-code enforcement (OPA) for: residency routes, network egress, image provenance, SBOM allowlist.
Canary with guards: enable new controls for a cohort; auto-rollback on guard regressions (deny spike, integrity fail).

Policy lint (Rego sketch)

package ci.guardrails

import future.keywords.in

deny[msg] { input.container.tag == "latest"; msg := "no 'latest' tags" }
# 'not' also fires when the signature field is missing, not only when it is false
deny[msg] { not input.image.signature.verified; msg := "unsigned image" }
deny[msg] { some cve in input.sbom.cvss; cve.score >= 7.0; msg := "high CVE present" }

Security Testing (what we automate)¶

Unit/contract tests for authZ decisions, tenancy/residency ABAC, classification redaction.
Integration tests with private endpoints and mTLS; ensure bearer-only tokens fail in mesh.
Abuse tests: injection payloads, path traversal, oversized body, replay with stale jti.
Regression packs for past incidents or near-misses.

Developer Experience & Guardrails¶

Secure templates and snippets (parameterized queries, encoding helpers, HTTP clients with mTLS).
Pre-commit hooks: secret scans, formatting, linting.
Local dev: seeded test identities, fake Key Vault, local OPA bundle; no real secrets.
Education: OWASP Top 10 refreshers, crypto dos/don’ts, “how to file a Security ADR”.

Evidence & Acceptance¶

Artifacts: SBOMs, scan reports, cosign attestations, policy bundle signatures, SAST/DAST dashboards.
Logs/metrics: security.scan.failed.count, image.unsigned.blocked.count, secrets.leak.detected.
Done when: PR passes all security gates, has security checklist ticked, and changes are recorded in a Security ADR if applicable.

Cross-References¶

Residency-aware access & data guards → data-residency-retention.md
Tenancy & ABAC → multitenancy-tenancy.md
Zero Trust & mesh hardening → zero-trust.md
Key rotation & escrow → key-rotation.md

Data Security Controls¶

ATP treats data security as a layered control: strong cryptography (at-rest & in-transit), classification & redaction at boundaries, immutability & integrity for evidence, and encrypted backups with secure restore. See data-residency-retention.md §5 and key-rotation.md for keying details; see tamper-evidence.md for integrity; see backups-restore-ediscovery.md for backup/restore.

Encryption (At-Rest)¶

Envelope encryption:
- DEK (per-tenant, per-artifact class): AES-256-GCM, rotated aggressively (e.g., daily or per-segment).
- KEK (per-region): HSM-anchored RSA-OAEP-256 or AES-KW; no cross-border unwrap.
- Sign-only keys (HSM) for anchors/manifests; separate from KEKs (SoD).
Key lineage in metadata:
- keyId, keyVersion, alg, iv, aadHash, createdAt, regionCode.
Policy (excerpt)

  keys:
    tenantScoped: true
    regionAnchored: true
    algorithms: { encrypt: AES-256-GCM, wrap: RSA-OAEP-256, sign: ECDSA-P256-SHA256 }
    rotation:
      dek: P1D
      kek: P90D
      overlapGrace: P7D
    escrow:
      jurisdiction: match(regionCode)
      dualControl: true

Stores: WORM segments (hot), immutable object storage (cold), indices (warm) all encrypted with tenant DEKs wrapped by regional KEKs.
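
A rough sketch of how the rotation periods in the policy excerpt might translate into a key-version lifecycle. The three states and the reading of `overlapGrace` (the old version stays usable for decrypt/unwrap only during the grace period) are assumptions; key-rotation.md is the authority on actual behaviour.

```python
from datetime import datetime, timedelta, timezone

# Values taken from the policy excerpt: dek P1D, kek P90D, overlapGrace P7D.
ROTATION = {"dek": timedelta(days=1), "kek": timedelta(days=90)}
OVERLAP_GRACE = timedelta(days=7)


def key_status(kind: str, created_at: datetime, now: datetime) -> str:
    """Classify a key version as active, decrypt_only (in grace), or retired."""
    age = now - created_at
    period = ROTATION[kind]
    if age < period:
        return "active"          # may encrypt/wrap and decrypt/unwrap
    if age < period + OVERLAP_GRACE:
        return "decrypt_only"    # rotation is due; old data can still be read
    return "retired"


created = datetime(2025, 10, 1, tzinfo=timezone.utc)
print(key_status("dek", created, created + timedelta(days=2)))
```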

Artifact encryption header (example)

{
  "tenantId":"7c1a-...",
  "category":"evidence.hot",
  "regionCode":"EU",
  "keyId":"k-eu-01",
  "keyVersion":"8",
  "alg":"AES-256-GCM",
  "iv":"b64:...",
  "aadHash":"sha256:2a9e...",
  "sealedAt":"2025-10-29T07:55:11Z"
}

Encryption (In-Transit)¶

mTLS everywhere in mesh; TLS ≥ 1.2 (prefer 1.3) at ingress; HSTS on public endpoints.
Sender-constrained tokens (mTLS/DPoP) for privileged scopes (export/purge/residency-admin).
SPKI pinning optional for operator consoles; cipher suites with forward secrecy only.
Private endpoints to data stores; no public data-plane listeners.

Classification & Redaction¶

Data classes: public, internal, restricted, secret.pi (PII), secret.phi (health), secret.keys (cryptographic material).
Classification sources: schema registry tags + runtime detectors for PII fields (email, phone, address, IDs).
Redaction templates applied at export and logs:
- Hash (irreversible, per-tenant salt/pepper): sha256(salt_tenant || value).
- Mask (keep last N chars): e.g., ****1234.
- Drop (remove field).
- Tokenize (vault-backed, reversible with approval).
Template (example)

redactTemplates:
  pii-default:
    email:    { mode: hash }
    phone:    { mode: hash }
    address:  { mode: drop }
    ssn:      { mode: mask, keepLast: 4 }
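
As an illustration of how such a template could be applied to a record, here is a minimal sketch covering the hash, mask, and drop modes. The `redact` function, the `sha256:` output prefix, and the pass-through of unlisted fields are assumptions; tokenize is omitted because it depends on a vault service.

```python
import hashlib

TEMPLATE = {
    "email": {"mode": "hash"},
    "phone": {"mode": "hash"},
    "address": {"mode": "drop"},
    "ssn": {"mode": "mask", "keepLast": 4},
}


def redact(record: dict, template: dict, tenant_salt: bytes) -> dict:
    """Return a copy of record with the template's redaction modes applied."""
    out = {}
    for field, value in record.items():
        rule = template.get(field)
        if rule is None:
            out[field] = value  # assumption: fields without a rule pass through
        elif rule["mode"] == "hash":
            # sha256(salt_tenant || value), as described in the list above
            digest = hashlib.sha256(tenant_salt + str(value).encode("utf-8"))
            out[field] = "sha256:" + digest.hexdigest()
        elif rule["mode"] == "mask":
            keep = rule.get("keepLast", 0)
            text = str(value)
            # Fixed-width mask so the original length is not revealed.
            out[field] = "****" + (text[-keep:] if keep > 0 else "")
        elif rule["mode"] == "drop":
            continue
        else:
            raise ValueError("unsupported redaction mode: " + rule["mode"])
    return out
```

A stricter variant would reject or drop any field that has no rule, which is closer to deny-by-default for sensitive exports.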

Policy-as-code guard (export) (Rego sketch)

package atp.export

import future.keywords.in

default allow = false

# deny if secret class without a declared purpose
deny[msg] {
  input.resource.class in {"secret.pi","secret.phi"}
  object.get(input, ["token", "purpose"], "") == ""
  msg := "export purpose missing for secret class"
}

# require redact template for PII unless DSAR
# (object.get treats a missing field the same as an empty one)
deny[msg] {
  input.resource.class == "secret.pi"
  input.token.purpose != "dsar_export"
  object.get(input, ["export", "redactTemplate"], "") == ""
  msg := "missing redact template for PII export"
}

allow {
  not deny[_]
}

Immutability & Integrity (Evidence)¶

WORM: append-only segments; admins cannot rewrite historical data.
Merkle chains per stream/segment; anchors signed by regional HSM keys; optional RFC 3161 TSA timestamping.
Verification:
- On-write: verify segment before seal.
- On-read sampling: verify Merkle path and anchor signature.
- Scheduled: rolling verification (100% coverage over the policy window).
Migration: bridge anchors with provenance when moving regions; old anchors preserved.
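
To make the "verify Merkle path" step concrete, here is a small sketch. The hashing scheme is an assumption (SHA-256, one-byte domain prefixes for leaves and nodes, last node duplicated on odd levels); ATP's real construction is defined in tamper-evidence.md and may differ. Anchor signature and TSA checks are out of scope here.

```python
import hashlib
import hmac


def leaf_hash(record: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + record).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def merkle_root(records: list) -> bytes:
    """Compute the root over a non-empty list of byte records."""
    if not records:
        raise ValueError("at least one record is required")
    level = [leaf_hash(r) for r in records]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # pad odd levels by duplicating the last node
        level = [node_hash(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def verify_path(record: bytes, path: list, expected_root: bytes) -> bool:
    """path is a list of (side, sibling_hash); side is 'L' or 'R' for the sibling's position."""
    current = leaf_hash(record)
    for side, sibling in path:
        current = node_hash(sibling, current) if side == "L" else node_hash(current, sibling)
    return hmac.compare_digest(current, expected_root)
```

On-read sampling would fetch the record, its stored path, and the anchored root, then call `verify_path` before trusting the record.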

Integrity manifest (excerpt)

anchor:
  id: "A_000145"
  stream: "aud.gateway"
  region: "westeurope"
  merkleRoot: "b64:R_n"
  signedBy: "hsm-eu-01"
  tsa: { token: "b64:..." }
  policyVersion: "3.3.0"

Backup Encryption & Secure Restore¶

Backups: region-local, immutable class, encrypted with tenant DEKs / region KEKs; manifests signed.
Restore: lands in quarantine (read-only); verify checksums, Merkle roots, anchor signatures, TSA receipts; only then promote.
Access: restore operations require scoped tokens; logs include tenantId, regionCode, policyVersion, keyId, keyVersion.

Backup policy (excerpt)

backup:
  regionLocal: true
  encrypt: { keyIdFrom: tenant_scope, hsm: true }
  sign: { cms: true, tsa: true }
restore:
  quarantine: { posture: read_only, ttl: P7D }
  verification: [ checksum, merkle, anchor, tsa ]

Evidence & Monitoring¶

Metrics: encryption.key.rotate.count, integrity.verify.fail.count, export.redacted.fields, backup.success.count, restore.verify.fail.count.
Logs: key ops with lineage (keyId, keyVersion), export redaction maps (no raw values), integrity verify outcomes.
Alerts: integrity failure, unsigned artifact detected, KEK rotation overdue, cross-border unwrap attempt.

Guardrails (quick checklist)¶

Never store or transmit plaintext secrets; use Managed Identity + Key Vault/HSM.
Encryption keys are tenant-scoped (DEK) and region-anchored (KEK); no cross-border unwrap.
Evidence is immutable; all integrity proofs are verifiable offline (anchors + TSA).
Exports of sensitive classes require purpose and redaction templates; DSAR is the exception pathway.
Backups are encrypted & signed; restores are quarantined and verified before promotion.

Least Privilege & Policy Enforcement¶

ATP enforces minimum necessary access with time-bound, purpose-bound grants. Authorization is expressed as policy-as-code (OPA/Rego) and applied at every boundary (gateway, services, storage, export). Every decision is captured in a meta-audit stream tied to policyVersion.

Principles¶

Deny by default; no ambient privileges.
Just-Enough Authorization (JEA): scopes and actions are as narrow as possible.
Just-In-Time (JIT) elevation: short TTL, automatic expiry, dual approval for sensitive ops.
Purpose binding: tokens must include purpose aligned with the requested operation.
Contextual ABAC: decisions consider tenantId, regionCode, dataSiloId, category, edition, and purpose.

JIT Elevation (privilege broker)¶

Operators request elevation with scope, resource, purpose, and TTL (≤ 4h).
Dual approval for admin/sensitive categories (purge, export, residency admin).
Broker issues a down-scoped token (break_glass=false for routine JIT; true only for emergency flows) with:
- entitlements subset, purpose, resource_scope (tenant/region/category), ttl, approval_id.
PEPs enforce TTL and scope; all calls emit elevation.used evidence and attach approval_id.

Privilege grant (example)

{
  "approval_id": "apr_01J6...",
  "requestor": "alice@ops",
  "approvers": ["sec.ops", "dpo"],
  "scope": ["retention.simulate"],
  "resource_scope": { "tenantId":"7c1a-...", "regionCode":"EU", "category":"evidence.hot" },
  "purpose": "ops_triage",
  "ttl": "PT1H"
}
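
A broker-side check of such a grant might look like the sketch below. The field names come from the example above; the sensitive scope prefixes, the rule that the requestor cannot count as an approver, and the function names are assumptions for illustration.

```python
import re
from datetime import timedelta

MAX_TTL = timedelta(hours=4)
# Assumption: a scope is sensitive when its first segment is one of these.
SENSITIVE_PREFIXES = {"purge", "export", "residency-admin"}


def parse_ttl(value: str) -> timedelta:
    """Parse the small ISO 8601 subset used here, e.g. PT1H, PT30M, PT1H30M."""
    match = re.fullmatch(r"PT(?:(\d+)H)?(?:(\d+)M)?", value)
    if not match or not any(match.groups()):
        raise ValueError("unsupported ttl: " + value)
    return timedelta(hours=int(match.group(1) or 0), minutes=int(match.group(2) or 0))


def validate_grant(grant: dict) -> list:
    """Return a list of violations; an empty list means the grant is acceptable."""
    errors = []
    if not grant.get("purpose"):
        errors.append("missing purpose")
    if parse_ttl(grant["ttl"]) > MAX_TTL:
        errors.append("ttl exceeds 4h")
    sensitive = any(s.split(".")[0] in SENSITIVE_PREFIXES for s in grant["scope"])
    approvers = set(grant.get("approvers", [])) - {grant.get("requestor")}
    if sensitive and len(approvers) < 2:
        errors.append("dual approval required")
    return errors
```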

Policy-as-Code (OPA/Rego)¶

Policies cover tenancy/residency, classification/redaction, quota/cost, and rate limits.
Bundles are signed, versioned, and hot-reloaded; decisions stamp policyVersion.

Rego (ABAC + purpose, sketch)

package atp.guard

default allow = false
deny[msg] { input.token.purpose == ""; msg := "missing purpose" }
deny[msg] { input.token.tenant_id != input.resource.tenantId; msg := "tenant mismatch" }
deny[msg] { input.token.region_code != input.resource.regionCode; msg := "cross-region blocked" }

allow {
  input.op == "read"
  input.token.scopes[_] == "evidence.read"
  not deny[_]
}

Rego (quota + egress guard, sketch)

package atp.quota

deny[msg] {
  input.op == "export"
  bytes := input.request.bytes
  bytes > data.quota[input.token.tenant_id].export_monthly_remaining
  msg := "export quota exceeded"
}

deny[msg] {
  input.op == "export"
  input.route != data.residency[input.token.residency_profile].allowed_route
  msg := "export route not allowed"
}

Runtime Enforcement (PEP-1 / PEP-2)¶

PEP-1 (Gateway): coarse checks (authn, rate/WAF, tenancy/residency route, basic quota). Adds signed X-Policy-* headers, X-Correlation-Id.
PEP-2 (Services): fine-grained ABAC on resource attributes (tenant/region/category), classification/redaction, export routes, cost/egress checks.
Data boundary: verify attributes again before any read/write; WORM backends enforce immutability irrespective of caller.

Decision Logging (meta-audit stream)¶

Every guard produces a structured decision:

{
  "ts":"2025-10-29T08:45:15Z",
  "policyVersion":"3.3.0",
  "decision":"deny|allow|quarantine",
  "reason":"cross_region_blocked|quota_exceeded|ok",
  "tenantId":"7c1a-...",
  "regionCode":"EU",
  "category":"evidence.hot",
  "op":"read|write|export|purge",
  "purpose":"dsar_export",
  "correlationId":"6b3f-...",
  "approval_id":"apr_01J6...",       // when JIT/Break-glass applies
  "latencyMs":7
}

No raw PII; sensitive fields are hashed/redacted per log policy.
Decisions are immutable (append-only sink) and feed SIEM and compliance evidence packs.

Operational Flow (example)¶

Client calls Gateway with JWT (purpose, tenant, region).
PEP-1 evaluates OPA; on allow → forwards with signed context.
Service re-evaluates PEP-2 against resource; if export, checks redaction and quota/route.
Decision is logged; on deny, returns 403 GuardViolation with a reason code.

Metrics & Alerts¶

Metrics: abac.allow.count, abac.deny.count, quota.violation.count, elevation.used.count.
Alerts:
- Spike in deny for tenant_mismatch (> +5 pp over baseline).
- Quota exceed events without prior cost estimate.
- Elevation used near expiry without closure (post-mortem reminder).

CI/CD Integration¶

Policy lint and bundle signature checks as pipeline gates.
Contract tests for deny paths (cross-tenant, cross-region, missing purpose).
Canary policy rollout with automatic rollback on deny/allow drift.

Guardrails (quick checklist)¶

Access is deny-by-default; tokens must include purpose and residency claims.
Elevations are JIT, scoped, TTL-bound, and dual-approved when sensitive.
Policies are signed & versioned; PEPs stamp policyVersion and log all decisions.
Enforcement occurs at gateway, service, and data boundaries; no single gate is trusted alone.
Meta-audit stream contains decision-grade evidence with no raw PII; feeds SIEM and compliance packs.

Security Monitoring & Detection¶

ATP emits decision-grade telemetry and correlates it in a SIEM-first pipeline. Signals come from the gateway (PEP-1), services (PEP-2), policy engine, KMS/HSM, WAF, mesh, and egress gateway. All events carry correlation IDs, classification tags, and policyVersion to enable high-precision detection. See also observability.md and alerts-slos.md.

Telemetry Model (event schema, excerpt)¶

{
  "ts":"2025-10-29T08:59:11Z",
  "tenantId":"7c1a-...",
  "regionCode":"EU",
  "service":"exporter",
  "boundary":"PEP-2",
  "category":"security.guard",         // guard|authn|authz|integrity|keyops|egress|waf
  "event":"abac.decision_denied",
  "decision":"deny",
  "reason":"cross_region_blocked",
  "policyVersion":"3.3.0",
  "correlationId":"c-6b3f...",
  "subject":"svc-query",
  "purpose":"dsar_export",
  "route":"EU->US",
  "class":["sec","privacy"],           // classification tags
  "severity":"medium",
  "labels":{"edition":"enterprise","dataSiloId":"silo-7c1a"},
  "network":{"srcIp":"hash:...","ua":"hash:..."}
}

PII-safe: sensitive fields hashed/redacted per log policy.
Dimensions: tenantId, regionCode, category, boundary, policyVersion, purpose, decision.
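
A shipper-side sanity check for these events could be sketched as follows. The required-field list mirrors the schema excerpt; the `hash:` prefix convention for network fields is inferred from the example, and `validate_event` is a hypothetical helper.

```python
REQUIRED_FIELDS = ("ts", "tenantId", "regionCode", "policyVersion", "correlationId")
HASHED_NETWORK_FIELDS = ("srcIp", "ua")


def validate_event(event: dict) -> list:
    """Return problems found in a telemetry event; empty means it can be shipped."""
    problems = ["missing " + key for key in REQUIRED_FIELDS if not event.get(key)]
    network = event.get("network", {})
    for key in HASHED_NETWORK_FIELDS:
        value = network.get(key)
        # PII-safe rule: network identifiers must arrive already hashed.
        if value is not None and not str(value).startswith("hash:"):
            problems.append("network." + key + " is not hashed")
    return problems
```

Rejecting non-conforming events at the edge keeps raw IPs and user agents out of the SIEM and guarantees that every stored event can be joined on `correlationId`.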

SIEM Integration (sources & flows)¶

Sources: Gateway/WAF, PEP-2 services, OPA/Policy bundles, KMS/HSM key ops, mesh authz, egress gateway, cloud control-plane.
Transport: reliable shipper (structured JSON) to in-region sinks (residency-aligned).
Normalization: taxonomy to ECS/OCSF-like fields; enrich with tenant profile, residency profile, and asset inventory.
Retention: ≥ 180 days online, ≥ 365 days archive (write-once), aligned to data-residency-retention.md.

Detections (rules & analytics)¶

Guard violations (ABAC)

guards
| where event == "abac.decision_denied"
| summarize cnt = count() by reason, tenantId, regionCode, bin(ts, 15m)
| join kind=inner (
  baseline_abac_denies
  | where reason == "cross_region_blocked"
  | project reason, tenantId, regionCode, p95 = p95_denies_15m
) on reason, tenantId, regionCode
| where cnt > p95 * 1.5

Failed auth spikes

auth
| where result == "fail"
| summarize fails=count() by tenantId, bin(ts, 5m)
| join kind=leftouter (auth | where result=="success" | summarize succ=count() by tenantId, bin(ts,5m)) on tenantId, ts
| extend ratio = todecimal(fails) / (coalesce(succ, 0) + 1)
| where fails > 50 and ratio > 3.0

Egress anomaly (possible exfil)

egress
| summarize bytes=sum(bytes) by tenantId, route, bin(ts, 1h)
| join kind=leftouter (egress_baseline) on tenantId, route
| where bytes > baseline_bytes * 2 and route !in ("in_region","same_code")

Key operations anomaly

keyops
| where op in ("unwrap","sign") and result=="success"
| summarize cnt=count() by keyId, bin(ts, 10m)
| where cnt > 3 * avg_cnt_7d   // computed via windowed baseline job

Sigma (break-glass outside TTL)

title: Break-glass Used Outside Approved TTL
id: 56d8a1b0-1f4b-4af8-9d5a-ttl
status: experimental
logsource: { product: atp, service: guards }
detection:
  sel_event:
    event: "break_glass.used"
  sel_ttl:
    ttl_remaining_seconds|lte: 0
  condition: sel_event and sel_ttl
level: high
fields: [ tenantId, regionCode, approval_id, subject, purpose ]

Anomaly Detection & ML¶

Seasonal baselines for: ABAC denies, export bytes per route, key ops/min, WAF blocks/IP.
Entity analytics: per tenant/identity impossible travel (region drift), purpose drift (new purposes for identity), and role drift (entitlements change spikes).
Correlation: chain failed auth → ABAC denies → egress attempt within correlationId window raises severity.
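
The correlation rule can be sketched as an ordered-sequence match per `correlationId`. Only `abac.decision_denied` is an event name from this document; `auth.failed`, `egress.attempt`, the 15-minute window, and the function name are placeholders.

```python
from datetime import datetime, timedelta

CHAIN = ("auth.failed", "abac.decision_denied", "egress.attempt")


def chained_correlations(events: list, window: timedelta = timedelta(minutes=15)) -> list:
    """Return correlationIds whose events contain the full chain, in order, within window."""
    by_corr = {}
    for event in sorted(events, key=lambda e: e["ts"]):
        by_corr.setdefault(event["correlationId"], []).append(event)

    flagged = []
    for corr_id, chain_events in by_corr.items():
        step, started = 0, None
        for event in chain_events:
            if started is not None and event["ts"] - started > window:
                step, started = 0, None  # window expired, start over
            if event["event"] == CHAIN[step]:
                if step == 0:
                    started = event["ts"]
                step += 1
                if step == len(CHAIN):
                    flagged.append(corr_id)  # candidate for a severity bump
                    break
    return flagged
```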

Alerting & Response¶

Severities: Critical (key compromise, integrity violation), High (cross-region export allowed under abnormal volume), Medium (deny spikes), Low (noise indicators).
Routing: SecOps on-call (PagerDuty/Teams), SRE for availability-coupled signals, DPO/Legal for privacy signals.
Alert payload includes: SIEM query link, recent decision logs, policyVersion, runbook link, and recommended next actions.

Alert policy (sketch)

alerts:
  - name: integrity_violation
    if: 'integrity.verify.fail.count > 0'
    severity: critical
    route: secops
    runbook: 'hardening/tamper-evidence.md#incident'
  - name: cross_region_export_spike
    if: 'egress.bytes.above_baseline and route not in ["in_region","same_code"]'
    severity: high
    route: secops+dpo
    runbook: 'platform/data-residency-retention.md#observability--compliance-evidence'
  - name: abac_deny_surge
    if: 'abac.deny.count.delta_pp > 5'
    severity: medium
    route: sre
    runbook: 'operations/alerts-slos.md#abac-deny-surge'

Dashboards (security view)¶

Guard Posture: allow/deny trend, reasons heatmap, top tenants/regions by denies.
Auth Posture: fail/success ratio, MFA prompts, OBO usage, token anomalies (stale kid, missing DPoP).
Egress & Residency: bytes by route, cross-region attempts (blocked/allowed), TSA latency tiles.
Key Ops & Integrity: rotations, unwrap/sign counts, verification pass/fail heatmap.

Tuning & Suppression¶

Context-aware suppression for approved pentests, DR drills, load tests (tagged purpose).
Auto-closure: alerts close automatically once the post-incident remediation ADR is merged and no recurrence is seen for 30 days.
Feedback loop: each false positive requires a rule note and either scope refinement or baseline adjustment.

KPIs & SLO Hooks¶

MTTD/MTTR per severity band.
Precision/Recall for top 5 rules (quarterly review).
Coverage: % of guard decisions correlated in SIEM within 60s.
Freshness: policy bundle signature lag < 5 min.

Evidence & Compliance¶

SIEM reports feed monthly Security Evidence Packs: detection stats, incident timelines, remediation status, and rule version diffs.
All alerts/events are region-local and included in auditor-ready exports with signatures.

Guardrails (quick checklist)¶

All security events carry correlationId, tenantId, regionCode, and policyVersion.
Detection rules honor residency and never require raw PII (use hashes/tokens).
Alerts must link to a runbook and policy references; paging only on actionable signals.
Baselines and rules are versioned; changes recorded in a Security ADR.

Incident Response & Forensics¶

ATP treats incidents as time-critical, evidence-driven operations. We detect early, contain fast, investigate with immutable artifacts, and remediate with policy changes captured in Security ADRs. Every step produces auditor-ready evidence.

Detection (how we know)¶

Signals: alerts from SIEM rules (guard violations, key ops anomalies), integrity verification failures, WAF spikes, egress anomalies, IdP risk events.
Sources: Gateway/PEP-1, Services/PEP-2, OPA decisions, KMS/HSM logs, mesh authz, egress gateway, cloud control plane.
Auto-triage: incidents are auto-classified with severity and routed to on-call; correlation uses correlationId, tenantId, regionCode, policyVersion.

Severity & targets (example)

Sev	Description	Targets	Initial comms
S0	Active data breach, key compromise, cross-tenant exfiltration	15 min containment	30 min internal, 72h regulator (jurisdictional)
S1	Integrity violation, residency breach prevented, mass auth spike	60 min containment	2h internal
S2	Misconfig, partial outage, false-positive run	4 h containment	By EOD

Containment (stop the bleeding)¶

Tenant isolation: dynamic OPA override → deny for {tenantId: X} on read/export/purge.
Kill switches at the Gateway:
- Block cross-region routes, disable export categories, throttle ingestion for hot shard.
Token revocation:
- Revoke refresh tokens for affected identities; rotate API keys; enforce re-auth with MFA.
Key containment:
- Suspend KEK unwrap/sign for suspected keyId; rotate DEKs for impacted tenants; escrow unaffected.
Network choke points:
- Egress allowlist tightened; specific FQDN/CIDR blocks; WAF rule elevation; mesh policy set to deny-by-default for suspicious identities.

Containment policy (sketch)

containment:
  tenantBlock:
    tenantId: "7c1a-..."
    ops: [ "read", "export", "purge" ]
    reason: "cross-tenant attempt"
    ttl: "PT4H"
  gateway:
    exportRoutes: { allow: [ "in_region" ], deny: [ "same_code", "global" ] }
  keys:
    suspend:
      - keyId: "hsm-eu-01"
        ops: [ "unwrap" ]
        scope: [ "tenant:7c1a-..." ]
        ttl: "PT2H"

Investigation (prove what happened)¶

Golden rule: preserve evidence; no live-fixing in the blast zone without snapshot.
Immutable audit trail:
- Decision logs (ABAC, exports), integrity manifests, anchor/TSA receipts, KMS/HSM key ops, gateway WAF logs.
Chain-of-custody:
- Hash and sign collected artifacts; timestamp (RFC 3161 optional); store in write-once evidence vault; record access with dual-control.

Evidence bundle (manifest)

forensicsBundle:
  id: "ir_2025-10-29_7c1a"
  severity: S1
  scope: { tenantId: "7c1a-...", region: "westeurope", from: "2025-10-29T08:00Z", to: "2025-10-29T10:00Z" }
  artifacts:
    - guards.log.ndjson.sig
    - waf.sample.json.gz.sig
    - keyops.log.ndjson.sig
    - integrity.verify.report.json.sig
    - export.manifests/*.sig
  hash: "sha256:..."
  tsa: "b64:..."
  reviewers: ["forensics.lead","ic"]

Timeline reconstruction:
- Join on correlationId and time bins; verify sequence of allow/deny, key ops, egress.
Integrity validation:
- Re-verify affected segments/anchors; quarantine anything with mismatched Merkle paths.
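
The "hash collected artifacts" part of chain-of-custody can be illustrated with a small manifest builder. Signing and RFC 3161 timestamping are omitted because they depend on the HSM and TSA; the manifest layout and function names are assumptions rather than ATP's bundle format.

```python
import hashlib
import json
from typing import Dict


def _sha256(data: bytes) -> str:
    return "sha256:" + hashlib.sha256(data).hexdigest()


def build_manifest(bundle_id: str, artifacts: Dict[str, bytes]) -> dict:
    """Hash each artifact, then hash the sorted entry list to get one bundle hash."""
    entries = [
        {"name": name, "hash": _sha256(data)} for name, data in sorted(artifacts.items())
    ]
    canonical = json.dumps(entries, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return {"id": bundle_id, "artifacts": entries, "hash": _sha256(canonical)}


def verify_manifest(manifest: dict, artifacts: Dict[str, bytes]) -> bool:
    """Recompute the manifest from the artifacts and compare it with the stored one."""
    return build_manifest(manifest["id"], artifacts) == manifest
```

Because the bundle hash covers every artifact hash, signing that single value is enough to detect a later change to any collected file.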

Remediation (return to safe & prevent recurrence)¶

Compensating controls:
- Tighten policies (e.g., export route to in_region only), raise WAF sensitivity, reduce token TTLs, enforce DPoP for privileged routes.
Rotation & cleanup:
- Rotate KEKs/DEKs as needed; re-issue tokens/keys; deprecate vulnerable routes.
Fix & validate:
- Patch/rollback services or policy bundles; run conformance suites (residency, authz, integrity) and post-fix canary.
Security ADR:
- Record root cause, alternatives, chosen change, rollout/rollback plan, evidence queries.

Roles & RACI¶

Role	Responsibilities
Incident Commander (IC)	Owns timeline, decisions, severity, comms; delegates tasks
Forensics Lead	Evidence capture, chain-of-custody, timeline, findings
Ops/SRE Lead	Containment (network/mesh/gateway), DR posture changes
Security Engineering	Policy/OPA, key ops coordination with KMS/HSM
Comms Lead	Stakeholder updates, regulator/customer notices with Legal/DPO
Scribe	Real-time log of actions/decisions, artifact index

Communications & Notifications¶

Internal: war room channel with audited bot; updates at fixed intervals per severity.
External: regulator/customer notices coordinated with Legal/DPO; content must include scope, data categories, controls in place, and remediation.
Residency-aware: notifications follow regional requirements (see privacy-gdpr-hipaa-soc2.md).

Playbooks (common scenarios)¶

Cross-tenant access attempt (blocked)
Detect abac.decision_denied{reason="tenant_mismatch"} surge → isolate tenant route → verify logs → confirm no allow → close with tuning if necessary.
Tampering / integrity violation
Detect integrity.violation_detected → quarantine stream → re-verify anchors → check HSM key lineage → rebuild read models from last good segment → publish incident report.
Data breach / exfil suspicion
Egress anomaly to non-allowed route → kill cross-region exports → rotate credentials → capture packet samples → notify per jurisdiction if confirmed.
Key compromise indicators
Abnormal unwrap/sign rates → suspend key ops → rotate KEKs/DEKs → re-seal anchors with new signer → attest rotations.
Supply-chain artifact tampering
CI gate or admission controller denial → freeze deploys → verify cosign/SBOM diffs → roll back images/policies → update allowlists.

Flow (generic incident)

sequenceDiagram
  autonumber
  participant DET as Detection (SIEM)
  participant IC as Incident Commander
  participant CT as Containment
  participant FR as Forensics
  participant RM as Remediation
  DET-->>IC: Alert (severity, signals, links)
  IC->>CT: Activate playbook (isolate tenant/export route)
  CT-->>IC: Contained, switches active
  IC->>FR: Capture evidence + timeline
  FR-->>IC: Findings (scope, root cause, impact)
  IC->>RM: Implement fix (policy/service/keys)
  RM-->>IC: Conformance green, canary pass
  IC-->>All: Close + ADR + evidence pack


Runbooks & Automation Hooks¶

Buttons (secured runbooks):
- Isolate Tenant → writes OPA override; TTL-bound.
- Disable Export Route (same_code, global) globally or per-tenant.
- Revoke Tokens for subject/tenant.
- Suspend Key Ops for keyId/tenant scope.
Backout: timeboxed; all switches auto-expire unless renewed with approval.

KPIs & Post-incident¶

KPIs: MTTD, MTTR, % incidents with complete evidence bundle, % closed with ADR, repeat rate within 30/90 days.
Post-incident review: within 5 business days; track actions in backlog with owners and due dates; link to evidence and ADR.

Acceptance (done when)¶

Containment active within target window; evidence bundle complete and signed.
Root cause, impact, and scope confirmed; regulators/customers notified if required.
Fix deployed and verified via conformance suites; Security ADR merged.
Kill switches rolled back or codified as permanent policy changes.
Lessons learned captured; detections tuned; playbook updated.

Cross-references observability.md · alerts-slos.md · tamper-evidence.md · data-residency-retention.md · key-rotation.md · privacy-gdpr-hipaa-soc2.md

Vulnerability & Patch Management¶

We maintain a continuous posture for discovering, triaging, and remediating vulnerabilities across code, dependencies, images, OS, IaC, and policies. Patching favors zero-downtime, canary rollouts, and automatic rollback with auditable evidence.

Coverage & Scanners¶

Dependencies: NuGet/NPM/containers — SBOM generated each build; diffed on PRs.
Container images: base + app layers; pinned by digest, rebuilt nightly.
OS & runtimes: distro/APK/apt packages; language runtimes (dotnet/node).
IaC: Dockerfiles, Helm, Bicep/Terraform — misconfig & secret scans.
Policies: OPA bundles signed; verify signature and policy lints on PR/CI.

CVE Intake & Prioritization¶

Feeds: CVE/NVD, vendor advisories, CISA KEV, ecosystem advisories.
Exploit signals: EPSS percentile, KEV listing, public PoC.
Asset criticality: internet-exposed? PII/PHI adjacent? tenancy boundary?

Risk score (sketch)

Risk = max(CVSS, SevMap) × (1 + EPSSfactor + ExploitBonus) × AssetCriticality

EPSSfactor: 0.25 if EPSS ≥ 0.7; 0.1 if 0.5 ≤ EPSS < 0.7; otherwise 0.
ExploitBonus: +0.25 if KEV or public PoC.
AssetCriticality: 1.5 (internet-exposed), 1.25 (boundary), 1.0 otherwise.
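
A worked version of the risk score, as a sketch. `sev_map` is assumed to be a numeric mapping of a vendor severity label (the source does not define it), and the `exposure` labels are hypothetical.

```python
def epss_factor(epss: float) -> float:
    if epss >= 0.7:
        return 0.25
    if epss >= 0.5:
        return 0.1
    return 0.0


def risk_score(cvss: float, sev_map: float, epss: float,
               kev: bool, public_poc: bool, exposure: str) -> float:
    """Risk = max(CVSS, SevMap) x (1 + EPSSfactor + ExploitBonus) x AssetCriticality."""
    exploit_bonus = 0.25 if (kev or public_poc) else 0.0
    criticality = {"internet": 1.5, "boundary": 1.25}.get(exposure, 1.0)
    return max(cvss, sev_map) * (1 + epss_factor(epss) + exploit_bonus) * criticality


# CVSS 8.1, EPSS 0.72, KEV-listed, internet-exposed: 8.1 x 1.5 x 1.5
print(round(risk_score(8.1, 7.0, 0.72, True, False, "internet"), 3))
```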

Remediation SLAs (default)¶

Class (example)	Trigger	SLA to Mitigate	Mitigation options
Critical	KEV or EPSS ≥ 0.7 or CVSS ≥ 9	48h	Patch/upgrade, config block, feature flag off, WAF rule
High	CVSS 7–8.9	7d	Patch/upgrade; config hardening
Medium	CVSS 4–6.9	30d	Patch in next scheduled window
Low	CVSS <4	90d	Batch with routine updates

Exceptions require Security Ops + Product approval, an expiry, and a compensating control; tracked in the exception registry.
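
The SLA table can be expressed as a small classifier that also yields the mitigation deadline. The thresholds come from the table; the function names are illustrative.

```python
from datetime import datetime, timedelta


def classify(cvss: float, epss: float, kev: bool):
    """Map a finding to (class, SLA) following the default remediation table."""
    if kev or epss >= 0.7 or cvss >= 9.0:
        return "Critical", timedelta(hours=48)
    if cvss >= 7.0:
        return "High", timedelta(days=7)
    if cvss >= 4.0:
        return "Medium", timedelta(days=30)
    return "Low", timedelta(days=90)


def mitigation_due(detected_at: datetime, cvss: float, epss: float, kev: bool):
    """Return (class, deadline) for a finding detected at detected_at."""
    severity, sla = classify(cvss, epss, kev)
    return severity, detected_at + sla
```

Note that a KEV listing or a high EPSS promotes a finding to Critical even when its CVSS score alone would place it lower.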

Pipeline Gates (CI/CD)¶

Build stage: SBOM, dependency scan, IaC scan, container scan; fail on High/Critical unless approved exception.
Sign/attest: cosign signatures for images; signed OPA bundles.
Deploy stage: policy guard — rejects unsigned/unknown digest; canary guarded by SLOs & security metrics.

stages:
- stage: vuln_scans
  jobs:
  - job: sbom_and_scans
    steps:
    - script: dotnet build /p:ContinuousIntegrationBuild=true
    - script: cyclonedx create --out sbom.json
    - script: trivy fs --exit-code 1 .
    - script: trivy image --exit-code 1 $(IMAGE)
    - script: checkov -d infra/   # IaC
- stage: sign_and_verify
  jobs:
  - job: sign
    steps:
    - script: cosign sign --key $(COSIGN_KEY) $(IMAGE_DIGEST)
    - script: opa build -o policy.bundle.tar.gz policies/ && cosign sign-blob --yes --key $(COSIGN_KEY) --output-signature policy.bundle.tar.gz.sig policy.bundle.tar.gz
- stage: deploy_canary
  condition: succeeded()
  jobs:
  - deployment: canary
    strategy:
      runOnce:
        deploy:
          steps:
          - script: helm upgrade ... --set image.digest=$(IMAGE_DIGEST) --set canary=1
          - script: ./gates.sh security --denyDrift=5 --abacDenyDelta=5

Patch Windows & Rollouts¶

Zero-downtime strategy: rolling updates; workloads replicated; backward-compatible migrations.
Progressive: 1% → 10% → 50% → 100% with automated health & security gates.
Automatic rollback triggers:
- Guard deny spike > +5 pp vs baseline,
- Integrity verify failures > 0.1% sample,
- Error budget burn rate > 2× normal.

Policy (excerpt)

patch:
  strategy: progressive
  steps: [0.01, 0.1, 0.5, 1.0]
  securityGates:
    abacDenyDeltaPpMax: 5
    integrityFailRateMax: 0.001
    unsignedArtifact: block
  rollback:
    auto: true
    reasonCapture: required
maintenanceWindows:
  preferred:
    - fri_22_02_local
  blackout:
    - last_week_of_qtr
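
How a rollout controller might evaluate these gates between steps is sketched below. The metric names in `metrics` are hypothetical inputs, and the error-budget trigger is left out because it does not appear in the policy excerpt.

```python
POLICY = {
    "steps": [0.01, 0.1, 0.5, 1.0],
    "securityGates": {
        "abacDenyDeltaPpMax": 5,
        "integrityFailRateMax": 0.001,
        "unsignedArtifact": "block",
    },
}


def rollback_reasons(metrics: dict, gates: dict) -> list:
    """Return the names of the security gates that the current metrics violate."""
    reasons = []
    if metrics["abacDenyDeltaPp"] > gates["abacDenyDeltaPpMax"]:
        reasons.append("abac_deny_spike")
    if metrics["integrityFailRate"] > gates["integrityFailRateMax"]:
        reasons.append("integrity_failures")
    if gates["unsignedArtifact"] == "block" and metrics["unsignedArtifacts"] > 0:
        reasons.append("unsigned_artifact")
    return reasons


def next_action(current_step: float, metrics: dict, policy: dict = POLICY):
    """Decide whether to roll back, promote to the next step, or finish."""
    reasons = rollback_reasons(metrics, policy["securityGates"])
    if reasons:
        return ("rollback", reasons)  # reasons double as the required reasonCapture
    later = [step for step in policy["steps"] if step > current_step]
    return ("promote", later[0]) if later else ("complete", None)
```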

Coordination (ConnectSoft Platform Security)¶

Shared base images and common libraries patched centrally; ATP inherits via digest pin bumps.
Weekly Patch Board: review new CVEs, SLA status, exceptions expiring.
Change comms: security bulletin to affected teams; link to Security ADR and rollout plan.

Exception & Risk Acceptance¶

exception:
  id: "EXC-2025-014"
  cve: "CVE-2025-12345"
  reason: "Upstream fix pending; WAF rule blocks vector"
  scope: { service: "exporter", region: "westeurope" }
  compensating: [ "WAF-Rule-1123", "rate-limit-tighten" ]
  owner: "Platform Security"
  expiresAt: "2025-11-15T00:00:00Z"
  reviewEvery: "P7D"

Exceptions auto-page owners 7 days before expiry; cannot be extended without dual approval.
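
The expiry rule can be sketched as a periodic check over the exception registry. The state names and the `exception_state` helper are assumptions; the 7-day lead time comes from the sentence above.

```python
from datetime import datetime, timedelta, timezone


def exception_state(expires_at: str, now: datetime,
                    page_lead: timedelta = timedelta(days=7)) -> str:
    """Return 'expired', 'page_owner' (inside the lead window), or 'active'."""
    # Normalize a trailing 'Z' so datetime.fromisoformat accepts the timestamp.
    expiry = datetime.fromisoformat(expires_at.replace("Z", "+00:00"))
    if now >= expiry:
        return "expired"
    if now >= expiry - page_lead:
        return "page_owner"
    return "active"


now = datetime(2025, 11, 10, tzinfo=timezone.utc)
print(exception_state("2025-11-15T00:00:00Z", now))
```

An expired exception should cause the corresponding CI/CD gate to block again until a new, dual-approved exception is recorded.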

Evidence & Reporting¶

Artifacts: SBOMs, scan reports, cosign attestations, digest pins, deployment manifests, rollback records.
Dashboards: open vulns by severity/age, MTTR, SLA compliance %, exposure days, top offending packages.
Logs: vuln.scan.failed, image.unsigned.blocked, policy.bundle.unsigned.blocked, rollback.triggered.

For Hotfixes (out-of-band)¶

Use hotfix lane with same gates; scope to affected services/regions; create post-incident Security ADR.

Mermaid Flow (triage → rollout)¶

flowchart TD
  A[CVE Intake] --> B[Triage & Score]
  B -->|Critical/High| C[Create Fix PR]
  C --> D[Scans + SBOM + Sign]
  D --> E[Canary Rollout]
  E -->|Gates Pass| F[Full Rollout]
  E -->|Gate Fails| R[Auto Rollback]
  F --> G[Close & Evidence Pack]
  R --> C


Guardrails (quick checklist)¶

Images & policy bundles are signed; digests pinned; no latest tags.
High/Critical vulns block CI/CD unless an exception with expiry and compensating controls exists.
Rollouts are progressive with security gates; automatic rollback is configured and tested.
Exceptions live in a registry, are time-boxed, and reviewed weekly; alerts fire before expiry.
Evidence (SBOMs, scans, signatures, rollout states) is retained and included in monthly security packs.

Cross-references: Application Security & Secure SDLC · Security Monitoring & Detection · Incident Response & Forensics · zero-trust.md

Compliance Attestation Strategy¶

ATP proves compliance continuously, not just at audit time. We maintain a live mapping to frameworks (GDPR, HIPAA, SOC 2, ISO 27001), curate signed evidence, and support time-bound, read-only auditor workflows — all backed by dashboards that reflect real-time control posture. See privacy-gdpr-hipaa-soc2.md for per-framework details.

Scope & Mapping (where each proof comes from)¶

Framework focus	Primary controls (examples)	Live evidence sources	Cross-refs
GDPR: residency, storage limitation, rights	Residency profiles, retention policy-as-code, DSAR pipeline, redaction	Residency decisions, purge ledgers, DSAR manifests, export route logs	data-residency-retention.md
HIPAA: safeguards, integrity, access	ABAC, immutability + anchors, key ops auditing, restore quarantine	ABAC decision logs, integrity verify reports, KMS/HSM logs, restore manifests	tamper-evidence.md
SOC 2: Security/Availability/PI/Conf/Privacy	Zero Trust, DR drills, logging/monitoring, change control	WAF/mesh/egress logs, DR drill reports, SIEM detections, ADRs	observability.md
ISO 27001: Annex A controls	Risk management, supplier, crypto, deletion	Risk register, vendor assessments, crypto key lineage, deletion evidence	key-rotation.md

The authoritative mapping lives in the Control Registry, where each control lists its framework tags and evidence queries.

Evidence Collection (how we curate proofs)¶

Artifacts are region-local, append-only, and signed:
- Policy bundles + signatures (policyVersion, bundle digest)
- ABAC decisions (allow/deny with reason codes)
- Residency decisions, purge ledgers, DSAR/export manifests
- Integrity proofs (Merkle roots, anchor signatures, TSA receipts)
- KMS/HSM key operations (wrap/unwrap/sign) with key lineage
- DR/restore drill reports, change records (ADRs), CI/CD gate outcomes
Packaging:
- Monthly Evidence Pack (ops + security): guard stats, scans, rotations, drills
- Quarterly Attestation Pack (framework-aligned): crosswalk + curated artifacts
- Stored in write-once class, CMS-signed; optional RFC 3161 timestamps
Retention: online ≥ 180 days; archive for a minimum of 12 to 24 months, depending on framework & contract

Evidence pack manifest (sketch)

attestationPack:
  id: "att-2025Q4-eu"
  scope: { regionCode: "EU", tenants: ["*"] }
  frameworks: ["GDPR","SOC2"]
  policyVersion: "3.3.0"
  artifacts:
    - abac.decisions.2025Q4.ndjson.sig
    - residency.decisions.2025Q4.ndjson.sig
    - dsar.manifests.2025Q4/*.sig
    - retention.purge.ledger.2025Q4.csv.sig
    - integrity.verify.report.2025Q4.json.sig
    - kms.keyops.2025Q4.ndjson.sig
    - dr.drill.2025-11-12.report.pdf.sig
  tsa: "b64:..."

Auditor Workflows (read-only, time-bound, watermarked)¶

Access model
- Read-only auditor role (no break-glass, no export without watermark)
- Time-boxed (e.g., 14 days), region-scoped, tenant-scoped
- Purpose-bound tokens (purpose: "audit_attestation")
Watermarked exports
- All auditor exports are in-region, redacted by default, watermarked (auditor ID, request ID, timestamp)
- Manifests are signed and logged under audit.export.*
Chain-of-custody
- Each data pull is hashed, signed, timestamped; access logged with correlation IDs
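
As a rough illustration of the chain-of-custody step, the sketch below hashes a data pull, attaches auditor metadata and a timestamp, and tags the record. It rests on loud assumptions: an HMAC-SHA256 tag stands in for the CMS signature and RFC 3161 timestamp used in practice, and the field names (`auditorId`, `requestId`) and function names are hypothetical.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone


def custody_record(payload: bytes, auditor_id: str, request_id: str,
                   key: bytes, now: datetime) -> dict:
    """Build a custody record: SHA-256 of the pull, metadata, and an HMAC tag."""
    record = {
        "sha256": hashlib.sha256(payload).hexdigest(),
        "auditorId": auditor_id,
        "requestId": request_id,
        "timestamp": now.astimezone(timezone.utc).isoformat(),
    }
    # Canonical JSON (sorted keys, no whitespace) keeps the tag reproducible.
    body = json.dumps(record, sort_keys=True, separators=(",", ":")).encode()
    record["hmac"] = hmac.new(key, body, hashlib.sha256).hexdigest()
    return record


def verify_record(record: dict, payload: bytes, key: bytes) -> bool:
    """Recompute digest and tag; compare the tag in constant time."""
    unsigned = {k: v for k, v in record.items() if k != "hmac"}
    if unsigned.get("sha256") != hashlib.sha256(payload).hexdigest():
        return False
    body = json.dumps(unsigned, sort_keys=True, separators=(",", ":")).encode()
    expected = hmac.new(key, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record.get("hmac", ""))
```

Verification fails if either the exported bytes or any metadata field changes after the record was created, which is the property the custody log relies on.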

sequenceDiagram
  autonumber
  participant A as Auditor
  participant SEC as SecOps (Approval)
  participant GW as API Gateway (PEP-1)
  participant SV as Services (PEP-2)
  participant PK as Policy/KMS
  A->>SEC: Request read-only access (scope, TTL, region)
  SEC-->>A: Approved token (purpose=audit_attestation)
  A->>GW: Query evidence endpoints
  GW->>SV: Forward with signed X-Policy-* headers
  SV->>PK: Verify policyVersion, sign manifest
  SV-->>A: Watermarked export + signed manifest

Auditor checklist (excerpt)

auditorAccess:
  scope: { regionCode: "EU", tenants: ["s*","t*"] }
  ttl: P14D
  watermark: required
  export:
    route: in_region
    redactTemplate: pii-default
  logs: pii-safe
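
One way to read the checklist is as a set of checks applied to every auditor request. The following Python sketch assumes the tenant patterns are shell-style globs and that the TTL is measured from token issuance; the reason codes and the `auditor_request_allowed` function are illustrative, not ATP's API.

```python
from datetime import datetime, timedelta, timezone
from fnmatch import fnmatchcase

POLICY = {
    "regionCode": "EU",
    "tenants": ["s*", "t*"],
    "ttl": timedelta(days=14),       # P14D from the checklist
    "purpose": "audit_attestation",
}


def auditor_request_allowed(req: dict, issued_at: datetime, now: datetime,
                            policy: dict = POLICY) -> tuple[bool, str]:
    """Return (allowed, reason) for one read request against the access policy."""
    if req.get("purpose") != policy["purpose"]:
        return False, "purpose_mismatch"
    if req.get("regionCode") != policy["regionCode"]:
        return False, "region_out_of_scope"
    tenant = req.get("tenantId", "")
    if not any(fnmatchcase(tenant, pattern) for pattern in policy["tenants"]):
        return False, "tenant_out_of_scope"
    if now >= issued_at + policy["ttl"]:
        return False, "token_expired"
    return True, "ok"
```

Every check denies by default, so a request missing a field is rejected rather than passed through.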

Continuous Compliance Dashboards (live posture)¶

Framework posture tiles: % controls passing per framework (last 24h/7d/30d)
Key indicators:
- Policy bundle signature freshness (< 5 min lag)
- Residency conformance pass rate
- Retention dry-run anomalies (|Δ| > 5 pp)
- Integrity verification failure rate
- Key rotation schedule adherence
- DR drill freshness (last drill per region)
Drill-down: control → evidence query → artifact → signature & TSA receipt
Exceptions view: open risk acceptances with expiry and compensating controls

Attestation Process (cadence & roles)¶

Monthly: Evidence Pack generation (automated), SecOps review, findings triage
Quarterly: Framework Attestation Pack, control sampling, auditor window
Annually / on major change: penetration test + DPIA refresh (where required)
Roles:
- Compliance: owns framework mapping and auditor interface
- SecOps: owns evidence packs and access provisioning
- SRE: owns DR/restore evidence, availability artifacts
- Platform Security: owns policy registry & signatures
- Legal/DPO: owns jurisdictional notices and DPIA

Interfaces (read-only endpoints)¶

/evidence/abac/decisions?from=&to=&tenantId=&regionCode=
/evidence/residency/decisions?...
/evidence/retention/purge-ledger?...
/evidence/integrity/reports?...
/evidence/keyops/logs?...
All endpoints enforce auditor role, purpose, region, rate limits, and watermarking on export.

Acceptance (done when)¶

Framework crosswalks link each control → evidence query → artifact with owners and cadence
Monthly/quarterly packs are generated, signed, stored in write-once, and retrievable by auditors
Auditor workflow is time-bound, read-only, watermarked, and captured in logs with correlation IDs
Dashboards reflect real-time control posture and surface expiring exceptions and stale drills

Guardrails (quick checklist)¶

Evidence is region-local, PII-safe, signed, and timestamped.
Auditor exports are in-region, watermarked, and redacted by default.
No auditor action can mutate systems; break-glass is not available to auditor roles.
Exceptions are time-boxed with compensating controls and show up in posture dashboards.

Cross-references: privacy-gdpr-hipaa-soc2.md · data-residency-retention.md · observability.md · alerts-slos.md · tamper-evidence.md · key-rotation.md

Third-Party & Supply Chain Security¶

We secure the supply chain from code → build → artifact → deploy → runtime and govern third-party risk across cloud providers, SaaS, SDKs, libraries, and CI/CD tooling. Vendors are tiered by data sensitivity and blast radius; artifacts must be signed, attested, and SBOM-tracked; contracts encode security requirements.

Principles¶

Trust is earned and re-verified: signatures, provenance, and reputation checks.
Minimal blast radius: least privilege, constrained egress, segregated secrets per vendor.
Documented accountability: subprocessor register, DPAs/BAAs, incident SLAs, data maps.
Continuous monitoring: detect typosquat, dependency confusion, malicious updates.

Vendor Risk Assessment (tiers & cadence)¶

Tier	Example vendors	Data & access	Due diligence	Review cadence
T1 — Critical	Cloud/KMS, IdP, logging/SIEM	Prod data, keys, or auth	SOC2/ISO certs, pen test, DPA/BAA, residency & deletion guarantees	Quarterly + annual onsite/virtual
T2 — Sensitive	SDKs, observability SaaS, build infra	Metadata/telemetry, limited PII	Security questionnaire, SBOM, SLA on incidents	Semiannual
T3 — Low	Dev utilities, docs tooling	No prod access	Basic questionnaire, OSS posture	Annual

Onboarding checklist (excerpt)

vendorOnboarding:
  riskTier: T1|T2|T3
  controls:
    - soc2_or_iso27001: { required_for: [T1, T2] }
    - dpa_baa_signed: { required_for: [T1] }
    - residency_terms: match(platform_profiles)
    - incident_sla_hours: { critical: 24, high: 72 }
    - data_deletion_sla_days: 30
    - sbom_provided: true
    - vuln_disclosure_policy: public_or_private
  technical:
    - ip_allowlist_configured: true
    - egress_allowlist_entry: created
    - api_keys_scoped_rotated: true
    - audit_logs_available: true

SBOM, License & Provenance¶

SBOM generation on every build (CycloneDX/SPDX) and verification on deploy; diffs must be reviewed on PRs.
License compliance: allowlist/denylist; flag copyleft where redistribution applies; legal review for exceptions.
Artifact signing & provenance:
- cosign signatures for container images and OPA bundles.
- SLSA/in-toto attestations (builder, source repo, commit, workflow run).
- NuGet/npm sources pinned; package signature/integrity verified; no latest.

Policy (sketch)

supplyChain:
  requireSignatures: [ container_image, opa_bundle ]
  requireAttestations: [ slsa_provenance ]
  sbom:
    format: cyclonedx
    verifyOnDeploy: true
  packageSources:
    nuget: [ "https://api.nuget.org/v3/index.json" ]
    npm:   [ "https://registry.npmjs.org" ]
  pinning:
    container: digest_only
    packages: lockfile_required
  licenses:
    allow: [ "MIT","Apache-2.0","BSD-3-Clause" ]
    deny:  [ "AGPL-3.0" ]
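
Two of the policy fields lend themselves to a small check: `pinning.container: digest_only` and the license allow/deny lists. The Python sketch below is an assumption-laden illustration: the regular expression only checks the shape of a `name@sha256:<hex>` reference, and routing unlisted licenses to `review` is an assumed reading of the legal-review rule rather than a documented behaviour.

```python
import re

# Shape check only: <name>@sha256:<64 lowercase hex characters>.
DIGEST_REF = re.compile(r"^[^@\s]+@sha256:[0-9a-f]{64}$")

ALLOW = {"MIT", "Apache-2.0", "BSD-3-Clause"}
DENY = {"AGPL-3.0"}


def is_digest_pinned(image_ref: str) -> bool:
    """True only for references pinned by a sha256 digest."""
    return DIGEST_REF.match(image_ref) is not None


def license_verdict(spdx_id: str) -> str:
    """Map a license identifier to 'allow', 'deny', or 'review'."""
    if spdx_id in DENY:
        return "deny"
    if spdx_id in ALLOW:
        return "allow"
    return "review"  # assumption: unlisted licenses go to legal review
```

A tag-only reference such as `registry.example.com/atp/exporter:latest` (a made-up name) fails the pinning check, matching the "no latest" rule.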

Contractual Security Requirements¶

Incident notification: Critical within 24h, High within 72h; share IoCs and containment steps.
Data handling: processing instructions, residency alignment, subprocessor disclosure, secure deletion within 30 days of termination.
Availability: SLA for uptime and support; RPO/RTO if vendor hosts critical path.
Compliance: maintain valid SOC 2/ISO; annual pen test; provide report extracts under NDA.
Audit: ATP may conduct security review; provide API/audit logs for the scoped period.

Monitoring for Supply Chain Attacks¶

Dependency hijacking & typosquatting
- Alert on new package names with small Levenshtein distance to existing deps.
- Block installs from unapproved registries or namespace changes.
Malicious updates
- Alert on sudden publisher change, unusual permission requests, or telemetry domains added.
- Quarantine builds on SBOM delta spikes (e.g., +30% new transitive deps).
CI/CD artifact tampering
- Admission controller denies unsigned or unknown-digest images/policies.
- Compare provenance (repo URL, commit SHA, workflow run ID) to expected patterns.
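
The SBOM delta rule can be made concrete with a small calculation. This sketch assumes the spike is measured as newly added components relative to the size of the previous SBOM, which is one plausible reading of the 30% example; the function names are hypothetical.

```python
def sbom_delta_ratio(previous: set[str], current: set[str]) -> float:
    """Fraction of newly added components relative to the previous SBOM size."""
    if not previous:
        return float("inf") if current else 0.0
    return len(current - previous) / len(previous)


def should_quarantine(previous: set[str], current: set[str],
                      threshold: float = 0.30) -> bool:
    """Quarantine the build when added components exceed the threshold."""
    return sbom_delta_ratio(previous, current) > threshold
```

With ten components in the previous build, adding four new ones gives a ratio of 0.4 and triggers quarantine, while adding three sits exactly at the threshold and does not.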

Detections (examples)

// New or suspicious package
// (assumes a levenshtein() helper and a precomputed nearestKnown column)
sbom_deltas
| where change == "added"
| extend dist = levenshtein(packageName, nearestKnown)
| where dist <= 2 or sourceRegistry !in ('nuget.org','npmjs.org')

// Unsigned artifact admitted (should be impossible)
deploy_events
| where imageSignatureVerified == false or opaBundleVerified == false
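
For readers who want to see the distance check behind the first detection, here is a self-contained Python sketch of the classic dynamic-programming Levenshtein distance and a typosquat flag built on it. The `suspicious` helper and its defaults are illustrative; unlike the query above, it treats an exact match with a known package as benign.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def suspicious(package: str, known: set[str], registry: str,
               approved: frozenset[str] = frozenset({"nuget.org", "npmjs.org"}),
               max_dist: int = 2) -> bool:
    """Flag a package from an unapproved registry, or one whose name is
    close to (but not identical to) a known dependency."""
    if registry not in approved:
        return True
    if package in known:
        return False
    return any(levenshtein(package, name) <= max_dist for name in known)
```

A name one edit away from a known dependency is flagged, while an unrelated name from an approved registry passes.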

Operational Practices¶

Quarantine lane for new deps/vendors → staged rollout behind feature flag.
Egress only to allowlisted vendor endpoints; DNS split-horizon; no wildcard egress.
Secrets per vendor integration: scoped, rotated, and stored in Key Vault/HSM; no credential reuse.
Vendor keys: rotated on staff changes or incident; access reviewed quarterly.

Subprocessor Register & Changes¶

Maintain an internal register, with a customer-visible summary, of all subprocessors with:
- Services provided, data categories, residency, contact, certifications, last review date.
Change notifications to customers per contract; allow opt-out where required.

Evidence & Acceptance¶

Artifacts: vendor questionnaires, certifications (SOC/ISO), DPAs/BAAs, SBOMs, cosign/verifier logs, provenance attestations, license scans, allowlist policies.
Dashboards: open vendor risks by tier, SBOM drift, unsigned artifact blocks, registry changes, license violations.
Done when:
- All T1/T2 vendors have current attestations; DPAs/BAAs executed.
- All deployable artifacts are signed + attested; SBOM verified at deploy.
- Package sources pinned; dependency monitoring rules active; no wildcards.
- Subprocessor register is up to date and linked from customer docs.

Guardrails (quick checklist)¶

Only approved registries and pinned digests; no latest, no unverified publisher changes.
Images and policy bundles must be signed; admission denies unsigned/unknown artifacts.
SBOMs are generated, diffed, and verified; license allowlist enforced.
Vendor integrations have scoped secrets, egress allowlists, and incident SLAs in contract.
Supply-chain detections are on by default; quarantine suspicious updates for manual review.

Cross-references: Application Security & Secure SDLC · Vulnerability & Patch Management · Security Monitoring & Detection · observability.md

Security Testing & Validation¶

Security is verified continuously with layered tests: unit/contract/integration suites, chaos experiments, independent penetration testing, and compliance test packs that prove control effectiveness. Tests are policy-aware (carry policyVersion) and run in CI/CD and scheduled jobs.

Test Strategy (what we cover)¶

Auth/AuthZ flows: token minting, OBO down-scoping, RBAC/ABAC decisions at PEP-1/PEP-2.
Tenancy & residency guards: cross-tenant and cross-region deny paths, export-route gating.
Classification/redaction: required templates, DSAR exceptions, log redaction.
Integrity & immutability: WORM append/deny, Merkle path checks, anchor/TSA verification.
Keying: per-tenant DEKs, region KEKs, rotation windows, sender-constrained tokens (mTLS/DPoP).
Network: WAF rules, mTLS-only mesh, egress allowlist enforcement.

Automated Tests¶

Unit & Contract (fast)

ABAC/Rego: allow/deny matrices for common operations.
Token claims: required purpose, tenant_id, region_code, entitlements.
Export routes: in_region only unless profile allows.

# test/authz_test.rego
package atp.tests

import data.atp.authz as pol

# The mock document is bound to `req`: OPA rejects local variables that shadow `input`.
test_read_same_tenant_and_region_allows {
  req := {"op":"read","token":{"tenant_id":"t1","region_code":"EU","scopes":["evidence.read"],"purpose":"default"},
          "resource":{"tenantId":"t1","regionCode":"EU"}}
  pol.allow with input as req
}

test_cross_region_blocks {
  req := {"op":"read","token":{"tenant_id":"t1","region_code":"EU","scopes":["evidence.read"],"purpose":"default"},
          "resource":{"tenantId":"t1","regionCode":"US"}}
  not pol.allow with input as req
}

Integration (real boundaries)

Run against ephemeral env with private endpoints and mesh mTLS.
Assert bearer-only tokens fail inside mesh; sender-constrained tokens pass.
Export attempts to disallowed routes return 403 GuardViolation with reason.

- name: integration:mesh-mtls
  run: dotnet test tests/Integration.Mesh.Tests.csproj --filter "Category=Mesh"

Deny/Allow Matrix (sample)

Operation	Token Purpose	Region Route	Expect
read evidence	default	EU→EU	allow
read evidence	default	EU→US	deny (cross_region_blocked)
export PII	dsar_export	EU→EU	allow (redaction disabled)
export PII	default	EU→EU	deny (missing redactTemplate)
purge hot	retention_admin	EU	allow if cutoff & approvals
purge hot	default	EU	deny (scope)
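
The read and export rows of the matrix translate naturally into a table-driven test. The Python sketch below uses a toy `decide` function as a stand-in for the real PDP; it ignores residency profiles and the purge rows, and the operation names and reason strings are assumptions chosen to mirror the table.

```python
from typing import Optional


def decide(op: str, purpose: str, src: str, dst: str,
           redact_template: Optional[str] = None) -> tuple[str, str]:
    """Toy decision function mirroring the read/export rows of the matrix."""
    if src != dst:
        return "deny", "cross_region_blocked"
    if op == "read_evidence":
        return "allow", "ok"
    if op == "export_pii":
        # DSAR exports may skip redaction; other purposes need a template.
        if purpose == "dsar_export" or redact_template:
            return "allow", "ok"
        return "deny", "missing_redact_template"
    return "deny", "unknown_operation"


MATRIX = [
    # (op, purpose, src, dst, redact_template, expected effect)
    ("read_evidence", "default", "EU", "EU", None, "allow"),
    ("read_evidence", "default", "EU", "US", None, "deny"),
    ("export_pii", "dsar_export", "EU", "EU", None, "allow"),
    ("export_pii", "default", "EU", "EU", None, "deny"),
]


def run_matrix() -> list[tuple]:
    """Return the rows whose actual effect differs from the expected one."""
    return [row for row in MATRIX if decide(*row[:5])[0] != row[5]]
```

Keeping the expectations in a data table means a new row is a one-line change, and an empty result from `run_matrix()` signals that no decision has drifted.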

Chaos Experiments (resilience of controls)¶

Credential rotation: roll JWKS kid, ensure no stale acceptance; DPoP key rotation mid-session.
Key unavailability: suspend KEK unwrap in region; verify write/read degrade modes and deny-with-evidence.
Network partitions: isolate OPA sidecar; ensure fail closed with reason pdp_unreachable.
Egress lockdown: remove allowlist entry; assert outbound fails and alerts fire.
WORM stress: simulate storage retry/fail; ensure no duplicate/mutable writes (idempotency preserved).
Anchor signer outage: defer seal with quarantine; verify queued seals upon recovery.

Chaos policy (excerpt)

chaos:
  blastRadius: "single service / single region"
  schedule: "off-peak"
  abortOn: [ "abac.deny.count.delta_pp > 5", "error_budget_burn > 2x" ]
  evidence: { collect: [ "guards", "keyops", "integrity", "egress" ] }

Penetration Testing & Bug Bounty¶

Cadence: annual external pen test + post-major change (auth/crypto/network).
Scope: public API, gateway/WAF, auth flows, multi-tenant isolation, export workflows.
ROE: no production data mutation; dedicated staging with production-like controls; auditor tokens with watermarking.
Bug bounty: coordinated disclosure; severity mapped to IR SLAs; researcher safe harbor.

Deliverables: findings, proof-of-concept, exploitation path, recommended fix, retest evidence. All high/critical findings require a Security ADR and are tracked to closure.

Compliance Test Suites (control effectiveness)¶

Residency conformance: synthetic tenants per region; cross-region deny assertions.
Retention dry-runs: compare eligible vs. purged; investigate deltas beyond ±5 pp.
Key rotation checks: verify DEK/KEK schedules, escrow attestations.
Evidence completeness: monthly pack contains required artifacts; manifest signatures valid.

conformance:
  residency:
    tenants: ["t-eu","t-us"]
    tests:
      - route: EU->US
        expect: deny
  retention:
    dryRunWindowDays: 7
    deltaThresholdPp: 5
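
To show how `deltaThresholdPp` might be applied, the sketch below compares an observed purge rate with a baseline rate and flags differences larger than the threshold. The definition of the delta (observed purge percentage minus a baseline percentage) is an assumption, since the text does not pin down which two percentages are compared; the function names are hypothetical.

```python
def delta_pp(baseline_pct: float, observed_pct: float) -> float:
    """Difference between two percentages, in percentage points."""
    return observed_pct - baseline_pct


def retention_anomaly(eligible: int, purged: int, baseline_pct: float,
                      threshold_pp: float = 5.0) -> bool:
    """True when the purge rate deviates from baseline by more than the threshold."""
    if eligible <= 0:
        raise ValueError("eligible must be positive")
    observed_pct = 100.0 * purged / eligible
    return abs(delta_pp(baseline_pct, observed_pct)) > threshold_pp
```

For example, purging 930 of 1,000 eligible records against a 99% baseline is a 6 pp shortfall and would be investigated, whereas 960 of 1,000 (3 pp) would not.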

CI/CD Integration¶

Stages: unit/rego → integration (ephemeral env) → chaos (canary env) → pen-test gate (scheduled).
Gates: block deploy if deny/allow drift > 5 pp, or if any chaos test yields open control failure.
Artifacts: test reports, OPA bundle signatures, SIEM links; all archived into evidence packs.

- stage: security_tests
  jobs:
  - job: rego_unit
    steps:
      - script: conftest verify -p policies/
  - job: dotnet_unit
    steps:
      - script: dotnet test tests/Unit
  - job: integration_ephemeral
    steps:
      - script: ./scripts/provision-ephemeral.sh
      - script: dotnet test tests/Integration
      - script: ./scripts/teardown-ephemeral.sh

KPIs & SLOs¶

Coverage: ≥ 90% of guard paths have tests (incl. deny paths).
Drift: allow/deny decision drift ≤ 2 pp week-over-week.
Chaos: ≥ 1 control-focused chaos drill per region/month; aborts < 5%.
Pen test: all High/Critical remediated within SLA; retest pass.

Evidence & Acceptance¶

Test outputs, chaos logs, pen-test reports, and conformance results are retained, signed, and referenced by policyVersion.
“Done” when:
- All suites pass; gates green; no High/Critical open.
- Evidence is bundled into the current Security Pack.
- Any material changes result in a Security ADR.

Cross-references: Application Security & Secure SDLC · Least Privilege & Policy Enforcement · Security Monitoring & Detection · Incident Response & Forensics · data-residency-retention.md

Governance, Risk & Continuous Improvement¶

ATP evolves through clear ownership, measurable risk reduction, and versioned security decisions. We maintain a forward-looking security roadmap, a living risk register, and an ADR-driven decision record. Progress is reviewed with stakeholders quarterly and tied to concrete KPIs.

Security Roadmap (next 12–18 months)¶

Crypto agility & quantum-safe readiness
- Pluggable crypto providers; hybrid (ECDSA + Dilithium) anchor signatures.
- KEK migration plan; algorithm identifiers embedded in manifests; rollback strategy.
- PQC testbed with offline verification harness; performance budget established.
AI-assisted detection
- Entity behavior baselines (tenant/identity/purpose) with drift scoring.
- LLM-assisted triage summaries sourced from structured evidence; no raw PII.
- Auto-enrichment (IoCs, change diffs, policy deltas) appended to alerts.
Adaptive policies
- Risk-aware ABAC (e.g., shorten token TTL on anomaly, force DPoP for export).
- Contextual egress gates (tighten routes on surge; relax post-drill automatically).
- Progressive enforcement: report → warn → enforce with tracked false positives.
Attestation automation
- “One-click” Evidence Pack generation with per-framework crosswalks.
- Signed control proofs (policyVersion, SIEM query hash, artifact digests).

Roadmap tracker (excerpt)

roadmap:
  - id: CRYPTO-AGILITY-01
    goal: "Hybrid anchors (ECDSA+Dilithium) in EU region"
    due: "2026-03-31"
    kpi: "100% anchors dual-signed; verify latency < +10%"
    owner: "Platform Security"
  - id: DET-AI-02
    goal: "LLM triage summaries for S1+"
    due: "2026-01-15"
    kpi: "MTTD -20%; triage time -30%"
    owner: "SecOps"

Risk Register (living view)¶

Inputs: threat intel, incident post-mortems, pen test findings, supply-chain advisories, KPI drift.
Scoring: Likelihood × Impact (1–5); Residual after controls; review monthly.
States: Open → Mitigating → Verified → Accepted (time-boxed).
Links: each risk ties to controls, tests, evidence queries, and ADRs.

Template

risk:
  id: R-2025-023
  title: "Break-glass scope misuse"
  drivers: ["Incident review 2025-10-12"]
  inherent: { likelihood: 3, impact: 4 }
  controls: ["EX-ATP-040","AC-ATP-001"]
  plan:
    actions:
      - "Add scope validator to approval UI"
      - "TTL hard-cap to 4h platform-wide"
    owner: "Security Ops"
    due: "2025-11-15"
  residual: { likelihood: 2, impact: 3 }
  evidence: ["siem:break_glass.used", "policy:approval_validator.v2"]
  status: "Mitigating"
  exceptions: []
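
The scoring rule (Likelihood × Impact) applied to the template values works out as follows. This sketch assumes the 1–5 range applies to each factor separately; the helper names are illustrative.

```python
def risk_score(likelihood: int, impact: int) -> int:
    """Likelihood x Impact, with each factor validated against the 1..5 scale."""
    for name, value in (("likelihood", likelihood), ("impact", impact)):
        if not 1 <= value <= 5:
            raise ValueError(f"{name} must be in 1..5, got {value}")
    return likelihood * impact


def risk_reduction(inherent: dict, residual: dict) -> int:
    """Score points removed by the controls (inherent minus residual)."""
    return risk_score(**inherent) - risk_score(**residual)
```

For R-2025-023 the inherent score is 3 × 4 = 12 and the residual score is 2 × 3 = 6, so the planned controls account for a reduction of 6 points.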

ADR-Driven Security Decisions¶

When required: auth/crypto changes, policy model updates, egress rules, residency posture, incident-driven compensating controls.
Format: Context → Decision → Alternatives → Risks → Rollout/Rollback → Evidence hooks (SIEM queries, policyVersion, manifests).
Versioning: policies and ADRs share a semantic version; PEPs stamp policyVersion into decision logs.
Governance: security ADRs require dual approval (Platform Security + Product/Compliance).

Policy Lifecycle & Change Control¶

Stages: draft → canary → enforce.
Deprecation windows: minimum 30 days for breaking policy changes (export routes, token requirements).
Policy bundles: signed, reproducible build, immutably archived; rollback bundles retained ≥ 12 months.
Drift monitoring: allow/deny delta ≤ 2 pp week-over-week, else auto-rollback.
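
The drift rule can be illustrated with a short calculation over weekly decision counts. The sketch assumes drift is the absolute change in deny rate between two consecutive weeks, expressed in percentage points; `deny_rate_pct` and `drift_action` are hypothetical names, and the real monitor may weight or segment decisions differently.

```python
def deny_rate_pct(allow: int, deny: int) -> float:
    """Deny decisions as a percentage of all decisions in the window."""
    total = allow + deny
    if total == 0:
        raise ValueError("no decisions in window")
    return 100.0 * deny / total


def drift_action(prev: tuple[int, int], curr: tuple[int, int],
                 max_drift_pp: float = 2.0) -> str:
    """Return 'rollback' when week-over-week drift exceeds the limit, else 'keep'."""
    drift = abs(deny_rate_pct(*curr) - deny_rate_pct(*prev))
    return "rollback" if drift > max_drift_pp else "keep"
```

Moving from a 5% to an 8% deny rate is a 3 pp drift and would trigger the rollback path; moving from 5% to 6% stays within the limit.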

Policy release (sketch)

policyRelease:
  version: "3.4.0"
  changes:
    - "Require DPoP for export.* routes"
    - "Quotas per edition tightened"
  rollout:
    canaryTenants: ["t-eu-01","t-us-02"]
    metrics: ["abac.deny.count","egress.bytes","export.latency.p95"]
    abortIf:
      - "abac.deny.delta_pp > 5"
      - "egress.route.violations > 0"

Quarterly Security Reviews¶

Participants: Platform Security, SecOps, SRE, Product, Compliance/Legal, Data Governance.
Agenda:
- KPIs: MTTD/MTTR, allow/deny drift, integrity failure rate, key rotation adherence.
- Risks: top open risks, exceptions expiring, pen test & bug bounty status.
- Roadmap: progress vs. targets; reprioritize based on incidents and business goals.
- Controls: effectiveness review; propose retire/replace/tune.
Artifacts: minutes, action register, updated roadmap, updated risk entries, ADR references.

KPIs & Signals¶

Risk: % risks in Mitigating past due (< 10%); residual risk trend ↓ quarter-over-quarter.
Controls: policy signature freshness < 5 min; residency conformance ≥ 99.9%.
Incidents: MTTD/MTTR targets met for S0/S1; repeat incident rate < 5% per quarter.
Testing: chaos drills ≥ 1/region/month; pen test High/Critical remediated within SLA.
Compliance: evidence pack on time; auditor access windows met with zero scope creep.

Operating Model & RACI (summary)¶

Area	Owner	Consulted	Informed
Policies & ADRs	Platform Security	Product, Compliance	SRE, Eng
Risk Register	SecOps	Platform Security, Compliance	Product
Roadmap	Platform Security + Product	SRE, Compliance	Execs
Quarterly Review	Security Lead (chair)	All above	All teams

Evidence & Acceptance¶

Roadmap, risks, ADRs, policy bundles, and quarterly minutes are signed, timestamped, and discoverable.
“Done” for the quarter when:
- Roadmap milestones updated; risks re-scored; exceptions reviewed/renewed or closed.
- KPIs reported; deltas explained; remediations tracked to owners/dates.
- All policy changes have ADRs and evidence hooks; canary/enforce states recorded.

Cross-references: Security Control Framework · Least Privilege & Policy Enforcement · Security Monitoring & Detection · Incident Response & Forensics · Vulnerability & Patch Management · data-residency-retention.md

Guardrails (quick checklist)¶

Every material security change has an ADR, a versioned policy, and evidence hooks.
Risks are time-bound with owners and actions; residual risk tracked post-mitigation.
Policy rollouts are canaried with explicit abort conditions; rollbacks are rehearsed.
Quarterly reviews produce signed minutes, updated roadmap, and closed-loop actions.

Appendix A — Security Control Inventory (Summary)¶

Control ID	Category	Description	Owner	Evidence	Test Freq
IAM-01	Preventive	MFA for admin ops	Identity	Auth logs	Monthly
NET-01	Preventive	mTLS inter-service	Platform	Cert rotation	Quarterly
ENC-01	Preventive	Per-tenant encryption	Platform	KMS logs	Continuous
DET-01	Detective	Anomaly detection	SRE	Alert history	Weekly
AUD-01	Detective	Meta-audit stream	ATP	Audit records	Continuous
IAM-02	Deterrent	JIT elevation (dual-approval, TTL)	SecOps	`elevation.used` events	Continuous
OPA-01	Preventive	ABAC at PEP-1/PEP-2 (policy-as-code)	Platform	ABAC decision logs, bundle sigs	Per release
DLP-01	Preventive	Redaction enforcement on exports	DataGov	Export manifests, redact maps	Continuous
KMS-02	Preventive	Key rotation & escrow (DEK/KEK)	Platform	Rotation manifests, HSM ops	Quarterly
INT-01	Detective	Integrity verification (Merkle/anchors)	Core Eng	Verifier reports, TSA receipts	Daily/Weekly
DR-01	Corrective	Quarantined restore (verify before use)	SRE	Restore logs, verify checklist	Monthly
EGR-01	Preventive	Egress allowlist (deny-by-default)	Platform	Egress gateway logs, DNS policy	Continuous

Full inventory is maintained in the security control registry and includes owners, evidence queries, acceptance criteria, and ADR links.

Appendix B — Cross-Reference Map¶

Topic	Primary Document	Notes
Tenant isolation & guards	multitenancy-tenancy.md	ABAC, break-glass
Encryption & key mgmt	data-residency-retention.md §5, key-rotation.md	Per-tenant keys, rotation
PII handling	pii-redaction-classification.md	Classification, redaction
GDPR/HIPAA/SOC2 mapping	privacy-gdpr-hipaa-soc2.md	Detailed compliance
Integrity & tamper evidence	tamper-evidence.md	Hash chains, anchors
Zero Trust architecture	zero-trust.md	Principles, patterns
Backup security	backups-restore-ediscovery.md	Encrypted backups
Observability	observability.md	Security logs, metrics

Appendix C — Threat Model Summary (High-Level)¶

Threat	Impact	Likelihood	Mitigations
Cross-tenant access	Critical	Low	Tenancy guards (§5), ABAC
Tampering audit records	Critical	Low	Immutability, hash chains
Data exfiltration	High	Medium	Egress controls, encryption, DLP
Insider threat	High	Low	Least privilege, dual-control, audit
DoS/resource exhaustion	Medium	Medium	Rate limits, quotas, backpressure

See “Threat Model & Attack Surface” for detailed actors, paths, and residual risk notes; definitions for Impact/Likelihood and scoring live in the risk register.