
Azure Pipelines - Audit Trail Platform (ATP)

Continuous delivery with compliance — ATP pipelines automate build, test, security, and deployment with full traceability.


Purpose & Scope

This document defines the CI/CD pipeline architecture for the ConnectSoft Audit Trail Platform (ATP), establishing how code moves from commit to production with automated quality gates, security scanning, and compliance evidence generation at every stage.

What this document covers

  • Establish ATP's Azure Pipelines architecture aligned with the centralized ConnectSoft.AzurePipelines reusable template repository.
  • Define CI/CD stages and workflows: lint → build → test → security scan → package → publish → deploy across all ATP microservices.
  • Specify integration patterns with ConnectSoft Microservice Template structure, quality gates, and compliance checkpoints.
  • Detail multi-environment deployment strategy (dev, test, staging, production) with approval workflows, deployment strategies (rolling, blue-green, canary), and rollback procedures.
  • Document artifact versioning and provenance: semantic versioning, SBOM generation, Docker image tagging, and traceability.
  • Outline pipeline observability and metrics: build duration, test trends, deployment frequency, DORA metrics integration.
  • Describe template reusability and customization: parameterization, versioning, and service-specific overrides.

Out of scope (referenced elsewhere)

  • Detailed environment configuration and secrets management (see environments.md).
  • Quality gate policies, coverage thresholds, and security scan rules (see quality-gates.md).
  • Service-specific business logic or domain implementation details (see service repositories).
  • Infrastructure provisioning beyond pipeline context (see IaC in infrastructure/ directories).
  • Operational runbooks for incident response and troubleshooting (see runbook.md).

Readers & ownership

  • DevOps/Platform Engineering (owners): Pipeline templates, quality gates, artifact management, deployment automation.
  • Service Teams: Service-specific pipeline configurations, test coverage, and deployment validation.
  • Security/Compliance: Security scanning integration, SBOM generation, compliance artifact collection.
  • SRE/Operations: Deployment strategies, rollback procedures, pipeline observability, and health monitoring.
  • QA/Test Engineering: Test execution, coverage enforcement, integration test orchestration.

Artifacts produced

  • Pipeline Templates: Centralized YAML templates in ConnectSoft.AzurePipelines repository for build, test, publish, deploy, and infrastructure stages.
  • Per-Service Pipelines: azure-pipelines.yml in each ATP microservice repository referencing shared templates.
  • Build Artifacts: Versioned binaries, Docker images, NuGet packages, SBOM, and security scan reports.
  • Deployment Manifests: Environment-specific configuration overlays and deployment receipts.
  • Pipeline Dashboards: Azure DevOps dashboards tracking build health, test trends, deployment frequency, and quality metrics.
  • Compliance Evidence: Audit trail of pipeline executions, approvals, test results, and security scan outcomes.

Acceptance (done when)

  • All ATP microservices have functional CI/CD pipelines that successfully build, test, and deploy to dev environment.
  • Quality gates (code coverage ≥70%, security scans pass, tests green) enforced and blocking progression.
  • Deployment automation operational for dev and test environments; manual approval gates configured for staging and production.
  • Pipeline observability established with metrics, dashboards, and alerts for build/deployment failures.
  • Template versioning and reusability validated: changes to shared templates propagate correctly to service pipelines.
  • Compliance artifacts (SBOM, scan reports, test results) published and accessible for every build.
  • Documentation complete with examples, troubleshooting guides, and cross-references to related documents.

Pipeline Architecture Overview

ATP's CI/CD architecture is built on the ConnectSoft Pipeline Philosophy, a set of organizational standards that ensure consistency, reusability, and quality across all microservices. By centralizing pipeline templates and enforcing separation of concerns, ConnectSoft achieves predictable builds, traceable deployments, and compliance-ready artifacts without duplicating configuration across dozens of services.

The architecture distinguishes between Build (CI) — where code is validated, tested, and packaged — and Deploy (CD) — where artifacts are promoted through environments with graduated controls. Quality gates act as guardrails at each stage, blocking progression when standards aren't met. Observability is embedded throughout, providing real-time insights into pipeline health, artifact provenance, and deployment outcomes.

ConnectSoft Pipeline Philosophy

ConnectSoft's CI/CD strategy is built on four foundational principles that apply across all platform services, including ATP:

Reusable Templates

Principle: Centralize pipeline logic in a shared repository (ConnectSoft.AzurePipelines) to eliminate duplication and enforce consistency.

Implementation:

  • The ConnectSoft.AzurePipelines repository contains parameterized YAML templates for common pipeline stages: lint, build, test, security scan, publish, and deploy.
  • Service teams reference these templates in their azure-pipelines.yml files using the template keyword with the @templates repository alias.
  • Template versioning is managed through repository tags (e.g., v1.2.3), allowing services to pin to stable versions or adopt upgrades incrementally.

Benefits:

  • Consistency: All services follow the same build and test patterns, reducing cognitive load for engineers moving between projects.
  • Maintainability: Pipeline improvements (e.g., new security scanners, optimized caching strategies) propagate automatically to all services.
  • Governance: Security and compliance requirements are enforced centrally; individual teams cannot bypass critical stages.

Example Reference:

resources:
  repositories:
    - repository: templates
      type: git
      name: ConnectSoft/ConnectSoft.AzurePipelines

steps:
  - template: build/build-microservice-steps.yaml@templates
    parameters:
      solution: $(solution)
      buildConfiguration: $(buildConfiguration)

Separation of Concerns

Principle: Cleanly separate Build (CI) and Deploy (CD) stages with explicit dependencies and independent failure domains.

Implementation:

  • CI Stage: Executes lint, build, test, and security scans. Produces immutable artifacts (binaries, Docker images, NuGet packages) and compliance evidence (SBOM, test reports, scan results).
  • CD Stages: Deploy artifacts to environments (dev, test, staging, production) with progressive delivery strategies and approval gates.
  • Stages use dependsOn to establish explicit ordering; CD stages only run if CI succeeds.
  • Artifacts published in CI are consumed in CD via Pipeline.Workspace references, ensuring byte-for-byte deployment of validated code.
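The stage split and artifact hand-off can be sketched in pipeline YAML as follows (stage, job, and artifact names are illustrative, not ATP's actual configuration):

```yaml
stages:
- stage: CI
  jobs:
  - job: BuildTestPublish
    steps:
    - script: dotnet build --configuration Release
      displayName: 'Build'
    - script: dotnet test --configuration Release --no-build
      displayName: 'Test'
    # Publish the immutable artifact that every CD stage consumes.
    - publish: $(Build.ArtifactStagingDirectory)
      artifact: drop

- stage: CD_Dev
  dependsOn: CI            # explicit ordering: runs only after CI
  condition: succeeded()   # and only if CI succeeded
  jobs:
  - deployment: DeployDev
    environment: ATP-Dev
    strategy:
      runOnce:
        deploy:
          steps:
          # Deployment jobs fetch pipeline artifacts into
          # $(Pipeline.Workspace)/drop via the 'download' shortcut.
          - download: current
            artifact: drop
```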

Benefits:

  • Clarity: Engineers know where to look for build failures (CI) vs deployment issues (CD).
  • Parallelism: Multiple CD stages can run concurrently (e.g., deploy to dev and test simultaneously) after CI completes.
  • Rollback Safety: If deployment fails, the immutable CI artifacts remain available for redeployment or investigation.

Stage Flow:

flowchart LR
  subgraph CI["CI Stage (Build & Validate)"]
    L[Lint] --> B[Build]
    B --> T[Test]
    T --> S[Security Scan]
    S --> P[Publish Artifacts]
  end

  subgraph CD_Dev["CD Stage: Dev"]
    D1[Deploy to Dev]
    D1 --> H1[Health Checks]
  end

  subgraph CD_Test["CD Stage: Test"]
    D2[Deploy to Test]
    D2 --> H2[Smoke Tests]
  end

  subgraph CD_Staging["CD Stage: Staging"]
    A1[Manual Approval] --> D3[Deploy to Staging]
    D3 --> H3[Regression Tests]
  end

  subgraph CD_Prod["CD Stage: Production"]
    A2[Manual Approval + CAB] --> D4[Canary Deploy]
    D4 --> H4[Health + Monitors]
  end

  CI --> CD_Dev
  CI --> CD_Test
  CD_Dev --> CD_Staging
  CD_Test --> CD_Staging
  CD_Staging --> CD_Prod

Quality Gates

Principle: Automated gates block progression when quality, security, or compliance standards are not met.

Implementation:

  • Code Coverage: Minimum thresholds (ATP default: ≥70% line coverage, ≥60% branch coverage) enforced via Azure DevOps test tasks.
  • Security Scans: SonarQube quality gates fail builds if critical/high vulnerabilities or code smells exceed policy limits.
  • Dependency Checks: OWASP Dependency-Check or Snyk blocks builds with known CVEs in NuGet packages.
  • Test Pass Rate: 100% test pass required; flaky tests are treated as failures.
  • Build Performance: Alerts trigger if CI stage exceeds 10 minutes, prompting optimization.
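The coverage gate can be sketched as a small script that parses the Cobertura report and fails the step when line coverage falls below the threshold. This is a minimal sketch, not the template's actual enforcement mechanism; only the Cobertura root element's standard `line-rate` attribute is assumed.

```python
# Hypothetical coverage gate: fail the build step if Cobertura
# line coverage is below the policy threshold (ATP default: 70%).
import xml.etree.ElementTree as ET


def check_coverage(cobertura_xml: str, threshold: float = 70.0) -> float:
    """Return line coverage in percent; exit non-zero if below threshold."""
    root = ET.fromstring(cobertura_xml)
    # Cobertura stores line-rate as a 0..1 fraction on the root element.
    line_rate = float(root.get("line-rate", "0")) * 100
    if line_rate < threshold:
        raise SystemExit(
            f"Coverage gate failed: {line_rate:.1f}% < required {threshold}%"
        )
    return line_rate


if __name__ == "__main__":
    sample = '<coverage line-rate="0.82" branch-rate="0.65"></coverage>'
    print(f"Line coverage: {check_coverage(sample):.1f}%")
```

A script step like this would run immediately after coverage collection, so a red gate blocks artifact publishing in the same job.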

Benefits:

  • Preventive Control: Issues are caught before code reaches production, reducing incident frequency.
  • Compliance Evidence: Gate pass/fail results provide auditable proof that quality standards were enforced.
  • Cultural Shift: Teams prioritize test coverage and security hygiene because gates make them release-blocking.

See Also: Detailed gate policies and thresholds in ci-cd/quality-gates.md.

Observability

Principle: Pipelines are instrumented to provide visibility into build health, artifact provenance, and deployment outcomes.

Implementation:

  • Pipeline Metrics: Azure DevOps dashboards track build duration, success rate, test pass rate, and artifact size per service.
  • DORA Metrics: Deployment frequency, lead time for changes, mean time to recovery (MTTR), and change failure rate calculated from pipeline data.
  • Artifact Traceability: Every artifact includes metadata (commit SHA, pipeline run ID, build timestamp, SBOM) enabling full provenance tracking.
  • Integration with Observability Stack: Pipeline events emitted to Azure Monitor/Application Insights; correlated with service deployments in Grafana dashboards (see observability.md).

Benefits:

  • Bottleneck Identification: Metrics reveal slow tests, oversized artifacts, or unreliable stages needing optimization.
  • Incident Correlation: When production issues arise, pipeline logs and artifact metadata help pinpoint the offending change.
  • Continuous Improvement: Trend analysis drives platform-wide improvements (e.g., parallelizing tests, caching dependencies).

ATP-Specific Considerations

While ATP adheres to ConnectSoft's general pipeline philosophy, its audit and compliance mission imposes additional requirements that differentiate it from standard microservices:

Immutability

Requirement: Build artifacts must be immutable and uniquely versioned; overwriting artifacts breaks audit trails and tamper-evidence chains.

Implementation:

  • Semantic Versioning: Every build receives a unique version number: <major>.<minor>.<patch> where patch is auto-incremented via Azure DevOps counters.
  • Docker Image Tagging: Images are tagged with the build number (e.g., 1.0.42) AND latest for convenience, but deployments reference specific versions, not latest.
  • NuGet Package Retention: Azure Artifacts feed retains all package versions; deletion requires compliance team approval.
  • Artifact Checksums: SHA256 hashes computed for all binaries and included in SBOM; used to verify deployment integrity.

Rationale: ATP's mission is tamper-evidence and auditability. If artifacts can be overwritten, there's no way to prove that the code audited in staging is the same code deployed to production. Immutability extends the chain of custody from source code to runtime.
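The checksum step above can be sketched as follows. This is a hypothetical script, not the actual template logic; the drop-directory layout and manifest shape are assumptions.

```python
# Illustrative artifact checksum generation: hash every file in a build
# drop directory so the digests can be embedded in the SBOM and later
# verified at deployment time.
import hashlib
import json
from pathlib import Path


def checksum_manifest(drop_dir: str) -> dict:
    """Map each artifact's relative path to its SHA256 hex digest."""
    entries = {}
    for path in sorted(Path(drop_dir).rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            entries[str(path.relative_to(drop_dir))] = digest
    return entries


if __name__ == "__main__":
    import tempfile

    # Demo: hash a temporary directory containing one file.
    with tempfile.TemporaryDirectory() as d:
        (Path(d) / "app.dll").write_bytes(b"binary payload")
        print(json.dumps(checksum_manifest(d), indent=2))
```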

Audit Trail

Requirement: Pipeline execution logs and approval records must be retained and correlated with deployed versions for compliance audits.

Implementation:

  • Extended Retention: Pipeline logs retained for 1 year (vs. Azure DevOps default 30 days) per SOC 2 and HIPAA requirements.
  • Approval Audit Log: Manual approvals (staging, production deployments) recorded with approver identity, timestamp, justification, and linked work items.
  • Deployment Manifest: Each deployment generates a manifest file (JSON) containing pipeline run ID, commit SHA, artifact versions, environment, and timestamp; stored in Azure Blob Storage with WORM policy.
  • Correlation Identifiers: Services emit deploymentId and buildVersion in telemetry, enabling cross-reference between runtime behavior and pipeline execution.

Rationale: Auditors and compliance teams need to trace any production issue back to its source: which code change, which build, which approval, which tests passed. ATP's pipeline audit trail provides this end-to-end traceability.
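A deployment manifest built from the fields listed above might look like the following (field names sketched from the implementation bullets; all values are illustrative):

```json
{
  "deploymentId": "b7e2f1a0-illustrative-guid",
  "pipelineRunId": 20451,
  "commitSha": "9f1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b",
  "buildVersion": "1.0.42",
  "environment": "staging",
  "artifacts": [
    { "name": "atp-ingestion-drop", "sha256": "e3b0c44298fc1c149afbf4c8996fb924..." }
  ],
  "approvedBy": "example-approver@connectsoft.example",
  "timestamp": "2024-05-14T10:32:00Z"
}
```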

See Also: Deployment manifest schema and retention policies in operations/backups-restore-ediscovery.md.

Multi-Service Orchestration

Requirement: ATP comprises 7+ microservices (Ingestion, Query, Integrity, Export, Policy, Search, Gateway) that must be built, tested, and deployed in a coordinated manner.

Implementation:

  • Independent Pipelines: Each service has its own azure-pipelines.yml and can be built/deployed independently (microservice autonomy).
  • Cross-Service Dependencies: When Service A depends on Service B's NuGet package or API contract, the pipeline validates compatibility using versioned contracts.
  • Integration Tests: Test stage includes cross-service integration tests (e.g., Ingestion → Query) using Docker Compose or testcontainers to simulate full stack.
  • Orchestration Pipelines: Separate "meta-pipelines" trigger coordinated deployments of multiple services (e.g., full ATP stack deployment to staging).

Benefits:

  • Autonomy: Service teams can iterate independently without waiting for platform-wide releases.
  • Safety: Integration tests catch breaking changes before they reach shared environments.
  • Coordination: When needed (e.g., major version bumps), orchestration pipelines ensure services are deployed in compatible sets.

Challenge: Managing contract versioning and backward compatibility across 7+ services. Addressed via semantic versioning, API versioning headers, and contract testing (see domain/contracts/ documentation).
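The same-major compatibility rule that semantic versioning implies can be sketched as follows (an illustrative helper, not ATP's actual contract-testing code):

```python
# Hypothetical compatibility check: under semantic versioning, a consumer
# pinned to major version N stays compatible with any provider N.x.y;
# a major bump signals a breaking contract change.
def is_compatible(consumer_pin: str, provider_version: str) -> bool:
    """Treat same-major semantic versions as backward compatible."""
    consumer_major = int(consumer_pin.split(".")[0])
    provider_major = int(provider_version.split(".")[0])
    return consumer_major == provider_major


assert is_compatible("1.0", "1.4.2")      # minor bump: compatible
assert not is_compatible("1.0", "2.0.0")  # major bump: breaking
```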

Compliance Artifacts

Requirement: Every build must produce compliance artifacts (SBOM, security scan reports, test results) to support audits and certifications (SOC 2, ISO 27001, HIPAA).

Implementation:

  • SBOM Generation: CycloneDX SBOM generated during build (via dotnet sbom or third-party tools) and published as pipeline artifact.
  • Security Scan Reports: SonarQube, OWASP Dependency-Check, and Trivy (for Docker images) results exported as JSON/XML and attached to build.
  • Test Results: .trx files, code coverage reports (Cobertura XML), and test duration metrics published to Azure DevOps Test Plans.
  • ADR Snapshots: Architecture Decision Records (ADRs) committed to repo; pipeline validates no undocumented changes to critical components.
  • Artifact Aggregation: Post-deployment, a compliance bundle (SBOM + scans + tests + approvals) is packaged and stored in long-term retention storage.
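A sketch of the SBOM generation steps, assuming the CycloneDX .NET global tool (the exact CLI flags are an assumption; the variable names reuse those from the pipeline examples in this document):

```yaml
steps:
- script: |
    dotnet tool install --global CycloneDX
    dotnet CycloneDX "$(exactSolution)" --output $(Build.ArtifactStagingDirectory)/sbom
  displayName: 'Generate CycloneDX SBOM'
# Attach the SBOM to the build so CD stages and auditors can retrieve it.
- publish: $(Build.ArtifactStagingDirectory)/sbom
  artifact: sbom
```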

Benefits:

  • Audit Readiness: When auditors request evidence, compliance artifacts are already collected and indexed by build/deployment.
  • Regulatory Proof: SOC 2 Type II requires evidence that security controls were executed; pipeline artifacts provide timestamped, immutable proof.
  • Incident Investigation: If a vulnerability is discovered, SBOM enables rapid identification of affected builds and deployments.

See Also: SBOM schemas, compliance artifact retention, and audit procedures in ../platform/security-compliance.md.


Repository Structure & Templates

The ConnectSoft.AzurePipelines repository serves as the single source of truth for all pipeline templates across the ConnectSoft ecosystem. By centralizing template definitions, the organization ensures that best practices, security standards, and compliance requirements are consistently applied to every service, including all ATP microservices.

This structure eliminates template duplication, reduces maintenance overhead, and enables platform-wide improvements to propagate automatically. Service teams consume these templates via Azure DevOps repository references, parameterizing them with service-specific configuration while inheriting the standardized pipeline logic.

ConnectSoft.AzurePipelines Repository

The repository is organized into functional directories, each containing templates for a specific pipeline stage or concern. Templates are designed to be composable — service pipelines can mix and match templates based on their needs (e.g., a service might use build-microservice-steps.yaml but skip Docker image building if it's not containerized).

ConnectSoft.AzurePipelines/
├── build/
│   ├── build-microservice-steps.yaml
│   ├── lint-microservice-steps.yaml
│   └── build-and-push-microservice-docker-steps.yaml
├── test/
│   ├── test-microservice-steps.yaml
│   └── install-test-dependencies-microservice.yaml
├── publish/
│   ├── publish-microservice-steps.yaml
│   └── publish-microservice.yaml
├── deploy/
│   ├── deploy-microservice-to-azure-web-site.yaml
│   └── deploy-microservice-to-iis.yaml
├── generate/
│   ├── generate-microservice-documentation-steps.yaml
│   └── generate-microservice-reports-steps.yaml
└── infrastructure/
    └── create-microservice-infrastructure-pulumi.yaml

build/ — Build and Compilation Templates

Purpose: Templates for compiling .NET code, restoring dependencies, and building Docker images.

build-microservice-steps.yaml

  • Responsibility: Compile .NET solutions using dotnet build with Release configuration.
  • Key Steps:
    • Authenticate to Azure Artifacts feed for private NuGet packages
    • Restore dependencies (dotnet restore)
    • Build solution with deterministic builds enabled
    • Version stamping using pipeline variables ($(Build.BuildNumber))
  • Parameters:
    • solution: Glob pattern for solution file (e.g., **/*.sln or **/*.slnx)
    • exactSolution: Explicit solution name for disambiguation
    • buildConfiguration: Release/Debug (default: Release)
    • restoreVstsFeed: Azure Artifacts feed ID for NuGet authentication
  • Usage Example:
- template: build/build-microservice-steps.yaml@templates
  parameters:
    solution: '**/*.slnx'
    exactSolution: 'ConnectSoft.ATP.Ingestion.slnx'
    buildConfiguration: 'Release'
    restoreVstsFeed: 'e4c108b4-7989-4d22-93d6-391b77a39552'

lint-microservice-steps.yaml

  • Responsibility: Enforce code quality, style conventions, and detect deprecated packages.
  • Key Steps:
    • Run StyleCop analyzers for C# code style violations
    • Execute SonarQube analysis (prepare → build → publish)
    • Scan for deprecated/vulnerable NuGet packages
    • Clean up lingering Docker Compose containers from previous test runs
  • Parameters:
    • solution: Solution file pattern
    • exactSolution: Explicit solution name
    • restoreVstsFeed: Feed ID for package restore
    • isNugetAuthenticateEnabled: Enable/disable feed authentication (default: true)
  • Quality Gates: SonarQube quality gate failure blocks pipeline progression.
  • Rationale: Linting early (before build) catches style and security issues before wasting compute on compilation and tests.

build-and-push-microservice-docker-steps.yaml

  • Responsibility: Build and push Docker images to container registry (Azure Container Registry or Docker Hub).
  • Key Steps:
    • Authenticate to container registry using service connection
    • Build multi-stage Dockerfile with layer caching optimization
    • Tag image with build number and latest
    • Scan image with Trivy (vulnerability scanner) before push
    • Push image to registry only if scan passes
  • Parameters:
    • dockerRegistryServiceConnection: Azure DevOps service connection for registry
    • imageRepository: Image name (e.g., connectsoft/atp-ingestion)
    • containerRegistry: Registry URL (e.g., connectsoft.azurecr.io)
    • dockerfile: Path to Dockerfile
    • buildContext: Build context directory (usually .)
    • tags: Multi-line string of tags (e.g., $(Build.BuildNumber) and latest)
  • Optimization: Uses Docker BuildKit for improved caching and parallel layer builds.
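The build-scan-push sequence might look like the following sketch (task inputs abbreviated; the Trivy invocation assumes the CLI is installed on the agent):

```yaml
steps:
- task: Docker@2
  displayName: 'Build image'
  inputs:
    command: build
    containerRegistry: $(dockerRegistryServiceConnection)
    repository: $(imageRepository)
    Dockerfile: $(dockerfile)
    tags: |
      $(Build.BuildNumber)
      latest
# Non-zero exit fails the job on HIGH/CRITICAL findings, skipping the push.
- script: >
    trivy image --exit-code 1 --severity HIGH,CRITICAL
    $(containerRegistry)/$(imageRepository):$(Build.BuildNumber)
  displayName: 'Trivy scan'
- task: Docker@2
  displayName: 'Push image (only after scan passes)'
  inputs:
    command: push
    containerRegistry: $(dockerRegistryServiceConnection)
    repository: $(imageRepository)
    tags: |
      $(Build.BuildNumber)
      latest
```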

test/ — Testing and Coverage Templates

Purpose: Execute unit tests, integration tests, and enforce code coverage thresholds.

test-microservice-steps.yaml

  • Responsibility: Run tests with service containers (Redis, SQL, RabbitMQ, etc.) and collect coverage.
  • Key Steps:
    • Start service containers (defined in pipeline's containers: section)
    • Execute dotnet test with .runsettings configuration
    • Collect code coverage using Coverlet or OpenCover
    • Publish coverage report (Cobertura XML) to Azure DevOps
    • Enforce coverage threshold; fail pipeline if below minimum
    • Publish test results (.trx files) to Azure Test Plans
  • Parameters:
    • solution: Solution file pattern
    • runSettingsFileName: Name of .runsettings file (e.g., ConnectSoft.ATP.Ingestion.runsettings)
    • buildConfiguration: Configuration to test (default: Release)
    • codeCoverageThreshold: Minimum line coverage percentage (default: 70)
  • Service Containers: Template assumes containers are defined in calling pipeline (Redis, SQL, MongoDB, RabbitMQ, OTEL Collector, Seq).
  • Parallelization: .runsettings configures test parallelization for faster execution.
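A .runsettings fragment enabling parallel execution and Coverlet coverage collection might look like this (a generic sketch, not the actual ATP file):

```xml
<RunSettings>
  <RunConfiguration>
    <!-- 0 = run test assemblies in parallel on all available cores -->
    <MaxCpuCount>0</MaxCpuCount>
    <ResultsDirectory>./TestResults</ResultsDirectory>
  </RunConfiguration>
  <DataCollectionRunSettings>
    <DataCollectors>
      <!-- Coverlet collector; emits the Cobertura XML consumed by the coverage gate -->
      <DataCollector friendlyName="XPlat Code Coverage" />
    </DataCollectors>
  </DataCollectionRunSettings>
</RunSettings>
```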

install-test-dependencies-microservice.yaml

  • Responsibility: Install tools and dependencies required for testing (e.g., database migration tools, test data seeders).
  • Key Steps:
    • Install .NET global tools (dotnet tool install)
    • Set up test databases (run migrations, seed test data)
    • Configure connection strings for test environment
  • Usage: Rarely needed for ATP services (dependencies usually containerized); included for completeness.

publish/ — Artifact Publishing Templates

Purpose: Publish build artifacts (binaries, NuGet packages) to Azure DevOps or Azure Artifacts.

publish-microservice-steps.yaml

  • Responsibility: Publish compiled binaries and related files to Azure Pipelines artifact storage.
  • Key Steps:
    • Copy binaries from $(Build.SourcesDirectory)/bin/Release to $(Build.ArtifactStagingDirectory)
    • Include SBOM, security scan reports, and test results
    • Publish artifact to pipeline with specified artifact name
  • Parameters:
    • artifactName: Name of artifact (e.g., atp-ingestion-drop)
    • artifactPath: Path to files to publish (default: $(Build.ArtifactStagingDirectory))
  • Consumption: CD stages download this artifact via Pipeline.Workspace and deploy it.

publish-microservice.yaml

  • Responsibility: Package and publish NuGet packages to Azure Artifacts feed.
  • Key Steps:
    • Run dotnet pack with version from pipeline variables
    • Publish .nupkg and .snupkg (symbol package) to feed
    • Authenticate using Azure Artifacts credential provider
  • Parameters:
    • feedId: Azure Artifacts feed ID
    • packageVersion: Semantic version (e.g., $(Build.BuildNumber))
  • Usage: For shared libraries/contracts that other services consume (e.g., ConnectSoft.ATP.Contracts NuGet package).

deploy/ — Deployment Templates

Purpose: Deploy artifacts to Azure App Service, IIS, or Kubernetes clusters.

deploy-microservice-to-azure-web-site.yaml

  • Responsibility: Deploy artifact to Azure App Service using Azure DevOps service connection.
  • Key Steps:
    • Download artifact from pipeline workspace
    • Deploy to Azure App Service using AzureWebApp@1 task
    • Apply app settings and connection strings (from variable groups or Key Vault)
    • Restart app service
    • Run health check to validate deployment
  • Parameters:
    • azureSubscription: Azure DevOps service connection name
    • appName: Azure App Service name (e.g., atp-ingestion-dev)
    • package: Path to artifact package (e.g., $(Pipeline.Workspace)/drop/*.zip)
    • appSettings: Key-value pairs for app configuration (optional)
  • Deployment Strategies: Supports blue-green (slot swap), rolling, and canary via App Service deployment slots.
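A blue-green deployment via slot swap might be sketched with standard Azure DevOps tasks as follows (slot names, app names, and variables are illustrative):

```yaml
steps:
# Deploy the new version to the inactive 'staging' slot first.
- task: AzureWebApp@1
  inputs:
    azureSubscription: $(azureSubscription)
    appName: 'atp-ingestion-prod'
    deployToSlotOrASE: true
    resourceGroupName: $(resourceGroup)
    slotName: 'staging'
    package: '$(Pipeline.Workspace)/drop/*.zip'
# After health checks pass, swap the validated slot into production.
- task: AzureAppServiceManage@0
  inputs:
    azureSubscription: $(azureSubscription)
    WebAppName: 'atp-ingestion-prod'
    ResourceGroupName: $(resourceGroup)
    Action: 'Swap Slots'
    SourceSlot: 'staging'
```

Because the previous version remains in the swapped-out slot, rollback is a second swap rather than a redeployment.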

deploy-microservice-to-iis.yaml

  • Responsibility: Deploy artifact to on-premises IIS servers via self-hosted agents.
  • Key Steps:
    • Download artifact to agent machine
    • Stop IIS application pool
    • Copy files to deployment directory
    • Update web.config with environment-specific settings
    • Start application pool
    • Validate deployment via HTTP health check
  • Parameters:
    • targetMachine: Self-hosted agent pool or machine group
    • deploymentPath: IIS site physical path
    • appPoolName: IIS application pool name
  • Use Case: For ConnectSoft customers with hybrid cloud or on-premises requirements.

generate/ — Documentation and Reporting Templates

Purpose: Auto-generate documentation, API specs, and compliance reports.

generate-microservice-documentation-steps.yaml

  • Responsibility: Generate OpenAPI/Swagger specs, Markdown docs, and architectural diagrams.
  • Key Steps:
    • Run dotnet swagger tofile to export OpenAPI JSON
    • Generate Markdown documentation from XML comments (using Docfx or similar)
    • Publish documentation as pipeline artifact or to Azure Blob Storage
  • Parameters:
    • projectPath: Path to API project
    • outputPath: Directory for generated docs
  • Integration: ATP uses this to keep API documentation synchronized with code changes.

generate-microservice-reports-steps.yaml

  • Responsibility: Generate compliance reports (SBOM, dependency tree, security scan summary).
  • Key Steps:
    • Aggregate SBOM, security scans, and test results into single report
    • Generate PDF or HTML summary for auditors
    • Publish report as artifact and optionally email to compliance team
  • Parameters:
    • reportFormat: PDF, HTML, or JSON
    • recipients: Email addresses for report distribution (optional)
  • Compliance: ATP uses this for quarterly SOC 2 attestation packages.

infrastructure/ — Infrastructure as Code Templates

Purpose: Provision Azure infrastructure using Pulumi, Terraform, or Bicep.

create-microservice-infrastructure-pulumi.yaml

  • Responsibility: Deploy Azure resources (App Service, SQL Database, Redis, Key Vault) via Pulumi stacks.
  • Key Steps:
    • Install Pulumi CLI
    • Authenticate to Azure using service principal
    • Run pulumi up with specified stack (dev, test, staging, prod)
    • Capture outputs (connection strings, endpoints) as pipeline variables
    • Store Pulumi state in Azure Blob Storage backend
  • Parameters:
    • workingDirectory: Path to Pulumi project (e.g., infrastructure/)
    • stackName: Pulumi stack name (e.g., atp-staging)
    • azureSubscription: Service connection for Azure authentication
  • Usage: ATP infrastructure pipelines reference this template to provision environments.
  • Alternative: ConnectSoft also supports Terraform and Bicep templates (not shown here).

ATP Pipeline Files

Each ATP microservice repository contains its own azure-pipelines.yml file that orchestrates the CI/CD workflow by composing templates from the ConnectSoft.AzurePipelines repository. Service-specific configuration is provided through pipeline variables and parameters passed to templates.

Per-Service Pipelines

Structure:

  • Repository: ConnectSoft.ATP.{ServiceName} (e.g., ConnectSoft.ATP.Ingestion)
  • File: /azure-pipelines.yml in repository root
  • Content:
    • Resource references (template repository, service containers)
    • Pipeline triggers (branches, paths)
    • Variable definitions (solution, feed IDs, version numbers)
    • Stages: CI (build, test, publish), CD (deploy to environments)
    • Template references with service-specific parameters

Example: ATP Ingestion Pipeline (ConnectSoft.ATP.Ingestion/azure-pipelines.yml):

name: $(majorMinorVersion).$(semanticVersion)

resources:
  repositories:
    - repository: templates
      type: git
      name: ConnectSoft/ConnectSoft.AzurePipelines
  containers:
    - container: redis
      image: redis:7-alpine
      ports:
        - 6379:6379
    - container: rabbitmq
      image: rabbitmq:3-management-alpine
      ports:
        - 5672:5672
        - 15672:15672
    - container: otel-collector
      image: otel/opentelemetry-collector:0.97.0
      ports:
        - 4317:4317
        - 8888:8888

trigger:
  branches:
    include: [master, main, develop]
  paths:
    exclude: [README.md, docs/**, .github/**]

variables:
  majorMinorVersion: 1.0
  semanticVersion: $[counter(variables['majorMinorVersion'], 0)]
  solution: '**/*.slnx'
  exactSolution: 'ConnectSoft.ATP.Ingestion.slnx'
  buildConfiguration: 'Release'
  codeCoverageThreshold: 75
  restoreVstsFeed: 'e4c108b4-7989-4d22-93d6-391b77a39552'
  artifactName: 'atp-ingestion-drop'

stages:
- stage: CI_Stage
  displayName: 'Build, Test, and Publish'
  jobs:
  - job: Build_Test_Publish
    pool:
      vmImage: 'ubuntu-latest'
    services:
      redis: redis
      rabbitmq: rabbitmq
      otel: otel-collector
    steps:
    - template: build/lint-microservice-steps.yaml@templates
      parameters:
        solution: $(solution)
        exactSolution: $(exactSolution)
        restoreVstsFeed: $(restoreVstsFeed)
    - template: build/build-microservice-steps.yaml@templates
      parameters:
        solution: $(solution)
        buildConfiguration: $(buildConfiguration)
    - template: test/test-microservice-steps.yaml@templates
      parameters:
        solution: $(solution)
        runSettingsFileName: 'ConnectSoft.ATP.Ingestion.runsettings'
        codeCoverageThreshold: $(codeCoverageThreshold)
    - template: publish/publish-microservice-steps.yaml@templates
      parameters:
        artifactName: $(artifactName)

- stage: CD_Dev
  displayName: 'Deploy to Development'
  dependsOn: CI_Stage
  condition: succeeded()
  jobs:
  - deployment: DeployToDev
    environment: ATP-Dev
    pool:
      vmImage: 'ubuntu-latest'
    strategy:
      runOnce:
        deploy:
          steps:
          - template: deploy/deploy-microservice-to-azure-web-site.yaml@templates
            parameters:
              azureSubscription: '$(azureSubscription)'
              appName: 'atp-ingestion-dev'
              package: '$(Pipeline.Workspace)/$(artifactName)/*.zip'

Key Elements:

  • Template Repository Reference: repository: templates with name: ConnectSoft/ConnectSoft.AzurePipelines enables @templates alias.
  • Service Containers: Redis, RabbitMQ, OTEL Collector for integration tests.
  • Trigger Configuration: Build on master, main, develop branches; exclude documentation changes.
  • Variable Definitions: Version numbers, solution paths, coverage thresholds, feed IDs.
  • Template Composition: Each template reference (template: build/lint-microservice-steps.yaml@templates) inherits shared logic.
  • Stage Dependencies: CD_Dev stage depends on CI_Stage success.

Shared Variables

Variable Groups: Centralized configuration stored in Azure DevOps Library, scoped per environment.

Purpose: Eliminate duplication of environment-specific values (Azure subscription IDs, Key Vault URLs, connection strings, service endpoints) across service pipelines.

Structure:

Azure DevOps → Library → Variable Groups

├── ATP-Dev-Variables
│   ├── azureSubscription: "ConnectSoft-Dev-ServiceConnection"
│   ├── containerRegistry: "connectsoftdev.azurecr.io"
│   ├── keyVaultUrl: "https://atp-kv-dev.vault.azure.net/"
│   └── redisConnectionString: (secret, linked from Key Vault)
├── ATP-Test-Variables
│   ├── azureSubscription: "ConnectSoft-Test-ServiceConnection"
│   ├── containerRegistry: "connectsofttest.azurecr.io"
│   ├── keyVaultUrl: "https://atp-kv-test.vault.azure.net/"
│   └── redisConnectionString: (secret, linked from Key Vault)
├── ATP-Staging-Variables
│   ├── azureSubscription: "ConnectSoft-Staging-ServiceConnection"
│   ├── containerRegistry: "connectsoftstaging.azurecr.io"
│   ├── keyVaultUrl: "https://atp-kv-staging.vault.azure.net/"
│   └── redisConnectionString: (secret, linked from Key Vault)
└── ATP-Prod-Variables
    ├── azureSubscription: "ConnectSoft-Prod-ServiceConnection"
    ├── containerRegistry: "connectsoftprod.azurecr.io"
    ├── keyVaultUrl: "https://atp-kv-prod.vault.azure.net/"
    └── redisConnectionString: (secret, linked from Key Vault)

Usage in Pipeline:

variables:
- group: ATP-Dev-Variables  # Link variable group for dev stage
- name: localVariable
  value: 'service-specific-value'

stages:
- stage: CD_Dev
  variables:
  - group: ATP-Dev-Variables
  jobs:
  - deployment: DeployToDev
    environment: 'atp-dev'  # Azure DevOps Environment name (illustrative)
    strategy:
      runOnce:
        deploy:
          steps:
          - script: echo "Deploying to $(azureSubscription)"

Benefits:

  • Centralized Management: Change Key Vault URL once in variable group; all services inherit the change.
  • Secret Security: Connection strings and API keys stored in Key Vault; variable group links to secrets (not plaintext).
  • Environment Parity: Same variable names across environments (e.g., azureSubscription) with different values per group.
  • Access Control: Variable groups have separate RBAC; production variables restricted to senior engineers/SREs.

See Also: Secret management and Key Vault integration in environments.md.

Template References

Mechanism: Service pipelines reference templates using the template: keyword with repository alias and relative path.

Syntax:

- template: {path/to/template.yaml}@{repository-alias}
  parameters:
    {parameterName}: {value}

Example:

steps:
- template: build/build-microservice-steps.yaml@templates
  parameters:
    solution: $(solution)
    buildConfiguration: $(buildConfiguration)

How It Works:

  1. Azure Pipelines resolves @templates to the resources.repositories entry named templates (the ConnectSoft.AzurePipelines repo).
  2. Fetches the referenced template files at pipeline compile time (templates are expanded before any job is scheduled).
  3. Injects the template's YAML content inline, replacing parameters with the provided values.
  4. Executes the expanded steps as if they were defined directly in the service pipeline.

Versioning:

  • Default: Without an explicit ref, the template repository's default branch is used (always latest, so template changes take effect immediately).
  • Pinning: Pin to specific tag/branch for stability:
resources:
  repositories:
    - repository: templates
      type: git
      name: ConnectSoft/ConnectSoft.AzurePipelines
      ref: refs/tags/v1.2.3  # Pin to v1.2.3

Benefits:

  • Version Control: Service teams can adopt template changes incrementally (pin → test → upgrade).
  • Rollback: If new template version breaks pipelines, services can revert to previous tag.
  • Testing: Platform team can create experimental templates on feature branches; services opt-in for testing.

Governance:

  • Template Repository Access: Only DevOps/Platform Engineering team has write access; service teams are read-only.
  • Change Control: Template changes require pull request review, preventing accidental breakage of all service pipelines.
  • Audit Trail: Git history of template repository provides full audit trail of pipeline logic changes.

Pipeline Stages & Flow

ATP pipelines follow a linear progression model where code flows through clearly defined stages: CI (continuous integration) validates and packages code, followed by CD (continuous deployment) stages that promote artifacts through environments with graduated controls. Each stage has explicit dependencies, conditions, and quality gates that determine whether progression continues.

This structure ensures that only validated artifacts reach production, and that each environment serves a specific purpose in the validation chain. The separation between CI and CD enables independent failure domains — build issues don't impact running services, and deployment failures don't require rebuilding artifacts.

Stage 1: CI (Build & Test)

The CI Stage is the gatekeeper for code quality. Its sole responsibility is to validate that code meets standards (style, tests, security) and produce immutable, versioned artifacts ready for deployment. CI runs on every commit to protected branches and on pull requests, providing fast feedback to developers.

Duration Target: < 10 minutes (optimized for developer productivity; triggers alerts if exceeded).

Failure Impact: Blocks all downstream CD stages; developers notified immediately.

1. Lint

Purpose: Detect code style violations, security anti-patterns, and deprecated dependencies before compilation.

Execution:

  • StyleCop Analyzers: Enforce C# coding conventions (naming, formatting, documentation).
    • Rules configured via .editorconfig and stylecop.json.
    • Violations treated as errors (build-breaking) for critical rules; warnings for cosmetic issues.
  • SonarQube Analysis Preparation:
    • Run SonarQubePrepare@5 task to initialize analysis context.
    • Configured with project key, organization, and quality gate profile.
    • Prepares for code analysis during build step.
  • Deprecated Package Detection:
    • Scan packages.lock.json or project files for packages marked deprecated or vulnerable.
    • Fail pipeline if any critical vulnerabilities found (based on NVD severity scores).
  • Docker Cleanup:
    • Remove lingering containers from previous test runs (docker-compose down --volumes).
    • Ensures clean state for integration tests.

Rationale: Linting early prevents wasting compute on building and testing code that would fail quality gates later. StyleCop issues caught here avoid cognitive load during code review. Deprecated package detection prevents supply chain vulnerabilities before they reach production.

Failure Scenarios:

  • StyleCop violations in new code (developers must fix before re-running).
  • Deprecated packages used (replace with supported alternatives or request exception).
  • SonarQube connection failure (fallback: allow build, but flag for manual review).

2. Build

Purpose: Compile .NET solution into release binaries with deterministic versioning.

Execution:

  • Dependency Restoration:
    • Run dotnet restore with Azure Artifacts feed authentication.
    • Downloads NuGet packages (ConnectSoft shared libraries, third-party dependencies).
    • Uses NuGet package lock file (packages.lock.json) for reproducible builds.
  • Compilation:
    • Run dotnet build --configuration Release --no-restore.
    • Deterministic build enabled (<Deterministic>true</Deterministic> in .csproj).
    • Version stamping: Assembly version set to $(Build.BuildNumber) (e.g., 1.0.42).
    • Build outputs written to bin/Release/net8.0/.
  • SonarQube Analysis:
    • Code analyzed during build (metrics, code smells, security hotspots).
    • Results cached for SonarQube publish step (after tests).

Rationale: Release configuration ensures optimizations (dead code elimination, JIT optimizations). Deterministic builds guarantee that same source + dependencies = same binary (critical for audit trail). Version stamping enables traceability from runtime back to pipeline run.

Failure Scenarios:

  • Compilation errors (syntax, type mismatches, missing references).
  • Feed authentication failure (check service connection credentials).
  • Out-of-memory during build (increase agent memory or optimize solution structure).
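
The deterministic-build settings above can be sketched as a .csproj fragment (the property names are standard MSBuild/compiler properties; the conditions and local fallback version are assumptions about ATP's setup):

```xml
<!-- Hypothetical .csproj excerpt for deterministic, CI-stamped builds. -->
<PropertyGroup>
  <TargetFramework>net8.0</TargetFramework>
  <!-- Same source + same dependencies => byte-identical binaries. -->
  <Deterministic>true</Deterministic>
  <!-- TF_BUILD is set by Azure Pipelines agents; this normalizes embedded paths. -->
  <ContinuousIntegrationBuild Condition="'$(TF_BUILD)' == 'true'">true</ContinuousIntegrationBuild>
  <!-- Overridden in CI via /p:Version=$(Build.BuildNumber); 1.0.0 is a local fallback. -->
  <Version Condition="'$(Version)' == ''">1.0.0</Version>
</PropertyGroup>
```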

3. Test

Purpose: Execute unit and integration tests with real service dependencies (Redis, SQL, RabbitMQ) to validate business logic and integration contracts.

Execution:

  • Service Container Startup:
    • Azure Pipelines starts containers defined in pipeline's resources.containers section.
    • Services available on localhost (e.g., redis://localhost:6379).
    • Containers isolated per pipeline run (no cross-contamination).
  • Test Execution:
    • Run dotnet test with .runsettings configuration.
    • Example: ConnectSoft.ATP.Ingestion.runsettings configures:
      • Parallelization: MaxCpuCount=0 (use all cores).
      • Timeout: 10 minutes per test assembly.
      • Data collectors: Coverage (Coverlet), crash dumps.
    • Tests tagged with categories: [Unit], [Integration], [Slow].
    • Integration tests use Testcontainers or pipeline service containers.
  • Test Categories:
    • Unit Tests: Fast, in-memory, no external dependencies. Cover domain logic, validation, mapping.
    • Integration Tests: Slower, use Redis/SQL/RabbitMQ. Cover repository operations, message publishing, cache interactions.
    • Contract Tests: Validate API schemas, message formats against published contracts.

Rationale: Testing with real services (not mocks) catches integration issues early. Testcontainers (or pipeline containers) provide production-like environment without infrastructure overhead. Parallelization keeps test duration manageable as test suites grow.

Failure Scenarios:

  • Test failures (business logic bugs, race conditions, flaky tests).
  • Service container unavailable (network issues, image pull failures).
  • Test timeout (investigate slow tests, optimize database queries).
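
The .runsettings described above might look like the following sketch (element names are standard VSTest settings; note that TestSessionTimeout caps the whole test session rather than a single assembly, so the 10-minute figure is applied there as an approximation):

```xml
<!-- Hypothetical ConnectSoft.ATP.Ingestion.runsettings sketch. -->
<RunSettings>
  <RunConfiguration>
    <MaxCpuCount>0</MaxCpuCount>                    <!-- 0 = use all available cores -->
    <TestSessionTimeout>600000</TestSessionTimeout> <!-- 10 minutes, in milliseconds -->
  </RunConfiguration>
  <DataCollectionRunSettings>
    <DataCollectors>
      <DataCollector friendlyName="XPlat code coverage" /> <!-- Coverlet collector -->
      <DataCollector friendlyName="blame" />               <!-- crash dumps on hang/crash -->
    </DataCollectors>
  </DataCollectionRunSettings>
</RunSettings>
```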

4. Code Coverage

Purpose: Measure test coverage and enforce minimum thresholds to ensure code is adequately tested.

Execution:

  • Coverage Collection:
    • Coverlet collector runs during dotnet test, instrumenting assemblies.
    • Generates coverage.cobertura.xml report (line and branch coverage).
  • Coverage Publishing:
    • Publish coverage report to Azure DevOps using PublishCodeCoverageResults@1 task.
    • Reports visible in pipeline summary and Test Plans.
    • Trends tracked over time (coverage increasing/decreasing alerts).
  • Threshold Enforcement:
    • Custom script or task validates coverage ≥ threshold (ATP default: 70% line, 60% branch).
    • Pipeline fails if threshold not met.
    • Exclusions: Generated code (*.Designer.cs), migrations, test projects.

Rationale: Coverage thresholds ensure new code is tested before merge. 70% line coverage is ConnectSoft standard (balance between safety and pragmatism). Branch coverage ensures conditional logic (if/else, switch) is tested across paths.

ATP Thresholds by Service:

  • Ingestion: 75% (high, due to critical path for audit record integrity).
  • Query: 80% (high, due to complex query logic and filtering).
  • Gateway: 65% (moderate, includes thin API controllers).
  • Integrity: 85% (very high, cryptographic operations require comprehensive testing).
  • Export: 70% (standard, mix of orchestration and business logic).
  • Policy: 75% (high, policy evaluation must be deterministic).
  • Search: 70% (standard, Elasticsearch integration).

Failure Scenarios:

  • Coverage below threshold (add tests or request exception with justification).
  • Coverage report parsing failure (check Coverlet version compatibility).
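
The custom threshold-enforcement script mentioned above can be sketched as a shell step (the report path and the 70% line threshold are assumptions; a sample report is generated inline so the sketch is self-contained, whereas in the pipeline the file comes from the Coverlet collector):

```shell
# Generate a sample Cobertura report so the sketch runs standalone.
cat > coverage.cobertura.xml <<'EOF'
<coverage line-rate="0.82" branch-rate="0.61"></coverage>
EOF

threshold=70
# line-rate is a 0..1 fraction on the <coverage> root element.
rate=$(grep -o 'line-rate="[0-9.]*"' coverage.cobertura.xml | head -1 | cut -d'"' -f2)
pct=$(awk -v r="$rate" 'BEGIN { printf "%.0f", r * 100 }')
if [ "$pct" -lt "$threshold" ]; then
  echo "Coverage ${pct}% is below threshold ${threshold}%"
  exit 1
fi
echo "Coverage ${pct}% meets threshold ${threshold}%"
```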

5. Security Scan

Purpose: Detect security vulnerabilities, code quality issues, and secrets in code before artifacts are published.

Execution:

  • SAST (Static Application Security Testing) — SonarQube:
    • Run SonarQubePublish@5 task to upload analysis results.
    • SonarQube evaluates code against quality gate:
      • Zero critical/high severity vulnerabilities.
      • Code smells below threshold (maintainability rating ≥ B).
      • Duplications < 3%.
    • Quality gate failure blocks pipeline.
  • Dependency Scanning — OWASP Dependency-Check:
    • Scan NuGet packages against National Vulnerability Database (NVD).
    • Generates report with CVE IDs, CVSS scores, and remediation advice.
    • Fail pipeline if critical vulnerabilities found (CVSS ≥ 9.0).
  • Secrets Detection — GitGuardian or GitHub Advanced Security:
    • Scan commit diffs for hardcoded secrets (API keys, passwords, connection strings).
    • Patterns: AWS keys, Azure storage keys, JWT tokens, database credentials.
    • Fail pipeline if secrets detected; alert security team for rotation.
  • Docker Image Scanning — Trivy (if building containers):
    • Scan Docker image layers for OS vulnerabilities (CVEs in base image, packages).
    • Fail pipeline if critical vulnerabilities in final image.
    • Advisory-only for low/medium severity (logged for review).

Rationale: Security scanning in CI prevents vulnerabilities from reaching production. SAST catches code-level issues (SQL injection, XSS). Dependency scanning addresses supply chain risks. Secrets detection prevents credential leaks. Multi-layered approach provides defense-in-depth.

Failure Scenarios:

  • SonarQube quality gate failure (refactor code, suppress false positives with justification).
  • Vulnerable NuGet package (upgrade to patched version or request exception).
  • Secret detected (remove secret, rotate credential, update vault reference).
  • Trivy scan failure (update base image, apply security patches).
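
A pipeline step wiring the Trivy behavior described above could look like this sketch (the --exit-code and --severity flags are real Trivy options; the image name and step layout are assumptions):

```yaml
# Hypothetical container scan step: report all severities, but only
# critical findings fail the pipeline (--exit-code 1).
- script: |
    trivy image --exit-code 0 --severity LOW,MEDIUM,HIGH \
      $(containerRegistry)/atp-ingestion:$(Build.BuildNumber)
    trivy image --exit-code 1 --severity CRITICAL \
      $(containerRegistry)/atp-ingestion:$(Build.BuildNumber)
  displayName: 'Trivy image scan (block on critical CVEs)'
```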

6. Publish Artifacts

Purpose: Package and publish validated build outputs as immutable artifacts for deployment.

Execution:

  • Binary Artifacts:
    • Copy compiled binaries from bin/Release to $(Build.ArtifactStagingDirectory).
    • Include dependencies, configuration templates, health check scripts.
    • Publish using PublishPipelineArtifact@1 with artifact name (e.g., atp-ingestion-drop).
  • Docker Images (if applicable):
    • Build Docker image using docker build with multi-stage Dockerfile.
    • Tag with $(Build.BuildNumber) and latest.
    • Push to Azure Container Registry (ACR) or Docker Hub.
  • NuGet Packages (for shared libraries):
    • Run dotnet pack with version $(Build.BuildNumber).
    • Publish .nupkg and .snupkg (symbol package) to Azure Artifacts.
  • Compliance Artifacts:
    • SBOM (Software Bill of Materials): CycloneDX JSON/XML with all dependencies.
    • Security Scan Reports: SonarQube results, OWASP report, Trivy scan.
    • Test Results: .trx files, code coverage Cobertura XML.
    • Build Metadata: JSON manifest with commit SHA, pipeline run ID, timestamp, approver.

Rationale: Immutable artifacts ensure CD stages deploy exact same binaries tested in CI. SBOM enables vulnerability tracking and compliance audits. Compliance artifacts provide evidence for SOC 2, ISO 27001 attestations.

Artifact Retention:

  • Binaries: 90 days (Azure DevOps default); extended to 1 year for production releases.
  • Docker Images: Retention policy in ACR (keep last 10 versions per tag; prod images retained indefinitely).
  • Compliance Artifacts: 7 years (regulatory requirement for audit evidence).
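
As a sketch, SBOM generation and publication might be wired as follows (the CycloneDX .NET tool is real, but the exact flag names vary by tool version, and the artifact name is illustrative):

```yaml
# Hypothetical SBOM steps: produce a CycloneDX JSON SBOM and attach it
# to the pipeline run as a compliance artifact.
- script: |
    dotnet tool install --global CycloneDX
    dotnet-CycloneDX $(exactSolution) --json --output $(Build.ArtifactStagingDirectory)/sbom
  displayName: 'Generate SBOM (CycloneDX)'
- task: PublishPipelineArtifact@1
  inputs:
    targetPath: '$(Build.ArtifactStagingDirectory)/sbom'
    artifact: 'atp-ingestion-sbom'
```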

Stage 2: CD (Deploy to Environments)

The CD Stages promote artifacts through environments with increasing production-fidelity and control. Each environment serves a specific validation purpose, and progression is gated by automated tests, manual approvals, or both.

Key Principle: Deploy the same artifact across all environments (no rebuilding). CD stages download artifacts from CI stage and deploy byte-for-byte identical binaries.

1. Deploy to Dev

Purpose: Continuous integration environment for developers to validate changes with full ATP stack.

Characteristics:

  • Approval: None (fully automated).
  • Trigger: Every successful CI build on develop, main, or master branches.
  • Data: Synthetic test data; reset nightly.
  • Configuration: Debug-friendly (verbose logging, SQL query logging, no rate limits).
  • SLA: 95% uptime (best-effort; downtime acceptable for testing).

Deployment Process:

  1. Download artifact from CI stage ($(Pipeline.Workspace)/drop).
  2. Deploy to Azure App Service (dev slot) using AzureWebApp@1 task.
  3. Apply app settings from ATP-Dev-Variables variable group.
  4. Restart app service.
  5. Wait for health check (/health endpoint returns 200).
  6. Run basic smoke tests (API responds, database connected, cache available).

Health Checks:

  • Liveness: Service responds to HTTP requests.
  • Readiness: Dependencies (Redis, SQL, RabbitMQ) reachable.
  • Smoke Tests: Create audit record → query → verify integrity.
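
The health-check wait (step 5 above) can be sketched as a small polling helper. The probe command, hostname, and attempt budget are assumptions, a stub probe stands in for the real curl against /health, and real use would sleep between attempts:

```shell
# Hypothetical readiness gate: run a probe command until it succeeds or the
# attempt budget is exhausted. In the pipeline the probe would be, e.g.:
#   curl -fsS https://atp-ingestion-dev.azurewebsites.net/health
wait_healthy() {
  probe="$1"; attempts="$2"
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$probe"; then echo "healthy after $i attempt(s)"; return 0; fi
    i=$((i + 1))           # real use: sleep 10 between attempts
  done
  echo "health check timed out"
  return 1
}

# Illustration: a stub probe that only succeeds on its third invocation.
count_file=$(mktemp)
echo 0 > "$count_file"
stub_probe() {
  n=$(($(cat "$count_file") + 1))
  echo "$n" > "$count_file"
  [ "$n" -ge 3 ]
}
wait_healthy stub_probe 5
```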

Rollback: Automated on health check failure (revert to previous deployment slot).

Use Cases:

  • Developer testing of new features.
  • Integration testing across ATP services.
  • Demo environment for product team.

2. Deploy to Test

Purpose: System verification environment for QA team to run full regression test suite.

Characteristics:

  • Approval: None (automated after CI success).
  • Trigger: CI build success on master or main branches.
  • Data: Stable test datasets (versioned, restored from snapshots).
  • Configuration: Production-like (standard logging, rate limits enabled, feature flags match staging).
  • SLA: 98% uptime.

Deployment Process:

  1. Download artifact from CI stage.
  2. Deploy to Azure App Service (test environment).
  3. Apply app settings from ATP-Test-Variables variable group.
  4. Restart app service.
  5. Run health checks and smoke tests.
  6. Trigger automated regression test suite (Playwright/Selenium E2E tests).
  7. Publish test results to Azure Test Plans.

Regression Tests:

  • API Contract Tests: Validate OpenAPI spec compliance.
  • Functional Tests: End-to-end workflows (ingest → query → export).
  • Performance Tests: Load test with realistic traffic patterns (P95 latency < 500ms).
  • Security Tests: OWASP ZAP dynamic scan, authentication/authorization checks.

Rollback: Automated on test failure (revert deployment, alert QA team).

Use Cases:

  • QA regression testing.
  • Performance baseline validation.
  • Security testing (dynamic scans).

3. Deploy to Staging

Purpose: Pre-production environment for final validation before production release.

Characteristics:

  • Approval: Manual (1 approver from Platform/SRE team).
  • Trigger: Manual (initiated via Azure DevOps UI or API).
  • Data: Production-like (anonymized copy of production data, refreshed weekly).
  • Configuration: Identical to production (same feature flags, rate limits, encryption).
  • SLA: 99.5% uptime.

Deployment Process:

  1. Pre-Approval:
    • Approver verifies linked work items (Epic/Feature/Task).
    • Checks that test stage passed all regression tests.
    • Reviews release notes and rollback plan.
  2. Download artifact from CI stage.
  3. Deploy to Azure App Service using blue-green strategy (deploy to staging slot).
  4. Apply app settings from ATP-Staging-Variables variable group.
  5. Warm up staging slot (prefetch caches, prime connections).
  6. Run health checks, smoke tests, and regression tests.
  7. Slot Swap: If tests pass, swap staging slot to production slot (zero-downtime).
  8. Monitor for 30 minutes (error rates, latency, health checks).
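
Step 7's slot swap can be expressed with the Azure CLI (the command and flags are standard; the resource group and app names are illustrative):

```shell
# Hypothetical zero-downtime promotion: swap the warmed-up staging slot
# into the production slot of the App Service.
az webapp deployment slot swap \
  --resource-group atp-staging-rg \
  --name atp-ingestion-staging \
  --slot staging \
  --target-slot production
```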

Validation Tests:

  • Full Regression Suite: All tests from Test environment.
  • Load Tests: Simulate production traffic (10x normal load).
  • Chaos Tests: Inject failures (kill Redis, throttle SQL, network partition).
  • Compliance Tests: Verify GDPR redaction, data residency, retention policies.

Rollback: Manual or automated (swap slots back to previous version if metrics degrade).

Use Cases:

  • Final validation before production.
  • Chaos engineering and resilience testing.
  • Training for customer success team (demos).

4. Deploy to Production

Purpose: Live environment serving real tenant traffic with highest reliability and compliance standards.

Characteristics:

  • Approval: Manual (2 approvers: SRE + Platform Lead; CAB approval for major releases).
  • Trigger: Manual (initiated after successful staging deployment).
  • Data: Real tenant data (multi-tenant, WORM storage, legal holds).
  • Configuration: Production-hardened (encryption at rest/transit, audit logging, rate limits, DDoS protection).
  • SLA: 99.9% uptime (3 nines).

Deployment Process:

  1. Pre-Approval:
    • Change Advisory Board (CAB) review for major releases (breaking changes, schema migrations).
    • Approvers verify:
      • Staging deployment successful.
      • Rollback plan documented.
      • On-call rotation scheduled (SRE available for 24 hours post-deploy).
      • Communication plan (customer notifications, status page updates).
  2. Download artifact from CI stage (same artifact deployed to dev/test/staging).
  3. Deploy using canary strategy:
    • Phase 1 (10%): Deploy to 10% of instances; monitor for 30 minutes.
    • Phase 2 (50%): If metrics healthy, deploy to 50%; monitor for 1 hour.
    • Phase 3 (100%): If metrics healthy, deploy to all instances.
  4. Apply app settings from ATP-Prod-Variables variable group.
  5. Monitor production metrics (error rate, latency, throughput, health checks).
  6. Emit deployment event to observability stack (correlate with service behavior).

Deployment Strategies:

  • Canary (Default): Gradual rollout with automated rollback on metric degradation.
  • Blue-Green: For database schema changes or high-risk releases (instant rollback via slot swap).
  • Rolling: For low-risk releases (sequential instance updates with health checks).

Monitoring & Rollback:

  • Automated Rollback Triggers:
    • Error rate > 1% (compared to pre-deployment baseline).
    • P95 latency > 2x baseline (e.g., 1000ms vs. 500ms).
    • Health check failures > 10% of instances.
    • Critical alerts fired (CPU > 90%, memory > 85%, disk full).
  • Manual Rollback: SRE can abort deployment via Azure DevOps UI or CLI.
  • Rollback Mechanism: Swap deployment slot back or redeploy previous artifact version.
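
The automated triggers above amount to comparing post-deploy metrics against the pre-deployment baseline; a minimal sketch, with thresholds taken from the bullets and the metric sources assumed to be supplied by the monitoring stack:

```shell
# Hypothetical canary gate: roll back when the error rate exceeds 1% or
# P95 latency is more than 2x the pre-deployment baseline.
should_rollback() {
  err_rate="$1"; p95_ms="$2"; baseline_p95_ms="$3"
  awk -v e="$err_rate" -v p="$p95_ms" -v b="$baseline_p95_ms" \
    'BEGIN { exit !(e > 1.0 || p > 2 * b) }'
}

# 0.4% errors, P95 of 620ms vs. a 500ms baseline: within limits.
if should_rollback 0.4 620 500; then
  echo "rollback"
else
  echo "continue rollout"
fi
```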

Post-Deployment:

  • Validation Period: 24-hour observation period with on-call SRE.
  • Communication: Update status page, notify customer success team.
  • Retrospective: Post-deployment review if issues occurred.

Use Cases:

  • Serving real tenant audit trail traffic.
  • Production compliance and regulatory requirements.
  • SLA-backed service delivery.

Stage Dependencies

Azure Pipelines' dependsOn and condition keywords enable explicit control over stage progression. ATP pipelines enforce linear progression (dev → test → staging → production) with branch and trigger conditions.

Dependency Flow:

stages:
- stage: CI_Stage
  displayName: 'Continuous Integration'
  jobs: [Build, Test, SecurityScan]

- stage: CD_Dev
  displayName: 'Deploy to Dev'
  dependsOn: CI_Stage
  condition: succeeded()
  jobs: [DeployToDev]

- stage: CD_Test
  displayName: 'Deploy to Test'
  dependsOn: CI_Stage
  condition: succeeded()
  jobs: [DeployToTest]

- stage: CD_Staging
  displayName: 'Deploy to Staging'
  dependsOn: 
    - CD_Dev
    - CD_Test
  condition: |
    and(
      succeeded(),
      eq(variables['Build.SourceBranch'], 'refs/heads/master')
    )
  jobs: [DeployToStaging]

- stage: CD_Production
  displayName: 'Deploy to Production'
  dependsOn: CD_Staging
  condition: |
    and(
      succeeded(),
      eq(variables['Build.Reason'], 'Manual')
    )
  jobs: [DeployToProduction]

Condition Explanations:

  • succeeded(): Stage only runs if all dependencies completed successfully.
  • eq(variables['Build.SourceBranch'], 'refs/heads/master'): Only deploy to staging from master branch (not feature branches).
  • eq(variables['Build.Reason'], 'Manual'): Production deployments must be manually triggered (no auto-deploy).

Advanced Conditions:

  • Deploy on PR: eq(variables['Build.Reason'], 'PullRequest') for ephemeral preview environments.
  • Skip Staging: in(dependencies.CD_Staging.result, 'Succeeded', 'Skipped') to allow a production deploy when staging succeeded or was deliberately skipped.
  • Time-Based: and(succeeded(), lt(format('{0:HH}', pipeline.startTime), '16')) to prevent deployments after 4 PM (change freeze).

Parallel vs. Sequential:

  • Parallel: CD_Dev and CD_Test can run simultaneously (both depend only on CI_Stage).
  • Sequential: CD_Staging waits for both dev and test; CD_Production waits for staging.

Benefits:

  • Safety: Production never receives untested code.
  • Visibility: Azure DevOps pipeline view shows stage progression and blocks.
  • Flexibility: Conditions enable branch-specific or time-based deployment rules.

CI Stage: Lint, Build, Test

The CI Stage consists of three core template invocations that collectively validate code quality, compile the solution, and execute tests with coverage enforcement. These templates are consumed from the centralized ConnectSoft.AzurePipelines repository, ensuring consistency across all ATP microservices while allowing service-specific parameter overrides.

Each template is designed to fail fast — if linting finds violations, the build never runs; if the build fails, tests are skipped. This sequential dependency chain optimizes pipeline duration and provides clear failure attribution (style issue vs. compilation error vs. test failure).

Lint Steps (ConnectSoft Template)

The lint template performs pre-build validation to catch code quality and security issues before investing compute in compilation and testing.

Template Invocation:

- template: build/lint-microservice-steps.yaml@templates
  parameters:
    solution: $(solution)
    exactSolution: $(exactSolution)
    restoreVstsFeed: $(restoreVstsFeed)
    isNugetAuthenticateEnabled: true

Parameters:

  • solution: Glob pattern for solution file (e.g., **/*.sln or **/*.slnx). Allows template to locate solution in any directory structure.
  • exactSolution: Explicit solution filename (e.g., ConnectSoft.ATP.Ingestion.slnx). Used when multiple solutions exist in repo; template prefers exact match.
  • restoreVstsFeed: Azure Artifacts feed GUID for NuGet package authentication (e.g., e4c108b4-7989-4d22-93d6-391b77a39552/1889adca-ccb6-4ece-aa22-cad1ae4a35f3). Required to restore private ConnectSoft packages during analysis.
  • isNugetAuthenticateEnabled: Boolean (default: true). Enables NuGetAuthenticate@1 task to inject credentials for private feeds.

Lint Actions:

StyleCop Analyzers

Purpose: Enforce C# coding conventions and documentation standards across codebase.

Execution:

  • StyleCop analyzers run as part of dotnet build during lint phase (lightweight build for analysis only).
  • Rules configured via .editorconfig (repository root) and stylecop.json (per-project).
  • Violations categorized by severity:
    • Error: Build-breaking (e.g., missing XML documentation on public APIs, inconsistent naming).
    • Warning: Non-blocking but tracked (e.g., incorrect ordering of using statements, missing copyright headers).
  • Suppressions allowed via #pragma warning disable with justification comment (reviewed in PR).

Example Rule Violations:

  • SA1633: File must have header (copyright notice required for all source files).
  • SA1600: Elements must be documented (public types/members require XML comments).
  • SA1309: Field names must not begin with underscore (teams that prefer _camelCase private fields typically set this rule to none in .editorconfig).
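
Per-rule severities are typically set in .editorconfig; a hypothetical excerpt (the dotnet_diagnostic syntax is standard, the specific rule choices are illustrative):

```ini
# Hypothetical .editorconfig excerpt mapping StyleCop rules to severities.
[*.cs]
dotnet_diagnostic.SA1633.severity = error    # file header required
dotnet_diagnostic.SA1600.severity = error    # public members must be documented
dotnet_diagnostic.SA1309.severity = none     # allow _camelCase private fields
```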

Rationale: Consistent code style reduces cognitive load during code review and maintenance. Documentation requirements ensure public APIs are self-explanatory. Early enforcement (lint stage) avoids wasting compute on non-compliant code.

SonarQube Analysis

Purpose: Detect code quality issues, security vulnerabilities, and technical debt.

Execution:

  • Prepare Phase (SonarQubePrepare@5):
    • Connects to SonarQube server (SonarCloud or self-hosted instance).
    • Initializes analysis context with project key, organization, and quality gate profile.
    • Configures exclusions (test projects, generated code, third-party libraries).
  • Build Phase: Code analyzed during compilation (template invokes build after prepare).
  • Publish Phase (SonarQubePublish@5): Results uploaded to SonarQube server (occurs after tests in pipeline).

Analysis Metrics:

  • Bugs: Logic errors likely to cause runtime failures (e.g., null reference dereference, infinite loops).
  • Vulnerabilities: Security issues (SQL injection, XSS, hardcoded credentials, weak crypto).
  • Code Smells: Maintainability issues (duplicated code, complex methods, cognitive complexity).
  • Coverage: Integration with Coverlet to correlate test coverage with code quality.
  • Duplications: Percentage of duplicated blocks (ATP policy: < 3% duplication).

Quality Gate:

  • Blocking Conditions:
    • Zero critical or high severity vulnerabilities.
    • Maintainability rating ≥ B (technical debt ratio < 10%).
    • Reliability rating ≥ B (bugs per line of code below threshold).
    • Security rating ≥ B (no critical security hotspots).
  • Pipeline Behavior: Quality gate failure blocks CI stage; developers notified via Azure DevOps and email.

Rationale: SonarQube provides continuous code quality monitoring beyond simple linting. Vulnerability detection (SAST) catches security issues pre-deployment. Technical debt tracking prevents gradual code degradation.

Deprecated Packages Detection

Purpose: Identify NuGet packages that are deprecated, unlisted, or contain known vulnerabilities.

Execution:

  • Template scans packages.lock.json (if using lock files) or .csproj package references.
  • Queries NuGet.org API and Azure Artifacts feeds for package metadata:
    • Deprecated flag: Package author marked as deprecated (usually with alternative recommendation).
    • Vulnerability advisories: CVEs reported in GitHub Advisory Database or NVD.
    • Unlisted status: Package removed from feed but still restorable (indicates potential issue).
  • Generates report with deprecated packages, alternative recommendations, and vulnerability severity.

Failure Conditions:

  • Critical vulnerabilities (CVSS ≥ 9.0): Pipeline fails immediately; package must be upgraded or removed.
  • High vulnerabilities (CVSS 7.0-8.9): Advisory-only in dev; blocking in staging/prod pipelines.
  • Deprecated packages: Warning (non-blocking) with timeline to migrate (30-day grace period).

Example Output (illustrative):

DEPRECATED: Newtonsoft.Json 12.0.3
  Reason: Package is deprecated as a legacy package.
  Alternative: System.Text.Json (built-in .NET library)
  Severity: WARNING

VULNERABLE: System.Text.Encodings.Web 4.5.0
  CVE: CVE-2021-26701 (CVSS 9.8)
  Description: .NET Core Remote Code Execution Vulnerability
  Fix: Upgrade to System.Text.Encodings.Web 4.5.1 or later
  Severity: CRITICAL - PIPELINE BLOCKED

Rationale: Supply chain security is critical for ATP (audit trail platform cannot have vulnerable dependencies). Deprecated package detection prevents technical debt accumulation. Early warnings enable proactive upgrades before packages become unsupported.

Docker Compose Cleanup

Purpose: Remove lingering containers, networks, and volumes from previous test runs to ensure clean pipeline state.

Execution:

  • Template runs docker-compose down --volumes --remove-orphans if docker-compose.yml exists in repository.
  • Also removes dangling images and networks: docker system prune --force.
  • Ensures service containers (Redis, SQL, RabbitMQ) from previous runs don't interfere with current build.

When Needed:

  • Self-hosted agents: Containers persist between pipeline runs (shared agent).
  • Microsoft-hosted agents: Less critical (ephemeral VMs) but still best practice.
  • Local development: Developers can replicate this cleanup step to reset test environment.

Rationale: Port conflicts and stale data in containers cause flaky test failures. Cleanup ensures every pipeline run starts from an identical state (idempotency) at negligible cost (under 5 seconds of execution time).

Failure Handling:

  • Cleanup failures (e.g., Docker daemon unavailable) logged as warnings, not errors.
  • Pipeline continues unless containers are actually needed for tests (detected in test stage).

Build Steps (ConnectSoft Template)

The build template compiles the .NET solution into release binaries with deterministic versioning and artifact preparation.

Template Invocation:

- template: build/build-microservice-steps.yaml@templates
  parameters:
    solution: $(solution)
    exactSolution: $(exactSolution)
    buildConfiguration: $(buildConfiguration)

Parameters:

  • solution: Glob pattern for solution file (same as lint template).
  • exactSolution: Explicit solution filename (same as lint template).
  • buildConfiguration: Build configuration (e.g., Release, Debug). ATP always uses Release for pipeline builds.

Build Actions:

dotnet restore with Azure Artifacts Feed Authentication

Purpose: Download NuGet package dependencies from public and private feeds before compilation.

Execution:

  • Feed Authentication:
    • Template invokes NuGetAuthenticate@1 task to inject credentials for Azure Artifacts feed.
    • Uses Azure DevOps service connection (configured in project settings).
    • Credentials stored in temporary NuGet.config (scoped to pipeline run).
  • Restore Command:
    • Runs dotnet restore $(solution) --locked-mode (if using packages.lock.json).
    • Locked mode ensures reproducible builds (fails if lock file doesn't match .csproj).
    • Downloads packages to global cache (~/.nuget/packages or %USERPROFILE%\.nuget\packages).
  • Private Packages:
    • ConnectSoft shared libraries (e.g., ConnectSoft.Core, ConnectSoft.Messaging, ConnectSoft.Observability).
    • ATP contracts and domain models (e.g., ConnectSoft.ATP.Contracts).
  • Public Packages:
    • Third-party dependencies (e.g., MassTransit, NHibernate, Serilog, OpenTelemetry).

Lock File Benefits:

  • Reproducibility: Same lock file = same packages = same build output (critical for audit trail).
  • Security: Lock file records exact package versions and hashes; detects tampering.
  • Performance: Restore skips dependency resolution if lock file valid (faster CI).

Failure Scenarios:

  • Feed authentication failure: Check service connection credentials; verify feed permissions.
  • Package not found: Verify package published to feed; check feed URL in NuGet.config.
  • Lock file mismatch: Regenerate lock file locally (dotnet restore --force-evaluate) and commit.

dotnet build (Release Configuration, Deterministic Builds)

Purpose: Compile C# source code into IL assemblies with optimizations and deterministic output.

Execution:

  • Build Command:
    • Runs dotnet build $(solution) --configuration Release --no-restore.
    • --no-restore flag skips restore (already done in previous step).
    • --configuration Release enables compiler optimizations (inlining, dead code elimination, JIT hints).
  • Deterministic Builds:
    • .csproj includes <Deterministic>true</Deterministic> and <ContinuousIntegrationBuild>true</ContinuousIntegrationBuild>.
    • Compiler embeds normalized paths (no local file system paths in binaries).
    • Result: Same source + same dependencies + same compiler = byte-for-byte identical binary.
  • Output Directory:
    • Binaries written to bin/Release/net8.0/ (or target framework version).
    • Includes main assembly, dependencies, appsettings.json, and native binaries (if any).
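The deterministic-build settings described above are typically enabled once for the whole repository. A minimal fragment might look like the following (the property names are standard MSBuild; the exact ATP `Directory.Build.props` layout is an assumption):

```xml
<!-- Directory.Build.props (illustrative) — applies to every project in the repo -->
<Project>
  <PropertyGroup>
    <Deterministic>true</Deterministic>
    <!-- TF_BUILD is set by Azure Pipelines; keeps local builds debuggable -->
    <ContinuousIntegrationBuild Condition="'$(TF_BUILD)' == 'true'">true</ContinuousIntegrationBuild>
    <!-- Normalize embedded source paths so binaries contain no local paths -->
    <PathMap>$(MSBuildThisFileDirectory)=/_/</PathMap>
  </PropertyGroup>
</Project>
```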

Release vs. Debug:

  • Release: Optimizations enabled, no debug symbols (PDB separate), smaller binaries, faster runtime.
  • Debug: No optimizations, symbols embedded, larger binaries, easier debugging (locals visible).
  • ATP pipelines always use Release for deployments; Debug reserved for local development.

Failure Scenarios:

  • Compilation errors: Syntax errors, type mismatches, missing references (developer fixes code).
  • Out-of-memory: Large solutions or complex generics; increase agent memory or split solution.
  • Deterministic build failure: local absolute paths embedded in source (use relative paths) or timestamps embedded at build time (disable timestamp embedding).

Version Stamping

Purpose: Embed build version into assembly metadata for traceability and audit trail.

Execution:

  • Version Format: $(majorMinorVersion).$(semanticVersion)
    • Example: 1.0.42 (major.minor.patch).
    • majorMinorVersion manually set in pipeline variables (e.g., 1.0).
    • semanticVersion auto-incremented counter (e.g., $[counter(variables['majorMinorVersion'], 0)]).
  • Assembly Attributes:
    • AssemblyVersion: 1.0.42.0 (4-part version for .NET compatibility).
    • AssemblyFileVersion: 1.0.42.0 (displayed in file properties).
    • AssemblyInformationalVersion: 1.0.42+sha.abc123 (includes commit SHA for full traceability).
  • Injection Method:
    • Template generates Version.props file at build time with <Version>$(Build.BuildNumber)</Version>.
    • All .csproj files import Version.props via <Import Project="Version.props" />.
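An illustrative Version.props as generated by the template might look like this (the exact file contents are an assumption based on the injection method described above):

```xml
<!-- Version.props (generated at build time; illustrative) -->
<Project>
  <PropertyGroup>
    <Version>1.0.42</Version>                       <!-- $(Build.BuildNumber) -->
    <AssemblyVersion>1.0.42.0</AssemblyVersion>
    <FileVersion>1.0.42.0</FileVersion>
    <InformationalVersion>1.0.42+sha.a1b2c3d4e5f6</InformationalVersion>
  </PropertyGroup>
</Project>
```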

Traceability Benefits:

  • Runtime → Pipeline: Service emits buildVersion in telemetry; correlate with pipeline run ID.
  • Binary → Source: AssemblyInformationalVersion includes commit SHA; link back to Git history.
  • Audit Trail: Compliance audits can trace production issue to specific build and code change.

Example Assembly Info:

AssemblyVersion: 1.0.42.0
AssemblyFileVersion: 1.0.42.0
AssemblyInformationalVersion: 1.0.42+sha.a1b2c3d4e5f6

Test Steps (ConnectSoft Template)

The test template executes unit and integration tests with service containers, collects code coverage, and enforces quality thresholds.

Template Invocation:

- template: test/test-microservice-steps.yaml@templates
  parameters:
    solution: $(solution)
    runSettingsFileName: $(runSettingsFileName)
    buildConfiguration: $(buildConfiguration)
    codeCoverageThreshold: 70

Parameters:

  • solution: Glob pattern for solution file (same as previous templates).
  • runSettingsFileName: Name of .runsettings file for test configuration (e.g., ConnectSoft.ATP.Ingestion.runsettings).
  • buildConfiguration: Build configuration (same as build template, typically Release).
  • codeCoverageThreshold: Minimum line coverage percentage (ATP default: 70%; varies by service).

Test Actions:

Service Containers

Purpose: Provide real infrastructure dependencies (Redis, SQL Server, MongoDB, RabbitMQ, etc.) for integration tests.

Execution:

  • Container Definitions:
    • Service pipelines define containers in resources.containers section (top of azure-pipelines.yml).
    • Template references these containers in job's services section.
  • Available Services:
    • Redis: redis:7-alpine on port 6379 (distributed caching tests).
    • SQL Server: mcr.microsoft.com/mssql/server:2022-latest on port 1433 (repository tests).
    • MongoDB: mongo:7 on port 27017 (document storage tests).
    • RabbitMQ: rabbitmq:3-management-alpine on ports 5672, 15672 (message bus tests).
    • Seq: datalust/seq:latest on port 5341 (structured logging tests).
    • OTEL Collector: otel/opentelemetry-collector:0.97.0 on ports 4317, 8888 (telemetry tests).
  • Connection Strings:
    • Tests use localhost endpoints (e.g., localhost:6379 for Redis).
    • .runsettings or test configuration file provides connection strings as environment variables.
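A condensed azure-pipelines.yml fragment wiring service containers into a test job might look like this (image tags mirror the list above; the job layout itself is an assumption):

```yaml
resources:
  containers:
    - container: redis
      image: redis:7-alpine
      ports: ['6379:6379']
    - container: rabbitmq
      image: rabbitmq:3-management-alpine
      ports: ['5672:5672', '15672:15672']

jobs:
  - job: Test
    services:            # containers started before the job, removed after it
      redis: redis
      rabbitmq: rabbitmq
    steps:
      - script: dotnet test    # tests reach the services via localhost ports
```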

Container Lifecycle:

  1. Azure Pipelines starts containers before job begins.
  2. Containers run in parallel (isolated network per pipeline run).
  3. Tests execute against running services.
  4. Containers automatically stopped and removed after job completes.

Benefits:

  • Real Integration Testing: Tests run against actual Redis/SQL/RabbitMQ, not mocks.
  • Production Parity: Container images match production infrastructure (same Redis version).
  • Isolation: Each pipeline run gets fresh containers (no data contamination).

Alternatives:

  • Testcontainers: Library that spins up containers programmatically in test code (more flexible but slower startup).
  • In-Memory Mocks: Faster but doesn't catch real-world integration issues (e.g., serialization, locking, timeouts).

Runsettings Configuration

Purpose: Configure test execution behavior (parallelization, timeout, data collectors, environment variables).

Execution:

  • Template passes .runsettings file to dotnet test via --settings flag.
  • Runsettings file (XML) defines:
    • Test Parallelization: MaxCpuCount=0 (use all available cores).
    • Timeout: TestSessionTimeout=600000 (10 minutes max per test assembly).
    • Data Collectors: Coverlet (code coverage), crash dumps, event logs.
    • Environment Variables: Connection strings, feature flags, log levels.

Example Runsettings (ConnectSoft.ATP.Ingestion.runsettings):

<?xml version="1.0" encoding="utf-8"?>
<RunSettings>
  <RunConfiguration>
    <MaxCpuCount>0</MaxCpuCount>
    <TestSessionTimeout>600000</TestSessionTimeout>
  </RunConfiguration>
  <DataCollectionRunSettings>
    <DataCollectors>
      <DataCollector friendlyName="XPlat Code Coverage">
        <Configuration>
          <Format>cobertura</Format>
          <Exclude>[*Tests]*,[*Migrations]*</Exclude>
        </Configuration>
      </DataCollector>
    </DataCollectors>
  </DataCollectionRunSettings>
  <TestRunParameters>
    <Parameter name="RedisConnectionString" value="localhost:6379" />
    <Parameter name="SqlConnectionString" value="Server=localhost;Database=ATPTest;User=sa;Password=P@ssw0rd123!" />
    <Parameter name="RabbitMqConnectionString" value="amqp://localhost:5672" />
  </TestRunParameters>
</RunSettings>

Parallelization Benefits:

  • Faster Execution: Tests run concurrently across CPU cores (an 8-core agent can approach an 8x speedup for fully independent tests).
  • Resource Utilization: Maximizes agent compute usage (pay for performance).
  • Flaky Test Detection: Parallel execution exposes race conditions and shared state issues.

Timeout Purpose:

  • Prevents hung tests from blocking pipeline indefinitely.
  • 10-minute timeout per assembly (if exceeded, assembly marked as failed).
  • Helps identify slow tests needing optimization or categorization as [Slow].

Coverage: Generate Cobertura Reports

Purpose: Measure code coverage (lines and branches exercised by tests) and generate reports for trend analysis.

Execution:

  • Data Collection:
    • Coverlet collector instruments assemblies during test execution.
    • Tracks which lines and branches are hit by test runner.
    • Generates coverage.cobertura.xml report (industry-standard format).
  • Coverage Publishing:
    • Template invokes PublishCodeCoverageResults@1 task to upload report to Azure DevOps.
    • Coverage metrics displayed in pipeline summary and Test Plans.
    • Historical trends tracked (coverage increasing/decreasing over time).
  • Threshold Enforcement:
    • Template compares coverage percentage to codeCoverageThreshold parameter.
    • If coverage < threshold, pipeline fails with clear error message.
    • Exclusions: Generated code (*.Designer.cs, Migrations/*.cs), test projects.
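The threshold check above can be sketched as follows — a hypothetical reimplementation for illustration only (the real template performs this comparison inside the pipeline task, and its message text may differ):

```python
# Hypothetical sketch of the template's coverage gate: read the Cobertura
# report's line-rate attribute and compare it against the threshold parameter.
import xml.etree.ElementTree as ET


def check_coverage(cobertura_xml: str, threshold: float) -> tuple[bool, str]:
    root = ET.fromstring(cobertura_xml)
    line_rate = float(root.attrib["line-rate"]) * 100  # Cobertura stores a 0..1 ratio
    if line_rate < threshold:
        return False, (f"Code coverage {line_rate:.0f}% is below required "
                       f"threshold {threshold:.0f}%.")
    return True, f"Coverage {line_rate:.0f}% meets threshold {threshold:.0f}%."


ok, message = check_coverage('<coverage line-rate="0.68" branch-rate="0.61"/>', 70)
# ok is False: 68% line coverage fails a 70% gate
```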

Coverage Metrics:

  • Line Coverage: Percentage of executable lines executed by tests (ATP target: ≥70%).
  • Branch Coverage: Percentage of conditional branches taken by tests (ATP target: ≥60%).
  • Method Coverage: Percentage of methods called by tests (informational, not enforced).

Enforce ≥70% Threshold:

  • Policy Rationale: 70% balances safety and pragmatism (diminishing returns above 80%).
  • Service-Specific Overrides: Critical services (Integrity, Query) have higher thresholds (75-85%).
  • Exemptions: Infrastructure code (middleware, filters) and trivial properties may be excluded with justification.

Failure Scenario:

  • If coverage drops below threshold, pipeline fails with message:
    Code coverage 68% is below required threshold 70%.
    Add tests to increase coverage or request exception from Platform team.
    

Test Results: Publish to Azure DevOps Test Plans

Purpose: Publish test results (pass/fail/skip) to Azure DevOps for reporting, trend analysis, and flaky test detection.

Execution:

  • Test Result Files:
    • dotnet test generates .trx files (Visual Studio Test Results format).
    • Template publishes .trx files using PublishTestResults@2 task.
  • Test Plans Integration:
    • Results appear in Azure DevOps Test Plans section.
    • Test Analytics: Pass rate, duration trends, flaky test detection.
    • Test Explorer: Interactive UI to view failed tests, stack traces, logs.
  • Trend Analysis:
    • Pass Rate Trends: Alert if pass rate drops below 95%.
    • Duration Trends: Alert if test duration increases > 20% (performance regression).
    • Flaky Test Detection: Tests failing intermittently flagged for investigation.

Flaky Test Handling:

  • ATP policy: Zero tolerance for flaky tests.
  • Flaky tests disabled with [Ignore] attribute and tracked in backlog for fixing.
  • Pipeline retries not allowed (masking flakiness leads to production issues).

Benefits:

  • Visibility: Stakeholders see test health without reading pipeline logs.
  • Debugging: Failed test logs and screenshots (for UI tests) attached to test results.
  • Compliance: Test execution records serve as evidence for SOC 2 attestations (automated testing control).

Security & Compliance Scanning

Security scanning is embedded throughout the CI pipeline to detect vulnerabilities, secrets, and compliance violations before artifacts are published. ATP's multi-layered security approach combines SAST (static code analysis), dependency scanning (supply chain vulnerabilities), secrets detection (credential leaks), and SBOM generation (transparency and compliance).

This defense-in-depth strategy ensures that security issues are caught early (shift-left security), compliance artifacts are automatically generated, and audit trails are maintained for regulatory requirements. Every build produces security scan reports that serve as evidence for SOC 2, ISO 27001, and HIPAA attestations.

SAST (Static Application Security Testing)

SAST analyzes source code for security vulnerabilities, code quality issues, and maintainability problems without executing the program. ATP uses SonarQube as its primary SAST tool, providing comprehensive coverage of OWASP Top 10 vulnerabilities and secure coding best practices.

SonarQube Integration

Platform: SonarCloud (SaaS) or self-hosted SonarQube Enterprise instance.

Integration Points:

  1. Prepare Phase (before build):
    • Azure Pipelines task SonarQubePrepare@5 initializes analysis context.
    • Configures project key, organization, and quality gate profile.
  2. Analysis Phase (during build):
    • Roslyn analyzers instrument code during compilation.
    • Metrics collected: lines of code, complexity, duplications, code smells.
  3. Publish Phase (after tests):
    • Azure Pipelines task SonarQubeAnalyze@5 runs the scanner's end step and uploads results to the SonarQube server; SonarQubePublish@5 then surfaces the quality gate result in the pipeline summary.
    • Quality gate evaluation performed server-side.
  4. Quality Gate Check:
    • Pipeline polls SonarQube API for quality gate status.
    • If quality gate fails, pipeline blocked; developers notified.

Configuration Example:

- task: SonarQubePrepare@5
  displayName: 'Prepare SonarQube Analysis'
  inputs:
    SonarQube: 'ConnectSoft-SonarCloud'  # Service connection
    scannerMode: 'MSBuild'
    projectKey: 'connectsoft_atp-ingestion'
    projectName: 'ATP Ingestion Service'
    projectVersion: '$(Build.BuildNumber)'
    extraProperties: |
      sonar.cs.opencover.reportsPaths=$(Agent.TempDirectory)/**/coverage.opencover.xml
      sonar.exclusions=**/Migrations/**,**/obj/**,**/bin/**
      sonar.coverage.exclusions=**/*Tests*.cs

# Build and test steps here (Roslyn analyzers run during compilation)...

- task: SonarQubeAnalyze@5
  displayName: 'Run SonarQube Analysis'

- task: SonarQubePublish@5
  displayName: 'Publish SonarQube Results'
  inputs:
    pollingTimeoutSec: '300'

- task: SonarQubeGate@1
  displayName: 'Check SonarQube Quality Gate'
  inputs:
    SonarQube: 'ConnectSoft-SonarCloud'

Quality Gates

Quality Gate Definition: A set of conditions that must be satisfied for code to pass security and quality standards.

ATP Quality Gate (ConnectSoft-Default):

| Metric               | Condition                  | Rationale                                                     |
|----------------------|----------------------------|---------------------------------------------------------------|
| Vulnerabilities      | = 0 critical or high       | No security vulnerabilities allowed in production code        |
| Security Hotspots    | ≥ 80% reviewed             | Security-sensitive code must be manually reviewed             |
| Bugs                 | = 0 critical or high       | Critical bugs indicate logic errors that could cause failures |
| Code Smells          | Maintainability Rating ≥ B | Technical debt ratio < 10%                                    |
| Duplications         | < 3%                       | Avoid copy-paste code; promote reusability                    |
| Coverage on New Code | ≥ 80%                      | New code must be well-tested                                  |

Blocking Behavior:

  • Quality Gate Fails: Pipeline blocked at CI stage; no artifacts published.
  • Developer Notification: Email + Azure DevOps notification with detailed issue report.
  • Remediation: Developer fixes issues, commits changes, pipeline re-runs.
  • Exception Process: Security team can grant temporary exemption with justification and remediation plan.

SonarQube vs. Local Analyzers:

  • Local Analyzers (StyleCop, Roslyn): Fast feedback during development (seconds).
  • SonarQube: Comprehensive analysis (minutes); server-side quality gate enforcement; historical trends.

Security Rules Coverage

SonarQube detects a wide range of security vulnerabilities aligned with OWASP Top 10 and CWE (Common Weakness Enumeration):

Injection Vulnerabilities:

  • SQL Injection: Unsanitized user input in SQL queries (use parameterized queries or ORM).
  • Command Injection: User input passed to shell commands (validate and sanitize).
  • LDAP Injection: Unsanitized input in LDAP queries (use LDAP encoding).

Example Detection:

// BAD: SQL Injection vulnerability
var query = $"SELECT * FROM Users WHERE Username = '{username}'";

// GOOD: Parameterized query
var query = "SELECT * FROM Users WHERE Username = @username";

Cross-Site Scripting (XSS):

  • Reflected XSS: User input echoed in response without encoding.
  • Stored XSS: User input persisted and rendered without sanitization.
  • DOM-based XSS: Client-side JavaScript manipulates DOM with untrusted data.

Example Detection:

// BAD: XSS vulnerability
ViewBag.Message = userInput;  // Rendered as-is in Razor

// GOOD: Encoded output
ViewBag.Message = Html.Encode(userInput);

Secrets in Code:

  • Hardcoded Passwords: Credentials embedded in source code.
  • API Keys: Cloud provider or third-party service keys in code.
  • Private Keys: RSA/SSH keys committed to repository.

Example Detection:

// BAD: Hardcoded secret
var apiKey = "sk_live_51H1rT2eSeHu1rT2e";

// GOOD: Load from configuration
var apiKey = configuration["ApiKeys:Stripe"];

Insecure Cryptography:

  • Weak Algorithms: MD5, SHA1, DES (use SHA256, AES-256).
  • Hardcoded Keys: Encryption keys in source code (use Key Vault).
  • Predictable Random Numbers: Random for crypto (use RandomNumberGenerator).

Example Detection:

// BAD: Weak hash algorithm
var hash = MD5.Create().ComputeHash(data);

// GOOD: Strong hash algorithm
var hash = SHA256.Create().ComputeHash(data);

Insecure Deserialization:

  • Untrusted Data: Deserializing user input without validation.
  • Type Confusion: Deserializers allowing arbitrary type instantiation.

Example Detection:

// BAD: Insecure deserialization
var obj = JsonConvert.DeserializeObject(userInput);

// GOOD: Deserialize to specific type with validation
var obj = JsonConvert.DeserializeObject<SafeDto>(userInput);

Security Hotspots (Require Manual Review):

  • File system access (path traversal risks).
  • Network communication (unencrypted channels).
  • Regular expressions (ReDoS vulnerabilities).
  • Authentication/authorization logic (bypass risks).

Dependency Scanning

Dependency scanning identifies known vulnerabilities (CVEs) in third-party NuGet packages, addressing supply chain security risks. ATP uses multiple tools for comprehensive coverage.

OWASP Dependency-Check

Purpose: Scan project dependencies against National Vulnerability Database (NVD) for known CVEs.

Execution:

  • Tool: OWASP Dependency-Check CLI or Azure DevOps extension.
  • Scan Scope: All NuGet packages in packages.lock.json or .csproj.
  • Data Source: NVD, OSS Index, GitHub Advisory Database.
  • Report Format: HTML, JSON, XML with CVE IDs, CVSS scores, remediation advice.

Integration Example:

- task: dependency-check-build-task@6
  displayName: 'OWASP Dependency Check'
  inputs:
    projectName: '$(Build.Repository.Name)'
    scanPath: '$(Build.SourcesDirectory)'
    format: 'HTML,JSON'
    failOnCVSS: 9  # Fail if critical vulnerability (CVSS ≥ 9.0)
    suppressionPath: 'dependency-check-suppressions.xml'

- task: PublishPipelineArtifact@1
  displayName: 'Publish Dependency Check Report'
  inputs:
    targetPath: '$(Build.SourcesDirectory)/dependency-check-report.html'
    artifact: 'dependency-check-report'

Vulnerability Severity Classification:

| CVSS Score | Severity | ATP Policy                                                 |
|------------|----------|------------------------------------------------------------|
| 9.0 - 10.0 | Critical | Block pipeline; must upgrade or remove package immediately |
| 7.0 - 8.9  | High     | Block pipeline in staging/prod; advisory in dev            |
| 4.0 - 6.9  | Medium   | Advisory (logged for review); fix within 30 days           |
| 0.1 - 3.9  | Low      | Informational; fix in next maintenance cycle               |

Suppression Mechanism:

  • False Positives: Some CVEs don't apply to ATP's usage (e.g., vulnerability in unused package feature).
  • Suppressions File: dependency-check-suppressions.xml lists CVEs with justification.
  • Review Cadence: Security team reviews suppressions quarterly; expired suppressions auto-fail.

Example Suppression:

<suppressions>
  <suppress>
    <notes>CVE-2021-12345: Vulnerability in JSON parsing, but ATP uses custom parser</notes>
    <cve>CVE-2021-12345</cve>
    <until>2025-12-31</until>
  </suppress>
</suppressions>

Whitesource Bolt / Snyk

Purpose: Continuous vulnerability monitoring with real-time alerts and automated pull requests for patches.

Whitesource Bolt (Azure DevOps Marketplace):

  • Real-Time Scanning: Scans on every build; checks for new CVEs daily.
  • Policy Enforcement: Block builds with critical/high vulnerabilities.
  • License Compliance: Detect incompatible open-source licenses (GPL, AGPL in proprietary code).
  • Automated Remediation: Suggests package upgrades to fix vulnerabilities.

Snyk (Alternative):

  • Integration: Snyk Azure Pipelines extension or CLI.
  • Features: CVE detection, license scanning, Docker image scanning, Kubernetes manifest scanning.
  • Developer Experience: IDE plugins provide feedback during coding.

Integration Example (Snyk):

- task: SnykSecurityScan@1
  displayName: 'Snyk Vulnerability Scan'
  inputs:
    serviceConnectionEndpoint: 'Snyk-ServiceConnection'
    testType: 'app'
    severityThreshold: 'high'
    monitorWhen: 'always'
    failOnIssues: true
    projectName: '$(Build.Repository.Name)'
    organization: 'connectsoft'

Benefits Over OWASP Dependency-Check:

  • Faster Updates: Snyk/Whitesource databases updated more frequently than NVD (hours vs. days).
  • Remediation Guidance: Specific package version recommendations.
  • License Compliance: Detect GPL/AGPL violations automatically.

Policy: Block Critical Vulnerabilities

ATP Policy:

  • Critical vulnerabilities (CVSS ≥ 9.0): Pipeline blocked; emergency fix required within 24 hours.
  • High vulnerabilities (CVSS 7.0-8.9): Pipeline blocked in staging/prod; dev builds advisory-only.
  • Remediation Options:
    1. Upgrade Package: Update to patched version.
    2. Remove Package: If no patch available, remove dependency or find alternative.
    3. Request Exception: Security team evaluates risk; grants temporary exemption with mitigation plan.

Remediation SLA:

  • Critical: 24 hours (hotfix deployment).
  • High: 7 days (next regular release).
  • Medium: 30 days (quarterly maintenance).
  • Low: 90 days (best-effort).
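The policy table and SLAs above can be expressed as a small decision function. This is an illustrative encoding only — ATP's actual enforcement lives in the scanning tasks, not in application code:

```python
# Illustrative-only encoding of the CVSS policy table above.
def classify(cvss: float, environment: str = "prod") -> tuple[str, bool]:
    """Return (severity, blocks_pipeline) for a CVSS base score."""
    if cvss >= 9.0:
        return "Critical", True                  # blocked everywhere; 24h SLA
    if cvss >= 7.0:
        return "High", environment != "dev"      # advisory-only in dev; 7-day SLA
    if cvss >= 4.0:
        return "Medium", False                   # fix within 30 days
    return "Low", False                          # next maintenance cycle
```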

Secrets Detection

Secrets detection prevents accidental exposure of credentials (API keys, passwords, tokens, private keys) in source code, configuration files, or commit history.

GitHub Advanced Security (GHAS) / GitGuardian

Tool Options:

  • GitHub Advanced Security: Built-in for GitHub repositories; scans commits, pull requests, and pushes.
  • GitGuardian: Third-party SaaS; supports Azure Repos, GitHub, GitLab, Bitbucket.
  • Azure DevOps Credential Scanner: Microsoft's tool for Azure DevOps pipelines.

Detection Patterns:

  • Cloud Provider Keys: AWS Access Keys, Azure Storage Keys, GCP Service Account Keys.
  • API Tokens: Stripe, Twilio, SendGrid, Slack, GitHub Personal Access Tokens.
  • Database Credentials: Connection strings with plaintext passwords.
  • SSH/TLS Keys: Private keys (RSA, ECDSA, Ed25519).
  • JWT Tokens: Hardcoded authentication tokens.

Integration Example (GitGuardian):

- task: CmdLine@2
  displayName: 'GitGuardian Secret Scan'
  inputs:
    script: |
      docker run -e GITGUARDIAN_API_KEY -v $(Build.SourcesDirectory):/scan gitguardian/ggshield:latest \
        ggshield secret scan path /scan \
        --recursive \
        --show-secrets
  env:
    GITGUARDIAN_API_KEY: $(GitGuardianApiKey)

Azure DevOps Credential Scanner Example:

- task: CredScan@3
  displayName: 'Credential Scanner'
  inputs:
    toolMajorVersion: 'V2'
    scanFolder: '$(Build.SourcesDirectory)'
    suppressionsFile: 'CredScanSuppressions.json'

Fail Build if Secrets Detected

Pipeline Behavior:

  • Secrets Found: Pipeline fails immediately; no artifacts published.
  • Alert Notification: Security team and developer notified via email + PagerDuty.
  • Incident Response:
    1. Rotate Secret: Immediately invalidate exposed credential; generate new one.
    2. Remove from History: Use git filter-repo or BFG Repo-Cleaner to purge secret from Git history.
    3. Audit Access: Check logs for unauthorized access using exposed credential.
    4. Post-Incident Review: Document lessons learned; improve secret management practices.

Secret Rotation Process:

  1. Generate New Secret: Create new API key/password in service provider console.
  2. Update Key Vault: Store new secret in Azure Key Vault with rotation timestamp.
  3. Update References: Change pipeline variables and app configuration to reference new Key Vault secret.
  4. Revoke Old Secret: Delete old secret from service provider; confirm no errors in production.
  5. Document Incident: Log in security incident register for audit trail.

Prevention Strategies:

  • Pre-Commit Hooks: Run secret scanners locally (e.g., git-secrets, detect-secrets) before push.
  • Developer Training: Educate on secret management best practices (never commit secrets).
  • Key Vault Integration: All secrets loaded from Azure Key Vault at runtime; no secrets in code/config.
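To make the detection-pattern idea concrete, here is a minimal illustration of the kind of regex matching secret scanners perform. Real tools (ggshield, detect-secrets, CredScan) use far richer detectors plus entropy analysis; the patterns below are simplified examples, not production rules:

```python
# Minimal illustration of regex-based secret detection (simplified patterns).
import re

SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                          # AWS access key ID
    re.compile(r"sk_live_[0-9a-zA-Z]{24}"),                   # Stripe-style live key
    re.compile(r"-----BEGIN (RSA|EC|OPENSSH) PRIVATE KEY-----"),  # private key header
]


def find_secrets(text: str) -> list[str]:
    """Return every pattern match found in the given text."""
    return [m.group(0) for p in SECRET_PATTERNS for m in p.finditer(text)]
```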

Require Secret Rotation

ATP Policy:

  • Exposed Secret: Must be rotated within 1 hour (critical incident).
  • Regular Rotation: All production secrets rotated every 90 days (proactive security).
  • Rotation Audit: Security team verifies rotation logs monthly; alerts on missed rotations.

Automated Rotation (Azure Key Vault):

  • Managed Identities: Services use Azure Managed Identity (no secrets needed).
  • Key Vault Autorotation: Azure Key Vault automatically rotates secrets (SQL passwords, storage keys).
  • Event Grid Integration: Rotation events trigger webhook to update services.

SBOM Generation

SBOM (Software Bill of Materials) is a comprehensive inventory of all software components (libraries, frameworks, tools) used in ATP services. SBOMs enable vulnerability tracking, license compliance, and supply chain transparency.

CycloneDX Format

Standard: CycloneDX is an OWASP project providing machine-readable SBOM format (JSON, XML).

SBOM Contents:

  • Components: All NuGet packages with name, version, publisher, license, hashes (SHA1, SHA256).
  • Dependencies: Dependency graph showing relationships between packages.
  • Vulnerabilities: Known CVEs associated with each component (from NVD).
  • Metadata: Build timestamp, pipeline run ID, commit SHA, build environment.

Generation Tool:

  • CycloneDX CLI: dotnet CycloneDX (NuGet global tool).
  • SBOM Tool: Microsoft sbom-tool (generates SPDX documents).

Integration Example:

- task: CmdLine@2
  displayName: 'Generate SBOM (CycloneDX)'
  inputs:
    script: |
      dotnet tool install --global CycloneDX
      dotnet CycloneDX $(Build.SourcesDirectory) -o $(Build.ArtifactStagingDirectory) --json -sv $(Build.BuildNumber)

- task: PublishPipelineArtifact@1
  displayName: 'Publish SBOM Artifact'
  inputs:
    targetPath: '$(Build.ArtifactStagingDirectory)/bom.json'
    artifact: 'sbom'
    publishLocation: 'pipeline'

Alternative: Microsoft SBOM Tool:

- task: Bash@3
  displayName: 'Generate SBOM (SPDX)'
  inputs:
    targetType: 'inline'
    script: |
      curl -Lo sbom-tool https://github.com/microsoft/sbom-tool/releases/latest/download/sbom-tool-linux-x64
      chmod +x sbom-tool
      ./sbom-tool generate -b $(Build.SourcesDirectory) -bc $(Build.SourcesDirectory) -pn ATP-Ingestion -pv $(Build.BuildNumber) -ps ConnectSoft -nsb https://connectsoft.com

Publish SBOM as Build Artifact

Artifact Publishing:

  • SBOM File: bom.json or bom.xml published as pipeline artifact.
  • Artifact Name: sbom-{serviceName}-{buildNumber} (e.g., sbom-ingestion-1.0.42).
  • Retention: 7 years (regulatory requirement for audit evidence).

SBOM Storage:

  • Azure Artifacts: SBOMs stored in dedicated feed for compliance team access.
  • Azure Blob Storage: Long-term retention with WORM policy (Write Once, Read Many).
  • SBOM Database: Indexed in compliance database for vulnerability tracking.

SBOM Use Cases:

  1. Vulnerability Tracking:

    • When new CVE announced, query SBOM database to identify affected services.
    • Example: Log4Shell vulnerability → SBOM query finds no services use Log4j → no action needed.
  2. License Compliance:

    • Audit SBOMs to ensure no GPL/AGPL packages in proprietary code.
    • Report to legal team for license compatibility review.
  3. Incident Investigation:

    • Production issue traced to specific library version.
    • SBOM correlates library version with build number and commit SHA.
  4. Regulatory Audits:

    • Auditors request proof of software components and versions.
    • SBOM provides immutable, timestamped evidence of what was built and deployed.
  5. Supply Chain Transparency:

    • Customers request SBOMs for their own compliance (e.g., government contracts).
    • ATP publishes redacted SBOMs (remove internal details) to customer portal.

SBOM Verification:

  • Hash Verification: SBOM includes SHA256 hashes of each component; verify integrity during deployment.
  • Signature: SBOM signed with ConnectSoft code-signing certificate (prevents tampering).
  • Provenance: SBOM includes pipeline run ID and commit SHA (full traceability).
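The hash verification described above amounts to recomputing each component's digest and comparing it with the SBOM record. A sketch of this deploy-time check (the surrounding workflow is an assumption):

```python
# Sketch of SBOM hash verification: recompute a component's SHA-256 and
# compare it to the hash recorded in the SBOM entry.
import hashlib


def verify_component(content: bytes, recorded_sha256: str) -> bool:
    """Return True only if the content matches the recorded SHA-256 digest."""
    actual = hashlib.sha256(content).hexdigest()
    return actual == recorded_sha256.lower()
```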

See Also: Detailed SBOM schemas, retention policies, and compliance artifact procedures in platform/security-compliance.md.


Artifact Publishing & Versioning

Artifact publishing is the final step in the CI stage, where validated and scanned code is packaged into immutable, versioned artifacts ready for deployment. ATP produces multiple artifact types — NuGet packages (shared libraries), Docker images (containerized services), binaries (compiled code), and compliance artifacts (SBOM, test results, security reports) — each with consistent versioning and provenance tracking.

The versioning strategy ensures traceability (every artifact correlates to a specific commit and pipeline run), immutability (artifacts never overwritten), and auditability (full chain of custody from source to production). Artifacts are published to centralized repositories (Azure Artifacts, Azure Container Registry) with retention policies aligned with regulatory requirements.

Semantic Versioning

ATP follows semantic versioning (SemVer) with a major.minor.patch format, where the patch number is auto-incremented per build and major/minor versions are manually controlled for release management.

Version Schema

Format: <major>.<minor>.<patch>[-<pre-release>][+<build-metadata>]

Components:

  • Major: Breaking changes, major new features, architectural shifts (manually incremented).
  • Minor: Backward-compatible features, enhancements (manually incremented).
  • Patch: Bug fixes, security patches, non-breaking improvements (auto-incremented).
  • Pre-release: Alpha, beta, release candidate tags (e.g., -alpha.1, -beta.3, -rc.1).
  • Build Metadata: Commit SHA, build timestamp (e.g., +sha.a1b2c3d4).

Examples:

  • 1.0.0: First production release.
  • 1.0.42: Patch release (42nd build of version 1.0.x).
  • 1.1.0-beta.2: Second beta of version 1.1.0.
  • 2.0.0: Major version with breaking changes.
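The format above can be parsed with a regular expression. This sketch follows the grammar shown here, which is simplified relative to the full SemVer 2.0.0 specification:

```python
# Sketch: parse the version format defined above (simplified SemVer grammar).
import re

SEMVER = re.compile(
    r"^(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)"
    r"(?:-(?P<prerelease>[0-9A-Za-z.-]+))?"   # e.g. beta.2, rc.1
    r"(?:\+(?P<build>[0-9A-Za-z.-]+))?$"      # e.g. sha.a1b2c3d4
)


def parse(version: str) -> dict:
    """Split a version string into its named components."""
    m = SEMVER.match(version)
    if not m:
        raise ValueError(f"not a semantic version: {version}")
    return m.groupdict()
```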

Pipeline Configuration

Variable Definitions:

variables:
  majorMinorVersion: 1.0
  semanticVersion: $[counter(variables['majorMinorVersion'], 0)]
  buildNumber: $(majorMinorVersion).$(semanticVersion)

How It Works:

  1. majorMinorVersion: Manually set in pipeline variables (e.g., 1.0, 1.1, 2.0).
  2. semanticVersion: Azure DevOps counter() function auto-increments for each build.
    • Counter scoped to majorMinorVersion (resets when major/minor changes).
    • Example: majorMinorVersion: 1.0 → builds get 1.0.0, 1.0.1, 1.0.2, etc.
  3. buildNumber: Composed version used as $(Build.BuildNumber) throughout pipeline.

Counter Behavior:

  • First Build: majorMinorVersion: 1.0 → semanticVersion: 0 → buildNumber: 1.0.0.
  • Second Build: majorMinorVersion: 1.0 → semanticVersion: 1 → buildNumber: 1.0.1.
  • Version Bump: Change to majorMinorVersion: 1.1 → semanticVersion: 0 (counter resets) → buildNumber: 1.1.0.
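Before any artifacts are stamped, a guard step can assert that the composed build number is well-formed SemVer. A minimal sketch, not part of the shipped templates; the inline value stands in for $(Build.BuildNumber):

```shell
# Hypothetical guard step: verify the composed build number is valid
# major.minor.patch SemVer (with optional pre-release tag) before packaging.
BUILD_NUMBER="1.0.42"   # in the pipeline this would be $(Build.BuildNumber)
if echo "$BUILD_NUMBER" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+(-[0-9A-Za-z.]+)?$'; then
  RESULT="valid"
else
  RESULT="invalid"   # a real guard would exit 1 here to fail the pipeline
fi
echo "$RESULT"
```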

Setting Build Number:

name: $(majorMinorVersion).$(semanticVersion)

This sets the pipeline run name to the version number, making it visible in Azure DevOps build history.

Manual Version Bumps

When to Bump Major Version:

  • Breaking API changes (remove endpoints, change contracts).
  • Database schema changes requiring migrations.
  • Major architectural shifts (e.g., migrate from monolith to microservices).

When to Bump Minor Version:

  • New features (backward-compatible API additions).
  • Significant enhancements (performance improvements, new query capabilities).
  • Quarterly releases with accumulated changes.

How to Bump:

  1. Update Pipeline Variable:
    • Navigate to Azure DevOps → Pipelines → Edit → Variables.
    • Change majorMinorVersion from 1.0 to 1.1 (or 2.0 for major bump).
  2. Commit Tag (optional but recommended):
    • Tag commit with version: git tag v1.1.0 && git push origin v1.1.0.
    • Provides Git-level version history.
  3. Release Notes: Update CHANGELOG.md with version changes and migration notes.

Pre-release Tags

Purpose: Identify non-production builds (alpha, beta, release candidate) to prevent accidental deployment.

Implementation:

variables:
  majorMinorVersion: 1.1
  semanticVersion: $[counter(variables['majorMinorVersion'], 0)]
  preReleaseTag: '-beta'  # Include the leading hyphen; set to '' for production builds
  buildNumber: $(majorMinorVersion).$(semanticVersion)$(preReleaseTag)

Tag Semantics:

  • -alpha: Early development; unstable; frequent breaking changes.
  • -beta: Feature-complete; undergoing testing; may have bugs.
  • -rc (release candidate): Production-ready; final validation before release.

Deployment Rules:

  • Alpha/beta builds never deployed to production.
  • Release candidates deployed to staging for validation; promoted to production if stable.
  • Pre-release tags visible in artifact metadata and Azure DevOps dashboards.

Example Build Numbers:

  • 1.1.0-alpha.1: First alpha build of version 1.1.0.
  • 1.1.0-beta.5: Fifth beta build of version 1.1.0.
  • 1.1.0-rc.2: Second release candidate of version 1.1.0.
  • 1.1.0: Production release (no pre-release tag).
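The "alpha/beta never deployed to production" rule above can be enforced mechanically by a gate that inspects the pre-release tag. A sketch, with the build number hard-coded for illustration:

```shell
# Hypothetical production gate: classify a build by its pre-release tag.
BUILD_NUMBER="1.1.0-beta.5"   # in the pipeline this would be $(Build.BuildNumber)
case "$BUILD_NUMBER" in
  *-alpha*|*-beta*) GATE="blocked" ;;           # never reaches production
  *-rc*)            GATE="staging-only" ;;      # validate in staging first
  *)                GATE="production-eligible" ;;
esac
echo "$GATE"   # a real gate would exit non-zero when blocked
```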

Version Stamping in Code

Assembly Attributes:

Pipelines inject version into .NET assembly metadata during build:

- script: |
    echo "<Project><PropertyGroup><Version>$(Build.BuildNumber)</Version></PropertyGroup></Project>" > Version.props
  displayName: 'Generate Version.props'

.csproj Import:

<Project Sdk="Microsoft.NET.Sdk">
  <Import Project="$(MSBuildThisFileDirectory)../../Version.props" />
  <!-- ... project configuration ... -->
</Project>

Result:

  • AssemblyVersion: 1.0.42.0 (4-part version for .NET compatibility).
  • AssemblyFileVersion: 1.0.42.0 (Windows file properties).
  • AssemblyInformationalVersion: 1.0.42+sha.a1b2c3d4 (includes commit SHA).

Runtime Access:

var version = Assembly.GetExecutingAssembly()
    .GetCustomAttribute<AssemblyInformationalVersionAttribute>()
    .InformationalVersion;
// Output: "1.0.42+sha.a1b2c3d4e5f6"

Services emit version in telemetry, logs, and /health endpoint for traceability.

NuGet Packages

ATP publishes shared libraries and domain contracts as NuGet packages for consumption by other services. Packages are versioned consistently with the pipeline build number and published to Azure Artifacts.

Package Creation (dotnet pack)

Purpose: Package compiled binaries and metadata into .nupkg files for distribution.

Execution:

- task: DotNetCoreCLI@2
  displayName: 'Pack NuGet Packages'
  inputs:
    command: 'pack'
    packagesToPack: '**/ConnectSoft.ATP.Contracts.csproj'
    configuration: $(buildConfiguration)
    nobuild: true  # Use existing binaries from build step
    versioningScheme: 'byEnvVar'
    versionEnvVar: 'BUILD_BUILDNUMBER'  # environment-variable form of Build.BuildNumber
    includeSymbols: true
    includeSource: false

Parameters:

  • packagesToPack: Glob pattern for .csproj files to pack (e.g., contracts, shared libraries).
  • nobuild: true: Reuse binaries from build step (avoid rebuilding).
  • versioningScheme: byEnvVar: Use pipeline variable for version.
  • versionEnvVar: Names the environment variable that supplies the version; the pipeline variable Build.BuildNumber is exposed to tasks as BUILD_BUILDNUMBER.
  • includeSymbols: true: Generate symbol package (.snupkg) for debugging.
  • includeSource: false: Exclude source code (security best practice).

Output:

  • Package: ConnectSoft.ATP.Contracts.1.0.42.nupkg (compiled DLLs + metadata).
  • Symbol Package: ConnectSoft.ATP.Contracts.1.0.42.snupkg (PDB files for debugging).

Package Metadata (from .csproj):

<PropertyGroup>
  <PackageId>ConnectSoft.ATP.Contracts</PackageId>
  <Authors>ConnectSoft Platform Team</Authors>
  <Company>ConnectSoft</Company>
  <Description>Shared contracts and DTOs for ATP services</Description>
  <PackageLicenseExpression>Proprietary</PackageLicenseExpression>
  <RepositoryUrl>https://dev.azure.com/dmitrykhaymov/ATP/_git/Contracts</RepositoryUrl>
  <PackageTags>atp;audit;contracts</PackageTags>
</PropertyGroup>

Push to Azure Artifacts Feed

Purpose: Publish NuGet packages to centralized feed for consumption by other services and projects.

Execution:

- task: NuGetCommand@2
  displayName: 'Push NuGet Packages'
  inputs:
    command: 'push'
    packagesToPush: '$(Build.ArtifactStagingDirectory)/**/*.nupkg'
    nuGetFeedType: 'internal'
    publishVstsFeed: 'e4c108b4-7989-4d22-93d6-391b77a39552/1889adca-ccb6-4ece-aa22-cad1ae4a35f3'
    allowPackageConflicts: false

Parameters:

  • packagesToPush: Glob pattern for .nupkg files to publish.
  • nuGetFeedType: internal: Publish to Azure Artifacts (not NuGet.org).
  • publishVstsFeed: Azure Artifacts feed GUID (format: {project-id}/{feed-id} for project-scoped feeds).
  • allowPackageConflicts: false: Fail if package version already exists (enforces immutability).

Feed Configuration:

  • Feed Name: ConnectSoft (organization-wide feed).
  • Visibility: Private (requires authentication).
  • Upstream Sources: NuGet.org enabled for public packages.
  • Retention Policy: 500 package versions; delete versions older than 365 days (except production releases).

Package Consumption:

Other services reference packages in .csproj:

<ItemGroup>
  <PackageReference Include="ConnectSoft.ATP.Contracts" Version="1.0.42" />
</ItemGroup>

NuGet restore authenticates to feed using Azure Artifacts credential provider (injected by pipeline).
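The credential injection mentioned above is typically an explicit step before restore. A sketch using the standard NuGetAuthenticate task (the feed name comes from this document; other inputs are assumptions):

```yaml
# Sketch: authenticate to Azure Artifacts, then restore against the private feed
- task: NuGetAuthenticate@1
  displayName: 'Authenticate to Azure Artifacts'

- task: DotNetCoreCLI@2
  displayName: 'Restore Packages'
  inputs:
    command: 'restore'
    feedsToUse: 'select'
    vstsFeed: 'ConnectSoft'   # organization-wide feed from this document
```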

Symbol Packages (.snupkg)

Purpose: Enable step-through debugging into shared library code from consuming services.

Symbol Package Contents:

  • PDB Files: Program database files with debug symbols (variable names, line numbers, source file paths).
  • Source Server Info: Links PDB to source code in Git repository.

Publishing:

- task: PublishSymbols@2
  displayName: 'Publish Symbols'
  inputs:
    symbolsFolder: '$(Build.ArtifactStagingDirectory)'
    searchPattern: '**/bin/**/*.pdb'  # PublishSymbols indexes PDB files, not .snupkg archives
    symbolServerType: 'TeamServices'
    detailedLog: true

Usage:

Developers debugging in Visual Studio can step into NuGet package code:

  1. Visual Studio downloads symbols from Azure Artifacts symbol server.
  2. Source Link resolves source code from Git commit SHA.
  3. Debugger steps into external library code as if it were local.

Benefits:

  • Production Debugging: Investigate issues in shared libraries without deploying debug builds.
  • Stack Trace Clarity: Production stack traces include line numbers and file names.
  • Developer Productivity: No need to clone and build shared libraries locally.

Docker Images

ATP packages services as Docker images for containerized deployment to Azure Container Apps, AKS, or Docker-based hosting. Images are built using multi-stage Dockerfiles, scanned for vulnerabilities, and pushed to Azure Container Registry.

Docker Build and Push Template

Template Invocation:

- template: build/build-and-push-microservice-docker-steps.yaml@templates
  parameters:
    dockerRegistryServiceConnection: $(dockerRegistryServiceConnection)
    imageRepository: connectsoft/atp-ingestion
    containerRegistry: $(containerRegistry)
    dockerfile: src/ConnectSoft.ATP.Ingestion/Dockerfile
    buildContext: .
    tags: |
      $(Build.BuildNumber)
      latest

Parameters:

  • dockerRegistryServiceConnection: Azure DevOps service connection for ACR authentication.
  • imageRepository: Image name in registry (e.g., connectsoft/atp-ingestion).
  • containerRegistry: Registry URL (e.g., connectsoft.azurecr.io).
  • dockerfile: Path to Dockerfile (relative to repository root).
  • buildContext: Build context directory (usually . for repository root).
  • tags: Multi-line string of image tags (version + latest).

Template Execution:

  1. Authenticate to Registry: Uses service connection credentials (managed identity or service principal).
  2. Build Image: Runs docker build with BuildKit optimizations (layer caching, parallel builds).
  3. Tag Image: Applies tags (e.g., connectsoft.azurecr.io/atp-ingestion:1.0.42, connectsoft.azurecr.io/atp-ingestion:latest).
  4. Scan Image: Runs Trivy vulnerability scanner before push.
  5. Push Image: Pushes to ACR only if scan passes.

Multi-stage Dockerfiles

Purpose: Optimize image size and security by separating build and runtime stages.

Example Dockerfile (ConnectSoft.ATP.Ingestion/Dockerfile):

# Stage 1: Build
FROM mcr.microsoft.com/dotnet/sdk:8.0 AS build
WORKDIR /src
COPY ["src/ConnectSoft.ATP.Ingestion/ConnectSoft.ATP.Ingestion.csproj", "ConnectSoft.ATP.Ingestion/"]
COPY ["src/ConnectSoft.ATP.Contracts/ConnectSoft.ATP.Contracts.csproj", "ConnectSoft.ATP.Contracts/"]
RUN dotnet restore "ConnectSoft.ATP.Ingestion/ConnectSoft.ATP.Ingestion.csproj"
COPY src/ .
WORKDIR "/src/ConnectSoft.ATP.Ingestion"
RUN dotnet build "ConnectSoft.ATP.Ingestion.csproj" -c Release -o /app/build

# Stage 2: Publish
FROM build AS publish
RUN dotnet publish "ConnectSoft.ATP.Ingestion.csproj" -c Release -o /app/publish /p:UseAppHost=false

# Stage 3: Runtime
FROM mcr.microsoft.com/dotnet/aspnet:8.0-alpine AS runtime
WORKDIR /app
EXPOSE 8080
EXPOSE 8081

# Create non-root user
RUN addgroup -S appgroup && adduser -S appuser -G appgroup
USER appuser

COPY --from=publish /app/publish .
ENTRYPOINT ["dotnet", "ConnectSoft.ATP.Ingestion.dll"]

Stage Breakdown:

  1. Build Stage: Uses full SDK image; restores and builds project.
  2. Publish Stage: Creates release binaries optimized for runtime.
  3. Runtime Stage: Uses minimal aspnet:alpine image; copies only binaries (no SDK, no source).

Benefits:

  • Small Image Size: Runtime image ~200MB vs. SDK image ~700MB (3.5x reduction).
  • Security: No build tools in runtime image (reduced attack surface).
  • Layer Caching: Docker caches restored NuGet packages; only rebuilds changed code.

Distroless-style Alternative (Ubuntu Chiseled):

FROM mcr.microsoft.com/dotnet/aspnet:8.0-jammy-chiseled AS runtime
WORKDIR /app
COPY --from=publish /app/publish .
ENTRYPOINT ["dotnet", "ConnectSoft.ATP.Ingestion.dll"]

Benefits: No shell, no package manager, minimal OS libraries, non-root by default (even more secure). Note the framework-dependent ENTRYPOINT, consistent with publishing via /p:UseAppHost=false.

Image Scanning (Trivy)

Purpose: Detect vulnerabilities in Docker image layers (OS packages, dependencies) before pushing to registry.

Execution (within template):

- script: |
    docker run --rm -v /var/run/docker.sock:/var/run/docker.sock \
      aquasec/trivy:latest image \
      --severity CRITICAL,HIGH \
      --exit-code 1 \
      $(containerRegistry)/$(imageRepository):$(Build.BuildNumber)
  displayName: 'Trivy Image Scan'

Scan Behavior:

  • Severity Threshold: Fail on critical or high vulnerabilities.
  • Scan Scope: OS packages (Alpine, Debian), .NET runtime, third-party libraries.
  • Exit Code: 1 if vulnerabilities found (blocks pipeline).

Scan Report (example):

connectsoft.azurecr.io/atp-ingestion:1.0.42 (alpine 3.19)
==================================================
Total: 2 (HIGH: 1, CRITICAL: 1)

┌────────────┬───────────────┬──────────┬───────────────────┬───────────────┐
│  Library   │ Vulnerability │ Severity │ Installed Version │ Fixed Version │
├────────────┼───────────────┼──────────┼───────────────────┼───────────────┤
│ openssl    │ CVE-2024-1234 │ CRITICAL │ 3.1.0-r1          │ 3.1.0-r2      │
│ libcrypto3 │ CVE-2024-5678 │ HIGH     │ 3.1.0-r1          │ 3.1.0-r2      │
└────────────┴───────────────┴──────────┴───────────────────┴───────────────┘

Remediation:

  • Update Base Image: Rebuild with newer Alpine/Debian version.
  • Apply Patches: Run apk upgrade in Dockerfile (Alpine) or apt-get upgrade (Debian).
  • Request Exception: Security team evaluates; grants temporary exemption with mitigation plan.

Alternative: Azure Defender for Containers:

Azure Container Registry includes built-in vulnerability scanning via Microsoft Defender:

  • Continuous Scanning: Images rescanned when new CVEs discovered.
  • Integration: Results visible in Azure Portal + Azure DevOps.
  • Policy Enforcement: Block pulls of vulnerable images to production environments.

Registry: Azure Container Registry (ACR)

Registry Configuration:

  • Registry Name: connectsoft.azurecr.io (global ACR instance).
  • SKU: Premium (geo-replication, advanced security features).
  • Access: Managed Identity for Azure services; service principal for pipelines.

Geo-Replication:

  • Primary: East US (primary production region).
  • Replicas: West Europe, Southeast Asia (regional deployments).
  • Benefit: Reduced image pull latency; high availability.

Retention Policy:

# acr-retention-policy.json
{
  "rules": [
    {
      "description": "Keep last 10 versions per tag",
      "type": "AgeLimitWithTags",
      "tagNamePattern": "^[0-9]+\\.[0-9]+\\.[0-9]+$",
      "ageLimitInDays": 90,
      "keepLatestCount": 10
    },
    {
      "description": "Retain production tags indefinitely",
      "type": "AgeLimitWithTags",
      "tagNamePattern": "^prod-",
      "ageLimitInDays": 0
    }
  ]
}

Image Tags:

  • Version Tag: 1.0.42 (immutable; pinned in production deployments).
  • Latest Tag: latest (mutable; convenience for dev environments).
  • Branch Tags: develop, staging (track specific branches).
  • Production Tags: prod-1.0.42 (retained indefinitely for rollback).
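Production tags are typically applied at promotion time rather than build time. A sketch of deriving the tag (registry and repository names are from this document; the az command is shown as a comment, not executed here):

```shell
# Derive the indefinitely-retained production tag from a validated version.
VERSION="1.0.42"
PROD_TAG="prod-${VERSION}"
echo "$PROD_TAG"
# On an authenticated agent, the retag might be a server-side import:
#   az acr import --name connectsoft \
#     --source connectsoft.azurecr.io/atp-ingestion:${VERSION} \
#     --image atp-ingestion:${PROD_TAG}
```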

Build Artifacts

The final step of the CI stage publishes all build outputs as pipeline artifacts for consumption by CD stages and compliance audits.

Binaries

Purpose: Publish compiled application binaries for deployment to Azure App Services or IIS.

Execution:

- task: PublishPipelineArtifact@1
  displayName: 'Publish Binaries'
  inputs:
    targetPath: '$(Build.SourcesDirectory)/src/ConnectSoft.ATP.Ingestion/bin/Release/net8.0'
    artifact: 'atp-ingestion-binaries'
    publishLocation: 'pipeline'

Artifact Contents:

  • DLLs: Compiled assemblies (main + dependencies).
  • EXE: Entry point executable (if applicable).
  • appsettings.json: Base configuration (environment-specific overrides applied during deployment).
  • web.config: IIS configuration (for on-prem deployments).
  • wwwroot/: Static files (CSS, JS, images) if web application.

Artifact Size: Typically 20-50 MB (compressed) for microservices.

Consumption: CD stages download artifact using DownloadPipelineArtifact@2 task.
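The corresponding download step in a CD stage might look like this (display name and target path are illustrative):

```yaml
- task: DownloadPipelineArtifact@2
  displayName: 'Download ATP Ingestion Binaries'
  inputs:
    buildType: 'current'                     # artifact from this pipeline run
    artifactName: 'atp-ingestion-binaries'   # matches the publish step above
    targetPath: '$(Pipeline.Workspace)/atp-ingestion-binaries'
```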

Test Results

Purpose: Preserve test execution records for compliance audits and trend analysis.

Execution:

Test results are published automatically by the dotnet test task (.trx files) and the PublishTestResults@2 task.
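A typical publish step for the .trx files might look like this (inputs assumed; the succeededOrFailed condition ensures results surface even when tests fail):

```yaml
- task: PublishTestResults@2
  displayName: 'Publish Test Results'
  condition: succeededOrFailed()   # publish even when tests fail
  inputs:
    testResultsFormat: 'VSTest'
    testResultsFiles: '**/*.trx'
    mergeTestResults: true
```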

Artifact Contents:

  • Test Results: .trx files (Visual Studio Test Results format).
  • Code Coverage: coverage.cobertura.xml (Cobertura format).
  • Test Logs: Console output, exception stack traces.
  • Screenshots: For UI/integration tests (if applicable).

Retention: Test results retained for pipeline retention period (default 30 days; extended to 1 year for production builds).

Usage:

  • Compliance Audits: Prove automated testing was executed and passed.
  • Incident Investigation: Correlate test failures with production issues.
  • Trend Analysis: Track test pass rate and coverage over time.

Compliance Artifacts

Purpose: Bundle compliance evidence (SBOM, security scans, architecture records) for regulatory audits.

Execution:

- task: PublishPipelineArtifact@1
  displayName: 'Publish Compliance Bundle'
  inputs:
    targetPath: '$(Build.ArtifactStagingDirectory)/compliance'
    artifact: 'compliance-bundle'
    publishLocation: 'pipeline'

Compliance Bundle Contents:

  1. SBOM (bom.json):

    • Software Bill of Materials (CycloneDX format).
    • All NuGet packages with versions, licenses, CVEs.
  2. Security Scan Reports:

    • sonarqube-report.json: SAST results with vulnerabilities, code smells, bugs.
    • dependency-check-report.html: OWASP scan with CVE details.
    • trivy-report.json: Docker image vulnerabilities (if containerized).
  3. Test Results:

    • test-results.trx: Unit and integration test pass/fail.
    • coverage.cobertura.xml: Code coverage report.
  4. Build Metadata (build-metadata.json):

    {
      "buildNumber": "1.0.42",
      "commitSha": "a1b2c3d4e5f6g7h8i9j0",
      "pipelineRunId": "12345",
      "buildTimestamp": "2025-10-30T14:23:45Z",
      "approver": "platform-team@connectsoft.com",
      "qualityGates": {
        "sonarQube": "PASSED",
        "codeCoverage": "PASSED (75%)",
        "dependencyScan": "PASSED",
        "secretsScan": "PASSED"
      }
    }
    

  5. ADR Snapshots (adr/):

    • Architecture Decision Records relevant to this build.
    • Includes decisions on security controls, data retention, encryption.

Artifact Aggregation Script:

- script: |
    mkdir -p $(Build.ArtifactStagingDirectory)/compliance
    cp $(Build.ArtifactStagingDirectory)/bom.json $(Build.ArtifactStagingDirectory)/compliance/
    cp $(Build.SourcesDirectory)/dependency-check-report.html $(Build.ArtifactStagingDirectory)/compliance/
    find $(Agent.TempDirectory) -name 'coverage.cobertura.xml' -exec cp {} $(Build.ArtifactStagingDirectory)/compliance/ \;
    # Generate build metadata
    echo '{"buildNumber":"$(Build.BuildNumber)","commitSha":"$(Build.SourceVersion)"}' > $(Build.ArtifactStagingDirectory)/compliance/build-metadata.json
  displayName: 'Aggregate Compliance Artifacts'

Retention Policy:

  • Standard Builds: 90 days.
  • Production Releases: 7 years (SOC 2, HIPAA regulatory requirement).
  • Storage: Azure Blob Storage with WORM policy (Write Once, Read Many).

Audit Use Cases:

  1. SOC 2 Type II: Auditors request evidence of automated testing and security scanning.
  2. HIPAA: Prove software bill of materials and vulnerability management.
  3. ISO 27001: Demonstrate secure development lifecycle controls.
  4. Incident Response: Trace production issue to specific build and code change.

Multi-Environment Deployment Strategy

ATP employs a progressive delivery model where artifacts are promoted through increasingly production-like environments (dev → test → staging → production) with graduated controls at each stage. Each environment serves a distinct validation purpose, and progression is gated by automated tests, manual approvals, or both, ensuring that only validated code reaches production tenants.

This multi-environment strategy balances velocity (developers get fast feedback in dev) with safety (production receives thoroughly validated releases). Deployment methods vary by environment — automated deployments to dev/test enable continuous integration, while controlled deployments to staging/production with blue-green or canary strategies minimize risk and enable instant rollback.

Environment Definitions

ATP maintains four standard environments (dev, test, staging, production) plus a hotfix environment for emergency patches, each with distinct purposes, approval requirements, and deployment methods.

| Environment | Purpose | Approval | Deployment Method | Smoke Tests | Data | SLA |
|---|---|---|---|---|---|---|
| Dev | Continuous integration | None | Automated (every commit) | Basic health checks | Synthetic (reset nightly) | 95% |
| Test | Integration testing | None | Automated (after CI) | Full regression suite | Stable test datasets | 98% |
| Staging | Pre-production validation | Manual (1 approver) | Blue-green | Load tests, chaos | Production-like (anonymized) | 99.5% |
| Production | Live tenant traffic | Manual (2 approvers + CAB) | Canary rollout | Synthetic monitors | Real tenant data | 99.9% |
| Hotfix | Emergency patches | Manual (2 approvers, expedited) | Blue-green (fast track) | Critical path tests | Production clone | 99.9% |

Dev Environment

Purpose: Fast-feedback environment for developers to validate changes with the full ATP stack.

Characteristics:

  • Trigger: Every successful CI build on develop, main, or master branches.
  • Approval: None (fully automated).
  • Data: Synthetic test data generated via scripts; reset nightly to clean state.
  • Configuration: Debug-friendly settings (verbose logging, SQL query tracing, no rate limits, CORS permissive).
  • Infrastructure: Azure App Services (B2 tier) or Azure Container Apps (2 replicas).
  • Monitoring: Basic (health checks, error logging to Seq).
  • Uptime SLA: 95% (downtime acceptable for testing infrastructure changes).

Use Cases:

  • Developer testing of new features and bug fixes.
  • Integration testing across ATP microservices (Ingestion → Query → Integrity).
  • Product team demos and feature validation.
  • API contract validation (Swagger UI exposed publicly).

Deployment Process:

  1. CI stage completes successfully; artifact published.
  2. CD_Dev stage triggered automatically.
  3. Download artifact from pipeline workspace.
  4. Deploy to Azure App Service (dev slot) using rolling deployment.
  5. Apply app settings from ATP-Dev-Variables variable group.
  6. Restart service; wait for health check (/health returns 200).
  7. Run basic smoke tests (ping API, verify database connectivity, test cache).
  8. If smoke tests pass, deployment complete; if fail, alert dev team (no auto-rollback).

Failure Handling:

  • Deployment failures logged to Slack/Teams channel.
  • No automatic rollback (dev environment expected to be unstable).
  • Developers notified; fix forward with new commit.

Test Environment

Purpose: Stable environment for QA team to run comprehensive regression test suite.

Characteristics:

  • Trigger: CI build success on master or main branches (not feature branches).
  • Approval: None (automated after CI).
  • Data: Versioned test datasets restored from snapshots; includes edge cases and boundary conditions.
  • Configuration: Production-like settings (standard logging, rate limits enabled, feature flags match staging).
  • Infrastructure: Azure App Services (S1 tier) or Azure Container Apps (3 replicas).
  • Monitoring: Enhanced (Application Insights, custom dashboards, alert rules).
  • Uptime SLA: 98% (scheduled downtime for data refresh).

Use Cases:

  • QA regression testing (manual and automated).
  • Performance baseline validation (load tests with k6/JMeter).
  • Security testing (OWASP ZAP dynamic scans).
  • Third-party integration testing (external APIs, webhooks).

Deployment Process:

  1. CI stage completes successfully.
  2. CD_Test stage triggered automatically (parallel with CD_Dev).
  3. Download artifact from pipeline workspace.
  4. Deploy to Azure App Service (test environment) using rolling deployment.
  5. Apply app settings from ATP-Test-Variables variable group.
  6. Restart service; wait for health checks.
  7. Run health checks and smoke tests.
  8. Trigger automated regression test suite (Playwright/Selenium E2E tests).
  9. Publish test results to Azure Test Plans.
  10. If tests pass, deployment complete; if fail, rollback and alert QA team.

Automated Test Suite:

  • API Contract Tests: Validate OpenAPI spec compliance (100+ endpoints).
  • Functional Tests: End-to-end workflows (ingest → query → export → verify integrity).
  • Performance Tests: P95 latency < 500ms, throughput ≥ 1000 req/sec.
  • Security Tests: Authentication/authorization checks, data isolation validation.

Rollback: Automated on test failure (redeploy previous version, alert QA team).

Staging Environment

Purpose: Final pre-production validation with production-identical configuration.

Characteristics:

  • Trigger: Manual (initiated via Azure DevOps UI after Test passes).
  • Approval: Manual (1 approver from Platform/SRE team).
  • Data: Anonymized copy of production data (refreshed weekly via Azure SQL Database copy).
  • Configuration: Identical to production (same feature flags, rate limits, encryption, WORM storage).
  • Infrastructure: Azure App Services (P2V2 tier) or Azure Container Apps (5 replicas with auto-scaling).
  • Monitoring: Full production monitoring (Grafana, Prometheus, Jaeger, Seq, Application Insights).
  • Uptime SLA: 99.5% (planned maintenance windows communicated).

Use Cases:

  • Final validation before production release.
  • Chaos engineering and resilience testing (kill dependencies, network partitions, resource exhaustion).
  • Load testing with production-scale traffic (10x normal load).
  • Customer success team training and demos.

Deployment Process:

  1. Pre-Approval:
    • Approver verifies CI and Test stages passed.
    • Reviews linked work items (Epic/Feature/Task) and release notes.
    • Confirms rollback plan documented.
  2. Download artifact from pipeline workspace.
  3. Deploy to Azure App Service using blue-green strategy (deploy to staging slot).
  4. Apply app settings from ATP-Staging-Variables variable group.
  5. Warm up staging slot (prefetch caches, prime database connections, load reference data).
  6. Run health checks, smoke tests, and regression tests.
  7. Load Test: Simulate production traffic (10x normal load) for 30 minutes.
  8. Chaos Test: Inject failures (kill Redis, throttle SQL, network latency) and verify graceful degradation.
  9. Compliance Test: Verify GDPR redaction, data residency, retention policies, legal holds.
  10. If all tests pass, slot swap (zero-downtime cutover).
  11. Monitor for 30 minutes post-swap (error rates, latency, health checks).
  12. If metrics healthy, deployment complete; if degraded, instant rollback (swap back).

Validation Tests:

  • Full Regression Suite: All tests from Test environment.
  • Load Tests: k6 or Azure Load Testing with realistic traffic patterns.
  • Chaos Tests: Chaos Mesh or Azure Chaos Studio inject failures.
  • Compliance Tests: Automated scripts verify regulatory requirements.

Rollback: Manual or automated (swap slots back to previous version if metrics degrade beyond thresholds).

Approval Process:

  • Approver reviews Azure DevOps deployment dashboard.
  • Checks pipeline status, test results, security scans.
  • Verifies no active production incidents (ServiceNow/PagerDuty).
  • Approves via Azure DevOps UI (approval recorded in audit log).

Production Environment

Purpose: Live environment serving real tenant traffic with highest reliability standards.

Characteristics:

  • Trigger: Manual (initiated after successful Staging deployment).
  • Approval: Manual (2 approvers: SRE Lead + Platform Lead; CAB approval for major releases).
  • Data: Real tenant data (multi-tenant isolation, WORM storage, legal holds, encryption at rest/transit).
  • Configuration: Production-hardened (strict rate limits, DDoS protection, WAF rules, audit logging to SIEM).
  • Infrastructure: Azure App Services (P3V2 tier, zone redundancy) or Azure Container Apps (10+ replicas with auto-scaling).
  • Monitoring: Enterprise-grade (24/7 monitoring, PagerDuty alerting, synthetic monitors, SLO tracking).
  • Uptime SLA: 99.9% (3 nines, financially backed SLA with tenants).

Use Cases:

  • Serving real tenant audit trail traffic.
  • Production compliance and regulatory requirements (SOC 2, HIPAA, ISO 27001).
  • SLA-backed service delivery with contractual obligations.

Deployment Process:

  1. Pre-Approval (Change Advisory Board):
    • CAB Review: For major releases (breaking changes, schema migrations, major version bumps).
    • Approvers verify:
      • Staging deployment successful (no issues for 24+ hours).
      • Rollback plan documented and tested in staging.
      • On-call rotation scheduled (SRE team on standby for 48 hours post-deploy).
      • Customer communication plan (status page updates, email notifications to affected tenants).
      • Incident response team briefed.
  2. Download artifact from pipeline workspace (same artifact validated in dev/test/staging).
  3. Deploy using canary strategy (gradual traffic shift):
    • Phase 1 (10%): Deploy to 10% of instances (1-2 pods/nodes); monitor for 30 minutes.
    • Phase 2 (50%): If metrics healthy, deploy to 50%; monitor for 1 hour.
    • Phase 3 (100%): If metrics healthy, deploy to all instances; monitor for 2 hours.
  4. Apply app settings from ATP-Prod-Variables variable group.
  5. Monitor production metrics continuously:
    • Error Rate: < 0.5% (baseline < 0.1%).
    • P95 Latency: < 1000ms (baseline ~500ms).
    • Throughput: No degradation vs. pre-deployment.
    • Health Checks: 100% of instances healthy.
  6. Emit deployment event to observability stack (Grafana annotation, correlate with service behavior).
  7. Update status page (e.g., "Deployment in progress - no impact expected").
  8. If metrics healthy after Phase 3, deployment complete; continue monitoring for 48 hours.

Deployment Strategies:

  • Canary (Default): Gradual rollout with automated rollback on metric degradation (most deployments).
  • Blue-Green: For database schema changes, major version bumps, or high-risk releases (instant rollback via slot swap).
  • Rolling: Rarely used in production (only for low-risk patches with proven track record).

Monitoring & Rollback:

Automated Rollback Triggers:

  • Error rate > 1% (compared to pre-deployment baseline).
  • P95 latency > 2x baseline (e.g., 1000ms vs. 500ms).
  • Health check failures > 10% of instances.
  • Critical alerts fired (CPU > 90%, memory > 85%, disk full, database deadlocks).
  • Customer-reported incidents spike (> 5 incidents in 10 minutes).
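Each trigger above reduces to a threshold comparison against the pre-deployment baseline. A sketch of the error-rate check (sampled metric values are illustrative):

```shell
# Evaluate the error-rate rollback trigger from the list above (> 1% => roll back).
ERROR_RATE=1.3   # observed post-deployment error rate, percent (illustrative)
THRESHOLD=1.0    # rollback trigger from the policy above
DECISION=$(awk -v e="$ERROR_RATE" -v t="$THRESHOLD" \
  'BEGIN { print ((e > t) ? "rollback" : "continue") }')
echo "$DECISION"
```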

Manual Rollback:

  • SRE on-call can abort deployment via Azure DevOps UI or CLI.
  • Rollback mechanism: Swap deployment slot back or redeploy previous artifact version.
  • Rollback duration: < 5 minutes (instant for blue-green, ~5 min for canary).
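For the blue-green path, the swap back is a single CLI call. A sketch with illustrative resource names; the command string is built but not executed here:

```shell
# Hypothetical blue-green rollback: swap the slots back (names illustrative).
RESOURCE_GROUP="atp-prod-rg"
APP_NAME="atp-ingestion-prod"
CMD="az webapp deployment slot swap --resource-group ${RESOURCE_GROUP} --name ${APP_NAME} --slot staging --target-slot production"
echo "$CMD"
```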

Post-Deployment:

  • Validation Period: 48-hour observation period with on-call SRE.
  • Communication: Update status page ("Deployment complete - monitoring"), notify customer success team.
  • Retrospective: Post-deployment review if issues occurred (RCA, lessons learned, process improvements).

Approval Workflow (Azure DevOps Environments):

- deployment: DeployToProduction
  environment: ATP-Production
  strategy:
    canary:
      increments: [10, 50]
      preDeploy:
        steps:
        - script: echo "Pre-deployment checks..."
      deploy:
        steps:
        - template: deploy/deploy-microservice-to-azure-web-site.yaml@templates
      postRouteTraffic:
        steps:
        - script: echo "Monitoring metrics..."
        - task: Delay@1
          inputs:
            delayForMinutes: 30
      on:
        failure:
          steps:
          - script: echo "Rolling back due to failure..."
        success:
          steps:
          - script: echo "Canary phase successful, proceeding..."

Hotfix Environment

Purpose: Fast-track emergency patches for critical production issues.

Characteristics:

  • Trigger: Manual (security vulnerabilities, critical bugs, data loss risks).
  • Approval: Manual (2 approvers, expedited process; CAB notified post-deployment).
  • Data: Production clone (restored from latest backup).
  • Configuration: Identical to production.
  • Infrastructure: Same as production (isolated instances).

Use Cases:

  • Zero-day vulnerability patching (e.g., Log4Shell).
  • Critical bug fixes causing production outages.
  • Data corruption remediation.

Deployment Process:

  1. Expedited Approval: SRE Lead + CTO approve (emergency process).
  2. Run critical path tests only (full regression suite skipped for speed).
  3. Deploy using blue-green strategy (instant rollback capability).
  4. Monitor intensively for 1 hour.
  5. If stable, merge hotfix to main branch for regular release cycle.

Rollback: Instant via blue-green slot swap.

Post-Hotfix:

  • CAB notified within 24 hours (retrospective review).
  • Hotfix branch merged to main; regular CI/CD cycle resumes.

Deployment Templates

Azure Pipelines deployment jobs provide specialized capabilities for CD stages: environment tracking, approval gates, deployment strategies, and rollback support. ATP uses ConnectSoft deployment templates that encapsulate best practices for Azure App Service and IIS deployments.

Deployment Job Structure

Basic Template (Azure App Service):

- stage: CD_Staging
  displayName: 'Deploy to Staging'
  dependsOn: CI_Stage
  condition: succeeded()
  pool: Default  # Self-hosted agents for on-prem or VNet-integrated deployments
  variables:
  - group: ATP-Staging-Variables
  jobs:
  - deployment: DeployToStaging
    displayName: 'Deploy ATP Ingestion to Staging'
    environment: ATP-Staging  # Azure DevOps Environment (tracks deployments, approvals)
    strategy:
      runOnce:  # Simple deployment strategy (alternatives: rolling, canary)
        deploy:
          steps:
          - download: current  # Download artifacts from current pipeline run
            artifact: atp-ingestion-drop
          - template: deploy/deploy-microservice-to-azure-web-site.yaml@templates
            parameters:
              azureSubscription: $(azureSubscription)
              appName: atp-ingestion-staging
              package: $(Pipeline.Workspace)/atp-ingestion-drop/*.zip
              appSettings: |
                -ASPNETCORE_ENVIRONMENT Staging
                -ApplicationInsights__InstrumentationKey $(AppInsightsKey)
                -KeyVault__Url $(KeyVaultUrl)

Key Components:

  1. environment: ATP-Staging:

    • Links deployment to Azure DevOps Environment.
    • Tracks deployment history (who deployed, when, what version).
    • Enables approval gates (manual approvals, automated checks).
    • Provides deployment dashboard (success rate, duration trends).
  2. strategy: runOnce:

    • Simplest deployment strategy (deploy once, no traffic shifting).
    • Alternatives: rolling, canary (for gradual rollouts).
  3. download: current:

    • Downloads artifact from current pipeline run.
    • Artifact published in CI stage; consumed in CD stage.
  4. Template Parameters:

    • azureSubscription: Azure DevOps service connection for Azure authentication.
    • appName: Azure App Service name (e.g., atp-ingestion-staging).
    • package: Path to deployment package (.zip file with binaries).
    • appSettings: Environment-specific configuration overrides.
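When CI and CD run as separate pipelines, `download` can reference a pipeline resource rather than `current`. A sketch, assuming a hypothetical resource alias `ci-pipeline` and CI pipeline name:

```yaml
resources:
  pipelines:
  - pipeline: ci-pipeline                    # hypothetical alias for the CI pipeline
    source: 'ConnectSoft.ATP.Ingestion-CI'   # name of the pipeline that published the artifact
    trigger: true                            # run this pipeline when CI completes

# Inside the deployment job:
steps:
- download: ci-pipeline        # downloads from the triggering CI run instead of 'current'
  artifact: atp-ingestion-drop
```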

Deploy to Azure Web Site Template

Template: deploy/deploy-microservice-to-azure-web-site.yaml@templates

Purpose: Deploy .NET applications to Azure App Service with health checks and validation.

Template Contents (simplified):

parameters:
- name: azureSubscription
  type: string
- name: appName
  type: string
- name: package
  type: string
- name: appSettings
  type: string
  default: ''
- name: deploymentSlot
  type: string
  default: 'production'
- name: healthCheckUrl
  type: string
  default: '/health'

steps:
- task: AzureWebApp@1
  displayName: 'Deploy to Azure App Service'
  inputs:
    azureSubscription: ${{ parameters.azureSubscription }}
    appName: ${{ parameters.appName }}
    package: ${{ parameters.package }}
    deploymentMethod: 'zipDeploy'
    appSettings: ${{ parameters.appSettings }}
    deployToSlotOrASE: true
    resourceGroupName: 'auto-detect'
    slotName: ${{ parameters.deploymentSlot }}

- task: AzureAppServiceManage@0
  displayName: 'Restart App Service'
  inputs:
    azureSubscription: ${{ parameters.azureSubscription }}
    action: 'Restart Azure App Service'
    webAppName: ${{ parameters.appName }}

- script: |
    echo "Waiting for service to become healthy..."
    for i in $(seq 1 30); do  # poll every 10s, up to 5 minutes
      curl -fsS "https://${{ parameters.appName }}.azurewebsites.net${{ parameters.healthCheckUrl }}" && exit 0
      sleep 10
    done
    echo "Health check failed after 5 minutes." && exit 1
  displayName: 'Health Check'

- script: echo "Deployment successful!"
  displayName: 'Deployment Complete'

Steps Breakdown:

  1. AzureWebApp@1: Deploys .zip package to Azure App Service using zip deploy (fast, atomic).
  2. Restart: Restarts app service to apply configuration changes.
  3. Health Check: Polls /health endpoint until service responds (max 5 minutes).
  4. Completion: Logs success message.

Deployment Methods:

  • zipDeploy: Fast, atomic (recommended for most scenarios).
  • runFromPackage: Run directly from package (no extraction, faster startup).
  • webDeploy: Windows-specific (IIS-friendly, slower).
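Switching methods is a one-line change in the AzureWebApp@1 task. For example, a sketch using `runFromPackage`, which runs the app directly from the mounted package:

```yaml
- task: AzureWebApp@1
  inputs:
    azureSubscription: ${{ parameters.azureSubscription }}
    appName: ${{ parameters.appName }}
    package: ${{ parameters.package }}
    deploymentMethod: 'runFromPackage'  # no extraction; package is mounted read-only
```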

Deploy to IIS Template

Template: deploy/deploy-microservice-to-iis.yaml@templates

Purpose: Deploy .NET applications to on-premises IIS servers via self-hosted agents.

Template Contents (simplified):

parameters:
- name: targetMachine
  type: string
- name: deploymentPath
  type: string
- name: appPoolName
  type: string
- name: package
  type: string

steps:
- task: IISWebAppManagementOnMachineGroup@0
  displayName: 'Stop IIS Application Pool'
  inputs:
    machineGroupName: ${{ parameters.targetMachine }}
    action: 'Stop App Pool'
    appPoolName: ${{ parameters.appPoolName }}

- task: WindowsMachineFileCopy@2
  displayName: 'Copy Files to IIS Server'
  inputs:
    sourcePath: ${{ parameters.package }}
    machineNames: ${{ parameters.targetMachine }}
    targetPath: ${{ parameters.deploymentPath }}
    cleanTargetBeforeCopy: true

- task: IISWebAppDeploymentOnMachineGroup@0
  displayName: 'Deploy to IIS'
  inputs:
    machineGroupName: ${{ parameters.targetMachine }}
    webSiteName: 'ATP-Ingestion'
    package: ${{ parameters.deploymentPath }}
    appPoolName: ${{ parameters.appPoolName }}

- task: IISWebAppManagementOnMachineGroup@0
  displayName: 'Start IIS Application Pool'
  inputs:
    machineGroupName: ${{ parameters.targetMachine }}
    action: 'Start App Pool'
    appPoolName: ${{ parameters.appPoolName }}

- script: curl -f http://${{ parameters.targetMachine }}/health || exit 1
  displayName: 'Health Check'

Use Case: ConnectSoft customers with hybrid cloud or on-premises requirements (financial services, healthcare, government).

Deployment Strategies

ATP employs three deployment strategies based on environment and risk profile: rolling, blue-green, and canary. Each strategy balances deployment speed, rollback capability, and risk mitigation.

Rolling Deployment

Definition: Sequential instance-by-instance updates with health checks between batches.

How It Works:

  1. Identify all running instances (e.g., 5 App Service instances).
  2. Update instance 1; wait for health check to pass.
  3. Update instance 2; wait for health check to pass.
  4. Repeat until all instances updated.

Configuration (Azure Pipelines):

strategy:
  rolling:
    maxParallel: 2  # Update 2 instances at a time
    preDeploy:
      steps:
      - script: echo "Pre-deploy validation..."
    deploy:
      steps:
      - template: deploy/deploy-microservice-to-azure-web-site.yaml@templates
    postRouteTraffic:
      steps:
      - script: echo "Instance updated, running health checks..."

Characteristics:

  • Speed: Moderate (sequential updates take time for many instances).
  • Risk: Low-Medium (issues isolated to subset of instances).
  • Rollback: Manual (redeploy previous version or stop deployment mid-roll).
  • Downtime: Zero (old instances serve traffic while new instances deploy).

Use Cases:

  • Dev/Test Environments: Default strategy (fast enough, simple).
  • Low-Risk Releases: Patch releases, configuration changes, minor bug fixes.

Advantages:

  • Simple implementation (built into Azure DevOps).
  • No infrastructure changes required (deploy to existing instances).
  • Gradual rollout reduces blast radius of issues.

Disadvantages:

  • Rollback requires redeployment (not instant).
  • During rollout, mix of old/new versions running (version skew).
  • Not suitable for breaking changes (API incompatibilities).

Blue-Green Deployment

Definition: Deploy new version to parallel environment (green), validate, then instant cutover from old environment (blue).

How It Works:

  1. Blue Environment: Current production version serving traffic.
  2. Green Environment: Deploy new version to green (staging slot in Azure App Service).
  3. Validation: Run smoke tests, health checks, regression tests against green.
  4. Cutover: Swap traffic from blue to green (swap deployment slots).
  5. Monitoring: Monitor green for issues; if problems, instant rollback (swap back to blue).
  6. Decommission: After stability period, blue becomes standby for next deployment.

Configuration (Azure App Service):

steps:
- task: AzureWebApp@1
  displayName: 'Deploy to Staging Slot (Green)'
  inputs:
    azureSubscription: $(azureSubscription)
    appName: atp-ingestion-staging
    package: $(Pipeline.Workspace)/drop/*.zip
    deployToSlotOrASE: true
    slotName: 'staging'  # Green environment

- script: |
    # Run tests against staging slot
    curl -f https://atp-ingestion-staging-staging.azurewebsites.net/health
    # Run smoke tests, load tests, etc.
  displayName: 'Validate Green Environment'

- task: AzureAppServiceManage@0
  displayName: 'Swap Slots (Cutover)'
  inputs:
    azureSubscription: $(azureSubscription)
    action: 'Swap Slots'
    webAppName: atp-ingestion-staging
    sourceSlot: 'staging'
    targetSlot: 'production'

- script: sleep 300 && ./check_metrics.sh  # 5-minute soak before declaring success
  displayName: 'Monitor Production (5 min)'

# Rollback (if needed)
- task: AzureAppServiceManage@0
  condition: failed()
  displayName: 'Rollback (Swap Back)'
  inputs:
    azureSubscription: $(azureSubscription)
    action: 'Swap Slots'
    webAppName: atp-ingestion-staging
    sourceSlot: 'production'
    targetSlot: 'staging'

Characteristics:

  • Speed: Fast cutover (< 30 seconds for slot swap).
  • Risk: Low (full validation before cutover; instant rollback).
  • Rollback: Instant (swap slots back).
  • Downtime: Zero (seamless cutover).

Use Cases:

  • Staging Environment: Default strategy (pre-production validation).
  • Production (High-Risk Changes): Database schema changes, major version bumps, breaking API changes.
  • Hotfixes: Emergency patches requiring fast rollback capability.

Advantages:

  • Instant Rollback: Swap back to blue if issues detected (< 30 seconds).
  • Full Validation: Test green environment before cutover (identical to production).
  • Zero Downtime: Seamless traffic cutover.

Disadvantages:

  • Resource Cost: Requires double infrastructure (blue + green environments).
  • State Management: Database migrations must be backward-compatible (blue and green both access same DB during cutover).
  • Complexity: Requires slot management and swap orchestration.

Best Practices:

  • Warm-Up Green: Prefetch caches, prime connections before cutover.
  • Database Migrations: Use expand-contract pattern (add new columns without breaking old code).
  • Monitoring: Watch metrics closely for 30 minutes post-cutover.
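The expand-contract pattern referenced above can be sketched as a hypothetical column rename (table and column names illustrative):

```sql
-- Expand: add the new column; old (blue) code ignores it, new (green) code writes both.
ALTER TABLE audit_records ADD COLUMN actor_id VARCHAR(64) NULL;

-- Migrate: backfill while both versions run against the same database.
UPDATE audit_records SET actor_id = legacy_actor WHERE actor_id IS NULL;

-- Contract: only after blue is decommissioned, drop the old column.
ALTER TABLE audit_records DROP COLUMN legacy_actor;
```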

Canary Deployment

Definition: Gradually shift traffic from old version to new version in controlled increments (10% → 50% → 100%), monitoring metrics at each phase.

How It Works:

  1. Phase 1 (10%): Deploy new version to 10% of instances; route 10% of traffic to canary.
  2. Monitor: Watch error rate, latency, throughput for 30 minutes.
  3. Phase 2 (50%): If metrics healthy, deploy to 50% of instances; route 50% of traffic.
  4. Monitor: Watch metrics for 1 hour.
  5. Phase 3 (100%): If metrics healthy, deploy to all instances; route 100% of traffic.
  6. Rollback: If metrics degrade at any phase, halt deployment and rollback.

Configuration (Azure Pipelines):

strategy:
  canary:
    increments: [10, 50]  # Traffic percentages (10%, 50%, 100%)
    preDeploy:
      steps:
      - script: echo "Pre-canary checks..."
    deploy:
      steps:
      - template: deploy/deploy-microservice-to-azure-web-site.yaml@templates
        parameters:
          azureSubscription: $(azureSubscription)
          appName: atp-ingestion-prod
          package: $(Pipeline.Workspace)/drop/*.zip
    routeTraffic:
      steps:
      - script: echo "Routing $TRAFFIC_PERCENTAGE% traffic to canary"
    postRouteTraffic:
      steps:
      - script: |
          echo "Monitoring canary for 30 minutes..."
          monitor_metrics.sh --duration 30 --threshold error_rate=1%,latency_p95=1000ms
        displayName: 'Monitor Canary Metrics'
      - task: Delay@1
        inputs:
          delayForMinutes: 30
    on:
      failure:
        steps:
        - script: echo "Canary failed, rolling back..."
        - task: AzureWebApp@1
          inputs:
            azureSubscription: $(azureSubscription)
            appName: atp-ingestion-prod
            package: $(Pipeline.Workspace)/previous-version.zip
      success:
        steps:
        - script: echo "Canary phase successful, proceeding to next increment..."

Traffic Routing:

Azure App Service supports traffic routing via deployment slots with traffic percentages:

- task: AzureCLI@2
  displayName: 'Route 10% Traffic to Canary'
  inputs:
    azureSubscription: $(azureSubscription)
    scriptType: bash
    scriptLocation: inlineScript
    inlineScript: |
      az webapp traffic-routing set \
        --resource-group $(resourceGroupName) \
        --name atp-ingestion-prod \
        --distribution staging=10  # send 10% of traffic to the staging slot

Characteristics:

  • Speed: Slow (hours for full rollout with monitoring).
  • Risk: Very Low (minimal blast radius; gradual rollout).
  • Rollback: Automated (halt deployment, shift traffic back).
  • Downtime: Zero (gradual cutover).

Use Cases:

  • Production (Critical Services): Default strategy for ATP (highest safety).
  • High-Traffic Services: Services with large user base (blast radius mitigation).
  • Risky Changes: New features, algorithm changes, performance optimizations.

Advantages:

  • Minimal Blast Radius: Issues affect only 10% of traffic initially.
  • Real Production Validation: Canary receives real tenant traffic (not synthetic).
  • Automated Rollback: Metrics-based rollback (no human intervention needed).
  • Data-Driven: Metrics guide deployment decisions (not gut feel).

Disadvantages:

  • Slow: Full rollout takes hours (not suitable for urgent hotfixes).
  • Complex: Requires traffic routing, metrics monitoring, automated rollback logic.
  • Resource Overhead: Requires infrastructure to support mixed versions.

Metrics Monitoring (Automated):

#!/bin/bash
# monitor_metrics.sh - Automated canary metrics validation
# Prometheus returns JSON, so extract the scalar value with jq and
# compare as floats with awk (bash's -gt handles integers only).

ERROR_RATE=$(curl -s "https://prometheus.connectsoft.com/api/v1/query?query=error_rate" \
  | jq -r '.data.result[0].value[1]')
LATENCY_P95=$(curl -s "https://prometheus.connectsoft.com/api/v1/query?query=latency_p95" \
  | jq -r '.data.result[0].value[1]')

if awk -v v="$ERROR_RATE" 'BEGIN { exit !(v+0 > 1) }'; then
  echo "ERROR: Error rate ${ERROR_RATE}% exceeds threshold 1%"
  exit 1
fi

if awk -v v="$LATENCY_P95" 'BEGIN { exit !(v+0 > 1000) }'; then
  echo "ERROR: P95 latency ${LATENCY_P95}ms exceeds threshold 1000ms"
  exit 1
fi

echo "Canary metrics healthy: error_rate=${ERROR_RATE}%, latency_p95=${LATENCY_P95}ms"
exit 0
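Threshold checks on metric values must compare as floats — bash's `-gt` handles integers only, so a fractional error rate such as 0.5 would break it. A reusable helper sketch using awk:

```shell
# exceeds A B — succeeds (exit 0) when A > B, comparing as floats via awk.
# The +0 coercion forces numeric (not string) comparison.
exceeds() {
  awk -v a="$1" -v b="$2" 'BEGIN { exit !(a+0 > b+0) }'
}
```

Usage inside a gate: `if exceeds "$ERROR_RATE" 1; then exit 1; fi`.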

Best Practices:

  • Start Small: 10% traffic minimizes risk; increment conservatively.
  • Monitor Continuously: Real-time metrics dashboards; automated alerts.
  • Define Clear Thresholds: Error rate, latency, throughput baselines.
  • Automate Rollback: Script-based rollback (faster than manual).
  • Communicate: Status updates on status page (tenants aware of ongoing deployment).

Service-Specific Pipeline Configurations

Each ATP microservice maintains its own azure-pipelines.yml file tailored to its specific requirements — service dependencies, coverage thresholds, container orchestration, and deployment targets. While all services inherit the core ConnectSoft pipeline philosophy and templates, they customize parameters based on architectural role, complexity, and risk profile.

This section provides reference pipeline configurations for the three primary ATP services — Ingestion (write path), Query (read path), and Gateway (API orchestration) — demonstrating how shared templates are parameterized for service-specific needs.

ATP Ingestion Service

The Ingestion Service is ATP's write path — responsible for receiving audit events from producers, validating structure and tenancy, applying classification and redaction, and persisting records to WORM storage. Its pipeline emphasizes high coverage (75%), integration testing with Redis and RabbitMQ, and Docker image publishing for containerized deployment.

Pipeline Configuration

Repository: ConnectSoft.ATP.Ingestion
File: /azure-pipelines.yml

Complete Pipeline:

# azure-pipelines.yml
name: $(majorMinorVersion).$(semanticVersion)

resources:
  repositories:
    - repository: templates
      type: git
      name: ConnectSoft/ConnectSoft.AzurePipelines
      ref: refs/tags/v1.2.0  # Pin to stable template version
  containers:
    - container: redis
      image: redis:7-alpine
      ports: [6379:6379]
      options: --health-cmd "redis-cli ping" --health-interval 10s
    - container: rabbitmq
      image: rabbitmq:3-management-alpine
      ports: [5672:5672, 15672:15672]
      env:
        RABBITMQ_DEFAULT_USER: guest
        RABBITMQ_DEFAULT_PASS: guest
    - container: otel-collector
      image: otel/opentelemetry-collector:0.97.0
      ports: [4317:4317, 8888:8888]
    - container: seq
      image: datalust/seq:latest
      ports: [5341:80]
      env:
        ACCEPT_EULA: Y

trigger:
  branches:
    include: [master, main, develop]
  paths:
    exclude: [README.md, docs/**, .github/**, '*.md']

pr:
  branches:
    include: [master, main]
  paths:
    exclude: [README.md, docs/**, .github/**, '*.md']

pool:
  vmImage: 'ubuntu-latest'

variables:
  majorMinorVersion: 1.0
  semanticVersion: $[counter(variables['majorMinorVersion'], 0)]
  solution: '**/*.slnx'
  exactSolution: 'ConnectSoft.ATP.Ingestion.slnx'
  buildConfiguration: 'Release'
  restoreVstsFeed: 'e4c108b4-7989-4d22-93d6-391b77a39552/1889adca-ccb6-4ece-aa22-cad1ae4a35f3'
  codeCoverageThreshold: 75
  runSettingsFileName: 'ConnectSoft.ATP.Ingestion.runsettings'
  artifactName: 'atp-ingestion-drop'
  dockerRegistryServiceConnection: '9190f67e-25ee-4478-bdd5-933128c9f06f'
  containerRegistry: 'connectsoft.azurecr.io'
  imageRepository: 'connectsoft/atp-ingestion'

stages:
- stage: CI_Stage
  displayName: 'Build, Test, and Publish'
  jobs:
  - job: Build_Test_Publish
    displayName: 'Build and Test Ingestion Service'
    services:
      redis: redis
      rabbitmq: rabbitmq
      otel: otel-collector
      seq: seq
    steps:
    - task: UseDotNet@2
      displayName: 'Install .NET 8 SDK'
      inputs:
        version: '8.x'
        includePreviewVersions: false

    - template: build/lint-microservice-steps.yaml@templates
      parameters:
        solution: $(solution)
        exactSolution: $(exactSolution)
        restoreVstsFeed: $(restoreVstsFeed)
        isNugetAuthenticateEnabled: true

    - template: build/build-microservice-steps.yaml@templates
      parameters:
        solution: $(solution)
        exactSolution: $(exactSolution)
        buildConfiguration: $(buildConfiguration)

    - template: test/test-microservice-steps.yaml@templates
      parameters:
        solution: $(solution)
        runSettingsFileName: $(runSettingsFileName)
        buildConfiguration: $(buildConfiguration)
        codeCoverageThreshold: $(codeCoverageThreshold)

    - template: build/build-and-push-microservice-docker-steps.yaml@templates
      parameters:
        dockerRegistryServiceConnection: $(dockerRegistryServiceConnection)
        imageRepository: $(imageRepository)
        containerRegistry: $(containerRegistry)
        dockerfile: 'src/ConnectSoft.ATP.Ingestion/Dockerfile'
        buildContext: '.'
        tags: |
          $(Build.BuildNumber)
          latest

    - template: publish/publish-microservice-steps.yaml@templates
      parameters:
        artifactName: $(artifactName)

- stage: CD_Dev
  displayName: 'Deploy to Development'
  dependsOn: CI_Stage
  condition: succeeded()
  variables:
  - group: ATP-Dev-Variables
  jobs:
  - deployment: DeployToDev
    displayName: 'Deploy Ingestion to Dev'
    environment: ATP-Dev
    strategy:
      runOnce:
        deploy:
          steps:
          - download: current
            artifact: $(artifactName)
          - template: deploy/deploy-microservice-to-azure-web-site.yaml@templates
            parameters:
              azureSubscription: '$(azureSubscription)'
              appName: 'atp-ingestion-dev'
              package: '$(Pipeline.Workspace)/$(artifactName)/*.zip'
              appSettings: |
                -ASPNETCORE_ENVIRONMENT Development
                -ApplicationInsights__InstrumentationKey $(AppInsightsKey)
                -KeyVault__Url $(KeyVaultUrl)

Key Configuration Points

Service Containers:

  • Redis (redis:7-alpine): Distributed caching tests; health check enabled.
  • RabbitMQ (rabbitmq:3-management-alpine): Message bus integration tests; management UI on port 15672.
  • OTEL Collector (otel/opentelemetry-collector:0.97.0): OpenTelemetry trace/metric tests.
  • Seq (datalust/seq:latest): Structured logging tests; validates log enrichment and correlation.

Why These Containers:

Ingestion Service uses Redis for distributed caching (duplicate detection, rate limiting), RabbitMQ for asynchronous event publishing (audit.record.appended), OTEL for telemetry export, and Seq for centralized logging. Integration tests validate these dependencies work correctly.

Coverage Threshold: 75%

Rationale: Ingestion is the critical write path for ATP — all audit records flow through this service. Higher coverage (75% vs. 70% default) ensures:

  • Validation logic thoroughly tested (schema validation, tenancy checks, classification).
  • Error handling paths covered (malformed events, duplicate detection, quota exhaustion).
  • Integration points tested (cache writes, message publishing, storage persistence).

Exclusions: Infrastructure code (middleware, filters), generated code (DTOs from contracts).

Triggers:

  • CI Builds: master, main, develop branches.
  • PR Builds: Pull requests to master or main (validation only, no deployment).
  • Path Exclusions: Documentation changes, README edits don't trigger builds.

Docker Image Publishing:

Ingestion Service builds Docker image (connectsoft/atp-ingestion) for containerized deployment:

  • Tags: Build number (1.0.42) and latest.
  • Trivy Scan: Fails if critical/high vulnerabilities in image.
  • Registry: Azure Container Registry (connectsoft.azurecr.io).
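The shared Docker template encapsulates the scan; expressed standalone, the gate might look like this sketch (real Trivy flags, illustrative step):

```yaml
- script: |
    trivy image --severity CRITICAL,HIGH --exit-code 1 \
      $(containerRegistry)/$(imageRepository):$(Build.BuildNumber)
  displayName: 'Trivy Image Scan (fail on critical/high CVEs)'
```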

Artifact Name: atp-ingestion-drop (downloaded by CD stages).

Deployment Target: Azure App Service or Azure Container Apps (containerized).

Service-Specific Considerations

Idempotency Testing:

Ingestion Service implements inbox pattern for duplicate detection. Integration tests validate:

  • Same event ingested twice → second attempt returns 200 (idempotent, not 409).
  • Duplicate detection cache (Redis) works correctly.
  • Deduplication window honored (24 hours).
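The idempotent behavior above can be illustrated with a toy in-memory inbox (a sketch only; production uses Redis with a 24-hour TTL rather than a process-local map):

```shell
#!/usr/bin/env bash
# Toy inbox-pattern sketch: track seen event IDs and return the same
# success status for duplicates instead of an error.
declare -A seen

ingest() {
  local event_id="$1"
  if [[ -n "${seen[$event_id]:-}" ]]; then
    echo "200 duplicate"   # idempotent replay: acknowledged, not re-processed
  else
    seen["$event_id"]=1    # record within the dedup window
    echo "200 created"
  fi
}
```

Calling `ingest evt-1` twice yields `200 created` then `200 duplicate`; a naive implementation would return 409 on the replay.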

Performance Testing:

Ingestion must handle high throughput (10,000+ events/sec during peak). Load tests run in Test environment:

- script: |
    k6 run --vus 100 --duration 5m ingestion-load-test.js
  displayName: 'Ingestion Load Test'
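The `ingestion-load-test.js` script is maintained in the service repository; a minimal sketch for the k6 runtime (endpoint URL and payload shape are illustrative) might look like:

```javascript
import http from 'k6/http';
import { check } from 'k6';

export default function () {
  // POST a minimal audit event to the ingestion endpoint (hypothetical URL/shape).
  const payload = JSON.stringify({
    eventId: `load-${__VU}-${__ITER}`,  // unique per virtual user and iteration
    action: 'user.login',
  });
  const res = http.post(
    'https://atp-ingestion-test.azurewebsites.net/api/v1/events',
    payload,
    { headers: { 'Content-Type': 'application/json' } },
  );
  check(res, { accepted: (r) => r.status === 202 });
}
```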

Compliance Validation:

  • Classification: Test that PII fields correctly classified (email → PII, username → non-PII).
  • Redaction: Test that redaction policies applied during write.
  • Residency: Test that events routed to correct regional storage based on tenant profile.

ATP Query Service

The Query Service is ATP's read path — exposing APIs for searching, filtering, and retrieving audit records with tenant isolation, ABAC policy enforcement, and Elasticsearch integration. Its pipeline emphasizes very high coverage (80%), Elasticsearch container for integration tests, and PostgreSQL for read model storage.

Pipeline Configuration

Repository: ConnectSoft.ATP.Query
File: /azure-pipelines.yml

Key Differences from Ingestion:

# azure-pipelines.yml (excerpts)
resources:
  containers:
    - container: elasticsearch
      image: docker.elastic.co/elasticsearch/elasticsearch:8.11.0
      ports: [9200:9200]
      env:
        discovery.type: single-node
        xpack.security.enabled: false
        ES_JAVA_OPTS: "-Xms512m -Xmx512m"
    - container: postgres
      image: postgres:16-alpine
      ports: [5432:5432]
      env:
        POSTGRES_USER: postgres
        POSTGRES_PASSWORD: postgres
        POSTGRES_DB: ATPQuery
    - container: redis
      image: redis:7-alpine
      ports: [6379:6379]
    - container: otel-collector
      image: otel/opentelemetry-collector:0.97.0
      ports: [4317:4317]

variables:
  exactSolution: 'ConnectSoft.ATP.Query.slnx'
  codeCoverageThreshold: 80  # Higher due to query complexity
  runSettingsFileName: 'ConnectSoft.ATP.Query.runsettings'
  artifactName: 'atp-query-drop'
  imageRepository: 'connectsoft/atp-query'

stages:
- stage: CI_Stage
  jobs:
  - job: Build_Test_Publish
    services:
      elasticsearch: elasticsearch
      postgres: postgres
      redis: redis
      otel: otel-collector
    steps:
    - template: build/lint-microservice-steps.yaml@templates
      parameters:
        solution: $(solution)
        exactSolution: $(exactSolution)
        restoreVstsFeed: $(restoreVstsFeed)
    - template: build/build-microservice-steps.yaml@templates
      parameters:
        solution: $(solution)
        buildConfiguration: $(buildConfiguration)
    - template: test/test-microservice-steps.yaml@templates
      parameters:
        solution: $(solution)
        runSettingsFileName: $(runSettingsFileName)
        codeCoverageThreshold: $(codeCoverageThreshold)
    - template: build/build-and-push-microservice-docker-steps.yaml@templates
      parameters:
        dockerRegistryServiceConnection: $(dockerRegistryServiceConnection)
        imageRepository: $(imageRepository)
        containerRegistry: $(containerRegistry)
        dockerfile: 'src/ConnectSoft.ATP.Query/Dockerfile'
        buildContext: '.'
        tags: |
          $(Build.BuildNumber)
          latest
    - template: publish/publish-microservice-steps.yaml@templates
      parameters:
        artifactName: $(artifactName)

Service Containers (Query-Specific)

Elasticsearch (docker.elastic.co/elasticsearch/elasticsearch:8.11.0):

  • Purpose: Full-text search and aggregation queries on audit records.
  • Configuration:
    • Single-node mode (sufficient for testing).
    • Security disabled (no HTTPS/auth in test environment).
    • Heap size: 512MB (balances performance and agent memory).
  • Integration Tests:
    • Index audit records to Elasticsearch.
    • Execute complex queries (full-text search, aggregations, filters).
    • Validate search ranking and relevance.
    • Test pagination and scroll API.

PostgreSQL (postgres:16-alpine):

  • Purpose: Relational read model for structured queries (by tenant, by user, by time range).
  • Configuration:
    • Default credentials (postgres/postgres).
    • Database: ATPQuery.
  • Integration Tests:
    • Execute NHibernate queries against Postgres.
    • Validate query projections (audit record → read model).
    • Test complex joins and subqueries.
    • Verify tenant isolation in queries (WHERE tenantId = @tenantId).

Redis (redis:7-alpine):

  • Purpose: Query result caching and rate limiting.
  • Integration Tests:
    • Cache query results; verify cache hits.
    • Test cache invalidation on new records.
    • Validate rate limiting (max 1000 queries/min per tenant).

Coverage Threshold: 80%

Rationale: Query Service has complex query logic with multiple code paths:

  • Query Builders: Dynamic LINQ, Elasticsearch query DSL, SQL generation.
  • Filters and Aggregations: Date ranges, field selectors, tenant filters, ABAC policy enforcement.
  • Pagination: Offset/limit, cursor-based, scroll API.
  • Projection: Map domain entities to DTOs with field redaction.

Higher coverage (80%) ensures all query paths are tested (edge cases, boundary conditions, error handling).

What's Tested:

  • Query Parsing: Convert REST query parameters to Elasticsearch DSL or SQL.
  • Policy Enforcement: ABAC rules applied correctly (users only see allowed records).
  • Redaction: Sensitive fields redacted based on user permissions.
  • Performance: Queries complete within SLA (P95 < 500ms).

Example Test:

[Fact]
public async Task Query_WithTenantFilter_ReturnsOnlyTenantRecords()
{
    // Arrange
    await SeedAuditRecords(tenantId: "tenant-a", count: 10);
    await SeedAuditRecords(tenantId: "tenant-b", count: 5);

    // Act
    var result = await QueryService.QueryAsync(new QueryRequest 
    { 
        TenantId = "tenant-a",
        TimeRange = Last7Days 
    });

    // Assert
    Assert.Equal(10, result.Records.Count);
    Assert.All(result.Records, r => Assert.Equal("tenant-a", r.TenantId));
}

Service-Specific Steps

No Docker Image Build (Alternative):

Some Query Service deployments use Azure App Service (not containerized). In this case, skip Docker template:

steps:
  - template: build/lint-microservice-steps.yaml@templates
  - template: build/build-microservice-steps.yaml@templates
  - template: test/test-microservice-steps.yaml@templates
  - template: publish/publish-microservice-steps.yaml@templates
  # Skip: build-and-push-microservice-docker-steps.yaml

Deployment uses binaries artifact instead of Docker image.

Elasticsearch Index Initialization:

Query Service requires Elasticsearch indexes to exist before tests run. Custom step:

- script: |
    curl -X PUT "http://localhost:9200/audit-records" -H 'Content-Type: application/json' -d'
    {
      "mappings": {
        "properties": {
          "eventId": { "type": "keyword" },
          "tenantId": { "type": "keyword" },
          "timestamp": { "type": "date" },
          "action": { "type": "text" },
          "actorId": { "type": "keyword" }
        }
      }
    }'
  displayName: 'Initialize Elasticsearch Index'

ATP Gateway

The Gateway Service is ATP's public API facade — routing requests to Ingestion (writes) and Query (reads), enforcing rate limits and authentication, and serving the Angular frontend (optional UI). Its pipeline includes frontend build steps (npm, webpack), API contract validation, and dual artifact publishing (backend + frontend).

Pipeline Configuration

Repository: ConnectSoft.ATP.Gateway
File: /azure-pipelines.yml

Complete Pipeline:

# azure-pipelines.yml
name: $(majorMinorVersion).$(semanticVersion)

resources:
  repositories:
    - repository: templates
      type: git
      name: ConnectSoft/ConnectSoft.AzurePipelines
      ref: refs/tags/v1.2.0  # Pin to stable template version (matches Ingestion pipeline)

trigger:
  branches:
    include: [master, main, develop]
  paths:
    exclude: [README.md, docs/**]

pool:
  vmImage: 'ubuntu-latest'

variables:
  majorMinorVersion: 1.0
  semanticVersion: $[counter(variables['majorMinorVersion'], 0)]
  solution: '**/*.slnx'
  exactSolution: 'ConnectSoft.ATP.Gateway.slnx'
  buildConfiguration: 'Release'
  restoreVstsFeed: 'e4c108b4-7989-4d22-93d6-391b77a39552/1889adca-ccb6-4ece-aa22-cad1ae4a35f3'
  codeCoverageThreshold: 65  # Lower for thin API controllers
  artifactName: 'atp-gateway-drop'
  frontendPath: 'src/ConnectSoft.ATP.Gateway.UI'
  nodeVersion: '20.x'

stages:
- stage: CI_Stage
  displayName: 'Build Backend and Frontend'
  jobs:
  - job: Build_Backend
    displayName: 'Build .NET Backend'
    steps:
    - task: UseDotNet@2
      inputs:
        version: '8.x'

    - template: build/lint-microservice-steps.yaml@templates
      parameters:
        solution: $(solution)
        exactSolution: $(exactSolution)
        restoreVstsFeed: $(restoreVstsFeed)

    - template: build/build-microservice-steps.yaml@templates
      parameters:
        solution: $(solution)
        buildConfiguration: $(buildConfiguration)

    - template: test/test-microservice-steps.yaml@templates
      parameters:
        solution: $(solution)
        runSettingsFileName: 'ConnectSoft.ATP.Gateway.runsettings'
        codeCoverageThreshold: $(codeCoverageThreshold)

    # Generate OpenAPI spec for contract validation
    - script: |
        dotnet swagger tofile --output $(Build.ArtifactStagingDirectory)/swagger.json \
          src/ConnectSoft.ATP.Gateway/bin/Release/net8.0/ConnectSoft.ATP.Gateway.dll v1
      displayName: 'Generate OpenAPI Spec'

    # Detect breaking changes
    - task: Npm@1
      displayName: 'Install OpenAPI Diff Tool'
      inputs:
        command: 'custom'
        customCommand: 'install -g openapi-diff'

    - script: |
        # Download previous version's spec from artifact feed
        curl -o previous-swagger.json https://connectsoft.blob.core.windows.net/specs/atp-gateway-latest.json
        # Compare specs
        openapi-diff previous-swagger.json $(Build.ArtifactStagingDirectory)/swagger.json
      displayName: 'Detect API Breaking Changes'
      continueOnError: true  # Warning only; manual review required

    - template: publish/publish-microservice-steps.yaml@templates
      parameters:
        artifactName: $(artifactName)

  - job: Build_Frontend
    displayName: 'Build Angular Frontend'
    steps:
    - task: NodeTool@0
      displayName: 'Install Node.js'
      inputs:
        versionSpec: $(nodeVersion)

    - script: |
        cd $(frontendPath)
        npm ci  # Clean install from package-lock.json
      displayName: 'npm ci'

    - script: |
        cd $(frontendPath)
        npm run lint
      displayName: 'ESLint'

    - script: |
        cd $(frontendPath)
        npm run test -- --watch=false --code-coverage --browsers=ChromeHeadless
      displayName: 'Frontend Unit Tests (Karma/Jasmine)'

    - script: |
        cd $(frontendPath)
        npm run build -- --configuration production
      displayName: 'Build Angular App'

    # Publish frontend build output
    - task: PublishPipelineArtifact@1
      displayName: 'Publish Frontend Artifact'
      inputs:
        targetPath: '$(frontendPath)/dist'
        artifact: 'atp-gateway-frontend'
        publishLocation: 'pipeline'

- stage: CD_Dev
  displayName: 'Deploy to Development'
  dependsOn: CI_Stage
  condition: succeeded()
  variables:
  - group: ATP-Dev-Variables
  jobs:
  - deployment: DeployToDev
    environment: ATP-Dev
    strategy:
      runOnce:
        deploy:
          steps:
          # Deploy backend
          - download: current
            artifact: $(artifactName)
          - template: deploy/deploy-microservice-to-azure-web-site.yaml@templates
            parameters:
              azureSubscription: '$(azureSubscription)'
              appName: 'atp-gateway-dev'
              package: '$(Pipeline.Workspace)/$(artifactName)/*.zip'

          # Deploy frontend to Azure Blob Storage (static site hosting)
          - download: current
            artifact: 'atp-gateway-frontend'
          - task: AzureCLI@2
            displayName: 'Deploy Frontend to Azure Storage'
            inputs:
              azureSubscription: '$(azureSubscription)'
              scriptType: 'bash'
              scriptLocation: 'inlineScript'
              inlineScript: |
                az storage blob upload-batch \
                  --account-name atpgatewaydev \
                  --source $(Pipeline.Workspace)/atp-gateway-frontend \
                  --destination '$web' \
                  --overwrite

Frontend Build Steps

Node.js Setup:

  • Version: Node.js 20.x (LTS).
  • Package Manager: npm with package-lock.json for reproducible builds.

Build Steps:

  1. npm ci: Clean install (delete node_modules, install from lock file).

    • Faster than npm install in CI.
    • Ensures exact package versions (no version drift).
  2. ESLint: JavaScript/TypeScript linting (enforce coding standards).

    • Configuration: .eslintrc.json in frontend directory.
    • Rules: Airbnb style guide, Angular best practices, no console logs.
  3. Frontend Unit Tests: Karma/Jasmine tests for Angular components, services, pipes.

    • Browser: ChromeHeadless (no GUI needed).
    • Coverage: Istanbul coverage reports (via karma-coverage); threshold 60% (UI code is less critical than backend).
  4. Build Angular App: Production build with optimizations.

    • AOT (Ahead-of-Time) Compilation: Templates compiled to JavaScript.
    • Tree Shaking: Dead code elimination.
    • Minification: Uglify/Terser for smaller bundles.
    • Output: dist/ folder with index.html, JS bundles, CSS, assets.

Artifact Separation:

  • Backend Artifact: atp-gateway-drop (ASP.NET Core API binaries).
  • Frontend Artifact: atp-gateway-frontend (Angular static files).

Deployment:

  • Backend: Deployed to Azure App Service (API endpoints).
  • Frontend: Deployed to Azure Blob Storage with static website hosting (or Azure CDN).

API Contract Validation

Purpose: Detect breaking changes in REST API contracts before deployment.

OpenAPI Spec Generation:

- script: |
    dotnet swagger tofile --output $(Build.ArtifactStagingDirectory)/swagger.json \
      src/ConnectSoft.ATP.Gateway/bin/Release/net8.0/ConnectSoft.ATP.Gateway.dll v1
  displayName: 'Generate OpenAPI Spec'

Breaking Change Detection:

The openapi-diff tool compares the current spec against the previously published version:

- script: |
    # Download previous version's spec from artifact feed or blob storage
    curl -o previous-swagger.json https://connectsoft.blob.core.windows.net/specs/atp-gateway-latest.json

    # Compare specs (detects removed endpoints, changed schemas, new required fields)
    openapi-diff previous-swagger.json $(Build.ArtifactStagingDirectory)/swagger.json \
      --fail-on-incompatible
  displayName: 'Detect API Breaking Changes'

Breaking Changes Detected:

  • Removed Endpoints: DELETE operation removed.
  • Schema Changes: Required field added to request DTO.
  • Response Changes: Field removed from response DTO.
  • Version Changes: API version bumped (major version).
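The first two categories reduce to a structural comparison of the specs. A minimal sketch (a hypothetical helper for illustration, not the openapi-diff implementation) that flags removed paths and newly required schema fields:

```python
def find_breaking_changes(old_spec: dict, new_spec: dict) -> list[str]:
    """Flag removed endpoints and newly required request fields (simplified)."""
    changes = []

    # Removed endpoints: any path present in the old spec but missing now.
    new_paths = new_spec.get("paths", {})
    for path in old_spec.get("paths", {}):
        if path not in new_paths:
            changes.append(f"removed endpoint: {path}")

    # Newly required fields: additions to a schema's "required" list.
    old_schemas = old_spec.get("components", {}).get("schemas", {})
    for name, schema in new_spec.get("components", {}).get("schemas", {}).items():
        old_required = set(old_schemas.get(name, {}).get("required", []))
        for field in sorted(set(schema.get("required", [])) - old_required):
            changes.append(f"new required field: {name}.{field}")

    return changes

old = {"paths": {"/events": {}, "/health": {}},
       "components": {"schemas": {"EventDto": {"required": ["eventId"]}}}}
new = {"paths": {"/events": {}},  # /health removed
       "components": {"schemas": {"EventDto": {"required": ["eventId", "tenantId"]}}}}
print(find_breaking_changes(old, new))
# → ['removed endpoint: /health', 'new required field: EventDto.tenantId']
```

Response-field removals and version bumps follow the same pattern in the opposite direction.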

Pipeline Behavior:

  • Breaking Change Found: Pipeline fails (or warns, depending on configuration).
  • Approval Required: Platform team reviews change; approves if intentional (major version bump).
  • Documentation: Update API changelog and migration guide.

Benefits:

  • Consumer Safety: Prevents breaking changes from surprising API consumers.
  • Versioning Guidance: Detects when major version bump needed.
  • Documentation: Auto-generated OpenAPI spec kept in sync with code.

Coverage Threshold: 65%

Rationale: Gateway has thin API controllers (minimal business logic):

  • Controllers delegate to Ingestion/Query services (orchestration, not domain logic).
  • Most code is middleware (authentication, rate limiting, logging).
  • Integration tests validate API contracts, not controller internals.

The lower threshold (65% vs. the 70% default) reflects the Gateway's architectural role. What's tested:

  • Request Validation: DTOs validated; invalid requests rejected (400).
  • Authentication: JWT tokens validated; unauthorized requests rejected (401).
  • Authorization: ABAC policies enforced; forbidden requests rejected (403).
  • Rate Limiting: Excessive requests throttled (429).
  • Error Handling: Exceptions mapped to problem details (RFC 7807).

What's Not Tested (excluded from coverage):

  • Middleware registration (Startup.cs).
  • Dependency injection configuration.
  • Static file serving.

Service Comparison Matrix

| Service | Solution | Coverage | Service Containers | Docker | Deployment | Notes |
|---|---|---|---|---|---|---|
| Ingestion | ConnectSoft.ATP.Ingestion.slnx | 75% | Redis, RabbitMQ, OTEL, Seq | Yes | Azure App Service / ACA | Write path; high throughput; idempotency critical |
| Query | ConnectSoft.ATP.Query.slnx | 80% | Elasticsearch, Postgres, Redis, OTEL | Yes | Azure App Service / ACA | Read path; complex queries; search integration |
| Gateway | ConnectSoft.ATP.Gateway.slnx | 65% | None (API only) | Yes | Azure App Service + Blob Storage | API facade; frontend hosting; contract validation |
| Integrity | ConnectSoft.ATP.Integrity.slnx | 85% | Postgres (ledger), OTEL | Yes | Azure App Service / ACA | Cryptographic operations; hash chaining; very high coverage |
| Export | ConnectSoft.ATP.Export.slnx | 70% | Postgres, Azure Storage, OTEL | Yes | Azure Functions / ACA Jobs | Batch processing; file generation; standard coverage |
| Policy | ConnectSoft.ATP.Policy.slnx | 75% | Redis (cache), OTEL | Yes | Azure App Service / ACA | OPA/Rego evaluation; deterministic logic; high coverage |
| Search | ConnectSoft.ATP.Search.slnx | 70% | Elasticsearch, Redis, OTEL | Yes | Azure App Service / ACA | Search indexing; aggregations; standard coverage |

Coverage Rationale:

  • Critical Services (Integrity, Query): Higher coverage (80-85%) due to complex logic and correctness requirements.
  • Orchestration Services (Gateway, Export): Lower coverage (65-70%) due to thin logic and delegation patterns.
  • Standard Services (Ingestion, Policy, Search): Default coverage (70-75%) balancing safety and pragmatism.

Common Pipeline Patterns

All ATP services share common pipeline patterns (inherited from ConnectSoft templates):

Template Composition

steps:
  - template: build/lint-microservice-steps.yaml@templates
  - template: build/build-microservice-steps.yaml@templates
  - template: test/test-microservice-steps.yaml@templates
  - template: publish/publish-microservice-steps.yaml@templates

Benefits:

  • Consistency: All services follow same CI workflow.
  • Maintainability: Template improvements propagate to all services.
  • Onboarding: New services copy-paste template references; no custom pipeline logic.

Variable Groups

All services use environment-specific variable groups:

variables:
- group: ATP-Dev-Variables   # For CD_Dev stage
- group: ATP-Test-Variables  # For CD_Test stage
- group: ATP-Staging-Variables  # For CD_Staging stage
- group: ATP-Prod-Variables  # For CD_Production stage

Shared Variables (across all services):

  • azureSubscription: Service connection for Azure authentication.
  • containerRegistry: Azure Container Registry URL.
  • keyVaultUrl: Key Vault URL for secrets.
  • appInsightsKey: Application Insights instrumentation key.

Service-Specific Variables (per service):

  • appName: Azure App Service name (e.g., atp-ingestion-dev).
  • connectionString: Service-specific database connection.

Multi-Service Orchestration

Coordinated Deployments: When deploying full ATP stack, use orchestration pipeline:

# atp-platform-deploy.yml (orchestration pipeline)
trigger: none  # Manual only

stages:
- stage: Deploy_All_Services
  jobs:
  - job: Trigger_Service_Pipelines
    steps:
    - task: TriggerBuild@3
      displayName: 'Trigger Ingestion Pipeline'
      inputs:
        definitionIsInCurrentTeamProject: true
        buildDefinition: 'ATP-Ingestion-CI-CD'
        queueBuildForUserThatTriggeredBuild: true
        waitForQueuedBuildsToFinish: true

    - task: TriggerBuild@3
      displayName: 'Trigger Query Pipeline'
      inputs:
        definitionIsInCurrentTeamProject: true
        buildDefinition: 'ATP-Query-CI-CD'
        waitForQueuedBuildsToFinish: true

    - task: TriggerBuild@3
      displayName: 'Trigger Gateway Pipeline'
      inputs:
        definitionIsInCurrentTeamProject: true
        buildDefinition: 'ATP-Gateway-CI-CD'
        waitForQueuedBuildsToFinish: true

Use Cases:

  • Major Releases: Deploy all services simultaneously (version alignment).
  • Breaking Changes: Coordinate deployment of dependent services (contracts updated).
  • Environment Rebuild: Redeploy entire ATP stack to new environment.

Quality Gates & Policies

Quality gates are automated checkpoints that enforce standards for code quality, security, testing, and compliance throughout the CI/CD pipeline. ATP implements a fail-fast philosophy where quality gates block progression at the earliest possible stage — preventing low-quality code from reaching production and reducing waste from testing and deploying code that doesn't meet standards.

These gates are complemented by branch policies (enforced in Azure Repos) that govern code merging and review, and release gates (enforced in deployment stages) that validate production readiness. Together, they form a comprehensive quality framework ensuring that ATP maintains high standards while enabling rapid iteration.

Build Quality Checks

Build quality checks are automated validations executed during the CI stage. If any check fails, the pipeline stops immediately, no artifacts are published, and developers are notified to remediate. These checks are non-negotiable — exceptions require Platform team approval with documented justification and remediation plan.

Code Coverage Thresholds

Policy: Every build must achieve minimum code coverage thresholds to ensure adequate testing.

Thresholds:

  • Line Coverage: ≥70% of executable lines covered by tests (ATP default).
  • Branch Coverage: ≥60% of conditional branches covered by tests.
  • Service-Specific Overrides: Critical services have higher thresholds (see Service Comparison Matrix).

Enforcement Mechanism:

- task: PublishCodeCoverageResults@1
  displayName: 'Publish Code Coverage'
  inputs:
    codeCoverageTool: 'Cobertura'
    summaryFileLocation: '$(Agent.TempDirectory)/**/coverage.cobertura.xml'
    failIfCoverageEmpty: true

- script: |
    shopt -s globstar  # enable recursive ** globbing so the coverage file is found
    COVERAGE=$(grep -oP 'line-rate="\K[0-9.]+' $(Agent.TempDirectory)/**/coverage.cobertura.xml | head -1)
    COVERAGE_PCT=$(echo "$COVERAGE * 100" | bc)
    THRESHOLD=$(codeCoverageThreshold)

    if (( $(echo "$COVERAGE_PCT < $THRESHOLD" | bc -l) )); then
      echo "ERROR: Code coverage ${COVERAGE_PCT}% is below threshold ${THRESHOLD}%"
      exit 1
    fi

    echo "Code coverage ${COVERAGE_PCT}% meets threshold ${THRESHOLD}%"
  displayName: 'Enforce Coverage Threshold'

Exclusions (not counted toward coverage):

  • Generated code: *.Designer.cs, *.g.cs, *.g.i.cs.
  • Database migrations: Migrations/**/*.cs.
  • Test projects: *.Tests, *.IntegrationTests.
  • DTOs and simple models: Properties-only classes with no logic.
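For reference, exclusions like these are typically configured in the service's .runsettings file via coverlet data-collector settings. A minimal sketch (filters and paths are illustrative, not ATP's actual file):

```xml
<!-- Illustrative .runsettings fragment: coverlet coverage exclusions -->
<RunSettings>
  <DataCollectionRunSettings>
    <DataCollectors>
      <DataCollector friendlyName="XPlat code coverage">
        <Configuration>
          <Format>cobertura</Format>
          <!-- Assemblies excluded from coverage: test projects -->
          <Exclude>[*.Tests]*,[*.IntegrationTests]*</Exclude>
          <!-- Files excluded: generated code and EF migrations -->
          <ExcludeByFile>**/Migrations/**/*.cs,**/*.Designer.cs,**/*.g.cs,**/*.g.i.cs</ExcludeByFile>
          <!-- Generated members are skipped by attribute as well -->
          <ExcludeByAttribute>GeneratedCodeAttribute,CompilerGeneratedAttribute</ExcludeByAttribute>
        </Configuration>
      </DataCollector>
    </DataCollectors>
  </DataCollectionRunSettings>
</RunSettings>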

Coverage Report:

Azure DevOps displays coverage breakdown by assembly, class, and method:

Assembly: ConnectSoft.ATP.Ingestion.dll
  Line Coverage: 78.5% (1,234 of 1,571 lines covered)
  Branch Coverage: 65.2% (234 of 359 branches covered)

  Class: AuditEventValidator
    Line Coverage: 92.3% (12 of 13 lines covered)
    Branch Coverage: 87.5% (7 of 8 branches covered)

Rationale: Coverage thresholds ensure new features are tested before merge. A 70% line-coverage floor is a common industry baseline, balancing safety against diminishing returns. Branch coverage ensures conditional logic (if/else, switch) is exercised across all paths.

Failure Action:

Pipeline fails with actionable message:

Code Coverage Gate Failed
  Current: 68.5%
  Required: 70.0%
  Gap: -1.5%

Action Required:
  - Add unit tests to increase coverage
  - OR request exception from Platform team with justification

Uncovered Code:
  - ConnectSoft.ATP.Ingestion.Services.EventProcessor.HandleRetry() - 0% coverage
  - ConnectSoft.ATP.Ingestion.Validators.TenancyValidator.ValidateRegion() - 33% coverage
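The arithmetic behind this message is straightforward. A minimal sketch (helper name is illustrative):

```python
def coverage_gate(covered: int, total: int, threshold_pct: float) -> tuple[bool, str]:
    """Return (passed, message) for a line-coverage gate."""
    pct = round(100.0 * covered / total, 1)
    if pct < threshold_pct:
        gap = round(pct - threshold_pct, 1)
        return False, (f"Code Coverage Gate Failed: current {pct}%, "
                       f"required {threshold_pct}%, gap {gap}%")
    return True, f"Code coverage {pct}% meets threshold {threshold_pct}%"

# Matches the failure example above: 68.5% against the 70% default.
print(coverage_gate(685, 1000, 70.0)[1])
# → Code Coverage Gate Failed: current 68.5%, required 70.0%, gap -1.5%
```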

Test Pass Rate

Policy: All tests must pass (100% pass rate). No flaky test tolerance.

Enforcement Mechanism:

- task: DotNetCoreCLI@2
  displayName: 'Run Tests'
  inputs:
    command: 'test'
    projects: '**/*Tests.csproj'
    arguments: '--configuration $(buildConfiguration) --no-build --logger trx'
    publishTestResults: true

# Built-in behavior: Pipeline fails if any test fails

Test Failure Handling:

  • Deterministic Failures: Business logic bugs; developer fixes code.
  • Flaky Tests: Intermittent failures (race conditions, timing issues).
    • ATP policy: Immediately disable with a skip/ignore attribute ([Fact(Skip = ...)] in xUnit, [Ignore] in MSTest/NUnit).
    • Create work item to fix root cause.
    • Track in backlog; prioritize for next sprint.
  • Infrastructure Failures: Service container unavailable (network, image pull).
    • Retry pipeline once (infrastructure issues often transient).
    • If persists, escalate to DevOps team.

Flaky Test Example:

[Fact(Skip = "Flaky: race condition in cache invalidation - tracked in AUD-1234")]
public async Task Cache_Invalidation_PropagatesAcrossInstances()
{
    // Test disabled until race condition fixed
}

Rationale: Flaky tests erode confidence in test suite. If tests sometimes fail randomly, developers ignore failures ("probably flaky"), masking real bugs. Zero-tolerance policy forces teams to fix flakiness or disable tests.

Test Retry Policy: ATP does not allow test retries in pipelines (retries hide flakiness). If a test fails, the pipeline fails.

Security Scan Quality Gate

Policy: Zero critical or high severity vulnerabilities allowed in code or dependencies.

Enforcement Mechanisms:

  1. SonarQube Quality Gate (SAST):

    • Evaluates code against security rules (OWASP Top 10, CWE).
    • Blocks pipeline if:
      • Any critical or high severity vulnerabilities detected.
      • Security rating < B.
      • Security hotspots not reviewed (≥80% must be reviewed).
  2. OWASP Dependency-Check (Dependency Scanning):

    • Scans NuGet packages against NVD.
    • Blocks pipeline if:
      • Any package has CVSS ≥ 9.0 (critical vulnerability).
      • High vulnerability (CVSS 7.0-8.9) in staging/production pipelines.
  3. GitGuardian / Credential Scanner (Secrets Detection):

    • Scans commits for hardcoded secrets.
    • Blocks pipeline if:
      • Any secret detected (API keys, passwords, tokens, private keys).
  4. Trivy (Container Scanning):

    • Scans Docker images for OS and library vulnerabilities.
    • Blocks pipeline if:
      • Critical or high vulnerabilities in final image.
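The CVSS-based blocking rules above can be sketched as a small decision function. This is illustrative only; the "warn" outcome for high-severity findings in development builds is an assumption, since the policy specifies blocking for staging/production:

```python
def dependency_gate(cvss: float, environment: str) -> str:
    """Map a dependency's CVSS score to a pipeline action."""
    if cvss >= 9.0:                 # critical: blocks every pipeline
        return "block"
    if cvss >= 7.0:                 # high: blocks staging/production builds
        if environment in ("staging", "production"):
            return "block"
        return "warn"               # assumption: dev builds flag for triage
    return "pass"

print(dependency_gate(9.8, "dev"))        # → block
print(dependency_gate(7.5, "staging"))    # → block
print(dependency_gate(7.5, "dev"))        # → warn
```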

Gate Evaluation Timing:

  • SonarQube: After tests complete (results uploaded, quality gate evaluated).
  • OWASP: During build or as separate job.
  • Secrets: During lint stage (early detection).
  • Trivy: After Docker image built, before push.

Failure Example:

Security Gate Failed: SonarQube Quality Gate

Vulnerabilities:
  - CRITICAL: SQL Injection risk in EventQueryBuilder.cs:42
    Rule: S3649 - Use parameterized queries
    Severity: CRITICAL

  - HIGH: Hardcoded password in appsettings.Development.json
    Rule: S2068 - Credentials should not be hard-coded
    Severity: HIGH

Action Required:
  - Fix vulnerabilities and re-run pipeline
  - OR request security exception (requires CISO approval)

Exception Process:

  1. Developer creates security exception request (ServiceNow ticket).
  2. Security team evaluates risk:
    • False Positive: Approve suppression (add to suppression file).
    • Accepted Risk: Approve with mitigation plan (e.g., WAF rule, network restriction).
    • Rejected: Developer must fix vulnerability.
  3. Exception recorded in audit log (approval, justification, expiration date).

Rationale: Security gates prevent vulnerabilities from reaching production. SAST catches code-level issues early (before runtime). Dependency scanning addresses supply chain risks. Zero-tolerance for critical/high vulnerabilities aligns with ATP's security-first mission.

Deprecation Policy

Policy: Zero deprecated NuGet packages allowed in production builds.

Enforcement Mechanism:

Lint template scans for deprecated packages and generates report:

- script: |
    dotnet list package --deprecated --include-transitive > deprecated-packages.txt

    if grep -q "deprecated" deprecated-packages.txt; then
      echo "ERROR: Deprecated packages detected:"
      cat deprecated-packages.txt
      exit 1
    fi
  displayName: 'Check for Deprecated Packages'

Deprecated Package Example:

The following sources were used:
   https://api.nuget.org/v3/index.json
   https://pkgs.dev.azure.com/ConnectSoft/_packaging/ConnectSoft/nuget/v3/index.json

Project `ConnectSoft.ATP.Ingestion` has the following deprecated packages:
   [net8.0]:
   Top-level Package      Requested   Resolved   Reason(s)   Alternative
   > Newtonsoft.Json      12.0.3      12.0.3     Legacy      System.Text.Json

Policy Enforcement:

  • Development Builds: Warning (non-blocking); 30-day grace period to migrate.
  • Staging Builds: Blocking (must migrate before staging deployment).
  • Production Builds: Blocking (deprecated packages never reach production).
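The per-environment enforcement reduces to a small decision function. A hedged sketch (helper name is illustrative; the real gate runs in the lint template script shown earlier):

```python
def deprecation_action(environment: str, has_deprecated: bool) -> str:
    """Apply the per-environment deprecation policy above."""
    if not has_deprecated:
        return "pass"
    if environment in ("staging", "production"):
        return "block"    # deprecated packages never reach these builds
    return "warn"         # development: non-blocking, 30-day grace period

print(deprecation_action("development", True))  # → warn
print(deprecation_action("production", True))   # → block
```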

Migration Process:

  1. Identify deprecated package and alternative (from deprecation notice).
  2. Update .csproj to reference alternative package.
  3. Refactor code to use new API (breaking changes possible).
  4. Update tests; verify functionality unchanged.
  5. Commit changes; pipeline passes.

Rationale: Deprecated packages eventually become unsupported (security patches stop). Proactive migration prevents technical debt and supply chain vulnerabilities.

Common Deprecations:

  • Newtonsoft.Json → System.Text.Json (.NET built-in).
  • Microsoft.Extensions.Logging.AzureAppServices → Microsoft.Extensions.Logging.ApplicationInsights.
  • Legacy Azure SDK packages → Azure.* (new SDK).

Branch Policies (Azure Repos)

Branch policies enforce code review and validation requirements before code is merged to protected branches (master, main). These policies are configured in Azure Repos and apply to all ATP service repositories.

Pull Request Required

Policy: No direct commits to master or main branches. All changes must go through pull request workflow.

Configuration (Azure Repos):

Branch: master
  ☑ Require a pull request before merging
  ☐ Allow users to bypass when pushing

Workflow:

  1. Developer creates feature branch: feature/audit-classification-improvements.
  2. Commits changes to feature branch.
  3. Opens pull request: feature/audit-classification-improvements → master.
  4. Pull request triggers validation build (CI stage only, no deployment).
  5. Code review required (see next policy).
  6. After approval and build success, PR merged to master.

Benefits:

  • Code Review: All changes reviewed by at least one other engineer.
  • Build Validation: Changes validated before merge (prevents breaking main branch).
  • Audit Trail: PR history provides complete record of what changed, why, and who approved.

Exemptions:

  • Hotfixes: Emergency fixes can bypass with CISO approval (logged in audit trail).
  • Documentation: README/docs changes can be direct-committed (low risk).

Enforcement: Azure Repos blocks direct commits; returns error message with PR instructions.

Build Validation

Policy: Pull request builds must pass all CI stages before merge is allowed.

Configuration (Azure Repos):

Branch: master
  ☑ Build validation
    Build pipeline: ATP-Ingestion-CI-CD
    Trigger: Automatic (for each new commit to PR)
    Policy requirement: Required
    Build expiration: 12 hours
    Display name: "PR Build Validation"

PR Build Behavior:

  • Trigger: Automatically when PR created or updated (new commits pushed).
  • Scope: CI stage only (lint, build, test, security scan); no deployment stages.
  • Visibility: Build status displayed in PR UI (✓ passed, ✗ failed, ⏳ in progress).
  • Merge Blocking: If build fails, "Complete" button disabled; merge blocked.

PR Build Differences from CI Build:

  • Artifacts: Not published (PR builds are temporary).
  • SBOM: Not generated (no deployment planned).
  • Retention: Build logs retained for 30 days (vs. 1 year for release builds).
  • Notifications: Only PR author notified (not entire team).

Multi-Service PR Builds:

If PR changes shared contracts (e.g., ConnectSoft.ATP.Contracts), trigger builds for consuming services:

# In Contracts repository
pr:
  branches:
    include: [master]

stages:
- stage: Validate_Contract_Changes
  jobs:
  - job: Trigger_Consumer_Builds
    steps:
    - task: TriggerBuild@3
      displayName: 'Validate Ingestion Compatibility'
      inputs:
        buildDefinition: 'ATP-Ingestion-CI-CD'
        queueBuildForUserThatTriggeredBuild: true

    - task: TriggerBuild@3
      displayName: 'Validate Query Compatibility'
      inputs:
        buildDefinition: 'ATP-Query-CI-CD'

Benefits:

  • Broken Builds Prevention: Main branch never broken (all changes pre-validated).
  • Fast Feedback: Developers know within 10 minutes if changes break tests.
  • Confidence: Merge button enabled = code is production-ready.

Code Review Requirements

Policy: Minimum reviewer approvals required before merge; critical changes require additional approvals.

Configuration (Azure Repos):

Branch: master
  ☑ Require minimum number of reviewers: 1
  ☐ Allow requestors to approve their own changes (disabled)
  ☑ Prohibit last pusher from approving their own changes (enabled)
  ☑ Reset code reviewer votes when new changes are pushed (enabled)

  Additional policies for /src/ConnectSoft.ATP.*/Platform/** paths:
    ☑ Require minimum number of reviewers: 2
    ☑ Require at least one reviewer from "Platform-Architects" group

Approval Requirements:

| Change Type | Reviewers Required | Reviewer Role | Example |
|---|---|---|---|
| Feature Development | 1 | Senior Engineer (same team) | New API endpoint, bug fix |
| Platform/Infrastructure | 2 | Platform Architect + SRE | Middleware changes, security controls |
| Breaking Changes | 2 | Platform Lead + affected service owners | API contract changes, schema migrations |
| Security Changes | 2 | Security Engineer + Platform Lead | Authentication, authorization, encryption |
| Documentation | 0 (optional) | Any engineer | README, comments, diagrams |

Code Review Checklist:

Reviewers validate:

  • Functional Correctness: Code does what PR description claims.
  • Test Coverage: New code has unit tests (coverage threshold met).
  • Code Quality: Follows C# conventions, SOLID principles, DDD patterns.
  • Security: No hardcoded secrets, SQL injection risks, XSS vulnerabilities.
  • Performance: No obvious performance issues (N+1 queries, unnecessary allocations).
  • Documentation: Public APIs have XML comments; complex logic explained.
  • Breaking Changes: API changes documented; migration guide provided.

Review Comments:

Reviewers use Azure DevOps PR threads:

  • Blocker: Must be fixed before merge (❌).
  • Non-Blocker: Suggestions for improvement (💡).
  • Approval: Code looks good (👍).

PR Approval Flow:

  1. Reviewer receives notification (email, Teams).
  2. Reviews code changes (diff view, files changed).
  3. Adds comments (inline or general).
  4. Approves or requests changes.
  5. If approved and build passes, PR can be merged.

Rationale: Code review catches bugs, improves code quality, shares knowledge across team. Multiple reviewers for platform changes prevent single point of failure.

Work Item Linking

Policy: Every pull request must link to at least one Azure DevOps work item (Epic, Feature, User Story, Task, or Bug).

Configuration (Azure Repos):

Branch: master
  ☑ Check for linked work items
    Policy requirement: Required
    Message: "Please link a work item to this pull request"

Workflow:

  1. Developer creates PR.
  2. Azure Repos checks for linked work items.
  3. If none found, PR shows warning: "No linked work items".
  4. Developer links work item via PR UI or commit message (e.g., Fixes AB#1234).
  5. Work item link validated; PR can proceed to review.

Work Item Commit Message Format:

git commit -m "Add audit event classification logic

Implements classification rules for PII detection and redaction
based on field metadata and tenant policy.

Related: AB#1234 (Feature: Audit Event Classification)"
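For illustration, the AB#1234 convention can be matched with a short regex (hypothetical helper, not the Azure Repos implementation):

```python
import re

# Matches the AB#1234 convention shown in the commit message above.
AB_LINK = re.compile(r"AB#(\d+)")

def linked_work_items(commit_message: str) -> list[int]:
    """Extract linked Azure Boards work-item IDs from a commit message."""
    return [int(m) for m in AB_LINK.findall(commit_message)]

msg = """Add audit event classification logic

Related: AB#1234 (Feature: Audit Event Classification)"""
print(linked_work_items(msg))  # → [1234]
```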

Benefits:

  • Traceability: Every code change traced to business requirement (Epic/Feature).
  • Context: Reviewers see why change made (work item description).
  • Audit Trail: Compliance audits can trace production code to original requirement.
  • Release Notes: Auto-generate release notes from linked work items.

Exemptions:

  • Hotfixes: Can merge without work item (create work item post-merge for tracking).
  • Documentation: Optional work item linking (README changes low risk).

Work Item State Transitions:

When PR merged, linked work items automatically transition:

  • Task: Active → Resolved → Closed (after deployment).
  • Bug: Active → Resolved → Closed (after verification in Test environment).

Release Gates

Release gates are validation checkpoints executed before and after deployments to staging and production environments. Unlike build quality checks (automated), release gates may include manual approvals, external system validation (change tickets, incident checks), and post-deployment monitoring.

Pre-Deployment Gates

Pre-deployment gates validate that the environment is ready for deployment and that all prerequisites are satisfied.

Manual Approval:

Configuration (Azure DevOps Environments):

Environment: ATP-Staging
  Approvals and checks:
    ☑ Approvals
      Approvers: Platform-Team (group)
      Minimum approvers: 1
      Approval timeout: 30 days
      Instructions: "Verify CI passed, review release notes, confirm rollback plan"

Approval Workflow:

  1. Deployment triggered (manually or automatically after previous stage).
  2. Pipeline pauses at approval gate.
  3. Notification sent to approvers (email, Teams).
  4. Approver reviews:
    • CI stage results (tests passed, coverage met, security scans clean).
    • Linked work items (feature scope, acceptance criteria).
    • Release notes (what's changing).
    • Rollback plan (how to revert if issues).
  5. Approver approves or rejects via Azure DevOps UI.
  6. If approved, deployment proceeds; if rejected, pipeline canceled.

Approval Audit Log:

{
  "approvalId": "12345",
  "environment": "ATP-Staging",
  "approver": "alice@connectsoft.com",
  "timestamp": "2025-10-30T14:23:45Z",
  "decision": "Approved",
  "comment": "Release notes reviewed; rollback plan documented; proceeding with deployment."
}

Change Ticket Validation (ServiceNow Integration):

For production deployments, validate that approved change ticket exists:

- task: ServiceNowChangeManagement@1
  displayName: 'Validate Change Ticket'
  inputs:
    serviceNowConnection: 'ConnectSoft-ServiceNow'
    action: 'Validate'
    changeRequestNumber: '$(changeTicketNumber)'
    requiredState: 'Approved'

Gate Behavior:

  • Ticket Approved: Deployment proceeds.
  • Ticket Pending: Pipeline waits (polling every 5 minutes, max 24 hours).
  • Ticket Rejected: Deployment canceled; CAB feedback provided to team.
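The polling behavior (5-minute interval, 24-hour cap) can be sketched as follows. This is a hypothetical helper for illustration; the real gate is the ServiceNow task above:

```python
import time

def wait_for_approval(get_state, poll_seconds=300, max_hours=24, sleep=time.sleep):
    """Poll a change ticket until Approved/Rejected; give up after max_hours.

    get_state is caller-supplied (e.g., a ServiceNow API call); the defaults
    mirror the 5-minute interval and 24-hour cap described above.
    """
    for _ in range(int(max_hours * 3600 / poll_seconds)):  # 288 polls at defaults
        state = get_state()
        if state == "Approved":
            return "proceed"
        if state == "Rejected":
            return "cancel"
        sleep(poll_seconds)
    return "timeout"

# Usage sketch: a ticket approved on the third poll (sleep stubbed out).
states = iter(["Pending", "Pending", "Approved"])
print(wait_for_approval(lambda: next(states), sleep=lambda s: None))  # → proceed
```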

Incident Check (PagerDuty Integration):

Verify no active production incidents before deployment:

- script: |
    ACTIVE_INCIDENTS=$(curl -s "https://api.pagerduty.com/incidents?statuses[]=triggered&statuses[]=acknowledged" \
      -H "Authorization: Token $(PagerDutyApiKey)" | jq '.incidents | length')

    if [ "$ACTIVE_INCIDENTS" -gt "0" ]; then
      echo "ERROR: $ACTIVE_INCIDENTS active incidents; deployment blocked"
      exit 1
    fi

    echo "No active incidents; deployment approved"
  displayName: 'Check for Active Incidents'

Rationale: Deploying during active incident increases risk (harder to diagnose issues, rollback decisions more complex).

Post-Deployment Gates

Post-deployment gates validate that the deployment was successful and that the service is healthy before marking the deployment complete.

Smoke Tests Pass:

Purpose: Validate critical paths work after deployment.

Execution:

- script: |
    # Test Ingestion API
    curl -X POST https://atp-ingestion-staging.azurewebsites.net/api/v1/events \
      -H "Content-Type: application/json" \
      -d '{"eventId":"test-123","tenantId":"tenant-a","action":"user.login"}'

    # Test Query API
    curl https://atp-query-staging.azurewebsites.net/api/v1/events?tenantId=tenant-a

    # Test Integrity verification
    curl https://atp-integrity-staging.azurewebsites.net/api/v1/verify/segment/latest
  displayName: 'Smoke Tests'

Smoke Test Coverage:

  • API Endpoints: Critical endpoints respond (200 status).
  • Database Connectivity: Service can read/write to database.
  • Cache Connectivity: Redis reachable and responsive.
  • Message Bus: RabbitMQ connection established.
  • Downstream Services: Gateway can reach Ingestion/Query services.

Failure Action: If smoke tests fail, deployment marked as failed; rollback initiated.

Health Checks Green:

Purpose: Verify service instances are healthy (dependencies reachable, resources available).

Execution:

- script: |
    for i in {1..10}; do
      HEALTH=$(curl -s https://atp-ingestion-staging.azurewebsites.net/health)
      STATUS=$(echo "$HEALTH" | jq -r '.status')

      if [ "$STATUS" == "Healthy" ]; then
        echo "Health check passed (attempt $i)"
        exit 0
      fi

      echo "Health check failed (attempt $i), retrying in 30s..."
      sleep 30
    done

    echo "ERROR: Health check failed after 10 attempts"
    exit 1
  displayName: 'Wait for Healthy Status'

Health Check Response (example):

{
  "status": "Healthy",
  "checks": {
    "database": "Healthy",
    "redis": "Healthy",
    "rabbitmq": "Healthy",
    "disk": "Healthy",
    "memory": "Healthy"
  },
  "version": "1.0.42",
  "timestamp": "2025-10-30T14:25:12Z"
}

Health Check Criteria:

  • Overall Status: Healthy (all checks passed).
  • Database: Connection established, query executed successfully.
  • Redis: Cache reachable, ping response < 10ms.
  • RabbitMQ: Connection established, can publish/consume messages.
  • Resources: Disk < 85% full, memory < 85% used.
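
The criteria above reduce to a predicate over the health payload. The Python sketch below is illustrative only (the actual gate is the bash retry loop shown earlier); field names follow the example response:

```python
def evaluate_health(response: dict) -> bool:
    """Healthy only if the overall status and every sub-check report Healthy."""
    if response.get("status") != "Healthy":
        return False
    return all(v == "Healthy" for v in response.get("checks", {}).values())

# Example payload shaped like the health response above
sample = {
    "status": "Healthy",
    "checks": {"database": "Healthy", "redis": "Healthy",
               "rabbitmq": "Healthy", "disk": "Healthy", "memory": "Healthy"},
}
```

Note the deliberate asymmetry: an unknown or missing sub-check status is treated as unhealthy, matching the "all checks passed" criterion.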

Failure Action: If health checks fail, deployment rolled back; on-call team paged.

No Active Alerts:

Purpose: Verify no alerts fired post-deployment (indicates issues).

Execution:

- script: |
    ALERTS=$(curl -s "https://prometheus.connectsoft.com/api/v1/alerts" | jq '[.data.alerts[] | select(.state=="firing")] | length')

    if [ "$ALERTS" -gt "0" ]; then
      echo "ERROR: $ALERTS active alerts detected post-deployment"
      curl -s "https://prometheus.connectsoft.com/api/v1/alerts" | jq '.data.alerts[] | select(.state=="firing")'
      exit 1
    fi

    echo "No active alerts; deployment healthy"
  displayName: 'Check for Active Alerts'

Alert Categories Monitored:

  • Error Rate Spike: Error rate > 1% (baseline < 0.1%).
  • Latency Degradation: P95 latency > 1000ms (baseline ~500ms).
  • Resource Exhaustion: CPU > 90%, memory > 85%, disk > 90%.
  • Dependency Failures: Redis down, SQL unreachable, RabbitMQ disconnected.

Failure Action: If alerts detected, pause deployment; investigate before proceeding.

Rollback Triggers

Automated and manual triggers that initiate deployment rollback.

Automated Rollback Triggers:

Metric                 | Threshold       | Baseline | Action
-----------------------|-----------------|----------|----------------------------------
Error Rate             | > 1%            | < 0.1%   | Immediate rollback
P95 Latency            | > 2x baseline   | ~500ms   | Rollback after 5 min observation
Health Check Failures  | > 10% instances | 0%       | Immediate rollback
Throughput Degradation | > 50% drop      | Varies   | Rollback after 5 min observation
Critical Alerts        | Any fired       | None     | Immediate rollback
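
The trigger table reduces to simple decision logic. The following Python sketch is illustrative (metric names and return labels are hypothetical, not part of the pipeline):

```python
def rollback_action(metric: str, value: float, baseline: float = 0.0) -> str:
    """Map a post-deployment observation to the action in the trigger table.
    Metric names and return labels are hypothetical."""
    if metric == "error_rate_pct" and value > 1.0:
        return "immediate_rollback"
    if metric == "p95_latency_ms" and baseline > 0 and value > 2 * baseline:
        return "rollback_after_5min_observation"
    if metric == "unhealthy_instances_pct" and value > 10.0:
        return "immediate_rollback"
    if metric == "throughput_drop_pct" and value > 50.0:
        return "rollback_after_5min_observation"
    if metric == "critical_alerts" and value > 0:
        return "immediate_rollback"
    return "continue"
```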

Automated Rollback Execution:

postRouteTraffic:
  steps:
  - script: |
      ERROR_RATE=$(query_prometheus "error_rate")

      # Compare as a float; bash's integer -gt fails on values like "0.4"
      if (( $(echo "$ERROR_RATE > 1" | bc -l) )); then
        echo "ERROR: Error rate ${ERROR_RATE}% exceeds threshold 1%; initiating rollback"
        exit 1  # Triggers 'on: failure' block
      fi
    displayName: 'Monitor Error Rate'

on:
  failure:
    steps:
    - script: echo "Automated rollback triggered"
    - task: AzureAppServiceManage@0
      displayName: 'Rollback (Swap Slots)'
      inputs:
        azureSubscription: $(azureSubscription)
        action: 'Swap Slots'
        webAppName: atp-ingestion-prod
        sourceSlot: 'production'
        targetSlot: 'staging'
    - script: |
        # Notify on-call team
        curl -X POST https://api.pagerduty.com/incidents \
          -H "Authorization: Token token=$(PagerDutyApiKey)" \
          -H "Content-Type: application/json" \
          -d '{"incident":{"type":"incident","title":"Automated rollback triggered for ATP Ingestion"}}'
      displayName: 'Alert On-Call Team'

Manual Abort:

Trigger: SRE on-call or Platform Lead manually aborts deployment.

Scenarios:

  • Customer reports issues correlating with deployment.
  • Monitoring shows degradation not captured by automated thresholds.
  • Compliance issue discovered (e.g., data residency violation).

Execution:

  1. SRE navigates to Azure DevOps pipeline run.
  2. Clicks "Cancel" button (or uses Azure CLI: az pipelines run cancel --id <runId>).
  3. Pipeline halts immediately.
  4. on: failure block executes (rollback steps).
  5. Incident ticket created for investigation.

Rollback Communication:

- script: |
    # Update status page
    # Statuspage requires an OAuth API key (variable name illustrative)
    curl -X POST https://api.statuspage.io/v1/pages/$(StatusPageId)/incidents \
      -H "Authorization: OAuth $(StatusPageApiKey)" \
      -d "name=ATP Deployment Rollback&status=investigating&message=Deployment rolled back due to elevated error rate"

    # Notify customer success team
    curl -X POST https://hooks.slack.com/services/$(SlackWebhook) \
      -d '{"text":"🚨 ATP Production deployment rolled back - investigating"}'
  displayName: 'Communicate Rollback'

Post-Rollback:

  1. Incident Investigation: Root cause analysis (RCA) to determine failure reason.
  2. Fix Forward: Develop fix, test in dev/test/staging, redeploy.
  3. Retrospective: Team reviews what went wrong, how to prevent recurrence.
  4. Process Improvement: Update quality gates, add tests, improve monitoring.

Rationale: Release gates protect production from bad deployments. Pre-deployment gates validate readiness; post-deployment gates validate success. Automated rollback minimizes blast radius; manual abort provides human override when needed.


Pipeline Observability & Metrics

Pipeline observability provides real-time visibility into CI/CD health, performance, and outcomes. ATP tracks pipeline metrics, builds dashboards, and integrates with the broader observability stack (Grafana, Prometheus, Application Insights) to enable data-driven decisions, bottleneck identification, and continuous improvement.

Metrics are collected at every pipeline stage (lint, build, test, deploy) and correlated with service behavior in production. This end-to-end observability enables teams to answer critical questions: Which services have slow pipelines? Are deployments getting more frequent? What's the lead time from commit to production? Which tests are flaky?

Pipeline Metrics (Tracked)

ATP tracks four primary pipeline metrics that provide insights into CI/CD performance and quality. These metrics are collected automatically by Azure Pipelines and exported to Azure Monitor/Application Insights for long-term retention and analysis.

Build Duration

Metric: Time from pipeline start to CI stage completion (lint → build → test → publish).

Target: < 10 minutes for CI stage.

Measurement:

- script: |
    START_TIME=$(date -u +%s)
    echo "##vso[task.setvariable variable=buildStartTime]$START_TIME"
  displayName: 'Record Build Start Time'

# ... build steps ...

- script: |
    END_TIME=$(date -u +%s)
    # buildStartTime was set with ##vso above; reference it via macro syntax
    DURATION=$((END_TIME - $(buildStartTime)))
    echo "Build duration: ${DURATION}s"

    if [ "$DURATION" -gt "600" ]; then
      echo "##vso[task.logissue type=warning]Build duration ${DURATION}s exceeds target 600s"
    fi

    # Export metric to Azure Monitor
    curl -X POST https://monitoring.connectsoft.com/api/metrics \
      -d "{\"metric\":\"pipeline.build.duration\",\"value\":$DURATION,\"service\":\"atp-ingestion\"}"
  displayName: 'Measure Build Duration'

Tracking:

Azure DevOps automatically tracks pipeline duration; visible in:

  • Pipeline Run Summary: Total duration and per-stage breakdown.
  • Analytics: Duration trends over time (line chart).
  • Dashboards: Custom widgets showing duration percentiles (P50, P95, P99).

Duration Breakdown (example):

Pipeline: ATP-Ingestion-CI-CD #142
  Total Duration: 8m 23s

  Stage: CI_Stage (8m 15s)
    Job: Build_Test_Publish
      Checkout: 12s
      Lint: 1m 45s
        - StyleCop: 23s
        - SonarQube Prepare: 1m 22s
      Build: 2m 10s
        - dotnet restore: 45s
        - dotnet build: 1m 25s
      Test: 3m 48s
        - dotnet test: 3m 12s
        - Publish coverage: 36s
      Docker Build: 1m 30s
      Publish Artifacts: 42s

Optimization Triggers:

  • Duration > 10 minutes: Alert Platform team; investigate slow steps.
  • Duration increase > 20%: Investigate recent changes (new tests, dependencies).
  • Optimization Strategies:
    • Parallelize tests (split assemblies across agents).
    • Cache NuGet packages (Cache@2 task).
    • Optimize Docker layer caching (order layers by change frequency).
    • Skip Docker build for non-containerized deployments.

Performance Trends:

Track duration trends to identify degradation:

Week 1: Avg 7m 12s, P95 8m 45s
Week 2: Avg 7m 34s, P95 9m 12s  ⚠️ +5% increase
Week 3: Avg 8m 56s, P95 10m 34s ❌ Exceeds target; optimization needed
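
The degradation check implied by the optimization triggers (alert on a >20% duration increase) can be sketched in Python; the function name and data are illustrative:

```python
def duration_regressed(weekly_avg_seconds: list[float], threshold_pct: float = 20.0) -> bool:
    """True when the latest weekly average build duration has grown more than
    threshold_pct relative to the first observed week."""
    if len(weekly_avg_seconds) < 2:
        return False
    first, latest = weekly_avg_seconds[0], weekly_avg_seconds[-1]
    return (latest - first) / first * 100 > threshold_pct

# Weeks 1-3 from the trend above: 7m12s, 7m34s, 8m56s
weeks = [432, 454, 536]
```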

Rationale: Fast pipelines enable rapid iteration. 10-minute target keeps developers in flow state (don't context-switch while waiting). Tracking trends prevents gradual degradation.

Test Duration

Metric: Time to execute all unit and integration tests.

Target: < 5 minutes; parallelize if exceeds.

Measurement:

Test duration automatically tracked by dotnet test and published to Azure DevOps Test Plans.

Test Duration Report (example):

Test Run: ATP-Ingestion-Tests #142
  Total Tests: 1,234
  Passed: 1,231 (99.8%)
  Failed: 0
  Skipped: 3
  Duration: 4m 23s

  Test Assemblies:
    - ConnectSoft.ATP.Ingestion.Domain.Tests: 1m 12s (482 tests)
    - ConnectSoft.ATP.Ingestion.Api.Tests: 45s (231 tests)
    - ConnectSoft.ATP.Ingestion.Infrastructure.Tests: 2m 26s (521 tests) ⚠️ Slow

Slow Test Identification:

Azure DevOps Test Plans highlights slow tests:

Slowest Tests:
  1. EventProcessor_HandleHighVolumeLoad_Test: 45s ⚠️
  2. DatabaseRepository_BulkInsert_1000Records_Test: 23s
  3. IntegrityService_VerifySegment_LargeDataset_Test: 18s

Optimization Strategies:

  1. Parallelize Test Assemblies:
- task: DotNetCoreCLI@2
  displayName: 'Run Tests (Parallel)'
  inputs:
    command: 'test'
    projects: '**/*Tests.csproj'
    # dotnet test has no --parallel flag; parallelize assemblies via RunSettings
    arguments: '--configuration Release --no-build -- RunConfiguration.MaxCpuCount=0'
  2. Category-Based Execution:

Separate fast unit tests from slow integration tests:

# Fast unit tests (run always)
- task: DotNetCoreCLI@2
  inputs:
    command: 'test'
    arguments: '--filter Category=Unit -- RunConfiguration.MaxCpuCount=0'
  displayName: 'Run Unit Tests'

# Slow integration tests (run on master branch only)
- task: DotNetCoreCLI@2
  condition: eq(variables['Build.SourceBranch'], 'refs/heads/master')
  inputs:
    command: 'test'
    arguments: '--filter Category=Integration -- RunConfiguration.MaxCpuCount=0'
  displayName: 'Run Integration Tests'
  3. Test Data Optimization:

Reduce test data volume for faster execution:

// BAD: Slow test (inserts 10,000 records)
await SeedAuditRecords(count: 10_000);

// GOOD: Fast test (inserts minimum needed)
await SeedAuditRecords(count: 10);  // Enough to test pagination
  4. Shared Test Fixtures:

Reuse expensive setup across tests (ClassFixture in xUnit):

public class DatabaseFixture : IDisposable
{
    public DatabaseFixture()
    {
        // Expensive: Migrate database, seed reference data
    }

    public void Dispose() { /* cleanup */ }
}

public class RepositoryTests : IClassFixture<DatabaseFixture>
{
    // All tests share same database instance
}

Rationale: Test duration directly impacts developer productivity. 5-minute target keeps feedback fast. Parallelization scales with test suite growth. Category-based execution balances speed (unit tests) and coverage (integration tests).

Success Rate

Metric: Percentage of pipeline runs that complete successfully (no failures).

Target: ≥95% per service.

Calculation:

Success Rate = (Successful Runs / Total Runs) * 100

Example:
  Last 30 days: 28 successful, 2 failed
  Success Rate: (28 / 30) * 100 = 93.3% ⚠️ Below target

Tracking:

Azure DevOps Analytics provides built-in success rate tracking:

  • Pipeline Analytics: Success/failure trends (line chart).
  • Per-Service Dashboards: Success rate widget (gauge chart).
  • Alerting: Email/Teams notification if success rate < 95%.

Failure Categorization:

Track why pipelines fail to identify patterns:

Failure Reason         | Count (Last 30 Days) | % of Failures | Action
-----------------------|----------------------|---------------|------------------------------------------------------
Test Failures          | 12                   | 60%           | Fix flaky tests; improve test stability
Security Scan Failures | 5                    | 25%           | Update vulnerable packages; suppress false positives
Build Errors           | 2                    | 10%           | Code quality issues; developer training
Infrastructure         | 1                    | 5%            | Service container unavailable; improve retry logic

Improvement Actions:

  • Success Rate 90-95%: Review failures; fix flaky tests.
  • Success Rate 85-90%: Urgent action required; freeze new features until stability restored.
  • Success Rate < 85%: Escalate to Platform Lead; conduct root cause analysis.
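
The calculation and escalation tiers above can be combined into one helper. This Python sketch is illustrative only (function names and return labels are hypothetical):

```python
def success_rate(successful: int, total: int) -> float:
    """Success rate as a percentage of total runs."""
    return successful / total * 100

def improvement_action(rate_pct: float) -> str:
    """Map a success rate onto the escalation tiers above (labels hypothetical)."""
    if rate_pct >= 95:
        return "healthy"
    if rate_pct >= 90:
        return "review_failures"
    if rate_pct >= 85:
        return "feature_freeze"
    return "escalate_to_platform_lead"
```

For the worked example (28 successful of 30 runs, 93.3%), the mapped action is to review failures and fix flaky tests.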

Rationale: Success rate indicates pipeline health and developer experience quality. Low success rate (< 95%) erodes trust; developers ignore failures. High success rate (> 95%) builds confidence; failures taken seriously.

Artifact Size

Metric: Size of published artifacts (binaries, Docker images, NuGet packages).

Target: < 100 MB per service (compressed).

Measurement:

- script: |
    ARTIFACT_SIZE=$(du -sh $(Build.ArtifactStagingDirectory) | cut -f1)
    ARTIFACT_SIZE_MB=$(du -sm $(Build.ArtifactStagingDirectory) | cut -f1)

    echo "Artifact size: ${ARTIFACT_SIZE}"

    if [ "$ARTIFACT_SIZE_MB" -gt "100" ]; then
      echo "##vso[task.logissue type=warning]Artifact size ${ARTIFACT_SIZE_MB}MB exceeds target 100MB"
    fi

    # Export metric
    curl -X POST https://monitoring.connectsoft.com/api/metrics \
      -d "{\"metric\":\"pipeline.artifact.size\",\"value\":$ARTIFACT_SIZE_MB,\"service\":\"atp-ingestion\"}"
  displayName: 'Measure Artifact Size'

Size Tracking (example trend):

Service: ATP-Ingestion
  Version 1.0.10: 42 MB
  Version 1.0.20: 48 MB
  Version 1.0.30: 56 MB  ⚠️ +17% growth in 20 builds
  Version 1.0.40: 62 MB

Growth Analysis:

Investigate artifact size growth:

  • New Dependencies: Which NuGet packages added? Are they necessary?
  • Unused Code: Dead code not tree-shaken (analyze with dotnet-unused).
  • Large Files: Images, PDFs, data files committed (move to external storage).
  • Debug Symbols: PDB files included in Release build (should be separate).
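
The growth check over a size history like the trend above is a one-liner; Python sketch, illustrative only:

```python
def growth_pct(sizes_mb: list[float]) -> float:
    """Percentage growth from the first to the latest measured artifact size."""
    return (sizes_mb[-1] - sizes_mb[0]) / sizes_mb[0] * 100

# Versions 1.0.10 through 1.0.40 from the trend above
history = [42, 48, 56, 62]
```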

Optimization Strategies:

  1. Trim Unused Dependencies:
<PropertyGroup>
  <PublishTrimmed>true</PublishTrimmed>
  <TrimMode>full</TrimMode>  <!-- 'link' is the pre-.NET 6 name for full trimming -->
</PropertyGroup>
  2. Exclude Unnecessary Files:
<ItemGroup>
  <Content Remove="wwwroot/images/**/*.psd" />  <!-- Remove source files -->
  <Content Remove="**/*.xml" />  <!-- Remove XML docs from published output -->
</ItemGroup>
  3. Compress Artifacts:
- task: ArchiveFiles@2
  displayName: 'Compress Artifact'
  inputs:
    rootFolderOrFile: '$(Build.ArtifactStagingDirectory)'
    archiveType: 'zip'
    archiveFile: '$(Build.ArtifactStagingDirectory)/atp-ingestion.zip'
    includeRootFolder: false

Docker Image Size:

Docker images tracked separately (ACR provides size metrics):

Image: connectsoft/atp-ingestion:1.0.42
  Uncompressed: 215 MB
  Compressed: 78 MB

  Layers:
    - mcr.microsoft.com/dotnet/aspnet:8.0-alpine (base): 120 MB
    - Application binaries: 45 MB
    - Configuration: 2 MB
    - Dependencies: 48 MB

Rationale: Large artifacts slow deployments (longer download/upload times). Artifact size growth indicates bloat (unused dependencies, inefficient packaging). Monitoring prevents gradual degradation.

Azure DevOps Dashboards

Azure DevOps provides customizable dashboards for visualizing pipeline metrics, test trends, and deployment frequency. ATP maintains organization-wide dashboards for platform oversight and per-service dashboards for team-specific monitoring.

Build Health Dashboard

Purpose: Monitor pipeline success rate, duration trends, and failure reasons across all ATP services.

Widgets:

  1. Build Success Rate (Gauge):

    • Shows current success rate percentage.
    • Color-coded: Green (≥95%), Yellow (90-95%), Red (<90%).
    • Configuration: Last 30 days, all ATP pipelines.
  2. Build Duration Trend (Line Chart):

    • X-axis: Time (last 90 days).
    • Y-axis: Duration (minutes).
    • Lines: P50 (median), P95, P99.
    • Alerts: Annotation when duration exceeds target.
  3. Pass/Fail Heatmap (Calendar View):

    • Grid: Day of month (rows) × Service (columns).
    • Color: Green (all passed), Yellow (1-2 failures), Red (3+ failures).
    • Identifies patterns (Mondays have more failures? Specific service unstable?).
  4. Failure Reasons (Pie Chart):

    • Segments: Test failures (60%), Security scans (25%), Build errors (10%), Infrastructure (5%).
    • Drill-down: Click segment to see specific failures.

Dashboard Configuration (Azure DevOps):

# dashboard-config.json (exported from Azure DevOps)
{
  "name": "ATP Pipeline Health",
  "widgets": [
    {
      "name": "Build Success Rate",
      "type": "Microsoft.VisualStudioOnline.Dashboards.BuildSuccessRateWidget",
      "settings": {
        "pipelineIds": ["1", "2", "3", "4", "5", "6", "7"],
        "timePeriod": 30
      }
    },
    {
      "name": "Build Duration Trend",
      "type": "Microsoft.VisualStudioOnline.Dashboards.BuildDurationWidget",
      "settings": {
        "pipelineIds": ["1"],
        "chartType": "line",
        "aggregation": "percentile95"
      }
    }
  ]
}

Stakeholder Access:

  • Developers: View service-specific dashboards (own pipeline health).
  • Platform Team: View organization dashboard (all ATP services).
  • Leadership: Executive dashboard (deployment frequency, lead time, DORA metrics).

Alert Rules:

# Azure Monitor Alert Rule
{
  "name": "ATP-Build-Success-Rate-Alert",
  "condition": {
    "metric": "pipeline.success.rate",
    "operator": "LessThan",
    "threshold": 95,
    "evaluationFrequency": "1h"
  },
  "actions": [
    {
      "type": "email",
      "recipients": ["platform-team@connectsoft.com"]
    },
    {
      "type": "teams",
      "webhook": "$(TeamsWebhook)"
    }
  ]
}

Test Results Dashboard

Purpose: Track test trends, detect flaky tests, and monitor coverage over time.

Widgets:

  1. Test Pass Rate (Gauge):

    • Current pass rate: 99.7% (1,231 passed, 3 failed).
    • Target: 100%.
  2. Coverage Trends (Line Chart):

    • X-axis: Build number (last 50 builds).
    • Y-axis: Coverage percentage.
    • Lines: Line coverage, branch coverage.
    • Trend: Increasing, stable, or decreasing.
  3. Flaky Test Detection (Table):

    • Columns: Test Name, Pass Rate (last 30 runs), Last Failure.
    • Rows: Tests with < 100% pass rate (flaky candidates).
    • Action: Link to work item to fix flakiness.
  4. Test Duration Trends (Stacked Bar Chart):

    • X-axis: Build number.
    • Y-axis: Duration (minutes).
    • Stacks: Unit tests (green), Integration tests (blue), E2E tests (orange).

Flaky Test Report (example):

Flaky Tests Detected (Last 30 Days):

Test Name                                          | Pass Rate | Failures | Last Failure
--------------------------------------------------|-----------|----------|-------------
Cache_Invalidation_PropagatesAcrossInstances      | 80%       | 6/30     | 2025-10-28
EventProcessor_HandleConcurrentWrites_Test        | 90%       | 3/30     | 2025-10-25
IntegrityVerifier_VerifyChain_UnderLoad_Test      | 93%       | 2/30     | 2025-10-22

Action: Quarantine flaky tests (e.g., [Fact(Skip = "...")] in xUnit); create work items to fix the root cause.
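
Flaky-candidate detection as described (tests whose recent pass rate is strictly between 0% and 100%) can be sketched in Python; names and the minimum-runs threshold are illustrative:

```python
def flaky_tests(history: dict[str, list[bool]], min_runs: int = 10) -> dict[str, int]:
    """Pass rate (%) for tests that both passed and failed over recent runs."""
    report = {}
    for name, runs in history.items():
        if len(runs) < min_runs:
            continue  # too few runs to judge flakiness
        rate = sum(runs) / len(runs) * 100
        if 0 < rate < 100:  # consistently failing tests are broken, not flaky
            report[name] = round(rate)
    return report

# 6 failures in 30 runs, like the first flaky test above (name shortened)
runs = {"Cache_Invalidation": [True] * 24 + [False] * 6,
        "Stable_Test": [True] * 30}
```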

Coverage Trend Analysis:

- script: |
    # Fetch last 10 builds' coverage
    COVERAGE_HISTORY=$(az pipelines runs list --pipeline-ids 1 --top 10 \
      --query "[].{build:id,coverage:customDimensions.codeCoverage}" -o json)

    # Calculate trend (increasing, stable, decreasing)
    python analyze_coverage_trend.py "$COVERAGE_HISTORY"
  displayName: 'Analyze Coverage Trend'
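
The step above invokes analyze_coverage_trend.py without showing it; one plausible minimal implementation of its classification logic (the half-vs-half comparison and 0.5-point tolerance are assumptions, not the actual script):

```python
def coverage_trend(history: list[float], tolerance: float = 0.5) -> str:
    """Classify coverage history by comparing the average of the older half
    against the newer half; tolerance absorbs measurement noise."""
    mid = len(history) // 2
    older = sum(history[:mid]) / mid
    newer = sum(history[mid:]) / (len(history) - mid)
    if newer - older > tolerance:
        return "increasing"
    if older - newer > tolerance:
        return "decreasing"
    return "stable"
```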

Rationale: Test metrics reveal quality trends. Flaky tests indicate instability (race conditions, timing issues). Coverage trends show whether quality is improving or degrading.

Deployment Frequency

Metric: Number of deployments per environment per time period.

Target: Varies by environment (Dev: daily, Test: daily, Staging: weekly, Production: bi-weekly).

Measurement:

Azure DevOps tracks deployments to environments automatically:

Environment: ATP-Production
  Deployments (Last 30 Days): 4
  Frequency: ~1 per week
  Success Rate: 100% (4/4 successful)

DORA Metrics Integration:

Deployment frequency is one of four key DORA (DevOps Research and Assessment) metrics:

  1. Deployment Frequency: How often code deployed to production.
  2. Lead Time for Changes: Time from commit to production deployment.
  3. Mean Time to Recovery (MTTR): Time to restore service after incident.
  4. Change Failure Rate: Percentage of deployments causing failures.
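
Mapping measured values onto DORA performance tiers can be sketched as below; the cluster boundaries approximate the published DORA report groupings and are illustrative only:

```python
def lead_time_tier(days: float) -> str:
    """Approximate DORA tier for lead time for changes."""
    if days < 1:
        return "Elite"
    if days <= 7:
        return "High"
    if days <= 30:
        return "Medium"
    return "Low"

def mttr_tier(hours: float) -> str:
    """Approximate DORA tier for mean time to recovery."""
    if hours < 1:
        return "Elite"
    if hours <= 24:
        return "High"
    return "Medium/Low"
```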

DORA Metrics Dashboard:

ATP Platform - DORA Metrics (Last Quarter)

Deployment Frequency: 2.1 deployments/week
  Target: Elite (on-demand, multiple per day)
  Current: High (between once per day and once per week)
  Trend: ↗️ Increasing

Lead Time for Changes: 5.2 days
  Target: Elite (< 1 day)
  Current: High (1 day - 1 week)
  Trend: → Stable

Mean Time to Recovery: 1.2 hours
  Target: Elite (< 1 hour)
  Current: High (1-24 hours)
  Trend: ↘️ Improving

Change Failure Rate: 8%
  Target: Elite (0-15%)
  Current: Elite (target met)
  Trend: → Stable

Deployment Frequency by Environment:

Environment         | Last 7 Days | Last 30 Days | Avg per Week
--------------------|-------------|--------------|-------------
Dev                 | 23          | 94           | 23.5
Test                | 15          | 62           | 15.5
Staging             | 4           | 8            | 2.0
Production          | 1           | 4            | 1.0

Improvement Goals:

  • Increase Deployment Frequency: Move from weekly to daily production deployments (requires higher confidence in automation).
  • Reduce Lead Time: Automate staging approvals (remove manual gate after proving stability).

Rationale: Deployment frequency indicates agility. Frequent deployments = smaller changesets = lower risk. DORA metrics enable benchmarking against industry standards.

Lead Time

Metric: Time from commit to production deployment.

Target: < 24 hours (Elite DORA performance: < 1 day).

Measurement:

Calculate time between Git commit and production deployment event:

- script: |
    COMMIT_TIME=$(git show -s --format=%ct $(Build.SourceVersion))
    DEPLOY_TIME=$(date +%s)
    LEAD_TIME=$((DEPLOY_TIME - COMMIT_TIME))
    LEAD_TIME_HOURS=$((LEAD_TIME / 3600))

    echo "Lead time: ${LEAD_TIME_HOURS} hours"

    # Export to Azure Monitor
    curl -X POST https://monitoring.connectsoft.com/api/metrics \
      -d "{\"metric\":\"pipeline.lead.time\",\"value\":$LEAD_TIME_HOURS,\"service\":\"atp-ingestion\"}"
  displayName: 'Calculate Lead Time'

Lead Time Breakdown (example):

Commit to Production: 4.3 days (103.75 hours)

  Commit: 2025-10-25 09:00 UTC
  CI Build Triggered: 2025-10-25 09:02 UTC (2 min delay)
  CI Complete: 2025-10-25 09:10 UTC (8 min)
  Deploy to Dev: 2025-10-25 09:15 UTC (5 min)
  Deploy to Test: 2025-10-25 10:30 UTC (1h 15m wait)
  Staging Approval: 2025-10-26 14:00 UTC (27.5h wait) ⚠️ Bottleneck
  Deploy to Staging: 2025-10-26 14:30 UTC (30 min)
  Production Approval: 2025-10-29 16:00 UTC (73.5h wait) ⚠️ Bottleneck
  Deploy to Production: 2025-10-29 16:45 UTC (45 min)

Bottleneck Identification:

  • Manual Approvals: 101 hours (97% of total lead time).
  • Pipeline Execution & Deployments: ~1.5 hours (1.5% of total lead time).
  • Waiting Between Stages: ~1.25 hours (1% of total lead time).
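
The percentage breakdown is simply each stage's share of total lead time; a Python sketch with hypothetical timings:

```python
def bottleneck_breakdown(stage_hours: dict[str, float]) -> dict[str, int]:
    """Express each stage's share of total lead time as a rounded percentage."""
    total = sum(stage_hours.values())
    return {stage: round(hours / total * 100) for stage, hours in stage_hours.items()}

# Hypothetical timings: approvals dominate the lead time
stages = {"approvals": 50.0, "pipeline": 5.0, "waiting": 5.0}
```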

Optimization:

  • Reduce Manual Approval Time: Automate approvals after proving stability (e.g., auto-approve staging after 50 successful deployments).
  • Faster Pipelines: Optimize CI duration (parallelization, caching).
  • Scheduled Deployments: Batch changes for weekly production release (reduces approval overhead).

Rationale: Lead time measures responsiveness. Long lead time = slow feedback = delayed value delivery. Tracking reveals bottlenecks (manual approvals, slow pipelines).

Dashboard Tiers

ATP maintains three tiers of dashboards for different stakeholder needs:

Platform Dashboard (Organization-Wide)

Audience: Platform Engineering, SRE, Leadership.

Widgets:

  • All Services Success Rate: Aggregate across 7 ATP services.
  • Deployment Frequency: Deployments per week (all environments).
  • DORA Metrics: Four key metrics with targets and trends.
  • Active Alerts: Count of active pipeline failures (real-time).
  • Top Failing Pipelines: Services with lowest success rate (need attention).

URL: https://dev.azure.com/dmitrykhaymov/ATP/_dashboards/dashboard/platform-health

Service Dashboard (Per-Service)

Audience: Service teams (Ingestion, Query, Gateway, etc.).

Widgets (ATP Ingestion Example):

  • Build Success Rate: Last 30 days.
  • Build Duration Trend: P95 over time.
  • Test Pass Rate: Last 30 runs.
  • Code Coverage Trend: Line and branch coverage.
  • Deployment History: Last 10 deployments (environment, version, approver, outcome).
  • Flaky Tests: Tests with < 100% pass rate.

URL: https://dev.azure.com/dmitrykhaymov/ATP/_dashboards/dashboard/ingestion-health

Executive Dashboard (Leadership View)

Audience: CTO, Engineering Directors, Product Management.

Widgets:

  • Deployment Frequency: Trend over quarters (Q1: 1.2/week, Q2: 1.8/week, Q3: 2.1/week).
  • Lead Time: Median and P95 (target: < 1 day).
  • Change Failure Rate: Percentage of deployments requiring rollback.
  • MTTR (Mean Time to Recovery): Average time to recover from incidents.
  • Quality Trends: Test coverage, security scan pass rate, code smells.

URL: https://dev.azure.com/dmitrykhaymov/ATP/_dashboards/dashboard/executive-summary

Export to PowerBI:

Executive dashboard data exported to PowerBI for advanced visualization:

- task: PowerBIActions@1
  displayName: 'Export Metrics to PowerBI'
  inputs:
    action: 'PublishDataset'
    workspaceId: '$(PowerBIWorkspaceId)'
    datasetName: 'ATP-Pipeline-Metrics'
    dataPath: '$(Build.ArtifactStagingDirectory)/metrics.json'

Integration with Observability Stack

ATP pipelines integrate with the platform observability stack (Grafana, Prometheus, Seq, Application Insights) to provide unified visibility across CI/CD and runtime behavior. Pipeline events are emitted as metrics and traces, enabling correlation between deployments and service health.

Pipeline Events

Purpose: Emit pipeline lifecycle events (start, complete, fail) to Azure Monitor for correlation with service deployments and incidents.

Event Schema:

{
  "eventType": "PipelineCompleted",
  "timestamp": "2025-10-30T14:30:00Z",
  "pipelineId": "1",
  "pipelineName": "ATP-Ingestion-CI-CD",
  "buildNumber": "1.0.42",
  "commitSha": "a1b2c3d4e5f6a7b8c9d0",
  "result": "Succeeded",
  "duration": 600,
  "stages": {
    "CI_Stage": {"duration": 480, "result": "Succeeded"},
    "CD_Dev": {"duration": 120, "result": "Succeeded"}
  },
  "deployments": [
    {
      "environment": "Dev",
      "service": "atp-ingestion-dev",
      "version": "1.0.42",
      "timestamp": "2025-10-30T14:35:00Z"
    }
  ]
}

Emission:

- script: |
    EVENT_PAYLOAD=$(cat <<EOF
    {
      "eventType": "PipelineCompleted",
      "pipelineName": "$(Build.DefinitionName)",
      "buildNumber": "$(Build.BuildNumber)",
      "commitSha": "$(Build.SourceVersion)",
      "result": "$(Agent.JobStatus)"
    }
    EOF
    )

    curl -X POST https://monitoring.connectsoft.com/api/events \
      -H "Content-Type: application/json" \
      -d "$EVENT_PAYLOAD"
  displayName: 'Emit Pipeline Event'
  condition: always()  # Run even if pipeline fails

Correlation with Service Deployments:

When service deployed, correlate pipeline event with runtime behavior:

# Grafana query: Deployments annotated on service metrics graph
rate(http_requests_total[5m]) 
  and on() 
  deployment_event{service="atp-ingestion", version="1.0.42"}

Grafana Annotations:

Deployments appear as vertical lines on service metrics dashboards:

- script: |
    # Compute epoch milliseconds up front; command substitution does not run
    # inside a single-quoted JSON payload
    NOW_MS=$(( $(date +%s) * 1000 ))

    curl -X POST https://grafana.connectsoft.com/api/annotations \
      -H "Authorization: Bearer $(GrafanaApiKey)" \
      -H "Content-Type: application/json" \
      -d "{
        \"dashboardId\": 12,
        \"time\": ${NOW_MS},
        \"tags\": [\"deployment\", \"atp-ingestion\"],
        \"text\": \"Deployed version $(Build.BuildNumber) to Production\"
      }"
  displayName: 'Create Grafana Annotation'

Benefits:

  • Incident Correlation: Link production issues to recent deployments.
  • Performance Analysis: Compare metrics before/after deployment.
  • Rollback Decisions: Quickly identify if deployment caused degradation.

Tracing (OpenTelemetry)

Purpose: Instrument pipeline steps as OpenTelemetry spans for distributed tracing.

Implementation:

Each pipeline step emits OpenTelemetry span:

- script: |
    # Start span
    # OpenTelemetry expects 32 hex chars for trace IDs, 16 for span IDs
    TRACE_ID=$(uuidgen | tr -d '-')
    SPAN_ID=$(uuidgen | tr -d '-' | cut -c1-16)
    START_TIME=$(date -u +%s%N)

    echo "##vso[task.setvariable variable=traceId]$TRACE_ID"
    echo "##vso[task.setvariable variable=spanId]$SPAN_ID"
    echo "##vso[task.setvariable variable=spanStartTime]$START_TIME"

    # Export span start
    export OTEL_EXPORTER_OTLP_ENDPOINT=https://otel.connectsoft.com:4317
    otel-cli span start --name "dotnet-build" --trace-id $TRACE_ID --span-id $SPAN_ID
  displayName: 'Start Build Span'

- template: build/build-microservice-steps.yaml@templates

- script: |
    # End span
    END_TIME=$(date -u +%s%N)
    # spanStartTime was set with ##vso earlier; reference it via macro syntax
    DURATION=$((END_TIME - $(spanStartTime)))

    otel-cli span end --trace-id $(traceId) --span-id $(spanId) --duration $DURATION
  displayName: 'End Build Span'

Trace Structure:

Trace: ATP-Ingestion Pipeline Run #142
  Span: CI_Stage (duration: 8m 15s)
    Span: Lint (duration: 1m 45s)
      Span: StyleCop (duration: 23s)
      Span: SonarQube Prepare (duration: 1m 22s)
    Span: Build (duration: 2m 10s)
      Span: dotnet restore (duration: 45s)
      Span: dotnet build (duration: 1m 25s)
    Span: Test (duration: 3m 48s)
      Span: dotnet test (duration: 3m 12s)
      Span: Publish coverage (duration: 36s)
    Span: Docker Build (duration: 1m 30s)
    Span: Publish Artifacts (duration: 42s)
  Span: CD_Dev (duration: 2m 30s)
    Span: Download Artifact (duration: 15s)
    Span: Deploy to Azure (duration: 1m 45s)
    Span: Health Check (duration: 30s)

Visualization (Jaeger/Grafana):

Traces visualized in Jaeger UI or Grafana Tempo:

  • Timeline View: Horizontal bars showing span duration (identify slow steps).
  • Service Map: Dependencies between pipeline steps.
  • Error Highlighting: Failed spans highlighted in red.

Benefits:

  • Performance Bottlenecks: Identify slowest pipeline steps (optimize first).
  • Distributed Tracing: Correlate pipeline spans with service spans (end-to-end traceability).
  • Troubleshooting: Debug pipeline failures with detailed span attributes (error messages, environment variables).

Span Attributes (example):

{
  "trace.id": "abc123",
  "span.id": "def456",
  "span.name": "dotnet-build",
  "duration": 125000,
  "attributes": {
    "pipeline.name": "ATP-Ingestion-CI-CD",
    "build.number": "1.0.42",
    "commit.sha": "a1b2c3d4",
    "service.name": "atp-ingestion",
    "build.result": "Succeeded"
  }
}

Logs (Centralized)

Purpose: Centralize pipeline logs in Seq or Azure Log Analytics for querying, alerting, and long-term retention.

Log Emission:

- script: |
    # Emit structured log to Seq
    curl -X POST https://seq.connectsoft.com/api/events/raw \
      -H "Content-Type: application/json" \
      -d '{
        "@t": "'$(date -u +%Y-%m-%dT%H:%M:%SZ)'",
        "@mt": "Pipeline stage {StageName} completed with result {Result}",
        "StageName": "CI_Stage",
        "Result": "Succeeded",
        "PipelineName": "ATP-Ingestion-CI-CD",
        "BuildNumber": "$(Build.BuildNumber)",
        "Duration": 480
      }'
  displayName: 'Emit Pipeline Log to Seq'
  condition: always()

Structured Logging Benefits:

  • Queryable: Search logs by pipeline name, build number, result.
  • Alerting: Create alerts on specific log patterns (e.g., "Security scan failed").
  • Correlation: Link pipeline logs with service logs (same correlation ID).

Log Queries (Seq):

-- Find all failed builds in last 7 days
SELECT PipelineName, BuildNumber, Result, Duration
FROM stream
WHERE Result = 'Failed' 
  AND @Timestamp > Now() - 7d
ORDER BY @Timestamp DESC

Azure Log Analytics Integration:

- task: AzureCLI@2
  displayName: 'Export Logs to Log Analytics'
  inputs:
    azureSubscription: $(azureSubscription)
    scriptType: 'bash'
    scriptLocation: 'inlineScript'
    inlineScript: |
      # Create the custom table (custom log table names must end in _CL,
      # and every custom table needs a TimeGenerated column)
      az monitor log-analytics workspace table create \
        --resource-group $(resourceGroup) \
        --workspace-name atp-logs \
        --name PipelineLogs_CL \
        --columns TimeGenerated=datetime PipelineName=string BuildNumber=string Result=string

      # Ingest a log entry via the Azure Monitor Logs Ingestion API
      # (requires a Data Collection Endpoint and Rule; the endpoint, rule ID,
      # and stream name below are placeholders)
      az rest --method post \
        --resource "https://monitor.azure.com" \
        --url "https://<dce-endpoint>/dataCollectionRules/<dcr-immutable-id>/streams/Custom-PipelineLogs_CL?api-version=2023-01-01" \
        --body '[{"TimeGenerated":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","PipelineName":"ATP-Ingestion-CI-CD","BuildNumber":"1.0.42","Result":"Succeeded"}]'

Kusto Query (Log Analytics):

PipelineLogs_CL
| where Result == "Failed"
| where TimeGenerated > ago(7d)
| summarize FailureCount = count() by PipelineName
| order by FailureCount desc

Log Retention:

  • Seq: 90 days (hot storage); archived to Azure Blob Storage (7 years).
  • Log Analytics: 30 days (default); extended to 1 year for compliance.
  • Azure Pipelines: 30 days (build logs); extended to 1 year for production builds.

Rationale: Centralized logs enable cross-pipeline analysis (find patterns across services). Structured logging enables querying (not just text search). Long retention supports compliance audits.
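
The retention tiers above reduce to a simple lookup. A sketch, assuming the hot/archive day counts listed (store keys are illustrative):

```python
# Retention policy transcribed from the bullets above: (hot_days, archive_days)
RETENTION_DAYS = {
    "seq": (90, 7 * 365),          # 90d hot, then Blob archive for 7 years
    "log_analytics": (30, 365),    # 30d default, 1 year for compliance
    "azure_pipelines": (30, 365),  # 30d default, 1 year for production builds
}

def where_available(store, age_days):
    """Classify a log of the given age: hot store, archive tier, or expired."""
    hot, archive = RETENTION_DAYS[store]
    if age_days <= hot:
        return "hot"
    if age_days <= archive:
        return "archive"
    return "expired"

print(where_available("seq", 45), where_available("log_analytics", 400))
```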

See Also: Observability stack architecture, telemetry collection, and dashboard configurations in operations/observability.md.


Infrastructure as Code (IaC) Pipelines

ATP infrastructure is fully codified using Pulumi with C#, enabling reproducible, version-controlled provisioning of Azure resources across all environments. IaC pipelines automate the creation and updating of infrastructure (App Services, databases, networking, security) with the same rigor as application code — linting, testing, preview, approval gates, and rollback capabilities.

Pulumi was chosen over alternatives (Terraform, Bicep, ARM templates) for ATP because it enables strongly-typed C# infrastructure code that integrates seamlessly with the .NET ecosystem, provides superior IDE support (IntelliSense, refactoring), and allows code reuse (shared libraries, helper methods) familiar to ConnectSoft engineers.

Pulumi Deployment with C#

Pulumi infrastructure is organized as C# console applications that declare desired state using the Pulumi SDK. The pipeline compiles the Pulumi program, executes pulumi preview to show changes, waits for approval (staging/production), then executes pulumi up to apply changes.

Infrastructure Repository Structure

Repository: ConnectSoft.ATP.Infrastructure

Directory Structure:

ConnectSoft.ATP.Infrastructure/
├── src/
│   ├── ConnectSoft.ATP.Infrastructure/
│   │   ├── Program.cs                    # Entry point
│   │   ├── Stacks/
│   │   │   ├── DevStack.cs              # Dev environment
│   │   │   ├── TestStack.cs             # Test environment
│   │   │   ├── StagingStack.cs          # Staging environment
│   │   │   └── ProductionStack.cs       # Production environment
│   │   ├── Resources/
│   │   │   ├── AppServiceResources.cs   # App Service + Plan
│   │   │   ├── DatabaseResources.cs     # SQL Database + Elastic Pool
│   │   │   ├── CacheResources.cs        # Redis Cache
│   │   │   ├── ServiceBusResources.cs   # Service Bus namespace + queues
│   │   │   ├── StorageResources.cs      # Blob Storage (WORM)
│   │   │   ├── NetworkResources.cs      # VNet, Subnets, NSGs
│   │   │   └── SecurityResources.cs     # Key Vault, Managed Identities
│   │   ├── Helpers/
│   │   │   ├── NamingConventions.cs     # Resource naming helpers
│   │   │   └── TaggingHelpers.cs        # Standard Azure tags
│   │   └── ConnectSoft.ATP.Infrastructure.csproj
│   └── ConnectSoft.ATP.Infrastructure.Tests/
│       ├── StackTests.cs                 # Unit tests for stack logic
│       └── ResourceValidationTests.cs    # Validate resource configurations
├── azure-pipelines.yml                   # IaC pipeline
├── Pulumi.yaml                           # Pulumi project metadata
├── Pulumi.dev.yaml                       # Dev stack config
├── Pulumi.test.yaml                      # Test stack config
├── Pulumi.staging.yaml                   # Staging stack config
├── Pulumi.production.yaml                # Production stack config
└── README.md

Pulumi Pipeline Configuration

File: /azure-pipelines.yml

name: $(majorMinorVersion).$(semanticVersion)

trigger:
  branches:
    include: [master, main]
  paths:
    include: [src/**, Pulumi.*.yaml]

pr:
  branches:
    include: [master, main]

# Declare the shared template repository (required by the @templates references below)
resources:
  repositories:
  - repository: templates
    type: git
    name: ConnectSoft.AzurePipelines

pool:
  vmImage: 'ubuntu-latest'

variables:
  majorMinorVersion: 0.1
  semanticVersion: $[counter(variables['majorMinorVersion'], 0)]
  workingDirectory: 'src/ConnectSoft.ATP.Infrastructure'
  pulumiVersion: '3.95.0'

stages:
- stage: Build_IaC
  displayName: 'Build and Test Infrastructure Code'
  jobs:
  - job: Build_Pulumi_Program
    displayName: 'Build Pulumi C# Program'
    steps:
    - task: UseDotNet@2
      displayName: 'Install .NET 8 SDK'
      inputs:
        version: '8.x'

    # Build Pulumi program
    - task: DotNetCoreCLI@2
      displayName: 'Build Infrastructure Project'
      inputs:
        command: 'build'
        projects: '$(workingDirectory)/ConnectSoft.ATP.Infrastructure.csproj'
        arguments: '--configuration Release'

    # Run infrastructure unit tests
    - task: DotNetCoreCLI@2
      displayName: 'Run Infrastructure Tests'
      inputs:
        command: 'test'
        projects: 'src/ConnectSoft.ATP.Infrastructure.Tests/ConnectSoft.ATP.Infrastructure.Tests.csproj'
        arguments: '--configuration Release --no-build'

    # Install Pulumi CLI
    - script: |
        curl -fsSL https://get.pulumi.com | sh
        export PATH=$PATH:$HOME/.pulumi/bin
        pulumi version
      displayName: 'Install Pulumi CLI'

    # Login to Pulumi state backend (Azure Blob Storage)
    - script: |
        export PATH=$PATH:$HOME/.pulumi/bin
    pulumi login 'azblob://pulumi-state?storage_account=connectsoftpulumi'  # quoted: '?' is a shell glob character
      displayName: 'Pulumi Login'
      env:
        AZURE_STORAGE_ACCOUNT: connectsoftpulumi
        AZURE_STORAGE_KEY: $(PulumiStorageKey)

    # Publish Pulumi program as artifact
    - task: PublishPipelineArtifact@1
      displayName: 'Publish Infrastructure Artifact'
      inputs:
        targetPath: '$(workingDirectory)/bin/Release/net8.0'
        artifact: 'pulumi-program'

- stage: Preview_Staging
  displayName: 'Preview Infrastructure Changes (Staging)'
  dependsOn: Build_IaC
  condition: succeeded()
  jobs:
  - job: Pulumi_Preview_Staging
    displayName: 'Pulumi Preview - Staging Stack'
    steps:
    - download: current
      artifact: pulumi-program

    - script: |
        export PATH=$PATH:$HOME/.pulumi/bin
        pulumi login 'azblob://pulumi-state?storage_account=connectsoftpulumi'

        cd $(Pipeline.Workspace)/pulumi-program
        pulumi stack select atp-staging --create
        pulumi preview --diff --non-interactive
      displayName: 'Pulumi Preview (Staging)'
      env:
        PULUMI_ACCESS_TOKEN: $(PulumiAccessToken)
        AZURE_STORAGE_ACCOUNT: connectsoftpulumi      # required by the azblob state backend
        AZURE_STORAGE_KEY: $(PulumiStorageKey)
        ARM_CLIENT_ID: $(AzureClientId)
        ARM_CLIENT_SECRET: $(AzureClientSecret)
        ARM_TENANT_ID: $(AzureTenantId)
        ARM_SUBSCRIPTION_ID: $(AzureSubscriptionId)

- stage: Deploy_Staging_Infrastructure
  displayName: 'Deploy Infrastructure to Staging'
  dependsOn: Preview_Staging
  condition: succeeded()
  variables:
  - group: ATP-Staging-Variables
  jobs:
  - deployment: DeployInfrastructureStaging
    displayName: 'Pulumi Up - Staging Stack'
    environment: ATP-Staging-Infrastructure
    strategy:
      runOnce:
        deploy:
          steps:
          - download: current
            artifact: pulumi-program

          - template: infrastructure/create-microservice-infrastructure-pulumi.yaml@templates
            parameters:
              workingDirectory: $(Pipeline.Workspace)/pulumi-program
              stackName: atp-staging
              azureSubscription: $(azureSubscription)
              pulumiAccessToken: $(PulumiAccessToken)
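
The `name:` expression at the top combines `majorMinorVersion` with a `counter()` expression: the first run for a given prefix returns the seed, and each later run returns one more. Azure DevOps persists the counter per pipeline definition; a sketch of those semantics with an in-memory store (`BuildCounter` is illustrative):

```python
class BuildCounter:
    """Emulates Azure Pipelines counter(prefix, seed): the first run for a
    given prefix returns the seed; each subsequent run returns one more."""
    def __init__(self):
        self._counters = {}

    def next(self, prefix, seed=0):
        value = self._counters.get(prefix, seed)
        self._counters[prefix] = value + 1
        return value

counter = BuildCounter()
major_minor = "0.1"
build_names = [f"{major_minor}.{counter.next(major_minor)}" for _ in range(3)]
print(build_names)  # ['0.1.0', '0.1.1', '0.1.2']

# Bumping majorMinorVersion starts a fresh counter (new prefix, seed 0):
print(f"0.2.{counter.next('0.2')}")  # 0.2.0
```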

Pulumi Template (ConnectSoft.AzurePipelines)

Template: infrastructure/create-microservice-infrastructure-pulumi.yaml@templates

Purpose: Execute Pulumi program to provision or update Azure infrastructure.

Template Contents:

parameters:
- name: workingDirectory
  type: string
- name: stackName
  type: string
- name: azureSubscription
  type: string
- name: pulumiAccessToken
  type: string
- name: expectNoChanges
  type: boolean
  default: false

steps:
- script: |
    curl -fsSL https://get.pulumi.com | sh
    export PATH=$PATH:$HOME/.pulumi/bin
    pulumi version
  displayName: 'Install Pulumi CLI'

- task: AzureCLI@2
  displayName: 'Pulumi Up'
  inputs:
    azureSubscription: ${{ parameters.azureSubscription }}
    scriptType: 'bash'
    scriptLocation: 'inlineScript'
    workingDirectory: ${{ parameters.workingDirectory }}
    inlineScript: |
      export PATH=$PATH:$HOME/.pulumi/bin

      # Login to state backend ('?' quoted to avoid shell globbing)
      pulumi login 'azblob://pulumi-state?storage_account=connectsoftpulumi'

      # Select stack
      pulumi stack select ${{ parameters.stackName }} --create

      # Execute deployment (honour the expectNoChanges parameter for drift detection)
      EXTRA_ARGS=""
      if [ "${{ parameters.expectNoChanges }}" = "True" ]; then EXTRA_ARGS="--expect-no-changes"; fi
      pulumi up --yes --non-interactive --skip-preview $EXTRA_ARGS

      # Export stack outputs
      pulumi stack output --json > $(Build.ArtifactStagingDirectory)/stack-outputs.json
  env:
    PULUMI_ACCESS_TOKEN: ${{ parameters.pulumiAccessToken }}
    AZURE_STORAGE_ACCOUNT: connectsoftpulumi          # required by the azblob state backend
    AZURE_STORAGE_KEY: $(PulumiStorageKey)

- task: PublishPipelineArtifact@1
  displayName: 'Publish Stack Outputs'
  inputs:
    targetPath: '$(Build.ArtifactStagingDirectory)/stack-outputs.json'
    artifact: 'stack-outputs-${{ parameters.stackName }}'
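
The `stack-outputs.json` artifact published above is typically consumed by downstream service pipelines. A sketch of mapping the outputs to pipeline variables via the `task.setvariable` logging command (the helper and sample values are illustrative):

```python
import json

def to_pipeline_variables(stack_outputs_json):
    """Turn Pulumi stack outputs (the stack-outputs.json artifact) into
    Azure Pipelines 'task.setvariable' logging commands, so downstream
    service pipelines can consume infrastructure endpoints."""
    outputs = json.loads(stack_outputs_json)
    return ["##vso[task.setvariable variable=%s]%s" % (key, value)
            for key, value in sorted(outputs.items())]

# Sample outputs shaped like the StagingStack exports (values illustrative)
sample = json.dumps({
    "IngestionServiceUrl": "https://atp-ingestion-staging-eastus.azurewebsites.net",
    "KeyVaultUrl": "https://atpkvstagingeastus.vault.azure.net/",
})
for line in to_pipeline_variables(sample):
    print(line)
```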

Pulumi Program Example (C#)

File: src/ConnectSoft.ATP.Infrastructure/Program.cs

using Pulumi;
using Pulumi.AzureNative.Resources;
using ConnectSoft.ATP.Infrastructure.Stacks;

return await Deployment.RunAsync(() =>
{
    var stackName = Deployment.Instance.StackName;

    return stackName switch
    {
        "atp-dev" => new DevStack(),
        "atp-test" => new TestStack(),
        "atp-staging" => new StagingStack(),
        "atp-production" => new ProductionStack(),
        _ => throw new ArgumentException($"Unknown stack: {stackName}")
    };
});

File: src/ConnectSoft.ATP.Infrastructure/Stacks/StagingStack.cs

using Pulumi;
using Pulumi.AzureNative.Resources;
using ConnectSoft.ATP.Infrastructure.Resources;
using ConnectSoft.ATP.Infrastructure.Helpers;

namespace ConnectSoft.ATP.Infrastructure.Stacks;

public class StagingStack : Stack
{
    public StagingStack()
    {
        var config = new Config();
        var environment = "staging";
        var region = "eastus";

        // Create Resource Group
        var resourceGroup = new ResourceGroup($"atp-{environment}-{region}-rg", new()
        {
            ResourceGroupName = $"ConnectSoft-ATP-{environment}-{region}-RG",
            Location = region,
            Tags = TaggingHelpers.GetStandardTags(environment, "ATP")
        });

        // Provision Network Resources
        var network = new NetworkResources(resourceGroup, environment, region);

        // Provision Security Resources (Key Vault, Managed Identities)
        var security = new SecurityResources(resourceGroup, environment, region);

        // Provision Data Resources
        var database = new DatabaseResources(resourceGroup, environment, region, security.KeyVault);
        var cache = new CacheResources(resourceGroup, environment, region, network.SubnetId);
        var storage = new StorageResources(resourceGroup, environment, region, network.SubnetId);
        var serviceBus = new ServiceBusResources(resourceGroup, environment, region);

        // Provision App Services for each microservice
        var ingestionService = new AppServiceResources(
            resourceGroup, 
            "ingestion", 
            environment, 
            region,
            appSettings: new Dictionary<string, string>
            {
                ["Database__ConnectionString"] = database.ConnectionStringSecretUri,
                ["Redis__ConnectionString"] = cache.ConnectionStringSecretUri,
                ["ServiceBus__ConnectionString"] = serviceBus.ConnectionStringSecretUri,
                ["KeyVault__Url"] = security.KeyVaultUrl,
                ["ApplicationInsights__InstrumentationKey"] = security.AppInsightsKey
            },
            managedIdentity: security.IngestionManagedIdentity
        );

        var queryService = new AppServiceResources(
            resourceGroup,
            "query",
            environment,
            region,
            appSettings: new Dictionary<string, string>
            {
                ["Database__ConnectionString"] = database.ConnectionStringSecretUri,
                ["Redis__ConnectionString"] = cache.ConnectionStringSecretUri,
                ["Elasticsearch__Endpoint"] = "https://atp-search-staging.search.windows.net",
                ["KeyVault__Url"] = security.KeyVaultUrl
            },
            managedIdentity: security.QueryManagedIdentity
        );

        // Export stack outputs (consumed by service pipelines)
        this.ResourceGroupName = Output.Create(resourceGroup.Name);
        this.IngestionServiceUrl = Output.Format($"https://{ingestionService.AppServiceName}.azurewebsites.net");
        this.QueryServiceUrl = Output.Format($"https://{queryService.AppServiceName}.azurewebsites.net");
        this.KeyVaultUrl = security.KeyVaultUrl;
        this.DatabaseConnectionString = database.ConnectionStringSecretUri;
    }

    [Output] public Output<string> ResourceGroupName { get; set; }
    [Output] public Output<string> IngestionServiceUrl { get; set; }
    [Output] public Output<string> QueryServiceUrl { get; set; }
    [Output] public Output<string> KeyVaultUrl { get; set; }
    [Output] public Output<string> DatabaseConnectionString { get; set; }
}

Resource Modules (C# Classes)

File: src/ConnectSoft.ATP.Infrastructure/Resources/AppServiceResources.cs

using Pulumi;
using Pulumi.AzureNative.Web;
using Pulumi.AzureNative.Web.Inputs;
using Pulumi.AzureNative.Resources;
using ConnectSoft.ATP.Infrastructure.Helpers;

namespace ConnectSoft.ATP.Infrastructure.Resources;

public class AppServiceResources
{
    public Output<string> AppServiceName { get; }
    public Output<string> AppServiceUrl { get; }

    public AppServiceResources(
        ResourceGroup resourceGroup,
        string serviceName,
        string environment,
        string region,
        Dictionary<string, string> appSettings,
        Output<string> managedIdentity)
    {
        var appServicePlanName = NamingConventions.AppServicePlan(serviceName, environment, region);
        var appServiceName = NamingConventions.AppService(serviceName, environment, region);

        // Create App Service Plan
        var appServicePlan = new AppServicePlan(appServicePlanName, new()
        {
            Name = appServicePlanName,
            ResourceGroupName = resourceGroup.Name,
            Location = resourceGroup.Location,
            Sku = new SkuDescriptionArgs
            {
                Name = environment == "production" ? "P3V2" : "P2V2",
                Tier = "PremiumV2",
                Capacity = environment == "production" ? 3 : 2
            },
            Kind = "linux",
            Reserved = true,  // Required for Linux
            Tags = TaggingHelpers.GetStandardTags(environment, "ATP", serviceName)
        });

        // Create App Service with staging slot (blue-green deployments)
        var appService = new WebApp(appServiceName, new()
        {
            Name = appServiceName,
            ResourceGroupName = resourceGroup.Name,
            Location = resourceGroup.Location,
            ServerFarmId = appServicePlan.Id,
            Identity = new ManagedServiceIdentityArgs
            {
                Type = ManagedServiceIdentityType.SystemAssigned
            },
            SiteConfig = new SiteConfigArgs
            {
                AlwaysOn = true,
                LinuxFxVersion = "DOTNETCORE|8.0",
                Http20Enabled = true,
                MinTlsVersion = SupportedTlsVersions.SupportedTlsVersions_1_2,
                AppSettings = appSettings.Select(kvp => new NameValuePairArgs
                {
                    Name = kvp.Key,
                    Value = kvp.Value
                }).ToList(),
                HealthCheckPath = "/health",
                FtpsState = FtpsState.Disabled  // Disable FTP/FTPS (HttpsOnly is a WebApp-level property, set below)
            },
            HttpsOnly = true,
            Tags = TaggingHelpers.GetStandardTags(environment, "ATP", serviceName)
        });

        // Create staging slot (for blue-green deployments)
        if (environment == "staging" || environment == "production")
        {
            var stagingSlot = new WebAppSlot($"{appServiceName}-staging-slot", new()
            {
                Name = appServiceName,
                Slot = "staging",
                ResourceGroupName = resourceGroup.Name,
                Location = resourceGroup.Location,
                ServerFarmId = appServicePlan.Id,
                SiteConfig = new SiteConfigArgs
                {
                    AlwaysOn = true,
                    LinuxFxVersion = "DOTNETCORE|8.0",
                    AppSettings = appSettings.Select(kvp => new NameValuePairArgs
                    {
                        Name = kvp.Key,
                        Value = kvp.Value
                    }).ToList()
                }
            });
        }

        this.AppServiceName = appService.Name;
        this.AppServiceUrl = Output.Format($"https://{appService.DefaultHostName}");
    }
}

File: src/ConnectSoft.ATP.Infrastructure/Resources/DatabaseResources.cs

using Pulumi;
using Pulumi.AzureNative.Sql;
using Pulumi.AzureNative.Sql.Inputs;
using Pulumi.AzureNative.Resources;
using Pulumi.AzureNative.KeyVault;
using ConnectSoft.ATP.Infrastructure.Helpers;

namespace ConnectSoft.ATP.Infrastructure.Resources;

public class DatabaseResources
{
    public Output<string> ConnectionStringSecretUri { get; }
    public Output<string> ServerName { get; }

    public DatabaseResources(
        ResourceGroup resourceGroup,
        string environment,
        string region,
        Vault keyVault)
    {
        var serverName = NamingConventions.SqlServer("atp", environment, region);
        var databaseName = NamingConventions.SqlDatabase("atp", environment);

        // Create SQL Server
        var sqlServer = new Server(serverName, new()
        {
            ServerName = serverName,
            ResourceGroupName = resourceGroup.Name,
            Location = resourceGroup.Location,
            Version = "12.0",
            MinimalTlsVersion = "1.2",
            PublicNetworkAccess = environment == "production" 
                ? ServerPublicNetworkAccess.Disabled  // Private endpoint only
                : ServerPublicNetworkAccess.Enabled,
            Administrators = new ServerExternalAdministratorArgs
            {
                AdministratorType = AdministratorType.ActiveDirectory,
                PrincipalType = PrincipalType.Group,
                Login = "ATP-SQL-Admins",
                Sid = "00000000-0000-0000-0000-000000000000",  // AAD group object ID (placeholder)
                // Resolve the tenant from Pulumi config ('pulumi config set tenantId <guid>');
                // a pipeline macro like "$(AzureTenantId)" would be passed as a literal string here.
                TenantId = new Config().Require("tenantId")
            },
            Tags = TaggingHelpers.GetStandardTags(environment, "ATP", "database")
        });

        // Create SQL Database
        var database = new Database(databaseName, new()
        {
            DatabaseName = databaseName,
            ResourceGroupName = resourceGroup.Name,
            Location = resourceGroup.Location,
            ServerName = sqlServer.Name,
            Sku = new SkuArgs
            {
                Name = environment == "production" ? "P2" : "S3",
                Tier = environment == "production" ? "Premium" : "Standard"
            },
            MaxSizeBytes = environment == "production" 
                ? 536870912000  // 500 GB
                : 268435456000, // 250 GB
            ZoneRedundant = environment == "production",
            RequestedBackupStorageRedundancy = BackupStorageRedundancy.Geo,
            Tags = TaggingHelpers.GetStandardTags(environment, "ATP", "database")
        });

        // Store connection string in Key Vault
        var connectionString = Output.Tuple(sqlServer.Name, database.Name).Apply(t =>
        {
            var (server, db) = t;
            return $"Server=tcp:{server}.database.windows.net,1433;Database={db};Authentication=Active Directory Managed Identity;";
        });

        var connectionStringSecret = new Secret($"{databaseName}-connection-string", new()
        {
            SecretName = "DatabaseConnectionString",
            ResourceGroupName = resourceGroup.Name,
            VaultName = keyVault.Name,
            Properties = new Pulumi.AzureNative.KeyVault.Inputs.SecretPropertiesArgs  // fully qualified: avoids importing KeyVault.Inputs alongside Sql.Inputs
            {
                Value = connectionString
            },
            Tags = TaggingHelpers.GetStandardTags(environment, "ATP", "secret")
        });

        this.ServerName = sqlServer.Name;
        this.ConnectionStringSecretUri = Output.Format($"{keyVault.Properties.Apply(p => p.VaultUri)}secrets/DatabaseConnectionString");
    }
}

File: src/ConnectSoft.ATP.Infrastructure/Resources/SecurityResources.cs

using Pulumi;
using Pulumi.AzureNative.KeyVault;
using Pulumi.AzureNative.KeyVault.Inputs;
using Pulumi.AzureNative.ManagedIdentity;
using Pulumi.AzureNative.Insights;
using Pulumi.AzureNative.Resources;
using ConnectSoft.ATP.Infrastructure.Helpers;

namespace ConnectSoft.ATP.Infrastructure.Resources;

public class SecurityResources
{
    public Vault KeyVault { get; }
    public Output<string> KeyVaultUrl { get; }
    public Output<string> AppInsightsKey { get; }
    public Output<string> IngestionManagedIdentity { get; }
    public Output<string> QueryManagedIdentity { get; }

    public SecurityResources(ResourceGroup resourceGroup, string environment, string region)
    {
        var keyVaultName = NamingConventions.KeyVault("atp", environment, region);

        // Create Key Vault
        this.KeyVault = new Vault(keyVaultName, new()
        {
            VaultName = keyVaultName,
            ResourceGroupName = resourceGroup.Name,
            Location = resourceGroup.Location,
            Properties = new VaultPropertiesArgs
            {
                TenantId = new Config().Require("tenantId"),  // set via 'pulumi config set tenantId <guid>'
                Sku = new SkuArgs
                {
                    Family = SkuFamily.A,
                    Name = environment == "production" ? SkuName.Premium : SkuName.Standard
                },
                EnabledForDeployment = false,
                EnabledForDiskEncryption = false,
                EnabledForTemplateDeployment = true,
                EnableSoftDelete = true,
                SoftDeleteRetentionInDays = 90,
                EnablePurgeProtection = environment == "production",
                EnableRbacAuthorization = true,
                NetworkAcls = new NetworkRuleSetArgs
                {
                    DefaultAction = environment == "production" 
                        ? NetworkRuleAction.Deny 
                        : NetworkRuleAction.Allow,
                    Bypass = NetworkRuleBypassOptions.AzureServices
                }
            },
            Tags = TaggingHelpers.GetStandardTags(environment, "ATP", "keyvault")
        });

        // Create Managed Identity for Ingestion Service
        var ingestionIdentity = new UserAssignedIdentity($"atp-ingestion-{environment}-identity", new()
        {
            ResourceName = $"atp-ingestion-{environment}-identity",
            ResourceGroupName = resourceGroup.Name,
            Location = resourceGroup.Location,
            Tags = TaggingHelpers.GetStandardTags(environment, "ATP", "identity")
        });

        // Create Managed Identity for Query Service
        var queryIdentity = new UserAssignedIdentity($"atp-query-{environment}-identity", new()
        {
            ResourceName = $"atp-query-{environment}-identity",
            ResourceGroupName = resourceGroup.Name,
            Location = resourceGroup.Location,
            Tags = TaggingHelpers.GetStandardTags(environment, "ATP", "identity")
        });

        // Create Application Insights
        var appInsights = new Component($"atp-{environment}-appinsights", new()
        {
            ResourceName = $"atp-{environment}-{region}-appinsights",
            ResourceGroupName = resourceGroup.Name,
            Location = resourceGroup.Location,
            Kind = "web",
            ApplicationType = ApplicationType.Web,
            RetentionInDays = environment == "production" ? 730 : 90,
            IngestionMode = IngestionMode.ApplicationInsights,
            Tags = TaggingHelpers.GetStandardTags(environment, "ATP", "monitoring")
        });

        this.KeyVaultUrl = this.KeyVault.Properties.Apply(p => p.VaultUri);
        this.AppInsightsKey = appInsights.InstrumentationKey;
        this.IngestionManagedIdentity = ingestionIdentity.Id;
        this.QueryManagedIdentity = queryIdentity.Id;
    }
}

File: src/ConnectSoft.ATP.Infrastructure/Helpers/NamingConventions.cs

namespace ConnectSoft.ATP.Infrastructure.Helpers;

public static class NamingConventions
{
    public static string AppService(string service, string env, string region) 
        => $"atp-{service}-{env}-{region}".ToLowerInvariant();

    public static string AppServicePlan(string service, string env, string region) 
        => $"atp-{service}-{env}-{region}-plan".ToLowerInvariant();

    public static string SqlServer(string prefix, string env, string region) 
        => $"{prefix}-sql-{env}-{region}".ToLowerInvariant();

    public static string SqlDatabase(string prefix, string env) 
        => $"{prefix}-db-{env}".ToLowerInvariant();

    public static string KeyVault(string prefix, string env, string region)
    {
        // Key Vault names max out at 24 characters; guard the truncation so
        // short names (e.g. "atpkvstagingeastus") do not throw.
        var name = $"{prefix}-kv-{env}-{region}".ToLowerInvariant().Replace("-", "");
        return name.Length <= 24 ? name : name.Substring(0, 24);
    }

    public static string StorageAccount(string prefix, string env, string region)
    {
        // Storage account names: 3-24 lowercase alphanumerics.
        var name = $"{prefix}{env}{region}sa".ToLowerInvariant().Replace("-", "");
        return name.Length <= 24 ? name : name.Substring(0, 24);
    }

    public static string RedisCache(string prefix, string env, string region) 
        => $"{prefix}-redis-{env}-{region}".ToLowerInvariant();

    public static string ServiceBusNamespace(string prefix, string env, string region) 
        => $"{prefix}-sb-{env}-{region}".ToLowerInvariant();
}
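
The helpers above must respect Azure's per-resource-type naming constraints (for example, storage accounts allow only 3-24 lowercase alphanumerics, so `Substring(0, 24)` needs a length guard). A sketch of length-safe truncation plus validation, assuming an illustrative subset of the rules (`RULES` and `safe_name` are hypothetical helpers):

```python
import re

# Illustrative subset of Azure naming rules: (min_len, max_len, pattern)
RULES = {
    "storage_account": (3, 24, re.compile(r"^[a-z0-9]+$")),
    "key_vault": (3, 24, re.compile(r"^[a-z][a-z0-9-]+$")),
}

def safe_name(kind, raw):
    """Lowercase, strip hyphens where disallowed, and truncate to the
    maximum length; slicing never raises, unlike Substring(0, 24)."""
    lo, hi, pattern = RULES[kind]
    name = raw.lower()
    if kind == "storage_account":
        name = name.replace("-", "")
    name = name[:hi]
    if len(name) < lo or not pattern.match(name):
        raise ValueError(f"invalid {kind} name: {name!r}")
    return name

print(safe_name("storage_account", "atp-staging-eastus-sa"))  # atpstagingeastussa
```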

File: src/ConnectSoft.ATP.Infrastructure/Helpers/TaggingHelpers.cs

namespace ConnectSoft.ATP.Infrastructure.Helpers;

public static class TaggingHelpers
{
    public static Dictionary<string, string> GetStandardTags(
        string environment, 
        string platform, 
        string component = null)
    {
        var tags = new Dictionary<string, string>
        {
            ["Environment"] = environment,
            ["Platform"] = platform,
            ["ManagedBy"] = "Pulumi",
            ["CostCenter"] = "Engineering",
            ["Owner"] = "platform-team@connectsoft.com",
            // Note: a dynamic timestamp changes on every 'pulumi up', which shows
            // up as tag drift; pin the value or ignore changes on it if unwanted.
            ["CreatedDate"] = DateTime.UtcNow.ToString("yyyy-MM-dd")
        };

        if (!string.IsNullOrEmpty(component))
        {
            tags["Component"] = component;
        }

        return tags;
    }
}

Resources Provisioned

The Pulumi program provisions a comprehensive set of Azure resources for each ATP environment:

Compute Resources:

  • Azure App Services (or AKS for containerized deployments):
    • Ingestion Service: Linux App Service (P2V2/P3V2 tier).
    • Query Service: Linux App Service (P2V2/P3V2 tier).
    • Gateway Service: Linux App Service (P2V2/P3V2 tier).
    • Integrity Service: Linux App Service (P2V2/P3V2 tier).
    • Export Service: Azure Functions (Consumption/Premium plan).
    • Policy Service: Linux App Service (P2V2/P3V2 tier).
    • Search Service: Linux App Service (P2V2/P3V2 tier).
  • App Service Plans: Premium tier with zone redundancy (production), auto-scaling enabled.
  • Deployment Slots: Staging slot for blue-green deployments (staging/production only).

Data Resources:

  • Azure SQL Database:
    • Server: SQL Server 2022 with AAD authentication.
    • Database: Premium tier (production), Standard tier (dev/test).
    • Elastic Pool: Shared across ATP services (cost optimization).
    • Geo-Replication: Read replicas in secondary region (production only).
    • Backup: Automated backups with geo-redundant storage.
  • Redis Cache:
    • Premium tier (production): Clustering, persistence, geo-replication.
    • Standard tier (dev/test): Single node, no persistence.
    • VNet Integration: Private endpoint (production).
  • Azure Service Bus:
    • Namespace: Premium tier (production), Standard tier (dev/test).
    • Queues: audit.events.inbox, integrity.jobs, export.requests.
    • Topics: audit.record.appended, segment.sealed.
    • Authorization: Managed Identity (no connection strings).

Storage Resources:

  • Blob Storage:
    • Account: General Purpose v2 with geo-redundant storage.
    • Containers: audit-segments (WORM policy in production), compliance-artifacts, export-files.
    • WORM Policy: Immutable blobs with 7-year retention (production only).
    • Lifecycle Management: Auto-tier to cool/archive after 90 days.
    • Private Endpoint: VNet-integrated access (production).

Monitoring Resources:

  • Application Insights:
    • Instance per environment (shared across ATP services).
    • Retention: 730 days (production), 90 days (dev/test).
    • Continuous Export: Telemetry to Log Analytics.
  • Log Analytics Workspace:
    • Centralized logging for all ATP services.
    • Retention: 1 year (production), 30 days (dev/test).
    • Diagnostic Settings: App Service logs, SQL audit logs, NSG flow logs.

Security Resources:

  • Key Vault:
    • Premium SKU (production): HSM-backed keys.
    • Standard SKU (dev/test): Software-backed keys.
    • Secrets: Database passwords, storage keys, API keys.
    • Certificates: TLS certificates for custom domains.
    • Access Policies: RBAC-based (no access policies).
    • Network: Private endpoint (production), public with firewall (dev/test).
  • Managed Identities:
    • System-Assigned: Each App Service gets identity.
    • User-Assigned: Shared identities for specific roles (e.g., Key Vault access).
    • RBAC Assignments: Identities granted permissions (Key Vault Secrets User, SQL DB Contributor).

Network Resources:

  • Virtual Network:
    • Address Space: 10.0.0.0/16 (production), 10.1.0.0/16 (staging).
    • Subnets: App Service subnet, database subnet, private endpoint subnet.
    • Service Endpoints: SQL Database, Storage, Key Vault.
  • Private Endpoints:
    • SQL Database: Private endpoint in VNet (production only).
    • Redis Cache: Private endpoint in VNet (production only).
    • Storage Account: Private endpoint for WORM storage.
    • Key Vault: Private endpoint for secrets access.
  • Network Security Groups (NSGs):
    • Inbound Rules: Allow HTTPS (443), deny all else.
    • Outbound Rules: Allow Azure services, deny internet (production).
    • Flow Logs: Captured to Log Analytics for security monitoring.

Infrastructure Costs (Estimated):

Environment: Staging (Monthly)
  - App Services (7 × P2V2): $700
  - SQL Database (S3): $300
  - Redis Cache (Standard C1): $75
  - Service Bus (Standard): $10
  - Storage (1 TB): $20
  - Networking (VNet, private endpoints): $50
  - Monitoring (App Insights, Log Analytics): $100
  Total: ~$1,255/month

Environment: Production (Monthly)
  - App Services (7 × P3V2, zone-redundant): $2,100
  - SQL Database (P2, geo-replicated): $900
  - Redis Cache (Premium P1, clustered): $500
  - Service Bus (Premium): $677
  - Storage (10 TB, WORM): $200
  - Networking (VNet, private endpoints, DDoS): $200
  - Monitoring (App Insights, Log Analytics): $300
  Total: ~$4,877/month
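
The monthly totals can be cross-checked by summing the line items (amounts transcribed from the estimates above; dictionary keys are shorthand):

```python
# Line items from the estimates above (USD/month)
staging = {
    "app_services_7x_p2v2": 700,
    "sql_database_s3": 300,
    "redis_standard_c1": 75,
    "service_bus_standard": 10,
    "storage_1tb": 20,
    "networking": 50,
    "monitoring": 100,
}
production = {
    "app_services_7x_p3v2": 2100,
    "sql_database_p2_geo": 900,
    "redis_premium_p1": 500,
    "service_bus_premium": 677,
    "storage_10tb_worm": 200,
    "networking_ddos": 200,
    "monitoring": 300,
}
print(sum(staging.values()), sum(production.values()))  # 1255 4877
```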

GitOps for Configuration

ATP employs GitOps principles for configuration management — all configuration stored in Git, deployed via pipelines, with drift detection to ensure runtime config matches source of truth.

Azure App Configuration

Purpose: Centralized configuration service for feature flags, connection strings, and app settings with dynamic refresh (no redeployment needed).

Provisioning (Pulumi):

var appConfig = new ConfigurationStore($"atp-{environment}-appconfig", new()
{
    ConfigStoreName = $"atp-{environment}-appconfig",
    ResourceGroupName = resourceGroup.Name,
    Location = resourceGroup.Location,
    Sku = new SkuArgs
    {
        Name = "Standard"  // Supports feature flags, labels, snapshots
    },
    PublicNetworkAccess = environment == "production" 
        ? PublicNetworkAccess.Disabled 
        : PublicNetworkAccess.Enabled,
    Tags = TaggingHelpers.GetStandardTags(environment, "ATP", "config")
});

// Store feature flags
var enableAdvancedSearch = new ConfigurationStoreKeyValue("feature-advanced-search", new()
{
    ConfigStoreName = appConfig.Name,
    ResourceGroupName = resourceGroup.Name,
    Key = ".appconfig.featureflag/AdvancedSearch",
    Label = environment,
    Value = @"{
        ""id"": ""AdvancedSearch"",
        ""enabled"": " + (environment == "production" ? "true" : "false") + @",
        ""conditions"": {
            ""client_filters"": []
        }
    }",
    ContentType = "application/vnd.microsoft.appconfig.ff+json;charset=utf-8"
});
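The flag payload above is assembled by string concatenation, which is easy to break. Purely as an illustration (not part of the Pulumi program), the same payload can be produced with a JSON serializer:

```python
import json

def feature_flag_value(flag_id: str, enabled: bool) -> str:
    """Build the App Configuration feature-flag JSON payload
    (content type application/vnd.microsoft.appconfig.ff+json)."""
    return json.dumps({
        "id": flag_id,
        "enabled": enabled,
        "conditions": {"client_filters": []},
    })

# Production enables the flag; other environments keep it off.
print(feature_flag_value("AdvancedSearch", enabled=True))
```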

Service Integration:

App Services load configuration from App Configuration using Managed Identity:

// In ASP.NET Core Program.cs
public class Program
{
    public static void Main(string[] args) => CreateHostBuilder(args).Build().Run();

    public static IHostBuilder CreateHostBuilder(string[] args) =>
        Host.CreateDefaultBuilder(args)
            .ConfigureAppConfiguration((context, config) =>
            {
                var settings = config.Build();
                var appConfigEndpoint = settings["AppConfiguration:Endpoint"];

                config.AddAzureAppConfiguration(options =>
                {
                    options.Connect(new Uri(appConfigEndpoint), new DefaultAzureCredential())
                        .Select(KeyFilter.Any, context.HostingEnvironment.EnvironmentName)
                        .UseFeatureFlags();
                });
            })
            .ConfigureWebHostDefaults(webBuilder => webBuilder.UseStartup<Startup>());
}

Benefits:

  • Dynamic Updates: Change feature flags without redeploying services.
  • Environment Isolation: Configuration scoped by environment label.
  • Audit Trail: App Configuration tracks all changes (who, when, what).
  • Rollback: Snapshot configuration and restore it if issues arise.

Key Vault References

Purpose: Store secrets (connection strings, API keys, certificates) in Key Vault; reference in App Service app settings.

Configuration (Pulumi):

// App Service app settings reference Key Vault secrets
var appService = new WebApp(appServiceName, new()
{
    SiteConfig = new SiteConfigArgs
    {
        AppSettings = new[]
        {
            new NameValuePairArgs
            {
                Name = "ConnectionStrings__DefaultConnection",
                Value = $"@Microsoft.KeyVault(SecretUri={connectionStringSecretUri})"
            },
            new NameValuePairArgs
            {
                Name = "Redis__ConnectionString",
                Value = $"@Microsoft.KeyVault(SecretUri={redisConnectionStringSecretUri})"
            }
        }
    }
});

Runtime Behavior:

  1. App Service starts; reads app settings.
  2. Detects @Microsoft.KeyVault(SecretUri=...) syntax.
  3. Uses Managed Identity to authenticate to Key Vault.
  4. Fetches secret value from Key Vault.
  5. Injects secret into app configuration (available as IConfiguration["ConnectionStrings:DefaultConnection"]).
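The reference syntax detected in step 2 can be sketched as a small parser. This is illustrative only (App Service also supports a VaultName/SecretName reference form not shown here):

```python
import re

# Matches the App Service Key Vault reference syntax shown above.
KV_REF = re.compile(r"^@Microsoft\.KeyVault\(SecretUri=(?P<uri>[^)]+)\)$")

def parse_secret_uri(app_setting_value: str):
    """Return the SecretUri if the setting is a Key Vault reference, else None."""
    match = KV_REF.match(app_setting_value)
    return match.group("uri") if match else None

print(parse_secret_uri(
    "@Microsoft.KeyVault(SecretUri=https://atp-kv-prod.vault.azure.net/secrets/SqlConn)"
))
print(parse_secret_uri("plain-value"))  # not a reference: None
```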

Benefits:

  • Security: Secrets never in pipeline variables or app configuration files.
  • Rotation: Update secret in Key Vault; services auto-refresh (no redeploy).
  • Audit: Key Vault logs all secret access (who, when, which secret).
  • Compliance: Meets SOC 2, HIPAA requirements for secret management.

Secret Rotation (Automated):

Key Vault autorotation policies trigger secret rotation:

var sqlAdminPassword = new Secret("sql-admin-password", new()
{
    SecretName = "SqlAdminPassword",
    VaultName = keyVault.Name,
    ResourceGroupName = resourceGroup.Name,
    Properties = new SecretPropertiesArgs
    {
        Value = GenerateSecurePassword(),
        Attributes = new SecretAttributesArgs
        {
            Enabled = true,
            Expires = DateTime.UtcNow.AddDays(90)  // Auto-expire after 90 days
        }
    }
});

// Rotation policy: Auto-rotate 7 days before expiry
var rotationPolicy = new SecretRotationPolicy("sql-password-rotation", new()
{
    SecretName = sqlAdminPassword.Name,
    VaultName = keyVault.Name,
    ResourceGroupName = resourceGroup.Name,
    LifetimeActions = new[]
    {
        new LifetimeActionArgs
        {
            Trigger = new TriggerArgs
            {
                TimeBeforeExpiry = "P7D"  // 7 days before expiry
            },
            Action = new ActionArgs
            {
                Type = ActionType.AutoRenew
            }
        }
    }
});
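The interaction of the 90-day expiry and the P7D trigger can be worked through directly. A small sketch (function name and parameters are illustrative):

```python
from datetime import datetime, timedelta, timezone

def rotation_trigger(created: datetime, validity_days: int = 90,
                     before_expiry_days: int = 7) -> datetime:
    """Expiry minus the P7D trigger window: the moment auto-rotation fires."""
    expires = created + timedelta(days=validity_days)
    return expires - timedelta(days=before_expiry_days)

created = datetime(2025, 1, 1, tzinfo=timezone.utc)
print(rotation_trigger(created))  # day 83 of the 90-day lifetime
```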

Drift Detection

Purpose: Periodically validate that deployed infrastructure matches Pulumi state (detect manual changes, configuration drift).

Pipeline (Scheduled):

# infrastructure-drift-detection.yml
schedules:
- cron: "0 2 * * *"  # Run daily at 2 AM UTC
  displayName: 'Daily Drift Detection'
  branches:
    include: [master]
  always: true

stages:
- stage: Detect_Drift
  displayName: 'Detect Infrastructure Drift'
  jobs:
  - job: Pulumi_Refresh
    displayName: 'Refresh Pulumi State and Detect Drift'
    steps:
    - task: UseDotNet@2
      inputs:
        version: '8.x'

    - script: |
        curl -fsSL https://get.pulumi.com | sh
        export PATH=$PATH:$HOME/.pulumi/bin
        pulumi version
      displayName: 'Install Pulumi CLI'

    - script: |
        export PATH=$PATH:$HOME/.pulumi/bin
        pulumi login "azblob://pulumi-state?storage_account=connectsoftpulumi"

        # Refresh state (fetch current Azure state)
        pulumi stack select atp-production
        pulumi refresh --yes --non-interactive --diff

        # Preview changes (should be zero if no drift)
        pulumi preview --diff --expect-no-changes > drift-report.txt

        # Check for drift
        if grep -q "updates:" drift-report.txt || grep -q "deletes:" drift-report.txt; then
          echo "##vso[task.logissue type=error]Infrastructure drift detected!"
          cat drift-report.txt
          exit 1
        fi

        echo "No drift detected; infrastructure matches Pulumi state"
      displayName: 'Detect Drift (Production)'
      env:
        PULUMI_ACCESS_TOKEN: $(PulumiAccessToken)
        ARM_CLIENT_ID: $(AzureClientId)
        ARM_CLIENT_SECRET: $(AzureClientSecret)
        ARM_TENANT_ID: $(AzureTenantId)
        ARM_SUBSCRIPTION_ID: $(AzureSubscriptionId)

    - task: PublishPipelineArtifact@1
      condition: failed()
      displayName: 'Publish Drift Report'
      inputs:
        targetPath: 'drift-report.txt'
        artifact: 'drift-report'

Drift Scenarios:

  1. Manual Changes: Engineer modifies resource in Azure Portal (e.g., scales App Service tier).
  2. External Automation: Another tool (ARM template, Azure CLI) modifies resource.
  3. Azure Updates: Azure applies patches or updates (usually benign).

Drift Report (example):

Previewing update (atp-production)

View Live: https://app.pulumi.com/connectsoft/atp-infrastructure/atp-production/previews/...

     Type                                      Name                            Plan       Info
 ~   pulumi:pulumi:Stack                       atp-infrastructure-production   update     
 ~   └─ azure-native:web:WebApp                atp-ingestion-prod              update     [diff: ~siteConfig]

Resources:
    ~ 1 to update
    98 unchanged

Do you want to perform this update? [Use arrows to move, type to filter]
  yes
> no  ⚠️ DRIFT DETECTED
  details
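The grep-based check in the scheduled pipeline can equivalently be expressed as a parser over the preview summary. A sketch, assuming the summary format shown in the example above:

```python
import re

def count_drifted(preview_summary: str) -> int:
    """Count resources flagged to update or delete in a pulumi preview summary."""
    drift = 0
    for verb in ("to update", "to delete"):
        m = re.search(rf"(\d+) {verb}", preview_summary)
        if m:
            drift += int(m.group(1))
    return drift

summary = "Resources:\n    ~ 1 to update\n    98 unchanged\n"
print(count_drifted(summary))  # 1 -> drift detected
```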

Drift Remediation:

  1. Review Drift: Investigate what changed and why.
  2. Options:
    • Revert Manual Change: Run pulumi up to restore Pulumi state (undo manual change).
    • Accept Change: Update Pulumi code to match manual change (codify drift).
    • Exception: Document exception; suppress drift alert for specific resource.
  3. Update Pulumi Code: Commit changes to Git; redeploy via pipeline.

Drift Alerting:

- script: |
    # Send alert to Teams
    curl -X POST $(TeamsWebhook) \
      -H "Content-Type: application/json" \
      -d '{
        "title": "Infrastructure Drift Detected",
        "text": "Production infrastructure drifted from Pulumi state. Review drift report and remediate.",
        "themeColor": "FF0000"
      }'
  condition: failed()
  displayName: 'Alert on Drift'

Periodic Validation:

  • Daily: Drift detection runs automatically (scheduled pipeline).
  • Pre-Deployment: Drift check before infrastructure changes (ensure clean starting state).
  • Post-Incident: Drift check after manual interventions.

Rationale: Drift detection ensures infrastructure matches code (GitOps principle). Manual changes bypass code review and audit trail. Periodic validation catches drift before it accumulates.

Infrastructure Testing

Unit Tests (Pulumi Program):

Test infrastructure code logic before deployment:

// src/ConnectSoft.ATP.Infrastructure.Tests/StackTests.cs
using System.Linq;
using System.Threading.Tasks;
using Xunit;
using Pulumi;
using Pulumi.AzureNative.KeyVault;
using Pulumi.AzureNative.Storage;
using Pulumi.AzureNative.Web;
public class StackTests
{
    [Fact]
    public async Task StagingStack_CreatesAppServiceWithCorrectSku()
    {
        // Arrange & Act
        var resources = await Testing.RunAsync<StagingStack>();

        // Assert
        var appServices = resources.OfType<WebApp>().ToList();
        Assert.NotEmpty(appServices);

        var ingestionService = appServices.First(s => s.Name.ToString().Contains("ingestion"));
        var sku = await ingestionService.AppServicePlan.Sku;
        Assert.Equal("P2V2", sku.Name);
    }

    [Fact]
    public async Task ProductionStack_EnablesKeyVaultPurgeProtection()
    {
        // Arrange & Act
        var resources = await Testing.RunAsync<ProductionStack>();

        // Assert
        var keyVaults = resources.OfType<Vault>().ToList();
        var purgeProtection = await keyVaults.First().Properties.EnablePurgeProtection;
        Assert.True(purgeProtection, "Production Key Vault must have purge protection enabled");
    }

    [Fact]
    public async Task ProductionStack_EnablesWormPolicyOnStorage()
    {
        // Arrange & Act
        var resources = await Testing.RunAsync<ProductionStack>();

        // Assert
        var storageAccounts = resources.OfType<StorageAccount>().ToList();
        var immutabilityPolicy = await storageAccounts.First().ImmutabilityPolicy;
        Assert.NotNull(immutabilityPolicy);
        Assert.Equal(2555, immutabilityPolicy.ImmutabilityPeriodSinceCreationInDays);  // 7 years
    }
}

Integration Tests (Post-Deployment):

Validate provisioned resources are functional:

- script: |
    # Test App Service responds
    curl -f https://atp-ingestion-staging.azurewebsites.net/health

    # Test SQL Database reachable
    sqlcmd -S atp-sql-staging-eastus.database.windows.net -d atp-db-staging -Q "SELECT 1"

    # Test Redis Cache reachable
    redis-cli -h atp-redis-staging-eastus.redis.cache.windows.net -p 6380 --tls PING

    echo "Infrastructure validation successful"
  displayName: 'Validate Provisioned Resources'

Policy as Code (Azure Policy Integration):

Enforce organizational policies on provisioned resources:

var policyAssignment = new PolicyAssignment("require-tags-policy", new()
{
    PolicyDefinitionId = "/providers/Microsoft.Authorization/policyDefinitions/...",
    Scope = resourceGroup.Id,
    Parameters = new InputMap<object>
    {
        ["tagNames"] = new[] { "Environment", "Platform", "Owner" }
    }
});

Rationale: Infrastructure testing prevents misconfigurations from reaching production. Unit tests validate Pulumi logic; integration tests validate Azure resources work correctly. Policy enforcement ensures compliance.


Template Reusability & Customization

ConnectSoft pipeline templates are designed for maximum reusability while enabling service-specific customization through parameterization. This balance eliminates duplicate YAML across services (DRY principle) while allowing teams to tailor pipelines to their unique requirements (test coverage thresholds, service dependencies, deployment targets).

Templates use strongly-typed parameters with defaults, enabling services to override only what's necessary while inheriting sensible baseline behavior. This approach reduces cognitive load (teams don't configure everything), prevents configuration drift (shared logic centralized), and enables platform-wide improvements (template updates propagate automatically).

Template Parameters (Common)

ConnectSoft templates define common parameters used across most microservices. These parameters provide customization points while maintaining consistency in pipeline structure.

Core Build Parameters

Purpose: Configure solution paths, build configuration, and NuGet feed authentication.

Parameter Definitions (in template YAML):

# build/build-microservice-steps.yaml@templates
parameters:
- name: solution
  type: string
  default: '**/*.slnx'
  displayName: 'Solution file glob pattern'

- name: exactSolution
  type: string
  default: ''
  displayName: 'Explicit solution filename (for disambiguation)'

- name: buildConfiguration
  type: string
  default: 'Release'
  values:
  - Release
  - Debug
  displayName: 'Build configuration'

- name: restoreVstsFeed
  type: string
  displayName: 'Azure Artifacts feed ID for NuGet package restore'

- name: buildArguments
  type: string
  default: ''
  displayName: 'Additional dotnet build arguments'

steps:
- task: NuGetAuthenticate@1
  condition: ne('${{ parameters.restoreVstsFeed }}', '')
  displayName: 'Authenticate to NuGet Feed'

- task: DotNetCoreCLI@2
  displayName: 'dotnet restore'
  inputs:
    command: 'restore'
    projects: '${{ coalesce(parameters.exactSolution, parameters.solution) }}'
    feedsToUse: 'select'
    vstsFeed: '${{ parameters.restoreVstsFeed }}'

- task: DotNetCoreCLI@2
  displayName: 'dotnet build'
  inputs:
    command: 'build'
    projects: '${{ coalesce(parameters.exactSolution, parameters.solution) }}'
    arguments: '--configuration ${{ parameters.buildConfiguration }} --no-restore ${{ parameters.buildArguments }}'

Parameter Usage (in service pipeline):

- template: build/build-microservice-steps.yaml@templates
  parameters:
    solution: '**/*.slnx'
    exactSolution: 'ConnectSoft.ATP.Ingestion.slnx'
    buildConfiguration: 'Release'
    restoreVstsFeed: 'e4c108b4-7989-4d22-93d6-391b77a39552/1889adca-ccb6-4ece-aa22-cad1ae4a35f3'
    buildArguments: '/p:TreatWarningsAsErrors=true'

Benefits:

  • Flexibility: Service teams specify exact solution when repo has multiple solutions.
  • Defaults: Most parameters have sensible defaults (minimal configuration needed).
  • Type Safety: Parameter types validated (string, number, boolean) before execution.
  • Reusability: Same template works for .sln, .slnx, or any solution format.
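The precedence rule behind these parameters (an explicit solution wins when set, otherwise the glob applies) can be sketched as follows; the function name is illustrative, not template API:

```python
def resolve_solution(exact_solution: str, solution_glob: str = "**/*.slnx") -> str:
    """Mirror the template's precedence: an explicit solution overrides the glob."""
    return exact_solution if exact_solution else solution_glob

print(resolve_solution("ConnectSoft.ATP.Ingestion.slnx"))  # explicit wins
print(resolve_solution(""))                                # falls back to the glob
```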

Test Parameters

Purpose: Configure test execution, coverage thresholds, and test settings.

Parameter Definitions:

# test/test-microservice-steps.yaml@templates
parameters:
- name: solution
  type: string
  default: '**/*.slnx'

- name: runSettingsFileName
  type: string
  default: ''
  displayName: 'Test run settings file name (.runsettings)'

- name: buildConfiguration
  type: string
  default: 'Release'

- name: codeCoverageThreshold
  type: number
  default: 70
  displayName: 'Minimum code coverage percentage'

- name: testFilter
  type: string
  default: ''
  displayName: 'Test filter expression (e.g., Category=Unit)'

- name: testArguments
  type: string
  default: '--logger trx'
  displayName: 'Additional dotnet test arguments'

steps:
- task: DotNetCoreCLI@2
  displayName: 'dotnet test'
  inputs:
    command: 'test'
    projects: '**/*Tests.csproj'
    arguments: |
      --configuration ${{ parameters.buildConfiguration }}
      --no-build
      ${{ parameters.testArguments }}
      ${{ iif(ne(parameters.testFilter, ''), format('--filter {0}', parameters.testFilter), '') }}
      ${{ iif(ne(parameters.runSettingsFileName, ''), format('--settings {0}', parameters.runSettingsFileName), '') }}
    publishTestResults: true

- script: |
    COVERAGE=$(parse_coverage_from_xml)
    THRESHOLD=${{ parameters.codeCoverageThreshold }}

    if (( $(echo "$COVERAGE < $THRESHOLD" | bc -l) )); then
      echo "##vso[task.logissue type=error]Coverage ${COVERAGE}% below threshold ${THRESHOLD}%"
      exit 1
    fi
  displayName: 'Enforce Coverage Threshold'

Parameter Usage (in service pipeline):

- template: test/test-microservice-steps.yaml@templates
  parameters:
    solution: '**/*.slnx'
    runSettingsFileName: 'ConnectSoft.ATP.Ingestion.runsettings'
    buildConfiguration: 'Release'
    codeCoverageThreshold: 75  # Override default (70%)
    testFilter: 'Category!=Slow'  # Exclude slow tests
    testArguments: '--logger trx --collect:"XPlat Code Coverage"'

Service-Specific Overrides:

| Service   | Coverage Threshold | Test Filter                       | Reason                                           |
|-----------|--------------------|-----------------------------------|--------------------------------------------------|
| Ingestion | 75%                | None (run all tests)              | Critical write path                              |
| Query     | 80%                | None (run all tests)              | Complex query logic                              |
| Gateway   | 65%                | Category!=Integration (PR builds) | Thin controllers; skip slow tests in PRs         |
| Integrity | 85%                | None (run all tests)              | Cryptographic operations                         |
| Export    | 70%                | Category=Unit (PR builds)         | Batch processing; skip integration tests in PRs  |

Docker Parameters

Purpose: Configure Docker image build, tagging, and registry.

Parameter Definitions:

# build/build-and-push-microservice-docker-steps.yaml@templates
parameters:
- name: dockerRegistryServiceConnection
  type: string
  displayName: 'Docker registry service connection ID'

- name: imageRepository
  type: string
  displayName: 'Image repository name (e.g., connectsoft/atp-ingestion)'

- name: containerRegistry
  type: string
  displayName: 'Container registry URL (e.g., connectsoft.azurecr.io)'

- name: dockerfile
  type: string
  displayName: 'Path to Dockerfile'

- name: buildContext
  type: string
  default: '.'
  displayName: 'Docker build context directory'

- name: tags
  type: string
  default: |
    $(Build.BuildNumber)
    latest
  displayName: 'Image tags (multi-line string)'

- name: buildArgs
  type: string
  default: ''
  displayName: 'Docker build arguments (--build-arg)'

- name: scanImage
  type: boolean
  default: true
  displayName: 'Run Trivy vulnerability scan before push'

steps:
- task: Docker@2
  displayName: 'Build Docker Image'
  inputs:
    command: 'build'
    repository: '${{ parameters.imageRepository }}'
    dockerfile: '${{ parameters.dockerfile }}'
    buildContext: '${{ parameters.buildContext }}'
    tags: '${{ parameters.tags }}'
    arguments: '${{ parameters.buildArgs }}'

- script: |
    docker run --rm -v /var/run/docker.sock:/var/run/docker.sock \
      aquasec/trivy:latest image \
      --severity CRITICAL,HIGH \
      --exit-code 1 \
      ${{ parameters.containerRegistry }}/${{ parameters.imageRepository }}:$(Build.BuildNumber)
  displayName: 'Trivy Image Scan'
  condition: eq('${{ parameters.scanImage }}', true)

- task: Docker@2
  displayName: 'Push Docker Image'
  inputs:
    command: 'push'
    repository: '${{ parameters.imageRepository }}'
    containerRegistry: '${{ parameters.dockerRegistryServiceConnection }}'
    tags: '${{ parameters.tags }}'

Parameter Usage (multi-stage Dockerfile with build args):

- template: build/build-and-push-microservice-docker-steps.yaml@templates
  parameters:
    dockerRegistryServiceConnection: $(dockerRegistryServiceConnection)
    imageRepository: 'connectsoft/atp-ingestion'
    containerRegistry: 'connectsoft.azurecr.io'
    dockerfile: 'src/ConnectSoft.ATP.Ingestion/Dockerfile'
    buildContext: '.'
    tags: |
      $(Build.BuildNumber)
      $(Build.SourceBranchName)
      latest
    buildArgs: |
      --build-arg ASPNETCORE_VERSION=8.0
      --build-arg BUILD_VERSION=$(Build.BuildNumber)
    scanImage: true
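The tag list passed to the template (one tag per line) can be assembled programmatically. A minimal sketch, with an illustrative function name:

```python
def image_tags(build_number: str, branch: str, include_latest: bool = True) -> str:
    """Assemble the multi-line tags value passed to the Docker template."""
    tags = [build_number, branch]
    if include_latest:
        tags.append("latest")
    return "\n".join(tags)

print(image_tags("20251030.1", "master"))
```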

ATP Overrides

ATP services override template parameters based on their specific requirements — coverage thresholds reflect criticality, service containers match dependencies, and Docker build contexts accommodate multi-project solutions.

Coverage Thresholds by Service

Rationale: Not all services require the same coverage. Critical services (write path, cryptography) demand higher coverage; thin orchestration services accept lower coverage.

Service-Specific Thresholds:

# ATP Ingestion (Critical Write Path)
parameters:
  codeCoverageThreshold: 75  # High: Validates all audit records correctly

# ATP Query (Complex Query Logic)
parameters:
  codeCoverageThreshold: 80  # Very High: Many query paths and filters

# ATP Gateway (Thin Controllers)
parameters:
  codeCoverageThreshold: 65  # Moderate: Delegates to other services

# ATP Integrity (Cryptographic Operations)
parameters:
  codeCoverageThreshold: 85  # Very High: Correctness critical for tamper-evidence

# ATP Export (Batch Processing)
parameters:
  codeCoverageThreshold: 70  # Standard: Mix of orchestration and business logic

# ATP Policy (Deterministic Logic)
parameters:
  codeCoverageThreshold: 75  # High: Policy evaluation must be correct

# ATP Search (Elasticsearch Integration)
parameters:
  codeCoverageThreshold: 70  # Standard: Search indexing and queries

Threshold Justification (documented in service README):

## Code Coverage Threshold: 75%

**Rationale**: Ingestion Service is the **critical write path** for ATP. All audit 
events flow through this service, making validation, classification, and persistence 
logic essential to platform correctness. 

Higher coverage (75% vs. 70% default) ensures:
- Event validation logic thoroughly tested (schema, tenancy, classification).
- Error handling paths covered (malformed events, duplicate detection, quota limits).
- Integration points tested (cache operations, message publishing, storage writes).

**Exclusions**: Infrastructure code (middleware), generated code (DTOs), migrations.

Override Mechanism:

Templates read codeCoverageThreshold parameter and enforce dynamically:

# In template
- script: |
    THRESHOLD=${{ parameters.codeCoverageThreshold }}
    echo "Enforcing coverage threshold: ${THRESHOLD}%"
  displayName: 'Coverage Threshold Info'

Service Containers by Service

Rationale: Each ATP service has different dependencies. Templates accept service containers defined in the calling pipeline rather than hardcoding them.

Container Customization (per service):

# ATP Ingestion: Redis + RabbitMQ + OTEL + Seq
resources:
  containers:
    - container: redis
      image: redis:7-alpine
      ports: [6379:6379]
    - container: rabbitmq
      image: rabbitmq:3-management-alpine
      ports: [5672:5672, 15672:15672]
    - container: otel
      image: otel/opentelemetry-collector:0.97.0
      ports: [4317:4317]
    - container: seq
      image: datalust/seq:latest
      ports: [5341:80]

# ATP Query: Elasticsearch + Postgres + Redis + OTEL
resources:
  containers:
    - container: elasticsearch
      image: docker.elastic.co/elasticsearch/elasticsearch:8.11.0
      ports: [9200:9200]
      env:
        discovery.type: single-node
        xpack.security.enabled: false
    - container: postgres
      image: postgres:16-alpine
      ports: [5432:5432]
      env:
        POSTGRES_PASSWORD: postgres
    - container: redis
      image: redis:7-alpine
      ports: [6379:6379]
    - container: otel
      image: otel/opentelemetry-collector:0.97.0
      ports: [4317:4317]

# ATP Gateway: No containers (API-only, no integration tests with infrastructure)
resources:
  containers: []

# ATP Integrity: Postgres (ledger) + OTEL
resources:
  containers:
    - container: postgres
      image: postgres:16-alpine
      ports: [5432:5432]
      env:
        POSTGRES_PASSWORD: postgres
    - container: otel
      image: otel/opentelemetry-collector:0.97.0
      ports: [4317:4317]

SQL Server vs. PostgreSQL Choice:

| Service   | Database                          | Rationale                                                      |
|-----------|-----------------------------------|----------------------------------------------------------------|
| Ingestion | SQL Server (optional) or Postgres | Uses NHibernate; supports both SQL Server and Postgres         |
| Query     | Postgres                          | Read model optimized for Postgres JSONB queries                |
| Integrity | Postgres                          | Ledger table uses Postgres-specific features (row-level locks) |
| Export    | Postgres                          | Job queue and metadata storage                                 |
| Policy    | None (stateless)                  | Policies cached in Redis; no database needed                   |

Elasticsearch vs. Azure Cognitive Search:

  • Query Service (Dev/Test): Elasticsearch container (free, self-hosted).
  • Query Service (Production): Azure Cognitive Search (managed, scalable, expensive).
  • Search Service (All Environments): Elasticsearch (indexing worker).

Container Versions (standardized):

| Container     | Image                                                 | Version                | Notes                          |
|---------------|-------------------------------------------------------|------------------------|--------------------------------|
| Redis         | redis:7-alpine                                        | 7.x (latest stable)    | Alpine for minimal size        |
| SQL Server    | mcr.microsoft.com/mssql/server:2022-latest            | 2022                   | Microsoft official image       |
| PostgreSQL    | postgres:16-alpine                                    | 16.x                   | Alpine for minimal size        |
| MongoDB       | mongo:7                                               | 7.x                    | NoSQL (rarely used in ATP)     |
| RabbitMQ      | rabbitmq:3-management-alpine                          | 3.x with management UI | Management on port 15672       |
| Elasticsearch | docker.elastic.co/elasticsearch/elasticsearch:8.11.0  | 8.11                   | Specific version (not latest)  |
| OTEL Collector| otel/opentelemetry-collector:0.97.0                   | 0.97                   | Specific version for stability |
| Seq           | datalust/seq:latest                                   | Latest                 | Dev/test logging only          |

Version Pinning Strategy:

  • Development Images (Redis, Postgres): Use latest or major version tags (e.g., redis:7).
  • Production-Critical Images (Elasticsearch, OTEL): Pin to specific version (e.g., 8.11.0) to avoid surprises.
  • Update Cadence: Review container versions quarterly; upgrade in dev/test before production.
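The pinning policy can be checked mechanically. A sketch that classifies an image reference as pinned or floating (function name and regex are illustrative):

```python
import re

def is_pinned(image_ref: str) -> bool:
    """True when the tag is an exact version (e.g. 8.11.0); False for floating
    tags such as 'latest' or a bare major version like '7-alpine'."""
    tag = image_ref.rsplit(":", 1)[-1]
    return bool(re.fullmatch(r"\d+\.\d+(\.\d+)?(-[\w.]+)?", tag))

print(is_pinned("otel/opentelemetry-collector:0.97.0"))  # True
print(is_pinned("redis:7-alpine"))                       # False
print(is_pinned("datalust/seq:latest"))                  # False
```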

Docker Build Context

Challenge: Multi-project solutions require the Dockerfile to access multiple .csproj files, but the Docker build context is limited to a single directory tree.

Solution: Set build context to repository root; Dockerfile uses relative paths.

Scenario: Ingestion Service depends on ConnectSoft.ATP.Contracts project (shared DTOs).

Incorrect (Build Context = Service Directory):

parameters:
  dockerfile: 'src/ConnectSoft.ATP.Ingestion/Dockerfile'
  buildContext: 'src/ConnectSoft.ATP.Ingestion'  # ❌ Can't access Contracts project

Dockerfile fails:

COPY ["../ConnectSoft.ATP.Contracts/ConnectSoft.ATP.Contracts.csproj", "Contracts/"]
# ERROR: Path outside build context

Correct (Build Context = Repository Root):

parameters:
  dockerfile: 'src/ConnectSoft.ATP.Ingestion/Dockerfile'
  buildContext: '.'  # ✅ Repository root (can access all projects)

Dockerfile succeeds:

# Build context: repository root
COPY ["src/ConnectSoft.ATP.Ingestion/ConnectSoft.ATP.Ingestion.csproj", "Ingestion/"]
COPY ["src/ConnectSoft.ATP.Contracts/ConnectSoft.ATP.Contracts.csproj", "Contracts/"]
RUN dotnet restore "Ingestion/ConnectSoft.ATP.Ingestion.csproj"
COPY src/ .
WORKDIR "/src/Ingestion"
RUN dotnet build "ConnectSoft.ATP.Ingestion.csproj" -c Release -o /app/build

Best Practice:

Always use the repository root (.) as the build context and relative paths in the Dockerfile. Build time increases slightly (Docker uploads the entire repository as context), but the approach is more flexible; a .dockerignore file can keep the context small.

Deployment Parameters

Purpose: Configure Azure subscription, app name, and environment-specific settings.

Parameter Definitions:

# deploy/deploy-microservice-to-azure-web-site.yaml@templates
parameters:
- name: azureSubscription
  type: string
  displayName: 'Azure DevOps service connection'

- name: appName
  type: string
  displayName: 'Azure App Service name'

- name: package
  type: string
  displayName: 'Path to deployment package (.zip)'

- name: appSettings
  type: string
  default: ''
  displayName: 'App settings to override (multi-line)'

- name: deploymentSlot
  type: string
  default: 'production'
  values:
  - production
  - staging
  displayName: 'Deployment slot'

- name: deploymentMethod
  type: string
  default: 'zipDeploy'
  values:
  - zipDeploy
  - runFromPackage
  - webDeploy
  displayName: 'Deployment method'

- name: healthCheckUrl
  type: string
  default: '/health'
  displayName: 'Health check endpoint path'

- name: healthCheckTimeout
  type: number
  default: 300
  displayName: 'Health check timeout (seconds)'

Parameter Usage (blue-green deployment):

- template: deploy/deploy-microservice-to-azure-web-site.yaml@templates
  parameters:
    azureSubscription: '$(azureSubscription)'
    appName: 'atp-ingestion-staging'
    package: '$(Pipeline.Workspace)/drop/*.zip'
    deploymentSlot: 'staging'  # Deploy to staging slot first
    deploymentMethod: 'runFromPackage'  # Faster startup
    healthCheckUrl: '/health'
    healthCheckTimeout: 600  # 10 minutes (longer for first deployment)
    appSettings: |
      -ASPNETCORE_ENVIRONMENT Staging
      -ApplicationInsights__InstrumentationKey $(AppInsightsKey)
      -FeatureFlags__EnableAdvancedSearch true
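The health gate implied by healthCheckUrl and healthCheckTimeout presumably polls the endpoint until it responds or the timeout elapses. A minimal sketch of that loop (the probe injection, interval, and function name are illustrative, not template API):

```python
import time

def wait_healthy(probe, timeout_s: int = 300, interval_s: int = 5,
                 clock=time.monotonic, sleep=time.sleep) -> bool:
    """Poll `probe()` (True when /health responds 200) until it succeeds
    or `timeout_s` elapses; mirrors healthCheckUrl/healthCheckTimeout."""
    deadline = clock() + timeout_s
    while clock() < deadline:
        if probe():
            return True
        sleep(interval_s)
    return False

# Example with a fake probe that succeeds on the third attempt.
attempts = iter([False, False, True])
print(wait_healthy(lambda: next(attempts), timeout_s=60, sleep=lambda s: None))
```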

Template Versioning

The ConnectSoft.AzurePipelines repository uses Git tags for versioning, enabling services to pin to stable template versions or adopt upgrades incrementally. This approach prevents breaking changes in templates from affecting all services simultaneously.

Repository Tagging

Versioning Strategy: Semantic versioning (SemVer) for template repository.

Tag Format: vMAJOR.MINOR.PATCH (e.g., v1.2.3)

Version Semantics:

  • Major (v2.0.0): Breaking changes (parameter renames, removed features, incompatible behavior).
  • Minor (v1.2.0): Backward-compatible features (new parameters with defaults, new templates).
  • Patch (v1.2.3): Bug fixes, performance improvements, documentation updates.
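These semantics can be applied mechanically when deciding how cautiously to adopt a new tag. A minimal sketch (function name is illustrative):

```python
def bump_kind(old: str, new: str) -> str:
    """Classify a template upgrade between two vMAJOR.MINOR.PATCH tags."""
    parse = lambda tag: tuple(int(p) for p in tag.lstrip("v").split("."))
    o, n = parse(old), parse(new)
    if n[0] != o[0]:
        return "major"   # breaking: review the CHANGELOG before upgrading
    if n[1] != o[1]:
        return "minor"   # backward-compatible features
    return "patch"       # bug fixes only

print(bump_kind("v1.2.0", "v2.0.0"))  # major
print(bump_kind("v1.2.0", "v1.3.0"))  # minor
print(bump_kind("v1.2.0", "v1.2.3"))  # patch
```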

Tagging Process:

  1. Template Changes: Platform team develops template improvements in feature branch.
  2. Testing: Validate templates in dev/test environments (use feature branch reference).
  3. PR Review: Template changes reviewed by Platform Architects.
  4. Merge: PR merged to main branch.
  5. Tag: Create Git tag for release:
git tag -a v1.3.0 -m "Release v1.3.0: Add Docker BuildKit caching support"
git push origin v1.3.0
  6. Release Notes: Update CHANGELOG.md in template repository:
## [1.3.0] - 2025-10-30

### Added
- Docker BuildKit caching support in `build-and-push-microservice-docker-steps.yaml`
- New parameter `enableBuildKitCache` (default: true)

### Changed
- Improved NuGet restore caching (20% faster builds)
- Updated SonarQube task to v6 (better performance)

### Fixed
- Fix: Coverage threshold enforcement failed on Windows agents

### Breaking Changes
- None (backward compatible)

Tag History (example):

v1.0.0 (2024-01-15): Initial release
v1.1.0 (2024-04-20): Add Pulumi infrastructure templates
v1.2.0 (2024-07-10): Add SBOM generation support
v1.2.1 (2024-07-22): Fix: Docker build context issue
v1.3.0 (2025-10-30): Add BuildKit caching

Pipeline Lock (Pin to Version)

Purpose: Services pin to specific template version for stability; upgrade when ready.

Default Behavior (No Pin):

Uses main branch (latest templates):

resources:
  repositories:
    - repository: templates
      type: git
      name: ConnectSoft/ConnectSoft.AzurePipelines
      # No ref: defaults to main branch

Pinned to Tag (Stable):

resources:
  repositories:
    - repository: templates
      type: git
      name: ConnectSoft/ConnectSoft.AzurePipelines
      ref: refs/tags/v1.2.0  # Pin to v1.2.0

Pinned to Branch (Testing):

resources:
  repositories:
    - repository: templates
      type: git
      name: ConnectSoft/ConnectSoft.AzurePipelines
      ref: refs/heads/feature/buildkit-caching  # Test experimental templates

When to Pin:

| Environment          | Pin Strategy                   | Rationale                                                   |
|----------------------|--------------------------------|-------------------------------------------------------------|
| Dev Pipelines        | main (no pin)                  | Always use latest templates; fast feedback on template changes |
| Test Pipelines       | main or latest stable tag      | Balance between stability and adopting improvements         |
| Staging Pipelines    | Stable tag (e.g., v1.2.0)      | Prevent surprises from template updates                     |
| Production Pipelines | Stable tag (proven in staging) | Maximum stability; upgrade after validation                 |

Upgrade Process:

  1. New Template Version Released: Platform team tags v1.3.0.
  2. Dev Services Upgrade Automatically: Using main branch (immediate adoption).
  3. Test in Dev: Platform team monitors dev pipelines for issues (1 week observation).
  4. Test Services Upgrade: Change ref: refs/tags/v1.3.0 in test service pipelines.
  5. Staging Services Upgrade: After 2 weeks validation in test; change ref in staging pipelines.
  6. Production Services Upgrade: After 1 month validation; change ref in production pipelines.

Coordinated Upgrade (Multiple Services):

When the template version is stable, upgrade all services:

# Bulk update all service pipelines
for service in Ingestion Query Gateway Integrity Export Policy Search; do
  cd ConnectSoft.ATP.$service

  # Update azure-pipelines.yml
  sed -i 's/ref: refs\/tags\/v1.2.0/ref: refs\/tags\/v1.3.0/g' azure-pipelines.yml

  git add azure-pipelines.yml
  git commit -m "Upgrade to template v1.3.0"
  git push
done

Rollback (If Template Version Broken):

Revert template version to previous stable:

resources:
  repositories:
    - repository: templates
      ref: refs/tags/v1.2.0  # Rollback from v1.3.0 to v1.2.0

Redeploy pipeline; uses previous template version.

Migration Testing

Purpose: Validate new template versions don't break existing pipelines before wide adoption.

Migration Plan (Example: v1.2.0 → v1.3.0):

Phase 1: Dev Environment (Week 1)

# ATP Ingestion (Dev Pipeline)
resources:
  repositories:
    - repository: templates
      ref: refs/tags/v1.3.0  # Upgrade first service to v1.3.0

# Monitor for issues:
# - Build duration change (expect 20% faster due to caching)
# - Build success rate (should remain 95%+)
# - No new errors in pipeline logs

Phase 2: Test Environment (Week 2-3)

If dev is stable, upgrade the test pipelines:

# Update all test service pipelines
# (variable-group commands take the numeric group ID, so resolve it from the
# group name first; per-service group names are illustrative)
for service in Ingestion Query Gateway Integrity Export Policy Search; do
  GROUP_ID=$(az pipelines variable-group list \
    --query "[?name=='ATP-${service}-Test-Variables'].id | [0]" -o tsv)
  az pipelines variable-group variable update \
    --group-id "$GROUP_ID" \
    --name templateVersion \
    --value v1.3.0
done

Phase 3: Staging Environment (Week 4)

After test validation, upgrade staging:

# ATP services in staging
resources:
  repositories:
    - repository: templates
      ref: refs/tags/v1.3.0

Run the full regression test suite, plus load tests and chaos tests.

Phase 4: Production Environment (Week 5-6)

After staging proves stable (2 weeks observation), upgrade production:

# Production pipelines
resources:
  repositories:
    - repository: templates
      ref: refs/tags/v1.3.0

Deploy to production using canary strategy; monitor closely.

Validation Criteria (Each Phase):

  • Build Success Rate: ≥95% (no degradation from template change).
  • Build Duration: Within ±10% of baseline (template shouldn't slow builds).
  • Deployments: At least 3 successful deployments to environment.
  • No Regressions: No new pipeline errors or quality gate failures.

Abort Criteria (Rollback to Previous Version):

  • Build success rate drops below 90%.
  • Build duration increases > 20%.
  • New errors in pipeline logs (not present in previous version).
  • Quality gates fail unexpectedly (e.g., coverage calculation broken).
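The first two abort criteria can be checked mechanically. A minimal sketch, assuming the succeeded/total run counts and the duration baseline were collected beforehand (e.g. from `az pipelines runs list`); the numbers in the example are illustrative:

```shell
#!/usr/bin/env bash
# Abort-criteria check for a template rollout (thresholds mirror the list above)

success_rate() {           # success_rate <succeeded> <total> -> integer percent
  echo $(( $1 * 100 / $2 ))
}

duration_delta_pct() {     # duration_delta_pct <current_secs> <baseline_secs>
  echo $(( ($1 - $2) * 100 / $2 ))
}

should_rollback() {        # should_rollback <rate_pct> <delta_pct>; true if abort
  [ "$1" -lt 90 ] || [ "$2" -gt 20 ]
}

# Example: 44 of 50 runs succeeded, builds took 330s vs a 300s baseline
rate=$(success_rate 44 50)
delta=$(duration_delta_pct 330 300)
if should_rollback "$rate" "$delta"; then
  echo "ABORT: success rate ${rate}%, duration delta ${delta}%; roll back to previous tag"
fi
```

The remaining criteria (new log errors, unexpected quality-gate failures) still need human review of the pipeline logs.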

Communication:

Platform team announces template upgrades:

# Teams/Slack Announcement

**Template Upgrade: v1.3.0 Released**

**What's New**:
- Docker BuildKit caching (20% faster builds)
- Improved NuGet restore caching
- Updated SonarQube task to v6

**Migration Timeline**:
- Week 1: Dev environments (automatic)
- Week 2-3: Test environments
- Week 4: Staging environments
- Week 5-6: Production environments

**Action Required**:
- Dev teams: Monitor build success rates in dev
- QA teams: Validate builds in test environment
- SRE teams: Review staging deployments before production upgrade

**Rollback Plan**:
If issues detected, change `ref: refs/tags/v1.2.0` in azure-pipelines.yml

**Questions**: #platform-engineering channel

Advanced Parameterization

Beyond simple parameter overrides, templates support conditional logic, parameter validation, and composition patterns for advanced customization.

Conditional Steps

Use Case: Skip Docker build if service not containerized.

Template Implementation:

# build/build-microservice-steps.yaml@templates
parameters:
- name: buildDockerImage
  type: boolean
  default: false

- name: dockerfilePath
  type: string
  default: ''

steps:
# Always run: Build .NET solution
- task: DotNetCoreCLI@2
  displayName: 'dotnet build'
  inputs:
    command: 'build'
    projects: '${{ parameters.solution }}'

# Conditionally run: Build Docker image
- task: Docker@2
  condition: eq('${{ parameters.buildDockerImage }}', true)
  displayName: 'Build Docker Image'
  inputs:
    command: 'build'
    dockerfile: '${{ parameters.dockerfilePath }}'

Service Usage:

# Containerized service
- template: build/build-microservice-steps.yaml@templates
  parameters:
    buildDockerImage: true
    dockerfilePath: 'src/ConnectSoft.ATP.Ingestion/Dockerfile'

# Non-containerized service
- template: build/build-microservice-steps.yaml@templates
  parameters:
    buildDockerImage: false  # Skip Docker build
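Because the flag is a template parameter (known at expansion time), the same skip can also be expressed with a compile-time `${{ if }}` insertion, which omits the step from the expanded pipeline entirely instead of evaluating a runtime condition; a sketch of the template's Docker step rewritten this way:

```yaml
steps:
- ${{ if eq(parameters.buildDockerImage, true) }}:
  - task: Docker@2
    displayName: 'Build Docker Image'
    inputs:
      command: 'build'
      dockerfile: '${{ parameters.dockerfilePath }}'
```

The runtime `condition:` form shown above still records a skipped step in the run; the `${{ if }}` form leaves no trace of it.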

Parameter Validation

Use Case: Ensure required parameters are provided and validate their values.

Template Implementation:

# deploy/deploy-microservice-to-azure-web-site.yaml@templates
parameters:
- name: azureSubscription
  type: string

- name: appName
  type: string

- name: package
  type: string

steps:
# Validate parameters
- script: |
    if [ -z "${{ parameters.azureSubscription }}" ]; then
      echo "##vso[task.logissue type=error]Parameter 'azureSubscription' is required"
      exit 1
    fi

    if [ -z "${{ parameters.appName }}" ]; then
      echo "##vso[task.logissue type=error]Parameter 'appName' is required"
      exit 1
    fi

    # Validate app name format (lowercase, alphanumeric, hyphens)
    if ! [[ "${{ parameters.appName }}" =~ ^[a-z0-9-]+$ ]]; then
      echo "##vso[task.logissue type=error]App name must be lowercase alphanumeric with hyphens"
      exit 1
    fi
  displayName: 'Validate Parameters'

# ... deployment steps ...

Template Composition

Use Case: Combine multiple templates for complex workflows.

Example: Custom CI workflow with additional security scan.

Service Pipeline:

steps:
# Standard templates
- template: build/lint-microservice-steps.yaml@templates
  parameters:
    solution: $(solution)

- template: build/build-microservice-steps.yaml@templates
  parameters:
    solution: $(solution)

# Custom step (not in template)
- script: |
    # Run custom security scan
    custom-security-tool scan $(Build.SourcesDirectory)
  displayName: 'Custom Security Scan'

# Resume standard templates
- template: test/test-microservice-steps.yaml@templates
  parameters:
    solution: $(solution)
    codeCoverageThreshold: 80

- template: publish/publish-microservice-steps.yaml@templates
  parameters:
    artifactName: $(artifactName)

Benefits:

  • Reuse Standard Logic: Inherit 90% of pipeline from templates.
  • Customize When Needed: Insert service-specific steps between template invocations.
  • Maintainability: Standard steps updated via templates; custom steps maintained by service team.

Rationale: Template parameterization balances reusability (consistent baseline) and flexibility (service-specific needs). Versioning enables gradual adoption and rollback. Advanced parameterization (conditionals, validation, composition) handles edge cases without template duplication.


Compliance & Audit Evidence

ATP pipelines are designed to generate and preserve evidence of CI/CD controls for regulatory compliance (SOC 2, ISO 27001, HIPAA, GDPR). Every pipeline execution produces an immutable audit trail — execution logs, approval records, test results, security scan reports, and artifact provenance — that auditors can inspect to verify that security and quality controls were enforced.

This compliance-by-design approach ensures that ATP pipelines satisfy change management controls (CAB approvals, deployment records), security controls (vulnerability scanning, secrets management), and testing controls (automated testing, coverage enforcement) required by industry regulations and customer contracts.

Pipeline Audit Trail

The pipeline audit trail provides complete traceability from code commit to production deployment, enabling compliance audits, incident investigations, and root cause analysis. Azure DevOps automatically captures execution logs, but ATP extends retention and enriches logs with compliance metadata.

Execution Logs

Purpose: Preserve detailed record of every pipeline execution (steps executed, results, errors, duration).

Default Retention: Azure DevOps retains pipeline logs for 30 days (rolling window).

Extended Retention (ATP Compliance Requirement):

# Configure pipeline retention in Azure DevOps
Project Settings → Pipelines → Settings → Retention

Build Pipelines:
  - Minimum retention: 30 days (Azure default)
  - Maximum retention: 365 days (ATP requirement for production builds)

Release Pipelines:
  - Minimum retention: 30 days
  - Maximum retention: 365 days

Retention Policy:
  ☑ Retain builds for at least: 365 days
  ☑ Retain builds with artifacts: 365 days
  ☑ Retain builds associated with releases: Forever (production only)

Retention Rules (Environment-Specific):

| Build Type | Retention Period | Rationale |
|---|---|---|
| PR Builds | 30 days | Temporary validation; no compliance significance |
| Dev Builds | 90 days | Developer troubleshooting; not production evidence |
| Test Builds | 180 days | QA validation; moderate compliance significance |
| Staging Builds | 365 days | Pre-production validation; compliance evidence |
| Production Builds | 7 years | Regulatory requirement (SOC 2, HIPAA, GDPR) |

Extended Retention Implementation:

Use Azure Pipelines REST API to extend retention on production builds:

- script: |
    # Extend retention for production builds to 7 years (SOC 2 requirement).
    # az pipelines has no retention option, so call the Retention Leases
    # REST API directly (authenticated with the job access token).
    if [ "$(Build.SourceBranch)" == "refs/heads/master" ]; then
      curl -sf -u ":$(System.AccessToken)" \
        -H "Content-Type: application/json" \
        -d "[{\"daysValid\": 2555, \"definitionId\": $(System.DefinitionId), \"ownerId\": \"User:$(Build.RequestedForId)\", \"protectPipeline\": false, \"runId\": $(Build.BuildId)}]" \
        "$(System.CollectionUri)$(System.TeamProject)/_apis/build/retention/leases?api-version=7.1"
    fi
  displayName: 'Extend Build Retention (Production)'
  condition: succeeded()

Log Archival (Long-Term Storage):

Export pipeline logs to Azure Blob Storage with WORM policy:

- task: AzureCLI@2
  displayName: 'Archive Pipeline Logs'
  condition: and(succeeded(), eq(variables['Build.SourceBranch'], 'refs/heads/master'))
  inputs:
    azureSubscription: $(azureSubscription)
    scriptType: 'bash'
    scriptLocation: 'inlineScript'
    inlineScript: |
      # Download pipeline logs
      az pipelines runs show --id $(Build.BuildId) -o json > pipeline-log.json

      # Upload to compliance storage (WORM container)
      az storage blob upload \
        --account-name atpcompliancelogs \
        --container-name pipeline-logs \
        --name "$(Build.DefinitionName)/$(Build.BuildNumber)/pipeline-log.json" \
        --file pipeline-log.json \
        --metadata buildId=$(Build.BuildId) commitSha=$(Build.SourceVersion) \
        --immutability-policy-mode Locked \
        --immutability-policy-expiry "$(date -u -d '+7 years' +%Y-%m-%dT%H:%M:%SZ)"  # 7-year WORM hold

Log Contents (example):

{
  "id": 12345,
  "buildNumber": "1.0.42",
  "definition": {
    "name": "ATP-Ingestion-CI-CD",
    "id": 1
  },
  "sourceBranch": "refs/heads/master",
  "sourceVersion": "a1b2c3d4e5f6g7h8i9j0",
  "requestedBy": {
    "displayName": "Alice Developer",
    "uniqueName": "alice@connectsoft.com"
  },
  "startTime": "2025-10-30T14:00:00Z",
  "finishTime": "2025-10-30T14:08:23Z",
  "result": "succeeded",
  "stages": [
    {
      "name": "CI_Stage",
      "result": "succeeded",
      "jobs": [
        {
          "name": "Build_Test_Publish",
          "steps": [
            {"name": "Lint", "result": "succeeded", "duration": 105},
            {"name": "Build", "result": "succeeded", "duration": 130},
            {"name": "Test", "result": "succeeded", "duration": 228},
            {"name": "Publish", "result": "succeeded", "duration": 42}
          ]
        }
      ]
    }
  ]
}

Compliance Use Cases:

  1. SOC 2 Audit: Auditor requests proof that CI/CD controls executed for specific production deployment.
  2. Incident Investigation: Trace production bug to specific build; review test results and security scans.
  3. Change Audit: Compliance team verifies all production changes had CAB approval.

Rationale: Extended retention ensures compliance evidence available when auditors request it (often years after deployment). WORM storage prevents tampering with audit trail.

Artifact Provenance

Purpose: Every artifact (binary, Docker image, NuGet package) includes metadata linking it back to source code, pipeline execution, and quality gates.

Provenance Metadata:

Embedded in SBOM and artifact manifest:

{
  "artifact": {
    "name": "ConnectSoft.ATP.Ingestion",
    "version": "1.0.42",
    "type": "docker-image",
    "registry": "connectsoft.azurecr.io"
  },
  "provenance": {
    "pipelineRunId": "12345",
    "pipelineName": "ATP-Ingestion-CI-CD",
    "buildNumber": "1.0.42",
    "commitSha": "a1b2c3d4e5f6g7h8i9j0",
    "commitAuthor": "alice@connectsoft.com",
    "commitMessage": "Add PII classification improvements",
    "buildTimestamp": "2025-10-30T14:08:23Z",
    "buildAgent": "ubuntu-latest-1",
    "approver": "platform-team@connectsoft.com",
    "qualityGates": {
      "lint": "PASSED",
      "build": "PASSED",
      "test": "PASSED (1234/1234 tests)",
      "coverage": "PASSED (75.3%)",
      "sonarQube": "PASSED (A rating)",
      "dependencyScan": "PASSED (0 critical vulnerabilities)",
      "secretsScan": "PASSED (0 secrets detected)",
      "trivyScan": "PASSED (0 critical vulnerabilities in image)"
    }
  },
  "sbom": {
    "format": "CycloneDX",
    "version": "1.5",
    "components": 127,
    "vulnerabilities": 0
  }
}

SBOM Integration:

SBOM includes pipeline run ID and commit SHA:

{
  "bomFormat": "CycloneDX",
  "specVersion": "1.5",
  "version": 1,
  "metadata": {
    "timestamp": "2025-10-30T14:08:23Z",
    "tools": [
      {
        "vendor": "Azure Pipelines",
        "name": "ATP-Ingestion-CI-CD",
        "version": "1.0.42"
      }
    ],
    "component": {
      "name": "ConnectSoft.ATP.Ingestion",
      "version": "1.0.42",
      "purl": "pkg:docker/connectsoft/atp-ingestion@1.0.42"
    },
    "properties": [
      {"name": "pipeline:runId", "value": "12345"},
      {"name": "pipeline:buildNumber", "value": "1.0.42"},
      {"name": "git:commitSha", "value": "a1b2c3d4e5f6g7h8i9j0"},
      {"name": "git:branch", "value": "refs/heads/master"},
      {"name": "build:timestamp", "value": "2025-10-30T14:08:23Z"}
    ]
  },
  "components": [
    {
      "name": "MassTransit",
      "version": "8.1.0",
      "purl": "pkg:nuget/MassTransit@8.1.0",
      "hashes": [
        {"alg": "SHA-256", "content": "abc123..."}
      ]
    }
  ]
}

Artifact Manifest (Published with Binaries):

- script: |
    # Generate artifact manifest
    cat > $(Build.ArtifactStagingDirectory)/artifact-manifest.json <<EOF
    {
      "buildNumber": "$(Build.BuildNumber)",
      "commitSha": "$(Build.SourceVersion)",
      "pipelineRunId": "$(Build.BuildId)",
      "buildTimestamp": "$(date -u +%Y-%m-%dT%H:%M:%SZ)",
      "approver": "$(Build.RequestedFor)",
      "repository": "$(Build.Repository.Name)",
      "branch": "$(Build.SourceBranch)"
    }
    EOF
  displayName: 'Generate Artifact Manifest'

- task: PublishPipelineArtifact@1
  displayName: 'Publish Artifact Manifest'
  inputs:
    targetPath: '$(Build.ArtifactStagingDirectory)/artifact-manifest.json'
    artifact: 'manifest'

Provenance Verification (During Deployment):

Deployment pipelines verify artifact provenance before deploying:

- script: |
    # Download manifest from artifact
    MANIFEST=$(cat $(Pipeline.Workspace)/manifest/artifact-manifest.json)

    # Verify commit SHA matches expected
    ARTIFACT_COMMIT=$(echo $MANIFEST | jq -r '.commitSha')
    EXPECTED_COMMIT=$(Build.SourceVersion)

    if [ "$ARTIFACT_COMMIT" != "$EXPECTED_COMMIT" ]; then
      echo "##vso[task.logissue type=error]Artifact provenance mismatch!"
      echo "Expected commit: $EXPECTED_COMMIT"
      echo "Artifact commit: $ARTIFACT_COMMIT"
      exit 1
    fi

    echo "Artifact provenance verified: commit $ARTIFACT_COMMIT"
  displayName: 'Verify Artifact Provenance'

Benefits:

  • Tamper Detection: Verify artifact not modified between build and deployment.
  • Incident Correlation: Link production issue to specific code change.
  • Compliance Evidence: Prove chain of custody from source to production.

Approval Records

Purpose: Capture manual approval decisions for staging/production deployments with approver identity, timestamp, and justification.

Approval Record Structure:

Azure DevOps stores approval records in deployment history:

{
  "approvalId": "67890",
  "deploymentId": "12345",
  "environment": "ATP-Production",
  "approver": {
    "displayName": "Bob SRE",
    "uniqueName": "bob@connectsoft.com",
    "id": "00000000-0000-0000-0000-000000000001"
  },
  "approvedAt": "2025-10-30T16:00:00Z",
  "decision": "Approved",
  "comment": "Staging deployment successful for 48 hours. Rollback plan documented in WIKI-123. On-call rotation confirmed. Proceeding with production deployment.",
  "relatedWorkItems": [
    {"id": 1234, "title": "Feature: Advanced Audit Classification"},
    {"id": 1235, "title": "Bug: Fix query timeout in large datasets"}
  ],
  "changeTicket": "CHG00012345"
}

Approval Log Export (to Compliance Storage):

- task: AzureCLI@2
  displayName: 'Export Approval Records'
  condition: succeeded()
  inputs:
    azureSubscription: $(azureSubscription)
    scriptType: 'bash'
    scriptLocation: 'inlineScript'
    inlineScript: |
      # Fetch approval records via Azure DevOps REST API
      APPROVALS=$(az pipelines runs show --id $(Build.BuildId) --query 'validations' -o json)

      # Upload to compliance storage
      echo "$APPROVALS" > approvals.json
      az storage blob upload \
        --account-name atpcompliancelogs \
        --container-name approval-logs \
        --name "$(Build.DefinitionName)/$(Build.BuildNumber)/approvals.json" \
        --file approvals.json \
        --metadata environment=$(Environment.Name) deploymentId=$(Deployment.DeploymentId)

Approval Audit Query (Compliance Team):

Query approval logs to verify all production deployments approved:

# Azure CLI: List all production deployments in the last quarter
# (--include m returns blob metadata, which the --query filter needs)
az storage blob list \
  --account-name atpcompliancelogs \
  --container-name approval-logs \
  --prefix "ATP-Ingestion-CI-CD" \
  --include m \
  --query "[?metadata.environment=='ATP-Production' && properties.createdOn > '2025-07-01'].{name:name, deploymentId:metadata.deploymentId, timestamp:properties.createdOn}"

Compliance Report (Quarterly):

ATP Production Deployments - Q3 2025

Total Deployments: 12
Approved Deployments: 12 (100%)
Average Approval Time: 2.3 hours
CAB-Reviewed Deployments: 4 (major releases)

Approvers:
  - bob@connectsoft.com: 7 deployments
  - alice@connectsoft.com: 5 deployments

All production deployments had required approvals (2 approvers minimum).
No exceptions granted.

Rationale: Approval records prove change management controls enforced. Auditors verify every production deployment had appropriate approvals (SOC 2 CC8.0 control).

Compliance Artifacts (Per Build)

Every ATP build produces a compliance bundle containing all artifacts required for regulatory audits and certifications. These artifacts are published to Azure Artifacts and archived to long-term WORM storage.

SBOM (Software Bill of Materials)

Format: CycloneDX (preferred) or SPDX (alternative).

Generation (During CI Stage):

- script: |
    # Install CycloneDX tool
    dotnet tool install --global CycloneDX

    # Generate SBOM
    dotnet CycloneDX $(Build.SourcesDirectory) \
      -o $(Build.ArtifactStagingDirectory) \
      -f json \
      -sv $(Build.BuildNumber) \
      --github-bearer-token $(GitHubToken)  # Resolve license data for GitHub-hosted packages
  displayName: 'Generate SBOM (CycloneDX)'

- task: PublishPipelineArtifact@1
  displayName: 'Publish SBOM'
  inputs:
    targetPath: '$(Build.ArtifactStagingDirectory)/bom.json'
    artifact: 'sbom'

SBOM Contents (example excerpt):

{
  "bomFormat": "CycloneDX",
  "specVersion": "1.5",
  "serialNumber": "urn:uuid:3e671687-395b-41f5-a30f-a58921a69b79",
  "version": 1,
  "metadata": {
    "timestamp": "2025-10-30T14:08:23Z",
    "component": {
      "type": "application",
      "name": "ConnectSoft.ATP.Ingestion",
      "version": "1.0.42"
    }
  },
  "components": [
    {
      "type": "library",
      "name": "MassTransit",
      "version": "8.1.0",
      "purl": "pkg:nuget/MassTransit@8.1.0",
      "hashes": [
        {"alg": "SHA-256", "content": "abc123def456..."}
      ],
      "licenses": [
        {"license": {"id": "Apache-2.0"}}
      ]
    },
    {
      "type": "library",
      "name": "Serilog",
      "version": "3.1.1",
      "purl": "pkg:nuget/Serilog@3.1.1",
      "hashes": [
        {"alg": "SHA-256", "content": "def456ghi789..."}
      ],
      "licenses": [
        {"license": {"id": "Apache-2.0"}}
      ]
    }
  ],
  "vulnerabilities": []
}

SBOM Publishing:

  • Azure Artifacts: Published to dedicated SBOM feed for compliance team access.
  • Azure Blob Storage: Archived with WORM policy (7-year retention).
  • Customer Portal: Redacted SBOMs (remove internal details) available for customer download.

SBOM Use Cases:

  1. Vulnerability Response: When CVE announced, query all SBOMs to find affected builds.
  2. License Compliance: Audit SBOMs for GPL/AGPL packages in proprietary code.
  3. Customer Compliance: Customers request SBOMs for their own supply chain audits.
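Use case 1 can be scripted against the SBOM archive. A sketch, assuming the archived CycloneDX files have been synced to a local `./sbom-archive` directory (path illustrative); it matches on the purl string shown in the SBOM example above:

```shell
#!/usr/bin/env bash
# Find archived SBOMs referencing a vulnerable package version.
PKG="MassTransit"        # example package from a hypothetical CVE
BAD_VERSION="8.1.0"      # example affected version

affected_boms() {        # affected_boms <archive-dir> -> files containing the purl
  grep -rl "pkg:nuget/${PKG}@${BAD_VERSION}" "$1" 2>/dev/null
}

affected_boms ./sbom-archive | while read -r bom; do
  echo "AFFECTED: $bom"  # each hit maps back to a build via its directory path
done
```

A production check would parse the JSON with `jq` rather than grep, but the purl match is enough to shortlist affected builds.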

Security Scan Reports

Purpose: Preserve results of all security scans (SAST, dependency scanning, secrets detection, container scanning) as compliance evidence.

Report Collection:

- script: |
    # Aggregate security scan reports
    mkdir -p $(Build.ArtifactStagingDirectory)/security-reports

    # SonarQube report (JSON)
    cp $(Agent.TempDirectory)/sonarqube-report.json $(Build.ArtifactStagingDirectory)/security-reports/

    # OWASP Dependency-Check report (HTML + JSON)
    cp dependency-check-report.* $(Build.ArtifactStagingDirectory)/security-reports/

    # Trivy scan report (JSON)
    cp trivy-report.json $(Build.ArtifactStagingDirectory)/security-reports/

    # GitGuardian secrets scan (JSON)
    cp gitguardian-report.json $(Build.ArtifactStagingDirectory)/security-reports/
  displayName: 'Aggregate Security Reports'

- task: PublishPipelineArtifact@1
  displayName: 'Publish Security Reports'
  inputs:
    targetPath: '$(Build.ArtifactStagingDirectory)/security-reports'
    artifact: 'security-reports'

SonarQube Report (example excerpt):

{
  "projectKey": "connectsoft_atp-ingestion",
  "analysisDate": "2025-10-30T14:05:12Z",
  "qualityGate": {
    "status": "OK",
    "conditions": [
      {"metric": "vulnerabilities", "status": "OK", "actual": "0", "error": "0"},
      {"metric": "security_rating", "status": "OK", "actual": "1.0", "error": "1.0"},
      {"metric": "code_smells", "status": "OK", "actual": "45", "error": "100"}
    ]
  },
  "measures": {
    "vulnerabilities": 0,
    "security_hotspots": 3,
    "bugs": 2,
    "code_smells": 45,
    "coverage": 75.3,
    "duplicated_lines_density": 1.2
  }
}

OWASP Dependency-Check Report (example excerpt):

{
  "reportSchema": "1.1",
  "scanDate": "2025-10-30T14:06:45Z",
  "projectInfo": {
    "name": "ConnectSoft.ATP.Ingestion",
    "reportDate": "2025-10-30T14:06:45Z"
  },
  "dependencies": [
    {
      "fileName": "MassTransit.dll",
      "filePath": "/packages/masstransit/8.1.0/lib/net8.0/MassTransit.dll",
      "sha256": "abc123...",
      "evidenceCollected": {
        "vendorEvidence": ["MassTransit"],
        "productEvidence": ["MassTransit"],
        "versionEvidence": ["8.1.0"]
      },
      "vulnerabilities": []
    }
  ],
  "totalDependencies": 127,
  "vulnerableDependencies": 0
}

Trivy Report (example excerpt):

{
  "SchemaVersion": 2,
  "ArtifactName": "connectsoft.azurecr.io/atp-ingestion:1.0.42",
  "ArtifactType": "container_image",
  "Metadata": {
    "ImageID": "sha256:abc123...",
    "DiffIDs": ["sha256:def456..."],
    "RepoTags": ["connectsoft.azurecr.io/atp-ingestion:1.0.42"],
    "RepoDigests": ["connectsoft.azurecr.io/atp-ingestion@sha256:ghi789..."]
  },
  "Results": [
    {
      "Target": "connectsoft.azurecr.io/atp-ingestion:1.0.42 (alpine 3.19.0)",
      "Class": "os-pkgs",
      "Type": "alpine",
      "Vulnerabilities": []
    }
  ]
}

Report Retention: 7 years (archived to WORM storage alongside SBOM).

Test Results

Purpose: Preserve test execution records (pass/fail, coverage, duration) as proof of automated testing controls.

Test Results Artifact:

Azure DevOps automatically publishes test results (.trx files), but ATP also archives to long-term storage:

- task: PublishTestResults@2
  displayName: 'Publish Test Results'
  inputs:
    testResultsFormat: 'VSTest'
    testResultsFiles: '**/*.trx'
    mergeTestResults: true
    failTaskOnFailedTests: true
    testRunTitle: 'ATP Ingestion Tests - Build $(Build.BuildNumber)'

- task: PublishCodeCoverageResults@1
  displayName: 'Publish Code Coverage'
  inputs:
    codeCoverageTool: 'Cobertura'
    summaryFileLocation: '$(Agent.TempDirectory)/**/coverage.cobertura.xml'
    reportDirectory: '$(Agent.TempDirectory)/**/coverage-report'

# Archive to long-term storage
- task: AzureCLI@2
  displayName: 'Archive Test Results'
  condition: and(succeeded(), eq(variables['Build.SourceBranch'], 'refs/heads/master'))
  inputs:
    azureSubscription: $(azureSubscription)
    scriptType: 'bash'
    scriptLocation: 'inlineScript'
    inlineScript: |
      # Download test results
      az pipelines runs artifact download \
        --artifact-name test-results \
        --run-id $(Build.BuildId) \
        --path ./test-results

      # Upload to compliance storage
      az storage blob upload-batch \
        --account-name atpcompliancelogs \
        --destination test-results \
        --source ./test-results \
        --pattern "**/*.trx" \
        --metadata buildNumber=$(Build.BuildNumber) service=atp-ingestion

Test Results Summary (Published to Compliance Feed):

{
  "buildNumber": "1.0.42",
  "testRun": {
    "totalTests": 1234,
    "passed": 1234,
    "failed": 0,
    "skipped": 3,
    "passRate": 100.0,
    "duration": 263
  },
  "codeCoverage": {
    "lineCoverage": 75.3,
    "branchCoverage": 62.1,
    "threshold": 75.0,
    "status": "PASSED"
  },
  "testCategories": {
    "unit": {"total": 842, "passed": 842, "duration": 72},
    "integration": {"total": 389, "passed": 389, "duration": 186},
    "contract": {"total": 3, "passed": 3, "duration": 5}
  }
}

Compliance Use Case:

SOC 2 auditor requests evidence of automated testing control:

  1. Auditor selects random production deployment (version 1.0.42).
  2. Compliance team retrieves test results from archive.
  3. Auditor verifies: All tests passed, coverage ≥75%, no skipped tests.
  4. Evidence documented in SOC 2 attestation report.
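The verification in steps 2-3 can be automated against the archived summary. A minimal sketch that parses the flat JSON fields with `sed` (adequate for this fixed format; a production check would use `jq`); the thresholds follow the criteria above:

```shell
#!/usr/bin/env bash
# Verify an archived test-results summary meets the audit criteria.

field() {                # field <name> <file> -> first numeric value of "name"
  sed -n "s/.*\"$1\": \([0-9.]*\).*/\1/p" "$2" | head -n1
}

verify_summary() {       # verify_summary <summary.json>; true if evidence passes
  local failed passrate coverage
  failed=$(field failed "$1")
  passrate=$(field passRate "$1")
  coverage=$(field lineCoverage "$1")
  [ "$failed" = "0" ] || return 1                      # no failed tests
  [ "$passrate" = "100.0" ] || return 1                # 100% pass rate
  awk -v c="$coverage" 'BEGIN { exit !(c >= 75.0) }'   # coverage threshold
}
```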

ADR Snapshots

Purpose: Capture Architecture Decision Records (ADRs) at feature milestones for compliance and knowledge preservation.

ADR Structure (Example):

# ADR-0015: Use Hash Chaining for Audit Trail Integrity

**Status**: Accepted  
**Date**: 2025-10-15  
**Deciders**: Platform Team, Security Team  
**Context**: ATP requires tamper-evidence for audit records. Need cryptographic proof
that records haven't been modified or deleted post-ingestion.

## Decision

Implement hash chaining where each segment's hash includes previous segment hash,
creating immutable chain. Anchors published to external timestamping service.

## Consequences

**Positive**:
- Tamper-evidence: Any modification breaks chain.
- External verification: Anchors provide proof of existence.
- Auditability: Compliance audits can verify chain integrity.

**Negative**:
- Performance: Hash computation adds latency (~10ms per segment).
- Complexity: Requires integrity service and anchor publishing.

**Risks**:
- Hash collision (SHA-256): Negligible (collision probability on the order of 2^-128).
- Anchor service unavailable: Graceful degradation (continue without external anchor).

ADR Snapshot in Pipeline:

- script: |
    # Copy ADR directory to compliance artifact
    mkdir -p $(Build.ArtifactStagingDirectory)/compliance
    cp -r docs/adr $(Build.ArtifactStagingDirectory)/compliance/

    # Generate ADR index
    ls docs/adr/*.md > $(Build.ArtifactStagingDirectory)/compliance/adr-index.txt
  displayName: 'Snapshot ADRs'

- task: PublishPipelineArtifact@1
  displayName: 'Publish ADR Snapshot'
  inputs:
    targetPath: '$(Build.ArtifactStagingDirectory)/compliance/adr'
    artifact: 'adr-snapshot'

ADR Versioning:

ADRs committed to Git alongside code; snapshot captured per build:

  • Build 1.0.42: Snapshot includes ADR-0001 through ADR-0015.
  • Build 1.1.0: Snapshot includes ADR-0001 through ADR-0020 (5 new ADRs).

Compliance Use Case:

Auditor asks: "How did you decide to use hash chaining for integrity?"

  1. Compliance team retrieves ADR snapshot from build 1.0.42.
  2. Provides ADR-0015 documenting decision, context, consequences.
  3. Auditor sees formal decision-making process with rationale.

Regulatory Alignment

ATP pipeline controls and evidence align with major regulatory frameworks: SOC 2 (change management), GDPR (data protection), and HIPAA (security safeguards).

SOC 2 Compliance

Trust Services Criteria: CC8.0 - Change Management Controls

Requirement: Organization implements controls over program changes to ensure authorized, tested, approved, and documented changes.

ATP Pipeline Controls (Mapped to CC8.0):

| CC8.0 Sub-Criteria | ATP Control | Evidence |
|---|---|---|
| CC8.1.1: Change approval process | Manual approval gates for staging/production deployments | Approval records in Azure DevOps |
| CC8.1.2: Authorized changes only | Branch policies (PR required, code review) | Git history, PR approvals |
| CC8.1.3: Testing before deployment | Automated test suite (CI stage); coverage ≥70% | Test results, coverage reports |
| CC8.1.4: Deployment authorization | Manual approvals; CAB review for major changes | Approval logs, CAB meeting minutes |
| CC8.1.5: Change documentation | Work item linking (every PR to Epic/Feature/Task) | Azure DevOps work items, commit messages |
| CC8.1.6: Emergency change process | Hotfix workflow with expedited approval | Hotfix pipeline logs, emergency approval records |
| CC8.1.7: Backout procedures | Automated rollback on metrics degradation | Rollback logs, deployment slot swaps |
| CC8.1.8: Change communication | Status page updates, customer notifications | Status page history, email logs |

Evidence Package (for SOC 2 Audit):

SOC 2 Type II - CC8.0 Evidence Package
Period: Q3 2025 (July 1 - September 30)

1. Pipeline Logs (12 production deployments):
   - All deployments had required approvals (2 approvers minimum)
   - All deployments passed quality gates (tests, coverage, security scans)
   - Average lead time: 5.2 days (commit to production)

2. Approval Records:
   - 100% of production deployments approved (12/12)
   - CAB review for major releases (4/12)
   - No unauthorized deployments detected

3. Test Results:
   - 100% test pass rate across all builds
   - Code coverage ≥75% for all production builds
   - Zero critical/high vulnerabilities in production

4. Security Scan Reports:
   - SAST (SonarQube): All builds passed quality gate
   - Dependency Scan (OWASP): Zero critical vulnerabilities
   - Secrets Scan: Zero secrets detected

5. Change Documentation:
   - 100% of PRs linked to work items (Epic/Feature/Task)
   - Commit messages reference work items
   - Release notes published for each deployment

Auditor Validation:

Auditor selects sample of 5 production deployments and verifies:

  • Approvals obtained from authorized approvers.
  • Tests executed and passed (100% pass rate).
  • Security scans performed and clean.
  • Changes documented (work items, commit messages).
  • Rollback plan documented.

Attestation: Controls operate effectively; no exceptions noted.

GDPR Compliance

Requirement: No personal identifiable information (PII) in logs, artifacts, or error messages.

ATP Pipeline Controls:

Log Redaction:

Pipeline logs redacted to remove PII:

- script: |
    # Redact connection strings from logs
    echo "Deploying to Azure App Service: atp-ingestion-staging"
    # ❌ BAD: echo "Connection string: Server=tcp:atp-sql.database.windows.net;Password=P@ssw0rd123"
    # ✅ GOOD: echo "Connection string: [REDACTED]"
  displayName: 'Deploy to Azure'

Artifact Redaction:

SBOM and security reports redacted:

{
  "components": [
    {
      "name": "Azure.Storage.Blobs",
      "version": "12.18.0",
      "connectionString": "[REDACTED]"  // No connection strings in SBOM
    }
  ]
}

Configuration Handling:

Secrets stored in Key Vault (not pipeline variables):

# ❌ BAD: Secrets in pipeline variables
variables:
  sqlConnectionString: 'Server=tcp:...;Password=P@ssw0rd123'

# ✅ GOOD: Secrets in Key Vault (referenced)
variables:
  sqlConnectionStringUri: '@Microsoft.KeyVault(SecretUri=https://atp-kv.vault.azure.net/secrets/SqlConnectionString)'

Pipeline Variable Masking:

Sensitive variables marked as secret (automatically masked in logs):

variables:
- group: ATP-Prod-Variables
  # Variables in group marked as secret (e.g., API keys, passwords)
  # Azure DevOps masks secret values in logs (displays ***)

GDPR Compliance Evidence:

  • Data Protection Impact Assessment (DPIA): Documents how pipeline handles PII (none in ATP pipelines).
  • Log Retention: Pipeline logs retained per GDPR Article 5(e) (storage limitation).
  • Right to Erasure: Pipeline logs can be deleted upon request (compliance team process).

HIPAA Compliance

Requirement: Secure artifact storage, encrypted transit, access controls on pipelines.

ATP Pipeline Controls:

Secure Artifact Storage:

Artifacts stored in Azure Artifacts with encryption at rest:

# Azure Artifacts configuration
Encryption: Microsoft-managed keys (AES-256)
Network: VNet integration (private endpoints in production)
Access Control: Azure AD authentication required
Audit Logging: All artifact downloads logged to Log Analytics

Encrypted Transit:

All pipeline communications use TLS 1.2+:

  • Azure DevOps → Azure Services: HTTPS (TLS 1.2).
  • Docker Registry: HTTPS (TLS 1.2).
  • NuGet Feed: HTTPS (TLS 1.2).
  • Key Vault: HTTPS (TLS 1.2).

Access Controls:

RBAC enforced on pipelines and artifacts:

# Azure DevOps Permissions
Project: ATP
  Pipelines:
    - Developers: Queue builds, view logs
    - Platform Team: Edit pipelines, manage retention
    - SRE Team: Approve deployments, manage environments
    - Compliance Team: View logs (read-only)

  Artifacts:
    - Developers: Read (download packages)
    - Platform Team: Read/Write (publish packages)
    - Production Services: Read (Managed Identity)
    - External Users: No access

Audit Logging:

All pipeline executions logged to Azure Monitor:

- script: |
    # Emit audit event
    curl -X POST https://monitoring.connectsoft.com/api/audit \
      -H "Content-Type: application/json" \
      -d '{
        "eventType": "PipelineExecution",
        "userId": "$(Build.RequestedFor)",
        "resource": "$(Build.DefinitionName)",
        "action": "Build",
        "result": "$(Agent.JobStatus)",
        "timestamp": "'$(date -u +%Y-%m-%dT%H:%M:%SZ)'"
      }'
  displayName: 'Log Pipeline Execution'
  condition: always()
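The payload assembled by the step above can be sketched as a small helper; the field names mirror the curl body, while the function name and example values are illustrative:

```python
import json
from datetime import datetime, timezone

def build_audit_event(user_id: str, resource: str, action: str, result: str) -> str:
    """Assemble the audit payload posted by the pipeline step above."""
    event = {
        "eventType": "PipelineExecution",
        "userId": user_id,
        "resource": resource,
        "action": action,
        "result": result,
        # ISO-8601 UTC timestamp, matching `date -u +%Y-%m-%dT%H:%M:%SZ`
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    return json.dumps(event)

payload = build_audit_event(
    "jane.doe@connectsoft.com", "ATP-Ingestion-CI-CD", "Build", "Succeeded"
)
print(payload)
```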

HIPAA Evidence:

  • Access Control: RBAC policies and audit logs.
  • Encryption: TLS for transit, AES-256 for storage.
  • Audit Trail: Pipeline logs, approval records, artifact access logs.
  • Integrity: SBOM hashes, artifact checksums, immutable storage (WORM).

Compliance Certification:

ATP pipelines included in annual HIPAA Risk Assessment:

HIPAA Security Rule - Administrative Safeguards
§164.308(a)(5) - Security Awareness and Training

Control: Developers trained on secure pipeline practices (no hardcoded secrets,
PII handling, access control).

Evidence: Training records, security scan results (zero secrets detected in 100%
of builds), incident logs (zero security incidents related to pipelines).

Rationale: Pipeline compliance controls ensure ATP meets regulatory requirements without manual processes. Automated evidence collection reduces audit overhead and provides immutable proof of control effectiveness.


Troubleshooting & Debugging

Pipeline failures are inevitable in CI/CD systems — dependency changes, infrastructure issues, flaky tests, and configuration drift all contribute to build failures. ATP provides systematic troubleshooting approaches and debug tools to quickly identify root causes, minimize downtime, and restore pipeline health.

This section catalogs common failure scenarios with symptoms, root causes, and resolutions, along with debugging strategies (local reproduction, container logs, verbose logging) that enable developers and SREs to diagnose issues efficiently.

Common Pipeline Failures

The following table documents the five most frequent pipeline failure scenarios in ATP, representing 85%+ of all build failures. Each scenario includes symptom recognition, root cause analysis, and step-by-step resolution.

| Issue | Symptom | Root Cause | Resolution | Prevention |
|---|---|---|---|---|
| Restore Failure | NuGet package not found | Feed authentication failure or package unavailable | Verify feed authentication; check package availability | Use package lock files; monitor feed health |
| Test Timeout | Tests hang or exceed 10 min | Slow integration tests; database connection issues | Increase timeout in .runsettings; investigate slow tests | Parallelize tests; optimize test data |
| Coverage Below Threshold | Coverage gate fails | New code not tested; threshold too high | Add unit tests; exclude generated code from coverage | Enforce coverage in PR reviews |
| Docker Build Failure | Layer caching issues; base image unavailable | Docker cache corrupted; registry unavailable | Clear Docker cache; verify base image availability | Pin base image versions; monitor registry |
| Deployment Failure | Azure App Service error | Invalid configuration; Key Vault secrets unavailable | Check service logs; verify connection strings/secrets | Test deployments in dev first |

Restore Failure (NuGet Package Not Found)

Symptom:

Error: NU1101: Unable to find package ConnectSoft.ATP.Contracts. 
No packages exist with this id in source(s): 
  - https://api.nuget.org/v3/index.json
  - https://pkgs.dev.azure.com/ConnectSoft/_packaging/ConnectSoft/nuget/v3/index.json

Build FAILED.

Root Causes:

  1. Feed Authentication Failure: Azure Artifacts credential not injected.
  2. Package Not Published: ConnectSoft.ATP.Contracts not yet published to feed.
  3. Package Version Mismatch: Requesting version 1.0.42; only 1.0.41 available.
  4. Feed Permissions: Service connection lacks read permission on feed.
  5. Network Issues: Feed unreachable from pipeline agent.

Resolution Steps:

# 1. Verify feed authentication
- task: NuGetAuthenticate@1
  displayName: 'Authenticate to Azure Artifacts'
  inputs:
    nuGetServiceConnections: 'ConnectSoft-Feed'  # Verify service connection exists

# 2. Check package availability
- script: |
    # List packages in feed
    dotnet nuget list source

    # Search for package
    dotnet package search ConnectSoft.ATP.Contracts \
      --source https://pkgs.dev.azure.com/ConnectSoft/_packaging/ConnectSoft/nuget/v3/index.json
  displayName: 'Verify Package Availability'

# 3. Verify service connection permissions
# Navigate to: Project Settings → Service connections → ConnectSoft-Feed
# Ensure: "Read" permission granted for Azure Artifacts

# 4. Test feed connectivity
- script: |
    curl -v https://pkgs.dev.azure.com/ConnectSoft/_packaging/ConnectSoft/nuget/v3/index.json
  displayName: 'Test Feed Connectivity'

Prevention:

  • Package Lock Files: Use packages.lock.json to pin exact versions.
  • Feed Health Monitoring: Alert if feed unavailable.
  • Pre-Build Validation: Verify all packages available before restore.
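A pre-build validation can be sketched by parsing packages.lock.json and flagging any dependency without a pinned resolved version. The lock-file structure below matches the NuGet lock format; the function name is illustrative:

```python
import json

def unpinned_dependencies(lock_json: str) -> list[str]:
    """Return dependencies in a packages.lock.json document lacking a resolved version."""
    lock = json.loads(lock_json)
    missing = []
    for framework, deps in lock.get("dependencies", {}).items():
        for name, info in deps.items():
            if not info.get("resolved"):
                missing.append(f"{framework}/{name}")
    return missing

sample = """
{
  "version": 1,
  "dependencies": {
    "net8.0": {
      "MassTransit": { "type": "Direct", "requested": "[8.1.0, )", "resolved": "8.1.0" },
      "SomePackage": { "type": "Direct", "requested": "[1.0.0, )" }
    }
  }
}
"""
print(unpinned_dependencies(sample))  # ['net8.0/SomePackage']
```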

Quick Fix (Temporary):

# Fallback to public NuGet.org if private feed fails
- task: DotNetCoreCLI@2
  condition: failed()  # Only if previous restore failed
  inputs:
    command: 'restore'
    projects: '$(solution)'
    feedsToUse: 'select'
    includeNuGetOrg: true  # Include NuGet.org as fallback

Test Timeout

Symptom:

Test execution timed out after 600000 ms (10 minutes).

Test Run Canceled.
Tests: 842 passed, 0 failed, 392 not executed

Root Causes:

  1. Slow Integration Tests: Database queries, network calls, message bus operations.
  2. Service Container Startup: Elasticsearch takes 60+ seconds to become healthy.
  3. Deadlock: Tests waiting on each other (shared resource contention).
  4. Infinite Loop: Test logic error causing hang.
  5. Resource Exhaustion: Agent runs out of memory/CPU.

Resolution Steps:

# 1. Increase timeout in .runsettings
<?xml version="1.0" encoding="utf-8"?>
<RunSettings>
  <RunConfiguration>
    <TestSessionTimeout>1200000</TestSessionTimeout>  <!-- 20 minutes (was 10) -->
  </RunConfiguration>
</RunSettings>

# 2. Identify slow tests
- script: |
    # Run tests with detailed timing
    dotnet test --logger "console;verbosity=detailed" > test-output.log

    # Parse slowest tests
    grep "Test.*ms" test-output.log | sort -t: -k2 -nr | head -10
  displayName: 'Identify Slow Tests'

# 3. Parallelize test assemblies
- task: DotNetCoreCLI@2
  inputs:
    command: 'test'
    arguments: '--parallel --max-parallel-threads 4'
  displayName: 'Run Tests (Parallel)'

# 4. Category-based execution (skip slow tests in PR builds)
- task: DotNetCoreCLI@2
  condition: eq(variables['Build.Reason'], 'PullRequest')
  inputs:
    command: 'test'
    arguments: '--filter "Category!=Slow"'
  displayName: 'Run Fast Tests Only (PR)'

Prevention:

  • Tag Slow Tests: Mark with [Category("Slow")] attribute.
  • Optimize Test Data: Use minimal datasets (10 records instead of 10,000).
  • Shared Fixtures: Reuse expensive setup (database migrations, seeding data).
  • Test Isolation: Ensure tests don't depend on each other (parallel-safe).

Quick Fix (Identify Hanging Test):

- script: |
    # Run tests with timeout per test (not per assembly)
    dotnet test --blame-hang --blame-hang-timeout 60s

    # Generates sequence file showing last executed test before hang
    cat TestResults/Sequence_*.xml
  displayName: 'Identify Hanging Test'
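The grep one-liner in step 2 can be made more robust by parsing timings explicitly. A sketch assuming the VSTest console format `Passed TestName [N ms]` / `[N s N ms]`; the function name is illustrative:

```python
import re

def slowest_tests(log_text: str, top: int = 10) -> list[tuple[str, int]]:
    """Extract (test name, duration in ms) pairs from VSTest console output, slowest first."""
    # Matches lines such as: "Passed MyNamespace.MyTest [3 s 250 ms]" or "[456 ms]"
    pattern = re.compile(r"Passed\s+(\S+)\s+\[(?:(\d+)\s*s\s*)?(\d+)\s*ms\]")
    results = []
    for match in pattern.finditer(log_text):
        name, seconds, millis = match.groups()
        results.append((name, int(seconds or 0) * 1000 + int(millis)))
    return sorted(results, key=lambda r: r[1], reverse=True)[:top]

log = """
  Passed Tests.FastTest [12 ms]
  Passed Tests.SlowTest [3 s 250 ms]
  Passed Tests.MediumTest [980 ms]
"""
print(slowest_tests(log))
# [('Tests.SlowTest', 3250), ('Tests.MediumTest', 980), ('Tests.FastTest', 12)]
```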

Coverage Below Threshold

Symptom:

Code Coverage Gate Failed
  Current: 68.5%
  Required: 70.0%
  Gap: -1.5%

Pipeline FAILED at stage: CI_Stage, job: Build_Test_Publish, step: Enforce Coverage Threshold
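The gate's pass/fail decision behind that message is simple arithmetic; a sketch of the comparison (the function name is illustrative, and per-service thresholds come from quality-gates.md):

```python
def evaluate_coverage_gate(current: float, required: float) -> tuple[bool, float]:
    """Return (passed, gap), where gap is current minus required in percentage points."""
    gap = round(current - required, 1)
    return gap >= 0.0, gap

passed, gap = evaluate_coverage_gate(68.5, 70.0)
print(f"Passed: {passed}, Gap: {gap:+.1f}%")  # Passed: False, Gap: -1.5%
```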

Root Causes:

  1. New Code Not Tested: PR adds feature without unit tests.
  2. Threshold Too High: Service threshold increased without adding tests.
  3. Coverage Calculation Error: Generated code not excluded from coverage.
  4. Refactoring: Code restructured; old tests deleted, new tests not yet added.

Resolution Steps:

# 1. Identify uncovered code
- script: |
    # Generate detailed coverage report
    dotnet test --collect:"XPlat Code Coverage" --results-directory ./coverage

    # Convert to HTML report
    reportgenerator \
      -reports:./coverage/**/coverage.cobertura.xml \
      -targetdir:./coverage-report \
      -reporttypes:Html

    # Identify files with low coverage
    grep -A 5 "class.*0%" ./coverage-report/index.html
  displayName: 'Identify Uncovered Code'

# 2. Exclude generated code from coverage
<?xml version="1.0" encoding="utf-8"?>
<RunSettings>
  <DataCollectionRunSettings>
    <DataCollectors>
      <DataCollector friendlyName="XPlat Code Coverage">
        <Configuration>
          <Exclude>
            [*Tests]*,
            [*Migrations]*,
            [*.Designer]*,
            [*.Generated]*
          </Exclude>
        </Configuration>
      </DataCollector>
    </DataCollectors>
  </DataCollectionRunSettings>
</RunSettings>

# 3. Request exception (if justified)
# Create work item: "Request coverage exception for ServiceX"
# Justification: "Generated code accounts for 5% of codebase; cannot be tested"
# Approval: Platform team approves exception; reduce threshold to 65%

Prevention:

  • PR Coverage Check: Require coverage ≥threshold for new code (not just overall).
  • Test-First Development: Write tests before implementation (TDD).
  • Coverage Trends: Monitor coverage over time; alert on decreases.

Quick Fix (Temporary Exclusion):

// Exclude specific class from coverage (use sparingly)
[ExcludeFromCodeCoverage]
public class LegacyDataMigrator
{
    // One-time migration code; low test value
}

Docker Build Failure

Symptom:

Error response from daemon: pull access denied for mcr.microsoft.com/dotnet/aspnet,
repository does not exist or may require 'docker login'

Docker build FAILED at step: FROM mcr.microsoft.com/dotnet/aspnet:8.0-alpine

Root Causes:

  1. Base Image Unavailable: Registry unreachable or image deleted.
  2. Layer Caching Corruption: Docker cache contains corrupted layers.
  3. Dockerfile Syntax Error: COPY path incorrect (file not found).
  4. Build Context Issue: Build context doesn't include required files.
  5. Network Issues: Agent can't reach container registry.

Resolution Steps:

# 1. Verify base image exists
- script: |
    docker pull mcr.microsoft.com/dotnet/aspnet:8.0-alpine
  displayName: 'Verify Base Image'

# 2. Clear Docker cache
- script: |
    docker system prune --all --force
    docker builder prune --all --force
  displayName: 'Clear Docker Cache'

# 3. Build without cache
- task: Docker@2
  inputs:
    command: 'build'
    arguments: '--no-cache --pull'  # Force re-pull base image
    dockerfile: '$(dockerfilePath)'
  displayName: 'Build Docker Image (No Cache)'

# 4. Validate build context
- script: |
    # List files in build context
    ls -laR $(Build.SourcesDirectory)

    # Verify Dockerfile COPY paths exist
    test -f src/ConnectSoft.ATP.Ingestion/ConnectSoft.ATP.Ingestion.csproj || echo "ERROR: Project file not found"
  displayName: 'Validate Build Context'

# 5. Use alternative registry
# If MCR unreachable, pull from an internal pull-through mirror (example ACR path;
# note .NET images are published only to MCR, not Docker Hub)
FROM connectsoft.azurecr.io/mirror/dotnet/aspnet:8.0-alpine  # Alternative

Prevention:

  • Pin Base Image Versions: Use aspnet:8.0.1-alpine (not aspnet:8.0-alpine).
  • Local Registry Mirror: Configure Azure DevOps agents to use ACR as pull-through cache.
  • BuildKit Caching: Enable BuildKit for improved layer caching.
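Transient registry failures (the "base image unavailable" case) are often resolved by retrying with backoff rather than immediately clearing caches. A generic sketch — the helper and example are illustrative, not an ATP utility:

```python
import time

def retry(operation, attempts: int = 3, base_delay: float = 1.0):
    """Run operation(), retrying with exponential backoff on failure."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception as error:  # in practice, narrow to transient errors
            if attempt == attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)
            print(f"Attempt {attempt} failed ({error}); retrying in {delay:.2f}s")
            time.sleep(delay)

# Example: a flaky pull that succeeds on the third attempt
calls = {"count": 0}
def flaky_pull():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("registry unreachable")
    return "pulled mcr.microsoft.com/dotnet/aspnet:8.0-alpine"

print(retry(flaky_pull, base_delay=0.01))
```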

Quick Fix (Rebuild from Scratch):

- script: |
    # Nuclear option: Remove all Docker data and rebuild
    docker system prune --all --volumes --force
    docker pull mcr.microsoft.com/dotnet/sdk:8.0
    docker pull mcr.microsoft.com/dotnet/aspnet:8.0-alpine
  displayName: 'Reset Docker Environment'

Deployment Failure (Azure App Service)

Symptom:

Error: Failed to deploy Web package to App Service.
Error Code: ERROR_DESTINATION_NOT_REACHABLE
More Information: Could not connect to the remote computer ("atp-ingestion-staging.scm.azurewebsites.net")

Deployment FAILED.

Root Causes:

  1. App Service Down: Service stopped or in unhealthy state.
  2. Network Restrictions: NSG/firewall blocks pipeline agent IP.
  3. Invalid Secrets: Key Vault secrets not accessible (Managed Identity issue).
  4. Configuration Error: App settings reference non-existent Key Vault secret.
  5. Deployment Slot Issue: Staging slot doesn't exist.

Resolution Steps:

# 1. Verify App Service status
- task: AzureCLI@2
  displayName: 'Check App Service Status'
  inputs:
    azureSubscription: $(azureSubscription)
    scriptType: 'bash'
    scriptLocation: 'inlineScript'
    inlineScript: |
      # Get App Service state
      STATE=$(az webapp show --name atp-ingestion-staging --resource-group ConnectSoft-ATP-Staging-RG --query state -o tsv)

      if [ "$STATE" != "Running" ]; then
        echo "ERROR: App Service in state: $STATE (expected Running)"

        # Start App Service
        az webapp start --name atp-ingestion-staging --resource-group ConnectSoft-ATP-Staging-RG
        echo "App Service started; retry deployment"
      fi

# 2. Verify Key Vault access
- script: |
    # Test Managed Identity can access Key Vault
    az keyvault secret list --vault-name atp-kv-staging
  displayName: 'Verify Key Vault Access'

# 3. Check deployment slot exists
- task: AzureCLI@2
  displayName: 'Verify Deployment Slot'
  inputs:
    azureSubscription: $(azureSubscription)
    scriptType: 'bash'
    scriptLocation: 'inlineScript'
    inlineScript: |
      # List deployment slots
      SLOTS=$(az webapp deployment slot list \
        --name atp-ingestion-staging \
        --resource-group ConnectSoft-ATP-Staging-RG \
        --query "[].name" -o tsv)

      if ! echo "$SLOTS" | grep -q "staging"; then
        echo "ERROR: Staging slot not found; creating..."
        az webapp deployment slot create \
          --name atp-ingestion-staging \
          --resource-group ConnectSoft-ATP-Staging-RG \
          --slot staging
      fi

# 4. Review App Service logs
- task: AzureCLI@2
  condition: failed()
  displayName: 'Download App Service Logs'
  inputs:
    azureSubscription: $(azureSubscription)
    scriptType: 'bash'
    scriptLocation: 'inlineScript'
    inlineScript: |
      # Download last 100 log entries
      az webapp log download \
        --name atp-ingestion-staging \
        --resource-group ConnectSoft-ATP-Staging-RG \
        --log-file app-service-logs.zip

      # Inspect logs
      unzip -p app-service-logs.zip LogFiles/Application/eventlog.xml | tail -100

Prevention:

  • Pre-Deployment Health Check: Verify App Service running before deployment.
  • Managed Identity Testing: Validate Key Vault access in staging before production.
  • Configuration Validation: Test app settings in dev environment first.

Quick Fix (Restart App Service):

- task: AzureAppServiceManage@0
  displayName: 'Restart App Service'
  inputs:
    azureSubscription: $(azureSubscription)
    action: 'Restart Azure App Service'
    webAppName: atp-ingestion-staging

Debug Strategies

When standard troubleshooting doesn't identify the root cause, ATP provides advanced debugging techniques for deeper investigation.

Local Reproduction

Purpose: Reproduce pipeline failures on local development machine for interactive debugging.

Strategy:

  1. Clone Repository: git clone https://dev.azure.com/dmitrykhaymov/ATP/_git/Ingestion
  2. Checkout Commit: git checkout <commit-sha> (same commit that failed in pipeline)
  3. Install Dependencies: dotnet restore (use same feed as pipeline)
  4. Run Tests Locally: dotnet test --settings ConnectSoft.ATP.Ingestion.runsettings

Local Test Execution:

# Use same .runsettings as pipeline
dotnet test \
  --settings ConnectSoft.ATP.Ingestion.runsettings \
  --configuration Release \
  --logger "console;verbosity=detailed"

# Output shows exact test failure with stack trace
# Can attach debugger (Visual Studio, Rider) for step-through debugging

Service Container Setup (Docker Compose):

Replicate pipeline service containers locally:

# docker-compose.test.yml
version: '3.8'
services:
  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"

  rabbitmq:
    image: rabbitmq:3-management-alpine
    ports:
      - "5672:5672"
      - "15672:15672"

  postgres:
    image: postgres:16-alpine
    environment:
      POSTGRES_PASSWORD: postgres
      POSTGRES_DB: ATPTest
    ports:
      - "5432:5432"

  otel-collector:
    image: otel/opentelemetry-collector:0.97.0
    ports:
      - "4317:4317"
      - "8888:8888"

Run Tests with Containers:

# Start containers
docker-compose -f docker-compose.test.yml up -d

# Wait for containers to be healthy
sleep 10

# Run tests (connect to localhost:6379, localhost:5432, etc.)
dotnet test --settings ConnectSoft.ATP.Ingestion.runsettings

# Stop containers
docker-compose -f docker-compose.test.yml down --volumes

Benefits:

  • Interactive Debugging: Set breakpoints, inspect variables, step through code.
  • Fast Iteration: Fix issue, rerun test immediately (no pipeline queue time).
  • Root Cause Analysis: Detailed stack traces and error messages.

Limitations:

  • Environment Differences: Local machine != pipeline agent (different OS, software versions).
  • Timing Issues: Race conditions may not reproduce locally.
  • Network Dependencies: External APIs behave differently locally.

Service Container Logs

Purpose: Capture logs from service containers (Redis, SQL, RabbitMQ) to diagnose integration test failures.

Strategy:

# Capture container logs before pipeline completes
- script: |
    echo "=== Redis Logs ==="
    docker logs $(docker ps -q --filter "ancestor=redis:7-alpine")

    echo "=== RabbitMQ Logs ==="
    docker logs $(docker ps -q --filter "ancestor=rabbitmq:3-management-alpine")

    echo "=== Elasticsearch Logs ==="
    docker logs $(docker ps -q --filter "ancestor=docker.elastic.co/elasticsearch/elasticsearch:8.11.0")
  displayName: 'Capture Service Container Logs'
  condition: always()  # Run even if tests failed

Redis Log Analysis:

1:C 30 Oct 2025 14:05:12.345 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 30 Oct 2025 14:05:12.345 # Redis version=7.2.3, bits=64, pid=1, just started
1:M 30 Oct 2025 14:05:12.456 * Ready to accept connections

⚠️ WARNING: 1234 connections opened in last 60 seconds (possible connection leak)

RabbitMQ Log Analysis:

2025-10-30 14:05:15.123 [info] <0.234.0> accepting AMQP connection <0.234.0> (172.17.0.1:49152 -> 172.17.0.2:5672)
2025-10-30 14:05:15.234 [info] <0.234.0> connection <0.234.0> (172.17.0.1:49152 -> 172.17.0.2:5672): user 'guest' authenticated and granted access to vhost '/'

❌ ERROR: Queue 'audit.events.inbox' declared with conflicting arguments

SQL Server Log Analysis:

# For SQL Server container (2022 images ship mssql-tools18; -C trusts the self-signed cert)
docker exec $(docker ps -q --filter "ancestor=mcr.microsoft.com/mssql/server:2022-latest") \
  /opt/mssql-tools18/bin/sqlcmd -S localhost -U sa -P 'P@ssw0rd123!' -C -Q "SELECT TOP 100 * FROM sys.dm_exec_requests WHERE status = 'suspended'"

# Shows deadlocks, blocked queries, long-running transactions

Container Health Check:

- script: |
    # Verify all containers healthy
    for container in redis rabbitmq postgres otel; do
      HEALTH=$(docker inspect --format='{{.State.Health.Status}}' $container)
      if [ "$HEALTH" != "healthy" ]; then
        echo "⚠️ Container $container unhealthy: $HEALTH"
        docker logs $container --tail 50
      fi
    done
  displayName: 'Check Container Health'

Benefits:

  • Diagnose Integration Failures: See why Redis connection failed, SQL query deadlocked, or RabbitMQ message not delivered.
  • Performance Issues: Identify slow queries, connection leaks, resource exhaustion.

Pipeline Artifacts

Purpose: Download and inspect build artifacts to verify contents (missing files, incorrect versions, corrupted binaries).

Strategy:

# Download artifacts locally for inspection
az pipelines runs artifact download \
  --artifact-name atp-ingestion-drop \
  --run-id 12345 \
  --path ./artifacts

# Inspect artifact contents
cd artifacts
ls -laR

# Verify expected files present
test -f ConnectSoft.ATP.Ingestion.dll || echo "ERROR: Main assembly missing"
test -f appsettings.json || echo "ERROR: Configuration missing"
test -f MassTransit.dll || echo "ERROR: Dependency missing"

# Check assembly version (dotnet --info reports SDK info, not assembly metadata)
pwsh -Command "[System.Reflection.AssemblyName]::GetAssemblyName('./ConnectSoft.ATP.Ingestion.dll').Version"

Artifact Verification (In Pipeline):

- script: |
    # List artifact contents
    find $(Build.ArtifactStagingDirectory) -type f -exec ls -lh {} \;

    # Verify critical files
    REQUIRED_FILES=(
      "ConnectSoft.ATP.Ingestion.dll"
      "appsettings.json"
      "web.config"
    )

    for file in "${REQUIRED_FILES[@]}"; do
      if [ ! -f "$(Build.ArtifactStagingDirectory)/$file" ]; then
        echo "##vso[task.logissue type=error]Missing required file: $file"
        exit 1
      fi
    done
  displayName: 'Verify Artifact Contents'

Docker Image Inspection:

# Inspect Docker image layers
docker history connectsoft.azurecr.io/atp-ingestion:1.0.42

# Inspect image filesystem
docker run --rm connectsoft.azurecr.io/atp-ingestion:1.0.42 ls -la /app

# Verify entry point
docker inspect --format='{{.Config.Entrypoint}}' connectsoft.azurecr.io/atp-ingestion:1.0.42

Benefits:

  • Validate Artifact Completeness: Ensure all files packaged correctly.
  • Debug Deployment Issues: Verify artifact contents match expectations.
  • Compare Versions: Diff artifacts from successful vs. failed builds.

Verbose Logging

Purpose: Enable detailed pipeline logging for troubleshooting intermittent or hard-to-diagnose failures.

Strategy: Set system.debug variable to true (enables verbose output from all tasks).

Enable Verbose Logging:

# Option 1: Set variable in pipeline YAML
variables:
  system.debug: true  # Enable for specific pipeline run

# Option 2: Set at queue time (Azure DevOps UI)
# When queueing build: Variables → Add variable
# Name: system.debug
# Value: true

# Option 3: Set via Azure CLI
az pipelines run --id 12345 --variables system.debug=true

Verbose Output (Example):

##[debug]Evaluating condition for step: 'dotnet restore'
##[debug]Evaluating: succeeded()
##[debug]Expanded: True
##[debug]Result: True
##[debug]Starting: dotnet restore
##[debug]Loading inputs
##[debug]Loading env
##[debug]Evaluating: parameters.restoreVstsFeed
##[debug]Expanded: e4c108b4-7989-4d22-93d6-391b77a39552
##[debug]##[command]"/usr/bin/dotnet" restore "**/*.slnx" --source https://api.nuget.org/v3/index.json
##[debug]Resolved feed: https://pkgs.dev.azure.com/ConnectSoft/_packaging/ConnectSoft/nuget/v3/index.json
##[debug]Authenticating to feed using service connection: ConnectSoft-Feed
##[debug]NuGet.config generated at: /home/vsts/work/_temp/NuGet.config
##[debug]Restoring packages for ConnectSoft.ATP.Ingestion.slnx
##[debug]Downloaded package: MassTransit.8.1.0 (512 KB)
##[debug]Restore completed in 12.3 seconds

Verbose Logging Use Cases:

  • Authentication Issues: See exact authentication flow (service connection, tokens).
  • Variable Expansion: See how variables resolve (e.g., $(solution) expands to **/*.slnx).
  • Conditional Logic: See which conditions evaluate true/false.
  • Task Inputs: See exact parameters passed to tasks.

Performance Impact:

Verbose logging increases log volume 5-10x (slower log upload):

Normal logging: 2 MB pipeline logs
Verbose logging: 15 MB pipeline logs (7.5x increase)

Best Practice: Enable verbose logging only for troubleshooting (disable after issue resolved).

Selective Verbose Logging (Specific Task):

- task: DotNetCoreCLI@2
  displayName: 'dotnet restore (verbose)'
  inputs:
    command: 'restore'
    verbosity: 'Diagnostic'  # Task-specific verbose mode

Advanced Debugging Techniques

Beyond standard debug strategies, ATP provides specialized techniques for complex failure scenarios.

Pipeline Timeline Analysis

Purpose: Visualize pipeline execution to identify bottlenecks and failures.

Azure DevOps Timeline View:

Navigate to: Pipeline Run → Summary → Timeline

Timeline Shows:

  • Jobs: Parallel jobs (e.g., Build_Backend, Build_Frontend).
  • Steps: Sequential steps within jobs (Lint → Build → Test).
  • Duration: Each step's duration (identify slow steps).
  • Dependencies: Job dependencies (which jobs waiting on which).
  • Failures: Failed steps highlighted in red.

Example Analysis:

Timeline: ATP-Ingestion-CI-CD #142

Job: Build_Test_Publish (8m 23s)
  ├─ Checkout (12s)
  ├─ Lint (1m 45s)
  ├─ Build (2m 10s)
  ├─ Test (3m 48s) ⚠️ Bottleneck
  │   ├─ Start containers (35s)
  │   ├─ dotnet test (3m 12s) ⚠️ Slow
  │   └─ Publish coverage (1s)
  ├─ Docker Build (1m 30s)
  └─ Publish (42s)

Insight: Test step consumes 45% of total pipeline time; optimize tests.
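The 45% figure follows directly from the timeline durations; a sketch of the arithmetic (the function name is illustrative):

```python
def step_share(step_seconds: int, job_seconds: int) -> float:
    """Percentage of the job's wall-clock time spent in one step."""
    return round(100 * step_seconds / job_seconds, 1)

job_total = 8 * 60 + 23   # Build_Test_Publish: 8m 23s
test_step = 3 * 60 + 48   # Test: 3m 48s
print(step_share(test_step, job_total))  # 45.3
```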

Remote Debugging (Failed Deployment)

Purpose: Debug failed deployments by inspecting deployed App Service.

Strategy:

# Enable remote debugging in App Service (temporarily)
- task: AzureCLI@2
  condition: failed()
  displayName: 'Enable Remote Debugging'
  inputs:
    azureSubscription: $(azureSubscription)
    scriptType: 'bash'
    scriptLocation: 'inlineScript'
    inlineScript: |
      # Enable remote debugging
      az webapp config set \
        --name atp-ingestion-staging \
        --resource-group ConnectSoft-ATP-Staging-RG \
        --remote-debugging-enabled true

      # Get debugging URL
      echo "Remote debugging enabled: https://atp-ingestion-staging.scm.azurewebsites.net/DebugConsole"

Kudu Console Access:

Navigate to: https://<appname>.scm.azurewebsites.net/DebugConsole

Debug Actions:

  • File Explorer: Browse deployed files; verify binaries present.
  • Process Explorer: View running processes; check memory/CPU usage.
  • Environment Variables: Inspect app settings; verify Key Vault references resolved.
  • Log Stream: Real-time application logs.

Kudu API (Programmatic Access):

# List deployed files
curl https://atp-ingestion-staging.scm.azurewebsites.net/api/vfs/site/wwwroot/ \
  --user '$atp-ingestion-staging:<deployment-password>'

# Download specific file
curl https://atp-ingestion-staging.scm.azurewebsites.net/api/vfs/site/wwwroot/appsettings.json \
  --user '$atp-ingestion-staging:<deployment-password>' \
  -o appsettings.json

# Check app settings
curl https://atp-ingestion-staging.scm.azurewebsites.net/api/settings \
  --user '$atp-ingestion-staging:<deployment-password>'

Parallel Job Debugging

Purpose: Debug failures in parallel jobs (e.g., Build_Backend and Build_Frontend running simultaneously).

Strategy:

Azure DevOps provides per-job logs; download separately:

# Open the run in the browser to locate the failed job
az pipelines runs show --id 12345 --open

# Navigate to failed job
# Click "View raw log" → Download log file

# Analyze job-specific logs
grep "ERROR" job-build-frontend.log

Job Failure Correlation:

# In pipeline: Fail entire stage if any job fails
- stage: CI_Stage
  jobs:
  - job: Build_Backend
    steps: [...]

  - job: Build_Frontend
    steps: [...]

  # Both jobs must succeed for stage to succeed
  # If either fails, stage fails (downstream stages blocked)

Artifact Diff Analysis

Purpose: Compare artifacts from successful build vs. failed deployment to identify differences.

Strategy:

# Download successful build artifact (build #141)
az pipelines runs artifact download --run-id 141 --artifact-name atp-ingestion-drop --path ./successful

# Download failed build artifact (build #142)
az pipelines runs artifact download --run-id 142 --artifact-name atp-ingestion-drop --path ./failed

# Diff artifacts
diff -r ./successful ./failed

# Output shows added/removed/modified files
# Example: "Only in failed: MassTransit.Newtonsoft.dll" (unexpected dependency)

Binary Diff (Assembly Comparison):

# Compare assembly versions (PowerShell reflection; dotnet --info does not inspect DLLs)
pwsh -Command "[System.Reflection.AssemblyName]::GetAssemblyName('./successful/ConnectSoft.ATP.Ingestion.dll').Version" > successful-info.txt
pwsh -Command "[System.Reflection.AssemblyName]::GetAssemblyName('./failed/ConnectSoft.ATP.Ingestion.dll').Version" > failed-info.txt
diff successful-info.txt failed-info.txt

# Compare file hashes
sha256sum ./successful/*.dll > successful-hashes.txt
sha256sum ./failed/*.dll > failed-hashes.txt
diff successful-hashes.txt failed-hashes.txt

Benefits:

  • Identify Unexpected Changes: Find files added/removed between builds.
  • Dependency Drift: Detect dependency version changes.
  • Configuration Changes: Compare appsettings.json between builds.

Diagnostic Tools & Extensions

Azure DevOps Extensions (Marketplace):

  • Pipeline Analyzer: Identifies optimization opportunities (caching, parallelization).
  • Build Failure Annotator: Automatically categorizes failure reasons.
  • Test Impact Analysis: Runs only tests affected by code changes.

CLI Tools:

# Azure DevOps CLI (debugging pipelines)
az extension add --name azure-devops

# List recent failed builds
az pipelines runs list --status failed --top 10

# Show specific build details
az pipelines runs show --id 12345

# Save run details as JSON for offline analysis
az pipelines runs show --id 12345 > build-details.json

See Also: Runbook for incident response, observability dashboards, and on-call procedures in operations/runbook.md.


Continuous Improvement & Roadmap

ATP pipelines are continuously evolving to improve developer experience, increase deployment frequency, enhance security, and reduce toil. The Platform Engineering team maintains a quarterly roadmap aligned with industry best practices (DORA metrics, DevOps Research), emerging technologies (AI-assisted testing, policy-as-code), and organizational goals (faster time-to-market, compliance automation).

This section documents the current state of ATP pipelines (baseline metrics, capabilities, constraints), the improvement roadmap (quarterly initiatives with measurable outcomes), and innovation areas (experimental technologies and forward-looking investments).

Current State

Assessment Date: Q4 2024
Scope: All ATP microservice pipelines (7 services)

Pipeline Maturity

Deployment Automation:

  • Dev Environment: Fully automated (100% of commits deployed within 15 minutes).
  • Test Environment: Fully automated (100% of master branch commits deployed).
  • Staging Environment: Semi-automated (manual approval required; ~80% approval rate within 4 hours).
  • Production Environment: Manual deployment (2 approvals + CAB review for major changes; ~2 deployments per month).

Current Limitations:

  • Staging Approvals: Avg 4-hour wait time (bottleneck in lead time).
  • Production Frequency: Bi-weekly deployments (target: daily for Elite DORA performance).
  • Rollback: Manual process (5-10 minutes; target: automated < 1 minute).

Code Coverage

Average Coverage Across ATP Services: 70.2%

Service Breakdown:

| Service | Line Coverage | Branch Coverage | Target (Line / Branch) | Gap |
|-----------|---------------|-----------------|------------------------|-----|
| Ingestion | 76.3% | 63.2% | 75% / 60% | ✅ Met |
| Query | 81.2% | 68.5% | 80% / 60% | ✅ Met |
| Gateway | 67.1% | 58.9% | 65% / 60% | ✅ Met |
| Integrity | 86.4% | 72.3% | 85% / 60% | ✅ Met |
| Export | 71.8% | 61.2% | 70% / 60% | ✅ Met |
| Policy | 64.2% | 56.7% | 75% / 60% | ❌ -10.8% (Priority: Add tests) |
| Search | 72.5% | 64.1% | 70% / 60% | ✅ Met |

Coverage Trends (Last Quarter):

  • Overall Trend: +2.2% increase (68.0% → 70.2%) — positive momentum.
  • Policy Service: Declined 4% (requires attention; new features added without tests).
  • Integrity Service: Increased 6% (comprehensive crypto testing initiative).

Action Items:

  • Policy Service: Sprint dedicated to increasing coverage (add 50+ unit tests; target 75% by Q1 2025).
  • Maintain Momentum: Coverage requirements enforced in PR reviews; no regressions allowed.

Pipeline Performance

CI Stage Duration:

| Service | P50 (Median) | P95 | Target | Status |
|-----------|--------------|-----|--------|--------|
| Ingestion | 7m 45s | 9m 12s | <10m | ✅ Good |
| Query | 8m 32s | 10m 48s | <10m | ⚠️ P95 exceeds target |
| Gateway | 6m 18s | 7m 54s | <10m | ✅ Excellent |
| Integrity | 5m 23s | 6m 45s | <10m | ✅ Excellent |
| Export | 7m 01s | 8m 23s | <10m | ✅ Good |
| Policy | 4m 56s | 5m 48s | <10m | ✅ Excellent |
| Search | 9m 12s | 11m 34s | <10m | ❌ Exceeds target |

Full Pipeline Duration (CI + CD_Dev):

  • Average: 18m 42s (CI: 8m, CD_Dev: 10m 42s).
  • Target: <20m (currently meeting target).
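Percentile figures like the P50/P95 columns above can be derived from raw run durations with the standard library. A sketch (the duration sample is illustrative):

```python
from statistics import quantiles

def pipeline_percentiles(durations_s: list[float]) -> tuple[float, float]:
    """Return (P50, P95) of pipeline durations in seconds."""
    # quantiles with n=100 yields the 1st..99th percentile cut points;
    # the "inclusive" method interpolates between observed values.
    cuts = quantiles(durations_s, n=100, method="inclusive")
    return cuts[49], cuts[94]

# Example: flag a service whose P95 exceeds the 10-minute CI target
durations = [463.0, 512.0, 475.0, 490.0, 648.0, 455.0, 502.0, 530.0, 488.0, 471.0]
p50, p95 = pipeline_percentiles(durations)
print(f"P50={p50:.0f}s P95={p95:.0f}s over_target={p95 > 600}")
```

Feeding this from the Azure DevOps runs API gives the same P50/P95 trend tracking shown in the table.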

Bottlenecks Identified:

  1. Query Service (P95: 10m 48s):

    • Root Cause: Elasticsearch container startup (60s) + complex integration tests (4m).
    • Resolution: Pre-warm Elasticsearch indices; optimize query test data.
  2. Search Service (P95: 11m 34s):

    • Root Cause: Large Elasticsearch index creation (3m) + full-text search tests (5m).
    • Resolution: Use smaller test index; parallelize test assemblies.

Optimization Initiatives (Q1 2025):

  • NuGet Package Caching: Implement Azure Pipelines cache task (expected 20% faster restores).
  • Docker BuildKit: Enable BuildKit caching (expected 30% faster Docker builds).
  • Test Parallelization: Split test assemblies across multiple agents (expected 40% faster test stage).
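The expected effect of these initiatives on total CI duration is simple arithmetic over per-stage savings. A sketch; the stage split of the 8-minute CI stage below is an assumption for illustration, not a measurement:

```python
def projected_duration(stages_s: dict[str, float], savings: dict[str, float]) -> float:
    """Apply per-stage fractional savings and return the projected total (seconds)."""
    return sum(dur * (1.0 - savings.get(stage, 0.0)) for stage, dur in stages_s.items())

# Illustrative split of an 8-minute (480s) CI stage
stages = {"restore": 90.0, "docker_build": 150.0, "tests": 180.0, "other": 60.0}
# Q1 2025 initiative estimates: 20% faster restores, 30% faster builds, 40% faster tests
savings = {"restore": 0.20, "docker_build": 0.30, "tests": 0.40}
print(f"{sum(stages.values()):.0f}s -> {projected_duration(stages, savings):.0f}s")
```

Under these assumptions the CI stage drops from 480s to 345s, roughly a 28% overall reduction.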

Success Rate

Overall Success Rate (Last 30 Days): 94.2%

| Service | Success Rate | Failed Builds | Primary Failure Reason |
|-----------|--------------|---------------|------------------------|
| Ingestion | 96.7% | 2/60 | Flaky Redis connection test |
| Query | 91.2% | 5/57 | Elasticsearch timeout |
| Gateway | 97.3% | 1/37 | Frontend build (npm install failure) |
| Integrity | 98.1% | 1/52 | Coverage dropped below 85% |
| Export | 93.5% | 4/62 | Service container unavailable |
| Policy | 89.8% | 6/59 | Test failures (new features) |
| Search | 95.4% | 3/65 | Elasticsearch index conflict |

Target: ≥95% success rate for all services.
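Checking services against that target is a straightforward computation over the failed/total counts in the table above (a sketch; the function shape is illustrative):

```python
def below_target(runs: dict[str, tuple[int, int]], target: float = 95.0) -> list[str]:
    """Return services whose success rate (percent) falls below the target.

    `runs` maps service name -> (failed_builds, total_builds).
    """
    offenders = []
    for service, (failed, total) in runs.items():
        rate = 100.0 * (total - failed) / total
        if rate < target:
            offenders.append(f"{service}: {rate:.1f}%")
    return sorted(offenders)

# Counts from the table above (last 30 days)
runs = {"Ingestion": (2, 60), "Query": (5, 57), "Gateway": (1, 37),
        "Integrity": (1, 52), "Export": (4, 62), "Policy": (6, 59),
        "Search": (3, 65)}
print(below_target(runs))
```

For the counts above this flags Export, Policy, and Query, matching the action items below.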

Action Items:

  • Policy Service (89.8%): Fix flaky tests; stabilize new feature tests.
  • Query Service (91.2%): Increase Elasticsearch startup timeout; optimize slow tests.
  • Export Service (93.5%): Improve service container health checks.

Improvement Roadmap

ATP's pipeline improvement roadmap focuses on automation, security, developer experience, and intelligence — progressing toward Elite DORA performance while maintaining compliance and reliability.

Q1 2025: Fully Automated Canary Deployments

Goal: Eliminate manual approvals for production deployments; enable automatic canary rollouts based on metrics.

Initiatives:

  1. Automated Production Deployments (Week 1-4):

    • Current: Manual approval required (2 approvers); deployments blocked until approval (avg 24-hour delay).
    • Target: Automated canary deployment triggered on staging success; metrics-based progression.
    • Implementation:
      - stage: CD_Production
        dependsOn: CD_Staging
        condition: |
          and(
            succeeded(),
            eq(variables['Build.SourceBranch'], 'refs/heads/master'),
            eq(variables['AutoDeployProduction'], 'true')  # Feature flag
          )
        jobs:
        - deployment: CanaryProduction
          environment: ATP-Production
          strategy:
            canary:
              increments: [5, 20, 50]  # More conservative (5% → 20% → 50% → 100%)
              preDeploy:
                steps:
                - script: verify_staging_stable.sh --hours 48  # Staging stable for 48h
      
    • Success Criteria: 10 successful automated deployments; zero rollbacks; avg lead time reduced to 2 days.
  2. Enhanced Metrics Monitoring (Week 5-8):

    • Current: Manual metric review during deployments.
    • Target: Automated metrics validation with ML-based anomaly detection.
    • Implementation:
      # metrics_validator.py - ML-based anomaly detection (sketch)
      # The real metrics client lives in the azure-monitor-query package.
      from azure.monitor.query import MetricsQueryClient
      from sklearn.ensemble import IsolationForest
      
      def detect_anomalies(service_name, deployment_version):
          # Fetch metrics (error rate, latency, throughput) for the last 30 minutes;
          # fetch_metrics and baseline_metrics are project helpers, elided here
          metrics = fetch_metrics(service_name, minutes=30)
      
          # Train isolation forest on the pre-deployment baseline
          model = IsolationForest(contamination=0.1)
          model.fit(baseline_metrics)
      
          # Predict anomalies (-1 = outlier)
          predictions = model.predict(metrics)
          anomalies = [m for m, p in zip(metrics, predictions) if p == -1]
      
          if anomalies:
              return {"status": "ANOMALY_DETECTED", "rollback": True}
          return {"status": "HEALTHY", "rollback": False}
      
    • Success Criteria: 95% anomaly detection accuracy; <1% false positive rate.
  3. Automated Rollback (Week 9-12):

    • Current: SRE manually aborts deployment and swaps slots.
    • Target: Automated rollback triggered by metrics degradation.
    • Implementation:
      postRouteTraffic:
        steps:
        - script: |
            python metrics_validator.py \
              --service atp-ingestion \
              --version $(Build.BuildNumber) \
              --threshold error_rate=1%,latency_p95=1000ms
      
            if [ $? -ne 0 ]; then
              echo "Metrics degraded; triggering automated rollback"
              exit 1  # Triggers on:failure block
            fi
      
        on:
          failure:
            steps:
            - script: rollback_deployment.sh --environment production
            - script: notify_oncall.sh --severity critical
      
    • Success Criteria: Automated rollback completes in <1 minute; on-call team notified.
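The `--threshold error_rate=1%,latency_p95=1000ms` flag passed to `metrics_validator.py` above implies a small parsing and evaluation step. A sketch of that logic, assuming the flag format shown in the snippet:

```python
import re

def parse_thresholds(spec: str) -> dict[str, float]:
    """Parse 'error_rate=1%,latency_p95=1000ms' into {'error_rate': 1.0, ...}.

    Units (%/ms) are informational only; the numeric limit is what's kept.
    """
    thresholds = {}
    for pair in spec.split(","):
        name, raw = pair.split("=")
        thresholds[name.strip()] = float(re.sub(r"[^\d.]", "", raw))
    return thresholds

def should_rollback(observed: dict[str, float], spec: str) -> bool:
    """True if any observed metric exceeds its configured limit."""
    limits = parse_thresholds(spec)
    return any(observed.get(name, 0.0) > limit for name, limit in limits.items())
```

A non-zero exit from the validator (i.e. `should_rollback` returning true) is what drives the `on: failure` rollback block above.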

Expected Outcomes:

  • Deployment Frequency: Increase from 2/month to 8/month (weekly).
  • Lead Time: Reduce from 5.2 days to 2.5 days (manual approvals eliminated).
  • Change Failure Rate: Maintain <10% (automated rollback prevents prolonged outages).

Q2 2025: Shift-Left Security

Goal: Detect security issues before code commit using pre-commit hooks, IDE analyzers, and developer training.

Initiatives:

  1. Pre-Commit Hooks (Week 1-3):

    • Current: Security scans run in CI pipeline (10-minute feedback loop).
    • Target: Security scans run locally before commit (instant feedback).
    • Implementation:
      #!/bin/bash
      # .git/hooks/pre-commit (installed via Husky or Git hooks)
      
      echo "Running pre-commit security checks..."
      
      # Secrets detection: detect-secrets-hook exits non-zero when staged files
      # contain secrets not in the audited baseline
      detect-secrets-hook --baseline .secrets.baseline $(git diff --cached --name-only)
      if [ $? -ne 0 ]; then
        echo "❌ Secrets detected! Commit blocked."
        exit 1
      fi
      
      # SAST via the Security Code Scan standalone runner (exact flags vary by version)
      security-scan ConnectSoft.ATP.Ingestion.slnx
      if [ $? -ne 0 ]; then
        echo "❌ Security vulnerabilities detected! Commit blocked."
        exit 1
      fi
      
      echo "✅ Security checks passed"
      
    • Success Criteria: 100% of developers use pre-commit hooks; 50% reduction in CI security failures.
  2. IDE Analyzers (Week 4-6):

    • Current: Developers see security issues only in CI pipeline (delayed feedback).
    • Target: Real-time security feedback in IDE (Visual Studio, Rider).
    • Implementation:
      <!-- Directory.Build.props (applies to all projects) -->
      <PropertyGroup>
        <AnalysisLevel>latest</AnalysisLevel>
        <EnforceCodeStyleInBuild>true</EnforceCodeStyleInBuild>
        <EnableNETAnalyzers>true</EnableNETAnalyzers>
      </PropertyGroup>
      
      <ItemGroup>
        <PackageReference Include="SonarAnalyzer.CSharp" Version="9.12.0.78982">
          <PrivateAssets>all</PrivateAssets>
        </PackageReference>
        <PackageReference Include="SecurityCodeScan.VS2019" Version="5.6.7">
          <PrivateAssets>all</PrivateAssets>
        </PackageReference>
      </ItemGroup>
      
    • Success Criteria: Developers fix 80% of security issues before commit.
  3. Developer Security Training (Week 7-12):

    • Current: Ad-hoc security awareness.
    • Target: Quarterly security training with certification.
    • Topics: Secure coding (OWASP Top 10), secrets management, dependency security, container security.
    • Success Criteria: 100% developer completion; zero hardcoded secrets in Q2 builds.

Expected Outcomes:

  • Security Scan Failures: Reduce by 70% (caught pre-commit).
  • Developer Experience: Faster feedback (seconds vs. minutes).
  • Security Culture: Proactive security mindset.

Q3 2025: Self-Service Rollback

Goal: Enable service teams to rollback deployments without SRE intervention.

Initiatives:

  1. Rollback UI (Week 1-6):

    • Current: SRE manually swaps deployment slots or redeploys previous version.
    • Target: Developers click "Rollback" button in Azure DevOps UI.
    • Implementation:
      # Custom Azure DevOps extension: Rollback button
      # Adds "Rollback to Previous Version" action to deployment view
      
      - task: AzureWebApp@1
        displayName: 'Rollback to Previous Version'
        inputs:
          azureSubscription: $(azureSubscription)
          appName: atp-ingestion-prod
          package: '$(previousVersionArtifact)'  # Fetched from history
      
    • Safety Controls:
      • Rollback limited to last 3 deployments (prevents rolling back to very old versions).
      • Approval required for production rollbacks (1 approver from SRE team).
      • Rollback recorded in audit log (who, when, why).
    • Success Criteria: 10 successful self-service rollbacks; avg rollback time <3 minutes.
  2. Deployment History UI (Week 7-9):

    • Current: Deployment history in Azure DevOps (basic view).
    • Target: Rich deployment history with diff, metrics, and one-click rollback.
    • Features:
      • Version Comparison: Diff between current and previous versions (code changes, dependencies).
      • Metrics Overlay: Error rate, latency trends overlaid on deployment timeline.
      • Rollback Simulation: Preview rollback impact before executing.
    • Success Criteria: Service teams use UI for 90% of rollbacks (SRE escalations reduced).
  3. Automated Rollback Policies (Week 10-12):

    • Current: Hardcoded rollback thresholds (error rate >1%, latency >2x baseline).
    • Target: Configurable rollback policies per service (JSON/YAML configuration).
    • Implementation:
      # rollback-policy.yaml
      service: atp-ingestion
      environment: production
      rollback:
        automatic: true
        thresholds:
          errorRate: 1.0  # Percent
          latencyP95: 1000  # Milliseconds
          healthCheckFailures: 10  # Percent of instances
          customMetric: "queue_depth > 10000"  # Prometheus query
        cooldown: 300  # Seconds between rollbacks (prevent flapping)
        notifications:
          - slack: "#atp-alerts"
          - pagerduty: "ATP-Production-Oncall"
      
    • Success Criteria: Service teams configure custom thresholds; automated rollbacks effective.

Expected Outcomes:

  • MTTR: Reduce from 1.2 hours to <30 minutes (faster rollbacks).
  • SRE Toil: Reduce deployment support by 60% (self-service empowers teams).
  • Confidence: Teams deploy more frequently (know rollback is easy).

Q4 2025: AI-Assisted Test Generation

Goal: Use AI to generate unit tests, detect flaky tests, and suggest test improvements.

Initiatives:

  1. AI Test Generation (Week 1-6):

    • Current: Developers manually write unit tests (time-consuming).
    • Target: AI generates unit tests from code; developers review and refine.
    • Implementation:
      // Example: GitHub Copilot or Azure OpenAI generates tests
      
      // Given this method:
      public class EventValidator
      {
          public ValidationResult Validate(AuditEvent evt)
          {
              if (string.IsNullOrEmpty(evt.EventId))
                  return ValidationResult.Failure("EventId required");
      
              if (evt.Timestamp > DateTime.UtcNow)
                  return ValidationResult.Failure("Timestamp cannot be in future");
      
              return ValidationResult.Success();
          }
      }
      
      // AI generates:
      [Fact]
      public void Validate_NullEventId_ReturnsFailure()
      {
          var validator = new EventValidator();
          var evt = new AuditEvent { EventId = null };
          var result = validator.Validate(evt);
          Assert.False(result.IsSuccess);
          Assert.Contains("EventId required", result.Error);
      }
      
      [Fact]
      public void Validate_FutureTimestamp_ReturnsFailure()
      {
          var validator = new EventValidator();
          var evt = new AuditEvent { Timestamp = DateTime.UtcNow.AddHours(1) };
          var result = validator.Validate(evt);
          Assert.False(result.IsSuccess);
          Assert.Contains("future", result.Error);
      }
      
    • Success Criteria: AI generates 70% of unit tests (developers refine); coverage increases to 75% average.
  2. Flaky Test Detection (Week 7-9):

    • Current: Manual identification of flaky tests (pass rate < 100%).
    • Target: AI analyzes test history and identifies flaky patterns.
    • Implementation:
      # flaky_test_detector.py (sketch; construction of `client` via the
      # azure-devops SDK's Connection object is elided)
      from sklearn.cluster import DBSCAN
      
      def detect_flaky_tests(pipeline_id, days=30):
          # Fetch test results for the last N days (project helper, elided)
          results = client.get_test_results(pipeline_id, last_n_days=days)
      
          # Extract features: pass rate, failure patterns, duration variance
          features = extract_test_features(results)
      
          # Cluster tests; DBSCAN labels noise points -1 and dense clusters 0..k
          clustering = DBSCAN(eps=0.3, min_samples=5).fit(features)
      
          # Assumes the dense cluster of intermittent results comes out as cluster 0
          flaky_tests = [test for test, label in zip(results, clustering.labels_) if label == 0]
      
          return flaky_tests
      
    • Pipeline Integration:
      - script: |
          python flaky_test_detector.py --pipeline-id 1 --days 30 > flaky-tests.json
      
          if [ -s flaky-tests.json ]; then
            echo "##vso[task.logissue type=warning]Flaky tests detected; creating work items"
            python create_workitems_for_flaky_tests.py flaky-tests.json
          fi
        displayName: 'AI Flaky Test Detection'
      
    • Success Criteria: 90% of flaky tests auto-detected; work items auto-created.
  3. Test Optimization Suggestions (Week 10-12):

    • Current: Developers manually optimize slow tests.
    • Target: AI suggests optimizations (reduce test data, parallelize, mock expensive dependencies).
    • Example Suggestion:
      AI Suggestion: EventProcessor_HandleHighVolumeLoad_Test
      
      Current Duration: 45s (slowest test in suite)
      
      Optimization Suggestions:
      1. Reduce test data: Currently seeds 10,000 records; 100 sufficient for testing.
          Expected Improvement: 35s reduction (from 45s to 10s)
      
      2. Mock RabbitMQ: Test doesn't verify message content; can use in-memory mock.
          Expected Improvement: 5s reduction (RabbitMQ round-trip eliminated)
      
      3. Parallelize: Test is parallel-safe (no shared state).
          Expected Improvement: Run concurrently with other tests
      
      Apply optimizations? [Yes/No]
      
    • Success Criteria: Test duration reduced by 30% (avg); developer adoption of 80%+ suggestions.
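Short of the ML clustering above, a pass-rate heuristic already catches most flaky tests and needs only the stdlib. A sketch (the history shape is illustrative):

```python
def find_flaky(history: dict[str, list[bool]], min_runs: int = 10) -> list[str]:
    """Flag tests that both pass and fail across recent runs.

    A test that is neither always-passing nor always-failing (given enough
    samples) is a flakiness candidate worth a work item.
    """
    flaky = []
    for test, outcomes in history.items():
        if len(outcomes) < min_runs:
            continue  # not enough signal yet
        pass_rate = sum(outcomes) / len(outcomes)
        if 0.0 < pass_rate < 1.0:
            flaky.append((test, pass_rate))
    # Most intermittent first (pass rate closest to 50%)
    flaky.sort(key=lambda tp: abs(tp[1] - 0.5))
    return [t for t, _ in flaky]
```

This can seed the work-item creation step in the pipeline snippet above while the clustering approach matures.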

Expected Outcomes:

  • Test Coverage: Increase to 75% average (AI generates tests).
  • Flaky Tests: Reduce by 90% (auto-detection and remediation).
  • Test Performance: 30% faster test execution (AI optimizations).

Innovation Areas

Beyond the quarterly roadmap, ATP explores experimental innovations that may transform CI/CD in future years.

Predictive Failure Detection

Concept: Use machine learning to predict pipeline failures before execution, enabling proactive fixes.

How It Works:

  1. Feature Extraction: Analyze commit diff (files changed, lines added/deleted, changed dependencies).
  2. Historical Correlation: Train model on past builds (commits → failure reasons).
  3. Prediction: Before queuing build, predict failure probability.
  4. Proactive Action: If high failure probability, notify developer with specific concerns.

ML Model (Proof of Concept):

# predictive_failure_detector.py
from sklearn.ensemble import RandomForestClassifier
import pandas as pd

# Features: files_changed, lines_added, lines_deleted, dependencies_changed, commit_message_length
features = pd.DataFrame([
    [3, 120, 45, 1, 80],     # Build succeeded
    [15, 800, 200, 5, 120],  # Build failed (large change, many dependencies)
    [2, 30, 10, 0, 50],      # Build succeeded
])

labels = [0, 1, 0]  # 0 = success, 1 = failure

# Train model
model = RandomForestClassifier(n_estimators=100)
model.fit(features, labels)

# Predict new commit
new_commit_features = [[20, 1000, 500, 10, 150]]  # Large change
prediction = model.predict(new_commit_features)
probability = model.predict_proba(new_commit_features)[0][1]

if probability > 0.7:
    print(f"⚠️ High failure probability ({probability*100:.1f}%)")
    print("Concerns: Large changeset (1000 lines), many dependency changes (10 packages)")
    print("Suggestion: Split into smaller PRs; test locally before pushing")

Pipeline Integration:

# Prediction runs as the first stage of the pipeline
trigger:
  branches:
    include: [master, main]

stages:
- stage: Predict_Failure
  jobs:
  - job: Analyze_Commit
    steps:
    - script: |
        python predictive_failure_detector.py \
          --commit $(Build.SourceVersion) \
          --threshold 0.7

        if [ $? -eq 1 ]; then
          echo "##vso[task.logissue type=warning]High failure probability detected"
          # Optionally: Block build, require developer acknowledgment
        fi
      displayName: 'Predictive Failure Analysis'

Benefits:

  • Proactive Fixes: Developers address issues before CI failure (save time).
  • Reduced Build Failures: Fewer failed builds (success rate increases).
  • Learning: Model improves over time (more training data = better predictions).

Challenges:

  • Model Accuracy: Requires large training dataset (1000+ builds).
  • False Positives: Over-prediction erodes trust (developers ignore warnings).
  • Explainability: Model must explain why failure predicted (not black box).

Ephemeral Environments

Concept: Spin up full ATP stack (all 7 services + infrastructure) for each pull request, enabling realistic integration testing.

How It Works:

  1. PR Created: Developer opens PR for Ingestion Service.
  2. Environment Provisioning: Pipeline provisions ephemeral environment:
    • Azure Container Instances (ACI) for each ATP service.
    • SQL Database (serverless tier, auto-paused).
    • Redis Cache (basic tier).
    • Service Bus namespace.
  3. Deploy PR Code: Ingestion Service deployed from PR branch; other services use stable versions.
  4. Integration Tests: PR tests run against full stack (realistic dependencies).
  5. Environment Teardown: After PR merged/closed, environment deleted (cost optimization).

Pipeline Implementation:

# pr-environment.yml
pr:
  branches:
    include: [master, main]

stages:
- stage: Provision_PR_Environment
  jobs:
  - job: Pulumi_Deploy_Ephemeral
    steps:
    - script: |
        # Deploy ephemeral stack
        pulumi stack select atp-pr-$(System.PullRequest.PullRequestId) --create
        pulumi up --yes

        # Capture endpoints
        INGESTION_URL=$(pulumi stack output IngestionServiceUrl)
        echo "##vso[task.setvariable variable=IngestionUrl]$INGESTION_URL"
      displayName: 'Provision Ephemeral Environment'

- stage: Test_Against_Ephemeral
  jobs:
  - job: Integration_Tests
    steps:
    - script: |
        # Run integration tests against ephemeral environment
        dotnet test --filter Category=Integration \
          --environment ServiceUrl=$(IngestionUrl)
      displayName: 'Integration Tests'

- stage: Cleanup_PR_Environment
  condition: always()
  jobs:
  - job: Pulumi_Destroy_Ephemeral
    steps:
    - script: |
        pulumi stack select atp-pr-$(System.PullRequest.PullRequestId)
        pulumi destroy --yes
        pulumi stack rm --yes
      displayName: 'Destroy Ephemeral Environment'

Benefits:

  • Realistic Testing: PRs tested against full ATP stack (catch integration issues early).
  • Isolation: Each PR has dedicated environment (no conflicts).
  • Confidence: Developers see how changes behave in production-like environment.

Challenges:

  • Cost: Ephemeral environments are expensive (7 services × multiple PRs).
  • Provisioning Time: Takes 5-10 minutes to provision full stack.
  • Cleanup: Must ensure environments destroyed (prevent resource leaks).

Cost Optimization:

  • Serverless Resources: Use Azure Container Instances (pay per second), Serverless SQL (auto-pause).
  • Shared Infrastructure: Share Redis/Service Bus across PR environments (isolated by namespace).
  • Time Limits: Auto-destroy environments after 4 hours (prevent abandoned environments).
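The 4-hour auto-destroy policy reduces to finding stacks older than a TTL and feeding them to `pulumi destroy`. A sketch (stack names and timestamps are illustrative):

```python
from datetime import datetime, timedelta, timezone

def expired_stacks(stacks: dict[str, datetime], now: datetime,
                   ttl_hours: int = 4) -> list[str]:
    """Return PR stack names older than the TTL, due for `pulumi destroy`.

    `stacks` maps stack name -> creation time (UTC).
    """
    cutoff = now - timedelta(hours=ttl_hours)
    return sorted(name for name, created in stacks.items() if created < cutoff)

now = datetime(2025, 1, 15, 12, 0, tzinfo=timezone.utc)
stacks = {
    "atp-pr-101": now - timedelta(hours=6),  # abandoned, destroy
    "atp-pr-102": now - timedelta(hours=1),  # still active, keep
}
print(expired_stacks(stacks, now))  # → ['atp-pr-101']
```

A scheduled pipeline can run this against `pulumi stack ls` output to sweep abandoned environments.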

Policy as Code

Concept: Use OPA/Rego policies to enforce pipeline compliance rules (SBOM required, security scans mandatory, coverage thresholds met).

How It Works:

  1. Policy Definition: Write Rego policies defining pipeline requirements.
  2. Policy Enforcement: Pipeline evaluates policies before publishing artifacts.
  3. Policy Violation: If policy violated, pipeline blocked; remediation guidance provided.

Example Policy (SBOM Required):

# policies/sbom_required.rego
package pipeline.compliance

# Policy: All production builds must include SBOM
deny[msg] {
    input.build.branch == "refs/heads/master"
    not sbom_artifact_exists
    msg := "SBOM artifact required for production builds"
}

sbom_artifact_exists {
    input.artifacts[_].name == "sbom"
}

Pipeline Integration:

- script: |
    # Evaluate OPA policies; --fail-defined exits non-zero when any deny rule fires
    opa eval \
      --fail-defined \
      --data policies/ \
      --input build-context.json \
      --format pretty \
      "data.pipeline.compliance.deny"

    if [ $? -ne 0 ]; then
      echo "##vso[task.logissue type=error]Policy violation detected!"
      exit 1
    fi
  displayName: 'Evaluate Pipeline Policies'

Build Context (Input to OPA):

{
  "build": {
    "branch": "refs/heads/master",
    "number": "1.0.42",
    "reason": "Manual"
  },
  "artifacts": [
    {"name": "atp-ingestion-drop", "size": 45000000},
    {"name": "sbom", "size": 125000},
    {"name": "security-reports", "size": 2500000}
  ],
  "tests": {
    "total": 1234,
    "passed": 1234,
    "coverage": 75.3
  },
  "securityScans": {
    "sonarQube": {"status": "PASSED"},
    "owasp": {"status": "PASSED"},
    "trivy": {"status": "PASSED"}
  }
}

Additional Policies:

  • Coverage Policy: Coverage must meet or exceed service threshold.
  • Security Policy: Zero critical/high vulnerabilities.
  • Approval Policy: Production deployments require 2 approvals.
  • Retention Policy: Production builds retained for 7 years.
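The same checks these policies express can be prototyped directly over the build-context JSON shown above, which is handy when iterating before committing to Rego. A sketch (the function shape and the inline threshold are assumptions):

```python
def policy_violations(ctx: dict, coverage_threshold: float) -> list[str]:
    """Evaluate a build context against the SBOM, coverage, and security policies."""
    violations = []
    artifact_names = {a["name"] for a in ctx["artifacts"]}
    # SBOM policy: production (master) builds must publish an SBOM artifact
    if ctx["build"]["branch"] == "refs/heads/master" and "sbom" not in artifact_names:
        violations.append("SBOM artifact required for production builds")
    # Coverage policy: must meet or exceed the service threshold
    if ctx["tests"]["coverage"] < coverage_threshold:
        violations.append(
            f"Coverage {ctx['tests']['coverage']}% below threshold {coverage_threshold}%")
    # Security policy: every configured scan must have passed
    failed = [n for n, s in ctx["securityScans"].items() if s["status"] != "PASSED"]
    if failed:
        violations.append(f"Security scans failed: {', '.join(failed)}")
    return violations
```

An empty list means the build clears the gates; anything else maps onto the `deny` messages the Rego policies would emit.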

Benefits:

  • Compliance Automation: Policies enforce compliance automatically (no manual checks).
  • Auditability: Policy evaluations logged (proof of compliance).
  • Flexibility: Policies updated independently of pipelines (centralized governance).

Challenges:

  • Policy Complexity: Rego learning curve for developers.
  • Performance: Policy evaluation adds latency to pipeline (mitigated by caching).

Success Metrics

Track improvement progress using DORA metrics and custom KPIs:

| Metric | Q4 2024 (Baseline) | Q4 2025 (Target) | Status |
|--------|--------------------|------------------|--------|
| Deployment Frequency | 2/month | Daily (20/month) | In Progress (Q1 2025) |
| Lead Time for Changes | 5.2 days | <1 day | In Progress (Q1 2025) |
| Mean Time to Recovery | 1.2 hours | <30 minutes | In Progress (Q3 2025) |
| Change Failure Rate | 8% | <10% | ✅ Met |
| Code Coverage | 70% | 75% | In Progress (Q4 2025) |
| Pipeline Success Rate | 94% | ≥97% | In Progress (Q2 2025) |
| CI Duration (P95) | 10m 48s | <10m | In Progress (Q1 2025) |

Quarterly Reviews: Platform team reviews metrics quarterly; adjusts roadmap based on progress.
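Three of these DORA metrics fall out of a list of deployment records; a minimal sketch of the computation (record shape is an assumption):

```python
from datetime import datetime

def dora_snapshot(deploys: list[dict]) -> dict[str, float]:
    """Compute deployment frequency, mean lead time, and change failure rate.

    Each record: {'committed': datetime, 'deployed': datetime, 'failed': bool}.
    """
    if not deploys:
        return {"per_month": 0.0, "lead_time_days": 0.0, "cfr_percent": 0.0}
    # Deployment frequency over the observed window (30-day months)
    span = max(d["deployed"] for d in deploys) - min(d["deployed"] for d in deploys)
    months = max(span.days / 30.0, 1 / 30.0)
    # Lead time: commit -> production, in days
    lead_days = [(d["deployed"] - d["committed"]).total_seconds() / 86400
                 for d in deploys]
    return {
        "per_month": len(deploys) / months,
        "lead_time_days": sum(lead_days) / len(lead_days),
        "cfr_percent": 100.0 * sum(d["failed"] for d in deploys) / len(deploys),
    }
```

Fed from deployment audit logs, this produces the baseline/target rows tracked in the table above.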


Appendix A — Pipeline YAML Example (ATP Ingestion)

This appendix provides a complete, production-ready pipeline YAML for the ATP Ingestion Service, demonstrating the integration of ConnectSoft.AzurePipelines templates, service containers, quality gates, and multi-environment deployment. This example serves as a reference implementation for other ATP microservices.

Complete Pipeline YAML

# azure-pipelines.yml (ATP Ingestion Service)
# Location: ConnectSoft.ATP.Ingestion/azure-pipelines.yml

name: $(majorMinorVersion).$(semanticVersion)

# External template repository reference
resources:
  repositories:
    - repository: templates
      type: git
      name: ConnectSoft/ConnectSoft.AzurePipelines
      ref: refs/tags/v2.3.1  # Pin to specific version for production stability

  # Service containers for integration testing
  containers:
    - container: redis
      image: redis:7-alpine
      ports: [6379:6379]
      options: --health-cmd "redis-cli ping" --health-interval 10s --health-timeout 5s --health-retries 5

    - container: rabbitmq
      image: rabbitmq:3-management-alpine
      ports:
        - 5672:5672
        - 15672:15672
      env:
        RABBITMQ_DEFAULT_USER: guest
        RABBITMQ_DEFAULT_PASS: guest
      options: --health-cmd "rabbitmq-diagnostics -q ping" --health-interval 10s --health-timeout 5s --health-retries 5

    - container: mssql
      image: mcr.microsoft.com/mssql/server:2022-latest
      ports:
        - 1433:1433
      env:
        ACCEPT_EULA: Y
        SA_PASSWORD: P@ssw0rd123!
        MSSQL_PID: Developer
      # 2022 images ship mssql-tools18; -C trusts the server's self-signed certificate
      options: --health-cmd "/opt/mssql-tools18/bin/sqlcmd -C -S localhost -U sa -P P@ssw0rd123! -Q 'SELECT 1'" --health-interval 10s --health-timeout 5s --health-retries 10

    - container: otel-collector
      image: otel/opentelemetry-collector:0.97.0
      ports:
        - 4317:4317
        - 8888:8888
        - 13133:13133
      volumes:
        # Mount over the collector's default config path rather than passing a
        # custom command (service containers do not support overriding the command)
        - $(Build.SourcesDirectory)/test/otel-config.yaml:/etc/otelcol/config.yaml

    - container: seq
      image: datalust/seq:latest
      ports:
        - 5341:80
      env:
        ACCEPT_EULA: Y

# Build agent pool (can be overridden for self-hosted agents)
pool:
  vmImage: 'ubuntu-latest'

# Pipeline variables (semantic versioning, paths, thresholds).
# Variable groups and inline variables must share a single `variables` block,
# which requires the list syntax below.
variables:
  # Variable groups (environment-specific secrets)
  - group: ATP-Dev-Secrets
  - group: ATP-Common-Config

  # Versioning
  - name: majorMinorVersion
    value: 1.0
  - name: semanticVersion
    value: $[counter(variables['majorMinorVersion'], 0)]
  - name: buildNumber
    value: $(majorMinorVersion).$(semanticVersion)

  # Solution paths
  - name: solution
    value: '**/*.slnx'
  - name: exactSolution
    value: 'ConnectSoft.ATP.Ingestion.slnx'
  - name: buildConfiguration
    value: 'Release'

  # NuGet feed authentication
  - name: restoreVstsFeed
    value: 'e4c108b4-7989-4d22-93d6-391b77a39552/1889adca-ccb6-4ece-aa22-cad1ae4a35f3'

  # Quality gates
  - name: codeCoverageThreshold
    value: 75  # Ingestion service requires 75% coverage

  # Artifact configuration
  - name: artifactName
    value: 'atp-ingestion-drop'

  # Docker configuration
  - name: dockerRegistryServiceConnection
    value: '9190f67e-25ee-4478-bdd5-933128c9f06f'
  - name: containerRegistry
    value: 'connectsoft.azurecr.io'
  - name: imageRepository
    value: 'atp/ingestion'
  - name: dockerfile
    value: 'src/ConnectSoft.ATP.Ingestion/Dockerfile'

  # Azure subscription (for deployments)
  - name: azureSubscription
    value: 'ConnectSoft-Production'

# Trigger configuration
trigger:
  branches:
    include:
      - master
      - main
      - release/*
  paths:
    exclude:
      - README.md
      - docs/**
      - '*.md'
  tags:
    include:
      - v*.*.*

# Pull request validation
pr:
  branches:
    include:
      - master
      - main
  paths:
    exclude:
      - README.md
      - docs/**

# Pipeline stages
stages:
#═══════════════════════════════════════════════════════════════════════════════
# Stage 1: CI (Build, Test, Security, Publish)
#═══════════════════════════════════════════════════════════════════════════════
- stage: CI_Stage
  displayName: 'Build and Test ATP Ingestion'
  jobs:
  - job: Build_Test_Publish
    displayName: 'Build, Test, Security Scan, Publish'
    timeoutInMinutes: 20  # Fail if CI exceeds 20 minutes

    # Attach service containers
    services:
      redis: redis
      rabbitmq: rabbitmq
      mssql: mssql
      otel: otel-collector
      seq: seq

    steps:
    # ─────────────────────────────────────────────────────────────────────────
    # 1. Setup .NET SDK
    # ─────────────────────────────────────────────────────────────────────────
    - task: UseDotNet@2
      displayName: 'Install .NET 8 SDK'
      inputs:
        version: '8.x'
        includePreviewVersions: false

    # ─────────────────────────────────────────────────────────────────────────
    # 2. Lint (code style, security, deprecated packages)
    # ─────────────────────────────────────────────────────────────────────────
    - template: build/lint-microservice-steps.yaml@templates
      parameters:
        solution: $(solution)
        exactSolution: $(exactSolution)
        restoreVstsFeed: $(restoreVstsFeed)
        isNugetAuthenticateEnabled: true
        sonarQubeServiceConnection: 'SonarCloud-ConnectSoft'
        sonarQubeProjectKey: 'ConnectSoft_ATP_Ingestion'
        sonarQubeProjectName: 'ATP Ingestion Service'

    # ─────────────────────────────────────────────────────────────────────────
    # 3. Build (restore, compile, version stamp)
    # ─────────────────────────────────────────────────────────────────────────
    - template: build/build-microservice-steps.yaml@templates
      parameters:
        solution: $(solution)
        exactSolution: $(exactSolution)
        buildConfiguration: $(buildConfiguration)
        buildNumber: $(buildNumber)
        generateDocumentation: true  # XML documentation for NuGet

    # ─────────────────────────────────────────────────────────────────────────
    # 4. Test (unit + integration, coverage, reporting)
    # ─────────────────────────────────────────────────────────────────────────
    - template: test/test-microservice-steps.yaml@templates
      parameters:
        solution: $(solution)
        runSettingsFileName: 'ConnectSoft.ATP.Ingestion.runsettings'
        buildConfiguration: $(buildConfiguration)
        codeCoverageThreshold: $(codeCoverageThreshold)
        publishTestResults: true
        testResultsFormat: 'VSTest'
        mergeTestResults: true

    # ─────────────────────────────────────────────────────────────────────────
    # 5. Security Scanning (SAST, dependency check, secrets)
    # ─────────────────────────────────────────────────────────────────────────
    - template: build/security-scan-microservice-steps.yaml@templates
      parameters:
        solution: $(solution)
        owaspDependencyCheckEnabled: true
        trivyScanEnabled: true
        secretsScanEnabled: true
        failOnCriticalVulnerabilities: true

    # ─────────────────────────────────────────────────────────────────────────
    # 6. Generate SBOM (Software Bill of Materials)
    # ─────────────────────────────────────────────────────────────────────────
    - task: CmdLine@2
      displayName: 'Generate SBOM (CycloneDX)'
      inputs:
        script: |
          dotnet tool install --global CycloneDX
          dotnet CycloneDX $(exactSolution) -o $(Build.ArtifactStagingDirectory)/sbom -j

    - task: PublishBuildArtifacts@1
      displayName: 'Publish SBOM Artifact'
      inputs:
        PathtoPublish: '$(Build.ArtifactStagingDirectory)/sbom'
        ArtifactName: 'sbom'

    # ─────────────────────────────────────────────────────────────────────────
    # 7. Publish Artifacts (binaries, test results, compliance)
    # ─────────────────────────────────────────────────────────────────────────
    - template: publish/publish-microservice-steps.yaml@templates
      parameters:
        artifactName: $(artifactName)
        buildConfiguration: $(buildConfiguration)
        publishSymbols: true
        symbolsPath: '**/*.pdb'

    # ─────────────────────────────────────────────────────────────────────────
    # 8. Build and Push Docker Image
    # ─────────────────────────────────────────────────────────────────────────
    - template: build/build-and-push-microservice-docker-steps.yaml@templates
      parameters:
        dockerRegistryServiceConnection: $(dockerRegistryServiceConnection)
        imageRepository: $(imageRepository)
        containerRegistry: $(containerRegistry)
        dockerfile: $(dockerfile)
        buildContext: '.'
        tags: |
          $(buildNumber)
          latest
        arguments: '--build-arg VERSION=$(buildNumber) --build-arg "BUILD_DATE=$(Build.QueueTime)"'  # QueueTime contains spaces; quote it

#═══════════════════════════════════════════════════════════════════════════════
# Stage 2: Deploy to Dev (Automated)
#═══════════════════════════════════════════════════════════════════════════════
- stage: CD_Dev
  displayName: 'Deploy to Development'
  dependsOn: CI_Stage
  condition: |
    and(
      succeeded(),
      or(
        eq(variables['Build.SourceBranch'], 'refs/heads/master'),
        eq(variables['Build.SourceBranch'], 'refs/heads/main')
      )
    )
  jobs:
  - deployment: DeployToDev
    displayName: 'Deploy ATP Ingestion to Dev'
    environment: ATP-Dev
    strategy:
      runOnce:
        deploy:
          steps:
          - template: deploy/deploy-microservice-to-azure-web-site.yaml@templates
            parameters:
              azureSubscription: $(azureSubscription)
              appName: 'atp-ingestion-dev'
              package: '$(Pipeline.Workspace)/$(artifactName)/*.zip'
              appSettings: |
                -ConnectionStrings__Redis "redis-dev.connectsoft.local:6379"
                -ConnectionStrings__RabbitMQ "amqp://guest:guest@rabbitmq-dev.connectsoft.local:5672"
                -ConnectionStrings__Database "Server=sql-dev.connectsoft.local;Database=ATP_Ingestion_Dev;User Id=$(DbUser);Password=$(DbPassword)"
                -OpenTelemetry__ExporterEndpoint "http://otel-collector-dev.connectsoft.local:4317"

          # Post-deployment smoke tests
          - task: PowerShell@2
            displayName: 'Run Smoke Tests'
            inputs:
              targetType: 'inline'
              script: |
                $healthCheckUrl = "https://atp-ingestion-dev.azurewebsites.net/health"
                $response = Invoke-RestMethod -Uri $healthCheckUrl -Method Get
                if ($response.status -ne "Healthy") {
                  Write-Error "Health check failed: $($response.status)"
                  exit 1
                }
                Write-Host "✅ Smoke test passed"

#═══════════════════════════════════════════════════════════════════════════════
# Stage 3: Deploy to Test (Automated)
#═══════════════════════════════════════════════════════════════════════════════
- stage: CD_Test
  displayName: 'Deploy to Test'
  dependsOn: CD_Dev
  condition: succeeded()
  jobs:
  - deployment: DeployToTest
    displayName: 'Deploy ATP Ingestion to Test'
    environment: ATP-Test
    strategy:
      runOnce:
        deploy:
          steps:
          - template: deploy/deploy-microservice-to-azure-web-site.yaml@templates
            parameters:
              azureSubscription: $(azureSubscription)
              appName: 'atp-ingestion-test'
              package: '$(Pipeline.Workspace)/$(artifactName)/*.zip'
              slotName: 'blue'  # Blue-green deployment

          # Swap the validated blue slot into production
          - task: AzureCLI@2
            displayName: 'Swap Blue Slot to Production'
            inputs:
              azureSubscription: $(azureSubscription)
              scriptType: 'bash'
              scriptLocation: 'inlineScript'
              inlineScript: |
                az webapp deployment slot swap \
                  --name atp-ingestion-test \
                  --resource-group ATP-Test-RG \
                  --slot blue \
                  --target-slot production

#═══════════════════════════════════════════════════════════════════════════════
# Stage 4: Deploy to Staging (Manual Approval)
#═══════════════════════════════════════════════════════════════════════════════
- stage: CD_Staging
  displayName: 'Deploy to Staging'
  dependsOn: CD_Test
  condition: |
    and(
      succeeded(),
      eq(variables['Build.SourceBranch'], 'refs/heads/master')
    )
  jobs:
  - deployment: DeployToStaging
    displayName: 'Deploy ATP Ingestion to Staging'
    environment: ATP-Staging  # Requires manual approval in Azure DevOps
    strategy:
      runOnce:
        deploy:
          steps:
          - template: deploy/deploy-microservice-to-azure-web-site.yaml@templates
            parameters:
              azureSubscription: $(azureSubscription)
              appName: 'atp-ingestion-staging'
              package: '$(Pipeline.Workspace)/$(artifactName)/*.zip'

#═══════════════════════════════════════════════════════════════════════════════
# Stage 5: Deploy to Production (Manual Approval + Canary)
#═══════════════════════════════════════════════════════════════════════════════
- stage: CD_Production
  displayName: 'Deploy to Production'
  dependsOn: CD_Staging
  condition: |
    and(
      succeeded(),
      eq(variables['Build.Reason'], 'Manual')  # Only allow manual production deployments
    )
  jobs:
  - deployment: DeployToProduction
    displayName: 'Deploy ATP Ingestion to Production (Canary)'
    environment: ATP-Production  # Requires 2 approvals in Azure DevOps
    strategy:
      canary:
        increments: [10, 25, 50]  # 10% → 25% → 50% → 100%
        preDeploy:
          steps:
          - script: echo "Pre-deployment validation"
          # Verify staging has been stable for 24 hours
          - task: PowerShell@2
            inputs:
              targetType: 'inline'
              script: |
                # Check for failed operations in staging over the last 24 hours
                $incidentCount = az monitor activity-log list --resource-group ATP-Staging-RG --offset 24h --status Failed --query "length(@)" -o tsv
                if ([int]$incidentCount -gt 0) {
                  Write-Error "Active incidents detected in staging"
                  exit 1
                }

        deploy:
          steps:
          - template: deploy/deploy-microservice-to-azure-web-site.yaml@templates
            parameters:
              azureSubscription: $(azureSubscription)
              appName: 'atp-ingestion-prod'
              package: '$(Pipeline.Workspace)/$(artifactName)/*.zip'
              trafficPercentage: $(strategy.increment)  # Canary traffic routing

        routeTraffic:
          steps:
          - script: echo "Routing $(strategy.increment)% traffic to new version"
          - task: AzureAppServiceManage@0
            inputs:
              azureSubscription: $(azureSubscription)
              action: 'Start Azure App Service'
              webAppName: 'atp-ingestion-prod'

        postRouteTraffic:
          steps:
          # Monitor metrics for 10 minutes
          - task: PowerShell@2
            displayName: 'Monitor Canary Metrics'
            inputs:
              targetType: 'inline'
              script: |
                Start-Sleep -Seconds 600  # Wait 10 minutes

                # Query Application Insights (PowerShell line continuation uses backticks, not backslashes)
                $errorRate = az monitor app-insights metrics show `
                  --app atp-ingestion-prod `
                  --metric requests/failed `
                  --aggregation avg `
                  --offset 10m | ConvertFrom-Json

                if ($errorRate.value.avg -gt 0.01) {  # threshold on avg failed requests (~1%)
                  Write-Error "Error rate exceeded threshold: $($errorRate.value.avg)"
                  exit 1  # Trigger rollback
                }

        on:
          failure:
            steps:
            - script: echo "🔴 Canary deployment failed; rolling back"
            - task: AzureAppServiceManage@0
              inputs:
                azureSubscription: $(azureSubscription)
                action: 'Swap Slots'
                webAppName: 'atp-ingestion-prod'
                sourceSlot: 'production'
                targetSlot: 'canary'

Key Annotations

Versioning Strategy:

  • majorMinorVersion: 1.0 manually bumped for breaking changes.
  • semanticVersion: $[counter(...)] auto-increments for each build.
  • Build name: 1.0.42 (displayed in Azure DevOps).
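For illustration, the interaction between `majorMinorVersion` and the `$[counter(...)]` expression can be mimicked in Python (a sketch; the real counter is maintained server-side by Azure DevOps and keyed by its seed string):

```python
# The counter is keyed by its seed string; changing the seed starts a new sequence.
counters: dict[str, int] = {}

def next_build(major_minor: str) -> str:
    """Mimic $[counter(format('{0}', variables.majorMinorVersion), 0)]:
    bumping 1.0 -> 2.0 restarts the patch sequence at 0."""
    counters[major_minor] = counters.get(major_minor, -1) + 1
    return f"{major_minor}.{counters[major_minor]}"

print(next_build("1.0"))  # 1.0.0
print(next_build("1.0"))  # 1.0.1
print(next_build("2.0"))  # 2.0.0 -- counter reset on version bump
```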

Service Containers:

  • Health checks ensure containers are ready before tests run.
  • options: --health-cmd configures health check commands (Redis: redis-cli ping, SQL: sqlcmd -S localhost).
  • Environment variables configure container behavior (e.g., SQL password, RabbitMQ credentials).

Template References:

  • @templates references the ConnectSoft.AzurePipelines repository.
  • Templates pinned to refs/tags/v2.3.1 for production stability (prevents breaking changes from upstream).

Quality Gates:

  • codeCoverageThreshold: 75 enforced in test template.
  • Security scans fail build on critical/high vulnerabilities.
  • SBOM generated and published as artifact.

Deployment Strategy:

  • Dev/Test: Automated (no approval).
  • Staging: Manual approval (1 approver).
  • Production: Manual approval (2 approvers) + canary rollout (10% → 25% → 50% → 100%).

Canary Deployment:

  • strategy.canary.increments: [10, 25, 50] defines traffic progression.
  • postRouteTraffic monitors metrics for 10 minutes before proceeding.
  • on.failure triggers automated rollback if metrics degrade.
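The progression described above can be sketched as a small simulation (illustrative only; real traffic shifting and metric queries are handled by the `canary` strategy and Application Insights):

```python
def run_canary(increments, error_rate_at):
    """Walk the traffic increments, checking the observed error rate after each
    shift; abort (rollback) the moment the 1% threshold is exceeded.
    `error_rate_at` maps a traffic percentage to the observed error rate."""
    THRESHOLD = 0.01                        # >1% failed requests triggers rollback
    for pct in increments + [100]:          # final step routes all traffic
        if error_rate_at(pct) > THRESHOLD:
            return ("rollback", pct)
        # postRouteTraffic would monitor metrics for 10 minutes here
    return ("promoted", 100)

# Healthy run: error rate stays at 0.2% through every increment
assert run_canary([10, 25, 50], lambda pct: 0.002) == ("promoted", 100)
# Degraded run: errors spike once 25% of traffic hits the new version
assert run_canary([10, 25, 50], lambda pct: 0.05 if pct >= 25 else 0.002) == ("rollback", 25)
```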

Appendix B — Service Container Definitions

Service containers enable realistic integration testing by running dependencies (databases, message brokers, observability tools) alongside the build agent. This appendix defines standard container configurations used across ATP pipelines.
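The polling semantics of `--health-cmd`, `--health-interval`, and `--health-retries` that every entry in the catalog below relies on amount to a simple retry loop, sketched here in Python (hypothetical helper, not part of the templates):

```python
import time

def wait_until_healthy(probe, interval_s=10, retries=5):
    """Poll `probe()` (the equivalent of --health-cmd) until it returns True,
    sleeping `interval_s` between attempts; give up after `retries` failures,
    mirroring Docker's --health-interval / --health-retries semantics."""
    for attempt in range(1, retries + 1):
        if probe():
            return attempt
        if attempt < retries:
            time.sleep(interval_s)
    raise TimeoutError(f"service not healthy after {retries} probes")

# A Redis probe would shell out to `redis-cli ping`; here we simulate a
# container that becomes healthy on the third probe.
results = iter([False, False, True])
assert wait_until_healthy(lambda: next(results), interval_s=0) == 3
```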

Container Catalog

# Standard service containers for ATP pipelines
# Usage: Reference in pipeline YAML under resources.containers

containers:
  #─────────────────────────────────────────────────────────────────────────────
  # 1. Redis (Caching, Session State)
  #─────────────────────────────────────────────────────────────────────────────
  - container: redis
    image: redis:7-alpine
    ports:
      - 6379:6379
    options: >-
      --health-cmd "redis-cli ping"
      --health-interval 10s
      --health-timeout 5s
      --health-retries 5
      --name redis-test
    # Connection string (from test code): localhost:6379
    # Use cases: Session caching, distributed locking, pub/sub

  #─────────────────────────────────────────────────────────────────────────────
  # 2. SQL Server (Primary Database)
  #─────────────────────────────────────────────────────────────────────────────
  - container: mssql
    image: mcr.microsoft.com/mssql/server:2022-latest
    ports:
      - 1433:1433
    env:
      ACCEPT_EULA: Y
      SA_PASSWORD: P@ssw0rd123!  # Must meet SQL Server complexity policy (8+ chars, three of four character classes)
      MSSQL_PID: Developer
      MSSQL_COLLATION: SQL_Latin1_General_CP1_CI_AS
    options: >-
      --health-cmd "/opt/mssql-tools18/bin/sqlcmd -S localhost -U sa -P P@ssw0rd123! -C -Q 'SELECT 1'"
      --health-interval 10s
      --health-timeout 5s
      --health-retries 10
      --name mssql-test
    # Connection string: Server=localhost,1433;Database=TestDb;User Id=sa;Password=P@ssw0rd123!;TrustServerCertificate=True
    # Use cases: Audit event storage, entity relationships, transactional integrity

  #─────────────────────────────────────────────────────────────────────────────
  # 3. PostgreSQL (Alternative Database)
  #─────────────────────────────────────────────────────────────────────────────
  - container: postgres
    image: postgres:16-alpine
    ports:
      - 5432:5432
    env:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
      POSTGRES_DB: testdb
    options: >-
      --health-cmd "pg_isready -U postgres"
      --health-interval 10s
      --health-timeout 5s
      --health-retries 5
      --name postgres-test
    # Connection string: Host=localhost;Port=5432;Database=testdb;Username=postgres;Password=postgres
    # Use cases: Query service (read-optimized), GIN indexes for full-text search

  #─────────────────────────────────────────────────────────────────────────────
  # 4. MongoDB (Document Storage)
  #─────────────────────────────────────────────────────────────────────────────
  - container: mongodb
    image: mongo:7
    ports:
      - 27017:27017
    env:
      MONGO_INITDB_ROOT_USERNAME: root
      MONGO_INITDB_ROOT_PASSWORD: example
    options: >-
      --health-cmd "mongosh --eval 'db.adminCommand(\"ping\")'"
      --health-interval 10s
      --health-timeout 5s
      --health-retries 5
      --name mongodb-test
    # Connection string: mongodb://root:example@localhost:27017
    # Use cases: Unstructured audit data, event payloads, schema-less storage

  #─────────────────────────────────────────────────────────────────────────────
  # 5. RabbitMQ (Message Broker)
  #─────────────────────────────────────────────────────────────────────────────
  - container: rabbitmq
    image: rabbitmq:3-management-alpine
    ports:
      - 5672:5672   # AMQP protocol
      - 15672:15672 # Management UI
    env:
      RABBITMQ_DEFAULT_USER: guest
      RABBITMQ_DEFAULT_PASS: guest
    options: >-
      --health-cmd "rabbitmq-diagnostics -q ping"
      --health-interval 10s
      --health-timeout 5s
      --health-retries 5
      --name rabbitmq-test
    # Connection string: amqp://guest:guest@localhost:5672
    # Management UI: http://localhost:15672 (guest/guest)
    # Use cases: Async event processing, saga orchestration, integration events

  #─────────────────────────────────────────────────────────────────────────────
  # 6. Elasticsearch (Search Engine)
  #─────────────────────────────────────────────────────────────────────────────
  - container: elasticsearch
    image: docker.elastic.co/elasticsearch/elasticsearch:8.11.0
    ports:
      - 9200:9200
      - 9300:9300
    env:
      discovery.type: single-node
      xpack.security.enabled: false  # Disable security for testing
      ES_JAVA_OPTS: -Xms512m -Xmx512m  # Limit memory usage
    options: >-
      --health-cmd "curl -f http://localhost:9200/_cluster/health || exit 1"
      --health-interval 10s
      --health-timeout 5s
      --health-retries 10
      --name elasticsearch-test
    # Connection string: http://localhost:9200
    # Use cases: Full-text search, audit log querying, aggregations

  #─────────────────────────────────────────────────────────────────────────────
  # 7. OpenTelemetry Collector (Observability)
  #─────────────────────────────────────────────────────────────────────────────
  - container: otel-collector
    image: otel/opentelemetry-collector:0.97.0
    ports:
      - 4317:4317   # OTLP gRPC receiver
      - 8888:8888   # Prometheus metrics
      - 13133:13133 # Health check
    volumes:
      # Mount over the image's default config path so no command override is needed
      - $(Build.SourcesDirectory)/test/otel-config.yaml:/etc/otelcol/config.yaml
    options: >-
      --health-cmd "curl -f http://localhost:13133/ || exit 1"
      --health-interval 10s
      --health-timeout 5s
      --health-retries 5
      --name otel-collector-test
    # Connection string: http://localhost:4317 (gRPC)
    # Use cases: Collect traces/metrics during tests, validate instrumentation

  #─────────────────────────────────────────────────────────────────────────────
  # 8. Seq (Centralized Logging)
  #─────────────────────────────────────────────────────────────────────────────
  - container: seq
    image: datalust/seq:latest
    ports:
      - 5341:80
    env:
      ACCEPT_EULA: Y
    options: >-
      --health-cmd "curl -f http://localhost:80/api/health || exit 1"
      --health-interval 10s
      --health-timeout 5s
      --health-retries 5
      --name seq-test
    # Connection string: http://localhost:5341
    # Use cases: Capture structured logs during tests, debug test failures

  #─────────────────────────────────────────────────────────────────────────────
  # 9. Azure Storage Emulator (Blob/Queue/Table)
  #─────────────────────────────────────────────────────────────────────────────
  - container: azurite
    image: mcr.microsoft.com/azure-storage/azurite:latest
    ports:
      - 10000:10000 # Blob service
      - 10001:10001 # Queue service
      - 10002:10002 # Table service
    options: >-
      --health-cmd "nc -z localhost 10000 || exit 1"
      --health-interval 10s
      --health-timeout 5s
      --health-retries 5
      --name azurite-test
    # Connection string: DefaultEndpointsProtocol=http;AccountName=devstoreaccount1;AccountKey=Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==;BlobEndpoint=http://localhost:10000/devstoreaccount1;QueueEndpoint=http://localhost:10001/devstoreaccount1;TableEndpoint=http://localhost:10002/devstoreaccount1;
    # Use cases: Blob storage for artifacts, queue-based processing, table storage

  #─────────────────────────────────────────────────────────────────────────────
  # 10. LocalStack (AWS Services Emulator)
  #─────────────────────────────────────────────────────────────────────────────
  - container: localstack
    image: localstack/localstack:latest
    ports:
      - 4566:4566  # Edge service (all AWS services)
    env:
      SERVICES: s3,sqs,sns,dynamodb
      DEBUG: 1
    options: >-
      --health-cmd "curl -f http://localhost:4566/_localstack/health || exit 1"
      --health-interval 10s
      --health-timeout 5s
      --health-retries 10
      --name localstack-test
    # Connection string: http://localhost:4566
    # AWS CLI config: aws --endpoint-url=http://localhost:4566 s3 ls
    # Use cases: Test S3 uploads, SQS queues, SNS notifications

Container Usage Patterns

Minimal Configuration (Single Database):

services:
  mssql: mssql  # Reference by container name

Multi-Service Configuration (Full Stack):

services:
  redis: redis
  mssql: mssql
  rabbitmq: rabbitmq
  otel: otel-collector
  seq: seq

Custom Configuration (Override Defaults):

resources:
  containers:
    - container: redis
      image: redis:7-alpine
      ports: ['6380:6379']  # Custom port mapping (host 6380 -> container 6379)
      env:
        REDIS_MAXMEMORY: 256mb
        REDIS_MAXMEMORY_POLICY: allkeys-lru

Connection Strings in Tests

Runsettings Configuration:

<!-- ConnectSoft.ATP.Ingestion.runsettings -->
<RunSettings>
  <TestRunParameters>
    <Parameter name="RedisConnectionString" value="localhost:6379" />
    <Parameter name="SqlConnectionString" value="Server=localhost,1433;Database=TestDb;User Id=sa;Password=P@ssw0rd123!;TrustServerCertificate=True" />
    <Parameter name="RabbitMqConnectionString" value="amqp://guest:guest@localhost:5672" />
    <Parameter name="ElasticsearchUrl" value="http://localhost:9200" />
    <Parameter name="OtelEndpoint" value="http://localhost:4317" />
    <Parameter name="SeqUrl" value="http://localhost:5341" />
  </TestRunParameters>
</RunSettings>
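For illustration, the `TestRunParameters` block is plain XML and can be read with any XML parser; a minimal Python sketch (VSTest itself injects these values via `TestContext`):

```python
import xml.etree.ElementTree as ET

RUNSETTINGS = """<RunSettings>
  <TestRunParameters>
    <Parameter name="RedisConnectionString" value="localhost:6379" />
    <Parameter name="RabbitMqConnectionString" value="amqp://guest:guest@localhost:5672" />
  </TestRunParameters>
</RunSettings>"""

def read_parameters(xml_text: str) -> dict[str, str]:
    """Collect every <Parameter name=... value=...> into a dict."""
    root = ET.fromstring(xml_text)
    return {p.get("name"): p.get("value") for p in root.iter("Parameter")}

params = read_parameters(RUNSETTINGS)
assert params["RedisConnectionString"] == "localhost:6379"
```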

C# Test Code:

[Fact]
public async Task ProcessEvent_StoresInDatabase_Success()
{
    // Arrange: Connect to SQL container
    var connectionString = TestContext.Parameters["SqlConnectionString"];
    var connection = new SqlConnection(connectionString);
    await connection.OpenAsync();

    // Ensure the test database exists (Dapper ExecuteAsync; SQL Server has no CREATE DATABASE IF NOT EXISTS)
    await connection.ExecuteAsync("IF DB_ID('TestDb') IS NULL CREATE DATABASE TestDb");
    await connection.ExecuteAsync("USE TestDb");

    // Act: Process audit event
    var processor = new EventProcessor(connectionString);
    await processor.ProcessAsync(new AuditEvent { EventId = "test-001" });

    // Assert: Verify stored in database
    var count = await connection.ExecuteScalarAsync<int>(
        "SELECT COUNT(*) FROM AuditEvents WHERE EventId = 'test-001'");
    Assert.Equal(1, count);
}

Appendix C — Quality Gate Definitions

Quality gates are automated policies that enforce code quality, security, and compliance standards. Builds fail if quality gates are not met, preventing low-quality code from reaching production.
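Cobertura reports store line and branch rates as fractions on the root `<coverage>` element; the threshold check can be sketched in Python (illustrative; the templates do this in PowerShell, and the `ATP.Query` override shown is taken from the configuration below):

```python
import xml.etree.ElementTree as ET

def check_coverage(cobertura_xml: str, min_line: float, min_branch: float):
    """Return (line_pct, branch_pct, passed) from a Cobertura report.
    Cobertura stores rates as fractions (0.0-1.0) on the root element."""
    root = ET.fromstring(cobertura_xml)
    line_pct = float(root.get("line-rate")) * 100
    branch_pct = float(root.get("branch-rate")) * 100
    return line_pct, branch_pct, line_pct >= min_line and branch_pct >= min_branch

report = '<coverage line-rate="0.75" branch-rate="0.625"></coverage>'
line, branch, ok = check_coverage(report, min_line=70, min_branch=60)
assert (line, branch, ok) == (75.0, 62.5, True)
# An ATP.Query build (override: 80% line minimum) would fail the same report:
assert check_coverage(report, min_line=80, min_branch=60)[2] is False
```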

Quality Gate Configuration

# quality-gates.yaml (enforced in ConnectSoft.AzurePipelines templates)

qualityGates:
  #═══════════════════════════════════════════════════════════════════════════
  # 1. Code Coverage
  #═══════════════════════════════════════════════════════════════════════════
  codeCoverage:
    # Line coverage (percentage of code lines executed by tests)
    minimumLineCoverage: 70
    targetLineCoverage: 80  # Aspirational target

    # Branch coverage (percentage of conditional branches executed)
    minimumBranchCoverage: 60
    targetBranchCoverage: 70

    # Fail build if thresholds not met
    failBuildOnThresholdNotMet: true

    # Exclusions (generated code, migrations, third-party)
    excludePatterns:
      - '**/Migrations/**'
      - '**/obj/**'
      - '**/bin/**'
      - '**/*.Designer.cs'
      - '**/*.g.cs'
      - '**/*.g.i.cs'

    # Coverage format
    coverageFormat: 'Cobertura'

    # Publish coverage reports to Azure DevOps
    publishCoverageReport: true

    # Service-specific overrides
    serviceOverrides:
      - service: 'ATP.Ingestion'
        minimumLineCoverage: 75  # Higher standard for critical service
      - service: 'ATP.Query'
        minimumLineCoverage: 80  # Highest standard (complex query logic)
      - service: 'ATP.Gateway'
        minimumLineCoverage: 65  # Lower standard (thin API layer)

  #═══════════════════════════════════════════════════════════════════════════
  # 2. Security Scanning
  #═══════════════════════════════════════════════════════════════════════════
  security:
    # SAST (Static Application Security Testing)
    sonarQube:
      enabled: true
      qualityGate: 'ConnectSoft-Default'
      failOnQualityGateFail: true
      serverUrl: 'https://sonarcloud.io'
      organization: 'connectsoft'

      # Quality gate conditions (defined in SonarCloud)
      conditions:
        - metric: 'new_bugs'
          operator: 'GREATER_THAN'
          threshold: 0
        - metric: 'new_vulnerabilities'
          operator: 'GREATER_THAN'
          threshold: 0
        - metric: 'new_code_smells'
          operator: 'GREATER_THAN'
          threshold: 5
        - metric: 'new_coverage'
          operator: 'LESS_THAN'
          threshold: 70
        - metric: 'new_duplicated_lines_density'
          operator: 'GREATER_THAN'
          threshold: 3

    # Dependency vulnerability scanning
    dependencyCheck:
      enabled: true
      tool: 'OWASP Dependency-Check'
      failOnCriticalVulnerabilities: true
      failOnHighVulnerabilities: true
      failOnMediumVulnerabilities: false  # Warning only

      # CVE database
      nvdApiKey: '$(NVD_API_KEY)'  # Faster updates with API key
      updateDatabase: true

      # Suppression file (false positives)
      suppressionFile: 'dependency-check-suppressions.xml'

    # Container image scanning
    trivy:
      enabled: true
      severity: 'CRITICAL,HIGH'
      failOnVulnerabilities: true
      ignoreUnfixed: false  # Also report vulnerabilities that have no available fix

      # Scan types
      scanTypes: ['vuln', 'config', 'secret']

      # Output format
      outputFormat: 'sarif'  # Standard interchange format for scan results

    # Secrets detection
    secretsScanning:
      enabled: true
      tool: 'detect-secrets'
      failOnSecretsDetected: true
      baselineFile: '.secrets.baseline'

      # Patterns to detect
      plugins:
        - 'AWSKeyDetector'
        - 'AzureStorageKeyDetector'
        - 'BasicAuthDetector'
        - 'PrivateKeyDetector'
        - 'Base64HighEntropyString'
        - 'HexHighEntropyString'

  #═══════════════════════════════════════════════════════════════════════════
  # 3. Testing Requirements
  #═══════════════════════════════════════════════════════════════════════════
  testing:
    # Test pass rate (100% required; no flaky tests tolerated)
    testPassRate: 100
    allowFlakyTests: false

    # Test execution limits
    maxTestDuration: 600  # 10 minutes total test execution
    maxSingleTestDuration: 60  # 1 minute per test

    # Test categories (must have tests in each category)
    requiredCategories:
      - 'Unit'
      - 'Integration'

    # Flaky test detection
    flakyTestDetection:
      enabled: true
      passRateThreshold: 0.95  # Test passes <95% of time = flaky
      historicalRunsToAnalyze: 30
      createWorkItemOnDetection: true

    # Test result publishing
    publishTestResults: true
    testResultsFormat: 'VSTest'
    mergeTestResults: true

  #═══════════════════════════════════════════════════════════════════════════
  # 4. Build Performance
  #═══════════════════════════════════════════════════════════════════════════
  buildPerformance:
    # CI stage duration limits
    maxBuildDuration: 600  # 10 minutes
    maxTestDuration: 300   # 5 minutes
    maxSecurityScanDuration: 180  # 3 minutes

    # Artifact size limits
    maxArtifactSize: 104857600  # 100 MB
    warnArtifactSize: 52428800  # 50 MB (warning threshold)

    # Docker image size limits
    maxDockerImageSize: 524288000  # 500 MB
    warnDockerImageSize: 262144000  # 250 MB

  #═══════════════════════════════════════════════════════════════════════════
  # 5. Code Quality
  #═══════════════════════════════════════════════════════════════════════════
  codeQuality:
    # Linting (StyleCop, EditorConfig)
    linting:
      enabled: true
      failOnWarnings: false  # Warnings don't fail build (errors do)
      warningsAsErrors: []  # Specific warnings to treat as errors

    # Deprecated package detection
    deprecatedPackages:
      enabled: true
      failOnDeprecated: true
      allowedDeprecatedPackages: []  # Exceptions (with justification)

    # Code duplication
    duplication:
      enabled: true
      maxDuplicationPercentage: 3  # 3% duplication allowed
      failOnExceeded: false  # Warning only

  #═══════════════════════════════════════════════════════════════════════════
  # 6. Compliance Artifacts
  #═══════════════════════════════════════════════════════════════════════════
  compliance:
    # SBOM (Software Bill of Materials)
    sbom:
      required: true
      format: 'CycloneDX'  # or 'SPDX'
      outputPath: '$(Build.ArtifactStagingDirectory)/sbom'
      publishAsArtifact: true

    # Security scan reports
    securityReports:
      required: true
      formats: ['SARIF', 'JSON', 'HTML']
      publishAsArtifact: true

    # Test results archive
    testResultsArchive:
      required: true
      retentionDays: 90
      publishAsArtifact: true
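The `buildPerformance` size limits above reduce to a simple three-way classification; a Python sketch (thresholds copied from the config, helper name is hypothetical):

```python
def artifact_gate(size_bytes: int,
                  warn_at: int = 52_428_800,     # 50 MB warning threshold
                  fail_at: int = 104_857_600):   # 100 MB hard limit
    """Classify an artifact against the buildPerformance limits:
    'ok' below the warning threshold, 'warn' between warn and fail,
    'fail' at or above the hard limit."""
    if size_bytes >= fail_at:
        return "fail"
    if size_bytes >= warn_at:
        return "warn"
    return "ok"

assert artifact_gate(10 * 1024 * 1024) == "ok"     # 10 MB
assert artifact_gate(60 * 1024 * 1024) == "warn"   # 60 MB
assert artifact_gate(150 * 1024 * 1024) == "fail"  # 150 MB
```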

Quality Gate Enforcement in Templates

Code Coverage Gate (test-microservice-steps.yaml):

- task: DotNetCoreCLI@2
  displayName: 'Run Tests with Coverage'
  inputs:
    command: 'test'
    arguments: >
      --configuration $(buildConfiguration)
      --collect:"XPlat Code Coverage"
      --settings $(runSettingsFileName)
      --logger trx
      --results-directory $(Agent.TempDirectory)/TestResults

- task: PublishCodeCoverageResults@2
  displayName: 'Publish Code Coverage'
  inputs:
    # v2 auto-detects the report format; codeCoverageTool is a v1-only input
    summaryFileLocation: '$(Agent.TempDirectory)/TestResults/**/coverage.cobertura.xml'
    failIfCoverageEmpty: true

# Enforce coverage threshold
- task: PowerShell@2
  displayName: 'Enforce Coverage Threshold'
  inputs:
    targetType: 'inline'
    script: |
      $coverageFile = Get-ChildItem -Path "$(Agent.TempDirectory)/TestResults" -Filter "coverage.cobertura.xml" -Recurse | Select-Object -First 1
      [xml]$coverage = Get-Content $coverageFile.FullName
      $lineCoverage = [double]$coverage.coverage.'line-rate' * 100
      $branchCoverage = [double]$coverage.coverage.'branch-rate' * 100

      Write-Host "Line Coverage: $lineCoverage%"
      Write-Host "Branch Coverage: $branchCoverage%"

      if ($lineCoverage -lt $(codeCoverageThreshold)) {
        Write-Error "❌ Line coverage ($lineCoverage%) below threshold ($(codeCoverageThreshold)%)"
        exit 1
      }

      Write-Host "✅ Coverage threshold met"

Security Gate (security-scan-microservice-steps.yaml):

- task: SonarCloudAnalyze@1
  displayName: 'Run SonarCloud Analysis'

- task: SonarCloudPublish@1
  displayName: 'Publish Quality Gate Result'
  inputs:
    pollingTimeoutSec: '300'

# Fail build if quality gate fails
- task: PowerShell@2
  displayName: 'Check Quality Gate'
  inputs:
    targetType: 'inline'
    script: |
      # report-task.txt holds the CE task URL; the gate status comes from the Sonar API
      # (assumes a SONAR_TOKEN secret variable is available to the pipeline)
      $props = Get-Content "$(Build.SourcesDirectory)/.sonarqube/out/.sonar/report-task.txt" -Raw | ConvertFrom-StringData
      $headers = @{ Authorization = "Bearer $(SONAR_TOKEN)" }
      $task = Invoke-RestMethod -Uri $props.ceTaskUrl -Headers $headers
      $gate = Invoke-RestMethod -Uri "$($props.serverUrl)/api/qualitygates/project_status?analysisId=$($task.task.analysisId)" -Headers $headers
      if ($gate.projectStatus.status -ne "OK") {
        Write-Error "❌ SonarCloud quality gate failed"
        exit 1
      }

Appendix D — Cross-Reference Map

This appendix provides a navigation guide to related documentation across the ConnectSoft.Audit.Documentation repository. Use this map to locate detailed information on specific topics.

Documentation Cross-References

| Topic | Primary Document | Section | Notes |
|---|---|---|---|
| CI/CD Strategy | azure-pipelines.md | All sections | This document (comprehensive pipeline architecture) |
| Development Roadmap | planning/index.md | Epic planning by bounded contexts, 30-cycle baseline | Azure DevOps integration, sprint planning |
| Environment Definitions | environments.md | Dev, Test, Staging, Production | Approval workflows, infrastructure specs |
| Quality Gates | quality-gates.md | Coverage, security, compliance | Detailed gate definitions and enforcement |
| GitOps & Config | gitops.md | App Configuration, drift detection | Configuration as code, secret management |
| Template Integration | template-integration.md | ConnectSoft microservice templates | Template usage, customization, versioning |
| Observability | observability.md | OpenTelemetry, metrics, logs | Pipeline metrics, DORA metrics, dashboards |
| Monitoring & Alerting | monitoring.md | Application Insights, alerts | Health checks, SLIs/SLOs, incident response |
| Runbook & Operations | runbook.md | Deployment, rollback, troubleshooting | Operational procedures, on-call playbooks |
| Security & Compliance | security-compliance.md | Security scanning, SBOM, audit | SOC 2, GDPR, HIPAA alignment |
| Infrastructure (Pulumi) | pulumi.md | IaC with C#, resource provisioning | Pulumi stack structure, deployment |
| Database Migrations | database-migrations.md | EF Core, FluentMigrator | Migration strategy, versioning |
| High-Level Design | hld.md | ATP architecture overview | System context, component diagrams |
| Low-Level Design | lld.md | Service details, APIs, data models | Implementation specifications |
| Data Architecture | data-architecture.md | Database schemas, event modeling | Entity relationships, partitioning |
| ADRs | decisions/ | Architecture Decision Records | Design rationale, trade-offs |
| API Specifications | swagger.yaml | OpenAPI/Swagger definitions | REST API contracts, versioning |
| Testing Strategy | strategy.md | Unit, integration, E2E, load | Test pyramid, coverage targets |
| Security Architecture | architecture.md | Threat model, controls | Attack surface, mitigation strategies |
| Compliance Mapping | controls-mapping.md | SOC 2, GDPR, HIPAA | Control evidence, attestation |

Topic-Specific Navigation

Pipeline Configuration:

Quality & Testing:

  • Coverage Thresholds: azure-pipelines.md → Appendix C, Quality Gates & Policies
  • Test Execution: strategy.md → Integration Testing with Containers
  • Flaky Test Detection: azure-pipelines.md → Continuous Improvement & Roadmap (Q4 2025)

Security & Compliance:

Infrastructure & Deployment:

  • Pulumi IaC: pulumi.md → Pulumi with C# Implementation
  • Environment Provisioning: environments.md → Infrastructure Specifications
  • GitOps Config: gitops.md → Azure App Configuration, Key Vault

Observability:

External References

Azure DevOps:

  • Organization: https://dev.azure.com/dmitrykhaymov
  • Pipelines: https://dev.azure.com/dmitrykhaymov/ATP/_build
  • Templates Repository: https://dev.azure.com/dmitrykhaymov/ConnectSoft/_git/ConnectSoft.AzurePipelines
  • Boards: https://dev.azure.com/dmitrykhaymov/ATP/_boards/board/t/ATP%20Team/Stories

Tools & Services:

  • SonarCloud: https://sonarcloud.io/organizations/connectsoft
  • Azure Container Registry: connectsoft.azurecr.io
  • Azure Artifacts Feed: https://pkgs.dev.azure.com/dmitrykhaymov/_packaging/ConnectSoft/nuget/v3/index.json
  • Seq (Dev): https://seq-dev.connectsoft.local
  • Application Insights: Azure Portal → ATP Resource Group

Standards & Frameworks:

  • DORA Metrics: https://dora.dev/research/
  • CycloneDX SBOM: https://cyclonedx.org/
  • OWASP Top 10: https://owasp.org/www-project-top-ten/
  • SOC 2 Controls: https://www.aicpa.org/soc2