Load Testing & Performance - Audit Trail Platform (ATP)¶
Performance validated, confidence delivered: ATP implements comprehensive load testing with k6 and NBomber to validate SLO compliance (P95 latency, throughput, error rates) and to support capacity planning, stress testing (finding breaking points), spike testing (sudden traffic surges), and soak testing (sustained load), ensuring reliability under both expected and extreme workloads.
📋 Documentation Generation Plan¶
This document will be generated in 16 cycles. Current progress:
| Cycle | Topics | Estimated Lines | Status |
|---|---|---|---|
| Cycle 1 | Load Testing Fundamentals (1-2) | ~4,000 | ⏳ Not Started |
| Cycle 2 | k6 Load Testing (3-4) | ~5,000 | ⏳ Not Started |
| Cycle 3 | NBomber Load Testing (5-6) | ~5,000 | ⏳ Not Started |
| Cycle 4 | ATP Load Testing Scenarios (7-8) | ~5,500 | ⏳ Not Started |
| Cycle 5 | SLO Validation Testing (9-10) | ~4,500 | ⏳ Not Started |
| Cycle 6 | Stress Testing (11-12) | ~4,500 | ⏳ Not Started |
| Cycle 7 | Spike Testing (13-14) | ~4,000 | ⏳ Not Started |
| Cycle 8 | Soak Testing (15-16) | ~4,000 | ⏳ Not Started |
| Cycle 9 | Capacity Planning & Scaling (17-18) | ~4,500 | ⏳ Not Started |
| Cycle 10 | Performance Benchmarking (19-20) | ~4,500 | ⏳ Not Started |
| Cycle 11 | Database Performance Testing (21-22) | ~4,000 | ⏳ Not Started |
| Cycle 12 | Messaging Performance Testing (23-24) | ~4,000 | ⏳ Not Started |
| Cycle 13 | Multi-Tenant Load Testing (25-26) | ~4,000 | ⏳ Not Started |
| Cycle 14 | CI/CD Integration (27-28) | ~4,000 | ⏳ Not Started |
| Cycle 15 | Performance Analysis & Optimization (29-30) | ~4,000 | ⏳ Not Started |
| Cycle 16 | Best Practices & Troubleshooting (31-32) | ~3,500 | ⏳ Not Started |
Total Estimated Lines: ~69,000
Purpose & Scope¶
This document provides the complete load testing and performance validation guide for ATP. It covers k6 (JavaScript-based load testing), NBomber (.NET load testing), load/stress/spike/soak testing, SLO validation, capacity planning, performance benchmarking, multi-tenant scenarios, and CI/CD integration, ensuring system reliability, scalability, and performance under expected and extreme workloads.
Why Load Testing for ATP?
- SLO Validation: Verify P95 latency <200ms (reads), <300ms (writes) under load
- Capacity Planning: Determine max throughput, concurrent users, resource limits
- Breaking Point Discovery: Find system limits before production incidents
- Bottleneck Identification: Identify slow components (database, cache, message bus)
- Regression Prevention: Detect performance degradation in CI/CD
- Resource Optimization: Right-size infrastructure (CPU, memory, scaling)
- Cost Validation: Ensure infrastructure costs align with performance targets
- User Experience: Validate response times meet user expectations
- Compliance: Validate audit integrity performance under load
- Multi-Tenant Fairness: Ensure tenant isolation under high load
Types of Load Testing
1. Load Testing (Normal Load)
- Expected production workload
- Validate SLO compliance
- Example: 1,000 req/min sustained
2. Stress Testing (Beyond Capacity)
- Push system beyond normal capacity
- Find breaking points
- Example: Gradually increase to 10,000 req/min until failure
3. Spike Testing (Sudden Surge)
- Sudden traffic increases
- Validate auto-scaling
- Example: 0 → 5,000 req/min in 10 seconds
4. Soak Testing (Extended Duration)
- Sustained load over hours/days
- Detect memory leaks, resource exhaustion
- Example: 500 req/min for 24 hours
5. Volume Testing (Data Size)
- Large payloads, high data volumes
- Validate database performance
- Example: 10M records, 100MB exports
6. Endurance Testing (Stability)
- Extended operation under load
- Detect gradual degradation
- Example: 48-hour continuous load
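As a sanity check when sizing the test types above, the total number of requests a linear ramp generates is simply the average of the start and end rates times the duration. A back-of-envelope sketch (the helper name is illustrative; the 1,000 → 10,000 req/min figures mirror the stress-testing example):

```javascript
// Estimate total requests generated by a linear ramp between two rates.
// Illustrative helper, not part of any ATP tooling: rates in req/min,
// duration in minutes; a linear ramp averages the start and end rates.
function rampRequestCount(startRpm, endRpm, minutes) {
  return ((startRpm + endRpm) / 2) * minutes;
}

// Stress-test example: ramp 1,000 -> 10,000 req/min over 30 minutes
console.log(rampRequestCount(1000, 10000, 30)); // 165000 requests
```

The same formula with equal start and end rates covers the steady-state cases (e.g. 500 req/min for 24 hours = 720,000 requests), which is useful for estimating test-data volume before a soak run.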
ATP Performance Targets (SLOs)
Gateway Service:
- P95 latency: <200ms (reads), <300ms (writes)
- Availability: 99.95%
- Throughput: 3,000 req/min (Enterprise edition)
Ingestion Service:
- P95 latency: <500ms (per event)
- Availability: 99.9%
- Throughput: 500 events/sec (Enterprise edition)
- Outbox relay lag: P95 <5s
Query Service:
- P95 latency: <500ms (typical queries)
- Availability: 99.9%
- Throughput: 1,500 req/min (Enterprise edition)
- Cache hit ratio: >80%
Projection Service:
- Projection lag: P95 <5s, P99 <10s
- Throughput: Process 10,000 events/sec
- DLQ rate: <0.5%
Export Service:
- Export completion: 95% within 15 minutes
- Throughput: 100GB/day (Enterprise edition)
- Concurrent jobs: 10 (Enterprise edition)
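The availability targets above translate directly into monthly downtime budgets: at 99.95%, the Gateway may be unavailable for only about 21.6 minutes in a 30-day month. A small sketch of that arithmetic (30-day month assumed):

```javascript
// Convert an availability SLO into a monthly downtime budget.
// Illustrative arithmetic only; a 30-day month is assumed.
function downtimeBudgetMinutes(sloPercent, daysInMonth = 30) {
  const totalMinutes = daysInMonth * 24 * 60; // 43,200 minutes in 30 days
  return totalMinutes * (1 - sloPercent / 100);
}

console.log(downtimeBudgetMinutes(99.95)); // Gateway: ~21.6 minutes/month
console.log(downtimeBudgetMinutes(99.9));  // Ingestion/Query: ~43.2 minutes/month
```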
Load Testing Tools
- k6: JavaScript-based, developer-friendly, CI/CD integration
- NBomber: .NET native, good for .NET API testing
- JMeter: GUI-based, complex scenarios
- Gatling: Scala-based, high-performance
- Artillery: Node.js-based, simple YAML configuration
- Locust: Python-based, distributed load generation
ATP Load Testing Stack
Load Generation:
- k6 (primary) - JavaScript test scripts
- NBomber (secondary) - C# test scenarios
Test Infrastructure:
- Azure Container Instances (load generators)
- Kubernetes jobs (distributed load)
- Azure Pipelines (CI/CD integration)
Monitoring:
- Prometheus (metrics collection)
- Grafana (real-time dashboards)
- Application Insights (APM)
Test Data:
- Synthetic audit events (test data generators)
- Production-like payloads (sanitized)
- Multi-tenant scenarios (10-1000 tenants)
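The synthetic test-data generators listed above can be as simple as a function that randomizes tenant and actor IDs. A minimal plain-JavaScript sketch — the field names mirror the ingestion payloads used later in this document, while the ID formats and default tenant count are illustrative assumptions:

```javascript
// Sketch of a synthetic audit-event generator for multi-tenant load tests.
// Field names mirror the ingestion payloads shown later in this document;
// the tenant count and ID formats are illustrative assumptions.
function generateAuditEvent(tenantCount = 10) {
  return {
    tenantId: `tenant-${Math.floor(Math.random() * tenantCount)}`,
    eventType: 'user.action',
    timestamp: new Date().toISOString(),
    actor: { type: 'user', id: `user-${Math.floor(Math.random() * 1000)}` },
    target: { type: 'resource', id: `resource-${Math.floor(Math.random() * 1000)}` },
    metadata: { action: 'view', ipAddress: '192.168.1.1' },
  };
}

const event = generateAuditEvent(100); // spread events across 100 tenants
console.log(event.tenantId, event.eventType);
```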
Detailed Cycle Plan¶
CYCLE 1: Load Testing Fundamentals (~4,000 lines)¶
Topic 1: Load Testing Philosophy¶
What will be covered: - What is Load Testing?
Definition:
Load testing evaluates system performance under expected
production workloads to validate SLO compliance and identify
performance bottlenecks.
Objectives:
1. Validate SLO targets (latency, throughput, error rates)
2. Identify performance bottlenecks
3. Determine maximum capacity
4. Validate auto-scaling behavior
5. Ensure resource efficiency
ATP Focus:
- Ingestion throughput (events/second)
- Query latency (P50, P95, P99)
- Projection lag (freshness)
- Multi-tenant fairness
- Resource utilization (CPU, memory, I/O)
Load Testing vs. Other Performance Tests
Load Testing (Normal Load):
- Expected production workload
- Validate SLO compliance
- Duration: 15-60 minutes
- Example: 1,000 req/min sustained

Stress Testing (Beyond Capacity):
- Gradually increase load until failure
- Find breaking points
- Duration: 30-120 minutes
- Example: Ramp from 1k → 10k req/min

Spike Testing (Sudden Surge):
- Instant traffic spike
- Validate auto-scaling, circuit breakers
- Duration: 5-15 minutes
- Example: 0 → 5,000 req/min in 10 seconds

Soak Testing (Extended Duration):
- Sustained load over hours/days
- Detect memory leaks, resource exhaustion
- Duration: 12-48 hours
- Example: 500 req/min for 24 hours

Volume Testing (Data Size):
- Large payloads, high data volumes
- Database performance validation
- Duration: 30-120 minutes
- Example: Ingest 10M records

Endurance Testing (Stability):
- Extended operation under load
- Detect gradual degradation
- Duration: 48+ hours
- Example: Continuous 48-hour load
Load Testing Metrics
Latency Metrics:
- P50 (median): Typical user experience
- P95 (95th percentile): Most users (SLO target)
- P99 (99th percentile): Worst-case for most users
- P99.9: Extreme outliers

Throughput Metrics:
- Requests per second (RPS)
- Requests per minute (RPM)
- Events per second (EPS)
- Bytes per second (BPS)

Error Metrics:
- Error rate: (errors) / (total requests)
- Error types: 4xx, 5xx, timeouts
- Error distribution over time

Resource Metrics:
- CPU utilization (%)
- Memory usage (GB)
- Network I/O (Mbps)
- Disk I/O (IOPS)
- Database connections (active/pooled)
- Cache hit ratio (%)

Business Metrics:
- Success rate: (successful requests) / (total requests)
- User satisfaction (latency-based)
- Cost per request
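The percentile metrics above (P50/P95/P99) can be computed from raw latency samples with the nearest-rank method. Load testing tools use their own estimators; this plain-JavaScript sketch only illustrates the idea:

```javascript
// Nearest-rank percentile over raw latency samples (illustrative only;
// k6 and NBomber apply their own percentile estimators internally).
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length); // 1-based nearest rank
  return sorted[Math.max(0, rank - 1)];
}

const latenciesMs = [120, 95, 180, 210, 160, 140, 490, 130, 110, 155];
console.log(percentile(latenciesMs, 50)); // median latency
console.log(percentile(latenciesMs, 95)); // SLO target percentile
```

Note how a single outlier (490ms here) dominates P95 for small samples — one reason load tests need enough requests before percentile-based SLO gates are meaningful.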
ATP Load Testing Principles
1. Test Realistic Scenarios
   - Production-like data volumes
   - Real tenant distribution
   - Actual payload sizes
   - Realistic user behavior
2. Measure User Experience
   - Focus on latency (P95, P99)
   - Track error rates
   - Validate SLO compliance
   - NOT just infrastructure metrics
3. Incremental Load Ramping
   - Start low, gradually increase
   - Identify degradation points
   - Avoid overwhelming the system immediately
4. Test Under Production-Like Conditions
   - Similar infrastructure (CPU, memory)
   - Production data volumes
   - Network latency simulation
   - Shared resource contention
5. Automate in CI/CD
   - Run on every major release
   - Baseline comparisons
   - Regression detection
   - Performance gates
6. Continuous Monitoring
   - Real-time metrics during tests
   - Application Insights, Prometheus
   - Alert on degradation
   - Capture detailed traces
7. Multi-Tenant Validation
   - Test tenant isolation
   - Validate quota enforcement
   - Ensure fairness (noisy neighbor prevention)
Code Examples: - Load testing concepts - Metric definitions - Test type comparisons
Diagrams: - Load testing taxonomy - Metric visualization - Test execution timeline
Deliverables: - Load testing fundamentals guide - Metric reference - ATP principles document
Topic 2: Load Testing Strategy¶
What will be covered: - ATP Load Testing Strategy
Test Levels:
1. Component Load Tests
- Single service under load
- Example: Ingestion API only
- Fast feedback (5-15 minutes)
- CI/CD integration
2. Service Integration Load Tests
- Multiple services together
- Example: Gateway → Ingestion → Query
- Validates end-to-end performance
- Weekly/monthly execution
3. System Load Tests
- Full ATP system
- Production-like environment
- Multi-tenant scenarios
- Pre-production validation
4. Capacity Planning Tests
- Find maximum capacity
- Resource utilization analysis
- Scaling threshold validation
- Quarterly execution
Load Test Execution Frequency
Continuous (CI/CD):
- Component load tests on every PR
- Fast feedback (5-10 minutes)
- SLO validation gates

Daily:
- Integration load tests (staging)
- Baseline comparisons
- Regression detection

Weekly:
- Service integration load tests
- Multi-tenant scenarios
- Soak test (12 hours)

Monthly:
- Full system load tests
- Capacity planning analysis
- Stress tests

Quarterly:
- Extended soak tests (48 hours)
- Volume tests (10M+ records)
- Capacity review and optimization
Load Test Environments
Code Examples: - Strategy documentation - Execution schedules
Diagrams: - Test execution timeline - Environment progression
Deliverables: - Load testing strategy document - Execution calendar - Environment guidelines
CYCLE 2: k6 Load Testing (~5,000 lines)¶
Topic 3: k6 Fundamentals¶
What will be covered: - What is k6?
k6:
- Modern load testing tool
- JavaScript/ES6 scripting
- Developer-friendly
- Excellent CI/CD integration
- Open source (Grafana Labs)
Features:
- HTTP/1.1, HTTP/2, WebSocket
- Threshold-based pass/fail
- Real-time metrics
- Cloud execution (k6 Cloud)
- Extensible with JavaScript modules
Why k6 for ATP:
- Version control friendly (JavaScript files)
- CI/CD integration (Azure Pipelines)
- Threshold-based SLO validation
- Good for API testing
- Active community and documentation
k6 Installation & Setup
```bash
# Install k6

# Windows (Chocolatey)
choco install k6

# macOS (Homebrew)
brew install k6

# Linux (Debian/Ubuntu)
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys C5AD17C747E3415A3642D57D77C6C491D6AC1D69
echo "deb https://dl.k6.io/deb stable main" | sudo tee /etc/apt/sources.list.d/k6.list
sudo apt-get update
sudo apt-get install k6

# Docker
docker pull grafana/k6:latest

# Verify installation
k6 version
```
Basic k6 Script Structure
```javascript
// ingestion-load-test.js
import http from 'k6/http';
import { check, sleep } from 'k6';
import { Rate, Trend, Counter } from 'k6/metrics';

// Custom metrics
const ingestionSuccessRate = new Rate('ingestion_success');
const ingestionLatency = new Trend('ingestion_latency_ms');
const eventsIngested = new Counter('events_ingested_total');

// Test configuration
export const options = {
  stages: [
    { duration: '1m', target: 100 }, // Ramp-up to 100 VUs
    { duration: '3m', target: 100 }, // Stay at 100 VUs
    { duration: '1m', target: 200 }, // Ramp-up to 200 VUs
    { duration: '3m', target: 200 }, // Stay at 200 VUs
    { duration: '1m', target: 0 },   // Ramp-down
  ],
  thresholds: {
    // SLO thresholds
    'http_req_duration': ['p(95)<500'], // P95 latency <500ms
    'http_req_failed': ['rate<0.01'],   // Error rate <1%
    'ingestion_success': ['rate>0.99'], // Success rate >99%
  },
};

// Test data (generate audit events)
function generateAuditEvent(tenantId) {
  return JSON.stringify({
    tenantId: tenantId,
    eventType: 'user.action',
    timestamp: new Date().toISOString(),
    actor: {
      type: 'user',
      id: `user-${Math.floor(Math.random() * 1000)}`,
    },
    target: {
      type: 'resource',
      id: `resource-${Math.floor(Math.random() * 1000)}`,
    },
    metadata: {
      action: 'view',
      ipAddress: '192.168.1.1',
    },
  });
}

// Main test function (executed by each VU); `data` comes from setup()
export default function (data) {
  const tenantId = `tenant-${Math.floor(Math.random() * 10)}`;
  const url = 'https://gateway.atp.test/api/v1/audit/ingest';
  const payload = generateAuditEvent(tenantId);

  const params = {
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${data.token}`, // token acquired in setup()
      'X-Tenant-Id': tenantId,
    },
    tags: {
      endpoint: 'ingest',
      tenant: tenantId,
    },
  };

  const startTime = Date.now();
  const response = http.post(url, payload, params);
  const duration = Date.now() - startTime;

  // Record metrics
  ingestionLatency.add(duration);
  ingestionSuccessRate.add(response.status === 201);
  eventsIngested.add(1);

  // Validate response
  const success = check(response, {
    'status is 201': (r) => r.status === 201,
    'response time < 500ms': (r) => r.timings.duration < 500,
    'response has auditRecordId': (r) => {
      try {
        const body = JSON.parse(r.body);
        return body.auditRecordId != null;
      } catch {
        return false;
      }
    },
  });

  if (!success) {
    console.error(`Request failed: ${response.status} - ${response.body}`);
  }

  sleep(1); // Think time (1 second between requests)
}

// Setup function (runs once before test)
export function setup() {
  // Initialize test data, get auth token, etc.
  const authUrl = 'https://gateway.atp.test/api/v1/auth/token';
  const authResponse = http.post(authUrl, JSON.stringify({
    clientId: __ENV.CLIENT_ID,
    clientSecret: __ENV.CLIENT_SECRET,
  }), {
    headers: { 'Content-Type': 'application/json' },
  });

  if (authResponse.status !== 200) {
    throw new Error('Failed to get auth token');
  }

  const token = JSON.parse(authResponse.body).accessToken;
  return { token: token };
}

// Teardown function (runs once after test)
export function teardown(data) {
  // Cleanup test data, close connections, etc.
  console.log('Test completed');
}
```
Running k6 Tests
```bash
# Basic execution
k6 run ingestion-load-test.js

# With environment variables
k6 run --env TEST_TOKEN=abc123 --env CLIENT_ID=client1 ingestion-load-test.js

# With custom options override
k6 run --vus 50 --duration 5m ingestion-load-test.js

# Output to file
k6 run --out json=results.json ingestion-load-test.js

# Cloud execution (k6 Cloud)
k6 cloud ingestion-load-test.js

# Docker execution
docker run -i grafana/k6 run - < ingestion-load-test.js
```
Code Examples: - Complete k6 test scripts (20+ scenarios) - Setup/teardown patterns - Custom metrics
Diagrams: - k6 architecture - Test execution flow
Deliverables: - k6 fundamentals guide - Test script templates - Setup instructions
Topic 4: k6 Advanced Features¶
What will be covered: - k6 Scenarios (Advanced Load Patterns)
```javascript
export const options = {
  scenarios: {
    // Scenario 1: Constant load
    constant_load: {
      executor: 'constant-vus',
      vus: 100,
      duration: '10m',
      tags: { scenario: 'constant' },
    },
    // Scenario 2: Ramping load
    ramping_load: {
      executor: 'ramping-vus',
      startVUs: 0,
      stages: [
        { duration: '2m', target: 100 },
        { duration: '5m', target: 100 },
        { duration: '2m', target: 200 },
        { duration: '5m', target: 200 },
        { duration: '2m', target: 0 },
      ],
      tags: { scenario: 'ramping' },
    },
    // Scenario 3: Shared iterations
    shared_iterations: {
      executor: 'shared-iterations',
      vus: 50,
      iterations: 1000,
      maxDuration: '30m',
      tags: { scenario: 'iterations' },
    },
    // Scenario 4: Per-VU iterations
    per_vu_iterations: {
      executor: 'per-vu-iterations',
      vus: 10,
      iterations: 100,
      maxDuration: '30m',
      tags: { scenario: 'per-vu' },
    },
    // Scenario 5: Constant arrival rate
    constant_arrival_rate: {
      executor: 'constant-arrival-rate',
      rate: 100, // 100 requests per second
      timeUnit: '1s',
      duration: '10m',
      preAllocatedVUs: 50,
      maxVUs: 200,
      tags: { scenario: 'arrival-rate' },
    },
    // Scenario 6: Ramping arrival rate
    ramping_arrival_rate: {
      executor: 'ramping-arrival-rate',
      startRate: 10,
      timeUnit: '1s',
      preAllocatedVUs: 10,
      maxVUs: 100,
      stages: [
        { duration: '2m', target: 50 },
        { duration: '5m', target: 50 },
        { duration: '2m', target: 100 },
        { duration: '5m', target: 100 },
        { duration: '2m', target: 0 },
      ],
      tags: { scenario: 'ramping-arrival' },
    },
  },
  thresholds: {
    'http_req_duration{scenario:constant}': ['p(95)<500'],
    'http_req_duration{scenario:ramping}': ['p(95)<500'],
  },
};
```
k6 HTTP Client Configuration
```javascript
import http from 'k6/http';

// Global HTTP-related options
export const options = {
  maxRedirects: 5, // Follow at most 5 redirects
  // Parallel connection limits used by http.batch()
  batch: 15,
  batchPerHost: 5,
};

// Per-request configuration
const params = {
  headers: {
    'Authorization': 'Bearer token',
    'X-Request-ID': `req-${Math.random()}`,
  },
  timeout: '10s', // Per-request timeout
  tags: {
    name: 'ingestion',
    tenant: 'tenant-1',
  },
};

const response = http.post(url, payload, params);
```
k6 Thresholds (SLO Validation)
```javascript
export const options = {
  thresholds: {
    // HTTP metrics
    'http_req_duration': [
      'p(50)<200',  // P50 <200ms
      'p(95)<500',  // P95 <500ms (SLO)
      'p(99)<1000', // P99 <1s
    ],
    'http_req_failed': ['rate<0.01'],  // Error rate <1%
    'http_req_waiting': ['p(95)<400'], // Waiting time

    // Custom metrics
    'ingestion_success': ['rate>0.99'],       // Success rate >99%
    'ingestion_latency_ms': ['p(95)<500'],    // P95 <500ms
    'events_ingested_total': ['count>10000'], // At least 10k events

    // Per-scenario thresholds
    'http_req_duration{scenario:spike}': ['p(95)<1000'], // Spike tolerance

    // Grouped thresholds
    'group_duration{group:::ingestion}': ['avg<300'],

    // Data transfer thresholds
    'data_sent': ['count>1000000'],     // At least 1MB sent
    'data_received': ['count>5000000'], // At least 5MB received
  },
};
```
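Threshold expressions such as `'p(95)<500'` are simple comparisons against an aggregated metric value. The sketch below (an illustrative helper, not part of k6) shows the pass/fail semantics of the comparison part; k6 itself computes the aggregated value:

```javascript
// Minimal evaluator for k6-style threshold expressions such as 'p(95)<500'
// or 'rate<0.01'. Illustrative only: parses just the trailing comparison;
// the aggregated metric value is supplied by the caller.
function evaluateThreshold(expression, value) {
  const match = expression.match(/(<=|>=|<|>)\s*([\d.]+)\s*$/);
  if (!match) throw new Error(`Unsupported expression: ${expression}`);
  const [, op, rawLimit] = match;
  const limit = parseFloat(rawLimit);
  switch (op) {
    case '<': return value < limit;
    case '<=': return value <= limit;
    case '>': return value > limit;
    case '>=': return value >= limit;
  }
}

console.log(evaluateThreshold('p(95)<500', 432));  // true  -> SLO met
console.log(evaluateThreshold('rate<0.01', 0.02)); // false -> error budget blown
```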
k6 Output Formats
Code Examples: - Advanced k6 scenarios - Threshold configurations - Output integrations
Deliverables: - k6 advanced guide - Scenario library - Output configuration
CYCLE 3: NBomber Load Testing (~5,000 lines)¶
Topic 5: NBomber Fundamentals¶
What will be covered: - What is NBomber?
NBomber:
- .NET native load testing framework
- C# test scenarios
- Good for .NET API testing
- Integrates with .NET ecosystem
- Open source
Features:
- HTTP/1.1, HTTP/2
- gRPC support
- Custom protocols
- Real-time reporting
- CI/CD integration
Why NBomber for ATP:
- Native .NET integration
- Type-safe (C#)
- Good for ATP .NET services
- Easy integration with existing test projects
NBomber Installation
```xml
<!-- ATP.Ingestion.LoadTests.csproj -->
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <TargetFramework>net8.0</TargetFramework>
    <IsPackable>false</IsPackable>
  </PropertyGroup>
  <ItemGroup>
    <PackageReference Include="NBomber" Version="5.2.0" />
    <PackageReference Include="NBomber.Http" Version="5.2.0" />
    <PackageReference Include="Microsoft.NET.Test.Sdk" Version="17.8.0" />
    <PackageReference Include="MSTest.TestAdapter" Version="3.1.1" />
    <PackageReference Include="MSTest.TestFramework" Version="3.1.1" />
  </ItemGroup>
</Project>
```
Basic NBomber Test
```csharp
using Microsoft.VisualStudio.TestTools.UnitTesting;
using NBomber.CSharp;
using NBomber.Http.CSharp;
using NBomber.Plugins.Network.Ping;

namespace ATP.Ingestion.LoadTests;

[TestClass]
public class IngestionLoadTests
{
    [TestMethod]
    public void LoadTest_Ingestion_Should_MeetSLO()
    {
        // Define scenario
        var scenario = Scenario.Create("ingestion_load", async context =>
        {
            // Generate test data
            var tenantId = $"tenant-{Random.Shared.Next(1, 10)}";
            var auditEvent = new
            {
                tenantId = tenantId,
                eventType = "user.action",
                timestamp = DateTime.UtcNow,
                actor = new { type = "user", id = $"user-{Random.Shared.Next(1, 1000)}" },
                target = new { type = "resource", id = $"resource-{Random.Shared.Next(1, 1000)}" },
                metadata = new { action = "view", ipAddress = "192.168.1.1" }
            };

            // Create HTTP request
            var request = Http.CreateRequest("POST", "https://gateway.atp.test/api/v1/audit/ingest")
                .WithHeader("Content-Type", "application/json")
                .WithHeader("Authorization", $"Bearer {Environment.GetEnvironmentVariable("TEST_TOKEN")}")
                .WithHeader("X-Tenant-Id", tenantId)
                .WithJsonBody(auditEvent);

            // Execute request
            var response = await Http.Send(request, context);

            // Return response for validation
            return response.IsSuccessStatusCode
                ? Response.Ok(statusCode: (int)response.StatusCode)
                : Response.Fail(statusCode: (int)response.StatusCode, error: response.StatusCode.ToString());
        })
        .WithWarmUpDuration(TimeSpan.FromSeconds(30)) // Warm-up
        .WithLoadSimulations(
            // Ramp-up pattern
            Simulation.InjectPerSec(rate: 10, during: TimeSpan.FromMinutes(1)),  // 10 req/sec for 1 min
            Simulation.InjectPerSec(rate: 50, during: TimeSpan.FromMinutes(3)),  // 50 req/sec for 3 min
            Simulation.InjectPerSec(rate: 100, during: TimeSpan.FromMinutes(5)), // 100 req/sec for 5 min
            Simulation.InjectPerSec(rate: 0, during: TimeSpan.FromSeconds(10))   // Ramp-down
        );

        // Configure NBomber
        var stats = NBomberRunner
            .RegisterScenarios(scenario)
            .WithWorkerPlugins(new PingPlugin())
            .Run();

        // Assert SLO thresholds
        var scenarioStats = stats.ScenarioStats[0];

        // P95 latency <500ms (SLO)
        Assert.IsTrue(
            scenarioStats.Ok.Latency.Percent95 < TimeSpan.FromMilliseconds(500),
            $"P95 latency {scenarioStats.Ok.Latency.Percent95.TotalMilliseconds}ms exceeds SLO 500ms");

        // Error rate <1%
        var errorRate = (scenarioStats.Fail.Request.Count * 100.0) / scenarioStats.AllRequestCount;
        Assert.IsTrue(
            errorRate < 1.0,
            $"Error rate {errorRate:F2}% exceeds SLO 1%");

        // Success rate >99%
        var successRate = (scenarioStats.Ok.Request.Count * 100.0) / scenarioStats.AllRequestCount;
        Assert.IsTrue(
            successRate > 99.0,
            $"Success rate {successRate:F2}% below SLO 99%");

        // Throughput validation
        var throughput = scenarioStats.Ok.Request.RPS;
        Assert.IsTrue(
            throughput > 80, // At least 80 req/sec achieved
            $"Throughput {throughput:F2} req/sec below target 80 req/sec");

        // Output summary
        Console.WriteLine($"Test completed:");
        Console.WriteLine($"  Total requests: {scenarioStats.AllRequestCount}");
        Console.WriteLine($"  Successful: {scenarioStats.Ok.Request.Count}");
        Console.WriteLine($"  Failed: {scenarioStats.Fail.Request.Count}");
        Console.WriteLine($"  P50 latency: {scenarioStats.Ok.Latency.Percent50.TotalMilliseconds}ms");
        Console.WriteLine($"  P95 latency: {scenarioStats.Ok.Latency.Percent95.TotalMilliseconds}ms");
        Console.WriteLine($"  P99 latency: {scenarioStats.Ok.Latency.Percent99.TotalMilliseconds}ms");
        Console.WriteLine($"  Throughput: {throughput:F2} req/sec");
    }
}
```
Code Examples: - Complete NBomber test suites - SLO validation patterns - Assertion helpers
Deliverables: - NBomber fundamentals guide - Test templates - SLO validation library
Topic 6: NBomber Advanced Features¶
What will be covered: - NBomber Scenarios & Simulations
```csharp
// Constant load
Simulation.InjectConstant(copies: 100, during: TimeSpan.FromMinutes(10))

// Ramp-up
Simulation.RampConstant(copies: 100, during: TimeSpan.FromMinutes(5))

// Injection rate
Simulation.InjectPerSec(rate: 50, during: TimeSpan.FromMinutes(10))

// Ramping injection rate
Simulation.RampPerSec(
    minRate: 10,
    maxRate: 100,
    during: TimeSpan.FromMinutes(5))

// Keep constant
Simulation.KeepConstant(copies: 50, during: TimeSpan.FromMinutes(10))
```
- Custom Metrics & Reporting
- gRPC Testing
Code Examples: - Advanced NBomber scenarios - Custom metrics - gRPC tests
Deliverables: - NBomber advanced guide
CYCLE 4: ATP Load Testing Scenarios (~5,500 lines)¶
Topic 7: Ingestion Load Testing¶
What will be covered: - Ingestion Load Test Scenarios
```javascript
// ingestion-scenarios.js
import http from 'k6/http';
import { check } from 'k6';
// generateAuditEvent() is the shared test-data helper (see Topic 3)

export const scenarios = {
  // Scenario 1: Normal ingestion load
  normal_load: {
    executor: 'ramping-arrival-rate',
    startRate: 10,
    timeUnit: '1s',
    preAllocatedVUs: 10,
    maxVUs: 200,
    stages: [
      { duration: '2m', target: 50 },  // 50 events/sec
      { duration: '5m', target: 50 },  // Sustain
      { duration: '2m', target: 100 }, // 100 events/sec
      { duration: '5m', target: 100 }, // Sustain
      { duration: '2m', target: 0 },   // Ramp-down
    ],
    tags: { scenario: 'normal' },
  },
  // Scenario 2: High-volume ingestion
  high_volume: {
    executor: 'constant-arrival-rate',
    rate: 500, // 500 events/sec (Enterprise edition target)
    timeUnit: '1s',
    duration: '10m',
    preAllocatedVUs: 100,
    maxVUs: 1000,
    tags: { scenario: 'high-volume' },
  },
  // Scenario 3: Batch ingestion
  batch_ingestion: {
    executor: 'constant-vus',
    vus: 50,
    duration: '10m',
    tags: { scenario: 'batch' },
  },
};

export default function () {
  // Generate batch of events
  const batchSize = __ENV.SCENARIO === 'batch' ? 100 : 1;
  const events = [];
  for (let i = 0; i < batchSize; i++) {
    events.push(generateAuditEvent());
  }

  const url = 'https://gateway.atp.test/api/v1/audit/ingest/batch';
  const response = http.post(url, JSON.stringify(events), {
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${__ENV.TEST_TOKEN}`,
    },
  });

  check(response, {
    'batch status is 200': (r) => r.status === 200,
    'batch processed all events': (r) => {
      const body = JSON.parse(r.body);
      return body.processed === batchSize;
    },
  });
}
```
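For the batch scenario, test data usually has to be split into fixed-size batches before posting. A small plain-JavaScript sketch (the 100-event batch size mirrors the scenario above; the helper itself is illustrative):

```javascript
// Split generated events into fixed-size batches for the batch endpoint.
// Illustrative helper; the batch size of 100 mirrors the scenario above.
function chunkEvents(events, batchSize = 100) {
  const batches = [];
  for (let i = 0; i < events.length; i += batchSize) {
    batches.push(events.slice(i, i + batchSize));
  }
  return batches;
}

const events = Array.from({ length: 250 }, (_, i) => ({ id: i }));
const batches = chunkEvents(events, 100);
console.log(batches.length);    // 3 batches
console.log(batches[2].length); // last batch holds the remaining 50 events
```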
Code Examples: - Complete ingestion scenarios (k6 + NBomber) - Batch ingestion tests - High-volume tests
Deliverables: - Ingestion load test suite
Topic 8: Query Load Testing¶
What will be covered: - Query Load Test Scenarios - Single query load - Complex query load - Search queries - Pagination scenarios - Multi-tenant queries
Code Examples: - Query load test scenarios
Deliverables: - Query load test suite
CYCLE 5: SLO Validation Testing (~4,500 lines)¶
Topic 9: SLO Validation Framework¶
What will be covered: - SLO Validation in Load Tests
```javascript
// slo-validation.js
export const options = {
  thresholds: {
    // Gateway SLOs
    'http_req_duration{endpoint:gateway}': [
      'p(95)<200', // P95 <200ms (read)
      'p(95)<300', // P95 <300ms (write)
    ],
    // Ingestion SLOs
    'http_req_duration{endpoint:ingest}': ['p(95)<500'], // P95 <500ms
    'ingestion_success': ['rate>0.999'], // >99.9%
    // Query SLOs
    'http_req_duration{endpoint:query}': ['p(95)<500'], // P95 <500ms
    'query_success': ['rate>0.999'], // >99.9%
    // Projection SLOs (custom metric)
    'projection_lag_seconds': ['p(95)<5', 'p(99)<10'],
  },
};
```
- Automated SLO Validation Gates
Code Examples: - SLO validation framework - Threshold configurations
Deliverables: - SLO validation guide
Topic 10: Performance Regression Testing¶
What will be covered: - Baseline Comparisons - Regression Detection - Performance Gates
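Baseline comparison can be as simple as flagging any metric that degrades beyond a tolerance relative to the stored baseline. A sketch of that check, assuming "higher is worse" metrics such as latency and error rate (metric names and the 10% tolerance are illustrative):

```javascript
// Flag metrics that regressed more than `tolerance` (a fraction) versus
// a stored baseline. Assumes higher values are worse (latency, error rate).
// Illustrative sketch; metric names and the 10% default are assumptions.
function detectRegressions(baseline, current, tolerance = 0.10) {
  const regressions = [];
  for (const [metric, baseValue] of Object.entries(baseline)) {
    const currentValue = current[metric];
    if (currentValue === undefined) continue; // metric not measured this run
    if (currentValue > baseValue * (1 + tolerance)) {
      regressions.push({ metric, baseValue, currentValue });
    }
  }
  return regressions;
}

const baseline = { p95_latency_ms: 420, error_rate: 0.004 };
const current = { p95_latency_ms: 510, error_rate: 0.003 };
console.log(detectRegressions(baseline, current));
// p95_latency_ms regressed: 510 > 420 * 1.10 (= 462); error_rate improved
```

A CI gate would fail the build when the returned list is non-empty, then update the baseline only on deliberate, reviewed changes.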
Deliverables: - Regression testing guide
CYCLE 6-16: Remaining Cycles¶
Coverage includes: - Stress Testing (breaking points) - Spike Testing (sudden surges) - Soak Testing (extended duration) - Capacity Planning & Scaling - Performance Benchmarking - Database Performance Testing - Messaging Performance Testing - Multi-Tenant Load Testing - CI/CD Integration - Performance Analysis & Optimization - Best Practices & Troubleshooting
Summary of Deliverables¶
Complete load testing implementation covering:
- Fundamentals: Philosophy, strategy, test types
- k6: JavaScript-based load testing (primary tool)
- NBomber: .NET native load testing (secondary tool)
- ATP Scenarios: Ingestion, query, export, projection
- SLO Validation: Automated SLO compliance testing
- Stress Testing: Breaking point discovery
- Spike Testing: Auto-scaling validation
- Soak Testing: Memory leak detection
- Capacity Planning: Maximum capacity analysis
- Benchmarking: Performance baselines
- Database Testing: Query performance, indexing
- Messaging Testing: Throughput, latency
- Multi-Tenant: Tenant isolation validation
- CI/CD: Automated performance gates
- Analysis: Bottleneck identification
- Operations: Best practices, troubleshooting
Related Documentation¶
- Testing Strategy: Overall testing approach
- Integration Testing: Real dependency testing
- Alerts & SLOs: SLO definitions and targets
- Limits & Quotas: Rate limits and quotas
- Monitoring: Performance monitoring
- CI/CD Pipeline: Load test integration
This load testing guide provides a complete implementation for ATP's performance validation: k6 and NBomber test scripts; load, stress, spike, and soak testing scenarios; SLO validation frameworks; capacity planning procedures; multi-tenant load testing; CI/CD integration; and performance analysis and optimization. Together, these ensure system reliability, scalability, and performance under expected and extreme workloads, with automated SLO compliance validation and regression detection.