
Load Testing & Performance - Audit Trail Platform (ATP)

Performance validated, confidence delivered — ATP implements comprehensive load testing with k6 and NBomber to validate SLO compliance (P95 latency, throughput, error rates) and to support capacity planning, stress testing (breaking points), spike testing (sudden traffic surges), and soak testing (sustained load), ensuring reliability under both expected and extreme workloads.


📋 Documentation Generation Plan

This document will be generated in 16 cycles. Current progress:

Cycle Topics Estimated Lines Status
Cycle 1 Load Testing Fundamentals (1-2) ~4,000 ⏳ Not Started
Cycle 2 k6 Load Testing (3-4) ~5,000 ⏳ Not Started
Cycle 3 NBomber Load Testing (5-6) ~5,000 ⏳ Not Started
Cycle 4 ATP Load Testing Scenarios (7-8) ~5,500 ⏳ Not Started
Cycle 5 SLO Validation Testing (9-10) ~4,500 ⏳ Not Started
Cycle 6 Stress Testing (11-12) ~4,500 ⏳ Not Started
Cycle 7 Spike Testing (13-14) ~4,000 ⏳ Not Started
Cycle 8 Soak Testing (15-16) ~4,000 ⏳ Not Started
Cycle 9 Capacity Planning & Scaling (17-18) ~4,500 ⏳ Not Started
Cycle 10 Performance Benchmarking (19-20) ~4,500 ⏳ Not Started
Cycle 11 Database Performance Testing (21-22) ~4,000 ⏳ Not Started
Cycle 12 Messaging Performance Testing (23-24) ~4,000 ⏳ Not Started
Cycle 13 Multi-Tenant Load Testing (25-26) ~4,000 ⏳ Not Started
Cycle 14 CI/CD Integration (27-28) ~4,000 ⏳ Not Started
Cycle 15 Performance Analysis & Optimization (29-30) ~4,000 ⏳ Not Started
Cycle 16 Best Practices & Troubleshooting (31-32) ~3,500 ⏳ Not Started

Total Estimated Lines: ~69,000


Purpose & Scope

This document provides the complete load testing and performance validation guide for ATP. It covers k6 (JavaScript-based load testing), NBomber (.NET load testing), load/stress/spike/soak testing, SLO validation, capacity planning, performance benchmarking, multi-tenant scenarios, and CI/CD integration, ensuring system reliability, scalability, and performance under expected and extreme workloads.

Why Load Testing for ATP?

  1. SLO Validation: Verify P95 latency <200ms (reads), <300ms (writes) under load
  2. Capacity Planning: Determine max throughput, concurrent users, resource limits
  3. Breaking Point Discovery: Find system limits before production incidents
  4. Bottleneck Identification: Identify slow components (database, cache, message bus)
  5. Regression Prevention: Detect performance degradation in CI/CD
  6. Resource Optimization: Right-size infrastructure (CPU, memory, scaling)
  7. Cost Validation: Ensure infrastructure costs align with performance targets
  8. User Experience: Validate response times meet user expectations
  9. Compliance: Validate audit integrity performance under load
  10. Multi-Tenant Fairness: Ensure tenant isolation under high load

Types of Load Testing

1. Load Testing (Normal Load)
   - Expected production workload
   - Validate SLO compliance
   - Example: 1,000 req/min sustained

2. Stress Testing (Beyond Capacity)
   - Push system beyond normal capacity
   - Find breaking points
   - Example: Gradually increase to 10,000 req/min until failure

3. Spike Testing (Sudden Surge)
   - Sudden traffic increases
   - Validate auto-scaling
   - Example: 0 → 5,000 req/min in 10 seconds

4. Soak Testing (Extended Duration)
   - Sustained load over hours/days
   - Detect memory leaks, resource exhaustion
   - Example: 500 req/min for 24 hours

5. Volume Testing (Data Size)
   - Large payloads, high data volumes
   - Validate database performance
   - Example: 10M records, 100MB exports

6. Endurance Testing (Stability)
   - Extended operation under load
   - Detect gradual degradation
   - Example: 48-hour continuous load
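
The example workloads above can be expressed as k6-style `stages` arrays. This is a minimal sketch: `rampTo` is our own helper (not a k6 API), and the targets reuse the example figures from the list, treated here as requests per minute.

```javascript
// Build a ramp-then-hold stage list for a k6 `stages` option.
function rampTo(target, rampMin, holdMin) {
  return [
    { duration: `${rampMin}m`, target },  // ramp up
    { duration: `${holdMin}m`, target },  // hold at target
  ];
}

const shapes = {
  load: rampTo(1000, 2, 30),                   // 1,000 req/min sustained
  spike: [{ duration: '10s', target: 5000 }],  // 0 -> 5,000 req/min in 10s
  soak: rampTo(500, 5, 24 * 60),               // 500 req/min for 24 hours
};

console.log(shapes.soak[1].duration); // '1440m' (24-hour hold)
```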

ATP Performance Targets (SLOs)

Gateway Service:
- P95 latency: <200ms (reads), <300ms (writes)
- Availability: 99.95%
- Throughput: 3,000 req/min (Enterprise edition)

Ingestion Service:
- P95 latency: <500ms (per event)
- Availability: 99.9%
- Throughput: 500 events/sec (Enterprise edition)
- Outbox relay lag: P95 <5s

Query Service:
- P95 latency: <500ms (typical queries)
- Availability: 99.9%
- Throughput: 1,500 req/min (Enterprise edition)
- Cache hit ratio: >80%

Projection Service:
- Projection lag: P95 <5s, P99 <10s
- Throughput: Process 10,000 events/sec
- DLQ rate: <0.5%

Export Service:
- Export completion: 95% within 15 minutes
- Throughput: 100GB/day (Enterprise edition)
- Concurrent jobs: 10 (Enterprise edition)
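
The SLO targets above can be encoded as data so a test harness can assert against them programmatically. A minimal sketch: the object layout and the `checkP95` helper are our own, not part of ATP; the numbers mirror the targets listed.

```javascript
// ATP SLO targets as data (values taken from the tables above).
const slos = {
  gateway:    { p95ReadMs: 200, p95WriteMs: 300, availability: 0.9995 },
  ingestion:  { p95Ms: 500, availability: 0.999, outboxLagP95s: 5 },
  query:      { p95Ms: 500, availability: 0.999, cacheHitRatio: 0.80 },
  projection: { lagP95s: 5, lagP99s: 10, dlqRate: 0.005 },
};

// A latency SLO passes when the measured percentile is under the target.
function checkP95(measuredMs, targetMs) {
  return measuredMs < targetMs;
}

console.log(checkP95(420, slos.ingestion.p95Ms)); // true: within the 500ms target
```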

Load Testing Tools

  • k6: JavaScript-based, developer-friendly, CI/CD integration
  • NBomber: .NET native, good for .NET API testing
  • JMeter: GUI-based, complex scenarios
  • Gatling: Scala-based, high-performance
  • Artillery: Node.js-based, simple YAML configuration
  • Locust: Python-based, distributed load generation

ATP Load Testing Stack

Load Generation:
- k6 (primary) - JavaScript test scripts
- NBomber (secondary) - C# test scenarios

Test Infrastructure:
- Azure Container Instances (load generators)
- Kubernetes jobs (distributed load)
- Azure Pipelines (CI/CD integration)

Monitoring:
- Prometheus (metrics collection)
- Grafana (real-time dashboards)
- Application Insights (APM)

Test Data:
- Synthetic audit events (test data generators)
- Production-like payloads (sanitized)
- Multi-tenant scenarios (10-1000 tenants)

Detailed Cycle Plan

CYCLE 1: Load Testing Fundamentals (~4,000 lines)

Topic 1: Load Testing Philosophy

What will be covered: - What is Load Testing?

Definition:
Load testing evaluates system performance under expected
production workloads to validate SLO compliance and identify
performance bottlenecks.

Objectives:
1. Validate SLO targets (latency, throughput, error rates)
2. Identify performance bottlenecks
3. Determine maximum capacity
4. Validate auto-scaling behavior
5. Ensure resource efficiency

ATP Focus:
- Ingestion throughput (events/second)
- Query latency (P50, P95, P99)
- Projection lag (freshness)
- Multi-tenant fairness
- Resource utilization (CPU, memory, I/O)

  • Load Testing vs. Other Performance Tests

    Load Testing (Normal Load):
    - Expected production workload
    - Validate SLO compliance
    - Duration: 15-60 minutes
    - Example: 1,000 req/min sustained
    
    Stress Testing (Beyond Capacity):
    - Gradually increase load until failure
    - Find breaking points
    - Duration: 30-120 minutes
    - Example: Ramp from 1k → 10k req/min
    
    Spike Testing (Sudden Surge):
    - Instant traffic spike
    - Validate auto-scaling, circuit breakers
    - Duration: 5-15 minutes
    - Example: 0 → 5,000 req/min in 10 seconds
    
    Soak Testing (Extended Duration):
    - Sustained load over hours/days
    - Detect memory leaks, resource exhaustion
    - Duration: 12-48 hours
    - Example: 500 req/min for 24 hours
    
    Volume Testing (Data Size):
    - Large payloads, high data volumes
    - Database performance validation
    - Duration: 30-120 minutes
    - Example: Ingest 10M records
    
    Endurance Testing (Stability):
    - Extended operation under load
    - Detect gradual degradation
    - Duration: 48+ hours
    - Example: Continuous 48-hour load
    

  • Load Testing Metrics

    Latency Metrics:
    - P50 (median): Typical user experience
    - P95 (95th percentile): Most users (SLO target)
    - P99 (99th percentile): Worst-case for most users
    - P99.9: Extreme outliers
    
    Throughput Metrics:
    - Requests per second (RPS)
    - Requests per minute (RPM)
    - Events per second (EPS)
    - Bytes per second (BPS)
    
    Error Metrics:
    - Error rate: (errors) / (total requests)
    - Error types: 4xx, 5xx, timeouts
    - Error distribution over time
    
    Resource Metrics:
    - CPU utilization (%)
    - Memory usage (GB)
    - Network I/O (Mbps)
    - Disk I/O (IOPS)
    - Database connections (active/pooled)
    - Cache hit ratio (%)
    
    Business Metrics:
    - Success rate: (successful requests) / (total requests)
    - User satisfaction (latency-based)
    - Cost per request
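
The percentile metrics above are computed from raw latency samples; k6 and NBomber do this internally, but the underlying math (nearest-rank method shown here as one common variant) is worth seeing once:

```javascript
// Nearest-rank percentile: the smallest value such that at least p% of
// samples are at or below it.
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank - 1, 0)];
}

const latenciesMs = [95, 120, 130, 140, 150, 160, 170, 180, 310, 900];
console.log(percentile(latenciesMs, 50)); // 150 - typical user experience
console.log(percentile(latenciesMs, 95)); // 900 - the tail is dominated by one outlier
```

Note how a single slow request dominates P95 while leaving P50 untouched; this is why SLOs target high percentiles rather than averages.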
    

  • ATP Load Testing Principles

    1. Test Realistic Scenarios
       - Production-like data volumes
       - Real tenant distribution
       - Actual payload sizes
       - Realistic user behavior
    
    2. Measure User Experience
       - Focus on latency (P95, P99)
       - Track error rates
       - Validate SLO compliance
       - NOT just infrastructure metrics
    
    3. Incremental Load Ramping
       - Start low, gradually increase
       - Identify degradation points
       - Avoid overwhelming system immediately
    
    4. Test Under Production-Like Conditions
       - Similar infrastructure (CPU, memory)
       - Production data volumes
       - Network latency simulation
       - Shared resource contention
    
    5. Automate in CI/CD
       - Run on every major release
       - Baseline comparisons
       - Regression detection
       - Performance gates
    
    6. Continuous Monitoring
       - Real-time metrics during tests
       - Application Insights, Prometheus
       - Alert on degradation
       - Capture detailed traces
    
    7. Multi-Tenant Validation
       - Test tenant isolation
       - Validate quota enforcement
       - Ensure fairness (noisy neighbor prevention)
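
Principle 7 (multi-tenant fairness) can be spot-checked numerically. A minimal sketch, assuming per-tenant P95 latencies have already been computed; the `isFair` helper and the 2x ratio threshold are illustrative, not ATP policy.

```javascript
// Flag a potential noisy-neighbor effect when the worst tenant's P95
// exceeds the best tenant's P95 by more than maxRatio.
function isFair(p95ByTenant, maxRatio = 2.0) {
  const values = Object.values(p95ByTenant);
  return Math.max(...values) / Math.min(...values) <= maxRatio;
}

console.log(isFair({ a: 120, b: 150, c: 180 })); // true  (180/120 = 1.5)
console.log(isFair({ a: 120, b: 150, c: 400 })); // false (400/120 > 2)
```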
    

Code Examples: - Load testing concepts - Metric definitions - Test type comparisons

Diagrams: - Load testing taxonomy - Metric visualization - Test execution timeline

Deliverables: - Load testing fundamentals guide - Metric reference - ATP principles document


Topic 2: Load Testing Strategy

What will be covered: - ATP Load Testing Strategy

Test Levels:

1. Component Load Tests
   - Single service under load
   - Example: Ingestion API only
   - Fast feedback (5-15 minutes)
   - CI/CD integration

2. Service Integration Load Tests
   - Multiple services together
   - Example: Gateway → Ingestion → Query
   - Validates end-to-end performance
   - Weekly/monthly execution

3. System Load Tests
   - Full ATP system
   - Production-like environment
   - Multi-tenant scenarios
   - Pre-production validation

4. Capacity Planning Tests
   - Find maximum capacity
   - Resource utilization analysis
   - Scaling threshold validation
   - Quarterly execution

  • Load Test Execution Frequency

    Continuous (CI/CD):
    - Component load tests on every PR
    - Fast feedback (5-10 minutes)
    - SLO validation gates
    
    Daily:
    - Integration load tests (staging)
    - Baseline comparisons
    - Regression detection
    
    Weekly:
    - Service integration load tests
    - Multi-tenant scenarios
    - Soak test (12 hours)
    
    Monthly:
    - Full system load tests
    - Capacity planning analysis
    - Stress tests
    
    Quarterly:
    - Extended soak tests (48 hours)
    - Volume tests (10M+ records)
    - Capacity review and optimization
    

  • Load Test Environments

    Development:
    - Quick validation
    - Low load (100 req/min)
    - Fast iteration
    
    Staging:
    - Production-like infrastructure
    - Medium load (1,000 req/min)
    - Pre-production validation
    
    Production:
    - Shadow/canary testing (limited)
    - Controlled load
    - Real-world validation
    - Very careful execution
    

Code Examples: - Strategy documentation - Execution schedules

Diagrams: - Test execution timeline - Environment progression

Deliverables: - Load testing strategy document - Execution calendar - Environment guidelines


CYCLE 2: k6 Load Testing (~5,000 lines)

Topic 3: k6 Fundamentals

What will be covered: - What is k6?

k6:
- Modern load testing tool
- JavaScript/ES6 scripting
- Developer-friendly
- Excellent CI/CD integration
- Open source (Grafana Labs)

Features:
- HTTP/1.1, HTTP/2, WebSocket
- Threshold-based pass/fail
- Real-time metrics
- Cloud execution (k6 Cloud)
- Extensible with JavaScript modules

Why k6 for ATP:
- Version control friendly (JavaScript files)
- CI/CD integration (Azure Pipelines)
- Threshold-based SLO validation
- Good for API testing
- Active community and documentation

  • k6 Installation & Setup

    # Install k6
    # Windows (Chocolatey)
    choco install k6
    
    # macOS (Homebrew)
    brew install k6
    
    # Linux (Debian/Ubuntu) - apt-key is deprecated; use a dedicated keyring
    sudo gpg --no-default-keyring --keyring /usr/share/keyrings/k6-archive-keyring.gpg --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys C5AD17C747E3415A3642D57D77C6C491D6AC1D69
    echo "deb [signed-by=/usr/share/keyrings/k6-archive-keyring.gpg] https://dl.k6.io/deb stable main" | sudo tee /etc/apt/sources.list.d/k6.list
    sudo apt-get update
    sudo apt-get install k6
    
    # Docker
    docker pull grafana/k6:latest
    
    # Verify installation
    k6 version
    

  • Basic k6 Script Structure

    // ingestion-load-test.js
    import http from 'k6/http';
    import { check, sleep } from 'k6';
    import { Rate, Trend, Counter } from 'k6/metrics';
    
    // Custom metrics
    const ingestionSuccessRate = new Rate('ingestion_success');
    const ingestionLatency = new Trend('ingestion_latency_ms');
    const eventsIngested = new Counter('events_ingested_total');
    
    // Test configuration
    export const options = {
      stages: [
        { duration: '1m', target: 100 },   // Ramp-up to 100 VUs
        { duration: '3m', target: 100 },   // Stay at 100 VUs
        { duration: '1m', target: 200 },   // Ramp-up to 200 VUs
        { duration: '3m', target: 200 },   // Stay at 200 VUs
        { duration: '1m', target: 0 },     // Ramp-down
      ],
      thresholds: {
        // SLO thresholds
        'http_req_duration': ['p(95)<500'],           // P95 latency <500ms
        'http_req_failed': ['rate<0.01'],             // Error rate <1%
        'ingestion_success': ['rate>0.99'],           // Success rate >99%
      },
    };
    
    // Test data (generate audit events)
    function generateAuditEvent(tenantId) {
      return JSON.stringify({
        tenantId: tenantId,
        eventType: 'user.action',
        timestamp: new Date().toISOString(),
        actor: {
          type: 'user',
          id: `user-${Math.floor(Math.random() * 1000)}`,
        },
        target: {
          type: 'resource',
          id: `resource-${Math.floor(Math.random() * 1000)}`,
        },
        metadata: {
          action: 'view',
          ipAddress: '192.168.1.1',
        },
      });
    }
    
    // Main test function (executed by each VU); receives the data object
    // returned by setup()
    export default function (data) {
      const tenantId = `tenant-${Math.floor(Math.random() * 10)}`;
      const url = 'https://gateway.atp.test/api/v1/audit/ingest';
      const payload = generateAuditEvent(tenantId);
    
      const params = {
        headers: {
          'Content-Type': 'application/json',
          'Authorization': `Bearer ${data.token}`,
          'X-Tenant-Id': tenantId,
        },
        tags: {
          endpoint: 'ingest',
          tenant: tenantId,
        },
      };
    
      const startTime = Date.now();
      const response = http.post(url, payload, params);
      const duration = Date.now() - startTime;
    
      // Record metrics
      ingestionLatency.add(duration);
      ingestionSuccessRate.add(response.status === 201);
      eventsIngested.add(1);
    
      // Validate response
      const success = check(response, {
        'status is 201': (r) => r.status === 201,
        'response time < 500ms': (r) => r.timings.duration < 500,
        'response has auditRecordId': (r) => {
          try {
            const body = JSON.parse(r.body);
            return body.auditRecordId != null;
          } catch {
            return false;
          }
        },
      });
    
      if (!success) {
        console.error(`Request failed: ${response.status} - ${response.body}`);
      }
    
      sleep(1); // Think time (1 second between requests)
    }
    
    // Setup function (runs once before test)
    export function setup() {
      // Initialize test data, get auth token, etc.
      const authUrl = 'https://gateway.atp.test/api/v1/auth/token';
      const authResponse = http.post(authUrl, JSON.stringify({
        clientId: __ENV.CLIENT_ID,
        clientSecret: __ENV.CLIENT_SECRET,
      }), {
        headers: { 'Content-Type': 'application/json' },
      });
    
      if (authResponse.status !== 200) {
        throw new Error('Failed to get auth token');
      }
    
      const token = JSON.parse(authResponse.body).accessToken;
      return { token: token };
    }
    
    // Teardown function (runs once after test)
    export function teardown(data) {
      // Cleanup test data, close connections, etc.
      console.log('Test completed');
    }
    

  • Running k6 Tests

    # Basic execution
    k6 run ingestion-load-test.js
    
    # With environment variables
    k6 run --env TEST_TOKEN=abc123 --env CLIENT_ID=client1 ingestion-load-test.js
    
    # With custom options override
    k6 run --vus 50 --duration 5m ingestion-load-test.js
    
    # Output to file
    k6 run --out json=results.json ingestion-load-test.js
    
    # Cloud execution (k6 Cloud)
    k6 cloud ingestion-load-test.js
    
    # Docker execution
    docker run -i grafana/k6 run - < ingestion-load-test.js
    

Code Examples: - Complete k6 test scripts (20+ scenarios) - Setup/teardown patterns - Custom metrics

Diagrams: - k6 architecture - Test execution flow

Deliverables: - k6 fundamentals guide - Test script templates - Setup instructions


Topic 4: k6 Advanced Features

What will be covered: - k6 Scenarios (Advanced Load Patterns)

export const options = {
  scenarios: {
    // Scenario 1: Constant load
    constant_load: {
      executor: 'constant-vus',
      vus: 100,
      duration: '10m',
      tags: { scenario: 'constant' },
    },

    // Scenario 2: Ramping load
    ramping_load: {
      executor: 'ramping-vus',
      startVUs: 0,
      stages: [
        { duration: '2m', target: 100 },
        { duration: '5m', target: 100 },
        { duration: '2m', target: 200 },
        { duration: '5m', target: 200 },
        { duration: '2m', target: 0 },
      ],
      tags: { scenario: 'ramping' },
    },

    // Scenario 3: Shared iterations
    shared_iterations: {
      executor: 'shared-iterations',
      vus: 50,
      iterations: 1000,
      maxDuration: '30m',
      tags: { scenario: 'iterations' },
    },

    // Scenario 4: Per-VU iterations
    per_vu_iterations: {
      executor: 'per-vu-iterations',
      vus: 10,
      iterations: 100,
      maxDuration: '30m',
      tags: { scenario: 'per-vu' },
    },

    // Scenario 5: Constant arrival rate
    constant_arrival_rate: {
      executor: 'constant-arrival-rate',
      rate: 100, // 100 requests per second
      timeUnit: '1s',
      duration: '10m',
      preAllocatedVUs: 50,
      maxVUs: 200,
      tags: { scenario: 'arrival-rate' },
    },

    // Scenario 6: Ramping arrival rate
    ramping_arrival_rate: {
      executor: 'ramping-arrival-rate',
      startRate: 10,
      timeUnit: '1s',
      preAllocatedVUs: 10,
      maxVUs: 100,
      stages: [
        { duration: '2m', target: 50 },
        { duration: '5m', target: 50 },
        { duration: '2m', target: 100 },
        { duration: '5m', target: 100 },
        { duration: '2m', target: 0 },
      ],
      tags: { scenario: 'ramping-arrival' },
    },
  },
  thresholds: {
    'http_req_duration{scenario:constant}': ['p(95)<500'],
    'http_req_duration{scenario:ramping}': ['p(95)<500'],
  },
};
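
The arrival-rate executors above require `preAllocatedVUs` to be sized in advance: by Little's law, the VUs needed to sustain a given rate equal the rate times the average iteration duration. This back-of-envelope helper is our own sketch, not a k6 API; the 1.5x headroom factor is an illustrative default.

```javascript
// VUs needed to sustain an open-model arrival rate:
// concurrency = rate (iterations/sec) x average iteration duration (sec),
// padded with headroom for latency spikes.
function vusNeeded(ratePerSec, avgIterationSec, headroom = 1.5) {
  return Math.ceil(ratePerSec * avgIterationSec * headroom);
}

console.log(vusNeeded(100, 0.3)); // 45: 100 req/s at ~300ms per iteration, 1.5x headroom
```

If `preAllocatedVUs` is sized below this, k6 spawns extra VUs up to `maxVUs` mid-test (or drops iterations once that cap is hit), which distorts results.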

  • k6 HTTP Client Configuration

    import http from 'k6/http';
    
    // Configure HTTP behavior via global options
    // (there is no global request timeout; set it per request via params, below)
    export const options = {
      // Follow at most 5 redirects per request
      maxRedirects: 5,
    
      // Limits for parallel requests issued via http.batch()
      batch: 15,
      batchPerHost: 5,
    };
    
    // Custom HTTP client configuration
    const params = {
      headers: {
        'Authorization': 'Bearer token',
        'X-Request-ID': `req-${Math.random()}`,
      },
      timeout: '10s',
      tags: {
        name: 'ingestion',
        tenant: 'tenant-1',
      },
    };
    
    const response = http.post(url, payload, params);
    

  • k6 Thresholds (SLO Validation)

    export const options = {
      thresholds: {
        // HTTP metrics
        'http_req_duration': [
          'p(50)<200',    // P50 <200ms
          'p(95)<500',    // P95 <500ms (SLO)
          'p(99)<1000',   // P99 <1s
        ],
        'http_req_failed': ['rate<0.01'],  // Error rate <1%
        'http_req_waiting': ['p(95)<400'], // Waiting time
    
        // Custom metrics
        'ingestion_success': ['rate>0.99'],           // Success rate >99%
        'ingestion_latency_ms': ['p(95)<500'],        // P95 <500ms
        'events_ingested_total': ['count>10000'],     // At least 10k events
    
        // Per-scenario thresholds
        'http_req_duration{scenario:spike}': ['p(95)<1000'], // Spike tolerance
    
        // Grouped thresholds
        'group_duration{group:::ingestion}': ['avg<300'],
    
        // Built-in counter thresholds (bytes transferred)
        'data_sent': ['count>1000000'],     // At least ~1MB sent
        'data_received': ['count>5000000'], // At least ~5MB received
      },
    };
    

  • k6 Output Formats

    # JSON output
    k6 run --out json=results.json test.js
    
    # CSV output
    k6 run --out csv=results.csv test.js
    
    # InfluxDB output (for Grafana)
    k6 run --out influxdb=http://influxdb:8086/k6 test.js
    
    # Cloud output
    k6 cloud test.js
    
    # Prometheus remote write
    k6 run --out experimental-prometheus-rw test.js
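    
The JSON output is newline-delimited: each `Point` line carries one sample of one metric. This post-processing helper is our own sketch of how latency samples could be pulled out of `results.json`; the field names follow k6's documented JSON output format.

```javascript
// Extract all http_req_duration samples from k6 NDJSON output.
function extractDurations(ndjson) {
  return ndjson
    .split('\n')
    .filter(Boolean)                                  // drop empty lines
    .map((line) => JSON.parse(line))
    .filter((e) => e.type === 'Point' && e.metric === 'http_req_duration')
    .map((e) => e.data.value);                        // latency in ms
}

const sample = [
  '{"type":"Point","metric":"http_req_duration","data":{"time":"t","value":123.4}}',
  '{"type":"Metric","metric":"http_req_duration","data":{}}',
].join('\n');

console.log(extractDurations(sample)); // [ 123.4 ]
```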
    

Code Examples: - Advanced k6 scenarios - Threshold configurations - Output integrations

Deliverables: - k6 advanced guide - Scenario library - Output configuration


CYCLE 3: NBomber Load Testing (~5,000 lines)

Topic 5: NBomber Fundamentals

What will be covered: - What is NBomber?

NBomber:
- .NET native load testing framework
- C# test scenarios
- Good for .NET API testing
- Integrates with .NET ecosystem
- Open source

Features:
- HTTP/1.1, HTTP/2
- gRPC support
- Custom protocols
- Real-time reporting
- CI/CD integration

Why NBomber for ATP:
- Native .NET integration
- Type-safe (C#)
- Good for ATP .NET services
- Easy integration with existing test projects

  • NBomber Installation

    <!-- ATP.Ingestion.LoadTests.csproj -->
    <Project Sdk="Microsoft.NET.Sdk">
      <PropertyGroup>
        <TargetFramework>net8.0</TargetFramework>
        <IsPackable>false</IsPackable>
      </PropertyGroup>
    
      <ItemGroup>
        <PackageReference Include="NBomber" Version="5.2.0" />
        <PackageReference Include="NBomber.Http" Version="5.2.0" />
        <PackageReference Include="Microsoft.NET.Test.Sdk" Version="17.8.0" />
        <PackageReference Include="MSTest.TestAdapter" Version="3.1.1" />
        <PackageReference Include="MSTest.TestFramework" Version="3.1.1" />
      </ItemGroup>
    </Project>
    

  • Basic NBomber Test

    using Microsoft.VisualStudio.TestTools.UnitTesting;
    using NBomber.CSharp;
    using NBomber.Http.CSharp;
    using NBomber.Plugins.Network.Ping;
    
    namespace ATP.Ingestion.LoadTests;
    
    [TestClass]
    public class IngestionLoadTests
    {
        [TestMethod]
        public void LoadTest_Ingestion_Should_MeetSLO()
        {
            // Shared HttpClient reused across all scenario invocations
            using var httpClient = new HttpClient();

            // Define scenario
            var scenario = Scenario.Create("ingestion_load", async context =>
            {
                // Generate test data
                var tenantId = $"tenant-{Random.Shared.Next(1, 10)}";
                var auditEvent = new
                {
                    tenantId = tenantId,
                    eventType = "user.action",
                    timestamp = DateTime.UtcNow,
                    actor = new { type = "user", id = $"user-{Random.Shared.Next(1, 1000)}" },
                    target = new { type = "resource", id = $"resource-{Random.Shared.Next(1, 1000)}" },
                    metadata = new { action = "view", ipAddress = "192.168.1.1" }
                };

                // Create HTTP request (WithJsonBody also sets the Content-Type header)
                var request = Http.CreateRequest("POST", "https://gateway.atp.test/api/v1/audit/ingest")
                    .WithHeader("Authorization", $"Bearer {Environment.GetEnvironmentVariable("TEST_TOKEN")}")
                    .WithHeader("X-Tenant-Id", tenantId)
                    .WithJsonBody(auditEvent);

                // Execute the request; Http.Send returns an NBomber Response whose
                // Ok/Fail status is derived from the HTTP status code
                return await Http.Send(httpClient, request);
            })
            .WithWarmUpDuration(TimeSpan.FromSeconds(30))  // Warm-up
            .WithLoadSimulations(
                // Open-model ramp pattern (rate = scenario copies injected per interval)
                Simulation.Inject(rate: 10, interval: TimeSpan.FromSeconds(1), during: TimeSpan.FromMinutes(1)),   // 10 req/sec for 1 min
                Simulation.Inject(rate: 50, interval: TimeSpan.FromSeconds(1), during: TimeSpan.FromMinutes(3)),   // 50 req/sec for 3 min
                Simulation.Inject(rate: 100, interval: TimeSpan.FromSeconds(1), during: TimeSpan.FromMinutes(5)),  // 100 req/sec for 5 min
                Simulation.RampingInject(rate: 0, interval: TimeSpan.FromSeconds(1), during: TimeSpan.FromSeconds(10))  // Ramp-down
            );
    
            // Configure NBomber
            var stats = NBomberRunner
                .RegisterScenarios(scenario)
                .WithWorkerPlugins(new PingPlugin())
                .Run();
    
            // Assert SLO thresholds
            var scenarioStats = stats.ScenarioStats[0];
    
            // P95 latency <500ms (SLO); NBomber reports latency percentiles in milliseconds
            Assert.IsTrue(
                scenarioStats.Ok.Latency.Percent95 < 500,
                $"P95 latency {scenarioStats.Ok.Latency.Percent95}ms exceeds SLO 500ms");
    
            // Error rate <1%
            var errorRate = (scenarioStats.Fail.Request.Count * 100.0) / scenarioStats.AllRequestCount;
            Assert.IsTrue(
                errorRate < 1.0,
                $"Error rate {errorRate:F2}% exceeds SLO 1%");
    
            // Success rate >99%
            var successRate = (scenarioStats.Ok.Request.Count * 100.0) / scenarioStats.AllRequestCount;
            Assert.IsTrue(
                successRate > 99.0,
                $"Success rate {successRate:F2}% below SLO 99%");
    
            // Throughput validation
            var throughput = scenarioStats.Ok.Request.RPS;
            Assert.IsTrue(
                throughput > 80,  // At least 80 req/sec achieved
                $"Throughput {throughput:F2} req/sec below target 80 req/sec");
    
            // Output summary
            Console.WriteLine($"Test completed:");
            Console.WriteLine($"  Total requests: {scenarioStats.AllRequestCount}");
            Console.WriteLine($"  Successful: {scenarioStats.Ok.Request.Count}");
            Console.WriteLine($"  Failed: {scenarioStats.Fail.Request.Count}");
            Console.WriteLine($"  P50 latency: {scenarioStats.Ok.Latency.Percent50}ms");
            Console.WriteLine($"  P95 latency: {scenarioStats.Ok.Latency.Percent95}ms");
            Console.WriteLine($"  P99 latency: {scenarioStats.Ok.Latency.Percent99}ms");
            Console.WriteLine($"  Throughput: {throughput:F2} req/sec");
        }
    }
    

Code Examples: - Complete NBomber test suites - SLO validation patterns - Assertion helpers

Deliverables: - NBomber fundamentals guide - Test templates - SLO validation library


Topic 6: NBomber Advanced Features

What will be covered: - NBomber Scenarios & Simulations

// Keep a constant number of concurrent scenario copies (closed model)
Simulation.KeepConstant(copies: 100, during: TimeSpan.FromMinutes(10))

// Ramp the copy count up or down to a target (closed model)
Simulation.RampingConstant(copies: 100, during: TimeSpan.FromMinutes(5))

// Inject new scenario copies at a fixed rate (open model)
Simulation.Inject(rate: 50, interval: TimeSpan.FromSeconds(1), during: TimeSpan.FromMinutes(10))

// Ramp the injection rate up or down to a target (open model)
Simulation.RampingInject(rate: 100, interval: TimeSpan.FromSeconds(1), during: TimeSpan.FromMinutes(5))

  • Custom Metrics & Reporting
  • gRPC Testing

Code Examples: - Advanced NBomber scenarios - Custom metrics - gRPC tests

Deliverables: - NBomber advanced guide


CYCLE 4: ATP Load Testing Scenarios (~5,500 lines)

Topic 7: Ingestion Load Testing

What will be covered: - Ingestion Load Test Scenarios

// ingestion-scenarios.js
import http from 'k6/http';
import { check } from 'k6';
// Shared audit-event generator from Topic 3 (module path illustrative)
import { generateAuditEvent } from './test-data.js';

export const scenarios = {
  // Scenario 1: Normal ingestion load
  normal_load: {
    executor: 'ramping-arrival-rate',
    startRate: 10,
    timeUnit: '1s',
    preAllocatedVUs: 10,
    maxVUs: 200,
    stages: [
      { duration: '2m', target: 50 },   // 50 events/sec
      { duration: '5m', target: 50 },   // Sustain
      { duration: '2m', target: 100 },  // 100 events/sec
      { duration: '5m', target: 100 },  // Sustain
      { duration: '2m', target: 0 },    // Ramp-down
    ],
    tags: { scenario: 'normal' },
  },

  // Scenario 2: High-volume ingestion
  high_volume: {
    executor: 'constant-arrival-rate',
    rate: 500,  // 500 events/sec (Enterprise edition target)
    timeUnit: '1s',
    duration: '10m',
    preAllocatedVUs: 100,
    maxVUs: 1000,
    tags: { scenario: 'high-volume' },
  },

  // Scenario 3: Batch ingestion
  batch_ingestion: {
    executor: 'constant-vus',
    vus: 50,
    duration: '10m',
    tags: { scenario: 'batch' },
  },
};

// Register the scenarios with k6
export const options = { scenarios };

export default function () {
  // Generate a batch of events; select the batch path by running with
  // `k6 run --env SCENARIO=batch ingestion-scenarios.js`
  const batchSize = __ENV.SCENARIO === 'batch' ? 100 : 1;
  const events = [];

  for (let i = 0; i < batchSize; i++) {
    events.push(generateAuditEvent());
  }

  const url = 'https://gateway.atp.test/api/v1/audit/ingest/batch';
  const response = http.post(url, JSON.stringify(events), {
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${__ENV.TEST_TOKEN}`,
    },
  });

  check(response, {
    'batch status is 200': (r) => r.status === 200,
    'batch processed all events': (r) => {
      const body = JSON.parse(r.body);
      return body.processed === batchSize;
    },
  });
}
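
Batching multiplies the event throughput achieved per HTTP request. A rough sketch of the arithmetic for the batch scenario above, assuming each of the 50 VUs completes about one request per second; the helper name is ours.

```javascript
// Effective ingestion throughput: VUs x requests per VU per second x events per request.
function eventsPerSec(vus, reqPerVuPerSec, batchSize) {
  return vus * reqPerVuPerSec * batchSize;
}

console.log(eventsPerSec(50, 1, 100)); // 5000 events/sec from 50 VUs batching 100 events
console.log(eventsPerSec(50, 1, 1));   // 50 events/sec for the same VUs without batching
```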

Code Examples: - Complete ingestion scenarios (k6 + NBomber) - Batch ingestion tests - High-volume tests

Deliverables: - Ingestion load test suite


Topic 8: Query Load Testing

What will be covered: - Query Load Test Scenarios - Single query load - Complex query load - Search queries - Pagination scenarios - Multi-tenant queries

Code Examples: - Query load test scenarios

Deliverables: - Query load test suite


CYCLE 5: SLO Validation Testing (~4,500 lines)

Topic 9: SLO Validation Framework

What will be covered: - SLO Validation in Load Tests

// slo-validation.js
export const options = {
  thresholds: {
    // Gateway SLOs (read and write traffic must carry separate tags for
    // separate thresholds; the `op` tag name is illustrative)
    'http_req_duration{endpoint:gateway,op:read}': ['p(95)<200'],   // P95 <200ms (read)
    'http_req_duration{endpoint:gateway,op:write}': ['p(95)<300'],  // P95 <300ms (write)

    // Ingestion SLOs
    'http_req_duration{endpoint:ingest}': ['p(95)<500'],  // P95 <500ms
    'ingestion_success': ['rate>0.999'],                   // >99.9%

    // Query SLOs
    'http_req_duration{endpoint:query}': ['p(95)<500'],   // P95 <500ms
    'query_success': ['rate>0.999'],                       // >99.9%

    // Projection SLOs (custom metric)
    'projection_lag_seconds': ['p(95)<5', 'p(99)<10'],
  },
};
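
The `projection_lag_seconds` threshold above needs samples fed into a custom Trend metric. This sketch shows the lag computation itself; we assume the query response exposes the original event timestamp, and the function and field names are illustrative.

```javascript
// Projection lag: time between the event's source timestamp and the moment
// it became visible in the read model.
function projectionLagSeconds(eventTimestampIso, observedAtMs) {
  return (observedAtMs - Date.parse(eventTimestampIso)) / 1000;
}

const lag = projectionLagSeconds(
  '2024-01-01T00:00:00.000Z',
  Date.parse('2024-01-01T00:00:03.500Z'),
);
console.log(lag); // 3.5 seconds of projection lag
```

In a k6 script the returned value would be recorded with `projectionLag.add(lag)` on a `Trend('projection_lag_seconds')`.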

  • Automated SLO Validation Gates

Code Examples: - SLO validation framework - Threshold configurations

Deliverables: - SLO validation guide


Topic 10: Performance Regression Testing

What will be covered: - Baseline Comparisons - Regression Detection - Performance Gates
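
The baseline comparison gate described above reduces to a simple check: flag a regression when the candidate's P95 exceeds the stored baseline by more than a tolerance. A minimal sketch; the 10% tolerance and helper name are illustrative choices, not ATP policy.

```javascript
// Regression gate: fail the build when current P95 latency exceeds the
// recorded baseline by more than the tolerance fraction.
function isRegression(baselineP95Ms, currentP95Ms, tolerance = 0.10) {
  return currentP95Ms > baselineP95Ms * (1 + tolerance);
}

console.log(isRegression(200, 230)); // true:  230ms > 220ms allowed
console.log(isRegression(200, 215)); // false: within the 10% budget
```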

Deliverables: - Regression testing guide


CYCLE 6-16: Remaining Cycles

Coverage includes: - Stress Testing (breaking points) - Spike Testing (sudden surges) - Soak Testing (extended duration) - Capacity Planning & Scaling - Performance Benchmarking - Database Performance Testing - Messaging Performance Testing - Multi-Tenant Load Testing - CI/CD Integration - Performance Analysis & Optimization - Best Practices & Troubleshooting


Summary of Deliverables

Complete load testing implementation covering:

  1. Fundamentals: Philosophy, strategy, test types
  2. k6: JavaScript-based load testing (primary tool)
  3. NBomber: .NET native load testing (secondary tool)
  4. ATP Scenarios: Ingestion, query, export, projection
  5. SLO Validation: Automated SLO compliance testing
  6. Stress Testing: Breaking point discovery
  7. Spike Testing: Auto-scaling validation
  8. Soak Testing: Memory leak detection
  9. Capacity Planning: Maximum capacity analysis
  10. Benchmarking: Performance baselines
  11. Database Testing: Query performance, indexing
  12. Messaging Testing: Throughput, latency
  13. Multi-Tenant: Tenant isolation validation
  14. CI/CD: Automated performance gates
  15. Analysis: Bottleneck identification
  16. Operations: Best practices, troubleshooting


This load testing guide provides a complete implementation of ATP's performance validation: k6 and NBomber test scripts, load/stress/spike/soak testing scenarios, SLO validation frameworks, capacity planning procedures, multi-tenant load testing, and CI/CD integration with automated SLO compliance gates and regression detection, ensuring system reliability, scalability, and performance under expected and extreme workloads.