Load Testing & Performance - Audit Trail Platform (ATP)¶
Performance validated, confidence delivered: ATP implements comprehensive load testing with k6 and NBomber to validate SLO compliance (P95 latency, throughput, error rates) and to support capacity planning, stress testing (finding breaking points), spike testing (sudden traffic surges), and soak testing (sustained load), ensuring reliability under both expected and extreme workloads.
📋 Documentation Generation Plan¶
This document will be generated in 16 cycles. Current progress:
| Cycle | Topics | Estimated Lines | Status |
|---|---|---|---|
| Cycle 1 | Load Testing Fundamentals (1-2) | ~4,000 | ⏳ Not Started |
| Cycle 2 | k6 Load Testing (3-4) | ~5,000 | ⏳ Not Started |
| Cycle 3 | NBomber Load Testing (5-6) | ~5,000 | ⏳ Not Started |
| Cycle 4 | ATP Load Testing Scenarios (7-8) | ~5,500 | ⏳ Not Started |
| Cycle 5 | SLO Validation Testing (9-10) | ~4,500 | ⏳ Not Started |
| Cycle 6 | Stress Testing (11-12) | ~4,500 | ⏳ Not Started |
| Cycle 7 | Spike Testing (13-14) | ~4,000 | ⏳ Not Started |
| Cycle 8 | Soak Testing (15-16) | ~4,000 | ⏳ Not Started |
| Cycle 9 | Capacity Planning & Scaling (17-18) | ~4,500 | ⏳ Not Started |
| Cycle 10 | Performance Benchmarking (19-20) | ~4,500 | ⏳ Not Started |
| Cycle 11 | Database Performance Testing (21-22) | ~4,000 | ⏳ Not Started |
| Cycle 12 | Messaging Performance Testing (23-24) | ~4,000 | ⏳ Not Started |
| Cycle 13 | Multi-Tenant Load Testing (25-26) | ~4,000 | ⏳ Not Started |
| Cycle 14 | CI/CD Integration (27-28) | ~4,000 | ⏳ Not Started |
| Cycle 15 | Performance Analysis & Optimization (29-30) | ~4,000 | ⏳ Not Started |
| Cycle 16 | Best Practices & Troubleshooting (31-32) | ~3,500 | ⏳ Not Started |
Total Estimated Lines: ~69,000
Purpose & Scope¶
This document provides the complete load testing and performance validation guide for ATP. It covers k6 (JavaScript-based load testing), NBomber (.NET load testing), load/stress/spike/soak testing, SLO validation, capacity planning, performance benchmarking, multi-tenant scenarios, and CI/CD integration, ensuring system reliability, scalability, and performance under expected and extreme workloads.
Why Load Testing for ATP?
- SLO Validation: Verify P95 latency <200ms (reads), <300ms (writes) under load
- Capacity Planning: Determine max throughput, concurrent users, resource limits
- Breaking Point Discovery: Find system limits before production incidents
- Bottleneck Identification: Identify slow components (database, cache, message bus)
- Regression Prevention: Detect performance degradation in CI/CD
- Resource Optimization: Right-size infrastructure (CPU, memory, scaling)
- Cost Validation: Ensure infrastructure costs align with performance targets
- User Experience: Validate response times meet user expectations
- Compliance: Validate audit integrity performance under load
- Multi-Tenant Fairness: Ensure tenant isolation under high load
Types of Load Testing
1. Load Testing (Normal Load)
- Expected production workload
- Validate SLO compliance
- Example: 1,000 req/min sustained
2. Stress Testing (Beyond Capacity)
- Push system beyond normal capacity
- Find breaking points
- Example: Gradually increase to 10,000 req/min until failure
3. Spike Testing (Sudden Surge)
- Sudden traffic increases
- Validate auto-scaling
- Example: 0 → 5,000 req/min in 10 seconds
4. Soak Testing (Extended Duration)
- Sustained load over hours/days
- Detect memory leaks, resource exhaustion
- Example: 500 req/min for 24 hours
5. Volume Testing (Data Size)
- Large payloads, high data volumes
- Validate database performance
- Example: 10M records, 100MB exports
6. Endurance Testing (Stability)
- Extended operation under load
- Detect gradual degradation
- Example: 48-hour continuous load
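As a sanity check when sizing the test types above, the total number of requests a linear ramp generates is simply the average of the start and end rates times the duration. A back-of-envelope sketch (the helper name is illustrative; the 1,000 → 10,000 req/min figures mirror the stress-testing example):

```javascript
// Estimate total requests generated by a linear ramp between two rates.
// Illustrative helper, not part of any ATP tooling: rates in req/min,
// duration in minutes; a linear ramp averages the start and end rates.
function rampRequestCount(startRpm, endRpm, minutes) {
  return ((startRpm + endRpm) / 2) * minutes;
}

// Stress-test example: ramp 1,000 -> 10,000 req/min over 30 minutes
console.log(rampRequestCount(1000, 10000, 30)); // 165000 requests
```

The same formula with equal start and end rates covers the steady-state cases (e.g. 500 req/min for 24 hours = 720,000 requests), which is useful for estimating test-data volume before a soak run.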
ATP Performance Targets (SLOs)
Gateway Service:
- P95 latency: <200ms (reads), <300ms (writes)
- Availability: 99.95%
- Throughput: 3,000 req/min (Enterprise edition)
Ingestion Service:
- P95 latency: <500ms (per event)
- Availability: 99.9%
- Throughput: 500 events/sec (Enterprise edition)
- Outbox relay lag: P95 <5s
Query Service:
- P95 latency: <500ms (typical queries)
- Availability: 99.9%
- Throughput: 1,500 req/min (Enterprise edition)
- Cache hit ratio: >80%
Projection Service:
- Projection lag: P95 <5s, P99 <10s
- Throughput: Process 10,000 events/sec
- DLQ rate: <0.5%
Export Service:
- Export completion: 95% within 15 minutes
- Throughput: 100GB/day (Enterprise edition)
- Concurrent jobs: 10 (Enterprise edition)
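The availability targets above translate directly into monthly downtime budgets: at 99.95%, the Gateway may be unavailable for only about 21.6 minutes in a 30-day month. A small sketch of that arithmetic (30-day month assumed):

```javascript
// Convert an availability SLO into a monthly downtime budget.
// Illustrative arithmetic only; a 30-day month is assumed.
function downtimeBudgetMinutes(sloPercent, daysInMonth = 30) {
  const totalMinutes = daysInMonth * 24 * 60; // 43,200 minutes in 30 days
  return totalMinutes * (1 - sloPercent / 100);
}

console.log(downtimeBudgetMinutes(99.95)); // Gateway: ~21.6 minutes/month
console.log(downtimeBudgetMinutes(99.9));  // Ingestion/Query: ~43.2 minutes/month
```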
Load Testing Tools
- k6: JavaScript-based, developer-friendly, CI/CD integration
- NBomber: .NET native, good for .NET API testing
- JMeter: GUI-based, complex scenarios
- Gatling: Scala-based, high-performance
- Artillery: Node.js-based, simple YAML configuration
- Locust: Python-based, distributed load generation
ATP Load Testing Stack
Load Generation:
- k6 (primary) - JavaScript test scripts
- NBomber (secondary) - C# test scenarios
Test Infrastructure:
- Azure Container Instances (load generators)
- Kubernetes jobs (distributed load)
- Azure Pipelines (CI/CD integration)
Monitoring:
- Prometheus (metrics collection)
- Grafana (real-time dashboards)
- Application Insights (APM)
Test Data:
- Synthetic audit events (test data generators)
- Production-like payloads (sanitized)
- Multi-tenant scenarios (10-1000 tenants)
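The synthetic test-data generators listed above can be as simple as a function that randomizes tenant and actor IDs. A minimal plain-JavaScript sketch — the field names mirror the ingestion payloads used later in this document, while the ID formats and default tenant count are illustrative assumptions:

```javascript
// Sketch of a synthetic audit-event generator for multi-tenant load tests.
// Field names mirror the ingestion payloads shown later in this document;
// the tenant count and ID formats are illustrative assumptions.
function generateAuditEvent(tenantCount = 10) {
  return {
    tenantId: `tenant-${Math.floor(Math.random() * tenantCount)}`,
    eventType: 'user.action',
    timestamp: new Date().toISOString(),
    actor: { type: 'user', id: `user-${Math.floor(Math.random() * 1000)}` },
    target: { type: 'resource', id: `resource-${Math.floor(Math.random() * 1000)}` },
    metadata: { action: 'view', ipAddress: '192.168.1.1' },
  };
}

const event = generateAuditEvent(100); // spread events across 100 tenants
console.log(event.tenantId, event.eventType);
```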
Detailed Cycle Plan¶
CYCLE 1: Load Testing Fundamentals (~4,000 lines)¶
Topic 1: Load Testing Philosophy¶
What will be covered: - What is Load Testing?
Definition:
Load testing evaluates system performance under expected
production workloads to validate SLO compliance and identify
performance bottlenecks.
Objectives:
1. Validate SLO targets (latency, throughput, error rates)
2. Identify performance bottlenecks
3. Determine maximum capacity
4. Validate auto-scaling behavior
5. Ensure resource efficiency
ATP Focus:
- Ingestion throughput (events/second)
- Query latency (P50, P95, P99)
- Projection lag (freshness)
- Multi-tenant fairness
- Resource utilization (CPU, memory, I/O)
Load Testing vs. Other Performance Tests
Load Testing (Normal Load):
- Expected production workload
- Validate SLO compliance
- Duration: 15-60 minutes
- Example: 1,000 req/min sustained

Stress Testing (Beyond Capacity):
- Gradually increase load until failure
- Find breaking points
- Duration: 30-120 minutes
- Example: Ramp from 1k → 10k req/min

Spike Testing (Sudden Surge):
- Instant traffic spike
- Validate auto-scaling, circuit breakers
- Duration: 5-15 minutes
- Example: 0 → 5,000 req/min in 10 seconds

Soak Testing (Extended Duration):
- Sustained load over hours/days
- Detect memory leaks, resource exhaustion
- Duration: 12-48 hours
- Example: 500 req/min for 24 hours

Volume Testing (Data Size):
- Large payloads, high data volumes
- Database performance validation
- Duration: 30-120 minutes
- Example: Ingest 10M records

Endurance Testing (Stability):
- Extended operation under load
- Detect gradual degradation
- Duration: 48+ hours
- Example: Continuous 48-hour load
Load Testing Metrics
Latency Metrics:
- P50 (median): Typical user experience
- P95 (95th percentile): Most users (SLO target)
- P99 (99th percentile): Worst-case for most users
- P99.9: Extreme outliers

Throughput Metrics:
- Requests per second (RPS)
- Requests per minute (RPM)
- Events per second (EPS)
- Bytes per second (BPS)

Error Metrics:
- Error rate: (errors) / (total requests)
- Error types: 4xx, 5xx, timeouts
- Error distribution over time

Resource Metrics:
- CPU utilization (%)
- Memory usage (GB)
- Network I/O (Mbps)
- Disk I/O (IOPS)
- Database connections (active/pooled)
- Cache hit ratio (%)

Business Metrics:
- Success rate: (successful requests) / (total requests)
- User satisfaction (latency-based)
- Cost per request
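The percentile metrics above (P50/P95/P99) can be computed from raw latency samples with the nearest-rank method. Load testing tools use their own estimators; this plain-JavaScript sketch only illustrates the idea:

```javascript
// Nearest-rank percentile over raw latency samples (illustrative only;
// k6 and NBomber apply their own percentile estimators internally).
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length); // 1-based nearest rank
  return sorted[Math.max(0, rank - 1)];
}

const latenciesMs = [120, 95, 180, 210, 160, 140, 490, 130, 110, 155];
console.log(percentile(latenciesMs, 50)); // median latency
console.log(percentile(latenciesMs, 95)); // SLO target percentile
```

Note how a single outlier (490ms here) dominates P95 for small samples — one reason load tests need enough requests before percentile-based SLO gates are meaningful.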
ATP Load Testing Principles
1. Test Realistic Scenarios
   - Production-like data volumes
   - Real tenant distribution
   - Actual payload sizes
   - Realistic user behavior
2. Measure User Experience
   - Focus on latency (P95, P99)
   - Track error rates
   - Validate SLO compliance
   - NOT just infrastructure metrics
3. Incremental Load Ramping
   - Start low, gradually increase
   - Identify degradation points
   - Avoid overwhelming the system immediately
4. Test Under Production-Like Conditions
   - Similar infrastructure (CPU, memory)
   - Production data volumes
   - Network latency simulation
   - Shared resource contention
5. Automate in CI/CD
   - Run on every major release
   - Baseline comparisons
   - Regression detection
   - Performance gates
6. Continuous Monitoring
   - Real-time metrics during tests
   - Application Insights, Prometheus
   - Alert on degradation
   - Capture detailed traces
7. Multi-Tenant Validation
   - Test tenant isolation
   - Validate quota enforcement
   - Ensure fairness (noisy neighbor prevention)
Code Examples: - Load testing concepts - Metric definitions - Test type comparisons
Diagrams: - Load testing taxonomy - Metric visualization - Test execution timeline
Deliverables: - Load testing fundamentals guide - Metric reference - ATP principles document
Topic 2: Load Testing Strategy¶
What will be covered: - ATP Load Testing Strategy
Test Levels:
1. Component Load Tests
- Single service under load
- Example: Ingestion API only
- Fast feedback (5-15 minutes)
- CI/CD integration
2. Service Integration Load Tests
- Multiple services together
- Example: Gateway → Ingestion → Query
- Validates end-to-end performance
- Weekly/monthly execution
3. System Load Tests
- Full ATP system
- Production-like environment
- Multi-tenant scenarios
- Pre-production validation
4. Capacity Planning Tests
- Find maximum capacity
- Resource utilization analysis
- Scaling threshold validation
- Quarterly execution
Load Test Execution Frequency
Continuous (CI/CD):
- Component load tests on every PR
- Fast feedback (5-10 minutes)
- SLO validation gates

Daily:
- Integration load tests (staging)
- Baseline comparisons
- Regression detection

Weekly:
- Service integration load tests
- Multi-tenant scenarios
- Soak test (12 hours)

Monthly:
- Full system load tests
- Capacity planning analysis
- Stress tests

Quarterly:
- Extended soak tests (48 hours)
- Volume tests (10M+ records)
- Capacity review and optimization
Load Test Environments
Code Examples: - Strategy documentation - Execution schedules
Diagrams: - Test execution timeline - Environment progression
Deliverables: - Load testing strategy document - Execution calendar - Environment guidelines
CYCLE 2: k6 Load Testing (~5,000 lines)¶
Topic 3: k6 Fundamentals¶
What will be covered: - What is k6?
k6:
- Modern load testing tool
- JavaScript/ES6 scripting
- Developer-friendly
- Excellent CI/CD integration
- Open source (Grafana Labs)
Features:
- HTTP/1.1, HTTP/2, WebSocket
- Threshold-based pass/fail
- Real-time metrics
- Cloud execution (k6 Cloud)
- Extensible with JavaScript modules
Why k6 for ATP:
- Version control friendly (JavaScript files)
- CI/CD integration (Azure Pipelines)
- Threshold-based SLO validation
- Good for API testing
- Active community and documentation
k6 Installation & Setup
```bash
# Install k6

# Windows (Chocolatey)
choco install k6

# macOS (Homebrew)
brew install k6

# Linux (Debian/Ubuntu)
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys C5AD17C747E3415A3642D57D77C6C491D6AC1D69
echo "deb https://dl.k6.io/deb stable main" | sudo tee /etc/apt/sources.list.d/k6.list
sudo apt-get update
sudo apt-get install k6

# Docker
docker pull grafana/k6:latest

# Verify installation
k6 version
```
Basic k6 Script Structure
```javascript
// ingestion-load-test.js
import http from 'k6/http';
import { check, sleep } from 'k6';
import { Rate, Trend, Counter } from 'k6/metrics';

// Custom metrics
const ingestionSuccessRate = new Rate('ingestion_success');
const ingestionLatency = new Trend('ingestion_latency_ms');
const eventsIngested = new Counter('events_ingested_total');

// Test configuration
export const options = {
  stages: [
    { duration: '1m', target: 100 }, // Ramp-up to 100 VUs
    { duration: '3m', target: 100 }, // Stay at 100 VUs
    { duration: '1m', target: 200 }, // Ramp-up to 200 VUs
    { duration: '3m', target: 200 }, // Stay at 200 VUs
    { duration: '1m', target: 0 },   // Ramp-down
  ],
  thresholds: {
    // SLO thresholds
    'http_req_duration': ['p(95)<500'], // P95 latency <500ms
    'http_req_failed': ['rate<0.01'],   // Error rate <1%
    'ingestion_success': ['rate>0.99'], // Success rate >99%
  },
};

// Test data (generate audit events)
function generateAuditEvent(tenantId) {
  return JSON.stringify({
    tenantId: tenantId,
    eventType: 'user.action',
    timestamp: new Date().toISOString(),
    actor: {
      type: 'user',
      id: `user-${Math.floor(Math.random() * 1000)}`,
    },
    target: {
      type: 'resource',
      id: `resource-${Math.floor(Math.random() * 1000)}`,
    },
    metadata: {
      action: 'view',
      ipAddress: '192.168.1.1',
    },
  });
}

// Main test function (executed by each VU); `data` comes from setup()
export default function (data) {
  const tenantId = `tenant-${Math.floor(Math.random() * 10)}`;
  const url = 'https://gateway.atp.test/api/v1/audit/ingest';
  const payload = generateAuditEvent(tenantId);

  const params = {
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${data.token}`, // token acquired in setup()
      'X-Tenant-Id': tenantId,
    },
    tags: {
      endpoint: 'ingest',
      tenant: tenantId,
    },
  };

  const startTime = Date.now();
  const response = http.post(url, payload, params);
  const duration = Date.now() - startTime;

  // Record metrics
  ingestionLatency.add(duration);
  ingestionSuccessRate.add(response.status === 201);
  eventsIngested.add(1);

  // Validate response
  const success = check(response, {
    'status is 201': (r) => r.status === 201,
    'response time < 500ms': (r) => r.timings.duration < 500,
    'response has auditRecordId': (r) => {
      try {
        const body = JSON.parse(r.body);
        return body.auditRecordId != null;
      } catch {
        return false;
      }
    },
  });

  if (!success) {
    console.error(`Request failed: ${response.status} - ${response.body}`);
  }

  sleep(1); // Think time (1 second between requests)
}

// Setup function (runs once before test)
export function setup() {
  // Initialize test data, get auth token, etc.
  const authUrl = 'https://gateway.atp.test/api/v1/auth/token';
  const authResponse = http.post(authUrl, JSON.stringify({
    clientId: __ENV.CLIENT_ID,
    clientSecret: __ENV.CLIENT_SECRET,
  }), {
    headers: { 'Content-Type': 'application/json' },
  });

  if (authResponse.status !== 200) {
    throw new Error('Failed to get auth token');
  }

  const token = JSON.parse(authResponse.body).accessToken;
  return { token: token };
}

// Teardown function (runs once after test)
export function teardown(data) {
  // Cleanup test data, close connections, etc.
  console.log('Test completed');
}
```
Running k6 Tests
```bash
# Basic execution
k6 run ingestion-load-test.js

# With environment variables
k6 run --env TEST_TOKEN=abc123 --env CLIENT_ID=client1 ingestion-load-test.js

# With custom options override
k6 run --vus 50 --duration 5m ingestion-load-test.js

# Output to file
k6 run --out json=results.json ingestion-load-test.js

# Cloud execution (k6 Cloud)
k6 cloud ingestion-load-test.js

# Docker execution
docker run -i grafana/k6 run - < ingestion-load-test.js
```
Code Examples: - Complete k6 test scripts (20+ scenarios) - Setup/teardown patterns - Custom metrics
Diagrams: - k6 architecture - Test execution flow
Deliverables: - k6 fundamentals guide - Test script templates - Setup instructions
Topic 4: k6 Advanced Features¶
What will be covered: - k6 Scenarios (Advanced Load Patterns)
```javascript
export const options = {
  scenarios: {
    // Scenario 1: Constant load
    constant_load: {
      executor: 'constant-vus',
      vus: 100,
      duration: '10m',
      tags: { scenario: 'constant' },
    },
    // Scenario 2: Ramping load
    ramping_load: {
      executor: 'ramping-vus',
      startVUs: 0,
      stages: [
        { duration: '2m', target: 100 },
        { duration: '5m', target: 100 },
        { duration: '2m', target: 200 },
        { duration: '5m', target: 200 },
        { duration: '2m', target: 0 },
      ],
      tags: { scenario: 'ramping' },
    },
    // Scenario 3: Shared iterations
    shared_iterations: {
      executor: 'shared-iterations',
      vus: 50,
      iterations: 1000,
      maxDuration: '30m',
      tags: { scenario: 'iterations' },
    },
    // Scenario 4: Per-VU iterations
    per_vu_iterations: {
      executor: 'per-vu-iterations',
      vus: 10,
      iterations: 100,
      maxDuration: '30m',
      tags: { scenario: 'per-vu' },
    },
    // Scenario 5: Constant arrival rate
    constant_arrival_rate: {
      executor: 'constant-arrival-rate',
      rate: 100, // 100 requests per second
      timeUnit: '1s',
      duration: '10m',
      preAllocatedVUs: 50,
      maxVUs: 200,
      tags: { scenario: 'arrival-rate' },
    },
    // Scenario 6: Ramping arrival rate
    ramping_arrival_rate: {
      executor: 'ramping-arrival-rate',
      startRate: 10,
      timeUnit: '1s',
      preAllocatedVUs: 10,
      maxVUs: 100,
      stages: [
        { duration: '2m', target: 50 },
        { duration: '5m', target: 50 },
        { duration: '2m', target: 100 },
        { duration: '5m', target: 100 },
        { duration: '2m', target: 0 },
      ],
      tags: { scenario: 'ramping-arrival' },
    },
  },
  thresholds: {
    'http_req_duration{scenario:constant}': ['p(95)<500'],
    'http_req_duration{scenario:ramping}': ['p(95)<500'],
  },
};
```
k6 HTTP Client Configuration
```javascript
import http from 'k6/http';

// Global HTTP-related options
export const options = {
  maxRedirects: 5, // Follow at most 5 redirects
  // Parallel connection limits used by http.batch()
  batch: 15,
  batchPerHost: 5,
};

// Per-request configuration
const params = {
  headers: {
    'Authorization': 'Bearer token',
    'X-Request-ID': `req-${Math.random()}`,
  },
  timeout: '10s', // Per-request timeout
  tags: {
    name: 'ingestion',
    tenant: 'tenant-1',
  },
};

const response = http.post(url, payload, params);
```
k6 Thresholds (SLO Validation)
```javascript
export const options = {
  thresholds: {
    // HTTP metrics
    'http_req_duration': [
      'p(50)<200',  // P50 <200ms
      'p(95)<500',  // P95 <500ms (SLO)
      'p(99)<1000', // P99 <1s
    ],
    'http_req_failed': ['rate<0.01'],  // Error rate <1%
    'http_req_waiting': ['p(95)<400'], // Waiting time

    // Custom metrics
    'ingestion_success': ['rate>0.99'],       // Success rate >99%
    'ingestion_latency_ms': ['p(95)<500'],    // P95 <500ms
    'events_ingested_total': ['count>10000'], // At least 10k events

    // Per-scenario thresholds
    'http_req_duration{scenario:spike}': ['p(95)<1000'], // Spike tolerance

    // Grouped thresholds
    'group_duration{group:::ingestion}': ['avg<300'],

    // Data transfer thresholds
    'data_sent': ['count>1000000'],     // At least 1MB sent
    'data_received': ['count>5000000'], // At least 5MB received
  },
};
```
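Threshold expressions such as `'p(95)<500'` are simple comparisons against an aggregated metric value. The sketch below (an illustrative helper, not part of k6) shows the pass/fail semantics of the comparison part; k6 itself computes the aggregated value:

```javascript
// Minimal evaluator for k6-style threshold expressions such as 'p(95)<500'
// or 'rate<0.01'. Illustrative only: parses just the trailing comparison;
// the aggregated metric value is supplied by the caller.
function evaluateThreshold(expression, value) {
  const match = expression.match(/(<=|>=|<|>)\s*([\d.]+)\s*$/);
  if (!match) throw new Error(`Unsupported expression: ${expression}`);
  const [, op, rawLimit] = match;
  const limit = parseFloat(rawLimit);
  switch (op) {
    case '<': return value < limit;
    case '<=': return value <= limit;
    case '>': return value > limit;
    case '>=': return value >= limit;
  }
}

console.log(evaluateThreshold('p(95)<500', 432));  // true  -> SLO met
console.log(evaluateThreshold('rate<0.01', 0.02)); // false -> error budget blown
```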
k6 Output Formats
Code Examples: - Advanced k6 scenarios - Threshold configurations - Output integrations
Deliverables: - k6 advanced guide - Scenario library - Output configuration
CYCLE 3: NBomber Load Testing (~5,000 lines)¶
Topic 5: NBomber Fundamentals¶
What will be covered: - What is NBomber?
NBomber:
- .NET native load testing framework
- C# test scenarios
- Good for .NET API testing
- Integrates with .NET ecosystem
- Open source
Features:
- HTTP/1.1, HTTP/2
- gRPC support
- Custom protocols
- Real-time reporting
- CI/CD integration
Why NBomber for ATP:
- Native .NET integration
- Type-safe (C#)
- Good for ATP .NET services
- Easy integration with existing test projects
NBomber Installation
```xml
<!-- ATP.Ingestion.LoadTests.csproj -->
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <TargetFramework>net8.0</TargetFramework>
    <IsPackable>false</IsPackable>
  </PropertyGroup>
  <ItemGroup>
    <PackageReference Include="NBomber" Version="5.2.0" />
    <PackageReference Include="NBomber.Http" Version="5.2.0" />
    <PackageReference Include="Microsoft.NET.Test.Sdk" Version="17.8.0" />
    <PackageReference Include="MSTest.TestAdapter" Version="3.1.1" />
    <PackageReference Include="MSTest.TestFramework" Version="3.1.1" />
  </ItemGroup>
</Project>
```
Basic NBomber Test
```csharp
using Microsoft.VisualStudio.TestTools.UnitTesting;
using NBomber.CSharp;
using NBomber.Http.CSharp;
using NBomber.Plugins.Network.Ping;

namespace ATP.Ingestion.LoadTests;

[TestClass]
public class IngestionLoadTests
{
    [TestMethod]
    public void LoadTest_Ingestion_Should_MeetSLO()
    {
        // Define scenario
        var scenario = Scenario.Create("ingestion_load", async context =>
        {
            // Generate test data
            var tenantId = $"tenant-{Random.Shared.Next(1, 10)}";
            var auditEvent = new
            {
                tenantId = tenantId,
                eventType = "user.action",
                timestamp = DateTime.UtcNow,
                actor = new { type = "user", id = $"user-{Random.Shared.Next(1, 1000)}" },
                target = new { type = "resource", id = $"resource-{Random.Shared.Next(1, 1000)}" },
                metadata = new { action = "view", ipAddress = "192.168.1.1" }
            };

            // Create HTTP request
            var request = Http.CreateRequest("POST", "https://gateway.atp.test/api/v1/audit/ingest")
                .WithHeader("Content-Type", "application/json")
                .WithHeader("Authorization", $"Bearer {Environment.GetEnvironmentVariable("TEST_TOKEN")}")
                .WithHeader("X-Tenant-Id", tenantId)
                .WithJsonBody(auditEvent);

            // Execute request
            var response = await Http.Send(request, context);

            // Return response for validation
            return response.IsSuccessStatusCode
                ? Response.Ok(statusCode: (int)response.StatusCode)
                : Response.Fail(statusCode: (int)response.StatusCode, error: response.StatusCode.ToString());
        })
        .WithWarmUpDuration(TimeSpan.FromSeconds(30)) // Warm-up
        .WithLoadSimulations(
            // Ramp-up pattern
            Simulation.InjectPerSec(rate: 10, during: TimeSpan.FromMinutes(1)),  // 10 req/sec for 1 min
            Simulation.InjectPerSec(rate: 50, during: TimeSpan.FromMinutes(3)),  // 50 req/sec for 3 min
            Simulation.InjectPerSec(rate: 100, during: TimeSpan.FromMinutes(5)), // 100 req/sec for 5 min
            Simulation.InjectPerSec(rate: 0, during: TimeSpan.FromSeconds(10))   // Ramp-down
        );

        // Configure NBomber
        var stats = NBomberRunner
            .RegisterScenarios(scenario)
            .WithWorkerPlugins(new PingPlugin())
            .Run();

        // Assert SLO thresholds
        var scenarioStats = stats.ScenarioStats[0];

        // P95 latency <500ms (SLO)
        Assert.IsTrue(
            scenarioStats.Ok.Latency.Percent95 < TimeSpan.FromMilliseconds(500),
            $"P95 latency {scenarioStats.Ok.Latency.Percent95.TotalMilliseconds}ms exceeds SLO 500ms");

        // Error rate <1%
        var errorRate = (scenarioStats.Fail.Request.Count * 100.0) / scenarioStats.AllRequestCount;
        Assert.IsTrue(
            errorRate < 1.0,
            $"Error rate {errorRate:F2}% exceeds SLO 1%");

        // Success rate >99%
        var successRate = (scenarioStats.Ok.Request.Count * 100.0) / scenarioStats.AllRequestCount;
        Assert.IsTrue(
            successRate > 99.0,
            $"Success rate {successRate:F2}% below SLO 99%");

        // Throughput validation
        var throughput = scenarioStats.Ok.Request.RPS;
        Assert.IsTrue(
            throughput > 80, // At least 80 req/sec achieved
            $"Throughput {throughput:F2} req/sec below target 80 req/sec");

        // Output summary
        Console.WriteLine($"Test completed:");
        Console.WriteLine($"  Total requests: {scenarioStats.AllRequestCount}");
        Console.WriteLine($"  Successful: {scenarioStats.Ok.Request.Count}");
        Console.WriteLine($"  Failed: {scenarioStats.Fail.Request.Count}");
        Console.WriteLine($"  P50 latency: {scenarioStats.Ok.Latency.Percent50.TotalMilliseconds}ms");
        Console.WriteLine($"  P95 latency: {scenarioStats.Ok.Latency.Percent95.TotalMilliseconds}ms");
        Console.WriteLine($"  P99 latency: {scenarioStats.Ok.Latency.Percent99.TotalMilliseconds}ms");
        Console.WriteLine($"  Throughput: {throughput:F2} req/sec");
    }
}
```
Code Examples: - Complete NBomber test suites - SLO validation patterns - Assertion helpers
Deliverables: - NBomber fundamentals guide - Test templates - SLO validation library
Topic 6: NBomber Advanced Features¶
What will be covered: - NBomber Scenarios & Simulations
```csharp
// Constant load
Simulation.InjectConstant(copies: 100, during: TimeSpan.FromMinutes(10))

// Ramp-up
Simulation.RampConstant(copies: 100, during: TimeSpan.FromMinutes(5))

// Injection rate
Simulation.InjectPerSec(rate: 50, during: TimeSpan.FromMinutes(10))

// Ramping injection rate
Simulation.RampPerSec(
    minRate: 10,
    maxRate: 100,
    during: TimeSpan.FromMinutes(5))

// Keep constant
Simulation.KeepConstant(copies: 50, during: TimeSpan.FromMinutes(10))
```
- Custom Metrics & Reporting
- gRPC Testing
Code Examples: - Advanced NBomber scenarios - Custom metrics - gRPC tests
Deliverables: - NBomber advanced guide
CYCLE 4: ATP Load Testing Scenarios (~5,500 lines)¶
Topic 7: Ingestion Load Testing¶
What will be covered: - Ingestion Load Test Scenarios
```javascript
// ingestion-scenarios.js
import http from 'k6/http';
import { check } from 'k6';
// generateAuditEvent() is the shared test-data helper (see Topic 3)

export const scenarios = {
  // Scenario 1: Normal ingestion load
  normal_load: {
    executor: 'ramping-arrival-rate',
    startRate: 10,
    timeUnit: '1s',
    preAllocatedVUs: 10,
    maxVUs: 200,
    stages: [
      { duration: '2m', target: 50 },  // 50 events/sec
      { duration: '5m', target: 50 },  // Sustain
      { duration: '2m', target: 100 }, // 100 events/sec
      { duration: '5m', target: 100 }, // Sustain
      { duration: '2m', target: 0 },   // Ramp-down
    ],
    tags: { scenario: 'normal' },
  },
  // Scenario 2: High-volume ingestion
  high_volume: {
    executor: 'constant-arrival-rate',
    rate: 500, // 500 events/sec (Enterprise edition target)
    timeUnit: '1s',
    duration: '10m',
    preAllocatedVUs: 100,
    maxVUs: 1000,
    tags: { scenario: 'high-volume' },
  },
  // Scenario 3: Batch ingestion
  batch_ingestion: {
    executor: 'constant-vus',
    vus: 50,
    duration: '10m',
    tags: { scenario: 'batch' },
  },
};

export default function () {
  // Generate batch of events
  const batchSize = __ENV.SCENARIO === 'batch' ? 100 : 1;
  const events = [];
  for (let i = 0; i < batchSize; i++) {
    events.push(generateAuditEvent());
  }

  const url = 'https://gateway.atp.test/api/v1/audit/ingest/batch';
  const response = http.post(url, JSON.stringify(events), {
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${__ENV.TEST_TOKEN}`,
    },
  });

  check(response, {
    'batch status is 200': (r) => r.status === 200,
    'batch processed all events': (r) => {
      const body = JSON.parse(r.body);
      return body.processed === batchSize;
    },
  });
}
```
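For the batch scenario, test data usually has to be split into fixed-size batches before posting. A small plain-JavaScript sketch (the 100-event batch size mirrors the scenario above; the helper itself is illustrative):

```javascript
// Split generated events into fixed-size batches for the batch endpoint.
// Illustrative helper; the batch size of 100 mirrors the scenario above.
function chunkEvents(events, batchSize = 100) {
  const batches = [];
  for (let i = 0; i < events.length; i += batchSize) {
    batches.push(events.slice(i, i + batchSize));
  }
  return batches;
}

const events = Array.from({ length: 250 }, (_, i) => ({ id: i }));
const batches = chunkEvents(events, 100);
console.log(batches.length);    // 3 batches
console.log(batches[2].length); // last batch holds the remaining 50 events
```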
Code Examples: - Complete ingestion scenarios (k6 + NBomber) - Batch ingestion tests - High-volume tests
Deliverables: - Ingestion load test suite
Topic 8: Query Load Testing¶
What will be covered: - Query Load Test Scenarios - Single query load - Complex query load - Search queries - Pagination scenarios - Multi-tenant queries
Code Examples: - Query load test scenarios
Deliverables: - Query load test suite
CYCLE 5: SLO Validation Testing (~4,500 lines)¶
Topic 9: SLO Validation Framework¶
What will be covered: - SLO Validation in Load Tests
```javascript
// slo-validation.js
export const options = {
  thresholds: {
    // Gateway SLOs
    'http_req_duration{endpoint:gateway}': [
      'p(95)<200', // P95 <200ms (read)
      'p(95)<300', // P95 <300ms (write)
    ],
    // Ingestion SLOs
    'http_req_duration{endpoint:ingest}': ['p(95)<500'], // P95 <500ms
    'ingestion_success': ['rate>0.999'], // >99.9%
    // Query SLOs
    'http_req_duration{endpoint:query}': ['p(95)<500'], // P95 <500ms
    'query_success': ['rate>0.999'], // >99.9%
    // Projection SLOs (custom metric)
    'projection_lag_seconds': ['p(95)<5', 'p(99)<10'],
  },
};
```
- Automated SLO Validation Gates
Code Examples: - SLO validation framework - Threshold configurations
Deliverables: - SLO validation guide
Topic 10: Performance Regression Testing¶
What will be covered: - Baseline Comparisons - Regression Detection - Performance Gates
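Baseline comparison can be as simple as flagging any metric that degrades beyond a tolerance relative to the stored baseline. A sketch of that check, assuming "higher is worse" metrics such as latency and error rate (metric names and the 10% tolerance are illustrative):

```javascript
// Flag metrics that regressed more than `tolerance` (a fraction) versus
// a stored baseline. Assumes higher values are worse (latency, error rate).
// Illustrative sketch; metric names and the 10% default are assumptions.
function detectRegressions(baseline, current, tolerance = 0.10) {
  const regressions = [];
  for (const [metric, baseValue] of Object.entries(baseline)) {
    const currentValue = current[metric];
    if (currentValue === undefined) continue; // metric not measured this run
    if (currentValue > baseValue * (1 + tolerance)) {
      regressions.push({ metric, baseValue, currentValue });
    }
  }
  return regressions;
}

const baseline = { p95_latency_ms: 420, error_rate: 0.004 };
const current = { p95_latency_ms: 510, error_rate: 0.003 };
console.log(detectRegressions(baseline, current));
// p95_latency_ms regressed: 510 > 420 * 1.10 (= 462); error_rate improved
```

A CI gate would fail the build when the returned list is non-empty, then update the baseline only on deliberate, reviewed changes.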
Deliverables: - Regression testing guide
CYCLE 6-16: Remaining Cycles¶
Coverage includes: - Stress Testing (breaking points) - Spike Testing (sudden surges) - Soak Testing (extended duration) - Capacity Planning & Scaling - Performance Benchmarking - Database Performance Testing - Messaging Performance Testing - Multi-Tenant Load Testing - CI/CD Integration - Performance Analysis & Optimization - Best Practices & Troubleshooting
Summary of Deliverables¶
Complete load testing implementation covering:
- Fundamentals: Philosophy, strategy, test types
- k6: JavaScript-based load testing (primary tool)
- NBomber: .NET native load testing (secondary tool)
- ATP Scenarios: Ingestion, query, export, projection
- SLO Validation: Automated SLO compliance testing
- Stress Testing: Breaking point discovery
- Spike Testing: Auto-scaling validation
- Soak Testing: Memory leak detection
- Capacity Planning: Maximum capacity analysis
- Benchmarking: Performance baselines
- Database Testing: Query performance, indexing
- Messaging Testing: Throughput, latency
- Multi-Tenant: Tenant isolation validation
- CI/CD: Automated performance gates
- Analysis: Bottleneck identification
- Operations: Best practices, troubleshooting
Related Documentation¶
- Testing Strategy: Overall testing approach
- Integration Testing: Real dependency testing
- Alerts & SLOs: SLO definitions and targets
- Limits & Quotas: Rate limits and quotas
- Monitoring: Performance monitoring
- CI/CD Pipeline: Load test integration
This load testing guide provides a complete implementation for ATP's performance validation: k6 and NBomber test scripts; load, stress, spike, and soak testing scenarios; SLO validation frameworks; capacity planning procedures; multi-tenant load testing; CI/CD integration; and performance analysis and optimization. Together, these ensure system reliability, scalability, and performance under expected and extreme workloads, with automated SLO compliance validation and regression detection.