Webhooks - Audit Trail Platform (ATP)

Push notifications as contracts — ATP's webhooks provide real-time, secure, and reliable outbound notifications with JSON Schema specifications, HMAC signature verification, and automatic retry for event-driven integrations.


📋 Documentation Generation Plan

This document will be generated in 6 cycles. Current progress:

| Cycle | Topics | Estimated Lines | Status |
|---|---|---|---|
| Cycle 1 | Webhook Fundamentals & Architecture (1-2) | ~2,000 | ⏳ Not Started |
| Cycle 2 | Event Notification Webhooks (3-4) | ~2,500 | ⏳ Not Started |
| Cycle 3 | Export & eDiscovery Webhooks (5-6) | ~2,000 | ⏳ Not Started |
| Cycle 4 | Policy & Alert Webhooks (7-8) | ~1,500 | ⏳ Not Started |
| Cycle 5 | Security & Reliability (9-10) | ~2,500 | ⏳ Not Started |
| Cycle 6 | Testing & Debugging (11-12) | ~2,000 | ⏳ Not Started |

Total Estimated Lines: ~12,500


Purpose & Scope

This document specifies every outbound webhook sent by the Audit Trail Platform (ATP). It defines webhook payloads, signature verification, retry policies, delivery guarantees, and consumer implementation patterns so that external systems can build secure, reliable, event-driven integrations.

Key Webhook Principles

  • Push-Based Notifications: ATP pushes events to consumer URLs (reverse of polling)
  • JSON Payloads: All webhooks use JSON format with JSON Schema validation
  • HMAC Signatures: Every webhook is signed with a shared secret for authenticity
  • Automatic Retry: Failed deliveries are retried with exponential backoff (up to 10 attempts)
  • Idempotent Delivery: Webhooks may be delivered multiple times (consumers must handle duplicates)
  • Timeout Protection: Webhook calls time out after 10 seconds
  • Dead-Letter Queue: Webhooks that still fail after the maximum number of attempts go to the DLQ for manual review
  • Audit Trail: All webhook deliveries are logged for compliance and debugging

What this document covers

  • Establish ATP's webhook architecture: Delivery service, retry mechanism, monitoring
  • Define event notification webhooks: Real-time notifications for audit events matching criteria
  • Specify export completion webhooks: Notifications when exports are ready for download
  • Document policy alert webhooks: Notifications for policy violations and compliance issues
  • Detail webhook security: HMAC-SHA256 signature verification, secret management, replay protection
  • Describe webhook reliability: Retry policies, exponential backoff, DLQ handling, delivery guarantees
  • Outline webhook configuration: Endpoint registration, filtering, authentication, headers
  • Specify webhook payloads: JSON Schema for all webhook types with examples
  • Document webhook verification: How consumers verify webhook authenticity
  • Detail webhook monitoring: Delivery metrics, failure tracking, alerting
  • Describe webhook testing: Local testing, replay, debugging tools
  • Outline webhook troubleshooting: Common issues and resolution procedures

Out of scope (referenced elsewhere)

Readers & ownership

  • Integration Engineers (owners): Webhook configuration, endpoint management, security
  • Partner Engineers: External system integration, webhook consumption
  • Backend Developers: Webhook implementation, delivery service, retry logic
  • Security Engineers: Signature verification, secret management, security review
  • QA/Test Engineers: Webhook testing, replay testing, failure simulation
  • Operations/SRE: Webhook monitoring, DLQ management, incident response

Artifacts produced

  • Webhook Specifications: JSON Schema for all webhook payload types
  • Webhook Catalog: All webhook types with purposes and schemas
  • Signature Verification Guide: HMAC-SHA256 verification implementation in multiple languages
  • Webhook Configuration API: Endpoints for registering and managing webhooks
  • Retry Policy Documentation: Retry schedule, backoff algorithm, max attempts
  • Webhook Examples: Sample payloads for all webhook types
  • Consumer Implementation Guide: How to receive, verify, and process webhooks
  • Testing Tools: Webhook replay tool, signature generator, mock receiver
  • Monitoring Dashboards: Delivery success rate, retry metrics, DLQ size
  • Troubleshooting Guide: Common webhook issues and debugging procedures

Acceptance (done when)

  • All webhook types are documented with complete JSON Schema specifications
  • Signature verification is documented with implementation examples in 3+ languages
  • Retry policies are specified with schedule, backoff, and max attempts
  • Configuration API is documented for webhook registration and management
  • Delivery guarantees are specified (at-least-once, timeout, retry)
  • Security best practices are documented (signature verification, secret rotation, HTTPS)
  • Consumer implementation guide provides step-by-step webhook handling
  • Testing tools are available for local webhook development
  • Monitoring is operational with delivery metrics and DLQ alerts
  • Troubleshooting guide covers common issues and resolutions
  • Code examples show webhook consumption in multiple languages
  • Documentation complete with schemas, examples, diagrams, and best practices

Detailed Cycle Plan

CYCLE 1: Webhook Fundamentals & Architecture (~2,000 lines)

Topic 1: Webhook Fundamentals

What will be covered:

  • What are Webhooks?
  • HTTP callbacks for event notifications
  • Push-based (reverse of polling)
  • Real-time event delivery to consumer URLs
  • Common in SaaS integrations (GitHub, Stripe, Slack)

  • Why Webhooks for ATP?

  • Real-time notifications (no polling required)
  • Reduced load on query APIs
  • Event-driven integrations with external systems
  • Compliance notifications (export ready, policy violations)
  • Operational alerts (system events, errors)

  • Webhooks vs Other Integration Patterns

  • Webhooks (Push): Server pushes to client URL
  • Polling (Pull): Client repeatedly queries server
  • WebSockets: Bidirectional, persistent connection
  • Server-Sent Events: Server streams events to the client over a single long-lived HTTP connection (one-way)
  • Message Queue: Producer and consumer exchange messages through a shared queue

| Pattern | Real-Time | Complexity | ATP Use Case |
|---|---|---|---|
| Webhooks | High | Medium | Event notifications, exports |
| Polling | Low | Low | Status checks (fallback) |
| WebSockets | Highest | High | Not used (overkill) |
| SSE | High | Medium | Not used (limited browser support) |
| Message Queue | High | High | Internal (not external integrations) |

  • ATP Webhook Use Cases
  • Event Notifications: Notify when specific events occur
  • Export Completion: Notify when export is ready for download
  • Policy Violations: Alert on compliance policy breaches
  • Threshold Alerts: Notify when event volume exceeds threshold
  • System Events: Service health changes, maintenance notifications

  • Webhook Lifecycle

  • Registration: Consumer registers webhook endpoint with ATP
  • Configuration: Set filters, event types, authentication
  • Event Trigger: ATP event matches webhook criteria
  • Delivery Attempt: ATP POSTs to consumer URL
  • Verification: Consumer verifies signature
  • Processing: Consumer processes webhook
  • Acknowledgment: Consumer returns 2xx status
  • Retry (if failed): ATP retries with backoff
  • DLQ (if max retries): Webhook sent to dead-letter queue
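
The last three lifecycle steps (acknowledgment, retry, DLQ) amount to one decision per delivery attempt. The sketch below is illustrative only: `Outcome` and `next_step` are hypothetical names, and it assumes that any non-2xx response or timeout counts as a failure, as the lifecycle above describes.

```python
from enum import Enum
from typing import Optional

MAX_ATTEMPTS = 10  # "up to 10 attempts" from this plan


class Outcome(Enum):
    ACK = "ack"      # consumer returned 2xx; delivery is done
    RETRY = "retry"  # schedule another attempt with backoff
    DLQ = "dlq"      # attempts exhausted; park for manual review


def next_step(status_code: Optional[int], attempt: int) -> Outcome:
    """Decide what happens after one delivery attempt.

    status_code is None when the request timed out or the connection failed.
    attempt is 1-based, matching the X-ATP-Attempt header.
    """
    if status_code is not None and 200 <= status_code < 300:
        return Outcome.ACK
    if attempt >= MAX_ATTEMPTS:
        return Outcome.DLQ
    return Outcome.RETRY
```

For example, `next_step(503, 1)` yields `Outcome.RETRY`, while a timeout on the tenth attempt (`next_step(None, 10)`) yields `Outcome.DLQ`.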

  • Webhook Guarantees

  • At-Least-Once Delivery: Webhooks delivered at least once (may duplicate)
  • Ordering: No ordering guarantee across different events
  • Timeout: 10 seconds per attempt
  • Retry: Up to 10 attempts with exponential backoff
  • Idempotency: Consumers must handle duplicates

Code Examples: - Webhook lifecycle code - Registration request example - Simple webhook receiver (C#, Node.js, Python)

Diagrams: - Webhook pattern overview - Webhooks vs polling comparison - Webhook lifecycle flow - ATP webhook architecture

Deliverables: - Webhook fundamentals guide - Use case documentation - Lifecycle specification - Comparison with alternatives


Topic 2: ATP Webhook Architecture

What will be covered:

  • Webhook Delivery Service
  • Dedicated service for webhook delivery
  • Queue-based processing (Azure Service Bus)
  • Worker pool for concurrent deliveries
  • Retry management
  • DLQ handling

  • Webhook Architecture Components

    Event Trigger (ATP Service)
    Webhook Router (matches subscriptions)
    Webhook Queue (Azure Service Bus)
    Webhook Delivery Workers (concurrent processors)
    HTTP POST to Consumer URL
    Response Validation
    Success → Acknowledge | Failure → Retry/DLQ
    

  • Webhook Subscription Model

  • Webhook Subscription: Consumer configuration
  • Endpoint URL: Where to POST (HTTPS required)
  • Event Filters: Which events trigger webhook
  • Secret: Shared secret for HMAC signature
  • Headers: Custom headers to include
  • Status: Active, Paused, Failed

  • Webhook Types in ATP

| Webhook Type | Event Trigger | Use Case | Frequency |
|---|---|---|---|
| Event Notification | Audit event matches filter | Real-time event monitoring | High volume |
| Export Completion | Export ready for download | eDiscovery workflow | Low volume |
| Policy Violation | Event violates policy | Compliance alerts | Medium volume |
| Threshold Alert | Metric exceeds threshold | Operational monitoring | Low volume |
| System Event | Service status change | Infrastructure monitoring | Very low |

  • Webhook Delivery Flow

    sequenceDiagram
      participant ATP as ATP Service
      participant Router as Webhook Router
      participant Queue as Delivery Queue
      participant Worker as Delivery Worker
      participant Consumer as Consumer URL
    
      ATP->>Router: Event occurred
      Router->>Router: Match subscriptions
      Router->>Queue: Enqueue webhook delivery
      Queue->>Worker: Dequeue delivery task
      Worker->>Worker: Generate HMAC signature
      Worker->>Consumer: POST webhook payload
      alt Success (2xx)
        Consumer-->>Worker: 200 OK
        Worker->>Queue: Acknowledge
      else Failure (4xx, 5xx, timeout)
        Consumer-->>Worker: Error or timeout
        Worker->>Queue: Nack (retry)
      end

  • Retry Policy

  • Retry Schedule: Exponential backoff
    • Attempt 1: Immediate
    • Attempt 2: 30 seconds
    • Attempt 3: 5 minutes
    • Attempt 4: 15 minutes
    • Attempt 5: 1 hour
    • Attempt 6-10: 1 hour each
  • Max Attempts: 10 (the initial delivery plus up to 9 retries)
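  • Backoff Algorithm: delay = min(initial_delay * 2^attempt, max_delay), with max_delay = 1 hour. The schedule listed above grows faster than strict doubling (30 seconds, 5 minutes, 15 minutes), so the listed delays are the reference until the formula and the schedule are reconciled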
  • Jitter: ±20% random jitter to prevent thundering herd
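
As a rough illustration of the retry policy, the sketch below looks up the delay before each attempt from the listed schedule and applies ±20% jitter. It follows the fixed schedule rather than the doubling formula; `base_delay` and `jittered_delay` are hypothetical names, not part of ATP.

```python
import random
from typing import Optional

# Delay in seconds before attempts 1..5, copied from the schedule above.
# Attempts 6-10 reuse the last entry (1 hour).
SCHEDULE_SECONDS = [0, 30, 5 * 60, 15 * 60, 60 * 60]
MAX_ATTEMPTS = 10
JITTER = 0.20  # +/-20%


def base_delay(attempt: int) -> int:
    """Return the un-jittered delay (seconds) before a 1-based attempt."""
    if not 1 <= attempt <= MAX_ATTEMPTS:
        raise ValueError(f"attempt must be between 1 and {MAX_ATTEMPTS}")
    index = min(attempt, len(SCHEDULE_SECONDS)) - 1
    return SCHEDULE_SECONDS[index]


def jittered_delay(attempt: int, rng: Optional[random.Random] = None) -> float:
    """Return the delay with random jitter so retries do not synchronize."""
    rng = rng if rng is not None else random.Random()
    return base_delay(attempt) * rng.uniform(1 - JITTER, 1 + JITTER)
```

Summing `base_delay` over attempts 1 through 10 gives 22,830 seconds, that is 6 hours 20 minutes 30 seconds before a webhook reaches the DLQ when jitter is ignored.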

  • Dead-Letter Queue (DLQ)

  • Webhooks sent to DLQ after max retries
  • DLQ monitoring and alerting
  • Manual review and reprocessing
  • DLQ retention: 7 days

  • Webhook Monitoring

  • Delivery success rate per webhook
  • Average delivery latency
  • Retry count distribution
  • DLQ message count
  • Consumer response times

Code Examples: - Webhook subscription configuration - Delivery worker implementation - Retry policy code - DLQ processor - Monitoring queries

Diagrams: - Webhook architecture - Delivery flow (sequence diagram) - Retry mechanism - DLQ handling - Monitoring dashboard

Deliverables: - Webhook architecture specification - Delivery service design - Retry policy implementation - DLQ management guide - Monitoring setup


CYCLE 2: Event Notification Webhooks (~2,500 lines)

Topic 3: Event Notification Webhook Specification

What will be covered:

  • Event Notification Webhook Overview
  • Purpose: Real-time notifications when audit events match criteria
  • Use cases: SIEM integration, alerting, workflow automation
  • Filtering: By actor, action, resource, classification
  • Volume: Potentially high (1000s per second)

  • Webhook Registration

  • POST /api/v1/webhooks/subscriptions
  • Request schema:

    {
      "name": "Security Events to SIEM",
      "url": "https://siem.example.com/atp/webhooks",
      "secret": "shared-secret-for-hmac",
      "eventTypes": ["AuditEventReceived"],
      "filters": {
        "classifications": ["Confidential", "Secret"],
        "actions": ["User.Login", "User.Logout", "Data.Access"]
      },
      "headers": {
        "X-Custom-Header": "value"
      },
      "active": true
    }
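
A consumer could submit the registration request with any HTTP client. The sketch below only builds the request using the Python standard library. The base URL `https://atp.example.com` and the bearer-token header are assumptions for illustration; this plan does not yet specify the API host or how the configuration API authenticates callers.

```python
import json
import urllib.request

# Hypothetical base URL; replace with the real ATP API host.
ATP_BASE_URL = "https://atp.example.com"


def build_subscription_request(subscription: dict, bearer_token: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the subscriptions endpoint."""
    body = json.dumps(subscription).encode("utf-8")
    return urllib.request.Request(
        url=f"{ATP_BASE_URL}/api/v1/webhooks/subscriptions",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumed authentication scheme, not confirmed by this plan.
            "Authorization": f"Bearer {bearer_token}",
        },
    )
```

Passing the returned object to `urllib.request.urlopen(request, timeout=10)` would send it.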
    

  • Event Notification Payload Schema

    EventNotificationWebhook:
      type: object
      required:
        - webhookId
        - webhookType
        - timestamp
        - event
        - signature
      properties:
        webhookId:
          type: string
          description: Unique webhook delivery identifier (opaque string)
          example: "webhook-abc-123-def-456"
        webhookType:
          type: string
          enum: [event.notification]
          description: Type of webhook
        timestamp:
          type: string
          format: date-time
          description: When webhook was generated (UTC)
          example: "2024-10-30T10:30:00.123Z"
        event:
          type: object
          description: The audit event that triggered the webhook
          properties:
            eventId:
              type: string
              description: Unique event identifier
            tenantId:
              type: string
              description: Tenant identifier
            timestamp:
              type: string
              format: date-time
            actor:
              type: object
              description: Who performed the action
            action:
              type: string
              description: What action was performed
            resource:
              type: object
              description: What resource was affected
            classification:
              type: string
              enum: [Public, Internal, Confidential, Secret]
            payload:
              type: object
              description: Event-specific data (may be redacted)
        subscription:
          type: object
          description: Webhook subscription details
          properties:
            subscriptionId:
              type: string
            subscriptionName:
              type: string
        signature:
          type: string
          description: HMAC-SHA256 signature for verification
          example: "sha256=abc123def456..."
        attemptNumber:
          type: integer
          description: Delivery attempt number (1 for first attempt)
          minimum: 1
          maximum: 10
    

  • Payload Example (Complete)

    {
      "webhookId": "webhook-abc-123-def-456",
      "webhookType": "event.notification",
      "timestamp": "2024-10-30T10:30:00.123Z",
      "event": {
        "eventId": "01HQZXYZ123456789ABCDEF",
        "tenantId": "tenant-abc-123",
        "timestamp": "2024-10-30T10:29:58.000Z",
        "actor": {
          "userId": "user-123",
          "userName": "alice@example.com",
          "actorType": "User"
        },
        "action": "Data.Access",
        "resource": {
          "resourceType": "Document",
          "resourceId": "doc-456",
          "resourceName": "Confidential Report"
        },
        "classification": "Confidential",
        "context": {
          "ipAddress": "192.168.1.100",
          "userAgent": "Mozilla/5.0..."
        },
        "payload": {
          "accessType": "Read",
          "documentSize": 1024000
        }
      },
      "subscription": {
        "subscriptionId": "sub-xyz-789",
        "subscriptionName": "Security Events to SIEM"
      },
      "signature": "sha256=a1b2c3d4e5f6...",
      "attemptNumber": 1
    }
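
Before processing, a consumer may want a cheap structural check on the parsed body. The sketch below covers only the required fields and the `attemptNumber` range from the schema above; it is not a substitute for full JSON Schema validation, and `validate_event_notification` is a hypothetical helper name.

```python
REQUIRED_FIELDS = ("webhookId", "webhookType", "timestamp", "event", "signature")


def validate_event_notification(payload: dict) -> list:
    """Return a list of problems; an empty list means the basic checks passed."""
    problems = [
        f"missing required field: {name}"
        for name in REQUIRED_FIELDS
        if name not in payload
    ]
    if "webhookType" in payload and payload["webhookType"] != "event.notification":
        problems.append("webhookType must be 'event.notification'")
    if "event" in payload and not isinstance(payload["event"], dict):
        problems.append("event must be an object")
    attempt = payload.get("attemptNumber")
    if attempt is not None:
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_int = isinstance(attempt, int) and not isinstance(attempt, bool)
        if not is_int or not 1 <= attempt <= 10:
            problems.append("attemptNumber must be an integer from 1 to 10")
    return problems
```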
    

  • HTTP Request Format

    POST /atp/webhooks HTTP/1.1
    Host: siem.example.com
    Content-Type: application/json
    X-ATP-Webhook-Id: webhook-abc-123-def-456
    X-ATP-Webhook-Type: event.notification
    X-ATP-Signature: sha256=a1b2c3d4e5f6...
    X-ATP-Timestamp: 2024-10-30T10:30:00.123Z
    X-ATP-Attempt: 1
    User-Agent: ATP-Webhook-Delivery/1.0
    
    { ... payload ... }
    

  • Expected Consumer Response

  • Success: 200 OK, 201 Created, 202 Accepted, 204 No Content
  • Failure: Any other status triggers retry
  • Timeout: No response within 10 seconds triggers retry
  • Response Body: Ignored (only status code matters)

  • Filtering Options

  • By event type (AuditEventReceived, EventStreamSealed)
  • By classification (Public, Internal, Confidential, Secret)
  • By action (User.Login, Data.Access, Document.Delete)
  • By actor (specific users or services)
  • By resource type (Document, Database, API)
  • By custom payload fields (JSON path expressions)

Code Examples: - Complete webhook payload (JSON) - Webhook subscription registration (cURL, C#) - Webhook receiver implementation (C#, Node.js, Python) - Signature verification code (all languages) - Filter configuration examples

Diagrams: - Event notification flow - Subscription matching logic - Delivery sequence diagram - Filter evaluation

Deliverables: - Event notification webhook spec - JSON Schema definition - Subscription API documentation - Consumer implementation guide - Filter syntax reference


Topic 4: Event Filtering and Transformation

What will be covered:

  • Filter Syntax
  • Simple filters (field equality)
  • Complex filters (AND, OR, NOT logic)
  • JSON path for nested fields
  • Regular expressions
  • Filter validation

  • Filter Examples

    {
      "filters": {
        "classifications": ["Confidential", "Secret"],
        "actions": ["Data.Access", "Data.Export"],
        "actor.userId": "user-123",
        "resource.resourceType": "Document",
        "payload.documentSize": { "gt": 1000000 }
      }
    }
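
One way to read the filter example is as a conjunction: every key must match for the event to trigger the webhook. The sketch below evaluates such a filter under several assumptions that this plan does not confirm: list values mean "any of", the plural keys `classifications` and `actions` map to the singular event fields, dotted keys are paths into the event, and `gt` is the only comparison operator.

```python
from typing import Any

# Assumed mapping from plural filter keys to event field names.
PLURAL_KEYS = {"classifications": "classification", "actions": "action"}

_MISSING = object()


def lookup(event: dict, path: str) -> Any:
    """Follow a dotted path such as 'actor.userId' into a nested dict."""
    current: Any = event
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return _MISSING
        current = current[part]
    return current


def matches(event: dict, filters: dict) -> bool:
    """Return True only if the event satisfies every filter entry."""
    for key, expected in filters.items():
        actual = lookup(event, PLURAL_KEYS.get(key, key))
        if actual is _MISSING:
            return False
        if isinstance(expected, list):  # any-of match
            if actual not in expected:
                return False
        elif isinstance(expected, dict):  # comparison operator
            if set(expected) != {"gt"}:
                raise ValueError(f"unsupported operator in filter '{key}'")
            if not isinstance(actual, (int, float)) or not actual > expected["gt"]:
                return False
        elif actual != expected:  # plain equality
            return False
    return True
```

With the filter above, the sample event from Topic 3 (a `Data.Access` by `user-123` on a `Document` of 1,024,000 bytes) would match, while the same event with a smaller `documentSize` would not.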
    

  • Payload Transformation

  • Include/exclude fields
  • Redaction of sensitive data
  • Format transformation (JSON → XML)
  • Custom payload templates

  • Rate Limiting for Webhooks

  • Max webhook deliveries per consumer: 100/second
  • Backpressure handling
  • Queue depth limits
  • Throttling notifications

Code Examples: - Filter definitions - Transformation templates - Rate limiting configuration

Diagrams: - Filter evaluation pipeline - Transformation flow - Rate limiting mechanism

Deliverables: - Filter syntax guide - Transformation documentation - Rate limiting specification


CYCLE 3: Export & eDiscovery Webhooks (~2,000 lines)

Topic 5: Export Completion Webhook

What will be covered:

  • Export Completion Webhook Overview
  • Purpose: Notify when export is ready for download
  • Use case: Automated eDiscovery workflows
  • Trigger: ExportCompleted event
  • Volume: Low (exports are infrequent)

  • Payload Schema

    ExportCompletionWebhook:
      type: object
      required:
        - webhookId
        - webhookType
        - timestamp
        - export
        - signature
      properties:
        webhookId:
          type: string
        webhookType:
          type: string
          enum: [export.completed]
        timestamp:
          type: string
          format: date-time
        export:
          type: object
          required:
            - exportId
            - status
            - requestedAt
            - completedAt
          properties:
            exportId:
              type: string
              description: Export request identifier
            status:
              type: string
              enum: [Completed, Failed]
            requestedAt:
              type: string
              format: date-time
            completedAt:
              type: string
              format: date-time
            eventCount:
              type: integer
              description: Number of events in export
            fileSize:
              type: integer
              description: Export file size in bytes
            format:
              type: string
              enum: [JSON, CSV, XML]
            downloadUrl:
              type: string
              format: uri
              description: Signed URL for download (expires in 24 hours)
            expiresAt:
              type: string
              format: date-time
              description: When download URL expires
        signature:
          type: string
          description: HMAC signature
    

  • Payload Example

    {
      "webhookId": "webhook-export-123",
      "webhookType": "export.completed",
      "timestamp": "2024-10-30T11:00:00.123Z",
      "export": {
        "exportId": "export-xyz-789",
        "status": "Completed",
        "requestedAt": "2024-10-30T10:00:00.000Z",
        "completedAt": "2024-10-30T11:00:00.000Z",
        "eventCount": 15000,
        "fileSize": 25600000,
        "format": "JSON",
        "downloadUrl": "https://exports.atp.example.com/...",
        "expiresAt": "2024-10-31T11:00:00.000Z"
      },
      "signature": "sha256=xyz123..."
    }
    

  • Consumer Workflow

  • Receive webhook
  • Verify signature
  • Extract downloadUrl
  • Download export file
  • Process export data
  • Acknowledge webhook (200 OK)
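
After signature verification, the consumer needs to decide whether the download link is still usable. The sketch below handles only that step and leaves the download itself to the caller; `usable_download_url` is a hypothetical helper, and rejecting non-HTTPS URLs follows the security considerations in this topic.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_utc(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as '2024-10-31T11:00:00.000Z'."""
    # datetime.fromisoformat() before Python 3.11 rejects a trailing "Z".
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def usable_download_url(payload: dict, now: Optional[datetime] = None) -> Optional[str]:
    """Return the download URL if the export completed and the link has not expired."""
    export = payload.get("export") or {}
    if payload.get("webhookType") != "export.completed":
        return None
    if export.get("status") != "Completed":
        return None
    url = export.get("downloadUrl")
    expires_at = export.get("expiresAt")
    if not url or not expires_at or not url.startswith("https://"):
        return None
    now = now if now is not None else datetime.now(timezone.utc)
    if parse_utc(expires_at) <= now:
        return None
    return url
```

Returning `None` rather than raising lets the handler still acknowledge the webhook with a 2xx status and log the reason, which avoids pointless retries for a link that has already expired.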

  • Security Considerations

  • Download URL is time-limited (24 hours)
  • URL is signed (Azure Blob SAS token)
  • HTTPS required for download
  • Webhook signature verification required

Code Examples: - Export completion webhook payload - Consumer implementation (download export) - Signature verification - SAS token handling

Diagrams: - Export completion flow - Download workflow - Security model

Deliverables: - Export webhook specification - Consumer workflow guide - Security documentation - Implementation examples


Topic 6: Export Failure Webhook

What will be covered:

  • Export Failure Webhook
  • Purpose: Notify when export fails
  • Payload includes error details
  • Consumer can retry or investigate

  • Payload Schema

    {
      "webhookType": "export.failed",
      "export": {
        "exportId": "export-xyz-789",
        "status": "Failed",
        "failureReason": "Query timeout exceeded",
        "errorDetails": "...",
        "retriable": true
      }
    }
    

Complete specification


CYCLE 4: Policy & Alert Webhooks (~1,500 lines)

Topic 7: Policy Violation Webhook

What will be covered:

  • Policy Violation Webhook
  • Purpose: Alert on compliance policy breaches
  • Triggers: Event violates classification, retention, access policy
  • Payload: Event details, policy violated, violation type

  • Payload Schema

    {
      "webhookType": "policy.violated",
      "violation": {
        "violationId": "viol-123",
        "eventId": "01HQZ...",
        "policyId": "policy-456",
        "policyName": "Confidential Data Access Policy",
        "violationType": "UnauthorizedAccess",
        "severity": "High",
        "detectedAt": "2024-10-30T10:30:00Z"
      }
    }
    

Complete specification


Topic 8: Threshold and System Alert Webhooks

What will be covered:

  • Threshold Alert Webhook
  • Event volume exceeds threshold
  • Error rate threshold exceeded
  • Storage quota warnings

  • System Event Webhook

  • Service health degradation
  • Maintenance notifications
  • System updates

Complete specifications


CYCLE 5: Security & Reliability (~2,500 lines)

Topic 9: Webhook Security

What will be covered:

  • HMAC Signature Verification
  • Algorithm: HMAC-SHA256
  • Signature Format: sha256={hex_digest}
  • Header: X-ATP-Signature
  • Payload: Request body (raw bytes)
  • Secret: Shared secret from webhook registration

  • Signature Generation (ATP Side)

    import hmac
    import hashlib
    
    def generate_signature(payload: bytes, secret: str) -> str:
        signature = hmac.new(
            key=secret.encode('utf-8'),
            msg=payload,
            digestmod=hashlib.sha256
        ).hexdigest()
        return f"sha256={signature}"
    

  • Signature Verification (Consumer Side)

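    using System.Security.Cryptography;
    using System.Text;

    // Verifies the X-ATP-Signature header value against the request body.
    // payload must be the body exactly as received (no re-serialization).
    public bool VerifyWebhookSignature(
        string payload,
        string receivedSignature,
        string secret)
    {
        // A missing header can never match; also avoids passing null to GetBytes.
        if (string.IsNullOrEmpty(receivedSignature))
        {
            return false;
        }

        var expectedSignature = GenerateHmacSignature(payload, secret);

        // FixedTimeEquals compares in constant time and returns false
        // when the two inputs differ in length.
        return CryptographicOperations.FixedTimeEquals(
            Encoding.UTF8.GetBytes(expectedSignature),
            Encoding.UTF8.GetBytes(receivedSignature)
        );
    }

    // Builds the header value: "sha256=" followed by the lowercase hex digest.
    private string GenerateHmacSignature(string payload, string secret)
    {
        using var hmac = new HMACSHA256(Encoding.UTF8.GetBytes(secret));
        var hash = hmac.ComputeHash(Encoding.UTF8.GetBytes(payload));
        return $"sha256={BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant()}";
    }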
    

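    const crypto = require('crypto');

    // payload: raw request body (string or Buffer), exactly as received.
    function verifyWebhookSignature(payload, receivedSignature, secret) {
        const expectedSignature = crypto
            .createHmac('sha256', secret)
            .update(payload)
            .digest('hex');

        const expected = Buffer.from(`sha256=${expectedSignature}`);
        const received = Buffer.from(receivedSignature || '');

        // timingSafeEqual throws if the buffers differ in length,
        // so reject mismatched lengths before comparing.
        if (expected.length !== received.length) {
            return false;
        }
        return crypto.timingSafeEqual(expected, received);
    }
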
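
For consumers written in Python, the same check can be sketched with the standard library. The function name `verify_signature` is illustrative; the header format `sha256={hex_digest}` and the use of the raw request body follow the description in this topic.

```python
import hashlib
import hmac


def verify_signature(raw_body: bytes, received_signature: str, secret: str) -> bool:
    """Check an X-ATP-Signature header value against the raw request body."""
    digest = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    expected = f"sha256={digest}".encode("utf-8")
    received = (received_signature or "").encode("utf-8")
    # compare_digest runs in constant time and returns False on length mismatch.
    return hmac.compare_digest(expected, received)
```

Verification should run on the bytes read from the socket, before any JSON parsing, because re-serializing the body can change whitespace or key order and invalidate the digest.
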
  • Security Best Practices
  • Always verify signature: Reject unsigned or invalid signatures
  • Use HTTPS only: Never accept webhooks over HTTP
  • Timing-safe comparison: Prevent timing attacks
  • Rotate secrets regularly: Every 90 days
  • Validate timestamp: Reject old webhooks (> 5 minutes)
  • Log all webhooks: For audit and debugging

  • Replay Attack Prevention

  • Timestamp validation (reject old webhooks)
  • Webhook ID deduplication (track processed IDs)
  • Nonce or sequence number (optional)
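
Timestamp validation can be sketched as a freshness check on the `X-ATP-Timestamp` header. The 5-minute window comes from the best practices above; tolerating the same amount of clock skew into the future is an assumption, and `is_fresh` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_AGE = timedelta(minutes=5)


def is_fresh(
    timestamp_header: str,
    now: Optional[datetime] = None,
    max_age: timedelta = MAX_AGE,
) -> bool:
    """Return True if the webhook timestamp is within max_age of the current time."""
    try:
        # datetime.fromisoformat() before Python 3.11 rejects a trailing "Z".
        sent_at = datetime.fromisoformat(timestamp_header.replace("Z", "+00:00"))
    except (ValueError, AttributeError):
        return False  # unparseable or missing header
    if sent_at.tzinfo is None:
        return False  # refuse timestamps without an explicit offset
    now = now if now is not None else datetime.now(timezone.utc)
    age = now - sent_at
    return -max_age <= age <= max_age
```

This plan does not say whether `X-ATP-Timestamp` is refreshed on each retry. If it reflects the original generation time, attempts later in the retry schedule would fall outside a 5-minute window, so the window and the retry schedule need to be considered together.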

  • Secret Management

  • Secret stored in Azure Key Vault
  • Rotation procedure
  • Secret versioning
  • Access audit logging

Code Examples: - Signature generation (Python, C#, JavaScript, Go) - Signature verification (all languages) - Timestamp validation - Deduplication logic - Secret rotation procedure

Diagrams: - HMAC signature flow - Security verification pipeline - Replay attack prevention - Secret rotation workflow

Deliverables: - Security implementation guide - Signature verification code (4 languages) - Best practices checklist - Secret management procedures


Topic 10: Webhook Reliability

What will be covered:

  • Delivery Guarantees
  • At-least-once delivery
  • Retry on failure
  • Timeout handling
  • Circuit breaker (pause failing webhooks)

  • Retry Configuration

  • Retry schedule (exponential backoff)
  • Max retry attempts (10)
  • Backoff algorithm with jitter
  • Retry status codes (5xx, timeout)
  • No retry status codes (4xx client errors)

  • Timeout Handling

  • Default timeout: 10 seconds
  • Configurable per webhook
  • Timeout counts as failure (triggers retry)

  • Circuit Breaker

  • Failure threshold: 10 consecutive failures
  • Circuit opens: Webhook paused
  • Cooldown period: 1 hour
  • Circuit half-open: Test with single delivery
  • Circuit closes: Resume normal delivery
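
The circuit breaker states above form a small state machine. The sketch below uses the listed threshold (10 consecutive failures) and cooldown (1 hour); the class and method names are hypothetical, and a real delivery service would persist this state per subscription.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    CLOSED, OPEN, HALF_OPEN = "closed", "open", "half-open"

    def __init__(
        self,
        failure_threshold: int = 10,
        cooldown_seconds: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self.clock = clock
        self.state = self.CLOSED
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow_delivery(self) -> bool:
        """Return True if a delivery may be attempted right now."""
        if self.state == self.OPEN:
            if self.clock() - self.opened_at >= self.cooldown_seconds:
                self.state = self.HALF_OPEN  # let exactly one probe through
                return True
            return False
        if self.state == self.HALF_OPEN:
            return False  # a probe is already in flight
        return True

    def record_success(self) -> None:
        """A 2xx response closes the circuit and resets the failure count."""
        self.state = self.CLOSED
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        """A failed probe, or the Nth consecutive failure, opens the circuit."""
        self.failures += 1
        if self.state == self.HALF_OPEN or self.failures >= self.failure_threshold:
            self.state = self.OPEN
            self.opened_at = self.clock()
```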

  • Idempotency Requirements

  • Consumers must handle duplicate deliveries
  • Use webhookId for deduplication
  • Idempotency window: 24 hours
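
Deduplication by `webhookId` within a 24-hour window can be sketched with an in-memory map of identifiers to expiry times. This is a single-process illustration under that assumption; a consumer running several instances would need a shared store instead. `SeenWebhooks` is a hypothetical class name.

```python
import time
from typing import Callable, Dict


class SeenWebhooks:
    """Remember processed webhook IDs for a fixed idempotency window."""

    def __init__(
        self,
        window_seconds: float = 24 * 3600,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.window_seconds = window_seconds
        self.clock = clock
        self._expires: Dict[str, float] = {}  # webhookId -> expiry time

    def first_time(self, webhook_id: str) -> bool:
        """Return True the first time an ID is seen inside the window, else False."""
        now = self.clock()
        # Drop entries whose window has passed so the map does not grow forever.
        self._expires = {k: v for k, v in self._expires.items() if v > now}
        if webhook_id in self._expires:
            return False
        self._expires[webhook_id] = now + self.window_seconds
        return True
```

A handler would call `first_time(webhook_id)` after verifying the signature, process the payload only when it returns `True`, and return a 2xx status in both cases so that ATP stops retrying a duplicate.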

  • Delivery Acknowledgment

  • Consumer returns 2xx status (success)
  • Any other response triggers retry
  • Response body ignored

  • Dead-Letter Queue (DLQ)

  • After 10 failed attempts, webhook goes to DLQ
  • DLQ monitoring dashboard
  • Manual reprocessing procedures
  • Root cause analysis

Code Examples: - Retry policy implementation - Circuit breaker code - Idempotent webhook handler - DLQ processor - Timeout configuration

Diagrams: - Retry mechanism with backoff - Circuit breaker state machine - DLQ workflow - Idempotency check flow

Deliverables: - Reliability implementation - Retry policy configuration - Circuit breaker setup - DLQ management guide


CYCLE 6: Testing & Debugging (~2,000 lines)

Topic 11: Webhook Testing

What will be covered:

  • Local Webhook Testing
  • Tools: ngrok, localtunnel, webhook.site
  • Exposing localhost to internet
  • Testing signature verification
  • Testing retry behavior

  • Webhook Replay Tool

  • Replaying production webhooks in test
  • Replay from DLQ
  • Replay specific webhook by ID
  • Replay with modified payload

  • Mock Webhook Receiver

  • Test server for webhook development
  • Automatic signature verification
  • Payload logging
  • Response simulation (success, failure, timeout)

  • Webhook Contract Tests

  • Validate payload matches schema
  • Test signature generation
  • Test retry logic
  • Test timeout handling

  • Integration Testing

  • End-to-end webhook delivery tests
  • Test all webhook types
  • Test failure scenarios
  • Test DLQ processing

Code Examples: - Webhook replay script - Mock receiver implementation - Contract test examples - Integration test suite - ngrok setup guide

Diagrams: - Local testing setup - Replay tool architecture - Mock receiver flow - Testing strategy

Deliverables: - Webhook testing guide - Replay tool - Mock receiver - Test framework - Integration test suite


Topic 12: Webhook Debugging and Troubleshooting

What will be covered:

  • Common Webhook Issues

  1. Signature Verification Failures
    • Issue: Consumer rejects webhook (invalid signature)
    • Causes: Wrong secret, encoding mismatch, timestamp drift
    • Resolution: Verify secret, check encoding (UTF-8), sync clocks

  2. Delivery Timeouts
    • Issue: Consumer doesn't respond within 10 seconds
    • Causes: Slow processing, network issues, endpoint down
    • Resolution: Optimize processing, increase timeout, scale consumer

  3. 4xx Errors from Consumer
    • Issue: Consumer returns 400, 401, 403, 404
    • Causes: Invalid payload, wrong URL, authentication failure
    • Resolution: Check URL, verify payload, check consumer logs

  4. Continuous Retries (Circuit Breaker)
    • Issue: Webhook keeps retrying and goes to DLQ
    • Causes: Consumer endpoint down, persistent errors
    • Resolution: Fix consumer, manual DLQ reprocessing

  5. Missing Webhooks
    • Issue: Expected webhook not received
    • Causes: Filter mismatch, subscription inactive, rate limiting
    • Resolution: Verify filters, check subscription status, review logs

  • Debugging Tools
  • Webhook delivery logs (Azure Monitor)
  • Webhook trace viewer (correlation ID)
  • Webhook replay tool
  • Signature verification tool

  • Webhook Logs

  • Delivery attempts logged
  • Response status and latency
  • Retry history
  • DLQ entries

  • Troubleshooting Checklist

  • Verify webhook subscription is active
  • Check endpoint URL is reachable
  • Verify HTTPS (not HTTP)
  • Test signature verification locally
  • Check consumer logs for errors
  • Review ATP webhook delivery logs
  • Test with webhook replay tool
  • Check DLQ for failed deliveries

Code Examples: - Debugging script (check webhook status) - Log query examples (Azure Monitor KQL) - Signature verification test - Webhook replay command

Diagrams: - Troubleshooting decision tree - Debugging workflow - Log analysis flow

Deliverables: - Troubleshooting guide - Common issues catalog - Debugging tools - Runbook for on-call


Webhook Catalog Quick Reference

Webhook Types

| Webhook Type | Trigger Event | Frequency | Payload Size | Retry |
|---|---|---|---|---|
| event.notification | AuditEventReceived (filtered) | High | 1-10 KB | Yes |
| export.completed | ExportCompleted | Low | 1 KB | Yes |
| export.failed | ExportFailed | Low | 1 KB | Yes |
| policy.violated | Policy violation detected | Medium | 2-5 KB | Yes |
| threshold.exceeded | Metric threshold exceeded | Low | 1 KB | Yes |
| system.event | Service status change | Very Low | 1 KB | Yes |

Webhook Headers

All webhooks include these headers:

| Header | Description | Example |
|---|---|---|
| X-ATP-Webhook-Id | Unique webhook delivery ID | webhook-abc-123 |
| X-ATP-Webhook-Type | Type of webhook | event.notification |
| X-ATP-Signature | HMAC-SHA256 signature | sha256=abc123... |
| X-ATP-Timestamp | Webhook generation time | 2024-10-30T10:30:00Z |
| X-ATP-Attempt | Delivery attempt number | 1 (first attempt), 5 (fifth attempt) |
| Content-Type | Payload format | application/json |
| User-Agent | ATP webhook delivery agent | ATP-Webhook-Delivery/1.0 |

Retry Schedule

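| Attempt | Delay | Total Elapsed |
|---|---|---|
| 1 | Immediate | 0s |
| 2 | 30s | 30s |
| 3 | 5m | 5m 30s |
| 4 | 15m | 20m 30s |
| 5 | 1h | 1h 20m 30s |
| 6 | 1h | 2h 20m 30s |
| 7 | 1h | 3h 20m 30s |
| 8 | 1h | 4h 20m 30s |
| 9 | 1h | 5h 20m 30s |
| 10 | 1h | 6h 20m 30s |
| DLQ | After 10 failures | - |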

Summary & Implementation Plan

Implementation Phases

Phase 1: Foundations (Cycle 1) - 1 week - Webhook fundamentals and architecture

Phase 2: Event Webhooks (Cycle 2) - 1.5 weeks - Event notification webhooks

Phase 3: Export Webhooks (Cycle 3) - 1 week - Export completion and failure webhooks

Phase 4: Alerts (Cycle 4) - 0.5 weeks - Policy and threshold webhooks

Phase 5: Security (Cycle 5) - 1.5 weeks - Security and reliability

Phase 6: Testing (Cycle 6) - 1 week - Testing and troubleshooting

Success Metrics

  • Delivery Success Rate: >99% within 3 attempts
  • Average Latency: <500ms delivery time
  • DLQ Rate: <0.1% of webhooks
  • Signature Verification: 100% webhooks signed
  • Consumer Compliance: All consumers verify signatures
  • Documentation: All webhook types documented

Ownership & Maintenance

  • Integration Engineers: Cycles 1-4 (specifications)
  • Security Engineers: Cycle 5 (security)
  • QA Engineers: Cycle 6 (testing)
  • Operations: Monitoring and DLQ management

Document Status: ✅ Plan Approved - Ready for Content Generation

Target Start Date: Q3 2025

Expected Completion: Q3 2025 (6.5 weeks)

Owner: Integration Engineering Team

Last Updated: 2024-10-30