Webhooks - Audit Trail Platform (ATP)
Push notifications as contracts — ATP's webhooks provide real-time, secure, and reliable outbound notifications with JSON Schema specifications, HMAC signature verification, and automatic retry for event-driven integrations.
📋 Documentation Generation Plan
This document will be generated in 6 cycles. Current progress:
| Cycle | Topics | Estimated Lines | Status |
|---|---|---|---|
| Cycle 1 | Webhook Fundamentals & Architecture (1-2) | ~2,000 | ⏳ Not Started |
| Cycle 2 | Event Notification Webhooks (3-4) | ~2,500 | ⏳ Not Started |
| Cycle 3 | Export & eDiscovery Webhooks (5-6) | ~2,000 | ⏳ Not Started |
| Cycle 4 | Policy & Alert Webhooks (7-8) | ~1,500 | ⏳ Not Started |
| Cycle 5 | Security & Reliability (9-10) | ~2,500 | ⏳ Not Started |
| Cycle 6 | Testing & Debugging (11-12) | ~2,000 | ⏳ Not Started |
Total Estimated Lines: ~12,500
Purpose & Scope
This document provides complete webhook specifications for all outbound notifications from the Audit Trail Platform (ATP), defining webhook payloads, signature verification, retry policies, delivery guarantees, and consumer implementation patterns to enable secure, reliable, and event-driven integrations with external systems.
Key Webhook Principles
- Push-Based Notifications: ATP pushes events to consumer URLs (reverse of polling)
- JSON Payloads: All webhooks use JSON format with JSON Schema validation
- HMAC Signatures: Every webhook signed with shared secret for authenticity
- Automatic Retry: Failed deliveries retried with exponential backoff (up to 10 attempts)
- Idempotent Delivery: Webhooks may be delivered multiple times (consumers must handle duplicates)
- Timeout Protection: Webhook calls time out after 10 seconds
- Dead-Letter Queue: Failed webhooks after max retries go to DLQ for manual review
- Audit Trail: All webhook deliveries logged for compliance and debugging
What this document covers
- Establish ATP's webhook architecture: Delivery service, retry mechanism, monitoring
- Define event notification webhooks: Real-time notifications for audit events matching criteria
- Specify export completion webhooks: Notifications when exports are ready for download
- Document policy alert webhooks: Notifications for policy violations and compliance issues
- Detail webhook security: HMAC-SHA256 signature verification, secret management, replay protection
- Describe webhook reliability: Retry policies, exponential backoff, DLQ handling, delivery guarantees
- Outline webhook configuration: Endpoint registration, filtering, authentication, headers
- Specify webhook payloads: JSON Schema for all webhook types with examples
- Document webhook verification: How consumers verify webhook authenticity
- Detail webhook monitoring: Delivery metrics, failure tracking, alerting
- Describe webhook testing: Local testing, replay, debugging tools
- Outline webhook troubleshooting: Common issues and resolution procedures
Out of scope (referenced elsewhere)
- REST API specifications (see rest-apis.md)
- Message schemas for internal events (see message-schemas.md)
- Domain model details (see ../aggregates-entities.md)
- Implementation patterns (see ../../implementation/)
- SDK documentation (see ../../sdk/)
Readers & ownership
- Integration Engineers (owners): Webhook configuration, endpoint management, security
- Partner Engineers: External system integration, webhook consumption
- Backend Developers: Webhook implementation, delivery service, retry logic
- Security Engineers: Signature verification, secret management, security review
- QA/Test Engineers: Webhook testing, replay testing, failure simulation
- Operations/SRE: Webhook monitoring, DLQ management, incident response
Artifacts produced
- Webhook Specifications: JSON Schema for all webhook payload types
- Webhook Catalog: All webhook types with purposes and schemas
- Signature Verification Guide: HMAC-SHA256 verification implementation in multiple languages
- Webhook Configuration API: Endpoints for registering and managing webhooks
- Retry Policy Documentation: Retry schedule, backoff algorithm, max attempts
- Webhook Examples: Sample payloads for all webhook types
- Consumer Implementation Guide: How to receive, verify, and process webhooks
- Testing Tools: Webhook replay tool, signature generator, mock receiver
- Monitoring Dashboards: Delivery success rate, retry metrics, DLQ size
- Troubleshooting Guide: Common webhook issues and debugging procedures
Acceptance (done when)
- All webhook types are documented with complete JSON Schema specifications
- Signature verification is documented with implementation examples in 3+ languages
- Retry policies are specified with schedule, backoff, and max attempts
- Configuration API is documented for webhook registration and management
- Delivery guarantees are specified (at-least-once, timeout, retry)
- Security best practices are documented (signature verification, secret rotation, HTTPS)
- Consumer implementation guide provides step-by-step webhook handling
- Testing tools are available for local webhook development
- Monitoring is operational with delivery metrics and DLQ alerts
- Troubleshooting guide covers common issues and resolutions
- Code examples show webhook consumption in multiple languages
- Documentation complete with schemas, examples, diagrams, and best practices
Detailed Cycle Plan
CYCLE 1: Webhook Fundamentals & Architecture (~2,000 lines)
Topic 1: Webhook Fundamentals
What will be covered:
- What are Webhooks?
    - HTTP callbacks for event notifications
    - Push-based (reverse of polling)
    - Real-time event delivery to consumer URLs
    - Common in SaaS integrations (GitHub, Stripe, Slack)
- Why Webhooks for ATP?
    - Real-time notifications (no polling required)
    - Reduced load on query APIs
    - Event-driven integrations with external systems
    - Compliance notifications (export ready, policy violations)
    - Operational alerts (system events, errors)
- Webhooks vs Other Integration Patterns
    - Webhooks (Push): Server pushes to client URL
    - Polling (Pull): Client repeatedly queries server
    - WebSockets: Bidirectional, persistent connection
    - Server-Sent Events: Server streams events to the client over a single long-lived HTTP response (one-way)
    - Message Queue: Producer publishes to a shared queue that the consumer reads from
| Pattern | Real-Time | Complexity | ATP Use Case |
|---|---|---|---|
| Webhooks | High | Medium | Event notifications, exports |
| Polling | Low | Low | Status checks (fallback) |
| WebSockets | Highest | High | Not used (overkill) |
| SSE | High | Medium | Not used (limited browser support) |
| Message Queue | High | High | Internal (not external integrations) |
- ATP Webhook Use Cases
    - Event Notifications: Notify when specific events occur
    - Export Completion: Notify when export is ready for download
    - Policy Violations: Alert on compliance policy breaches
    - Threshold Alerts: Notify when event volume exceeds threshold
    - System Events: Service health changes, maintenance notifications
- Webhook Lifecycle
    - Registration: Consumer registers webhook endpoint with ATP
    - Configuration: Set filters, event types, authentication
    - Event Trigger: ATP event matches webhook criteria
    - Delivery Attempt: ATP POSTs to consumer URL
    - Verification: Consumer verifies signature
    - Processing: Consumer processes webhook
    - Acknowledgment: Consumer returns 2xx status
    - Retry (if failed): ATP retries with backoff
    - DLQ (if max retries): Webhook sent to dead-letter queue
- Webhook Guarantees
    - At-Least-Once Delivery: Webhooks delivered at least once (may duplicate)
    - Ordering: No ordering guarantee across different events
    - Timeout: 10 seconds per attempt
    - Retry: Up to 10 attempts with exponential backoff
    - Idempotency: Consumers must handle duplicates
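The last three lifecycle steps (acknowledgment, retry, DLQ) can be sketched as a small decision function. This is only an illustration: `Outcome` and `classify_attempt` are hypothetical names, and the sketch assumes that every non-2xx response or timeout is retryable until the 10th attempt.

```python
from enum import Enum

MAX_ATTEMPTS = 10  # taken from the guarantees listed in this topic


class Outcome(Enum):
    ACKNOWLEDGED = "acknowledged"
    RETRY = "retry"
    DEAD_LETTER = "dead_letter"


def classify_attempt(status_code, attempt):
    """Decide what happens after one delivery attempt.

    status_code is the consumer's HTTP status, or None for a timeout.
    attempt is the 1-based attempt number.
    """
    if status_code is not None and 200 <= status_code < 300:
        return Outcome.ACKNOWLEDGED
    if attempt >= MAX_ATTEMPTS:
        # Assumption: the 10th failed attempt sends the webhook to the DLQ.
        return Outcome.DEAD_LETTER
    return Outcome.RETRY
```

For example, `classify_attempt(None, 3)` models a timeout on the third attempt and yields `Outcome.RETRY`.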
Code Examples: - Webhook lifecycle code - Registration request example - Simple webhook receiver (C#, Node.js, Python)
Diagrams: - Webhook pattern overview - Webhooks vs polling comparison - Webhook lifecycle flow - ATP webhook architecture
Deliverables: - Webhook fundamentals guide - Use case documentation - Lifecycle specification - Comparison with alternatives
Topic 2: ATP Webhook Architecture
What will be covered:
- Webhook Delivery Service
    - Dedicated service for webhook delivery
    - Queue-based processing (Azure Service Bus)
    - Worker pool for concurrent deliveries
    - Retry management
    - DLQ handling
- Webhook Architecture Components
- Webhook Subscription Model
    - Webhook Subscription: Consumer configuration
    - Endpoint URL: Where to POST (HTTPS required)
    - Event Filters: Which events trigger webhook
    - Secret: Shared secret for HMAC signature
    - Headers: Custom headers to include
    - Status: Active, Paused, Failed
- Webhook Types in ATP
| Webhook Type | Event Trigger | Use Case | Frequency |
|---|---|---|---|
| Event Notification | Audit event matches filter | Real-time event monitoring | High volume |
| Export Completion | Export ready for download | eDiscovery workflow | Low volume |
| Policy Violation | Event violates policy | Compliance alerts | Medium volume |
| Threshold Alert | Metric exceeds threshold | Operational monitoring | Low volume |
| System Event | Service status change | Infrastructure monitoring | Very low |
- Webhook Delivery Flow
```mermaid
sequenceDiagram
    participant ATP as ATP Service
    participant Router as Webhook Router
    participant Queue as Delivery Queue
    participant Worker as Delivery Worker
    participant Consumer as Consumer URL
    ATP->>Router: Event occurred
    Router->>Router: Match subscriptions
    Router->>Queue: Enqueue webhook delivery
    Queue->>Worker: Dequeue delivery task
    Worker->>Worker: Generate HMAC signature
    Worker->>Consumer: POST webhook payload
    alt Success (2xx)
        Consumer-->>Worker: 200 OK
        Worker->>Queue: Acknowledge
    else Failure (4xx, 5xx, timeout)
        Consumer-->>Worker: Error or timeout
        Worker->>Queue: Nack (retry)
    end
```
- Retry Policy
    - Retry Schedule: Exponential backoff
        - Attempt 1: Immediate
        - Attempt 2: 30 seconds after attempt 1
        - Attempt 3: 5 minutes after attempt 2
        - Attempt 4: 15 minutes after attempt 3
        - Attempt 5: 1 hour after attempt 4
        - Attempts 6-10: 1 hour after the previous attempt
    - Max Attempts: 10 (the initial delivery plus up to 9 retries)
    - Backoff Algorithm: `delay = min(initial_delay * 2^attempt, max_delay)`
    - Jitter: ±20% random jitter to prevent thundering herd
- Dead-Letter Queue (DLQ)
    - Webhooks sent to DLQ after max retries
    - DLQ monitoring and alerting
    - Manual review and reprocessing
    - DLQ retention: 7 days
- Webhook Monitoring
    - Delivery success rate per webhook
    - Average delivery latency
    - Retry count distribution
    - DLQ message count
    - Consumer response times
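The backoff formula and the ±20% jitter from the retry policy can be sketched as follows. The constants `INITIAL_DELAY_S` and `MAX_DELAY_S` are assumptions chosen to match the 30-second first retry and the 1-hour cap; note that strict doubling yields 30s, 60s, 120s and so on, which is not the same as the fixed schedule listed in the plan (30s, 5m, 15m, 1h), so treat this as an illustration of the formula rather than of the published schedule.

```python
import random

INITIAL_DELAY_S = 30.0   # assumed: delay before the first retry
MAX_DELAY_S = 3600.0     # assumed: 1-hour cap
JITTER = 0.20            # +/-20% as stated in the retry policy


def retry_delay(retry_number, rng=random):
    """Seconds to wait before retry `retry_number` (1 = first retry).

    Implements delay = min(initial_delay * 2^n, max_delay) with n counted
    from zero, then scales the result by a random factor in [0.8, 1.2].
    """
    base = min(INITIAL_DELAY_S * 2 ** (retry_number - 1), MAX_DELAY_S)
    return base * rng.uniform(1 - JITTER, 1 + JITTER)
```

Passing a seeded `random.Random` instance as `rng` makes the jitter reproducible in tests.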
Code Examples: - Webhook subscription configuration - Delivery worker implementation - Retry policy code - DLQ processor - Monitoring queries
Diagrams: - Webhook architecture - Delivery flow (sequence diagram) - Retry mechanism - DLQ handling - Monitoring dashboard
Deliverables: - Webhook architecture specification - Delivery service design - Retry policy implementation - DLQ management guide - Monitoring setup
CYCLE 2: Event Notification Webhooks (~2,500 lines)
Topic 3: Event Notification Webhook Specification
What will be covered:
- Event Notification Webhook Overview
    - Purpose: Real-time notifications when audit events match criteria
    - Use cases: SIEM integration, alerting, workflow automation
    - Filtering: By actor, action, resource, classification
    - Volume: Potentially high (1000s per second)
- Webhook Registration
    - POST /api/v1/webhooks/subscriptions
    - Request schema:
```json
{
  "name": "Security Events to SIEM",
  "url": "https://siem.example.com/atp/webhooks",
  "secret": "shared-secret-for-hmac",
  "eventTypes": ["AuditEventReceived"],
  "filters": {
    "classifications": ["Confidential", "Secret"],
    "actions": ["User.Login", "User.Logout", "Data.Access"]
  },
  "headers": {
    "X-Custom-Header": "value"
  },
  "active": true
}
```
- Event Notification Payload Schema
```yaml
EventNotificationWebhook:
  type: object
  required:
    - webhookId
    - webhookType
    - timestamp
    - event
    - signature
  properties:
    webhookId:
      type: string
      format: uuid
      description: Unique webhook delivery identifier
      example: "webhook-abc-123-def-456"
    webhookType:
      type: string
      enum: [event.notification]
      description: Type of webhook
    timestamp:
      type: string
      format: date-time
      description: When webhook was generated (UTC)
      example: "2024-10-30T10:30:00.123Z"
    event:
      type: object
      description: The audit event that triggered the webhook
      properties:
        eventId:
          type: string
          description: Unique event identifier
        tenantId:
          type: string
          description: Tenant identifier
        timestamp:
          type: string
          format: date-time
        actor:
          type: object
          description: Who performed the action
        action:
          type: string
          description: What action was performed
        resource:
          type: object
          description: What resource was affected
        classification:
          type: string
          enum: [Public, Internal, Confidential, Secret]
        payload:
          type: object
          description: Event-specific data (may be redacted)
    subscription:
      type: object
      description: Webhook subscription details
      properties:
        subscriptionId:
          type: string
        subscriptionName:
          type: string
    signature:
      type: string
      description: HMAC-SHA256 signature for verification
      example: "sha256=abc123def456..."
    attemptNumber:
      type: integer
      description: Delivery attempt number (1 for first attempt)
      minimum: 1
      maximum: 10
```
- Payload Example (Complete)
```json
{
  "webhookId": "webhook-abc-123-def-456",
  "webhookType": "event.notification",
  "timestamp": "2024-10-30T10:30:00.123Z",
  "event": {
    "eventId": "01HQZXYZ123456789ABCDEF",
    "tenantId": "tenant-abc-123",
    "timestamp": "2024-10-30T10:29:58.000Z",
    "actor": {
      "userId": "user-123",
      "userName": "alice@example.com",
      "actorType": "User"
    },
    "action": "Data.Access",
    "resource": {
      "resourceType": "Document",
      "resourceId": "doc-456",
      "resourceName": "Confidential Report"
    },
    "classification": "Confidential",
    "context": {
      "ipAddress": "192.168.1.100",
      "userAgent": "Mozilla/5.0..."
    },
    "payload": {
      "accessType": "Read",
      "documentSize": 1024000
    }
  },
  "subscription": {
    "subscriptionId": "sub-xyz-789",
    "subscriptionName": "Security Events to SIEM"
  },
  "signature": "sha256=a1b2c3d4e5f6...",
  "attemptNumber": 1
}
```
- HTTP Request Format
```http
POST /atp/webhooks HTTP/1.1
Host: siem.example.com
Content-Type: application/json
X-ATP-Webhook-Id: webhook-abc-123-def-456
X-ATP-Webhook-Type: event.notification
X-ATP-Signature: sha256=a1b2c3d4e5f6...
X-ATP-Timestamp: 2024-10-30T10:30:00.123Z
X-ATP-Attempt: 1
User-Agent: ATP-Webhook-Delivery/1.0

{ ... payload ... }
```
- Expected Consumer Response
    - Success: 200 OK, 201 Created, 202 Accepted, 204 No Content
    - Failure: Any other status triggers retry
    - Timeout: No response within 10 seconds triggers retry
    - Response Body: Ignored (only status code matters)
- Filtering Options
    - By event type (AuditEventReceived, EventStreamSealed)
    - By classification (Public, Internal, Confidential, Secret)
    - By action (User.Login, Data.Access, Document.Delete)
    - By actor (specific users or services)
    - By resource type (Document, Database, API)
    - By custom payload fields (JSON path expressions)
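To make the subscription matching concrete, here is a minimal sketch of how a router might test one event against the `eventTypes`, `classifications`, and `actions` fields of the registration request shown in this topic. The function name `matches_subscription` is hypothetical, and two behaviours are assumptions: a missing or empty filter list means "no restriction", and the event type is passed separately because the sample payload does not carry it.

```python
def matches_subscription(event_type, event, subscription):
    """Return True if `event` should be delivered to `subscription`."""
    if not subscription.get("active", False):
        return False
    if event_type not in subscription.get("eventTypes", []):
        return False

    filters = subscription.get("filters", {})

    # Assumption: an absent or empty list places no restriction on that field.
    allowed_classes = filters.get("classifications")
    if allowed_classes and event.get("classification") not in allowed_classes:
        return False

    allowed_actions = filters.get("actions")
    if allowed_actions and event.get("action") not in allowed_actions:
        return False

    return True
```

With the sample subscription, an `AuditEventReceived` event classified `Confidential` with action `Data.Access` matches, while the same event classified `Public` does not.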
Code Examples: - Complete webhook payload (JSON) - Webhook subscription registration (cURL, C#) - Webhook receiver implementation (C#, Node.js, Python) - Signature verification code (all languages) - Filter configuration examples
Diagrams: - Event notification flow - Subscription matching logic - Delivery sequence diagram - Filter evaluation
Deliverables: - Event notification webhook spec - JSON Schema definition - Subscription API documentation - Consumer implementation guide - Filter syntax reference
Topic 4: Event Filtering and Transformation
What will be covered:
- Filter Syntax
    - Simple filters (field equality)
    - Complex filters (AND, OR, NOT logic)
    - JSON path for nested fields
    - Regular expressions
    - Filter validation
- Filter Examples
- Payload Transformation
    - Include/exclude fields
    - Redaction of sensitive data
    - Format transformation (JSON → XML)
    - Custom payload templates
- Rate Limiting for Webhooks
    - Max webhook deliveries per consumer: 100/second
    - Backpressure handling
    - Queue depth limits
    - Throttling notifications
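One common way to enforce a per-consumer limit such as 100 deliveries per second is a token bucket. The plan does not say which algorithm ATP will use, so the class below is an assumed technique with hypothetical names, shown only to illustrate how the limit and a burst capacity interact.

```python
import time


class TokenBucket:
    """Token bucket: refills at `rate` tokens/second up to `capacity`."""

    def __init__(self, rate=100.0, capacity=100.0, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock          # injectable so tests can control time
        self.tokens = capacity      # start full
        self.updated = clock()

    def try_acquire(self):
        """Take one token if available; return False when throttled."""
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A delivery worker would call `try_acquire()` before each POST and leave the message queued when it returns `False`, which is one simple form of backpressure.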
Code Examples: - Filter definitions - Transformation templates - Rate limiting configuration
Diagrams: - Filter evaluation pipeline - Transformation flow - Rate limiting mechanism
Deliverables: - Filter syntax guide - Transformation documentation - Rate limiting specification
CYCLE 3: Export & eDiscovery Webhooks (~2,000 lines)
Topic 5: Export Completion Webhook
What will be covered:
- Export Completion Webhook Overview
    - Purpose: Notify when export is ready for download
    - Use case: Automated eDiscovery workflows
    - Trigger: ExportCompleted event
    - Volume: Low (exports are infrequent)
- Payload Schema
```yaml
ExportCompletionWebhook:
  type: object
  required:
    - webhookId
    - webhookType
    - timestamp
    - export
    - signature
  properties:
    webhookId:
      type: string
      format: uuid
    webhookType:
      type: string
      enum: [export.completed]
    timestamp:
      type: string
      format: date-time
    export:
      type: object
      required:
        - exportId
        - status
        - requestedAt
        - completedAt
      properties:
        exportId:
          type: string
          description: Export request identifier
        status:
          type: string
          enum: [Completed, Failed]
        requestedAt:
          type: string
          format: date-time
        completedAt:
          type: string
          format: date-time
        eventCount:
          type: integer
          description: Number of events in export
        fileSize:
          type: integer
          description: Export file size in bytes
        format:
          type: string
          enum: [JSON, CSV, XML]
        downloadUrl:
          type: string
          format: uri
          description: Signed URL for download (expires in 24 hours)
        expiresAt:
          type: string
          format: date-time
          description: When download URL expires
    signature:
      type: string
      description: HMAC signature
```
- Payload Example
```json
{
  "webhookId": "webhook-export-123",
  "webhookType": "export.completed",
  "timestamp": "2024-10-30T11:00:00.123Z",
  "export": {
    "exportId": "export-xyz-789",
    "status": "Completed",
    "requestedAt": "2024-10-30T10:00:00.000Z",
    "completedAt": "2024-10-30T11:00:00.000Z",
    "eventCount": 15000,
    "fileSize": 25600000,
    "format": "JSON",
    "downloadUrl": "https://exports.atp.example.com/...",
    "expiresAt": "2024-10-31T11:00:00.000Z"
  },
  "signature": "sha256=xyz123..."
}
```
- Consumer Workflow
    - Receive webhook
    - Verify signature
    - Extract downloadUrl
    - Acknowledge webhook (200 OK) within the 10-second delivery timeout
    - Download export file (asynchronously, after acknowledging, so a large download cannot cause a timeout and retry)
    - Process export data
- Security Considerations
    - Download URL is time-limited (24 hours)
    - URL is signed (Azure Blob SAS token)
    - HTTPS required for download
    - Webhook signature verification required
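The consumer-side checks implied by the workflow and security considerations (status, HTTPS, expiry) can be sketched as a pure function that runs after signature verification. `export_download_url` and `parse_utc` are hypothetical helper names; the field names come from the payload schema in this topic.

```python
from datetime import datetime, timezone


def parse_utc(ts):
    """Parse an ISO 8601 timestamp; older Python versions reject a trailing 'Z'."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def export_download_url(payload, now=None):
    """Return the URL to download, or None if the export should not be fetched."""
    export = payload["export"]
    if export["status"] != "Completed":
        return None

    url = export.get("downloadUrl")
    if not url or not url.startswith("https://"):
        return None  # HTTPS is required for download

    now = now or datetime.now(timezone.utc)
    expires = export.get("expiresAt")
    if expires and parse_utc(expires) <= now:
        return None  # the signed URL has expired

    return url
```

The actual download would then be handed to a background job so the webhook handler can return 200 OK promptly.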
Code Examples: - Export completion webhook payload - Consumer implementation (download export) - Signature verification - SAS token handling
Diagrams: - Export completion flow - Download workflow - Security model
Deliverables: - Export webhook specification - Consumer workflow guide - Security documentation - Implementation examples
Topic 6: Export Failure Webhook
What will be covered:
- Export Failure Webhook
    - Purpose: Notify when export fails
    - Payload includes error details
    - Consumer can retry or investigate
- Payload Schema
Complete specification
CYCLE 4: Policy & Alert Webhooks (~1,500 lines)
Topic 7: Policy Violation Webhook
What will be covered:
- Policy Violation Webhook
    - Purpose: Alert on compliance policy breaches
    - Triggers: Event violates classification, retention, access policy
    - Payload: Event details, policy violated, violation type
- Payload Schema
Complete specification
Topic 8: Threshold and System Alert Webhooks
What will be covered:
- Threshold Alert Webhook
    - Event volume exceeds threshold
    - Error rate threshold exceeded
    - Storage quota warnings
- System Event Webhook
    - Service health degradation
    - Maintenance notifications
    - System updates
Complete specifications
CYCLE 5: Security & Reliability (~2,500 lines)
Topic 9: Webhook Security
What will be covered:
- HMAC Signature Verification
    - Algorithm: HMAC-SHA256
    - Signature Format: `sha256={hex_digest}`
    - Header: `X-ATP-Signature`
    - Payload: Request body (raw bytes)
    - Secret: Shared secret from webhook registration
- Signature Generation (ATP Side)
- Signature Verification (Consumer Side)
```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public bool VerifyWebhookSignature(
    string payload,
    string receivedSignature,
    string secret)
{
    // Reject requests that carry no signature header at all.
    if (string.IsNullOrEmpty(receivedSignature))
    {
        return false;
    }

    var expectedSignature = GenerateHmacSignature(payload, secret);

    // Constant-time comparison; returns false when the lengths differ.
    return CryptographicOperations.FixedTimeEquals(
        Encoding.UTF8.GetBytes(expectedSignature),
        Encoding.UTF8.GetBytes(receivedSignature));
}

private string GenerateHmacSignature(string payload, string secret)
{
    using var hmac = new HMACSHA256(Encoding.UTF8.GetBytes(secret));
    var hash = hmac.ComputeHash(Encoding.UTF8.GetBytes(payload));
    return $"sha256={BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant()}";
}
```
```javascript
const crypto = require('crypto');

function verifyWebhookSignature(payload, receivedSignature, secret) {
  const expectedSignature = crypto
    .createHmac('sha256', secret)
    .update(payload)
    .digest('hex');
  const expected = Buffer.from(`sha256=${expectedSignature}`);
  const received = Buffer.from(receivedSignature || '');

  // timingSafeEqual throws if the buffers differ in length, so check first.
  if (expected.length !== received.length) {
    return false;
  }
  return crypto.timingSafeEqual(expected, received);
}
```
- Security Best Practices
    - Always verify signature: Reject unsigned or invalid signatures
    - Use HTTPS only: Never accept webhooks over HTTP
    - Timing-safe comparison: Prevent timing attacks
    - Rotate secrets regularly: Every 90 days
    - Validate timestamp: Reject old webhooks (> 5 minutes)
    - Log all webhooks: For audit and debugging
- Replay Attack Prevention
    - Timestamp validation (reject old webhooks)
    - Webhook ID deduplication (track processed IDs)
    - Nonce or sequence number (optional)
- Secret Management
    - Secret stored in Azure Key Vault
    - Rotation procedure
    - Secret versioning
    - Access audit logging
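Since Python is among the consumer languages this plan targets, a Python counterpart of the verification step, combined with the 5-minute timestamp check, might look like the sketch below. The function names are hypothetical; the signature format (`sha256={hex_digest}` over the raw request body) and the 5-minute window come from this topic. Treating timestamps slightly in the future as acceptable (to tolerate clock skew) is an assumption.

```python
import hashlib
import hmac
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(minutes=5)  # "reject old webhooks (> 5 minutes)"


def sign(body: bytes, secret: str) -> str:
    """Compute the X-ATP-Signature value for a raw request body."""
    digest = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    return f"sha256={digest}"


def verify_signature(body: bytes, received_signature: str, secret: str) -> bool:
    """Timing-safe comparison of the expected and received signatures."""
    if not received_signature:
        return False
    expected = sign(body, secret).encode("utf-8")
    return hmac.compare_digest(expected, received_signature.encode("utf-8"))


def is_fresh(timestamp_header: str, now=None) -> bool:
    """Check X-ATP-Timestamp against the allowed age (either direction)."""
    sent = datetime.fromisoformat(timestamp_header.replace("Z", "+00:00"))
    now = now or datetime.now(timezone.utc)
    return abs(now - sent) <= MAX_AGE
```

The body must be the bytes exactly as received; re-serializing parsed JSON before hashing is a frequent cause of signature mismatches.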
Code Examples: - Signature generation (Python, C#, JavaScript, Go) - Signature verification (all languages) - Timestamp validation - Deduplication logic - Secret rotation procedure
Diagrams: - HMAC signature flow - Security verification pipeline - Replay attack prevention - Secret rotation workflow
Deliverables: - Security implementation guide - Signature verification code (4 languages) - Best practices checklist - Secret management procedures
Topic 10: Webhook Reliability
What will be covered:
- Delivery Guarantees
    - At-least-once delivery
    - Retry on failure
    - Timeout handling
    - Circuit breaker (pause failing webhooks)
- Retry Configuration
    - Retry schedule (exponential backoff)
    - Max delivery attempts (10)
    - Backoff algorithm with jitter
    - Retry status codes (5xx, timeout)
    - No-retry status codes (4xx client errors), to be reconciled with the rule elsewhere in this plan that any non-2xx response triggers a retry
- Timeout Handling
    - Default timeout: 10 seconds
    - Configurable per webhook
    - Timeout counts as failure (triggers retry)
- Circuit Breaker
    - Failure threshold: 10 consecutive failures
    - Circuit opens: Webhook paused
    - Cooldown period: 1 hour
    - Circuit half-open: Test with single delivery
    - Circuit closes: Resume normal delivery
- Idempotency Requirements
    - Consumers must handle duplicate deliveries
    - Use webhookId for deduplication
    - Idempotency window: 24 hours
- Delivery Acknowledgment
    - Consumer returns 2xx status (success)
    - Any other response triggers retry
    - Response body ignored
- Dead-Letter Queue (DLQ)
    - After 10 failed attempts, webhook goes to DLQ
    - DLQ monitoring dashboard
    - Manual reprocessing procedures
    - Root cause analysis
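The circuit breaker states listed above (closed, open, half-open) can be sketched as a small state machine. The class and method names are hypothetical; the threshold of 10 consecutive failures and the 1-hour cooldown are taken from the list. The sketch assumes deliveries for one subscription are processed one at a time, so the half-open state naturally admits a single probe.

```python
class CircuitBreaker:
    """Per-subscription circuit breaker; `now` is a timestamp in seconds."""

    def __init__(self, failure_threshold=10, cooldown_s=3600.0):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.state = "closed"
        self.consecutive_failures = 0
        self.opened_at = None

    def allow_delivery(self, now):
        """Return True if a delivery may be attempted at time `now`."""
        if self.state == "open" and now - self.opened_at >= self.cooldown_s:
            self.state = "half_open"  # cooldown elapsed: allow one probe
        return self.state != "open"

    def record_success(self):
        """A 2xx response closes the circuit and resets the counter."""
        self.state = "closed"
        self.consecutive_failures = 0

    def record_failure(self, now):
        """A failed probe, or reaching the threshold, (re)opens the circuit."""
        self.consecutive_failures += 1
        if (self.state == "half_open"
                or self.consecutive_failures >= self.failure_threshold):
            self.state = "open"
            self.opened_at = now
```

A worker would call `allow_delivery` before each POST and one of the `record_*` methods after it.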
Code Examples: - Retry policy implementation - Circuit breaker code - Idempotent webhook handler - DLQ processor - Timeout configuration
Diagrams: - Retry mechanism with backoff - Circuit breaker state machine - DLQ workflow - Idempotency check flow
Deliverables: - Reliability implementation - Retry policy configuration - Circuit breaker setup - DLQ management guide
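For the consumer side of the idempotency requirement, deduplication by `webhookId` within a 24-hour window can be sketched with an in-memory store. `SeenWebhooks` is a hypothetical name, and a real consumer would more likely use a shared store (a database or cache) so that duplicates are detected across processes.

```python
class SeenWebhooks:
    """Remembers webhook IDs for `window_s` seconds to drop duplicates."""

    def __init__(self, window_s=24 * 3600):
        self.window_s = window_s
        self._first_seen = {}  # webhookId -> time first processed

    def first_time(self, webhook_id, now):
        """Return True exactly once per webhook ID within the window."""
        # Forget IDs that have aged out of the idempotency window.
        expired = [wid for wid, seen in self._first_seen.items()
                   if now - seen >= self.window_s]
        for wid in expired:
            del self._first_seen[wid]

        if webhook_id in self._first_seen:
            return False  # duplicate delivery: acknowledge but do not reprocess
        self._first_seen[webhook_id] = now
        return True
```

A duplicate should still be answered with a 2xx status; otherwise ATP would keep retrying a webhook the consumer has already handled.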
CYCLE 6: Testing & Debugging (~2,000 lines)
Topic 11: Webhook Testing
What will be covered:
- Local Webhook Testing
    - Tools: ngrok, localtunnel, webhook.site
    - Exposing localhost to internet
    - Testing signature verification
    - Testing retry behavior
- Webhook Replay Tool
    - Replaying production webhooks in test
    - Replay from DLQ
    - Replay specific webhook by ID
    - Replay with modified payload
- Mock Webhook Receiver
    - Test server for webhook development
    - Automatic signature verification
    - Payload logging
    - Response simulation (success, failure, timeout)
- Webhook Contract Tests
    - Validate payload matches schema
    - Test signature generation
    - Test retry logic
    - Test timeout handling
- Integration Testing
    - End-to-end webhook delivery tests
    - Test all webhook types
    - Test failure scenarios
    - Test DLQ processing
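A contract test for "payload matches schema" can start very small. The sketch below hand-checks a few constraints of the event notification schema (required fields, `webhookType`, `attemptNumber` range) using only the standard library; `contract_violations` is a hypothetical helper, and a full test suite would presumably validate against the published JSON Schema instead.

```python
REQUIRED_FIELDS = ("webhookId", "webhookType", "timestamp", "event", "signature")


def contract_violations(payload):
    """Return a list of human-readable problems; empty means the checks pass."""
    problems = [f"missing required field: {name}"
                for name in REQUIRED_FIELDS if name not in payload]

    if "webhookType" in payload and payload["webhookType"] != "event.notification":
        problems.append("webhookType must be 'event.notification'")

    attempt = payload.get("attemptNumber")
    if attempt is not None:
        is_int = isinstance(attempt, int) and not isinstance(attempt, bool)
        if not is_int or not 1 <= attempt <= 10:
            problems.append("attemptNumber must be an integer from 1 to 10")

    return problems
```

Returning a list rather than raising on the first problem lets a mock receiver log every violation of a payload at once.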
Code Examples: - Webhook replay script - Mock receiver implementation - Contract test examples - Integration test suite - ngrok setup guide
Diagrams: - Local testing setup - Replay tool architecture - Mock receiver flow - Testing strategy
Deliverables: - Webhook testing guide - Replay tool - Mock receiver - Test framework - Integration test suite
Topic 12: Webhook Debugging and Troubleshooting
What will be covered:
- Common Webhook Issues
    1. Signature Verification Failures
        - Issue: Consumer rejects webhook (invalid signature)
        - Causes: Wrong secret, encoding mismatch, timestamp drift
        - Resolution: Verify secret, check encoding (UTF-8), sync clocks
    2. Delivery Timeouts
        - Issue: Consumer doesn't respond within 10 seconds
        - Causes: Slow processing, network issues, endpoint down
        - Resolution: Optimize processing, increase timeout, scale consumer
    3. 4xx Errors from Consumer
        - Issue: Consumer returns 400, 401, 403, 404
        - Causes: Invalid payload, wrong URL, authentication failure
        - Resolution: Check URL, verify payload, check consumer logs
    4. Continuous Retries (Circuit Breaker)
        - Issue: Webhook keeps retrying and goes to DLQ
        - Causes: Consumer endpoint down, persistent errors
        - Resolution: Fix consumer, manual DLQ reprocessing
    5. Missing Webhooks
        - Issue: Expected webhook not received
        - Causes: Filter mismatch, subscription inactive, rate limiting
        - Resolution: Verify filters, check subscription status, review logs
- Debugging Tools
    - Webhook delivery logs (Azure Monitor)
    - Webhook trace viewer (correlation ID)
    - Webhook replay tool
    - Signature verification tool
- Webhook Logs
    - Delivery attempts logged
    - Response status and latency
    - Retry history
    - DLQ entries
- Troubleshooting Checklist
    - Verify webhook subscription is active
    - Check endpoint URL is reachable
    - Verify HTTPS (not HTTP)
    - Test signature verification locally
    - Check consumer logs for errors
    - Review ATP webhook delivery logs
    - Test with webhook replay tool
    - Check DLQ for failed deliveries
Code Examples: - Debugging script (check webhook status) - Log query examples (Azure Monitor KQL) - Signature verification test - Webhook replay command
Diagrams: - Troubleshooting decision tree - Debugging workflow - Log analysis flow
Deliverables: - Troubleshooting guide - Common issues catalog - Debugging tools - Runbook for on-call
Webhook Catalog Quick Reference
Webhook Types
| Webhook Type | Trigger Event | Frequency | Payload Size | Retry |
|---|---|---|---|---|
| event.notification | AuditEventReceived (filtered) | High | 1-10KB | Yes |
| export.completed | ExportCompleted | Low | 1KB | Yes |
| export.failed | ExportFailed | Low | 1KB | Yes |
| policy.violated | Policy violation detected | Medium | 2-5KB | Yes |
| threshold.exceeded | Metric threshold exceeded | Low | 1KB | Yes |
| system.event | Service status change | Very Low | 1KB | Yes |
Webhook Headers
All webhooks include these headers:
| Header | Description | Example |
|---|---|---|
| `X-ATP-Webhook-Id` | Unique webhook delivery ID | `webhook-abc-123` |
| `X-ATP-Webhook-Type` | Type of webhook | `event.notification` |
| `X-ATP-Signature` | HMAC-SHA256 signature | `sha256=abc123...` |
| `X-ATP-Timestamp` | Webhook generation time | `2024-10-30T10:30:00Z` |
| `X-ATP-Attempt` | Delivery attempt number | `1` (first attempt), `5` (fifth attempt) |
| `Content-Type` | Payload format | `application/json` |
| `User-Agent` | ATP webhook delivery agent | `ATP-Webhook-Delivery/1.0` |
Retry Schedule
| Attempt | Delay | Total Elapsed |
|---|---|---|
| 1 | Immediate | 0s |
| 2 | 30s | 30s |
| 3 | 5m | 5m 30s |
| 4 | 15m | 20m 30s |
| 5 | 1h | 1h 20m 30s |
| 6 | 1h | 2h 20m 30s |
| 7 | 1h | 3h 20m 30s |
| 8 | 1h | 4h 20m 30s |
| 9 | 1h | 5h 20m 30s |
| 10 | 1h | 6h 20m 30s |
| DLQ | After 10 failures | - |
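The "Total Elapsed" column is the running sum of the per-attempt delays, which a few lines of Python can reproduce. The names `DELAYS_S` and `fmt` are illustrative only, and jitter is ignored here.

```python
from itertools import accumulate

# Delay before each of the 10 attempts, in seconds (from the table above).
DELAYS_S = [0, 30, 300, 900] + [3600] * 6


def fmt(seconds):
    """Format a duration the way the table does, e.g. '1h 20m 30s'."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    parts = []
    if hours:
        parts.append(f"{hours}h")
    if minutes:
        parts.append(f"{minutes}m")
    if secs or not parts:
        parts.append(f"{secs}s")
    return " ".join(parts)


# Running total of elapsed time at the moment each attempt is made.
elapsed = list(accumulate(DELAYS_S))
for attempt, total in enumerate(elapsed, start=1):
    print(attempt, fmt(total))
```

The final line printed is `10 6h 20m 30s`, so a webhook that never succeeds reaches the DLQ a little over six hours after the first attempt.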
Summary & Implementation Plan
Implementation Phases
Phase 1: Foundations (Cycle 1) - 1 week - Webhook fundamentals and architecture
Phase 2: Event Webhooks (Cycle 2) - 1.5 weeks - Event notification webhooks
Phase 3: Export Webhooks (Cycle 3) - 1 week - Export completion and failure webhooks
Phase 4: Alerts (Cycle 4) - 0.5 weeks - Policy and threshold webhooks
Phase 5: Security (Cycle 5) - 1.5 weeks - Security and reliability
Phase 6: Testing (Cycle 6) - 1 week - Testing and troubleshooting
Success Metrics
- Delivery Success Rate: >99% within 3 attempts
- Average Latency: <500ms delivery time
- DLQ Rate: <0.1% of webhooks
- Signature Verification: 100% webhooks signed
- Consumer Compliance: All consumers verify signatures
- Documentation: All webhook types documented
Ownership & Maintenance
- Integration Engineers: Cycles 1-4 (specifications)
- Security Engineers: Cycle 5 (security)
- QA Engineers: Cycle 6 (testing)
- Operations: Monitoring and DLQ management
Document Status: ✅ Plan Approved - Ready for Content Generation
Target Start Date: Q3 2025
Expected Completion: Q3 2025 (6.5 weeks)
Owner: Integration Engineering Team
Last Updated: 2024-10-30