Reliable integrations: idempotency, retries, DLQ
Integrations rarely fail as a single bug; they fail as a chain: timeout → retry → duplicate order → manual cleanup. Production integrations need disciplined error handling at every link in that chain.
Reliability patterns
- Idempotency: reprocessing the same event doesn’t create duplicates.
- Retries: only retry safe operations; use exponential backoff.
- DLQ (dead-letter queue): messages that keep failing are moved aside so they don’t block the queue.
- Tracing: correlation IDs across logs and events.
- Alerts: spikes in 4xx/5xx, latency, queue depth.
What “idempotent” means in practice
Store a unique key per operation (e.g., order_id + event_type) and have your handler return success without reapplying the operation if that key has already been recorded. Webhook retries then become harmless: the second delivery is acknowledged but changes nothing.
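A minimal sketch of that check, assuming a SQLite table as the key store. The names `make_store`, `handle_event`, the `processed` table, and the `apply` callback are hypothetical and chosen for illustration; a real integration would use whatever database it already has. The key insert and the side effect share one transaction, so a failed `apply` does not leave the key behind.

```python
import sqlite3


def make_store(path=":memory:"):
    """Open the (hypothetical) store of already-applied operation keys."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS processed (key TEXT PRIMARY KEY)")
    conn.commit()
    return conn


def handle_event(conn, event, apply):
    """Apply `event` at most once; report duplicates as a normal outcome."""
    # Assumed key shape, following the article's example: order_id + event_type.
    key = f"{event['order_id']}:{event['event_type']}"
    with conn:  # commits on success, rolls back if apply() raises
        cur = conn.execute(
            "INSERT OR IGNORE INTO processed (key) VALUES (?)", (key,)
        )
        if cur.rowcount == 0:
            # Key already recorded: acknowledge without reapplying.
            return "duplicate"
        apply(event)
    return "applied"
```

Both outcomes should be acknowledged to the sender as success. Note that the rollback only covers work done in the same database; if `apply` calls an external system, that system needs its own idempotency key.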
Retry policy (simple and effective)
- Retry network/timeouts and 5xx.
- Don’t blindly retry 4xx—fix the payload or credentials.
- Cap attempts and send to DLQ with context for manual review.
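The policy above can be sketched as a small delivery loop. This is an illustration under stated assumptions, not a definitive implementation: `deliver`, its parameters, and the shape of the DLQ entry are hypothetical, `send` is assumed to return an HTTP status code or raise `TimeoutError`/`ConnectionError`, and the DLQ is modeled as a plain list. Backoff is exponential with full jitter.

```python
import random
import time


def deliver(message, send, dlq, max_attempts=5, base_delay=0.5,
            max_delay=30.0, sleep=time.sleep):
    """Try to send `message`; return True on success, False if dead-lettered."""
    last_error = None
    attempt = 0
    for attempt in range(1, max_attempts + 1):
        try:
            status = send(message)
            last_error = f"HTTP {status}"
        except (TimeoutError, ConnectionError) as exc:
            status = None  # network failure: treated as retryable
            last_error = repr(exc)

        if status is not None and status < 400:
            return True  # 2xx/3xx counted as delivered in this sketch
        if status is not None and status < 500:
            break  # 4xx: retrying the same request will not help

        if attempt < max_attempts:
            # Exponential backoff with full jitter, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))

    # Attempts exhausted or permanent failure: park it with context.
    dlq.append({"message": message, "attempts": attempt,
                "last_error": last_error})
    return False
```

Passing `sleep` as a parameter keeps the loop testable without real delays. Retrying is only safe here if the receiving side is idempotent, which is why this policy and the idempotency key belong together.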
For every attempt, log the payload (sanitized), the validation result, the outgoing request, the response code, the duration, and the idempotency key. That context is what makes a DLQ entry reviewable later.
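One way to capture those fields is a single JSON log line per attempt. The sketch below uses only the standard library; `sanitize`, `log_attempt`, the field names, and the `SENSITIVE_KEYS` set are assumptions made for illustration, and the list of keys to mask would need to match your own payloads.

```python
import json
import logging

# Hypothetical list of keys to mask; extend it for your payloads.
SENSITIVE_KEYS = {"password", "token", "authorization", "card_number"}


def sanitize(value):
    """Recursively mask values stored under sensitive keys."""
    if isinstance(value, dict):
        return {
            k: "***" if str(k).lower() in SENSITIVE_KEYS else sanitize(v)
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [sanitize(v) for v in value]
    return value


def log_attempt(logger, *, payload, validation_ok, request, status,
                duration_ms, idempotency_key):
    """Emit one structured log line for an attempt and return it."""
    line = json.dumps({
        "payload": sanitize(payload),
        "validation_ok": validation_ok,
        "request": sanitize(request),
        "status": status,
        "duration_ms": duration_ms,
        "idempotency_key": idempotency_key,
    }, default=str)
    logger.info(line)
    return line
```

Logging the idempotency key on every line lets you join all attempts for one operation with a single search, whether they succeeded, were retried, or ended up in the DLQ.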