Rate Limiting Strategies for SOC Log Pipelines

Rate limiting inside a Security Operations Center is not network traffic shaping borrowed from web APIs; it is a deterministic pipeline governance control that decides which telemetry reaches your detection logic and which is shed, deferred, or backpressured at the edge. It sits inside the broader Log Ingestion & Parsing Workflows pipeline, upstream of every computationally expensive stage, and protects finite parsing, validation, and correlation capacity from being exhausted by a single misbehaving source. Done well, a limiter enforces predictable throughput, isolates noisy endpoints, and preserves CPU and memory for high-fidelity threat detection; done naively — with a global counter and no per-source isolation — it drops the exact authentication burst that signals the attack you most need to see.

Problem Framing

Consider a forwarder fleet feeding a SOC pipeline that sustains roughly 8,000 events/second at steady state, comfortably within the SIEM’s ingest quota and the parser pool’s throughput envelope. At 02:14 a misconfigured EDR agent on a single host enters a crash-loop and begins re-emitting its entire local event journal — 250,000 events in ninety seconds, all from one source.ip. Without rate limiting, those events flow straight into regex parsing and schema validation. The CPU-bound validation stage saturates, the bounded queues behind it fill, producers block, resident memory climbs, and the kernel OOM-killer reaps the collector. Every other source — including a real credential-stuffing wave (T1110 Brute Force) hitting the VPN concentrator at the same moment — loses events during the kill window. The correlation rule that should have fired never sees a complete stream.

A per-source rate limiter resolves this by capping each source’s admission rate independently. The crash-looping agent is throttled to its quota; its excess events are dropped with a labeled disposition or spooled to a dead-letter path, while every well-behaved source continues at full speed. The unit of protection shifts from “the pipeline” to “each source against its own budget,” which is what keeps one noisy endpoint from becoming a self-inflicted denial of service against the whole SOC. This page builds that limiter in Python and wires it into the validation, error-handling, and batching controls the rest of the pipeline expects.

Prerequisites & Environment

The reference implementation targets Python 3.11+ and uses only the standard library for the core limiter (asyncio, dataclasses, enum, logging, time, typing). time.monotonic() is used throughout for refill timing because it never goes backward and is unaffected by wall-clock adjustments (NTP steps, leap-second smearing) that would otherwise corrupt token accounting. Two optional third-party libraries appear only at the integration edges:

pip install "pydantic>=2.7"    # strict schema validation downstream of the limiter
pip install "prometheus-client>=0.20"  # export limiter counters for dashboards/alerting

Infrastructure assumptions:

A bounded buffer (an in-memory asyncio.Queue, or Kafka/Redpanda for cross-process durability) behind the limiter, so admitted events have somewhere to land without unbounded growth.
A reachable dead-letter sink — an on-disk spool, a separate Kafka topic, or an object-store prefix — for events rejected or deferred by the limiter.
A per-source identity you can key on. Use the normalized source.ip, a tenant ID, or a forwarder token; never key on a field an attacker fully controls if you can avoid it, since that lets them evade their own quota by rotating the value.
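As a minimal sketch of the keying rule above, the helper below derives a limiter key from an event. The field names forwarder.token and tenant.id, the peer_ip argument (the address reported by the transport rather than by the payload), and the preference order are all assumptions made for illustration, not part of the reference implementation.

```python
import ipaddress
from typing import Any, Optional


def source_key(event: dict[str, Any], peer_ip: Optional[str] = None) -> str:
    """Return a stable per-source limiter key.

    Assumed preference order: forwarder token, tenant ID, transport peer
    address, and only then the self-reported source.ip field.
    """
    token = event.get("forwarder.token")  # hypothetical field name
    if token:
        return f"token:{token}"
    tenant = event.get("tenant.id")  # hypothetical field name
    if tenant:
        return f"tenant:{tenant}"
    raw_ip = peer_ip or event.get("source.ip")
    if raw_ip:
        try:
            ip = ipaddress.ip_address(str(raw_ip).strip())
        except ValueError:
            # All unparseable addresses share one bucket, so malformed
            # traffic is throttled together instead of minting new keys.
            return "ip:invalid"
        # Collapse IPv4-mapped IPv6 (::ffff:10.0.0.5) onto the IPv4 form.
        if isinstance(ip, ipaddress.IPv6Address) and ip.ipv4_mapped is not None:
            ip = ip.ipv4_mapped
        return f"ip:{ip}"
    return "unknown"


print(source_key({"source.ip": " 10.0.0.5 "}))        # ip:10.0.0.5
print(source_key({"source.ip": "::ffff:10.0.0.5"}))   # ip:10.0.0.5
```

Normalizing before keying matters because two spellings of the same address would otherwise receive two independent budgets.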

Treat the limiter as a long-lived component that shares an event loop with your collectors — the same single-thread asyncio model used by async log batching further down the pipeline.

Architecture Overview

The limiter sits between the network receiver (syslog listener, HTTP webhook, or queue consumer) and the parsing stage. Every arriving event is keyed by source and checked against that source’s bucket. Events with tokens available pass immediately to the bounded queue feeding the parser; events that exhaust their tokens are either deferred into a bounded spill buffer or rejected with a structured disposition code. Critically, the limiter runs before regex parsing and schema validation so the most expensive work is never spent on shed traffic.

Four invariants make the design safe, and every failure mode in the Troubleshooting section is a violation of one of them: (1) the limiter is keyed per source so one source cannot consume another’s budget; (2) every buffer is bounded, so shedding is an explicit labeled decision, never an OOM; (3) the clock is monotonic, so token math cannot be corrupted by wall-clock jumps; and (4) every rejected or deferred event carries a disposition code, so a shed log is distinguishable from a lost one.

Step-by-Step Implementation

1. Model dispositions and per-source state

Start with an explicit disposition enum so every limiter decision is auditable, and a dataclass holding one source’s token state. Keeping per-source state in a plain dict keyed by source identity is what gives each endpoint an isolated budget.

import asyncio
import time
import logging
from dataclasses import dataclass, field
from enum import Enum
from typing import Any

logger = logging.getLogger("soc.rate_limiter")


class Disposition(str, Enum):
    ADMITTED = "admitted"
    DEFERRED = "deferred_backpressure"
    REJECTED = "rejected_quota_exhausted"


@dataclass
class BucketState:
    """Token state for a single source, refilled lazily from a monotonic clock."""
    tokens: float
    last_refill: float
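One practical consequence of mixing str into the enum is that a disposition serializes as its plain string value. The short sketch below repeats the enum so it runs on its own; the record layout is illustrative only.

```python
import json
from enum import Enum


class Disposition(str, Enum):
    ADMITTED = "admitted"
    DEFERRED = "deferred_backpressure"
    REJECTED = "rejected_quota_exhausted"


# Because Disposition subclasses str, json.dumps writes the member's value.
record = {"source": "10.0.0.5", "disposition": Disposition.REJECTED}
encoded = json.dumps(record)
print(encoded)

# Reading the value back recovers the same enum member.
decoded = Disposition(json.loads(encoded)["disposition"])
print(decoded is Disposition.REJECTED)  # True
```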

2. Implement the token bucket refill

The token bucket is the workhorse algorithm for SOC telemetry because it tolerates legitimate bursts (a password-reset campaign, a fleet-wide patch reboot) while enforcing a long-run average. Refill lazily on access rather than with a background timer — it is cheaper and avoids a per-source coroutine. The full standalone derivation lives in implementing token bucket rate limiting; here it is folded into a per-source limiter.

@dataclass
class PerSourceRateLimiter:
    """Per-source token-bucket admission control for a SOC ingestion edge."""
    rate: float                 # sustained tokens/sec per source
    capacity: int               # burst ceiling per source
    max_idle_seconds: float = 900.0
    _buckets: dict[str, BucketState] = field(default_factory=dict)

    def _bucket(self, key: str, now: float) -> BucketState:
        b = self._buckets.get(key)
        if b is None:
            b = BucketState(tokens=float(self.capacity), last_refill=now)
            self._buckets[key] = b
        else:
            elapsed = now - b.last_refill
            b.tokens = min(float(self.capacity), b.tokens + elapsed * self.rate)
            b.last_refill = now
        return b

    def evict_idle(self, now: float) -> int:
        """Reclaim memory from sources silent longer than max_idle_seconds."""
        stale = [k for k, b in self._buckets.items()
                 if now - b.last_refill > self.max_idle_seconds]
        for k in stale:
            del self._buckets[k]
        return len(stale)
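To make the lazy refill arithmetic concrete, here is the same calculation as a standalone function with explicit timestamps. The rate of 50 tokens/sec and capacity of 250 are hypothetical numbers chosen for the example.

```python
def refill(tokens: float, last_refill: float, now: float,
           rate: float, capacity: float) -> float:
    """Tokens available at `now`, given the balance recorded at `last_refill`."""
    elapsed = now - last_refill
    return min(capacity, tokens + elapsed * rate)


# A source drained its bucket to zero at t=100.0 (monotonic seconds).
print(refill(0.0, 100.0, 100.5, rate=50.0, capacity=250.0))  # 25.0
print(refill(0.0, 100.0, 102.0, rate=50.0, capacity=250.0))  # 100.0
# After a long silence the balance is clamped at capacity, not 3000.
print(refill(0.0, 100.0, 160.0, rate=50.0, capacity=250.0))  # 250.0
```

The clamp is what bounds the burst: however long a source stays idle, it can never bank more than capacity tokens.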

3. Make the admission decision

A single classify method returns a disposition without doing I/O, which keeps it trivially unit-testable. It deducts a token on admit, and otherwise reports quota exhaustion so the caller can decide between deferral and rejection based on the event’s priority.

    def classify(self, source_key: str, cost: float = 1.0) -> Disposition:
        now = time.monotonic()
        bucket = self._bucket(source_key, now)
        if bucket.tokens >= cost:
            bucket.tokens -= cost
            return Disposition.ADMITTED
        return Disposition.REJECTED
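The cost parameter lets a caller charge more than one token per event. One possible use, sketched here as an assumption rather than as part of the reference design, is to weight admission by payload size so that a source sending oversized events spends its budget faster. The names event_cost, bytes_per_token, and max_cost are hypothetical.

```python
def event_cost(payload: bytes, bytes_per_token: int = 1024,
               max_cost: float = 16.0) -> float:
    """Charge one token per `bytes_per_token` of payload, clamped to [1, max_cost]."""
    return min(max_cost, max(1.0, len(payload) / bytes_per_token))


print(event_cost(b"x" * 100))        # 1.0  (small events still cost one token)
print(event_cost(b"x" * 4096))       # 4.0
print(event_cost(b"x" * 1_000_000))  # 16.0 (clamped)
```

If this approach is used, max_cost should stay at or below the bucket capacity; an event whose cost exceeds capacity could never be admitted, no matter how long the source waits.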

4. Wire admission into a bounded async queue

The limiter governs only the rate; a bounded queue governs the depth. Combining them means a source within its rate still cannot push the pipeline past its memory ceiling. When the queue is full, admit downgrades an otherwise-admitted low-severity event to DEFERRED rather than blocking the receive loop, and re-raises asyncio.QueueFull for a high-severity event so the caller can escalate instead of shedding it.

    async def admit(
        self,
        event: dict[str, Any],
        queue: asyncio.Queue,
        priority_floor: int = 7,
    ) -> Disposition:
        key = event.get("source.ip", "unknown")
        decision = self.classify(key)

        if decision is Disposition.ADMITTED:
            try:
                queue.put_nowait(event)
            except asyncio.QueueFull:
                # Shed low-priority telemetry; preserve security-relevant streams.
                if int(event.get("event.severity", 0)) < priority_floor:
                    decision = Disposition.DEFERRED
                else:
                    # High-severity: never shed. Re-raise so the caller can
                    # drain the queue and retry (ERR_BACKPRESSURE_004).
                    raise

        logger.log(
            logging.INFO if decision is Disposition.ADMITTED else logging.WARNING,
            "rate_limit_decision",
            extra={
                "disposition": decision.value,
                "source": key,
                "event_type": event.get("event.action"),
                "queue_depth": queue.qsize(),
            },
        )
        return decision
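The depth control relies on the non-blocking put_nowait call, which raises asyncio.QueueFull as soon as a bounded queue reaches maxsize. The sketch below isolates that behavior; try_enqueue is a hypothetical helper, not part of the limiter.

```python
import asyncio


def try_enqueue(queue: asyncio.Queue, event: dict) -> bool:
    """Non-blocking enqueue; False means the queue is already at maxsize."""
    try:
        queue.put_nowait(event)
        return True
    except asyncio.QueueFull:
        return False


async def demo() -> list[bool]:
    queue: asyncio.Queue = asyncio.Queue(maxsize=2)
    # The third event finds the queue full and is refused immediately.
    return [try_enqueue(queue, {"n": n}) for n in range(3)]


results = asyncio.run(demo())
print(results)  # [True, True, False]
```

Because the call never awaits, a full queue turns into an explicit decision at the call site rather than a stalled receiver.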

5. Run the receive loop

The receiver pulls raw events, applies admission, and routes rejected or deferred events to the dead-letter path. Periodic idle eviction keeps the per-source state table from growing without bound as one-off source addresses accumulate.

async def receive_loop(
    limiter: PerSourceRateLimiter,
    source: asyncio.Queue,        # raw inbound events
    parse_queue: asyncio.Queue,   # bounded; feeds the parser
    dlq: asyncio.Queue,           # dead-letter sink
) -> None:
    last_evict = time.monotonic()
    while True:
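        event = await source.get()
        try:
            decision = await limiter.admit(event, parse_queue)
        except asyncio.QueueFull:
            # High-severity event while the parse queue is full: apply
            # backpressure by waiting for space instead of shedding it.
            await parse_queue.put(event)
            decision = Disposition.ADMITTED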
        if decision is not Disposition.ADMITTED:
            event["_disposition"] = decision.value
            await dlq.put(event)
        now = time.monotonic()
        if now - last_evict > 60.0:
            limiter.evict_idle(now)
            last_evict = now
        source.task_done()

Schema & Validation Integration

The limiter keys on normalized fields — source.ip, event.severity, event.action — which only exist if a parsing pass has already mapped raw telemetry onto Elastic Common Schema (ECS). In practice the edge limiter does a lightweight extraction (enough to identify the source and a coarse priority) and admitted events still flow through the full schema validation pipeline before they reach correlation. That ordering is deliberate: rate limiting is the cheap gate that protects the expensive Pydantic-based ECS validation from being overwhelmed by malformed or adversarial input. A flood of non-conformant payloads that would otherwise spike validation latency is capped at the source’s quota first, so CPU stays reserved for well-formed events that contribute to detection logic.

When the limiter shares a priority signal with validation, use the same ECS event.severity scale on both sides so a high-severity stream is never the one shed under pressure. Events that the limiter admits but validation later rejects as malformed are early signals of vendor schema drift — route those into the error categorization framework so the drift is classified and surfaced rather than silently absorbed into the deferral counters.

Error Handling & DLQ Routing

Every limiter decision that is not a clean admit must produce a labeled, replayable record. Rejecting or deferring a log without metadata is indistinguishable from losing it — and an attacker who can trigger silent suppression (a deliberate flood to mask a quieter intrusion under T1499 Endpoint Denial of Service) gains exactly the blind spot the SOC cannot afford. Tag each shed event with the policy applied, the source identifier, the disposition, and the monotonic timestamp, then persist it with its original payload so a corrected quota or a recovered downstream lets you replay it without data loss. Codes follow the established ERR_CATEGORY_NNN convention:

Error code	Meaning	Recovery action
`ERR_RATELIMIT_001`	Source exceeded its sustained token rate	DLQ with disposition `rejected_quota_exhausted`; replay after quota review
`ERR_RATELIMIT_002`	Bounded parse queue full; low-priority event deferred	Spill buffer; re-enqueue when `queue.qsize()` drops below the high-water mark
`ERR_RATELIMIT_003`	Per-source state table hit its cardinality cap	Evict idle buckets; alert on possible source-spoofing (`source.ip` churn)
`ERR_BACKPRESSURE_004`	High-severity event could not be admitted	Page on-call; never silently drop — block or escalate instead

The non-negotiable rule for SOC pipelines is that backpressure degrades gracefully, never catastrophically. Shedding clearly-labeled low-priority telemetry (ERR_RATELIMIT_002) while preserving security-relevant streams is correct; an unbounded buffer that defers the decision until the OOM-killer makes it is not.
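A minimal sketch of the tagging described above follows. The envelope field names (error_code, policy, shed_at_monotonic, payload) and the policy string are assumptions for illustration; only the disposition values and the two error codes come from the table.

```python
import time
from typing import Any

# Maps limiter dispositions onto the error codes from the table above.
ERROR_CODES = {
    "rejected_quota_exhausted": "ERR_RATELIMIT_001",
    "deferred_backpressure": "ERR_RATELIMIT_002",
}


def dlq_record(event: dict[str, Any], disposition: str,
               source: str, policy: str) -> dict[str, Any]:
    """Wrap a shed event in a labeled, replayable envelope."""
    return {
        # A KeyError here is deliberate: an unmapped disposition should fail loudly.
        "error_code": ERROR_CODES[disposition],
        "disposition": disposition,
        "policy": policy,
        "source": source,
        "shed_at_monotonic": time.monotonic(),
        # Shallow copy so later changes to the envelope never alter the original.
        "payload": dict(event),
    }


record = dlq_record(
    {"source.ip": "10.0.0.5", "event.action": "login"},
    disposition="rejected_quota_exhausted",
    source="10.0.0.5",
    policy="per_source_token_bucket",  # hypothetical policy label
)
print(record["error_code"])  # ERR_RATELIMIT_001
```

A time.monotonic() reading is only comparable with other readings from the same process, so a wall-clock field alongside it may be worth adding if records are replayed after a restart.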

Performance Tuning

Three parameters govern the throughput/burst/memory triangle, and each must be tuned against a measured baseline rather than a round number:

rate (tokens/sec per source) sets the sustained ceiling. Derive it from p95 events/second per source over a representative week, not the global average — a chatty domain controller and a quiet IoT sensor need different budgets. Set it too low and legitimate bursts are shed as ERR_RATELIMIT_001; too high and the limiter stops protecting anything.
capacity (burst ceiling per source) absorbs legitimate spikes. A useful starting point is rate × expected_burst_seconds — for a source that legitimately bursts for ~5 seconds during a reboot, capacity = rate × 5. This is the single knob that distinguishes the token bucket from a fixed-window counter, which would either reject the whole burst or allow a double-rate spike at the window edge.
max_queue_size on the parse queue is your memory ceiling. Worst-case resident memory is roughly max_queue_size × avg_event_bytes; a 10,000-event queue of ~1 KB events costs ~10 MB. Size it from host headroom and pair it with the spill policy above.
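The three sizing rules above reduce to a few lines of arithmetic. The sketch below uses synthetic per-second counts in place of a measured week of data, and the helper names are hypothetical.

```python
import math
import statistics


def sustained_rate(per_second_counts: list[float]) -> float:
    """p95 of observed events/second for one source (needs two or more samples)."""
    # n=20 yields 19 cut points at 5% steps; index 18 is the 95th percentile.
    return statistics.quantiles(per_second_counts, n=20, method="inclusive")[18]


def burst_capacity(rate: float, expected_burst_seconds: float) -> int:
    """capacity = rate x expected_burst_seconds, rounded up to a whole token."""
    return math.ceil(rate * expected_burst_seconds)


def queue_memory_bytes(max_queue_size: int, avg_event_bytes: int) -> int:
    """Rough worst-case payload bytes held by a full parse queue."""
    return max_queue_size * avg_event_bytes


samples = [float(n) for n in range(1, 101)]  # synthetic stand-in for real measurements
rate = sustained_rate(samples)
print(rate, burst_capacity(rate, 5.0))
print(queue_memory_bytes(10_000, 1_000))  # 10000000 bytes, roughly 10 MB
```

The memory figure counts payload bytes only; in-process Python objects carry additional per-object overhead, so treat it as a lower bound when sizing against host headroom.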

Per-source state itself costs memory: each BucketState is two floats plus a dict entry, so a million distinct sources is on the order of low hundreds of MB — cap cardinality and evict idle buckets to keep it bounded. Because the limiter is pure CPU and asyncio multiplexes I/O on one thread, a single instance comfortably classifies well over 100,000 events/second; partition by source across instances only when validation or parsing — not the limiter — becomes the bottleneck. Normalizing heterogeneous inputs to uniform dicts upstream, as the CSV ingestion patterns guidance describes, keeps the classify path allocation-light.
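One way to cap cardinality, sketched here under the assumption that least-recently-used eviction is acceptable, is to hold buckets in an OrderedDict and drop the oldest entry when the table is full. BoundedBucketTable and its evictions counter are hypothetical names, not part of the reference limiter.

```python
from collections import OrderedDict
from dataclasses import dataclass


@dataclass
class BucketState:
    tokens: float
    last_refill: float


class BoundedBucketTable:
    """LRU-bounded map of source key to BucketState (hypothetical helper)."""

    def __init__(self, max_sources: int) -> None:
        self.max_sources = max_sources
        self.evictions = 0  # a rising count can back an ERR_RATELIMIT_003 alert
        self._buckets: OrderedDict[str, BucketState] = OrderedDict()

    def get_or_create(self, key: str, capacity: float, now: float) -> BucketState:
        bucket = self._buckets.get(key)
        if bucket is not None:
            self._buckets.move_to_end(key)  # mark as most recently used
            return bucket
        if len(self._buckets) >= self.max_sources:
            self._buckets.popitem(last=False)  # drop the least recently used key
            self.evictions += 1
        bucket = BucketState(tokens=capacity, last_refill=now)
        self._buckets[key] = bucket
        return bucket

    def __contains__(self, key: str) -> bool:
        return key in self._buckets

    def __len__(self) -> int:
        return len(self._buckets)


table = BoundedBucketTable(max_sources=2)
table.get_or_create("a", 10.0, now=0.0)
table.get_or_create("b", 10.0, now=1.0)
table.get_or_create("a", 10.0, now=2.0)   # touching "a" makes "b" the oldest
table.get_or_create("c", 10.0, now=3.0)   # table is full, so "b" is evicted
print(len(table), table.evictions, "b" in table)  # 2 1 False
```

The trade-off is that an evicted source starts again with a full bucket when it returns, so a cap set too low lets a throttled source regain its burst allowance.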

Verification & Observability

Confirm correct operation along three axes — conservation, shaping, and isolation:

Event conservation. Over any window, admitted + deferred + rejected must equal events received. A mismatch means an event leaked out of the disposition model. Assert it directly:

def test_conservation():
    lim = PerSourceRateLimiter(rate=10.0, capacity=10)
    counts = {d: 0 for d in Disposition}
    for _ in range(100):
        counts[lim.classify("10.0.0.5")] += 1
    assert sum(counts.values()) == 100
    # The loop finishes in far less than 100 ms, so refill at 10 tokens/sec
    # adds less than one token: exactly 10 admits, then rejects.
    assert counts[Disposition.ADMITTED] == 10
    assert counts[Disposition.REJECTED] == 90

Shaping. Sample admitted events per second per source during a synthetic flood; once the initial burst of up to capacity events is spent, the admit rate must converge to rate and never exceed it over any sustained window. Export the rate_limit_decision counters to Prometheus and chart admit-vs-reject ratios per source.
Isolation. Drive one source far past its quota while a second stays at baseline; the second source’s admit rate must be unaffected. This is the test that proves a noisy endpoint cannot starve a quiet one.
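The isolation check can be sketched with a simulated clock so the result does not depend on wall time. MiniLimiter below is a condensed stand-in for the per-source limiter that takes the timestamp as an argument; the traffic rates are arbitrary test values.

```python
from dataclasses import dataclass, field


@dataclass
class MiniLimiter:
    """Condensed per-source token bucket with an explicit clock (test stand-in)."""
    rate: float
    capacity: float
    _buckets: dict[str, tuple[float, float]] = field(default_factory=dict)

    def allow(self, key: str, now: float) -> bool:
        tokens, last = self._buckets.get(key, (self.capacity, now))
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        admitted = tokens >= 1.0
        if admitted:
            tokens -= 1.0
        self._buckets[key] = (tokens, now)
        return admitted


def run_isolation(seconds: int = 10) -> tuple[dict[str, int], dict[str, int]]:
    lim = MiniLimiter(rate=10.0, capacity=10.0)
    offered = {"noisy": 0, "quiet": 0}
    admitted = {"noisy": 0, "quiet": 0}
    for tick in range(seconds * 1000):       # simulated 1 ms steps
        now = tick / 1000.0
        offered["noisy"] += 1                # noisy source: 1,000 events/sec
        admitted["noisy"] += int(lim.allow("noisy", now))
        if tick % 200 == 0:                  # quiet source: 5 events/sec
            offered["quiet"] += 1
            admitted["quiet"] += int(lim.allow("quiet", now))
    return offered, admitted


offered, admitted = run_isolation()
# The quiet source stays under its rate, so every one of its events is admitted.
assert admitted["quiet"] == offered["quiet"]
# The noisy source is held to roughly capacity + rate * seconds.
assert admitted["noisy"] <= 110
print(offered, admitted)
```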

The structured rate_limit_decision log lines carry disposition, source, and queue_depth, so you can build the admit/defer/reject dashboard straight from the limiter’s own telemetry — and alert when any single source’s reject ratio crosses a threshold, which is itself a useful signal of a misconfigured forwarder or an active flood.
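As a rough sketch of that alert, the function below aggregates (source, disposition) pairs, such as those parsed from the rate_limit_decision log lines, and returns the sources whose reject ratio exceeds a threshold. The threshold and min_events defaults are placeholders to be tuned, and noisy_sources is a hypothetical name.

```python
from collections import Counter
from typing import Iterable


def noisy_sources(decisions: Iterable[tuple[str, str]],
                  threshold: float = 0.5,
                  min_events: int = 100) -> dict[str, float]:
    """Return {source: reject_ratio} for sources above `threshold`."""
    totals: Counter[str] = Counter()
    rejects: Counter[str] = Counter()
    for source, disposition in decisions:
        totals[source] += 1
        if disposition == "rejected_quota_exhausted":
            rejects[source] += 1
    return {
        source: rejects[source] / total
        for source, total in totals.items()
        # min_events avoids alerting on a handful of events from a new source.
        if total >= min_events and rejects[source] / total > threshold
    }


window = (
    [("10.0.0.5", "rejected_quota_exhausted")] * 90
    + [("10.0.0.5", "admitted")] * 10
    + [("10.0.0.9", "admitted")] * 100
)
flagged = noisy_sources(window)
print(flagged)  # {'10.0.0.5': 0.9}
```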

Troubleshooting

A legitimate burst is being rejected as ERR_RATELIMIT_001. capacity is too small to absorb the source’s real spike. Raise capacity toward rate × expected_burst_seconds; do not raise rate, which would loosen the sustained ceiling you actually want to keep.
Memory climbs steadily even though admit rates look normal. The per-source state table is unbounded — a churning or spoofed source.ip is creating a new bucket per event. Confirm evict_idle is running and cap cardinality; investigate the source.ip churn as a possible evasion attempt.
Tokens appear to refill erratically or grant double bursts. The code is using time.time() somewhere instead of time.monotonic(), so an NTP step corrupted the elapsed calculation. Audit every timestamp on the refill path.
High-severity events are being silently dropped under load. The priority_floor shedding logic is mis-ordered and is treating critical streams as low priority. Verify the ECS event.severity mapping and ensure high-severity events escalate (ERR_BACKPRESSURE_004) rather than defer.
The whole pipeline throttles when only one source misbehaves. The limiter is keyed globally instead of per source — a single shared bucket. Confirm classify keys on source.ip (or tenant/token), not a constant.

FAQ

Why a token bucket instead of a fixed-window counter for SOC telemetry?

A fixed-window counter resets abruptly at interval boundaries, which both rejects a legitimate burst that happens to straddle a reset and permits a double-rate spike across two adjacent windows. The token bucket maintains a continuous refill and a hard capacity ceiling, so it tolerates real security bursts — a password-reset wave, a fleet reboot — up to capacity while still enforcing the long-run rate. That burst tolerance is exactly what keeps forensic context intact during the spikes you most need to capture.
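The boundary effect is easy to reproduce. The sketch below sends 20 events that straddle a one-second window edge through a simplified fixed-window counter and a token bucket with the same nominal limit of 10 per second; both functions are illustrative, and the bucket is assumed to start full.

```python
def fixed_window_admits(arrivals: list[float], limit: int, window: float) -> int:
    """Admits under a fixed-window counter allowing `limit` events per window."""
    counts: dict[int, int] = {}
    admitted = 0
    for t in arrivals:
        index = int(t // window)
        if counts.get(index, 0) < limit:
            counts[index] = counts.get(index, 0) + 1
            admitted += 1
    return admitted


def token_bucket_admits(arrivals: list[float], rate: float, capacity: float) -> int:
    """Admits under a token bucket that starts full at the first arrival."""
    tokens, last, admitted = capacity, arrivals[0], 0
    for t in arrivals:
        tokens = min(capacity, tokens + (t - last) * rate)
        last = t
        if tokens >= 1.0:
            tokens -= 1.0
            admitted += 1
    return admitted


# Ten events just before the boundary at t=1.0 and ten just after it.
arrivals = [0.875] * 10 + [1.125] * 10
print(fixed_window_admits(arrivals, limit=10, window=1.0))     # 20
print(token_bucket_admits(arrivals, rate=10.0, capacity=10.0))  # 12
```

The fixed window admits all 20 events within a quarter of a second, double its nominal limit, while the token bucket admits the initial burst of 10 plus only the 2 tokens earned in the 0.25 seconds between the groups.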

Should rate limiting run before or after parsing and schema validation?

Before. Parsing and validation are the most CPU-intensive stages, so spending them on traffic you are about to shed is wasted work and the fastest path to saturation. Run a lightweight extraction to identify the source and a coarse priority, make the admission decision, and only then send admitted events through full ECS validation. The limiter is the cheap gate that protects the expensive one.

How do I keep one noisy source from starving every other source?

Key the limiter per source — by normalized source.ip, tenant ID, or forwarder token — so each endpoint draws from its own token budget. A crash-looping agent then throttles only itself; every well-behaved source continues at full speed. A single global bucket is the classic mistake that turns one misconfigured forwarder into a pipeline-wide outage.

What should happen to events the limiter rejects?

Never drop them silently. Tag each rejected or deferred event with a disposition code, the policy applied, the source, and a monotonic timestamp, then persist it to a dead-letter sink with its original payload. That makes a shed event distinguishable from a lost one, lets analysts spot deliberate flooding meant to mask a quieter intrusion, and allows replay once the quota is corrected or downstream capacity recovers.

Log Ingestion & Parsing Workflows — parent architecture
Implementing token bucket rate limiting
Async Log Batching
Schema Validation Pipelines
Error Categorization Frameworks
