MITRE ATT&CK Integration for Alert Correlation Pipelines

Mapping telemetry to the MITRE ATT&CK framework turns a stream of disconnected detections into a tactic-ordered narrative of adversary behavior. Without it, a correlation engine can tell you that powershell.exe ran and that an outbound connection opened, but not that the two events together describe Execution followed by Command-and-Control. ATT&CK integration is the layer that attaches a stable technique identifier to every normalized event, so that sequence rules can reason about what the adversary is trying to achieve rather than which process name happened to appear. This page is part of the broader Alert Correlation & Rule Engines pillar and sits between normalization and scoring: it consumes the canonical event contract enforced upstream and hands tactic-tagged events to the correlation buffer, where they feed both dynamic severity scoring and cross-source event linking.

Problem Framing

Consider a mid-size enterprise SOC ingesting roughly 40,000 EDR and authentication events per second across Windows endpoints, a cloud identity provider, and network sensors. A credential-theft campaign generates, in order: a valid login from an unusual ASN (T1078, Valid Accounts), a rundll32.exe invocation of the MiniDump export in comsvcs.dll to dump LSASS (T1003.001, LSASS Memory), and an SMB connection to a domain controller (T1021.002, SMB/Windows Admin Shares). Each event, evaluated in isolation, is low-severity and individually plausible as benign administration. The campaign is only visible as the ordered progression across three ATT&CK tactics: Initial Access, Credential Access, Lateral Movement.

The concrete failure this page resolves is twofold. First, vendor telemetry arrives with inconsistent or absent technique labels: one EDR tags the LSASS dump as T1003, another as T1003.001, a third as a vendor-specific string. Second, even when technique IDs are present, a naive engine treats them as opaque keywords and cannot tell that T1003.001 is a sub-technique of T1003, or that Credential Access (TA0006) normally precedes Lateral Movement (TA0008) in a kill chain. The result is either missed multi-stage detections or an avalanche of single-technique alerts that recreate the alert fatigue the correlation layer exists to eliminate. ATT&CK integration fixes both by normalizing every event to a canonical technique ID, resolving sub-technique hierarchy, and ordering techniques by tactic so sequence rules can fire on progression instead of coincidence.

Prerequisites & Environment

The implementation targets Python 3.11+ (for tomllib and improved asyncio semantics) and depends on a small, audited dependency set. The ATT&CK content itself is the official MITRE ATT&CK STIX bundle, distributed as the enterprise-attack.json STIX 2.1 file.

python3 -m venv .venv && source .venv/bin/activate
pip install "pydantic==2.*" "mitreattack-python==3.*"
# enterprise-attack.json is fetched once and pinned by version for reproducible mappings

Infrastructure assumptions:

The engine receives events that have already passed the upstream schema validation pipelines, so fields such as process.name, process.command_line, user.name, and @timestamp are present and typed.
A pinned copy of enterprise-attack.json is mounted read-only; the ATT&CK version (e.g. v15.1) is recorded in pipeline metadata so a mapping change is auditable rather than silently rolled forward.
Per-entity correlation state is persisted in a low-latency store (Redis or an embedded key-value buffer) so technique sequences survive a worker restart mid-campaign.

ATT&CK content is versioned. Pinning a release and treating an upgrade as a reviewed change is what keeps detection coverage measurable over time, the same config-as-code discipline applied to detection rules elsewhere in the correlation engine.
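A minimal sketch of that pinning check, assuming the expected SHA-256 and release string are stored in pipeline metadata and that the bundle carries an x-mitre-collection object with an x_mitre_version field (as MITRE's STIX 2.1 releases do; verify against the bundle you actually pin). The function names are hypothetical. The point is that a worker refuses to start on an unreviewed bundle instead of silently rolling forward.

```python
import hashlib
import json
from pathlib import Path
from typing import Optional


def sha256_of(path: Path) -> str:
    """Hash the bundle in 1 MiB chunks instead of one large read."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def attck_release(bundle: dict) -> Optional[str]:
    """Return x_mitre_version from the bundle's x-mitre-collection object, if any."""
    for obj in bundle.get("objects", []):
        if obj.get("type") == "x-mitre-collection":
            return obj.get("x_mitre_version")
    return None


def verify_pinned_bundle(path: Path, expected_sha256: str, expected_version: str) -> dict:
    """Fail closed (ERR_ATTCK_010) unless the mounted bundle is the reviewed release."""
    actual = sha256_of(path)
    if actual != expected_sha256:
        raise RuntimeError(f"ERR_ATTCK_010: bundle hash mismatch (got {actual})")
    bundle = json.loads(path.read_text(encoding="utf-8"))
    version = attck_release(bundle)
    if version != expected_version:
        raise RuntimeError(
            f"ERR_ATTCK_010: expected ATT&CK {expected_version}, bundle reports {version}"
        )
    return bundle
```

Hashing the file, rather than trusting its name, also catches a bundle that was replaced in place under the same filename.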

Architecture Overview

The integration is a three-stage transform inserted between normalization and the correlation buffer. The technique mapper resolves observable signatures (process name, command-line pattern, registry path, cloud API call) to a canonical ATT&CK technique ID, keeping the most specific sub-technique it can observe. The tactic tagger loads the STIX bundle once, resolves each technique to its tactic(s), attaches the kill-chain phase, and records the parent technique alongside any sub-technique so rules can match at either granularity. The sequence buffer keys events by resolved entity and holds a time-bounded, size-capped window so a rule can match progression across tactics.

Step-by-Step Implementation

Step 1 — Load the ATT&CK content and build a tactic index

Parse the pinned STIX bundle once at startup and build an in-memory index mapping each technique ID to its name, parent (for sub-techniques), and ordered tactics. Loading per event would be catastrophic for throughput, so this is a cold-start cost amortized across the worker’s lifetime.

from __future__ import annotations

import json
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger("attck_integration")


@dataclass(frozen=True)
class TechniqueMeta:
    """Canonical metadata for a single ATT&CK technique."""

    technique_id: str          # e.g. "T1003.001"
    name: str                  # e.g. "LSASS Memory"
    parent_id: Optional[str]   # "T1003" for a sub-technique, else None
    tactics: tuple[str, ...]   # ordered tactic shortnames, e.g. ("credential-access",)


class AttckIndex:
    """Loads the enterprise-attack STIX 2.1 bundle into a fast lookup table."""

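    # Kill-chain ordering used to detect tactic *progression*, not just presence.
    # All 14 Enterprise tactics in matrix order, including the pre-compromise
    # reconnaissance and resource-development phases.
    TACTIC_ORDER: tuple[str, ...] = (
        "reconnaissance", "resource-development",
        "initial-access", "execution", "persistence", "privilege-escalation",
        "defense-evasion", "credential-access", "discovery", "lateral-movement",
        "collection", "command-and-control", "exfiltration", "impact",
    )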

    def __init__(self) -> None:
        self._by_id: dict[str, TechniqueMeta] = {}

    def load(self, stix_path: str) -> "AttckIndex":
        with open(stix_path, encoding="utf-8") as fh:
            bundle = json.load(fh)
        for obj in bundle.get("objects", []):
            if obj.get("type") != "attack-pattern" or obj.get("revoked"):
                continue
            ext = next(
                (r for r in obj.get("external_references", [])
                 if r.get("source_name") == "mitre-attack"),
                None,
            )
            if not ext or "external_id" not in ext:
                continue
            tid = ext["external_id"]
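            # Sort by kill-chain rank so multi-tactic techniques list tactics in order.
            tactics = tuple(sorted(
                (p["phase_name"] for p in obj.get("kill_chain_phases", [])
                 if p.get("kill_chain_name") == "mitre-attack"),
                key=self.tactic_rank,
            ))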
            self._by_id[tid] = TechniqueMeta(
                technique_id=tid,
                name=obj.get("name", ""),
                parent_id=tid.split(".")[0] if "." in tid else None,
                tactics=tactics,
            )
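        if not self._by_id:
            # Fail closed (ERR_ATTCK_010): an empty index would tag every event as unmapped.
            raise RuntimeError(f"ERR_ATTCK_010: no techniques loaded from {stix_path}")
        logger.info("ATT&CK index loaded | techniques=%d", len(self._by_id))
        return self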

    def get(self, technique_id: str) -> Optional[TechniqueMeta]:
        return self._by_id.get(technique_id)

    def tactic_rank(self, tactic: str) -> int:
        """Position of a tactic in the kill chain; -1 if unknown."""
        try:
            return self.TACTIC_ORDER.index(tactic)
        except ValueError:
            return -1
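The index above ranks tactics with a hand-maintained TACTIC_ORDER and stores only shortnames. The same bundle also carries x-mitre-tactic objects (with the TA-prefixed ID, the display name, and an x_mitre_shortname that matches phase_name) and an x-mitre-matrix object whose tactic_refs list the tactics in matrix order. A sketch that derives both from the pinned release, assuming those fields are present as in the Enterprise STIX 2.1 bundle; the helper names are hypothetical:

```python
from typing import NamedTuple, Optional


class TacticMeta(NamedTuple):
    tactic_id: str   # e.g. "TA0006"
    name: str        # e.g. "Credential Access"
    shortname: str   # e.g. "credential-access", matches kill_chain_phases[].phase_name


def _mitre_external_id(obj: dict) -> Optional[str]:
    for ref in obj.get("external_references", []):
        if ref.get("source_name") == "mitre-attack":
            return ref.get("external_id")
    return None


def build_tactic_index(bundle: dict) -> tuple[dict[str, TacticMeta], tuple[str, ...]]:
    """Return (shortname -> TacticMeta, shortnames in matrix order) from one bundle."""
    by_stix_id: dict[str, TacticMeta] = {}
    matrix_refs: list[str] = []
    for obj in bundle.get("objects", []):
        if obj.get("type") == "x-mitre-tactic" and not obj.get("revoked"):
            tactic_id = _mitre_external_id(obj)
            if tactic_id:
                by_stix_id[obj["id"]] = TacticMeta(
                    tactic_id, obj.get("name", ""), obj.get("x_mitre_shortname", "")
                )
        elif obj.get("type") == "x-mitre-matrix" and not matrix_refs:
            # Assumes a single matrix object; its tactic_refs give the column order.
            matrix_refs = list(obj.get("tactic_refs", []))
    by_shortname = {t.shortname: t for t in by_stix_id.values()}
    order = tuple(by_stix_id[ref].shortname for ref in matrix_refs if ref in by_stix_id)
    return by_shortname, order
```

Comparing the derived order against TACTIC_ORDER in CI is a cheap way to notice when an ATT&CK upgrade adds or reorders a tactic.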

Step 2 — Map observables to canonical technique IDs

The mapper translates raw observables into technique IDs. In production this table is config-as-code (a reviewed YAML/JSON file, not inline literals), and command-line regexes capture renamed-binary evasion. Anything unmatched returns None so it can be routed to the DLQ rather than silently mislabeled.

import re
from typing import NamedTuple


class TechniqueHit(NamedTuple):
    technique_id: str
    confidence: float   # 0.0..1.0; low-confidence hits can be down-weighted downstream


# Loaded from reviewed config; literals shown for a runnable example.
PROCESS_SIGNATURES: dict[str, TechniqueHit] = {
    "powershell.exe": TechniqueHit("T1059.001", 0.6),   # Command and Scripting Interpreter: PowerShell
    "cmd.exe": TechniqueHit("T1059.003", 0.5),          # Windows Command Shell
    "mimikatz.exe": TechniqueHit("T1003.001", 0.95),    # LSASS Memory
}

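COMMANDLINE_SIGNATURES: list[tuple[re.Pattern[str], TechniqueHit]] = [
    # rundll32.exe + comsvcs.dll MiniDump: LSASS dump via a signed system DLL,
    # which evades process-name matching.
    (re.compile(r"comsvcs\.dll.*MiniDump", re.IGNORECASE), TechniqueHit("T1003.001", 0.9)),
    # UNC path to an admin share (\\host\ADMIN$ or \\host\C$): SMB/Windows Admin Shares.
    (re.compile(r"\\\\[^\\\s]+\\(?:ADMIN|C)\$", re.IGNORECASE), TechniqueHit("T1021.002", 0.7)),
    # Discovery commands, each mapped to its own technique rather than one catch-all ID.
    (re.compile(r"\bwhoami\b", re.IGNORECASE), TechniqueHit("T1033", 0.4)),            # System Owner/User Discovery
    (re.compile(r"\bnet\s+group\b", re.IGNORECASE), TechniqueHit("T1069.002", 0.4)),   # Permission Groups Discovery: Domain Groups
    (re.compile(r"\bnltest\b.*/domain_trusts", re.IGNORECASE), TechniqueHit("T1482", 0.4)),  # Domain Trust Discovery
]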


def map_observable(process_name: str, command_line: str) -> Optional[TechniqueHit]:
    """Resolve a process + command line to a single highest-confidence technique hit."""
    candidates: list[TechniqueHit] = []
    sig = PROCESS_SIGNATURES.get(process_name.lower())
    if sig:
        candidates.append(sig)
    for pattern, hit in COMMANDLINE_SIGNATURES:
        if pattern.search(command_line):
            candidates.append(hit)
    if not candidates:
        return None
    return max(candidates, key=lambda h: h.confidence)
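Because the production table is config-as-code, it helps to validate it once, at load time, rather than discover a bad entry when an event hits it. A sketch assuming a hypothetical JSON layout with process and command_line arrays (the field names are illustrative, not a standard); it compiles every pattern once and rejects malformed technique IDs or out-of-range confidences before the worker starts:

```python
import json
import re
from typing import NamedTuple

TECHNIQUE_ID_RE = re.compile(r"^T\d{4}(\.\d{3})?$")


class TechniqueHit(NamedTuple):
    technique_id: str
    confidence: float


def _validated_hit(entry: dict) -> TechniqueHit:
    tid, conf = entry["technique_id"], float(entry["confidence"])
    if not TECHNIQUE_ID_RE.fullmatch(tid):
        raise ValueError(f"malformed technique id in signature config: {tid!r}")
    if not 0.0 <= conf <= 1.0:
        raise ValueError(f"confidence out of range for {tid}: {conf}")
    return TechniqueHit(tid, conf)


def load_signatures(
    raw: str,
) -> tuple[dict[str, TechniqueHit], list[tuple[re.Pattern[str], TechniqueHit]]]:
    """Parse a reviewed JSON signature file (hypothetical layout) into lookup tables."""
    doc = json.loads(raw)
    process_sigs: dict[str, TechniqueHit] = {}
    for entry in doc.get("process", []):
        process_sigs[entry["name"].lower()] = _validated_hit(entry)
    cmd_sigs: list[tuple[re.Pattern[str], TechniqueHit]] = []
    for entry in doc.get("command_line", []):
        # Compile once here: a bad regex fails the deploy, not the per-event hot path.
        cmd_sigs.append((re.compile(entry["pattern"], re.IGNORECASE), _validated_hit(entry)))
    return process_sigs, cmd_sigs
```

The returned tables have the same shape as PROCESS_SIGNATURES and COMMANDLINE_SIGNATURES, so map_observable can consume them unchanged.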

Step 3 — Tag events with tactic metadata and buffer them per entity

Each mapped event is enriched with its tactics and appended to a time-bounded, size-capped window keyed by the resolved entity. The event keeps its specific sub-technique ID and also records the parent ID, so a rule written against T1003 (OS Credential Dumping) can still match a T1003.001 observation through parent_id without the more specific label being discarded.

import time
from dataclasses import dataclass


@dataclass
class TaggedEvent:
    entity_id: str
    timestamp: float
    technique_id: str
    parent_id: str
    tactics: tuple[str, ...]
    confidence: float


class TacticSequenceBuffer:
    WINDOW_SECONDS: int = 3600       # multi-stage campaigns unfold over minutes to an hour
    WINDOW_MAX_EVENTS: int = 500

    def __init__(self, index: AttckIndex) -> None:
        self._index = index
        self._state: dict[str, list[TaggedEvent]] = {}

    def tag_and_buffer(
        self, entity_id: str, hit: TechniqueHit, ts: float
    ) -> Optional[TaggedEvent]:
        meta = self._index.get(hit.technique_id)
        if meta is None:
            logger.warning("Unmapped technique | code=ERR_ATTCK_001 | id=%s", hit.technique_id)
            return None
        event = TaggedEvent(
            entity_id=entity_id,
            timestamp=ts,
            technique_id=meta.technique_id,
            parent_id=meta.parent_id or meta.technique_id,
            tactics=meta.tactics,
            confidence=hit.confidence,
        )
        window = self._state.setdefault(entity_id, [])
        window.append(event)
        cutoff = ts - self.WINDOW_SECONDS
        window = [e for e in window if e.timestamp > cutoff]
        if len(window) > self.WINDOW_MAX_EVENTS:
            window = window[-self.WINDOW_MAX_EVENTS:]
            logger.warning("Window capped | code=ERR_ATTCK_020 | entity=%s", entity_id)
        self._state[entity_id] = window
        return event

    def window(self, entity_id: str) -> list[TaggedEvent]:
        return self._state.get(entity_id, [])
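The buffer stores parent_id next to technique_id, but nothing above shows a rule consuming it. A minimal sketch of a technique-level rule check, assuming a hypothetical TechniqueRule whose ID set may mix parent techniques and sub-techniques:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TechniqueRule:
    rule_id: str
    technique_ids: frozenset[str]   # may mix parents ("T1003") and sub-techniques ("T1003.001")


def technique_matches(rule: TechniqueRule, technique_id: str, parent_id: str) -> bool:
    """Match on the exact ID or on the recorded parent.

    A parent entry therefore covers every sub-technique beneath it, while a
    sub-technique entry matches only itself (parent_id equals technique_id for
    events that were observed at parent level).
    """
    return technique_id in rule.technique_ids or parent_id in rule.technique_ids


def matching_events(rule: TechniqueRule, window: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Filter (technique_id, parent_id) pairs taken from an entity window."""
    return [ev for ev in window if technique_matches(rule, *ev)]
```

A parent-level entry such as T1003 acts as the broad coverage rule; a sub-technique entry such as T1003.001 stays precise and does not match its siblings (T1003.003, for example).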

Step 4 — Evaluate ordered tactic-progression rules

A sequence rule fires only when the required tactics are observed in kill-chain order within the window. Ordering rules out a common false positive: Credential Access and Lateral Movement techniques that co-occur for legitimate administrative reasons, but not in the order an intrusion would produce them.

@dataclass(frozen=True)
class SequenceRule:
    rule_id: str
    name: str
    required_tactics: tuple[str, ...]   # must appear in this kill-chain order
    base_severity: int


class AttckRuleEvaluator:
    def __init__(self, index: AttckIndex, buffer: TacticSequenceBuffer) -> None:
        self._index = index
        self._buffer = buffer

    def evaluate(self, rule: SequenceRule, entity_id: str) -> Optional[dict[str, object]]:
        window = self._buffer.window(entity_id)
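        # Earliest timestamp per required tactic. The window is in arrival order, so
        # take the minimum rather than the first event iterated (late events exist).
        seen: dict[str, float] = {}
        for ev in window:
            for tac in ev.tactics:
                if tac in rule.required_tactics:
                    seen[tac] = min(seen.get(tac, ev.timestamp), ev.timestamp)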
        if len(seen) != len(rule.required_tactics):
            return None
        # Enforce kill-chain ordering by timestamp.
        ordered = [seen[t] for t in rule.required_tactics]
        if ordered != sorted(ordered):
            return None
        techniques = sorted({ev.technique_id for ev in window
                             if any(t in rule.required_tactics for t in ev.tactics)})
        return {
            "rule_id": rule.rule_id,
            "alert_type": rule.name,
            "entity_id": entity_id,
            "tactics": list(rule.required_tactics),
            "techniques": techniques,
            "base_severity": rule.base_severity,
            "timestamp": time.time(),
        }


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    index = AttckIndex().load("enterprise-attack.json")
    buffer = TacticSequenceBuffer(index)
    evaluator = AttckRuleEvaluator(index, buffer)

    rule = SequenceRule(
        rule_id="SEQ-CRED-LATERAL-001",
        name="credential_theft_to_lateral_movement",
        required_tactics=("credential-access", "lateral-movement"),
        base_severity=70,
    )
    now = time.time()
    for proc, cmd, offset in [
        ("rundll32.exe", "comsvcs.dll, MiniDump 624 lsass.dmp full", 0),
        ("net.exe", "use \\\\dc01\\C$", 30),
    ]:
        hit = map_observable(proc, cmd)
        if hit:
            buffer.tag_and_buffer("host:WS-42|user:svc_admin", hit, now + offset)
    alert = evaluator.evaluate(rule, "host:WS-42|user:svc_admin")
    print(json.dumps(alert, indent=2))

The enriched alert carries base_severity, the matched techniques, and the ordered tactics, which is exactly the contract the dynamic severity scoring stage consumes to weight the alert against asset criticality before routing.
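The earliest-timestamp comparison in evaluate() has a blind spot: if the same entity showed benign Lateral Movement earlier in the window, that earlier timestamp wins, and a later Credential Access then Lateral Movement progression never fires. An alternative sketch (a greedy ordered-subsequence match, not the evaluator above) sorts the window by event time and advances through the required tactics one event at a time; the function name and tuple layout are assumptions:

```python
from typing import Iterable, Sequence


def tactics_in_order(
    events: Iterable[tuple[float, tuple[str, ...]]],
    required: Sequence[str],
) -> bool:
    """True if `required` appears as an ordered subsequence of the events' tactics.

    Events are sorted by timestamp first, so late-arriving events are placed
    where they happened rather than where they were received.
    """
    if not required:
        return True
    step = 0
    for _, tactics in sorted(events, key=lambda e: e[0]):
        # Each event advances at most one step through the required sequence.
        if required[step] in tactics:
            step += 1
            if step == len(required):
                return True
    return False
```

Because each event advances at most one step, a single multi-tactic technique (T1078 spans Initial Access, Persistence, Privilege Escalation, and Defense Evasion) cannot satisfy two consecutive steps on its own; relax that if your rules intend otherwise.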

Schema & Validation Integration

ATT&CK enrichment writes back into the same canonical event contract the rest of the pipeline enforces, so downstream stages need no special-case handling. Four fields are added to each normalized event, aligned to the Elastic Common Schema (ECS) threat field set so the data is portable across SIEMs:

ECS field	Type	Source
`threat.technique.id`	keyword	Parent technique ID (e.g. `T1003`)
`threat.technique.name`	keyword	Parent technique name from the STIX index (e.g. `OS Credential Dumping`)
`threat.technique.subtechnique.id`	keyword	Resolved sub-technique, when one was observed (e.g. `T1003.001`)
`threat.tactic.name`	keyword	Tactic name for the kill-chain phase (e.g. `Credential Access`)
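ECS splits technique and sub-technique into separate field sets: threat.technique.id holds the parent (T1003) and threat.technique.subtechnique.id holds the sub-technique (T1003.001), and ECS documents these threat fields as arrays. A sketch of building them for one resolved ID, assuming the caller supplies an ID-to-name lookup and (tactic ID, display name) pairs, for instance derived from the bundle's x-mitre-tactic objects. Dotted keys are used for readability; nest them if your sink expects nested objects.

```python
def ecs_threat_fields(
    technique_id: str,
    names: dict[str, str],
    tactics: tuple[tuple[str, str], ...],
) -> dict[str, object]:
    """Build ECS threat.* fields for one resolved technique or sub-technique.

    names:   technique/sub-technique ID -> name, e.g. from the STIX index
    tactics: (tactic ID, display name) pairs, e.g. ("TA0006", "Credential Access")
    """
    parent_id, _, sub = technique_id.partition(".")
    fields: dict[str, object] = {
        "threat.framework": "MITRE ATT&CK",
        "threat.technique.id": [parent_id],
        "threat.technique.name": [names.get(parent_id, "")],
        "threat.tactic.id": [tid for tid, _ in tactics],
        "threat.tactic.name": [name for _, name in tactics],
    }
    if sub:  # only sub-techniques populate the subtechnique field set
        fields["threat.technique.subtechnique.id"] = [technique_id]
        fields["threat.technique.subtechnique.name"] = [names.get(technique_id, "")]
    return fields
```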

The enrichment stage never invents a label. A technique ID that is absent from the pinned STIX index (because a vendor sent a revoked or future ID) is rejected rather than written through, which keeps every enriched event consistent with a single ATT&CK version. This is the same fail-closed contract established by the upstream JSON event normalization layer: an event that cannot be cleanly enriched goes to the DLQ, not to the correlation buffer with a guessed tag. Validating the technique-ID format (^T\d{4}(\.\d{3})?$) at the enrichment boundary catches many upstream parser regressions before they pollute correlation state.
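A sketch of that boundary check, returning a DLQ code instead of raising so the caller can route the event. It assumes the known-ID set comes from the pinned index and that normalizing whitespace and case (vendors sometimes emit t1003.001) is acceptable in your pipeline; fullmatch is used so a trailing newline cannot slip past the $ anchor. The function name is hypothetical.

```python
import re
from typing import Optional

TECHNIQUE_ID_RE = re.compile(r"^T\d{4}(\.\d{3})?$")


def validate_technique_id(
    raw: str, known_ids: frozenset[str]
) -> tuple[Optional[str], Optional[str]]:
    """Return (canonical_id, None) on success, or (None, error_code) for DLQ routing."""
    candidate = raw.strip().upper()
    if not TECHNIQUE_ID_RE.fullmatch(candidate):
        return None, "ERR_ATTCK_002"   # malformed: likely an upstream parser regression
    if candidate not in known_ids:
        return None, "ERR_ATTCK_001"   # well-formed but absent from the pinned index
    return candidate, None
```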

Error Handling & DLQ Routing

ATT&CK integration introduces failure modes specific to mapping and content versioning, each routed to the dead-letter queue with a deterministic code so it is machine-routable and replayable after a fix. These extend the shared taxonomy used by the error categorization frameworks so one set of codes spans the whole pipeline.

Code	Meaning	Action
`ERR_ATTCK_001`	Observable mapped to a technique ID absent from the index	Route to DLQ; flag as unmapped-technique signal
`ERR_ATTCK_002`	Malformed technique ID (failed `^T\d{4}(\.\d{3})?$`)	Reject to DLQ; raise parser-regression alert
`ERR_ATTCK_010`	STIX bundle load/parse failure at startup	Fail readiness probe; refuse to serve with stale/empty index
`ERR_ATTCK_020`	Per-entity technique window exceeded size cap	Summarize + evict oldest; emit metric
`ERR_ATTCK_030`	Tactic resolution returned no kill-chain phase	Tag technique, skip sequence rules; emit low-confidence metric

The most important recovery path is ERR_ATTCK_001. A rising rate of unmapped observables is not a bug to suppress — it is a coverage gap. Those DLQ records are the raw material for new signatures, and routing them to a review queue (rather than discarding them) is how the detection surface grows. ERR_ATTCK_010 must fail closed: a worker that cannot load the STIX bundle should fail its readiness probe rather than run with an empty index and silently tag every event as unmapped.
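One way to make those records replayable is to stamp each with its code and the pinned ATT&CK version, and to flag ERR_ATTCK_001 records for the signature-review queue rather than plain replay. The record layout and names below are a hypothetical example, not a fixed schema:

```python
import json
import time
from dataclasses import asdict, dataclass, field
from typing import Any

# Codes whose DLQ records feed the signature-review queue instead of plain replay.
REVIEW_CODES = frozenset({"ERR_ATTCK_001"})


@dataclass
class DlqRecord:
    code: str
    reason: str
    attck_version: str            # pinned release, so a replay is matched to the right index
    event: dict[str, Any]
    routed_at: float = field(default_factory=time.time)
    review: bool = False


def to_dlq(code: str, reason: str, event: dict[str, Any], attck_version: str) -> str:
    """Serialize a rejected event with a deterministic, machine-routable code."""
    record = DlqRecord(code, reason, attck_version, event, review=code in REVIEW_CODES)
    return json.dumps(asdict(record), sort_keys=True)
```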

Performance Tuning

The STIX index holds roughly 600–800 enterprise techniques and sub-techniques — a few megabytes resident — so the mapping lookup is an O(1) dictionary hit with negligible per-event cost. The real tuning levers are the sequence buffer and the regex layer.

Window depth vs. memory. A 1-hour, 500-event-capped window per entity keeps memory bounded even under campaign-grade volume. Multi-stage rules rarely need more than an hour of dwell; widen the window only for low-and-slow threats, and offset the extra retention with a lower event cap. Validate the chosen window against historical incidents through threshold tuning strategies rather than guessing.
Regex compilation. Command-line signatures are the hot path. Compile every pattern once at startup (as shown), never inside the per-event loop, and prefer anchored patterns. Per-event compilation and backtracking-prone patterns (nested quantifiers such as (a+)+) are a common cause of CPU blowups in this stage.
Batching and concurrency. Enrichment is CPU-bound, not I/O-bound (the STIX index is in memory), so it parallelizes by entity partition rather than by async I/O; in CPython that means separate worker processes, since threads share the GIL. Partition state by entity key so a window never spans two workers, then scale workers horizontally. Treat 10,000–20,000 enrichments/sec per worker as a target to confirm by benchmarking, not a guarantee. Sequence evaluation scans the entity's window, so its cost grows linearly with window size; the event cap is what bounds it.
Latency target. Keep added p99 enrichment latency under 2 ms so ATT&CK tagging never becomes the bottleneck ahead of scoring and routing.
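Partitioning by entity needs a hash that every worker computes identically. Python's built-in hash() of a str is salted per process unless PYTHONHASHSEED is fixed, so two workers can disagree about who owns an entity. A sketch using a fixed digest from hashlib (the function name is hypothetical):

```python
import hashlib


def partition_for(entity_id: str, num_workers: int) -> int:
    """Stable entity -> worker assignment that is identical in every process."""
    if num_workers <= 0:
        raise ValueError("num_workers must be positive")
    # blake2b is deterministic across processes and runs, unlike the salted hash().
    digest = hashlib.blake2b(entity_id.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big") % num_workers
```

Changing num_workers reassigns most entities and strands their windows; if the pool resizes often, a consistent-hashing scheme limits how many windows move.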

Verification & Observability

Confirm correct operation at three layers: mapping, tagging, and sequence firing.

Unit assertions on mapping. Replay a fixture of known observables and assert the exact technique ID and confidence, including the living-off-the-land case (rundll32.exe invoking the comsvcs.dll MiniDump export → T1003.001). A mapping change that alters an existing assertion must be a reviewed pull request.
Sub-technique collapse. Assert that a T1003.001 (LSASS Memory) event satisfies a rule written against T1003 (OS Credential Dumping) via parent_id, so rule authors can choose their granularity.
Sequence ordering. Feed Credential Access after Lateral Movement and assert the rule does not fire; feed them in kill-chain order and assert it does. Order-sensitivity is the property most likely to regress silently.
Structured metrics. Emit attck_unmapped_total (drives ERR_ATTCK_001 review), attck_window_capped_total, attck_index_techniques (sanity-check the loaded count after an ATT&CK upgrade), and attck_enrich_latency_seconds. A sudden jump in unmapped rate after a vendor update is the canonical early warning of parser drift.

def assert_mapping_fixtures() -> None:
    hit = map_observable("rundll32.exe", "comsvcs.dll, MiniDump 624 lsass.dmp full")
    assert hit is not None and hit.technique_id == "T1003.001", "LSASS dump must map to T1003.001"
    assert map_observable("explorer.exe", "") is None, "benign process must not map"
    print("mapping fixtures: PASS")

Troubleshooting

Every event tags as unmapped (ERR_ATTCK_001 flood). Root cause: the STIX bundle failed to load (ERR_ATTCK_010) and the index is empty, but the worker started anyway. Fix: gate the readiness probe on a non-zero attck_index_techniques count so the worker refuses traffic until the bundle is loaded.
Sub-technique rules silently miss. Root cause: rules are written against parent IDs (T1003) but the matcher compares the full technique_id (T1003.001). Fix: match on parent_id for parent-level rules, as the buffer already records it.
Sequence rule fires on benign admin activity. Root cause: the rule checks tactic presence, not order. Fix: enforce kill-chain ordering by first-seen timestamp (Step 4), so co-occurring-but-out-of-order tactics do not trigger.
CPU saturates under load. Root cause: command-line regexes are compiled per event or backtrack catastrophically. Fix: compile patterns once at startup and anchor them; inspect a suspect pattern's structure with the re.DEBUG flag and time it against worst-case inputs with timeit to find the slowest one.
Mapping drifts after an ATT&CK upgrade. Root cause: the pinned enterprise-attack.json was rolled forward without re-running fixtures. Fix: treat the ATT&CK version as a reviewed dependency, re-run mapping assertions in CI, and record the version in pipeline metadata.

FAQ

Should the engine map to techniques or sub-techniques?

Map to the most specific sub-technique you can observe (e.g. T1003.001, LSASS Memory rather than bare T1003, OS Credential Dumping), and record the parent ID alongside it. Storing both lets rule authors choose granularity: a precise rule matches the sub-technique, while a broad coverage rule matches the parent and still catches every sub-technique under it. Collapsing to the parent at write time throws away signal you cannot recover; keeping both costs one extra keyword field.

How do I keep mappings current as MITRE ATT&CK releases new versions?

Pin a specific enterprise-attack.json version, record it in pipeline metadata, and treat an upgrade as a reviewed change rather than an automatic pull. Re-run your mapping fixtures and sequence-ordering tests in CI against the new bundle so you see exactly which technique IDs were added, renamed, or revoked before it reaches production. A revoked ID that vendors still emit should be handled explicitly rather than silently dropped to the DLQ.
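One explicit way to handle revoked IDs: ATT&CK STIX bundles record replacements as relationship objects with relationship_type revoked-by, pointing from the revoked technique to its successor. A sketch that builds an old-to-new map and follows chains, assuming revoked attack-pattern objects keep their mitre-attack external reference so their old ID can be recovered; the helper names are hypothetical:

```python
from typing import Optional


def _mitre_id(obj: dict) -> Optional[str]:
    for ref in obj.get("external_references", []):
        if ref.get("source_name") == "mitre-attack" and "external_id" in ref:
            return ref["external_id"]
    return None


def revoked_replacements(bundle: dict) -> dict[str, str]:
    """Map each revoked technique ID to its successor via revoked-by relationships."""
    objects = bundle.get("objects", [])
    ids: dict[str, str] = {}
    for obj in objects:
        if obj.get("type") == "attack-pattern":
            tid = _mitre_id(obj)
            if tid:
                ids[obj["id"]] = tid
    replacements: dict[str, str] = {}
    for obj in objects:
        if obj.get("type") == "relationship" and obj.get("relationship_type") == "revoked-by":
            old = ids.get(obj.get("source_ref", ""))
            new = ids.get(obj.get("target_ref", ""))
            if old and new:
                replacements[old] = new
    return replacements


def resolve_revoked(technique_id: str, replacements: dict[str, str]) -> str:
    """Follow revoked-by chains (A -> B -> C) with a guard against cycles."""
    seen: set[str] = set()
    while technique_id in replacements and technique_id not in seen:
        seen.add(technique_id)
        technique_id = replacements[technique_id]
    return technique_id
```

A remap should still be logged and counted, so a vendor that keeps emitting revoked IDs shows up in review instead of being silently rewritten.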

Where does ATT&CK enrichment sit relative to scoring and Sigma rules?

Enrichment runs after normalization and before correlation, attaching threat.technique.* and threat.tactic.* fields so both scoring and sequence rules can consume them. Portable detections expressed in Sigma carry ATT&CK tags natively; converting them is covered in mapping Sigma rules to MITRE ATT&CK techniques. The enriched technique and tactic then flow into dynamic severity scoring, which weights the alert against asset and identity context.
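Sigma expresses ATT&CK as rule tags such as attack.t1003.001 for techniques and attack.credential_access (or the hyphenated attack.credential-access) for tactics, alongside group and software tags like attack.g0007. A small sketch, under those tag conventions, that splits a rule's tags into canonical technique IDs and STIX-style tactic shortnames so they line up with the index; the function name is hypothetical:

```python
import re

_TECHNIQUE_TAG = re.compile(r"attack\.(t\d{4}(?:\.\d{3})?)", re.IGNORECASE)
_GROUP_OR_SOFTWARE_TAG = re.compile(r"attack\.[gs]\d{4}", re.IGNORECASE)


def sigma_attack_tags(tags: list[str]) -> tuple[list[str], list[str]]:
    """Split Sigma tags into (technique IDs, STIX-style tactic shortnames)."""
    techniques: list[str] = []
    tactics: list[str] = []
    for tag in tags:
        m = _TECHNIQUE_TAG.fullmatch(tag)
        if m:
            techniques.append(m.group(1).upper())
        elif tag.lower().startswith("attack.") and not _GROUP_OR_SOFTWARE_TAG.fullmatch(tag):
            # attack.credential_access and attack.credential-access both map to the
            # STIX phase_name "credential-access".
            tactics.append(tag[len("attack."):].lower().replace("_", "-"))
    return techniques, tactics
```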

Does ATT&CK integration satisfy any compliance requirement?

It operationalizes the analysis guidance in NIST SP 800-92 (Guide to Computer Security Log Management), which recommends that organizations analyze and correlate log data rather than merely retain it. Mapping detections to ATT&CK also produces a measurable coverage matrix that auditors and risk teams can review against NIST SP 800-94 (Guide to Intrusion Detection and Prevention Systems), turning “we collect logs” into “we detect these specific adversary techniques.”
