Mapping Sigma Rules to MITRE ATT&CK Techniques

When Sigma rules reach a correlation engine without verified technique metadata, every rule hit becomes an isolated event the engine cannot order, weight, or chain into a campaign. This page is a focused procedure within the broader MITRE ATT&CK integration layer of the Alert Correlation & Rule Engines pipeline: it resolves each Sigma rule’s attack.* tags against the official ATT&CK STIX bundle, rejects unmapped or deprecated rules at build time, and emits the tactic-aware correlation_weight that downstream scoring and sequencing depend on.

Root-Cause Context

Sigma is deliberately log-agnostic, so a rule’s only link to ATT&CK is a free-text tags list such as attack.t1059.001. Three properties of that contract cause the failure this page addresses. First, the tags are unvalidated strings: a typo (attack.t1509), a vendor-specific label, or a sub-technique that ATT&CK has since revoked all pass YAML parsing untouched and only surface as a missing detection months later. Second, the tag carries no tactic. The engine sees t1059.001 but not that it belongs to Execution (TA0002), so it cannot reason about kill-chain order or assign a phase-based weight. Third, rule sets drift: when several open-source repositories and vendor packs are merged, the same technique is tagged inconsistently — attack.t1003 in one rule, attack.t1003.001 in another — which fragments coverage metrics and duplicates alerts.
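To make the first point concrete, here is a minimal sketch of a purely syntactic tag check. The regular expression and the helper name looks_like_technique_tag are illustrative assumptions, not part of the Sigma specification or of the module later on this page.

```python
import re

# Assumed shape of a Sigma ATT&CK technique tag: attack.t<4 digits>[.<3 digits>]
TECHNIQUE_TAG = re.compile(r"attack\.t\d{4}(\.\d{3})?", re.IGNORECASE)


def looks_like_technique_tag(tag: str) -> bool:
    """Return True when the tag is shaped like a technique ID.

    This says nothing about whether the ID exists in ATT&CK.
    """
    return TECHNIQUE_TAG.fullmatch(tag.strip()) is not None


tags = ["attack.t1059.001", "attack.t1509", "attack.execution", "attack.T1003"]
shaped = [t for t in tags if looks_like_technique_tag(t)]
# The typo from the text (attack.t1509) is still in `shaped`.
```

The mistyped attack.t1509 passes the shape check alongside the genuine IDs, which is why a pattern match alone is not enough and the tag has to be looked up in an index built from the ATT&CK bundle.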

Left unresolved, those gaps push raw, context-poor signals into the correlation buffer, where they bypass tactic-ordered sequence rules and inflate the false-positive volume that threshold tuning strategies then have to absorb. The fix is to treat Sigma-to-ATT&CK resolution as a deterministic validation gate that runs before a rule is ever compiled into production routing — the same config-as-code discipline applied across the correlation pillar.

Prerequisites

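The implementation targets Python 3.11+ and depends only on a YAML parser plus the standard library. The language features it relies on (built-in generics such as dict[str, ...], X | None unions, and dataclass(slots=True)) are all available from Python 3.10, so 3.11 is a chosen baseline rather than a hard requirement: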

python3 -m venv .venv && source .venv/bin/activate
pip install "PyYAML>=6.0"

Configuration assumptions:

A directory of Sigma rule files (./sigma_rules/*.yml), each a valid Sigma document with a tags list.
A pinned copy of the MITRE ATT&CK enterprise STIX bundle (enterprise-attack.json). The loader below fetches it once over HTTPS from the repository's master branch, which suits a throwaway build runner but is not version-pinned; in production you mount a version-pinned file read-only so a mapping change is a reviewed event, not a silent upstream pull.
The resolver runs inside CI, before rules are compiled and shipped — quarantined rules fail the build rather than degrade live routing.
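One way to enforce the pinned-file assumption is to verify the mounted bundle against a recorded digest before indexing it. The sketch below is an assumption-laden illustration: the function name load_pinned_bundle and the idea of storing a SHA-256 digest in pipeline configuration are hypothetical, not something the module on this page does.

```python
import hashlib
import json
from pathlib import Path


def load_pinned_bundle(path: Path, expected_sha256: str) -> dict:
    """Read a local STIX bundle and refuse it if its digest differs from the pinned one."""
    data = path.read_bytes()
    digest = hashlib.sha256(data).hexdigest()
    if digest != expected_sha256.lower():
        raise ValueError(f"bundle digest {digest} does not match pinned {expected_sha256}")
    bundle = json.loads(data)
    # A STIX 2.x bundle is a JSON object whose "type" is "bundle".
    if not isinstance(bundle, dict) or bundle.get("type") != "bundle":
        raise ValueError("file is not a STIX bundle object")
    return bundle
```

The returned dict can be passed to an index builder in place of a network fetch, so swapping the bundle requires changing the recorded digest in a reviewed commit.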

Production-Ready Implementation

The module below builds an external_id → Technique index from the STIX bundle, then resolves every rule’s attack.* tags concurrently with asyncio, offloading blocking file and network I/O to worker threads so a large rule set validates without serializing on disk reads. Each rule yields a typed MappingResult carrying either validated technique metadata and a tactic-derived correlation_weight, or a structured ERR_SIGMA_* code explaining why it was quarantined.

from __future__ import annotations

import asyncio
import json
import urllib.request
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Final

import yaml  # PyYAML

ATTACK_STIX_URL: Final[str] = (
    "https://raw.githubusercontent.com/mitre-attack/attack-stix-data/"
    "master/enterprise-attack/enterprise-attack.json"
)
SIGMA_DIR: Final[Path] = Path("./sigma_rules")
OUTPUT_DIR: Final[Path] = Path("./enriched_rules")

# Kill-chain tactic -> correlation weight (1-10); higher weights escalate faster.
TACTIC_WEIGHTS: Final[dict[str, int]] = {
    "initial-access": 9, "execution": 8, "persistence": 7,
    "privilege-escalation": 8, "defense-evasion": 6, "credential-access": 9,
    "discovery": 4, "lateral-movement": 8, "collection": 5,
    "command-and-control": 8, "exfiltration": 9, "impact": 9,
    "reconnaissance": 3, "resource-development": 3,
}


@dataclass(frozen=True, slots=True)
class Technique:
    technique_id: str   # canonical external_id, e.g. "T1059.001"
    name: str
    tactic: str         # kill_chain phase_name, e.g. "execution"
    deprecated: bool


@dataclass(slots=True)
class MappingResult:
    rule: str
    status: str                       # "validated" | "quarantined"
    error_code: str | None = None
    techniques: list[dict[str, str]] = field(default_factory=list)
    correlation_weight: int = 0


def build_technique_index(stix: dict[str, object]) -> dict[str, Technique]:
    """Index ATT&CK attack-patterns by lowercased external_id for O(1) lookup."""
    index: dict[str, Technique] = {}
    objects = stix.get("objects", [])
    if not isinstance(objects, list):
        return index
    for obj in objects:
        if not isinstance(obj, dict) or obj.get("type") != "attack-pattern":
            continue
        ext_id: str | None = None
        for ref in obj.get("external_references", []):
            if isinstance(ref, dict) and ref.get("source_name") == "mitre-attack":
                ext_id = str(ref.get("external_id", "")) or None
                break
        if ext_id is None:
            continue
        # Only the first listed tactic is kept; a multi-tactic technique loses the rest.
        phases = obj.get("kill_chain_phases") or [{}]
        tactic = str(phases[0].get("phase_name", "unknown"))
        deprecated = bool(obj.get("x_mitre_deprecated") or obj.get("revoked"))
        index[ext_id.lower()] = Technique(ext_id, str(obj.get("name", "")), tactic, deprecated)
    return index


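def load_attack_index() -> dict[str, Technique]:
    """Fetch and parse the ATT&CK STIX bundle; raise ERR_ATTCK_001 if that fails."""
    try:
        with urllib.request.urlopen(ATTACK_STIX_URL, timeout=30) as resp:  # noqa: S310
            stix: dict[str, object] = json.loads(resp.read().decode("utf-8"))
    except (OSError, ValueError) as exc:  # URLError and timeouts are OSError; bad JSON is ValueError
        raise RuntimeError(f"ERR_ATTCK_001: cannot load ATT&CK bundle: {exc}") from exc
    return build_technique_index(stix)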


def resolve_rule(raw: str, name: str, index: dict[str, Technique]) -> MappingResult:
    """Validate one Sigma document's attack.* tags against the ATT&CK index."""
    try:
        rule = yaml.safe_load(raw)
    except yaml.YAMLError:
        return MappingResult(name, "quarantined", "ERR_SIGMA_001")
    if not isinstance(rule, dict):
        return MappingResult(name, "quarantined", "ERR_SIGMA_001")

    title = str(rule.get("title", name))
    tags = rule.get("tags")
    if not isinstance(tags, list) or not tags:
        return MappingResult(title, "quarantined", "ERR_SIGMA_002")

    attack_tags = [t[len("attack."):].lower() for t in tags
                   if isinstance(t, str) and t.lower().startswith("attack.t")]
    if not attack_tags:
        return MappingResult(title, "quarantined", "ERR_SIGMA_003")

    resolved: list[Technique] = []
    for tech_id in attack_tags:
        tech = index.get(tech_id)
        if tech is None:
            return MappingResult(title, "quarantined", "ERR_SIGMA_004")
        if tech.deprecated:
            return MappingResult(title, "quarantined", "ERR_SIGMA_005")
        resolved.append(tech)

    weight = max(TACTIC_WEIGHTS.get(t.tactic, 5) for t in resolved)
    return MappingResult(
        rule=title,
        status="validated",
        techniques=[{"id": t.technique_id, "name": t.name, "tactic": t.tactic} for t in resolved],
        correlation_weight=weight,
    )


async def enrich_file(path: Path, index: dict[str, Technique]) -> MappingResult:
    """Read and resolve a single rule file off the event loop's thread pool."""
    raw = await asyncio.to_thread(path.read_text, encoding="utf-8")
    result = await asyncio.to_thread(resolve_rule, raw, path.stem, index)
    if result.status == "validated":
        out = OUTPUT_DIR / f"{path.stem}_enriched.json"
        payload = json.dumps(asdict(result), indent=2)
        await asyncio.to_thread(out.write_text, payload, encoding="utf-8")
    return result


async def run_pipeline() -> int:
    """Resolve every rule concurrently; exit non-zero if any rule is quarantined."""
    index = await asyncio.to_thread(load_attack_index)
    OUTPUT_DIR.mkdir(exist_ok=True)
    files = sorted(SIGMA_DIR.glob("*.yml"))
    results = await asyncio.gather(*(enrich_file(p, index) for p in files))
    failures = [r for r in results if r.status == "quarantined"]
    for r in failures:
        print(f"QUARANTINED {r.rule}: {r.error_code}")
    print(f"{len(results) - len(failures)}/{len(results)} rules validated")
    return 1 if failures else 0


if __name__ == "__main__":
    raise SystemExit(asyncio.run(run_pipeline()))

A non-zero exit code fails the CI job, so an unmapped or deprecated rule blocks the merge instead of silently entering production routing. Each correlation_weight becomes a foundational input for dynamic severity scoring, and the per-rule technique list lets the engine collapse sub-techniques to their parent during cross-source event linking.
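One caveat about the weight: build_technique_index keeps only the first kill_chain_phases entry for each technique, yet some techniques sit under several tactics (Valid Accounts, T1078, is listed under Defense Evasion, Persistence, Privilege Escalation, and Initial Access). The following sketch shows one possible way to weight by the most critical tactic instead. The helper names tactics_of and max_tactic_weight are hypothetical, the weight table is a subset copied from the module, and the phase order in the sample object is illustrative.

```python
from typing import Final

# Subset of the module's weight table, repeated so the sketch runs on its own.
TACTIC_WEIGHTS: Final[dict[str, int]] = {
    "initial-access": 9, "persistence": 7,
    "privilege-escalation": 8, "defense-evasion": 6,
}
DEFAULT_WEIGHT: Final[int] = 5  # same fallback the module uses for unknown tactics


def tactics_of(attack_pattern: dict) -> list[str]:
    """Collect every mitre-attack phase_name on a STIX attack-pattern object."""
    phases = attack_pattern.get("kill_chain_phases") or []
    return [
        str(p["phase_name"])
        for p in phases
        if isinstance(p, dict)
        and p.get("kill_chain_name") == "mitre-attack"
        and "phase_name" in p
    ]


def max_tactic_weight(attack_pattern: dict) -> int:
    """Weight a technique by its most critical tactic, not by whichever is listed first."""
    weights = (TACTIC_WEIGHTS.get(t, DEFAULT_WEIGHT) for t in tactics_of(attack_pattern))
    return max(weights, default=DEFAULT_WEIGHT)


valid_accounts = {
    "type": "attack-pattern",
    "name": "Valid Accounts",
    "kill_chain_phases": [
        {"kill_chain_name": "mitre-attack", "phase_name": "defense-evasion"},
        {"kill_chain_name": "mitre-attack", "phase_name": "persistence"},
        {"kill_chain_name": "mitre-attack", "phase_name": "privilege-escalation"},
        {"kill_chain_name": "mitre-attack", "phase_name": "initial-access"},
    ],
}
weight = max_tactic_weight(valid_accounts)
```

With this sample object, reading only the first phase would yield the Defense Evasion weight of 6, whereas taking the maximum yields the Initial Access weight of 9. Whether the maximum is the right policy depends on how your scoring treats ambiguous tactics.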

Error-Code Reference

The resolver emits a single stable code per quarantined rule so the CI log and any downstream dead-letter queue agree on cause and remediation. Codes follow the established ERR_CATEGORY_NNN convention.

Code	Meaning	Action
`ERR_SIGMA_001`	Rule file is not valid YAML or not a mapping document	Fix the YAML syntax; re-run the linter before the mapping gate
`ERR_SIGMA_002`	No `tags` list present, or the list is empty	Add `attack.t####` tags for every technique the rule detects
`ERR_SIGMA_003`	Tags exist but none match the `attack.t*` technique pattern	Replace tactic-only or vendor tags with concrete technique IDs
`ERR_SIGMA_004`	A technique ID is absent from the pinned STIX index	Correct the typo, or bump the pinned ATT&CK version if the ID is new
`ERR_SIGMA_005`	A referenced technique is deprecated or revoked in ATT&CK	Re-map the rule to the superseding technique recorded in ATT&CK
`ERR_ATTCK_001`	STIX bundle could not be fetched or parsed (raised by loader)	Pin and mount a local `enterprise-attack.json`; fail the build closed

Treat ERR_SIGMA_005 as a coverage signal rather than noise: vendors frequently keep emitting a revoked ID for one or two releases, so log the re-map and confirm the replacement technique is still detected before deleting the rule.
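For revoked techniques, the ATT&CK STIX data records the replacement as a relationship object whose relationship_type is "revoked-by", pointing from the revoked object to its successor. The sketch below builds a lookup from those relationships so a quarantine message can suggest the re-map. The function names are hypothetical and the STIX ids in the sample are placeholders; deprecated (as opposed to revoked) techniques have no such relationship and still need a manual decision.

```python
def external_id(obj: dict) -> str:
    """Return the mitre-attack external_id of a STIX object, or "" if it has none."""
    for ref in obj.get("external_references", []):
        if isinstance(ref, dict) and ref.get("source_name") == "mitre-attack":
            return str(ref.get("external_id", ""))
    return ""


def superseding_ids(bundle: dict) -> dict[str, str]:
    """Map each revoked technique ID to the ID of the technique that replaced it."""
    objects = [o for o in bundle.get("objects", []) if isinstance(o, dict)]
    patterns = {o["id"]: o for o in objects
                if o.get("type") == "attack-pattern" and "id" in o}
    mapping: dict[str, str] = {}
    for rel in objects:
        if rel.get("type") != "relationship" or rel.get("relationship_type") != "revoked-by":
            continue
        old = patterns.get(rel.get("source_ref"))
        new = patterns.get(rel.get("target_ref"))
        if old is None or new is None:
            continue  # the relationship involves something other than two techniques
        old_id, new_id = external_id(old), external_id(new)
        if old_id and new_id:
            mapping[old_id] = new_id
    return mapping


# Tiny hand-written bundle; the STIX ids are placeholders, not real ATT&CK ids.
bundle = {
    "type": "bundle",
    "objects": [
        {"type": "attack-pattern", "id": "attack-pattern--old", "revoked": True,
         "external_references": [{"source_name": "mitre-attack", "external_id": "T1086"}]},
        {"type": "attack-pattern", "id": "attack-pattern--new",
         "external_references": [{"source_name": "mitre-attack", "external_id": "T1059.001"}]},
        {"type": "relationship", "relationship_type": "revoked-by",
         "source_ref": "attack-pattern--old", "target_ref": "attack-pattern--new"},
    ],
}
remap = superseding_ids(bundle)
```

Printing the suggested successor next to ERR_SIGMA_005 shortens the review, but the suggestion should still be confirmed by a person, because the rule's detection logic may not cover the successor's full scope.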

Operational Notes

The full enterprise STIX bundle is roughly 35–45 MB of JSON; parsing it builds a technique index of several hundred entries (active, revoked, and deprecated techniques and sub-techniques together) that occupies at most a few megabytes resident, so memory is dominated by the transient parse, not the index. Load it once per process and share the read-only dict across all asyncio.to_thread workers; never re-parse per rule. Because resolution is CPU-light and I/O-bound, concurrency of 8–16 in-flight files comfortably saturates a typical CI runner’s disk; pushing far higher mainly adds thread-pool overhead. For very large rule estates (10,000+ files), batch the asyncio.gather calls in groups of a few hundred to cap open file descriptors and keep peak memory flat.
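The run_pipeline() function above hands every file to asyncio.gather at once. A semaphore is one simple way to cap the number of reads in flight; the sketch below assumes a cap of 16, taken from the range suggested above, and uses the hypothetical names MAX_IN_FLIGHT and read_bounded.

```python
import asyncio
from pathlib import Path

MAX_IN_FLIGHT = 16  # assumed cap, taken from the 8-16 range suggested above


async def read_bounded(paths: list[Path], limit: int = MAX_IN_FLIGHT) -> list[str]:
    """Read files concurrently while never holding more than `limit` reads in flight."""
    gate = asyncio.Semaphore(limit)

    async def read_one(path: Path) -> str:
        async with gate:  # waits here once `limit` reads are already running
            return await asyncio.to_thread(path.read_text, encoding="utf-8")

    # gather preserves input order, so results line up with `paths`.
    return await asyncio.gather(*(read_one(p) for p in paths))
```

The same gate could wrap the whole per-file coroutine rather than just the read. For estates in the tens of thousands of files, slicing the path list into chunks and awaiting each chunk in turn additionally limits how many pending coroutines exist at once.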

Two vendor quirks recur. Some exporters emit tactic tags such as attack.execution alongside technique tags; the resolver ignores anything not matching attack.t*, which is intentional: tactic is derived from the technique, not trusted from the tag. Others emit uppercase IDs (attack.T1059.001); lowercasing both the tag and the index key absorbs that. Lowercasing fixes case only, so an ID that is malformed in any other way, such as extra or missing digits, still fails the lookup and is quarantined with ERR_SIGMA_004. Pin the ATT&CK version in pipeline metadata so a quarterly bundle refresh is a reviewed change and you can diff exactly which IDs were added, renamed, or revoked.
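A bundle refresh is easier to review when the ID-level changes are computed rather than eyeballed. The following sketch compares two maps of technique ID to a retired flag (deprecated or revoked), such as could be derived from two versions of the technique index. The function name diff_technique_ids and the sample values are illustrative; they are not taken from a real release diff.

```python
def diff_technique_ids(old: dict[str, bool], new: dict[str, bool]) -> dict[str, list[str]]:
    """Compare two {technique_id: retired} maps built from two bundle versions."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    # Present in both versions, but only flagged as retired in the newer one.
    newly_retired = sorted(k for k in old.keys() & new.keys() if new[k] and not old[k])
    return {"added": added, "removed": removed, "newly_retired": newly_retired}


# Illustrative values only.
previous = {"t1059.001": False, "t1086": False}
current = {"t1059.001": False, "t1086": True, "t1659": False}
changes = diff_technique_ids(previous, current)
```

Any rule tagged with an ID in newly_retired will start failing with ERR_SIGMA_005 after the refresh, so the diff doubles as a list of rules to re-map in the same change.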

Verification Checklist

A rule tagged with a known-good ID (e.g. attack.t1059.001, PowerShell) resolves to status: "validated" with the Execution tactic and correlation_weight 8.
A deliberately misspelled tag (attack.t9999) is quarantined with ERR_SIGMA_004 and fails the build.
A rule whose only tags are tactic-level (attack.execution) is quarantined with ERR_SIGMA_003.
Each validated rule writes a deterministic *_enriched.json whose correlation_weight equals the highest tactic weight among its resolved techniques.
The pinned ATT&CK version recorded in pipeline metadata matches the bundle the index was built from.
run_pipeline() returns a non-zero exit code whenever any rule is quarantined, blocking the merge.

FAQ

Should I map a rule to the sub-technique or the parent technique?

Tag the most specific sub-technique you can detect (e.g. attack.t1003.001, LSASS Memory) and let the engine record the parent (T1003, OS Credential Dumping) alongside it during enrichment. Keeping both lets a precise rule match the sub-technique while a broad coverage rule matches the parent and still catches every sub-technique under it. Collapsing to the parent at map time throws away granularity you cannot recover.
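Because a sub-technique ID is its parent's ID plus a dotted suffix, the parent can be derived at enrichment time without another lookup. A minimal sketch, with hypothetical helper names and a hypothetical "parent" field that the module on this page does not emit:

```python
def parent_technique(technique_id: str) -> str:
    """Return the parent ID of a sub-technique, or the ID itself for a top-level technique."""
    return technique_id.split(".", 1)[0].upper()


def enrich_with_parent(technique_id: str) -> dict[str, str]:
    """Keep the specific ID and record its parent next to it."""
    tid = technique_id.upper()
    return {"id": tid, "parent": parent_technique(tid)}


def matches(filter_id: str, enriched: dict[str, str]) -> bool:
    """A filter on a parent ID matches all of its sub-techniques; a sub-technique filter is exact."""
    wanted = filter_id.upper()
    return wanted in (enriched["id"], enriched["parent"])


lsass = enrich_with_parent("t1003.001")
```

Here a precise filter on T1003.001 and a broad filter on T1003 both match the enriched record, while a filter on a sibling such as T1003.002 does not.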

Why quarantine deprecated techniques instead of just passing them through?

A revoked or deprecated ID means ATT&CK has restructured that behaviour, so the tag no longer points at current detection guidance and your coverage matrix silently overstates what you detect. ERR_SIGMA_005 forces a human to re-map the rule to the superseding technique. Because some vendors keep emitting the old ID for a release or two, handle it explicitly rather than dropping it without a trace.

Where does the correlation_weight value get used after mapping?

It is the tactic-criticality input the correlation layer reads when it weights and orders alerts — it feeds dynamic severity scoring so a Credential Access hit outranks a Discovery hit, and it gives sequence rules a phase ordering during cross-source event linking. Mapping it once at build time keeps the runtime engine free of per-event ATT&CK lookups.
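As a rough illustration of the consumer side, the sketch below orders alerts by the build-time weight of the rule that fired. The alert shape (a dict with a "rule" key), the function name rank_alerts, and the rule titles are assumptions for the example; a real correlation engine will have its own schema and will combine the weight with other signals.

```python
def rank_alerts(alerts: list[dict], weights: dict[str, int]) -> list[dict]:
    """Order alerts by the build-time weight of the rule that fired, highest first.

    Rules without a recorded weight sort last; ties keep their arrival order.
    """
    return sorted(alerts, key=lambda a: weights.get(a["rule"], 0), reverse=True)


# Weights as they would be read from the *_enriched.json files (hypothetical titles).
weights = {"LSASS Memory Dump": 9, "Network Share Discovery": 4}
alerts = [
    {"rule": "Network Share Discovery", "host": "ws-01"},
    {"rule": "LSASS Memory Dump", "host": "ws-01"},
]
ranked = rank_alerts(alerts, weights)
```

The Credential Access hit sorts ahead of the Discovery hit using only a dictionary lookup, with no ATT&CK resolution at event time.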