MITRE ATT&CK in Python: From STIX Data to Coverage Map



Working with MITRE ATT&CK in Python beats clicking through the matrix the moment you want real answers out of the framework — which techniques the groups targeting your sector actually use, where your detections cover them, and what to build next. This guide walks the full pipeline: fetching the STIX dataset, mapping techniques to groups and mitigations, generating ATT&CK Navigator coverage layers, and bridging across to MITRE D3FEND. It consolidates what used to be a four-part series on this blog into one place, including the rough edges I hit along the way.

Key Takeaways

MITRE publishes ATT&CK as machine-readable STIX bundles on GitHub, and the official mitreattack-python library is the supported way to parse and query them.
Relationships are most of the value: uses (group to technique) and mitigates (mitigation to technique) turn a flat technique list into a queryable knowledge graph.
Map everything once, cache the JSON output, and reuse it — live relationship-walking on every query is slow and buys you nothing.
ATT&CK Navigator renders coverage heatmaps from plain JSON layer files, so a Python script can produce every coverage view stakeholders ask for.
D3FEND closes the loop by mapping defensive techniques to the offensive techniques they counter; the gap analysis at the end is where the pipeline pays for itself.

Environment

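Python 3.10 or later (the D3FEND step uses the dict | None annotation syntax introduced in 3.10; statistics.quantiles in the Navigator step has been available since 3.8).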
mitreattack-python 3.0 or later, plus requests.
Internet access for the initial downloads of the ATT&CK STIX bundle (~30 MB) and the D3FEND mappings; everything after that runs from a local cache.
Roughly 100 MB of free disk space for caches, and about the same in memory once the full dataset is loaded.
The public MITRE ATT&CK Navigator instance, or a self-hosted copy for air-gapped environments.

The Problem

Browsing the ATT&CK website is fine for looking up a single technique. The friction starts when you want cross-cutting answers: which techniques are most used by groups that target your sector? Which techniques have no documented mitigations? How much of the matrix do your SIEM rules actually cover? Those answers need the data in a form you can query — STIX JSON in memory, not HTML in a browser tab. If the framework itself is new to you, my MITRE ATT&CK fundamentals guide covers the tactics-techniques-procedures model before you automate it.

Numbers alone are not the end state either. A defender-friendly question like "how much of ATT&CK do we cover?" is hard to answer convincingly from a table but persuasive as a coloured heatmap over the matrix. And once the offensive side is mapped, the obvious follow-up ("so what should we build next?") needs the defensive side, which lives in a separate framework (D3FEND) behind a separate API with its own quirks.

Should this all be one coherent, first-party toolchain instead of a STIX repo, a Python library, an Angular app, and a SPARQL-shaped API? Probably. But here we are, and the pipeline below tames it into four stages that run end to end in a couple of minutes.

The Solution — One MITRE ATT&CK Pipeline in Python

Step 1 — Lay out the project

A handful of files and two cache directories keep the pipeline maintainable. For one-off analysis a notebook works; for anything repeatable, this structure has held up:

mitre_project/
├── config.py            # paths, logging
├── loader.py            # download and parse STIX
├── mapper.py            # relationship mapping
├── layers.py            # Navigator layer generation
├── d3fend.py            # D3FEND bridge
├── main.py              # entry point
├── cache/
│   ├── enterprise-attack.json
│   └── d3fend/
└── navigator_layers/

Step 2 — Centralise configuration and download the STIX dataset

Even a small pipeline benefits from a logger — print() debugging stops scaling the moment you move beyond one file:

# config.py
import logging
from pathlib import Path

BASE_DIR  = Path(__file__).resolve().parent
CACHE_DIR = BASE_DIR / 'cache'
CACHE_DIR.mkdir(exist_ok=True)
STIX_PATH = CACHE_DIR / 'enterprise-attack.json'

def setup_logging() -> None:
    logging.basicConfig(
        level=logging.INFO,
        format='%(asctime)s %(name)s %(levelname)s %(message)s',
    )

MITRE publishes the canonical STIX 2.0 bundles in the mitre/cti repository (a STIX 2.1 collection also exists in attack-stix-data; the library's stix20 module works against the former). The enterprise bundle is around 30 MB, so cache it locally and refetch on demand rather than on every run:

# loader.py
import json
import logging
import requests
from mitreattack.stix20 import MitreAttackData
from config import STIX_PATH

logger = logging.getLogger(__name__)
STIX_URL = ('https://raw.githubusercontent.com/mitre/cti/'
            'master/enterprise-attack/enterprise-attack.json')

def fetch_stix(force: bool = False) -> None:
    if STIX_PATH.exists() and not force:
        logger.info('STIX cache present at %s', STIX_PATH)
        return
    logger.info('Downloading STIX from %s', STIX_URL)
    resp = requests.get(STIX_URL, timeout=60)
    resp.raise_for_status()
    STIX_PATH.write_text(resp.text, encoding='utf-8')

def load_attack() -> MitreAttackData:
    fetch_stix()
    return MitreAttackData(str(STIX_PATH))

Step 3 — Flatten STIX objects and extract techniques

The library returns rich STIX objects that do not serialise to JSON cleanly. A short recursive helper handles the conversion once and for all:

def to_dict(obj):
    if hasattr(obj, 'serialize'):
        return json.loads(obj.serialize())
    if isinstance(obj, dict):
        return {k: to_dict(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [to_dict(i) for i in obj]
    return obj

Techniques carry their human-readable ATT&CK ID (T1059.001) in external_references under the mitre-attack source. Extract that and you have a useful base record; groups and mitigations follow the same shape via get_groups() and get_mitigations():

def get_techniques(attack: MitreAttackData) -> list[dict]:
    out = []
    for tech in attack.get_techniques():
        tid = None
        refs = to_dict(getattr(tech, 'external_references', []))
        for ref in refs:
            if ref.get('source_name') == 'mitre-attack':
                tid = ref.get('external_id')
                break
        if not tid:
            continue
        out.append({
            'technique_id': tid,
            'stix_id':      tech.id,
            'name':         tech.name,
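            # optional STIX properties raise AttributeError when absent, so read them defensively
            'description':  getattr(tech, 'description', ''),
            'is_subtechnique': getattr(tech, 'x_mitre_is_subtechnique', False),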
            'platforms':    list(getattr(tech, 'x_mitre_platforms', [])),
            'external_references': refs,
        })
    return out

Wire the getters into an entry point and the first run confirms the dataset loaded (counts include subtechniques plus revoked and deprecated objects, which the library returns unless you filter them):

2026-05-28 14:07:35 main INFO Loaded 799 techniques
2026-05-28 14:07:35 main INFO Loaded 174 groups
2026-05-28 14:07:35 main INFO Loaded 268 mitigations
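
Because those counts include revoked and deprecated objects, it can help to filter before analysis. The following is a minimal sketch that works on plain dictionaries (for example the to_dict() form of each object), assuming the standard STIX revoked flag and the ATT&CK x_mitre_deprecated flag. The helper names is_active and active_only and the sample records are hypothetical. Recent versions of the library also appear to accept a remove_revoked_deprecated argument on the getters; check the version you have installed before relying on it.

```python
def is_active(obj: dict) -> bool:
    """True when a STIX object is neither revoked nor deprecated."""
    return not obj.get('revoked', False) and not obj.get('x_mitre_deprecated', False)


def active_only(objects: list[dict]) -> list[dict]:
    """Keep only objects that are still current in the dataset."""
    return [o for o in objects if is_active(o)]


# Hypothetical sample records, for illustration only.
sample = [
    {'name': 'Current technique'},
    {'name': 'Revoked technique', 'revoked': True},
    {'name': 'Deprecated technique', 'x_mitre_deprecated': True},
]
print([o['name'] for o in active_only(sample)])
```

Objects that carry neither flag are treated as active, which matches how most current entries look in the bundle.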

Step 4 — Map techniques to groups and mitigations

Without relationships, ATT&CK is a glossary. The STIX data encodes them explicitly — uses from group to technique, mitigates from mitigation to technique — and the library exposes them as named helpers so you never walk raw relationship objects:

# mapper.py
import logging
from typing import Any
from mitreattack.stix20 import MitreAttackData
from loader import to_dict

logger = logging.getLogger(__name__)

class TechniqueMapper:
    def __init__(self, attack: MitreAttackData):
        self.attack = attack

    def groups_for(self, technique_id: str) -> list[dict[str, Any]]:
        tech = self.attack.get_object_by_attack_id(technique_id, 'attack-pattern')
        if not tech:
            logger.warning('Unknown technique %s', technique_id)
            return []
        groups = self.attack.get_groups_using_technique(tech.id) or []
        return [to_dict(g['object']) for g in groups]

    def mitigations_for(self, technique_id: str) -> list[dict[str, Any]]:
        tech = self.attack.get_object_by_attack_id(technique_id, 'attack-pattern')
        if not tech:
            return []
        mitigations = self.attack.get_mitigations_mitigating_technique(tech.id) or []
        return [to_dict(m['object']) for m in mitigations]

The helpers return dictionaries with object and relationships keys — keep the object payload only unless you specifically need the relationship metadata. Then decorate every technique record in one pass:

def enrich_techniques(techniques: list[dict], mapper: TechniqueMapper) -> list[dict]:
    enriched = []
    total = len(techniques)
    for i, tech in enumerate(techniques, start=1):
        tid = tech['technique_id']
        tech['groups']      = mapper.groups_for(tid)
        tech['mitigations'] = mapper.mitigations_for(tid)
        enriched.append(tech)
        if i % 50 == 0:
            logger.info('Processed %d/%d techniques', i, total)
    return enriched

All 799 techniques take around 10 seconds end to end on the current enterprise dataset. Run once, cache, reuse.
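
For a sense of what the named helpers in this step save you from, here is a rough sketch of walking raw relationship objects by hand. It assumes the standard STIX relationship fields (relationship_type, source_ref, target_ref) and the intrusion-set and attack-pattern type prefixes. The toy bundle and its shortened IDs are made up for illustration, and the sketch ignores revoked relationships, which a real implementation would have to handle.

```python
from collections import defaultdict


def groups_by_technique(bundle: dict) -> dict[str, list[str]]:
    """Map technique STIX id -> sorted names of groups linked to it by 'uses'."""
    objects = bundle.get('objects', [])
    names = {o['id']: o.get('name', o['id']) for o in objects if 'id' in o}
    index: dict[str, set[str]] = defaultdict(set)
    for rel in objects:
        if rel.get('type') != 'relationship':
            continue
        if rel.get('relationship_type') != 'uses':
            continue
        src, dst = rel['source_ref'], rel['target_ref']
        # 'uses' also links software to techniques; keep group -> technique only.
        if src.startswith('intrusion-set--') and dst.startswith('attack-pattern--'):
            index[dst].add(names.get(src, src))
    return {tid: sorted(groups) for tid, groups in index.items()}


# Toy bundle with shortened, hypothetical IDs (real STIX IDs end in a UUID).
toy_bundle = {
    'objects': [
        {'type': 'intrusion-set', 'id': 'intrusion-set--a', 'name': 'Group A'},
        {'type': 'intrusion-set', 'id': 'intrusion-set--b', 'name': 'Group B'},
        {'type': 'attack-pattern', 'id': 'attack-pattern--x', 'name': 'Technique X'},
        {'type': 'malware', 'id': 'malware--m', 'name': 'Tool M'},
        {'type': 'relationship', 'relationship_type': 'uses',
         'source_ref': 'intrusion-set--a', 'target_ref': 'attack-pattern--x'},
        {'type': 'relationship', 'relationship_type': 'uses',
         'source_ref': 'intrusion-set--b', 'target_ref': 'attack-pattern--x'},
        {'type': 'relationship', 'relationship_type': 'uses',
         'source_ref': 'malware--m', 'target_ref': 'attack-pattern--x'},
        {'type': 'relationship', 'relationship_type': 'mitigates',
         'source_ref': 'course-of-action--c', 'target_ref': 'attack-pattern--x'},
    ]
}
print(groups_by_technique(toy_bundle))
```

Even this small version has to know which type prefixes count as a group and which relationship types to skip, which is the kind of detail the library keeps in one place.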

Step 5 — Cache the mapped output and query it

Serialise the result to disk and key the cache on a content hash of the STIX file, so a new ATT&CK release invalidates it automatically:

import json, hashlib
from pathlib import Path

def cache_key(stix_path: Path) -> str:
    return hashlib.sha256(stix_path.read_bytes()).hexdigest()[:16]

def save_mapped(techniques: list[dict], stix_path: Path) -> Path:
    out = Path('cache') / f'mapped_{cache_key(stix_path)}.json'
    out.write_text(json.dumps(techniques, ensure_ascii=False), encoding='utf-8')
    return out
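
One way to put the content-hash key to work is a load-or-build wrapper. This is a sketch under assumptions: load_or_build and the stand-in builder are hypothetical names, the technique ID T0000 is a placeholder, and the demonstration writes to a temporary directory instead of the real cache.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Callable


def cache_key(stix_path: Path) -> str:
    return hashlib.sha256(stix_path.read_bytes()).hexdigest()[:16]


def load_or_build(stix_path: Path, cache_dir: Path,
                  build: Callable[[], list[dict]]) -> list[dict]:
    """Return cached output for this exact STIX file, or build and store it."""
    target = cache_dir / f'mapped_{cache_key(stix_path)}.json'
    if target.exists():
        return json.loads(target.read_text(encoding='utf-8'))
    result = build()
    target.write_text(json.dumps(result, ensure_ascii=False), encoding='utf-8')
    return result


# Demonstration with a throwaway directory and a stand-in builder.
with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    stix = tmp_dir / 'enterprise-attack.json'
    stix.write_text('{"objects": []}', encoding='utf-8')
    calls = []

    def fake_build() -> list[dict]:
        calls.append(1)
        return [{'technique_id': 'T0000'}]

    first = load_or_build(stix, tmp_dir, fake_build)
    second = load_or_build(stix, tmp_dir, fake_build)   # same hash, served from cache
    stix.write_text('{"objects": [], "v": 2}', encoding='utf-8')
    third = load_or_build(stix, tmp_dir, fake_build)    # new hash, rebuilt
    build_count = len(calls)

print(build_count)
```

The builder runs only when the bundle's bytes change, so an unchanged weekly refetch costs one hash computation and nothing else.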

The mapped output is about 8 MB of compact JSON, and with it on disk in a stable shape, ad-hoc analysis collapses into list comprehensions:

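# Path() does not expand wildcards, so glob for the newest mapped file
latest = max(Path('cache').glob('mapped_*.json'), key=lambda p: p.stat().st_mtime)
mapped = json.loads(latest.read_text(encoding='utf-8'))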

# Techniques with no mapped mitigations
gap = [t for t in mapped if not t['mitigations']]
print(f'Techniques with no mitigations: {len(gap)}')

# Top 10 techniques by number of groups using them
top = sorted(mapped, key=lambda t: len(t['groups']), reverse=True)[:10]
for t in top:
    print(t['technique_id'], t['name'], len(t['groups']))

# Groups that use both PowerShell (T1059.001) and Scheduled Task (T1053.005)
ps = {g['name'] for t in mapped if t['technique_id'] == 'T1059.001' for g in t['groups']}
st = {g['name'] for t in mapped if t['technique_id'] == 'T1053.005' for g in t['groups']}
print('Groups using both:', sorted(ps & st))

A few things bite once the mapper runs at scale:

Techniques with no groups are common for newly added techniques — skip them silently rather than logging warnings.
Techniques with no mitigations are also common, and often the most interesting rows from a defender's perspective, because they are gaps.
Subtechniques are full attack-pattern objects with their own IDs — iterate them alongside parents and do not deduplicate.
Deprecated techniques are marked with x_mitre_deprecated; filter them out or keep them flagged, depending on the use case.

Step 6 — Generate ATT&CK Navigator layers

MITRE ATT&CK Navigator consumes JSON "layer" files conforming to the published layer format specification (currently 4.5). A layer is essentially a list of technique IDs with colours and comments, plus display options. Each layer needs one number per technique, pulled straight from the mapped data:

def per_technique_counts(mapped: list[dict], key: str) -> dict[str, int]:
    """key in ('groups','mitigations'); returns {technique_id: count}"""
    return {t['technique_id']: len(t.get(key, [])) for t in mapped}

groups_count = per_technique_counts(mapped, 'groups')
mitig_count  = per_technique_counts(mapped, 'mitigations')

Colour thresholds should come from the data distribution, not fixed buckets. The count distribution is always skewed — a handful of techniques are used by 70+ groups while the median is 2 — so equal-width bins produce a heatmap that is one dark cell and a sea of beige. Quantiles fix that:

from statistics import quantiles

def colour_buckets(counts: dict[str, int]) -> list[tuple[int, str]]:
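    values = [v for v in counts.values() if v > 0]
    if not values:
        return [(0, '#ffffff')]
    if len(values) < 2:
        # quantiles() raises StatisticsError with fewer than two data points
        return [(0, '#ffe5e5'), (values[0], '#660000')]
    qs = quantiles(values, n=4)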
    palette = ['#ffe5e5', '#ff9999', '#ff4d4d', '#cc0000', '#660000']
    thresholds = sorted({0, round(qs[0]), round(qs[1]), round(qs[2]), max(values)})
    return list(zip(thresholds, palette[:len(thresholds)]))

def colour_for(count: int, buckets: list[tuple[int, str]]) -> str:
    chosen = buckets[0][1]
    for thr, col in buckets:
        if count >= thr:
            chosen = col
    return chosen
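
To see why quantiles help on skewed counts, the sketch below bins the same made-up list both ways. The helper names and the sample values are hypothetical, and the binning rule is deliberately simpler than colour_buckets: it only counts how many values land in each of four bins.

```python
from bisect import bisect_left
from statistics import quantiles


def equal_width_bins(values: list[int], n: int = 4) -> list[int]:
    """Count values per bin when the range is split into n equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n or 1          # avoid a zero width when all values match
    counts = [0] * n
    for v in values:
        idx = min(int((v - lo) / width), n - 1)
        counts[idx] += 1
    return counts


def quantile_bins(values: list[int], n: int = 4) -> list[int]:
    """Count values per bin when the cut points are the n-quantiles."""
    cuts = quantiles(values, n=n)
    counts = [0] * n
    for v in values:
        counts[bisect_left(cuts, v)] += 1
    return counts


# Made-up skewed sample: mostly small counts plus one heavy outlier.
skewed = [1, 1, 1, 2, 2, 2, 3, 3, 4, 75]
print('equal-width:', equal_width_bins(skewed))   # [9, 0, 0, 1]
print('quantile:   ', quantile_bins(skewed))      # [3, 3, 2, 2]
```

With equal-width bins nine of the ten values share one colour; with quantile cut points every bin is populated, which is what makes the heatmap readable.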

The layer boilerplate itself is minimal:

def build_layer(name: str, counts: dict[str, int]) -> dict:
    buckets = colour_buckets(counts)
    return {
        'name':        name,
        'description': f'Heatmap of {name} per technique',
        'domain':      'enterprise-attack',
        'versions':    { 'attack': '15', 'navigator': '5.0.0', 'layer': '4.5' },
        'gradient':    { 'colors': [c for _, c in buckets], 'minValue': 0, 'maxValue': max(counts.values(), default=0) or 1 },
        'legendItems': [ { 'label': str(thr), 'color': col } for thr, col in buckets ],
        'techniques':  [
            {
                'techniqueID': tid,
                'color':       colour_for(count, buckets),
                'comment':     f'{count} {name.lower()}',
                'enabled':     True,
            }
            for tid, count in counts.items()
        ],
        'layout':      { 'layout': 'flat', 'showName': True, 'showID': False, 'expandedSubtechniques': 'all' },
        'showTacticRowBackground': True,
        'tacticRowBackground':     '#dddddd',
        'hideDisabled':            True,
    }
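
Before uploading a generated layer, a quick sanity check can catch the most common mistakes. The sketch below is not a validator for the full layer specification: the required-key list is a pragmatic subset chosen for illustration, and layer_problems is a hypothetical helper name.

```python
import re

TECHNIQUE_ID = re.compile(r'^T\d{4}(\.\d{3})?$')
# Assumption: a pragmatic subset of keys, not the full layer specification.
REQUIRED_KEYS = ('name', 'domain', 'versions', 'techniques')


def layer_problems(layer: dict) -> list[str]:
    """Return human-readable problems found in a layer dict (empty list if none)."""
    problems = [f'missing key: {k}' for k in REQUIRED_KEYS if k not in layer]
    for entry in layer.get('techniques', []):
        tid = entry.get('techniqueID', '')
        if not TECHNIQUE_ID.match(tid):
            problems.append(f'malformed techniqueID: {tid!r}')
        colour = entry.get('color', '')
        if colour and not re.fullmatch(r'#[0-9a-fA-F]{6}', colour):
            problems.append(f'bad colour for {tid}: {colour!r}')
    return problems


sample_ok = {
    'name': 'Groups',
    'domain': 'enterprise-attack',
    'versions': {'layer': '4.5'},
    'techniques': [{'techniqueID': 'T1059.001', 'color': '#ff9999'}],
}
sample_bad = {
    'name': 'Broken',
    'techniques': [{'techniqueID': 'attack.t1059', 'color': 'red'}],
}
print(layer_problems(sample_ok))
print(layer_problems(sample_bad))
```

Running it over the output of build_layer before writing the file is cheap and surfaces malformed IDs from a SIEM export early.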

Step 7 — Write the three standard layers and read them

Three layers cover most stakeholder conversations: attacker usage, documented defensive options, and what your SIEM actually catches. The detection layer takes technique IDs from wherever your rules live, whether that is the attack.tXXXX tags on your Sigma rules, a SIEM export, or a CSV you maintain by hand:

import json
from pathlib import Path

def write_layer(layer: dict, path: Path) -> None:
    path.write_text(json.dumps(layer, ensure_ascii=False, indent=2), encoding='utf-8')

out = Path('navigator_layers')
out.mkdir(exist_ok=True)

write_layer(build_layer('Groups',      groups_count), out / 'groups.json')
write_layer(build_layer('Mitigations', mitig_count),  out / 'mitigations.json')

# Detection coverage layer: detected_technique_ids is a placeholder for the
# collection of technique IDs you load from your own CSV / SIEM export
detection_count = {tid: 1 for tid in detected_technique_ids}
write_layer(build_layer('Detection Coverage', detection_count), out / 'detection.json')
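
If your detections are Sigma rules, the technique IDs for the detection layer can be pulled from their tags. The sketch below assumes the tags have already been parsed out of the YAML into lists of strings and follow the attack.tXXXX convention. The sample tag lists are invented, and technique_ids_from_tags is a hypothetical helper name.

```python
import re

# Matches technique tags such as attack.t1059 or attack.t1059.001 (case-insensitive).
SIGMA_TECHNIQUE_TAG = re.compile(r'^attack\.(t\d{4}(?:\.\d{3})?)$', re.IGNORECASE)


def technique_ids_from_tags(tags: list[str]) -> set[str]:
    """Extract ATT&CK technique IDs from one rule's tag list."""
    ids = set()
    for tag in tags:
        match = SIGMA_TECHNIQUE_TAG.match(tag.strip())
        if match:
            ids.add(match.group(1).upper())
    return ids


# Invented tag lists standing in for three parsed rules.
rule_tags = [
    ['attack.execution', 'attack.t1059.001'],
    ['attack.persistence', 'attack.T1053.005', 'attack.t1059.001'],
    ['attack.g0016', 'cve.2021-44228'],
]
detected_technique_ids = set().union(*(technique_ids_from_tags(t) for t in rule_tags))
print(sorted(detected_technique_ids))
```

Tactic, group, and CVE tags fall through the pattern untouched, so only technique and subtechnique IDs reach the layer.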

Open mitre-attack.github.io/attack-navigator, choose Open Existing Layer on the new-tab page, and upload the JSON. For an internal instance, clone mitre-attack/attack-navigator, build with npm, and serve the static output behind your usual reverse proxy. The three layers answer three distinct questions:

Groups heatmap — where attackers concentrate effort. T1059.001 (PowerShell) and T1566.001 (Spearphishing Attachment) are perpetual heat sinks.
Mitigations coverage — where MITRE documents defensive options on paper. Light cells with heavy group usage are gaps in the framework itself.
Detection coverage — where you actually have a rule or alert. The overlap with the groups heatmap is your real posture; the gap between them is your detection roadmap. If you are still deciding what telemetry feeds that layer, start with the Windows event IDs worth monitoring.
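
The detection roadmap described above can also be computed directly instead of read off two heatmaps. A minimal sketch, assuming a groups-per-technique count dictionary and a set of detected technique IDs: detection_gaps is a hypothetical name and the counts below are made-up numbers, not real ATT&CK statistics.

```python
def detection_gaps(groups_count: dict[str, int], detected: set[str],
                   min_groups: int = 1) -> list[tuple[str, int]]:
    """Techniques used by at least min_groups groups that have no detection, busiest first."""
    gaps = [
        (tid, n) for tid, n in groups_count.items()
        if n >= min_groups and tid not in detected
    ]
    return sorted(gaps, key=lambda item: (-item[1], item[0]))


# Made-up counts for illustration only.
example_counts = {'T1059.001': 80, 'T1566.001': 60, 'T1053.005': 40, 'T1027': 0}
example_detected = {'T1059.001'}

for tid, n in detection_gaps(example_counts, example_detected):
    print(tid, n)
```

Raising min_groups trims the list to the techniques with the heaviest documented use, which is usually the right starting point for a backlog.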

Step 8 — Bridge to MITRE D3FEND

ATT&CK catalogues what attackers do; MITRE D3FEND catalogues what defenders can do about it, and exposes the mapping between the two via a JSON API. The payloads are SPARQL-shaped — D3FEND is built on an OWL ontology, and the JSON wrapper serialises SPARQL result bindings — which is awkward but stable. Two endpoints carry the value; cache both aggressively, because the API is a free service and the data does not change minute to minute. My D3FEND fundamentals guide covers the framework itself if it is new to you.

# d3fend.py
import json, logging, requests
from pathlib import Path

logger = logging.getLogger(__name__)
D3F_CACHE = Path('cache/d3fend')
D3F_CACHE.mkdir(parents=True, exist_ok=True)

def fetch_d3fend_mapping(attack_id: str) -> dict | None:
    cached = D3F_CACHE / f'{attack_id}.json'
    if cached.exists():
        return json.loads(cached.read_text())
    try:
        url  = f'https://d3fend.mitre.org/api/offensive-technique/attack/{attack_id}.json'
        resp = requests.get(url, timeout=30)
        resp.raise_for_status()
        data = resp.json()
        cached.write_text(json.dumps(data))
        return data
    except requests.RequestException as exc:
        logger.warning('D3FEND fetch failed for %s: %s', attack_id, exc)
        return None

def fetch_d3fend_detail(def_id: str) -> dict | None:
    cached = D3F_CACHE / f'{def_id}_detail.json'
    if cached.exists():
        return json.loads(cached.read_text())
    try:
        url  = f'https://d3fend.mitre.org/api/technique/d3f:{def_id}.json'
        resp = requests.get(url, timeout=30)
        resp.raise_for_status()
        data = resp.json()
        cached.write_text(json.dumps(data))
        return data
    except requests.RequestException as exc:
        logger.warning('D3FEND detail failed for %s: %s', def_id, exc)
        return None

The mapping payload nests the actual links under off_to_def.results.bindings; each binding carries a defensive-technique URI whose ID is the fragment after #. Resolve those into flat records and decorate the mapped dataset the same way as Step 4 — the first run takes minutes, cached runs seconds:

def defensive_techniques_for(attack_id: str) -> list[dict]:
    raw = fetch_d3fend_mapping(attack_id)
    if not raw or 'off_to_def' not in raw:
        return []

    out = []
    for binding in raw['off_to_def']['results']['bindings']:
        label = binding.get('def_tech_label', {}).get('value')
        uri   = binding.get('def_tech',       {}).get('value')
        if not (label and uri):
            continue
        def_id = uri.rsplit('#', 1)[-1]

        detail = fetch_d3fend_detail(def_id) or {}
        # '@graph' may be missing or empty, so fall back to a single empty node
        graph = detail.get('description', {}).get('@graph') or [{}]
        description = graph[0].get('d3f:definition')

        out.append({
            'id':          def_id,
            'label':       label,
            'description': description,
            'url':         f'https://d3fend.mitre.org/technique/d3f:{def_id}',
        })
    return out

def add_d3fend(mapped: list[dict]) -> list[dict]:
    for i, tech in enumerate(mapped, start=1):
        tech['d3fend'] = defensive_techniques_for(tech['technique_id'])
        if i % 50 == 0:
            logger.info('D3FEND mapped %d/%d', i, len(mapped))
    return mapped
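
Once every technique carries a d3fend list, inverting it shows which defensive techniques counter the most offensive ones. The sketch below assumes the record shape produced in this step (a label per defensive technique). coverage_by_defence is a hypothetical name, and the labels Defence A and Defence B are placeholders, not real D3FEND entries or real mappings.

```python
from collections import defaultdict


def coverage_by_defence(mapped: list[dict]) -> list[tuple[str, list[str]]]:
    """Rank defensive-technique labels by how many ATT&CK techniques they counter."""
    index: dict[str, set[str]] = defaultdict(set)
    for tech in mapped:
        for d3f in tech.get('d3fend', []):
            index[d3f['label']].add(tech['technique_id'])
    ranked = sorted(index.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(label, sorted(tids)) for label, tids in ranked]


# Placeholder data: the labels and mappings are invented for illustration.
sample_mapped = [
    {'technique_id': 'T1059.001',
     'd3fend': [{'id': 'DefA', 'label': 'Defence A'}, {'id': 'DefB', 'label': 'Defence B'}]},
    {'technique_id': 'T1053.005',
     'd3fend': [{'id': 'DefA', 'label': 'Defence A'}]},
    {'technique_id': 'T1027', 'd3fend': []},
]
for label, tids in coverage_by_defence(sample_mapped):
    print(label, tids)
```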

Step 9 — Control the payload size and run the gap analysis

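Naively decorated, the combined output bloats to roughly 80 MB, because the same references and authors repeat across dozens of techniques. A lookup table replaces each repeated reference with a small integer and stores the canonical data once, cutting the file to about 20 MB. The function below assumes each D3FEND record also carries references and authors lists; the trimmed records built in Step 8 keep only id, label, description, and url, so it is a no-op on them until you add those fields: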

def deduplicate_references(mapped: list[dict]) -> dict:
    ref_table:    dict[str, int] = {}
    author_table: dict[str, int] = {}

    for tech in mapped:
        for d3f in tech.get('d3fend', []):
            new_refs = []
            for ref in d3f.get('references', []):
                key = ref['url'] if isinstance(ref, dict) else ref
                if key not in ref_table:
                    ref_table[key] = len(ref_table) + 1
                new_refs.append(ref_table[key])
            d3f['references'] = new_refs

            new_authors = []
            for author in d3f.get('authors', []):
                if author not in author_table:
                    author_table[author] = len(author_table) + 1
                new_authors.append(author_table[author])
            d3f['authors'] = new_authors

    return {
        'techniques': mapped,
        'metadata': {
            'references': {v: k for k, v in ref_table.items()},
            'authors':    {v: k for k, v in author_table.items()},
        },
    }
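
This step also relies on a recursive cleanup that strips empty values before serialisation. A minimal sketch of such a pass follows. prune_empty is a hypothetical name, the sample record is invented, and the choice to keep 0 and False is an assumption made so that flags like is_subtechnique survive.

```python
def _is_empty(value) -> bool:
    """None, empty string, empty list and empty dict count as empty; 0 and False do not."""
    return value is None or value == '' or value == [] or value == {}


def prune_empty(value):
    """Recursively drop empty values from nested dicts and lists."""
    if isinstance(value, dict):
        cleaned = {k: prune_empty(v) for k, v in value.items()}
        return {k: v for k, v in cleaned.items() if not _is_empty(v)}
    if isinstance(value, list):
        cleaned = [prune_empty(v) for v in value]
        return [v for v in cleaned if not _is_empty(v)]
    return value


# Invented record for illustration.
record = {
    'technique_id': 'T1059.001',
    'description': '',
    'groups': [],
    'mitigations': [{'name': 'M', 'url': None}],
    'is_subtechnique': False,
    'd3fend': [{'references': []}],
}
print(prune_empty(record))
```

One consequence worth knowing: keys such as groups or d3fend disappear entirely when empty, so queries against pruned output should read them with .get() and a default.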

A small recursive cleanup that strips null, empty strings, empty lists, and empty dicts before serialisation saves a couple more megabytes and makes the output noticeably easier to read. With the dataset enriched, the most useful query is the inverse mapping — techniques attackers demonstrably use that have no D3FEND coverage:

uncovered = [
    t for t in mapped
    if not t.get('d3fend') and len(t.get('groups', [])) >= 5
]
uncovered.sort(key=lambda t: len(t['groups']), reverse=True)
for t in uncovered[:20]:
    print(t['technique_id'], t['name'], 'groups:', len(t['groups']))

The top of that list is where security investment maps directly onto reduction of real-world attacker capability. Take it to the next strategy review. If the follow-up question is "and how do we turn the covered techniques into actual SIEM rules", that is exactly what I built SIOR-Helper for.

Frequently Asked Questions

Should I use mitreattack-python or parse the STIX JSON directly?

Use mitreattack-python. It abstracts the relationship-walking logic into named methods, including subtechnique flattening and deprecated-object handling — exactly the kind of code that breaks the next time MITRE changes a field name if you write it yourself.

How often does the ATT&CK dataset change?

Major releases land roughly every six months, with minor corrections in between. Refetch on a weekly schedule and diff the technique list; the content-hash cache key above makes a new release invalidate downstream artifacts automatically.
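
A sketch of the weekly diff, assuming two snapshots in the record shape from Step 3 (technique_id and name). diff_techniques is a hypothetical name and the T9001-style IDs are placeholders, not real techniques.

```python
def diff_techniques(old: list[dict], new: list[dict]) -> dict[str, list[str]]:
    """Compare two technique snapshots by ID and report added, removed and renamed entries."""
    old_by_id = {t['technique_id']: t for t in old}
    new_by_id = {t['technique_id']: t for t in new}
    added = sorted(new_by_id.keys() - old_by_id.keys())
    removed = sorted(old_by_id.keys() - new_by_id.keys())
    renamed = sorted(
        tid for tid in old_by_id.keys() & new_by_id.keys()
        if old_by_id[tid]['name'] != new_by_id[tid]['name']
    )
    return {'added': added, 'removed': removed, 'renamed': renamed}


# Placeholder snapshots for illustration.
old_snapshot = [
    {'technique_id': 'T9001', 'name': 'Alpha'},
    {'technique_id': 'T9002', 'name': 'Beta'},
]
new_snapshot = [
    {'technique_id': 'T9001', 'name': 'Alpha (renamed)'},
    {'technique_id': 'T9003', 'name': 'Gamma'},
]
print(diff_techniques(old_snapshot, new_snapshot))
```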

Why not just take a screenshot from the ATT&CK website?

The website does not know which techniques you have detections for or which mitigations you have implemented. Navigator with custom layers does. The website is for browsing the framework; Navigator is for representing your environment against it.

Is D3FEND a replacement for ATT&CK's mitigations?

No — they overlap but are not interchangeable. ATT&CK mitigations are broad, often process-level recommendations. D3FEND is a structured ontology of defensive techniques with far more granularity. Use mitigations for high-level mapping and D3FEND for detailed control-design work.

Can I run this pipeline offline?

Yes. Once the STIX bundle and the D3FEND responses are cached, nothing needs the internet. Schedule a periodic refresh of the caches and the analysis itself runs entirely offline — including a self-hosted Navigator for air-gapped environments.

Conclusion

Getting ATT&CK out of the browser and into a pipeline is mostly mechanical: cache the STIX file, use the official library, map the relationships once, and serialise the result. Everything interesting builds on that cache — the Navigator layers are a JSON transform, and the D3FEND bridge is a cached API walk with some deduplication to keep the output sane.

The parts that look clunky are clunky. The D3FEND API speaks SPARQL bindings because of the ontology underneath it, the reference bloat has to be deduplicated by hand, and the counts you get from the library include deprecated objects until you filter them. None of that is hard to work around; it is just the kind of thing nobody tells you before you have already hit it.

The genuinely useful output is the pair of gap lists: techniques with heavy group usage and no detection coverage, and techniques with heavy group usage and no D3FEND countermeasures. Both fall out of the same 8 MB cache file, and both give a security team something concrete to build next — which is what these frameworks are for.

Related Posts

MITRE ATT&CK to SIEM Rules: A Practical Look at SIOR-Helper — the tool that consumes these mappings and turns techniques into detection rules.
Sigma Rules for SIEM Detection: A Beginner's Guide — writing the rules whose attack.tXXXX tags feed the detection coverage layer.
KQL Threat Hunting in Microsoft Defender: Full Guide — the hunting side of the same coverage question in a Microsoft 365 estate.

Editorial note: posts on this blog are drafted with AI assistance and then reviewed, edited, and tested against a real environment before publishing. Commands, output, and screenshots come from systems I actually ran the work on.

Through Security Scriptographer, I transform complex security concepts into practical scripts and tutorials. Proficient in PowerShell, Python and various security frameworks, I'm here to help others enhance their security toolkit. Simple code, serious security. 🛡️

