Getting Started with MITRE ATT&CK: Fetching and Processing Data Like a Pro

MITRE ATT&CK is the most widely used knowledge base of adversary tactics and techniques, and sooner or later most defenders want to do more with it than click through the web UI. This is part one of a four-part series on working with ATT&CK programmatically in Python: fetching the data, mapping relationships, visualising in the Navigator, and bridging across to MITRE D3FEND. The series uses the official mitreattack-python library on the STIX dataset that MITRE maintains on GitHub.

Key Takeaways

MITRE publishes ATT&CK as STIX 2.1 JSON on GitHub. The mitreattack-python library is the supported way to parse and query it.
Cache the STIX JSON locally — the enterprise dataset is ~30 MB and refetching it on every run is wasteful.
The library returns STIX objects. Convert them to plain dictionaries with a small recursive helper so they survive JSON serialisation.
Three object types do most of the work: techniques, groups, and mitigations. Subsequent posts in this series wire them together.
For one-off analysis, a Jupyter notebook works. For repeatable pipelines, the structure below (config + loader + entry point + cache directory) keeps things maintainable.

Environment

Python 3.10+.
mitreattack-python 3.0 or later.
Internet access for the initial download of enterprise-attack.json.
~100 MB of free disk space for the cache (the optional Mobile and ICS datasets add a similar amount each).

The Problem

Browsing the ATT&CK website is fine for looking up a single technique. The friction starts when you want to ask cross-cutting questions: which techniques are most used by groups that target our sector? Which techniques have no documented mitigations? Which subtechniques cluster under the techniques our SIEM is already catching? Those answers need the data in a form you can query, and that means STIX JSON in memory, not HTML in a browser tab.

The good news is MITRE publishes the data in exactly that form, and ships an official library that hides most of the STIX awkwardness. The structure below is what we use in our internal tooling.

The Solution

Step 1 — Lay out the project

Three files plus a cache directory keep things tidy:

mitre_project/
├── config.py        # paths, logging
├── loader.py        # download and parse STIX
├── main.py          # entry point
└── cache/
    └── enterprise-attack.json

Step 2 — Centralise configuration and logging

Even a small script benefits from a logger. print() debugging stops scaling the moment you move beyond one file:

# config.py
import logging
from pathlib import Path

BASE_DIR  = Path(__file__).resolve().parent
CACHE_DIR = BASE_DIR / 'cache'
CACHE_DIR.mkdir(exist_ok=True)
STIX_PATH = CACHE_DIR / 'enterprise-attack.json'

def setup_logging() -> None:
    logging.basicConfig(
        level=logging.INFO,
        format='%(asctime)s %(name)s %(levelname)s %(message)s',
    )

Step 3 — Download and cache the STIX feed

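MITRE publishes the files on GitHub. The URL below points at the mitre/cti repository on raw.githubusercontent.com, which hosts the STIX 2.0 bundle; the STIX 2.1 bundle lives in the separate mitre-attack/attack-stix-data repository. Cache the file locally and re-download only on demand: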

# loader.py
import json
import logging
import requests
from mitreattack.stix20 import MitreAttackData
from config import STIX_PATH

logger = logging.getLogger(__name__)
STIX_URL = ('https://raw.githubusercontent.com/mitre/cti/'
            'master/enterprise-attack/enterprise-attack.json')

def fetch_stix(force: bool = False) -> None:
    if STIX_PATH.exists() and not force:
        logger.info('STIX cache present at %s', STIX_PATH)
        return
    logger.info('Downloading STIX from %s', STIX_URL)
    resp = requests.get(STIX_URL, timeout=60)
    resp.raise_for_status()
    STIX_PATH.write_text(resp.text, encoding='utf-8')

def load_attack() -> MitreAttackData:
    fetch_stix()
    return MitreAttackData(str(STIX_PATH))
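fetch_stix writes the response straight to the cache path, so an interrupted or malformed download could leave a file that the loader cannot parse. One possible hardening, sketched below, is to check that the payload looks like a STIX bundle and then swap it into place in a single step. The helper name write_bundle_atomically is hypothetical and not part of mitreattack-python; the check only looks at the top-level "type" and "objects" keys.

```python
import json
import os
import tempfile
from pathlib import Path


def write_bundle_atomically(text: str, dest: Path) -> int:
    """Validate `text` as a STIX bundle, then replace `dest` in one step.

    Hypothetical helper for illustration. Returns the number of objects
    in the bundle; raises ValueError if the payload is not a bundle.
    """
    bundle = json.loads(text)  # JSONDecodeError (a ValueError) on bad JSON
    if (
        not isinstance(bundle, dict)
        or bundle.get("type") != "bundle"
        or not isinstance(bundle.get("objects"), list)
    ):
        raise ValueError("payload is not a STIX bundle")

    # Create the temp file next to the destination so that os.replace
    # never has to cross a filesystem boundary.
    fd, tmp_name = tempfile.mkstemp(dir=dest.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(text)
        os.replace(tmp_name, dest)
    except BaseException:
        # Clean up the temp file if writing or replacing failed.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return len(bundle["objects"])
```

In fetch_stix, the call would take the place of STIX_PATH.write_text(resp.text, encoding='utf-8'). A failed validation then leaves the previous cache file untouched.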

Step 4 — Convert STIX objects to plain dictionaries

The library returns stix2 objects, which json.dump cannot serialise directly. A short recursive helper converts them, along with any dictionaries or lists that contain them:

def to_dict(obj):
    if hasattr(obj, 'serialize'):
        return json.loads(obj.serialize())
    if isinstance(obj, dict):
        return {k: to_dict(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [to_dict(i) for i in obj]
    return obj

Step 5 — Pull techniques with their ATT&CK IDs

Techniques carry their human-readable ID (T1059.001) in external_references under the mitre-attack source. Extract that and you have a useful starting record:

def get_techniques(attack: MitreAttackData) -> list[dict]:
    out = []
    for tech in attack.get_techniques():
        tid = None
        refs = to_dict(getattr(tech, 'external_references', []))
        for ref in refs:
            if ref.get('source_name') == 'mitre-attack':
                tid = ref.get('external_id')
                break
        if not tid:
            continue
        out.append({
            'technique_id': tid,
            'stix_id':      tech.id,
            'name':         tech.name,
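            # Not every object is guaranteed to carry these fields, so fall back to defaults.
            'description':  getattr(tech, 'description', ''),
            'is_subtechnique': getattr(tech, 'x_mitre_is_subtechnique', False),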
            'platforms':    list(getattr(tech, 'x_mitre_platforms', [])),
            'external_references': refs,
        })
    return out

Groups and mitigations follow the same shape. The library exposes them via get_groups() and get_mitigations() respectively.
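As a rough illustration of "the same shape", the sketch below pulls the ATT&CK ID out of an object that has already been passed through to_dict. The helper names attack_id and summarise are hypothetical, and the sample group is hand-written placeholder data, not an entry from the real dataset.

```python
from __future__ import annotations


def attack_id(external_references: list[dict]) -> str | None:
    """Return the ATT&CK ID from plain-dict external references, if any."""
    for ref in external_references:
        if ref.get("source_name") == "mitre-attack":
            return ref.get("external_id")
    return None


def summarise(obj: dict) -> dict:
    """Reduce a plain-dict STIX object to its ATT&CK ID, STIX ID, and name."""
    return {
        "attack_id": attack_id(obj.get("external_references", [])),
        "stix_id": obj.get("id"),
        "name": obj.get("name"),
    }


# Placeholder object in the general shape of a group (intrusion-set) record.
sample_group = {
    "type": "intrusion-set",
    "id": "intrusion-set--00000000-0000-4000-8000-000000000000",
    "name": "Example Group",
    "external_references": [
        {"source_name": "some-vendor-report", "description": "placeholder"},
        {"source_name": "mitre-attack", "external_id": "G9999"},
    ],
}

print(summarise(sample_group))
```

Because the helper works on plain dictionaries, the same function can be applied to the output of to_dict for groups and mitigations alike.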

Step 6 — Wire it together

The entry point loads the data, dumps a structured summary, and serialises the result for downstream tools:

# main.py
import json
import logging
from config import setup_logging
from loader import load_attack, get_techniques, to_dict

def main():
    setup_logging()
    log = logging.getLogger('main')

    attack = load_attack()

    techniques  = get_techniques(attack)
    groups      = [to_dict(g) for g in attack.get_groups()]
    mitigations = [to_dict(m) for m in attack.get_mitigations()]

    log.info('Loaded %d techniques',  len(techniques))
    log.info('Loaded %d groups',      len(groups))
    log.info('Loaded %d mitigations', len(mitigations))

    with open('attack_summary.json', 'w', encoding='utf-8') as f:
        json.dump({
            'techniques':  techniques,
            'groups':      groups,
            'mitigations': mitigations,
        }, f, ensure_ascii=False)

if __name__ == '__main__':
    main()

Running it produces output like:

2026-05-28 14:07:35 main INFO Loaded 799 techniques
2026-05-28 14:07:35 main INFO Loaded 174 groups
2026-05-28 14:07:35 main INFO Loaded 268 mitigations
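A single technique count hides the split between parent techniques and subtechniques. The sketch below shows one way to break the records from get_techniques down further, using the is_subtechnique and platforms keys built in Step 5. The function name summarise_techniques is hypothetical, and the sample records are made-up placeholders rather than real ATT&CK entries.

```python
from __future__ import annotations

from collections import Counter


def summarise_techniques(techniques: list[dict]) -> dict:
    """Count parent techniques, subtechniques, and records per platform."""
    subs = sum(1 for t in techniques if t.get("is_subtechnique"))
    platforms = Counter(
        platform for t in techniques for platform in t.get("platforms", [])
    )
    return {
        "techniques": len(techniques) - subs,
        "subtechniques": subs,
        "by_platform": dict(platforms),
    }


# Placeholder records in the shape produced by get_techniques.
sample = [
    {"technique_id": "T0001", "is_subtechnique": False,
     "platforms": ["Windows", "Linux"]},
    {"technique_id": "T0001.001", "is_subtechnique": True,
     "platforms": ["Windows"]},
    {"technique_id": "T0002", "is_subtechnique": False, "platforms": []},
]

print(summarise_techniques(sample))
```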

Frequently Asked Questions

Why use STIX 2.1 instead of the ATT&CK web pages?

Programmatic access. The web UI is built for humans, not pipelines. STIX gives you a stable schema and a unique identifier for every object, and the whole dataset can be queried with a few library calls.

Should I use `mitreattack-python` or parse the STIX JSON directly?

Use mitreattack-python. It abstracts the relationship-walking logic (groups → techniques, techniques → mitigations) into named methods, which keeps the calling code much shorter and less error-prone than walking relationship objects yourself.

How often does the dataset change?

Major releases land roughly every six months; minor updates and corrections arrive more often. For monitoring, refetch on a weekly cron and diff the technique list. Each object carries an x_mitre_version property, which supports version-aware comparisons.
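The diff itself can be small. The sketch below compares two snapshots of technique records and reports what was added, removed, or re-versioned. It assumes each record has a technique_id key and a version key holding the object's x_mitre_version; the get_techniques function in this post does not emit a version key, so that field would need to be added first. The function name diff_techniques is hypothetical.

```python
from __future__ import annotations


def diff_techniques(old: list[dict], new: list[dict]) -> dict:
    """Compare two snapshots of technique records keyed by technique_id."""
    old_by_id = {t["technique_id"]: t for t in old}
    new_by_id = {t["technique_id"]: t for t in new}

    added = sorted(new_by_id.keys() - old_by_id.keys())
    removed = sorted(old_by_id.keys() - new_by_id.keys())
    # "Changed" here means the version string differs between snapshots.
    changed = sorted(
        tid
        for tid in old_by_id.keys() & new_by_id.keys()
        if old_by_id[tid].get("version") != new_by_id[tid].get("version")
    )
    return {"added": added, "removed": removed, "changed": changed}


# Placeholder snapshots, not real ATT&CK data.
last_week = [
    {"technique_id": "T0001", "version": "1.0"},
    {"technique_id": "T0002", "version": "1.0"},
]
this_week = [
    {"technique_id": "T0001", "version": "1.1"},
    {"technique_id": "T0003", "version": "1.0"},
]

print(diff_techniques(last_week, this_week))
```

Comparing version strings for inequality is enough to flag a change; ordering the versions would need proper parsing.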

What about Mobile and ICS ATT&CK?

Both are available as separate STIX files in the same repository (mobile-attack.json, ics-attack.json). The library loads them with the same MitreAttackData constructor — pass the path and it handles the rest. Enterprise covers most defender use cases, but check ICS specifically if you operate OT environments.
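If you want all three domains, the URL and cache path can be derived from the domain name. The sketch below assumes the Mobile and ICS bundles follow the same directory layout in mitre/cti as the Enterprise URL used in loader.py; the helper names stix_url and cache_path are hypothetical.

```python
from pathlib import Path

BASE_URL = "https://raw.githubusercontent.com/mitre/cti/master"
DOMAINS = ("enterprise-attack", "mobile-attack", "ics-attack")


def stix_url(domain: str) -> str:
    """Build the download URL for one ATT&CK domain (assumed layout)."""
    if domain not in DOMAINS:
        raise ValueError(f"unknown ATT&CK domain: {domain!r}")
    return f"{BASE_URL}/{domain}/{domain}.json"


def cache_path(cache_dir: Path, domain: str) -> Path:
    """Return where the bundle for `domain` is stored in the cache."""
    if domain not in DOMAINS:
        raise ValueError(f"unknown ATT&CK domain: {domain!r}")
    return cache_dir / f"{domain}.json"


for name in DOMAINS:
    print(name, stix_url(name), cache_path(Path("cache"), name))
```

Each cached file can then be passed to MitreAttackData in the same way load_attack does for Enterprise.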

Can I run this offline?

Yes. Once the STIX JSON is cached, nothing in this pipeline needs the internet. Schedule a periodic refresh of the cache and the rest of the analysis runs entirely offline.
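One way to decide when a refresh is due is to look at the age of the cached file. The sketch below is a minimal staleness check based on the file's modification time. The name is_stale and the seven-day default are assumptions chosen to match the weekly cadence mentioned earlier, not something the library provides.

```python
from __future__ import annotations

import time
from pathlib import Path

SECONDS_PER_DAY = 86_400


def is_stale(path: Path, max_age_days: float = 7.0,
             now: float | None = None) -> bool:
    """Return True if `path` is missing or older than `max_age_days`.

    `now` is a Unix timestamp and exists mainly to make testing easy.
    """
    if not path.exists():
        return True
    current = time.time() if now is None else now
    age_seconds = current - path.stat().st_mtime
    return age_seconds > max_age_days * SECONDS_PER_DAY
```

A scheduled job could call fetch_stix(force=True) only when is_stale(STIX_PATH) returns True, so offline runs between refreshes never touch the network.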

Conclusion

Getting ATT&CK out of the browser and into a pipeline is mostly mechanical: cache the STIX file, use the official library, convert objects to plain dictionaries, serialise the result. The interesting work — connecting techniques to groups and mitigations, visualising coverage, mapping defensive countermeasures — starts in the next post in the series. The structure above is the foundation those posts build on.

Mapping with MITRE ATT&CK: Connecting Techniques, Groups, and Mitigations — part 2 of the series.
Visualizing with MITRE ATT&CK Navigator — part 3, turning mapped data into Navigator layers.
MITRE ATT&CK + D3FEND: Mapping Defense to Attack — part 4, bridging into defensive countermeasures.

Authoritative references: MITRE ATT&CK and mitreattack-python.

Editorial note: posts on this blog are drafted with AI assistance and then reviewed, edited, and tested against a real environment before publishing. Commands, output, and screenshots come from systems I actually ran the work on.

Security Scriptographer — PowerShell & Threat Hunting

Through Security Scriptographer, I transform complex security concepts into practical scripts and tutorials. Proficient in PowerShell, Python and various security frameworks, I'm here to help others enhance their security toolkit. Simple code, serious security. 🛡️
