MITRE ATT&CK is the most widely used knowledge base of adversary tactics and techniques, and at some point every defender wants to do more with it than click through the web UI. This is part one of a four-part series on working with ATT&CK programmatically in Python — fetching the data, mapping relationships, visualising in the Navigator, and bridging across to MITRE D3FEND. The series uses the official mitreattack-python library on the maintained STIX 2.1 dataset.
Key Takeaways
- MITRE publishes ATT&CK as STIX 2.1 JSON on GitHub. The mitreattack-python library is the supported way to parse and query it.
- Cache the STIX JSON locally — the enterprise dataset is ~30 MB and refetching it on every run is wasteful.
- The library returns STIX objects. Convert them to plain dictionaries with a small recursive helper so they survive JSON serialisation.
- Three object types do most of the work: techniques, groups, and mitigations. Subsequent posts in this series wire them together.
- For one-off analysis, a Jupyter notebook works. For repeatable pipelines, the structure below (config + loader + entry point + cache directory) keeps things maintainable.
Environment
- Python 3.10+.
- mitreattack-python 3.0 or later.
- Internet access for the initial download of enterprise-attack.json.
- ~100 MB of free disk space for the cache (the optional Mobile and ICS datasets add a similar amount each).
The Problem
Browsing the ATT&CK website is fine for looking up a single technique. The friction starts when you want to ask cross-cutting questions: which techniques are most used by groups that target our sector? Which techniques have no documented mitigations? Which subtechniques cluster under the techniques our SIEM is already catching? Those answers need the data in a form you can query, and that means STIX JSON in memory, not HTML in a browser tab.
The good news is MITRE publishes the data in exactly that form, and ships an official library that hides most of the STIX awkwardness. The structure below is what we use in our internal tooling.
The Solution
Step 1 — Lay out the project
Three files plus a cache directory keep things tidy:
mitre_project/
├── config.py    # paths, logging
├── loader.py    # download and parse STIX
├── main.py      # entry point
└── cache/
    └── enterprise-attack.json
Step 2 — Centralise configuration and logging
Even a small script benefits from a logger. print() debugging stops scaling the moment you move beyond one file:
# config.py
import logging
from pathlib import Path

BASE_DIR = Path(__file__).resolve().parent
CACHE_DIR = BASE_DIR / 'cache'
CACHE_DIR.mkdir(exist_ok=True)
STIX_PATH = CACHE_DIR / 'enterprise-attack.json'


def setup_logging() -> None:
    logging.basicConfig(
        level=logging.INFO,
        format='%(asctime)s %(name)s %(levelname)s %(message)s',
    )
Step 3 — Download and cache the STIX feed
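MITRE publishes the canonical files at raw.githubusercontent.com/mitre/cti. Cache the file locally and re-download only on demand. Note that mitre/cti carries the STIX 2.0 representation; MITRE publishes the STIX 2.1 files in the separate mitre-attack/attack-stix-data repository, so point STIX_URL there if you need 2.1 specifically: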
# loader.py
import json
import logging

import requests
from mitreattack.stix20 import MitreAttackData

from config import STIX_PATH

logger = logging.getLogger(__name__)

STIX_URL = ('https://raw.githubusercontent.com/mitre/cti/'
            'master/enterprise-attack/enterprise-attack.json')


def fetch_stix(force: bool = False) -> None:
    if STIX_PATH.exists() and not force:
        logger.info('STIX cache present at %s', STIX_PATH)
        return
    logger.info('Downloading STIX from %s', STIX_URL)
    resp = requests.get(STIX_URL, timeout=60)
    resp.raise_for_status()
    STIX_PATH.write_text(resp.text, encoding='utf-8')


def load_attack() -> MitreAttackData:
    fetch_stix()
    return MitreAttackData(str(STIX_PATH))
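The loader above re-downloads only when force=True. One way to automate the refresh is to treat the cache as stale after a fixed age. The following is a minimal sketch, assuming a hypothetical helper named is_stale and a seven-day default; neither is part of the original loader or of mitreattack-python:

```python
import os
import tempfile
import time
from pathlib import Path


def is_stale(path: Path, max_age_days: float = 7.0) -> bool:
    """Return True if the file is missing or older than max_age_days.

    Hypothetical helper for illustration only.
    """
    if not path.exists():
        return True
    age_seconds = time.time() - path.stat().st_mtime
    return age_seconds > max_age_days * 86400


# Demonstration on a throwaway file rather than the real cache.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / 'enterprise-attack.json'
    missing = is_stale(demo)              # no file yet
    demo.write_text('{}', encoding='utf-8')
    fresh = is_stale(demo)                # just written
    ten_days_ago = time.time() - 10 * 86400
    os.utime(demo, (ten_days_ago, ten_days_ago))
    old = is_stale(demo)                  # mtime pushed back ten days
    print(missing, fresh, old)
```

In loader.py this could be wired in as fetch_stix(force=is_stale(STIX_PATH)), which keeps the download logic itself unchanged.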
Step 4 — Convert STIX objects to plain dictionaries
The library returns rich STIX objects that do not serialise to JSON cleanly. A short recursive helper handles the conversion once and for all:
def to_dict(obj):
    if hasattr(obj, 'serialize'):
        return json.loads(obj.serialize())
    if isinstance(obj, dict):
        return {k: to_dict(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [to_dict(i) for i in obj]
    return obj
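To see what the helper does without loading the full dataset, it can be exercised against a stand-in object. This is only a sketch: FakeStixObject is a hypothetical class that mimics nothing beyond the serialize() method, and the field values are illustrative.

```python
import json


def to_dict(obj):
    # Same helper as in the post, repeated so the example runs on its own.
    if hasattr(obj, 'serialize'):
        return json.loads(obj.serialize())
    if isinstance(obj, dict):
        return {k: to_dict(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [to_dict(i) for i in obj]
    return obj


class FakeStixObject:
    """Hypothetical stand-in for a STIX object; only provides serialize()."""

    def __init__(self, **fields):
        self._fields = fields

    def serialize(self):
        return json.dumps(self._fields)


nested = {
    'technique': FakeStixObject(name='PowerShell', id='attack-pattern--demo'),
    'refs': [FakeStixObject(source_name='mitre-attack', external_id='T1059.001')],
    'count': 1,
}
plain = to_dict(nested)

# The result contains only dicts, lists, and scalars, so json.dumps accepts it.
print(json.dumps(plain, indent=2))
```

The order of the checks matters: serialize() is tested first, so an object that offers it is converted in one step and the dict and list branches only handle plain containers.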
Step 5 — Pull techniques with their ATT&CK IDs
Techniques carry their human-readable ID (T1059.001) in external_references under the mitre-attack source. Extract that and you have a useful starting record:
def get_techniques(attack: MitreAttackData) -> list[dict]:
    out = []
    for tech in attack.get_techniques():
        # The human-readable ID lives in the 'mitre-attack' external reference.
        tid = None
        refs = to_dict(getattr(tech, 'external_references', []))
        for ref in refs:
            if ref.get('source_name') == 'mitre-attack':
                tid = ref.get('external_id')
                break
        if not tid:
            continue
        out.append({
            'technique_id': tid,
            'stix_id': tech.id,
            'name': tech.name,
            # Optional properties: fall back to a default instead of raising.
            'description': getattr(tech, 'description', ''),
            'is_subtechnique': getattr(tech, 'x_mitre_is_subtechnique', False),
            'platforms': list(getattr(tech, 'x_mitre_platforms', [])),
            'external_references': refs,
        })
    return out
Groups and mitigations follow the same shape. The library exposes them via get_groups() and get_mitigations() respectively.
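As a rough illustration of "the same shape", the ID lookup can be factored out and applied to any object that has already been through to_dict. The names attack_id and summarise are hypothetical, and the sample record is hand-written rather than taken from the dataset:

```python
def attack_id(external_references):
    """Return the ATT&CK ID from a list of plain-dict references, or None."""
    for ref in external_references or []:
        if ref.get('source_name') == 'mitre-attack':
            return ref.get('external_id')
    return None


def summarise(obj: dict, id_key: str) -> dict:
    """Build a flat record from an already-converted (plain dict) object."""
    return {
        id_key: attack_id(obj.get('external_references')),
        'stix_id': obj.get('id'),
        'name': obj.get('name'),
        'description': obj.get('description', ''),
    }


# Hand-written sample shaped like a converted group; values are illustrative.
sample_group = {
    'id': 'intrusion-set--demo',
    'name': 'Example Group',
    'external_references': [
        {'source_name': 'other-source', 'url': 'https://example.com'},
        {'source_name': 'mitre-attack', 'external_id': 'G9999'},
    ],
}
record = summarise(sample_group, 'group_id')
print(record)
```

With real data, something like [summarise(to_dict(g), 'group_id') for g in attack.get_groups()] would produce one record per group, and the same call with 'mitigation_id' would cover mitigations.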
Step 6 — Wire it together
The entry point loads the data, dumps a structured summary, and serialises the result for downstream tools:
# main.py
import json
import logging

from config import setup_logging
from loader import load_attack, get_techniques, to_dict


def main():
    setup_logging()
    log = logging.getLogger('main')
    attack = load_attack()
    techniques = get_techniques(attack)
    groups = [to_dict(g) for g in attack.get_groups()]
    mitigations = [to_dict(m) for m in attack.get_mitigations()]
    log.info('Loaded %d techniques', len(techniques))
    log.info('Loaded %d groups', len(groups))
    log.info('Loaded %d mitigations', len(mitigations))
    with open('attack_summary.json', 'w', encoding='utf-8') as f:
        json.dump({
            'techniques': techniques,
            'groups': groups,
            'mitigations': mitigations,
        }, f, ensure_ascii=False)


if __name__ == '__main__':
    main()
Running it produces output like:
2026-05-28 14:07:35 main INFO Loaded 799 techniques
2026-05-28 14:07:35 main INFO Loaded 174 groups
2026-05-28 14:07:35 main INFO Loaded 268 mitigations
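Once the summary exists, simple questions become a few lines of standard-library Python. The sketch below counts techniques per platform; platform_counts is a hypothetical helper and the sample records are invented, shaped like the output of get_techniques() rather than copied from it:

```python
from collections import Counter


def platform_counts(techniques: list[dict]) -> Counter:
    """Count how many technique records list each platform."""
    counts = Counter()
    for tech in techniques:
        counts.update(tech.get('platforms', []))
    return counts


# Illustrative records only; IDs and platforms are not real ATT&CK data.
sample = [
    {'technique_id': 'T0001', 'platforms': ['Windows', 'Linux'], 'is_subtechnique': False},
    {'technique_id': 'T0001.001', 'platforms': ['Windows'], 'is_subtechnique': True},
    {'technique_id': 'T0002', 'platforms': [], 'is_subtechnique': False},
]
counts = platform_counts(sample)
sub_count = sum(1 for t in sample if t['is_subtechnique'])
print(counts.most_common(), sub_count)
```

Against the real file, the input would come from json.load() on attack_summary.json, using the 'techniques' key written by main.py.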
Frequently Asked Questions
Why use STIX 2.1 instead of the ATT&CK web pages?
Programmatic access. The web UI is for humans, not pipelines. STIX gives you a stable schema, every object is uniquely identified, and the entire dataset is queryable with a few library calls.
Should I use mitreattack-python or parse the STIX JSON directly?
Use mitreattack-python. It abstracts the relationship-walking logic (groups → techniques, techniques → mitigations) into named methods, which makes the calling code dramatically shorter and less error-prone than walking relationship objects yourself.
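For a sense of what the library saves you from, here is a rough sketch of walking relationship objects by hand. It runs against a tiny hand-written bundle with placeholder IDs, not the real dataset, and techniques_by_group is a hypothetical function name:

```python
from collections import defaultdict

# Tiny hand-written bundle in the STIX layout; all IDs are placeholders.
bundle = {
    'type': 'bundle',
    'objects': [
        {'type': 'intrusion-set', 'id': 'intrusion-set--g1', 'name': 'Group One'},
        {'type': 'attack-pattern', 'id': 'attack-pattern--t1', 'name': 'Technique One'},
        {'type': 'attack-pattern', 'id': 'attack-pattern--t2', 'name': 'Technique Two'},
        {'type': 'relationship', 'id': 'relationship--r1',
         'relationship_type': 'uses',
         'source_ref': 'intrusion-set--g1', 'target_ref': 'attack-pattern--t1'},
        {'type': 'relationship', 'id': 'relationship--r2',
         'relationship_type': 'mitigates',
         'source_ref': 'course-of-action--m1', 'target_ref': 'attack-pattern--t2'},
    ],
}


def techniques_by_group(bundle: dict) -> dict[str, list[str]]:
    """Map group name -> technique names by following 'uses' relationships."""
    by_id = {o['id']: o for o in bundle['objects']}
    result = defaultdict(list)
    for rel in bundle['objects']:
        if rel.get('type') != 'relationship':
            continue
        if rel.get('relationship_type') != 'uses':
            continue
        src = by_id.get(rel['source_ref'])
        dst = by_id.get(rel['target_ref'])
        # Keep only group -> technique edges whose endpoints are in the bundle.
        if src and dst and src['type'] == 'intrusion-set' and dst['type'] == 'attack-pattern':
            result[src['name']].append(dst['name'])
    return dict(result)


mapping = techniques_by_group(bundle)
print(mapping)
```

A complete version would also have to deal with revoked and deprecated objects and with the other relationship types, which is where hand-rolled parsers tend to grow.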
How often does the dataset change?
Major releases land roughly every six months; minor updates and corrections arrive more often. For monitoring, refetch on a weekly cron and diff the technique list. Each ATT&CK object carries an x_mitre_version field, which supports version-aware comparisons.
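A diff between two snapshots can be sketched with set operations on the technique IDs. This assumes each record carries a 'version' key copied from x_mitre_version, which the records built earlier in this post do not include; diff_techniques and the sample data are hypothetical:

```python
def diff_techniques(old: list[dict], new: list[dict]) -> dict[str, list[str]]:
    """Compare two technique lists keyed by 'technique_id'."""
    old_by_id = {t['technique_id']: t for t in old}
    new_by_id = {t['technique_id']: t for t in new}
    added = sorted(new_by_id.keys() - old_by_id.keys())
    removed = sorted(old_by_id.keys() - new_by_id.keys())
    # 'version' is an assumed field holding the object's x_mitre_version.
    changed = sorted(
        tid for tid in old_by_id.keys() & new_by_id.keys()
        if old_by_id[tid].get('version') != new_by_id[tid].get('version')
    )
    return {'added': added, 'removed': removed, 'changed': changed}


# Invented snapshots for illustration; not real ATT&CK IDs or versions.
last_week = [
    {'technique_id': 'T0001', 'version': '1.0'},
    {'technique_id': 'T0002', 'version': '1.2'},
    {'technique_id': 'T0003', 'version': '2.0'},
]
this_week = [
    {'technique_id': 'T0001', 'version': '1.1'},
    {'technique_id': 'T0002', 'version': '1.2'},
    {'technique_id': 'T0004', 'version': '1.0'},
]
report = diff_techniques(last_week, this_week)
print(report)
```

Keeping the previous week's summary file next to the current one is enough to feed this; no extra state is needed.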
What about Mobile and ICS ATT&CK?
Both are available as separate STIX files in the same repository (mobile-attack.json, ics-attack.json). The library loads them with the same MitreAttackData constructor — pass the path and it handles the rest. Enterprise covers most defender use cases, but check ICS specifically if you operate OT environments.
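One way to generalise the loader to all three domains is to derive the URL and cache path from the domain name. This is a sketch under the assumption that each file sits at <domain>/<domain>.json in the repository, as the enterprise URL does; stix_url, cache_path, and DOMAINS are hypothetical names:

```python
from pathlib import Path

BASE_URL = 'https://raw.githubusercontent.com/mitre/cti/master'
DOMAINS = ('enterprise-attack', 'mobile-attack', 'ics-attack')


def _check(domain: str) -> None:
    if domain not in DOMAINS:
        raise ValueError(f'unknown ATT&CK domain: {domain!r}')


def stix_url(domain: str) -> str:
    """Build the download URL, assuming the <domain>/<domain>.json layout."""
    _check(domain)
    return f'{BASE_URL}/{domain}/{domain}.json'


def cache_path(cache_dir: Path, domain: str) -> Path:
    """Return where the domain's STIX file would be cached locally."""
    _check(domain)
    return cache_dir / f'{domain}.json'


for name in DOMAINS:
    print(name, stix_url(name), cache_path(Path('cache'), name))
```

The resulting path can then be passed to the MitreAttackData constructor exactly as the enterprise path is in load_attack().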
Can I run this offline?
Yes. Once the STIX JSON is cached, nothing in this pipeline needs the internet. Schedule a periodic refresh of the cache and the rest of the analysis runs entirely offline.
Conclusion
Getting ATT&CK out of the browser and into a pipeline is mostly mechanical: cache the STIX file, use the official library, convert objects to plain dictionaries, serialise the result. The interesting work — connecting techniques to groups and mitigations, visualising coverage, mapping defensive countermeasures — starts in the next post in the series. The structure above is the foundation those posts build on.
Related Posts
- Mapping with MITRE ATT&CK: Connecting Techniques, Groups, and Mitigations — part 2 of the series.
- Visualizing with MITRE ATT&CK Navigator — part 3, turning mapped data into Navigator layers.
- MITRE ATT&CK + D3FEND: Mapping Defense to Attack — part 4, bridging into defensive countermeasures.
Authoritative references: MITRE ATT&CK and mitreattack-python.
Editorial note: posts on this blog are drafted with AI assistance and then reviewed, edited, and tested against a real environment before publishing. Commands, output, and screenshots come from systems I actually ran the work on.