Mapping with MITRE ATT&CK: Mapping MITRE ATT&CK for Full Potential

This is part two of our MITRE ATT&CK series, continuing from Getting Started with MITRE ATT&CK: Fetching and Processing Data. With the STIX dataset already cached locally, the next question is how to map techniques to the groups that use them and the mitigations that counter them. The relationships are what make the framework useful: without them, a technique list is just a glossary.

Key Takeaways

ATT&CK encodes relationships explicitly in the STIX data: uses (group → technique), mitigates (mitigation → technique), and a few less common types.
The mitreattack-python library exposes these as named helpers — get_groups_using_technique(), get_mitigations_mitigating_technique() — instead of forcing you to walk STIX relationship objects.
Map everything once, cache the result, and reuse. Live mapping for every query is unnecessary and slow.
Mapped data has shape: each technique gets a list of groups, a list of mitigations, and a stable structure that downstream consumers (Navigator layers, SIEM rules, reports) can rely on.
Edge cases that bite: techniques with no mapped groups, techniques with no mitigations, and circular references from STIX relationship chains.
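
The "shape" in the fourth takeaway can be pinned down with a type. This is a minimal sketch, assuming the field names used later in this post (technique_id, name, groups, mitigations); the MappedTechnique type and the is_mapped_record helper are hypothetical names, not part of mitreattack-python:

```python
from typing import Any, TypedDict


class MappedTechnique(TypedDict):
    """Hypothetical shape of one enriched record; field names assumed from this post."""
    technique_id: str                  # ATT&CK ID such as 'T1059.001'
    name: str
    groups: list[dict[str, Any]]       # may be empty
    mitigations: list[dict[str, Any]]  # may be empty


def is_mapped_record(record: dict[str, Any]) -> bool:
    """Cheap structural check a downstream consumer could run before trusting a record."""
    return (
        isinstance(record.get('technique_id'), str)
        and isinstance(record.get('name'), str)
        and isinstance(record.get('groups'), list)
        and isinstance(record.get('mitigations'), list)
    )
```

Empty lists are valid values here, which matters for the edge cases covered in Step 5.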

Environment

Project structure and STIX cache from Part 1.
Python 3.10+ and mitreattack-python 3.0+.
Roughly 100 MB of memory once everything is loaded; the mapped output is around 8 MB of JSON.

The Problem

Without relationships, ATT&CK is a flat list. You cannot answer questions like "which mitigations cover the techniques our SIEM does not catch yet?" or "which groups use both PowerShell and scheduled tasks?". STIX encodes the answer, but you have to walk it. The library helps; the structure below makes the result reusable.
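
To make "you have to walk it" concrete, here is roughly what answering "which groups use this technique?" looks like against raw STIX relationship objects, without the library. It is a sketch over a tiny hand-written bundle: the objects and their IDs are made up for illustration, while the field names (type, relationship_type, source_ref, target_ref) are standard STIX 2.x. The sources_for helper is a hypothetical name:

```python
# Hand-written stand-in for a STIX bundle; IDs are illustrative, not real ATT&CK IDs.
objects = [
    {'type': 'attack-pattern', 'id': 'attack-pattern--aaa', 'name': 'PowerShell'},
    {'type': 'intrusion-set', 'id': 'intrusion-set--bbb', 'name': 'Example Group'},
    {'type': 'course-of-action', 'id': 'course-of-action--ccc', 'name': 'Example Mitigation'},
    {'type': 'relationship', 'relationship_type': 'uses',
     'source_ref': 'intrusion-set--bbb', 'target_ref': 'attack-pattern--aaa'},
    {'type': 'relationship', 'relationship_type': 'mitigates',
     'source_ref': 'course-of-action--ccc', 'target_ref': 'attack-pattern--aaa'},
]


def sources_for(objects, target_id, relationship_type, source_type):
    """Return objects of source_type linked to target_id by relationship_type."""
    by_id = {o['id']: o for o in objects if 'id' in o}
    found = []
    for rel in objects:
        if rel.get('type') != 'relationship':
            continue
        if rel['relationship_type'] != relationship_type or rel['target_ref'] != target_id:
            continue
        src = by_id.get(rel['source_ref'])
        # Skip dangling references and sources of another type (e.g. malware 'uses').
        if src and src['type'] == source_type:
            found.append(src)
    return found


groups = sources_for(objects, 'attack-pattern--aaa', 'uses', 'intrusion-set')
mitigations = sources_for(objects, 'attack-pattern--aaa', 'mitigates', 'course-of-action')
print([g['name'] for g in groups], [m['name'] for m in mitigations])
```

The sketch does nothing about revoked or deprecated objects or about sub-techniques, which is the bookkeeping that makes the library helpers worth using.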

The Solution

Step 1 — Wrap the mapper in a small class

One class, one cached MitreAttackData instance, two helper methods. Keeping the logic contained makes it easy to swap the implementation later (e.g. for a SQL-backed version):

import logging
from typing import Any
from mitreattack.stix20 import MitreAttackData
from loader import to_dict  # to_dict comes from the Part 1 loader module

logger = logging.getLogger(__name__)

class TechniqueMapper:
    """Resolve ATT&CK technique IDs to the groups and mitigations linked to them."""

    def __init__(self, attack: MitreAttackData):
        self.attack = attack

    def groups_for(self, technique_id: str) -> list[dict[str, Any]]:
        # Resolve the ATT&CK ID (e.g. 'T1059.001') to its STIX attack-pattern object.
        tech = self.attack.get_object_by_attack_id(technique_id, 'attack-pattern')
        if not tech:
            logger.warning('Unknown technique %s', technique_id)
            return []
        # Each entry has 'object' and 'relationships' keys; keep the object only.
        groups = self.attack.get_groups_using_technique(tech.id) or []
        return [to_dict(g['object']) for g in groups]

    def mitigations_for(self, technique_id: str) -> list[dict[str, Any]]:
        tech = self.attack.get_object_by_attack_id(technique_id, 'attack-pattern')
        if not tech:
            # No warning here: groups_for already logs unknown IDs.
            return []
        mitigations = self.attack.get_mitigations_mitigating_technique(tech.id) or []
        return [to_dict(m['object']) for m in mitigations]

The library returns dictionaries with object and relationships keys — keep the object payload only unless you specifically need the relationship metadata.
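
If you do need the relationship metadata, for example the description that often explains how a group uses the technique, it can be kept next to the object. This is a minimal sketch using plain dictionaries as stand-ins for the library's return value; the with_usage_notes name and the sample entry are hypothetical, and real entries hold STIX objects rather than dicts:

```python
from typing import Any


def with_usage_notes(entries: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Flatten {'object': ..., 'relationships': [...]} entries into name + notes."""
    result = []
    for entry in entries:
        notes = [
            rel.get('description', '')
            for rel in entry.get('relationships', [])
            if rel.get('description')
        ]
        result.append({'name': entry['object'].get('name'), 'usage_notes': notes})
    return result


# Stand-in for one entry returned for a technique (values are made up).
sample = [{
    'object': {'name': 'Example Group'},
    'relationships': [
        {'relationship_type': 'uses', 'description': 'Used PowerShell for execution.'},
        {'relationship_type': 'uses'},  # relationships without a description are skipped
    ],
}]
print(with_usage_notes(sample))
```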

Step 2 — Map every technique at once

Iterate the technique list from Part 1 and decorate each record with its groups and mitigations:

def enrich_techniques(techniques: list[dict], mapper: TechniqueMapper) -> list[dict]:
    """Return copies of the technique records with 'groups' and 'mitigations' added."""
    enriched = []
    total = len(techniques)
    for i, tech in enumerate(techniques, start=1):
        tid = tech['technique_id']
        # Build a new record so the caller's input list is not mutated.
        enriched.append({
            **tech,
            'groups': mapper.groups_for(tid),
            'mitigations': mapper.mitigations_for(tid),
        })
        # Progress log every 50 techniques.
        if i % 50 == 0:
            logger.info('Processed %d/%d techniques', i, total)
    return enriched

On the current enterprise dataset, 799 techniques take around 10 seconds end to end. Run once, cache the JSON output, and reuse it until the STIX file changes.

Step 3 — Cache the mapped output

Serialise to disk once and load from cache on subsequent runs. Use a content hash or version number to invalidate when the upstream STIX file changes:

import hashlib
import json
from pathlib import Path

def cache_key(stix_path: Path) -> str:
    # First 16 hex characters of the file's SHA-256; changes whenever the STIX file does.
    return hashlib.sha256(stix_path.read_bytes()).hexdigest()[:16]

def save_mapped(techniques: list[dict], stix_path: Path) -> Path:
    out = Path('cache') / f'mapped_{cache_key(stix_path)}.json'
    out.parent.mkdir(parents=True, exist_ok=True)  # create cache/ on first run
    out.write_text(json.dumps(techniques, ensure_ascii=False), encoding='utf-8')
    return out
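
The other half of the cache is the lookup. Below is a sketch of a load-or-build wrapper, assuming the same mapped_<hash>.json naming as above; load_or_build and its build callback are hypothetical names, and the cache directory is passed in as a parameter so the function is easy to test:

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def load_or_build(stix_path: Path, cache_dir: Path,
                  build: Callable[[], list[dict]]) -> list[dict]:
    """Return cached mapped data for this exact STIX file, building it on a miss."""
    key = hashlib.sha256(stix_path.read_bytes()).hexdigest()[:16]
    cached = cache_dir / f'mapped_{key}.json'
    if cached.exists():
        # Cache hit: the STIX file is byte-identical to the one that produced this output.
        return json.loads(cached.read_text(encoding='utf-8'))
    data = build()  # e.g. a function that runs the enrichment loop from Step 2
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached.write_text(json.dumps(data, ensure_ascii=False), encoding='utf-8')
    return data
```

Because the key is derived from the file contents, a new ATT&CK release produces a new cache file instead of silently reusing the old one. Stale files can be deleted at any time.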

Step 4 — Query the mapped data

With the data on disk in a stable shape, ad-hoc analysis becomes trivial. A few example questions:

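# Path() does not expand wildcards, so glob for the cache files and take the newest.
latest = max(Path('cache').glob('mapped_*.json'), key=lambda p: p.stat().st_mtime)
mapped = json.loads(latest.read_text(encoding='utf-8'))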

# Techniques with no mapped mitigations
gap = [t for t in mapped if not t['mitigations']]
print(f'Techniques with no mitigations: {len(gap)}')

# Top 10 techniques by number of groups using them
top = sorted(mapped, key=lambda t: len(t['groups']), reverse=True)[:10]
for t in top:
    print(t['technique_id'], t['name'], len(t['groups']))

# Groups that use both PowerShell (T1059.001) and Scheduled Task (T1053.005)
ps = {g['name'] for t in mapped if t['technique_id'] == 'T1059.001' for g in t['groups']}
st = {g['name'] for t in mapped if t['technique_id'] == 'T1053.005' for g in t['groups']}
print('Groups using both:', sorted(ps & st))

These three queries took an entire blog post to motivate. Now each is a few lines of Python.
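
The question from the problem statement, which mitigations cover techniques the SIEM does not catch yet, works the same way once you have a set of detected technique IDs. This is a sketch with made-up sample data; the detected set and the mitigation_gaps helper are hypothetical, and the records are assumed to carry the technique_id and mitigations fields produced in Step 2:

```python
from collections import defaultdict


def mitigation_gaps(mapped: list[dict], detected: set[str]) -> dict[str, list[str]]:
    """Map mitigation name -> undetected technique IDs that the mitigation covers."""
    cover = defaultdict(list)
    for tech in mapped:
        if tech['technique_id'] in detected:
            continue  # a detection already exists for this technique
        for mit in tech['mitigations']:
            cover[mit['name']].append(tech['technique_id'])
    return dict(cover)


# Made-up sample in the mapped shape; real data comes from the cache file.
sample = [
    {'technique_id': 'T1059.001', 'mitigations': [{'name': 'Mitigation A'}]},
    {'technique_id': 'T1053.005', 'mitigations': [{'name': 'Mitigation B'},
                                                  {'name': 'Mitigation A'}]},
    {'technique_id': 'T1003', 'mitigations': []},
]
print(mitigation_gaps(sample, detected={'T1059.001'}))
```

Techniques that are neither detected nor mitigated do not appear in the result at all; the "no mapped mitigations" query above surfaces those.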

Step 5 — Handle the edge cases

A few things bite once you start running the mapper at scale:

Techniques with no groups. Common for newly-added techniques. Skip silently rather than warning.
Techniques with no mitigations. Also common. Often the most interesting techniques from a defender's perspective — they are gaps.
Subtechniques. The library treats them as full attack-pattern objects with their own ATT&CK ID (T1059.001). Iterate them alongside parent techniques; do not deduplicate.
Deprecated techniques. x_mitre_deprecated on the STIX object marks them. Either filter out or keep with a flag, depending on use case.
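
The deprecated case can be handled in one pass over the records. This is a sketch assuming each record still carries the STIX flags x_mitre_deprecated and revoked (revoked is the standard STIX property and is worth checking alongside the deprecation flag); split_active is a hypothetical helper name and the sample IDs are placeholders:

```python
def split_active(records: list[dict]) -> tuple[list[dict], list[dict]]:
    """Separate active records from deprecated or revoked ones."""
    active, retired = [], []
    for rec in records:
        # Both flags are optional; a missing flag is treated as False.
        if rec.get('x_mitre_deprecated', False) or rec.get('revoked', False):
            retired.append(rec)
        else:
            active.append(rec)
    return active, retired


sample = [
    {'technique_id': 'T0001', 'x_mitre_deprecated': True},
    {'technique_id': 'T0002'},
    {'technique_id': 'T0003', 'revoked': True},
]
active, retired = split_active(sample)
print(len(active), len(retired))
```

Returning both lists keeps the "filter out or keep with a flag" decision with the caller.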

Step 6 — Output formats for downstream tools

The mapped JSON is consumable by anything that speaks JSON. For specific downstream tools:

MITRE ATT&CK Navigator consumes its own layer format — see Part 3 of this series.
Sigma rule frontmatter accepts tags: [attack.t1059.001]; generate it from the mapped technique IDs.
Sentinel / Splunk / Elastic detection content usually carries a mitre_attack_technique field; populate from the same source.
Spreadsheets for stakeholder reports: flatten the structure and export via csv module or pandas.
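
Two of these are short enough to sketch. The first helper turns a technique ID into the lowercase Sigma tag form quoted above; the second flattens mapped records into CSV with the standard csv module. Function names, column names, and the semicolon separator are illustrative choices, not a fixed format:

```python
import csv
import io


def sigma_tag(technique_id: str) -> str:
    """'T1059.001' -> 'attack.t1059.001'."""
    return f'attack.{technique_id.lower()}'


def to_csv(mapped: list[dict]) -> str:
    """One row per technique; group and mitigation names joined with ';'."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator='\n')
    writer.writerow(['technique_id', 'name', 'groups', 'mitigations'])
    for tech in mapped:
        writer.writerow([
            tech['technique_id'],
            tech['name'],
            ';'.join(g['name'] for g in tech['groups']),
            ';'.join(m['name'] for m in tech['mitigations']),
        ])
    return buf.getvalue()


# Made-up record in the mapped shape.
sample = [{'technique_id': 'T1059.001', 'name': 'PowerShell',
           'groups': [{'name': 'Group A'}, {'name': 'Group B'}], 'mitigations': []}]
print(sigma_tag('T1059.001'))
print(to_csv(sample))
```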

Frequently Asked Questions

Why not query the library on demand instead of caching?

Performance and reproducibility. Querying the library each time means re-walking relationships for every request, which is slow. Caching means downstream tools see the same shape between runs and you can version the cache file alongside detection content.

What is the difference between `get_groups_using_technique` and walking relationships manually?

The library follows uses relationships from groups to techniques, including subtechnique flattening and deprecated-object filtering. Doing it by hand requires reproducing the same logic and is exactly the kind of code that breaks the next time MITRE changes a field name.

Can I extend the mapper to include data sources or detections?

Yes. ATT&CK 11+ includes data sources and data components as first-class objects. mitreattack-python exposes get_datasources() and get_datacomponents(); the link to techniques runs through data components, via get_datacomponents_detecting_technique(). Add another field to the enrichment loop and you have full visibility into what telemetry covers what technique.
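
"Add another field to the enrichment loop" can be kept generic so the same code serves data sources, software, or anything else. In this sketch the lookup is any callable from technique ID to a list; add_field and the dictionary-backed lookup are hypothetical, and in practice the callable would wrap whichever library helper you choose:

```python
from typing import Any, Callable


def add_field(techniques: list[dict[str, Any]], field: str,
              lookup: Callable[[str], list[Any]]) -> list[dict[str, Any]]:
    """Return copies of the records with one extra list-valued field."""
    return [{**tech, field: lookup(tech['technique_id'])} for tech in techniques]


# Stand-in lookup table; values are made up for the example.
telemetry = {'T1059.001': ['Component A', 'Component B']}
sample = [{'technique_id': 'T1059.001'}, {'technique_id': 'T1003'}]

enriched = add_field(sample, 'data_components', lambda tid: telemetry.get(tid, []))
print(enriched)
```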

What about software (malware and tools)?

Same idea, different helper: get_software() returns malware and tool objects, and get_techniques_used_by_software() gives the link to techniques. Useful for correlating threat-intel reports that name a tool with the techniques it implements.

How big is the mapped output?

About 8 MB compact JSON for the current enterprise dataset. Pretty-printed it triples. Compress on disk if you check it into git.
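
Compressing takes a few lines with the standard library. This is a sketch using gzip in text mode; the function names are hypothetical and the output path is left to the caller:

```python
import gzip
import json
from pathlib import Path


def save_compressed(techniques: list[dict], out: Path) -> Path:
    """Write compact JSON through gzip."""
    with gzip.open(out, 'wt', encoding='utf-8') as fh:
        json.dump(techniques, fh, ensure_ascii=False, separators=(',', ':'))
    return out


def load_compressed(path: Path) -> list[dict]:
    """Read back a file written by save_compressed."""
    with gzip.open(path, 'rt', encoding='utf-8') as fh:
        return json.load(fh)
```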

Conclusion

Relationships are most of the value in MITRE ATT&CK. Mapping them once, caching the result, and shipping the cache to downstream tools turns ATT&CK from a website into a queryable knowledge graph. The structure above is intentionally small — adding data sources, software, and campaigns is the same pattern applied to additional helper methods. Part three takes the same mapped data and renders it as Navigator layers.

Getting Started with MITRE ATT&CK: Fetching and Processing Data — part 1 of the series.
Visualizing with MITRE ATT&CK Navigator — part 3, turning mapped data into Navigator layers.
MITRE ATT&CK + D3FEND: Mapping Defense to Attack — part 4, bridging into defensive countermeasures.

Authoritative reference: mitreattack-python on GitHub.

Editorial note: posts on this blog are drafted with AI assistance and then reviewed, edited, and tested against a real environment before publishing. Commands, output, and screenshots come from systems I actually ran the work on.

Security Scriptographer — PowerShell & Threat Hunting

Through Security Scriptographer, I transform complex security concepts into practical scripts and tutorials. Proficient in PowerShell, Python and various security frameworks, I'm here to help others enhance their security toolkit. Simple code, serious security. 🛡️
