MITRE ATT&CK + D3FEND: Mapping Defense to Attack

Hey there, fellow threat hunters! 👋 Welcome to part 4 of our MITRE ATT&CK journey! Today, we're exploring how to integrate MITRE D3FEND into our project. This isn't going to be a perfect solution, but it's a solid starting point for anyone looking to connect offensive techniques with their defensive counterparts.

Why Map D3FEND to ATT&CK?

The MITRE ATT&CK framework gives us great insights into adversary tactics and techniques, but it's only part of the picture. MITRE D3FEND complements this by providing a knowledge base of defensive countermeasures. By mapping them together, we can:

  • Quickly identify potential defensive measures for known attack techniques
  • Understand gaps in our defensive coverage
  • Make more informed decisions about security investments
  • Create comprehensive security documentation that covers both offense and defense

The Data Loading Challenge

Our first challenge was figuring out how to efficiently load D3FEND data. The D3FEND API provides two main endpoints we're interested in:

  1. Offensive technique mapping: /api/offensive-technique/attack/{technique_id}.json
  2. Defensive technique details: /api/technique/d3f:{def_tech_id}.json

Here's how we handle the loading:

import json
import logging
import os
from typing import Any, Dict, List, Optional

import requests

logger = logging.getLogger(__name__)
cache_dir = 'cache/d3fend'  # adjust to your project layout

def load_d3fend_data(technique_id: str, use_cache: bool = True) -> Optional[Dict]:
    """Load D3FEND data for a specific technique ID"""
    cache_file = os.path.join(cache_dir, f'{technique_id}.json')
    
    if use_cache and os.path.exists(cache_file):
        logger.debug(f"Loading cached D3FEND data for {technique_id}")
        with open(cache_file, 'r') as f:
            return json.load(f)
            
    try:
        url = f"https://d3fend.mitre.org/api/offensive-technique/attack/{technique_id}.json"
        response = requests.get(url)
        response.raise_for_status()
        d3fend_data = response.json()
        
        with open(cache_file, 'w') as f:
            json.dump(d3fend_data, f)
            
        return d3fend_data
    except requests.RequestException as e:
        logger.debug(f"Failed to fetch D3FEND data for {technique_id}: {e}")
        return None

Loading Defensive Details

We also need detailed information about each defensive technique:

def load_d3fend_technique_details(def_tech_id: str, use_cache: bool = True) -> Optional[Dict]:
    """Load detailed information for a specific D3FEND technique"""
    cache_file = os.path.join(cache_dir, f'{def_tech_id}_details.json')
    
    if use_cache and os.path.exists(cache_file):
        with open(cache_file, 'r') as f:
            return json.load(f)
            
    try:
        url = f"https://d3fend.mitre.org/api/technique/d3f:{def_tech_id}.json"
        response = requests.get(url)
        response.raise_for_status()
        technique_data = response.json()
        
        with open(cache_file, 'w') as f:
            json.dump(technique_data, f)
            
        return technique_data
    except requests.RequestException as e:
        logger.debug(f"Failed to fetch D3FEND technique details: {str(e)}")
        return None

Some issues we encountered with this approach:

  • Rate limiting can be a problem when fetching a lot of data (see the retry sketch after this list)
  • The API occasionally returns inconsistent data structures
  • Some technique mappings are incomplete or missing
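
We didn't build full rate-limit handling into the loader, but a small retry wrapper with exponential backoff goes a long way. Here's a minimal sketch (fetch_with_retry and its retry/backoff parameters are ours for illustration, not part of the D3FEND API):

from typing import Dict, Optional
import time

import requests

def fetch_with_retry(url: str, retries: int = 3, backoff: float = 2.0) -> Optional[Dict]:
    """GET a JSON resource, backing off when the server pushes back."""
    for attempt in range(retries):
        try:
            response = requests.get(url, timeout=30)
            if response.status_code == 429:  # Too Many Requests - wait and retry
                time.sleep(backoff ** attempt)
                continue
            response.raise_for_status()
            return response.json()
        except requests.RequestException:
            if attempt == retries - 1:
                raise
            time.sleep(backoff ** attempt)
    return None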

The Mapping Process

Our TechniqueMapper class handles the D3FEND integration. The core challenge here is to create reliable connections between offensive techniques and their defensive counterparts. Let's break down how we implemented this:

Understanding the Data Structure

Before we dive into the code, it's important to understand what we're working with. The D3FEND API returns data in a specific format:

  • Each offensive technique can map to multiple defensive techniques
  • The mapping comes with additional metadata like labels and descriptions
  • The response uses a SPARQL-like structure with 'bindings'
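
To make that last point concrete, here is roughly the shape of a single binding, reconstructed from the fields our parser reads below (real responses carry more fields, and the values here are invented for illustration):

binding = {
    "def_tech": {
        # URI whose '#' fragment is the D3FEND technique ID
        "value": "http://d3fend.mitre.org/ontologies/d3fend.owl#DecoyFile"
    },
    "def_tech_label": {
        "value": "Decoy File"
    }
}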

Here's our implementation of the mapping function with detailed comments explaining each step:

def map_d3fend_to_technique(self, technique_id: str, use_cache: bool = True) -> List[Dict[str, Any]]:
    """Maps D3FEND defensive techniques to an ATT&CK technique"""
    try:
        # First, fetch the basic mapping data from D3FEND
        d3fend_data = load_d3fend_data(technique_id, use_cache)
        if not d3fend_data or 'off_to_def' not in d3fend_data:
            return []

        d3fend_techniques = []
        # The bindings contain the actual mappings
        bindings = d3fend_data['off_to_def']['results']['bindings']
        
        for binding in bindings:
            # Each binding should have a label - if not, it's malformed
            if 'def_tech_label' not in binding:
                continue
                
            # Extract the D3FEND technique ID from the URI
            def_tech_id = binding['def_tech']['value'].split('#')[-1]
            
            # Cache management for detailed information
            if def_tech_id not in self.d3fend_cache:
                self.d3fend_cache[def_tech_id] = load_d3fend_technique_details(def_tech_id, use_cache)
            
            def_tech_details = self.d3fend_cache[def_tech_id]

The code above handles the initial data fetching and preprocessing. Now let's look at how we extract the actual useful information:

            # Extract description from the complex graph structure
            description = None
            if def_tech_details and 'description' in def_tech_details:
                if '@graph' in def_tech_details['description']:
                    graph = def_tech_details['description']['@graph']
                    if graph and 'd3f:definition' in graph[0]:
                        description = graph[0]['d3f:definition']

            # Build a standardized technique info object
            technique_info = {
                "id": def_tech_id,
                "title": binding['def_tech_label']['value'],
                "url": f"https://d3fend.mitre.org/technique/d3f:{def_tech_id}",
                "description": description
            }
            d3fend_techniques.append(technique_info)

        return d3fend_techniques
    except Exception as e:
        logger.error(f"Error mapping D3FEND data for {technique_id}: {e}")
        return []

A few important implementation details to note:

  • We use a cache to avoid repeated API calls for the same technique
  • The error handling is deliberately permissive - we'd rather return partial data than nothing
  • We standardize the output format to make it easier to work with later

Some challenges we encountered during implementation:

  1. The D3FEND API can be inconsistent in how it returns data structures
  2. Some descriptions contain HTML-like markup that needs to be handled (a stripping helper is sketched after this list)
  3. The graph structure can vary between different technique types
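
For the markup problem (point 2), a crude but serviceable fix is to strip tags before storing the text. A sketch - strip_markup is an illustrative helper, not part of the project code:

import re
from typing import Optional

def strip_markup(text: Optional[str]) -> Optional[str]:
    """Drop simple HTML-like tags from a D3FEND description."""
    if not text:
        return text
    return re.sub(r'<[^>]+>', '', text).strip()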

The Memory Problem: Deep Dive

When we first finished our implementation, we ran into a serious issue: our JSON output was ballooning to around 80MB. For context, that's larger than most complete databases for small applications. Let's break down why this happened and how we solved it.

Understanding the Bloat

After analyzing our output, we identified several causes of the excessive size:

  1. References were being duplicated across multiple techniques
  2. Many fields contained empty or null values that were being stored unnecessarily
  3. Description texts were often repeated with slight variations
  4. The JSON structure itself had unnecessary nesting

Solution 1: Reference Deduplication

The biggest offender was duplicate references. Many D3FEND techniques share the same academic papers or documentation. Here's how we implemented the deduplication:

def optimize_d3fend_references(techniques: List[Dict]) -> Dict:
    # Create lookup tables for both references and authors
    reference_lookup = {}
    author_lookup = {}
    ref_counter = 0
    author_counter = 0

    # First pass: build lookup tables
    for tech in techniques:
        if 'd3fend' in tech:
            for d3f in tech['d3fend']:
                # Handle references
                if 'references' in d3f:
                    new_refs = []
                    for ref in d3f['references']:
                        # Create a unique key from the reference URL
                        ref_key = ref['url'] if isinstance(ref, dict) else ref
                        
                        # If we haven't seen this reference before, add it to lookup
                        if ref_key not in reference_lookup:
                            ref_counter += 1
                            reference_lookup[ref_key] = {
                                'id': str(ref_counter),
                                'data': ref
                            }
                        # Store only the reference ID instead of full data
                        new_refs.append(reference_lookup[ref_key]['id'])
                    # Replace full reference data with IDs
                    d3f['references'] = new_refs

This approach gave us significant benefits:

  • Each unique reference is stored only once
  • References are easily updateable via the lookup table (see the resolver sketch after this list)
  • Memory usage becomes more predictable
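
Reading the optimized file back means resolving the IDs through the lookup table. A minimal sketch of the consumer side (resolve_references is hypothetical):

from typing import Dict, List

def resolve_references(d3f_entry: Dict, reference_lookup: Dict[str, Dict]) -> List[Dict]:
    """Expand stored reference IDs back into the full reference objects."""
    return [reference_lookup[ref_id] for ref_id in d3f_entry.get('references', [])]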

Solution 2: Empty Value Cleanup

Next, we tackled the problem of empty values. Initially, we were storing a lot of nulls, empty strings, and empty arrays. Here's our recursive cleanup function:

def clean_empty(d):
    """
    Recursively remove empty values from a data structure:
    - Empty strings
    - Empty lists/dicts
    - None values
    Zeroes are deliberately kept, since 0 is often a legitimate value.
    """
    if isinstance(d, dict):
        # Clean values first, then drop entries that cleaned down to empty
        cleaned = {k: clean_empty(v) for k, v in d.items()}
        return {k: v for k, v in cleaned.items() if v not in (None, "", [], {})}
    elif isinstance(d, list):
        # Clean items first, then drop the ones that cleaned down to empty
        cleaned = [clean_empty(item) for item in d]
        return [item for item in cleaned if item not in (None, "", [], {})]
    # Return scalar values unchanged
    return d

Some key decisions in this implementation:

  • We handle both dictionaries and lists recursively
  • We consider multiple types of "empty" values
  • We're careful not to remove legitimate zero values in numeric fields
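
A quick check of the behavior:

>>> clean_empty({'name': 'T1059', 'tags': [], 'count': 0, 'notes': None, 'refs': [{}, '1']})
{'name': 'T1059', 'count': 0, 'refs': ['1']}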

Solution 3: Metadata Structure Optimization

Finally, we optimized how we store the core technique data:

def optimize_technique_data(technique: Dict) -> Dict:
    """
    Optimizes technique data structure by:
    1. Keeping only essential fields
    2. Flattening nested structures where possible
    3. Using consistent data types
    """
    return {
        "type": "attack-pattern",  # Required for STIX compatibility
        "id": technique.get("id"),
        "technique_id": technique.get("technique_id"),
        "name": technique.get("name"),
        # Only store description if it adds value
        "description": technique.get("description"),
        # Store D3FEND data with optimized references
        "d3fend": technique.get("d3fend", [])
    }

The Results

After implementing all these optimizations, we saw dramatic improvements:

  • File size reduced from 80MB to about 20MB
  • Load times improved by approximately 75%
  • Memory usage during processing dropped significantly

Potential Further Optimizations

While we've made significant improvements, there's still room for more:

  • Implement actual compression (gzip/bzip2) for stored files - see the sketch after this list
  • Implement lazy loading for detailed technique information
  • Create separate caches for frequently and rarely accessed data
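
The first item is nearly free with Python's standard library, and repetitive JSON like ours typically compresses very well. A sketch of what it could look like:

import gzip
import json

def save_compressed(data: dict, path: str) -> None:
    """Write JSON through gzip (a .json.gz extension is conventional)."""
    with gzip.open(path, 'wt', encoding='utf-8') as f:
        json.dump(data, f, separators=(',', ':'))

def load_compressed(path: str) -> dict:
    with gzip.open(path, 'rt', encoding='utf-8') as f:
        return json.load(f)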

Data Optimization: The Final Piece

Now that we've addressed the core memory issues, let's dive into how we handle the overall data optimization process. This is where we bring everything together into a cohesive system.

The Complete Optimization Pipeline

Our optimization process happens in several stages, each building on the previous one:

def optimize_d3fend_references(techniques: List[Dict]) -> Dict:
    """
    Create optimized data structure with lookup tables for references and authors
    """
    # Initialize our lookup tables and counters
    reference_lookup = {}
    author_lookup = {}
    ref_counter = 0
    author_counter = 0

    # First pass: Process all techniques and build lookups
    for tech in techniques:
        if 'd3fend' in tech:
            for d3f in tech['d3fend']:
                # Handle references
                if 'references' in d3f:
                    new_refs = []
                    for ref in d3f['references']:
                        # Create a unique key for each reference (URL when available)
                        ref_key = ref['url'] if isinstance(ref, dict) else ref
                        if ref_key not in reference_lookup:
                            ref_counter += 1
                            reference_lookup[ref_key] = {
                                'id': str(ref_counter),
                                'data': ref
                            }
                        new_refs.append(reference_lookup[ref_key]['id'])
                    d3f['references'] = new_refs

                # Handle authors similarly
                if 'authors' in d3f:
                    new_authors = []
                    for author in d3f['authors']:
                        if author not in author_lookup:
                            author_counter += 1
                            author_lookup[author] = str(author_counter)
                        new_authors.append(author_lookup[author])
                    d3f['authors'] = new_authors

We then create reverse lookups for easy access:

    # Create reverse lookups for the final data structure
    reference_reverse_lookup = {
        v['id']: v['data'] for v in reference_lookup.values()
    }
    author_reverse_lookup = {v: k for k, v in author_lookup.items()}

    # Return the complete optimized structure
    return {
        'techniques': techniques,
        'metadata': {
            'reference_lookup': reference_reverse_lookup,
            'author_lookup': author_reverse_lookup,
            'generated_at': datetime.datetime.now().isoformat(),
            'version': '1.0',
            'technique_count': len(techniques)
        }
    }

Final Save and Optimization

The last step is saving our optimized data:

def save_optimized_data(mapped_techniques: List[Dict], output_path: str):
    """
    Save the final optimized data structure with maximum efficiency
    """
    # First optimize each technique
    optimized_techniques = [
        optimize_technique_data(tech) for tech in mapped_techniques
    ]
    
    # Then optimize references across all techniques
    optimized_data = optimize_d3fend_references(optimized_techniques)
    
    # Remove any remaining empty values
    optimized_data = clean_empty(optimized_data)
    
    # Use optimal JSON encoding settings
    with open(output_path, 'w', encoding='utf-8') as f:
        json.dump(optimized_data, f, 
                 ensure_ascii=False,
                 separators=(',', ':'),
                 check_circular=False)

Looking Forward

While we've made significant progress in mapping ATT&CK to D3FEND and optimizing the data storage, there's always room for improvement. Here are some areas we could explore in the future:

  • Interactive visualization tools for exploring the relationships
  • Use of MITRE ATT&CK mappings for SIEM and SOAR rules

Wrapping Up

This project taught us several valuable lessons:

  • Data optimization isn't just about size - it's about making the data more usable
  • Sometimes the simple solutions (like lookup tables) are the most effective

Until then, keep your code clean and your security tight! 🕵️‍♂️

P.S. All the code shown in this blog series is available in our GitHub repository:
https://github.com/SecurityScriptographer/mitre
Feel free to fork it, improve it, and share your optimizations with the community!

Additional Resources

  • MITRE D3FEND Official Site
  • MITRE ATT&CK Framework
  • Cyberchef
  • NIST CSF

PowerShell Quick Guide: Process Investigation

Hey there, fellow threat hunters! 👋 Today we're diving into process investigation with PowerShell. Whether you're hunting malware or troubleshooting system issues, understanding processes is crucial. Let's dig in!

Basic Process Information

Let's start with the basics:

# Get all running processes
Get-Process | Select-Object Name, Id, Path, Company, CPU, StartTime | Sort-Object CPU -Descending

# Get specific process details
Get-Process notepad | Select-Object *

# Find processes by name (supports wildcards)
Get-Process *chrome* | Select-Object Name, Id, Path

Process Relationships

Understanding parent-child relationships is crucial for threat hunting:

# Get process with parent info (Windows 10+)
Get-CimInstance Win32_Process | Select-Object ProcessId, ParentProcessId, 
    CommandLine, @{Name='ParentProcess';
    Expression={(Get-Process -Id $_.ParentProcessId).Name}},
    @{Name='CreationDate';Expression={$_.CreationDate}} |
    Sort-Object CreationDate -Descending

# Find children of a specific process
$parentId = (Get-Process explorer).Id
Get-CimInstance Win32_Process | 
    Where-Object { $_.ParentProcessId -eq $parentId } |
    Select-Object Name, ProcessId, CommandLine

Command Line Investigation

While these PowerShell commands are useful for basic investigation and learning, it's important to note that real-world threat hunting is much more complex. Professional security teams typically use dedicated tools and platforms like:

  • Security Information and Event Management (SIEM) systems
  • Endpoint Detection and Response (EDR) solutions
  • Advanced threat hunting platforms
  • Machine learning-based anomaly detection

This guide is meant to demonstrate basic concepts and help you understand what's happening under the hood. For production environments and serious security monitoring, always invest in proper security tools and professional training.

Command lines can reveal suspicious behavior:

# Get processes with command line info
Get-CimInstance Win32_Process | 
    Select-Object ProcessId, Name, CommandLine, CreationDate |
    Where-Object { $_.CommandLine -ne $null } |
    Sort-Object CreationDate -Descending

# Look for suspicious PowerShell commands
Get-CimInstance Win32_Process | 
    Where-Object { $_.CommandLine -like '*powershell*' -and $_.CommandLine -like '*encoded*' } |
    Select-Object Name, ProcessId, CommandLine, CreationDate

Memory Analysis

Check memory usage patterns:

# Get top memory consumers
Get-Process | Sort-Object WorkingSet64 -Descending | 
    Select-Object -First 10 Name, Id, @{
        Name='MemoryUsage(MB)';
        Expression={[math]::Round($_.WorkingSet64/1MB, 2)}
    }

# Find memory leaks (basic)
$samples = 1..3 | ForEach-Object {
    Get-Process | Select-Object Name, WorkingSet64
    Start-Sleep -Seconds 30
}
$samples | Group-Object Name | 
    Where-Object { $_.Count -eq 3 } |
    ForEach-Object {
        $name = $_.Name
        $growth = ($_.Group.WorkingSet64 | Measure-Object -Minimum -Maximum)
        [PSCustomObject]@{
            Name = $name
            Growth = [math]::Round(($growth.Maximum - $growth.Minimum)/1MB, 2)
        }
    } |
    Where-Object Growth -gt 10 |
    Sort-Object Growth -Descending

Startup Location Tracking

Find where processes are launching from:

# Get process paths and signatures
Get-Process | Where-Object Path | Select-Object Name, Path,
    @{Name='Signature';Expression={
        Get-AuthenticodeSignature $_.Path | 
        Select-Object -ExpandProperty Status
    }}

# Check startup locations
Get-CimInstance Win32_StartupCommand | 
    Select-Object Name, Command, Location, User

Network Connections

See what processes are communicating:

# Get processes with network connections
Get-NetTCPConnection | 
    Where-Object State -eq 'Established' |
    Select-Object LocalAddress, LocalPort, RemoteAddress, RemotePort,
        @{Name='ProcessName';Expression={
            (Get-Process -Id $_.OwningProcess).Name
        }}

# Look for listening ports
Get-NetTCPConnection | 
    Where-Object State -eq 'Listen' |
    Select-Object LocalPort,
        @{Name='ProcessName';Expression={
            (Get-Process -Id $_.OwningProcess).Name
        }}

Pro Tips

  • Watch for encoded commands: Base64 encoded PowerShell commands might be suspicious (see the decoding example after this list)
  • Check digital signatures: Unsigned executables in unusual locations warrant investigation
  • Monitor parent-child: Unusual parent processes might indicate process injection
  • Track remote connections: Processes with unexpected network connections need attention
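
For the encoded-command tip, it helps to know how -EncodedCommand payloads are built: they're just Base64 over the UTF-16LE bytes of the script. A quick round-trip you can try yourself (the encoded command is an arbitrary, harmless example):

# Encode the way attackers do...
$encoded = [Convert]::ToBase64String([Text.Encoding]::Unicode.GetBytes('Get-Process'))

# ...and decode the way defenders should: inspect, don't execute
[Text.Encoding]::Unicode.GetString([Convert]::FromBase64String($encoded))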

Common Investigation Scenarios

Here's a quick script for basic malware hunting:

# Quick malware hunt
$suspiciousProcesses = Get-Process | Where-Object {
    $_.Path -and (
        # Unusual locations
        $_.Path -like "$env:TEMP\*" -or
        $_.Path -like "$env:APPDATA\*" -or
        # No signature
        (Get-AuthenticodeSignature $_.Path).Status -eq 'NotSigned' -or
        # High CPU with network
        ($_.CPU -gt 70 -and (Get-NetTCPConnection |
            Where-Object OwningProcess -eq $_.Id))
    )
} | Select-Object Name, Id, Path, CPU,
    @{Name='Connections';Expression={
        (Get-NetTCPConnection | 
        Where-Object OwningProcess -eq $_.Id).Count
    }}

Wrapping Up

Remember to baseline your normal system behavior to better identify anomalies.

Screenshot: sample output - note the obvious false positives among the "suspicious" processes.

Stay safe, and happy hunting! 🕵️‍♂️

P.S. For more details, check out the official documentation:

  • Get-Process Documentation
  • Get-CimInstance Documentation
  • Get-NetTCPConnection Documentation

PowerShell Quick Guide: Remote Management Basics

Hey there, fellow threat hunters! 👋 Today we're diving into PowerShell remote management. Whether you're managing a fleet of servers or investigating a suspicious endpoint, knowing how to work remotely is essential. Let's get started!

Check Remote Access

First, let's see if we can even connect remotely:

# Test WinRM connectivity
Test-WSMan -ComputerName "remote-pc.domain.name"

# Check if PowerShell remoting is enabled
$computerName = "remote-pc.domain.name"
Test-NetConnection -ComputerName $computerName -Port 5985 # HTTP
Test-NetConnection -ComputerName $computerName -Port 5986 # HTTPS

Starting a Remote Session

There are several ways to work remotely. Here are the most common:

# Method 1: One-off command
Invoke-Command -ComputerName "remote-pc" -ScriptBlock {
    Get-Service | Where-Object Status -eq "Running"
}

# Method 2: Interactive session
Enter-PSSession -ComputerName "remote-pc"

# Method 3: Multiple computers at once
$computers = "server1", "server2", "server3"
Invoke-Command -ComputerName $computers -ScriptBlock {
    Get-Process | Select-Object Name, CPU, PM
}

Working with Credentials

Sometimes you need different credentials:

# Store credentials securely
$cred = Get-Credential

# Use stored credentials
Enter-PSSession -ComputerName "remote-pc" -Credential $cred

# For multiple machines
Invoke-Command -ComputerName $computers -Credential $cred -ScriptBlock {
    Get-WinEvent -LogName Security -MaxEvents 10
}

Understanding the Protocols

PowerShell remoting isn't magic - it relies on specific protocols:

  • WS-Management (WinRM): The core protocol that handles the remote connections
    • Uses HTTP (5985) or HTTPS (5986)
    • Handles authentication and encryption
    • Built on SOAP (Simple Object Access Protocol)
  • Kerberos/NTLM: For authentication
    • Kerberos is used in domain environments
    • NTLM is the fallback for workgroup scenarios

There are also some legacy protocols you might encounter:

  • DCOM (Distributed COM): Older method, still used by some cmdlets
    • Uses RPC (TCP 135)
    • Less secure than WinRM
    • Still used by Get-WmiObject (but not by Get-CimInstance)

# Check which protocol you're using
Get-CimInstance -ComputerName "remote-pc" Win32_OperatingSystem # Uses WinRM
Get-WmiObject -ComputerName "remote-pc" Win32_OperatingSystem  # Uses DCOM

File Operations

Need to copy files? Here's how:

# Copy a file to remote machine
Copy-Item -Path "C:\Scripts\test.ps1" `
    -Destination "C:\Scripts\" `
    -ToSession (New-PSSession -ComputerName "remote-pc")

# Copy from remote machine
Copy-Item -Path "C:\Logs\error.log" `
    -Destination "C:\LocalLogs\" `
    -FromSession (New-PSSession -ComputerName "remote-pc")

Session Management

Keep your sessions under control:

# Create a persistent session
$session = New-PSSession -ComputerName "remote-pc"

# Use the session multiple times
Invoke-Command -Session $session -ScriptBlock {
    Get-Process
}

# Clean up when done
Remove-PSSession -Session $session

# List all active sessions
Get-PSSession

Common Issues and Solutions

# Enable PowerShell remoting (run as admin on remote PC)
Enable-PSRemoting -Force

# Add host to trusted list (if not in domain)
Set-Item WSMan:\localhost\Client\TrustedHosts -Value "remote-pc" -Force

# Retry flaky connections while establishing the session
$session = New-PSSession -ComputerName "remote-pc" -MaxConnectionRetryCount 5

# Increase timeouts for long-running commands
$option = New-PSSessionOption -OperationTimeout 600000  # 10 minutes, in milliseconds
$session = New-PSSession -ComputerName "remote-pc" -SessionOption $option

Pro Tips

  • Use session splatting: Create a hashtable for session parameters you use often (example after this list)
  • Clean up sessions: Always remove sessions when done to free up resources
  • Mind the scope: Variables in remote sessions are isolated by default
  • Consider security: HTTPS (5986) is more secure than HTTP (5985)
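
Here's what the splatting tip looks like in practice (the parameter values are illustrative):

# Define the parameters you reuse once...
$sessionParams = @{
    ComputerName = "remote-pc"
    Credential   = $cred
    ErrorAction  = "Stop"
}

# ...then splat them wherever needed
$session = New-PSSession @sessionParams
Invoke-Command @sessionParams -ScriptBlock { Get-Process }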

Security Considerations

# Check current WinRM security configuration
winrm get winrm/config/client
winrm get winrm/config/service

# Configure HTTPS listener (more secure)
New-SelfSignedCertificate -DnsName "domain.name" `
    -CertStoreLocation "Cert:\LocalMachine\My"
    
# Configure HTTPS WinRM listener (run as admin)
winrm create winrm/config/Listener?Address=*+Transport=HTTPS...

Wrapping Up

Remote management is powerful but requires careful attention to security. Always use the principle of least privilege and clean up your sessions!

Stay safe, and happy hunting! 🕵️‍♂️

P.S. Check out the official documentation for more details:

  • PowerShell Remoting Documentation
  • Remoting Troubleshooting Guide
  • WS-Management Documentation

PowerShell Quick Guide: Managing Event Log Sizes and Retention

Hey there, fellow threat hunters! 👋 Today we're talking about something that can bite you when you least expect it - Event Log sizes and retention policies. Because nobody wants to investigate an incident only to find out the logs are gone!

Check Current Log Settings

First, let's see what we're working with:

# Get settings for all logs
Get-WinEvent -ListLog * | 
Where-Object {$_.RecordCount -gt 0} | 
Select-Object LogName, FileSize, MaximumSizeInBytes, RecordCount, LogMode |
Sort-Object FileSize -Descending |
Format-Table -AutoSize

# Or focus on security logs
Get-WinEvent -ListLog Security | 
Select-Object LogName, FileSize, MaximumSizeInBytes, RecordCount, LogMode

Understanding LogMode

The LogMode property tells you what happens when the log is full:

  • Circular: Overwrites the oldest events when full (default)
  • AutoBackup: Automatically archives the full log and starts a new one
  • Retain: Keeps existing events when full - logging stops until you clear the log manually

Modifying Log Settings

Let's adjust these settings to meet our needs:

# Increase Security log size to 4GB
$maximumSize = 4GB
wevtutil set-log Security /ms:$maximumSize

# Or using PowerShell's Limit-EventLog (works for classic Windows logs)
Limit-EventLog -LogName Security -MaximumSize 4GB

# Switch LogMode to AutoBackup (retention on, auto-backup on)
wevtutil set-log Security /rt:true /ab:true

Checking Available Space

Before setting huge log sizes, check your disk space:

# Get disk space where Windows is installed
Get-WmiObject -Class Win32_LogicalDisk |
Where-Object {$_.DeviceID -eq "C:"} |
Select-Object DeviceID, 
    @{N='FreeSpace(GB)';E={[math]::Round($_.FreeSpace/1GB, 2)}},
    @{N='TotalSpace(GB)';E={[math]::Round($_.Size/1GB, 2)}}

Backup Before Changes

Always backup important logs before making changes:

# Backup Security log
wevtutil export-log Security "C:\Backup\Security_$(Get-Date -Format 'yyyyMMdd').evtx"

You could, of course, also use the Event Viewer GUI for one-off exports.

Pro Tips

  • Monitor log sizes: Set up alerts for when logs are near capacity
  • Regular backups: Automate log exports for critical events
  • Right-size your logs: Balance between retention needs and disk space
  • Check compliance: Some regulations require specific retention periods

Monitoring Script

Here's a simple script to monitor log sizes:

# Monitor logs over 75% full
Get-WinEvent -ListLog * | 
Where-Object {$_.RecordCount -gt 0} |
ForEach-Object {
    $percentFull = ($_.FileSize / $_.MaximumSizeInBytes) * 100
    if ($percentFull -gt 75) {
        [PSCustomObject]@{
            LogName = $_.LogName
            PercentFull = [math]::Round($percentFull, 2)
            MaxSize = [math]::Round($_.MaximumSizeInBytes/1MB, 2)
            CurrentSize = [math]::Round($_.FileSize/1MB, 2)
        }
    }
} |
Format-Table -AutoSize

Wrapping Up

Properly configured log sizes and retention policies are crucial for security monitoring and incident response. Don't wait until it's too late to find out your logs are being overwritten!

Stay safe, and happy hunting! 🕵️‍♂️

P.S. Want to learn more? Check out the official Microsoft documentation:

  • PowerShell Limit-EventLog Documentation
  • Wevtutil Command Documentation
  • Windows Event Log Architecture

PowerShell Quick Guide: Working with Event Logs Like a Pro


Hey there, fellow threat hunters! 👋 Today we're diving into the fascinating world of Windows Event Logs with PowerShell. Sure, the Event Viewer GUI is nice, but real pros use PowerShell to get exactly what they need. Let's cut through the noise and get to the good stuff!

The Basics

Screenshot: Event Viewer overview

First things first - let's see what we're working with. Here's how to get a list of available logs:

Get-WinEvent -ListLog * | Where-Object {$_.RecordCount -gt 0} | Select-Object LogName, RecordCount

Finding the Important Stuff

Nobody wants to scroll through thousands of events. Here's how to find what matters:

# Get last 50 Error events from System log
Get-WinEvent -FilterHashtable @{
    LogName = 'System'
    Level = 2  # Error level
} -MaxEvents 50

# Look for recent failed logons across your domain
$start = (Get-Date).AddHours(-1)
Get-WinEvent -FilterHashtable @{
    LogName = 'Security'
    ID = 4625  # Failed logon attempts
    StartTime = $start
} -ErrorAction SilentlyContinue  # Handles case when no events are found

Domain Controller Logs

For Active Directory environments, the most valuable logs are often on your Domain Controllers. Here's how to access them:

# Access DC logs remotely
$dc = "DC01.domain.name"
Get-WinEvent -ComputerName $dc -FilterHashtable @{
    LogName = 'Security'
    ID = 4624  # Successful logon
    StartTime = (Get-Date).AddHours(-1)
} -ErrorAction SilentlyContinue

# Or connect directly to your DC and run:
Get-WinEvent -FilterHashtable @{
    LogName = 'Directory Service'  # AD-specific events
    Level = 2  # Error level
} -MaxEvents 50

Pro tip: Always check these logs on your Domain Controllers:

  • Security: For authentication and security-related events
  • Directory Service: For AD replication and changes
  • DNS Server: For DNS-related issues
  • DFS Replication: If using DFS

Handling No Results

Sometimes you won't find any events matching your criteria. Let's handle that gracefully:

try {
    Get-WinEvent -FilterHashtable @{
        LogName = 'Security'
        ID = 4625
        StartTime = (Get-Date).AddHours(-1)
    } -ErrorAction Stop
} catch {
    if ($_.Exception.Message -like '*No events were found*') {
        Write-Host "No matching events in the last hour"
    } else {
        Write-Host "Error: $($_.Exception.Message)"
    }
}

Common Security Event IDs

Here are some event IDs you'll want to know:

  • 4624: Successful logon
  • 4625: Failed logon
  • 4688: New process created
  • 4720: User account created
  • 1102: Audit log cleared (Someone's hiding something?)
  • 4647: User initiated logoff
  • 4723: Password change attempt

Making It Useful

Let's create something actually useful - checking for potential brute force attempts:

# Find repeated failed logons
Get-WinEvent -FilterHashtable @{
    LogName = 'Security'
    ID = 4625
} -MaxEvents 1000 -ErrorAction SilentlyContinue | 
Select-Object TimeCreated,
    @{N='Username';E={$_.Properties[5].Value}},
    @{N='Source';E={$_.Properties[2].Value}} |
Group-Object Username |
Where-Object {$_.Count -gt 10}

Exporting for Analysis

Found something interesting? Export it for further analysis:

# Export to CSV with custom properties
Get-WinEvent -FilterHashtable @{
    LogName = 'Security'
    ID = 4688  # New process created
} -MaxEvents 100 -ErrorAction SilentlyContinue | 
Select-Object TimeCreated,
    @{N='Process';E={$_.Properties[5].Value}},
    @{N='Creator';E={$_.Properties[13].Value}} |
Export-Csv -Path ".\new_processes.csv" -NoTypeInformation

Pro Tips

  • Use FilterHashtable: It's WAY faster than Where-Object for event logs (see the comparison after this list)
  • Always handle errors: Use -ErrorAction SilentlyContinue or try/catch blocks
  • Test your filters: Start with a small MaxEvents value to verify your properties
  • Check your permissions: Some logs need admin rights to access
  • Remote collection: Use -ComputerName parameter for remote systems (requires appropriate permissions)
  • DC logs are gold: Most security events in an AD environment are best found on Domain Controllers
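
To see why the FilterHashtable tip matters, compare these two approaches - both return the same events, but the first drags every record across before filtering:

# Slow: retrieves every Security event, then filters client-side
Get-WinEvent -LogName Security | Where-Object Id -eq 4625

# Fast: the filter is applied by the event log service itself
Get-WinEvent -FilterHashtable @{ LogName = 'Security'; ID = 4625 }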

Wrapping Up

Event logs are a goldmine of information if you know how to dig. These PowerShell commands will help you find what you need without drowning in the noise.

Stay safe, and happy hunting! 🕵️‍♂️

P.S. Remember to check your event log sizes and retention policies - nothing worse than missing logs when you need them!

PowerShell Quick Guide: Exporting Data to CSV Files


Hey there, fellow threat hunters! 👋 Today we're diving into something straightforward but super useful - how to export PowerShell data to CSV files. Whether you're collecting system information, analyzing logs, or just need to get data into Excel, this one's for you.

The Basics

PowerShell's Export-Csv cmdlet is your best friend when it comes to creating CSV files. Here's a simple example using scheduled tasks (because why not?):

Get-Process | Export-Csv -Path ".\processes.csv" -NoTypeInformation

Making It Better

But wait - do we really need ALL that data? Probably not. Let's be more specific using Select-Object:

# Select specific properties
Get-Process | Select-Object Name, Id, CPU | Export-Csv -Path ".\processes.csv" -NoTypeInformation

Pro Tips

  • Always use -NoTypeInformation: Keeps your CSV clean without the type information header (PowerShell 7 omits it by default)
  • Filter first, export later: Use Where-Object to reduce data before exporting
  • Check your paths: Make sure you have write permissions where you're trying to save.

Some Useful Examples

Here are some practical examples you might want to use:

# Export running services
Get-Service | Where-Object {$_.Status -eq 'Running'} | Export-Csv -Path ".\running_services.csv" -NoTypeInformation

# Export user information
Get-LocalUser | Select-Object Name, Enabled, LastLogon | Export-Csv -Path ".\users.csv" -NoTypeInformation

# Export installed software (might need admin rights)
Get-WmiObject -Class Win32_Product | Select-Object Name, Version, Vendor | Export-Csv -Path ".\installed_software.csv" -NoTypeInformation

Quick Troubleshooting

If you're getting weird characters in Excel:

# Use UTF8 encoding (Windows PowerShell writes a BOM; on PowerShell 7+ use -Encoding utf8BOM)
Get-Process | Export-Csv -Path ".\processes.csv" -NoTypeInformation -Encoding UTF8

Wrapping Up

There you have it - a quick guide to exporting data from PowerShell. Simple, effective, and incredibly useful for both analysis and documentation.

Stay safe, and happy hunting! 🕵️‍♂️

Visualizing with MITRE ATT&CK Navigator: How to Visualize Mapped Data in MITRE ATT&CK Navigator

Hey there, fellow threat hunters! 👋 Welcome to part 3 of our MITRE ATT&CK journey! In our previous posts, we covered data retrieval and relationship mapping. Today, we're diving into something visually exciting - analyzing and visualizing our MITRE ATT&CK data using the MITRE ATT&CK Navigator.

The Story Our Data Tells

Let's start by looking at some interesting statistics our analyzer uncovered:

2024-12-22 14:09:46,402 - __main__ - INFO - Statistics for techniques:
2024-12-22 14:09:46,402 - __main__ - INFO - all_techniques: 799
2024-12-22 14:09:46,402 - __main__ - INFO - total_used_techniques: 799
2024-12-22 14:09:46,402 - __main__ - INFO - total_groups: 3969
2024-12-22 14:09:46,402 - __main__ - INFO - total_mitigations: 1372
2024-12-22 14:09:46,402 - __main__ - INFO - total_references: 3961
2024-12-22 14:09:46,403 - __main__ - INFO - avg_groups_per_technique: 4.97
2024-12-22 14:09:46,403 - __main__ - INFO - avg_mitigations_per_technique: 1.72
2024-12-22 14:09:46,403 - __main__ - INFO - avg_references_per_technique: 4.96

Those are some impressive numbers! But raw numbers don't tell the whole story. Let's see how our analyzer helps us make sense of it all.

The Analysis Engine

Our analyzer.py is the brain behind our visualization operation. Here's how it works:

1. Statistical Analysis

First, we calculate comprehensive statistics for our techniques:

def analyze_and_update_techniques(
        techniques: List[Dict[str, Any]],
        all_techniques_length: int) -> tuple[Dict[str, Any], List[Dict[str, Any]]]:
    for technique in techniques:
        # Key names deliberately match the Navigator layer count_types
        # ('groups', 'mitigations', 'references') used later on
        technique['stats'] = {
            'groups_count': len(technique.get('groups', [])),
            'mitigations_count': len(technique.get('mitigations', [])),
            'references_count': len(technique.get('external_references', []))
        }

2. Dynamic Color Generation

One cool feature is our dynamic color gradient generation for heatmaps:

def generate_color_gradient(start_hex: str, end_hex: str, steps: int) -> List[str]:
    def hex_to_hsv(hex_color: str) -> Tuple[float, float, float]:
        hex_color = hex_color.lstrip('#')
        rgb = tuple(int(hex_color[i:i+2], 16) for i in (0, 2, 4))
        return colorsys.rgb_to_hsv(rgb[0]/255, rgb[1]/255, rgb[2]/255)

    start_hsv = hex_to_hsv(start_hex)
    end_hsv = hex_to_hsv(end_hex)

    gradient = []
    for i in range(steps):
        ratio = i / (steps - 1)
        hsv = tuple(
            start + (end - start) * ratio
            for start, end in zip(start_hsv, end_hsv)
        )
        gradient.append(hsv_to_hex(hsv))

    return gradient
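
One helper the snippet relies on but doesn't show is hsv_to_hex, the inverse conversion. A matching sketch, consistent with hex_to_hsv above:

def hsv_to_hex(hsv: Tuple[float, float, float]) -> str:
    """Convert an HSV triple (all components in 0-1) to a #rrggbb string."""
    r, g, b = colorsys.hsv_to_rgb(*hsv)
    return '#{:02x}{:02x}{:02x}'.format(round(r * 255), round(g * 255), round(b * 255))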

Creating Navigator Layers

The real magic happens when we create our Navigator layers. These visualizations help us understand our data at a glance. The MITRE ATT&CK Navigator is a powerful web-based tool that helps us visualize and annotate ATT&CK matrices.

Screenshot: the MITRE ATT&CK Navigator Enterprise matrix

Our code creates three different types of layer files:

  • Groups Layer: Shows how many groups use each technique
  • Mitigations Layer: Displays mitigation coverage
  • References Layer: Indicates how well-documented each technique is

Here's how we create these layers:

def create_navigator_layer(techniques: List[Dict[str, Any]],
                           layer_name: str,
                           count_type: str,
                           hide_uncovered=True) -> Dict[str, Any]:
    # Calculate color thresholds based on data distribution
    colors = calculate_color_thresholds(techniques, count_type)
    
    # Basic layer structure required by Navigator
    result_data = {
        "description": f"Enterprise techniques heat map showing {count_type} count",
        "name": layer_name,
        "domain": "enterprise-attack",
        "versions": {
            "attack": "16",
            "navigator": "5.0.0",
            "layer": "4.5"
        },
        "gradient": {
            "colors": [],
            "minValue": 0,
            "maxValue": 1
        },
        "legendItems": [],
        "techniques": [],
        "showTacticRowBackground": True,
        "tacticRowBackground": "#dddddd",
        "selectTechniquesAcrossTactics": True,
        "selectSubtechniquesWithParent": True,
        "selectVisibleTechniques": False,
        "layout": {
            "layout": "flat",
            "showName": True,
            "showID": False,
            "expandedSubtechniques": True
        },
        "hideDisabled": True # Make sure to hide disabled techniques for better overview.
    }

For each technique, we add detailed information to our layer:

    # Adding technique entries
    for technique in techniques:
        if not technique.get("technique_id"):
            logger.debug(f"Could not find technique_id for {technique.get('name', 'Unknown')}")
            continue
            
        count = technique['stats'].get(f'{count_type}_count', 0)
        comment = f"{count} {count_type}"
        color = find_color_for_count(colors, count)
        
        technique_entry = {
            "techniqueID": technique["technique_id"],
            "color": color,
            "comment": comment,
            "showSubtechniques": True,
            "enabled": not hide_uncovered or count > 0,
        }
        result_data["techniques"].append(technique_entry)

Finally, we add a gradient legend to help interpret the visualization:

    # Add color gradient and legend
    result_data["gradient"]["colors"] = [v['color'] for v in colors.values()]
    result_data["legendItems"] = [
        {
            "label": "More" if k == "more" else str(k),
            "color": v['color']
        } for k, v in colors.items()
    ]

We save these layers as separate JSON files that can be imported directly into the MITRE ATT&CK Navigator:

def save_navigator_layers(analysis_results: Dict[str, Any],
                          output_dir: str,
                          hide_uncovered=True) -> None:
    os.makedirs(output_dir, exist_ok=True)
    
    layer_types = {
        'groups': 'Groups Heat Map',
        'mitigations': 'Mitigations Heat Map',
        'references': 'References Heat Map'
    }
    
    for layer_type, layer_name in layer_types.items():
        layer = create_navigator_layer(
            analysis_results['techniques'],
            layer_name,
            layer_type,
            hide_uncovered=hide_uncovered
        )
        
        output_path = os.path.join(output_dir, f'{layer_type}_layer.json')
        with open(output_path, 'w') as f:
            json.dump(layer, f, indent=2)
        logger.info(f"Saved {layer_type} layer to {output_path}")
        

These layer files can then be loaded into the MITRE ATT&CK Navigator web interface (https://mitre-attack.github.io/attack-navigator/), giving us beautiful visualizations of our data. Each layer provides different insights:

  • Groups Layer: Darker colors indicate techniques used by more groups, helping identify commonly used tactics
  • Mitigations Layer: Shows which techniques have more or fewer documented mitigations, highlighting potential defensive gaps
  • References Layer: Indicates which techniques are well-documented vs. those that might need more research

Understanding the Visualizations

Groups Coverage Heatmap

Our first visualization shows which techniques are most commonly used by threat groups: the darker a cell, the more groups are known to use that technique.

Mitigations Coverage

The mitigations heatmap shows which techniques have many documented mitigations and which have few, highlighting potential defensive gaps.

Deep Dive into the Statistics

Our analyzer calculates some fascinating metrics:

overall_stats = {
    "all_techniques": all_techniques_length,
    'total_used_techniques': total_techniques,
    'total_groups': sum(t['stats']['groups_count'] for t in techniques),
    'total_mitigations': sum(t['stats']['mitigations_count'] for t in techniques),
    'avg_groups_per_technique': safe_average([t['stats']['groups_count'] 
                                            for t in techniques], total_techniques),
    # ... more statistics ...
}

Pro Tips for Analysis

  1. Color Distribution: Use quantiles for better color distribution in heatmaps:
      quarts = quantiles(counts, n=4)
      thresholds = sorted(list(set([
          0,
          round(quarts[0]),
          round(mean(counts)),
          round(quarts[2]),
          round(max_count * 0.9),
          max_count
      ])))
    
  2. Handle Edge Cases: Always provide default colors for edge cases:
    def default_color_scheme() -> Dict[str, Dict[str, str]]:
        return {
            "0": {"color": "#ffffff"},  # White
            "1": {"color": "#ff6666"},  # Light red
            "more": {"color": "#2b0000"}  # Very deep red
        }
  3. Save Your Work: Always save your layers for future reference:
    def save_navigator_layers(analysis_results: Dict[str, Any], 
                               output_dir: str, 
                               hide_uncovered=True) -> None:
        os.makedirs(output_dir, exist_ok=True)
        # ... save layers ...
    

Making the Most of Your Analysis

Here are some practical ways to use this analysis:

1. Defense Planning

  • Identify techniques with high group usage but low mitigation coverage (see the snippet after this list)
  • Focus on implementing mitigations for frequently used techniques
  • Track your defensive coverage over time
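
The first point is easy to script against the stats we computed earlier. A sketch - the thresholds are arbitrary and should be tuned to your data:

# Popular with attackers, thin on documented mitigations
hot_spots = [
    t for t in techniques
    if t['stats']['groups_count'] >= 10 and t['stats']['mitigations_count'] <= 1
]
for t in sorted(hot_spots, key=lambda t: t['stats']['groups_count'], reverse=True):
    print(t['technique_id'], t['name'], t['stats']['groups_count'])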

2. Threat Intelligence

  • Identify trending techniques among threat groups
  • Spot gaps in your defensive strategy
  • Prioritize your security investments

Common Analysis Pitfalls

  • Data Freshness: Always check your MITRE ATT&CK data version
  • Context Matters: Not all techniques are equally relevant to your environment
  • Color Schemes: Choose colors that make sense for your audience
  • Scale Considerations: Be careful with color gradient steps - too many or too few can be misleading

What's Next?

In our next post, we'll make this analysis more actionable by focusing on a specific use case: identifying and analyzing relevant threat groups for a financial services organization. We'll cover:

  1. Filtering Groups by Sector
    • Identifying groups known to target financial institutions
    • Analyzing their common techniques
    • Understanding their typical attack patterns
  2. Creating Custom Layers
    • Building sector-specific visualizations
    • Highlighting relevant techniques
    • Mapping existing security controls
  3. Practical Application
    • Developing focused detection strategies
    • Prioritizing security investments

Getting Started

Want to try this yourself? Here's how:

# Clone the repository
git clone https://github.com/SecurityScriptographer/mitre.git
cd mitre

# Install requirements
pip install -r requirements.txt

# Run the analysis
python main.py

Check out Parts 1 and 2 if you haven't already - they'll help you understand how we got here!

Final Thoughts

Remember, visualization isn't just about making pretty pictures - it's about making data actionable. Use these tools to understand your threat landscape better and make informed security decisions.

Stay tuned for more security scripting adventures! And remember - sometimes the best insights come from just visualizing your data differently! 🕵️‍♂️

Until next time, happy hunting!

References

  • MITRE ATT&CK Framework
  • MITRE ATT&CK Navigator 
  • Cyberchef
  • NIST CSF

 

Mapping with MITRE ATT&CK: Mapping MITRE ATT&CK for Full Potential

Hey there, fellow threat hunters! 👋 Welcome back to part 2 of our MITRE ATT&CK journey! Last time, we built a solid foundation by setting up our data fetching infrastructure. If you haven't read part 1 yet, I highly recommend checking it out first.

Today, we're going to dive into something really exciting - mapping relationships between different MITRE ATT&CK components. We've got techniques, groups, and mitigations all waiting to be connected!

The Power of Relationships

Why are relationships so important? Well, imagine trying to defend against threats without knowing which groups use which techniques, or which mitigations counter which attacks. That's like playing chess without knowing how the pieces move!

Enter the TechniqueMapper

The star of today's show is our TechniqueMapper class. This beautiful piece of code handles all the complex relationships between techniques, groups, and mitigations. Let's break it down:

class TechniqueMapper:
    def __init__(self, attack):
        self.attack = attack

    def map_groups_to_technique(self, technique_id: str) -> List[Dict[str, Any]]:
        """Maps groups to a specific technique"""
        try:
            if not technique_id:
                return []

            tech_obj = self.attack.get_object_by_attack_id(technique_id, "attack-pattern")
            if not tech_obj:
                logger.warning(f"Could not find technique object for {technique_id}")
                return []

            logger.debug(f"Getting groups for technique {technique_id}")

            groups = self.attack.get_groups_using_technique(tech_obj.id)
            return [make_json_serializable(group) for group in groups] if groups else []

        except Exception as e:
            logger.error(f"Error mapping groups for technique {technique_id}: {str(e)}")
            return []

Understanding the Mapping Process

Our mapping process involves three main components:

1. Group Mapping

First, we map threat groups to techniques. This tells us who's using what:

def map_groups_to_technique(self, technique_id: str) -> List[Dict[str, Any]]:
    tech_obj = self.attack.get_object_by_attack_id(technique_id, "attack-pattern")
    groups = self.attack.get_groups_using_technique(tech_obj.id)
    return [make_json_serializable(group) for group in groups]

2. Mitigation Mapping

Next, we map mitigations to techniques. This helps us understand our defense options:

def map_mitigations_to_technique(self, technique_id: str) -> List[Dict[str, Any]]:
    tech_obj = self.attack.get_object_by_attack_id(technique_id, "attack-pattern")
    mitigations = self.attack.get_mitigations_mitigating_technique(tech_obj.id)
    return [make_json_serializable(mitigation) for mitigation in mitigations]

3. Making Everything JSON-Serializable

One of the trickiest parts was dealing with STIX objects. They're great for storing data, but not so great for JSON serialization. Here's our solution:

def make_json_serializable(obj):
    """Convert STIX objects to dictionaries"""
    if hasattr(obj, 'serialize'):
        return json.loads(obj.serialize())
    elif isinstance(obj, dict):
        return {k: make_json_serializable(v) for k, v in obj.items()}
    elif isinstance(obj, list):
        return [make_json_serializable(i) for i in obj]
    return obj

Putting It All Together

The magic happens in our map_all_data function:

def map_all_data(data: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Main function to map all relationships"""
    attack = data['attack']

    mapper = TechniqueMapper(attack)
    mapped_data = mapper.map_all_techniques(data['techniques'])

    # Ensure everything is JSON serializable
    return make_json_serializable(mapped_data)

The Results

When you run this code, you'll get something like this:

2024-12-22 14:07:35,539 - __main__ - INFO - Loaded 799 techniques
2024-12-22 14:07:35,539 - __main__ - INFO - Loaded 174 groups
2024-12-22 14:07:35,539 - __main__ - INFO - Loaded 268 mitigations
2024-12-22 14:07:35,539 - __main__ - INFO - Mapping relationships
2024-12-22 14:07:35,539 - mapper - INFO - Mapping techniques to groups and mitigations...
2024-12-22 14:07:43,882 - mapper - INFO - Processed 50/799 techniques...

Pro Tips for Working with the Mapper

  1. Cache Your Results: The mapping process can be time-consuming. Save the results (snippet after this list)!
  2. Handle Edge Cases: Not all techniques have groups or mitigations. Always check for empty results.
  3. Log Everything: When dealing with complex relationships, logging is your best friend.
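
For the first tip, persisting the mapped output is a one-liner once everything is JSON-serializable (the cache path is arbitrary):

import json

# Save once, reload on later runs instead of re-mapping all 799 techniques
with open('cache/mapped_techniques.json', 'w', encoding='utf-8') as f:
    json.dump(mapped_data, f)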

Common Pitfalls to Avoid

Here are some things we learned the hard way:

  • STIX Object Serialization: Always use make_json_serializable before trying to save or transmit data
  • Circular References: STIX objects can have circular references - our serialization handles this
  • Error Handling: Network issues, missing data, invalid references - there's a lot that can go wrong

Debugging Tips

When things go wrong (and they will), here's how to debug:

def debug_technique_mapping(mapper: TechniqueMapper, technique_id: str):
    """Helper function for debugging mapping issues"""
    logger.setLevel(logging.DEBUG)
    tech_obj = mapper.attack.get_object_by_attack_id(technique_id, "attack-pattern")
    logger.debug(f"Technique object: {tech_obj}")
    groups = mapper.attack.get_groups_using_technique(tech_obj.id)
    logger.debug(f"Found {len(groups)} groups for technique {technique_id}")
    return groups

Real-World Example

Let's look at what our mapped data actually looks like. Here's a snippet for the technique T1059.001 (PowerShell):

Screenshot: the mapped data structure for the PowerShell technique (T1059.001)

What's Next?

Now that we have our data all mapped and connected, we can do some really interesting analysis. In our next post, we'll:

  • Create visualizations of technique usage patterns
  • Analyze which groups use which combinations of techniques
  • Build heat maps of technique popularity

Stay tuned for Part 3 where we'll turn this mapped data into actionable intelligence! 🕵️‍♂️

Getting the Code

Want to try it yourself? The complete code is available in our project.
https://github.com/SecurityScriptographer/mitre

Remember to check out Part 1 first to set up your environment properly.

Until next time, happy hunting! And remember - in threat intelligence, connections matter! 🔍

References

  • Cyberchef
  • NIST CSF 

Getting Started with MITRE ATT&CK: Fetching and Processing Data Like a Pro

Hey there, fellow threat hunters! 👋 Today, we're diving into something that every security professional should have in their toolkit - working with MITRE ATT&CK data programmatically. If you've been manually browsing the MITRE website to look up techniques, it's time to level up your game!

What's MITRE ATT&CK Anyway?

Before we get our hands dirty with code, let's quickly understand what we're dealing with. MITRE ATT&CK is basically the encyclopedia of adversary tactics and techniques - think of it as the "bad guys' playbook" that we use to improve our defenses. It's a globally-accessible knowledge base of adversary tactics and techniques based on real-world observations.

For more information you can look at our MITRE ATT&CK Fundamentals.

Project Setup

First things first, we'll need a few ingredients for our cyber-soup:

  • Python (because we're not savages 😉)
  • The mitreattack-python library
  • Basic understanding of JSON (or at least the ability to pretend you do)
  • Coffee ☕ (optional but highly recommended)

Let's set up our project structure:

mitre_project/
├── main.py
├── config.py
├── loader.py
└── cache/
    └── enterprise-attack.json

The Code Breakdown

Our project keeps things neat and organized. Let's break down each component and see what makes it tick.

config.py - The Settings Master

This is where we keep all our configuration settings neat and tidy:

import os
import logging

# Paths
BASE_DIR = os.path.dirname(os.path.abspath(__file__))
CACHE_DIR = os.path.join(BASE_DIR, 'cache')

# Cache File paths
TECH_PATH = os.path.join(CACHE_DIR, 'all_attack_techniques.json')

def setup_logging():
    logging.basicConfig(
        level=logging.INFO,
        format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
        handlers=[logging.StreamHandler()]
    )

The config file keeps everything organized and makes sure we're consistent with where we store our data. It also sets up our logging configuration because, let's face it, print statements are so 2010.

loader.py - The Data Fetcher

Here's where the real magic starts. Our loader uses the official MITRE ATT&CK Python library to fetch and process the data:

def load_attack_data(use_cache: bool = True) -> MitreAttackData:
    """Initialize MitreAttackData with STIX data"""
    stix_path = os.path.join(os.path.dirname(__file__), 'cache', 'enterprise-attack.json')
    
    if not os.path.exists(os.path.dirname(stix_path)):
        os.makedirs(os.path.dirname(stix_path))
    
    if os.path.exists(stix_path) and use_cache:
        logger.info("Loading STIX data from cache")
    else:
        logger.info("Downloading latest STIX data")
        import requests
        
        url = "https://raw.githubusercontent.com/mitre/cti/master/enterprise-attack/enterprise-attack.json"
        response = requests.get(url)
        response.raise_for_status()
        
        with open(stix_path, 'w', encoding='utf-8') as f:
            json.dump(response.json(), f, ensure_ascii=False, indent=2)

    return MitreAttackData(stix_path)

The loader handles several crucial tasks:

  • Fetching the latest STIX data from MITRE's GitHub repository
  • Managing local caching (because waiting for downloads is like watching paint dry)
  • Converting STIX objects into Python-friendly dictionaries
  • Extracting technique information with proper references

Here's how we handle technique extraction:

def get_techniques(attack: MitreAttackData) -> list:
    """Get all techniques from MITRE ATT&CK"""
    techniques = []
    for technique in attack.get_techniques():
        technique_id = None
        external_references = []
        
        if hasattr(technique, 'external_references'):
            external_references = make_json_serializable(technique.external_references)
            for ref in external_references:
                if ref.get('source_name') == 'mitre-attack':
                    technique_id = ref.get('external_id')
                    break
                
        if technique_id:
            techniques.append({
                "technique_id": technique_id,
                "id": technique.id,
                "name": technique.name,
                "description": technique.description,
                "external_references": external_references,
                "groups": [],
                "mitigations": []
            })
    return techniques

Making It All Work Together

In main.py, we bring everything together:

if __name__ == "__main__":
    # Set up logging
    setup_logging()
    logging.getLogger().setLevel(logging.INFO)
    
    # Load all data including the attack object
    data = load_all_data()

When you run this, you'll see something like:

2024-12-22 14:07:29,945 - loader - INFO - Loading STIX data from cache
2024-12-22 14:07:35,539 - __main__ - INFO - Loaded 799 techniques
2024-12-22 14:07:35,539 - __main__ - INFO - Loaded 174 groups
2024-12-22 14:07:35,539 - __main__ - INFO - Loaded 268 mitigations

Error Handling

One thing we've learned the hard way: MITRE data can sometimes be unpredictable. Here's how we handle that:

def make_json_serializable(obj):
    """Convert STIX objects to dictionaries"""
    if hasattr(obj, 'serialize'):
        return json.loads(obj.serialize())
    elif isinstance(obj, dict):
        return {k: make_json_serializable(v) for k, v in obj.items()}
    elif isinstance(obj, list):
        return [make_json_serializable(i) for i in obj]
    return obj

This function makes sure everything can be properly serialized to JSON, no matter what weird formats MITRE throws at us.

Get The Code

Want to dive right in? The complete code is right here in this project. Just:

# Clone the repository
git clone [your-repo-url]
cd mitre

# Create a virtual environment (optional but recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install requirements
pip install -r requirements.txt

Troubleshooting Common Issues

Here are some common gotchas you might run into:

  1. "Module not found" errors: Make sure you've installed all requirements and activated your virtual environment
  2. JSON serialization errors: Check that you're using our make_json_serializable function
  3. Cache issues: Try deleting the cache directory and letting the script download fresh data

What's Next?

In our next post, we'll dive into how to map relationships between different MITRE ATT&CK components. We'll explore how to connect techniques with the groups that use them and the mitigations that defend against them.

Until then, happy hunting!

References

  • MITRE ATT&CK
  • mitreattack-python
  • Cyberchef
  • NIST CSF