Sigma Rules for SIEM Detection: A Beginner's Guide

Sigma is to SIEM logs what YARA is to files: a structured, vendor-neutral way to describe what a bad log event looks like, written once and converted to whatever query language your SIEM actually speaks. I kept rewriting the same detection three times (once in KQL for Defender, once in SPL for a client's Splunk, once by hand in a spreadsheet for review) until I committed to authoring in Sigma and converting from there. This post is the starting point I wish I had: what a Sigma rule is made of, the parts of the condition grammar that matter, and two working rules you can convert and run today.

Key Takeaways

  • A Sigma rule is a small YAML document with three parts that matter: logsource (where to look), detection (what to match), and condition (when to fire).
  • Sigma rules are written once and converted to KQL, SPL, or Elasticsearch queries with the sigma CLI, so the same logic survives a platform migration.
  • The condition grammar — named selections combined with and, or, not, 1 of, and all of — is where false-positive rate is won or lost.
  • Field modifiers like |contains, |endswith, |re, and |base64offset let one rule catch obfuscated variants without a second rule.
  • Sigma pairs naturally with Sysmon, Windows Security auditing, and PowerShell Script Block Logging — the log sources I already had to configure for everything else.

Environment

  • sigma-cli 1.0+ (the modern Python tool, installed with pip install sigma-cli) — this replaces the older sigmac script.
  • A pySigma backend for your target. I used sigma plugin install kusto for Microsoft Sentinel and Defender XDR; Splunk and Elasticsearch backends exist too.
  • The SigmaHQ rule repository cloned locally as a reference corpus to read and adapt from.
  • Sysmon and Windows Security auditing on the endpoints generating the events, so the field names in my rules actually exist in the logs.
  • A test SIEM workspace (a Sentinel trial tenant) to validate converted queries before they reach production analytics rules.

The Problem

Detection logic does not travel. A query that catches encoded PowerShell in Microsoft Sentinel is written in KQL against the DeviceProcessEvents table. The same idea in Splunk is SPL against a sourcetype, and in Elastic it is a different field schema again. Every time I changed employer, tool, or client, the detections I had carefully tuned stayed behind because they were welded to one product's query language. Re-deriving them from memory is slow and lossy.

Sigma fixes the authoring half of that problem. You describe the event in an abstract schema — "a process creation where the image ends with powershell.exe and the command line contains an encoded-command flag" — and a converter emits the platform-specific query. It is not magic. The mapping between Sigma's abstract fields and your real log schema lives in a pipeline, and getting that pipeline right is the actual work. But once it is right, the detection itself is portable, reviewable, and diffable in Git like any other code.

The Solution — Writing Sigma Rules for SIEM Detection

Step 1 — Read the anatomy of a Sigma rule

Every Sigma rule is YAML with the same recognisable shape. The metadata at the top is documentation; the logsource and detection blocks are the logic:

title: Suspicious Example Detection
id: 6f3e1d20-3b7a-4f6e-9c0a-1d2e3f4a5b6c   # stable UUID, generate once
status: experimental
description: One sentence on what this catches and why it matters
references:
  - https://attack.mitre.org/techniques/T1059/001/
author: SecurityScriptographer
date: 2026-05-30
tags:
  - attack.execution
  - attack.t1059.001
logsource:
  category: process_creation
  product: windows
detection:
  selection:
    Image|endswith: '\powershell.exe'
    CommandLine|contains: '-enc'
  condition: selection
falsepositives:
  - Administrative scripts that legitimately use encoded commands
level: high

The id is a UUID you generate once and never change — it is how a SIEM tracks the rule across edits. level (informational through critical) maps to alert severity after conversion. Everything under detection is the part that becomes a query.

Step 2 — Point logsource at the right events

logsource is how Sigma knows which log to target, and it is the field most beginners get wrong. It has three optional keys — category, product, and service — and the converter's pipeline turns them into a concrete table, index, or EventID filter:

# Sysmon process creation (Event ID 1) or Security 4688
logsource:
  category: process_creation
  product: windows

# PowerShell Operational log, which carries Script Block Logging (Event ID 4104);
# SigmaHQ's own 4104 rules use category: ps_script instead
logsource:
  product: windows
  service: powershell

# Windows Security log directly
logsource:
  product: windows
  service: security

The category names are not arbitrary — they are a fixed taxonomy defined by SigmaHQ. If your endpoints are not actually generating the events a category implies, the converted query runs against an empty table and quietly finds nothing. This is why I treat Sysmon configuration and Script Block Logging via Event ID 4104 as prerequisites, not optional extras.
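
To make the lookup concrete, here is a minimal Python sketch of what a pipeline does with logsource. The table LOGSOURCE_TARGETS and the function resolve_logsource are hypothetical names for illustration, not part of pySigma; real pipelines ship with each backend and are far more detailed.

```python
from typing import Optional

# Hypothetical mapping for illustration only, built from the three
# logsource examples above. Keys are (category, product, service).
LOGSOURCE_TARGETS = {
    ("process_creation", "windows", None): "Sysmon Event ID 1 or Security Event ID 4688",
    (None, "windows", "powershell"): "PowerShell Operational log",
    (None, "windows", "security"): "Windows Security log",
}


def resolve_logsource(logsource: dict) -> Optional[str]:
    """Return the concrete log a logsource block points at, or None if unmapped."""
    key = (
        logsource.get("category"),
        logsource.get("product"),
        logsource.get("service"),
    )
    return LOGSOURCE_TARGETS.get(key)


print(resolve_logsource({"category": "process_creation", "product": "windows"}))
print(resolve_logsource({"product": "linux", "service": "auditd"}))
```

The second call returns None because nothing in the toy table covers it, which makes the gap visible instead of silent.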

Step 3 — Compose the detection and condition

The detection block is a set of named maps (conventionally selection and filter), and condition is a small boolean grammar that combines them. Inside a single map, keys are AND-ed; a list of values for one key is OR-ed:

detection:
  selection:
    Image|endswith:
      - '\powershell.exe'
      - '\pwsh.exe'          # either image (OR)
    CommandLine|contains:    # AND this is also present
      - '-enc'
      - '-encodedcommand'
      - '-ep bypass'         # any one of these (OR)
  filter_admin:
    User|contains: 'svc_deploy'
  condition: selection and not filter_admin

The condition line is where you control noise. The patterns that get the most mileage are 1 of selection* (any selection block matches), all of selection* (every one matches), and the and not filter idiom for carving out known-good activity. This is the same N-of-M thinking that keeps YARA rules tight, applied to log fields instead of byte strings.
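
The AND/OR semantics above can be sketched in a few lines of Python. This is a toy evaluator for this post's example, not how pySigma works internally: it handles only the contains and endswith modifiers, compares case-insensitively (Sigma's default for strings), and hard-codes the condition selection and not filter_admin. The function names and the sample event are made up.

```python
def value_matches(actual: str, modifier: str, expected: str) -> bool:
    """Compare one event value against one rule value, ignoring case."""
    actual, expected = actual.lower(), expected.lower()
    if modifier == "contains":
        return expected in actual
    if modifier == "endswith":
        return actual.endswith(expected)
    return actual == expected  # no modifier: exact match


def block_matches(event: dict, block: dict) -> bool:
    """Every key must match (AND); any value in a list may match (OR)."""
    for key, expected in block.items():
        field, _, modifier = key.partition("|")
        if field not in event:
            return False
        candidates = expected if isinstance(expected, list) else [expected]
        actual = str(event[field])
        if not any(value_matches(actual, modifier, c) for c in candidates):
            return False
    return True


selection = {
    "Image|endswith": ["\\powershell.exe", "\\pwsh.exe"],
    "CommandLine|contains": ["-enc", "-encodedcommand", "-ep bypass"],
}
filter_admin = {"User|contains": "svc_deploy"}


def rule_fires(event: dict) -> bool:
    # condition: selection and not filter_admin
    return block_matches(event, selection) and not block_matches(event, filter_admin)


sample = {
    "Image": "C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe",
    "CommandLine": "powershell.exe -enc SQBFAFgA",
    "User": "CORP\\alice",
}
print(rule_fires(sample))                                   # True
print(rule_fires(dict(sample, User="CORP\\svc_deploy")))   # False
```

Replaying a handful of known-good and known-bad events through something like this is a cheap way to check that a condition says what you meant before converting it.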

Step 4 — Use field modifiers to catch variants

Modifiers are appended to a field name with a pipe and change how the value matches. They are how one rule covers obfuscation instead of spawning ten near-duplicates:

CommandLine|contains: 'DownloadString'    # substring match
Image|endswith: '\rundll32.exe'           # anchored to the end
CommandLine|re: 'https?://\d{1,3}(\.\d{1,3}){3}/'   # raw regex
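CommandLine|base64offset|contains: 'IEX'  # match base64-encoded payloads
CommandLine|wide|base64offset|contains: 'IEX'   # same, for UTF-16LE text (-EncodedCommand)
DestinationIp|cidr: '10.0.0.0/8'          # network range
EventID|gt: 4624                          # numeric comparison

base64offset|contains is the one that earns its keep: it matches a plaintext string even when it appears inside a base64 blob, regardless of byte alignment. PowerShell's -EncodedCommand encodes the command as UTF-16LE before base64, so that case needs the wide (utf16le) modifier in front of base64offset; without it the rule only matches base64 of single-byte text.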
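
To see why alignment matters, here is a rough Python sketch of the idea behind base64offset: encode the search string at each of the three possible byte offsets and keep only the characters that do not depend on neighbouring bytes. The function name is mine, and this is a sketch of the technique rather than a copy of any library's code.

```python
import base64


def base64_offset_variants(payload: bytes) -> list:
    """Return the three base64 substrings that payload can appear as."""
    variants = []
    for shift in range(3):
        # Pad the front so the payload starts at byte offset 0, 1, or 2.
        encoded = base64.b64encode(b" " * shift + payload).decode("ascii")
        # Leading characters that share bits with the padding bytes.
        start = (0, 2, 3)[shift]
        # Trailing characters that share bits with whatever follows, plus '='.
        end = (None, -3, -2)[(len(payload) + shift) % 3]
        variants.append(encoded[start:end])
    return variants


# -EncodedCommand payloads are UTF-16LE, which is what the wide modifier produces.
print(base64_offset_variants("IEX".encode("utf-16-le")))
# prints ['SQBFAFgA', 'kARQBYA', 'JAEUAWA']
```

Whatever bytes precede the string in the encoded command, one of those three substrings shows up in the base64 text, which is what lets a single contains match cover every alignment.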

Step 5 — A practical rule for Kerberoasting

Here is a rule for a technique I have written about before from the raw-log side. It fires on a Kerberos service-ticket request (Event ID 4769) using weak RC4 encryption for a non-machine account — the fingerprint of a Kerberoasting attempt:

title: Potential Kerberoasting via RC4 Service Ticket Request
id: a1c9e4d2-7f83-4b1e-bf6a-2c5d8e9f0a3b
status: experimental
description: Detects 4769 events requesting RC4-encrypted service tickets, a hallmark of Kerberoasting
references:
  - https://attack.mitre.org/techniques/T1558/003/
author: SecurityScriptographer
date: 2026-05-30
tags:
  - attack.credential_access
  - attack.t1558.003
logsource:
  product: windows
  service: security
detection:
  selection:
    EventID: 4769
    TicketEncryptionType: '0x17'   # RC4-HMAC
    TicketOptions: '0x40810000'
  filter_machine:
    ServiceName|endswith: '$'        # machine accounts
  filter_krbtgt:
    ServiceName: 'krbtgt'
  condition: selection and not 1 of filter_*
level: high

The two filter_* blocks remove the legitimate noise (machine-account tickets and the krbtgt account itself) so the analyst sees the requests that actually warrant a look. The rule fires per event and has no threshold of its own, so if your environment is chatty, add a per-account volume threshold or correlation in the SIEM.
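
One way to add that volume check outside the rule is sketched below in Python. It replays the rule's logic over parsed events and counts distinct service names per requesting account. The field name TargetUserName for the requester, the helper names, and the threshold of 3 are assumptions for illustration; adjust them to your schema and baseline.

```python
from collections import defaultdict


def is_rc4_service_ticket(event: dict) -> bool:
    """Mirror the rule: selection and not 1 of filter_*."""
    if event.get("EventID") != 4769:
        return False
    if event.get("TicketEncryptionType") != "0x17":
        return False
    if event.get("TicketOptions") != "0x40810000":
        return False
    service = event.get("ServiceName", "")
    if not service:
        return False
    # filter_machine and filter_krbtgt
    return not (service.endswith("$") or service.lower() == "krbtgt")


def noisy_requesters(events, min_services: int = 3) -> dict:
    """Map each account to its distinct-service count, if at or above the threshold."""
    services_by_account = defaultdict(set)
    for event in events:
        if is_rc4_service_ticket(event):
            account = event.get("TargetUserName", "unknown")  # assumed field name
            services_by_account[account].add(event["ServiceName"])
    return {
        account: len(services)
        for account, services in services_by_account.items()
        if len(services) >= min_services
    }
```

A single RC4 ticket request can be an old application; one account requesting many different services in a short window is the pattern worth paging someone for.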

Step 6 — Convert the rule to your SIEM's query language

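Authoring is only useful if conversion works. The sigma CLI takes a target backend (-t) and a pipeline (-p) that maps abstract fields to your real schema. The pipeline has to fit both the backend and the rule's logsource: the sysmon pipeline is for rules written against Sysmon events, while a Security-log rule like the Kerberoasting one needs a pipeline that targets the Sentinel tables:

# One-time setup
pip install sigma-cli
sigma plugin install kusto        # Microsoft Sentinel / Defender XDR backend
sigma plugin install splunk       # Splunk backend
sigma plugin install sysmon       # Sysmon field-mapping pipeline

# Convert a single rule to KQL for Microsoft Sentinel
# (azure_monitor is the kusto plugin's pipeline for the SecurityEvent table;
#  confirm the current name with sigma list pipelines)
sigma convert -t kusto -p azure_monitor kerberoasting.yml

# Convert a whole directory of Sysmon-based rules to Splunk SPL
sigma convert -t splunk -p sysmon ./rules/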

# List the backends and pipelines available
sigma list targets
sigma list pipelines

The Kerberoasting rule above converts to a SecurityEvent query that drops straight into a Sentinel analytics rule — close in spirit to the hand-written KQL in my SIEM correlation rules walkthrough, except I did not have to write it twice:

SecurityEvent
| where EventID == 4769
| where TicketEncryptionType == "0x17"
| where TicketOptions == "0x40810000"
| where not(ServiceName endswith "$")
| where ServiceName != "krbtgt"

The exact backend and pipeline names evolve as pySigma develops, so check sigma list rather than trusting a year-old tutorial. The mapping pipeline is the part that breaks first when field names differ from the defaults.
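
To show what the converter and pipeline are doing conceptually, here is a toy Python renderer that turns a detection into KQL-style where clauses. It is not pySigma and not the kusto backend: the functions render_key and to_kql, the field_map argument, and the sample mapping are hypothetical, it supports only exact and endswith matches, and it skips the string escaping and case handling a real backend needs.

```python
def render_key(key, value, field_map):
    """Render one 'Field|modifier: value' pair as a KQL-style expression."""
    field, _, modifier = key.partition("|")
    field = field_map.get(field, field)  # pipeline step: abstract name -> real column
    literal = str(value) if isinstance(value, int) else f'"{value}"'
    operator = "endswith" if modifier == "endswith" else "=="
    return f"{field} {operator} {literal}"


def to_kql(table, selection, filters, field_map=None):
    """Emit one where clause per selection key, then one negated clause per filter block."""
    field_map = field_map or {}
    lines = [table]
    for key, value in selection.items():
        lines.append("| where " + render_key(key, value, field_map))
    for block in filters:
        # A filter block is negated as a whole: not(a and b)
        inner = " and ".join(render_key(k, v, field_map) for k, v in block.items())
        lines.append(f"| where not({inner})")
    return "\n".join(lines)


selection = {
    "EventID": 4769,
    "TicketEncryptionType": "0x17",
    "TicketOptions": "0x40810000",
}
filters = [{"ServiceName|endswith": "$"}, {"ServiceName": "krbtgt"}]
print(to_kql("SecurityEvent", selection, filters))
```

Passing a field_map such as {"ServiceName": "Service"} (a made-up column name) changes every clause that touches that field, which is why a wrong mapping breaks a rule without any change to the rule itself.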

Step 7 — Test, tune, and treat rules as code

A rule that fires on benign activity trains the analyst to ignore it, so every rule gets validated against both a known-bad event and a quiet baseline before it ships. Because Sigma is plain YAML, the whole lifecycle fits in Git:

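# Sanity-check that every rule parses and passes Sigma's validators
sigma check ./rules/

# Convert one rule and diff against what is already deployed
# (use the pipeline that matches the rule's logsource; confirm names with sigma list pipelines)
sigma convert -t kusto -p azure_monitor ./rules/kerberoasting.yml > deployed/kerberoasting.kql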
git diff deployed/

This is the detection-as-code workflow: rules reviewed in pull requests, converted in CI, and deployed to the SIEM as an artefact rather than pasted into a portal by hand. It is the same discipline that makes monitoring the right Windows Event IDs repeatable instead of tribal knowledge.
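
A CI job can also run a few structural checks of its own before conversion. The Python sketch below assumes the rules have already been parsed from YAML into dictionaries (the parsing step is left out), and the function name lint_rules and the exact checks are my own choices, not part of sigma-cli.

```python
import uuid

REQUIRED_KEYS = ("title", "logsource", "detection")


def lint_rules(rules):
    """Return a list of human-readable problems found in parsed rules."""
    problems = []
    seen_ids = {}  # rule id -> title of the first rule that used it
    for rule in rules:
        title = rule.get("title", "<untitled>")
        for key in REQUIRED_KEYS:
            if key not in rule:
                problems.append(f"{title}: missing {key}")
        detection = rule.get("detection")
        if isinstance(detection, dict) and "condition" not in detection:
            problems.append(f"{title}: detection has no condition")
        rule_id = str(rule.get("id", ""))
        try:
            uuid.UUID(rule_id)
        except ValueError:
            problems.append(f"{title}: id is not a valid UUID")
            continue
        if rule_id in seen_ids:
            problems.append(f"{title}: id duplicates {seen_ids[rule_id]}")
        seen_ids[rule_id] = title
    return problems
```

Failing the build when this list is non-empty catches the copy-paste mistake of cloning a rule and forgetting to give it a fresh id.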

Frequently Asked Questions

Is Sigma a SIEM or a replacement for one?

Neither. Sigma is a rule format and a converter. It produces queries that run inside your existing SIEM — Microsoft Sentinel, Splunk, Elastic, and others. It does not collect, store, or correlate logs itself; it just lets you write the detection once and emit it everywhere.

Does Sigma work with Microsoft Sentinel and Defender XDR?

Yes. Install the kusto backend with sigma plugin install kusto and convert with -t kusto. It emits KQL targeting the relevant tables, which you paste into a Sentinel analytics rule or a Defender custom detection. The field mapping is handled by the pipeline you select.

What is the difference between sigmac and sigma-cli?

sigmac is the legacy converter from the original Sigma project and is deprecated. sigma-cli, built on the pySigma library, is the current tool. If a tutorial tells you to run sigmac, it predates the rewrite — use sigma convert instead.

How do I avoid false positives in a Sigma rule?

Use filter blocks combined with and not in the condition to carve out known-good activity, prefer anchored modifiers like |endswith over loose |contains where you can, and validate against a real baseline. The falsepositives field documents the ones you accept so the next reviewer understands the tradeoff.

Where can I find Sigma rules to learn from?

The SigmaHQ/sigma repository on GitHub holds thousands of community rules organised by log source and platform. Read a rule before deploying it — quality and false-positive risk vary, and the abstract field names still have to match your environment's actual schema.

Conclusion

Sigma is the format I reach for when a detection needs to outlive the tool it was written against. The portability is real, but it is not free: the value is in the authoring discipline and the pipeline mapping, not in the converter pretending every SIEM is identical. Expect to spend your effort on field mappings and false-positive tuning, the same places you would spend it writing KQL or SPL by hand.

Start with the three blocks that matter — logsource, detection, condition — adapt a rule or two from SigmaHQ against log sources you already collect, and convert them into your SIEM. Within a few weeks you will have a small, version-controlled ruleset that you own independently of any one vendor. That is worth more than a perfectly tuned query you can never take with you.

Authoritative reference: the SigmaHQ project on GitHub and the Sigma specification.

Editorial note: posts on this blog are drafted with AI assistance and then reviewed, edited, and tested against a real environment before publishing. Commands, output, and screenshots come from systems I actually ran the work on.
