SIEM correlation rules are what turn a pile of Windows events into actual detections. Single-event rules ("alert on Event 4625") generate a lot of tickets and miss most attacks; correlation rules tie events together across users, hosts, and time to recognise the shapes that matter. This post walks through three concrete attack chains and the correlation logic we use to catch them in Splunk, Elastic, and QRadar query languages.
Key Takeaways
- Correlation rules join events across time, user, host, or source IP — a single event rarely tells the whole story.
- Anchor each rule to a MITRE ATT&CK technique. It forces precision and gives the analyst context when the rule fires.
- The three high-value patterns we always implement first: failed-to-successful authentication progression, suspicious PowerShell execution chain, and persistence creation right after privileged authentication.
- Time windows are the single most-tuned setting. Too narrow and the rule misses related events; too wide and unrelated events get linked.
- Always baseline before deploying. A correlation rule that has not been run against your environment's normal patterns will almost certainly alert on legitimate activity.
Environment
- Windows endpoints and Domain Controllers forwarding to a central SIEM.
- Advanced audit policy enabled: at minimum the Account Logon, Logon/Logoff, Detailed Tracking (Process Creation), and Object Access categories.
- PowerShell module logging and script block logging enabled via Group Policy.
- A SIEM that supports time-windowed joins. Examples below cover Splunk SPL, Elastic ES|QL, and IBM QRadar AQL.
The Problem
Most SIEM deployments accumulate logs faster than they accumulate detection. Single-event rules survive only until they are tuned out: "alert on 4625" becomes noise the first time someone fat-fingers a password ten times in a row. The events themselves are not the problem — they are evidence. Correlation rules use that evidence to recognise the shapes of real attacks: dwell time, lateral progression, the order in which artefacts appear.
The three patterns below cover a large fraction of the attack chains we see in mid-sized Windows environments. They are not exhaustive; they are the rules we ship first because the cost-to-value ratio is best.
The Solution
Pattern 1 — Brute force to successful logon to lateral movement (T1110.001 → T1078 → T1021)
An attacker guessing passwords over RDP against an exposed jump host eventually succeeds, logs in, and immediately tries other internal hosts. The shape in Windows events:
- 4625 — multiple failed logons from a single source IP over a short window.
- 4624 — successful logon from the same source IP within a few minutes.
- 4648 or 5140 — explicit-credential use or network-share access from the same actor shortly after.
Splunk SPL implementation:
index=windows EventCode=4625
| eval fail_time=_time
| bin _time span=5m
| stats count as failed_count, min(fail_time) as first_fail, max(fail_time) as last_fail by src_ip, _time
| where failed_count >= 10
| join type=inner max=0 src_ip
    [ search index=windows EventCode=4624
    | rename _time as success_time
    | fields src_ip, success_time, user ]
| where success_time > last_fail AND success_time < (last_fail + 600)
| join type=inner max=0 src_ip
    [ search index=windows (EventCode=4648 OR EventCode=5140)
    | rename _time as lateral_time
    | fields src_ip, lateral_time, dest_host ]
| where lateral_time > success_time AND lateral_time < (success_time + 1800)
| table src_ip, user, dest_host, first_fail, success_time, lateral_time, failed_count
The three time windows are tunable. 10 failures in 5 minutes is conservative; drop to 5 in 1 minute for tighter detection on hardened systems. The 30-minute lateral-movement window catches slower follow-ups.
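To make the window logic concrete outside any one SIEM, here is a minimal Python sketch of the same three-stage chain over a list of already-parsed events. The event dictionaries, their keys (`code`, `src_ip`, `time` in epoch seconds), and the function name `find_chains` are assumptions made for illustration, not part of any SIEM API.

```python
from collections import defaultdict

# Tunable windows in seconds, mirroring the thresholds discussed above.
FAIL_WINDOW = 300       # failures must fall within 5 minutes
FAIL_THRESHOLD = 10     # number of failures that counts as a burst
SUCCESS_WINDOW = 600    # success within 10 minutes of the last failure
LATERAL_WINDOW = 1800   # lateral activity within 30 minutes of the success


def find_chains(events):
    """Return (src_ip, success_time, lateral_time) for each matching chain.

    `events` is a hypothetical list of dicts with keys code, src_ip, time.
    """
    by_ip = defaultdict(lambda: defaultdict(list))
    for ev in events:
        by_ip[ev["src_ip"]][ev["code"]].append(ev["time"])

    chains = []
    for ip, codes in by_ip.items():
        fails = sorted(codes.get(4625, []))
        successes = sorted(codes.get(4624, []))
        laterals = sorted(codes.get(4648, []) + codes.get(5140, []))
        # Slide over the failures: a burst is FAIL_THRESHOLD failures whose
        # first and last timestamps are no more than FAIL_WINDOW apart.
        for i in range(len(fails) - FAIL_THRESHOLD + 1):
            last_fail = fails[i + FAIL_THRESHOLD - 1]
            if last_fail - fails[i] > FAIL_WINDOW:
                continue
            success = next((s for s in successes
                            if last_fail < s <= last_fail + SUCCESS_WINDOW), None)
            if success is None:
                continue
            lateral = next((t for t in laterals
                            if success < t <= success + LATERAL_WINDOW), None)
            if lateral is not None:
                chains.append((ip, success, lateral))
                break  # one hit per source IP is enough for an alert
    return chains
```

Unlike fixed five-minute bins, the sliding comparison here also catches a burst that straddles a bin boundary, at the cost of more comparisons per source IP.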
Pattern 2 — PowerShell host with encoded staging (T1059.001 + T1105)
Operators rarely run plain powershell.exe with no arguments. The shape that matters is a host process spawned by Office or a browser, paired with script-block content that downloads and executes:
- 4688 — process creation, parent in {winword.exe, excel.exe, outlook.exe, mshta.exe}, child is powershell.exe or pwsh.exe.
- 4104 — script block logged from the PowerShell operational log containing DownloadString, FromBase64String, Invoke-Expression, or -EncodedCommand.
- The two events appear under the same correlation key (computer + new-process ID) within seconds.
Elastic ES|QL:
FROM logs-windows-*
| WHERE @timestamp > NOW() - 1 hour
| WHERE event.code IN ("4688", "4104")
| EVAL bucket = DATE_TRUNC(1 minute, @timestamp)
| EVAL is_bad_process = CASE(
    event.code == "4688"
      AND TO_LOWER(winlog.event_data.NewProcessName) RLIKE ".*(powershell|pwsh)\\.exe"
      AND TO_LOWER(winlog.event_data.ParentProcessName) RLIKE ".*(winword|excel|outlook|mshta)\\.exe",
    1, 0)
| EVAL is_bad_script = CASE(
    event.code == "4104"
      AND TO_LOWER(powershell.script_block_text) RLIKE ".*(downloadstring|frombase64string|invoke-expression|-encodedcommand).*",
    1, 0)
| STATS bad_process = SUM(is_bad_process), bad_script = SUM(is_bad_script) BY host.name, bucket
| WHERE bad_process > 0 AND bad_script > 0
| SORT bucket DESC
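The host-name plus minute-bucket grouping handles the common case where the two events arrive seconds apart but in different channels. Events that straddle a minute boundary land in separate buckets, so widen the bucket if pairs are being missed. Add user.name to the STATS BY clause when running on a fleet.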
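Fixed minute buckets can split a pair that straddles a minute boundary, for example a process start at 12:00:59 and a script block at 12:01:02. One way around that is to pair events by time difference rather than by bucket. The sketch below assumes the two event streams have already been filtered and reduced to hypothetical `(host, datetime)` tuples; the function names are illustrative only.

```python
from datetime import datetime, timedelta


def pair_within(process_events, script_events, max_gap=timedelta(seconds=60)):
    """Pair each suspicious 4688 with 4104 events on the same host that
    follow it within max_gap. Inputs are hypothetical (host, datetime) tuples."""
    pairs = []
    for host, proc_time in process_events:
        for s_host, script_time in script_events:
            gap = script_time - proc_time
            if s_host == host and timedelta(0) <= gap <= max_gap:
                pairs.append((host, proc_time, script_time))
    return pairs


def same_minute_bucket(a, b):
    """True when two timestamps truncate to the same minute."""
    return a.replace(second=0, microsecond=0) == b.replace(second=0, microsecond=0)
```

With the two timestamps from the example, `same_minute_bucket` reports that a minute-bucket grouping would miss the pair, while `pair_within` still links them.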
Pattern 3 — Persistence right after privileged authentication (T1078 → T1543/T1547)
Most credential theft is only valuable if the attacker plants persistence with the stolen identity. The signal is a new service or Run-key registration shortly after an administrative logon:
- 4672 — special privileges assigned to a new logon (administrator).
- 7045 — new service installed (System log); or 4657 — registry value modified under a Run key (Security log, requires Object Access audit).
- Same user and same host within ~10 minutes.
QRadar AQL:
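-- Field and custom-property names depend on your DSM mappings; adjust to match.
SELECT
    "User Name" AS user,
    sourceaddress AS src,
    MIN(starttime) AS auth_time,
    MAX(starttime) AS persist_time
FROM events
WHERE
    (eventid = '4672' OR eventid = '7045' OR eventid = '4657')
-- Group on user and host only: 4672 carries no service name, so grouping on
-- "Service Name" would keep the logon and the persistence event apart.
-- Pull the service name from the matching 7045 event during triage.
GROUP BY "User Name", sourceaddress
HAVING
    UNIQUECOUNT(eventid) >= 2
    AND (MAX(starttime) - MIN(starttime)) < 600000 -- 10 minutes in ms
ORDER BY auth_time DESC
LAST 2 HOURS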
The 10-minute window is the practical floor for "this looks like one person acting". Tighten it for high-fidelity service tiers; widen it for environments where legitimate admins genuinely do install services in batches.
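The aggregate query above compares only the earliest and latest timestamps in each group; it does not check that the privileged logon came first. A minimal Python sketch of the same rule with the ordering enforced follows. The event dictionaries, their keys (`code`, `user`, `host`, `time` in epoch milliseconds), and the function name are assumptions for illustration.

```python
PERSISTENCE_CODES = {7045, 4657}
WINDOW_MS = 600_000  # 10 minutes, matching the threshold in the query


def persistence_after_logon(events, window_ms=WINDOW_MS):
    """Return (user, host, logon_time, persist_time) tuples where a
    persistence event follows a 4672 for the same user and host
    within window_ms.

    `events` is a hypothetical list of dicts: code, user, host, time.
    """
    last_priv_logon = {}  # (user, host) -> time of the most recent 4672
    hits = []
    for ev in sorted(events, key=lambda e: e["time"]):
        key = (ev["user"], ev["host"])
        if ev["code"] == 4672:
            last_priv_logon[key] = ev["time"]
        elif ev["code"] in PERSISTENCE_CODES and key in last_priv_logon:
            logon_time = last_priv_logon[key]
            if ev["time"] - logon_time < window_ms:
                hits.append((ev["user"], ev["host"], logon_time, ev["time"]))
    return hits
```

Tracking only the most recent privileged logon per user and host keeps the window anchored to the latest authentication, so an admin who logs in repeatedly over two hours does not stretch the comparison.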
Step 4 — Build a baseline before you deploy
Every correlation rule needs a baseline run before it goes live. Pull the rule's query, drop the alerting threshold, and look at what fires over the last 30 days. Common surprises:
- Scheduled tasks that legitimately spawn PowerShell at 02:00 every night.
- Backup or imaging tools that touch service and registry keys at boot.
- Helpdesk admin tooling that performs explicit-credential logons across hosts.
Each of these turns into an exclusion: a user, a host, a parent process, a service name. Document the exclusion in the rule's metadata so the next analyst can see why it is there.
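One way to keep exclusions documented is to store the reason next to the match criteria and apply them to the baseline output in one place. This is a sketch under assumed names: the `Exclusion` record, the hit dictionaries, and `apply_exclusions` are hypothetical and not tied to any SIEM's rule metadata format.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Exclusion:
    field: str    # which hit field to match, e.g. "user" or "parent_process"
    value: str    # exact value to exclude
    reason: str   # why the exclusion exists, for the next analyst


def apply_exclusions(hits, exclusions):
    """Split baseline hits into (kept_hits, excluded_counts_by_reason)."""
    kept, excluded = [], Counter()
    for hit in hits:
        match = next((x for x in exclusions if hit.get(x.field) == x.value), None)
        if match is None:
            kept.append(hit)
        else:
            excluded[match.reason] += 1
    return kept, excluded
```

The per-reason counts show how much volume each exclusion removes, which helps when deciding later whether an exclusion is still justified.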
Step 5 — Anchor every rule to a MITRE ATT&CK technique
Tagging rules with the technique they detect does two things: it forces you to be specific about the behaviour, and it gives the analyst a starting point for response. The three rules above map cleanly:
- Pattern 1 → T1110.001 (Password Guessing) + T1078 (Valid Accounts) + T1021 (Remote Services)
- Pattern 2 → T1059.001 (PowerShell) + T1105 (Ingress Tool Transfer)
- Pattern 3 → T1078 (Valid Accounts) + T1543.003 (Windows Service) or T1547.001 (Registry Run Keys)
Once every active rule has a technique tag, you have a coverage map. Gaps in the map are where to invest next.
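A coverage map can be as simple as inverting the rule-to-technique tags. The sketch below assumes rules are held as a plain dictionary of name to technique IDs; the rule names and the list of techniques you care about are hypothetical inputs.

```python
from collections import defaultdict


def coverage_map(rules, techniques_of_interest):
    """Map each technique ID to the rules tagged with it, and list the gaps."""
    covered = defaultdict(list)
    for rule_name, tags in rules.items():
        for technique in tags:
            covered[technique].append(rule_name)
    gaps = sorted(t for t in techniques_of_interest if t not in covered)
    return dict(covered), gaps


# Hypothetical rule inventory using the tags from the three patterns.
rules = {
    "pattern1": ["T1110.001", "T1021"],
    "pattern2": ["T1059.001", "T1105"],
    "pattern3": ["T1543.003", "T1547.001"],
}
covered, gaps = coverage_map(rules, ["T1059.001", "T1053.005"])
```

Here `gaps` contains T1053.005 (Scheduled Task), because no rule in the inventory carries that tag.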
Step 6 — Measure what the rules actually deliver
Each rule needs three running metrics:
- True positive rate — share of alerts the IR team confirmed as real activity. Target above 50%.
- Time to detect — minutes between the first contributing event and the alert. Aim for under five.
- False positive rate — share of alerts closed as benign. Above 20% means tune; above 50% means disable and rebuild.
Track them in a small Splunk/Kibana dashboard. Without them the rules drift, and "the SIEM is too noisy" becomes the standing complaint all over again.
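The three metrics are straightforward to compute from closed alerts. The following sketch assumes each alert is exported as a dictionary with a `verdict` of `"true_positive"` or `"benign"` and epoch-second `first_event` and `alerted` timestamps; those field names, and the use of a median for time to detect, are assumptions rather than anything a particular SIEM provides.

```python
from statistics import median


def rule_metrics(alerts):
    """Compute true/false positive rates and time to detect for one rule."""
    total = len(alerts)
    if total == 0:
        return None
    true_pos = sum(1 for a in alerts if a["verdict"] == "true_positive")
    benign = sum(1 for a in alerts if a["verdict"] == "benign")
    detect_minutes = median((a["alerted"] - a["first_event"]) / 60 for a in alerts)
    fp_rate = benign / total
    # Thresholds follow the guidance above: >20% tune, >50% disable and rebuild.
    if fp_rate > 0.5:
        action = "disable and rebuild"
    elif fp_rate > 0.2:
        action = "tune"
    else:
        action = "keep"
    return {
        "tp_rate": true_pos / total,
        "fp_rate": fp_rate,
        "median_minutes_to_detect": detect_minutes,
        "action": action,
    }
```

Feeding the result for each rule into the dashboard gives one row per rule with a suggested action.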
Frequently Asked Questions
Why not just use vendor-shipped correlation packs?
Use them as a starting point, but expect to rewrite most. Vendor packs target the broadest possible audience, which means generous thresholds and field assumptions that may not match your environment. The rules above are specific to Windows events and tuned for normal admin behaviour; a vendor pack cannot know what "normal" looks like for you.
How tight should the time windows be?
Tight enough that unrelated activity does not get joined, loose enough that genuine attack chains stay connected. For interactive attacks, 5–15 minute windows are usually right. For staged or low-and-slow campaigns, hours; for ransomware execution sequences, seconds.
What about correlation across data sources, not just Windows logs?
Same idea, more keys. Add EDR alerts, firewall denies, and proxy logs to the join key on user, host, and time. The cost is field normalisation — different sources use different field names, so plan ingestion-side mappings before writing the rule.
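Field normalisation usually comes down to a per-source rename table applied at ingestion. A minimal sketch, assuming a shared schema of `user`, `host`, and `src_ip`: the Windows field names are the ones found in 4624 events, while the `proxy` mapping and the function name are hypothetical.

```python
# Per-source field mappings onto one shared schema (proxy names are made up).
FIELD_MAPS = {
    "windows": {"TargetUserName": "user", "Computer": "host", "IpAddress": "src_ip"},
    "proxy": {"cs_username": "user", "c_ip": "src_ip"},
}


def normalise(source, record):
    """Rename known fields to the shared schema and keep the rest unchanged."""
    mapping = FIELD_MAPS.get(source, {})
    return {mapping.get(key, key): value for key, value in record.items()}
```

Once every source emits the same key names, a single correlation rule can join on them without per-source branches.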
Can I express these in Sigma?
Most single-event detections, yes. Sigma's correlation extension (introduced in Sigma 2.x) covers temporal joins and N-of-M aggregations, which is enough for the three patterns above. Convert Sigma to your SIEM's native query language with the official converter, sigma-cli, rather than maintaining two versions.
How often should I re-tune correlation rules?
Every quarter, plus any time the environment changes materially (new EDR deployment, new admin tooling, a different patch cadence). Track the three metrics above and re-tune when any of them drifts outside its target.
Conclusion
A SIEM that just stores logs is an expensive log store. The leverage comes from correlation: a small number of rules that recognise the multi-step shapes attackers actually move in. Start with the three patterns above, anchor each rule to a MITRE technique, baseline before deploying, and measure every alert. Within a quarter the same SIEM will be telling you genuinely useful things — and the alert volume will go down, not up.
Related Posts
- Essential Windows Event IDs for Security Monitoring — the event-ID reference the rules above rely on.
- Getting Started with MITRE ATT&CK: Fetching and Processing Data — how to programmatically tag rules with techniques.
- PowerShell Quick Guide: Working with Event Logs Like a Pro — local query patterns that complement the SIEM rules.
Authoritative references: MITRE ATT&CK and SigmaHQ rule project.