If you want to parse Windows Event Logs with Python, the saved .evtx files are a binary XML format that you cannot just open and read line by line. This Python Quick Guide walks through reading an .evtx export with the python-evtx library, pulling out the fields that matter for detection, and filtering by Event ID — the same work I would otherwise do in PowerShell, but in a form that travels to Linux analysis boxes and slots into a larger pipeline.
Key Takeaways
- Windows Event Logs are stored as binary XML in .evtx files, so to parse Windows Event Logs with Python you need a dedicated parser rather than plain text handling.
- The pure-Python python-evtx library reads .evtx files on any platform and hands you each record as XML you can walk with the standard library.
- Filtering by Event ID means parsing the record XML and reading the EventID element under the event's System block, which lives in a specific XML namespace.
- For multi-gigabyte logs, the Rust-backed evtx parser is considerably faster than the pure-Python option, at the cost of a compiled dependency.
- Reading saved .evtx exports offline keeps your analysis off the source host and avoids touching a system you may need to preserve.
Environment
- Python 3.9+ on Windows 11, though the same code runs unchanged on Linux or macOS.
- python-evtx installed via pip install python-evtx (the import name is Evtx).
- A saved .evtx export. For testing I used a copy of Security.evtx pulled from C:\Windows\System32\winevt\Logs\.
- Standard-library xml.etree.ElementTree for reading fields out of each record, with no extra dependency.
The Problem
On a live Windows box I would reach for Get-WinEvent and be done. The trouble starts when the logs are not on a live Windows box: an analyst hands me a folder of exported .evtx files from an incident, or the logs land on a Linux host that has no Get-WinEvent at all. The .evtx format is binary XML with chunked records and templated structures, so opening one in a text editor gives you a wall of unreadable bytes. You cannot grep it, and you cannot stream it like a CSV.
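To make the "binary, not text" point concrete, a file can be checked for the EVTX signature before it is handed to a parser: the file header starts with the ASCII bytes ElfFile followed by a NUL byte. The following is a minimal sketch; the helper name looks_like_evtx is hypothetical, and it only inspects the first eight bytes, so it says nothing about whether the rest of the file is intact.

```python
# Signature at the start of an EVTX file header: "ElfFile" plus a NUL byte.
EVTX_MAGIC = b"ElfFile\x00"


def looks_like_evtx(path):
    """Return True if the file begins with the EVTX signature (hypothetical helper)."""
    with open(path, "rb") as handle:
        return handle.read(len(EVTX_MAGIC)) == EVTX_MAGIC
```

Running this over a folder of incident exports is a cheap way to spot renamed or truncated files before a long parse begins.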
Python gives me a portable way to read these files anywhere, and once each record is XML I can do whatever I want with it — filter by Event ID, extract account names, count failed logons, or feed the result into the same kind of analysis I described in my SIEM correlation walkthrough. The one thing to get right first is the parser, because the binary format is not something you want to decode by hand.
The Solution — Parse Windows Event Logs with Python
Step 1 — Read every record out of an .evtx file
The python-evtx library exposes the file as a context manager and yields each record in turn. Calling record.xml() gives you the event as a readable XML string. This is the whole minimal parser:
import Evtx.Evtx as evtx

with evtx.Evtx("Security.evtx") as log:
    for record in log.records():
        print(record.xml())
That prints every event in the file as XML. Note that records() is a method — the parentheses matter. Each record's XML is a complete <Event> element with a System block (metadata such as the Event ID, time, and computer) and an EventData block (the event-specific fields). If you only need a quick human-readable dump and not a script, the library also ships a command-line tool, evtx_dump.py, that does exactly this.
Step 2 — Pull out the Event ID and timestamp
Printing raw XML is not analysis. To filter and report, I parse each record's XML with the standard library and read specific elements. The catch that trips people up: every element sits in the Windows event schema namespace, so a plain find("System/EventID") returns None. You have to register the namespace and prefix your paths with it:
import Evtx.Evtx as evtx
import xml.etree.ElementTree as ET

# The namespace every Windows event XML element lives in
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}

with evtx.Evtx("Security.evtx") as log:
    for record in log.records():
        root = ET.fromstring(record.xml())
        system = root.find("e:System", NS)
        event_id = system.find("e:EventID", NS).text
        time_created = system.find("e:TimeCreated", NS).get("SystemTime")
        print(f"{time_created} EventID={event_id}")
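The timestamp is an attribute (SystemTime) on the TimeCreated element, not element text, which is why it is read with .get() rather than .text. That asymmetry is easy to miss. Guess wrong and you get either a silent None (reading .text on TimeCreated) or a confusing AttributeError on NoneType (searching for SystemTime as if it were a child element and then reading .text on the missing result).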
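The loop above assumes every record carries a System block with both fields. A defensive variant might look like the sketch below. The helper name system_fields is hypothetical, it takes the XML string that record.xml() returns, and the sample record is hand-written for illustration rather than captured from a real log.

```python
import xml.etree.ElementTree as ET

NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}


def system_fields(xml_text):
    """Return (event_id, system_time) for one event; None for anything missing."""
    root = ET.fromstring(xml_text)
    event_id_node = root.find("e:System/e:EventID", NS)
    time_node = root.find("e:System/e:TimeCreated", NS)
    # EventID is element text; SystemTime is an attribute on TimeCreated.
    event_id = event_id_node.text if event_id_node is not None else None
    system_time = time_node.get("SystemTime") if time_node is not None else None
    return event_id, system_time


# Hand-written sample record, for illustration only.
SAMPLE = (
    '<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">'
    "<System><EventID>4625</EventID>"
    '<TimeCreated SystemTime="2024-01-01 00:00:00.000000"/>'
    "</System></Event>"
)

print(system_fields(SAMPLE))
```

Returning None instead of raising keeps one malformed record from stopping a long run; whether to skip or log such records is a choice for the calling code.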
Step 3 — Filter for the events you actually care about
Most of the time I am after one or two Event IDs — failed logons (4625), successful logons (4624), or whatever the investigation calls for. Filtering is just a comparison, but the EventData fields are stored as named <Data Name="..."> elements, so a small helper to pull a field by name keeps the code readable:
import Evtx.Evtx as evtx
import xml.etree.ElementTree as ET

NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}

def get_data(root, name):
    """Return the text of an EventData/Data element by its Name attribute."""
    node = root.find(f"e:EventData/e:Data[@Name='{name}']", NS)
    return node.text if node is not None else None

with evtx.Evtx("Security.evtx") as log:
    for record in log.records():
        root = ET.fromstring(record.xml())
        event_id = root.find("e:System/e:EventID", NS).text
        if event_id != "4625":  # failed logon only
            continue
        account = get_data(root, "TargetUserName")
        source_ip = get_data(root, "IpAddress")
        print(f"Failed logon: account={account} src={source_ip}")
Event ID 4625 is the failed-logon event, and the TargetUserName and IpAddress fields are where a password-spray or brute-force pattern shows up. This is the offline equivalent of the kind of monitoring I described in essential Windows Event IDs for security monitoring — same events, read from a saved file instead of a live channel. From here, counting failures per source IP or per account is a few more lines with collections.Counter.
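The counting step mentioned above could be sketched as follows. The function name count_failed_logons and the "(none)" placeholder key are assumptions made for this example. It accepts any iterable of event XML strings, so it can be tried on hand-written records without an .evtx file.

```python
import xml.etree.ElementTree as ET
from collections import Counter

NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}


def count_failed_logons(xml_records):
    """Tally Event ID 4625 records per source IP and per target account."""
    by_ip, by_account = Counter(), Counter()
    for xml_text in xml_records:
        root = ET.fromstring(xml_text)
        event_id = root.find("e:System/e:EventID", NS)
        if event_id is None or event_id.text != "4625":
            continue  # not a failed logon
        for counter, field in ((by_ip, "IpAddress"), (by_account, "TargetUserName")):
            node = root.find(f"e:EventData/e:Data[@Name='{field}']", NS)
            # Missing or empty fields are grouped under a placeholder key.
            key = node.text if node is not None and node.text else "(none)"
            counter[key] += 1
    return by_ip, by_account
```

With python-evtx, the input would be a generator such as (record.xml() for record in log.records()) consumed inside the with block, and by_ip.most_common(10) then lists the noisiest sources.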
Step 4 — Know when to switch to the faster parser
python-evtx is pure Python, which is exactly what you want for portability and for reading the code to understand it. The trade-off is speed: on a multi-gigabyte Security.evtx it is noticeably slow. When throughput matters, the Rust-backed evtx parser (installed with pip install evtx; the class to import is PyEvtxParser from the evtx module) parses the same files much faster because the heavy lifting happens in compiled code:
from evtx import PyEvtxParser

parser = PyEvtxParser("Security.evtx")
for record in parser.records():
    # record is a dict: event_record_id, timestamp, and 'data' (XML string)
    print(record["timestamp"], record["data"][:80])
Both libraries hand you XML in the end, so the parsing logic from Steps 2 and 3 carries over with only the iteration changed. I reach for python-evtx when I want a dependency-free script I can drop on any machine, and for the Rust-backed evtx when I am grinding through large collections.
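One way to keep the field-extraction code independent of the parser is to hide each library behind a generator that yields XML strings. This is a sketch under assumptions: the three function names are hypothetical, and the only library calls used are the ones already shown in Steps 1 and 4. The imports are deferred into the generators so that only the backend actually used needs to be installed.

```python
def xml_from_python_evtx(path):
    """Yield each record's XML using python-evtx (same calls as Step 1)."""
    import Evtx.Evtx as evtx  # imported lazily so the other backend stays optional

    with evtx.Evtx(path) as log:
        for record in log.records():
            yield record.xml()


def xml_from_rust_evtx(path):
    """Yield each record's XML using the Rust-backed evtx package (as in Step 4)."""
    from evtx import PyEvtxParser  # imported lazily for the same reason

    for record in PyEvtxParser(path).records():
        yield record["data"]


def iter_event_xml(path, fast=False):
    """Pick a backend; downstream code only ever sees XML strings."""
    source = xml_from_rust_evtx if fast else xml_from_python_evtx
    return source(path)
```

The Step 2 and Step 3 logic then loops over iter_event_xml("Security.evtx", fast=True) without knowing which parser produced the strings. Because the two import names differ only by case (Evtx and evtx), it is worth checking whether both packages can be installed side by side on a case-insensitive filesystem before relying on a single environment for both.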
Frequently Asked Questions
Why does find() return None when parsing EVTX XML in Python?
Because every element is in the Windows event schema namespace. A path like find("System/EventID") looks for elements with no namespace and finds nothing. Register the namespace (http://schemas.microsoft.com/win/2004/08/events/event) and prefix each path segment, as in Step 2.
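The difference is easy to reproduce on a hand-written record (the sample below is for illustration, not taken from a real log). The sketch also shows the {*} wildcard that ElementTree accepts on Python 3.8 and later, which matches a tag in any namespace and can be handy for quick exploration.

```python
import xml.etree.ElementTree as ET

NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}

# Hand-written sample for illustration only.
root = ET.fromstring(
    '<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">'
    "<System><EventID>4624</EventID></System></Event>"
)

unqualified = root.find("System/EventID")        # no namespace in the path
prefixed = root.find("e:System/e:EventID", NS)   # namespace map supplied
wildcard = root.find("{*}System/{*}EventID")     # any namespace (Python 3.8+)

print(unqualified, prefixed.text, wildcard.text)
```

The explicit prefix is the stricter choice for a script you keep, since the wildcard would also match an identically named element from some other namespace.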
Can python-evtx read a log from a live, running system?
It reads .evtx files, so you point it at a saved export or a copy from C:\Windows\System32\winevt\Logs\. The active log file can be locked by the Event Log service, so the reliable approach is to export or copy it first. Reading a saved copy also keeps your analysis off the source host.
Is python-evtx or the Rust evtx parser better?
For small files and maximum portability, python-evtx is pure Python with no compiled dependency. For large files where speed matters, the Rust-backed evtx is considerably faster. Both ultimately give you the record as XML, so switching between them changes only a few lines.
How do I extract a specific field like the account name?
Event-specific fields live in the EventData block as <Data Name="..."> elements. Match on the Name attribute — for example EventData/Data[@Name='TargetUserName'] — and read the element's text, as shown by the get_data helper in Step 3.
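When several fields are needed from the same event, collecting every named Data element into a dictionary once can be simpler than repeated lookups. A minimal sketch follows; the helper name event_data_dict is hypothetical, and it skips Data elements that carry no Name attribute.

```python
import xml.etree.ElementTree as ET

NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}


def event_data_dict(xml_text):
    """Map each named EventData/Data element to its text (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    fields = {}
    for node in root.findall("e:EventData/e:Data", NS):
        name = node.get("Name")
        if name is not None:  # unnamed Data elements are skipped in this sketch
            fields[name] = node.text
    return fields
```

A lookup such as event_data_dict(xml).get("TargetUserName") then returns None for a missing field instead of raising.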
Conclusion
Parsing .evtx files in Python is not hard once you accept that the format is binary XML and let a real parser handle it. python-evtx gives you each record as XML, the standard library reads the fields, and the only genuine gotcha is the XML namespace that quietly breaks every path you write until you register it.
The honest limitation is performance: the pure-Python parser is fine for an export from a single host but slow across a large collection, which is where the Rust-backed parser earns its compiled dependency. Either way, the value is portability — the same script reads Windows logs on a Linux analysis box, and the parsed output feeds straight into counting, correlation, or whatever the investigation needs next.
Related Posts
- Essential Windows Event IDs for Security Monitoring — which Event IDs to filter for once you can read the logs.
- PowerShell Quick Guide: Process Investigation — the live-system counterpart to this offline parsing approach.
- From Logs to Threats: SIEM Correlation Rules for Real Attacks — where parsed events turn into detections.
Editorial note: posts on this blog are drafted with AI assistance and then reviewed, edited, and tested against a real environment before publishing. Commands, output, and screenshots come from systems I actually ran the work on.