Python Quick Guide: Building a Simple Port Scanner

Building a Python port scanner is a useful exercise for any defender who wants to understand what attackers' first reconnaissance actually looks like — and, more usefully, what an internal asset-discovery sweep across your own network looks like from a tool's perspective. This post walks through a small but real scanner intended for authorised scanning of systems you own or administer. For production discovery, use Nmap; this implementation is here to teach you what Nmap does underneath.

Key Takeaways

  • A working TCP-connect scanner is ~100 lines of Python using socket and concurrent.futures; you do not need root or raw packets.
  • Threading is essential: scanning 1024 ports sequentially with a one-second timeout can take about 17 minutes when most ports time out; with 50 workers the same sweep takes roughly 20 to 25 seconds.
  • Connect scans complete the three-way handshake, so the destination logs the connection. This is intentional for authorised scanning; do not run this against systems you do not own.
  • For production network discovery, asset inventory, or pentesting, use Nmap. It is more accurate, faster, and battle-tested.
  • Treat the scanner as a learning step toward understanding firewall, IDS, and EDR behaviour, not as a long-term replacement tool.
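The timing figures above follow from simple arithmetic. The sketch below computes the worst-case bound, assuming every probe runs to its full timeout; the function name worst_case_seconds is ours, for illustration only. Real runs differ: closed ports that answer with a reset return almost instantly, while open ports add banner-read time.

```python
import math

def worst_case_seconds(ports: int, timeout: float, workers: int = 1) -> float:
    """Upper bound on scan time when every probe runs to its full timeout."""
    # Each worker handles one port at a time, so the sweep needs
    # ceil(ports / workers) rounds of at most `timeout` seconds each.
    rounds = math.ceil(ports / workers)
    return rounds * timeout

sequential = worst_case_seconds(1024, 1.0)            # 1024 s, about 17 minutes
threaded = worst_case_seconds(1024, 1.0, workers=50)  # 21 rounds of 1 s
print(f"sequential: {sequential / 60:.1f} min, 50 workers: {threaded:.0f} s")
```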

Environment

  • Python 3.10+ (uses concurrent.futures.ThreadPoolExecutor and type hints).
  • Windows, macOS, or Linux — sockets are stdlib.
  • Network connectivity to the target you have explicit authorisation to scan.
  • No root or admin rights needed for TCP-connect scanning. SYN scanning needs raw sockets, and Nmap's UDP scan also runs privileged; we are deliberately staying in user space.

The Problem

Network ports are the entry points for everything that runs on a host. Knowing which are open, which service answers, and how it identifies itself is the first question of both attack and defence. Nmap answers that question better than any homemade script, but rolling a simple version forces you to confront the actual mechanics: socket states, timeouts, banner grabbing, false negatives caused by firewalls, and why scan rate matters. Five percent of the value of writing one is the script. Ninety-five percent is internalising what the script is doing.

The Solution

Step 1 — Scaffold the scanner

Start with a class that captures the target, port range, and worker count. concurrent.futures replaces the manual queue and thread plumbing — fewer lines, fewer bugs:

import socket
from concurrent.futures import ThreadPoolExecutor, as_completed
from dataclasses import dataclass
from typing import Iterable

@dataclass
class PortResult:
    port: int
    open: bool
    service: str = ""
    banner: str = ""


class PortScanner:
    def __init__(self, target: str, ports: Iterable[int], workers: int = 50, timeout: float = 1.0):
        self.target  = target
        self.ports   = list(ports)
        self.workers = workers
        self.timeout = timeout

    def _probe(self, port: int) -> PortResult:
        # One socket per probe; the context manager closes it on every path.
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.settimeout(self.timeout)
            # connect_ex returns 0 on success and an errno value otherwise.
            if s.connect_ex((self.target, port)) != 0:
                return PortResult(port=port, open=False)
            service = self._service_name(port)
            banner  = self._banner(s)
            return PortResult(port=port, open=True, service=service, banner=banner)

    def scan(self) -> list[PortResult]:
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            futures = [pool.submit(self._probe, p) for p in self.ports]
            results = [f.result() for f in as_completed(futures)]
        return sorted([r for r in results if r.open], key=lambda r: r.port)

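connect_ex returns 0 on success and the OS error number on failure, which is exactly what we need. connect would raise on failure, forcing exception handling into the hot path. Name-resolution failures still raise socket.gaierror even with connect_ex, so pass an IP address or a hostname you know resolves.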
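To see the return values for yourself without touching anyone's network, you can probe a throwaway listener on loopback. This is a minimal sketch, not part of the scanner; try_connect_ex is a hypothetical helper, and it assumes nothing else grabs the port between the two probes.

```python
import errno
import socket

def try_connect_ex(host: str, port: int, timeout: float = 1.0) -> int:
    """Return 0 if the TCP connect succeeded, else the OS error number."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        return s.connect_ex((host, port))

# Open a temporary listener on an OS-assigned loopback port.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]

open_code = try_connect_ex("127.0.0.1", port)    # listener is up
listener.close()
closed_code = try_connect_ex("127.0.0.1", port)  # nothing listens now

print("open:  ", open_code)
print("closed:", closed_code, errno.errorcode.get(closed_code, "?"))
```

On most systems the second probe reports ECONNREFUSED, but the exact number is platform-specific, which is why the scanner only tests for zero.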

Step 2 — Resolve service names

The operating system already knows what service usually runs on a given port. socket.getservbyport consults /etc/services on Unix or %SystemRoot%\system32\drivers\etc\services on Windows:

    @staticmethod
    def _service_name(port: int) -> str:
        try:
            return socket.getservbyport(port, 'tcp')
        except OSError:
            return 'unknown'

This is a hint, not ground truth. The service running on port 22 might not be SSH, and the service running on 8080 might be anything. Banner grabbing in the next step is more reliable when it works.

Step 3 — Grab a banner where the protocol lets you

Some protocols volunteer a banner on connect (SSH, SMTP, FTP). Others wait for a request (HTTP). A short read with a tight timeout covers both cases without blocking on silent ports:

    def _banner(self, sock: socket.socket) -> str:
        try:
            sock.settimeout(0.5)
            data = sock.recv(256)
            return data.decode(errors='replace').strip()
        except OSError:
            return ''

An empty banner does not mean the port is closed — it means the service did not chatter on connect. For HTTP, sending a single HEAD / HTTP/1.0\r\n\r\n before reading recovers the response line.
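The HTTP case can be sketched as follows. http_status_line is a hypothetical helper, not part of the class above; it assumes an already-connected plaintext socket and returns only the first response line. TLS ports such as 443 will not answer a plaintext request usefully.

```python
import socket

def http_status_line(sock: socket.socket, timeout: float = 0.5) -> str:
    """Send a bare HEAD request on a connected socket; return the first response line."""
    try:
        sock.settimeout(timeout)
        sock.sendall(b"HEAD / HTTP/1.0\r\n\r\n")
        data = sock.recv(256)
    except OSError:
        # Covers timeouts and resets; treat both as "no banner".
        return ""
    # The status line is everything up to the first CRLF.
    first_line = data.split(b"\r\n", 1)[0]
    return first_line.decode(errors="replace").strip()
```

One way to use it would be as a fallback inside _probe when _banner returns an empty string, accepting that it sends unsolicited bytes to whatever service is listening.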

Step 4 — Add a small CLI

argparse keeps the entry point thin and self-documenting:

import argparse
import time

def main():
    parser = argparse.ArgumentParser(description='Authorised TCP-connect port scanner')
    parser.add_argument('target', help='Host or IP you have permission to scan')
    parser.add_argument('-s', '--start',   type=int,   default=1)
    parser.add_argument('-e', '--end',     type=int,   default=1024)
    parser.add_argument('-w', '--workers', type=int,   default=50)
    parser.add_argument('-t', '--timeout', type=float, default=1.0)
    args = parser.parse_args()

    scanner = PortScanner(args.target, range(args.start, args.end + 1), args.workers, args.timeout)
    print(f'Scanning {args.target} ports {args.start}-{args.end} ({args.workers} workers)')

    start   = time.perf_counter()  # monotonic clock, unaffected by wall-clock changes
    results = scanner.scan()
    elapsed = time.perf_counter() - start

    print(f'\nFound {len(results)} open ports in {elapsed:.2f}s\n')
    print(f'{"PORT":<8}{"SERVICE":<12}BANNER')
    for r in results:
        print(f'{r.port:<8}{r.service:<12}{r.banner[:80]}')


if __name__ == '__main__':
    main()

Running it: python scanner.py 192.0.2.10 -s 1 -e 65535 -w 200 -t 0.5. Pulling the worker count up and the timeout down is the easiest way to trade accuracy for speed.

Step 5 — Understand what you cannot see

TCP-connect scanning has limits worth knowing before you trust the output:

  • Filtered ports look closed in this scanner. A firewall that drops packets silently makes the connect time out, while a closed port answers with a reset. _probe collapses both into "non-zero return" and reports neither; Nmap labels the silent case "filtered".
  • UDP requires a different probe. UDP has no handshake, so absence of a reply is ambiguous. Application-specific probes (DNS query to port 53, SNMP getRequest to 161) are how Nmap handles it.
  • Rate limiting hides services. Many environments rate-limit new TCP connections. Push too many workers, lose results to silent drops.
  • You are logged. A connect scan completes a three-way handshake. Firewalls, IDS, EDR, and the host itself all record the event. Treat this as expected behaviour for authorised testing.
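On the first point: the two cases merge in our output because _probe only checks for a non-zero return. The error number itself usually differs, so a rough three-way classification is possible. The sketch below is an assumption-laden illustration (classify is a hypothetical helper): on CPython a timed-out connect_ex is typically reported as EAGAIN/EWOULDBLOCK, but verify the codes on your own platform before relying on them.

```python
import errno

# Codes that mean "no answer arrived in time". Assumption: with a socket
# timeout set, CPython reports a timed-out connect_ex as EAGAIN/EWOULDBLOCK;
# ETIMEDOUT covers the case where the OS gave up first.
_NO_REPLY = {errno.EAGAIN, errno.EWOULDBLOCK, errno.ETIMEDOUT}

def classify(code: int) -> str:
    """Map a connect_ex return value to a coarse port state."""
    if code == 0:
        return "open"
    if code == errno.ECONNREFUSED:
        return "closed"    # the host answered with a reset
    if code in _NO_REPLY:
        return "filtered"  # silence: dropped by a firewall, or host down
    return "other"         # e.g. host or network unreachable

for code in (0, errno.ECONNREFUSED, errno.ETIMEDOUT):
    print(code, classify(code))
```

"Filtered" here only means no reply arrived within the timeout; a slow or overloaded host looks the same.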

Step 6 — Compare against Nmap on the same target

For any non-trivial use, run Nmap on the same target and compare. The equivalent commands:

# Same shape as our scanner
nmap -p1-1024 -sT --max-rate 200 target

# Add service detection
nmap -p1-1024 -sT -sV target

# SYN scan, faster and more accurate (requires root)
sudo nmap -p- -sS -T4 target

Nmap's NSE engine, OS fingerprinting, and version detection are decades of work. Our scanner exists to demystify the mechanics; Nmap exists to actually do the job in production.
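To make the comparison mechanical, you can have Nmap write grepable output (for example nmap -p1-1024 -sT -oG scan.gnmap target) and diff the open ports against ours. The sketch below assumes the usual port/state/protocol/... entry layout of that format; the helper names and the sample line are ours, hand-written for illustration rather than captured from a real scan.

```python
import re

# Matches entries such as "22/open/tcp//ssh///" in Nmap's grepable (-oG) output.
_OPEN_TCP = re.compile(r"(\d+)/open/tcp/")

def open_ports_from_grepable(text: str) -> set[int]:
    """Collect the TCP ports that the grepable output marks as open."""
    return {int(m.group(1)) for m in _OPEN_TCP.finditer(text)}

def diff_ports(ours: set[int], nmap: set[int]) -> tuple[list[int], list[int]]:
    """Return (ports only we reported, ports only Nmap reported)."""
    return sorted(ours - nmap), sorted(nmap - ours)

# Hand-written sample line, for illustration only.
sample = ("Host: 192.0.2.10 ()\tPorts: 22/open/tcp//ssh///, "
          "80/open/tcp//http///, 443/closed/tcp//https///")

only_ours, only_nmap = diff_ports({22, 8080}, open_ports_from_grepable(sample))
print("only ours:", only_ours)
print("only nmap:", only_nmap)
```

Ports that appear only in Nmap's list are the ones worth investigating first: they usually point at a timeout or rate-limit problem on our side.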

Frequently Asked Questions

Is it legal to run this scanner?

On systems you own or have written permission to test: yes. Against arbitrary public targets, the answer depends on jurisdiction and is rarely safe to assume. Unauthorised scanning commonly violates ISP or hosting terms of service, and in some jurisdictions it can be treated as an offence under computer-misuse law. Always have explicit written authorisation before scanning anything you do not own.

Why use threads rather than asyncio?

Threads work cleanly with the blocking socket API and need very little code. asyncio is the right tool when you scale past low-thousands of concurrent probes, but the added complexity is not justified for a 1024-port sweep.
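For comparison, here is roughly what the asyncio version of the connect check looks like. It is a sketch under our own naming (probe, scan, and the limit parameter are not from the scanner above), it skips banner grabbing, and the semaphore value is an arbitrary starting point.

```python
import asyncio

async def probe(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port completes within timeout."""
    try:
        _reader, writer = await asyncio.wait_for(
            asyncio.open_connection(host, port), timeout
        )
    except (OSError, asyncio.TimeoutError):
        return False
    writer.close()
    try:
        await writer.wait_closed()
    except OSError:
        pass  # peer reset during close; the port was still open
    return True

async def scan(host: str, ports: list[int], limit: int = 500,
               timeout: float = 1.0) -> list[int]:
    """Probe all ports concurrently, with at most `limit` connections in flight."""
    sem = asyncio.Semaphore(limit)

    async def guarded(port: int) -> tuple[int, bool]:
        async with sem:
            return port, await probe(host, port, timeout)

    results = await asyncio.gather(*(guarded(p) for p in ports))
    return sorted(p for p, is_open in results if is_open)

# Example (authorised targets only):
# open_ports = asyncio.run(scan("192.0.2.10", list(range(1, 1025))))
```

The semaphore plays the role of the worker count: without it, a full 65535-port sweep would try to open every connection at once and run into file-descriptor limits.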

Why are my scan results different from Nmap's?

Most often: a stateful firewall is dropping packets silently and our connect scan reads that as "closed", while Nmap's SYN scan reads it as "filtered". Less often: rate limiting is silently dropping fast probes. Reduce --workers and increase --timeout and re-test.

Can I extend this to do UDP?

Yes, but the simple "connect and see what comes back" approach does not work because UDP is stateless. You need protocol-specific probes (a DNS query for port 53, an NTP request for port 123, etc.). Nmap's nmap-payloads file is the reference set; replicating it is more work than the rest of the scanner combined.
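As an illustration of what a protocol-specific probe involves, here is a minimal DNS query sent over UDP. The function names and the example.com default are our own choices, and the closed-port detection relies on the OS surfacing an ICMP "port unreachable" as ConnectionRefusedError on a connected UDP socket, which is common but not guaranteed on every platform or network.

```python
import socket
import struct

def build_dns_query(name: str = "example.com", query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for the A record of `name` (RFC 1035 wire format)."""
    # Header: ID, flags (recursion desired), QDCOUNT=1, other counts zero.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: length-prefixed labels, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii") for label in name.split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN).
    return header + qname + struct.pack("!HH", 1, 1)

def udp_dns_probe(host: str, port: int = 53, timeout: float = 1.0) -> str:
    """Send one DNS query and classify the port from what comes back."""
    query = build_dns_query()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        try:
            # connect() on UDP only fixes the peer address; it lets the OS
            # report an ICMP "port unreachable" back to this socket.
            s.connect((host, port))
            s.send(query)
            reply = s.recv(512)
        except socket.timeout:
            return "open|filtered"  # silence is ambiguous for UDP
        except ConnectionRefusedError:
            return "closed"
        except OSError:
            return "error"
    # A genuine DNS reply echoes the query ID in its first two bytes.
    return "open" if reply[:2] == query[:2] else "open (unexpected reply)"
```

Note the "open|filtered" result: unlike TCP, a timeout here tells you nothing definite, which is the core difficulty of UDP scanning.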

How do I tell whether a port banner is genuine or an opaque proxy?

You cannot, from a single probe. The banner is whatever bytes the service decides to send. Cross-check with protocol-specific probes — for HTTP, request / and look at the headers; for SSH, the version exchange is well-defined; for SMTP, send EHLO and inspect the response.
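For SSH, a first sanity check is whether the banner is a well-formed identification string of the form SSH-protoversion-softwareversion followed by optional comments. The parser below is a sketch with hypothetical names (SshIdent, parse_ssh_banner); a string that passes is merely well-formed, not proof that a genuine SSH daemon sent it.

```python
from typing import NamedTuple, Optional

class SshIdent(NamedTuple):
    proto: str
    software: str
    comments: str

def parse_ssh_banner(banner: str) -> Optional[SshIdent]:
    """Split 'SSH-<proto>-<software> <comments>' into parts; None if malformed."""
    line = banner.strip()
    if not line.startswith("SSH-"):
        return None
    # Comments, if any, follow the first space.
    ident, _, comments = line.partition(" ")
    parts = ident.split("-", 2)  # ["SSH", proto, software]
    if len(parts) != 3 or not parts[1] or not parts[2]:
        return None
    return SshIdent(proto=parts[1], software=parts[2], comments=comments)

print(parse_ssh_banner("SSH-2.0-OpenSSH_8.9p1 Ubuntu-3ubuntu0.1"))
print(parse_ssh_banner("220 mail.example.com ESMTP"))
```

A banner on port 22 that fails this parse is a strong hint that something other than SSH is listening there.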

Conclusion

Writing your own port scanner is one of those rare exercises where the act of writing it is the entire point. The script itself does not do anything Nmap cannot do better — but once you have written it, you understand exactly why TCP-connect scanning leaves traces, why UDP scanning is hard, why rate limits matter, and what a "filtered" port actually means. After that, treat Nmap as the production tool and your scanner as a teaching artefact. Both have their place; do not confuse the roles.

Related Posts

Authoritative reference for the real tool: Nmap Reference Guide (man page).
