Building a Domain Security Analyzer – The Main Scanner Class

In the last post, we completed the HTTP headers scanner. Now we have three independent scanner modules – DNS, SSL, and headers. Each one works perfectly on its own. But our users shouldn't have to call three separate functions and combine results themselves.

Today we build the coordinator – the main Scanner class that ties everything together.

Today's Goal

By the end of this post, you'll understand:

Why we need a coordinating class – The difference between functions and classes
How __init__ initializes objects – Setting up state before scanning
How relative imports work – The from . import syntax
Why we store results as instance attributes – Accessing data after scanning

The Complete Code

Here's the full scanner/scanner.py file. We'll break down every piece:

"""Main scanner that coordinates all security checks."""

from datetime import datetime, timezone
from . import dns_scanner
from . import ssl_scanner
from . import headers_scanner


class Scanner:
    """
    Scans a domain for security configuration.

    Runs DNS, SSL, and HTTP header checks and returns
    a comprehensive result dictionary.
    """

    def __init__(self, domain):
        """
        Initialize scanner with target domain.

        Args:
            domain: The domain to scan (e.g., 'github.com')
        """
        self.domain = domain.lower().strip()
        self.results = None
        self.scan_time = None

    def scan(self):
        """
        Run all security checks on the domain.

        Returns:
            dict containing all scan results
        """
        self.scan_time = datetime.now(timezone.utc)

        self.results = {
            'domain': self.domain,
            'scan_time': self.scan_time.isoformat(),
            'dns': dns_scanner.scan_dns(self.domain),
            'ssl': ssl_scanner.scan_ssl(self.domain),
            'headers': headers_scanner.scan_headers(self.domain)
        }

        return self.results

    def get_results(self):
        """
        Get the scan results.

        Returns:
            dict with results, or None if scan hasn't run
        """
        return self.results

Now let's understand every piece.

Part 1: The Module Docstring and Imports

"""Main scanner that coordinates all security checks."""

from datetime import datetime, timezone
from . import dns_scanner
from . import ssl_scanner
from . import headers_scanner

The Docstring

"""Main scanner that coordinates all security checks."""

Same pattern as our other modules – a brief description of the file's purpose. This module's job is coordination, not implementation. It doesn't know how to check DNS records or validate certificates. It just knows to call the modules that do.

The datetime Import

from datetime import datetime, timezone

We import two things from Python's built-in datetime module:

datetime – A class for representing dates and times
timezone – A class for handling time zones

We'll use these to timestamp when each scan occurs. Recording the scan time is important because:

Security configurations change over time
Users might want to compare "what was the status last week vs today"
Debugging requires knowing when something was checked

The Relative Imports

from . import dns_scanner
from . import ssl_scanner
from . import headers_scanner

These three lines import our scanner modules. The dot (.) means "from the current package" – in other words, from the same scanner/ folder where this file lives.

Why from . import instead of just import?

Let's understand the difference:

# This looks for a top-level module named dns_scanner on Python's import path (sys.path)
import dns_scanner  # Fails when scanner/ is imported as a package – there is no top-level dns_scanner

# This looks for dns_scanner inside the current package
from . import dns_scanner  # Works – finds scanner/dns_scanner.py

The relative import tells Python: "Don't search everywhere. Look right here, in the same folder as me."

Visualizing the imports:

scanner/
├── __init__.py
├── scanner.py          ← We're here
├── dns_scanner.py      ← from . import dns_scanner finds this
├── ssl_scanner.py      ← from . import ssl_scanner finds this
└── headers_scanner.py  ← from . import headers_scanner finds this

After these imports, we can use:

dns_scanner.scan_dns(domain)
ssl_scanner.scan_ssl(domain)
headers_scanner.scan_headers(domain)
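To watch a relative import resolve, here is a small self-contained sketch that builds a throwaway package in a temporary directory and imports it. The names demo_pkg, helper, and main are made up for illustration and are not part of our project.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package on disk (hypothetical names, not our real scanner/).
tmp_dir = Path(tempfile.mkdtemp())
pkg_dir = tmp_dir / "demo_pkg"
pkg_dir.mkdir()

(pkg_dir / "__init__.py").write_text("")
(pkg_dir / "helper.py").write_text("def greet():\n    return 'hello from helper'\n")
# main.py uses a relative import to reach its sibling module.
(pkg_dir / "main.py").write_text(
    "from . import helper\n"
    "\n"
    "def run():\n"
    "    return helper.greet()\n"
)

# Make the parent folder importable, then import main as part of the package.
sys.path.insert(0, str(tmp_dir))
importlib.invalidate_caches()
main = importlib.import_module("demo_pkg.main")

message = main.run()
print(message)           # hello from helper
print(main.__package__)  # demo_pkg
```

The import works because main is loaded as demo_pkg.main, so Python knows which package the dot refers to. Running the same file directly as a script (python demo_pkg/main.py) would fail with "attempted relative import with no known parent package".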

Part 2: Why a Class Instead of Functions?

Before we dive into the code, let's understand why we're using a class here when our individual scanners used plain functions.

The Function Approach

We could have written this as a simple function:

def scan_domain(domain):
    return {
        'domain': domain,
        'scan_time': datetime.now(timezone.utc).isoformat(),
        'dns': dns_scanner.scan_dns(domain),
        'ssl': ssl_scanner.scan_ssl(domain),
        'headers': headers_scanner.scan_headers(domain)
    }

This works! Call scan_domain("github.com") and get results. So why bother with a class?

The Problem with Functions

Functions are stateless – they run, return a value, and forget everything. But scanning has state we might want to keep:

# With a function, you'd have to do this:
results = scan_domain("github.com")
# Later...
domain = results['domain']  # Have to dig into the results
scan_time = results['scan_time']  # Only available as a key inside the results

What if you want to:

Check which domain was scanned without parsing results?
Know if a scan has been run yet?
Re-scan the same domain?
Access the scan time separately?

The Class Approach

A class bundles data and behavior together:

# Create a scanner for a specific domain
scanner = Scanner("github.com")

# The domain is stored
print(scanner.domain)  # "github.com"

# Run the scan
scanner.scan()

# Access results anytime
print(scanner.results)
print(scanner.scan_time)

# Re-scan if needed
scanner.scan()  # New results, same domain

The scanner "remembers" its domain and results. You can create multiple scanners for different domains and they don't interfere with each other:

github_scanner = Scanner("github.com")
google_scanner = Scanner("google.com")

github_scanner.scan()
google_scanner.scan()

# Each scanner has its own results
print(github_scanner.results)  # GitHub's results
print(google_scanner.results)  # Google's results

When to Use Classes vs Functions

| Use Functions When | Use Classes When |
| --- | --- |
| One-shot operations | Need to track state |
| No state to remember | Multiple related operations |
| Simple input → output | Object has identity (this scanner vs that scanner) |
| Utility operations | Will be extended/subclassed |

Our individual scanner modules (dns_scanner, etc.) use functions because each scan is independent – call it, get results, done.

The main Scanner uses a class because it represents "a scanner configured for a specific domain" – an object with identity and state.

Part 3: The Class Definition

class Scanner:
    """
    Scans a domain for security configuration.

    Runs DNS, SSL, and HTTP header checks and returns
    a comprehensive result dictionary.
    """

What is a Class?

A class is a blueprint for creating objects. Think of it like a cookie cutter – the class defines the shape, and each object is a cookie made from that cutter.

class Scanner:  # The blueprint
    ...

# Creating objects from the blueprint
scanner1 = Scanner("github.com")   # One cookie
scanner2 = Scanner("google.com")   # Another cookie

Each scanner object has its own domain, results, and scan_time. They're independent instances of the same blueprint.

The Class Docstring

    """
    Scans a domain for security configuration.

    Runs DNS, SSL, and HTTP header checks and returns
    a comprehensive result dictionary.
    """

This documents what the class does. When someone types help(Scanner), they see this description. Good docstrings explain:

What the class represents
What it does
How to use it (though we'll add more detail in method docstrings)

Part 4: The `__init__` Method

    def __init__(self, domain):
        """
        Initialize scanner with target domain.

        Args:
            domain: The domain to scan (e.g., 'github.com')
        """
        self.domain = domain.lower().strip()
        self.results = None
        self.scan_time = None

What is `__init__`?

__init__ is a special method, properly called the initializer (and often, more loosely, the constructor). Python calls it automatically when you create a new object:

scanner = Scanner("github.com")
#         ↑
#         This triggers __init__ with domain="github.com"

The double underscores (called "dunder" for "double underscore") mark this as a special Python method. You rarely call __init__ directly – Python calls it for you when you write Scanner(...).

Understanding `self`

Every method in a class receives self as its first parameter. self refers to the specific object being worked on.

def __init__(self, domain):
    #          ↑
    #     self = the new Scanner object being created

When you write:

scanner = Scanner("github.com")

Python creates a new Scanner object and passes it as self to __init__. The domain parameter gets the value "github.com".
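A small sketch can make self less abstract. The Greeter class below is a hypothetical stand-in, not part of the project. It shows that calling a method on an object is the same as calling the function on the class and passing the object in as self.

```python
class Greeter:
    """Tiny stand-in class (hypothetical) to show what self refers to."""

    def __init__(self, name):
        self.name = name

    def greet(self):
        return f"Hello from {self.name}"


first = Greeter("first")
second = Greeter("second")

# The usual call: Python passes `first` as self automatically.
usual = first.greet()

# The same call spelled out: the object is passed explicitly as self.
explicit = Greeter.greet(first)

print(usual)           # Hello from first
print(explicit)        # Hello from first
print(second.greet())  # Hello from second
```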

Setting Instance Attributes

        self.domain = domain.lower().strip()
        self.results = None
        self.scan_time = None

These lines create instance attributes – data that belongs to this specific Scanner object.

self.domain = domain.lower().strip()

We store the domain name, but first we clean it:

.lower() – Convert to lowercase ("GitHub.com" → "github.com")
.strip() – Remove leading/trailing whitespace (" github.com " → "github.com")

Why clean the input? Users might type " GitHub.COM " and expect it to work. By normalizing here, we handle variations gracefully.
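Here is a quick check of that cleanup on a few inputs. The normalize_domain helper is a hypothetical name used only for this demonstration. It applies the same two calls as __init__.

```python
def normalize_domain(domain):
    """Same cleanup that Scanner.__init__ applies (helper name is hypothetical)."""
    return domain.lower().strip()


samples = ["GitHub.com", "  github.com  ", "\tGITHUB.COM\n"]
cleaned = [normalize_domain(s) for s in samples]
print(cleaned)  # ['github.com', 'github.com', 'github.com']

# What this cleanup does NOT handle: a scheme or path is left in place.
with_scheme = normalize_domain("https://GitHub.com/")
print(with_scheme)  # https://github.com/
```

So the normalization covers case and stray whitespace, but a pasted URL would still reach the scanner modules as-is.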

self.results = None

We initialize results to None. This indicates "no scan has been run yet." After calling .scan(), this will hold the results dictionary.

self.scan_time = None

Same idea – None until a scan runs, then it holds the timestamp.

Why Initialize to None?

We could skip these lines and only create the attributes when needed:

def __init__(self, domain):
    self.domain = domain.lower().strip()
    # Don't define self.results or self.scan_time yet

def scan(self):
    self.scan_time = datetime.now(timezone.utc)
    self.results = {...}

But this creates problems:

scanner = Scanner("github.com")
print(scanner.results)  # AttributeError: 'Scanner' object has no attribute 'results'

By initializing to None in __init__, we guarantee these attributes always exist:

scanner = Scanner("github.com")
print(scanner.results)  # None (not an error)

if scanner.results is None:
    print("No scan yet")

This is a defensive programming pattern – define all attributes upfront so code that accesses them doesn't crash.

Part 5: The `scan` Method

    def scan(self):
        """
        Run all security checks on the domain.

        Returns:
            dict containing all scan results
        """
        self.scan_time = datetime.now(timezone.utc)

        self.results = {
            'domain': self.domain,
            'scan_time': self.scan_time.isoformat(),
            'dns': dns_scanner.scan_dns(self.domain),
            'ssl': ssl_scanner.scan_ssl(self.domain),
            'headers': headers_scanner.scan_headers(self.domain)
        }

        return self.results

This is the main method – it does the actual work.

Recording the Scan Time

        self.scan_time = datetime.now(timezone.utc)

We capture the current time in UTC (Coordinated Universal Time).

Why UTC?

UTC is the universal reference time – it doesn't change with daylight saving or location. If your server is in New York and a user is in Tokyo, both see the same UTC time. This avoids confusion like "the scan says 3pm but it's 9am here."

What does this return?

datetime.now(timezone.utc)
# datetime.datetime(2024, 12, 27, 15, 30, 45, 123456, tzinfo=datetime.timezone.utc)

A datetime object with year, month, day, hour, minute, second, microsecond, and timezone info.
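To illustrate the "same instant, different wall clocks" point, here is a sketch that converts one UTC timestamp for two readers. It uses fixed offsets as a simplification: Tokyo is UTC+9 all year, and New York is UTC-5 in winter. A real application would more likely use named time zones.

```python
from datetime import datetime, timedelta, timezone

# A fixed moment in UTC (the sample timestamp used in this post).
scan_time = datetime(2024, 12, 27, 15, 30, 45, tzinfo=timezone.utc)

# Fixed offsets for illustration only (assumption: no daylight saving handling).
tokyo = timezone(timedelta(hours=9))
new_york_winter = timezone(timedelta(hours=-5))

in_tokyo = scan_time.astimezone(tokyo)
in_new_york = scan_time.astimezone(new_york_winter)

print(in_tokyo.isoformat())     # 2024-12-28T00:30:45+09:00
print(in_new_york.isoformat())  # 2024-12-27T10:30:45-05:00

# Different wall-clock readings, same instant.
same_instant = in_tokyo == scan_time == in_new_york
print(same_instant)  # True
```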

Building the Results Dictionary

        self.results = {
            'domain': self.domain,
            'scan_time': self.scan_time.isoformat(),
            'dns': dns_scanner.scan_dns(self.domain),
            'ssl': ssl_scanner.scan_ssl(self.domain),
            'headers': headers_scanner.scan_headers(self.domain)
        }

Let's break down each key:

'domain': self.domain

Include the domain in results. This might seem redundant (we already have scanner.domain), but it makes the results self-contained. If you save results to a file or database, you'll know which domain they're for without external context.

'scan_time': self.scan_time.isoformat()

The .isoformat() method converts the datetime to a string:

datetime(2024, 12, 27, 15, 30, 45, tzinfo=timezone.utc).isoformat()
# '2024-12-27T15:30:45+00:00'

ISO 8601 format is:

Human-readable
Machine-parseable
Sortable (alphabetical order = chronological order, as long as every timestamp uses the same offset, which ours do because we always record UTC)
Standard across programming languages
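The sortable and machine-parseable claims are easy to check. This sketch uses made-up scan times a week and an hour apart.

```python
from datetime import datetime, timedelta, timezone

base = datetime(2024, 12, 27, 15, 30, 45, 123456, tzinfo=timezone.utc)
scan_times = [base + timedelta(days=7), base, base - timedelta(hours=1)]

stamps = [t.isoformat() for t in scan_times]

# Plain string sorting gives chronological order (all stamps share the +00:00 offset).
sorted_stamps = sorted(stamps)
print(sorted_stamps[0])   # 2024-12-27T14:30:45.123456+00:00
print(sorted_stamps[-1])  # 2025-01-03T15:30:45.123456+00:00

# Machine-parseable: the string converts back to an equal datetime.
parsed = datetime.fromisoformat(stamps[1])
print(parsed == base)  # True
```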

'dns': dns_scanner.scan_dns(self.domain)

Call our DNS scanner module. This runs all DNS checks (A, MX, SPF, DMARC) and returns a dictionary. The entire result becomes the value of the 'dns' key.

'ssl': ssl_scanner.scan_ssl(self.domain)

Call our SSL scanner module. Checks certificate validity, expiration, issuer, etc.

'headers': headers_scanner.scan_headers(self.domain)

Call our headers scanner module. Checks for security headers like HSTS, CSP, X-Frame-Options.

What the Results Look Like

After scanning github.com, self.results contains:

{
    'domain': 'github.com',
    'scan_time': '2024-12-27T15:30:45.123456+00:00',
    'dns': {
        'a_records': {'success': True, 'records': ['20.207.73.82']},
        'mx_records': {'success': True, 'records': [...]},
        'spf': {'success': True, 'record': 'v=spf1 ...'},
        'dmarc': {'success': True, 'record': 'v=DMARC1; p=reject ...'}
    },
    'ssl': {
        'success': True,
        'valid': True,
        'common_name': 'github.com',
        'issuer': 'Sectigo Limited',
        'days_until_expiry': 405,
        ...
    },
    'headers': {
        'success': True,
        'status_code': 200,
        'headers': {
            'Strict-Transport-Security': {'present': True, 'value': '...'},
            'Content-Security-Policy': {'present': True, 'value': '...'},
            ...
        }
    }
}

A nested dictionary containing everything we learned about the domain.
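Because the dictionary holds only plain values (strings, numbers, booleans, lists, and other dictionaries), it can be saved as JSON and loaded back unchanged. That is one practical reason to store scan_time as an ISO string instead of a datetime object. A minimal sketch with a trimmed-down sample dictionary:

```python
import json
from datetime import datetime, timezone

scan_time = datetime(2024, 12, 27, 15, 30, 45, 123456, tzinfo=timezone.utc)

# A trimmed-down results dictionary in the same shape as above (sample values).
results = {
    'domain': 'github.com',
    'scan_time': scan_time.isoformat(),
    'ssl': {'success': True, 'valid': True},
}

# Plain values serialize directly and survive the round trip.
saved = json.dumps(results, indent=2)
restored = json.loads(saved)
print(restored['domain'])   # github.com
print(restored == results)  # True

# A raw datetime object is not JSON serializable.
raw_datetime_fails = False
try:
    json.dumps({'scan_time': scan_time})
except TypeError as exc:
    raw_datetime_fails = True
    print(f"TypeError: {exc}")
```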

Storing and Returning Results

        self.results = {...}

        return self.results

We do both:

Store in self.results – So we can access later via scanner.results
Return the value – So callers can use it immediately

# Both of these work:
scanner.scan()
results = scanner.results  # Access stored attribute

# Or:
results = scanner.scan()  # Use return value directly

Part 6: The `get_results` Method

    def get_results(self):
        """
        Get the scan results.

        Returns:
            dict with results, or None if scan hasn't run
        """
        return self.results

Why Have This Method?

You might wonder – can't we just access scanner.results directly? Yes, you can. So why have get_results()?

1. Encapsulation (Future-Proofing)

Right now, get_results() just returns self.results. But later we might add logic:

def get_results(self):
    if self.results is None:
        raise ValueError("No scan has been run yet")
    return self.results

Or:

def get_results(self, format='dict'):
    # Needs "import json" at the top of the module
    if format == 'json':
        return json.dumps(self.results)
    return self.results

Code that uses scanner.get_results() doesn't need to change when we add these features. Code that accesses scanner.results directly would.

2. Convention

In many programming languages, direct attribute access is discouraged. Python is more relaxed, but having getter methods makes the API clearer – especially for developers coming from other languages.

3. IDE Autocomplete

When you type scanner. in an IDE, methods appear in the autocomplete list with their docstrings. Direct attributes don't always show up as clearly.

When to Use Direct Access vs Methods

| Direct Access (`scanner.results`) | Method (`scanner.get_results()`) |
| --- | --- |
| Quick scripts | Library/API code |
| Internal code | Public interfaces |
| Simple data retrieval | When logic might be added later |

For our project, either approach works. Having both options gives flexibility.

Part 7: How the Scanner Coordinates

Let's visualize what happens when you use the Scanner:

┌─────────────────────────────────────────────────────────────────┐
│                         User Code                               │
│                                                                 │
│   scanner = Scanner("github.com")                               │
│   results = scanner.scan()                                      │
│                                                                 │
└───────────────────────────────┬─────────────────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────┐
│                    Scanner.scan() method                        │
│                                                                 │
│   1. Record scan time                                           │
│   2. Call dns_scanner.scan_dns("github.com")                    │
│   3. Call ssl_scanner.scan_ssl("github.com")                    │
│   4. Call headers_scanner.scan_headers("github.com")            │
│   5. Combine results into one dictionary                        │
│   6. Store in self.results                                      │
│   7. Return results                                             │
│                                                                 │
└───────────────────────────────┬─────────────────────────────────┘
                                │
        ┌───────────────────────┼───────────────────────┐
        │                       │                       │
        ▼                       ▼                       ▼
┌───────────────┐     ┌───────────────┐     ┌───────────────┐
│  dns_scanner  │     │  ssl_scanner  │     │headers_scanner│
│               │     │               │     │               │
│ • A records   │     │ • Certificate │     │ • HSTS        │
│ • MX records  │     │ • Expiration  │     │ • CSP         │
│ • SPF         │     │ • Issuer      │     │ • X-Frame     │
│ • DMARC       │     │ • Validity    │     │ • etc.        │
│               │     │               │     │               │
└───────┬───────┘     └───────┬───────┘     └───────┬───────┘
        │                     │                     │
        │    Individual       │    scanner          │
        │    results          │    results          │
        │                     │                     │
        └─────────────────────┼─────────────────────┘
                              │
                              ▼
                    Combined Results Dictionary

The Scanner doesn't know how DNS lookups work. It doesn't know about SSL handshakes or HTTP headers. It just knows:

"I need DNS results from dns_scanner"
"I need SSL results from ssl_scanner"
"I need header results from headers_scanner"
"I combine them and return everything"

This is delegation – the Scanner delegates specific tasks to specialized modules.
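One way to see delegation in isolation is to run the same coordination logic against stand-in checks that never touch the network. The sketch below is not the project's Scanner. DemoScanner and the three fake_* functions are hypothetical, and passing the checks in as a dictionary is a choice made only for this demonstration.

```python
from datetime import datetime, timezone


def fake_dns(domain):
    """Stand-in for dns_scanner.scan_dns (hypothetical stub, no network)."""
    return {'a_records': {'success': True, 'records': ['192.0.2.1']}}


def fake_ssl(domain):
    """Stand-in for ssl_scanner.scan_ssl (hypothetical stub)."""
    return {'success': True, 'valid': True}


def fake_headers(domain):
    """Stand-in for headers_scanner.scan_headers (hypothetical stub)."""
    return {'success': True, 'headers': {}}


class DemoScanner:
    """Same coordination idea as Scanner, with the checks passed in."""

    def __init__(self, domain, checks):
        self.domain = domain.lower().strip()
        self.checks = checks  # maps result key -> function taking a domain
        self.results = None
        self.scan_time = None

    def scan(self):
        self.scan_time = datetime.now(timezone.utc)
        self.results = {
            'domain': self.domain,
            'scan_time': self.scan_time.isoformat(),
        }
        # Delegate: call each check and store whatever it returns.
        for key, check in self.checks.items():
            self.results[key] = check(self.domain)
        return self.results


demo = DemoScanner(" GitHub.com ", {'dns': fake_dns, 'ssl': fake_ssl, 'headers': fake_headers})
demo_results = demo.scan()
print(list(demo_results))  # ['domain', 'scan_time', 'dns', 'ssl', 'headers']
```

The coordinator never inspects what a check returns. It only decides which checks run and where their results are stored.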

Part 8: Using the Scanner

Let's see the Scanner in action:

Basic Usage

from scanner import Scanner

# Create a scanner for GitHub
scanner = Scanner("github.com")

# Run the scan
results = scanner.scan()

# Access results
print(f"Domain: {results['domain']}")
print(f"Scanned at: {results['scan_time']}")

# Check DNS
if results['dns']['spf']['success']:
    print(f"SPF: {results['dns']['spf']['record']}")

# Check SSL (.get() avoids a KeyError if the SSL scan failed)
if results['ssl'].get('valid'):
    print(f"Certificate expires in {results['ssl']['days_until_expiry']} days")

# Check Headers (the inner 'headers' key is missing if the request failed)
for header, data in results['headers'].get('headers', {}).items():
    status = "✓" if data['present'] else "✗"
    print(f"{status} {header}")

Multiple Domains

domains = ["github.com", "google.com", "example.com"]

for domain in domains:
    scanner = Scanner(domain)
    results = scanner.scan()

    # Quick summary
    ssl_status = "✓" if results['ssl'].get('valid') else "✗"
    headers_count = sum(1 for h in results['headers'].get('headers', {}).values() if h['present'])

    print(f"{domain}: SSL {ssl_status}, {headers_count}/6 headers")

Output:

github.com: SSL ✓, 6/6 headers
google.com: SSL ✓, 4/6 headers
example.com: SSL ✓, 0/6 headers

Checking Scan Status

scanner = Scanner("github.com")

# Before scanning
print(scanner.results)    # None
print(scanner.scan_time)  # None

# After scanning
scanner.scan()
print(scanner.results)    # {...full results...}
print(scanner.scan_time)  # 2024-12-27 15:30:45.123456+00:00

Part 9: Error Handling

What happens when something goes wrong? Each scanner module handles its own errors:

# If DNS fails
results['dns']['a_records'] = {'success': False, 'error': 'Domain does not exist'}

# If SSL fails
results['ssl'] = {'success': False, 'error': 'Connection timed out'}

# If headers fail
results['headers'] = {'success': False, 'error': 'SSL Error: certificate verify failed'}

The Scanner doesn't need to handle these – it just passes through whatever the individual modules return. Each part can fail independently:

{
    'domain': 'sketchy-site.com',
    'dns': {'a_records': {'success': True, ...}},  # DNS worked
    'ssl': {'success': False, 'error': '...'},     # SSL failed
    'headers': {'success': False, 'error': '...'}  # Headers failed (SSL needed)
}

This tells us: the domain exists (DNS worked), but something's wrong with HTTPS.
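This pass-through works as long as every module catches its own exceptions and returns a dictionary. If a module ever did raise, scan() as written would raise too and the other results would be lost. One possible safeguard is sketched below with a hypothetical run_check helper and stub checks. None of these names exist in the project code.

```python
def run_check(check, domain):
    """Call a scanner function; turn an unexpected exception into an error dict."""
    try:
        return check(domain)
    except Exception as exc:  # broad on purpose: one broken check should not stop the rest
        return {'success': False, 'error': f"{type(exc).__name__}: {exc}"}


def working_check(domain):
    """Hypothetical stub that behaves."""
    return {'success': True, 'domain': domain}


def broken_check(domain):
    """Hypothetical stub that raises instead of returning an error dict."""
    raise TimeoutError("Connection timed out")


ok = run_check(working_check, "example.com")
failed = run_check(broken_check, "example.com")
print(ok)      # {'success': True, 'domain': 'example.com'}
print(failed)  # {'success': False, 'error': 'TimeoutError: Connection timed out'}
```

The error dictionary mirrors the {'success': False, 'error': ...} shape the modules already return, so code reading the results would not need a special case.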

Part 10: How This Fits Into the Project

Here's our updated project structure:

scanner/
├── __init__.py         ✓ Exports Scanner class
├── scanner.py          ✓ Main Scanner class (this post)
├── dns_scanner.py      ✓ DNS checks
├── ssl_scanner.py      ✓ SSL checks
└── headers_scanner.py  ✓ Header checks

The scanner/__init__.py makes the Scanner class easy to import:

"""Scanner package for domain security analysis."""

from .scanner import Scanner

__all__ = ['Scanner']

Now external code can do:

from scanner import Scanner  # Clean import

scanner = Scanner("github.com")
results = scanner.scan()

Instead of:

from scanner.scanner import Scanner  # Awkward

Summary: What We've Learned

Classes bundle data and behavior – The Scanner holds a domain, results, and scan time together with methods to work on them
__init__ initializes object state – Called automatically when creating objects, it sets up instance attributes
self refers to the current object – Every method receives it, allowing access to the object's data
Relative imports stay within the package – from . import module finds sibling files in the same folder
Delegation separates concerns – The Scanner coordinates without knowing implementation details of DNS, SSL, or header checking
Store and return gives flexibility – Users can access results immediately or later through the stored attribute

The Complete Scanner Package

We now have a complete scanning package:

| File | Purpose | Key Export |
| --- | --- | --- |
| `__init__.py` | Package interface | `Scanner` |
| `scanner.py` | Coordination | `Scanner` class |
| `dns_scanner.py` | DNS checks | `scan_dns()` |
| `ssl_scanner.py` | SSL checks | `scan_ssl()` |
| `headers_scanner.py` | Header checks | `scan_headers()` |

All the scanning logic is encapsulated. External code just needs:

from scanner import Scanner
scanner = Scanner("example.com")
results = scanner.scan()

What's Next

The Scanner gives us raw data – a nested dictionary with all our findings. But working with nested dictionaries is tedious:

# This is annoying to write repeatedly
if results['ssl'].get('success') and results['ssl'].get('valid'):
    days = results['ssl']['days_until_expiry']

In the next post, we'll build the models package – a new folder containing:

models/
├── __init__.py    # Package interface
└── report.py      # ScanReport class

The ScanReport class wraps the results dictionary and provides convenient properties:

# Much cleaner!
if report.has_valid_ssl:
    days = report.ssl_days_remaining

We'll add properties like has_valid_ssl, has_spf, has_dmarc, missing_headers, and present_headers – making the data easy to work with for our CLI display and future scoring engine.

This is Part 10 of the Domain Security Analyzer series.

Find the code on GitHub.
