Building a Domain Security Analyzer – The Main Scanner Class
In the last post, we completed the HTTP headers scanner. Now we have three independent scanner modules – DNS, SSL, and headers. Each one works on its own, but our users shouldn't have to call three separate functions and combine the results themselves.
Today we build the coordinator – the main Scanner class that ties everything together.
Today's Goal
By the end of this post, you'll understand:
Why we need a coordinating class – The difference between functions and classes
How __init__ initializes objects – Setting up state before scanning
How relative imports work – The from . import syntax
Why we store results as instance attributes – Accessing data after scanning
The Complete Code
Here's the full scanner/scanner.py file. We'll break down every piece:
"""Main scanner that coordinates all security checks."""
from datetime import datetime, timezone
from . import dns_scanner
from . import ssl_scanner
from . import headers_scanner


class Scanner:
    """
    Scans a domain for security configuration.
    Runs DNS, SSL, and HTTP header checks and returns
    a comprehensive result dictionary.
    """

    def __init__(self, domain):
        """
        Initialize scanner with target domain.
        Args:
            domain: The domain to scan (e.g., 'github.com')
        """
        self.domain = domain.lower().strip()
        self.results = None
        self.scan_time = None

    def scan(self):
        """
        Run all security checks on the domain.
        Returns:
            dict containing all scan results
        """
        self.scan_time = datetime.now(timezone.utc)
        self.results = {
            'domain': self.domain,
            'scan_time': self.scan_time.isoformat(),
            'dns': dns_scanner.scan_dns(self.domain),
            'ssl': ssl_scanner.scan_ssl(self.domain),
            'headers': headers_scanner.scan_headers(self.domain)
        }
        return self.results

    def get_results(self):
        """
        Get the scan results.
        Returns:
            dict with results, or None if scan hasn't run
        """
        return self.results
Now let's understand every piece.
Part 1: The Module Docstring and Imports
"""Main scanner that coordinates all security checks."""
from datetime import datetime, timezone
from . import dns_scanner
from . import ssl_scanner
from . import headers_scanner
The Docstring
"""Main scanner that coordinates all security checks."""
Same pattern as our other modules – a brief description of the file's purpose. This module's job is coordination, not implementation. It doesn't know how to check DNS records or validate certificates. It just knows to call the modules that do.
The datetime Import
from datetime import datetime, timezone
We import two things from Python's built-in datetime module:
datetime – A class for representing dates and times
timezone – A class for handling time zones
We'll use these to timestamp when each scan occurs. Recording the scan time is important because:
Security configurations change over time
Users might want to compare "what was the status last week vs today"
Debugging requires knowing when something was checked
The Relative Imports
from . import dns_scanner
from . import ssl_scanner
from . import headers_scanner
These three lines import our scanner modules. The dot (.) means "from the current package" – in other words, from the same scanner/ folder where this file lives.
Why from . import instead of just import?
Let's understand the difference:
# This searches Python's module search path (sys.path) for a top-level module named dns_scanner
import dns_scanner # Fails when scanner/ is imported as a package: no top-level dns_scanner exists
# This looks for dns_scanner in the current package
from . import dns_scanner # Works – finds scanner/dns_scanner.py
The relative import tells Python: "Don't search everywhere. Look right here, in the same folder as me."
Visualizing the imports:
scanner/
├── __init__.py
├── scanner.py ← We're here
├── dns_scanner.py ← from . import dns_scanner finds this
├── ssl_scanner.py ← from . import ssl_scanner finds this
└── headers_scanner.py ← from . import headers_scanner finds this
After these imports, we can use:
dns_scanner.scan_dns(domain)
ssl_scanner.scan_ssl(domain)
headers_scanner.scan_headers(domain)
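To see a relative import resolve without touching the real project, the sketch below builds a throwaway package in a temporary folder and imports it. The names demo_pkg, helper, and main are made up for this illustration and are not part of the scanner package.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package on disk (all names here are hypothetical).
tmp_dir = tempfile.mkdtemp()
pkg_dir = Path(tmp_dir) / "demo_pkg"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("")
(pkg_dir / "helper.py").write_text("def greet():\n    return 'hello from helper'\n")

# main.py reaches its sibling module with a relative import.
(pkg_dir / "main.py").write_text(
    "from . import helper\n\n"
    "def run():\n"
    "    return helper.greet()\n"
)

# Make the parent folder importable, then import the module through its package.
sys.path.insert(0, tmp_dir)
main = importlib.import_module("demo_pkg.main")
message = main.run()
print(message)
```

The import works because main is loaded as demo_pkg.main, so Python knows which package the dot refers to. Running main.py directly as a script would instead raise an ImportError about a relative import with no known parent package.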
Part 2: Why a Class Instead of Functions?
Before we dive into the code, let's understand why we're using a class here when our individual scanners used plain functions.
The Function Approach
We could have written this as a simple function:
def scan_domain(domain):
    return {
        'domain': domain,
        'dns': dns_scanner.scan_dns(domain),
        'ssl': ssl_scanner.scan_ssl(domain),
        'headers': headers_scanner.scan_headers(domain)
    }
This works! Call scan_domain("github.com") and get results. So why bother with a class?
The Problem with Functions
A plain function like scan_domain keeps nothing between calls: it runs, returns a value, and its local variables are gone. But scanning has state we might want to keep:
# With a function, you'd have to do this:
results = scan_domain("github.com")
# Later...
domain = results['domain'] # Have to dig into the results
scan_time = results.get('scan_time') # None: scan_domain() above never recorded a scan time
What if you want to:
Check which domain was scanned without parsing results?
Know if a scan has been run yet?
Re-scan the same domain?
Access the scan time separately?
The Class Approach
A class bundles data and behavior together:
# Create a scanner for a specific domain
scanner = Scanner("github.com")
# The domain is stored
print(scanner.domain) # "github.com"
# Run the scan
scanner.scan()
# Access results anytime
print(scanner.results)
print(scanner.scan_time)
# Re-scan if needed
scanner.scan() # New results, same domain
The scanner "remembers" its domain and results. You can create multiple scanners for different domains and they don't interfere with each other:
github_scanner = Scanner("github.com")
google_scanner = Scanner("google.com")
github_scanner.scan()
google_scanner.scan()
# Each scanner has its own results
print(github_scanner.results) # GitHub's results
print(google_scanner.results) # Google's results
When to Use Classes vs Functions
| Use Functions When | Use Classes When |
| --- | --- |
| One-shot operations | Need to track state |
| No state to remember | Multiple related operations |
| Simple input → output | Object has identity (this scanner vs that scanner) |
| Utility operations | Will be extended/subclassed |
Our individual scanner modules (dns_scanner, etc.) use functions because each scan is independent – call it, get results, done.
The main Scanner uses a class because it represents "a scanner configured for a specific domain" – an object with identity and state.
Part 3: The Class Definition
class Scanner:
    """
    Scans a domain for security configuration.
    Runs DNS, SSL, and HTTP header checks and returns
    a comprehensive result dictionary.
    """
What is a Class?
A class is a blueprint for creating objects. Think of it like a cookie cutter – the class defines the shape, and each object is a cookie made from that cutter.
class Scanner: # The blueprint
    ...
# Creating objects from the blueprint
scanner1 = Scanner("github.com") # One cookie
scanner2 = Scanner("google.com") # Another cookie
Each scanner object has its own domain, results, and scan_time. They're independent instances of the same blueprint.
The Class Docstring
"""
Scans a domain for security configuration.
Runs DNS, SSL, and HTTP header checks and returns
a comprehensive result dictionary.
"""
This documents what the class does. When someone types help(Scanner), they see this description. Good docstrings explain:
What the class represents
What it does
How to use it (though we'll add more detail in method docstrings)
Part 4: The __init__ Method
def __init__(self, domain):
    """
    Initialize scanner with target domain.
    Args:
        domain: The domain to scan (e.g., 'github.com')
    """
    self.domain = domain.lower().strip()
    self.results = None
    self.scan_time = None
What is __init__?
__init__ is a special method, properly called the initializer and often loosely called the constructor. Python calls it automatically right after it creates a new object:
scanner = Scanner("github.com")
#         ↑
#         This triggers __init__ with domain="github.com"
The double underscores (called "dunder" for "double underscore") mark this as a special Python method. You normally don't call __init__ yourself; Python calls it for you when you write Scanner(...).
Understanding self
Every regular method in a class receives self as its first parameter. self refers to the specific object being worked on.
def __init__(self, domain):
#            ↑
#            self = the new Scanner object being created
When you write:
scanner = Scanner("github.com")
Python creates a new Scanner object and passes it as self to __init__. The domain parameter gets the value "github.com".
Setting Instance Attributes
self.domain = domain.lower().strip()
self.results = None
self.scan_time = None
These lines create instance attributes – data that belongs to this specific Scanner object.
self.domain = domain.lower().strip()
We store the domain name, but first we clean it:
.lower() – Convert to lowercase ("GitHub.com" → "github.com")
.strip() – Remove leading/trailing whitespace (" github.com " → "github.com")
Why clean the input? Users might type " GitHub.COM " and expect it to work. By normalizing here, we handle variations gracefully.
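A quick way to check what this clean-up does and does not cover is to run it on a few inputs. The helper name normalize_domain below is hypothetical; it only repeats the two calls from __init__.

```python
def normalize_domain(raw):
    """Apply the same clean-up as Scanner.__init__ (hypothetical helper)."""
    return raw.lower().strip()


# Case and surrounding whitespace (including a trailing newline) are normalized.
cleaned = [normalize_domain(value) for value in ["GitHub.com", "  github.com  ", " GitHub.COM\n"]]
print(cleaned)

# A scheme or trailing slash is left in place, so callers still need to pass a bare domain.
with_scheme = normalize_domain("HTTPS://GitHub.com/")
print(with_scheme)
```

So the normalization handles typing variations, but it is not a validator: something like a pasted URL would still reach the scanner modules unchanged apart from its case.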
self.results = None
We initialize results to None. This indicates "no scan has been run yet." After calling .scan(), this will hold the results dictionary.
self.scan_time = None
Same idea – None until a scan runs, then it holds the timestamp.
Why Initialize to None?
We could skip these lines and only create the attributes when needed:
def __init__(self, domain):
    self.domain = domain.lower().strip()
    # Don't define self.results or self.scan_time yet

def scan(self):
    self.scan_time = datetime.now(timezone.utc)
    self.results = {...}
But this creates problems:
scanner = Scanner("github.com")
print(scanner.results) # AttributeError: 'Scanner' object has no attribute 'results'
By initializing to None in __init__, we guarantee these attributes always exist:
scanner = Scanner("github.com")
print(scanner.results) # None (not an error)
if scanner.results is None:
    print("No scan yet")
This is a defensive programming pattern – define all attributes upfront so code that accesses them doesn't crash.
Part 5: The scan Method
def scan(self):
    """
    Run all security checks on the domain.
    Returns:
        dict containing all scan results
    """
    self.scan_time = datetime.now(timezone.utc)
    self.results = {
        'domain': self.domain,
        'scan_time': self.scan_time.isoformat(),
        'dns': dns_scanner.scan_dns(self.domain),
        'ssl': ssl_scanner.scan_ssl(self.domain),
        'headers': headers_scanner.scan_headers(self.domain)
    }
    return self.results
This is the main method – it does the actual work.
Recording the Scan Time
self.scan_time = datetime.now(timezone.utc)
We capture the current time in UTC (Coordinated Universal Time).
Why UTC?
UTC is the universal reference time – it doesn't change with daylight saving or location. If your server is in New York and a user is in Tokyo, both see the same UTC time. This avoids confusion like "the scan says 3pm but it's 9am here."
What does this return?
datetime.now(timezone.utc)
# datetime.datetime(2024, 12, 27, 15, 30, 45, 123456, tzinfo=datetime.timezone.utc)
A datetime object with year, month, day, hour, minute, second, microsecond, and timezone info.
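The point about one shared reference time can be checked with a short sketch. The two fixed offsets below are assumptions chosen for illustration (real time zones also follow daylight-saving rules); they are not something the scanner uses.

```python
from datetime import datetime, timedelta, timezone

# One instant, captured in UTC exactly as scan() does.
scan_time = datetime.now(timezone.utc)

# Fixed offsets used purely for illustration.
tokyo = timezone(timedelta(hours=9))
new_york_winter = timezone(timedelta(hours=-5))

in_tokyo = scan_time.astimezone(tokyo)
in_new_york = scan_time.astimezone(new_york_winter)

print(scan_time.isoformat())
print(in_tokyo.isoformat())
print(in_new_york.isoformat())

# The wall-clock readings differ, but all three describe the same instant.
same_instant = scan_time == in_tokyo == in_new_york
print(same_instant)
```

Storing the UTC value and converting only when displaying keeps the stored data unambiguous.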
Building the Results Dictionary
self.results = {
    'domain': self.domain,
    'scan_time': self.scan_time.isoformat(),
    'dns': dns_scanner.scan_dns(self.domain),
    'ssl': ssl_scanner.scan_ssl(self.domain),
    'headers': headers_scanner.scan_headers(self.domain)
}
Let's break down each key:
'domain': self.domain
Include the domain in results. This might seem redundant (we already have scanner.domain), but it makes the results self-contained. If you save results to a file or database, you'll know which domain they're for without external context.
'scan_time': self.scan_time.isoformat()
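The .isoformat() method converts the datetime to a string. For a datetime without timezone info it looks like this; our timezone-aware scan time also gets microseconds and a +00:00 suffix: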
datetime(2024, 12, 27, 15, 30, 45).isoformat()
# '2024-12-27T15:30:45'
ISO 8601 format is:
Human-readable
Machine-parseable
Sortable (alphabetical order = chronological order, as long as every timestamp uses the same UTC offset)
Standard across programming languages
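As a small illustration of the "machine-parseable" and "sortable" points, the sketch below formats two made-up scan times, sorts the strings, and parses one back. It relies only on the standard library's isoformat() and fromisoformat().

```python
from datetime import datetime, timezone

# Two made-up scan times, both in UTC.
first = datetime(2024, 12, 27, 15, 30, 45, tzinfo=timezone.utc)
second = datetime(2025, 1, 3, 9, 0, 0, tzinfo=timezone.utc)

stamps = [second.isoformat(), first.isoformat()]
print(stamps)

# Plain string sorting puts the earlier scan first because both stamps share the +00:00 offset.
ordered = sorted(stamps)
print(ordered)

# fromisoformat() turns the string back into an equal datetime.
restored = datetime.fromisoformat(ordered[0])
print(restored == first)
```

This is why the string form is safe to store in a file or database: nothing is lost when it is read back.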
'dns': dns_scanner.scan_dns(self.domain)
Call our DNS scanner module. This runs all DNS checks (A, MX, SPF, DMARC) and returns a dictionary. The entire result becomes the value of the 'dns' key.
'ssl': ssl_scanner.scan_ssl(self.domain)
Call our SSL scanner module. Checks certificate validity, expiration, issuer, etc.
'headers': headers_scanner.scan_headers(self.domain)
Call our headers scanner module. Checks for security headers like HSTS, CSP, X-Frame-Options.
What the Results Look Like
After scanning github.com, self.results contains:
{
    'domain': 'github.com',
    'scan_time': '2024-12-27T15:30:45.123456+00:00',
    'dns': {
        'a_records': {'success': True, 'records': ['20.207.73.82']},
        'mx_records': {'success': True, 'records': [...]},
        'spf': {'success': True, 'record': 'v=spf1 ...'},
        'dmarc': {'success': True, 'record': 'v=DMARC1; p=reject ...'}
    },
    'ssl': {
        'success': True,
        'valid': True,
        'common_name': 'github.com',
        'issuer': 'Sectigo Limited',
        'days_until_expiry': 405,
        ...
    },
    'headers': {
        'success': True,
        'status_code': 200,
        'headers': {
            'Strict-Transport-Security': {'present': True, 'value': '...'},
            'Content-Security-Policy': {'present': True, 'value': '...'},
            ...
        }
    }
}
A nested dictionary containing everything we learned about the domain.
Storing and Returning Results
self.results = {...}
return self.results
We do both:
Store in self.results – So we can access later via scanner.results
Return the value – So callers can use it immediately
# Both of these work:
scanner.scan()
results = scanner.results # Access stored attribute
# Or:
results = scanner.scan() # Use return value directly
Part 6: The get_results Method
def get_results(self):
"""
Get the scan results.
Returns:
dict with results, or None if scan hasn't run
"""
return self.results
Why Have This Method?
You might wonder – can't we just access scanner.results directly? Yes, you can. So why have get_results()?
1. Encapsulation (Future-Proofing)
Right now, get_results() just returns self.results. But later we might add logic:
def get_results(self):
    if self.results is None:
        raise ValueError("No scan has been run yet")
    return self.results
Or:
import json # needed at the top of scanner.py for this variant

def get_results(self, format='dict'):
    if format == 'json':
        return json.dumps(self.results)
    return self.results
Code that uses scanner.get_results() doesn't need to change when we add these features. Code that accesses scanner.results directly would.
2. Convention
In many programming languages, direct attribute access is discouraged. Python is more relaxed, but having getter methods makes the API clearer – especially for developers coming from other languages.
3. IDE Autocomplete
When you type scanner. in an IDE, methods appear in the autocomplete list with their docstrings. Direct attributes don't always show up as clearly.
When to Use Direct Access vs Methods
| Direct Access (scanner.results) | Method (scanner.get_results()) |
| --- | --- |
| Quick scripts | Library/API code |
| Internal code | Public interfaces |
| Simple data retrieval | When logic might be added later |
For our project, either approach works. Having both options gives flexibility.
Part 7: How the Scanner Coordinates
Let's visualize what happens when you use the Scanner:
┌─────────────────────────────────────────────────────────────────┐
│ User Code │
│ │
│ scanner = Scanner("github.com") │
│ results = scanner.scan() │
│ │
└───────────────────────────────┬─────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Scanner.scan() method │
│ │
│ 1. Record scan time │
│ 2. Call dns_scanner.scan_dns("github.com") │
│ 3. Call ssl_scanner.scan_ssl("github.com") │
│ 4. Call headers_scanner.scan_headers("github.com") │
│ 5. Combine results into one dictionary │
│ 6. Store in self.results │
│ 7. Return results │
│ │
└───────────────────────────────┬─────────────────────────────────┘
│
┌───────────────────────┼───────────────────────┐
│ │ │
▼ ▼ ▼
┌───────────────┐ ┌───────────────┐ ┌───────────────┐
│ dns_scanner │ │ ssl_scanner │ │headers_scanner│
│ │ │ │ │ │
│ • A records │ │ • Certificate │ │ • HSTS │
│ • MX records │ │ • Expiration │ │ • CSP │
│ • SPF │ │ • Issuer │ │ • X-Frame │
│ • DMARC │ │ • Validity │ │ • etc. │
│ │ │ │ │ │
└───────┬───────┘ └───────┬───────┘ └───────┬───────┘
│ │ │
│ Individual │ scanner │
│ results │ results │
│ │ │
└─────────────────────┼─────────────────────┘
│
▼
Combined Results Dictionary
The Scanner doesn't know how DNS lookups work. It doesn't know about SSL handshakes or HTTP headers. It just knows:
"I need DNS results from dns_scanner"
"I need SSL results from ssl_scanner"
"I need header results from headers_scanner"
"I combine them and return everything"
This is delegation – the Scanner delegates specific tasks to specialized modules.
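The delegation idea can be tried offline with stand-in checks. The sketch below is not the Scanner from this post: MiniScanner and the fake_scan_* functions are hypothetical, and passing the checks in as a dictionary is an assumption made so the example runs without any network access.

```python
from datetime import datetime, timezone


# Stand-in check functions; the real modules perform network lookups.
def fake_scan_dns(domain):
    return {'a_records': {'success': True, 'records': ['192.0.2.1']}}


def fake_scan_ssl(domain):
    return {'success': True, 'valid': True}


def fake_scan_headers(domain):
    return {'success': False, 'error': 'Connection timed out'}


class MiniScanner:
    """Coordinator that delegates to whatever check functions it is given."""

    def __init__(self, domain, checks):
        self.domain = domain.lower().strip()
        self.checks = checks
        self.results = None
        self.scan_time = None

    def scan(self):
        self.scan_time = datetime.now(timezone.utc)
        self.results = {
            'domain': self.domain,
            'scan_time': self.scan_time.isoformat(),
        }
        # Each check is called the same way; the coordinator never looks inside it.
        for name, check in self.checks.items():
            self.results[name] = check(self.domain)
        return self.results


mini = MiniScanner(" Example.COM ", {
    'dns': fake_scan_dns,
    'ssl': fake_scan_ssl,
    'headers': fake_scan_headers,
})
results = mini.scan()
print(sorted(results))
print(results is mini.results)
```

The coordinator's code stays the same whether a check succeeds or fails; it only collects what each delegate returns, and the returned dictionary is the same object that is stored on the instance.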
Part 8: Using the Scanner
Let's see the Scanner in action:
Basic Usage
from scanner import Scanner
# Create a scanner for GitHub
scanner = Scanner("github.com")
# Run the scan
results = scanner.scan()
# Access results
print(f"Domain: {results['domain']}")
print(f"Scanned at: {results['scan_time']}")
# Check DNS
if results['dns']['spf']['success']:
    print(f"SPF: {results['dns']['spf']['record']}")
# Check SSL ('valid' is missing when the SSL scan itself failed)
if results['ssl'].get('valid'):
    print(f"Certificate expires in {results['ssl']['days_until_expiry']} days")
# Check Headers ('headers' is missing when the request itself failed)
for header, data in results['headers'].get('headers', {}).items():
    status = "✓" if data['present'] else "✗"
    print(f"{status} {header}")
Multiple Domains
domains = ["github.com", "google.com", "example.com"]
for domain in domains:
    scanner = Scanner(domain)
    results = scanner.scan()
    # Quick summary
    ssl_status = "✓" if results['ssl'].get('valid') else "✗"
    # A failed headers scan has no 'headers' key, so fall back to an empty dict
    checked = results['headers'].get('headers', {})
    headers_count = sum(1 for h in checked.values() if h['present'])
    print(f"{domain}: SSL {ssl_status}, {headers_count}/{len(checked)} headers")
Output:
github.com: SSL ✓, 6/6 headers
google.com: SSL ✓, 4/6 headers
example.com: SSL ✓, 0/6 headers
Checking Scan Status
scanner = Scanner("github.com")
# Before scanning
print(scanner.results) # None
print(scanner.scan_time) # None
# After scanning
scanner.scan()
print(scanner.results) # {...full results...}
print(scanner.scan_time) # 2024-12-27 15:30:45.123456+00:00
Part 9: Error Handling
What happens when something goes wrong? Each scanner module handles its own errors:
# If DNS fails
results['dns']['a_records'] = {'success': False, 'error': 'Domain does not exist'}
# If SSL fails
results['ssl'] = {'success': False, 'error': 'Connection timed out'}
# If headers fail
results['headers'] = {'success': False, 'error': 'SSL Error: certificate verify failed'}
Because each module catches its own exceptions and returns an error dictionary, the Scanner doesn't need any error handling of its own. It passes through whatever the individual modules return, so each part can fail independently:
{
    'domain': 'sketchy-site.com',
    'dns': {'a_records': {'success': True, ...}}, # DNS worked
    'ssl': {'success': False, 'error': '...'}, # SSL failed
    'headers': {'success': False, 'error': '...'} # Headers failed (SSL needed)
}
This tells us: the domain exists (DNS worked), but something's wrong with HTTPS.
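Reading a result like this by eye gets tedious, so a small helper can list what failed. The sketch below is one possible approach, assuming the result shape shown above (a 'success' flag and an optional 'error' string per section); failed_checks is a hypothetical name, not part of the scanner package.

```python
def failed_checks(results):
    """Return {check_name: error} for every check that reported failure."""
    failures = {}
    # SSL and headers report success at the top level of their section.
    for name in ('ssl', 'headers'):
        section = results.get(name, {})
        if not section.get('success'):
            failures[name] = section.get('error', 'unknown error')
    # DNS results are nested one level deeper, one entry per record type.
    for record_type, section in results.get('dns', {}).items():
        if not section.get('success'):
            failures[f'dns.{record_type}'] = section.get('error', 'unknown error')
    return failures


sample = {
    'domain': 'sketchy-site.com',
    'dns': {'a_records': {'success': True, 'records': ['192.0.2.1']}},
    'ssl': {'success': False, 'error': 'Connection timed out'},
    'headers': {'success': False, 'error': 'SSL Error: certificate verify failed'},
}
failures = failed_checks(sample)
print(failures)
```

An empty dictionary from this helper would mean every section reported success.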
Part 10: How This Fits Into the Project
Here's our updated project structure:
scanner/
├── __init__.py ✓ Exports Scanner class
├── scanner.py ✓ Main Scanner class (this post)
├── dns_scanner.py ✓ DNS checks
├── ssl_scanner.py ✓ SSL checks
└── headers_scanner.py ✓ Header checks
The scanner/__init__.py makes the Scanner class easy to import:
"""Scanner package for domain security analysis."""
from .scanner import Scanner
__all__ = ['Scanner']
Now external code can do:
from scanner import Scanner # Clean import
scanner = Scanner("github.com")
results = scanner.scan()
Instead of:
from scanner.scanner import Scanner # Awkward
Summary: What We've Learned
Classes bundle data and behavior – The Scanner holds a domain, results, and scan time together with methods to work on them
__init__ initializes object state – Called automatically when creating objects, it sets up instance attributes
self refers to the current object – Every method receives it, allowing access to the object's data
Relative imports stay within the package – from . import module finds sibling files in the same folder
Delegation separates concerns – The Scanner coordinates without knowing implementation details of DNS, SSL, or header checking
Store and return gives flexibility – Users can access results immediately or later through the stored attribute
The Complete Scanner Package
We now have a complete scanning package:
| File | Purpose | Key Export |
| --- | --- | --- |
| __init__.py | Package interface | Scanner |
| scanner.py | Coordination | Scanner class |
| dns_scanner.py | DNS checks | scan_dns() |
| ssl_scanner.py | SSL checks | scan_ssl() |
| headers_scanner.py | Header checks | scan_headers() |
All the scanning logic is encapsulated. External code just needs:
from scanner import Scanner
scanner = Scanner("example.com")
results = scanner.scan()
What's Next
The Scanner gives us raw data – a nested dictionary with all our findings. But working with nested dictionaries is tedious:
# This is annoying to write repeatedly
if results['ssl'].get('success') and results['ssl'].get('valid'):
    days = results['ssl']['days_until_expiry']
In the next post, we'll build the models package – a new folder containing:
models/
├── __init__.py # Package interface
└── report.py # ScanReport class
The ScanReport class wraps the results dictionary and provides convenient properties:
# Much cleaner!
if report.has_valid_ssl:
    days = report.ssl_days_remaining
We'll add properties like has_valid_ssl, has_spf, has_dmarc, missing_headers, and present_headers – making the data easy to work with for our CLI display and future scoring engine.
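To give a feel for the idea before the next post, here is a rough sketch of how such properties could wrap the dictionary. It is only a guess at the shape: the class name ScanReportSketch and its internals are hypothetical, and the real ScanReport may be built differently.

```python
class ScanReportSketch:
    """Hypothetical wrapper around a results dictionary (not the real ScanReport)."""

    def __init__(self, results):
        self._results = results

    @property
    def has_valid_ssl(self):
        ssl = self._results.get('ssl', {})
        return bool(ssl.get('success') and ssl.get('valid'))

    @property
    def ssl_days_remaining(self):
        return self._results.get('ssl', {}).get('days_until_expiry')

    @property
    def missing_headers(self):
        headers = self._results.get('headers', {}).get('headers', {})
        return [name for name, data in headers.items() if not data.get('present')]


# Made-up sample data in the same shape as the Scanner's results.
sample = {
    'ssl': {'success': True, 'valid': True, 'days_until_expiry': 405},
    'headers': {
        'success': True,
        'headers': {
            'Strict-Transport-Security': {'present': True, 'value': 'max-age=31536000'},
            'Content-Security-Policy': {'present': False, 'value': None},
        },
    },
}
report = ScanReportSketch(sample)
print(report.has_valid_ssl, report.ssl_days_remaining, report.missing_headers)
```

The properties hide the nested lookups and the missing-key handling, so calling code reads as plain attribute access.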
This is Part 10 of the Domain Security Analyzer series.
Find the code on GitHub.