Getting Let’s Encrypt SSL Certificates on Linux and FreeBSD

First, install certbot, the Python client that reads the web server configuration and requests certificates from the Let’s Encrypt API.

On Debian or Ubuntu:

sudo apt install certbot
sudo apt install python3-certbot-nginx
sudo apt install python3-certbot-apache

On FreeBSD (the py37- package prefix tracks the ports tree’s default Python version, so it may differ on newer systems):

sudo pkg install py37-certbot
sudo pkg install py37-certbot-nginx
sudo pkg install py37-certbot-apache
sudo pkg install py37-acme

Note that certbot can only match virtual hosts that listen on port 80.

Run this command for Nginx:

sudo certbot certonly --nginx

Or for Apache:

sudo certbot certonly --apache

Certificates get saved in /etc/letsencrypt/live on Linux, or /usr/local/etc/letsencrypt/live on FreeBSD.

In each sub-directory, there will be 4 files created:

  • privkey.pem = The private key
  • cert.pem = The SSL certificate
  • fullchain.pem = SSL cert + Intermediate Cert chain. This format is required by NGINX and some other web servers
  • chain.pem = Just the intermediate cert
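
As a sketch of how these files are typically referenced in an Nginx server block (example.com is a placeholder domain; paths shown for Linux):

```nginx
server {
    listen 443 ssl;
    server_name example.com;

    # fullchain.pem = certificate + intermediate chain, the format Nginx requires
    ssl_certificate     /etc/letsencrypt/live/example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem;
}
```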

Here’s a Python script that will create a list of all directories with Let’s Encrypt certs:

#!/usr/bin/env python3

import sys, os

if "linux" in sys.platform:
    src_dir = "/etc/letsencrypt/live"
if "freebsd" in sys.platform:
    src_dir = "/usr/local/etc/letsencrypt/live"

sites = [ f.name for f in os.scandir(src_dir) if f.is_dir() ]
for site in sites:
    if os.path.exists(src_dir + "/" + site + "/cert.pem"):
        print("Letsencrypt certificate exists for site:", site)

Migrating from CGI to WSGI for Python Web Scripts on Apache

I finally began migrating some old scripts from PHP to Python late last year, and while I was happy to have my PHP days behind me, I noticed the execution speed was disappointing. On average, a Python CGI script would run 20-80% slower than an equivalent PHP script. At first I chalked it up to slower libraries, but even basic scripts that didn’t rely on a database or anything fancy still seemed to be incurring a performance hit.

Yesterday I happened to come across mention of WSGI, which is essentially a Python-specific replacement for CGI. I realized the overhead of CGI probably explained why my Python scripts were slower than PHP. So I wanted to give WSGI a spin and see if it could help.

WSGI itself is a Python interface specification; Apache support comes from the mod_wsgi module, which, like PHP’s module, is not included in many pre-packaged versions. So the first step is to install it.

On Debian/Ubuntu:

sudo apt-get install libapache2-mod-wsgi-py3

The install process should auto-activate the module.

cd /etc/apache2/mods-enabled/

ls -la wsgi*
lrwxrwxrwx 1 root root 27 Mar 23 22:13 wsgi.conf -> ../mods-available/wsgi.conf
lrwxrwxrwx 1 root root 27 Mar 23 22:13 wsgi.load -> ../mods-available/wsgi.load

On FreeBSD, the module does not get auto-activated and must be loaded via a config file:

sudo pkg install ap24-py37-mod_wsgi

# Create /usr/local/etc/apache24/Includes/wsgi.conf
# or similar, and add this line:
LoadModule wsgi_module libexec/apache24/mod_wsgi.so

Like CGI, the directory with the WSGI script will need special permissions. As a security best practice, it’s a good idea to have scripts located outside of any DocumentRoot, so the scripts can’t accidentally get served as plain files.

<Directory "/var/www/scripts">
  Require all granted
</Directory>

As for the WSGI script itself, it’s similar to an AWS Lambda handler, using a pre-defined function. However, it returns a list of byte strings rather than a dictionary. Here’s a simple one that will just spit out the host, path, and query string as JSON:

def application(environ, start_response):

    import json, traceback

    try:
        request = {
            'host': environ.get('HTTP_HOST', 'localhost'),
            'path': environ.get('REQUEST_URI', '/'),
            'query_string': {}
        }
        if '?' in request['path']:
            request['path'], query_string = request['path'].split('?', 1)
            for pair in query_string.split('&'):
                key, _, value = pair.partition('=')  # tolerates pairs without '='
                request['query_string'][key] = value

        output = json.dumps(request, sort_keys=True, indent=2)
        response_headers = [
            ('Content-type', 'application/json'),
            ('Content-Length', str(len(output))),
            ('X-Backend-Server', 'Apache + mod_wsgi')
        ]
        start_response('200 OK', response_headers)
        return [ output.encode('utf-8') ]
            
    except Exception:
        response_headers = [ ('Content-type', 'text/plain') ]
        start_response('500 Internal Server Error', response_headers)
        error = traceback.format_exc()
        return [ str(error).encode('utf-8') ]
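
The handler can be smoke-tested without Apache using wsgiref from the standard library. This sketch uses a trimmed-down stand-in for the application above, just to show the calling convention:

```python
from wsgiref.util import setup_testing_defaults

def application(environ, start_response):
    # Trimmed-down stand-in for the handler above (same signature)
    import json
    body = json.dumps({"path": environ.get("PATH_INFO", "/")}).encode("utf-8")
    start_response("200 OK", [("Content-Type", "application/json"),
                              ("Content-Length", str(len(body)))])
    return [body]

# Build a minimal fake WSGI environ and capture what the app sends back
environ = {}
setup_testing_defaults(environ)
captured = {}

def start_response(status, headers):
    captured["status"] = status
    captured["headers"] = dict(headers)

body = b"".join(application(environ, start_response))
print(captured["status"], body.decode("utf-8"))
# → 200 OK {"path": "/"}
```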

The last step is to route certain paths to the WSGI script. This is done in the Apache VirtualHost configuration:

WSGIPythonPath /var/www/scripts

<VirtualHost *:80>
  ServerName python.mydomain.com
  ServerAdmin nobody@mydomain.com
  DocumentRoot /home/www/html
  Header set Access-Control-Allow-Origin "*"
  Header set Access-Control-Allow-Methods "*"
  Header set Access-Control-Allow-Headers "Origin, X-Requested-With, Content-Type, Accept, Authorization"
  WSGIScriptAlias /myapp /var/www/scripts/myapp.wsgi
</VirtualHost>

Upon migrating a test URL from CGI to WSGI, the page load time dropped significantly.

The improvement comes from a 50-90% reduction in “wait” and “receive” times, as measured by ThousandEyes.

I’d next want to look at more advanced Python web frameworks like Flask, Bottle, WheezyWeb and Tornado. Django is of course a popular option too, but I know from experience it won’t be the fastest. Flask isn’t the fastest either, but it is the framework commonly used with Google App Engine, which I plan to learn after mastering AWS Lambda.

Working with HTTP Requests & Responses in Python

http.client

http.client is part of the Python standard library. It’s very fast, but also low-level, and takes a few more lines of code than the others.

import http.client
import ssl

try:
    # Note: certificate verification is disabled here for demonstration only
    conn = http.client.HTTPSConnection("www.hightail.com", port=443, timeout=5, context=ssl._create_unverified_context())
    conn.request("GET", "/en_US/theme_default/images/hightop_250px.png")
    resp = conn.getresponse()
    if resp.status in (301, 302):
        print("Status: {}\nLocation: {}\n".format(resp.status,resp.headers['Location']))
    else:
        print("Status: {}\nContent-Type: {}\n".format(resp.status, resp.headers['Content-Type']))

except Exception as e:
    print("Status: 500\nContent-Type: text/plain\n\n{}".format(e))

Requests

Requests is a third-party, high-level library. It has a simpler interface, but is much slower than http.client and is not natively available on AWS Lambda.

import requests

url = "http://www.yousendit.com"
try:
    resp = requests.get(url, params = {}, timeout = 5, allow_redirects = False)
    if resp.status_code in (301, 302):
        print("Status: {}\nLocation: {}\n".format(resp.status_code,resp.headers['Location']))
    else:
        print("Status: {}\nContent-Type: {}\n".format(resp.status_code, resp.headers['Content-Type']))

except Exception as e:
    print("Status: 500\nContent-Type: text/plain\n\n{}".format(e))

A very common mistake when parsing the HTTP X-Forwarded-For header

Let’s say you have a web server behind a load balancer that acts as a reverse proxy.  Since the load balancer typically replaces the packet’s source IP with its own, it stamps the client’s IP address in the X-Forwarded-For header and then passes the request along to the backend server.

Assuming the web server has been configured to log this header instead of client IP, a typical log entry will look like this:

198.51.100.111, 203.0.113.222 - - [10/Mar/2020:01:15:19 +0000] "GET / HTTP/1.1" 200 3041 "-" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.122 Safari/537.36"

198.51.100.111 is the client’s IP address, and 203.0.113.222 is the Load Balancer’s IP address.   Pretty simple.  One would assume that it’s always the first entry that’s the client’s IP address, right?

Well no, because there’s an edge case.  Let’s say the client is behind a proxy server that’s already stamping X-Forwarded-For with the client’s internal IP address.  When the load balancer receives the HTTP request, it will typically append to the existing X-Forwarded-For header rather than replace it, and the web server then logs the request like this:

192.168.1.49, 198.51.100.111, 203.0.113.222 - - [10/Mar/2020:01:15:05 +0000] "GET / HTTP/1.1" 200 5754 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36"

192.168.1.49 is the client’s true internal IP, but we don’t care about that since it’s an RFC 1918 address and not of any practical use.  So it’s actually the second-to-last entry (not necessarily the first!) that contains the client’s public IP address, and that’s the one that should be used for any Geo-IP functions.

Here’s some sample Python code:

#!/usr/bin/env python

import os

x_fwd_for = os.environ.get('HTTP_X_FORWARDED_FOR', '')

if ", " in x_fwd_for:
    client_ip = x_fwd_for.split(", ")[-2]
else:
    client_ip = os.environ.get('REMOTE_ADDR', '127.0.0.1')
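
The same second-to-last rule can be wrapped in a small helper that also tolerates missing spaces after the commas (a sketch; the header values are made-up examples):

```python
def client_ip_from_xff(x_fwd_for, remote_addr="127.0.0.1"):
    """Apply the second-to-last rule: the last entry is the load balancer."""
    parts = [p.strip() for p in x_fwd_for.split(",") if p.strip()]
    if len(parts) >= 2:
        return parts[-2]
    return remote_addr  # no proxy chain recorded; fall back to the socket peer

print(client_ip_from_xff("192.168.1.49, 198.51.100.111, 203.0.113.222"))
# → 198.51.100.111
print(client_ip_from_xff("198.51.100.111, 203.0.113.222"))
# → 198.51.100.111
```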

If behind Nginx, a better solution is to prefer the X-Real-IP header instead:

import os

x_real_ip = os.environ.get('HTTP_X_REAL_IP', '')
x_fwd_for = os.environ.get('HTTP_X_FORWARDED_FOR', '')

if x_real_ip:
    client_ip = x_real_ip
elif x_fwd_for and "," in x_fwd_for:
    client_ip = x_fwd_for.split(",")[-2].strip()
else:
    client_ip = os.environ.get('REMOTE_ADDR', '127.0.0.1')
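
For reference, Nginx typically stamps that header in its reverse-proxy configuration with the standard proxy_set_header directive (a sketch; the backend name is a placeholder):

```nginx
location / {
    proxy_pass http://backend;
    # $remote_addr is the IP of the directly connected client
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
}
```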

Other platforms can easily be configured to stamp an X-Real-IP header as well.  For example, on an F5 BIG-IP LTM load balancer, this iRule will do the job:

when HTTP_REQUEST {
    HTTP::header insert X-Real-IP [IP::remote_addr]
}