Migrating from CGI to WSGI for Python Web Scripts on Apache

I began finally migrating some old scripts from PHP to Python late last year, and while I was happy to finally have my PHP days behind me, I noticed the script execution was disappointing. On average, a Python CGI script would run 20-80% slower than an equivalent PHP script. At first I chalked it up to slower libraries, but even basic ones that didn’t rely on database or anything fancy still seemed to be incurring a performance hit.

Yesterday I happened to come across mention of WSGI, which is essentially a Python-specific replacement for CGI. I realized the overhead of CGI probably explained why my Python scripts were slower than PHP. So I wanted to give WSGI a spin and see if it could help.

Like PHP, WSGI is an Apache module that is not included in many pre-packaged versions. So first step is to install it.

On Debian/Ubuntu:

sudo apt-get install libapache2-mod-wsgi-py3

The install process should auto-activate the module.

cd /etc/apache2/mods-enabled/

ls -la wsgi*
lrwxrwxrwx 1 root root 27 Mar 23 22:13 wsgi.conf -> ../mods-available/wsgi.conf
lrwxrwxrwx 1 root root 27 Mar 23 22:13 wsgi.load -> ../mods-available/wsgi.load

On FreeBSD, the module does not get auto-activated and must be loaded via a config file:

sudo pkg install ap24-py37-mod_wsgi

# Create /usr/local/etc/apache24/Includes/wsgi.conf
# or similar, and add this line:
LoadModule wsgi_module libexec/apache24/mod_wsgi.so

Like CGI, the directory with the WSGI script will need special permissions. As a security best practice, it’s a good idea to have scripts located outside of any DocumentRoot, so the scripts can’t accidentally get served as plain files.

<Directory "/var/www/scripts">
  Require all granted
</Directory>

As for the WSGI script itself, it’s similar to AWS Lambda, using a pre-defined function. However, it returns an array or bytes rather than a dictionary. Here’s a simple one that will just spit out the host, path, and query string as JSON:

def application(environ, start_response):

    import json, traceback

    try:
        request = {
            'host': environ.get('HTTP_HOST', 'localhost'),
            'path': environ.get('REQUEST_URI', '/'),
            'query_string': {}
        }
        if '?' in request['path']:
            request['path'], query_string = environ.get('REQUEST_URI', '/').split('?')
            for _ in query_string.split('&'):
                [key, value] = _.split('=')
                request['query_string'][key] = value

        output = json.dumps(request, sort_keys=True, indent=2)
        response_headers = [
            ('Content-type', 'application/json'),
            ('Content-Length', str(len(output))),
            ('X-Backend-Server', 'Apache + mod_wsgi')
        ]
        start_response('200 OK', response_headers)
        return [ output.encode('utf-8') ]
            
    except:
        response_headers = [ ('Content-type', 'text/plain') ]
        start_response('500 Internal Server Error', response_headers)
        error = traceback.format_exc()
        return [ str(error).encode('utf-8') ]

The last step is route certain paths to WSGI script. This is done in the Apache VirtualHost configuration:

WSGIPythonPath /var/www/scripts

<VirtualHost *:80>
  ServerName python.mydomain.com
  ServerAdmin nobody@mydomain.com
  DocumentRoot /home/www/html
  Header set Access-Control-Allow-Origin: "*"
  Header set Access-Control-Allow-Methods: "*"
  Header set Access-Control-Allow-Headers: "Origin, X-Requested-With, Content-Type, Accept, Authorization"
  WSGIScriptAlias /myapp /var/www/scripts/myapp.wsgi
</VirtualHost>

Upon migrating a test URL from CGI to WSGI, the page load time dropped significantly:

The improvement is thanks to a 50-90% reduction in “wait” and “receive” times, via ThousandEyes:

I’d next want to look at more advanced Python Web Frameworks like Flask, Bottle, WheezyWeb and Tornado. Django is of course a popular option too, but I know from experience it won’t be the fastest. Flask isn’t the fastest either, but it is the framework for Google SAE which I plan to learn after mastering AWS Lambda.

Advertisement

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s