Getting web server variables and query parameters in different Python Frameworks

As I explore different ways of doing Web programming in Python via different Frameworks, I kept finding the need to examine HTTP server variables, specifically the server hostname, path, and query string. The method to do this varies quite a bit by framework.

Given the following URL: http://www.mydomain.com:8000/derp/test.py?name=Harry&occupation=Hit%20Man

I want to create the following variables with the following values:

  • server_host is ‘www.mydomain.com’
  • server_port is 8000
  • path is ‘/derp/test.py’
  • query_params is this dictionary: {‘name’: ‘Harry’, ‘occupation’: ‘Hit Man’}

Old School CGI

cgi.FieldStorage() is the easy way to do this, but it returns a list of tuples, and must be converted to a dictionary.

#!/usr/bin/env python3

if __name__ == "__main__":

    import os, cgi

    server_host = os.environ.get('HTTP_HOST', 'localhost')
    server_port = os.environ.get('SERVER_PORT', 80)
    path = os.environ.get('SCRIPT_URL', '/')
    query_params = {}
    _ = cgi.FieldStorage()
    for key in _:
        query_params[key] = str(_[key].value)

Note this will convert all values to strings. By default, cgi.FieldStorage() create numberic values as int or float.

WSGI

Similar to CGI, but environment variables get passed simply in a dictionary as the first parameter. There is no need to load the os module.

def application(environ, start_response):

    from urllib import parse

    server_host = environ.get('HTTP_HOST', 'localhost')
    server_port = environ.get('SERVER_PORT', 80)
    path = environ.get('SCRIPT_URL', '/')
    query_params = {}
    if '?' in environ.get('REQUEST_URI', '/'):
        query_params = dict(parse.parse_qsl(parse.urlsplit(environ['REQUEST_URI']).query))

Since the CGI Headers don’t exist, urllib.parse can be used to analyze the REQUEST_URI environment variable in order to form the dictionary.

Flask

Flask makes this very easy. The only real trick comes with path; the ‘/’ gets removed, so it must be re-added

from flask import Flask, request

app = Flask(__name__)

# Route all possible paths here
@app.route("/", defaults={"path": ""})
@app.route('/<string:path>')
@app.route("/<path:path>")

def index(path):
      
    [server_host, server_port] = request.host.split(':')
    path =  "/" + path
    query_params = request.args
 

FastAPI

This one’s a slightly different because the main variable to examine actually a QueryParams object with is a form of MultiDict

from fastapi import FastAPI, Request

app = FastAPI()

# Route all possible paths here
@app.get("/")
@app.get("/{path:path}")
def root(path, req: Request):

    [server_host, server_port] = req.headers['host'].split(':')
    path = "/" + path
    query_params = dict(req.query_params)

AWS Lambda

Lambda presents a dictionary called ‘event’ to the handler and it’s simply a matter of grabbing the right keys:

def lambda_handler(event, context):

    server_host = event['headers']['host']
    server_port = event['headers']['X-Forwarded-Port']
    path = event['path']
    query_params = event['queryStringParameters']

If multiValueheaders are enabled, some of the variables come in as lists, which in turn may have a list as values, even if there’s only one item.

    server_host = event['multiValueHeaders']['host'][0]
    query_params = {}
    for _ in event["multiValueQueryStringParameters"].items():
        query_params[_[0]] = _[1][0]

Advertisement

Getting started with Flask and deploying Python apps to Google App Engine

Installing Flask on Linux or Mac

On Debian 10 or Ubuntu 20:

sudo pip3 install flask flask-cors

On Mac or FreeBSD:

sudo pip install flask flask-cors

Creating a basic flask app:

#!/usr/bin/env python3

from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/", defaults = {'path': ""})
@app.route("/<string:path>")
@app.route("/<path:path>")

def index(path):
    req_info = {
        'host': request.host.split(':')[0],
        'path': "/" + path,
        'query_string': request.args,
        'remote_addr': request.environ.get('HTTP_X_REAL_IP', request.remote_addr),
        'user_agent': request.user_agent.string
    }
    return jsonify(req_info)

if __name__ == '__main__':
    app.run()

Run the app

chmod u+x main.py
./main.py
Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)

Do a test curl against it

$ curl -v "http://localhost:5000/oh/snap?x=1&x=2"

< HTTP/1.0 200 OK
< Content-Type: application/json
< Content-Length: 65
< Access-Control-Allow-Origin: *
< Server: Werkzeug/1.0.1 Python/3.7.8
< Date: Wed, 21 Apr 2021 17:07:58 GMT
<
{"host":"localhost","path":"/oh/snap","query_string":{"x":"1"},"remote_addr":"127.0.0.1","user_agent":"curl/7.72.0"}

Deploying to Google Cloud App Engine

Create a requirements.txt file:

echo "flask" > requirements.txt

Create an app.yaml file:

printf "runtime: python38\nenv: standard\n" > app.yaml 

Now deploy the app to Google using the gCloud command:

gcloud app deploy

Migrating from CGI to WSGI for Python Web Scripts on Apache

I began finally migrating some old scripts from PHP to Python late last year, and while I was happy to finally have my PHP days behind me, I noticed the script execution was disappointing. On average, a Python CGI script would run 20-80% slower than an equivalent PHP script. At first I chalked it up to slower libraries, but even basic ones that didn’t rely on database or anything fancy still seemed to be incurring a performance hit.

Yesterday I happened to come across mention of WSGI, which is essentially a Python-specific replacement for CGI. I realized the overhead of CGI probably explained why my Python scripts were slower than PHP. So I wanted to give WSGI a spin and see if it could help.

Like PHP, WSGI is an Apache module that is not included in many pre-packaged versions. So first step is to install it.

On Debian/Ubuntu:

sudo apt-get install libapache2-mod-wsgi-py3

The install process should auto-activate the module.

cd /etc/apache2/mods-enabled/

ls -la wsgi*
lrwxrwxrwx 1 root root 27 Mar 23 22:13 wsgi.conf -> ../mods-available/wsgi.conf
lrwxrwxrwx 1 root root 27 Mar 23 22:13 wsgi.load -> ../mods-available/wsgi.load

On FreeBSD, the module does not get auto-activated and must be loaded via a config file:

sudo pkg install ap24-py37-mod_wsgi

# Create /usr/local/etc/apache24/Includes/wsgi.conf
# or similar, and add this line:
LoadModule wsgi_module libexec/apache24/mod_wsgi.so

Like CGI, the directory with the WSGI script will need special permissions. As a security best practice, it’s a good idea to have scripts located outside of any DocumentRoot, so the scripts can’t accidentally get served as plain files.

<Directory "/var/www/scripts">
  Require all granted
</Directory>

As for the WSGI script itself, it’s similar to AWS Lambda, using a pre-defined function. However, it returns an array or bytes rather than a dictionary. Here’s a simple one that will just spit out the host, path, and query string as JSON:

def application(environ, start_response):

    import json, traceback

    try:
        request = {
            'host': environ.get('HTTP_HOST', 'localhost'),
            'path': environ.get('REQUEST_URI', '/'),
            'query_string': {}
        }
        if '?' in request['path']:
            request['path'], query_string = environ.get('REQUEST_URI', '/').split('?')
            for _ in query_string.split('&'):
                [key, value] = _.split('=')
                request['query_string'][key] = value

        output = json.dumps(request, sort_keys=True, indent=2)
        response_headers = [
            ('Content-type', 'application/json'),
            ('Content-Length', str(len(output))),
            ('X-Backend-Server', 'Apache + mod_wsgi')
        ]
        start_response('200 OK', response_headers)
        return [ output.encode('utf-8') ]
            
    except:
        response_headers = [ ('Content-type', 'text/plain') ]
        start_response('500 Internal Server Error', response_headers)
        error = traceback.format_exc()
        return [ str(error).encode('utf-8') ]

The last step is route certain paths to WSGI script. This is done in the Apache VirtualHost configuration:

WSGIPythonPath /var/www/scripts

<VirtualHost *:80>
  ServerName python.mydomain.com
  ServerAdmin nobody@mydomain.com
  DocumentRoot /home/www/html
  Header set Access-Control-Allow-Origin: "*"
  Header set Access-Control-Allow-Methods: "*"
  Header set Access-Control-Allow-Headers: "Origin, X-Requested-With, Content-Type, Accept, Authorization"
  WSGIScriptAlias /myapp /var/www/scripts/myapp.wsgi
</VirtualHost>

Upon migrating a test URL from CGI to WSGI, the page load time dropped significantly:

The improvement is thanks to a 50-90% reduction in “wait” and “receive” times, via ThousandEyes:

I’d next want to look at more advanced Python Web Frameworks like Flask, Bottle, WheezyWeb and Tornado. Django is of course a popular option too, but I know from experience it won’t be the fastest. Flask isn’t the fastest either, but it is the framework for Google SAE which I plan to learn after mastering AWS Lambda.