A very common mistake when parsing the HTTP X-Forwarded-For header

Let’s say you have a web server behind a load balancer that acts as a reverse proxy. Since the load balancer is likely replacing the source IP of the packets with its own IP address, it stamps the client’s IP address in the X-Forwarded-For header and then passes it along to the backend server.

Assuming the web server has been configured to log this header instead of the client IP, a typical log entry will look like this:

198.51.100.111, 203.0.113.222 - - [10/Mar/2020:01:15:19 +0000] "GET / HTTP/1.1" 200 3041 "-" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.122 Safari/537.36"

198.51.100.111 is the client’s IP address, and 203.0.113.222 is the load balancer’s IP address. Pretty simple. One would assume that it’s always the first entry that’s the client’s IP address, right?

Well no, because there’s an edge case. Let’s say the client is behind a proxy server that’s already stamping X-Forwarded-For with the client’s internal IP address. When the load balancer receives the HTTP request, it will often keep the existing X-Forwarded-For entries rather than overwrite them, and the web server then logs the request like this:

192.168.1.49, 198.51.100.111, 203.0.113.222 - - [10/Mar/2020:01:15:05 +0000] "GET / HTTP/1.1" 200 5754 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36"

192.168.1.49 is the client’s true internal IP, but we don’t care about that since it’s an RFC 1918 private address and not of any practical use. So it’s actually the second-to-last entry (not necessarily the first!) that contains the client’s public IP address, and that is the one that should be used for any Geo-IP functions.

Here’s some sample Python code:

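#!/usr/bin/env python

import os

x_fwd_for = os.environ.get('HTTP_X_FORWARDED_FOR', '')

# Split on commas and strip whitespace; not every proxy adds a space after the comma.
entries = [ip.strip() for ip in x_fwd_for.split(',')]

if len(entries) >= 2:
    # The last entry is the load balancer; the one before it is the client's public IP.
    client_ip = entries[-2]
else:
    # No usable X-Forwarded-For chain, so fall back to the connection's source address.
    client_ip = os.environ.get('REMOTE_ADDR', '127.0.0.1')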
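
To make the rule concrete, here is a minimal sketch that applies it to the header values from the two log entries above. The function name client_ip_from_xff and its fallback parameter are hypothetical, made up for this illustration, and it assumes (as in the examples) that the last entry is always the load balancer’s own address.

```python
def client_ip_from_xff(x_fwd_for, fallback="127.0.0.1"):
    """Return the second-to-last entry of an X-Forwarded-For value.

    Hypothetical helper for illustration only. Assumes the last entry is
    the load balancer's own address, as in the sample log lines.
    """
    # Split on commas and strip whitespace so "a,b" and "a, b" both work.
    entries = [e.strip() for e in x_fwd_for.split(",") if e.strip()]
    if len(entries) >= 2:
        return entries[-2]
    return fallback


# Header values taken from the two sample log entries.
simple = "198.51.100.111, 203.0.113.222"
proxied = "192.168.1.49, 198.51.100.111, 203.0.113.222"

print(client_ip_from_xff(simple))   # 198.51.100.111
print(client_ip_from_xff(proxied))  # 198.51.100.111
print(proxied.split(", ")[0])       # 192.168.1.49 (the naive "first entry" answer)
```

Both headers yield the same public address, while taking the first entry returns the useless internal address in the proxied case.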

If behind nginx, and nginx has been configured to set it, a better solution is to prefer the X-Real-IP header instead:

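import os

x_real_ip = os.environ.get('HTTP_X_REAL_IP', '').strip()
x_fwd_for = os.environ.get('HTTP_X_FORWARDED_FOR', '')

# Split on commas and strip whitespace; not every proxy adds a space after the comma.
entries = [ip.strip() for ip in x_fwd_for.split(',')]

if x_real_ip:
    # X-Real-IP carries a single address, so no parsing is needed.
    client_ip = x_real_ip
elif len(entries) >= 2:
    # Fall back to the second-to-last X-Forwarded-For entry.
    client_ip = entries[-2]
else:
    client_ip = os.environ.get('REMOTE_ADDR', '127.0.0.1')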
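
Since both headers are just strings supplied by whatever sits upstream, it may be worth checking that the chosen value actually parses as an IP address before handing it to a Geo-IP lookup. The following is only a sketch of that idea: get_client_ip is a hypothetical helper name, it takes a CGI/WSGI-style dictionary instead of reading os.environ directly so it can be tried out in isolation, and the validation step uses Python’s standard ipaddress module.

```python
import ipaddress


def get_client_ip(environ):
    """Pick a client IP from a CGI/WSGI-style environ mapping.

    Hypothetical helper: prefers X-Real-IP, then the second-to-last
    X-Forwarded-For entry, then REMOTE_ADDR, skipping anything that
    does not parse as an IP address.
    """
    candidates = [environ.get("HTTP_X_REAL_IP", "").strip()]

    entries = [e.strip() for e in environ.get("HTTP_X_FORWARDED_FOR", "").split(",")]
    if len(entries) >= 2:
        candidates.append(entries[-2])

    candidates.append(environ.get("REMOTE_ADDR", "").strip())

    for value in candidates:
        try:
            # ip_address() raises ValueError for anything that is not a
            # well-formed IPv4 or IPv6 address, including the empty string.
            return str(ipaddress.ip_address(value))
        except ValueError:
            continue
    return "127.0.0.1"


print(get_client_ip({
    "HTTP_X_FORWARDED_FOR": "192.168.1.49, 198.51.100.111, 203.0.113.222",
    "REMOTE_ADDR": "203.0.113.222",
}))  # 198.51.100.111
```

Keep in mind that these headers are only as trustworthy as the proxy that sets them; if clients can reach the web server directly, they can send any value they like.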

Other platforms can easily be configured to stamp an X-Real-IP header as well. For example, on an F5 BIG-IP LTM load balancer, this iRule will do the job:

when HTTP_REQUEST {
    HTTP::header insert X-Real-IP [IP::remote_addr]
}