GCP HTTP/HTTPS Load Balancers offer great performance, and today I learned of a cool almost hidden feature: the ability to stamp custom headers with client GeoIP info. Here’s a Terraform config snippet:
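What follows is a minimal sketch of such a config, assuming a google_compute_backend_service resource; the resource names and the backend/health check references are placeholder assumptions, and the relevant argument is custom_request_headers:

# Sketch only: resource names and the backend/health check wiring are placeholders.
resource "google_compute_backend_service" "web" {
  name          = "web-backend"
  protocol      = "HTTP"
  health_checks = [google_compute_health_check.web.id]

  backend {
    group = google_compute_instance_group_manager.web.instance_group
  }

  # GCP fills in the client's GeoIP values when stamping the header at the edge
  custom_request_headers = [
    "X-Client-Geo-Location: {client_region},{client_city}",
  ]
}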
This will cause the Backend Service to stamp all HTTP requests with a custom header called “X-Client-Geo-Location” with the country abbreviation and city. It can then be parsed on the server to get this information for the client without having to rely on messy X-Forwarded-For parsing and GeoIP lookups.
Here’s a Python example that redirects the user to UK or Australia localized websites:
#!/usr/bin/env python3
# Simple CGI handler: read the GeoIP header stamped by the load balancer
# and redirect the client to a localized website.
import os

try:
    client_location = os.environ.get('HTTP_X_CLIENT_GEO_LOCATION')
    if client_location:
        # Header value is "COUNTRY,CITY", e.g. "UK,London"
        country, city = client_location.split(',', 1)
        websites = {'UK': "www.foo.co.uk", 'AU': "www.foo.au"}
        # Fall back to the global site for countries without a localized one
        local_website = websites.get(country, "www.foo.com")
        print("Status: 301\nLocation: https://{}\n".format(local_website))
except Exception as e:
    print("Status: 500\nContent-Type: text/plain\n\n{}".format(e))
To use BGP routing on an AWS or GCP VPN connection, the tunnel interface needs to have its IP address assigned as a /32 and then the remote IP specified:
config system interface
edit "GCP"set vdom "root"set ip 169.254.0.2 255.255.255.255set type tunnelset remote-ip 169.254.0.1 255.255.255.255set interface "wan1"next
end
BGP can be configured in the GUI under Network -> BGP in most cases, but the CLI has additional options. Here's an example config for peer 169.254.0.1 with ASN 64512, announcing the 192.168.1.0/24 prefix.
config router bgp
set as 65000
set router-id 192.168.1.254
set keepalive-timer 10
set holdtime-timer 30
set scan-time 15
config neighbor
edit "169.254.0.1"
set remote-as 64512
next
end
config network
edit 1
set prefix 192.168.1.0 255.255.255.0
next
end
end
I was doing a deep dive read of supported IKEv2 ciphers on GCP native VPNs today and thought I'd set up a quick lab to see which settings would provide the best throughput. The lab setup was as follows:
Palo Alto VM-300 on m4.xlarge in us-east-2 (Ohio)
IKEv2 VPN to GCP us-east4 (N. Virginia)
Latency is a steady 13ms round trip time
AWS side test instance is t3.xlarge (4 vCPU / 16 GB RAM)
GCP side test instance is e2-standard-4 (4 vCPU / 16 GB RAM)
Both VMs running Ubuntu Linux 18.04.4
Test transfer is a 500 MB binary file copied via SCP (sketched below)
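The transfer itself amounts to something like this; the file name and remote address are placeholders, not taken from the original test:

# Generate a 500 MB binary test file, then time an SCP copy across the tunnel
dd if=/dev/urandom of=test-500M.bin bs=1M count=500
time scp test-500M.bin ubuntu@10.128.0.10:/tmp/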
Throughput speeds (in Mbps) using DH Group 14 (2048-bit) PFS:
Encryption / Hash   SHA-512   SHA-256   SHA-1
AES-GCM 256-bit       664       668      672
AES-GCM 128-bit       648       680      704
AES-CBC 256-bit       510       516      616
AES-CBC 192-bit       492       523      624
AES-CBC 128-bit       494       573      658
Average: 604 Mbps
Throughput speeds (in Mbps) using DH Group 5 (1536-bit) PFS:
Encryption / Hash   SHA-512   SHA-256   SHA-1
AES-GCM 256-bit       700       557      571
AES-GCM 128-bit       660       676      616
AES-CBC 256-bit       464       448      656
AES-CBC 192-bit       595       528      464
AES-CBC 128-bit       605       484      587
Average: 574 Mbps
Throughput speeds (in Mbps) using DH Group 2 (1024-bit) PFS:
Encryption / Hash   SHA-512   SHA-256   SHA-1
AES-GCM 256-bit       680       626      635
AES-GCM 128-bit       672       664      680
AES-CBC 256-bit       584       452      664
AES-CBC 192-bit       536       520      664
AES-CBC 128-bit       528       502      656
Average: 608 Mbps
Key Takeaways
GCP will prefer AES-CBC in its negotiations, but AES-GCM provides roughly 25% better throughput. So if throughput is paramount, be sure to have only AES-GCM in the IPSec profile.
If using AES-CBC, SHA-1, while deprecated, is 13% faster than SHA-256 and 25% faster than SHA-512. Since SAs are rebuilt every 3 hours, cracking isn’t as large a concern as in typical SHA-1 use cases.
DH Group does not affect speeds. May as well use the strongest mutually supported value, which is Group 14 (2048-bit). GCP does not support Elliptic Curve (Groups 19-21) so these couldn’t be tested. I would expect faster SA build times, but no change in transfer speeds.
Assuming SHA-256 and Group 14 PFS, this graph summarizes the results:
Follow the instructions here, which are summarized below:
Add the Google Cloud SDK as a package source:
echo "deb [signed-by=/usr/share/keyrings/cloud.google.gpg] https://packages.cloud.google.com/apt cloud-sdk main"| sudo tee -a /etc/apt/sources.list.d/google-cloud-sdk.list
Health checks are failing, even though the service is running and allowed by firewall rules
Health checks actually originate directly from Google's infrastructure rather than from the load balancer itself, so these source ranges must be whitelisted in the firewall rules (example rule below):
35.191.0.0/16
130.211.0.0/22
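For example, a rule along these lines admits the health-check probes; the rule name, network, and port are placeholders here:

gcloud compute firewall-rules create allow-gcp-health-checks \
    --network=my-vpc \
    --allow=tcp:80 \
    --source-ranges=35.191.0.0/16,130.211.0.0/22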
The LB works in the same region, but does not respond from different regions
By default, load balancers operate in regional-only mode. To switch to global, edit the frontend properties and look for the regional/global radio button.
A minimum of 3 NICs is required; they break down like so:
eth0 – Public / External Interface facing Internet
eth1 – Management interface used for Cluster sync. Can also be used for security management server communication
eth2 – First internal interface. Usually faces internal servers & load balancers. Can be used for security management server communication
The Deployment launch template has a few fields which aren’t explained very well…
Security Management Server address
A static route to this destination via the management interface will be created at launch time. If the Security Management server is accessed via one of the internal interfaces, use a dummy address here such as 1.2.3.4/32 and add the static routes after launch.
SIC key
This is the password to communicate with the Security Management server. It can be set after launch, but if already known, it can be set here to be pre-configured at launch
Automatically generate an administrator password
This will create a new random 'admin' user password to allow access to the WebGUI right after launch, which saves some time, especially in situations where SSH is slow or blocked.
Note – SSH connections always require public key authentication, even with this enabled
Allow download from/upload to Check Point
This will allow the instance to communicate outbound to Check Point to check for updates. It's enabled by default on most Check Point appliances, so I'd recommend enabling this setting
Networking
This is the real catch, and a pretty stupid one. The form pre-fills these three subnets:
“Cluster External Subnet CIDR” = 10.0.0.0/24
“Management external subnet CIDR” = 10.0.1.0/24
“1st internal subnet CIDR” = 10.0.2.0/24
If using an existing network, erase the pre-filled value and then select the appropriate networks in the drop-down menus instead.
Also, make sure all subnets have “Private Google Access” checked
Post-launch Configuration
After launch, access the gateways via SSH (public key) and/or the WebGUI to run through initial setup. The first step is to set a new password for the admin user:
set user admin password
set expert-password
Since eth1 rather than eth0 is the management interface, I would recommend setting that accordingly:
set management interface eth1
I would also recommend adding static routes. The deployment will create static routes for RFC 1918 space via the management interface. If these need to be overridden to go via an internal interface, the CLI command is something like this:
set static-route NETWORK/MASK nexthop gateway address NEXTHOP_ADDRESS on
Before importing into SmartConsole, you can test connectivity by trying to telnet to the Security Management server's address on port 18191. Once everything looks good, don't forget to save the configuration:
save config
Cluster Creation
In SmartConsole, create a new ClusterXL. When prompted for the cluster address, enter the primary cluster address. The easy way to find this is to look at the deployment result under Tools -> Deployment manager -> Deployments
Then add the individual gateways with the management interface. Walking through the wizard, you’ll need to define the type of each interface:
Set the first (external) interface to private use
Set the secondary (management) interface as sync/primary
Set subsequent interfaces as private use with monitoring.
Note the wizard tends to list the interfaces backwards: eth2, eth1, eth0
The guide lists a few steps to do within the Gateway Cluster Properties, several of which I disagree with. Instead, I’d suggest the following:
Under Network Management, VPN Domain, create a group that lists the internal subnets behind the Checkpoint that will be accessed via site-to-site and remote access VPNs
On the eth1 interface, set Topology to Override / This Network / Network defined by routes. This should allow Anti-Spoofing to remain enabled
Under NAT, do not check “Hide internal networks behind the Gateway’s external IP” as this will auto-generate a NAT rule that could conflict with site-to-site VPNs. Instead, create manual NAT rules in the policy.
Under IPSec VPN, Link Selection, Source IP address Settings, set Manual / IP address of chosen interface
Do a policy install on the new cluster, and a few minutes later, the GCP console should map the primary and secondary external IP addresses to the two instances
Failover
Failover is done via API call and takes roughly 15 seconds.
On the external network (front end), the primary and secondary firewalls will each get an external IP address mapped. CheckPoint calls these "primary-cluster-address" and "secondary-cluster-address". I'd argue "active" and "standby" would be better names, because the addresses will flip during a failover event.
On the internal network (back end), failover is done by modifying the static route to 0.0.0.0/0. The entries will be created on the internal networks when the cluster is formed.
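To illustrate what that API call effectively does (the cluster handles this automatically; the route, network, instance, and zone names below are placeholders), the change is equivalent to deleting and recreating the internal default route with a new next hop:

gcloud compute routes delete internal-default-route --quiet
gcloud compute routes create internal-default-route \
    --network=internal-vpc \
    --destination-range=0.0.0.0/0 \
    --next-hop-instance=checkpoint-member-b \
    --next-hop-instance-zone=us-east4-b \
    --priority=100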
Known Problems
The script $FWDIR/scripts/gcp_ha_test.py is missing
This is simply a mistake in CheckPoint’s documentation. The correct file name is:
$FWDIR/scripts/google_ha_test.py
Deployment Fails with error code 504, Resource Error, Timeout expired
Also, while the instances get created and External static IPs allocated, the secondary cluster IP never gets mapped and failover does not work.
Cause: there is a portion of the R80.30 deployment script relating to external IP address mapping that assumes the default service account is enabled, but many enterprise customers will have the default service account disabled as a security best practice. As of January 2020, the only fix is to enable the default service account, then redo the deployment.
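To verify or re-enable it, something along these lines should do; the default Compute Engine service account email is built from your project number:

# Show service accounts and whether they are disabled
gcloud iam service-accounts list
# Re-enable the default Compute Engine service account (replace PROJECT_NUMBER)
gcloud iam service-accounts enable PROJECT_NUMBER-compute@developer.gserviceaccount.com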
StackDriver is enabled at launch, but never gets logs
Same issue as above. As of January 2020, it depends on the default service account being enabled.
I was able to follow this tutorial but had to make a few adjustments. The main one is to configure the public IP address in the IKEv2 profile (see step 3 below).
Remember of course that the router will need UDP ports 500 & 4500 forwarded by the firewall, which also must support ESP passthrough.
3) Create a custom IKEv2 profile. Note the highlighted public IP address and also the lifetime and DPD interval settings.
crypto ikev2 profile GCP_IKEV2_PROFILE
match address local interface GigabitEthernet0
match identity remote address 0.0.0.0
! If router is behind NAT, set this to the public IP
identity local address 203.0.113.222
authentication remote pre-share
authentication local pre-share
keyring local MY_KEYRING
lifetime 36000 ! 10 hour SA lifetime
dpd 60 5 periodic ! 1 minute keepalives
!
4) Configure a custom IPSec transform set and profile. This is 128-bit AES encryption with SHA-256 integrity:
! IPsec Settings
crypto ipsec transform-set ESP_AES128_SHA256 esp-aes esp-sha256-hmac
!
crypto ipsec profile GCP_IPSEC_PROFILE
set security-association lifetime kilobytes disable
set security-association lifetime seconds 10800
set transform-set ESP_AES128_SHA256
set pfs group14 ! 2048-bit
set ikev2-profile GCP_IKEV2_PROFILE
!
5) Finally, create the tunnel interface. Unlike the IKEv2 profile, this simply references the External interface, not the public IP:
interface Tunnel1
ip address 169.254.0.2 255.255.255.252
ip mtu 1460
ip virtual-reassembly in
ip tcp adjust-mss 1420
tunnel source GigabitEthernet0
tunnel mode ipsec ipv4
tunnel destination 35.212.226.126
tunnel protection ipsec profile GCP_IPSEC_PROFILE
!
Troubleshooting
The SAs should look like this:
Router#show crypto ikev2 sa
IPv4 Crypto IKEv2 SA
Tunnel-id Local Remote fvrf/ivrf Status
2 192.168.1.123/4500 35.212.226.126/4500 none/none READY
Encr: AES-CBC, keysize: 128, Hash: SHA256, DH Grp:14, Auth sign: PSK, Auth verify: PSK
Life/Active Time: 36000/1226 sec
Router#show crypto ipsec sa peer 35.212.226.126
interface: Tunnel1
Crypto map tag: Tunnel1-head-0, local addr 192.168.1.123
protected vrf: (none)
local ident (addr/mask/prot/port): (0.0.0.0/0.0.0.0/0/0)
remote ident (addr/mask/prot/port): (0.0.0.0/0.0.0.0/0/0)
current_peer 35.212.226.126 port 4500
PERMIT, flags={origin_is_acl,}
#pkts encaps: 45, #pkts encrypt: 45, #pkts digest: 45
#pkts decaps: 58, #pkts decrypt: 58, #pkts verify: 58
#pkts compressed: 0, #pkts decompressed: 0
#pkts not compressed: 0, #pkts compr. failed: 0
#pkts not decompressed: 0, #pkts decompress failed: 0
#send errors 0, #recv errors 0
local crypto endpt.: 192.168.1.123, remote crypto endpt.: 35.212.226.126
path mtu 1500, ip mtu 1500, ip mtu idb GigabitEthernet0
current outbound spi: 0x962EDB69(2519653225)
PFS (Y/N): N, DH group: none
inbound esp sas:
spi: 0x10B829B(17531547)
transform: esp-aes esp-sha-hmac ,
in use settings ={Tunnel UDP-Encaps, }
conn id: 5, flow_id: Onboard VPN:5, sibling_flags 80000040, crypto map: Tunnel1-head-0
sa timing: remaining key lifetime (sec): (14259)
Kilobyte Volume Rekey has been disabled
IV size: 16 bytes
replay detection support: Y replay window size: 1024
Status: ACTIVE(ACTIVE)
inbound ah sas:
inbound pcp sas:
outbound esp sas:
spi: 0x962EDB69(2519653225)
transform: esp-aes esp-sha-hmac ,
in use settings ={Tunnel UDP-Encaps, }
conn id: 6, flow_id: Onboard VPN:6, sibling_flags 80000040, crypto map: Tunnel1-head-0
sa timing: remaining key lifetime (sec): (14259)
Kilobyte Volume Rekey has been disabled
IV size: 16 bytes
replay detection support: Y replay window size: 1024
Status: ACTIVE(ACTIVE)
outbound ah sas:
outbound pcp sas:
Tried my first VPN to GCP and didn't have much luck with IKEv1. While it did detect the remote router being behind NAT, Phase 1 wouldn't come up due to an ID mismatch:
received NAT-T (RFC 3947) vendor ID
remote host is behind NAT
IDir '192.168.1.123' does not match to '203.0.113.222'
Here 192.168.1.123 is the real private IP of the router and 203.0.113.222 is the public NAT IP.
This is consistent with the GCP documentation on this topic, which states the following:
When using one-to-one NAT, your on-premises VPN gateway must identify itself using the same external IP address of the NAT device