Migrating a CheckPoint Management Server in GCP from R80.40 to R81.10

Here’s an outline of the process

  • Launch a new R81.10 VM and create /var/log/mdss.json with the hostname and new IP address
  • On the old R80.40 VM, perform an export (this will result in services being stopped for ~ 15 minutes)
  • On the new R81.10 VM, perform an import. This will take about 30 minutes
  • If using BYOL, re-issue the license with the new IP address

Performing Export on old R80.40 Server

On the old R80.40 server, in GAIA, navigate to Maintenance -> System Backups. If not done already, run a backup. This will give a rough idea of how long the export job will take and the approximate file size including logs.

So for me, the export size can be assumed to be just under 1.2 GB. Then go to CLI and enter expert mode. First, run migrate_server verify

expert

cd $FWDIR/scripts

./migrate_server verify -v R81.10
The verify operation finished successfully.

Now actually do the export. Mine took about 15 minutes and resulted in 1.1 GB file when including logs.

./migrate_server export -v R81.10 -l /var/log/export.tgz

The export operation will eventually stop all Check Point services (cpstop; cpwd_admin kill). Do you want to continue (yes/no) [n]? yes

Exporting the Management Database
Operation started at Thu Jan  5 16:20:33 UTC 2023

[==================================================] 100% Done

The export operation completed successfully. Do you wish to start Check Point services (yes/no) [y]? y
Starting Check Point services ...
The export operation finished successfully. 
Exported data to: /var/log/export.tgz.

Then copy the image to something offsite using SCP or SFTP.

ls -la /var/log/export.tgz 
-rw-rw---- 1 admin root 1125166179 Jan  5 17:36 /var/log/export.tgz

scp /var/log/export.tgz billy@10.1.2.6:

Setting up the new R81.10 Server

After launching the VM, SSH in and set an admin user password and expert mode password. Then save config:

set user admin password

set expert-password

save config

Login to the Web GUI and start the setup wizard. This is pretty must just clicking through a bunch of “Next” buttons. It is recommend to enable NTP though and uncheck “Gateway” if this is a management-only server.

When the setup wizard has concluded, download and install SmartConsole, then the latest Hotfix

One rebooted, login via CLI, go to expert mode, and create a /var/log/mdss.json file that has the name of the Management server (as it appears in SmartConsole) and the new server’s internal IP address. Mine looks like this:

[{"name":"checkpoint-mgr","newIpAddress4":"10.22.33.44"}]

It’s not a bad idea to paste this in to a JSON Validator to ensure the syntax is proper. Also note the square outer brackets, even though there’s only one entry in the array.

Importing the Database

Now we’re ready to copy the exported file from the R80.40 server. /var/log typically has the most room, so that’s a good location. Then run the import command. For me, this took around 20-30 minutes.

scp billy@10.1.2.6:export.tgz /var/log/

cd $FWDIR/scripts
./migrate_server import -v R81.10 -l /var/log/export.tgz

Importing the Management Database
Operation started at Thu Jan  5 16:51:22 GMT 2023

The import operation finished successfully.

If a “Failed to import” message appears, check the /var/log/mdss.json file again. Make sure the brackets, quotes, commas, and colons are in the proper place.

After giving the new server a reboot for good measure, login to CLI and verify services are up and running. Note it takes 2-3 minutes for the services to be fully running:

cd $FWDIR/scripts
./cpm_status.sh 
Check Point Security Management Server is during initialization

./cpm_status.sh 
Check Point Security Management Server is running and ready

I then tried to login via R81.10 SmartConsole and got this message:

This is expected. The /var/log/mdss.json only manages the connection to the gateways, it doesn’t have anything to do with licensing for the management server itself. And, I would guess that doing the import results in the 14 day trial license being overridden. Just to confirm that theory, I launched a PAYG VM, re-did the migration, and no longer saw this error.

Updating the Management Server License

Login to User Center -> Assets/Info -> Product Center, locate the license, change the IP address, and install the new license. Since SmartConsole won’t load, this must be done via CLI.

cplic put 10.22.33.44 never XXXXXXX

I then gave a reboot and waited 2-3 minutes for services to fully start. At this point, I was able to login to SmartConsole and see the gateways, but they all showed red. This is also expected – to make them green, policy must be installed.

I first did a database install for the management server itself (Menu -> Install Database), which was successful. Then tried a policy install on the gateways and got a surprise – the policy push failed, complaining of

From the Management Server, I tried a basic telnet test for port 18191 and it did indeed fail:

telnet 10.22.33.121 18191
Trying 10.22.33.121..

At first I thought the issue was firewall rules, but concluded that the port 18191 traffic was reaching the gateway but being rejected, which indicates a SIC issue. Sure enough, a quick Google pointed me to this:

Policy installation fails with “TCP connection failure port=18191

Indeed, the CheckPoint deployment template for GCP uses “member-a” and “member-b” as the hostname suffix for the gateways, but we give them a slightly different name in order to be consistent with our internal naming scheme.

The fix is change the hostname in the CLI to match the gateway name configured in SmartConsole:

cp-cluster-member-a> set hostname newhostname
cp-cluster-member-01> set domainname mydomain.org
cp-cluster-member-01> save config

After that, the telnet test to port 18191 was successful, and SmartConsole indicated some communication:

Now I have to reset SIC on both gateways:

cp-cluster-member-01> cpconfig
This program will let you re-configure
your Check Point products configuration.

Configuration Options:
----------------------
(1)  Licenses and contracts
(2)  SNMP Extension
(3)  PKCS#11 Token
(4)  Random Pool
(5)  Secure Internal Communication
(6)  Disable cluster membership for this gateway
(7)  Enable Check Point Per Virtual System State
(8)  Enable Check Point ClusterXL for Bridge Active/Standby
(9)  Hyper-Threading
(10) Check Point CoreXL
(11) Automatic start of Check Point Products

(12) Exit

Enter your choice (1-12) :5



Configuring Secure Internal Communication...
============================================
The Secure Internal Communication is used for authentication between
Check Point components

Trust State: Trust established

 Would you like re-initialize communication? (y/n) [n] ? y

Note: The Secure Internal Communication will be reset now,
and all Check Point Services will be stopped (cpstop).
No communication will be possible until you reset and
re-initialize the communication properly!
Are you sure? (y/n) [n] ? y
Enter Activation Key: 
Retype Activation Key: 
initial_module:
Compiled OK.
initial_module:
Compiled OK.

Hardening OS Security: Initial policy will be applied
until the first policy is installed

The Secure Internal Communication was successfully initialized

Configuration Options:
----------------------
(1)  Licenses and contracts
(2)  SNMP Extension
(3)  PKCS#11 Token
(4)  Random Pool
(5)  Secure Internal Communication
(6)  Disable cluster membership for this gateway
(7)  Enable Check Point Per Virtual System State
(8)  Enable Check Point ClusterXL for Bridge Active/Standby
(9)  Hyper-Threading
(10) Check Point CoreXL
(11) Automatic start of Check Point Products

(12) Exit

Enter your choice (1-12) :12

Thank You...
cpwd_admin: 
Process AUTOUPDATER terminated 
cpwd_admin: 
Process DASERVICE terminated 

The services will restart, which triggers a failover. At this point, I went in to Smart Console, edited the member, reset SIC, re-entered the key, and initialized. The policy pushes then were successful and everything was green. The last remaining issue was an older R80.30 cluster complaining of the IDS module not responding. This resolved itself the next day.

Advertisement

Re-sizing the Disk of a CheckPoint R80.40 Management Server in GCP

Breaking down the problem

As we enter the last year of support for CheckPoint R80.40, it’s time to finally get all management servers upgraded to R81.10 (if not done already). But I ran in to a problem when creating a snapshot on our management server in GCP:

This screen didn’t quite make sense because it says 6.69 GB are free, but the root partition actually shows 4.4 GB:

[Expert@chkpt-mgr:0]# df
Filesystem                      1K-blocks     Used Available Use% Mounted on
/dev/mapper/vg_splat-lv_current  20961280 16551092   4410188  79% /
/dev/sda1                          297485    27216    254909  10% /boot
tmpfs                             7572656     3856   7568800   1% /dev/shm
/dev/mapper/vg_splat-lv_log      45066752 27846176  17220576  62% /var/log

As it turns out, the 6 GB mentioned is completely un-partitioned space set aside for GAIA internals:

[Expert@chkpt-mgr:0]# lvm_manager -l

Select action:

1) View LVM storage overview
2) Resize lv_current/lv_log Logical Volume
3) Quit
Select action: 1

LVM overview
============
                  Size(GB)   Used(GB)   Configurable    Description         
    lv_current    20         16         yes             Check Point OS and products
    lv_log        43         27         yes             Logs volume         
    upgrade       22         N/A        no              Reserved for version upgrade
    swap          8          N/A        no              Swap volume size    
    free          6          N/A        no              Unused space        
    -------       ----                                                      
    total         99         N/A        no              Total size  

This explains why the disk space is always inadequate – 20 GB for root, 43 GB for log, 22 GB for “upgrade” (which can’t be used in GCP), 8 GB swap, and the remaining 6 GB set aide for snapshots (which is too small to be of use).

To create enough space for a snapshot we have only one solution: expand the disk size.

List of Steps

After first taking a Disk Snapshot of the disk in GCP, I followed these steps:

! On VM, in expert mode:
rm /etc/autogrow
shutdown -h now

! Use gcloud to increase disk size to 160 GB
gcloud compute disks resize my-vm-name --size 160 --zone us-central1-c

! Start VM up again
gcloud compute instances start my-vm-name --zone us-central1-c

After bootup, ran parted -l and verify partition #4 has been added:

Expert@ckpt:0]# parted -l

Model: Google PersistentDisk (scsi)
Disk /dev/sda: 172GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags: 

Number  Start   End     Size    File system     Name       Flags
 1      17.4kB  315MB   315MB   ext3                       boot
 2      315MB   8902MB  8587MB  linux-swap(v1)
 3      8902MB  107GB   98.5GB                             lvm
 4      107GB   172GB   64.4GB                  Linux LVM  lvm


Model: Linux device-mapper (linear) (dm)
Disk /dev/mapper/vg_splat-lv_log: 46.2GB
Sector size (logical/physical): 512B/4096B
Partition Table: loop
Disk Flags: 

Number  Start  End     Size    File system  Flags
 1      0.00B  46.2GB  46.2GB  xfs


Model: Linux device-mapper (linear) (dm)
Disk /dev/mapper/vg_splat-lv_current: 21.5GB
Sector size (logical/physical): 512B/4096B
Partition Table: loop
Disk Flags: 

Number  Start  End     Size    File system  Flags
 1      0.00B  21.5GB  21.5GB  xfs

Then converted the partition to an empty volume and gave it to GAIA:

pvcreate /dev/sda4 -ff
vgextend vg_splat /dev/sda4

After all this, lvm_manager shows the free disk space is being seen:

[Expert@ckpt:0]# lvm_manager

Select action:

1) View LVM storage overview
2) Resize lv_current/lv_log Logical Volume
3) Quit

Select action: 1

LVM overview
============
                  Size(GB)   Used(GB)   Configurable    Description         
    lv_current    20         8          yes             Check Point OS and products
    lv_log        43         4          yes             Logs volume         
    upgrade       22         N/A        no              Reserved for version upgrade
    swap          8          N/A        no              Swap volume size    
    free          126        N/A        no              Unused space        
    -------       ----                                                      
    total         219        N/A        no              Total size 

Creating a snapshot in GAIA is no longer a problem:

Benchmarking Ampere’s ARM CPU in Google Cloud Platform

While creating an instance today I noticed GCP offers ARM based CPUs made by Ampere, a company based in Santa Clara with a large office in Portland. The monthly cost runs about $30/mo for a single CPU with 4 GB RAM – a bit pricier than comparable N1, but slightly less than a comparable T2D, which is the ultra-fast AMD EPYC Milan platform.

Since I mostly run basic Debian packages and python scripts, CPU platform really wasn’t an issue, so I was curious to have a quick bake-off using a basic 16 thread sysbench test to mimic a light to moderate load. Here’s the results

t2a-standard-1

These are based on Ampere Altra and cost $29/mo in us-central1

CPU speed:
    events per second:  3438.95

General statistics:
    total time:                          10.0024s
    total number of events:              34401

Latency (ms):
         min:                                    0.28
         avg:                                    4.63
         max:                                   80.31
         95th percentile:                       59.99
         sum:                               159394.13

Threads fairness:
    events (avg/stddev):           2150.0625/4.94
    execution time (avg/stddev):   9.9621/0.03

t2d-standard-1

These are based on the new 3rd gen AMD Milan platform and cost $32/mo in us-central1

CPU speed:
    events per second:  3672.67

General statistics:
    total time:                          10.0027s
    total number of events:              36738

Latency (ms):
         min:                                    0.27
         avg:                                    4.34
         max:                                  100.28
         95th percentile:                       59.99
         sum:                               159498.26

Threads fairness:
    events (avg/stddev):           2296.1250/3.24
    execution time (avg/stddev):   9.9686/0.02

n1-standard-1

These are based on the older Intel Skylake platform and cost $25/mo in us-central1

Prime numbers limit: 10000

Initializing worker threads...

Threads started!

CPU speed:
    events per second:   913.60

General statistics:
    total time:                          10.0072s
    total number of events:              9144

Latency (ms):
         min:                                    1.08
         avg:                                   17.45
         max:                                   89.10
         95th percentile:                       61.08
         sum:                               159544.06

Threads fairness:
    events (avg/stddev):           571.5000/1.00
    execution time (avg/stddev):   9.9715/0.03

n2d-custom2-4096

These are based on 2nd generation AMD EPYC Rome and cost $44/mo in us-central1

CPU speed:
    events per second:  1623.41

General statistics:
    total time:                          10.0046s
    total number of events:              16243

Latency (ms):
         min:                                    0.89
         avg:                                    9.82
         max:                                   97.24
         95th percentile:                       29.19
         sum:                               159485.50

Threads fairness:
    events (avg/stddev):           1015.1875/3.13
    execution time (avg/stddev):   9.9678/0.02

n2-custom-2-4096

These are based in Intel Cascade Lake and cost $50/mo in us-central1

CPU speed:
    events per second:  1942.56

General statistics:
    total time:                          10.0036s
    total number of events:              19435

Latency (ms):
         min:                                    1.01
         avg:                                    8.21
         max:                                   57.04
         95th percentile:                       29.19
         sum:                               159499.92

Threads fairness:
    events (avg/stddev):           1214.6875/8.62
    execution time (avg/stddev):   9.9687/0.02

e2-medium

These are based on availability and have 1-2 shared CPU cores and cost $25/mo in us-central1

CPU speed:
    events per second:  1620.67

General statistics:
    total time:                          10.0055s
    total number of events:              16217

Latency (ms):
         min:                                    0.85
         avg:                                    9.84
         max:                                   65.18
         95th percentile:                       29.19
         sum:                               159647.07

Threads fairness:
    events (avg/stddev):           1013.5625/3.43
    execution time (avg/stddev):   9.9779/0.02

Summary

Amphere’s ARM CPUs offered slightly lower performance against the latest goodies from AMD. It did however beat it in the bang for buck ratio thanks to costing $1 less per month to run.

But, the key take away is both platforms completely blow away the older CPU platforms from Intel. Here’s some nice little charts visualizing the numbers.

A weird, ugly Error message when running google_ha_test.py

[Expert@cp-member-a:0]# $FWDIR/scripts/google_ha_test.py
GCP HA TESTER: started
GCP HA TESTER: checking access scopes...
GCP HA TESTER: ERROR 

Expecting value: line 1 column 1 (char 0)

Got this message when trying to test a CheckPoint R81.10 cluster build in a new environment. Obviously, this error message is not at all helpful in determining what the problem is. So I wrote a little debug script to try and isolate the issue:

import traceback
import gcp as _gcp 

global api
api = _gcp.GCP('IAM', max_time=20)
metadata = api.metadata()[0]

project = metadata['project']['projectId']
zone = metadata['instance']['zone'].split('/')[-1]
name = metadata['instance']['name']

print("Got metadata: project = {}, zone = {}, name = {}\n".format(project, zone, name))
path = "/projects/{}/zones/{}/instances/{}".format(project, zone, name)

try:
    head, res = api.rest("GET",path,query=None, body=None,aggregate=False)
except Exception as e:
    print(traceback.format_exc())

Running the script, I now see an exception when trying to make the initial API call:

[Expert@cp-cluster-member-a:0]# cd $FWDIR/scripts
[Expert@cp-cluster-member-a:0]# python3 ./debug.py

Got metadata: project = myproject, zone = us-central1-b, name = cp-member-a

Traceback (most recent call last):
  File "debug.py", line 18, in <module>
    head, res = api.rest(method,path,query=None,body=None,aggregate=False)
  File "/opt/CPsuite-R81.10/fw1/scripts/gcp.py", line 327, in rest
    max_time=self.max_time, proxy=self.proxy)
  File "/opt/CPsuite-R81.10/fw1/scripts/gcp.py", line 139, in http
    headers['_code']), headers, repr(response))
gcp.HTTPException: Unexpected HTTP code: 403

This at least indicates the connection to the API is OK and it’s some type of permissions issue with the account.

The CheckPoints have always been really tough to troubleshoot in this aspect, so to keep it simple, I deploy them with the default service account for the project. It’s not explicitly called out

I was able to re-enabled Editor permissions for the default service account with this Terraform code:

# Set Project ID via input variable
variable "project_id" {
  description = "GCP Project ID"
  type = string
}
# Get the default service account info for this project
data "google_compute_default_service_account" "default" {
  project = var.project_id
}
# Enable editor role for this service account
resource "google_project_iam_member" "default_service_account_editor" {
  project = var.project_id
  member  = "serviceAccount:${data.google_compute_default_service_account.default.email}"
  role    = "roles/editor"
}

Making Async Calls to Google Cloud Storage

I have a script doing real-time log analysis, where about 25 log files are stored in a Google Cloud Storage bucket. The files are always small (1-5 MB each) but the script was taking over 10 seconds to run, resulting in slow page load times and poor user experience. Performance analysis showed that most of the time was spent on the storage calls, with high overhead of requesting individual files.

I started thinking the best way to improve performance was to make the storage calls in an async fashion so as to download the files in parallel. This would require a special library capable of making such calls; after lots of Googling and trial and error I found a StackOverFlow post which mentioned gcloud AIO Storage. This worked very well, and after implementation I’m seeing a 125% speed improvement!

Here’s a rundown of the steps I did to get async working with GCS.)

1) Install gcloud AIO Storage:

pip install gcloud-aio-storage

2) In the Python code, start with some imports

import asyncio
from gcloud.aio.auth import Token
from gcloud.aio.storage import Storage

3) Create a function to read multiples from the same bucket:

async def IngestLogs(bucket_name, file_names, key_file = None):

    SCOPES = ["https://www.googleapis.com/auth/cloud-platform.read-only"]
    token = Token(service_file=key_file, scopes=SCOPES)
            
    async with Storage(token=token) as client:
        tasks = (client.download(bucket_name, _) for _ in file_names)
        blobs = await asyncio.gather(*tasks)
    await token.close()
                
    return blobs

It’s important to note that ‘blobs’ will be a list, with each element representing a binary version of the file.

4) Create some code to call the async function. The decode() function will convert each blob to a string.


def main():

    bucket_name = "my-gcs-bucket"
    file_names = {
       'file1': "path1/file1.abc",
       'file2': "path2/file2.def",
       'file3': "path3/file3.ghi",
    }
    key = "myproject-123456-mykey.json" 

    blobs = asyncio.run(IngestLogs(bucket_name, file_names.values(), key_file=key))

    for blob in blobs:
        # Print the first line from each blob
        print(blob.decode('utf-8')[0:75])

I track the load times via NewRelic synthetics, and it showed a 300% performance improvement!

Using GCP Python SDK for Network Tasks

Last week, I finally got around to hitting the GCP API directly using Python. It’s pretty easy to do in hindsight. Steps are below


If not done already, install PIP. On Debian 10, the command is this:

sudo apt install python3-pip

Then of course install the Python packages for GCP:

sudo pip3 install google-api-python-client google-cloud-storage

Now you’re ready to write some Python code. Start with a couple imports:

#!/usr/bin/env python3 

from googleapiclient import discovery
from google.oauth2 import service_account

By default, the default compute service account for the VM or AppEngine will be used for authentication. Alternately, a service account can be specific with the key’s JSON file:

KEY_FILE = '../mykey.json'
creds = service_account.Credentials.from_service_account_file(KEY_FILE)

Connecting to the Compute API will look like this. If using the default service account, the ‘credentials’ argument is not required.

resource_object = discovery.build('compute', 'v1', credentials=creds)

All API calls require the project ID (not name) be provided as a parameter. I will set it like this:

PROJECT_ID = "myproject-1234"

With the connection to the API established, you can now run some commands. The resource object will have several methods, and in each there will typically be a list() method to list the items in the project. The execute() at the end is required to actually execute the call.

_ = resource_object.firewalls().list(project=PROJECT_ID).execute()

It’s important to note the list().execute() returns a dictionary. The actual list of items can be found in key ‘items’. I’ll use the get() method to retrieve the values for the ‘items’ key, or use an empty list if ‘items’ doesn’t exist. Here’s an example

firewall_rules = _.get('items', [])
print(len(firewall_rules), "firewall rules in project", PROJECT_ID)
for firewall_rule in firewall_rules:
    print(" -", firewall_rule['name'])

The API reference guide has a complete list of everything that’s available. Here’s some examples:

firewalls() - List firewall rules
globalAddresses() - List all global addresses
healthChecks() - List load balancer health checks
subnetworks() - List subnets within a given region
vpnTunnels() - List configured VPN tunnels

Some calls will require the region name as a parameter. To get a list of all regions, this can be done:

_ = resource_object.regions().list(project=PROJECT_ID).execute()
regions = [region['name'] for region in _.get('items', [])]

Then iterate through each region. For example to list all subnets:

for region in regions:
    _ = resource_object.subnetworks().list(project=PROJECT_ID,region=region).execute()
    print("Reading subnets for region", region ,"...")
    subnets = _.get('items', [])
    for subnet in subnets:
        print(" -", subnet['name'], subnet['ipCidrRange'])

Deploying a Container to GCP Cloud Run

Copy a working container and review the Dockerfile

git clone https://github.com/jeheyer/network-automation.git
cd network-automation/docker/flask2-sqlchemy
cat Dockerfile

Verify logged in to gcloud and set to correct project:

gcloud projects list
gcloud config set project <PROJECT_ID>

Where <PROJECT_ID> is the Project ID

Then build the container and upload the image to Google Container Registry. Note this step is missing from the quickstart guide

gcloud builds submit --tag gcr.io/<PROJECT_ID>/flask2-sqlalchemy .

Now pick a region and deploy the container. In this example ‘flask2’ is the service name:

gcloud config set run/region us-central1
gcloud run deploy flask2 --image gcr.io/<PROJECT_ID>/flask2-sqlalchemy --allow-unauthenticated

VPNs to Google Cloud Platform (GCP) when FortiGate is behind a NAT gateway

Ran in to problems getting a VPN up and running between GCP and a FortiGate 60-E that was behind a NAT gateway (with ports udp/500 + udp/4500 forwarded). On the GCP side, these messages would show up in the logs:

remote host is behind NAT
generating IKE_AUTH response 1 [ N(AUTH_FAILED) ]
looking for peer configs matching GCP.VPN.GATEWAY.IP[%any]...203.0.113.77[192.168.1.1]

This error means that GCP connected to the Peer VPN gateway successfully, but it in the IKEv2 headers, it identified itself by the private IP rather than the expected public one. AWS is not picky about this, but with GCP, the Peer VPN gateway must identify itself by using the same external IP address of the NAT device.

Most vendors have long supported an option to manually override the IP address for such scenarios. In Cisco IOS or IOS-XE, this can be controlled in the IKEv2 profile with the identity local address option:

crypto ikev2 profile GCP_IKEV2_PROFILE
 match address local interface GigabitEthernet1
 identity local address MY.PUBLIC.IP.ADDRESS
 authentication remote pre-share
 authentication local pre-share
 keyring local GCP_KEYRING
 lifetime 36000
 dpd 20 5 on-demand
!

With Palo Alto, this is configured in the IKE Gateway, Local Identification field:

For the sake of argument, we’ll say that CheckPoint uses the “Statically NATed IP” field to influence Local ID, although this doesn’t actually work.

Fortigate does offer “Local ID” field in version 6.4.6 and higher, under the Phase 1 proposal:

Seems nice and straightforward, but even after changing this setting, the VPN tunnel still won’t establish. Logs on the GCP end change slightly and now show this:

looking for peer configs matching GCP.VPN.GATEWAY.IP[%any]...203.0.113.77[203.0.113.77]

The private IP is no longer showing, so it seems the issue should be solved. Instead, GCP reports a “Peer not responding” message. The Fortigate actually reports Phase 1 success, waits a few seconds, and then starts the negotiation all over. So not very helpful.

I configured a test VPN between the FortiGate and a Palo Alto, which then gave a very specific and extremely useful error message:

IKE phase-1 negotiation is failed. When pre-shared key is used, peer-ID must be type IP address. Received type FQDN

Now this explains the problem! Even though the FortiGate is sending the correct IP address in the IKEv2 header, it’s being sent as the wrong identity type. The 5 identity types are listed in RFC 7815:

  • ID_IPV4_ADDR = 32 bit IPv4 address
  • ID_IPV6_ADDR = 128 bit IPv6 address
  • ID_FQDN = DNS hostname
  • ID_RFC822_ADDR = e-mail address
  • ID_KEY_ID = octet stream

If Fortigate were smart, it would either default to IPv4 address type or auto-determine this based on the text inputted in to the field. But it seems to simply default to FQDN. Oddly, there is a CLI option called “localid-type” under the Phase1-interface that clear is intended to provide this functionality:

FGT60E1234567890 # config vpn ipsec phase1-interface

FGT60E1234567890 (phase1-interface) # edit gcp

FGT60E1234567890 (gcp) # set localid-type 
auto         Select ID type automatically.
fqdn         Use fully qualified domain name.
user-fqdn    Use user fully qualified domain name.
keyid        Use key-id string.
address      Use local IP address.

But, similar to CheckPoint, it just doesn’t work, and can be considered a broken feature.

Since GCP does not support FQDN authentication, VPNs between GCP and FortiGates behind a NAT are not possible at this time.

Enabling Private Google Access in Google Cloud Platform

By default, calls to the various Google Cloud APIs will resolve to a random Google-owned IP, and require outbound internet access, either via external IP, Cloud NAT, or 3rd party network appliance.

If outbound Internet is not required for the application, or not desired for security reasons, enabling Private Google Access allows VM instances to connect an internally routed prefix.

  1. On the subnet, turn on Private Google Access via the radio button
  2. By default, all egress traffic is permitted. If egress traffic is being denied deliberately, create a rule allowing egress traffic to destination 199.36.153.8/30, tcp ports 80 and 443
  3. Create a Private DNS zone called googleapis.com, and apply it to any networks that will use Google Private Access.

In the DNS zone, create two entries:

An A record called ‘private’ that resolves to the following 4 IP addresses:

  • 199.36.153.8
  • 199.36.153.9
  • 199.36.153.10
  • 199.36.153.11

A wildcard record ‘*’ that points to hostname ‘private’

The zone will look like this once created.

Private Google Access should now be working. To test it, ping something.googleapis.com and it should resolve to one of those 4 IP addresses

ping www.googleapis.com
PING private.googleapis.com (199.36.153.9) 56(84) bytes of data.

Getting started with Flask and deploying Python apps to Google App Engine

Installing Flask on Linux or Mac

On Debian 10 or Ubuntu 20:

sudo pip3 install flask flask-cors

On Mac or FreeBSD:

sudo pip install flask flask-cors

Creating a basic flask app:

#!/usr/bin/env python3

from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/", defaults = {'path': ""})
@app.route("/<string:path>")
@app.route("/<path:path>")

def index(path):
    req_info = {
        'host': request.host.split(':')[0],
        'path': "/" + path,
        'query_string': request.args,
        'remote_addr': request.environ.get('HTTP_X_REAL_IP', request.remote_addr),
        'user_agent': request.user_agent.string
    }
    return jsonify(req_info)

if __name__ == '__main__':
    app.run()

Run the app

chmod u+x main.py
./main.py
Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)

Do a test curl against it

$ curl -v "http://localhost:5000/oh/snap?x=1&x=2"

< HTTP/1.0 200 OK
< Content-Type: application/json
< Content-Length: 65
< Access-Control-Allow-Origin: *
< Server: Werkzeug/1.0.1 Python/3.7.8
< Date: Wed, 21 Apr 2021 17:07:58 GMT
<
{"host":"localhost","path":"/oh/snap","query_string":{"x":"1"},"remote_addr":"127.0.0.1","user_agent":"curl/7.72.0"}

Deploying to Google Cloud App Engine

Create a requirements.txt file:

echo "flask" > requirements.txt

Create an app.yaml file:

printf "runtime: python38\nenv: standard\n" > app.yaml 

Now deploy the app to Google using the gCloud command:

gcloud app deploy