Migrating a CheckPoint Management Server in GCP from R80.40 to R81.10

Here’s an outline of the process:

  • Launch a new R81.10 VM and create /var/log/mdss.json with the hostname and new IP address
  • On the old R80.40 VM, perform an export (this will result in services being stopped for ~ 15 minutes)
  • On the new R81.10 VM, perform an import. This will take about 30 minutes
  • If using BYOL, re-issue the license with the new IP address

Performing Export on old R80.40 Server

On the old R80.40 server, in GAIA, navigate to Maintenance -> System Backups. If not done already, run a backup. This will give a rough idea of how long the export job will take and the approximate file size including logs.
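
If the portal is sluggish, the backup can also be kicked off from clish. These are the equivalent commands as I remember them, so double-check the syntax on your build:

add backup local
show backup status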

So for me, the export could be expected to come in at just under 1.2 GB. Then go to the CLI and enter expert mode. First, run migrate_server verify:

expert

cd $FWDIR/scripts

./migrate_server verify -v R81.10
The verify operation finished successfully.

Now actually do the export. Mine took about 15 minutes and resulted in a 1.1 GB file when including logs.

./migrate_server export -v R81.10 -l /var/log/export.tgz

The export operation will eventually stop all Check Point services (cpstop; cpwd_admin kill). Do you want to continue (yes/no) [n]? yes

Exporting the Management Database
Operation started at Thu Jan  5 16:20:33 UTC 2023

[==================================================] 100% Done

The export operation completed successfully. Do you wish to start Check Point services (yes/no) [y]? y
Starting Check Point services ...
The export operation finished successfully. 
Exported data to: /var/log/export.tgz.

Then copy the export file to an offsite location using SCP or SFTP.

ls -la /var/log/export.tgz 
-rw-rw---- 1 admin root 1125166179 Jan  5 17:36 /var/log/export.tgz

scp /var/log/export.tgz billy@10.1.2.6:

Setting up the new R81.10 Server

After launching the VM, SSH in and set an admin user password and expert mode password. Then save config:

set user admin password

set expert-password

save config

Log in to the Web GUI and start the setup wizard. This is pretty much just clicking through a bunch of “Next” buttons. It is recommended to enable NTP, though, and to uncheck “Gateway” if this is a management-only server.

When the setup wizard has concluded, download and install SmartConsole, then the latest Hotfix

Once rebooted, log in via the CLI, go to expert mode, and create a /var/log/mdss.json file that has the name of the Management Server (as it appears in SmartConsole) and the new server’s internal IP address. Mine looks like this:

[{"name":"checkpoint-mgr","newIpAddress4":"10.22.33.44"}]

It’s not a bad idea to paste this into a JSON validator to ensure the syntax is proper. Also note the square outer brackets, even though there’s only one entry in the array.
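
For a quick check directly on the box, python3 in expert mode can validate the file (R81.10 ships with python3):

python3 -m json.tool /var/log/mdss.json

If the syntax is valid it pretty-prints the JSON; otherwise it reports the offending line and column.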

Importing the Database

Now we’re ready to copy the exported file from the R80.40 server. /var/log typically has the most room, so that’s a good location. Then run the import command. For me, this took around 20-30 minutes.

scp billy@10.1.2.6:export.tgz /var/log/

cd $FWDIR/scripts
./migrate_server import -v R81.10 -l /var/log/export.tgz

Importing the Management Database
Operation started at Thu Jan  5 16:51:22 GMT 2023

The import operation finished successfully.

If a “Failed to import” message appears, check the /var/log/mdss.json file again. Make sure the brackets, quotes, commas, and colons are in the proper place.

After giving the new server a reboot for good measure, log in to the CLI and verify services are up and running. Note that it takes 2-3 minutes for the services to be fully running:

cd $FWDIR/scripts
./cpm_status.sh 
Check Point Security Management Server is during initialization

./cpm_status.sh 
Check Point Security Management Server is running and ready
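
Rather than re-running the script by hand, a small expert-mode loop like this (my own convenience sketch, not something CheckPoint provides) will wait until the server reports ready:

# Poll cpm_status.sh every 30 seconds until the management server reports ready
until ./cpm_status.sh | grep -q "running and ready"; do sleep 30; done
echo "CPM is ready"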

I then tried to log in via R81.10 SmartConsole and got a licensing error:

This is expected. The /var/log/mdss.json file only manages the connection to the gateways; it doesn’t have anything to do with licensing for the management server itself. My guess is that the import overrides the 14-day trial license. To confirm that theory, I launched a PAYG VM, re-did the migration, and no longer saw this error.

Updating the Management Server License

Log in to User Center -> Assets/Info -> Product Center, locate the license, change the IP address, and install the new license. Since SmartConsole won’t load, this must be done via the CLI.

cplic put 10.22.33.44 never XXXXXXX
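
The newly bound IP address can be confirmed from the CLI with cplic print, which lists the installed licenses:

cplic print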

I then rebooted and waited 2-3 minutes for services to fully start. At this point, I was able to log in to SmartConsole and see the gateways, but they all showed red. This is also expected – to make them green, policy must be installed.

I first did a database install for the management server itself (Menu -> Install Database), which was successful. I then tried a policy install on the gateways and got a surprise – the policy push failed, complaining of a TCP connection failure on port 18191.

From the Management Server, I tried a basic telnet test for port 18191 and it did indeed fail:

telnet 10.22.33.121 18191
Trying 10.22.33.121..

At first I thought the issue was firewall rules, but concluded that the port 18191 traffic was reaching the gateway but being rejected, which indicates a SIC issue. Sure enough, a quick Google pointed me to this:

Policy installation fails with “TCP connection failure port=18191”

Indeed, the CheckPoint deployment template for GCP uses “member-a” and “member-b” as the hostname suffix for the gateways, but we give them a slightly different name in order to be consistent with our internal naming scheme.

The fix is to change the hostname in the CLI to match the gateway name configured in SmartConsole:

cp-cluster-member-a> set hostname cp-cluster-member-01
cp-cluster-member-01> set domainname mydomain.org
cp-cluster-member-01> save config
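
A quick sanity check from clish confirms the change before re-testing port 18191:

show hostname
show domainname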

After that, the telnet test to port 18191 was successful, and SmartConsole indicated some communication:

Next, SIC has to be reset on both gateways:

cp-cluster-member-01> cpconfig
This program will let you re-configure
your Check Point products configuration.

Configuration Options:
----------------------
(1)  Licenses and contracts
(2)  SNMP Extension
(3)  PKCS#11 Token
(4)  Random Pool
(5)  Secure Internal Communication
(6)  Disable cluster membership for this gateway
(7)  Enable Check Point Per Virtual System State
(8)  Enable Check Point ClusterXL for Bridge Active/Standby
(9)  Hyper-Threading
(10) Check Point CoreXL
(11) Automatic start of Check Point Products

(12) Exit

Enter your choice (1-12) :5



Configuring Secure Internal Communication...
============================================
The Secure Internal Communication is used for authentication between
Check Point components

Trust State: Trust established

 Would you like re-initialize communication? (y/n) [n] ? y

Note: The Secure Internal Communication will be reset now,
and all Check Point Services will be stopped (cpstop).
No communication will be possible until you reset and
re-initialize the communication properly!
Are you sure? (y/n) [n] ? y
Enter Activation Key: 
Retype Activation Key: 
initial_module:
Compiled OK.
initial_module:
Compiled OK.

Hardening OS Security: Initial policy will be applied
until the first policy is installed

The Secure Internal Communication was successfully initialized

Configuration Options:
----------------------
(1)  Licenses and contracts
(2)  SNMP Extension
(3)  PKCS#11 Token
(4)  Random Pool
(5)  Secure Internal Communication
(6)  Disable cluster membership for this gateway
(7)  Enable Check Point Per Virtual System State
(8)  Enable Check Point ClusterXL for Bridge Active/Standby
(9)  Hyper-Threading
(10) Check Point CoreXL
(11) Automatic start of Check Point Products

(12) Exit

Enter your choice (1-12) :12

Thank You...
cpwd_admin: 
Process AUTOUPDATER terminated 
cpwd_admin: 
Process DASERVICE terminated 

The services restart, which triggers a failover. At this point, I went into SmartConsole, edited the member, reset SIC, re-entered the key, and initialized it. Policy pushes were then successful and everything was green. The last remaining issue was an older R80.30 cluster complaining of the IDS module not responding; this resolved itself the next day.


Re-sizing the Disk of a CheckPoint R80.40 Management Server in GCP

Breaking down the problem

As we enter the last year of support for CheckPoint R80.40, it’s time to finally get all management servers upgraded to R81.10 (if not done already). But I ran into a problem when creating a snapshot on our management server in GCP:

The snapshot screen didn’t quite make sense: it said 6.69 GB were free, but df showed only 4.4 GB available on the root partition:

[Expert@chkpt-mgr:0]# df
Filesystem                      1K-blocks     Used Available Use% Mounted on
/dev/mapper/vg_splat-lv_current  20961280 16551092   4410188  79% /
/dev/sda1                          297485    27216    254909  10% /boot
tmpfs                             7572656     3856   7568800   1% /dev/shm
/dev/mapper/vg_splat-lv_log      45066752 27846176  17220576  62% /var/log

As it turns out, the 6 GB mentioned is unallocated space in the LVM volume group, set aside for GAIA internals:

[Expert@chkpt-mgr:0]# lvm_manager -l

Select action:

1) View LVM storage overview
2) Resize lv_current/lv_log Logical Volume
3) Quit
Select action: 1

LVM overview
============
                  Size(GB)   Used(GB)   Configurable    Description         
    lv_current    20         16         yes             Check Point OS and products
    lv_log        43         27         yes             Logs volume         
    upgrade       22         N/A        no              Reserved for version upgrade
    swap          8          N/A        no              Swap volume size    
    free          6          N/A        no              Unused space        
    -------       ----                                                      
    total         99         N/A        no              Total size  

This explains why the disk space is always inadequate – 20 GB for root, 43 GB for log, 22 GB for “upgrade” (which can’t be used in GCP), 8 GB for swap, and the remaining 6 GB set aside for snapshots (which is too small to be of use).

To create enough space for a snapshot we have only one solution: expand the disk size.

List of Steps

After first taking a Disk Snapshot of the disk in GCP, I followed these steps:

! On VM, in expert mode:
rm /etc/autogrow
shutdown -h now

! Use gcloud to increase the disk size to 160 GB (the boot disk shares the VM name by default)
gcloud compute disks resize my-vm-name --size 160 --zone us-central1-c

! Start VM up again
gcloud compute instances start my-vm-name --zone us-central1-c
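
The resize can be sanity-checked with gcloud as well (this assumes the boot disk shares the instance name, which is the GCP default):

gcloud compute disks describe my-vm-name --zone us-central1-c --format="value(sizeGb)"

It should print 160.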

After bootup, run parted -l and verify that partition #4 has been added:

[Expert@ckpt:0]# parted -l

Model: Google PersistentDisk (scsi)
Disk /dev/sda: 172GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags: 

Number  Start   End     Size    File system     Name       Flags
 1      17.4kB  315MB   315MB   ext3                       boot
 2      315MB   8902MB  8587MB  linux-swap(v1)
 3      8902MB  107GB   98.5GB                             lvm
 4      107GB   172GB   64.4GB                  Linux LVM  lvm


Model: Linux device-mapper (linear) (dm)
Disk /dev/mapper/vg_splat-lv_log: 46.2GB
Sector size (logical/physical): 512B/4096B
Partition Table: loop
Disk Flags: 

Number  Start  End     Size    File system  Flags
 1      0.00B  46.2GB  46.2GB  xfs


Model: Linux device-mapper (linear) (dm)
Disk /dev/mapper/vg_splat-lv_current: 21.5GB
Sector size (logical/physical): 512B/4096B
Partition Table: loop
Disk Flags: 

Number  Start  End     Size    File system  Flags
 1      0.00B  21.5GB  21.5GB  xfs

Then initialize the new partition as an LVM physical volume and add it to GAIA’s volume group:

pvcreate /dev/sda4 -ff
vgextend vg_splat /dev/sda4
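
The standard LVM2 tools can confirm the new physical volume landed in the volume group (nothing CheckPoint-specific here, just plain LVM):

pvs
vgs vg_splat

vg_splat should now list the additional space as free (VFree).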

After all this, lvm_manager shows the additional free space:

[Expert@ckpt:0]# lvm_manager

Select action:

1) View LVM storage overview
2) Resize lv_current/lv_log Logical Volume
3) Quit

Select action: 1

LVM overview
============
                  Size(GB)   Used(GB)   Configurable    Description         
    lv_current    20         8          yes             Check Point OS and products
    lv_log        43         4          yes             Logs volume         
    upgrade       22         N/A        no              Reserved for version upgrade
    swap          8          N/A        no              Swap volume size    
    free          126        N/A        no              Unused space        
    -------       ----                                                      
    total         219        N/A        no              Total size 

Creating a snapshot in GAIA is no longer a problem:

A weird, ugly Error message when running google_ha_test.py

[Expert@cp-member-a:0]# $FWDIR/scripts/google_ha_test.py
GCP HA TESTER: started
GCP HA TESTER: checking access scopes...
GCP HA TESTER: ERROR 

Expecting value: line 1 column 1 (char 0)

I got this message when testing a CheckPoint R81.10 cluster built in a new environment. Obviously, this error message is not at all helpful in determining what the problem is, so I wrote a little debug script to try to isolate the issue:

import traceback
import gcp as _gcp  # CheckPoint's bundled GCP API wrapper in $FWDIR/scripts

# Query the metadata server the same way google_ha_test.py does
api = _gcp.GCP('IAM', max_time=20)
metadata = api.metadata()[0]

project = metadata['project']['projectId']
zone = metadata['instance']['zone'].split('/')[-1]
name = metadata['instance']['name']

print("Got metadata: project = {}, zone = {}, name = {}\n".format(project, zone, name))
path = "/projects/{}/zones/{}/instances/{}".format(project, zone, name)

# Make the same compute API call the HA script makes, but dump the full traceback on failure
try:
    head, res = api.rest("GET", path, query=None, body=None, aggregate=False)
except Exception:
    print(traceback.format_exc())

Running the script, I now see an exception when trying to make the initial API call:

[Expert@cp-cluster-member-a:0]# cd $FWDIR/scripts
[Expert@cp-cluster-member-a:0]# python3 ./debug.py

Got metadata: project = myproject, zone = us-central1-b, name = cp-member-a

Traceback (most recent call last):
  File "debug.py", line 18, in <module>
    head, res = api.rest(method,path,query=None,body=None,aggregate=False)
  File "/opt/CPsuite-R81.10/fw1/scripts/gcp.py", line 327, in rest
    max_time=self.max_time, proxy=self.proxy)
  File "/opt/CPsuite-R81.10/fw1/scripts/gcp.py", line 139, in http
    headers['_code']), headers, repr(response))
gcp.HTTPException: Unexpected HTTP code: 403

This at least indicates the connection to the API is OK and it’s some type of permissions issue with the account.

The CheckPoints have always been really tough to troubleshoot in this respect, so to keep it simple, I deploy them with the default service account for the project. It’s not explicitly called out in the documentation, but these API calls evidently expect that account to have its usual Editor role on the project.
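
To double-check which service account (and scopes) a gateway launched with, and what roles that account holds, something along these lines from any machine with gcloud works; the instance name, zone, and project are from the metadata output above, and the service-account email is a placeholder:

gcloud compute instances describe cp-member-a --zone us-central1-b --format="yaml(serviceAccounts)"

gcloud projects get-iam-policy myproject --flatten="bindings[].members" --filter="bindings.members:123456789-compute@developer.gserviceaccount.com" --format="table(bindings.role)"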

I was able to re-enable Editor permissions for the default service account with this Terraform code:

# Set Project ID via input variable
variable "project_id" {
  description = "GCP Project ID"
  type = string
}
# Get the default service account info for this project
data "google_compute_default_service_account" "default" {
  project = var.project_id
}
# Enable editor role for this service account
resource "google_project_iam_member" "default_service_account_editor" {
  project = var.project_id
  member  = "serviceAccount:${data.google_compute_default_service_account.default.email}"
  role    = "roles/editor"
}
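
Nothing special about applying it:

terraform init
terraform apply -var="project_id=myproject"

Once the role binding is in place, re-run $FWDIR/scripts/google_ha_test.py on the gateway to confirm the scopes check passes.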

CheckPoint SmartView Monitor shows Permanent Tunnels Down, even though they’re up

Being fairly new to CheckPoint, I hadn’t yet used SmartView Monitor, the Windows desktop monitoring application. At first glance it wasn’t very useful: I had terminated several test tunnels to various Cisco, FortiGate, and Palo Alto firewalls, all of which were working fine, but they all showed down in SmartView. What the heck?

Reason: when it comes to monitoring tunnels, CheckPoint by default uses a proprietary protocol called “tunnel_test” (udp/18234). To properly monitor VPN tunnels to non-CheckPoint devices, DPD (Dead Peer Detection) must be used instead.

Here’s how to enable DPD on an interoperable device:

  1. In the CheckPoint SmartConsole folder (usually C:\Program Files (x86)\CheckPoint\SmartConsole), run GuiDBedit.exe
  2. Under Network Objects -> network_objects, look for the interoperable device object. Its class name will be “gateway_plain”
  3. Find the field tunnel_keepalive_method and change its value to dpd
  4. File -> Save All, then exit
  5. Restart SmartConsole and install policy to the applicable CheckPoint gateways / clusters
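
In principle the same attribute can also be flipped with dbedit on the management server; this is an untested sketch (my-interop-gw stands in for the interoperable device’s object name), so treat the GuiDBedit steps above as the authoritative method:

dbedit -local
modify network_objects my-interop-gw tunnel_keepalive_method dpd
update network_objects my-interop-gw
quit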

After making that change, pushing policy, and restarting SmartView Monitor, the tunnels now show green:

Cisco ISR G2 to CheckPoint R80.30 IKEv1 VPN woes

I had previously done Cisco router to CheckPoint R80.30 gateway VPNs without issue, but for whatever reason could not even establish Phase 1 for this one. CheckPoint R80 VPN communities default to AES-256, SHA-1, Group 2, and a 1-day lifetime, which is easy to match on the Cisco with this config:

crypto keyring mycheckpoint
 local-address GigabitEthernet0/0
 pre-shared-key address 192.0.2.190 key abcdefghij1234567890
!
crypto isakmp policy 100
 encr aes 256
 authentication pre-share
 group 2
 hash sha          ! <--- default value
 lifetime 86400    ! <--- default value
!

After verifying connectivity, doing packet captures, and rebooting both ends multiple times, IKE simply would not come up. On the Cisco ISR, debug crypto isakmp wasn’t especially helpful:

Jun 18 11:06:17.085: ISAKMP: (0):purging SA., sa=3246F97C, delme=3246F97C
Jun 18 11:06:17.285: ISAKMP: (0):SA request profile is (NULL)
Jun 18 11:06:17.285: ISAKMP: (0):Created a peer struct for 192.0.2.190, peer port 500
Jun 18 11:06:17.285: ISAKMP: (0):New peer created peer = 0x2CE62C3C peer_handle = 0x80000005
Jun 18 11:06:17.285: ISAKMP: (0):Locking peer struct 0x2CE62C3C, refcount 1 for isakmp_initiator
Jun 18 11:06:17.285: ISAKMP: (0):local port 500, remote port 500
Jun 18 11:06:17.285: ISAKMP: (0):set new node 0 to QM_IDLE
Jun 18 11:06:17.285: ISAKMP: (0):insert sa successfully sa = 2CE620E8
Jun 18 11:06:17.285: ISAKMP: (0):Can not start Aggressive mode, trying Main mode.
Jun 18 11:06:17.285: ISAKMP: (0):found peer pre-shared key matching 192.0.2.190
Jun 18 11:06:17.285: ISAKMP: (0):constructed NAT-T vendor-rfc3947 ID
Jun 18 11:06:17.285: ISAKMP: (0):constructed NAT-T vendor-07 ID
Jun 18 11:06:17.285: ISAKMP: (0):constructed NAT-T vendor-03 ID
Jun 18 11:06:17.285: ISAKMP: (0):constructed NAT-T vendor-02 ID
Jun 18 11:06:17.285: ISAKMP: (0):Input = IKE_MESG_FROM_IPSEC, IKE_SA_REQ_MM
Jun 18 11:06:17.285: ISAKMP: (0):Old State = IKE_READY New State = IKE_I_MM1
Jun 18 11:06:17.285: ISAKMP: (0):beginning Main Mode exchange
Jun 18 11:06:17.285: ISAKMP-PAK: (0):sending packet to 192.0.2.190 my_port 500 peer_port 500 (I) MM_NO_STATE
Jun 18 11:06:17.285: ISAKMP: (0):Sending an IKE IPv4 Packet.
Jun 18 11:06:17.369: ISAKMP-PAK: (0):received packet from 192.0.2.190 dport 500 sport 500 Global (I) MM_NO_STATE
Jun 18 11:06:17.369: ISAKMP-ERROR: (0):Couldn't find node: message_id 2303169274
Jun 18 11:06:17.369: ISAKMP-ERROR: (0):(0): Unknown Input IKE_MESG_FROM_PEER, IKE_INFO_NOTIFY: state = IKE_I_MM1
Jun 18 11:06:17.369: ISAKMP: (0):Input = IKE_MESG_FROM_PEER, IKE_INFO_NOTIFY
Jun 18 11:06:17.369: ISAKMP: (0):Old State = IKE_I_MM1 New State = IKE_I_MM1

The CheckPoint gave a more “useful” error:

Main Mode Failed to match proposal: Transform: AES-256, SHA1, Group 2 (1024 bit); Reason: Wrong value for: Authentication Method

This seemed to imply the CheckPoint was expecting certificate-based authentication rather than PSK. In traditional mode, the gateway is set by default for certificate only. But it’s not clear how this is configured in newer versions.

After poking around settings for quite a while, I simply deleted the VPN community in CheckPoint SmartConsole and re-created it. The connection then popped up immediately.

¯\_(ツ)_/¯

Reset admin password for CheckPoint IaaS Gateway in GCP or AWS

Someone changed the admin password, but we could still access the gateway via the SSH key. The process for resetting the password (bypassing password history) was quite easy:

Go to expert mode and generate a hashed string for the new password ‘ABCXYZ1234’:

[Expert@checkpoint:0]# cpopenssl passwd -1 ABCXYZ1234
$1$I54N3F1M$lk/zHvFaKRKXkUFoiEamq1

Then go back to the regular CLI and apply the hashed password:

exit

set user admin password-hash $1$I54N3F1M$lk/zHvFaKRKXkUFoiEamq1

save config

That’s it. Logging in to GAIA as admin / ABCXYZ1234 will then work

Using CheckPoint Dynamic Objects to Source NAT flows

By default, the CheckPoint will have three dynamic objects that can be referenced in firewall and NAT policy rules:

  • LocalGateway – Main interface of the CheckPoint
  • LocalGatewayExternal – External interface of the CheckPoint
  • LocalGatewayInternal – First internal interface of the CheckPoint

In a 3-NIC deployment, you may want to reference the second internal NIC, for example to source NAT traffic bound for the internal servers to the CheckPoint’s internal IP address.

To do this, you must create a custom dynamic object in SmartConsole, then manually create it on each gateway.

On the gateway, first verify the internal IP address:

[Expert@gateway]# ifconfig eth2
eth2      Link encap:Ethernet HWaddr 42:01:0A:D4:80:03 
          inet addr:10.1.2.1 Bcast:10.1.2.255 Mask:255.255.255.0

Create the object:

[Expert@gateway]# dynamic_objects -n LocalGateway-eth2 -r 10.1.2.1 10.1.2.1 -a

Verify it’s been created:

[Expert@gateway]# dynamic_objects -l

object name : LocalGateway
range 0 : 198.51.100.100 198.51.100.100

object name : LocalGatewayExternal
range 0 : 198.51.100.100 198.51.100.100

object name : LocalGatewayInternal
range 0 : 10.1.1.10 10.1.1.10

object name : LocalGateway-eth2
range 0 : 10.1.2.1 10.1.2.1

Source: skI1915 – Configuring Dynamic Objects

 

Deploying CheckPoint CloudGuard IaaS High Availability in GCP

A minimum of 3 NICs is required, broken down like so:

  • eth0 – Public / External Interface facing Internet
  • eth1 – Management interface used for Cluster sync.  Can also be used for security management server communication
  • eth2 – First internal interface.  Usually faces internal servers & load balancers.  Can be used for security management server communication

The Deployment launch template has a few fields which aren’t explained very well…

Security Management Server address

A static route to this destination via the management interface will be created at launch time.  If the Security Management Server is accessed via one of the internal interfaces, use a dummy address here such as 1.2.3.4/32 and add the static routes after launch.

SIC key

This is the key used to establish trust with the Security Management Server. It can be set after launch, but if already known, it can be entered here to be pre-configured at launch.

Automatically generate an administrator password

This will create a new random ‘admin’ user password to allow access to the WebGUI right after launch, which saves some time, especially in situations where SSH is slow or blocked.

Note – SSH connections always require public key authentication, even with this enabled

Allow download from/upload to Check Point

This will allow the instance to communicate outbound to CheckPoint to check for updates.  It’s enabled by default on most CheckPoint appliances, so I’d recommend enabling this setting.

Networking

This is the real catch, and a pretty stupid one.  The form pre-fills these three subnets:

  • “Cluster External Subnet CIDR” = 10.0.0.0/24
  • “Management external subnet CIDR” = 10.0.1.0/24
  • “1st internal subnet CIDR” = 10.0.2.0/24

If using an existing network, erase the pre-filled value and then select the appropriate networks in the drop-down menus like so:

GCP_Existing_VPCNetworks

Also, make sure all subnets have “Private Google Access” checked

Post-launch Configuration

After launch, access the gateways via SSH (public key) and/or the WebGUI to run through initial setup.  The first step is to set a new password for the admin user:

set user admin password

set expert-password

Since eth1 rather than eth0 is the management interface, I would recommend setting that accordingly:

set management interface eth1

I would also recommend adding static routes. The deployment will create static routes for RFC 1918 space via the management interface.  If these need to be overridden to go via an internal interface, the CLI command is something like this:

set static-route NETWORK/MASK nexthop gateway address NEXTHOP_ADDRESS on

Before importing into SmartConsole, you can test connectivity by trying to telnet to the Security Management Server’s address on port 18191. Once everything looks good, don’t forget to save the configuration:

save config

Cluster Creation

In SmartConsole, create a new ClusterXL. When prompted for the cluster address, enter the primary cluster address.  The easy way to find this is to look at the deployment result under Tools -> Deployment Manager -> Deployments.

CheckPoint_Deployment_ClusterIPExternalAddress

Then add the individual gateways with the management interface.   Walking through the wizard, you’ll need to define the type of each interface:

  • Set the first (external) interface to private use
  • Set the secondary (management) interface as sync/primary
  • Set subsequent interfaces as private use with monitoring.

Note the wizard tends to list the interfaces backwards: eth2, eth1, eth0

GCP_Clustering

The guide lists a few steps to do within the Gateway Cluster Properties, several of which I disagree with. Instead, I’d suggest the following:

  • Under Network Management, VPN Domain, create a group that lists the internal subnets behind the Checkpoint that will be accessed via site-to-site and remote access VPNs
  • On the eth1 interface, set Topology to Override / This Network / Network defined by routes. This should allow IP spoofing to remain enabled
  • Under NAT, do not check “Hide internal networks behind the Gateway’s external IP” as this will auto-generate a NAT rule that could conflict with site-to-site VPNs. Instead, create manual NAT rules in the policy.
  • Under IPSec VPN, Link Selection, Source IP address Settings, set Manual / IP address of chosen interface

Do a policy install on the new cluster, and a few minutes later, the GCP console should map the primary and secondary external IP addresses to the two instances

CheckPoint_GCP_External_IPAddresses

Failover

Failover is done via API call and takes roughly 15 seconds.
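
To force a failover for testing, the standard ClusterXL approach of admin-downing the active member should apply here as well; run this on the currently active gateway, then bring it back afterwards:

clusterXL_admin down

clusterXL_admin up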

On the external network (front end), the primary and secondary firewalls will each get external IP address mapped.  CheckPoint calls these “primary-cluster-address” and “secondary-cluster-address”.  I’d argue “active” and “standby” would be better names, because the addresses will flip during a failover event.

On the internal network (back end), failover is done by modifying the static route to 0.0.0.0/0.  The entries will be created on the internal networks when the cluster is formed.
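
Which instance currently owns the default route on a given internal network can be checked from gcloud, for example:

gcloud compute routes list --filter="destRange=0.0.0.0/0" --format="table(name,network,nextHopInstance,priority)"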

Known Problems

The script $FWDIR/scripts/gcp_ha_test.py is missing

This is simply a mistake in CheckPoint’s documentation.  The correct file name is:

$FWDIR/scripts/google_ha_test.py

Deployment Fails with error code 504, Resource Error, Timeout expired

DeployFailure

Also, while the instances get created and External static IPs allocated, the secondary cluster IP never gets mapped and failover does not work.

Cause: there is a portion of the R80.30 deployment script relating to external IP address mapping that assumes the default service account is enabled, but many enterprise customers will have the default service account disabled as a security best practice.  As of January 2020, the only fix is to enable the default service account, then redo the deployment.

StackDriver is enabled at launch, but never gets logs

Same issue as above.  As of January 2020, it depends on the default service account being enabled.

Site-to-Site IPSec VPNs on CheckPoint R80.30

The first step is to create a new object with the public IP address of the other side of the tunnel.  This is fairly well buried in the menus:

R80_30_new_VPN_interop_device

After that, create a new VPN “community” in Objects -> More object types -> VPN Community -> New Meshed VPN and walk through the wizard.

The main gotcha is to watch out for weird default settings.  In particular, AES-128 is disabled as an encryption cipher for Phase 1.  My guess is that since it’s the most popular cipher for Phase 2, they go with the “mix ciphers” strategy.  But personally I just like to use AES-128 for everything – it’s simple, fast, and plenty secure.

CheckPoint Dedicated Management Route

A new feature (finally!) in R80.30 is the ability to enable Management Data Plane Separation, in order to have a separate route table for the management interface and all management-related functions (policy installation, SSH, SNMP, syslog, GAIA portal, etc.).

Let’s assume the interface “Mgmt” has already been set as the management interface with IP address 192.168.1.100 and default gateway 192.168.1.1, and that “eth5” has been set up as the dedicated sync interface:

set mdps mgmt plane on
set mdps mgmt resource on
set mdps interface Mgmt management on
set mdps interface eth5 sync on
add mdps route 0.0.0.0/0 nexthop 192.168.1.1
save config
reboot

After the box comes up, you can verify the management route has been set by going into expert mode and using the “mplane” command to enter the management plane context:

> expert
[Expert@MyCheckPoint:0]# mplane
Context set to Management Plane
[Expert@MyCheckPoint:1]# netstat -rn
Kernel IP routing table
Destination  Gateway       Genmask         Flags MSS Window irtt Iface
169.254.0.0  0.0.0.0       255.255.255.252 U     0   0      0    eth5
192.168.1.0  0.0.0.0       255.255.255.0   U     0   0      0    Mgmt
0.0.0.0      192.168.1.1   0.0.0.0         UGD   0   0      0    Mgmt

Routes from the main route table relating to management can then be deleted, which makes the data plane route table much cleaner:

[Expert@MyCheckpoint:1]# dplane
Context set to Data Plane

[Expert@MyCheckPoint:0]# netstat -rn
Kernel IP routing table
Destination   Gateway       Genmask         Flags MSS Window irtt Iface
203.0.113.32  0.0.0.0       255.255.255.224 U     0   0      0    bond1.11
192.168.222.0 0.0.0.0       255.255.255.0   U     0   0      0    bond1.22
0.0.0.0       203.0.113.33  0.0.0.0         UGD   0   0      0    bond1.11
192.168.0.0   192.168.222.1 255.255.0.0     UGD   0   0      0    bond1.22