One of the many stupid things about the Check Point CloudGuard IaaS appliances in GCP is that Check Point never took into account scenarios where multiple clusters exist within the same project and/or the same network. This results in a naming conflict for the static routes and access configs, and the default behavior is for different clusters to "steal" routes and IP addresses from each other.
To fix this, the first step is to give each cluster a unique name. This is fairly easy to do by setting CHKP_TAG in the Python script $FWDIR/scripts/gcp_had.py:
CHKP_TAG = "cluster-1"
This variable influences the route and access config names. But that alone still isn't enough, because their deployment script hard-codes the access config name, so failover still won't work. You'll see this in $FWDIR/log/gcp_had.elg during a failover event:
2024-03-28 23:09:44,259-GCP-CP-HA-ERROR- Operation deleteAccessConfig for https://www.googleapis.com/compute/v1/projects/project-1234/zones/us-west2-b/instances/checkpoint-member-b error OrderedDict([('errors', [OrderedDict([('code', 'INVALID_USAGE'), ('message', 'Invalid access config name `checkpoint-access-config` as the access config name in instance is `x-chkp-access-config`.')])])])
To fix this, the existing hard-coded access config must be manually deleted on both members:
gcloud compute instances delete-access-config checkpoint-member-a --zone=us-west2-a --access-config-name="x-chkp-access-config"
gcloud compute instances delete-access-config checkpoint-member-b --zone=us-west2-b --access-config-name="x-chkp-access-config"
Then perform a rolling reboot of both members, after which failover should work.
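As an added example (not Check Point's code), here is a small parser for the gcp_had.elg error lines shown above: it lists the members still carrying the hard-coded access config and renders the matching delete-access-config commands for an operator to review before running them. The project, zones and instance names in the sample log are constructed placeholders.

```python
import re
from typing import Dict, List, Optional

# Constructed sample in the shape of $FWDIR/log/gcp_had.elg during a failover.
# Project, zones and instance names are placeholders; the third line repeats the
# first, as the HA script retries the same deleteAccessConfig call.
SAMPLE_ELG = """\
2024-03-28 23:09:44,259-GCP-CP-HA-ERROR- Operation deleteAccessConfig for https://www.googleapis.com/compute/v1/projects/project-1234/zones/us-west2-b/instances/checkpoint-member-b error OrderedDict([('errors', [OrderedDict([('code', 'INVALID_USAGE'), ('message', 'Invalid access config name `checkpoint-access-config` as the access config name in instance is `x-chkp-access-config`.')])])])
2024-03-28 23:09:44,300-GCP-CP-HA-ERROR- Operation deleteAccessConfig for https://www.googleapis.com/compute/v1/projects/project-1234/zones/us-west2-a/instances/checkpoint-member-a error OrderedDict([('errors', [OrderedDict([('code', 'INVALID_USAGE'), ('message', 'Invalid access config name `checkpoint-access-config` as the access config name in instance is `x-chkp-access-config`.')])])])
2024-03-28 23:09:49,012-GCP-CP-HA-ERROR- Operation deleteAccessConfig for https://www.googleapis.com/compute/v1/projects/project-1234/zones/us-west2-b/instances/checkpoint-member-b error OrderedDict([('errors', [OrderedDict([('code', 'INVALID_USAGE'), ('message', 'Invalid access config name `checkpoint-access-config` as the access config name in instance is `x-chkp-access-config`.')])])])
"""

# Instance URL as logged by the HA script: project, zone and instance name.
INSTANCE_RE = re.compile(
    r"Operation deleteAccessConfig for "
    r"https://www\.googleapis\.com/compute/v1/projects/(?P<project>[^/]+)"
    r"/zones/(?P<zone>[^/]+)/instances/(?P<instance>\S+)"
)
# The two names: what the HA script asked for vs. what the instance has.
NAME_RE = re.compile(
    r"Invalid access config name `(?P<expected>[^`]+)` as the access config "
    r"name in instance is `(?P<actual>[^`]+)`"
)

def parse_mismatch(line: str) -> Optional[Dict[str, str]]:
    """Return instance details plus both access-config names for a
    deleteAccessConfig INVALID_USAGE line, or None for any other line."""
    if "INVALID_USAGE" not in line:
        return None
    inst, names = INSTANCE_RE.search(line), NAME_RE.search(line)
    if not inst or not names:
        return None
    return {**inst.groupdict(), **names.groupdict()}

def stale_access_configs(log_text: str) -> List[Dict[str, str]]:
    """One entry per affected member, deduplicated by instance name."""
    seen: Dict[str, Dict[str, str]] = {}
    for line in log_text.splitlines():
        hit = parse_mismatch(line)
        if hit and hit["instance"] not in seen:
            seen[hit["instance"]] = hit
    return list(seen.values())

def cleanup_commands(log_text: str) -> List[str]:
    """Render the gcloud command that removes the access config actually present
    on each member (the hard-coded one the HA script cannot match)."""
    return [
        f"gcloud compute instances delete-access-config {h['instance']} "
        f"--zone={h['zone']} --access-config-name=\"{h['actual']}\""
        for h in stale_access_configs(log_text)
    ]
```

Feed it the real gcp_had.elg instead of SAMPLE_ELG to get the per-member commands; it only reads the log and never calls gcloud itself.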