Making Async Calls to Google Cloud Storage

I have a script that does real-time log analysis, reading about 25 log files stored in a Google Cloud Storage bucket. The files are always small (1-5 MB each), but the script was taking over 10 seconds to run, resulting in slow page load times and a poor user experience. Performance analysis showed that most of the time was spent on the storage calls, with high per-file overhead from requesting each file individually.

I started thinking the best way to improve performance was to make the storage calls asynchronously so the files download in parallel. This requires a library capable of making such calls; after lots of Googling and trial and error, I found a Stack Overflow post that mentioned gcloud AIO Storage. This worked very well, and after implementation I'm seeing a 125% speed improvement!

Here’s a rundown of the steps I took to get async working with GCS.

1) Install gcloud AIO Storage:

pip install gcloud-aio-storage

2) In the Python code, start with some imports:

import asyncio
from gcloud.aio.auth import Token
from gcloud.aio.storage import Storage

3) Create a function to read multiple files from the same bucket:

async def IngestLogs(bucket_name, file_names, key_file=None):

    SCOPES = ["https://www.googleapis.com/auth/cloud-platform.read-only"]
    token = Token(service_file=key_file, scopes=SCOPES)

    async with Storage(token=token) as client:
        # Create one download task per file, then run them all concurrently
        tasks = (client.download(bucket_name, file_name) for file_name in file_names)
        blobs = await asyncio.gather(*tasks)
    await token.close()

    return blobs

It’s important to note that ‘blobs’ will be a list, with each element containing the raw bytes of the corresponding file.
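Since asyncio.gather() preserves the order of its inputs, the downloaded blobs line up with the file names that were passed in. Here's a minimal sketch (assuming the same file_names iterable) to pair them back up:

# Map each file name to its downloaded contents (bytes)
results = dict(zip(file_names, blobs))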

4) Create some code to call the async function. The decode() function will convert each blob to a string.


def main():

    bucket_name = "my-gcs-bucket"
    file_names = {
       'file1': "path1/file1.abc",
       'file2': "path2/file2.def",
       'file3': "path3/file3.ghi",
    }
    key = "myproject-123456-mykey.json"

    blobs = asyncio.run(IngestLogs(bucket_name, file_names.values(), key_file=key))

    for blob in blobs:
        # Print the first 75 characters of each blob
        print(blob.decode('utf-8')[0:75])
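To run the script directly, main() still needs to be invoked; the standard entry-point guard works here:

if __name__ == "__main__":
    main()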

I track the page load times via New Relic Synthetics, which showed a 300% performance improvement!


Using GCP Python SDK for Network Tasks

Last week, I finally got around to hitting the GCP API directly using Python. It’s pretty easy to do in hindsight. The steps are below.


If not done already, install PIP. On Debian 10, the command is this:

sudo apt install python3-pip

Then of course install the Python packages for GCP:

sudo pip3 install google-api-python-client google-cloud-storage

Now you’re ready to write some Python code. Start with a couple imports:

#!/usr/bin/env python3 

from googleapiclient import discovery
from google.oauth2 import service_account

By default, the default compute service account for the VM or App Engine will be used for authentication. Alternatively, a service account can be specified with the key’s JSON file:

KEY_FILE = '../mykey.json'
creds = service_account.Credentials.from_service_account_file(KEY_FILE)

Connecting to the Compute API will look like this. If using the default service account, the ‘credentials’ argument is not required.

resource_object = discovery.build('compute', 'v1', credentials=creds)
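For example, when running on a VM with the default service account, the call reduces to this minimal sketch (relying on Application Default Credentials):

# No credentials argument: the default service account is used
resource_object = discovery.build('compute', 'v1')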

All API calls require that the project ID (not the project name) be provided as a parameter. I’ll set it like this:

PROJECT_ID = "myproject-1234"

With the connection to the API established, you can now run some commands. The resource object has a method for each resource type, and each typically includes a list() method to list the items in the project. The execute() at the end is required to actually make the API call.

_ = resource_object.firewalls().list(project=PROJECT_ID).execute()

It’s important to note that list().execute() returns a dictionary. The actual list of items can be found under the ‘items’ key. I’ll use the get() method to retrieve the values for the ‘items’ key, falling back to an empty list if ‘items’ doesn’t exist. Here’s an example:

firewall_rules = _.get('items', [])
print(len(firewall_rules), "firewall rules in project", PROJECT_ID)
for firewall_rule in firewall_rules:
    print(" -", firewall_rule['name'])

The API reference guide has a complete list of everything that’s available. Here are some examples, with a short sketch after the list:

firewalls() - List firewall rules
globalAddresses() - List all global addresses
healthChecks() - List load balancer health checks
subnetworks() - List subnets within a given region
vpnTunnels() - List configured VPN tunnels
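As an illustration of the same pattern, here's a sketch listing global addresses (assuming the resource_object and PROJECT_ID from above; 'name' and 'address' are the fields I'd expect in each item):

# Global resources only need the project ID; regional ones also need a region
_ = resource_object.globalAddresses().list(project=PROJECT_ID).execute()
for address in _.get('items', []):
    print(" -", address['name'], address.get('address', ''))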

Some calls will require the region name as a parameter. To get a list of all regions, this can be done:

_ = resource_object.regions().list(project=PROJECT_ID).execute()
regions = [region['name'] for region in _.get('items', [])]

Then iterate through each region. For example, to list all subnets:

for region in regions:
    _ = resource_object.subnetworks().list(project=PROJECT_ID, region=region).execute()
    print("Reading subnets for region", region, "...")
    subnets = _.get('items', [])
    for subnet in subnets:
        print(" -", subnet['name'], subnet['ipCidrRange'])

Deploying a Container to GCP Cloud Run

Clone a working container repo and review the Dockerfile:

git clone https://github.com/jeheyer/network-automation.git
cd network-automation/docker/flask2-sqlchemy
cat Dockerfile

Verify you’re logged in to gcloud and set to the correct project:

gcloud projects list
gcloud config set project <PROJECT_ID>

Where <PROJECT_ID> is the project ID.

Then build the container and upload the image to Google Container Registry. Note that this step is missing from the quickstart guide:

gcloud builds submit --tag gcr.io/<PROJECT_ID>/flask2-sqlalchemy .

Now pick a region and deploy the container. In this example ‘flask2’ is the service name:

gcloud config set run/region us-central1
gcloud run deploy flask2 --image gcr.io/<PROJECT_ID>/flask2-sqlalchemy --allow-unauthenticated

Installing Terraform on FreeBSD

I was pleased to recently discover that Terraform is in the FreeBSD packages. To install, simply do this:

pkg install terraform

As of February 2022, the packaged version is 1.0.11. To run the current release, 1.1.6, first remove the package:

pkg remove terraform

Then download the most recent version and copy the binary to /usr/local/bin. You should be good to go:

% terraform version
Terraform v1.1.6
on freebsd_amd64

% terraform init

Initializing the backend...

Initializing provider plugins...
- Finding latest version of hashicorp/aws...
- Installing hashicorp/aws v4.2.0...
- Installed hashicorp/aws v4.2.0 (signed by HashiCorp)

Terraform has been successfully initialized!

Hopefully the package will be updated to 1.1 soon. Today I learned about the nullable option for variables; it’s very useful when working with parent/child modules.

Anonymous FTP security bug on Synology

While testing Anonymous FTP on my Synology DS218+, I noticed something alarming:

% ftp ds218plus
Connected to ds218plus.mydomain.com.
220 DS218Plus FTP server ready.
Name (ds218plus:j5): ftp
331 Guest login ok, send your email address as password.
Password: 
230 Guest login ok, access restrictions apply.
Remote system type is UNIX.
Using binary mode to transfer files.
ftp> ls
229 Entering Extended Passive Mode (|||55703|)
150 Opening BINARY mode data connection for 'file list'.
drwxrwxrwx   1 root     root             4096 Sep 16 10:58 usbshare1
dr-xr-xr-x   1 root     root              142 May  9 14:48 web
drwxr-xr-x   1 root     root               38 Aug 18 22:21 docker

Whoa there! I should only see the directory for Anonymous FTP, not a list of shares. What’s more, I could download files from these directories even though they were never intended to be public.

The first step is to disable advanced permissions on whichever directory you want to use for Anonymous FTP. In my case, the share was called ‘public’:

After doing that, I can manually select the share to use for Anonymous FTP:

Clearing out /var/spool/clientmqueue in FreeBSD

My FreeBSD VM with its 10GB virtual hard disk ran out of space today. The primary culprit was /var/spool/clientmqueue consuming nearly 3GB of space:

# du -d 1 /var/spool/
8	/var/spool/output
4	/var/spool/opielocks
2955904	/var/spool/clientmqueue
4	/var/spool/dma
4	/var/spool/lpd
4	/var/spool/lock
4	/var/spool/mqueue
2955936	/var/spool/

But when trying to just delete the files, I got “argument list too long”:

# rm -f /var/spool/clientmqueue/*
/bin/rm: Argument list too long.

From a Google search I learned something interesting: find has a -delete option. This worked well:

# find /var/spool/clientmqueue -name '*' -delete

VPNs to Google Cloud Platform (GCP) when FortiGate is behind a NAT gateway

Ran into problems getting a VPN up and running between GCP and a FortiGate 60-E that was behind a NAT gateway (with ports udp/500 and udp/4500 forwarded). On the GCP side, these messages would show up in the logs:

remote host is behind NAT
generating IKE_AUTH response 1 [ N(AUTH_FAILED) ]
looking for peer configs matching GCP.VPN.GATEWAY.IP[%any]...203.0.113.77[192.168.1.1]

This error means that GCP connected to the peer VPN gateway successfully, but in the IKEv2 headers, the peer identified itself by its private IP rather than the expected public one. AWS is not picky about this, but with GCP, the peer VPN gateway must identify itself using the same external IP address as the NAT device.

Most vendors have long supported an option to manually override the IP address for such scenarios. In Cisco IOS or IOS-XE, this can be controlled in the IKEv2 profile with the identity local address option:

crypto ikev2 profile GCP_IKEV2_PROFILE
 match address local interface GigabitEthernet1
 identity local address MY.PUBLIC.IP.ADDRESS
 authentication remote pre-share
 authentication local pre-share
 keyring local GCP_KEYRING
 lifetime 36000
 dpd 20 5 on-demand
!

With Palo Alto, this is configured in the IKE Gateway, Local Identification field:

For the sake of argument, we’ll say that CheckPoint uses the “Statically NATed IP” field to influence Local ID, although this doesn’t actually work.

FortiGate does offer a “Local ID” field in version 6.4.6 and higher, under the Phase 1 proposal:

Seems nice and straightforward, but even after changing this setting, the VPN tunnel still won’t establish. Logs on the GCP end change slightly and now show this:

looking for peer configs matching GCP.VPN.GATEWAY.IP[%any]...203.0.113.77[203.0.113.77]

The private IP is no longer showing, so it seems the issue should be solved. Instead, GCP reports a “Peer not responding” message. The FortiGate actually reports Phase 1 success, waits a few seconds, and then starts the negotiation all over again. Not very helpful.

I configured a test VPN between the FortiGate and a Palo Alto, which then gave a very specific and extremely useful error message:

IKE phase-1 negotiation is failed. When pre-shared key is used, peer-ID must be type IP address. Received type FQDN

Now this explains the problem! Even though the FortiGate is sending the correct IP address in the IKEv2 header, it’s being sent as the wrong identity type. The 5 identity types are listed in RFC 7815:

  • ID_IPV4_ADDR = 32 bit IPv4 address
  • ID_IPV6_ADDR = 128 bit IPv6 address
  • ID_FQDN = DNS hostname
  • ID_RFC822_ADDR = e-mail address
  • ID_KEY_ID = octet stream

If FortiGate were smart, it would either default to the IPv4 address type or auto-determine it based on the text entered into the field. But it seems to simply default to FQDN. Oddly, there is a CLI option called “localid-type” under the phase1-interface that is clearly intended to provide this functionality:

FGT60E1234567890 # config vpn ipsec phase1-interface

FGT60E1234567890 (phase1-interface) # edit gcp

FGT60E1234567890 (gcp) # set localid-type 
auto         Select ID type automatically.
fqdn         Use fully qualified domain name.
user-fqdn    Use user fully qualified domain name.
keyid        Use key-id string.
address      Use local IP address.

But, similar to CheckPoint, it just doesn’t work, and can be considered a broken feature.

Since GCP does not support FQDN authentication, VPNs between GCP and FortiGates behind a NAT are not possible at this time.

Rancid: no matching key exchange method found. Their offer: diffie-hellman-group-exchange-sha1,diffie-hellman-group14-sha1

Time to move Rancid to a newer VM again, this time it’s Ubuntu 20. Hit a snag when I tried a test clogin run:

$ clogin myrouter
Unable to negotiate with 1.2.3.4 port 22: no matching key exchange method found.  Their offer: diffie-hellman-group-exchange-sha1,diffie-hellman-group14-sha1

OpenSSH removed SHA-1 from the defaults a while back, which makes sense since the migration to SHA-2 began several years ago. So it looks like SSH is trying to use SHA-2 while the Cisco router is defaulting to SHA-1, and something has to give in order for negotiation to succeed.

My first thought was to tell the Cisco router to use SHA-2, and this is possible for the MAC setting:

Router(config)#ip ssh server algorithm mac ?
  hmac-sha1      HMAC-SHA1 (digest length = key length = 160 bits)
  hmac-sha1-96   HMAC-SHA1-96 (digest length = 96 bits, key length = 160 bits)
  hmac-sha2-256  HMAC-SHA2-256 (digest length = 256 bits, key length = 256 bits)
  hmac-sha2-512  HMAC-SHA2-512 (digest length = 512 bits, key length = 512 bits)

Router(config)#ip ssh server algorithm mac hmac-sha2-256 hmac-sha2-512
Router(config)#do sh ip ssh | inc MAC       
MAC Algorithms:hmac-sha2-256,hmac-sha2-512

But not for key exchange, which apparently only supports SHA-1:

Router(config)#ip ssh server algorithm kex ?
  diffie-hellman-group-exchange-sha1  DH_GRPX_SHA1 diffie-hellman key exchange algorithm
  diffie-hellman-group14-sha1         DH_GRP14_SHA1 diffie-hellman key exchange algorithm

Thus, the only option is to change the setting on the client. SSH has CLI options for Cipher and Mac:

-c: sets the cipher (encryption) list

-m: sets the MAC (authentication) list

But there is no dedicated flag for key exchange; system-wide, it can be configured in the client’s /etc/ssh/ssh_config file with this line:

KexAlgorithms +diffie-hellman-group14-sha1

I wanted to change the setting only for Rancid and not for SSH in general, hoping that Cisco adds SHA-2 key exchange support soon. It turns out it is possible to pass SSH options in the .cloginrc file. The solution is this:

add sshcmd * {ssh\ -o\ KexAlgorithms=+diffie-hellman-group14-sha1}

Clogin is now successful:

$ clogin myrouter
spawn ssh -oKexAlgorithms=+diffie-hellman-group14-sha1 -c aes128-ctr,aes128-cbc,3des-cbc -x -l myusername myrouter
Password:
Router#_

By the way, I stayed away from diffie-hellman-group-exchange-sha1, as it’s considered insecure, whereas diffie-hellman-group14-sha1 is considered deprecated but still widely deployed and “strong enough”, probably thanks to its 2048-bit key length.

Sidenote: this only affects Cisco IOS-XE devices. The Cisco ASA ships with this in the default configuration:

ssh key-exchange group dh-group14-sha256

Docker Cheat Sheet

Install Docker on Ubuntu

sudo apt update

sudo apt install apt-transport-https ca-certificates curl software-properties-common

curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu focal stable"
apt-cache policy docker-ce
sudo apt install docker-ce

List images

docker image list

Run a container from an image

docker run --name <RUN_NAME> -p <HOST_PORT>:<CONTAINER_PORT> <IMAGE_NAME>:latest

List running containers

docker ps

Build a container

docker build -t <IMAGE_NAME> <DIR_OF_Dockerfile>

Upload an image to Docker registry

docker push <IMAGE_NAME>

Save (export) an image

docker save <IMAGE_NAME>:latest -o image.tar

Save (export) an image with real-time gziping

docker save <IMAGE_NAME>:latest | gzip > image.tgz

Fix unsigned repo errors when building Ubuntu and Debian images

docker system prune

Delete all local images

docker rmi -f $(docker images -aq)