Benchmarking Ampere’s ARM CPU in Google Cloud Platform

While creating an instance today I noticed GCP offers ARM based CPUs made by Ampere, a company based in Santa Clara with a large office in Portland. The monthly cost runs about $30/mo for a single CPU with 4 GB RAM – a bit pricier than comparable N1, but slightly less than a comparable T2D, which is the ultra-fast AMD EPYC Milan platform.

Since I mostly run basic Debian packages and python scripts, CPU platform really wasn’t an issue, so I was curious to have a quick bake-off using a basic 16 thread sysbench test to mimic a light to moderate load. Here’s the results

t2a-standard-1

These are based on Ampere Altra and cost $29/mo in us-central1

CPU speed:
    events per second:  3438.95

General statistics:
    total time:                          10.0024s
    total number of events:              34401

Latency (ms):
         min:                                    0.28
         avg:                                    4.63
         max:                                   80.31
         95th percentile:                       59.99
         sum:                               159394.13

Threads fairness:
    events (avg/stddev):           2150.0625/4.94
    execution time (avg/stddev):   9.9621/0.03

t2d-standard-1

These are based on AMD Milan and cost $32/mo in us-central1

CPU speed:
    events per second:  3672.67

General statistics:
    total time:                          10.0027s
    total number of events:              36738

Latency (ms):
         min:                                    0.27
         avg:                                    4.34
         max:                                  100.28
         95th percentile:                       59.99
         sum:                               159498.26

Threads fairness:
    events (avg/stddev):           2296.1250/3.24
    execution time (avg/stddev):   9.9686/0.02

n1-standard-1

These are based on Intel Skylake and cost $25/mo in us-central1

Prime numbers limit: 10000

Initializing worker threads...

Threads started!

CPU speed:
    events per second:   913.60

General statistics:
    total time:                          10.0072s
    total number of events:              9144

Latency (ms):
         min:                                    1.08
         avg:                                   17.45
         max:                                   89.10
         95th percentile:                       61.08
         sum:                               159544.06

Threads fairness:
    events (avg/stddev):           571.5000/1.00
    execution time (avg/stddev):   9.9715/0.03

n2d-custom2-4096

These are based on AMD EPYC and cost $44/mo in us-central1

CPU speed:
    events per second:  1623.41

General statistics:
    total time:                          10.0046s
    total number of events:              16243

Latency (ms):
         min:                                    0.89
         avg:                                    9.82
         max:                                   97.24
         95th percentile:                       29.19
         sum:                               159485.50

Threads fairness:
    events (avg/stddev):           1015.1875/3.13
    execution time (avg/stddev):   9.9678/0.02

n2-custom-2-4096

These are based in Intel Cascade Lake and cost $50/mo in us-central1

CPU speed:
    events per second:  1942.56

General statistics:
    total time:                          10.0036s
    total number of events:              19435

Latency (ms):
         min:                                    1.01
         avg:                                    8.21
         max:                                   57.04
         95th percentile:                       29.19
         sum:                               159499.92

Threads fairness:
    events (avg/stddev):           1214.6875/8.62
    execution time (avg/stddev):   9.9687/0.02

e2-medium

These are based on availability and have 1-2 shared CPU cores and cost $25/mo in us-central1

CPU speed:
    events per second:  1620.67

General statistics:
    total time:                          10.0055s
    total number of events:              16217

Latency (ms):
         min:                                    0.85
         avg:                                    9.84
         max:                                   65.18
         95th percentile:                       29.19
         sum:                               159647.07

Threads fairness:
    events (avg/stddev):           1013.5625/3.43
    execution time (avg/stddev):   9.9779/0.02

Summary

Amphere’s ARM CPUs offered slightly lower performance against the latest goodies from AMD. It did however beat it in the bang for buck ratio thanks to costing $1 less per month to run.

But, the key take away is both platforms completely blow away the older CPU platforms from Intel. Here’s some nice little charts visualizing the numbers.

Advertisement

Migrating Terraform State Files to Workspaces in an AWS S3 Bucket

Just as I did with GCP a few weeks ago, I needed to circle back and migrate my state files to a cloud storage bucket. This done mainly to centralize the storage location automatically and thus lower the chance of a state file loss or corruption.

Previously, I’d been separating the state files using the -state parameter. I then use a different input file and state file for each environment like this:

terraform apply -var-file=env1.tfvars -state=env1.tfstate
terraform apply -var-file=env2.tfvars -state=env2.tfstate
terraform apply -var-file=env3.tfvars -state=env3.tfstate

To instead store the state files in an AWS S3 bucket, create a backend.tf file with this content:

terraform {
  backend "s3" {
    bucket               = "my-bucket-name"
    workspace_key_prefix = "tf-state"
    key                  = "terraform.tfstate"
    region               = "us-west-1"
  }
}

This will use a bucket named ‘my-bucket-name’ in AWS region us-west-1. Each workspace will store its state file in tfstate/<WORKSPACE_NAME>/terraform.tfstate

Note: if workspace_key_prefix is not specified, the directory ‘env:‘ will be created and used.

Since the backend has changed, I have to run this:

terraform init -reconfigure

I then have to copy the local state files to the correct location that the workspace will be using. This is easiest done with the AWS CLI tool, which will automatically create the sub-directory if it doesn’t exist.

aws s3 cp env1.tfstate s3://my-bucket-name/tf-state/env1/terraform.tfstate
aws s3 cp env2.tfstate s3://my-bucket-name/tf-state/env2/terraform.tfstate
aws s3 cp env3.tfstate s3://my-bucket-name/tf-state/env3/terraform.tfstate

Now I’m ready to run an terraform plan/apply and verify the new state file location is being used.

$ terraform workspace new env1
Created and switched to workspace "env1"!

$ terraform apply -var-file=env1.tfvars

No changes. Your infrastructure matches the configuration.

Terraform has compared your real infrastructure against your configuration and found no differences, so no changes are needed.

Apply complete! Resources: 0 added, 0 changed, 0 destroyed.

$ terraform workspace new env2
Created and switched to workspace "env2"!

$ terraform apply -var-file=env2.tfvars

No changes. Your infrastructure matches the configuration.

Terraform has compared your real infrastructure against your configuration and found no differences, so no changes are needed.

Apply complete! Resources: 0 added, 0 changed, 0 destroyed.

An alternate way to do this migration is enable workspaces first, then migrate the backend to S3.

$ terraform workspace new env1
Created and switched to workspace "env1"!

$ mv env1.tfstate terraform.tfstate.d/env1/terraform.tfstate

$ terraform apply -var-file=env1.tfvars

No changes. Your infrastructure matches the configuration.

Terraform has compared your real infrastructure against your configuration and found no differences, so no changes are needed.

Apply complete! Resources: 0 added, 0 changed, 0 destroyed.

Then create the backend.tf file and run terraform init -reconfigure. You’ll then be prompted to move the state files to S3:

$ terraform init -reconfigure
Initializing modules...

Initializing the backend...
Do you want to migrate all workspaces to "s3"?

Enter a value: yes

$ terraform apply -var-file=env1.tfvars

No changes. Your infrastructure matches the configuration.

Terraform has compared your real infrastructure against your configuration and found no differences, so no changes are needed.

Apply complete! Resources: 0 added, 0 changed, 0 destroyed.

A weird, ugly Error message when running google_ha_test.py

[Expert@cp-member-a:0]# $FWDIR/scripts/google_ha_test.py
GCP HA TESTER: started
GCP HA TESTER: checking access scopes...
GCP HA TESTER: ERROR 

Expecting value: line 1 column 1 (char 0)

Got this message when trying to test a CheckPoint R81.10 cluster build in a new environment. Obviously, this error message is not at all helpful in determining what the problem is. So I wrote a little debug script to try and isolate the issue:

import traceback
import gcp as _gcp 

global api
api = _gcp.GCP('IAM', max_time=20)
metadata = api.metadata()[0]

project = metadata['project']['projectId']
zone = metadata['instance']['zone'].split('/')[-1]
name = metadata['instance']['name']

print("Got metadata: project = {}, zone = {}, name = {}\n".format(project, zone, name))
path = "/projects/{}/zones/{}/instances/{}".format(project, zone, name)

try:
    head, res = api.rest("GET",path,query=None, body=None,aggregate=False)
except Exception as e:
    print(traceback.format_exc())

Running the script, I now see an exception when trying to make the initial API call:

[Expert@cp-cluster-member-a:0]# cd $FWDIR/scripts
[Expert@cp-cluster-member-a:0]# python3 ./debug.py

Got metadata: project = myproject, zone = us-central1-b, name = cp-member-a

Traceback (most recent call last):
  File "debug.py", line 18, in <module>
    head, res = api.rest(method,path,query=None,body=None,aggregate=False)
  File "/opt/CPsuite-R81.10/fw1/scripts/gcp.py", line 327, in rest
    max_time=self.max_time, proxy=self.proxy)
  File "/opt/CPsuite-R81.10/fw1/scripts/gcp.py", line 139, in http
    headers['_code']), headers, repr(response))
gcp.HTTPException: Unexpected HTTP code: 403

This at least indicates the connection to the API is OK and it’s some type of permissions issue with the account.

The CheckPoints have always been really tough to troubleshoot in this aspect, so to keep it simple, I deploy them with the default service account for the project. It’s not explicitly called out

I was able to re-enabled Editor permissions for the default service account with this Terraform code:

# Set Project ID via input variable
variable "project_id" {
  description = "GCP Project ID"
  type = string
}
# Get the default service account info for this project
data "google_compute_default_service_account" "default" {
  project = var.project_id
}
# Enable editor role for this service account
resource "google_project_iam_member" "default_service_account_editor" {
  project = var.project_id
  member  = "serviceAccount:${data.google_compute_default_service_account.default.email}"
  role    = "roles/editor"
}

Migrating Terraform to Workspaces & Storage Buckets

As I started using Terraform more, I quickly realized it’s beneficial to use separate state files for difference groups of resources. It goes without saying multiple environments should be in different state files, as should MSP scenarios where there’s multiple customer deployments running off the same Terraform code. The main benefit is to reduce blast radius if something goes wrong, but the additional benefit is limiting dependencies and improving performance.

So when running Terraform, I’d end up doing these steps:

git pull
terraform init
terraform plan -var-file="env1.tfvars" -state="env1.tfstate"
terraform apply -var-file="env1.tfvars" -state="env1.tfstate"
terraform plan -var-file="env2.tfvars" -state="env2.tfstate"
terraform apply -var-file="env2.tfvars" -state="env2.tfstate"
git add *.tfstate *.tfstate.backup
git commit -m "updated state files"
git push

This works OK, but isn’t ideal for a couple reasons. First, the state file can’t be checked out and updated by two users at the same time – git would try and merge the two files, which would likely result in corruption. Also, state files can contain sensitive information like passwords, and really shouldn’t be stored in the repo at all.

So the better solution is store in a Cloud Storage bucket, such as AWS S3 or Google Cloud Storage. This is usually configured by a backend.tf file that specifies the bucket name and directory prefix for storing state files and looks something like this:

terraform {
  backend "gcs" {
    bucket = "my-gcs-bucket-name"
    prefix = "terraform"
  }
}

After creating this file, we must run terraform init to initialize the new backend:

terraform init
Initializing modules...

Initializing the backend...

Successfully configured the backend "gcs"! Terraform will automatically
use this backend unless the backend configuration changes.

But now if we run terraform with the -state parameter, it will look for the state file in the bucket, not find it, and determine it needs to re-create everything, which is incorrect.

The solution to this problem is use a different workspace for each state file.

terraform workspace list
* default

terraform workspace new env1
Created and switched to workspace "env1"!

You're now on a new, empty workspace. Workspaces isolate their state,
so if you run "terraform plan" Terraform will not see any existing state
for this configuration.

Terraform will now look in the bucket for terraform/env1.tfstate, but that file is still local. So we must manually copy it over:

gsutil copy env1.tfstate gs://my-gcs-bucket/terraform/

Repeat this process for all state files. Now, when we run terraform plan/apply, there is no need to specify the state file. It’s automatically known. And assuming we’ve made no changes, terraform should report no changes required.

terraform workspace select env1
terraform apply -var-file="env1.tfvars"

No changes. Your infrastructure matches the configuration.

Apply complete! Resources: 0 added, 0 changed, 0 destroyed.

terraform workspace select env2
terraform apply -var-file="env2.tfvars"

No changes. Your infrastructure matches the configuration.

Apply complete! Resources: 0 added, 0 changed, 0 destroyed.

And it’s all good

Configure Squid for HTTPS on Debian VM

Verify we’re running the latest version of Debian

lsb_release -a
No LSB modules are available.
Distributor ID:	Debian
Description:	Debian GNU/Linux 11 (bullseye)
Release:	11
Codename:	bullseye

Become root

sudo su

Update packages

apt update && apt upgrade -y

Install the Squid package that has openssl configured and enabled

apt install squid-openssl

Create a local CA, using a 4096-bit key and SHA-2 hashing. This one is good for the next 10 years

openssl req -new -newkey rsa:4096 -sha256 -days 3653 -nodes -x509 -keyout /etc/squid/CA.key -out /etc/squid/CA.crt

Combine the key and cert in to a single file for convenience

cat CA.key CA.crt > CA.pem

Initialize the directory used for minted certs and set permissions so squid owns it

/usr/lib/squid/security_file_certgen -c -s /var/spool/squid/ssl_db -M 4MB
chown -R proxy:proxy /var/spool/squid

Finally, configure Squid to use HTTPS

http_port 3128 ssl-bump cert=/etc/squid/CA.pem generate-host-certificates=on options=NO_SSLv3
ssl_bump bump all

Restart Squid

service squid restart

Test connections by configuring 3128. Note the certificate from the CA, good for 10 years:

export https_proxy=http://localhost:3128

curl -v --cacert CA.crt  https://teapotme.com 

* Uses proxy env variable https_proxy == 'http://localhost:3128'
*   Trying ::1:3128...
* Connected to localhost (::1) port 3128 (#0)
* allocate connect buffer!
* Establish HTTP proxy tunnel to teapotme.com:443
> CONNECT teapotme.com:443 HTTP/1.1
> Host: teapotme.com:443
> User-Agent: curl/7.74.0
> Proxy-Connection: Keep-Alive
> 
< HTTP/1.1 200 Connection established
< 
* Proxy replied 200 to CONNECT request
* CONNECT phase completed!
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*  CAfile: CA.crt
*  CApath: /etc/ssl/certs
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* CONNECT phase completed!
* CONNECT phase completed!
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server did not agree to a protocol
* Server certificate:
*  subject: CN=teapotme.com
*  start date: Nov  6 04:03:48 2022 GMT
*  expire date: Nov  6 04:03:48 2032 GMT
*  subjectAltName: host "teapotme.com" matched cert's "teapotme.com"
*  issuer: C=AU; ST=Some-State; O=Internet Widgits Pty Ltd; CN=localhost
*  SSL certificate verify ok.
> GET / HTTP/1.1
> Host: teapotme.com
> User-Agent: curl/7.74.0
> Accept: */*
> 
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* old SSL session ID is stale, removing
* Mark bundle as not supporting multiuse
< HTTP/1.1 418 I'm a teapot
< Server: nginx
< Date: Sun, 06 Nov 2022 04:08:13 GMT
< Content-Type: application/json
< Content-Length: 483
< X-Cache: MISS from test-1
< X-Cache-Lookup: MISS from test-1:3128
< Via: 1.1 test-1 (squid/4.13)
< Connection: keep-alive
< 
{
    "host": "teapotme.com",
    "user-agent": "curl/7.74.0",
    "x-forwarded-for": "::1, 35.233.234.155, 172.17.0.1",
    "x-forwarded-proto": "https",
}

Git clone / pull / push fails with ‘no mutual signature algorithm’ on Ubuntu 22 to GCP Cloud Source

I created a new Ubuntu 22 VM a few weeks ago and noticed when trying a git pull or git push to a GCP Cloud Source Repo, I wasn’t having any luck when using SSH:

cd myrepo/
git pull
myusername@myorg.com@source.developers.google.com: Permission denied (publickey).
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

The SSH key was a standard RSA with the public key uploaded to Cloud Source SSH Keys, so there was no obvious reason why it wasn’t working.

Next step was try and get some type of debug or error message as to why the public key exchange wasn’t working. Newer versions of Git can turn on SSH debugging by setting the GIT_SSH_COMMAND environment variable, so I did that:

export GIT_SSH_COMMAND="ssh -vvv"

When re-running the git pull request, I get some somewhat useful debugs back:

debug1: Authentications that can continue: publickey
debug3: start over, passed a different list publickey
debug3: preferred gssapi-with-mic,publickey,keyboard-interactive,password
debug3: authmethod_lookup publickey
debug3: remaining preferred: keyboard-interactive,password
debug3: authmethod_is_enabled publickey
debug1: Next authentication method: publickey
debug1: Offering public key: /home/j5/.ssh/id_rsa RSA SHA256:JBgC+R4Ozel+YI+7oEv1UOf9/jLqGBhysN8bpoEDbPU
debug1: send_pubkey_test: no mutual signature algorithm

The ‘no mutual signature algorithm’ indicated one side didn’t like the signing algorithm. I did a Google and found this article which indicates that Ubuntu 22 doesn’t allow RSA by default. I can’t change the setting on the Cloud Source side, so on the Ubuntu 22 client, I did this as a quick work-around:

echo "PubkeyAcceptedKeyTypes +ssh-rsa" > /etc/ssh/ssh_config.d/enable_rsa.conf

And now the git pull/push works without issue.

An alternate solution is instead use Elliptic Curve DSA rather than RSA. To generate a new ECDSA key:

ssh-keygen -t ecdsa
cat ~/.ssh/id_ecdsa.pub

Then copy/paste the key in to the SSH Key Manager. This will be easier to copy/paste then RSA since it’s shorter.

Making Async Calls to Google Cloud Storage

I have a script doing real-time log analysis, where about 25 log files are stored in a Google Cloud Storage bucket. The files are always small (1-5 MB each) but the script was taking over 10 seconds to run, resulting in slow page load times and poor user experience. Performance analysis showed that most of the time was spent on the storage calls, with high overhead of requesting individual files.

I started thinking the best way to improve performance was to make the storage calls in an async fashion so as to download the files in parallel. This would require a special library capable of making such calls; after lots of Googling and trial and error I found a StackOverFlow post which mentioned gcloud AIO Storage. This worked very well, and after implementation I’m seeing a 125% speed improvement!

Here’s a rundown of the steps I did to get async working with GCS.)

1) Install gcloud AIO Storage:

pip install gcloud-aio-storage

2) In the Python code, start with some imports

import asyncio
from gcloud.aio.auth import Token
from gcloud.aio.storage import Storage

3) Create a function to read multiples from the same bucket:

async def IngestLogs(bucket_name, file_names, key_file = None):

    SCOPES = ["https://www.googleapis.com/auth/cloud-platform.read-only"]
    token = Token(service_file=key_file, scopes=SCOPES)
            
    async with Storage(token=token) as client:
        tasks = (client.download(bucket_name, _) for _ in file_names)
        blobs = await asyncio.gather(*tasks)
    await token.close()
                
    return blobs

It’s important to note that ‘blobs’ will be a list, with each element representing a binary version of the file.

4) Create some code to call the async function. The decode() function will convert each blob to a string.


def main():

    bucket_name = "my-gcs-bucket"
    file_names = {
       'file1': "path1/file1.abc",
       'file2': "path2/file2.def",
       'file3': "path3/file3.ghi",
    }
    key = "myproject-123456-mykey.json" 

    blobs = asyncio.run(IngestLogs(bucket_name, file_names.values(), key_file=key))

    for blob in blobs:
        # Print the first line from each blob
        print(blob.decode('utf-8')[0:75])

I track the load times via NewRelic synthetics, and it showed a 300% performance improvement!

Using GCP Python SDK for Network Tasks

Last week, I finally got around to hitting the GCP API directly using Python. It’s pretty easy to do in hindsight. Steps are below


If not done already, install PIP. On Debian 10, the command is this:

sudo apt install python3-pip

Then of course install the Python packages for GCP:

sudo pip3 install google-api-python-client google-cloud-storage

Now you’re ready to write some Python code. Start with a couple imports:

#!/usr/bin/env python3 

from googleapiclient import discovery
from google.oauth2 import service_account

By default, the default compute service account for the VM or AppEngine will be used for authentication. Alternately, a service account can be specific with the key’s JSON file:

KEY_FILE = '../mykey.json'
creds = service_account.Credentials.from_service_account_file(KEY_FILE)

Connecting to the Compute API will look like this. If using the default service account, the ‘credentials’ argument is not required.

resource_object = discovery.build('compute', 'v1', credentials=creds)

All API calls require the project ID (not name) be provided as a parameter. I will set it like this:

PROJECT_ID = "myproject-1234"

With the connection to the API established, you can now run some commands. The resource object will have several methods, and in each there will typically be a list() method to list the items in the project. The execute() at the end is required to actually execute the call.

_ = resource_object.firewalls().list(project=PROJECT_ID).execute()

It’s important to note the list().execute() returns a dictionary. The actual list of items can be found in key ‘items’. I’ll use the get() method to retrieve the values for the ‘items’ key, or use an empty list if ‘items’ doesn’t exist. Here’s an example

firewall_rules = _.get('items', [])
print(len(firewall_rules), "firewall rules in project", PROJECT_ID)
for firewall_rule in firewall_rules:
    print(" -", firewall_rule['name'])

The API reference guide has a complete list of everything that’s available. Here’s some examples:

firewalls() - List firewall rules
globalAddresses() - List all global addresses
healthChecks() - List load balancer health checks
subnetworks() - List subnets within a given region
vpnTunnels() - List configured VPN tunnels

Some calls will require the region name as a parameter. To get a list of all regions, this can be done:

_ = resource_object.regions().list(project=PROJECT_ID).execute()
regions = [region['name'] for region in _.get('items', [])]

Then iterate through each region. For example to list all subnets:

for region in regions:
    _ = resource_object.subnetworks().list(project=PROJECT_ID,region=region).execute()
    print("Reading subnets for region", region ,"...")
    subnets = _.get('items', [])
    for subnet in subnets:
        print(" -", subnet['name'], subnet['ipCidrRange'])

Deploying a Container to GCP Cloud Run

Copy a working container and review the Dockerfile

git clone https://github.com/jeheyer/network-automation.git
cd network-automation/docker/flask2-sqlchemy
cat Dockerfile

Verify logged in to gcloud and set to correct project:

gcloud projects list
gcloud config set project <PROJECT_ID>

Where <PROJECT_ID> is the Project ID

Then build the container and upload the image to Google Container Registry. Note this step is missing from the quickstart guide

gcloud builds submit --tag gcr.io/<PROJECT_ID>/flask2-sqlalchemy .

Now pick a region and deploy the container. In this example ‘flask2’ is the service name:

gcloud config set run/region us-central1
gcloud run deploy flask2 --image gcr.io/<PROJECT_ID>/flask2-sqlalchemy --allow-unauthenticated

Installing Terraform on FreeBSD

I was pleased to recently discover that Terraform is in the FreeBSD packages. To install, simply do this:

pkg install terraform

As of February 2022, the latest version is 1.0.11. To run 1.1.6, first remove the package:

pkg remove terraform

Then download the most recent version and copy the binary to /usr/local/bin. Should be good to go:

% terraform version
Terraform v1.1.6
on freebsd_amd64

% terraform init

Initializing the backend...

Initializing provider plugins...
- Finding latest version of hashicorp/aws...
- Installing hashicorp/aws v4.2.0...
- Installed hashicorp/aws v4.2.0 (signed by HashiCorp)

Terraform has been successfully initialized!

Hopefully they’ll update to 1.1 soon. Today I learned about the nullable option for variables, and it’s a very useful option when working with parent/child modules.