“No Vlan association for STP Interface Member 1.0” when upgrading F5 BigIP in AWS from 13.1.1 to 13.1.3.2

After upgrading several of our BigIP-VEs in AWS from 13.1.1 to 13.1.3.2 without issue, I ran into big problems with a pair this afternoon. The first one took forever to boot, and when it finally did, it complained about an incomplete configuration.

I’ve seen this before and know it usually means the upgrade couldn’t migrate a certain section of the configuration file, and rather than skipping that section, the system refuses to load anything. This is what was showing up in /var/log/ltm and on the console:

May 4 17:47:53 bigip warning mcpd[18134]: 01070932:4: Pending local Interface from cluster.: 1.0, configuration ignored
May 4 17:47:53 bigip warning mcpd[18134]: 01070932:4: Pending Interface: 1.0, configuration ignored
May 4 17:47:53 bigip err mcpd[18134]: 01070523:3: No Vlan association for STP Interface Member 1.0.
May 4 17:47:53 bigip emerg load_config_files: "/usr/bin/tmsh -n -g load sys config partitions all base " - failed. -- 01070523:3: No Vlan association for STP Interface Member 1.0. Error: failed to reset strict operations; disconnecting from mcpd. Will reconnect on next command.

The problem was that the original 13.1.1 configuration file contained this:

net interface 1.0 {
 media-fixed 10000T-FD
}

This is really an error from the get-go, since interface 1.0 is the eth0 / management interface and shouldn’t be in the “net interface” section.

My workaround was to reset to the factory default configuration, then re-create the Self IPs and re-sync the cluster. Alternatively, the configuration file could be edited to simply remove the offending lines.
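For the second option, here is a minimal Python sketch of stripping the stanza. It assumes the stanza lives in /config/bigip_base.conf (where BIG-IP normally keeps interfaces, VLANs, and Self IPs; check yours first) and that you keep a backup copy before writing anything back. remove_stanza is a hypothetical helper, and it counts braces naively, so a brace inside a quoted string would confuse it.

```python
import re

def remove_stanza(config_text, header):
    """Return config_text with the brace-delimited stanza starting at `header` removed.

    `header` is the text before the opening brace, e.g. "net interface 1.0".
    Braces are counted, so nested blocks inside the stanza are dropped too.
    """
    pattern = re.compile(r"^\s*" + re.escape(header) + r"\s*\{")
    out_lines = []
    depth = 0          # brace depth inside the stanza being skipped
    skipping = False
    for line in config_text.splitlines(keepends=True):
        if not skipping and pattern.match(line):
            skipping = True
            depth = 0
        if skipping:
            depth += line.count("{") - line.count("}")
            if depth <= 0:
                skipping = False   # closing brace reached; this line is dropped too
            continue
        out_lines.append(line)
    return "".join(out_lines)
```

For example, remove_stanza(text, "net interface 1.0") returns the file contents without the 1.0 block, leaving neighbors such as "net interface 1.1" untouched, so you can diff the result against the backup before writing it back and reloading the configuration.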

Since I did not see this configuration on any of our other BigIP-VEs, I suspect F5 mistakenly included it in the 13.1.1-0.0.4 AMI I launched last summer.

Authentication to Synology Directory Server (LDAP Server)

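Once Directory Server is configured, the Synology displays its connection settings, including the Base DN and Bind DN. The examples below use a Base DN of dc=myserver,dc=mydomain,dc=com.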

The password configured there is the password for the ‘root’ user.

Configuration for Cisco ASA / AnyConnect

aaa-server SYNOLOGY protocol ldap
aaa-server SYNOLOGY (Inside) host 192.168.1.100
 ldap-base-dn dc=myserver,dc=mydomain,dc=com
 ldap-scope subtree
 ldap-naming-attribute uid
 ldap-login-password <root user password>
 ldap-login-dn uid=root,cn=users,dc=myserver,dc=mydomain,dc=com
 server-type auto-detect
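All three devices in this post reuse the same two DNs. Judging by the examples, the Base DN is simply the Synology’s hostname split into dc= components, with accounts under cn=users. Assuming that pattern holds for your server (compare against what Directory Server actually shows), a small Python sketch can build both strings so they stay consistent across the Cisco, FortiGate, and F5 configs. base_dn_from_fqdn and bind_dn are hypothetical helpers, not part of any vendor tool.

```python
def base_dn_from_fqdn(fqdn):
    """Turn a DNS-style name into an LDAP base DN, one dc= component per label."""
    labels = [label for label in fqdn.strip(".").split(".") if label]
    return ",".join(f"dc={label}" for label in labels)

def bind_dn(user, base_dn, container="cn=users"):
    """Build the bind DN for a user in the Synology's users container (assumed cn=users)."""
    return f"uid={user},{container},{base_dn}"
```

With the placeholder names from this post, base_dn_from_fqdn("myserver.mydomain.com") gives the ldap-base-dn value, and bind_dn("root", ...) gives the ldap-login-dn value used above.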

Configuration for FortiGate GUI

  • Common Name Identifier = uid
  • Distinguished Name = cn=users,dc=myserver,dc=mydomain,dc=com
  • Bind Type = Simple

Configuration for F5 BigIP

You need to change Authentication from ‘Basic’ to ‘Advanced’ in order to set the Login LDAP Attribute:

  • Remote Directory Tree: dc=myserver,dc=mydomain,dc=com
  • Scope: Sub
  • BIND DN: uid=root,cn=users,dc=myserver,dc=mydomain,dc=com
  • Password: <root user password>
  • User Template: uid=%s,cn=users,dc=myserver,dc=mydomain,dc=com
  • Login LDAP Attribute: uid
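As I understand it, the BIG-IP substitutes the name typed at the login prompt for %s in the User Template to get the DN it authenticates, so the template has to point at the same cn=users container as the Bind DN. The sketch below mimics that substitution and adds a rough sanity check that the result sits under the Remote Directory Tree. expand_user_template, is_under, and the user jsmith are hypothetical, and the suffix check ignores LDAP escaping rules.

```python
def expand_user_template(template, username):
    """Substitute the login name for the %s placeholder in an F5-style User Template."""
    if template.count("%s") != 1:
        raise ValueError("template must contain exactly one %s placeholder")
    return template.replace("%s", username)

def is_under(dn, base_dn):
    """Rough check that dn ends with base_dn (case-insensitive, component by component)."""
    dn_parts = [part.strip().lower() for part in dn.split(",")]
    base_parts = [part.strip().lower() for part in base_dn.split(",")]
    return dn_parts[-len(base_parts):] == base_parts
```

If is_under returns False for the expanded template, the User Template and Remote Directory Tree disagree, which is a likely cause of failed logins.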

To use Remote Role Groups:

Attribute String: memberOf=cn=users,cn=groups,dc=myserver,dc=mydomain,dc=com

F5 to ADFS 2016 SSL/TLS handshake failure

A browser connecting directly to the ADFS server works fine, but the connection dies when going through the F5 LTM. A packet capture showed the F5 sending an SSL Client Hello as expected, with the ADFS server responding with a TCP RST.

After some more digging, I found this in the ADFS 2016 guide:

The load balancer MUST NOT terminate SSL. AD FS supports multiple use cases with certificate authentication which will break when terminating SSL. Terminating SSL at the load balancer is not supported for any use case.

So, the F5 virtual server should be configured as Layer 4, passing SSL straight through to the ADFS servers.

The unsupported workaround is to use a custom Server SSL profile with the Server Name field set:

ltm profile server-ssl /Common/serverssl-myserver {
 app-service none
 defaults-from /Common/serverssl
 server-name adfs.mydomain.com
}
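My understanding (an assumption, not something the ADFS guide above says) is that ADFS 2016 binds its certificate per hostname in http.sys, so a Client Hello without SNI gets reset, and the server-name setting makes the BIG-IP send SNI. To check this from a workstation, the sketch below attempts a handshake with and without SNI. probe_handshake is a hypothetical helper, and certificate verification is deliberately disabled because only handshake completion matters here.

```python
import socket
import ssl

def probe_handshake(host, port, sni, timeout=5.0):
    """Try a TLS handshake against host:port, sending SNI only if sni is not None.

    Returns (True, tls_version) if the server completes the handshake, or
    (False, reason) if the connection is refused, reset, or times out.
    """
    ctx = ssl.create_default_context()
    ctx.check_hostname = False          # must be turned off before CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE     # we only care whether the handshake completes
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            # server_hostname=None leaves the SNI extension out of the Client Hello
            with ctx.wrap_socket(raw, server_hostname=sni) as tls:
                return True, tls.version()
    except OSError as exc:              # ssl.SSLError and ConnectionResetError are OSErrors
        return False, f"{type(exc).__name__}: {exc}"
```

For example, compare probe_handshake("10.0.0.50", 443, None) with probe_handshake("10.0.0.50", 443, "adfs.mydomain.com"), where 10.0.0.50 is a placeholder ADFS server address. If the first fails with ConnectionResetError and the second succeeds, missing SNI is the likely culprit.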

01150b21:3: RCODE returned from query: ‘SERVFAIL’.

I came across an interesting problem after our F5 BigIP-VEs fell victim to a storage failure in VMware. Certain zones couldn’t be modified, or in some cases even viewed, in ZoneRunner. Since F5 doesn’t officially support its BIND backend, I knew I was likely on my own for a fix and began poking around /var/named/config/namedb, where the files are stored.

[admin@f5bigip01:Active:In Sync] ~ # cd /var/named/config/namedb/
[admin@f5bigip01:Active:In Sync] namedb # ls -ls db.internal.32.30.10.in-addr.arpa.*
 4 -rw-r--r--. 1 named named 977 2017-08-21 12:53 db.internal.32.30.10.in-addr.arpa.
 4 -rw-r--r--. 1 named named 861 2017-08-19 12:06 db.internal.32.30.10.in-addr.arpa.~
12 -rw-r--r--. 1 named named 11302 2017-08-19 11:55 db.internal.32.30.10.in-addr.arpa..jnl

I guessed that the .jnl (journal) file was the problem, so I stopped ZoneRunner’s zrd daemon, deleted the file, and tried again:

[admin@f5bigip01:Active:In Sync] ~ # bigstart stop zrd
[admin@f5bigip01:Active:In Sync] ~ # rm -f /var/named/config/namedb/*..jnl
[admin@f5bigip01:zrd DOWN:In Sync] ~ # bigstart start zrd

Went back to ZoneRunner and was able to view and edit the zone just fine.
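Deleting journals can’t be undone, so a more cautious variant (my suggestion, not what I ran above) is to move them aside while zrd is stopped and delete them later once ZoneRunner is healthy. move_journals and the backup directory are hypothetical; the glob "*.jnl" also matches the "..jnl" names shown above.

```python
import shutil
from pathlib import Path

def move_journals(zone_dir, backup_dir):
    """Move every BIND journal file (*.jnl) from zone_dir into backup_dir.

    Returns the names of the moved files. Intended to run only while zrd is stopped.
    """
    src = Path(zone_dir)
    dst = Path(backup_dir)
    dst.mkdir(parents=True, exist_ok=True)
    moved = []
    for jnl in sorted(src.glob("*.jnl")):
        shutil.move(str(jnl), str(dst / jnl.name))
        moved.append(jnl.name)
    return moved
```

For example, move_journals("/var/named/config/namedb", "/var/tmp/jnl-backup") leaves the zone files in place and parks the journals under /var/tmp/jnl-backup, a directory name picked purely for illustration.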

F5 Bigip-VE tips for AWS deployment

Launch and initial configuration

F5’s instructions are slightly incorrect: you’ll want to SSH in as ‘admin’ (not root or ec2-user):

$ ssh -i mykey.pem admin@10.10.10.111

Then use these TMOS commands to set and save a password for the admin user:

(tmos)# modify auth user admin prompt-for-password
(tmos)# save sys config

Log in to the GUI as admin with the new password to do licensing and initial configuration.

Interfaces, Self IPs, and VLANs

While F5’s guides list a variety of interface configurations, my advice is to use three:

  1. eth0: mgmt – Used for SSH, HTTPS, and SNMP polling access
  2. eth1: interface 1.1: vlan “external” in a public subnet – For talking to Internet
  3. eth2: interface 1.2: vlan “internal” in a private subnet – For talking to internal resources and HA

Routing

The default route should, of course, point to the external interface’s gateway. Any private IP address spaces (10.0.0.0/8, etc.) can be routed via the internal interface’s gateway.
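To make the routing rule concrete, here is a longest-prefix-match sketch using Python’s ipaddress module. The gateway addresses (10.10.1.1 external, 10.10.2.1 internal) are hypothetical, chosen because AWS puts the subnet router at the first host address of each subnet; directly connected subnets are left out for brevity.

```python
import ipaddress

EXTERNAL_GW = "10.10.1.1"   # hypothetical: .1 of the external (public) subnet
INTERNAL_GW = "10.10.2.1"   # hypothetical: .1 of the internal (private) subnet

ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), EXTERNAL_GW),       # default route
    (ipaddress.ip_network("10.0.0.0/8"), INTERNAL_GW),      # RFC 1918 space
    (ipaddress.ip_network("172.16.0.0/12"), INTERNAL_GW),
    (ipaddress.ip_network("192.168.0.0/16"), INTERNAL_GW),
]

def next_hop(destination, routes=ROUTES):
    """Pick the gateway for destination using longest-prefix match."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, gw) for net, gw in routes if addr in net]
    if not matches:
        raise LookupError(f"no route to {destination}")
    _, gateway = max(matches, key=lambda item: item[0].prefixlen)
    return gateway
```

Internet destinations fall through to the default route, while anything in private space matches a more specific prefix and goes out the internal side.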

If doing an HA pair across multiple availability zones, items with unique IP addresses such as routes, virtual servers, and perhaps pools/nodes will need to go in a separate non-synchronized partition.

  1. Go to System -> Users -> Partition List
  2. Create a new partition with a descriptive name (e.g. “LOCAL_ONLY”)
  3. Uncheck the Device Group and set the Traffic Group to “traffic-group-local-only”

Installing F5 Images and Hotfixes on BigIP-VE

Often with the BigIP-VEs, installing an image or hotfix via the GUI is impossible because the volume drop-down menu is empty.

First, verify the name of the image that’s been uploaded. It should be in /shared/images:

(tmos)# bash
# ls -l /shared/images/*.iso
-rw-r--r--. 1 tomcat tomcat 2096115712 2020-03-11 09:56 /shared/images/BIGIP-13.1.1.4-0.0.4.iso

Then install the image via tmsh with the create-volume option:

# tmsh

(tmos)# install sys software image BIGIP-13.1.1.4-0.0.4.iso volume HD1.2 create-volume

(tmos)# show sys software status
----------------------------------------------------------------
Sys::Software Status
Volume  Product  Version   Build  Active  Status
HD1.1   BIG-IP   13.1.0.8  0.0.1  yes     complete
HD1.2   BIG-IP   13.1.1.4  0.0.4  no      installing 10.000 pct

At this point you can go back to GUI, watch the installation complete, and boot to that volume.
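If you’d rather watch the install from a script than from the GUI, the status table is easy to scrape. This is a sketch that assumes the column layout shown above (it may differ between versions) and that you capture the output of show sys software status however you like; parse_software_status is a hypothetical helper.

```python
def parse_software_status(output):
    """Parse `show sys software status` text into a list of dicts, one per volume."""
    volumes = []
    for line in output.splitlines():
        fields = line.split()
        # Data rows start with a volume name such as HD1.1; skip titles, headers, rules.
        if len(fields) < 5 or not fields[0].startswith("HD"):
            continue
        volume, product, version, build, active = fields[:5]
        volumes.append({
            "volume": volume,
            "product": product,
            "version": version,
            "build": build,
            "active": active == "yes",
            "status": " ".join(fields[5:]),   # e.g. "installing 10.000 pct"
        })
    return volumes
```

Polling this until the new volume’s status reads "complete" tells you when it is safe to boot to that volume.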

Fixing expired SSL certificate on F5 GTM

I applied the latest hotfix to our GTMs tonight and was checking logs just to verify there were no surprises. Unfortunately, there were: /var/log/gtm was showing SSL errors every 10 seconds, complaining of being unable to verify the certificates. I checked the self-signed certs, and sure enough, they had expired a few days ago.
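Catching this before it bites is easy enough: openssl x509 -noout -enddate -in <cert> prints the expiry date, and as far as I know the device certificate lives at /config/httpd/conf/ssl.crt/server.crt (verify on your version). The sketch below turns that output into days remaining using Python’s ssl.cert_time_to_seconds; days_until_expiry is a hypothetical helper.

```python
import ssl
import time

def days_until_expiry(enddate_line, now=None):
    """Days until a certificate expires, given `openssl x509 -noout -enddate` output.

    enddate_line looks like "notAfter=Feb 18 12:00:00 2020 GMT".
    A negative result means the certificate has already expired.
    """
    value = enddate_line.strip()
    if value.startswith("notAfter="):
        value = value[len("notAfter="):]
    expires = ssl.cert_time_to_seconds(value)  # parses OpenSSL's "Mon DD HH:MM:SS YYYY GMT"
    if now is None:
        now = time.time()
    return (expires - now) / 86400
```

Running this from cron and alerting when the result drops below, say, 30 would have flagged these certs well before gtmd started complaining.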

The first step is obvious: renew the cert via System -> Device Certificate -> Device Certificate. The only fields that really matter here are hostname and duration; everything else can be left at the defaults.

Now, on each device, import the other device’s new certificate under System -> Device Certificate -> Trusted Device Certificates.

Sync and failover between the BigIP devices are now fixed, buuuuuut the logs show that gtmd is still not happy:

Feb 21 18:07:20 bigip01 notice gtmd[13701]: 011ae020:5: Connection in progress to 192.168.1.2 
Feb 21 18:07:20 bigip01 notice gtmd[13701]: 011ae01c:5: Connection complete to 192.168.1.2. Starting SSL handshake
Feb 21 18:07:20 bigip01 iqmgmt_ssl_connect: SSL error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed
Feb 21 18:07:20 bigip01 err gtmd[13701]: 011ae0fa:3: iqmgmt_ssl_connect: SSL error: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed (336134278)

This is happening on the secondary as well:

Feb 21 18:20:26 bigip02 iqmgmt_ssl_connect: SSL error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed
Feb 21 18:20:26 bigip02 err gtmd[13788]: 011ae0fa:3: iqmgmt_ssl_connect: SSL error: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed (336134278)

As you may have guessed, GTM needs a separate step to pick up the cert changes. The solution here is not all that obvious: run bigip_add on both systems to copy the certs via SSH and add them to /config/gtm/server.crt. Before doing it, keep in mind there are two requirements for this command to succeed:

  1. The Self IP must permit SSH. Since this is a one-time exchange, you can use the management IPs even if the GTM server IP is something different. For non-management IPs, “Port Lockdown: Allow All” should be set.
  2. The user must be set to use the Advanced Shell, aka bash (not tmsh). The simplest way around this is to log in as ‘root’, since it’s a predefined user that always uses the Advanced Shell.

[admin@bigip01:Active:In Sync] ~ # bigip_add root@10.1.1.2
Retrieving remote and installing local BIG-IP's SSL certs ...
Enter root password for 10.1.1.2 if prompted
==> Done <==

[admin@bigip02:Active:In Sync] ~ # bigip_add root@10.1.1.1
Retrieving remote and installing local BIG-IP's SSL certs ...
Enter root password for 10.1.1.1 if prompted
==> Done <==

Boom! GTM is now happy, and I can go home.

Feb 21 18:25:11 bigip01 alert gtmd[13701]: 011a500b:1: SNMP_TRAP: Box 192.168.1.2 state change blue --> green
Feb 21 18:26:16 bigip02 alert gtmd[13788]: 011a500b:1: SNMP_TRAP: Box 192.168.1.1 state change blue --> green
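To confirm the fix from the shell rather than eyeballing /var/log/gtm, a small sketch can pull the latest state per box out of those state change messages. It assumes the message format shown above; box_states is a hypothetical helper.

```python
import re

STATE_RE = re.compile(r"Box (?P<box>\S+) state change (?P<old>\w+) --> (?P<new>\w+)")

def box_states(log_lines):
    """Return {box_ip: latest_state} from gtmd 'state change' log messages."""
    states = {}
    for line in log_lines:
        match = STATE_RE.search(line)
        if match:
            states[match.group("box")] = match.group("new")  # later lines overwrite earlier
    return states
```

Feeding it the lines of /var/log/gtm should show every peer as green once the certificate exchange has worked.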

Enabling Config Syncing on BigIP GTMs

After two weeks of working with the consultant and getting nowhere, I booked a hotel for the weekend.

We recently retired a data center and, in the process, transported its GTM to our home office. Despite careful handling, its single hard drive failed in transit, and once the RMA unit arrived, I was once again reminded how confusing the configuration process is. The F5 BigIP is loaded with nerd knobs, and the GTM is especially hairy, with overlapping menus and settings that differ from the more popular LTM.

1) Select the Self IP for Sync communication, and verify iQuery is allowed

Ideally this should be an internal interface with redundancy (for example, two physical interfaces bonded via LACP).

Usually the internal Self IP is set to “Allow All”. Since I had deliberately set this Self IP to “Allow None”, I had to change it to “Allow Custom” and add TCP port 4353.

F5 sol13690
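Before moving on, it’s worth confirming the port lockdown change from the peer’s side. This sketch simply tries a TCP connection to port 4353; run it from a host on the same internal VLAN. port_open is a hypothetical helper, and a successful connect only proves the port is reachable, not that iQuery itself is healthy.

```python
import socket

IQUERY_PORT = 4353  # iQuery, per the step above

def port_open(host, port=IQUERY_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:          # refused, unreachable, or timed out
        return False
```

For example, port_open("192.168.1.12") against the other GTM’s internal Self IP should return True once “Allow Custom” with TCP 4353 is in place.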

2) Set the config sync source Interface

Anyone with F5 HA experience will be familiar with this step. However, most will expect to find it under System -> High Availability -> Device Connectivity.

Instead, go to Device Management -> Devices and click the local (self) GTM, then look under Device Connectivity -> ConfigSync.

Note that this is the only step needed under the entire Device Management tree.

3) On the existing GTM, create the other GTM as a server

This is done under DNS -> GSLB -> Servers. Enter the name and IP address of the other GTM, with the product set to “BIG-IP System (Single)”. Choose the same IP address as the Self IP selected in the previous step, and set the health monitor to “bigip”.

4) Run bigip_add on both units

This will use SSH to exchange SSL certificates.  In order for bigip_add to work, both sides must have SSH enabled, and the account must have advanced shell (bash) set.

admin@(f5bigip01)(cfg-sync In Sync)(Active)(/Common)(tmos)# bash
[admin@f5bigip01:Standby:In Sync] ~ # bigip_add admin@192.168.1.12
Retrieving remote and installing local BIG-IP's SSL certs ...
Password:
==> Done <==

admin@(f5bigip02)(cfg-sync In Sync)(Standby)(/Common)(tmos)# bash
[admin@f5bigip02:Active:In Sync] ~ # bigip_add admin@192.168.1.11
Retrieving remote and installing local BIG-IP's SSL certs ...
Password:
==> Done <==

If the account is not set up for SSH with a bash shell (by default it isn’t), this error appears:

ERROR: Can’t read remote cert via /usr/bin/ssh

Since this is a one-time exchange, I simply used the management port IP addresses, as SSH is enabled there.

Once this step has been completed, each GTM should have the other’s certificate installed under DNS -> GSLB -> Servers -> Trusted Server Certificates (stored in /config/gtm/server.crt).

F5 sol13823
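To confirm the exchange actually landed, you can compare fingerprints. The sketch below computes a SHA-256 fingerprint for every certificate in a PEM bundle such as /config/gtm/server.crt; compare the list against the peer’s device certificate (openssl x509 -noout -fingerprint -sha256 -in <cert> prints it in uppercase with colons, so normalize before comparing). pem_fingerprints is a hypothetical helper.

```python
import base64
import hashlib
import re

PEM_RE = re.compile(
    r"-----BEGIN CERTIFICATE-----(.*?)-----END CERTIFICATE-----", re.DOTALL
)

def pem_fingerprints(bundle_text):
    """Return the SHA-256 fingerprint (lowercase hex) of every certificate in a PEM bundle."""
    prints = []
    for body in PEM_RE.findall(bundle_text):
        der = base64.b64decode("".join(body.split()))  # drop line breaks before decoding
        prints.append(hashlib.sha256(der).hexdigest())
    return prints
```

To normalize openssl’s output, take the part after "=" and apply .replace(":", "").lower() before checking membership in the returned list.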

5) On the device whose config you want to overwrite, run gtm_add

This pulls the gtm.conf configuration over iQuery (TCP port 4353). The IP address should match the one given in the first step; in our case, this was the internal interface.

F5 sol13312

F5 TCP Profiles for high speed file transfers

Client Profile: controls the connection between the BigIP LTM and the client

General rules for this are:

  • Set the proxy buffer higher than 64 KB, but not too high; 131072 bytes for both high and low works
  • Receive window should be at least 132 KB
  • Enable Rate Pace and Delay Window Control
  • Increase Max Syn Retransmissions from the default of 3 to at least 7
  • By default, the keep alive interval is 1800 seconds (30 minutes). I prefer it much smaller (10-60 seconds), as it can come into play with loss recovery
  • Congestion control defaults to High Speed, which is fine. Environments with mobile clients may want to try Woodside

ltm profile tcp /Common/tcp-highspeed-client {
 app-service none
 defaults-from /Common/tcp
 delay-window-control enabled
 early-retransmit enabled
 keep-alive-interval 30
 proxy-buffer-high 131072
 proxy-buffer-low 131072
 rate-pace enabled
 receive-window-size 524288
 selective-acks enabled
 send-buffer-size 1048576
 syn-max-retrans 10
 tail-loss-probe enabled
}
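One way to sanity-check receive-window-size and send-buffer-size is the bandwidth-delay product: a single connection can’t move more than one window per round trip. The numbers below (a 100 Mbps path with 40 ms RTT) are hypothetical, not measurements from my environment; plug in your own.

```python
def bdp_bytes(bandwidth_bps, rtt_ms):
    """Bandwidth-delay product: bytes that must be in flight to fill the path."""
    return bandwidth_bps * rtt_ms // 8000   # bits/s * ms / 8 bits / 1000 ms

def max_throughput_bps(window_bytes, rtt_ms):
    """Upper bound on single-connection throughput for a given window and RTT."""
    return window_bytes * 8000 // rtt_ms
```

With those assumptions the BDP is 500,000 bytes, so the 524288-byte receive window above roughly fills the path, and the same window caps a single flow at about 105 Mbps. On longer or faster paths the window needs to grow proportionally.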

Server Profile: controls the connection between the BigIP LTM and the backend server

In this case you’ll want to match the OS settings of the backend server.  Running “sysctl -a | grep net.ipv4.tcp” is a quick way to discover these.  Example for CentOS 6.5:

net.ipv4.tcp_keepalive_intvl = 75
net.ipv4.tcp_keepalive_time = 7200
net.ipv4.tcp_syn_retries = 5

The profile then matches those values. Congestion control and slow start are also disabled, since we can assume the connection between the LTM and the backend servers will be very fast, low-latency, and lossless:

ltm profile tcp /Common/tcp-highspeed-server {
 app-service none
 congestion-control none
 defaults-from /Common/tcp
 idle-timeout 7200
 keep-alive-interval 75
 nagle disabled
 proxy-buffer-high 131072
 proxy-buffer-low 98304
 slow-start disabled
 syn-max-retrans 5
}
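The server-side mapping is mechanical enough to script. The sketch below reads sysctl output and emits the matching profile fields, using only the three correspondences from the CentOS example (tcp_keepalive_time to idle-timeout, tcp_keepalive_intvl to keep-alive-interval, tcp_syn_retries to syn-max-retrans). That mapping is my reading of the example, and render_profile prints a tmsh-style summary for review, not something verified to load as-is.

```python
# Mapping inferred from the CentOS example: Linux sysctl key -> BIG-IP tcp profile field.
SYSCTL_TO_PROFILE = {
    "net.ipv4.tcp_keepalive_time": "idle-timeout",
    "net.ipv4.tcp_keepalive_intvl": "keep-alive-interval",
    "net.ipv4.tcp_syn_retries": "syn-max-retrans",
}

def profile_settings(sysctl_output):
    """Translate `sysctl -a` lines into tcp profile settings using SYSCTL_TO_PROFILE."""
    settings = {}
    for line in sysctl_output.splitlines():
        key, sep, value = line.partition("=")
        key = key.strip()
        if sep and key in SYSCTL_TO_PROFILE:
            settings[SYSCTL_TO_PROFILE[key]] = int(value.strip())
    return settings

def render_profile(name, settings):
    """Render the settings as a tmsh-style profile body, fields sorted by name."""
    lines = [f"ltm profile tcp {name} {{", " defaults-from /Common/tcp"]
    lines += [f" {field} {value}" for field, value in sorted(settings.items())]
    lines.append("}")
    return "\n".join(lines)
```

Unrelated keys such as net.ipv4.tcp_fin_timeout are ignored, so you can pipe the full sysctl -a output straight in.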