Monitoring CPU & Memory in IOS-XE


One important thing to understand in IOS-XE is the different numbers that can be returned when checking CPU and memory statistics.  There are some very down-in-the-weeds docs on this, but the simplest way to break it down is process vs. platform: process statistics essentially cover the control plane, while platform statistics cover the data plane.

CPU

Processor CPU

CLI command: show processes cpu

SNMP OIDs:

1.3.6.1.4.1.9.2.1.56.0 = 5 second
1.3.6.1.4.1.9.2.1.57.0 = 1 minute
1.3.6.1.4.1.9.2.1.58.0 = 5 minute
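
As a quick sanity check, these can be polled from a Linux host with the standard Net-SNMP tools (hostname and community string below are placeholders):

$ snmpget -v2c -c public router.example.com 1.3.6.1.4.1.9.2.1.56.0 \
    1.3.6.1.4.1.9.2.1.57.0 1.3.6.1.4.1.9.2.1.58.0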

Platform CPU

CLI command: show processes cpu platform

SNMP OIDs:

1.3.6.1.4.1.9.9.109.1.1.1.1.3.7 = 5 second
1.3.6.1.4.1.9.9.109.1.1.1.1.4.7 = 1 minute
1.3.6.1.4.1.9.9.109.1.1.1.1.5.7 = 5 minute

Note – Most platforms will be multi-core.
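
Because of that, the trailing .7 in the OIDs above is just one table index; walking the column shows every index your platform exposes so you can pick the right one to graph (placeholder host and community string again):

$ snmpwalk -v2c -c public router.example.com 1.3.6.1.4.1.9.9.109.1.1.1.1.5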

Memory

Processor Memory

CLI command: show processes memory

SNMP OIDs:

1.3.6.1.4.1.9.9.48.1.1.1.5.1 = Memory Used
1.3.6.1.4.1.9.9.48.1.1.1.6.1 = Memory Free
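
If you want a utilization percentage rather than raw byte counts, it can be derived as used / (used + free).  Here's a rough sketch using Net-SNMP and bc (placeholder host and community string; -Oqv prints just the bare values):

$ USED=$(snmpget -v2c -c public -Oqv router.example.com 1.3.6.1.4.1.9.9.48.1.1.1.5.1)
$ FREE=$(snmpget -v2c -c public -Oqv router.example.com 1.3.6.1.4.1.9.9.48.1.1.1.6.1)
$ echo "scale=1; 100 * $USED / ($USED + $FREE)" | bc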

Platform Memory

CLI command: show platform resources

SNMP OIDs:

1.3.6.1.4.1.9.9.109.1.1.1.1.12.7 = Memory Used
1.3.6.1.4.1.9.9.109.1.1.1.1.13.7 = Memory Free
1.3.6.1.4.1.9.9.109.1.1.1.1.27.7 = Memory Committed

Cacti Templates

These were written for Cacti 0.8.8f

https://spaces.hightail.com/space/FoUD1PvlXA

 

Improving DNS performance for recursive/cache-only queries to the Internet

BIND servers will typically ship with a factory-default hint zone like this:

zone "." {
 type hint;
 file "db.root";
};

You’ll see this db.root file contains a static list of the 13 root servers.  It gets the job done, but any lookup that isn’t already cached has to start at the root servers, which isn’t ideal.

[image: dns_to_root_servers]

A better solution: slave the root zone itself, transferring the full set of TLD delegations from the root servers:

zone "." {
 type slave;
 masters {
  198.41.0.4;
  192.228.79.201;
  192.33.4.12;
  199.7.91.13;
 };
 file "root.cache";
};

This file is roughly 2 MB and will take a few seconds to transfer, but helps deliver much more consistent lookup times since it hits the TLD servers directly without first bouncing off the root servers.  Note the significantly lower standard deviation below:

[image: dns_to_tld_servers]

As an added bonus, it will be resilient should the root servers ever come under DDoS.
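
A quick way to confirm the root zone actually transferred is to query the local server for the root NS and SOA records with recursion disabled; an authoritative answer means it’s being served from the slaved copy rather than the hints:

$ dig @127.0.0.1 . NS +norecurse
$ dig @127.0.0.1 . SOA +norecurse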

 

 

Cacti 1.0 to 1.1 upgrade: MySQL TimeZone Database is not populated

Give the cacti user permission to read the internal MySQL table for time zone names:

[j5@linux ~]$ mysql -u root -p mysql
mysql> grant select on mysql.time_zone_name to cactiuser@'%';
Query OK, 0 rows affected (0.00 sec)

mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)

mysql> quit

To populate MySQL with some time zone information:

[j5@linux ~]$ mysql -u root -p mysql < /usr/share/mysql/mysql_test_data_timezone.sql 
Enter password:

Now there’s at least some stuff there:

mysql> select * from time_zone_name;
+--------------------+--------------+
| Name               | Time_zone_id |
+--------------------+--------------+
| MET                |            1 |
| UTC                |            2 |
| Universal          |            2 |
| Europe/Moscow      |            3 |
| leap/Europe/Moscow |            4 |
| Japan              |            5 |
+--------------------+--------------+
6 rows in set (0.00 sec)
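
If you’d rather load the full set of system time zones instead of the test data, MySQL ships a converter that reads the OS zoneinfo directory (path may vary by distro):

[j5@linux ~]$ mysql_tzinfo_to_sql /usr/share/zoneinfo | mysql -u root -p mysql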

 

Setting admin password for Palo Alto VM in AWS

Like the virtual F5, you’ll initially need to SSH to the virtual appliance and change the admin password via the CLI:

$ ssh -i ~/.ssh/mykey.pem admin@10.10.10.89

admin@PA-VM> configure
Entering configuration mode
[edit] 
admin@PA-VM# set mgt-config users admin password
Enter password : 
Confirm password :

[edit] 
admin@PA-VM# commit

Commit job 2 is in progress. Use Ctrl+C to return to command prompt

..99%........100%
Configuration committed successfully

Then go back to the web GUI, log in as admin, and go from there.

Cacti: MySQL table is marked as crashed and should be repaired

Had to do several reboots of the Cacti VM tonight for some NFS mount fixes, and noticed graphs weren’t updating and the device list was returning zero rows.  My immediate thought was the database, and this was confirmed in cacti.log:

2017-09-13 22:00:00 - DBCALL ERROR: SQL Assoc Failed!, Error:145, SQL:"SELECT status, COUNT(*) as cnt FROM `host` GROUP BY status"
2017-09-13 22:00:00 - DBCALL ERROR: SQL Assoc Failed!, Error: Table './cacti/host' is marked as crashed and should be repaired

Also in /var/log/mysqld.log:

170913 22:03:00 [ERROR] /usr/libexec/mysqld: Table './cacti/host' is marked as crashed and should be repaired

This blog pointed me to the easy fix:

mysqlcheck -u cactiuser -p --auto-repair --databases cacti
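
If only one table is affected, an equivalent targeted fix from inside the mysql client is a plain REPAIR TABLE (table name taken from the error above):

mysql> use cacti;
mysql> repair table host;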

 

Cisco Serial Console w/ VRF

In this example an HWIC-16A is installed in slot 0/0 of a 2921 ISR G2 router.  The management interface is placed in a VRF called “MGMT”.  Hostnames for the connected devices are set with “ip host” lines that specify the VRF, the TCP port number (port 0 = TCP port 2003), and the local router’s IP address.

hostname isr2921
interface Port-channel1.10
 encapsulation dot1Q 10
 ip vrf forwarding MGMT
 ip address 10.10.10.10 255.255.255.0
!
ip host vrf MGMT router1 2003 10.10.10.10
ip host vrf MGMT router2 2004 10.10.10.10
!
line 0/0/0 0/0/15
 session-timeout 30 
 exec-timeout 30 0
 transport input telnet ssh
!

To connect, specify the VRF name as a parameter:

isr2921#telnet router1 /vrf MGMT
Translating "router1"
Trying router1 (10.10.10.10, 2003)...
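
From a management host that can already reach the MGMT VRF subnet, the same console line can also be reached by reverse telnet straight to the mapped TCP port, skipping the name lookup entirely:

$ telnet 10.10.10.10 2003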

01150b21:3: RCODE returned from query: ‘SERVFAIL’.

Came across an interesting problem after our F5 BigIP-VEs fell victim to a storage failure in VMware.  Certain zones couldn’t be modified or, in some cases, even viewed in ZoneRunner.  Since F5 doesn’t officially support its BIND backend, I knew I was likely on my own for a fix and began poking around /var/named/config/namedb, where the zone files are stored.

[admin@f5bigip01:Active:In Sync] ~ # cd /var/named/config/namedb/
[admin@f5bigip01:Active:In Sync] namedb # ls -ls db.internal.32.30.10.in-addr.arpa.*
 4 -rw-r--r--. 1 named named 977 2017-08-21 12:53 db.internal.32.30.10.in-addr.arpa.
 4 -rw-r--r--. 1 named named 861 2017-08-19 12:06 db.internal.32.30.10.in-addr.arpa.~
12 -rw-r--r--. 1 named named 11302 2017-08-19 11:55 db.internal.32.30.10.in-addr.arpa..jnl

Took a guess that it’s the .jnl file that’s the problem.  So I decided to halt BIND, delete the file, and try again…

[admin@f5bigip01:Active:In Sync] ~ # bigstart stop zrd
[admin@f5bigip01:Active:In Sync] ~ # rm -f /var/named/config/namedb/*..jnl
[admin@f5bigip01:zrd DOWN:In Sync] ~ # bigstart start zrd

Went back to ZoneRunner and was able to view and edit the zone just fine.
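
If a zone still misbehaves after clearing the journal, the zone file itself can be sanity-checked with named-checkzone (the zone name here is a guess based on the file name, so adjust it to match your config):

[admin@f5bigip01:Active:In Sync] namedb # named-checkzone 32.30.10.in-addr.arpa db.internal.32.30.10.in-addr.arpa.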

F5 Bigip-VE tips for AWS deployment

Launch and initial configuration

The official instructions are slightly incorrect.  You’ll want to SSH as ‘admin’ (not root or ec2-user):

$ ssh -i mykey.pem admin@10.10.10.111

Then use these TMOS commands to set and save a password for the admin user:

(tmos)# modify auth user admin prompt-for-password
(tmos)# save sys config

Log in to the GUI as admin with the new password to do licensing and initial configuration.

Interfaces, Self IPs, and VLANs

While F5 guides list a variety of interface configurations, my advice is to use three:

  1. eth0: mgmt – Used for SSH, HTTPS, and SNMP polling access
  2. eth1: interface 1.1: vlan “external” in a public subnet – For talking to Internet
  3. eth2: interface 1.2: vlan “internal” in a private subnet – For talking to internal resources and HA

Routing

The default route should of course be via the external interface’s gateway.  Any private IP address space (10.0.0.0/8, etc.) can be routed via the internal interface’s gateway.

If doing an HA pair across multiple availability zones, items with unique IP addresses such as routes, virtual servers, and perhaps pools/nodes will need to go in a separate non-synchronized partition.

  1. Go to System -> Users -> Partition List
  2. Create a new partition with a good name (i.e. “LOCAL_ONLY”)
  3. Uncheck the Device Group and set the Traffic Group to “traffic-group-local-only”
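
As a rough sketch of what that looks like in tmsh (gateway addresses here are placeholders), routes created while sitting in the LOCAL_ONLY partition stay out of the sync set:

(tmos)# cd /LOCAL_ONLY
(tmos)# create net route default gw 10.0.1.1
(tmos)# create net route rfc1918-space network 10.0.0.0/8 gw 10.0.2.1
(tmos)# save sys config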

 

LACP with Palo Alto Firewalls

Today’s task was to get LACP working on a Palo Alto so traffic and fault tolerance could be spread across multiple members of a Cisco 3750X switch stack.  The default settings on the Palo Alto surprised me a bit, as I was expecting it to default to active mode with fast timers enabled, but this was easy to set:

[image: paloalto_lacp_fast.png]

Unfortunately, during testing it still took a good minute for failover to work.  This is because the standby unit disables its interfaces until going active, so there’s a delay of 30-40 seconds for LACP bundling plus an additional 25-50 seconds for Spanning-Tree.  Working around Spanning-Tree was easy: just use Edge Port, aka PortFast.  Note that it should be enabled at the channel level, and ‘trunk’ must be added for it to work on trunk ports:

interface Port-channel4
 description Palo Alto Firewall - LACP
 switchport trunk encapsulation dot1q
 switchport mode trunk
 logging event trunk-status
 logging event bundle-status
 spanning-tree portfast trunk
!

Speeding up LACP took a bit more research.  Apparently only data-center-grade Cisco switches like the Catalyst 6500 and Nexus lines support LACP 1-second fast timers out of the box.  The Catalyst 3750, however, will support fast timers on the bleeding-edge 15.2(4)E train.
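
Once on that train, the fast rate is requested per member interface rather than on the port-channel itself; a minimal sketch using the member links seen in the logs below:

interface GigabitEthernet3/1/1
 lacp rate fast
!
interface GigabitEthernet4/1/1
 lacp rate fast
!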

Upon testing, the failover downtime due to LACP bundling is now under 10 seconds:

Jul 20 17:58:22 PST: %EC-5-UNBUNDLE: Interface Gi4/1/1 left the port-channel Po31
Jul 20 17:58:22 PST: %EC-5-UNBUNDLE: Interface Gi3/1/1 left the port-channel Po31
Jul 20 17:58:30 PST: %EC-5-BUNDLE: Interface Gi3/1/2 joined port-channel Po32
Jul 20 17:58:32 PST: %EC-5-BUNDLE: Interface Gi4/1/2 joined port-channel Po32