Cacti Template for Cisco Nexus 9K

Tested on Nexus 93180YC-EX running 7.0(3)I7(6), but should work on others.

https://drive.google.com/open?id=16GLnmSXUbnpu7LWP9p7fdS5nxvJAK5OJ

Sample graphs:

Nexus9K_CPU

Nexus9K_Memroy

EEM Script to Generate Show Tech & Auto Reboot a router

While working through my CSR1000v stability woes, I needed to automatically generate a “show tech” and then reboot the router after an IP SLA failure was detected.  It seemed fairly easy, but the show tech never fully completed before the EEM script stopped running, and the reboot command never ran either.

Posting on Reddit paid off, as a user caught the problem: EEM scripts by default can only run for 20 seconds.  Since a “show tech” can take longer than this, the subsequent steps may never be processed.  The solution is to increase the maximum runtime (maxrun) to, say, 60 seconds so the show tech has time to complete:

! Create and run IP SLA monitor to ping default gateway every 5 seconds
ip sla 1
 icmp-echo 10.0.0.1 source-interface GigabitEthernet1
 threshold 50
 timeout 250
 frequency 5
!
ip sla schedule 1 life forever start-time now
!
! Create track object that will mark down after 3 failures
track 1 ip sla 1
 delay down 15 up 30
!
! Create EEM applet to take action when track state goes down
event manager session cli username "ec2-user"
event manager applet GatewayDown authorization bypass
 event track 1 state down maxrun 60
  action 100 cli command "en"
  action 101 cli command "term len 0"
  action 110 syslog priority notifications msg "Interface Gi1 stopped passing traffic. Generating diag info"
  action 300 cli command "delete /force bootflash:sh_tech.txt"
  action 350 cli command "show tech-support | redirect bootflash:sh_tech.txt"
  action 400 syslog priority alerts msg "Show tech completed. Rebooting now!"
  action 450 wait 5
  action 500 reload

Monitoring CPU & Memory in IOS-XE

ios-xe_cpu

One important thing to understand in IOS-XE is that different numbers can be returned when checking CPU and memory statistics.  There are some very down-in-the-weeds docs on this, but the simplest way to break it down is process vs. platform.  Process is essentially control plane, while platform is data plane.

CPU

Processor CPU

CLI command: show processes cpu

SNMP OIDs:

1.3.6.1.4.1.9.2.1.56.0 = 5 second
1.3.6.1.4.1.9.2.1.57.0 = 1 minute
1.3.6.1.4.1.9.2.1.58.0 = 5 minute

Platform CPU

CLI command: show processes cpu platform

SNMP OIDs:

1.3.6.1.4.1.9.9.109.1.1.1.1.3.7 = 5 second
1.3.6.1.4.1.9.9.109.1.1.1.1.4.7 = 1 minute
1.3.6.1.4.1.9.9.109.1.1.1.1.5.7 = 5 minute

Note – Most platforms will be multi-core.
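
The trailing .7 in the platform OIDs above is the row index in the table, and my assumption is that it can differ from one platform to another. A rough Python sketch for pulling the three values per index out of numeric snmpwalk output (e.g. net-snmp with -On) might look like this. The sample text is made up for illustration and the function name is my own, not part of any library:

```python
import re

# Base OID for the platform CPU columns listed above
# (column 3 = 5 second, 4 = 1 minute, 5 = 5 minute).
CPM_CPU_TOTAL_ENTRY = "1.3.6.1.4.1.9.9.109.1.1.1.1"
COLUMNS = {3: "5sec", 4: "1min", 5: "5min"}

# Matches numeric walk output lines such as:
#   .1.3.6.1.4.1.9.9.109.1.1.1.1.3.7 = Gauge32: 5
LINE_RE = re.compile(
    r"^\.?" + re.escape(CPM_CPU_TOTAL_ENTRY) + r"\.(\d+)\.(\d+) = \w+: (\d+)\s*$"
)


def parse_platform_cpu(walk_output):
    """Return {row_index: {"5sec": n, "1min": n, "5min": n}} from walk text."""
    result = {}
    for line in walk_output.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip anything that is not one of the CPU columns
        column, index, value = (int(g) for g in match.groups())
        if column in COLUMNS:
            result.setdefault(index, {})[COLUMNS[column]] = value
    return result


# Hypothetical sample output, not captured from a real device.
SAMPLE = """\
.1.3.6.1.4.1.9.9.109.1.1.1.1.3.7 = Gauge32: 5
.1.3.6.1.4.1.9.9.109.1.1.1.1.4.7 = Gauge32: 4
.1.3.6.1.4.1.9.9.109.1.1.1.1.5.7 = Gauge32: 3
"""

if __name__ == "__main__":
    print(parse_platform_cpu(SAMPLE))
```

Whatever indexes show up as keys are the ones to plug into the OIDs when building a graph for a given device.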

Memory

Processor Memory

CLI command: show processes memory

SNMP OIDs:

1.3.6.1.4.1.9.9.48.1.1.1.5.1 = Memory Used
1.3.6.1.4.1.9.9.48.1.1.1.6.1 = Memory Free

Platform Memory

CLI command: show platform resources

SNMP OIDs:

1.3.6.1.4.1.9.9.109.1.1.1.1.12.7 = Memory Used
1.3.6.1.4.1.9.9.109.1.1.1.1.13.7 = Memory Free
1.3.6.1.4.1.9.9.109.1.1.1.1.27.7 = Memory Committed

Cacti Templates

These were written for Cacti 0.8.8f

https://spaces.hightail.com/space/FoUD1PvlXA

 

Cacti 1.0 to 1.1 upgrade: MySQL TimeZone Database is not populated

Give the cacti user permission to read the internal MySQL table for time zone names:

[j5@linux ~]$ mysql -u root -p mysql
mysql> grant select on mysql.time_zone_name to cactiuser@'%';
Query OK, 0 rows affected (0.00 sec)

mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)

mysql> quit

To populate MySQL with some time zone information (this loads only the small test data set that ships with MySQL):

[j5@linux ~]$ mysql -u root -p mysql < /usr/share/mysql/mysql_test_data_timezone.sql 
Enter password:

Now there’s at least some stuff there:

mysql> select * from time_zone_name;
+--------------------+--------------+
| Name | Time_zone_id |
+--------------------+--------------+
| MET | 1 |
| UTC | 2 |
| Universal | 2 |
| Europe/Moscow | 3 |
| leap/Europe/Moscow | 4 |
| Japan | 5 |
+--------------------+--------------+
6 rows in set (0.00 sec)

 

Cacti: MySQL table is marked as crashed and should be repaired

I had to reboot the Cacti VM several times tonight to fix some NFS mounts, and afterwards noticed graphs weren’t updating and the device list was returning zero rows.  My first thought was the database, and this was confirmed in cacti.log:

2017-09-13 22:00:00 - DBCALL ERROR: SQL Assoc Failed!, Error:145, SQL:"SELECT status, COUNT(*) as cnt FROM `host` GROUP BY status"
2017-09-13 22:00:00 - DBCALL ERROR: SQL Assoc Failed!, Error: Table './cacti/host' is marked as crashed and should be repaired

Also in /var/log/mysqld.log:

170913 22:03:00 [ERROR] /usr/libexec/mysqld: Table './cacti/host' is marked as crashed and should be repaired

This blog pointed me to the easy fix:

mysqlcheck -u cactiuser -p --auto-repair --databases cacti
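
If more than one table might be affected, it can help to see which ones the logs are complaining about before running the repair. A small Python sketch that pulls database and table names out of log lines like the ones above. The regex is based only on the two messages shown in this post, and the function name is my own:

```python
import re

# Matches the message format seen in cacti.log and mysqld.log above, e.g.
#   Table './cacti/host' is marked as crashed and should be repaired
CRASHED_RE = re.compile(r"Table '\./([^/']+)/([^/']+)' is marked as crashed")


def crashed_tables(log_text):
    """Return a sorted list of unique (database, table) pairs reported as crashed."""
    found = set()
    for match in CRASHED_RE.finditer(log_text):
        found.add((match.group(1), match.group(2)))
    return sorted(found)


# Sample built from the log lines quoted in this post.
SAMPLE = (
    "2017-09-13 22:00:00 - DBCALL ERROR: SQL Assoc Failed!, Error: "
    "Table './cacti/host' is marked as crashed and should be repaired\n"
    "170913 22:03:00 [ERROR] /usr/libexec/mysqld: "
    "Table './cacti/host' is marked as crashed and should be repaired\n"
)

if __name__ == "__main__":
    for database, table in crashed_tables(SAMPLE):
        print(database, table)
```

Feeding it the contents of both log files gives a de-duplicated list, which is handy for checking afterwards that mysqlcheck actually covered everything.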