Viewing Alarm Status reports
The Reports > Diagnostics: Alarm Status report provides status for the Mobile Controller alarms.
The Mobile Controller tracks key hardware and software metrics and alerts you to any potential problems so that you can quickly discover and diagnose issues.
Mobile Controller 4.0 and later feature alarm reporting using hierarchical alarms. The system groups certain alarms into top-level categories, such as the SSL Settings alarm. When an alarm triggers, its parent expands to provide more information: for example, the System Disk Full top-level alarm aggregates over multiple partitions. If a specific partition is full, the System Disk Full alarm triggers and the Alarm Status report displays more information regarding which partition caused the alarm to trigger.
The alarm status falls into one of these states:
OK - Signifies that no problems have been found.
Needs Attention - Accompanies a healthy state to indicate management-related issues not affecting the ability of the Mobile Controller to optimize traffic.
Degraded - Indicates that the Mobile Controller is optimizing traffic, but the system has detected an issue.
Admission Control - Indicates that the Mobile Controller is optimizing traffic but has reached its connection limit.
Critical - Indicates that the Mobile Controller might or might not be optimizing traffic; you must address a critical issue.
Suppressed - Appears after a child alarm when its parent alarm is disabled on the Administration > System Settings: Alarms page.
Disabled - Appears when a child alarm is disabled even though its parent alarm is enabled.
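The relationship between parent and child alarms and the Suppressed and Disabled states can be sketched in a few lines of Python. This is an illustrative model only, not the Mobile Controller's implementation: the Alarm class, the displayed_status function, the severity ordering, and the /config partition name are all assumptions introduced for the example.

```python
from dataclasses import dataclass, field
from typing import List

# Assumed severity order, least to most severe. The report text does not
# define an ordering, so this ranking is illustrative only.
SEVERITY = ["OK", "Needs Attention", "Degraded", "Admission Control", "Critical"]


@dataclass
class Alarm:
    name: str
    enabled: bool = True
    state: str = "OK"  # state reported by the alarm's own check
    children: List["Alarm"] = field(default_factory=list)


def displayed_status(alarm: Alarm, parent_enabled: bool = True) -> str:
    """Return the status a report row might show for this alarm."""
    if not parent_enabled:
        return "Suppressed"  # the parent alarm is disabled
    if not alarm.enabled:
        return "Disabled"  # the alarm itself is disabled, its parent is enabled
    # An enabled alarm shows the worst state among itself and its enabled children.
    states = [alarm.state] + [
        displayed_status(child) for child in alarm.children if child.enabled
    ]
    return max(states, key=SEVERITY.index)


# Hypothetical hierarchy: a top-level alarm aggregating two partitions.
disk = Alarm("System Disk Full", children=[
    Alarm("/var", state="Degraded"),
    Alarm("/config", enabled=False, state="Critical"),
])

print(displayed_status(disk))  # Degraded (the disabled child is ignored)
for child in disk.children:
    print(child.name, displayed_status(child, parent_enabled=disk.enabled))
```

In this sketch, disabling the parent (disk.enabled = False) makes every child row report Suppressed regardless of the child's own setting, which mirrors the Suppressed description above.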
The Alarm Status report includes this alarm information.
Alarm
Mobile Controller state
Reason
Configuration
 
Indicates whether a configuration error was detected.
CPU Utilization
Degraded
Indicates that the system has reached the CPU threshold for any of the CPUs in the Mobile Controller. If the system has reached the CPU threshold, check your settings. For details, see Configuring alarm settings.
If your alarm thresholds are correct, reboot the Mobile Controller. For details, see Rebooting and shutting down the Mobile Controller.
If more than 100 MB of data is moved through a Mobile Controller while performing PFS synchronization, the CPU utilization might become high and result in a CPU alarm. This CPU alarm is not cause for concern.
Disk Full
 
Indicates that the system partitions (not the SteelHead Mobile RiOS data store) are full or almost full.
Endpoint Datastore
 
Indicates whether the number of endpoint clients with data store errors has reached the rising threshold. By default, this alarm is enabled.
Endpoint Filesystem Full
 
Indicates whether the number of endpoint clients with File System Full errors has reached the rising threshold. By default, this alarm is enabled.
Endpoint Firewall
 
Indicates whether the number of endpoints with firewall status has reached the rising threshold. By default, this alarm is enabled.
Endpoint Gen Id Error
 
Indicates whether an Endpoint Gen Id error was detected. By default, this alarm is enabled.
Endpoint NFS
 
Indicates whether there has been an NFS error. By default, this alarm is enabled.
Endpoint Service
 
Indicates whether the number of endpoint clients with service errors has reached the rising threshold. By default, this alarm is enabled.
Endpoint SSL Error
 
Indicates whether there has been an SSL error. By default, this alarm is enabled.
Endpoint Version
 
Indicates whether there is a mismatch between software versions in your network. If a software mismatch is detected, resolve the mismatch by upgrading or reverting to a previous version of the software. By default, this alarm is enabled.
Endpoint License
 
Indicates whether the number of connected endpoint licenses (including desktop licenses) has exceeded the licensed limit. For details about updating licenses, see Managing licenses.
Hardware
Either Critical or Degraded, depending on the state
Fan Error - Indicates that a fan is failing or has failed and must be replaced.
Flash Error - Indicates an error with the flash drive hardware. At times, the USB flash drive that holds the system images might become unresponsive; the Mobile Controller continues to function normally. When this error occurs, you can’t perform a software upgrade, as the Mobile Controller is unable to write a new upgrade image to the flash drive without first power-cycling the system.
To reboot the appliance, go to the Administration > Maintenance: Reboot/Shut Down page or enter the CLI reload command to automatically power-cycle the Mobile Controller and restore the flash drive to its proper state.
IPMI - Indicates an Intelligent Platform Management Interface (IPMI) event. (Not supported on all appliance models.)
This alarm triggers when there has been a physical security intrusion. These events trigger this alarm:
chassis intrusion (physical opening and closing of the appliance case)
memory errors (correctable or uncorrectable ECC memory errors)
hard drive faults or predictive failures
power cycle, such as turning the power switch on or off, physically unplugging and replugging the cable, or issuing a power cycle from the power switch controller
By default, this alarm is enabled.
Memory Error - Indicates a memory error: for example, when a system memory stick fails.
Power Supply - Indicates that an inserted power supply cord does not have power, as opposed to a power supply slot with no power supply cord inserted.
RAID - Indicates that the system has encountered RAID errors (for example, missing drives, pulled drives, drive failures, and drive rebuilds). Provides status information for individual drives on the system.
RAID Disk 0 Status
RAID Disk 1 Status
For drive rebuilds, if a drive is removed and then reinserted, the alarm continues to be triggered until the rebuild is complete.
Licensing
Critical
Indicates whether a license on the Mobile Controller is removed, is about to expire, has expired, or is invalid. This alarm triggers if the Mobile Controller has no license installed for its currently configured model.
Autolicense Critical Event - This alarm triggers when the Riverbed Licensing Portal can’t respond to a license request with valid licenses.
Autolicense Informational Event - This alarm triggers if the Riverbed Licensing Portal has information regarding the licenses for a Mobile Controller appliance. For example, this alarm displays when the portal provides a license that is associated with a token previously used on a different Mobile Controller appliance.
Licenses Expired - This alarm triggers if one or more features have at least one license installed, but all of them are expired.
Licenses Expiring - This alarm triggers if the license for one or more features is going to expire within two weeks.
Licensing - This alarm triggers if the Mobile Controller has no BASE or MSPEC license installed for its currently configured model.
The licenses expiring and licenses expired alarms are triggered per feature. For example, if you install two license keys for a feature, LK1-FOO-xxx (expired) and LK1-FOO-yyy (not expired), the alarms don’t trigger, because the feature has one valid license.
Link Duplex
 
Enables an alarm and sends an email notification when an interface was not configured for half-duplex negotiation but has negotiated half-duplex mode.
The alarm displays which interface is triggering the duplex alarm.
Interface aux Half-Duplex
Interface primary Half-Duplex
Link I/O Errors
 
Enables an alarm and sends an email notification when the error rate on an interface exceeds 0.1 percent while either sending or receiving packets. This threshold is based on the observation that even a small link error rate reduces TCP throughput significantly. A properly configured LAN connection experiences very few errors. The alarm clears when the rate drops below 0.05 percent.
The alarm displays the interface with the link error.
Interface aux Link Error
Interface primary Link Error
Link State
Degraded
Indicates that the system has detected a link that is inoperable. You are notified through SNMP traps, email, and alarm status.
Interface aux Down - This alarm triggers if an Ethernet link is inoperable on the aux interface.
Interface primary Down - This alarm triggers if an Ethernet link is inoperable on the primary interface.
By default, this alarm is disabled.
Memory Paging
Degraded
Indicates that the system has reached the memory paging threshold. If 100 pages are swapped approximately every two hours, the Mobile Controller is functioning properly. If thousands of pages are swapped every few minutes, reboot the Mobile Controller. For details, see Rebooting and shutting down the Mobile Controller.
If rebooting does not solve the problem, contact Riverbed Support.
Process Dump Creation Error
Degraded
Indicates that the system has detected an error while trying to create a process dump. This alarm indicates an abnormal condition in which RiOS can’t collect the core file after three retries. Possible causes include the /var directory, which holds system dumps, reaching capacity. When this alarm is raised, the directory is blacklisted.
Contact Riverbed Support to correct the issue.
Secure Vault
Degraded
Indicates a problem with the secure vault.
Secure Vault Locked - Needs Attention - Indicates that the secure vault is locked. To optimize SSL connections or to use RiOS data store encryption, the secure vault must be unlocked. Choose Administration > Security: Secure Vault and unlock the secure vault. For details, see Unlocking the secure vault.
SSL
 
Indicates that an error has been detected in your SSL configuration.
SSL Certificates - Indicates that an SSL peering certificate has failed to reenroll automatically within the Simple Certificate Enrollment Protocol (SCEP) polling interval.
SSL Signing Certificate Validity - Indicates that an SSL peering certificate has failed to reenroll automatically within the Simple Certificate Enrollment Protocol (SCEP) polling interval.
Temperature
Critical or Warning
Indicates that the CPU temperature has exceeded or is approaching the critical threshold:
Critical Temperature - Indicates that the CPU temperature has exceeded the critical threshold. The default value for the rising threshold temperature is 70°C; the default reset threshold temperature is 67°C.
Warning Temperature - Indicates that the CPU temperature is about to exceed the critical threshold.
Underprovisioned VM
 
Memory, data storage, or CPU resources are insufficient for the maximum number of endpoints.
Does not apply to the Mobile Controller.
Valid Platform
 
Indicates that the hardware platform does not support the Mobile Controller-v.
By default, this alarm is enabled.
Valid VM
 
Indicates that the virtual machine is unavailable.
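Two alarms in the table above, Link I/O Errors and Temperature, each describe a pair of thresholds: one at which the alarm triggers and a lower one at which it clears. The following Python sketch shows how such a two-threshold (hysteresis) alarm behaves, using the Link I/O Errors values of 0.1 percent and 0.05 percent. The ThresholdAlarm class and its update method are hypothetical names, the sample values are invented, and the use of strict comparisons at the exact threshold values is an assumption.

```python
class ThresholdAlarm:
    """Alarm with separate rising and reset thresholds (hysteresis)."""

    def __init__(self, rising: float, reset: float) -> None:
        if reset > rising:
            raise ValueError("reset threshold must not exceed rising threshold")
        self.rising = rising
        self.reset = reset
        self.triggered = False

    def update(self, value: float) -> bool:
        """Feed one measurement and return whether the alarm is active."""
        if not self.triggered and value > self.rising:
            self.triggered = True  # value exceeded the rising threshold
        elif self.triggered and value < self.reset:
            self.triggered = False  # value dropped below the reset threshold
        return self.triggered


# Link I/O Errors: triggers above 0.1 percent, clears below 0.05 percent.
link_errors = ThresholdAlarm(rising=0.1, reset=0.05)
samples = [0.02, 0.12, 0.08, 0.06, 0.04]  # invented error rates, in percent
print([link_errors.update(s) for s in samples])
# [False, True, True, True, False]
```

The gap between the two thresholds keeps the alarm from toggling repeatedly when a measurement hovers near a single limit. The same model would apply to the default temperature thresholds, as in ThresholdAlarm(rising=70, reset=67).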
What this report tells you
The Alarm Status report answers this question: What is the current status of the Mobile Controller?
To view the Alarm Status report
Choose Reports > Diagnostics: Alarm Status to display the Alarm Status page. Alternatively, you can select the current system status that appears in the status box in the upper-right corner of each screen (Healthy, Admission Control, Degraded, or Critical) to display the Alarm Status page.