Viewing SCC Alarm Status Reports
The appliance tracks key hardware and software metrics and alerts you to any potential problems so you can quickly discover and diagnose issues.
RiOS 7.0 and later improve alarm reporting by using hierarchical alarms. The system groups certain alarms into top-level categories, such as the SSL Settings alarm. When an alarm triggers, its parent expands to provide more information. For example, the System Disk Full top-level alarm aggregates over multiple partitions. If a specific partition is full, the System Disk Full alarm triggers and the Alarm Status report displays more information about which partition caused the alarm to trigger.
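The parent-and-child behavior described above can be pictured with a short sketch. This is an illustration only: the Alarm class and its methods are hypothetical and are not part of RiOS or the SCC. It assumes a parent alarm counts as triggered whenever any of its children has triggered.

```python
from dataclasses import dataclass, field


@dataclass
class Alarm:
    """Hypothetical alarm node; not an SCC or RiOS data structure."""
    name: str
    triggered: bool = False
    children: list["Alarm"] = field(default_factory=list)

    def is_triggered(self) -> bool:
        # Assumption: a parent triggers if it, or any descendant, has triggered.
        return self.triggered or any(c.is_triggered() for c in self.children)

    def triggered_leaves(self) -> list[str]:
        # Names of the leaf alarms that caused this alarm to trigger.
        if not self.children:
            return [self.name] if self.triggered else []
        names = []
        for child in self.children:
            names.extend(child.triggered_leaves())
        return names


disk_full = Alarm("System Disk Full", children=[
    Alarm('Partition "/var" Free Space', triggered=True),
    Alarm('Partition "/boot" Free Space'),
])
print(disk_full.is_triggered())
print(disk_full.triggered_leaves())
```

In this sketch, the top-level alarm reports as triggered and the list of triggered leaves names only the /var partition, which mirrors how the report expands a parent alarm to show the cause.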
The health of the SCC falls into one of the following states:
•  Healthy - The SCC is in a healthy state.
•  Needs Attention - The SCC is in a healthy state and management functions are not affected, but something might need to be looked at. For example, the license might need to be reviewed.
•  Degraded - The SCC has detected an issue.
•  Critical - The SCC has encountered a critical issue that needs to be addressed immediately.
The health of a managed appliance on the SCC falls into one of the following states:
•  Needs Attention - Accompanies a healthy state to indicate management-related issues not affecting the ability of the SteelHead to optimize traffic.
•  Degraded - The SteelHead is optimizing traffic but the system has detected an issue.
•  Admission Control - The SteelHead is optimizing traffic but has reached its connection limit.
•  Critical - The SteelHead might or might not be optimizing traffic; you must address a critical issue.
•  Unsupported - The SteelHead is unsupported.
The Alarm Status report provides the status for the SCC alarms and includes the following alarm information.
Alarm
Reason
CPU Utilization
Displays an alarm when the system has reached the CPU threshold for any of the CPUs in the appliance. If the system has reached the CPU threshold, check your settings.
If your alarm thresholds are correct, reboot the appliance.
If more than 100 MB of data is moved through an appliance while performing PFS synchronization, the CPU utilization can become high and result in a CPU alarm. This CPU alarm is not cause for concern.
Disk Full
Displays an alarm when the system partitions (not the RiOS data store) are full or almost full. For example, RiOS monitors the available space on /var that is used to hold logs, statistics, system dumps, TCP dumps, and so on.
This alarm monitors the following system partitions:
•  Partition “/boot” Free Space
•  Partition “/bootmgr” Free Space
•  Partition “/config” Free Space
•  Partition “/data” Free Space
•  Partition “/scratch” Free Space
•  Partition “/var” Free Space
Hardware
•  Fan Error - Indicates a fan is failing or has failed and needs to be replaced.
•  Flash Error - Indicates an error with the flash drive hardware.
•  IPMI - Indicates an Intelligent Platform Management Interface (IPMI) event. (Not supported on all appliance models.)
This alarm triggers when there has been a physical security intrusion. These events trigger this alarm:
•  Chassis intrusion (physical opening and closing of the appliance case)
•  Memory errors (correctable or uncorrectable ECC memory errors)
•  Hard drive faults or predictive failures
•  Power supply status or predictive failure
By default, this alarm is enabled.
•  Memory Error - Indicates a memory error, for example, when a system memory stick fails.
•  Power Supply - Indicates an inserted power supply cord does not have power, as opposed to a power supply slot with no power supply cord inserted.
Licensing
Displays an alarm when your licenses are not current.
•  Autolicense critical event - This alarm triggers on a SteelHead (virtual edition) appliance when the Riverbed Licensing Portal cannot respond to a license request with valid licenses. The Licensing Portal cannot issue a valid license for one of these reasons:
–   A newer SteelHead (virtual edition) appliance is already using the token, so you cannot use it on the SteelHead (virtual edition) appliance displaying the critical alarm. Every time the SteelHead (virtual edition) appliance attempts to refetch a license token, the alarm retriggers.
–  The token has been redeemed too many times. Every time the SteelHead (virtual edition) appliance attempts to refetch a license token, the alarm retriggers.
•  Autolicense informational event - This alarm triggers if the Riverbed Licensing Portal has information regarding the licenses for a SteelHead (virtual edition) appliance. For example, the SteelHead (virtual edition) appliance displays this alarm when the portal returns licenses that are associated with a token that has been used on a different SteelHead (virtual edition) appliance.
•  Insufficient Appliance Management License(s) - This alarm triggers if there are not enough licenses to manage all connected appliances.
•  Invalid License(s) - This alarm triggers if there is any invalid license.
•  Licenses Expired - This alarm triggers if one or more features have at least one license installed, but all of them are expired.
•  Licenses Expiring - This alarm triggers if the license for one or more features is going to expire within two weeks.
•  License(s) Missing - This alarm triggers if any licenses are missing.
Note: The licenses expiring and licenses expired alarms are triggered per feature. For example: if you install two license keys for a feature, LK1-FOO-xxx (expired) and LK1-FOO-yyy (not expired), the alarms do not trigger, because the feature has one valid license.
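The per-feature rule in the note can be sketched as follows. This is a minimal illustration, not the SCC implementation: the function name and the (feature, expiry date) input shape are hypothetical. It assumes that a license is valid through its expiry date and that a feature is judged by its latest-expiring license.

```python
from datetime import date, timedelta

EXPIRING_WINDOW = timedelta(weeks=2)  # "within two weeks", per the text


def feature_license_alarms(licenses, today):
    """licenses: list of (feature, expiry_date) pairs (hypothetical shape)."""
    by_feature = {}
    for feature, expiry in licenses:
        by_feature.setdefault(feature, []).append(expiry)

    alarms = {}
    for feature, expiries in by_feature.items():
        latest = max(expiries)  # assumption: the best license decides
        if latest < today:
            alarms[feature] = "Licenses Expired"
        elif latest - today <= EXPIRING_WINDOW:
            alarms[feature] = "Licenses Expiring"
    return alarms


today = date(2024, 1, 15)
licenses = [
    ("FOO", date(2023, 12, 31)),  # expired key
    ("FOO", date(2025, 1, 1)),    # valid key for the same feature
    ("BAR", date(2024, 1, 20)),   # expires in five days
    ("BAZ", date(2023, 6, 1)),    # only key, already expired
]
print(feature_license_alarms(licenses, today))
```

Feature FOO raises no alarm because one of its two keys is still valid, which matches the LK1-FOO example in the note.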
Link Duplex
Displays an alarm when an interface was not configured for half-duplex negotiation but has negotiated half-duplex mode. Half duplex significantly limits the optimization service results.
The alarm displays which interface is triggering the duplex alarm.
•  Interface aux Half-Duplex
•  Interface primary Half-Duplex
Link I/O Errors
Displays an alarm when the error rate on an interface has exceeded 0.1 percent while either sending or receiving packets. This threshold is based on the observation that even a small link error rate reduces TCP throughput significantly. A properly configured LAN connection experiences very few errors. The alarm clears when the error rate drops below 0.05 percent.
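The separate trigger and clear thresholds form a hysteresis band, which keeps the alarm from flapping when the error rate hovers near a single limit. The sketch below illustrates the idea under stated assumptions: the function name and counter inputs are hypothetical, a single pair of counters stands in for the send and receive directions, and the state is left unchanged when no packets are seen.

```python
TRIGGER_RATE = 0.001   # 0.1 percent, from the text
CLEAR_RATE = 0.0005    # 0.05 percent, from the text


def update_link_error_alarm(active, errors, packets):
    """Return the new alarm state given the current state and counters."""
    if packets == 0:
        return active  # assumption: no traffic means no new information
    rate = errors / packets
    if not active and rate > TRIGGER_RATE:
        return True    # rate exceeded 0.1 percent: raise the alarm
    if active and rate < CLEAR_RATE:
        return False   # rate dropped below 0.05 percent: clear the alarm
    return active      # inside the band: keep the current state


state = False
history = []
for errors, packets in [(5, 10_000), (20, 10_000), (8, 10_000), (4, 10_000)]:
    state = update_link_error_alarm(state, errors, packets)
    history.append(state)
print(history)
```

In the sample run, the third reading (0.08 percent) is below the trigger threshold but the alarm stays raised, because it has not yet dropped below the clear threshold.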
Link State
Displays an alarm and sends an email notification if an Ethernet link is lost due to an unplugged cable or dead switch port. Depending on which link is down, the system might no longer be optimizing and a network outage could occur.
This condition is often caused by interface transitions on surrounding devices, such as routers or switches. This alarm also accompanies service or system restarts on the appliance.
For AUX/PRIMARY interfaces, the alarm triggers if in-path support is enabled.
By default, this alarm is disabled.
•  Memory Paging - Displays an alarm when the system has reached the memory paging threshold. If 100 pages are swapped approximately every two hours, the SteelHead is functioning properly. If thousands of pages are swapped every few minutes, reboot the SteelHead. If rebooting does not solve the problem, contact Riverbed Support at https://support.riverbed.com.
•  Process Dump Creation - Displays an alarm when the system has detected an error while trying to create a process dump. This alarm indicates an abnormal condition where RiOS cannot collect the core file after three retries. It can be caused when the /var directory that is used to hold system dumps is reaching capacity or other conditions. When this alarm is raised, the directory is blacklisted.
•  SCC Appliance Configuration Backup - Displays an alarm when the daily backup has failed.
•  SCC External Configuration Backup/Restore - Displays an alarm when the external configuration backup has failed. It updates every 30 seconds.
•  SCC External Statistics Backup/Restore - Displays an alarm when the external statistics backup has failed. It updates every 30 seconds.
Secure Vault
Enables an alarm and sends an email notification if the system encounters a problem with the secure vault:
•  Secure Vault Locked - Needs Attention - Indicates that the secure vault is locked. To optimize SSL connections or to use RiOS data store encryption, the secure vault must be unlocked. Choose Appliance > Secure Vault and unlock the secure vault.
•  Secure Vault New Password Recommended - Degraded - Indicates that the secure vault requires a new, nondefault password. Reenter the password.
•  Secure Vault Not Initialized - Critical - Indicates that an error has occurred while initializing the secure vault. When the vault is locked, SSL traffic is not optimized and you cannot encrypt the RiOS data store.
SSL
Enables an alarm if an error is detected in your SSL configuration:
•  Non-443 SSL Servers - Indicates that during a RiOS upgrade (for example, from 5.5 to 6.0), the system has detected a preexisting SSL server certificate configuration on a port other than the default SSL port 443. SSL traffic cannot be optimized. To restore SSL optimization, you can add an in-path rule to the client-side SteelHead to intercept the connection and optimize the SSL traffic on the nondefault SSL server port.
After adding an in-path rule, you must clear this alarm manually by entering the following CLI command:
stats alarm non_443_ssl_servers_detected_on_upgrade clear
•  SSL Certificates Error - Indicates that an SSL peering certificate has failed to reenroll automatically within the Simple Certificate Enrollment Protocol (SCEP) polling interval.
•  SSL Certificates Expiring - Indicates that an SSL certificate is about to expire.
•  SSL Certificates SCEP - Indicates that an SSL certificate has failed to reenroll automatically within the SCEP polling interval.
Temperature
•  Critical Temperature - Enables an alarm and sends an email notification if the CPU temperature exceeds the rising threshold. When the CPU returns to the reset threshold, the critical alarm is cleared. The default value for the rising threshold temperature is 70° C; the default reset threshold temperature is 67° C.
•  Warning Temperature - Enables an alarm and sends an email notification if the CPU temperature approaches the rising threshold. When the CPU returns to the reset threshold, the warning alarm is cleared.
What This Report Tells You
The Alarm Status report answers the following question:
•  What is the current status of the SCC?
About Report Data
The SCC is designed to retain statistics for up to a maximum of 3 years, based on daily statistics for 2,000 appliances monitoring 50-100 TCP ports per SteelHead. Factors that can influence this number include the number of monitored TCP ports, the number of active interfaces on managed appliances, and changes in the types and amounts of data collected in RiOS releases.
The SCC polls data every five minutes. In general, the SCC retains 5-minute granularity data points for a maximum of 30 days. 1-hour granularity data points are stored for a maximum of 90 days. Beyond 90 days, SCC retains 1-day granularity data points for up to 3 years. In case of stats in excess of capacity, the SCC deletes the oldest data from each of the three granularities, while attempting to preserve as much recent data as it can.
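The retention tiers above can be summarized as a lookup from data age to the finest granularity you can expect to find. This is a sketch only: the function name is hypothetical, the boundary days are treated as inclusive, and 3 years is approximated as 3 x 365 days. It also ignores the early deletion that occurs when statistics exceed capacity.

```python
from datetime import timedelta


def finest_granularity(age):
    """Return the finest granularity expected for data of the given age."""
    if age <= timedelta(days=30):
        return "5-minute"
    if age <= timedelta(days=90):
        return "1-hour"
    if age <= timedelta(days=3 * 365):  # assumption: 3 years as 1,095 days
        return "1-day"
    return None  # older than the retention window


for days in (7, 45, 400, 2000):
    print(days, finest_granularity(timedelta(days=days)))
```

For example, a report covering a period 45 days ago would be built from 1-hour data points, because the 5-minute data points for that period are no longer retained.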
Note: Be aware that if the SCC and remote appliances lose connectivity with each other, the bandwidth and connection data during the period of lost connectivity can be skewed. For example, if a remote appliance loses connectivity with the SCC for six hours, data for the missing six hours appears to be 0 in reports for periods of Last Day or Custom intervals smaller than one day. However, when the remote appliance reestablishes connectivity, it sends an aggregate data point for the last day. Thus, reports for periods longer than Last Day do reflect bandwidth and connection data accurately. If you need to analyze data on the remote SteelHead for the missing period, you can view this in the SCC for the individual remote appliance.
To view the SCC Alarm Status report
•  Choose Diagnostics > SCC System: Alarm Status to display the Alarms Status page.
Figure: Alarm Status Report
Related Topics
•  Configuring Alarm Parameters
•  Configuring SNMP Basic Settings