Viewing Alarm Status reports

Control	Description
Backup Integration	Indicates that the backup-integration module has failed.
Block-disk	(Only active on iSCSI/block appliances) - Indicates that the block-disk module has failed.
Failover	Indicates that the Core has failed and the failover peer is in operation.
CPU Utilization	Indicates that the system has reached the CPU threshold for one or more of the CPUs in the Core. If the system has reached the CPU threshold, check your settings. If your alarm thresholds are correct, reboot the Core. Note: If more than 100 MB of data are moved through the Core while performing PFS synchronization, the CPU utilization might become high and result in a CPU alarm. This CPU alarm isn’t cause for concern.
Disk Full	Indicates that one or more of the following partitions on the disk are full: • Partition "/boot" Full • Partition "/bootmgr" Full • Partition "/config" Full • Partition "/data" Full • Partition "/var" Full
Edge Service	Indicates that the Core has lost connection with one of the configured Edges.
Hardware	Indicates that one or more hardware failures have occurred. This alarm setting also enables you to select one or more types of hardware failure (fan error, memory error, and so on), including: • Fan Error - Enables an alarm and sends an email notification if a fan is failing or has failed and needs to be replaced. By default, this alarm is enabled. • Flash Error - Enables an alarm when the system detects an error with the flash drive hardware. By default, this alarm is enabled. • IPMI - Enables an alarm and sends an email notification if an Intelligent Platform Management Interface (IPMI) event is detected. • Other Hardware Error - This alarm indicates that the system has detected a problem with the hardware. The alarm clears when you add the necessary hardware, remove the nonqualified hardware, or resolve other hardware issues. The following issues trigger the hardware error alarm: • The appliance does not have enough disk, memory, CPU cores, or NICs to support the current configuration. • The appliance is using a dual in-line memory module (DIMM), a hard disk, or a NIC that isn’t qualified by Riverbed. • DIMMs are plugged into the appliance but the system can’t recognize them because they are in the wrong slots. You must plug DIMMs into the black slots first and then use the blue slots when all of the black slots are in use. • A DIMM is broken and you must replace it. • Other hardware issues. By default, all Hardware alarms are enabled. • Power Supply - Enables an alarm and sends an email notification if an inserted power supply cord does not have power, as opposed to a power supply slot with no power supply cord inserted. By default, this alarm is enabled. • RAID - Indicates an error with the RAID array (for example, missing drives, pulled drives, drive failures, and drive rebuilds). An audible alarm might also sound. To see if a disk has failed, enter this CLI command from the system prompt: show raid diagram. For drive rebuilds, if a drive is removed and then reinserted, the alarm continues to be triggered until the rebuild is complete. Rebuilding a disk drive can take four to six hours. This alarm applies only to the SteelHead RAID Series 3000, 5000, and 6000.
High Availability	Indicates that the high-availability feature is degraded.
Licensing	Enables an alarm and sends an email notification if the appliance is unlicensed, there is an issue with the autolicense, the licenses have expired or are about to expire, or the model is unlicensed. By default, all Licensing alarms are enabled.
Link Duplex	Indicates that an interface was not configured for half-duplex negotiation but has negotiated half-duplex mode. Half-duplex significantly limits the optimization service results. The alarm displays which interface is triggering the duplex error. Choose Configure > Networking: Data Interfaces and examine the Core link configuration. Next, examine the peer switch user interface to check its link configuration. If the configuration on one side is different from the other, traffic is sent at different rates on each side, causing many collisions. To troubleshoot, change both interfaces to automatic duplex negotiation. If the interfaces don’t support automatic duplex, configure both ends for full duplex. You can enable or disable the alarm for a specific interface. To enable or disable the alarm, choose Settings > System Settings: Alarms and select or clear the check box next to the link alarm.
Link I/O Errors	Indicates that the error rate on an interface has exceeded 0.1 percent while either sending or receiving packets. This threshold is based on the observation that even a small link error rate reduces TCP throughput significantly. A properly configured LAN connection experiences very few errors. The alarm clears when the error rate drops below 0.05 percent. You can change the default alarm thresholds by entering the alarm error-threshold CLI command at the system prompt. For details, see the SteelFusion Command-Line Interface Reference Manual. To troubleshoot, try a new cable and a different switch port. Another possible cause is electromagnetic noise nearby. You can enable or disable the alarm for a specific interface: for example, you can disable the alarm for a link after deciding to tolerate the errors. To enable or disable an alarm, choose Settings > System Settings: Alarms and select or clear the check box next to the link name.
Link State	Indicates that the system has lost one of its Ethernet links due to an unplugged cable or dead switch port. Check the physical connectivity between the appliance and its neighbor device. Investigate this alarm as soon as possible. Depending on which link is down, the system might no longer be optimizing and a network outage could occur. You can enable or disable the alarm for a specific interface. To enable or disable the alarm, choose Settings > System Settings: Alarms and select or clear the check box next to the link name.
Memory Paging	Indicates extended memory paging activity. If 100 pages are swapped every couple of hours, the appliance is functioning properly. If thousands of pages are swapped every few minutes, contact Riverbed Support.
Process Dump Creation Error	Indicates that the system detected an error while trying to create a process dump. This alarm indicates an abnormal condition in which the system can’t collect the core file after three retries. This condition can be caused when the /var directory reaches capacity. When the alarm is raised, the directory is blacklisted.
Secure Vault	Secure Vault Locked - Indicates that the secure vault is locked. To optimize SSL connections or to use RiOS data store encryption, the secure vault must be unlocked. Go to Settings > Security: Secure Vault and unlock the secure vault.
Server Backup	Indicates that one of the following backup failures has occurred: • Proxy connection failure - Indicates that the connection between the Core and the proxy server has failed, or the credentials for ESXi proxy login are incorrect. When the connection is restored, correct credentials are provided, or the proxy configuration is deleted, the alarm is cleared. • Backup failure - Indicates that a backup policy has failed. The message identifies the failing server, backup policy name, and the reason for failure. The reason may be a storage backend failure for a snapshot or clone operation, failure to mount or unmount the exports, or a slow proxy server resulting in a timeout. Once the next protection operation succeeds, the alarm is cleared. • Proxy cleanup timeout - Indicates that proxy cleanup is taking more time than expected. If a backup takes longer than 30 minutes, or if a snapshot remains on a VM after a failed backup, the Core triggers an alarm. To fix this issue, check the state of the ESXi server. For ESXi servers, ensure that the failed backup did not leave a snapshot on the VM. Once the next protection operation succeeds, the alarm is cleared. • Snapshot error - Indicates that proxy-mounted VMs have associated snapshots and can’t be unmounted. • Excluded VMs - Indicates that VMs are excluded from a backup policy.
Snapshot	Indicates that the connection to one or more of the snapshot storage arrays has failed.
SSL	Indicates that the system detected an error in your SSL configuration.
SteelFusion Core configuration status	Indicates that the Core configuration has been reverted to a previous version and all connections to the Edges are lost. Contact Riverbed Support at https://support.riverbed.com.
SteelFusion Core Service	Indicates that the Core service isn’t running.
SteelFusion Protocol Service	Indicates that an NFS protocol error from the backend storage array is preventing an export from being mounted on the Core. By default, this alarm is enabled.
Storage Volume Status	Indicates that the connection to the export has failed or there is an issue with any of the following: • Backend connectivity • No read/write permissions • Space threshold has been reached • Resize failure • An export is deactivated and unavailable. An export is deactivated if the blockstore is critically low on space and this particular export has a high rate of new writes. • Initialization of the blockstore for the export fails, making the export unavailable. Check if the data center export was taken offline on the Core while I/O operations were in progress. Reactivate the export through the Management Console or the CLI to troubleshoot this issue. • A Resize alarm is triggered for an export if its size is changed on the storage array and the Core isn’t able to make the new size available to the branch client. Some reasons why a resize may not be propagated to the branch are: • The size of the export on the storage array is reduced. • The increased size of a pinned export cannot be accommodated in the Edge blockstore. • Data sync is blocked between Edge and Core, potentially due to a protocol-related issue.
Temperature	• Critical Temperature - Indicates that the CPU temperature exceeds the rising threshold. When the CPU returns to the reset threshold, the critical alarm is cleared. The default value for the rising threshold temperature is 70°C; the default reset threshold temperature is 67°C. • Warning Temperature - Indicates that the CPU temperature is approaching the rising threshold. When the CPU returns to the reset threshold, the warning alarm is cleared. After the alarm triggers, it cannot trigger again until after the temperature falls below the reset threshold and then exceeds the rising threshold again.
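
Two alarms in this table, Link I/O Errors and Temperature, use a pair of thresholds: one value that raises the alarm and a lower value that clears it. The Python sketch below only illustrates that behavior and is not Riverbed code. The class name HysteresisAlarm, its fields, and the sample readings are hypothetical, and clearing the alarm when a reading is at or below the reset value (rather than strictly below it) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class HysteresisAlarm:
    """Illustrative two-threshold alarm (hypothetical, not a Riverbed API)."""

    rising: float  # alarm is raised when a reading exceeds this value
    reset: float   # alarm is cleared when a reading is at or below this value (assumed)
    active: bool = False

    def update(self, reading: float) -> bool:
        """Feed one reading and return whether the alarm is active afterwards."""
        if not self.active and reading > self.rising:
            self.active = True
        elif self.active and reading <= self.reset:
            self.active = False
        return self.active


# Temperature defaults from the table: rising 70 °C, reset 67 °C.
temperature = HysteresisAlarm(rising=70.0, reset=67.0)
print([temperature.update(t) for t in [65, 69, 71, 69, 68, 67, 71]])
# [False, False, True, True, True, False, True]

# Link I/O Errors defaults from the table: raise above 0.1 percent, clear at 0.05 percent.
link_errors = HysteresisAlarm(rising=0.1, reset=0.05)
print([link_errors.update(rate) for rate in [0.02, 0.12, 0.08, 0.04]])
# [False, True, True, False]
```

In the temperature run, the readings of 69 and 68 that follow 71 do not clear the alarm, because they are still above the reset threshold. The gap between the two thresholds keeps a value that hovers near the rising threshold from raising and clearing the alarm repeatedly.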