Configuring alarm settings

RiOS uses hierarchical alarms that group certain alarms into top-level categories, such as the SSL Settings alarm. When an alarm triggers, its parent expands to provide more information. As an example, the System Disk Full top-level parent alarm aggregates over multiple partitions. If a specific partition is full, the System Disk Full parent alarm triggers and the Alarm Status report displays more information regarding which partition caused the alarm to trigger.

Disabling a parent alarm disables its children. You can enable a parent alarm and disable any of its child alarms. You can’t enable a child alarm without first enabling its parent.

The child alarms of a disabled parent appear on the Alarm Status report with a suppressed status. Disabled child alarms of an enabled parent appear on the Alarm Status report with a disabled status. For more details about alarm status, see Viewing Alarm Status reports.
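The parent and child rules above can be illustrated with a short sketch. The `Alarm` class and the `report_status` function are hypothetical names used only for illustration, and the "enabled" label is a placeholder for whatever status an active alarm shows; none of this is a RiOS API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Alarm:
    """Hypothetical model of a hierarchical alarm; not a RiOS API."""
    name: str
    enabled: bool = True
    children: list[Alarm] = field(default_factory=list)


def report_status(parent: Alarm) -> dict[str, str]:
    """Return the status each child alarm would show, per the rules in the text."""
    statuses = {}
    for child in parent.children:
        if not parent.enabled:
            # A disabled parent suppresses every child, whatever the child's own setting.
            statuses[child.name] = "suppressed"
        elif not child.enabled:
            statuses[child.name] = "disabled"
        else:
            statuses[child.name] = "enabled"  # placeholder label (assumption)
    return statuses


hardware = Alarm("Hardware", children=[Alarm("Fan Error"), Alarm("Flash Error", enabled=False)])
print(report_status(hardware))
hardware.enabled = False
print(report_status(hardware))
```

With the parent enabled, only the explicitly disabled child reports as disabled. After the parent is disabled, both children report as suppressed.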

Enables an alarm and sends an email notification if the SteelHead enters admission control. When this occurs, the SteelHead optimizes traffic beyond its rated capability and is unable to handle the amount of traffic passing through the WAN link. During this event, the SteelHead continues to optimize existing connections, but new connections are passed through without optimization.

• Connection Limit—Indicates the system connection limit has been reached. Additional connections are passed through unoptimized. The alarm clears when the SteelHead moves out of this condition.

• CPU—The appliance has entered admission control due to high CPU use. During this event, the appliance continues to optimize existing connections, but new connections are passed through without optimization. The alarm automatically clears when the CPU usage has decreased.

• MAPI—The total number of MAPI optimized connections has exceeded the maximum admission control threshold. By default, the maximum admission control threshold is 85 percent of the total maximum optimized connection count for the client-side SteelHead. The SteelHead reserves the remaining 15 percent so that the MAPI admission control doesn’t affect the other protocols. The 85 percent threshold is applied only to MAPI connections. RiOS is now passing through MAPI connections from new clients but continues to intercept and optimize MAPI connections from existing clients (including new MAPI connections from these clients). RiOS continues optimizing non-MAPI connections from all clients. The alarm automatically clears when the MAPI traffic has decreased; however, it can take one minute for the alarm to clear.

RiOS preemptively closes MAPI sessions to reduce the connection count in an attempt to bring the SteelHead out of admission control by bringing the connection count below the 85 percent threshold. RiOS closes the MAPI sessions in this order:

• Memory—The appliance has entered admission control due to memory consumption. The appliance is optimizing traffic beyond its rated capability and is unable to handle the amount of traffic passing through the WAN link. During this event, the appliance continues to optimize existing connections, but new connections are passed through without optimization. No other action is necessary; the alarm automatically clears when the traffic has decreased.

• TCP—The appliance has entered admission control due to high TCP memory use. During this event, the appliance continues to optimize existing connections, but new connections are passed through without optimization. The alarm automatically clears when the TCP memory pressure has decreased.
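The MAPI item above describes a default 85 percent threshold and different handling for new and existing clients. The following is a minimal sketch of that decision, assuming a simple count comparison. The function names and the `known_clients` set are hypothetical, and the actual RiOS logic is not documented here.

```python
MAPI_THRESHOLD_PERCENT = 85  # default stated in the text


def mapi_threshold(max_optimized_connections: int) -> int:
    """MAPI connection count above which MAPI admission control applies."""
    # Integer arithmetic avoids floating-point rounding surprises.
    return max_optimized_connections * MAPI_THRESHOLD_PERCENT // 100


def handle_new_mapi_connection(client, known_clients, mapi_count, max_optimized):
    """Decide how a new MAPI connection is treated (illustrative only)."""
    in_admission_control = mapi_count > mapi_threshold(max_optimized)
    if in_admission_control and client not in known_clients:
        # New clients are passed through while the alarm is active.
        return "pass-through"
    # Existing clients, including their new connections, are still optimized.
    return "optimize"


print(mapi_threshold(2000))
print(handle_new_mapi_connection("10.0.0.9", {"10.0.0.5"}, 1701, 2000))
print(handle_new_mapi_connection("10.0.0.5", {"10.0.0.5"}, 1701, 2000))
```

For a hypothetical appliance rated at 2000 optimized connections, the threshold works out to 1700 MAPI connections, leaving 300 connections for other protocols.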

Enables an alarm if the system detects a problem with a connection-forwarding neighbor. The connection-forwarding alarms are inclusive of all connection-forwarding neighbors. For example, if a SteelHead has three neighbors, the alarm triggers if any one of the neighbors is in error. In the same way, the alarm clears only when all three neighbors are no longer in error.

• Cluster Neighbor Incompatible—Enables an alarm and sends an email notification if a connection-forwarding neighbor in a SteelHead Interceptor cluster has path selection enabled while path selection isn’t enabled on another appliance in the cluster.

This alarm is also raised when a connection-forwarding neighbor is running a RiOS version that is incompatible with IPv6, or if the IP address configuration between neighbors doesn’t match. Neighbors must be running RiOS 8.5 or later.

• Multiple Interface—Enables an alarm and sends an email notification if the connection to an appliance in a connection forwarding cluster is lost or is disconnected due to a configuration incompatibility.

• Single Interface—Enables an alarm and sends an email notification if the connection to a SteelHead connection-forwarding neighbor is lost.

Enables an alarm and sends an email notification if the average and peak threshold for the CPU utilization is exceeded. When an alarm reaches the rising threshold, it is activated; when it reaches the lowest or reset threshold, it is reset. After an alarm is triggered, it isn’t triggered again until it has fallen below the reset threshold.

• Rising Threshold—Specify the rising threshold. When an alarm reaches the rising threshold, it is activated. The default value is 90 percent.

• Reset Threshold—Specify the reset threshold. When an alarm reaches the lowest or reset threshold, it is reset. After an alarm is triggered, it isn’t triggered again until it has fallen below the reset threshold. The default value is 70 percent.
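The rising and reset thresholds form a hysteresis band: once triggered, the alarm stays active until the value drops back to the reset level. A minimal sketch of that behavior follows, using the default values from the text. The `ThresholdAlarm` class is a hypothetical illustration, not the RiOS implementation.

```python
class ThresholdAlarm:
    """Illustrative rising/reset threshold alarm (hypothetical, not RiOS code)."""

    def __init__(self, rising=90.0, reset=70.0):
        self.rising = rising
        self.reset = reset
        self.active = False

    def update(self, value):
        """Feed one sample; return True only when the alarm newly triggers."""
        if not self.active and value >= self.rising:
            self.active = True
            return True
        # The text says both "reaches" and "fallen below" the reset threshold;
        # this sketch assumes the alarm resets at or below it.
        if self.active and value <= self.reset:
            self.active = False
        return False


alarm = ThresholdAlarm()
samples = [50, 92, 95, 80, 91, 69, 93]
print([alarm.update(s) for s in samples])
```

In this sample run the alarm triggers at 92, stays silent at 95, 80, and 91 because it is already active, resets at 69, and triggers again at 93.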

If the alarm was caused by an unintended change to the configuration, you can change the configuration to match the old data store settings again; a service restart (without clearing) then clears the alarm. Typical configuration changes that require a restart with clear are changes to the data store encryption (choose Optimization > Data Replication: Data Store) or enabling the extended peer table (choose Optimization > Network Services: Peering Rules).

• Corruption—Enables an alarm and sends an email notification if the RiOS data store is corrupt or has become incompatible with the current configuration. To clear the RiOS data store of data, restart the optimization service and click Clear the Data Store.

• Data Store Clean Required—Enables an alarm and sends an email notification if you need to clear the RiOS data store.

• Encryption Level Mismatch—Enables an alarm and sends an email notification if a data store error such as an encryption, header, or format error occurs.

• Synchronization Error—Enables an alarm if RiOS data store synchronization has failed. The RiOS data store synchronization between two SteelHeads has been disrupted and the RiOS data stores are no longer synchronized.

Enables an alarm if the system partitions (not the RiOS data store) are full or almost full. For example, RiOS monitors the available space on /var, which is used to hold logs, statistics, system dumps, TCP dumps, and so on.
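The way a parent alarm aggregates over several partitions can be sketched as follows. The text does not say what "almost full" means, so the 95 percent threshold is an assumption, and the function names and sample mount points are hypothetical.

```python
def full_partitions(usage, threshold_percent=95.0):
    """Return the mount points at or above the threshold.

    usage maps a mount point to a (used_bytes, total_bytes) tuple.
    The default threshold is an assumed value, not a documented one.
    """
    return sorted(
        mount
        for mount, (used, total) in usage.items()
        if total > 0 and 100.0 * used / total >= threshold_percent
    )


def system_disk_full(usage, threshold_percent=95.0):
    """Parent alarm: triggers if any partition is full, and names the cause."""
    offenders = full_partitions(usage, threshold_percent)
    return bool(offenders), offenders


usage = {"/var": (98, 100), "/config": (10, 100)}
print(system_disk_full(usage))
```

The returned list plays the role of the extra detail in the Alarm Status report: it identifies which partition caused the parent alarm to trigger.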

Enables an alarm when the system is unable to communicate with the domain controller, has detected an SMB signing error, or has detected that delegation has failed. CIFS-signed and Encrypted-MAPI traffic is passed through without optimization.

Enables an alarm if an attempt to join a Windows domain has failed. The most common cause of failing to join a domain is a significant difference in the system time on the Windows domain controller and the SteelHead. A domain join can also fail when the DNS server returns an invalid IP address for the domain controller.

• Disk Error—Enables an alarm when one or more disks are offline. To see which disk is offline, enter this CLI command from the system prompt:

• Fan Error—Enables an alarm and sends an email notification if a fan is failing or has failed and needs to be replaced. By default, this alarm is enabled.

• Flash Error—Enables an alarm when the system detects an error with the flash drive hardware. By default, this alarm is enabled.

• IPMI—Enables an alarm and sends an email notification if an Intelligent Platform Management Interface (IPMI) event is detected. (Not supported on all appliance models.)

– Power cycle, such as turning the power switch on or off, physically unplugging and replugging the cable, or issuing a power cycle from the power switch controller.

• Management Disk Size Error—Enables an alarm if the size of the management disk is too small to support the SteelHead (Virtual Edition) model.

• Memory Error—Enables an alarm and sends an email notification if a memory error is detected, for example, when a system memory stick fails.

• Other Hardware Error—Enables an alarm if a hardware error is detected. These issues trigger the hardware error alarm:

• Safety Valve: disk access exceeds response times—Enables an alarm when the SteelHead is experiencing increased disk access time and has started the safety valve disk bypass mechanism that switches connections into SDR-A. SDR-A performs data reduction in memory until the disk access latency falls below the safety valve activation threshold.

Disk access time can exceed the safety valve activation threshold for several reasons: the SteelHead might be undersized for the amount of traffic it is required to optimize, a larger than usual amount of traffic is being optimized temporarily, or a disk is experiencing hardware issues such as sector errors, failing mechanicals, or RAID disk rebuilding.

• Power Supply—Enables an alarm and sends an email notification if an inserted power supply cord doesn’t have power, as opposed to a power supply slot with no power supply cord inserted. By default, this alarm is enabled.

• SSD Write Cycle Level Exceeded—Enables an alarm if the accumulated SSD write cycles exceed the predefined 95 percent write cycle level on SteelHead models 7050-L and 7050-M. If the alarm is triggered, the administrator can swap out the disk before any problems arise.

Enables an alarm and sends an email notification if the inbound QoS WAN bandwidth for one or more of the interfaces is set incorrectly. You must configure the WAN bandwidth to be less than or equal to the interface bandwidth link rate.

• An interface is connected and the WAN bandwidth is set higher than its bandwidth link rate: for example, if the bandwidth link rate is 1536 kbps, and the WAN bandwidth is set to 2000 kbps.

• A nonzero WAN bandwidth is set and QoS is enabled on an interface that is disconnected; that is, the bandwidth link rate is 0.

• A previously disconnected interface is reconnected, and its previously configured WAN bandwidth was set higher than the bandwidth link rate. The Management Console refreshes the alarm message to inform you that the configured WAN bandwidth is set higher than the interface bandwidth link rate.

The alarm clears when you configure the WAN bandwidth to be less than or equal to the bandwidth link rate or reconnect an interface configured with the correct WAN bandwidth.
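The trigger conditions above reduce to a comparison between the configured WAN bandwidth and the interface bandwidth link rate. The following sketch assumes both values are expressed in kbps and that a link rate of 0 means the interface is disconnected, as the text states. The function name is hypothetical.

```python
def wan_bandwidth_misconfigured(link_rate_kbps, wan_bw_kbps, qos_enabled=True):
    """Return True if the QoS WAN bandwidth setting would raise the alarm."""
    if link_rate_kbps == 0:
        # Disconnected interface: a nonzero WAN bandwidth with QoS enabled triggers.
        return qos_enabled and wan_bw_kbps > 0
    # Connected (or reconnected) interface: WAN bandwidth must not exceed the link rate.
    return wan_bw_kbps > link_rate_kbps


print(wan_bandwidth_misconfigured(1536, 2000))  # example values from the text
print(wan_bandwidth_misconfigured(0, 2000))
print(wan_bandwidth_misconfigured(1536, 1536))
```

The first two calls correspond to the first two trigger conditions, and the third shows the setting that clears the alarm.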

• Appliance Unlicensed—This alarm triggers if the SteelHead does not have a license installed for its currently configured model. For details about updating licenses, see Managing licenses and model upgrades.

• Autolicense Critical Event—This alarm triggers on a SteelHead (Virtual Edition) appliance when the Riverbed Licensing Portal can’t respond to a license request with valid licenses. The Licensing Portal can’t issue a valid license for one of these reasons:

– A newer SteelHead (Virtual Edition) appliance is already using the token, so you can’t use it on the SteelHead (Virtual Edition) appliance displaying the critical alarm. Every time the SteelHead (Virtual Edition) appliance attempts to refetch a license token, the alarm retriggers.

• Autolicense Informational Event—This alarm triggers if the Riverbed Licensing Portal has information regarding the licenses for a SteelHead (Virtual Edition) appliance. For example, the SteelHead (Virtual Edition) appliance displays this alarm when the portal returns licenses that are associated with a token that has been used on a different SteelHead (Virtual Edition) appliance.

• Licenses Expired—This alarm triggers if one or more features has at least one license installed, but all of them are expired.

• Licenses Expiring—This alarm triggers if the license for one or more features is going to expire within two weeks.

The licenses expiring and licenses expired alarms are triggered per feature. For example: if you install two license keys for a feature, LK1-FOO-xxx (expired) and LK1-FOO-yyy (not expired), the alarms don’t trigger, because the feature has one valid license.
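The per-feature evaluation can be sketched by looking only at the latest expiry date among a feature's licenses. This is one interpretation of the text, under the assumption that a license remains valid through its expiry date. The function name and the feature names are hypothetical.

```python
from datetime import date, timedelta

EXPIRY_WARNING = timedelta(weeks=2)  # "within two weeks" per the text


def license_alarms(licenses, today):
    """Return (expired_features, expiring_features).

    licenses maps a feature name to the list of expiry dates of its installed keys.
    """
    expired, expiring = [], []
    for feature, expiry_dates in sorted(licenses.items()):
        if not expiry_dates:
            continue  # no license installed: neither alarm applies
        latest = max(expiry_dates)  # one valid license is enough for the feature
        if latest < today:
            expired.append(feature)
        elif latest - today <= EXPIRY_WARNING:
            expiring.append(feature)
    return expired, expiring


today = date(2024, 1, 15)
licenses = {
    "FOO": [date(2023, 12, 1), date(2024, 6, 1)],  # one expired, one valid
    "BAR": [date(2023, 12, 1)],                    # all expired
    "BAZ": [date(2024, 1, 20)],                    # expires in five days
}
print(license_alarms(licenses, today))
```

As in the LK1-FOO example, the FOO feature raises neither alarm because one of its two licenses is still valid.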

Enables an alarm and sends an email notification when an interface was not configured for half-duplex negotiation but has negotiated half-duplex mode. Half-duplex significantly limits the optimization service results.

You can enable or disable the alarm for a specific interface. To enable or disable an alarm, choose Administration > System Settings: Alarms and select or clear the check box next to the link name.

Enables an alarm and sends an email notification when the link error rate exceeds 0.1 percent while either sending or receiving packets. This threshold is based on the observation that even a small link error rate reduces TCP throughput significantly. A properly configured LAN connection experiences very few errors.

You can change the default alarm thresholds by entering the alarm link_io_errors err-threshold <threshold-value> CLI command at the system prompt. For details, see the Riverbed Command-Line Interface Reference Manual.

You can enable or disable the alarm for a specific interface. For example, you can disable the alarm for a link after deciding to tolerate the errors. To enable or disable an alarm, choose Administration > System Settings: Alarms and select or clear the check box next to the link name.
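The 0.1 percent check can be sketched as a ratio of error packets to total packets, evaluated separately for each direction. How the appliance samples its counters is not described in the text, so the counter arguments and function names here are assumptions.

```python
def error_rate_percent(errors, packets):
    """Error rate as a percentage; 0.0 when no packets were seen."""
    return 100.0 * errors / packets if packets else 0.0


def link_error_alarm(rx_errors, rx_packets, tx_errors, tx_packets, threshold_percent=0.1):
    """Trigger if either the receive or the transmit error rate exceeds the threshold."""
    return (
        error_rate_percent(rx_errors, rx_packets) > threshold_percent
        or error_rate_percent(tx_errors, tx_packets) > threshold_percent
    )


# 2 errors in 1000 received packets is 0.2 percent, above the default threshold.
print(link_error_alarm(rx_errors=2, rx_packets=1000, tx_errors=0, tx_packets=1000))
# 1 error in 10000 packets is 0.01 percent, below the default threshold.
print(link_error_alarm(rx_errors=1, rx_packets=10000, tx_errors=0, tx_packets=10000))
```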

Enables an alarm and sends an email notification if an Ethernet link is lost due to an unplugged cable or dead switch port. Depending on which link is down, the system might no longer be optimizing and a network outage could occur.

This condition is often caused by interface transitions on surrounding devices, such as routers or switches. This alarm also accompanies service or system restarts on the SteelHead.

You can enable or disable the alarm for a specific interface. To enable or disable an alarm, choose Administration > System Settings: Alarms and select or clear the check box next to the link name.

Enables an alarm and sends an email notification if memory paging is detected. If 100 pages are swapped every couple of hours, the system is functioning properly. If thousands of pages are swapped every few minutes, contact Support at

Enables an alarm and sends an email notification if the SteelHead detects that either NFSv2 or NFSv4 is in use. The SteelHead supports only NFSv3 and passes through all other versions.

• Internal Error—Enables an alarm and sends an email notification if the RiOS optimization service encounters a condition that might degrade optimization performance. By default, this alarm is enabled. Go to the Administration > Maintenance: Services page and restart the optimization service.

• Service Status—Enables an alarm and sends an email notification if the RiOS optimization service encounters a service condition. By default, this alarm is enabled. The message indicates the reason for the condition. These conditions trigger this alarm:

• Unexpected Halt—Enables an alarm and sends an email notification if the RiOS optimization service halts due to a serious software error. By default, this alarm is enabled.

Enables an alarm and sends an email notification if the outbound QoS WAN bandwidth for one or more of the interfaces is set incorrectly. You must configure the WAN bandwidth to be less than or equal to the interface bandwidth link rate.

• An interface is connected and the WAN bandwidth is set higher than its bandwidth link rate: for example, if the bandwidth link rate is 100 Mbps, and the WAN bandwidth is set to 200 Mbps.

• A nonzero WAN bandwidth is set and QoS is enabled on an interface that is disconnected; that is, the bandwidth link rate is 0.

The alarm clears when you configure the WAN bandwidth to be less than or equal to the bandwidth link rate or reconnect an interface configured with the correct WAN bandwidth.

Enables an alarm and sends an email notification if the system detects that one of the predefined uplinks for a connection is unavailable. The uplink has exceeded either the timeout value for uplink latency or the threshold for observed packet loss.

When an uplink fails, the SteelHead directs traffic through another available uplink. When the original uplink comes back up, the SteelHead redirects the traffic back to it.
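The failover and failback behavior can be sketched as choosing the first usable uplink from a priority-ordered list. The uplink names, the priority ordering, and the threshold parameters are hypothetical, and the sketch does not reflect how RiOS probes uplinks.

```python
def uplink_available(latency_ms, loss_percent, latency_timeout_ms, loss_threshold_percent):
    """An uplink is usable while it stays within both limits named in the text."""
    return latency_ms <= latency_timeout_ms and loss_percent <= loss_threshold_percent


def select_uplink(preferred, available):
    """Return the first uplink in priority order that is currently available.

    preferred is a list of uplink names, most preferred first;
    available is the set of uplinks that currently pass their checks.
    """
    for name in preferred:
        if name in available:
            return name
    return None  # no uplink is usable


order = ["mpls", "internet"]
print(select_uplink(order, {"internet"}))          # original uplink is down
print(select_uplink(order, {"mpls", "internet"}))  # original uplink has recovered
```

Because the selection is recomputed from the same priority list each time, traffic returns to the original uplink as soon as that uplink is available again.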

Enables an alarm and sends an email notification if a path selection monitoring probe for a predefined uplink has received a probe response from an unexpected relay or interface.

Enables an alarm and sends an email notification if the system detects an error while trying to create a process dump. This alarm indicates an abnormal condition where RiOS can’t collect the core file after three retries. It can occur when the /var directory is reaching capacity or under other conditions. When the alarm is raised, the directory is blacklisted.

• Proxy File Service Configuration—Indicates that a configuration attempt has failed. If the system detects a configuration failure, attempt the configuration again.

• Proxy File Service Operation—Indicates that a synchronization operation has failed. If the system detects an operation failure, attempt the operation again.

Enables an alarm and sends an email notification if a peer SteelHead encounters a problem with the secure transport controller connection. The secure transport controller is a SteelHead that typically resides in the data center and manages the control channel and operations required for secure transport between SteelHead peers. The control channel uses SSL to secure the connection between the peer SteelHead and the secure transport controller.

• Connection with Controller Lost—Indicates that the peer SteelHead is no longer connected to the secure transport controller because:

• Registration with Controller Unsuccessful—Indicates that the peer SteelHead isn’t registered with the secure transport controller, and the controller doesn’t recognize it as a member of the secure transport group.

• Secure Vault Locked—Indicates that the secure vault is locked. To optimize SSL connections or to use RiOS data store encryption, the secure vault must be unlocked. Go to Administration > Security: Secure Vault and unlock the secure vault.

• Secure Vault New Password Recommended—Indicates that the secure vault requires a new, nondefault password. Reenter the password.

• Secure Vault Not Initialized—Indicates that an error has occurred while initializing the secure vault. When the vault is locked, SSL traffic isn’t optimized and you can’t encrypt the RiOS data store. For details, see Unlocking the secure vault.

• Peer Mismatch—Needs Attention. Indicates that the appliance has encountered another appliance that is running an incompatible version of system software. Refer to the CLI, Management Console, or the SNMP peer table to determine which appliance is causing the conflict. Connections with that peer will not be optimized; connections with other peers running compatible RiOS versions are unaffected. To resolve the problem, upgrade your system software. No other action is required as the alarm automatically clears.

• Software Version Mismatch—Degraded. Indicates that the appliance is running an incompatible version of system software. To resolve the problem, upgrade your system software. No other action is required as the alarm automatically clears.

• Non-443 SSL Servers—Indicates that during a RiOS upgrade (for example, from 8.5 to 9.0), the system has detected a preexisting SSL server certificate configuration on a port other than the default SSL port 443. SSL traffic might not be optimized. To restore SSL optimization, you can add an in-path rule to the client-side SteelHead to intercept the connection and optimize the SSL traffic on the nondefault SSL server port.

• SSL Certificates Error (SSL CAs)—Indicates that an SSL peering certificate has failed to reenroll automatically within the Simple Certificate Enrollment Protocol (SCEP) polling interval.

• SSL Certificates Error (SSL Peering CAs)—Indicates that an SSL peering certificate has failed to reenroll automatically within the SCEP polling interval.

• SSL Certificates Expiring—Indicates that an SSL certificate is about to expire.

Two types of certificates can trigger this alarm: Certificate Authority certificates used to validate servers and SSL Server Certificates that the SteelHead uses when acting as a trusted man in the middle. Depending on the type of certificate, you can review the expiring certificates on the Optimization: SSL > Certificate Authorities page or the Optimization: SSL > SSL Main Settings page. (The alarm only redirects you to the Certificate Authorities page, but you might need to review the SSL Main Settings page for your certificate.)

• SSL Certificates SCEP—Indicates that an SSL certificate has failed to reenroll automatically within the SCEP polling interval.

• SSL HSM private key not accessible—Indicates that the server-side SteelHead can’t import the private key corresponding to the proxy certificate from a SafeNet Luna Hardware Security Module (HSM) server. The private key is necessary to establish mutual trust between the SteelHead and the HSM for proxied SSL traffic optimization. Check that the server-side SteelHead can access the HSM device and that the private key exists on the HSM server. For details, see the Riverbed Command-Line Interface Reference Manual.

Enables an alarm when an error occurs while repartitioning the disk drives during a storage profile switch. A profile switch changes the disk space allocation on the drives, clears the SteelFusion and VSP data stores, and repartitions the data stores to the appropriate sizes.

You switch a storage profile by entering the disk-config layout CLI command at the system prompt or by choosing Administration > System Settings: Disk Management on an EX or EX+SteelFusion SteelHead.

• Critical Temperature—Enables an alarm and sends an email notification if the CPU temperature exceeds the rising threshold. When the CPU returns to the reset threshold, the critical alarm is cleared. The default value for the rising threshold temperature is 70°C; the default reset threshold temperature is 67°C.

• Warning Temperature—Enables an alarm and sends an email notification if the CPU temperature approaches the rising threshold. When the CPU returns to the reset threshold, the warning alarm is cleared.