Accessing the SteelHead Enterprise MIB
The SteelHead enterprise MIB monitors device status and peers. It provides network statistics for seamless integration into network management systems such as Hewlett Packard OpenView Network Node Manager, PRTG, and other SNMP browser tools.
For details on configuring and using these network monitoring tools, consult their product documentation.
The following guidelines describe how to download and access the SteelHead enterprise MIB using common MIB browsing utilities:
•  You can download the SteelHead enterprise MIB file (STEELHEAD-EX-MIB.txt) from the Help page of the Management Console or from the Riverbed Support site at https://support.riverbed.com and load it into any MIB browser utility.
•  Some utilities might expect a file type other than a text file. If this occurs, change the file extension to the type required by the utility you have chosen.
•  Some utilities assume that the root is mib-2 by default. If the utility sees a new node, such as enterprises, it might look under mib-2.enterprises. If this occurs, use .iso.org.dod.internet.private.enterprises.rbt as the root.
•  Some command-line browsers might not load all MIB files by default. If this occurs, find the appropriate command option to load the STEELHEAD-EX-MIB.txt file. For example, for Net-SNMP browsers, use snmpwalk -m ALL.
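The last two guidelines can be sketched as a small Python helper that assembles a Net-SNMP snmpwalk command line. This is only an illustration: the host name, the community string, and the choice of SNMP v2c are placeholders, and the sketch assumes the MIB file has already been copied into a directory that Net-SNMP searches.

```python
import shlex

# Root named in the guideline above. Resolving "rbt" by name assumes the
# Riverbed MIB files are loaded.
RBT_ROOT = ".iso.org.dod.internet.private.enterprises.rbt"


def build_snmpwalk_command(host, community="public", root=RBT_ROOT):
    """Return an argument list for a Net-SNMP snmpwalk that loads all MIB modules."""
    return [
        "snmpwalk",
        "-v", "2c",       # assumed SNMP version; match your appliance settings
        "-c", community,  # placeholder community string
        "-m", "ALL",      # load every MIB module Net-SNMP can find
        host,
        root,             # start at the Riverbed root instead of mib-2
    ]


# Example with a placeholder host name; prints the command without running it.
print(shlex.join(build_snmpwalk_command("steelhead.example.com")))
```

The helper only builds the command. To run it, pass the list to subprocess.run on a system where Net-SNMP is installed.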
Different OID Branches for SteelHead EX Appliances
The STEELHEAD-MIB and STEELHEAD-EX-MIB are different branches. Both import RBT-MIB, so RBT-MIB must be loaded as a dependency. You must modify the polling tool to gather statistics from a SteelHead EX appliance.
For example, the statistics branch has a different OID in each MIB:
•  .1.3.6.1.4.1.17163.1 = RBT-MIB
•  .1.3.6.1.4.1.17163.1.1 = STEELHEAD-MIB
•  .1.3.6.1.4.1.17163.1.1.5 = Statistics branch of STEELHEAD-MIB
•  .1.3.6.1.4.1.17163.1.51 = STEELHEAD-EX-MIB
•  .1.3.6.1.4.1.17163.1.51.5 = Statistics branch of STEELHEAD-EX-MIB
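When adapting a polling tool, the prefix change listed above can be sketched as a small Python function that rewrites a STEELHEAD-MIB OID into the STEELHEAD-EX-MIB branch. The function names are hypothetical, and the sketch assumes that the subtree below the prefix is laid out the same way in both MIBs, which this document shows only for the statistics branch (.5). Verify each rewritten OID against the MIB file before polling.

```python
# Branch prefixes taken from the list above.
STEELHEAD_PREFIX = (1, 3, 6, 1, 4, 1, 17163, 1, 1)
STEELHEAD_EX_PREFIX = (1, 3, 6, 1, 4, 1, 17163, 1, 51)


def parse_oid(text):
    """Convert '.1.3.6' or '1.3.6' into a tuple of integers."""
    return tuple(int(part) for part in text.strip(".").split("."))


def format_oid(oid):
    """Convert a tuple of integers into dotted form with a leading dot."""
    return "." + ".".join(str(number) for number in oid)


def to_ex_oid(text):
    """Rewrite a STEELHEAD-MIB OID so that it points into STEELHEAD-EX-MIB."""
    oid = parse_oid(text)
    prefix_length = len(STEELHEAD_PREFIX)
    if oid[:prefix_length] != STEELHEAD_PREFIX:
        raise ValueError(f"{text} is not under the STEELHEAD-MIB branch")
    return format_oid(STEELHEAD_EX_PREFIX + oid[prefix_length:])


print(to_ex_oid(".1.3.6.1.4.1.17163.1.1.5"))
```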
Retrieving Optimized Traffic Statistics by Port
When you perform an snmpwalk on the SteelHead MIB object bwPortTable to display a table of statistics for optimized traffic by port, the command retrieves only the monitored ports. The monitored ports include the default TCP ports and any ports you add. To view the monitored ports that this object returns, choose System Settings > Monitored Ports or enter the following CLI command at the system prompt:
show stats settings bandwidth ports
To retrieve statistics for an individual port, perform an snmpget for that port, as in the following example:
.iso.org.dod.internet.private.enterprises.rbt.products.steelhead.statistics.bandwidth.bandwidthPerPort.bwPortTable.bwPortEntry.bwPortOutLan.port_number
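As a minimal sketch, the per-port object name in the example above can be built in Python by appending the port number as the final component. The function name is hypothetical, and port 443 is only an example: the request returns a value only if that port is among the monitored ports.

```python
# Symbolic object name from the example above, without the port number.
BW_PORT_OUT_LAN = (
    ".iso.org.dod.internet.private.enterprises.rbt.products.steelhead"
    ".statistics.bandwidth.bandwidthPerPort.bwPortTable.bwPortEntry.bwPortOutLan"
)


def bw_port_out_lan_oid(port):
    """Return the object name to pass to snmpget for one TCP port."""
    if not 1 <= port <= 65535:
        raise ValueError(f"invalid TCP port: {port}")
    return f"{BW_PORT_OUT_LAN}.{port}"


print(bw_port_out_lan_oid(443))
```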
SNMP Traps
Every SteelHead supports SNMP traps and email alerts for conditions that require attention or intervention. An alarm triggers for most, but not every, event, and the related trap is sent. For most events, when the condition clears, the system clears the alarm and also sends a clear trap. The clear traps are useful in determining when an event has been resolved.
This section describes the SNMP traps. It doesn’t list the corresponding clear traps.
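A trap receiver can use the clear traps to keep track of which alarms are still active. The following Python sketch shows one way to do that. Because the clear trap names aren't listed here, the sketch assumes the receiver has already decided whether an incoming trap is a clear trap and passes that in as a flag. The function name and event format are hypothetical.

```python
def apply_trap(active_alarms, alarm_name, is_clear):
    """Return a new set of active alarms after processing one trap.

    active_alarms: set of alarm names currently raised
    alarm_name:    name of the alarm the trap refers to, for example "cpuUtil"
    is_clear:      True if the trap is the clear trap for that alarm
    """
    updated = set(active_alarms)
    if is_clear:
        updated.discard(alarm_name)  # condition resolved
    else:
        updated.add(alarm_name)      # condition raised
    return updated


active = set()
active = apply_trap(active, "cpuUtil", is_clear=False)
active = apply_trap(active, "linkError", is_clear=False)
active = apply_trap(active, "cpuUtil", is_clear=True)
print(sorted(active))
```

Events that send a trap without a matching clear trap would stay in the set until removed by other means, so a real receiver would likely need an expiry or manual acknowledgment step.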
You can view SteelHead health at the top of each Management Console page, by entering the CLI show info command, and through SNMP (health, systemHealth).
The SteelHead tracks key hardware and software metrics and alerts you of any potential problems so that you can quickly discover and diagnose issues. The health of an appliance falls into one of the following states:
•  Healthy - The SteelHead is functioning and optimizing traffic.
•  Needs Attention - Accompanies a healthy state to indicate management-related issues not affecting the ability of the SteelHead to optimize traffic.
•  Degraded - The SteelHead is optimizing traffic but the system has detected an issue.
•  Admission Control - The SteelHead is optimizing traffic but has reached its connection limit.
•  Critical - The SteelHead might or might not be optimizing traffic; you must address a critical issue.
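For a monitoring script, the states above can be sketched as a Python enumeration with a lookup that records whether the appliance is still optimizing traffic in each state. The strings are the state names used in this document. The exact values returned by the health and systemHealth objects might differ, so treat the mapping as an assumption to check against your appliance.

```python
from enum import Enum


class Health(Enum):
    HEALTHY = "Healthy"
    NEEDS_ATTENTION = "Needs Attention"
    DEGRADED = "Degraded"
    ADMISSION_CONTROL = "Admission Control"
    CRITICAL = "Critical"


# True means the appliance is optimizing traffic in that state.
# None means it might or might not be optimizing (Critical).
OPTIMIZING = {
    Health.HEALTHY: True,
    Health.NEEDS_ATTENTION: True,    # management issue only
    Health.DEGRADED: True,           # optimizing, but an issue was detected
    Health.ADMISSION_CONTROL: True,  # optimizing, but at its connection limit
    Health.CRITICAL: None,
}


def is_optimizing(state_text):
    """Map a health state name to True or None; raise ValueError if unknown."""
    return OPTIMIZING[Health(state_text)]


print(is_optimizing("Degraded"), is_optimizing("Critical"))
```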
The following table summarizes the SNMP traps sent from the system to configured trap receivers and their effect on the SteelHead health state.
Trap and OID
SteelHead State
Text
Description
procCrash
(enterprises.17163.1.51.4.0.1)
Healthy
A procCrash trap signifies that a process managed by PM has crashed and left a core file. The variable sent with the notification indicates which process crashed.
A process has crashed and subsequently been restarted by the system. The trap contains the name of the process that crashed. A system snapshot associated with this crash has been created on the appliance and is accessible via the CLI or the Management Console. Riverbed Support might need this information to determine the cause of the crash. No other action is required on the appliance as the crashed process is automatically restarted.
procExit
(enterprises.17163.1.51.4.0.2)
Healthy
A procExit trap signifies that a process managed by PM has exited unexpectedly, but not left a core file. The variable sent with the notification indicates which process exited.
A process has unexpectedly exited and been restarted by the system. The trap contains the name of the process. The process might have exited automatically or due to other process failures on the appliance. Review the release notes for known issues related to this process exit. If none exist, contact Riverbed Support to determine the cause of this event. No other action is required on the appliance as the crashed process is automatically restarted.
cpuUtil
(enterprises.17163.1.51.4.0.3)
Degraded
The average CPU utilization in the past minute has gone above the acceptable threshold.
Average CPU utilization has exceeded an acceptable threshold. If CPU utilization spikes are frequent, it might be because the system is undersized. Sustained CPU load can be symptomatic of more serious issues. Consult the CPU Utilization report to gauge how long the system has been loaded and also monitor the amount of traffic currently going through the appliance. A one-time spike in CPU is normal but we recommend reporting extended high CPU utilization to Riverbed Support. No other action is necessary as the alarm clears automatically.
pagingActivity
(enterprises.17163.1.51.4.0.4)
Degraded
The system has been paging excessively (thrashing).
The system is running low on memory and has begun swapping memory pages to disk. This event can be triggered during a software upgrade while the optimization service is still running but there can be other causes. If this event triggers at any other time, generate a debug sysdump and send it to Riverbed Support. No other action is required as the alarm clears automatically.
smartError
(enterprises.17163.1.51.4.0.5)
N/A
This alarm is deprecated.
N/A
peerVersionMismatch
(enterprises.17163.1.51.4.0.6)
Degraded
Detected a peer with a mismatched software version.
The appliance has encountered another appliance that is running an incompatible version of system software. Refer to the CLI, Management Console, or the SNMP peer table to determine which appliance is causing the conflict. Connections with that peer will not be optimized; connections with other peers running compatible RiOS versions are unaffected. To resolve the problem, upgrade your system software. No other action is required as the alarm clears automatically.
bypassMode
(enterprises.17163.1.51.4.0.7)
Critical
The appliance has entered bypass (failthru) mode.
The appliance has entered bypass mode and is now passing through all traffic unoptimized. This error is generated if the optimization service locks up or crashes. It can also be generated when the system is first powered on or powered off. If this trap is generated on a system that was previously optimizing and is still running, contact Riverbed Support.
raidError
(enterprises.17163.1.51.4.0.8)
Deprecated
An error has been generated by the RAID array.
A drive has failed in a RAID array. Consult the CLI or Management Console to determine the location of the failed drive. Contact Riverbed Support for assistance with installing a new drive, a RAID rebuild, or drive reseating. The appliance continues to optimize during this event. After the error is corrected, the alarm clears automatically.
Note: Applicable to models 3010, 3510, 3020, 3520, 5010, 5520, 6020, and 6120 only.
storeCorruption
(enterprises.17163.1.51.4.0.9)
Critical
The data store is corrupted.
Indicates that the RiOS data store is corrupt or has become incompatible with the current configuration. To clear the RiOS data store of data, choose Administration > Maintenance: Services, select Clear Data Store, and click Restart to restart the optimization service.
If the alarm was triggered by an unintended change to the configuration, change the configuration to match the previous RiOS data store settings. Then restart the optimization service without clearing the data store to reset the alarm.
Typical configuration changes that require an optimization restart with a clear RiOS data store are enabling enhanced peering or changing the data store encryption.
admissionMemError
(enterprises.17163.1.51.4.0.10)
Admission Control
Admission control memory alarm has been triggered.
The appliance has entered admission control due to memory consumption. The appliance is optimizing traffic beyond its rated capability and is unable to handle the amount of traffic passing through the WAN link. During this event, the appliance continues to optimize existing connections, but new connections are passed through without optimization. No other action is necessary as the alarm clears automatically when the traffic has decreased.
admissionConnError
(enterprises.17163.1.51.4.0.11)
Admission Control
Admission control connections alarm has been triggered.
The appliance has entered admission control due to the number of connections and is unable to handle the amount of connections going over the WAN link. During this event, the appliance continues to optimize existing connections, but new connections are passed through without optimization. No other action is necessary as the alarm clears automatically when the traffic has decreased.
haltError
(enterprises.17163.1.51.4.0.12)
Critical
The service is halted due to a software error.
The optimization service has halted due to a serious software error. Check whether a core dump or a system dump was created. If so, retrieve it and contact Riverbed Support immediately.
serviceError
(enterprises.17163.1.51.4.0.13)
Degraded
There has been a service error. Please consult the log file.
The optimization service has encountered a condition which might degrade optimization performance. Consult the system log for more information. No other action is necessary.
scheduledJobError
(enterprises.17163.1.51.4.0.14)
Healthy
A scheduled job has failed during execution.
A scheduled job on the system (for example, a software upgrade) has failed. To determine which job failed, use the CLI or the Management Console.
confModeEnter
(enterprises.17163.1.51.4.0.15)
Healthy
A user has entered configuration mode.
A user on the system has entered a configuration mode from either the CLI or the Management Console. A log in to the Management Console by user admin sends this trap as well. This is for notification purposes only; no other action is necessary.
confModeExit
(enterprises.17163.1.51.4.0.16)
Healthy
A user has exited configuration mode.
A user on the system has exited configuration mode from either the CLI or the Management Console. A log out of the Management Console by user admin sends this trap as well. This is for notification purposes only; no other action is necessary.
linkError
(enterprises.17163.1.51.4.0.17)
Degraded
An interface on the appliance has lost its link.
The system has lost one of its Ethernet links, typically due to an unplugged cable or dead switch port. Check the physical connectivity between the SteelHead and its neighbor device. Investigate this alarm as soon as possible. Depending on what link is down, the system might no longer be optimizing and a network outage could occur.
This alarm is often caused by interface transitions on surrounding devices, such as routers or switches. It also accompanies service or system restarts on the SteelHead.
nfsV2V4
(enterprises.17163.1.51.4.0.18)
Degraded
NFS v2/v4 alarm notification.
The SteelHead has detected that either NFSv2 or NFSv4 is in use. The SteelHead only supports NFSv3 and passes through all other versions. Check that the clients and servers are using NFSv3 and reconfigure if necessary.
powerSupplyError
(enterprises.17163.1.51.4.0.19)
Degraded
A power supply on the appliance has failed (not supported on all models).
A redundant power supply on the appliance has failed and needs to be replaced. Contact Riverbed Support for an RMA replacement as soon as practically possible.
asymRouteError
(enterprises.17163.1.51.4.0.20)
Needs Attention
Asymmetric routes detected, certain connections might not be optimized because of this.
Asymmetric routing has been detected on the network. This is likely due to a failover event of an inner router or VPN. If so, no action needs to be taken. If not, contact Riverbed Support for further troubleshooting assistance.
fanError
(enterprises.17163.1.51.4.0.21)
Degraded
A fan has failed on this appliance (not supported on all models).
A fan is failing or has failed and needs to be replaced. Contact Riverbed Support for an RMA replacement as soon as practically possible.
memoryError
(enterprises.17163.1.51.4.0.22)
Degraded
A memory error has been detected on the appliance (not supported on all models).
A memory error has been detected. A system memory stick might be failing. Try reseating the memory first. If the problem persists, contact Riverbed Support for an RMA replacement as soon as practically possible.
ipmi
(enterprises.17163.1.51.4.0.23)
Degraded
An IPMI event has been detected on the appliance. Please check the details in the alarm report on the Web UI (not supported on all models).
An Intelligent Platform Management Interface (IPMI) event has been detected. Check the Alarm Status page for more detail. You can also view the IPMI events on the SteelHead, by entering the CLI command:
show hardware error-log all
configChange
(enterprises.17163.1.51.4.0.24)
Healthy
A change has been made to the system configuration.
A configuration change has been detected. Check the log files around the time of this trap to determine what changes were made and whether they were authorized.
datastoreWrapped
(enterprises.17163.1.51.4.0.25)
Healthy
The datastore has wrapped around.
The RiOS data store on the SteelHead went through an entire cycle and is removing data to make space for new data. This is normal behavior unless it wraps too quickly, which might indicate that the RiOS data store is undersized. If a message is received every seven days or less, investigate traffic patterns and RiOS data store sizing.
temperatureWarning
(enterprises.17163.1.51.4.0.26)
Degraded
The system temperature has exceeded the threshold.
The appliance temperature is a configurable notification. By default, this notification is set to trigger when the appliance reaches 70 degrees Celsius. Raise the alarm trigger temperature if it is normal for the SteelHead to get that hot, or reduce the temperature of the SteelHead.
temperatureCritical
(enterprises.17163.1.51.4.0.27)
Critical
The system temperature has reached a critical stage.
This trap/alarm triggers a critical state on the appliance. This alarm occurs when the appliance temperature reaches 90 degrees Celsius. The temperature value isn’t user-configurable. Reduce the appliance temperature.
cfConnFailure
(enterprises.17163.1.51.4.0.28)
Degraded
Unable to establish connection with the specified neighbor.
The connection can’t be established with a connection-forwarding neighbor. This alarm clears automatically the next time all neighbors connect successfully.
cfConnLostEos
(enterprises.17163.1.51.4.0.29)
Degraded
Connection lost since end of stream was received from the specified neighbor.
The connection has been closed by the connection-forwarding neighbor. This alarm clears automatically the next time all neighbors connect successfully.
cfConnLostErr
(enterprises.17163.1.51.4.0.30)
Degraded
Connection lost due to an error communicating with the specified neighbor.
The connection has been lost with the connection-forwarding neighbor due to an error. This alarm clears automatically the next time all neighbors connect successfully.
cfKeepaliveTimeout
(enterprises.17163.1.51.4.0.31)
Degraded
Connection lost due to lack of keep-alives from the specified neighbor.
The connection-forwarding neighbor has not responded to a keep-alive message within the time-out period, indicating that the connection has been lost. This alarm clears automatically when all neighbors of the SteelHead are responding to keep-alive messages within the time-out period.
cfAckTimeout
(enterprises.17163.1.51.4.0.32)
Degraded
Connection lost due to lack of ACKs from the specified neighbor.
The connection has been lost because requests have not been acknowledged by a connection-forwarding neighbor within the set time-out threshold. This alarm clears automatically the next time all neighbors receive an ACK from this neighbor and the latency of that acknowledgment is less than the set time-out threshold.
cfReadInfoTimeout
(enterprises.17163.1.51.4.0.33)
Degraded
Timeout reading info from the specified neighbor.
The SteelHead has timed out while waiting for an initialization message from the connection-forwarding neighbor. This alarm clears automatically when the SteelHead is able to read the initialization message from all of its neighbors.
cfLatencyExceeded
(enterprises.17163.1.51.4.0.34)
Degraded
Connection forwarding latency with the specified neighbor has exceeded the threshold.
The amount of latency between connection-forwarding neighbors has exceeded the specified threshold. The alarm clears automatically when the latency falls below the specified threshold.
sslPeeringSCEPAutoReenrollError
(enterprises.17163.1.51.4.0.35)
Needs Attention
There is an error in the automatic re-enrollment of the SSL peering certificate.
An SSL peering certificate has failed to re-enroll with the Simple Certificate Enrollment Protocol (SCEP).
crlError
(enterprises.17163.1.51.4.0.36)
Needs Attention
CRL polling fails.
The polling for SSL peering CAs has failed to update the Certificate Revocation List (CRL) within the specified polling period. This alarm clears automatically when the CRL is updated.
datastoreSyncFailure
(enterprises.17163.1.51.4.0.37)
Degraded
Data store sync has failed.
The RiOS data store synchronization between two SteelHeads has been disrupted and the RiOS data stores are no longer synchronized.
secureVaultNeedsUnlock
(enterprises.17163.1.51.4.0.38)
Needs Attention
SSL acceleration and the secure data store can’t be used until the secure vault has been unlocked.
The secure vault is locked. SSL traffic isn’t being optimized and the RiOS data store can’t be encrypted. Check the Alarm Status page for more details. The alarm clears when the secure vault is unlocked.
secureVaultNeedsRekey
(enterprises.17163.1.51.4.0.39)
Needs Attention
If you wish to use a nondefault password for the secure vault, the password must be rekeyed.
The secure vault password needs to be verified or reset. Initially, the secure vault has a default password known only to the RiOS software so the SteelHead can automatically unlock the vault during system startup.
For details, check the Alarm Status page.
The alarm clears when you verify the default password or reset the password.
secureVaultInitError
(enterprises.17163.1.51.4.0.40)
Critical
An error was detected while initializing the secure vault. Please contact Riverbed Support.
An error occurred while initializing the secure vault after a RiOS software version upgrade. Contact Riverbed Support.
configSave
(enterprises.17163.1.51.4.0.41)
Healthy
The current appliance configuration has been saved.
A configuration has been saved either by entering the write memory CLI command or by clicking Save to Disk in the Management Console. This message is for security notification purposes only; no other action is necessary.
tcpDumpStarted
(enterprises.17163.1.51.4.0.42)
Healthy
A TCP dump has been started.
A user has started a TCP dump on the SteelHead by entering a tcpdump or tcpdump-x command from the CLI. This message is for security notification purposes only; no other action is necessary.
tcpDumpScheduled
(enterprises.17163.1.51.4.0.43)
Healthy
A TCP dump has been scheduled.
A user has scheduled a TCP dump on the SteelHead by entering a tcpdump or tcpdump-x command with a scheduled start time from the CLI. This message is for security notification purposes only; no other action is necessary.
newUserCreated
(enterprises.17163.1.51.4.0.44)
Healthy
A new user has been created.
A new role-based management user has been created using the CLI or the Management Console. This message is for security notification purposes only; no other action is necessary.
diskError
(enterprises.17163.1.51.4.0.45)
Degraded
Disk error has been detected.
A disk error has been detected. A disk might be failing. Try reseating the disk first. If the problem persists, contact Riverbed Support.
wearWarning
(enterprises.17163.1.51.4.0.46)
Degraded
Accumulated SSD write cycles passed predefined level.
Triggers on SteelHead models using Solid State Disks (SSDs).
An SSD has reached 95 percent of its write cycle limit. Contact Riverbed Support.
cliUserLogin
(enterprises.17163.1.51.4.0.47)
Healthy
A user has just logged-in via CLI.
A user has logged in to the SteelHead using the command-line interface. This message is for security notification purposes only; no other action is necessary.
cliUserLogout
(enterprises.17163.1.51.4.0.48)
Healthy
A CLI user has just logged-out.
A user has logged out of the SteelHead command-line interface by entering the Quit command or pressing ^D. This message is for security notification purposes only; no other action is necessary.
webUserLogin
(enterprises.17163.1.51.4.0.49)
Healthy
A user has just logged-in via the Web UI.
A user has logged in to the SteelHead using the Management Console. This message is for security notification purposes only; no other action is necessary.
webUserLogout
(enterprises.17163.1.51.4.0.50)
Healthy
A user has just logged-out via the Web UI.
A user has logged out of the SteelHead using the Management Console. This message is for security notification purposes only; no other action is necessary.
trapTest
(enterprises.17163.1.51.4.0.51)
Healthy
Trap Test
An SNMP trap test has occurred on the SteelHead. This message is informational and no action is necessary.
admissionCpuError
(enterprises.17163.1.51.4.0.52)
Admission Control
Optimization service is experiencing high CPU utilization.
The appliance has entered admission control due to high CPU use. During this event, the appliance continues to optimize existing connections, but new connections are passed through without optimization. No other action is necessary as the alarm clears automatically when the CPU usage has decreased.
admissionTcpError
(enterprises.17163.1.51.4.0.53)
Admission Control
Optimization service is experiencing high TCP memory pressure.
The appliance has entered admission control due to high TCP memory use. During this event, the appliance continues to optimize existing connections, but new connections are passed through without optimization. No other action is necessary as the alarm clears automatically when the TCP memory pressure has decreased.
systemDiskFullError
(enterprises.17163.1.51.4.0.54)
Degraded
One or more system partitions is full or almost full.
The alarm clears when the system partitions fall below usage thresholds.
domainJoinError
(enterprises.17163.1.51.4.0.55)
Degraded
An attempt to join a domain failed.
An attempt to join a Windows domain has failed.
The number one cause of failing to join a domain is a significant difference in the system time on the Windows domain controller and the SteelHead. When the time on the domain controller and the SteelHead don’t match, this error message appears:
lt-kinit: krb5_get_init_creds: Clock skew too great
 
We recommend using NTP time synchronization to synchronize the client and server clocks. It is critical that the SteelHead time is the same as the time on the Active Directory controller. Sometimes an NTP server is down or inaccessible, in which case there can be a time difference. You can also disable NTP if it isn’t being used and manually set the time. You must also verify that the time zone is correct.
A domain join can fail when the DNS server returns an invalid IP address for the domain controller. When a DNS misconfiguration occurs during an attempt to join a domain, these error messages appear:
Failed to join domain: failed to find DC for domain <domain name>
Failed to join domain : No Logon Servers
 
Additionally, the domain join alarm triggers and messages similar to the following appear in the logs:
Oct 13 14:47:06 bravo-sh81 rcud[10014]: [rcud/main/.ERR] - {- -} Failed to join domain: failed to find DC for domain GEN-VCS78DOM.COM
 
When you encounter this error, go to the Networking > Networking: Host Settings page and verify that the DNS settings are correct.
To verify the time settings, go to the Administration > System Settings: Date/Time page.
certsExpiringError
(enterprises.17163.1.51.4.0.56)
Needs Attention
Some x509 certificates may be expiring.
The service has detected some x.509 certificates used for Network Administration Access to the SteelHead that are close to their expiration dates. The alarm clears when the x.509 certificates are updated.
licenseError
(enterprises.17163.1.51.4.0.57)
Critical
The main SteelHead license has expired, been removed, or become invalid.
A license on the SteelHead has been removed, has expired, or is invalid. The alarm clears when a valid license is added or updated.
hardwareError
(enterprises.17163.1.51.4.0.58)
Either Critical or Degraded, depending on the state
Hardware error detected.
Indicates that the system has detected a problem with the SteelHead hardware. These issues trigger the hardware error alarm:
•  the SteelHead doesn’t have enough disk, memory, CPU cores, or NICs to support the current configuration
•  the SteelHead is using a Dual In-line Memory Module (DIMM), a hard disk, or a NIC that isn’t qualified by Riverbed
•  a VSP upgrade requires additional memory or a memory replacement
•  other hardware issues
The alarm clears when you add the necessary hardware, remove the unqualified hardware, or resolve other hardware issues.
sysdetailError
(enterprises.17163.1.51.4.0.59)
Needs Attention
Error is found in System Detail Report.
A top-level module on the system detail report is in error. For details, choose Reports > Diagnostics: System Details.
admissionMapiError
(enterprises.17163.1.51.4.0.60)
Degraded
New MAPI connections will be passed through due to high connection count.
The total number of optimized MAPI connections has exceeded the maximum admission control threshold. By default, the maximum admission control threshold is 85 percent of the total maximum optimized connection count for the client-side SteelHead. The SteelHead reserves the remaining 15 percent so the MAPI admission control doesn’t affect the other protocols. The 85 percent threshold is applied only to MAPI connections.
RiOS is now passing through MAPI connections from new clients but continues to intercept and optimize MAPI connections from existing clients (including new MAPI connections from these clients).
RiOS continues optimizing non-MAPI connections from all clients.
This alarm is disabled by default.
The alarm clears automatically when the MAPI traffic has decreased; however, it can take one minute for the alarm to clear.
RiOS pre-emptively closes MAPI sessions to reduce the connection count in an attempt to bring the SteelHead out of admission control by bringing the connection count below the 85 percent threshold. RiOS closes the MAPI sessions in this order:
•  MAPI prepopulation connections
•  MAPI sessions with the largest number of connections
•  MAPI sessions with most idle connections
•  The oldest MAPI session
•  MAPI sessions exceeding the memory threshold
Note: MAPI admission control can’t solve a general SteelHead Admission Control Error (enterprises.17163.1.51.4.0.11); however, it can help to prevent it from occurring.
neighborIncompatibility
(enterprises.17163.1.51.4.0.61)
Degraded
Serial cascade misconfiguration has been detected.
Check your automatic peering configuration. Restart the optimization service to clear the alarm.
flashError
(enterprises.17163.1.51.4.0.62)
Needs Attention
Flash hardware error detected.
At times, the USB flash drive that holds the system images might become unresponsive; the SteelHead continues to function normally. When this alarm triggers, you can’t perform a software upgrade, as the system is unable to write a new upgrade image to the flash drive without first power cycling the system.
To reboot the appliance, go to the Administration > Maintenance: Reboot/Shutdown page or enter the CLI reload command to automatically power cycle the SteelHead and restore the flash drive to its proper function.
 
lanWanLoopError
(enterprises.17163.1.51.4.0.63)
Critical
LAN-WAN loop detected. System will not optimize new connections until this error is cleared.
A LAN-WAN network loop has been detected between the LAN and WAN interfaces on a SteelHead (virtual edition). This can occur when you connect the LAN and WAN virtual NICs to the same vSwitch or physical NIC. This alarm triggers when a SteelHead (virtual edition) starts up, and clears after you connect each LAN and WAN virtual interface to a distinct virtual switch and physical NIC (through the vSphere Networking tab) and then reboot the SteelHead (virtual edition).
optimizationServiceStatusError
(enterprises.17163.1.51.4.0.64)
Critical
Optimization service currently not optimizing any connections.
The optimization service is in a state in which it is not optimizing connections. The message indicates the reason for the condition:
•  optimization service isn’t running
This message appears after a configuration file error. For more information, review the SteelHead logs.
•  in-path optimization isn’t enabled
This message appears if an in-path setting is disabled for an in-path SteelHead. For more information, review the SteelHead logs.
•  optimization service is initializing
This message appears after a reboot. The alarm clears on its own; no other action is necessary. For more information, review the SteelHead logs.
•  optimization service isn’t optimizing
This message appears after a system crash. For more information, review the SteelHead logs.
•  optimization service is disabled by user
This message appears after entering the CLI command no service enable or shutting down the optimization service from the Management Console. For more information, review the SteelHead logs.
•  optimization service is restarted by user
This message appears after the optimization service is restarted from either the CLI or Management Console. You might want to review the SteelHead logs for more information.
upgradeFailure
(enterprises.17163.1.51.4.0.65)
Needs Attention
Upgrade failed and the system is running the previous image.
A RiOS upgrade has failed and the SteelHead is running the previous RiOS version. Check the banner message in the Management Console to view more information. The banner message displays which upgrade failed along with the RiOS version the SteelHead has reverted to and is currently running.
Check that the upgrade image is correct for your SteelHead.
Verify that the upgrade image isn’t corrupt. You can use the MD5 checksum tool provided on the Riverbed Support site for the verification.
After you have confirmed that the image isn’t corrupt, upgrade the RiOS software again. If the upgrade continues to fail, contact Riverbed Support.
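As a minimal sketch of the checksum step, assuming the upgrade image is available as a local file and the expected MD5 value has been copied from the Riverbed Support site, the comparison could be done with Python's standard hashlib module. The function names here are hypothetical and are not part of any Riverbed tool.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large images are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def image_matches(path, expected_md5):
    """Compare the computed digest with the published value (case and whitespace ignored)."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

If image_matches returns False, download the image again before retrying the upgrade.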
licenseExpiring
(enterprises.17163.1.51.4.0.66)
Needs Attention
One or more licensed features will expire within the next two weeks.
Choose Administration > Maintenance: Licenses and look at the Status column to see which licenses are about to expire. One or more feature licenses are scheduled to expire within two weeks.
This alarm is triggered per feature. Suppose you installed two license keys for a feature, LK1-FOO-xxx, which is going to expire in two weeks, and LK1-FOO-yyy, which isn’t expired. Because one license for the feature is valid, the alarm doesn’t trigger.
licenseExpired
(enterprises.17163.1.51.4.0.67)
Degraded
One or more licensed features have expired.
Choose Administration > Maintenance: Licenses and look at the Status column to see which licenses have expired. One or more feature licenses have expired.
This alarm is triggered per feature. Suppose you installed two license keys for a feature, LK1-FOO-xxx (expired), and LK1-FOO-yyy (not expired). Because one license for the feature is valid, the alarm doesn’t trigger.
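The per-feature rule used by the licenseExpiring and licenseExpired alarms can be illustrated with a short sketch. It assumes each installed key is represented as a (feature, key, is_valid) tuple. That data shape, the function name, and the BAR feature are hypothetical and do not reflect how RiOS stores licenses.

```python
def features_in_alarm(licenses):
    """Return the features for which no installed license key is still valid.

    licenses: iterable of (feature, key, is_valid) tuples (hypothetical shape).
    """
    valid_by_feature = {}
    for feature, _key, is_valid in licenses:
        # A feature counts as covered as soon as one of its keys is valid.
        valid_by_feature[feature] = valid_by_feature.get(feature, False) or is_valid
    return sorted(name for name, covered in valid_by_feature.items() if not covered)


installed = [
    ("FOO", "LK1-FOO-xxx", False),  # expired (or expiring)
    ("FOO", "LK1-FOO-yyy", True),   # still valid, so FOO does not alarm
    ("BAR", "LK1-BAR-zzz", False),  # hypothetical feature with no valid key
]
print(features_in_alarm(installed))
```

In this example only BAR is reported, because FOO still has one valid key.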
clusterDisconnectedSHAlertError
(enterprises.17163.1.51.4.0.68)
Degraded
A cluster SteelHead has been reported as disconnected.
Choose Networking > Network Integration: Connection Forwarding and verify the configuration for both this SteelHead and the neighbor SteelHead. Verify that the neighbor is reachable from this SteelHead.
Next, check that the optimization service is running on both SteelHeads.
This error clears when the configuration is valid.
smbAlert
(enterprises.17163.1.51.4.0.69)
Needs Attention
Domain authentication alert.
The optimization service has detected a failure with domain controller communication or a delegate user.
Confirm that the SteelHead residing in the data center is properly joined to the domain by choosing Networking > Windows Domain.
To view useful debugging information in RiOS 7.0 or later, enter these CLI commands:
show protocol domain-auth test join
 
show alarm smb_alert
 
Verify that a delegate user has been added to the SteelHead and is configured with the appropriate privileges.
linkDuplex
(enterprises.17163.1.51.4.0.70)
 
Degraded
An interface on the appliance is in half-duplex mode
Indicates that an interface was not configured for half-duplex negotiation but has negotiated half-duplex mode. Half-duplex significantly limits the optimization service results.
Choose Networking > Networking: Base Interfaces and examine the SteelHead link configuration. Next, examine the peer switch user interface to check its link configuration. If the configuration on one side is different from the other, traffic is sent at different rates on each side, causing many collisions.
To troubleshoot, change both interfaces to automatic duplex negotiation. If the interfaces don’t support automatic duplex, configure both ends for full duplex.
linkIoErrors
(enterprises.17163.1.51.4.0.71)
 
Degraded
An interface on the appliance is suffering I/O errors
Indicates that the error rate on an interface has exceeded 0.1 percent while either sending or receiving packets. This threshold is based on the observation that even a small link error rate reduces TCP throughput significantly. A properly configured LAN connection should experience few errors. The alarm clears when the error rate drops below 0.05 percent.
To troubleshoot, try a new cable and a different switch port. Another possible cause is electromagnetic noise nearby.
You can change the default alarm thresholds by entering the alarm link_errors err-threshold xxxxxx CLI command at the system prompt. For details, see the Riverbed Command-Line Interface Reference Manual.
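The trigger and clear thresholds described for linkIoErrors form a hysteresis band: the alarm raises above 0.1 percent and clears only below 0.05 percent. The following sketch shows that behavior under stated assumptions. How RiOS samples the interface counters is not described here, so the function name, the per-interval counters, and the handling of an idle interval are all hypothetical.

```python
TRIGGER_PCT = 0.1   # alarm raises when the error rate exceeds this value
CLEAR_PCT = 0.05    # alarm clears when the error rate drops below this value


def next_alarm_state(alarm_active, errors, packets):
    """Return the new alarm state for one sampling interval."""
    if packets == 0:
        # Assumption: with no traffic in the interval, keep the previous state.
        return alarm_active
    rate_pct = 100.0 * errors / packets
    if not alarm_active and rate_pct > TRIGGER_PCT:
        return True
    if alarm_active and rate_pct < CLEAR_PCT:
        return False
    # Rates between the two thresholds leave the state unchanged.
    return alarm_active
```

The gap between the two thresholds keeps the alarm from flapping when the error rate hovers near 0.1 percent.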
storageProfSwitchFailed
(enterprises.17163.1.51.4.0.73)
Either Critical or Needs Attention, depending on the state
 
Storage profile switch failed
An error has occurred while repartitioning the disk drives during a storage profile switch. A profile switch changes the disk space allocation on the drives, clears the SteelFusion and VSP data stores, and repartitions the data stores to the appropriate sizes.
You switch a storage profile by entering the disk-config layout CLI command at the system prompt or by choosing Administration > System Settings: Disk Management on an EX or EX+SteelFusion SteelHead and selecting a storage profile.
These reasons can cause a profile switch to fail:
•  RiOS can’t validate the profile.
•  The profile contains an invalid upgrade or downgrade.
•  RiOS can’t clean up the existing VMDKs. During cleanup, RiOS uninstalls all slots and deletes all backups and packages.
When you encounter this error, switch the storage profile again. If the switch succeeds, the error clears. If it fails, RiOS reverts the SteelHead to the previous storage profile.
•  If RiOS is unable to revert the SteelHead to the previous storage profile, the alarm status becomes critical.
•  If RiOS successfully reverts the SteelHead to the previous storage profile, the alarm status displays needs attention.
clusterIpv6IncompatiblePeerError
(enterprises.17163.1.51.4.0.74)
Degraded
A cluster SteelHead has been reported as IPv6 incompatible.
The optimization service has encountered a peer SteelHead IPv6 incompatibility. The message indicates the reason for the condition:
•  Not all local inpath interfaces configured for IPv6
This message indicates that the peer SteelHead is IPv6 capable and its IP address configuration is correct, but the IP address configuration on the local SteelHead doesn’t match the configuration on the peer SteelHead. The mismatch means that there’s at least one relay on the local appliance that isn’t IPv4 or IPv6 capable. An IPv4 address is necessary for routing between neighbors and an IPv6 address is necessary for v6 optimization.
•  Not all peer inpath interfaces configured for IPv6
This message indicates that the local SteelHead is IPv6 capable and its IP address configuration is correct, but the IP address configuration on the peer SteelHead doesn’t match the configuration on the local SteelHead. The mismatch means that there’s at least one relay on the peer that isn’t IPv4 or IPv6 capable. An IPv4 address is necessary for routing between neighbors and an IPv6 address is necessary for v6 optimization.
•  Cluster IPv6 Incompatible
Indicates that a connection-forwarding neighbor is running a RiOS version that is incompatible with IPv6. Neighbors must be running RiOS 8.5 or later. The SteelHead neighbors pass through IPv6 connections when this alarm triggers.
flashProtectionFailed
(enterprises.17163.1.51.4.0.75)
Critical
Flash disk hasn't been backed up due to not enough free space on /var filesystem.
Indicates that the USB flash drive has not been backed up because there isn’t enough available space in the /var filesystem directory.
Examine the /var directory to see if it is storing an excessive number of snapshots, system dumps, or TCP dumps that you could delete. You could also delete any RiOS images that you no longer use.
datastoreNeedClean
(enterprises.17163.1.51.4.0.76)
Critical
The data store needs to be cleaned.
You need to clear the RiOS data store. To clear the data store, choose Administration > Maintenance: Services and select the Clear Data Store check box before restarting the appliance.
Clearing the data store degrades performance until the system repopulates the data.
pathSelectionPathDown
(enterprises.17163.1.51.4.0.77)
Degraded
Path Selection - A path went down.
Indicates that one of the predefined paths for a connection is unavailable because it has exceeded either the timeout value for path latency or the threshold for observed packet loss.
When a path fails, the SteelHead directs traffic through another available path. When the original path comes back up, the SteelHead redirects the traffic back to it.
clusterNeighborIncompatibleError
(enterprises.17163.1.51.4.0.80)
 
Degraded
At least one node in the cluster is incompatible.
The optimization service has encountered a neighbor incompatibility. The message indicates one of these conditions:
•  A cluster neighbor is running a RiOS version that doesn’t support the connection between neighbors. Neighbors must be running RiOS 8.6.x or later.
•  A connection-forwarding neighbor in a SteelHead Interceptor cluster has path selection enabled while path selection isn’t enabled on another appliance in the cluster.
secureTransportControllerUnreachable
(enterprises.17163.1.51.4.0.81)
 
SteelHead cannot connect to Secure Transport controller
Indicates that a peer SteelHead is no longer connected to the secure transport controller. The controller is a SteelHead that typically resides in the data center and manages the control channel and operations required for secure transport between SteelHead peers. The control channel between the SteelHeads uses SSL to secure the connection between the peer SteelHead and the secure transport controller.
The peer SteelHead is no longer connected to the secure transport controller because:
•  The connectivity between the peer SteelHead and the secure transport controller is lost.
•  The SSL for the connection isn’t configured correctly.
secureTransportRegistrationFailed
(enterprises.17163.1.51.4.0.82)
 
SteelHead cannot register with Secure Transport controller
Indicates that the peer SteelHead isn’t registered with the secure transport controller and the controller doesn’t recognize it as a member of the secure transport group.
pathSelectionPathProbingError
(enterprises.17163.1.51.4.0.83)
Needs Attention
Path Selection - At least one path has probing error
Indicates that a path selection monitoring probe for a predefined path has received a probe response from an unexpected relay or interface.
webProxyConfigAlarm
(enterprises.17163.1.51.4.0.84)
Degraded
Web Proxy Service Configuration Alarm
Indicates that there’s a problem with the web proxy service configuration.
webProxyServiceAlarm
(enterprises.17163.1.51.4.0.85)
Degraded
Web Proxy Service Status Alarm
Indicates that there’s a problem with the web proxy service.
graniteLunError
(enterprises.17163.1.51.4.0.10000)
Degraded
A LUN has become unavailable.
Check if the Data Center LUN was offlined in SteelFusion Core while IO operations were in progress.
graniteISCSIError
(enterprises.17163.1.51.4.0.10001)
Needs Attention
iSCSI module encountered error.
An iSCSI initiator is not accessible. Review the iSCSI configuration in SteelFusion Core.
graniteISNSError
(enterprises.17163.1.51.4.0.10002)
N/A
This alarm is deprecated.
N/A
graniteSnapshotError
(enterprises.17163.1.51.4.0.10003)
Degraded
Snapshot or Timeout error.
A snapshot failed to be committed to the SAN, or a snapshot has failed to complete due to Windows timing out.
Check the SteelFusion Core logs for details. Retry the Windows snapshot.
graniteBlockstoreError
(enterprises.17163.1.51.4.0.10004)
Degraded
Disk space low
The block store is running out of space. This triggers when only 5 percent of space is available in the block store.
Check your WAN connection as well as connectivity to the SteelFusion Core. This can also happen if clients write more data than can be sent over the WAN for a prolonged period of time.
 
Critical
Disk space full
The block store is out of space.
Check your WAN connection as well as connectivity to the SteelFusion Core. This can also happen if clients write more data than can be sent over the WAN for a prolonged period of time.
 
Degraded
Memory Low
The block store is running out of memory.
This indicates a temporary condition caused by too much IO. Limit the number of active prepop sessions. Check if the IOPS exceeds the model recommendation.
 
Degraded
Read Error
The block store could not read data that was already replicated to the DC. Clients will not see any error because the SteelFusion Edge will fetch the data from the DC.
Check the system logs to determine the root cause. Replace any disks that have failed. The alarm clears when you restart the optimization service.
 
Critical
Critical Read Error
The block store could not read data that is not yet replicated to the DC.
Check the system logs to determine the root cause. Replace any disks that have failed. The alarm clears when you restart the optimization service.
 
Critical
Startup Failed
 
The block store failed to start due to disk errors or an incorrect configuration.
Check the system logs to determine the root cause.
 
Critical
Startup Wrong Version
The SteelFusion Edge software version is incompatible with the block store version on disk.
The alarm indicates that the software has been upgraded or downgraded with an incompatible version. Revert to the previous software version.
 
Critical
Write Error
The block store could not save data to disk due to a media error.
Check the system logs to determine the root cause. Replace any disks that have failed. The alarm clears when you restart the optimization service.
graniteCoreError
(enterprises.17163.1.51.4.0.10005)
Degraded
Unknown Edge
The Edge device has connected to a SteelFusion Core that does not recognize the Edge device. Most likely the configuration present on the SteelFusion Core is missing an entry for the Edge. Check that the Edge is supplying the proper Edge ID by looking at the Branch Storage configuration on the Edge device.
 
 
SteelFusion Core Connectivity
The Edge does not have an active connection with the SteelFusion Core.
Check the network between the Edge and the Core; recheck the Edge configuration on the Core.
 
 
Inner Channel Down
The data channel between SteelFusion Core and the Edge is down.
Check the network between the Edge and the Core.
 
 
Keep-Alive Timeout
The connection between the SteelFusion Core and the Edge has stalled.
Check the network between the Edge and the Core.
graniteUncommittedDataError
(enterprises.17163.1.51.4.0.10006)
Degraded
The level of uncommitted data is too high.
The difference between the contents of the block store and the SteelFusion Core-side LUN is significant. This alarm checks for how much uncommitted data is in the Edge cache as a percentage of the total cache size.
This alarm triggers when the appliance writes a large amount of data very quickly, but the WAN pipe is not large enough to get the data back to the SteelFusion Core fast enough to keep the uncommitted data percentage below 5 percent. As long as data is being committed, the cache will flush eventually.
The threshold is 5 percent, which for a 4 TB (1260-4) system is 200 GB. To change the threshold, use the following CLI command:
[failover-peer] edge id <id> blockstore uncommitted [trigger-pct <percentage>] [repeat-pct <percentage>] [repeat-interval <minutes>]
 
For example:
Core3(config) # edge id Edge2 blockstore uncommitted trigger-pct 50 repeat-pct 25 repeat-interval 5
 
For details on the CLI command, see the SteelFusion Command-Line Interface Reference Manual.
To check that data is being committed, go to Reports > SteelFusion Edge: Blockstore Metrics on the Edge.
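The threshold arithmetic for graniteUncommittedDataError can be checked with a short calculation. This sketch assumes decimal units (1 TB = 1000 GB) and treats the 4 TB figure as the total cache size. The function names, and the choice of a strict greater-than comparison, are hypothetical.

```python
def uncommitted_threshold_gb(cache_size_tb, trigger_pct=5):
    """Return the uncommitted-data level, in GB, that corresponds to trigger_pct."""
    return cache_size_tb * 1000 * trigger_pct / 100


def uncommitted_alarm(uncommitted_gb, cache_size_tb, trigger_pct=5):
    """Return True when the uncommitted data exceeds the threshold (assumed strict)."""
    return uncommitted_gb > uncommitted_threshold_gb(cache_size_tb, trigger_pct)


print(uncommitted_threshold_gb(4))       # default 5 percent of a 4 TB cache, in GB
print(uncommitted_threshold_gb(4, 50))   # with trigger-pct 50, as in the CLI example
```

With the default of 5 percent, a 4 TB cache gives the 200 GB figure quoted in the alarm description.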
graniteHighAvailabilityError
(enterprises.17163.1.100.4.0.10507)
Critical
High availability module encountered error.
High availability module encountered an error.
graniteApplianceUnlicensedError
(enterprises.17163.1.100.4.0.10509)
Critical
Appliance license expired/invalid.
The SteelFusion appliance is not properly licensed.
vspServiceNotRunningError
(enterprises.17163.1.51.4.0.20002)
Critical
VSP service alarm is triggered.
The virtualization service is not running. The email notification indicates whether the alarm was triggered because the VSP service was disabled, restarted, or crashed.
Restart the VMware service.
virtCpuError
(enterprises.17163.1.51.4.0.20003)
Degraded
Virtualization CPU usage alarm triggered.
Average virtualization CPU utilization of the individual cores has exceeded an acceptable threshold. The default threshold is 90 percent.
If virtual CPU utilization spikes are frequent, it might be because the system is undersized. Sustained virtual CPU load can be symptomatic of more serious issues. Consult the CPU Utilization report (in Individual Cores display mode) to gauge how long the system has been loaded and also monitor the amount of traffic currently going through the appliance. An isolated spike in virtual CPU is normal but Riverbed recommends reporting extended high CPU utilization to Riverbed Support. No other action is necessary as the alarm clears automatically.
Some of the virtual CPU cores whose loads can trigger this alarm are shared by RiOS. This alarm might trigger due to CPU-intensive activities on your virtual machines. If you find this alarm triggers too often, you can increase the trigger thresholds or you can disable the Virtualization CPU utilization alarm.
esxiVersionUnsupportedError
(enterprises.17163.1.51.4.0.20004)
Needs Attention
ESXi version alarm is triggered.
Indicates that the version of ESXi running is unsupported.
esxiCommunicationFailedError
(enterprises.17163.1.51.4.0.20005)
Needs Attention
ESXi communication alarm is triggered.
Indicates that RiOS cannot communicate with ESXi or the ESXi password is not synchronized with RiOS.
Changing the ESXi password using VNC or vSphere triggers this trap in RiOS. When the passwords are not synchronized, RiOS cannot communicate with ESXi.
Make sure that the ESXi RiOS Management IP address is correct or choose EX Features > Virtualization: Virtual Services Platform and enter the password.
esxiNotSetupError
(enterprises.17163.1.51.4.0.20006)
Needs Attention
ESXi setup alarm is triggered.
Indicates that ESXi has not yet been set up on a freshly installed appliance. Complete the initial configuration wizard to enable VSP for the first time. The alarm clears after ESXi installation begins.
esxiDiskCreationFailedError
(enterprises.17163.1.51.4.0.20007)
Critical
ESXi disk creation alarm is triggered.
Indicates that the ESXi disk creation failed during the VSP setup.
vspUnsupportedVmCountError
(enterprises.17163.1.51.4.0.20008)
 
Needs Attention
VSP unsupported VM count alarm is triggered.
Indicates that the number of virtual machines powered on exceeds five.
esxiMemoryOvercommittedError
(enterprises.17163.1.51.4.0.20009)
Needs Attention
 
ESXi memory overcommitted alarm is triggered.
Indicates that the total memory assigned to powered-on VMs is more than the total memory available to ESXi for the VMs. To view this number in vSphere Client, choose Allocation > Memory > Total Capacity.
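The overcommit condition can be expressed as a simple comparison. This sketch assumes each VM is described by a (name, assigned_mb, powered_on) tuple and that the total capacity is the value read from vSphere Client. The tuple shape and the function name are hypothetical.

```python
def memory_overcommitted(vms, total_capacity_mb):
    """Return True when powered-on VMs are assigned more memory than ESXi has for VMs.

    vms: iterable of (name, assigned_mb, powered_on) tuples (hypothetical shape).
    """
    # Only powered-on VMs count toward the total, per the alarm description.
    assigned = sum(mb for _name, mb, powered_on in vms if powered_on)
    return assigned > total_capacity_mb
```

Powered-off VMs are ignored, so powering one on can trigger the alarm even though no memory assignment changed.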
esxiLicenseExpiredError
(enterprises.17163.1.51.4.0.20010)
Degraded
ESXi license expired alarm is triggered.
Indicates that the ESXi license has expired.
esxiLicenseExpiringError
(enterprises.17163.1.51.4.0.20011)
Degraded
ESXi license expiring alarm is triggered.
Indicates that the ESXi license is going to expire within two weeks.
esxiTrialLicenseError
(enterprises.17163.1.51.4.0.20012)
Degraded
ESXi trial license alarm is triggered.
Indicates that ESXi is using a trial license.
esxiVswitchMtuUnsupportedError
(enterprises.17163.1.51.4.0.20013)
Needs Attention
ESXi unsupported vSwitch MTU alarm is triggered.
Indicates that a vSwitch with an uplink or a vmknic interface is configured with the maximum transmission unit (MTU) larger than 1500. Jumbo frames larger than 1500 are not supported. Reconfigure the MTU to 1500 or lower.
esxiInitialConfigFailedError
(enterprises.17163.1.51.4.0.20014)
Needs Attention
ESXi initial config failed alarm is triggered.
Indicates an ESXi configuration error. For more information, review the SteelHead logs.
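Trap receivers usually report fully numeric OIDs, while the trap entries above use the abbreviated enterprises.17163... form. As a rough sketch, the helper below expands the abbreviated form using the standard enterprises prefix (1.3.6.1.4.1) and builds a small lookup table from a few of the traps listed above. The function and table names are hypothetical, and the table is deliberately incomplete.

```python
ENTERPRISES_PREFIX = "1.3.6.1.4.1"  # iso.org.dod.internet.private.enterprises


def to_numeric_oid(short_oid):
    """Expand 'enterprises.17163...' into a fully numeric OID with a leading dot."""
    head, sep, rest = short_oid.partition(".")
    if head != "enterprises" or not sep or not rest:
        raise ValueError("expected an OID that starts with 'enterprises.'")
    return "." + ENTERPRISES_PREFIX + "." + rest


# A few entries copied from the trap list above; extend as needed.
TRAP_NAMES = {
    to_numeric_oid("enterprises.17163.1.51.4.0.64"): "optimizationServiceStatusError",
    to_numeric_oid("enterprises.17163.1.51.4.0.70"): "linkDuplex",
    to_numeric_oid("enterprises.17163.1.51.4.0.71"): "linkIoErrors",
}

print(to_numeric_oid("enterprises.17163.1.51.4.0.64"))
```

A monitoring script could then look up the OID of a received trap in TRAP_NAMES to find the matching entry in this reference.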