SteelFusion Edge MIB
This appendix provides a reference to the SteelFusion Edge Enterprise MIB and SNMP traps. These tools allow for easy management of the SteelFusion Edges and straightforward integration into existing network management systems.
This appendix includes these sections:
Accessing the SteelFusion Edge enterprise MIB
SNMP traps
Accessing the SteelFusion Edge enterprise MIB
The SteelFusion Edge enterprise MIB monitors device status and peers. It provides network statistics for seamless integration into network management systems such as Hewlett-Packard OpenView Network Node Manager, PRTG, and other SNMP browser tools.
For details on configuring and using these network monitoring tools, consult their product documentation.
The following guidelines describe how to download and access the SteelFusion Edge enterprise MIB using common MIB browsing utilities:
You can download the SteelFusion Edge enterprise MIB file (STEELFUSION-MIB.txt) from the Support page of the Management Console or from the Riverbed Support site at https://support.riverbed.com and load it into any MIB browser utility.
Some utilities might expect a file type other than a text file. If this occurs, change the file extension to the type required by the utility you have chosen.
Some utilities assume that the root is mib-2 by default. If the utility sees a new node, such as enterprises, it might look under mib-2.enterprises. If this occurs, use .iso.org.dod.internet.private.enterprises.rbt as the root.
Some command-line browsers might not load all MIB files by default. If this occurs, find the appropriate command option to load the STEELFUSION-MIB.txt file: for example, for NET-SNMP browsers, snmpwalk -m all.
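The root named in the guidelines above can also be written in dotted-decimal form, which some MIB browsers accept when symbolic names fail to resolve. The following is a minimal sketch, not part of the product: it translates the symbolic path using the standard SMI arc numbers and the Riverbed enterprise number 17163 that appears in the trap OIDs later in this appendix. The function name to_numeric_oid is hypothetical.

```python
# Standard SMI arc numbers for the path down to the enterprises node,
# plus the Riverbed enterprise number (17163) used by the trap OIDs
# listed in this appendix.
ARCS = {
    "iso": 1,
    "org": 3,
    "dod": 6,
    "internet": 1,
    "private": 4,
    "enterprises": 1,
    "rbt": 17163,
}


def to_numeric_oid(symbolic: str) -> str:
    """Translate a dotted symbolic OID into dotted-decimal form.

    Labels that are already numeric pass through unchanged; unknown
    labels raise KeyError rather than being guessed.
    """
    parts = [p for p in symbolic.split(".") if p]
    numbers = [p if p.isdigit() else str(ARCS[p]) for p in parts]
    return "." + ".".join(numbers)


print(to_numeric_oid(".iso.org.dod.internet.private.enterprises.rbt"))
```

The lookup table deliberately stops at rbt; nodes below it are defined in STEELFUSION-MIB.txt and should be resolved by loading that file rather than by hard-coding numbers.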
Retrieving optimized traffic statistics by port
When you perform an snmpwalk on the SteelFusion Edge MIB object bwPortTable to display a table of statistics for optimized traffic by port, the command retrieves only the monitored ports. The monitored ports include the default TCP ports and any ports you add. To view the monitored ports that this object returns, choose Administration > System Settings: Monitored Ports or enter this command at the system prompt:
show stats settings bandwidth ports
To retrieve statistics for an individual port, perform an snmpget for that port, as in this example:
.iso.org.dod.internet.private.enterprises.rbt.products.steelhead.statistics.bandwidth.bandwidthPerPort.bwPortTable.bwPortEntry.bwPortOutLan.port_number
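A per-port query like the one above can be assembled programmatically. The sketch below only builds a Net-SNMP snmpget argument list; it does not contact an appliance. The host name, community string, choice of SNMPv2c, and the function name snmpget_args are placeholder assumptions, and the port number is appended as the final index, as in the example above.

```python
# Symbolic OID prefix taken from the example above; the port number is
# appended as the table index.
BW_PORT_OUT_LAN = (
    ".iso.org.dod.internet.private.enterprises.rbt.products.steelhead."
    "statistics.bandwidth.bandwidthPerPort.bwPortTable.bwPortEntry."
    "bwPortOutLan"
)


def snmpget_args(host: str, community: str, port_number: int) -> list[str]:
    """Return an snmpget command line for one monitored port.

    Assumes SNMPv2c with a community string; adjust for SNMPv3.
    """
    if not 0 < port_number < 65536:
        raise ValueError("port_number must be a valid TCP port")
    return [
        "snmpget",
        "-v", "2c",       # SNMP version (assumption)
        "-c", community,  # community string (placeholder)
        "-m", "ALL",      # load all MIB files so symbolic names resolve
        host,
        f"{BW_PORT_OUT_LAN}.{port_number}",
    ]


# Hypothetical appliance name and community string.
print(" ".join(snmpget_args("edge.example.com", "public", 80)))
```

Returning a list rather than a single string keeps each argument separate, so the result can be passed to a process launcher without shell quoting concerns.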
SNMP traps
Every appliance supports SNMP traps and email alerts for conditions that require attention or intervention. Most events, though not all, trigger an alarm, and the system sends the related trap. For most events, when the condition clears, the system clears the alarm and also sends a clear trap. The clear traps are useful in determining when an event has been resolved.
This section describes the SNMP traps. It does not list the corresponding clear traps.
RiOS includes support for SNMPv3.
You can view SteelFusion Edge appliance health at the top of each Management Console page, by entering the show info command, and through SNMP (health, systemHealth).
The Edge tracks key hardware and software metrics and alerts you of any potential problems so that you can quickly discover and diagnose issues. The health of an appliance falls into one of these states:
Healthy - The SteelFusion Edge is functioning and optimizing traffic.
Needs Attention - Accompanies a healthy state to indicate management-related issues not affecting the ability of the SteelFusion Edge to optimize traffic.
Degraded - The SteelFusion Edge is optimizing traffic but the system has detected an issue.
Admission Control - The SteelFusion Edge is optimizing traffic but has reached its connection limit.
Critical - The SteelFusion Edge might or might not be optimizing traffic; you must address a critical issue.
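A monitoring script might model the health states above as follows. This is a minimal sketch: the enum, the OPTIMIZING map, and parse_health are hypothetical names, and the assumption that the appliance reports the state using exactly these strings is not confirmed by this appendix.

```python
from enum import Enum


class Health(Enum):
    """Appliance health states, as described in the list above."""
    HEALTHY = "Healthy"
    NEEDS_ATTENTION = "Needs Attention"
    DEGRADED = "Degraded"
    ADMISSION_CONTROL = "Admission Control"
    CRITICAL = "Critical"


# Whether the Edge is optimizing traffic in each state. None means
# "might or might not be", which is how the Critical state is described.
OPTIMIZING = {
    Health.HEALTHY: True,
    Health.NEEDS_ATTENTION: True,
    Health.DEGRADED: True,
    Health.ADMISSION_CONTROL: True,
    Health.CRITICAL: None,
}


def parse_health(text: str) -> Health:
    """Map a reported state string to a Health member (case-insensitive)."""
    return Health(text.strip().title())


state = parse_health("admission control")
print(state.value, OPTIMIZING[state])
```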
This table summarizes the SNMP traps sent from the system to configured trap receivers and their effect on the Edge’s health state.
Each entry lists, in this order: Trap and OID, Appliance state, Text, and Description.
procCrash
(enterprises.17163.1.52.4.0.1)
Healthy
A procCrash trap signifies that a process managed by PM has crashed and left a core file. The variable sent with the notification indicates which process crashed.
A process has crashed and subsequently been restarted by the system. The trap contains the name of the process that crashed. A system snapshot associated with this crash has been created on the appliance and is accessible via the CLI or the Management Console. Riverbed Support might need this information to determine the cause of the crash. No other action is required on the appliance as the crashed process is automatically restarted.
procExit
(enterprises.17163.1.52.4.0.2)
Healthy
A procExit trap signifies that a process managed by PM has exited unexpectedly, but not left a core file. The variable sent with the notification indicates which process exited.
A process has unexpectedly exited and been restarted by the system. The trap contains the name of the process. The process might have exited automatically or due to other process failures on the appliance. Review the release notes for known issues related to this process exit. If none exist, contact Riverbed Support to determine the cause of this event. No other action is required on the appliance as the crashed process is automatically restarted.
cpuUtil
(enterprises.17163.1.52.4.0.3)
Degraded
The average CPU utilization in the past minute has gone above the acceptable threshold.
Average CPU utilization has exceeded an acceptable threshold. If CPU utilization spikes are frequent, it might be because the system is undersized. Sustained CPU load can be symptomatic of more serious issues. Consult the CPU Utilization report to gauge how long the system has been loaded and also monitor the amount of traffic currently going through the appliance. A one-time spike in CPU is normal but we recommend reporting extended high CPU utilization to Riverbed Support. No other action is necessary as the alarm clears automatically.
pagingActivity
(enterprises.17163.1.52.4.0.4)
Degraded
The system has been paging excessively (thrashing).
The system is running low on memory and has begun swapping memory pages to disk. This event can be triggered during a software upgrade while the optimization service is still running but there can be other causes. If this event triggers at any other time, generate a debug sysdump and send it to Riverbed Support. No other action is required as the alarm clears automatically.
smartError
(enterprises.17163.1.52.4.0.5)
This alarm is deprecated.
peerVersionMismatch
(enterprises.17163.1.52.4.0.6)
Degraded
Detected a peer with a mismatched software version.
The appliance has encountered another appliance that is running an incompatible version of system software. Refer to the CLI, Management Console, or the SNMP peer table to determine which appliance is causing the conflict. Connections with that peer will not be optimized; connections with other peers running compatible RiOS versions are unaffected. To resolve the problem, upgrade your system software. No other action is required as the alarm clears automatically.
bypassMode
(enterprises.17163.1.52.4.0.7)
Critical
The appliance has entered bypass (failthru) mode.
The appliance has entered bypass mode and is now passing through all traffic unoptimized. This error is generated if the optimization service locks up or crashes. It can also be generated when the system is first powered on or powered off. If this trap is generated on a system that was previously optimizing and is still running, contact Riverbed Support.
raidError
(enterprises.17163.1.52.4.0.8)
Deprecated
An error has been generated by the RAID array.
A drive has failed in a RAID array. Consult the CLI or Management Console to determine the location of the failed drive. Contact Riverbed Support for assistance with installing a new drive, a RAID rebuild, or drive reseating. The appliance continues to optimize during this event. After the error is corrected, the alarm clears automatically.
Note: Applicable to models 3010, 3510, 3020, 3520, 5010, 5520, 6020, and 6120 only.
storeCorruption
(enterprises.17163.1.52.4.0.9)
Critical
The data store is corrupted.
Indicates that the RiOS data store is corrupt or has become incompatible with the current configuration. To clear the RiOS data store of data, choose Administration > Maintenance: Services, select Clear Data Store, and click Restart to restart the optimization service.
If the alarm was triggered by an unintended change to the configuration, change the configuration to match the previous RiOS data store settings. Then restart the optimization service without clearing the data store to reset the alarm.
Typical configuration changes that require an optimization restart with a clear RiOS data store are enabling enhanced peering or changing the data store encryption.
admissionMemError
(enterprises.17163.1.52.4.0.10)
Admission Control
Admission control memory alarm has been triggered.
The appliance has entered admission control due to memory consumption. The appliance is optimizing traffic beyond its rated capability and is unable to handle the amount of traffic passing through the WAN link. During this event, the appliance continues to optimize existing connections, but new connections are passed through without optimization. No other action is necessary as the alarm clears automatically when the traffic has decreased.
admissionConnError
(enterprises.17163.1.52.4.0.11)
Admission Control
Admission control connections alarm has been triggered.
The appliance has entered admission control due to the number of connections and is unable to handle the amount of connections going over the WAN link. During this event, the appliance continues to optimize existing connections, but new connections are passed through without optimization. No other action is necessary as the alarm clears automatically when the traffic has decreased.
haltError
(enterprises.17163.1.52.4.0.12)
Critical
The service is halted due to a software error.
The optimization service has halted due to a serious software error. Check whether a core dump or a system dump was created. If so, retrieve it and contact Riverbed Support immediately.
serviceError
(enterprises.17163.1.52.4.0.13)
Degraded
There has been a service error. Please consult the log file.
The optimization service has encountered a condition that might degrade optimization performance. Consult the system log for more information. No other action is necessary.
scheduledJobError
(enterprises.17163.1.52.4.0.14)
Healthy
A scheduled job has failed during execution.
A scheduled job on the system (for example, a software upgrade) has failed. To determine which job failed, use the CLI or the Management Console.
confModeEnter
(enterprises.17163.1.52.4.0.15)
Healthy
A user has entered configuration mode.
A user on the system has entered a configuration mode from either the CLI or the Management Console. A log in to the Management Console by user admin sends this trap as well. This alarm is for notification purposes only; no other action is necessary.
confModeExit
(enterprises.17163.1.52.4.0.16)
Healthy
A user has exited configuration mode.
A user on the system has exited configuration mode from either the CLI or the Management Console. A log out of the Management Console by user admin sends this trap as well. This alarm is for notification purposes only; no other action is necessary.
linkError
(enterprises.17163.1.52.4.0.17)
Degraded
An interface on the appliance has lost its link.
The system has lost one of its Ethernet links, typically due to an unplugged cable or dead switch port. Check the physical connectivity between the SteelHead and its neighbor device. Investigate this alarm as soon as possible. Depending on which link is down, the system might no longer be optimizing and a network outage could occur.
This alarm is often caused by interface transitions on surrounding devices, such as routers or switches. This alarm also accompanies service or system restarts on the SteelHead.
nfsV2V4
(enterprises.17163.1.52.4.0.18)
Degraded
NFS v2/v4 alarm notification.
The SteelHead has detected that either NFSv2 or NFSv4 is in use. The SteelHead only supports NFSv3 and passes through all other versions. Check that the clients and servers are using NFSv3 and reconfigure if necessary.
powerSupplyError
(enterprises.17163.1.52.4.0.19)
Degraded
A power supply on the appliance has failed (not supported on all models).
A redundant power supply on the appliance has failed and needs to be replaced. Contact Riverbed Support for an RMA replacement as soon as practically possible.
asymRouteError
(enterprises.17163.1.52.4.0.20)
Needs Attention
Asymmetric routes detected, certain connections might not be optimized because of this.
Asymmetric routing has been detected on the network. This alarm is likely due to a failover event of an inner router or VPN. If so, no action needs to be taken. If not, contact Riverbed Support for further troubleshooting assistance.
fanError
(enterprises.17163.1.52.4.0.21)
Degraded
A fan has failed on this appliance (not supported on all models).
A fan is failing or has failed and needs to be replaced. Contact Riverbed Support for an RMA replacement as soon as practically possible.
memoryError
(enterprises.17163.1.52.4.0.22)
Degraded
A memory error has been detected on the appliance (not supported on all models).
A memory error has been detected. A system memory stick might be failing. Try reseating the memory first. If the problem persists, contact Riverbed Support for an RMA replacement as soon as practically possible.
ipmi
(enterprises.17163.1.52.4.0.23)
Degraded
An IPMI event has been detected on the appliance. Please check the details in the alarm report on the web UI (not supported on all models).
An Intelligent Platform Management Interface (IPMI) event has been detected. Check the Alarm Status page for more detail. You can also view the IPMI events on the SteelHead, by entering this command:
show hardware error-log all
configChange
(enterprises.17163.1.52.4.0.24)
Healthy
A change has been made to the system configuration.
A configuration change has been detected. Check the log files around the time of this trap to determine what changes were made and whether they were authorized.
datastoreWrapped
(enterprises.17163.1.52.4.0.25)
Healthy
The datastore has wrapped around.
The RiOS data store on the SteelHead went through an entire cycle and is removing data to make space for new data. This alarm is normal behavior unless it wraps too quickly, which might indicate that the RiOS data store is undersized. If a message is received every seven days or less, investigate traffic patterns and RiOS data store sizing.
temperatureWarning
(enterprises.17163.1.52.4.0.26)
Degraded
The system temperature has exceeded the threshold.
The appliance temperature is a configurable notification. By default, this notification is set to trigger when the appliance reaches 70 degrees Celsius. Raise the alarm trigger temperature if it is normal for the Edge to get that hot, or reduce the temperature of the Edge.
temperatureCritical
(enterprises.17163.1.52.4.0.27)
Critical
The system temperature has reached a critical stage.
This trap/alarm triggers a critical state on the appliance. This alarm occurs when the appliance temperature reaches 90 degrees Celsius. The temperature value is not user-configurable. Reduce the appliance temperature.
cfConnFailure
(enterprises.17163.1.52.4.0.28)
Degraded
Unable to establish connection with the specified neighbor.
The connection cannot be established with a connection-forwarding neighbor. This alarm clears automatically the next time all neighbors connect successfully.
cfConnLostEos
(enterprises.17163.1.52.4.0.29)
Degraded
Connection lost since end of stream was received from the specified neighbor.
The connection has been closed by the connection-forwarding neighbor. This alarm clears automatically the next time all neighbors connect successfully.
cfConnLostErr
(enterprises.17163.1.52.4.0.30)
Degraded
Connection lost due to an error communicating with the specified neighbor.
The connection has been lost with the connection-forwarding neighbor due to an error. This alarm clears automatically the next time all neighbors connect successfully.
cfKeepaliveTimeout
(enterprises.17163.1.52.4.0.31)
Degraded
Connection lost due to lack of keep-alives from the specified neighbor.
The connection-forwarding neighbor has not responded to a keepalive message within the time-out period, indicating that the connection has been lost. This alarm clears automatically when all neighbors of the SteelHead are responding to keepalive messages within the time-out period.
cfAckTimeout
(enterprises.17163.1.52.4.0.32)
Degraded
Connection lost due to lack of ACKs from the specified neighbor.
The connection has been lost because requests have not been acknowledged by a connection-forwarding neighbor within the set time-out threshold. This alarm clears automatically the next time all neighbors receive an ACK from this neighbor and the latency of that acknowledgment is less than the set time-out threshold.
cfReadInfoTimeout
(enterprises.17163.1.52.4.0.33)
Degraded
Timeout reading info from the specified neighbor.
The SteelHead has timed out while waiting for an initialization message from the connection-forwarding neighbor. This alarm clears automatically when the SteelHead is able to read the initialization message from all of its neighbors.
cfLatencyExceeded
(enterprises.17163.1.52.4.0.34)
Degraded
Connection forwarding latency with the specified neighbor has exceeded the threshold.
The amount of latency between connection-forwarding neighbors has exceeded the specified threshold. The alarm clears automatically when the latency falls below the specified threshold.
sslPeeringSCEPAutoReenrollError
(enterprises.17163.1.52.4.0.35)
Needs Attention
There is an error in the automatic re-enrollment of the SSL peering certificate.
An SSL peering certificate has failed to reenroll with the Simple Certificate Enrollment Protocol (SCEP).
crlError
(enterprises.17163.1.52.4.0.36)
Needs Attention
CRL polling fails.
The polling for SSL peering CAs has failed to update the Certificate Revocation List (CRL) within the specified polling period. This alarm clears automatically when the CRL is updated.
datastoreSyncFailure
(enterprises.17163.1.52.4.0.37)
Degraded
Data store sync has failed.
The RiOS data store synchronization between two SteelHeads has been disrupted and the RiOS data stores are no longer synchronized.
secureVaultNeedsUnlock
(enterprises.17163.1.52.4.0.38)
Needs Attention
SSL acceleration and the secure data store cannot be used until the secure vault has been unlocked.
The secure vault is locked. SSL traffic is not being optimized and the RiOS data store cannot be encrypted. Check the Alarm Status page for more details. The alarm clears when the secure vault is unlocked.
secureVaultNeedsRekey
(enterprises.17163.1.52.4.0.39)
Needs Attention
If you wish to use a nondefault password for the secure vault, the password must be rekeyed. Please see the Knowledge Base solution 5592 for more details.
The secure vault password needs to be verified or reset. Initially, the secure vault has a default password known only to the RiOS software so the SteelHead can automatically unlock the vault during system startup.
For details, check the Alarm Status page and see Knowledge Base solution 5592.
The alarm clears when you verify the default password or reset the password.
secureVaultInitError
(enterprises.17163.1.52.4.0.40)
Critical
An error was detected while initializing the secure vault. Please contact Riverbed Support.
An error occurred while initializing the secure vault after a RiOS software version upgrade. Contact Riverbed Support.
configSave
(enterprises.17163.1.52.4.0.41)
Healthy
The current appliance configuration has been saved.
A configuration has been saved either by entering the write memory command or by clicking Save in the Management Console. This message is for security notification purposes only; no other action is necessary.
tcpDumpStarted
(enterprises.17163.1.52.4.0.42)
Healthy
A TCP dump has been started.
A user has started a TCP dump on the SteelHead by entering the tcpdump or tcpdump-x command from the CLI. This message is for security notification purposes only; no other action is necessary.
tcpDumpScheduled
(enterprises.17163.1.52.4.0.43)
Healthy
A TCP dump has been scheduled.
A user has started a TCP dump on the SteelHead by entering the tcpdump or tcpdump-x command with a scheduled start time from the CLI. This message is for security notification purposes only; no other action is necessary.
newUserCreated
(enterprises.17163.1.52.4.0.44)
Healthy
A new user has been created.
A new role-based management user has been created using the CLI or the Management Console. This message is for security notification purposes only; no other action is necessary.
diskError
(enterprises.17163.1.52.4.0.45)
Degraded
Disk error has been detected.
A disk error has been detected. A disk might be failing. Try reseating the disk first. If the problem persists, contact Riverbed Support.
wearWarning
(enterprises.17163.1.52.4.0.46)
Degraded
Accumulated SSD write cycles passed predefined level.
Triggers on SteelHead models using Solid State Disks (SSDs).
An SSD has reached 95 percent of its write cycle limit. Contact Riverbed Support.
cliUserLogin
(enterprises.17163.1.52.4.0.47)
Healthy
A user has just logged in via CLI.
A user has logged in to the SteelHead using the command-line interface. This message is for security notification purposes only; no other action is necessary.
cliUserLogout
(enterprises.17163.1.52.4.0.48)
Healthy
A CLI user has just logged out.
A user has logged out of the SteelHead command-line interface by entering the quit command or pressing Ctrl+D. This message is for security notification purposes only; no other action is necessary.
webUserLogin
(enterprises.17163.1.52.4.0.49)
Healthy
A user has just logged in via the web UI.
A user has logged in to the SteelHead using the Management Console. This message is for security notification purposes only; no other action is necessary.
webUserLogout
(enterprises.17163.1.52.4.0.50)
Healthy
A user has just logged out via the web UI.
A user has logged out of the SteelHead using the Management Console. This message is for security notification purposes only; no other action is necessary.
trapTest
(enterprises.17163.1.52.4.0.51)
Healthy
Trap Test
An SNMP trap test has occurred on the SteelHead. This message is informational and no action is necessary.
admissionCpuError
(enterprises.17163.1.52.4.0.52)
Admission Control
Optimization service is experiencing high CPU utilization.
The appliance has entered admission control due to high CPU use. During this event, the appliance continues to optimize existing connections, but new connections are passed through without optimization. No other action is necessary as the alarm clears automatically when the CPU usage has decreased.
admissionTcpError
(enterprises.17163.1.52.4.0.53)
Admission Control
Optimization service is experiencing high TCP memory pressure.
The appliance has entered admission control due to high TCP memory use. During this event, the appliance continues to optimize existing connections, but new connections are passed through without optimization. No other action is necessary as the alarm clears automatically when the TCP memory pressure has decreased.
systemDiskFullError
(enterprises.17163.1.52.4.0.54)
Degraded
One or more system partitions is full or almost full.
The alarm clears when the system partitions fall below usage thresholds.
domainJoinError
(enterprises.17163.1.52.4.0.55)
Degraded
An attempt to join a domain failed.
An attempt to join a Windows domain has failed. The most common cause of a failed domain join is a significant difference between the system time on the Windows domain controller and the system time on the SteelHead. When the two times do not match, this error message appears:
lt-kinit: krb5_get_init_creds: Clock skew too great
 
We recommend using NTP time synchronization to synchronize the client and server clocks. It is critical that the SteelHead time is the same as the time on the Active Directory controller. Sometimes an NTP server is down or inaccessible, in which case there can be a time difference. You can also disable NTP if it is not being used and manually set the time. You must also verify that the time zone is correct.
A domain join can fail when the DNS server returns an invalid IP address for the domain controller. When a DNS misconfiguration occurs during an attempt to join a domain, these error messages appear:
Failed to join domain: failed to find DC for domain <domain-name>
Failed to join domain : No Logon Servers
 
Additionally, the domain join alarm triggers and messages similar to the following appear in the logs:
Oct 13 14:47:06 bravo-sh81 rcud[10014]: [rcud/main/.ERR] - {- -} Failed to join domain: failed to find DC for domain GEN-VCS78DOM.COM
 
When you encounter this error, go to the Networking > Networking: Host Settings page and verify that the DNS settings are correct. To verify the time settings, go to the Administration > System Settings: Date/Time page.
certsExpiringError
(enterprises.17163.1.52.4.0.56)
Needs Attention
Some x509 certificates may be expiring.
The service has detected some x.509 certificates used for Network Administration Access to the appliance that are close to their expiration dates. The alarm clears when the x.509 certificates are updated.
licenseError
(enterprises.17163.1.52.4.0.57)
Critical
The main SteelHead license has expired, been removed, or become invalid.
A license on the appliance has been removed, has expired, or is invalid. The alarm clears when a valid license is added or updated.
hardwareError
(enterprises.17163.1.52.4.0.58)
Critical or Degraded
Hardware error detected.
Indicates that the system has detected a problem with the appliance hardware. These issues trigger the hardware error alarm:
the appliance does not have enough disk, memory, CPU cores, or NIC cards to support the current configuration
the appliance is using a memory Dual In-line Memory Module (DIMM), a hard disk, or a NIC that is not qualified by Riverbed
a VSP upgrade requires additional memory or a memory replacement
other hardware issues
The alarm clears when you add the necessary hardware, remove the unqualified hardware, or resolve other hardware issues.
sysdetailError
(enterprises.17163.1.52.4.0.59)
Needs Attention
Error is found in System Detail Report.
A top-level module on the system detail report is in error. For details, choose Reports > Diagnostics: System Details.
admissionMapiError
(enterprises.17163.1.52.4.0.60)
Degraded
New MAPI connections will be passed through due to high connection count.
The total number of MAPI optimized connections has exceeded the maximum admission control threshold. By default, the maximum admission control threshold is 85 percent of the total maximum optimized connection count for the client-side appliance. The appliance reserves the remaining 15 percent so the MAPI admission control does not affect the other protocols. The 85 percent threshold is applied only to MAPI connections.
RiOS is now passing through MAPI connections from new clients but continues to intercept and optimize MAPI connections from existing clients (including new MAPI connections from these clients).
RiOS continues optimizing non-MAPI connections from all clients. This alarm is disabled by default.
The alarm clears automatically when the MAPI traffic has decreased; however, it can take one minute for the alarm to clear.
RiOS preemptively closes MAPI sessions to reduce the connection count in an attempt to bring the appliance out of admission control by bringing the connection count below the 85 percent threshold. RiOS closes the MAPI sessions in this order:
MAPI prepopulation connections
MAPI sessions with the largest number of connections
MAPI sessions with most idle connections
The oldest MAPI session
MAPI sessions exceeding the memory threshold.
Note: MAPI admission control cannot solve a general admission control error (admissionConnError, enterprises.17163.1.52.4.0.11); however, it can help to prevent it from occurring.
neighborIncompatibility
(enterprises.17163.1.52.4.0.61)
Degraded
Serial cascade misconfiguration has been detected.
Check your automatic peering configuration. Restart the optimization service to clear the alarm.
flashError
(enterprises.17163.1.52.4.0.62)
Needs Attention
Flash hardware error detected.
At times, the USB flash drive that holds the system images might become unresponsive; the appliance continues to function normally. When this alarm triggers, you cannot perform a software upgrade, as the system is unable to write a new upgrade image to the flash drive without first power cycling the system.
To reboot the appliance, go to the Administration > Maintenance: Reboot/Shutdown page or enter the reload command to automatically power cycle the appliance and restore the flash drive to its proper function.
lanWanLoopError
(enterprises.17163.1.52.4.0.63)
Critical
LAN-WAN loop detected. System will not optimize new connections until this error is cleared.
A LAN-WAN network loop has been detected between the LAN and WAN interfaces on a virtual appliance. This can occur when you connect the LAN and WAN virtual NICs to the same vSwitch or physical NIC. This alarm triggers when an appliance starts up, and clears after you connect each LAN and WAN virtual interface to a distinct virtual switch and physical NIC (through the vSphere Networking tab) and then reboot the virtual appliance.
optimizationServiceStatusError
(enterprises.17163.1.52.4.0.64)
Critical
Optimization service currently not optimizing any connections.
The optimization service has encountered an optimization service condition. The message indicates the reason for the condition:
optimization service is not running
This message appears after a configuration file error. For more information, review the logs.
in-path optimization is not enabled
This message appears if an in-path setting is disabled for an in-path appliance. For more information, review the logs.
optimization service is initializing
This message appears after a reboot. The alarm clears on its own; no other action is necessary. For more information, review the logs.
optimization service is not optimizing
This message appears after a system crash. For more information, review the logs.
optimization service is disabled by user
This message appears after entering the no service enable command or shutting down the optimization service from the Management Console. For more information, review the logs.
optimization service is restarted by user
This message appears after the optimization service is restarted from either the CLI or Management Console. You might want to review the logs for more information.
upgradeFailure
(enterprises.17163.1.52.4.0.65)
Needs Attention
Upgrade failed and the system is running the previous image.
A RiOS upgrade has failed and the appliance is running the previous RiOS version. Check the banner message in the Management Console to view more information. The banner message displays which upgrade failed along with the RiOS version the appliance has reverted to and is currently running.
Check that the upgrade image is correct for your appliance.
Verify that the upgrade image is not corrupt. You can use the MD5 checksum tool provided on the Riverbed Support site for the verification.
After you have confirmed that the image is not corrupt, upgrade the RiOS software again. If the upgrade continues to fail, contact Riverbed Support.
licenseExpiring
(enterprises.17163.1.52.4.0.66)
Needs Attention
One or more licensed features will expire within the next two weeks.
Choose Administration > Maintenance: Licenses and look at the Status column to see which licenses are about to expire. One or more feature licenses are scheduled to expire within two weeks.
This alarm is triggered per feature. Suppose you installed two license keys for a feature, LK1-FOO-xxx, which is going to expire in two weeks, and LK1-FOO-yyy, which is not expired. Because one license for the feature is valid, the alarm does not trigger.
licenseExpired
(enterprises.17163.1.52.4.0.67)
Degraded
One or more licensed features have expired.
Choose Administration > Maintenance: Licenses and look at the Status column to see which licenses have expired. One or more feature licenses have expired.
This alarm is triggered per feature. Suppose you installed two license keys for a feature, LK1-FOO-xxx (expired), and LK1-FOO-yyy (not expired). Because one license for the feature is valid, the alarm does not trigger.
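The per-feature behavior described for licenseExpiring and licenseExpired can be modeled with a short Python sketch. It assumes that the alarm for a feature follows the license key with the latest expiration date, and that a key expiring exactly two weeks from today counts as expiring; both are assumptions made for illustration, and `feature_alarm` is a hypothetical name, not part of RiOS.

```python
from datetime import date, timedelta

EXPIRING_WINDOW = timedelta(weeks=2)  # "within the next two weeks"


def feature_alarm(expiry_dates, today):
    """Return 'expired', 'expiring', or None for a single feature.

    expiry_dates holds the expiration date of every license key
    installed for the feature. One valid key is enough to keep the
    alarm from triggering, so only the latest date matters.
    """
    if not expiry_dates:
        return None
    best = max(expiry_dates)
    if best < today:
        return "expired"
    if best - today <= EXPIRING_WINDOW:
        return "expiring"
    return None
```

For example, a feature with one key that expired last month and a second key valid for another year returns None, which matches the LK1-FOO-xxx and LK1-FOO-yyy case above.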
clusterDisconnectedSHAlertError
(enterprises.17163.1.52.4.0.68)
Degraded
A cluster SteelHead has been reported as disconnected.
Choose Networking > Network Integration: Connection Forwarding and verify the configuration for both this appliance and the neighbor SteelHead. Verify that the neighbor is reachable from this appliance.
Next, check that the optimization service is running on both appliances.
This error clears when the configuration is valid.
smbAlert
(enterprises.17163.1.52.4.0.69)
Needs Attention
Domain authentication alert.
The optimization service has detected a failure with domain controller communication or a delegate user.
Confirm that the SteelHead residing in the data center is properly joined to the domain by choosing Networking > Windows Domain.
To view useful debugging information, enter the show protocol domain-auth test join and show alarm smb_alert commands.
Verify that a delegate user has been added to the appliance and is configured with the appropriate privileges.
linkDuplex
(enterprises.17163.1.52.4.0.70)
Degraded
An interface on the appliance is in half-duplex mode.
Indicates that an interface was not configured for half-duplex negotiation but has negotiated half-duplex mode. Half-duplex significantly limits the optimization service results.
Choose Networking > Networking: Base Interfaces and examine the appliance link configuration. Next, examine the peer switch user interface to check its link configuration. If the configuration on one side is different from the other, traffic is sent at different rates on each side, causing many collisions.
To troubleshoot, change both interfaces to automatic duplex negotiation. If the interfaces do not support automatic duplex, configure both ends for full duplex.
linkIoErrors
(enterprises.17163.1.52.4.0.71)
Degraded
An interface on the appliance is suffering I/O errors.
Indicates that the error rate on an interface has exceeded 0.1 percent while either sending or receiving packets. This threshold is based on the observation that even a small link error rate reduces TCP throughput significantly. A properly configured LAN connection should experience few errors. The alarm clears when the error rate drops below 0.05 percent.
To troubleshoot, try a new cable and a different switch port. Another possible cause is electromagnetic noise nearby.
You can change the default alarm thresholds by entering the alarm link_io_errors err-threshold <threshold-value> command at the system prompt. For details, see the Riverbed Command-Line Interface Reference Manual.
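The two thresholds give the alarm hysteresis: it rises above 0.1 percent and clears only below 0.05 percent, so an error rate that hovers near one threshold does not make the alarm flap. The Python sketch below models that behavior with the default values from this entry. The class name and the way error and packet counts are sampled are assumptions for illustration, not the appliance implementation.

```python
class LinkErrorAlarm:
    """Minimal model of a rise/clear (hysteresis) alarm on an error rate."""

    def __init__(self, raise_pct=0.1, clear_pct=0.05):
        self.raise_pct = raise_pct
        self.clear_pct = clear_pct
        self.active = False

    def update(self, errors, packets):
        """Feed one sample interval and return the current alarm state."""
        if packets == 0:
            return self.active  # no traffic in the interval, state unchanged
        rate_pct = 100.0 * errors / packets
        if not self.active and rate_pct > self.raise_pct:
            self.active = True
        elif self.active and rate_pct < self.clear_pct:
            self.active = False
        return self.active
```

A rate of 0.08 percent therefore keeps an existing alarm active but does not raise a new one.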
storageProfSwitchFailed
Either Critical or Needs Attention, depending on the state
Storage profile switch failed.
Indicates that an error has occurred while repartitioning the disk drives during a storage profile switch. The repartitioning was unsuccessful.
A profile switch changes the disk space allocation on the drives to allow VSP to use varying amounts of storage. It also clears the SteelFusion and VSP data stores, and repartitions the data stores to the appropriate sizes.
You switch a storage profile by entering the disk-config layout CLI command at the system prompt or by choosing Administration > System Settings: Disk Management and selecting a storage profile.
A storage profile switch requires a reboot of the Edge. The alarm appears after the reboot.
These reasons can cause a profile switch to fail:
RiOS can’t validate the profile.
The profile contains an invalid upgrade or downgrade.
RiOS can’t clean up the existing VMDKs. During cleanup, RiOS uninstalls all slots and deletes all backups and packages.
When you encounter this error, reboot the Edge and then switch the storage profile again. If the switch succeeds, the error clears. If it fails, RiOS reverts the Edge to the previous storage profile.
If RiOS successfully reverts the Edge to the previous storage profile, the alarm status displays needs attention.
If RiOS is unable to revert the Edge to the previous storage profile, the alarm status becomes critical.
clusterIpv6IncompatiblePeerError
(enterprises.17163.1.52.4.0.74)
Degraded
A cluster SteelHead has been reported as IPv6 incompatible.
The optimization service has encountered a peer IPv6 incompatibility. The message indicates the reason for the condition:
Not all local inpath interfaces configured for IPv6
This message indicates that the peer appliance is IPv6 capable and its IP address configuration is correct, but the IP address configuration on the local appliance does not match the configuration on the peer appliance. The mismatch means that there is at least one relay on the local appliance that is not IPv4 or IPv6 capable. An IPv4 address is necessary for routing between neighbors and an IPv6 address is necessary for v6 optimization.
Not all peer inpath interfaces configured for IPv6
This message indicates that the local appliance is IPv6 capable and its IP address configuration is correct, but the IP address configuration on the peer appliance does not match the configuration on the local appliance. The mismatch means that there is at least one relay on the peer that is not IPv4 or IPv6 capable. An IPv4 address is necessary for routing between neighbors and an IPv6 address is necessary for v6 optimization.
Cluster IPv6 Incompatible
Indicates that a connection-forwarding neighbor is running a RiOS version that is incompatible with IPv6. Neighbors must be running RiOS 8.5 or later. The neighbors pass through IPv6 connections when this alarm triggers.
flashProtectionFailed
(enterprises.17163.1.52.4.0.75)
Critical
Flash disk hasn't been backed up due to not enough free space on /var filesystem.
Indicates that the USB flash drive has not been backed up because there is not enough available space in the /var filesystem directory.
Examine the /var directory to see if it is storing an excessive amount of snapshots, system dumps, or TCP dumps that you could delete. You could also delete any RiOS images that you no longer use.
datastoreNeedClean
(enterprises.17163.1.52.4.0.76)
Critical
The data store needs to be cleaned.
You need to clear the RiOS data store. To clear the data store, choose Administration > Maintenance: Services and select the Clear Data Store check box before restarting the appliance.
Clearing the data store degrades performance until the system repopulates the data.
pathSelectionPathDown
(enterprises.17163.1.52.4.0.77)
Degraded
Path Selection - A path went down.
Indicates that one of the predefined paths for a connection is unavailable because it has exceeded either the timeout value for path latency or the threshold for observed packet loss.
When a path fails, the appliance directs traffic through another available path. When the original path comes back up, the appliance redirects the traffic back to it.
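The failover behavior can be pictured with the following Python sketch. It assumes that paths are evaluated in preference order and that a path is down when either its measured latency or its packet loss exceeds the configured limit. The names and the numeric limits are hypothetical and are not taken from the appliance configuration.

```python
from dataclasses import dataclass


@dataclass
class PathStatus:
    name: str
    latency_ms: float  # latest probe latency
    loss_pct: float    # observed packet loss


def is_up(path, latency_timeout_ms, loss_threshold_pct):
    """A path is up only if it is within both limits."""
    return (path.latency_ms <= latency_timeout_ms
            and path.loss_pct <= loss_threshold_pct)


def choose_path(paths, latency_timeout_ms, loss_threshold_pct):
    """Return the name of the first available path in preference order."""
    for path in paths:
        if is_up(path, latency_timeout_ms, loss_threshold_pct):
            return path.name
    return None  # no predefined path is available
```

Because the list is re-evaluated in preference order each time, traffic returns to the original path as soon as it is within both limits again.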
clusterNeighborIncompatibleError
(enterprises.17163.1.52.4.0.80)
Degraded
At least one node in the cluster is incompatible.
Indicates that a cluster neighbor is running a RiOS version that does not support the connection between neighbors. Neighbors must be running RiOS 8.6.x or later.
secureTransportControllerUnreachable
(enterprises.17163.1.52.4.0.81)
 
SteelHead cannot connect to Secure Transport controller.
Indicates that a peer appliance is no longer connected to the SteelHead controller. The controller is a SteelHead that typically resides in the data center and manages the control channel and operations required for secure transport between peers. The control channel between the peers uses SSL to secure the connection between the peer appliance and the SteelHead controller.
The peer appliance is no longer connected to the SteelHead controller because:
The connectivity between the peer appliance and the SteelHead controller is lost.
The SSL for the connection is not configured correctly.
secureTransportRegistrationFailed
(enterprises.17163.1.52.4.0.82)
 
SteelHead cannot register with Secure Transport controller.
Indicates that the peer appliance is not registered with the SteelHead controller and the controller does not recognize it as a member of the secure transport group.
pathSelectionPathProbingError
(enterprises.17163.1.52.4.0.83)
Needs Attention
Path Selection - At least one path has probing error.
Indicates that a path selection monitoring probe for a predefined path has received a probe response from an unexpected relay or interface.
webProxyConfigAlarm
(enterprises.17163.1.52.4.0.84)
Degraded
Web Proxy Service Configuration Alarm
Indicates that there is a problem with the web proxy service configuration.
webProxyServiceAlarm
(enterprises.17163.1.52.4.0.85)
Degraded
Web Proxy Service Status Alarm
Indicates that there is a problem with the web proxy service.
portalUnReachableAlarm
(enterprises.17163.1.52.4.0.86)
Degraded
The Riverbed Cloud Portal is unreachable from the SteelHead.
 
temperatureNormal
(enterprises.17163.1.52.4.0.1026)
Healthy
The system temperature is back within the threshold.
No action is necessary.
temperatureNonCritical
(enterprises.17163.1.52.4.0.1027)
Healthy
The system temperature is no longer in a critical stage.
 
cfConnRestored
(enterprises.17163.1.52.4.0.1028)
 
Connection reestablished with the specified neighbor.
 
secureTransportControllerConnected
(enterprises.17163.1.52.4.0.1081)
 
SteelHead successfully connected to Secure Transport controller.
 
secureTransportRegistrationSucceeded
(enterprises.17163.1.52.4.0.1082)
 
SteelHead successfully registered with Secure Transport controller.
 
graniteLunError
(enterprises.17163.1.52.4.0.10000)
Critical or Degraded
Storage module encountered error.
A LUN has become unavailable. Check whether the data center LUN was taken offline in the Core while I/O operations were in progress.
graniteISCSIError
(enterprises.17163.1.52.4.0.10001)
Degraded
Storage Protocol module encountered error.
An iSCSI target on the Edge is not accessible. Review the iSCSI configuration in Core.
graniteSnapshotError
(enterprises.17163.1.52.4.0.10003)
Degraded
Snapshot or timeout error.
A snapshot failed to be committed to the SAN, or a snapshot has failed to complete due to Windows timing out.
Check the Core logs for details. Retry the Windows snapshot.
graniteBlockstoreError
(enterprises.17163.1.52.4.0.10004)
Degraded
Disk space low
The blockstore is running out of space. This triggers when only 5 percent of space is available in the blockstore.
Check your WAN connection as well as connectivity to the Core. This can also happen if clients write more data than can be sent over the WAN for a prolonged period of time.
 
Critical
Disk space full
The blockstore is out of space.
Check your WAN connection as well as connectivity to the Core. This can also happen if clients write more data than can be sent over the WAN for a prolonged period of time.
 
Degraded
Memory Low
The blockstore is running out of memory.
This alarm indicates a temporary condition caused by too much I/O. Limit the number of active prepop sessions. Check if the IOPS exceeds the model recommendation.
 
Degraded
Read Error
The blockstore could not read data that was already replicated to the Core. Clients will not see any error because the Edge will fetch the data from the Core.
Check the system logs to determine the root cause. Replace any disks that have failed. The alarm clears when you restart the optimization service.
 
Critical
Critical Read Error
The blockstore could not read data that is not yet replicated to the Core.
Check the system logs to determine the root cause. Replace any disks that have failed. The alarm clears when you restart the optimization service.
 
Critical
Startup Failed
The blockstore failed to start due to disk errors or an incorrect configuration.
Check the system logs to determine the root cause.
graniteCoreError
(enterprises.17163.1.52.4.0.10005)
Degraded
Unknown Edge
The Edge appliance has connected to a Core that does not recognize the Edge appliance. Most likely the configuration present on the Core is missing an entry for the Edge. Check that the Edge is supplying the proper Edge ID by looking at the Edge storage configuration on the Edge device.
 
Degraded
SteelFusion Core Connectivity
The Edge does not have an active connection with the Core.
Check the network between the Edge and the Core; recheck the Edge configuration on the SteelFusion Core.
 
Degraded
Inner Channel Down
The data channel between the Core and the Edge is down.
Check the network between the Edge and the Core.
 
Degraded
Keep-Alive Timeout
The connection between the Core and the Edge has stalled.
Check the network between the Edge and the Core.
graniteUncommittedDataError
(enterprises.17163.1.52.4.0.10006)
Degraded
The level of uncommitted data is too high.
The difference between the contents of the blockstore and the Core-side LUN is significant. This alarm checks how much uncommitted data is in the Edge cache as a percentage of the total cache size.
This alarm triggers when the appliance writes a large amount of data very quickly, but the WAN pipe is not large enough to get the data back to the Core fast enough to keep the uncommitted data percentage below 5 percent. As long as data is being committed, the cache will flush eventually.
The threshold is 5 percent, which for a 4 TB (1260-4) system is 200 GB. To change the threshold, use this command:
[failover-peer] edge id <id> blockstore uncommitted [trigger-pct <percentage>] [repeat-pct <percentage>] [repeat-interval <minutes>]
 
For example:
Core3(config) # edge id Edge2 blockstore uncommitted trigger-pct 50 repeat-pct 25 repeat-interval 5
 
For details on the CLI command, see the SteelFusion Command-Line Interface Reference Manual.
To check that data is being committed, go to Storage > Reports: Blockstore Metrics on the Edge.
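The threshold arithmetic for this alarm can be reproduced with a few lines of Python. The sketch assumes decimal units (1 TB = 10^12 bytes) and that the alarm triggers once the uncommitted percentage reaches the trigger value. The function names are hypothetical, and the semantics of repeat-pct and repeat-interval are not modeled.

```python
TB = 10**12
GB = 10**9


def threshold_bytes(cache_bytes, trigger_pct=5):
    """Amount of uncommitted data at which the alarm would trigger."""
    return cache_bytes * trigger_pct // 100


def uncommitted_pct(uncommitted_bytes, cache_bytes):
    """Uncommitted data as a percentage of the total cache size."""
    return 100.0 * uncommitted_bytes / cache_bytes


def alarm_triggered(uncommitted_bytes, cache_bytes, trigger_pct=5):
    """Assumption: the alarm fires at or above the trigger percentage."""
    return uncommitted_pct(uncommitted_bytes, cache_bytes) >= trigger_pct
```

With the default of 5 percent, `threshold_bytes(4 * TB)` equals 200 GB, which matches the figure given for the 4 TB system. Raising trigger-pct to 50, as in the CLI example, moves the threshold for the same system to 2 TB.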
graniteHighAvailabilityError
(enterprises.17163.1.52.4.0.10007)
Critical
High availability module encountered error.
High-availability module encountered an error.
graniteApplianceUnlicensedError
(enterprises.17163.1.52.4.0.10009)
Critical
Appliance license expired/invalid.
The Edge appliance is not properly licensed.
hypervisorHardwareMgmtConnectionError
(enterprises.17163.1.52.4.0.30002)
Critical
Hypervisor hardware management connection alarm is triggered.
RiOS has lost IP connectivity or cannot authenticate the connection to the hypervisor motherboard controller.
hypervisorHardwarePowerError
(enterprises.17163.1.52.4.0.30003)
Critical
Hypervisor hardware power alarm is triggered.
The hypervisor has lost power unexpectedly.
hypervisorHardwareMemoryError
(enterprises.17163.1.52.4.0.30005)
 
Hypervisor hardware memory error alarm is triggered.
A memory error has occurred; for example, a system memory stick has failed.
hypervisorHardwareOtherHardwareError
(enterprises.17163.1.52.4.0.30006)
 
Hypervisor other hardware error alarm is triggered.
A hardware error has been detected. These issues trigger the other hardware error alarm:
The hypervisor hardware is using a memory Dual In-line Memory Module (DIMM), a hard disk, or a NIC that is not qualified.
The hypervisor hardware has detected a RiOS NIC. The hypervisor does not support RiOS NICs.
DIMMs are plugged into the hypervisor hardware but the hypervisor cannot recognize them because:
a DIMM is in the wrong slot. You must plug DIMMs into the black slots first and then use the blue slots when all of the black slots are in use.
or
a DIMM is broken and you must replace it.
hypervisorHardwareTemperatureError
(enterprises.17163.1.52.4.0.30007)
 
Hypervisor hardware temperature alarm is triggered.
A hypervisor CPU, board, or platform controller hub (PCH) temperature has exceeded the rising threshold. When the CPU, board, or PCH returns to the reset threshold, the critical alarm clears (after polling for 30 seconds). If the appliance has more than one CPU, both CPUs are reported. The default temperature thresholds are set by the motherboard.
hypervisorLicenseError
(enterprises.17163.1.52.4.0.30008)
 
Hypervisor license alarm is triggered.
Indicates that one of these issues with the virtualization license has occurred:
Hypervisor License Expiring - The license is about to expire: its expiration time is within the alarm threshold. The default threshold is two weeks.
Hypervisor License Expired - The license has expired.
Hypervisor Using Trial License - The hypervisor is using a trial license.
hypervisorOperationError
(enterprises.17163.1.52.4.0.30009)
Degraded
Hypervisor alarm is triggered.
The hypervisor is in lockdown mode.
vspConnectionError
(enterprises.17163.1.52.4.0.30010)
 
VSP connection alarm is triggered.
A communication issue has occurred between VSP and the hypervisor.
This alarm triggers for any of these issues:
VSP is disconnected from the hypervisor.
The hypervisor password is invalid.
VSP was unable to gather some hardware information.
VSP is disconnected.
vspInstallationError
(enterprises.17163.1.52.4.0.30011)
 
VSP installation alarm is triggered.
VSP is not installed properly and is powered off.
This alarm triggers for any of these issues:
A hypervisor upgrade has failed.
A configuration push from the Hypervisor Installer has failed.
VSP could not gather enough information to set up an interface.
The hypervisor is not installed.
vspLicenseError
(enterprises.17163.1.52.4.0.30012)
Degraded
Riverbed VSP vSphere license alarm is triggered.
This alarm is triggered when the VSP license (VSPESXI) is expired or missing.
vspBaseLicenseError
(enterprises.17163.1.52.4.0.30013)
Critical
Riverbed base license alarm is triggered.
This alarm is triggered when the VSP Base license (VSPBASE) is expired or missing.
vsfedHostHypervisorCpuError
(enterprises.17163.1.52.4.0.30014)
Critical
Virtual SteelFusion Edge Host Hypervisor CPU alarm is triggered.
The CPU capacity and cores reserved for the Virtual Edge VM hosted on Hyper-V are not sufficient to support Virtual Edge operations.
Note: This SNMP trap is available only in Virtual Edge on Hyper-V deployments.
vsfedHostHypervisorConnError
(enterprises.17163.1.52.4.0.30015)
Critical
Virtual SteelFusion Edge Hypervisor Connection alarm is triggered.
The connectivity to the host Hyper-V is lost or not established from the Virtual Edge VM. The reason for connection failure is specified in the alarm description.
Note: This SNMP trap is available only in Virtual Edge on Hyper-V deployments.
vsfedHostHypervisorVersionError
(enterprises.17163.1.52.4.0.30016)
Critical
Virtual SteelFusion Edge Host Hypervisor Version alarm is triggered.
The host hypervisor is running an OS version that is not supported by Virtual Edge.
Note: This SNMP trap is available only in Virtual Edge on Hyper-V deployments.
vsfedHostHypervisorMemoryError
(enterprises.17163.1.52.4.0.30017)
Critical
Virtual SteelFusion Edge Host Hypervisor Memory alarm is triggered.
The memory reserved for the Virtual Edge VM hosted on Hyper-V is not sufficient to support Virtual Edge operations. The alarm is also raised if the memory allocation is dynamic.
Note: This SNMP trap is available only in Virtual Edge on Hyper-V deployments.
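The trap OIDs in this table are written relative to the enterprises node. When a network management system reports or expects fully numeric OIDs, the prefix can be expanded as in the following Python sketch. The numeric value 1.3.6.1.4.1 for iso.org.dod.internet.private.enterprises is standard; the function names are hypothetical helpers for illustration.

```python
ENTERPRISES_PREFIX = "1.3.6.1.4.1"  # iso.org.dod.internet.private.enterprises


def expand_oid(short_oid):
    """Turn 'enterprises.17163...' (with or without parentheses) into a numeric OID."""
    short_oid = short_oid.strip().strip("()")
    if short_oid.startswith("enterprises."):
        return ENTERPRISES_PREFIX + short_oid[len("enterprises"):]
    return short_oid  # already numeric or in another form; left unchanged


def trap_number(short_oid):
    """Return the last arc of the OID, which identifies the individual trap."""
    return int(expand_oid(short_oid).rsplit(".", 1)[1])
```

For example, the upgradeFailure entry (enterprises.17163.1.52.4.0.65) expands to 1.3.6.1.4.1.17163.1.52.4.0.65.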