SteelFusion Core MIB : SNMP Traps
  
SNMP Traps
Every Core supports SNMP traps and email alerts for conditions that require attention or intervention. Most events, but not all, trigger an alarm and send the related trap. For most events, when the condition clears, the system clears the alarm and also sends a clear trap. Clear traps are useful for determining when an event has been resolved.
This section describes the SNMP traps. It does not list the corresponding clear traps.
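The raise-and-clear behavior described above can be sketched as a small alarm tracker. This is a minimal illustration, not Riverbed code: the function name `apply_trap` is hypothetical, and the three raise/clear pairs are inferred from the trap names and text in the table later in this section. Clear traps that the table does not list are not covered.

```python
# Raise/clear pairs inferred from the trap table in this section (assumption:
# each "clear" trap resolves the alarm named next to it).
CLEAR_TO_RAISE = {
    "temperatureNonCritical": "temperatureCritical",
    "temperatureNormal": "temperatureWarning",
    "secureVaultUnlocked": "secureVaultLocked",
}
RAISE_TRAPS = set(CLEAR_TO_RAISE.values())


def apply_trap(active, trap_name):
    """Return the set of active alarms after one trap has been received."""
    updated = set(active)
    if trap_name in CLEAR_TO_RAISE:
        # A clear trap resolves its matching alarm, if that alarm is active.
        updated.discard(CLEAR_TO_RAISE[trap_name])
    elif trap_name in RAISE_TRAPS:
        updated.add(trap_name)
    # Traps outside the three pairs (for example, configChange) are ignored here.
    return updated


active = set()
for name in ["temperatureWarning", "secureVaultLocked", "temperatureNormal"]:
    active = apply_trap(active, name)
print(sorted(active))  # ['secureVaultLocked']
```

A trap receiver that keeps state this way can report which alarms are still open rather than only which traps have arrived.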
RiOS v6.0 and later includes support for SNMPv3.
You can view Core health at the top of each Management Console page, by entering the CLI show info command, and through SNMP (health, systemHealth).
The Core tracks key hardware and software metrics and alerts you to any potential problems so that you can quickly discover and diagnose issues. The health of an appliance falls into one of the following states:
•  Healthy - The Core is functioning optimally.
•  Needs Attention - Accompanies a healthy state to indicate management-related issues that do not affect the ability of the Core to perform.
•  Degraded - The Core system has detected an issue.
•  Admission Control - The Core is performing but has reached its connection limit.
•  Critical - The Core might or might not be performing; you must address a critical issue.
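For monitoring scripts, the five health states listed above can be modeled as an enumeration. This is a sketch under assumptions: the names `CoreHealth` and `parse_health` are hypothetical, and the exact strings that `show info` or the SNMP health objects return are not confirmed by this section, so the parser only normalizes case and spacing against the state names given above.

```python
from enum import Enum


class CoreHealth(Enum):
    """Health states as named in the list above."""
    HEALTHY = "Healthy"
    NEEDS_ATTENTION = "Needs Attention"
    DEGRADED = "Degraded"
    ADMISSION_CONTROL = "Admission Control"
    CRITICAL = "Critical"


def parse_health(text):
    """Map a health string to CoreHealth, ignoring case and extra whitespace."""
    normalized = " ".join(text.split()).lower()
    for state in CoreHealth:
        if state.value.lower() == normalized:
            return state
    raise ValueError(f"unknown health state: {text!r}")


print(parse_health("needs  attention"))  # CoreHealth.NEEDS_ATTENTION
```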
The following table summarizes the SNMP traps sent from the system to configured trap receivers and their effect on the Core health state.
Each entry lists the following fields, in order: Trap and OID, Appliance State, Text, Description.
procCrash
(enterprises.17163.1.100.4.0.1)
Healthy
A procCrash trap signifies that a process managed by PM has crashed and left a core file. The variable sent with the notification indicates which process crashed.
A process has crashed and subsequently been restarted by the system. The trap contains the name of the process that crashed. A system snapshot associated with this crash has been created on the appliance and is accessible through the CLI or the Management Console. Riverbed Support might need this information to determine the cause of the crash. No other action is required on the appliance as the crashed process is automatically restarted.
procExit
(enterprises.17163.1.100.4.0.2)
Healthy
A procExit trap signifies that a process managed by PM has exited unexpectedly, but not left a core file. The variable sent with the notification indicates which process exited.
A process has unexpectedly exited and been restarted by the system. The trap contains the name of the process. The process might have exited automatically or due to other process failures on the appliance. Review the release notes for known issues related to this process exit. If none exist, contact Riverbed Support to determine the cause of this event. No other action is required on the appliance as the exited process is automatically restarted.
configChange
(enterprises.17163.1.100.4.0.3)
Healthy
A change has been made to the system’s configuration.
A configuration change has been detected. Check the log files around the time of this trap to determine what changes were made and whether they were authorized.
cpuUtil
(enterprises.17163.1.100.4.0.4)
Degraded
The average CPU utilization in the past minute has gone above the acceptable threshold.
Average CPU utilization has exceeded an acceptable threshold. If CPU utilization spikes are frequent, it might be because the system is undersized. Sustained CPU load can be symptomatic of more serious issues. Consult the CPU Utilization report to gauge how long the system has been loaded and also monitor the amount of traffic currently going through the appliance. A one-time spike in CPU is normal but Riverbed recommends reporting extended high CPU utilization to Riverbed Support. No other action is necessary as the alarm clears automatically.
pagingActivity
(enterprises.17163.1.100.4.0.5)
Degraded
The system has been paging excessively (thrashing).
The system is running low on memory and has begun swapping memory pages to disk. This event can be triggered during a software upgrade while the optimization service is still running but there can be other causes. If this event triggers at any other time, generate a debug sysdump and send it to Riverbed Support. No other action is required as the alarm clears automatically.
linkError
(enterprises.17163.1.100.4.0.6)
Degraded
An interface on the appliance has lost its link.
The system has lost one of its Ethernet links, typically due to an unplugged cable or dead switch port. Check the physical connectivity between the Core and its neighbor device. Investigate this alarm as soon as possible. Depending on which link is down, the system might no longer be optimizing and a network outage could occur.
This alarm is often caused by interface transitions on surrounding devices, such as routers or switches. It also accompanies service or system restarts on the Core.
powerSupplyError
(enterprises.17163.1.100.4.0.7)
Degraded
A power supply on the appliance has failed.
A redundant power supply on the appliance has failed and needs to be replaced. Contact Riverbed Support for an RMA replacement as soon as practically possible.
fanError
(enterprises.17163.1.100.4.0.8)
Degraded
A fan has failed on this appliance.
A fan is failing or has failed and needs to be replaced. Contact Riverbed Support for an RMA replacement as soon as practically possible.
memoryError
(enterprises.17163.1.100.4.0.9)
Degraded
A memory error has been detected on the appliance (not supported on all models).
A memory error has been detected. A system memory stick might be failing. Try reseating the memory first. If the problem persists, contact Riverbed Support for an RMA replacement as soon as practically possible.
ipmi
(enterprises.17163.1.100.4.0.10)
Degraded
An IPMI event has been detected on the appliance. Please check the details in the alarm report on the web UI.
An Intelligent Platform Management Interface (IPMI) event has been detected. Check the Alarm Status page for more detail. You can also view the IPMI events on the Core by entering the CLI command:
show hardware error-log all
localFSFull
(enterprises.17163.1.100.4.0.11)
Critical
The appliance local file system is full.
The appliance local file system is full. You must create more space.
Note: The appliance local file system contains no block files from the LUNs.
temperatureCritical
(enterprises.17163.1.100.4.0.12)
Critical
The system temperature has reached a critical stage.
This trap/alarm triggers a critical state on the appliance. This alarm occurs when the appliance temperature reaches 90 degrees Celsius. The temperature value is not user-configurable. Reduce the appliance temperature.
temperatureWarning
(enterprises.17163.1.100.4.0.13)
Degraded
The system temperature has exceeded the threshold.
The temperature warning is a configurable notification. By default, this notification is set to trigger when the appliance reaches 70 degrees Celsius. Raise the alarm trigger temperature if it is normal for the device to get that hot, or reduce its temperature.
scheduledJobError
(enterprises.17163.1.100.4.0.14)
Healthy
A scheduled job has failed during execution.
A scheduled job on the system (for example, a software upgrade) has failed. To determine which job failed, use the CLI or the Management Console.
confModeEnter
(enterprises.17163.1.100.4.0.15)
Healthy
A user has entered configuration mode.
A user on the system has entered configuration mode from either the CLI or the Management Console. A login to the Management Console by user admin sends this trap as well. This is for notification purposes only; no other action is necessary.
confModeExit
(enterprises.17163.1.100.4.0.16)
Healthy
A user has exited configuration mode.
A user on the system has exited configuration mode from either the CLI or the Management Console. A logout from the Management Console by user admin sends this trap as well. This is for notification purposes only; no other action is necessary.
secureVaultLocked
(enterprises.17163.1.100.4.0.17)
Critical
Secure vault is locked. The secure datastore cannot be used.
You must unlock the secure vault.
For details, see Unlocking the Secure Vault.
testTrap
(enterprises.17163.1.100.4.0.19)
Healthy
Trap test.
An SNMP trap test has occurred on the Core. This message is informational and no action is necessary.
temperatureNonCritical
(enterprises.17163.1.100.4.0.1012)
Degraded
The system temperature is no longer in a critical stage.
This message is informational and no action is necessary.
temperatureNormal
(enterprises.17163.1.100.4.0.1013)
Healthy
The system temperature is back within the threshold.
This message is informational and no action is necessary.
secureVaultUnlocked
(enterprises.17163.1.100.4.0.1017)
Healthy
Secure vault is unlocked. The secure data store can be used now.
This message is informational and no action is necessary.
edgeError
(enterprises.17163.1.100.4.0.10500)
Critical
Edge module encountered error.
Edge module encountered an error.
highAvailabilityError
(enterprises.17163.1.100.4.0.10501)
Degraded or Critical
High-Availability module encountered error.
A degraded state indicates one of the following conditions:
•  Edge heartbeat channel failure
•  High availability heartbeat timed out
•  Edge blockstore connection failure
A critical state indicates one of the following conditions:
•  Edge blockstore activation failure
•  Edge blockstore local write failure
•  Edge detected split-brain
•  Edge requires activation
lunError
(enterprises.17163.1.100.4.0.10502)
Degraded
LUN module encountered error.
The LUN module encountered an error. Check whether the data center LUN was taken offline on the Core while I/O operations were in progress.
iscsiError
(enterprises.17163.1.100.4.0.10503)
Critical
iSCSI module encountered error.
An iSCSI initiator is not accessible. Review the iSCSI configuration on the Core.
snapshotError
(enterprises.17163.1.100.4.0.10505)
Critical
Snapshot module encountered error.
A snapshot failed to be committed to the SAN, or a snapshot has failed to complete due to Windows timing out.
Check the Core logs for details. Retry the Windows snapshot.
applianceUnlicensedError
(enterprises.17163.1.100.4.0.10506)
Critical
Appliance license expired/invalid.
The appliance license has expired or is invalid.
modelUnlicensedError
(enterprises.17163.1.100.4.0.10507)
Critical
Model license expired/invalid.
The model license has expired or is invalid.
blkdiskError
(enterprises.17163.1.100.4.0.10508)
Critical
Block-disk module encountered error.
Block-disk module encountered an error.
Note: This alarm applies only to Core-v implementations.
backupIntegrationError
(enterprises.17163.1.100.4.0.10509)
Critical
Backup-integration module encountered error.
The backup-integration module encountered an error.
otherHardwareError
(enterprises.17163.1.100.4.0.10510)
Either Critical or Degraded, depending on the state
Hardware error detected.
The system has detected a problem with the hardware. These issues trigger the hardware error alarm:
•  The appliance does not have enough disk, memory, CPU cores, or NICs to support the current configuration.
•  The appliance is using a Dual In-line Memory Module (DIMM), a hard disk, or a NIC that is not qualified by Riverbed.
•  Other hardware issues.
The alarm clears when you add the necessary hardware, remove the unqualified hardware, or resolve other hardware issues.
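Trap receivers usually report OIDs in fully numeric form, while the table above writes them with the `enterprises` label, which corresponds to the numeric prefix 1.3.6.1.4.1. The following sketch shows one way to expand the label and look up a trap. The function names and the four-row subset are illustrative assumptions; a real receiver would load the complete table or the MIB file itself.

```python
ENTERPRISES = "1.3.6.1.4.1"  # iso.org.dod.internet.private.enterprises

# A few rows copied from the table above: symbolic OID -> (trap, appliance state).
TRAP_TABLE = {
    "enterprises.17163.1.100.4.0.1": ("procCrash", "Healthy"),
    "enterprises.17163.1.100.4.0.4": ("cpuUtil", "Degraded"),
    "enterprises.17163.1.100.4.0.11": ("localFSFull", "Critical"),
    "enterprises.17163.1.100.4.0.10503": ("iscsiError", "Critical"),
}


def to_numeric_oid(oid):
    """Expand a leading 'enterprises' label and drop any leading dot."""
    oid = oid.lstrip(".")
    if oid == "enterprises" or oid.startswith("enterprises."):
        return ENTERPRISES + oid[len("enterprises"):]
    return oid


# Re-key the subset by numeric OID so either notation can be looked up.
NUMERIC_TRAP_TABLE = {to_numeric_oid(k): v for k, v in TRAP_TABLE.items()}


def lookup_trap(oid):
    """Return (trap name, appliance state), or None if the OID is not in the subset."""
    return NUMERIC_TRAP_TABLE.get(to_numeric_oid(oid))


print(to_numeric_oid("enterprises.17163.1.100.4.0.1"))  # 1.3.6.1.4.1.17163.1.100.4.0.1
print(lookup_trap(".1.3.6.1.4.1.17163.1.100.4.0.11"))   # ('localFSFull', 'Critical')
```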