Alarm | Appliance state | Reason |
---|---|---|
Blockstore | Degraded or Critical | Indicates that the system has encountered any of the following issues with the blockstore: • Disk space Low—The blockstore is running out of space. Check your WAN connection as well as connectivity to the Core. This can also happen if clients write more data than can be sent over the WAN for a prolonged period of time. • Disk space Full—The blockstore is out of space. Check your WAN connection as well as connectivity to the Core. This can also happen if clients write more data than can be sent over the WAN for a prolonged period of time or when the Core HA time does not match the Edge HA time. It is critical that the Core HA time is synchronized with the Edge HA time. We recommend using NTP time synchronization to synchronize the Core HA and the Edge HA. You must also verify that the time zone is correct. • Memory Low—The blockstore is running out of memory. This indicates a temporary condition caused by too much I/O. Limit the number of active prepopulation sessions. Check whether IOPS exceeds the value recommended for the appliance model. • Read Error—The blockstore could not read data that was already replicated to the data center. Clients will not see any error because the Edge will fetch the data from the Core. Check the system logs to determine the root cause. Replace any disks that have failed. The alarm clears when you restart the service. • Read Cache Error—An appliance with read cache SSDs cannot start the read cache. Replace any failed or missing SSDs, and restart the appliance. If no drives have failed or are missing, check the system logs for more detailed information, and contact Riverbed Support for assistance as needed. • Critical Read Error—The blockstore could not read data that is not yet replicated to the Core. Check the system logs to determine the root cause. Replace any disks that have failed. The alarm clears when you restart the service. • Startup Failed—The blockstore failed to start due to disk errors or an incorrect configuration. Check the system logs to determine the root cause. • Startup Wrong Version—The Edge software version is incompatible with the blockstore version on disk. The alarm indicates that the software has been upgraded or downgraded to an incompatible version. Revert to the previous software version. • Standby Wrong Version—The Edge software version running on the standby peer is incompatible with the version on the active peer appliance in a high-availability pair. The active peer has been upgraded or downgraded to an incompatible software version. This alarm typically triggers during the process of upgrading both Edges in a high-availability pair. When upgrading across versions with disk-format changes, this is a normal part of the upgrade process. To clear the alarm, update the software version running on the standby peer to match the software version running on the active peer. • Write Error—The blockstore could not save data to disk due to a media error. Check the system logs to determine the root cause. Replace any disks that have failed. The alarm clears when you restart the service. |
Edge HA Service | Degraded or Critical | Indicates that only one of the appliances in a high-availability (HA) Edge pair is actively serving storage data (the active peer). As the system writes new data to the active peer, it is reflected to the standby peer, which stores a copy of the data in its local data store. The two appliances maintain a heartbeat protocol between them, so that if the active peer goes down, the standby peer can take over servicing the LUNs. If the standby peer goes down, the active peer continues servicing the LUNs after raising this alarm and sending an email that the appliance is degraded or critical. The email contains the IP address of the peer appliance. Degraded indicates that Edge HA is not functioning but the LUNs are still being serviced. After a failed peer resumes, it resynchronizes with the other peer in the HA pair to receive any data that was written since the time of the failure. After the peer receives all the written data, normal HA mode resumes and any future writes are reflected to both peers. Critical indicates that the LUNs are no longer available and are not being serviced. Contact Riverbed Support. |
Storage Volume Status | Critical or Degraded | Indicates that the connection to the volume has failed or there is an issue with any of the following: • Backend connectivity • No read/write permissions • Space threshold has been reached • Resize failure • A LUN is deactivated. A LUN is deactivated if the blockstore is critically low on space and this particular LUN has a high rate of new writes. • Initialization of the blockstore for the LUN fails. • Connectivity issues between Edge and Core. If the status is Degraded, the export can still be written to on the Edge, but there may be issues with writing the Edge’s data to the backend storage array. If the status is Critical, the export may not be available to clients at the branch. |
Riverbed Host Tools Version | Degraded | Indicates that the Riverbed host tools package (RHSP) is incompatible with the Windows Server version. RHSP provides snapshot capabilities by exposing the Edge through iSCSI to the Windows Server as a snapshot provider. RHSP is compatible with 64-bit editions of Microsoft Windows Server 2008 R2 or later and can be downloaded from the Riverbed Support site at https://support.riverbed.com. |
Snapshot | Degraded | Indicates that a snapshot failed to be committed to the Core, or that a snapshot failed to complete at the Edge because the blockstore is full, credentials are needed, or there is a misconfiguration at the Core. Check the Core logs for details. Retry the Windows snapshot. |
Core | Degraded | Indicates that the system has encountered any of the following issues with the Core: • Unknown Edge—The Edge appliance has connected to a Core that does not recognize the appliance. Most likely the configuration present on the Core is missing an entry for the Edge. Check that the Edge is supplying the proper Edge ID. To find the Edge ID, choose Storage > Storage Edge Configuration on the Edge appliance. • Core Connectivity—The Edge does not have an active connection with the Core. Check the network between the Edge and the Core; recheck the Edge configuration on the Core. • Inner Channel Down—The data channel between the Core and the Edge is down. The connection between the Core and the Edge has stalled. Check the network between the Edge and the Core. • Keep-Alive Timeout—The connection between the Core and the Edge has stalled. Check the network between the Edge and the Core. |
Edge Service | Needs Attention | Indicates that the Edge appliance connected to the Core is not servicing the Core. Check that the Edge appliance is running. |
Uncommitted Edge Data | Degraded | Indicates that a large amount of data in the blockstore needs to be committed to the Core. The difference between the contents of the blockstore and the Core-side LUN is significant. This alarm measures the uncommitted data in the Edge cache as a percentage of the total cache size. It triggers when the appliance writes a large amount of data very quickly and the WAN bandwidth is insufficient to commit the data to the Core fast enough to keep the uncommitted data percentage below 5 percent. As long as data is being committed, the cache will flush eventually. The threshold is 5 percent, which for a 4 TiB (1260-4) system is approximately 200 GiB. To change the threshold, use this command: `[failover-peer] edge id <id> blockstore uncommitted [trigger-pct <percentage>] [repeat-pct <percentage>] [repeat-interval <minutes>]` For example: `Core3(config) # edge id Edge2 blockstore uncommitted trigger-pct 50 repeat-pct 25 repeat-interval 5` To check that data is being committed, go to Storage > Reports: Blockstore Metrics on the Edge. |
Server Backup | | Indicates that one of the following backup failures has occurred: • Failed connection to the server—The connection between the Edge and the ESXi or Windows server is down, the server is not running, or there are incorrect credentials for the ESXi or vCenter server login. To fix this issue, check whether the server or vCenter is reachable from the Edge and vice versa. Also ensure that the correct credentials are being used for the ESXi server or vCenter. This alarm is cleared when the connection is restored between the Edge and the ESXi server or vCenter. • Backup failure on the Edge—A backup has failed on the Edge. The alarm displays a message with the affected server. • LUN is shared among multiple Windows servers—At least one LUN is shared among two or more Windows servers. To fix this issue, make sure that the LUN is accessible to only one host or IP address. The alarm is cleared when servers no longer share LUN(s) and the next protect operation succeeds. • Server with a backup policy does not have a LUN—A server with an associated backup policy does not have any VMs or LUNs to protect. |
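
The Uncommitted Edge Data threshold in the table above is a percentage of the total cache size, so the absolute amount of uncommitted data that raises the alarm depends on the appliance. The following is a minimal sketch of that arithmetic in Python, not part of the product. The function names are hypothetical, and the comparison used to decide when the alarm triggers (greater than or equal to the trigger percentage) is an assumption. It reproduces the 4 TiB example, where 5 percent works out to 204.8 GiB, the figure quoted as 200 GiB in the table.

```python
GIB_PER_TIB = 1024  # binary units: 1 TiB = 1024 GiB


def uncommitted_threshold_gib(cache_tib: float, trigger_pct: float) -> float:
    """Amount of uncommitted data (GiB) that corresponds to trigger_pct of the cache."""
    return cache_tib * GIB_PER_TIB * trigger_pct / 100.0


def uncommitted_pct(uncommitted_gib: float, cache_tib: float) -> float:
    """Uncommitted data expressed as a percentage of the total cache size."""
    return 100.0 * uncommitted_gib / (cache_tib * GIB_PER_TIB)


def alarm_triggered(uncommitted_gib: float, cache_tib: float,
                    trigger_pct: float = 5.0) -> bool:
    # Assumption: the alarm is raised once the percentage reaches the trigger value.
    return uncommitted_pct(uncommitted_gib, cache_tib) >= trigger_pct


if __name__ == "__main__":
    # 5 percent of a 4 TiB cache
    print(f"{uncommitted_threshold_gib(4, 5):.1f} GiB")
    # The same cache with trigger-pct 50, as in the CLI example
    print(f"{uncommitted_threshold_gib(4, 50):.1f} GiB")
```

Raising `trigger-pct` from 5 to 50 on the same 4 TiB system moves the trigger point from 204.8 GiB to 2048 GiB of uncommitted data, which is worth weighing against how long the WAN would need to drain that backlog.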