Configuring Replication
This section describes how to configure replication on the Core. As of version 4.3, you can configure physical or virtual Cores for seamless failover and recovery between data centers without any data loss. In this environment, your primary and secondary data centers are always synchronized, so you can recover from large-scale failures such as power loss, natural disasters, or hardware failure. You connect two Cores in two separate data centers, and each Core is attached to its own storage array.
The primary data center receives all the reads and writes from the Edge, and is connected to the secondary data center. The secondary data center hosts the replica LUNs, which are copies of the primary LUNs located at the primary data center.
Replication can interoperate with all existing physical or virtual Cores, and all existing Edges, as long as the software version is compatible (version 4.0 and later).
This section describes the following topics:
•  Base Requirements
•  Before You Begin
•  Basic Steps
•  Setting Up the Data Centers for Replication
•  Pairing the Cores
•  Configuring the Witness
•  Configuring Edges and LUNs for Replication
•  Suspending Replication
•  Initiating Failover (Recovering from Primary Data Center Failure)
•  Failing Back to the Primary Data Center
•  Terminating Replication
Base Requirements
You must meet the following requirements to set up replication:
•  The backend storage array must be configured for each Core that will be included in the replication configuration.
•  The primary data center should be able to reach the secondary data center through the chosen interfaces.
•  The secondary Core cannot have any Edges or LUNs.
•  Each Core must have the same configuration. For example, if high-availability is configured on the primary data center’s Cores, it must also be configured on the secondary Cores.
•  The Edges should be able to reach the secondary data center.
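The requirements above work as a preflight checklist. The following Python sketch models that checklist so you can see which conditions must hold together. The `CoreState` class, its fields, and the `unmet_requirements` function are hypothetical illustrations, not a SteelFusion API. In practice, you verify each item in the management console and on the network.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CoreState:
    """Hypothetical snapshot of one Core's settings (illustration only)."""
    name: str
    backend_storage_configured: bool
    ha_configured: bool
    edges: list[str] = field(default_factory=list)
    luns: list[str] = field(default_factory=list)


def unmet_requirements(primary: CoreState, secondary: CoreState,
                       primary_reaches_secondary: bool,
                       edges_reach_secondary: bool) -> list[str]:
    """Return a description of every base requirement that is not met."""
    problems = []
    for core in (primary, secondary):
        if not core.backend_storage_configured:
            problems.append(f"{core.name}: backend storage array not configured")
    if not primary_reaches_secondary:
        problems.append("primary data center cannot reach the secondary data center")
    if secondary.edges or secondary.luns:
        problems.append(f"{secondary.name}: secondary Core must have no Edges or LUNs")
    if primary.ha_configured != secondary.ha_configured:
        problems.append("high-availability configuration differs between data centers")
    if not edges_reach_secondary:
        problems.append("Edges cannot reach the secondary data center")
    return problems


# Example: the secondary Core already has a LUN and lacks the primary's HA setup.
primary = CoreState("core-dc1", backend_storage_configured=True, ha_configured=True)
secondary = CoreState("core-dc2", backend_storage_configured=True, ha_configured=False,
                      luns=["lun-7"])
for problem in unmet_requirements(primary, secondary, True, True):
    print(problem)
```

The function returns every unmet requirement rather than stopping at the first one, so a single check lists everything that needs fixing.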
Before You Begin
1. Create discoverable LUNs on storage arrays in both data centers. The LUNs you create on the primary data center will serve the Edges during normal operation. The LUNs you create on the secondary data center are replicas of the primary data center LUNs and should not be added to the Core. The secondary data center LUNs should be exactly the same size as those in the primary data center so they can serve the Edges in the event of a primary data center failure.
Storage array vendors and models, as well as LUN sizes, can vary between data centers. For this reason, Riverbed permits a size leeway of 1 GB: the LUN created on the secondary data center storage array can be at most 1 GB larger (but not smaller) than the primary data center LUN. For example, a 4 GB LUN in the primary data center can have a 5 GB replica LUN in the secondary data center. The extra capacity on the replica LUN is not used.
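The size rule reduces to a single range check: the replica must be at least as large as the primary LUN and at most 1 GB larger. This Python sketch expresses that check. The function name is hypothetical, and treating "1 GB" as 1 GiB (2^30 bytes) is an assumption, because this guide does not state the exact unit the Core uses.

```python
# Assumption: the guide's "1 GB" is treated as 1 GiB (2**30 bytes); the unit
# the Core actually uses is not stated in this section.
GB = 1024 ** 3


def replica_size_ok(primary_bytes: int, replica_bytes: int,
                    leeway_bytes: int = GB) -> bool:
    """A replica may match the primary LUN or be up to `leeway_bytes` larger."""
    return primary_bytes <= replica_bytes <= primary_bytes + leeway_bytes


print(replica_size_ok(4 * GB, 5 * GB))  # True: 1 GB larger is within the leeway
print(replica_size_ok(4 * GB, 3 * GB))  # False: the replica is smaller
print(replica_size_ok(4 * GB, 6 * GB))  # False: 2 GB larger exceeds the leeway
```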
2. Create the Journal LUN you will be using on the backend. The Journal LUN is a dedicated LUN on the backend storage that is used to temporarily journal writes for replicated LUNs when replication is suspended. If you have Cores set up for high-availability, share the same Journal LUN between the Cores in case one of the Cores fails.
The required size of the Journal LUN depends on the number of LUNs you want to replicate. Riverbed recommends a thinly provisioned LUN of 500 GB or more. If the Journal LUN is not large enough, an alarm is triggered that states the minimum required size (calculated from the currently configured replica LUNs).
3. Ensure that the Journal LUN is accessible by the Core, and that the Journal LUN is not added to the configuration.
4. If you plan to configure Cores for high-availability, ensure that they are set up before you start configuring replication.
Basic Steps
These are the basic steps needed to set up replication across data centers. Detailed procedures follow.
1. Configure replication between the two data centers (roles, replication interfaces, and Journal LUN). See Setting Up the Data Centers for Replication and Pairing the Cores.
2. Configure the Witness. See Configuring the Witness.
3. Configure the Edges and LUNs for replication and start the first sync. See Configuring Edges and LUNs for Replication.
Setting Up the Data Centers for Replication
First, set up the primary data center, where the Edges and LUNs will reside. The secondary data center, to which the primary data center replicates, is on standby and must not have any configured LUNs or Edges.
To set up the primary data center for replication
1. Log in to the management console on the primary Core.
2. Choose Configure > Replication: Set Up / Monitor to display the Replication (Set Up / Monitor) page.
3. Select Primary in the Replication Configuration panel.
4. Complete the configuration using the controls described in this table.
Setting
Description
Data Center Name
Specify a unique identifier for the primary data center.
Replication Interface
Specify the replication interface (for example, primary, auxiliary) to be used for routing replication traffic between the two data centers.
Ensure that the specified interface can reach the secondary data center and vice-versa.
Note: If possible, use a dedicated interface for replication.
Journal LUN
Specify the dedicated Journal LUN you provisioned on the storage array from the drop-down list.
Note: If the Journal LUN you created does not appear in the drop-down list, select Rescan LUNs.
5. Click Set Configuration.
To modify any of these settings, click Clear Replication Settings.
You can add more replication interfaces by clicking Add Interface. To remove a replication interface, click the X icon next to the interface.
Next, set up the secondary data center.
To set up the secondary data center for replication
1. On the secondary Core, choose Configure > Replication: Set Up / Monitor to display the Replication (Set Up / Monitor) page.
The Replication Configuration panel shows the current replication settings.
2. Select Secondary in the Replication Configuration panel.
3. Complete the configuration using the controls described in this table.
Setting
Description
Data Center Name
Specify the name for the secondary data center.
Replication Interface
Specify the replication interface (for example, primary, auxiliary) to be used for routing replication traffic between the two data centers. If you have Cores that are set up for high-availability, assign the same interface to both.
Ensure that the specified interface can reach the primary data center and vice-versa.
Note: If possible, use a dedicated interface for replication.
Journal LUN
Select the dedicated Journal LUN you provisioned on the storage array from the drop-down list. If you have Cores that are set up for high-availability, choose the same Journal LUN for both.
Note: If the Journal LUN that you created does not appear in the drop-down list, select Rescan LUNs.
Note: If the Cores are configured for high-availability, each pair of Cores in each data center must share the same Data Center Name, Role, and Journal LUN. A warning appears if you attempt to assign different values to each Core.
4. Click Set Configuration.
The secondary data center Core is now ready for communication with the primary Core.
After you have set up both data centers, they are ready to connect to each other. If the Cores are set up for high-availability, repeat these steps on the failover peer.
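The high-availability note in the table above requires both Cores in a data center to share the same Data Center Name, Role, and Journal LUN. That amounts to a field-by-field comparison. The following Python sketch models the comparison behind the warning the console displays. The `ReplicationSettings` record and the `ha_mismatches` function are hypothetical, and the Journal LUN identifier format is assumed.

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ReplicationSettings:
    """Hypothetical record of the settings HA peers must share."""
    data_center_name: str
    role: str         # "Primary" or "Secondary"
    journal_lun: str  # Journal LUN identifier (format assumed)


def ha_mismatches(core_a: ReplicationSettings,
                  core_b: ReplicationSettings) -> list[str]:
    """Return the names of settings that differ between two HA peers."""
    return [f.name for f in fields(ReplicationSettings)
            if getattr(core_a, f.name) != getattr(core_b, f.name)]


peer_1 = ReplicationSettings("DC-East", "Secondary", "journal-01")
peer_2 = ReplicationSettings("DC-East", "Secondary", "journal-02")
print(ha_mismatches(peer_1, peer_2))  # ['journal_lun']
```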
The next step is to pair the Cores from the primary Core.
Pairing the Cores
After you have specified the roles of the primary and secondary Cores for replication, pair them together from the primary data center to establish connections for replication. If you have two Cores in a high-availability configuration, you pair both of them to the secondary data center. For example, you could pair Core X to Core X’ and Core Y to Core Y’.
Note: If you are setting up Cores for replication and also plan on setting them up for high-availability, ensure that you configure high-availability first. For more information about high-availability, see Configuring Failover.
Figure: Replication in a High-Availability Setup
To pair the Cores
1. From the primary Core, choose Configure > Replication: Set Up / Monitor to display the Replication (Set Up / Monitor) page.
2. In the Replication Pair Connection panel, in the Secondary Data Center Name text box, specify the name you gave the secondary data center in Step 3 of To set up the secondary data center for replication.
3. In the Secondary IP text box, specify the IP address of the secondary Core’s replication interface.
4. Click Connect to Secondary.
If the primary data center has high-availability Cores, repeat Steps 1 to 4 on the other Core. Each high-availability Core needs to be paired with only one Core in the secondary data center.
After you have paired the Cores, the next step is to set up the Witness.
Configuring the Witness
The Witness is an Edge that you choose to be the authoritative source on each data center’s state in case the data centers enter a “split-brain” scenario, in which both Cores attempt to journal writes at the same time. Every request to suspend replication and start journaling must be approved by the Witness, which ensures that only one data center is approved for journaling at any given time.
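The Witness acts as an arbiter that grants journaling approval to at most one data center at a time. The following Python sketch is a toy model of that rule, intended only to show why a split brain cannot produce two approved journals. The `Witness` class and its methods are hypothetical. This is not the SteelFusion Witness protocol, which this guide does not describe.

```python
from __future__ import annotations

import threading


class Witness:
    """Toy arbiter: at most one data center holds journaling approval at a time.

    Illustrative model only; this is not the SteelFusion Witness protocol.
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._approved: str | None = None

    def request_journaling(self, data_center: str) -> bool:
        """Approve the request unless another data center already holds approval."""
        with self._lock:
            if self._approved in (None, data_center):
                self._approved = data_center
                return True
            return False

    def release(self, data_center: str) -> None:
        """In this model, called when the approved data center stops journaling."""
        with self._lock:
            if self._approved == data_center:
                self._approved = None


witness = Witness()
print(witness.request_journaling("DC-A"))  # True
print(witness.request_journaling("DC-B"))  # False: DC-A already holds approval
witness.release("DC-A")
print(witness.request_journaling("DC-B"))  # True
```

The lock makes the check and the grant a single atomic step, so two concurrent requests cannot both be approved.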
You can configure the Witness only from the primary data center; once it is set up, it is visible on all Cores in both the primary and secondary data centers.
These requirements must be met for the Witness to work:
•  The Edge must be running version 4.3 or later.
•  The Witness must be reachable from both data centers when it is configured.
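The two requirements above can be checked mechanically for a candidate Edge. In the following Python sketch, the function names are hypothetical. The sketch also assumes that Edge versions are plain dotted numbers such as "4.3" or "5.0.1".

```python
from __future__ import annotations


def parse_version(version: str) -> tuple[int, ...]:
    """Parse a dotted numeric version such as "4.3" or "5.0.1" (format assumed)."""
    return tuple(int(part) for part in version.split("."))


def witness_eligible(edge_version: str, reachable_from_primary: bool,
                     reachable_from_secondary: bool) -> bool:
    """Check the two listed Witness requirements for a candidate Edge."""
    # Tuple comparison orders versions numerically, so (4, 10) > (4, 3).
    if parse_version(edge_version) < (4, 3):
        return False
    return reachable_from_primary and reachable_from_secondary


print(witness_eligible("4.3", True, True))    # True
print(witness_eligible("4.2.1", True, True))  # False: older than 4.3
print(witness_eligible("5.0", True, False))   # False: not reachable from the secondary
```

Versions are compared as integer tuples rather than strings because a string comparison would wrongly rank "4.10" below "4.3".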
Leader and Follower Roles in High-Availability
If the Cores are set up for high-availability, one Core has the role of the leader and the other has the role of the follower. Some configuration changes are only possible on the leader Core. If the leader is down, the follower Core assumes the role of leader. When the original leader comes back up, it resumes its role.
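The leader rules in the previous paragraph can be summarized in a few lines. In the following Python sketch, the `HaPair` class and its fields are hypothetical. The sketch models only which Core acts as leader and ignores how the Cores detect each other's state.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class HaPair:
    """Hypothetical model of leader selection in an HA Core pair."""
    original_leader: str
    follower: str
    leader_up: bool = True
    follower_up: bool = True

    def current_leader(self) -> str | None:
        if self.leader_up:
            return self.original_leader  # the original leader holds or resumes the role
        if self.follower_up:
            return self.follower         # the follower takes over while the leader is down
        return None                      # neither Core is available


pair = HaPair("core-x", "core-y")
print(pair.current_leader())  # core-x
pair.leader_up = False
print(pair.current_leader())  # core-y
pair.leader_up = True
print(pair.current_leader())  # core-x
```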
Even though you can add Edges to both high-availability Cores, only the Edges on the leader Core can be configured as a Witness.
Witness Recommendations
•  Choose a high-availability Edge on the leader Core as the Witness to avoid a single point of failure at the Edge.
•  Choose a Witness in a location that would not be affected by disasters that could potentially bring down the data center.
•  Set up multiple redundant paths from the Witness to the primary and secondary data centers. For more information about configuring WAN redundancy, see the SteelFusion Design Guide.
To set up the Witness
1. From the primary Core, choose Configure > Replication: Set Up / Monitor to display the Replication (Set Up / Monitor) page.
2. In the Witness Information panel, select an Edge from the Witness Name drop-down list.
Note: If the Cores are configured for high-availability, select the Edge that is connected to the primary data center leader Core.
3. Click Update Witness.
Once the Ready for Failover status changes to Yes, the Edge is ready to assume the role of Witness.
Now that the Cores are aware of each other and the Witness is configured, map the replica LUNs from the primary Core and start the first sync.
Changing the Witness
If the Witness is down or experiencing another type of failure, you can change it to a different Edge in the Witness Information panel on the primary Core. Any issues relating to the Witness appear in the Reports > Diagnostics: Alarm Status page under Edge Service.
Note: You can change the Witness at any time, as long as replication is not suspended.
To change the Witness
1. From the primary Core, choose Configure > Replication: Set Up / Monitor to display the Replication (Set Up / Monitor) page.
2. In the Witness Information panel, select the Edge from the Witness Name drop-down list.
3. Click Update Witness.
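The note above allows the Witness to change at any time except while replication is suspended. A plausible reason is that a suspended Core is journaling under the current Witness's approval, but this guide does not say so. The following Python sketch models the restriction. The `ReplicationState` class and the `change_witness` method are hypothetical.

```python
class ReplicationState:
    """Hypothetical model of the rule for changing the Witness."""

    def __init__(self, witness: str) -> None:
        self.witness = witness
        self.suspended = False

    def change_witness(self, new_edge: str) -> None:
        if self.suspended:
            raise RuntimeError("cannot change the Witness while replication is suspended")
        self.witness = new_edge


state = ReplicationState("edge-branch-1")
state.change_witness("edge-branch-2")  # allowed: replication is active
state.suspended = True
try:
    state.change_witness("edge-branch-3")
except RuntimeError as err:
    print(err)  # cannot change the Witness while replication is suspended
```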
Configuring Edges and LUNs for Replication
After you have set up the Witness, map the replica LUNs. A replica LUN is a LUN in the secondary data center that mirrors the primary LUN in the primary data center. The replica LUN can be either iSCSI or Fibre Channel, and must be the same size as the primary LUN or at most 1 GB larger. To start replication for an Edge, each of its mapped LUNs must have an associated replica LUN. When you start replication for an Edge, the Core replicates all the LUNs that are mapped to that Edge.
Once replication for an Edge is enabled, the first sync of data from the primary LUN to the replica LUN starts. During this process, the Core continues to operate normally without interruption. There are two options for the first sync:
•  Full Sync - The Core performs a full block-by-block copy of the primary LUN to the replica LUN until they are exact copies of each other and are fully synchronized. This mode is selected by default.
•  None - The Core does not copy any blocks from the primary LUN to the replica LUN. You may want to select this option if your LUNs are not formatted yet or if you plan to use a third-party replication tool to synchronize the LUNs.
Note: Enabling and starting replication is not supported for individual LUNs.
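The two first sync options differ only in whether blocks are copied. The following Python sketch is a toy model of a block-by-block full sync that reports the percentage of completion, similar to what the Status column shows. It treats a LUN as an in-memory byte buffer. The function name, the `"full"`/`"none"` strings, and the 4096-byte block size are assumptions chosen for illustration.

```python
from __future__ import annotations

from collections.abc import Iterator


def first_sync(primary: bytes, replica: bytearray, sync_type: str = "full",
               block_size: int = 4096) -> Iterator[int]:
    """Copy `primary` into `replica` block by block, yielding percent complete.

    sync_type "none" copies nothing, like the None option.
    """
    if sync_type == "none":
        return
    if len(replica) < len(primary):
        raise ValueError("replica LUN is smaller than the primary LUN")
    total = len(primary)
    for offset in range(0, total, block_size):
        block = primary[offset:offset + block_size]
        replica[offset:offset + len(block)] = block  # same-length slice, size unchanged
        yield (offset + len(block)) * 100 // total


primary_lun = bytes(range(256)) * 40  # 10,240-byte stand-in for a LUN
replica_lun = bytearray(len(primary_lun))
print(list(first_sync(primary_lun, replica_lun)))  # [40, 80, 100]
print(replica_lun == primary_lun)                  # True
```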
To map the replica LUNs and start replication
1. From the primary Core, choose Configure > Replication: Set Up / Monitor to display the Replication (Set Up / Monitor) page.
2. In the Edges panel, click the Edge you want to configure.
3. Click the LUN you want to map.
4. Select a replica LUN from the Secondary LUN Serial drop-down list.
The drop-down list is prepopulated with the replica LUNs that you previously created on the storage array in the secondary data center. The size of the replica LUN is listed next to each one.
To verify whether all LUNs configured on the secondary data center storage array are being displayed on the primary Core for replication, you can use the show storage coredr sec-site-luns command.
5. Select the First Sync Type from the drop-down list.
6. Click Save LUN Replica Mappings and Sync Types.
If there are multiple LUNs mapped to the Edge, the Start Replication button is not enabled until all LUNs have mapped replicas.
7. Click Start Replication.
The Edge Status changes to enabled. If you selected Full from the First Sync Type drop-down list, the Status indicates the first sync percentage of completion.
After the first sync copies the LUNs to the secondary data center, the primary and replica LUNs are consistent and replication is active. You can choose a different first sync type for each LUN.
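The rule that Start Replication stays disabled until every LUN mapped to the Edge has a replica can be sketched as a single check. In this Python sketch, the function name and the LUN serials are hypothetical.

```python
from __future__ import annotations


def start_replication_enabled(replica_map: dict[str, str | None]) -> bool:
    """replica_map: primary LUN serial -> mapped replica serial, or None if unmapped."""
    return all(replica is not None for replica in replica_map.values())


edge_luns = {"PRI-0001": "SEC-0001", "PRI-0002": None}
print(start_replication_enabled(edge_luns))  # False: PRI-0002 has no replica
edge_luns["PRI-0002"] = "SEC-0002"
print(start_replication_enabled(edge_luns))  # True
```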
To add a new LUN to a replicating Edge
1. Choose Configure > Manage: LUNs to display the LUNs page and select the Edge Mapping tab.
2. Select the Edge to map the LUN to from the SteelFusion Edge drop-down list.
3. Select the replica LUN from the LUN to Map for Replication drop-down list.
4. Select the First Sync Type from the drop-down list.
5. Click Update Mapping.
Suspending Replication
During an extended data center failure, or a planned network event such as a network upgrade at the secondary data center, you can suspend replication on the primary data center to prevent the Edge’s blockstore from accumulating excessive data. Suspending replication does not clear any replication settings.
While replication is suspended, the primary data center Core uses the Journal LUN to track the writes from the Edge until the connection is restored. When the secondary data center is restored, the Cores automatically reconnect, and the journaled writes are synced to the secondary data center to make the replica LUNs consistent with the primary LUNs. If the restored data center is the preferred data center, you can then fail back to it. For more information about failback, see Failing Back to the Primary Data Center.
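The following Python sketch is a toy model of the behavior described above. While replication is active, each write is mirrored before it is acknowledged. While replication is suspended, writes are journaled. On resume, the journal is replayed in order. The class and method names are hypothetical, and the model omits the Witness approval that a real suspend requires. It is a sketch of the concept, not of how the Core stores the journal.

```python
from __future__ import annotations


class ReplicatedLun:
    """Toy model: synchronous mirroring, journaling while suspended, replay on resume."""

    def __init__(self) -> None:
        self.primary: dict[int, bytes] = {}
        self.replica: dict[int, bytes] = {}
        self.journal: list[tuple[int, bytes]] = []  # stands in for the Journal LUN
        self.suspended = False

    def write(self, block: int, data: bytes) -> None:
        self.primary[block] = data
        if self.suspended:
            self.journal.append((block, data))  # tracked until the link is restored
        else:
            self.replica[block] = data          # mirrored before the write is acknowledged

    def suspend(self) -> None:
        self.suspended = True

    def resume(self) -> None:
        for block, data in self.journal:        # replay in the original write order
            self.replica[block] = data
        self.journal.clear()
        self.suspended = False

    def consistent(self) -> bool:
        return self.primary == self.replica


lun = ReplicatedLun()
lun.write(0, b"boot")
lun.suspend()
lun.write(1, b"data")
lun.write(0, b"boot-v2")
print(lun.consistent(), len(lun.journal))  # False 2
lun.resume()
print(lun.consistent(), len(lun.journal))  # True 0
```

Replaying in the original order matters: block 0 was written twice while suspended, and only the later write must end up on the replica.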
If the Cores are set up for high-availability, replication must be suspended from the leader Core in the primary data center. Once the leader approves the change, the follower also suspends replication, and they are both in the suspended state. If the leader fails for any reason, the follower automatically becomes the leader.
To suspend replication on the primary data center
1. On the primary Core, choose Configure > Replication: Set Up / Monitor to display the Replication (Set Up / Monitor) page.
Replication is still active, so the primary Core is still attempting to replicate to the secondary data center before acknowledging writes to the Edge.
2. Click Suspend Replication in the Replication Pair Connection panel.
The Core now contacts the Witness to verify whether it is possible to suspend replication to the secondary data center. Because the Witness knows the state of each data center, it is the authoritative source for this information.
Once the Core successfully suspends replication, the Status in the Replication Pair Connection and Edges panels changes to Suspended. The primary data center now writes to its storage array and journals all incoming writes to the Journal LUN.
To resume replication, click Resume Replication.
Initiating Failover (Recovering from Primary Data Center Failure)
If the primary data center fails or is out of service for an extended period of time, you can fail over the primary data center’s LUNs and Edges to the secondary data center to continue Edge operations. This process is entirely transparent to the Edges. Failover is not automatic: the administrator initiates it, and it is permitted only when the following criteria are met:
•  The primary data center is down.
•  All live and replica LUNs are synchronized across data centers (to prevent data loss).
•  The failover is approved by the Witness.
•  All LUNs on Edges are in an active state.
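Failover is permitted only when all four criteria above hold at once. The following Python sketch models the check and reports every unmet criterion. The `FailoverConditions` class and the `failover_blockers` function are hypothetical. The Core evaluates these conditions itself and does not expose them in this form.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FailoverConditions:
    """Hypothetical inputs mirroring the four failover criteria."""
    primary_down: bool
    luns_synchronized: bool
    witness_approved: bool
    edge_luns_active: bool


def failover_blockers(c: FailoverConditions) -> list[str]:
    """Return every unmet criterion; failover is permitted only if the list is empty."""
    checks = [
        (c.primary_down, "the primary data center is not down"),
        (c.luns_synchronized, "live and replica LUNs are not synchronized"),
        (c.witness_approved, "the Witness has not approved the failover"),
        (c.edge_luns_active, "not all LUNs on Edges are active"),
    ]
    return [message for ok, message in checks if not ok]


print(failover_blockers(FailoverConditions(True, True, True, True)))   # []
print(failover_blockers(FailoverConditions(True, False, True, True)))
# ['live and replica LUNs are not synchronized']
```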
After failover is complete, the secondary Core’s role in the Replication Configuration panel changes to Primary. All Edge connections and data commits move to the new primary Core, Edge commits show as “In Progress”, and replication is suspended.
The original secondary data center now acts as primary, and the original primary data center acts as secondary once it recovers from the failure.
Note: Any Edge that you did not set up for replication does not have its LUNs available during failover because they were not being replicated on the secondary data center.
To initiate failover to the secondary data center
1. On the secondary Core, choose Configure > Replication: Set Up / Monitor to display the Replication (Set Up / Monitor) page.
2. Click Initiate Failover in the Initiate Failover panel.
The Core contacts the Witness to verify that the primary data center is actually down.
If the secondary Core has successfully taken over and is serving the LUNs, the Role in the Replication Configuration panel appears as “Primary.”
Failing Back to the Primary Data Center
Once a failed data center has recovered, you can restore its role to primary if it is the preferred data center (for example, because of its hardware configuration or geographical location). To fail back to the original primary data center, initiate failover from the preferred data center, which is currently in the secondary role.
To fail back to the original primary data center
1. On the current primary Core, choose Settings > Maintenance: Service to display the Service page.
2. Click Stop to stop the SteelFusion Core service.
3. On the secondary Core (the original primary Core), choose Configure > Replication: Set Up / Monitor to display the Replication (Set Up / Monitor) page.
4. Click Initiate Failover in the Initiate Failover panel.
5. On the Core where you stopped the SteelFusion Core service, choose Settings > Maintenance: Service to display the Service page.
6. Click Restart to restart the SteelFusion Core service.
The data centers return to their original primary and secondary roles.
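Failback reuses the failover rules: failover is initiated from the secondary Core and requires the current primary to be down. One reading of Steps 1 and 2 is that stopping the SteelFusion Core service makes the current primary count as down. The following Python sketch models the sequence under that assumption. `Site` and `initiate_failover` are hypothetical names, and the sketch omits the Witness approval and LUN synchronization checks.

```python
class Site:
    """Hypothetical model of one data center's Core."""

    def __init__(self, name: str, role: str) -> None:
        self.name = name
        self.role = role
        self.service_running = True


def initiate_failover(target: Site, current_primary: Site) -> None:
    """Promote `target` (currently Secondary) if the current primary is down."""
    if target.role != "Secondary":
        raise RuntimeError("failover is initiated from the Secondary Core")
    if current_primary.service_running:
        raise RuntimeError("the current primary is still up; failover not permitted")
    target.role, current_primary.role = "Primary", "Secondary"


original_primary = Site("DC-A", "Secondary")  # recovered, currently Secondary
current_primary = Site("DC-B", "Primary")     # took over during the outage

current_primary.service_running = False              # Steps 1-2: stop the service
initiate_failover(original_primary, current_primary)  # Steps 3-4
current_primary.service_running = True               # Steps 5-6: restart the service
print(original_primary.role, current_primary.role)   # Primary Secondary
```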
Terminating Replication
To stop replication for an Edge, terminate replication for the Edge, and then unmap all replica LUNs associated with that Edge. To terminate replication for the Core completely, you must first disable replication for all replicating Edges.
Note: If the Cores are set up for high-availability, terminating replication from the primary data center leader also terminates it on all four nodes.
To terminate replication for the Edge
1. On the primary Core, choose Configure > Replication: Set Up / Monitor to display the Replication (Set Up / Monitor) page.
2. In the Edges panel, click Terminate Replication.
The status for the Edge changes to disabled.
Next, terminate replication for the Core.
To terminate replication for the Core
1. On the primary Core, choose Configure > Replication: Set Up / Monitor to display the Replication (Set Up / Monitor) page.
2. In the Edges panel, click Terminate Replication.
The status for the Edge changes to disabled.
3. Repeat Step 2 for each Edge with an enabled status.
Replication for all Edges must be in a disabled state in order to terminate Core replication.
4. For each LUN in the Edges panel, select No Replica LUN from the Secondary LUN Serial drop-down list.
5. Click Save LUN Replica Mappings and Sync Types.
The status for the current mapped LUN changes from active to deleting. Once the Core has finished unmapping the replica LUN, the status changes to unmapped.
Repeat Steps 4 and 5 for each LUN under the Edge.
6. Click Terminate Replication and Clear Settings in the Replication Pair Connection panel.
7. Click Clear Replication Settings in the Replication Configuration panel.
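The termination procedure follows a fixed order: disable each Edge, unmap its replica LUNs, and then clear the settings. The following Python sketch models that order, assuming the Core rejects steps taken out of sequence. The `CoreReplication` class and its methods are hypothetical. The `clear_settings` method stands in for Steps 6 and 7 together, and the model skips the intermediate "deleting" status.

```python
from __future__ import annotations


class CoreReplication:
    """Toy model of the termination order; not a SteelFusion API."""

    def __init__(self, edges: dict[str, list[str]]) -> None:
        # edge name -> serials of the LUNs mapped to it
        self.edge_status = {edge: "enabled" for edge in edges}
        self.lun_edge = {lun: edge for edge, luns in edges.items() for lun in luns}
        self.lun_status = {lun: "active" for lun in self.lun_edge}
        self.settings_cleared = False

    def terminate_edge(self, edge: str) -> None:
        self.edge_status[edge] = "disabled"

    def unmap_replica(self, lun: str) -> None:
        if self.edge_status[self.lun_edge[lun]] != "disabled":
            raise RuntimeError("terminate replication for the Edge first")
        # The real status passes through "deleting"; this model jumps to the end state.
        self.lun_status[lun] = "unmapped"

    def clear_settings(self) -> None:
        if "enabled" in self.edge_status.values():
            raise RuntimeError("replication is still enabled for an Edge")
        if any(status != "unmapped" for status in self.lun_status.values()):
            raise RuntimeError("unmap every replica LUN first")
        self.settings_cleared = True


core = CoreReplication({"edge-1": ["PRI-0001", "PRI-0002"]})
core.terminate_edge("edge-1")
for lun in ("PRI-0001", "PRI-0002"):
    core.unmap_replica(lun)
core.clear_settings()
print(core.settings_cleared)  # True
```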