Planning for a Data Protection Deployment


Question	Why This Is Important
WAN-Side Considerations
Is this a two-site or a multisite (fan-in, fan-out) data protection opportunity?	In a two-site deployment, the same SteelHead models are often selected for each site. In a multisite (fan-in, fan-out) deployment, the SteelHead at the central site is sized to handle the data transfers to and from the edge sites.
What is the WAN link size?	Knowing the WAN link size is essential in determining two things: which models are feasible for deployment, because SteelHead specifications are partially based on the WAN rating; and the level of data reduction the SteelHeads must deliver to meet the ultimate data protection objective.
What is the network latency between sites?	Knowing the latency in the environment is essential for providing accurate performance estimates. Network latency and WAN link size are used together to calculate buffer sizes on the SteelHead to provide optimal link utilization. Although SteelHeads are generally able to overcome the effects of latency for network protocols used in data protection solutions, some are still latency sensitive.
Is there a dedicated link for disaster recovery?	Environments with a dedicated link are typically easier to configure. Environments with shared links must employ features such as QoS to ensure that data protection traffic receives the bandwidth necessary to meet the ultimate objective.
LAN-Side Considerations
Which backup or replication products are you using?	Certain backup or replication products require special configuration. Knowing what is currently in use is essential for providing configuration recommendations and performance estimates. Riverbed has experience with different data protection products and business relationships with many different replication vendors. Many have similar configuration options and network utilization behaviors. Some examples of backup and replication products: EMC - SRDF/A, RecoverPoint; NetApp - SnapMirror, SnapVault; IBM - GlobalMirror, XIV replication, PPRC; HDS - TrueCopy, Hitachi Universal Replicator; Symantec - NetBackup; Vision Solutions - Double-Take; CA - ARCserve; HP - Continuous Access EVA.
Are you using synchronous or asynchronous replication?	Asynchronous replication is typically a very good fit. By comparison, synchronous replication has very stringent latency requirements and is rarely a good fit for WAN optimization. Many types of data protection traffic are not typically considered replication of either type, such as backup jobs.
What is your backup methodology?	Knowing the backup type and schedule provides insight into the frequency of heavy data transfers and the level of repetition within these transfers. Some examples of backup methodologies are: a single full backup and an incremental backup for life (synthetic full); a daily full backup; a weekly full backup and a daily incremental backup.
Are your data streams single or multistream? What is the total number of replication streams?	Knowing the number of TCP streams is essential in providing a configuration recommendation and performance estimate. Because SteelHeads proxy TCP/IP, the number of TCP streams created by the data protection solution can impact the SteelHead resource utilization. RiOS v5.0 and earlier have a constraint that each TCP session (stream) is serviced by a single CPU core, so splitting the load across many streams is essential to fully use the resources in larger, multicore SteelHeads. RiOS v5.5 or later has multicore features that allow multiple CPU cores to process a single stream. When considering the number of streams, of primary importance is the number of heavyweight data streams that carry significant amounts of traffic. In addition, consider any smaller control streams that carry a small amount of traffic (such as those present in many backup systems and some FCIP systems). Depending on the data protection technology in use, there might be options to increase the number of streams in use. As a first step, determine how many streams are observed in the current environment. Then determine whether there is a willingness to increase the number of data streams if a method to do so is suggested.
Is there an FCIP/iFCP gateway? If yes, what is the make, model, and firmware version?	Some FCIP/iFCP gateways (or particular firmware versions of some gateways) do not adhere fully to the TCP/IP or FCIP standards. Depending on what is in use, they might require firmware upgrades or special configuration, or it might not be possible to optimize them at this time. Gateways are mainly seen in Fibre Channel SAN replication environments such as SRDF/A, MirrorView, and TrueCopy. Typical firmware versions: Cisco MDS FCIP v4.1(3), Brocade 7500 FOS v6.3.1, QLogic isr6142 v2.4.3.2.
Is compression enabled on the gateway or the replication product? If yes, what is the current compression ratio?	Most data protection environments using FCIP or iFCP gateways use their built-in compression method, because this is a best practice of the product vendors and the SAN vendors who configure them. However, the best practice for WAN optimization of these technologies is to disable any compression currently in use and employ the SteelHead optimization instead. The first-pass LZ compression in the SteelHead typically matches the compression already in use and then RiOS SDR allows for an overall level of data reduction that improves the previous compression ratio. Knowing the current compression ratio achieved using the built-in compression method is important in determining whether the SteelHeads can improve upon it.
Are SteelHeads already deployed? If yes, what are their models and RiOS versions?	If the environment already has SteelHeads deployed and data protection is a new requirement, knowing the current appliance models in use can determine whether adequate system resources are available to meet the objectives without additional hardware. Knowing the current RiOS version is essential in determining what features and tuning opportunities are available in the RiOS release to provide the optimal configuration for data protection. If the environment does not already use SteelHeads, Riverbed can recommend the ideal RiOS version based on the environment and data protection objective.
X-Factor Considerations
How much new incremental data is added daily or hourly?	The rate of change information is extremely useful alongside the dataset size information to provide accurate performance estimates. If a dataset is too large for a single RiOS data store to find the data patterns for the entire dataset without wrapping continuously, Riverbed can plan system resources based on servicing the amount of data that changes hourly or daily.
What is the total size of the dataset?	For some data protection solutions such as backup, knowing the dataset size is extremely important for RiOS data store sizing. Ideally you want to select SteelHeads that can find the data patterns for the entire dataset without continuously wrapping the RiOS data store. For SAN-based solutions this information can be more difficult to gather, but even rough estimates can help. For example, you can estimate the size of the logical unit numbers (LUNs) that are subject to replication or the size of the databases stored on an array.
What is the dataset type? For example, Exchange, VMware, SQL, or file system.	Different types of data exhibit different characteristics when they appear on the network as backup or replication traffic. For example, file system data or VMware images often appear as large, sequential bulk transfers, and lend themselves well to disk-based data reduction. On the other hand, real-time replication of SQL database updates can often present a workload that requires a large number of disk seeks. These types of workloads can lend themselves better to a memory-based approach to data reduction.
Is the data precompressed?	You must determine if precompressed data is present for accurate performance estimates. Data stored at the point of origin in a precompressed format (such as JPEG images, video, or any type of data that has been compressed separately with utility tools such as WinZip) might see limited data reduction from SteelHeads.
Is the data encrypted?	Data stored at the point of origin in a preencrypted format (such as DPM-protected documents or encrypted database fields and records) might see limited data reduction from the SteelHead.
How repeatable is the data?	You must determine if repeatable data is present for accurate performance estimates. Data that contains internal repetition (such as frequent, small updates to large document templates) typically provides very high levels of data reduction.
What LAN-side throughput is needed to meet the data protection goal?	It is the speed of data going in and out of the systems on the LAN that establishes whether the data protection objectives can be met. The LAN-side throughput can be calculated by dividing the total amount of changed data by the time window for the replication or backup job. The WAN-side throughput and level of data reduction represent the level of optimization.
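
The throughput and link-sizing questions above can be turned into rough numbers before a sizing discussion. The following Python is a minimal sketch, not a Riverbed sizing tool. The function names are hypothetical, decimal units (1 GB = 10^9 bytes, 1 Mbps = 10^6 bits per second) are assumed, and the bandwidth-delay product is used only as a common starting point for TCP buffer sizing. The source states that latency and WAN link size are used together to calculate buffer sizes but does not give the exact formula the SteelHead applies.

```python
def lan_throughput_mbps(changed_gb: float, window_hours: float) -> float:
    """LAN-side throughput needed: changed data divided by the job window."""
    bits = changed_gb * 8 * 1000**3          # assumes decimal gigabytes
    seconds = window_hours * 3600
    return bits / seconds / 1e6


def required_reduction_factor(lan_mbps: float, wan_mbps: float) -> float:
    """Data reduction the optimization must deliver (for example, 2.5 means 2.5x)."""
    return lan_mbps / wan_mbps


def bandwidth_delay_product_bytes(wan_mbps: float, rtt_ms: float) -> float:
    """Bandwidth-delay product: an assumed starting point for buffer sizing."""
    bytes_per_second = wan_mbps * 1e6 / 8
    return bytes_per_second * rtt_ms / 1000


if __name__ == "__main__":
    # Hypothetical environment: 900 GB of changed data, 8-hour window,
    # 100 Mbps WAN link, 80 ms round-trip latency.
    lan = lan_throughput_mbps(changed_gb=900, window_hours=8)
    print(f"LAN-side throughput needed: {lan:.1f} Mbps")
    print(f"Required data reduction: {required_reduction_factor(lan, 100):.1f}x")
    print(f"Bandwidth-delay product: {bandwidth_delay_product_bytes(100, 80):,.0f} bytes")
```

With these hypothetical inputs, the job needs about 250 Mbps on the LAN side, so a 100 Mbps WAN link would require roughly 2.5x data reduction to finish within the window.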