Planning for a Data Protection Deployment
This section describes methods for planning a successful data protection deployment. You must consider several variables, each of which can have a significant impact on the model, number, and configuration of SteelHeads needed to deliver the required result. This section includes the following topics:
  • LAN-Side Throughput and Data Reduction Requirements
  • Predeployment Questionnaire
    Riverbed strongly recommends that you read both of these sections and complete the questionnaire. Riverbed also recommends that you consult with Riverbed Professional Services or an authorized Riverbed Delivery Partner when planning for a data protection deployment.
    For information about the other factors to consider before you design and deploy the SteelHead in a network environment, see Choosing the Right SteelHead Model.
    LAN-Side Throughput and Data Reduction Requirements
    This section describes the requirements and example configurations for LAN-side throughput and data reduction. This section includes the following topics:
  • Configuring a Nightly Full Database Backup
  • Configuring a Daily File Server Replication
  • Configuring a Very Large Nightly Incremental Backup
    Correctly qualifying, sizing, and configuring SteelHeads for a data protection environment depends on whether the deployed SteelHeads can:
  • receive and process data on the LAN at the required rate (LAN-side throughput),
  • reduce the data by a certain X-Factor, and
  • transfer the reduced data within the available WAN-side bandwidth.
    These constraints are defined by the following formula:
    LAN-side Throughput / X-Factor <= WAN-side Bandwidth
    You derive the LAN-side throughput requirements from an understanding of the maximum amount of data that must be transferred during a given time period. Often, the time allotted to transfer data is defined as a target Recovery Point Objective (RPO) for your organization.
    The RPO describes the acceptable amount of data loss, measured in time: it is the point in time to which you must be able to recover data. An organization generally defines the RPO as the data loss it considers acceptable following a disaster; it is measured in seconds, minutes, hours, days, or weeks. For example, an RPO of 2 hours means that you can always recover the state of the data as it was no more than 2 hours in the past.
    The following link provides an Excel throughput calculator that you can use to calculate bandwidth requirements expressed in other forms of time objectives: https://splash.riverbed.com/message/8478#8478.
    The X-Factor describes the level of data reduction necessary to fit the LAN data into the WAN link. For example, if LAN-side throughput required to meet RPO is 310 Mbps and WAN-side bandwidth available is 155 Mbps, then X-Factor is 2x. X-Factor is highly dependent on the nature of the data, but in practice it generally ranges from 2x (for LZ-only compression) to 4-8x (for default SDR mode).
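As a rough illustration of the formula above, the following Python sketch computes the LAN-side throughput implied by a data volume and a time window, and the X-Factor needed to fit that throughput into a WAN link. The function names are hypothetical, and decimal units (1 TB = 10^12 bytes) are assumed so that the results line up with the figures used in this section.

```python
def lan_throughput_mbps(data_bytes: float, window_seconds: float) -> float:
    """LAN-side throughput (Mbps) needed to move data_bytes within window_seconds."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return data_bytes * 8 / window_seconds / 1e6


def required_x_factor(lan_mbps: float, wan_mbps: float) -> float:
    """Smallest X-Factor that satisfies
    LAN-side Throughput / X-Factor <= WAN-side Bandwidth."""
    if wan_mbps <= 0:
        raise ValueError("wan_mbps must be positive")
    return lan_mbps / wan_mbps


# Example from the text: 310 Mbps on the LAN, 155 Mbps on the WAN.
print(required_x_factor(310, 155))  # 2.0
```

A result at or below the data reduction you expect from your data suggests the link may be adequate; a much larger result suggests that the link, the window, or the data volume needs to change.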
    Configuring a Nightly Full Database Backup
    Objective:
    “I want to copy 1.8 TB of nightly database dumps over my OC-3 within a 10-hour window.”
    Formula:
    1.8 TB / 10 hours = 400 Mbps
    Solution:
    An OC-3 link has a capacity of 155 Mbps. To deliver 400 Mbps, the SteelHead must reduce the total bandwidth over the WAN by 400/155 = 2.58x.
    Configuring a Daily File Server Replication
    Objective:
    “After consolidating the NetApp file servers from branch offices, I expect daily SnapMirror updates from my data center to go from 400 GB to 4 TB per day. I have a designated DS-3 that is nearly maxed out. Can the SteelHead help me replicate all 4 TB each day using my DS-3?”
    Formula:
    4 TB / 1 day = 370 Mbps
    Solution:
    A DS-3 link has a capacity of 45 Mbps. To deliver 370 Mbps, the SteelHead must reduce the total bandwidth over the WAN by 370/45 = 8.2x. This is at the upper end of the data reduction that the SteelHead can achieve using default SDR, and whether it is attainable depends on the amount of redundancy present in the data streams.
    Configuring a Very Large Nightly Incremental Backup
    Objective:
    “The incremental Tivoli Storage Manager (TSM) backup at a remote site is typically 600 GB and the backup window each night is 8 hours. Can I perform these backups over the WAN using a T1 link?”
    Formula:
    600 GB / 8 hours = 166 Mbps
    Solution:
    A T1 link has a capacity of 1.5 Mbps. To deliver 166 Mbps, the SteelHeads must reduce the total bandwidth over the WAN by 166/1.5 = 110x. This is a very high level of reduction that is typically out of range for data protection deployments.
    To support these backups over the WAN, you must upgrade the WAN link. A T3 link, for example, has a capacity of 45 Mbps. Using a T3 link, the SteelHeads need to achieve a data reduction of 166/45 = 3.7x, which is attainable for many deployments.
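The three scenarios above repeat the same arithmetic, so a small helper can run it against several candidate links at once. The following is a minimal Python sketch: the `LINKS_MBPS` table uses the nominal capacities quoted in this section, while the function name and the default 8x ceiling (taken from the upper end of the default SDR range mentioned earlier) are assumptions for illustration, not sizing guidance.

```python
# Nominal link capacities in Mbps, as quoted in this section.
LINKS_MBPS = {"T1": 1.5, "DS-3/T3": 45, "OC-3": 155}


def feasible_links(data_bytes, window_hours, max_x_factor=8.0):
    """Return {link: required X-Factor} for every link whose required
    X-Factor does not exceed max_x_factor (an assumed ceiling)."""
    lan_mbps = data_bytes * 8 / (window_hours * 3600) / 1e6
    result = {}
    for name, wan_mbps in LINKS_MBPS.items():
        x_factor = lan_mbps / wan_mbps
        if x_factor <= max_x_factor:
            result[name] = round(x_factor, 1)
    return result


# 600 GB incremental backup in an 8-hour window.
print(feasible_links(600e9, 8))  # {'DS-3/T3': 3.7, 'OC-3': 1.1}
```

The T1 link drops out because it would need roughly 110x reduction. Raise or lower `max_x_factor` to reflect the reduction you actually expect from your data.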
    Predeployment Questionnaire
    To organize and take a survey of the WAN-side, LAN-side, and X-Factor considerations, use the predeployment questionnaire in the following table. Discuss your completed survey with Riverbed Professional Services or an authorized delivery partner, to determine the best model, number, and initial configuration of the SteelHeads to deploy.
    For a Microsoft Word version of the Data Protection questionnaire, go to http://splash.riverbed.com/message/3194.
    Question
    Why This Is Important
    WAN-Side Considerations
    Is this a two-site or a multisite (fan-in, fan-out) data protection opportunity?
    In a two-site deployment, the same SteelHead models are often selected for each site. In a multisite (fan-in, fan-out) deployment, the SteelHead at the central site is sized to handle the data transfers to and from the edge sites.
    What is the WAN link size?
    Knowing the WAN link size is essential in determining:
  • which models are feasible for deployment, because SteelHead specifications are partially based on the WAN rating.
  • the level of data reduction the SteelHeads must deliver to meet the ultimate data protection objective.
    What is the network latency between sites?
    Knowing the latency in the environment is essential for providing accurate performance estimates. Network latency and WAN link size are used together to calculate buffer sizes on the SteelHead to provide optimal link utilization. Although SteelHeads are generally able to overcome the effects of latency for network protocols used in data protection solutions, some are still latency sensitive.
    Is there a dedicated link for disaster recovery?
    Environments with a dedicated link are typically easier to configure. Environments with shared links must employ features such as QoS to ensure that data protection traffic receives the bandwidth necessary to meet the ultimate objective.
    LAN-Side Considerations
    Which backup or replication products are you using?
    Certain backup or replication products require special configuration. Knowing what is currently in use is essential for providing configuration recommendations and performance estimates. Riverbed has experience with different data protection products and business relationships with many different replication vendors. Many have similar configuration options and network utilization behaviors.
    Some examples of backup and replication products:
  • EMC - SRDF/A, RecoverPoint
  • NetApp - SnapMirror, SnapVault
  • IBM - GlobalMirror, XIV replication
  • HDS - TrueCopy, Hitachi Universal Replicator
  • Symantec - NetBackup
  • Vision Solutions - Double-Take
  • CA - ARCserve
  • HP - Continuous Access EVA
  • IBM - PPRC
    Are you using synchronous or asynchronous replication?
    Asynchronous replication is typically a very good fit. By comparison, synchronous replication has very stringent latency requirements and is rarely a good fit for WAN optimization.
    Many types of data protection traffic are not typically considered replication of either type, such as backup jobs.
    What is your backup methodology?
    Knowing the backup type and schedule provides insight into the frequency of heavy data transfers and the level of repetition within these transfers.
    Some examples of backup methodologies are:
  • A single full backup and an incremental backup for life (synthetic full).
  • A daily full backup.
  • A weekly full backup and a daily incremental backup.
    Are your data streams single or multistream?
    What is the total number of replication streams?
    Knowing the number of TCP streams is essential in providing a configuration recommendation and performance estimate. Because SteelHeads proxy TCP/IP, the number of TCP streams created by the data protection solution can impact the SteelHead resource utilization.
  • RiOS v5.0 and earlier have a constraint that each TCP session (stream) is serviced by a single CPU core, so splitting the load across many streams is essential to fully use the resources in larger, multicore SteelHeads.
  • RiOS v5.5 or later has multicore features that allow multiple CPU cores to process a single stream.
    When considering the number of streams, of primary importance is the number of heavyweight data streams that carry significant amounts of traffic. In addition, consider any smaller control streams that carry a small amount of traffic (such as those present in many backup systems and some FCIP systems).
    Depending on the data protection technology in use, there might be options to increase the number of streams in use. As a first step, determine how many streams are observed in the current environment. Determine whether there is a willingness to increase the number of data streams if a method to do so is suggested.
    Is there a FCIP/iFCP gateway?
    If yes, what is the make, model, and firmware version?
    Some FCIP/iFCP gateways (or particular firmware versions of some gateways) do not adhere fully to the TCP/IP or FCIP standards. Depending on what is in use, they might require firmware upgrades or special configuration, or it might not be possible to optimize them at this time.
    Gateways are mainly seen in Fibre Channel SAN replication environments such as SRDF/A, MirrorView, and TrueCopy.
    Typical firmware versions: Cisco MDS FCIP v4.1(3), Brocade 7500 FOS v6.3.1, QLogic isr6142 v2.4.3.2.
    Is compression enabled on the gateway or the replication product?
    If yes, what is the current compression ratio?
    Most data protection environments using FCIP or iFCP gateways use their built-in compression method, because this is a best practice of the product vendors and the SAN vendors who configure them. However, the best practice for WAN optimization of these technologies is to disable any compression currently in use and employ the SteelHead optimization instead.
    The first-pass LZ compression in the SteelHead typically matches the compression already in use, and RiOS SDR then provides an overall level of data reduction that improves on the previous compression ratio.
    Knowing the current compression ratio achieved using the built-in compression method is important in determining whether the SteelHeads can improve upon it.
    Are SteelHeads already deployed?
    If yes, what are their models and RiOS version?
    If the environment already has SteelHeads deployed and data protection is a new requirement, knowing the appliance models currently in use can determine whether adequate system resources are available to meet the objectives without adding hardware.
    Knowing the current RiOS version is essential in determining what features and tuning opportunities are available in the RiOS release to provide the optimal configuration for data protection. If the environment does not already use SteelHeads, Riverbed can recommend the ideal RiOS version based on the environment and data protection objective.
    X-Factor Considerations
    How much new incremental data is added daily or hourly?
    The rate of change information is extremely useful alongside the dataset size information to provide accurate performance estimates. If a dataset is too large for a single RiOS data store to find the data patterns for the entire dataset without wrapping continuously, Riverbed can plan system resources based on servicing the amount of data that changes hourly or daily.
    What is the total size of the dataset?
    For some data protection solutions such as backup, knowing the dataset size is extremely important for RiOS data store sizing. Ideally you want to select SteelHeads that can find the data patterns for the entire dataset without continuously wrapping the RiOS data store.
    For SAN-based solutions this information can be more difficult to gather, but even rough estimates can help. For example, you can estimate the size of the logical unit numbers (LUNs) that are subject to replication or the size of the databases stored on an array.
    What is the dataset type? For example, Exchange, VMware, SQL, or file system.
    Different types of data exhibit different characteristics when they appear on the network as backup or replication traffic. For example, file system data or VMware images often appear as large, sequential bulk transfers, and lend themselves well to disk-based data reduction.
    On the other hand, real-time replication of SQL database updates can often present a workload that requires heavy amounts of disk seeks. These types of workloads can lend themselves better to a memory-based approach to data reduction.
    Is the data precompressed?
    You must determine if precompressed data is present for accurate performance estimates. Data stored at the point of origin in a precompressed format (such as JPEG images, video, or any type of data that has been compressed separately with utility tools such as WinZip) might see limited data reduction from SteelHeads.
    Is the data encrypted?
    Data stored at the point of origin in a preencrypted format (such as DPM-protected documents or encrypted database fields and records) might see limited data reduction from the SteelHead.
    How repeatable is the data?
    You must determine if repeatable data is present for accurate performance estimates. Data that contains internal repetition (such as frequent, small updates to large document templates) typically provides very high levels of data reduction.
    What LAN-side throughput is needed to meet the data protection goal?
    It is the speed of data going in and out of the systems on the LAN that establishes whether the data protection objectives can be met. The LAN-side throughput can be calculated by dividing the total amount of changed data by the time window for the replication or backup job. The WAN-side throughput and level of data reduction represent the level of optimization.
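To relate the same three quantities in the other direction, the sketch below estimates how long a job might take for a given WAN link and an assumed X-Factor. The function name is hypothetical, and the calculation assumes a fully used link and a constant X-Factor, which real deployments only approximate.

```python
def transfer_hours(data_bytes: float, wan_mbps: float, x_factor: float) -> float:
    """Estimated hours to move data_bytes over a wan_mbps link when the
    data is reduced by x_factor before crossing the WAN."""
    if wan_mbps <= 0 or x_factor <= 0:
        raise ValueError("wan_mbps and x_factor must be positive")
    # Effective LAN-side throughput is the WAN bandwidth scaled by the X-Factor.
    effective_lan_mbps = wan_mbps * x_factor
    return data_bytes * 8 / (effective_lan_mbps * 1e6) / 3600


# 1.8 TB over an OC-3 (155 Mbps) with an assumed 3x reduction.
hours = transfer_hours(1.8e12, 155, 3.0)
print(f"{hours:.1f} hours")  # 8.6 hours
```

Comparing the estimate with the available backup or replication window gives a quick first check before a more detailed sizing discussion.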