Deployment in ESXi
This chapter provides several best practices to deploy NetIM on a VMware ESXi hypervisor, including:
• Use dedicated resources for consistent performance.
• Meet the system requirements in your virtual machine (VM) deployments.
• Understand NetIM Docker IP addressing.
• Be familiar with the port access list.
• Be cautious when using vMotion and VMTools.
• Use the recommended deployment steps for deploying NetIM in ESXi 6.5, 6.7, or 7.0.
• Make adjustments to the IP address definitions if required.
General best practices
Guaranteeing a consistent NetIM user experience requires careful selection of virtual resources and tools: to handle the initial load, to maintain daily performance, and to keep pace as you configure more network elements and services.
Consider these best practices when you are using VMware ESXi as a hypervisor:
• Use dedicated virtual resources.
• Consider future growth when sizing NetIM.
• Be aware of vMotion usage limitations.
• Use VM tools with caution because Riverbed does not support them.
• Use additional, unsupported OS packages, agents, or sensors with caution.
• Understand, organize, and document your IP addressing plan.
Importance of dedicated resources
NetIM responsiveness and performance depend on computing capacity, memory, and storage availability. Many services run in parallel, and when you add new devices, interfaces, or network processes to the NetIM platform, dedicated system resources are key for maintaining the NetIM user experience.
Best practices for consistent performance of the NetIM systems:
• Resource planning—Complete a full review of the resources needed before installing the NetIM VMs.
• Resource allocation—Selecting appropriate hypervisor resources and installing the NetIM OVA files accordingly produces a correct initial allocation. This basic provisioning sets the resource baseline.
• Resource reservation—Reservation of resources ensures that NetIM has adequate, contiguous resources available for its daily stable operation.
• Resource prioritization—In cases where resource contention exists in the host system, an increase in resource priority can ensure NetIM has timely access to allocated and reserved resources for CPU, memory, and storage.
The performance of the virtual network infrastructure (vSwitches, vRouters, and so on) directly affects the performance of the cluster and of communications within the swarm. We recommend that the virtual network infrastructure provide non-blocking forwarding between the NetIM VMs. Traffic shaping or policing is typically not implemented. If traffic shaping or policing becomes necessary, use the communication specifications in the installation guide as the baseline or minimum to support NetIM operation; lower values degrade NetIM operation.
The system requirements definitions for NetIM provide the baseline for initial deployment; see the “System requirements” section below. Often, these requirements are adequate for ongoing operation. When you increase resources for the NetIM VMs, the performance of the added resources is typically as important as the amount added. Reserving resources ensures that, on host systems where resource contention may be high, the NetIM VMs still have the resources they need; raising their resource priority from Normal to High (or to a custom value) then helps prevent memory and CPU contention.
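If you manage ESXi hosts from a command line, the sketch below shows one way to apply memory and CPU reservations and raise shares using govc, the open source vSphere CLI. This is an optional example, not part of the documented procedure: the VM name and values are placeholders, and the resource-allocation flags assume a recent govc release (verify with govc vm.change -h). You can make the same settings in the vSphere Client under Edit Settings.
# Example only: reserve 16 GB of memory and 8000 MHz of CPU for a NetIM VM,
# then raise its CPU and memory shares from Normal to High.
# "NetIM-Core" and the numeric values are placeholders; size them from the requirements guide.
govc vm.change -vm NetIM-Core -mem.reservation 16384 -cpu.reservation 8000
govc vm.change -vm NetIM-Core -cpu.shares high -mem.shares high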
System requirements
Actual deployment requirements depend on your licensed polling limits. To select the compute capacity (CPUs), memory, and storage to deploy NetIM, see the Alluvio NetIM System and Deployment Requirements Guide.
Understanding NetIM Docker IP addressing
Prior to NetIM 2.5.1, NetIM internally uses IPv4 addresses in these subnets by default:
• 10.255.0.0/16
• 10.50.0.0/16
• 10.60.0.0/16
• 172.17.0.0/16
• 172.18.0.0/16
In NetIM 2.5.1 and later (see the NetIM 2.5.1 Release Notes), the default NetIM IP address ranges are:
• bridge: 198.18.0.0/18
• docker_gwbridge: 198.18.64.0/18
• ingress: 198.18.128.0/18
• default_primary-network: 198.18.192.0/18
• default_agent-network: 198.19.0.0/18
During setup, NetIM modifies subnet usage if it detects one of the above subnets is in use in your network. NetIM setup’s Advanced Docker configuration step allows you to view and manually select alternative subnets.
Subnets used internally by NetIM should not be routable subnets in your enterprise network.
During installation, an automated check determines which IPv4 address ranges can be used for the cluster. The ranges identified automatically are usually RFC 1918 IPv4 addresses. Alternatively, these address ranges can be predetermined during planning and then implemented when the NetIM systems are configured after initial deployment.
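To see which subnets the Docker networks on a NetIM node are actually using, and to compare them against the routable ranges in your enterprise network, you can inspect the networks directly. This is a minimal sketch that assumes shell access to the node; the network names match the defaults listed above.
# List the Docker networks present on the node
docker network ls
# Show the subnet assigned to a specific network, for example the ingress network
docker network inspect ingress --format '{{range .IPAM.Config}}{{.Subnet}} {{end}}'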
Access to ports
NetIM requires various ports to be open to communicate on your network. See the port table and the function of each port in the Alluvio NetIM System and Deployment Requirements Guide.
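Before or after deployment, you can spot-check that a required port is reachable between NetIM VMs. The sketch below uses nc (netcat), which may not be installed on every image; the address and port are placeholders, so substitute values from the port table in the requirements guide.
# Test TCP reachability from one NetIM VM to another on a required port
# (<peer-ip> and <port> are placeholders; use values from the requirements guide)
nc -zv <peer-ip> <port>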
vMotion use and limitations
VM snapshots taken through vMotion are not supported under any circumstances while the NetIM services are running.
VMTools
VMware Tools (VMTools) is a set of services and modules in VMware products that helps with certain OS processes and automation; however, Riverbed does not support actions executed on the NetIM OS through VMTools.
Deploy NetIM in ESXi 6.5, 6.7, or 7.0
To deploy NetIM software on the ESXi server(s), you need to download two OVA packages from the Support site (netim_core_2XX_XXX.ova and netim_microservices_2XX_XXX.ova) and copy them to your local hypervisor.
Using a supported browser and your ESXi host login credentials, log in to one or more ESXi servers that host the NetIM VMs.
Note: The OVAs do not need to be deployed to the same ESXi server. However, the VMs created must be able to communicate over all required ports.
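If you prefer to script the deployment instead of using the ESXi host client, VMware's ovftool can push an OVA directly to a host. The following is an optional sketch, not the documented procedure; the VM name, datastore, network label, host, and credentials are placeholders.
# Example only: deploy the core OVA to an ESXi host (all names and credentials are placeholders)
ovftool --name=NetIM-Core --datastore=datastore1 --network="VM Network" --diskMode=thin \
  netim_core_2XX_XXX.ova vi://root@esxi-host-1/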
To deploy the different NetIM components, follow the steps in the Alluvio NetIM Virtual Edition Installation Guide. You will create:
• a single NetIM manager from the microservices OVA.
• one or more NetIM workers from the microservices OVA.
• zero or more NetIM data managers from the microservices OVA.
• a single NetIM core from the netim_core OVA.
Configuring NetIM in ESXi
The initial configuration sets up the NetIM components such that they are associated and can communicate with each other. You perform this configuration through the VM consoles. After configuration and startup of all NetIM components, use a web browser to access the NetIM web UI for licensing.
For detailed configuration steps, see the Alluvio NetIM Virtual Edition Installation Guide. These steps include:
• Setting up the NetIM manager, data manager, workers, swarm, and core.
• Signing in to the NetIM VM web user interface.
Post-deployment adjustments
Increasing VM disk space, memory, and vCPUs
Find detailed configuration steps to increase virtual resources in the Alluvio NetIM Virtual Edition Installation Guide.
The system requirements definitions for NetIM provide the baseline for initial deployment; see the “System requirements” section above. Often, these requirements are adequate for ongoing operation. When you increase resources for the NetIM VMs, the performance of the added resources is typically as important as the amount added. Reserving resources ensures that, on host systems where resource contention may be high, the NetIM VMs still have high-priority access to the resources allocated to them; the increased priority minimizes resource contention.
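After you increase disk space, memory, or vCPUs and restart the VM, you can confirm that the guest OS sees the new resources with standard Linux commands. This assumes shell access to the NetIM VM; exact device and filesystem names depend on your deployment.
lscpu | grep '^CPU(s):'   # number of vCPUs visible to the guest
free -h                   # total and available memory
lsblk                     # block devices and their sizes
df -h                     # mounted filesystems and free space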
Scaling NetIM with additional data managers
You should always try to plan your NetIM deployment to allocate the recommended number of data managers for the scale of your deployment. When you add data managers to the swarm, metric persistence (Cassandra) and event streaming (Kafka) automatically scale and balance across the manager and data managers. While it is always possible to add data managers after your initial deployment, rebalancing existing persisted data across the manager and data managers incurs additional overhead and should be avoided.
In addition to scaling considerations related to deployment size, NetIM systems with many concurrent web user interface sessions can benefit from additional data managers, which provide further scaling support.
Scaling NetIM with additional workers
Plan your NetIM deployment so that you also allocate the recommended number of workers for the expected scale of your deployment (see the “System requirements” section). However, it is less important to plan and allocate the exact number of workers your deployment may require, because you can add workers and load balance across multiple workers in the swarm at any time.
Add workers to meet growing or changing requirements, including these:
• Accounts
• Type of data collected
• Number of elements polled
• Frequency of collection
• Class of service reporting (consumes higher resources)
• User-defined metrics (custom or specific), including their type, number, and collection frequency
The tenant services are not automatically scaled and load balanced across the workers. You need to manually scale up certain services in the tenant stack to take advantage of the added workers.
When adding additional workers, we recommend that you scale up these services:
• Poller
• Alerting
• Thresholding
You can scale up swarm services by entering the scale command on the NetIM Manager VM:
scale tenant-stack/<id> <tenant-service-name> <number-of-replicas>
The scale command persists the service scaling across reboots or restarts. For example, if you deployed three workers in your NetIM swarm, we recommend that you scale your poller, alerting, and thresholding services to the number of workers by entering these commands on the NetIM manager:
scale tenant-stack/1 poller 3
scale tenant-stack/1 alerting 3
scale tenant-stack/1 thresholding 3
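To confirm the new replica counts, you can list the swarm services on the NetIM manager. docker service ls is a standard Docker Swarm command; the exact tenant service names it displays depend on your deployment.
# Show each swarm service with its current replica count (for example, 3/3 after scaling)
docker service ls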
Adjust the internal subnets
If you need to adjust the internal subnets, you may do so after initial setup.
To adjust the internal subnets
1. Stop all services on the core and swarm.
2. Run the docker swarm leave -f command on all non-manager nodes.
3. Run the docker swarm leave -f command on the manager.
4. Run setup again on the manager and answer “yes” when asked whether you want to perform Advanced Docker Configuration.
5. Run setup again on all data manager and worker nodes.
6. Run setup again on the core.
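After setup completes on all nodes, a quick way to confirm that the swarm re-formed and that the internal networks were re-created is to run standard Docker commands on the manager. This is a minimal sketch that assumes shell access; inspect individual networks as shown earlier in this chapter if you need the exact subnet ranges.
# Verify that all nodes rejoined the swarm
docker node ls
# List the re-created Docker networks
docker network ls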