About Performance Enhancements
This section describes configuration settings that are not required but can improve network throughput and acceleration performance.
About domain process CPU pinning
Network I/O is processed by vhost threads, which are threads in the Quick Emulator (QEMU) user space. Vhost threads should be pinned to match the guest virtual CPU (vCPU) threads. We recommend dedicating at least two physical CPUs to vhost threads, which constrains these threads to run on the same subset of physical CPUs and memory, improving system performance. This sample XML configuration pins the vhost (emulator) threads to physical CPUs 0 and 2.
<cputune>
<emulatorpin cpuset="0,2"/>
</cputune>
About disk I/O thread allocation and pinning
Input/output threads (I/O threads) are dedicated event loop threads for supported disk devices. I/O threads handle block I/O requests and can improve the scalability of some systems, particularly Symmetric Multiprocessing (SMP) hosts and guests that have many logical unit numbers (LUNs).
We recommend pinning I/O threads to physical CPUs that reside in the same non-uniform memory access (NUMA) node. This sample XML configuration defines four I/O threads for the disk devices by using the iothreads XML element, and pins I/O threads 1 through 4 to physical CPUs 4, 6, 8, and 10 by using the iothreadpin XML element.
<domain>
<iothreads>4</iothreads>
</domain>
<cputune>
<iothreadpin iothread='1' cpuset='4,6,8,10'/>
<iothreadpin iothread='2' cpuset='4,6,8,10'/>
<iothreadpin iothread='3' cpuset='4,6,8,10'/>
<iothreadpin iothread='4' cpuset='4,6,8,10'/>
</cputune>
About disk I/O load distribution
We recommend distributing RiOS data store disk I/O equally across the I/O threads. This example assumes four I/O threads and eight RiOS data store disks; assigning two disks to each thread distributes the processing equally. The iothread attribute of the driver XML element assigns a disk to a specific I/O thread.
<disk type='file' device='disk'>
<driver name='qemu' type='qcow2' iothread='1' cache='none' io='threads'/>
<source file='/work/quicksilver/storage/oak-cs737-vsh1-segstore1.img'/>
<target dev='vdb' bus='virtio'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
</disk>
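As a sketch of this distribution (the image file names and target devices here are illustrative, not taken from an actual deployment), the first two data store disks could be assigned to I/O thread 1, the next two to I/O thread 2, and so on until all eight disks are spread across the four threads:

```xml
<!-- Illustrative example: disks 1 and 2 assigned to I/O thread 1 -->
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2' iothread='1' cache='none' io='threads'/>
  <source file='/work/quicksilver/storage/segstore1.img'/>
  <target dev='vdb' bus='virtio'/>
</disk>
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2' iothread='1' cache='none' io='threads'/>
  <source file='/work/quicksilver/storage/segstore2.img'/>
  <target dev='vdc' bus='virtio'/>
</disk>
<!-- Disks 3 and 4 would use iothread='2', disks 5 and 6 iothread='3',
     and disks 7 and 8 iothread='4' -->
```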
About virtual CPU-to-physical CPU pinning
Pinning each virtual CPU to a separate physical core can increase acceleration performance. We recommend not using physical CPU 0 (zero) to pin virtual CPUs. Use the lscpu command to view the configuration of the physical cores on the KVM host and to find CPUs that share the same NUMA node.
In this example physical CPUs 0 to 4 share node 0, physical CPUs 5 to 9 share node 1, physical CPUs 10 to 14 share node 2, and physical CPUs 15 to 19 share node 3.
NUMA node0 CPU(s): 0-4
NUMA node1 CPU(s): 5-9
NUMA node2 CPU(s): 10-14
NUMA node3 CPU(s): 15-19
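The NUMA topology lines shown above can be filtered out of the lscpu output with a command such as this (the exact node and CPU ranges vary by host):

```shell
# Show only the NUMA topology lines from the lscpu output
lscpu | grep "NUMA node"
```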
Open the domain.xml file, and then add a <cputune> section to the file and assign each virtual CPU to a single, separate physical core. For example:
<cputune>
<vcpupin vcpu='0' cpuset='5'/>
<vcpupin vcpu='1' cpuset='6'/>
<vcpupin vcpu='2' cpuset='7'/>
<vcpupin vcpu='3' cpuset='8'/>
</cputune>
In this example, virtual CPU 0 is pinned to physical CPU 5, virtual CPU 1 is pinned to physical CPU 6, virtual CPU 2 is pinned to physical CPU 7, and virtual CPU 3 is pinned to physical CPU 8.
Save your changes to the domain.xml file, and then restart the virtual machine.
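Assuming the guest was defined with virsh and is named sh-vm (a hypothetical name; substitute your own domain name and path), applying the edited domain.xml and restarting the virtual machine might look like this:

```shell
# Re-read the edited definition, then stop and start the guest
virsh define domain.xml
virsh shutdown sh-vm
virsh start sh-vm
```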
About data store and management disk separation
You can improve performance by placing the data store and management virtual disks on separate physical storage devices. Place the data store on the fastest disk drive.
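For example (the mount points, image file names, and target devices here are illustrative, assuming the data store image resides on a fast dedicated device mounted at /datastore and the management disk on a separate device mounted at /mgmt), the two disks would reference source files on different physical devices:

```xml
<!-- Illustrative example: management disk on one physical device -->
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2'/>
  <source file='/mgmt/storage/mgmt.img'/>
  <target dev='vda' bus='virtio'/>
</disk>
<!-- RiOS data store disk on a separate, faster physical device -->
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2'/>
  <source file='/datastore/storage/segstore1.img'/>
  <target dev='vdb' bus='virtio'/>
</disk>
```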
About disk cache mode
You can improve write performance on the data store disk drive by setting the disk cache mode to none. Riverbed supports the qcow2 format for the data store disk. With the caching mode set to none, the host page cache is disabled, but the disk write cache remains enabled for the guest. In this mode, write performance in the guest is optimal because write operations bypass the host page cache and go directly to the disk write cache. Data integrity can be ensured if the disk write cache is battery-backed, or if the applications or storage stack in the guest flush data properly (through fsync operations or file system barriers).
Open the domain.xml file created during the SteelHead for KVM instantiation. Append these attributes to the <driver> element within the <disk> section that refers to your RiOS data store:
cache='none' io='threads'
Example:
<disk type='file' device='disk'>
<driver name='qemu' type='qcow2' cache='none' io='threads'/>
<source file='/work/quicksilver/storage/myHost-vsh1-segstore1.img'/>
<target dev='vdb' bus='virtio'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
</disk>
Save your changes to the domain.xml file, and restart the virtual machine.