About blockstore prefetch and prepopulation

One drawback of running storage protocols across a wide area network (WAN) is that successive requests for data can be so nonsequential that they appear random. This randomness is by design and is a facet of the backend storage, but it makes rapid data delivery across a WAN difficult, because the high latency adds a long turnaround time between each request and its response.

One way to mitigate this effect is for the sending side to predict what the subsequent requests will be, so that data can be sent without waiting for each request. However, this is not possible with traditional storage protocols.
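The cost of waiting for each request can be illustrated with some rough arithmetic. The sketch below is an illustration only: the function name, the request count, and the round-trip time are all hypothetical, and it ignores transfer time and counts only the round trips spent waiting.

```python
def total_wait_ms(requests: int, rtt_ms: float, blocks_per_round_trip: int = 1) -> float:
    """Time spent waiting on WAN round trips for a run of block reads.

    blocks_per_round_trip is how many useful blocks arrive per round trip:
    1 models a protocol that waits for every request, larger values model a
    sender that correctly predicts and ships blocks ahead of the requests.
    """
    round_trips = -(-requests // blocks_per_round_trip)  # ceiling division
    return round_trips * rtt_ms


# Hypothetical numbers: 1,000 block reads over a link with a 100 ms round trip.
print(total_wait_ms(1000, 100.0))      # one block per round trip
print(total_wait_ms(1000, 100.0, 50))  # 50 correctly predicted blocks per round trip
```

With these assumed figures, the first call reports 100,000 ms of waiting and the second 2,000 ms, which is why successful prediction matters more as latency grows.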

The system architecture is well suited to this approach because data on the Edge is held locally in the blockstore. The blockstore is the persistent local cache of storage blocks linked to one or more dedicated LUNs at the data center. As long as the data blocks that the Edge wants to read are already in the blockstore, no request needs to be transmitted across the WAN through the Core to backend storage. Instead, the Edge responds to the read request by serving data at local disk speed. Because the Edge blockstore benefits from a three-tier architecture of memory, solid-state disk (SSD), and hard disk drive (HDD), the read response is faster than could be achieved by storage in traditional branch servers.

If the required data is not in the blockstore (called a blockstore miss), the Edge requests the data from the backend storage by asking the Core. The Core understands specific blockstore formats, so it can send the requested data and then continue to proactively send additional blocks that it predicts the Edge may need. If the prediction is successful, the Edge finds the subsequent data it needs locally in the blockstore and does not need to submit further requests to the Core.
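The read path described above can be sketched as follows. This is a minimal model, not the product's implementation: the `Core` and `Edge` classes, their methods, and the `prefetch_count` parameter are hypothetical names, and the "predict the next sequential blocks" rule is an assumption made only so the example runs. The text does not describe how the Core actually makes its predictions.

```python
from typing import Dict


class Core:
    """Hypothetical stand-in for the Core, sitting in front of backend storage."""

    def __init__(self, backend: Dict[int, bytes], prefetch_count: int = 4) -> None:
        self.backend = backend
        self.prefetch_count = prefetch_count
        self.requests_served = 0

    def fetch(self, block_id: int) -> Dict[int, bytes]:
        """Return the requested block plus blocks predicted to be read next."""
        self.requests_served += 1
        # Assumption: predict the next few sequential block IDs. The real
        # prediction logic is not described in the text.
        predicted = range(block_id, block_id + 1 + self.prefetch_count)
        return {b: self.backend[b] for b in predicted if b in self.backend}


class Edge:
    """Hypothetical stand-in for the Edge and its local blockstore."""

    def __init__(self, core: Core) -> None:
        self.core = core
        self.blockstore: Dict[int, bytes] = {}
        self.misses = 0

    def read(self, block_id: int) -> bytes:
        if block_id not in self.blockstore:
            # Blockstore miss: one WAN request, answered with extra blocks.
            self.misses += 1
            self.blockstore.update(self.core.fetch(block_id))
        # Blockstore hit (or freshly filled): serve from local storage.
        return self.blockstore[block_id]


backend = {i: f"block-{i}".encode() for i in range(10)}
edge = Edge(Core(backend))
data = [edge.read(i) for i in range(10)]
print(edge.misses, edge.core.requests_served)
```

In this run, ten reads cause only two blockstore misses and two requests to the Core, because each miss brings back four additional blocks that the following reads find locally.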

Depending on the scenario, populating the Edge blockstore can be performed by one or more of these methods:

• Prefetch—The act of pushing data out from the Core in response to a blockstore miss at the Edge. Not only does the Core push out the requested data, it also sends across additional data that it predicts may be required by the Edge on subsequent reads. Prefetch is reactive (on demand) in that it operates in response to the Edge.

• Prepopulation—Sometimes referred to as full prepopulation. Prepopulation is the act of proactively sending data without a request from the Edge. Prepopulation does not work with unpinned storage. Unpinned storage is populated dynamically (incorporating prefetch) as the Edge begins using it.
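The difference between the two methods can be sketched in code. The example below covers only prepopulation and the pinned-storage restriction. The `Lun` type, the `prepopulate` function, and the choice to copy every block of the LUN are assumptions made for illustration, not a description of the product's interface.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class Lun:
    """Hypothetical description of a LUN exported to the Edge."""
    name: str
    pinned: bool
    blocks: Dict[int, bytes] = field(default_factory=dict)


def prepopulate(lun: Lun, blockstore: Dict[Tuple[str, int], bytes]) -> int:
    """Push a pinned LUN's blocks to the blockstore with no Edge request.

    Returns the number of blocks sent. Unpinned LUNs are rejected, because
    they are populated on demand (through prefetch) as the Edge uses them.
    """
    if not lun.pinned:
        raise ValueError(f"{lun.name}: prepopulation requires pinned storage")
    # Assumption: "full" prepopulation copies every block of the LUN.
    for block_id, block in lun.blocks.items():
        blockstore[(lun.name, block_id)] = block
    return len(lun.blocks)


blockstore: Dict[Tuple[str, int], bytes] = {}
pinned_lun = Lun("lun-a", pinned=True, blocks={0: b"a0", 1: b"a1", 2: b"a2"})
unpinned_lun = Lun("lun-b", pinned=False, blocks={0: b"b0"})

sent = prepopulate(pinned_lun, blockstore)
try:
    prepopulate(unpinned_lun, blockstore)
    rejected = False
except ValueError:
    rejected = True
```

Here the pinned LUN is copied to the blockstore before any read takes place, while the unpinned LUN is refused and would instead fill in as reads miss and trigger prefetch.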