Benefits and uses of all-flash storage in the M&E workflow

Introduction

The advantage that flash-based storage in general, and NVMe in particular, brings to the M&E workflow is the ability to decouple the scaling of performance from the scaling of capacity. This can be used to patch suboptimal workflows (a valid use of the technology when done intentionally), but it can also deliver targeted performance improvements to data sets that are critical for parallel workflows, without expanding capacity simply to gain performance.

Background, feel free to skip ahead.

Using Isilon as a general example, the traditional method for building a cluster was to determine the rough ratio of performance to capacity that a workload needed. This would drive the selection of a node type. Consideration was given to the shape of the workload, which typically included a smaller data set that required higher performance and a much larger data set that could make do with less. During deployment and migration, a small percentage of the high-performance data would be distributed across each drive in every node, along with a larger share of the lower-performance data. When a user accessed the high-performance data, the aggregate effect of this distribution made the cluster appear much more performant than a cluster sized for just the high-demand data. If the initial estimate of the ratio of high- to low-performance data was correct, and the data continued to grow at that ratio, the cluster could scale indefinitely and all workloads would be satisfied. However, misestimating the initial ratio, suffering unplanned growth in the high-performance data set, or increasing demand on the lower-performance data set quickly led to disaster.
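The coupling described above can be sketched as a simple sizing rule: when capacity and performance both come from the same spindles, the drive count is set by whichever requirement is larger. The per-drive figures and workload numbers below are hypothetical values chosen for illustration, not vendor specifications, and the function name is an assumption of this sketch.

```python
import math

def drives_required(capacity_tb, iops, drive_tb, drive_iops):
    """Drives needed when the same spindles supply both capacity and IOPS."""
    for_capacity = math.ceil(capacity_tb / drive_tb)
    for_performance = math.ceil(iops / drive_iops)
    # The larger of the two requirements dictates the purchase.
    return max(for_capacity, for_performance)

# Hypothetical per-drive figures, for illustration only.
DRIVE_TB = 4       # usable capacity per spinning drive
DRIVE_IOPS = 100   # sustained IOPS per spinning drive

# Planned workload: 800 TB in total, needing 20,000 IOPS in aggregate.
planned = drives_required(800, 20_000, DRIVE_TB, DRIVE_IOPS)

# Unplanned growth: performance demand doubles, capacity need is unchanged.
drifted = drives_required(800, 40_000, DRIVE_TB, DRIVE_IOPS)

# Capacity bought only to obtain spindles, beyond what the data needs.
stranded_tb = drifted * DRIVE_TB - 800

print(planned, drifted, stranded_tb)
```

With these assumed numbers the planned cluster is balanced, but once performance demand drifts away from the original ratio, the drive count is driven entirely by IOPS and the extra capacity is purchased whether or not anyone needs it.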

Another requirement particular to the M&E space is the need for highly performant metadata operations. General-purpose file servers don’t usually have directories with thousands of files or require directory tree walks across large parts of the namespace. Both situations are common in M&E workflows, and they represent a third tier of super-performant storage required for the acceptable operation of a file server. Initially, Isilon lacked the ability to provide this metadata acceleration. They responded in two ways: the A100 class nodes and SSDs. The A100 class nodes, available until recently, came first. These nodes added compute, RAM, network, and cache to existing clusters without increasing the spindle count. They tended to improve metadata read performance and, to a lesser degree, general write performance, thanks to the additional RAM and L1 cache that clients could use for these operations. It was an expensive way to address the problem, though, and soon SSDs began to appear in new node classes, first as an option, then as a required part of the design.

As SSD prices dropped and the size of metadata pools and performant data sets increased, the cache drives became larger and more numerous in the node classes. Eventually, the ability to cache small amounts of data was added in response to customer demand. Experimental all-flash variants of the S class nodes were tested, and eventually, the all-flash F class node family was released.

The good stuff.

Today, all-flash clusters provide the ability to deliver the best available performance for data sets of any size. The need to interleave higher- and lower-performance data on a larger cluster is removed, and both data sets can be scaled independently on hardware optimized for the use case. This is an important development, as the move to higher-density spinning drives made the high-to-low-performance ratio calculations untenable. Based on spindle count performance, the modern equivalent of a basic X410 cluster could have a capacity as high as 2.1PB. This is not really a fair comparison, because a lot of other things have changed over the years, but if aggregate IOPS is a driving factor, spindle count is still very important. Flash storage breaks this ratio by providing massive data transfer speed to drives of arbitrary size.
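The density problem above can be illustrated with a little arithmetic: if a spinning drive delivers roughly the same IOPS regardless of its size, then holding the spindle count fixed to preserve aggregate IOPS drags along more and more capacity as drives grow, and the IOPS available per terabyte falls. The spindle count, per-drive IOPS, and drive sizes here are assumptions picked for the sketch, not measurements of any real node type.

```python
# Hypothetical figures, for illustration only.
SPINDLES = 100     # drives needed to reach the aggregate IOPS target
DRIVE_IOPS = 100   # assumed roughly constant across drive generations

rows = []
for drive_tb in (1, 4, 16):
    capacity_tb = SPINDLES * drive_tb       # capacity that comes with the spindles
    iops_per_tb = DRIVE_IOPS / drive_tb     # performance density of the cluster
    rows.append((drive_tb, capacity_tb, iops_per_tb))

for drive_tb, capacity_tb, iops_per_tb in rows:
    print(f"{drive_tb:>3} TB drives: {capacity_tb:>5} TB total, "
          f"{iops_per_tb:g} IOPS per TB")
```

Under these assumptions the aggregate IOPS never changes, yet the capacity attached to it grows sixteenfold, which is the ratio that flash breaks.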

Of course, this can be used to paper over a bad workflow, and in many cases that is exactly what happens. But in a disciplined organization, good data management practices combined with skilled pipeline development can be served by an all-flash cluster that is a fraction of the size and cost of a traditional cluster. This frees up budget, rack space, cooling, and power to scale the compute environment without additional data center spending. This in turn makes the technology team better able to absorb changing requirements from the creative and production organizations. Increases in data needs can be separated by size and performance requirements, and purchase requests can be mapped more directly back to the requests from the productions. For example, it is no longer necessary to explain that a storage purchase of 800TB is needed to meet the performance requirements of a 100TB texture cache request. At the other end of the scale, not needing to lend performance from lower- to higher-priority data allows the nearline storage to embrace density and keep much more data in a more accessible format. This shortens the time to promote cold data back to production and makes it possible to delay data archiving to address more pressing needs (but don’t do this forever).

The production mindset, and what you can do about it (TL;DR: nothing).

Production never asks for less, and they never go back to how it used to be. Having a magic wand that instantly increased the performance of a year-old cluster by 100% wouldn’t produce a year of steady, quiet production growth. Once the additional capability was common knowledge, the land rush for more complex scenes, additional retakes, and extra content would consume it nearly instantly. Every improvement in capacity or performance is immediately converted into more creative iterations on the project. One of the old sayings is that there are two types of production storage, ‘new’ and ‘full’. The evolution of flash-based storage is a great addition to the M&E technology toolbox, in that it turns the single performance/capacity slider back into two somewhat independent ones, but only if it’s used as part of a well-designed and well-implemented pipeline. The production infrastructure should exist to support the technology process, which exists to support the production process. The production workflow is generally unyielding, so anything that makes the supporting infrastructure more responsive to the technology process translates directly into a better quality of life and ROI for the technology team. A more direct mapping of costs to needs, and the ability to provide targeted capability improvements without expensive side effects, can simplify conversations with production and management.