Data Protection with Cloud Volumes ONTAP: Part II – Choosing the Right AWS Storage

Overview

In the previous post of this series we examined Cloud Volumes ONTAP (CVO) and common use cases for it. One ideal use case for CVO, if your data resides on a NetApp FAS or AFF array on premises, is data protection and disaster recovery. Using CVO for data protection enables the business to protect its key assets without having to build a remote DC, sign a co-lo contract, or deploy and manage hardware.

Cloud Volumes ONTAP in AWS runs on EC2 instances backed by EBS storage. Given the prevailing unit cost of EBS, you might wonder whether there is any scenario in which CVO is viable from a cost standpoint compared to running on-prem, especially if you need to protect hundreds of TBs of data. The answer is that it can be, but only if your CVO cloud solution is cost optimized.

There are three layers of cost optimization to consider: compute, licensing, and storage. We’ll cover each in this series, but in this post we’ll focus on the storage layer and how S3 – a lower cost option – can be used to reduce the amount of EBS storage required.

Before going further, if you’re unfamiliar with NetApp architecture, you may want to check out this Concepts Doc from NetApp. It will help you get your bearings as we’re about to dive a little deeper.

NetApp FabricPools

We touched on FabricPools briefly in the previous post. FabricPools is a tiering mechanism that extends the capacity of your NetApp array to S3-compatible object storage. The details, including requirements, limitations, and how the tiering policies work, can be found in this NetApp Technical Report.

Although this feature is commonly thought of as a way to extend on-prem storage to the cloud, it has real applicability with CVO too, especially for a replication target. With FabricPools and the tiering policy on the target volumes set to “all,” your replication target can be backed by a few TBs of EBS and hundreds of TBs of S3, which can save you many thousands of dollars at scale on your AWS bill.
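If you prefer to script this rather than click through Cloud Manager, the tiering policy can be set per volume through the ONTAP REST API. The sketch below is a minimal example assuming ONTAP 9.6 or later; the hostname, credentials, and volume name are placeholders, and the same change can be made from the ONTAP CLI or Cloud Manager instead.

```python
import requests
from requests.auth import HTTPBasicAuth

# Placeholders: substitute your CVO cluster management address, credentials,
# and the name of the SnapMirror destination volume.
CVO_HOST = "cvo-cluster-mgmt.example.com"
VOLUME_NAME = "dr_target_vol"

session = requests.Session()
session.auth = HTTPBasicAuth("admin", "password")
session.verify = False  # use proper certificate verification in production

# Look up the volume UUID by name.
resp = session.get(
    f"https://{CVO_HOST}/api/storage/volumes",
    params={"name": VOLUME_NAME, "fields": "uuid,tiering.policy"},
)
resp.raise_for_status()
volume = resp.json()["records"][0]

# Set the FabricPool tiering policy to "all" so replicated blocks are
# tiered to the S3 capacity tier rather than held on EBS.
patch = session.patch(
    f"https://{CVO_HOST}/api/storage/volumes/{volume['uuid']}",
    json={"tiering": {"policy": "all"}},
)
patch.raise_for_status()
print(f"Tiering policy on {VOLUME_NAME} set to 'all'")
```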

AWS Storage Classes, Types, and Cost

AWS EBS pricing and S3 pricing change frequently. Since cost should be a major factor in any cloud design, it’s helpful to consider pricing for each storage type relative to the others, and to select the lowest cost storage that will meet your service level commitments.

Within EBS and S3 there are several types and classes, each with different price points. CVO uses EBS io1 and gp2 for its boot and root volumes. Data aggregates can be deployed on EBS io1, gp2, st1, or sc1. FabricPools can tier data from those relatively costly aggregates to S3 Standard, Standard-IA (Infrequent Access), One Zone-IA, and Intelligent-Tiering.

Consider the following: gp2 and sc1 are both EBS types available for CVO data aggregates, but sc1 is 75% less expensive. It’s also HDD, not SSD, and is intended for entirely different workloads – but that’s quite a gap in price. Similarly, S3 One Zone-IA is about 55% less expensive than S3 Standard and Intelligent-Tiering. The price differences are even more dramatic when comparing EBS types to S3 classes. S3 One Zone-IA is about 90% less expensive than EBS gp2!
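To make the relative spread concrete, here’s a quick back-of-the-envelope calculation. The per-GB-month figures below are illustrative list prices for a single region at the time of writing, not authoritative numbers; always check current AWS pricing for your region.

```python
# Illustrative per-GB-month capacity prices (placeholders; check current AWS pricing).
PRICES = {
    "ebs_gp2": 0.100,
    "ebs_sc1": 0.025,
    "s3_standard": 0.023,
    "s3_one_zone_ia": 0.010,
}

def percent_cheaper(cheaper: str, baseline: str) -> float:
    """How much cheaper one storage type is versus another, as a percentage."""
    return (1 - PRICES[cheaper] / PRICES[baseline]) * 100

print(f"sc1 vs gp2:              {percent_cheaper('ebs_sc1', 'ebs_gp2'):.0f}% less")
print(f"One Zone-IA vs Standard: {percent_cheaper('s3_one_zone_ia', 's3_standard'):.0f}% less")
print(f"One Zone-IA vs gp2:      {percent_cheaper('s3_one_zone_ia', 'ebs_gp2'):.0f}% less")
```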

It’s important to note that for brevity we’re only comparing capacity costs; there are other cost factors to consider. For example, EBS io1 incurs provisioned IOPS charges, all S3 classes carry API request charges, and the infrequent access classes also charge per-GB retrieval fees. It’s also important to point out that there’s a fundamental difference in how EBS and S3 are billed: with EBS you pay for what you provision, regardless of the portion used, while with S3 you don’t provision capacity at all, so you only pay for what you use. The key take-away? Depending on your workload, these other cost factors can be significant and must be considered in a final design.

Cloud Volumes ONTAP DR Design

Let’s consider a design that makes sense for a DR target. SnapMirror, NetApp’s native block-based replication feature, will be performing all of the writes to CVO, assuming we’re not in DR mode. The write traffic for replication is likely to be constrained at the WAN link between your DC and AWS, rather than at the storage layer itself. Given this, and the fact that heavy reads are only likely during a recovery (under “normal” operation the usage profile is heavy writes), S3 is a good fit for the replication target.

The FabricPools TR referenced above states, “For most environments, a 1:10 local tier:cloud tier ratio is extremely conservative while providing significant storage savings.” Following that guidance, 10TB of EBS st1 and 100TB of S3 Standard-IA is a capable and cost-effective storage layout for a CVO DR target.
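As a rough sanity check on that layout, the sketch below estimates the monthly capacity cost of 10TB of st1 plus 100TB of S3 Standard-IA versus keeping all 110TB on gp2. The prices are illustrative placeholders and exclude IOPS, request, and retrieval charges.

```python
GB_PER_TB = 1024

# Illustrative per-GB-month prices (placeholders; check current AWS pricing).
ST1_PRICE = 0.045
S3_IA_PRICE = 0.0125
GP2_PRICE = 0.100

# FabricPool DR layout: 10TB st1 local tier + 100TB S3 Standard-IA cloud tier.
tiered_cost = 10 * GB_PER_TB * ST1_PRICE + 100 * GB_PER_TB * S3_IA_PRICE
# All-EBS alternative: 110TB of gp2.
all_gp2_cost = 110 * GB_PER_TB * GP2_PRICE

print(f"st1 + S3 IA layout: ~${tiered_cost:,.0f}/month")   # roughly $1,700
print(f"all gp2:            ~${all_gp2_cost:,.0f}/month")  # roughly $11,000
```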

Why choose st1? Do we really need to provision that much of it? And why not use S3 One Zone-IA to save even more?

EBS st1 is the lowest cost EBS option that supports FabricPools. It’s also a good fit for recovery, when blocks are pulled back from the cloud tier to the local tier, because that workload is mostly sequential. Moving volumes to a new aggregate is straightforward, so switching to a different EBS type after recovery is also an option if the workload profile requires it.

You do need to be careful with volume size. A single-node CVO instance has a limit of 34 attached EBS volumes. If part of your DR plan calls for serving all data from EBS during failover, you don’t want your CVO instance to have many small EBS volumes attached, as that limits the total capacity the instance is ultimately capable of supporting. That said, from what we’ve seen in many customer environments, metadata on the local tier of target DR volumes consumes less than 4% of the overall capacity.

One large volume in an aggregate can also lead to performance bottlenecks, especially if you expect to conduct volume move operations during a recovery event. Ultimately you want to strike a balance. Be aware of EBS performance capabilities and the fact that individual aggregates are limited to six EBS volumes.
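To see how volume sizing interacts with those limits, here’s a small planning sketch. The 34-volume instance limit and the six-volumes-per-aggregate limit come from the discussion above; the number of volumes reserved for boot/root and the candidate volume sizes are assumptions to adjust for your own environment.

```python
import math

# Limits from the discussion above; the boot/root reservation is an assumption.
MAX_EBS_VOLUMES_PER_INSTANCE = 34
RESERVED_SYSTEM_VOLUMES = 2        # boot + root (assumption)
MAX_EBS_VOLUMES_PER_AGGREGATE = 6

def capacity_ceiling_tb(ebs_volume_size_tb: float) -> float:
    """Largest EBS capacity a single-node instance could ever grow to at this volume size."""
    usable = MAX_EBS_VOLUMES_PER_INSTANCE - RESERVED_SYSTEM_VOLUMES
    return usable * ebs_volume_size_tb

def aggregates_needed(local_tier_tb: float, ebs_volume_size_tb: float) -> int:
    """How many aggregates a given local tier needs at this volume size."""
    volumes = math.ceil(local_tier_tb / ebs_volume_size_tb)
    return math.ceil(volumes / MAX_EBS_VOLUMES_PER_AGGREGATE)

# Many small volumes cap how far the instance can ever grow on EBS...
print(capacity_ceiling_tb(0.5))   # 16 TB ceiling
# ...while larger volumes leave headroom to serve data from EBS in a failover.
print(capacity_ceiling_tb(8))     # 256 TB ceiling
# A 10TB local tier built from 2TB volumes fits comfortably in one aggregate.
print(aggregates_needed(10, 2))
```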

You can save a further 20% with S3 One Zone-IA compared to S3 Standard-IA. But if your data set will take weeks to initialize (the first full transfer), you don’t want to risk losing it to an AWS AZ failure. However unlikely, such an event would leave you without a DR copy of your data for the weeks it takes to re-initialize (more on protecting against EBS failure in a later post).

Lastly, be sure to add an S3 VPC endpoint to the route table used by the subnet your CVO instance is deployed in. This ensures all CVO back-end storage traffic stays on the AWS network and does not traverse the public internet.
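If you manage the VPC yourself, a gateway-type S3 endpoint can be created and attached to the relevant route table with a few lines of boto3. The region, VPC ID, and route table ID below are placeholders.

```python
import boto3

# Placeholders: substitute your region, VPC, and the route table used by the
# subnet where the CVO instance lives.
REGION = "us-east-1"
VPC_ID = "vpc-0123456789abcdef0"
ROUTE_TABLE_ID = "rtb-0123456789abcdef0"

ec2 = boto3.client("ec2", region_name=REGION)

# A gateway endpoint for S3 adds a route so S3 traffic from the subnet
# stays on the AWS network instead of traversing the public internet.
response = ec2.create_vpc_endpoint(
    VpcEndpointType="Gateway",
    VpcId=VPC_ID,
    ServiceName=f"com.amazonaws.{REGION}.s3",
    RouteTableIds=[ROUTE_TABLE_ID],
)
print(response["VpcEndpoint"]["VpcEndpointId"])
```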

Multiple S3 Storage Classes from One CVO Instance

In the next post of the series we’ll look at some of the limitations of Cloud Manager, the management system and API endpoint for CVO. We’ll discuss how to work around one of these limitations to enable different S3 storage classes for different aggregates within the same instance.
