Data Protection with Cloud Volumes ONTAP: Part VI – Replication

Overview

Throughout this series we’ve examined the functional and cost impact of different deployment options for Cloud Volumes ONTAP (CVO) in AWS. We’ve also looked at these options in the context of Data Protection and Disaster Recovery (DR). As mentioned in the introductory post of the series, these are the most likely use cases among existing NetApp customers. The reason is simple: native replication.

Replication Options

Native replication means that the replication engine is native to, or built into, a system’s operating environment. This gives it advantages over all other replication options, the most significant of which is integration. NetApp’s native replication engine, SnapMirror, integrates with several ONTAP features such as snapshots, storage efficiency (deduplication, compression, and compaction), and storage virtual machines (SVMs). This integration, along with its reliability and feature set, makes it the logical choice for Data Protection and Disaster Recovery of data sets that reside on a NetApp system.

Cloud Manager Makes Replication Easy

It’s a simple drag and drop. Once a source and target system have been added to Cloud Manager, replication can be configured by dragging your source onto your target system and following a simple interactive menu to select specific volumes, policies, and schedules. In the background, a peer relationship is built between the two systems, volumes are provisioned on the target system, and SnapMirror relationships are created.

Not All Replication is Easy

While drag-and-drop replication is great, it is limited. It can be used to set up volume replication, but it does not support SnapMirror SVM replication setup. There’s an important distinction between the two. SnapMirror volume replication ensures that all data from a source system is copied to the target and is available in the event of a system failure or disaster. But it accounts for data only, not configuration: NFS exports, CIFS (SMB) shares, and network configuration are not included in SnapMirror volume replication.

SnapMirror SVM replication includes both data and configuration, which enables a simple and comprehensive failover process. There’s no need to manually create shares or exports, or to ensure that all the share permissions and export rules from your source system (which would be down or unavailable) are identical and in place on your DR system (which is now serving critical data).

Although SVM SnapMirror replication can’t be set up via Cloud Manager, it’s still supported in CVO; it just needs to be configured using System Manager or the CLI, as in the sketch below. So SVM SnapMirror replication is the logical choice for DR, right? Actually, there’s another limitation to consider.
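Here’s a minimal sketch of what that CLI setup might look like. The SVM and cluster names (svm1, svm1_dr, src-cluster) are placeholders, and cluster peering is assumed to already be in place:

# On the destination cluster, create a stopped DR SVM
vserver create -vserver svm1_dr -subtype dp-destination

# Peer the source and destination SVMs for SnapMirror
vserver peer create -vserver svm1_dr -peer-vserver svm1 -peer-cluster src-cluster -applications snapmirror

# Create and initialize the SVM DR relationship, preserving identity (network and protocol config)
snapmirror create -source-path svm1: -destination-path svm1_dr: -identity-preserve true
snapmirror initialize -destination-path svm1_dr: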

FabricPool

As we discussed at length in the second post in this series, FabricPool enables data to be tiered from EBS storage to lower-cost S3 storage. This can be an important cost-optimization strategy, especially for DR. The issue with FabricPool and SnapMirror SVM replication is that the tiering policies of the source and target volumes must be the same. Most source volumes will not have their tiering policy set to “all,” but most target volumes should, especially if you’re protecting terabytes of data and you prefer lower AWS bills.
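For reference, a volume’s tiering policy can be checked or changed from the ONTAP CLI; the SVM and volume names below are placeholders:

# show the current tiering policy
volume show -vserver svm1_dr -volume vol1_dst -fields tiering-policy

# tier all cold data in the DR volume to S3
volume modify -vserver svm1_dr -volume vol1_dst -tiering-policy all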

Automate a Solution with AWS Lambda

Since SnapMirror volume replication supports source and target volumes with different tiering policies, your source volume can be set to “none” while your target is set to “all,” as sketched below. Cost is lower, but share and export configuration and permissions are missing. Fortunately, AWS Lambda offers a simple and robust serverless automation platform that can be used to close the gap between volume and SVM replication.
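A minimal sketch of that volume-level relationship with mismatched tiering policies might look like this (all names are placeholders):

# destination volume is created as type DP with tiering policy "all"
volume create -vserver svm1_dr -volume vol1_dst -aggregate aggr1 -size 10g -type DP -tiering-policy all

# volume-level SnapMirror tolerates the tiering-policy mismatch with the source
snapmirror create -source-path svm1:vol1 -destination-path svm1_dr:vol1_dst -type XDP -policy MirrorAllSnapshots
snapmirror initialize -destination-path svm1_dr:vol1_dst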

Lambda, coupled with the ONTAP REST API, can deliver functionality similar to SVM SnapMirror replication but without the FabricPool limitation. Below is a simple function written in bash that can be called by a custom Lambda runtime. It queries the source NetApp system for CIFS share configuration and drops the results in an S3 bucket. This could easily be extended to support a fully orchestrated failover by using the source CIFS configuration to create the shares on the target system on a periodic basis, in advance of failover. Our automation engineers can help with this or with more complex custom orchestration scenarios (using more advanced runtimes like Python) to make failover seamless and reliable.

function handler () {

  # setenv
  user="api_readonly_user"
  baseurl="https://source.netapp.example.com/api"
  svmuuid="52726127-c848-11e8-af3f-xxxxxxxxxxxx"
  # store apipasswd as an encrypted Lambda environment variable or in Secrets Manager
  cifsfile="cifs.config.$(date '+%m%d%y')"
  cifsfilepath="/tmp/${cifsfile}"
  expdate=$(date -d '+1 week' '+%Y-%m-%d')
  s3bucket="s3-bucket-name"

  # helper: authenticated GET against the ONTAP REST API
  apiget () {
    curl -s -k -X GET -u "${user}:${apipasswd}" "${baseurl}${1}"
  }

  # loop through shares, get share info for each
  apiget "/protocols/cifs/shares" | jq -r '.records[].name' | while read -r share
  do
    apiget "/protocols/cifs/shares/${svmuuid}/${share}" >> "${cifsfilepath}"
  done

  # upload the config dump to S3
  if aws s3api put-object --bucket "${s3bucket}" --key "${cifsfile}" \
      --body "${cifsfilepath}" --expires "${expdate}" > /dev/null 2>&1
  then
    RESPONSE="0"
  else
    RESPONSE="1"
  fi
  echo "${RESPONSE}"
}
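For completeness, a bash handler like this is invoked by the custom runtime’s bootstrap script. The sketch below follows the pattern from AWS’s custom runtime tutorial, assuming the function above is saved as function.sh and the Lambda handler is configured as function.handler:

#!/bin/sh
set -euo pipefail

# initialization: load the handler definition (e.g., function.sh)
source "${LAMBDA_TASK_ROOT}/$(echo "${_HANDLER}" | cut -d. -f1).sh"

# standard custom runtime event loop
while true
do
  HEADERS="$(mktemp)"
  # poll the runtime API; this blocks until the next invocation arrives
  EVENT_DATA=$(curl -sS -LD "${HEADERS}" "http://${AWS_LAMBDA_RUNTIME_API}/2018-06-01/runtime/invocation/next")
  REQUEST_ID=$(grep -Fi Lambda-Runtime-Aws-Request-Id "${HEADERS}" | tr -d '[:space:]' | cut -d: -f2)

  # run the handler and post its output back as the invocation response
  RESPONSE=$("$(echo "${_HANDLER}" | cut -d. -f2)" "${EVENT_DATA}")
  curl -s -X POST "http://${AWS_LAMBDA_RUNTIME_API}/2018-06-01/runtime/invocation/${REQUEST_ID}/response" -d "${RESPONSE}"
done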

Replication Seeding

Many environments can easily support ongoing incremental replication updates. This of course depends on the size of the data set, its rate of change, latency, and available bandwidth to AWS. The bigger challenge is typically initialization, the one-time seeding of a full data set. It can be difficult to justify the time, effort, and cost to increase bandwidth to the level required to complete initialization. An old friend told me to “never underestimate the bandwidth of a station wagon full of tapes.” Fortunately, we don’t have to go back to ’80s tech for a solution to this problem, but the principle still applies. AWS Snowball enables data to be copied over a LAN, then physically migrated and imported into an AWS region. In our next post we’ll discuss how Snowball can be leveraged to seed replication, reduce WAN utilization, and accelerate DR deployment.
