Backups to Cloud – What's the Big Deal?

As a Solution Designer working with one of the largest product companies, I have faced many questions from customers and partners. In the past 12–18 months, one question has been present in almost every meeting and seminar: can we back up to the cloud? As applications become more agile, the move towards a cloud / as-a-service model is gaining momentum. But backing up to the cloud raises many questions of its own – Why would you want to move your backups to the cloud? What do you consider a backup – snapshots? Daily backups? Long-term retention data? Archival data? And how frequently would you want to do it?

To make things simple: yes, you can back up to the cloud! Any modern backup software can do it. But, as always, we want an optimized way of doing it. Let's try to answer the first question – why would you want to move your backups to the cloud? Perhaps you feel it is cheaper (cost effective, with minimal CAPEX), perhaps your IT infrastructure already runs in the cloud, or perhaps your CXO asked you to do it. Whatever the reason, now that you have to, below are some general points to keep in mind while designing such a solution. These come out of my own experience, hence should be taken with a grain of salt! There are three major components and many deciding factors, but I will try to keep it short.

Components – Data for Backups, Mechanism to Move Data to Cloud, Cloud Provider.

It matters what we are backing up: if it is encrypted or already-compressed data such as images, GIS, or CAD/CAM files, then it is not a worthy candidate, for reasons I will explain below. Databases and data with high change rates share the same fate (though this can be improved with some intelligent optimization). The amount of data matters as well: the more data there is, the longer the backup window, or the more bandwidth required to back it up in less time. The best strategy is to minimize the amount of data with deduplication and compression, and to protect it from being read by malicious parties with encryption.
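To see why encrypted or already-compressed data is a poor candidate, here is a minimal Python sketch (purely an illustration, not from any backup product): repetitive data shrinks dramatically, while random-looking bytes – which is exactly what encrypted or compressed files look like – barely shrink at all.

```python
import os
import zlib

# Repetitive, text-like data compresses well; random bytes (a stand-in
# for encrypted or already-compressed data) do not.
text_like = b"customer_record,2017-01-01,status=OK\n" * 4096
random_like = os.urandom(len(text_like))

for label, payload in [("text-like", text_like), ("random-like", random_like)]:
    compressed = zlib.compress(payload)
    print(f"{label}: {len(payload)} -> {len(compressed)} bytes "
          f"({len(compressed) / len(payload):.0%} of original)")
```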

Mechanism to Move Backups to Cloud – It is actually very simple: just add an AWS EFS (Amazon Elastic File System) share to your Linux backup server and start backing up to it! The design challenges, however, are more esoteric than that makes it sound. I have seen organizations use a myriad of ways to move backups to the cloud; I will not list them, since they were not very efficient. As a general rule, there are two ways to move backups to the cloud: via an appliance, which caches the data and tiers it to the cloud at specific intervals or per other policies, or via a backup software that writes the backup data directly to the cloud. The design below is beneficial and recommended for both solutions.

[Figure B1]

In the figure, the Front End Data in light orange is the data you want to protect to the cloud; this can be databases, virtual machines, file system data, logs, end-user data, etc. Before we send the data to a cloud provider (public, private or hybrid), we need D – Deduplication, C – Compression, E – Encryption and I – Indexing, so that we minimize the amount of data sent. This makes the backup process faster and far more cost efficient: the more data there is to write to the cloud, the more cloud storage is required, and some cloud providers (like AWS) also charge for the number of API calls an application makes while writing data to their storage. To minimize such costs, variable-block, inline deduplication with a single deduplication pool is highly recommended.
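To make the order of operations concrete, here is a minimal, hypothetical Python sketch of the D – C – E – I pipeline on fixed-size chunks. Real products use far more sophisticated variable-block engines; the `cryptography` package is an assumed third-party dependency, and the names here are mine, not any vendor's.

```python
import hashlib
import zlib
from cryptography.fernet import Fernet  # assumed dependency: pip install cryptography

KEY = Fernet.generate_key()   # in practice, keep keys on-premise in a key manager
cipher = Fernet(KEY)

store = {}    # chunk hash -> compressed+encrypted payload (the single dedup pool)
catalog = []  # backup index: which chunks make up this backup, in order

def backup(data: bytes, chunk_size: int = 8192) -> None:
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()      # D: dedup by content hash
        if digest not in store:                         # only new chunks are processed
            compressed = zlib.compress(chunk)           # C: compress
            store[digest] = cipher.encrypt(compressed)  # E: encrypt before it leaves us
        catalog.append(digest)                          # I: record what was backed up

backup(b"hello world " * 10_000)
print(f"{len(catalog)} chunk references, {len(store)} unique chunks stored")
```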

By keeping deduplication inline and source-based (deduplication occurring on the clients), we reduce the amount of storage required on the appliance (compared to post-process deduplication) and also ensure faster backups to the appliance. Variable-block deduplication is the process used by appliances like Data Domain from DellEMC, which constantly vary their chunking factor between 4 KB and 32 KB, ensuring the highest deduplication factor (more on this in later blogs!). Remember: the smaller the deduplication chunk, the higher the deduplication ratio. There are multiple mechanisms of compression; the one that is fastest and most efficient at the same time, and that has stood the test of time, is LZ. It is very important to encrypt the data we are backing up, since once it is in the cloud, control over it becomes a shared responsibility between you and the cloud provider. Always encrypt the data and keep the encryption keys on-premise wherever possible. Indexing is necessary to ensure that we always know what was backed up, how much, when, for how long, and where. All the processes mentioned should be inline for the least overhead and fewest processing nightmares.
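For a feel of how variable-block (content-defined) chunking works, here is a simplified Python sketch that picks cut points between 4 KB and 32 KB based on the data itself. Real engines (Data Domain included) use proprietary fingerprinting over a sliding window, so treat this purely as an illustration of the idea.

```python
import os

MIN_CHUNK, MAX_CHUNK = 4 * 1024, 32 * 1024
MASK = 0x1FFF   # cut when the low 13 bits hit zero -> roughly 8 KB average chunk

def chunks(data: bytes):
    """Yield variable-size chunks whose boundaries depend on the content,
    so an insertion early in a file only disturbs the chunks around it."""
    start = 0
    h = 0
    for i, byte in enumerate(data):
        # Toy content-dependent hash; real engines use a true rolling
        # hash (e.g. Rabin fingerprints) over a sliding window.
        h = (h * 31 + byte) & 0xFFFFFFFF
        size = i - start + 1
        if size >= MIN_CHUNK and ((h & MASK) == 0 or size >= MAX_CHUNK):
            yield data[start:i + 1]
            start, h = i + 1, 0
    if start < len(data):
        yield data[start:]

sizes = [len(c) for c in chunks(os.urandom(1024 * 1024))]
print(f"{len(sizes)} chunks, min {min(sizes)} B, max {max(sizes)} B")
```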

The above process only backs up data to your PBBA – Purpose-Built Backup Appliance (the name Gartner gave to intelligent disk-based backup appliances; more on this later). From there, your backup software should be able to tier this data to your choice of public / private / hybrid cloud provider, as and when needed, in deduplicated, compressed and encrypted form. All major backup software has this functionality; only a few optimize the process well. I have had a good experience with DellEMC NetWorker and DellEMC Avamar, both of which can be integrated with Data Domain as well. The process of sending data to the cloud is fairly simple and can be managed and monitored from the native backup software console itself. The software should also be able to recall individual files, not the complete backup set, as and when required, since all major cloud providers charge on the amount of data read back from their cloud storage.
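As a rough idea of what "tiering" and "single-file recall" boil down to at the storage layer, here is a hedged boto3 sketch. The bucket name and key layout are hypothetical, it assumes boto3 is installed and AWS credentials are configured, and a real backup product manages all of this for you behind its console.

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "example-backup-tier"   # hypothetical bucket name

def tier_chunk(digest: str, payload: bytes) -> None:
    """Push one already-deduplicated, compressed, encrypted chunk to object storage."""
    s3.put_object(Bucket=BUCKET, Key=f"chunks/{digest}", Body=payload)

def recall_file(chunk_digests: list[str]) -> bytes:
    """Recall a single file by fetching only the chunks it references --
    not the whole backup set -- keeping read-back (egress) charges low."""
    parts = []
    for digest in chunk_digests:
        obj = s3.get_object(Bucket=BUCKET, Key=f"chunks/{digest}")
        parts.append(obj["Body"].read())
    return b"".join(parts)
```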

But what if you only want to use software and back up the data directly to a cloud provider (the second scenario)? In that case, too, you would deduplicate, compress and encrypt the data on the source server itself and send it directly to cloud storage, without the intervention of a media server, since a media server just adds an extra hop to the whole process and slows the operation down.

[Figure B1_2]

Above is a diagram in which clients (servers, virtual machines) send data directly to any supported cloud storage vendor, on a schedule set by the backup server software. Note that there is NO media server in this case. This feature is called Client Direct: clients send deduplicated, compressed and encrypted data directly to the backup storage, which is very efficient and fast precisely because no media server is involved. This feature is available in DellEMC NetWorker. The cost savings and performance gains here are huge, since there is no media server (hence no OS license required, no server cost, no FC port cost, and so forth). And since each client sends its own data in deduplicated form, performance is better than in a media server architecture.
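The essence of a source-deduplicated, client-direct backup is that each client asks the deduplication index "do you already have this chunk?" and ships only what is new. Here is a hypothetical Python sketch of that exchange (real products such as NetWorker do this over an optimized protocol; the names and the in-memory index are mine):

```python
import hashlib

known_hashes: set[str] = set()   # global dedup index, shared by all clients

def client_direct_backup(client_name: str, data: bytes, chunk_size: int = 8192) -> None:
    """Each client chunks and hashes its own data, then uploads only chunks
    the index has never seen -- no media server in the data path."""
    sent = skipped = 0
    for offset in range(0, len(data), chunk_size):
        digest = hashlib.sha256(data[offset:offset + chunk_size]).hexdigest()
        if digest in known_hashes:
            skipped += 1                  # already in backup storage: send nothing
        else:
            known_hashes.add(digest)
            sent += 1                     # would compress, encrypt and upload here
    print(f"{client_name}: sent {sent} chunks, skipped {skipped}")

# Two VMs cloned from the same template share most of their blocks:
template = b"base-image-block" * 4096
client_direct_backup("vm-01", template + b"logs for vm-01")
client_direct_backup("vm-02", template + b"logs for vm-02")
```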

Cloud Provider – All this while I have been going on about how to do "backups" without shedding any light on what this "cloud storage" is. Cloud storage is provided by a cloud provider like AWS (Amazon Web Services), Microsoft Azure, Google Cloud Platform, etc., and the backup storage in all of these cases is a form of OBJECT STORAGE. In the case of AWS this is S3, S3-IA and Glacier; for Microsoft it is Blob Storage, and so on. I will invest another blog in explaining what object storage is and how it works, since it deserves one of its own. For now, just remember that object storage is not meant for frequent large sequential writes (which is exactly what backups are!) or large sequential reads (like the restore of a 4 TB Oracle database). This is another reason for using a caching device (a Purpose-Built Backup Appliance) to back up large amounts of data on-premise first and then send only deduplicated, encrypted data to cloud (object) storage. While choosing the object storage, make sure it has the highest level of redundancy available, and always read the SLA (Service Level Agreement) documentation with the eye of a detective. There are also compatibility specifications to take into account: your backup software and PBBA should integrate with the major cloud providers.
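To put the API-call and read-back charges in perspective, here is a back-of-the-envelope calculation. The prices are illustrative only – they vary by provider, region and storage tier, so always check the current pricing sheet – but the shape of the result holds: deduplication slashes what you write, and egress dominates what you read back.

```python
# Illustrative rates only -- check your provider's current price list.
PUT_PER_1000 = 0.005     # USD per 1,000 PUT requests (S3-like)
EGRESS_PER_GB = 0.09     # USD per GB read back out of the cloud

backup_tb = 10           # logical data to protect
dedup_ratio = 10         # assume 10:1 after dedup + compression
object_size_mb = 4       # size of each object written to cloud storage

physical_gb = backup_tb * 1024 / dedup_ratio
puts = physical_gb * 1024 / object_size_mb
print(f"Written to cloud: {physical_gb:.0f} GB in {puts:,.0f} PUTs "
      f"(~${puts / 1000 * PUT_PER_1000:.2f} in request charges)")

# Recalling a single 2 GB file vs. the whole 1 TB physical backup set:
print(f"Recall one file:  ~${2 * EGRESS_PER_GB:.2f}")
print(f"Recall full set:  ~${physical_gb * EGRESS_PER_GB:.2f}")
```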

I just realized this is my first blog and, like always, I have overshot! I will leave you with the links below to back up my claims about the products mentioned above. Take it easy and take care, guys!

NetWorker with Data Domain – Cloud Tier Demo

Avamar with Data Domain – Cloud Tier Demo

NetWorker with CloudBoost – Direct Cloud Backup

 
