I am back! And not with a data protection write-up, because for the last few weeks I have not been able to put away the thought of Software Defined Storage. So once again, here is a backup guy talking about storage. I was curious about the current state of software defined storage in the industry and decided to get my hands dirty. I've done some research and reading on SDS over the course of the last month (well, actually more than that), and this is the crux of what I've learned from my teammates, customers and the people I work with.
What is SDS?
The term is used very broadly to describe many different products with various features and capabilities. It describes the trend toward data storage becoming independent of the underlying hardware: simply put, no more fancy SAN boxes. SDS is data storage software that provides policy-based provisioning and management of data storage independent of the underlying hardware, virtualization or OS. In other words, it is hardware agnostic, which is good news considering the prices of servers and DAS.
I first looked at IDC and Gartner. IDC defines software defined storage solutions as those that deploy controller software (the storage software platform) decoupled from the underlying hardware, run on industry-standard hardware, and deliver a complete set of enterprise storage services. Gartner splits SDS into two separate parts, Infrastructure and Management:
- Infrastructure SDS uses commodity hardware such as x86 servers and JBODs and offers features through software orchestration. It creates and provides data center services to replace or augment traditional storage arrays.
- Management SDS controls hardware, but also controls legacy storage products to integrate them into an SDS environment. It interacts with existing storage systems to deliver greater agility of storage services.
The general attributes of SDS
There are many characteristics of SDS; in fact, each vendor adds a new dimension to the offering, only making it better in the long run. I have put some key characteristics below, which you should take a look at:
- Hardware and Software Abstraction – SDS always includes abstraction of logical storage services and capabilities from the underlying physical storage systems. It does not matter to the SDS software whether a server has SAS, SATA, SSD or PCIe-card storage; all are welcome.
- Storage Virtualization – External-controller-based arrays include storage virtualization to manage usage and access across the drives within their own pools; other products exist independently to manage across arrays and/or directly attached server storage.
- Automation and Orchestration – SDS includes automation with policy-driven storage provisioning, where service-level agreements (SLAs) generally replace the precise details of the actual hardware.
- Centralized Management – SDS includes management capabilities with a centralized point of management.
- Enterprise storage features – SDS includes support for all the features desired in an enterprise storage offering, such as compression, deduplication, replication, snapshots, data tiering, and thin provisioning.
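The policy-driven provisioning mentioned above can be sketched in a few lines: the caller asks for a volume by SLA tier, and the policy table, not the caller, decides which media and features back it. The tier names and feature flags below are invented for illustration, not taken from any particular SDS product:

```python
# Hypothetical policy table: SLA tier -> storage characteristics.
# In a real SDS product this would live in the control plane.
POLICIES = {
    "gold":   {"media": "ssd",    "replicas": 3, "compression": False, "snapshots": True},
    "silver": {"media": "hybrid", "replicas": 2, "compression": True,  "snapshots": True},
    "bronze": {"media": "hdd",    "replicas": 2, "compression": True,  "snapshots": False},
}

def provision_volume(name: str, size_gb: int, tier: str) -> dict:
    """Build a volume spec from the named policy tier."""
    policy = POLICIES[tier]  # unknown tier -> KeyError
    return {"name": name, "size_gb": size_gb, **policy}

vol = provision_volume("db-logs", 500, "gold")
print(vol["media"], vol["replicas"])  # → ssd 3
```

The point of the abstraction is exactly what the bullet says: the requester names an SLA ("gold"), never a disk type or an array model.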
When and How to use SDS
There are a host of considerations when developing a software defined storage strategy. Below is a list of some of the important items to consider during the process.
- Storage Management – You need to know how storage actually works: not just IOPS and performance, but details such as queue depth in the OS and how different applications see the same storage differently. You will have to test different settings to get the necessary performance for your environment. Not all applications require read cache or compression, though some do; that is just one example of how detailed this can get. The more detailed you go, the better the SLA you deliver as a storage admin. Most of the time these nuances are overlooked, which costs dearly in any environment. So while designing or purchasing even ordinary storage, understand your application, because you buy storage not for its features but for the application that runs your business. Massive research is required to choose from the many options out there (vSAN, ScaleIO, Ceph etc.); all you need to remember is that you are doing this for your application, not to save money or get fancy IOPS, but to deliver the needed SLA.
- Cloud and Microservices Integration – Probably you are not using the cloud today, and you may not require Jenkins for your application deployments either, but it is only a matter of time before all of this is required. IT is moving faster than light these days (not really!), and the growth of data has created new avenues for where it can be used. Data is the new oil. An intelligent SDS product can tier data across cloud platforms or even onto "cheaper storage". Does your software defined storage (in the case of object storage, for example) support S3, REST APIs, Swift, HDFS, CAS etc., all at once? The SDS solution should be designed so that it is future ready. To give you an outline: if you are using an SDS solution for block storage, you should look out for Docker, virtualization and COTS integration, understand your vendor's roadmap, and see whether it aligns with your present and future requirements.
- Expansion and Scalability – So how much can your SDS solution really scale? 10 nodes? 100 nodes? 1000 nodes? The answer lies in your requirements; very few organizations need 1000+ nodes. But when we talk about scalability, we also mean scalability without a drop in performance. A lot of vendors may be able to scale, but if performance suffers, we are back to square one. There are different parameters for judging performance (the easy way is to just quote IOPS and throughput), ranging from SLAs and the service catalog to stress testing. While procuring an SDS solution, remember that performance should only increase as size increases (capacity, nodes, controllers etc.). The other important thing to check is how many sites it can support. A good SDS solution (block, object etc.) should be able to support multiple sites across the globe; by this I mean it should be able to replicate data selectively across sites and be managed from a single management console.
- Architecture Matters – How does your SDS solution transfer data from the host to the device? How does an actual read or write occur, how does it differ between solutions, and what suits you? You will have to go into the details and understand them. With SDS, the architecture and details matter much more, because you no longer have the luxury of the old SAN box; what you have now are servers with disks in them. How do you get performance? By knowing the software you procured. You need to understand networking, OS, hypervisor, disk and RAM at a 101 and 201 level. Look out for solutions that are stateless and do not depend on specific processes completing before they can move forward, thus averting bottlenecks. Have multiple detailed discussions with your vendor; the more you learn now, the fewer service issues you will have later.
- Test, Test and Test Again before GO LIVE – Understanding your application is one thing; knowing how it will behave in your environment is another. So before you cut the red ribbon, TEST. Make sure you have left no stone unturned. Yes, you probably cannot do this for every application in your setup, but Tier-1 applications deserve it. Don't be shy; you will thank yourself later. Another important point: understand how the data will be migrated from the old legacy SAN array to the SDS solution, what the implications will be, and whether there will be downtime; if yes, how much, and how can it be minimized? One of the original purposes of SDS was to be hardware agnostic, so there should be no reason to remove and replace all of your existing hardware. A good SDS solution should let you protect your existing investment in hardware rather than requiring a forklift upgrade of all of it. A new SDS implementation should complement your environment and protect your investment in existing servers, storage, networking, management tools and employee skill sets.
- Functionalities and Features – So, does your SDS solution perform deduplication? Compression? Encryption? Erasure coding? Replication? Let's step back a bit: how many of these features do your applications actually need? Probably encryption, compression and replication for performance block storage, or erasure coding on object storage. Do not go for functionality you will never use; think of these as side missions to your actual requirement. Understand what you really need: performance? Scale? Availability? IOPS? Then decide.
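The replication-versus-erasure-coding choice from that last point can be made concrete with some back-of-the-envelope capacity math. This is a sketch with illustrative numbers only (3-way replication against a hypothetical 4+2 layout), not a claim about any specific product:

```python
# Raw capacity needed to store a given amount of usable data
# under two protection schemes.

def replication_raw_tb(usable_tb: float, copies: int) -> float:
    """Raw capacity when every byte is stored `copies` times."""
    return usable_tb * copies

def erasure_raw_tb(usable_tb: float, data: int, parity: int) -> float:
    """Raw capacity for a k+m layout: k data chunks plus m parity chunks."""
    return usable_tb * (data + parity) / data

usable = 100.0  # TB of application data
print(f"3-way replication: {replication_raw_tb(usable, 3):.0f} TB raw")  # → 300 TB raw
print(f"4+2 erasure coding: {erasure_raw_tb(usable, 4, 2):.0f} TB raw")  # → 150 TB raw
```

Erasure coding halves the raw capacity here, but at the cost of extra CPU and rebuild traffic, which is exactly why it suits object storage better than latency-sensitive block workloads.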
There are many use cases for and benefits of SDS (file, block and object): expansion, automation, cloud integration, reduction in operational and management expenses, scalability, availability and durability, the ability to leverage COTS, and so on and so forth. A few weeks back I wrote a comparison between two major SDS block vendors; you can check it out here. There are many vendors, starting with DellEMC (ScaleIO), VMware vSAN, DellEMC (ECS) and so on. The only key takeaway I want you to have is this: learn your environment, understand your setup in detail, and then choose what suits you best!