It has been almost two months since I last wrote. No, I was not utterly busy, just procrastinating on my blog topics. Last week, I was lucky to be part of a meeting that had nothing to do with data protection (I didn’t know this going in!). The customer I met has several thousand virtual machines (OK, around 9,000 VMs), and his concern was not data protection (at least initially) but performance on them. These VMs run web servers, databases and Hadoop clusters, some even hold cold archives, and so on and so forth. Obviously, the storage and corresponding IOPS performance required here is mammoth. Also, just to make things clear, the infra has VMware, Hyper-V, KVM, RHEV etc. They have a bunch of storage equipment as well, from almost all of the major data storage vendors. In the conversation with the customer, I learned that their main concerns were cost, scalability, performance, data services, DR capabilities and integration with the ecosystem (applications, hypervisors, OS, network etc.). They had already tried almost every vendor and they were “satisfied” but not particularly happy. They were looking for something software defined that could perform like enterprise storage, or even better, at their scale.
As Wikipedia describes it, “Software-defined storage (SDS) is a term for computer data storage software for policy-based provisioning and management of data storage independent of the underlying hardware. Software-defined storage typically includes a form of storage virtualization to separate the storage hardware from the software that manages it. The software enabling a software-defined storage environment may also provide policy management for features such as replication, thin provisioning, snapshots and backup.” Software-defined storage is a key driver of data center transformation. The enterprise features, availability, performance and flexibility of a data center grade SDS make it perfect for traditional array consolidation, private cloud/IaaS, and newer approaches like DevOps and container microservices. Since SDS is hardware agnostic (it does not depend on the type of drive, disk or network), it is very easy for it to take advantage of new hardware releases immediately. With SDS you can therefore adopt newer hardware as it reaches the market (such as NVMe drives) and benefit from its performance and acceleration advancements.
Well, then, what are the options in the market for SDS? Before I take the plunge into this topic, I want to clarify that in this blog I will only be referring to Software Defined Block Storage (I will leave file and object for some other day). If you perform a quick Google search, you will find almost everyone proclaiming their right to the throne of the SDS block kingdom. Before we choose a winner, I want to reiterate the requirements so that we can judge wisely. We need the following attributes:
- Scalability (no, not terabytes (come on, that was the requirement in the late 2000s), but petabytes and zettabytes)
- Performance (on almost all block sizes, not just 8K, 16K etc.; this is important because different applications deliver their best results at different block sizes, and the storage should adapt to them)
- COTS enabled (can I deploy the storage on off-the-shelf servers?)
- Data services (snapshots, compression, replication, encryption etc.)
- Integration with the ecosystem (support for all OSes, hypervisors, container systems, microservices etc.)
That seems a lot to ask from a single product, but this is how the dice roll when it comes to block storage requirements. But hadn’t we already solved all these issues with traditional SAN systems? Well, only for a while. As the scale of IT infra grows, stateless systems managed by microservices are needed more and more for running “newer” applications and optimizing existing ones.
We all want what we can’t have; that is normal human nature. In storage, that means a single globally distributed, unified storage system that is infinitely scalable, easy to manage, replicated between several data centers, and serves block devices, file systems and objects, all without any issues and while delivering data services such as compression and snapshots. However, this is not really possible, not at scale. The point is that some storage systems are built for IOPS, some for scale, some just for sprawl. With such different requirements, it becomes extremely difficult to code storage for all the use cases. Adding more data services just increases the number of hops data makes between the daemons involved, which reduces performance. As far as I can tell, with present technology and trends it is difficult to build storage that does it all. It is not that unified storage does not work, but when you need performance, purpose built is the way to go. I was a storage admin for some time in my earlier life, and I acknowledge that managing unified storage is much easier and simpler than managing multiple purpose-built appliances. Then again, if I need block performance, I would bet my life on a purpose-built software defined storage for block.
ScaleIO and Ceph are the two most valid candidates to carry the baton for software defined block storage, but who is the real winner in terms of the attributes mentioned above? I will try to demystify this at the level of architecture and usability. So here is what Ceph delivers in a single piece of software, in a single go…
- Scalable distributed block storage
- Scalable distributed object storage
- Scalable distributed file system storage
- Scalable control plane that manages all of the above
To sweeten the deal, all of this is free, for almost any capacity (well, this depends; if you are a storage admin, you know what I mean). This is the Holy Grail of storage (almost!), and it is all OPEN SOURCE. But as a technologist, if you look underneath the skin, remove the flesh and understand the skeleton of the software, there are a lot of things happening here. Let’s check what Ceph has in its kitty. As I mentioned earlier, the fundamental problem with any multi-purpose tool is that it makes compromises in each “purpose” it serves. The reason is simple: a multi-purpose storage system like Ceph is designed to do many things, and those different things interfere with each other. It’s like asking a toaster to toast (which is fine) and also to fry your steak (all the best with that!). With present technology and coding it is possible, but there are some “TRADE-OFFS”. Ceph’s trade-off, as a multi-purpose tool, is the use of a single “object storage” layer. You have a block interface (RBD), an object interface (RADOSGW) and a filesystem interface (CephFS), all of which talk to an underlying object storage system (RADOS). Here is the Ceph architecture from their documentation:
RADOS itself is reliant on an underlying file system to store its objects. So the diagram should actually look like this:
So in a given data path, for example a block written to disk, there is a high level of overhead:
In contrast, a purpose-built block storage system that does not compromise and is focused solely on block storage, like DellEMC ScaleIO, can be significantly more efficient:
(Here, SDC is the ScaleIO Data Client, which hosts the application that requires the IOPS, and SDS is the ScaleIO Data Server, which pools storage together with the other SDS machines. A single server can act as both SDC and SDS.) This allows skipping two steps, but more importantly, it avoids complications and additional layers of indirection/abstraction, as there is a 1:1 mapping between the ScaleIO client’s block and the block(s) on disk in the ScaleIO cluster. By comparison, multi-purpose systems need a single unified way of laying out storage data, which can add significant overhead, even at smaller scales. Ceph, for example, takes any of its “client data formats” (object, file, block), slices them up into “stripes”, and distributes those stripes across many “objects”, each of which is distributed within replicated sets, which are ultimately stored on a Linux file system in the Ceph cluster. Here’s the diagram from the Ceph documentation describing this:
This is a great architecture if you are going to normalize multiple protocols, but it’s a terrible architecture if you are designing for high performance block storage only. The reason is simple enough: there are just too many calculations and “INSIDE IOPS” for a heavy transactional workload. In terms of latency, Ceph’s situation gets much grimmer: its latency is incredibly poor, almost certainly due to these architectural compromises.
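To make the striping step described above a bit more concrete, here is a minimal Python sketch of how a byte offset in a client's block device could be translated into a backing object and an offset inside it. It assumes the three striping parameters the Ceph documentation describes (stripe unit, stripe count and object size). The function name `locate`, the `ObjectExtent` type and the tiny sizes in the demo are my own illustrative choices, not Ceph's API. It also leaves out everything that happens afterwards (placement, replication, and the file system underneath), so treat it as a rough picture of one layer only.

```python
from typing import NamedTuple


class ObjectExtent(NamedTuple):
    object_no: int      # index of the backing object that holds the byte
    object_offset: int  # byte offset inside that object


def locate(offset: int, stripe_unit: int, stripe_count: int,
           object_size: int) -> ObjectExtent:
    """Map a client byte offset to (object, offset) under round-robin striping.

    Hypothetical helper for illustration; parameter names follow the
    striping terms used in the Ceph documentation.
    """
    if offset < 0:
        raise ValueError("offset must be non-negative")
    if stripe_unit <= 0 or stripe_count <= 0:
        raise ValueError("stripe_unit and stripe_count must be positive")
    if object_size < stripe_unit or object_size % stripe_unit != 0:
        raise ValueError("object_size must be a positive multiple of stripe_unit")

    units_per_object = object_size // stripe_unit  # stripe units each object holds
    unit_no = offset // stripe_unit                # which stripe unit the byte is in
    stripe_no = unit_no // stripe_count            # which stripe (row) that unit is in
    stripe_pos = unit_no % stripe_count            # which column within the stripe
    object_set = stripe_no // units_per_object     # which group of stripe_count objects

    object_no = object_set * stripe_count + stripe_pos
    object_offset = (stripe_no % units_per_object) * stripe_unit + offset % stripe_unit
    return ObjectExtent(object_no, object_offset)


if __name__ == "__main__":
    # Toy sizes (bytes) chosen only so the pattern is easy to follow.
    for off in (0, 4, 12, 24):
        print(off, locate(off, stripe_unit=4, stripe_count=3, object_size=8))
```

Even in this simplified form, every client I/O needs a handful of divisions and modulo operations before the object layer is reached, and the result still has to pass through the remaining layers. A system with a 1:1 block mapping skips this translation.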
DellEMC ScaleIO is software that creates a server-based SAN from local application server storage (local or network storage devices). ScaleIO delivers flexible, scalable performance and capacity on demand and integrates storage and compute resources, scaling to hundreds of servers (also called nodes). As an alternative to traditional SAN infrastructures, ScaleIO combines hard disk drives (HDD), solid state drives (SSD), Peripheral Component Interconnect Express (PCIe) flash cards and NVMe drives to create a virtual pool of block storage with varying performance tiers. As opposed to traditional Fibre Channel SANs, ScaleIO has no requirement for a Fibre Channel fabric between the servers and the storage. This further reduces the cost and complexity of the solution. In addition, ScaleIO is hardware-agnostic and supports both physical and virtual application servers.
It creates a Software-Defined Storage (SDS) environment that allows users to exploit the unused local storage capacity in any server. ScaleIO provides a scalable, high performance, fault tolerant distributed shared storage system. Once again, it can be installed on VMware, Xen, Hyper-V, bare metal servers etc. You get the vibe.
There are other problems besides performance with a multi-purpose system. The overhead I outlined above also means the system has to be hefty just to do its internal jobs: every new task or purpose it takes on adds overhead in terms of business logic, processing time and resources consumed. In most common configurations, ScaleIO, being purpose built, takes less of the host system’s resources such as memory and CPU. Ceph would take significantly more resources than ScaleIO, making it a very poor choice for “hyper-converged”, semi-hyper-converged and scale-out deployments. This means that if you built two separate configurations, one with Ceph and one with ScaleIO, designed to deliver the same performance levels, ScaleIO would have significantly better TCO, just factoring in the cost of the more expensive hardware required to support Ceph’s heavyweight footprint. So purpose-built software does not just promise and deliver performance, it is also more cost effective. I stumbled upon an old YouTube video (https://www.youtube.com/watch?v=S9wjn4WN4tE) showcasing how ScaleIO performs better than Ceph on block storage with similar compute resources. If you watch the video in its entirety, it clearly shows that ScaleIO exploits the underlying resources much more efficiently, making it more scalable over time.
If you want to build a relatively low cost, high performance, distributed block storage system that supports bare metal, virtual machines and containers, then you need something purpose built for block storage (Performance Matters!). You need a system optimized for block: ScaleIO. If you haven’t already, check out ScaleIO, which is free to download and use at whatever size you want. Installing ScaleIO is very easy, and it can be up and running in less than 10 minutes. Run these tests yourself. Report the results if you like. I am adding some documentation for ScaleIO which I found extremely useful for understanding the way ScaleIO works: ScaleIO Architecture Guide. I will be writing more about SDS, specifically on its native data services like snapshots (as they pertain to data protection) and ways to protect it via enterprise backup software and data protection appliances.