I’ve seen a few definitions and watched a few presentations, and I’ve never really been able to easily and clearly articulate what object storage actually is! We all know it’s an architecture that manages data as objects (rather than in blocks/sectors or a hierarchy), but I never really understood what an object was…! Might just be me, but after a bit of reading it made a lot more sense once I understood the characteristics of an object, e.g.:
- An object is independent of the application, i.e. it doesn’t need an OS or an application to make sense of the data. This means users can access the content (e.g. a JPEG, video, PDF etc.) directly from a browser (over HTTP/HTTPS) rather than needing a specific application. No app servers are required, which dramatically improves simplicity and performance (of course you can still access object storage via an application if needed).
- Object storage is globally accessible, i.e. there is no requirement to move or copy data across locations, firewalls etc.; instead, data is accessible from anywhere.
- Object storage is highly parallelized: there are no locks on write operations, so hundreds of thousands of users distributed around the world can all write simultaneously. None of the users need to know about one another, and their behavior will not impact others. This is very different from traditional NAS storage, where making data available in a secondary site means replicating it to another NAS platform that sits passive and cannot be written to directly.
- Object storage is linearly scalable, i.e. there is no point at which we would expect performance to degrade; it can continue to grow, with no need to manage around limitations or constraints such as capacity or structure.
- Finally, it’s worth noting that object platforms are extensible: capabilities can be added easily without large implementation efforts. Examples in this context include the ability to enrich data with metadata and to add policies such as retention, protection and restrictions on where data can live (compliance).
Object storage is a way of organizing data by addressing and manipulating discrete units of data called objects. Each object, like a file, is a stream of binary data. Unlike files, however, objects are not organized in a hierarchy of folders and are not identified by a path within that hierarchy. Each object is assigned a string key when it is created, and you retrieve an object by querying the object store with that key. As a result, all objects live in a flat namespace (one object cannot be placed inside another object). This organization eliminates dependencies between objects while retaining the fundamental functionality of a storage system: storing and retrieving data. Its main benefit is a very high level of scalability.
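The flat, key-addressed model can be sketched as a simple mapping from string keys to bytes. This is a toy in-memory illustration of the concept, not any vendor's implementation; the class and method names are made up for the example:

```python
# Toy illustration of a flat object namespace: every object is
# addressed by a single string key; there is no directory tree.
# A key like "photos/2021/cat.jpg" looks hierarchical, but the "/"
# is just a character in the key, not a folder boundary.

class FlatObjectStore:
    def __init__(self):
        self._objects = {}  # key (str) -> bytes

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data  # store/overwrite the object under its key

    def get(self, key: str) -> bytes:
        return self._objects[key]  # retrieval needs only the key

    def list_keys(self, prefix: str = ""):
        # "Folders" can be emulated purely by prefix filtering on keys.
        return sorted(k for k in self._objects if k.startswith(prefix))

store = FlatObjectStore()
store.put("photos/2021/cat.jpg", b"\xff\xd8...")
store.put("photos/2021/dog.jpg", b"\xff\xd8...")
store.put("docs/report.pdf", b"%PDF...")
print(store.list_keys("photos/"))  # ['photos/2021/cat.jpg', 'photos/2021/dog.jpg']
```

Note there is no `mkdir` anywhere: the namespace is flat, and any apparent hierarchy exists only in how clients choose their keys.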
Both files and objects have metadata associated with the data they contain, but objects are characterized by their extended metadata. Each object is assigned a unique identifier, which allows a server or end user to retrieve the object without needing to know the physical location of the data. This approach is useful for automating and streamlining data storage in cloud computing environments.

S3 and Swift are the most commonly used cloud object protocols. Amazon S3 (Simple Storage Service) is an online storage web service offered by Amazon Web Services; it was developed by AWS and its API is open to third-party developers. S3 is the most commonly used object storage protocol, so if you’re using third-party applications that rely on object storage, it will usually be the most compatible choice. Swift is a little less widely used than S3 but is still a very popular cloud object protocol. It is part of OpenStack, a free and open-source software platform for cloud computing, and the Swift protocol is managed by the OpenStack Foundation, a non-profit corporate entity established in September 2012 to promote OpenStack software and its community; more than 500 companies have joined the project. Below are some major differences between S3 and Swift.
Unique features of S3:
- Bucket-level controls for versioning and expiration that apply to all objects in the bucket
- Copy Object – This allows you to do server-side copies of objects
- Anonymous Access – The ability to set PUBLIC access on an object and serve it via HTTP/HTTPS without authentication.
- S3 stores its objects in buckets.
Unique features of SWIFT:
The Swift API supports creating objects of unknown size: Swift is the only one of the two protocols where you can use chunked transfer encoding to upload an object whose size is not known beforehand (S3 requires multiple requests to achieve this). Swift stores its objects in “containers”.
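The “unsized” upload relies on standard HTTP/1.1 chunked transfer encoding: the client frames the body as it goes, so the total Content-Length never has to be known up front. Here is a minimal sketch of that wire framing (the framing itself is plain HTTP/1.1, not Swift-specific; a real client would stream these frames with a `Transfer-Encoding: chunked` header rather than buffer them):

```python
def chunked_encode(parts):
    """Frame an iterable of byte strings as an HTTP/1.1 chunked body:
    each chunk is '<size-in-hex>\\r\\n<data>\\r\\n', and the stream is
    terminated by a zero-length chunk: '0\\r\\n\\r\\n'."""
    out = b""
    for part in parts:
        if part:  # a zero-length chunk would terminate the stream early
            out += format(len(part), "x").encode() + b"\r\n" + part + b"\r\n"
    return out + b"0\r\n\r\n"

# The sender never declares a total length; it just keeps emitting
# chunks until it sends the zero-length terminator.
body = chunked_encode([b"hello ", b"object ", b"storage"])
print(body)  # b'6\r\nhello \r\n7\r\nobject \r\n7\r\nstorage\r\n0\r\n\r\n'
```

Because the object's size is only discovered when the terminator arrives, the server can commit the object without the client ever having computed its length in advance.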
Authentication (S3 vs SWIFT)
S3 – Amazon S3 uses an Authorization header that must be present in all requests, identifying the user (by Access Key ID) and providing a signature for the request. An Amazon access key ID is 20 characters long. Both HTTP and HTTPS are supported.
SWIFT – Authentication in Swift is quite flexible. It is handled through a separate mechanism that issues a “token”, which is then passed along with subsequent requests to authenticate them. Both HTTP and HTTPS are supported.
Retention and AUDIT (S3 vs SWIFT)
Retention periods are supported on all object interfaces including S3 and Swift. The controller API provides the ability to audit the use of the S3 and Swift object interfaces.
Large Objects (S3 vs SWIFT)
S3 Multipart Upload allows you to upload a single object as a set of parts; after all of the parts are uploaded, the data is presented as a single object. An OpenStack Swift large object consists of two types of objects: segment objects that store the object’s content, and a manifest object that links the segment objects into one logical large object. When you download a manifest object, the contents of the segment objects are concatenated and returned in the response body of the request.
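Both mechanisms boil down to the same idea: upload parts independently (possibly in parallel and out of order), then commit a small record that stitches them into one logical object. A toy simulation of the S3-style flow, not real client code (the class and method names are made up to mirror the API's shape):

```python
class MultipartUpload:
    """Toy model of S3-style multipart upload: parts are stored
    independently and only assembled into one logical object on complete."""
    def __init__(self):
        self._parts = {}  # part_number -> bytes

    def upload_part(self, part_number: int, data: bytes) -> None:
        self._parts[part_number] = data  # parts may arrive in any order

    def complete(self) -> bytes:
        # On complete, parts are concatenated in part-number order --
        # analogous to S3's CompleteMultipartUpload, or to Swift reading
        # a manifest and concatenating its segment objects.
        return b"".join(self._parts[n] for n in sorted(self._parts))

up = MultipartUpload()
up.upload_part(2, b"world")      # out-of-order arrival is fine
up.upload_part(1, b"hello, ")
print(up.complete())  # b'hello, world'
```

The practical difference is where the stitching record lives: S3 keeps the part list server-side against an upload ID, while Swift makes it an ordinary (manifest) object the client writes itself.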
So which object storage API should you use? Both have their benefits for specific use cases. DellEMC ECS is an on-premises object storage solution that offers multiple object protocols (S3, SWIFT, CAS, HTTPS, HDFS, NFSv3 etc.) on a single platform. It is built on servers with DAS storage running the ECS software, and it is also available in a software-only format that can be deployed on your own servers.
There are many benefits of using ECS as your own object storage:
- Supports multiple protocols – S3, SWIFT, CAS, HTTPS, HDFS, NFSv3 and CIFS (via software and OEM partners like Panzura).
- ECS is WORM-compliant storage, so once a data chunk has been written to a WORM bucket it cannot be deleted, even by the ECS root/administrator, until the application’s retention period is over. It is also SEC 17a-4(f) compliant, which is a must if you are archiving financial, insurance and similar data sets.
- ECS is available both as an appliance and as software deployed on your own tested servers; with the appliance you can be assured of performance and resiliency, since the complete hardware and software solution comes from a single vendor.
- ECS is scalable: you can start as low as 100–200 TB and grow to exabytes while maintaining a single namespace (or even a single bucket if need be), simply by adding nodes, each of which brings its own compute and storage.
- Data in ECS is erasure coded, so it can sustain the loss of disks, nodes or even sites, depending on the type and scale of deployment.
- ECS is dense storage: you can use 8 TB or 12 TB drives to keep data center space utilization low.
- ECS is used by enterprises in large, small and midsize environments; in 2017 Gartner put ECS at the top of its Distributed File Systems and Object Storage category; read the synopsis here.
- Unlike other object storage platforms, which support either small objects or large objects and suffer massive performance problems when asked to handle both workloads, ECS with its unique architecture can manage both small and large workloads; read more here.
- ECS provides native data and metadata search, and maintains strong consistency of data across multiple sites.
- ECS has integration with major vendors in all IT verticals like
- Backup – Integration with NetWorker, Data Domain, Netbackup, Commvault etc.
- Archival – Integration with SourceOne, Enterprise Vault, Informatica, NICE etc.
- File archival – integration with NTP Software; works with Panzura, Ctera etc., which enable object storage to be presented as file storage.
- ECS has native versioning of data, encryption, replication and multi site availability of data from day one.
- ECS has a native HDFS engine, so you can tier your old data to ECS in HDFS format and still run MapReduce queries against it without performing an ETL. More details.
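The erasure-coding point above can be illustrated with the simplest possible code: a single XOR parity chunk. (Real ECS deployments use a more capable scheme that tolerates multiple simultaneous losses; plain XOR parity, as sketched here, tolerates exactly one lost chunk.)

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def make_parity(chunks):
    # Parity chunk = XOR of all data chunks (all assumed equal length).
    parity = chunks[0]
    for c in chunks[1:]:
        parity = xor_bytes(parity, c)
    return parity

def recover(surviving_chunks, parity):
    # XORing the parity with every surviving chunk yields the lost chunk,
    # because each byte XORed with itself cancels out.
    missing = parity
    for c in surviving_chunks:
        missing = xor_bytes(missing, c)
    return missing

data = [b"AAAA", b"BBBB", b"CCCC"]   # three equal-size data chunks
parity = make_parity(data)
# Lose chunk 1 (b"BBBB") and rebuild it from the survivors plus parity:
rebuilt = recover([data[0], data[2]], parity)
print(rebuilt)  # b'BBBB'
```

The storage overhead here is one parity chunk per three data chunks; production erasure codes trade overhead against how many disk, node or site failures they can ride out.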
And there is no end to it. In the modern IT space there are few object storage solutions like ECS: the architecture lets you start small and scale as far as requirements demand, while integrating with a plethora of applications and APIs for data storage, service and protection. ECS can be downloaded for free and installed for test purposes here.