Use Cases for Metadata in Object Storage

Posted by dslik in Objects in Context on Jan 14, 2014 9:54:22 PM

One of the main differentiators of object storage when compared to traditional file systems is the ability to store rich metadata.

 

The use of rich metadata isn't new. In fact, many file formats include rich metadata, and many applications are built around it. What is new is that object storage protocols provide simple and interoperable ways to store, manage and query metadata, which makes things a lot simpler for developers:

 

  1. Instead of metadata being embedded within files in file-specific formats, with object storage, metadata is accessed and stored in a standardized way. Different applications define different metadata schemas and meanings, rather than different formats and data layouts.

  2. Instead of applications requiring embedded or external databases to keep track of copies or caches of metadata embedded in files, with object storage, the object store itself provides direct access to the metadata, and queries against it, through a standardized interface that is consistent for all data.

 

From a high level, there are four main use cases for metadata in object storage:

 

Use Case 1 - Basic Storage

 

Many file formats include rich metadata. For example, most JPEG images contain information about what camera was used, light levels, even where the photo was taken geographically. Often this file-level metadata is extracted and stored as object metadata once applications transition from files to objects.

 

In CDMI, metadata is stored as JSON objects, lists and strings, and thus can contain arbitrary schemas as defined by the application or data type. This allows existing metadata, such as found in DICOM or JPEG images, to be stored without change.
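
As a rough illustration of what storing metadata "without change" looks like, the sketch below builds the JSON body for creating a CDMI-style data object with nested metadata. The metadata field names (camera, exposure, gps, keywords) and the helper `build_cdmi_body` are hypothetical; the top-level field names follow a reading of the CDMI data object format and should be checked against the specification before use.

```python
import base64
import json

# Hypothetical EXIF-style metadata extracted from a JPEG. The field names
# are illustrative only and are not defined by CDMI.
photo_metadata = {
    "camera": {"make": "ExampleCam", "model": "X100"},
    "exposure": {"iso": 200, "shutter_seconds": 0.004},
    "gps": {"latitude": 49.2827, "longitude": -123.1207},
    "keywords": ["vacation", "harbour"],
}


def build_cdmi_body(metadata, payload, mimetype="image/jpeg"):
    """Return JSON text for a CDMI-style data object (assumed field names)."""
    body = {
        "mimetype": mimetype,
        # Nested objects, lists and numbers are carried as-is.
        "metadata": metadata,
        "valuetransferencoding": "base64",
        "value": base64.b64encode(payload).decode("ascii"),
    }
    return json.dumps(body, indent=2, sort_keys=True)


# Placeholder bytes stand in for real image data.
body_text = build_cdmi_body(photo_metadata, b"\xff\xd8\xff\xe0 placeholder")
restored = json.loads(body_text)["metadata"]
print(restored["camera"]["model"])
```

The hierarchy, the list of keywords and the numeric types all survive the round trip through JSON, which is the property the paragraph above relies on.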

 

S3 and Swift support metadata, but are limited by HTTP headers, which can't easily represent hierarchies and have restricted character sets.
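
To make the header limitation concrete, here is a minimal sketch of what an application might do to squeeze nested metadata into user-metadata headers. The `x-amz-meta-` prefix is the one S3 uses for user metadata (Swift uses `X-Object-Meta-`); the flattening convention itself, joining path segments with hyphens, is an assumption made for this example and is not defined by either service.

```python
def flatten_to_headers(metadata, prefix="x-amz-meta-"):
    """Flatten nested metadata into a flat dict of header name -> string.

    The hyphen-joined naming scheme is an illustrative assumption.
    """
    headers = {}

    def walk(node, path):
        if isinstance(node, dict):
            for key, child in node.items():
                walk(child, path + [str(key)])
        elif isinstance(node, list):
            # List positions become part of the header name.
            for index, child in enumerate(node):
                walk(child, path + [str(index)])
        else:
            # Header values are strings, so numeric types are lost.
            headers[prefix + "-".join(path).lower()] = str(node)

    walk(metadata, [])
    return headers


headers = flatten_to_headers({
    "camera": {"make": "ExampleCam"},
    "exposure": {"iso": 200},
    "keywords": ["vacation", "harbour"],
})
for name in sorted(headers):
    print(name, "=", headers[name])
```

Every application that does this has to invent its own naming scheme and its own way of turning the strings back into numbers and lists, which is exactly the interoperability problem described above.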

 

Use Case 2 - Tagging for Organization

 

Many applications allow "tags" to be attached to objects, enabling faceted search, multiple views, etc. This is primarily a way to move beyond a rigid hierarchy. Most photo and music programs already use rich metadata to provide multiple views like this, allowing you to sort and filter by artist, song length, album, etc.

 

In CDMI, tags are JSON lists, and can be used with query to identify all objects that have or don't have a given set of tags.
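
The "have or don't have a given set of tags" semantics can be sketched with an in-memory stand-in. This is not CDMI query syntax; the catalog contents and the function `query_by_tags` are hypothetical and only show the matching logic a query of this kind expresses.

```python
# Hypothetical catalog: object name -> metadata, with tags stored as a list.
catalog = {
    "photos/beach.jpg": {"tags": ["vacation", "2013", "family"]},
    "photos/office.jpg": {"tags": ["work", "2013"]},
    "photos/hike.jpg": {"tags": ["vacation", "2014"]},
    "photos/untagged.jpg": {},
}


def query_by_tags(objects, required=(), excluded=()):
    """Return names of objects having all required and no excluded tags."""
    required, excluded = set(required), set(excluded)
    matches = []
    for name, metadata in objects.items():
        tags = set(metadata.get("tags", []))
        if required <= tags and not (excluded & tags):
            matches.append(name)
    return sorted(matches)


print(query_by_tags(catalog, required=["vacation"], excluded=["2014"]))
print(query_by_tags(catalog, excluded=["vacation"]))
```

With server-side query, the same selection happens inside the object store, so the application does not need to keep its own copy of the tags.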

 

While S3 and Swift allow tags to be attached to objects via header metadata, they don't currently provide metadata query. Separate services such as SimpleDB must be used, with metadata being stored twice.

 

A metadata query API for Swift is currently proposed by HP (see https://wiki.openstack.org/w/images/8/82/OSMS_API_v0.8_external.pdf).

 

Use Case 3 - Advanced Query

 

This is one step beyond tagging: object storage metadata supports NoSQL-style queries. In a smart grid example, billing data could be retrieved for all meters in a given zip code where consumption is greater than a given value, using zip code and consumption level as metadata tags.
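
The smart grid example can be sketched as a predicate over object metadata. The records, the metadata keys (`zip_code`, `consumption_kwh`) and the function `query_meters` are hypothetical, and the filtering here runs in application memory purely to show the query's shape; the point of metadata query is that the object store would evaluate it instead.

```python
# Hypothetical billing objects with metadata attached to each one.
meters = [
    {"object": "meter-001", "metadata": {"zip_code": "98101", "consumption_kwh": 1250.0}},
    {"object": "meter-002", "metadata": {"zip_code": "98101", "consumption_kwh": 310.5}},
    {"object": "meter-003", "metadata": {"zip_code": "98052", "consumption_kwh": 2100.0}},
    {"object": "meter-004", "metadata": {"zip_code": "98101", "consumption_kwh": 980.0}},
]


def query_meters(objects, zip_code, threshold_kwh):
    """Return objects in one zip code whose consumption exceeds a threshold."""
    return [
        item["object"]
        for item in objects
        if item["metadata"].get("zip_code") == zip_code
        and item["metadata"].get("consumption_kwh", 0.0) > threshold_kwh
    ]


print(query_meters(meters, zip_code="98101", threshold_kwh=900.0))
```

Unlike a tag match, this query combines an equality test with a numeric comparison, which is why typed metadata values matter.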

 

In CDMI, all metadata is designed for NoSQL backends, and the standard includes a persistent query interface that allows storage objects to be created as a result of queries. This is far more efficient and flexible than trying to maintain state over HTTP (which violates the principles of REST).

 

Neither S3 nor Swift currently provides search functionality. As described above, this functionality is proposed to be added to Swift.

 

Use Case 4 - Workflow

 

A new asset is ingested, which triggers a notification. The notification triggers workflow steps: for example, a quality check program runs, and when it passes or fails, a metadata flag is added. Based on the metadata flag, the asset is routed for cleanup, for transcoding, etc. Metadata flags indicate where in the workflow the object is, and allow the process to be automated.
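
The flow described above can be sketched as a small notification handler. Everything here is hypothetical: the flag names (`qc_status`, `workflow_stage`), the stage names, and the trivial `quality_check` stand in for whatever a real pipeline would use, and the handler is called directly rather than by a storage notification.

```python
def quality_check(asset):
    """Placeholder check: a real one would inspect the asset's content."""
    return asset["size_bytes"] > 0


def route(asset):
    """Choose the next workflow stage from the metadata flag."""
    status = asset["metadata"].get("qc_status")
    if status == "passed":
        stage = "transcode"
    elif status == "failed":
        stage = "cleanup"
    else:
        stage = "quality_check"  # not yet checked
    asset["metadata"]["workflow_stage"] = stage
    return stage


def on_created(asset):
    """Handle an 'object created' notification for a newly ingested asset."""
    passed = quality_check(asset)
    asset["metadata"]["qc_status"] = "passed" if passed else "failed"
    return route(asset)


good = {"name": "clip-001.mov", "size_bytes": 1048576, "metadata": {}}
empty = {"name": "clip-002.mov", "size_bytes": 0, "metadata": {}}
print(on_created(good), on_created(empty))
```

Because the workflow state lives in the object's metadata, any worker can pick up an object and tell where it is in the process without consulting a separate tracking database.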

 

In CDMI, notification queues can be created that deliver realtime notifications when objects are created, modified or deleted, and these notifications can be filtered based on metadata. This makes creating workflows very easy, with everything being performed through a standardized storage interface.

 

Neither S3 nor Swift currently provides notification functionality.
