One of the design goals of the Cloud Data Management Interface (CDMI) is to abstract the internal implementation of a cloud storage provider from the API and the client. CDMI enables this through the use of Data System Metadata, which are explicit and inherited statements of intent made by a client that instruct a provider on what storage characteristics the client desires (and is willing to pay for). This contrasts with traditional storage management systems, where the exact storage configuration would be defined by selecting specific disk drives, the desired RAID level, etc.

 

Instead of specifying technology, the client specifies desired service characteristics:

 

  • cdmi_data_redundancy - The number of component failures that can be tolerated before data is lost
  • cdmi_immediate_redundancy - The number of component failures that can be tolerated before data is lost, such that the client does not receive a commitment of storage until that level is reached
  • cdmi_infrastructure_redundancy - The number of site failures that can be tolerated before data is lost
  • cdmi_data_dispersion - The geographic separation between sites that is used to determine if two sites count as two separate failure zones
  • cdmi_geographic_placement - Geopolitical restrictions on the placement of the data
  • cdmi_latency - The maximum latency desired for retrieval
  • cdmi_throughput - The minimum throughput desired for retrieval
  • cdmi_RTO - The maximum time required to restore the past state of stored data after a component or site failure
  • cdmi_RPO - The maximum time into the past that the state of the stored data may be rolled back after a component or site failure

 

For example, for critical data, an application or end-user may specify a data_redundancy of "3", an infrastructure_redundancy of "2", and an RTO of "0". For high performance streaming data, one may specify a latency of "1" and a throughput of "10000000" (10 Mbytes/sec). For reduced redundancy cold archival storage, one may specify a latency of "3600000" (one hour), and a data_redundancy of "1".
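
As a rough sketch, these three profiles can be written down as plain metadata dictionaries. The profile names and the helper function below are made up for illustration; only the metadata keys and values come from the examples above.

```python
import json

# Hypothetical profile names. Keys and values are taken from the text above.
# Values are written as strings, matching the protocol examples in this post.
PROFILES = {
    "critical": {
        "cdmi_data_redundancy": "3",
        "cdmi_infrastructure_redundancy": "2",
        "cdmi_RTO": "0",
    },
    "streaming": {
        "cdmi_latency": "1",             # milliseconds
        "cdmi_throughput": "10000000",   # bytes per second (10 Mbytes/sec)
    },
    "cold_archive": {
        "cdmi_latency": "3600000",       # milliseconds (one hour)
        "cdmi_data_redundancy": "1",
    },
}


def metadata_body(profile_name):
    """Return a JSON request body carrying the chosen profile as metadata."""
    return json.dumps({"metadata": PROFILES[profile_name]}, indent=2)


if __name__ == "__main__":
    print(metadata_body("cold_archive"))
```

The point of the sketch is that the client only states outcomes (redundancy, latency, throughput); nothing in the body names disks, RAID levels, or sites.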

 

Applications can attach Data System Metadata to data objects and containers. Data System Metadata associated with a container is inherited by all objects within that container, unless overridden by explicitly stated Data System Metadata associated with an object. Data System Metadata is also "sticky": it travels with objects as they move from one system to another, which simplifies data management. Finally, clients can discover what degree of service is being provided by the cloud by inspecting the dynamically generated "_provided" metadata items.
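
The inheritance rule can be modeled in a few lines. This is a simplified client-side illustration, not how any particular server implements it; the function name and the path-keyed dictionary are assumptions made for the example.

```python
def effective_metadata(path, explicit):
    """Resolve the Data System Metadata in effect for `path`.

    `explicit` maps a path to the metadata set directly on it (container
    paths end with "/"). Values set closer to the object override values
    inherited from enclosing containers.
    """
    parts = [p for p in path.split("/") if p]
    resolved = dict(explicit.get("/", {}))
    current = "/"
    for index, part in enumerate(parts):
        is_leaf_object = index == len(parts) - 1 and not path.endswith("/")
        current += part if is_leaf_object else part + "/"
        resolved.update(explicit.get(current, {}))
    return resolved


if __name__ == "__main__":
    explicit = {
        "/archive/": {"cdmi_latency": "1000000", "cdmi_data_redundancy": "2"},
        "/archive/hot.txt": {"cdmi_latency": "10"},
    }
    print(effective_metadata("/archive/cold.txt", explicit))
    print(effective_metadata("/archive/hot.txt", explicit))
```

Here cold.txt picks up everything from its container, while hot.txt overrides only the latency and still inherits the redundancy setting.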

 

These features provide a powerful, open and extensible foundation for enabling advanced cloud-based storage management functions. To illustrate this, consider a cloud provider that supports both high and low latency storage. What does CDMI enable them to offer?

 

1. CDMI decouples namespace and management functions from data operations:

 

a) CDMI allows a client to traverse and inspect a single namespace of mixed-latency objects, regardless of the latency of individual stored data objects. This means that clients don't have to have separate namespaces, as the desired latency is a characteristic of the content, rather than being based on the organization of the content.

 

b) CDMI allows the latency of individual stored objects, and of collections of stored objects, to be both specified and determined.

 

c) CDMI provides serialization, where a collection of objects can be packaged up and retrieved or stored as a single operation, allowing more efficient operations against high-latency stores.

 

2. CDMI defines both synchronous and asynchronous modes of access:

 

a) Immediate Sync - When a client (which can optionally first determine the retrieval latency of an object) issues an HTTP GET to read high-latency stored data, the HTTP request blocks until the request is able to complete. Note that a GET to retrieve metadata, including the requested and provided latency, always completes without delay.

 

b) Immediate Async - When a client issues an HTTP PUT or POST to update existing high-latency stored data, a 202 Accepted is returned, and an HTTP GET can be used to monitor the completion status of the update.

 

c) Deferred Async - A client can issue an HTTP PUT to indicate that low latency is requested for the data. This HTTP request completes immediately, and a GET can be used to determine when the data has become accessible with low latency. Clients can also register for asynchronous notifications that indicate when the data is available with low latency.
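
On the client side, the Deferred Async pattern amounts to a polling loop. Here is a minimal sketch, assuming a caller-supplied fetch_metadata function (hypothetical, not part of CDMI) that performs the metadata GET and returns the "metadata" dictionary; the poll interval and retry count are arbitrary.

```python
import time


def wait_for_latency(fetch_metadata, poll_seconds=5.0, max_polls=60, sleep=time.sleep):
    """Poll until the provided latency meets the requested latency.

    `fetch_metadata` is a caller-supplied function (an assumption of this
    sketch) that returns a dict containing "cdmi_latency" and
    "cdmi_latency_provided". Returns True once the provided latency is no
    higher than the requested one, or False after `max_polls` attempts.
    """
    for attempt in range(max_polls):
        metadata = fetch_metadata()
        requested = int(metadata["cdmi_latency"])
        provided = int(metadata["cdmi_latency_provided"])
        if provided <= requested:
            return True
        if attempt < max_polls - 1:
            sleep(poll_seconds)
    return False


if __name__ == "__main__":
    # Simulated server: the object reaches low-latency storage on the third poll.
    answers = iter([
        {"cdmi_latency": "10", "cdmi_latency_provided": "1000000"},
        {"cdmi_latency": "10", "cdmi_latency_provided": "1000000"},
        {"cdmi_latency": "10", "cdmi_latency_provided": "10"},
    ])
    print(wait_for_latency(lambda: next(answers), sleep=lambda seconds: None))
```

Because metadata GETs complete without delay, the loop never blocks on the high-latency data itself. A notification queue avoids the polling altogether.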

 

Clients can also register "Jobs", which allow the latency for arbitrary collections of objects to be changed from high latency to low latency (or vice-versa). The job status allows progress to be monitored.

 

A job can also be used to automatically move low latency data back to high latency status based on characteristics such as time since last access.

 

We'll see some examples of this technique later in this post.

 

3. CDMI enables many use cases for high latency data:

 

a) Check-in/Check-out of a large data set: A large collection of related data needs to be moved from one SLO to another as a group (either decreasing latency or increasing latency) under the control of another system, without changing the namespace. E.g. Hadoop job archiving, completed/reactivated projects, scheduled patient visits, closed accounts, compliance events, periodic fixity verification, etc.

 

b) Explicit Low SLO: The temperature of individual objects is specifically set to be cold. E.g. Public/private cloud storage, SCM application WORN (write-once, read-never) data, logs, and compliance data.

 

In order to see how this is enabled by CDMI, let's look at some protocol examples:

 

Protocol Examples:

 

1. Create an object with a high-latency SLO:

 

PUT /HighLatencyDataObject.txt HTTP/1.1
Host: cloud.example.com
Content-Type: application/cdmi-object
X-CDMI-Specification-Version: 1.0.2

{
     "metadata" : { "cdmi_latency" : "1000000" },
     "value" : "Server, you may take up to a kilosecond before you send the data to a client"
}

 

Including the "cdmi_latency" metadata item indicates to the server that a high-latency SLO is desired.
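
For readers who would rather script this than type raw HTTP, here is a hedged sketch of the same request built with Python's standard library. The function name and the base URL scheme are assumptions for illustration. The request is only constructed, not sent, and a real deployment would also need authentication.

```python
import json
import urllib.request


def build_create_request(base_url, name, value, latency_ms):
    """Build (but do not send) a CDMI PUT that creates a data object
    with a requested latency, mirroring the example above."""
    body = json.dumps({
        "metadata": {"cdmi_latency": str(latency_ms)},
        "value": value,
    }).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{name}",
        data=body,
        method="PUT",
        headers={
            "Content-Type": "application/cdmi-object",
            "X-CDMI-Specification-Version": "1.0.2",
        },
    )


if __name__ == "__main__":
    request = build_create_request(
        "https://cloud.example.com",   # scheme is an assumption
        "HighLatencyDataObject.txt",
        "Server, you may take up to a kilosecond before you send the data to a client",
        1000000,
    )
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
    # To actually send it: urllib.request.urlopen(request)
```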

 

2. Create a container where every object will automatically have a high-latency SLO:

 

PUT /HighLatencyContainer/ HTTP/1.1
Host: cloud.example.com
Content-Type: application/cdmi-container
X-CDMI-Specification-Version: 1.0.2

{
     "metadata" : { "cdmi_latency" : "1000000" }
}

 

The "cdmi_latency" metadata item is inherited by child data objects and containers.

 

3. Update a high-latency object to have a low latency:

 

PUT /HighLatencyDataObject.txt HTTP/1.1
Host: cloud.example.com
Content-Type: application/cdmi-object
X-CDMI-Specification-Version: 1.0.2

{
     "metadata" : { "cdmi_latency" : "10" }
}

 

Updating the "cdmi_latency" metadata item indicates to the server that a different latency SLO is desired. The server then does what is needed to transparently move the object to lower latency storage without changing the namespace.

 

4. Determine the requested and provided latency for an object or container:

 

GET /HighLatencyDataObject.txt?metadata:cdmi_latency;metadata:cdmi_latency_provided HTTP/1.1
Host: cloud.example.com
X-CDMI-Specification-Version: 1.0.2

 

Response:

 

HTTP/1.1 200 OK
X-CDMI-Specification-Version: 1.0.2
Content-Type: application/cdmi-object

{
     "metadata" : {
          "cdmi_latency" : "10",
          "cdmi_latency_provided" : "1000000"
     }
}

 

The "cdmi_latency_provided" item indicates what latency a client will see, and "cdmi_latency" indicates what latency the client would like to see. Once an object has been moved/cached to a lower latency location, the value of "cdmi_latency_provided" changes to reflect this.

 

5. Find all objects with an access latency greater than one second:

 

PUT /FindHighLatencyObjects.query?value HTTP/1.1
Host: cloud.example.com
Content-Type: application/cdmi-queue
X-CDMI-Specification-Version: 1.0.2

{
     "valuetransferencoding" : "json",
     "metadata" : {
          "cdmi_queue_type" : "cdmi_query_immediate",
          "cdmi_scope_specification" : [ { "metadata" : { "cdmi_latency_provided" : "> 1000" } } ],
          "cdmi_results_specification" : { "objectID" : "" }
     }
}

 

Response:

 

HTTP/1.1 201 Created
Content-Type: application/cdmi-queue
X-CDMI-Specification-Version: 1.0.2

{
     "value" : [
          { "objectID" : "00007E7F0010CEC234AD9E3EBFE9531D" },
          { "objectID" : "00007E7F0010DCECC805FB6D195DDBCB" },
          { "objectID" : "00007E7F00102E230ED82694DAA975D2" },
          { "objectID" : "00007E7F0010EB9092B29F6CD6AD6824" }
     ]
}

 

This finds the four objects whose provided latency is more than one second. A similar request could find all objects where the requested latency is under a second, or where the requested latency does not match the provided latency, etc.
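
To make the matching concrete, here is a small local simulation of what the server does with a comparison string such as "> 1000". It is a simplified reading of the expressions used in this post, not the full CDMI scope-specification grammar, and the function names and object IDs are invented for the example.

```python
import operator

# Only the numeric comparisons that appear in this post (plus equality).
_OPERATORS = {">": operator.gt, "<": operator.lt, "==": operator.eq}


def matches(expression, value):
    """Evaluate a comparison string such as "> 1000" against a numeric string."""
    op_text, operand = expression.split(None, 1)
    return _OPERATORS[op_text](int(value), int(operand))


def select(objects, key, expression):
    """Return the IDs of objects whose metadata item `key` satisfies `expression`."""
    return [
        object_id
        for object_id, metadata in objects.items()
        if key in metadata and matches(expression, metadata[key])
    ]


if __name__ == "__main__":
    objects = {
        "obj-A": {"cdmi_latency_provided": "1000000"},
        "obj-B": {"cdmi_latency_provided": "10"},
    }
    print(select(objects, "cdmi_latency_provided", "> 1000"))
```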

 

6. Request notifications when objects transition from high to low latency:

 

PUT /LoweredLatencyObjects.notify HTTP/1.1
Host: cloud.example.com
Content-Type: application/cdmi-queue
X-CDMI-Specification-Version: 1.0.2

{
     "valuetransferencoding" : "json",
     "metadata" : {
          "cdmi_queue_type" : "cdmi_notification_queue",
          "cdmi_notification_events" : [ "cdmi_modify_complete" ],
          "cdmi_scope_specification" : [ { "metadata" : { "cdmi_latency_provided" : "< 1000" } } ],
          "cdmi_results_specification" : { "objectID" : "" }
     }
}

 

This creates a persistent queue that will be automatically filled with notifications as objects transition from one SLO to another.

 

7. Move all MPEG movies to high latency storage:

 

PUT /jobs/movieMigration.job HTTP/1.1
Host: cloud.example.com
Content-Type: application/cdmi-object
X-CDMI-Specification-Version: 1.0.2

{
     "metadata" : {
          "cdmi_job_action" : "cdmi_job_action_update_metadata",
          "cdmi_job_target" : [ { "mimetype" : "== video/mpeg" }, { "objectName" : "ends .mpeg" } ],
          "cdmi_job_action_params" : [ { "update_add" : { "metadata" : { "cdmi_latency" : "1000000" } } } ]
     }
}

 

This finds all objects that have a MIME type of MPEG or a name ending in .mpeg, and sets the latency SLO to 1 kilosecond. The same process can be used to bulk move from a high latency SLO to a low latency SLO.
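
Before submitting a bulk job, it can help to reason about which objects it would touch. The sketch below is a local dry run under simplifying assumptions: target clauses are OR'ed together, only the "==" and "ends" operators from this example are handled, and the update sets the requested latency ("cdmi_latency"). The function names and sample objects are invented for illustration.

```python
def clause_matches(clause, obj):
    """Return True if every field test in one target clause holds for `obj`."""
    for field, expression in clause.items():
        op_text, operand = expression.split(None, 1)
        actual = obj.get(field, "")
        if op_text == "==":
            ok = actual == operand
        elif op_text == "ends":
            ok = actual.endswith(operand)
        else:
            raise ValueError(f"unsupported operator: {op_text}")
        if not ok:
            return False
    return True


def dry_run(job_metadata, objects):
    """Return copies of `objects` with the job's update_add metadata applied
    to every object matched by at least one target clause."""
    targets = job_metadata["cdmi_job_target"]
    updates = {}
    for params in job_metadata["cdmi_job_action_params"]:
        updates.update(params.get("update_add", {}).get("metadata", {}))
    result = []
    for obj in objects:
        updated = dict(obj, metadata=dict(obj.get("metadata", {})))
        if any(clause_matches(clause, obj) for clause in targets):
            updated["metadata"].update(updates)
        result.append(updated)
    return result


if __name__ == "__main__":
    job = {
        "cdmi_job_action": "cdmi_job_action_update_metadata",
        "cdmi_job_target": [{"mimetype": "== video/mpeg"}, {"objectName": "ends .mpeg"}],
        "cdmi_job_action_params": [{"update_add": {"metadata": {"cdmi_latency": "1000000"}}}],
    }
    sample = [
        {"objectName": "trailer.mpeg", "mimetype": "application/octet-stream", "metadata": {"cdmi_latency": "10"}},
        {"objectName": "notes.txt", "mimetype": "text/plain", "metadata": {"cdmi_latency": "10"}},
    ]
    for item in dry_run(job, sample):
        print(item["objectName"], item["metadata"])
```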

 

As with latency (illustrated above), each dimension of control afforded by Data System Metadata enables industry-leading functionality, and reduces both server and client complexity.
