Thursday, March 06, 2008

Entities, Containers, and Authorities

SQL Server Data Services exposes what we call the 'ACE' concept.  This stands for Authority, Container, and Entity.  These basic concepts are the building blocks with which you build your applications using SSDS.

You will notice that I intentionally switched the order in my blog post title.  The reasoning is that I believe it is easier to understand the model when you learn the entity first.

Flexible Entities

At the core of it all is the idea of a flexible entity.  SSDS does not impose a schema on the shape of your data.  You are free to define whatever attributes you like with the model and choose some simple types to help you query that data later (string, number, boolean, datetime, or byte[]).  Consider the following C# class and how it will be represented on the wire (using the REST API).

public class Book
{
    public string Name { get; set; }
    public string ISBN { get; set; }
    public DateTime PublishDate { get; set; }
    public int Pages { get; set; }
    public byte[] Image { get; set; }
}
 
In order to store an instance of this Book class, we would need to serialize it on the wire like so (again this is for REST, not SOAP):
 
<Book xmlns:s="http://schemas.microsoft.com/stratus/2008/03/"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <s:Id>-544629171</s:Id>
  <Name xsi:type="x:string">Some Book</Name>
  <ISBN xsi:type="x:string">1234567</ISBN>
  <PublishDate xsi:type="x:dateTime">2008-03-06T12:56:53.122-08:00</PublishDate>
  <Pages xsi:type="x:decimal">350</Pages>
  <Image xsi:type="x:base64Binary">CgsMDQ==</Image>
</Book>

A couple of things to notice about this XML - there is an attribute called Id.  This is one of the system level attributes that are required.  Along with the container and authority ID, this id will form the basis of the unique URI that represents this resource.  The other system attributes are Kind and Version.  We control the Id and Kind, but the Version is maintained by SSDS (and why we don't need to send it initially).  For the REST API, the Kind equates to the root XML element for the entity (in this case Kind is "Book").

Flexible means flexible

I believe that most developers will use SSDS with a strongly-typed backing class like Book.  The serialization and subsequent deserialization of the instance is actually up to you as a developer however.

However, nothing forces me to actually use a backing class for this service.  It can be helpful for developers to think of the data they store in SSDS in terms of objects or even rows in a database, but that is not the only way to think about it.  More accurately, we can think about the data as a sparse property bag.

I can just as easily store another Book object with the following:

<Book xmlns:s="http://schemas.microsoft.com/stratus/2008/03/"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <s:Id>MySpecialBook</s:Id>
  <Name xsi:type="x:string">Some Book</Name>
  <ISBN xsi:type="x:string">1234567</ISBN>
  <PublishDate xsi:type="x:string">January 2008</PublishDate>
</Book>

Notice in this example that I have removed some properties and then changed the PublishDate property type.  This is completely legal when using SSDS and no error will occur for different properties on the same Kind.  It is up to you as a developer to figure out the shape of the data coming back and deserialize it appropriately.

As you can see, a flexible entity is really about having different shapes to your data (and no schema).

image

Containers

Containers are collections of entities.  We like to say that the container is unit of consistency for our data.  It turns out that containers are the broadest domain of a query or search.  As such, containers must also contain a complete copy of all the entities within (that is, they must be co-located on the same node).  This further means that there must be some practical limit to the size of the container (as measured by space on disk) at some point because we are constrained by physical storage (also memory and CPU to some extent).

We don't have a hard limit at this point except to say it must exist, but eventually if your container gets big enough, you should consider partitioning it for resource efficiency reasons.

image

Because containers are another type of entity (albeit a somewhat special one), they also have the system attribute Id.  If we look at what a container looks like, you will see that it is a simple XML structure with no custom properties (though that could change).

<s:Container xmlns:s="http://schemas.microsoft.com/sitka/2008/03/"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xmlns:x="http://www.w3.org/2001/XMLSchema">
  <s:Id>Books</s:Id>
  <s:Version>1</s:Version>
</s:Container>
 
In this case, the container here is called "Books".  The Version attribute comes back from SSDS to tell me what version of the entity I am looking at.  It can be used for simple concurrency (more on that in another post).

Authority

Finally, we have the idea of an authority.  This is a collection of containers co-located in a specific datacenter today (though this might not always remain true).  The authority is analogous to a namespace for .NET developers in a sense as it scopes the containers within it.  There is a DNS name attached to the authority that we use when we address a resource using REST.  This DNS name is provisioned for a particular user and added to a subdomain off the main SSDS domain name.

image

 

Pulling it together

Now that we have covered what ACE means, let's pull it together and see how it affects our URI that we build to address a resource.

https://{authority}.data.sitka.microsoft.com/v1/{containerID}/{entityID}

This build a stable URI for us given an authority name, container Id, and entity Id.

I don't always need all three identifiers in order to address a resource.  I can actually query to find resources.  For instance, I will query the contents of an authority (the containers), if I target the authority with a GET request:

https://{authority}.data.sitka.microsoft.com/v1/?q=

More commonly, I can query the contents of a container (the entities) by targeting the container URI with a GET request:

https://{authority}.data.sitka.microsoft.com/v1/{containerID}?q=

In this model, you will notice the ?q= portion of the URI.  This is the indicator that we want a query.  I am not specifying one here, so it acts more like a SELECT *.  In a later post, I will cover the query model in more detail.