Thursday, 13 March 2008

SSDS Query Model

Note: The query model is subject to change based on feedback; this is how it stands today.  You can pre-register for the beta at the SSDS home page.

Design Decisions

In this post, I am going to cover how the query model works in SQL Server Data Services (SSDS) today and some of the design goals of SSDS in general.

The first thing to understand is that the SSDS team made a conscious decision to start simple and expand functionality later.  The reasoning behind this is simple:

  • Simple services that lower the bar to get started are easier to adopt.  We want to offer the smallest useful feature set that developers can start using to build applications.
  • At the same time, we want to make sure that every feature that is available will scale appropriately at internet-level.

As such, the right model is what the team chose:  start simple and expose richer and richer functionality as we prove out the scale and developer's need.  The team is committed to short (8 week) development cycles that prioritizes the features based on feedback.

The Query Model

Now that I have covered the design decisions, let's take a look at how the query model actually operates.  From my last post, you see that I already showed you the syntax that begins the query operation (?q=).  What is important to understand is the following:

  • The widest scope for any search is the Container.

Well... almost, but I will get to that.  To put this another way: you cannot retrieve entity objects today by searching at the authority level.  That's right:  there is no cross-container search.  This might change in the future, but that is how it is today.  For developers familiar with LDAP terminology, this roughly equates to a SearchScope.OneLevel operation.  The syntax again is:

https://{authority}.data.sitka.microsoft.com/v1/{container}?q={query}

It is important to note that there is no trailing forward slash after the container Id in that. 

Now, what did I mean by "almost"?  If you saw my last post, I showed the following:

https://{authority}.data.sitka.microsoft.com/v1/?q=

This would imply query capabilities, right?  Well, it turns out that you can query at the Authority level, but you can only query Container objects (not the entities contained in them).  Since a Container is a simple entity with only Id and Version today, it is of limited usefulness to query at this level.  However, if we were to add customizable metadata support to attach to a Container, then query might become much more interesting (e.g. find all containers where attribute "foo" equals "bar").

Syntax

The SSDS team decided to adopt a LINQ-like syntax that is carried on the query string.  It has the basic format of the following:

from e in entities {where clause} select e

Only the part inside the {}'s is modifiable*.  We can infer the following from this syntax:  One, you can perform only simple selects today.  There are no joins, group by, ordering, etc.  Two, there is no projection today.  There is only the return of the complete entity as the entity is both the unit of storage (POST and PUT) as well as the unit of return (GET).

Now, let's inspect the "{where clause}".  The syntax here in more detail is:

where {property} {operator} {constant}

Property

The '{property}' part of the expression can operate over both system properties (Id, Kind) as well as flexible properties (anything you added).  The syntax is slightly different depending on which it is.  For example:

e.Kind
e["MyProperty"]

In the case of the system properties, we use the dot notation and the name of the system property.  The custom flex properties are addressed using the brace [] syntax in a weakly-typed syntax.  This makes sense of course as there is no way we could know the syntax of a schema-less entity.

Operator

The operators ({operator}) that are supported are: ==, <, >, >=, <=, !=

Constant

Finally, for the '{constant}', we have just that - a constant.  We do not currently support other expressions or other properties here.  As an example the following is invalid:

e["Pages"] > e["AvgPages"]

while, this would be perfectly valid:

e["Pages"] > 300

The type of the constant is inferred by the syntax.  Using the wrong type will net you zero results possibly, so keep this in mind.  Here are some simple examples to show how to format the constant:

e["xsi_decimal_type"] > 300
e["xsi_string_type"] != "String"
e["xsi_boolean_type"] == true
e["xsi_dateTime_type"] == DateTime("2008-02-09T07:45:23.855")
e["xsi_base64Binary_type"] == Binary("AB34CD")
e.Kind == "String"
e.Id == "String"

The last point here in this syntax is that we can tie together multiple expressions using the && (AND), || (OR), and ! (NOT) logical operators.  Precedence can be set using parentheses ().

Paging

Results are limited to 500 per GET operation.  This is an arbitrary number right now, so don't depend on it.  The EntitySet that is returned is implicitly ordered by Id, so you would need to perform a simple loop logic to page through larger sets of data.  Something to the effect of:

from e in entities where e.Id > "last ID seen" select e

Pulling It Together

Since the query is submitted on the querystring of the URL, we need to encode the querystring using a URL encoding function.  It is actually easiest to use a UrlTemplate from .NET 3.5 which does all the right things for you.

Here is a properly formatted GET request that you can type into a browser to retrieve a set of entities.

https://dunnry.data.sitka.microsoft.com/v1/books?q=from+e+in+entities+where+e["Pages"]+>+300+&&+e.Kind=="Bookv2"+select+e

In this case, I am asking for Entities of the Kind "Bookv2" that have more than 300 (numeric) pages (from "Pages" flex property).  Simple, right?

The actual format that is returned is POX today (support for JSON and APP coming).  It would look something like this:

<s:EntitySet xmlns:s="http://schemas.microsoft.com/sitka/2008/03/"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xmlns:x="http://www.w3.org/2001/XMLSchema">
  <Bookv2>
    <s:Id>MyNewBook</s:Id>
    <s:Version>1</s:Version>
    <Name xsi:type="x:string">My Special Book 423</Name>
    <ISBN xsi:type="x:string">ISBN 423</ISBN>
    <PublishDate xsi:type="x:dateTime">2008-02-09T07:43:51.13625</PublishDate>
    <Pages xsi:type="x:decimal">400</Pages>
  </Bookv2>
</s:EntitySet>

The EntitySet tag wraps around one or many (in this case one) XML nodes that are of Kind "Bookv2".  As a developer, I need to interpret this XML and read back the results.

The last point here is that this is the entire entity.  As a consumer of this service, I need to think about how my entities will be used.  Since there is no projection and only a full entity is returned, it may make sense to break apart larger entities with commonly queried properties and leave larger binary properties (and later blobs) in separate entities.  You don't want to pay for the transfer cost or the performance hit of transferring multi-MB (or GB) entities when you only want to read a single flex property from it.  I envision people will start to make very light and composable entities to deal with this.

To anyone wondering what "Sitka" is in the namespaces or the URI... it is the old code name for SSDS.  That will more than likely change with the next refresh.

Limitations Today

The queries are fairly simple.  There is no capability for cross-container queries or joins of any type today.  There are no group by operators, or order by functionality.  There is also no "LIKE", Contains, BeginsWith, or EndsWith functionality.

I have to stress however, that this is a starting point for the SSDS query API, not the final functionality.  I will of course update this blog with the new functionality as it rolls into the service.  Again, the team decided that it was better to put a simple and approachable service out there today and gather feedback on what works and what doesn't for specific scenarios than to sit back and code a bunch of functionality that might not be necessary or meet the user's needs.  I think this was a good decision and there is an amazing variety of applications that you can build using just this API.

* - not quite... turns out you can change the 'e' to anything you like as long as you are consistent in reference, but that hardly counts as changing the query.