Wednesday, 02 July 2008

Working with Objects in SSDS Part 3

Here is my last installment in this series of working with objects in SQL Server Data Services.  For background, readers should read the following:

Serialization in SSDS

Working with Objects in SSDS Part 1

Working with Objects in SSDS Part 2

Last time, we concluded with a class called SsdsEntity<T> that became an all-purpose wrapper or veneer around our CLR objects.  This made it simple to take our existing classes and serialize them as entities in SSDS.

In this post, I want to discuss how the querying in the REST library works.  First a simple example:

var ctx = new SsdsContext(
    "authority=http://dunnry.data.beta.mssds.com/v1/;username=dunnry;password=secret"
    );

var container = ctx.OpenContainer("foo");
var foo = new Foo { IsPublic = false, Name = "MyFoo", Size = 12 };

//insert it with unique id guid string
container.Insert(foo, Guid.NewGuid().ToString());

//now query for it
var results = container.Query<Foo>(e => e.Entity.IsPublic == false && e.Entity.Size > 2);

//Query<T> returns IEnumerable<SsdsEntity<T>>, so foreach over it
foreach (var item in results)
{
    Console.WriteLine(item.Entity.Name);
}

I glossed over it in my previous posts with this library, but I have a class called SsdsContext that acts as my credential store and factory to create SsdsContainer objects where I perform my operations.  Here, I have opened a container called 'foo', which would relate to the URI (http://dunnry.data.beta.mssds.com/v1/foo) according to the authority name I passed on the SsdsContext constructor arguments.

I created an instance of my Foo class (see this post if you want to see what a Foo looks like) and inserted it.  We know that under the covers we have an XmlSerializer doing the work to serialize that to the proper POX wire format.  So far, so good.  Now, I want to retrieve that same entity back from SSDS. The key line here is the table.Query<T>() call.  It accepts a Expression<Func<SsdsEntity<T>, bool>> argument that represents a strongly typed query.

For the uninitiated, the Expression<TDelegate> is a way to represent lambda expressions in an abstract syntax tree.  We can think of them as a way to model what the expression does without generating the bits of code necessary to actually do it.  We can inspect the Expression and create new ones based on it until finally we can call Compile and actually convert the representation of the lambda into something that can execute.

The Func<SsdsEntity<T>, bool> represents a delegate that accepts a SsdsEntity<T> as an argument and returns a boolean.  This effectively represents the WHERE clause in the SSDS LINQ query syntax.  Since SsdsEntity<T> contains an actual type T in the Entity property, you can query directly against it in a strongly typed fashion!

What about those flexible properties that I added to support flexible attributes outside of our T?  I mentioned that I wanted to keep the PropertyBucket (a Dictionary<string, object>) property public for querying.  In order to use the flexible properties that you add, you simply use it in a weakly typed manner:

var results = container.Query<Foo>(e => e.PropertyBucket["MyFlexProp"] > 10);

As you can see, any boolean expression that you can think of in the string-based SSDS LINQ query syntax can now be expressed in a strongly-typed manner using the Func<SsdsEntity<T>, bool> lambda syntax.

How it works

Since I have the expression tree of what your query looks like in strongly-typed terms, it is a simple matter to take that and convert it to the SSDS LINQ query syntax that looks like "from e in entities where [....] select e" that is appended to the query string in the REST interface.  I should say it is a simple matter because Matt Warren did a lot of the heavy lifting for us and provided the abstract expression visitor (ExpressionVisitor) as well as the expression visitor that partially evaluates the tree to evaluate constants (SubTreeEvaluator).  This last part is important because it allows us to write this:

int i = 10;
string name = "MyFoo";

var results = container.Query<Foo>(e => e.Entity.Name == name && e.Entity.Size > i);

Without the partial tree evaluation, you would not be able to express the right hand side of the equation.  All I had to do was implement an expression visitor that correctly evaluated the lambda expression and converted it to the LINQ syntax that SSDS expects (SsdsExpressionVisitor).  It would be a trivial matter to actually implement the IQueryProvider and IQueryable interfaces to make the whole thing work inside LINQ to Objects.

Originally, I did supply the IQueryProvider for this implementation but after consideration I have decided that using methods from the SsdsContainer class instead of the standard LINQ syntax is the best way to proceed.  Mainly, this has to do with the fact that I want to make it more explicit to the developer what will happen under the covers rather than using the standard Where() extension method.

Querying data

The main interaction to return data is via the Query<T> method.  This method is smart enough to add the Kind into the query for you based on the T supplied.  So, if you write something like:

var results = container.Query<Foo>(e => e.Entity.Size > 2);

This is actually translated to "from e in entities where e["Size"] > 2 && e.Kind == "Foo" select e".  The addition of the kind is important because we want to limit the results as much as possible.  If there happened to be many kinds in the container that had the flexible property "Size", it would actually return those as well in the wire response.

Of course, what about if you want that to happen?  What if you want to return other kinds that have the "Size" property?  To do this, I have introduced a class called SsdsEntityBucket.  It is exactly what it sounds like.  To use it, you simply specify a query that uses additional types with either the Query<T,U,V> or Query<T,U> methods.  Here is an example:

var foo = new Foo
{
    IsPublic = true,
    MyCheese = new Cheese { LastModified = DateTime.Now, Name = "MyCheese" },
    Name = "FooMaster",
    Size = 10
};

container.Insert(foo, foo.Name);
container.Insert(foo.MyCheese, foo.MyCheese.Name);

//query for bucket...
var bucket = container.Query<Foo, Cheese>(
    (f, c) => f.Entity.Name == "FooMaster" || c.Entity.Name == "MyCheese"
    );

var f1 = bucket.GetEntities<Foo>().Single();
var c1 = bucket.GetEntities<Cheese>().Single();

The calls to GetEntities<T> returns IEnumerable<SsdsEntity<T>> again.  However, this was done in a single call to SSDS instead of multiple calls per T.

Paging

As I mentioned earlier, I wanted the developer to understand what they were doing when they called each method, so I decided to make paging explicit.  If I had potentially millions of entities in SSDS, it would be a bad mistake to allow a developer to issue a simple query that seamlessly paged the items back - especially if the query was something like e => e.Id != "".  Here is how I handled paging:

var container = ctx.OpenContainer("paging");

List<Foo> items = new List<Foo>();
int i = 1;

container.PagedQuery<Foo>(
    e => e.Entity.Size != 0,
    c =>
    {
        Console.WriteLine("Got Page {0}", i++);
        items.AddRange(c.Select(s => s.Entity));
    }
);

Console.WriteLine(items.Count);

The PagedQuery<T> method takes two arguments.  One is the standard Expression<Func<SsdsEntity<T>, bool>> that you use to specify the WHERE clause for SSDS, and the other is Action<IEnumerable<SsdsEntity<T>>> which represents a delegate that takes an IEnumerable<SsdsEntity<T>> and has a void return.  This is a delegate you provide that does something with the 500 entities returned per page (it gets called once per page).  Here, I am just adding them into a List<T>, but I could easily be doing anything else here.  Under the covers, this is adding the paging term dynamically into the expression tree that is evaluated.

What's next

This is a good head start on using the REST API with SSDS today.  However, there are a number of optimizations that could be made to the model: additional overloads, perhaps some extension methods for common operations, etc.

As new features are added, I will endeavor to update this as well (blob support comes to mind here).  Additionally, I have a few optimizations planned around concurrency for CRUD operations. 

I have published this out to Code Gallery and I welcome feedback and bug fixes.  Linked here.