Friday, 14 November 2008

Fixing the SDS HOL from Azure Training Kit

If you have downloaded the Azure Services Training Kit (which you should), you will find a compilation error in some of the SQL Data Services HOLs.

[screenshot of the compilation error]

The error is somewhat self-explanatory:  the solution is missing the AjaxControlToolkit.  This file is missing not because we forgot it, but because our automated packaging tool was trying to be helpful.  You see, we have a tool that cleans up solutions before packaging by deleting the 'bin' and 'obj' folders and any .pdb files in the solution.  In this case, it killed the bin directory where the AjaxControlToolkit.dll was deployed.
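For the curious, the cleanup step amounts to something like the following sketch (the class name and details here are my illustration, not the actual internal tool):

```csharp
using System;
using System.IO;

static class SolutionCleaner
{
    // Deletes 'bin' and 'obj' folders and any .pdb files under a solution
    // root.  Note it makes no exception for binaries that were checked in
    // rather than built - which is how the AjaxControlToolkit.dll got lost.
    public static void Clean(string root)
    {
        foreach (var dir in Directory.GetDirectories(root, "*", SearchOption.AllDirectories))
        {
            var name = Path.GetFileName(dir);
            if ((name.Equals("bin", StringComparison.OrdinalIgnoreCase) ||
                 name.Equals("obj", StringComparison.OrdinalIgnoreCase)) &&
                Directory.Exists(dir))
            {
                Directory.Delete(dir, true);
            }
        }

        foreach (var pdb in Directory.GetFiles(root, "*.pdb", SearchOption.AllDirectories))
        {
            File.Delete(pdb);
        }
    }
}
```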

To fix this error, you just need to visit the AjaxControlToolkit project on CodePlex and download it again.  The easiest way is to download AjaxControlToolkit-Framework3.5Sp1-DllOnly.zip, extract the 'Bin' directory, and copy it into the root of the solution you are trying to use.

Sorry about that - we will fix it for our next release.

(updated: added link)

Tuesday, 11 November 2008

Refreshed REST library for SQL Data Services

I finally got around to doing a quick refresh on the SSDS REST Library.  It should now be called the SDS REST library, of course, but I doubt I will change the name, as that would break the URL on Code Gallery.

I am calling this one a 'Refresh' release because I am not adding any features.  The purpose of this release was to fix the serialization such that it runs in partial trust.  Partial trust support is desirable because that means you can use this library in Windows Azure projects.

While working on this, I found out an interesting fact about the XmlSerializer.  Serializing a generic type, in this case SsdsEntity<T>, works just fine in partial trust.  However, deserializing that exact same type will not work without Full trust.  To fix it, I had to remove any and all code that tried to do so.  Instead, I deserialize the T in SsdsEntity<T> and manually create the SsdsEntity part.  You can see those updates in the SsdsEntitySerializer class, as well as in the setter of the SsdsEntity<T> Attributes property.  I don't see any problems with my solution, and in fact, it may end up being more efficient.
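To make the workaround concrete, here is a minimal sketch of the idea (Wrapper<T> is a stand-in for SsdsEntity<T>, and the XML handling is simplified): deserialize only the inner T, then assemble the wrapper by hand.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

public class Foo
{
    public string Name { get; set; }
    public int Size { get; set; }
}

// Stand-in for SsdsEntity<T>: the metadata part we construct manually.
public class Wrapper<T>
{
    public string Id { get; set; }
    public T Entity { get; set; }
}

public static class PartialTrustDeserializer
{
    // Deserializing Wrapper<T> itself would demand Full trust;
    // deserializing just the T works fine in partial trust.
    public static Wrapper<T> Deserialize<T>(string id, string innerXml)
    {
        var xs = new XmlSerializer(typeof(T));
        using (var reader = new StringReader(innerXml))
        {
            return new Wrapper<T> { Id = id, Entity = (T)xs.Deserialize(reader) };
        }
    }
}
```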

Remaining work to do:  implement the JOIN, OrderBy, and TOP operations (if I find the time).

Get it here:  SDS REST Library

Tuesday, 04 November 2008

NeoGeo on Building with SDS

During PDC, I was lucky enough to film a short video with Marc Hoeppner, one of the Regional Directors from Germany and the Managing Director of NeoGeo New Media GmbH.  Marc was involved very early with SQL Data Services and has provided some valuable feedback to us on features and direction.

I managed to get Marc to show me his company's media asset management product, neoMediaCenter.NET.  What struck me during this interview was how his team took a hybrid approach to cloud services.  That is, their original product uses SQL Server on the backend for storage and querying.  Instead of forcing customers to make an either/or decision, they took the approach of offering both.  You can move your data seamlessly between the cloud and the on-premises database.

There are some real advantages to using SQL Data Services for this product: namely, with a click, you can move the data to the cloud where it can essentially be archived forever, yet remain available for consumption.  We like to term this 'cold storage'.  Imagine the model where you have thousands and thousands of digital assets.  For the assets that are temporally relevant, you can store them in the local on-premises database for the least latency.  However, as the data ages, it tends to be used less and less frequently.  Today, companies either invest in a bunch of new storage, archive the content off to tape, or just delete it once it gets to a certain age.  Instead of forcing customers to make one of these choices, Marc has added the capability to move this data seamlessly out of the on-premises store and into the cloud.  It still appears in the application, but is served from the cloud.  This makes accessing the data simple (unlike tape or deletion) as well as relatively inexpensive (unlike buying more disk space yourself).

Once we have multiple datacenters up and operational, you also get the geo-location aspect of this for free.  For certain sets of distributed customers, using the geo-located data may in fact be faster than accessing the data on-premises.

This is a very cool demo.  If you watch towards the end, Marc shows a CIFS provider for SDS that allows you to mount SDS just like a mapped network drive.  Marc mentions it in the video, but he managed to build all this functionality in just a week!  It is interesting to note that Marc's team also made use of the SSDS REST library that provides the LINQ and strongly typed abstraction for querying and working with SDS (the library was named before the SDS rename, hence still SSDS).  I am happy to see that, of course, since I had a bit to do with that library. :)

Watch it here

Monday, 27 October 2008

Ruby on SQL Data Services SDK

A few months back, I embarked on a mission to get these Ruby samples produced as I felt it was important to show the flexibility and open nature of SDS.  The problem was that I have no practical experience with Ruby.  With this in mind, I looked to my friend, former co-worker, and Ruby pro, James Avery to get this done.  He did a terrific job as the developer and I am happy to present this today.

With the announcement today at PDC of the Azure Services Platform, we are releasing a set of Ruby samples for SQL Data Services (SDS), formerly called SSDS.  We are putting the source on GitHub, and the samples will be available as gems from RubyForge.

The samples really consist of a number of moving parts:

[diagram of the sample's moving parts]

At the core of the samples is a Ruby REST library for SDS.  It performs the main plumbing for using the service.  Next, we have two providers for building applications: an ActiveRecord provider and an ActiveResource provider.  These providers make use of the Ruby REST library for SDS.

Finally, we have two samples that make use of the providers.  We have built a version of RadiantCMS that uses the ActiveRecord provider and a simple task list sample that shows how to use the ActiveResource provider.

To get started:

  1. Visit GitHub and download sds-tasks or sds-radiant.
  2. View the Readme file in the sample, as it will direct you to download Rails and some other prerequisites.
  3. Use the gem installers to download the sds-rest library.
  4. Set some configuration with your SDS username/password (now called Azure Solution name and password when you provision).
  5. That's it!  Just run the ruby script server and see how REST and Ruby work with SDS.

NOTE:  PDC folks can get provisioned for SQL Services and .NET Services by visiting the labs and getting a provisioning code.  This is exclusive to PDC attendees until the beta.

I will look to James to provide a few posts and details on how he built it.  Thanks again James!

What's new in SQL Data Services for Developers?

I dropped in on a couple of the developers in the SQL Data Services team (formerly SSDS) to chat and film a video about some of the new features being released with the PDC build of the service.  Jason Hunter and Jeff Currier are two of the senior developers on the SDS team and with a few days warning that I would be stopping by, they managed to put together a couple cool demos for us.

This is a longer video, with lots of code and deep content - so set aside some time and watch Jason and Jeff walk us through the new relational features as well as blob support.

What's new in SQL Data Services for Developers?

Thursday, 04 September 2008

PhluffyFotos v2 Released

We have just released an updated version of the SSDS sample application called 'PhluffyFotos'.  Clouds are fluffy and this is a cloud services application - get it?  This sample application is an ASP.NET MVC and Windows Mobile application showing how to build a photo tagging and sharing site using our cloud data service, SSDS.  For this update:

  • Updated to MVC Preview 4.  We have removed hardcoded links and used the new filtering capability for authorization.  Of course, Preview 5 was released just (and I mean just) as we were putting this out the door.  I might update this to Preview 5 later, but it will not be a big deal to do so.
  • Updated to add thumbnail support.  Originally, we just downloaded the entire image and resized to thumbnail size.  This drags down performance in larger data sizes, so we fixed it for this release.
  • Updated to use the SSDS blob support.  Blob support was recently added with the latest sprint.  Previously, we were using the 'base64Binary' attributes to store the picture data.  With the new blob support, you supply a content type and content disposition, which will be streamed back to you on request. 
  • Updated to use the latest SSDS REST library.  This library gives us the ability to use and persist CLR objects to the service and use a LINQ-like query syntax.  This library saved us a ton of time and effort in building the actual application.  All the blob work, querying, and data access was done using this library.

The sample is available for download at CodePlex, and a live version is available to play with at PhluffyFotos.com.  I am opening this one up to the public to upload photos.  Maybe I am playing with fire here, so we will see how well it goes.  Keep in mind that this is a sample site and I will periodically blow away the data.  The live version has an added feature of integrating a source code viewer directly into the application.

SSDS REST Library v2 Released

I have just updated the Code Gallery page to reflect the new version of the REST-based library for SSDS.  This is a fairly major update to the library and adds a ton of new features to make working with SSDS even easier than it already is for the .NET developer.  Added in this release:

  • Concurrency support via Etags and If-Match, If-None-Match headers.  To get a basic understanding of how this works, refer here.
  • Blob support.  The library introduces a new type called SsdsBlobEntity that encapsulates working with blobs in SSDS.  Overloads are available for both synchronous as well as async support.
  • Parallelization support via extension methods.  The jury is still out on this one and I would like to hear some feedback on it (both the technique as well as the methods).  Instead of using an interface, factory methods, etc., we are using extension methods supplied in a separate assembly to support parallel operations.  Since there are many different techniques to parallelize your code, this allows us to offer more than one option.  Each additional assembly can also take dependencies that the entire library might not want to take as well.  Imagine that we get providers for Parallel Extensions, CCR, or perhaps other home-baked remedies.  A very simple provider using Parallel Extensions is included.
  • Bug fixes.  Hard to believe, but yes, I did have a few bugs in my code.  This release cleans up a few of the ones found in the LINQ expression syntax parser as well as a few oversights in handling date/times.
  • Better test coverage.  Lots more tests included to not only prove out that stuff works, but also to show how to use it.

If you just want to see this library in action, refer to the photo sharing and tagging application called 'PhluffyFotos' that pulls it all together (sans parallelization, I suppose).  You can use the integrated source viewer to see how the library works on a 'real' application (or a real sample application at least).

Tuesday, 26 August 2008

Concurrency with SSDS via REST

Eugenio already covered concurrency via the SOAP interface with his latest post.  The idea is exactly the same in REST, but the mechanics are slightly different.  For REST, you specify an "Etag" value and either the If-Match or If-None-Match header.

Here is a simplified client that does a PUT/POST operation on SSDS:

internal void Send(Uri scope, string etag, string method, string data,
    Action<string, WebHeaderCollection> action, Action<WebException> exception)
{
    using (var client = new WebClient { Credentials = _credentials })
    {
        client.Headers.Add(HttpRequestHeader.ContentType, "application/x-ssds+xml");

        if (etag != null)
            client.Headers.Add(HttpRequestHeader.IfMatch, etag);

        client.UploadStringCompleted += (sender, e) =>
        {
            if (e.Error != null && exception != null)
            {
                exception((WebException)e.Error);
            }
            else
            {
                if (action != null)
                {
                    action(e.Result, client.ResponseHeaders);
                }
            }
        };

        client.UploadStringAsync(scope, method, data);
    }
}

 

All this does is add the If-Match header and the Etag (which corresponds to the Flexible Entity Version system attribute).  This instructs the system to only update if the version held in SSDS matches the version specified in the Etag with the If-Match header.

Failure of this condition will result in a 412 error, "A precondition, such as Version, could not be met".  You simply need to handle this exception and move on.
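A small sketch of what 'handle it and move on' can look like (the helper name here is mine, not part of the library):

```csharp
using System;
using System.Net;

static class ConcurrencyHelper
{
    // HTTP 412 (Precondition Failed) means the entity's version changed
    // since we read it; the usual reaction is to re-GET, re-apply the
    // change, and retry - or give up and surface the conflict.
    public static bool IsPreconditionFailed(HttpStatusCode status)
    {
        return status == HttpStatusCode.PreconditionFailed;
    }
}
```

In the Send method above, the exception callback receives the WebException; casting its Response property to HttpWebResponse exposes the StatusCode to test.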

Next, there are times when you have a large blob or a largish flexible entity.  You only want to perform the GET if you don't have the latest version.  In this case, you specify the Etag again with the If-None-Match header.

Here is a simplified client that shows how the GET would work:

public void Get(Uri scope, string etag,
    Action<string, WebHeaderCollection> action, Action<WebException> exception)
{
    using (var client = new WebClient { Credentials = _credentials })
    {
        client.Headers.Add(HttpRequestHeader.ContentType, "application/x-ssds+xml");

        if (etag != null)
            client.Headers.Add(HttpRequestHeader.IfNoneMatch, etag);

        client.DownloadStringCompleted += (sender, e) =>
        {
            if (e.Error != null && exception != null)
            {
                exception((WebException)e.Error);
            }
            else
            {
                if (action != null)
                    action(e.Result, client.ResponseHeaders);
            }

        };

        client.DownloadStringAsync(scope);
    }
}
When you add the Etag and this header, you will receive a 304 "Not Modified" response if the content has NOT changed since the Etag value you sent.

I am attaching a small Visual Studio sample that includes this code and demonstrates these techniques.

Friday, 25 July 2008

Rendering POX for SSDS in Internet Explorer

With the release of the Sprint 3 bits, you might have noticed that you are now prompted to download when you hit the service directly from IE.  Because the content type changed from 'application/xml' to 'application/x-ssds+xml', IE just doesn't know how to render the resulting response.

This is simple to fix.  Copy the following to a .reg file and merge into your registry.

Windows Registry Editor Version 5.00

[HKEY_CLASSES_ROOT\MIME\Database\Content Type\application/x-ssds+xml]
"CLSID"="{48123BC4-99D9-11D1-A6B3-00C04FD91555}"
"Extension"=".xml"
"Encoding"=hex:08,00,00,00

Now, you should be back to the behavior you are used to.

Wednesday, 02 July 2008

Working with Objects in SSDS Part 3

Here is my last installment in this series of working with objects in SQL Server Data Services.  For background, readers should read the following:

Serialization in SSDS

Working with Objects in SSDS Part 1

Working with Objects in SSDS Part 2

Last time, we concluded with a class called SsdsEntity<T> that became an all-purpose wrapper or veneer around our CLR objects.  This made it simple to take our existing classes and serialize them as entities in SSDS.

In this post, I want to discuss how the querying in the REST library works.  First a simple example:

var ctx = new SsdsContext(
    "authority=http://dunnry.data.beta.mssds.com/v1/;username=dunnry;password=secret"
    );

var container = ctx.OpenContainer("foo");
var foo = new Foo { IsPublic = false, Name = "MyFoo", Size = 12 };

//insert it with unique id guid string
container.Insert(foo, Guid.NewGuid().ToString());

//now query for it
var results = container.Query<Foo>(e => e.Entity.IsPublic == false && e.Entity.Size > 2);

//Query<T> returns IEnumerable<SsdsEntity<T>>, so foreach over it
foreach (var item in results)
{
    Console.WriteLine(item.Entity.Name);
}

I glossed over it in my previous posts with this library, but I have a class called SsdsContext that acts as my credential store and factory to create SsdsContainer objects where I perform my operations.  Here, I have opened a container called 'foo', which would relate to the URI (http://dunnry.data.beta.mssds.com/v1/foo) according to the authority name I passed on the SsdsContext constructor arguments.

I created an instance of my Foo class (see this post if you want to see what a Foo looks like) and inserted it.  We know that under the covers we have an XmlSerializer doing the work to serialize that to the proper POX wire format.  So far, so good.  Now, I want to retrieve that same entity back from SSDS.  The key line here is the container.Query<T>() call.  It accepts an Expression<Func<SsdsEntity<T>, bool>> argument that represents a strongly typed query.

For the uninitiated, the Expression<TDelegate> is a way to represent lambda expressions in an abstract syntax tree.  We can think of them as a way to model what the expression does without generating the bits of code necessary to actually do it.  We can inspect the Expression and create new ones based on it until finally we can call Compile and actually convert the representation of the lambda into something that can execute.
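A tiny standalone example of the distinction (nothing SSDS-specific here):

```csharp
using System;
using System.Linq.Expressions;

class ExpressionDemo
{
    static void Main()
    {
        // The lambda is captured as data: a tree we can inspect...
        Expression<Func<int, bool>> expr = x => x > 2;
        Console.WriteLine(expr.Body.NodeType);   // GreaterThan

        // ...or compile into a real delegate and execute.
        Func<int, bool> func = expr.Compile();
        Console.WriteLine(func(5));              // True
    }
}
```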

The Func<SsdsEntity<T>, bool> represents a delegate that accepts a SsdsEntity<T> as an argument and returns a boolean.  This effectively represents the WHERE clause in the SSDS LINQ query syntax.  Since SsdsEntity<T> contains an actual type T in the Entity property, you can query directly against it in a strongly typed fashion!

What about those flexible properties that I added to support flexible attributes outside of our T?  I mentioned that I wanted to keep the PropertyBucket (a Dictionary<string, object>) property public for querying.  In order to use the flexible properties that you add, you simply use it in a weakly typed manner:

var results = container.Query<Foo>(e => e.PropertyBucket["MyFlexProp"] > 10);

As you can see, any boolean expression that you can think of in the string-based SSDS LINQ query syntax can now be expressed in a strongly-typed manner using the Func<SsdsEntity<T>, bool> lambda syntax.

How it works

Since I have the expression tree of what your query looks like in strongly-typed terms, it is a simple matter to take that and convert it to the SSDS LINQ query syntax that looks like "from e in entities where [....] select e", which is appended to the query string in the REST interface.  I should say it is a simple matter because Matt Warren did a lot of the heavy lifting for us and provided the abstract expression visitor (ExpressionVisitor) as well as the expression visitor that partially evaluates the tree to resolve constants (SubTreeEvaluator).  This last part is important because it allows us to write this:

int i = 10;
string name = "MyFoo";

var results = container.Query<Foo>(e => e.Entity.Name == name && e.Entity.Size > i);

Without the partial tree evaluation, you would not be able to express the right-hand side of those expressions.  All I had to do was implement an expression visitor that correctly evaluates the lambda expression and converts it to the LINQ syntax that SSDS expects (SsdsExpressionVisitor).  It would be a trivial matter to actually implement the IQueryProvider and IQueryable interfaces to make the whole thing work inside LINQ to Objects.

Originally, I did supply the IQueryProvider for this implementation but after consideration I have decided that using methods from the SsdsContainer class instead of the standard LINQ syntax is the best way to proceed.  Mainly, this has to do with the fact that I want to make it more explicit to the developer what will happen under the covers rather than using the standard Where() extension method.

Querying data

The main interaction to return data is via the Query<T> method.  This method is smart enough to add the Kind into the query for you based on the T supplied.  So, if you write something like:

var results = container.Query<Foo>(e => e.Entity.Size > 2);

This is actually translated to "from e in entities where e["Size"] > 2 && e.Kind == "Foo" select e".  The addition of the kind is important because we want to limit the results as much as possible.  If there happened to be many kinds in the container that had the flexible property "Size", it would actually return those as well in the wire response.
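The kind-appending step can be sketched like this (QueryBuilder is my hypothetical illustration; in the library this happens inside the expression visitor):

```csharp
using System;

class Foo { }

static class QueryBuilder
{
    // ANDs a Kind check for T onto the WHERE fragment that the
    // expression visitor produced, so only entities of that kind
    // come back over the wire.
    public static string WithKind<T>(string whereClause)
    {
        return String.Format(
            "from e in entities where ({0}) && e.Kind == \"{1}\" select e",
            whereClause, typeof(T).Name);
    }
}
```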

Of course, what if you want that to happen?  What if you want to return other kinds that have the "Size" property?  To do this, I have introduced a class called SsdsEntityBucket.  It is exactly what it sounds like.  To use it, you simply specify a query that uses additional types with either the Query<T,U,V> or Query<T,U> methods.  Here is an example:

var foo = new Foo
{
    IsPublic = true,
    MyCheese = new Cheese { LastModified = DateTime.Now, Name = "MyCheese" },
    Name = "FooMaster",
    Size = 10
};

container.Insert(foo, foo.Name);
container.Insert(foo.MyCheese, foo.MyCheese.Name);

//query for bucket...
var bucket = container.Query<Foo, Cheese>(
    (f, c) => f.Entity.Name == "FooMaster" || c.Entity.Name == "MyCheese"
    );

var f1 = bucket.GetEntities<Foo>().Single();
var c1 = bucket.GetEntities<Cheese>().Single();

The calls to GetEntities<T> return IEnumerable<SsdsEntity<T>> again.  However, this was done in a single call to SSDS instead of multiple calls per T.
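Conceptually, the bucket is not much more than this sketch (EntityBucket is my stand-in; the real SsdsEntityBucket also tracks the SsdsEntity wrappers):

```csharp
using System.Collections.Generic;
using System.Linq;

// One wire call brings back mixed kinds; GetEntities<T> then
// filters out just the Ts from the combined result set.
class EntityBucket
{
    private readonly List<object> _items = new List<object>();

    public void Add(object entity)
    {
        _items.Add(entity);
    }

    public IEnumerable<T> GetEntities<T>()
    {
        return _items.OfType<T>();
    }
}
```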

Paging

As I mentioned earlier, I wanted the developer to understand what they were doing when they called each method, so I decided to make paging explicit.  If I had potentially millions of entities in SSDS, it would be a bad mistake to allow a developer to issue a simple query that seamlessly paged the items back - especially if the query was something like e => e.Id != "".  Here is how I handled paging:

var container = ctx.OpenContainer("paging");

List<Foo> items = new List<Foo>();
int i = 1;

container.PagedQuery<Foo>(
    e => e.Entity.Size != 0,
    c =>
    {
        Console.WriteLine("Got Page {0}", i++);
        items.AddRange(c.Select(s => s.Entity));
    }
);

Console.WriteLine(items.Count);

The PagedQuery<T> method takes two arguments.  One is the standard Expression<Func<SsdsEntity<T>, bool>> that you use to specify the WHERE clause for SSDS, and the other is Action<IEnumerable<SsdsEntity<T>>> which represents a delegate that takes an IEnumerable<SsdsEntity<T>> and has a void return.  This is a delegate you provide that does something with the 500 entities returned per page (it gets called once per page).  Here, I am just adding them into a List<T>, but I could easily be doing anything else here.  Under the covers, this is adding the paging term dynamically into the expression tree that is evaluated.
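Stripped of the expression-tree rewriting, the paging loop itself looks something like this sketch (Pager and fetchPage are my illustration; the real code fetches each page from SSDS):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class Pager
{
    const int PageSize = 500;

    // Calls fetchPage(pageIndex) until a short or empty page comes back,
    // handing each non-empty page to the caller's action - the same
    // shape as PagedQuery<T>.
    public static void PagedQuery<T>(Func<int, IEnumerable<T>> fetchPage,
                                     Action<IEnumerable<T>> perPage)
    {
        int page = 0;
        while (true)
        {
            var items = fetchPage(page++).ToList();
            if (items.Count == 0) break;
            perPage(items);
            if (items.Count < PageSize) break;
        }
    }
}
```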

What's next

This is a good head start on using the REST API with SSDS today.  However, there are a number of optimizations that could be made to the model: additional overloads, perhaps some extension methods for common operations, etc.

As new features are added, I will endeavor to update this as well (blob support comes to mind here).  Additionally, I have a few optimizations planned around concurrency for CRUD operations. 

I have published this to Code Gallery and I welcome feedback and bug fixes.  Linked here.

Thursday, 26 June 2008

Working with Objects in SSDS Part 2

This is the second post in my series on working with SQL Server Data Services (SSDS) and objects.  For background, you should read my post on Serializing Objects in SSDS and the first post in this series.

Last time I showed how to create a general purpose serializer for SSDS using the standard XmlSerializer class in .NET.  I created a shell entity or a 'thin veneer' for objects called SsdsEntity<T>, where T was any POCO (plain old C#/CLR object).  This allowed me to abstract away the metadata properties required for SSDS without changing my actual POCO object (which, I noted was lame to do).

If we decide that we will use SSDS to interact with POCO T, an interesting situation arises.  Namely, once we have defined T, we have in fact defined a schema - albeit one enforced only in code you write and not by the SSDS service itself.  One of the advantages of using something like SSDS is that you have a lot of flexibility in storing entities (hence the term 'flexible entity') without conforming to a schema.  Since I want to support this flexibility, I need a way to support not only the schema implied by T, but also the additional and arbitrary properties that a user might consider.

Some may wonder why we need this flexibility:  after all, why not just change T to support whatever we like?  The issue comes up most often with code you do not control.  If you already have an existing codebase with objects that you would like to store in SSDS, it might not be practical or even possible to change the T to add additional schema.

Even if you completely control the codebase, expressing relationships between CLR objects and expressing relationships between things in your data are two different ideas - sometimes this problem has been termed 'impedance mismatch'.

In the CLR, if two objects are related, they are often part of a collection, or they refer to an instance on another object.  This is easy to express in the CLR (e.g. Instance.ChildrenCollection["key"]).  In your typical datasource, this same relationship is done using foreign keys to refer to other entities.

Consider the following classes:

public class Employee
{
    public string EmployeeId { get; set; }
    public string Name { get; set; }
    public DateTime HireDate { get; set; }
    public Employee Manager { get; set; }
    public Project[] Projects { get; set; }
}

public class Project
{
    public string ProjectId { get; set; }
    public string Name { get; set; }
    public string BillCode { get; set; }
}

Here we see that the Employee class refers to itself as well as contains a collection of related projects (Project class) that the employee works on.  SSDS only supports simple scalar types and no arrays or nested objects today, so we cannot directly express this in SSDS.  However, we can decompose this class and store the bits separately and then reassemble later.  First, let's see what that looks like and then we can see how it was done:

var projects = new Project[]
{
    new Project { BillCode = "123", Name = "TPS Slave", ProjectId = "PID01"},
    new Project { BillCode = "124", Name = "Programmer", ProjectId = "PID02" }
};

var bill = new Employee
{
    EmployeeId = "EMP01",
    HireDate = DateTime.Now.AddMonths(-1),
    Manager = null,
    Name = "Bill Lumbergh",
    Projects = new Project[] {}
};

var peter  = new Employee
{
    EmployeeId = "EMP02",
    HireDate = DateTime.Now,
    Manager = bill,
    Name = "Peter Gibbons",
    Projects = projects
};

var cloudpeter = new SsdsEntity<Employee>
{
    Entity = peter,
    Id = peter.EmployeeId
};

var cloudbill = new SsdsEntity<Employee>
{
    Entity = bill,
    Id = bill.EmployeeId
};

//here is how we add flexible props
cloudpeter.Add<string>("ManagerId", peter.Manager.EmployeeId);

var table = _context.OpenContainer("initech");
table.Insert(cloudpeter);
table.Insert(cloudbill);

var cloudprojects = peter.Projects
    .Select(s => new SsdsEntity<Project>
    { 
        Entity = s,
        Id = Guid.NewGuid().ToString()
    });

//add some metadata to track the project to employee
foreach (var proj in cloudprojects)
{
    proj.Add<string>("RelatedEmployee", peter.EmployeeId);
    table.Insert(proj);
}

All this code does is create two employees and two projects and set the relationships between them.  Using the Add<K> method, I can insert any primitive type to go along for the ride with the POCO.  If we query the container now, this is what we see:

<s:EntitySet 
    xmlns:s="http://schemas.microsoft.com/sitka/2008/03/" 
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
    xmlns:x="http://www.w3.org/2001/XMLSchema">
  <Project>
    <s:Id>2ffd7a92-2a3b-4cd8-a5f7-55f40c3ba2b0</s:Id>
    <s:Version>1</s:Version>
    <ProjectId xsi:type="x:string">PID01</ProjectId>
    <Name xsi:type="x:string">TPS Slave</Name>
    <BillCode xsi:type="x:string">123</BillCode>
    <RelatedEmployee xsi:type="x:string">EMP02</RelatedEmployee>
  </Project>
  <Project>
    <s:Id>892dbb1e-ba47-4c87-80e6-64fbb46da935</s:Id>
    <s:Version>1</s:Version>
    <ProjectId xsi:type="x:string">PID02</ProjectId>
    <Name xsi:type="x:string">Programmer</Name>
    <BillCode xsi:type="x:string">124</BillCode>
    <RelatedEmployee xsi:type="x:string">EMP02</RelatedEmployee>
  </Project>
  <Employee>
    <s:Id>EMP01</s:Id>
    <s:Version>1</s:Version>
    <EmployeeId xsi:type="x:string">EMP01</EmployeeId>
    <Name xsi:type="x:string">Bill Lumbergh</Name>
    <HireDate xsi:type="x:dateTime">2008-05-25T23:59:49</HireDate>
  </Employee>
  <Employee>
    <s:Id>EMP02</s:Id>
    <s:Version>1</s:Version>
    <EmployeeId xsi:type="x:string">EMP02</EmployeeId>
    <Name xsi:type="x:string">Peter Gibbons</Name>
    <HireDate xsi:type="x:dateTime">2008-06-25T23:59:49</HireDate>
    <ManagerId xsi:type="x:string">EMP01</ManagerId>
  </Employee>
</s:EntitySet>

As you can see, I have stored extra data in my 'flexible' entity with the ManagerId property (on one entity) and the RelatedEmployee property on the Project kinds.  This allows me to figure out later which objects are related to each other, since we can't model the CLR objects' relationships directly.  Let's see how this was done.

public class SsdsEntity<T> where T: class
{
    Dictionary<string, object> _propertyBucket = new Dictionary<string, object>();

    public SsdsEntity() { }

    [XmlIgnore]
    public Dictionary<string, object> PropertyBucket
    {
        get { return _propertyBucket; }
    }

    [XmlAnyElement]
    public XElement[] Attributes
    {
        get
        {
            //using XElement is much easier than XmlElement to build
            //take all properties on object instance and build XElement
            var props =  from prop in typeof(T).GetProperties()
                         let val = prop.GetValue(this.Entity, null)
                         where prop.GetSetMethod() != null
                         && allowableTypes.Contains(prop.PropertyType) 
                         && val != null
                         select new XElement(prop.Name,
                             new XAttribute(Constants.xsi + "type",
                                 XsdTypeResolver.Solve(prop.PropertyType)),
                             EncodeValue(val)
                             );

            //Then stuff in any extra stuff you want
            var extra = _propertyBucket.Select(
                e =>
                     new XElement(e.Key,
                        new XAttribute(Constants.xsi + "type",
                             XsdTypeResolver.Solve(e.Value.GetType())),
                            EncodeValue(e.Value)
                            )
                );

            return props.Union(extra).ToArray();
        }
        set
        {
            //wrap the XElement[] with the name of the type
            var xml = new XElement(typeof(T).Name, value);

            var xs = new XmlSerializer(typeof(T));

            //xml.CreateReader() cannot be used as it won't support base64 content
            XmlTextReader reader = new XmlTextReader(
                xml.ToString(),
                XmlNodeType.Document,
                null
                );

            this.Entity = (T)xs.Deserialize(reader);

            //now deserialize the other stuff left over into the property bucket...
            var stuff = from v in value.AsEnumerable()
                        let props = typeof(T).GetProperties().Select(s => s.Name)
                        where !props.Contains(v.Name.ToString())
                        select v;

            foreach (var item in stuff)
            {
                _propertyBucket.Add(
                    item.Name.ToString(),
                    DecodeValue(
                        item.Attribute(Constants.xsi + "type").Value,
                        item.Value)
                    );
            }
        }
    }

    public void Add<K>(string key, K value)
    {
        if (!allowableTypes.Contains(typeof(K)))
        throw new ArgumentException(
            String.Format(
                "Type {0} not supported in SsdsEntity",
                typeof(K).Name)
            );

        if (!_propertyBucket.ContainsKey(key))
        {
            _propertyBucket.Add(key, value);
        }
        else
        {
            //replace the value
            _propertyBucket.Remove(key);
            _propertyBucket.Add(key, value);
        }
    }
}

I have omitted the parts of SsdsEntity<T> from the first post that didn't change.  The only other addition you don't see here is a helper method called DecodeValue, which, as you might guess, interprets the string value in the XML and attempts to cast it to a CLR type based on the xsi:type that comes back.

All we did here was add a Dictionary<string, object> property called PropertyBucket that holds the extra data we want to associate with our T instance.  Then, in the getter and setter for the XElement[] property called Attributes, we add the extra values into our array of XElement on serialization, and pull them back out and stuff them into the Dictionary on deserialization.  With this simple addition, we have fixed our flexibility (or lack thereof) problem.  We are still limited to the simple scalar types, but you can work around this in a lot of cases by decomposing the objects down enough to be able to recreate them later.

The Add<K> method is a convenience only, as we could operate directly against the Dictionary.  I also could have chosen to keep the Dictionary property bucket private and not expose it.  That would have worked just fine for serialization, but I wanted to be able to query it later as well.
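To make this concrete, here is how the property bucket might be used.  The Employee class here is just a hypothetical POCO for illustration:

```csharp
// Hypothetical POCO for illustration
public class Employee
{
    public string Name { get; set; }
    public decimal Salary { get; set; }
}

// ... elsewhere ...
var emp = new SsdsEntity<Employee>
{
    Id = "emp-1",
    Entity = new Employee { Name = "Jane", Salary = 50000m }
};

// stash extra, schema-less data alongside the strongly typed properties
emp.Add<string>("ManagerId", "emp-0");
emp.Add<DateTime>("HireDate", new DateTime(2008, 6, 1));

// after deserialization, the extra data comes back out of the bucket
string managerId = (string)emp.PropertyBucket["ManagerId"];
```

When the Attributes getter runs, both the strongly typed properties and the bucketed values end up as sibling XElements in the serialized entity.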

In my last post, I said I would introduce a library where all this code is coming from, but I didn't realize at the time how long this post would be and that I still need to cover querying.  So... next time, I will finish up this series by explaining how the strongly typed query model works and how all these pieces fit together to recompose the data back into objects (and release the library).

Tuesday, 17 June 2008

Working with Objects in SSDS Part 1

Last time we talked about SQL Server Data Services and serializing objects, we discussed how easy it was to use the XmlSerializer to deserialize objects using the REST interface.  The problem was that when we serialized objects using the XmlSerializer, it left out the xsi:type declarations that we needed.  I gave two possible solutions to this problem - one that used the XmlSerializer and 'fixed' the output after the fact, and the other that built the XML we needed using XLINQ and Reflection.

Today, I am going to talk about a third technique that I have been using lately that I like better.  It uses some of the previous techniques and leverages a few tricks with XmlSerializer to get what I want.  First, let's start with a POCO (plain ol' C# object) class that we would like to use with SSDS.

public class Foo
{
    public string Name { get; set; }
    public int Size { get; set; }
    public bool IsPublic { get; set; }
}

In its correctly serialized form, it looks like this on the wire:

<Foo xmlns:s="http://schemas.microsoft.com/sitka/2008/03/"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns:x="http://www.w3.org/2001/XMLSchema">
  <s:Id>someid</s:Id>
  <s:Version>1</s:Version>
  <Name xsi:type="x:string">My Foo</Name>
  <Size xsi:type="x:decimal">10</Size>
  <IsPublic xsi:type="x:boolean">false</IsPublic>
</Foo>

You'll notice that we have the additional system metadata attributes "Id" and "Version" in the markup.  We can account for the metadata attributes by doing something cheesy like deriving from a base class:

public abstract class Cheese
{
    public string Id { get; set; }
    public int Version { get; set; }
}

However, this is very unnatural, as our classes would all have to derive from our "Cheese" abstract base class (ABC).

public class Foo : Cheese
{
    public string Name { get; set; }
    public int Size { get; set; }
    public bool IsPublic { get; set; }
}

Developers familiar with remoting in .NET should be cringing right now as they remember the hassles associated with deriving from MarshalByRefObject.  In a world without multiple inheritance, this can be painful.  I want a model where I can use arbitrary POCO objects (redundant, yes I know) and not be forced to derive from anything or do what I would otherwise term unnatural acts.

What if, instead, we created a generic entity that could wrap any other entity?

public class SsdsEntity<T> where T: class
{
    string _kind;

    public SsdsEntity() { }

    [XmlElement(Namespace = @"http://schemas.microsoft.com/sitka/2008/03/")]
    public string Id { get; set; }

    [XmlIgnore]
    public string Kind
    {
        get
        {
            if (String.IsNullOrEmpty(_kind))
            {
                _kind = typeof(T).Name;
            }
            return _kind;
        }
        set
        {
            _kind = value;
        }
    }

    [XmlElement(Namespace = @"http://schemas.microsoft.com/sitka/2008/03/")]
    public int Version { get; set; }

    [XmlIgnore]
    public T Entity { get; set; }
}

In this case, we have simply wrapped the POCO that we care about in a class that knows about the specifics of the SSDS wire format (or, more accurately, can serialize down to the wire format).

This SsdsEntity<T> is easy to use and provides access to the strongly typed object via the Entity property.


Now, we just have to figure out how to serialize the SsdsEntity<Foo> object and we know that the metadata attributes are taken care of and our original POCO object that we care about is included.  I call it wrapping POCOs in a thin SSDS veneer.
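Using the wrapper is as simple as newing one up and setting the Entity property:

```csharp
var foo = new SsdsEntity<Foo>
{
    Id = "someid",
    Entity = new Foo { Name = "My Foo", Size = 10, IsPublic = false }
};

// Kind defaults to the type name, "Foo", which becomes the element name
Console.WriteLine(foo.Kind);
```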

The trick to this is to add a bucket of XElement objects on the SsdsEntity<T> class that will hold our public properties on our class T (i.e. 'Foo' class).  It looks something like this:

[XmlAnyElement]
public XElement[] Attributes
{
    get
    {
        //using XElement is much easier than XmlElement to build
        //take all properties on object instance and build XElement
        var props =  from prop in typeof(T).GetProperties()
                     let val = prop.GetValue(this.Entity, null)
                     where prop.GetSetMethod() != null
                     && allowableTypes.Contains(prop.PropertyType)
                     && val != null
                     select new XElement(prop.Name,
                         new XAttribute(Constants.xsi + "type",
                            XsdTypeResolver.Solve(prop.PropertyType)),
                         EncodeValue(val)
                         );

        return props.ToArray();
    }
    set
    {
        //wrap the XElement[] with the name of the type
        var xml = new XElement(typeof(T).Name, value);

        var xs = new XmlSerializer(typeof(T));

        //xml.CreateReader() cannot be used as it won't support base64 content
        XmlTextReader reader = new XmlTextReader(
            xml.ToString(),
            XmlNodeType.Document,
            null);

        this.Entity = (T)xs.Deserialize(reader);
    }
}

In the getter, we use Reflection and pull back a list of all the public properties on the T object and build an array of XElement.  This is the same technique I used in my first post on serialization.  The 'allowableTypes' object is a HashSet<Type> that we use to figure out which property types we can support in the service (DateTime, numeric, string, boolean, and byte[]).  When this property serializes, the XElements are simply added to the markup.

The EncodeValue method shown is a simple helper method that correctly encodes string, boolean, date, integer, and byte[] values for the element content.  Finally, we are using a helper method that looks up the correct xsi:type for the required attribute (as determined from the property type) in a Dictionary<Type, string>.
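As a rough idea of what EncodeValue does - this is only a sketch, and the actual implementation in the library may differ - it special-cases the types that need non-default formatting:

```csharp
// Sketch of an EncodeValue helper; the real one may differ in details.
static string EncodeValue(object value)
{
    // byte[] must go over the wire as base64
    if (value is byte[])
        return Convert.ToBase64String((byte[])value);

    // dates need an XML-friendly dateTime format
    if (value is DateTime)
        return XmlConvert.ToString(
            (DateTime)value,
            XmlDateTimeSerializationMode.RoundtripKind);

    // booleans must be "true"/"false", not "True"/"False"
    if (value is bool)
        return XmlConvert.ToString((bool)value);

    // strings and numerics are fine with invariant formatting
    return Convert.ToString(value, CultureInfo.InvariantCulture);
}
```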

For deserialization, what happens is that the [XmlAnyElement] attribute causes all unmapped attributes (in this case, all non-system metadata attributes) to be collected in a collection of XElement.  When we deserialize, if we simply wrap an enclosing element around this XElement collection, it is exactly what we need for deserialization of T.  This is shown in the setter implementation.

It might look a little complicated, but now simple serialization will just work via the XmlSerializer.  Here is one such implementation:

public string Serialize(SsdsEntity<T> entity)
{
    //add a bunch of namespaces and override the default ones too
    XmlSerializerNamespaces namespaces = new XmlSerializerNamespaces();
    namespaces.Add("s", Constants.ns.NamespaceName);
    namespaces.Add("x", Constants.x.NamespaceName);
    namespaces.Add("xsi", Constants.xsi.NamespaceName);

    var xs = new XmlSerializer(
        entity.GetType(),
        new XmlRootAttribute(typeof(T).Name)
        );

    XmlWriterSettings xws = new XmlWriterSettings();
    xws.Indent = true;
    xws.OmitXmlDeclaration = true;

    using (var ms = new MemoryStream())
    {
        using (XmlWriter writer = XmlWriter.Create(ms, xws))
        {
            xs.Serialize(writer, entity, namespaces);
            ms.Position = 0; //reset to beginning

            using (var sr = new StreamReader(ms))
            {
                return sr.ReadToEnd();
            }
        }
    }
}

Deserialization is even easier since we are starting with the XML representation and don't have to build a Stream in memory.

public SsdsEntity<T> Deserialize(XElement node)
{
    var xs = new XmlSerializer(
        typeof(SsdsEntity<T>),
        new XmlRootAttribute(typeof(T).Name)
        );

    //xml.CreateReader() cannot be used as it won't support base64 content
    XmlTextReader reader = new XmlTextReader(
        node.ToString(),
        XmlNodeType.Document,
        null);
    
    return (SsdsEntity<T>)xs.Deserialize(reader);
}

If you notice, I am using an XmlTextReader to pass to the XmlSerializer.  Unfortunately, the XmlReader from XLINQ does not support handling of base64 content, so this workaround is necessary.
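Pulling it together, rehydrating an entity from a GET response might look like this (a hypothetical usage, assuming responseXml holds the response body and the Deserialize method above lives on our serializer):

```csharp
// responseXml is the body returned by a GET against the container
var entitySet = XElement.Parse(responseXml);

// each child of s:EntitySet is one flex entity; grab the first Foo
var node = entitySet.Element("Foo");

SsdsEntity<Foo> foo = Deserialize(node);
Console.WriteLine(foo.Entity.Name);
```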

At this point, we have a working serializer/deserializer that can handle arbitrary POCOs.  There are some limitations of course:

  • We are limited to the same datatypes that SSDS supports.  This also means nested objects and arrays are not directly supported.
  • We have lost a little of the 'flexible' in the Flexible Entity (the E in the ACE model).  We now have a rigid schema defined by SSDS metadata and T public properties and enforced on our objects.

In my next post, I will attempt to address some of those limitations and I will introduce a library that handles most of this for you.

Wednesday, 28 May 2008

Serialization in SSDS

SQL Server Data Services returns data in POX (plain ol' XML) format.  If you look carefully at the way the data is returned, you can see that individual flex entities look somewhat familiar to what is produced from the XmlSerializer.  I say 'somewhat' because we have the data wrapped in this 'EntitySet' tag.

<s:EntitySet 
   xmlns:s="http://schemas.microsoft.com/sitka/2008/03/"
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xmlns:x="http://www.w3.org/2001/XMLSchema">
  <PictureTag>
    <s:Id>1e803f90-e5e5-4524-9e3c-3ba960be9494</s:Id>
    <s:Version>1</s:Version>
    <PictureId xsi:type="x:string">3a1714bc-8771-4f6c-8d16-93238f126d9f</PictureId>
    <TagId xsi:type="x:string">ab696b85-1bdc-4bed-8824-dfbf9b67b5cc</TagId>
  </PictureTag>
</s:EntitySet>

I am using the PictureTag from the PhluffyFotos sample application, but this could be any flexible entity.  If we extract the PictureTag element and children from the surrounding EntitySet, we can very easily deserialize this into a class.

Given a class 'PictureTag':

public class PictureTag
{
    [XmlElement(Namespace="http://schemas.microsoft.com/sitka/2008/03/")]
    public string Id { get; set; }
    [XmlElement(Namespace = "http://schemas.microsoft.com/sitka/2008/03/")]
    public int Version { get; set; }
    public string PictureId { get; set; }
    public string TagId { get; set; }
}

We can deserialize this class in just 3 lines of code:

string xml = @"<s:EntitySet 
               xmlns:s=""http://schemas.microsoft.com/sitka/2008/03/""
               xmlns:xsi=""http://www.w3.org/2001/XMLSchema-instance""
               xmlns:x=""http://www.w3.org/2001/XMLSchema"">
              <PictureTag>
                <s:Id>1e803f90-e5e5-4524-9e3c-3ba960be9494</s:Id>
                <s:Version>1</s:Version>
                <PictureId xsi:type=""x:string"">3a1714bc-8771-4f6c-8d16-93238f126d9f</PictureId>
                <TagId xsi:type=""x:string"">ab696b85-1bdc-4bed-8824-dfbf9b67b5cc</TagId>
              </PictureTag>
            </s:EntitySet>";

var xmlTag = XElement.Parse(xml).Element("PictureTag");

XmlSerializer xs = new XmlSerializer(typeof(PictureTag));
var tag = (PictureTag)xs.Deserialize(xmlTag.CreateReader());

Now, the 'tag' variable is a PictureTag instance.  As you can see, deserialization is a snap.  What about serialization, however?

If I reverse the process using the following code, you will notice that something has changed:

using (var ms = new MemoryStream())
{
    //add a bunch of namespaces and override the default ones too
    XmlSerializerNamespaces namespaces = new XmlSerializerNamespaces();
    namespaces.Add("s", @"http://schemas.microsoft.com/sitka/2008/03/");
    namespaces.Add("x", @"http://www.w3.org/2001/XMLSchema");
    namespaces.Add("xsi", @"http://www.w3.org/2001/XMLSchema-instance");

    XmlWriterSettings xws = new XmlWriterSettings();
    xws.Indent = true;
    xws.OmitXmlDeclaration = true;

    using (XmlWriter writer = XmlWriter.Create(ms, xws))
    {
        xs.Serialize(writer, tag, namespaces);
        ms.Position = 0; //reset to beginning

        using (var sr = new StreamReader(ms))
        {
            xmlTag = XElement.Parse(sr.ReadToEnd());
        }
    }
}

If I look in the 'xmlTag' XElement, I get somewhat different XML back:

<PictureTag
  xmlns:s="http://schemas.microsoft.com/sitka/2008/03/"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns:x="http://www.w3.org/2001/XMLSchema">
  <s:Id>1e803f90-e5e5-4524-9e3c-3ba960be9494</s:Id>
  <s:Version>1</s:Version>
  <PictureId>3a1714bc-8771-4f6c-8d16-93238f126d9f</PictureId>
  <TagId>ab696b85-1bdc-4bed-8824-dfbf9b67b5cc</TagId>
</PictureTag>

I lost the 'xsi:type' attributes that I need in order to signal to SSDS how to treat the type.  Bummer.

We can manually add the attributes (fix-up) after the serialization.  Let's see how that would work:

XNamespace xsi = @"http://www.w3.org/2001/XMLSchema-instance";
XNamespace ns = @"http://schemas.microsoft.com/sitka/2008/03/";

//xmlTag is the XElement holding our serialized PictureTag
var nodes = xmlTag.Descendants();
foreach (var node in nodes)
{
    if (node.Name != (ns + "Id") && node.Name != (ns + "Version"))
    {
        node.Add(
            new XAttribute(
                xsi + "type",
                GetAttributeType(node.Name.LocalName.ToString(), typeof(PictureTag))
                )
            );
    }
}

We need to loop through each node and set the 'xsi:type' attribute appropriately.  Here is my quick and dirty implementation:

static Dictionary<Type, string> xsdTypes = new Dictionary<Type, string>()
{
    {typeof(string), "x:string"},
    {typeof(int), "x:decimal"},
    {typeof(long), "x:decimal"},
    {typeof(float), "x:decimal"},
    {typeof(decimal), "x:decimal"},
    {typeof(short), "x:decimal"},
    {typeof(DateTime), "x:dateTime"},
    {typeof(bool), "x:boolean"},
    {typeof(byte[]), "x:base64Binary"}
};

private static string GetAttributeType(string name, Type type)
{
    var prop = type.GetProperty(name);

    if (prop != null)
    {
        if (xsdTypes.ContainsKey(prop.PropertyType))
            return xsdTypes[prop.PropertyType];
    }

    return xsdTypes[typeof(string)];
}

When all is said and done, I am back to what I need:

<PictureTag
   xmlns:s="http://schemas.microsoft.com/sitka/2008/03/"
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xmlns:x="http://www.w3.org/2001/XMLSchema">
  <s:Id>1e803f90-e5e5-4524-9e3c-3ba960be9494</s:Id>
  <s:Version>1</s:Version>
  <PictureId xsi:type="x:string">3a1714bc-8771-4f6c-8d16-93238f126d9f</PictureId>
  <TagId xsi:type="x:string">ab696b85-1bdc-4bed-8824-dfbf9b67b5cc</TagId>
</PictureTag>

However, I am not sure I really like this technique.  It seems that if I am going to use Reflection to 'fix up' the XML from the XmlSerializer, I might as well just use it to build the entire thing.  With that in mind, here is the next implementation of SSDS serialization:

public static XElement CreateEntity<T>(T instance, string id) where T : class, new()
{
    XNamespace ns = @"http://schemas.microsoft.com/sitka/2008/03/";
    XNamespace xsi = @"http://www.w3.org/2001/XMLSchema-instance";
    XNamespace x = @"http://www.w3.org/2001/XMLSchema";

    if (instance == null)
        return null;

    if (String.IsNullOrEmpty(id))
        throw new ArgumentNullException("id");

    Type type = typeof(T);

    // Create an element for each non-system, non-binary property on the class
    var properties =
        from p in type.GetProperties()
        where xsdTypes.ContainsKey(p.PropertyType) &&
              p.Name != "Id" &&
              p.Name != "Version" &&
              !p.PropertyType.Equals(typeof(byte[]))
        select new XElement(p.Name,
                   new XAttribute(xsi + "type", xsdTypes[p.PropertyType]),
                   p.GetValue(instance, null)
               );

    // Binary properties are special, since they must be serialized as Base-64
    var binaryProperties =
        from p in type.GetProperties()
        where p.PropertyType.Equals(typeof(byte[])) && (p.GetValue(instance, null) != null)
        select new XElement(p.Name,
                   new XAttribute(xsi + "type", xsdTypes[p.PropertyType]),
                   Convert.ToBase64String((byte[])p.GetValue(instance, null))
               );

    // Construct the Xml
    var xml = new XElement(type.Name,
        new XElement(ns + "Id", id), //here is the Id element
        new XAttribute(XNamespace.Xmlns + "s", ns),
        new XAttribute(XNamespace.Xmlns + "xsi", xsi),
        new XAttribute(XNamespace.Xmlns + "x", x),
        properties,
        binaryProperties
        );

    return xml;
}

In this case, we are using Reflection to build a list of the properties on the object and, depending on the type (byte[] is special), we build the XElements ourselves and assemble the entity by hand.  We can use it like this:

XElement entity = CreateEntity<PictureTag>(tag, tag.Id);

Of course, there are a number of other techniques that I am not covering in this already very long post.  Perhaps in my next post we will look at a few others.

Wednesday, 07 May 2008

The Business Value of SQL Server Data Services

In my second video of a planned series of videos (number yet to be determined), I interviewed Tudor Toma, Group Program Manager and Soumitra Sengupta, SSDS Architect about the business value of SSDS.

Watch the Video

One point that I want to emphasize is that from a business perspective, SSDS tackles two of the biggest pain points that companies have to deal with today when creating new IT systems:  capital expenditures (CapEx) and operational expenditures (OpEx).  Beyond the dollar cost of creating the software, you need to think about how much hardware or hosting will cost.  You have to plan capacity and guess the load the system will generate.  Next, you have to figure out how many support personnel will be required to maintain the system.  Someone has to maintain the software, repair machines when they go down, defrag, patch, update, etc.

One of the greatest business values (as opposed to technical values) you get from SSDS (and cloud services in general) is the ability to redeploy the capital to other resources.  In the case of CapEx, you can deploy this perhaps to marketing or content creation.  In the case of OpEx, those expenses can be redeployed to more useful tasks in the enterprise (creating new systems, upgrading, and expanding operations perhaps?).  While there are opportunities to just save this money, I think more realistically the money is going to get spent on the other things that you wished you had time and resources for.

I think the value is easy to understand from a startup's perspective.  But even enterprise users should be thinking about this.  Big-budget IT programs could easily be switched out to cloud services at a fraction of the cost of building out the support and infrastructure needed.  Money that would otherwise be spent on disaster recovery (you do DR planning, right?) can be re-purposed or saved.  The days of multi-million dollar new IT investments are numbered in a lot of cases, in my opinion.  I can still see a few cases where they would be necessary, depending on a company's core value proposition, but for the majority, it just doesn't make sense.

Wednesday, 09 April 2008

PhluffyFotos Sample Available

I just posted the first version of PhluffyFotos, our SQL Server Data Services (SSDS) sample app to CodePlex.  PhluffyFotos is a photo sharing site that allows users to upload photos and metadata (tags, description) to SSDS for storage.  As the service gets more features and is updated, the sample will be rev'd as well.

Points of interest that will likely also be blog posts in themselves:

  • This sample has a LINQ-to-SSDS provider in it.  You will notice we don't use any strings for queries, but rather lambda expressions.  I had a lot of fun writing the first version of this and I would expect that there are a few more revisions here to go.  Of course, Matt Warren should get a ton of credit here for providing the base implementation.
  • This sample also uses a very simplistic ASP.NET Role provider for SSDS.  Likely updates here will include encryption and hashing support.
  • We have a number of Powershell cmdlets included for managing authorities and containers.

I have many other ideas for this app as time progresses, so you should check back from time to time to see the updates.

In case anyone was wondering about the name: clouds are fluffy... get it?

You need to have SSDS credentials to run this sample.  If you don't have credentials yet, you can see an online version at http://www.phluffyfotos.com.

Even if you don't have access to SSDS credentials yet, the code is worth taking a look at.

Monday, 31 March 2008

Interested in SQL Server Data Services?

Are you a Ruby on Rails (RoR) shop?  PHP shop?  Java shop? Are you a web startup using open source technologies to build your services?

Great news:  we have a limited number of seats available for folks like you to get firsthand exposure to our new HTTP-based (SOAP and REST) data service:  SQL Server Data Services (SSDS).  You will get early access to the service and the chance to influence the service itself.

We are interested in getting some feedback from people like you that don't necessarily use Microsoft technologies.  If you can connect using HTTP, you can use SSDS, so there are very few client limitations here.

How to get involved

If you are one of those non-.NET developers that sees value in utility storage and query processing, send me an email to dpesdr (AT) microsoft.com.

In your email, please tell me about your company and about your product where you think SSDS might fit.  We will review your email and follow up with more information.

Details

When: April 24-25th, 2008
Where:  Microsoft Silicon Valley Campus 

Cost

This is a free event but seating is limited - you must have a confirmed reservation to attend this event.  Attendees are responsible for any transportation or lodging costs.  Microsoft will provide breakfast, lunch, and light snacks during the event.  You are responsible for your own dinner expenses.

About SQL Server Data Services

SQL Server Data Services (SSDS) is a highly scalable web-facing data storage and query processing utility. Built on robust SQL Server database technology, these services provide high availability and security and support standards-based web protocols and interfaces (SOAP, REST) for rapid provisioning and ease of programming. Businesses can store and access all types of data from birth to archival, and users can access information on any device, from the desktop to a mobile device.

Key Features and Solution Benefits

Application Agility for quick deployment
•    Internet standard protocols and Interfaces (REST, SOAP).
•    Flexible data model with no schema required.
•    Simple text-based query model.
•    Easy to program to from any programming environment.
On-Demand Scalability
•    Easy storage and access. Pay as you grow model.
•    Scales as data grows.
•    Web services for provisioning, deployment, and monitoring.
Business-Ready SLA
•    Built on robust Microsoft SQL Server database and Windows server technologies.
•    Store and manage multiple copies of the data for reliability and availability.
•    Back up data stored in each data cluster. Geo-redundant data copies to ensure business continuity.
•    Secure data access to help provide business confidentiality and privacy.

Thursday, 13 March 2008

SSDS Query Model

Note: The query model is subject to change based on feedback; this is how it stands today.  You can pre-register for the beta at the SSDS home page.

Design Decisions

In this post, I am going to cover how the query model works in SQL Server Data Services (SSDS) today and some of the design goals of SSDS in general.

The first thing to understand is that the SSDS team made a conscious decision to start simple and expand functionality later.  The reasoning behind this is simple:

  • Simple services that lower the bar to get started are easier to adopt.  We want to offer the smallest useful feature set that developers can start using to build applications.
  • At the same time, we want to make sure that every feature that is available will scale appropriately at internet-level.

As such, the right model is what the team chose:  start simple and expose richer and richer functionality as we prove out the scale and developers' needs.  The team is committed to short (8-week) development cycles that prioritize the features based on feedback.

The Query Model

Now that I have covered the design decisions, let's take a look at how the query model actually operates.  From my last post, you see that I already showed you the syntax that begins the query operation (?q=).  What is important to understand is the following:

  • The widest scope for any search is the Container.

Well... almost, but I will get to that.  To put this another way: you cannot retrieve entity objects today by searching at the authority level.  That's right:  there is no cross-container search.  This might change in the future, but that is how it is today.  For developers familiar with LDAP terminology, this roughly equates to a SearchScope.OneLevel operation.  The syntax again is:

https://{authority}.data.sitka.microsoft.com/v1/{container}?q={query}

It is important to note that there is no trailing forward slash after the container Id in that URL.

Now, what did I mean by "almost"?  If you saw my last post, I showed the following:

https://{authority}.data.sitka.microsoft.com/v1/?q=

This would imply query capabilities, right?  Well, it turns out that you can query at the Authority level, but you can only query Container objects (not the entities contained in them).  Since a Container is a simple entity with only Id and Version today, it is of limited usefulness to query at this level.  However, if we were to add customizable metadata support to attach to a Container, then query might become much more interesting (e.g. find all containers where attribute "foo" equals "bar").

Syntax

The SSDS team decided to adopt a LINQ-like syntax that is carried on the query string.  It has the basic format of the following:

from e in entities {where clause} select e

Only the part inside the {}'s is modifiable.  We can infer the following from this syntax:  One, you can perform only simple selects today.  There are no joins, group bys, ordering, etc.  Two, there is no projection today.  There is only the return of the complete entity, as the entity is both the unit of storage (POST and PUT) and the unit of return (GET).

Now, let's inspect the "{where clause}".  The syntax here in more detail is:

where {property} {operator} {constant}

Property

The '{property}' part of the expression can operate over both system properties (Id, Kind) as well as flexible properties (anything you added).  The syntax is slightly different depending on which it is.  For example:

e.Kind
e["MyProperty"]

In the case of the system properties, we use the dot notation and the name of the system property.  The custom flex properties are addressed using the bracket [] syntax in a weakly-typed fashion.  This makes sense, of course, as there is no way we could know the shape of a schema-less entity ahead of time.

Operator

The operators ({operator}) that are supported are: ==, <, >, >=, <=, !=

Constant

Finally, for the '{constant}', we have just that - a constant.  We do not currently support other expressions or other properties here.  As an example the following is invalid:

e["Pages"] > e["AvgPages"]

while, this would be perfectly valid:

e["Pages"] > 300

The type of the constant is inferred from the syntax.  Using the wrong type may net you zero results, so keep this in mind.  Here are some simple examples showing how to format the constant:

e["xsi_decimal_type"] > 300
e["xsi_string_type"] != "String"
e["xsi_boolean_type"] == true
e["xsi_dateTime_type"] == DateTime("2008-02-09T07:45:23.855")
e["xsi_base64Binary_type"] == Binary("AB34CD")
e.Kind == "String"
e.Id == "String"

The last point here in this syntax is that we can tie together multiple expressions using the && (AND), || (OR), and ! (NOT) logical operators.  Precedence can be set using parentheses ().

Paging

Results are limited to 500 per GET operation.  This is an arbitrary number right now, so don't depend on it.  The EntitySet that is returned is implicitly ordered by Id, so you would need simple loop logic to page through larger sets of data.  Something to the effect of:

from e in entities where e.Id > "last ID seen" select e
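In C#, that paging loop might be sketched like this.  The GetEntities helper here is hypothetical - substitute whatever you use to issue the GET and parse the returned EntitySet:

```csharp
// Paging sketch - GetEntities is a hypothetical helper that issues the
// GET against the container and returns the entities in the EntitySet.
XNamespace s = "http://schemas.microsoft.com/sitka/2008/03/";

string lastId = "";          // Ids compare as strings, so start below everything
List<XElement> batch;

do
{
    string query = String.Format(
        "from e in entities where e.Id > \"{0}\" select e", lastId);

    batch = GetEntities("books", query);

    foreach (var entity in batch)
    {
        // process the entity here...
        lastId = entity.Element(s + "Id").Value;
    }
} while (batch.Count > 0);
```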

Pulling It Together

Since the query is submitted on the querystring of the URL, we need to encode it using a URL-encoding function.  It is actually easiest to use a UriTemplate from .NET 3.5, which does all the right things for you.

Here is a properly formatted GET request that you can type into a browser to retrieve a set of entities.

https://dunnry.data.sitka.microsoft.com/v1/books?q=from+e+in+entities+where+e["Pages"]+>+300+&&+e.Kind=="Bookv2"+select+e

In this case, I am asking for Entities of the Kind "Bookv2" that have more than 300 (numeric) pages (from "Pages" flex property).  Simple, right?
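If you are constructing the request by hand rather than with UriTemplate, Uri.EscapeDataString takes care of the encoding.  The authority and container names here are just placeholders:

```csharp
string authority = "dunnry";   // placeholder authority
string container = "books";
string query = "from e in entities where e[\"Pages\"] > 300 "
             + "&& e.Kind==\"Bookv2\" select e";

// escape the query so characters like quotes and && survive the trip
var uri = new Uri(String.Format(
    "https://{0}.data.sitka.microsoft.com/v1/{1}?q={2}",
    authority,
    container,
    Uri.EscapeDataString(query)));
```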

The actual format that is returned is POX today (support for JSON and APP coming).  It would look something like this:

<s:EntitySet xmlns:s="http://schemas.microsoft.com/sitka/2008/03/"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xmlns:x="http://www.w3.org/2001/XMLSchema">
  <Bookv2>
    <s:Id>MyNewBook</s:Id>
    <s:Version>1</s:Version>
    <Name xsi:type="x:string">My Special Book 423</Name>
    <ISBN xsi:type="x:string">ISBN 423</ISBN>
    <PublishDate xsi:type="x:dateTime">2008-02-09T07:43:51.13625</PublishDate>
    <Pages xsi:type="x:decimal">400</Pages>
  </Bookv2>
</s:EntitySet>

The EntitySet tag wraps one or more (in this case one) XML nodes that are of Kind "Bookv2".  As a developer, I need to interpret this XML and read back the results.
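That interpretation step is straightforward with any XML parser.  Here is a sketch in Python using ElementTree (the namespace URIs are taken from the response above; for brevity the type handling covers only x:decimal and x:boolean, leaving dateTime and base64Binary values as text):

```python
import xml.etree.ElementTree as ET

S   = "{http://schemas.microsoft.com/sitka/2008/03/}"
XSI = "{http://www.w3.org/2001/XMLSchema-instance}"

def convert(value, xsi_type):
    """Map the wire types shown above to Python values.  dateTime and
    base64Binary are left as raw text here for brevity."""
    if xsi_type == "x:decimal":
        return float(value)
    if xsi_type == "x:boolean":
        return value == "true"
    return value

def parse_entity_set(pox):
    """Read each child of s:EntitySet back into a (kind, id, properties) tuple."""
    entities = []
    for node in ET.fromstring(pox):
        kind = node.tag                      # the Kind is the root element name
        eid = node.findtext(S + "Id")
        props = {}
        for child in node:
            if child.tag.startswith(S):
                continue                     # skip system properties (Id, Version)
            props[child.tag] = convert(child.text, child.get(XSI + "type"))
        entities.append((kind, eid, props))
    return entities
```

The xsi:type attribute on each flex property is what lets a client reconstruct the value's type, since the entity itself carries no schema.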

The last point here is that this is the entire entity.  As a consumer of this service, I need to think about how my entities will be used.  Since there is no projection and only a full entity is returned, it may make sense to break apart larger entities with commonly queried properties and leave larger binary properties (and later blobs) in separate entities.  You don't want to pay the transfer cost or the performance hit of moving multi-MB (or GB) entities when you only want to read a single flex property.  I envision people will start to make very light and composable entities to deal with this.

To anyone wondering what "Sitka" is in the namespaces or the URI... it is the old code name for SSDS.  That will more than likely change with the next refresh.

Limitations Today

The queries are fairly simple.  There is no capability for cross-container queries or joins of any type today.  There is no group by or order by functionality, and no "LIKE", Contains, BeginsWith, or EndsWith functionality.

I have to stress, however, that this is a starting point for the SSDS query API, not the final functionality.  I will of course update this blog with the new functionality as it rolls into the service.  Again, the team decided that it was better to put a simple and approachable service out there today and gather feedback on what works and what doesn't for specific scenarios than to sit back and code a bunch of functionality that might not be necessary or meet users' needs.  I think this was a good decision, and there is an amazing variety of applications that you can build using just this API.

* - not quite... turns out you can change the 'e' to anything you like as long as you are consistent in reference, but that hardly counts as changing the query.

Monday, 10 March 2008

SQL Server Data Services Interview

My interview of Istvan Cseri from MIX08 is online now.  Istvan covers why we should care about SSDS and how it impacts developers.  Check it out.

Thursday, 06 March 2008

Entities, Containers, and Authorities

SQL Server Data Services exposes what we call the 'ACE' concept.  This stands for Authority, Container, and Entity.  These basic concepts are the building blocks with which you build your applications using SSDS.

You will notice that I intentionally switched the order in my blog post title.  The reasoning is that I believe it is easier to understand the model when you learn the entity first.

Flexible Entities

At the core of it all is the idea of a flexible entity.  SSDS does not impose a schema on the shape of your data.  You are free to define whatever attributes you like in the model, choosing from some simple types (string, number, boolean, datetime, or byte[]) to help you query that data later.  Consider the following C# class and how it will be represented on the wire (using the REST API).

public class Book
{
    public string Name { get; set; }
    public string ISBN { get; set; }
    public DateTime PublishDate { get; set; }
    public int Pages { get; set; }
    public byte[] Image { get; set; }
}
 
In order to store an instance of this Book class, we would need to serialize it on the wire like so (again this is for REST, not SOAP):
 
<Book xmlns:s="http://schemas.microsoft.com/stratus/2008/03/"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xmlns:x="http://www.w3.org/2001/XMLSchema">
  <s:Id>-544629171</s:Id>
  <Name xsi:type="x:string">Some Book</Name>
  <ISBN xsi:type="x:string">1234567</ISBN>
  <PublishDate xsi:type="x:dateTime">2008-03-06T12:56:53.122-08:00</PublishDate>
  <Pages xsi:type="x:decimal">350</Pages>
  <Image xsi:type="x:base64Binary">CgsMDQ==</Image>
</Book>

A couple of things to notice about this XML: there is a required system attribute called Id.  Along with the container and authority Ids, this Id forms the basis of the unique URI that represents this resource.  The other system attributes are Kind and Version.  We supply the Id and Kind, but the Version is maintained by SSDS (which is why we don't need to send it initially).  For the REST API, the Kind equates to the root XML element of the entity (in this case the Kind is "Book").
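To make the type inference concrete, here is a sketch in Python of how a property bag might be serialized into that wire shape (illustrative only - this is not any official serializer, and the type mapping simply mirrors the simple types listed above):

```python
import base64
from datetime import datetime
from xml.sax.saxutils import escape

S_NS   = "http://schemas.microsoft.com/stratus/2008/03/"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
X_NS   = "http://www.w3.org/2001/XMLSchema"

def xsi_type(value):
    """Infer the wire type from the Python value (bool is checked before
    numbers because Python's bool is a subclass of int)."""
    if isinstance(value, bool):
        return "x:boolean", "true" if value else "false"
    if isinstance(value, (int, float)):
        return "x:decimal", str(value)
    if isinstance(value, bytes):
        return "x:base64Binary", base64.b64encode(value).decode("ascii")
    if isinstance(value, datetime):
        return "x:dateTime", value.isoformat()
    return "x:string", escape(str(value))

def to_pox(kind, entity_id, props):
    """Serialize a property bag into the POX shape shown above.
    The Kind becomes the root element; Version is omitted since SSDS maintains it."""
    lines = ['<%s xmlns:s="%s" xmlns:xsi="%s" xmlns:x="%s">' % (kind, S_NS, XSI_NS, X_NS),
             "  <s:Id>%s</s:Id>" % escape(entity_id)]
    for name, value in props.items():
        t, text = xsi_type(value)
        lines.append('  <%s xsi:type="%s">%s</%s>' % (name, t, text, name))
    lines.append("</%s>" % kind)
    return "\n".join(lines)
```

Because the Kind is just the root element name and the properties carry their own xsi:type, nothing about this serialization depends on a fixed schema.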

Flexible means flexible

I believe that most developers will use SSDS with a strongly-typed backing class like Book.  The serialization and subsequent deserialization of the instance is, however, up to you as a developer.

Still, nothing forces me to actually use a backing class for this service.  It can be helpful to think of the data stored in SSDS in terms of objects or even rows in a database, but that is not the only way to think about it.  More accurately, we can think of the data as a sparse property bag.

I can just as easily store another Book object with the following:

<Book xmlns:s="http://schemas.microsoft.com/stratus/2008/03/"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xmlns:x="http://www.w3.org/2001/XMLSchema">
  <s:Id>MySpecialBook</s:Id>
  <Name xsi:type="x:string">Some Book</Name>
  <ISBN xsi:type="x:string">1234567</ISBN>
  <PublishDate xsi:type="x:string">January 2008</PublishDate>
</Book>

Notice in this example that I have removed some properties and then changed the PublishDate property type.  This is completely legal when using SSDS and no error will occur for different properties on the same Kind.  It is up to you as a developer to figure out the shape of the data coming back and deserialize it appropriately.

As you can see, a flexible entity is really about having different shapes to your data (and no schema).


Containers

Containers are collections of entities.  We like to say that the container is the unit of consistency for our data.  Containers are also the broadest domain of a query or search.  As such, a container must hold a complete copy of all the entities within it (that is, they must be co-located on the same node).  This further means that there must be some practical limit to the size of a container (as measured by space on disk), because we are constrained by physical storage (and to some extent memory and CPU).

We don't have a hard limit at this point except to say it must exist, but eventually if your container gets big enough, you should consider partitioning it for resource efficiency reasons.


Because containers are another type of entity (albeit a somewhat special one), they also have the system attribute Id.  If we look at what a container looks like, you will see that it is a simple XML structure with no custom properties (though that could change).

<s:Container xmlns:s="http://schemas.microsoft.com/sitka/2008/03/"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xmlns:x="http://www.w3.org/2001/XMLSchema">
  <s:Id>Books</s:Id>
  <s:Version>1</s:Version>
</s:Container>
 
In this case, the container is called "Books".  The Version attribute comes back from SSDS to tell me which version of the entity I am looking at.  It can be used for simple concurrency control (more on that in another post).

Authority

Finally, we have the idea of an authority.  This is a collection of containers co-located in a specific datacenter today (though this might not always remain true).  The authority is analogous to a namespace for .NET developers in a sense as it scopes the containers within it.  There is a DNS name attached to the authority that we use when we address a resource using REST.  This DNS name is provisioned for a particular user and added to a subdomain off the main SSDS domain name.


Pulling it together

Now that we have covered what ACE means, let's pull it together and see how it shapes the URI we build to address a resource.

https://{authority}.data.sitka.microsoft.com/v1/{containerID}/{entityID}

This builds a stable URI for us given an authority name, container Id, and entity Id.

I don't always need all three identifiers to address a resource; I can also query to find resources.  For instance, if I target the authority with a GET request, I will query the contents of the authority (the containers):

https://{authority}.data.sitka.microsoft.com/v1/?q=

More commonly, I can query the contents of a container (the entities) by targeting the container URI with a GET request:

https://{authority}.data.sitka.microsoft.com/v1/{containerID}?q=

In this model, you will notice the ?q= portion of the URI.  This is the indicator that we want a query.  I am not specifying one here, so it acts more like a SELECT *.  In a later post, I will cover the query model in more detail.
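The addressing scheme above is easy to capture as code.  A sketch in Python, for illustration (the 'sitka' host and v1 path are just what the examples here use today, and these helpers are not part of any SDK):

```python
BASE = "https://%s.data.sitka.microsoft.com/v1"  # 'sitka' host, per the examples above

def authority_uri(authority):
    """A GET here queries the contents of the authority (its containers)."""
    return (BASE % authority) + "/?q="

def container_uri(authority, container):
    """A GET here queries the contents of the container (its entities)."""
    return (BASE % authority) + "/" + container + "?q="

def entity_uri(authority, container, entity_id):
    """Addresses a single entity directly - no query needed."""
    return (BASE % authority) + "/" + container + "/" + entity_id
```

With an empty ?q=, both query forms behave like a SELECT *, returning everything in scope.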

Wednesday, 05 March 2008

Introducing SQL Server Data Services

Finally!  I can start to talk about what I have been working on for the last three months.  Today, Ray Ozzie announced the release of SQL Server Data Services (SSDS).  SSDS is our cloud-based data storage and processing service.  Exposed over HTTP endpoints (REST or SOAP), SSDS delivers your data anywhere in a pay-as-you-go manner.  Fundamentally different from other storage services available today is the fact that SSDS is built on the SQL platform.  This allows us, over time, to expose richer and richer capabilities of the underlying platform.  Today, we have a set of simple query capabilities that still allow us to build quite sophisticated applications in a 'scale-free' manner.

Over the coming weeks and months, I will be blogging more about the shape of the service as well as introducing some very cool samples that show off the power of SSDS.

For more SSDS coverage, check out the SSDS Team Blog as well.

MIX Sessions

If you are attending MIX08, Nigel Ellis will be delivering a session on SSDS on Thursday at 8:30am in the Delfino 4005 Room. Session recording as well as slides will be available within 24 hours at http://sessions.visitmix.com.

Additionally, we will have a few Open Space sessions available for more information:

  • Thursday 12pm – Soumitra Sengupta will talk about the business value of SSDS at Area 1 of Open Space.
  • Thursday 2pm – Jeff Currier and Jason Hunter will talk about developing with SSDS in the Theatre area at Open Space.
  • Friday 11:30am – Istvan Cseri will talk about architecting for SSDS at Area 1 of Open Space.

Beta Opportunity

A limited beta for SSDS will be starting soon.  If you are interested in participating, I encourage you to sign up.