Thursday, June 26, 2008

Working with Objects in SSDS Part 2

This is the second post in my series on working with SQL Server Data Service (SSDS) and objects.  For background, you should read my post on Serializing Objects in SSDS and the first post in this series.

Last time I showed how to create a general purpose serializer for SSDS using the standard XmlSerializer class in .NET.  I created a shell entity or a 'thin veneer' for objects called SsdsEntity<T>, where T was any POCO (plain old C#/CLR object).  This allowed me to abstract away the metadata properties required for SSDS without changing my actual POCO object (which, I noted was lame to do).

If we decide that we will use SSDS to interact with POCO T, an interesting situation arises.  Namely, once we have defined T, we have in fact defined a schema - albeit one only enforced in code you write and not by the SSDS service itself.  One of the advantages of using something like SSDS is that you have a lot of flexibility in storing entities  (hence the term 'flexible entity') without conforming to schema.  Since, I want to support this flexibility, it means I need to think of a way to support not only the schema implied by T, but also additional and arbitrary properties that a user might consider.

Some may wonder why we need this flexibility:  after all, why not just change T to support whatever we like?  The issue comes up most often with code you do not control.  If you already have an existing codebase with objects that you would like to store in SSDS, it might not be practical or even possible to change the T to add additional schema.

Even if you completely control the codebase, expressing relationships between CLR objects and expressing relationships between things in your data are two different ideas - sometimes this problem has been termed 'impedance mismatch'.

In the CLR, if two objects are related, they are often part of a collection, or they refer to an instance on another object.  This is easy to express in the CLR (e.g. Instance.ChildrenCollection["key"]).  In your typical datasource, this same relationship is done using foreign keys to refer to other entities.

Consider the following classes:

public class Employee
{
    public string EmployeeId { get; set; }
    public string Name { get; set; }
    public DateTime HireDate { get; set; }
    public Employee Manager { get; set; }
    public Project[] Projects { get; set; }
}

public class Project
{
    public string ProjectId { get; set; }
    public string Name { get; set; }
    public string BillCode { get; set; }
}

Here we see that the Employee class refers to itself as well as contains a collection of related projects (Project class) that the employee works on.  SSDS only supports simple scalar types and no arrays or nested objects today, so we cannot directly express this in SSDS.  However, we can decompose this class and store the bits separately and then reassemble later.  First, let's see what that looks like and then we can see how it was done:

var projects = new Project[]
{
    new Project { BillCode = "123", Name = "TPS Slave", ProjectId = "PID01"},
    new Project { BillCode = "124", Name = "Programmer", ProjectId = "PID02" }
};

var bill = new Employee
{
    EmployeeId = "EMP01",
    HireDate = DateTime.Now.AddMonths(-1),
    Manager = null,
    Name = "Bill Lumbergh",
    Projects = new Project[] {}
};

var peter  = new Employee
{
    EmployeeId = "EMP02",
    HireDate = DateTime.Now,
    Manager = bill,
    Name = "Peter Gibbons",
    Projects = projects
};

var cloudpeter = new SsdsEntity<Employee>
{
    Entity = peter,
    Id = peter.EmployeeId
};

var cloudbill = new SsdsEntity<Employee>
{
    Entity = bill,
    Id = bill.EmployeeId
};

//here is how we add flexible props
cloudpeter.Add<string>("ManagerId", peter.Manager.EmployeeId);

var table = _context.OpenContainer("initech");
table.Insert(cloudpeter);
table.Insert(cloudbill);

var cloudprojects = peter.Projects
    .Select(s => new SsdsEntity<Project>
    { 
        Entity = s,
        Id = Guid.NewGuid().ToString()
    });

//add some metadata to track the project to employee
foreach (var proj in cloudprojects)
{
    proj.Add<string>("RelatedEmployee", peter.EmployeeId);
    table.Insert(proj);
}

All this code does is create two employees and two projects and set the relationships between them.  Using the Add<K> method, I can insert any primitive type to go along for the ride with the POCO.  If we query the container now, this is what we see:

<s:EntitySet 
    xmlns:s="http://schemas.microsoft.com/sitka/2008/03/" 
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
    xmlns:x="http://www.w3.org/2001/XMLSchema">
  <Project>
    <s:Id>2ffd7a92-2a3b-4cd8-a5f7-55f40c3ba2b0</s:Id>
    <s:Version>1</s:Version>
    <ProjectId xsi:type="x:string">PID01</ProjectId>
    <Name xsi:type="x:string">TPS Slave</Name>
    <BillCode xsi:type="x:string">123</BillCode>
    <RelatedEmployee xsi:type="x:string">EMP02</RelatedEmployee>
  </Project>
  <Project>
    <s:Id>892dbb1e-ba47-4c87-80e6-64fbb46da935</s:Id>
    <s:Version>1</s:Version>
    <ProjectId xsi:type="x:string">PID02</ProjectId>
    <Name xsi:type="x:string">Programmer</Name>
    <BillCode xsi:type="x:string">124</BillCode>
    <RelatedEmployee xsi:type="x:string">EMP02</RelatedEmployee>
  </Project>
  <Employee>
    <s:Id>EMP01</s:Id>
    <s:Version>1</s:Version>
    <EmployeeId xsi:type="x:string">EMP01</EmployeeId>
    <Name xsi:type="x:string">Bill Lumbergh</Name>
    <HireDate xsi:type="x:dateTime">2008-05-25T23:59:49</HireDate>
  </Employee>
  <Employee>
    <s:Id>EMP02</s:Id>
    <s:Version>1</s:Version>
    <EmployeeId xsi:type="x:string">EMP02</EmployeeId>
    <Name xsi:type="x:string">Peter Gibbons</Name>
    <HireDate xsi:type="x:dateTime">2008-06-25T23:59:49</HireDate>
    <ManagerId xsi:type="x:string">EMP01</ManagerId>
  </Employee>
</s:EntitySet>

As you can see, I have stored extra data in my 'flexible' entity with the ManagerId property (on one entity) and RelatedEmployee property on the Project kinds.  This allows me to figure out later what objects are related to each other since we can't model the CLR objects relationships directly.  Let's see how this was done.

public class SsdsEntity<T> where T: class
{
    Dictionary<string, object> _propertyBucket = new Dictionary<string, object>();

    public SsdsEntity() { }

     [XmlIgnore]
    public Dictionary<string, object> PropertyBucket
    {
        get { return _propertyBucket; }
    }

    [XmlAnyElement]
    public XElement[] Attributes
    {
        get
        {
            //using XElement is much easier than XmlElement to build
            //take all properties on object instance and build XElement
            var props =  from prop in typeof(T).GetProperties()
                         let val = prop.GetValue(this.Entity, null)
                         where prop.GetSetMethod() != null
                         && allowableTypes.Contains(prop.PropertyType) 
                         && val != null
                         select new XElement(prop.Name,
                             new XAttribute(Constants.xsi + "type",
                                 XsdTypeResolver.Solve(prop.PropertyType)),
                             EncodeValue(val)
                             );

            //Then stuff in any extra stuff you want
            var extra = _propertyBucket.Select(
                e =>
                     new XElement(e.Key,
                        new XAttribute(Constants.xsi + "type",
                             XsdTypeResolver.Solve(e.Value.GetType())),
                            EncodeValue(e.Value)
                            )
                );

            return props.Union(extra).ToArray();
        }
        set
        {
            //wrap the XElement[] with the name of the type
            var xml = new XElement(typeof(T).Name, value);

            var xs = new XmlSerializer(typeof(T));

            //xml.CreateReader() cannot be used as it won't support base64 content
            XmlTextReader reader = new XmlTextReader(
                xml.ToString(),
                XmlNodeType.Document,
                null
                );

            this.Entity = (T)xs.Deserialize(reader);

            //now deserialize the other stuff left over into the property bucket...
            var stuff = from v in value.AsEnumerable()
                        let props = typeof(T).GetProperties().Select(s => s.Name)
                        where !props.Contains(v.Name.ToString())
                        select v;

            foreach (var item in stuff)
            {
                _propertyBucket.Add(
                    item.Name.ToString(),
                    DecodeValue(
                        item.Attribute(Constants.xsi + "type").Value,
                        item.Value)
                    );
            }
        }
    }

    public void Add<K>(string key, K value)
    {
        if (!allowableTypes.Contains(typeof(K)))
            throw new ArgumentException(
		String.Format(
		"Type {0} not supported in SsdsEntity",
		typeof(K).Name)
		);

        if (!_propertyBucket.ContainsKey(key))
        {
            _propertyBucket.Add(key, value);
        }
        else
        {
            //replace the value
            _propertyBucket.Remove(key);
            _propertyBucket.Add(key, value);
        }
    }
}

I have omitted the parts of SsdsEntity<T> from the first post that didn't change.  The only other addition you don't see here is a helper method called DecodeValue, which as you might guess, interprets the string value in XML and attempts to cast it to a CLR type based on the xsi:type that comes back.

All we did here was add a Dictionary<string, object> property called PropertyBucket that holds our extra stuff we want to associate with our T instance.  Then in the getter and setter for the XElement[] property called Attributes, we are adding them into our array of XElement as well as pulling them back out on deserialization and stuffing them back into the Dictionary.  With this simple addition, we have fixed our in flexibility (or lack thereof) problem.  We are still limited to the simple scalar types, but as you can see you can work around this in a lot of cases by decomposing the objects down enough to be able to recreate them later.

The Add<K> method is a convenience only as we could operate directly against the Dictionary.  I also could have chosen to keep the Dictionary property bucket private and not expose it.  That would have worked just fine for serialization, but I wanted to also be able to query it later.

In my last post, I said I would introduce a library where all this code is coming from, but I didn't realize at the time how long this post would be and that I still need to cover querying.  So... next time, I will finish up this series by explaining how the strongly typed query model works and how all these pieces fit together to recompose the data back into objects (and release the library).

Comments are closed.