Wednesday, June 4, 2014

Brewmaster Template Definition

In order to author a Brewmaster template, you will need to have a basic understanding of how the template is structured.  In this blog post, I will cover all the basic terminology.

Methodology Explained

A central tenant to the Brewmaster template is that a template represents the goal state of how you want your deployment to be configured.  This has some interesting implications.  Namely, it means that if the state of your deployment is already as described, we skip it.  For example, if you describe 3 VMs and we only find 2 VMs in an existing deployment, we will deploy 1 VM.  We won't try to deploy 3 additional VMs in this example.  If you describe a Storage Account, we will make sure it exists.  Same thing for the Affinity Group.

Since we also view the template as your goal state, it means you can have the reverse scenario with a slight caveat:  my template describes 2 VMs and Brewmaster finds 3 VMs already deployed.  In this case and with your explicit configuration, we will remove the unreferenced VM.  The caveat here is that you need to explicitly tell Brewmaster to be destructive and remove VMs.

There is one more caveat to this 'goal state' idea and that is around Network settings.  Given that Network settings are global in nature to the subscription, we didn't want you to have to describe every possible Virtual Network and Subnet for every deployment (even the ones not done in Brewmaster).  If we were strict about a 'goal state', we would have to do so.  Instead, because of the global nature of these things and the monolithic description and configuration, we have taken an additive-only approach here.  We will only attempt to add additional Virtual Networks, Subnets, DNS Servers, etc., and we will never delete them if they become unreferenced or are not fully described.  One consequence of this is that if you describe Virtual Network A with a single subnet called Subnet AA and Brewmaster finds Virtual Network A with existing Subnet BB, we will not remove Subnet BB, and instead you will be left with both Subnet AA and Subnet BB.

Now that you have the general methodology we follow, let's look at the possible configuration options.

Template Schema

At the highest level, you have 9 basic sections inside a template:

  • Parameters
  • Network
  • AffinityGroup
  • StorageAccounts
  • CloudServices
  • DeploymentGroups
  • Credentials
  • ConfigSets
  • Configurations


This section contains replaceable params used throughout the template.  Parameters values themselves are either supplied at deployment time or rely upon a described default.  Template authors can include optional metadata in the parameter definition for things like description, type hints, and even validation logic.  A parameter can be of type String, DateTime, Boolean, or Number.  Brewmaster's website will generate a dynamic UI and perform validation based on the values declared in this section.

We recommend that template authors use the type hints and validation capabilities to prevent easily avoided misconfigurations due to data entry mistakes.


The network settings described in here allow you to declare things like Virtual Networks, subnets, DNS servers, and Local Sites.  Keeping in mind the caveat that Networks are an additive-only approach, you need only describe the portions of the Networks settings in Azure that directly apply to your intended deployment.


A Brewmaster template is scoped to a single Affinity Group (AG).  Early on, we supported multiple AGs, but soon realized that things became far too complicated for a user to keep the definitions straight inside a single template.  In the IaaS world, it turns out that you really need an AG to do most anything interesting.  Virtual Networks must exist in an AG for instance.  If you want your IaaS VMs to be able to communicate, you have to have one.  The implication of having a required AG definition is two-fold:  1.)  we enforce the best practices that all your resources are in an AG, and 2.) it means you can only deploy to a single datacenter right now per deployment.  If you need to have a multi-datacenter (region) deployment, you would just deploy the same template multiple times.


Each described Cloud Service (CS) can also describe a single VmDeployment within it.  That deployment can next describe multiple VMs (up to Azure's limit).  All the supported VM settings are described within a VM (Disks, Endpoints, etc.).  Each VM can be associated with one or more ConfigSets.  I will describe this more below in detail, but ultimately, this is how you control what configuration is done on each VM.  Each VM can also be assigned a single Deployment Group (more below).  If one is not assigned, it belongs to the 'default' Deployment Group.

If the CS does not exist, Brewmaster will create it.  If the CS already exists and there is a deployment in the Production slot, Brewmaster will attempt to reconcile what is in the template with what has been found in the existing deployments.  Each machine name is unique within a CS, so if we find a matching VM name in the template configuration, Brewmaster will attempt to configure the machine as described.


Deployment Groups (DGs) are a mechanism by which you can control the dependencies between machines.  By default, all VMs are in a single DG aptly called 'default'.  If you specify additional DGs and VMs have been assigned to them, those machines will be configured completely before any other VMs are configured (end to end).  The order of the described DGs are also the order in which everything is deployed and configured.  Most templates will not use this capability - the default DG is just fine.  However, a classic example of needing this capability is something like Active Directory.  You would want your domain controllers to fully deploy and be configured and ready before additional VMs were spun up and configured if they were to also join the domain.


Instead of littering the template with a bunch of usernames and passwords, the template allows you to describe the credentials once and then refer to it everywhere.  We strongly suggest parameterizing the credentials here and not hardcoding them.  This will maximize the re-usability of your template.


ConfigSets provide a convenient way to group VMs together for common configurations.  Inside of a ConfigSet, you can configure Endpoints that will be applied to all VMs within the ConfigSet.  Additionally, this is where you associate a particular Configuration (1 or more) to a set of VMs.

At deployment time, the common Endpoints will be expanded and added into the Endpoint configuration for each VM found to reference that ConfigSet.  In general, you will have a single ConfigSet per type of VM within your template definition.  So, if I have a SQL Server Always On template, I might have 1 ConfigSet for my SQL nodes, 1 for my Quorum nodes, and 1 for my Active Directory nodes.  If you find that you have common configuration between ConfigSets (e.g. each ConfigSet needs to format data disks), that means you can factor those operations out and apply it as a reusable Configuration in each ConfigSet.


A Configuration is a set of DSC resources (or Powershell scripts) that will be applied to a VM.  You can factor your DSC resources into smaller, more re-usable sets, and reference them from the ConfigSet.  For instance, suppose I build out Active Directory and SQL Server in two ConfigSets.  I could have 2 different configurations - one for configuring Active Directory and one for configuring SQL Server.  However, if I also had some common configuration (e.g. formatting attached disks), I could create a 3rd Configuration and simply include that Configuration as a reference in each ConfigSet.  This way, we keep to the DRY principle and you only need to reference the reusable Configurations amongst one or more ConfigSets.

As a best practice, we recommend building out any custom actions as a DSC resource.  Long term, this ends up being a much more manageable practice than attempting to manage configuration as simple Powershell scripts.

Wrap Up

Hopefully this gives you a quick tour around the Brewmaster template definition and philosophy.  As we update things to support new features in Azure, you will see the schema change slightly.  Make sure to check out our documentation to keep up.

Tuesday, January 29, 2013

Anatomy of a Scalable Task Scheduler

On 1/18 we quietly released a version of our scalable task scheduler (creatively named 'Scheduler' for right now) to the Windows Azure Store.  If you missed it, you can see it in this post by Scott Guthrie.  The service allows you to schedule re-occurring tasks using the well-known cron syntax.  Today, we support a simple GET webhook that will notify you each time your cron fires.  However, you can be sure that we are expanding support to more choices, including (authenticated) POST hooks, Windows Azure Queues, and Service Bus Queues to name a few.

In this post, I want to share a bit about how we designed the service to support many tenants and potentially millions of tasks.  Let's start with a simplified, but accurate overall picture:


We have several main subsystems in our service (REST API fa├žade, CRON Engine, and Task Engine) and additionally several shared subsystems across additional services (not pictured) such as Monitoring/Auditing and Billing/Usage.  Each one can be scaled independently depending on our load and overall system demand.  We knew that we needed to decouple our subsystems such that they did not depend on each other and could scale independently.  We also wanted to be able to develop each subsystem potentially in isolation without affecting the other subsystems in use.  As such, our systems do not communicate with each other directly, but only share a common messaging schema.  All communication is done over queues and asynchronously.


This is the layer that end users communicate with and the only way to interact with the system (even our portal acts as a client).  We use a shared secret key authentication mechanism where you sign your requests and we validate them as they enter our pipeline.  We implemented this REST API using Web API.  When you interact with the REST API, you are viewing fast, lightweight views of your scheduled task setup that reflects what is stored in our Job Repository.  However, we never query the Job Repository directly to keep it responsive to its real job - providing the source data for the CRON Engine.

CRON Engine

This subsystem was designed to do as little as possible and farm out the work to the Task Engine.  When you have an engine that evaluates cron expressions and fire times, it cannot get bogged down trying to actually do the work.  This is a potentially IO-intensive role in the subsystem that is constantly evaluating when to fire a particular cron job.  In order to support many tenants, it must be able run continuously without bogging down in execution.  As such, this role only evaluates when a particular cron job must run and then fires a command to the Task Engine to actually execute the potentially long running job.

Task Engine

The Task Engine is the grunt of the service and it performs the actual work.  It is the layer that will be scaled most dramatically depending on system load.  Commands from the CRON Engine for work are accepted and performed at this layer.  Subsequently, when the work is done it emits an event that other interested subsystems (like Audit and Billing) can subscribe to downstream.  The emitted event contains details about the outcome of the task performed and is subsequently denormalized into views that the REST API can query to provide back to a tenant.  This is how we can tell you your job history and report back any errors in execution.  The beauty of the Task Engine emitting events (instead of directly acting) is that we can subscribe many different listeners for a particular event at any time in the future.  In fact, we can orchestrate very complex workflows throughout the system as we communicate to unrelated, but vital subsystems.  This keeps our system decoupled and allows us to develop those other subsystems in isolation.

Future Enhancements

Today we are in a beta mode, intended to give us feedback about the type of jobs, frequency of execution, and what our system baseline performance should look like.  In the future, we know we will support additional types of scheduled tasks, more views into your tasks, and more complex orchestrations.  Additionally, we have setup our infrastructure such that we can deploy to to multiple datacenters for resiliency (and even multiple clouds).  Give us a try today and let us know about your experience.