AZR432 – SQL Azure Federations Deep Dive 1


Most developers are familiar with the concept of scaling out their application tier; with SQL Azure Federations it is now possible to scale out the data tier as well. In this session we will deep dive on building large scale solutions on SQL Azure.

In this session we will cover patterns and techniques for building scalability into your relational databases. SQL Azure Federations allow databases to be spread over 100s of nodes in the Azure datacentre with databases paid for by the day. This presents a unique avenue for dealing with particularly massive volumes of data, of user load, or both.

This session will discuss how to design a schema for federation scale-out while still maintaining the value afforded by a true relational (SQL) database. We’ll look at approaches for minimizing cross federation queries and as well as approaches to fan-out queries when necessary. We will examine approaches for dealing with elastically scaling applications and other high load scenarios.

Introduction

Kiwi Chris Auld is back to go into more detail on SQL Federations.

Disclaimer: These are conference session notes I compiled during various sessions at Microsoft Tech Ed 2012, September 11-14, 2012.  The majority of the content comprises notes taken from the presentation slides accompanied, occasionally, by my own narration.  Some of the content may be free hand style.  Enjoy… Rob

Agenda

  • Overview
  • Tips and Tricks
    • Design and Development
      • Picking a Federation Model
      • Picking Reference Tables
      • Generating a key without bottlenecks
      • Coding fan-out queries
    • Administration
      • Configuring layout
      • Where and when to split
  • vNext
    • Improvements

SQL Azure has been renamed..  Keeping it “simple”: SQL Database = SQL Azure

Scalability model for the Cloud

  • Cloud Apps allow massive scale
    • Orders of magnitude more than burst
  • Cloud Apps demand the best economics
    • Best Price/Performance
    • Elasticity + Pay-as-you-go

Data Scale Challenges

  • For small scenarios scale up is cheaper & easier
  • For larger scenarios scale out is the only solution
    • Massive diseconomies of scale
      • 1×64 way server >>> $$$ 64×1 way servers
    • Reach a limit (can’t get a big enough box)
    • Shared resource contention
  • Scale out differs

Federations in SQL Database

  • Canonical 3 tier app scales by
    • Adding and removing nodes
    • Buying a huge DB server
  • Federations extend the model to the DB tier
    • Add and remove Azure nodes with federations
    • Scale on demand to your traffic without downtime

Sort of like a load balancer for databases.  Database horizontal partitioning under transactional load.
Federation is essentially a database sharding strategy.

Why use Federations?

  • Scale beyond single DB to almost unlimited scale
    • Many nodes
  • Best economics
    • elastic tier that can repartition
    • Always on
  • Simplified multi-tenancy
  • Simplified development and admin
    • Platform supports sharding
    • Proper RDBMS

Concepts

Sharding/federation: A named dimension over which data is sliced.
’CREATE FEDERATION fed_name(key_label, distribution_type)

Atomic unit: What is the smallest atomic unit (record) to shard from?  e.g. a Customer record (which can’t be shared across two shards).

Repartitioning without downtime: 

  • SPLIT members into workloads over nodes e.g. ALTER FEDERATION <name> SPLIT AT(key=value)
  • DROP members to shrink back to fewer nodes
  • MERGE not yet supported

Built-in Data Dependent Routing

  • DDR ensure apps can discover where data is just-in-time (JIT)
    • Apps don’t need to cache a ‘shard map’
    • No cache coherency issues even with partitioning
  • Prevents connection pool fragmentation issues

USE FEDERATION <name>(<key>=value);

[Demo]

Working with Federations

  • Create federation (shard)
  • Create federation member
  • insert data (turn filtering off for multiple records)
  • Supports standard T-SQL
  • using FEDERATED ON to assign column to federation scope
  • Needs to be applied consistently when sharding a table (e.g. on the atomic unit – e.g. customerId)
  • Can only do one split at a time
  • Can only split on discrete values (int, varbinary, uniqueidentifier)
  • Can use a dynamic view to track progress

Concepts – 3 kinds of tables

  • Federated Tables
    • Contain data that is distributed by the federation
    • Contains a slice of data in each database
    • Optimized for read/write at scale
    • Bounded by atomic unit/filtering
  • Reference Tables
    • Duplicate copy of data in each federation member
    • Must manually modify data in each federation member, thus eventually consistent
    • optimised for read
    • Used for referential integrity
    • No cross database transactions in SQL Azure
  • Central table
    • Refer to tables that are created in the federation root (database) for low traffic objects such as metadata
    • Single point of load (use sparingly) or through a cache

Picking Federations

  • Normalize data model to 3NF (or more) – sort of (keep in mind the persistence of the federated column needs to be used across the related data entities) – sort of like adding a “federation key” across the data model
  • Apply scale-first DB design principals
  • Pick Federations, i.e. “Table Groups” that need to scale out

Picking Reference Tables

  • Look up table
  • Joined in queries

Generating Unique Keys

  • Identity not available (because it’s shared)
  • Identity Generation can be expensive for large apps
    • provides linearly increasing values
    • provides no gaps in generation
    • only generated at the data tier
    • creates bottleneck, must have some sort of shared counter
  • Benefits of GUID
    • no centralized id generation
    • generated anywhere in the tiers
    • random distribution over an enormous address space
  • Can also use varbinary

Data Dependent Routing

Based on ‘atomic units’.  DDR connects to the correct database federation member. 

FILTERING=ON – connects to a single atomic unit (based on the federation key), good for:

  • Management tasks
  • CRUD

FILTERING=OFF – for fan-out queries (data across multiple federation members)

Used for:

  • Reporting queries
    • e.g. Union or aggregate queries
  • Unaligned queries
  • fan-out queries (union all, additive/non-additive aggregates)

Member query: part sent to each member
Summary query: post processing query for member query results

[Demo] A number of different T-SQL queries which demonstrate querying across multiple and single federations.

Well, there’s obviously a lot to cover off here.  I’ll be coming back to this topic myself to run a few sample proof of concepts, stay tuned here at Sanders Technology for more on SQL Federation.


About Rob Sanders

IT Professional and TOGAF 9 certified architect with nearly two decades of industry experience, 18 years in commercial software development and 11 years in IT consulting. Check out the "About Rob" page for more information.


Leave a comment

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>

One thought on “AZR432 – SQL Azure Federations Deep Dive

  • William Denniss

    Wow, this is very very interesting. That does solve some major issues larger apps will run into (like you blogged, “can’t get a big enough box”). It’s nice to see solutions for scaling relational databases other than “don’t use relational databases, use NoSQL”!