High Availability–It’s Not All About The Hardware/Software

As I write this I’m sitting my my house working with the last hour of juice left in my laptop. We had a major Snow + Seattle = Bad Newssnowstorm here in the Seattle area which was followed by a big ice storm and some heavy winds, all of which have coincided to knock out power to over 200,000 homes. I’ve been without power for over 75 hours now and it’s given me a lot of time to think about multiple facets of High Availability (HA) and Disaster Recovery (DR).

HA/DR are critical items for all companies. HA in various degrees, but most certainly DR and the ability to provide some kind of business continuity model. Different companies, large and small, have different requirements around HA/DR. Some could easily remain down for a couple of days with no lingering effects, others cannot sustain downtime on their most critical systems of more than a couple of minutes. Based upon those requirements (and any budget constraints) a company could implement HA/DR using Failover Clustering, Mirroring, Log Shipping, SAN Replication, Peer-to-Peer Replication, VMotion or some other technology.



Where HA/DR often falls down

This is all good and well but what about the people that you need in order to get things up and running, the folks that you need in order to provide you with ongoing support in the event that a situation arises where you have to go to your DR site, or provide support for that critical system where a mirror has failed?

In the last 80 hours I have been called by the NOC at work stating that they have an issue. I don’t have power at home so my internet access is off. I don’t have sufficient power in my phone to tether it (don’t tell the wireless company I can do that) and it would take 4 hours sitting in the car to get sufficient power to allow me to do that. I do have a mobile broadband USB device, however I’m not getting coverage in my house right now from that carrier and I have been unable to get out of my road. In other words in the whole HA/DR side of things I’m a single point of failure. Luckily I have a colleague with power who I was able to point the NOC to in order to provide resolution to the issue.

My colleague and I live less than 5 miles apart, we are just luckily on different circuits and substations for the power company. If neither of us had been able to dial in then the problem would not have been resolved.


If you truly want to implement HA/DR you need to go further than just looking at your hardware infrastructure or the way that your software is setup. It means more than your DNS configuration and your Active Directory setup. There’s more to it than setting a Failover Partner in your application connection string. It’s about the people that you need to support these things!



Artificial limitations

I’ve been a DBA for about 12 years now. I’ve not been hands on with a server since doing datacenter implementation work about 5 years ago. Other than in very small shops it’s rare that a DBA touches a server, most don’t even have access to the datacenter in which their database servers reside. There has been a huge move towards setup in Colocation Datacenters (CoLo’s) over the last few years, meaning that more often than not the datacenter where you servers reside are probably not even in the same physical building that you are. There is also the current move to the Cloud initiative that’s gaining some traction. Given all of this information I ask you this:

Why are DBAs so often tied to a physical location for work?


This is a serious question. There are so few telecommuting options for DBAs out there. You can look on the job boards and see virtually none. There’s a lot of relocation opportunities if you want to move to another state, but it doesn’t make a lot of sense. If you are going to geographically disperse your hardware due to HA/DR needs why would you not do that same for the teams that have to support it?

So many companies point blank refuse to even entertain the idea of telecommuters from another state (heck the vast majority won’t even contemplate allowing IT support staff to work from home a couple of days a week).

The same companies that happily outsource all development processes to an unknown entity in India or China refuse to entertain the idea of allowing Windows/Unix Admins or DBAs to telecommute.


I feel that it is somewhat ironic that the same people that expect you to come to the office and be tied to a physical location also have you be on call so that you can provide remote support when something goes wrong.



Moving beyond the office

Forcing IT support staff to come to a physical location 5 days a week makes little to no sense to me. I can understand the desire to have folks in the same physical location but as your organization grows it really needs to find a way to get past that mentality. There are various options such as LiveMeeting or GoToMeeting that can provide a telepresence in meetings for people that work remotely. Instant Messaging can be used to constantly keep people in touch. Services such as Yammer can be used to quickly and easily provide collaborative thought outside of meetings.

Looking at my current situation, if my colleague and I had both been without power we would have had no way to provide support. Our servers sit in a CoLo that we don’t have access to so even if we could have gotten there, we would have had no hands on way to work with the servers.

Now imagine we had a third person on the team that lived and worked in another state (or that the second person was living in another state and telecommuted in). Even with power down for me (or the entire state) we would still have someone available to provide support (unless by extremely unusual circumstances they were also down). With a remote DR site they could have effected whatever steps were necessary to bring that online and check that things were working ok.

Let’s look at this from another angle. I’m in the Pacific Time Zone and get to work usually 7:00-7:30am and leave 4:30-5:00pm. If we had someone working in the Eastern Time Zone and they worked the same hours we would have coverage (without emergencies handled by being on call) from 4am PST – 4:30pm PST. That’s an extra 3 hours of standard DBA coverage. Imagine how much more work could get done. How overnight problems could be identified and resolved earlier?

We live in a global society and the company that I work for does business internationally. Would it not make a lot more sense to provide a wider margin of coverage? Think about it, it would also improve the quality of life for anyone that had to be on call. For me the worst time to get called out is about 4am as I’ll never get back to sleep. If there was someone already in the office on the east coast I wouldn’t need to get called out. The same for them with evening plans. They could go out to dinner or a movie knowing that they wouldn’t have to worry about missing a call.




It’s the 21st century, there are many options to prevent the need for anyone to be physically tied to an office location (certainly amongst IT staff). In order to provide higher levels of support to companies they really need to embrace telecommuting options beyond providing emergency support.



Late Note: After 80 hours PSE were able to restore power to my neighborhood. Throughout the outage PSE did a fantastic job of handling things. Ultimately over 375,000 were without power at one time or another. PSE provided constant information wherever possible via their Twitter feed and website. They brought in crews from around the country to assist in the restoration efforts. All in all I think they handled things really well. I have no complaints with how they handled any of this. (I just wish I’d been closer to the top of the restore list)

How To Not Be a Commodity DBA – Part 1

A couple of months ago I wrote about the concept of the DBA as a commodity (DBAaaC), and how it can be so easy for management to consider any DBA as a basic, and replaceable resource within their organization. This got me to thinking (and it would seem that my thinking takes a long time) that maybe a little guidance would be in order to help prevent you from becoming a DBAaaC.

Not everyone is in the same situation, so I’ve broken this down in to a couple of parts, one for those of you that have the time, and inclination to get things done, and those of you that can barely find a half hour to themselves.

First up…when time for you isn’t a problem…

Continue reading “How To Not Be a Commodity DBA – Part 1”

The Nature of Absurdity

The other night I was adjusting some quorum settings for the WSFC that was underlying a couple of AGs at work. Quorum needed fixing up as we were going to perform a site failover while some network switches were updated on the primary site, and I didn’t want to risk the potential of an outage should a WAN link go down between the primary DC and the DR DC. Fortunately PS makes this pretty simple, I just ensured that quorum would be maintained by the votes in the DR DC by removing them from a couple of machines in the primary DC:

(Get-ClusterNode -Name MySQLNode1).NodeWeight=0;

(Get-ClusterNode -Name MySQLNode2).NodeWeight=0;


And then checking the votes on the WSFC to ensure that the DR site would stay up no matter what:

Get-ClusterNode | ft Name, NodeWeight -auto


Then I failed over the AG, I could have used T-SQL to do this in SSMS or SQLCMD:




But I’ve written a GUI interface to do this sort of thing in C#. This provides our NOC the ability to perform failovers themselves so that they can apply Windows CUs without having to call someone up to failover. The C# originally used T-SQL under the covers, but I’ve changed it recently so that it uses SMO (although have had to work around an issue using T-SQL):



The once the FEX and switch work was completed I added the votes back to the primary DC and failed everything back again.


It was while I was doing all this that I got to thinking about just how absurd all of this stuff is.

  • 20 years ago I would have laughed at the thought of all this random jargon and what it might mean
  • 15 years ago I would have laughed at the though of Microsoft creating a proper shell language, with them preferring dim heavy VBScript
  • 10 years ago I’d told you that you needed a disk quorum for your MSCS because otherwise you were going to have FCI problems with a machine was restarted
  • 5 years ago I’d have laughed at the idea of using AGs for production (that laughter has turned to tears many times)
  • 3 years ago the closest I would have gotten to C# would have been a tart orange
  • 2 years ago the concept of being a major contributor to a large open source project would have been laughable
  • 1 year ago I’d have given you an evil look at the merest mention of SMO


This stuff all moves so fast, it’s just ridiculous, there’s just no way to keep up with it all. Think of all the other changes over the last dozen years of SQL Server and Windows…things like DMVs, Azure, AWS, compression, TVPs, temp variables, CTEs, output clauses, lag and lead, partition by, partition tables, etc…

Spend a moment to think about all the stuff you’ve been through, for however long you’ve been doing this nonsense, and realize the absurd nature of it all.


By the way, this was deliberately acronym heavy, so here are some definitions:

  • AG – Availability Group
  • AWS – Amazon Web Services
  • CTE – Common Table Expression
  • DC – Data Center
  • DMV – Dynamic Management View
  • DR – Disaster Recovery
  • FEX – Cisco Fabric Extender (network stuffs)
  • FCI – Failover Cluster Instances
  • GUI – Graphical User Interface
  • MSCS – Microsoft Cluster Service (precursor to WSFC)
  • NOC – Network Operations Center (those folks who are the first in line when something goes bad, and also the ones that have the lousy job of waking you up in the middle of the night because something went wrong)
  • PS – PowerShell
    • ft – Format-Table (in Get-ClusterNode command)
    • -auto – automatically size data columns returned to the client
  • Quorum – a majority of votes in a WSFC that is required to ensure that the clustering services stay online to ensure that there is not a risk of a split-brain scenario which could cause massive data problems (oversimplification)
  • SMO – (SQL) Server Management Objects
  • SQL – Structured Query Language
  • SQLCMD – SQL Server Command Line Interface
  • SSMS – SQL Server Management Studio
  • T-SQL – Transact-SQL
  • TVP – Table-Valued Parameters
  • VBScript – Microsoft Visual Basic Scripting Edition
  • WAN – Wide Area Network
  • WSFC – Windows Server Failover Cluster

I’m pretty sure just going through all the SQL Server word salad is enough to drive a person to drink.

Bug – Getting Read-Routing Data Using SMO

I was futzing around with SMO recently, and wanted to grab information around the read-routing list in an Availability Group. Things went fine for 2012 and 2014, but when it came to SQL Server 2016 I ran into a problem.

The read-routing property in SMO is a string collection (reference https://msdn.microsoft.com/en-us/library/microsoft.sqlserver.management.smo.availabilityreplica.readonlyroutinglist.aspx), which is fine for 2012/4, but SQL Server 2016 introduced the idea of load-balanced read-routing. When used this provide you with the option of having multiple read-only replicas which can handle traffic, and are load-balanced (it only uses a basic round-robin algorithm, but that’s a lot better than the single replica option that exists in the earlier versions).  In order to correctly know how read-routing is configured you need to know what replicas are in a load-balanced group. SMO does not provide you with the ability to get at this information (nor does it give you the chance to configure read-routing in a load-balanced scenario), as it is a basic collection, and has no further property information around it.

I’ve created a Connect item (Gathering Read Routing information using SMO is inaccurate in SQL 2016) to try and get MS to look into this, and maybe provide a fix, please upvote if you can.

If you want to try this for yourself you can use the C# below, try it against the lower versions, and then on a load-balanced 2016 configuration. While it’s still possible to get this data from TSQL I find it annoying that the server management objects that are specifically written to deal with this stuff don’t do the job.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using Microsoft.SqlServer.Management.Smo;
using Microsoft.SqlServer.Management.Common;

namespace SmoTesting
class Program
static void Main(string[] args)
Console.WriteLine("Enter the servername");
string connectServer = Console.ReadLine();
Console.WriteLine("Enter the AG name");
string agName = Console.ReadLine();

Server srv = new Server();
srv = new Server(connectServer);
srv.ConnectionContext.StatementTimeout = 60; //timeout after 60 seconds running the query

foreach (AvailabilityGroup ag in srv.AvailabilityGroups)
if (ag.Name == agName)
foreach (AvailabilityReplica ar in ag.AvailabilityReplicas)
if (ar.Name.ToString() == "connectServer")
foreach (Object obj in ar.ReadonlyRoutingList)
Console.WriteLine(" {0}", obj);
catch (Exception ex)
Console.WriteLine("press a key");

DBA as a Commodity

If you’ve been working around the IT industry for a few years then you have probably heard mention of commodity servers. These are machines that run a standardized hardware spec, and are easily interchangeable with others. The price point around the machines is that it is typically cheaper to replace than it is to repair them, so once the warranty runs out you just throw the old one away and get a new one.

I was thinking about this the other day, and how it really goes beyond servers, which is where I thought about Database Administrators as a Commodity (or DBAaaC).

When first considering DBAaaC as an idea it really does not seem to make much sense. After all DBAs have to do all kinds of different work, and many have very different sets of skills that are applicable to a wide variety of situations.

If you are thinking that then the odds are that you are a DBA, or have worked closely with one over the years. This is not something that the vast majority of people, in particular IT management type folks seem to think.

What makes me believe this to be the case?

DBA Job Requirements

I’ve been in the DBA business for about 17 years now (why yes, that does make me feel old, thank you for asking). In this time I have had a few jobs, interviewed for many, and been an interviewer for even more. I have seen more generic DBA job desscriptions than I care to remember.

Generic Job Description
Typical job description

There are hundreds of jobs descriptions for SQL Server DBA jobs that look almost exactly like the one above, do a search yourself and see what commonalities seem to exist between roles. Think back to the last job description that you read, was it something similar to this?

Would it surprise you to learn that the above job description was actually copied from the job description of a company that was looking for an Oracle DBA?


How many jobs have you seen looking for someone holding an MCDBA? That’s a certification that existed through SQL Server 2000 and was replaced by the MCITP with 2005, and then with the MCSE Data Platform in the last couple of years. It would seem that with management team that these things are easily interchangeable, after all there really is not difference between running SQL Server 2000 running in a cluster, and SQL Server 2016 with a multi-subnet Availability Group running on a cluster that is not joined to a domain, using DMVs to help you identify performance bottlenecks.

What about jobs that are looking for an MVP? That’s an award given by Microsoft for community contributions. It is not a barometer of technical knowledge, and yet I have seen several DBA jobs looking for people who are MVPs.

Over the years I have seen many jobs that also will only consider people that have bachelor degrees, because apparently having a bachelor degree means that you will be good at your job. I do not have a degree (of any kind), and know several excellent DBAs that do not either, but every one of us would be excluded from these jobs just because we did not take four years of college (and we’ll ignore for now how that plays into systemic racism in the US).


Here are the general guidelines for DBAs that I’ve seen over the years:

  • Junior DBA – < 3 years of experience
  • DBA – 3-6 years of experience
  • Senior DBA – > 6 years of experience

These are usually baked right into the job descriptions, and it is how companies seem to manage peoples careers and job progressions (not to mention candidates looking for job seem to think that their skills put them in line for a job thanks to years working with SQL Server, not with actual knowledge).

This completely ignores whether a person looking for a DBA role has 5 years or varied experience, or 5 years doing the same thing day in and day out (checking if jobs ran, and answering business questions).

All the Technologies

Your generic job description will frequently be looking for someone with experience (if not expertise) with

  • SSIS
  • SSRS
  • DTS (yes really)
  • AlwaysOn (even though it’s not a thing)
  • MSCS (even though it was replaced by WSFC almost a decade ago)
  • VLDB (many places believe > 50GB = VLDB)
  • VMWare
  • Azure
  • TCP/IP
  • Windows 2003/2008/2008R2/2012/2012R2
  • TSQL
  • Maintenance Plans
  • Replication
  • and many more…

There tend to be long lists, like the above, for job roles. It is really just a list of buzzwords that the hiring manager (or the recruiting team) have put together so that they can plug them into their software to automatically weed out candidates that don’t have those words in their resumes.

The above examples are all ways to quickly recognize if a company is likely to treat anyone coming in as a DBAaaC.

What would be better?

A More Reasonable Job Description

Given the very generic information that is listed for almost all DBA positions what would make a role stand out? Usually it is something that is a little more crafted, and provides and understanding of the world in which the DBA works. For example a job that listed the following things:

  • Required 3+ years with Availability Groups
  • Understanding of DMVs and how to use them to troubleshoot server performance
  • Can tune queries to improve performance and lower server impact
  • Ability to manage SQL Server backups and restores to meet companies documented RPO/RTO
  • Experience managing maintenance for tables larger than 100GB in size
  • Knowledge of how Windows and Active Directory settings can impact SQL Server performance
  • Can work with development teams on best practice guidelines for new code, and provide performance enhancements when problems occur
  • Can troubleshoot SSIS packages
  • Preferred current MCSE certification

This by no means would be a perfect list, however it does provide some good information, and would let a potential candidate know that management here at least has a clue as to what is required to perform in the role, and provides a basis for a conversation.

Working as a DBA

Once you are in a company how can you recognize whether or not you are being treated as a DBAaaC?

Think about potential new hires that the company are looking to make. Are they using generic job descriptions?

Do promotions seem to happen thanks to longevity in the role, rather than be based upon some more hard based criteria around the work that you do? Is there a planned career path for you to move to the next position?

Do you perform the same work day in, and day out? Are you solely focused on backups, or indexing, or checking on failed jobs?

Is there a way that you can receive ongoing training to help you get additional knowledge? (even if this means just paying for some online videos, or giving you a section of time every month to spend on skills development)

Does your manager treat you like a truly valued member of the team, or are you just there ensuring that things are set to run smoothly?

If any of the above ring true to you then the chances are that you are seen as a commodity, and you can be thrown away at any point, and replaced by someone else (at least management believes that to be the case).

A Word of Advice to Management

There is every chance that you believe that the people that work for you are interchangeable, no matter the role (I’ve seen this over the years with developers, database, Windows, and Unix admins). People really aren’t, except at the lowest skill levels (and frequently even not then).

People are not machines, do not think that you can just throw the old one away and pick a new one up. Consider the ongoing learning that is required for your people to have skills that remain current, and help you move ahead. Certainly, at times you will have people leave, or you may have to let people go, but don’t treat your entire team as if they could be replaced at a moments notice.

If you think that all DBAs are the same, spend some time looking on Stack Exchange at some of the questions asked, and consider what questions on there your DBAs could answer (and what questions they may post). You’ll quickly see that DBAs are all different, they have different areas of focus, and each can give you different things. They are far from being a commodity.

Who Needs a DBA?

I was perusing my RSS Feeds this morning, and came across an article by Steve Jones (b|t) about Changing Times. In this post Steve talks about Microsoft not having any DBAs in their cloud team, and that it is all managed by DevOps folks. This message came from a talk that was given at SQLBits and summarized by Victoria Holt.

MS devops facts

Wow, this looks amazing, and really seems to ring the death knell for the DBA…or does it?

Let’s consider what we are looking at here, and for the purposes of this, I am going to focus on Azure SQL Database (or whatever we’re calling it this week).

There are 1.7 million production databases that do not have DBAs. Well, that’s flat out wrong, there might not be DBAs at Microsoft, but you can be pretty sure that there are DBAs out there managing a portion of those. I guess we should really focus on the Microsoft side of things.

So there are 1.7 million databases, what is Microsoft’s role in these databases? Other than keeping up the infrastructure that the databases are based upon not much really. They have no posted performance metrics to which they must aspire. They are not going to be in the business of ensuring that your backups are taken, your HA requirements are based upon you setting up the services and regions correctly, and they don’t care how fast or slow your query is. There is an SLA for uptime, and don’t worry, if it gets below 99% you get a 25% discount.

Given that there are no requirements to perform any work that a typical DBA would be required to do why would they need a DBA at all? They wouldn’t. It is this kind of information that makes people who work as DBAs question their existence and future, and frankly, there is no need to do that.

As a DBA you will find that the cloud will become a significant part of your future (if it is not already), and your role will shift, but there will always be a place for the on-premises products (for as long as it is made), and that means DBAs to manage it. Heck if you want to see what the world without DBAs looks like just skim through dba.stackexchange sometime.

I feel that a lot of messaging coming out of Microsoft puts a not insignificant level of fear into some people, as they attempt to set an example where the only people that really matter are developers, and developers sitting in a DevOps role. Is Microsoft going to change their tune around any of this? Nope, I don’t see that happening at any point. One thing you could question is, given that there are all these Azure databases that don’t have DBAs why is it that Microsoft has any DBAs working for them at all? Surely DevOps people would be all that they need.

Logon Triggers Do Not Prevent SSMS Connections

I recently saw a post on StackExchange where a user was having a problem with tempdb filling up, which was causing their logon trigger to fail.

The logon trigger was attempting to prevent users from connecting with SQL Server Management Studio (SSMS). It seems that the user was not aware that it is trivial to connect using SSMS to SQL bypassing the app level restriction in place.

Let’s look at an example.

First we’ll create the logon trigger that explicitly looks to see if the application attempting to logon is management studio:

IF @app LIKE N'%Management Studio%'
THROW 51000, 'Connection not allowed.', 1;



Now we can test this by opening a new window. We should see an error:


Looks like it’s working just fine.

So let’s circumvent this, try connecting again, this time we just need to edit the Additional Connection Parameters in the connection dialog and pass along a new application name.


And now the connection will work just fine, completely bypassing the trigger (if trying this, don’t forget to disable your trigger when done).