Improving Performance When Querying Multiple DMVs

A couple of days ago I posted a stored procedure (sp_GetAGInformation) which queried multiple DMVs to pull together a bunch of AvailabilityGroup information. If you took a look at the code you would see that it used a couple of CTEs (Common Table Expressions).

CTEs are a great way to do recursive work, and they can also greatly simplify reading code. A CTE without recursion is really nothing more than a subquery that is nicely wrapped. 

For example:

Basic CTE

Is the same thing as:

Basic Subquery

This can easily lead you down the path towards poor performance. It is quite easy to define a CTE once and use it multiple times, not realizing that every time you use the CTE then you are performing that subquery, meaning is has to be evaluated and executed. 

For smaller queries this is not usually a problem, but for larger queries and more importantly when working with DMVs this can be a serious performance problem.

The original version of sp_GetAGInformation used CTEs exclusively, and it seemed to run ok. Once it was deployed I found that every once in a while it would perform horrendously. The usual execution time was around a second, but sometimes it would run for 5 minutes or longer.  When I started to track down the performance details I came to discover that a lot of the DMVs are using Table Valued Functions under the covers. These can have inherent performance problems when it comes to joins as there are no associated statistics. 

sys.availability_groups plan

sys.dm_hadr_availability_replica_states plan

A rewrite to use Table Variables made a significant performance improvement and eliminated the one off crazy plans. To understand why let's take a look at the final plans comparing the Table Variable version of sp_GetAGInformation and the CTE based sp_GetAGInformation_Slow  (all plans pulled using the free SQL Sentry Plan Explorer)

sp_GetAGInformation_slow - click to expand

sp_GetAGInformation - click to expand

As you can see, the CTE based plan is large. It actually take a 3MB memory commit to run, and when it does so there are a large number of TVFs in there that get joined to other TVFs.

sp_GetAGInformation_slow more details - click to expand

I encourage you to pull the procs and take a look at the plans on your local test systems, and to test performance with CTEs versus Table Variables/Temp Tables. The difference might surprise you. But for sure, avoid using DMVs within CTEs in your queries.

sp_GetAGInformation - Updated

Yesterday I posted sp_GetAGInformation, a stored procedure for gathering information about configuration of Availability Groups. Based on feedback I've added an additional column to indicate the health state of the AG.

Download the updated version for this additional information.

Gathering AG Information - sp_GetAGInformation

The release of SQL 2012 brought about SQL Server AlwaysOn Availability Groups (AGs) as a new way to manage HA for databases.

With AGs came a whole lot of new DMVs to give you information. They also provided a nice dashboard which gives a view into the status of a particular AG

AG Dashboard

This can be quite useful, however it is missing a great deal of information, that as a DBA, I would find useful, like read routing and listener configurations. On top of that the dashboard only provides information on one of the AGs at a time. If you have more than one AG then you have to open up an entirely new dashboard.

This just wasn't working out for me, and so I wrote a stored procedure (sp_GetAGInformation) to provide me with the configuration information for all the AGs running on a server.

When executed it provides:

  • Availability Group Name
  • Listener Name (if exists)
  • Primary Replica Name
  • Automatic Failover Partner (if exists)
  • Sync Secondary Replicas (if any)
  • Async Secondary Replicas (if any)
  • Read Routing Replicas (if any, in routing order)
  • List of Databases in Availability Group

Results of executing sp_GetAGInformation

As you can quickly see in the above example the AGAdvWrks AG has a listener, an auto-failover partner and two servers in the read routing order. It also contains two databases. AGTestAG doesn't have any sync secondaries, or a listener, and only contains a single database.

If you have several AGs running in your environment this can be a real time saver. What's also great is to pull this data centrally and report against it.

For example, right now I have a PowerShell process that queries every server, pulls the data back to a central location and reports on any changes in the configuration (if a servers gets pulled out for some reason, or a database added or removed from an AG). This can be an a real timesaver, in particular when you need to connect to a primary, but aren't sure which server it is (given that neither SQLPS nor SSMS support multisubnet failover connection settings).

One of the limitations is that the data can only be obtained from the primary in an AG as certain sets of the data only reside there, and the read routing configuration can be (and should be) set differently on each server.

Give sp_GetAGInformation as try and let me know what you think. Any ideas for improvements are warmly welcomed.