Tag: Availability Groups

The Deception of the Log Reuse Wait Type

One day you’re happily working away and get interrupted by an alert that says your transaction logs are getting full or maybe growing. Ever the vigilant person you logon, crank open SSMS or ADS and take a look at sys.databases to check out the log_reuse_wait_desc.

AVAILABILITY_REPLICA the DMV proudly advises. Availability Groups – the lifesaver of your organization and the bane of your nightmares.

You crack open a SQL script and run it against the primary to see which replica is running behind. Strangely all the replicas appear to be up to date. The send queue is hovering around 60. The redo queue is not greater than a thousand on any of the asynchronous replicas.

Unsure why this is happening you use the tried and trusted method of restarting the hadr endpoint to see if that kicks things into motion. Nothing is resolved. The log is geting fuller and fuller and you really don’t want that thing to grow.

What do you do next?

You may go around to the secondary replicas and remove them from the AG, one at a time, to see if they fix the problem. Alternatively, you go ahead and check the running sessions on each of the replicas to see if there’s something going on you haven’t considered.

Your query discovers that the AG replica in your DR site is still taking a backup from overnight. It should have completed hours ago, but for some reason it is still running. Wondering if it is related you kill the backup.

With the backup dead, suddenly the transaction log on your primary clears and you can get back to your interrupted nap hard work.

 

This is an example of a choice in messaging within SQL Server that makes it more difficult to find a solution.

In this case the problem of the transaction log not truncating would have ideally been marked as ACTIVE_BACKUP_OR_RESTORE, after all, it was a backup that was blocking the log truncation. But the folks that wrote the product felt AVAILABILITY_REPLICA was more appropriate because the backup was running on a different replica than the primary and that was the root location for the truncation issue.

So, be aware, messaging may not always accurately reflect problems that are occurring.

 

Bug – Getting Read-Routing Data Using SMO

I was futzing around with SMO recently, and wanted to grab information around the read-routing list in an Availability Group. Things went fine for 2012 and 2014, but when it came to SQL Server 2016 I ran into a problem.

The read-routing property in SMO is a string collection (reference https://msdn.microsoft.com/en-us/library/microsoft.sqlserver.management.smo.availabilityreplica.readonlyroutinglist.aspx), which is fine for 2012/4, but SQL Server 2016 introduced the idea of load-balanced read-routing. When used this provide you with the option of having multiple read-only replicas which can handle traffic, and are load-balanced (it only uses a basic round-robin algorithm, but that’s a lot better than the single replica option that exists in the earlier versions).  In order to correctly know how read-routing is configured you need to know what replicas are in a load-balanced group. SMO does not provide you with the ability to get at this information (nor does it give you the chance to configure read-routing in a load-balanced scenario), as it is a basic collection, and has no further property information around it.

I’ve created a Connect item (Gathering Read Routing information using SMO is inaccurate in SQL 2016) to try and get MS to look into this, and maybe provide a fix, please upvote if you can.

If you want to try this for yourself you can use the C# below, try it against the lower versions, and then on a load-balanced 2016 configuration. While it’s still possible to get this data from TSQL I find it annoying that the server management objects that are specifically written to deal with this stuff don’t do the job.


using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using Microsoft.SqlServer.Management.Smo;
using Microsoft.SqlServer.Management.Common;

namespace SmoTesting
{
class Program
{
static void Main(string[] args)
{
Console.WriteLine("Enter the servername");
string connectServer = Console.ReadLine();
Console.WriteLine("Enter the AG name");
string agName = Console.ReadLine();


Server srv = new Server();
try
{
srv = new Server(connectServer);
srv.ConnectionContext.StatementTimeout = 60; //timeout after 60 seconds running the query

foreach (AvailabilityGroup ag in srv.AvailabilityGroups)
{
if (ag.Name == agName)
{
ag.PrimaryReplicaServerName.ToString());
foreach (AvailabilityReplica ar in ag.AvailabilityReplicas)
{
if (ar.Name.ToString() == "connectServer")
{
foreach (Object obj in ar.ReadonlyRoutingList)
{
Console.WriteLine(" {0}", obj);
}
}
}
}
}
}
catch (Exception ex)
{
Console.WriteLine(ex.InnerException.ToString());
}
finally
{
srv.ConnectionContext.Disconnect();
}
Console.WriteLine("press a key");
Console.Read();
}
}
}

BUG with Availability Groups and sys.master_files

I recently came across a bug with SQL Server and Availability Groups whereby catalog view data is incorrectly reported on all secondary replicas.

This bug has the potential for putting the availability of your environment at risk as reporting around capacity could be calculated incorrectly.

Continue reading “BUG with Availability Groups and sys.master_files”