Category: Availability Groups

Bug – Getting Read-Routing Data Using SMO

I was futzing around with SMO recently, and wanted to grab information around the read-routing list in an Availability Group. Things went fine for 2012 and 2014, but when it came to SQL Server 2016 I ran into a problem.

The read-routing property in SMO is a string collection (reference https://msdn.microsoft.com/en-us/library/microsoft.sqlserver.management.smo.availabilityreplica.readonlyroutinglist.aspx), which is fine for 2012/4, but SQL Server 2016 introduced the idea of load-balanced read-routing. When used this provide you with the option of having multiple read-only replicas which can handle traffic, and are load-balanced (it only uses a basic round-robin algorithm, but that’s a lot better than the single replica option that exists in the earlier versions).  In order to correctly know how read-routing is configured you need to know what replicas are in a load-balanced group. SMO does not provide you with the ability to get at this information (nor does it give you the chance to configure read-routing in a load-balanced scenario), as it is a basic collection, and has no further property information around it.

I’ve created a Connect item (Gathering Read Routing information using SMO is inaccurate in SQL 2016) to try and get MS to look into this, and maybe provide a fix, please upvote if you can.

If you want to try this for yourself you can use the C# below, try it against the lower versions, and then on a load-balanced 2016 configuration. While it’s still possible to get this data from TSQL I find it annoying that the server management objects that are specifically written to deal with this stuff don’t do the job.


using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using Microsoft.SqlServer.Management.Smo;
using Microsoft.SqlServer.Management.Common;

namespace SmoTesting
{
class Program
{
static void Main(string[] args)
{
Console.WriteLine("Enter the servername");
string connectServer = Console.ReadLine();
Console.WriteLine("Enter the AG name");
string agName = Console.ReadLine();


Server srv = new Server();
try
{
srv = new Server(connectServer);
srv.ConnectionContext.StatementTimeout = 60; //timeout after 60 seconds running the query

foreach (AvailabilityGroup ag in srv.AvailabilityGroups)
{
if (ag.Name == agName)
{
ag.PrimaryReplicaServerName.ToString());
foreach (AvailabilityReplica ar in ag.AvailabilityReplicas)
{
if (ar.Name.ToString() == "connectServer")
{
foreach (Object obj in ar.ReadonlyRoutingList)
{
Console.WriteLine(" {0}", obj);
}
}
}
}
}
}
catch (Exception ex)
{
Console.WriteLine(ex.InnerException.ToString());
}
finally
{
srv.ConnectionContext.Disconnect();
}
Console.WriteLine("press a key");
Console.Read();
}
}
}

Availability Groups Issue With 2016 CU2

SQL Server 2016 has been out for a few months now, with Cumulative Update 2 coming out in late September. Yesterday I was running into issues with deploying CU2 to one of my environments.

Typically, when running Availability Groups (AGs) you patch all of the secondary replicas, and then fail over to one of those which will then upgrade the user databases (SSISDB caveat not included). In this case after applying CU2 to one of the secondary replicas it was no longer able to communicate properly with the primary, and so was not synchronizing.

Looking in the SQL Server error log showed the following error:

An error occurred in a Service Broker/Database Mirroring transport connection endpoint, Error: 8474, State: 11. (Near endpoint role: Target, far endpoint address: ”)

This was indicative of an issue with the AG HADR endpoint (yes, still called a database mirroring connection).

Figuring that the connection issue was with the newly patch secondary I queried for the connection error on the secondary.

SELECT r.replica_server_name ,
 r.endpoint_url ,
 rs.connected_state_desc ,
 rs.last_connect_error_description ,
 rs.last_connect_error_number ,
 rs.last_connect_error_timestamp
 FROM sys.dm_hadr_availability_replica_states rs
 JOIN sys.availability_replicas r ON rs.replica_id = r.replica_id
 WHERE rs.is_local = 1;

The relevant column here being last_connect_error_description

An error occurred while receiving data: ‘10054(An existing connection was forcibly closed by the remote host.)’.

Having checked the error log and knowing that there were no login errors I knew that there was something else going on.

The servers in question were not fresh builds of SQL Server 2016, rather they had been upgraded from earlier versions, with the AGs being upgraded along the way. Older versions of SQL Server used the RC4 encryption algorithm on the endpoints, and so I was curious as to whether that had been changed as a part of any of the upgrade processes.

SELECT name ,
 type_desc ,
 state_desc ,
 role_desc ,
 is_encryption_enabled ,
 connection_auth_desc ,
 encryption_algorithm_desc
 FROM sys.database_mirroring_endpoints;

The relevant column being the encryption_algorithm_desc

RC4

I thought there was a chance that this was the issue, and wanted to change it, but in a non-breaking way (as there was another secondary replica that was using this algorithm).

Fortunately, the SQL Server team provided the ability to use more than one algorithm on and endpoint (or even none). By altering the endpoint I could specify a newer AES algorithm, with a fallback to RC4. All it required was an alter statement to be executed on each of the replicas.

ALTER ENDPOINT [Hadr_endpoint]
 STATE=STARTED
 AS TCP (LISTENER_PORT = 5022, LISTENER_IP = ALL)
 FOR DATA_MIRRORING (ROLE = ALL, AUTHENTICATION = WINDOWS NEGOTIATE
 , ENCRYPTION = REQUIRED ALGORITHM AES RC4)
 GO

As soon as this command was executed the AG picked up, the databases started synchronizing once more, and things were back to a happy state.

As a recommendation, check your endpoint encryption algorithms prior to applying any cumulative updates, or service packs to SQL Server, and ensure that they are current (use AES primarily). You also have the option to turn off encryption, but I wouldn’t recommend it.

TL;DR

If you are running an older RC4 encryption algorithm on your Availability Group or Database Mirroring endpoints you may lose connectivity when applying 2016 cumulative updates. Update to the newer AES algorithm to prevent this.

 

 

BUG with Availability Groups and sys.master_files

I recently came across a bug with SQL Server and Availability Groups whereby catalog view data is incorrectly reported on all secondary replicas.

This bug has the potential for putting the availability of your environment at risk as reporting around capacity could be calculated incorrectly.

Continue reading “BUG with Availability Groups and sys.master_files”