Tag: SQL

Using STRING_AGG to Return Schema Information

A few weeks ago, someone came to me wanting a dump of tables in a database and all the columns as well. They were working on some data governance and the first step was to know roughly what existed in the database.

After a chat, we decided what they needed was a list of columns and all the tables where those columns were used and another list of tables with a list of the columns used in the table. To help import the data into their tooling they needed a comma separated list of the tables for each column and the columns for each table.

After spending many years using various forms of FOR XML to do this sort of thing, I decided it was time to make my life easier and use STRING_AGG instead.

STRING_AGG can take a set of data and use the provided delimiter to create a list. As a bonus, you can also use the WITHIN GROUP to order the list of items within the STRING_AGG statement.
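
As a minimal, self-contained sketch of that behavior (the category and item values below are made up for illustration, and STRING_AGG needs SQL Server 2017 or later):

```sql
-- Hypothetical sample data supplied inline, so nothing needs to exist in the database.
SELECT
      v.Category
    , STRING_AGG(v.Item, ', ') WITHIN GROUP (ORDER BY v.Item) AS ItemList
FROM (VALUES
        ('fruit', 'pear')
      , ('fruit', 'apple')
      , ('veg',   'carrot')
     ) AS v (Category, Item)
GROUP BY v.Category;
-- fruit -> 'apple, pear'
-- veg   -> 'carrot'
```

The WITHIN GROUP clause is what puts 'apple' ahead of 'pear'; without it the order of items inside each list is not guaranteed.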

Here’s a quick example of how I used STRING_AGG to return the table and column data requested by the governance folks.

/* Return a list of columns in the database and, for each column, a comma-delimited list of the tables in which that column appears */
SELECT
      c.name AS ColumnName
    , STRING_AGG(CONCAT(QUOTENAME(s.name), '.', QUOTENAME(t.name)), ', ')
        WITHIN GROUP (ORDER BY CONCAT(QUOTENAME(s.name), '.', QUOTENAME(t.name))) AS TableListForColumn
    , COUNT(*) AS TableUsageCountForColumn
FROM sys.tables AS t
JOIN sys.columns AS c ON c.object_id = t.object_id
JOIN sys.schemas AS s ON s.schema_id = t.schema_id
GROUP BY c.name
ORDER BY TableUsageCountForColumn DESC
    , ColumnName ASC;

/* Return a list of tables in the database and, for each table, a comma-delimited list of the columns in that table */
SELECT
      CONCAT(QUOTENAME(s.name), '.', QUOTENAME(t.name)) AS TableName
    , STRING_AGG(c.name, ', ')
        WITHIN GROUP (ORDER BY c.name) AS ColumnList
    , COUNT(*) AS CountColumnsInTable
FROM sys.tables AS t
JOIN sys.columns AS c ON c.object_id = t.object_id
JOIN sys.schemas AS s ON s.schema_id = t.schema_id
GROUP BY CONCAT(QUOTENAME(s.name), '.', QUOTENAME(t.name))
ORDER BY CountColumnsInTable DESC
    , TableName ASC;
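
One thing to watch for on wide schemas: the names in the catalog views are sysname (nvarchar(128)), and when the input is not a MAX type STRING_AGG returns a non-MAX type too, so a very long list can fail with an error saying the aggregation result exceeded the 8000 byte limit. A hedged variant of the table query that should sidestep this by casting the input to nvarchar(max) (the cast is my addition, not part of the queries above):

```sql
SELECT
      CONCAT(QUOTENAME(s.name), '.', QUOTENAME(t.name)) AS TableName
      -- Casting the input to nvarchar(max) makes the aggregated result nvarchar(max) as well.
    , STRING_AGG(CAST(c.name AS nvarchar(max)), ', ')
        WITHIN GROUP (ORDER BY c.name) AS ColumnList
    , COUNT(*) AS CountColumnsInTable
FROM sys.tables AS t
JOIN sys.columns AS c ON c.object_id = t.object_id
JOIN sys.schemas AS s ON s.schema_id = t.schema_id
GROUP BY CONCAT(QUOTENAME(s.name), '.', QUOTENAME(t.name))
ORDER BY CountColumnsInTable DESC
    , TableName ASC;
```

The same cast can be applied to the CONCAT expression in the column query if a single column name is shared by a very large number of tables.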

BUG with Availability Groups and sys.master_files

I recently came across a bug with SQL Server and Availability Groups whereby catalog view data is incorrectly reported on all secondary replicas.

This bug has the potential for putting the availability of your environment at risk as reporting around capacity could be calculated incorrectly.


Getting the SQL Server Install Path

One of the challenges that comes with a large number of SQL Servers that aren’t always built out in a standard fashion is that the correct antivirus (AV) exclusions are often not set, leading to performance problems (and other issues).

Going through and grabbing the location of the SQL Server executable, or the directory in which it resides, can be cumbersome. Doing it once is a pain; doing it 30 times multiplies that pain by a factor of about 400.

A quick query run against your servers (or against server groups) can return results for all of your machines and help you get on your way again.
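
As a rough stand-in for that kind of query (this is an assumption on my part and not necessarily the query from the full post), the sys.dm_server_services DMV exposes the path of each service executable on SQL Server 2008 R2 SP1 and later. It needs VIEW SERVER STATE permission:

```sql
SELECT
      @@SERVERNAME AS ServerName
    , servicename  AS ServiceName
    , filename     AS ExecutablePath  -- path to the service executable; can include quotes and startup switches
FROM sys.dm_server_services;
```

Because the first column is the server name, the output stays readable when the query is run against a group of servers at once.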


Gathering AG Information – sp_GetAGInformation

The release of SQL 2012 brought about SQL Server AlwaysOn Availability Groups (AGs) as a new way to manage HA for databases.

With AGs came a whole lot of new DMVs to give you information. They also provided a nice dashboard which gives a view into the status of a particular AG.
AG Dashboard

This can be quite useful; however, it is missing a great deal of information that, as a DBA, I would find useful, like read routing and listener configurations. On top of that, the dashboard only provides information on one AG at a time. If you have more than one AG then you have to open up an entirely new dashboard.

This just wasn’t working out for me, and so I wrote a stored procedure (sp_GetAGInformation) to provide me with the configuration information for all the AGs running on a server.

When executed it provides:

  • Availability Group Name
  • Listener Name (if exists)
  • Primary Replica Name
  • Automatic Failover Partner (if exists)
  • Sync Secondary Replicas (if any)
  • Async Secondary Replicas (if any)
  • Read Routing Replicas (if any, in routing order)
  • List of Databases in Availability Group

Results of executing sp_GetAGInformation

As you can quickly see in the above example the AGAdvWrks AG has a listener, an auto-failover partner and two servers in the read routing order. It also contains two databases. AGTestAG doesn’t have any sync secondaries, or a listener, and only contains a single database.

If you have several AGs running in your environment this can be a real time saver. What’s also great is to pull this data centrally and report against it.

For example, right now I have a PowerShell process that queries every server, pulls the data back to a central location and reports on any changes in the configuration (if a server gets pulled out for some reason, or a database is added to or removed from an AG). This can be a real timesaver, in particular when you need to connect to a primary but aren’t sure which server it is (given that neither SQLPS nor SSMS support multisubnet failover connection settings).

One of the limitations is that the data can only be obtained from the primary in an AG as certain sets of the data only reside there, and the read routing configuration can be (and should be) set differently on each server.
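
To give a feel for the catalog views and DMVs involved, here is a much-simplified sketch that lists each AG with its listener and replicas. It is not the source of sp_GetAGInformation, just an illustration under the assumption that it is run on the primary:

```sql
SELECT
      ag.name                   AS AvailabilityGroupName
    , agl.dns_name              AS ListenerName        -- NULL when the AG has no listener
    , ar.replica_server_name    AS ReplicaName
    , ars.role_desc             AS CurrentRole         -- PRIMARY or SECONDARY
    , ar.availability_mode_desc AS AvailabilityMode    -- SYNCHRONOUS_COMMIT or ASYNCHRONOUS_COMMIT
    , ar.failover_mode_desc     AS FailoverMode        -- AUTOMATIC or MANUAL
FROM sys.availability_groups AS ag
JOIN sys.availability_replicas AS ar
    ON ar.group_id = ag.group_id
LEFT JOIN sys.dm_hadr_availability_replica_states AS ars
    ON ars.replica_id = ar.replica_id
LEFT JOIN sys.availability_group_listeners AS agl
    ON agl.group_id = ag.group_id
ORDER BY ag.name
    , ar.replica_server_name;
```

This returns one row per replica (and per listener, if an AG has more than one), whereas the procedure rolls everything up to a single row per AG.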

Give sp_GetAGInformation a try and let me know what you think. Any ideas for improvements are warmly welcomed.

Traffic Flow With Read-Intent Routing

One of the big advantages to using SQL Server Availability Groups is the ability to automatically push read traffic over to a secondary server. This is particularly useful for larger queries that would take a few seconds to run and consume large amounts of resources. It’s not something recommended for short, fast queries, just because the additional latency of connecting to the secondary could slow down the overall response time for the query.

The Microsoft documentation on setting up Read-Only Routing in SQL AGs is pretty solid and explains how to get this up and running.

 

Firewall and traffic routing

In secure environments there is usually a firewall that resides between the front end web, application or mid-tier servers and the backend database server. This firewall would block all traffic to the backend except for specific ports to specific IP addresses. This is one of the defense in depth items that helps to keep your databases secure.

When using a firewall in conjunction with SQL Server Availability Groups (AGs) it is common to just open up the firewall to the AG Listener. That way there is a single IP open for all the database servers that reside in the AG and any machine that is not acting as the AG primary is not available through the firewall (reducing attack vectors again, a nice side effect).

Given this you might well expect that when routing traffic off to a readable secondary in the AG that it would follow the flow of:

[Image: diagram of the expected traffic flow]

Here the client (either directly or through a web, app, or mid-tier server) performs an action that runs a read query against the AG Listener. Going by the IP addresses we would see (the AG Listener actually connects to the primary, in this case SQL1), the expected traffic flow would be:

Client – AG Listener – Readable Secondary – AG Listener – Client
so
Client – SQLAGL01 – SQL2 – SQLAGL01 – Client

This way the primary server (in this case SQL1) would arbitrate all the traffic for the query that comes in. In fact, read routing does not function this way.

In order to reduce the load on the primary, the primary actually tells the client to redirect to the secondary server, and so the process goes:

[Image: diagram of the actual read-intent traffic flow]

The correct communication is:

Client – AG Listener – Secondary – AG Listener – Client – Secondary – Client
or
Client – SQLAGL01 – SQL2 – SQLAGL01 – Client – SQL2 – Client

When the client request comes in, SQL Server has to check that the readable secondary is available to accept the query (otherwise it will go to the next server in the routing list, which is why you should always have the primary as the last server in the routing list, just in case every other server is out of service).

This means the query will take a little longer to execute, as the arbitration and network changes take additional milliseconds to complete (which is why it is not ideal for small, fast selects).
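
A routing list with the primary as the last entry can be sketched as follows. SQL1 and SQL2 are the servers from the example above; the AG name MyAG, the example.com domain and port 1433 are placeholders I made up, so substitute your own. The statements are run on the primary:

```sql
-- Let SQL2 accept read-intent connections when it is a secondary.
ALTER AVAILABILITY GROUP [MyAG]
MODIFY REPLICA ON N'SQL2'
WITH (SECONDARY_ROLE (ALLOW_CONNECTIONS = READ_ONLY));

-- Publish the URL that clients are redirected to for each replica.
ALTER AVAILABILITY GROUP [MyAG]
MODIFY REPLICA ON N'SQL2'
WITH (SECONDARY_ROLE (READ_ONLY_ROUTING_URL = N'TCP://SQL2.example.com:1433'));

ALTER AVAILABILITY GROUP [MyAG]
MODIFY REPLICA ON N'SQL1'
WITH (SECONDARY_ROLE (READ_ONLY_ROUTING_URL = N'TCP://SQL1.example.com:1433'));

-- While SQL1 is the primary: try SQL2 first, then fall back to SQL1 itself.
ALTER AVAILABILITY GROUP [MyAG]
MODIFY REPLICA ON N'SQL1'
WITH (PRIMARY_ROLE (READ_ONLY_ROUTING_LIST = (N'SQL2', N'SQL1')));
```

A matching routing list would normally be defined for SQL2 as well, so that routing keeps working after a failover.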

 

Where does the firewall come in?

Using a firewall and only opening up the IP of the Listener is the best way to handle security, but if you want to use readable secondary servers and read-intent routing, that’s not going to work. Due to the way that the traffic is routed, you would need to open up the firewall to each individual server and port that could act as a secondary.

So in our above example the firewall would need to be opened to SQLAGL01, SQL1 & SQL2 in order to support client requests. If those rules aren’t opened then your client traffic will be blocked and you’ll get the dreaded “Named Pipes Provider: Error 40” error, which isn’t much of a help.
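
To work out which endpoints need firewall rules, the configured routing lists and their URLs can be read from the catalog views. This is a sketch rather than a complete audit, and it only returns rows where a routing list has been defined:

```sql
SELECT
      ag.name                  AS AvailabilityGroupName
    , pr.replica_server_name   AS WhenPrimaryIs
    , rl.routing_priority      AS RoutingPriority
    , sr.replica_server_name   AS RoutesReadsTo
    , sr.read_only_routing_url AS RoutingUrl  -- host and port the client connects to directly
FROM sys.availability_read_only_routing_lists AS rl
JOIN sys.availability_replicas AS pr
    ON pr.replica_id = rl.replica_id
JOIN sys.availability_replicas AS sr
    ON sr.replica_id = rl.read_only_replica_id
JOIN sys.availability_groups AS ag
    ON ag.group_id = pr.group_id
ORDER BY ag.name
    , pr.replica_server_name
    , rl.routing_priority;
```

Every distinct host and port in the RoutingUrl column is somewhere a client may be redirected, so each one needs to be reachable through the firewall.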

 

Testing your read-intent connections

A really useful way of testing your read-intent connections is to use a quick PowerShell script from your front end server (if running Windows) prior to putting it into rotation. Download Check-ReadRouting.PS1 and enter the AG Listener name, or IP Address and the name of a database in the AG. If things are working correctly it will return the name of the primary and first server in your read-only routing list.

If you get a timeout then you have either not set the read-intent URL correctly for your secondary, or you are having firewall issues connecting, and so should investigate further.
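
If PowerShell is not available on the client, a similar sanity check can be done from any tool that lets you set the connection string. This sketch is not the contents of Check-ReadRouting.PS1; it assumes the connection was made to the listener with ApplicationIntent=ReadOnly and with an initial database that is in the AG, since routing only applies when both are set:

```sql
SELECT
      @@SERVERNAME AS ConnectedServer
    , DB_NAME()    AS CurrentDatabase
    , DATABASEPROPERTYEX(DB_NAME(), 'Updateability') AS Updateability;
```

If routing worked, ConnectedServer should be the secondary rather than the primary, and Updateability should read READ_ONLY.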

 

Read-routing can be really powerful and useful; you just have to be careful of the gotchas in getting it working correctly.

 

SHOWPLAN Permission Denied For A Sysadmin

I came across a curious issue the other day when trying to look at the actual execution plan of a stored procedure that was being developed. Strangely I couldn’t get an estimated plan either. All I got was the confusing error:

 

SHOWPLAN permission denied in database ‘TestDB’

As a sysadmin on the system in question I was very confused by this and it took a little bit of investigation to find out that the problem was actually being caused by the execution context of the stored procedure. 

Here’s a quick repro:

 

First, create a new user and table, then load that table up with a few rows (just for demo purposes).

IF (SELECT name FROM sys.database_principals WHERE name = 'TestUser') IS NULL
    CREATE USER [TestUser] WITHOUT LOGIN;
GO

IF OBJECT_ID(N'dbo.SomeData', N'U') IS NULL
BEGIN
    CREATE TABLE dbo.SomeData
        (
          Id INT IDENTITY(1,1) NOT NULL
        , NothingNess CHAR(10) DEFAULT ''
        , CONSTRAINT PK_SomeData PRIMARY KEY (Id)
        );
END
GO

INSERT INTO dbo.SomeData DEFAULT VALUES;
GO 100

Now we’ll create a couple of procs to pull data from the table. The first will execute under the context of the calling user, and the other under the context of the user TestUser that was just created.

CREATE PROCEDURE dbo.GetSomeData @Id INT
AS
SELECT Id, NothingNess
FROM dbo.SomeData
WHERE Id = @Id;
GO

CREATE PROCEDURE dbo.GetSomeDataTestUser @Id INT
WITH EXECUTE AS 'TestUser'
AS
SELECT Id, NothingNess
FROM dbo.SomeData
WHERE Id = @Id;
GO

Now we can execute both of these procedures and get results just fine.

 

EXEC dbo.GetSomeData 1;
EXEC dbo.GetSomeDataTestUser 1;

However, if we attempt to run them while grabbing the actual execution plan, or attempt to grab an estimated plan, then the second call (dbo.GetSomeDataTestUser) will fail with the SHOWPLAN permission denied error.

What happens is that when the stored procedure runs it does so under the context of TestUser, and that user does not have SHOWPLAN permissions in the database. This means that you cannot display an execution plan, nor could you gather TIME or IO statistics.

This didn’t make much sense to me at first, just because I was a sysadmin, but after thinking for a while it is actually the correct way to handle things. All the permissions are contained for execution of the code within TestUser. So if TestUser does not have the rights to do something, or access something then the calling user would not be able to get to that information.

As a workaround, all that needs to happen is to grant the SHOWPLAN permission to TestUser and everything will be fine. For example:

GRANT SHOWPLAN TO TestUser;

It’s worth noting that this permission is database scoped, so don’t expect to execute it in one database and have it work everywhere.
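
To confirm whether the user behind EXECUTE AS holds the permission in the current database, a check along these lines can help. It is a sketch that assumes the TestUser from the repro and that you are allowed to impersonate that user (a sysadmin is):

```sql
EXECUTE AS USER = N'TestUser';

-- Returns 1 if the impersonated user has SHOWPLAN in the current database, otherwise 0.
SELECT HAS_PERMS_BY_NAME(DB_NAME(), N'DATABASE', N'SHOWPLAN') AS HasShowplan;

REVERT;
```

Run it once before and once after the GRANT to see the value flip from 0 to 1.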

Technet has a good write-up about all the things covered by SHOWPLAN; it is worth a read for more information.


TL;DR – If you get SHOWPLAN denied errors, check that the context for the user actually executing the query has the requisite permissions, and grant SHOWPLAN if needed.

SQL Server 2012 Data Loss Bug

Microsoft have just posted a hotfix for a very serious bug in SQL Server 2012.

This bug could potentially cause you to lose data during online index rebuilds.

 

At this time I would not recommend downloading the hotfix (wait for a CU which will address the problem). In the meantime follow the posted workaround which is to perform online index rebuilds with a MAXDOP of 1. For example:

 

ALTER INDEX <idxname> ON <objectname> REBUILD WITH (ONLINE=ON, MAXDOP=1);

 

Change Tracking Cleanup Limitation

I’ve been “fortunate” enough to be working a great deal with Change Tracking recently and have run into a limitation that it’s worth keeping an eye out for on your servers.

If you have a database with Change Tracking enabled, and the total number of transactions across all of the tracked tables in that database exceeds 14.4 million per day, then you are going to be in for some hurt.

The problem is that the process to clean up the CT data in the (internal) syscommittab table only runs once a minute, and only cleans up a maximum of 10,000 rows every time it runs.

10000 (rows) x 60 (minutes an hour) x 24 (hours a day) = 14,400,000

Given that it cannot clear rows any faster than that, the internal table will continue to grow over time, and it will cause other cleanup processes to slow down, as well as your CT queries. This is definitely an issue in SQL 2012; I have not yet checked other versions.
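
A rough way to see how close a database is to that ceiling is to count the commits recorded over the last 24 hours and put the number next to the daily cleanup capacity. This is a sketch that takes the 10,000 rows per minute figure above as given and assumes the last day is representative:

```sql
SELECT
      COUNT_BIG(*)    AS CommitsLast24Hours
    , 10000 * 60 * 24 AS MaxRowsCleanedPerDay  -- 14,400,000, from the calculation above
FROM sys.dm_tran_commit_table
WHERE commit_time >= DATEADD(HOUR, -24, GETUTCDATE());
```

If CommitsLast24Hours is at or above MaxRowsCleanedPerDay, the cleanup cannot keep up and syscommittab will keep growing.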

Here’s a quick screen grab where you can see the cleanup and the number of rows cleaned up:

[Image: screen grab of the cleanup process and the number of rows cleaned up]

 

You can see if you are running into this limitation by checking the age of the earliest syscommittab row and seeing if it exceeds your retention time:

select datediff(hour, commit_time, getutcdate()) 
from sys.dm_tran_commit_table 
where commit_ts = (select min(min_valid_version) 
  from sys.change_tracking_tables)
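
The retention time to compare that age against is stored per database and can be read from sys.change_tracking_databases. A small sketch:

```sql
select db_name(database_id) as DatabaseName
, is_auto_cleanup_on
, retention_period
, retention_period_units_desc -- MINUTES, HOURS or DAYS
from sys.change_tracking_databases
```

The query above reports the age in hours, so convert the retention period to hours before comparing the two.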

 

You can also check the size and rowcount of the syscommittab table to gauge if it’s something you should be concerned with:

select t.name as CTName
, sum(p.rows) as ChangeTrackingRows
, convert(numeric(10,2),(sum(u.total_pages)*8)/1024.0) as SizeMB
from sys.internal_tables t
join sys.partitions p on t.object_id = p.object_id
join sys.allocation_units u on (p.hobt_id = u.container_id OR p.partition_id = u.container_id)
left join sys.change_tracking_tables c on t.parent_object_id = c.object_id
where t.name = 'syscommittab'
group by t.name

 

I have a whole bunch more stuff coming on Change Tracking (including an undocumented workaround for this issue), but for now this is important to know.

Filtered Indexes on Computed Columns

I was trying to improve the performance of some code recently and decided that a filtered index could really help me out. Upon attempting to add the index I got an error stating that you cannot add filtered indexes to computed columns. This is the case even if the computed column is persisted.

That’s a shame as this would be a really useful thing to have. 

There is actually an open Connect item on this. It’s been open for 4 years now, so I’m not sure that there will ever be any traction on it, but feel free to upvote it yourself.

 

Code to reproduce the problem is below:

CREATE TABLE #SomeData
    (
      RowID INT NOT NULL CONSTRAINT PKSomeData PRIMARY KEY
    , Quantity INT NOT NULL
    , Price MONEY NOT NULL
    , TotalCost AS (Quantity * Price) PERSISTED
    );

CREATE INDEX IDX_CostGreaterThan10 ON #SomeData (RowID) WHERE TotalCost > 10;

Msg 10609, Level 16, State 1, Line 1
Filtered index ‘IDX_CostGreaterThan10’ cannot be created on table ‘#SomeData’ because the column ‘TotalCost’ in the filter expression is a computed column. Rewrite the filter expression so that it does not include this column.
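
One partial substitute, offered as a sketch and not as an equivalent, is a regular (unfiltered) index on the persisted computed column itself. It covers every row, so it is larger than a filtered index would be, but it still lets a predicate on TotalCost use an index. The table and index names here are made up so the script does not clash with the repro above:

```sql
CREATE TABLE #SomeDataWorkaround
    (
      RowID INT NOT NULL PRIMARY KEY
    , Quantity INT NOT NULL
    , Price MONEY NOT NULL
    , TotalCost AS (Quantity * Price) PERSISTED
    );

-- Unfiltered index keyed on the computed column; RowID comes along as the clustering key.
CREATE INDEX IDX_TotalCost ON #SomeDataWorkaround (TotalCost);

-- A range predicate on the computed column can now seek on the index.
SELECT RowID, TotalCost
FROM #SomeDataWorkaround
WHERE TotalCost > 10;
```

Whether the extra size is acceptable depends on how selective the filter would have been.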