Change Tracking Cleanup – Example

In my last post I wrote about a limitation of Change Tracking and in particular a limitation in the cleanup process for the syscommittab table.

I thought that a quick example of seeing this in action might be useful in allowing you to check for the issue yourself.

 

First things first, we setup a database, enable CT, create and load up a table with rows.

 


CREATE DATABASE CTTest;
ALTER DATABASE CTTest SET RECOVERY SIMPLE;
ALTER DATABASE CTTest SET CHANGE_TRACKING = ON (CHANGE_RETENTION = 5 MINUTES, AUTO_CLEANUP = ON);
USE CTTest
GO
CREATE TABLE CTTable (C1 INT NOT NULL
,
C2 UNIQUEIDENTIFIER NOT NULL
,
C3 UNIQUEIDENTIFIER NOT NULL
,
CONSTRAINT PK_CTTable PRIMARY KEY CLUSTERED (C1)
);
GO
ALTER TABLE CTTable ENABLE CHANGE_TRACKING;
GO
CREATE TABLE CTLoader (C1 INT NOT NULL
,
C2 UNIQUEIDENTIFIER NOT NULL
,
C3 UNIQUEIDENTIFIER NOT NULL
,
CONSTRAINT PK_CTLoader PRIMARY KEY CLUSTERED (C1)
);
SET NOCOUNT ON;
DECLARE @i INT = 1
WHILE @i < 2000000
BEGIN
BEGIN TRAN
INSERT INTO
CTTable
SELECT @i, NEWID(), NEWID();
COMMIT TRAN
SELECT
@i += 1
END

With the table in place we create an XEvent session to capture when the cleanup runs for syscommittab.

CREATE EVENT SESSION [ChangeTrackingCleanup] ON SERVER
ADD EVENT sqlserver.syscommittab_cleanup
(
ACTION(
sqlserver.database_id
, sqlserver.sql_text
, sqlserver.session_id
, sqlserver.username
, sqlserver.client_hostname
, sqlos.task_time
, sqlserver.tsql_stack
, sqlserver.database_name
, package0.collect_system_time
)
WHERE (database_id=6)
)
ADD TARGET package0.event_file(SET filename=N'C:Program FilesMicrosoft SQL ServerMSSQL11.MSSQLSERVERMSSQLLogChangeTrackingCleanup.xel')
WITH (EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS)
GO
ALTER EVENT SESSION [ChangeTrackingCleanup] ON SERVER STATE = START
GO

 

With the session running we just need to wait a few minutes and pull in the XEvent data to see what’s been going on with the cleanup (HT for getting the XEM information to Jon Kehayias post)

DECLARE @path NVARCHAR(260), @mdpath NVARCHAR(260)
-- Get the log file name and substitute * wildcard in
SELECT
@path = LEFT(column_value, LEN(column_value)-CHARINDEX('.', REVERSE(column_value)))
+
'*'
+ RIGHT(column_value, CHARINDEX('.', REVERSE(column_value))-1)
FROM sys.dm_xe_sessions s
JOIN sys.dm_xe_session_object_columns soc
ON s.address = soc.event_session_address
WHERE s.name = 'ChangeTrackingCleanup'
AND soc.OBJECT_NAME = 'event_file'
AND soc.column_name = 'filename'
-- Get the metadata file name and substitute * wildcard in
SELECT
@mdpath = LEFT(column_value, LEN(column_value)-CHARINDEX('.', REVERSE(column_value)))
+
'*'
+ RIGHT(column_value, CHARINDEX('.', REVERSE(column_value))-1)
FROM sys.dm_xe_sessions s
JOIN sys.dm_xe_session_object_columns soc
ON s.address = soc.event_session_address
WHERE s.name = 'ChangeTrackingCleanup'
AND soc.OBJECT_NAME = 'event_file'
AND soc.column_name = ' metadatafile'
SELECT
row_number() OVER (
ORDER BY event_data.value('(event/@timestamp)[1]', 'datetimeoffset')) AS RowNum
, n.value('(data[@name="database_id"]/value)[1]', 'int') AS DatabaseId
, n.value('(data[@name="rows_deleted"]/value)[1]', 'bigint') AS CommitRowsDeleted
, CONVERT(datetime2(0), DATEADD(hour, -7, n.value('(data[@name="cleaned_upto_date"]/value)[1]', 'datetimeoffset'))) AS CleanedUpToDate
, event_data.value('(event/@name)[1]', 'nvarchar(200)') AS EventType
, CONVERT(datetime2(0), DATEADD(hour, -7, event_data.value('(event/@timestamp)[1]', 'datetimeoffset'))) AS EventTime
FROM (
SELECT
CAST(event_data AS XML) AS event_data
FROM
sys.fn_xe_file_target_read_file(@path, @mdpath, NULL, NULL)
WHERE
OBJECT_NAME = 'syscommittab_cleanup'
) AS tab
CROSS APPLY event_data.nodes('event') AS q(n)

syscommittab+cleanup

Running this shows that the cleanup process runs about every 62 seconds and only deletes 10,000 rows. Given that, there’s no way to be able to keep syscommittab under control if you run more than 14 million transactions through change tracked tables on a daily basis.

Change Tracking Cleanup Limitation

I’ve been “fortunate” enough to be working a great deal with Change Tracking recently and have run into a limitation that it’s worth keeping an eye out for on your servers.

If you have a database, with Change Tracking enabled, and the total number of transactions exceeds 14.4 million for all of the tables that are tracked in that database then you are going to be in for some hurt.

The problem is that the process to clean up the CT data in the (internal) syscommittab table only runs once a minute, and only cleans up a maximum of 10,000 rows every time it runs.

10000 (rows) x 60 (minutes an hour) x 24 (hours a day) = 14,400,000

Given that it cannot clear faster than that limitation the internal table will continue to grow over time and it will cause other cleanup processes to slow down, as well as your CT queries. This is definitely an issue in SQL 2012, I have no yet checked other versions.

Here’s a quick screen grab where you can see the cleanup and the number of rows cleaned up:

download (7)

 

You can see if you are running into this limitation by checking the age of the earliest syscommittab row and seeing if it exceeds your retention time

select datediff(hour, commit_time, getutcdate()) 
from sys.dm_tran_commit_table 
where commit_ts = (select min(min_valid_version) 
  from sys.change_tracking_tables)

 

You can also check the size and rowcount of the syscommittab table to gauge if it’s something you should be concerned with

selectt.Name as CTName
, sum(p.rows) as ChangeTrackingRows
, convert(numeric(10,2),(sum(u.total_pages)*8)/1024.0) as SizeMB
from sys.internal_tables t
join sys.partitions p on t.object_id = p.object_id
join sys.allocation_units u on (p.hobt_id = u.container_id OR p.partition_id = u.container_id)
left join sys.change_tracking_tables c on t.parent_object_id = c.object_id
where t.name = 'syscommittab'
group by t.name

 

I have a whole bunch more stuff coming on Change Tracking (including an undocumented workaround for this issue), but for now this is important to know.

Filtered Indexes on Computed Columns

I was trying to improve the performance of some code recently and decided that a filtered index could really help me out. Upon attempting to add the index I got an error stating that you cannot add filtered indexes to computed columns. This is the case even if the computed column is persisted.

That’s a shame as this would be a really useful thing to have. 

There is actually an open Connect item on this. It’s been open for 4 years now, so I’m not sure that there will ever be any traction on it, but feel free to upvote it yourself.

 

Code to repeat the problem below:

CREATE TABLE #SomeData 

    (

      RowID INT NOT NULL CONSTRAINT PKSomeData PRIMARY KEY

    , Quantity INT NOT NULL

    , Price MONEY NOT NULL

    , TotalCost AS (Quantity * Price) PERSISTED

    );

 

CREATE INDEX IDX_CostGreaterThan10 ON #SomeData (RowID) WHERE TotalCost > 10;

Msg 10609, Level 16, State 1, Line 1
Filtered index ‘IDX_CostGreaterThan10’ cannot be created on table ‘#SomeData’ because the column ‘TotalCost’ in the filter expression is a computed column. Rewrite the filter expression so that it does not include this column.

 

You Can’t Meet Your RPO/RTO With AlwaysOn

“That title may have caught your attention. AlwaysOn is the future of HA/DR for SQL Server, and has been since the release of SQL 2012.  

AlwaysOn is actually a marketing term which covers Failover Cluster Instances (FCIs) and Availability Groups (AGs). Allan Hirt (@sqlha | blog) is a strong proponent of ensuring that people understand what this actually means. So much so that he even ranted about it a little

I’ve used FCIs for years, going back to the active/passive clustering days of old, and I’ve used Availability Groups in the last few months. They are both great, and both have limitations: FCIs with their shared storage and AGs with some network and quorum oddities. 

Both of them will do a fine job for you if you have the time, patience, and in the case of AGs, money to get them up and running. They still will not allow you to meet your RPO/RTO though. 

Critical to your business and your users is your up time, and that’s where the Recovery Time Objective (RTO) and Recovery Point Objective (RPO)  come into play. They reflect amount of time it will take to get your services back up and running, as well as the level of data loss that you are willing to accept.

 

 

Where FCI/AG win

The key problem with FCI/AG is that they do everything that they can to ensure that transactions are kept as up to date as possible. With FCI you move an entire instance over to another node, everything committed goes with it. With AGs the log records are shipped to the secondaries and applied in a synchronous or asynchronous fashion. The asynchronous setting is designed to get transactions there as soon as possible, and great for longer distances or where low commit times are ultra-critical. Both of these solutions solve two problems…a hardware issue or a software issue.

What does that mean? If your server goes down, then you can failover and lose next to nothing and be back up and running quickly. If Windows goes out to lunch on one of the machines then you can failover and keep ticking along.

So where do they fall down? 

 

 

What FCI/AG cannot do

Let’s say there’s a code release and a table accidentally has an update run against it with no where clause. All of a sudden you have a table in a 500GB database which contains 3 million rows and all of the information is wrong. Your users cannot use the application, your help desk is getting call after call and you are stuck. 

Your only option here is to restore your backup and roll up your transaction logs to the point right before the update happened. You’ve done tests on this and know that it will take 120 minutes to get back to that point. Now you have a 2 hour outage and users are screaming, the CIO is at your desk wondering how this happened, and demanding you get the database back up sooner.

FCIs and AGs are not going to help you in this situation. That update is already committed and so failing over the instance won’t help. The transaction logs were hardened immediately on your synchronous partner and applied within 5 seconds on your asynchronous target.

So how has AlwaysOn helped you in this situation? It hasn’t. And while you can sit there cussing out Microsoft for pushing this solution that has this massive failing it’s not going to solve your problem. That’s why you need something more than AlwaysOn. 

 

 

You can pry Log Shipping from my cold dead hands

 

 “Log Shipping?” I hear you ask, “but that’s so old.”

It sure is. It’s old, it’s clunky, and it is perfect for the scenario I just mentioned.  

You can configure log shipping to delay writing transaction logs to remote servers. Let’s say you delay logs for 1 hour. That accidental mass update was performed, you realize that you are in trouble. You quickly apply the logs on the secondary to the point in time before the update, bring the database online and repoint your clients. You are back up again in 5 minutes. It’s a momentary issue. Sure, you have an outage, but that outage lasts a fraction of the time. Your help desk is not inundated with calls, your users aren’t left out in the cold for hours. 

There’s nothing to say that you have to delay applying those logs for an hour. It could be 2 hours, or even 24. It really all depends on how you want to handle things. 

Sure, you have to do manual failover, and you don’t have the ability for automatic page level restores from one of the synchronous AG secondaries, but you have a level of data resiliency that AlwaysOn does not provide you. 

 

So while AlwaysOn technologies are great, and you should absolutely use them to enhance HA/DR in your business, but you have to be aware of their limitations, and be sure to use other parts of SQL Server to ensure that you can keep your business running. 

In-Memory OLTP – Shiny, But Useful?

SQL 2014 introduces In-Memory OLTP to the product (although I prefer the name Hekaton).  It is a significant leap forward so far as the technology goes, and can lead to significant performance improvements when utilized, but is it going to be a technology for the masses? Maybe eventually, but right now I see limited use cases for it.

 

Limitations on compiled tables

First up there are limitations around what you can do with the In-Memory tables. There’s no option to perform a truncate or merge into one. You also cannot perform a cross-database join with one, or access it from a CLR module using context connections (see Accessing Memory-Optimized Tables Using Interpreted Transact-SQL).

Interested in using contained databases for portability? Forget using an In-Memory table then, just not going to happen. You can’t put one on a particular filegroup or partition either, or use computed columns. Like to use IDENTITY? Not any more. DRI? Nope, not supported, nor are clustered primary keys.

That being said you can do pretty much anything else with these tables, just so long as you don’t want to use a LOB data type, datetimeoffset, or potentially have a row overflow (see Supported Data Types). Then again, were that the case you shouldn’t be doing those kinds of things in your OLTP system. We’ve never seen anyone do that, right?

You could also have problems with transactions in these tables, and so you have to be very careful with setting retry logic

You might want to be careful when creating your tables too, because once they are there you can’t alter them, you actually have to drop and recreate it.

 

Limitations on indexes

Unique indexes are a no go, so are filtered ones. Want to drop or alter an index? Well that’s a table drop and recreate right there. Those null columns you have? Well you can’t have those in an index. Maybe it’s time to start listening to Joe Celko and getting rid of those pesky nullable columns.

 

Limitations on compiled procedures

On the bright side your developers will no longer be able to use cursors, so it’s not all doom and gloom. Set based items are a little more challenging though, you cannot do a multi-row insert using INSERT…VALUES, but that’s pretty minor, and just requires more typing.

You won’t be getting too recursive, CTEs aren’t supported, nor are subqueries. 

DELETE and UPDATE from aren’t allowed, nor are CASE statements. Forget about using UDFs, or the OUTPUT clause to get information back.

Want to get data from more than one table? UNION won’t get you there, nor will INTERSECT or EXCEPT. OUTER JOINS and APPLY aren’t supported either.

Filtering your results? Don’t expect to use ( OR IN (..)) as a construct or NOT, and you better be sure what you are looking for because LIKE isn’t allowed either. 

By no means is this a complete list, there are lots more limitations. 

 

So many limitations, such little time

In-Memory is fast, that’s been demonstrated, but there are so many restrictions around it’s usage that for most companies it is not going to be something that is even worth looking at. The use cases for it are probably too extreme.  Yes, it will get used, and used to great effect by some, but I think that we are looking at less than 1% of the install base could actually see true use cases and be able to adjust their development methods and code base to be able to support it (at least in this version).

So while this is an exciting breakthrough in performance you’ll have to excuse me while I keep working on tuning my code and indexes for performance, and setting up my AGs for DR.

 

 

 

For a full list of Unsupported Transact-SQL Constructs for In-Memory OLTP visit Technet

 

So Long MCM

The MCM cert is dead. So is the replacement MCSM and the MCA. That’s according to an email that was distributed tonight by Shelby Grieve, the Director of Certification Product Management. 

The end of the program comes in a short email that clearly talks of the end of the certs

Microsoft will no longer offer Masters and Architect level training rotations and will be retiring the Masters level certification exams as of October 1, 2013.

That’s pretty clear information right there.  

This from an email that came out at 10pm, the Friday of a long holiday weekend in the US. 

Microsoft appears to be attempting to bury this when people aren’t around, hoping that nobody will notice and that the furor will have calmed down some by the time people start checking back in on Tuesday.

 

Attaining an MCM is no easy feat. There is a reason that there are only a couple of hundred people with that cert. Taking this away really pains everyone who has worked so hard to get it. Those who have given up many hours, and for some many thousands of dollars. 

I guess it just means being a part of an exclusive club now, one that will not get any larger.  

That is a really sad thing, because there are people out there that should have this cert, that are actively working towards it, and now that is going to be taken away.

 

Follow Up 8/31/13 08:23

Jen Stirrup (blog | @jenstirrup) has created a Connect item “Please don’t get rid of the MCM and MCA programs“. If these are certs that you have any interest in at all I highly recommend you go and vote it up. Being honest, it won’t mean much, but it might at least let folks understand the deep level of dissatisfaction its killing has created. 

 

 

  

SSMS – Index Outside The Bounds Of An Array

Hit a strange issue this morning. I was working with one of the network guys testing out connecting to SQL behind an F5 load balancer, something I’ve done multiple times previously.

I was able to connect using SQLCMD, Invoke-SQLCmd and using a UDL source, but for some reason, every time I tried to connect using SSMS I would get an error stating that an “Index was outside the bounds of an array”.

SSMS+-+Index+out+of+bounds+on+an+array
Advanced Information for Error

A little research showed that this error cropped up when trying to connect to SQL 2012 from an earlier version of SSMS. This wasn’t the case here though, everything was SQL 2012, and I was able to connect to the server directly without any problems, it was only an error when going through the F5 VIP.

After a little work and research with the network admin we discovered that OneConnect was causing the issue. It was attempting to pool connections and was causing the problem, turning it off fixed the issue entirely.

Just something to keep an eye out for if you use F5’s to handle load balancing or DR client redirection.

Formatting Number Output

I’ve been working with some large data load processes recently and have been dumping upwards of a billion records into tables. As these are not instantaneous loads I found myself wanting to keep an eye on how much data was loaded.

This is nice and easy to do using the sys.partitions system view. Here’s an example from AdventureWorks

SELECT t.name, rows

FROM sys.partitions p

   INNER JOIN sys.tables t

       ON p.OBJECT_ID = t.OBJECT_ID

WHERE p.index_id IN (0,1)

ORDER BY rows DESC;

top+10

This is all good and well, but when you start getting to large values on the tables it becomes a little difficult to tell the difference between 31 million and 312 million. That’s when I resort back to using commas. Unfortunately SQL Server does not make this easy, although it can be accomplished with a couple of converts.

SELECT t.name, substring(convert(varchar, convert(money, rows), 1), 1, len(convert(varchar, convert(money, rows), 1))-3)

FROM sys.partitions p

   INNER JOIN sys.tables t

       ON p.OBJECT_ID = t.OBJECT_ID

WHERE p.index_id IN (0,1)

ORDER BY rows DESC

top+10+2

This is still a little arduous to type, so I created a function to do the work. Now it’s just a case of calling that function whenever I want to have the numbers presented to me in a much more readable format.

CREATE FUNCTION dbo.fnNumberComma (@iVal bigint)

RETURNS VARCHAR(30)

AS

BEGIN

DECLARE @oVal VARCHAR(30)


SELECT @oVal = SUBSTRING(CONVERT(VARCHAR(30), CONVERT(money, @iVal),1),1,LEN(CONVERT(VARCHAR(30),CONVERT(money,@iVal),1))-3)

RETURN (@oVal)

END

GO



SELECT t.name, dbo.fnNumberComma(rows) as [RowCount]

FROM sys.partitions p

   INNER JOIN sys.tables t

       ON p.OBJECT_ID = t.OBJECT_ID

WHERE p.index_id IN (0,1)

ORDER BY rows DESC;

top+10+3

Going Into Standby Using Powershell

Since moving to Windows 8 on my primary laptop I have found myself using PowerShell to perform shutdown and restart tasks. For me hitting the charms and going through the options to do a simple shutdown was just way too much work, especially when I could just hit the PowerShell icon and type stop-computer or restart-computer. 

One annoyance I found was that there was no pause-computer or equivalent to put the machine into standby, something very useful for a laptop.

Given that this was something I wanted to do I put something a new function in my profile to give me that ability. Now I can just type pause-computer and it will go into standby. Script below…

Function Pause-Computer 

    {

        Write-Output "Going to standby..."

        Start-Sleep -Seconds 1

        &"$env:SystemRootSystem32rundll32.exe" powrprof.dll,SetSuspendState Standby

    }

Developing To Specifications

I’m a DBA. As a class of people you will find that DBAs have a tendency to rant a little about developers. I would certainly be someone that you would find in that category. The trouble is that most of the time I don’t think that it is the developers fault, it is just a case of shooting the messenger.

As an example let’s look at a new database release that was being asked.

The biggest issue was that the code was provided so late in the cycle that the issues found (and there were a great many of them) could not be fixed. Why could they not be fixed? Because an arbitrary release data had been provided to the customer and this data could not slip for any reason whatsoever. Artificial deadlines, one of the worst things that devs and DBAs have to deal with.

The developers agreed to get the code fixes added to their backlog and to get it into a sprint for fixing in the next month. So after much discussion and a firm commitment we decided to move ahead with the release.

My next question to the dev team was “how are you going to get the data out? You have a few procs here for loading data into tables, but nothing for being able to consume that data afterwards.”

The response was a little stunning:

The only requirement to get done by is to have the data written to a database. After this deploy, we are going to create a way to get the data out

Outstanding. Way to develop to requirements.

In this instance I cannot really place blame on the dev team, they are just doing what is asked of them. I think the bigger problem is sitting with the folks who are gathering the business requirements and translating those, along with timelines, up to the business.

I think that it might be time for DBAs to stop pointing fingers at devs and start holding accountable those who are causing these problems, namely the Business Analysts and Project Managers.

Who’s with me on this?