Tag: SQLBP

How To Interview–A Quick Guide

This post was originally going to be called “Interview Dos and Don’ts” but I didn’t want it to get confused with interviewing on DOS, or to start a grammar war on the use of Do’s or Dos.

I had the chance to interview a few candidates for a senior developer position today. Each of the interviewees had more than 12 years’ experience developing for SQL Server, and each had a very different style in the interview process.

Rather than focus on specific questions and answers in this post I actually want to look at how each candidate handled themselves in general and in response to questions. All the interviews here were performed by our DBA team of three people.

 

Candidate 1

The first interviewee had a great deal of experience and was already seated when we walked in the room. Rather than stand and greet everyone the candidate stayed in his chair and just reached out his hand. Not a good start.

I figure that everyone has a quick 30 second “this is what I can do” spiel, and so that was the first thing asked. In that situation I would expect someone to give a very brief overview of their knowledge and experience. For some unknown reason this person decided to give a life biography in a rather condescending manner.

Technical questions followed. Each one of those was met with the kind of attitude that indicated “I’m better than you”. Many of the answers were wrong. When asking for clarification on answers we were frequently met with a wave of the hand and “I don’t really know”.

At one point a question was asked and the follow up used some obvious logic to disprove the original answer. At this point I would expect the candidate to change one of the answers. They didn’t. A new and third answer appeared that completely (apparently) disproved the logic. Pretty stunning.

After what was an all too long period of interviewing this person, we let them go and moved on to the next person.

 

Candidate 2

Our second victim interviewee immediately stood and greeted us when we walked in. He had a nervous smile on his face as he shook hands and introduced himself.

The elevator pitch went OK, and there was none of the obnoxious behavior of the first candidate. Then we started in on the technical questions and things began going a little sour.

I can understand folks getting a little nervous when interviewing, but this candidate seemed to go to pieces. Questions were frequently answered with “I know this, but I can’t remember” and a panicked look. With this person so obviously flustered we tried to get them back on track by throwing out a couple of softball questions. At this point I think we’d lost the candidate entirely and brain panic took over.

At the end of the interview he almost seemed on the verge of tears, and it seemed to us that he started playing the emotional angle, trying to make us feel sorry for him so that we would walk away with a good impression.

We’re DBAs, empathy is bred out of us so this didn’t work.

 

Candidate 3

The final candidate greeted us with a warm, friendly smile as he stood, shook everyone’s hand and introduced himself. He repeated back our names to be sure that he had heard them correctly and settled into his seat.

The elevator pitch was spot on. We got a very quick overview of his skills and what he’d been working on recently.

When we started in on the technical questions he gave obvious consideration to what we were asking, looked for clarifying points on some things and provided concise answers. By no means was he able to answer each and every question; however, when he did not know he was quick to admit it, offer what he felt the answer would be, and point to where he would look for a definitive answer.

This prompted a question around community involvement, in which he mentioned that he liked to attend events like SQLSaturday and read blogs. When asked about favorite blogs he was able to name a few, along with the sessions and speakers from the most recent local SQLSaturday event. This was a huge plus, as this candidate obviously enjoys working with SQL Server enough to spend time and effort outside of regular work activities to increase his knowledge.

When a couple of questions were answered incorrectly and we questioned the responses we were met with a genuine interest in getting the correct information and were able to build conversation around those items.

We had a great time sitting and interviewing this person and were sad when time was up.

 

So which one would you choose?

If you were an interviewer, technical responses aside, which one of these candidates would be at the top of your list?

Captain Obnoxious?
Nervously Tearful?
Quietly Comfortable?

I know which one was at the top of ours.

SQL Clustering–Network Binding Order Warnings

In setting up my Windows 2008 R2/SQL 2008 R2 cluster this week I came across a warning in the Setup Support Rules stating that “The domain network is not the first bound network.”

 Network Binding Order Error

 

This didn’t make any sense to me as I had been very careful in setting the binding order in the network connections properties:

Binding Order Set

 

Deciding to look for further information on this I opened up the Detail.txt file from the Setup Bootstrap/Log directory and searched for network binding order:

Binding Order Setup Log

 

Strangely, the first bound network was Local Area Connection* 9. I had no idea what this network was, as it was not listed in the network adapter list.

I remembered a discussion on Twitter a couple of weeks ago between Allan Hirt (blog|twitter) and Denny Cherry (blog|twitter) where they talked about this exact problem and the fact that it was a hidden cluster adapter which needed to be moved in the binding order, and that had to be done in the registry.

I opened up regedit and drilled down to HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Linkage and opened up the Bind value:

Registry Showing Incorrect Binding Order

 

I’ve highlighted the first item in the bind order. The trouble is that this GUID value really means nothing to us, so it’s time to crack open PowerShell:

gwmi Win32_NetworkAdapterConfiguration | where-object {$_.IPEnabled -eq "True"} | ft Description, SettingID -auto

List of active adapters

 

As you can see the Microsoft Failover Cluster Virtual Adapter GUID matches the first bound value. Back to the registry to move that adapter down the list with a quick cut and paste.

*Warning, playing in the registry can cause system instability, BSOD problems, singularities and quantum fluctuations*

Registry with correct binding

 

Starting up the cluster install again, the network binding order warning was gone.

 

Hopefully the next releases of Windows and SQL won’t have this problem. In the meantime this is good to know. Thanks Allan and Denny for engaging on Twitter and leading me to the answer to the problem.

T-SQL Tuesday #18–CTE A Simpler Form Of Recursion

It’s T-SQL Tuesday time folks. Bob Pusateri (blog|twitter) is the host and has chosen Common Table Expressions (CTEs) as the topic of the month.

Several ideas came to mind for writing about CTEs, one of the best uses I’ve seen for one recently was to grab the name of the most recent backup file for a database. You’ll have to ask Aaron Nelson (blog|twitter) to hook you up with that one though.

I thought I’d write about an interesting problem that was posed to me in an interview a couple of months ago.

 

Here’s a table

I was given a table with two columns: ManagerID and EmployeeID.

This table was populated with a few values thusly:

USE TempDB

GO

create table #ManagersEmployees (ManagerID int, EmployeeID int)

insert into #ManagersEmployees 

values(1,2), (2,3), (2,4), (2,5), (3,6), (3,7), (3,8)

    , (4,10),(5,11),(5,12), (12,13), (12,14)

GO

I was asked to write a recursive procedure to pull out the manager, employee tree for a given ManagerID.

 

CTEs to the rescue

Having done a little work with CTEs and understanding that I could easily write a recursive query using them I was able to quite quickly put together a script to pull the information needed. By throwing it into a procedure it could quickly and easily be executed.

CREATE PROCEDURE ManagerRecursion_CTE @ManagerID INT

AS

SET NOCOUNT ON

;WITH Managers_CTE (ManagerID, EmployeeID )

AS ( SELECT ManagerID, EmployeeID FROM #ManagersEmployees  

        WHERE ManagerID = @ManagerID

UNION ALL

    SELECT e.ManagerID, e.EmployeeID 

        FROM #ManagersEmployees e 

            INNER JOIN Managers_CTE c on e.ManagerID = c.EmployeeID)

SELECT * FROM Managers_CTE ORDER BY ManagerID, EmployeeID

GO
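For reference, this is roughly how it would be run against the sample data above (assuming the #ManagersEmployees temp table is still around in the same session); starting at ManagerID 2 walks the whole tree beneath that manager:

-- Example run against the sample data (assumes #ManagersEmployees is still in scope)
EXEC ManagerRecursion_CTE @ManagerID = 2;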

 

I tested it and it worked nicely; it was a simple solution and provided the requested results.

 

That’s not recursion

The trouble is that, while the results were correct, I was advised that this was not recursive and did not meet the criteria. Back to the drawing board then.

After a lot more work I came up with the following:

CREATE PROCEDURE ManagerRecursion_NonCTE @ManagerID INT

AS

SET NOCOUNT ON

DECLARE @rowcnt INT, @lastrow INT

DECLARE @Tbl_Results TABLE (rowid INT IDENTITY(1,1), ManagerID INT, EmployeeID INT)

 

INSERT INTO @Tbl_Results (ManagerID, EmployeeID)

SELECT ManagerID, EmployeeID

FROM #ManagersEmployees

WHERE ManagerID = @ManagerID

 

SET @rowcnt = @@ROWCOUNT

SET @lastrow = 0

WHILE @rowcnt > 0

BEGIN

INSERT INTO @Tbl_Results (ManagerID, EmployeeID)

SELECT m.ManagerID, m.EmployeeID

FROM #ManagersEmployees m

    INNER JOIN @Tbl_Results t

        ON m.ManagerID = t.EmployeeID

WHERE rowid > @lastrow

 

SELECT @rowcnt = @@ROWCOUNT

 

SELECT @lastrow = @@IDENTITY - @rowcnt

END

SELECT ManagerID, EmployeeID FROM @Tbl_Results ORDER BY ManagerID, EmployeeID

GO

 

I tested this and got back the same results as with the first procedure with all the values I passed in. Deep breath on this one as it was pushing the limits of what I could produce on the spot in an interview.

 

That’s still not recursion

Again, while the results were correct, this was not recursive. It was back to the drawing board once more. This time I had to admit defeat; however, I did tell the interviewer that I would work on a solution at home and email it in. He gave me his contact information, we completed the rest of the interview and I went home determined to get the right data in the manner that the interviewer wanted.

After a whole bunch of reading and a lot of work I finally came up with correct results in a recursive procedure which I emailed in to get feedback.

CREATE PROC ManagerRecursion @EmpID INT, @InnerLoop INT = 0

AS

BEGIN

    IF @InnerLoop = 0

        BEGIN

        CREATE TABLE #Tbl_Results (ManagerID INT, EmployeeID INT)

        END

            INSERT INTO #Tbl_Results (ManagerID, EmployeeID)

            SELECT ManagerID, EmployeeID FROM #ManagersEmployees WHERE ManagerID = @EmpID

 

            SET @EmpID = (SELECT MIN(EmployeeID) FROM #ManagersEmployees WHERE ManagerID = @EmpID)

 

            WHILE @EmpID IS NOT NULL

            BEGIN

                EXEC ManagerRecursion @EmpID, 1

                SET @EmpID = (SELECT MIN(EmployeeID) FROM #ManagersEmployees WHERE EmployeeID > @EmpID 

                    AND EmployeeID NOT IN (SELECT ManagerID FROM #Tbl_Results)

                    AND EmployeeID IN (SELECT EmployeeID FROM #Tbl_Results))

            END

    IF @InnerLoop = 0   

        BEGIN

        SELECT * FROM #Tbl_Results order by ManagerID, EmployeeID

        DROP TABLE #Tbl_Results

        END      

END

 

GO

 

Unfortunately no feedback was forthcoming. I felt good about providing this solution despite that. I enjoy a challenge and this was certainly one of those.

 

So what was the right way?

That depends on who you ask. For me the first way was the right way. It performed well, the code was clean and easy, and it required a minimal amount of development. I feel that the solution here was exactly the reason that CTEs were created in the first place.

The second solution was a lot more work, the query got more complex and it does not perform as well as the first.

The final procedure was true recursion in that the procedure calls itself over and over again until all of the results are returned. It’s a loop that makes cursors look like they perform well. It was easily the worst performing of the three.

 

It all goes to show there’s more than one way to get the results you need. It’s also interesting how an example like this shows just how much work the SQL development team have done to help reduce complexity and improve performance.

Stop Logging Successful Backups

It’s great that SQL writes to the Event log and SQL log every time a backup completes. You get to see lots of data saying that database master was backed up and database model was backed up, then that msdb was backed up etc…

 

Is it really that useful?

Well, at least it can be useful to have in there. The thing is there are better ways to find out when your databases were last backed up (such as using PowerShell or querying MSDB).
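As an example of the msdb route (a minimal sketch, not the monitoring query I actually use), something along these lines will show the most recent full backup for each database:

-- Most recent full backup per database (type 'D' = full database backup)
SELECT  d.name AS DatabaseName,
        MAX(b.backup_finish_date) AS LastFullBackup
FROM    sys.databases d
        LEFT JOIN msdb.dbo.backupset b
            ON b.database_name = d.name
            AND b.type = 'D'
GROUP BY d.name
ORDER BY d.name;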

The downside is that it quickly fills your logs with information that you don’t really need to know. It gets even worse if you perform frequent transaction log backups. If you have three databases that are having their logs dumped to disk every 20 minutes, all of a sudden your SQL log is next to useless. All you are seeing is backup success messages and not much else, and it makes it all too easy to miss the important things that you really need to pay attention to.

 

There is hope

Yes, it’s true! You can stop SQL from dumping that information into the log.

By enabling trace flag 3226 you kill these messages dead. Worry not, any failure will still be written to the logs; however, all those pointless notifications will vanish. All of a sudden your logs will be cleaner and meaner, important things will stand out and your scroll wheel can take a break.

 

How to enable T3226

Open up SQL Server Configuration Manager and bring up the properties of the SQL Server service. Under the Advanced tab you’ll see the startup parameters. Just tack ;-T3226 on to the end, apply the changes and restart SQL.
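If you don’t want to wait for a restart, the flag can also be turned on for the running instance with DBCC (bearing in mind that, unlike the startup parameter, this won’t survive the next service restart):

-- Enable trace flag 3226 globally for the running instance (not persisted across restarts)
DBCC TRACEON (3226, -1);

-- Check that the flag is active
DBCC TRACESTATUS (3226);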

 

The steps are further outlined at http://msdn.microsoft.com/en-us/library/ms345416.aspx and http://msdn.microsoft.com/en-us/library/ms190737.aspx.

 

Go ahead, clean up those log files…you know you want to.

Adding LUNs In VMware – A Cautionary Tale

Over my 10-plus years of being a DBA I’ve performed LUN manipulation tasks on dozens of occasions with no errors or problems. Other than adding new disks to Windows 2003 clusters I’ve never had to take SQL Server offline to perform these tasks either.

 

A simple request

I needed a new drive to be added to a server I had just taken over in order to perform backups of some large databases. This is something that the SAN engineers and Windows support teams have a lot of experience doing and so we weren’t concerned that there might be any problems.

 

The uh-oh moment

I was running a couple of scripts on the SQL instance hosted on that server when, all of a sudden, corruption errors showed up. The dreaded error 823 sat there in red staring me in the face. I had just taken over this machine, there were no backups, and right as we were trying to add a LUN so that we could perform those backups, things went boom.

My stomach hit the floor.

 

Remembering the words of Douglas Adams

Don’t panic!

The last thing you want to do as a DBA when there’s a problem is panic. Panic I did not. I took a deep breath and walked down to speak with the SAN and Windows admins to find out what might have happened. They reported no problems, showed that the new LUN was on there and a new drive created ready for me to start performing backups.

With my nerves calmed somewhat I went back to my desk and started trying to figure out what the problem was. The error logs were showing that the data files did not exist. Very strange, as after connecting to a remote desktop session on the server I was able to see the drive and files without any issue, even though SQL could not.

I started digging through the Windows event logs to see if anything stood out. Right at the time that the errors started happening a single error showed in the system event log:

Event Type: Error
Event Source: PlugPlayManager
Event Category: None
Event ID: 12
Description: The device ‘LUN SCSI Disk Device’ (SCSIDisk&Ven&Prod_LUN&Rev4&23432&3&2) disappeared from the system without first being prepared for removal.

The error stated that the LUN hosting the data files lost connectivity and came back online almost immediately.

 

The fix

With SQL not able to see the files or drive I had no choice but to restart the instance in the hope that things would come back online. With bated breath I restarted the service and waited to see what would happen. One by one the databases started coming back online and my pulse slowed a little. Once all the databases showed as recovered without error I kicked off a DBCC CHECKDB on each one so that I could feel comfortable that everything was good and happy. The DBCC execution took 7 hours and everything came back clean. Happy with this knowledge I kicked off a backup and went to bed.

 

So what happened?

Unlike any of the SQL instances I have worked with before, this one was running on a VMware ESX virtual machine and the SQL LUNs, rather than being a part of the VM itself, were raw mapped drives. A raw mapped drive is handled using raw device mapping, which creates a symbolic link from within the VMware file system to the external LUN (at least that’s what my admin tells me and I believe him).

What we found was a quirk that when VMware performed a bus scan to find the new LUN and create the mapping it very briefly dropped connectivity to the existing raw mapped drive. This is what caused Windows to lose the drive and SQL to lose access to the files. The drive was down for about 3 seconds, more than enough time for SQL to recognize a problem.

 

Lesson learned

From now on I will be very careful with any kind of LUN changes on VMware machines with raw mapped drives. I’ll ensure that we schedule an outage and take SQL offline while any adjustments are made just in case something like this crops up again.

Fun With Windows Logins In SQL

Sometimes you come across quirkiness when playing with SQL Server. Once in a while those things are related to security. This happens to be one of those times.

 

Release My Code

I was provided with a script by the dev team to create a new proc and grant permissions on that proc to a Windows login that already had access to SQL. No big deal. I executed the script, received no errors and happily went about the rest of my day. A couple of hours later I got an email from the developer saying that when they attempted to call the proc they were getting an error stating that the server principal was not able to access the database under the current security context.

 

Fix It Damn You

After much troubleshooting to no avail I ended up scripting out all the user’s permissions, dropping and recreating the user, then applying those permissions back again. Everything worked and the developers went away happy. I myself was not happy, and so I restored an old copy of the database to a test server to try and find out what the deal was. I found something interesting. Join me in a short but fascinating journey into the world of the Windows login…

 

Setup A Test

The following is based upon test cases in SQL 2008 and 2008 R2. I have not tested on SQL 2005 and so do not know if the behavior might be different.

Create yourself a shiny new Windows login on your machine. Mine’s called SQL1 (because I’m original like that). Now add that as a login within SQL.

CREATE LOGIN [AnonyLPTP\SQL1] FROM WINDOWS

GO

Just because validation is good it’s worth double checking that the login exists

SELECT Name, type_desc FROM sys.server_principals 

    WHERE name = 'AnonyLpTp\SQL1'

    ORDER BY name

 

Looks like we’re good on that front.

Now let’s create a dummy database for our security test and add a table and proc to that.

CREATE DATABASE SecurityTest

GO

 

USE SecurityTest

GO

 

CREATE TABLE dbo.QuickTest (ID INT, Col2 VARCHAR(30))

GO

INSERT INTO dbo.QuickTest VALUES (1, 'some text'), (2, 'different text');

 

CREATE PROCEDURE dbo.ShowData

AS

    SELECT ID, Col2 FROM dbo.QuickTest

GO

Feel free to execute the proc at this point just to be sure that you get results.
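If you want to be doubly sure, a quick call like this should bring back the two rows we just inserted:

-- Sanity check: should return both rows from dbo.QuickTest
EXEC SecurityTest.dbo.ShowData;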

 

 Any Users There?

We’ve not added any users to this database, and most certainly that new login we just created isn’t going to be there. In the interests of being sure let’s check real quick.

SELECT Name FROM SecurityTest.sys.database_principals 

    WHERE name = 'AnonyLpTp\SQL1'

    ORDER BY name

Nope, no user there.

 

Now For Some Magic

If we try to grant execute on the ShowData proc to that login it’s going to fail because that user doesn’t exist in the database, right? I know for sure that’s what happens with SQL logins, after all I’ve seen the “Cannot find the user ‘<username>’, because it does not exist or you do not have permission” error on countless occasions.

Let’s try it anyway and see what happens

GRANT EXECUTE ON SecurityTest.dbo.ShowData TO [AnonyLPTP\SQL1]

Yup, there’s the error…wait, what? There’s no error, it was successful. That must be a mistake surely?

Well let’s check the database principals again

SELECT Name FROM SecurityTest.sys.database_principals 

    WHERE name = 'AnonyLpTp\SQL1'

    ORDER BY name

So it appears as though a user has been created for the login without us needing to specify it to do so. In actual fact it’s gone one step further and also created a schema.

SELECT name, principal_id FROM SecurityTest.sys.schemas    

    WHERE name = 'AnonyLpTp\SQL1'

 

 

But Wait…

Let’s try to execute the proc under the context of that user

EXECUTE AS USER = 'AnonyLpTp\SQL1';

EXEC SecurityTest.dbo.ShowData;

 

Hey, that’s the error that the devs were reporting they were having. This makes no sense though. We granted permissions on the proc and there’s a user there and everything, what gives?

 

Security Fundom

So there’s a user and there’s permissions on the proc but we get an error. Most strange. Well, actually I know we granted permissions on the proc, but did they actually make it there? We can use sys.database_permissions to double check this. In this case we’ll check permissions for all the Windows users in the database.

USE SecurityTest

GO

SELECT 

      dpri.name

    , dper.class_desc

    , object_name(dper.major_id) as ObjectName

    , permission_name 

FROM 

    sys.database_permissions dper

        INNER JOIN sys.database_principals dpri

            ON dper.grantee_principal_id = dpri.principal_id

WHERE dpri.type = 'U'

 

Yup, the permissions are definitely there: EXECUTE on ShowData. So that’s not an issue.

Note, there’s another row in the table showing CONNECT permissions for dbo. This permission is missing for our login. Interesting. Let’s add that for our user.

GRANT CONNECT ON DATABASE::SecurityTest TO [AnonyLPTP\SQL1]

Running the permissions script again now gives 3 rows.

Now if we attempt to run the proc under the context of the login.

EXECUTE AS USER = 'AnonyLpTp\SQL1';

SELECT USER_NAME(); -- Confirm we're using the right user

 

EXEC SecurityTest.dbo.ShowData;

 

It worked!
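One small housekeeping note if you’re following along in the same query window: switch back out of the impersonated context before running anything else, otherwise later statements keep executing as that restricted user.

-- Drop back out of the impersonated context
REVERT;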

 

So What Is Going On Here?

When you grant permissions on an object in a database to a Windows login, a schema and user are automatically created; however, the login is not granted the right to connect to the database, so it cannot actually use those permissions. The fix is simply to grant CONNECT on the database, after which the login can use the permissions it was already assigned.

This is very strange behavior (especially given that SQL logins do not work the same way). I’m sure that there is a reason behind this and use cases for it, even though I’ve not been able to find any documentation describing it.

If you’ve run across this before, have use cases or know of documentation describing the behavior please let me know in the comments and I’ll update the post.

Are You Putting Out?–The SQL OUTPUT Clause

I had some code come through the other day from a developer; it was designed to perform an insert and then return the inserted data back to the client for display. This kind of code is probably seen very frequently out there in the real world.

 

Quick, Create A Table

As we’ll be playing around a little, let’s throw in a small table just for demonstration purposes:

CREATE TABLE OutputTesting

    (

      ID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED

    , RecordDate DATETIME NOT NULL 

    , EnteredValue VARCHAR(100) NOT NULL

    )

 

 

Old School Insert

Here was the developer code that came through:

BEGIN TRAN

        INSERT INTO OutPutTesting (RecordDate, EnteredValue)

        VALUES (getdate(), 'Some other text')      

SELECT ID, RecordDate, EnteredValue FROM OutputTesting WHERE ID = SCOPE_IDENTITY()

COMMIT TRAN

The first thing of note here is a complete lack of error handling in the statement. Sure we have a begin and commit tran but if there’s a problem with the insert we just don’t deal with it and a transaction will be left open (not to mention nothing being returned to the client).

The second thing to notice is how this returns data back to the client: we actually perform a second query against the table, using SCOPE_IDENTITY() in our WHERE clause. SCOPE_IDENTITY() returns the last identity value inserted within the current session and current scope (meaning that it won’t return a value generated for some other table by a trigger firing off the insert).
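To illustrate the difference, here’s a quick sketch using the OutputTesting table created above:

-- Both return the identity value just generated in this scope;
-- @@IDENTITY would also pick up a value generated by a trigger firing
-- off the insert, which is why SCOPE_IDENTITY() is the safer choice here.
INSERT INTO OutputTesting (RecordDate, EnteredValue)
VALUES (GETDATE(), 'Scope test');

SELECT SCOPE_IDENTITY() AS ScopeIdentity, @@IDENTITY AS AtAtIdentity;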

This kind of query was heavily used back in the SQL Server 2000 days, but now there is a much better way to do this. Enter…

 

The Output Clause

Whenever an insert, update, delete or merge is performed you can use OUTPUT to return the impacted data.

Usage is very simple, just plop the OUTPUT right in the middle there and you are done. So taking the initial example (and for now ignoring the error aspects) we can simply rewrite it as the following:

BEGIN TRAN    

        INSERT INTO OutPutTesting (RecordDate, EnteredValue)

        OUTPUT INSERTED.ID, INSERTED.RecordDate, INSERTED.EnteredValue 

        VALUES (getdate(), 'Some other text')

COMMIT TRAN

We got back results exactly the same way as we would have in the old school fashion and did so without having to perform the extra read against the OutputTesting table.

 

Where Output Falls Down

Nothing’s perfect right? The thing with OUTPUT is that it will attempt to return results even if your statement fails. Let’s rewrite the query slightly and use the TRY…CATCH syntax to handle errors:

BEGIN TRY

    BEGIN TRAN    

        INSERT INTO OutPutTesting (RecordDate, EnteredValue)

        OUTPUT INSERTED.ID, INSERTED.RecordDate, INSERTED.EnteredValue 

        VALUES (getdate(), 'Some text')

    IF @@TRANCOUNT > 0

    COMMIT TRAN

    

END TRY

 

BEGIN CATCH

    SELECT ERROR_MESSAGE() AS InsertError

    ROLLBACK TRAN

END CATCH

Ok, great, this worked as expected. Let’s throw a spanner in the works and pass invalid data to the insert and see what happens:

BEGIN TRY

    BEGIN TRAN    

        INSERT INTO OutPutTesting (RecordDate, EnteredValue)

        OUTPUT INSERTED.ID, INSERTED.RecordDate, INSERTED.EnteredValue 

        VALUES ('2011/01/301', 'Some text')

    IF @@TRANCOUNT > 0

    COMMIT TRAN

    

END TRY

 

BEGIN CATCH

    SELECT ERROR_MESSAGE() AS InsertError

    ROLLBACK TRAN

END CATCH

Hey, what gives? The insert failed (we have our insert error there) yet it still attempted to return results for the OUTPUT.  We could end up with all kinds of problems if we tried to return that back to the client. People would freak out, it would be anarchy!

 

A Better Way To Handle Output

If we’re going to go ahead and use OUTPUT it’s always best to push that data into a table variable and then use that to return results back to the client if the insert is successful, otherwise we can just pass back the error:

DECLARE @OutputResults TABLE (

      ID INT 

    , RecordDate DATETIME

    , EnteredValue VARCHAR(100))

    

BEGIN TRY

    BEGIN TRAN    

        INSERT INTO OutPutTesting (RecordDate, EnteredValue)

        OUTPUT INSERTED.ID, INSERTED.RecordDate, INSERTED.EnteredValue INTO @OutputResults

        VALUES (getdate(), 'Some text')

    IF @@TRANCOUNT > 0

    COMMIT TRAN

    SELECT * FROM @OutputResults

END TRY

 

BEGIN CATCH

    SELECT ERROR_MESSAGE() AS InsertError

    ROLLBACK TRAN

END CATCH

Now if we attempt to insert invalid data this time:

DECLARE @OutputResults TABLE (

      ID INT 

    , RecordDate DATETIME

    , EnteredValue VARCHAR(100))

    

BEGIN TRY

    BEGIN TRAN    

        INSERT INTO OutPutTesting (RecordDate, EnteredValue)

        OUTPUT INSERTED.ID, INSERTED.RecordDate, INSERTED.EnteredValue INTO @OutputResults

        VALUES ('2011/01/301', 'Some text')

    IF @@TRANCOUNT > 0

    COMMIT TRAN

    SELECT * FROM @OutputResults

END TRY

 

BEGIN CATCH

    SELECT ERROR_MESSAGE() AS InsertError

    ROLLBACK TRAN

END CATCH

All we get back is the actual insert error, so much cleaner.

We should really take the error handling a step further and use RAISERROR and pass in all the parameters, but this is at least a solid foundation and far better than what we had previously.
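For completeness, here’s a rough sketch of what that beefed-up CATCH block might look like (rolling back first, then re-throwing the original error details with RAISERROR):

BEGIN CATCH
    DECLARE @ErrMsg NVARCHAR(2048) = ERROR_MESSAGE()
          , @ErrSeverity INT = ERROR_SEVERITY()
          , @ErrState INT = ERROR_STATE();

    IF @@TRANCOUNT > 0
        ROLLBACK TRAN;

    -- Re-throw the original error details back to the caller
    RAISERROR (@ErrMsg, @ErrSeverity, @ErrState);
END CATCH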

 

Output Isn’t Just For Inserts

Heck no, OUTPUT is good for all your DML operations.

Delete will tell you the records you just removed:

DELETE FROM OutputTesting

OUTPUT DELETED.ID, DELETED.RecordDate, DELETED.EnteredValue

WHERE ID = 3

 

Update can give you information on what the old and new values are for the record:

UPDATE OutputTesting
SET RecordDate = DATEADD(d, 1, GETDATE())

OUTPUT INSERTED.ID, INSERTED.RecordDate AS NewDate, DELETED.RecordDate AS OldDate

WHERE ID = 2

 

Merge has a little bonus OUTPUT feature called $action, which tells you what kind of DML action was performed as well as giving you the data:

DECLARE @InputTable TABLE (ID INT, RecordDate DATETIME, EnteredValue VARCHAR(100))

INSERT INTO @InputTable VALUES 

    (1, GETDATE(), 'Some new text'), 

    (2, GETDATE(), 'Some other text'),

    (500, GETDATE(), 'New text')

 

MERGE OutputTesting    as Target

USING (SELECT ID, RecordDate, EnteredValue FROM @InputTable) AS Source

ON Target.ID = Source.ID

WHEN MATCHED THEN

    UPDATE SET Target.RecordDate = Source.RecordDate, Target.EnteredValue = Source.EnteredValue

WHEN NOT MATCHED BY TARGET THEN

    INSERT (RecordDate, EnteredValue)

    VALUES (Source.RecordDate, Source.EnteredValue)

    OUTPUT $action, INSERTED.*, DELETED.*;

 

Output Is Your Friend

Remember folks, old school has its place, like a Flock Of Seagulls hairstyle at an ’80s revival concert, but there are new and better things now.

Output loves you, it gives you lots of information and it won’t kill your server to give it to you. Output is putting out, are you?

Keeping MSDB Clean

I have a bunch of monitoring routines running against my SQL Servers. One of these checks available disk space on drives. Today I was surprised to see that one of my SQL system drives was looking pretty full. This is unusual to say the least because I dedicate a drive to the system (master/model/msdb) databases. Upon connecting to the server I discovered that MSDB had blown up to over 12GB in size.

 

Checking For Large Tables

I used the following query to look for large tables in MSDB (this query is completely based upon running Profiler against SQL Server while running a Disk Usage By Table report):

SELECT TOP 10

    a3.name+'.'+a2.name AS [tablename],

    a1.rows as row_count,

    (a1.reserved + ISNULL(a4.reserved,0))* 8 AS reserved 

FROM

    (SELECT 

        ps.object_id,

        SUM (CASE WHEN (ps.index_id < 2) THEN row_count ELSE 0 END) AS [rows],

        SUM (ps.reserved_page_count) AS reserved

        FROM sys.dm_db_partition_stats ps

        GROUP BY ps.object_id) AS a1

LEFT OUTER JOIN 

    (SELECT 

        it.parent_id,

        SUM(ps.reserved_page_count) AS reserved,

        SUM(ps.used_page_count) AS used

        FROM sys.dm_db_partition_stats ps

        INNER JOIN sys.internal_tables it ON (it.object_id = ps.object_id)

        WHERE it.internal_type IN (202,204)

        GROUP BY it.parent_id) AS a4 

        ON (a4.parent_id = a1.object_id)

INNER JOIN 

    sys.all_objects a2  

        ON ( a1.object_id = a2.object_id ) 

INNER JOIN 

    sys.schemas a3 

        ON (a2.schema_id = a3.schema_id)

WHERE 

    a2.type <> N'S' and a2.type <> N'IT'

ORDER BY 

    reserved desc,[tablename]

The results were a little shocking to say the least.

  • sysmail_mailitems contained 717,878 rows and was over 9GB in size
  • backupfile/backupmediafamily/backupmediaset/backupset combined had over 8.6 million rows and totaled over 2.5GB in their space usage

 

Forgetting To Clean Up

MSDB is something frequently forgotten. Cleaning up job history is nice and easy, it’s just a setting in the SQL Agent. Cleaning up your sent emails and backup records is not as straightforward. Why Microsoft hasn’t included this I don’t know, especially given that they have provided stored procedures to perform the cleanup of this data.
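For what it’s worth, even the Agent job history side can be scripted with a provided procedure if you’d rather not rely on the Agent setting alone; a quick sketch (the 30-day window here is just an example):

-- Purge SQL Agent job history older than 30 days
DECLARE @oldest DATETIME = DATEADD(d, -30, GETDATE())
EXEC msdb.dbo.sp_purge_jobhistory @oldest_date = @oldest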

My server has been in service since March 2009. That’s a lot of backups and a lot of emails sent. I needed to get a handle on cleaning this up and then implementing a process to ensure that these tables didn’t get out of control again.

 

Deleting Old Mail Items

Microsoft have provided the procedure sysmail_delete_mailitems_sp to perform cleanup of the sysmail_mailitems table. Two parameters can get passed in:

  • @sent_before – datetime – used to delete mail items sent prior to a certain date
  • @sent_status – varchar(8) – used to force deletion of only a certain sent status

In this instance I was not concerned with what items were going to be deleted, I just wanted to get rid of everything older than 14 days.

Here’s a bad way to do this:

DECLARE @DATE DATETIME = dateadd(d, -14, getdate())

EXEC msdb.dbo.sysmail_delete_mailitems_sp @sent_before = @DATE

Actually let me clarify, this isn’t a bad way to do this if you don’t have 2 years and 9GB of data to delete. In fact this is the script I implemented in a job to continue to keep the table size down. Running this against such a large amount of data just blew up the transaction log, filled the drive and then forced a rollback.

Going a slightly smarter route I performed the deletion in small batches (5 days worth of email at a time) using the following:

DECLARE @maildate DATE = '2009-06-15'

WHILE @maildate < DATEADD(d, -14, GETDATE())

    BEGIN

        EXEC msdb.dbo.sysmail_delete_mailitems_sp @sent_before = @maildate

        SELECT @maildate = DATEADD(d, 5, @maildate)

    END

This took a little while; however, the transaction log stayed small and it completed without incident.
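A quick sanity check afterwards (a simple sketch; I’m just confirming how many rows remain and that the oldest of them is inside the 14-day window):

-- How much mail is left, and how old is the oldest remaining item?
SELECT COUNT(*) AS RemainingMailItems
     , MIN(send_request_date) AS OldestRemaining
FROM msdb.dbo.sysmail_mailitems;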

 

Deleting Backup History

Again Microsoft do a kindness to help clean up old data. This time it’s the procedure sp_delete_backuphistory, to which we pass:

  • @oldest_date – datetime – the oldest backup history record we want to keep

 

Having learned my lesson from deleting the old mail items, this time I set out to perform the delete in batches using:

DECLARE @backupdate DATE = '2009-06-15'

WHILE @backupdate < DATEADD(d, -90, GETDATE())

    BEGIN

        EXEC sp_delete_backuphistory @backupdate

        SELECT @backupdate = DATEADD(d, 15, @backupdate)

    END

I figured this would take a while. Half an hour later it was still churning away. Wait stats showed a crazy amount of i/o going on. This didn’t seem right so I stopped the script and took a look at the execution plan.

A pretty large plan which makes it difficult to look through. Luckily a few weeks ago I had downloaded the completely awesome SQL Sentry Plan Explorer from SQLSentry.net. If you have not installed this FREE tool yet then I recommend you go out there right now, well after finishing reading this post, and get it. It makes execution plans not just readable but sortable, viewable and workable. Just using Plan Explorer once can save you hours. I just want to add I am not associated with SQL Sentry in any manner nor am I getting any kickback, the tool is just that good.

Pasting the estimated execution plan XML into Plan Explorer showed multiple clustered index scans against the backupset table. It seems that a couple of critical indexes are missing from this table.

I was quickly able to ascertain that a WHERE clause was being used on backup_finish_date, which is indexed; however, it needed to return media_set_id, which is not included in that index. Later on in the plan I could see that there was a join against media_set_id, which isn’t indexed at all.

These missing indexes were causing the massive i/o numbers I was seeing. To resolve I quickly added a couple of indexes.

USE msdb

go

IF NOT EXISTS (SELECT * FROM sys.indexes WHERE OBJECT_ID = OBJECT_ID('[dbo].[backupset]') AND name = 'IDX_BackupSet_FinDate_MediaSet')

CREATE NONCLUSTERED INDEX IDX_BackupSet_FinDate_MediaSet ON backupset(backup_finish_date) include (media_set_id)

 

IF NOT EXISTS (SELECT * FROM sys.indexes WHERE OBJECT_ID = OBJECT_ID('[dbo].[backupset]') AND name = 'IDX_BackupSet_MediaSet')

CREATE NONCLUSTERED INDEX IDX_BackupSet_MediaSet ON backupset(media_set_id)

This done I took another look at the estimated plan and things looked a great deal cleaner.

I kicked off the script again and this time it completed in under 5 minutes. Several million rows were deleted and 2GB of space reclaimed.

 

Keeping Things Clean

Not wanting to ever run into this situation again I added a job with a couple of steps to delete mail items older than 14 days and backup history older than 90 days.

This is done with a quick execution of the cleanup procedures. Thanks to the data being brought current and the job running every night I won’t have to worry about extended run times or MSDB getting out of control.

/* Cleanup old backup history */

DECLARE @backupdate DATE = DATEADD(d, -90, GETDATE())

EXEC msdb.dbo.sp_delete_backuphistory @backupdate

/* Cleanup old mail items */

DECLARE @maildate DATE = DATEADD(d, -14, GETDATE())

EXEC msdb.dbo.sysmail_delete_mailitems_sp @sent_before = @maildate

 

Final Thought

I hadn’t taken sufficient care in ensuring that old data was kept clean. In fact I hadn’t even realized that backup history and old mail items could be a concern. Since I found this today I’ve started going through and performing this cleanup action on each of my SQL instances. Once that’s done I’ll be using PowerShell to deploy the nightly cleanup job on every instance.

I highly recommend checking your own servers to see if you might be encountering the same problems.

T-SQL Tuesday #13–A Mysterious Business

It’s T-SQL Tuesday again and Steve Jones (blog|twitter), Mr SQLServerCentral, is hosting this month. Steve has provided a topic that is near and dear to the heart of pretty much every DBA – “What issues have you had in interacting with the business to get your job done”.

I really didn’t know what I could possibly write about this given that I’ve never had an issue with the business.

Stopped laughing yet?

Here’s an actual business challenge that I have had the fun to deal with in the last few months.

 

What is the maximum number of columns you can have in a table?

It’s a simple little question like that which instantly sets off the alarms in your head, especially when it comes from a Business Analyst. What worried me even more was that this individual had a habit of promising the moon on a stick to users without actually understanding the bigger picture of what might be involved.

The BA was delighted when I told him that the non-wide table limit was 1,024 columns (based upon the Maximum Capacity Specifications in BOL). This took the plotz-meter from rumble to gurgle.

I heard nothing more from the BA for a couple of months so figured it had been just a curiosity that I had answered, then…

 

Please deploy this work order to create a new table

A work order showed up in my queue. A script was attached to create a bunch of new tables. One of these tables was to have 514 columns, another 365 and another 234, along with a few others that were in the 30-40 column range. Those are some pretty wide tables. Each of these tables was also to have a whole host of indexes (more than 40 on one of them) and was expected to contain several million rows of data. There were also FK constraints (well, at least there was some DRI).

With an impending sense of doom I asked why we were creating such wide tables and how they were going to be used. I had a hope that these were going to be used in some kind of OLAP solution. It was a faint hope given that we don’t have Analysis Services deployed anywhere.

I was advised that data was going to be loaded into the tables using a “real-time” ETL process to provide reporting functionality to the user base. There would be a bunch of reports written which could be called, there would also be ad-hoc reporting access to a small group of power users.

I could understand if all of these tables had been denormalized to the max; however, to actually get useful data out would require performing a minimum of four joins.

 

The business needs them

When I took this back to the development team they stated that they had developed it to the requirements that they were given. I can understand and appreciate this, they are working with what they are provided. I have no doubt they would have questioned this requirement and just been asked to proceed.

I performed a code review and sent back my recommendations. Narrower tables, denormalization, fewer indexes (given that they could never cover any of the queries that would be executed). I also explained the huge performance issues that we were going to experience, the impact to the other databases on the server and how we were going to see some quite significant disk issues.

My recommendations were returned with “these need to go in, the business needs them and we have made a commitment to get them in by next week”.

 

You’re there to keep the boss happy

My concerns were obviously not being heeded so it was escalation time. I went to my boss and explained the situation. I went over the concerns I had, the expected performance problems we would experience, the bad design, the lack of denormalization which would have been the ideal solution for what the team was attempting to do.

It sat with the boss for a couple of days. Finally he got back to me. It had gone up the chain and come back down again. The decision was that “the business” needed this and until I could provide hard numbers that this was a problem in production it would go in.

That was a bit of an epiphany moment. I found out that I am not actually there to do the best job that I can. I’m not there to get the most out of each and every server, eke out a little more performance from a SQL instance and get things running all nice and smooth. I came to the realization that I was just there to keep the boss happy.

 

Performance isn’t everything

The new tables went into production. Disk queuing went through the roof, buffer cache went through the floor and my head went through the desk. Tickets were opened because things were performing so poorly, they were rerouted to the BA and closed as “expected behavior”.

I provided perfmon stats and I showed the severe impact that these new tables and processes were having on our environment. I sat down with management and explained what was going on over and over again figuring that some change would be made.

I tracked down a few of the power-users who were a part of “the business” and asked them what they thought of the whole situation. To a person they told me that this was not what they wanted, not what they asked for and not what they needed.

 

These tables still exist. Performance is still lousy. I’ve stopped taking this to management as it wasn’t keeping my boss happy. I would love to say that the business won, but it doesn’t actually look like the business did. It seems more than likely that this was the brainchild of one BA who got the ear of someone important and made their dream a nightmare reality.

T-SQL Tuesday #10 – Applying Indexes To Replicated Tables

It’s T-SQL Tuesday time again! This month Michael J. Swart (blog|twitter) is hosting and has given us the topic of indexes.

I’ve previously written about how you can find indexes that are not being used and get their sizes. Not being allowed to reuse that meant that I had to come up with something else.

In my previous post on calculating replication schema options I documented a way to change article options in transactional replication so that all the non-clustered indexes that exist on the publishing table are applied to the subscribing table. But what if you are using replication to offload reporting and you need to create indexes that don’t exist on the publishing table?

You could always manually apply these indexes any time that you perform a snapshot but this requires manual work and can easily be forgotten.

Fortunately there is another option, you can have replication automatically apply a post-snapshot SQL script which will create those indexes for you. Here’s how…

 

Create a basic publication (snapshot or transactional) and add a table article. Perform the snapshot and compare indexes between the tables (in this example I am using the person.address table from AdventureWorks2008).

After the snapshot here’s how the indexes stand

 

For this example we’re going to create a new index on Admindb.person.address. To do this first we need to generate the index creation script

 

USE AdminDB

GO

IF NOT EXISTS (SELECT * FROM sys.indexes WHERE object_id = OBJECT_ID(N'[person].[address]') AND name = N'NCIX_Zip')
CREATE NONCLUSTERED INDEX [NCIX_Zip] ON [person].[address]
(
    [PostalCode] ASC
)
WITH (ONLINE = ON, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO

 

Save this script to C:\ReplicationScripts\CreateReportingIndex.sql

Now we change the existing publication so that whenever a snapshot is applied to the subscriber this script is run as part of it. This change can be done in a couple of ways.

The quick and easy way is to use sp_changepublication and just pass in the publication name, property change and location of the script.

 

USE AdventureWorks2008
GO

EXEC sp_changepublication
      @publication = 'Post Snapshot Test'
    , @property = N'post_snapshot_script'
    , @value = 'C:\ReplicationScripts\CreateReportingIndex.sql'
    , @force_invalidate_snapshot = 1 -- must invalidate existing snapshots for this to take effect
GO

 

This can also be done using the GUI (right click on the publication and select Properties)

 

 

When you next perform a snapshot the script will be applied. We can check that this actually worked by looking at Replication Monitor, which will tell you that the script was applied.

 

And then you can check the table itself and confirm that the index exists
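If you’d rather check from a query window than the GUI, something like this run against the subscriber will confirm it:

-- Run on the subscriber to confirm the reporting index was created
USE AdminDB
GO
SELECT name, type_desc
FROM sys.indexes
WHERE object_id = OBJECT_ID(N'person.address')
    AND name = N'NCIX_Zip';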

 

There are a couple of final things to note.

Usually you will specify a UNC path for the SQL script so that you only have to maintain a single copy. You just need to ensure that the subscriber is able to access the post-snapshot script location.

Post-snapshot scripts are not limited to creating indexes; in fact, you can do pretty much anything, just so long as the Distribution Agent has access (a small sketch follows the list):

  • Send yourself an email to let you know that the snapshot has been applied on the subscriber
  • Apply table permissions
  • Create views
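As a rough sketch of what such a post-snapshot script could look like (the role name, view, mail profile and recipient below are all just placeholders for illustration):

/* Hypothetical post-snapshot script - role, view, profile and recipient names are illustrative */
USE AdminDB
GO

-- Apply reporting permissions on the replicated table
GRANT SELECT ON person.address TO ReportingRole;
GO

-- Create a reporting view if it doesn't already exist
IF OBJECT_ID(N'person.vAddressList') IS NULL
    EXEC ('CREATE VIEW person.vAddressList AS
           SELECT AddressID, City, PostalCode FROM person.address');
GO

-- Send a notification that the snapshot has been applied on the subscriber
EXEC msdb.dbo.sp_send_dbmail
      @profile_name = 'DBA Mail Profile'
    , @recipients   = 'dba@example.com'
    , @subject      = 'Post-snapshot script applied'
    , @body         = 'The post-snapshot script completed on the subscriber.';
GO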

If you use replication and find yourself manually reapplying indexes it’s worth evaluating whether using post-snapshot scripts might save you some work.