Category: SQL

Don’t Shrink TempDB!

As a general practice don’t shrink any databases. Every once in a while there is a need to do so. I had one of those needs today.

 

My disk is full

Thanks to the cumulative effect of a misconfigured SQL install, an incorrect location for TempDB and the server pagefile, and a user running a huge query joining 20+ tables with no WHERE clause, we had a drive run out of disk space. Unfortunately this broke another process, which needed to grow the transaction log on another database.

I contacted the business owner immediately, explaining how we needed to move TempDB to another drive and restart SQL (in fact I’d already performed the ALTER statement to move TempDB off to its own LUN, where it should have been all along). The business owner stated that we could not restart SQL until after the evening, so we were left in a situation where certain things were failing.
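For reference, a move like that is just a metadata change that takes effect at the next restart, which is exactly why the restart mattered here. A minimal sketch, assuming a single data file with the default logical names and a hypothetical T: drive for the new LUN:

USE master;
GO
-- Point the TempDB files at the new location; nothing actually moves until SQL Server is restarted
ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, FILENAME = 'T:\TempDB\tempdb.mdf');
ALTER DATABASE tempdb MODIFY FILE (NAME = templog, FILENAME = 'T:\TempDB\templog.ldf');
GO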

 

Why not shrink TempDB?

That was my feeling. TempDB was 18GB larger than it had ever needed to be so I figured I could just shrink the data file (there was just one) and reclaim 50% of the difference to keep things chugging along until the restart. I attempted to perform the shrink on the file and it did nothing. I checked TempDB and nothing was using it, but still the shrink failed.

Interesting problem. Argenis Fernandez (blog|twitter) just joined my team and he came over to try and help. He suggested running DBCC FREEPROCCACHE and DBCC DROPCLEANBUFFERS, since the user impact would be minimal given the server’s usage patterns.

After running the DBCC commands I was able to perform the shrink, clean up the space and keep everything running.
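For reference, the sequence looked roughly like this (the target size is illustrative, and as the rest of this post explains, think twice before doing this to TempDB at all):

DBCC FREEPROCCACHE;
DBCC DROPCLEANBUFFERS;
GO
USE tempdb;
GO
-- Shrink the single data file; the target size in MB is illustrative
DBCC SHRINKFILE (tempdev, 9000);
GO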

 

Corruption – that’s why

Curious as to why running those DBCC commands allowed me to shrink the file, I posted the question to #sqlhelp on Twitter.

Pretty much immediately I got a reply from Paul Randal (blog|twitter) that made my stomach churn:

Dont Shrink Tempdb

 

Paul and Aaron Bertrand (blog|twitter) sent me links to a Microsoft KB article that clearly explains this.

From Paul I bring you the KB in a nutshell: unless you quiesce SQL Server, a TempDB data file shrink can cause corruption.

 

This post brought to you by yet another reason not to perform a shrink.

Presenting For The PASS PowerShell Virtual Chapter

A quick note: on Wednesday at 1 PM EST I’ll be giving an online presentation on one of the two topics I submitted for the PASS Summit this year.

The presentation will be heavy on the demo side of things and I’ll cover the basics of querying SQL Server using PowerShell, running scripts against multiple machines quickly and easily, as well as bulk loading data into SQL.

Please check out http://www.powershell.sqlpass.org/ for a link to the LiveMeeting.

 

Central auditing of SQL permissions with PowerShell & TSQL

Description: As a DBA it can be a challenge to know who has permissions to which SQL instances and which objects; the more instances you have, the more complex that task becomes. In this presentation I’ll share a method using PowerShell and TSQL that can be used to capture permissions from all of your SQL instances and load them into a centralized location. We’ll even take it a step further by auditing those permissions so that we can quickly and easily identify any that might have changed.

 

SQL Clustering–Network Binding Order Warnings

In setting up my Windows 2008 R2/SQL 2008 R2 cluster this week I came across a warning in the Setup Support Rules stating that “The domain network is not the first bound network.”

 Network Binding Order Error

 

This didn’t make any sense to me as I had been very careful in setting the binding order in the network connections properties:

Binding Order Set

 

Deciding to look for further information on this I opened up the Detail.txt file from the Setup Bootstrap/Log directory and searched for network binding order:

Binding Order Setup Log

 

Strangely, the first bound network was Local Area Connection* 9. I had no idea what this network was, as it was not listed in the network adapter list.

I remembered a discussion on Twitter a couple of weeks ago between Allan Hirt (blog|twitter) and Denny Cherry (blog|twitter) where they talked about this exact problem and the fact that it was a hidden cluster adapter which needed to be moved in the binding order, and that had to be done in the registry.

I opened up regedit and drilled down to HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Linkage and opened up the Bind value:

Registry Showing Incorrect Binding Order

 

I’ve highlighted the first item in the bind order. The trouble is that this GUID value really means nothing to us… time to crack open PowerShell:

gwmi Win32_NetworkAdapterConfiguration | where-object {$_.IPEnabled -eq "True"} | ft Description, SettingID -auto

List of active adapters

 

As you can see the Microsoft Failover Cluster Virtual Adapter GUID matches the first bound value. Back to the registry to move that adapter down the list with a quick cut and paste.
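Incidentally, if you want to make that comparison without eyeballing the registry, here is a rough sketch (not from the original post) that reads the binding order and labels each entry with its adapter description, assuming the standard Tcpip\Linkage location:

#Read the binding order from the registry and match each GUID to an adapter description
$bind = (Get-ItemProperty 'HKLM:\SYSTEM\CurrentControlSet\Services\Tcpip\Linkage').Bind
$adapters = Get-WmiObject Win32_NetworkAdapterConfiguration
$bind | ForEach-Object {
    $guid = $_ -replace '\\Device\\', ''
    $desc = ($adapters | Where-Object { $_.SettingID -eq $guid } | Select-Object -First 1).Description
    "{0}  {1}" -f $guid, $desc
}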

*Warning, playing in the registry can cause system instability, BSOD problems, singularities and quantum fluctuations*

Registry with correct binding

 

Starting up the cluster install again, the network binding order warning was gone.

 

Hopefully the next releases of Windows and SQL won’t have this problem. In the meantime this is good to know. Thanks Allan and Denny for engaging on Twitter and leading me to the answer to the problem.

T-SQL Tuesday #18–CTE A Simpler Form Of Recursion

It’s T-SQL Tuesday time folks. Bob Pusateri (blog|twitter) is the host and has chosen Common Table Expressions (CTEs) as the topic of the month.

Several ideas came to mind for writing about CTEs; one of the best uses I’ve seen for one recently was to grab the name of the most recent backup file for a database. You’ll have to ask Aaron Nelson (blog|twitter) to hook you up with that one though.

I thought I’d write about an interesting problem that was posed to me in an interview a couple of months ago.

 

Here’s a table

I was given a table with two columns: ManagerID and EmployeeID.

This table was populated with a few values thusly:

USE TempDB
GO

create table #ManagersEmployees (ManagerID int, EmployeeID int)

insert into #ManagersEmployees
values (1,2), (2,3), (2,4), (2,5), (3,6), (3,7), (3,8)
    , (4,10), (5,11), (5,12), (12,13), (12,14)
GO

I was asked to write a recursive procedure to pull out the manager, employee tree for a given ManagerID.

 

CTEs to the rescue

Having done a little work with CTEs, and understanding that I could easily write a recursive query using them, I was able to put together a script quite quickly to pull the information needed. By throwing it into a procedure it could be executed quickly and easily.

CREATE PROCEDURE ManagerRecursion_CTE @ManagerID INT
AS
SET NOCOUNT ON
;WITH Managers_CTE (ManagerID, EmployeeID)
AS ( SELECT ManagerID, EmployeeID FROM #ManagersEmployees
        WHERE ManagerID = @ManagerID
     UNION ALL
     SELECT e.ManagerID, e.EmployeeID
        FROM #ManagersEmployees e
            INNER JOIN Managers_CTE c ON e.ManagerID = c.EmployeeID)
SELECT * FROM Managers_CTE ORDER BY ManagerID, EmployeeID
GO

 

I tested it and it worked nicely; it was a simple solution and provided the requested results.
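As a quick sanity check (assuming the #ManagersEmployees table from above still exists in the same session), calling it for ManagerID 2 returns that manager’s direct reports and everyone beneath them:

EXEC ManagerRecursion_CTE @ManagerID = 2
-- Returns (2,3), (2,4), (2,5), (3,6), (3,7), (3,8), (4,10), (5,11), (5,12), (12,13), (12,14)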

 

That’s not recursion

The trouble is that, while the results were correct, I was advised that this was not recursive and did not meet the criteria. Back to the drawing board then.

After a lot more work I came up with the following:

CREATE PROCEDURE ManagerRecursion_NonCTE @ManagerID INT
AS
SET NOCOUNT ON
DECLARE @rowcnt INT, @lastrow INT
DECLARE @Tbl_Results TABLE (rowid INT IDENTITY(1,1), ManagerID INT, EmployeeID INT)

INSERT INTO @Tbl_Results (ManagerID, EmployeeID)
SELECT ManagerID, EmployeeID
FROM #ManagersEmployees
WHERE ManagerID = @ManagerID

SET @rowcnt = @@ROWCOUNT
SET @lastrow = 0

WHILE @rowcnt > 0
BEGIN
    INSERT INTO @Tbl_Results (ManagerID, EmployeeID)
    SELECT m.ManagerID, m.EmployeeID
    FROM #ManagersEmployees m
        INNER JOIN @Tbl_Results t
            ON m.ManagerID = t.EmployeeID
    WHERE rowid > @lastrow

    SELECT @rowcnt = @@ROWCOUNT
    SELECT @lastrow = @@IDENTITY - @rowcnt
END

SELECT ManagerID, EmployeeID FROM @Tbl_Results ORDER BY ManagerID, EmployeeID
GO

 

I tested this and got back the same results as with the first procedure with all the values I passed in. Deep breath on this one as it was pushing the limits of what I could produce on the spot in an interview.

 

That’s still not recursion

Again, while the results were correct, this was not recursive, so it was back to the drawing board once more. This time I had to admit defeat; however, I did tell the interviewer that I would work on a solution at home and email it in. He gave me his contact information, we completed the rest of the interview, and I went home determined to get the right data in the manner that the interviewer wanted.

After a whole bunch of reading and a lot of work I finally came up with correct results in a recursive procedure which I emailed in to get feedback.

CREATE PROC ManagerRecursion @EmpID INT, @InnerLoop INT = 0
AS
BEGIN
    IF @InnerLoop = 0
    BEGIN
        CREATE TABLE #Tbl_Results (ManagerID INT, EmployeeID INT)
    END

    INSERT INTO #Tbl_Results (ManagerID, EmployeeID)
    SELECT ManagerID, EmployeeID FROM #ManagersEmployees WHERE ManagerID = @EmpID

    SET @EmpID = (SELECT MIN(EmployeeID) FROM #ManagersEmployees WHERE ManagerID = @EmpID)

    WHILE @EmpID IS NOT NULL
    BEGIN
        EXEC ManagerRecursion @EmpID, 1
        SET @EmpID = (SELECT MIN(EmployeeID) FROM #ManagersEmployees WHERE EmployeeID > @EmpID
            AND EmployeeID NOT IN (SELECT ManagerID FROM #Tbl_Results)
            AND EmployeeID IN (SELECT EmployeeID FROM #Tbl_Results))
    END

    IF @InnerLoop = 0
    BEGIN
        SELECT * FROM #Tbl_Results ORDER BY ManagerID, EmployeeID
        DROP TABLE #Tbl_Results
    END
END
GO

 

Unfortunately no feedback was forthcoming. I felt good about providing this solution despite that. I enjoy a challenge and this was certainly one of those.

 

So what was the right way?

That depends on who you ask. For me the first way was the right way. It performed well, the code was clean and easy, and it required a minimal amount of development. I feel that this kind of problem is exactly the reason that CTEs were created in the first place.

The second solution was a lot more work, the query got more complex and it does not perform as well as the first.

The final procedure was true recursion in that the procedure calls itself over and over again until all of the results are returned. It’s a loop that makes cursors look like they perform well. It was easily the worst performing of the three.

 

It all goes to show there’s more than one way to get the results you need. It’s also interesting how an example like this shows just how much work the SQL development team have done to help reduce complexity and improve performance.

Upcoming Clustering Fun

I’ve a fun week upcoming: I get to do a bunch of installs of SQL 2008 R2 on some Windows 2008 R2 clusters.

 

I’ve had a lot of experience building and working with Windows 2003 clusters and deploying SQL 2008, but this is a new era for me and means new toys to play with and new things to learn. I’m excited to really get to grips with Windows 2008 clustering; there are some significant differences between it and 2003, which is going to provide some challenges. I’m aiming to perform quite a few build-ups and tear-downs of one cluster over the next couple of weeks to build my level of comfort with the newer technology, and to work towards getting some nice scripts together to do so.

 

As a consequence of Windows 2008 R2 being new to me for clustering I decided to pick up Pro SQL Server 2008 Failover Clustering by Allan Hirt (blog|twitter), a Clustering MVP and guru (who has a new whitepaper out about Applying Updates to a Clustered Instance of SQL Server 2008 or SQL Server 2008 R2). Allan was on the MidnightDBA web show a couple of weeks ago (go watch it) where there was lots of conversation on several aspects of High Availability.

 

I’m still only a quarter of the way through the book but have already found out some fantastic information on things such as the network priority and installing server roles from the command line.

 

As the build goes along I plan on putting up a couple of posts to track the process and gotchas as well as a final review of Allan’s book, so check back soon.

Checking Database Space With PowerShell

Have you ever been asked the question “quick, I need to know how much free space there is in the Nodrap database on server Cromulent”?

Ever wished that you could quickly grab that information?

You might well have a TSQL script that will grab that information for you. I’ve been working on pulling that information centrally, and grabbing that data for multiple servers becomes a much simpler task when using PowerShell.

 

 

Why PowerShell and not TSQL?

For me the big bonus that PowerShell provides for grabbing this data is that I can pull it quickly and easily, I can run it against multiple machines just as simply, and the same script works for SQL 2000 and up.

 

 

How big is my database?

Open up your favorite PowerShell ISE (you can even use the one that comes with Windows). If they aren’t already loaded you’ll need to add the SQL snapins:

Add-PSSnapin SqlServer* 


 

 

To get a list of databases, their sizes and the space available in each on the local SQL instance:

dir SQLSERVER:\SQL\localhost\default\databases | Select Name, Size, SpaceAvailable | ft -auto


 

CheckDBWPS SizeSpace

 

Very quick and easy. The information here is a little misleading though.

Size is the size of the database and includes the size of the transaction log(s). SpaceAvailable only represents the space available in the data files, however, unless we’re looking at SQL 2000, in which case it’s the free space in the data and log files. To make things even more confusing, Size is reported in MB and SpaceAvailable in KB.
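As a rough sketch (not in the original post), you can normalize everything to MB with a calculated property:

#Report SpaceAvailable in MB alongside Size (which is already in MB)
dir SQLSERVER:\SQL\localhost\default\databases |
    Select Name, Size, @{Name='SpaceAvailableMB'; Expression={[math]::Round($_.SpaceAvailable / 1024, 2)}} |
    ft -auto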

To get more details we need to look at the transaction log information.

 

 

How big is my transaction log?

To grab this information we need to go deeper. In this case we will focus on the AdventureWorks database.

dir SQLSERVER:\SQL\localhost\default\databases\AdventureWorks\logfiles |
    Select Name, Size, UsedSpace | ft -auto


 

CheckDBWPS LogSpace

 

This information is all in KB, which helps it make a little more sense.

 

Doing the math

Now that we have the size of the database, the size of the log and the free space in each, some quick math gives us the sizing information (worked through in the sketch after this list):

  • Size = Database size in MB
  • Size / 1024 (from the log files) = Log size in MB
  • UsedSpace / 1024 = Log used in MB
  • (Size – UsedSpace) / 1024 = Log free in MB
  • Size – (Size / 1024 (from the log files)) = Data files size in MB
  • SpaceAvailable / 1024 = Space free in the data files
  • Size – (SpaceAvailable / 1024) – (Size / 1024 (from the log files)) = Space used in the data files
  • Size – (Size / 1024 (from the log files)) – ((SpaceAvailable / 1024) – ((Size – UsedSpace) / 1024)) = Space used in the data files (SQL 2000, where SpaceAvailable includes the log free space)
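A minimal sketch of that math for a single database, assuming the AdventureWorks example from above and a SQL 2005+ instance (so SpaceAvailable covers the data files only):

#Worked example of the formulas above for one database (SQL 2005 and up)
$db = dir SQLSERVER:\SQL\localhost\default\databases | Where-Object { $_.Name -eq 'AdventureWorks' }
$logSizeMB = (dir SQLSERVER:\SQL\localhost\default\databases\AdventureWorks\logfiles |
    Measure-Object -Property Size -Sum).Sum / 1024
$logUsedMB = (dir SQLSERVER:\SQL\localhost\default\databases\AdventureWorks\logfiles |
    Measure-Object -Property UsedSpace -Sum).Sum / 1024
$dataSizeMB = $db.Size - $logSizeMB                #data files size in MB
$dataFreeMB = $db.SpaceAvailable / 1024            #space free in the data files
$dataUsedMB = $dataSizeMB - $dataFreeMB            #space used in the data files
"Data: {0:N1} MB ({1:N1} used)  Log: {2:N1} MB ({3:N1} used)" -f $dataSizeMB, $dataUsedMB, $logSizeMB, $logUsedMB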

 

Outputting nice results

Running all this and grabbing the data from different areas can lead to messy results. This is where the DataTable comes to the rescue.

A DataTable is a .NET object, much like a SQL table, that can hold data for you and is easy to work with from PowerShell.

First you create the object, define and add columns and then add rows before finally returning the data.

Here’s a quick example:

$dataTable = New-Object system.Data.DataTable "SomeStuff"

#Specify the column names for the data table

$col1 = New-Object system.Data.DataColumn FunkyText,([string])

$col2 = New-Object system.Data.DataColumn JustANumber,([int])

$col3 = New-Object system.Data.DataColumn TodaysDate,([datetime])

 

#Add the columns to the data table

$dataTable.Columns.Add($col1)

$dataTable.Columns.Add($col2)

$dataTable.Columns.Add($col3)

#Create a new row

$row = $dataTable.NewRow()

#Add data for each column in the row

$row.FunkyText = "Something really funky"

$row.JustANumber = 1395

$row.TodaysDate = ((Get-Date ).ToString("yyyy-MM-dd HH:mm:ss"))

#Add the new row to the datatable

$dataTable.Rows.Add($row)

 

#Output the datatable

$dataTable | out-gridview


 

CheckDBWPS Datatable

 

 

Putting everything together

Running the following script will pull the data and log information for all of the databases on the SQL instance specified in the top parameter. It also handles named instances without modification, and is factored to return the correct information for SQL 2000 as well as higher versions.

Note: Run DBCC UPDATEUSAGE for SQL 2000 instances to ensure that the data is accurate.
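The command that note refers to is simply the following, run in the context of each database (0 means the current database):

DBCC UPDATEUSAGE (0)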

Param ($Servername= 'localhost')

  
Function Get-DBSizes ($Servername)

{

$Servername

#Check to see if it's a named instance; if it is, the location will be different

if ($Servername -ilike "*\*") { $BaseLocation = "SQLSERVER:\SQL\$Servername\databases" }

    else { $BaseLocation = "SQLSERVER:\SQL\$Servername\default\databases" }

$dblist = dir $BaseLocation -Force | select Name

foreach ($db in $dblist)

{

    $location = $db.Name

    $locationFixed = $location -replace "[.]", "%2e"

    #Grab the database information

    $DBInfo = dir $BaseLocation -Force | Where-Object {$_.Name -eq $location; $_.Refresh()} | 

        select Name, 

                size,

                SpaceAvailable,

                CompatibilityLevel

 

    #Pull the log information

    #Use measure-object to sum up sizes in the event that we might have more than one log file

    $logsize = dir $BaseLocation\$locationFixed\logfiles | Where-Object {$_.Name; $_.Refresh()} | Measure-Object -Property Size -Sum

    $logused = dir $BaseLocation\$locationFixed\logfiles | Where-Object {$_.Name; $_.Refresh()} | Measure-Object -Property UsedSpace -Sum

    $sp = $DBInfo.SpaceAvailable

    $TotalDBSizeMB = $DBInfo.size

    $LogSizeMB = ($logsize.sum / 1024)

    $LogUsedMB = ($logused.Sum / 1024)

    $LogFreeMB = ($LogSizeMB - $LogUsedMB)

    $DataFilesSizeMB = ($TotalDBSizeMB - $LogSizeMB)

    $SpaceAvail = ($dbinfo.SpaceAvailable / 1024)

    
    #Because SQL 2000 SpaceAvailable includes log space we have to do different calculations depending on version

    #Run DBCC UPDATEUSAGE on your 2000 databases nightly to help ensure this data is accurate

    $Compat = $DBInfo.CompatibilityLevel

    if ($Compat -eq 'Version80') { $DataFileFreeMB = ($SpaceAvail - $LogFreeMB)    }

        else { $DataFileFreeMB = $SpaceAvail }

    

    

    $DataFileUsedMB = ($DataFilesSizeMB - $DataFileFreeMB)

    $DataFilePercentUsed = ($DataFileUsedMB / $DataFilesSizeMB) * 100

    $DataFilePercentFree = 100 - $DataFilePercentUsed 

    $LogPercentUsed = ($LogUsedMB / $LogSizeMB) * 100

    $LogPercentFree = 100 - $LogPercentUsed

    $date = (Get-Date ).ToString("yyyy-MM-dd HH:mm:ss")

    

    #Write the results into the data table

    $row = $dataTable.NewRow()

    $row.ServerName = $Servername

    $row.DatabaseName = $location

    $row.TotalDBSizeMB = $TotalDBSizeMB

    $row.DataFilesSizeMB = $DataFilesSizeMB

    $row.DataFilesUsedMB = $DataFileUsedMB

    $row.DataFilesFreeMB = $DataFileFreeMB

    $row.DataPercentUsed = $DataFilePercentUsed

    $row.DataPercentFree = $DataFilePercentFree

    $row.LogFilesSizeMB = $LogSizeMB

    $row.LogFilesUsedMB = $LogUsedMB

    $row.LogFilesFreeMB = $LogFreeMB

    $row.LogPercentUsed = $LogPercentUsed

    $row.LogPercentFree = $LogPercentFree

    $row.Date = $date

    $dataTable.Rows.Add($row)

    
    #And we are done

}

}

 
 

#Create data table to hold the results

$dataTable = New-Object system.Data.DataTable "Results"

#Specify the column names for the data table

$col1 = New-Object system.Data.DataColumn ServerName,([string])

$col2 = New-Object system.Data.DataColumn DatabaseName,([string])

$col3 = New-Object system.Data.DataColumn TotalDBSizeMB,([int])

$col4 = New-Object system.Data.DataColumn DataFilesSizeMB,([int])

$col5 = New-Object system.Data.DataColumn DataFilesUsedMB,([int])

$col6 = New-Object system.Data.DataColumn DataFilesFreeMB,([int])

$col7 = New-Object system.Data.DataColumn DataPercentUsed,([decimal])

$col8 = New-Object system.Data.DataColumn DataPercentFree,([decimal])

$col9 = New-Object system.Data.DataColumn LogFilesSizeMB,([int])

$col10 = New-Object system.Data.DataColumn LogFilesUsedMB,([int])

$col11 = New-Object system.Data.DataColumn LogFilesFreeMB,([int])

$col12 = New-Object system.Data.DataColumn LogPercentUsed,([decimal])

$col13 = New-Object system.Data.DataColumn LogPercentFree,([decimal])

$col14 = New-Object system.Data.DataColumn Date,([datetime])

#Add the columns to the data table

$dataTable.Columns.Add($col1)

$dataTable.Columns.Add($col2)

$dataTable.Columns.Add($col3)

$dataTable.Columns.Add($col4)

$dataTable.Columns.Add($col5)

$dataTable.Columns.Add($col6)

$dataTable.Columns.Add($col7)

$dataTable.Columns.Add($col8)

$dataTable.Columns.Add($col9)

$dataTable.Columns.Add($col10)

$dataTable.Columns.Add($col11)

$dataTable.Columns.Add($col12)

$dataTable.Columns.Add($col13)

$dataTable.Columns.Add($col14)

 
 

 
#Provide the name of the SQL server that we want to check

 
$WarningPreference = "silentlycontinue"

#Call the function to populate the results

#get-content c:\Temp\serverlist.txt | % {Get-DBSizes $_ }

Get-DBSizes $Servername

 

#Output the results

$dataTable | Out-GridView

 

 


 

Final note

Download the PS1 file and save it on your machine; from there you can call the script and just pass in the SQL instance to get the results, e.g.:

./Get-DatabaseSize.ps1 localhost
./Get-DatabaseSize.ps1 cromulentawesomesauce

In a follow up post I’ll show how we apply a minor tweak to the script and have the data loaded into a SQL database.

Guest Post On Hey, Scripting Guy! Blog

Recently I was contacted by Aaron Nelson (blog|twitter) who provided me an awesome opportunity to write a guest blog post for the Hey, Scripting Guy! Blog. Naturally I jumped at the chance, and the post went live this week as part of a week of SQL-related posts in honor of SQLRally, which is coming up next week in Orlando.

 

My post was on using PowerShell to report on SQL Server backup status. Go have a read and let me know what you think. With the rest of the week being dedicated to SQL and PowerShell don’t forget to keep checking the Hey, Scripting Guy! Blog.

Stop Logging Successful Backups

It’s great that SQL writes to the Event log and SQL log every time a backup completes. You get to see lots of data saying that database master was backed up and database model was backed up, then that msdb was backed up etc…

 

Is it really that useful?

Well, at least it can be useful to have in there. The thing is, there are better ways to find out when your databases were last backed up (such as using PowerShell or querying msdb).
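For example, here is a quick sketch of the msdb route (not from the original post):

-- Last backup of each type per database, straight from msdb
SELECT  d.name,
        bs.type,                                  -- D = full, I = differential, L = log
        MAX(bs.backup_finish_date) AS last_backup
FROM    sys.databases d
        LEFT JOIN msdb.dbo.backupset bs ON bs.database_name = d.name
GROUP BY d.name, bs.type
ORDER BY d.name, bs.type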

The downside is that it quickly fills your logs with information that you don’t really need to know. It gets even worse if you perform frequent transaction log backups. If you have three databases that are having their logs dumped to disk every 20 minutes, all of a sudden your SQL log is next to useless. All you are seeing is backup success messages and not much else. It makes it all too easy to miss the important things that you really need to pay attention to.

 

There is hope

Yes, it’s true! You can stop SQL from dumping that information into the log.

By enabling trace flag T3226 you kill these messages dead. Worry not: any failure will still be written to the logs; however, all those pointless success notifications will vanish. All of a sudden your logs will be cleaner and meaner, important things will stand out, and your scroll wheel can take a break.

 

How to enable T3226

Open up SQL Server Configuration Manager and bring up the properties of the SQL Server service. Under the Advanced tab you’ll see the startup parameters. Just tack ;-T3226 on to the end, apply the changes and restart SQL.
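If a restart isn’t convenient right away, the flag can also be turned on globally at runtime (it won’t survive a restart, so still add the startup parameter). A quick sketch:

DBCC TRACEON (3226, -1)   -- the -1 makes it instance-wide
DBCC TRACESTATUS (3226)   -- verify it took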

 

The steps are further outlined at http://msdn.microsoft.com/en-us/library/ms345416.aspx and http://msdn.microsoft.com/en-us/library/ms190737.aspx.

 

Go ahead, clean up those log files…you know you want to.

Adding LUNs In VMware – A Cautionary Tale

Over my 10-plus years of being a DBA I’ve performed LUN manipulation tasks on dozens of occasions with no errors or problems. Other than adding new disks to Windows 2003 clusters I’ve never had to take SQL Server offline to perform these tasks either.

 

A simple request

I needed a new drive to be added to a server I had just taken over in order to perform backups of some large databases. This is something that the SAN engineers and Windows support teams have a lot of experience doing and so we weren’t concerned that there might be any problems.

 

The uh-oh moment

I was running a couple of scripts on the SQL instance hosted on that server when, all of a sudden, corruption errors showed up. The dreaded error 823 sat there in red, staring me in the face. I had just taken over this machine, there were no backups, and right as we were trying to add a LUN to be able to perform those backups, things went boom.

My stomach hit the floor.

 

Remembering the words of Douglas Adams

Don’t panic!

The last thing you want to do as a DBA when there’s a problem is panic. Panic I did not. I took a deep breath and walked down to speak with the SAN and Windows admins to find out what might have happened. They reported no problems, and showed that the new LUN was on there and that a new drive had been created, ready for me to start performing backups.

With my nerves calmed somewhat, I went back to my desk and started trying to figure out what the problem was. The error logs were showing that the data files did not exist. Very strange, as after connecting to a remote desktop session on the server I was able to see the drive and files without any issue even though SQL could not.

I started digging through the Windows event logs to see if anything stood out. Right at the time that the errors started happening a single error showed in the system event log:

Event Type: Error
Event Source: PlugPlayManager
Event Category: None
Event ID: 12
Description: The device ‘LUN SCSI Disk Device’ (SCSIDisk&Ven&Prod_LUN&Rev4&23432&3&2) disappeared from the system without first being prepared for removal.

The error stated that the LUN hosting the data files lost connectivity and came back online almost immediately.

 

The fix

With SQL not able to see the files or drive I had no choice but to restart the instance in the hope that things would come back online. With bated breath I restarted the service and waited to see what would happen. One by one the databases started coming back online and my pulse slowed a little. Once all the databases showed as recovered without error I kicked off a DBCC CHECKDB on each one so that I could feel comfortable that everything was good and happy. The DBCC execution took 7 hours and everything came back clean. Happy with this knowledge I kicked off a backup and went to bed.
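For what it’s worth, one quick way to run that check across every database (a sketch; not necessarily exactly what I ran) is the undocumented sp_MSforeachdb helper:

-- Run CHECKDB against every database on the instance
EXEC sp_MSforeachdb 'DBCC CHECKDB ([?]) WITH NO_INFOMSGS'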

 

So what happened?

Unlike any of the SQL instances I have worked with before, this one was running on a VMware ESX virtual machine and the SQL LUNs, rather than being a part of the VM itself, were raw mapped drives. A raw mapped drive is handled using raw device mapping, which creates a symbolic link from within the VMware file system to the external LUN (at least that’s what my admin tells me and I believe him).

What we found was a quirk: when VMware performed a bus scan to find the new LUN and create the mapping, it very briefly dropped connectivity to the existing raw mapped drive. This is what caused Windows to lose the drive and SQL to lose access to the files. The drive was down for about 3 seconds, more than enough time for SQL to recognize a problem.

 

Lesson learned

From now on I will be very careful with any kind of LUN changes on VMware machines with raw mapped drives. I’ll ensure that we schedule an outage and take SQL offline while any adjustments are made just in case something like this crops up again.

T-SQL Tuesday #10 – Applying Indexes To Replicated Tables

It’s T-SQL Tuesday time again! This month Michael J. Swart (blog|twitter) is hosting and has given us the topic of indexes.

I’ve previously written about how you can find indexes that are not being used and get their sizes. Not being allowed to reuse that meant that I had to come up with something else.

In my previous post on calculating replication schema options I documented a way to change article options in transactional replication so that all of the non-clustered indexes that exist on the publishing table are applied to the subscribing one. But what if you are using replication to offload reporting and you need to create indexes that don’t exist on the publishing table?

You could always manually apply these indexes any time that you perform a snapshot but this requires manual work and can easily be forgotten.

Fortunately there is another option: you can have replication automatically apply a post-snapshot SQL script which will create those indexes for you. Here’s how…

 

Create a basic publication (snapshot or transactional) and add a table article. Perform the snapshot and compare indexes between the tables (in this example I am using the person.address table from AdventureWorks2008).

After the snapshot here’s how the indexes stand

 

For this example we’re going to create a new index on AdminDB.person.address. To do this we first need to generate the index creation script:

 

USE AdminDB

GO

IF NOT EXISTS (SELECT * FROM sys.indexes WHERE object_id = OBJECT_ID(N'[person].[address]') AND name = N'NCIX_Zip')
CREATE NONCLUSTERED INDEX [NCIX_Zip] ON [person].[address]
(
    [PostalCode] ASC
)
WITH (ONLINE = ON, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
GO

 

Save this script to C:\ReplicationScripts\CreateReportingIndex.sql

Now we change the existing publication so that whenever a snapshot is applied to the subscriber this script will run afterwards. This change can be done in a couple of ways.

The quick and easy way is to use sp_changepublication and just pass in the publication name, property change and location of the script.

 

USE AdventureWorks2008
GO

EXEC sp_changepublication
    @publication = 'Post Snapshot Test'
  , @property = N'post_snapshot_script'
  , @value = 'C:\ReplicationScripts\CreateReportingIndex.sql'
  , @force_invalidate_snapshot = 1 --must invalidate existing snapshots for this to take effect
GO

 

This can also be done using the GUI (right click on the publication and select Properties)

 

 

When you next perform a snapshot the script will be applied. We can check that this actually worked by looking at Replication Monitor, which will tell you that the script was applied.

 

And then you can check the table itself and confirm that the index exists
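For example, a quick check on the subscriber (mirroring the IF NOT EXISTS test in the script above):

USE AdminDB
GO
SELECT name, type_desc
FROM sys.indexes
WHERE object_id = OBJECT_ID(N'[person].[address]') AND name = N'NCIX_Zip'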

 

There are a couple of final things to note.

Usually you will specify a UNC path for the SQL script so that you just have to maintain a single copy. You just need to ensure that the subscriber is able to access the post-snapshot script location.

Post-snapshot scripts are not limited to creating indexes; in fact you can do pretty much anything, just so long as the Distribution Agent has the access:

  • Send yourself an email to let you know that the snapshot has been applied on the subscriber
  • Apply table permissions
  • Create views

If you use replication and find yourself manually reapplying indexes it’s worth evaluating whether using post-snapshot scripts might save you some work.