Tag: Restores

The Curious Case of the Restore Slowdown

For quite some time I had been seeing a situation where database restores were slowing down after 50-60% of the restore was completed (you can output restore progress using WITH STATS = <some value 1-100> or use DMVs or sp_whoisactive). This was not really a big deal with small databases, it would just take an extra couple of minutes to complete. But when I had the need to restore a 16 TB database it became a real problem.

The restore would kick off at a decent rate giving me around 2 GB/sec (restoring over a WAN from multiple files to a server with 144 cores, 512 GB of RAM, and multiple data files on NVME storage). This was acceptable. Then, a little over halfway through the restore performance dropped to 200 MB/sec and then 100 MB/sec (this being viewable through perfmon)

Restore Performance Drop

This issue I could reproduce time and again. Something was up with the product.

In investigating I kicked off another restore and watched the process. Things started swimmingly reading from the 4 backup files on the NAS and processing fast. Then the restore seemed to complete reading 3 of those files and slow down dramatically reading the fourth one. At the same time I could see a single core on the database server being pegged at 100% usage.

There wasn’t anything I could do to resolve this issue (short of using a ridiculous number of backup files in an attempt to make the low perf period as short as possible).

Given that this had the potential for a real RTO/RPO problem it was time to call in the big guns over at Microsoft support.

We ran through a few more scenarios including taking stack dumps during the period when the restore was low (DBCC STACKDUMP <- NEVER EVER DO THIS ON A PROD SYSTEM).

Working through the dumps the MS support engineer found bug from last year whereby disks formatted with a 4K sector size could experience slow restore performance.

I checked up on my servers to confirm that we were using 4K using the built in FSUTIL command. In this case FSUTIL FSINFO NTFSINFO <drive>. This provides a wealth of useful information about the disk, and showed that it was indeed using a 4K sector size.

4K sector size

Fortunately this issue was fixed in October last year with

  • SQL Server 2014 Service Pack 3
  • SQL Server 2014 Service Pack 2 – Cumulative Update 12
  • SQL Server 2016 Service Pack 2 – Cumulative Update 1
  • SQL Server 2016 Service Pack 1 – Cumulative Update 9
  • SQL Server 2017 – Cumulative Update 7

The sad thing for me was that the particular instance I was restoring to was SQL Server 2012, for which no patch was released (as that version is out of mainstream support). But I was able to confirm that the slowdown did not happen when attempting to restore against an appropriately patched version of a fully supported version of SQL Server (time to put on the harder press for convincing management to allow us to upgrade).

So if you are seeing problems with restores getting slow and are on SQL Server 2014/2016/2017 get yourself patched up to date. If you are stuck on an older version either redo the low level formatting on your storage (not recommended), take uncompressed backups (mildly recommended), or work to get yourself on a fully supported version of the product.

Generic Database Restores

The other day I found myself in the position of needing to restore multiple databases to a server. In this case all the log files would be on one drive and all the data files on another. Rather than sit there and write restore scripts for ~20 databases I figured it would be quicker to create a quick wrapper that I could just pass the filename into a then let nature take its course.

This is very basic, no restore performance tuning enhancements for example, however it does just the job that I need it to.

If I was smart I would have made this a stored procedure and then iterated through the list of backups in the directory using powershell and then executed the proc for each one. Maybe next time.

Anyhow, view the script below or download Database Restores Generic.sql

 

SET NOCOUNT ON

 

DECLARE @File NVARCHAR(1000) = N'MyDatabase.bak'

DECLARE @Path NVARCHAR(1000) = N'D:SQLBackups'

DECLARE @MDFPath NVARCHAR(1000) = 'E:MSSQLData'

DECLARE @LDFPath NVARCHAR(1000) = 'L:MSSQLLogs'

 

DECLARE @FullLoc NVARCHAR(2000)= @Path + @File

 

DECLARE @DatabaseName NVARCHAR(128)

DECLARE @RestoreMDF NVARCHAR(2000)

DECLARE @RestoreLDF NVARCHAR(2000)

DECLARE @RestoreCommandFull NVARCHAR(4000)

DECLARE @sqlexec NVARCHAR(4000) 

 

DECLARE @MDFID INT

DECLARE @LDFID INT

DECLARE    @MDFName NVARCHAR(128)

DECLARE @LDFName NVARCHAR(128)

 

 

DECLARE @RestoreFiles TABLE

    (

      ID INT IDENTITY(1,1),

      LogicalName NVARCHAR(128) , PhysicalName NVARCHAR(260) , [Type] CHAR(1) , FileGroupName NVARCHAR(128) , [size] NUMERIC(20, 0) ,

      MAXSIZE NUMERIC(20, 0) , FileID BIGINT , CreateLSN NUMERIC(25, 0) , DropLSN NUMERIC(25, 0) , UniqueID UNIQUEIDENTIFIER ,

      ReadOnlyLSN NUMERIC(25, 0) , ReadWriteLSN NUMERIC(25, 0) , BackupSizeInBytes BIGINT , SourceBlockSize INT , FileGroupID INT ,

      LogGroupGUID UNIQUEIDENTIFIER , DifferentialBaseLSN NUMERIC(25, 0) , DifferentialBaseGUID UNIQUEIDENTIFIER ,

      IsReadOnly BIT , IsPresent BIT , TDEThumbprint VARBINARY(32)

    )

 

DECLARE @RestoreHeader TABLE

    (

        BackupName NVARCHAR(128), BackupDescription NVARCHAR(255), BackupType SMALLINT, ExpirationDate DATETIME,

        Compressed CHAR(1), POSITION SMALLINT, DeviceType TINYINT, UserName NVARCHAR(128), ServerName NVARCHAR(128),

        DatabaseName NVARCHAR(128), DatabaseVersion INT, DatabaseCreationDate DATETIME, BackupSize NUMERIC(20,0),

        FirstLSN NUMERIC(25,0), LastLSN NUMERIC(25,0), CheckpointLSN NUMERIC(25,0), DatabaseBackupLSN NUMERIC(25,0),

        BackupStartDate DATETIME, BackupFinishDate DATETIME, SortORder SMALLINT, [CodePage] SMALLINT, UnicodeLocaleId INT,

        UnicodeComparisonStyle INT, CompatabilityLevel TINYINT, SoftwareVendorId INT, SoftwareVersionMajor INT,

        SoftwareVersionMinor INT, SoftwareVersionBuild INT, MachineName NVARCHAR(128), Flags INT, BindingID UNIQUEIDENTIFIER,

        RecoveryForkID UNIQUEIDENTIFIER, [Collation] NVARCHAR(128), FamilyGUID UNIQUEIDENTIFIER, HasBulkLoggedData BIT,

        IsSnapshot BIT, IsReadOnly BIT, IsSingleUser BIT, HasBackupChecksums BIT, IsDamaged BIT, BeginsLogChain BIT,

        HasIncompleteMetaData BIT, IsForceOffline BIT, IsCopyOnly BIT, FirstRecoveryForkID UNIQUEIDENTIFIER,

        ForkPointLSN NUMERIC(25,0), RecoveryModel NVARCHAR(60), DifferentialBaseLSN NUMERIC(25,0), 

        DifferentialBaseGUID UNIQUEIDENTIFIER, BackupTypeDescription NVARCHAR(60), BackupSetGUID UNIQUEIDENTIFIER,

        CompressedBackupSize BIGINT

        )

 

SELECT @sqlexec = 'RESTORE FILELISTONLY FROM DISK = ''' + @FullLoc + ''''

INSERT INTO @RestoreFiles EXEC (@sqlexec)

 

 

SELECT @sqlexec = 'RESTORE HEADERONLY FROM DISK = ''' + @FullLoc + ''''

INSERT INTO @RestoreHeader EXEC (@sqlexec)

 

SELECT @DatabaseName = DatabaseName FROM @RestoreHeader

 

 

SELECT @MDFID = MIN(ID) FROM @RestoreFiles WHERE [Type] != 'L'

WHILE @MDFID IS NOT NULL

BEGIN

 

 

    IF @MDFID = 1

        BEGIN

            SELECT @RestoreMDF = 'WITH MOVE ' + CHAR(39) + LogicalName + CHAR(39) + ' TO ' + CHAR(39) + @MDFPath +  REVERSE(LEFT(REVERSE(PhysicalName), CHARINDEX('', REVERSE(PhysicalName))-1)) + CHAR(39) + CHAR(13) FROM @RestoreFiles WHERE ID = @MDFID

            END

    ELSE

        BEGIN

            SELECT @RestoreMDF = @RestoreMDF + ', MOVE ' + CHAR(39) + LogicalName + CHAR(39) + ' TO ' + CHAR(39) + @MDFPath +  REVERSE(LEFT(REVERSE(PhysicalName), CHARINDEX('', REVERSE(PhysicalName))-1)) + CHAR(39) FROM @RestoreFiles WHERE ID = @MDFID

        END

 

SELECT @MDFID = MIN(ID) FROM @RestoreFiles WHERE ID > @MDFID AND [Type] != 'L' 

END

 

 

SELECT @LDFID = MIN(ID) FROM @RestoreFiles WHERE [Type] = 'L'

WHILE @LDFID IS NOT NULL

BEGIN

    SELECT @RestoreLDF = ISNULL(@RestoreLDF,'') + ', MOVE ' + CHAR(39) +LogicalName + CHAR(39) + ' TO ' + CHAR(39) + @LDFPath +  REVERSE(LEFT(REVERSE(PhysicalName), CHARINDEX('', REVERSE(PhysicalName))-1)) + CHAR(39) FROM @RestoreFiles WHERE ID = @LDFID

 

SELECT @LDFID = MIN(ID) FROM @RestoreFiles WHERE ID > @LDFID AND [Type] = 'L' 

END

 

 

 

SELECT @RestoreCommandFull = 'RESTORE DATABASE ' + QUOTENAME(@DatabaseName) + ' ' + CHAR(13) + 'FROM DISK = ''' + @FullLoc + '''' + CHAR(13)

SELECT @RestoreCommandFull = @RestoreCommandFull + @RestoreMDF + @RestoreLDF + CHAR(13) + ', NORECOVERY, STATS=20;'

--PRINT @RestoreCommandFull

 

EXEC (@RestoreCommandFull)

 

Quickly Build Transaction Log Restore Commands With PowerShell

I’m in the middle of a server migration and so I’m setting up log shipping to get the databases consistent between the two servers prior to cutting over.

The backup copy over the WAN was taking a long time and so I had a lot of transaction log files that I needed to apply. I could have let log shipping take care of performing the log restores, but I found that a secondary process had created log backups also, just with different names. This lack of consistency was causing me a problem so I needed a quick way to build a restore command of all the files in order.

This was a quick win for PowerShell. I was able to quickly grab a list of the files in date order and pipe those to a string and build the restore command. This I was able to copy into SSMS and perform the restores.

$filelist = dir LogShippingPrimaryp$backup | where-object {$_.Extension -eq '.trn'} | Sort-Object {$_.CreationTime} | select Name

 

foreach ($file in $filelist)

{

$restorefile = $file.Name

 

$text = "restore log MyDatabase from disk = 'P:ForRestore$restorefile' with norecovery;"

Write-Output $text

}

 

If you wanted to take this a step further you could actually use invoke-sqlcmd to execute the $text string and perform the restore (don’t even contemplate doing that if you aren’t running at least SQL 2008 R2). I always prefer to do that within SSMS though.

While I’ve used this script for a migration task it also works really well if you need to perform a point in time restore for your database from a full backup and some transaction log backups. Can you imagine how long it would take you to build out the restore string to recover a database which had transaction log backups every 15 minutes for the last 20 hours? That’s 80 log files. Want to do that manually? This is a much nicer way to go.

Optimizing Database Restores

You can Google/Bing/AltaVista and get a lot of results for speeding up your backups. Those results will include using third-party tools, performing file and filegroup backups, differential backups and the like. Some of the more technical blogs will talk about adjusting some of the backup parameters to improve throughput.

There’s really not much written (that I’ve found, please add comments with links to posts you’ve found) that talks about tuning database restores.

I started messing with some restore parameters last week and found that it’s possible to make some dramatic performance improvements.

 

How this started

We were performing a data center migration at work. As a part of this migration we were going from SQL 2000 to SQL 2008R2 for some of our databases.

The database, we’ll call MyDB, is 620GB in size. There are 5 data files and a single log file. Current backups are performed using Snap technology, which wasn’t going to help us with performing a migration as we needed to get the data over and setup logshipping, so we needed to perform a regular backup.

Third party tools were not an option as this SQL instance was running SP3a and we weren’t able to find anything that would work below SP4.

A full backup was taken to local disk, it took a little over 9 hours (server and disk limitations prevented any work to try and improve that).

The backup file was transferred to a USB drive, taken to the new datacenter and copied over to a CIFS share where it would be available for performing restores.

 

Hardware/software setup

It’s going to help to have an idea of the configuration that’s being used here.

Hardware:

  • Server: HP DL585 G7 (clustered)
  • CPU: 4 x AMD 6176SE (2294Mhz, 12 cores) for a total of 48 cores
  • RAM: 128GB
  • Storage: Netapp SAN with WAFL filesystem – Fiber connected with 2 x 4Gb HBAs, MPIO configured for least queue depth
  • Network: 10Gb/s (HP NC375i) – 2 connections, configured for failover only, no load balancing
  • One LUN defined (as mount points) for each file in the database to be restored

 

Software:

  • SQL 2008R2 CU7 Enterprise Edition
  • Max server memory: 100GB
  • Affinity masks not set
  • Max degree of parallelism: 6
  • 4 x tempdb files

 

 

Basic restore

The initial restore was performed with default options, no optimization was performed.

RESTORE DATABASE MyDB FROM DISK = 'CIFSDeviceBackupSQLBackupsProductionForRestore.bak' 

WITH MOVE 'MyDB' TO 'L:MyDB_DataMSSQLDataMyDB_Data.mdf'

    , MOVE 'MyDB2' TO 'L:MyDB_Data2MSSQLDataMyDB_Data2.ndf'

    , MOVE 'MyDB3' TO 'L:MyDB_Data3MSSQLDataMyDB_Data3.ndf'

    , MOVE 'MyDB4' TO 'L:MyDB_Data4MSSQLDataMyDB_Data4.ndf'

    , MOVE 'MyDB5' TO 'L:MyDB_Data5MSSQLDataMyDB_Data5.ndf'

    , MOVE 'MyDB_Log' TO 'L:MyDB_LogsMSSQLDataMyDB_Log.ldf'

    , STATS = 5

    , NORECOVERY

 

Restore time was 2:48.

 

Optimization techniques

I knew that adjusting the MaxTransferSize and BufferCount settings could provide significant improvements in backup performance (go read a fantastic SQLCat article on this) and figured that the same would be true for performing restores.

Before I could start changing values I needed to know what values were being used with the basic restore. To do this I enabled trace flags 3213 (trace SQL activity during a backup/restore) & 3605 (output trace activity to the SQL error log).

DBCC TRACEON (3213, -1)

DBCC TRACEON (3605, -1)

 

With the flags enabled I performed another restore which provided the following information:

image

So by default we’re running with a BufferCount of 6 and MaxTransferSize of 1,024.

 

The MaxTransferSize value is the largest unit of transfer to be used between SQL Server and the backup media. This is specified in bytes and can range in value from 65536 (64 KB) to 4194304 (4 MB).

The BufferCount specifies the number of IO buffers to be used for the restore operation. Any positive number can be used for this, however you have to be careful with larger numbers as you could potentially cause out of memory errors.

 

Changing the MaxTransferSize

Not having any obvious hardware restrictions I decided that as the restore was being performed across the network adjusting the MaxTransferSize might provide the biggest benefit.

Feeling confident I just bumped it to the max and performed the restore.

RESTORE DATABASE MyDB FROM DISK = 'CIFSDeviceBackupSQLBackupsProductionForRestore.bak' 

WITH MOVE 'MyDB' TO 'L:MyDB_DataMSSQLDataMyDB_Data.mdf'

    , MOVE 'MyDB2' TO 'L:MyDB_Data2MSSQLDataMyDB_Data2.ndf'

    , MOVE 'MyDB3' TO 'L:MyDB_Data3MSSQLDataMyDB_Data3.ndf'

    , MOVE 'MyDB4' TO 'L:MyDB_Data4MSSQLDataMyDB_Data4.ndf'

    , MOVE 'MyDB5' TO 'L:MyDB_Data5MSSQLDataMyDB_Data5.ndf'

    , MOVE 'MyDB_Log' TO 'L:MyDB_LogsMSSQLDataMyDB_Log.ldf'

    , STATS = 5

    , NORECOVERY

    , MAXTRANSFERSIZE = 4194304 

 

This time the restore took 45 minutes, a reduction of 2 hours over the default.

 

Changing the BufferCount

Having already reduced the restore time by over 70% I wanted to see if I could get a little bit more performance from it. So I made an adjustment to the BufferCount value and set it to 75.

RESTORE DATABASE MyDB FROM DISK = 'CIFSDeviceBackupSQLBackupsProductionForRestore.bak' 

WITH MOVE 'MyDB' TO 'L:MyDB_DataMSSQLDataMyDB_Data.mdf'

    , MOVE 'MyDB2' TO 'L:MyDB_Data2MSSQLDataMyDB_Data2.ndf'

    , MOVE 'MyDB3' TO 'L:MyDB_Data3MSSQLDataMyDB_Data3.ndf'

    , MOVE 'MyDB4' TO 'L:MyDB_Data4MSSQLDataMyDB_Data4.ndf'

    , MOVE 'MyDB5' TO 'L:MyDB_Data5MSSQLDataMyDB_Data5.ndf'

    , MOVE 'MyDB_Log' TO 'L:MyDB_LogsMSSQLDataMyDB_Log.ldf'

    , STATS = 5

    , NORECOVERY

    , MAXTRANSFERSIZE = 4194304 

    , BUFFERCOUNT = 75

 

This restore took a little less time, 38 minutes, shaving another 7 minutes off the already shortened time. At this point I was down about 78% over the regular restore command.

I tried pushing the BufferCount to 100 to see if I could trim things further but found no appreciable difference and so left it at 75.

 

Comparing backup types

As a part of the move log shipping was to be setup to offload reporting for users and so I needed to perform a restore of the database from the new SQL 2008R2 server to another with exactly the same specifications.

The native SQL 2000 backup file was 520GB in size, with no optimizations it restored in 2:48, and optimized restored in just 38 minutes (not bad for a network restore).

The compressed SQL 2008R2 backup file was just 121GB in size. An optimized restore of this database took just 23 minutes. That’s an 86% saving over the unoptimized SQL 2000 based restore.

 

Summary

There are times when you are not able to perform partial restores, or you need to restore entire databases to a new environment in as short a time as possible. Don’t always accept the defaults that SQL Server will give you in these situations. There are things that you can do to improve the performance of your restores (and your backups). It’s worth your spending some time now working in your environment and optimizing your restores.

A little time playing with these things now could be the difference between your database being down for 20 minutes as opposed to 3 hours.

How much is 2 1/2 hours worth to your business?

 

 

Such fun.