Mass Updating Job Step Retry Attempts

Sometimes I find that the SQL Server team make curious choices as regards default operations. When setting up replication by default the log reader and distribution agent job steps will retry 2,147,483,647 times before failing. Doing the math on the retry interval of 1 minute that means that the job wouldn’t actually hit a failed status for over 4000 years. While you can setup alerts around replication a lot of folks don’t do that and you could easily be in a situation where replication has been down for so long that you have to reinitialize the subscription and perform a snapshot to get things running again.

Having a great many distribution jobs running on my server I wanted to perform a mass update. It could have gone through and updated each job step but that would have taken me a couple of days. I needed a better way. Digging through the SQL Server Agent stored procedures I found that sp_update_jobstep would do the work for me. I just needed to pass in the relevant parameters and then restart the jobs so that the change took effect.

As such I ended up writing the following script. Just adjust the @retries variable to the value that you want to work with (in this instance it would be an hours worth of retries before the job failed). This script is limited to the Distribution job step but could be easily modified to manage the log reader also.

Running this script took two minutes and it updated 60 distribution agent jobs. I’m getting ready to kick it off on the rest of my servers.

SET NOCOUNT ON

      

DECLARE @retries INT = 60 --Set to the value that you want for retry attempts

 

DECLARE @JobIdUpd UNIQUEIDENTIFIER ,

    @JobStepUpd INT,

    @JobStatus INT,

    @JobName SYSNAME,

    @Message NVARCHAR(256)

 

 

--Create a temp table which will hold the results from xp_sqlagent_enum_jobs

--Note xp_sqlagent_enum_jobs is an undocumented stored procedure, it may not be there soon

CREATE TABLE #enumjobs

    (

      job_id UNIQUEIDENTIFIER NOT NULL ,

      last_run_date INT NOT NULL ,

      last_run_time INT NOT NULL ,

      next_run_date INT NOT NULL ,

      next_run_time INT NOT NULL ,

      next_run_schedule_id INT NOT NULL ,

      requested_to_run INT NOT NULL ,

      request_source INT NOT NULL ,

      request_source_id SYSNAME NULL ,

      running INT NOT NULL , 

      current_step INT NOT NULL ,

      current_retry_attempt INT NOT NULL ,

      job_state INT NOT NULL

    ) 

 

 

   

DECLARE UpdateJobsCrsr CURSOR

FOR

SELECT  sjs.job_id ,

        sjs.step_id ,

        sj.name

FROM    msdb.dbo.sysjobsteps sjs

        INNER JOIN msdb.dbo.sysjobs sj ON sjs.job_id = sj.job_id

WHERE   subsystem = 'Distribution'

        AND retry_attempts > 60

 

OPEN UpdateJobsCrsr ;

 

FETCH NEXT FROM UpdateJobsCrsr

INTO @JobIdUpd, @JobStepUpd,@JobName

WHILE ( @@FETCH_STATUS <> -1 ) 

    BEGIN

   

--Update the job step so that the number of retry attempts is set to the @retries variable                

EXEC msdb.dbo.sp_update_jobstep @job_id = @jobidUpd, @step_id = @jobstepUpd, @retry_attempts = @retries  

SELECT @Message = N'Job ' + @JobName + ' updated.'

RAISERROR (@Message, 0, 1) WITH nowait 

 

--We have to restart the job in order for this change to take effect

--Check to see if the job is running, if so stop and restart it, otherwise leave it alone

--Note, you'll need to be a sysadmin for this to work (or change up some stuff)

    TRUNCATE TABLE #enumjobs;

    INSERT INTO #enumjobs

                EXEC master.dbo.xp_sqlagent_enum_jobs 1, 'sa', @JobIdUpd;

                

SELECT @JobStatus = job_state FROM #enumjobs

 

IF @JobStatus <> 4 --This is the job idle state

    BEGIN

--Stop the job

        EXEC msdb.dbo.sp_stop_job @job_id = @JobIdUpd

--Wait a couple of seconds for the job to stop

        WAITFOR DELAY '00:00:03'        

--Start the job up again

        EXEC msdb.dbo.sp_start_job @job_id = @JobIdUpd    

    END

ELSE 

    BEGIN

        SELECT @Message = N'Job ' + @JobName + ' was not running, please start manually if required.'              

        RAISERROR (@Message, 0, 1) WITH NOWAIT

    END

                

        FETCH NEXT FROM UpdateJobsCrsr

            INTO @JobIdUpd, @JobStepUpd, @JobName

    END

 

 

CLOSE UpdateJobsCrsr ;

GO

DEALLOCATE UpdateJobsCrsr ;

GO

 

--Cleanup

DROP TABLE #enumjobs   

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s