Mass Updating Job Step Retry Attempts

Sometimes I find that the SQL Server team make curious choices as regards default operations. When setting up replication by default the log reader and distribution agent job steps will retry 2,147,483,647 times before failing. Doing the math on the retry interval of 1 minute that means that the job wouldn’t actually hit a failed status for over 4000 years. While you can setup alerts around replication a lot of folks don’t do that and you could easily be in a situation where replication has been down for so long that you have to reinitialize the subscription and perform a snapshot to get things running again.

Having a great many distribution jobs running on my server I wanted to perform a mass update. It could have gone through and updated each job step but that would have taken me a couple of days. I needed a better way. Digging through the SQL Server Agent stored procedures I found that sp_update_jobstep would do the work for me. I just needed to pass in the relevant parameters and then restart the jobs so that the change took effect.

As such I ended up writing the following script. Just adjust the @retries variable to the value that you want to work with (in this instance it would be an hours worth of retries before the job failed). This script is limited to the Distribution job step but could be easily modified to manage the log reader also.

Running this script took two minutes and it updated 60 distribution agent jobs. I’m getting ready to kick it off on the rest of my servers.

SET NOCOUNT ON
      
DECLARE @retries INT = 60 --Set to the value that you want for retry attempts
 
DECLARE @JobIdUpd UNIQUEIDENTIFIER ,
    @JobStepUpd INT,
    @JobStatus INT,
    @JobName SYSNAME,
    @Message NVARCHAR(256)
 
 
--Create a temp table which will hold the results from xp_sqlagent_enum_jobs
--Note xp_sqlagent_enum_jobs is an undocumented stored procedure, it may not be there soon
CREATE TABLE #enumjobs
    (
      job_id UNIQUEIDENTIFIER NOT NULL ,
      last_run_date INT NOT NULL ,
      last_run_time INT NOT NULL ,
      next_run_date INT NOT NULL ,
      next_run_time INT NOT NULL ,
      next_run_schedule_id INT NOT NULL ,
      requested_to_run INT NOT NULL ,
      request_source INT NOT NULL ,
      request_source_id SYSNAME NULL ,
      running INT NOT NULL , 
      current_step INT NOT NULL ,
      current_retry_attempt INT NOT NULL ,
      job_state INT NOT NULL
    ) 
 
 
   
DECLARE UpdateJobsCrsr CURSOR
FOR
SELECT  sjs.job_id ,
        sjs.step_id ,
        sj.name
FROM    msdb.dbo.sysjobsteps sjs
        INNER JOIN msdb.dbo.sysjobs sj ON sjs.job_id = sj.job_id
WHERE   subsystem = 'Distribution'
        AND retry_attempts > 60
 
OPEN UpdateJobsCrsr ;
 
FETCH NEXT FROM UpdateJobsCrsr
INTO @JobIdUpd, @JobStepUpd,@JobName
WHILE ( @@FETCH_STATUS <> -1 ) 
    BEGIN
   
--Update the job step so that the number of retry attempts is set to the @retries variable                
EXEC msdb.dbo.sp_update_jobstep @job_id = @jobidUpd, @step_id = @jobstepUpd, @retry_attempts = @retries  
SELECT @Message = N'Job ' + @JobName + ' updated.'
RAISERROR (@Message, 0, 1) WITH nowait 
 
--We have to restart the job in order for this change to take effect
--Check to see if the job is running, if so stop and restart it, otherwise leave it alone
--Note, you'll need to be a sysadmin for this to work (or change up some stuff)
    TRUNCATE TABLE #enumjobs;
    INSERT INTO #enumjobs
                EXEC master.dbo.xp_sqlagent_enum_jobs 1, 'sa', @JobIdUpd;
                
SELECT @JobStatus = job_state FROM #enumjobs
 
IF @JobStatus <> 4 --This is the job idle state
    BEGIN
--Stop the job
        EXEC msdb.dbo.sp_stop_job @job_id = @JobIdUpd
--Wait a couple of seconds for the job to stop
        WAITFOR DELAY '00:00:03'        
--Start the job up again
        EXEC msdb.dbo.sp_start_job @job_id = @JobIdUpd    
    END
ELSE 
    BEGIN
        SELECT @Message = N'Job ' + @JobName + ' was not running, please start manually if required.'              
        RAISERROR (@Message, 0, 1) WITH NOWAIT
    END
                
        FETCH NEXT FROM UpdateJobsCrsr
            INTO @JobIdUpd, @JobStepUpd, @JobName
    END
 
 
CLOSE UpdateJobsCrsr ;
GO
DEALLOCATE UpdateJobsCrsr ;
GO
 
--Cleanup
DROP TABLE #enumjobs   

SirSQL.net

Mass Updating Job Step Retry Attempts

Leave a comment Cancel reply

Share this:

Related

Leave a comment Cancel reply