Sometimes I find that the SQL Server team make curious choices as regards default operations. When setting up replication by default the log reader and distribution agent job steps will retry 2,147,483,647 times before failing. Doing the math on the retry interval of 1 minute that means that the job wouldn’t actually hit a failed status for over 4000 years. While you can setup alerts around replication a lot of folks don’t do that and you could easily be in a situation where replication has been down for so long that you have to reinitialize the subscription and perform a snapshot to get things running again.
Having a great many distribution jobs running on my server I wanted to perform a mass update. It could have gone through and updated each job step but that would have taken me a couple of days. I needed a better way. Digging through the SQL Server Agent stored procedures I found that sp_update_jobstep would do the work for me. I just needed to pass in the relevant parameters and then restart the jobs so that the change took effect.
As such I ended up writing the following script. Just adjust the @retries variable to the value that you want to work with (in this instance it would be an hours worth of retries before the job failed). This script is limited to the Distribution job step but could be easily modified to manage the log reader also.
Running this script took two minutes and it updated 60 distribution agent jobs. I’m getting ready to kick it off on the rest of my servers.