I’m looking for best practices for dealing with scheduled SQL Server Agent jobs in SQL Server 2012 availability groups. Maybe I missed something, but at present I feel that SQL Server Agent is not really integrated with this great SQL Server 2012 feature.
How can I make a scheduled SQL agent job aware of a node switch? For example I have a job running on the primary node which loads data each hour. Now if the primary goes down, how can I activate the job on the secondary which now becomes primary?
If I schedule the job always on the secondary it fails because then the secondary is read-only.
asked Jun 24 ’13 at 8:26
Within your SQL Server Agent job, add some conditional logic that tests whether the current instance is serving the role you are looking for in your availability group:
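A minimal sketch of such a check, assuming the standard views sys.dm_hadr_availability_replica_states and sys.availability_groups. The PRINT statements are placeholders for whatever the job actually does:

```sql
-- Sketch: branch on the local replica's role for one availability group.
IF (SELECT ars.role_desc
    FROM sys.dm_hadr_availability_replica_states AS ars
    INNER JOIN sys.availability_groups AS ag
        ON ars.group_id = ag.group_id
    WHERE ag.name = N'YourAvailabilityGroupName'
      AND ars.is_local = 1) = N'PRIMARY'
BEGIN
    -- This instance is the primary replica: do the real work here.
    PRINT 'Primary replica: running job logic.';
END
ELSE
BEGIN
    -- Not the primary (or the group name was not found): optional handling.
    PRINT 'Not the primary replica: nothing to do.';
END
```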
All this does is pull the current role of the local replica, and if it’s in the PRIMARY role, you can do whatever it is that your job needs to do if it is the primary replica. The ELSE block is optional, but it’s to handle possible logic if your local replica isn’t primary.
Of course, change ‘YourAvailabilityGroupName’ in the above query to your actual availability group name.
Don’t confuse availability groups with failover cluster instances. Whether the instance is the primary or secondary replica for a given availability group doesn’t affect server-level objects, like SQL Server Agent jobs and so on.
I’m aware of two concepts to accomplish this.
Prerequisite: Based on Thomas Stringer’s answer, I created two functions in the master db of our two servers:
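The two functions could look roughly like this sketch. svf_AgReplicaState is the name used later in this answer; svf_DbReplicaState is a name assumed here for the database-based variant:

```sql
USE master;
GO
-- Returns 1 when the local replica is PRIMARY for the named availability group.
CREATE FUNCTION dbo.svf_AgReplicaState (@availability_group_name sysname)
RETURNS bit
AS
BEGIN
    IF EXISTS (SELECT 1
               FROM sys.dm_hadr_availability_replica_states AS ars
               INNER JOIN sys.availability_groups AS ag
                   ON ars.group_id = ag.group_id
               WHERE ag.name = @availability_group_name
                 AND ars.is_local = 1
                 AND ars.role_desc = N'PRIMARY')
        RETURN 1;
    RETURN 0;
END
GO
-- Returns 1 when the local replica is PRIMARY for the group containing the database.
-- (Function name is an assumption for this sketch.)
CREATE FUNCTION dbo.svf_DbReplicaState (@database_name sysname)
RETURNS bit
AS
BEGIN
    IF EXISTS (SELECT 1
               FROM sys.dm_hadr_availability_replica_states AS ars
               INNER JOIN sys.availability_databases_cluster AS adc
                   ON ars.group_id = adc.group_id
               WHERE adc.database_name = @database_name
                 AND ars.is_local = 1
                 AND ars.role_desc = N'PRIMARY')
        RETURN 1;
    RETURN 0;
END
GO
```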
Make a job terminate if it’s not executed on the primary replica
For this case, every job on both servers needs either of the following two code snippets as Step 1:
Check by group name:
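A sketch of that first step, assuming the svf_AgReplicaState function lives in master and 'my_group_name' stands in for your group. Severity 16 is used here so that the step actually fails:

```sql
-- Fail the step when this server is not the primary for the group.
IF master.dbo.svf_AgReplicaState(N'my_group_name') = 0
    RAISERROR('This is not the primary replica.', 16, 1);
```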
Check by database name:
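The same idea keyed on a database, as a sketch. svf_DbReplicaState is an assumed name for the database-based function, and 'my_db_name' is a placeholder:

```sql
-- Fail the step when this server is not the primary for the database's group.
IF master.dbo.svf_DbReplicaState(N'my_db_name') = 0
    RAISERROR('This is not the primary replica.', 16, 1);
```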
If you use this second one, beware of the system databases though – by definition they can not be part of any availability group, so it’ll always fail for those.
Both of these work out of the box for admin users. For non-admin users, you have to add extra permissions; one option is suggested here:
If you set the failure action of this first step to Quit job reporting success, you won’t get a job log full of ugly red cross signs. For the main job they’ll turn into yellow warning signs instead.
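The permissions in question are probably along these lines. This is an assumption based on what the views used by the functions require, and [job_user] is a placeholder login:

```sql
USE master;
GO
-- Needed to read sys.dm_hadr_availability_replica_states.
GRANT VIEW SERVER STATE TO [job_user];
-- Needed to see availability group metadata in sys.availability_groups.
GRANT VIEW ANY DEFINITION TO [job_user];
```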
From our experience, this is not ideal. We at first adopted this approach, but quickly lost track regarding finding jobs that actually had a problem, because all the secondary replica jobs cluttered the job log with warning messages.
What we then went for is:
If you adopt this concept, you’ll actually need to create two jobs per task you want to perform. The first one is the “proxy job” that checks whether it’s being executed on the primary replica. If so, it starts the “worker job”; if not, it just ends gracefully without cluttering the log with warning or error messages.
While I personally don’t like the idea of having two jobs per task on every server, I think it’s definitely more maintainable, and you don’t have to set the failure action of the step to Quit job reporting success, which is a bit awkward.
For the jobs, we adopted a naming scheme. The proxy job just carries the name of the task, and the worker job has the same name with “worker” appended. That way the proxy can start the worker automatically through a stored procedure.
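A sketch of a procedure that starts the worker job from the proxy. The procedure name procStartWorkerJob, its parameters, and the ' worker' suffix are assumptions for illustration:

```sql
USE master;
GO
CREATE PROCEDURE dbo.procStartWorkerJob
    @jobId uniqueidentifier,
    @availabilityGroup sysname,
    @postfix sysname = N' worker'
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @name sysname;

    -- On a secondary replica, end quietly without an error.
    IF dbo.svf_AgReplicaState(@availabilityGroup) = 0
    BEGIN
        PRINT 'This is not the primary replica.';
        RETURN;
    END

    -- Derive the worker job name from the calling (proxy) job's name.
    SELECT @name = name + @postfix
    FROM msdb.dbo.sysjobs
    WHERE job_id = @jobId;

    IF EXISTS (SELECT 1 FROM msdb.dbo.sysjobs WHERE name = @name)
        EXEC msdb.dbo.sp_start_job @job_name = @name;
    ELSE
        RAISERROR(N'Job ''%s'' not found.', 16, 1, @name);
END
GO
```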
This utilizes the svf_AgReplicaState function shown above, you could easily change that to check using the database name instead by calling the other function.
From within the only step of the proxy job, you call it like this:
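A sketch of the call, assuming the procedure is named procStartWorkerJob and lives in master. The $(ESCAPE_NONE(JOBID)) token is replaced by SQL Server Agent at run time, so this only parses inside a job step, not in a query window:

```sql
-- Agent substitutes the token with the current job's id (a binary literal).
EXEC master.dbo.procStartWorkerJob $(ESCAPE_NONE(JOBID)), N'my_group_name';
```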
This uses SQL Server Agent tokens, as shown here and here, to get at the current job’s id. The procedure then gets the current job name from msdb, appends “worker” to it and starts the worker job using sp_start_job.
While this is still not ideal, it keeps the job logs more tidy and maintainable than the previous option. Also, you can always have the proxy job run with a sysadmin user, so adding any extra permissions isn’t necessary.
Rather than doing this on a per job basis (checking every job for the state of the server before deciding to continue), I’ve created a job running on both servers to check to see what state the server is in.
- If it’s primary, then enable any job that has a step targeting a database in the AG.
- If the server is secondary, disable any job targeting a database in the AG.
This approach provides a number of things:
- It works on servers where there are no databases in an AG (or a mix of databases in and out of AGs)
- anyone can create a new job and not have to worry about whether the db is in an AG (although they do have to remember to add the job to the other server)
- Allows each job to have a failure email that remains useful (all your jobs have failure emails right?)
- When viewing the history of a job, you actually get to see whether the job actually ran and did something (this being the primary), rather than seeing a long list of success that actually didn’t run anything (on the secondary)
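A rough sketch of such a procedure, not the original one. The name usp_SyncAgJobState is hypothetical, and the sketch omits the comment that explains why a job was disabled. It will also re-enable a job that someone disabled on purpose on the primary, so treat it as a starting point:

```sql
CREATE PROCEDURE dbo.usp_SyncAgJobState
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @job_id uniqueidentifier, @should_enable tinyint;

    -- Jobs with a step targeting an AG database whose enabled flag does not
    -- match the local role. A job touching several AGs is enabled only if
    -- this server is primary for all of them.
    DECLARE job_cursor CURSOR LOCAL STATIC FORWARD_ONLY READ_ONLY FOR
        SELECT j.job_id,
               CAST(MIN(CASE WHEN ars.role_desc = N'PRIMARY' THEN 1 ELSE 0 END) AS tinyint)
        FROM msdb.dbo.sysjobs AS j
        INNER JOIN msdb.dbo.sysjobsteps AS js
            ON js.job_id = j.job_id
        INNER JOIN sys.availability_databases_cluster AS adc
            ON adc.database_name = js.database_name
        INNER JOIN sys.dm_hadr_availability_replica_states AS ars
            ON ars.group_id = adc.group_id
           AND ars.is_local = 1
        GROUP BY j.job_id, j.enabled
        HAVING j.enabled <> MIN(CASE WHEN ars.role_desc = N'PRIMARY' THEN 1 ELSE 0 END);

    OPEN job_cursor;
    FETCH NEXT FROM job_cursor INTO @job_id, @should_enable;
    WHILE @@FETCH_STATUS = 0
    BEGIN
        EXEC msdb.dbo.sp_update_job @job_id = @job_id, @enabled = @should_enable;
        FETCH NEXT FROM job_cursor INTO @job_id, @should_enable;
    END
    CLOSE job_cursor;
    DEALLOCATE job_cursor;
END
```

The job that runs this procedure should itself target master or msdb, so that it is never matched and disabled by its own logic.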
This proc is executed every 15 mins on each server. (has the added bonus of appending a comment to inform people why the job was disabled)
It’s not foolproof, but for overnight loads and hourly jobs it gets the job done.
answered May 10 ’16 at 20:47
I like this idea even better than my own approach, +1. Couldn’t you have posted this three years ago? 😉 takrl Dec 19 ’16 at 12:47
Another way is to insert a step in each job, which should run first, with the following code:
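A sketch of what that first step might contain. It assumes the server hosts availability groups; on a server without any, this check would always raise the error:

```sql
-- Raise an error unless this server is the primary replica of some group.
IF NOT EXISTS (SELECT 1
               FROM sys.dm_hadr_availability_replica_states
               WHERE is_local = 1
                 AND role_desc = N'PRIMARY')
BEGIN
    -- We are on a secondary: fail this step so the job quits here.
    RAISERROR('Unable to execute job on secondary node.', 16, 1);
END
```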
Set this step to continue with the next step on success, and to quit the job reporting success on a failure.
I find it cleaner to add an extra step instead of adding extra logic to an existing step.
It is better to create a new first job step that checks whether the server is the primary replica. If it is, the job continues; if it is a secondary replica, the step stops the job. Do not fail the job, or it will keep sending unnecessary notifications. Stopping the job instead cancels it, so no notifications go out when these jobs run on the secondary replica.
Below is the script to add a first step for a specific job.
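A sketch of such a script, not the original one. It assumes msdb.dbo.sp_add_jobstep with @step_id = 1 to insert the guard ahead of the existing steps, and it relies on sp_stop_job cancelling the running job from inside the step, as described above. The step name is made up:

```sql
-- Insert a guard as step 1; existing steps are renumbered automatically.
EXEC msdb.dbo.sp_add_jobstep
    @job_id = N'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX',
    @step_id = 1,
    @step_name = N'Check for primary replica',
    @subsystem = N'TSQL',
    @database_name = N'master',
    @command = N'IF SERVERPROPERTY(''IsHadrEnabled'') = 1
   AND EXISTS (SELECT 1 FROM sys.dm_hadr_availability_replica_states
               WHERE is_local = 1 AND role_desc = ''SECONDARY'')
    EXEC msdb.dbo.sp_stop_job @job_name = N''YYYYYYYYYYYYYYYYYYYYYYYYYY'';',
    @on_success_action = 3,  -- go to the next step
    @on_fail_action = 2;     -- quit the job reporting failure

-- Make sure the job starts at the new guard step.
EXEC msdb.dbo.sp_update_job
    @job_id = N'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX',
    @start_step_id = 1;
```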
Note to execute the script:
- Replace ‘XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX’ with Job_ID
- Replace ‘YYYYYYYYYYYYYYYYYYYYYYYYYY’ with Job_Name
Also note that this script should work even on servers that do not have availability groups. It will only execute on SQL Server 2012 and later.
answered May 30 at 5:28