Over the past couple of months, I’ve experienced an issue where an Azure Update Management deployment fails to run on several servers. In the deployment history, these servers were listed with a status of Failed to start.
Checking the logs for the individual servers, I saw that they all had an exception message stating, “Job was suspended. For additional troubleshooting, check the Microsoft-SMA event logs on the computers in the Hybrid Runbook Worker Group that tried to run this job.”
So, I checked the Microsoft-SMA event logs on the computers and found that they all had an error event with ID 15105 and the task category HybridErrorWhilePollingQueue. In each case, the event showed that the remote server returned an error: (401) Unauthorized.
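If you have more than a handful of servers to check, the same lookup can be done from PowerShell instead of Event Viewer. The following is a minimal sketch, assuming the log's full name is Microsoft-SMA/Operational (confirm with Get-WinEvent -ListLog 'Microsoft-SMA*') and that a 30-day window is wide enough. The function names are hypothetical and only used for illustration.

```powershell
# Build a Get-WinEvent filter for event 15105 in the SMA log.
# Assumption: the log's full name is 'Microsoft-SMA/Operational'.
function New-SmaPollingErrorFilter {
    param(
        [int]$Days = 30,
        [datetime]$Now = (Get-Date)
    )
    @{
        LogName   = 'Microsoft-SMA/Operational'
        Id        = 15105
        StartTime = $Now.AddDays(-$Days)
    }
}

# Return matching events from the local machine (nothing if none are found).
function Get-SmaPollingError {
    param([int]$Days = 30)
    $filter = New-SmaPollingErrorFilter -Days $Days
    Get-WinEvent -FilterHashtable $filter -ErrorAction SilentlyContinue |
        Select-Object TimeCreated, Id, Message
}
```

Running `Get-SmaPollingError | Where-Object Message -like '*401*'` on an affected server should narrow the results to the Unauthorized errors described above.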
Going back into the Azure Automation Account, I checked the System Hybrid Workers and saw that all the machines that were having this problem had a Last Seen Time of over a month ago. Update Management considers anything over 60 minutes to be in a troubled state.
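The 60-minute rule is easy to apply to a list of workers if you have their last seen times in hand. Here is a rough sketch of that check. The input shape (objects with Name and LastSeen properties), the function name, and the sample data are all assumptions made for illustration, not something the portal exports in this exact form.

```powershell
# Flag workers whose last check-in is older than the threshold.
# Input objects need Name and LastSeen properties (hypothetical shape).
function Get-StaleHybridWorker {
    param(
        [Parameter(Mandatory)] [object[]]$Worker,
        [int]$ThresholdMinutes = 60,
        [datetime]$Now = (Get-Date)
    )
    $Worker | Where-Object {
        ($Now - [datetime]$_.LastSeen).TotalMinutes -gt $ThresholdMinutes
    }
}

# Made-up sample data for illustration only.
$now = [datetime]'2020-03-01 12:00'
$workers = @(
    [pscustomobject]@{ Name = 'server01'; LastSeen = [datetime]'2020-03-01 11:45' }
    [pscustomobject]@{ Name = 'server02'; LastSeen = [datetime]'2020-01-20 08:00' }
)

# server01 checked in 15 minutes ago, so only server02 is reported.
Get-StaleHybridWorker -Worker $workers -Now $now | Select-Object -ExpandProperty Name
```

Passing the current time in as a parameter keeps the comparison repeatable, which helps when you run the check against a saved export.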
The Fix for the failed start in Azure Update Management
To resolve this issue, you have to remove the server as a Hybrid Worker in Azure Automation. Once removed, it automatically adds itself back as a Hybrid Worker and can run update deployments again.
- Connect to the server and run the script below to stop the Microsoft Monitoring Agent, clear the cache, and remove the Hybrid Worker configuration.
Stop-Service -Name HealthService
Remove-Item -Path 'C:\Program Files\Microsoft Monitoring Agent\Agent\Health Service State' -Recurse
Remove-Item -Path "HKLM:\SOFTWARE\Microsoft\HybridRunbookWorker" -Recurse -Force
- Open the Azure Portal and navigate to the Automation Account with the Update Management solution
- Open the Hybrid Worker Group blade and select the System Hybrid Worker Groups
- Select the server that you are removing and click Delete
Note: Alternatively, you can use the Remove-AzureRmAutomationHybridWorkerGroup cmdlet if you prefer
- Restart the Microsoft Monitoring Agent on the server
Start-Service -Name HealthService
After 5-15 minutes, you should see the server reappear on the System Hybrid Worker Groups list in Azure. Once it does, you are good to go.
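If you would rather script the portal deletion step with the cmdlet mentioned in the note, something along these lines should work. Treat it as a sketch: it assumes the AzureRM.Automation module is installed and that you have already signed in with Connect-AzureRmAccount. The wrapper function name is hypothetical, and the values in the usage comment are placeholders you need to replace with your own.

```powershell
# Thin wrapper around the cmdlet from the note above.
# Assumptions: AzureRM.Automation is installed and a session is already signed in.
function Remove-StaleWorkerGroup {
    param(
        [Parameter(Mandatory)] [string]$GroupName,
        [Parameter(Mandatory)] [string]$ResourceGroupName,
        [Parameter(Mandatory)] [string]$AutomationAccountName
    )
    Remove-AzureRmAutomationHybridWorkerGroup -Name $GroupName `
        -ResourceGroupName $ResourceGroupName `
        -AutomationAccountName $AutomationAccountName
}

# Usage (placeholder values, replace with your own):
# Remove-StaleWorkerGroup -GroupName '<system hybrid worker group name>' `
#     -ResourceGroupName '<resource group>' `
#     -AutomationAccountName '<automation account>'
```

Run it between stopping and restarting the agent, the same point in the sequence where the portal deletion happens.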