Have you ever tried to debug a situation where notifications in Operations Manager work, but only part of the time? I ran into this recently and found a few things that help, specifically for Operations Manager 2012. This blog post will cover:
- What is a resource pool?
- General recommendations to debug notifications sent through a resource pool
- How to determine what server is active in a resource pool
What is a resource pool?
Resource pools were added in Operations Manager 2012. As a quick definition: “A resource pool is a collection of management servers used to distribute work amongst themselves and take over work from a failed member.” (excerpt from http://technet.microsoft.com/en-us/library/hh230706.aspx). Because a gateway can also be a member of a resource pool (see the discussion at http://www.systemcentercentral.com/BlogDetails/tabid/143/IndexID/94138/Default.aspx), I would update that definition slightly: “A resource pool is a collection of management servers and/or gateway servers used to distribute work amongst themselves and take over work from a failed member.”
For additional readings on resource pools I recommend:
- https://www.catapultsystems.com/cfuller/archive/2012/07/24/automatic-and-manual-resource-pools-in-operations-manager-2012-scom-sysctr.aspx
- https://www.catapultsystems.com/cfuller/archive/2012/09/04/opsmgr-scom-resources-pools-what-they-do-not-do-sysctr.aspx
- http://www.systemcentercentral.com/quicktricks-can-a-gateway-be-a-member-of-a-resource-pool-or-is-a-pool-limited-to-only-management-servers/
General recommendations to debug notifications sent through a resource pool:
The following is a summary of insights and best practices I’ve combined based on feedback from most of the OpsMgr alpha-geeks on the planet. (Thank you Kevin, Scott, Flemming, Tao, Dieter and Kevin).
- Don’t forget first step debugs like doing a telnet to the SMTP server both by IP address and by name (verifying port connectivity and name resolution). This needs to be done from EVERY management server which is part of the notifications resource pool (which by default is all management servers in the management group). If specific servers cannot connect to the SMTP server because of firewall or routing restrictions consider removing these from the notifications resource pool (down to a minimum of two in a production environment when not debugging).
- Try locking the notification pool down to 1 management server during debugging.
- The best logging will be the SMTP server logs. If you cannot get SMTP server logs, set up a network trace and look at the network data.
- Once alerting is configured, OpsMgr will alert if it is unable to send email. The only exception seen so far is when the SMTP server accepts the email and then discards it elsewhere.
- Try Tao’s email script (http://blog.tyang.org/2010/07/05/powershell-script-test-smtp/), adding a line to initialize the MOM API and write an event to the Operations Manager log recording that you sent an email.
- If the email is getting into the email server you may need to trace the emails through the email system itself.
- Loading up a local IIS relay for OpsMgr to go to first makes it easy to identify what emails are being sent and gives you access to the full logs for the first hop of the email transfer.
- For a small number of alerts you can also create a script which dumps everything to a text file with PowerShell (http://scug.be/dieter/2011/05/11/scom-dump-alerts-to-text-file-and-mail/) and then sends it out using blat.exe or PowerShell mail. This gives you a generated, timestamped file so you can rule out OpsMgr.
- You may also want to check out the Papercut tool which is available at: papercut.codeplex.com
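As a rough illustration of the first-step checks in the list above (name resolution, port connectivity, and a traceable test message), here is a minimal Python sketch. It is not Tao's script and it does not touch OpsMgr at all. The function names, the default port 25, and the subject format are my own assumptions for illustration. It would need to be run from each management server in the notifications resource pool.

```python
import smtplib
import socket
from datetime import datetime, timezone
from email.message import EmailMessage


def resolve_addresses(host):
    """Return the sorted, de-duplicated IP addresses that `host` resolves to."""
    infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def can_connect(host, port=25, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def build_test_message(sender, recipient, origin):
    """Build a test email whose subject names the sending server and a UTC timestamp,
    so the message is easy to find in the SMTP server logs."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"OpsMgr notification test from {origin} at {stamp}"
    msg.set_content(f"Test message sent from {origin} at {stamp}.")
    return msg


def send_test_message(smtp_host, msg, port=25, timeout=10.0):
    """Send `msg` through `smtp_host` without authentication (assumes an open relay
    for this sender, which may not match your environment)."""
    with smtplib.SMTP(smtp_host, port, timeout=timeout) as server:
        server.send_message(msg)
```

To mirror the telnet test by name and by IP, call `can_connect` once with the SMTP server name and once with each address returned by `resolve_addresses`. If the name fails but the IP succeeds, look at name resolution rather than the firewall.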
How to determine what server is active in a resource pool
Update: A member of the System Center Central community (Alex) provided an updated query which covers all resource pools and has results which are more consistent than what I put together. Alex, thank you for writing this and for sharing it!
-- Server name and resource pool name for each pool workflow execution location
SELECT
    BaseManagedEntity.DisplayName
    ,CS.Agent.AgentGuid
    ,CS.WorkflowExecutionLocationAgent.AgentRowId
    ,CS.WorkflowExecutionLocation.WorkflowExecutionLocationRowId
    ,CS.WorkflowExecutionLocation.DisplayName
FROM CS.WorkflowExecutionLocationAgent
INNER JOIN CS.WorkflowExecutionLocation
    ON CS.WorkflowExecutionLocationAgent.WorkflowExecutionLocationAgentRowId = CS.WorkflowExecutionLocation.WorkflowExecutionLocationRowId
INNER JOIN CS.Agent
    ON CS.Agent.AgentRowId = CS.WorkflowExecutionLocationAgent.AgentRowId
INNER JOIN BaseManagedEntity
    ON BaseManagedEntity.BaseManagedEntityId = CS.Agent.AgentGuid
-- Keep only the rows whose location name contains "Pool"
WHERE CS.WorkflowExecutionLocation.DisplayName LIKE '%Pool%'
Sample results from my lab (single management server) and from another lab are shown below, truncated to only the two relevant fields: the name of the management server and the name of the resource pool.
OM01.cloud.pvt | AD Assignment Resource Pool |
OM01.cloud.pvt | All Management Servers Resource Pool |
OM01.cloud.pvt | Notifications Resource Pool |
OM1.CAT.DEMO | GSM Pool |
OM2.CAT.DEMO | Network Device Pool |
OM1.CAT.DEMO | AD Assignment Resource Pool |
OM2.CAT.DEMO | All Management Servers Resource Pool |
OM1.CAT.DEMO | Notifications Resource Pool |
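If you save results in the two-column form shown above, a few lines of Python can group them so you can see at a glance which server is listed for a given pool. This is only a sketch for reading the output. The function names are hypothetical, and it assumes each line looks like "server | pool |".

```python
from collections import defaultdict


def parse_pool_rows(lines):
    """Parse 'server | pool |' lines into (server, pool) tuples, skipping blank lines."""
    rows = []
    for line in lines:
        fields = [field.strip() for field in line.split("|")]
        fields = [field for field in fields if field]
        if len(fields) >= 2:
            rows.append((fields[0], fields[1]))
    return rows


def pools_by_server(rows):
    """Group pool names under the management server listed for them."""
    grouped = defaultdict(list)
    for server, pool in rows:
        grouped[server].append(pool)
    return dict(grouped)


def servers_for_pool(rows, pool_name):
    """Return the servers listed for one pool, e.g. 'Notifications Resource Pool'."""
    return [server for server, pool in rows if pool == pool_name]
```

For example, calling `servers_for_pool(rows, "Notifications Resource Pool")` on the second lab's rows returns the server to check first when notifications are failing.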
Summary: I hope that this blog post provided some interesting insights into resource pools, how to debug them and how to determine who the active resource pool member is!
Update: Alexey Zhuravlev put together a sample pack on how to debug the active pool member which is available at: http://www.systemcentercentral.com/pack-catalog/demo-pool-owner/