Ability to include the top CPU-consuming processes in individual CPU alerts in OMS
I would like to be able to generate a CPU alert for any system monitored by OMS. The alert should fire when CPU utilization stays at 100% for 30 minutes and the CPU queue length is greater than 10. When the alert condition is met, I would like the alert to capture the top 10 processes by CPU usage at that moment. This is one example of the alerting capabilities I need from OMS; I will not have SCOM in this environment, only OMS. Currently I am not able to add secondary conditions to an alert, and I am not able to attach useful context to alerts so that the person receiving them knows more about the likely root cause. I would also like to create a single alert that applies to every machine with OMS, rather than creating a separate alert for each computer.
Currently, you can use Azure Alerts to trigger alerts on CPU usage thresholds for any system. Because you are looking for a more advanced action when the alert fires, such as capturing details of the top running processes, the suggested approach is to use the DevOps integration options, which are intended for scenarios like this and for power users such as yourself.
You can use an Azure Automation runbook or an alert webhook, invoked when the alert fires, to carry out the necessary diagnostic actions. For example, an alert can call an Azure Logic App through a webhook; the Logic App can run a script (for example, PowerShell) to collect the relevant details from the OS and from your application, and then use a connector such as email or Slack to send those details to the appropriate team.
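As a rough illustration of the diagnostic step, the Python sketch below ranks per-process CPU samples and formats the top 10 into a plain-text message that a connector such as email or Slack could deliver. It assumes the script (run, for example, from a runbook triggered by the alert) has already collected per-process CPU figures from the affected machine. The `ProcessSample` type and the `top_cpu_processes` and `format_alert_message` helpers are hypothetical and are not part of any Azure API.

```python
from dataclasses import dataclass
import heapq
from typing import Iterable, List


@dataclass(frozen=True)
class ProcessSample:
    """Hypothetical record for one process, as collected on the alerting machine."""
    pid: int
    name: str
    cpu_percent: float


def top_cpu_processes(samples: Iterable[ProcessSample], n: int = 10) -> List[ProcessSample]:
    """Return the n samples with the highest CPU usage, highest first."""
    return heapq.nlargest(n, samples, key=lambda s: s.cpu_percent)


def format_alert_message(computer: str, samples: Iterable[ProcessSample], n: int = 10) -> str:
    """Build a plain-text body listing the top n CPU consumers on `computer`."""
    top = top_cpu_processes(samples, n)
    lines = [f"High CPU alert on {computer}: top {len(top)} processes by CPU"]
    for rank, s in enumerate(top, start=1):
        lines.append(f"{rank:2d}. {s.name} (PID {s.pid}): {s.cpu_percent:.1f}%")
    return "\n".join(lines)


if __name__ == "__main__":
    # Illustrative data only; a real runbook would gather these on the affected machine.
    samples = [
        ProcessSample(101, "sqlservr.exe", 62.5),
        ProcessSample(202, "w3wp.exe", 21.0),
        ProcessSample(303, "MsMpEng.exe", 9.3),
    ]
    print(format_alert_message("web01", samples))
```

Keeping the ranking separate from the formatting means the same top-N list could also be attached to the alert payload or written to a log, not only sent as a message.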
To create alerts that span multiple machines in OMS, we suggest writing a query in the new query language that aggregates or groups the results by machine, and then configuring an alert on that query.
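To make the per-machine grouping concrete, the Python sketch below mimics what such a query would do: it groups performance samples by computer and reports every machine whose CPU never dropped below the threshold during the last 30 minutes and whose average processor queue length exceeded 10. One rule therefore covers every machine without a per-computer alert. The `PerfSample` type, its field names, and the choice of minimum CPU plus average queue length as the aggregates are assumptions for illustration; in OMS the equivalent logic would live in the log search query itself, not in Python.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class PerfSample:
    """Hypothetical performance sample for one machine at one point in time."""
    computer: str
    timestamp: datetime
    cpu_percent: float
    queue_length: float


def machines_to_alert(
    samples: Iterable[PerfSample],
    now: datetime,
    window: timedelta = timedelta(minutes=30),
    cpu_threshold: float = 100.0,
    queue_threshold: float = 10.0,
) -> List[str]:
    """Group samples by computer and return the machines that breach both conditions.

    Within the window, a machine breaches when its minimum CPU is at or above
    cpu_threshold (CPU never dropped below it) and its average processor queue
    length is above queue_threshold. Gaps between samples are not checked.
    """
    start = now - window
    by_computer: Dict[str, List[PerfSample]] = defaultdict(list)
    for s in samples:
        if start <= s.timestamp <= now:
            by_computer[s.computer].append(s)

    alerts = []
    for computer, rows in by_computer.items():
        min_cpu = min(r.cpu_percent for r in rows)
        avg_queue = sum(r.queue_length for r in rows) / len(rows)
        if min_cpu >= cpu_threshold and avg_queue > queue_threshold:
            alerts.append(computer)
    return sorted(alerts)


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0)
    # web01 stays at 100% the whole window; db01 dips to 55% in the latest sample.
    samples = [
        PerfSample("web01", now - timedelta(minutes=m), 100.0, 14.0) for m in range(0, 30, 5)
    ] + [
        PerfSample("db01", now - timedelta(minutes=m), 100.0 if m else 55.0, 12.0)
        for m in range(0, 30, 5)
    ]
    print(machines_to_alert(samples, now))  # ['web01']
```

Using the minimum CPU value over the window is one simple way to express "stayed at 100% for 30 minutes"; a real rule might instead count consecutive breaching intervals to tolerate missing samples.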