Monitor Automated Kernel Updates
Our AKS nodes ran into this bug (https://github.com/Azure/aks-engine/issues/1356), which completely stopped the automated AKS kernel updates that are supposed to take place nightly.
Updates were stalled for weeks until I double-checked the status due to this security notice (https://portal.msrc.microsoft.com/en-US/security-guidance/advisory/ADV190020), discovered the problem, and applied the work-around to get us past the broken kernel.
This problem should not have had to be discovered manually, but neither my team nor any team at Azure support was notified of the stuck kernel, to my knowledge. I would like failed node updates to cause a notification to be automatically sent to both my team and, ideally, Azure support. A monitor on the actual updater application would make sense, as well as some sort of configurable monitor that would check the node kernel versions periodically and notify if they fall behind the most recent version in the series by more than some configurable amount of time, defaulting to, say, 48 hours.
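
A sketch of the periodic kernel-version check requested above, as an added example that is not part of the original post and not Azure or AKS tooling. It assumes node data in the shape of `kubectl get nodes -o json` and that the latest kernel in the series and its release time are supplied by the operator; the node names, kernel versions and dates are constructed.

```python
import json
import re
from datetime import datetime, timedelta, timezone

# Constructed sample, trimmed to the fields used from `kubectl get nodes -o json`.
SAMPLE_NODES = json.loads("""
{"items": [
  {"metadata": {"name": "aks-nodepool1-00000000-0"},
   "status": {"nodeInfo": {"kernelVersion": "4.15.0-1059-azure"}}},
  {"metadata": {"name": "aks-nodepool1-00000000-1"},
   "status": {"nodeInfo": {"kernelVersion": "4.15.0-1055-azure"}}},
  {"metadata": {"name": "aks-nodepool1-00000000-2"}, "status": {}}
]}
""")

def kernel_key(version):
    """Map '4.15.0-1055-azure' to (4, 15, 0, 1055) so versions in one series sort."""
    return tuple(int(n) for n in re.findall(r"\d+", version))

def stale_nodes(nodes, latest, released_at, now, max_lag=timedelta(hours=48)):
    """Return (node, kernel) pairs still below `latest` after the grace period."""
    if now - released_at <= max_lag:
        return []  # nodes still have time to pick up the nightly update
    stale = []
    for item in nodes["items"]:
        info = item.get("status", {}).get("nodeInfo", {})
        kernel = info.get("kernelVersion", "unknown")
        # A node that reports no kernel yields an empty key and is flagged too.
        if kernel_key(kernel) < kernel_key(latest):
            stale.append((item["metadata"]["name"], kernel))
    return stale

if __name__ == "__main__":
    # Assumed inputs: latest kernel in the series and when it was published.
    latest = "4.15.0-1059-azure"
    released = datetime(2024, 1, 1, tzinfo=timezone.utc)
    now = datetime(2024, 1, 4, tzinfo=timezone.utc)
    for name, kernel in stale_nodes(SAMPLE_NODES, latest, released, now):
        print(f"ALERT {name}: kernel {kernel} is behind {latest}")
```

Run on a schedule, the returned list is what would feed the notification; the comparison only holds for versions within one kernel series.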