Support for Azure Monitor Autoscale
Provide support for Azure Monitor Autoscale that automatically adds instances based on a metric.
This would allow us to mitigate issues and create awareness about them without any manual action.
Bonus points if we can remove instances as well.
Please take a look at the Kubernetes cluster autoscaler and let us know if you see the need for additional capabilities or a different approach.
Tom Kerkhove commented
For those interested: a custom metrics adapter is in the works at https://github.com/Azure/azure-k8s-metrics-adapter
Chris Geier commented
I like the progress. Neither the article nor the autoscale functionality itself makes it 100% clear what the configuration options are. What we are looking to do is configure alerts or similar for when the cluster itself is running low on resources (memory, processor), then add nodes to provide more resources. Once the node is in, we would likely have to increase the replicas as well to take advantage of it. It seems a lot of this would be possible through other Azure services such as Automation, Logic Apps, events, etc. I have seen some people doing interesting things along those lines, but it's not super clean at this point.
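To make the flow described above concrete, here is a minimal sketch of the decision it implies: when CPU or memory utilization crosses a threshold, add a node, then raise the replica count in proportion. The function name `plan_scale_out`, the 0.8 threshold, the one-node step, and the proportional replica rule are all illustrative assumptions, not an Azure or Kubernetes API.

```python
import math


def plan_scale_out(cpu_used, mem_used, nodes, replicas, threshold=0.8, step=1):
    """Return (new_nodes, new_replicas) for a hypothetical threshold rule.

    cpu_used and mem_used are cluster-wide utilization fractions (0.0 to 1.0).
    threshold and step are assumed values chosen for illustration only.
    """
    # Only act when the scarcer of the two resources is running low.
    if max(cpu_used, mem_used) < threshold:
        return nodes, replicas

    new_nodes = nodes + step
    # Assumption: grow replicas in proportion to the added capacity,
    # rounding up so the new node is actually put to use.
    new_replicas = math.ceil(replicas * new_nodes / nodes)
    return new_nodes, new_replicas
```

For example, `plan_scale_out(0.85, 0.60, nodes=3, replicas=4)` would plan for 4 nodes and 6 replicas, while a cluster below the threshold is left unchanged.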
Tom Kerkhove commented
Kubernetes Cluster Autoscaler looks ok but it would still be great to have integration with Azure Monitor Autoscale (https://docs.microsoft.com/en-us/azure/monitoring-and-diagnostics/monitoring-autoscale-get-started) as well.
This would allow us to have all auto-scaling configuration in one place instead of some parts in Azure Monitor and some parts on AKS instances.
Another benefit is that this also allows us to receive webhooks when the scaling is happening.
How Azure Monitor Autoscale scales my cluster is really up to you: it could use the cluster autoscaler behind the scenes, or simply trigger the existing "manual scale" on our behalf.
AKS Team (Admin, Microsoft Azure) commented
We're supporting node autoscaling using the Kubernetes cluster autoscaler, which scales out when pods are pending due to lack of resources and scales in when clusters are underutilized. The current support is documented here:
However, we're working on making this more natively integrated, e.g. as a checkbox in the portal.
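As a rough illustration of the behavior described above (scale out when pods are pending for lack of resources, scale in when the cluster is underutilized), the rule can be sketched as follows. This is a heavily simplified model: the real cluster autoscaler simulates scheduling per node rather than checking a single average. The names `ClusterState` and `desired_node_count`, the node limits, and the 0.5 scale-in threshold are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class ClusterState:
    node_count: int
    pending_pods: int       # pods that cannot be scheduled for lack of resources
    avg_utilization: float  # requested resources / capacity, 0.0 to 1.0


def desired_node_count(state, min_nodes=1, max_nodes=10, scale_in_below=0.5):
    """Return the node count this simplified rule would aim for."""
    if state.pending_pods > 0:
        # Scale out: some pods are waiting for capacity.
        target = state.node_count + 1
    elif state.avg_utilization < scale_in_below:
        # Scale in: the cluster is underutilized.
        target = state.node_count - 1
    else:
        target = state.node_count
    # Never leave the configured (assumed) bounds.
    return max(min_nodes, min(max_nodes, target))
```

Pending pods take priority over low utilization here, so a cluster never shrinks while workloads are still waiting to be scheduled.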