Provide out-of-the-box resource governance for Kusto clusters
Our Kusto cluster backs our production site and tools and needs 99.99% availability. However, because the cluster is open to tens of thousands of users across the company for ad-hoc queries, we often suffer unexpected cluster-wide outages caused by misbehaving users. We have tried both cluster-wide query limits and customized approaches, but neither has helped much. We would like Kusto to provide a resource governance solution out of the box.
Chango V. commented
Same issue as Simon's. We own a cluster where we publish valuable data for the bigger organization. We are having an increasingly hard time keeping the various consumers in check. It's a manual effort: chasing the offenders when we identify them and asking them to fix their queries or slow down.
What's needed is per-user and per-app-ID throttling. We should also be able to set limits on the percentage of cluster or node capacity a user can consume in a unit of time.
Something as simple as what the ARM API does, with a cap on requests per minute per user, would already be a good step toward preventing unhealthy clients from taking down clusters.
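To make the requests-per-minute idea concrete, here is a rough sketch of a per-principal sliding-window cap. It is not a Kusto or ARM API; the class name `PerPrincipalLimiter`, its parameters, and the principal strings are all made up for illustration, and a real service would enforce this server-side.

```python
import time
from collections import defaultdict, deque


class PerPrincipalLimiter:
    """Hypothetical gate: at most `max_requests` per `window_s` seconds per principal."""

    def __init__(self, max_requests, window_s=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_s = window_s
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._hits = defaultdict(deque)  # principal -> timestamps of admitted requests

    def allow(self, principal):
        """Return True and record the request if the principal is under its cap."""
        now = self.clock()
        hits = self._hits[principal]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the cap: the caller would reject or delay the query
        hits.append(now)
        return True


if __name__ == "__main__":
    limiter = PerPrincipalLimiter(max_requests=2, window_s=60.0)
    for i in range(3):
        print("user-a request", i, "allowed:", limiter.allow("user-a"))
```

The point of keying on the principal is that one noisy user or app ID runs out of budget without affecting anyone else's.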
Simon Liu commented
One idea I recently learned from how Redshift handles this: it provides separate queues. Users can set policies and allocate resources per queue.
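As a rough illustration of the queue idea (not Redshift's or Kusto's actual mechanism), each queue could get its own number of concurrency slots, so that ad-hoc traffic can only exhaust its own allocation. The class `QueuePolicy` and the queue names below are hypothetical.

```python
class QueuePolicy:
    """Hypothetical per-queue concurrency slots; each queue has an independent budget."""

    def __init__(self, slots):
        # slots: mapping of queue name -> maximum concurrent queries
        self.slots = dict(slots)
        self.running = {name: 0 for name in self.slots}

    def try_start(self, queue):
        """Claim a slot in `queue`; return False if that queue is already full."""
        if self.running[queue] >= self.slots[queue]:
            return False
        self.running[queue] += 1
        return True

    def finish(self, queue):
        """Release a slot when a query in `queue` completes."""
        if self.running[queue] > 0:
            self.running[queue] -= 1


if __name__ == "__main__":
    policy = QueuePolicy({"production": 8, "adhoc": 2})
    print([policy.try_start("adhoc") for _ in range(3)])  # third ad-hoc query is refused
    print(policy.try_start("production"))  # production still has free slots
```

With something like this, saturating the ad-hoc queue would leave the production queue's slots untouched.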
Scott Hinckley commented
Agreed. This needs to work at the user/app level, along with per-query limits that can be set on CPU and memory usage.