Support selection of nodes to remove when scaling down
At the moment, when you scale down, the oldest nodes always stay. It would be a nice feature to be able to choose which node is removed when scaling down, or to remove a selected node manually.

12 comments
-
Amit Gupta commented
This feature is definitely required, so that the oldest node is scaled down instead of the earliest one.
A node runs multiple pods, and usually only one of them needs to scale up or down.
In this situation, when the pod scales up, the node count scales up too. After the cool-down time, the node with only the new pod remains and the node with all the other pods is destroyed, which results in a temporary outage for the application.
-
Andy Dysart commented
Definitely would like this feature. I have a 6 node cluster and node 1 is seriously sick and will not successfully start up any pod that gets assigned to it. I'd like to just recycle that node, but there doesn't appear to be any good way to do that.
-
Benoit Delville commented
This feature would still be useful today
-
Kaat, Robert te commented
AKS is simply not stable enough to never have nodes behaving unexpectedly. Being able to recycle the node (not restart, but remove it and deploy a new one) would be a solution.
-
Inge Knudsen commented
We see the same need now, as the node disk runs out of storage because Docker fails to garbage-collect images. Although we have a manual workaround to replace the node, it would be nice to have the option to recycle/replace a node automatically when a condition is met indicating that the node is becoming unhealthy.
-
Chris Egan commented
I recently hit an issue with a long-running cluster running out of ephemeral storage. I tried scaling up and down again; it created a new node that worked, then removed the working node again. At the moment, without being able to select which node to remove so that I can cycle them, there is no way for me to fix this. The cluster is on the latest version, so I can't use the upgrade trick to cycle the nodes.
-
Jean-Francois Borie commented
I would like this feature. When a node is in a bad state like the one below, I get a lot of disk pressure and cannot recycle the bad node.
Or delete the oldest node.
Warning ImageGCFailed 4m (x9023 over 51d) kubelet, aks-agentpool-11593595-0 (combined from similar events): wanted to free 2858247782 bytes, but freed 0 bytes space with errors in image deletion: [rpc error: code = Unknown desc = Error response from daemon: conflict: unable to delete 1fe6774e5e9e (cannot be forced) - image has dependent child images, rpc error: code = Unknown desc = Error response from daemon: conflict: unable to delete 568c4670fa80 (cannot be forced) - image has dependent child images, rpc error: code = Unknown desc = Error response from daemon: conflict: unable to delete 45f863657427 (must be forced) - image is being used by stopped container a5535e99b357, rpc error: code = Unknown desc = Error response from daemon: conflict: unable to delete 0cc176312b6c (cannot be forced) - image has dependent child images, rpc error: code = Unknown desc = Error response from daemon: conflict: unable to delete 8b62259d6e5a (must be forced) - image is being used by stopped container 526fb62ea1c5, rpc error: code = Unknown desc = Error response from daemon: conflict: unable to delete 0f0674bb09c7 (must be forced) - image is being used by stopped container 4c5890d6ee80, rpc error: code = Unknown desc = Error response from daemon: conflict: unable to delete ce450e0bd7ef (must be forced) - image is being used by stopped container 4ba722bd63b3, rpc error: code = Unknown desc = Error response from daemon: conflict: unable to delete 740fca4c5301 (must be forced) - image is being used by stopped container 7abcf6fb30b1, rpc error: code = Unknown desc = Error response from daemon: conflict: unable to delete 2f4511f25c5a (must be forced) - image is being used by stopped container 346764e4242a, rpc error: code = Unknown desc = Error response from daemon: conflict: unable to delete 445f4bf19852 (must be forced) - image is being used by stopped container b6ce7f63b130, rpc error: code = Unknown desc = Error response from daemon: conflict: unable to delete 402292e7b610 (must be forced) - image is being used by stopped container 0dd7e77fc759]
-
jtasipit commented
This feature is really a must. It's a big limitation not to have it.
-
Juan Jacobs commented
I would really appreciate this feature
-
Anonymous commented
Either allowing us to choose which nodes to delete when scaling down, or having the oldest nodes deleted during a scale-down operation, would be better than the current behavior (the newest nodes are deleted during scale-down).
-
Anonymous commented
I agree with the comment. I've got a node which has run out of disk space. I just want to scale up and back down to get rid of the oldest node. As it stands, how can I *ever* delete the first/oldest node?
-
Anonymous commented
I would recommend removing the oldest nodes by default when scaling down.