Support for allocating Azure Batch nodes using containers instead of VMs
It would be great if Azure Batch could use the upcoming Windows Containers to allocate new nodes as containers instead of VMs, with an appropriate increase in auto-scale formula evaluation frequency.
This could potentially reduce the time from job submission to execution at full capacity from the current 5-10 minutes to well under a minute.
These are some of the scenarios where this would benefit us:
In scenario one, a customer might use hundreds of nodes to run a large job in a few minutes, but the wait for nodes to be allocated and then become available can triple the time spent waiting for results.
In scenario two, we have to keep some nodes running 24/7 just in case the user runs a small number of test simulations in preparation for a larger job, as we don't want them to wait 5-10 minutes for a few 30-second simulations to run. Being able to spin up a container in a few seconds would save us money here.
In scenario three, a customer is running just a few test simulations in preparation for a larger job. We currently have to handle the case where waiting for more nodes to spin up and then running the jobs in parallel takes longer than running them in series on the single available node, so the customer is forced to wait either way. Faster scaling with containers could make this a non-issue.
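To put rough numbers on scenario three, here is a minimal sketch of the serial-versus-parallel trade-off. It is a simplified model, not anything Azure Batch exposes: the function names, the 300-second VM startup (the low end of the 5-10 minutes above), and the 10-second container startup are all assumptions for illustration, and it assumes no tasks run while the extra nodes are starting.

```python
import math

def serial_seconds(num_tasks, task_seconds):
    """Time to run every task one after another on the single available node."""
    return num_tasks * task_seconds

def parallel_seconds(num_tasks, task_seconds, total_nodes, startup_seconds):
    """Time to wait for extra nodes, then run the tasks in waves across all nodes.

    Simplifying assumption: nothing runs during the startup wait.
    """
    waves = math.ceil(num_tasks / total_nodes)
    return startup_seconds + waves * task_seconds

# Five 30-second test simulations (hypothetical workload).
tasks, duration, nodes = 5, 30, 5

serial = serial_seconds(tasks, duration)
with_vms = parallel_seconds(tasks, duration, nodes, startup_seconds=300)
with_containers = parallel_seconds(tasks, duration, nodes, startup_seconds=10)

print(f"serial on one node:       {serial}s")
print(f"parallel, VM nodes:       {with_vms}s")
print(f"parallel, container nodes: {with_containers}s")
```

Under these assumed figures, scaling out with VMs (330 seconds) is slower than just running in series (150 seconds), while scaling out with containers (40 seconds) is clearly faster than both.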
James Thurley commented
Clarification: A proposed way of implementing this would be to allow us to create a container host cluster which can be associated with multiple Azure Batch pools. The Azure Batch pools would then request containers from this central cluster when creating nodes.
This would enable us to manage the container host cluster ourselves and try to predict demand, increasing the likelihood that we can create containers quickly without waiting for additional VMs to be allocated.
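As a rough illustration of the idea, the sketch below models a shared host cluster that several pools draw containers from. Everything in it is hypothetical: `ContainerHostCluster`, `request_containers`, and the pool names are invented for this example and are not part of the Azure Batch API.

```python
class ContainerHostCluster:
    """Hypothetical shared cluster with a fixed number of container slots."""

    def __init__(self, container_slots):
        self.container_slots = container_slots
        self.allocated = {}  # pool name -> containers currently held

    def free_slots(self):
        """Slots that can be handed out without adding more VMs."""
        return self.container_slots - sum(self.allocated.values())

    def request_containers(self, pool_name, count):
        """Grant as many containers as free capacity allows; return the number granted."""
        granted = min(count, self.free_slots())
        self.allocated[pool_name] = self.allocated.get(pool_name, 0) + granted
        return granted

    def release_containers(self, pool_name, count):
        """Return containers to the cluster when a pool scales down."""
        held = self.allocated.get(pool_name, 0)
        self.allocated[pool_name] = held - min(count, held)


# Two pools (names are illustrative) sharing one cluster with 8 slots.
cluster = ContainerHostCluster(container_slots=8)
granted_a = cluster.request_containers("pool-a", 5)
granted_b = cluster.request_containers("pool-b", 5)
shortfall_b = 5 - granted_b  # these nodes would still wait on new VMs
```

In this model, a request is only slow when the cluster runs out of free slots, which is why being able to size the cluster ahead of demand matters.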