Allow aligning fault domains and update domains in an availability set
In certain cases I want to manage the availability of data replicas across a cluster of machines, and I can tolerate more downtime in exchange for that downtime being more predictable.
An example is Kafka, where I can handle a large portion of machines going down as long as I can predict how those failures will be correlated, so that I can plan how I lay out topic partitions to keep a majority of partitions up.
With the way availability sets are currently set up, I can limit my downtime in case of a fault, or in case of an upgrade, but I can't align my planning on fault and upgrade domains to provide higher level guarantees.
As an example, say I have a cluster of 9 machines with 3 fault domains and 3 upgrade domains. If I only cared about faults, I could use a replication factor of 3, distribute the replicas across fault domains, and be fairly confident that a fault would take out only 1 replica, so I would still have a majority and the cluster would stay healthy. If an upgrade comes along, though, I no longer have that guarantee: I may lose 2 or 3 replicas, and my cluster will be down even though I've only lost 1/3 of the machines.
The same holds true if you swap upgrade and fault domains and plan the other way.
If I run 7 replicas in the cluster and distribute them across fault and upgrade domain combinations, I can guarantee that if any 3 machines go down (across either FD or UD), I'll still have a majority, but this more than doubles the cost and adds complexity.
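To make the scenario above concrete, here is a rough sketch that enumerates replica placements. It assumes a simple 3x3 grid where machine m sits in FD m // 3 and UD m % 3; that layout, and the names LAYOUT, worst_case_survivors and keeps_majority, are illustrative assumptions and not how availability sets actually assign domains.

```python
from itertools import combinations, product

# Hypothetical layout: machine m sits in FD m // 3 and UD m % 3 (a 3x3 grid).
LAYOUT = {m: (m // 3, m % 3) for m in range(9)}

def worst_case_survivors(replicas, layout=LAYOUT):
    """Fewest replicas still up after any single FD or UD goes down."""
    worst = len(replicas)
    for axis in (0, 1):  # 0 = fault domain, 1 = upgrade domain
        for domain in {d[axis] for d in layout.values()}:
            alive = sum(1 for m in replicas if layout[m][axis] != domain)
            worst = min(worst, alive)
    return worst

def keeps_majority(replicas, layout=LAYOUT):
    """True if a strict majority of replicas survives every single-domain outage."""
    return worst_case_survivors(replicas, layout) > len(replicas) // 2

# 3 replicas, one per FD, chosen without looking at UDs (27 placements).
fd_only = list(product(range(0, 3), range(3, 6), range(6, 9)))
unsafe = [p for p in fd_only if not keeps_majority(p)]

# 7 replicas on any 7 of the 9 machines.
seven_ok = all(keeps_majority(c) for c in combinations(range(9), 7))

print(len(unsafe), "of", len(fd_only), "FD-only placements can lose majority")
print("7 replicas always keep majority:", seven_ok)
```

Under this assumed grid, 21 of the 27 FD-only placements can lose their majority to a single upgrade domain going down, while any 7-replica placement keeps at least 4 replicas up.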
If I could have my availability set configured such that all my machines had their fault and upgrade domains aligned (such that being in FD 1 guarantees being in UD 1, etc.), then I could guarantee availability with only 3 replicas (in the above example) in either the upgrade or the fault case.
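As a rough check of the aligned case, the sketch below assumes a hypothetical layout in which machine m is in FD m // 3 and in the UD with the same index. The name ALIGNED and the helper function are illustrative only; no such availability set option exists today, which is what this request is asking for.

```python
from itertools import product

# Hypothetical aligned layout: FD index and UD index are always equal.
ALIGNED = {m: (m // 3, m // 3) for m in range(9)}

def worst_case_survivors(replicas, layout):
    """Fewest replicas still up after any single FD or UD goes down."""
    worst = len(replicas)
    for axis in (0, 1):  # 0 = fault domain, 1 = upgrade domain
        for domain in {d[axis] for d in layout.values()}:
            alive = sum(1 for m in replicas if layout[m][axis] != domain)
            worst = min(worst, alive)
    return worst

# Every placement of 3 replicas with one replica per FD.
placements = list(product(range(0, 3), range(3, 6), range(6, 9)))

# With aligned domains, any fault or upgrade takes out exactly one replica.
all_safe = all(worst_case_survivors(p, ALIGNED) == 2 for p in placements)
print("2 of 3 replicas always survive:", all_safe)
```

With this layout, planning for fault domains alone is enough, because an upgrade hits the same group of machines a fault would.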