Better fault tolerance and self-healing
We had an issue this month where every single web app on our App Service plan started throwing 500 errors. When we contacted support, they told us it was most likely caused by a problem accessing storage, and suggested enabling the local cache settings so that the instances could fall back on the local cache in the event of a failure.
What actually fixed it was scaling down to a lower tier and then scaling back up again. From that point on, all of the sites were restored.
It would be better if the monitoring could detect and resolve this kind of issue automatically, so that the apps aren't left in a broken state until someone intervenes manually. I was under the impression that Azure monitoring would automatically fail over to working hardware in cases like this.
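To make the request concrete, here is a rough sketch in Python of the behaviour I have in mind. It is not how Azure works internally and it calls no Azure API. The names `PlanWatchdog`, `http_status`, `probe`, `remediate` and `threshold` are all made up for illustration, and the 599 status for an unreachable site is an arbitrary stand-in. The idea is just: if every site on the plan returns a 5xx error for several checks in a row, trigger the recovery step automatically instead of waiting for a person.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for url (hypothetical default probe)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as HTTPError; the code is what we want.
        return err.code
    except (urllib.error.URLError, OSError):
        # Assumption: treat unreachable sites as a server-side failure.
        return 599


class PlanWatchdog:
    """Calls remediate() once every site has failed `threshold` checks in a row."""

    def __init__(
        self,
        probe: Callable[[str], int],
        remediate: Callable[[], None],
        threshold: int = 3,
    ) -> None:
        self.probe = probe
        self.remediate = remediate  # hypothetical hook, e.g. a scale down/up
        self.threshold = threshold
        self.consecutive_failures = 0

    def tick(self, urls: Iterable[str]) -> bool:
        """Run one round of checks; return True if remediation was triggered."""
        statuses = [self.probe(url) for url in urls]
        plan_down = bool(statuses) and all(code >= 500 for code in statuses)

        if plan_down:
            self.consecutive_failures += 1
        else:
            self.consecutive_failures = 0

        if self.consecutive_failures >= self.threshold:
            self.remediate()
            self.consecutive_failures = 0
            return True
        return False
```

In our case the `remediate` hook would have been the scale down and scale up that we ended up doing by hand. Requiring several consecutive failures across all sites is meant to avoid reacting to a single bad request or a single broken app.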