Make TrafficManager more robust when there's a major outage
We configured the TrafficManager to point to either deployment1 or deployment2, which live in different locations, depending on which one is in the "prod" role (machines are allocated) and which is in the "backup" role (machines are deallocated). Using the TrafficManager is attractive because the customer doesn't need to make any changes on their end; if we need to point the TrafficManager to the other deployment, it's simple and straightforward to allocate the machines and run a few lines of PowerShell code. However, during the recent extended outage in South Central US, after we allocated the backup machines and tried to switch the TrafficManager over to them, the update hung, the switch never happened, and it never recovered on its own.
I would expect this to be a critical use case for the TrafficManager. In times of crisis (and this was a legitimate Azure crisis), TrafficManager needs to be more robust and operate properly. If its operation depends on a datacenter being up, then its implementation is not robust enough.
Thank you for sharing your perspective, and we apologize for any impact you may have faced. We are constantly working on maintaining the availability of this service, including applying the learnings from the event you mentioned.