Address Azure Functions Resiliency Weakness to Azure Service DNS issues
Azure DNS issue R50C-5RZ caused multiple core Azure Functions triggers to fail across our subscriptions because they lost access to their Storage Accounts.
The DNS issue made all our storage accounts temporarily unresolvable. Because Azure Functions is tightly coupled to its storage account, function triggers failed when they could not resolve the storage account URLs, and they did not recover once the DNS issue was corrected.
Some SocketException / 'remote host not found' errors were logged in Application Insights, but there seemed to be no way to know that a trigger was in a failed state (unless a build-up of unprocessed messages was detected). The functions did not recover until they were manually restarted.
Ideally, Azure Functions should tolerate a temporary loss of access to the Storage Account and clearly indicate in the portal the status of, or loss of connectivity to, the dependent storage account.
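To illustrate the kind of tolerance being asked for, here is a minimal Python sketch of a listener that keeps retrying name resolution with exponential backoff instead of staying in a failed state. It is not how the Functions runtime is implemented; the function name `resolve_with_backoff` and its parameters are hypothetical, and only the standard library is used.

```python
import socket
import time


def resolve_with_backoff(host, resolve=socket.gethostbyname, sleep=time.sleep,
                         base_delay=1.0, max_delay=60.0, max_attempts=None):
    """Resolve `host`, retrying on DNS failure with exponential backoff.

    Hypothetical helper: `resolve` and `sleep` are injectable so the retry
    logic can be exercised without real DNS lookups or real waiting.
    With max_attempts=None the loop retries until resolution succeeds.
    """
    attempt = 0
    delay = base_delay
    while True:
        attempt += 1
        try:
            return resolve(host)
        except socket.gaierror:
            # DNS lookup failed; give up only if an attempt limit was set.
            if max_attempts is not None and attempt >= max_attempts:
                raise
            sleep(delay)
            # Double the wait each time, capped at max_delay.
            delay = min(delay * 2, max_delay)
```

A trigger loop built this way would resume on its own once DNS recovers, and the retry count or current delay could be surfaced as a health signal rather than relying on a message backlog to reveal the failure.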
This is an awesome idea!
We’re very unlikely to be able to reduce our dependency on DNS, but we can generally handle failing triggers more gracefully.
“Handle temporary loss of a storage account without requiring a manual restart” is a scenario we’re targeting.
Keep the feedback coming!