We welcome user feedback and feature requests!

Add URL-based probe health checks to Web Apps

When I spin up multiple instances of a Web App and one of them becomes unhealthy, in the sense that it starts throwing HTTP 500 errors, I'd like the load balancer to take it out of rotation, ideally by using load balancer probing.

Web Roles already support this, as detailed here: https://msdn.microsoft.com/library/azure/jj151530.aspx, but the functionality is not available in Web Apps. It would really help soften the blow of instance-based outages.
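
To make the request concrete, here is a rough Python sketch of the behaviour being asked for: a pool that probes a URL on each instance and stops routing to any instance that fails several probes in a row. It is a toy model only; `ProbedPool`, `unhealthy_after`, and the instance names are hypothetical and not part of any Azure API, and the probe is a plain callable standing in for a real HTTP request.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ProbedPool:
    """Toy model of a load balancer that probes a URL on each instance."""

    probe: Callable[[str], int]      # returns the HTTP status code for an instance
    instances: List[str]
    unhealthy_after: int = 2         # consecutive failed probes before removal
    failures: Dict[str, int] = field(default_factory=dict)

    def run_probes(self) -> None:
        """Probe every instance once and update its failure counter."""
        for name in self.instances:
            status = self.probe(name)
            if 200 <= status < 300:
                self.failures[name] = 0  # a good probe resets the counter
            else:
                self.failures[name] = self.failures.get(name, 0) + 1

    def in_rotation(self) -> List[str]:
        """Instances that should still receive traffic."""
        return [
            name for name in self.instances
            if self.failures.get(name, 0) < self.unhealthy_after
        ]


# Simulated probe results: instance-1 has started returning HTTP 500.
statuses = {"instance-0": 200, "instance-1": 500, "instance-2": 200}
pool = ProbedPool(probe=lambda name: statuses[name], instances=list(statuses))

pool.run_probes()
after_one = pool.in_rotation()       # one failure is tolerated

pool.run_probes()
after_two = pool.in_rotation()       # second failure removes instance-1

statuses["instance-1"] = 200         # the instance recovers
pool.run_probes()
after_recovery = pool.in_rotation()  # a passing probe puts it back
```

Requiring more than one failed probe before removal keeps a single slow response from evicting a healthy instance, and resetting the counter on success lets a recovered instance rejoin without manual action.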

604 votes
Chad shared this idea

26 comments

  • Manabu Suzuki commented

    Thank you for suggesting Proactive Auto Heal.
    I need to know why the instance has become unhealthy. Can I automatically save a memory dump before an unhealthy instance is restarted by Proactive Auto Heal?
    Microsoft support engineers say that if there is no memory dump, they can't investigate the cause of the unhealthy condition.

  • Manabu Suzuki commented

    When one instance of a Web App has an issue, it would still help if I could manually exclude it from the Web App's load balancing.

  • Manabu Suzuki commented

    When one instance of a Web App has an issue, I'd like to keep customers from seeing the error message repeatedly. If I could temporarily exclude that instance from the Web App's load balancing, customers could connect to the other instances and continue their work.

  • Mike commented

    Do you have Always On enabled in the web app's Application Settings? If so, Azure sends a test request every 5 minutes to make sure the app is still in memory, and it sends it over HTTP. Adding a rewrite rule to HTTPS fixes the HTTP 500 errors if they recur evenly every 5 minutes. Not sure why yours are occurring, but this fixed our metrics when we saw the errors evenly every 5 minutes.

  • Trevor commented

    I've got a few on-prem web apps that I would like to migrate to Azure App Service. It seems the Azure Load Balancer service can do this, so why can't App Service? This is my greatest fear about switching. It almost seems that we'd get a faster solution by containerising and using Service Fabric Mesh to get the "serverless" experience, once that service is GA. That is a massive workaround that may or may not work in practice here, though.

  • Josep Planells commented

    It seems very difficult to achieve a truly high-availability scenario with minimal downtime using the current options. Is there any automated way to recover when a single instance of the app has crashed for any reason? We cannot see how. The LB does not help, since it continues delivering traffic to the crashed instance. Another option would be to detect the problem with an instance in a separate monitoring system and restart it. But Auto Healing restarts all the instances, so we'd have downtime. So no matter how many instances you have, if one or more becomes unresponsive, there is no way to solve it.

  • Tore Lervik commented

    Simple web-tests that the LB can use would fix a lot of cases, and it would make app services much more robust.

    Yesterday a system went down because one out of 4 instances crashed. The other 3 instances were working fine, but because of how the LB operates this didn't matter much.

    Right now the only way to recover is to restart the whole app, or to find the instance that is broken and kill the correct process on that host (which is a pain to get right).
    An easier way to restart just a single instance would be nice: https://feedback.azure.com/forums/169385-web-apps/suggestions/32127793-add-ability-to-restart-a-specific-instance

  • Tuukka P commented

    Any updates on this? We have a fairly long web role instance startup as we need to load a large set of read-only data from blob storage into server memory, and during this time the node instance is not able to serve requests. A custom health probe allows us to direct requests to instances that are ready. We cannot move from web role instances to web app instances unless we have a mechanism to deal with this situation, and since classic cloud services do not support ARM, we are left with legacy solutions for deploying these services.
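
As an illustration of the kind of endpoint a custom health probe could poll in this warm-up scenario, here is a minimal Python sketch of a readiness check written as a plain WSGI app. The `/health` path, the `load_reference_data` function, and the response bodies are assumptions made for the example; the comment above does not say how the actual probe is implemented.

```python
import threading
from typing import Callable, Iterable

# Set once the in-memory data has finished loading.
ready = threading.Event()


def load_reference_data() -> None:
    """Stand-in for the slow startup load (hypothetical; no real I/O here)."""
    # A real service would read its data set into memory at this point.
    ready.set()


def health_app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """WSGI app: /health answers 503 while warming up and 200 once ready."""
    headers = [("Content-Type", "text/plain")]
    if environ.get("PATH_INFO") != "/health":
        start_response("404 Not Found", headers)
        return [b"not found"]
    if ready.is_set():
        start_response("200 OK", headers)
        return [b"ready"]
    # A non-2xx status tells the probe to keep this instance out of rotation.
    start_response("503 Service Unavailable", headers)
    return [b"warming up"]
```

The app can be served with any WSGI server, for example `wsgiref.simple_server.make_server` from the standard library, with `load_reference_data` started on a background thread so the probe endpoint answers while the load is still in progress.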

  • James Reategui commented

    You guys could get some ideas from Docker/Kubernetes as to how they handle Health Checks and killing pods in the set that go bad. If you guys can pull this off it would really take App Services to the next level. For us at least it would mean not having to work on migrating to Docker.

  • Ahmed commented

    Having a defective load balancer like this means we can't keep using Web Apps in production.

    We've encountered multi-hour outages as well, including cases where an instance went bad because of a problem on Azure's side.

    Advertising defective products as production-ready does real damage to businesses. How are other features considered higher up in the pipeline?

  • Anonymous commented

    Just curious whether there's been any movement on this issue? We're trying to deploy a Spring-based web app to an App Service and are experiencing the same issues. When one instance crashes, the whole web app seems to go down. We would like to talk with somebody from the Azure product team to bounce ideas around and gain an understanding of how Java apps work under the hood in an App Service.

  • UserKG commented

    This is killing me. My worker web app takes a significant amount of time to start up in its WCF service constructor (reading data from SLOW d:\home Azure file share that all web app instances share). It seems the load balancer is directing traffic to it even before the constructor has finished. This is causing terrible latency when this instance is first rotated in. I'm now regretting not having gone with WebRoles. Please help!
