Configurable back-end health check aggressiveness
Behind my Front Door are two "back-ends", each consisting of a single web app.
For each back-end I have configured a health check with an interval of 120 seconds. My expectation was that this leads to roughly 30 requests per hour.
In reality, Application Insights shows 64,000 requests in the past 24 hours. That's more than 40 requests per minute! A live traffic log confirms this: I see health check requests come in almost every second...
With the current behavior there is hardly any correlation with the configured "Interval" setting.
It would be great if there was an option to tune down the health check aggressiveness for simple web applications.
Sandeep Kumar commented
The probes are so aggressive because the backends are being hit by N POP servers located throughout the world (as described in the linked thread in the post). Why do all servers probe? So that they can each determine the latency of the backends. If only one of them probed, it would be hard to route requests efficiently from all parts of the world.
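To make the arithmetic concrete, here is a rough back-of-the-envelope sketch using only the numbers from the original post (120 second interval, 64,000 requests in 24 hours). The function names are made up for illustration, and the "implied probe sources" figure is only an estimate under the assumption that every source probes independently at the configured interval. It is not a documented POP count, and the post does not say whether the 64,000 covers one back-end or both.

```python
def expected_probes(interval_s: float, period_s: float = 3600.0) -> float:
    """Probes one source would send in period_s at the configured interval."""
    return period_s / interval_s


def implied_probe_sources(observed: float, period_s: float, interval_s: float) -> float:
    """Estimate how many independent sources would explain the observed count."""
    return observed / expected_probes(interval_s, period_s)


# What the original poster expected: one source probing every 120 seconds.
per_hour = expected_probes(120)

# What was observed: 64,000 requests in 24 hours, expressed per minute.
observed_per_minute = 64000 / (24 * 60)

# Number of sources that would produce that volume at a 120 second interval
# (an estimate only, under the assumptions stated above).
sources = implied_probe_sources(64000, 24 * 3600, 120)

print(f"expected per hour from one source: {per_hour:.0f}")
print(f"observed per minute: {observed_per_minute:.1f}")
print(f"implied probe sources: {sources:.0f}")
```

So the observed volume is what you would get if something on the order of 90 sources each honored the 120 second interval on their own, which fits the "N POP servers" explanation.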
I would recommend splitting the probes into two categories:
Type 1: Probes whose purpose is to check if the backend is up and running.
- These probes can be more frequent, let's say every 5 seconds, but only 1 (or very few) POP server should do this probe.
Type 2: Probes to help calculate latency.
- These probes should be done by all POP servers located throughout the world, and we should be able to set them to a much longer interval than the current maximum of 255 seconds. The documentation should also explicitly mention that all N POP servers (not just 1) will hit the backend every X seconds.
- If latency based routing is not enabled, then this probe shouldn't be done.
Same with me
Paul LeBlond commented
Additionally, I've got a web API built using Azure Functions. I cannot use Front Door Service with my API because it keeps the functions running 24/7.
Just being able to configure health check time to 1/day for my functions would be fine. Even better, configure health check rate based on time of day (99% of our usage is 8AM-5PM). Even better than that, if Front Door Service looked at usage over time and dynamically adjusted health check rates accordingly. Put some of that machine learning to work!
Sebastian Groeneveld commented
What strikes me most is that with a configured health check interval of "120 seconds", your application will get health check requests every 1-2 seconds....
For my simple web app, the logging of the health checks has by far become the most expensive component...
I wouldn't consider entirely disabling Application Insights for health checks a solution, hence I posted this request:
I have deleted my previous comment and removed my 3 votes because, as it turns out, there is a perfectly acceptable way of dealing with this issue. For people having the same problems here are the basics:
Application Insights allows you to mark requests from certain user agents as synthetic. In the full framework version of the SDK this is done in the ApplicationInsights.config file (look for SyntheticUserAgentTelemetryInitializer, or for more information about this file: https://docs.microsoft.com/en-us/azure/azure-monitor/app/configuration-with-applicationinsights-config). In the .NET Core version of the framework you might need to write a simple custom TelemetryInitializer yourself that inspects the request's user agent and marks the telemetry accordingly.
When everything that is synthetic is marked as such, you can add a simple custom ITelemetryProcessor implementation to filter out all synthetic requests. For an example, see: https://docs.microsoft.com/en-us/azure/azure-monitor/app/api-filtering-sampling
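For anyone who wants to see the shape of that two-step approach before digging into the SDK docs, here is a minimal sketch of the logic in plain Python. It does not use the Application Insights SDK: `RequestTelemetry`, `mark_synthetic` and `drop_synthetic` are hypothetical stand-ins for a telemetry item, a telemetry initializer and a telemetry processor. The "Edge Health Probe" user agent string and the "HealthProbe" label are assumptions too, so check your own request logs for the actual value.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Assumption: substring(s) that identify health probe user agents.
# Verify against the user agents you actually see in your logs.
PROBE_USER_AGENT_MARKERS = ("Edge Health Probe",)


@dataclass
class RequestTelemetry:
    """Hypothetical stand-in for a request telemetry item."""
    url: str
    user_agent: str
    synthetic_source: Optional[str] = None


def mark_synthetic(item: RequestTelemetry,
                   markers: Iterable[str] = PROBE_USER_AGENT_MARKERS) -> RequestTelemetry:
    """Step 1 (initializer role): tag requests whose user agent matches a marker."""
    user_agent = item.user_agent.lower()
    for marker in markers:
        if marker.lower() in user_agent:
            item.synthetic_source = "HealthProbe"  # label chosen for this sketch
            break
    return item


def drop_synthetic(items: Iterable[RequestTelemetry]) -> List[RequestTelemetry]:
    """Step 2 (processor role): keep only items that were not tagged as synthetic."""
    return [item for item in items if item.synthetic_source is None]


incoming = [
    RequestTelemetry("/", "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"),
    RequestTelemetry("/health", "Edge Health Probe"),
    RequestTelemetry("/api/orders", "curl/8.0.1"),
]

kept = drop_synthetic(mark_synthetic(item) for item in incoming)
print([item.url for item in kept])
```

Keeping the marking and the filtering as separate steps mirrors the description above: the tag stays available if you later decide to sample synthetic traffic instead of dropping all of it.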
Aleksander Pawlak commented
I've enabled Azure Front Door for a test/dev scenario in front of a web app that has Application Insights enabled. Costs started climbing heavily because all Front Door heartbeats are logged, which makes Application Insights log analytics pricey. Shouldn't there be a way to disable that?