How can we improve Microsoft Azure Stream Analytics?

Increase late arrival tolerance window

Right now, this dropdown list allows up to 20 days of tolerance.

We have a requirement to "catch up" on old events. If our Event Hub publisher goes down for some reason, then once we start sending the backlog of old events, their timestamps may go back as far as the start of the month.

It would be great to increase this tolerance to a full month (31 days) or longer, if possible.

Is this 20 day limit a restriction of memory buffer used by Stream Analytics, or would this be a simple change in the GUI?

20 votes
Kirk Marple shared this idea

3 comments

  • Johan Berghmans commented

    Would it be possible to add a control to the late arrival pane in the Stream Analytics GUI, for instance a checkbox, that disables the late arrival setting? This would effectively set it to -1, as explained below for the ARM template setting, and would allow users to disable the late arrival setting for blob-based stream inputs.

  • Andre Podnozov commented

    This limit can be removed using ARM templates. See 'eventsLateArrivalMaxDelayInSeconds' property here: https://docs.microsoft.com/en-us/azure/templates/microsoft.streamanalytics/streamingjobs

    "The maximum tolerable delay in seconds where events arriving late could be included. Supported range is -1 to 1814399 (20.23:59:59 days) and -1 is used to specify wait indefinitely. If the property is absent, it is interpreted to have a value of -1."

    Of course, the Portal also needs the ability to specify the 'wait indefinitely' value.
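To make the range quoted above concrete, here is a minimal Python sketch that converts a duration to seconds, checks it against the quoted bounds (-1 to 1814399), and builds the matching fragment for a streaming job resource. The helper names `late_arrival_seconds` and `late_arrival_fragment` are hypothetical and exist only for illustration; only the property name and the numeric range come from the quoted documentation, and the placement of the property under `properties` should be checked against the linked template reference.

```python
import json

# Bounds as quoted from the ARM template reference in the comment above.
WAIT_INDEFINITELY = -1
MAX_LATE_ARRIVAL_SECONDS = 1814399  # 20 days 23:59:59


def late_arrival_seconds(days=0, hours=0, minutes=0, seconds=0):
    """Convert a duration to seconds and check it against the quoted range."""
    total = ((days * 24 + hours) * 60 + minutes) * 60 + seconds
    if not 0 <= total <= MAX_LATE_ARRIVAL_SECONDS:
        raise ValueError(
            f"{total} s is outside the supported range 0..{MAX_LATE_ARRIVAL_SECONDS}"
        )
    return total


def late_arrival_fragment(value):
    """Build a properties fragment for a streaming job (hypothetical helper)."""
    in_range = 0 <= value <= MAX_LATE_ARRIVAL_SECONDS
    if value != WAIT_INDEFINITELY and not in_range:
        raise ValueError(f"unsupported late arrival delay: {value}")
    # Assumption: the property sits under "properties" of the streaming job.
    return {"properties": {"eventsLateArrivalMaxDelayInSeconds": value}}


if __name__ == "__main__":
    # The largest finite tolerance the quoted range allows.
    print(late_arrival_seconds(days=20, hours=23, minutes=59, seconds=59))
    # "Wait indefinitely", the value the Portal dropdown does not offer.
    print(json.dumps(late_arrival_fragment(WAIT_INDEFINITELY), indent=2))
```

Under these bounds, the 31 days requested in the original idea (2,678,400 seconds) is rejected as a finite value, so -1 is the only setting in the quoted range that covers it.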

  • Nelson Morais commented

    We would also like to see this "limitation" removed.

    Besides using SA in live scenarios, we would also like to use it in scenarios where we need to input very old data and still use the nice SA time window functions to aggregate that data.

    Simulating past events by replaying old data into an event hub is one example of how we would like to use SA.

    Currently, if we try to do these replays, the output timestamp of an old message is the time at which the message entered the event hub (or the message gets dropped, depending on the SA configuration), even if we use the Timestamp By functionality to point to an input field that holds the correct timestamp. This completely breaks our ability to use SA in these scenarios.

    In our case, using blob storage as the SA input and relying on the blob's timestamp is not feasible either, because the timestamp on the blob might not be correct.

    Are there any workarounds to use SA on scenarios like these?
