Increase late arrival tolerance window
Right now, this dropdown list allows up to 20 days of tolerance.
We have a requirement to "catch up" on old events. If, for some reason, our Event Hub publisher is down, then when we start sending old events again the timestamps may go back as far as the start of the month.
It would be great to increase this tolerance level to a full month (31 days), if possible (or longer).
Is this 20-day limit a restriction of the memory buffer used by Stream Analytics, or would this be a simple change in the GUI?
David Garza commented
One workaround for my use case: when I replay data from a while ago, I sub-group each window by an interval timestamp. Although the Late Arrival policy adjusts the timestamp to the earliest allowed value, the sub-grouping gives me the smaller-chunk aggregates I am looking for. Just an idea.
SELECT
    (DATEPART(SECOND, timestamp) / 10) AS intervalGroup, -- sub-group by 10 second intervals
    COUNT(*) AS windowCount
FROM input
GROUP BY
    (DATEPART(SECOND, timestamp) / 10), -- sub-group by 10 second intervals
    TumblingWindow(minute, 1)
Berghmans, Johan commented
Would it be possible to add a setting to the late arrival pane in the Stream Analytics GUI to disable late arrival entirely - for instance via a checkbox control that effectively sets the value to -1, as explained in the ARM template comment below? This would allow users to disable the late arrival setting for blob-based stream inputs.
Andre Podnozov commented
This limit can be removed using ARM templates. See 'eventsLateArrivalMaxDelayInSeconds' property here: https://docs.microsoft.com/en-us/azure/templates/microsoft.streamanalytics/streamingjobs
"The maximum tolerable delay in seconds where events arriving late could be included. Supported range is -1 to 1814399 (20.23:59:59 days) and -1 is used to specify wait indefinitely. If the property is absent, it is interpreted to have a value of -1."
Of course, the Portal also needs the ability to specify the 'wait indefinitely' value.
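For anyone who wants to try this, a minimal sketch of the relevant ARM template fragment might look like the following (the job name, API version, and other properties are placeholders; only 'eventsLateArrivalMaxDelayInSeconds' is the property documented at the link above):

```json
{
  "type": "Microsoft.StreamAnalytics/streamingjobs",
  "apiVersion": "2016-03-01",
  "name": "myStreamAnalyticsJob",
  "location": "[resourceGroup().location]",
  "properties": {
    "sku": { "name": "Standard" },
    "eventsOutOfOrderPolicy": "Adjust",
    "eventsLateArrivalMaxDelayInSeconds": -1
  }
}
```

Deploying the job with this template (rather than creating it in the Portal) is how the "wait indefinitely" value can be applied today.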
Nelson Morais commented
We would also like to see this "limitation" removed.
Besides using SA in live scenarios, we would also like to use it in scenarios where we need to feed in very old data and still use the nice SA time window functions to aggregate it.
Simulating past events by replaying old data into an event hub is one example of how we would like to use SA.
Currently, if we try these replays, the timestamp in the output is the time the message entered the event hub (or the message gets dropped, depending on the SA configuration), even when we use the TIMESTAMP BY functionality to point to an input field that holds the correct timestamp. This completely breaks our ability to use SA in these scenarios.
In our case, using the blob as the SA input and the timestamp of the blob storage is also not feasible because the timestamp on the blob might not be correct.
Are there any workarounds to use SA on scenarios like these?
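For reference, a replay query of the kind we mean would look roughly like this (the input alias and field names are just illustrative):

```sql
SELECT
    deviceId,
    AVG(temperature) AS avgTemperature
FROM eventHubInput TIMESTAMP BY eventTime  -- use the embedded event time, not the arrival time
GROUP BY
    deviceId,
    TumblingWindow(minute, 5)
```

With old data, the gap between eventTime and the arrival time exceeds the late arrival tolerance, which is exactly where the current 20-day cap gets in the way.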