Support Event Hubs as stream type data input
This would make it easy to store stream data in Azure and avoid copying the same data many times between Azure services. It would be a beautiful way to realize a lambda architecture in Azure.
Yes, please! This use case is vital: Event Hubs Archive --Data Factory--> Data Lake Store <-- U-SQL ingest <--Scheduler. Right now there are mucho blockers on the ADLA side around support for Avro and for the empty Avro files (file header, no blocks) generated by Event Hubs Capture.
Even better, it would be a wow moment to extract this data (from Avro) into a table and automatically keep it up to date when new files arrive.
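As a stopgap for the empty-file blocker, here is a minimal sketch of how a pre-processing step might spot header-only Avro files before handing them to an extractor. It assumes the files follow the standard Avro object container layout (4-byte magic `Obj\x01`, a metadata map, a 16-byte sync marker, then zero or more data blocks). The function names are hypothetical and not part of any Azure or Avro library.

```python
import io

AVRO_MAGIC = b"Obj\x01"   # magic bytes at the start of an Avro container file
SYNC_SIZE = 16            # length of the sync marker that ends the header


def _read_long(stream):
    """Decode one Avro long (zigzag-encoded, variable-length integer)."""
    shift = 0
    raw = 0
    while True:
        byte = stream.read(1)
        if not byte:
            raise ValueError("unexpected end of file inside header")
        raw |= (byte[0] & 0x7F) << shift
        if not (byte[0] & 0x80):   # high bit clear: last byte of this number
            break
        shift += 7
    return (raw >> 1) ^ -(raw & 1)  # undo zigzag encoding


def _read_bytes(stream):
    """Read one length-prefixed Avro bytes/string value."""
    length = _read_long(stream)
    if length < 0:
        raise ValueError("negative length in header")
    data = stream.read(length)
    if len(data) != length:
        raise ValueError("unexpected end of file inside header")
    return data


def _skip_header(stream):
    """Advance the stream past the magic, metadata map, and sync marker."""
    if stream.read(len(AVRO_MAGIC)) != AVRO_MAGIC:
        raise ValueError("not an Avro object container file")
    while True:
        count = _read_long(stream)
        if count == 0:             # a zero count terminates the metadata map
            break
        if count < 0:              # negative count: a byte size follows
            count = -count
            _read_long(stream)     # block size in bytes, not needed here
        for _ in range(count):
            _read_bytes(stream)    # metadata key
            _read_bytes(stream)    # metadata value
    if len(stream.read(SYNC_SIZE)) != SYNC_SIZE:
        raise ValueError("truncated sync marker")


def is_header_only_avro(stream):
    """Return True if the file has a valid header but no data blocks."""
    _skip_header(stream)
    return stream.read(1) == b""


def is_header_only_avro_bytes(data):
    """Convenience wrapper for content already held in memory."""
    return is_header_only_avro(io.BytesIO(data))
```

A scheduled job could call `is_header_only_avro(open(path, "rb"))` on each captured file and skip the ones that return `True`. This only sidesteps the empty-file problem; it does not decode any records.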
Iain Shepherd (ishepher) commented
Are you thinking of a one-step version of:
Event Hubs Archive --Data Factory--> Data Lake Store <-- U-SQL ingest <--Scheduler
Now that I write it out like that, your suggestion does sound very useful..!
Nick Darvey commented
It'd be highly valuable to us to be able to pour messages coming from IoT Hub into Data Lake.
Sachin C Sheth commented
Thanks for your suggestions. I have a few clarifying questions so that we can understand your requirements better.
What is your description of a lambda architecture, please? My understanding is that this is currently done by having multiple readers fork data off from the message broker (Event Hubs, Kafka, etc.) to support a cold path and a hot path. How will supporting Event Hubs as a stream input data type in Azure Data Lake make lambda architectures better?
Also, you indicated that you perform data copy operations many times. Can you please explain where you do that, which stores you copy to, and what processing you do on each copy?
Azure Data Lake